[Cod-bugs] Data registration
Takanori Nakane
tnakane.protein at osaka-u.ac.jp
Tue May 16 04:36:54 EEST 2023
Dear Saulius,
Thank you very much for your suggestions.
> If you and your co-authors do not object, I will insert the
> "_diffrn_radiation_probe" data element into the mentioned files.
Yes, please.
> Also, please let me know to which deposited data files the crystal
> numbers pertain and into which deposited files I should insert the
> account of the number of crystals used.
The numbers of merged crystals are as follows:
3000438: OLN-OXA.cif 70 crystals
3000440: OLN-SUCA.cif 34 crystals
3000441: OLN.cif 28 crystals
3000442: SUCA.cif 41 crystals
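For illustration, the two data items discussed in this thread could be generated per entry as follows. This is a minimal Python sketch under stated assumptions. The helper name `cif_fragment` and the wording of the `_chemical_compound_source` text are hypothetical. Only the merged-crystal counts from the list above are used, because this thread gives the number of measured crystals only for OLN-SUCA.

```python
# Merged-crystal counts per COD entry, copied from the list above.
MERGED_CRYSTALS = {
    "3000438": ("OLN-OXA.cif", 70),
    "3000440": ("OLN-SUCA.cif", 34),
    "3000441": ("OLN.cif", 28),
    "3000442": ("SUCA.cif", 41),
}


def cif_fragment(n_crystals):
    """Return hypothetical CIF lines flagging electron diffraction and the crystal count."""
    return "\n".join([
        "_diffrn_radiation_probe electron",
        "_chemical_compound_source",
        ";",  # a semicolon-delimited text field sidesteps CIF quoting rules
        f"Dataset merged from {n_crystals} crystals.",
        ";",
    ])


for cod_id, (filename, n) in MERGED_CRYSTALS.items():
    print(f"# COD {cod_id} ({filename})")  # '#' starts a comment in CIF
    print(cif_fragment(n))
```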
If you are interested in this study, our preprint is
available at
https://chemrxiv.org/engage/chemrxiv/article-details/6438d40c08c86922fff1f519.
Unfortunately, the journal policy prohibits uploading
the revised manuscript, which contains more information.
Best regards,
Takanori Nakane
On 2023/05/16 1:05, Saulius Gražulis wrote:
> Dear Takanori Nakane,
>
> thank you very much for the thorough explanation of your files! Indeed,
> "classical" R-merge has limitations when it comes to data from multiple
> crystals, or high-multiplicity data. I also agree with the inclusion of
> low-intensity reflections; as far as I know, this is the state of the art
> with modern software.
>
> I also see that your files contain the responses to the PLATON alerts –
> I overlooked them at the very beginning. Sorry about that!
>
> Regarding your note:
>
> "The OLN-SUCA dataset resulted from 34 crystals out of 244 measured
> crystals"
>
> I think this is very valuable information, and it would be beneficial if
> it were provided in the corresponding CIFs that you have deposited. I
> would suggest including this phrase in the additional data item,
> _chemical_compound_source, which is a free string giving details of the
> sample synthesis, identity and handling. It would then clarify the
> statistics of these files.
>
> Also, I suggest that the data item "_diffrn_radiation_probe electron" is
> added to entries 3000438 and 3000440–3000442 that report electrons as
> radiation type; in this way these structures will be clearly identified
> as being solved by means of electron diffraction.
>
> If you and your co-authors do not object, I will insert the
> "_diffrn_radiation_probe" data element into the mentioned files.
>
> Also, please let me know to which deposited data files the crystal
> numbers pertain and into which deposited files I should insert the
> account of the number of crystals used.
>
> Sincerely yours,
> Saulius
>
> On 2023-05-14 07:57, Takanori Nakane wrote:
>> Dear Dr. Saulius Gražulis,
>>
>> I am in charge of the MicroED data processing for the structure that
>> Dr. Toshiyuki Sasaki is trying to deposit.
>>
>> I am writing to you about the data quality concerns you mentioned.
>> Toshiyuki will write to you separately on how to proceed with the deposition.
>>
>> > Moreover, I have noticed that the _exptl_absorpt_coefficient_mu is zero,
>> > and the remaining absorption correction parameters are not specified. I
>> > see that very few of the electron diffraction studies reported in the
>> > COD use absorption correction, but if it is technically possible to
>> > apply it, maybe the final refinement R-factors will get lower?
>>
>> Unlikely. The high R factors (merging and refinement) are due to
>> other reasons (see below).
>>
>> In electron diffraction at 200 kV, absorption effects are negligible
>> for light elements. Effects of inelastic scattering can be
>> treated by absorption scaling, but this is a very crude, ad hoc
>> approximation and does not have physical meaning. In dials.scale,
>> which we used for scaling, the scaling factors are empirically modeled
>> as a smooth function of resolution, rotation angle and position on the
>> detector. This approach differs from physics-based modeling, which
>> relies on the absorption coefficient (mu) derived from the crystal composition.
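The following minimal sketch makes the contrast concrete by implementing both kinds of model. The Beer-Lambert transmission exp(-mu * t) needs the absorption coefficient mu (in mm^-1, as in `_exptl_absorpt_coefficient_mu`) and a path length t. The empirical scale g(phi, d) = k(phi) * exp(B(phi) / (2 d^2)) is only loosely modeled on a rotation-dependent scale with a resolution-dependent decay term. It is a hypothetical illustration, not the dials.scale implementation, and it omits the detector-position term mentioned above. All parameter values are made up.

```python
import bisect
import math


def beer_lambert_transmission(mu_per_mm, path_mm):
    """Physics-based transmission T = exp(-mu * t); mu follows from the composition."""
    return math.exp(-mu_per_mm * path_mm)


def interp(x, xs, ys):
    """Piecewise-linear interpolation through (xs, ys), clamped at the end nodes."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def empirical_scale(phi_deg, d_angstrom, phi_nodes, k_nodes, b_nodes):
    """Hypothetical empirical scale g = k(phi) * exp(B(phi) / (2 d^2)).

    k_nodes and b_nodes are free parameters at rotation-angle nodes that would
    be refined against the diffraction data; no absorption coefficient appears.
    """
    k = interp(phi_deg, phi_nodes, k_nodes)
    b = interp(phi_deg, phi_nodes, b_nodes)
    return k * math.exp(b / (2.0 * d_angstrom ** 2))


# Illustrative (made-up) parameters at rotation angles 0, 30 and 60 degrees.
PHI_NODES, K_NODES, B_NODES = [0.0, 30.0, 60.0], [1.0, 0.9, 0.8], [0.0, -2.0, -4.0]
print(empirical_scale(15.0, 1.0, PHI_NODES, K_NODES, B_NODES))
```

In the empirical form, k and B are refined against the measured intensities themselves, so no value of mu is needed.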
>>
>> > I also see that there are large values reported for the symmetry
>> > equivalent reflection agreement:
>> >
>> > _diffrn_reflns_av_R_equivalents 0.8897
>> > _diffrn_reflns_av_unetI/netI 0.3546
>> >
>> > The COD min / avg / max values are 0.0661 / 0.218177 (sample σ = 0.099)
>> > / 0.4322 for _diffrn_reflns_av_R_equivalents and 0.0007 / 0.141944
>> > (sample σ = 0.120) / 0.5071 for _diffrn_reflns_av_unetI/netI; thus the
>> > values in your file seem quite high compared to what we see in the COD
>> > (for the 35 structures explicitly reported as electron diffraction
>> > studies by setting the 'radiation' column to 'electron'); the
>> > _diffrn_reflns_av_R_equivalents value is beyond 5σ. Could it be that
>> > applying absorption correction would decrease these statistics as well?
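The "beyond 5σ" statement can be checked with simple arithmetic on the quoted COD statistics. This sketch assumes that the z-score z = (value - mean) / σ is a meaningful yardstick for these bounded and probably skewed distributions, which is only a rough approximation.

```python
# Mean and sample standard deviation of the COD electron-diffraction entries,
# as quoted above, and the values reported in the submitted file.
COD_STATS = {
    "_diffrn_reflns_av_R_equivalents": (0.218177, 0.099),
    "_diffrn_reflns_av_unetI/netI": (0.141944, 0.120),
}
REPORTED = {
    "_diffrn_reflns_av_R_equivalents": 0.8897,
    "_diffrn_reflns_av_unetI/netI": 0.3546,
}


def z_score(value, mean, sd):
    """Number of standard deviations between a value and the sample mean."""
    return (value - mean) / sd


for item, value in REPORTED.items():
    mean, sd = COD_STATS[item]
    print(f"{item}: z = {z_score(value, mean, sd):.1f}")
```

This gives roughly 6.8σ for _diffrn_reflns_av_R_equivalents and 1.8σ for _diffrn_reflns_av_unetI/netI.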
>>
>> While traditional small-molecule crystallography collects
>> a dataset from one or a few crystals, we take a massively
>> high-multiplicity approach. The OLN-SUCA dataset resulted from
>> 34 crystals out of 244 measured crystals.
>>
>> The traditional R factor increases with the multiplicity of a dataset
>> and is considered inadequate as a resolution metric.
>> This was pointed out by Diederichs & Karplus (Nat. Struct. Biol., 1997)
>> for the related metric R(merge) in macromolecular crystallography.
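The multiplicity dependence can be illustrated with a small simulation. This sketch assumes Gaussian measurement errors with a constant relative σ. It uses R(merge) = Σ_hkl Σ_i |I_i - <I>| / Σ_hkl Σ_i I_i together with the multiplicity-corrected R(meas) of Diederichs & Karplus (1997), which multiplies each reflection's term by sqrt(n/(n-1)). The data are synthetic, not taken from the deposited datasets.

```python
import math
import random


def merging_r_factors(groups):
    """Return (R_merge, R_meas) for lists of repeated measurements of each reflection."""
    num_merge = num_meas = den = 0.0
    for obs in groups:
        n = len(obs)
        if n < 2:
            continue  # a single observation carries no agreement information
        mean = sum(obs) / n
        dev = sum(abs(i - mean) for i in obs)
        num_merge += dev
        num_meas += math.sqrt(n / (n - 1)) * dev  # multiplicity correction
        den += sum(obs)
    return num_merge / den, num_meas / den


def simulate(multiplicity, n_reflections=2000, rel_sigma=0.1, seed=0):
    """Simulate reflections, each measured `multiplicity` times with Gaussian noise."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_reflections):
        true_i = 100.0 + rng.expovariate(1.0 / 1000.0)
        sigma = rel_sigma * true_i
        groups.append([rng.gauss(true_i, sigma) for _ in range(multiplicity)])
    return merging_r_factors(groups)


for m in (2, 4, 10, 50):
    r_merge, r_meas = simulate(m)
    print(f"multiplicity {m:3d}: R_merge = {r_merge:.4f}  R_meas = {r_meas:.4f}")
```

With the same per-observation error, R(merge) rises with multiplicity, approaching its limiting value through the factor sqrt((n-1)/n), while R(meas) stays roughly constant. R_int-type statistics such as _diffrn_reflns_av_R_equivalents lack the correction factor and so behave like R(merge) in this respect.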
>>
>> Similarly, unetI/netI is not a great metric, because it takes
>> the absolute value of intensities. Weak reflections can
>> have negative observed intensities (due to background subtraction),
>> so taking absolute values is not adequate.
>> In advanced data processing methods (common in macromolecular
>> crystallography), information in reflections with negative intensities
>> can still be used via maximum-likelihood intensity-based targets or
>> the French-Wilson procedure. Thus, we did not remove such reflections,
>> although this led to worse statistics.
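The following numerical sketch of the French-Wilson idea for a single acentric reflection shows how a negative observed intensity still carries information. A Wilson (exponential) prior on the true intensity J >= 0 is combined with a Gaussian likelihood for the observation. The posterior means of J and of the amplitude sqrt(J) are then obtained by trapezoidal integration. The function name and the integration grid are assumptions. Real implementations also treat centric reflections and estimate the Wilson mean in resolution shells; here it is simply given.

```python
import math


def french_wilson_acentric(i_obs, sigma, wilson_mean, n_steps=4000):
    """Posterior mean intensity and amplitude for one acentric reflection.

    Prior:      p(J) = exp(-J / wilson_mean) / wilson_mean for J >= 0 (Wilson).
    Likelihood: Gaussian error, p(i_obs | J) ~ exp(-(i_obs - J)^2 / (2 sigma^2)).
    Integrals over J are evaluated with the trapezoidal rule on [0, upper].
    """
    upper = max(i_obs, 0.0) + 10.0 * sigma + 10.0 * wilson_mean
    h = upper / n_steps
    grid = [k * h for k in range(n_steps + 1)]
    # Unnormalised log posterior; constant factors (and h) cancel in the ratios.
    logs = [-j / wilson_mean - 0.5 * ((i_obs - j) / sigma) ** 2 for j in grid]
    top = max(logs)  # subtract the maximum to avoid underflow in exp()
    norm = mean_j = mean_f = 0.0
    for k, (j, lp) in enumerate(zip(grid, logs)):
        w = 0.5 if k in (0, n_steps) else 1.0  # trapezoidal end-point weights
        p = w * math.exp(lp - top)
        norm += p
        mean_j += p * j
        mean_f += p * math.sqrt(j)
    return mean_j / norm, mean_f / norm


weak = french_wilson_acentric(i_obs=-20.0, sigma=10.0, wilson_mean=50.0)
strong = french_wilson_acentric(i_obs=1000.0, sigma=10.0, wilson_mean=500.0)
print("weak   <J>, <F>:", weak)
print("strong <J>, <F>:", strong)
```

For an observation of -20 ± 10 with a Wilson mean of 50, the posterior mean intensity is small but positive. The reflection therefore contributes a weak, non-zero amplitude instead of being discarded. For a strong reflection, the estimate stays close to the observed value.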
>>
>> > The _refine_ls_R_factor_gt and _refine_ls_wR_factor_ref, however, are
>> > somewhat high, even for electron diffraction studies.
>>
>> We applied kinematical refinement, not dynamical refinement.
>> Many variants of electron diffraction exist (precession electron
>> diffraction, convergent-beam electron diffraction, etc.), and
>> they suffer from dynamical diffraction to varying extents.
>> Among MicroED studies, we do not consider our statistics to be
>> significantly worse than those of others.
>>
>> Another reason for seemingly worse metrics is that we included weaker
>> reflections. Excluding noisy high-resolution structure factors does not
>> improve the structure accuracy, provided that each reflection
>> is weighted properly. Removal can degrade the refined structure
>> because valuable information, however noisy, is excluded.
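The role of proper weighting can be seen in the simplest least-squares problem: estimating one quantity from several measurements. This is a toy analogy, not a structure refinement, and the measurement values are hypothetical. An inverse-variance weighted mean never becomes less precise when a noisy measurement is added, whereas an unweighted mean can.

```python
import math


def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its standard uncertainty."""
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)


def unweighted_mean(values, sigmas):
    """Plain average; its uncertainty propagates every sigma with equal weight."""
    n = len(values)
    mean = sum(values) / n
    return mean, math.sqrt(sum(s ** 2 for s in sigmas)) / n


precise = ([10.0], [1.0])
with_noisy = ([10.0, 14.0], [1.0, 5.0])  # hypothetical second, noisier measurement

print(weighted_mean(*precise)[1])       # uncertainty from the precise value alone
print(weighted_mean(*with_noisy)[1])    # slightly smaller: the noisy value still helps
print(unweighted_mean(*with_noisy)[1])  # much larger: equal weighting degrades the estimate
```

For σ = 1 and σ = 5, the weighted uncertainty is 1/sqrt(1 + 1/25), about 0.98. The unweighted uncertainty is sqrt(1 + 25)/2, about 2.55.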
>>
>> This is a modern view of crystallographic structure refinement,
>> initially introduced in macromolecular crystallography.
>> Please see Diederichs and Karplus "Better models by discarding data?"
>> Acta Crystallographica Section D 69.7 (2013): 1215-1222,
>> which says:
>> "even though discarding the weaker data leads to improvements in
>> the merging R values, the refined models based on these data are
>> of lower quality."
>>
>> Thus, we used higher-resolution reflections in refinement
>> than traditional small-molecule crystallographers would use.
>> In addition, the deposition includes high resolution reflections not
>> used in refinement, in the hope that future algorithm developments might
>> allow extraction of more information from them.
>>
>> > I did not use the CheckCIF [1] tool to avoid breaking confidentiality
>> > of your structure, but it would be interesting if you could get
>> > CheckCIF / PLATON reports on your structure. Are there any Level A
>> > alerts?
>>
>> The level A alert was about high R(int), which is caused by
>> high multiplicity and dynamical effects.
>>
>> Thank you very much for your feedback.
>> I hope this explanation helps.
>>
>> Best regards,
>>
>> Takanori Nakane
>>
>