[Cod-bugs] Data registration

Takanori Nakane tnakane.protein at osaka-u.ac.jp
Tue May 16 04:36:54 EEST 2023


Dear Saulius,

Thank you very much for your suggestions.

 > If you and your co-authors do not object, I will insert the
 > "_diffrn_radiation_probe" data element into the mentioned files.

Yes, please.

 > Also, please let me know to which deposited data file the crystal
 > numbers pertain and into which deposited files should I insert the
 > account on the number of used crystals.

The numbers of merged crystals are as follows:

3000438: OLN-OXA.cif 70 crystals
3000440: OLN-SUCA.cif 34 crystals
3000441: OLN.cif 28 crystals
3000442: SUCA.cif 41 crystals
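
As an illustration only, the additions discussed in this thread could be generated per entry as sketched below. The data item names and the crystal counts come from the messages here; the wording of the free-text note and the helper name `cif_additions` are assumptions, not the text that was actually inserted into the deposited files.

```python
# Merged-crystal counts per COD entry, copied from the list above.
MERGED_CRYSTALS = {
    "3000438": ("OLN-OXA.cif", 70),
    "3000440": ("OLN-SUCA.cif", 34),
    "3000441": ("OLN.cif", 28),
    "3000442": ("SUCA.cif", 41),
}


def cif_additions(n_crystals: int) -> str:
    """Return the two proposed data items as CIF text.

    The sentence used for the free-text note is a hypothetical wording.
    """
    note = f"The dataset was merged from {n_crystals} crystals."
    return (
        "_diffrn_radiation_probe          electron\n"
        "_chemical_compound_source\n"
        ";\n"              # semicolon-delimited CIF text field
        f"{note}\n"
        ";\n"
    )


for cod_id, (filename, n) in MERGED_CRYSTALS.items():
    print(f"# {cod_id} ({filename})")
    print(cif_additions(n))
```

The total number of measured crystals is stated in this thread only for OLN-SUCA (244), so the sketch records just the merged counts.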

If you are interested in this study, our preprint is
available at 
https://chemrxiv.org/engage/chemrxiv/article-details/6438d40c08c86922fff1f519.
Unfortunately, the journal policy prohibits uploading of
the revised manuscript with more information.

Best regards,

Takanori Nakane

On 2023/05/16 1:05, Saulius Gražulis wrote:
> Dear Takanori Nakane,
> 
> thank you very much for the thorough explanation of your files! Indeed, 
> "classical" R-merge has limitations when it comes to data from multiple 
> crystals, or high-multiplicity data. I also agree on the inclusion of
> low-intensity reflections; as far as I know, this is the state of the
> art with modern software.
> 
> I also see that your files contain the responses to the PLATON alerts – 
> I have overlooked them at the very beginning. Sorry for that!
> 
> Regarding your note:
> 
> "The OLN-SUCA dataset resulted from 34 crystals out of 244 measured 
> crystals"
> 
> I think this is very valuable information and it would be beneficial if 
> it is provided in the corresponding CIFs that you have deposited. I 
> would suggest including this phrase in the additional data item, 
> _chemical_compound_source, which is a free string giving details of the 
> sample synthesis, identity and handling. It would then clarify the 
> statistics of these files.
> 
> Also, I suggest that the data item "_diffrn_radiation_probe electron" is
> added to entries 3000438 and 3000440–3000442 that report electrons as
> radiation type; in this way these structures will be clearly identified
> as being solved by means of electron diffraction.
> 
> If you and your co-authors do not object, I will insert the 
> "_diffrn_radiation_probe" data element into the mentioned files.
> 
> Also, please let me know to which deposited data file the crystal 
> numbers pertain and into which deposited files should I insert the 
> account on the number of used crystals.
> 
> Sincerely yours,
> Saulius
> 
> On 2023-05-14 07:57, Takanori Nakane wrote:
>> Dear Dr. Saulius Gražulis,
>>
>> I am in charge of MicroED data processing of the structure
>> Dr. Toshiyuki Sasaki is trying to deposit.
>>
>> I am writing you about data quality concerns you mentioned.
>> Toshiyuki will write to you separately on how to proceed with the deposition.
>>
>> > Moreover, I have noticed that the _exptl_absorpt_coefficient_mu is zero,
>> > and the remaining absorption correction parameters are not specified. I
>> > see that very few of the electron diffraction studies reported in the
>> > COD use absorption correction, but if it is technically possible to
>> > apply it, maybe the final refinement R-factors will get lower?
>>
>> Unlikely. The high R factors (merging and refinement) are due to
>> other reasons (see below).
>>
>> In electron diffraction at 200 kV, absorption effects are negligible
>> for light elements. Effects of inelastic scattering can be
>> treated by absorption scaling, but this is a very crude, ad hoc
>> approximation and does not have physical meaning. In dials.scale,
>> which we used for scaling, the scaling factors are empirically modeled
>> as a smooth function of resolution, rotation angle and position on the
>> detector. This approach is different from physics-based modeling
>> based on the crystal composition (mu).
>>
>> > I also see that there are large values reported for the symmetry
>> > equivalent reflection agreement:
>> >
>> >     _diffrn_reflns_av_R_equivalents    0.8897
>> >     _diffrn_reflns_av_unetI/netI       0.3546
>> >
>> > The COD min / avg / max values are 0.0661 / 0.218177 (sample σ = 0.099)
>> > / 0.4322 for _diffrn_reflns_av_R_equivalents and 0.0007 / 0.141944
>> > (sample σ = 0.120) / 0.5071 for _diffrn_reflns_av_unetI/netI; thus the
>> > values in your file seem quite high compared to what we see in the COD
>> > (for the 35 structures explicitly reported as electron diffraction
>> > studies by setting 'radiation' column to 'electron'), the
>> > _diffrn_reflns_av_R_equivalents is beyond 5*σ. Could it be that
>> > applying absorption correction would decrease these statistics as well?
>>
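
The "beyond 5*σ" remark in the quoted text can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted above; the function name `sigmas_above_mean` is made up for the illustration.

```python
def sigmas_above_mean(value: float, mean: float, sigma: float) -> float:
    """Distance of `value` from `mean`, in units of the sample sigma."""
    return (value - mean) / sigma


# Values from the deposited file versus the COD statistics quoted above
# for the 35 electron diffraction entries.
r_equiv = sigmas_above_mean(0.8897, 0.218177, 0.099)
unet = sigmas_above_mean(0.3546, 0.141944, 0.120)

print(f"_diffrn_reflns_av_R_equivalents: {r_equiv:.1f} sigma above the COD mean")
print(f"_diffrn_reflns_av_unetI/netI:    {unet:.1f} sigma above the COD mean")
```

With these inputs the first value comes out at about 6.8 σ and the second at about 1.8 σ, so only the R_equivalents figure lies beyond 5σ.
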
>> While traditional small-molecule crystallography collects
>> a dataset from one or a few crystals, we take a massively
>> high-multiplicity approach. The OLN-SUCA dataset resulted from
>> 34 crystals out of 244 measured crystals.
>>
>> The traditional R factor increases with the multiplicity of a dataset
>> and is considered inadequate as a resolution metric.
>> This is pointed out in Diederichs & Karplus, Nat. Struct. Biol., 1997
>> for the case of a related metric R(merge) in macromolecular
>> crystallography.
>>
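
The multiplicity dependence described above can be reproduced with a small simulation. This is a minimal sketch on synthetic data, not the processing applied to the deposited datasets: the intensity range, the 30% relative error and the function names are all assumptions. R_meas is the multiplicity-corrected variant proposed in the cited Diederichs & Karplus (1997) paper, in which each reflection's deviation sum is scaled by sqrt(n/(n-1)).

```python
import math
import random


def merging_r_factors(observations):
    """Return (R_merge, R_meas) for a list of per-reflection intensity lists."""
    num_merge = num_meas = denom = 0.0
    for obs in observations:
        n = len(obs)
        if n < 2:
            continue  # a single observation has no spread to measure
        mean = sum(obs) / n
        dev = sum(abs(i - mean) for i in obs)
        num_merge += dev
        num_meas += math.sqrt(n / (n - 1)) * dev  # multiplicity correction
        denom += sum(obs)
    return num_merge / denom, num_meas / denom


def simulate(multiplicity, n_reflections=2000, rel_error=0.3, seed=0):
    """Simulate equivalent observations with a fixed relative Gaussian error."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_reflections):
        true_i = rng.uniform(100.0, 1000.0)  # hypothetical true intensity
        data.append(
            [rng.gauss(true_i, rel_error * true_i) for _ in range(multiplicity)]
        )
    return merging_r_factors(data)


for m in (2, 4, 16, 64):
    r_merge, r_meas = simulate(m)
    print(f"multiplicity {m:3d}: R_merge = {r_merge:.3f}, R_meas = {r_meas:.3f}")
```

Although the per-observation error is identical in every run, R_merge rises as the multiplicity grows, while R_meas stays roughly constant. The merged intensities themselves become more precise with more observations, which is why a rising R_merge alone does not indicate worse data.
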
>> Similarly, unetI/netI is not a great metric, because it takes
>> the absolute value of intensities. Weak reflections can
>> have negative observed intensities (due to background subtraction),
>> so taking absolute values is not adequate.
>> In advanced data processing methods (common in macromolecular
>> crystallography), information in negative-intensity reflections can
>> still be utilized via a maximum-likelihood intensity-based target or
>> French-Wilson scaling. Thus, we didn't remove such reflections,
>> but this led to worse statistics.
>>
>> > The _refine_ls_R_factor_gt and _refine_ls_wR_factor_ref, however, are
>> > somewhat high, even for electron diffraction studies.
>>
>> We applied kinematical refinement, not dynamical refinement.
>> Many variants of electron diffraction exist (precession electron
>> diffraction, convergent-beam electron diffraction, etc.) and
>> they suffer from dynamical diffraction to varying extents.
>> Among MicroED studies, we don't consider our statistics to be
>> significantly worse than others.
>>
>> Another reason for seemingly worse metrics is that we included weaker
>> reflections. Excluding noisy high resolution structure factors does not
>> improve the structure accuracy, provided that each reflection
>> is weighted properly. Removal can degrade the refined structure
>> because valuable information, however noisy, is excluded.
>>
>> This is a modern view of crystallographic structure refinement,
>> initially introduced in macromolecular crystallography.
>> Please see Diederichs and Karplus "Better models by discarding data?"
>> Acta Crystallographica Section D 69.7 (2013): 1215-1222,
>> which says:
>> "even though discarding the weaker data leads to improvements in
>> the merging R values, the refined models based on these data are
>> of lower quality."
>>
>> Thus, we used higher resolution reflections for refinement
>> than traditional small-molecule crystallographers would use.
>> In addition, the deposition includes high resolution reflections not
>> used in refinement, in the hope that future algorithm developments might
>> allow extraction of more information from them.
>>
>> > I did not use the CheckCIF [1] tool to avoid breaking confidentiality
>> > of your structure, but it would be interesting if you could get a
>> > CheckCIF / PLATON report on your structure. Are there any Level A
>> > alerts?
>>
>> The level A alert was about high R(int), which is caused by
>> high multiplicity and dynamical effects.
>>
>> Thank you very much for your feedback.
>> I hope this explanation helps.
>>
>> Best regards,
>>
>> Takanori Nakane
>>
> 
