[Cod-bugs] Data registration

Saulius Gražulis grazulis at ibt.lt
Mon May 15 19:05:49 EEST 2023


Dear Takanori Nakane,

thank you very much for the thorough explanation of your files! Indeed,
"classical" R-merge has limitations when it comes to data from multiple
crystals, or high-multiplicity data. I also agree on the inclusion of
low-intensity reflections; as far as I know, this is the state of the art
with modern software.
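
To illustrate the multiplicity dependence of R-merge, here is a minimal
simulation sketch. It is not taken from either data set: it assumes a
single true intensity with Gaussian noise, and the function name and all
parameter values are made up for the illustration. It also computes the
multiplicity-corrected variant commonly called R-meas, which scales each
reflection's deviations by sqrt(n/(n-1)).

```python
import math
import random


def r_factors(multiplicity, n_reflections=2000, true_i=100.0, sigma=20.0, seed=0):
    """Return (R_merge, R_meas) for simulated equivalent observations.

    Assumption: every reflection has the same true intensity and
    independent Gaussian noise; real data are far less uniform.
    """
    if multiplicity < 2:
        raise ValueError("multiplicity must be at least 2")
    rng = random.Random(seed)
    correction = math.sqrt(multiplicity / (multiplicity - 1))
    num_merge = 0.0
    num_meas = 0.0
    denom = 0.0
    for _ in range(n_reflections):
        obs = [rng.gauss(true_i, sigma) for _ in range(multiplicity)]
        mean = sum(obs) / multiplicity
        # Sum of absolute deviations from the mean of the equivalents
        dev = sum(abs(i - mean) for i in obs)
        num_merge += dev
        num_meas += correction * dev
        denom += sum(obs)
    return num_merge / denom, num_meas / denom


for n in (2, 5, 20, 50):
    r_merge, r_meas = r_factors(n)
    print(f"n = {n:2d}  R_merge = {r_merge:.3f}  R_meas = {r_meas:.3f}")
```

With identical noise per observation, R-merge in this toy model grows
with the number of equivalents while R-meas stays roughly constant, so a
high R-merge from many crystals need not indicate poorer measurements.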

I also see that your files contain the responses to the PLATON alerts – 
I have overlooked them at the very beginning. Sorry for that!

Regarding your note:

"The OLN-SUCA dataset resulted from 34 crystals out of 244 measured 
crystals"

I think this is very valuable information and it would be beneficial if 
it is provided in the corresponding CIFs that you have deposited. I 
would suggest including this phrase in the additional data item, 
_chemical_compound_source, which is a free string giving details of the 
sample synthesis, identity and handling. It would then clarify the 
statistics of these files.

Also, I suggest that the data item "_diffrn_radiation_probe electron" be
added to entries 3000438 and 3000440–3000442, which report electrons as the
radiation type; in this way these structures will be clearly identified
as having been solved by means of electron diffraction.

If you and your co-authors do not object, I will insert the 
"_diffrn_radiation_probe" data element into the mentioned files.
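
As a rough sketch of what the two suggested additions would look like in
a CIF, the snippet below appends the data items to a single-block CIF
text if they are not already present. The helper names and the
"data_example" block are hypothetical, the quoting rules are simplified,
and this is not the tooling actually used for the COD.

```python
def format_cif_item(tag, value):
    """Format one CIF data item, quoting the value when needed (simplified)."""
    needs_quotes = (
        not value
        or any(c.isspace() for c in value)
        or value[0] in "_#$'\";[]"
    )
    if not needs_quotes:
        return f"{tag} {value}"
    if "'" not in value and "\n" not in value:
        return f"{tag} '{value}'"
    # Fall back to a semicolon-delimited text field
    return f"{tag}\n;\n{value}\n;"


def add_item_if_missing(cif_text, tag, value):
    """Append the item unless the tag already starts a line.

    Assumption: the text holds a single data block.
    """
    for line in cif_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0].lower() == tag.lower():
            return cif_text
    return cif_text.rstrip("\n") + "\n" + format_cif_item(tag, value) + "\n"


# Hypothetical minimal CIF, used only for illustration
cif = "data_example\n"
cif = add_item_if_missing(cif, "_diffrn_radiation_probe", "electron")
cif = add_item_if_missing(
    cif,
    "_chemical_compound_source",
    "The OLN-SUCA dataset resulted from 34 crystals out of 244 measured crystals",
)
print(cif)
```

Calling add_item_if_missing a second time with the same tag leaves the
text unchanged, so applying it to files that already carry the item
should be harmless.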

Also, please let me know which deposited data files the crystal
numbers pertain to, and into which deposited files I should insert the
note on the number of crystals used.

Sincerely yours,
Saulius

On 2023-05-14 07:57, Takanori Nakane wrote:
> Dear Dr. Saulius Gražulis,
>
> I am in charge of MicroED data processing of the structure
> Dr. Toshiyuki Sasaki is trying to deposit.
>
> I am writing you about data quality concerns you mentioned.
> Toshiyuki will write to you separately on how to proceed with the deposition.
>
> > Moreover, I have noticed that the _exptl_absorpt_coefficient_mu is
> > zero,
> > and the remaining absorption correction parameters are not specified. I
> > see that very few of the electron diffraction studies reported in the
> > COD use absorption correction, but if it is technically possible to
> > apply it, maybe the final refinement R-factors will get lower?
>
> Unlikely. The high R factors (merging and refinement) are due to
> other reasons (see below).
>
> In electron diffraction at 200 kV, absorption effects are negligible
> for light elements. Effects of inelastic scattering can be
> treated by absorption scaling, but this is a very crude, ad hoc
> approximation and does not have physical meaning. In dials.scale,
> which we used for scaling, the scaling factors are empirically modeled
> as a smooth function of resolution, rotation angle and position on the
> detector. This approach is different from physics-based modeling
> based on the crystal composition (mu).
>
> > I also see that there are large values reported for the symmetry
> > equivalent reflection agreement:
> >
> >     _diffrn_reflns_av_R_equivalents    0.8897
> >     _diffrn_reflns_av_unetI/netI       0.3546
> >
> > The COD min / avg / max values are 0.0661 / 0.218177 (sample σ = 0.099)
> > / 0.4322 for _diffrn_reflns_av_R_equivalents and 0.0007 / 0.141944
> > (sample σ = 0.120) / 0.5071 for _diffrn_reflns_av_unetI/netI; thus the
> > values in your file seem quite high compared to what we see in the COD
> > (for the 35 structures explicitly reported as electron diffraction
> > studies by setting 'radiation' column to 'electron'), the
> > _diffrn_reflns_av_R_equivalents is beyond 5*σ. Could it be that
> > applying absorption correction would decrease these statistics as well?
>
> While traditional small molecular crystallography collects
> a dataset from one or a few crystal(s), we take a massively
> high multiplicity approach. The OLN-SUCA dataset resulted from
> 34 crystals out of 244 measured crystals.
>
> The traditional R factor increases with the multiplicity of a dataset
> and is considered inadequate as a resolution metric.
> This is pointed out in Diederichs & Karplus, Nat. Struct. Biol., 1997
> for the case of a related metric R(merge) in macromolecular
> crystallography.
>
> Similarly, unetI/netI is not a great metric, because it takes
> the absolute value of intensities. Weak reflections can
> have negative observed intensities (due to background subtraction),
> so taking absolute values is not adequate.
> In advanced data processing methods (common in macromolecular
> crystallography), information in negative reflections can still
> be utilized via maximum-likelihood intensity based target or
> French-Wilson scaling. Thus, we didn't remove such reflections
> but this led to worse statistics.
>
> > The _refine_ls_R_factor_gt and _refine_ls_wR_factor_ref, however, are
> > somewhat high, even for electron diffraction studies.
>
> We applied kinematical refinement, not dynamical refinement.
> Many variants of electron diffraction exist (precession electron
> diffraction, convergent-beam electron diffraction, etc) and
> they suffer from dynamical diffraction to varying extents.
> Among MicroED studies, we don't consider our statistics to be
> significantly worse than others.
>
> Another reason for seemingly worse metrics is that we included weaker
> reflections. Excluding noisy high resolution structure factors does not
> improve the structure accuracy, provided that each reflection
> is weighted properly. Removal can degrade the refined structure
> because valuable information, however noisy, is excluded.
>
> This is a modern view of crystallographic structure refinement,
> initially introduced in macromolecular crystallography.
> Please see Diederichs and Karplus "Better models by discarding data?"
> Acta Crystallographica Section D 69.7 (2013): 1215-1222,
> which says:
> "even though discarding the weaker data leads to improvements in
> the merging R values, the refined models based on these data are
> of lower quality."
>
> Thus, we used higher resolution reflections for refinement
> than traditional small molecular crystallographers would use.
> In addition, the deposition includes high resolution reflections not
> used in refinement, in the hope that future algorithm developments might
> allow extraction of more information from them.
>
> > I did not use the CheckCIF [1] tool to avoid breaking confidentiality
> > of your structure, but it would be interesting if you could get a
> > CheckCIF / PLATON report on your structure. Are there any Level A
> > alerts?
>
> The level A alert was about high R(int), which is caused by
> high multiplicity and dynamical effects.
>
> Thank you very much for your feedback.
> I hope this explanation helps.
>
> Best regards,
>
> Takanori Nakane
>

-- 
Dr. Saulius Gražulis
Vilnius University, Life Science Center, Institute of Biotechnology
Saulėtekio al. 7, LT-10257 Vilnius, Lietuva (Lithuania)
phone (office): (+370-5)-2234353, mobile: (+370-684)-49802, (+370-614)-36366
