[Cod-bugs] Data registration

Takanori Nakane tnakane.protein at osaka-u.ac.jp
Sun May 14 07:57:35 EEST 2023


Dear Dr. Saulius Gražulis,

I am in charge of MicroED data processing of the structure
Dr. Toshiyuki Sasaki is trying to deposit.

I am writing to you about the data quality concerns you mentioned.
Toshiyuki will write to you separately about how to proceed with the
deposition.

 > Moreover, I have noticed that the _exptl_absorpt_coefficient_mu is zero,
 > and the remaining absorption correction parameters are not specified. I
 > see that very few of the electron diffraction studies reported in the
 > COD use absorption correction, but if it is technically possible to
 > apply it, maybe the final refinement R-factors will get lower?

Unlikely. The high R factors (merging and refinement) are due to
other reasons (see below).

In electron diffraction at 200 kV, absorption effects are negligible
for light elements. Effects of inelastic scattering can be
treated by absorption scaling, but this is a very crude, ad hoc
approximation and does not have physical meaning. In dials.scale,
which we used for scaling, the scaling factors are empirically modeled
as a smooth function of resolution, rotation angle and position on the
detector. This approach is different from physics-based modeling
based on the crystal composition (mu).

 > I also see that there are large values reported for the symmetry
 > equivalent reflection agreement:
 >
 >     _diffrn_reflns_av_R_equivalents    0.8897
 >     _diffrn_reflns_av_unetI/netI       0.3546
 >
 > The COD min / avg / max values are 0.0661 / 0.218177 (sample σ = 0.099)
 > / 0.4322 for _diffrn_reflns_av_R_equivalents and 0.0007 / 0.141944
 > (sample σ = 0.120) / 0.5071 for _diffrn_reflns_av_unetI/netI; thus the
 > values in your file seem quite high compared to what we see in the COD
 > (for the 35 structures explicitly reported as electron diffraction
 > studies by setting 'radiation' column to 'electron'), the
 > _diffrn_reflns_av_R_equivalents is beyond 5*σ. Could it be that
 > applying absorption correction would decrease these statistics as well?

While traditional small-molecule crystallography collects
a dataset from one or a few crystals, we take a massively
high-multiplicity approach. The OLN-SUCA dataset was obtained from
34 of the 244 crystals measured.

The traditional R factor increases with the multiplicity of a dataset
and is considered inadequate as a resolution metric.
This was pointed out by Diederichs & Karplus, Nat. Struct. Biol., 1997,
for the related metric R(merge) in macromolecular
crystallography.
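
The multiplicity dependence can be illustrated with a small simulation.
This is only a sketch on synthetic data, not our actual processing: the
intensity range, the noise level, and the function name are all
assumptions made for the illustration. It compares R(merge) with the
multiplicity-corrected R(meas), which applies a sqrt(n/(n-1)) factor to
each reflection.

```python
import math
import random

def merging_r_factors(multiplicity, n_reflections=2000, sigma=20.0, seed=0):
    """Return (R_merge, R_meas) for simulated reflections with Gaussian noise.

    All numbers here are hypothetical; they are not taken from any real dataset.
    """
    rng = random.Random(seed)
    num_merge = 0.0
    num_meas = 0.0
    denom = 0.0
    for _ in range(n_reflections):
        true_i = rng.uniform(50.0, 150.0)  # assumed "true" intensity
        obs = [rng.gauss(true_i, sigma) for _ in range(multiplicity)]
        mean_i = sum(obs) / multiplicity
        abs_dev = sum(abs(i - mean_i) for i in obs)
        num_merge += abs_dev
        # R_meas rescales each reflection's deviations by sqrt(n / (n - 1))
        num_meas += math.sqrt(multiplicity / (multiplicity - 1)) * abs_dev
        denom += sum(obs)
    return num_merge / denom, num_meas / denom

for n in (2, 5, 20, 50):
    r_merge, r_meas = merging_r_factors(n)
    print(f"multiplicity {n:3d}: R_merge = {r_merge:.3f}, R_meas = {r_meas:.3f}")
```

With the noise level held constant, R(merge) in this toy model rises as
the multiplicity grows, while R(meas) stays roughly flat, even though
the merged intensities become more precise.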

Similarly, unetI/netI is not a great metric, because it takes
the absolute value of intensities. Weak reflections can
have negative observed intensities (due to background subtraction),
so taking absolute values is not adequate.
In advanced data processing methods (common in macromolecular
crystallography), information in negative reflections can still
be utilized via a maximum-likelihood intensity-based target or
French-Wilson scaling. Thus, we did not remove such reflections,
although this led to worse statistics.
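
To make the effect on unetI/netI concrete, here is a minimal sketch
with made-up numbers (none of them come from the OLN-SUCA data). It
assumes the metric is computed as sum|u(net I)| / sum|net I| over all
measured reflections, which is my reading of the CIF core dictionary
definition; the function name is hypothetical.

```python
def unet_i_over_net_i(intensities, uncertainties):
    """Compute sum|u(net I)| / sum|net I| over all given reflections."""
    return sum(abs(u) for u in uncertainties) / sum(abs(i) for i in intensities)

# Hypothetical strong reflections and their standard uncertainties.
strong_i = [100.0, 80.0, 120.0]
strong_u = [10.0, 9.0, 11.0]

# Hypothetical weak reflections; some are negative after background subtraction.
weak_i = [3.0, -2.0, 1.0, -4.0]
weak_u = [5.0, 5.0, 5.0, 5.0]

strong_only = unet_i_over_net_i(strong_i, strong_u)
with_weak = unet_i_over_net_i(strong_i + weak_i, strong_u + weak_u)

print(f"strong reflections only: {strong_only:.3f}")
print(f"weak reflections kept:   {with_weak:.3f}")
```

Keeping the weak reflections adds their full uncertainties to the
numerator but very little to the denominator, so the ratio goes up
although no measurement has become worse.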

 > The _refine_ls_R_factor_gt and _refine_ls_wR_factor_ref, however, are
 > somewhat high, even for electron diffraction studies.

We applied kinematical refinement, not dynamical refinement.
Many variants of electron diffraction exist (precession electron
diffraction, convergent-beam electron diffraction, etc.) and
they suffer from dynamical diffraction to varying extents.
Among MicroED studies, we do not consider our statistics to be
significantly worse than others.

Another reason for seemingly worse metrics is that we included weaker
reflections. Excluding noisy high resolution structure factors does not
improve the structure accuracy, provided that each reflection
is weighted properly. Removal can degrade the refined structure
because valuable information, however noisy, is excluded.

This is a modern view of crystallographic structure refinement,
initially introduced in macromolecular crystallography.
Please see Diederichs and Karplus "Better models by discarding data?"
Acta Crystallographica Section D 69.7 (2013): 1215-1222,
which says:
"even though discarding the weaker data leads to improvements in
the merging R values, the refined models based on these data are
of lower quality."

Thus, we used higher-resolution reflections for refinement
than traditional small-molecule crystallographers would use.
In addition, the deposition includes high-resolution reflections not
used in refinement, in the hope that future algorithm developments might
allow extraction of more information from them.
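
The weighting argument above can be illustrated with a toy calculation.
It is only an analogy, not a model of crystallographic refinement: it
estimates a single quantity from independent observations using
inverse-variance weights, and all the numbers and the function name are
assumptions chosen for the illustration.

```python
def variance_of_weighted_mean(sigmas):
    """Variance of the inverse-variance weighted mean of independent observations."""
    return 1.0 / sum(1.0 / (s * s) for s in sigmas)

# Hypothetical standard deviations: 10 precise and 40 noisy observations.
precise = [1.0] * 10
noisy = [5.0] * 40

var_precise_only = variance_of_weighted_mean(precise)
var_all = variance_of_weighted_mean(precise + noisy)

print(f"precise observations only: variance = {var_precise_only:.4f}")
print(f"noisy observations kept:   variance = {var_all:.4f}")
```

In this simplified setting, every properly weighted observation adds a
positive term to the sum of weights, so keeping the noisy ones can only
lower the variance of the estimate.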

 > I did not use the CheckCIF [1] tool to avoid breaking confidentiality
 > of your structure, but it would be interesting if you could get a
 > CheckCIF / PLATON report on your structure. Are there any Level A
 > alerts?

The level A alert was about high R(int), which is caused by
high multiplicity and dynamical effects.

Thank you very much for your feedback.
I hope this explanation helps.

Best regards,

Takanori Nakane
