From toshiyuki.sasaki at spring8.or.jp  Fri May 12 03:14:32 2023
From: toshiyuki.sasaki at spring8.or.jp (Toshiyuki Sasaki)
Date: Fri, 12 May 2023 09:14:32 +0900
Subject: [Cod-bugs] Data registration
Message-ID: <00e601d98466$b9f40d10$2ddc2730$@spring8.or.jp>

Dear person in charge,

I'm Toshiyuki Sasaki, a researcher at the Japan Synchrotron Radiation
Research Institute.

Before publishing a paper, I want to upload CIF files to COD.

However, because of warnings about "data item '_refine_ls_R_factor_gt'
and '_refine_ls_wR_factor_ref'", I could not deposit the files.

What can I do to resolve this problem?

The paper is almost accepted with the CIF files.

Sincerely yours,

Toshiyuki

**************************************************************
Japan Synchrotron Radiation Research Institute (JASRI)
Diffraction and Scattering Division
Tenure-track researcher
Dr. Toshiyuki Sasaki
1-1-1, Kouto, Sayo-cho, Sayo-gun, Hyogo 679-5198 Japan
**************************************************************

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From grazulis at ibt.lt  Sat May 13 13:18:27 2023
From: grazulis at ibt.lt (Saulius Gražulis)
Date: Sat, 13 May 2023 13:18:27 +0300
Subject: [Cod-bugs] Data registration
In-Reply-To: <00e601d98466$b9f40d10$2ddc2730$@spring8.or.jp>
References: <00e601d98466$b9f40d10$2ddc2730$@spring8.or.jp>
Message-ID:

Dear Dr. Sasaki,

thank you very much for your inquiry and for entrusting your data to
the Crystallography Open Database!

Below, I give suggestions on how to solve your data deposition problem
and some thoughts on the parameters of the CIF.

On 2023-05-12 03:14, Toshiyuki Sasaki wrote:
>
> I'm Toshiyuki Sasaki, a researcher in Japan Synchrotron Radiation
> Research Institute.
>
> Before publishing a paper, I want to upload CIF files to COD.
>
> However, due to the warnings of "data item '_refine_ls_R_factor_gt'
> and '_refine_ls_wR_factor_ref'", I could not deposit the files.
>
> What can I do for the problem?
>
> The paper is almost accepted with the CIF files.
>

May I suggest three options for your deposition:

a) you temporarily remove the '_refine_ls_R_factor_gt' and
'_refine_ls_wR_factor_ref' data items from the CIF (ideally using the
"Edit" button on the COD deposition Web site) and commit. The system
should then allow the deposition. After that, you send us the two
removed lines in a separate e-mail, along with the COD ID that your
structure has been assigned, and we will reinsert the data into your
deposited structure as part of the data curation process.

b) you send us the original CIF, together with the rest of the metadata
(the complete author list, on-hold period, intended journal, etc.), and
we deposit it for you directly into the low-level database and send you
the assigned COD ID.

c) if your uploaded file is "OLN-SUCA.cif", as I see from the COD
server logs, you can send us a checksum (MD5, SHA1 or SHA256) of your
original uploaded file, together with the metadata (the complete author
list, on-hold period, intended journal, etc.), and confirm that you
uploaded this file for deposition; we will then extract the file from
the COD server logs and finish the deposition for you, sending the COD
ID ASAP.

Of course, we assure you that your data will remain confidential until
the release date that you specify, or until the publication of your
paper, whichever comes first.

Route (a) would be the fastest way to obtain the COD ID and to proceed
with your publication, but it will require you to edit the file,
preferably using the COD deposition Web site. In case of (b) or (c),
we'll do our best to deposit the structure as soon as possible, but it
may take a working day or two to process.
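
For option (c), the checksums can be computed locally before writing
to the list. The following is a minimal sketch, assuming Python 3 is
available; the helper name file_checksums is hypothetical and only
illustrates one way to obtain MD5, SHA1 and SHA256 digests with the
standard hashlib module. It is not a tool provided by the COD.

```python
import hashlib
import sys
from pathlib import Path


def file_checksums(path, algorithms=("md5", "sha1", "sha256"), chunk_size=65536):
    """Return {algorithm: hex digest} for the file at `path`.

    The file is read in chunks so that large files do not have to
    fit into memory at once.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


if __name__ == "__main__":
    # Usage (file name taken from the message above):
    #   python checksums.py OLN-SUCA.cif
    for cif in sys.argv[1:]:
        for name, digest in file_checksums(cif).items():
            print(f"{name}  {digest}  {cif}")
```

Any one of the three digests is enough to identify the file; the
checksum must be taken from the exact file that was uploaded, since
even a whitespace change alters it.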

Please let us know which way you would prefer to proceed.

I have several questions regarding your structure. To troubleshoot your
problem, I have taken the liberty of looking into your raw deposited
files on the COD server. It is a very interesting case of electron
diffraction, as I understand from the data. The structure seems very
reasonable, with bond lengths and angles (as far as I can judge from
visual inspection on the computer graphics) comfortably within expected
limits, and thermal ellipsoids close to isotropic and with no
unexpected features.

The _refine_ls_R_factor_gt and _refine_ls_wR_factor_ref, however, are
somewhat high, even for electron diffraction studies. Moreover, I have
noticed that the _exptl_absorpt_coefficient_mu is zero, and the
remaining absorption correction parameters are not specified. I see
that very few of the electron diffraction studies reported in the COD
use absorption correction, but if it is technically possible to apply
it, maybe the final refinement R-factors would get lower?

I also see that there are large values reported for the symmetry
equivalent reflection agreement:

> _diffrn_reflns_av_R_equivalents    0.8897
> _diffrn_reflns_av_unetI/netI       0.3546

The COD min / avg / max values are 0.0661 / 0.218177 (sample σ = 0.099)
/ 0.4322 for _diffrn_reflns_av_R_equivalents and 0.0007 / 0.141944
(sample σ = 0.120) / 0.5071 for _diffrn_reflns_av_unetI/netI, computed
over the 35 structures explicitly reported as electron diffraction
studies by setting the 'radiation' column to 'electron'. The values in
your file thus seem quite high compared to what we see in the COD; the
_diffrn_reflns_av_R_equivalents is more than 5σ above the average.
Could it be that applying absorption correction would decrease these
statistics as well?

I did not use the CheckCIF [1] tool to avoid breaking confidentiality
of your structure, but it would be interesting if you could get a
CheckCIF / PLATON report on your structure. Are there any Level A
alerts?
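
The comparison with the COD electron diffraction statistics above can
be checked with a few lines of arithmetic. This is a minimal sketch
using only the numbers quoted in this message; the function name
sigmas_from_mean and the dictionary layout are hypothetical.

```python
def sigmas_from_mean(value, mean, sigma):
    """Distance of `value` from `mean`, in units of the sample sigma."""
    return (value - mean) / sigma


# value: from the deposited CIF; mean and sigma: COD statistics for the
# 35 structures flagged as electron diffraction studies (quoted above).
cod_stats = {
    "_diffrn_reflns_av_R_equivalents": {
        "value": 0.8897, "mean": 0.218177, "sigma": 0.099,
    },
    "_diffrn_reflns_av_unetI/netI": {
        "value": 0.3546, "mean": 0.141944, "sigma": 0.120,
    },
}

for name, s in cod_stats.items():
    z = sigmas_from_mean(s["value"], s["mean"], s["sigma"])
    print(f"{name}: {z:.1f} sigma above the COD mean")
```

With these inputs, _diffrn_reflns_av_R_equivalents lies about 6.8σ
above the mean, while _diffrn_reflns_av_unetI/netI lies about 1.8σ
above it, so only the former exceeds the 5σ mark.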

Sincerely yours,
Saulius

[1] CheckCIF, a service of the IUCr (2023). URL:
https://checkcif.iucr.org/ [accessed 2023-05-13T13:15+03:00]

-- 
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Saulėtekio al. 7
LT-10257 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2234367 / phone (office): (+370-5)-2234353
mobile: (+370-684)-49802, (+370-614)-36366

From tnakane.protein at osaka-u.ac.jp  Sun May 14 07:57:35 2023
From: tnakane.protein at osaka-u.ac.jp (Takanori Nakane)
Date: Sun, 14 May 2023 13:57:35 +0900
Subject: [Cod-bugs] Data registration
Message-ID:

Dear Dr. Saulius Gražulis,

I am in charge of the MicroED data processing for the structure that
Dr. Toshiyuki Sasaki is trying to deposit. I am writing to you about
the data quality concerns you mentioned. Toshiyuki will write to you
separately on how to proceed with the deposition.

> Moreover, I have noticed that the _exptl_absorpt_coefficient_mu is zero,
> and the remaining absorption correction parameters are not specified. I
> see that very few of the electron diffraction studies reported in the
> COD use absorption correction, but if it is technically possible to
> apply it, maybe the final refinement R-factors will get lower?

Unlikely. The high R factors (merging and refinement) are due to other
reasons (see below).

In electron diffraction at 200 kV, absorption effects are negligible
for light elements. Effects of inelastic scattering can be treated by
absorption scaling, but this is a very crude, ad hoc approximation and
does not have physical meaning.
In dials.scale, which we used for scaling, the scaling factors are
empirically modeled as a smooth function of resolution, rotation angle
and position on the detector. This approach is different from
physics-based modeling based on the crystal composition (mu).

> I also see that there are large values reported for the symmetry
> equivalent reflection agreement:
>
> > _diffrn_reflns_av_R_equivalents    0.8897
> > _diffrn_reflns_av_unetI/netI       0.3546
>
> The COD min / avg / max values are 0.0661 / 0.218177 (sample σ = 0.099)
> / 0.4322 for _diffrn_reflns_av_R_equivalents and 0.0007 / 0.141944
> (sample σ = 0.120) / 0.5071 for _diffrn_reflns_av_unetI/netI; thus the
> values in your file seem quite high compared to what we see in the COD
> (for the 35 structures explicitly reported as electron diffraction
> studies by setting 'radiation' column to 'electron'), the
> _diffrn_reflns_av_R_equivalents is beyond 5*σ. Could it be that
> applying absorption correction would decrease these statistics as well?

While traditional small-molecule crystallography collects a dataset
from one or a few crystals, we take a massively high multiplicity
approach. The OLN-SUCA dataset was obtained from 34 crystals out of 244
measured crystals. The traditional R factor increases with the
multiplicity of a dataset and is considered inadequate as a resolution
metric. This is pointed out in Diederichs & Karplus, Nat. Struct.
Biol., 1997 for the case of a related metric, R(merge), in
macromolecular crystallography.

Similarly, unetI/netI is not a great metric, because it takes the
absolute value of intensities. Weak reflections can have negative
observed intensities (due to background subtraction), so taking
absolute values is not adequate. In advanced data processing methods
(common in macromolecular crystallography), information in negative
reflections can still be utilized via a maximum-likelihood
intensity-based target or French-Wilson scaling.
Thus, we did not remove such reflections, but this led to worse
statistics.

> The _refine_ls_R_factor_gt and _refine_ls_wR_factor_ref, however, are
> somewhat high, even for electron diffraction studies.

We applied kinematical refinement, not dynamical refinement. Many
variants of electron diffraction exist (precession electron
diffraction, convergent-beam electron diffraction, etc.) and they
suffer from dynamical diffraction to varying extents. Among MicroED
studies, we do not consider our statistics to be significantly worse
than others.

Another reason for the seemingly worse metrics is that we included
weaker reflections. Excluding noisy high resolution structure factors
does not improve the structure accuracy, provided that each reflection
is weighted properly. Removal can degrade the refined structure because
valuable information, however noisy, is excluded. This is a modern view
of crystallographic structure refinement, initially introduced in
macromolecular crystallography. Please see Diederichs and Karplus,
"Better models by discarding data?", Acta Crystallographica Section D
69.7 (2013): 1215-1222, which says: "even though discarding the weaker
data leads to improvements in the merging R values, the refined models
based on these data are of lower quality."

Thus, we used higher resolution reflections for refinement than
traditional small-molecule crystallographers would use. In addition,
the deposition includes high resolution reflections not used in
refinement, in the hope that future algorithm developments might allow
extraction of more information from them.

> I did not use the CheckCIF [1] tool to avoid breaking confidentiality
> of your structure, but it would be interesting if you could get a
> CheckCIF / PLATON report on your structure. Are there any Level A
> alerts?

The Level A alert was about high R(int), which is caused by high
multiplicity and dynamical effects.

Thank you very much for your feedback. I hope this explanation helps.
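
The multiplicity argument above can be illustrated with a small
simulation. This is only a sketch under stated assumptions, not the
processing used for the OLN-SUCA data: every observation is given
independent Gaussian noise proportional to the true intensity, and the
function name simulate and all parameter values are hypothetical. It
compares R(merge) with the multiplicity-corrected R(meas) proposed in
the 1997 Diederichs & Karplus paper, and also reports how far the
merged intensities are from the true ones.

```python
import math
import random


def simulate(multiplicity, n_reflections=2000, rel_noise=0.3, seed=0):
    """Return (R_merge, R_meas, mean absolute error of merged intensities).

    Assumption: each observation is the true intensity plus Gaussian
    noise with sigma = rel_noise * true intensity.
    """
    rng = random.Random(seed)
    num_merge = num_meas = denom = abs_err = 0.0
    for _ in range(n_reflections):
        true_i = rng.uniform(50.0, 150.0)
        sigma = rel_noise * true_i
        obs = [rng.gauss(true_i, sigma) for _ in range(multiplicity)]
        mean_i = sum(obs) / multiplicity
        dev = sum(abs(o - mean_i) for o in obs)
        num_merge += dev
        # R_meas rescales each reflection's deviations by sqrt(n / (n - 1)).
        num_meas += math.sqrt(multiplicity / (multiplicity - 1)) * dev
        denom += sum(obs)
        abs_err += abs(mean_i - true_i)
    return num_merge / denom, num_meas / denom, abs_err / n_reflections


for n in (2, 4, 10, 34):
    r_merge, r_meas, err = simulate(n)
    print(f"n={n:2d}  R_merge={r_merge:.3f}  R_meas={r_meas:.3f}  "
          f"mean |<I> - I_true|={err:.2f}")
```

In this toy model R(merge) rises with multiplicity while the merged
intensities become more accurate, and R(meas) stays roughly constant.
Real data also contain systematic differences between crystals and
dynamical effects, which this sketch does not model.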

Best regards,

Takanori Nakane