From toshiyuki.sasaki at spring8.or.jp Mon May 15 05:00:02 2023
From: toshiyuki.sasaki at spring8.or.jp (Toshiyuki Sasaki)
Date: Mon, 15 May 2023 11:00:02 +0900
Subject: [Cod-bugs] Data registration
In-Reply-To:
References:
Message-ID: <008001d986d0$f672f9b0$e358ed10$@spring8.or.jp>

Dear Dr. Saulius Gražulis,

I uploaded the CIF files by the (a) route.
Here are the IDs and the removed lines.

3000440, _refine_ls_R_factor_gt 0.1565, _refine_ls_wR_factor_ref 0.3685
3000442, _refine_ls_wR_factor_ref 0.4200

Thank you for your help.
Sincerely yours,

Toshiyuki Sasaki

-----Original Message-----
From: Takanori Nakane [mailto:tnakane.protein at osaka-u.ac.jp]
Sent: Sunday, May 14, 2023 1:58 PM
To: grazulis at ibt.lt; Toshiyuki Sasaki
Cc: cod-bugs at ibt.lt; genji.kurisu.protein at osaka-u.ac.jp; kawamoto at protein.osaka-u.ac.jp; 'Ranjit Thakuria' ; 'Diptajyoti Gogoi' ; ????
Subject: Re: [Cod-bugs] Data registration

Dear Dr. Saulius Gražulis,

I am in charge of MicroED data processing of the structure
Dr. Toshiyuki Sasaki is trying to deposit.

I am writing to you about the data quality concerns you mentioned.
Toshiyuki will write to you separately on how to proceed with the deposition.

> Moreover, I have noticed that the _exptl_absorpt_coefficient_mu is zero,
> and the remaining absorption correction parameters are not specified. I
> see that very few of the electron diffraction studies reported in the
> COD use absorption correction, but if it is technically possible to
> apply it, maybe the final refinement R-factors will get lower?

Unlikely. The high R factors (merging and refinement) are due to
other reasons (see below).

In electron diffraction at 200 kV, absorption effects are negligible
for light elements. Effects of inelastic scattering can be
treated by absorption scaling, but this is a very crude, ad hoc
approximation and does not have physical meaning.
In dials.scale, which we used for scaling, the scaling factors are
empirically modeled as a smooth function of resolution, rotation angle
and position on the detector. This approach is different from
physics-based modeling based on the crystal composition (mu).

> I also see that there are large values reported for the symmetry
> equivalent reflection agreement:
>
>     _diffrn_reflns_av_R_equivalents    0.8897
>     _diffrn_reflns_av_unetI/netI       0.3546
>
> The COD min / avg / max values are 0.0661 / 0.218177 (sample σ = 0.099)
> / 0.4322 for _diffrn_reflns_av_R_equivalents and 0.0007 / 0.141944
> (sample σ = 0.120) / 0.5071 for _diffrn_reflns_av_unetI/netI; thus the
> values in your file seem quite high compared to what we see in the COD
> (for the 35 structures explicitly reported as electron diffraction
> studies by setting 'radiation' column to 'electron'), the
> _diffrn_reflns_av_R_equivalents is beyond 5*σ. Could it be that
> applying absorption correction would decrease these statistics as well?

While traditional small-molecule crystallography collects
a dataset from one or a few crystal(s), we take a massively
high multiplicity approach. The OLN-SUCA dataset resulted from
34 crystals out of 244 measured crystals.

The traditional R factor increases with the multiplicity of a dataset
and is considered inadequate as a resolution metric.
This is pointed out in Diederichs & Karplus, Nat. Struct. Biol., 1997
for the case of a related metric R(merge) in macromolecular
crystallography.

Similarly, unetI/netI is not a great metric, because it takes
the absolute value of intensities. Weak reflections can
have negative observed intensities (due to background subtraction),
so taking absolute values is not adequate.
In advanced data processing methods (common in macromolecular
crystallography), information in negative reflections can still
be utilized via a maximum-likelihood intensity-based target or
French-Wilson scaling.
Thus, we didn't remove such reflections, but this led to worse statistics.

> The _refine_ls_R_factor_gt and _refine_ls_wR_factor_ref, however, are
> somewhat high, even for electron diffraction studies.

We applied kinematical refinement, not dynamical refinement.
Many variants of electron diffraction exist (precession electron
diffraction, convergent-beam electron diffraction, etc.) and
they suffer from dynamical diffraction to varying extents.
Among MicroED studies, we don't consider our statistics to be
significantly worse than others.

Another reason for seemingly worse metrics is that we included weaker
reflections. Excluding noisy high resolution structure factors does not
improve the structure accuracy, provided that each reflection
is weighted properly. Removal can degrade the refined structure
because valuable information, however noisy, is excluded.

This is a modern view of crystallographic structure refinement,
initially introduced in macromolecular crystallography.
Please see Diederichs and Karplus "Better models by discarding data?"
Acta Crystallographica Section D 69.7 (2013): 1215-1222,
which says:
"even though discarding the weaker data leads to improvements in
the merging R values, the refined models based on these data are
of lower quality."

Thus, we used higher resolution reflections for refinement
than traditional small-molecule crystallographers would use.
In addition, the deposition includes high resolution reflections not
used in refinement, in the hope that future algorithm developments might
allow extraction of more information from them.

> I did not use the CheckCIF [1] tool to avoid breaking confidentiality
> of your structure, but it would be interesting if you could get a
> CheckCIF / PLATON report on your structure. Are there any Level A
> alerts?

The level A alert was about high R(int), which is caused by
high multiplicity and dynamical effects.

Thank you very much for your feedback.
I hope this explanation helps.

Best regards,

Takanori Nakane

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From grazulis at ibt.lt Mon May 15 18:51:30 2023
From: grazulis at ibt.lt (Saulius Gražulis)
Date: Mon, 15 May 2023 18:51:30 +0300
Subject: [Cod-bugs] Data registration
In-Reply-To: <008001d986d0$f672f9b0$e358ed10$@spring8.or.jp>
References: <008001d986d0$f672f9b0$e358ed10$@spring8.or.jp>
Message-ID: <28faae81-f4c5-bc02-432e-d4999ee83898@ibt.lt>

Dear Dr. Toshiyuki Sasaki,

thank you for your update. I can confirm that your data have been
successfully deposited to the COD.

On 2023-05-15 05:00, Toshiyuki Sasaki wrote:
> I uploaded the CIF files by the (a) route.
> Here are the ID and removed lines.
>
> 3000440, _refine_ls_R_factor_gt 0.1565, _refine_ls_wR_factor_ref 0.3685
> 3000442, _refine_ls_wR_factor_ref 0.4200
>
> Thank you for your help.
> Sincerely yours,
>
> Toshiyuki Sasaki

I have reinserted, as promised, the parameters that you sent in your
e-mail into the COD on-hold entries 3000440 and 3000442.

When your paper is published, you can either release the COD on-hold
entries using the COD deposition and data management Web site, or you
can drop us an e-mail and we'll handle the release for you.

Looking forward to seeing your new publication; I am very interested in
your solution methods!

Sincerely yours,
Saulius

--
Dr. Saulius Gražulis
Vilnius University, Life Science Center, Institute of Biotechnology
Saulėtekio al. 7, LT-10257 Vilnius, Lietuva (Lithuania)
phone (office): (+370-5)-2234353, mobile: (+370-684)-49802, (+370-614)-36366

-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 665 bytes
Desc: OpenPGP digital signature
URL:

From grazulis at ibt.lt Mon May 15 19:05:49 2023
From: grazulis at ibt.lt (Saulius Gražulis)
Date: Mon, 15 May 2023 19:05:49 +0300
Subject: [Cod-bugs] Data registration
In-Reply-To:
References:
Message-ID: <42584438-c639-8c0a-05a8-fe9ba74a69a3@ibt.lt>

Dear Takanori Nakane,

thank you very much for the thorough explanation of your files! Indeed,
"classical" R-merge has limitations when it comes to data from multiple
crystals, or high-multiplicity data. I also agree on the inclusion of
low-intensity reflections; from what I know, this is the state of the
art with modern software.

I also see that your files contain the responses to the PLATON alerts -
I overlooked them at the very beginning. Sorry for that!

Regarding your note:

"The OLN-SUCA dataset resulted from 34 crystals out of 244 measured
crystals"

I think this is very valuable information and it would be beneficial if
it were provided in the corresponding CIFs that you have deposited. I
would suggest including this phrase in the additional data item,
_chemical_compound_source, which is a free string giving details of the
sample synthesis, identity and handling. It would then clarify the
statistics of these files.

Also, I suggest that the data item "_diffrn_radiation_probe electron" be
added to entries 3000438 and 3000440-3000442 that report electrons as
radiation type; in this way these structures will be clearly identified
as being solved by means of electron diffraction.

If you and your co-authors do not object, I will insert the
"_diffrn_radiation_probe" data element into the mentioned files.

Also, please let me know to which deposited data file the crystal
numbers pertain and into which deposited files I should insert the
account of the number of crystals used.

Sincerely yours,
Saulius

--
Dr. Saulius Gražulis
Vilnius University, Life Science Center, Institute of Biotechnology
Saulėtekio al. 7, LT-10257 Vilnius, Lietuva (Lithuania)
phone (office): (+370-5)-2234353, mobile: (+370-684)-49802, (+370-614)-36366

From tnakane.protein at osaka-u.ac.jp Tue May 16 04:36:54 2023
From: tnakane.protein at osaka-u.ac.jp (Takanori Nakane)
Date: Tue, 16 May 2023 10:36:54 +0900
Subject: [Cod-bugs] Data registration
In-Reply-To: <42584438-c639-8c0a-05a8-fe9ba74a69a3@ibt.lt>
References: <42584438-c639-8c0a-05a8-fe9ba74a69a3@ibt.lt>
Message-ID:

Dear Saulius,

Thank you very much for your suggestions.

> If you and your co-authors do not object, I will insert the
> "_diffrn_radiation_probe" data element into the mentioned files.

Yes, please.

> Also, please let me know to which deposited data file the crystal
> numbers pertain and into which deposited files should I insert the
> account on the number of used crystals.

The numbers of merged crystals are as follows:

3000438: OLN-OXA.cif   70 crystals
3000440: OLN-SUCA.cif  34 crystals
3000441: OLN.cif       28 crystals
3000442: SUCA.cif      41 crystals

If you are interested in this study, our preprint is
available at
https://chemrxiv.org/engage/chemrxiv/article-details/6438d40c08c86922fff1f519.

Unfortunately, the journal policy prohibits uploading of
the revised manuscript with more information.

Best regards,

Takanori Nakane
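
The multiplicity argument made in this thread can be illustrated numerically. The sketch below is not the authors' processing pipeline (they used dials.scale); it is a toy simulation under stated assumptions: Gaussian measurement noise, an invented relative error of 30 %, and a hypothetical helper name `simulate_r_factors`. It compares the classical R(merge) with the multiplicity-corrected R(meas) proposed in the Diederichs & Karplus (1997) paper cited above.

```python
import random
from math import sqrt


def simulate_r_factors(multiplicity, n_reflections=2000, rel_sigma=0.3, seed=0):
    """Return (R_merge, R_meas) for simulated data of a given multiplicity.

    Assumptions (illustrative only): every unique reflection is measured
    `multiplicity` times (>= 2) with Gaussian noise of relative width
    `rel_sigma`.
    """
    rng = random.Random(seed)
    num_merge = num_meas = denom = 0.0
    for _ in range(n_reflections):
        true_i = rng.uniform(50.0, 150.0)  # arbitrary "true" intensity
        obs = [rng.gauss(true_i, rel_sigma * true_i) for _ in range(multiplicity)]
        mean_i = sum(obs) / multiplicity
        abs_dev = sum(abs(i - mean_i) for i in obs)
        num_merge += abs_dev
        # R_meas rescales each reflection's term by sqrt(n / (n - 1)).
        num_meas += sqrt(multiplicity / (multiplicity - 1)) * abs_dev
        denom += sum(obs)
    return num_merge / denom, num_meas / denom


for n in (2, 5, 20, 50):
    r_merge, r_meas = simulate_r_factors(n)
    print(f"multiplicity {n:2d}: R_merge = {r_merge:.3f}, R_meas = {r_meas:.3f}")
```

With identical noise per observation, R(merge) should rise as the multiplicity grows while R(meas) stays roughly constant, which is why a high R(int) from a many-crystal dataset does not by itself indicate worse data. Note that multiplicity here means observations per unique reflection, not the number of merged crystals.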

From grazulis at ibt.lt Tue May 16 09:22:41 2023
From: grazulis at ibt.lt (Saulius Gražulis)
Date: Tue, 16 May 2023 09:22:41 +0300
Subject: [Cod-bugs] Data registration
In-Reply-To:
References: <42584438-c639-8c0a-05a8-fe9ba74a69a3@ibt.lt>
Message-ID: <11fb692c-bc8e-7ebe-2019-903d81e54b15@ibt.lt>

Dear Takanori,

thank you very much for your reply! I'll include the suggested data
items and your data in the deposited files as part of the data curation
process.

On 2023-05-16 04:36, Takanori Nakane wrote:
> Dear Saulius,
>
> Thank you very much for your suggestions.
>
> > If you and your co-authors do not object, I will insert the
> > "_diffrn_radiation_probe" data element into the mentioned files.
>
> Yes, please.

OK, I will.

> > Also, please let me know to which deposited data file the crystal
> > numbers pertain and into which deposited files should I insert the
> > account on the number of used crystals.
>
> The number of merged crystals are as follows:
>
> 3000438: OLN-OXA.cif 70 crystals
> 3000440: OLN-SUCA.cif 34 crystals
> 3000441: OLN.cif 28 crystals
> 3000442: SUCA.cif 41 crystals

Perfect, thanks a lot for this detailed information! I'll insert these
data items ASAP.

> If you are interested in this study, our preprint is
> available at
> https://chemrxiv.org/engage/chemrxiv/article-details/6438d40c08c86922fff1f519.

Very interesting, thanks for the preprint link, I'm looking into it.

> Unfortunately, the journal policy prohibits uploading of
> the revised manuscript with more information.

That's fine, the preprint contains all the necessary information, and I
hope my University will have access once your publication is out in the
journal.

Sincerely yours,
Saulius

--
Dr. Saulius Gražulis
Vilnius University, Life Science Center, Institute of Biotechnology
Saulėtekio al. 7, LT-10257 Vilnius, Lietuva (Lithuania)
phone (office): (+370-5)-2234353, mobile: (+370-684)-49802, (+370-614)-36366
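
The curation step agreed in this thread (adding "_diffrn_radiation_probe electron" and a "_chemical_compound_source" note to the deposited CIFs) could be sketched roughly as follows. This is only an illustration, not the COD's actual curation tooling: the helper name `add_data_items` is hypothetical, the function treats the CIF as plain text with a single data block rather than parsing it properly, and the sample CIF content is invented.

```python
def add_data_items(cif_text, items):
    """Append tag/value pairs to a single-data-block CIF given as text.

    Naive sketch: tags that already start a line are left untouched, and
    values containing whitespace are written as semicolon-delimited text
    fields. A real workflow should use a proper CIF parser.
    """
    lines = cif_text.rstrip("\n").split("\n")
    existing = {line.split()[0] for line in lines if line.startswith("_")}
    for tag, value in items.items():
        if tag in existing:
            continue  # do not duplicate a data item that is already present
        if any(ch.isspace() for ch in value):
            lines.extend([tag, ";", value, ";"])
        else:
            lines.append(f"{tag} {value}")
    return "\n".join(lines) + "\n"


# Invented minimal CIF fragment; the entry number and crystal counts are
# the ones quoted in the thread.
sample = "data_3000440\n_diffrn_radiation_type electron\n"
print(add_data_items(sample, {
    "_diffrn_radiation_probe": "electron",
    "_chemical_compound_source":
        "The OLN-SUCA dataset resulted from 34 crystals out of 244 measured crystals",
}))
```

Appending at the end of the block keeps the sketch short; it relies on the CIF rule that a new data name ends any preceding loop.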