From grazulis at ibt.lt Thu Feb 23 16:20:51 2023
From: grazulis at ibt.lt (Saulius Gražulis)
Date: Thu, 23 Feb 2023 16:20:51 +0200
Subject: [Cod-bugs] Cif files with incorrect set of group operators
In-Reply-To:
References:
Message-ID:

Dear Steef,

many great thanks for your updated CIFs! I'm sorry that I have not yet
answered your previous e-mail, which I will do in a while; for now I want
to discuss the final amendments to the CIFs so that we can commit your
corrections to the COD.

On 2023-02-22 10:07, Steef Boerrigter wrote:
> An iron-clad test to see if the operators are self-consistent is to
> apply each operator to each other operator which in each case should
> produce an existing operator.
> I ran this test on all the database entries and it failed in (only!) 15 cases.
> In most cases there are missing operators, which means that software
> that relies on the operators directly will have missing atoms in the
> unit cell.
> In other cases, it is clear that the authors attempted to manually edit
> the spacegroup symmetry operators, but made some mistake in doing so.
>
> I manually went through each case and fixed the issue in a way that
> requires the least amount of edits to make clear exactly where the
> mistake was made -- as opposed to just completely replacing the entire
> operator block.
>
> I have added a diff file of (today's) database entry with the fixed
> version of the cif. The diff file includes my analysis of what went
> wrong with the operators.
> I am also including the fixed cif files.

Thanks a lot for the fixes! They are extremely valuable! The corrections
are just fine and can be committed to the COD.

Would you wish/agree to sign these corrections with your name? The COD
data curation practice is to describe what changes were done to the
entry in the entry itself, and to attribute the authorship of the
changes to the person who suggested or made them. I attach one of the
files sent by you with the final comments before the commit.
Your name is mentioned at the '_cod_changelog_entry', id 2. If you agree
with mentioning your name, I will prepare the remaining files in the same
way and commit them to the COD.

You may (as per copyright law :) ) stay anonymous, but I think it is
very fair to give you credit for these fixes. I envisage the future
where data curation will count as scientific output in our universities
;) If you would like to include your ORCID, we can do that as well
(we'll have to prepare a new data name for that, but this should be quick).

Please let me know how you feel about the changes. Meanwhile I'll
annotate the remaining files, and at that point we are one 'Enter' press
away from getting them into the COD :).

Regards,
Saulius

--
Dr. Saulius Gražulis
Vilnius University, Life Science Center, Institute of Biotechnology
Saulėtekio al. 7, LT-10257 Vilnius, Lietuva (Lithuania)
phone (office): (+370-5)-2234353, mobile: (+370-684)-49802, (+370-614)-36366
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2102130.cif
Type: chemical/x-cif
Size: 10501 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 665 bytes
Desc: OpenPGP digital signature
URL:

From grazulis at ibt.lt Thu Feb 23 16:49:38 2023
From: grazulis at ibt.lt (Saulius Gražulis)
Date: Thu, 23 Feb 2023 16:49:38 +0200
Subject: [Cod-bugs] Cif files with incorrect set of group operators
In-Reply-To:
References:
Message-ID: <4a21f0ae-3913-1356-c249-0f57eb00813c@ibt.lt>

Hi, Steef,

On 2023-02-22 10:07, Steef Boerrigter wrote:
> I have added a diff file of (today's) database entry with the fixed
> version of the cif. The diff file includes my analysis of what went
> wrong with the operators.
> I am also including the fixed cif files.

There is one more gotcha with the change of symmetry operators. The
operator indices are used in the _geom_bond and _geom_angle tables.
When we change the symmetry operators, these _geom_ references are no
longer valid. This is one of the reasons why we try not to modify symops
in general.

But in the case of the 15 files that you have corrected, the initial
symmetry operators were wrong, for one reason or another. Therefore we
cannot rely on the references to these operators in the _geom_ tables,
since we do not know what the authors meant. For this reason I will
replace the '_geom_bond_site_symmetry_...' values and other similar
values with question marks ('?', without the quotes), to indicate that
these references are not known.

You do not need to do anything about this change (other than to make
sure that your software tolerates CIFs with '?' values ;) ); I am just
informing you that I will make additional changes to the files you have
sent.

Sincerely yours,
Saulius

--
Dr. Saulius Gražulis
Vilnius University, Life Science Center, Institute of Biotechnology
Saulėtekio al. 7, LT-10257 Vilnius, Lietuva (Lithuania)
phone (office): (+370-5)-2234353, mobile: (+370-684)-49802, (+370-614)-36366

From sxmboer at gmail.com Thu Feb 23 20:07:18 2023
From: sxmboer at gmail.com (Steef Boerrigter)
Date: Thu, 23 Feb 2023 13:07:18 -0500
Subject: [Cod-bugs] Cif files with incorrect set of group operators
In-Reply-To: <4a21f0ae-3913-1356-c249-0f57eb00813c@ibt.lt>
References: <4a21f0ae-3913-1356-c249-0f57eb00813c@ibt.lt>
Message-ID:

Saulius,

For the 15 cases I checked that the given space group symbols are
correct. I can send you the computer output from my program, if you are
interested to see it. I double-checked the structures visually in
Mercury to see that the fixed versions indeed look correct.

Speaking of gotchas... Something I learned recently is the issue of the
symmetry of ADPs (atomic displacement parameters).
ADPs can be given in different formats, and the different formats
basically project the ADP onto different sets of basis vectors. The
symmetry copies of an ADP constrain its degrees of freedom, and the
refined values reflect that. So making changes to the space group, for
instance changing the setting to the International Tables standard
setting, is not straightforward. When changing crystallographic
settings, all the ADPs will typically change accordingly. Also, the ADPs
given for the asymmetric unit are not necessarily the same for the
generated symmetry copies (they are only in the case of inversion
symmetry, I believe), so you also cannot generally move the coordinates
to more convenient-looking positions (for instance to have a single
molecule show as the asymmetric unit in JSmol, as opposed to atoms
scattered around in neighboring unit cells). Lattice transformations are
not straightforward, and I know of professional programs that do not do
them correctly. Making a change to a structure indeed causes a train of
other things to be changed accordingly to keep the crystal structure
consistent.

So, long story short, it makes perfect sense to try to keep changes to a
submitted CIF to a minimum.

Steef

P.S. Speaking of _geom entries, I usually don't take too much interest
in these numbers, because they are relatively easy to calculate. In a
way, they are derived data, so what is the point of including them in
the CIF anyway? Well, they do offer a great mechanism to see if a CIF
file is internally consistent. The _geom entries rely on a system of
referencing by atomic labels. ADPs work in a similar fashion, and that's
where I found that in a number of entries the atomic labels are not
unique. So it isn't clear which atom a given _geom or ADP entry actually
refers to. Like I said, the _geom data are not all that important to me,
because I just calculate the numbers on the spot for a label in my
structure viewer. But the ADPs are a different story.
Assigning the wrong ADP to an atom is typically not going to make a huge
difference to the calculated powder pattern of an organic structure, but
I found it can make a significant difference to the relative intensities
for inorganics, enough so that the relative intensities of the most
intense peaks are affected. And that, in turn, affects how the structure
is indexed for database matching. ICDD only looks at the 3 most intense
peaks and I assume that most matching algorithms will have such a
limitation.

I flagged those cases, but the list is quite extensive and I need to do
some additional work to see how to handle them. I think the pragmatic --
but formally incorrect -- approach is to assume that the references
follow the same order of appearance in the file. This can be confirmed
by recalculating the values and seeing which ones match. The correct
solution is of course to rename the affected atoms to make the labels
unique, but given the diverse use of labeling systems, this may not be
feasible.

On Thu, Feb 23, 2023 at 9:49 AM Saulius Gražulis wrote:
>
> Hi, Steef,
>
> On 2023-02-22 10:07, Steef Boerrigter wrote:
> > I have added a diff file of (today's) database entry with the fixed
> > version of the cif. The diff file includes my analysis of what went
> > wrong with the operators.
> > I am also including the fixed cif files.
>
> There is one more gotcha with the change of symmetry operators. The
> operator indices are used in the _geom_bond and _geom_angle tables. When
> we change the symmetry operators, these _geom_ references are no longer
> valid. This is one of the reasons why we try not to modify symops in
> general.
>
> But in the case of the 15 files that you have corrected, the initial
> symmetry operators were wrong, for one reason or another. Therefore we
> cannot rely on the references to these operators in the _geom_ tables,
> since we do not know what the authors meant.
> For this reason I will
> replace the '_geom_bond_site_symmetry_...' values and other similar
> values with question marks ('?', without the quotes), to indicate that
> these references are not known.
>
> You do not need to do anything about this change (other than to make
> sure that your software tolerates CIFs with '?' values ;) ); I am just
> informing you that I will make additional changes to the files you have
> sent.
>
> Sincerely yours,
> Saulius
>
> --
> Dr. Saulius Gražulis
> Vilnius University, Life Science Center, Institute of Biotechnology
> Saulėtekio al. 7, LT-10257 Vilnius, Lietuva (Lithuania)
> phone (office): (+370-5)-2234353, mobile: (+370-684)-49802, (+370-614)-36366
>

From sxmboer at gmail.com Thu Feb 23 20:12:24 2023
From: sxmboer at gmail.com (Steef Boerrigter)
Date: Thu, 23 Feb 2023 13:12:24 -0500
Subject: [Cod-bugs] Cif files with incorrect set of group operators
In-Reply-To:
References:
Message-ID:

Saulius,

I certainly agree to sign these changes. I was just not aware of how the
system works. I do most of this work at home as a hobby on my laptop
after dinner. I will look at it tonight.

Steef

On Thu, Feb 23, 2023 at 9:20 AM Saulius Gražulis wrote:
>
> Dear Steef,
>
> many great thanks for your updated CIFs! I'm sorry that I have not yet
> answered your previous e-mail, which I will do in a while; for now I want
> to discuss the final amendments to the CIFs so that we can commit your
> corrections to the COD.
>
> On 2023-02-22 10:07, Steef Boerrigter wrote:
> > An iron-clad test to see if the operators are self-consistent is to
> > apply each operator to each other operator which in each case should
> > produce an existing operator.
> > I ran this test on all the database entries and it failed in (only!) 15 cases.
> > In most cases there are missing operators, which means that software
> > that relies on the operators directly will have missing atoms in the
> > unit cell.
> > In other cases, it is clear that the authors attempted to manually edit
> > the spacegroup symmetry operators, but made some mistake in doing so.
> >
> > I manually went through each case and fixed the issue in a way that
> > requires the least amount of edits to make clear exactly where the
> > mistake was made -- as opposed to just completely replacing the entire
> > operator block.
> >
> > I have added a diff file of (today's) database entry with the fixed
> > version of the cif. The diff file includes my analysis of what went
> > wrong with the operators.
> > I am also including the fixed cif files.
>
> Thanks a lot for the fixes! They are extremely valuable! The corrections
> are just fine and can be committed to the COD.
>
> Would you wish/agree to sign these corrections with your name? The COD
> data curation practice is to describe what changes were done to the
> entry in the entry itself, and to attribute the authorship of the
> changes to the person who suggested or made them. I attach one of the
> files sent by you with the final comments before the commit. Your name
> is mentioned at the '_cod_changelog_entry', id 2. If you agree with
> mentioning your name, I will prepare the remaining files in the same way
> and commit them to the COD.
>
> You may (as per copyright law :) ) stay anonymous, but I think it is
> very fair to give you credit for these fixes. I envisage the future
> where data curation will count as scientific output in our universities
> ;) If you would like to include your ORCID, we can do that as well
> (we'll have to prepare a new data name for that, but this should be quick).
>
> Please let me know how you feel about the changes. Meanwhile I'll
> annotate the remaining files, and at that point we are one 'Enter' press
> away from getting them into the COD :).
>
> Regards,
> Saulius
>
> --
> Dr. Saulius Gražulis
> Vilnius University, Life Science Center, Institute of Biotechnology
> Saulėtekio al. 7, LT-10257 Vilnius, Lietuva (Lithuania)
> phone (office): (+370-5)-2234353, mobile: (+370-684)-49802, (+370-614)-36366
>

From sxmboer at gmail.com Fri Feb 24 06:05:39 2023
From: sxmboer at gmail.com (Steef Boerrigter)
Date: Thu, 23 Feb 2023 23:05:39 -0500
Subject: [Cod-bugs] Cif files with incorrect set of group operators
In-Reply-To:
References:
Message-ID:

Saulius,

my ORCID is 0000-0002-6874-0692.

I am confused. In this email you asked me to annotate the files as you
showed in this example. The comments were basically what I wrote in the
diff file, between the file entries.
In the other email you basically wrote that you had already put the
comments in and that I don't have to do anything else. Is that correct?

On Thu, Feb 23, 2023 at 9:20 AM Saulius Gražulis wrote:
>
> Dear Steef,
>
> many great thanks for your updated CIFs! I'm sorry that I have not yet
> answered your previous e-mail, which I will do in a while; for now I want
> to discuss the final amendments to the CIFs so that we can commit your
> corrections to the COD.
>
> On 2023-02-22 10:07, Steef Boerrigter wrote:
> > An iron-clad test to see if the operators are self-consistent is to
> > apply each operator to each other operator which in each case should
> > produce an existing operator.
> > I ran this test on all the database entries and it failed in (only!) 15 cases.
> > In most cases there are missing operators, which means that software
> > that relies on the operators directly will have missing atoms in the
> > unit cell.
> > In other cases, it is clear that the authors attempted to manually edit
> > the spacegroup symmetry operators, but made some mistake in doing so.
> >
> > I manually went through each case and fixed the issue in a way that
> > requires the least amount of edits to make clear exactly where the
> > mistake was made -- as opposed to just completely replacing the entire
> > operator block.
> >
> > I have added a diff file of (today's) database entry with the fixed
> > version of the cif. The diff file includes my analysis of what went
> > wrong with the operators.
> > I am also including the fixed cif files.
>
> Thanks a lot for the fixes! They are extremely valuable! The corrections
> are just fine and can be committed to the COD.
>
> Would you wish/agree to sign these corrections with your name? The COD
> data curation practice is to describe what changes were done to the
> entry in the entry itself, and to attribute the authorship of the
> changes to the person who suggested or made them. I attach one of the
> files sent by you with the final comments before the commit. Your name
> is mentioned at the '_cod_changelog_entry', id 2. If you agree with
> mentioning your name, I will prepare the remaining files in the same way
> and commit them to the COD.
>
> You may (as per copyright law :) ) stay anonymous, but I think it is
> very fair to give you credit for these fixes. I envisage the future
> where data curation will count as scientific output in our universities
> ;) If you would like to include your ORCID, we can do that as well
> (we'll have to prepare a new data name for that, but this should be quick).
>
> Please let me know how you feel about the changes. Meanwhile I'll
> annotate the remaining files, and at that point we are one 'Enter' press
> away from getting them into the COD :).
>
> Regards,
> Saulius
>
> --
> Dr. Saulius Gražulis
> Vilnius University, Life Science Center, Institute of Biotechnology
> Saulėtekio al.
> 7, LT-10257 Vilnius, Lietuva (Lithuania)
> phone (office): (+370-5)-2234353, mobile: (+370-684)-49802, (+370-614)-36366
>

From grazulis at ibt.lt Sat Feb 25 09:04:54 2023
From: grazulis at ibt.lt (Saulius Gražulis)
Date: Sat, 25 Feb 2023 09:04:54 +0200
Subject: [Cod-bugs] Cif files with incorrect set of group operators
In-Reply-To:
References:
Message-ID:

Dear Steef,
dear CODers,

On 2023-02-24 06:05, Steef Boerrigter wrote:
> The comments were basically what I wrote in the diff file, between the
> file entries.
> In the other email you basically wrote that you had already put the
> comments in and that I don't have to do anything else.

An update: I have finished inserting the documentation of the changes
and additional changes into the files.

Please have a look at the final files:

https://saulius-grazulis.lt/~saulius/.7548a541a5b58a648ffcb18765f5e1aa79441a38/cifs/finished/

svn://saulius.grazulis.lt/COD-CIF-corrections/trunk

If you are fine with the current files, I'll commit them to the COD,
presumably on Monday.

I'm CC'ing my colleagues so that they also have a chance to look at and
review the files.

Sincerely yours,
Saulius

--
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Saulėtekio al. 7
LT-10257 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2234367 / phone (office): (+370-5)-2234353
mobile: (+370-684)-49802, (+370-614)-36366
-------------- next part --------------
A non-text attachment was scrubbed...
Name: grazulis.vcf
Type: text/x-vcard
Size: 4 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 665 bytes
Desc: OpenPGP digital signature
URL:

From sxmboer at gmail.com Sun Feb 26 09:13:44 2023
From: sxmboer at gmail.com (Steef Boerrigter)
Date: Sun, 26 Feb 2023 02:13:44 -0500
Subject: [Cod-bugs] Cif files with incorrect set of group operators
In-Reply-To:
References:
Message-ID:

Dear Saulius,

I didn't mean for the entire diff file to be added as comments in the
CIF files, so I condensed them a bit.
I committed the changes and received the message below, so it looks to
me as if they were committed successfully.
Let me know if I screwed anything up; like I said, it's been a while
since I last used SVN.

trunk>svn commit -m "Condensed the explanations"
Sending logs/2023-02-22-Steef-Boerrigter-cif-corrections/cifs/finished/2000246.cif
Sending logs/2023-02-22-Steef-Boerrigter-cif-corrections/cifs/finished/2000404.cif
Sending logs/2023-02-22-Steef-Boerrigter-cif-corrections/cifs/finished/2001551.cif
Sending logs/2023-02-22-Steef-Boerrigter-cif-corrections/cifs/finished/2003347.cif
Sending logs/2023-02-22-Steef-Boerrigter-cif-corrections/cifs/finished/2003882.cif
Sending logs/2023-02-22-Steef-Boerrigter-cif-corrections/cifs/finished/2004188.cif
Sending logs/2023-02-22-Steef-Boerrigter-cif-corrections/cifs/finished/2004351.cif
Sending logs/2023-02-22-Steef-Boerrigter-cif-corrections/cifs/finished/2006998.cif
Sending logs/2023-02-22-Steef-Boerrigter-cif-corrections/cifs/finished/2007181.cif
Sending logs/2023-02-22-Steef-Boerrigter-cif-corrections/cifs/finished/2102130.cif
Sending logs/2023-02-22-Steef-Boerrigter-cif-corrections/cifs/finished/2105555.cif
Sending logs/2023-02-22-Steef-Boerrigter-cif-corrections/cifs/finished/2105675.cif
Sending logs/2023-02-22-Steef-Boerrigter-cif-corrections/cifs/finished/2105676.cif
Sending logs/2023-02-22-Steef-Boerrigter-cif-corrections/cifs/finished/4062540.cif
Sending logs/2023-02-22-Steef-Boerrigter-cif-corrections/cifs/finished/8101007.cif
Transmitting file data
...............done
Committing transaction...
Committed revision 79.

On Sat, Feb 25, 2023 at 2:04 AM Saulius Gražulis wrote:
>
> Dear Steef,
> dear CODers,
>
> On 2023-02-24 06:05, Steef Boerrigter wrote:
> > The comments were basically what I wrote in the diff file, between the
> > file entries.
> > In the other email you basically wrote that you had already put the
> > comments in and that I don't have to do anything else.
>
> An update: I have finished inserting the documentation of the changes
> and additional changes into the files.
>
> Please have a look at the final files:
>
> https://saulius-grazulis.lt/~saulius/.7548a541a5b58a648ffcb18765f5e1aa79441a38/cifs/finished/
>
> svn://saulius.grazulis.lt/COD-CIF-corrections/trunk
>
> If you are fine with the current files, I'll commit them to the COD,
> presumably on Monday.
>
> I'm CC'ing my colleagues so that they also have a chance to look at and
> review the files.
>
> Sincerely yours,
> Saulius
>
> --
> Dr. Saulius Gražulis
> Vilnius University Institute of Biotechnology, Saulėtekio al. 7
> LT-10257 Vilnius, Lietuva (Lithuania)
> fax: (+370-5)-2234367 / phone (office): (+370-5)-2234353
> mobile: (+370-684)-49802, (+370-614)-36366
>
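
Editorial note on the self-consistency test that opens this thread
("apply each operator to each other operator, which in each case should
produce an existing operator"): a minimal Python sketch of such a closure
check is given below. It is not the program used in the thread. The
function names (parse_symop, compose, closure_failures) are hypothetical,
and the sketch assumes that operators are written in the usual CIF
'x,y,z' notation with unit coefficients and that translations are
compared modulo one lattice vector.

```python
from fractions import Fraction
import re

# One term of an operator component: an optional sign followed by either
# a number ('1/2', '0.25', '1') or an axis letter.
TERM = re.compile(r"([+-]?)(?:(\d+(?:/\d+|\.\d+)?)|([xyz]))")

def parse_symop(text):
    """Parse e.g. '-x,y+1/2,-z+1/2' into (rotation rows, translation)."""
    rot, trans = [], []
    for part in text.lower().replace(" ", "").split(","):
        row, shift = [0, 0, 0], Fraction(0)
        for sign, number, axis in TERM.findall(part):
            factor = -1 if sign == "-" else 1
            if axis:
                row["xyz".index(axis)] += factor
            else:
                shift += factor * Fraction(number)
        rot.append(tuple(row))
        trans.append(shift % 1)  # translations are only defined modulo 1
    return tuple(rot), tuple(trans)

def compose(a, b):
    """Return the operator 'apply b, then a': (Ra*Rb, Ra*tb + ta) mod 1."""
    (ra, ta), (rb, tb) = a, b
    rot = tuple(
        tuple(sum(ra[i][k] * rb[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )
    trans = tuple(
        (sum(ra[i][k] * tb[k] for k in range(3)) + ta[i]) % 1
        for i in range(3)
    )
    return rot, trans

def closure_failures(symops):
    """List the operator pairs whose product is missing from the list."""
    ops = [parse_symop(s) for s in symops]
    known = set(ops)
    return [
        (symops[i], symops[j])
        for i, a in enumerate(ops)
        for j, b in enumerate(ops)
        if compose(a, b) not in known
    ]

# Illustrative input: the four operators of P 1 21/c 1, then the same
# list with one operator dropped to mimic a "missing operator" entry.
P21C = ["x,y,z", "-x,y+1/2,-z+1/2", "-x,-y,-z", "x,-y+1/2,z+1/2"]
complete = closure_failures(P21C)
incomplete = closure_failures(P21C[:3])
```

An empty list means that the operator set is closed under composition; a
non-empty list names the pairs whose product is not in the set, which is
one way to spot a missing or mistyped operator. Exact fractions are used
so that translations such as 1/3 compare reliably.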