From grazulis at ibt.lt Mon Aug 5 09:51:18 2024
From: grazulis at ibt.lt (Saulius Gražulis)
Date: Mon, 5 Aug 2024 09:51:18 +0300
Subject: [Cod-bugs] [EXT] AMCSD duplicate atoms and missing aniso links?
In-Reply-To:
References: <8e52e340-f530-666c-2a38-0a0d29e6b142@ibt.lt>
Message-ID: <4e817632-02a8-4243-868d-5ebf8a0796eb@ibt.lt>

Hi, Bob,

thank you for the swift answer! Below are my answers and comments.

On 2024-08-03 21:04, Downs, Robert T - (rdowns) wrote:
>
> Thanks for pointing that out. Looks like a labeling problem. Pb* and
> Pb’ are the same atom. The Pb atoms are on a split site.
>
Thanks for the clarification! I also understood it the same way!

>
> I think that my code to write cifs tried to avoid using a quote mark ‘
> because it gets interpreted poorly on some applications, especially
> when searching, so I changed the quote to an asterisk, *, but I must
> have only done it for atomic positions and didn’t do it for the
> displacement factors. Ugh.
>
So the single quote character ("'") is regularly changed to an asterisk
("*") when converting from AMC to CIF format, right? And yes, it should
be the ASCII quote (ASCII DEC 39 / HEX 27), not the Unicode quote "‘"
(UTF-8 HEX sequence: e2 80 98).

As far as the CIF and COD parser goes, the single ASCII quote in atom
names ("'") is as good (or bad) as the asterisk. It is standard CIF, and
the software MUST read it; if some program does not, IMHO the program
should be fixed, not the data file. Adapting data presentation to the
quirks of buggy programs is, IMHO, a bad strategy: next time some
program will /insist/ (erroneously) on having a quote and will reject
the asterisk, and we are stuck :). Actually, all modern programs,
notably Jmol, can process atom names with terminating quotes, even
though this requires a new syntax in the program scripts...

Also, changing "'" to "*" makes AMC and CIF atom naming different, so
finding correspondences becomes more difficult.
If identifier-like atom labels are required, maybe it is better to spell
out 'prime' and 'doubleprime' in both AMC and CIF files?

In any case, there is a second problem related to asterisk conversion:
in several AMCSD CIFs (e.g. COD 9002008 [1], AMCSD "0002070" [2]) two
atoms Pb' and Pb" are present, which both become Pb* after conversion:

> loop_
> _atom_site_aniso_label
> _atom_site_aniso_U_11
> _atom_site_aniso_U_22
> _atom_site_aniso_U_33
> _atom_site_aniso_U_12
> _atom_site_aniso_U_13
> _atom_site_aniso_U_23
> Pb' 0.02100 0.05400 0.03400 -0.01300 0.00700 -0.00100
> Pb" 0.01900 0.08100 0.06100 0.00600 0.01200 0.00500

and

> loop_
> _atom_site_label
> _atom_site_fract_x
> _atom_site_fract_y
> _atom_site_fract_z
> _atom_site_occupancy
> Pb* 0.27050 -0.01360 0.07260 0.59000
> Pb* 0.27870 0.00890 0.06860 0.41000

Thus, it is no longer possible for CIF programs to tell which is which
without resorting to some wild assumptions (i.e. that the CIF atoms
follow the AMC file ordering and no program has rearranged the atoms in
the meantime...)

What would be your policy on handling double quotes in atom labels? Are
you going to leave them as double quotes, or maybe convert them to
double asterisks, e.g. to "Pb**"?

> Anyways, its my mistake. Thanks for catching it.
>
A great deal of credit goes to Antanas Vaitkus, who wrote the
'cif_validate' software; it is now very handy for catching such
situations in a systematic way :)

>
> You can go to amcsd web site and see the amc file, and that would have
> helped you see the problem. I make cifs from the amc format.
>
Thanks, great tip! I somehow overlooked this; we can use AMC files to
double check the CIFs!

> Also, the atoms are always in the same order in the two parts of the
> cif, (xyz and adf) so that might help you to solve issues. Its because
> the cif is always made from the amc.
>
Excellent! Good to know, I'll use this info to restore the aniso <->
atom_site connections.
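The order-based restoration mentioned above could look roughly like the following. This is only a minimal sketch, not the COD pipeline's actual code: the function names `find_duplicate_labels` and `pair_by_order` are hypothetical, the labels are toy lists holding just the two Pb rows quoted earlier, and the sketch assumes that both loops list the same atoms in the same order (as stated for CIFs generated directly from AMC files).

```python
from collections import Counter


def find_duplicate_labels(labels):
    """Return labels that occur more than once, in first-seen order."""
    counts = Counter(labels)
    return [label for label in dict.fromkeys(labels) if counts[label] > 1]


def pair_by_order(site_labels, aniso_labels):
    """Pair atom_site rows with atom_site_aniso rows by position.

    Assumption: both loops list the same atoms in the same order and
    no program has rearranged the rows since the CIF was generated.
    """
    if len(site_labels) != len(aniso_labels):
        raise ValueError("loops differ in length; cannot pair by order")
    return list(zip(site_labels, aniso_labels))


# Toy data: only the two Pb rows quoted in the message.
site_labels = ["Pb*", "Pb*"]      # from the _atom_site_label loop
aniso_labels = ["Pb'", 'Pb"']     # from the _atom_site_aniso_label loop

duplicates = find_duplicate_labels(site_labels)
pairs = pair_by_order(site_labels, aniso_labels)

# Take the still-distinct aniso labels as the restored atom_site labels.
restored = [aniso for _, aniso in pairs]
```

Pairing by position is only as reliable as the ordering assumption, so a length mismatch is treated as an error here instead of being guessed around.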
>
> I am considering how to change this somehow in the future. We are
> building a new website for the future, amcsd will become part of a
> NASA host of mineral databases. I expect to have an agreement with the
> society journals to automatically download their deposited cifs.
>
This sounds very promising, cool!

>
> Unfortunately, these cifs are usually so poorly made that I know this
> will be a bad solution. It’s the reason why I have always made amc
> format and then computed the cifs. But I am planning for the future
> when it will not be me doing the databases.
>
Regarding the CIFs, we have at the COD a pretty decent pipeline for
ingesting CIFs, even some broken ones. Could that be useful? Maybe we
could collaborate on the CIF processing and reviewing?

Regards,
Saulius

Refs.:

[1] http://www.crystallography.net/cod/9002008.cif

[2] https://rruff.geo.arizona.edu/AMS/CIF_text_files/02094_cif.txt,
    https://rruff.geo.arizona.edu/AMS/AMC_text_files/02094_amc.txt,
    https://rruff.geo.arizona.edu/AMS/result.php

--
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Saulėtekio al. 7
LT-10257 Vilnius, Lietuva (Lithuania)
mobile: (+370-684)-49802, (+370-614)-36366

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From toshiyuki.sasaki at spring8.or.jp Thu Aug 8 15:23:08 2024
From: toshiyuki.sasaki at spring8.or.jp (Toshiyuki Sasaki)
Date: Thu, 8 Aug 2024 21:23:08 +0900
Subject: [Cod-bugs] COD Data registration
In-Reply-To:
References: <006701d9ae4b$cc8d8ff0$65a8afd0$@spring8.or.jp>
 <7179b502-51ad-38ad-ba82-62e192d93919@ibt.lt>
 <026c01d9b0ab$d9170140$8b4503c0$@spring8.or.jp>
 <004501da436b$db9a11c0$92ce3540$@spring8.or.jp>
 <002701da4420$21774dc0$6465e940$@spring8.or.jp>
 <21e3544a-9e79-490e-94e5-2fe824a15ed5@ibt.lt>
 <007701dac859$4cc081a0$e64184e0$@spring8.or.jp>
 <02c501dae499$85a9adf0$90fd09d0$@spring8.or.jp>
Message-ID:

Dear Dr. Saulius Gražulis

Sorry for the lack of information.

Deposited data and related information

3000560: taca-ina-meno2_fill.cif
data_taca-ina-meno2
merged 13 crystals
Removed lines for the deposition
_refine_ls_wR_factor_ref 0.4090

3000561: TACA-INA-Toluene_fill.cif
data_taca-ina-toluene
merged 9 crystals
Removed lines for the deposition
_refine_ls_R_factor_gt 0.1540
_refine_ls_wR_factor_ref 0.3972

> BTW, will your colleagues be at the ECM34? It would be nice to meet
> and talk in person :)

I will not be there, and I don't know whether my collaborators will
join the ECM34 or not. Anyway, I mentioned you to them, and if someone
attends the ECM34, he will contact you :)

Best regards,
Toshiyuki Sasaki

On Fri, 2 Aug 2024 at 18:02, Saulius Gražulis wrote:

> Dear Dr. Toshiyuki,
>
> thank you for the e-mail! I'll insert the ...R_factor_... data names into
> your submitted CIFs where necessary.
>
> Regarding the references to the image locations, there are two mechanisms
> that we consider using. One is the IUCr suggestion by Brian McMahon
> announced at the IUCr Congress last year. Another is the intrinsic COD
> mechanism using the '_cod_related_entry_[]' category. Both require a
> certain update of the COD infrastructure, which I plan to do, hopefully,
> before leaving for the ECM34 this month.
>
> The '_cod_related_entry_[]' at the moment looks like this:
>
> saulius@tasmanijos-velnias cod-tools/ $ tail -5 /home/saulius/struct/cod/current/cif/9/01/51/9015157.cif
> loop_
> _cod_related_entry_id
> _cod_related_entry_database
> _cod_related_entry_code
> 1 AMCSD 0018609
>
> In a similar way, we could use an external database, say XRDA [1], and
> give a link to the database's record (unique ID).
>
> For this I need to make an additional SQL table in the COD, xrda_x_cod,
> and update the COD deposition scripts so that they populate that new
> table. It is not a lot of work, I just need to find two free days to
> implement and test this :)
>
> BTW, will your colleagues be at the ECM34? It would be nice to meet
> and talk in person :)
>
> Regards,
> Saulius
>
> On 2024-08-02 08:04, Toshiyuki Sasaki wrote:
> > Dear Dr. Saulius Gražulis,
> >
> > I have uploaded crystal structures solved by MicroED, with some
> > errors, to COD.
> > Following the previous procedure (removing lines and giving you
> > information about the errors, etc.), I would like to ask you to
> > handle the data properly.
> >
> > Deposited data and related information
> > 3000560: merged 13 crystals
> > Removed lines for the deposition
> > _refine_ls_wR_factor_ref 0.4090
> >
> > 3000561: merged 9 crystals
> > Removed lines for the deposition
> > _refine_ls_R_factor_gt 0.1540
> > _refine_ls_wR_factor_ref 0.3972
> >
> > Regarding EM images, what can I do?
> > "I'll come back to your shortly with a suggested update of your data
> > files."
> >
> > Thank you for your cooperation.
> > Best regards,
> > Toshiyuki
> > **************************************************************
> > Japan Synchrotron Radiation Research Institute (JASRI)
> > Diffraction and Scattering Division
> > Tenure-track researcher
> > Dr. Toshiyuki Sasaki
> > TEL: 0791-58-0802(3430)
> > 1-1-1, Kouto, Sayo-cho, Sayo-gun, Hyogo 679-5198 Japan
> > **************************************************************
> >
> > *From:* Saulius Gražulis [mailto:grazulis at ibt.lt]
> > *Sent:* Thursday, June 27, 2024 3:18 PM
> > *To:* Toshiyuki Sasaki
> > *Cc:* cod-bugs at lists.crystallography.net
> > *Subject:* Re: COD Data registration
> >
> > On 2024-06-27 09:14, Toshiyuki Sasaki wrote:
> >
> > > Finally our paper was published in the Journal of Materials
> > > Chemistry C.
> > >
> > > The deposited CIF files (3000450, 3000451) were just updated by
> > > including publication information.
> > >
> > > In addition, I give you the XRDa DOI of these structures, as I
> > > promised.
> > >
> > > 3000450, 3000451: 10.51093/xrd-00142
> >
> > Thanks a lot for your update, and congratulations on your publication!
> >
> > I'll have a look at the data. Also, I remember that you have published
> > the links to the original EM images; it would be great to include
> > these links in the COD files as well. We have recently developed a
> > mechanism for doing this; I'll get back to you shortly with a
> > suggested update of your data files.
> >
> > Sincerely yours,
> > Saulius
> >
> > --
> > Dr. Saulius Gražulis
> > Vilnius University Institute of Biotechnology, Saulėtekio al. 7
> > LT-10257 Vilnius, Lietuva (Lithuania)
> > mobile: (+370-684)-49802, (+370-614)-36366
>
> --
> Dr. Saulius Gražulis
> Vilnius University Institute of Biotechnology, Saulėtekio al. 7
> LT-10257 Vilnius, Lietuva (Lithuania)
> fax: (+370-5)-2234367 / phone (office): (+370-5)-2234353
> mobile: (+370-684)-49802, (+370-614)-36366