From david at crystalmaker.com Wed Jul 5 12:59:41 2023
From: david at crystalmaker.com (David Palmer)
Date: Wed, 5 Jul 2023 10:59:41 +0100
Subject: [Cod-bugs] 2997 invalid files in C.O.D.
Message-ID:

Dear Colleagues,

I sent you a message a few weeks ago about my plans to provide easy phase ID via C.O.D.-hosted structures. I haven't heard back from you, so I assume you have no objections.

In the meantime, we have used our automated tools to analyse all current structure files. I am attaching a summary, listing file IDs and errors for 2,997 out of your 0.5M or so files: a relatively small figure (ca. 0.6%). However, these files are invalid, and cannot be used for structural work, so I would recommend getting them fixed.

The most common errors are:

- missing fractional coordinates
- ambiguous site labelling
- invalid element symbols

A common issue is a mismatch between site labels in different data blocks (e.g., a table of anisotropic displacement parameters and a table of fractional coordinates). We found these errors in numerous files submitted via the American Mineralogist crystal structures database (clearly, substantial amounts of U.S. governmental funding failed to prevent basic transcription errors!)

Take the following file, 9003355, as an example:

- Sites SiT1?, AlT1? (etc.) are listed in the loop containing Uij
- The same sites are labelled differently (e.g., SiT1*, AlT1*, etc.) in the loop containing xyz

Whilst a human could infer how these labels should be related, a computer cannot make such a judgement, which renders these files useless.

I hope this helps, and do let me know if you have any questions.

With best wishes,

Yours faithfully,

David Palmer

David C Palmer, Ph.D. (Cantab), M.A. (Cantab),
Managing Director, CrystalMaker Software Ltd
Centre for Innovation & Enterprise | Oxford University Begbroke Science Park
Woodstock Road, Begbroke, Oxfordshire, OX5 1PF, UK
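
The label mismatch described in the message above (anisotropic displacement rows whose labels do not occur in the coordinate loop) can be checked mechanically. What follows is only a minimal sketch, not the tool the author used: it assumes the standard CIF core data names `_atom_site_label` and `_atom_site_aniso_label`, a deliberately simplified loop reader (whitespace-separated values, no quoted strings containing spaces, no multi-line text fields), and made-up example labels and numbers that are not taken from entry 9003355. The helper names `loop_column` and `unmatched_aniso_labels` are hypothetical.

```python
def loop_column(cif_text, tag):
    """Return the values of column `tag` from the first loop_ that declares it.

    Simplifying assumption: values are whitespace-separated, with no quoted
    strings containing spaces and no multi-line text fields.
    """
    lines = [ln.strip() for ln in cif_text.splitlines()]
    i = 0
    while i < len(lines):
        if lines[i].lower() != "loop_":
            i += 1
            continue
        i += 1
        # Collect the data names declared at the top of the loop.
        tags = []
        while i < len(lines) and lines[i].startswith("_"):
            tags.append(lines[i].split()[0])
            i += 1
        # Collect value rows until a blank line or the next CIF keyword.
        rows = []
        while (i < len(lines) and lines[i]
               and not lines[i].startswith(("_", "loop_", "data_", "#"))):
            rows.append(lines[i].split())
            i += 1
        if tag in tags:
            col = tags.index(tag)
            return [row[col] for row in rows if len(row) == len(tags)]
    return []


def unmatched_aniso_labels(cif_text):
    """List aniso labels that have no counterpart in the coordinate loop."""
    site_labels = set(loop_column(cif_text, "_atom_site_label"))
    aniso_labels = loop_column(cif_text, "_atom_site_aniso_label")
    return [label for label in aniso_labels if label not in site_labels]


# Illustrative data only: labels and numbers are invented for this sketch.
EXAMPLE = """\
data_example
loop_
_atom_site_label
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
SiT1* 0.0097 0.1850 0.2233
AlT1* 0.0097 0.1850 0.2233
OA1 0.0000 0.1459 0.0000

loop_
_atom_site_aniso_label
_atom_site_aniso_U_11
SiT1o 0.010
AlT1o 0.010
OA1 0.012
"""

print(unmatched_aniso_labels(EXAMPLE))  # prints ['SiT1o', 'AlT1o']
```

A non-empty result flags a file of the kind described above. A production check would need a full CIF parser rather than this line-based reader.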
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Error Files from COD (2023-07-04).txt
URL:

From toshiyuki.sasaki at spring8.or.jp Thu Jul 6 02:53:23 2023
From: toshiyuki.sasaki at spring8.or.jp (Toshiyuki Sasaki)
Date: Thu, 6 Jul 2023 08:53:23 +0900
Subject: [Cod-bugs] COD Data registration
Message-ID: <013801d9af9b$e265bbd0$a7313370$@spring8.or.jp>

Dear Dr. Saulius Gražulis,

I would like to register crystal structures solved by MicroED in the COD.

I get an error before registration:

"The following errors were detected in file [1Y.cif]:
Data block 1Y: data item '_refine_ls_wR_factor_ref' value '0.4522' is > 0.45."

Can I remove the line and register the CIF file, as in the previous case of COD IDs 3000438–3000442?

After that I will send you an e-mail with information about the removed line and the number of merged crystals.

Best regards,
Toshiyuki

**************************************************************
Japan Synchrotron Radiation Research Institute (JASRI)
Diffraction and Scattering Division
Tenure-track researcher
Dr. Toshiyuki Sasaki
TEL: 0791-58-0802(3430)
1-1-1, Kouto, Sayo-cho, Sayo-gun, Hyogo 679-5198 Japan
**************************************************************

From grazulis at ibt.lt Fri Jul 7 09:04:47 2023
From: grazulis at ibt.lt (Saulius Gražulis)
Date: Fri, 7 Jul 2023 09:04:47 +0300
Subject: [Cod-bugs] COD Data registration
In-Reply-To: <013801d9af9b$e265bbd0$a7313370$@spring8.or.jp>
References: <013801d9af9b$e265bbd0$a7313370$@spring8.or.jp>
Message-ID: <7f81549e-5ab0-6ad9-ebc8-b10caf7c35de@ibt.lt>

On 2023-07-06 02:53, Toshiyuki Sasaki wrote:
> I would like to register crystal structures solved by MicroED to COD.
>
> I have an error before registration.
>
> "The following errors were detected in file [1Y.cif]:
> Data block 1Y: data item '_refine_ls_wR_factor_ref' value '0.4522' is > 0.45."
>
> Can I remove the line and register the CIF file as in the previous
> case of COD Ids 3000438–3000442?
>
> After that I will send you an e-mail to give you information about the
> removed line and a number of merged crystals.

Yes, please proceed in the way you suggest, and send me the removed lines so that I can reinsert them. We are planning a structure review portal where this will be done on-line, but the system will take about half a year to test and roll out.

Sincerely yours,
Saulius

--
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Saulėtekio al. 7
LT-10257 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2234367 / phone (office): (+370-5)-2234353
mobile: (+370-684)-49802, (+370-614)-36366

From Anders.Blom at synopsys.com Fri Jul 7 16:09:16 2023
From: Anders.Blom at synopsys.com (Anders Blom)
Date: Fri, 7 Jul 2023 13:09:16 +0000
Subject: [Cod-bugs] Unable to connect to SQL server
Message-ID:

Hi COD,

I am trying to connect and query the COD database via MySQL, but it is not working. Are you aware of any problems on your end?

I have tried multiple clients, such as pymysql, the mysql command line, Oracle MySQL Shell and the DBeaver GUI. In each case I am able to establish some initial connection, but ultimately it just times out:

[cid:image002.png at 01D9B0E4.F08684C0]

I say "establish connection" in the sense that if I use incorrect server parameters such as the wrong port or hostname, it fails immediately, whereas with the correct COD hostname and port it does wait until the timeout before showing the above error.

If this is a problem with your server, I hope it can be resolved, whereas if you believe the database is functioning properly for queries like this, then I have to dig further on my end (firewalls, perhaps, although I have tried multiple networks as well).

Kind regards,
Anders Blom

------------------------------------------------
Anders Blom, Ph.D.
Solutions Engineer, Sr Staff
EDAG, Circuit Design & TCAD Solutions
M: +1 408 874 5806 | W: +1 650 584 5000 (ext 47495) | anders.blom at synopsys.com
synopsys.com | 675 Almanor Ave, Sunnyvale, CA 94085, USA
[cid:image001.png at 01D9B0E0.8735E780]

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 34938 bytes
Desc: image001.png
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 10321 bytes
Desc: image002.png
URL:

From antanas.vaitkus90 at gmail.com Fri Jul 7 16:32:59 2023
From: antanas.vaitkus90 at gmail.com (Antanas Vaitkus)
Date: Fri, 7 Jul 2023 16:32:59 +0300
Subject: [Cod-bugs] Unable to connect to SQL server
In-Reply-To:
References:
Message-ID:

Dear Anders Blom,

I just tried running your query [1] using MySQL/MariaDB from my machine and everything works as expected, so the issue is most likely on your end.

I also attach a file with the results of the query in case they prove useful to you in the short term.

[1] mysql -u cod_reader -h sql.crystallography.net -e "select file from data where ((LOWER(formula) RLIKE ' ag') AND (nel = 1))" cod

--
Antanas Vaitkus,
Vilnius University,
Life Sciences Center,
Institute of Biotechnology,
room C521, Saulėtekio al. 7,
LT-10257 Vilnius, Lithuania

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 34938 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 10321 bytes
Desc: not available
URL:
-------------- next part --------------
file
1100136
1509145
1509146
1509194
1512487
5000218
9008459
9011607
9011608
9012431
9012961
9013045
9013046
9013047
9013048
9013049
9013050
9013051
9013052
9013053

From Anders.Blom at synopsys.com Fri Jul 7 16:54:12 2023
From: Anders.Blom at synopsys.com (Anders Blom)
Date: Fri, 7 Jul 2023 13:54:12 +0000
Subject: [Cod-bugs] {Disarmed} RE: Unable to connect to SQL server
In-Reply-To:
References:
Message-ID:

Thank you for the quick response, Antanas. It's very good to know that the server is working.

Let me give some context also on this. I work for the team that develops the atomic-scale modeling tool QuantumATK for DFT and other simulations. The company was QuantumWise earlier, before being acquired in 2017 by Synopsys.

We have for several years provided an interface to COD in our GUI, which allows users to build relatively advanced queries with a few clicks, and then retrieve and directly use structures from COD in their simulations. An example is shown in one of our tutorials:
https://docs.quantumatk.com/tutorials/li_ion_diffusion/li_ion_diffusion.html#import-lifepo4-bulk-structure

If relevant and interesting to COD, we could perhaps consider making this a more formal partnership, although I have to consult other people on my side in terms of co-marketing or PR announcements.

Let me also point out that QuantumATK is essentially free for academic use in Europe, as it is provided via EuroPractice (https://www.europractice.stfc.ac.uk/tools/synopsys_details.html#QATK) for a cost of only ~€100 per year for a practically unlimited license. We would be delighted if you would consider promoting QuantumATK to your friends and colleagues – our DFT-LCAO code is particularly useful for large-scale simulations of 1000+ atoms, even with HSE, and recently we have taken big steps towards automating fitting of machine-learned MTP forcefields!

So, now I just have to figure out if there are some special firewall rules that need to be set for it to work from at least our own internal network.

With thanks, and with hopes for a continued mutually beneficial relationship,
Anders

------------------------------------------------
Anders Blom, Ph.D.
Solutions Engineer, Sr Staff
EDAG, Circuit Design & TCAD Solutions
+1 650-584-5000 (ext 47495)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 10321 bytes
Desc: image001.png
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 34938 bytes
Desc: image002.png
URL:

From antanas.vaitkus90 at gmail.com Fri Jul 7 17:18:32 2023
From: antanas.vaitkus90 at gmail.com (Antanas Vaitkus)
Date: Fri, 7 Jul 2023 17:18:32 +0300
Subject: [Cod-bugs] {Disarmed} Re: Unable to connect to SQL server
In-Reply-To:
References:
Message-ID:

Dear Anders,

I am very glad to hear that your company finds the COD useful. I am not in a position to make high-level policy decisions on my own, but I will definitely forward your email to my supervisor (Saulius Gražulis) and other members of the COD Advisory Board. Please keep in mind, however, that it may take some time for them to respond, as most of them have other primary work responsibilities.

Please also let us know if you need any additional assistance in resolving the database connection issues on your end.

Sincerely,
Antanas

--
Antanas Vaitkus,
Vilnius University,
Life Sciences Center,
Institute of Biotechnology,
room C521, Saulėtekio al. 7,
LT-10257 Vilnius, Lithuania

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 10321 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 34938 bytes
Desc: not available
URL:

From Anders.Blom at synopsys.com Fri Jul 7 17:36:17 2023
From: Anders.Blom at synopsys.com (Anders Blom)
Date: Fri, 7 Jul 2023 14:36:17 +0000
Subject: [Cod-bugs] {Disarmed} RE: Unable to connect to SQL server
In-Reply-To:
References:
Message-ID:

Great!

For the connection issue, would you mind confirming that you are not on the same network where the SQL server is located? Or, put differently, can you also connect to it when you are e.g. at home? And do you know of other people around the world who actively use the SQL connection?

The point is, I suspect that there is a firewall blocking my request, but I don't know if it's my corporate one, or the one at the COD end. So even if you can connect from inside the firewall (if that is the case), it may still be that the SQL server is not responding to outside requests.

This is just to eliminate as many possibilities as possible.

Cheers,
Anders

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 10321 bytes
Desc: image001.png
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 34938 bytes
Desc: image002.png
URL:

From antanas.vaitkus90 at gmail.com Fri Jul 7 17:42:57 2023
From: antanas.vaitkus90 at gmail.com (Antanas Vaitkus)
Date: Fri, 7 Jul 2023 17:42:57 +0300
Subject: [Cod-bugs] {Disarmed} Re: Unable to connect to SQL server
In-Reply-To:
References:
Message-ID:

Dear Anders,

I can confirm that I successfully connected from an external network (I am actually working from home today). In the few instances when we got similar bug reports, it was indeed a problem with the firewall on the user's side.

Sincerely,
Antanas

--
Antanas Vaitkus,
Vilnius University,
Life Sciences Center,
Institute of Biotechnology,
room C521, Saulėtekio al. 7,
LT-10257 Vilnius, Lithuania

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 10321 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 34938 bytes
Desc: not available
URL:

From Anders.Blom at synopsys.com Fri Jul 7 17:44:37 2023
From: Anders.Blom at synopsys.com (Anders Blom)
Date: Fri, 7 Jul 2023 14:44:37 +0000
Subject: [Cod-bugs] {Disarmed} RE: Unable to connect to SQL server
In-Reply-To:
References:
Message-ID:

Thanks for letting me know, then I can proceed on my end and will not bother you more.

From: Antanas Vaitkus
Sent: Friday, July 7, 2023 16:43
To: Anders Blom
Cc: Saulius Gražulis; cod-bugs at ibt.lt; Bo Lue
Subject: Re: [Cod-bugs] Unable to connect to SQL server

Dear Anders,

I can confirm that I successfully connected from an external network (I am actually working from home today). In a few instances when we got similar bug reports, it was indeed a problem with the firewall on the user's side.

Sincerely,
Antanas

On Fri, 7 Jul 2023 at 17:36, Anders Blom wrote:

Great!

For the connection issue, would you mind confirming that you are not on the same network where the SQL server is located? Or, put differently, can you also connect to it when you are e.g. at home? And do you know of other people around the world that actively use the SQL connection?

The point is, I suspect that there is a firewall blocking my request, but I don't know if it's my corporate one, or the one at the COD end. So even if you can connect from inside the firewall (if that is the case), it may still be that the SQL server is not responding to outside requests. This is just to eliminate as many possibilities as possible.

Cheers,
Anders

From: Antanas Vaitkus
Sent: Friday, July 7, 2023 16:19
To: Anders Blom; Saulius Gražulis
Cc: cod-bugs at ibt.lt; Bo Lue
Subject: Re: [Cod-bugs] Unable to connect to SQL server

Dear Anders,

I am very glad to hear that your company finds the COD useful.
I am not in the position to individually make high-level policy decisions, but I will definitely forward your email to my supervisor (Saulius Gra?ulis) and other members of the COD Advisory Board. Please keep in mind, however, that it may take some time for them to respond as most of them have other primary work responsibilities. Please also let us know if you need any additional assistance at resolving the database connection issues on your end. Sincerely Antanas On Fri, 7 Jul 2023 at 16:54, Anders Blom > wrote: Thank you for the quick response, Antanas. It?s very good to know that the server is working. Let me give some context also on this. I work for the team that develops the atomic-scale modeling tool QuantumATK for DFT and other simulations. The company was QuantumWise earlier, before being acquired in 2017 by Synopsys. We have for several years provided an interface to COD in our GUI, which allows users to build relatively advanced queries with a few clicks, and then retrieve and directly use structures from COD in their simulations. An example is shown in one of our tutorials: https://docs.quantumatk.com/tutorials/li_ion_diffusion/li_ion_diffusion.html#import-lifepo4-bulk-structure If relevant and interesting to COD, we could perhaps consider making this a more formal partnership, although I have to consult other people on my side in terms of co-marketing or PR announcements. Let me also point out that QuantumATK is essentially free for academic use in Europe, as it is provided via EuroPractice (https://www.europractice.stfc.ac.uk/tools/synopsys_details.html#QATK) for a cost of only ~?100 per year for a practically unlimited license. We would be delighted if you would consider promoting QuantumATK to your friends and colleagues ? our DFT-LCAO code is particularly useful for large-scale simulations of 1000+ atoms, even with HSE, and recently we have taken big steps towards automating fitting of machine-learned MTP forcefields! 
So, now I just have to figure out if there are some special firewall rules that need to be set for it to work from at least our own internal network.

With thanks, and with hopes for a continued mutually beneficial relationship,
Anders

------------------------------------------------
Anders Blom, Ph.D.
Solutions Engineer, Sr Staff
EDAG, Circuit Design & TCAD Solutions
+1 650-584-5000 (ext 47495)

From: Antanas Vaitkus
Sent: Friday, July 7, 2023 15:33
To: Anders Blom
Cc: cod-bugs at ibt.lt; Bo Lue
Subject: Re: [Cod-bugs] Unable to connect to SQL server

Dear Anders Blom,

I just tried running your query [1] using MySQL/MariaDB from my machine and everything works as expected, so the issue is most likely on your end.

I also attach a file with the results of the query in case they prove useful for you in the short term.

[1] mysql -u cod_reader -h sql.crystallography.net -e "select file from data where ((LOWER(formula) RLIKE ' ag') AND (nel = 1))" cod

On Fri, 7 Jul 2023 at 16:29, Anders Blom wrote:

Hi COD,

I am trying to connect and query the COD database via MySQL, but it is not working. Are you aware of any problems on your end?

I have tried multiple clients, such as pymysql, mysql command line, Oracle MySQLshell and DBeaver GUI. In each case I am able to establish some initial connection, but ultimately it just times out.

[image001.png]

I say "establish connection" in the sense that if I use incorrect server parameters such as the wrong port or hostname, it fails immediately, whereas with the correct COD hostname and port it does wait until the timeout before showing the above error.

If this is a problem with your server, I hope it can be resolved, whereas if you believe the database is functioning properly for queries like this, then I have to dig further on my end (firewalls, perhaps, although I have tried multiple networks as well).

Kind regards,
Anders Blom

------------------------------------------------
Anders Blom, Ph.D.
Solutions Engineer, Sr Staff
EDAG, Circuit Design & TCAD Solutions
M: +1 408 874 5806 | W: +1 650 584 5000 (ext 47495) | anders.blom at synopsys.com
synopsys.com | 675 Almanor Ave, Sunnyvale, CA 94085, USA

--
Antanas Vaitkus,
Vilnius University, Life Sciences Center, Institute of Biotechnology,
room C521, Saulėtekio al. 7, LT-10257 Vilnius, Lithuania

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 10321 bytes
Desc: image001.png
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 34938 bytes
Desc: image002.png
URL:

From grazulis at ibt.lt Sat Jul 8 17:35:44 2023
From: grazulis at ibt.lt (Saulius Gražulis)
Date: Sat, 8 Jul 2023 17:35:44 +0300
Subject: [Cod-bugs] 2997 invalid files in C.O.D.
In-Reply-To:
References:
Message-ID: <5732737c-0521-3043-6482-6d7cb9284ffd@ibt.lt>

Dear David,

thank you for your e-mail and the list of issues that you have provided. The feedback from the COD users, and of course that includes your feedback, is very valuable for us.
We do our best to correct the COD entries if there are errors in them and to make the COD as accurate as possible. In doing so we strictly stick to the definitions of the CIF provided by the IUCr and the best current practices we are aware of in crystallography. Sometimes, however, it is not possible to make all the corrections that our users request. Below, I'll give my comments on the issues you raise.

On 2023-07-05 12:59, David Palmer wrote:
> Dear Colleagues,
>
> I send you a message a few weeks ago about my plans to provide easy phase ID via C.O.D.-hosted structures. I haven't heard back from you, so I assume you have no objections.

I must admit that we have not received your previous mail; it is possible that the e-mail was lost on the way, since we had some mail server failures in our university. In any case, from your current letter I understand that you would like to provide material identification software based on the COD and make it public. If this is so, then we have absolutely no objections to that; in fact the COD exists to make such projects possible! Of course, please advise your users to cite the original publications that produced the data records in the COD if specific records are used, as is customary in scientific practice, and we would appreciate citation and reference of the COD as well, where relevant.

As a side note, we never abbreviate our database as the 'C.O.D.' (with periods); it is usually written as an initialism, 'COD'.

> In the meantime, we have used our automated tools to analyse all current structures files. I am attaching a summary, listing file IDs and errors for 2,997 out of your 0.5M or so files: a relatively-small figure (ca. 0.6%). However, these files are invalid, and cannot be used for structural work, so I would recommend getting them fixed.

Thanks for providing the list of the files that failed processing, we will have a close look into them.
As a note, the term "valid" in the CIF framework has a quite specific meaning: it means that the structure CIFs are valid according to some declared CIF dictionaries. The invalid files may or may not be suitable for structural work, and may or may not be amenable to corrections.

Currently, three levels of checks are performed in the COD, with the following guarantees we provide:

- a syntax check. We guarantee that the CIFs from the COD are conformant to the syntax declared by the IUCr, using our CIF parser [1] and other parsers in the field. This ensures that the COD files can be processed in an automated way. Thus, if you spot a syntactically wrong file, please report it and we will fix that immediately; the file has to be checked against the IUCr CIF grammar.

- a dictionary check. The files that validate against the IUCr dictionaries are using the data elements in the intended way. Though many files in the COD are indeed valid in this sense, a substantial portion of them raise one or several validation issues (we compiled over 11 million validation messages from the current COD collection). We look into them and search for systematic ways to correct the most serious ones, but this is ongoing work and full validity cannot practically be achieved at the moment;

- we do certain COD-specific checks (e.g. checking that all three coordinate data items, _atom_site_fract_{x,y,z}, are present). This is supposed to catch the most obvious mistakes in the data files, but can only be used for improving the COD records if we get hold of the correct original data.

Before we go into more details about the issues you report, let me draw your attention to one feature of the CIF framework that will be important: the CIF files MAY (as in RFC 2119) contain special values '?' and '.' (without the quotes) as values for any data item in the file. The files that contain such values are both syntactically correct and valid in the sense defined above (i.e.
such values validate against CIF dictionaries). The '?' value, as we understand it, denotes that the actual value of the data item is not known (but may become known in the future). The value '.' denotes that the value is not relevant, or does not exist at all. We sometimes use these values to indicate special situations in the COD files; they can also be used as atom coordinate values. Any CIF-compliant software should be prepared to deal with such values.

> The most common errors are:
>
> - missing fractional coordinates

There are several occasions when coordinate values are missing; let me illustrate them from the list that you have provided:

- 2217080: this entry contains '.' as atom coordinates for a serious reason: the structure that was published in a peer-reviewed article appeared to be fake and was retracted. To avoid erroneous calculations, the original coordinate values were replaced by '.', marking them as irrelevant, and the entry is marked as retracted. It is retained in the COD database as a historic record and to prevent its renewed deposition. The exact reasons for retraction are documented in the COD CIF file, and the references to relevant IUCr editorials are given. You may want to filter out retracted entries, either by checking the '_cod_entry_issue_severity' data item or by querying the status in our SQL database:

    mysql -u cod_reader -h sql.crystallography.net cod -e 'select file from data where status not like "%retracted%" or status is NULL'

  There are more flags that you may want to filter out (suboptimal structures, duplicates, structures without coordinates, structures with warnings, etc.); please check our Wiki from the COD Web page for full documentation.

- 1000195: this entry contains '?' as coordinates, indicating that they are unknown.
  Looking at the publication year (1962) I realise that this is a very old publication; we do not have the paper at hand, and it is also likely that the coordinates were not reported for some compounds at that time, only cell parameters. The COD entries of this kind are provided to indicate that the publication existed, and to provide the information currently known (cell parameters, chemical composition, crystal symmetry). This information is already enough for some kinds of computations (e.g. as initial approximations for DFT). If we ever get the original publication and the coordinates are published there, we will insert them in a new revision of this entry. If you have access to the original publication, we would be grateful if you shared it (or the updated CIF ;) with us.

- 5900030: in this entry, the x coordinate has values '.' since these values were not determined in the original publication; while physically the x coordinate is defined for the structure, it is not available from this particular publication (i.e. we have no chance to recover it from published data). Other data values, such as cell constants and the y-z coordinates of the projection, are available and can be used.

> - ambiguous site labelling

I am not quite sure what problem you mean there. One known issue is that some structures do have duplicate atom labels. This is an error, and we will fix it with time. This involves a fair amount of manual checking, however, so I cannot promise that we will do it quickly. For the moment, a possible workaround would be to add a unique suffix to such atom labels during the structure interpretation and then process the structure as usual.

> - invalid element symbols

This is a known issue, especially with atoms from AMCSD that have a custom labelling scheme. Fortunately, the new version of AMCSD has a new consistent atom naming, and we could assign atom types semi-automatically for these entries.
Incidentally, I have just finished the analysis and assignment of atom types to those entries. Please check out the COD revision 285101; it should have most of the atoms with the correct types assigned. As per my checks, only 45 COD entries remain that still have unrecognised atom types (if you take _atom_site_type_label into account, of course). Some of these are indeed unknown atoms, such as metal sites with uncertain identity. Please let us know how this revision scores with your software!

> A common issue is a mismatch between site labels in different data blocks (e.g., a table of anisotropic displacement parameters and a table of fractional coordinates).

Just a bit of nit-picking on terminology: all COD files contain just one data block (it starts with a unique data_... header in each CIF). ADPs and coordinates are usually located in different /loops/ in the same data block.

> We found these errors in numerous files submitted via the *American Mineralogist crystal structures database* (clearly, substantial amounts of U.S. governmental funding failed to prevent basic transcription errors!)

In all fairness, I would say that Bob Downs and his team do a good job collecting all minerals; without the AMCSD contribution, our COD collection of minerals would have been much shabbier. They are constantly improving their collection (I'm constantly in touch with Bob on these matters), and their recent work enabled us to assign atom types with reasonable effort. As for the funding, I'm not sure if they get substantial amounts of it; I am aware of several startup grants they had, and I think they used them as well as they could. This does not mean that matters cannot be improved :), and we are working on that as well.

The discrepancy of the labels in the Uij and xyz loops is a known issue that appeared in the recent update. We are working with Bob to rectify this, but this will take a while.
In the meantime, I have a suggestion of a workaround below:

> Take the following file, 9003355, as an example:-
>
> • Sites SiT1', AlT1' (etc.) are listed in the loop containing Uij
> • The same site are labelling differently (e.g., SiT1*, AlT1*, etc.) in the loop containing xyz
>
> Whilst, to a human, one could make inferences as to how these labels should be related, a computer cannot make such a judgement, thereby rendering these files useless.

I agree that humans can match the labels, and potentially fix them; we have no manpower, however, to go through these lists manually, and even then the manual editing would be error-prone. We could apply a heuristic that an apostrophe ("'") in one loop corresponds to the asterisk ("*") in the other loop and make an automatic correction, but the results would still need to be checked manually (I am reluctant to commit to the COD changes that are based on broad guesses); also, there are some other patterns in place (e.g. the 'OH' vs 'O-H' change in labels).

From the error messages in the log file that you sent us, I have the impression that your program looks for an atom label in the _atom_site_aniso_label (aka Uij) loop, and then tries to find the corresponding _atom_site_label in the coordinate loop. This will fail not only when the labels do not match but also when the atom is not mentioned in the _atom_site_aniso_label loop /at all/. Since not all atoms are refined anisotropically, some of them can legitimately be left out of the Uij loop while still being present in the _atom_site_fract_x loop; such files are perfectly valid and usable.

May I suggest a workaround for the processing of such files: look first in the coordinate loop for the _atom_site_label to identify all atoms, and then look up the anisotropic displacement parameters Uij in the _atom_site_aniso_label loop if they exist. If they do not, it is often possible to use Uiso instead, and I bet this will be a fair approximation even for anisotropically refined atoms.
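The lookup order suggested above could be sketched roughly as follows. This is only an illustration in Python, not code from the COD or from any CIF library: it assumes the two loops have already been parsed into lists of dictionaries keyed by the standard data names, the function name merge_displacement is hypothetical, and the numeric values in the example are made up.

```python
# Sketch only: assumes both CIF loops are already parsed into lists of dicts.
U_KEYS = tuple(
    f"_atom_site_aniso_U_{ij}" for ij in ("11", "22", "33", "12", "13", "23")
)
MISSING = (None, "?", ".")  # CIF special values: unknown / not applicable


def merge_displacement(sites, aniso_rows):
    """Start from the coordinate loop; attach Uij if present, else Uiso."""
    aniso_by_label = {row["_atom_site_aniso_label"]: row for row in aniso_rows}
    merged = []
    for site in sites:
        label = site["_atom_site_label"]
        row = aniso_by_label.get(label)
        u_iso = site.get("_atom_site_U_iso_or_equiv")
        if row is not None:
            adp = ("Uani", tuple(row[key] for key in U_KEYS))
        elif u_iso not in MISSING:
            adp = ("Uiso", u_iso)  # fallback for unmatched or isotropic atoms
        else:
            adp = ("none", None)  # no displacement data available
        merged.append((label, adp))
    return merged


def aniso_row(label, *u_values):
    """Helper for the example: build one row of the Uij loop."""
    row = dict(zip(U_KEYS, u_values))
    row["_atom_site_aniso_label"] = label
    return row


# Made-up values; the labels mimic the mismatch reported for entry 9003355.
sites = [
    {"_atom_site_label": "SiT1*", "_atom_site_U_iso_or_equiv": 0.011},
    {"_atom_site_label": "O1", "_atom_site_U_iso_or_equiv": 0.015},
    {"_atom_site_label": "H1", "_atom_site_U_iso_or_equiv": "?"},
]
aniso = [
    aniso_row("SiT1'", 0.010, 0.012, 0.011, 0.001, 0.000, 0.002),
    aniso_row("O1", 0.014, 0.016, 0.015, 0.000, 0.001, 0.000),
]
result = merge_displacement(sites, aniso)
for label, (kind, value) in result:
    print(label, kind, value)
```

With this ordering an atom is never dropped just because its label is missing from the Uij loop: the mislabelled site falls back to Uiso, and a site with neither value is still kept with its coordinates.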
In this way you will correctly process all correct files and have reasonable approximate data for the files that are currently mislabelled. In the future we will correct the Uij<->xyz label correspondence (our validator detects these mismatches), and you can then recalculate your outputs with the new COD revision, getting more accurate results. I can let you know when such a revision is issued in the COD, but please ping me after some time since I can forget :)

Of course one can also apply the heuristic mentioned above, or skip such entries with mismatches altogether, until the new COD revision is in place.

Hope this clarifies the COD data contents and the way we address the detected problems. Once more, thank you for your report!

> I hope this helps, and do let me know if you have any questions.
>
> With best wishes,
> Yours faithfully,
>
> David Palmer
>
> David C Palmer, Ph.D. (Cantab), M.A. (Cantab),
> Managing Director, CrystalMaker Software Ltd
> Centre for Innovation & Enterprise | Oxford University Begbroke Science Park
> Woodstock Road, Begbroke, Oxfordshire, OX5 1PF, UK

Sincerely yours,
Saulius

References:

[1] Merkys, A.; Vaitkus, A.; Butkus, J.; Okulič-Kazarinas, M.; Kairys, V. & Gražulis, S. COD::CIF::Parser: an error-correcting CIF parser for the Perl language. Journal of Applied Crystallography, 2016, 49, 292-301, DOI: https://doi.org/10.1107/S1600576715022396

--
Dr. Saulius Gražulis
Vilnius University, Life Science Center, Institute of Biotechnology
Saulėtekio al. 7, LT-10257 Vilnius, Lietuva (Lithuania)
phone (office): (+370-5)-2234353, mobile: (+370-684)-49802, (+370-614)-36366

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From grazulis at ibt.lt Sun Jul 9 11:57:43 2023
From: grazulis at ibt.lt (Saulius Gražulis)
Date: Sun, 9 Jul 2023 11:57:43 +0300
Subject: [Cod-bugs] 2997 invalid files in C.O.D.
In-Reply-To:
References:
Message-ID: <0b6f80f8-32f4-eade-6f78-bb9b9bf32c2f@ibt.lt>

Dear David,

please let me highlight one more feature of the COD records which I forgot to include in my letter yesterday:

On 2023-07-05 12:59, David Palmer wrote:
> In the meantime, we have used our automated tools to analyse all current structures files. I am attaching a summary, listing file IDs and errors for 2,997 out of your 0.5M or so files: a relatively-small figure (ca. 0.6%). However, these files are invalid, and cannot be used for structural work, so I would recommend getting them fixed.
>
> The most common errors are:
>
> - missing fractional coordinates

Fractional coordinates can be represented by '.' special values for the x, y and z coordinates in the case of so-called 'dummy' atoms [1]. There are several examples of such COD entries in your list (file "Error Files from COD (2023-07-04).txt"). For example, the coordinate section of the COD 1001614 entry contains the following:

    loop_
    _atom_site_label
    # ... other data names omitted for brevity
    _atom_site_calc_flag
    # ... regular atom sites omitted
    H1 H1+ 4 e . . . 1 0 dum
    H2 H1+ 4 e . . . 0.8 0 dum

Likewise, the COD 1010499 entry contains:

    loop_
    _atom_site_label
    # ... other data names omitted for brevity
    _atom_site_calc_flag
    Hg1 Hg2+ 8 d 0.25 0.21 0.125 1. 0 d
    C1 C2+ 16 ? . . . 1 0 dum
    N1 N3- 16 ? . . . 1 0 dum

The atomic sites are marked as 'dum' in accordance with the IUCr specification [1].
The IUCr does not prescribe any specific interpretation for these lines, but we in the COD use the following conventions:

- an atom with an existing atomic symbol from the periodic system (like the 'H', 'C' or 'N' in the examples above) is considered as existing somewhere in the unit cell, but the coordinates of the atom are not determined. Thus, in the examples above, the unit cell of 1001614 contains 1.8 x 4 extra electrons (and protons) per unit cell, on average, but we do not /know/ where these atoms are located, not even the atom to which the hydrogens are attached. The rest of the data that are specified for these sites are all relevant: we need to take the multiplicity (4 in this case) into account, and the Wyckoff letter tells us that we assume the hydrogens are on general positions. The hydrogens carry a (+1) formal charge. This allows us to check the electric neutrality of the cell, provides corrections for F000 and makes it possible to calculate the chemical formula. Your software may use this information for determining Fcalc if you find it necessary.

  Likewise, the 1010499 entry reports Hg atoms on special positions with specified coordinates, and the remaining "light" atoms C and N with undetermined coordinates. I interpret this record as follows: we know that for mercury cyanide we need to have carbon and nitrogen present (the formula is Hg(CN)2). The structure, however, was determined in 1926 (!), and with the technologies of that time it is very likely that the researchers did not "see" the carbon and the nitrogen positions (getting the Hg positions was already a feat!). Thus, we can calculate the total number of electrons *and* the positions of Hg, but the locations of the lighter atoms need to be approximated or obtained by other means.

  There are no errors in these entries; they faithfully represent the publications that are reported in their metadata and give the knowledge available at that point.
- if a dummy atom has a label/chemical symbol that is /not/ in the periodic system of elements, then we should assume that the site is introduced for convenience purposes only (e.g. to measure distances in some software); these atoms should be excluded from structure factor calculations, even if they have coordinates.

NB: if the hydrogens do not have modelled coordinates but the publication provides clear evidence to which heavy atoms these hydrogens are attached, we indicate this by setting _atom_site_attached_hydrogens of the site to a number greater than 0, and no dummy atoms are used in this case;

NB: unlike some other databases that could set occupancies of dummy hydrogen sites to more than 1 (e.g. set them to 4 to indicate four hydrogen atoms with unknown locations), we never use occupancies larger than 1.0. This makes COD CIFs /valid/ with respect to the current IUCr dictionaries. To specify more than one hydrogen atom, we specify several dummy sites with occupancies <= 1.0 for each such site, as you see in the 1001614 example above. To get the additional electron count you would need to sum up the occupancies of all such sites (times the multiplicity of their positions, of course).

Hope this clarifies the policy of the COD content.

Sincerely yours,
Saulius

Refs.:

[1] IUCr Core dictionary (coreCIF) version 2.4.5, _atom_site_calc_flag (2023) https://www.iucr.org/__data/iucr/cifdic_html/1/cif_core.dic/Iatom_site_calc_flag.html [accessed 2023-07-09T11:24+03:00]

--
Dr. Saulius Gražulis
Vilnius University, Life Science Center, Institute of Biotechnology
Saulėtekio al. 7, LT-10257 Vilnius, Lietuva (Lithuania)
phone (office): (+370-5)-2234353, mobile: (+370-684)-49802, (+370-614)-36366
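As an illustration of the dummy-atom conventions described in this message, the occupancy sum could be computed along the following lines. This is a hedged Python sketch, not COD code: the dictionary keys (type_symbol, multiplicity, occupancy, calc_flag) are an assumed mapping of the columns in the excerpts above, whose data names were omitted there; the function names are hypothetical; and the element set is deliberately abbreviated.

```python
import re

# Abbreviated on purpose; a real tool would list the whole periodic table.
KNOWN_ELEMENTS = {"H", "C", "N", "O", "Hg"}


def element_of(type_symbol):
    """Strip the charge from a type symbol such as 'H1+' or 'Hg2+'."""
    match = re.match(r"[A-Za-z]+", type_symbol)
    return match.group(0).capitalize() if match else None


def dummy_atom_counts(sites):
    """Sum occupancy x multiplicity over 'dum' sites, per element."""
    counts = {}
    for site in sites:
        if site["calc_flag"] != "dum":
            continue  # regular site with coordinates, handled elsewhere
        element = element_of(site["type_symbol"])
        if element not in KNOWN_ELEMENTS:
            continue  # convenience-only site, excluded from calculations
        atoms = float(site["occupancy"]) * int(site["multiplicity"])
        counts[element] = counts.get(element, 0.0) + atoms
    return counts


# Rows transcribed from the COD 1001614 and 1010499 excerpts above.
cod_1001614 = [
    {"type_symbol": "H1+", "multiplicity": "4", "occupancy": "1", "calc_flag": "dum"},
    {"type_symbol": "H1+", "multiplicity": "4", "occupancy": "0.8", "calc_flag": "dum"},
]
cod_1010499 = [
    {"type_symbol": "Hg2+", "multiplicity": "8", "occupancy": "1.", "calc_flag": "d"},
    {"type_symbol": "C2+", "multiplicity": "16", "occupancy": "1", "calc_flag": "dum"},
    {"type_symbol": "N3-", "multiplicity": "16", "occupancy": "1", "calc_flag": "dum"},
]

print(dummy_atom_counts(cod_1001614))  # hydrogen: the 1.8 x 4 atoms per cell
print(dummy_atom_counts(cod_1010499))  # carbon and nitrogen without coordinates
```

Converting these atom counts into electrons additionally needs the atomic numbers and the formal charges carried by the type symbols, which the sketch leaves out.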