From pramodkumar.rs.phy21 at itbhu.ac.in Tue Nov 22 10:46:34 2022
From: pramodkumar.rs.phy21 at itbhu.ac.in (Pramod Kumar Res. Scholar, Physics, IIT(BHU))
Date: Tue, 22 Nov 2022 14:16:34 +0530
Subject: [Cod-bugs] difficult to find standard XRD data
Message-ID:

Sir/Madam,

I am Pramod Kumar, a Ph.D. student at IIT BHU, Varanasi, India. I am finding it difficult to find standard data for the double perovskite Y2NiMnO6. Please let me know if data for this material are not available in the COD database.

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From grazulis at ibt.lt Tue Nov 22 10:47:14 2022
From: grazulis at ibt.lt (Saulius Gražulis)
Date: Tue, 22 Nov 2022 10:47:14 +0200
Subject: [Cod-bugs] cif corrections
In-Reply-To:
References: <3769e404-73a3-4b73-5e44-8ffd7966ec4a@ibt.lt> <727e4d6b-76d0-3739-0019-d377bf843736@ibt.lt>
Message-ID:

Dear William,

thank you for the answer!

On 2022-11-19 01:54, William Lenthe wrote:
> Thanks for your detailed response, I had a few logic / parsing errors in my code that I was able to get cleaned up (not ignoring leading whitespace, handling more than 1 loop row per line, and incorrect handling of loops ending in a comment line).

Glad to hear you are working on the development of your parser!

> Upon closer inspection my remaining syntax issues are from fields taking the form:
>
> _cif_tag ;value\n
>
> I treat these as the start of a multiline delimited value, e.g. in 7223602:
>
> _computing_structure_solution ;SHELXS-86'
> _diffrn_ambient_temperature 100(2)
> _diffrn_detector_area_resol_mean 28.5714
> _diffrn_measured_fraction_theta_full 0.982
> _diffrn_measured_fraction_theta_max 0.965
> _diffrn_measurement_device_type
> ;
> Rigaku Kappa 3 circle diffractometer with Saturn 724+ detector.
> ;
> _diffrn_measurement_method 'profile data from \w-scans'
>
> I treat all the _diffrn_[] lines as part of the string starting SHELXS-86'\n and then "Rigaku Kappa..." is seen as an incorrect key since it doesn't start with _.

I see. Well, this behaviour of the parser does not conform to the CIF syntax [1]. I would recommend against using it.

> My reading of the cif specification led me to believe that ; are only treated as delimiters if they are the first character of the line,

Indeed, the ';' tokens that delimit multi-line text fields MUST (as in RFC 2119) be in the first position of a line. So the specification-compliant interpretation of the above fragment would be to treat the ;SHELXS-86' token as an unquoted string :/; our COD parser does exactly that, and so do all other parsers that I have seen (PyCifRw, vcif, etc.). This would result in correct parsing of the 7223602 COD entry.

> ... but when I was strict, I had issues with cifs that contained fields like:
>
> _cif_tag ;
> Multi
> Line
> Field
> ;

This is an erroneous CIF, and a correct CIF parser MUST reject it. The first semicolon after the _cif_tag does NOT open a text field, so the second semicolon at the beginning of the line remains unpaired. A multi-line text field is only started and terminated by a semicolon in the very first position of a line [1]. This is what our parser reports:

> saulius@tasmanijos-velnias collection/ $ cat | cifparse
> data_x
> _cif_tag ;
> Multi
> Line
> Field
> ;
> cifparse: -(6) data_x: ERROR, end of file encountered while in text
> field starting in line 6, possible runaway closing semicolon (';')
> cifparse: -(3,1) data_x: ERROR, incorrect CIF syntax:
>  Multi
>  ^
> cifparse: file '-' FAILED

The COD does not contain such CIFs; all our CIFs pass the syntax checks. But in the wild there might be such broken CIFs, even as supplementary materials for reputable chemistry papers...
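
The column-1 rule discussed above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the COD parser or any existing library: the names `tokenize`, `TOKEN`, `_quoted` and `CifSyntaxError` are hypothetical, the sketch works line by line, and it ignores the rest of the CIF 1.1 grammar (save frames, reserved words, line-length limits, checking how many values follow a tag).

```python
import re


class CifSyntaxError(Exception):
    """Raised by this sketch when the input cannot be tokenized."""


def _quoted(q):
    # In CIF 1.1 a quote closes the string only when whitespace (or the end
    # of the line) follows it, so an embedded quote such as a'b is allowed.
    return rf"{q}(?:[^{q}]|{q}(?=\S))*{q}"


# A token is a quoted string or a run of non-blank characters.
TOKEN = re.compile("|".join([_quoted("'"), _quoted('"'), r"\S+"]))


def tokenize(text):
    """Yield (kind, value) pairs, kind being 'keyword', 'tag' or 'value'."""
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(";"):
            # Only a semicolon in column 1 opens a text field ...
            first = i + 1  # 1-based line number, for the error message
            body = [line[1:]]
            i += 1
            # ... and only a semicolon in column 1 closes it.
            while i < len(lines) and not lines[i].startswith(";"):
                body.append(lines[i])
                i += 1
            if i == len(lines):
                raise CifSyntaxError(
                    f"end of file inside text field starting in line {first}"
                )
            yield ("value", "\n".join(body))
            line = lines[i][1:]  # tokenize whatever follows the closing ';'
        for match in TOKEN.finditer(line):
            tok = match.group()
            if tok.startswith("#"):
                break  # comment: ignore the rest of the line
            if tok.startswith("_"):
                yield ("tag", tok)
            elif tok.lower() == "loop_" or tok.lower().startswith("data_"):
                yield ("keyword", tok)
            elif len(tok) > 1 and tok[0] in "'\"" and tok[-1] == tok[0]:
                yield ("value", tok[1:-1])
            else:
                # Includes ";SHELXS-86'": its ';' is not in column 1.
                yield ("value", tok)
        i += 1


if __name__ == "__main__":
    fragment = r"""data_7223602
_computing_structure_solution ;SHELXS-86'
_diffrn_measurement_device_type
;
Rigaku Kappa 3 circle diffractometer with Saturn 724+ detector.
;
_diffrn_measurement_method 'profile data from \w-scans'
"""
    for kind, value in tokenize(fragment):
        print(kind, repr(value))
```

With this approach the 7223602 fragment tokenizes cleanly, with ;SHELXS-86' kept as an ordinary (if unintended) value, while the "_cif_tag ;" example fails with an unterminated text field, in the same spirit as the first cifparse message quoted above. The second cifparse error (the stray "Multi" value) belongs to the grammar level and is not detected by this tokenizer-only sketch.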
One can apply various "correction heuristics" in such cases; for example, one could assume that a lone semicolon at the end of a line should actually be preceded by a new line. But this is a non-canonical extension of the CIF syntax.

I must note that some variant of this mistake /does/ parse correctly:

> data_x
> loop_
> _cif_tag ;
> Multi
> Line
> Field ;
>

Note that in this case /neither/ semicolon is in the first column, so both are interpreted as unquoted strings; and there is a loop_ before the CIF tag, therefore all five unquoted strings (;, Multi, Line, Field, ;) end up as values of the '_cif_tag' data item. I see no way of correcting this automatically; maybe by applying an optional heuristic that lone semicolons should be moved to new lines. The same situation was detected by your software in the entry 4301644, and I fixed it manually in the entries 4301644 and 4301643 (both from the same paper). The original files were syntactically correct but did not convey the intended information.

> So I loosened my parser to allow it.

I would recommend against doing so, because you now reject syntactically correct CIFs and risk losing data. I would only use such an interpretation as a deliberate, optional error correction and recovery step (our parser corrects some of the common errors from supplementary materials, but not this one, unfortunately...).

> I also have seen cifs that use:
>
> _cif_tag ;value that should probably be delimited with quotes;

This is a tag followed by a bunch of unquoted strings. Outside a loop_ this is an error; inside a loop_ it is valid if the number of data values is divisible by the number of data names following the loop_.

> Unfortunately, there isn't an unambiguous way to support all 3 cases. Do you understand any/all of these to be allowable?
IMHO the variants like "_cif_tag ;value that should probably be delimited with quotes;" or "_cif_tag ;" are errors and should be rejected, or parsed in accordance with the current CIF grammar. It is probable that sometimes CIF authors just guess what the CIF should look like without consulting the formal grammar, and come up with texts that are not correct (I was guilty of this as well a long time ago ;). The only way to deal with such CIFs, IMHO, is to find out the authors' actual intentions and to fix the file syntax in accordance with the grammar, manually or semi-automatically.

> The following cifs may have some technically correct but unintended values that were generating obtuse errors as a result:
>
> 7223602: _computing_structure_solution ;SHELXS-86'

Indeed, this is technically correct but with a strange (most probably unintended) value of the software name. Can be fixed manually.

> 7228312: _diffrn_measurement_device_type ;Nonius

Again, this is correct but probably unintended. Can be fixed manually.

> 7238658: _exptl_absorpt_correction_type ;multi-scan'

This is syntactically correct but fails validation against the IUCr dictionaries:

> /usr/bin/cif_validate:
> /home/saulius/struct/cod/cif/7/23/86/7238658.cif data_7238658: NOTE,
> data item '_diffrn_detector_area_resol_mean' value '0.15 mm' violates
> type constraints -- the value should be a numerically interpretable
> string, e.g. '42', '42.00', '4200E-2'.
> /usr/bin/cif_validate:
> /home/saulius/struct/cod/cif/7/23/86/7238658.cif data_7238658: NOTE,
> data item '_exptl_absorpt_correction_type' value ';multi-scan'' must
> be one of the enumeration values [analytical, cylinder, empirical,
> gaussian, integration, multi-scan, none, numerical, psi-scan, refdelf,
> sphere].

Can be fixed manually or semi-automatically (we can add a regexp to our data checker if this bug is encountered often enough; but it is probably a one-of-a-kind error...).

Regards,
Saulius

Refs.:

[1] IUCr. CIF v1.1 File Syntax.
URL: https://www.iucr.org/resources/cif/spec/version1.1/cifsyntax#gram [accessed 2022-11-18T18:23+02:00].

--
Dr. Saulius Gražulis
Vilnius University, Life Science Center, Institute of Biotechnology
Saulėtekio al. 7, LT-10257 Vilnius, Lietuva (Lithuania)
phone (office): (+370-5)-2234353, mobile: (+370-684)-49802, (+370-614)-36366

From rodrigo.garcia at tervalis.com Wed Nov 23 19:51:55 2022
From: rodrigo.garcia at tervalis.com (Rodrigo Garcia)
Date: Wed, 23 Nov 2022 18:51:55 +0100
Subject: [Cod-bugs] No atoms in entry entry 6000077
Message-ID:

Dear Sir,

Perhaps there is an issue with entry 6000077, atoms seem to be lacking in the CIF file.
For your convenience, attached you will find the original publication.

Rodrigo García Cantera
Spain

-------------- next part --------------
A non-text attachment was scrubbed...
Name: trommer2000.pdf
Type: application/pdf
Size: 360748 bytes
Desc: not available

From grazulis at ibt.lt Thu Nov 24 11:32:55 2022
From: grazulis at ibt.lt (Saulius Gražulis)
Date: Thu, 24 Nov 2022 11:32:55 +0200
Subject: [Cod-bugs] No atoms in entry entry 6000077
In-Reply-To:
References:
Message-ID: <022d1142-7c0c-8347-2253-2e695c17d45a@ibt.lt>

Dear Rodrigo,

thank you very much for your e-mail! Indeed, there are certain entries in the COD (1529 entries) that do not have coordinates, most often because we could not get the original data.

The reference you have sent us is very helpful.
I have entered the missing coordinates to the entry 6000077, and also updated symmetry information and added structure quality indicators (R_I, R_exp, R_wp).

I would be grateful if you double-check the new COD record [1], since there might be some errors in the OCR'ed data.

Sincerely yours,
Saulius

Refs.:

[1] http://www.crystallography.net/cod/6000077.html [accessed 2022-11-24T11:30+02:00]

On 2022-11-23 19:51, Rodrigo Garcia wrote:
> Dear Sir,
>
> Perhaps there is an issue with entry 6000077, atoms seem to be lacking
> in the CIF file.
> For your convenience, attached you will find the original publication.
>
> Rodrigo García Cantera
> Spain
>
> _______________________________________________
> Cod-bugs mailing list
> Cod-bugs at lists.crystallography.net
> http://lists.crystallography.net/cgi-bin/mailman/listinfo/cod-bugs

--
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Saulėtekio al. 7
LT-10257 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2234367 / phone (office): (+370-5)-2234353
mobile: (+370-684)-49802, (+370-614)-36366

From rodrigo.garcia at tervalis.com Thu Nov 24 11:37:33 2022
From: rodrigo.garcia at tervalis.com (Rodrigo Garcia)
Date: Thu, 24 Nov 2022 10:37:33 +0100
Subject: [Cod-bugs] No atoms in entry entry 6000077
In-Reply-To: <022d1142-7c0c-8347-2253-2e695c17d45a@ibt.lt>
References: <022d1142-7c0c-8347-2253-2e695c17d45a@ibt.lt>
Message-ID:

Dear Saulius,

Thank you very much for your email and the invaluable work you perform for the global scientific community.
I got the original publication free through the web repository libgen.is, which you most probably know. I will check the file, and in case there is any problem I'll let you know.

Best regards,
Rodrigo

On Thu, 24 Nov 2022 at 10:33, Saulius Gražulis wrote:

> Dear Rodrigo,
>
> thank you very much for your e-mail! Indeed, there are certain entries in
> the COD (1529 entries) that do not have coordinates, most often because we
> could not get the original data.
>
> The reference you have sent us is very helpful. I have entered the missing
> coordinates to the entry 6000077, and also updated symmetry information and
> added structure quality indicators (R_I, R_exp, R_wp).
>
> I would be grateful if you double-check the new COD record [1], since
> there might be some errors in the OCR'ed data.
>
> Sincerely yours,
> Saulius
>
> Refs.:
>
> [1] http://www.crystallography.net/cod/6000077.html [accessed
> 2022-11-24T11:30+02:00]