From grazulis at ibt.lt Thu Apr 12 14:30:49 2018
From: grazulis at ibt.lt (Saulius Gražulis)
Date: Thu, 12 Apr 2018 14:30:49 +0300
Subject: [Cod-bugs] Possible COD errors
In-Reply-To: <8c22ffda52e24890af3f8b8a182b5572@stfc.ac.uk>
References: <8c22ffda52e24890af3f8b8a182b5572@stfc.ac.uk>
Message-ID:

Dear Robert,

On 2018-04-12 13:26, Robert McMeeking - UKRI STFC wrote:
> I have been doing some investigation into problems encountered when
> processing COD CIF for the CrystalWorks system at Daresbury.
>
> Some entries have some incorrect coordinates - in most cases it is
> possible that decimal points have been lost. In one case cell
> dimensions are missing. In other cases my symmetry operator parsing
> fails because there is more than one symmetry operator per line
> with the "loop_" format. I am not fully clear what the standard
> is for that!
>
> Also, some entries have no coordinates present, but there is no
> "no coordinates" flag in the MySQL database. I can send examples if
> that would be helpful.
>
> I append some error examples.

Thank you very much for your bug report, we'll have a look at how this can be fixed.

If you have an exhaustive error list in machine-readable form (something like processing logs from your programs), this could be very helpful.

Cheers,
Saulius

-- 
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Saulėtekio al. 7
LT-10257 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2234367 / phone (office): (+370-5)-2234353
mobile: (+370-684)-49802, (+370-614)-36366

From robert.mcmeeking at stfc.ac.uk Thu Apr 12 15:51:08 2018
From: robert.mcmeeking at stfc.ac.uk (Robert McMeeking - UKRI STFC)
Date: Thu, 12 Apr 2018 12:51:08 +0000
Subject: [Cod-bugs] Possible COD errors
In-Reply-To:
References: <8c22ffda52e24890af3f8b8a182b5572@stfc.ac.uk>
Message-ID: <80885af6ae5c458987a08cc5d50d954f@stfc.ac.uk>

Hi Saulius

Meaningful logs are a bit of a problem!
I encountered the problems when processing the CIF files to generate atom-centred coordinate environment search data for use in our CrystalWorks system. The various examples crashed the processing software I have been using. I needed to do a bit of improvised detective work to home in on the specific CIFs and find out what caused the programs to crash.

The plan is to process new/modified CIFs each night, so tracking new errors as they arrive should (hopefully) be manageable.

In the meantime I append a couple of extra entries I omitted from the previous set.

Also, would a list be useful of those entries which are without coordinates and which I believe are not flagged in the MySQL file?

Best Regards
Bob

================================================================================================

9013798
N2 ********** 0.45190 1.03750 0.96000 0.04830
Pb2 ********** 0.45190 1.03750 0.02000 0.04830
. .
H21 ********** 0.45040 1.14880 1.00000 0.13000

9014271
H(51) 0.83800 0.08900 ********** 1.00000 0.04700

================================================================================================

From grazulis at ibt.lt Thu Apr 12 17:17:58 2018
From: grazulis at ibt.lt (Saulius Gražulis)
Date: Thu, 12 Apr 2018 17:17:58 +0300
Subject: [Cod-bugs] Possible COD errors
In-Reply-To: <8c22ffda52e24890af3f8b8a182b5572@stfc.ac.uk>
References: <8c22ffda52e24890af3f8b8a182b5572@stfc.ac.uk>
Message-ID:

Hi, Bob,

I've looked through your sample. Below are my thoughts.

On 2018-04-12 13:26, Robert McMeeking - UKRI STFC wrote:
> 2000430
> Coordinate error
> H(18B) 3956(9) .328(2) .0914(12) .049(7)
> and possibly others

The typo is in the original file (http://scripts.iucr.org/cgi-bin/sendcif?al0527sup1); we need to look at the original paper or contact the authors to fix that.

> 1517953
> Coordinate error
> H41 H -003915(69) 0.1170(37) 0.3539(45) 0.033(5)
> and possibly others

Deposited by our colleagues; we'll contact them to see if they have correct files or can update the coordinates. But since the file was from a published paper, most probably the error is upstream.

> 2100824
> Coordinate error
> Cl(76) 1109(8) .7910(10) .718(2) .078(6)
> and possibly others

The typo is in the original (http://scripts.iucr.org/cgi-bin/sendcif?as0591sup1); we need to look at the original paper or contact the authors to fix that.

> 2104629
> No Cell data

The cell data is missing from the original https://doi.org/10.1107/S0108768109032728/hw5005sup1.cif; it is a hybrid method structure for which probably the cell is not well defined anyway.
I must admit, however, that other data sets from the same paper are very buggy (some *.txt files do not even compile as CIFs), so I doubt it is possible to make use of that data...

> 4020454
> loop_
> _symmetry_equiv_pos_as_xyz
> +x,+y,+z 1/2-x,-y,1/2+z 1/2+x,1/2-y,-z -x,1/2+y,1/2-z
>
> 4100182
> loop_
> _symmetry_equiv_pos_as_xyz
> +x,+y,+z +x,-y,-z -x,-y,+z -x,+y,-z +y,-x,-z -y,-x,+z -y,+x,-z +y,+x,+z
> etc.

These two seem OK to me...

> 4342694
> loop_
> _space_group_symop_id
> _space_group_symop_operation_xyz
> x,y,z -x,-y,-z

Here, symop IDs are missing; it seems they were not inserted during the data curation. Fixed in COD rev. 207319.

> 8103028
> loop_
> _symmetry_equiv_pos_as_xyz
> x,y,z -x,y,z x,-y,z -x,-y,z

This is a totally messed up CIF, with ';' text field marks placed incorrectly. We need to get to the authors to fix that, and even then it's rather hopeless, I guess...

In summary: 1 file fixed, 2 files seem OK (need a more specific bug report), and all others have upstream errors with varying degrees of possibility to correct them.

Cheers,
Saulius

From grazulis at ibt.lt Thu Apr 12 17:20:55 2018
From: grazulis at ibt.lt (Saulius Gražulis)
Date: Thu, 12 Apr 2018 17:20:55 +0300
Subject: [Cod-bugs] Possible COD errors
In-Reply-To: <80885af6ae5c458987a08cc5d50d954f@stfc.ac.uk>
References: <8c22ffda52e24890af3f8b8a182b5572@stfc.ac.uk> <80885af6ae5c458987a08cc5d50d954f@stfc.ac.uk>
Message-ID: <04fc5b4c-ce50-e495-777f-bf92dc574343@ibt.lt>

Hi, Bob,

On 2018-04-12 15:51, Robert McMeeking - UKRI STFC wrote:
> Bob
> ================================================================================================
>
> 9013798
> N2 ********** 0.45190 1.03750 0.96000 0.04830
> Pb2 ********** 0.45190 1.03750 0.02000 0.04830

OK, this is a Fortran "feature" from the upstream (AMCSD): overflowed fields are replaced by all stars by the Fortran runtime library :/. We need the original PDF or CIF file (we will hardly get the CIF from 'The Canadian Mineralogist'). Maybe you have access to it?

Regards,
Saulius
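
The two kinds of problem discussed in this thread can be illustrated with a small check: coordinate fields that are implausibly large (3956(9), -003915(69), 1109(8)) or printed as asterisks, and loops with several symmetry operators on one line. The Python sketch below is not part of COD or CrystalWorks. The function names classify_coordinate and split_symops and the plausibility bound of 2.0 are assumptions made for illustration only. The splitting relies on the CIF rule that loop values are separated by whitespace, so several unquoted operators may share a line; shlex follows shell quoting rules, which only approximate CIF quoting.

```python
import re
import shlex

# Plain decimal number with an optional standard uncertainty in parentheses,
# e.g. "0.1170(37)", ".328(2)" or "-0.25".
NUMBER_RE = re.compile(r"^([+-]?(?:\d+\.?\d*|\.\d+))(?:\(\d+\))?$")


def classify_coordinate(token, limit=2.0):
    """Return 'ok', 'overflow', 'out_of_range' or 'unparsable' for one field.

    `limit` is an assumed plausibility bound on the absolute value of a
    fractional coordinate; it is a heuristic, not a CIF or COD rule.
    """
    if token in (".", "?"):
        return "ok"  # CIF placeholders for inapplicable or unknown values
    if token and set(token) == {"*"}:
        return "overflow"  # field filled with asterisks by Fortran output
    match = NUMBER_RE.match(token)
    if match is None:
        return "unparsable"
    value = float(match.group(1))
    return "ok" if abs(value) <= limit else "out_of_range"


def split_symops(line):
    """Split one line of a symmetry-operator loop into separate values.

    Values are whitespace-separated; quoted values such as 'x, y, z'
    are kept together (shell-style quoting, an approximation of CIF).
    """
    return shlex.split(line)


if __name__ == "__main__":
    for field in ["3956(9)", ".328(2)", "-003915(69)", "**********"]:
        print(field, classify_coordinate(field))
    print(split_symops("x,y,z -x,y,z x,-y,z -x,-y,z"))
```

A nightly job could run such a check over the atom site loops of new or modified entries and write the entry number, atom label and classification to a log, which would give the machine-readable error list asked for above.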