[Cod-bugs] cif corrections

Saulius Gražulis grazulis at ibt.lt
Tue Nov 15 17:56:14 EET 2022


Dear William,

thank you very much for your e-mail! Your reports are very valuable, 
because they show us what the requirements and priorities of COD users are.

The problems that you have identified are indeed problems with the COD 
data, mostly stemming from the original publications. These issues, as 
well as some other ones, can be detected by using CIF validation software
[1]. The problem is that there is a large number (> 11 million) of these
validation issues [2], so we cannot fix them on the spot; we are slowly
going through the COD one file at a time and fixing them, quite often 
manually. In many cases we need to consult original publications to see 
what the authors' intent really was, since the main principle of the COD 
is "do not invent data".

It would be great if you could send us your processing logs with all
entries; we would use them as a guide on what types of errors to treat
first, and also I would be interested to see if you catch more errors
than we do. So yes, please send us the remaining entry list if that is
possible. Of course I cannot promise that we will fix them immediately,
but at least we will put them at the top of our priority queue ;). Also, some
problems might be impossible to fix; e.g. the COD entry 2005961 indeed 
has a broken _atom_site_aniso list, but the same problem is detected 
also in the original supplementary data, and the paper itself does not 
contain a Uij list (as far as I could see). In this case we can only mark
the entry as "having problems", and suggest that users use only the
XYZ coordinates and Uiso values instead; or try to contact the authors
and ask them for a correction or for Fobs data so that the Uij values
can be refined anew.

Sincerely yours,
Saulius

Refs.:

[1] Vaitkus, A. Validation messages for the COD entry 1506432. URL:
https://sql.crystallography.net/db/cod_validation/validation_issue?offset=0&rows=100&filter=%28cod_id%20%3D%20%221506432%22%29

[2] Vaitkus, A. COD validation issue database. 2021, URL: 
https://sql.crystallography.net/db/cod_validation [accessed 
2022-11-15T17:39+02:00]. NOTE: the page is slow to load, please be patient!

On 2022-11-07 20:47, William Lenthe wrote:
>
> I read through the first 100k entries of the COD to test some cif 
> parsing code and believe I found a few errors
>
> 1506432: line 186 >F40 should be F40
>
> 1506503: line 200 >F1' should be F1'
>
> 2005854: lines 160-162 are human readable but non-standard (should be 
> split into 2 lines each or maybe the more common As1/P1)
>
> 2005923: line 176 _atom_site_aniso_label is '5' (maybe it should be O5?)
>
> 2005926: line 179 label is 04 instead of O4 (number zero instead of 
> letter O)
>
> 2005961: _atom_site_aniso_label loop appears to be malformed
>
> I generated another ~75 warnings if useful, they are mostly issues 
> with case consistency or atoms listed in _atom_site_aniso_label that 
> don’t appear in _atom_site_label
>
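
The label cross-check mentioned in the quoted report (labels in
_atom_site_aniso_label that do not appear in _atom_site_label, and
labels that differ only in case) could be sketched roughly as below.
This is only an illustration, not the code used by William or by the
COD validator: the function name check_aniso_labels is hypothetical,
and the two label lists are assumed to have been extracted from the
CIF loops by some parser beforehand.

```python
def check_aniso_labels(site_labels, aniso_labels):
    """Compare aniso-loop labels against atom-site labels.

    Hypothetical helper: both arguments are plain lists of strings that
    are assumed to come from an already parsed CIF file.
    Returns (missing, case_mismatch), where `missing` lists aniso labels
    with no counterpart at all, and `case_mismatch` lists pairs
    (aniso_label, site_label) that differ only in letter case.
    """
    known = set(site_labels)

    # Map lower-cased label -> first original spelling seen.
    by_lower = {}
    for label in site_labels:
        by_lower.setdefault(label.lower(), label)

    missing = []
    case_mismatch = []
    for label in aniso_labels:
        if label in known:
            continue  # exact match, nothing to report
        match = by_lower.get(label.lower())
        if match is not None:
            case_mismatch.append((label, match))
        else:
            missing.append(label)
    return missing, case_mismatch


if __name__ == "__main__":
    # Illustrative labels only, modelled on the kinds of issues reported
    # above (digit zero instead of letter O, a bare number as a label).
    site = ["O4", "O5", "F40"]
    aniso = ["04", "O5", "f40", "5"]
    missing, case_mismatch = check_aniso_labels(site, aniso)
    print("missing:", missing)
    print("case mismatch:", case_mismatch)
```

A check like this only flags suspicious labels; deciding whether "5"
was meant to be "O5" still requires looking at the original publication.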

-- 
Dr. Saulius Gražulis
Vilnius University, Life Science Center, Institute of Biotechnology
Saulėtekio al. 7, LT-10257 Vilnius, Lietuva (Lithuania)
phone (office): (+370-5)-2234353, mobile: (+370-684)-49802, (+370-614)-36366
