[Cod-bugs] cif corrections
Saulius Gražulis
grazulis at ibt.lt
Tue Nov 15 17:56:14 EET 2022
Dear William,
thank you very much for your e-mail! Your reports are very valuable,
because they show us what the requirements and priorities of COD users are.
The problems that you have identified are indeed problems with the COD
data, mostly stemming from the original publications. These issues, as
well as some other ones, can be detected by using CIF validation software
[1]. The problem is that there is a large number (> 11 million) of these
validation issues [2], so we cannot fix them on the spot; we are slowly
going through the COD one file at a time and fixing them, quite often
manually. In many cases we need to consult original publications to see
what the authors' intent really was, since the main principle of the COD
is "do not invent data".
It would be great if you could send us your processing logs with all
entries; we would use them as a guide on what types of errors to treat
first, and I would also be interested to see if you catch more errors
than we do. So yes, please send us the remaining entry list if that is
possible. Of course I cannot promise that we will fix them immediately, but
at least we will put them at the top of our priority queue ;). Also, some
problems might be impossible to fix; e.g. the COD entry 2005961 indeed
has a broken _atom_site_aniso list, but the same problem is detected
also in the original supplementary data, and the paper itself does not
contain a Uij list (as far as I could see). In this case we can only mark
the entry as "having problems", and suggest that users use only the
XYZ coordinates and Uiso values instead; or try to contact the authors
and ask them for a correction or for Fobs data so that the Uij values
can be refined anew.
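As an illustration of the kind of label check discussed in this thread
(labels in _atom_site_aniso_label that do not appear in _atom_site_label,
case mismatches, labels such as '5' or '04'), here is a minimal sketch in
Python. It is not the COD validation software: the function name
check_aniso_labels and the heuristic that a label should start with a
letter are assumptions made for this example, and the two label lists are
assumed to have been extracted from the CIF loops already.

```python
import re

# Assumed heuristic: an atom label normally starts with a letter
# (an element symbol), so '5', '04' or '>F40' look suspicious.
STARTS_WITH_LETTER = re.compile(r"^[A-Za-z]")


def check_aniso_labels(site_labels, aniso_labels):
    """Return (label, message) pairs for questionable aniso labels.

    site_labels  -- values of _atom_site_label
    aniso_labels -- values of _atom_site_aniso_label
    """
    issues = []
    known = set(site_labels)
    # Map lower-cased labels back to the original spelling to spot case issues.
    known_lower = {label.lower(): label for label in site_labels}

    for label in aniso_labels:
        if label in known:
            continue  # exact match, nothing to report
        if label.lower() in known_lower:
            issues.append((label, f"case differs from {known_lower[label.lower()]}"))
        elif not STARTS_WITH_LETTER.match(label):
            issues.append((label, "does not start with a letter"))
        else:
            issues.append((label, "not found in _atom_site_label"))
    return issues


if __name__ == "__main__":
    # Made-up labels, loosely modelled on the kinds of problems reported.
    for label, message in check_aniso_labels(
        ["O4", "O5", "F40"], ["O4", "04", "5", "f40", "C1"]
    ):
        print(f"{label}: {message}")
```

A check like this only flags candidates; whether '5' should really be 'O5'
still has to be decided against the original publication.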
Sincerely yours,
Saulius
Refs.:
[1] Vaitkus, A. Validation messages for the COD entry 1506432. URL:
https://sql.crystallography.net/db/cod_validation/validation_issue?offset=0&rows=100&filter=%28cod_id%20%3D%20%221506432%22%29
[2] Vaitkus, A. COD validation issue database. 2021, URL:
https://sql.crystallography.net/db/cod_validation [accessed
2022-11-15T17:39+02:00]. NOTE: the page is slow to load, please be patient!
On 2022-11-07 20:47, William Lenthe wrote:
>
> I read through the first 100k entries of the COD to test some cif
> parsing code and believe I found a few errors
>
> 1506432: line 186 >F40 should be F40
>
> 1506503: line 200 >F1' should be F1'
>
> 2005854: lines 160-162 are human readable but non-standard (should be
> split into 2 lines each or maybe the more common As1/P1)
>
> 2005923: line 176 _atom_site_aniso_label is '5' (maybe it should be O5?)
>
> 2005926: line 179 label is 04 instead of O4 (number zero instead of
> letter O)
>
> 2005961: _atom_site_aniso_label loop appears to be malformed
>
> I generated another ~75 warnings if useful, they are mostly issues
> with case consistency or atoms listed in _atom_site_aniso_label that
> don’t appear in _atom_site_label
>
--
Dr. Saulius Gražulis
Vilnius University, Life Science Center, Institute of Biotechnology
Saulėtekio al. 7, LT-10257 Vilnius, Lietuva (Lithuania)
phone (office): (+370-5)-2234353, mobile: (+370-684)-49802, (+370-614)-36366