[Cod-bugs] COD conversion with HighScore
Saulius Gražulis
grazulis at ibt.lt
Fri Jan 3 21:35:41 EET 2020
Dear Thomas & Thomas,
thank you very much for your e-mail; it is great to hear from you again!
On the occasion, my best wishes for the New Year 2020 from the COD team!
On 2020-01-03 15:30, Thomas Dortmann wrote:
> We (once more) converted the COD latest release from October 2019 with
> our HighScore software.
Great, we will be happy to host the new file, as agreed before.
Answering your questions:
> 1. Naming of water (oxygen) positions as in for example COD entry
> 9015086 as Wat1, Wat2 and so on.
>
> Questions: Is this a standard way of indicating water positions
> in the COD?
>
> Are there other naming conventions for water positions in the COD?
There are no conventions for water atom names in the COD. When
depositing files, we leave the original atom names provided by the
authors; I believe we should not change these names, so that the data
remain traceable back to the original publications.
The 9015086 record comes from the AMCSD, and the WatN is the convention
that AMCSD uses; however, it is not widespread outside AMCSD. Thus,
other COD entries MAY contain different names for water residue atoms,
and you should not rely on atom name Wat to infer whether an atom
belongs to a water.
The COD approach to indicate water positions is the following:
a/ we add _atom_site_type_symbol with the chemical element symbol
(according to the Mendeleev periodic table), "O", for the WatN atoms, so
that we (and software ;) know this is an oxygen;
b/ we add _atom_site_attached_hydrogens with the value "2" for the WatN
sites; this would give summary formula H2O indicating water for these
sites, and would be a correct way to maintain hydrogen balance without
introducing spurious hydrogen sites with unknown coordinates.
BTW, the same rule is applied to ammonium ions, sulphurs, carbons at low
resolution, etc. – any atoms that may contain invisible hydrogens
attached to them.
In this way, the original authors' atom names and their data are not
changed; we just add an additional interpretation to the COD files (and
we will check whether this interpretation is consistent with the
original paper).
The new table for the entry COD 9015086 would look as follows:
> loop_
> _atom_site_type_symbol
> _atom_site_attached_hydrogens
> _atom_site_label
> _atom_site_fract_x
> _atom_site_fract_y
> _atom_site_fract_z
> _atom_site_occupancy
> _atom_site_U_iso_or_equiv
> V 0 V 0.42846 0.42846 0.08140 0.87000 0.04260
> Al 0 Al 0.42846 0.42846 0.08140 0.13000 0.04260
> P 0 P 0.25000 0.50000 0.00000 1.00000 0.04900
> O 0 O1 0.43570 0.30590 0.05150 1.00000 0.04200
> O 1 O-H2 0.42140 0.42140 0.18660 1.00000 0.05000
> O 1 O-H3 0.55610 -0.55610 0.06640 1.00000 0.04300
> Ca 0 Ca 0.65900 -0.65900 0.16050 0.25000 0.27200
> O 2 Wat1 0.65900 -0.65900 0.16050 0.61000 0.27200
> O 2 Wat2 0.29380 0.29380 0.29380 1.00000 0.19400
> O 2 Wat3 0.33610 0.45200 0.33610 0.56000 0.13000
> O 2 Wat4 0.24510 0.49000 0.24510 1.00000 0.22100
> O 2 Wat5 0.34500 0.54200 -0.54200 0.67000 0.43000
> O 2 Wat6 0.30900 0.69100 -0.69100 0.54000 0.44000
> O 2 Wat7 0.29500 0.59600 -0.59600 0.20000 0.15000
(I assume that O-H2 and O-H3 are hydroxyl ions, thus I indicate they
have 1 hydrogen attached to each of them, but we need to check the
original paper).
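A minimal sketch of how software might read this markup, assuming the loop rows have already been extracted as whitespace-separated text in the column order shown above. The function name water_labels is hypothetical, only a few rows of the table are repeated here, and a real converter would use a proper CIF parser instead of splitting lines:

```python
# A few rows copied from the table above; the columns are
# type_symbol, attached_hydrogens, label, x, y, z, occupancy, U.
TABLE = """\
O  0 O1   0.43570  0.30590 0.05150 1.00000 0.04200
O  1 O-H2 0.42140  0.42140 0.18660 1.00000 0.05000
Ca 0 Ca   0.65900 -0.65900 0.16050 0.25000 0.27200
O  2 Wat1 0.65900 -0.65900 0.16050 0.61000 0.27200
O  2 Wat2 0.29380  0.29380 0.29380 1.00000 0.19400
"""


def water_labels(table_text):
    """Return labels of sites marked up as water: oxygen with 2 attached H."""
    labels = []
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        type_symbol, attached_h, label = fields[:3]
        # Drop an optional charge suffix such as "2-" from the type symbol.
        element = type_symbol.rstrip("0123456789+-")
        try:
            n_hydrogens = int(attached_h)
        except ValueError:
            continue  # "." or "?" means the value is not given
        if element == "O" and n_hydrogens == 2:
            labels.append(label)
    return labels


print(water_labels(TABLE))  # ['Wat1', 'Wat2']
```

The decision uses only the two added data items, so it would give the same answer if the author had called the sites something other than WatN.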
Would your software process such markup? I think this is a good,
standard, unambiguous way to indicate waters, without interfering with
the authors' data too much.
(Both '_atom_site_type_symbol' and '_atom_site_attached_hydrogens' are
standard IUCr data names; the COD just adds a convention that
_atom_site_type_symbol SHOULD contain the IUPAC element symbol from the
periodic table, or "D" for deuterium, possibly with the atom charge
attached.)
Currently, we mark up the structures as we process them for ourselves;
if an automated procedure can be devised for spotting all such entries
(surely it can be done), we could add such fixes to all COD structures
that require them, if that would be helpful for you.
There is already a set of structures (8509 COD entries) marked up in
this way, e.g.:
https://www.crystallography.net/cod/9004888.cif
https://www.crystallography.net/cod/9003573.cif
https://www.crystallography.net/cod/9002900.cif
https://www.crystallography.net/cod/9000403.cif
https://www.crystallography.net/cod/9001176.cif
https://www.crystallography.net/cod/9001786.cif
https://www.crystallography.net/cod/9001785.cif
https://www.crystallography.net/cod/9009869.cif
https://www.crystallography.net/cod/9009872.cif
https://www.crystallography.net/cod/9009840.cif
> 2. We check the values of Biso and Baniso, and we also convert
> Baniso-values back to Biso values;
>
> in COD entry 9014636 there are very big Uaniso-values (converted into
> Baniso values > 10), but small Uiso-values?
>
> Questions: does the COD apply a sanity check on the supplied B (or U)
> values, and do you compare anisotropic with isotropic values?
No, we do not check the Uij iso/aniso consistency so far... Thank you
for the error report; such inconsistencies are surely errors and need
to be checked. I think we can relatively easily add this extra check to
our pipeline.
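A rough sketch of what such a check could look like (not the actual COD pipeline code). It compares the reported Uiso with the equivalent isotropic value computed as one third of the trace of Uij, which is exact only for orthogonal cell axes; oblique cells need the full metric-tensor expression. The function names, the 20% tolerance and the numbers in the usage lines are assumptions made for illustration:

```python
def u_eq_orthogonal(u11, u22, u33):
    """Equivalent isotropic U as trace/3 (valid for orthogonal axes only)."""
    return (u11 + u22 + u33) / 3.0


def iso_aniso_consistent(u_iso, u11, u22, u33, rel_tol=0.2):
    """True if Uiso and Ueq agree within rel_tol (an arbitrary tolerance)."""
    u_eq = u_eq_orthogonal(u11, u22, u33)
    return abs(u_iso - u_eq) <= rel_tol * max(abs(u_iso), abs(u_eq))


# Illustrative values only, not taken from any COD entry:
print(iso_aniso_consistent(0.042, 0.040, 0.045, 0.041))  # True
print(iso_aniso_consistent(0.020, 0.400, 0.500, 0.450))  # False
```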
> We can easily give you a list of all COD entries which have (converted)
> B-values > 10, if that helps.
That would be very helpful. I cannot promise that we will fix them soon
if substantial manual work is involved, but we will note the list in our
COD bug list and try to deal with it ASAP.
> I am still waiting for the original literature of these two examples to
> exclude any input errors for the B’s.
Yes, we should double check against the originals, this is very wise. I
suspect some entries may contain scaling errors (B instead of U, or
x10^3 vs x10^4 scale in tables), but we definitely need to check...
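To illustrate the kind of scaling errors meant here, the sketch below uses the standard relation B = 8*pi^2*U and tests which simple hypothesis would bring a suspicious value back under a limit. The limit of 10 (taken as A^2) follows the threshold mentioned above; the function names and the two hypotheses tested are assumptions, and any real fix still has to be checked against the original paper:

```python
import math

B_PER_U = 8 * math.pi ** 2  # B = 8*pi^2 * U, about 78.96


def u_to_b(u):
    """Convert a displacement parameter U to B."""
    return B_PER_U * u


def scaling_hypotheses(u_reported, b_limit=10.0):
    """List simple explanations that would bring B back under b_limit."""
    if u_to_b(u_reported) <= b_limit:
        return []  # value is not suspicious under this criterion
    hypotheses = []
    # Hypothesis 1: the number is already a B value stored under a U name.
    if u_reported <= b_limit:
        hypotheses.append("B instead of U")
    # Hypothesis 2: the table scale is off by 10 (x10^3 vs x10^4).
    if u_to_b(u_reported / 10.0) <= b_limit:
        hypotheses.append("factor of 10")
    return hypotheses


print(scaling_hypotheses(0.05))  # []
print(scaling_hypotheses(5.0))   # ['B instead of U']
```

When both hypotheses fit (for example for a reported value of 0.5), the sketch returns both and leaves the decision to a human reading the source publication.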
> Best regards, and a happy new year 2020 to you!
Many great thanks!
Best,
Saulius
--
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Saulėtekio al. 7
LT-10257 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2234367 / phone (office): (+370-5)-2234353
mobile: (+370-684)-49802, (+370-614)-36366