[Cod-bugs] COD conversion with HighScore

Thomas Degen thomas.degen at panalytical.com
Fri Jan 3 22:00:44 EET 2020


Dear Saulius,

Thank you for the swift answer, and a Happy New Year 2020 to you too!

We do indeed process "_atom_site_type_symbol", but this information is missing from many COD entries.
It would be great if the chemical element (type symbol) were supplied unambiguously for each atom; we would very much appreciate this.
Only then can we generate a correct diffraction pattern from the atomic coordinates.

Concerning the list of patterns with unusually large isotropic/anisotropic displacement parameters:
we will look at some of them in more detail and generate a list of all suspicious entries (those with B values > 10).
Note that structures with wrong or too large displacement parameters produce badly wrong quantitative results
when used in a Rietveld-type (QPA) refinement.

Best regards,

Thomas

-----Original Message-----
From: Saulius Gražulis <grazulis at ibt.lt>
Sent: Friday, January 3, 2020 8:36 PM
To: Thomas Dortmann <thomas.dortmann at panalytical.com>; cod-bugs at ibt.lt
Cc: Thomas Degen <thomas.degen at panalytical.com>
Subject: Re: [Cod-bugs] COD conversion with HighScore

Dear Thomas & Thomas,

thank you very much for your e-mail, it is great to hear from you again!
On the occasion, my best wishes for the New Year 2020 from the COD team!

On 2020-01-03 15:30, Thomas Dortmann wrote:
> We (once more) converted the COD latest release from October 2019 with
> our HighScore software.

Great, we will be happy to host the new file, as agreed before.

Answering your questions:

>  1. Naming of water (oxygen) positions as in for example COD entry
>     9015086 as Wat1, Wat2 and so on.
>
> Questions:          Is this a standard way of indicating water
> positions in the COD?
>
> Are there other naming conventions for water positions in the COD?

There are no conventions for water atom names in the COD. When depositing files, we leave the original atom names provided by the author; I believe we should not change these names, so that the data remain traceable back to the original publications.

The 9015086 record comes from the AMCSD, and the WatN is the convention that AMCSD uses; however, it is not widespread outside AMCSD. Thus, other COD entries MAY contain different names for water residue atoms, and you should not rely on atom name Wat to infer whether an atom belongs to a water.

The COD approach to indicate water positions is the following:

a/ we add _atom_site_type_symbol with the chemical element symbol (according to the Mendeleev periodic table), "O", for the WatN atoms, so that we (and software ;) know this is an oxygen;

b/ we add _atom_site_attached_hydrogens with the value "2" for the WatN sites; this would give summary formula H2O indicating water for these sites, and would be a correct way to maintain hydrogen balance without introducing spurious hydrogen sites with unknown coordinates.

BTW, the same rule is applied to ammonium ions, sulphurs, carbons at low resolution, etc., i.e. to any atoms that may have invisible hydrogens attached to them.

In this way, the original authors' atom names and their data are not changed; we just add an extra layer of interpretation to the COD files (and we will check that this interpretation is consistent with the original paper).

The new table for the entry COD 9015086 would look as follows:

> loop_
> _atom_site_type_symbol
> _atom_site_attached_hydrogens
> _atom_site_label
> _atom_site_fract_x
> _atom_site_fract_y
> _atom_site_fract_z
> _atom_site_occupancy
> _atom_site_U_iso_or_equiv
> V  0 V    0.42846  0.42846  0.08140 0.87000 0.04260
> Al 0 Al   0.42846  0.42846  0.08140 0.13000 0.04260
> P  0 P    0.25000  0.50000  0.00000 1.00000 0.04900
> O  0 O1   0.43570  0.30590  0.05150 1.00000 0.04200
> O  1 O-H2 0.42140  0.42140  0.18660 1.00000 0.05000
> O  1 O-H3 0.55610 -0.55610  0.06640 1.00000 0.04300
> Ca 0 Ca   0.65900 -0.65900  0.16050 0.25000 0.27200
> O  2 Wat1 0.65900 -0.65900  0.16050 0.61000 0.27200
> O  2 Wat2 0.29380  0.29380  0.29380 1.00000 0.19400
> O  2 Wat3 0.33610  0.45200  0.33610 0.56000 0.13000
> O  2 Wat4 0.24510  0.49000  0.24510 1.00000 0.22100
> O  2 Wat5 0.34500  0.54200 -0.54200 0.67000 0.43000
> O  2 Wat6 0.30900  0.69100 -0.69100 0.54000 0.44000
> O  2 Wat7 0.29500  0.59600 -0.59600 0.20000 0.15000

(I assume that O-H2 and O-H3 are hydroxyl ions, thus I indicate they have 1 hydrogen attached to each of them, but we need to check the original paper).

Would your software process such markup? I think this is a good, standard, unambiguous way to indicate waters, without interfering with the authors' data too much.

(both '_atom_site_type_symbol' and '_atom_site_attached_hydrogens' are standard IUCr data names; the COD just adds a convention that _atom_site_type_symbol SHOULD contain the IUPAC element symbol from the periodic table, or "D" for deuterium, possibly with the atom charge attached).
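
As an illustration of how software might consume this markup, here is a minimal Python sketch. It is not HighScore's or the COD's actual code: the helper names (`parse_rows`, `element_of`, `is_water`) are hypothetical, the rows are copied from the example table above, and the parser assumes one site per line with no quoted values, so it is not a full CIF reader. The rule "oxygen with two attached hydrogens means water" is the interpretation described in this message.

```python
import re

# Column order follows the loop_ header of the example table for COD 9015086.
COLUMNS = [
    "_atom_site_type_symbol",
    "_atom_site_attached_hydrogens",
    "_atom_site_label",
    "_atom_site_fract_x",
    "_atom_site_fract_y",
    "_atom_site_fract_z",
    "_atom_site_occupancy",
    "_atom_site_U_iso_or_equiv",
]

# A few rows copied from the example table.
ROWS = """
V  0 V    0.42846  0.42846 0.08140 0.87000 0.04260
O  1 O-H2 0.42140  0.42140 0.18660 1.00000 0.05000
Ca 0 Ca   0.65900 -0.65900 0.16050 0.25000 0.27200
O  2 Wat1 0.65900 -0.65900 0.16050 0.61000 0.27200
O  2 Wat2 0.29380  0.29380 0.29380 1.00000 0.19400
"""


def parse_rows(text):
    """Split whitespace-separated loop rows into dicts keyed by data name.

    Assumption: one site per line and no quoted values.
    """
    sites = []
    for line in text.strip().splitlines():
        values = line.split()
        if len(values) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} values: {line!r}")
        sites.append(dict(zip(COLUMNS, values)))
    return sites


def element_of(type_symbol):
    """Return the element part of a type symbol, dropping any charge ("O2-" -> "O")."""
    match = re.match(r"[A-Z][a-z]?", type_symbol)
    if match is None:
        raise ValueError(f"unrecognised type symbol: {type_symbol!r}")
    return match.group(0)


def is_water(site):
    """Oxygen site carrying exactly two attached hydrogens."""
    return (element_of(site["_atom_site_type_symbol"]) == "O"
            and int(site["_atom_site_attached_hydrogens"]) == 2)


sites = parse_rows(ROWS)
water_labels = [s["_atom_site_label"] for s in sites if is_water(s)]
print(water_labels)
```

Note that the decision never looks at the label itself, so it would work equally well for entries that do not follow the AMCSD "WatN" naming.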

Currently, we mark up the structures as we process them for ourselves; if an automated procedure can be devised for spotting all such entries (sure it can be done), we could add such fixes to all COD structures that require it, if that would be helpful for you.

There is already a set of structures (8509 COD entries) marked up in this way, e.g.:

https://www.crystallography.net/cod/9004888.cif
https://www.crystallography.net/cod/9003573.cif
https://www.crystallography.net/cod/9002900.cif
https://www.crystallography.net/cod/9000403.cif
https://www.crystallography.net/cod/9001176.cif
https://www.crystallography.net/cod/9001786.cif
https://www.crystallography.net/cod/9001785.cif
https://www.crystallography.net/cod/9009869.cif
https://www.crystallography.net/cod/9009872.cif
https://www.crystallography.net/cod/9009840.cif

>  2. We check the values of Biso and Baniso, and we also convert
>     Baniso-values back to Biso values;
>
> in COD entry 9014636 there are very big Uaniso-values (converted into
> Baniso values > 10), but small Uiso-values?
>
> Questions: does the COD apply a sanity check on the supplied B (or U)
> values, and do you compare anisotropic with isotropic values?

No, we do not check the Uij iso/aniso consistency so far... Thank you for the error report; such inconsistencies are for sure errors and need to be checked. I think we can add this extra check to our pipeline relatively easily.
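
A consistency check of this kind could be sketched as follows. This is only an illustration, not the COD pipeline: the function names are hypothetical, the threshold of 10 is the one mentioned in this thread, and the input values in the example are made up (they are not taken from entry 9014636). The conversion B = 8π²U is the standard relation between B and U; the equivalent isotropic value is computed as the mean of U11, U22 and U33, which is exact only for orthogonal cell axes (a general cell needs the metric tensor).

```python
import math

B_LIMIT = 10.0  # threshold in Å^2, as mentioned in the thread ("B values > 10")


def b_from_u(u):
    """Convert a displacement parameter U (Å^2) to B (Å^2): B = 8 * pi^2 * U."""
    return 8.0 * math.pi ** 2 * u


def u_equiv_orthogonal(u11, u22, u33):
    """Equivalent isotropic U as the mean of the diagonal U_ij terms.

    Assumption: orthogonal cell axes; otherwise this is only an approximation.
    """
    return (u11 + u22 + u33) / 3.0


def check_site(u_iso, u11, u22, u33, limit=B_LIMIT):
    """Compare the reported isotropic value with the one derived from U_ij."""
    b_iso = b_from_u(u_iso)
    b_eq = b_from_u(u_equiv_orthogonal(u11, u22, u33))
    return {
        "B_iso": b_iso,
        "B_eq_from_aniso": b_eq,
        "suspicious": b_iso > limit or b_eq > limit,
    }


# Made-up values: a small U_iso next to U_ij terms ten times larger.
report = check_site(u_iso=0.02, u11=0.2, u22=0.2, u33=0.2)
print(report)
```

A site like the made-up one above would be flagged because the B value derived from the anisotropic terms exceeds the limit even though the reported isotropic value looks harmless.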

> We can easily give you a list of all COD entries which have
> (converted) B-values > 10, if that helps.

That would be very helpful. I cannot promise that we will fix them soon if substantial manual work is involved, but we will record the list in our COD bug list and try to deal with it ASAP.

> I am still waiting for the original literature of these two examples
> to exclude any input errors for the B’s.

Yes, we should double check against the originals, this is very wise. I suspect some entries may contain scaling errors (B instead of U, or
x10^3 vs x10^4 scale in tables), but we definitely need to check...

> Best regards, and a happy new year 2020 to you!
Many great thanks!

Best,
Saulius

--
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Saulėtekio al. 7
LT-10257 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2234367 / phone (office): (+370-5)-2234353
mobile: (+370-684)-49802, (+370-614)-36366

