[Cod-bugs] AMCSD duplicate atoms and missing aniso links?

Saulius Gražulis grazulis at ibt.lt
Sat Aug 3 13:49:14 EEST 2024


Hi, Bob!

How are you?

I'm currently doing COD data validation, and I'm running into two issues 
that I would need your help to resolve.

1. Some AMCSD entries (e.g. COD ID 9015515 [1], AMCSD ID "0019710" [2]) 
have recently received the '_atom_site_aniso_[]' loop. This is great, 
but in some cases the atom labels in that loop do not match the labels 
in the '_atom_site_[]' loop:

> loop_
> _atom_site_aniso_label
> _atom_site_aniso_U_11
> _atom_site_aniso_U_22
> _atom_site_aniso_U_33
> _atom_site_aniso_U_12
> _atom_site_aniso_U_13
> _atom_site_aniso_U_23
> # ...
> Pb'  0.03159 0.03245 0.01811 0.01856 -0.00932 -0.00819
> # ...

vs.:

> loop_
> _atom_site_label
> _atom_site_fract_x
> _atom_site_fract_y
> _atom_site_fract_z
> _atom_site_occupancy
> _atom_site_U_iso_or_equiv
> # ...
> Pb*    0.10313   0.33075   0.34924   0.19800   0.02774
> # ...

Can I assume that a label with a quote and the same label with a star 
denote the same atom, e.g. that "Pb'" and "Pb*" are the same atom?

I have written a script to fix this under the above-mentioned 
assumption, and I'm about to commit the changes to the COD. It would be 
great to propagate these changes back to AMCSD; what do you think?

Actually, I had the same issue with atom names containing hyphens, e.g. 
'OH1' and 'O-H1', but I see that this issue is already fixed in AMCSD; 
I've updated the COD accordingly :)
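
To make the matching rule concrete, here is a minimal sketch in Python. 
It is not the actual script I used; the function names are hypothetical, 
and it relies on the assumption I am asking about, namely that a quote 
and a star mark the same atom (and that a hyphen inside a label can be 
ignored). Labels without exactly one counterpart are reported rather 
than guessed.

```python
def canonical(label):
    """Comparison key: drop hyphens and treat ' and * as the same mark.

    ASSUMPTION (not confirmed by AMCSD): "Pb'" and "Pb*" name one atom.
    """
    return label.replace("-", "").replace("'", "*")


def reconcile_aniso_labels(site_labels, aniso_labels):
    """Rewrite aniso labels to the spelling used in the atom_site loop.

    Raises ValueError when a label has no counterpart or more than one.
    """
    # Group atom_site labels by their comparison key.
    by_key = {}
    for label in site_labels:
        by_key.setdefault(canonical(label), []).append(label)

    fixed = []
    for label in aniso_labels:
        if label in site_labels:
            # Exact match: nothing to repair.
            fixed.append(label)
            continue
        candidates = by_key.get(canonical(label), [])
        if len(candidates) != 1:
            raise ValueError(
                f"cannot match aniso label {label!r}: candidates {candidates}"
            )
        fixed.append(candidates[0])
    return fixed


if __name__ == "__main__":
    print(reconcile_aniso_labels(["Pb", "Pb*", "O1"], ["Pb", "Pb'", "O1"]))
```

The atom_site spelling is kept as the reference here only because the 
coordinates live in that loop; the opposite choice would work equally well.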

2. In some places (56 COD entries) there are duplicate atom labels. When 
these are in two loops, we cannot decide unambiguously which ANISO 
entry pertains to which atom. Example (from COD ID 9000543 [3], AMCSD ID 
"0000554" [4]):

> loop_
> _atom_site_aniso_label
> _atom_site_aniso_U_11
> _atom_site_aniso_U_22
> _atom_site_aniso_U_33
> _atom_site_aniso_U_12
> _atom_site_aniso_U_13
> _atom_site_aniso_U_23
> # ...
> Mg 0.00977 0.01092 0.00940 0.00000 0.00000 0.00000
> Mg 0.01490 0.00863 0.00956 0.00008 -0.00278 0.00053
> # ...

and

> loop_
> _atom_site_label
> _atom_site_fract_x
> _atom_site_fract_y
> _atom_site_fract_z
> _atom_site_occupancy
> # ...
> Mg    0.00720   0.39590   0.50000   0.21400
> Mg    0.25000   0.50000   0.25000   1.00000
> # ...

Can I assume, in general, that the atoms in both loops are in the same 
order, and number them 'Mg1' and 'Mg2' correspondingly? Like this:

> loop_
> _atom_site_aniso_label
> _atom_site_aniso_U_11
> _atom_site_aniso_U_22
> _atom_site_aniso_U_33
> _atom_site_aniso_U_12
> _atom_site_aniso_U_13
> _atom_site_aniso_U_23
> # ...
> Mg1 0.00977 0.01092 0.00940 0.00000 0.00000 0.00000
> Mg2 0.01490 0.00863 0.00956 0.00008 -0.00278 0.00053
> # ...

and

> loop_
> _atom_site_label
> _atom_site_fract_x
> _atom_site_fract_y
> _atom_site_fract_z
> _atom_site_occupancy
> # ...
> Mg1    0.00720   0.39590   0.50000   0.21400
> Mg2    0.25000   0.50000   0.25000   1.00000
> # ...
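
The renumbering above could be sketched roughly as follows in Python. 
This is only an illustration with hypothetical function names, and it 
depends entirely on the unconfirmed assumption that both loops list the 
duplicated atoms in the same order. It refuses to touch an entry where 
a duplicated label occurs a different number of times in the two loops, 
or where a new label would collide with an existing one.

```python
from collections import Counter


def number_duplicates(labels):
    """Append 1, 2, ... to each label that occurs more than once."""
    totals = Counter(labels)
    seen = Counter()
    result = []
    for label in labels:
        if totals[label] > 1:
            seen[label] += 1
            result.append(f"{label}{seen[label]}")
        else:
            result.append(label)
    return result


def renumber_both_loops(site_labels, aniso_labels):
    """Number duplicate labels consistently in both loops.

    ASSUMPTION: duplicated atoms appear in the same order in both loops.
    """
    site_counts = Counter(site_labels)
    aniso_counts = Counter(aniso_labels)
    for label in set(site_counts) | set(aniso_counts):
        duplicated = site_counts[label] > 1 or aniso_counts[label] > 1
        # An atom may have no aniso entry at all (count 0); otherwise the
        # counts must agree, or pairing by order is impossible.
        if duplicated and aniso_counts[label] not in (0, site_counts[label]):
            raise ValueError(f"label {label!r}: counts differ between loops")

    new_site = number_duplicates(site_labels)
    new_aniso = number_duplicates(aniso_labels)
    if len(set(new_site)) != len(new_site):
        raise ValueError("a new label collides with an existing one")
    return new_site, new_aniso


if __name__ == "__main__":
    print(renumber_both_loops(["Mg", "Mg", "Si", "O1"], ["Mg", "Mg", "Si"]))
```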

I would do this for the COD (probably manually). Again, it would be 
great to back-propagate these changes to the AMCSD collection. I can 
send you the list of changed files or the list of problematic entries 
if that would help.

Cheers,
Saulius

Refs.:

[1] http://www.crystallography.net/cod/9015515.cif, 
http://www.crystallography.net/cod/9015515.html

[2] https://rruff.geo.arizona.edu/AMS/CIF_text_files/05517_cif.txt

[3] http://www.crystallography.net/cod/9000543.cif, 
http://www.crystallography.net/cod/9000543.html

[4] https://rruff.geo.arizona.edu/AMS/CIF_text_files/00635_cif.txt

-- 
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Saulėtekio al. 7
LT-10257 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2234367 / phone (office): (+370-5)-2234353
mobile: (+370-684)-49802, (+370-614)-36366
