[Cod-bugs] [EXT] AMCSD duplicate atoms and missing aniso links?

Downs, Robert T - (rdowns) rdowns at arizona.edu
Sat Aug 3 22:48:31 EEST 2024


Saulius,
I will replace every single quote ' in atom names throughout the entire database. You won't see the changes until I rebuild it, however, and that won't be for a month or so.
Thanks for pointing it out to me. If you see any issues with double quotes " let me know.
Thanks,
Bob


From: Saulius Gražulis <grazulis at ibt.lt>
Sent: Saturday, August 3, 2024 3:49 AM
To: Bob Downs <rdowns at u.arizona.edu>
Cc: cod-bugs at lists.crystallography.net; kicis at lists.crystallography.net
Subject: [EXT] AMCSD duplicate atoms and missing aniso links?



Hi, Bob!

How are you?

I'm currently doing COD data validation, and I'm running into two issues that I need your help to resolve.

1. Some AMCSD entries (e.g. COD ID 9015515 [1], AMCSD ID "0019710" [2]) have recently received the '_atom_site_aniso_[]' loop. This is great, but in some cases the labels in the '_atom_site_aniso_[]' loop do not match those in the '_atom_site_[]' loop:

loop_
_atom_site_aniso_label
_atom_site_aniso_U_11
_atom_site_aniso_U_22
_atom_site_aniso_U_33
_atom_site_aniso_U_12
_atom_site_aniso_U_13
_atom_site_aniso_U_23
# ...
Pb' 0.03159 0.03245 0.01811 0.01856 -0.00932 -0.00819
# ...

vs.:

loop_
_atom_site_label
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
_atom_site_occupancy
_atom_site_U_iso_or_equiv
# ...
Pb*   0.10313   0.33075   0.34924   0.19800   0.02774
# ...

Can I assume that the atoms with quotes and the atoms with stars are the same, e.g. that "Pb'" and "Pb*" are the same atom?

I have written a script to fix this under the above-mentioned assumption, and I'm about to commit the changes to the COD. It would be great to propagate these changes back to AMCSD; what do you think?

(Actually, I had the same issue with atom names containing hyphens, e.g. 'OH1' and 'O-H1', but I see that this issue is already fixed in AMCSD; I've updated the COD accordingly :).
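A minimal Python sketch of the kind of relabelling described above. It assumes, as asked but not yet confirmed by AMCSD, that a straight quote ' in an aniso label marks the same atom as a star * in the _atom_site_label loop, and that hyphens can be ignored when comparing labels. The function names and the plain list-of-labels input are hypothetical; this is not the actual COD script, and reading and writing the CIF files is left out.

```python
def label_key(label: str) -> str:
    """Comparison key for atom labels.

    ASSUMPTION (hypothetical, not confirmed by AMCSD): a straight quote '
    and a star * mark the same atom ("Pb'" == "Pb*"), and hyphens carry
    no meaning ("O-H1" == "OH1").
    """
    return label.replace("'", "*").replace("-", "")


def relabel_aniso(site_labels, aniso_labels):
    """Rename aniso labels to the spelling used in the _atom_site_ loop.

    Returns (new_aniso_labels, unmatched), where `unmatched` lists the aniso
    labels that could not be paired with exactly one atom site.
    """
    site_set = set(site_labels)
    by_key = {}
    for label in site_labels:
        by_key.setdefault(label_key(label), []).append(label)

    renamed, unmatched = [], []
    for label in aniso_labels:
        if label in site_set:
            renamed.append(label)  # already consistent, keep as is
            continue
        candidates = by_key.get(label_key(label), [])
        if len(candidates) == 1:
            renamed.append(candidates[0])
        else:
            # zero or several candidate sites: leave unchanged for manual review
            renamed.append(label)
            unmatched.append(label)
    return renamed, unmatched


print(relabel_aniso(["Pb*", "OH1", "Si1"], ["Pb'", "O-H1", "Si1"]))
# (['Pb*', 'OH1', 'Si1'], [])
```

Labels that cannot be paired with exactly one site are left unchanged and reported rather than guessed, so they can be checked by hand.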

2. In some places (56 COD entries) there are duplicate atom labels. When these appear in two loops, we cannot decide unambiguously which ANISO entry pertains to which atom. Example (from COD ID 9000543 [3], AMCSD ID "0000554"):
loop_
_atom_site_aniso_label
_atom_site_aniso_U_11
_atom_site_aniso_U_22
_atom_site_aniso_U_33
_atom_site_aniso_U_12
_atom_site_aniso_U_13
_atom_site_aniso_U_23
# ...
Mg 0.00977 0.01092 0.00940 0.00000 0.00000 0.00000
Mg 0.01490 0.00863 0.00956 0.00008 -0.00278 0.00053
# ...

and

loop_
_atom_site_label
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
_atom_site_occupancy
# ...
Mg   0.00720   0.39590   0.50000   0.21400
Mg   0.25000   0.50000   0.25000   1.00000
# ...

Can I assume, in general, that the atoms are in the same order in both loops, and number them 'Mg1' and 'Mg2' correspondingly? Like this:
loop_
_atom_site_aniso_label
_atom_site_aniso_U_11
_atom_site_aniso_U_22
_atom_site_aniso_U_33
_atom_site_aniso_U_12
_atom_site_aniso_U_13
_atom_site_aniso_U_23
# ...
Mg1 0.00977 0.01092 0.00940 0.00000 0.00000 0.00000
Mg2 0.01490 0.00863 0.00956 0.00008 -0.00278 0.00053
# ...

and

loop_
_atom_site_label
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
_atom_site_occupancy
# ...
Mg1   0.00720   0.39590   0.50000   0.21400
Mg2   0.25000   0.50000   0.25000   1.00000
# ...

I would do this for the COD (probably manually). Again, it would be great to back-propagate these changes to the AMCSD collection. I can send you the list of changed files or the list of problematic entries if that would help.
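A minimal Python sketch of the renumbering proposed above, under the same still-unconfirmed assumption that duplicated atoms appear in the same order in both loops. The function names are hypothetical, and the sketch works on plain lists of labels rather than on CIF files. It refuses to proceed when the pairing by order is ambiguous or when a new label such as 'Mg1' would clash with an existing one.

```python
from collections import Counter


def _number(labels, dups):
    """Append 1, 2, ... to successive occurrences of each label in `dups`."""
    seen = Counter()
    out = []
    for label in labels:
        if label in dups:
            seen[label] += 1
            out.append(f"{label}{seen[label]}")
        else:
            out.append(label)
    return out


def renumber_duplicates(site_labels, aniso_labels):
    """Renumber duplicated labels in both loops, e.g. 'Mg', 'Mg' -> 'Mg1', 'Mg2'.

    ASSUMPTION (the open question in this thread): the n-th occurrence of a
    duplicated label in the _atom_site_ loop is the same atom as the n-th
    occurrence in the _atom_site_aniso_ loop.
    """
    site_counts = Counter(site_labels)
    aniso_counts = Counter(aniso_labels)
    dups = {label for label, n in site_counts.items() if n > 1}
    for label in dups:
        # Pairing by order is only well defined if every duplicate has an
        # aniso row, or none does; anything else needs manual inspection.
        if aniso_counts[label] not in (0, site_counts[label]):
            raise ValueError(f"cannot pair {label!r} rows by order")
    new_site = _number(site_labels, dups)
    if len(set(new_site)) != len(new_site):
        raise ValueError("renumbering would clash with an existing label")
    return new_site, _number(aniso_labels, dups)


print(renumber_duplicates(["Mg", "Mg", "O1"], ["Mg", "Mg"]))
# (['Mg1', 'Mg2', 'O1'], ['Mg1', 'Mg2'])
```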

Cheers,
Saulius

Refs.:

[1] http://www.crystallography.net/cod/9015515.cif, http://www.crystallography.net/cod/9015515.html

[2] https://rruff.geo.arizona.edu/AMS/CIF_text_files/05517_cif.txt

[3] http://www.crystallography.net/cod/9000543.cif, http://www.crystallography.net/cod/9000543.html

[4] https://rruff.geo.arizona.edu/AMS/CIF_text_files/00635_cif.txt

--

Dr. Saulius Gražulis

Vilnius University Institute of Biotechnology, Saulėtekio al. 7

LT-10257 Vilnius, Lietuva (Lithuania)

fax: (+370-5)-2234367 / phone (office): (+370-5)-2234353

mobile: (+370-684)-49802, (+370-614)-36366



More information about the Cod-bugs mailing list