[Cod-bugs] [EXT] AMCSD duplicate atoms and missing aniso links?

Saulius Gražulis grazulis at ibt.lt
Mon Aug 5 09:51:18 EEST 2024


Hi, Bob,

thank you for the swift answer! Below are my answers and comments.

On 2024-08-03 21:04, Downs, Robert T - (rdowns) wrote:
>
> Thanks for pointing that out. Looks like a labeling problem. Pb* and 
> Pb’ are the same atom. The Pb atoms are on a split site.
>
Thanks for the clarification! I also understood it the same way!
>
> I think that my code to write cifs tried to avoid using a quote mark ‘ 
> because it gets interpreted poorly on some applications, especially 
> when searching, so I changed the quote to an asterisk, *, but I must 
> have only done it for atomic positions and didn’t do it for the 
> displacement factors. Ugh.
>
So the single quote character ("'") is regularly changed to an asterisk 
("*") when converting from AMC to CIF format, right? And yes, it should 
be the ASCII quote (ASCII DEC 39 / HEX 27), not the Unicode quote "‘" 
(UTF-8 HEX sequence: e2 80 98).
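
To make the distinction concrete, here is a minimal Python sketch that 
checks the code points mentioned above and maps typographic quotes back 
to their ASCII counterparts. The helper name to_ascii_quotes and the 
mapping table are my own illustration, not part of any AMCSD or COD tool:

```python
# Hypothetical helper (illustration only): normalise typographic quotes
# in an atom label to the plain ASCII quote characters.
TYPOGRAPHIC_TO_ASCII = {
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
}
_TABLE = str.maketrans(TYPOGRAPHIC_TO_ASCII)


def to_ascii_quotes(label: str) -> str:
    """Return the label with typographic quotes replaced by ASCII ones."""
    return label.translate(_TABLE)


# The ASCII quote is decimal 39 / hex 27 ...
ascii_quote_code = ord("'")
# ... while the Unicode left single quote is three bytes in UTF-8.
unicode_quote_utf8 = "\u2018".encode("utf-8").hex()
```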

As far as CIF and the COD parser are concerned, the single ASCII quote 
in atom names ("'") is as good (or bad) as the asterisk. It is standard 
CIF, and the software MUST read it; if some program does not, IMHO the 
program should be fixed, not the data file. Adapting data presentation 
to the quirks of buggy programs is, IMHO, a bad strategy: next time some 
program will /insist/ (erroneously) on having a quote and will reject 
the asterisk, and we are stuck :). Actually, all modern programs, 
notably Jmol, can process atom names with terminating quotes, even 
though this requires a newer syntax in the program scripts... Also, 
changing "'" to "*" makes AMC and CIF atom naming different, so finding 
correspondences becomes more difficult.

If identifier-like atom labels are required, maybe it is better to spell 
out 'prime' and 'doubleprime' in both AMC and CIF files?

In any case, there is a second problem related to asterisk conversion: 
in several AMCSD CIFs (e.g. COD 9002008 [1], AMCSD "0002070" [2]) two 
atoms Pb' and Pb" are present, which both become Pb* after conversion:

> loop_
> _atom_site_aniso_label
> _atom_site_aniso_U_11
> _atom_site_aniso_U_22
> _atom_site_aniso_U_33
> _atom_site_aniso_U_12
> _atom_site_aniso_U_13
> _atom_site_aniso_U_23
> *Pb'* 0.02100 0.05400 0.03400 -0.01300 0.00700 -0.00100
> *Pb"* 0.01900 0.08100 0.06100 0.00600 0.01200 0.00500
and

> loop_
> _atom_site_label
> _atom_site_fract_x
> _atom_site_fract_y
> _atom_site_fract_z
> _atom_site_occupancy
> *Pb**    0.27050  -0.01360   0.07260   0.59000
> *Pb**    0.27870   0.00890   0.06860   0.41000
Thus, it is no longer possible for CIF programs to tell which is which 
without resorting to some wild assumptions (i.e. that the CIF atoms 
follow the AMC file ordering and no program has rearranged the atoms in 
the meantime...)
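
The collision can be shown with a short Python sketch. Note the 
assumptions: quotes_to_asterisk only mimics the conversion as I 
understand it from this thread (I have not seen the actual AMCSD code), 
and spell_out_primes is a hypothetical alternative along the lines of 
the 'prime' / 'doubleprime' suggestion above:

```python
from collections import Counter


def quotes_to_asterisk(label):
    # ASSUMED converter behaviour, inferred from this thread only:
    # both the single and the double quote become one asterisk.
    return label.replace("'", "*").replace('"', "*")


def spell_out_primes(label):
    # Hypothetical alternative: spell the quotes out, so that distinct
    # input labels remain distinct after conversion.
    return label.replace('"', "doubleprime").replace("'", "prime")


def colliding_labels(labels, convert):
    """Return the converted labels that are produced more than once."""
    counts = Counter(convert(label) for label in labels)
    return sorted(label for label, n in counts.items() if n > 1)


amc_labels = ["Pb'", 'Pb"', "O1"]
collisions_asterisk = colliding_labels(amc_labels, quotes_to_asterisk)
collisions_spelled = colliding_labels(amc_labels, spell_out_primes)
```

With the asterisk mapping, Pb' and Pb" both end up as Pb*; with the 
spelled-out mapping they stay distinct (Pbprime and Pbdoubleprime).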

What would be your policy on handling double quotes in atom labels? Are 
you going to leave them as double quotes, or maybe convert to double 
asterisks, e.g. to "Pb**"?

> Anyways, its my mistake. Thanks for catching it.
>
A great deal of credit goes to Antanas Vaitkus who wrote the 
'cif_validate' software – it is now very handy to catch such situations 
in a systematic way :)
>
> You can go to amcsd web site and see the amc file, and that would have 
> helped you see the problem. I make cifs from the amc format.
>
Thanks, great tip! I somehow overlooked this; we can use AMC files to 
double-check the CIFs!

> Also, the atoms are always in the same order in the two parts of the 
> cif, (xyz and adf) so that might help you to solve issues. Its because 
> the cif is always made from the amc.
>
Excellent! Good to know, I'll use this info to restore the aniso <-> 
atom_site connections.
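
Roughly, the reconnection could look like the Python sketch below. It is 
only an illustration under stated assumptions: both loops keep the AMC 
atom order (as you describe), the aniso loop may list fewer atoms than 
the atom_site loop, and the conversion is the quote-to-asterisk mapping 
as I understand it from this thread. The function names are 
hypothetical and not part of cod-tools:

```python
def quotes_to_asterisk(label):
    # ASSUMED converter behaviour, inferred from this thread only.
    return label.replace("'", "*").replace('"', "*")


def pair_by_order(site_labels, aniso_labels):
    """Pair each aniso row with an atom_site row, relying on row order.

    Returns a list of (atom_site_row_index, aniso_label) tuples.
    Raises ValueError if an aniso row cannot be matched in order.
    """
    pairs = []
    pos = 0  # next atom_site row that has not been consumed yet
    for aniso in aniso_labels:
        wanted = quotes_to_asterisk(aniso)
        # Skip atom_site rows without a matching aniso row
        # (e.g. isotropic atoms) until the converted label matches.
        while pos < len(site_labels) and site_labels[pos] != wanted:
            pos += 1
        if pos == len(site_labels):
            raise ValueError(f"no atom_site row left for {aniso!r}")
        pairs.append((pos, aniso))
        pos += 1
    return pairs


pairs = pair_by_order(["Pb*", "Pb*", "O1"], ["Pb'", 'Pb"', "O1"])
```

For the split Pb site this assigns Pb' to the first Pb* row and Pb" to 
the second, which is only as reliable as the ordering assumption itself.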
>
> I am considering how to change this somehow in the future. We are 
> building a new website for the future, amcsd will become part of a 
> NASA host of mineral databases. I expect to have an agreement with the 
> society journals to automatically download their deposited cifs.
>
This sounds very promising, cool!
>
> Unfortunately, these cifs are usually so poorly made that I know this 
> will be a bad solution. It’s the reason why I have always made amc 
> format and then computed the cifs. But I am planning for the future 
> when it will not be me doing the databases.
>
Regarding the CIFs, we have at the COD a pretty decent pipeline for 
ingesting CIFs, even some broken ones. Could that be useful? Maybe we 
could collaborate on the CIF processing and reviewing?

Regards,
Saulius

Refs.:

[1] http://www.crystallography.net/cod/9002008.cif

[2] https://rruff.geo.arizona.edu/AMS/CIF_text_files/02094_cif.txt, 
https://rruff.geo.arizona.edu/AMS/AMC_text_files/02094_amc.txt, 
https://rruff.geo.arizona.edu/AMS/result.php

-- 
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Saulėtekio al. 7
LT-10257 Vilnius, Lietuva (Lithuania)
mobile: (+370-684)-49802, (+370-614)-36366
