[Cod-bugs] COD conversion with HighScore

Saulius Gražulis grazulis at ibt.lt
Sat Jan 11 16:26:01 EET 2020


Hi, Thomas,

these are my thoughts regarding Biso and Uiso.

On 2020-01-08 18:22, Thomas Dortmann wrote:

> Please let me know when you need more information, or want additional
> checks during our conversions to find out more details.

I have written a small Perl program to calculate Ueq and Beq from the
Uij, and to print them alongside the Uiso value given in the CIF
(http://saulius-grazulis.lt/~saulius/.e6be37a23b470b0ded2e36c9d9a15ba529de29e9/).

I have calculated Ueq and Beq from the Uij tensors for all structures
that were marked as having 'Large (>= 10) Banis values' in your list. Do
I understand correctly that the first number, e.g. 51 in ': 51; No. of
atoms: 289', is the number of atoms that have Beq > 10? If so, for some
reason I get somewhat smaller numbers than you; for example, for COD
structure 1000001, I get only 13 such values from the U_ij records;
counting the isotropic atoms (hydrogens) as well, I get 141 atoms...

Could you please send me an example of your code (pseudo-code is fine)
showing how you calculate the Ueq?

I admit that I am not sure if my understanding of the CIF's Uij is
correct. I consulted both Fischer1988 (10.1107/S0108270187012745) and
Grosse-Kunstleve2002 (10.1107/S0021889802008580), but I am still not
sure if I get the orthogonalisation right. Uij needs to be expressed in
an orthogonal basis before taking the trace, Ueq = (1/3)*Tr(Uij);
however, all orthogonalisations which I derive myself or pick from the
papers are equivalent to just summing up the U_ii from the CIF:

Ueq = (U11 + U22 + U33)/3
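
For reference, the arithmetic behind the Ueq and Beq columns can be
sketched as below. This is a minimal Python sketch, not the Perl
program mentioned above; the function names are hypothetical, the
diagonal elements are made-up numbers, and it assumes only the formula
above together with the standard relation B = 8*pi^2*U.

```python
import math


def ueq_from_diagonal(u11, u22, u33):
    """Ueq as one third of the sum of the diagonal Uij elements (A^2)."""
    return (u11 + u22 + u33) / 3.0


def beq_from_ueq(ueq):
    """Convert an equivalent U (A^2) to an equivalent B (A^2): B = 8*pi^2*U."""
    return 8.0 * math.pi ** 2 * ueq


# Hypothetical diagonal elements, chosen only to illustrate the arithmetic.
ueq = ueq_from_diagonal(0.030, 0.040, 0.044)
print(f"Ueq = {ueq:.6f}  Beq = {beq_from_ueq(ueq):.5f}")
```

With Ueq = 0.038 this gives Beq of about 3.0004, i.e. the same
relation between the Ueq(comp) and Beq(comp) columns as in the
listings in this message.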

I also suspect that the definition on the IUCr page
(https://www.iucr.org/__data/iucr/cifdic_html/1/cif_core.dic/Iatom_site_U_iso_or_equiv.html)
is incorrect: the a values should be vectors, not vector lengths as
stated, and they should form an (ai,aj) scalar product... (?!). I have
not yet checked the Tables.
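
To make the question concrete, here is a minimal Python sketch of the
double sum with the scalar-product reading, i.e.
Ueq = (1/3) * sum_ij U_ij a*_i a*_j (a_i . a_j), which is how the
Fischer1988 expression is usually written. The function name and the
argument layout are hypothetical, and whether this is the intended
reading of the dictionary entry is exactly the assumption being
tested. For a cell with all angles equal to 90 degrees it reduces to
(U11 + U22 + U33)/3; for oblique cells the off-diagonal U_ij enter
through a_i . a_j.

```python
import math


def ueq_general(u, cell):
    """Ueq = (1/3) * sum_ij U_ij a*_i a*_j (a_i . a_j).

    u    : 3x3 symmetric nested list of CIF U_ij values (A^2)
    cell : (a, b, c, alpha, beta, gamma), lengths in A, angles in degrees
    """
    a, b, c, alpha, beta, gamma = cell
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sa, sb, sg = (math.sin(math.radians(x)) for x in (alpha, beta, gamma))

    # Unit cell volume.
    vol = a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)

    # Reciprocal cell lengths a*, b*, c*.
    rec = (b * c * sa / vol, a * c * sb / vol, a * b * sg / vol)

    # Direct metric tensor, G_ij = a_i . a_j.
    g = [
        [a * a, a * b * cg, a * c * cb],
        [a * b * cg, b * b, b * c * ca],
        [a * c * cb, b * c * ca, c * c],
    ]

    total = sum(
        u[i][j] * rec[i] * rec[j] * g[i][j] for i in range(3) for j in range(3)
    )
    return total / 3.0


# Orthorhombic cell: off-diagonal terms drop out, leaving the diagonal mean.
u = [[0.030, 0.010, 0.020], [0.010, 0.040, 0.005], [0.020, 0.005, 0.044]]
print(ueq_general(u, (11.889, 33.380, 9.012, 90, 90, 90)))
```

A useful sanity check is an isotropic atom in an oblique cell: with
U_ii = u and U_ij = u*cos(angle*_ij) the function should return u.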

What formula do you use?

In any case, with this method I get values close to those presented in
CIFs (mostly within the error margins), and reasonable Beq values. Thus
I decided to move on, leaving derivation of the maths for later times :).

> 2. Anisotropic displacement parameters are way too big; often 
> isotropic values don't match anisotropic values in these cases: We 
> convert from Us or Betas to Bs; the warning identifies one or more 
> Baniso values >= 10. This results in wrong phase quantitations with 
> the Rietveld method. (We also recalculate Biso from the Banisos,
> when Biso is missing) This is the biggest group of CIFs generating 
> warnings during the conversion: 123.717 warnings

As for the absolute values of Uiso or Biso, I have no special opinion,
but I do not spot any big trouble either. Taking a random structure from
your list:

> #@ label 	Uiso(CIF)	Ueq(comp)	Beq(comp)	diff

> 7225385	C42	       0.213	    0.212333	     16.7652	 0.000666667
> 7225385	C41	       0.159	    0.158667	     12.5278	 0.000333333
> 7225385	O53	       0.134	    0.133667	     10.5539	 0.000333333
> 7225385	O43	      0.1237	    0.123667	     9.76433	 3.33333e-05

So the largest Beq is 16.8 Å^2, the next one is 12.5 Å^2, and starting
from the fourth atom they are below 10. In many random structures I
observe *all* Beq < 10, and all values seem consistent across the files
and not too scattered. Granted, some B values are very large, over 100,
but that is how the authors reported them. I do not see any way to
rectify this short of re-determining the structure from a better
crystal...
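
The count of atoms at or above the Beq threshold can be reproduced
from such a listing with a few lines of Python. This is a minimal
sketch: the function name is hypothetical, and it assumes only that
Beq(comp) is the last-but-one whitespace-separated column and that
lines starting with '#' are headers, as in the listing above.

```python
def count_large_beq(listing, threshold=10.0):
    """Return (atoms with Beq >= threshold, total atoms) for one listing."""
    large = total = 0
    for line in listing.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '#' header lines
        fields = line.split()
        beq = float(fields[-2])  # columns: ... Uiso(CIF) Ueq(comp) Beq(comp) diff
        total += 1
        if beq >= threshold:
            large += 1
    return large, total


listing = """\
#@ label  Uiso(CIF)  Ueq(comp)  Beq(comp)  diff
7225385  C42  0.213   0.212333  16.7652  0.000666667
7225385  C41  0.159   0.158667  12.5278  0.000333333
7225385  O53  0.134   0.133667  10.5539  0.000333333
7225385  O43  0.1237  0.123667  9.76433  3.33333e-05
"""
print(count_large_beq(listing))  # -> (3, 4)
```

Whether the comparison should be '>=' or '>' depends on the
convention used for the warning; the sketch follows the '>= 10'
wording of the warning text.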

In many structures, there are just a few such atoms with large Biso
values, and they are often in disordered regions; Beq values from
ordered regions look normal to me, e.g.:

> saulius at varanas Uiso/ $ cod-download -s 7120853 | ./cif_Uiso | sort -gr -k4,4 | head -n 4
> C25B	       2.119	     2.00067	     157.966	    0.118333
> C19B	       1.149	           2	     157.914	      -0.851
> C26B	       1.163	       1.986	     156.808	      -0.823
> C6	        1.13	     1.95833	     154.624	   -0.828333
> saulius at varanas Uiso/ $ cod-download -s 7120853 | ./cif_Uiso | sort -gr -k4,4 | tail -n 8
> C41A	       0.035	   0.0373333	     2.94772	 -0.00233333
> C32A	       0.037	   0.0373333	     2.94772	-0.000333333
> C30A	       0.036	       0.031	     2.44766	       0.005
> C29A	       0.037	   0.0356667	     2.81613	  0.00133333
> #SPCGRP   :	P -1
> #FILENAME :	-
> #DATABLCK :	7120853
> #@ label 	Uiso(CIF)	Ueq(comp)	Beq(comp)	diff

Thus, the largest B-factors are about 158, but they are on the
periphery of a flexible organic group here; the core atoms have much
lower Bs. Coming from the protein crystallography side, I do not find
such a situation extraordinary. Granted, B=160 is too much, but to me
this just means that the crystal was rather disordered (in that
region?), or that the given part of the molecule was not modelled well
(the contrast of B-factors in this particular entry is quite high).
Maybe constraints should have been applied during refinement.

Given that this is a published structure from a (reputable?) chemical
journal (https://doi.org/10.1039/C7CC06797F), I do not see how we can do
much about this, except excluding such structures from computations (but
we would probably lose a lot of data this way). Such is the state of the
art in the chemical crystallography field (?).

> You find a list of all CIFs throwing warnings during conversion in 
> the attached Excel sheet, ordered by the type of warning and the COD 
> ID number. I hope this will help to correct errors in the COD and to 
> make it better applicable through time.

What I do agree is a clear sign of a technical problem is a large
discrepancy between the Ueq (calculated from the Uij tensor) and the
Uiso or Biso provided by the authors.

I have spotted in the list that you have provided:

18 structures with at least one Uiso-Ueq value > 1.0;

12 structures with at least one Uiso-Ueq value < -10.0;

Most structures in these lists are *consistently* too high or too low;
only two COD IDs in this list have atoms not in these extreme ranges.
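
The grouping above can be sketched in Python as follows. This is only
an illustration: the function name and the returned labels are
hypothetical, and the thresholds are simply the 1.0 and -10.0 cut-offs
used above, applied to the per-atom Uiso(CIF) - Ueq(comp) differences
of one structure.

```python
def classify_structure(diffs, high=1.0, low=-10.0):
    """Classify one structure from its per-atom Uiso(CIF) - Ueq(comp) values."""
    too_high = [d for d in diffs if d > high]
    too_low = [d for d in diffs if d < low]
    if too_high and len(too_high) == len(diffs):
        return "consistently too high"
    if too_low and len(too_low) == len(diffs):
        return "consistently too low"
    if too_high or too_low:
        return "mixed"  # some atoms extreme, others not
    return "ok"


# Hypothetical per-atom differences for three structures.
print(classify_structure([2.47, 2.86, 2.96]))       # every atom above +1.0
print(classify_structure([-350.2, -410.7, 0.001]))  # only some atoms extreme
print(classify_structure([0.0007, 0.0003]))         # nothing suspicious
```

A structure in the "mixed" group would correspond to the COD IDs that
have some atoms outside the extreme ranges.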

16 of the structures from the 'Uiso-Ueq > 1.0' list seem to have Beq
specified instead of Ueq, since the value is approx. 80 times too large
(8*pi^2 is about 79); when corrected under this assumption, the values
become reasonable and consistent:

> #DATABLCK :	4321814
> #SPCGRP   :	C m c 21
> #CELL     :	11.889	33.380	9.012	90	90	90
> #@ label 	Uiso(CIF)	Ueq(comp)	Beq(comp)	diff
> CU1	         2.5	   0.0316667	      2.5003	     2.46833
> CU2	         2.9	   0.0366667	     2.89508	     2.86333
> CU3	           3	       0.038	     3.00036	       2.962
> CU4	         2.9	   0.0366667	     2.89508	     2.86333
> CU5	        2.91	       0.037	      2.9214	       2.873

As you see, Uiso in the CIF is nearly exactly the same as the Beq
calculated by my script from the Uij values.
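
One way to flag such cases automatically is to test whether the ratio
Uiso(CIF)/Ueq(comp) is close to 8*pi^2. The following Python sketch
shows the idea; the function name and the 5% tolerance are
assumptions made for illustration, not part of the script used above.

```python
import math

EIGHT_PI_SQ = 8.0 * math.pi ** 2  # about 78.96


def looks_like_b(uiso_cif, ueq_comp, rel_tol=0.05):
    """True if the CIF 'Uiso' is roughly 8*pi^2 times the computed Ueq."""
    if ueq_comp <= 0:
        return False
    return math.isclose(uiso_cif / ueq_comp, EIGHT_PI_SQ, rel_tol=rel_tol)


# (label, Uiso(CIF), Ueq(comp)) taken from the 4321814 listing above.
atoms = [
    ("CU1", 2.5, 0.0316667),
    ("CU2", 2.9, 0.0366667),
    ("CU3", 3.0, 0.038),
    ("CU4", 2.9, 0.0366667),
    ("CU5", 2.91, 0.037),
]
print(all(looks_like_b(uiso, ueq) for _, uiso, ueq in atoms))  # -> True
```

The tolerance has to allow for the rounding of the values reported in
the CIF (here one or two decimals).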

I suggest fixing these structures by renaming the data name:
s/_atom_site_U_iso_or_equiv/_atom_site_B_iso_or_equiv/; viz.:

> saulius at varanas Uiso/ $ cod-download -s 4321814 | sed 's/_atom_site_U_iso_or_equiv/_atom_site_B_iso_or_equiv/' | ./cif_Uiso
> #FILENAME :	-
> #DATABLCK :	4321814
> #SPCGRP   :	C m c 21
> #CELL     :	11.889	33.380	9.012	90	90	90
> #@ label 	Uiso(CIF)	Ueq(comp)	Beq(comp)	diff
> CU1	   0.0316629	   0.0316667	      2.5003	-3.79678e-06
> CU2	   0.0367289	   0.0366667	     2.89508	 6.22624e-05
> CU3	   0.0379954	       0.038	     3.00036	-4.55613e-06
> CU4	   0.0367289	   0.0366667	     2.89508	 6.22624e-05
> CU5	   0.0368556	       0.037	      2.9214	-0.000144419

The table above shows that after this renaming the Uiso(CIF) values
(computed from the corresponding _B_iso_or_equiv values in the incoming
CIF) are again the same as the Ueq computed from the Uij tensor.
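
The same check can be sketched in Python: rename the data name in the
CIF text and convert the reported B back to U with U = B/(8*pi^2).
The function names are hypothetical; the plain string replacement
mirrors the sed expression above and, like it, assumes that the data
name does not occur elsewhere in the file with a different meaning.

```python
import math


def rename_u_to_b(cif_text):
    """Rename _atom_site_U_iso_or_equiv to _atom_site_B_iso_or_equiv."""
    return cif_text.replace(
        "_atom_site_U_iso_or_equiv", "_atom_site_B_iso_or_equiv"
    )


def u_from_b(b):
    """Convert a B value (A^2) to U (A^2): U = B / (8*pi^2)."""
    return b / (8.0 * math.pi ** 2)


# Tiny made-up CIF fragment, only to show the renaming.
fragment = "loop_\n_atom_site_label\n_atom_site_U_iso_or_equiv\nCU1 2.5\n"
print(rename_u_to_b(fragment))

# Values reported for CU1..CU3 of 4321814, now read as B values.
for label, b in (("CU1", 2.5), ("CU2", 2.9), ("CU3", 3.0)):
    print(label, f"{u_from_b(b):.7f}")
```

For B = 2.5 this gives U of about 0.0316629, in line with the
Uiso(CIF) column of the corrected listing.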

The 12 structures with enormous negative differences have, IMHO, a scale
problem in the Uij fields; probably the authors provided values of Uij *
10^4 or something like that (such conventions were usual in papers and in
old computer printouts in the 1990s). If we find a clear indication
of such scaling in the corresponding papers, we can correct the problem;
otherwise the Uij data are lost... I have inspected one such IUCr paper,
and the authors, unfortunately, do not mention their scaling convention
at all. We'll have to check the others.
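
Where the reported Uiso looks sane, the ratio Ueq(comp)/Uiso(CIF)
gives a hint of the scale factor. The Python sketch below rounds the
decimal logarithm of that ratio to the nearest integer; the function
name and the example numbers are hypothetical, and such a guess is
only a pointer for checking the paper, not a substitute for it.

```python
import math


def guess_power_of_ten(ueq_comp, uiso_cif):
    """Guess k such that the Uij values look multiplied by 10**k."""
    if ueq_comp <= 0 or uiso_cif <= 0:
        return None  # the ratio is meaningless for non-positive values
    return round(math.log10(ueq_comp / uiso_cif))


# Made-up example: Uij printed as U * 10^4, Uiso given in A^2.
print(guess_power_of_ten(380.0, 0.038))  # -> 4
```

A guess of 0 would mean the two values already agree in scale; a
consistent non-zero guess over all atoms of a structure would support
the scaling hypothesis.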

One structure, I have noticed, has too few values in the hydrogen ('H')
Uij fields, but, by accident, the number of columns in the loop matches.
This will be picked up by the validator which Antanas has written (we
now need to send off a paper about the validator, and then we can dig
deeper into the COD corrections).

My proposed corrections, based on your list, are noted in the .csv file
on my home page
(http://saulius-grazulis.lt/~saulius/.aa50f94063d8810bf957e3b39f733acd7474bc08/COD_Conv_Warnings_FIXES.csv).
Of course we'll scan the rest of the COD as well, as soon as I am sure
that I compute the Ueq's correctly :).

Sincerely,
Saulius


-- 
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Saulėtekio al. 7
LT-10257 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2234367 / phone (office): (+370-5)-2234353
mobile: (+370-684)-49802, (+370-614)-36366
