[Cod-bugs] COD conversion with HighScore

Thomas Degen thomas.degen at panalytical.com
Sat Jan 11 18:47:32 EET 2020


Hi Saulius,

> ... is the number of atoms that have Beq > 10?
No, it is what it says: it counts how often any of the anisotropic values (B11, B22, B33, B12, B23, B13) hits the "B >= 10" limit.
We are using B instead of U in the UI because these values are closer to 1, which makes them more convenient to look at.

> Could you please send me your code example
This is approximately the code to convert Banis to Biso:

      // Biso = (1/3) * sum_ij Bij * a*_i * a*_j * (a_i . a_j), specialised for each crystal system
      case FSpaceGroup.SimpleCrystalSystem of
        scTriclinic:
          begin
            Atom.Biso.Value := (1 / 3) *
              (Atom.b11.Value * Sqr(Cell.a * ReciprocalCell.a) + Atom.b22.Value * Sqr(Cell.b * ReciprocalCell.b)
              + Atom.b33.Value * Sqr(Cell.c * ReciprocalCell.c)
              + 2 * Atom.b12.Value * Cell.a * Cell.b * ReciprocalCell.a * ReciprocalCell.b * CosDeg(Cell.gamma)
              + 2 * Atom.b13.Value * Cell.a * Cell.c * ReciprocalCell.a * ReciprocalCell.c * CosDeg(Cell.beta)
              + 2 * Atom.b23.Value * Cell.b * Cell.c * ReciprocalCell.b * ReciprocalCell.c * CosDeg(Cell.alpha));
          end;
        scMonoclinic:
          begin
            case FSpaceGroup.SpaceGroupInfo._MonoclinicAxis of
              enmMonoclinicAxis.maB:
                Atom.Biso.Value := (1 / 3) * (Atom.b22.Value + 1 / Sqr(SinDeg(Cell.Beta)) * (Atom.b11.Value + Atom.b33.Value + 2 * Atom.b13.Value * CosDeg(Cell.Beta)));
              enmMonoclinicAxis.maC:
                Atom.Biso.Value := (1 / 3) * (Atom.b33.Value + 1 / Sqr(SinDeg(Cell.Gamma)) * (Atom.b11.Value + Atom.b22.Value + 2 * Atom.b12.Value * CosDeg(Cell.Gamma)));
              enmMonoclinicAxis.maA:
                // a-unique axis: b22 and b33 enter the 1/sin^2(alpha) term
                Atom.Biso.Value := (1 / 3) * (Atom.b11.Value + 1 / Sqr(SinDeg(Cell.Alpha)) * (Atom.b22.Value + Atom.b33.Value + 2 * Atom.b23.Value * CosDeg(Cell.Alpha)));
            end;
          end;
        scOrthorhombic,
        scTetragonal,
        scCubic:
          Atom.Biso.Value := (1 / 3) * (Atom.b11.Value + Atom.b22.Value + Atom.b33.Value);
        scTrigonalRhombAxes:
          Atom.Biso.Value := (1 / 3) * (Sqr(Cell.a * ReciprocalCell.a) * (Atom.b11.Value + Atom.b22.Value + Atom.b33.Value +
                                       2 * CosDeg(Cell.Alpha) * (Atom.b12.Value + Atom.b13.Value + Atom.b23.Value)));
        scHexagonal,
        scTrigonalHexAxes:
          Atom.Biso.Value := (1 / 3) * (Atom.b33.Value + (4 / 3) * (Atom.b11.Value + Atom.b22.Value - Atom.b12.Value));
      end;
      Atom.Biso.Deviation := ((Atom.b11.Deviation + Atom.b22.Deviation + Atom.b33.Deviation) / 3) * (1 / 6);
    end;
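For cross-checking the per-crystal-system branches above, here is a minimal Python sketch of the general formula they specialise, Ueq = (1/3) * sum_ij U^ij a*_i a*_j (a_i . a_j), written with the direct-cell metric tensor. The function name `ueq_from_uij` and the tuple layouts are assumptions for illustration, not HighScore or COD code. Because the formula is linear, passing B^ij values in the same convention returns Beq instead of Ueq.

```python
import math

def ueq_from_uij(u, cell):
    """Ueq = (1/3) * sum_ij U^ij * a*_i * a*_j * (a_i . a_j).

    u    -- (U11, U22, U33, U12, U13, U23) in the CIF U^ij convention
    cell -- (a, b, c, alpha, beta, gamma): lengths in Angstrom, angles in degrees
    Illustrative helper (hypothetical name); linear, so B^ij input gives Beq.
    """
    a, b, c, al, be, ga = cell
    ca, cb, cg = (math.cos(math.radians(t)) for t in (al, be, ga))
    sa, sb, sg = (math.sin(math.radians(t)) for t in (al, be, ga))
    # Cell volume, then reciprocal lengths a*, b*, c*
    vol = a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)
    recip = (b * c * sa / vol, a * c * sb / vol, a * b * sg / vol)
    # Direct-space metric tensor G_ij = a_i . a_j (scalar products of cell vectors)
    g = [[a * a, a * b * cg, a * c * cb],
         [a * b * cg, b * b, b * c * ca],
         [a * c * cb, b * c * ca, c * c]]
    u11, u22, u33, u12, u13, u23 = u
    umat = [[u11, u12, u13], [u12, u22, u23], [u13, u23, u33]]
    # Double sum over i, j: off-diagonal terms appear twice, as in the 2*b12 etc. terms
    return sum(umat[i][j] * recip[i] * recip[j] * g[i][j]
               for i in range(3) for j in range(3)) / 3.0
```

For an a-unique monoclinic cell this reproduces (1/3)[B11 + (B22 + B33 + 2 B23 cos alpha)/sin^2 alpha], and for a hexagonal cell (1/3)[B33 + (4/3)(B11 + B22 - B12)].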

Concerning the many patterns with so many large displacement parameters (which we don't see in other databases): my guess is that the units got confused, i.e. the data were not U but were given as B or beta instead (and simply wrongly flagged as U).

Best regards,

Thomas

-----Original Message-----
From: Saulius Gražulis <grazulis at ibt.lt>
Sent: Saturday, January 11, 2020 3:26 PM
To: Thomas Dortmann <thomas.dortmann at panalytical.com>; Thomas Degen <thomas.degen at panalytical.com>; cod-bugs at ibt.lt
Cc: Thomas Dortmann <thomas at tdsonline.nl>
Subject: Re: [Cod-bugs] COD conversion with HighScore

Hi, Thomas,

these are my thoughts regarding Biso and Uiso.

On 2020-01-08 18:22, Thomas Dortmann wrote:

> Please let me know when you need more information, or want additional
> checks during our conversions to find out more details.

I have written a small Perl program to calculate Ueq and Beq from the Uij and to output them along with the Uiso value given in the CIF (http://saulius-grazulis.lt/~saulius/.e6be37a23b470b0ded2e36c9d9a15ba529de29e9/).

I have calculated Ueq and Beq from the Uij tensors for all structures that were marked as having 'Large (>= 10) Banis values' in your list. Do I correctly understand that the first number, e.g. 51 in ': 51; No. of atoms: 289', is the number of atoms that have Beq > 10? If so, for some reason I get slightly smaller values than you; for example, for the COD structure 1000001 I get only 13 such values from the U_ij records; including the isotropic atoms (hydrogens), I get 141 atoms...

Could you please send me your code example (maybe as pseudo-code) showing how you calculate Ueq?

I admit that I am not sure whether my understanding of the CIF's Uij is correct. I consulted both Fischer1988 (10.1107/S0108270187012745) and Grosse-Kunstleve2002 (10.1107/S0021889802008580), but I am still not sure whether I get the orthogonalisation right. The Uij need to be on an orthogonal basis before taking the trace Tr(Uij) and getting Ueq = (1/3)*Tr(Uij); however, all orthogonalisations which I derive myself or pick from the papers turn out to be equivalent to just summing up the U_ii from the CIF:

Ueq = (U11 + U22 + U33)/3
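A sketch of the orthogonalisation route, as read from Grosse-Kunstleve & Adams (2002): scale U^CIF by N = diag(a*, b*, c*), transform with an orthogonalisation matrix A (here the common convention with a along x and b in the xy plane, which is an assumption), and take one third of the trace. Pure Python without NumPy; the function names are hypothetical. In this sketch the trace agrees with the plain (U11 + U22 + U33)/3 for orthogonal cells and for isotropic tensors, but not for an anisotropic atom in an oblique cell, which may be why the simple sum looked equivalent on near-orthogonal or nearly isotropic data.

```python
import math

def matmul(x, y):
    """3x3 matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(x):
    return [list(row) for row in zip(*x)]

def ueq_orthogonalised(u, cell):
    """Ueq = Tr(A N U N A^T) / 3 for u = (U11, U22, U33, U12, U13, U23)."""
    a, b, c, al, be, ga = cell
    ca, cb, cg = (math.cos(math.radians(t)) for t in (al, be, ga))
    sa, sb, sg = (math.sin(math.radians(t)) for t in (al, be, ga))
    vol = a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)
    # Orthogonalisation matrix: columns are the cell vectors in Cartesian axes
    amat = [[a, b * cg, c * cb],
            [0.0, b * sg, c * (ca - cb * cg) / sg],
            [0.0, 0.0, vol / (a * b * sg)]]
    # N = diag(a*, b*, c*); U* = N U^CIF N
    n = [b * c * sa / vol, a * c * sb / vol, a * b * sg / vol]
    u11, u22, u33, u12, u13, u23 = u
    ucif = [[u11, u12, u13], [u12, u22, u23], [u13, u23, u33]]
    ustar = [[n[i] * ucif[i][j] * n[j] for j in range(3)] for i in range(3)]
    ucart = matmul(matmul(amat, ustar), transpose(amat))
    return (ucart[0][0] + ucart[1][1] + ucart[2][2]) / 3.0

def ueq_naive(u):
    """Plain average of the diagonal CIF U^ii values."""
    return (u[0] + u[1] + u[2]) / 3.0
```

For example, for an anisotropic atom in a monoclinic cell with beta = 110 degrees the two functions differ by roughly 0.001 in Ueq, while for an isotropic tensor expressed in the same cell (U13 = -U cos beta) both return U.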

I also suspect that the definition on the IUCr page
(https://www.iucr.org/__data/iucr/cifdic_html/1/cif_core.dic/Iatom_site_U_iso_or_equiv.html)
is incorrect: the a values should be vectors, not vector lengths as stated, and they should form an (ai,aj) scalar product... (?!). I have not yet checked the Tables.
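For reference, the formula as given by Fischer & Tillmanns (1988), to the best of our reading, indeed uses a scalar product of the direct-cell vectors, with gamma_ij the angle between a_i and a_j (gamma_ii = 0); for an orthogonal cell a*_i a_i = 1 and the cross terms vanish, leaving the plain trace:

```latex
U_{\mathrm{eq}} = \frac{1}{3}\sum_{i=1}^{3}\sum_{j=1}^{3} U^{ij}\, a^{*}_{i}\, a^{*}_{j}\, (\mathbf{a}_i \cdot \mathbf{a}_j),
\qquad \mathbf{a}_i \cdot \mathbf{a}_j = a_i\, a_j \cos\gamma_{ij},
\qquad B_{\mathrm{eq}} = 8\pi^{2} U_{\mathrm{eq}}
```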

What formula do you use?

In any case, with this method I get values close to those given in the CIFs (mostly within the error margins) and reasonable Beq values. Thus I decided to move on, leaving the derivation of the maths for later :).

> 2. Anisotropic displacement parameters are way too big; often
> isotropic values don't match anisotropic values in these cases: We
> convert from Us or Betas to Bs; the warning identifies one or more
> Baniso values >= 10. This results in wrong phase quantitations with
> the Rietveld method. (We also recalculate Biso from the Banisos, when
> Biso is missing) This is the biggest group of CIFs generating warnings
> during the conversion: 123.717 warnings

As for the absolute values of Uiso or Biso, I have no special opinion, but I do not spot any big trouble either. Taking a random structure from your list:

> #@ label Uiso(CIF) Ueq(comp) Beq(comp) diff

> 7225385 C42      0.213    0.212333     16.7652 0.000666667
> 7225385 C41      0.159    0.158667     12.5278 0.000333333
> 7225385 O53      0.134    0.133667     10.5539 0.000333333
> 7225385 O43     0.1237    0.123667     9.76433 3.33333e-05

So the largest Beq is 16.8 Å^2, the next one is 12.5 Å^2, and from the fourth atom on they are below 10. In many random structures I observe
*all* Beq < 10, and all values seem consistent across the files and not too scattered. Granted, some B values are very large, over 100, but that is how the authors reported them. I do not see any way to rectify this short of re-determining the structure from a better crystal...
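The numbers in the table above follow from Beq = 8π² Ueq (8π² ≈ 78.96). A minimal Python sketch with hypothetical helper names; note that it counts atoms by their Beq (the per-atom reading), not individual Bij components:

```python
import math

EIGHT_PI_SQ = 8 * math.pi ** 2  # about 78.957; B = 8*pi^2 * U

def u_to_b(u):
    """Convert a U value (Angstrom^2) to B."""
    return EIGHT_PI_SQ * u

def b_to_u(b):
    """Convert a B value (Angstrom^2) to U."""
    return b / EIGHT_PI_SQ

def count_large_b(ueq_values, limit=10.0):
    """Number of atoms whose Beq = 8*pi^2*Ueq reaches the B >= limit threshold."""
    return sum(1 for u in ueq_values if u_to_b(u) >= limit)
```

With the four Ueq(comp) values of COD 7225385 listed above, `count_large_b` gives 3, matching the observation that the fourth atom is already below 10.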

In many structures there are just a few such atoms with large Biso values, and they are often in disordered regions; Beq values from ordered regions look normal to me, e.g.:

> saulius at varanas Uiso/ $ cod-download -s 7120853 | ./cif_Uiso | sort -gr -k4,4 | head -n 4
> C25B       2.119     2.00067     157.966    0.118333
> C19B       1.149           2     157.914      -0.851
> C26B       1.163       1.986     156.808      -0.823
> C6        1.13     1.95833     154.624   -0.828333
> saulius at varanas Uiso/ $ cod-download -s 7120853 | ./cif_Uiso | sort -gr -k4,4 | tail -n 8
> C41A       0.035   0.0373333     2.94772 -0.00233333
> C32A       0.037   0.0373333     2.94772 -0.000333333
> C30A       0.036       0.031     2.44766       0.005
> C29A       0.037   0.0356667     2.81613  0.00133333
> #SPCGRP   :P -1
> #FILENAME :-
> #DATABLCK :7120853
> #@ label Uiso(CIF) Ueq(comp) Beq(comp) diff

Thus, the largest B-factors are about 158 Å^2, but they are on the periphery of a flexible organic group; the core atoms have much lower Bs. Coming from the protein crystallography side, such a situation does not seem extraordinary to me. Granted, B = 160 Å^2 is too much, but to me this just means that the crystal was rather disordered (in that region?), or that this part of the molecule was not modelled well (the contrast of B-factors in this particular entry is quite high). Maybe constraints should have been applied during refinement.

Given that this is a published structure from a (reputable?) chemical journal (https://doi.org/10.1039/C7CC06797F), I do not see how we can do much about this, except by excluding such structures from computations (but we will probably lose a lot of data this way). Such is the state of the art in the chemical crystallography field (?).

> You find a list of all CIFs throwing warnings during conversion in the
> attached Excel sheet, ordered by the type of warning and the COD ID
> number. I hope this will help to correct errors in the COD and to make
> it better applicable through time.

What I agree is a clear sign of a technical problem is a large discrepancy between Ueq (calculated from the Uij tensor) and the Uiso or Biso provided by the authors.

I have spotted in the list that you have provided:

18 structures with at least one Uiso-Ueq value > 1.0;

12 structures with at least one Uiso-Ueq value < -10.0;

Most structures in these lists are *consistently* too high or too low; only two COD IDs in this list have atoms not in these extreme ranges.

16 of the structures from the 'Uiso-Ueq > 1.0' list seem to have Beq specified instead of Ueq, since the value is approx. 80 times too large (8π² ≈ 79); when corrected under this assumption, the values become reasonable and consistent:

> #DATABLCK :4321814
> #SPCGRP   :C m c 21
> #CELL     :11.889 33.380 9.012 90 90 90
> #@ label Uiso(CIF) Ueq(comp) Beq(comp) diff
> CU1         2.5   0.0316667      2.5003     2.46833
> CU2         2.9   0.0366667     2.89508     2.86333
> CU3           3       0.038     3.00036       2.962
> CU4         2.9   0.0366667     2.89508     2.86333
> CU5        2.91       0.037      2.9214       2.873

As you see, Uiso in the CIF is nearly exactly the same as the Beq calculated by my script from the Uij values.
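This check can be automated with a simple ratio test: if Uiso(CIF)/Ueq(comp) is close to 8π² ≈ 79, the value was most likely a B written under the U data name. A hypothetical heuristic sketch; the 5 % tolerance is an arbitrary assumption, not a COD or HighScore setting:

```python
import math

EIGHT_PI_SQ = 8 * math.pi ** 2  # ratio expected if a B value is stored as U

def looks_like_b_labelled_as_u(uiso_cif, ueq_calc, rel_tol=0.05):
    """True if the CIF 'Uiso' is about 8*pi^2 times the Ueq computed from Uij,
    i.e. the value is probably a B and not a U (hypothetical heuristic)."""
    if ueq_calc <= 0:
        return False  # no meaningful ratio without a positive Ueq
    return math.isclose(uiso_cif / ueq_calc, EIGHT_PI_SQ, rel_tol=rel_tol)
```

Applied to the CU1 to CU5 rows of 4321814 this returns True, while ordinary rows such as C41A of 7120853 (0.035 vs 0.0373) return False.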

I suggest fixing these structures by renaming the data name:
s/_atom_site_U_iso_or_equiv/_atom_site_B_iso_or_equiv/, viz.:

> saulius at varanas Uiso/ $ cod-download -s 4321814 | sed 's/_atom_site_U_iso_or_equiv/_atom_site_B_iso_or_equiv/' | ./cif_Uiso
> #FILENAME :-
> #DATABLCK :4321814
> #SPCGRP   :C m c 21
> #CELL     :11.889 33.380 9.012 90 90 90
> #@ label Uiso(CIF) Ueq(comp) Beq(comp) diff
> CU1   0.0316629   0.0316667      2.5003 -3.79678e-06
> CU2   0.0367289   0.0366667     2.89508 6.22624e-05
> CU3   0.0379954       0.038     3.00036 -4.55613e-06
> CU4   0.0367289   0.0366667     2.89508 6.22624e-05
> CU5   0.0368556       0.037      2.9214 -0.000144419

The table above shows that after this renaming the Uiso(CIF) values (computed from the corresponding _atom_site_B_iso_or_equiv values in the incoming CIF) are again nearly the same as the Ueq values computed from the Uij tensor.

The 12 structures with enormous negative differences have, IMHO, a scale problem in the Uij fields; probably the authors provided Uij * 10^4 or something like that (such conventions were common in papers and in old computer printouts in the 1990s). If we find a clear indication of such scaling in the corresponding papers, we can correct the problem; otherwise the Uij data are lost... I have inspected one such IUCr paper, and unfortunately the authors do not mention their scaling convention at all. We'll have to check the others.
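A rough way to shortlist candidate scale factors before checking the papers is to look for a power of ten 10^k that brings the Ueq computed from the Uij back in line with the authors' Uiso. Hypothetical helper; the tolerance and the range of k are assumptions, and a match only suggests a convention that still has to be confirmed from the publication:

```python
import math

def guess_power_of_ten_scale(uiso_cif, ueq_calc, max_power=5, rel_tol=0.1):
    """Return the smallest k in 1..max_power with ueq_calc / 10**k close to
    uiso_cif (within rel_tol), or None if no power of ten fits."""
    for k in range(1, max_power + 1):
        if math.isclose(ueq_calc / 10 ** k, uiso_cif, rel_tol=rel_tol):
            return k
    return None
```

For instance, with made-up values Uiso = 0.0345 and Ueq(comp) = 345.0 the helper returns 4, i.e. Uij apparently printed as Uij * 10^4.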

One structure, I have noticed, has too few values in the hydrogen ('H') Uij fields, but, by accident, the number of columns in the loop matches.
This will be picked up by the validator which Antanas has written (we now need to send away a paper about the validator, and then we can dig deeper into the COD corrections).

My proposed corrections, based on your list, are noted in the .csv file on my home page (http://saulius-grazulis.lt/~saulius/.aa50f94063d8810bf957e3b39f733acd7474bc08/COD_Conv_Warnings_FIXES.csv).
Of course we'll scan the rest of the COD as well, as soon as I am sure that I compute Ueq values correctly :).

Sincerely,
Saulius


--
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Saulėtekio al. 7
LT-10257 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2234367 / phone (office): (+370-5)-2234353
mobile: (+370-684)-49802, (+370-614)-36366
