From grazulis at ibt.lt  Tue Oct 14 18:34:56 2025
From: grazulis at ibt.lt (Saulius Gražulis)
Date: Tue, 14 Oct 2025 18:34:56 +0300
Subject: [Cod-bugs] Issues found in a systematic scanning of the COD
In-Reply-To:
References:
Message-ID: <012ba0dd-3c1d-4ee5-b052-827a18329827@ibt.lt>

Dear Pierre,

thank you for your email and for the list of spacegroup vs. symop
mismatches, and sorry for the delay with the answer. I've looked into
the log and below are my comments, based on my interpretation of this
log.

First of all, you are absolutely right to use symmetry operations
provided in the CIF if these are available and interpretable. This is
our standard procedure and recommendation to COD users. Symmetry
operations are the most versatile way to describe symmetry relations
in a computer- and human-readable form, so they can always be used. In
contrast, Hermann-Mauguin and even Hall symbols can give humans a
symbolic identification of the space group, but they lack a standard
way to express all varieties of settings, cell choices and origins
used nowadays by crystallographers.

The Hermann-Mauguin and Hall symbols /can/ be adapted to express
multiple (all?) non-standard settings and origins; for this, however,
a Change of Basis matrix must be used, or at least a Shift of Origin,
which might be sufficient in some cases. Unfortunately, there is no
standard way to encode these elements (I am seriously considering
writing up a standardisation proposal to IUCr...), but the current use
in published literature (even in the Tables!) attests the following
uses:

   -- Parse the Change of Basis (CoB) operator strings in the form
   -- "(a-b,a+b,c)" or "(x-y,x+y,z)" and return a matrix that encodes
   -- this operator. The CoB is described in [1,2].
   --
   -- The 'abc' form is a transpose inverse of the 'xyz' form.

The use cases and brief mention of these conventions can be found in
IUCr sources [1,3] or on the Web by the same authors [2].
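To make the two notations concrete, here is a minimal Python sketch of
such a parser. It is not the COD code (that is the Ada program [4]);
the names parse_cob and transpose_inverse are hypothetical, only
coefficient-first terms such as "2*a" or "-1/2*x" are handled, and
constant (origin shift) terms are collected but not converted between
the two forms.

```python
import re
from fractions import Fraction

# One term: optional sign, optional coefficient ("2", "1/2"), optional "*",
# optional symbol. Hypothetical helper pattern, not taken from the Ada code.
_TERM = re.compile(r"([+-]?)(\d+(?:/\d+)?)?\*?([abcxyz])?")


def parse_cob(text):
    """Return (kind, rows, shift) for a string like "(a-b,a+b,c)".

    kind is "abc" or "xyz"; rows holds the coefficients of each
    comma-separated component as one matrix row; shift holds constants.
    """
    parts = text.strip().strip("()").replace(" ", "").split(",")
    if len(parts) != 3:
        raise ValueError(f"expected three components in {text!r}")
    rows, shift, symbols = [], [], set()
    for part in parts:
        row, const, pos = [Fraction(0)] * 3, Fraction(0), 0
        while pos < len(part):
            match = _TERM.match(part, pos)
            sign, coef, sym = match.groups()
            if match.end() == pos or (coef is None and sym is None):
                raise ValueError(f"cannot parse {part!r}")
            value = Fraction(coef) if coef else Fraction(1)
            if sign == "-":
                value = -value
            if sym:
                symbols.add(sym)
                row["abcxyz".index(sym) % 3] += value
            else:
                const += value
            pos = match.end()
        rows.append(row)
        shift.append(const)
    if symbols and symbols <= set("abc"):
        kind = "abc"
    elif symbols and symbols <= set("xyz"):
        kind = "xyz"
    else:
        raise ValueError(f"mixed or missing symbols in {text!r}")
    return kind, rows, shift


def transpose_inverse(m):
    """Inverse of the transpose of a 3x3 matrix, in exact arithmetic."""
    (a, b, c), (d, e, f), (g, h, i) = [list(col) for col in zip(*m)]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("singular change-of-basis matrix")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


# The two operators quoted below for COD 1001841 describe the same change.
_, abc_rows, _ = parse_cob("(c,2*a+c,b)")
_, xyz_rows, _ = parse_cob("(-1/2*x+z,1/2*x,y)")
print(transpose_inverse(abc_rows) == xyz_rows)  # True
```

Exact fractions are used instead of floats so that the comparison of
the two forms is an equality test rather than a tolerance check.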
We in the COD adopted these conventions, and I wrote a small parser
for the space group symbol interpretation [4] to make these
conventions explicit. The code is in Ada which, although not in the
top 5 popularity list :), is very readable, standard and stable. To
see how we interpret the extended H-M and Hall space group symbol
strings, you may want to look into the Change_of_Basis Ada package
[5].

I would thus like to ask you what you mean by saying that the "list of
symmetries is obviously wrong or contradicts the Bravais symbol in the
group name"?

So, for example, in the COD 1001841 entry, which is mentioned first in
your log, the space group symbol derived from the symmetry operations
is 'P 2yb (-1/2*x+z,1/2*x,y)' (Hall), or 'P 1 21 1 (c,2*a+c,b)' (H-M).
Both symbols yield identical symmetry operations when decoded by my
program [4], and they coincide with the symmetry operation lists given
in the COD entry. The space group for this crystal is ITC No. 4, H-M
symbol 'P 21'. However, for whatever reason the authors chose to
represent the structure in a C-centered cell (probably to compare with
a similar C2 structure or something like that). They identified the
space group as 'C1121', which is of course a non-standard symbol and
cell choice for this space group, but that is what the authors
reported. When I calculate the crystal composition using the symmetry
operations and the atom list that the authors provided, I get the
summary formula "La3 O8 Re", exactly as the authors reported. Thus, I
conclude that this structure is reported the way it was intended by
the authors.

To interpret the space group symbols, it is not enough to take just
the Table symmetries for P1211 (which you list in your log); you need
to analyse the change-of-basis as well, that is, to interpret the
whole string 'P 1 21 1 (c,2*a+c,b)' given in the file; this will yield
four symmetry operations, not two.
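The "four, not two" count can be checked with a short Python sketch.
It rests on stated assumptions and is not the COD implementation:
operators are transformed as S' = V S V^-1 with V read row by row from
the 'xyz' string (the convention described in [2]), translations are
reduced modulo 1, and all function names are hypothetical. V^-1 is
taken as the transpose of the 'abc' rows, using the relation between
the two forms mentioned above.

```python
from fractions import Fraction as F

# V: rows of the 'xyz' operator (-1/2*x+z, 1/2*x, y).
V = ((F(-1, 2), F(0), F(1)), (F(1, 2), F(0), F(0)), (F(0), F(1), F(0)))
# V^-1: the 'abc' operator (c, 2*a+c, b) read row-wise, then transposed.
V_INV = ((F(0), F(2), F(0)), (F(0), F(0), F(1)), (F(1), F(1), F(0)))

IDENT = ((F(1), F(0), F(0)), (F(0), F(1), F(0)), (F(0), F(0), F(1)))
TWO_Y = ((F(-1), F(0), F(0)), (F(0), F(1), F(0)), (F(0), F(0), F(-1)))


def matmul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))


def matvec(m, v):
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def mod1(v):
    return tuple(x % 1 for x in v)


def transform(op):
    """Express a (rotation, translation) pair in the new basis."""
    rot, trans = op
    return matmul(matmul(V, rot), V_INV), mod1(matvec(V, trans))


def compose(p, q):
    """Apply q first, then p; translations are kept modulo 1."""
    shifted = matvec(p[0], q[1])
    return matmul(p[0], q[0]), mod1(tuple(a + b for a, b in zip(shifted, p[1])))


def close_group(generators):
    """Multiply operations together until no new ones appear."""
    ops = set(generators)
    while True:
        new = {compose(p, q) for p in ops for q in ops} - ops
        if not new:
            return ops
        ops |= new


def as_xyz(op):
    """Format an operation as an 'x,y,z'-style string."""
    out = []
    for row, t in zip(*op):
        s = "".join(("+" if c > 0 else "-")
                    + ("" if abs(c) == 1 else f"{abs(c)}*") + name
                    for c, name in zip(row, "xyz") if c != 0)
        if t:
            s += f"+{t}"
        out.append(s.lstrip("+"))
    return ",".join(out)


# P 1 21 1 in its standard setting: identity and the 2_1 screw along b.
standard = [(IDENT, (F(0), F(0), F(0))), (TWO_Y, (F(0), F(1, 2), F(0)))]
# Unit translations of the old cell; some become centering vectors.
lattice = [(IDENT, row) for row in IDENT]

ops = close_group([transform(op) for op in standard + lattice])
for line in sorted(as_xyz(op) for op in ops):
    print(line)
```

Under these assumptions the sketch prints four operations: x,y,z;
x+1/2,y+1/2,z; -x,-y,z+1/2; and -x+1/2,-y+1/2,z+1/2. The extra pair
comes from the old unit translation along a, which becomes the
(1/2,1/2,0) centering vector of the doubled cell. A CIF may write the
same operations with different but equivalent translation parts.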
It seems that many of your log entries are of a similar kind,
representing non-standard settings and cell choices.

In some cases, like for COD 1553126, you report duplicated symmetry
operations, while in fact they are not duplicated: in this entry, the
symmetry operations are present under two different data names
("_space_group_symop_operation_xyz" and "_symmetry_equiv_pos_as_xyz"),
one old and the other new. This is not a bug; both data names are
permitted in the CIF, as long as the symmetry operation lists are
equivalent. Please make sure that your software correctly chooses one
set of symmetry operations or the other (we suggest using the newer
CIF name if it exists, and falling back to the old one if the new one
is missing).

There are other cases where the same data names report different
symmetry operation sets, like in COD 1564490. This is indeed wrong,
thanks for spotting such cases! We will have to look into them
individually.

My plans are to update our space group symbol determination software
and to assign space group names with the change-of-basis operators to
all remaining COD entries that do not have these designations. Also,
we will check that the "_space_group_symop_operation_xyz" and
"_symmetry_equiv_pos_as_xyz" items, if they exist, always report
correct symmetry operations.

What you can do on your side is to make sure that your program picks
only one data item, "_space_group_symop_operation_xyz" or
"_symmetry_equiv_pos_as_xyz", the one that has a complete symmetry
operation list (i.e. contains the unity operation 'x,y,z' and the
specified symops form a group), and that the program either interprets
the change-of-basis operators or ignores space group symbols that have
them.

HTH,
Saulius

Refs.:

[1] Zwart, P. H.; Grosse-Kunstleve, R. W.; Lebedev, A. A.; Murshudov,
G. N. & Adams, P. D. (2007) Surprises and pitfalls arising from
(pseudo)symmetry. Acta Crystallographica Section D Biological
Crystallography 64(1), 99-107. International Union of Crystallography
(IUCr). DOI: https://doi.org/10.1107/s090744490705531x

[2] Sydney R. Hall, Ralf W. Grosse-Kunstleve (1996) "Concise
Space-Group Symbols". URL: https://cci.lbl.gov/sginfo/hall_symbols.html
[accessed: 2022-06-14T15:24+03:00]

[3] International Tables Volume B (2010), "Symmetry in reciprocal
space". Section 1.4., Appendix A1.4.2. Space-group symbols for numeric
and symbolic computations, URL:
https://onlinelibrary.wiley.com/iucr/itc/B/
[accessed: 2022-06-14T15:35+03:00]

[4] Gražulis, S. (2024) decode-Hall-symbol [computer software].
https://github.com/sauliusg/decode-Hall-symbol

[5] Gražulis, S. (2024) decode-Hall-symbol [computer software]. Ada
package 'Change_Of_Basis'.
https://github.com/sauliusg/decode-Hall-symbol/blob/master/src/change_of_basis.ads,
https://github.com/sauliusg/decode-Hall-symbol/blob/master/src/change_of_basis.adb

On 2025-09-02 11:03, Caussin, Pierre wrote:
>
> Hi there,
>
> I am working for Bruker and trying to better streamline the use of the
> COD in our search/match and semiquantitative software. One of the
> goals is being able to compute the peak positions and relative
> intensities and RIR of COD entries. We do this at CuKA1 wavelength for
> all the entries that have a structure and space group given (all
> entries except 1119 that have either no atomic coordinates and/or
> blank or unknown space group, which are currently ignored). This
> compilation is done using the symmetries stored in the CIF file.
>
> To be able to compute the selected entries at other wavelengths on the
> fly, we need the space group symmetries, without keeping the complete
> CIFs, which use over 26GB after ZIP compression. I have written code
> to create a table [space group HM name] => [table of symmetries]. This
> works well, but I have found 478 CIF files where the given HM name
> appears to contradict the list of symmetries. I enclose the diagnostic
> output of my program (plus my case-by-case comment "//"), which can be
> summarized in two cases:
>
>  1. The list of symmetries is obviously wrong or contradicts the
>     Bravais symbol in the group name
>  2. The list of symmetries is plausible but either is erroneous or
>     does not match the most common space group settings (axes and
>     origin). This is not a problem if I rely on the CIF list but
>     causes a conflict if I have other phases using the same space
>     group in the "usual" settings. I can handle this by adding a
>     character to the group name when the phase originates in the COD.
>
> Thank you for your attention, best regards,
>
> Pierre Caussin

-- 
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Saulėtekio al. 7
LT-10257 Vilnius, Lietuva (Lithuania)
mobile: (+370-684)-49802, (+370-614)-36366