[Cod-bugs] Corrupted files in COD

Steef Boerrigter sxmboer at gmail.com
Wed Feb 1 00:39:51 EET 2023


Hello,

I am currently developing a program in the D programming language
to read .cif files and process the contents to calculate various
things. I am sure I am just one of hundreds to have taken the
frustrating decision to try and write a comprehensive parser of "STAR"
formatted files.

During testing of my implementation, I came across two files that
clearly are corrupted. I deleted them on my mirror, re-synced and
received the exact same corrupted files. So, I am pretty sure the bit
rot is on the COD server.

The files are
7/70/81/7708164.cif which has zero bytes.
7/05/48/7054812.cif which goes into corruption at line 55186.
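
A minimal sketch of how such files might be flagged on a local mirror,
written in Python for brevity. The function name first_suspect_line is
hypothetical, and the check assumes that corruption shows up as NUL or
other control bytes, which may not match what is actually in
7054812.cif:

```python
from pathlib import Path

def first_suspect_line(path):
    """Return 0 for an empty file, the 1-based number of the first line
    holding an unexpected control byte, or None if nothing is flagged."""
    data = Path(path).read_bytes()
    if not data:
        return 0  # zero-byte file
    allowed = {9, 10, 13}  # tab, LF, CR are legitimate in CIF text
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        # Assumption: corruption appears as control bytes (NUL, DEL, ...).
        if any((b < 32 and b not in allowed) or b == 127 for b in line):
            return lineno
    return None
```

Running this over every .cif in the mirror and printing the paths with
a non-None result would give a first list of candidates to inspect by
hand.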

During testing, I further came across several hundred files that have
rather questionable formatting choices that I would argue are either
in violation of the CIF specification or stretch the rules to the
extent that it is almost impossible for any implementation to
interpret the data correctly.
To what extent are the maintainers interested in learning about my
findings and potentially amending the entries to fix them?

Just to name one example: apparently the program Maud writes the
space group operators in the format (see 3/50/01/3500127.cif)
1 '-x+0.25, -y+0.25, -z+0.25'
as opposed to
1 '-x+1/4, -y+1/4, -z+1/4'
To my knowledge, none of the IUCr CIF guidelines, specs, website, or
International Tables ever use the decimal format for the translations.
It is bad enough to have to program an exception to the standard
fractional notation, but what happens with the 1/3 translation? How
many decimals should that get in this format? Even worse, other
entries list the translation as +0.500, and I have seen '...z+1' and
z+7/6. I mean, why? These are all non-standard translational operators
that make it ridiculously difficult to map the operators to a space
group.
Allowing all these exceptions makes things very challenging. I would
argue it would improve the quality of the data if these types of
things were standardized.
To what extent would you be willing to receive my findings, or what are
the possibilities for me to suggest edits?
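
To illustrate the kind of normalization the operator variants above
force on a parser, here is a rough Python sketch (not the D code). The
names parse_component, format_component and normalize_operator are
hypothetical. It rests on two assumptions: decimal translations are
rounded to the nearest fraction with a denominator of at most 12, and
translations are reduced modulo 1, so that z+1 and z+7/6 become z and
z+1/6.

```python
import re
from fractions import Fraction

# One term: optional sign, optional number (fraction, decimal or
# integer), optional '*', optional axis letter.
_TERM = re.compile(r"([+-]?)(?:(\d+/\d+|\d+\.\d+|\d+)\*?)?([xyz]?)")

def parse_component(text, max_denominator=12):
    """Split one component such as '-x+0.25' into axis coefficients
    and a translation reduced to the range [0, 1)."""
    text = text.replace(" ", "").lower()
    coeffs = {"x": Fraction(0), "y": Fraction(0), "z": Fraction(0)}
    shift = Fraction(0)
    pos = 0
    while pos < len(text):
        m = _TERM.match(text, pos)
        sign, number, var = m.groups()
        if not number and not var:
            raise ValueError(f"cannot parse {text!r} at offset {pos}")
        value = Fraction(number) if number else Fraction(1)
        if sign == "-":
            value = -value
        if var:
            coeffs[var] += value
        else:
            shift += value
        pos = m.end()
    # Assumption: snap decimals such as 0.3333 to a nearby simple
    # fraction, then drop whole lattice translations.
    shift = shift.limit_denominator(max_denominator) % 1
    return coeffs, shift

def format_component(coeffs, shift):
    """Write a component back in fractional notation."""
    out = ""
    for var in "xyz":
        c = coeffs[var]
        if c == 0:
            continue
        sign = "-" if c < 0 else ("+" if out else "")
        mag = "" if abs(c) == 1 else f"{abs(c)}*"
        out += f"{sign}{mag}{var}"
    if shift:
        out += ("+" if out else "") + str(shift)
    return out or "0"

def normalize_operator(op, max_denominator=12):
    """Normalize an operator string such as '-x+0.25, -y+0.25, -z+0.25'."""
    parts = op.strip().strip("'\"").split(",")
    if len(parts) != 3:
        raise ValueError(f"expected three components in {op!r}")
    return ", ".join(
        format_component(*parse_component(p, max_denominator)) for p in parts
    )

print(normalize_operator("'-x+0.25, -y+0.25, -z+0.25'"))  # -x+1/4, -y+1/4, -z+1/4
print(normalize_operator("x, y, z+7/6"))                  # x, y, z+1/6
```

The denominator limit is exactly the guesswork the decimal format
imposes: 0.3333 only becomes 1/3 because the sketch chooses to round,
which a file using fractional notation would never require.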

Best Regards,
Steef Boerrigter, PhD.
Triclinic Labs / Purdue University, West Lafayette, IN, USA

