[Cod-bugs] Fwd: special characters (0x1b, 0x07) in CIF files

Antanas Vaitkus antanas.vaitkus90 at gmail.com
Wed Dec 11 07:04:43 EET 2019


---------- Forwarded message ---------
From: Antanas Vaitkus <antanas.vaitkus90 at gmail.com>
Date: Wed, 11 Dec 2019 at 07:04
Subject: Re: [Cod-bugs] special characters (0x1b, 0x07) in CIF files
To: Marcin Wojdyr <wojdyr at gmail.com>


Dear Marcin Wojdyr,

thank you for informing us of this issue. The special characters were most
likely introduced by the original publisher of the CIF file. For example,
the original file of COD entry 4089313 (located at
https://pubs.acs.org/doi/suppl/10.1021/om010651j/suppl_file/om010651j.cif)
contains the same syntax errors as the entry in the COD.

Normally, during our automatic deposition workflow such symbols would be
detected and encoded using their hex codes (e.g. "#x001B;"). However, in
these particular cases, a slightly older version of our software must have
been used which did not properly handle some of the lower-numbered ASCII
control characters. We will fix the corrupted files as soon as possible and
deploy the updated version of the software to avoid such discrepancies in
the future.
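
As a rough illustration of the kind of check described above, here is a
minimal Python sketch. It is not the actual COD deposition code: the function
names are hypothetical, the set of characters treated as "special" is an
assumption, and the escape format simply mirrors the "#x001B;" form quoted
above.

```python
import re

# Assumption: "special" means ASCII control characters other than
# tab (0x09), line feed (0x0A) and carriage return (0x0D), plus DEL (0x7F).
_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def find_control_chars(text):
    """Return (line, column, code point) for each control character.

    Line and column numbers are 1-based. Lines are split on "\n" only,
    because str.splitlines() would also split on some control characters.
    """
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for match in _CONTROL.finditer(line):
            hits.append((lineno, match.start() + 1, ord(match.group())))
    return hits


def encode_control_chars(text):
    """Replace each control character with a hex code such as '#x001B;'."""
    return _CONTROL.sub(lambda m: "#x%04X;" % ord(m.group()), text)


if __name__ == "__main__":
    sample = "_diffrn_radiation_type MoK\x1b$B%(\x1b(Ba"
    print(find_control_chars(sample))
    print(encode_control_chars(sample))
```

Running the detection step before the encoding step makes it possible to log
which entries were affected before their contents are rewritten.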

Thanks again for the report.

On Tue, 10 Dec 2019 at 22:46, Marcin Wojdyr <wojdyr at gmail.com> wrote:

> Hi,
>
> I downloaded COD a few days ago and I noticed that some files fail to
> parse for me because of special characters, mostly ESC. Below is the full
> list.
> For example:
> _diffrn_radiation_type           MoK^[$B%(^[(Ba
> (but ^[ is ESC code 0x1b in the file)
>
> Do you know what program writes these characters?
>
> Cheers,
> Marcin
>
> $ time find ../cod/cif/ -name \*.cif | xargs -n1000 ./build/gemmi validate
> ../cod/cif/4/08/93/4089313.cif:58:36(2271): parse error
> ../cod/cif/4/08/93/4089312.cif:58:36(2274): parse error
> ../cod/cif/4/08/93/4089320.cif:119:39(4625): parse error
> ../cod/cif/4/08/93/4089309.cif:59:36(2363): parse error
> ../cod/cif/4/08/93/4089306.cif:59:36(2380): parse error
> ../cod/cif/4/08/93/4089318.cif:54:33(2044): expected value
> ../cod/cif/4/08/93/4089319.cif:55:33(2098): expected value
> ../cod/cif/4/08/93/4089317.cif:58:36(2284): parse error
> ../cod/cif/4/08/93/4089311.cif:59:36(2370): parse error
> ../cod/cif/4/08/93/4089315.cif:58:36(2276): parse error
> ../cod/cif/4/08/93/4089314.cif:58:36(2275): parse error
> ../cod/cif/4/08/93/4089307.cif:59:36(2366): parse error
> ../cod/cif/4/08/93/4089310.cif:59:36(2370): parse error
> ../cod/cif/4/08/93/4089316.cif:58:36(2293): parse error
> ../cod/cif/4/08/93/4089308.cif:59:36(2357): parse error
> ../cod/cif/4/08/97/4089713.cif:60:33(2289): expected value
> ../cod/cif/7/12/54/7125471.cif:68:36(2652): parse error
> ../cod/cif/7/12/54/7125469.cif:70:36(2706): parse error
>
> real 13m47.423s
> user 10m38.349s
> sys 0m38.298s
>
> --
> This message has been scanned for viruses and
> dangerous content by *MailScanner* <http://www.mailscanner.info/>, and is
> believed to be clean.
> _______________________________________________
> Cod-bugs mailing list
> Cod-bugs at lists.crystallography.net
> http://lists.crystallography.net/cgi-bin/mailman/listinfo/cod-bugs
>


-- 
Antanas Vaitkus,
PhD student at Vilnius University Institute of Biotechnology,
room V325, Saulėtekio al. 7,
LT-10257 Vilnius, Lithuania





