[Cod-bugs] special characters (0x1b, 0x07) in CIF files

Marcin Wojdyr wojdyr at gmail.com
Wed Dec 11 15:34:44 EET 2019


Thanks a lot!

Initially the parser in gemmi failed on any non-ASCII character,
but then I came across a file from wwPDB that had such a character
(a non-breaking space) in a quoted string, so I decided to add an
exception for quoted strings. Still, the validator should indeed
report such things.
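
To make the kind of check under discussion concrete, here is a minimal
sketch of a byte-level scan. It is not gemmi's implementation, and it
does not reproduce the quoted-string exception; the function name and
the allow_non_ascii switch are made up for illustration. It assumes the
CIF 1.1 character set: printable ASCII plus tab, line feed and carriage
return.

```python
# Control bytes assumed to be legal in CIF 1.1: HT, LF, CR.
ALLOWED_CONTROL = {0x09, 0x0A, 0x0D}


def find_illegal_chars(data: bytes, allow_non_ascii: bool = False):
    """Return (line, column, byte_value) for every byte outside the allowed set.

    Hypothetical helper: lines and columns are 1-based, and columns count
    bytes, not characters. No tokenizing is done, so quoted strings get no
    special treatment.
    """
    hits = []
    line, col = 1, 0
    for b in data:
        col += 1
        is_control = (b < 0x20 and b not in ALLOWED_CONTROL) or b == 0x7F
        is_non_ascii = b >= 0x80
        if is_control or (is_non_ascii and not allow_non_ascii):
            hits.append((line, col, b))
        if b == 0x0A:  # a line feed starts a new line
            line, col = line + 1, 0
    return hits


if __name__ == "__main__":
    # An ACK byte (0x06) inside a numeric value, and a UTF-8 encoded
    # non-breaking space inside a quoted string.
    sample = b"data_x\n_refine_diff_density_rms 0.0\x065\n_a 'b\xc2\xa0c'\n"
    for line, col, byte in find_illegal_chars(sample):
        print(f"{line}:{col}: illegal byte 0x{byte:02x}")
```

Passing allow_non_ascii=True reports only the control characters, which
is a coarser relaxation than allowing non-ASCII in quoted strings only.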

Best wishes,
Marcin

On Wed, 11 Dec 2019 at 13:37, Antanas Vaitkus
<antanas.vaitkus90 at gmail.com> wrote:
>
> Dear Marcin Wojdyr,
>
> as of COD revision r245002 the issues you outlined are considered resolved.
>
> I would also like to note that, while reparsing the entire COD, we discovered several more COD entries with illegal ASCII characters that were not picked up by your software.
> A representative list of such structures:
> https://www.crystallography.net/cod/4350338.cif@239844 -- contains the ACK symbol in the value of the '_refine_diff_density_rms' data item;
> https://www.crystallography.net/cod/4089334.cif@243612 -- contains the SOH symbol in the value of the '_refine_diff_density_rms' data item.
>
> The '@' postfix points to the specific SVN revision where the file still contained the error. Just pointing this out in case you would find these examples useful in testing your software.
>
> Sincerely,
> Antanas Vaitkus
>
>
> On Wed, 11 Dec 2019 at 07:08, Antanas Vaitkus <antanas.vaitkus90 at gmail.com> wrote:
>>
>> Dear Marcin Wojdyr,
>>
>> currently, the naming conventions of multi-block hkl files are a little inconsistent in the COD. However, I do agree that we should at least avoid duplicate data names. We will fix this issue as soon as possible.
>>
>> As for hkl entry 4115482, it seems to contain a CIF syntax error that our parser did not properly detect. We will definitely investigate that.
>>
>> Sincerely,
>> Antanas Vaitkus
>>
>> On Tue, 10 Dec 2019 at 22:46, Marcin Wojdyr <wojdyr at gmail.com> wrote:
>>>
>>>
>>> and four hkl files with different syntax problems:
>>>
>>> $ time find ../cod/hkl/ -name \*.hkl | xargs -n1000 ./build/gemmi validate
>>> ../cod/hkl/2/00/88/2008821.hkl: duplicate block name: 2008821_Fobs
>>> ../cod/hkl/4/11/54/4115482.hkl:27:0(860): parse error
>>> ../cod/hkl/4/11/75/4117532.hkl: duplicate block name: 4117532_diffractogram_1
>>> ../cod/hkl/4/11/75/4117533.hkl: duplicate block name: 4117532_diffractogram_1
>>>
>>> real 2m27.263s
>>> user 1m41.871s
>>> sys 0m6.641s
>>>
>>> --
>>> This message has been scanned for viruses and
>>> dangerous content by MailScanner, and is
>>> believed to be clean.
>>> _______________________________________________
>>> Cod-bugs mailing list
>>> Cod-bugs at lists.crystallography.net
>>> http://lists.crystallography.net/cgi-bin/mailman/listinfo/cod-bugs
>>
>>
>>
>> --
>> Antanas Vaitkus,
>> PhD student at Vilnius University Institute of Biotechnology,
>> room V325, Saulėtekio al. 7,
>> LT-10257 Vilnius, Lithuania
>>
>>
>
>
> --
> Antanas Vaitkus,
> PhD student at Vilnius University Institute of Biotechnology,
> room V325, Saulėtekio al. 7,
> LT-10257 Vilnius, Lithuania
>
>
