From wojdyr at gmail.com Tue Dec 10 13:13:42 2019
From: wojdyr at gmail.com (Marcin Wojdyr)
Date: Tue, 10 Dec 2019 12:13:42 +0100
Subject: [Cod-bugs] special characters (0x1b, 0x07) in CIF files
Message-ID:

Hi,

I downloaded COD a few days ago and I noticed that some files fail to parse for me because of special characters, mostly ESC. Below is the full list.
For example:
_diffrn_radiation_type MoK^[$B%(^[(Ba
(but ^[ is ESC code 0x1b in the file)

Do you know what program writes these characters?

Cheers,
Marcin

$ time find ../cod/cif/ -name \*.cif | xargs -n1000 ./build/gemmi validate
../cod/cif/4/08/93/4089313.cif:58:36(2271): parse error
../cod/cif/4/08/93/4089312.cif:58:36(2274): parse error
../cod/cif/4/08/93/4089320.cif:119:39(4625): parse error
../cod/cif/4/08/93/4089309.cif:59:36(2363): parse error
../cod/cif/4/08/93/4089306.cif:59:36(2380): parse error
../cod/cif/4/08/93/4089318.cif:54:33(2044): expected value
../cod/cif/4/08/93/4089319.cif:55:33(2098): expected value
../cod/cif/4/08/93/4089317.cif:58:36(2284): parse error
../cod/cif/4/08/93/4089311.cif:59:36(2370): parse error
../cod/cif/4/08/93/4089315.cif:58:36(2276): parse error
../cod/cif/4/08/93/4089314.cif:58:36(2275): parse error
../cod/cif/4/08/93/4089307.cif:59:36(2366): parse error
../cod/cif/4/08/93/4089310.cif:59:36(2370): parse error
../cod/cif/4/08/93/4089316.cif:58:36(2293): parse error
../cod/cif/4/08/93/4089308.cif:59:36(2357): parse error
../cod/cif/4/08/97/4089713.cif:60:33(2289): expected value
../cod/cif/7/12/54/7125471.cif:68:36(2652): parse error
../cod/cif/7/12/54/7125469.cif:70:36(2706): parse error

real 13m47.423s
user 10m38.349s
sys 0m38.298s

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.
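
Control characters like the ones reported above can also be located without a CIF parser, by scanning the raw bytes. The following is a minimal Python sketch, not part of gemmi or of the COD tools. It assumes that tab, LF and CR are the only acceptable control bytes; the function names and the sample line are hypothetical, and the sample's spacing differs from the real files.

```python
from pathlib import Path

# Assumption: tab, LF and CR are the only control bytes treated as acceptable.
ALLOWED_CONTROL = {0x09, 0x0A, 0x0D}


def find_control_bytes(data: bytes):
    """Return (line, column, byte) for each disallowed control byte in data.

    Line and column are 1-based; the column counts bytes, not characters.
    """
    hits = []
    line, col = 1, 0
    for b in data:
        col += 1
        if (b < 0x20 and b not in ALLOWED_CONTROL) or b == 0x7F:
            hits.append((line, col, b))
        if b == 0x0A:
            # A line feed ends the current line; restart the column count.
            line, col = line + 1, 0
    return hits


def scan_tree(root, pattern="*.cif"):
    """Yield (path, line, column, byte) for every hit in files under root."""
    for path in sorted(Path(root).rglob(pattern)):
        for line, col, b in find_control_bytes(path.read_bytes()):
            yield path, line, col, b


if __name__ == "__main__":
    # Hypothetical sample modelled on the line quoted in the report.
    sample = b"_diffrn_radiation_type MoK\x1b$B%(\x1b(Ba\n"
    for line, col, b in find_control_bytes(sample):
        print(f"line {line}, column {col}: 0x{b:02x}")
```

Calling scan_tree("../cod/cif") would walk a local copy of the tree in the same way as the find command above. Working on bytes rather than decoded text keeps the check independent of the file's encoding.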

From wojdyr at gmail.com Tue Dec 10 16:40:26 2019
From: wojdyr at gmail.com (Marcin Wojdyr)
Date: Tue, 10 Dec 2019 15:40:26 +0100
Subject: [Cod-bugs] special characters (0x1b, 0x07) in CIF files
In-Reply-To:
References:
Message-ID:

and four hkl files with different syntax problems:

$ time find ../cod/hkl/ -name \*.hkl | xargs -n1000 ./build/gemmi validate
../cod/hkl/2/00/88/2008821.hkl: duplicate block name: 2008821_Fobs
../cod/hkl/4/11/54/4115482.hkl:27:0(860): parse error
../cod/hkl/4/11/75/4117532.hkl: duplicate block name: 4117532_diffractogram_1
../cod/hkl/4/11/75/4117533.hkl: duplicate block name: 4117532_diffractogram_1

real 2m27.263s
user 1m41.871s
sys 0m6.641s

From antanas.vaitkus90 at gmail.com Wed Dec 11 07:08:49 2019
From: antanas.vaitkus90 at gmail.com (Antanas Vaitkus)
Date: Wed, 11 Dec 2019 07:08:49 +0200
Subject: [Cod-bugs] special characters (0x1b, 0x07) in CIF files
In-Reply-To:
References:
Message-ID:

Dear Marcin Wojdyr,

currently, the naming conventions of multi-block hkl files are a little inconsistent in the COD. However, I do agree that we should at least avoid duplicate data block names. We will fix this issue as soon as possible.

As for hkl entry 4115482, it seems to contain a CIF syntax error that our parser did not properly detect. We will definitely investigate that.
Sincerely,
Antanas Vaitkus

On Tue, 10 Dec 2019 at 22:46, Marcin Wojdyr wrote:

> and four hkl files with different syntax problems:
>
> $ time find ../cod/hkl/ -name \*.hkl | xargs -n1000 ./build/gemmi validate
> ../cod/hkl/2/00/88/2008821.hkl: duplicate block name: 2008821_Fobs
> ../cod/hkl/4/11/54/4115482.hkl:27:0(860): parse error
> ../cod/hkl/4/11/75/4117532.hkl: duplicate block name: 4117532_diffractogram_1
> ../cod/hkl/4/11/75/4117533.hkl: duplicate block name: 4117532_diffractogram_1
>
> real 2m27.263s
> user 1m41.871s
> sys 0m6.641s
>
> _______________________________________________
> Cod-bugs mailing list
> Cod-bugs at lists.crystallography.net
> http://lists.crystallography.net/cgi-bin/mailman/listinfo/cod-bugs

--
Antanas Vaitkus,
PhD student at Vilnius University Institute of Biotechnology,
room V325, Saulėtekio al. 7,
LT-10257 Vilnius, Lithuania

From antanas.vaitkus90 at gmail.com Wed Dec 11 07:04:43 2019
From: antanas.vaitkus90 at gmail.com (Antanas Vaitkus)
Date: Wed, 11 Dec 2019 07:04:43 +0200
Subject: [Cod-bugs] Fwd: special characters (0x1b, 0x07) in CIF files
In-Reply-To:
References:
Message-ID:

---------- Forwarded message ---------
From: Antanas Vaitkus
Date: Wed, 11 Dec 2019 at 07:04
Subject: Re: [Cod-bugs] special characters (0x1b, 0x07) in CIF files
To: Marcin Wojdyr

Dear Marcin Wojdyr,

thank you for informing us of this issue.

The special characters were most likely introduced by the original publisher of the CIF file. For example, the original file of COD entry 4089313 (located at https://pubs.acs.org/doi/suppl/10.1021/om010651j/suppl_file/om010651j.cif) contains the same syntax errors as the entry in the COD.

Normally, during our automatic deposition workflow such symbols would be detected and encoded using their hex codes (e.g. "#x001B;"). However, in these particular cases, a slightly older version of our software must have been used which did not properly handle some of the lower-numbered ASCII symbols.

We will fix the corrupted files as soon as possible as well as deploy the updated version of the software to avoid such discrepancies in the future.

Thanks again for the report.

On Tue, 10 Dec 2019 at 22:46, Marcin Wojdyr wrote:

> Hi,
>
> I downloaded COD a few days ago and I noticed that some files fail to
> parse for me because of special characters, mostly ESC. Below is the full
> list.
> For example:
> _diffrn_radiation_type MoK^[$B%(^[(Ba
> (but ^[ is ESC code 0x1b in the file)
>
> Do you know what program writes these characters?
>
> Cheers,
> Marcin
>
> $ time find ../cod/cif/ -name \*.cif | xargs -n1000 ./build/gemmi validate
> ../cod/cif/4/08/93/4089313.cif:58:36(2271): parse error
> ../cod/cif/4/08/93/4089312.cif:58:36(2274): parse error
> ../cod/cif/4/08/93/4089320.cif:119:39(4625): parse error
> ../cod/cif/4/08/93/4089309.cif:59:36(2363): parse error
> ../cod/cif/4/08/93/4089306.cif:59:36(2380): parse error
> ../cod/cif/4/08/93/4089318.cif:54:33(2044): expected value
> ../cod/cif/4/08/93/4089319.cif:55:33(2098): expected value
> ../cod/cif/4/08/93/4089317.cif:58:36(2284): parse error
> ../cod/cif/4/08/93/4089311.cif:59:36(2370): parse error
> ../cod/cif/4/08/93/4089315.cif:58:36(2276): parse error
> ../cod/cif/4/08/93/4089314.cif:58:36(2275): parse error
> ../cod/cif/4/08/93/4089307.cif:59:36(2366): parse error
> ../cod/cif/4/08/93/4089310.cif:59:36(2370): parse error
> ../cod/cif/4/08/93/4089316.cif:58:36(2293): parse error
> ../cod/cif/4/08/93/4089308.cif:59:36(2357): parse error
> ../cod/cif/4/08/97/4089713.cif:60:33(2289): expected value
> ../cod/cif/7/12/54/7125471.cif:68:36(2652): parse error
> ../cod/cif/7/12/54/7125469.cif:70:36(2706): parse error
>
> real 13m47.423s
> user 10m38.349s
> sys 0m38.298s

--
Antanas Vaitkus,
PhD student at Vilnius University Institute of Biotechnology,
room V325, Saulėtekio al. 7,
LT-10257 Vilnius, Lithuania

From gombotz at tugraz.at Wed Dec 11 11:02:55 2019
From: gombotz at tugraz.at (Maria Gombotz)
Date: Wed, 11 Dec 2019 10:02:55 +0100
Subject: [Cod-bugs] Problem with activation
Message-ID: <9221ff53-cb89-f170-ecc9-d6e08e64491e@tugraz.at>

Dear Madam or Sir,

I tried on Monday to sign up at the Crystallography Open Database, but I have not gotten an activation E-Mail yet.
Can you help me out?

Kind regards,
Maria Gombotz

--
*Dipl.-Ing. Maria Gombotz*
Graz University of Technology
Institute for Chemistry and Technology of Materials (ICTM)
Stremayrgasse 9/II, 8010 Graz
Austria
Tel.: +43 316 873 32353

From antanas.vaitkus90 at gmail.com Wed Dec 11 14:37:21 2019
From: antanas.vaitkus90 at gmail.com (Antanas Vaitkus)
Date: Wed, 11 Dec 2019 14:37:21 +0200
Subject: [Cod-bugs] special characters (0x1b, 0x07) in CIF files
In-Reply-To:
References:
Message-ID:

Dear Marcin Wojdyr,

as of COD revision r245002 the issues you outlined are considered resolved.

I would also like to note that during the reparsing of the entire COD we discovered several more COD entries with illegal ASCII characters that were not picked up by your software.
A representative list of such structures:
https://www.crystallography.net/cod/4350338.cif@239844 -- contains the ACK symbol in the value of the '_refine_diff_density_rms' data item;
https://www.crystallography.net/cod/4089334.cif@243612 -- contains the SOH symbol in the value of the '_refine_diff_density_rms' data item.

The '@' postfix points to the specific SVN revision where the file still contained the error. Just pointing this out in case you would find these examples useful in testing your software.

Sincerely,
Antanas Vaitkus

On Wed, 11 Dec 2019 at 07:08, Antanas Vaitkus wrote:

> Dear Marcin Wojdyr,
>
> currently, the naming conventions of multi-block hkl files are a little
> inconsistent in the COD. However, I do agree that we should at least avoid
> duplicate data block names. We will fix this issue as soon as possible.
>
> As for hkl entry 4115482, it seems to contain a CIF syntax error that our
> parser did not properly detect. We will definitely investigate that.
>
> Sincerely,
> Antanas Vaitkus
>
> On Tue, 10 Dec 2019 at 22:46, Marcin Wojdyr wrote:
>
>> and four hkl files with different syntax problems:
>>
>> $ time find ../cod/hkl/ -name \*.hkl | xargs -n1000 ./build/gemmi validate
>> ../cod/hkl/2/00/88/2008821.hkl: duplicate block name: 2008821_Fobs
>> ../cod/hkl/4/11/54/4115482.hkl:27:0(860): parse error
>> ../cod/hkl/4/11/75/4117532.hkl: duplicate block name: 4117532_diffractogram_1
>> ../cod/hkl/4/11/75/4117533.hkl: duplicate block name: 4117532_diffractogram_1
>>
>> real 2m27.263s
>> user 1m41.871s
>> sys 0m6.641s
>>
> --
> Antanas Vaitkus,
> PhD student at Vilnius University Institute of Biotechnology,
> room V325, Saulėtekio al. 7,
> LT-10257 Vilnius, Lithuania

--
Antanas Vaitkus,
PhD student at Vilnius University Institute of Biotechnology,
room V325, Saulėtekio al. 7,
LT-10257 Vilnius, Lithuania

From wojdyr at gmail.com Wed Dec 11 15:34:44 2019
From: wojdyr at gmail.com (Marcin Wojdyr)
Date: Wed, 11 Dec 2019 14:34:44 +0100
Subject: [Cod-bugs] special characters (0x1b, 0x07) in CIF files
In-Reply-To:
References:
Message-ID:

Thanks a lot!

Initially the parser in gemmi was failing on any non-ASCII characters, but then I came across a file from wwPDB that had such a character (non-breaking space) in a quoted string, and I decided to add an exception for quoted strings. Although indeed the validator should report such things.

Best wishes,
Marcin

On Wed, 11 Dec 2019 at 13:37, Antanas Vaitkus wrote:
>
> Dear Marcin Wojdyr,
>
> as of COD revision r245002 the issues you outlined are considered resolved.
>
> I would also like to note that during the reparsing of the entire COD we discovered several more COD entries with illegal ASCII characters that were not picked up by your software.
> A representative list of such structures:
> https://www.crystallography.net/cod/4350338.cif@239844 -- contains the ACK symbol in the value of the '_refine_diff_density_rms' data item;
> https://www.crystallography.net/cod/4089334.cif@243612 -- contains the SOH symbol in the value of the '_refine_diff_density_rms' data item.
>
> The '@' postfix points to the specific SVN revision where the file still contained the error. Just pointing this out in case you would find these examples useful in testing your software.
>
> Sincerely,
> Antanas Vaitkus
>
>
> On Wed, 11 Dec 2019 at 07:08, Antanas Vaitkus wrote:
>>
>> Dear Marcin Wojdyr,
>>
>> currently, the naming conventions of multi-block hkl files are a little inconsistent in the COD. However, I do agree that we should at least avoid duplicate data block names. We will fix this issue as soon as possible.
>>
>> As for hkl entry 4115482, it seems to contain a CIF syntax error that our parser did not properly detect. We will definitely investigate that.
>>
>> Sincerely,
>> Antanas Vaitkus
>>
>> On Tue, 10 Dec 2019 at 22:46, Marcin Wojdyr wrote:
>>>
>>> and four hkl files with different syntax problems:
>>>
>>> $ time find ../cod/hkl/ -name \*.hkl | xargs -n1000 ./build/gemmi validate
>>> ../cod/hkl/2/00/88/2008821.hkl: duplicate block name: 2008821_Fobs
>>> ../cod/hkl/4/11/54/4115482.hkl:27:0(860): parse error
>>> ../cod/hkl/4/11/75/4117532.hkl: duplicate block name: 4117532_diffractogram_1
>>> ../cod/hkl/4/11/75/4117533.hkl: duplicate block name: 4117532_diffractogram_1
>>>
>>> real 2m27.263s
>>> user 1m41.871s
>>> sys 0m6.641s
>>
>> --
>> Antanas Vaitkus,
>> PhD student at Vilnius University Institute of Biotechnology,
>> room V325, Saulėtekio al. 7,
>> LT-10257 Vilnius, Lithuania
>
> --
> Antanas Vaitkus,
> PhD student at Vilnius University Institute of Biotechnology,
> room V325, Saulėtekio al. 7,
> LT-10257 Vilnius, Lithuania

From andrius.merkys at gmail.com Wed Dec 11 16:10:34 2019
From: andrius.merkys at gmail.com (Andrius Merkys)
Date: Wed, 11 Dec 2019 16:10:34 +0200
Subject: [Cod-bugs] Problem with activation
In-Reply-To: <9221ff53-cb89-f170-ecc9-d6e08e64491e@tugraz.at>
References: <9221ff53-cb89-f170-ecc9-d6e08e64491e@tugraz.at>
Message-ID: <2cb5bb84-3536-48e6-08c7-ee946c394507@gmail.com>

Dear Maria,

We at the COD have had problems with the e-mail server since this weekend, but they have been resolved as of today. Have you by any chance received the e-mail already? If not, could you please tell me the username with which you registered? I will resend you the activation link.

Best wishes,
Andrius

On 2019-12-11 11:02, Maria Gombotz wrote:
>
> Dear Madam or Sir,
>
> I tried on Monday to sign up at the Crystallography Open Database, but
> I have not gotten an activation E-Mail yet.
> Can you help me out?
>
> Kind regards,
> Maria Gombotz
>

--
Andrius Merkys
Vilnius University Institute of Biotechnology, Saulėtekio al. 7, room V325
LT-10257 Vilnius, Lithuania

From andrius.merkys at gmail.com Wed Dec 11 16:23:52 2019
From: andrius.merkys at gmail.com (Andrius Merkys)
Date: Wed, 11 Dec 2019 16:23:52 +0200
Subject: [Cod-bugs] Problem with activation
In-Reply-To: <5dde320d-8b74-9478-4567-621ccf5742a3@tugraz.at>
References: <9221ff53-cb89-f170-ecc9-d6e08e64491e@tugraz.at> <2cb5bb84-3536-48e6-08c7-ee946c394507@gmail.com> <5dde320d-8b74-9478-4567-621ccf5742a3@tugraz.at>
Message-ID:

Dear Maria,

Try this activation link:

https://www.crystallography.net/cod/check_user.php?activationstr=8f35a12823b7f69d776d529321ab9a79

Hope this helps,
Andrius

On 2019-12-11 16:12, Maria Gombotz wrote:
> Unfortunately I have not received the email so far, but my username is "TUGom",
> the address is "gombotz at tugraz.at".

--
Andrius Merkys
Vilnius University Institute of Biotechnology, Saulėtekio al. 7, room V325
LT-10257 Vilnius, Lithuania