[Cod-bugs] meta question about cif syntax on COD

Norwid Behrnd nbehrnd at yahoo.com
Mon Jul 26 18:10:10 EEST 2021


Dear Antanas Vaitkus,

thank you for your reply.

To keep the COD-internal idea discussion board in good shape, note that
I wrote today, Monday, July 26th, and not (as currently stated) on
Tuesday, July 6th, about three weeks ago.

A brief search for the keywords `.cif`, `#\#CIF_2.0`, and
`#\#CIF 2.0` brought my attention to pages like

http://www.oqmd.org/materials/export/conventional/cif/4149656
http://www.oqmd.org/materials/export/conventional/cif/4149650

These belong to the Open Quantum Materials Database (root entry
http://www.oqmd.org/), which files structures with `#\#CIF_2.0` (the last
part of the address apparently needs to be a seven-digit integer),
perhaps because of a relevant file on GitHub, within an electronic
structure calculation project starred and forked multiple times

https://github.com/electronic-structure/SIRIUS/blob/master/apps/cif_input/PyCifRW/CifFile.py

or pycifrw's entry into Debian's repositories.  The conversion

```
obabel -icif 4149656_2.cif -oxyz -O test.xyz
```
of such a .cif (v2.0) works well enough.  checkCIF seems to be
prepared for such files being submitted (of course, the specific
example yields multiple warnings).

A find «seen by chance» is the file 1000001.rod in the
Raman Open Database,
https://solsa.crystallography.net/rod/1000001.html, with an early
`#\#CIF 2.0`; so far, it is the sole entry I identified that presents
both Raman data and crystal structure preceded by this
indicator.  (This one is not suitable for OpenBabel.)
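
To illustrate how a program could tell such files apart, here is a
minimal Python sketch that only inspects the first line for the version
comment.  The function name `cif_version` and the regular expression are
assumptions made for this example; they are not taken from cod-tools,
PyCifRW, or OpenBabel.

```python
import re

# Version comment ("magic code") such as #\#CIF_1.1 or #\#CIF_2.0.
# The pattern is an illustrative assumption, not a library API.
MAGIC = re.compile(r"^#\\#CIF_(\d+\.\d+)")


def cif_version(text):
    """Return the declared CIF version (e.g. '2.0'), or None if absent."""
    # Skip an optional byte-order mark, then look at the first line only.
    first_line = text.lstrip("\ufeff").split("\n", 1)[0]
    match = MAGIC.match(first_line)
    return match.group(1) if match else None
```

A file without the comment yields `None`, so a caller could then fall
back to treating it as CIF 1.1, where the comment is optional.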

I understand you wish to focus the available resources on cif v1.1 and
consider the question fully answered.  (.pdb files are recognized by
photorec, by the way.)

Thank you very much.  With best regards,

Norwid Behrnd


On Mon, 26 Jul 2021 15:49:37 +0300
Antanas Vaitkus <antanas.vaitkus90 at gmail.com> wrote:

> Dear Norwid Behrnd,
> 
> Thank you for your question.
> 
> On Mon, 26 Jul 2021 at 15:07, Norwid Behrnd <nbehrnd at yahoo.com> wrote:
> 
> > Dear developers of COD,
> >
> > I became aware of the specification about the syntax for cif 2.0,
> > which -- if used -- requires an early comment
> > ```
> > #\#CIF_2.0
> > ```
> > while Bernstein et al. remind / recommend (the not mandatory)
> > comment
> > ```
> > #\#CIF_1.1
> > ```
> > in files of .cif 1.1.[1]  Your description of the
> > COD::CIF::Parser[2] mentions both the syntax of .cif (v1.1) as a
> > target and preparation for .cif (v2.0).
> >  
> 
> I am glad to inform you that the COD::CIF::Parser from the cod-tools [AV-1]
> package is now capable of parsing CIF_2.0 files as well. However, I am yet
> to encounter a CIF 2.0 data file in the wild.
> 
> > While I might misunderstand its processing of the data, I speculate
> > a file recovery program like photorec[3] might work better if these
> > ASCII-files would contain either one form of this type of identifier
> > while re-assigning the extension .cif instead of a mere .txt to the
> > file restored.  The tentative addition of `#\#CIF_1.1` to a COD .cif
> > (attached) retained the file's content fully accessible to e.g.,
> > Mercury (2020.3), or Jmol.  On the other hand, I'm unable to recall a
> > recent instance where a .cif, downloaded as SI of a publication, or
> > /via/ CCDC's conquest interface, explicitly contained such a label.
> >  
> 
> A similar question has already been raised in our team internally [AV-2],
> but has not gained a lot of traction. The change itself is indeed quite
> simple, however, it would require updating all COD CIF files as well
> as the related data curation and deposition pipelines. At that time it
> was not deemed a priority since CIF_1.1 files seem to function well
> without the explicit format comment. Having a few real-world examples
> of where the comment actually proves useful would help move things
> along in this regard.
> 
> As for the examples that you have already provided:
> - photorec: the crystallographic CIF file is quite obscure to most people,
>   so I would be very surprised if 'photorec' already had the heuristics to
>   recognise CIF files as such. Nevertheless, the developers may be open
>   to including such enhancements in the future.
> - jmol: as far as I know jmol determines the format by actually parsing
>   the file so the file extension (cif, txt, etc.) should not really make a
>   difference.
> - Mercury (2020.3): I am unsure how Mercury processes files, but I imagine
>   it should still recognise CIF files with the .txt extension as CIF files
>   regardless of the presence of the "#\#CIF_1.1".
> 
> Additional examples would indeed be useful.
> 
> > What is your perspective on adding `#\#CIF_1.1` to the .cif?
> >  
> 
> In general, we are not opposed to this idea, however, any large-scale
> modifications to the COD data should be backed up by a specific need
> (i.e. ensuring data quality, adherence to the FAIR principles, etc.).
> 
> 
> > [1] https://scripts.iucr.org/cgi-bin/paper?aj5269
> > [2] https://journals.iucr.org/j/issues/2016/01/00/po5052/index.html
> > [3] https://www.cgsecurity.org/wiki/PhotoRec#How_PhotoRec_works
> >
> >  
> Sincerely,
> Antanas Vaitkus
> 
> [AV-1] https://github.com/cod-developers/cod-tools
> [AV-2] https://projects.ibt.lt/repositories/issues/100
> 
> 
> > --
> > This message has been scanned for viruses and
> > dangerous content by MailScanner, and is
> > believed to be clean.
> >
> > Cod-bugs mailing list
> > Cod-bugs at lists.crystallography.net
> > http://lists.crystallography.net/cgi-bin/mailman/listinfo/cod-bugs
> >  
> 
> 



-------------- next part --------------
Content-Disposition: attachment; filename=4149656_2.cif

#\#CIF_2.0

##########################################################################
#               Crystallographic Information Format file
#               Produced by PyCifRW module
#
#  This is a CIF file.  CIF has been adopted by the International
#  Union of Crystallography as the standard for data archiving and
#  transmission.
#
#  For information on this file format, follow the CIF links at
#  http://www.iucr.org
##########################################################################

data_structure_0

_chemical_formula_sum [P  r  1  R  u  1]
_cell_length_a                          3.549426
_cell_length_b                          3.549426
_cell_length_c                          4.259996
_cell_angle_alpha                       90.000000
_cell_angle_beta                        90.000000
_cell_angle_gamma                       120.000000
_cell_volume                            46.478918
_symmetry_space_group_name_H-M          P-6m2
_symmetry_Int_Tables_number             187
loop_
  _symmetry_equiv_pos_site_id
  _symmetry_equiv_pos_as_xyz
         1         '+x,+y,+z'
         2         '-x+y+0,-x,-z'
         3         '-y-0,+x-y-0,+z'
         4         '+x,+y,-z'
         5         '-x+y+0,-x,+z'
         6         '-y-0,+x-y-0,-z'
         7         '-y,-x,-z'
         8         '+x-0,+x-y-0,+z'
         9         '-x+y+0,+y,-z'
         10        '-y,-x,+z'
         11        '+x-0,+x-y-0,-z'
         12        '-x+y+0,+y,+z'
loop_
  _atom_site_label
  _atom_site_type_symbol
  _atom_site_fract_x
  _atom_site_fract_y
  _atom_site_fract_z
  _atom_site_Wyckoff_symbol
  _atom_site_occupancy
         Pr0       Pr+0      0.666667  0.333333  0.500000  f         1.000000
         Ru1       Ru+0      0.333333  0.666667  0.000000  c         1.000000
         Ru1       Ru+0      0.333333  0.666667  1.000000  c         1.000000
loop_
  _atom_type_symbol
  _atom_type_oxidation_number
         Pr        0
         Ru        0
-------------- next part --------------
A non-text attachment was scrubbed...
Name: checkcif.pdf
Type: application/pdf
Size: 237810 bytes
Desc: not available
URL: <http://lists.crystallography.net/pipermail/cod-bugs/attachments/20210726/d01c5b5d/attachment-0001.pdf>

