[Cod-bugs] meta question about cif syntax on COD
Antanas Vaitkus
antanas.vaitkus90 at gmail.com
Mon Jul 26 15:49:37 EEST 2021
Dear Norwid Behrnd,
Thank you for your question.
On Mon, 26 Jul 2021 at 15:07, Norwid Behrnd <nbehrnd at yahoo.com> wrote:
> Dear developers of COD,
>
> I became aware of the specification about the syntax for cif 2.0,
> which -- if used -- requires an early comment
> ```
> #\#CIF_2.0
> ```
> while Bernstein et al. remind / recommend (the not mandatory)
> comment
> ```
> #\#CIF_1.1
> ```
> in files of .cif 1.1.[1] In your description of the
> COD::CIF::Parser: parser[2] both mentions focus on the syntax of
> .cif (v1.1) as a target as well as preparation for .cif (v2.0).
>
I am glad to inform you that the COD::CIF::Parser from the cod-tools [AV-1]
package is now capable of parsing CIF_2.0 files as well. However, I am yet
to encounter a CIF 2.0 data file in the wild.
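For illustration only, the version declared by such a leading comment could be read along the following lines. This is a minimal Python sketch, not the method used by COD::CIF::Parser; the function name `declared_cif_version` is hypothetical, and the sketch assumes that the comment, if present, sits on the very first line, optionally preceded by a byte order mark.

```python
import re

# Magic comment as quoted in this thread: "#\#CIF_" followed by a version number.
_MAGIC = re.compile(r"^#\\#CIF_(\d+\.\d+)")


def declared_cif_version(text):
    """Return the version named by a leading magic comment, or None if absent.

    Hypothetical helper: it only inspects the first line of the text.
    """
    # Assumption: a byte order mark may precede the comment, so strip it first.
    first_line = text.lstrip("\ufeff").split("\n", 1)[0]
    match = _MAGIC.match(first_line)
    return match.group(1) if match else None
```

A file without the comment simply yields `None`, which matches the situation of most CIF_1.1 files described here.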
> While I might misunderstand its processing of the data, I speculate
> a file recovery program like photorec[3] might work better if these
> ASCII-files would contain either one form of this type of identifier
> while re-assigning the extension .cif instead of a mere .txt to the
> file restored. The tentative addition of `#\#CIF_1.1` to a COD .cif
> (attached) retained the file's content fully accessible to e.g.,
> Mercury (2020.3), or Jmol. On the other hand, I'm unable to recall a
> recent instance where a .cif, downloaded as SI of a publication, or
> /via/ CCDC's conquest interface, explicitly contained such a label.
>
A similar question has already been raised in our team internally [AV-2],
but has not gained a lot of traction. The change itself is indeed quite
simple; however, it would require updating all COD CIF files as well
as the related data curation and deposition pipelines. At that time it
was not deemed a priority since CIF_1.1 files seem to function well
without the explicit format comment. Having a few real-world examples
of where the comment actually proves useful would help move things
along in this regard.
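To illustrate how small the per-file change would be, here is a minimal Python sketch that prepends the comment only when no `#\#CIF_` comment is already present. It is not part of cod-tools or of the COD deposition pipeline; the name `with_cif_1_1_magic` is hypothetical, and the sketch assumes the file has already been read into a string.

```python
CIF_1_1_MAGIC = "#\\#CIF_1.1"  # i.e. the literal text #\#CIF_1.1


def with_cif_1_1_magic(text):
    """Return the CIF text with a leading CIF_1.1 comment, adding it if missing.

    Hypothetical helper: text that already starts with any "#\\#CIF_" comment
    is returned unchanged, so applying the function twice is harmless.
    """
    if text.startswith("#\\#CIF_"):
        return text
    return CIF_1_1_MAGIC + "\n" + text
```

The larger effort would lie in applying and validating such a change across every COD entry and in the surrounding tooling, not in the edit itself.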
As for the examples that you have already provided:
- photorec: the crystallographic CIF format is quite obscure to most people,
so I would be very surprised if 'photorec' already had the heuristics to
recognise CIF files as such. Nevertheless, the developers may be open
to including such enhancements in the future.
- Jmol: as far as I know, Jmol determines the format by actually parsing
the file so the file extension (cif, txt, etc.) should not really make a
difference.
- Mercury (2020.3): I am unsure how Mercury processes files, but I imagine
it should still recognise CIF files with the .txt extension as CIF files
regardless of the presence of the "#\#CIF_1.1".
Additional examples would indeed be useful.
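A content-based check of the kind discussed in the list above might look like the following Python sketch. It is a rough heuristic and does not describe what photorec, Jmol, or Mercury actually do; the name `looks_like_cif` is hypothetical. It treats text as CIF when it either starts with a `#\#CIF_` comment or contains both a `data_` block header and a line beginning with an underscore-prefixed data name.

```python
import re

# A data block header such as "data_1000000" (the keyword is case-insensitive).
_DATA_BLOCK = re.compile(r"^[ \t]*data_\S+", re.IGNORECASE | re.MULTILINE)
# A data name such as "_cell_length_a" at the start of a line.
_DATA_NAME = re.compile(r"^[ \t]*_\S+", re.MULTILINE)


def looks_like_cif(text):
    """Guess from content alone whether the text is a CIF file.

    Hypothetical heuristic: the file extension is never consulted.
    """
    # The explicit comment, when present, settles the question immediately.
    if text.lstrip("\ufeff").startswith("#\\#CIF_"):
        return True
    # Otherwise fall back to structural hints found in any CIF data file.
    return bool(_DATA_BLOCK.search(text)) and bool(_DATA_NAME.search(text))
```

The structural fallback can misfire on unrelated text that happens to contain similar lines, which is one reason an explicit leading comment is easier for generic tools to rely on.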
> What is your perspective on adding `#\#CIF_1.1` to the .cif?
>
In general, we are not opposed to this idea; however, any large-scale
modifications to the COD data should be backed up by a specific need
(e.g. ensuring data quality or adherence to the FAIR principles).
> [1] https://scripts.iucr.org/cgi-bin/paper?aj5269
> [2] https://journals.iucr.org/j/issues/2016/01/00/po5052/index.html
> [3] https://www.cgsecurity.org/wiki/PhotoRec#How_PhotoRec_works
>
>
Sincerely,
Antanas Vaitkus
[AV-1] https://github.com/cod-developers/cod-tools
[AV-2] https://projects.ibt.lt/repositories/issues/100
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
>
> Cod-bugs mailing list
> Cod-bugs at lists.crystallography.net
> http://lists.crystallography.net/cgi-bin/mailman/listinfo/cod-bugs
>
--
Antanas Vaitkus,
Vilnius University,
Life Sciences Center,
Institute of Biotechnology,
room C521, Saulėtekio al. 7,
LT-10257 Vilnius, Lithuania