[Cod-bugs] Number of entries in smiles.txt does not match the number of CIF entries

Vladas Oleinikovas voleinikovas at monterosatx.com
Tue Nov 15 15:22:16 EET 2022


Hi!

Firstly, thanks for an amazing repo and great documentation!

I recently downloaded COD using the command:
>wget http://www.crystallography.net/archives/cod-cifs-mysql.zip
After unzipping, I found the cif and mysql directories, as expected.

While looking through the files in the mysql directory, I noticed smiles.txt. It looks very useful for searching for molecules of interest, especially the organic ones that I am interested in. I assume it relates to this paper (https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0279-6); is that correct?

However, when I count the lines in this file, I get far fewer than the number of entries reported on the COD title page (“Currently there are 494800 entries in the COD”):
~/COD/mysql:> wc -l smiles.txt
> 219646 smiles.txt
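
One way to narrow this down is to compare smiles.txt against the CIF files that were actually unpacked, rather than against the website total. The following is a minimal sketch, assuming the archive was unzipped under a single directory containing cif/ and mysql/, that every structure is stored as one *.cif file, and that smiles.txt holds one entry per line. COD_ROOT, count_cifs and count_lines are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch: compare the number of CIF files with the number of lines in smiles.txt.
# Assumptions (not confirmed by the COD documentation): the archive was unzipped
# into $COD_ROOT/cif and $COD_ROOT/mysql, and each structure is one *.cif file.

# Count regular files ending in .cif anywhere below directory $1.
count_cifs() {
    find "$1" -type f -name '*.cif' | wc -l | tr -d ' '
}

# Count the lines of file $1.
count_lines() {
    wc -l < "$1" | tr -d ' '
}

# Hypothetical variable: the directory the archive was unzipped into.
COD_ROOT="${COD_ROOT:-.}"

# Only report when both inputs are present, so the script is safe to run anywhere.
if [ -d "$COD_ROOT/cif" ] && [ -f "$COD_ROOT/mysql/smiles.txt" ]; then
    n_cif=$(count_cifs "$COD_ROOT/cif")
    n_smiles=$(count_lines "$COD_ROOT/mysql/smiles.txt")
    echo "CIF files:        $n_cif"
    echo "smiles.txt lines: $n_smiles"
    echo "difference:       $((n_cif - n_smiles))"
fi
```

Running it as `COD_ROOT=~/COD sh compare_counts.sh` (the script name is arbitrary) prints both counts and their difference. If the CIF count is close to the website total, the gap lies in smiles.txt rather than in the download.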

Is this because the file is no longer being updated, or does it exclude entries that could not be converted to SMILES?

Many thanks for your reply!

Best wishes,
Vladas

P.S. Feel free to answer in Lithuanian, if preferred 😊



