[Cod-bugs] overlap between COD and CSD

Saulius Gražulis grazulis at ibt.lt
Tue Dec 14 09:00:04 EET 2021


On 2021-12-13 17:38, Yifei Qi wrote:
> Dear COD developers,
> Thank you all for maintaining such a great database for open access
> of crystal structures for chemicals.
> I am in the process of writing a book chapter about structure database
> of small molecules and would like to include a brief introduction to COD.
> I am wondering how many of the 482,202 entries in COD are also
> included in CSD (Cambridge Structural Database).
> If you happen to have that number kindly let me know as I do not have
> access to the whole CSD database.

Unfortunately, we do not have access to the CSD either (this is one of
the reasons why we build and use the COD :). Thus, we cannot provide
you with this number.

Even if the CSD were available to us, we should probably not consult it:
we build the COD using a "cleanroom approach", to avoid any accusations
that we have "stolen" data from the CSD. So, as a matter of principle and
for legal reasons, we do not compare our data collection against the CSD,
except possibly by matching against publicly available identifiers.

The closest proxy for the number you seek can be found by comparing
publicly available DataCite paper DOIs. The summary table that I made
for our own use in 2020 looks like this:

> # 2020-05-31 21:04:49 EEST
> 168756   *Papers referenced in the CSD but not in the COD*
> 23556    Papers referenced in the COD but not in the CSD
> 153896   Papers referenced in both the COD and the CSD
> 457203   Structures that are in the COD
> 815131   Structures that are in the CSD
> 177452   Papers that are referenced in the COD
> 322652   Papers that are referenced in the CSD
> 147490   Common COD and CSD papers that report an equal number of structures
> 2606     Common COD and CSD papers where *COD* reports fewer structures
> 3800     Common COD and CSD papers where *CSD* reports fewer structures

The recalculation for the current date is possible but would take some time.
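
For illustration only, the kind of DOI-based comparison summarised in the
table above could be sketched as below. This is a minimal sketch, not the
script that produced the 2020 numbers: the function names, the dictionary
keys and the input format (a mapping from paper DOI to the number of
structures attributed to that paper) are all assumptions made for the
example. DOIs are lower-cased before comparison because DOI names are
case-insensitive.

```python
from typing import Dict


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip common resolver prefixes (hypothetical helper)."""
    doi = doi.strip().lower()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def summarize_overlap(cod: Dict[str, int], csd: Dict[str, int]) -> Dict[str, int]:
    """Compare two collections by paper DOI.

    Assumption: `cod` and `csd` map an already normalized paper DOI to the
    number of structures each database attributes to that paper.
    """
    cod_dois, csd_dois = set(cod), set(csd)
    common = cod_dois & csd_dois  # papers referenced in both collections
    return {
        "csd_only_papers": len(csd_dois - cod_dois),
        "cod_only_papers": len(cod_dois - csd_dois),
        "common_papers": len(common),
        "cod_structures": sum(cod.values()),
        "csd_structures": sum(csd.values()),
        "cod_papers": len(cod_dois),
        "csd_papers": len(csd_dois),
        "common_equal": sum(1 for d in common if cod[d] == csd[d]),
        "common_cod_fewer": sum(1 for d in common if cod[d] < csd[d]),
        "common_csd_fewer": sum(1 for d in common if cod[d] > csd[d]),
    }
```

Any such summary should satisfy three identities: papers only in one
database plus common papers give that database's paper total, and the
three "common" categories add up to the number of common papers. The 2020
table passes all three checks: 23556 + 153896 = 177452,
168756 + 153896 = 322652, and 147490 + 2606 + 3800 = 153896.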

The number of structures in the CSD is suspiciously low, so it is
possible that we did not spot all CSD structures.

Hope this helps.

Sincerely yours,
Saulius

-- 
Dr. Saulius Gražulis
Vilnius University, Life Science Center, Institute of Biotechnology
Saulėtekio al. 7, LT-10257 Vilnius, Lietuva (Lithuania)
phone (office): (+370-5)-2234353, mobile: (+370-684)-49802, (+370-614)-36366
