[Cod-bugs] overlap between COD and CSD

Yifei Qi yfqi at fudan.edu.cn
Tue Dec 14 09:10:56 EET 2021


Got it. Thanks a lot for your quick reply.



Yifei

> On Dec 14, 2021, at 15:00, Saulius Gražulis <grazulis at ibt.lt> wrote:
> 
> On 2021-12-13 17:38, Yifei Qi wrote:
>> Dear COD developers,
>> Thank you all for maintaining such a great database for open access of crystal structures for chemicals.
>> I am in the process of writing a book chapter about structure databases of small molecules and would like to include a brief introduction to COD.
>> I am wondering how many of the 482,202 entries in COD are also included in CSD (Cambridge Structural Database).
>> If you happen to have that number kindly let me know as I do not have access to the whole CSD database.
> Unfortunately, we do not have access to the CSD either (this is one of the reasons why we build and use the COD :). Thus, we cannot provide you with this number.
> 
> And we should probably not consult the CSD even if it were available: we build the COD using a "cleanroom approach" to avoid any accusation that we have "stolen" data from the CSD. So, for legal reasons, we do not as a matter of principle compare our data collection against the CSD, except possibly by matching against publicly available identifiers.
> 
> The closest proxy for the numbers you seek can be found by comparing publicly available DataCite paper DOIs. The summary table that I made for our own use in 2020 looks like this:
> 
> 
>> # 2020-05-31 21:04:49 EEST
>> 168756   *Papers referenced in the CSD but not in the COD*
>> 23556    Papers referenced in the COD but not in the CSD
>> 153896   Papers referenced in both the COD and the CSD
>> 457203   Structures that are in the COD
>> 815131   Structures that are in the CSD
>> 177452   Papers that are referenced in the COD
>> 322652   Papers that are referenced in the CSD
>> 147490   Common COD and CSD papers that report equal number of structures
>> 2606     Common COD and CSD papers where *COD* reports fewer structures
>> 3800     Common COD and CSD papers where *CSD* reports fewer structures
> 
> The recalculation for the current date is possible but would take some time.
> 
> The number of structures in the CSD is suspiciously low, so it is possible that we did not spot all CSD structures.
> 
> Hope this helps.
> 
> Sincerely yours,
> Saulius
> 
> -- 
> Dr. Saulius Gražulis
> Vilnius University, Life Science Center, Institute of Biotechnology
> Saulėtekio al. 7, LT-10257 Vilnius, Lietuva (Lithuania)
> phone (office): (+370-5)-2234353, mobile: (+370-684)-49802, (+370-614)-36366
> 
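
The categories in the quoted table can be reproduced with plain set operations on paper DOIs. The sketch below is only an illustration of that arithmetic, not the script that produced the 2020 numbers: the function name compare_by_doi, the dict layout (paper DOI mapped to the number of structures reported for it) and the toy DOIs are all assumptions. The assertions at the end check that the figures quoted above are internally consistent.

```python
def compare_by_doi(cod, csd):
    """Summarise paper overlap between two collections.

    cod, csd: dicts mapping a paper DOI to the number of structures
    that the collection reports for that paper (hypothetical layout).
    """
    cod_dois, csd_dois = set(cod), set(csd)
    common = cod_dois & csd_dois
    return {
        "csd_only_papers": len(csd_dois - cod_dois),
        "cod_only_papers": len(cod_dois - csd_dois),
        "common_papers": len(common),
        "cod_structures": sum(cod.values()),
        "csd_structures": sum(csd.values()),
        "cod_papers": len(cod_dois),
        "csd_papers": len(csd_dois),
        "common_equal": sum(1 for d in common if cod[d] == csd[d]),
        "common_cod_fewer": sum(1 for d in common if cod[d] < csd[d]),
        "common_csd_fewer": sum(1 for d in common if cod[d] > csd[d]),
    }


# Toy data with made-up DOIs, for illustration only.
cod = {"10.1000/a": 2, "10.1000/b": 1, "10.1000/c": 3}
csd = {"10.1000/b": 1, "10.1000/c": 4, "10.1000/d": 5, "10.1000/e": 1}
summary = compare_by_doi(cod, csd)

# Consistency checks on the 2020 figures quoted in the table:
# papers only in one collection plus common papers give that collection's total,
# and the three "common" sub-categories add up to the common papers.
assert 168756 + 153896 == 322652   # CSD papers
assert 23556 + 153896 == 177452    # COD papers
assert 147490 + 2606 + 3800 == 153896
```

With real data, the two dicts would be filled from each collection's publicly available DOI listings; the comparison itself needs nothing beyond the DOIs and per-paper structure counts.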

