From yfqi at fudan.edu.cn  Mon Dec 13 17:38:34 2021
From: yfqi at fudan.edu.cn (Yifei Qi)
Date: Mon, 13 Dec 2021 23:38:34 +0800
Subject: [Cod-bugs] overlap between COD and CSD
Message-ID: <9021F24C-A8D0-4BC4-B4FF-2D14A6F85822@fudan.edu.cn>

Dear COD developers,

Thank you all for maintaining such a great database for open access of crystal structures for chemicals.

I am in the process of writing a book chapter about structure database of small molecules and would like to include a brief introduction to COD.

I am wondering how many of the 482,202 entries in COD are also included in CSD (Cambridge Structural Database).

If you happen to have that number kindly let me know as I do not have access to the whole CSD database.

Thank you very much.

Best,
Yifei Qi

Associate Professor
Department of Medicinal Chemistry, School of Pharmacy, Fudan University, China

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From grazulis at ibt.lt  Tue Dec 14 09:00:04 2021
From: grazulis at ibt.lt (Saulius Gražulis)
Date: Tue, 14 Dec 2021 09:00:04 +0200
Subject: [Cod-bugs] overlap between COD and CSD
In-Reply-To: <9021F24C-A8D0-4BC4-B4FF-2D14A6F85822@fudan.edu.cn>
References: <9021F24C-A8D0-4BC4-B4FF-2D14A6F85822@fudan.edu.cn>
Message-ID:

On 2021-12-13 17:38, Yifei Qi wrote:
> Dear COD developers,
> Thank you all for maintaining such a great database for open access
> of crystal structures for chemicals.
> I am in the process of writing a book chapter about structure database
> of small molecules and would like to include a brief introduction to COD.
> I am wondering how many of the 482,202 entries in COD are also
> included in CSD (Cambridge Structural Database).
> If you happen to have that number kindly let me know as I do not have
> access to the whole CSD database.
Unfortunately, we do not have access to the CSD either (this is one of the reasons why we build and use the COD :). Thus, we can not provide you this number.

And we should probably not consult CSD even if it were available, building the COD in a "cleanroom approach", to avoid any accusations that we have "stolen" data from the CSD. So we do not in principle compare our data collection against the CSD, for legal reasons, except possibly matching against the publicly available identifiers.

The closest proxy of the numbers you seek can be found by comparing publicly available DataCite paper DOIs. The summary table which I made for ourselves in 2020 looks like this:

> # 2020-05-31 21:04:49 EEST
> 168756   *Papers referenced in the CSD but not in the COD*
> 23556    Papers referenced in the COD but not in the CSD
> 153896   Papers referenced in both the COD and the CSD
> 457203   Structures that are in the COD
> 815131   Structures that are in the CSD
> 177452   Papers that are referenced in the COD
> 322652   Papers that are referenced in the CSD
> 147490   Common COD and CSD papers that report equal number of structures
> 2606     Common COD and CSD papers where *COD* reports less structures
> 3800     Common COD and CSD papers where *CSD* reports less structures

The recalculation for the current date is possible but would take some time.

The number of structures in the CSD is suspiciously low, so it is possible that we did not spot all CSD structures.

Hope this helps.

Sincerely yours,
Saulius

-- 
Dr. Saulius Gražulis
Vilnius University, Life Science Center, Institute of Biotechnology
Saulėtekio al. 7, LT-10257 Vilnius, Lietuva (Lithuania)
phone (office): (+370-5)-2234353, mobile: (+370-684)-49802, (+370-614)-36366
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 665 bytes
Desc: OpenPGP digital signature
URL:

From yfqi at fudan.edu.cn  Tue Dec 14 09:10:56 2021
From: yfqi at fudan.edu.cn (Yifei Qi)
Date: Tue, 14 Dec 2021 15:10:56 +0800
Subject: [Cod-bugs] overlap between COD and CSD
In-Reply-To:
References: <9021F24C-A8D0-4BC4-B4FF-2D14A6F85822@fudan.edu.cn>
Message-ID:

Got it. Thanks a lot for your quick reply.

Yifei

> On Dec 14, 2021, at 15:00, Saulius Gražulis wrote:
>
> On 2021-12-13 17:38, Yifei Qi wrote:
>> Dear COD developers,
>> Thank you all for maintaining such a great database for open access of crystal structures for chemicals.
>> I am in the process of writing a book chapter about structure database of small molecules and would like to include a brief introduction to COD.
>> I am wondering how many of the 482,202 entries in COD are also included in CSD (Cambridge Structural Database).
>> If you happen to have that number kindly let me know as I do not have access to the whole CSD database.
> Unfortunately, we do not have access to the CSD either (this is one of the reasons why we build and use the COD :). Thus, we can not provide you this number.
>
> And we should probably not consult CSD even if it were available, building the COD in a "cleanroom approach", to avoid any accusations that we have "stolen" data from the CSD. So we do not in principle compare our data collection against the CSD, for legal reasons, except possibly matching against the publicly available identifiers.
>
> The closest proxy of the numbers you seek can be found by comparing publicly available DataCite paper DOIs.
> The summary table which I made for ourselves in 2020 looks like this:
>
>> # 2020-05-31 21:04:49 EEST
>> 168756   *Papers referenced in the CSD but not in the COD*
>> 23556    Papers referenced in the COD but not in the CSD
>> 153896   Papers referenced in both the COD and the CSD
>> 457203   Structures that are in the COD
>> 815131   Structures that are in the CSD
>> 177452   Papers that are referenced in the COD
>> 322652   Papers that are referenced in the CSD
>> 147490   Common COD and CSD papers that report equal number of structures
>> 2606     Common COD and CSD papers where *COD* reports less structures
>> 3800     Common COD and CSD papers where *CSD* reports less structures
>
> The recalculation for the current date is possible but would take some time.
>
> The number of structures in the CSD is suspiciously low, so it is possible that we did not spot all CSD structures.
>
> Hope this helps.
>
> Sincerely yours,
> Saulius
>
> --
> Dr. Saulius Gražulis
> Vilnius University, Life Science Center, Institute of Biotechnology
> Saulėtekio al. 7, LT-10257 Vilnius, Lietuva (Lithuania)
> phone (office): (+370-5)-2234353, mobile: (+370-684)-49802, (+370-614)-36366
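
Editorial note on the DOI comparison described in the thread: the 2020 figures are internally consistent (153896 + 23556 = 177452 papers in the COD, 153896 + 168756 = 322652 papers in the CSD, and 147490 + 2606 + 3800 = 153896 common papers). The thread does not show how the table was produced, so the following is only a minimal sketch of one way such a summary could be computed. It assumes each database has already been reduced to a mapping from paper DOI to the number of structures attributed to that paper; the function names, the toy DOIs, and the lower-casing step are illustrative assumptions, not the COD maintainers' script.

```python
from collections import Counter


def normalize_counts(raw):
    """Merge entries whose DOIs differ only in case or surrounding whitespace.

    `raw` maps a paper DOI to the number of structures attributed to it
    (hypothetical input format, assumed for this sketch).
    """
    merged = Counter()
    for doi, n_structures in raw.items():
        merged[doi.strip().lower()] += n_structures
    return dict(merged)


def summarize_overlap(cod_raw, csd_raw):
    """Return counts analogous to the rows of the 2020 summary table."""
    cod = normalize_counts(cod_raw)
    csd = normalize_counts(csd_raw)
    cod_dois, csd_dois = set(cod), set(csd)
    common = cod_dois & csd_dois
    return {
        "papers_csd_only": len(csd_dois - cod_dois),
        "papers_cod_only": len(cod_dois - csd_dois),
        "papers_both": len(common),
        # Only structures that carry a paper DOI are counted here.
        "structures_cod": sum(cod.values()),
        "structures_csd": sum(csd.values()),
        "papers_cod": len(cod_dois),
        "papers_csd": len(csd_dois),
        "common_equal": sum(1 for d in common if cod[d] == csd[d]),
        "common_cod_fewer": sum(1 for d in common if cod[d] < csd[d]),
        "common_csd_fewer": sum(1 for d in common if cod[d] > csd[d]),
    }


if __name__ == "__main__":
    # Toy data with made-up DOIs, purely for illustration.
    cod_example = {"10.0000/a": 2, "10.0000/b": 1, "10.0000/C": 3}
    csd_example = {"10.0000/b": 1, "10.0000/c": 4, "10.0000/d": 5}
    for label, value in summarize_overlap(cod_example, csd_example).items():
        print(f"{value:8d}  {label}")
```

Because the comparison uses only publicly available paper DOIs, it measures overlap at the level of publications, not individual structures, which is why the thread calls it a proxy.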