From grazulis at ibt.lt Wed Mar 29 09:04:22 2023
From: grazulis at ibt.lt (Saulius Gražulis)
Date: Wed, 29 Mar 2023 09:04:22 +0300
Subject: [Cod-bugs] Accepted Manuscripts Entries
In-Reply-To: 
References: 
Message-ID: <360d65b6-f48a-e7fe-907e-d632f36659bd@ibt.lt>

Hi, Bob,

many thanks for your e-mail!

On 2023-03-28 19:26, Robert McMeeking - STFC UKRI wrote:
>
> I have noticed that there are now quite a lot of "Accepted
> Manuscripts" entries making their way into the Crystallography Open
> Database, especially from the Royal Society of Chemistry. It is good to
> get the structures as soon as possible, but there is information such
> as volume/page numbers which needs updating. I have been thinking
> about ways of setting up a system to update this information for the
> CrystalWorks implementation at Daresbury. I can see ways of doing
> this using the DOI for the articles using various CrossRef web services.
>

Currently, the following procedure is implemented in our Web crawlers:

step 1: every day we scan the RSS feeds of multiple publishers to learn which new papers are published. This is done not so much to get the structures as early as possible (although this is a positive outcome as well, IMHO), but to have a machine-readable list of published papers (RSS feeds are much more amenable to automatic processing than Web pages), and to have an additional source of publication information. Since pages are not assigned at this point, and RSS feeds differ in the amount of bibliographic information they provide, we take only the DOI from the RSS feed, which is always present and is sufficient to get the full bibliography later;

step 2: once a month we scan the papers published in the current year by the publishers we monitor, and we check whether full bibliographic information is already available. If that is the case, we update the COD record with the complete bibliography. This is indicated in the COD SVN logs with the "Adding full bibliography for ..." log entries.
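The two crawler steps described above might look roughly like the following Python sketch. It is an illustration only, not the actual COD crawler code. Several things in it are assumptions: which RSS item fields hold the DOI, the DOI regular expression, and the rule that a record counts as final once it has a volume and a page range. The input field names (`DOI`, `volume`, `issue`, `page`, `container-title`, `issued`) follow the CrossRef REST API `/works/{doi}` "message" object. The output names are the CIF `_journal_*` data names that appear in the SVN diff of entry 1568124. Fetching the feeds and the CrossRef JSON over the network is left out.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Step 1 (sketch): keep only the DOI of each RSS <item>. The searched fields and the
# DOI pattern are assumptions; real feeds differ in where they put the DOI.
DOI_RE = re.compile(r'10\.\d{4,9}/[^\s"<>]+')
DC_IDENTIFIER = "{http://purl.org/dc/elements/1.1/}identifier"


def dois_from_rss(xml_text: str) -> list[str]:
    """Return the distinct, lower-cased DOIs found in an RSS 2.0 feed."""
    found: list[str] = []
    for item in ET.fromstring(xml_text).iter("item"):
        for tag in ("link", "guid", DC_IDENTIFIER):
            el = item.find(tag)
            match = DOI_RE.search(el.text or "") if el is not None else None
            if match:
                doi = match.group(0).rstrip(".,;").lower()
                if doi not in found:
                    found.append(doi)
                break  # one DOI per item is enough
    return found


# Step 2 (sketch): given a CrossRef /works/{doi} "message", decide whether the
# bibliography is final (assumed rule: volume and page range present) and map it
# onto CIF _journal_* data names. Returns None while the paper is still incomplete.
def to_cif_tags(message: dict) -> Optional[dict[str, str]]:
    if not (message.get("volume") and message.get("page")):
        return None  # still an "Accepted Manuscript"; check again next month
    first, _, last = message["page"].partition("-")
    tags = {
        "_journal_paper_doi": message["DOI"].lower(),
        "_journal_volume": message["volume"],
        "_journal_page_first": first,
    }
    if last:
        tags["_journal_page_last"] = last
    if message.get("issue"):
        tags["_journal_issue"] = message["issue"]
    if message.get("container-title"):
        tags["_journal_name_full"] = message["container-title"][0]
    parts = message.get("issued", {}).get("date-parts") or [[]]
    if parts[0] and parts[0][0]:
        tags["_journal_year"] = str(parts[0][0])
    return tags


# Usage with illustrative data (values taken from the 1568124 diff).
feed = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel>
<item><title>Example A</title><link>https://doi.org/10.1039/D2MH01123A</link></item>
<item><title>Example A again</title><dc:identifier>doi:10.1039/d2mh01123a</dc:identifier></item>
</channel></rss>"""
print(dois_from_rss(feed))  # the same DOI twice, reported once in lower case

accepted = {"DOI": "10.1039/D2MH01123A", "container-title": ["Materials Horizons"]}
final = dict(accepted, volume="10", issue="2", page="625-631",
             issued={"date-parts": [[2023]]})
print(to_cif_tags(accepted))  # None
print(to_cif_tags(final))
```

Keeping only the DOI in step 1 keeps the feed parser independent of how much bibliographic detail each publisher includes. Everything else is taken later from one structured source.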
Paper lists (DOIs) are obtained from CrossRef, and bibliographies are fetched from various sources, including PubMed, CrossRef, and the publicly available information on the publishers' Web sites. An example of how the complete process looks can be found in the COD entry 1568124 [1-3]. There are currently over 11000 such bibliography update events recorded in the COD SVN logs.

> Do you have plans to automatically check for updated bibliographic
> information as it becomes available? If you do, that is great! If not, I
> would be happy to pass on any information I am able to generate at our
> end. I am currently working on understanding the technical details.
>

As said, we do check for updated bibliographies, and we update them when we find new data. Our process, however, is not perfect. The Web crawlers suffer from intense "bit rot" as the publishers' Web sites become more "modern". Our process might miss some bibliographies, and in several cases, I have noticed, we got garbled data (as in the old record of 7131035, see [4-5]). Moreover, IUCr recently complained that we are sending too many requests to their server's bibliography endpoint, so I had to stop fetching IUCr data temporarily, and this needs to be resolved.

Thus, if you have a process that would allow completing the bibliographies or correcting other metadata (and data :), we would be grateful for your contribution. COD is meant to be a collaborative project, where we all pool information for our mutual benefit :). We are ready to accept verified COD changes into the upstream, so that you do not need to re-apply your patches when you fetch a new COD revision. Please let me know what your plans are if you are about to implement information fetches; let's discuss how we can integrate our data.

Sincerely yours,
Saulius

Refs.:

[1] Yu, Gang; Liu, Huanyu; Yan, Wenchao; Guo, Ruoyao; Wu, Aoben; Zhao, Zifeng; Liu, Zhiwei; Bian, Zuqiang. 4f →
3d sensitization: a luminescent EuII-MnII heteronuclear complex with a near-unity quantum yield. (2023) COD Entry 1568124. URL: http://www.crystallography.net/cod/1568124.html [accessed 2023-03-29T08:32+03:00].

[2] The COD Advisory Board. COD Entry 1568124, rev. 282108. Subversion record. URL: svn://www.crystallography.net/cod/cif/1/56/81/1568124.cif [accessed 2023-03-29T08:34+03:00].

[3] The COD Entry 1568124 bibliography change revision, difference log:

> saulius at tasmanijos-velnias ~/ $ svn log -c281623 --diff $(codid2file 1568124)
> ------------------------------------------------------------------------
> r281623 | coder | 2023-03-04 20:51:18 +0200 (Sat, 04 Mar 2023) | 4 lines
>
> cif/
> Updating files of 1568124, 1568125, 1568126, 1568127
> Original log message:
> Adding full bibliography for 1568124--1568127.cif.
>
> Index: /home/saulius/struct/cod/current/cif/1/56/81/1568124.cif
> ===================================================================
> --- /home/saulius/struct/cod/current/cif/1/56/81/1568124.cif	(revision 281622)
> +++ /home/saulius/struct/cod/current/cif/1/56/81/1568124.cif	(revision 281623)
> @@ -23,11 +23,16 @@
>  'Bian, Zuqiang'
>  _publ_section_title
>  ;
> - 4f \\rightarrow 3d sensitization: a luminescent EuII--MnII heteronuclear
> - complex with a near-unity quantum yield
> + 4f → 3d sensitization: a luminescent
> + EuII-MnII heteronuclear complex with a near-unity
> + quantum yield.
>  ;
> -_journal_name_full               'Materials Horizons'
> -_journal_paper_doi               10.1039/D2MH01123A
> +_journal_issue                   2
> +_journal_name_full               'Materials horizons'
> +_journal_page_first              625
> +_journal_page_last               631
> +_journal_paper_doi               10.1039/d2mh01123a
> +_journal_volume                  10
>  _journal_year                    2023
>  _chemical_absolute_configuration ad
>  _chemical_formula_sum            'C18 H36 Br2 Eu N2 O6'
> @@ -138,6 +143,8 @@
>  _reflns_threshold_expression     'I > 2\s(I)'
>  _cod_data_source_file            d2mh01123a2.cif
>  _cod_data_source_block           3
> +_cod_depositor_comments
> +'Adding full bibliography for 1568124--1568127.cif.'
>  _cod_database_code               1568124
>  _shelx_shelxl_version_number     2014/7
>  _shelx_space_group_comment

[4] The current 7131035 COD Entry (2023). URL: http://www.crystallography.net/cod/7131035.html; CIF URL: http://www.crystallography.net/cod/7131035.cif [accessed 2023-03-29T08:45+03:00].

[5] The old 7131035 COD Entry with garbled author names (2022). CIF URL: http://www.crystallography.net/cod/7131035.cif at 279571 [accessed 2023-03-29T08:47+03:00].

-- 
Dr. Saulius Gražulis
Vilnius University, Life Science Center, Institute of Biotechnology
Saulėtekio al. 7, LT-10257 Vilnius, Lietuva (Lithuania)
phone (office): (+370-5)-2234353, mobile: (+370-684)-49802, (+370-614)-36366

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From robert.mcmeeking at stfc.ac.uk Wed Mar 29 11:09:53 2023
From: robert.mcmeeking at stfc.ac.uk (Robert McMeeking - STFC UKRI)
Date: Wed, 29 Mar 2023 08:09:53 +0000
Subject: [Cod-bugs] Accepted Manuscripts Entries
In-Reply-To: <360d65b6-f48a-e7fe-907e-d632f36659bd@ibt.lt>
References: <360d65b6-f48a-e7fe-907e-d632f36659bd@ibt.lt>
Message-ID: 

Hi Saulius

Thank you for the very informative reply.

Yes, I have just checked and noticed the "Added full bibliography..." records. So I think you have things largely covered. As you suggest, there is a danger that updates to some entries may "fall through the cracks".

I will check through the details in your email and try to think of sensible ways we might work together on this. It is certainly better to avoid making local "corrections" specifically for the CrystalWorks implementation.
It does make sense to make updates centrally!

Best Regards
Bob

From grazulis at ibt.lt Wed Mar 29 12:50:41 2023
From: grazulis at ibt.lt (Saulius Gražulis)
Date: Wed, 29 Mar 2023 12:50:41 +0300
Subject: [Cod-bugs] Accepted Manuscripts Entries
In-Reply-To: 
References: <360d65b6-f48a-e7fe-907e-d632f36659bd@ibt.lt>
Message-ID: <8564feb5-1b93-929c-afe0-f803ef7f6c9f@ibt.lt>

On 2023-03-29 11:09, Robert McMeeking - STFC UKRI wrote:
>
> Yes, I have just checked and noticed the "Added full bibliography..."
> records. So I think you have things largely covered. As you suggest,
> there is a danger that updates to some entries may "fall through the
> cracks".
>
> I will check through the details in your email and try to think of
> sensible ways we might work together on this. It is certainly better
> to avoid making local "corrections" specifically for the CrystalWorks
> implementation. It does make sense to make updates centrally!
>

Thank you for your answer! Please let me know about your decisions.

Regards,
Saulius

From robert.mcmeeking at stfc.ac.uk Fri Mar 31 19:00:28 2023
From: robert.mcmeeking at stfc.ac.uk (Robert McMeeking - STFC UKRI)
Date: Fri, 31 Mar 2023 16:00:28 +0000
Subject: [Cod-bugs] Accepted Manuscripts Entries
In-Reply-To: <8564feb5-1b93-929c-afe0-f803ef7f6c9f@ibt.lt>
References: <360d65b6-f48a-e7fe-907e-d632f36659bd@ibt.lt> <8564feb5-1b93-929c-afe0-f803ef7f6c9f@ibt.lt>
Message-ID: 

Hello Saulius

There have been problems with my VPN setup, and I have not been able to do as much work from home as I would have liked.

I have managed a check on the probable number of references involved. I made a snapshot a few hours ago. The number of DOIs linked to COD entries without vol/page info was 14031. This corresponds to 10 publishers, most have 2 or problem entries.

IUCr had 1107
ACS had 3079
RSC had 9732

These are the numbers of papers; the number of COD entries will, of course, be greater.

The next stage is mapping corrected year/vol/issue/page/DOI info to COD entries. Hopefully this will not upset the CrossRef folk too much!

Best Regards
Bob
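The per-publisher figures in the last message (papers whose COD entries lack volume/page information) could be tallied roughly as follows. The sketch groups DOIs by their registrant prefix and counts each paper once, even when several COD entries share its DOI. It is not the script used at Daresbury. The entry layout (dicts keyed by CIF data names) is a hypothetical stand-in for whatever snapshot format is actually used. The prefix table lists only the three publishers named: RSC (10.1039), ACS (10.1021) and IUCr (10.1107).

```python
from collections import Counter

# DOI registrant prefixes of the three publishers named in the message.
PUBLISHER_BY_PREFIX = {"10.1039": "RSC", "10.1021": "ACS", "10.1107": "IUCr"}


def tally_incomplete(entries):
    """Count distinct papers (DOIs) whose COD entries lack volume or first page.

    `entries` is a hypothetical iterable of dicts keyed by CIF data names.
    Several COD entries can share one DOI, so each DOI is counted only once.
    """
    seen = set()
    counts = Counter()
    for entry in entries:
        doi = entry.get("_journal_paper_doi", "").lower()
        if not doi or doi in seen:
            continue
        if entry.get("_journal_volume") and entry.get("_journal_page_first"):
            continue  # bibliography already complete
        seen.add(doi)
        prefix = doi.split("/", 1)[0]
        counts[PUBLISHER_BY_PREFIX.get(prefix, prefix)] += 1
    return counts


snapshot = [
    # Two structures assumed to come from the same "Accepted Manuscript" paper.
    {"_journal_paper_doi": "10.1039/D2MH01123A"},
    {"_journal_paper_doi": "10.1039/d2mh01123a"},
    # Hypothetical entry with a complete bibliography: not counted.
    {"_journal_paper_doi": "10.1107/hypothetical",
     "_journal_volume": "1", "_journal_page_first": "1"},
]
print(tally_incomplete(snapshot))
```

With the figures reported above, the three listed publishers account for 1107 + 3079 + 9732 = 13918 of the 14031 DOIs. That leaves 113 DOIs for the other seven publishers.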