[Cod-bugs] Accepted Manuscripts Entries

Saulius Gražulis grazulis at ibt.lt
Wed Mar 29 09:04:22 EEST 2023


Hi, Bob,

many thanks for your e-mail!

On 2023-03-28 19:26, Robert McMeeking - STFC UKRI wrote:
>
> I have noticed that there are now quite a lot of “Accepted 
> Manuscripts” entries making their way into Crystallography Open 
> Database – especially from Royal Society of Chemistry. It is good to 
> get the structures as soon as possible, but there is information such 
> as Volume/page numbers which needs updating. I have been thinking 
> about ways of setting up a system to update this information for the 
> CrystalWorks implementation at Daresbury. I can see ways of doing this 
> using the DOIs of the articles and various CrossRef web services.
>
Currently, the following procedure is implemented in our Web crawlers:

step 1: every day we scan the RSS feeds of multiple publishers to learn
which new papers have been published. This is done not so much to get
the structures as early as possible (although this is a positive outcome
as well, IMHO), but to have a machine-readable list of published papers
(RSS feeds are much more amenable to automatic processing than web
pages), and to have an additional source of publication information.

Since pages are not assigned at this point, and RSS feeds differ in the
amount of bibliographic information they provide, we take only the DOI
from the RSS feed, which is always present and is sufficient to get the
full bibliography later;

step 2: once a month we scan the papers published in the current year
by the publishers we monitor, and check whether full bibliographic
information is already available. If that is the case, we update the
COD record with the complete bibliography. This is indicated in
the COD SVN logs with the "Adding full bibliography for ..." log 
entries. Paper lists (DOIs) are obtained from CrossRef, and 
bibliographies are fetched from various sources, including PubMed, 
CrossRef, and publishers' publicly available information from their Web 
sites.
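
Step 1 above can be illustrated with a minimal Python sketch. This is
not the actual crawler code; the function name, the sample feed and the
loose DOI pattern are assumptions made for illustration, and the sketch
only handles plain RSS 2.0 feeds with un-namespaced <item> elements:

```python
import re
import xml.etree.ElementTree as ET

# Loose DOI pattern (an assumption; real feeds may need stricter rules).
DOI_RE = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def dois_from_rss(rss_text):
    """Return the unique DOIs found in the items of an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    found = []
    for item in root.iter("item"):
        # Feeds differ in where they put the DOI, so check every child
        # element of the item and take the first match.
        for child in item:
            match = DOI_RE.search(child.text or "")
            if match:
                # DOIs are case-insensitive; normalise to lower case.
                doi = match.group(0).rstrip(".,;").lower()
                if doi not in found:
                    found.append(doi)
                break
    return found


# Hypothetical feed, written by hand for this example.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Hypothetical journal feed</title>
<item><title>Paper A</title>
<link>https://doi.org/10.1039/D2MH01123A</link></item>
<item><title>Paper B</title><description>No DOI here</description></item>
</channel></rss>"""

if __name__ == "__main__":
    print(dois_from_rss(SAMPLE_FEED))
```

Only the DOI is kept; everything else in the feed item is ignored,
since the full bibliography is fetched later by DOI.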

An example of how the complete process looks can be found in the COD 
entry 1568124 [1-3]. There are over 11000 such bibliography update 
events recorded currently in the COD SVN logs.
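
For an entry like this one, the step 2 check could be sketched roughly
as follows. This is an illustration, not the COD implementation: the
function names are hypothetical, the field names ("volume", "issue",
"page", "container-title", "DOI") are those of the CrossRef REST API
works record, and the sample record is written by hand from the values
in the difference log [3], not copied from a real API response:

```python
import json
import urllib.parse
import urllib.request


def fetch_crossref_work(doi):
    """Fetch the CrossRef 'message' record for a DOI (needs network)."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["message"]


def cif_bibliography(work):
    """Map a CrossRef work record to CIF _journal_* data items.

    Returns None while volume or pages are still missing, i.e. while
    the paper is still an accepted manuscript.
    """
    volume = work.get("volume")
    page = work.get("page")
    if not volume or not page:
        return None  # not assigned yet; check again at the next scan
    first, _, last = page.partition("-")
    items = {
        "_journal_paper_doi": work["DOI"].lower(),
        "_journal_volume": volume,
        "_journal_page_first": first,
    }
    if last:
        items["_journal_page_last"] = last
    if work.get("issue"):
        items["_journal_issue"] = work["issue"]
    titles = work.get("container-title") or []
    if titles:
        items["_journal_name_full"] = titles[0]
    return items


# Hand-written sample records (assumed shape of a CrossRef record).
ACCEPTED = {"DOI": "10.1039/D2MH01123A",
            "container-title": ["Materials Horizons"]}
PUBLISHED = dict(ACCEPTED, volume="10", issue="2", page="625-631")

if __name__ == "__main__":
    print(cif_bibliography(ACCEPTED))
    print(cif_bibliography(PUBLISHED))
```

Returning None for an incomplete record keeps the monthly scan
idempotent: an entry is only touched once all of volume and pages are
known.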

> Do you have plans to automatically check for updated bibliographic 
> information as it becomes available? If you do, that is great! If not I 
> would be happy to pass on any information I am able to generate at our 
> end. I am currently working on understanding the technical details.
>
As said, we do check for the updated bibliographies and we update them 
when we find new data. Our process, however, is not perfect. The Web 
crawlers suffer from intense "bit rot" as the publishers' Web sites 
become more "modern". Our process might miss some bibliographies, and in 
several cases, I have noticed, we got garbled data (like in the old 
record of 7131035, see [4-5]). Moreover, recently IUCr complained that
we are sending too many requests to their server's bibliography
endpoint, so I had to stop fetching IUCr data temporarily, and this
needs to be resolved.
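
One common way to avoid overloading a publisher's server is to enforce
a minimum delay between consecutive requests to the same host. The
sketch below shows the idea only; the class name, the host names and
the interval are assumptions, and what request rate a given publisher
would actually accept is not known here:

```python
import time


class HostThrottle:
    """Enforce a minimum delay between requests to the same host."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self._clock = clock               # injectable for testing
        self._sleep = sleep
        self._last = {}                   # host -> time of last request

    def wait(self, host):
        """Block until a request to `host` is allowed.

        Returns the number of seconds slept.
        """
        now = self._clock()
        last = self._last.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, last + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last[host] = now + delay
        return delay


if __name__ == "__main__":
    # Hypothetical host name and interval.
    throttle = HostThrottle(min_interval=1.0)
    for doi in ["10.1039/d2mh01123a", "10.1039/d2mh01123a"]:
        slept = throttle.wait("journals.example.org")
        print("fetching", doi, "after sleeping", round(slept, 2), "s")
```

Each host is tracked separately, so slowing down requests to one
publisher does not delay fetches from the others.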

Thus, if you have a process that would allow us to complete the
bibliographies or correct other metadata (and data :), we would be
grateful for your contribution. COD is meant to be a collaborative
project where we all pool information for our mutual benefit :). We are
ready to accept verified COD changes upstream, so that you do not need
to re-apply your patches when you fetch a new COD revision.

Please let me know what your plans are; if you are about to implement
information fetches, let's discuss how we can integrate our data.

Sincerely yours,
Saulius

Refs.:

[1] Yu, Gang; Liu, Huanyu; Yan, Wenchao; Guo, Ruoyao; Wu, Aoben; Zhao, 
Zifeng; Liu, Zhiwei; Bian, Zuqiang. 4f → 3d sensitization: a luminescent 
Eu<sup>II</sup>-Mn<sup>II</sup> heteronuclear complex with a near-unity 
quantum yield. (2023) COD Entry 1568124. URL: 
http://www.crystallography.net/cod/1568124.html [accessed 
2023-03-29T08:32+03:00].

[2] The COD Advisory Board. COD Entry 1568124, rev. 282108. Subversion 
record. URL: svn://www.crystallography.net/cod/cif/1/56/81/1568124.cif 
[accessed 2023-03-29T08:34+03:00].

[3] The COD Entry 1568124 bibliography change revision, difference log:

> saulius at tasmanijos-velnias ~/ $ svn log -c281623 --diff $(codid2file 
> 1568124)
> ------------------------------------------------------------------------
> r281623 | coder | 2023-03-04 20:51:18 +0200 (Sat, 04 Mar 2023) | 4 lines
>
> cif/
> Updating files of 1568124, 1568125, 1568126, 1568127
> Original log message:
> Adding full bibliography for 1568124--1568127.cif.
>
> Index: /home/saulius/struct/cod/current/cif/1/56/81/1568124.cif
> ===================================================================
> --- /home/saulius/struct/cod/current/cif/1/56/81/1568124.cif 
>  (revision 281622)
> +++ /home/saulius/struct/cod/current/cif/1/56/81/1568124.cif 
>  (revision 281623)
> @@ -23,11 +23,16 @@
>  'Bian, Zuqiang'
>  _publ_section_title
>  ;
> - 4f \\rightarrow 3d sensitization: a luminescent EuII--MnII heteronuclear
> - complex with a near-unity quantum yield
> + 4f &#x2192; 3d sensitization: a luminescent
> + Eu<sup>II</sup>-Mn<sup>II</sup> heteronuclear complex with a near-unity
> + quantum yield.
>  ;
> -_journal_name_full               'Materials Horizons'
> -_journal_paper_doi               10.1039/D2MH01123A
> +_journal_issue                   2
> +_journal_name_full               'Materials horizons'
> +_journal_page_first              625
> +_journal_page_last               631
> +_journal_paper_doi               10.1039/d2mh01123a
> +_journal_volume                  10
>  _journal_year                    2023
>  _chemical_absolute_configuration ad
>  _chemical_formula_sum            'C18 H36 Br2 Eu N2 O6'
> @@ -138,6 +143,8 @@
>  _reflns_threshold_expression     'I > 2\s(I)'
>  _cod_data_source_file            d2mh01123a2.cif
>  _cod_data_source_block           3
> +_cod_depositor_comments
> +'Adding full bibliography for 1568124--1568127.cif.'
>  _cod_database_code               1568124
>  _shelx_shelxl_version_number     2014/7
>  _shelx_space_group_comment

[4] The current 7131035 COD Entry (2023). URL: 
http://www.crystallography.net/cod/7131035.html; CIF URL: 
http://www.crystallography.net/cod/7131035.cif [accessed 
2023-03-29T08:45+03:00].

[5] The old 7131035 COD Entry with garbled author names (2022) CIF URL: 
http://www.crystallography.net/cod/7131035.cif@279571 [accessed 
2023-03-29T08:47+03:00].

-- 
Dr. Saulius Gražulis
Vilnius University, Life Science Center, Institute of Biotechnology
Saulėtekio al. 7, LT-10257 Vilnius, Lietuva (Lithuania)
phone (office): (+370-5)-2234353, mobile: (+370-684)-49802, (+370-614)-36366
