[Cod-bugs] tag list

Saulius Gražulis grazulis at ibt.lt
Wed Mar 18 11:32:34 EET 2020


Dear Marcin,

thank you very much for sharing your tables!

On 2020-03-17 16:31, Marcin Wojdyr wrote:
> just for your information:
> when we were looking into CIF tags in the PDB we needed a summary of
> all tags that are used.
> I was just updating that summary and I thought I'd run the same scripts on COD.
> The resulting table is here:
> 
> https://project-gemmi.github.io/pdb-stats/cod-tags.html
> 
> The values have yellow tooltips that show the name of one block that
> contains that value.
> 
> I guess nothing there is new for you, but perhaps it can be useful in some way.

Although we validate COD using various tools, we have not used
tag/value frequencies so far. The idea of looking at data name
frequencies and value ranges is indeed very useful, simple to
implement, and versatile (applicable not only to CIFs but to XML and
JSON as well). It looks like one can apply TF-IDF
(https://en.wikipedia.org/wiki/Tf%E2%80%93idf) to structured data names!
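
A minimal sketch of what TF-IDF over data names could look like, assuming
each data block has already been reduced to the list of data names it
contains. The block identifiers, the tag lists and the function names below
are hypothetical and only illustrate the idea; they are not taken from the
COD tools or from the scripts behind the table above.

```python
import math
from collections import Counter


def tag_document_frequency(blocks):
    """Count in how many blocks each data name occurs at least once."""
    df = Counter()
    for tags in blocks.values():
        df.update(set(tags))  # set(): count each name once per block
    return df


def tfidf(blocks):
    """Return {block_name: {tag: score}} using tf * log(N / df)."""
    n_blocks = len(blocks)
    df = tag_document_frequency(blocks)
    scores = {}
    for name, tags in blocks.items():
        tf = Counter(tags)
        total = sum(tf.values())
        scores[name] = {
            tag: (count / total) * math.log(n_blocks / df[tag])
            for tag, count in tf.items()
        }
    return scores


# Hypothetical toy input: three blocks, the last one with a misspelled name.
blocks = {
    "block_1": ["_cell_length_a", "_cell_length_b", "_symmetry_space_group_name_H-M"],
    "block_2": ["_cell_length_a", "_cell_length_b", "_symmetry_space_group_name_H-M"],
    "block_3": ["_cell_length_a", "_cell_length_b", "_symetry_space_group_name_H-M"],
}

if __name__ == "__main__":
    for block, tag_scores in tfidf(blocks).items():
        # Names that appear in every block score 0; rare names stand out.
        top = max(tag_scores, key=tag_scores.get)
        print(block, top, round(tag_scores[top], 3))
```

With this weighting, a data name present in every block scores zero, while
a name that occurs in only a few blocks (for instance a misspelling) gets
the highest score in the blocks that use it. That makes rare names easy to
list for manual inspection.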

Regards,
Saulius

-- 
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Saulėtekio al. 7
LT-10257 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2234367 / phone (office): (+370-5)-2234353
mobile: (+370-684)-49802, (+370-614)-36366

