From wojdyr at gmail.com  Tue Mar 17 16:31:33 2020
From: wojdyr at gmail.com (Marcin Wojdyr)
Date: Tue, 17 Mar 2020 15:31:33 +0100
Subject: [Cod-bugs] tag list
Message-ID:

Hello,

just for your information:
when we were looking into CIF tags in the PDB we needed a summary of
all tags that are used.
I was just updating that summary and I thought I'd run the same scripts on COD.
The resulting table is here:

https://project-gemmi.github.io/pdb-stats/cod-tags.html

The values have yellow tooltips that show the name of one block that
contains such a value.

I guess nothing there is new for you, but perhaps it can be useful in some way.

Marcin

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

From grazulis at ibt.lt  Wed Mar 18 11:32:34 2020
From: grazulis at ibt.lt (Saulius Gražulis)
Date: Wed, 18 Mar 2020 11:32:34 +0200
Subject: [Cod-bugs] tag list
In-Reply-To:
References:
Message-ID: <20a3faa7-af00-3066-77ae-5cfa6eef52a8@ibt.lt>

Dear Marcin,

thank you very much for sharing your tables!

On 2020-03-17 16:31, Marcin Wojdyr wrote:
> just for your information:
> when we were looking into CIF tags in the PDB we needed a summary of
> all tags that are used.
> I was just updating that summary and I thought I'd run the same scripts on COD.
> The resulting table is here:
>
> https://project-gemmi.github.io/pdb-stats/cod-tags.html
>
> The values have yellow tooltips that show the name of one block that
> contains such a value.
>
> I guess nothing there is new for you, but perhaps it can be useful in some way.

Although we validate COD using various tools, we have not used
tag/value frequencies so far. The idea of looking at data name
frequencies and data ranges is indeed very useful, simple to
implement, and versatile (applicable not only to CIFs but to XML and
JSON as well).

It looks like one can apply TF-IDF
(https://en.wikipedia.org/wiki/Tf%E2%80%93idf) to structured data names!

Regards,
Saulius

-- 
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Saulėtekio al. 7
LT-10257 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2234367 / phone (office): (+370-5)-2234353
mobile: (+370-684)-49802, (+370-614)-36366
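
The TF-IDF idea mentioned in the reply above can be sketched roughly as
follows: each data block is treated as a "document" and each data name
(tag) as a "term", so that tags present in every block score zero and
rare tags score higher. This is only an illustration and not the script
used by either correspondent. The function names, the toy block
contents, and the naive line-based tag extraction are all assumptions;
a real run would need a proper CIF parser.

```python
import math
from collections import Counter


def extract_tags(block_text):
    """Naively collect data names (tokens starting with '_') from CIF-like text.

    Assumption: this is NOT a real CIF parser. It ignores quoting and
    multi-line text fields, which a proper parser would handle.
    """
    tags = []
    for line in block_text.splitlines():
        tokens = line.strip().split(maxsplit=1)
        if tokens and tokens[0].startswith("_"):
            tags.append(tokens[0].lower())  # CIF data names are case-insensitive
    return tags


def tag_tfidf(blocks):
    """Return {block_name: {tag: score}} using plain TF-IDF.

    tf  = occurrences of the tag in the block / number of tags in the block
    idf = log(number of blocks / number of blocks containing the tag)
    """
    tag_lists = {name: extract_tags(text) for name, text in blocks.items()}
    n_blocks = len(tag_lists)

    # Document frequency: in how many blocks does each tag appear?
    doc_freq = Counter()
    for tags in tag_lists.values():
        doc_freq.update(set(tags))

    scores = {}
    for name, tags in tag_lists.items():
        counts = Counter(tags)
        total = len(tags) or 1  # avoid division by zero for empty blocks
        scores[name] = {
            tag: (count / total) * math.log(n_blocks / doc_freq[tag])
            for tag, count in counts.items()
        }
    return scores


# Hypothetical toy blocks, made up for illustration only.
blocks = {
    "block_a": "_cell_length_a 5.1\n_cell_length_b 6.2\n_journal_year 1999\n",
    "block_b": "_cell_length_a 7.3\n_cell_length_b 8.4\n_my_local_tag x\n",
}
scores = tag_tfidf(blocks)
for name, tag_scores in scores.items():
    ranked = sorted(tag_scores.items(), key=lambda kv: kv[1], reverse=True)
    print(name, ranked)
```

In this toy input the cell-length tags occur in both blocks and score
zero, while the tag unique to one block ranks first. On a real
collection, such a ranking might help flag unusual or misspelled data
names, though that use is an assumption rather than something reported
in the thread.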