[TCOD] TCOD dictionaries and data presentation -- 3 levels of description

Saulius Gražulis grazulis at ibt.lt
Tue Jul 22 13:29:55 UTC 2014


Dear TCODers!

Sorry for the period of silence. It took me a while to digest all the
ideas expressed on this list and to catch up with the work done by other
groups (Quichote, CompChem).

I am toying with the idea that TCOD should have 3 levels of structure
description:

-- level 0:
 - cell constants;
 - atomic coordinates;
 - literature reference.

 Standard CIF dictionaries are enough for such description.

-- level 1:
 - same as in level 0, plus any parameters that permit a qualified person
to judge whether the structure has converged and how good it is. The
parameters should include residual force on atoms (2014-02-10 16:16,
Björkman Torbjörn), energy change(s) in the last cycle(s), and
references to basis set, pseudo-potentials, XC functionals, etc. (as
described in our dictionaries and in
http://www.xml-cml.org/dictionary/compchem/). Basis sets can be
referenced as in https://bse.pnl.gov/bse/portal;

-- level 2:
 - same as in level 1, plus:

 - copies of all input files (such as .inp files), or stable references
to them (possible in case of widely used basis sets or
pseudo-potentials); the text of these files could be stored, unparsed,
in appropriate CIF data items so that one can extract those files and
run the code automatically;

 - command line used to run the code;

 - optionally, output logs of the run;

 - the name, URL reference, and version of the program used for
computations;

 - for F/LOSS programs and systems, we can store the source code and the
packages -- these can be re-run on emulators when the code and the
systems become obsolete.
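To illustrate the "stored, unparsed, in appropriate CIF data items" idea from level 2, here is a minimal Python sketch. It wraps the raw text of an input file in a CIF semicolon-delimited text field and extracts it again. The data name and the file text are hypothetical placeholders, not names from any TCOD dictionary, and the reader is a simplified line scanner, not a full CIF parser.

```python
def to_cif_text_field(data_name, text):
    """Wrap raw file text in a CIF semicolon-delimited text field."""
    lines = text.splitlines()
    # A content line starting with ';' would end the field early.
    # This sketch simply refuses such input instead of escaping it.
    if any(line.startswith(";") for line in lines):
        raise ValueError("text contains a line starting with ';'")
    first, rest = (lines[0], lines[1:]) if lines else ("", [])
    # The value starts right after the opening ';' and ends before
    # the closing ';' that stands alone at the start of a line.
    return "\n".join([data_name, ";" + first] + rest + [";"])


def from_cif_text_field(cif_text, data_name):
    """Return the text stored under data_name (simplified reader)."""
    lines = cif_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() != data_name:
            continue
        if i + 1 >= len(lines) or not lines[i + 1].startswith(";"):
            continue
        body = [lines[i + 1][1:]]  # text after the opening ';'
        for nxt in lines[i + 2:]:
            if nxt.startswith(";"):
                return "\n".join(body)
            body.append(nxt)
        raise ValueError("unterminated text field")
    raise KeyError(data_name)


if __name__ == "__main__":
    # Hypothetical data name and made-up input file text.
    name = "_example_input_file_text"
    original = "line one\n  indented line\nlast line"
    cif = "data_demo\n" + to_cif_text_field(name, original) + "\n"
    assert from_cif_text_field(cif, name) == original
    print(cif)
```

A real deposition tool would also need an escaping or encoding convention for files that contain lines starting with a semicolon, and for binary files; the sketch leaves that open.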

Rationale for level 0: at the moment, all TCOD CIFs are of this kind,
so we are just describing the current state of affairs. This assumes that
referees and editors reviewed the paper carefully and checked the
convergence, that nobody has confused the coordinates, etc.

Rationale for level 1: we need some criteria to select structures
for further processing, review, etc. We will start with the minimal set
of tags (data names), and add more when necessary.
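As a rough sketch of how level 1 data could serve as selection criteria, the Python snippet below keeps only records whose residual force and last-cycle energy change fall below given thresholds. The data names, the threshold values and the sample records are all hypothetical; units are whatever the depositor and the dictionary agree on.

```python
# Hypothetical data names; the real ones would come from the dictionaries.
FORCE_TAG = "_example_residual_force"
ENERGY_TAG = "_example_last_energy_change"


def passes_level1(record, max_force=0.01, max_energy_change=1e-5):
    """Return True if the record carries both convergence values
    and both are within the (arbitrary, illustrative) thresholds."""
    try:
        force = float(record[FORCE_TAG])
        d_energy = float(record[ENERGY_TAG])
    except (KeyError, ValueError):
        # Missing or non-numeric values: the record is level 0 at best.
        return False
    return abs(force) <= max_force and abs(d_energy) <= max_energy_change


def select_structures(records, **thresholds):
    """Filter a list of records (dicts of data name -> value)."""
    return [r for r in records if passes_level1(r, **thresholds)]


if __name__ == "__main__":
    # Made-up records for illustration only.
    records = [
        {"id": "a", FORCE_TAG: "0.002", ENERGY_TAG: "-3e-6"},
        {"id": "b", FORCE_TAG: "0.5", ENERGY_TAG: "1e-7"},
        {"id": "c"},  # level 0 only: no convergence data
    ]
    print([r["id"] for r in select_structures(records)])
```

Adding a new criterion later only means adding another tag and threshold, which matches the plan to start with a minimal set of data names.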

Rationale for level 2: in the near term, the computations can be
re-run from such a representation, and modified computations can be done.
In the more distant future, the codes will become obsolete, as Stefaan
correctly predicts in his 2013-08-19 18:30 e-mail, but we can still
read, analyse and parse the logs, and possibly extract more information
than was parsed during the deposition.

Copying and dumping all inputs and outputs captures all relevant
information, allows it to be re-parsed in the future if necessary, and
accommodates any new codes and new developments of old codes.

People who are not willing to share their scripts can stick to level 1
as long as their publishers, funders and community do not insist on
level 2 :). TCOD as a database is neutral about this (although TCOD
maintainers or participants may have strong opinions on the subject ;)

Large intermediate or derivable files (electron/spin density maps, wave
functions) will not be stored at the moment; if they are stored later,
they will be added as separate files.

What do you think about such a policy?

Regards,
Saulius

PS. In a while, I'll send you a version of our poster for IUCr for
review, and some updates to the COD dictionaries.

-- 
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Graiciuno 8
LT-02241 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2602116 / phone (office): (+370-5)-2602556
mobile: (+370-684)-49802, (+370-614)-36366


