[TCOD] Fwd: some discussion topics related to TCOD

Stefaan Cottenier Stefaan.Cottenier at UGent.be
Fri Sep 6 16:02:18 UTC 2013


(As there seem to have been technical problems with the TCOD list, I am
not sure whether this mail from three months ago actually reached all of
you. I am therefore resending it.)



-------- Original Message --------
Subject: some discussion topics related to TCOD
Date: Fri, 24 May 2013 14:34:08 +0200
From: Stefaan Cottenier <Stefaan.Cottenier at UGent.be>
To: tcod at lists.crystallography.net


Dear colleagues,

Today, we held a group discussion at CMM (http://molmod.ugent.be) about
TCOD from the perspective of users and/or structure donators. That did
not lead to clear conclusions, but rather to a series of thoughts that
can be a starting point for further discussions or actions. I'll list
those here (I guess that is what this mailing list is meant for):

(1) You are probably aware of other database initiatives for computed
crystal structures. Is there a vision on whether TCOD wants to 'compete'
with those, or whether TCOD tries to fill a niche that is not served by
other databases?

For instance, consider https://www.materialsproject.org/. This database
aims to run a VASP geometry optimization for every structure in ICSD,
starting from the ICSD cif (quite ironically, you can read their starting
geometry (= the ICSD info) for free, without having ICSD access
yourself...). They use their own dedicated supercomputer infrastructure
to run all this, and upload only results they have obtained themselves.
There is no input from others, in order to keep control over quality and
consistency. It has over 30,000 crystals by now, and apart from the
structure info, computed properties are being added as well.

I quote here a paragraph from a project proposal which we submitted
earlier this year, and which contains info/references about other databases:

"As is often the case for revolutions, this idea has been realized
simultaneously and more or less independently at several places,
emphasizing different aspects. The Materials Project [Jai11,MP] is an
initiative at MIT, where basic properties of all crystalline solids
documented in the (experimental) Inorganic Crystal Structure Database
(ICSD) [ICSD] have been computed by DFT. The computed formation energies
are used to construct secondary data bases of binary and ternary phase
diagrams [Ong08, Jai11b]. Similar initiatives, each with their own focus
and in different stages of growth are AFLOWLIB [Set10,Cur12] (Duke
University), OQMD [Wol12] (Northwestern University) and CompES [CES]
(NIMS, Japan). Other database initiatives that emphasize collaborative
data sharing (see also Sec. 2.3, WP3) are ESTEST [Yua10,Yua12] at UC
Davis (US), the Computational Materials Repository (CMR) [Lan12] at DTU
(Denmark) and the AIDA environment [Koz13]."

[Jai11] A. Jain et al., Computational Materials Science 50 (2011) 2295.
[Jai11b] A. Jain et al., Physical Review B 84 (2011) 045115.
[Ong08] S. P. Ong, L. Wang, B. Kang, G. Ceder, Chemistry of Materials 20
(2008) 1798.
[Set10] W. Setyawan, S. Curtarolo, Computational Materials Science 49
(2010) 299.
[Cur12] S. Curtarolo et al., Computational Materials Science 58 (2012) 227.
[Wol12] Open Quantum Materials Database,
http://wolverton.northwestern.edu/oqmd (no public access yet).
[CES] http://caldb.nims.go.jp/index_en.html (Y. Chen, A. Nogami, H.
Ohtani, N. Tatara).
[Yua10] G. Yuan, F. Gygi, Computational Science & Discovery 3 (2010) 015004.
[Yua12] G. Yuan, F. Gygi, Computer Physics Communications 183 (2012) 1744.
[Lan12] D. Landis et al., Computing in Science and Engineering, Nov/Dec
2012, p. 51 (http://dx.doi.org/10.1109/MCSE.2012.16).
[Koz13] B. Kozinsky, N. Marzari, N. Bonini, J. Garg, G. Pizzi, A.
Cepellotti, M. Fornari, contributions at the MRS Fall Meeting (Boston,
Nov. 2012) and the DPG Spring Meeting (Regensburg, March 2013).

(2) From our perspective, we don't understand the need to separate
experimental and computational databases. As long as every structure is
properly tagged as 'experimental' or 'computed', we see no reason to
separate them. Separation only adds the burden of having to run every
query twice. In any case, a unified search web page that searches both
databases simultaneously seems useful to us (that is, de facto treating
(T)COD as one database).
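
To make point (2) concrete, here is a minimal sketch of a unified search,
assuming each entry simply carries an 'experimental' or 'computed' tag.
The Entry class, the unified_search function and all identifiers below
are hypothetical illustrations, not part of any existing (T)COD interface.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Entry:
    entry_id: str  # hypothetical identifier, not a real (T)COD number
    formula: str
    origin: str    # "experimental" or "computed"


def unified_search(formula: str,
                   *databases: Sequence[Entry],
                   origin: Optional[str] = None) -> List[Entry]:
    """Search all given databases in one pass, optionally filtering by tag."""
    hits = [e for db in databases for e in db if e.formula == formula]
    if origin is not None:
        hits = [e for e in hits if e.origin == origin]
    return hits


# Two toy "databases" with made-up entries.
cod = [Entry("exp-0001", "TiO2", "experimental"),
       Entry("exp-0002", "NaCl", "experimental")]
tcod = [Entry("calc-0001", "TiO2", "computed")]

all_tio2 = unified_search("TiO2", cod, tcod)
computed_tio2 = unified_search("TiO2", cod, tcod, origin="computed")
```

A single call covers both collections, and the tag remains available as
a filter for users who want only one kind of structure.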

(3) Quality control. If a deposited computed structure has been
published, the reference to the publication serves as quality control.
That's probably similar for deposited experimental structures, isn't it?
If a computed structure has not (yet) been published, how to assess its
quality? Well, let us turn the question around: if a not yet published
experimental structure is deposited, how do you judge its quality...? It
makes sense to do it in the same way: as long as basic information is
provided on how the calculation has been done, later users should make
their own judgement.

(4) A quality control measure that could make sense for both
experimental and computed structures is to allow people to add remarks
that appear on the web page of that structure ('considering the small
basis set that has been used, I do not trust this result', or 'this
result has been contradicted by <ref>'). Such information from the
community can help people who are not experts in either computing or
measuring crystal structures to decide to what extent they can trust a
particular result. (Another variant is a feature on the web page of each
structure to flag it if you have doubts, so that experts can look at it.)

(5) COD has only cifs as entries. If you want to go as far as including
the input info needed to reproduce each calculation, it might be hard to
stick to cifs only for TCOD. It will then be necessary to link additional
files to the cif of each entry.

(6) We are split over whether it makes sense to require that each
deposited calculation be exactly reproducible. Some basic information
should be given, of course. This will be mainly information about the
method that does not depend on the specific implementation (code).
Furthermore, the main technical input parameters that are specific to
the code used should be given as well (if you want to do that in a
structured way, you will immediately hit the problem that the required
set of numbers differs from code to code...). Third in line should then
be a free field for 'any other special settings' that could apply. With
this kind of info, people can reproduce the same calculation to a large
extent (95%). If you want to have it reproduced exactly, then much more
input (files) will be required, which might not be worth the effort.
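
The three layers of information described in (6) could be sketched as
follows. This is only an illustration under assumptions: the class name,
the field names and the choice of required fields are hypothetical, and
the parameter values are made-up examples. The code-specific layer is
kept as a free-form dictionary precisely because the set of parameters
differs from code to code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CalculationRecord:
    # Layer 1: method information that does not depend on the code used.
    method: str          # e.g. "DFT"
    xc_functional: str   # e.g. "PBE"
    # Layer 2: the code and its main technical input parameters.
    code: str            # e.g. "VASP"
    code_parameters: Dict[str, object] = field(default_factory=dict)
    # Layer 3: free field for 'any other special settings'.
    other_settings: str = ""


def missing_fields(record: CalculationRecord) -> List[str]:
    """Return the names of required text fields that were left empty."""
    required = ("method", "xc_functional", "code")
    return [name for name in required if not getattr(record, name).strip()]


# Illustrative deposit; the numbers are examples, not recommendations.
complete = CalculationRecord(
    method="DFT",
    xc_functional="PBE",
    code="VASP",
    code_parameters={"ENCUT": 520, "kpoints": [8, 8, 8]},
    other_settings="spin-polarized",
)
incomplete = CalculationRecord(method="DFT", xc_functional="", code="")
```

A check such as missing_fields could run when a structure is deposited,
so that the depositor is told immediately which basic information is
still lacking.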

(7) One of our enthusiastic PhD students, who has many years of
experience in web design and databases, immediately had a picture in mind
of how to implement an automated version of (6) -- if the TCOD team is
willing to take him aboard to code that, he'll probably agree ;-).

(8) In any case, it should be clear to whoever provides a structure
exactly what is requested. A web form where the information under (6)
can be filled out (or the automated tool of (7) that extracts all this
info from the output of each mainstream code) will be required in order
to obtain consistent data.

(9) Which kinds of computed structures will TCOD accept? Ground state
structures, obviously. Metastable structures probably as well? Also if
they are not dynamically stable (soft phonon mode), something that is not
routinely examined? And what about transition state structures? These are
never observable in experiment, yet very useful to know (and hard to
find) in order to understand reactions. If the latter are allowed, then
they should be tagged as such.
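
The tagging suggested in (9) could be as simple as a closed list of
allowed labels. The sketch below is hypothetical: the label names and the
parse_kind helper are assumptions made for illustration, and the actual
list of accepted structure kinds is exactly what remains to be decided.

```python
from enum import Enum


class StructureKind(Enum):
    GROUND_STATE = "ground state"
    METASTABLE = "metastable"
    DYNAMICALLY_UNSTABLE = "dynamically unstable"  # e.g. soft phonon mode
    TRANSITION_STATE = "transition state"


def parse_kind(label: str) -> StructureKind:
    """Map a depositor-supplied label to a known tag, or raise ValueError."""
    normalized = label.strip().lower()
    try:
        return StructureKind(normalized)
    except ValueError:
        allowed = ", ".join(kind.value for kind in StructureKind)
        raise ValueError(
            f"unknown structure kind {label!r}; allowed: {allowed}"
        ) from None


kind = parse_kind("  Transition State ")
```

Rejecting unknown labels at deposit time would ensure that, for instance,
a transition state structure cannot end up in the database untagged.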

Best regards,
Stefaan

-- 
Stefaan Cottenier
Center for Molecular Modeling (CMM) &
Department of Materials Science and Engineering (DMSE)
Ghent University
Technologiepark 903
BE-9052 Zwijnaarde
Belgium

http://molmod.Ugent.be
http://www.ugent.be/ea/dmse/en
email: Stefaan . Cottenier /at/ UGent . be






