(As there seem to have been technical problems with TCOD, I'm not sure
whether this mail from three months ago actually reached all of you. I
am just resending it.)
-------- Original Message --------
Subject: some discussion topics related to TCOD
Date: Fri, 24 May 2013 14:34:08 +0200
From: Stefaan Cottenier <Stefaan.Cottenier(a)UGent.be>
To: tcod(a)lists.crystallography.net
Dear colleagues,
Today, we held a group discussion at CMM (http://molmod.ugent.be) about
TCOD from the perspective of users and/or structure donors. It did
not lead to clear conclusions, but rather to a series of thoughts that
can be a starting point for further discussions or actions. I'll list
those here (I guess that is what this mailing list is meant for):
(1) You are probably aware of other database initiatives for computed
crystal structures. Is there a vision of whether TCOD wants to 'compete'
with those, or whether TCOD tries to fill a niche that is not served by
other databases?
For instance, consider https://www.materialsproject.org/. This is a
database that aims at doing a VASP geometry optimization for every
structure in ICSD, starting from the ICSD CIF (quite ironically, you can
read their starting geometry (= ICSD info) for free, without having an
ICSD license yourself...). They use their own dedicated supercomputer
infrastructure to run all this, and upload only results they obtained
themselves. They accept no input from others, in order to keep control
over quality and consistency. It has over 30,000 crystals by now, and
apart from the structural information, computed properties are also
being added.
I quote here a paragraph from a project proposal which we submitted
earlier this year, and which contains info/references about other databases:
"As is often the case for revolutions, this idea has been realized
simultaneously and more or less independently at several places,
emphasizing different aspects. The Materials Project [Jai11,MP] is an
initiative at MIT, where basic properties of all crystalline solids
documented in the (experimental) Inorganic Crystal Structure Database
(ICSD) [ICSD] have been computed by DFT. The computed formation energies
are used to construct secondary data bases of binary and ternary phase
diagrams [Ong08, Jai11b]. Similar initiatives, each with their own focus
and in different stages of growth are AFLOWLIB [Set10,Cur12] (Duke
University), OQMD [Wol12] (Northwestern University) and CompES [CES]
(NIMS, Japan). Other database initiatives that emphasize collaborative
data sharing (see also Sec. 2.3, WP3) are ESTEST [Yua10,Yua12] at UC
Davis (US), the Computational Materials Repository (CMR) [Lan12] at DTU
(Denmark) and the AIDA environment [Koz13]."
[Jai11] A. Jain et al., Computational Materials Science 50 (2011) 2295.
[Jai11b] A. Jain et al., Physical Review B 84 (2011) 045115.
[Ong08] S. P. Ong, L. Wang, B. Kang, G. Ceder, Chemistry of Materials 20 (2008) 1798.
[Set10] W. Setyawan, S. Curtarolo, Computational Materials Science 49 (2010) 299.
[Cur12] S. Curtarolo et al., Computational Materials Science 58 (2012) 227.
[Wol12] Open Quantum Mechanical Database, http://wolverton.northwestern.edu/oqmd (no public access yet).
[CES] Y. Chen, A. Nogami, H. Ohtani, N. Tatara, http://caldb.nims.go.jp/index_en.html
[Yua10] G. Yuan, F. Gygi, Computational Science & Discovery 3 (2010) 015004.
[Yua12] G. Yuan, F. Gygi, Computer Physics Communications 183 (2012) 1744.
[Lan12] D. Landis et al., Computing in Science and Engineering, Nov./Dec. 2012, p. 51, http://dx.doi.org/10.1109/MCSE.2012.16
[Koz13] B. Kozinsky, N. Marzari, N. Bonini, J. Garg, G. Pizzi, A. Cepellotti, M. Fornari, contributions at the MRS Fall Meeting (Boston, Nov. 2012) and the DPG Spring Meeting (Regensburg, March 2013).
(2) From our perspective, we don't understand the need to separate
experimental and computational databases. As long as every structure is
properly tagged as 'experimental' or 'computed', we see no reason to
separate them; doing so only adds the burden of having to run every
query twice. In any case, a unified search web page that searches both
databases simultaneously seems useful to us (that is, de facto, treating
(T)COD as one database).
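A minimal sketch of the unified search suggested in (2). It assumes a hypothetical `Entry` record that carries the 'experimental'/'computed' tag and treats each database as a plain list of entries; the real COD/TCOD query interfaces are not modelled here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical structure record; `origin` is the 'experimental'/'computed' tag."""
    entry_id: str
    formula: str
    origin: str  # "experimental" or "computed"


def unified_search(databases: Iterable[Iterable[Entry]], formula: str,
                   origin: Optional[str] = None) -> List[Entry]:
    """Run one query over all databases; optionally filter on the origin tag."""
    hits: List[Entry] = []
    for database in databases:
        for entry in database:
            if entry.formula != formula:
                continue
            if origin is not None and entry.origin != origin:
                continue
            hits.append(entry)
    return hits


# Usage with made-up entries: one query returns both kinds, each still tagged.
cod = [Entry("cod-1", "NaCl", "experimental"), Entry("cod-2", "KCl", "experimental")]
tcod = [Entry("tcod-1", "NaCl", "computed")]
print([(e.entry_id, e.origin) for e in unified_search([cod, tcod], "NaCl")])
# [('cod-1', 'experimental'), ('tcod-1', 'computed')]
```

Because the tag travels with every record, a user who only wants computed (or only experimental) structures still needs just one query.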
(3) Quality control. If a deposited computed structure has been
published, the reference to the publication serves as quality control.
That's probably similar for deposited experimental structures, isn't it?
If a computed structure has not (yet) been published, how should its
quality be assessed? Well, let us turn the question around: if a
not-yet-published experimental structure is deposited, how do you judge
its quality...? It makes sense to do it in the same way: as long as basic
information is provided on how the calculation was done, later users
should make their own judgement.
(4) A quality control measure that could make sense for both
experimental and computed structures is to allow people to add remarks
that appear on the web page of that structure ('considering the small
basis set that has been used, I do not trust this result', or 'this
result has been contradicted by <ref>'). Such information by the
community can help people who are not experts in either computing or
measuring crystal structures to decide to which extent they can trust a
particular result. (Another variant is a feature on the web page of each
structure to flag it if you have doubts, so that experts can look at it.)
(5) COD entries are CIFs only. If you want to go as far as including
the input needed to reproduce each calculation, it might be hard to
stick to CIFs only for TCOD. It will then be necessary to link
additional files to the CIF of each entry.
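One way to picture the "CIF plus linked files" idea from (5), sketched under an assumption that is not part of any existing COD/TCOD design: each auxiliary file is stored with the CIF and tied to the entry by a SHA-256 checksum, so later users can check that the inputs they download are the ones that were deposited. `DepositedEntry` and the file names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DepositedEntry:
    """Hypothetical TCOD entry: the CIF plus any number of linked input files."""
    cif_name: str
    cif_text: str
    linked_files: Dict[str, bytes] = field(default_factory=dict)

    def link(self, name: str, content: bytes) -> None:
        """Attach an auxiliary file (input deck, pseudopotential, ...) by name."""
        if name == self.cif_name or name in self.linked_files:
            raise ValueError(f"file name {name!r} is already used in this entry")
        self.linked_files[name] = content

    def manifest(self) -> Dict[str, str]:
        """Map every file of the entry, CIF included, to its SHA-256 digest."""
        digests = {self.cif_name: hashlib.sha256(self.cif_text.encode("utf-8")).hexdigest()}
        for name, content in sorted(self.linked_files.items()):
            digests[name] = hashlib.sha256(content).hexdigest()
        return digests


entry = DepositedEntry("entry.cif", "data_example\n")
entry.link("INCAR", b"ENCUT = 400\n")
print(sorted(entry.manifest()))  # ['INCAR', 'entry.cif']
```

The manifest could itself be stored in the CIF, so the CIF remains the central record while the bulky inputs live next to it.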
(6) We are split over whether it makes sense to require that each
deposited calculation be exactly reproducible. Some basic information
should be given, of course. This will be mainly information about the
method that does not depend on the specific implementation (code).
Furthermore, the main technical input parameters that are specific to
the code used should be given as well (if you want to do that in a
structured way, you will immediately hit the problem that the required
set of parameters differs from one code to another...).
Third in line should then be a free field for 'any other special
settings' that could apply. With this kind of info, people can reproduce
the same calculation to a large extent (95%). If you want to have it
reproduced exactly, then much more input (files) will be required, which
might not be worth the effort.
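The three levels of information proposed in (6) map onto a record with a fixed code-independent part, a key/value part whose keys differ from code to code, and a free-text field. The sketch below is a hypothetical Python layout, not a TCOD dictionary definition; `ENCUT` and `EDIFFG` are real VASP tags, used here only as examples of code-specific parameters.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CalculationDescription:
    # Level 1: method information that does not depend on the code.
    method: str                      # e.g. "DFT"
    functional: str                  # e.g. "PBE"
    # Level 2: main technical parameters; the set of keys differs for every code.
    code: str                        # e.g. "VASP"
    code_parameters: Dict[str, str] = field(default_factory=dict)
    # Level 3: free field for 'any other special settings'.
    other_settings: str = ""

    def summary(self) -> str:
        """One-line human-readable description of the calculation."""
        text = f"{self.method}/{self.functional} with {self.code}"
        params = ", ".join(f"{k}={v}" for k, v in sorted(self.code_parameters.items()))
        if params:
            text += f" ({params})"
        if self.other_settings:
            text += f"; other: {self.other_settings}"
        return text


calc = CalculationDescription("DFT", "PBE", "VASP", {"ENCUT": "400", "EDIFFG": "-0.01"})
print(calc.summary())  # DFT/PBE with VASP (EDIFFG=-0.01, ENCUT=400)
```

Keeping level 2 as an open mapping sidesteps the problem that no single fixed set of fields fits every code.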
(7) One of our enthusiastic PhD students with many years of experience
in web design and databases immediately had a mental map of how to
implement an automated version of (6) -- if the TCOD team is willing to
take him on board to code that, he'll probably agree ;-).
(8) In any case, it should be clear to the person who provides the
structure what exactly is requested. A web form where the information
under (6) can be filled out (or the automated tool of (7) that extracts
all this info from the output of each mainstream code) will be required
in order to have consistent data.
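At a minimum, a deposition form as in (8) would refuse submissions whose required fields are empty. A minimal sketch, assuming a hypothetical list of required field names (an agreed list would follow from the TCOD dictionaries):

```python
from typing import Dict, List, Optional

# Hypothetical required fields; an illustration, not an agreed TCOD list.
REQUIRED_FIELDS = ("method", "functional", "code", "code_version")


def missing_fields(form: Dict[str, Optional[str]]) -> List[str]:
    """Return the required fields that are absent or blank, in a fixed order."""
    return [name for name in REQUIRED_FIELDS if not (form.get(name) or "").strip()]


submission = {"method": "DFT", "functional": "PBE", "code": "VASP", "code_version": " "}
problems = missing_fields(submission)
if problems:
    print("Please fill in:", ", ".join(problems))  # Please fill in: code_version
```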
(9) Which kinds of computed structures will TCOD accept? Ground-state
structures, obviously? Metastable structures probably as well? Also if
they are not dynamically stable (soft phonon mode)? (That is not
routinely examined.) And what about transition-state structures? (Never
observable in experiment, yet very useful to know, and hard to find, in
order to understand reactions.) If the latter are allowed, then they
should be tagged as such.
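If such structures are accepted, the tag could be stored as an explicit category. The sketch below assumes hypothetical inputs (energy above the lowest-energy polymorph at the same level of theory, the number of imaginary phonon modes if phonons were computed, and whether the structure came from a transition-state search) and a simple precedence rule; it is one possible tagging scheme, not a TCOD policy.

```python
from enum import Enum
from typing import Optional


class StructureKind(Enum):
    GROUND_STATE = "ground state"
    METASTABLE = "metastable"
    DYNAMICALLY_UNSTABLE = "dynamically unstable (soft phonon mode)"
    TRANSITION_STATE = "transition state"


def classify(energy_above_lowest_ev: float,
             n_imaginary_modes: Optional[int],
             from_transition_state_search: bool = False) -> StructureKind:
    """Assign a tag; n_imaginary_modes=None means phonons were not computed."""
    if from_transition_state_search:
        return StructureKind.TRANSITION_STATE
    if n_imaginary_modes:  # a positive count signals a soft mode; None or 0 do not
        return StructureKind.DYNAMICALLY_UNSTABLE
    if energy_above_lowest_ev > 0.0:
        return StructureKind.METASTABLE
    return StructureKind.GROUND_STATE
```

Since dynamical stability is not routinely examined, `None` is treated here as "no evidence of instability"; a separate "phonons checked" flag would keep that distinction visible to users.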
Best regards,
Stefaan
--
Stefaan Cottenier
Center for Molecular Modeling (CMM) &
Department of Materials Science and Engineering (DMSE)
Ghent University
Technologiepark 903
BE-9052 Zwijnaarde
Belgium
http://molmod.Ugent.be
http://www.ugent.be/ea/dmse/en
email: Stefaan . Cottenier /at/ UGent . be
Dear colleagues,
as I am going to the 23rd Congress of the IUCr this year, I thought that
this would be a good opportunity to present TCOD to the crystallographic
and chemical community there.
Unfortunately, the deadline for abstract submission is tomorrow,
2014-02-11, so I need short feedback from you ASAP... I apologize for
such short notice.
At this stage, the purpose of the TCOD poster/talk is not to present
spectacular results, but rather to inform the community and to make sure
that everyone is invited, so that nobody is excluded. Only then will
TCOD have its value.
I attach a draft abstract and an author list. If you do not mind, I will
present the abstract as the presenting author and thus put my name
first; otherwise the list is alphabetical. If the NWChem people, or
anyone else, would wish to participate and to provide their input about
computational data representation and ontologies, I'd be glad to include
them as co-authors. At the moment, I have included people on the tcod
mailing list (except that I do not know the full name of , and Peter and
Nicola with whom we discussed TCOD in detail; I hope you will
participate :).
Since we are a new team, I'll proceed as follows:
a) those who e-mail me by tomorrow (2014-02-11) to say that they will
participate in the presentation will stay on the author list;
b) those who do not agree or *do not reply* by 2014-02-11 I will leave
out, as not consenting to their authorship;
c) if no one from the theoretical community replies, I'll probably
refrain from submitting the abstract;
d) if you join the team, I'll submit the abstract and the author list
tomorrow, 2014-02-11.
Andrius, Antanas and I will do all the technical editing of the poster
or slides; any comments on the text from you are welcome (but please
keep in mind that the abstract is limited to 2000 characters). The most
important contribution from you would be ideas:
-- how to describe the computational data so that it is useful and
reusable (what parameters need to be specified for different methods)?
-- what quality criteria do we want to put on structures in TCOD (and in
general on published DFT and other structures)? In other words, what
computed structures will we be happy with?
After the abstract submission we will have some time to polish TCOD
policy details, data dictionaries (initial version can be found here:
http://www.crystallography.net/tcod/cif/dictionaries/) and software
pipeline (we will take care of this last bit in Vilnius, but
participants are welcome :).
Regards,
Saulius
--
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Graiciuno 8
LT-02241 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2602116 / phone (office): (+370-5)-2602556
mobile: (+370-684)-49802, (+370-614)-36366
Dear Saulius and everyone else,
I'll be happy to join.
(I realize that people have already replied; sorry for not taking that into account, but I need to send this quickly now.)
One question back regarding your second question:
-- What are the existing quality criteria for COD? I know that I have to give Fabs, but that only applies if the data is not "old". However, data that is "old" (I forget the cutoff year) is still considered acceptable, so clearly it has been decided that the most important thing is not that a number is accurate but that an experienced person can see HOW accurate it is.
I think that we should calibrate our mindsets to this "intermediately liberal" attitude, and I would prefer a rather liberal attitude to the pure convergence parameters as long as the most important ones are given. Those would be (in addition to which code was used...) a description of the basis set (e.g. something like "400 eV" for a plane-wave pseudopotential code, or perhaps "tier 1" if you use FHI-aims), the k-point set and the force/cell convergence criterion. We could then have a rough classification resulting in a little green, yellow or red label for well, intermediately and poorly converged calculations.
A major problem is that the choice of functional cannot be tagged as good or bad as easily. There are a number of pointers that all of us in the business know, and of course we can collect these into a sort of heuristic. The modern approach would of course be to cross-link structures in TCOD with the corresponding structures in COD and let some kind of neural network (or similar) generate a new heuristic free from our biases, although one that is, of course, biased by accounting only for the crystal structure.
Regards,
Torbjörn
P.S.
OK, so maybe someone has to go first and get complained at by everyone else for his sloppy standards... Here is a suggestion for marking up the two most important convergence parameters:
Max residual force on atoms: < 0.01 eV/Å = good, 0.01-0.1 eV/Å = intermediate, > 0.1 eV/Å = low
K-point resolution: < 0.2 Å⁻¹ = good, 0.2-0.75 Å⁻¹ = intermediate, > 0.75 Å⁻¹ = low
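The two thresholds above translate directly into the green/yellow/red label suggested earlier. A minimal sketch; the handling of values exactly at a boundary (counted as intermediate here) and the rule that the overall label is the worse of the two grades are assumptions, not part of the proposal.

```python
GRADES = ("good", "intermediate", "low")  # best to worst
COLOURS = {"good": "green", "intermediate": "yellow", "low": "red"}


def grade(value: float, good_below: float, low_above: float) -> str:
    """Smaller is better for both parameters; boundary values count as intermediate."""
    if value < good_below:
        return "good"
    if value > low_above:
        return "low"
    return "intermediate"


def force_grade(max_force_ev_per_ang: float) -> str:
    return grade(max_force_ev_per_ang, good_below=0.01, low_above=0.1)


def kpoint_grade(resolution_inv_ang: float) -> str:
    return grade(resolution_inv_ang, good_below=0.2, low_above=0.75)


def label(max_force_ev_per_ang: float, resolution_inv_ang: float) -> str:
    """Overall colour = the worse of the two grades (an assumed combination rule)."""
    worst = max(force_grade(max_force_ev_per_ang), kpoint_grade(resolution_inv_ang),
                key=GRADES.index)
    return COLOURS[worst]


print(label(0.005, 0.5))  # yellow: forces are good, k-point sampling intermediate
```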
There, I took the plunge, now to hide under my desk....
/T.B.
---
Torbjörn Björkman, PhD
COMP, Aalto University School of Science
Espoo, Finland
________________________________________
From: tcod-bounces(a)lists.crystallography.net [tcod-bounces(a)lists.crystallography.net] on behalf of Saulius Gražulis [grazulis(a)ibt.lt]
Sent: 10 February 2014 15:14
To: tcod(a)lists.crystallography.net; blueobelisk-discuss(a)lists.sourceforge.net; Nicola Marzari; Peter Murray-Rust; Antanas Vaitkus; Andrius; Chateigner Daniel
Subject: [TCOD] Presenting TCOD at the IUCr 23rd Congress?
Dear colleagues,
as I am going to the 23rd Congress of the IUCr this year, I thought that
this would be a good opportunity to present TCOD to the crystallographic
and chemical community there.
Unfortunately, the deadline for abstract submission is tomorrow,
2014-02-11, so I need short feedback from you ASAP... I apologize for
such short notice.
At this stage, the purpose of the TCOD poster/talk is not to present
spectacular results, but rather to inform the community and to make sure
that everyone is invited, so that nobody is excluded. Only then will
TCOD have its value.
I attach a draft abstract and an author list. If you do not mind, I will
present the abstract as the presenting author and thus put my name
first; otherwise the list is alphabetical. If the NWChem people, or
anyone else, would wish to participate and to provide their input about
computational data representation and ontologies, I'd be glad to include
them as co-authors. At the moment, I have included people on the tcod
mailing list (except that I do not know the full name of , and Peter and
Nicola with whom we discussed TCOD in detail; I hope you will
participate :).
Since we are a new team, I'll proceed as follows:
a) those who e-mail me by tomorrow (2014-02-11) to say that they will
participate in the presentation will stay on the author list;
b) those who do not agree or *do not reply* by 2014-02-11 I will leave
out, as not consenting to their authorship;
c) if no one from the theoretical community replies, I'll probably
refrain from submitting the abstract;
d) if you join the team, I'll submit the abstract and the author list
tomorrow, 2014-02-11.
Andrius, Antanas and I will do all the technical editing of the poster
or slides; any comments on the text from you are welcome (but please
keep in mind that the abstract is limited to 2000 characters). The most
important contribution from you would be ideas:
-- how to describe the computational data so that it is useful and
reusable (what parameters need to be specified for different methods)?
-- what quality criteria do we want to put on structures in TCOD (and in
general on published DFT and other structures)? In other words, what
computed structures will we be happy with?
After the abstract submission we will have some time to polish TCOD
policy details, data dictionaries (initial version can be found here:
http://www.crystallography.net/tcod/cif/dictionaries/) and software
pipeline (we will take care of this last bit in Vilnius, but
participants are welcome :).
Regards,
Saulius
--
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Graiciuno 8
LT-02241 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2602116 / phone (office): (+370-5)-2602556
mobile: (+370-684)-49802, (+370-614)-36366
Dear colleagues,
after an unfortunate period of silence, I have finally put together
two CIF dictionaries for TCOD, using the input provided by Linas and
Torbjörn. They can now be parsed by CIF parsers. You can find the
current versions here:
http://www.crystallography.net/tcod/cif/dictionaries/
files:
http://www.crystallography.net/tcod/cif/dictionaries/cif_dft.dic
http://www.crystallography.net/tcod/cif/dictionaries/cif_tcod.dic
cif_tcod.dic is intended as a general housekeeping dictionary that also
holds method-identification metadata, and cif_dft.dic is the first
method-specific dictionary (there might be more for other methods, if
needed).
All TCOD entries validate against these dictionaries, but for a trivial
reason: the data items described there are not yet used in real data
files. The dictionaries, however, give us the opportunity to start
thinking about these data items, putting together their descriptions and
setting up the requirements that structures we want to see in TCOD
should fulfill :)
Regards,
Saulius
PS. I am thinking that it would be good to announce TCOD at the upcoming
IUCr meeting. I will send you a draft abstract shortly.
S.G.
--
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Graiciuno 8
LT-02241 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2602116 / phone (office): (+370-5)-2602556
mobile: (+370-684)-49802, (+370-614)-36366