Hello everybody,
You did a great job, Saulius, digesting this diverse bunch of opinions into a sound compromise! Some comments (I collect several of the previous mails hereafter):
I am toying with the idea that TCOD should have 3 levels of structure description:
Yes, that makes sense. Your rationale for the three levels is really good (one comment about level 1, though - see hereafter).
-- level 0:
   - cell constants;
   - atomic coordinates;
   - literature reference.
Standard CIF dictionaries are enough for such description.
What I feel is missing from this minimalistic set is the XC functional. Without that information, the relevance of predicted unit cell information is limited. (It can be retrieved from the literature reference, sure. And if people complete level 1, then the info is available anyway. But as this is really necessary information, it would be good to force users to include it by asking for it at the mandatory level 0.)
The question then immediately arises how to identify the XC functional unambiguously (a question that would arise equally well if this were kept in level 1). One possibility would be to use the identifiers used in LIBXC (http://www.tddft.org/programs/octopus/wiki/index.php/Libxc). This is widely accepted, code-independent and open source.
Specifying the XC functional limits us to DFT only. Perhaps the keyword should rather be 'level-of-theory'. If a LIBXC identifier is given, then this implies it is DFT with the quoted functional. If the value is not a LIBXC identifier, it refers to a non-DFT method (Hartree-Fock, GW, QMC, ...). It might be tricky to create a relevant list of non-DFT methods, and right now it is perhaps of limited use, but the number of predictions by these methods is likely going to surge in the coming years.
[added later: OK, the _tcod_model variable more or less does this, as I see now. Hence, my comment boils down to replacing the value 'DFT' by the LIBXC identifier for the relevant functional]
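To make this concrete: a minimal Python sketch of how such a rule could work, assuming a hypothetical 'level-of-theory' keyword and a deliberately tiny subset of LIBXC names (nothing here is an agreed TCOD convention):

    # Hypothetical rule for a 'level-of-theory' keyword: if the value is a
    # (combination of) LIBXC identifier(s), the entry is DFT with that
    # functional; otherwise it names a non-DFT method.

    # Deliberately tiny subset of LIBXC functional names, for illustration
    # only -- the authoritative list lives in LIBXC itself.
    LIBXC_NAMES = {"LDA_X", "LDA_C_PW", "GGA_X_PBE", "GGA_C_PBE"}

    NON_DFT_METHODS = {"HF", "GW", "QMC"}  # illustrative, surely incomplete

    def interpret_level_of_theory(value: str) -> str:
        parts = [p.strip().upper() for p in value.split("+")]
        if all(p in LIBXC_NAMES for p in parts):
            return "DFT, XC functional: " + "+".join(parts)
        if value.strip().upper() in NON_DFT_METHODS:
            return "non-DFT method: " + value.strip().upper()
        return "unable to classify"

    print(interpret_level_of_theory("GGA_X_PBE+GGA_C_PBE"))  # DFT
    print(interpret_level_of_theory("GW"))                   # non-DFT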
-- level 1: same as in level 0, plus any parameters that permit a qualified person to judge if the structure has converged and how good it is. The parameters should include the residual force on atoms (2014-02-10 16:16, Björkman Torbjörn), the energy change(s) in the last cycle(s), and references to basis sets, pseudo-potentials, XC functionals, etc. (as described in our dictionaries and in http://www.xml-cml.org/dictionary/compchem/). Basis sets can be referenced as in https://bse.pnl.gov/bse/portal;
I might be ruining the beautiful simplicity of the scheme now, but it seems to me that the present level 1 covers two different things. What about splitting this into two levels:
* One that lists the main technical settings (basis set, pseudopotentials, k-mesh, XC [although the latter might go to level 0]), such that a qualified person can assess the credibility of the calculation.
* One that lists information to assess the level of convergence.
The former is information at the input stage, the latter at the output stage. Two different aspects. The advantage of splitting is that probably fewer people will take the effort to collect the convergence info; if both aspects sit in one level, such people might discard that level entirely, whereas after the split they can at least still provide the settings.
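A minimal sketch of what the two levels could contain, with entirely made-up field names (these are not existing CIF/TCOD tags):

    # Sketch of the proposed split of the current level 1; all field names
    # and values are invented for illustration.

    # (a) Technical settings -- fixed at the INPUT stage:
    level_1a_settings = {
        "basis_set": "plane waves, 500 eV cutoff",
        "pseudopotentials": "PAW dataset (hypothetical)",
        "k_mesh": "8x8x8 Monkhorst-Pack",
        "xc_functional": "GGA_X_PBE+GGA_C_PBE",  # or already at level 0
    }

    # (b) Convergence indicators -- known only at the OUTPUT stage:
    level_1b_convergence = {
        "max_residual_force_eV_per_A": 0.004,
        "energy_change_last_cycle_eV": 1e-6,
    }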
One more related question:
If I understand correctly, from what I recall from the QM lectures, we can in principle have two kinds of boundary conditions for localised particle wave functions:
a) vanishing at infinity (modelling a single molecule in vacuum), and b) periodic (modelling an ideal crystal).
From what I have read in the manuals of the QM codes, most implement b), and a) is approximated by putting a molecule in a large enough "unit cell" so that interactions between molecule images are negligible.
Is this a correct view? Does any code implement (a) as a separate mode of computation?
In either case, we should probably have a special tag that distinguishes "true" crystal structures from the "convenience" unit cells that are non-physical but are set up solely to solve a molecule structure problem with the same code that also deals with crystals. Any ideas how to tell from the computations which mode was used?
Yes, this looks to be correct. There are some codes that have a 3D (crystals), 2D (surfaces) and 1D (molecules) implementation, without mimicking the missing dimension by a finite vacuum (FLEUR does this, for instance).
However, as the goal of TCOD is to document truly infinite crystals, there will be no calculations for molecules in a big, almost empty unit cell in the database anyway. What could occur are non-periodic cluster models that mimic infinite crystals -- rather the opposite situation. Having a keyword that differentiates between 'periodic' and 'non-periodic' calculations would be sufficient to filter these.
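A trivial sketch of such a filter, assuming a hypothetical periodicity keyword (name and values invented):

    # Filtering database entries on a hypothetical periodicity keyword
    # that separates periodic calculations from non-periodic cluster models.
    entries = [
        {"id": "entry-1", "periodicity": "periodic"},      # true crystal
        {"id": "entry-2", "periodicity": "non-periodic"},  # cluster model
    ]

    periodic_entries = [e for e in entries if e["periodicity"] == "periodic"]
    print(periodic_entries)  # keeps only entry-1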
I still hold to the statement that it is almost impossible to reproduce exact results using two different codes, even when the settings (XC functional, basis, pseudopotential, k-point sampling, etc.) are the same. The numerical implementations can differ significantly, and there is always a bunch of hard-coded parameters that differ between implementations.
Interesting statement, as we are currently working on a quantitative proof of exactly that (Torbjörn is involved in this as well). You can watch a 15-minute talk on this topic at https://www.youtube.com/pasSSaMMnnE and inspect a snapshot of the current results at https://molmod.ugent.be/deltacodesdft. [so far the advertisement ;-) ]
I don't think it is so important to have the energy convergence reported. It's a crystal structure database, so it is probably enough to report 8 things from the last step:
- Max Force on Atoms
- RMS Force on Atoms
- Max Displacement on Atoms
- RMS Displacement on Atoms
If the cell optimization is performed:
- Max Force on Cell
- RMS Force on Cell
- Max Displacement on Cell
- RMS Displacement on Cell
This illustrates why I hesitate to keep this information within level 1: not all codes can provide it. Forces are probably available everywhere, but cell optimization by 'forces' requires the stress-tensor formalism. That's peanuts for plane-wave codes, but it has not been developed for all of the more involved basis sets. There is not a single LAPW code, for instance, that can optimize a unit cell in that way (they have to resort to energy minimization, which is fair as long as the symmetry of the cell is sufficiently high).
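For concreteness, a minimal Python sketch of how these max/RMS indicators could be computed from raw arrays of the last step (array values and units are purely illustrative):

    # Computing max/RMS indicators from raw arrays of the last
    # optimization step (illustrative values; eV/A for forces, A for
    # displacements).
    import numpy as np

    def max_and_rms(vectors):
        """Max and RMS of the per-row vector norms."""
        norms = np.linalg.norm(np.asarray(vectors), axis=1)
        return norms.max(), float(np.sqrt(np.mean(norms**2)))

    forces_on_atoms = [[0.001, 0.0, 0.002], [0.0, 0.003, 0.0]]
    displacements_of_atoms = [[0.01, 0.0, 0.0], [0.0, 0.02, 0.0]]

    print("Max/RMS force on atoms:       ", max_and_rms(forces_on_atoms))
    print("Max/RMS displacement on atoms:", max_and_rms(displacements_of_atoms))
    # If the cell is optimized too, the same reduction would be applied to
    # the stress/strain-derived "forces" and "displacements" on the cell.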
As I said, this could be extremely tricky. I don't think it makes much sense to store the output logs, since they can get huge. What we could do is define a couple (3-5) of 'reference codes' that would be supported for checking/benchmarking purposes. VASP, QuantumEspresso, GPAW? We could store the inputs for these, and structures submitted from other codes should stick to those benchmarks. As far as I know, some of the databases use this in their philosophy (e.g. they would only accept VASP results calculated using some minimum criteria). In this way one can be consistent.
I agree that level 2 is the tricky one. One has to be careful here to create a pragmatic solution without investing too much time. What I see as a fair target is that those (few?) people who are willing to provide information for a full recreation of their results should be able to do that. Full stop. The amount of work it takes on their side is not our worry (for some codes this will be far easier to do than for others). Perhaps the most pragmatic thing to do is to provide the possibility to insert a verbatim section with the required input files, code version and run commands needed to reproduce the results, without any check on whether or not this information is actually complete.
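A minimal sketch of what such a verbatim record could contain (all field names hypothetical; the content would be stored as-is, unchecked):

    # Sketch of a verbatim 'reproduction' record; field names are invented
    # and the content is stored exactly as submitted.
    reproduction_record = {
        "code": "some-dft-code",          # hypothetical code name
        "code_version": "x.y.z",
        "run_command": "mpirun -np 16 some-dft-code < input.in",
        "input_files": {"input.in": "verbatim file contents go here"},
    }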
Working with reference codes, hmm... Tricky as well. This implicitly gives the message that TCOD favours/promotes some codes more than others. Moreover, minimal requirements may evolve over time. What about this alternative: provide a separate algorithm that can, for a given code, assess whether or not a given entry is sufficiently reliable. If crucial information for this assessment is missing, the algorithm answers 'unable to decide'. The advantage is that such algorithms are not baked into the TCOD entries, but stand alone. They can evolve over time (thresholds can be increased, and more refined criteria to assess quality can be included), different people can each contribute a different algorithm (an expert on one code writes the algorithm for that specific code), it carries no implicit quality ranking (if someone is unhappy that the algorithm for his/her pet code is missing, they can provide one themselves), etc.
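A minimal sketch of what such a stand-alone assessor could look like (thresholds and field names invented for illustration):

    # Stand-alone, per-code quality assessor: it is not baked into the
    # TCOD entries, so thresholds and criteria can evolve independently.

    def assess_entry_for_code_x(entry: dict) -> str:
        """Return 'reliable', 'not reliable' or 'unable to decide'."""
        cutoff = entry.get("plane_wave_cutoff_eV")
        kpts = entry.get("k_points_per_axis")
        if cutoff is None or kpts is None:
            return "unable to decide"        # crucial information missing
        if cutoff >= 400 and kpts >= 6:      # thresholds may be raised later
            return "reliable"
        return "not reliable"

    print(assess_entry_for_code_x({"plane_wave_cutoff_eV": 500,
                                   "k_points_per_axis": 8}))     # reliable
    print(assess_entry_for_code_x({"plane_wave_cutoff_eV": 200}))  # unable to decide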
In either case, we should probably have a special tag that distinguishes "true" crystal structures from the "convenience" unit cells that are non-physical but are set up solely to solve a molecule structure problem with the same code that also deals with crystals. Any ideas how to tell from the computations which mode was used?
We should only be interested in periodic structures. Or I would be even more strict: 3D-periodic (crystals). We should not worry about molecules or clusters... or leave them to others :-) We just should not put any limitation on the supercell size of the structure, but in general it should be strictly 3D-periodic for this database. That would make life much easier and give the database a clear scope for what we are trying to systematize.
I agree with Linas. The 'C' in TCOD excludes the need to deal with molecules in vacuo. That's an entirely different world (with even many more degrees of freedom than our 3D crystal world), and there are other databases that deal with these objects.
As a person affiliated with molecular biology and drug design, I would of course also be interested in comparing crystal structures with gas-phase (or I would rather say in vacuo) QM-optimised structures; but these, as Peter has told me, are much more diverse and thus more difficult to manage in a coherent database. So we can postpone their addition; in any case, we'll have to flag them carefully and probably keep them separately (different ID range or namespace?).
Saulius, you seem to be interested in formation energies of molecular crystals (the energy difference between the actual crystal and the individual molecules)? Although from an experimental point of view this information is considered a property of the crystal (and could therefore have its place in a crystal database), it is actually a difference between a solid-state and a molecular property. As most computational methods are tailored towards either molecules or solids (and have severe limitations on the other class), such formation energies are bound to have problems. I agree this would be useful information, but it is hard to produce really accurate numbers, and I doubt whether many people will have data worth adding to a database.
Best, Stefaan