[TCOD] Questions still left regarding TCOD dictionaries

Stefaan Cottenier Stefaan.Cottenier at UGent.be
Wed Nov 5 11:33:52 UTC 2014


Hello Saulius,

> I have a (rather long) list of questions regarding the TCOD
> dictionaries; I've tried to compile it here. I'd be grateful if you
> comment on them as much as you have time.

Let me comment on those questions that I can answer on the spot, without 
looking into the dictionary yet:

> a) the units of energy that we started to use in the dictionaries are eV
> (electron-volts). For distances, we should probably use Angstroms, since
> then we can more easily compare computation results with
> crystallographic experimental data. This naturally suggests eV/A as the
> unit for forces. Is that OK, or should we rather use SI units of a
> similar scale (say aJ -- atto-Joules)? The only problem I see with eV is
> that it is measured, not defined from the basic SI units
> (http://en.wikipedia.org/wiki/Electronvolt,
> http://physics.nist.gov/cuu/Units/outside.html). We should probably stay
> away from Hartrees, Rydbergs and Bohrs for archiving purposes, shouldn't we?
>
> NB: for archival computer files, it is more convenient to have all data
> always in the same units, in all files, and not to allow different
> units -- at least with current standard software libraries.

Although SI units would in principle be the proper choice, nobody uses 
them in this context. eV and Angstrom at least have a special status 
('tolerated units' within the SI), so allowing eV, Angstrom and 
therefore eV/A for forces is a fair compromise.
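
For concreteness, a minimal conversion sketch (Python; the eV value 
below is the CODATA 2010 recommended one, which is exactly the 
"measured, not defined" caveat you raise):

EV_IN_JOULES = 1.602176565e-19   # J per eV (CODATA 2010, measured)
ANGSTROM_IN_METRES = 1.0e-10     # m per Angstrom (exact)

def ev_to_attojoules(energy_ev):
    """Convert an energy from eV to attojoules (1 aJ = 1e-18 J)."""
    return energy_ev * EV_IN_JOULES / 1.0e-18

def force_ev_per_angstrom_to_newtons(force_ev_a):
    """Convert a force from eV/A to newtons."""
    return force_ev_a * EV_IN_JOULES / ANGSTROM_IN_METRES

print(ev_to_attojoules(1.0))                   # ~0.1602 aJ per eV
print(force_ev_per_angstrom_to_newtons(1.0))   # ~1.602e-9 N per (eV/A)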

> b) Is nuclear electric dipole moment used/necessary for DFT computations
> (_dft_atom_type_nuclear_dipole)?
>
> c) If b) is "yes", what units should we use for the electric dipole (and
> higher moments) -- Debyes, e*A (number of unit charges times
> Angstroms), or something else?

There must be some kind of confusion here -- my old nuclear physics 
courses always emphasized that nuclei do not have an electric dipole 
moment. Do you perhaps mean the magnetic dipole moment or the nuclear 
quadrupole moment? Anyway, nuclear properties are never required for DFT 
calculations as such, but they can be used to convert DFT predictions 
into quantities that are experimentally accessible. I don't see the 
need, however, to keep track of this in a computational database.

> d) Is my definition of residual forces in cif_tcod.dic,
> "data_tcod_atom_site_residual_force" correct/acceptable? If not, how
> should we define them?

<skip>

> e) If I understand correctly, DFT and most other QM methods operate
> under the Born-Oppenheimer approximation; under this approximation,
> electron densities (electron wave-functions) are optimised to minimal
> energy at fixed nuclear and unit cell parameters, and when this
> converges, the nuclear positions and/or cell constants are changed
> slightly (e.g. along gradients), and the electron energy is minimised
> again. Is this a correct view? Is it a universal situation across QM
> codes?

Yes, correct. It is pretty universal. There are some special-purpose 
applications that do not make the Born-Oppenheimer approximation, but 
those are really a minority.
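
To make the two-level structure explicit, here is a toy sketch (Python; 
the model "electronic energy" and all numbers are made-up stand-ins for 
a real electronic-structure calculation, chosen only to show the 
nesting that question f) refers to):

def electronic_energy(x, tol=1e-10):
    """Microcycles: a dummy fixed-point iteration standing in for SCF."""
    e_old, e_new = 0.0, (x - 1.0) ** 2
    while abs(e_new - e_old) > tol:          # iterate to self-consistency
        e_old, e_new = e_new, 0.5 * (e_new + (x - 1.0) ** 2)
    return e_new

def relax(x, step=0.1, f_tol=1e-4, h=1e-4):
    """Macrocycles: shift the 'nucleus' x along the (numerical) gradient."""
    while True:
        force = -(electronic_energy(x + h)
                  - electronic_energy(x - h)) / (2 * h)
        if abs(force) < f_tol:
            return x, electronic_energy(x)   # relaxed geometry and energy
        x += step * force                    # steepest-descent step

print(relax(0.0))   # converges to x ~ 1 with E ~ 0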

> f) If e) is "yes", then we can talk about "microcycles" (electron w/f
> refinement) and "macrocycles" (nuclei/cell shifted in each
> "macrocycle"). We could also document the total energy changes in all
> these cycles, to monitor the convergence, as I suggest in the
> _tcod_computation_cycle_... data items. Is such a view acceptable? What
> is the terminology used in different codes? Would such a table be
> useful? Would it be easy to obtain from most codes?

I advise staying away from that. Your view is correct, but this 
information is often not mentioned even in papers. Documenting such 
changes would be arbitrary, to some extent: the final relaxed geometry 
is well-defined, but what do you take as the unrelaxed starting point...?

> g) I have attempted to put all DFT-related data items into the
> cif_dft.dic dictionary, and the general computational items (suitable
> also for MM and other methods) into cif_tcod.dic. Are all the
> parameters in cif_dft.dic indeed DFT-specific? Are they named and
> commented properly?

<skip>

> h) The CML CompChem dictionary mentions SCF as a method. I know HF and
> its modifications are SCF; is DFT technically also SCF? Are there other
> SCF methods besides HF? Should we include "SCF" among the enumeration
> values of _tcod_model as a separate model?

'SCF' refers only to the fact that a particular iterative solving scheme 
is used. As such, I would consider that term less informative than HF or 
DFT (one could even imagine doing DFT without SCF, although in practice 
this is very rarely done).

> i) "Model" is a very overloaded term. Maybe it would be better to rename
> _tcod_model to "_tcod_method", or "_tcod_theory_level"?

<skip>

> j) I have taken the _dft_basisset_type list from
> http://www.xml-cml.org/dictionary/compchem/#basisSet, to preserve at
> least the possibility of a CIF->CML->CIF round trip. The two big
> classes of basis functions, as I have learned, are localised (Slater,
> Gaussian) and plane wave. Should we introduce such a classification on
> top of the _dft_basisset_type enumerator?

I don't think so. It will be implicit in the name people use for the 
basis set.

> Are
> localised bases relevant for DFT at all?

Yes, sure (SIESTA, for instance).

> Is it enough for DFT to specify
> just an energy cut-off, assuming plane wave bases (for a given
> pseudopotential), or are there different possible bases also among plane
> waves (I guess there should not be, but maybe I'm missing something...)?

For plane waves, the energy cut-off is the only quantity. But there are 
many other types of basis sets that are not plane waves. For those, more 
specification might be needed (although it is often contained in the 
published definition of the basis set).
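
To illustrate why the cut-off is the only quantity: given the cell, the 
cut-off determines the plane-wave basis completely. A minimal sketch 
(Python; a simple cubic cell at the Gamma point, in Hartree atomic 
units, where a plane wave G carries kinetic energy |G|^2/2):

import math

def count_plane_waves(a_bohr, e_cut_hartree):
    """Count reciprocal-lattice vectors G with |G|^2/2 <= E_cut."""
    b = 2.0 * math.pi / a_bohr                 # reciprocal-lattice spacing
    g_max = math.sqrt(2.0 * e_cut_hartree)     # radius of the cut-off sphere
    n_max = int(g_max / b) + 1
    count = 0
    for i in range(-n_max, n_max + 1):
        for j in range(-n_max, n_max + 1):
            for k in range(-n_max, n_max + 1):
                g2 = (b * b) * (i * i + j * j + k * k)
                if 0.5 * g2 <= e_cut_hartree:
                    count += 1
    return count

print(count_plane_waves(10.0, 10.0))   # basis size grows as E_cut**1.5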

> Or are localised basis sets relevant for computing pseudopotentials
> (cores)? Can I assume that dftxfit, dftorb and dftcfit are plane wave
> bases? What about 'periodic'? (these terms are all from the Basis Set
> Exchange database, I guess).

<skip>

> k) What is the difference between _dft_atom_basisset and _dft_basisset?
> Can they be simultaneously used in one computation? If not, maybe we can
> merge the two definition sets into one?

<skip>

> l) There are a lot of *_conv data items declared (e.g.
> _dft_basisset_energy_conv). Are they for convergence tests? Or for
> convolutions? What is their proposed definition?

These are the convergence criteria. For instance: stop the iterative 
(SCF) cycle once the total energy changes by less than 
_dft_basisset_energy_conv over the last few iterations.
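
In sketch form (Python; the window of three iterations and the halving 
'iteration' are arbitrary stand-ins, not something any real code 
prescribes):

def is_converged(energies, e_conv, window=3):
    """True if the last `window` energy changes are all below e_conv."""
    if len(energies) < window + 1:
        return False
    changes = [abs(energies[i] - energies[i - 1])
               for i in range(len(energies) - window, len(energies))]
    return all(c < e_conv for c in changes)

history = []
energy = 1.0
while not is_converged(history, e_conv=1e-8):
    energy *= 0.5                  # stand-in for one SCF iteration
    history.append(energy)
print(len(history), history[-1])   # iterations needed, final energy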

> m) Is _dft_cell_energy the same as the "total energy" reported by some
> codes? Can we rename it to _dft_total_energy?

Probably yes.

> n) Am I correct to assume that the "total energy" reported by codes will
> always be the sum of separate energy terms (Coulomb, exchange,
> 1-electron, 2-electron, etc.)? Is there interest in having them
> recorded separately in the result data files (CIFs)? If yes, what is the
> "Hartree energy" (is it a sum of all single-electron energies in the
> SCF?), the "Ewald energy" (is it the electrostatic lattice energy,
> obtained by Ewald summation?), and the rest of the values in the AbInit
> output file? Are these terms consistent across QM codes?

Here, too, I think this is asking for way too much detail. Most codes 
can indeed split up the total energy into many contributions, but papers 
usually do not report that (only in the special cases where there is 
useful information in the splitting). If papers don't do it, databases 
shouldn't either -- that feels like a sound criterion.

> o) How does one check that a computation has converged with respect to
> k-points, E-cutoff, smearing and other parameters, and that the
> pseudopotential is selected correctly? From the Abinit tutorial
> (http://flex.phys.tohoku.ac.jp/texi/abinit/Tutorial/lesson_4.html) I got
> the impression that one needs to run the computation with different
> values of these parameters, and see that the total energy, or other
> gauge values, no longer change significantly when these parameters are
> increased. Is that right? If yes, are there codes that do this
> automatically? Should we require the dependence of Etotal (or of the
> coordinates) on the k-grid, E-cutoff and smearing, to check convergence
> when depositing to TCOD? Or should the TCOD side check this
> automatically when appropriate (say, for F/LOSS codes)?

"k-points, E-cuttof, smear and other parameters" are indeed tested as 
you describe. The pseudpotential can't be tested that way, what people 
usually do is to verify whether the numerically converged results when 
using a particular pseudo do agree with experiment.
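
In sketch form, such a test on a single parameter looks like this 
(Python; calculate() is a dummy model of a code's total energy, not any 
real program's API):

import math

def calculate(e_cut):
    """Dummy total energy that converges towards -10 as the cut-off grows."""
    return -10.0 + 5.0 * math.exp(-e_cut / 8.0)

def converge_cutoff(cutoffs, tol=1e-3):
    """Return the first cut-off whose energy agrees with the next one."""
    energies = [calculate(ec) for ec in cutoffs]
    for ec, e1, e2 in zip(cutoffs, energies, energies[1:]):
        if abs(e2 - e1) < tol:
            return ec, e1
    raise RuntimeError("not converged within the tested cut-offs")

print(converge_cutoff([10, 20, 30, 40, 50, 60, 80, 100]))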

Doing such tests is the responsibility of each user. In principle, 
journals should not publish ab initio results if such tests are missing. 
Journals are not that strict, unfortunately. And some researchers are 
not very careful in that respect.

It's a longstanding problem that is gradually being solved: computers 
are so fast now that the default settings of most codes are sufficiently 
accurate for many cases, even if a researcher does not explicitly test 
them.

Also here, TCOD shouldn't try to do better than the journals do.

> p) what are other obvious things that one could make wrong in QM/DFT
> computations, that could be checked formally?

That's an interesting one... with no answer from my side. If there is 
anything that can obviously go wrong, the codes will already have an 
internal test for it.

Stefaan


