Hello Saulius,
I have a (rather long) list of questions regarding the TCOD dictionaries; I've tried to compile it here. I'd be grateful if you could comment on them as far as your time allows.
Let me comment on those questions that I can answer on the spot, without looking into the dictionary yet:
a) The units of energy that we started to use in the dictionaries are eV (electron-volts). For distances, we should probably use Angstroms, since then we can more easily compare computation results with crystallographic experimental data. This naturally suggests eV/A as the unit for forces. Is that OK, or should we rather use SI units of a similar scale (say aJ, atto-Joules)? The only problem I see with the eV is that it is measured, not derived from the base SI units (http://en.wikipedia.org/wiki/Electronvolt, http://physics.nist.gov/cuu/Units/outside.html). We should probably stay away from Hartrees, Rydbergs and Bohrs for archiving purposes, shouldn't we?
NB: for archival computer files, it is more convenient to always have all data in the same units, in all files, and not to allow different units -- at least with the current standard software libraries.
Although SI units would in principle be the right choice, nobody uses them in this context. At least eV and Angstrom have a special status ('tolerated units' within SI), hence allowing eV, Angstrom and therefore eV/A for forces is a fair compromise.
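For concreteness, a minimal Python sketch of what these choices imply numerically (rounded CODATA factors; the function names are ours, purely illustrative):

    # Rounded CODATA conversion factors; illustrative only.
    EV_TO_JOULE      = 1.602177e-19   # 1 eV in J (measured, not defined)
    HARTREE_TO_EV    = 27.211386      # 1 Hartree in eV
    RYDBERG_TO_EV    = 13.605693      # 1 Rydberg in eV (= Hartree / 2)
    BOHR_TO_ANGSTROM = 0.529177       # 1 Bohr in Angstrom

    def ev_to_attojoule(energy_ev):
        """Energy: eV -> aJ (1 aJ = 1e-18 J)."""
        return energy_ev * EV_TO_JOULE / 1e-18

    def force_ev_per_a_to_newton(force_ev_per_a):
        """Force: eV/A -> N (1 A = 1e-10 m)."""
        return force_ev_per_a * EV_TO_JOULE / 1e-10

    print(ev_to_attojoule(1.0))            # ~0.160 aJ: eV and aJ are of comparable scale
    print(force_ev_per_a_to_newton(0.05))  # ~8.0e-11 N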
b) Is the nuclear electric dipole moment used/necessary for DFT computations (_dft_atom_type_nuclear_dipole)?
c) If b) is "yes", what units should we use for the electric dipole (and higher moments) -- Debyes, e*A (unit charges times Angstroms), or something else?
There must be some kind of confusion here -- my old nuclear physics courses always emphasized that nuclei do not have an electric dipole moment. Do you perhaps mean the magnetic dipole moment or the nuclear quadrupole moment? Anyway, nuclear properties are never required for DFT calculations as such, but they can be used to convert DFT predictions into quantities that are experimentally accessible. I don't see the need to keep track of this in a computational database, however.
d) Is my definition of residual forces in cif_tcod.dic, "data_tcod_atom_site_residual_force", correct/acceptable? If not, how should we define them?
<skip>
e) If I understand correctly, DFT and most other QM methods operate under the Born-Oppenheimer approximation; under this approximation, the electron densities (electron wave-functions) are optimised to minimal energy at fixed nuclear positions and unit cell parameters, and when this converges, the nuclear positions and/or cell constants are changed slightly (e.g. along the gradients), and the electronic energy is minimised again. Is this a correct view? Is it universal across QM codes?
Yes, correct. It is pretty universal. There are some special-purpose applications that do not make the Born-Oppenheimer approximation, but that's really a minority.
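To make that picture concrete, here is a toy, runnable Python sketch of the two nested loops; the "electronic" problem is replaced by a trivial one-dimensional model, so only the loop structure (not the physics) is meaningful:

    def electronic_energy(x, tol=1e-10, mix=0.5):
        """Stand-in for the SCF "microcycles": a mixed fixed-point
        iteration that converges to the model energy E(x) = (x - 1)**2."""
        e = 0.0
        while True:
            e_new = (1.0 - mix) * e + mix * (x - 1.0) ** 2
            if abs(e_new - e) < tol:      # SCF convergence criterion
                return e_new
            e = e_new

    def relax(x, force_tol=1e-6, step=0.1):
        """The "macrocycles": move the single "nucleus" x along the force."""
        while True:
            e = electronic_energy(x)      # microcycles at fixed x
            force = -2.0 * (x - 1.0)      # -dE/dx for the model
            if abs(force) < force_tol:    # relaxed geometry
                return x, e
            x += step * force             # steepest-descent step

    print(relax(x=0.0))   # converges to x = 1, E = 0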
f) If e) is "yes", then we can talk about "microcycles" (electron w/f refinement) and "macrocycles" (nuclei/cell shifted in each "macrocycle"). We could also document the total energy changes in all these cycles, to monitor the convergence, as I suggest in the _tcod_computation_cycle_... data items. Is such a view acceptable? What is the terminology used in different codes? Would such a table be useful? Would it be easy to obtain from most codes?
I advise staying away from that. Your view is correct, but this information is often not mentioned even in papers. Documenting such changes would be arbitrary, to some extent. The final relaxed geometry is well-defined, but what do you take as the unrelaxed starting point...?
g) I have attempted to put all DFT-related data items into the cif_dft.dic dictionary, and the general computational items (suitable also for MM and other methods) into cif_tcod.dic. Are all the parameters in cif_dft.dic indeed DFT-specific? Are they named and commented properly?
<skip>
h) The CML CompChem dictionary mentions SCF as a method. I know HF and its modifications are SCF; is DFT technically also SCF? Are there more SCF methods that are not HF? Should we include "SCF" among the enumeration values of _tcod_model as a separate model?
'SCF' refers only to the fact that a particular iterative solving scheme is used. As such, I would consider that term less informative than HF or DFT (one could even imagine doing DFT without SCF, although in practice this is very rarely done).
i) "Model" is a very overloaded term. Maybe it would be better to rename _tcod_model to "_tcod_method", or "_tcod_theory_level"?
<skip>
j) I have taken the _dft_basisset_type list from http://www.xml-cml.org/dictionary/compchem/#basisSet, to preserve at least the possibility of a CIF->CML->CIF round trip. The two big classes of basis functions, as I have learned, are localised (Slater, Gaussian) and plane-wave. Should we introduce such a classification on top of the _dft_basisset_type enumerator?
I don't think so. It will be implicit in the name people use for the basis set.
Are localised bases relevant for DFT at all?
Yes, sure (SIESTA, for instance).
Is it enough for DFT to specify just an energy cut-off, assuming a plane-wave basis (for a given pseudopotential), or are there different possible bases also among plane waves (I guess there should not be, but maybe I'm missing something...)?
For plane waves the energy cut-off is the only quantity. But there are many other types of basis sets that are not plane waves. For these, more specification might be needed (although often it is contained in the published definition of the basis set).
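To illustrate why the cut-off alone fixes a plane-wave basis, a toy Python sketch counting the plane waves admitted by a given cut-off for a simple cubic cell (Gamma point only, free-electron dispersion; the cell edge and cut-off values are arbitrary):

    import math

    def count_plane_waves(a_angstrom, ecut_ev):
        """Count plane waves exp(iG.r) with (hbar^2/2m)|G|^2 <= E_cut
        for a simple cubic cell of edge a, at the Gamma point."""
        HBAR2_OVER_2M = 3.80998                    # hbar^2/(2 m_e), eV*A^2
        gmax = math.sqrt(ecut_ev / HBAR2_OVER_2M)  # largest |G|, in 1/A
        dg = 2.0 * math.pi / a_angstrom            # reciprocal-lattice step
        n = int(gmax / dg) + 1
        count = 0
        for i in range(-n, n + 1):
            for j in range(-n, n + 1):
                for k in range(-n, n + 1):
                    if HBAR2_OVER_2M * dg * dg * (i*i + j*j + k*k) <= ecut_ev:
                        count += 1
        return count

    print(count_plane_waves(a_angstrom=5.0, ecut_ev=400.0))   # basis size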
Or are localised basis sets relevant for computing pseudopotentials (cores)? Can I assume that dftxfit, dftorb and dftcfit are plane-wave bases? What about 'periodic'? (These terms are all from the Basis Set Exchange database, I guess.)
<skip>
k) What is the difference between _dft_atom_basisset and _dft_basisset? Can they be simultaneously used in one computation? If not, maybe we can merge the two definition sets into one?
<skip>
l) There are a lot of *_conv data items declared (e.g. _dft_basisset_energy_conv). Are they for convergence tests? Or for convolutions? What is their proposed definition?
These are the convergence criteria, for instance: stop the iterative (SCF) cycle once the total energy changes by less than _dft_basisset_energy_conv over the last few iterations.
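A minimal Python sketch of that stopping rule (the function and its defaults are illustrative, not any particular code's):

    def scf_converged(energies, conv=1e-8, nsteps=3):
        """True once the total energy has changed by less than `conv`
        (e.g. the value of _dft_basisset_energy_conv) between each of
        the last `nsteps` consecutive SCF iterations."""
        if len(energies) < nsteps + 1:
            return False
        recent = energies[-(nsteps + 1):]
        return all(abs(b - a) < conv for a, b in zip(recent, recent[1:]))

    print(scf_converged([-10.2, -10.4, -10.4, -10.4, -10.4]))   # True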
m) Is _dft_cell_energy the same as the "total energy" reported by some codes? Can we rename it to _dft_total_energy?
Probably yes.
n) Am I correct to assume that the "total energy" reported by codes will always be the sum of separate energy terms (Coulomb, exchange, 1-electron, 2-electron, etc.)? Is there interest in having them recorded separately in the result data files (CIFs)? If yes, what is the "Hartree energy" (is it the sum of all single-electron energies in the SCF?), the "Ewald energy" (is it the electrostatic lattice energy, obtained by Ewald summation?) and the rest of the values in the AbInit output file? Are these terms consistent across QM codes?
Also here I think this is asking for way too much detail. Most codes can indeed split up the total energy into many contributions, but papers usually do not report that (only in special cases, when there is useful information in the splitting). If papers don't do it, databases shouldn't either -- that feels like a sound criterion.
o) How does one check that a computation has converged with respect to k-points, E-cutoff, smearing and other parameters, and that the pseudopotential is chosen correctly? From the Abinit tutorial (http://flex.phys.tohoku.ac.jp/texi/abinit/Tutorial/lesson_4.html) I got the impression that one needs to run the computation with different values of these parameters and see that the total energy, or other gauge values, no longer change significantly when these parameters are increased. Is that right? If yes, are there codes that do this automatically? Should we require the dependence of Etotal (or of the coordinates) on the k-grid, E-cutoff and smearing, to check convergence when depositing to TCOD? Or should the TCOD side check this automatically when appropriate (say, for F/LOSS codes)?
"k-points, E-cuttof, smear and other parameters" are indeed tested as you describe. The pseudpotential can't be tested that way, what people usually do is to verify whether the numerically converged results when using a particular pseudo do agree with experiment.
Doing such tests is the responsibility of each user. In principle, journals should not publish ab initio results if such tests are missing. Journals are not that strict, unfortunately. And some researchers are not very careful in that respect.
It's a longstanding problem that is gradually being solved, because computers are so fast now that the default settings of most codes are sufficiently accurate for many cases, even if a researcher does not explicitly test them.
Also here, TCOD shouldn't try to do better than the journals do.
p) What are other obvious things that one could get wrong in QM/DFT computations, and that could be checked formally?
That's an interesting one... With no answer from my side. If there is anything that can go obviously wrong, the codes will have an internal test for it already.
Stefaan