[TCOD] Questions still left regarding TCOD dictionaries

Björkman Torbjörn torbjorn.bjorkman at aalto.fi
Wed Nov 5 16:34:52 UTC 2014


Dear all,

Stefaan supplies good information on everything as usual, so just a couple of remarks from me here.

>>> f) If e) is "yes", then we can talk about "microcycles" (electron w/f
>>> refinement) and "macrocycles" (nuclei/cell shifted in each
>>> "macrocycle"). We could also document the total energy changes in all
>>> these cycles, to monitor the convergence, as I suggest in the
>>> _tcod_computation_cycle_... data items. Is such a view acceptable? What
>>> is the terminology used in different codes? Will such a table be useful?
>>> Will it be easy to obtain from most codes?
>>
>> I advise staying away from that. Your view is correct, but this
>> information is often not mentioned even in papers. Documenting such
>> changes would be arbitrary, to some extent. The final relaxed geometry
>> is well-defined, but what do you take as the unrelaxed starting point...?
>
> The idea is that the unrelaxed structure starts somewhere at high
> energies, and then it converges to a low energy that no longer changes
> significantly with refinement cycles. For me, this would be evidence
> that the process has converged. I guess when one does a calculation, one
> looks at such energy behavior traces, doesn't one?
>
> If so, then it makes sense to have tools to record the traces in CIFs,
> as evidence of convergence and for convergence checks.
>
> I also want to point out that the presence of the data items (for energy
> tables) in the dictionary does not imply any obligation to use them. Any
> CIF will be correct and valid without them. It's just that if we decide
> to include them, there will be a publicly announced way to do this.

Here I would say that what one normally pays attention to is just the residual forces/energy changes at convergence; documenting the starting point is not something that is normally done. We should also be a little wary here, since the starting point is very often data taken straight out of proprietary databases... it might well lead to a lot of trouble.


>>> Is it enough for DFT to specify
>>> just an energy cut-off, assuming plane wave bases (for a given
>>> pseudopotential), or are there different possible bases also among plane
>>> waves (I guess there should not be, but maybe I'm missing something...)?
>>
>> For plane waves the energy cut-off is the only quantity. But there are
>> many other types of basis sets that are not plane waves. For these, more
>> specification might be needed (although often they are contained in the
>> published definition of the basis set).
>
> I see. OK, so I understand that DFT can work with both PW and localized
> bases, and that the exact basis should be specified in the input file
> (and might be documented in the CIF).

There are even "hybrid" schemes, such as anything based on the muffin-tin geometry (LAPW, LMTO, ...), which combine a local part with a PW-like part, and where the basis functions may be enumerated by the PWs (as in LAPW) or by the local part (as in LMTO). There are wavelet-based codes, for which I have no idea how they converge. And so on. Unfortunately, we have a rather messier situation than the quantum chemists.



>>> n) Am I correct to assume that the "total energy" reported by codes will
>>> always be the sum of separate energy terms (Coulomb, exchange,
>>> 1-electron, 2-electron, etc.)? Is there interest in having them
>>> recorded separately in the result data files (CIFs)? If yes, what is the
>>> "Hartree energy" (is it a sum of all single-electron energies in the SCF
>>> for each of them?), the "Ewald energy" (is it the electrostatic lattice
>>> energy, obtained by Ewald summation?) and the rest of the values in
>>> the Abinit output file? Are these terms consistent across QM codes?
>>
>> Also here I think this is asking for way too much detail. Most codes can
>> indeed split up the total energy into many contributions, but papers
>> usually do not report that (only in the special cases where there is
>> useful information in the splitting). If papers don't do it, databases
>> shouldn't either -- that feels like a sound criterion.
>
> Interesting idea. Well, that makes our life easier.

It is in fact worse than that, because the precise nature of this split depends on technical details such as how you solve the Poisson equation and how you treat core states. This varies between the codes, so there is in general no possibility of making a sound comparison of these partial energies between different methods. So I don't think much is gained by documenting them.

...
> May I disagree to some extent. If we can do better checks than journals
> do, why shouldn't we? That would be a useful tool, a help for journal
> reviewers, and one possible way to improve the situation.
...
> In short, I think that it would be quite useful to have uniform *actual*
> convergence criteria in the CIF output, and check them before inserting
> computations into TCOD, like we check crystal structure symmetry, bond
> distances, parameter shifts or R-factors.

Here I tend to agree, but at the same time I would not want us to impose overly draconian criteria for acceptance, as long as the obtained result is documented. Perhaps some two-level system with "hard" acceptance criteria and a "warning" level would be good?
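
To make the two-level idea a bit more concrete, here is a minimal Python sketch of what such a deposition check could look like. The _dft_residual_force_max item and all thresholds are hypothetical illustrations, not anything from the current dictionaries:

    # Illustrative sketch only: the thresholds and the _dft_residual_force_max
    # data item are hypothetical, not part of the current TCOD dictionaries.
    def check_deposition(entry):
        """Return ("reject" | "warn" | "accept", list of messages)."""
        messages = []

        # "Hard" criterion: a declared SCF energy convergence criterion must
        # be present and tighter than 1e-3 eV (an arbitrary example value).
        conv = entry.get("_dft_basisset_energy_conv")
        if conv is None or float(conv) > 1e-3:
            return "reject", ["energy convergence criterion missing or too loose"]

        # "Warning" level: a large residual force (hypothetical item) does not
        # block acceptance, but is flagged for reviewers.
        fmax = entry.get("_dft_residual_force_max")
        if fmax is not None and float(fmax) > 0.05:
            messages.append("residual forces above the recommended level")
            return "warn", messages

        return "accept", messages

    # Toy usage:
    print(check_deposition({"_dft_basisset_energy_conv": "1e-5",
                            "_dft_residual_force_max": "0.08"}))
    # -> ('warn', ['residual forces above the recommended level'])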

Best,
Torbjörn


---
Torbjörn Björkman, PhD
COMP, Aalto University School of Science
Espoo, Finland


________________________________________
From: tcod-bounces at lists.crystallography.net [tcod-bounces at lists.crystallography.net] on behalf of Saulius Gražulis [grazulis at ibt.lt]
Sent: 5 November 2014 15:51
To: tcod at lists.crystallography.net
Subject: Re: [TCOD] Questions still left regarding TCOD dictionaries

Hi, Stefaan,

many thanks for your quick and very informative answer. I'll adjust the
dictionaries accordingly. Below are some of my comments.

On 2014-11-05 13:33, Stefaan Cottenier wrote:
> Let me comment on those questions that I can do on the spot, without
> looking into the dictionary yet :
>
>> a) the units of energy that we started to use in the dictionaries are eV
>> (electron-volts). For distances, we should probably use Angstroms,
>> since then we can more easily compare computation results with
>> crystallographic experimental data. This naturally suggests eV/A as the
>> unit of force. Is that OK, or should we better use SI units of a similar
>> scale (say aJ -- atto-joules)? The only problem I see with eV is that it
>> is measured, not defined from the basic SI units
>> (http://en.wikipedia.org/wiki/Electronvolt,
>> http://physics.nist.gov/cuu/Units/outside.html).

> Although SI units would in principle be the right choice, nobody uses
> them in this context. At least eV and Angstrom have a special status
> ('tolerated units' within SI), hence allowing eV, Angstrom and therefore
> eV/A for forces is a fair compromise.

Good. So we stay with eV and A, with eV/A for forces, as already
documented in our dictionaries.
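
Should a conversion back to SI ever be needed, it is fixed by the value of the elementary charge, so a trivial helper covers it; a sketch, using the CODATA value of e:

    # 1 eV and 1 Angstrom expressed in SI units (using the CODATA value of
    # the elementary charge, e = 1.602176634e-19 C).
    EV_IN_J = 1.602176634e-19     # energy: 1 eV in joules (about 0.16 aJ)
    ANGSTROM_IN_M = 1.0e-10       # length: 1 A in metres

    def ev_to_joules(energy_ev):
        return energy_ev * EV_IN_J

    def ev_per_angstrom_to_newtons(force_ev_per_a):
        # 1 eV/A = 1.602176634e-19 J / 1.0e-10 m, i.e. about 1.6e-9 N
        return force_ev_per_a * EV_IN_J / ANGSTROM_IN_M

    print(ev_to_joules(1.0))                # 1.602176634e-19
    print(ev_per_angstrom_to_newtons(1.0))  # ~1.602e-09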

>> b) Is the nuclear electric dipole moment used/necessary for DFT
>> computations (_dft_atom_type_nuclear_dipole)?
>>
>> c) If b) is "yes", what units should we use for the electric dipole (and
>> higher moments) -- Debyes, e*A (number of unit charges times
>> Angstroms), or something else?
>
> There must be some kind of confusion here -- my old nuclear physics
> courses always emphasized that nuclei do not have an electric dipole
> moment. Do you perhaps mean the magnetic dipole moment or the nuclear
> quadrupole moment? Anyway, nuclear properties are never required for DFT
> calculations as such, but they can be used to convert DFT predictions
> into quantities that are experimentally accessible. I don't see the need
> to keep track of this, however, in a computational database.

OK, this is a gap in my education -- I must have overlooked the zero
nuclear electric dipole during my university years...

Setting aside the theoretical question of whether the quarks can move in
such a way as to give a non-zero dipole, I will remove
_dft_atom_type_nuclear_dipole as lacking theoretical justification and
empirical evidence. For the magnetic dipole, a
_dft_atom_type_magn_nuclear_moment could be introduced if needed (for
orbital and spin magnetic moments the data names are already there).

>> e) If I understand correctly, DFT and most other QM methods operate
>> under the Born-Oppenheimer approximation; under this approximation,
>> electron densities (electron wave-functions) are optimised to minimal
>> energy at fixed nuclei and unit cell parameters, and when this converges,
>> the nuclear parameters and/or cell constants are changed slightly (e.g.
>> along gradients), and the electron energy is minimised again. Is this a
>> correct view? Is it a universal situation across QM codes?
>
> Yes, correct. It is pretty universal. There are some special-purpose
> applications that do not make the Born-Oppenheimer approximation, but
> they are really a minority.

OK. Thanks for confirmation.

>> f) If e) is "yes", then we can talk about "microcycles" (electron w/f
>> refinement) and "macrocycles" (nuclei/cell shifted in each
>> "macrocycle"). We could also document the total energy changes in all
>> these cycles, to monitor the convergence, as I suggest in the
>> _tcod_computation_cycle_... data items. Is such a view acceptable? What
>> is the terminology used in different codes? Will such a table be useful?
>> Will it be easy to obtain from most codes?
>
> I advise staying away from that. Your view is correct, but this
> information is often not mentioned even in papers. Documenting such
> changes would be arbitrary, to some extent. The final relaxed geometry
> is well-defined, but what do you take as the unrelaxed starting point...?

The idea is that the unrelaxed structure starts somewhere at high
energies, and then it converges to a low energy that no longer changes
significantly with refinement cycles. For me, this would be evidence
that the process has converged. I guess when one does a calculation, one
looks at such energy behavior traces, doesn't one?

If so, then it makes sense to have tools to record the traces in CIFs,
as evidence of convergence and for convergence checks.

I also want to point out that the presence of the data items (for energy
tables) in the dictionary does not imply any obligation to use them. Any
CIF will be correct and valid without them. It's just that if we decide
to include them, there will be a publicly announced way to do this.
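
Just to illustrate (the _tcod_computation_cycle_* data names below are invented for the example, since the final names are still open), such a per-cycle energy trace and a trivial convergence check could look like this in Python:

    # Hypothetical example of a per-cycle total energy trace; the exact
    # _tcod_computation_cycle_* data names are not yet fixed in the dictionary.
    TRACE_CIF = """
    loop_
    _tcod_computation_cycle_id
    _tcod_computation_cycle_total_energy
     1  -245.1032
     2  -245.8761
     3  -245.9012
     4  -245.9015
     5  -245.9015
    """

    def energies_from_trace(cif_text):
        """Very naive parser for the two-column loop above (illustration only)."""
        rows = [line.split() for line in cif_text.splitlines()
                if line.strip() and line.strip()[0].isdigit()]
        return [float(energy) for _, energy in rows]

    def has_converged(energies, tol=1e-3):
        """Convergence evidence: the last two cycles differ by less than tol (eV)."""
        return len(energies) >= 2 and abs(energies[-1] - energies[-2]) < tol

    print(has_converged(energies_from_trace(TRACE_CIF)))   # True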

>> h) The CML CompChem dictionary mentions SCF as a method. I know HF and
>> its modifications are SCF; is DFT technically also SCF? Are there more
>> SCF methods that are not HF? Should we include "SCF" in the
>> enumeration values of _tcod_model as a separate model?
>
> 'SCF' refers only to the fact that a particular iterative solving scheme
> is used. As such, I would consider that term less informative
> than HF or DFT (one could even imagine doing DFT without SCF, although
> in practice this is very rarely done).

OK; so we skip SCF as a separate "model".

>> j) I have taken the _dft_basisset_type list from
>> http://www.xml-cml.org/dictionary/compchem/#basisSet, to strive, at least
>> potentially, for the possibility of a CIF->CML->CIF round trip. The two
>> big classes of basis functions, as I have learned, are localised
>> (Slater, Gaussian) and plane wave. Should we introduce such a
>> classification on top of the _dft_basisset_type enumerator?
>
> I don't think so. It will be implicit in the name people use for the
> basis set.

OK. The type of the basis set can then be inferred from the basis set data
(name, files, reference).

>> Are
>> localised bases relevant for DFT at all?
>
> Yes, sure (SIESTA, for instance).

Good to know. Thanks for the info!

>> Is it enough for DFT to specify
>> just an energy cut-off, assuming plane wave bases (for a given
>> pseudopotential), or are there different possible bases also among plane
>> waves (I guess there should not be, but maybe I'm missing something...)?
>
> For plane waves the energy cut-off is the only quantity. But there are
> many other types of basis sets that are not plane waves. For these, more
> specification might be needed (although often they are contained in the
> published definition of the basis set).

I see. OK, so I understand that DFT can work with both PW and localized
bases, and that the exact basis should be specified in the input file
(and might be documented in the CIF).

>> l) There are a lot of *_conv data items declared (e.g.
>> _dft_basisset_energy_conv). Are they for convergence tests? Or for
>> convolutions? What is their proposed definition?
>
> These are the convergence criteria, for instance: stop the iterative
> (SCF) cycle once the total energy changes by less than
> _dft_basisset_energy_conv during the last few iterations.

Perfect. Now, are these the *desired* criteria, or the *obtained* values
(i.e. the actual values from the computation)? Although we can probably
assume that in any case the energy change at the end of the computation
was less than the specified _dft_basisset_energy_conv value, and the same
for the other *_conv values, right?

I'll add units and explanations to the dictionary.

>> m) Is _dft_cell_energy the same as the "total energy" reported by some
>> codes? Can we rename it to _dft_total_energy?
>
> Probably yes.

Hmmm... We need some explanation in the dictionary of how these values are
supposed to be used.

>> n) Am I correct to assume that the "total energy" reported by codes will
>> always be the sum of separate energy terms (Coulomb, exchange,
>> 1-electron, 2-electron, etc.)? Is there interest in having them
>> recorded separately in the result data files (CIFs)? If yes, what is the
>> "Hartree energy" (is it a sum of all single-electron energies in the SCF
>> for each of them?), the "Ewald energy" (is it the electrostatic lattice
>> energy, obtained by Ewald summation?) and the rest of the values in
>> the Abinit output file? Are these terms consistent across QM codes?
>
> Also here I think this is asking for way too much detail. Most codes can
> indeed split up the total energy into many contributions, but papers
> usually do not report that (only in the special cases where there is
> useful information in the splitting). If papers don't do it, databases
> shouldn't either -- that feels like a sound criterion.

Interesting idea. Well, that makes our life easier.

On the other hand, electronic media, like databases, can record and make
usable more information than a traditional paper or PDF publication. We
should not overlook such possibilities, and should use them when needed.

For example, in protein crystallography, structure factors were not
reported in publications at the very beginning, due to the sheer volume of
data (a protein crystal can give you a million unique reflections);
but today it is a must, and a self-evident thing, to deposit such data
electronically into the PDB.

>> o) How does one check that a computation has converged with respect to
>> k-points, E-cutoff, smearing and other parameters, and that the
>> pseudopotential is selected correctly? From the Abinit tutorial
>> (http://flex.phys.tohoku.ac.jp/texi/abinit/Tutorial/lesson_4.html) I got
>> the impression that one needs to run the computation with different
>> values of these parameters, and see that the total energy, or other gauge
>> values, no longer change significantly when these parameters are
>> increased. Is that right? If yes, are there codes that do this
>> automatically? Should we require Etotal's (or the coordinates')
>> dependence on the k-grid, E-cutoff and smearing, to check convergence
>> when depositing to TCOD? Or should the TCOD side check this automatically
>> when appropriate (say for F/LOSS codes)?
>
> "k-points, E-cuttof, smear and other parameters" are indeed tested as
> you describe.

OK. Thanks for the clarification.

I think it would be beneficial to have such checks included in the
results file...
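
For instance (a sketch only; run_total_energy is a placeholder for whatever wrapper actually calls Abinit or another code), such a cut-off scan is just a few lines of Python:

    def converge_cutoff(run_total_energy, cutoffs_ev, tol_ev=1e-3):
        """Increase the plane-wave cut-off until the total energy stops changing.

        run_total_energy -- placeholder callable wrapping the actual DFT code;
        cutoffs_ev       -- increasing list of cut-off energies to try (eV);
        tol_ev           -- energy change considered 'no longer significant'.
        Returns (converged_cutoff, energies), or (None, energies) if the scan
        did not converge.
        """
        energies = []
        for cutoff in cutoffs_ev:
            energies.append(run_total_energy(cutoff))
            if len(energies) >= 2 and abs(energies[-1] - energies[-2]) < tol_ev:
                return cutoff, energies
        return None, energies

    # Toy usage with a fake energy curve that saturates at high cut-off.
    fake = lambda ec: -245.90 - 100.0 / ec**2
    print(converge_cutoff(fake, [200, 300, 400, 500]))   # converges at 400 eV

The same scan logic would apply to the k-point grid, the smearing width, and so on.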

> ... The pseudopotential can't be tested that way; what people
> usually do is verify whether the numerically converged results obtained
> with a particular pseudo agree with experiment.

I see. OK, we'll have to trust that the PP was selected properly.

Actually, TCOD + COD can check the computations against empirical
(crystallographic) data -- say interatomic distances, bonds, angles,
coordination sphere geometry, etc. The results might be interesting --
significant discrepancies will either predict unseen new phenomena or
point to problems that can be fixed readily.
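
A first version of such a check could be as simple as the sketch below, which assumes both structures have already been reduced to lists of labelled bond lengths; the 3% threshold is purely illustrative, not a proposed TCOD policy:

    def flag_bond_discrepancies(computed, experimental, rel_tol=0.03):
        """Flag bonds whose computed length deviates from the experimental one
        by more than rel_tol (3% here, an illustrative threshold only).

        computed / experimental: dicts mapping a bond label to a length in A.
        """
        flagged = []
        for bond, d_exp in experimental.items():
            d_calc = computed.get(bond)
            if d_calc is not None and abs(d_calc - d_exp) / d_exp > rel_tol:
                flagged.append((bond, d_calc, d_exp))
        return flagged

    # Toy example: the C-C bond agrees, the O-H bond is flagged.
    print(flag_bond_discrepancies({"C1-C2": 1.54, "O1-H1": 1.10},
                                  {"C1-C2": 1.53, "O1-H1": 0.97}))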

> Doing such tests is the responsibility of each user. In principle,
> journals should not publish ab initio results if such tests are missing.
> Journals are not that strict, unfortunately. And some researchers are
> not very careful in that respect.
>
> It's a longstanding problem that is gradually being solved, because
> computers are so fast now that the default settings of most codes are
> sufficiently accurate for many cases, even if a researcher does not
> explicitly test it.
>
> Also here, TCOD shouldn't try to do better than the journals do.

May I disagree to some extent. If we can do better checks than journals
do, why shouldn't we? That would be a useful tool, a help for journal
reviewers, and one possible way to improve the situation.

In crystallography, the situation is sometimes similar. IUCr journals
are very good at checking structures, but some chemical journals, even
the "high profile" ones, give you CIFs that may even have syntax errors
in them! For me, this hints that nobody bothered to look at the data
before publication. But how can one then claim that the paper was "peer
reviewed"? The text apparently was, but the data probably were not. This
is not a good way of working in today's data-driven sciences.

COD does checks, and we plan more -- and it helps, e.g. when personal
communications or prepublication structures are deposited. My personal
experience with this was very positive -- the first personal communication
I tried to send to COD was marked as not properly converged; indeed this
was an oversight, and a couple more refinement cycles fixed the problem.

I find such checks to be a very useful tool, so why not have something
similar for TCOD? Especially when we expect a large number of
structures from wide-scale computational projects, where not every
computation is checked manually.

>> p) what are other obvious things that one could get wrong in QM/DFT
>> computations that could be checked formally?
>
> That's an interesting one... With no answer from my side. If there is
> anything that can go obviously wrong, the codes will have an internal
> test for it already.

Well, an obviously wrong thing would be insufficient checks for
convergence (too coarse a k-grid, too few minimization steps, etc.).

Usually, experienced computational chemists will check for these, but
every so often an MSc student who is just starting to learn things will
compute a structure while the experienced boss happens to be away at a
conference... The structure may look reasonable, especially to an
inexperienced eye, but in fact it may be inaccurate... You know what I
mean :)

In short, I think that it would be quite useful to have uniform *actual*
convergence criteria in the CIF output, and check them before inserting
computations into TCOD, like we check crystal structure symmetry, bond
distances, parameter shifts or R-factors.

The question is, what should be the universal criteria for QChem?

Regards,
Saulius

--
Saulius Gražulis

Biotechnologijos institutas
Graičiūno 8
02241 Vilnius-28
Lithuania

Tel.: BTI internal:  226
      BTI Vilnius:   (8-5)-260-25-56
      mobile TELE2:  (8-684)-49-802
      mobile OMNIT:  (8-614)-36-366
