Hi, folks,
I have a (rather long) list of questions regarding the TCOD dictionaries; I've tried to compile it here. I'd be grateful if you could comment on them, as far as your time allows.
Questions:
a) the units of energy that we started to use in the dictionaries are eV (electron-volts). For distances, we should probably use Angstroms, since then we can more easily compare computation results with crystallographic experimental data. This naturally suggests eV/A as the units for forces. Is that OK, or would it be better to use SI units of a similar scale (say aJ, atto-joules)? The only problem I see with eV is that it is measured, not defined in terms of the basic SI units (http://en.wikipedia.org/wiki/Electronvolt, http://physics.nist.gov/cuu/Units/outside.html). We should probably stay away from Hartrees, Rydbergs and Bohrs for archiving purposes, shouldn't we?
NB: for archival computer files, it is more convenient to have all data always in the same units, in all files, and not to allow different units -- at least with current standard software libraries.
b) Is nuclear electric dipole moment used/necessary for DFT computations (_dft_atom_type_nuclear_dipole)?
c) If b) is "yes", what units should we use for the electric dipole (and higher moments) -- Debyes, e*A (number of unit charges times Angstroms), or something else?
d) Is my definition of residual forces in cif_tcod.dic, "data_tcod_atom_site_residual_force" correct/acceptable? If not, how should we define them?
e) If I understand correctly, DFT and most other QM methods operate under Born-Oppenheimer approximation; under this approximation, electron densities (electron wave-functions) are optimised to minimal energy at fixed nuclei and unit cell parameters, and when this converges, the nuclei parameters and/or cell constants are changed slightly (e.g. along gradients), and the electron energy is minimised again. Is this a correct view? Is it a universal situation across QM codes?
f) If e) is "yes", then we can talk about "microcycles" (electron w/f refinement) and "macrocycles" (nuclei/cell shifted in each "macrocycle"). We could also document total energy changes in all these cycles, to monitor the convergence, as I suggest in the _tcod_computation_cycle_... data items. Is such a view acceptable? What is the terminology used in different codes? Will such a table be useful? Will it be easy to obtain from most codes?
g) I have attempted to put all DFT-related data items into the cif_dft.dic dictionary, and the general computational items (suitable also for MM and other methods) into cif_tcod.dic. Are all the parameters in cif_dft.dic indeed DFT-specific? Are they named and commented properly?
h) The CML CompChem dictionary mentions SCF as a method. I know HF and its modifications are SCF; is DFT technically also SCF? Are there more SCF methods that are not HF? Should we include "SCF" in the enumeration values of _tcod_model as a separate model?
i) "Model" is a very overloaded term. Maybe it would be better to rename _tcod_model to "_tcod_method", or "_tcod_theory_level"?
j) I have taken the _dft_basisset_type list from http://www.xml-cml.org/dictionary/compchem/#basisSet, to preserve at least the possibility of a CIF->CML->CIF roundtrip. The two big classes of basis functions, as I have learned, are localised (Slater, Gaussian) and plane wave. Should we introduce such a classification on top of the _dft_basisset_type enumerator? Are localised bases relevant for DFT at all? Is it enough for DFT to specify just an energy cut-off, assuming plane wave bases (for a given pseudopotential), or are there different possible bases also among plane waves (I guess there should not be, but maybe I'm missing something...)? Or are localised basis sets relevant for computing pseudopotentials (cores)? Can I assume that dftxfit, dftorb and dftcfit are plane wave bases? What about 'periodic'? (These terms are all from the Basis Set Exchange database, I guess.)
k) What is the difference between _dft_atom_basisset and _dft_basisset? Can they be simultaneously used in one computation? If not, maybe we can merge the two definition sets into one?
l) There are a lot of *_conv data items declared (e.g. _dft_basisset_energy_conv). Are they for convergence tests? Or for convolutions? What is their proposed definition?
m) Is _dft_cell_energy the same as the "total energy" reported by some codes? Can we rename it to _dft_total_energy?
n) Am I correct to assume that the "total energy" reported by codes will always be the sum of separate energy terms (Coulomb, exchange, 1-electron, 2-electron, etc.)? Is there an interest to have them recorded in the result data files (CIFs) separately? If yes, what is the "Hartree energy" (is it a sum of all single electron energies in the SCF for each of them?), "Ewald energy" (is it the electrostatic lattice energy, obtained by Ewald summation?) and the rest in the values from the AbInit output file? Are these terms consistent across QM codes?
o) How does one check that a computation has converged with respect to k-points, E-cutoff, smearing and other parameters, and that the pseudopotential is selected correctly? From the Abinit tutorial (http://flex.phys.tohoku.ac.jp/texi/abinit/Tutorial/lesson_4.html) I got the impression that one needs to run the computation with different values of these parameters and see that the total energy, or other gauge values, no longer change significantly when these parameters are increased. Is that right? If yes, are there codes that do this automatically? Should we require the dependence of Etotal (or of the coordinates) on the k-grid, E-cutoff and smearing as a convergence check when depositing to TCOD? Or should the TCOD side check this automatically when appropriate (say for F/LOSS codes)?
p) what are other obvious things that one could get wrong in QM/DFT computations and that could be checked formally?
Sorry for the long list, but I would like to get things right from the very beginning, if possible...
Regards, Saulius
PS. If you find obvious mistakes in the current dictionaries, please feel free to correct them and commit the corrections back to the repository.
Hello Saulius,
I have a (rather long) list of questions regarding the TCOD dictionaries; I've tried to compile it here. I'd be grateful if you could comment on them, as far as your time allows.
Let me comment on those questions that I can do on the spot, without looking into the dictionary yet :
a) the units of energy that we started to use in the dictionaries are eV (electron-volts). For distances, we should probably use Angstroms, since then we can more easily compare computation results with crystallographic experimental data. This naturally suggests eV/A as the units for forces. Is that OK, or would it be better to use SI units of a similar scale (say aJ, atto-joules)? The only problem I see with eV is that it is measured, not defined in terms of the basic SI units (http://en.wikipedia.org/wiki/Electronvolt, http://physics.nist.gov/cuu/Units/outside.html). We should probably stay away from Hartrees, Rydbergs and Bohrs for archiving purposes, shouldn't we?
NB: for archival computer files, it is more convenient to have all data always in the same units, in all files, and not to allow different units -- at least with current standard software libraries.
Although SI units would in principle be the right choice, nobody uses them in this context. At least eV and Angstrom have a special status ('tolerated units' within SI), hence allowing eV, Angstrom and therefore eV/A for forces is a fair compromise.
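For the record, converting from the atomic units that many codes use internally (Hartree, Bohr) to the proposed archival units is a fixed scaling; a minimal Python sketch, with CODATA-based constants rounded for illustration only:

    # Rounded CODATA-based conversion factors (illustrative precision only).
    HARTREE_TO_EV = 27.211386        # 1 Hartree in eV
    RYDBERG_TO_EV = 13.605693        # 1 Rydberg in eV
    BOHR_TO_ANGSTROM = 0.529177      # 1 Bohr in Angstrom

    def force_to_ev_per_angstrom(force_hartree_per_bohr):
        """Convert a force from Hartree/Bohr to eV/Angstrom."""
        return force_hartree_per_bohr * HARTREE_TO_EV / BOHR_TO_ANGSTROM

    print(force_to_ev_per_angstrom(0.001))   # ~0.0514 eV/A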
b) Is nuclear electric dipole moment used/necessary for DFT computations (_dft_atom_type_nuclear_dipole)?
c) If b) is "yes", what units should we use for the electric dipole (and higher moments) -- Debyes, e*A (number of unit charges times Angstroms), or something else?
There must be some kind of confusion here -- my old nuclear physics courses always emphasized that nuclei do not have an electric dipole moment. Do you perhaps mean the magnetic dipole moment or the nuclear quadrupole moment? Anyway, nuclear properties are never required for DFT calculations as such, but they can be used to convert DFT predictions into quantities that are experimentally accessible. I don't see the need to keep track of this in a computational database, however.
d) Is my definition of residual forces in cif_tcod.dic, "data_tcod_atom_site_residual_force" correct/acceptable? If not, how should we define them?
<skip>
e) If I understand correctly, DFT and most other QM methods operate under Born-Oppenheimer approximation; under this approximation, electron densities (electron wave-functions) are optimised to minimal energy at fixed nuclei and unit cell parameters, and when this converges, the nuclei parameters and/or cell constants are changed slightly (e.g. along gradients), and the electron energy is minimised again. Is this a correct view? Is it a universal situation across QM codes?
Yes, correct. It is pretty universal. There are some special-purpose applications that do not make the Born-Oppenheimer approximation, but they are really a minority.
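Schematically, that nested structure can be pictured with a deliberately artificial toy; the minimal Python sketch below invents a one-parameter "density" and a single nuclear coordinate purely to show the two levels of iteration, and carries no physics:

    def toy_energy(x, rho):
        # invented "electronic" energy of a density parameter rho at geometry x
        return (rho - x) ** 2 + 0.5 * (x - 1.0) ** 2

    def scf(x, tol=1e-10):
        rho, e_old = 0.0, None
        while True:                               # inner loop: nuclei/cell fixed
            rho += 0.5 * (x - rho)                # toy density update
            e = toy_energy(x, rho)
            if e_old is not None and abs(e - e_old) < tol:
                return e
            e_old = e

    x = 0.0                                       # toy nuclear coordinate
    for cycle in range(200):                      # outer loop: move the nucleus
        e = scf(x)
        force = -(x - 1.0)                        # -dE/dx at the converged density
        if abs(force) < 1e-8:
            break
        x += 0.2 * force                          # small step along the gradient
    print("relaxed x = %.6f after %d outer cycles" % (x, cycle + 1))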
f) If e) is "yes", then we can talk about "microcycles" (electron w/f refinement) and "macrocycles" (nuclei/cell shifted in each "macrocycle"). We could also document total energy changes in all these cycles, to monitor the convergence, as I suggest in the _tcod_computation_cycle_... data items. Is such a view acceptable? What is the terminology used in different codes? Will such a table be useful? Will it be easy to obtain from most codes?
I advise staying away from that. Your view is correct, but this information is often not mentioned even in papers. Documenting such changes would be arbitrary, to some extent. The final relaxed geometry is well-defined, but what do you take as the unrelaxed starting point...?
g) I have attempted to put all DFT-related data items into the cif_dft.dic dictionary, and the general computational items (suitable also for MM and other methods) into cif_tcod.dic. Are all the parameters in cif_dft.dic indeed DFT-specific? Are they named and commented properly?
<skip>
h) The CML CompChem dictionary mentions SCF as a method. I know HF and its modifications are SCF; is DFT technically also SCF? Are there more SCF methods that are not HF? Should we include "SCF" in the enumeration values of _tcod_model as a separate model?
'SCF' refers only to the fact that a particular iterative solving scheme is used. As such, I would consider that term less informative than HF or DFT (one could even imagine doing DFT without SCF, although in practice this is very rarely done).
i) "Model" is a very overloaded term. Maybe it would be better to rename _tcod_model to "_tcod_method", or "_tcod_theory_level"?
<skip>
j) I have taken the _dft_basisset_type list from http://www.xml-cml.org/dictionary/compchem/#basisSet, to preserve at least the possibility of a CIF->CML->CIF roundtrip. The two big classes of basis functions, as I have learned, are localised (Slater, Gaussian) and plane wave. Should we introduce such a classification on top of the _dft_basisset_type enumerator?
I don't think so. It will be implicit in the name people use for the basis set.
Are localised bases relevant for DFT at all?
Yes, sure (SIESTA, for instance).
Is it enough for DFT to specify just an energy cut-off, assuming plane wave bases (for a given pseudopotential), or are there different possible bases also among plane waves (I guess there should not be, but maybe I'm missing something...)?
For plane waves the energy cut-off is the only quantity. But there are many other types of basis sets that are not plane waves. For these, more specification might be needed (although often it is contained in the published definition of the basis set).
Or are localised basis sets relevant for computing pseudopotentials (cores)? Can I assume that dftxfit, dftorb and dftcfit are plane wave bases? What about 'periodic'? (These terms are all from the Basis Set Exchange database, I guess.)
<skip>
k) What is the difference between _dft_atom_basisset and _dft_basisset? Can they be simultaneously used in one computation? If not, maybe we can merge the two definition sets into one?
<skip>
l) There are a lot of *_conv data items declared (e.g. _dft_basisset_energy_conv). Are they for convergence tests? Or for convolutions? What is their proposed definition?
These are the convergence criteria, for instance: stop the iterative (SCF) cycle once the total energy changes by less than _dft_basisset_energy_conv during the last few iterations.
m) Is _dft_cell_energy the same as the "total energy" reported by some codes? Can we rename it to _dft_total_energy?
Probably yes.
n) Am I correct to assume that the "total energy" reported by codes will always be the sum of separate energy terms (Coulomb, exchange, 1-electron, 2-electron, etc.)? Is there an interest to have them recorded in the result data files (CIFs) separately? If yes, what is the "Hartree energy" (is it a sum of all single electron energies in the SCF for each of them?), "Ewald energy" (is it the electrostatic lattice energy, obtained by Ewald summation?) and the rest in the values from the AbInit output file? Are these terms consistent across QM codes?
Also here I think this is asking for way too much detail. Most codes can indeed split up the total energy into many contributions, but papers usually do not report that (only in the special cases where there is useful information in the splitting). If papers don't do it, databases shouldn't either -- that feels like a sound criterion.
o) How does one check that a computation has converged with respect to k-points, E-cutoff, smearing and other parameters, and that the pseudopotential is selected correctly? From the Abinit tutorial (http://flex.phys.tohoku.ac.jp/texi/abinit/Tutorial/lesson_4.html) I got the impression that one needs to run the computation with different values of these parameters and see that the total energy, or other gauge values, no longer change significantly when these parameters are increased. Is that right? If yes, are there codes that do this automatically? Should we require the dependence of Etotal (or of the coordinates) on the k-grid, E-cutoff and smearing as a convergence check when depositing to TCOD? Or should the TCOD side check this automatically when appropriate (say for F/LOSS codes)?
"k-points, E-cutoff, smearing and other parameters" are indeed tested as you describe. The pseudopotential can't be tested that way; what people usually do is verify whether the numerically converged results obtained with a particular pseudopotential agree with experiment.
Doing such tests is the responsibility of each user. In principle, journals should not publish ab initio results if such tests are missing. Journals are not that strict, unfortunately. And some researchers are not very careful in that respect.
It's a longstanding problem that is gradually being solved, because computers are so fast now that the default settings of most codes are sufficiently accurate for many cases, even if a researcher does not explicitly test this.
Also here, TCOD shouldn't try to do better than the journals do.
p) what are other obvious things that one could get wrong in QM/DFT computations and that could be checked formally?
That's an interesting one... With no answer from my side. If there is anything that can go obviously wrong, the codes will have an internal test for it already.
Stefaan
Hi, Stefaan,
many thanks for your quick and very informative answer. I'll adjust the dictionaries accordingly. Below are some of my comments.
On 2014-11-05 13:33, Stefaan Cottenier wrote:
Let me comment on those questions that I can do on the spot, without looking into the dictionary yet :
a) the units of energy that we started to use in the dictionaries are eV (electron-volts). For distances, we should probably use Angstroms, since then we can more easily compare computation results with crystallographic experimental data. This naturally suggests eV/A as the units for forces. Is that OK, or would it be better to use SI units of a similar scale (say aJ, atto-joules)? The only problem I see with eV is that it is measured, not defined in terms of the basic SI units (http://en.wikipedia.org/wiki/Electronvolt, http://physics.nist.gov/cuu/Units/outside.html).
Although SI units would in principle be the right choice, nobody uses them in this context. At least eV and Angstrom have a special status ('tolerated units' within SI), hence allowing eV, Angstrom and therefore eV/A for forces is a fair compromise.
Good. So we stay with eV and A, with eV/A for forces, as already documented in our dictionaries.
b) Is nuclear electric dipole moment used/necessary for DFT computations (_dft_atom_type_nuclear_dipole)?
c) If b) is "yes", what units should we use for the electric dipole (and higher moments) -- Debyes, e*A (number of unit charges times Angstroms), or something else?
There must be some kind of confusion here -- my old nuclear physics courses always emphasized that nuclei do not have an electric dipole moment. Do you perhaps mean the magnetic dipole moment or the nuclear quadrupole moment? Anyway, nuclear properties are never required for DFT calculations as such, but they can be used to convert DFT predictions into quantities that are experimentally accessible. I don't see the need to keep track of this in a computational database, however.
OK, this is a gap in my education -- I must have overlooked the zero nuclear electric dipole during my university years...
Setting aside the theoretical question of whether quarks can move in such a way as to give a non-zero dipole, I will remove _dft_atom_type_nuclear_dipole as lacking theoretical justification and empirical evidence. For the magnetic dipole, a _dft_atom_type_magn_nuclear_moment could be introduced if needed (for orbital and spin magnetic moments the data names are already there).
e) If I understand correctly, DFT and most other QM methods operate under Born-Oppenheimer approximation; under this approximation, electron densities (electron wave-functions) are optimised to minimal energy at fixed nuclei and unit cell parameters, and when this converges, the nuclei parameters and/or cell constants are changed slightly (e.g. along gradients), and the electron energy is minimised again. Is this a correct view? Is it a universal situation across QM codes?
Yes, correct. It is pretty universal. There are some special-purpose applications that do not make the Born-Oppenheimer approximation, but they are really a minority.
OK. Thanks for confirmation.
f) If e) is "yes", then we can talk about "microcycles" (electron w/f refinement) and "macrocycles" (nuclei/cell shifted in each "macrocycle"). We could also document total energy changes in all these cycles, to monitor the convergence, as I suggest in the _tcod_computation_cycle_... data items. Is such a view acceptable? What is the terminology used in different codes? Will such a table be useful? Will it be easy to obtain from most codes?
I advise staying away from that. Your view is correct, but this information is often not mentioned even in papers. Documenting such changes would be arbitrary, to some extent. The final relaxed geometry is well-defined, but what do you take as the unrelaxed starting point...?
The idea is that the unrelaxed structure starts somewhere at high energies, and then converges to a low energy that no longer changes significantly with refinement cycles. For me, this would be evidence that the process has converged. I guess when one does a calculation, one looks at such energy traces, doesn't one?
If so, then it makes sense to have tools to record the traces in CIFs, as evidence of convergence and for convergence checks.
I also want to point out that the presence of the data items (for energy tables) in the dictionary does not imply any obligation to use them. Any CIF will be correct and valid without them. It's just that if we decide to include them, there will be a publicly announced way to do so.
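To make the suggestion concrete, such a trace could be emitted as an ordinary CIF loop; a minimal Python sketch (the item names are just the suggested _tcod_computation_cycle_... forms, and the energy trace is made up -- nothing here is fixed in the dictionary yet):

    # Hypothetical total energies (eV) for five successive "macrocycles".
    trace = [-1021.734, -1023.912, -1024.587, -1024.601, -1024.602]

    print("loop_")
    print("_tcod_computation_cycle_id")
    print("_tcod_computation_cycle_total_energy")
    for i, energy in enumerate(trace, start=1):
        print("%d %.3f" % (i, energy))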
h) The CML CompChem dictionary mentions SCF as a method. I know HF and its modifications are SCF; is DFT technically also SCF? Are there more SCF methods that are not HF? Should we include "SCF" in the enumeration values of _tcod_model as a separate model?
'SCF' refers only to the fact that a particular iterative solving scheme is used. As such, I would consider that term less informative than HF or DFT (one could even imagine doing DFT without SCF, although in practice this is very rarely done).
OK; so we skip SCF as a separate "model".
j) I have taken the _dft_basisset_type list from http://www.xml-cml.org/dictionary/compchem/#basisSet, to preserve at least the possibility of a CIF->CML->CIF roundtrip. The two big classes of basis functions, as I have learned, are localised (Slater, Gaussian) and plane wave. Should we introduce such a classification on top of the _dft_basisset_type enumerator?
I don't think so. It will be implicit in the name people use for the basis set.
OK. The type of the basis set should be implied by the basis set data (name, files, reference).
Are localised bases relevant for DFT at all?
Yes, sure (SIESTA, for instance).
Good to know. Thanks for the info!
Is it enough for DFT to specify just an energy cut-off, assuming plane wave bases (for a given pseudopotential), or are there different possible bases also among plane waves (I guess there should not be, but maybe I'm missing something...)?
For plane waves the energy cut-off is the only quantity. But there are many other types of basis sets that are not plane waves. For these, more specification might be needed (although often it is contained in the published definition of the basis set).
I see. OK, so I understand that DFT can work with both PW and localised bases, and the exact basis should be specified in the input file (and might be documented in the CIF).
l) There are a lot of *_conv data items declared (e.g. _dft_basisset_energy_conv). Are they for convergence tests? Or for convolutions? What is their proposed definition?
These are the convergence criteria, for instance: stop the iterative (SCF) cycle once the total energy changes by less than _dft_basisset_energy_conv during the last few iterations.
Perfect. Now, are these the *desired* criteria, or the *obtained* values (i.e. actual values of the computation)? Although probably we can assume that in any case the energy change at the end of the computation was less than the specified _dft_basisset_energy_conv value, and the same for other *_conv values, right?
I'll add units and explanations to the dictionary.
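Something along these lines, I suppose (a minimal Python sketch of the criterion as I understand it, with energies assumed to be in eV; the window of three iterations is only an illustrative choice):

    def scf_converged(energies, energy_conv, n_last=3):
        """True if the total energy changed by less than energy_conv over
        each of the last n_last SCF iterations."""
        if len(energies) < n_last + 1:
            return False
        changes = [abs(energies[i] - energies[i - 1])
                   for i in range(len(energies) - n_last, len(energies))]
        return max(changes) < energy_conv

    print(scf_converged([-10.1, -10.41, -10.4199, -10.42, -10.42, -10.42],
                        energy_conv=1e-3))   # True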
m) Is _dft_cell_energy the same as the "total energy" reported by some codes? Can we rename it to _dft_total_energy?
Probably yes.
Hmmm... We need some explanation in the dictionary of how these values are supposed to be used.
n) Am I correct to assume that the "total energy" reported by codes will always be the sum of separate energy terms (Coulomb, exchange, 1-electron, 2-electron, etc.)? Is there an interest to have them recorded in the result data files (CIFs) separately? If yes, what is the "Hartree energy" (is it a sum of all single electron energies in the SCF for each of them?), "Ewald energy" (is it the electrostatic lattice energy, obtained by Ewald summation?) and the rest in the values from the AbInit output file? Are these terms consistent across QM codes?
Also here I think this is asking for way too much detail. Most codes can indeed split up the total energy into many contributions, but papers usually do not report that (only in the special cases where there is useful information in the splitting). If papers don't do it, databases shouldn't either -- that feels like a sound criterion.
Interesting idea. Well, that makes our life easier.
On the other hand, electronic media, like databases, can record and make usable more information than a traditional paper or PDF publication. We should not overlook such possibilities, and should use them when needed.
For example, in protein crystallography, structure factors were not reported in publications at the very beginning, due to the sheer volume of data (a protein crystal can give you a million unique reflections); but today it is a must and a self-evident thing to deposit such data electronically into the PDB.
o) How does one check that a computation has converged with respect to k-points, E-cutoff, smearing and other parameters, and that the pseudopotential is selected correctly? From the Abinit tutorial (http://flex.phys.tohoku.ac.jp/texi/abinit/Tutorial/lesson_4.html) I got the impression that one needs to run the computation with different values of these parameters and see that the total energy, or other gauge values, no longer change significantly when these parameters are increased. Is that right? If yes, are there codes that do this automatically? Should we require the dependence of Etotal (or of the coordinates) on the k-grid, E-cutoff and smearing as a convergence check when depositing to TCOD? Or should the TCOD side check this automatically when appropriate (say for F/LOSS codes)?
"k-points, E-cutoff, smearing and other parameters" are indeed tested as you describe.
OK. Thanks for clarification.
I think it would be beneficial to have such checks included in the results file...
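For instance, the numerical part of such a check boils down to something like the following (a minimal Python sketch; the tolerance and the example energies are purely illustrative):

    def converged_wrt_parameter(values, total_energies, tol=0.001):
        """True if the total energy (eV) changes by less than tol between the
        two largest tested values of the parameter (E-cutoff, k-grid, ...)."""
        pairs = sorted(zip(values, total_energies))
        return abs(pairs[-1][1] - pairs[-2][1]) < tol

    # e.g. an E-cutoff sweep (eV) with made-up total energies:
    print(converged_wrt_parameter([300, 400, 500],
                                  [-1024.120, -1024.5960, -1024.5965]))   # True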
... The pseudopotential can't be tested that way; what people usually do is verify whether the numerically converged results obtained with a particular pseudopotential agree with experiment.
I see. OK, we'll have to trust that the PP was selected properly.
Actually, TCOD + COD can check the computations against empirical (crystallographic) data -- say interatomic distances, bonds, angles, coordination sphere geometry, etc. The results might be interesting -- significant discrepancies will either predict unseen new phenomena or point out problems that can be fixed readily.
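Such a check could start from something as simple as the sketch below (plain Python; the 0.05 A tolerance and the bond values are only an illustration, not a proposed TCOD policy):

    def flag_distance_outliers(computed, experimental, tol=0.05):
        """Return bond labels whose computed and experimental lengths
        (in Angstroms) differ by more than tol."""
        return [label for label, d_calc in computed.items()
                if label in experimental
                and abs(d_calc - experimental[label]) > tol]

    print(flag_distance_outliers({"C1-O1": 1.43, "C1-C2": 1.54},
                                 {"C1-O1": 1.21, "C1-C2": 1.53}))   # ['C1-O1']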
Doing such tests is the responsibility of each user. In principle, journals should not publish ab initio results if such tests are missing. Journals are not that strict, unfortunately. And some researchers are not very careful in that respect.
It's a longstanding problem that is gradually being solved, because computers are so fast now that the default settings of most codes are sufficiently accurate for many cases, even if a researcher does not explicitly test this.
Also here, TCOD shouldn't try to do better than the journals do.
May I disagree to some extent? If we can do better checks than the journals do, why shouldn't we? That would be a useful tool, a help for journal reviewers, and one possible way to improve the situation.
In crystallography, the situation is sometimes similar. IUCr journals are very good at checking structures, but some chemical journals, even the "high profile" ones, give you CIFs that may even have syntax errors in them! For me, this hints that nobody bothered to look at the data before publication. But how can they then claim that the paper was "peer reviewed"? The text apparently was, but the data probably were not. This is not a good way to work in today's data-driven sciences.
COD does checks, and we plan more -- and it helps, e.g. when personal communications or prepublication structures are deposited. My personal experience with this was very positive -- the first personal communication I tried to send to COD was flagged as not properly converged; indeed this was an oversight, and a couple more refinement cycles fixed the problem.
I find such checks to be a very useful tool, so why not have something similar for TCOD? Especially when we expect a large number of structures from wide-scale computational projects, where not every computation is checked manually.
p) what are other obvious things that one could get wrong in QM/DFT computations and that could be checked formally?
That's an interesting one... With no answer from my side. If there is anything that can go obviously wrong, the codes will have an internal test for it already.
Well, an obviously wrong thing would be insufficient checks for convergence (too coarse a k-grid, too few minimization steps, etc.).
Usually, experienced computational chemists will check for these, but every so often an MSc student who is just starting to learn will compute a structure while the experienced boss happens to be away at a conference... The structure may look reasonable, especially to an inexperienced eye, but in fact it may be inaccurate... You know what I mean :)
In short, I think that it would be quite useful to have uniform *actual* convergence criteria in the CIF output, and to check them before inserting computations into TCOD, like we check crystal structure symmetry, bond distances, parameter shifts or R-factors.
The question is, what should be the universal criteria for QChem?
Regards, Saulius
Dear Saulius and TCOD'ers,
I'll try to make my comments and suggestions on the original questionnaire.
On Tue, Nov 4, 2014 at 1:31 PM, Saulius Gražulis grazulis@ibt.lt wrote:
Hi, folks,
I have a (rather long) list of questions regarding the TCOD dictionaries; I've tried to compile it here. I'd be grateful if you could comment on them, as far as your time allows.
Questions:
a) the units of energy that we started to use in the dictionaries are eV (electron-volts). For distances, we should probably use Angstroms, since then we can more easily compare computation results with crystallographic experimental data. This naturally suggests eV/A as the units for forces. Is that OK, or would it be better to use SI units of a similar scale (say aJ, atto-joules)? The only problem I see with eV is that it is measured, not defined in terms of the basic SI units (http://en.wikipedia.org/wiki/Electronvolt, http://physics.nist.gov/cuu/Units/outside.html). We should probably stay away from Hartrees, Rydbergs and Bohrs for archiving purposes, shouldn't we?
I think eVs are fine. They are more commonly used in the solid-state community, whereas quantum chemists usually give energies in Hartrees. I think eVs are better since many other people (experimentalists) use them and know what they are.
i) "Model" is a very overloaded term. Maybe it would be better to rename _tcod_model to "_tcod_method", or "_tcod_theory_level"?
OK. I introduced this term primarily because I was thinking about PCOD and the eventual merging of the two. My scheme was to reserve _tcod_method for the particular method used to predict the crystal structure (ab initio, without knowledge of an experimental one), which would be: packing, Monte Carlo annealing, free-energy based, etc., and to reserve _tcod_model for the particular model used to describe the interactions (_dft, _forcefield, _hf, etc.) in the final refinement. We could also change it to _theory_level or something.
I'll follow up on the other questions later.
Regards,
Linas
On 2014-11-05 18:42, Linas Vilciauskas wrote:
I think eVs are fine. They are more commonly used in the solid-state community, whereas quantum chemists usually give energies in Hartrees. I think eVs are better since many other people (experimentalists) use them and know what they are.
Good. I also like eVs. So let's stick to them.
Regards, Saulius