[TCOD] Questions still left regarding TCOD dictionaries

Fri Nov 7 15:30:53 UTC 2014

Hi, Torbjörn,

many thanks for your ideas!

On 2014-11-05 18:34, Björkman Torbjörn wrote:
> Stefaan supplies good information for everything as usual, so just a
>  couple of remarks here.

Indeed :) I join the thanks!

>>> I also want to point out that the presence of the data items (for
>>> energy tables) in the dictionary does not imply any obligation to
>>> use them. Any CIF will be correct and valid without them. Its
>>> just if we decide to include them, there will be a publicly
>>> announced way to do this.

> Here I would say that what one normally pays attention to is just the
> residual forces/energy changes at convergence,

That's the very important point for the dictionary.

So, we will have the _conv parameters:

_dft_basisset_energy_conv
_dft_atom_basisset_energy_conv
_dft_BZ_integration_energy_conv
_dft_cell_energy_conv
_dft_cell_density_conv
_dft_cell_potential_conv
_dft_atom_relax_force_conv
_dft_atom_relax_energy_conv

(actually, these data names are from your e-mail, aren't they?)

These will specify the target values for the energy shifts in the last
(several) cycles. If a structure is deposited to TCOD (and probably
marked as 'converged'), we will assume that the energy shifts were
smaller than these specified values, right?

For example, OQMD, made by Wolverton's group, says: "The electronic
ground states were converged to within 0.0001 eV/atom and the crystal
structure to within 0.001 eV/atom."
(http://oqmd.org/documentation/vasp). So I would assume:

_dft_atom_basisset_energy_conv 0.0001
_dft_cell_energy_conv 0.001

Am I right about the meaning of the data items and their match with the
published OQMD values? Or should we have yet other data items for these?
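
To make the intended reading concrete, here is a minimal Python sketch.
It assumes the _conv items arrive as plain tag/value pairs and that each
value is an upper bound on the energy shift between consecutive cycles.
The function names and the sample energies are made up for illustration,
and the tiny parser is not a real CIF parser.

```python
def parse_tag_values(text):
    """Parse simple 'tag value' lines into a dict of floats (illustration only)."""
    items = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("_"):
            items[parts[0]] = float(parts[1])
    return items


def shifts_within(energies, target, last_n=1):
    """Return True if the last `last_n` absolute energy shifts are below `target`."""
    shifts = [abs(b - a) for a, b in zip(energies, energies[1:])]
    if len(shifts) < last_n:
        return False  # not enough cycles to judge
    return all(s < target for s in shifts[-last_n:])


cif_fragment = """
_dft_atom_basisset_energy_conv 0.0001
_dft_cell_energy_conv 0.001
"""
conv = parse_tag_values(cif_fragment)

# Hypothetical total energies (eV/atom) from the last relaxation cycles.
energies = [-5.1200, -5.1310, -5.1316, -5.1318]
print(shifts_within(energies, conv["_dft_cell_energy_conv"], last_n=2))
```

Whether "the last (several) cycles" means one, two or more shifts is
exactly the open question above, so last_n is left as a parameter.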

And then we could have min/median/RMSD/max residual forces on atoms
reported as well. Together, this should give some impression of how well
the calculation has converged, right?
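
A possible way to compute such a summary is sketched below in Python. It
assumes the residual forces are available as one (fx, fy, fz) vector per
atom and reports statistics of the force magnitudes. The "rms" entry is
the root mean square of the magnitudes, which is one reading of RMSD
here. The function name and sample numbers are hypothetical.

```python
import math
import statistics


def force_summary(forces):
    """Summarize residual force magnitudes for a list of (fx, fy, fz) vectors."""
    if not forces:
        raise ValueError("no forces given")
    mags = [math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in forces]
    return {
        "min": min(mags),
        "median": statistics.median(mags),
        "rms": math.sqrt(sum(m * m for m in mags) / len(mags)),
        "max": max(mags),
    }


# Hypothetical residual forces on three atoms (units as in the calculation).
forces = [(0.0, 0.0, 0.003), (0.0, 0.004, 0.0), (0.003, 0.0, 0.004)]
summary = force_summary(forces)
for name, value in summary.items():
    print(f"{name:6s} {value:.4f}")
```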

> documenting the starting point is not something that is normally 
> done. We should also beware a little here since the starting point
> is very often data straight out of proprietary databases... it might 
> well lead to a lot of trouble.

OK. For now we skip energy traces. We can have the possibility to record
the initial state from open databases like COD, but it is not
obligatory. So we skip it for now.

> There are even "hybrid" schemes, like anything based on the 
> muffin-tin geometry (LAPW, LMTO, ...) which come with a local and 
> PW-like part and where the enumeration of basis functions may be by 
> the PW's (like LAPW) or by the local part (like LMTO). There are 
> wavelet-based codes which I have no idea how they converge. And so 
> on. We have unfortunately a rather messier situation than the quantum
> chemists.

That's fine. We already specify the corresponding 'method' and
'basisset' data items with references to publications describing them
and files with their data.

>>>>> Also here I think this is asking for way too much detail. 
>>>>> Most codes can indeed split up the total energy in many 
>>>>> contributions, but papers usually do not report that (only in
>>>>> the special cases when there is useful information in the 
>>>>> splitting). If papers don't do it, databases shouldn't either
>>>>> -- that feels as a sound criterium.

> It is in fact worse than that, because the precise nature of this 
> split depends on technical details such as how you solve the Poisson
>  equation and how you treat core states. This varies between the 
> codes, so there is not in general any possibility to make a sound 
> comparison between different methods for these partial energies. So I
> don't think anything much is gained by documenting them.

OK, I agree. I skip the detailed energy items for now.

Actually, I only included them because they were in the CompChem CML
dictionary.

>>> In short, I think that it would be quite useful to have uniform 
>>> *actual* convergence criteria in the CIF output, and check them 
>>> before inserting computations into TCOD, like we check crystal 
>>> structure symmetry, bond distances, parameter shifts or 
>>> R-factors.

> Here I tend to agree,

OK :)

> but at the same time I would not want us to impose too draconic
> criteria for acceptance, as long as the obtained result is
> documented.

We do not need to be "draconic". The same as for COD. For instance, if
the structure was published in a reputable journal, we just accept it,
no matter what the convergence was. If it was good enough for the
referees of the publication, it's probably good enough for the database;
at least we need to document the fact of publication.

For high-throughput projects, personal communications and
pre-publication computations, we can set whatever level we decide, and
this must be more-or-less the "state of the art" level of convergence:
what you realistically can and should achieve. Think of it this way:
would you publish your own computations with such a level of
convergence, and would you trust somebody else's computations with
similar parameters? If you would trust them, then TCOD should accept
them. It's community peer review that should set the rules, like in
"normal" publications.

> Perhaps some two-level system with "hard" acceptance
> criteria and a "warning" level would be good?

Indeed. We can have a three-level system: the "green" lane is for
excellent structures that can be deposited automatically; "yellow" cases
(with warnings, as you say) should receive attention from a specialist --
we are working on a WordPress-based system for this, to be used for COD
but also usable for TCOD; and finally "red" cases are clearly bad and
should be improved even before taking the time of human experts (it's a
precious resource, the time!).
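
As a rough illustration of the three lanes, here is a small Python
sketch. It assumes a single residual value (for example a maximum
residual force) and two thresholds. The function name and the threshold
values are hypothetical; the real limits would be whatever the community
agrees on.

```python
def triage(residual, green_limit, red_limit):
    """Assign a deposition lane from one residual value and two thresholds."""
    if green_limit > red_limit:
        raise ValueError("green_limit must not exceed red_limit")
    if residual <= green_limit:
        return "green"   # deposit automatically
    if residual <= red_limit:
        return "yellow"  # queue for a specialist to review
    return "red"         # return to the depositor before any expert review


# Hypothetical thresholds, for illustration only.
GREEN_LIMIT = 0.001
RED_LIMIT = 0.01

for value in (0.0005, 0.005, 0.05):
    print(value, triage(value, GREEN_LIMIT, RED_LIMIT))
```

With several criteria, one option would be to take the worst lane over
all of them, but that is a design decision still to be made.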

In this way we should eventually have a trustworthy collection of
state-of-the-art computation results in TCOD.

Regards,
Saulius

-- 
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Graiciuno 8
LT-02241 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2602116 / phone (office): (+370-5)-2602556
mobile: (+370-684)-49802, (+370-614)-36366