[TCOD] Questions still left regarding TCOD dictionaries

Fri Nov 7 16:30:07 UTC 2014

Hi Saulius

> So, we will have the _conv parameters:
> 
> _dft_basisset_energy_conv
> _dft_atom_basisset_energy_conv
> _dft_BZ_integration_energy_conv
> _dft_cell_energy_conv
> _dft_cell_density_conv
> _dft_cell_potential_conv
> _dft_atom_relax_force_conv
> _dft_atom_relax_energy_conv

Looking pretty good, but see further below...

> (actually, these data names are from your e-mail, aren't they?)

I think that they were all in the set that I originally wrote down, yes. 

> These will specify the target values for the energy shifts in the last
> (several) cycles. If a structure is deposited to TCOD (and probably
> marked as 'converged'), we will assume that the energy shifts were
> smaller than these specified values, right?

Exactly. I would say that the value should be either the one actually achieved at the end of the relaxation, or the target value given as input to the DFT program, which is an upper bound on the achieved value (assuming the run converged).
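
To make the rule concrete, here is a minimal sketch in Python. The function name and arguments are made up for illustration and are not part of any TCOD tool; it only encodes "report the achieved value if known, otherwise the target, and never report a run that missed its target as converged".

```python
def reported_conv(target, achieved=None):
    """Pick the number to store in a *_conv data item (hypothetical helper).

    target   -- tolerance given as input to the DFT program
    achieved -- change actually reached in the last cycle, if known
    """
    if achieved is None:
        # Only the input tolerance is known; for a converged run it is
        # an upper bound on the achieved value.
        return target
    if achieved > target:
        # The run did not reach its own target, so it is not converged.
        raise ValueError("achieved change exceeds target; run not converged")
    return achieved
```

For example, reported_conv(1e-3, 2e-4) gives 2e-4, while reported_conv(1e-3) falls back to 1e-3.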

> For example, OQMD, made by Wolverton's group, says: "The electronic
> ground states were converged to within 0.0001 eV/atom and the crystal
> structure to within 0.001 eV/atom."
> (http://oqmd.org/documentation/vasp). So I would assume:
>
> _dft_atom_basisset_energy_conv 0.0001
> _dft_cell_energy_conv 0.001
>
> Am I right about the meaning of the data items and their match with the
> published OQMD values? Or should we have further data items for these?

No, this is not quite correct. Their "electronic ground state convergence" is the accuracy demanded in every single relaxation step, which does not correspond to any of the convergence parameters listed above. In particular, it has nothing to do with the basis set. OQMD does not have a basis set convergence procedure that gives a value for _basisset_energy_conv (and I doubt that it can be estimated from the "coarse" and "fine" runs that have been done). Perhaps we should have a data item for this, though. It would be something like _SCF_conv, I guess, since it is the convergence parameter of the self-consistent field cycle.

> And then we could have min/median/RMSD/max residual forces on atoms
> reported as well. Together, this should give some impression of how well
> the calculation has converged, right?

Definitely.
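
As a rough sketch of what such a summary could look like (Python, standard library only; the function name and the input layout are my assumptions, and "RMS" here means root mean square of the force magnitudes about zero):

```python
import math
import statistics


def force_summary(forces):
    """Summarise residual forces on atoms.

    forces -- list of (fx, fy, fz) tuples, one per atom, all in the same
              unit (e.g. eV/Angstrom); the layout is assumed for this sketch.
    """
    # Magnitude of the residual force vector on each atom.
    mags = [math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in forces]
    return {
        "min": min(mags),
        "median": statistics.median(mags),
        "rms": math.sqrt(sum(m * m for m in mags) / len(mags)),
        "max": max(mags),
    }
```

The four numbers together say more than the maximum alone: a large maximum with a small median points to one or two badly relaxed atoms rather than a generally unconverged structure.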

> We do not need to be "draconian". The same as for COD. For instance, if
> the structure was published in a reputable journal, we just accept it,
> no matter what the convergence was. If it was good enough for the referees
> of the publication, it's probably good enough for the database; at least we
> need to document the fact of publication.
> 
> For high-throughput projects, personal communications and
> pre-publications computations, we can set whatever level we decide, and
> this must be more-or-less the "state of the art" level of convergence.
> What you realistically can and should achieve. Think of it this way: would
> you publish your own computations with such a level of convergence, and
> would you trust somebody else's computations with similar parameters? If
> you would trust them, then TCOD should accept them. It's community peer
> review that should set the rules, like in "normal" publications.
>
>> Perhaps some two-level system with "hard" acceptance
>> criteria and a "warning" level would be good?
>
> Indeed. We can have a three-level system: "green" lane cases are excellent
> structures that can be deposited automatically; "yellow" cases (with
> warnings, as you say) should receive attention from a specialist --
> we are working on a WordPress-based system for this, to be used for COD
> but also usable for TCOD; and finally "red" cases are clearly bad
> and should be improved even before taking the time of human experts (it's a
> precious resource, the time!).

This sounds great. So peer-reviewed material is accepted (so we'd have to have some journal list?) and non-reviewed material is accepted if quality meets certain criteria. Everything is marked up in some kind of green-yellow-red fashion and so users can easily find out how to treat the numbers. 
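
Just to fix ideas, the triage could be sketched like this in Python. Everything here is hypothetical: the function name, the choice of the maximum residual force as the quality measure, and above all the two threshold values, which are placeholders and not proposed criteria.

```python
def triage(max_force, peer_reviewed=False, green_limit=0.01, red_limit=0.1):
    """Return (lane, action) for a deposited structure.

    max_force     -- largest residual force on any atom (unit assumed
                     consistent with the limits)
    peer_reviewed -- True if published in an accepted journal
    green_limit, red_limit -- placeholder thresholds, NOT agreed values
    """
    # The colour depends only on the quality measure, so users can
    # always see how to treat the numbers.
    if max_force <= green_limit:
        lane = "green"
    elif max_force <= red_limit:
        lane = "yellow"
    else:
        lane = "red"

    # Peer-reviewed material is accepted regardless of its colour.
    if peer_reviewed or lane == "green":
        action = "accept"
    elif lane == "yellow":
        action = "review"   # goes to a specialist
    else:
        action = "improve"  # sent back before any expert time is spent
    return lane, action
```

With these placeholder limits, triage(0.05) returns ("yellow", "review"), and triage(0.5, peer_reviewed=True) returns ("red", "accept"), i.e. the entry is kept but still flagged.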

> In this way we should eventually have a trustworthy collection of the
> state-of-the art computation results in TCOD.

Yes, and as others have pointed out, we do our job to improve the standards in the field. Very good!

Cheers,
Torbjörn

---
Torbjörn Björkman, PhD
COMP, Aalto University School of Science
Espoo, Finland

________________________________________
From: Saulius Gražulis [grazulis at ibt.lt]
Sent: 7 November 2014 17:30
To: Björkman Torbjörn; tcod at lists.crystallography.net
Subject: Re: SV: [TCOD] Questions still left regarding TCOD dictionaries

Hi, Torbjörn,

many thanks for your ideas!

On 2014-11-05 18:34, Björkman Torbjörn wrote:
> Stefaan supplies good information for everything as usual, so just a
> couple of remarks here.

Indeed :) I join the thanks!

>>> I also want to point out that the presence of the data items (for
>>> energy tables) in the dictionary does not imply any obligation to
>>> use them. Any CIF will be correct and valid without them. It's
>>> just that if we decide to include them, there will be a publicly
>>> announced way to do this.

> Here I would say that what one normally pays attention to is just the
> residual forces/energy changes at convergence,

That's the very important point for the dictionary.

> documenting the starting point is not something that is normally
> done. We should also beware a little here since the starting point
> is very often data straight out of proprietary databases... it might
> well lead to a lot of trouble.

OK. For now we skip energy traces. We can allow recording the
initial state taken from open databases like COD, but it is not
obligatory. So we skip it for now.

> There are even "hybrid" schemes, like anything based on the
> muffin-tin geometry (LAPW, LMTO, ...) which come with a local and
> PW-like part and where the enumeration of basis functions may be by
> the PWs (like LAPW) or by the local part (like LMTO). There are
> wavelet-based codes, and I have no idea how they converge. And so
> on. Unfortunately, our situation is rather messier than that of the
> quantum chemists.

That's fine. We already specify the corresponding 'method' and
'basisset' data items with references to publications describing them
and files with their data.

>>>>> Also here I think this is asking for way too much detail.
>>>>> Most codes can indeed split up the total energy in many
>>>>> contributions, but papers usually do not report that (only in
>>>>> the special cases when there is useful information in the
>>>>> splitting). If papers don't do it, databases shouldn't either
>>>>> -- that feels like a sound criterion.

> It is in fact worse than that, because the precise nature of this
> split depends on technical details such as how you solve the Poisson
> equation and how you treat core states. This varies between the
> codes, so there is not in general any possibility to make a sound
> comparison between different methods for these partial energies. So I
> don't think anything much is gained by documenting them.

OK, I agree. I skip the detailed energy items for now.

Actually, I only included them because they were in the CompChem CML
dictionary.

>>> In short, I think that it would be quite useful to have uniform
>>> *actual* convergence criteria in the CIF output, and check them
>>> before inserting computations into TCOD, like we check crystal
>>> structure symmetry, bond distances, parameter shifts or
>>> R-factors.

> Here I tend to agree,

OK :)

> but at the same time I would not want us to impose too draconian
> criteria for acceptance, as long as the obtained result is
> documented.

Regards,
Saulius

--
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Graiciuno 8
LT-02241 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2602116 / phone (office): (+370-5)-2602556
mobile: (+370-684)-49802, (+370-614)-36366