(As there seem to have been technical problems with the TCOD list, I'm not sure whether this mail from three months ago actually reached all of you or not, so I am resending it.)
-------- Original Message --------
Subject: some discussion topics related to TCOD
Date: Fri, 24 May 2013 14:34:08 +0200
From: Stefaan Cottenier Stefaan.Cottenier@UGent.be
To: tcod@lists.crystallography.net
Dear colleagues,
Today, we held a group discussion at CMM (http://molmod.ugent.be) about TCOD from the perspective of users and/or structure donators. That did not lead to clear conclusions, but rather to a series of thoughts that can be a starting point for further discussions or actions. I'll list those here (I guess that is what this mailing list is meant for):
(1) You are probably aware of other database initiatives for computed crystal structures. Is there a vision on whether TCOD wants to 'compete' with those, or whether TCOD tries to fill a niche that is not served by other databases?
For instance, consider https://www.materialsproject.org/. This is a database that aims to do a VASP geometry optimization for every structure in ICSD, starting from the ICSD cif (quite ironically, you can read their starting geometry (=ICSD info) for free, without having ICSD access yourself...). They use their own dedicated supercomputer infrastructure to run all this, and upload only results obtained by themselves. There is no input by others, in order to keep control over quality and consistency. It has over 30.000 crystals by now, and apart from the structure info, computed properties are being added as well.
I quote here a paragraph from a project proposal which we submitted earlier this year, and which contains info/references about other databases:
"As is often the case for revolutions, this idea has been realized simultaneously and more or less independently at several places, emphasizing different aspects. The Materials Project [Jai11,MP] is an initiative at MIT, where basic properties of all crystalline solids documented in the (experimental) Inorganic Crystal Structure Database (ICSD) [ICSD] have been computed by DFT. The computed formation energies are used to construct secondary data bases of binary and ternary phase diagrams [Ong08, Jai11b]. Similar initiatives, each with their own focus and in different stages of growth are AFLOWLIB [Set10,Cur12] (Duke University), OQMD [Wol12] (Northwestern University) and CompES [CES] (NIMS, Japan). Other database initiatives that emphasize collaborative data sharing (see also Sec. 2.3, WP3) are ESTEST [Yua10,Yua12] at UC Davis (US), the Computational Materials Repository (CMR) [Lan12] at DTU (Denmark) and the AIDA environment [Koz13]."
[Jai11] A. Jain et al., Computational Materials Science 50 (2011) 2295
[Jai11b] A. Jain et al., Physical Review B 84 (2011) 045115
[Ong08] S. P. Ong, L. Wang, B. Kang, G. Ceder, Chemistry of Materials 20 (2008) 1798
[Set10] W. Setyawan, S. Curtarolo, Computational Materials Science 49 (2010) 299
[Cur12] S. Curtarolo et al., Computational Materials Science 58 (2012) 227
[Wol12] Open Quantum Mechanical Database, http://wolverton.northwestern.edu/oqmd (no public access yet)
[CES] http://caldb.nims.go.jp/index_en.html (Y. Chen, A. Nogami, H. Ohtani, N. Tatara)
[Yua10] G. Yuan, F. Gygi, Computational Science & Discovery 3 (2010) 015004
[Yua12] G. Yuan, F. Gygi, Computer Physics Communications 183 (2012) 1744
[Lan12] D. Landis et al., Computing in Science and Engineering, Nov/Dec 2012, p. 51 (http://dx.doi.org/10.1109/MCSE.2012.16)
[Koz13] B. Kozinsky, N. Marzari, N. Bonini, J. Garg, G. Pizzi, A. Cepellotti, M. Fornari, contributions at the MRS Fall meeting (Boston, Nov. 2012) and the DPG Spring meeting (Regensburg, March 2013)
(2) From our perspective, we don't understand the need to separate experimental and computational databases. As long as every structure is properly tagged as 'experimental' or 'computed', we see no reason to separate them. It will only lead to the burden of having to run queries twice. In any case, a unified search web page that searches in both data bases simultaneously seems useful to us (that is, de facto, treating (T)COD as one database).
(3) Quality control. If a deposited computed structure has been published, the reference to the publication serves as quality control. That's probably similar for deposited experimental structures, isn't it? If a computed structure has not (yet) been published, how to assess its quality? Well, let us turn the question around: if a not yet published experimental structure is deposited, how do you judge its quality...? It makes sense to do it in the same way: as long as basic information is provided on how the calculation has been done, later users should make their own judgement.
(4) A quality control measure that could make sense for both experimental and computed structures is to allow people to add remarks that appear on the web page of that structure ('considering the small basis set that has been used, I do not trust this result', or 'this result has been contradicted by <ref>'). Such information from the community can help people who are not experts in either computing or measuring crystal structures to decide to what extent they can trust a particular result. (Another variant is a feature on the web page of each structure to flag it if you have doubts, such that experts can look at it).
(5) COD has cifs as entries only. If you want to go as far as including input info to reproduce each calculation, it might be hard to stick to cifs only for TCOD. It will then be necessary to link additional files to the cif of each entry.
(6) We are split over whether it makes sense to require that each deposited calculation is exactly reproducible. Some basic information should be given, of course. This will be mainly information about the method that does not depend on the specific implementation (code). Furthermore, the main technical input parameters that are specific for the code used, should be given as well (if you want to do that in a structured way, you will immediately hit the problem that the required set of numbers will be a different kind of set for every other code...). Third in line should then be a free field for 'any other special settings' that could apply. With this kind of info, people can reproduce to a large extent (95%) the same calculation. If you want to have it reproduced exactly, then much more input (files) will be required. Which might not be worth the effort.
(7) One of our enthusiastic PhD students with many years of experience in web design and databases immediately had a mental map about how to implement an automated version of (6) -- if the TCOD-team is willing to take him aboard to code that, he'll probably agree ;-).
(8) In any case, it should be clear for the one who provides the structure, what exactly is requested. A web form where the information under (6) can be filled out (or the automated tool of (7) that extracts all this info from the output of each mainstream code) will be required in order to have consistent data.
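To make the three tiers of information from (6) a bit more concrete, here is a minimal sketch in Python of what a web form or extraction tool could collect per entry. All names (CalculationRecord, its fields, the example parameter keys) are hypothetical illustrations, not an existing TCOD format.

```python
from dataclasses import dataclass, field


@dataclass
class CalculationRecord:
    """Hypothetical deposition record mirroring the three tiers in point (6)."""

    # Tier 1: method information that does not depend on the code used.
    method: str          # e.g. "DFT"
    xc_functional: str   # e.g. "PBE"
    # Tier 2: main technical input parameters, specific to the code used.
    code_name: str       # e.g. "VASP"
    code_parameters: dict = field(default_factory=dict)
    # Tier 3: free field for any other special settings.
    other_settings: str = ""

    def missing_fields(self) -> list:
        """Return the names of required text fields that were left empty."""
        required = ("method", "xc_functional", "code_name")
        return [name for name in required if not getattr(self, name).strip()]


# Example entry; the parameter names are made up for illustration only.
record = CalculationRecord(
    method="DFT",
    xc_functional="PBE",
    code_name="VASP",
    code_parameters={"plane_wave_cutoff_eV": 520, "k_mesh": [8, 8, 8]},
    other_settings="spin-polarised",
)
print(record.missing_fields())
```

The code-specific tier is kept as a free dictionary on purpose: as noted in (6), the required set of numbers differs from code to code, so a fixed schema would only fit one of them.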
(9) Which kind of computed structures will TCOD accept? Ground state structures, obviously? Metastable structures probably as well? Also if they are not dynamically stable (soft phonon mode)? (that is not routinely examined) And what about transition state structures? (never observable in experiment, yet very useful to know -- and hard to find -- in order to understand reactions) If the latter are allowed, then they should be tagged as such.
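As a rough illustration of such tagging, the four kinds of structures mentioned in (9) could be represented by a small enumeration. The labels below are hypothetical and only follow the wording of this mail; they are not values from any existing dictionary.

```python
from enum import Enum


class StructureType(Enum):
    """Hypothetical tags for the kinds of computed structures listed in (9)."""

    GROUND_STATE = "ground state"
    METASTABLE = "metastable"
    DYNAMICALLY_UNSTABLE = "dynamically unstable (soft phonon mode)"
    TRANSITION_STATE = "transition state"


def is_transition_state(kind: StructureType) -> bool:
    """Transition states are never observable in experiment, so flag them."""
    return kind is StructureType.TRANSITION_STATE


for kind in StructureType:
    print(kind.name, "->", kind.value, "| transition state:", is_transition_state(kind))
```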
Best regards, Stefaan
On 2013-09-06 19:02, Stefaan Cottenier wrote:
(9) Which kind of computed structures will TCOD accept? Ground state structures, obviously? Metastable structures probably as well? Also if they are not dynamically stable (soft phonon mode)? (that is not routinely examined) And what about transition state structures? (never observable in experiment, yet very useful to know -- and hard to find -- in order to understand reactions) If the latter are allowed, then they should be tagged as such.
I find this suggestion very good and comprehensive. I have added the '_tcode_structure_type' data name to cif_tcod.dic based on this suggestion. I would be grateful if you could verify that I did not put too much nonsense into the definitions; my understanding of the metastable/transition/soft phonon structures is so far rather limited... The dictionary is in its usual place: http://www.crystallography.net/tcod/cif/dictionaries/cif_tcod.dic (Subversion repo: svn://cod.ibt.lt/tcod/cif/dictionaries/)
One more related question:
If I understand correctly, from what I recall from the QM lectures, we can have in principle two kinds of boundary conditions for localised particle wave functions:
a) vanishing at infinity (modelling a single molecule in vacuum), and b) periodic (modelling an ideal crystal).
From what I have read in manuals of the QM codes, most implement b), and a) is approximated by putting a molecule in a large enough "unit cell" so that interactions between molecule images are negligible.
Is this a correct view? Does any code implement (a) as a separate mode of computation?
In either case, we should probably have a special tag that distinguishes "true" crystal structures from the "convenience" unit cells that are non-physical but are set up solely to solve a molecule structure problem with the same code that also deals with crystals. Any ideas how to tell from the computations which mode was used?
Regards, Saulius
Dear Saulius,
On Tue, Jul 22, 2014 at 3:02 PM, Saulius Gražulis grazulis@ibt.lt wrote:
On 2013-09-06 19:02, Stefaan Cottenier wrote:
(9) Which kind of computed structures will TCOD accept? Ground state structures, obviously? Metastable structures probably as well? Also if they are not dynamically stable (soft phonon mode)? (that is not routinely examined) And what about transition state structures? (never observable in experiment, yet very useful to know -- and hard to find -- in order to understand reactions) If the latter are allowed, then they should be tagged as such.
I find this suggestion very good and comprehensive. I have added the '_tcode_structure_type' data name to cif_tcod.dic based on this suggestion. I would be grateful if you could verify that I did not put too much nonsense into the definitions; my understanding of the metastable/transition/soft phonon structures is so far rather limited... The dictionary is in its usual place: http://www.crystallography.net/tcod/cif/dictionaries/cif_tcod.dic (Subversion repo: svn://cod.ibt.lt/tcod/cif/dictionaries/)
But would it not then become a reaction database rather than a crystallographic one? Having a phonon spectrum, even if there are imaginary modes, is probably very useful: some materials only become stable due to anharmonicity, so their phonon spectra always contain some imaginary modes. But storing transition state (TS) structures would be an enormous overhead... Moreover, if one wants to store a TS, one should also store the reactant and product states. Then all the definitions of the method by which that TS was obtained... Would one then store all the intermediate NEB images as well?
One more related question:
If I understand correctly, from what I recall from the QM lectures, we can have in principle two kinds of boundary conditions for localised particle wave functions:
a) vanishing at infinity (modelling a single molecule in vacuum), and b) periodic (modelling an ideal crystal).
That is kind of correct.
From what I have read in manuals of the QM codes, most implement b), and a) is approximated by putting a molecule in a large enough "unit cell" so that interactions between molecule images are negligible.
This schism results from the fact that there are two main approaches to the basis sets used to expand the wave functions in electronic structure codes. Plane waves (PW) are very popular in the solid-state community and, due to their inherent periodicity, are very suitable for dealing with periodic systems (solids > surfaces > infinite wires). Atomic basis sets such as Gaussian-type orbitals are very popular for molecular calculations or non-periodic systems. PW are rarely used for molecular calculations, since you need a big box so that the molecules do not interact, or you have to decouple them, e.g. by setting the potential to zero at the cell boundaries and solving that problem self-consistently.
Is this a correct view? Does any code implement (a) as a separate mode of computation?
In either case, we should probably have a special tag that distinguishes "true" crystal structures from the "convenience" unit cells that are non-physical but are set up solely to solve a molecule structure problem with the same code that also deals with crystals. Any ideas how to tell from the computations which mode was used?
We should only be interested in periodic structures. Or I would be even more strict: 3D periodic (crystals). We should not worry about molecules or clusters... or leave them to others :-) We just should not have any limitation on the supercell size of the structure, but in general it should be strictly 3D periodic for this database. That would make life much easier and give the database some shape regarding what we are trying to systematize.
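Regarding the question of how to tell a "convenience" cell from a true crystal: one possible heuristic, sketched below in Python, is to look at the shortest distance between any atom and a periodic image of any atom. In a real crystal that distance is of the order of a bond length or a van der Waals contact; for a molecule in a vacuum box it is much larger. The function names and the 8 angstrom threshold are assumptions made for this illustration, not an agreed criterion.

```python
import itertools
import math


def min_image_gap(lattice, positions):
    """Shortest distance between an atom and a periodic image of any atom.

    lattice: three lattice vectors (rows), in angstrom.
    positions: Cartesian atomic coordinates, in angstrom.
    Only the 26 neighbouring cells are searched, which is adequate for
    compact cells that are not strongly skewed.
    """
    best = math.inf
    for shift in itertools.product((-1, 0, 1), repeat=3):
        if shift == (0, 0, 0):
            continue  # skip the home cell: we only want atom-to-image distances
        t = [sum(shift[k] * lattice[k][i] for k in range(3)) for i in range(3)]
        for p in positions:
            for q in positions:
                d = math.sqrt(sum((p[i] - q[i] - t[i]) ** 2 for i in range(3)))
                best = min(best, d)
    return best


def looks_like_vacuum_box(lattice, positions, threshold=8.0):
    """Heuristic flag: True if all periodic images are far apart (assumed 8 A)."""
    return min_image_gap(lattice, positions) > threshold


# A water-like molecule in a 15 A cubic box versus bcc tungsten (a = 3.165 A).
box = [[15.0, 0.0, 0.0], [0.0, 15.0, 0.0], [0.0, 0.0, 15.0]]
water = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
a = 3.165
bcc = [[a, 0.0, 0.0], [0.0, a, 0.0], [0.0, 0.0, a]]
tungsten = [[0.0, 0.0, 0.0], [a / 2, a / 2, a / 2]]

print(looks_like_vacuum_box(box, water))
print(looks_like_vacuum_box(bcc, tungsten))
```

Such a check would only flag candidates; slabs and wires with vacuum in one or two directions would need a per-direction variant.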
Regards, Saulius
Best regards,
Linas
Dear DFT-optimizers (and others),
Since it is time for questions, I have one.
DFT-optimized structures are done in a great variety of cases, one being when the refinement quality against experimental data (powder, etc) is poor due to an unfavourable ratio P/D (parameters/Data).
However, I have almost never seen the comparison of the experimental data to the intensities calculated from the optimized atomic coordinates... I have especially in mind a "simple" example, calcium carbonate, vaterite form, for which dozens of optimized models were proposed, without even one comparison to the data...
So the question is : In case of the existence of an experimental dataset, should a comparison observed/calculated be requested (classical R values but with fixed coordinates) ?
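For concreteness, the kind of comparison I have in mind can be sketched as follows in Python: the coordinates stay fixed at the DFT-optimized values, only an overall scale factor is adjusted, and a classical R value is computed. The function name and the simple sum-ratio scale factor are assumptions for this sketch; a real refinement program would also handle profile parameters, displacement parameters, etc.

```python
def r_factor(f_obs, f_calc):
    """Classical R = sum| |Fobs| - k|Fcalc| | / sum|Fobs| with one scale factor k.

    f_obs:  observed structure-factor amplitudes.
    f_calc: amplitudes calculated from the fixed (optimized) coordinates.
    The scale k is taken as sum(Fobs) / sum(Fcalc), a simple common choice.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    scale = sum(f_obs) / sum(f_calc)
    numerator = sum(abs(o - scale * c) for o, c in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


# Made-up amplitudes, for illustration only.
print(r_factor([10.0, 20.0, 30.0], [5.0, 10.0, 15.0]))
print(r_factor([10.0, 20.0], [20.0, 10.0]))
```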
Best,
Armel
On 2014-07-28 11:23, Armel le Bail wrote:
Since it is time for questions, I have one.
DFT-optimized structures are done in a great variety of cases, one being when the refinement quality against experimental data (powder, etc) is poor due to an unfavourable ratio P/D (parameters/Data).
However, I have almost never seen the comparison of the experimental data to the intensities calculated from the optimized atomic coordinates... I have especially in mind a "simple" example, calcium carbonate, vaterite form, for which dozens of optimized models were proposed, without even one comparison to the data...
So the question is : In case of the existence of an experimental dataset, should a comparison observed/calculated be requested (classical R values but with fixed coordinates) ?
I think that a COD/TCOD comparison of the same crystal structure, refined against Fobs and optimised with advanced DFT methods, would give a good overview of the correspondence, and that's what I would be very interested in.
I regard this as a win-win-win situation:
a) if both X-ray and DFT agree, we will have greater reassurance that both are correct (within experimental and computational errors);
b) if they disagree, then either:
b') the x-ray structure has problems
or
b'') the DFT computation has problems.
In both cases we can check the data, trace down the problems and (ideally) fix them.
I am currently in Cambridge with Peter Murray-Rust, and we had very interesting conversations with people who actually do similar things, for their own research. The results are very intriguing :). If we could get these people on board we could have valuable contributions, I hope.
Regards, Saulius
Dear Armel,
However, I have almost never seen the comparison of the experimental data to the intensities calculated from the optimized atomic coordinates... I have especially in mind a "simple" example, calcium carbonate, vaterite form, for which dozens of optimized models were proposed, without even one comparison to the data...
I don't think this is really rare. But often there is simply no need to do it. One case in which I was somewhat involved makes such an explicit comparison in figs. 3-4-5: http://dx.doi.org/10.1103/PhysRevB.84.054110 . Another example that comes to my mind is a structure prediction problem by Oganov (fig. 2 in http://dx.doi.org/10.1016/j.epsl.2005.10.014). And I remember a conference presentation by Chris Wolverton where he does DFT structure prediction not by optimizing total energy but by minimizing the difference between the experimental and DFT-based powder spectrum.
So the question is : In case of the existence of an experimental dataset, should a comparison observed/calculated be requested (classical R values but with fixed coordinates) ?
According to me: no. Let me slightly idealize, and assume that for every entry in TCOD the numerics are done right. In that case, all entries would contain the exact/unique/precise prediction for a particular crystal *at the given level of theory*. Everyone working at the same level of theory (e.g. using the same XC-functional within DFT) would make essentially the same prediction. That may or may not agree with the experimental structure, either because the experiment is 'wrong' (e.g. a small contamination that triggers a different crystal structure than for the pure material that has been calculated) or because the level of theory is insufficient (there is no XC-functional known that is 100% correct for all classes of materials).
My point is: the task of comparing experiment and theory (i.e. comparing COD and TCOD) is a research task in its own right. It should be performed by the users of these databases. For TCOD as a computational database, comparison to the alien world of experiments is not the thing to do. It would be similar to allowing into COD only those structures that have survived a DFT check ;-)
Best, Stefaan
Hi,
My point is: the task of comparing experiment and theory (i.e. comparing COD and TCOD) is a research task in its own right.
To my knowledge, theoreticians are aware of the low quality of the cell parameters derived from their DFT optimizations, to the point that they prefer to fix them to the experimental values when available. There is here at least a point of convergence between COD and TCOD - and a serious problem with the current theoretical approach ;-).
Best,
Armel
My point is: the task of comparing experiment and theory (i.e. comparing COD and TCOD) is a research task in its own right.
To my knowledge, theoreticians are aware of the low quality of the cell parameters derived from their DFT optimizations, to the point that they prefer to fix them to the experimental values when available.
Very interesting to read this statement. It shows how useful it is that we discuss. In my experience, it is the other way around: theoretical lattice parameters are so good that there is often no need to compute them if you already know the experimental ones (why spend computer time to find the same numbers that you already knew from experiment?).
Sure, there are exceptions. And sure, it also depends on the level of accuracy you need: if the observable property you are interested in crucially depends on the 3rd digit of the lattice parameter, then the 'low quality' theoretical lattice parameters can be detrimental. But for the majority of cases, it simply doesn't matter whether you take experimental or theoretical lattice parameters.
In http://dx.doi.org/10.1080/10408436.2013.772503 , we tried to quantify the disagreement between experimental and theoretical cell volume for the PBE XC-functional. The conclusion is (Tab. 10) that PBE overestimates the cell volume by 3.8%, and that after correcting for this overestimation (if you wish to do so) there is a residual scatter (error bar) of 1.1 Å³/atom. To give an example (see also last two lines of Tab. 5): this means that PBE predicts the lattice parameter of bcc-W with an uncertainty of 0.07 Å. That would be a huge error bar for a modern diffractometer, I agree. From that point of view you are right to call this 'low quality'. But for many purposes that level of uncertainty is just fine.
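A quick back-of-the-envelope check of how these numbers hang together, as a Python sketch. It assumes a cubic bcc cell with 2 atoms and V = a³, takes a = 3.165 Å as the experimental lattice parameter of tungsten, and uses first-order error propagation; this is one way to relate the numbers, not necessarily the procedure followed in the paper.

```python
def lattice_uncertainty(scatter_per_atom, atoms_per_cell, a):
    """First-order propagation for a cubic cell: V = a**3, so dV = 3 a**2 da."""
    delta_v = scatter_per_atom * atoms_per_cell  # volume scatter per cell, A^3
    return delta_v / (3.0 * a ** 2)


def linear_overestimate(volume_overestimate):
    """Relative overestimate of a length that follows from a volume overestimate."""
    return (1.0 + volume_overestimate) ** (1.0 / 3.0) - 1.0


# Assumed inputs: 1.1 A^3/atom scatter, bcc cell with 2 atoms, a(W) = 3.165 A.
delta_a = lattice_uncertainty(scatter_per_atom=1.1, atoms_per_cell=2, a=3.165)
print(f"uncertainty on a for bcc-W: {delta_a:.3f} A")

# A 3.8% volume overestimate corresponds to roughly 1.25% in lattice parameter.
print(f"linear overestimate: {100 * linear_overestimate(0.038):.2f} %")
```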
There is here at least a point of convergence between COD and TCOD - and a serious problem with the current theoretical approach ;-).
There is another point of convergence that is imho more important: the crystal structure itself. >99% of all current DFT calculations start from complete or partial knowledge of the crystal symmetry. Doing a fully unbiased structure prediction by DFT, from the starting point of only the chemical composition, is a huge (albeit not impossible) task. We've done only one example of this so far (http://dx.doi.org/10.1039/c3ce41009a -- I love this one). It takes a lot of resources, though, and that's why at present everybody happily accepts the experimental symmetry as long as there are no indications it could be wrong.
This triggered another thought: isn't it level-0 information whether only the positions were optimized or whether the full cell shape was optimized as well? If we see only the cif without this information, then there is not much that can be concluded about such an entry. And as a corollary: doesn't that imply that there is no place in TCOD for DFT calculations that start from the experimental cif without any subsequent optimization? What do others think?
Stefaan
Hello,
On 2014-07-30 01:02, Armel le Bail wrote:
My point is: the task of comparing experiment and theory (i.e. comparing COD and TCOD) is a research task in its own right.
To my knowledge, theoreticians are aware of the low quality of the cell parameters derived from their DFT optimizations, to the point that they prefer to fix them to the experimental values when available. There is here at least a point of convergence between COD and TCOD - and a serious problem with the current theoretical approach ;-).
I would take a philosophical view on this subject :). In science, we are always constrained by the limitations of our current methods, and this is true both for XRD experiments and for DFT computations. As long as these limitations are known and recorded, we can make useful predictions from the experimental and theoretical results. This is what COD and TCOD are aiming at, aren't they?
There are cases when the limitations of the methods are not exactly known, or are known only approximately. I hope that in these cases, comparing TCOD and COD would bring useful insights and help to determine those limits; and maybe to improve methods as well, both in experiment and in theoretical approaches.
I would expect that, in a "reliable prediction" limit, theoretical optimisations and experimental results must agree within error bars. If they do not, we need to investigate why. I agree that this will be a research task in its own right; both COD and TCOD will record self-consistent, state-of-the-art results generated by both approaches (but we will want to exclude sub-standard files, and that's why quality criteria are important).
If DFT computations need experimental cells for increased accuracy -- fine, COD is free and open to provide them! Moreover, in some cases COD *only* has cell constants and composition, since the experimental coordinates are behind the pay-wall of proprietary journals and databases. In these cases a DFT computation in TCOD might be the only free data set with atomic coordinates of that particular compound! If we could do this reliably on a large scale, TCOD would have enormous value as a free resource, liberating crystal structures that many people otherwise have no access to, or which we may not use as we would wish due to license limitations.
Sincerely yours, Saulius
If DFT computations need experimental cells for increased accuracy -- fine, COD is free and open to provide them! Moreover, in some cases COD *only* has cell constants and composition, since the experimental coordinates are behind the pay-wall of proprietary journals and databases. In these cases a DFT computation in TCOD might be the only free data set with atomic coordinates of that particular compound! If we could do this reliably on a large scale, TCOD would have enormous value as a free resource, liberating crystal structures that many people otherwise have no access to, or which we may not use as we would wish due to license limitations.
It might be useful to know that others have already done exactly this. www.materialsproject.org is a free database of VASP calculations for essentially all relevant ICSD entries. They list not only the unit cell info after optimization, but also the starting point... which is exactly what you find in ICSD. Free to read for anyone. (I once asked them whether they had asked ICSD for permission to do this. The answer was: no, we just did it, and so far nobody complained.)
Stefaan
Hi,
On 2014-07-30 12:21, Stefaan Cottenier wrote:
It might be useful to know that others have already done exactly this. www.materialsproject.org is a free database of VASP calculations for essentially all relevant ICSD entries. They list not only the unit cell info after optimization, but also the starting point... which is exactly what you find in ICSD. Free to read for anyone. (I once asked them whether they had asked ICSD for permission to do this. The answer was: no, we just did it, and so far nobody complained.)
Just looked into the www.materialsproject.org web site. This might be a useful project, but nowhere close to being "free" (as in "freedom"):
a) Terms and conditions: "You agree not to use automated scripts to collect large fractions of the database and disseminate them." (https://www.materialsproject.org/terms).
b) Even to look at it, you need to log in (no data is available otherwise -- or at least I could not find it). When I tried to register using my gmail account, Google reported: "Materialsproject.rpxnow.com would like to: View the email address associated with your account; - View your basic profile info". Why would the MP want to dig into my profile just to let me view a sample of the MP data?! I wonder whether MP is making a database of materials properties, or a database for direct marketing? Needless to say, I declined.
The declared MP terms and conditions fall short of any open data definition (e.g. as in http://opendefinition.org/), IMHO. That's probably why ICSD did not complain -- the Materials project actually exposes nothing on the Web.
This is not to say that they are doing anything bad or not useful under special circumstances, but we may not/can not reuse the MP data for TCOD, so it is of little use for us. *COD databases are completely different in this respect -- they are indeed free.
Regards, Saulius
This is not to say that they are doing anything bad or not useful under special circumstances, but we may not/can not reuse the MP data for TCOD, so it is of little use for us.
Hello Saulius,
I was not suggesting we could possibly reuse the MP data, I only wanted to warn that we are not the only/first ones who ponder about translating for-pay crystallographic info into the not-for-pay area.
I know your sensitivities ;-), but personally I'm not too concerned by these privacy issues. Money can be a barrier to progress, so I like scientific information to be accessible without paying. But if a supplier of scientific information asks me for some information in return instead of money, I don't care. Even if it would trigger direct marketing (I wish them good luck when trying to direct-market me -- I delete every morning >50% of the new emails without reading them at all).
Of course, I prefer to contribute to projects that are maximally open. But I know that in the world in which we live many scientists face constraints that prevent them from making their data as open as you wish. And I will not deliberately keep myself unaware of their knowledge just because I want to protect my google profile ;-)
And now I shut up about this issue, as it is going to be off-topic as far as the TCOD mailing list is concerned.
Best, Stefaan
Hi,
On 2014-07-25 06:51, Linas Vilciauskas wrote:
But would it not then become a reaction database rather than a crystallographic one? Having a phonon spectrum, even if there are imaginary modes, is probably very useful: some materials only become stable due to anharmonicity,
You are right. Actually, we have discussed the issue with Peter Murray-Rust, and he has dispelled some of my misunderstandings about the breadth of the quantum chemistry uses.
It seems that it makes sense that we focus on the QM optimisations (most will be DFT I suppose) of *crystalline* materials. After all, TCOD is a crystallography database, isn't it? :)
As a person affiliated with molecular biology and drug design, I would of course also be interested in comparing crystal structures with gas-phase (or I would rather say in vacuo) QM optimised structures; but these, as Peter has told me, are much more diverse and thus more difficult to manage in a coherent database. So we can postpone their addition; in any case, we'll have to flag them carefully and probably keep them separate (different ID range or namespace?).
These are just ideas, I regard this as the idea generation stage where the community expectations and our capabilities will become clear so that we can put them down as a set of principles to guide us. Please feel free to contribute and contradict :)
Regards, Saulius