(As there seem to have been technical problems with the TCOD list, I'm not
sure whether this mail from three months ago actually reached all of you.
I am simply resending it.)
-------- Original Message --------
Subject: some discussion topics related to TCOD
Date: Fri, 24 May 2013 14:34:08 +0200
From: Stefaan Cottenier <Stefaan.Cottenier(a)UGent.be>
To: tcod(a)lists.crystallography.net
Dear colleagues,
Today, we held a group discussion at CMM (http://molmod.ugent.be) about
TCOD from the perspective of users and/or structure donators. That did
not lead to clear conclusions, but rather to a series of thoughts that
can be a starting point for further discussions or actions. I'll list
those here (I guess that is what this mailing list is meant for):
(1) You are probably aware of other database initiatives for computed
crystal structures. Is there a vision on whether TCOD wants to 'compete'
with those, or whether TCOD tries to fill a niche that is not served by
other databases?
For instance, consider https://www.materialsproject.org/. This is a
database that aims at doing a VASP geometry optimization for every
structure in ICSD, starting from the ICSD cif (quite ironically, you can
read their starting geometry (=ICSD info) for free, without having an
ICSD yourself...). They use their own dedicated supercomputer
infrastructure to run all this, and upload only results achieved by
themselves. No input by others, this in order to keep control over
quality and consistency. It has over 30,000 crystals by now, and apart
from the structure info, computed properties are being added as well.
I quote here a paragraph from a project proposal which we submitted
earlier this year, and which contains info/references about other databases:
"As is often the case for revolutions, this idea has been realized
simultaneously and more or less independently at several places,
emphasizing different aspects. The Materials Project [Jai11,MP] is an
initiative at MIT, where basic properties of all crystalline solids
documented in the (experimental) Inorganic Crystal Structure Database
(ICSD) [ICSD] have been computed by DFT. The computed formation energies
are used to construct secondary data bases of binary and ternary phase
diagrams [Ong08, Jai11b]. Similar initiatives, each with their own focus
and in different stages of growth are AFLOWLIB [Set10,Cur12] (Duke
University), OQMD [Wol12] (Northwestern University) and CompES [CES]
(NIMS, Japan). Other database initiatives that emphasize collaborative
data sharing (see also Sec. 2.3, WP3) are ESTEST [Yua10,Yua12] at UC
Davis (US), the Computational Materials Repository (CMR) [Lan12] at DTU
(Denmark) and the AIDA environment [Koz13]."
[Jai11] A. Jain et al., Computational Materials Science 50 (2011) 2295.
[Jai11b] A. Jain et al., Physical Review B 84 (2011) 045115.
[Ong08] S. P. Ong, L. Wang, B. Kang, G. Ceder, Chemistry of Materials 20
(2008) 1798.
[Set10] W. Setyawan, S. Curtarolo, Computational Materials Science 49
(2010) 299.
[Cur12] S. Curtarolo et al., Computational Materials Science 58 (2012) 227.
[Wol12] Open Quantum Mechanical Database,
http://wolverton.northwestern.edu/oqmd (no public access yet).
[CES] http://caldb.nims.go.jp/index_en.html (Y. Chen, A. Nogami, H.
Ohtani, N. Tatara).
[Yua10] G. Yuan, F. Gygi, Computational Science & Discovery 3 (2010) 015004.
[Yua12] G. Yuan, F. Gygi, Computer Physics Communications 183 (2012) 1744.
[Lan12] D. Landis et al., Computing in Science and Engineering, Nov/Dec
2012, p. 51 (http://dx.doi.org/10.1109/MCSE.2012.16).
[Koz13] B. Kozinsky, N. Marzari, N. Bonini, J. Garg, G. Pizzi, A.
Cepellotti, M. Fornari, contributions at the MRS Fall meeting (Boston,
Nov. 2012) and the DPG Spring meeting (Regensburg, March 2013).
(2) From our perspective, we don't understand the need to separate
experimental and computational databases. As long as every structure is
properly tagged as 'experimental' or 'computed', we see no reason to
separate them. It will only lead to the burden of having to run queries
twice. In any case, a unified search web page that searches in both data
bases simultaneously seems useful to us (that is, de facto, treating
(T)COD as one database).
(3) Quality control. If a deposited computed structure has been
published, the reference to the publication serves as quality control.
That's probably similar for deposited experimental structures, isn't it?
If a computed structure has not (yet) been published, how to assess its
quality? Well, let us turn the question around: if a not yet published
experimental structure is deposited, how do you judge its quality...? It
makes sense to do it in the same way: as long as basic information is
provided on how the calculation has been done, later users should make
their own judgement.
(4) A quality control measure that could make sense for both
experimental and computed structures, is to allow people to add remarks
that appear on the web page of that structure ('considering the small
basis set that has been used, I do not trust this result', or 'this
result has been contradicted by <ref>'). Such information by the
community can help people who are not experts in either computing or
measuring crystal structures to decide to which extent they can trust a
particular result. (Another variant is a feature on the web page of each
structure to flag it if you have doubts, such that experts can look at it).
(5) COD has cifs as entries only. If you want to go as far as including
input info to reproduce each calculation, it might be hard to stick to
cifs only for TCOD. It will then be necessary to link additional files
to the cif of each entry.
(6) We are split over whether it makes sense to require that each
deposited calculation is exactly reproducible. Some basic information
should be given, of course. This will be mainly information about the
method that does not depend on the specific implementation (code).
Furthermore, the main technical input parameters that are specific for
the code used, should be given as well (if you want to do that in a
structured way, you will immediately hit the problem that the required
set of numbers will be a different kind of set for every other code...).
Third in line should then be a free field for 'any other special
settings' that could apply. With this kind of info, people can reproduce
to a large extent (95%) the same calculation. If you want to have it
reproduced exactly, then much more input (files) will be required. Which
might not be worth the effort.
(7) One of our enthusiastic PhD students with many years of experience
in web design and databases immediately had a mental map about how to
implement an automated version of (6) -- if the TCOD-team is willing to
take him aboard to code that, he'll probably agree ;-).
(8) In any case, it should be clear for the one who provides the
structure, what exactly is requested. A web form where the information
under (6) can be filled out (or the automated tool of (7) that extracts
all this info from the output of each mainstream code) will be required
in order to have consistent data.
(9) Which kind of computed structures will TCOD accept? Ground state
structures, obviously? Metastable structures probably as well? Also if
they are not dynamically stable (soft phonon mode)? (that is not
routinely examined) And what about transition state structures? (never
observable in experiment, yet very useful to know -- and hard to find --
in order to understand reactions) If the latter are allowed, then they
should be tagged as such.
Best regards,
Stefaan
--
Stefaan Cottenier
Center for Molecular Modeling (CMM) &
Department of Materials Science and Engineering (DMSE)
Ghent University
Technologiepark 903
BE-9052 Zwijnaarde
Belgium
http://molmod.ugent.be
http://www.ugent.be/ea/dmse/en
email: Stefaan . Cottenier /at/ UGent . be
Dear Stefaan,
dear colleagues,
On 2013-09-06 19:02, Stefaan Cottenier wrote:
> (as it seems there have been technical problems about TCOD, I'm not
> sure whether this 3-months-ago mail actually reached all or you or
> not. I just resend it.)
Many thanks for reposting your e-mail. The TCOD mailing list, indeed,
had a configuration error, but now I hope I have fixed it and our
discussions can continue unhindered.
> -------- Original Message --------
> Subject: some discussion topics related to TCOD
> Date: Fri, 24 May 2013 14:34:08 +0200
> From: Stefaan Cottenier <Stefaan.Cottenier(a)UGent.be>
> To: tcod(a)lists.crystallography.net
>
>
> Dear colleagues,
>
> Today, we held a group discussion at CMM (http://molmod.ugent.be)
> about TCOD from the perspective of users and/or structure donators.
> That did not lead to clear conclusions, but rather to a series of
> thoughts that can be a starting point for further discussions or
> actions. I'll list those here (I guess that is what this mailing list
> is meant for):
Thanks a lot, your input is very important, and we definitely need to
take into account everyone's needs and opinions to make TCOD useful.
> 1) You are probably aware of other database initiatives for computed
> crystal structures. Is there a vision on whether TCOD wants to
> 'compete' with those, or whether TCOD tries to fill a niche that is
> not served by other databases?
>
> For instance, consider https://www.materialsproject.org/. This is a
> database that aims doing a VASP geometry optimization for every
> structure in ICSD, starting from the ICSD cif (quite ironically, you
> can read their starting geometry (=ICSD info) for free, without
> having an ICSD yourself...). They use their own dedicated
> supercomputer infrastructure to run all this, and upload only results
> achieved by themselves. No input by others, this in order to keep
> control over quality and consistency. It has over 30.000 crystals by
> now, and apart from the structure info also computed properties are
> being added.
Sure enough, there are already several projects besides TCOD that
implement collections of theoretical structures in one way or another. I
have read a bit about https://www.materialsproject.org/ and I am
still familiarising myself with the others you mentioned in your
2013-08-20 01:44 e-mail (I would like to encourage you to repost it to
the TCOD list if you do not object to making it open).
The main difference is that, unlike the Materials Project group,
who as you say "upload only results achieved by themselves", TCOD is
(*already* :) open to a wide range of contributors, including the
Materials Project if they wish or agree to share their data on an Open
Data basis. This may be both TCOD's weakness and its strength. As you say,
"No input by others, this in order to keep control over quality and
consistency" -- true. But on the other hand, the scientific community is
broad and getting broader, there are more programs than VASP (even
if VASP were considered the best), and it would be interesting to
compare various calculations with each other and with experimental
results -- a COD/TCOD/PCOD bundle would make this easy to accomplish if
done properly.
> I quote here a paragraph from a project proposal which we submitted
> earlier this year, and which contains info/references about other
> databases:
>
> "As is often the case for revolutions, this idea has been realized
> simultaneously and more or less independently at several places,
> emphasizing different aspects. The Materials Project [Jai11,MP] is
> an initiative at MIT, where basic properties of all crystalline
> solids documented in the (experimental) Inorganic Crystal Structure
> Database (ICSD) [ICSD] have been computed by DFT. The computed
> formation energies are used to construct secondary data bases of
> binary and ternary phase diagrams [Ong08, Jai11b]. Similar
> initiatives, each with their own focus and in different stages of
> growth are AFLOWLIB [Set10,Cur12] (Duke University), OQMD [Wol12]
> (Northwestern University) and CompES [CES] (NIMS, Japan). Other
> database initiatives that emphasize collaborative data sharing (see
> also Sec. 2.3, WP3) are ESTEST [Yua10,Yua12] at UC Davis (US), the
> Computational Materials Repository (CMR) [Lan12] at DTU (Denmark) and
> the AIDA environment [Koz13]."
>
> [Jai11] A. Jain et al., Computational Materials Science 50 (2011) 2295.
> [Jai11b] A. Jain et al., Physical Review B 84 (2011) 045115.
> [Ong08] S. P. Ong, L. Wang, B. Kang, G. Ceder, Chemistry of Materials
> 20 (2008) 1798.
> [Set10] W. Setyawan, S. Curtarolo, Computational Materials Science 49
> (2010) 299.
> [Cur12] S. Curtarolo et al., Computational Materials Science 58 (2012)
> 227.
> [Wol12] Open Quantum Mechanical Database,
> http://wolverton.northwestern.edu/oqmd (no public access yet).
> [CES] http://caldb.nims.go.jp/index_en.html (Y. Chen, A. Nogami, H.
> Ohtani, N. Tatara).
> [Yua10] G. Yuan, F. Gygi, Computational Science & Discovery 3 (2010)
> 015004.
> [Yua12] G. Yuan, F. Gygi, Computer Physics Communications 183 (2012)
> 1744.
> [Lan12] D. Landis et al., Computing in Science and Engineering, Nov/Dec
> 2012, p. 51 (http://dx.doi.org/10.1109/MCSE.2012.16).
> [Koz13] B. Kozinsky, N. Marzari, N. Bonini, J. Garg, G. Pizzi, A.
> Cepellotti, M. Fornari, contributions at the MRS Fall meeting (Boston,
> Nov. 2012) and the DPG Spring meeting (Regensburg, March 2013).
You are right, the idea of having an open theoretical structure database
is in the air. But is there a database that is:
a) open (as in Open Data), permitting unrestricted sharing, reuse and
republication of data if cited properly;
b) accepting contributions from all relevant sources;
c) up and running right now?
If yes, then TCOD is probably not needed. Indeed, if the Computational
Materials Repository accepts depositions of computed structures, then
it's probably the one that does the job. If, however, it does not -- we
will give it a try, why not? The need is there, and TCOD was basically a
reaction to requests from the community to deposit computed crystal
structures, which COD currently either rejects or checks poorly.
> (2) From our perspective, we don't understand the need to separate
> experimental and computational databases. As long as every structure
> is properly tagged as 'experimental' or 'computed', we see no reason
> to separate them. It will only lead to the burden of having to run
> queries twice. In any case, a unified search web page that searches
> in both data bases simultaneously seems useful to us (that is, de
> facto, treating (T)COD as one database).
These concerns will be addressed. The division into COD and TCOD is purely
technical; searches will be possible over the union of COD, TCOD and
PCOD.
The reason for the separation is the different data requirements for
experimental and computed structures -- COD needs criteria that show how
well the model matches the observed data (R-factors, CCs and, ultimately,
the Fobs data itself); TCOD needs computational parameters that indicate
convergence of the process, etc.
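A minimal sketch of how one search could run over both collections while every hit keeps its 'experimental' or 'computed' tag and its own kind of quality data. The entry identifiers, field names and values below are made up for illustration; this is not the actual COD/TCOD schema or search code.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    entry_id: str   # hypothetical identifier, not a real COD/TCOD number
    formula: str
    origin: str     # "experimental" or "computed"
    quality: dict = field(default_factory=dict)  # origin-specific criteria


# Two separate collections with different quality fields (illustrative data).
COD = [Entry("cod-0001", "NaCl", "experimental", {"R_factor": 0.031})]
TCOD = [
    Entry("tcod-0001", "NaCl", "computed", {"energy_converged": True}),
    Entry("tcod-0002", "MgO", "computed", {"energy_converged": True}),
]


def search(formula, *collections):
    """Return matching entries from all given collections, in order."""
    return [e for coll in collections for e in coll if e.formula == formula]


hits = search("NaCl", COD, TCOD)
for e in hits:
    print(e.entry_id, e.origin, e.quality)
```

The collections stay physically separate, but the user runs the query once and can still filter on the origin tag afterwards.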
> (3) Quality control. If a deposited computed structure has been
> published, the reference to the publication serves as quality
> control.
Absolutely. The COD/TCOD framework does store full provenance
information in the records, and we treat published structures as
peer-reviewed.
Still, "peer-reviewed" does not always mean "correct" or "accurate";
especially when it comes to data...
> That's probably similar for deposited experimental structures, isn't
> it?
Yes it is.
> If a computed structure has not (yet) been published, how to assess
> its quality? Well, let us turn the question around: if a not yet
> published experimental structure is deposited, how do you judge its
> quality...?
For experimental structure, what counts is its correspondence to the
observed data (Iobs/Fobs) and to our background knowledge (reasonable
chemistry). The correspondence to data can be expressed in numeric terms
in many cases -- Rcryst, CC, GoF are parameters that are routinely cited
in experimental CIFs to show the size of discrepancies between the model
and the experiment. No single parameter is perfect, for sure, and most
have been criticized, but they do give a rough (and, most importantly,
machine-processable) picture of the model-observation fit.
As for "reasonable chemistry", the criteria are less formalised, but an
experienced chemist can detect a lot of suspicious cases by looking at
the structure, and we have now compiled comprehensive statistics on COD
(the master's thesis work of Andrius Merkys) so that we can now detect
"low probability" structures automatically.
> It makes sense to do it in the same way: as long as basic information
> is provided on how the calculation has been done, later users should
> make their own judgement.
Absolutely. But the numeric quality criteria (QC) are needed so that we
can monitor the incoming structures automatically -- no-one will be
willing or able to sift manually through the million+ structures, right?
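As a rough sketch of such automatic monitoring, each incoming structure could be checked against numeric limits, with different criteria for experimental and computed entries. The criterion names and limit values below are placeholders chosen for illustration, not agreed COD or TCOD thresholds.

```python
def failed_checks(values, limits):
    """Return the names of criteria that are missing or exceed their limit."""
    failed = []
    for name, limit in limits.items():
        value = values.get(name)
        if value is None or value > limit:
            failed.append(name)
    return failed


# Hypothetical limits; real ones would have to be agreed by the community.
EXPERIMENTAL_LIMITS = {"R_factor": 0.10}
COMPUTED_LIMITS = {"max_residual_force": 0.01}

print(failed_checks({"R_factor": 0.04}, EXPERIMENTAL_LIMITS))
print(failed_checks({"max_residual_force": 0.05}, COMPUTED_LIMITS))
print(failed_checks({}, COMPUTED_LIMITS))
```

A missing criterion is treated as a failure here, so that depositions without any quality information are also picked up for inspection.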
> (4) A quality control measure that could make sense for both
> experimental and computed structures, is to allow people to add
> remarks that appear on the web page of that structure ('considering
> the small basis set that has been used, I do not trust this result',
> or 'this result has been contradicted by <ref>').
You hit the nerve! That's what we are doing right now for COD, and TCOD
will immediately benefit from the system as well.
I would like to point out, however, that community consensus does not
necessarily mean a correct judgement, just a widely accepted one. We
all "knew" for a couple of hundred years that crystals do not have
fivefold symmetry axes -- and yet Shechtman discovered quasicrystals,
now well confirmed (right?).
Another problem with community review is that we simply do not have time
to look through each and every file carefully enough. So computers
should help us; my strategy would be to flag structures automatically as
being "usual" or "unusual", and then ask people whether they find the
provided evidence "convincing" or "unconvincing", ideally in the way you
have proposed, 'this result has been contradicted by <ref>', or
otherwise with sound argumentation.
The "unusual" and "unconvincing" structures will probably need to be
reinvestigated; the "unusual" but "convincing" ones might be real gems!
I hope this strategy would be applicable both for COD and TCOD.
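The two-step strategy above can be sketched as a small triage function: an automatic "unusual" flag combined with community votes. How votes are counted, how ties are broken and what happens to "usual" structures are assumptions made for this sketch; the discussion only fixes the two "unusual" outcomes.

```python
from collections import Counter


def triage(unusual, votes):
    """Combine an automatic flag with "convincing"/"unconvincing" votes."""
    if not unusual:
        return "no action"          # assumption: usual structures pass
    if not votes:
        return "awaiting review"    # assumption: nobody has judged it yet
    counts = Counter(votes)
    if counts["convincing"] > counts["unconvincing"]:
        return "possible gem"       # unusual but convincing
    return "reinvestigate"          # unusual and unconvincing (ties included)


print(triage(True, ["convincing", "convincing", "unconvincing"]))
print(triage(True, ["unconvincing"]))
print(triage(False, []))
```

A simple majority is only one possible rule; votes backed by a reference could, for instance, be weighted more heavily.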
> Such information by the community can help people who are not
> experts in either computing or measuring crystal structures to decide
> to which extent they can trust a particular result.
Sure. Let's present it in the *COD search results? Group structures by
"reliability index"?
> (Another variant is a feature on the web page of each structure to
> flag it if you have doubts, such that experts can look at it).
Will be done automatically in the near future.
> (5) COD has cifs as entries only. If you want to go as far as
> including input info to reproduce each calculation, it might be hard
> to stick cifs-only for TCOD. It will then be necessary to link
> additional files to the cif of each entry.
Adding additional files is possible; the upload site needs to be fixed
to accept and store arbitrary archives, but this is easy to do.
For TCOD, I envisage the possibility of storing files in arbitrary
formats. A good strategy, IMHO, is outlined in Tim Berners-Lee's Open
Data definition (I learned about it from Peter Murray-Rust's talk,
http://www.iucr.org/__data/assets/pdf_file/0018/80280/murrayrust3.pdf).
We can start by accepting additional files that outline the computation
process, and later formalise data presentations and formats.
Of course, the better defined the formats are, the more useful these
auxiliary files will be. It would be nice if the community agreed on
comprehensive and open computational data formats, and also deposited
descriptions of their computational workflow, not just the final
results. Whether this goal is attainable remains to be seen...
> (6) We are split over whether it makes sense to require that each
> deposited calculation is exactly reproducible.
So what if we say that at the moment this is "nice to have", but not a
"must have" option?
> Some basic information should be given, of course.
Agreed. And it is vital that the community agrees on what information is
necessary, and that we formalise it in some ontology. The CIF dictionaries
that you have started are a good option for this; other ways are also
possible.
> This will be mainly information about the method that does not depend
> on the specific implementation (code). Furthermore, the main
> technical input parameters that are specific for the code used,
> should be given as well (if you want to do that in a structured way,
> you will immediately hit the problem that the required set of
> numbers will be a different kind of set for every other code...).
IUCr does a similar thing for crystal structure refinement data: they
just include the input script of the refinement program verbatim, in the
data item "_iucr_refine_instructions_details"; viz.:
http://www.crystallography.net/cod/2234930.cif
Maybe such an approach would be viable for DFT computations? Say, in a
_dft_computations_details data item?
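A minimal sketch of what storing an input script verbatim could look like, using a CIF semicolon-delimited text field. The tag _dft_computations_details is only the name floated above, not part of any existing dictionary, and the three input lines are invented and do not follow any real code's input syntax.

```python
def cif_text_field(tag, text):
    """Format text as a CIF semicolon-delimited multi-line value for tag."""
    lines = text.strip("\n").split("\n")
    for line in lines:
        # A ';' in the first column would terminate the text field early.
        if line.startswith(";"):
            raise ValueError("line starting with ';' cannot be stored verbatim")
    return "\n".join([tag, ";", *lines, ";"]) + "\n"


# Invented input script, for illustration only.
example_input = "functional = PBE\nkpoints = 8 8 8\ncutoff_eV = 500"
example = cif_text_field("_dft_computations_details", example_input)
print(example)
```

The script is stored as opaque text, so no per-code dictionary is needed, at the price that the content is not machine-searchable by parameter.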
> Third in line should then be a free field for 'any other special
> settings' that could apply. With this kind of info, people can
> reproduce to a large extent (95%) the same calculation. If you want
> to have it reproduced exactly, then much more input (files) will be
> required. Which might not be worth the effort.
Could be. Actually, storing just the information you have outlined,
and not the complete input files, would make our database maintainers'
lives easier as well.
Would a program, something like Torbjörn's 'cif2cell', be able to
reconstruct at least approximate input scripts to repeat the
computation? If yes, this would probably be what we really need.
> (7) One of our enthusiastic PhD students with many years of
> experience in web design and databases immediately had a mental map
> about how to implement an automated version of (6) -- if the
> TCOD-team is willing to take him aboard to code that, he'll probably
> agree ;-).
Sure! Please invite him/her! Cordially welcome!
> (8) In any case, it should be clear for the one who provides the
> structure, what exactly is requested.
A part of this would be CIF dictionaries and other ontologies,
supplemented by deposition check software, supplemented by *good*
documentation (hopefully ;).
> A web form where the information under (6) can be filled out (or the
> automated tool of (7) that extracts all this info from the output of
> each mainstream code) will be required in order to have consistent
> data.
Good idea!
> (9) Which kind of computed structures will TCOD accept? Ground state
> structures, obviously?
Sure!
> Metastable structures probably as well?
Why not, if marked or otherwise detectable as such.
> Also if they are not dynamically stable (soft phonon mode)? (that is
> not routinely examined)
Sure, very interesting to see such beasts. Again, marked properly.
> And what about transition state structures? (never observable in
> experiment, yet very useful to know -- and hard to find -- in order
> to understand reactions)
Absolutely, this is what many chemists and biologists (including
structural biologists) are interested in. I guess most structural
enzymologists would love to see what a transition state of the
substrate-to-product reaction of their beloved enzyme is, wouldn't they?
> If the latter are allowed, then they should be tagged as such.
For sure. Actually, your list is a good addition for the dictionary --
each of your suggestions might become an enumeration value in a data
item describing the struct
Thanks for your input!
Will put together your and others' ideas as suggestions for action soon
on this mailing list.
Regards,
Saulius
--
Dr. Saulius Gražulis
Vilnius University Institute of Biotechnology, Graiciuno 8
LT-02241 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2602116 / phone (office): (+370-5)-2602556
mobile: (+370-684)-49802, (+370-614)-36366
Dear Stefaan,
Indeed, I never received this message before!
I will give only a short answer, and a more philosophical one, because all
these structure simulations are rather new to me.
As with COD, TCOD should remain open to everybody, i.e. uploaded data are
free and downloadable even by owners of fee-based databases and by
industry. In addition, anybody should be able to add new structures, and
it should not be restricted to one given type of calculation, code or
whatever. I feel that for now this is the main difference between TCOD
and the Materials Project (I haven't looked at the others you mentioned).
To my experimentalist mind, getting new structure sets to help
identify newly synthesized structures (Search-Match) would be a great plus.
We already discussed in the COD AB whether the search should
be carried out separately on the different databases or not, and if I
remember correctly, we concluded that a single interface that allows one
to choose between them would be better. It is still to be built.
Which tags to place in the files, and whether to add separate files,
is best decided by experts. A priori, building a dictionary that includes
new terms for whatever is needed (DFT, MD, LDA, GGA+ ...) is not a
problem.
New hands are welcome as far as I am concerned!
daniel
On 07/09/2013 14:00, tcod-request(a)lists.crystallography.net wrote:
> Message: 1
> Date: Fri, 06 Sep 2013 18:02:18 +0200
> From: Stefaan Cottenier <Stefaan.Cottenier(a)UGent.be>
> To: tcod(a)lists.crystallography.net
> Subject: [TCOD] Fwd: some discussion topics related to TCOD
> Message-ID: <5229FC8A.901(a)UGent.be>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
--
http://www.ecole.ensicaen.fr/~chateign/danielc/
Address:
IUT-Caen Université de Caen Basse-Normandie and
CRISMAT-ENSICAEN
6 Bd. M. Juin, 14050 Caen
The Crystallography Open Database: www.crystallography.net
The Materials Property Open Database: http://www.materialproperties.org/
Combined Analysis using rays: http://iste.co.uk/index.php?f=x&ACTION=View&id=359