[Cod-bugs] question about using COD query by specifying elements

Antanas Vaitkus antanas.vaitkus90 at gmail.com
Sun Dec 30 23:27:06 EET 2018


Dear Long Yang,

the MySQL LIKE operator allows one to specify wildcard values using the "%"
symbol (https://dev.mysql.com/doc/refman/8.0/en/pattern-matching.html).
For example, Your query could be rewritten as:
'*SELECT file FROM data WHERE formula RLIKE "- C%Ca%O%-" and nel=3'*

It works on this specific case, however, it is not entirely correct since
structures containing only C, Ca and Os would also be selected if there
were any in the COD. This is due to the '%' symbol matching any number of
characters.

One could alternatively employ the RLIKE operator that allows using regular
expressions (
https://dev.mysql.com/doc/refman/8.0/en/regexp.html#operator_regexp). It is
a bit more complex, but definitely more flexible.

For example, a query that selects all structures that consists only of C,
Ca and O would read:

mysql -u cod_reader -h www.crystallography.net -e '*SELECT file FROM data
WHERE formula RLIKE "C[0-9.]* Ca[0-9.]* O[0-9.]*" and nel=3*' cod

The '*[0-9.]**' sequence after each chemical element specifies that it is
allowed for the chemical element name to be followed by a floating-point
number. In cases when one need structures with specific stoichiometry the
sequence can be replaced by an exact number, i.e. ... RLIKE "C5 " would
return all structures that have 5 carbon atoms in their chemical formula.

*Please note, that the order of chemical symbols in Your query is important
-- formulas in the COD follow the Hill formula notation *(
https://en.wikipedia.org/wiki/Chemical_formula#Hill_system) *and moving the
query element into different positions would most likely not return any
results *( i.e. RLIKE "*C[0-9.]* Ca[0-9.]**" is a proper query whereas
*"Ca[0-9.]*
C[0-9.]*"*is not).

The restful API is also a viable option, however, I must admit that are
current examples of the RESTful API usage are currently a bit lacking -- we
will work on that. Some general tips on how to use it can be found on the
following GitHub discussion page (
https://github.com/cod-developers/cod-tools/issues/3).

I also crafted a simple example tailored to Your specific query (C, Ca, O):

*http://www.crystallography.net/cod/result.php?el1=C&el2=Ca&el3=O&strictmax=3&format=lst*
<http://www.crystallography.net/cod/result.php?el1=C&el2=Ca&el3=O&strictmax=3&format=lst>

Pasting this URL to Your browser or querying it using alternative web
clients like cURL would return an ID list of COD entries matching the given
criteria, i.e.:

*curl -s
'http://www.crystallography.net/cod/result.php?el1=C&el2=Ca&el3=O&strictmax=3&format=lst
<http://www.crystallography.net/cod/result.php?el1=C&el2=Ca&el3=O&strictmax=3&format=lst>'*

If You have any more questions please do not hesitate to ask, but note,
that I might be a bit slower to respond due to the New Year being around
the corner.

Sincerely,
Antanas Vaitkus



On Sun, 30 Dec 2018 at 22:25, Long Yang <long.yang at columbia.edu> wrote:

> Dear Antanas,
>
> Thanks a lot for your quick reply! Could you please show in more details
> about how to do it in MySQL level? Sorry I was a beginner of SQL.
>
> For example, I would like to query all entries only with elements “Ca”,
> “C”, “O”, what would the MySQL query like?
>
> I tried the query below but it doesn’t work.
> mysql -u cod_reader -h www.crystallography.net -e '*select file from data
> where formula like "- C Ca O -“ and nel=3*' cod
>
> BTW, how about using RESTful API, it seems to have the “el1, el2, ...,
> el8” and “strictmin, strictmax” parameters according to the wiki doc. But I
> don’t know how to use it.
>
> Thank you!
>
> Best Regards,
>
> Long
>
> On Dec 30, 2018, at 3:08 PM, Antanas Vaitkus <antanas.vaitkus90 at gmail.com>
> wrote:
>
> Dear Long Yang,
>
> the specific elements are only listed in the chemical formula fields
> ('formula', 'calcformula', 'cellformula'). The formula fields all follow
> the same format:
> "- el1[no1] el2[no2] el3[no3] -", for example, "- F16 H3 O6 Sr5 V3 -".
>
> On a pure MySQL level You can use the "LIKE" or "RLIKE" operators to
> construct the appropriate queries.
>
> Hope that helps.
>
> Sincerely,
> Antanas Vaitkus
>
> On Sun, 30 Dec 2018 at 21:59, Long Yang <long.yang at columbia.edu> wrote:
>
>> Hi,
>>
>> I am Long Yang, graduate student from Columbia University, USA. I am
>> trying to query COD by specifying elements. It works as expected on your
>> website, by entering the elements, for example, “Ca”, “C”, “O”. But when I
>> do the SQL query, it doesn’t have similar schema as search in the website.
>> The schema (http://wiki.crystallography.net/cod_mysql_schema/) only have
>> “ nel” to specify the number of elements, but does not have “elements”
>> thing to enter “Ca”, “C”, “O”. How to do it?
>>
>> Thank you!
>>
>> Best Regards,
>>
>> Long
>> _______________________________________________
>> Cod-bugs mailing list
>> Cod-bugs at lists.crystallography.net
>> http://lists.crystallography.net/cgi-bin/mailman/listinfo/cod-bugs
>>
>
>
> --
> Antanas Vaitkus,
> PhD student at Vilnius University Institute of Biotechnology,
> room V325, Saulėtekio al. 7,
> LT-10257 Vilnius, Lithuania
>
>
>
>

-- 
Antanas Vaitkus,
PhD student at Vilnius University Institute of Biotechnology,
room V325, Saulėtekio al. 7,
LT-10257 Vilnius, Lithuania
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.crystallography.net/pipermail/cod-bugs/attachments/20181230/f97f0cfb/attachment.html>


More information about the Cod-bugs mailing list