[Cod-bugs] question about using COD query by specifying elements

Long Yang long.yang at columbia.edu
Sun Dec 30 23:36:08 EET 2018


Dear Antanas,

Thanks you very much for the detailed instructions! I tried both of your mysql regular expression and RESTful API solutions. They works perfect!

Thanks a lot for your help. I really appreciate it. Have a nice holiday!

Best Regards,

Long

> On Dec 30, 2018, at 4:27 PM, Antanas Vaitkus <antanas.vaitkus90 at gmail.com> wrote:
> 
> Dear Long Yang,
> 
> the MySQL LIKE operator allows one to specify wildcard values using the "%" symbol (https://dev.mysql.com/doc/refman/8.0/en/pattern-matching.html <https://dev.mysql.com/doc/refman/8.0/en/pattern-matching.html>).
> For example, Your query could be rewritten as:
> 'SELECT file FROM data WHERE formula RLIKE "- C%Ca%O%-" and nel=3'
> 
> It works on this specific case, however, it is not entirely correct since structures containing only C, Ca and Os would also be selected if there were any in the COD. This is due to the '%' symbol matching any number of characters.
> 
> One could alternatively employ the RLIKE operator that allows using regular expressions (https://dev.mysql.com/doc/refman/8.0/en/regexp.html#operator_regexp <https://dev.mysql.com/doc/refman/8.0/en/regexp.html#operator_regexp>). It is a bit more complex, but definitely more flexible.
> 
> For example, a query that selects all structures that consists only of C, Ca and O would read:
> 
> mysql -u cod_reader -h www.crystallography.net <http://www.crystallography.net/> -e 'SELECT file FROM data WHERE formula RLIKE "C[0-9.]* Ca[0-9.]* O[0-9.]*" and nel=3' cod
> 
> The '[0-9.]*' sequence after each chemical element specifies that it is allowed for the chemical element name to be followed by a floating-point number. In cases when one need structures with specific stoichiometry the sequence can be replaced by an exact number, i.e. ... RLIKE "C5 " would return all structures that have 5 carbon atoms in their chemical formula.
> 
> Please note, that the order of chemical symbols in Your query is important -- formulas in the COD follow the Hill formula notation (https://en.wikipedia.org/wiki/Chemical_formula#Hill_system <https://en.wikipedia.org/wiki/Chemical_formula#Hill_system>) and moving the query element into different positions would most likely not return any results ( i.e. RLIKE "C[0-9.]* Ca[0-9.]*" is a proper query whereas "Ca[0-9.]* C[0-9.]*"is not).
> 
> The restful API is also a viable option, however, I must admit that are current examples of the RESTful API usage are currently a bit lacking -- we will work on that. Some general tips on how to use it can be found on the following GitHub discussion page (https://github.com/cod-developers/cod-tools/issues/3 <https://github.com/cod-developers/cod-tools/issues/3>). 
> 
> I also crafted a simple example tailored to Your specific query (C, Ca, O):
> 
> http://www.crystallography.net/cod/result.php?el1=C&el2=Ca&el3=O&strictmax=3&format=lst <http://www.crystallography.net/cod/result.php?el1=C&el2=Ca&el3=O&strictmax=3&format=lst>
> 
> Pasting this URL to Your browser or querying it using alternative web clients like cURL would return an ID list of COD entries matching the given criteria, i.e.:
> 
> curl -s 'http://www.crystallography.net/cod/result.php?el1=C&el2=Ca&el3=O&strictmax=3&format=lst <http://www.crystallography.net/cod/result.php?el1=C&el2=Ca&el3=O&strictmax=3&format=lst>'
> 
> If You have any more questions please do not hesitate to ask, but note, that I might be a bit slower to respond due to the New Year being around the corner.
> 
> Sincerely,
> Antanas Vaitkus
> 
> 
> 
> On Sun, 30 Dec 2018 at 22:25, Long Yang <long.yang at columbia.edu <mailto:long.yang at columbia.edu>> wrote:
> Dear Antanas,
> 
> Thanks a lot for your quick reply! Could you please show in more details about how to do it in MySQL level? Sorry I was a beginner of SQL.
> 
> For example, I would like to query all entries only with elements “Ca”, “C”, “O”, what would the MySQL query like?
> 
> I tried the query below but it doesn’t work.
> mysql -u cod_reader -h www.crystallography.net <http://www.crystallography.net/> -e 'select file from data where formula like "- C Ca O -“ and nel=3' cod
> 
> BTW, how about using RESTful API, it seems to have the “el1, el2, ..., el8” and “strictmin, strictmax” parameters according to the wiki doc. But I don’t know how to use it.
> 
> Thank you!
> 
> Best Regards,
> 
> Long
> 
>> On Dec 30, 2018, at 3:08 PM, Antanas Vaitkus <antanas.vaitkus90 at gmail.com <mailto:antanas.vaitkus90 at gmail.com>> wrote:
>> 
>> Dear Long Yang,
>> 
>> the specific elements are only listed in the chemical formula fields ('formula', 'calcformula', 'cellformula'). The formula fields all follow the same format:
>> "- el1[no1] el2[no2] el3[no3] -", for example, "- F16 H3 O6 Sr5 V3 -".
>> 
>> On a pure MySQL level You can use the "LIKE" or "RLIKE" operators to construct the appropriate queries.
>> 
>> Hope that helps.
>> 
>> Sincerely,
>> Antanas Vaitkus
>> 
>> On Sun, 30 Dec 2018 at 21:59, Long Yang <long.yang at columbia.edu <mailto:long.yang at columbia.edu>> wrote:
>> Hi,
>> 
>> I am Long Yang, graduate student from Columbia University, USA. I am trying to query COD by specifying elements. It works as expected on your website, by entering the elements, for example, “Ca”, “C”, “O”. But when I do the SQL query, it doesn’t have similar schema as search in the website. The schema (http://wiki.crystallography.net/cod_mysql_schema/ <http://wiki.crystallography.net/cod_mysql_schema/>) only have “ nel” to specify the number of elements, but does not have “elements” thing to enter “Ca”, “C”, “O”. How to do it?
>> 
>> Thank you!
>> 
>> Best Regards,
>> 
>> Long
>> _______________________________________________
>> Cod-bugs mailing list
>> Cod-bugs at lists.crystallography.net <mailto:Cod-bugs at lists.crystallography.net>
>> http://lists.crystallography.net/cgi-bin/mailman/listinfo/cod-bugs <http://lists.crystallography.net/cgi-bin/mailman/listinfo/cod-bugs>
>> 
>> 
>> -- 
>> Antanas Vaitkus,
>> PhD student at Vilnius University Institute of Biotechnology,
>> room V325, Saulėtekio al. 7,
>> LT-10257 Vilnius, Lithuania
>> 
>> 
> 
> 
> 
> -- 
> Antanas Vaitkus,
> PhD student at Vilnius University Institute of Biotechnology,
> room V325, Saulėtekio al. 7,
> LT-10257 Vilnius, Lithuania
> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.crystallography.net/pipermail/cod-bugs/attachments/20181230/292e7367/attachment-0001.html>


More information about the Cod-bugs mailing list