From andrius.merkys at gmail.com Mon Jun 17 21:09:02 2019
From: andrius.merkys at gmail.com (Andrius Merkys)
Date: Mon, 17 Jun 2019 21:09:02 +0300
Subject: [Cod-bugs] Question about search by SMILES
In-Reply-To: <03da90f14549d5be01d64afef723395f@cam.ac.uk>
References: <7e26db5bd8de13f4425f4321ddb16c94@cam.ac.uk> <03da90f14549d5be01d64afef723395f@cam.ac.uk>
Message-ID: <9ab7b2f2-b218-2736-8eab-7a943220f4b8@gmail.com>

Dear Jiuyang,

I have fixed the problem with the search by SMILES/SMARTS; it should
work as expected now. Thanks again for the bug report.

As for the RESTful API, a SMILES/SMARTS query may be passed using the
'smarts' field, for example:

http://www.crystallography.net/cod/result.php?smarts=c1ccccc1

I have also updated the COD Wiki page to include this piece of
information.

Best wishes,
Andrius

On 2019-06-12 21:26, J. Zhao wrote:
>
> Dear Andrius,
>
> Thanks for your reply.
>
> And one more thing, is there any way to query the database by using a
> SMILES as a parameter in the RESTful API?
>
> I can only see parameters like COD id, 'Hill Notation',
> etc. http://wiki.crystallography.net/RESTful_API/#index1h1
>
> Thanks again.
>
> Regards,
>
> Jiuyang
>
> On 2019-06-12 18:49, Andrius Merkys wrote:
>
>> Dear Jiuyang,
>>
>> Thanks a lot for the detailed description of the problem. Indeed, the
>> search is not working as it should. We will try to investigate and
>> fix the problem as soon as possible. For the time being you may use
>> the applet-based substructure search
>> (http://www.crystallography.net/cod/jsme_search.html).
>>
>> Best regards,
>> Andrius
>>
>> On Wed, 12 Jun 2019, 13:20 J. Zhao wrote:
>>
>> Hi Andrius,
>>
>> Thanks for your reply.
>>
>> So if you go to this URL,
>> http://www.crystallography.net/cod/search.html, at the second
>> search box "Enter SMILES", enter "c1ccccc1", which is the SMILES
>> for benzene.
>>
>> Then click 'Search', and the new page will tell you "Results:
>> there are 402769 entries in the selection".
>> And whatever I entered in that search box, it always returns
>> 402769 entries as the result, which is the total number of
>> entries in COD.
>>
>> So does it mean the SMILES search is not working properly? And is
>> there a way to use the RESTful API to perform a SMILES search?
>>
>> The normal RESTful API page is here:
>> http://wiki.crystallography.net/RESTful_API/#index1h1, and
>> it doesn't say anything about search by SMILES.
>>
>> Thanks again!
>>
>> Regards,
>>
>> Jiuyang
>>
>> On 2019-06-12 10:29, Andrius Merkys wrote:
>>
>> Hi Jiuyang,
>>
>> Could you please provide a single example of a query which
>> you performed? I will look into the issue then.
>>
>> Best wishes,
>> Andrius
>>
>> On Wed, 12 Jun 2019, 09:16 J. Zhao wrote:
>>
>> Hi,
>>
>> I was trying to search for chemical structures by SMILES.
>> But whatever I entered in the 'Enter SMILES' box at
>> http://www.crystallography.net/cod/search.html, it
>> just returned all the structures in the database.
>>
>> Could you please let me know how to properly search by
>> SMILES? And is there a way to use the RESTful API to search
>> by SMILES?
>>
>> Many thanks!
>>
>> Regards,
>>
>> Jiuyang
>>
>> _______________________________________________
>> Cod-bugs mailing list
>> Cod-bugs at lists.crystallography.net
>> http://lists.crystallography.net/cgi-bin/mailman/listinfo/cod-bugs

From andrius.merkys at gmail.com Tue Jun 18 11:51:33 2019
From: andrius.merkys at gmail.com (Andrius Merkys)
Date: Tue, 18 Jun 2019 11:51:33 +0300
Subject: [Cod-bugs] Question about search by SMILES
In-Reply-To:
References: <7e26db5bd8de13f4425f4321ddb16c94@cam.ac.uk> <03da90f14549d5be01d64afef723395f@cam.ac.uk> <9ab7b2f2-b218-2736-8eab-7a943220f4b8@gmail.com>
Message-ID: <57f1f514-fc29-1a0a-facd-ca984dbbce7a@gmail.com>

Dear Jiuyang,

To search for SMILES/SMARTS we use the Open Babel package [1].
The idea behind the current implementation in the COD is that the
database is queried for all structures containing (not matching
exactly) the requested fragment, in your case the benzene ring. The
returned entries aren't sorted by relevance, but merely by their IDs in
the database.

While sorting by relevance might be implemented in principle, it
depends a lot on how relevance is determined. I would argue that the
Levenshtein distance isn't general enough to be applied in all cases.
Other measures, for example the Tanimoto index of fingerprint
similarity, might be investigated, but currently this is out of scope
for the COD.

Should you find COD's search capabilities too basic, you can download
the whole COD SMILES database [2] for local examination.

Best wishes,
Andrius

[1] http://openbabel.org
[2] http://crystallography.net/cod/smi/allcod.smi

On 2019-06-17 22:21, J. Zhao wrote:
>
> But I've noticed that if you search for 'c1ccccc1', actually none of
> the structures on the first few pages are really benzene's (c1ccccc1)
> structure. But they all contain the string 'c1ccccc1' in their SMILES.
>
> For example, 'O[C@@H](C)[C@@H]([C@@H](C)c1ccccc1)c1ccccc1', which is
> the SMILES for '3,4-Diphenylpentan-2-ol', appears as the 13th entry in
> the search result.
>
> To be honest, other databases' search results seem to have this problem
> as well. Although I have no idea how COD ranks its search results, I
> would suggest ranking them by some distance measure between the search
> results' SMILES and the target SMILES if possible, like the Levenshtein
> distance maybe? Then the entry with SMILES 'c1ccccc1' will rank first
> since the Levenshtein distance is 0 and it is indeed the correct
> structure for the query 'c1ccccc1'.
>

-- 
Andrius Merkys
Vilnius University Institute of Biotechnology, Saulėtekio al. 7, room V325
LT-10257 Vilnius, Lithuania
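
Editorial sketch (not part of the original messages): the two technical
points in this thread, passing a pattern in the 'smarts' field and
ranking a locally held list of SMILES strings by Levenshtein distance,
might look roughly like this in Python. The function names are
hypothetical, the endpoint is copied from the example URL in the thread,
and the ranking is plain string comparison applied to whatever SMILES
you have on hand; it is not how the COD orders its results. The Kekulé
form 'C1=CC=CC=C1' is added here as an assumption-level illustration of
why string distance may not be general enough: it also denotes benzene,
yet its distance to 'c1ccccc1' is not zero.

```python
from urllib.parse import urlencode

# Endpoint copied from the example URL quoted in the thread.
COD_RESULT_URL = "http://www.crystallography.net/cod/result.php"


def build_smarts_query_url(pattern):
    """Return a result.php URL carrying `pattern` in the 'smarts' field.

    urlencode percent-escapes characters such as '[', '@' and '('.
    """
    return COD_RESULT_URL + "?" + urlencode({"smarts": pattern})


def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def rank_by_levenshtein(query, smiles_list):
    """Sort SMILES strings by string distance to `query` (closest first)."""
    return sorted(smiles_list, key=lambda s: levenshtein(query, s))


if __name__ == "__main__":
    query = "c1ccccc1"
    candidates = [
        "O[C@@H](C)[C@@H]([C@@H](C)c1ccccc1)c1ccccc1",  # from the thread
        "c1ccccc1",                                      # exact string match
        "C1=CC=CC=C1",                                   # benzene, Kekulé form
    ]
    print(build_smarts_query_url(query))
    for smiles in rank_by_levenshtein(query, candidates):
        print(levenshtein(query, smiles), smiles)
```

The exact string match sorts first, as suggested in the quoted message,
while the Kekulé spelling of the same molecule gets a non-zero distance,
which is the kind of case the reply has in mind.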