Quite a while ago I did a blog post describing a method to exert more control over the way substructure queries match. The approach described in that post was integrated into the RDKit for the 2015.09 release; since then it's been expanded a bit.
This post shows how to use the approach from Python and directly within the RDKit PostgreSQL cartridge. Note that, as usual, I am using a development version of the RDKit, so a fair amount of the functionality I demo here is new/changed relative to the 2016.03 release.
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit import rdBase
%load_ext sql
print(rdBase.rdkitVersion)
import time
print(time.asctime())
Adjusting queries in the cartridge¶
Our starting point is a query against ChEMBL for molecules that contain a pyridine ring.
data = %sql postgresql://localhost/chembl_21 \
select molregno,molfile from rdk.mols join compound_structures \
using (molregno) where m@>'c1ccccn1' limit 10;
mols = [Chem.MolFromMolBlock(y) for x,y in data]
Draw.MolsToGridImage(mols,legends=[str(x) for x,y in data],molsPerRow=4,useSVG=False)
If we add two dummy atoms to that in order to look for 2-6 di-substituted pyridines we initially get no results:
data = %sql \
select molregno,molfile from rdk.mols join compound_structures \
using (molregno) where m@>'*c1cccc(*)n1' limit 10 ;
But we can change that by calling the new mol_adjust_query_properties()
function:
data = %sql \
select molregno,molfile from rdk.mols join compound_structures \
using (molregno) where m@>mol_adjust_query_properties('*c1cccc(*)n1') limit 10 ;
mols = [Chem.MolFromMolBlock(y) for x,y in data]
Draw.MolsToGridImage(mols,legends=[str(x) for x,y in data],molsPerRow=4,useSVG=False)
This function makes a number of changes by default:
- Converts dummy atoms into "any" queries
- Adds a degree query to every ring atom so that its substitution must match what was provided
- Aromaticity perception is done (if it hasn't been done already)
The addition of degree queries has the consequence that we only find di-substituted pyridines. We can lift that restriction by providing some additional configuration information to mol_adjust_query_properties()
as JSON:
data = %sql \
select molregno,molfile from rdk.mols join compound_structures \
using (molregno) where m@>mol_adjust_query_properties('*c1cccc(*)n1',\
'{"adjustDegree"\:false}') limit 10 ;
mols = [Chem.MolFromMolBlock(y) for x,y in data]
Draw.MolsToGridImage(mols,legends=[str(x) for x,y in data],molsPerRow=4,useSVG=False)
Notice that now we have additional substitution options on the rings. However now there are also some examples where the pyridine is part of a fused ring. We can get rid of these by telling mol_adjust_query_properties()
to also add a ring-count query to every ring atom :
data = %sql \
select molregno,molfile from rdk.mols join compound_structures \
using (molregno) where m@>mol_adjust_query_properties('*c1cccc(*)n1',\
'{"adjustDegree"\:false,\
"adjustRingCount"\:true}') limit 10 ;
mols = [Chem.MolFromMolBlock(y) for x,y in data]
Draw.MolsToGridImage(mols,legends=[str(x) for x,y in data],molsPerRow=4,useSVG=False)
To explore the next set of options, we extend the query a little bit to search for 2-6 substituted pyridines where one of the substituents starts with an NC=O
:
data = %sql \
select molregno,molfile from rdk.mols join compound_structures \
using (molregno) where m@>mol_adjust_query_properties('*c1cccc(NC(=O)*)n1') limit 10 ;
mols = [Chem.MolFromMolBlock(y) for x,y in data]
Draw.MolsToGridImage(mols,legends=[str(x) for x,y in data],molsPerRow=4,useSVG=False)
Let's allow the degrees of ring atoms or dummies to change, but not the degree of chain atoms, this should remove matches like those to molecule 952459 above:
data = %sql \
select molregno,molfile from rdk.mols join compound_structures \
using (molregno) where m@>mol_adjust_query_properties('*c1cccc(NC(=O)*)n1',\
'{"adjustDegree"\:true,\
"adjustDegreeFlags"\:"IGNORERINGS|IGNOREDUMMIES"\
}') limit 10 ;
mols = [Chem.MolFromMolBlock(y) for x,y in data]
Draw.MolsToGridImage(mols,legends=[str(x) for x,y in data],molsPerRow=4,useSVG=False)
And now get rid of the fused rings by turning on the ring-count changes too:
data = %sql \
select molregno,molfile from rdk.mols join compound_structures \
using (molregno) where m@>mol_adjust_query_properties('*c1cccc(NC(=O)*)n1',\
'{"adjustDegree"\:true,\
"adjustDegreeFlags"\:"IGNORERINGS|IGNOREDUMMIES",\
"adjustRingCount"\:true}') limit 10 ;
mols = [Chem.MolFromMolBlock(y) for x,y in data]
Draw.MolsToGridImage(mols,legends=[str(x) for x,y in data],molsPerRow=4,useSVG=False)
The same adjustments can be used on query molecules constructed from mol blocks:
mb="""
Mrv1561 07261609522D
8 8 0 0 0 0 999 V2000
-1.9866 0.7581 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
-2.7011 0.3455 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.7011 -0.4795 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.9866 -0.8920 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.2721 -0.4795 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.2721 0.3455 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-3.4155 0.7580 0.0000 A 0 0 0 0 0 0 0 0 0 0 0 0
-0.5577 0.7580 0.0000 A 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
1 6 2 0 0 0 0
2 3 2 0 0 0 0
3 4 1 0 0 0 0
4 5 2 0 0 0 0
5 6 1 0 0 0 0
2 7 1 0 0 0 0
6 8 1 0 0 0 0
M END
"""
Chem.MolFromMolBlock(mb)
The semantics of mol block queries are a bit different since we already have query atoms:
data = %sql \
select molregno,molfile from rdk.mols join compound_structures \
using (molregno) where m@>mol_from_ctab(:mb) limit 10 ;
mols = [Chem.MolFromMolBlock(y) for x,y in data]
Draw.MolsToGridImage(mols,legends=[str(x) for x,y in data],molsPerRow=4,useSVG=False)
Calling mol_adjust_query_properties()
here effectively just adds the degree queries (since the dummies are already queries), providing the same results as in Block 4 above:
data = %sql \
select molregno,molfile from rdk.mols join compound_structures \
using (molregno) where m@>mol_adjust_query_properties(mol_from_ctab(:mb)) limit 10 ;
mols = [Chem.MolFromMolBlock(y) for x,y in data]
Draw.MolsToGridImage(mols,legends=[str(x) for x,y in data],molsPerRow=4,useSVG=False)
Using that from Python¶
The same options are available from Python using the function Chem.AdjustQueryProperties()
:
m1 = Chem.MolFromSmiles('Cc1cc(C)nc(NC(=O)c2ccc3c(c2)OCO3)c1')
m2 = Chem.MolFromSmiles('c1cc(C)nc(NC(=O)c2ccc3c(c2)OCO3)c1')
m3 = Chem.MolFromSmiles('c1cc(C)nc(N(C)C(=O)c2ccc3c(c2)OCO3)c1')
Draw.MolsToGridImage((m1,m2,m3),legends=['m1','m2','m3'])
q = Chem.MolFromSmiles('*c1cccc(NC(=O)*)n1')
q
Initially this has no substructure matches since the dummy atoms can't match anything:
m1.HasSubstructMatch(q),m2.HasSubstructMatch(q),m3.HasSubstructMatch(q)
The defaults to Chem.AdjustQueryProperties()
take care of this, but they also exclude the match to m1 due to the degree queries added to ring atoms:
tq = Chem.AdjustQueryProperties(q)
m1.HasSubstructMatch(tq),m2.HasSubstructMatch(tq),m3.HasSubstructMatch(tq)
Turn off the degree queries and all the molecules match:
params = Chem.AdjustQueryParameters()
params.adjustDegree=False
tq = Chem.AdjustQueryProperties(q,params)
m1.HasSubstructMatch(tq),m2.HasSubstructMatch(tq),m3.HasSubstructMatch(tq)
We can also use the degree query only on chain atoms, this disables the match of m3:
params = Chem.AdjustQueryParameters()
params.adjustDegree=True
params.adjustDegreeFlags=Chem.ADJUST_IGNORERINGS|Chem.ADJUST_IGNOREDUMMIES
tq = Chem.AdjustQueryProperties(q,params)
m1.HasSubstructMatch(tq),m2.HasSubstructMatch(tq),m3.HasSubstructMatch(tq)
1 comment:
It always helps to go through your posts.
Very helpful indeed!
Post a Comment