A recent post on In the Pipeline talked about a recent JACS communication on the total synthesis of the natural product maoecrystal V. The article is open access: http://pubs.acs.org/doi/abs/10.1021/jacs.6b06623
Here's the ToC figure, which shows a sketch of the final product on the left:

Since I've been looking at how well the RDKit does at conformation generation for highly constrained systems, I thought this would be a nice example molecule to try out. I was also quite curious about the shape of that beast.
The first problem was getting the structure in. I wasn't completely confident in my ability to reproduce that sketch and JACS, of course, doesn't have a way to provide structures in electronic form. Fortunately the authors had solved x-ray structures of their final product and a bunch of intermediates and the CIF files were in the supplementary material. So I converted the CIF to a mol file with OpenBabel (hopefully using the correct CIF file... they aren't particularly well labelled as far as I can tell) and got started from there.

Note: I haven't been able to make blogger work well with py3Dmol, so there are static screenshots below of the 3D structures. Interactive structures are available by viewing the original notebook: https://github.com/greglandrum/rdkit_blog/blob/master/notebooks/3D%20structure%20of%20a%20natural%20product.ipynb with the fantastic nbviewer: http://nbviewer.jupyter.org/github/greglandrum/rdkit_blog/blob/master/notebooks/3D%20structure%20of%20a%20natural%20product.ipynb

In [1]:

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit import rdBase
print(rdBase.rdkitVersion)
import os,time
print( time.asctime())

2016.09.1.dev1
Wed Jul 27 18:48:35 2016

We'll use py3Dmol to show structures in the notebook

In [5]:

import py3Dmol
def drawit(m,p=None,confId=-1):
        mb = Chem.MolToMolBlock(m,confId=confId)
        if p is None:
            p = py3Dmol.view(width=400,height=400)
        p.removeAllModels()
        p.addModel(mb,'sdf')
        p.setStyle({'stick':{}})
        p.setBackgroundColor('0xeeeeee')
        p.zoomTo()
        return p.show()

Start by reading in the mol file and displaying that:

In [8]:

mExpt = Chem.MolFromMolFile("/home/glandrum/Downloads/paper/ja6b06623_si_010.cif.mol",removeHs=False)
drawit(mExpt)

Out[8]:

Now generate a conformer for the molecule:

In [14]:

m = Chem.MolFromMolFile("/home/glandrum/Downloads/paper/ja6b06623_si_010.cif.mol",removeHs=False)
Chem.AssignAtomChiralTagsFromStructure(m)
AllChem.EmbedMolecule(m,useExpTorsionAnglePrefs=True,useBasicKnowledge=True)
drawit(m)

Out[14]:

At least by eye that looks very close, let's check the RMS:

In [20]:

print("RMS: ",AllChem.GetBestRMS(m,mExpt),
      "Heavy Atom RMS:",
      AllChem.GetBestRMS(Chem.RemoveHs(m),Chem.RemoveHs(mExpt)))

RMS:  0.6165388593927497 Heavy Atom RMS: 0.18186257676731582

Not bad! But then, this is a highly constrained structure.
Because we can, let's look at the two structures superimposed:

In [21]:

AllChem.AlignMol(m,mExpt)
mb = Chem.MolToMolBlock(m)
mbExpt = Chem.MolToMolBlock(mExpt)
p = py3Dmol.view(width=400,height=400)
p.removeAllModels()
p.addModel(mb,'sdf')
p.addModel(mbExpt,'sdf')

p.setStyle({'stick':{}})
p.setBackgroundColor('0xeeeeee')
p.zoomTo()
p.show()

Out[21]:

Finally, a bit of 2D work.

Just to have it, here's the SMILES for maoecrystal V

In [12]:

Chem.MolToSmiles(Chem.RemoveHs(m),True)

Out[12]:

'C[C@H]1C(=O)[C@@]23CC[C@@H]1C[C@@]21O[C@@H]2C(C)(C)C=CC(=O)[C@]23COC1=O'

Let's see how well the RDKit's 2D coordinate generation works here:

In [13]:

Chem.MolFromSmiles('C[C@H]1C(=O)[C@@]23CC[C@@H]1C[C@@]21O[C@@H]2C(C)(C)C=CC(=O)[C@]23COC1=O')

Out[13]:

Not great, but definitely not as bad as I had expected.

In [ ]:

Post updated 6.11.2016 to reflect changes to the parameter names

Quite a while ago I did a blog post describing a method to exert more control over the way substructure queries match. The approach described in that post was integrated into the RDKit for the 2015.09 release; since then it's been expanded a bit.

This post shows how to use the approach from Python and directly within the RDKit PostgreSQL cartridge. Note that, as usual, I am using a development version of the RDKit, so a fair amount of the functionality I demo here is new/changed relative to the 2016.03 release.

In [1]:

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit import rdBase
%load_ext sql
print(rdBase.rdkitVersion)
import time
print(time.asctime())

/home/glandrum/anaconda3/lib/python3.5/site-packages/IPython/config.py:13: ShimWarning: The `IPython.config` package has been deprecated. You should import from traitlets.config instead.
  "You should import from traitlets.config instead.", ShimWarning)
/home/glandrum/anaconda3/lib/python3.5/site-packages/IPython/utils/traitlets.py:5: UserWarning: IPython.utils.traitlets has moved to a top-level traitlets package.
  warn("IPython.utils.traitlets has moved to a top-level traitlets package.")

2016.09.1.dev1
Wed Jul 27 06:05:49 2016

Adjusting queries in the cartridge¶

Our starting point is a query against ChEMBL for molecules that contain a pyridine ring.

In [2]:

data = %sql postgresql://localhost/chembl_21 \
    select molregno,molfile from rdk.mols join compound_structures \
    using (molregno) where m@>'c1ccccn1' limit 10;
mols = [Chem.MolFromMolBlock(y) for x,y in data]
Draw.MolsToGridImage(mols,legends=[str(x) for x,y in data],molsPerRow=4,useSVG=False)

10 rows affected.

Out[2]:

If we add two dummy atoms to that in order to look for 2-6 di-substituted pyridines we initially get no results:

In [3]:

data = %sql \
    select molregno,molfile from rdk.mols join compound_structures \
    using (molregno) where m@>'*c1cccc(*)n1' limit 10 ;

0 rows affected.

But we can change that by calling the new mol_adjust_query_properties() function:

In [4]:

data = %sql \
    select molregno,molfile from rdk.mols join compound_structures \
    using (molregno) where m@>mol_adjust_query_properties('*c1cccc(*)n1') limit 10 ;
mols = [Chem.MolFromMolBlock(y) for x,y in data]
Draw.MolsToGridImage(mols,legends=[str(x) for x,y in data],molsPerRow=4,useSVG=False)

10 rows affected.

Out[4]:

This function makes a number of changes by default:

Converts dummy atoms into "any" queries
Adds a degree query to every ring atom so that its substitution must match what was provided
Aromaticity perception is done (if it hasn't been done already)

The addition of degree queries has the consequence that we only find di-substituted pyridines. We can lift that restriction by providing some additional configuration information to mol_adjust_query_properties() as JSON:

In [5]:

data = %sql \
    select molregno,molfile from rdk.mols join compound_structures \
    using (molregno) where m@>mol_adjust_query_properties('*c1cccc(*)n1',\
                                                         '{"adjustDegree"\:false}') limit 10 ;
mols = [Chem.MolFromMolBlock(y) for x,y in data]
Draw.MolsToGridImage(mols,legends=[str(x) for x,y in data],molsPerRow=4,useSVG=False)

10 rows affected.

Out[5]:

Notice that now we have additional substitution options on the rings. However now there are also some examples where the pyridine is part of a fused ring. We can get rid of these by telling mol_adjust_query_properties() to also add a ring-count query to every ring atom :

In [6]:

data = %sql \
    select molregno,molfile from rdk.mols join compound_structures \
    using (molregno) where m@>mol_adjust_query_properties('*c1cccc(*)n1',\
                                                         '{"adjustDegree"\:false,\
                                                          "adjustRingCount"\:true}') limit 10 ;
mols = [Chem.MolFromMolBlock(y) for x,y in data]
Draw.MolsToGridImage(mols,legends=[str(x) for x,y in data],molsPerRow=4,useSVG=False)

10 rows affected.

Out[6]:

To explore the next set of options, we extend the query a little bit to search for 2-6 substituted pyridines where one of the substituents starts with an NC=O:

In [7]:

data = %sql \
    select molregno,molfile from rdk.mols join compound_structures \
    using (molregno) where m@>mol_adjust_query_properties('*c1cccc(NC(=O)*)n1') limit 10 ;
mols = [Chem.MolFromMolBlock(y) for x,y in data]
Draw.MolsToGridImage(mols,legends=[str(x) for x,y in data],molsPerRow=4,useSVG=False)

10 rows affected.

Out[7]:

Let's allow the degrees of ring atoms or dummies to change, but not the degree of chain atoms, this should remove matches like those to molecule 952459 above:

In [8]:

data = %sql \
    select molregno,molfile from rdk.mols join compound_structures \
    using (molregno) where m@>mol_adjust_query_properties('*c1cccc(NC(=O)*)n1',\
                                                         '{"adjustDegree"\:true,\
                                                          "adjustDegreeFlags"\:"IGNORERINGS|IGNOREDUMMIES"\
                                                          }') limit 10 ;
mols = [Chem.MolFromMolBlock(y) for x,y in data]
Draw.MolsToGridImage(mols,legends=[str(x) for x,y in data],molsPerRow=4,useSVG=False)

10 rows affected.

Out[8]:

And now get rid of the fused rings by turning on the ring-count changes too:

In [9]:

data = %sql \
    select molregno,molfile from rdk.mols join compound_structures \
    using (molregno) where m@>mol_adjust_query_properties('*c1cccc(NC(=O)*)n1',\
                                                         '{"adjustDegree"\:true,\
                                                          "adjustDegreeFlags"\:"IGNORERINGS|IGNOREDUMMIES",\
                                                          "adjustRingCount"\:true}') limit 10 ;
mols = [Chem.MolFromMolBlock(y) for x,y in data]
Draw.MolsToGridImage(mols,legends=[str(x) for x,y in data],molsPerRow=4,useSVG=False)

10 rows affected.

Out[9]:

The same adjustments can be used on query molecules constructed from mol blocks:

In [10]:

mb="""
  Mrv1561 07261609522D          

  8  8  0  0  0  0            999 V2000
   -1.9866    0.7581    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -2.7011    0.3455    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.7011   -0.4795    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.9866   -0.8920    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.2721   -0.4795    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.2721    0.3455    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.4155    0.7580    0.0000 A   0  0  0  0  0  0  0  0  0  0  0  0
   -0.5577    0.7580    0.0000 A   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  6  2  0  0  0  0
  2  3  2  0  0  0  0
  3  4  1  0  0  0  0
  4  5  2  0  0  0  0
  5  6  1  0  0  0  0
  2  7  1  0  0  0  0
  6  8  1  0  0  0  0
M  END
"""
Chem.MolFromMolBlock(mb)

Out[10]:

The semantics of mol block queries are a bit different since we already have query atoms:

In [11]:

data = %sql \
    select molregno,molfile from rdk.mols join compound_structures \
    using (molregno) where m@>mol_from_ctab(:mb) limit 10 ;
mols = [Chem.MolFromMolBlock(y) for x,y in data]
Draw.MolsToGridImage(mols,legends=[str(x) for x,y in data],molsPerRow=4,useSVG=False)

10 rows affected.

Out[11]:

Calling mol_adjust_query_properties() here effectively just adds the degree queries (since the dummies are already queries), providing the same results as in Block 4 above:

In [12]:

data = %sql \
    select molregno,molfile from rdk.mols join compound_structures \
    using (molregno) where m@>mol_adjust_query_properties(mol_from_ctab(:mb)) limit 10 ;
mols = [Chem.MolFromMolBlock(y) for x,y in data]
Draw.MolsToGridImage(mols,legends=[str(x) for x,y in data],molsPerRow=4,useSVG=False)

10 rows affected.

Out[12]:

Using that from Python¶

The same options are available from Python using the function Chem.AdjustQueryProperties():

In [13]:

m1 = Chem.MolFromSmiles('Cc1cc(C)nc(NC(=O)c2ccc3c(c2)OCO3)c1')
m2 = Chem.MolFromSmiles('c1cc(C)nc(NC(=O)c2ccc3c(c2)OCO3)c1')
m3 = Chem.MolFromSmiles('c1cc(C)nc(N(C)C(=O)c2ccc3c(c2)OCO3)c1')
Draw.MolsToGridImage((m1,m2,m3),legends=['m1','m2','m3'])

Out[13]:

In [14]:

q = Chem.MolFromSmiles('*c1cccc(NC(=O)*)n1')
q

Out[14]:

Initially this has no substructure matches since the dummy atoms can't match anything:

In [15]:

m1.HasSubstructMatch(q),m2.HasSubstructMatch(q),m3.HasSubstructMatch(q)

Out[15]:

(False, False, False)

The defaults to Chem.AdjustQueryProperties() take care of this, but they also exclude the match to m1 due to the degree queries added to ring atoms:

In [16]:

tq = Chem.AdjustQueryProperties(q)
m1.HasSubstructMatch(tq),m2.HasSubstructMatch(tq),m3.HasSubstructMatch(tq)

Out[16]:

(False, True, True)

Turn off the degree queries and all the molecules match:

In [17]:

params = Chem.AdjustQueryParameters()
params.adjustDegree=False
tq = Chem.AdjustQueryProperties(q,params)
m1.HasSubstructMatch(tq),m2.HasSubstructMatch(tq),m3.HasSubstructMatch(tq)

Out[17]:

(True, True, True)

We can also use the degree query only on chain atoms, this disables the match of m3:

In [18]:

params = Chem.AdjustQueryParameters()
params.adjustDegree=True
params.adjustDegreeFlags=Chem.ADJUST_IGNORERINGS|Chem.ADJUST_IGNOREDUMMIES
tq = Chem.AdjustQueryProperties(q,params)
m1.HasSubstructMatch(tq),m2.HasSubstructMatch(tq),m3.HasSubstructMatch(tq)

Out[18]:

(True, True, False)

Wednesday, July 27, 2016

3D structure of a natural product

Tuesday, July 26, 2016

Tuning substructure queries II

Adjusting queries in the cartridge¶

Using that from Python¶

About Me

Blog Archive