Tuesday, December 3, 2013

Using AllChem.ConstrainedEmbed()

Using AllChem.ConstrainedEmbed

A quick note on the use of one of the RDKit convenience tools.

The function AllChem.ConstrainedEmbed() allows you to generate 3D conformations of a molecule that are constrained to have the coordinates of a subset of atoms match those in a reference molecule. This allows, for example, generating 3D conformers for a series of compounds with a common scaffold geometry.

The data I will use are a set of compounds from the well-known Sutherland 3D QSAR paper[1]. I'm using the pre-processed version that Paolo Tosco created for the Open3DAlign[2] validation set.

The original files can be found here: http://sourceforge.net/projects/open3dalign/files/validation_suite/open3dalign_validation_suite.tar.bz2/download

  1. Sutherland, J. J., O’Brien, L. A. & Weaver, D. F. A Comparison of Methods for Modeling Quantitative Structure−Activity Relationships. J. Med. Chem. 47, 5541–5554 (2004).
  2. Tosco, P., Balle, T. & Shiri, F. Open3DALIGN: an open-source software aimed at unsupervised ligand alignment. J Comput Aided Mol Des 25, 777–783 (2011).
In [1]:
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw,PyMol,MCS
from rdkit.Chem.Draw import IPythonConsole
from rdkit import rdBase
print rdBase.rdkitVersion
2014.03.1pre

Take the thermolysin data set, since it's got a nice flexible core bit.

In [2]:
ms = [x for x in Chem.SDMolSupplier('../data/open3dalign_validation_suite/therm_data.sdf',removeHs=False)]
ms2 = [x for x in Chem.SDMolSupplier('../data/open3dalign_validation_suite/therm_data.sdf')]
_=[AllChem.Compute2DCoords(x) for x in ms2]
Draw.MolsToGridImage(ms2[:12],molsPerRow=4)
Out[2]:

Use the MCS code to pull out a core:

In [3]:
res=MCS.FindMCS(ms2,threshold=0.9,completeRingsOnly=True)
p = Chem.MolFromSmarts(res.smarts)
p
Out[3]:

Get the molecules that match that core definition:

In [4]:
matchingMols = [x for x in ms if x.HasSubstructMatch(p)]
print len(ms),len(matchingMols)
76 69

In [5]:
v = PyMol.MolViewer()

Get a set of 3D coordinates of the core from one of the molecules:

In [8]:
core1 = AllChem.DeleteSubstructs(AllChem.ReplaceSidechains(Chem.RemoveHs(matchingMols[1]),p),Chem.MolFromSmiles('*'))
core1.UpdatePropertyCache()


v.ShowMol(core1,molB=Chem.MolToMolBlock(core1,kekulize=False),name='core')
v.SetDisplayStyle('core','sticks')
v.GetPNG(preDelay=2)
Out[8]:

Demonstrate that the input conformations can be reasonably aligned to that core:

In [9]:
v.ShowMol(core1,molB=Chem.MolToMolBlock(core1,kekulize=False),name='core')
v.SetDisplayStyle('core','sticks')
for m in matchingMols[:10]:
    nm = Chem.Mol(m)
    AllChem.AlignMol(core1,nm)
    v.ShowMol(nm,nm.GetProp('_Name'),showOnly=False)
v.GetPNG(preDelay=2)
Out[9]:

Try generating a random conformation and seeing how well it aligns:

In [10]:
nm = Chem.Mol(matchingMols[3])
AllChem.EmbedMolecule(nm,randomSeed=0xF00D)
AllChem.MMFFOptimizeMolecule(nm)
rms = AllChem.AlignMol(core1,nm)
print 'RMS:',rms

v.ShowMol(core1,molB=Chem.MolToMolBlock(core1,kekulize=False),name='core')
v.SetDisplayStyle('core','sticks')
v.ShowMol(nm,'straight embed',showOnly=False)
v.GetPNG(preDelay=2)
RMS: 0.958738145983

Out[10]:

The poor performance isn't much of a surprise given how flexible the molecules are.

AllChem.ConstrainedEmbed() provides a solution to this problem: it generates conformations that match the template conformation.

The function works by:

  1. doing a substructure match between the template (core) and the molecule.
  2. updating the molecule's distance bounds matrix based on the atom-atom distances in the core
  3. generating a conformation using that distance matrix
  4. cleaning up the geometry using a force field that includes distance constraints to fixed positions corresponding to the core atom positions
  5. aligning the final conformation to the core

The RMS of the final alignment is available in a molecule property named "EmbedRMS".

Here's an illustration of the results:

In [11]:
nm = Chem.Mol(matchingMols[3])
AllChem.ConstrainedEmbed(nm,core1)
rms = float(nm.GetProp('EmbedRMS'))
print 'RMS:',rms

v.ShowMol(core1,molB=Chem.MolToMolBlock(core1,kekulize=False),name='core')
v.SetDisplayStyle('core','sticks')
v.ShowMol(nm,'constrained embed',showOnly=False)
v.GetPNG(preDelay=2)
RMS: 0.0404323330462

Out[11]:

By default the force field used for the structure refinment is UFF. This can be changed to MMFF:

In [12]:
nm = Chem.Mol(matchingMols[3])
GetFF=lambda x,confId=-1:AllChem.MMFFGetMoleculeForceField(x,AllChem.MMFFGetMoleculeProperties(x),confId=confId)
AllChem.ConstrainedEmbed(nm,core1,getForceField=GetFF)
rms = float(nm.GetProp('EmbedRMS'))
print 'RMS:',rms


v.ShowMol(core1,molB=Chem.MolToMolBlock(core1,kekulize=False),name='core')
v.SetDisplayStyle('core','sticks')
v.ShowMol(nm,'constrained embed',showOnly=False)
v.GetPNG(preDelay=2)
RMS: 0.0407469103071

Out[12]:

No comments: