Saturday, January 11, 2014

An interesting MCS use case

Inspired by this RDKit-discuss question from Liz Wylie: http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg03676.html

In [1]:
from rdkit import Chem
from rdkit.Chem import MCS
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit import rdBase
print rdBase.rdkitVersion
2014.03.1pre

Liz wanted to use custom atom types in the MCS code, yet still be able to get a readable SMILES for the MCS.

Here's a demonstration of the standard behavior of the MCS code for Liz's example.

In [2]:
smis=["COc1ccc(C(Nc2nc3c(ncn3COCC=O)c(=O)[nH]2)(c2ccccc2)c2ccccc2)cc1",
      "COc1ccc(C(Nc2nc3c(ncn3COC(CO)(CO)CO)c(=O)[nH]2)(c2ccccc2)c2ccccc2)cc1"]
ms = [Chem.MolFromSmiles(x) for x in smis]
Draw.MolsToGridImage(ms,subImgSize=(300,300))
Out[2]:

Start by defining a simple atom-type hash that combines atomic num and hybridization

In [3]:
def label(a): 
    return 100*int(a.GetHybridization())+a.GetAtomicNum()

The easiest way to use custom atom types is to set (bogus) isotope labels on the atoms in a copy of each molecule:

In [4]:
nms = [Chem.Mol(x) for x in ms]
for nm in nms:
    for at in nm.GetAtoms():
        at.SetIsotope(label(at))

And now run the MCS on the copy using the "istope" mode and print the results:

In [5]:
mcs=MCS.FindMCS(nms,atomCompare='isotopes')
print mcs.smarts
[406*]-[308*]-[306*]:1:[306*]:[306*]:[306*](:[306*]:[306*]:1)-[406*](-[306*]:1:[306*]:[306*]:[306*]:[306*]:[306*]:1)(-[306*]:1:[306*]:[306*]:[306*]:[306*]:[306*]:1)-[307*]-[306*]:1:[307*]:[306*]:2:[306*](:[306*](:[307*]:1)=[308*]):[307*]:[306*]:[307*]:2-[406*]-[408*]-[406*]

That's what we asked for, but it's not exactly readable.

We can get to a more readable form in a two step process

  1. Do a substructure match of the MCS onto a copied molecule
  2. Generate SMILES for the original molecule, using only the atoms that matched in the copy.

This works because we know that the atom indices in the copies and the original molecules are the same.

Start by getting the match

In [6]:
mcsp = Chem.MolFromSmarts(mcs.smarts)
match = nms[0].GetSubstructMatch(mcsp)

And now use Chem.MolFragmentToSmiles to generate the actual SMILES. This function generates the SMILES for a user-specified subset of a molecule.

In [7]:
smi=Chem.MolFragmentToSmiles(ms[0],atomsToUse=match,isomericSmiles=True,canonical=False)
print smi
COc1ccc(C(Nc2nc3c(ncn3COC)c(=O)[nH]2)(c2ccccc2)c2ccccc2)cc1

In [8]:
core = Chem.MolFromSmiles(smi)
core
Out[8]:

Note that in this particular case, the custom atom types don't make a difference in the MCS that is found.

1 comment:

Evan said...

These diagrams are not standard use case scenarios. It would have been even better if there were use case diagrams with actors for better understanding this tutorial. Thanks for posting!