Inspired by this RDKit-discuss question from Liz Wylie: http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg03676.html
from rdkit import Chem
from rdkit.Chem import MCS
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit import rdBase
print rdBase.rdkitVersion
Liz wanted to use custom atom types in the MCS code, yet still be able to get a readable SMILES for the MCS.
Here's a demonstration of the standard behavior of the MCS code for Liz's example.
smis=["COc1ccc(C(Nc2nc3c(ncn3COCC=O)c(=O)[nH]2)(c2ccccc2)c2ccccc2)cc1",
"COc1ccc(C(Nc2nc3c(ncn3COC(CO)(CO)CO)c(=O)[nH]2)(c2ccccc2)c2ccccc2)cc1"]
ms = [Chem.MolFromSmiles(x) for x in smis]
Draw.MolsToGridImage(ms,subImgSize=(300,300))
Start by defining a simple atom-type hash that combines atomic num and hybridization
def label(a):
return 100*int(a.GetHybridization())+a.GetAtomicNum()
The easiest way to use custom atom types is to set (bogus) isotope labels on the atoms in a copy of each molecule:
nms = [Chem.Mol(x) for x in ms]
for nm in nms:
for at in nm.GetAtoms():
at.SetIsotope(label(at))
And now run the MCS on the copy using the "istope" mode and print the results:
mcs=MCS.FindMCS(nms,atomCompare='isotopes')
print mcs.smarts
That's what we asked for, but it's not exactly readable.
We can get to a more readable form in a two step process
- Do a substructure match of the MCS onto a copied molecule
- Generate SMILES for the original molecule, using only the atoms that matched in the copy.
This works because we know that the atom indices in the copies and the original molecules are the same.
Start by getting the match
mcsp = Chem.MolFromSmarts(mcs.smarts)
match = nms[0].GetSubstructMatch(mcsp)
And now use Chem.MolFragmentToSmiles to generate the actual SMILES. This function generates the SMILES for a user-specified subset of a molecule.
smi=Chem.MolFragmentToSmiles(ms[0],atomsToUse=match,isomericSmiles=True,canonical=False)
print smi
core = Chem.MolFromSmiles(smi)
core
Note that in this particular case, the custom atom types don't make a difference in the MCS that is found.
1 comment:
These diagrams are not standard use case scenarios. It would have been even better if there were use case diagrams with actors for better understanding this tutorial. Thanks for posting!
Post a Comment