Friday, February 13, 2015

New Drawing Code

The 2015.03 release of the RDKit will contain a new set of C++-based functionality for drawing molecules. This code, based on a contribution from Dave Cosgrove, is much faster than the current python-based drawing code and provides much more flexibility and higher drawing quality than the current C++ code (which is used in Knime).

This post demonstrates some of the new functionality, which is currently available in the RDKit github repo.

The new code supports rendering to SVG, PNGs (using cairo), Qt, and wx (thanks to a contribution from Igor Filippov). It's pretty easy to add support for other backends as well. Here I'll show the SVG rendering.

In [1]:
from __future__ import print_function
from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from IPython.display import SVG
import time
print(time.asctime())
Sat Feb 14 07:48:33 2015

In [2]:
m = Chem.MolFromSmiles('c1cccnc1O')
In [3]:
from rdkit.Chem import rdDepictor
from rdkit.Chem.Draw import rdMolDraw2D
def moltosvg(mol,molSize=(450,150),kekulize=True):
    mc = Chem.Mol(mol.ToBinary())
    if kekulize:
        try:
            Chem.Kekulize(mc)
        except:
            mc = Chem.Mol(mol.ToBinary())
    if not mc.GetNumConformers():
        rdDepictor.Compute2DCoords(mc)
    drawer = rdMolDraw2D.MolDraw2DSVG(molSize[0],molSize[1])
    drawer.DrawMolecule(mc)
    drawer.FinishDrawing()
    svg = drawer.GetDrawingText()
    # It seems that the svg renderer used doesn't quite hit the spec.
    # Here are some fixes to make it work in the notebook, although I think
    # the underlying issue needs to be resolved at the generation step
    return svg.replace('svg:','')
In [4]:
SVG(moltosvg(m))
Out[4]:
N OH
In [5]:
import glob
pkls = glob.glob('/home/glandrum/Code/benchmarking_platform/compounds/ChEMBL_II/*.pkl')
from rdkit.six.moves import cPickle
sets = cPickle.load(file(pkls[0]))
In [6]:
smis=sets.values()[0]
In [7]:
ms = [Chem.MolFromSmiles(y) for x,y in smis]
In [8]:
SVG(moltosvg(ms[1]))
Out[8]:
N NH O O N O N

Look at controlling that in more detail.

In [9]:
m = ms[1]
rdDepictor.Compute2DCoords(m)
Out[9]:
0
In [10]:
drawer = rdMolDraw2D.MolDraw2DSVG(400,200)
drawer.DrawMolecule(m)
drawer.FinishDrawing()
svg = drawer.GetDrawingText().replace('svg:','')
SVG(svg)
Out[10]:
N NH O O N O N

Highlight a substructure:

In [11]:
drawer = rdMolDraw2D.MolDraw2DSVG(400,200)
drawer.DrawMolecule(m,highlightAtoms=m.GetSubstructMatch(Chem.MolFromSmarts('c1ncoc1')))
drawer.FinishDrawing()
svg = drawer.GetDrawingText().replace('svg:','')
SVG(svg)
Out[11]:
N NH O O N O N

Individiual atoms can also be highlighted:

In [12]:
highlight=list(m.GetSubstructMatch(Chem.MolFromSmarts('c1ncoc1'))) + [10,1]

drawer = rdMolDraw2D.MolDraw2DSVG(400,200)
drawer.DrawMolecule(m,highlightAtoms=highlight)
drawer.FinishDrawing()
svg = drawer.GetDrawingText().replace('svg:','')
SVG(svg)
Out[12]:
N NH O O N O N

The highlighting color can be changed:

In [13]:
highlight=[10,1,4,13]
colors={1:(1,0,0), 4:(0,1,0), 13:(0,0,1), 10:(1,0,1)}


drawer = rdMolDraw2D.MolDraw2DSVG(400,200)
drawer.DrawMolecule(m,highlightAtoms=highlight,highlightAtomColors=colors)
drawer.FinishDrawing()
svg = drawer.GetDrawingText().replace('svg:','')
SVG(svg)
Out[13]:
N NH O O N O N

As we saw above, the bonds between adjacent atoms will be highlighted:

In [14]:
highlight=[10,1,4,5,6]
colors={1:(1,0,0), 4:(0,1,0), 5:(0,0,1), 10:(1,0,1)}


drawer = rdMolDraw2D.MolDraw2DSVG(400,200)
drawer.DrawMolecule(m,highlightAtoms=highlight,highlightAtomColors=colors)
drawer.FinishDrawing()
svg = drawer.GetDrawingText().replace('svg:','')
SVG(svg)
Out[14]:
N NH O O N O N

But this can be turned off:

In [15]:
highlight=[10,1,4,5,6]
colors={1:(1,0,0), 4:(0,1,0), 5:(0,0,1), 10:(1,0,1)}


drawer = rdMolDraw2D.MolDraw2DSVG(400,200)
drawer.DrawMolecule(m,highlightAtoms=highlight,highlightBonds=[],highlightAtomColors=colors)
drawer.FinishDrawing()
svg = drawer.GetDrawingText().replace('svg:','')
SVG(svg)
Out[15]:
N NH O O N O N

We can also highlight just bonds:

In [16]:
highlight=[1,5,6]
colors={1:(1,0,0), 4:(0,1,0), 5:(0,0,1), 10:(1,0,1)}


drawer = rdMolDraw2D.MolDraw2DSVG(400,200)
drawer.DrawMolecule(m,highlightAtoms=[],highlightBonds=highlight,highlightBondColors=colors)
drawer.FinishDrawing()
svg = drawer.GetDrawingText().replace('svg:','')
SVG(svg)
Out[16]:
N NH O O N O N

Customizing atom labels:

In [17]:
drawer = rdMolDraw2D.MolDraw2DSVG(400,200)
opts = drawer.drawOptions()
opts.atomLabels[15]='a16'
opts.atomLabels[4]='a5'
opts.atomLabels[20]='a21'
drawer.DrawMolecule(m,highlightAtoms=[4,15,20])
drawer.FinishDrawing()
svg = drawer.GetDrawingText().replace('svg:','')
SVG(svg)
Out[17]:
N a5 NH O a16 O N a21 O N

Taking that to an extreme:

In [18]:
drawer = rdMolDraw2D.MolDraw2DSVG(400,200)
opts = drawer.drawOptions()

for i in range(m.GetNumAtoms()):
    opts.atomLabels[i] = m.GetAtomWithIdx(i).GetSymbol()+str(i)

drawer.DrawMolecule(m)
drawer.FinishDrawing()
svg = drawer.GetDrawingText().replace('svg:','')
SVG(svg)
Out[18]:
C0 N1 C2 C3 C4 C5 C6 C7 C8 N9 C10 O11 C12 C13 C14 C15 C16 O17 C18 N19 C20 C21 O22 C23 C24 N25 C26 C27

More to come!

One final note on performance.

Here's a quick comparison of the current Python-based renderer with the new C++ renderer for generating SVG:

In [19]:
from rdkit.six.moves import cStringIO
from rdkit.Chem import Draw
def moltosvg_old(mol,molSize=(450,150),kekulize=True):
    mc = Chem.Mol(mol.ToBinary())
    if kekulize:
        try:
            Chem.Kekulize(mc)
        except:
            mc = Chem.Mol(mol.ToBinary())
    if not mc.GetNumConformers():
        rdDepictor.Compute2DCoords(mc)

    sio = cStringIO()
    Draw.MolToFile(mc, sio, size=molSize, imageType="svg",
                           kekulize=False)
    svg = sio.getvalue()
    # It seems that the svg renderer used doesn't quite hit the spec.
    # Here are some fixes to make it work in the notebook, although I think
    # the underlying issue needs to be resolved at the generation step
    return svg.replace('svg:','')
In [20]:
%timeit [moltosvg_old(x) for x in ms]
10 loops, best of 3: 62.9 ms per loop

In [21]:
%timeit [moltosvg(x) for x in ms]
100 loops, best of 3: 7.47 ms per loop

Factors of 10 in performance should never be sneered at. :-)

In []:
 

1 comment:

Güngör Budak said...

I'm trying to generate SVG images from MOL objects and this method could work for me but I have a trusty distribution and only python-rdkit 201309-1 is available (http://packages.ubuntu.com/trusty/python-rdkit). Is it safe to build latest one on trusty? Can you make the latest build available through Ubuntu repos?