Friday, January 3, 2020

Similarity maps with the new drawing code

As part of the 2019.09 release we added a C++ implementation of the RDKit's similarity map functionality (https://jcheminf.biomedcentral.com/articles/10.1186/1758-2946-5-43). I forgot to mention this during the "What's New" bit of my presentation at the UGM, but I think it's worth calling attention to. So here's a quick blog post.

In [1]:
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import SimilarityMaps
from IPython.display import SVG
import io
from PIL import Image
import numpy as np
import rdkit
print(rdkit.__version__)
RDKit WARNING: [11:53:45] Enabling RDKit 2019.09.2 jupyter extensions
2019.09.2

I start by using "classic" similarity map functionality to show why atorvastatin (Lipitor) and rosuvastatin (Crestor) are similar to each other when using the Morgan fingerprint.

Here are the two molecules:

In [2]:
atorvastatin = Chem.MolFromSmiles('O=C(O)C[C@H](O)C[C@H](O)CCn2c(c(c(c2c1ccc(F)cc1)c3ccccc3)C(=O)Nc4ccccc4)C(C)C')
rosuvastatin = Chem.MolFromSmiles('OC(=O)C[C@H](O)C[C@H](O)\C=C\c1c(C(C)C)nc(N(C)S(=O)(=O)C)nc1c2ccc(F)cc2')
Draw.MolsToGridImage((atorvastatin,rosuvastatin))
Out[2]:

To use the new drawing code, we create a Draw2D object and pass that to SimilarityMaps.GetSimilarityMapForFingerprint:

In [3]:
def show_png(data):
    bio = io.BytesIO(data)
    img = Image.open(bio)
    return img
In [4]:
d = Draw.MolDraw2DCairo(400, 400)
_, maxWeight = SimilarityMaps.GetSimilarityMapForFingerprint(atorvastatin, rosuvastatin, 
                                        lambda m, i: SimilarityMaps.GetMorganFingerprint(m, i, radius=2, fpType='bv'), 
                                        draw2d=d)
d.FinishDrawing()
show_png(d.GetDrawingText())
Out[4]:

We can do the same thing with count-based fingerprints:

In [5]:
d = Draw.MolDraw2DCairo(400, 400)
_, maxWeight = SimilarityMaps.GetSimilarityMapForFingerprint(atorvastatin, rosuvastatin, 
                                        lambda m, i: SimilarityMaps.GetMorganFingerprint(m, i, radius=2, fpType='count'), 
                                        draw2d=d)
d.FinishDrawing()
show_png(d.GetDrawingText())
Out[5]:

The other GetSimilarityMapFrom... functions also accept the optional draw2d argument. Here's a visualization of the contributions made by the atoms in atorvastatin to its calculatied logp value:

In [6]:
from rdkit.Chem import rdMolDescriptors
ator_contribs = rdMolDescriptors._CalcCrippenContribs(atorvastatin)
d = Draw.MolDraw2DCairo(400, 400)
SimilarityMaps.GetSimilarityMapFromWeights(atorvastatin,[x[0] for x in ator_contribs],draw2d=d)
d.FinishDrawing()
show_png(d.GetDrawingText())
Out[6]:

And a couple more visualizations of various partial charge schemes.

Starting with Gasteiger-Marsilli charges:

In [7]:
from rdkit.Chem import rdPartialCharges
rdPartialCharges.ComputeGasteigerCharges(atorvastatin)
chgs = [x.GetDoubleProp("_GasteigerCharge") for x in atorvastatin.GetAtoms()]
d = Draw.MolDraw2DCairo(400, 400)
SimilarityMaps.GetSimilarityMapFromWeights(atorvastatin,chgs,draw2d=d)
d.FinishDrawing()
show_png(d.GetDrawingText())
Out[7]:

And also the partial charges calculated with extended Hueckel theory (eHT) using Mulliken analysis:

In [8]:
from rdkit.Chem import rdEHTTools
from rdkit.Chem import rdDistGeom
mh = Chem.AddHs(atorvastatin)
rdDistGeom.EmbedMolecule(mh)
_,res = rdEHTTools.RunMol(mh)
static_chgs = res.GetAtomicCharges()[:atorvastatin.GetNumAtoms()]
d = Draw.MolDraw2DCairo(400, 400)
SimilarityMaps.GetSimilarityMapFromWeights(atorvastatin,list(static_chgs),draw2d=d)
d.FinishDrawing()
show_png(d.GetDrawingText())
Out[8]:

As one final demo, I'll use the method to visualize the variability of the eHT charges with conformation for atorvastatin.

Start by generating 10 diverse conformers, calculating the charges for each, and plotting the average:

In [9]:
mh = Chem.AddHs(atorvastatin)
ps = rdDistGeom.ETKDGv2()
ps.pruneRmsThresh = 0.5
ps.randomSeed = 0xf00d
rdDistGeom.EmbedMultipleConfs(mh,10,ps)
print(f'Found {mh.GetNumConformers()} conformers')
chgs = []
for conf in mh.GetConformers():
    _,res = rdEHTTools.RunMol(mh,confId=conf.GetId())
    chgs.append(res.GetAtomicCharges()[:atorvastatin.GetNumAtoms()])
chgs = np.array(chgs)
mean_chgs = np.mean(chgs,axis=0)
std_chgs = np.std(chgs,axis=0)
d = Draw.MolDraw2DCairo(400, 400)
SimilarityMaps.GetSimilarityMapFromWeights(atorvastatin,list(mean_chgs),draw2d=d)
d.FinishDrawing()
show_png(d.GetDrawingText())
Found 10 conformers
Out[9]:

That doesn't look hugely different from what we saw above.

To show the variability, plot the standard deviation of the charges across the 10 conformers:

In [10]:
print(std_chgs)
print(max(std_chgs),min(std_chgs))
d = Draw.MolDraw2DCairo(400, 400)
SimilarityMaps.GetSimilarityMapFromWeights(atorvastatin,list(std_chgs),draw2d=d)
d.FinishDrawing()
show_png(d.GetDrawingText())
[0.01292592 0.00743163 0.01971312 0.01433223 0.01063085 0.01283745
 0.01219511 0.00748435 0.01234194 0.01492494 0.00640842 0.02264999
 0.02481744 0.00987842 0.00843151 0.01289956 0.00560632 0.00498617
 0.00604883 0.005569   0.00452067 0.00796675 0.00718033 0.00581337
 0.00702613 0.00634237 0.00699789 0.00539868 0.00521868 0.02412709
 0.03131741 0.03709349 0.00657276 0.01175903 0.00674661 0.01012909
 0.0050995  0.01139418 0.00831795 0.00581207 0.00960073]
0.03709348867462464 0.00452067345998171
Out[10]:

The deviations aren't huge (the printed array shows that), but the largest value is clearly the amide N.

There's definitely a ToDo here to improve the way the negative contours are drawn so that the fact that they are being drawn with dashed lines is visible.

Note: The GitHub version of this post uses SVG instead of PNG. The SVG currently generated by the RDKit for similarity maps is quite inefficient and blogger wasn't up to the challenge, so I used PNGs here.

No comments: