Tuesday, October 16, 2018

Using the new fingerprint bit rendering code

Motivation

A frequent desire when working with chemical fingerprints is to see what the indvidual bits actually "mean". For most of the fingerprinting algorithms in the RDKit this question only means something in the context of a molecule, so there have been a number of code samples floating around (including in the RDKit documentation) allowing the user to show the atoms/bonds in a molecule that are responsible for setting a particular bit. In the 2018.09 release we made this easier and added a few functions to the rdkit.Chem.Draw package that allow direct visualization of the atoms and bonds from a molecule that set bits from Morgan and RDKit fingerprints.
This notebook shows how to use this functionality. Note that since this is new code, it will only work in RDKit versions 2018.09 and later.
The code that is used for this is derived from the bit rendering code that Nadine Schneider wrote for CheTo.
Start by importing the required packages:
In [1]:
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw
We'll use epinephrine as the demo molecule:
In [2]:
epinephrine = Chem.MolFromSmiles('CNC[C@H](O)c1ccc(O)c(O)c1')
epinephrine
Out[2]:

Looking at Morgan bits

Generate a Morgan fingerprint and save information about the bits that are set using the bitInfo argument:
In [3]:
bi = {}
fp = rdMolDescriptors.GetMorganFingerprintAsBitVect(epinephrine, radius=2, bitInfo=bi)
# show 10 of the set bits:
list(fp.GetOnBits())[:10]
Out[3]:
[1, 80, 227, 315, 589, 606, 632, 807, 875, 1057]
In its simplest form, the new code lets you display the atomic environment that sets a particular bit. Here we will look at bit 589:
In [4]:
Draw.DrawMorganBit(epinephrine,589,bi)
Out[4]:
There's also a function, DrawMorganBits(), for drawing multiple bits at once (thanks to Pat Walters for suggesting this one):
In [5]:
tpls = [(epinephrine,x,bi) for x in fp.GetOnBits()]
Draw.DrawMorganBits(tpls[:12],molsPerRow=4,legends=[str(x) for x in fp.GetOnBits()][:12])
Out[5]:
Some notes about the rendering here:
  • The molecule fragment is drawn with the atoms in the same positions as in the original molecule.
  • The central atom is highlighted in blue.
  • Aromatic atoms are highlighted in yellow
  • Aliphatic ring atoms (none present here) are highlighted in dark gray
  • Atoms/bonds that are drawn in light gray indicate pieces of the structure that influence the atoms' connectivity invariants but that are not directly part of the fingerprint.
We can also take advantage of the interactivity features of the Jupyter notebook and create a widget allowing is to browse through the bits. Note that this does not work in the blog post or the github view of the notebook; you will need to download the notebook and try it yourself to see this interactivity.
In [6]:
from ipywidgets import interact,fixed,IntSlider
def renderFpBit(mol,bitIdx,bitInfo,fn):
    bid = bitIdx
    return(display(fn(mol,bid,bitInfo)))
In [8]:
interact(renderFpBit, bitIdx=list(bi.keys()),mol=fixed(epinephrine),
         bitInfo=fixed(bi),fn=fixed(Draw.DrawMorganBit));
This interactivity does not work in the blogger view, so here's an animated GIF of what it looks like:

Looking at RDKit bits

We can do the same thing with RDKit bits:
In [9]:
rdkbi = {}
rdkfp = Chem.RDKFingerprint(epinephrine, maxPath=5, bitInfo=rdkbi)
# show 10 of the set bits:
list(rdkfp.GetOnBits())[:10]
Out[9]:
[93, 103, 112, 122, 148, 149, 161, 166, 194, 208]
In [10]:
print(rdkfp.GetNumOnBits(),len(rdkbi))
128 128
In [11]:
Draw.DrawRDKitBit(epinephrine,103,rdkbi)
Out[11]:
In [12]:
tpls = [(epinephrine,x,rdkbi) for x in rdkbi]
Draw.DrawRDKitBits(tpls[:12],molsPerRow=4,legends=[str(x) for x in rdkbi][:12])
Out[12]:
Some notes about the rendering here:
  • The molecule fragment is drawn with the atoms in the same positions as in the original molecule.
  • All bonds that are drawn are part of the bit.
  • Aromatic atoms are highlighted in yellow
  • By default there is no additional highlighting for aliphatic ring atoms, but this can be changed using the nonAromaticColor optional argument.
We can, of course, use the same jupyter interactivity features here as we did for the Morgan fingerprint:
In [13]:
interact(renderFpBit, bitIdx=list(rdkbi.keys()),mol=fixed(epinephrine),
         bitInfo=fixed(rdkbi),fn=fixed(Draw.DrawRDKitBit));
This interactivity does not work in the blogger view, so here's an animated GIF of what it looks like: