Comparing fingerprints to each other. Part 1
Goal: Look at the differences between different similarity methods.
This uses a set of pairs of molecules that have a baseline similarity: a Tanimoto similarity using count-based Morgan0 fingerprints of at least 0.7. The construction of this set was presented in an earlier post: http://rdkit.blogspot.com/2013/10/building-similarity-comparison-set-goal.html.
Note: this notebook and the data it uses/generates can be found in the github repo: https://github.com/greglandrum/rdkit_blog
Set up
Do the usual imports, read in the molecules, set up the fingerprints we'll compare, and calculate the similarities between the pairs of molecules using those fingerprints.
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors
from rdkit.Avalon import pyAvalonTools
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit import rdBase
from rdkit import DataStructs
from collections import defaultdict
import cPickle,random,gzip
import scipy as sp
import pandas
from scipy import stats
from IPython.core.display import display,HTML,Javascript
# Record the RDKit version used for this run (single-arg print() behaves
# identically under Python 2 and Python 3, unlike the py2-only statement form).
print(rdBase.rdkitVersion)
# Read the pre-built comparison set: each whitespace-separated line holds
# two (id, SMILES) records -- fields 0/1 for the first molecule of the pair,
# fields 2/3 for the second.
# NOTE(review): Python 2 code -- gzip.open yields text here; under Python 3
# this would need mode 'rt'. Confirm if porting.
ind = [x.split() for x in gzip.open('../data/chembl16_25K.pairs.txt.gz')]
ms1 = []
ms2 = []
for row in ind:  # the enumerate() index used previously was never read
    m1 = Chem.MolFromSmiles(row[1])
    ms1.append((row[0], m1))
    m2 = Chem.MolFromSmiles(row[3])
    ms2.append((row[2], m2))
# Table of fingerprints to compare: each entry is a (generator, label) pair.
# The generator maps an RDKit Mol to a fingerprint object; the label becomes
# the key into scoredLists below.
methods = [
    # RDKit subgraph/path fingerprints with increasing maximum path length
    (lambda x:Chem.RDKFingerprint(x,maxPath=4),'RDKit4'),
    (lambda x:Chem.RDKFingerprint(x,maxPath=5),'RDKit5'),
    (lambda x:Chem.RDKFingerprint(x,maxPath=6),'RDKit6'),
    (lambda x:Chem.RDKFingerprint(x,maxPath=7),'RDKit7'),
    # the same, restricted to linear (unbranched) paths
    (lambda x:Chem.RDKFingerprint(x,maxPath=4,branchedPaths=False),'RDKit4-linear'),
    (lambda x:Chem.RDKFingerprint(x,maxPath=5,branchedPaths=False),'RDKit5-linear'),
    (lambda x:Chem.RDKFingerprint(x,maxPath=6,branchedPaths=False),'RDKit6-linear'),
    (lambda x:Chem.RDKFingerprint(x,maxPath=7,branchedPaths=False),'RDKit7-linear'),
    # count-based Morgan fingerprints, radius 0-3
    (lambda x:rdMolDescriptors.GetMorganFingerprint(x,0),'MFP0'),
    (lambda x:rdMolDescriptors.GetMorganFingerprint(x,1),'MFP1'),
    (lambda x:rdMolDescriptors.GetMorganFingerprint(x,2),'MFP2'),
    (lambda x:rdMolDescriptors.GetMorganFingerprint(x,3),'MFP3'),
    # feature-based (pharmacophoric invariants) Morgan fingerprints
    (lambda x:rdMolDescriptors.GetMorganFingerprint(x,0,useFeatures=True),'FeatMFP0'),
    (lambda x:rdMolDescriptors.GetMorganFingerprint(x,1,useFeatures=True),'FeatMFP1'),
    (lambda x:rdMolDescriptors.GetMorganFingerprint(x,2,useFeatures=True),'FeatMFP2'),
    (lambda x:rdMolDescriptors.GetMorganFingerprint(x,3,useFeatures=True),'FeatMFP3'),
    # hashed (fixed-length) Morgan fingerprints
    (lambda x:rdMolDescriptors.GetHashedMorganFingerprint(x,0),'MFP0-bits'),
    (lambda x:rdMolDescriptors.GetHashedMorganFingerprint(x,1),'MFP1-bits'),
    (lambda x:rdMolDescriptors.GetHashedMorganFingerprint(x,2),'MFP2-bits'),
    (lambda x:rdMolDescriptors.GetHashedMorganFingerprint(x,3),'MFP3-bits'),
    # atom pairs and topological torsions, count-based and bit-vector forms
    (lambda x:rdMolDescriptors.GetAtomPairFingerprint(x),'AP'),
    (lambda x:rdMolDescriptors.GetTopologicalTorsionFingerprint(x),'TT'),
    (lambda x:rdMolDescriptors.GetHashedAtomPairFingerprintAsBitVect(x),'AP-bits'),
    (lambda x:rdMolDescriptors.GetHashedTopologicalTorsionFingerprintAsBitVect(x),'TT-bits'),
    # MACCS keys and Avalon fingerprints at two bit sizes
    (lambda x:rdMolDescriptors.GetMACCSKeysFingerprint(x),'MACCS'),
    (lambda x:pyAvalonTools.GetAvalonFP(x,512),'Avalon-512'),
    (lambda x:pyAvalonTools.GetAvalonFP(x,1024),'Avalon-1024'),
    ]
# For every fingerprint method, compute the Tanimoto similarity of each
# molecule pair.  scoredLists maps label -> list of (similarity, pair index).
scoredLists = {}
for method, nm in methods:
    if nm not in scoredLists:  # has_key() is gone in py3; 'in' works everywhere
        print('Doing: %s' % nm)
        rl = []
        for i, (m1, m2) in enumerate(zip(ms1, ms2)):
            fp1 = method(m1[-1])  # m1/m2 are (id, Mol) tuples; [-1] is the Mol
            fp2 = method(m2[-1])
            sim = DataStructs.TanimotoSimilarity(fp1, fp2)
            rl.append((sim, i))
        scoredLists[nm] = rl
# Close the gzip stream explicitly so the pickle is fully flushed to disk;
# the previous bare gzip.open(...) handle was never closed.
with gzip.open('../data/chembl16_25K.pairs.sims.pkl.gz', 'wb') as outf:
    cPickle.dump(scoredLists, outf)
Set up the comparison code
# Reload the similarity lists computed above, so the analysis below can be
# rerun without recomputing all the fingerprints.
scoredLists = cPickle.load(gzip.open('../data/chembl16_25K.pairs.sims.pkl.gz','rb'))
def directCompare(scoredLists, fp1, fp2, plotIt=True, silent=False):
    """Compare the pair-similarity rankings produced by two fingerprints.

    Arguments:
      scoredLists: dict mapping fingerprint label -> list of
        (similarity, pair index) tuples, all over the same pairs
      fp1, fp2: labels of the two fingerprints to compare
      plotIt: if True, draw a scatter plot of the two similarity sets
        with an x=y reference line (uses the notebook's pylab globals)
      silent: if False, print the statistics and their p-values

    Returns (Kendall tau, Spearman rho, Pearson r).
    """
    # similarity values only; the unused rank-index lists computed by the
    # previous version have been dropped
    vl1 = [x[0] for x in scoredLists[fp1]]
    vl2 = [x[0] for x in scoredLists[fp2]]
    if plotIt:
        _ = scatter(vl1, vl2, edgecolors='none')
        maxv = max(max(vl1), max(vl2))
        minv = min(min(vl1), min(vl2))
        _ = plot((minv, maxv), (minv, maxv), color='k', linestyle='-')
        xlabel(fp1)
        ylabel(fp2)
    tau, tau_p = stats.kendalltau(vl1, vl2)
    spearman_rho, spearman_p = stats.spearmanr(vl1, vl2)
    pearson_r, pearson_p = stats.pearsonr(vl1, vl2)
    if not silent:
        print('%s %s %s %s %s %s %s %s' % (fp1, fp2, tau, tau_p,
                                           spearman_rho, spearman_p,
                                           pearson_r, pearson_p))
    return tau, spearman_rho, pearson_r
And now compare a few methods to each other
Start with two very closely related fingerprints:
# Count-based vs. hashed bit-vector form of the same radius-0 Morgan
# fingerprint: the rankings should agree very closely.
_=directCompare(scoredLists,'MFP0','MFP0-bits')
What about two different Morgan fingerprint radii?
# Morgan fingerprints of radius 1 vs. radius 2.
_=directCompare(scoredLists,'MFP1','MFP2')
And a couple of RDKit fingerprint sizes (maximum path lengths)
# RDKit path fingerprints with maximum path lengths 4 vs. 6.
_=directCompare(scoredLists,'RDKit4','RDKit6')
Do all the comparisons so that we can do some statistics on them
# Run directCompare for every unordered pair of fingerprints (no plots, no
# printing) and pickle the results.
# NOTE: despite the dict name, the first statistic is Kendall's tau, not
# Cohen's kappa; the dict name is kept because later cells unpickle into it.
ks = sorted(scoredLists.keys())
kappas = {}
spearmans = {}
pearsons = {}
for i, ki in enumerate(ks):
    for kj in ks[i + 1:]:
        tau, spearman, pearson = directCompare(scoredLists, ki, kj,
                                               plotIt=False, silent=True)
        kappas[(ki, kj)] = tau
        spearmans[(ki, kj)] = spearman
        pearsons[(ki, kj)] = pearson
# close the gzip stream explicitly so the pickle is fully flushed to disk
with gzip.open('../data/chembl16_25K.pairs.sim_workup.pkl.gz', 'wb') as outf:
    cPickle.dump((ks, kappas, spearmans, pearsons), outf)
# Reload the pairwise-comparison workup so the analysis below can start here.
(ks,kappas,spearmans,pearsons)=cPickle.load(gzip.open('../data/chembl16_25K.pairs.sim_workup.pkl.gz','rb'))
Load the data into a Pandas dataframe
# Flatten the three result dicts (all keyed by (Sim1, Sim2) label pairs)
# into a tidy DataFrame with one row per fingerprint pair.
rows = []
for (s1, s2), tau in kappas.items():
    rows.append([s1, s2, tau, spearmans[(s1, s2)], pearsons[(s1, s2)]])
df = pandas.DataFrame(data=rows, columns=['Sim1', 'Sim2', 'Tau', 'Spearman', 'Pearson'])
# DataFrame.sort(columns=...) was removed in pandas 0.20;
# sort_values(by=...) is the equivalent modern call.
df.sort_values(by='Sim1', inplace=True)
df.head()
Let's get a feeling for what the correlations look like for various tau values.
# Example scatter plots spanning a range of tau values, from strongly
# correlated fingerprint pairs down to weakly correlated ones.
# figure/subplot/title come from the notebook's pylab environment.
figure(figsize=(24,4))
subplot(1,5,1)
tau,s,p=directCompare(scoredLists,'MFP1','MFP1-bits',silent=True)
title('tau=%.2f'%tau)
subplot(1,5,2)
tau,s,p=directCompare(scoredLists,'FeatMFP1','FeatMFP3',silent=True)
title('tau=%.2f'%tau)
subplot(1,5,3)
tau,s,p=directCompare(scoredLists,'Avalon-1024','RDKit6',silent=True)
title('tau=%.2f'%tau)
subplot(1,5,4)
tau,s,p=directCompare(scoredLists,'RDKit7','TT',silent=True)
title('tau=%.2f'%tau)
subplot(1,5,5)
tau,s,p=directCompare(scoredLists,'MFP0','RDKit7',silent=True)
_=title('tau=%.2f'%tau)
That last one is somewhat artificial due to the lower bound on MFP0 enforced by the data set.
There's not much correlation left at tau=0.25.
Similar Fingerprints: all the pairs of fingerprints where tau>0.85
# Fingerprint pairs whose rankings agree closely (tau > 0.85), best first.
# DataFrame.sort(columns=...) was removed in pandas 0.20;
# sort_values(by=...) is the equivalent call.
HTML(df[df.Tau > 0.85].sort_values(by='Tau', ascending=False).to_html(
        float_format=lambda x: '%4.3f' % x,
        classes="table display"))
Nothing terribly surprising there.
Different Fingerprints: all the pairs of fingerprints where tau<0.3
We also exclude all of the MFP0 variants.
# Pairs with low rank correlation (tau < 0.3), excluding the MFP0 variants.
# A single combined boolean mask replaces the original chained
# df[...][~df...][~df...] indexing, which applied full-frame masks to a
# slice (pandas reindexes them with a UserWarning); one mask is equivalent
# and explicit.  sort(columns=...) was removed in pandas 0.20.
excluded = ('MFP0', 'MFP0-bits', 'FeatMFP0')
subset = df[(df.Tau < 0.3) &
            ~df.Sim1.isin(excluded) &
            ~df.Sim2.isin(excluded)]
HTML(subset.sort_values(by='Tau', ascending=False).to_html(
        float_format=lambda x: '%4.3f' % x,
        classes="table display"))
What about methods that work well for similarity-based virtual screening?
Look at the methods that we found to be "best" as measured by AUC for similarity-based virtual screening in our benchmarking paper (http://www.jcheminf.com/content/5/1/26 ). The table itself is here: http://www.jcheminf.com/content/5/1/26/table/T1
I've got best in quotes here because there wasn't a statistically significant difference in performance.
# Restrict to the four fingerprints that did "best" in the virtual-screening
# benchmark.  A single combined mask replaces the warning-prone chained
# boolean indexing of the original.
best = ('AP', 'Avalon-1024', 'TT', 'RDKit5')
subset = df[df.Sim1.isin(best) & df.Sim2.isin(best)]
HTML(subset.to_html(float_format=lambda x: '%4.3f' % x,
                    classes="table display"))
That is the correlation over the entire range of similarities. What about if we just look at the top pairs for each fingerprint?
# Keep the top nToDo pairs (by similarity) for each of the four fingerprints,
# then build reduced scored lists restricted to the union of those picks.
nToDo = 200
apl = sorted(scoredLists['AP'], reverse=True)[:nToDo]
ttl = sorted(scoredLists['TT'], reverse=True)[:nToDo]
avl = sorted(scoredLists['Avalon-1024'], reverse=True)[:nToDo]
rdkl = sorted(scoredLists['RDKit5'], reverse=True)[:nToDo]
idsToKeep = set()
for topList in (apl, ttl, avl, rdkl):
    idsToKeep.update(x[1] for x in topList)  # x is (similarity, pair index)
print(len(idsToKeep))
limitedLists = {}
for fp in ('AP', 'TT', 'Avalon-1024', 'RDKit5'):
    limitedLists[fp] = [scoredLists[fp][x] for x in idsToKeep]
# Pairwise rank correlations between the four fingerprints, restricted to
# the union of their top picks (limitedLists built in the previous cell).
figure(figsize=(30,4))
subplot(1,6,1)
tau,s,p=directCompare(limitedLists,'AP','TT',silent=True)
title('tau=%.2f'%tau)
subplot(1,6,2)
tau,s,p=directCompare(limitedLists,'AP','Avalon-1024',silent=True)
title('tau=%.2f'%tau)
subplot(1,6,3)
tau,s,p=directCompare(limitedLists,'AP','RDKit5',silent=True)
title('tau=%.2f'%tau)
subplot(1,6,4)
tau,s,p=directCompare(limitedLists,'TT','Avalon-1024',silent=True)
title('tau=%.2f'%tau)
subplot(1,6,5)
tau,s,p=directCompare(limitedLists,'TT','RDKit5',silent=True)
title('tau=%.2f'%tau)
subplot(1,6,6)
tau,s,p=directCompare(limitedLists,'Avalon-1024','RDKit5',silent=True)
_=title('tau=%.2f'%tau)
The Tau values are still pretty low. The rankings from these fingerprints tend to have a low correlation with each other.
The comparison in the benchmarking paper showed, on the other hand, that across a broad range of data sets the fingerprints perform at about the same level when it comes to enrichment. It seems like there's either a contradiction or this set of pairs isn't particularly representative of what we used for that paper.
Even more concrete: look at the number of overlapping pairs in that pick
Look at the overlap between the top picks of those fingerprints.
# Top-200 picks again, this time counting how much the four pick sets overlap.
nToDo = 200
apl = sorted(scoredLists['AP'], reverse=True)[:nToDo]
ttl = sorted(scoredLists['TT'], reverse=True)[:nToDo]
avl = sorted(scoredLists['Avalon-1024'], reverse=True)[:nToDo]
rdkl = sorted(scoredLists['RDKit5'], reverse=True)[:nToDo]
idsToKeep = set()
for topList in (apl, ttl, avl, rdkl):
    idsToKeep.update(x[1] for x in topList)  # x is (similarity, pair index)
print('Overall number: %d' % len(idsToKeep))
ids = {}
ids['AP'] = set(x[1] for x in apl)
ids['TT'] = set(x[1] for x in ttl)
ids['Avalon-1024'] = set(x[1] for x in avl)
ids['RDKit5'] = set(x[1] for x in rdkl)
ks = sorted(ids.keys())
for i, k1 in enumerate(ks):
    for k2 in ks[i + 1:]:
        overlap = len(ids[k1].intersection(ids[k2]))
        # fraction relative to the pick-list size (len(apl) == nToDo here)
        print('%s %s %d %.2f' % (k1, k2, overlap, float(overlap) / len(apl)))
So each of those sets of picks has a good fraction (>40%) of different compounds. Nice!
Repeat that for fewer picks:
# Same overlap analysis with a smaller pick list (top 100 per fingerprint).
nToDo = 100
apl = sorted(scoredLists['AP'], reverse=True)[:nToDo]
ttl = sorted(scoredLists['TT'], reverse=True)[:nToDo]
avl = sorted(scoredLists['Avalon-1024'], reverse=True)[:nToDo]
rdkl = sorted(scoredLists['RDKit5'], reverse=True)[:nToDo]
idsToKeep = set()
for topList in (apl, ttl, avl, rdkl):
    idsToKeep.update(x[1] for x in topList)  # x is (similarity, pair index)
print('Overall number: %d' % len(idsToKeep))
ids = {}
ids['AP'] = set(x[1] for x in apl)
ids['TT'] = set(x[1] for x in ttl)
ids['Avalon-1024'] = set(x[1] for x in avl)
ids['RDKit5'] = set(x[1] for x in rdkl)
ks = sorted(ids.keys())
for i, k1 in enumerate(ks):
    for k2 in ks[i + 1:]:
        overlap = len(ids[k1].intersection(ids[k2]))
        # fraction relative to the pick-list size (len(apl) == nToDo here)
        print('%s %s %d %.2f' % (k1, k2, overlap, float(overlap) / len(apl)))
Still a significant number of unique compounds when considering pairwise overlaps. AP--TT is, of course, something of an exception.
Idle Curiosity: Difference between the correlation coefficients
def _plotCorrelationPair(df, xk, yk):
    """Scatter df[yk] vs df[xk] with an x=y reference line.

    Used to eyeball how the three correlation measures computed above
    (Tau, Spearman, Pearson) relate to each other.
    """
    tplt = df.plot(x=xk, y=yk, style='o')
    minV = min(min(df[xk]), min(df[yk]))
    maxV = max(max(df[xk]), max(df[yk]))
    tplt.plot((minV, maxV), (minV, maxV))
    xlabel(xk)  # pylab globals from the notebook environment
    ylabel(yk)

# The original repeated the same eight lines three times; the helper keeps
# the three plots identical by construction.
_plotCorrelationPair(df, 'Tau', 'Spearman')
_plotCorrelationPair(df, 'Tau', 'Pearson')
_plotCorrelationPair(df, 'Spearman', 'Pearson')
Here's the full set of results... there are a lot
# Full table of all pairwise comparisons, best-correlated first.
# (display/HTML/Javascript were already imported at the top of the notebook;
# the re-import keeps this cell runnable on its own.)
from IPython.core.display import display, HTML, Javascript
# DataFrame.sort(columns=...) was removed in pandas 0.20;
# sort_values(by=...) is the equivalent call.
HTML(df.sort_values(by='Tau', ascending=False).to_html(
        float_format=lambda x: '%4.3f' % x,
        classes="table display"))