Comments on RDKit: Colliding Bits III (blog author: greg landrum)

Anonymous, 2016-02-26T06:29:32.422-08:00:

There is a related paper on how the bit-based nature of fingerprints affects clustering here:<br /><br /><a href="http://www.ncbi.nlm.nih.gov/pubmed/11206366" rel="nofollow">Ties in proximity and clustering compounds.</a><br /><br />The ties arise because fingerprints have a finite number of bits available to the union and intersection operators, which means there is only a finite number of Tanimoto values, say, that can actually be calculated. Collisions affect this behavior adversely, although, ironically, they can make identifying similar compounds with fingerprints more effective due to locality-sensitive hashing (i.e. similar molecules SHOULD have collisions more often). An RDKit-based example is here: http://chembl.blogspot.com/2015/08/lsh-based-similarity-search-in-mongodb.html
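
The point about a finite set of Tanimoto values can be illustrated with a small sketch. This is not RDKit code and is not taken from the paper: fingerprints are modelled as plain Python integers used as bit sets, the function names `tanimoto` and `distinct_tanimoto_values` are hypothetical, and returning 0 for two empty fingerprints is a convention assumed here for illustration only.

```python
from fractions import Fraction


def tanimoto(a: int, b: int) -> Fraction:
    """Tanimoto similarity of two fingerprints stored as ints (bit sets)."""
    union = bin(a | b).count("1")
    if union == 0:
        # Assumed convention for two empty fingerprints (not from the source).
        return Fraction(0)
    return Fraction(bin(a & b).count("1"), union)


def distinct_tanimoto_values(n_bits: int) -> set:
    """Every Tanimoto value that n_bits-long fingerprints can produce.

    The value is |intersection| / |union|, and both counts are integers
    between 0 and n_bits, so only these fractions are reachable.
    """
    values = set()
    for union in range(1, n_bits + 1):
        for inter in range(union + 1):
            values.add(Fraction(inter, union))
    return values


if __name__ == "__main__":
    for n in (4, 64, 2048):
        print(n, "bits ->", len(distinct_tanimoto_values(n)), "distinct values")

    # Two unrelated pairs of 4-bit fingerprints that tie at exactly 1/3.
    print(tanimoto(0b1100, 0b0110), tanimoto(0b0011, 0b0101))
```

Using exact fractions rather than floats keeps equal ratios such as 1/2 and 2/4 from being counted twice. In practice the number of set bits per molecule is far smaller than the fingerprint length, so the values actually observed are drawn from an even smaller set, which is where ties in proximity come from.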