Saturday, February 14, 2015

Feels like getting something for nothing...

Background

One of the focus points for the 2015.03 RDKit release is improving performance. To this end we've made changes that mitigate or remove some of the performance bottlenecks. These include, among others, modifications to the way SMILES are generated, rearranging the way the molecular GetProp/SetProp interface is used internally, and making the RDKit molecule smaller so that less memory is required. There are a couple of other changes coming; I think there should be a nice increase in the speed of common operations when the new version is released.

Getting something for nothing

Brian Kelley pointed out that using tcmalloc instead of the system-provided malloc implementation can lead to big speedups. It's super-easy to test (just add LD_PRELOAD=/usr/local/lib/libtcmalloc.so to the command line), so I gave it a try with the RDKit. Wow, did it make a difference!
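If you want to confirm that the preload actually took effect, one option on Linux is to look at the libraries mapped into the running process. Here's a minimal sketch of that idea; the helper name preloaded_allocator is made up for illustration, and it assumes the allocator library's file name contains "tcmalloc" or "jemalloc".

```python
from pathlib import Path


def preloaded_allocator(maps_text):
    """Return 'tcmalloc', 'jemalloc', or None for the contents of /proc/<pid>/maps.

    Hypothetical helper: it only inspects the file names of mapped libraries.
    """
    for line in maps_text.splitlines():
        fields = line.split()
        # Lines for file-backed mappings have a sixth field holding the path.
        if len(fields) < 6:
            continue
        file_name = fields[-1].rsplit("/", 1)[-1]
        for allocator in ("tcmalloc", "jemalloc"):
            if allocator in file_name:
                return allocator
    return None


if __name__ == "__main__":
    maps = Path("/proc/self/maps")  # Linux only
    if maps.exists():
        print(preloaded_allocator(maps.read_text()))
```

Run with LD_PRELOAD set, this should report the allocator that was preloaded; without it, it reports None unless the interpreter was already linked against one of them.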
Many of the tests in the RDKit's basic Python performance suite run too quickly to say much about performance, so I created a second performance suite that runs larger tests where it makes sense. This isn't yet complete - I need to add some reasonably sized tests of the conformation generation and force fields - but it's a decent start.
Here's a performance comparison for the current trunk status. The tests were run on my Linux box (a three-year-old Dell Studio XPS) running Ubuntu 14.04.
test                            default   with tcmalloc   fraction (tcmalloc/default)
50K mols from SMILES               21.7            12.7   0.59
generate SMILES                    12.7             6.5   0.51
10x1K mols from SDF                 8.2             5.6   0.68
823 queries from SMILES             0.1             0.1   1.00
  HasSubstructMatch               102.0            80.9   0.79
  GetSubstructMatches             115.3            91.9   0.80
428 queries from SMARTS             0.0             0.0   0.0
  HasSubstructMatch               287.0           239.6   0.83
  GetSubstructMatches             288.2           240.8   0.84
generate Mol blocks                37.8            24.5   0.65
BRICS decomposition                79.4            53.6   0.68
generate 2D coords                 27.1            23.9   0.88
generate RDKit fingerprints       148.4            80.8   0.54
generate Morgan fingerprints        7.5             3.8   0.51
That's a pretty dramatic speedup for no work at all!
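For anyone checking the numbers: the fraction column matches the tcmalloc time divided by the default time, rounded to two places. A quick sketch using a few rows copied from the results above (the dictionary and function names are just for illustration):

```python
# (default, with tcmalloc) timings copied from the results above
timings = {
    "50K mols from SMILES": (21.7, 12.7),
    "generate SMILES": (12.7, 6.5),
    "generate RDKit fingerprints": (148.4, 80.8),
    "generate Morgan fingerprints": (7.5, 3.8),
}


def fraction(default, with_tcmalloc):
    """Time with tcmalloc as a fraction of the default time."""
    return round(with_tcmalloc / default, 2)


for name, (default, with_tcmalloc) in timings.items():
    print(f"{name}: {fraction(default, with_tcmalloc):.2f}")
```

A fraction near 0.5, as for SMILES generation and Morgan fingerprints, means the run took about half as long.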
It is, unfortunately, not possible to make using tcmalloc the default at RDKit build time: this would require that other programs using the RDKit shared libraries (Python, PostgreSQL, etc.) also be re-compiled to use tcmalloc. It's probably also not safe to use the LD_PRELOAD trick in your .bashrc, but setting it before starting a long-running process seems like it could be a real win.
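One way to keep the preload scoped to a single job, rather than the whole shell session, is a small launcher that sets LD_PRELOAD only in the child process's environment. This is a rough sketch, not a tested recipe: the library path is the one used above and may differ on your system, and the names TCMALLOC and env_with_preload are made up for illustration.

```python
import os
import subprocess
import sys

# Path taken from the post; adjust for wherever tcmalloc lives on your system.
TCMALLOC = "/usr/local/lib/libtcmalloc.so"


def env_with_preload(library, base_env=None):
    """Return a copy of the environment with `library` first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD")
    # Keep anything that was already preloaded; entries are colon-separated.
    env["LD_PRELOAD"] = f"{library}:{existing}" if existing else library
    return env


if __name__ == "__main__":
    # Hypothetical usage: python run_preloaded.py python my_long_job.py
    if len(sys.argv) > 1:
        result = subprocess.run(sys.argv[1:], env=env_with_preload(TCMALLOC))
        sys.exit(result.returncode)
```

Only the child process sees the modified environment, so the shell and everything else you start from it keep using the system malloc.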

1 comment:

Unknown said...

A bit late, but for posterity: under OS X, Facebook's jemalloc is a better replacement than tcmalloc. Homebrew has this, so

> brew install jemalloc
> DYLD_INSERT_LIBRARIES=/usr/local/lib/libjemalloc.dylib

and you are off to the races.