This one is a request for advice/expertise on performance tuning/compiler flag tweaking on Windows. The short story is that when the RDKit is built using Visual Studio on Windows it ends up being substantially slower than when it's built with g++ and run using the Windows Subsystem for Linux. This doesn't seem like it should be true, but I'm not an expert with either Visual Studio or Windows, so I'm asking for help.
Some more details:
When I've used the RDKit on Windows machines it has always seemed slower than it should. I've never really quantified that and so I've always just kind of shrugged and moved on. Now I've measured a real difference and I'd like to try and do something about it.
Some experiments that I did with Docker on Windows convinced me that the effect was real, but with the advent of Bash on Windows 10 (https://msdn.microsoft.com/en-us/commandline/wsl/install_guide) - an awesome thing, by the way - I have some real numbers.
The RDKit includes some code that I've used over the years to track the performance of some basic tasks. This script - https://github.com/rdkit/rdkit/blob/master/Regress/Scripts/timings.py - looks at a broad subset of RDKit functionality.
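For anyone who hasn't looked at the script, the general shape of a harness like this can be sketched in a few lines. This is a minimal, standard-library-only illustration and not the code in timings.py: the names `time_tasks` and `format_row` are hypothetical, and the two workloads are placeholders standing in for the real RDKit operations.

```python
import time


def time_tasks(tasks):
    """Run each (label, callable) pair once; return a list of (label, seconds)."""
    results = []
    for label, func in tasks:
        start = time.perf_counter()
        func()
        results.append((label, time.perf_counter() - start))
    return results


def format_row(results):
    """Format the elapsed times as one '||'-separated row, one decimal place each."""
    return " || ".join(f"{seconds:.1f}" for _, seconds in results)


# Placeholder workloads (assumptions for illustration only); the real tests
# would parse molecules, run substructure matches, and so on.
def build_squares():
    return [i * i for i in range(100_000)]


def sort_reversed():
    return sorted(range(100_000, 0, -1))


tasks = [("build squares", build_squares), ("sort reversed", sort_reversed)]
results = time_tasks(tasks)
print(format_row(results))
```

Running the same task list under each build and comparing the printed rows gives a per-test view of where the time goes.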
The tests are:
- construct 1000 molecules from sdf
- construct 1000 molecules from smiles
- construct 823 fragment molecules from SMARTS (smiles really)
- 1000 x 100 HasSubstructMatch (100 from t3)
- 1000 x 100 GetSubstructMatches (100 from t3)
- construct 428 queries from RLewis_smarts.txt
- 1000 x 428 HasSubstructMatch
- 1000 x 428 GetSubstructMatches
- Generate canonical SMILES for 1000 molecules
- Generate mol blocks for 1000 molecules
- RECAP decomposition of the 1000 molecules
- Generate 2D coordinates for the 1000 molecules
- Generate 3D coordinates for the 1000 molecules
- Optimize those conformations using UFF
- Generate unique subgraphs of length 6 for the 1000 molecules
- Generate RDK fingerprints for the 1000 molecules
- Optimize the conformations above (test 13) using MMFF
0.8 || 0.4 || 0.1 || 1.2 || 1.3 || 0.0 || 4.2 || 4.2 || 0.2 || 0.3 || 7.4 || 0.3 || 7.5 || 18.5 || 2.2 || 1.4 || 41.7
0.6 || 0.3 || 0.1 || 0.9 || 1.0 || 0.0 || 3.1 || 3.2 || 0.1 || 0.2 || 6.3 || 0.3 || 6.2 || 15.2 || 2.1 || 1.0 || 29.8
That's a real difference.
The Windows build is done using build files generated by cmake. It's a release mode build with Visual Studio using the flags: "/MD /O2 /Ob2 /D NDEBUG" (those are the defaults that cmake creates).
It doesn't seem right to me that the code generated by Visual Studio and running under Windows should be so much slower than the code generated by g++ and running under the Windows Linux subsystem. I'm hoping, for the good of all of the users of the RDKit on Windows, to find a tweak for the Visual C++ command-line options that produces faster compiled code.
For what it's worth, here's a different set of benchmarks, run on a larger set of molecules. The script is here (https://github.com/rdkit/rdkit/blob/master/Regress/Scripts/new_timings.py):
- construct 50K molecules from SMILES
- generate canonical SMILES for those
- construct 10K molecules from SDF
- construct 823 fragment molecules from SMARTS (smiles really)
- 60K x 100 HasSubstructMatch
- 60K x 100 GetSubstructMatches
- construct 428 queries from RLewis_smarts.txt
- 60K x 428 HasSubstructMatch
- 60K x 428 GetSubstructMatches
- Generate 60K mol blocks
- BRICS decomposition of the 60K molecules
- Generate 2D coords for the 60K molecules
- Generate RDKit fingerprints for the 60K molecules
- Generate Morgan (radius=2) fingerprints for the 60K molecules.
The timings show the same, at times dramatic, performance differences:
18.8 || 8.5 || 6.8 || 0.1 || 85.8 || 106.2 || 0.0 || 264.2 || 268.6 || 14.0 || 77.2 || 20.9 || 104.7 || 13.0
17.5 || 9.8 || 6.7 || 0.1 || 68.0 || 74.2 || 0.0 || 204.6 || 208.2 || 9.6 || 56.5 || 20.6 || 89.0 || 6.6
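To make the size of the gap easier to see, the per-test ratio between the two rows of each table can be computed directly. This is a small sketch using only the numbers reported above; the function name `ratios` and the variable names are my own, and tests where the second timing rounds to 0.0 are skipped rather than divided.

```python
def ratios(first, second):
    """Return first/second for each test, or None where the second timing is 0.0."""
    return [round(a / b, 2) if b > 0 else None for a, b in zip(first, second)]


# Timings copied from the two rows of the first (1000-molecule) table.
small_first = [0.8, 0.4, 0.1, 1.2, 1.3, 0.0, 4.2, 4.2, 0.2, 0.3,
               7.4, 0.3, 7.5, 18.5, 2.2, 1.4, 41.7]
small_second = [0.6, 0.3, 0.1, 0.9, 1.0, 0.0, 3.1, 3.2, 0.1, 0.2,
                6.3, 0.3, 6.2, 15.2, 2.1, 1.0, 29.8]

# Timings copied from the two rows of the second (larger) table.
large_first = [18.8, 8.5, 6.8, 0.1, 85.8, 106.2, 0.0, 264.2, 268.6,
               14.0, 77.2, 20.9, 104.7, 13.0]
large_second = [17.5, 9.8, 6.7, 0.1, 68.0, 74.2, 0.0, 204.6, 208.2,
                9.6, 56.5, 20.6, 89.0, 6.6]

for name, first, second in [("small", small_first, small_second),
                            ("large", large_first, large_second)]:
    for test_number, ratio in enumerate(ratios(first, second), start=1):
        print(f"{name} test {test_number}: {ratio}")
```

Because the timings are only reported to one decimal place, ratios for the very fast tests (a few tenths of a second) are noisy; the longer-running tests are the ones to pay attention to.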
If you have thoughts about what's going on here, please comment here, reach out on Twitter, Google+, or LinkedIn, or post to the mailing list.
Thanks!