Sunday, November 29, 2015

Using vagrant with the RDKit

This is more of a technical post that may be of interest to people doing RDKit development work. It's about getting a Vagrant box set up to build and test the RDKit.

Background

There are a lot of people using the RDKit via our KNIME integration. The KNIME nodes are an easy way to access a subset of the RDKit functionality in a nice graphical workflow environment. The nodes themselves, like KNIME, are written in Java and make use of the RDKit Java wrappers, so whenever we do a new release of the RDKit, I need to build new versions of those wrappers so that we can update the KNIME nodes. Since we have KNIME users who are stuck using older versions of Linux, I try to maintain compatibility back as far as RHEL5 (released 8.5 years ago, but still somewhat supported by Red Hat). This means that I need to build the Linux version of the Java wrappers with an ancient version of gcc: v4.1. Ubuntu (the Linux distribution that I normally use for RDKit development) hasn't supported gcc v4.1 since the 10.04 release (which is itself no longer supported), so I also need an old Linux install. I used to keep a couple of VMware images around for the builds, one 32-bit and one 64-bit, but this became a pain, so I wanted to try something else.

Enter Vagrant

Vagrant is a tool for creating reproducible development environments. There's a lot more information, including some excellent documentation, on the Vagrant website.

I started by installing VirtualBox on my current development machine (running Ubuntu 15.04). I tracked down a couple of Vagrant boxes for Ubuntu 10.04 at https://atlas.hashicorp.com/boxes/search (mrgcastle/ubuntu-lucid32 and f500/ubuntu-lucid64). For each of these I created a local box. Here's the process for the 64-bit box:
mkdir lucid64
cd lucid64
vagrant box add f500/ubuntu-lucid64
vagrant init f500/ubuntu-lucid64
This downloads the box image and creates a Vagrantfile in the current directory. I then edited the resulting Vagrantfile to mount my local RDKit checkout (this isn't necessary; I could just as easily have pulled the code from GitHub):
# Share an additional folder to the guest VM. The first argument is
# the path on the host to the actual folder. The second argument is
# the path on the guest to mount the folder. And the optional third
# argument is a set of non-required options.
config.vm.synced_folder "/scratch/RDKit_git/", "/home/vagrant/RDKit_git"
Next I created a provisioning script, bootstrap.sh, that installs everything required to build the RDKit Java wrappers, and then brought the machine up:
vagrant up
A couple of minutes later I could connect to the machine (with vagrant ssh) and build the Java wrappers.
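The full Vagrantfile isn't reproduced here, so the following is only a minimal sketch of how the pieces could fit together. It assumes bootstrap.sh sits in the same directory as the Vagrantfile and is hooked in through Vagrant's built-in shell provisioner; that provisioning line is my assumption and not necessarily what the original file contains.

```ruby
# Sketch of a minimal Vagrantfile for the 64-bit Ubuntu 10.04 build box.
# Vagrant evaluates this file as Ruby; "2" selects the v2 configuration format.
Vagrant.configure("2") do |config|
  # The box added earlier with `vagrant box add f500/ubuntu-lucid64`
  config.vm.box = "f500/ubuntu-lucid64"

  # Mount the local RDKit checkout in the guest (host path first, guest path second)
  config.vm.synced_folder "/scratch/RDKit_git/", "/home/vagrant/RDKit_git"

  # Assumed hookup: run bootstrap.sh (next to this Vagrantfile) with the shell
  # provisioner the first time `vagrant up` creates the machine
  config.vm.provision "shell", path: "bootstrap.sh"
end
```

Keeping the setup steps in a separate script means that after editing bootstrap.sh, running vagrant provision re-applies it to an already-running machine, and vagrant destroy followed by vagrant up rebuilds the environment from scratch.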

Between that Vagrantfile and bootstrap.sh, I have everything I need to easily and reproducibly create a VM that is configured and ready to go to build the RDKit. Nice!

Given that such an ancient configuration probably isn't that useful to others, here's a Vagrantfile and bootstrap.sh for a 64-bit Ubuntu 12.04 box. This can be used on another Linux machine, a Mac, or Windows (I haven't actually tried Windows).