Friday, July 10, 2020

Adding molecular metadata to PNGs

I really like the idea of having an image of a molecule that I can reconstruct the molecule from. It's been possible to do this with SVGs from the RDKit for a while by adding chemical metadata. There's a description of that here that shows both adding the metadata and constructing the molecule from it.

An aside: Since you can now add SVGs to PowerPoint documents, I was super excited about the idea of being able to have molecule images in my PowerPoint decks together with the metadata to reconstruct those molecules. Unfortunately, as of this writing at least, PowerPoint strips out all the metadata that it doesn't recognize, so the chemistry information all ends up being removed. Sad...

Anyway, back to the point of this blog post: I wanted to do the same thing with the PNGs that the RDKit can generate, but kept getting stuck on how to do this from the C++ side. Today I realized that it's quite easy to do it in Python and that this would possibly be useful to most RDKit users. Thus this post.

If anyone knows of a decent C++ snippet or library with a BSD-compatible license (this rules out exiv2, which otherwise looks like the perfect thing) that shows how to add metadata to a PNG, please do let me know.

In [1]:
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
IPythonConsole.molSize = 350,300
from PIL import Image
from io import BytesIO
import rdkit
print(rdkit.__version__)
2020.03.4
In [2]:
colchicine = Chem.MolFromSmiles('COc1cc2c(c(OC)c1OC)-c1ccc(OC)c(=O)cc1[C@@H](NC(C)=O)CC2')
colchicine
Out[2]:

A quick reminder on what the metadata in the SVG looks like:

In [3]:
dm = Draw.PrepareMolForDrawing(colchicine)
d2d = Draw.MolDraw2DSVG(450,400)
d2d.DrawMolecule(dm)
d2d.AddMoleculeMetadata(dm)
d2d.FinishDrawing()
svg = d2d.GetDrawingText()
with open('/tmp/blah.svg','w+') as outf:
    outf.write(svg)
In [4]:
!grep rdkit /tmp/blah.svg
                      xmlns:rdkit='http://www.rdkit.org/xml'
<rdkit:mol xmlns:rdkit = "http://www.rdkit.org/xml" version="0.9">
<rdkit:atom idx="1" atom-smiles="[CH3]" drawing-x="429.545" drawing-y="166.585" x="6.46024" y="1.03002" z="0" />
<rdkit:atom idx="2" atom-smiles="[O]" drawing-x="390.087" drawing-y="133.821" x="5.30621" y="1.98825" z="0" />
<rdkit:atom idx="3" atom-smiles="[c]" drawing-x="341.983" drawing-y="151.611" x="3.89934" y="1.46795" z="0" />
<rdkit:atom idx="4" atom-smiles="[cH]" drawing-x="302.525" drawing-y="118.847" x="2.74531" y="2.42618" z="0" />
<rdkit:atom idx="5" atom-smiles="[c]" drawing-x="254.421" drawing-y="136.637" x="1.33844" y="1.90588" z="0" />
<rdkit:atom idx="6" atom-smiles="[c]" drawing-x="245.776" drawing-y="187.191" x="1.0856" y="0.427343" z="0" />
<rdkit:atom idx="7" atom-smiles="[c]" drawing-x="285.235" drawing-y="219.955" x="2.23963" y="-0.530891" z="0" />
<rdkit:atom idx="8" atom-smiles="[O]" drawing-x="276.59" drawing-y="270.509" x="1.98679" y="-2.00943" z="0" />
<rdkit:atom idx="9" atom-smiles="[CH3]" drawing-x="316.048" drawing-y="303.273" x="3.14082" y="-2.96766" z="0" />
<rdkit:atom idx="10" atom-smiles="[c]" drawing-x="333.338" drawing-y="202.165" x="3.6465" y="-0.0105878" z="0" />
<rdkit:atom idx="11" atom-smiles="[O]" drawing-x="372.797" drawing-y="234.929" x="4.80053" y="-0.968822" z="0" />
<rdkit:atom idx="12" atom-smiles="[CH3]" drawing-x="364.152" drawing-y="285.483" x="4.54769" y="-2.44736" z="0" />
<rdkit:atom idx="13" atom-smiles="[c]" drawing-x="200.861" drawing-y="211.952" x="-0.228013" y="-0.296833" z="0" />
<rdkit:atom idx="14" atom-smiles="[cH]" drawing-x="215.007" drawing-y="261.251" x="0.1857" y="-1.73865" z="0" />
<rdkit:atom idx="15" atom-smiles="[cH]" drawing-x="185.283" drawing-y="303.047" x="-0.683614" y="-2.96106" z="0" />
<rdkit:atom idx="16" atom-smiles="[c]" drawing-x="134.073" drawing-y="305.868" x="-2.18134" y="-3.04357" z="0" />
<rdkit:atom idx="17" atom-smiles="[O]" drawing-x="114.396" drawing-y="353.231" x="-2.75685" y="-4.42878" z="0" />
<rdkit:atom idx="18" atom-smiles="[CH3]" drawing-x="63.5393" drawing-y="359.871" x="-4.24422" y="-4.62298" z="0" />
<rdkit:atom idx="19" atom-smiles="[c]" drawing-x="99.9385" drawing-y="267.59" x="-3.17967" y="-1.92404" z="0" />
<rdkit:atom idx="20" atom-smiles="[O]" drawing-x="50.64" drawing-y="281.735" x="-4.62149" y="-2.33775" z="0" />
<rdkit:atom idx="21" atom-smiles="[cH]" drawing-x="108.584" drawing-y="217.035" x="-2.92683" y="-0.445502" z="0" />
<rdkit:atom idx="22" atom-smiles="[c]" drawing-x="153.498" drawing-y="192.275" x="-1.61322" y="0.278673" z="0" />
<rdkit:atom idx="23" atom-smiles="[C@@H]" drawing-x="139.353" drawing-y="142.976" x="-2.02693" y="1.72049" z="0" />
<rdkit:atom idx="24" atom-smiles="[NH]" drawing-x="88.7988" drawing-y="134.331" x="-3.50547" y="1.97333" z="0" />
<rdkit:atom idx="25" atom-smiles="[C]" drawing-x="71.0086" drawing-y="86.2273" x="-4.02577" y="3.3802" z="0" />
<rdkit:atom idx="26" atom-smiles="[CH3]" drawing-x="20.4545" drawing-y="77.5822" x="-5.50431" y="3.63304" z="0" />
<rdkit:atom idx="27" atom-smiles="[O]" drawing-x="103.772" drawing-y="46.7688" x="-3.06754" y="4.53423" z="0" />
<rdkit:atom idx="28" atom-smiles="[CH2]" drawing-x="169.076" drawing-y="101.179" x="-1.15762" y="2.9429" z="0" />
<rdkit:atom idx="29" atom-smiles="[CH2]" drawing-x="220.287" drawing-y="98.3583" x="0.340111" y="3.02541" z="0" />
<rdkit:bond idx="1" begin-atom-idx="1" end-atom-idx="2" bond-smiles="-" />
<rdkit:bond idx="2" begin-atom-idx="2" end-atom-idx="3" bond-smiles="-" />
<rdkit:bond idx="3" begin-atom-idx="3" end-atom-idx="4" bond-smiles="=" />
<rdkit:bond idx="4" begin-atom-idx="4" end-atom-idx="5" bond-smiles="-" />
<rdkit:bond idx="5" begin-atom-idx="5" end-atom-idx="6" bond-smiles="=" />
<rdkit:bond idx="6" begin-atom-idx="6" end-atom-idx="7" bond-smiles="-" />
<rdkit:bond idx="7" begin-atom-idx="7" end-atom-idx="8" bond-smiles="-" />
<rdkit:bond idx="8" begin-atom-idx="8" end-atom-idx="9" bond-smiles="-" />
<rdkit:bond idx="9" begin-atom-idx="7" end-atom-idx="10" bond-smiles="=" />
<rdkit:bond idx="10" begin-atom-idx="10" end-atom-idx="11" bond-smiles="-" />
<rdkit:bond idx="11" begin-atom-idx="11" end-atom-idx="12" bond-smiles="-" />
<rdkit:bond idx="12" begin-atom-idx="6" end-atom-idx="13" bond-smiles="-" />
<rdkit:bond idx="13" begin-atom-idx="13" end-atom-idx="14" bond-smiles="=" />
<rdkit:bond idx="14" begin-atom-idx="14" end-atom-idx="15" bond-smiles="-" />
<rdkit:bond idx="15" begin-atom-idx="15" end-atom-idx="16" bond-smiles="=" />
<rdkit:bond idx="16" begin-atom-idx="16" end-atom-idx="17" bond-smiles="-" />
<rdkit:bond idx="17" begin-atom-idx="17" end-atom-idx="18" bond-smiles="-" />
<rdkit:bond idx="18" begin-atom-idx="16" end-atom-idx="19" bond-smiles="-" />
<rdkit:bond idx="19" begin-atom-idx="19" end-atom-idx="20" bond-smiles="=" />
<rdkit:bond idx="20" begin-atom-idx="19" end-atom-idx="21" bond-smiles="-" />
<rdkit:bond idx="21" begin-atom-idx="21" end-atom-idx="22" bond-smiles="=" />
<rdkit:bond idx="22" begin-atom-idx="22" end-atom-idx="23" bond-smiles="-" />
<rdkit:bond idx="23" begin-atom-idx="23" end-atom-idx="24" bond-smiles="-" />
<rdkit:bond idx="24" begin-atom-idx="24" end-atom-idx="25" bond-smiles="-" />
<rdkit:bond idx="25" begin-atom-idx="25" end-atom-idx="26" bond-smiles="-" />
<rdkit:bond idx="26" begin-atom-idx="25" end-atom-idx="27" bond-smiles="=" />
<rdkit:bond idx="27" begin-atom-idx="23" end-atom-idx="28" bond-smiles="-" />
<rdkit:bond idx="28" begin-atom-idx="28" end-atom-idx="29" bond-smiles="-" />
<rdkit:bond idx="29" begin-atom-idx="10" end-atom-idx="3" bond-smiles="-" />
<rdkit:bond idx="30" begin-atom-idx="22" end-atom-idx="13" bond-smiles="-" />
<rdkit:bond idx="31" begin-atom-idx="29" end-atom-idx="5" bond-smiles="-" />
</rdkit:mol></metadata>
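
To make the layout of that metadata concrete, here is a minimal sketch that reads the `rdkit:atom` and `rdkit:bond` elements back with nothing but the Python standard library. The `sample_svg` string is a shortened stand-in built from the first two atoms and the first bond shown above, and `read_rdkit_metadata` is a hypothetical helper name, not an RDKit function. It only lists what is stored; it doesn't rebuild a molecule.

```python
import xml.etree.ElementTree as ET

# Namespace used by the RDKit metadata elements (taken from the output above).
RDKIT_NS = "{http://www.rdkit.org/xml}"

# Shortened stand-in for the real SVG: first two atoms and first bond only.
sample_svg = """<svg xmlns='http://www.w3.org/2000/svg'
     xmlns:rdkit='http://www.rdkit.org/xml'>
<metadata>
<rdkit:mol xmlns:rdkit = "http://www.rdkit.org/xml" version="0.9">
<rdkit:atom idx="1" atom-smiles="[CH3]" drawing-x="429.545" drawing-y="166.585" x="6.46024" y="1.03002" z="0" />
<rdkit:atom idx="2" atom-smiles="[O]" drawing-x="390.087" drawing-y="133.821" x="5.30621" y="1.98825" z="0" />
<rdkit:bond idx="1" begin-atom-idx="1" end-atom-idx="2" bond-smiles="-" />
</rdkit:mol></metadata>
</svg>"""


def read_rdkit_metadata(svg_text):
    """Hypothetical helper: list the atoms and bonds stored in the metadata."""
    root = ET.fromstring(svg_text)
    # Each atom carries its SMILES fragment and its molecule (not drawing) coordinates.
    atoms = [
        (a.get("atom-smiles"), float(a.get("x")), float(a.get("y")))
        for a in root.iter(RDKIT_NS + "atom")
    ]
    # Bond indices refer to the 1-based idx values of the atoms.
    bonds = [
        (int(b.get("begin-atom-idx")), int(b.get("end-atom-idx")), b.get("bond-smiles"))
        for b in root.iter(RDKIT_NS + "bond")
    ]
    return atoms, bonds


atoms, bonds = read_rdkit_metadata(sample_svg)
print(atoms)
print(bonds)
```

To try it on the real file, pass the contents of `/tmp/blah.svg` instead of `sample_svg`.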

Ok, let's move on to PNGs. Start by getting a PIL Image object with the molecule drawing. There's a convenience function for this:

In [5]:
img = Draw.MolToImage(dm,size=(450,400))
img
Out[5]:

Now use the PngImagePlugin module, which is part of Pillow, to add the CXSMILES as metadata and write the PNG out to a file:

In [6]:
from PIL.PngImagePlugin import PngInfo
metadata = PngInfo()
metadata.add_text("RDKit_SMILES",Chem.MolToCXSmiles(dm))
img.save("/tmp/blah.png",format="PNG",pnginfo=metadata)

Confirm that we can access the metadata when we read the file back in:

In [7]:
nimg = Image.open('/tmp/blah.png')
nimg.text
Out[7]:
{'RDKit_SMILES': 'COc1cc2c(-c3ccc(OC)c(=O)cc3[C@@H](NC(C)=O)CC2)c(OC)c1OC |(6.46024,1.03002,;5.30621,1.98825,;3.89934,1.46795,;2.74531,2.42618,;1.33844,1.90588,;1.0856,0.427343,;-0.228013,-0.296833,;0.1857,-1.73865,;-0.683614,-2.96106,;-2.18134,-3.04357,;-2.75685,-4.42878,;-4.24422,-4.62298,;-3.17967,-1.92404,;-4.62149,-2.33775,;-2.92683,-0.445502,;-1.61322,0.278673,;-2.02693,1.72049,;-3.50547,1.97333,;-4.02577,3.3802,;-5.50431,3.63304,;-3.06754,4.53423,;-1.15762,2.9429,;0.340111,3.02541,;2.23963,-0.530891,;1.98679,-2.00943,;3.14082,-2.96766,;3.6465,-0.0105878,;4.80053,-0.968822,;4.54769,-2.44736,)|'}
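
The value stored under `RDKit_SMILES` is a CXSMILES: a normal SMILES followed by an extension block between `|` characters, which here holds one `x,y,z` entry per atom (the z value is left empty for 2D coordinates). A minimal sketch of splitting the two pieces apart by hand, assuming the extension contains only a coordinate block as it does above; `split_cxsmiles` is a hypothetical helper, and the example string is a two-atom stand-in using the first two coordinate entries from the output:

```python
def split_cxsmiles(cxsmiles):
    """Hypothetical helper: return (smiles, [(x, y, z), ...]).

    Assumes the extension block holds nothing but coordinates.
    """
    smiles, _, extension = cxsmiles.partition(" |")
    extension = extension.rstrip("|")
    coords = []
    if extension.startswith("(") and extension.endswith(")"):
        for entry in extension[1:-1].split(";"):
            x, y, z = entry.split(",")
            # An empty z field means a 2D coordinate; treat it as 0.
            coords.append((float(x), float(y), float(z) if z else 0.0))
    return smiles, coords


# Two-atom stand-in, not the full colchicine string.
smiles, coords = split_cxsmiles("CO |(6.46024,1.03002,;5.30621,1.98825,)|")
print(smiles)
print(coords)
```

This is only meant to show what the text contains. Real CXSMILES can carry other extensions as well, so for anything beyond a quick look it's better to hand the whole string to the RDKit, as in the next cell.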
In [8]:
tmol = Chem.MolFromSmiles(nimg.text['RDKit_SMILES'])
tmol
Out[8]:

And to demonstrate that this isn't just something the PIL tools can read, show the metadata using ImageMagick:

In [9]:
!identify -verbose /tmp/blah.png | grep RDK
    RDKit_SMILES: COc1cc2c(-c3ccc(OC)c(=O)cc3[C@@H](NC(C)=O)CC2)c(OC)c1OC |(6.46024,1.03002,;5.30621,1.98825,;3.89934,1.46795,;2.74531,2.42618,;1.33844,1.90588,;1.0856,0.427343,;-0.228013,-0.296833,;0.1857,-1.73865,;-0.683614,-2.96106,;-2.18134,-3.04357,;-2.75685,-4.42878,;-4.24422,-4.62298,;-3.17967,-1.92404,;-4.62149,-2.33775,;-2.92683,-0.445502,;-1.61322,0.278673,;-2.02693,1.72049,;-3.50547,1.97333,;-4.02577,3.3802,;-5.50431,3.63304,;-3.06754,4.53423,;-1.15762,2.9429,;0.340111,3.02541,;2.23963,-0.530891,;1.98679,-2.00943,;3.14082,-2.96766,;3.6465,-0.0105878,;4.80053,-0.968822,;4.54769,-2.44736,)|
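
Other tools can see the metadata because it lives in a standard PNG text chunk. A PNG file is an 8-byte signature followed by chunks, each made of a 4-byte length, a 4-byte type, the data, and a CRC-32 over type plus data. A `tEXt` chunk holds a keyword, a null byte, and the text, both Latin-1. The sketch below uses only the standard library to build a tiny 1x1 PNG with such a chunk and read it back. The helper names are hypothetical, `CCO` is a stand-in value, and I'm assuming (not verifying here) that the file written above stores its text the same way; compressed `zTXt` and `iTXt` chunks are ignored.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data):
    """Serialize one PNG chunk: length, type, data, CRC-32 of type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def read_text_chunks(png_bytes):
    """Return {keyword: text} for every uncompressed tEXt chunk."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    found = {}
    pos = 8
    while pos + 8 <= len(png_bytes):
        length, chunk_type = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        if chunk_type == b"IEND":
            break
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return found


# Minimal 1x1 8-bit grayscale image: width, height, bit depth, color type,
# compression, filter, interlace.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
idat = zlib.compress(b"\x00\x00")  # one filter byte plus one black pixel
text = b"RDKit_SMILES" + b"\x00" + b"CCO"  # stand-in value

demo_png = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", ihdr)
    + make_chunk(b"tEXt", text)
    + make_chunk(b"IDAT", idat)
    + make_chunk(b"IEND", b"")
)
print(read_text_chunks(demo_png))
```

Calling `read_text_chunks(open('/tmp/blah.png', 'rb').read())` should be a quick way to check the real file without Pillow or ImageMagick.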

Since people inevitably want to put structures into PowerPoint, let's make sure that we can safely add these PNGs and not lose the metadata. I will use the python-pptx library to demo this.

In [10]:
from pptx import Presentation
from pptx.util import Mm

bio = BytesIO()
img.save(bio,format="PNG",pnginfo=metadata)

prs = Presentation()
blank_slide_layout = prs.slide_layouts[6]
slide = prs.slides.add_slide(blank_slide_layout)

left = top = Mm(50)
pic = slide.shapes.add_picture(bio, left, top, height=Mm(60))

prs.save('/tmp/test.pptx')

Make sure we don't lose the metadata:

In [11]:
nprs = Presentation('/tmp/test.pptx')
slide = nprs.slides[0]
pic = slide.shapes[0]
bio = BytesIO(pic.image.blob)
nimg = Image.open(bio)
nimg.text
Out[11]:
{'RDKit_SMILES': 'COc1cc2c(-c3ccc(OC)c(=O)cc3[C@@H](NC(C)=O)CC2)c(OC)c1OC |(6.46024,1.03002,;5.30621,1.98825,;3.89934,1.46795,;2.74531,2.42618,;1.33844,1.90588,;1.0856,0.427343,;-0.228013,-0.296833,;0.1857,-1.73865,;-0.683614,-2.96106,;-2.18134,-3.04357,;-2.75685,-4.42878,;-4.24422,-4.62298,;-3.17967,-1.92404,;-4.62149,-2.33775,;-2.92683,-0.445502,;-1.61322,0.278673,;-2.02693,1.72049,;-3.50547,1.97333,;-4.02577,3.3802,;-5.50431,3.63304,;-3.06754,4.53423,;-1.15762,2.9429,;0.340111,3.02541,;2.23963,-0.530891,;1.98679,-2.00943,;3.14082,-2.96766,;3.6465,-0.0105878,;4.80053,-0.968822,;4.54769,-2.44736,)|'}
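
The same check can be done without python-pptx, since a .pptx file is a zip archive and pictures are normally stored under `ppt/media/`. A minimal sketch using only the standard library; `png_parts` is a hypothetical helper, and the in-memory archive is a stand-in with a placeholder payload so the example runs on its own:

```python
import io
import zipfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_parts(pptx_file):
    """Hypothetical helper: return {archive name: bytes} for PNGs under ppt/media/."""
    found = {}
    with zipfile.ZipFile(pptx_file) as zf:
        for name in zf.namelist():
            if name.startswith("ppt/media/") and name.lower().endswith(".png"):
                found[name] = zf.read(name)
    return found


# Stand-in archive, not a valid presentation: just enough structure to run.
fake_pptx = io.BytesIO()
with zipfile.ZipFile(fake_pptx, "w") as zf:
    zf.writestr("ppt/media/image1.png", PNG_SIGNATURE + b"placeholder")
    zf.writestr("ppt/slides/slide1.xml", "<placeholder/>")

parts = png_parts(fake_pptx)
print(sorted(parts))
```

With a real file, `png_parts('/tmp/test.pptx')` gives the raw image bytes, which can be wrapped in a `BytesIO` and passed to `Image.open` exactly as in the cell above.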

And to be super sure, I opened the file in PowerPoint in Office 365, added some text, and then re-downloaded it:

In [12]:
nprs = Presentation('/home/glandrum/Downloads/test.pptx')
slide = nprs.slides[0]
pic = slide.shapes[0]
bio = BytesIO(pic.image.blob)
nimg = Image.open(bio)
nimg.text
Out[12]:
{'RDKit_SMILES': 'COc1cc2c(-c3ccc(OC)c(=O)cc3[C@@H](NC(C)=O)CC2)c(OC)c1OC |(-4.30669,0.156621,;-3.68129,-0.623579,;-2.68309,-0.470179,;-2.34369,0.441021,;-1.32889,0.576821,;-0.68409,-0.187979,;0.27171,-0.186179,;0.52551,-1.16158,;1.40091,-1.58258,;2.32971,-1.17158,;3.10331,-1.78918,;4.03411,-1.42338,;2.52231,-0.182779,;3.50711,0.0420207,;1.92331,0.619821,;0.93211,0.590821,;0.67991,1.56562,;1.46991,2.19622,;1.32211,3.18522,;0.39191,3.55222,;2.10511,3.80762,;-0.21529,2.02862,;-1.08509,1.57442,;-1.05729,-1.09018,;-0.42849,-1.88038,;-0.79549,-2.81058,;-2.06789,-1.25978,;-2.42669,-2.18238,;-3.41509,-2.33478,)|'}

Looks good!