This is a relatively short post: I just want to point people to an older (and thus buried) dataset from some former colleagues. I've found it really useful in the past, but I think a lot of folks aren't aware of it.
The dataset is in the supplementary material to this paper: https://pubs.acs.org/doi/abs/10.1021/jm020472j
"Informative Library Design as an Efficient Strategy to Identify and Optimize Leads: Application to Cyclin-Dependent Kinase 2 Antagonists" by Erin Bradley et al. The paper itself is worth reading, but the buried treasure is the Excel file in the supplementary material, which contains SMILES and measured data for >17K compounds. There's also a very useful PDF which explains the columns in that file.
What's very cool is that the compounds are a mix of things from a small general-purpose screening library and compounds purchased or synthesized for a med chem project. I'm not aware of any other public datasets that have this type of information.
Let's look at what's there.
Note: This is another one of those posts that Blogger couldn't quite deal with, so I did some editing here. There's a bit more in the Jupyter notebook on GitHub.