The primary instigator/organizer of this is Vincent Scalfani. For people who may not have "met" him yet, Vin is the one who wrote the fantastic new version of the RDKit Cookbook and has made a number of other contributions to improve the RDKit documentation.
Here's Vin's announcement about our planned participation:
https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg09805.html
And here are the ideas we're currently thinking of. I will try to keep this post up to date as our thinking evolves. If you have other ideas or suggestions, please get in touch!
Description: The RDKit Book describes the features supported within RDKit [1] and gives users a high-level overview of them. Examples include the supported molecular file formats, molecular descriptor calculation methods, and chemical reaction support. Many supported methods are not yet documented in the RDKit Book, such as additional file format readers, available molecular descriptor/fingerprint calculations (e.g., Morse atom fingerprints, Coulomb matrices, the QED descriptor), and the chemical validation/standardization (MolVS) integration. This project would inventory the features mentioned in the RDKit release notes [2] that are not yet described in the RDKit Book. Each of these features would then be added to the Book with a description, an example, and a link to the original GitHub pull request that added it. Where available, related scientific literature references and a link to the API docs for the relevant module could also be added. It would also be useful to update the features already covered in the RDKit Book with links to the GitHub code, related literature, and API docs, where possible.
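To give a sense of the kind of short example that could accompany a new Book entry, here is a minimal sketch for the QED descriptor. It assumes RDKit is installed and uses `QED.qed` and `Descriptors.MolWt` from `rdkit.Chem`; the molecule (phenol) is an arbitrary choice for illustration, not something taken from the Book.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, QED

# Parse phenol from a SMILES string (arbitrary example molecule).
# MolFromSmiles returns None when the input cannot be parsed.
mol = Chem.MolFromSmiles("Oc1ccccc1")
if mol is None:
    raise ValueError("could not parse SMILES")

# Average molecular weight of the molecule.
mol_wt = Descriptors.MolWt(mol)

# QED drug-likeness score; values fall between 0 and 1.
qed_score = QED.qed(mol)

print(f"MolWt: {mol_wt:.2f}  QED: {qed_score:.3f}")
```

An actual Book entry would pair a snippet like this with the literature reference and the link to the pull request that introduced the feature.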
Description: The RDKit Python API Documentation is the comprehensive reference guide to the Python modules that expose RDKit functionality [1]. This reference work can be intimidating and confusing to new users. A highly useful contribution to the RDKit documentation would be a guide on how to read and use the RDKit Python API documentation. This could include:
- A summary overview of the different packages/subpackages along with an explanation of the syntax and instructions for importing the appropriate function in a Python script (see the SciPy API docs as an example [2])
- A graphical depiction of the API structure
- A worked example of using a particular module and demonstrating how to import the module, use a particular class, and specify options
- An explanation of what the C++ signatures mean and why these can be useful within the context of the Python API
- Addition of links to the source code (see for example the DeepChem API docs [3])
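One way to bootstrap the package/subpackage overview mentioned in the list above is to generate it by introspection. The following is only a sketch using the standard library: the helper name `list_submodules` is hypothetical, and the demonstration runs on the standard-library `json` package so that it works without RDKit. With RDKit installed, passing `"rdkit.Chem"` instead should work the same way, although compiled extension modules may be listed differently depending on the installation.

```python
import importlib
import pkgutil


def list_submodules(package_name):
    """Return sorted (full_name, is_package) pairs for a package's direct children.

    Hypothetical helper for illustration; not part of RDKit.
    """
    package = importlib.import_module(package_name)
    # Only packages have a __path__; plain modules have nothing to walk.
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return []
    return sorted(
        (f"{package_name}.{info.name}", info.ispkg)
        for info in pkgutil.iter_modules(search_path)
    )


# Demonstration on a standard-library package.
for full_name, is_package in list_submodules("json"):
    kind = "package" if is_package else "module"
    print(f"{full_name} ({kind})")
```

The resulting list could serve as the starting point for a hand-written summary or for a graphical depiction of the API structure.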
Related Material:
[1] https://pandas.pydata.org/
[2] https://www.rdkit.org/docs/source/rdkit.Chem.PandasTools.html
[3] https://stackoverflow.com/search?q=rdkit+pandas
[4] https://www.mail-archive.com/search?q=pandas&l=rdkit-discuss%40lists.sourceforge.net
[5] https://github.com/rdkit/rdkit-tutorials/blob/master/notebooks/004_RDKit_pandas_support.ipynb