Monday, April 27, 2020

Possible Google Season of Docs Projects

We're planning to apply to be a mentoring organization for Google Season of Docs this year.

The primary instigator/organizer of this is Vincent Scalfani. For people who may not have "met" him yet, Vin is the one who wrote the fantastic new version of the RDKit Cookbook and has made a number of other contributions to improve the RDKit documentation.

Here's Vin's announcement about our planned participation:
https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg09805.html

And here are the ideas we're currently thinking of. I will try to keep this post up to date as our thinking evolves. If you have other ideas or suggestions, please get in touch!

Project Name: Expansion of the RDKit Book to Outline Additional Methods Implemented

Description: The RDKit Book describes the features supported by RDKit [1] and gives users a high-level overview of them. Examples include the supported molecular file formats, molecular descriptor calculation methods, and chemical reaction support. Many supported methods are not yet documented in the RDKit Book, such as additional file format readers, available molecular descriptor/fingerprint calculations (e.g., Morse atom fingerprints, Coulomb matrices, the QED descriptor), and the chemical validation/standardization (MolVS) integration. This project would inventory the features mentioned in the RDKit release notes [2] that are not yet described in the RDKit Book. Each of these features would then be added to the RDKit Book with a description, an example, and a link to the original GitHub pull request that added it. Related scientific literature references and a link to the API docs describing the module could also be added. It would also be useful to update the features already covered in the RDKit Book with links to the GitHub code, related literature, and API docs, where possible.
Related Material: 
[2] RDKit Release Notes: https://github.com/rdkit/rdkit/releases


Project Name: A Guide for Reading and Making the Most of the RDKit Python API Docs

Description: The RDKit Python API Documentation serves as the comprehensive reference guide to the Python modules that expose RDKit functionality [1]. This reference work can be intimidating and confusing to new users. A highly useful contribution to the RDKit documentation would be a guide on how to read and use the RDKit Python API documentation. This could include:
  1. A summary overview of the different packages/subpackages along with an explanation of the syntax and instructions for importing the appropriate function in a Python script (see the SciPy API docs as an example [2])
  2. A graphical depiction of the API structure
  3. A worked example of using a particular module and demonstrating how to import the module, use a particular class, and specify options
  4. An explanation of what the C++ signatures mean and why these can be useful within the context of the Python API
  5. Addition of links to the source code (see for example the DeepChem API docs [3])
Related Material: 
[1] RDKit Python API Reference: https://www.rdkit.org/docs/api-docs.html
[3] DeepChem API Docs: https://www.deepchem.io/docs/deepchem.html 
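
As a rough illustration of the package/subpackage overview proposed in item 1 above, here is a minimal sketch that lists the direct submodules of a Python package using only the standard library. The helper names `list_submodules` and `format_tree` are hypothetical, and the standard-library `json` package is used as a stand-in so the sketch runs without RDKit. With RDKit installed, a package name such as `rdkit.Chem` could presumably be passed instead.

```python
import importlib
import pkgutil


def list_submodules(package_name):
    """Return the sorted, fully qualified names of a package's direct submodules."""
    package = importlib.import_module(package_name)
    # Only packages define __path__; a plain module has no submodules to list.
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return []
    return sorted(
        f"{package_name}.{info.name}"
        for info in pkgutil.iter_modules(search_path)
    )


def format_tree(package_name):
    """Render a package and its direct submodules as an indented text outline."""
    lines = [package_name]
    lines.extend(f"    {name}" for name in list_submodules(package_name))
    return "\n".join(lines)


if __name__ == "__main__":
    # "json" is a stand-in chosen so the sketch runs on any Python install.
    print(format_tree("json"))
```

An outline like this only goes one level deep. A guide would likely pair it with hand-written explanations of what each subpackage is for, since the names alone say little to a new user.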



Project Name: RDKit and Pandas Book

Description: RDKit integrates with the Python Pandas data analysis and data structures tools [1]. This integration allows users to store and analyze molecules in convenient data frames [2]. There is not yet a detailed, formal guide to using RDKit with Pandas. However, there are individual code examples scattered throughout Stack Overflow [3], the RDKit mailing lists [4], and GitHub [5]. It would be useful to compile and harmonize these examples into an in-depth RDKit and Pandas Documentation Book. Like other RDKit documentation, the RDKit and Pandas Book would be written in reStructuredText with Sphinx doctest to allow testing of code snippets, where possible.

Related Material:
[1] https://pandas.pydata.org/
[2] https://www.rdkit.org/docs/source/rdkit.Chem.PandasTools.html
[3] https://stackoverflow.com/search?q=rdkit+pandas
[4] https://www.mail-archive.com/search?q=pandas&l=rdkit-discuss%40lists.sourceforge.net
[5] https://github.com/rdkit/rdkit-tutorials/blob/master/notebooks/004_RDKit_pandas_support.ipynb
[5] https://github.com/rdkit/rdkit-tutorials/blob/master/notebooks/004_RDKit_pandas_support.ipynb
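
To give a feel for what testing code snippets means in practice, here is a minimal sketch of running the `>>>` examples embedded in a piece of documentation text. It uses Python's standard-library `doctest` module as a stand-in for the Sphinx doctest setup mentioned above. The snippet text and the `run_snippet` helper are hypothetical, and the example is plain Python so that it runs without RDKit or Pandas.

```python
import doctest

# Illustrative documentation text with embedded examples. Plain Python is used
# so the sketch runs without RDKit or pandas; a real chapter would call them.
SNIPPET = """
Counting the distinct SMILES strings in a list (plain string comparison):

>>> smiles = ["CCO", "c1ccccc1", "CCO"]
>>> len(set(smiles))
2
"""


def run_snippet(text, name="snippet"):
    """Run the >>> examples found in text and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Fresh globals per snippet so examples cannot leak state into each other.
    test = parser.get_doctest(text, {}, name, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


if __name__ == "__main__":
    failed, attempted = run_snippet(SNIPPET)
    print(f"{attempted} examples run, {failed} failed")
```

Each `>>>` line counts as one example, so the snippet above contains two. If the documented output ever drifts from what the code really prints, the failed count becomes non-zero, which is the property that keeps tested documentation in sync with the library.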
