Scientific software developer in the Washington, D.C. area.
Portfolio of my projects
Why Some Organic Molecules Have a Color
It’s usually because of a long chain of conjugated bonds. I search 20K data points to find a series of molecules where extending the conjugated chain increases the absorption wavelength.
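The textbook free-electron (particle-in-a-box) model gives a rough physical picture of this trend. This is a minimal sketch, not the analysis in the post. It assumes that the π electrons of a linear polyene with k C=C bonds move freely in a one-dimensional box whose length is one average bond length (an assumed 1.40 Å) per carbon atom.

```python
# Free-electron (particle-in-a-box) model for linear polyenes.
# A textbook approximation used for illustration only.
PLANCK = 6.62607015e-34          # J s
ELECTRON_MASS = 9.1093837015e-31  # kg
SPEED_OF_LIGHT = 2.99792458e8     # m/s
BOND_LENGTH = 1.40e-10            # m, assumed average C-C bond length


def absorption_wavelength_nm(double_bonds: int) -> float:
    """Predicted HOMO->LUMO absorption wavelength (nm) for a polyene."""
    n_pi = 2 * double_bonds        # two pi electrons per C=C bond
    box = n_pi * BOND_LENGTH       # assumed: one bond length per carbon atom
    # E_n = n^2 h^2 / (8 m L^2). HOMO is level N/2 and LUMO is level N/2 + 1,
    # so the gap is (N + 1) h^2 / (8 m L^2), and lambda = h c / gap.
    gap = (n_pi + 1) * PLANCK**2 / (8 * ELECTRON_MASS * box**2)
    return PLANCK * SPEED_OF_LIGHT / gap * 1e9


for k in range(2, 6):
    print(k, round(absorption_wavelength_nm(k)))
```

For butadiene (k = 2) this gives about 207 nm, against an observed absorption near 217 nm. The model overestimates the shift for longer chains, but it reproduces the direction of the trend: a longer box means more closely spaced levels and a longer wavelength.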
Tautomer Generation Algorithms and InChI Representations
Which cheminformatics algorithms produce the most tautomers? And how successful is InChI at representing all tautomers of a given structure with a single representation?
Molecular Isotopic Distributions: Permutations and Combinations
These posts use two different methods to calculate molecular isotopic mass distributions.
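The combinations approach can be summarized as follows. For each element, visit every multiset of isotopes once and weight it by the multinomial coefficient, which counts the orderings that share it. Then combine the per-element distributions into a molecular one. This is a minimal sketch of that idea, not the code from the posts. The isotope table is deliberately short and can be extended as needed.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial, prod

# (isotope mass in u, natural abundance) pairs; extend as needed.
ISOTOPES = {
    "C": [(12.0, 0.9893), (13.003355, 0.0107)],
    "Cl": [(34.968853, 0.7576), (36.965903, 0.2424)],
}


def element_distribution(symbol, count):
    """Mass -> probability for `count` atoms of one element, via combinations."""
    isotopes = ISOTOPES[symbol]
    dist = {}
    # Each multiset of isotopes is visited once, instead of every ordering.
    for combo in combinations_with_replacement(range(len(isotopes)), count):
        counts = Counter(combo)
        # Multinomial coefficient: how many orderings share this combination.
        ways = factorial(count) // prod(factorial(k) for k in counts.values())
        probability = ways * prod(isotopes[i][1] ** k for i, k in counts.items())
        mass = round(sum(isotopes[i][0] for i in combo), 6)
        dist[mass] = dist.get(mass, 0.0) + probability
    return dist


def molecule_distribution(formula):
    """Combine per-element distributions, e.g. {"C": 1, "Cl": 4} for CCl4."""
    total = {0.0: 1.0}
    for symbol, count in formula.items():
        element = element_distribution(symbol, count)
        combined = {}
        for m1, p1 in total.items():
            for m2, p2 in element.items():
                mass = round(m1 + m2, 6)
                combined[mass] = combined.get(mass, 0.0) + p1 * p2
        total = combined
    return total


cl2 = molecule_distribution({"Cl": 2})
for mass, probability in sorted(cl2.items()):
    print(f"{mass:.4f}  {probability:.4f}")
```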
RDKit Contribution MolsMatrixToGridImage()
I contributed MolsMatrixToGridImage to the RDKit 2023.09.1 release to draw row-and-column grids of molecules.
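The core bookkeeping behind drawing a row-and-column grid is turning a possibly ragged nested list into a flat list plus a row width. That is the form RDKit's MolsToGridImage takes through its molsPerRow parameter. This is a minimal sketch of the idea using a hypothetical helper, flatten_matrix. It is not RDKit's internal code, and SMILES strings stand in for molecule objects so it runs without RDKit.

```python
from typing import Any, List, Optional, Sequence, Tuple


def flatten_matrix(
    matrix: Sequence[Sequence[Any]],
) -> Tuple[List[Optional[Any]], int]:
    """Pad rows with None to the longest row's length, then flatten row by row.

    Returns the flat list plus the number of cells per row, the shape a
    one-dimensional grid drawer with a fixed row width can consume.
    """
    per_row = max((len(row) for row in matrix), default=0)
    flat: List[Optional[Any]] = []
    for row in matrix:
        flat.extend(row)
        flat.extend([None] * (per_row - len(row)))  # blank cells fill short rows
    return flat, per_row


# SMILES strings stand in for molecules so the sketch runs without RDKit.
flat, per_row = flatten_matrix([["CCO", "CCN", "CCC"], ["c1ccccc1"]])
print(per_row, flat)
```

Padding keeps each nested row on its own grid row, so the caller never has to compute row widths or placeholder positions by hand.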
Display Molecular Formulas
Uses Python, RDKit, seaborn, and matplotlib
How to display molecular formulas such as C3H4O2 in molecular grids, tables, and graphs. Also works for other HTML-, Markdown-, or LaTeX-formatted text.
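The basic trick is to wrap each run of digits in a formula in the target format's subscript markup. This is a minimal sketch that assumes simple neutral formulas with no charges or isotope labels. The helper names are hypothetical.

```python
import re


def formula_to_html(formula: str) -> str:
    """C3H4O2 -> C<sub>3</sub>H<sub>4</sub>O<sub>2</sub>"""
    return re.sub(r"(\d+)", r"<sub>\1</sub>", formula)


def formula_to_latex(formula: str) -> str:
    r"""C3H4O2 -> $\mathrm{C_{3}H_{4}O_{2}}$"""
    return "$\\mathrm{" + re.sub(r"(\d+)", r"_{\1}", formula) + "}$"


print(formula_to_html("C3H4O2"))
print(formula_to_latex("C3H4O2"))
```

The HTML form also works in Markdown renderers that pass HTML through, such as Jupyter notebooks. The LaTeX form can be used as matplotlib mathtext in axis labels and titles.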
Molecular Formula Generation
Uses Python and RDKit
In cheminformatics, the typical way of representing a molecule is with a SMILES string such as CCO for ethanol. However, there are still cases where the molecular formula, such as C2H6O, is useful.
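Molecular formulas are conventionally written in Hill order. Carbon comes first, then hydrogen, then the remaining elements alphabetically. When there is no carbon, all elements (including H) are listed alphabetically. This sketch assumes you already have one element symbol per atom with hydrogens included, as you would in RDKit after adding explicit hydrogens. RDKit also provides rdMolDescriptors.CalcMolFormula, so the hypothetical hill_formula helper here is only for illustration.

```python
from collections import Counter
from typing import Iterable


def hill_formula(symbols: Iterable[str]) -> str:
    """Build a Hill-order molecular formula from one element symbol per atom."""
    counts = Counter(symbols)
    if "C" in counts:
        # Carbon, then hydrogen, then everything else alphabetically.
        order = ["C"] + (["H"] if "H" in counts else [])
        order += sorted(el for el in counts if el not in ("C", "H"))
    else:
        # No carbon: strictly alphabetical, hydrogen included.
        order = sorted(counts)
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)


ethanol_atoms = ["C", "C", "O"] + ["H"] * 6
print(hill_formula(ethanol_atoms))
```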
Refitting Data From Wiener’s Classic Cheminformatics Paper
Uses Python, SciPy, Polars, NumPy, seaborn, matplotlib, and mol_frame
How well did cheminformatics pioneers Egloff and Wiener fit their models to boiling points of alkanes in the 1940s? This blog post revisits their fits using digital tools.
Revisiting a Classic Cheminformatics Paper: The Wiener Index
Uses Python, RDKit, Polars, matplotlib, seaborn, py2opsin, and mol_frame
This post revisits Harry Wiener’s article “Structural Determination of Paraffin Boiling Points”, extracts data for molecules from it, recalculates cheminformatics parameters and boiling points, and plots the data.
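Wiener's two structural variables can be computed directly from the hydrogen-suppressed carbon skeleton. The path number w is the sum of bond distances over all pairs of carbon atoms. The polarity number p counts the pairs separated by exactly three bonds. This is a minimal sketch that represents each alkane as a hypothetical adjacency dictionary rather than an RDKit molecule.

```python
from collections import deque
from itertools import combinations


def shortest_path_lengths(adjacency, start):
    """Breadth-first search distances (in bonds) from one carbon atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def wiener_numbers(adjacency):
    """Return (w, p): Wiener's path number and polarity number."""
    distances = {atom: shortest_path_lengths(adjacency, atom) for atom in adjacency}
    pairs = list(combinations(adjacency, 2))
    w = sum(distances[i][j] for i, j in pairs)
    p = sum(1 for i, j in pairs if distances[i][j] == 3)
    return w, p


n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(wiener_numbers(n_butane), wiener_numbers(isobutane))
```

Branching shortens the distances between atoms, so isobutane has a smaller w than n-butane. Wiener related this difference to its lower boiling point.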
RDKit Utility to Check Whether Starting Materials for Synthesizing Your Target Molecules Are Commercially Available
Uses Python, RDKit, PubChem’s API, asyncio, and Semaphore
Given target molecules and reactions to synthesize them, determine whether the starting materials are commercially available using PubChem’s API, and thus whether the target is synthetically accessible.
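The concurrency pattern is an asyncio.Semaphore that caps how many lookups are in flight at once. This lets a batch of starting materials be checked quickly without flooding the remote service. In this sketch the PubChem request is replaced by a hypothetical stub, lookup_availability, which checks a local catalog set so the code runs offline. The real utility's request code is not reproduced here.

```python
import asyncio


async def lookup_availability(smiles, catalog, semaphore, stats):
    """Hypothetical stand-in for one remote lookup; no network access."""
    async with semaphore:                     # at most N lookups in flight
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)             # simulated network latency
        stats["in_flight"] -= 1
        return smiles, smiles in catalog


async def check_starting_materials(smiles_list, catalog, max_concurrent=2):
    """Check all starting materials concurrently, bounded by a semaphore."""
    semaphore = asyncio.Semaphore(max_concurrent)
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(
        *(lookup_availability(s, catalog, semaphore, stats) for s in smiles_list)
    )
    return dict(results), stats["peak"]


def target_accessible(availability):
    """A target is accessible only if every starting material is available."""
    return all(availability.values())


catalog = {"CCO", "CC(=O)O"}  # hypothetical set of purchasable compounds
availability, peak = asyncio.run(
    check_starting_materials(["CCO", "CC(=O)O", "c1ccccc1", "CCN"], catalog)
)
print(availability, "peak concurrency:", peak, "accessible:", target_accessible(availability))
```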
RDKit Utility to Create a Mass Spectrometry Fragmentation Tree
Uses Python and RDKit
Given a mass spec fragmentation hierarchy, with species as SMILES strings, display the fragmentation tree in a grid, labeling each species with its name and either its mass or its mass-to-charge ratio m/z.
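Labeling the tree takes two steps. First, compute m/z from each species' composition and charge. Second, walk the hierarchy so that each generation of fragments becomes one row of the grid. This minimal sketch uses monoisotopic masses and singly charged cations (with the electron mass subtracted). It represents species as element-count dictionaries rather than SMILES, and it uses a hypothetical nested-dictionary tree for ethanol's molecular ion and two of its common fragment ions.

```python
# Monoisotopic masses (u) and the electron mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
ELECTRON = 0.00054858


def mz(formula_counts, charge=1):
    """m/z of a positive ion: (neutral mass - charge * electron mass) / charge."""
    if charge <= 0:
        raise ValueError("this sketch handles positive ions only")
    mass = sum(MONOISOTOPIC[el] * n for el, n in formula_counts.items())
    return (mass - charge * ELECTRON) / charge


# Hypothetical example tree: molecular ion of ethanol and two fragment ions.
tree = {
    "name": "M+.", "formula": {"C": 2, "H": 6, "O": 1},
    "children": [
        {"name": "[M-H]+", "formula": {"C": 2, "H": 5, "O": 1}, "children": []},
        {"name": "CH2OH+", "formula": {"C": 1, "H": 3, "O": 1}, "children": []},
    ],
}


def rows_by_generation(node, depth=0, rows=None):
    """Collect labels row by row, one row per level of the tree."""
    if rows is None:
        rows = []
    if len(rows) <= depth:
        rows.append([])
    rows[depth].append(f"{node['name']}  m/z {mz(node['formula']):.4f}")
    for child in node["children"]:
        rows_by_generation(child, depth + 1, rows)
    return rows


for row in rows_by_generation(tree):
    print(row)
```

The resulting list of rows has the same nested shape as the legends matrix that a row-and-column drawer like MolsMatrixToGridImage takes.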
RDKit Utility to Find the Maximum Common Substructure, and Groups Off It, Between a Set of Molecules
Uses Python and RDKit
Given a collection of molecules as SMILES strings, find the maximum common substructure (MCS) match between them, and the groups off that common core for each molecule, displaying the results using a grid.
Chemistry machine learning for drug discovery with DeepChem
Uses Python, DeepChem, seaborn, Matplotlib, and pandas
Use the DeepChem deep learning package to predict compounds’ lipophilicity: how well they are absorbed into the lipids of biological membranes, which is important for oral delivery of drugs.
RDKit Utility to Visualize Retrosynthetic Analysis Hierarchically
Uses Python and RDKit
Given a target molecule, use the Recap algorithm to decompose it into a set of fragments that could be combined to make the parent molecule using common reactions. Display the fragmentation hierarchically.
RDKit Utility to Find and Highlight the Maximum Common Substructure Amongst Molecules
Uses Python and RDKit
Given a collection of molecules as SMILES strings, find the maximum common substructure (MCS) match between them as a SMARTS string, display the match pattern as a molecule, and highlight the match pattern in each molecule using a grid.
Materials and Cheminformatics Sampler
Uses Python, NumPy, SymPy, ChemPy, Flask, JavaScript, and Bootstrap
Find a given number of points in an n-dimensional space defined on the unit hypercube that satisfy the constraints in a constraints file, then write them to an output file.
Optionally, identify the components (dimensions) in the constraints file using chemical formulas; Sampler will then use ChemPy to calculate their molar masses and output the component weight fractions.
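A simple way to find points on the unit hypercube that satisfy constraints is rejection sampling: draw uniform random points and keep those that pass every constraint. This sketch assumes the constraints are Python functions of the point, whereas the real Sampler reads them from a constraints file whose format is not shown here. When converting to weight fractions, it treats each coordinate as a relative molar amount, using w_i = x_i M_i / Σ_j x_j M_j. The molar masses are hard-coded where Sampler would compute them with ChemPy.

```python
import random


def sample_points(n_points, dims, constraints, seed=0, max_tries=100_000):
    """Rejection sampling: draw uniform points in [0, 1]^dims and keep those
    that satisfy every constraint (each a function of the point returning bool)."""
    rng = random.Random(seed)
    points = []
    for _ in range(max_tries):
        point = [rng.random() for _ in range(dims)]
        if all(constraint(point) for constraint in constraints):
            points.append(point)
            if len(points) == n_points:
                return points
    raise RuntimeError("constraints too restrictive for rejection sampling")


def weight_fractions(amounts, molar_masses):
    """Convert relative molar amounts to weight fractions: w_i = x_i M_i / sum_j x_j M_j."""
    masses = [x * m for x, m in zip(amounts, molar_masses)]
    total = sum(masses)
    return [m / total for m in masses]


constraints = [lambda p: p[0] + p[1] <= 1.0, lambda p: p[0] >= 0.2]
points = sample_points(5, 2, constraints)
molar_masses = [18.015, 58.44]  # H2O and NaCl, g/mol (hard-coded for the sketch)
for point in points:
    print([round(x, 3) for x in point], [round(w, 3) for w in weight_fractions(point, molar_masses)])
```

Rejection sampling is easy to reason about but slows down sharply when the feasible region is a small fraction of the hypercube, which is why the sketch gives up after max_tries draws.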
Periodic Table Navigator
Uses Ruby, Sinatra, PostgreSQL, and JavaScript
Understand how the elements are related to each other. Emphasizes electronic configuration of the elements.
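Ground-state electron configurations can be approximated with the Madelung (aufbau) rule. Subshells fill in order of increasing n + l, with ties broken by the lower n, and each subshell holds 2(2l + 1) electrons. The sketch below is in Python, although the Navigator itself uses Ruby. It ignores known exceptions such as chromium and copper, and it lists subshells in filling order rather than grouped by n.

```python
SUBSHELL_LETTERS = "spdfghi"


def electron_configuration(atomic_number):
    """Ground-state configuration by the Madelung (n + l, then n) filling rule."""
    subshells = [(n, l) for n in range(1, 8) for l in range(n)]
    subshells.sort(key=lambda nl: (nl[0] + nl[1], nl[0]))
    remaining = atomic_number
    parts = []
    for n, l in subshells:
        if remaining == 0:
            break
        electrons = min(2 * (2 * l + 1), remaining)  # subshell capacity 2(2l + 1)
        parts.append(f"{n}{SUBSHELL_LETTERS[l]}{electrons}")
        remaining -= electrons
    return " ".join(parts)


print(electron_configuration(26))  # iron
```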
My open-source contributions
RDKit cheminformatics package
- Conceived, proposed, and coded the MolsMatrixToGridImage feature, which uses a two-dimensional (nested) data structure as input to create molecular grid images. The feature was merged into the main codebase by the project maintainer and scheduled for the 2023.09.1 release.
- Improved documentation by illustrating drawing capability in tutorial and adding SMILES (chemical notation) for R groups
SymPy computer algebra system in pure Python
- Technical writer for funded 2022 Season of Docs project: Creating documentation for how to solve equations
- Core developer wrote “I think you are doing excellent work on the SymPy documentation. Thank you!”
- Led selection of new Sphinx theme for SymPy documentation; the new theme was implemented
- Contributed code for documentation to explain usage of a core class for users and developers, and improve accessibility
- Lead developer wrote “You’ve been doing great work with the Sphinx theme and other documentation work”
ChemPy package for chemistry in Python
- Initiated an issue to improve interpretation of chemical formulas, and provided scientific and coding direction for it
- Spurred a developer to improve code
- Package author wrote “Great work guys!”
Sphinx documentation generator
- Initiated issue to improve accessibility and internationalization of documentation generated by Sphinx; was addressed within a day by Sphinx’s main developer
Posts
Why some organic molecules have a color: Correlating optical absorption wavelength with conjugated bond chain length
Comparing Tautomer Generation Algorithms
Molecular Isotopic Distributions Take 2: Combinations
Molecular Isotopic Distributions Take 1: Permutations
MolsMatrixToGridImage Simplifies Code
Displaying Molecular Formulas in Molecular Grids, Tables, and Graphs for Elemental Analysis
Molecular Formula Generation
Refitting Data From Wiener’s Classic Cheminformatics Paper
Revisiting a Classic Cheminformatics Paper: The Wiener Index
Are the Starting Materials for Synthesizing Your Target Molecules Commercially Available?
Draw a Mass Spectrometry Fragmentation Tree Using RDKit
Find the Maximum Common Substructure, and Groups Off It, For a Set of Molecules Using RDKit
Visualizing Nonbinary Trees: Classification of Chemical Isomers
Chemistry machine learning for drug discovery with DeepChem
Retrosynthetically Decompose a Molecule In a Tree Structure Using Recap in RDKit
Find and Highlight the Maximum Common Substructure Between a Set of Molecules Using RDKit