Latest Posts
-
How to Write Cheminformatics Blog Posts
As the YouTubers would say, “A lot of you have been asking me about how to write cheminformatics blog posts.” Well, not a lot, but at least a couple! I realized that there’s a pattern to how I write cheminformatics blog posts (16 so far), so I’m sharing that here.
-
Why some organic molecules have a color: Correlating optical absorption wavelength with conjugated bond chain length
Molecules have a color if the energy gap between their electronic energy levels is small enough that they absorb visible rather than ultraviolet light. For organic molecules, that’s often because of an extensive chain of conjugated bonds. Can we use cheminformatics to find evidence that increasing conjugated bond chain length increases absorption wavelength, eventually shifting absorption into the visible range and making a molecule colored?
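One textbook way to see why longer conjugation pushes absorption to longer wavelengths is the free-electron (particle-in-a-box) model for linear polyenes. This is a sketch of that model, not the data-driven approach of the post. It assumes a polyene with k conjugated C=C bonds has 2k π electrons filling levels n = 1..k, and a box length of 2k × 1.40 Å. The bond length and box-length convention are assumptions.

```python
# Physical constants (SI units)
H = 6.62607015e-34      # Planck constant, J*s
C = 2.99792458e8        # speed of light, m/s
M_E = 9.1093837015e-31  # electron mass, kg

BOND_LENGTH_M = 1.40e-10  # assumed average C-C bond length in a polyene, m


def polyene_absorption_nm(k: int) -> float:
    """Estimate the HOMO->LUMO absorption wavelength (nm) of a linear polyene
    with k conjugated C=C bonds using a particle-in-a-box model.

    Assumptions: 2k pi electrons fill levels n = 1..k (two per level), and the
    box length is 2k bond lengths.
    """
    box_length = 2 * k * BOND_LENGTH_M
    # E_n = n^2 h^2 / (8 m L^2); the HOMO is n = k and the LUMO is n = k + 1
    delta_e = ((k + 1) ** 2 - k ** 2) * H ** 2 / (8 * M_E * box_length ** 2)
    # Convert the energy gap to a wavelength: lambda = h c / delta_E
    return H * C / delta_e * 1e9


for k in range(1, 6):
    print(f"{k} C=C bonds: ~{polyene_absorption_nm(k):.0f} nm")
```

Only the trend is meaningful here: the wavelength grows with k because the gap shrinks as the box lengthens. The absolute values are rough and diverge from experiment, especially for long chains, because the model ignores bond-length alternation.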
-
Comparing Tautomer Generation Algorithms
Tautomers are chemical structures that readily interconvert under given conditions. For example, an amino acid has a neutral form and a zwitterionic form with separated positive and negative charges. Cheminformatics packages have algorithms to enumerate tautomers based on rules. Which algorithms produce the most tautomers? And how well does InChI represent all tautomers of a given structure with a single identifier?
-
Molecular Isotopic Distributions Take 2: Combinations
This blog post presents a more computationally efficient way to determine the abundance of each isotopic composition of a molecule.
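Here is a minimal sketch of the combinations idea, not the post’s code: for n atoms of one element, each distinct isotopic composition is enumerated once with `itertools.combinations_with_replacement` and weighted by a multinomial coefficient counting the orderings that produce it. The chlorine masses and abundances are approximate, illustrative values.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial, prod

# Approximate isotope masses (u) and natural abundances for chlorine;
# illustrative values, check a current reference before real use.
CHLORINE = {35: (34.96885, 0.7576), 37: (36.96590, 0.2424)}


def isotopic_distribution(isotopes: dict, n_atoms: int) -> dict:
    """Return {mass: abundance} for n_atoms of one element, enumerating each
    distinct isotopic composition once (a combination) and weighting it by
    the number of orderings that produce it (a multinomial coefficient)."""
    distribution = {}
    for combo in combinations_with_replacement(sorted(isotopes), n_atoms):
        counts = Counter(combo)
        # Number of distinct orderings of the atoms giving this composition
        orderings = factorial(n_atoms) // prod(factorial(c) for c in counts.values())
        probability = orderings * prod(isotopes[iso][1] ** c for iso, c in counts.items())
        mass = round(sum(isotopes[iso][0] for iso in combo), 4)
        distribution[mass] = distribution.get(mass, 0.0) + probability
    return distribution


cl2 = isotopic_distribution(CHLORINE, 2)
for mass, abundance in sorted(cl2.items()):
    print(f"{mass:.4f} u: {abundance:.4f}")
```

For n atoms and i isotopes, this loops over C(n + i − 1, i − 1) compositions instead of iⁿ orderings, which is where the savings come from.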
-
Molecular Isotopic Distributions Take 1: Permutations
Elements can have several isotopes, which have the same number of protons and electrons but different numbers of neutrons. Because a neutron has a mass of approximately 1 amu (atomic mass unit), different isotopes of an element appear at different mass-to-charge ratios in a mass spectrum.
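A minimal sketch of the permutations approach, not the post’s code: assign every isotope to every atom with `itertools.product`, multiply the abundances, and bin by nominal mass (the sum of mass numbers). The carbon abundances are approximate, illustrative values.

```python
from itertools import product
from math import prod

# Approximate natural abundances of carbon isotopes, keyed by mass number;
# illustrative values only.
CARBON = {12: 0.9893, 13: 0.0107}


def isotopic_distribution_by_permutation(isotopes: dict, n_atoms: int) -> dict:
    """Return {nominal mass: abundance} for n_atoms of one element by
    enumerating every isotope assignment to every atom
    (len(isotopes) ** n_atoms cases)."""
    distribution = {}
    for assignment in product(isotopes, repeat=n_atoms):
        probability = prod(isotopes[iso] for iso in assignment)
        nominal_mass = sum(assignment)  # mass numbers add up to the nominal mass
        distribution[nominal_mass] = distribution.get(nominal_mass, 0.0) + probability
    return distribution


distribution = isotopic_distribution_by_permutation(CARBON, 6)
for mass, abundance in sorted(distribution.items()):
    print(mass, f"{abundance:.5f}")
```

The number of cases grows exponentially with the number of atoms, which is why this approach gets slow for larger molecules.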
-
MolsMatrixToGridImage Simplifies Code
I contributed MolsMatrixToGridImage to the RDKit 2023.09.1 release because I found myself writing similar code over and over to draw row-and-column grids of molecules. For projects where each row represented something, such as a molecule and the fragments off a common core, my mental model corresponded to a two-dimensional (nested) data structure, whereas the pre-existing function MolsToGridImage supported only linear (flat) data structures.
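To illustrate the nested-versus-flat mismatch, here is a hypothetical helper, `flatten_matrix`, which is not RDKit code. It converts a row-by-row nested list into a flat list plus a row width, padding short rows with a placeholder (here `None`) so each nested row starts a new grid row. That is the kind of bookkeeping needed to feed a flat function such as `MolsToGridImage` and its `molsPerRow` parameter. Strings stand in for molecules.

```python
from typing import Any, List, Optional, Tuple


def flatten_matrix(matrix: List[List[Any]]) -> Tuple[List[Optional[Any]], int]:
    """Flatten a nested (row-by-row) list into one flat list plus a row width,
    padding short rows with None so rows stay aligned in a grid."""
    width = max((len(row) for row in matrix), default=0)
    flat: List[Optional[Any]] = []
    for row in matrix:
        flat.extend(row)
        flat.extend([None] * (width - len(row)))  # pad so the next row starts fresh
    return flat, width


rows = [["parent A", "fragment A1", "fragment A2"], ["parent B", "fragment B1"]]
flat, mols_per_row = flatten_matrix(rows)
print(mols_per_row, flat)
```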
-
Displaying Molecular Formulas in Molecular Grids, Tables, and Graphs for Elemental Analysis
Here’s how to display formatted molecular formulas in tables and graphs. In addition to formatted molecular formulas, these techniques should work for any Markdown or LaTeX.
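As a minimal sketch, assuming plain formulas with no charges or isotope labels, a regular expression can turn each run of digits into a subscript, producing LaTeX or HTML `<sub>` markup. Many Markdown renderers accept inline HTML. The function names here are hypothetical.

```python
import re


def formula_to_latex(formula: str) -> str:
    """Convert a plain molecular formula such as 'C2H6O' to LaTeX with subscripts."""
    # Wrap each run of digits in a subscript; \mathrm keeps element symbols upright
    return r"$\mathrm{" + re.sub(r"(\d+)", r"_{\1}", formula) + "}$"


def formula_to_html(formula: str) -> str:
    """Convert a plain molecular formula to HTML with <sub> tags."""
    return re.sub(r"(\d+)", r"<sub>\1</sub>", formula)


print(formula_to_latex("C2H6O"))  # $\mathrm{C_{2}H_{6}O}$
print(formula_to_html("C2H6O"))   # C<sub>2</sub>H<sub>6</sub>O
```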
-
Molecular Formula Generation
In cheminformatics, the typical way of representing a molecule is with a SMILES string such as CCO for ethanol. A SMILES string can be converted into a molecular graph, which can be used to determine molecular structure and related properties. However, there are still cases where the molecular formula, such as C2H6O, is useful.
-
Refitting Data From Wiener’s Classic Cheminformatics Paper
In a previous post, I revisited Wiener’s paper predicting alkanes’ boiling points using modern cheminformatics tools. This follow-up post refits the data with modern mathematical tools to check how well the literature parameters, and the current parameters optimized here, fit the data.
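The specific tools the post uses are not described here. As a minimal sketch, assume a two-parameter linear model with no intercept, y = a·x1 + b·x2 (for example, boiling-point differences modeled from changes in two structural indices). Its least-squares parameters can then be found from the 2×2 normal equations. The data below is synthetic, not Wiener’s.

```python
from typing import Sequence, Tuple


def fit_two_parameters(x1: Sequence[float], x2: Sequence[float],
                       y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a*x1 + b*x2 (no intercept) via the normal equations."""
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * t for u, t in zip(x1, y))
    s2y = sum(v * t for v, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are collinear; parameters are not identifiable")
    a = (s1y * s22 - s2y * s12) / det
    b = (s2y * s11 - s1y * s12) / det
    return a, b


def rmse(x1, x2, y, a: float, b: float) -> float:
    """Root-mean-square error of the model y = a*x1 + b*x2."""
    residuals = [t - (a * u + b * v) for u, v, t in zip(x1, x2, y)]
    return (sum(r * r for r in residuals) / len(residuals)) ** 0.5


# Synthetic data generated from a = 2, b = 3 (not data from the paper)
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [0.0, 1.0, 0.0, 2.0]
y = [2 * u + 3 * v for u, v in zip(x1, x2)]
a, b = fit_two_parameters(x1, x2, y)
print(f"a = {a:.3f}, b = {b:.3f}, RMSE = {rmse(x1, x2, y, a, b):.3g}")
```

Computing `rmse` once with literature parameters and once with refitted ones is one simple way to compare how well each set fits the same data.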
-
Revisiting a Classic Cheminformatics Paper: The Wiener Index
Harry Wiener was “a pioneer in cheminformatics and chemical graph theory”. In his 1947 Journal of the American Chemical Society article “Structural Determination of Paraffin Boiling Points”, he introduced the path number $\omega$ “as the sum of the distances between any two carbon atoms in the molecule, in terms of carbon-carbon bonds”, which is now known as the Wiener index. He used his index to model the boiling points of alkanes (also known as paraffins). This post revisits that article, extracts data for molecules from it, recalculates cheminformatics parameters and boiling points, and plots the data.
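The path number is easy to compute directly. Here is a minimal sketch that runs a breadth-first search from each carbon of a hydrogen-suppressed skeleton and sums the bond distances over all pairs. With RDKit, summing the upper triangle of `Chem.GetDistanceMatrix(mol)` should give the same number.

```python
from collections import deque
from typing import Dict, List

# Hydrogen-suppressed carbon skeleton: atom index -> bonded atom indices
Graph = Dict[int, List[int]]


def wiener_index(graph: Graph) -> int:
    """Sum of shortest-path distances (in C-C bonds) over all pairs of atoms."""
    total = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for neighbor in graph[atom]:
                if neighbor not in dist:
                    dist[neighbor] = dist[atom] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
    return total // 2  # each pair was counted once from each end


n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(wiener_index(n_butane), wiener_index(isobutane))  # 10 9
```

The branched isomer has the smaller index, consistent with its more compact shape.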
-
Are the Starting Materials for Synthesizing Your Target Molecules Commercially Available?
This utility reports whether the starting materials for a set of synthesis targets are commercially available. You give it your synthesis targets and the reaction used to create each; it determines the starting materials, checks whether they are commercially available, and tells you whether each target is accessible, that is, whether all its starting materials are commercially available.
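The final accessibility check reduces to set membership. Here is a minimal sketch that assumes the starting materials have already been derived from each reaction and that availability is a local set of SMILES. The inventory below is hypothetical; the real utility’s reaction handling and supplier lookup are not shown.

```python
from typing import Dict, List, Set


def missing_starting_materials(targets: Dict[str, List[str]],
                               available: Set[str]) -> Dict[str, List[str]]:
    """Map each target to the starting materials that are not available."""
    return {target: [sm for sm in materials if sm not in available]
            for target, materials in targets.items()}


# Target SMILES -> starting-material SMILES (ethyl acetate, acetanilide)
targets = {
    "CCOC(C)=O": ["CC(=O)O", "CCO"],
    "CC(=O)Nc1ccccc1": ["CC(=O)Cl", "Nc1ccccc1"],
}
# Hypothetical inventory of commercially available compounds
available = {"CC(=O)O", "CCO", "Nc1ccccc1"}

missing = missing_starting_materials(targets, available)
for target, absent in missing.items():
    print(target, "accessible" if not absent else f"missing {absent}")
```

Returning the missing materials, rather than only a yes/no flag, shows what would need to be sourced or made for each inaccessible target.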
-
Draw a Mass Spectrometry Fragmentation Tree Using RDKit
This utility plots a mass spectrometry fragmentation tree given the species’ SMILES and their hierarchy, that is, which species fragments into which other species.
-
Find the Maximum Common Substructure, and Groups Off It, For a Set of Molecules Using RDKit
In drug discovery, the lead optimization step often involves creating analogues of a hit (a promising compound which produces a desired result in an assay) to optimize selectivity and minimize toxicity. Because it is typically easier to chemically modify the periphery of the molecule (for example the functional groups) than the scaffold, it is helpful to compare the groups off of the common scaffold. This utility function uses RDKit to find the maximum common substructure (MCS) between a set of molecules, then show the groups off of that MCS.
-
Visualizing Nonbinary Trees: Classification of Chemical Isomers
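This example lets you visualize the hierarchy of nonbinary trees, where a node can have any number of children. An example is the classification of chemical isomers, which are compounds that have the same molecular formula but different structures: either different connectivity of atoms, or the same connectivity with different arrangements of atoms in space.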
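A minimal sketch of the underlying data structure: store each node’s children in a list of any length, built from parent-child pairs, and walk it recursively. This renders the tree as indented text rather than a drawing. The helper names are hypothetical, and the classification shown is a simplified subset.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_children(edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each parent to its list of children; a node may have any number."""
    children: Dict[str, List[str]] = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    return children


def render_tree(node: str, children: Dict[str, List[str]], depth: int = 0) -> List[str]:
    """Return one indented line per node, visiting children depth-first."""
    lines = ["  " * depth + node]
    for child in children.get(node, []):
        lines.extend(render_tree(child, children, depth + 1))
    return lines


edges = [
    ("Isomers", "Constitutional isomers"),
    ("Isomers", "Stereoisomers"),
    ("Stereoisomers", "Enantiomers"),
    ("Stereoisomers", "Diastereomers"),
    ("Diastereomers", "Cis-trans isomers"),
]
lines = render_tree("Isomers", build_children(edges))
print("\n".join(lines))
```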
-
Chemistry machine learning for drug discovery with DeepChem
This example uses machine learning to predict the lipophilicity of compounds.
-
Retrosynthetically Decompose a Molecule In a Tree Structure Using Recap in RDKit
Retrosynthetic analysis involves decomposing a target molecule into a set of fragments that could be combined to make the parent molecule using common reactions. The Recap algorithm by X. Lewell, D. Judd, S. Watson, and M. Hann accomplishes that. Recap is implemented in the RDKit cheminformatics Python package.
-
Find and Highlight the Maximum Common Substructure Between a Set of Molecules Using RDKit
When analyzing a set of molecules, you might want to find the maximum common substructure (MCS) match between them. This utility function, SmilesMCStoGridImage, does that for a set of molecules specified by SMILES, displays the SMARTS substructure as a molecule, and displays all the molecules in a grid with that substructure highlighted and aligned.