Phylogeny-Aware Chemoinformatic Analysis of Chemical Diversity in Lamiaceae Enables Iridoid Pathway Assembly and Discovery of Aucubin Synthase

Carlos E. Rodríguez-López*, Yindi Jiang, Mohamed O. Kamileen, Benjamin R. Lichman, Benke Hong, Brieanne Vaillancourt, C. Robin Buell, Sarah E. O'Connor

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Countless reports describe the isolation and structural characterization of natural products, yet this information remains disconnected and underutilized. Using a cheminformatics approach, we combine reported observations of iridoid glucosides with the known phylogeny of a large iridoid-producing plant family (Lamiaceae) to generate a set of biosynthetic pathways that best explain the extant iridoid chemical diversity. We developed a pathway reconstruction algorithm that connects iridoid reports via reactions and prunes this solution space by considering phylogenetic relationships between genera. We formulate a model that emulates the evolution of iridoid glucosides to create a synthetic data set, which we use to select the parameters that best reconstruct the pathways; we then apply these parameters to the iridoid data set to generate pathway hypotheses. These computationally generated pathways were then used as the basis for selecting and screening biosynthetic enzyme candidates. Our model was successfully applied to discover a cytochrome P450 enzyme from Callicarpa americana that catalyzes the oxidation of bartsioside to aucubin, a reaction predicted by our model despite neither molecule having been observed in the genus. We also demonstrate aucubin synthase activity in orthologues from Vitex agnus-castus and the outgroup Paulownia tomentosa, further strengthening the hypothesis, enabled by our model, that the reaction was present in the ancestral biosynthetic pathway. This is the first systematic hypothesis on epi-iridoid glucoside biosynthesis in 25 years and sets the stage for streamlined work on the iridoid pathway. This work highlights how curation and computational analysis of widely available structural data can facilitate hypothesis-based gene discovery.

Original language: English
Article number: msac057
Number of pages: 18
Journal: Molecular Biology and Evolution
Issue number: 4
Early online date: 17 Mar 2022
Publication status: Published - 1 Apr 2022

Bibliographical note

Publisher Copyright:
© The Author(s) 2022. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.


Keywords

  • chemical diversity
  • cheminformatics
  • comparative biochemistry
  • cytochrome P450
  • iridoids
  • pathway reconstruction
