Abstract
DNA sequencing technologies provide unprecedented opportunities to analyze within-host evolution of microorganism populations. Often, within-host populations are analyzed via pooled sequencing of the population, which contains multiple individuals or "haplotypes." However, current next-generation sequencing instruments, in conjunction with single-molecule barcoded linked-reads, cannot distinguish long haplotypes directly. Computational reconstruction of haplotypes from pooled sequencing has been attempted in virology, bacterial genomics, metagenomics, and human genetics, using algorithms based on either cross-host genetic sharing or within-host genomic reads. Here, we describe PoolHapX, a flexible computational approach that integrates information from both genetic sharing and genomic sequencing. We demonstrated that PoolHapX outperforms state-of-the-art tools tailored to specific organismal systems, and is robust to within-host evolution. Importantly, together with barcoded linked-reads, PoolHapX can infer whole-chromosome-scale haplotypes from 50 pools each containing 12 different haplotypes. By analyzing real data, we uncovered dynamic variations in the evolutionary processes of within-patient HIV populations previously unobserved in single position-based analysis.
Original language | English |
---|---|
Pages (from-to) | 2660-2672 |
Number of pages | 13 |
Journal | Molecular Biology and Evolution |
Volume | 38 |
Issue number | 6 |
Early online date | 6 Feb 2021 |
DOIs | |
Publication status | Published - 19 May 2021 |
Bibliographical note
Publisher Copyright:© The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Keywords
- haplotype reconstruction
- linkage disequilibrium
- regularization
- within-host evolution