Evaluating techniques for metagenome annotation using simulated sequence data

Research output: Contribution to journal (Article)



Publication details

Date: Accepted/In press - 4 May 2016
Date: E-pub ahead of print - 8 May 2016
Date: Published (current) - 11 May 2016
Number of pages: 39
Early online date: 8 May 2016
Original language: English


The advent of next-generation sequencing has allowed huge amounts of DNA
sequence data to be produced, advancing the capabilities of microbial ecosystem studies. The
current challenge is identifying from which microorganisms and genes the DNA originated.
Several tools and databases are available for annotating DNA sequences. The tools, databases
and parameters used can have a significant impact on the results: naïve choice of these factors
can result in a false representation of community composition and function. We use a
simulated metagenome to show how different parameters affect annotation accuracy by
evaluating the sequence annotation performances of MEGAN, MG-RAST, One Codex and
Megablast. This simulated metagenome allowed the recovery of known organism and
function abundances to be quantitatively evaluated, which is not possible for environmental
metagenomes. The performance of each program and database varied, e.g. One Codex
correctly annotated many sequences at the genus level, whereas MG-RAST RefSeq produced
many false positive annotations. This effect decreased as the taxonomic level investigated
increased. Selecting more stringent parameters decreases the annotation sensitivity, but
increases precision. Ultimately, there is a trade-off between taxonomic resolution and
annotation accuracy. These results should be considered when annotating metagenomes and
interpreting results from previous studies.
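
The trade-off between sensitivity and precision described above can be illustrated with a minimal sketch. It assumes a simulated metagenome in which the true taxon of every read is known. The read IDs, genus labels, the two annotation runs and the function name `precision_and_sensitivity` are all hypothetical and chosen for illustration; this is not the authors' evaluation code, and it does not reproduce any result from the article.

```python
from typing import Dict, Optional, Tuple


def precision_and_sensitivity(
    truth: Dict[str, str],
    predicted: Dict[str, Optional[str]],
) -> Tuple[float, float]:
    """Score one annotation run against a known (simulated) ground truth.

    truth:     read id -> true taxon label
    predicted: read id -> assigned taxon label, or None if the read was left unannotated
    """
    # Reads that received a label at all
    annotated = sum(1 for taxon in predicted.values() if taxon is not None)
    # Reads whose assigned label matches the known label
    true_pos = sum(
        1
        for read, taxon in predicted.items()
        if taxon is not None and truth.get(read) == taxon
    )
    # Precision: fraction of assigned labels that are correct
    precision = true_pos / annotated if annotated else 0.0
    # Sensitivity: fraction of all simulated reads that were correctly labelled
    sensitivity = true_pos / len(truth) if truth else 0.0
    return precision, sensitivity


# Hypothetical ground truth for four simulated reads (genus level)
truth = {
    "r1": "Escherichia",
    "r2": "Escherichia",
    "r3": "Bacillus",
    "r4": "Pseudomonas",
}

# Hypothetical lenient run: every read is labelled, one of them wrongly
lenient = {
    "r1": "Escherichia",
    "r2": "Salmonella",
    "r3": "Bacillus",
    "r4": "Pseudomonas",
}

# Hypothetical stringent run: fewer reads are labelled, but all labels are correct
stringent = {
    "r1": "Escherichia",
    "r2": None,
    "r3": "Bacillus",
    "r4": None,
}

print("lenient:", precision_and_sensitivity(truth, lenient))
print("stringent:", precision_and_sensitivity(truth, stringent))
```

In this toy data the stringent run labels fewer reads, so its sensitivity is lower, while every label it does assign is correct, so its precision is higher. Applying the same scoring at a higher taxonomic level would require mapping each label to its parent taxon first, which is left out of this sketch.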

Bibliographical note

© FEMS 2016.
