A Statistical Approach to Correcting Cross Annotations in a
Metagenomic Functional Profile Generated by Short Reads

Du R; Mercante D; An L; Fang Z

Abstract

Background: A widely adopted practice for functional profiling of a metagenomic sample is to group protein-coding sequences into one family when the proteins they encode perform the same biochemical function, and then to tabulate the relative abundances of all the families. When metagenomic sequencing reads are searched for homology against a protein database, the relative abundance of a family can be represented by the number of reads aligned to its members. However, it has been observed that some short reads generated by next-generation sequencing platforms are erroneously assigned to functional families with which they are not associated. This common phenomenon is termed cross-annotation. Current methods for functional profiling of a metagenomic sample either use empirical cutoff values to select alignments, ignoring the cross-annotation problem, or apply a summary equation as a simple adjustment.

Results: By introducing latent variables, we use Probabilistic Latent Semantic Analysis to model the proportions of reads assigned to functional families in a metagenomic sample. The approach can be applied to a metagenomic sample once the list of true functional families has been obtained or estimated. It was implemented on metagenomic samples functionally characterized with the Clusters of Orthologous Groups of proteins database, and it successfully addressed the cross-annotation issue in in vitro-simulated metagenomic samples, in metagenomic samples simulated with bioinformatics tools, and in a real-world dataset.

Conclusions: Correcting cross-annotation will increase the accuracy of the functional profile of a metagenome generated from short reads. It will further benefit differential abundance analysis of metagenomic samples under different conditions.
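The latent-variable idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single sample, treats the true family of each read as the latent variable, and assumes the conditional probabilities of cross-annotation are already known. The function name `em_true_proportions`, the matrix `cross`, and the example numbers are all hypothetical.

```python
import numpy as np

def em_true_proportions(counts, cross, n_iter=500, tol=1e-10):
    """Estimate the proportions of the true families by EM (illustrative only).

    counts : observed read counts per annotated family, length J
    cross  : K x J matrix, ASSUMED known; cross[k, j] is the probability that
             a read from true family k is annotated as family j (rows sum to 1)
    """
    counts = np.asarray(counts, dtype=float)
    cross = np.asarray(cross, dtype=float)
    n_true = cross.shape[0]
    pi = np.full(n_true, 1.0 / n_true)  # start from uniform proportions

    for _ in range(n_iter):
        # E-step: posterior probability of each true family given the annotation
        joint = pi[:, None] * cross                   # K x J
        denom = joint.sum(axis=0)                     # length J
        post = joint / np.where(denom > 0, denom, 1.0)

        # M-step: reallocate the observed counts to the true families
        new_pi = (post * counts[None, :]).sum(axis=1)
        new_pi /= new_pi.sum()

        converged = np.abs(new_pi - pi).max() < tol
        pi = new_pi
        if converged:
            break
    return pi

# Hypothetical example: two true families, three annotated families.
# Family 3 receives reads only through cross-annotation from true family 2.
cross = [[0.9, 0.1, 0.0],
         [0.0, 0.8, 0.2]]
counts = [540, 380, 80]  # consistent with true proportions of 0.6 and 0.4
pi_hat = em_true_proportions(counts, cross)
print(pi_hat)  # approximately [0.6, 0.4]
```

In this toy setting, the reads annotated to the third family are reassigned to the second true family instead of being counted as a separate function, which is the kind of correction the abstract describes.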

Journal of Biometrics and Biostatistics