Comparison of different assembly and annotation tools on analysis of simulated viral metagenomic communities in the gut.

Vázquez-Castellanos, Jorge F; García-López, Rodrigo; Pérez-Brocal, Vicente; Pignatelli, Miguel; Moya, Andrés

Affiliation

Moya A; Fundación para el Fomento de la Investigación Sanitaria y Biomédica de la Comunidad Valencia (FISABIO)-Salud Pública, Avenida de Cataluña 21, 46020 Valencia, Spain. Andres.Moya@uv.es.

BMC Genomics ; 15: 37, 2014 Jan 18.

Article in En | MEDLINE | ID: mdl-24438450

ABSTRACT

BACKGROUND: The main limitations in the analysis of viral metagenomes are perhaps the high genetic variability and the lack of information in extant databases. To address these issues, several bioinformatic tools have been specifically designed or adapted for metagenomics by improving read assembly and creating more sensitive methods for homology detection. This study compares the performance of different available assemblers and taxonomic annotation software using simulated viral metagenomic data.

RESULTS: We simulated two 454 viral metagenomes using genomes from NCBI's RefSeq database, based on the list of actual viruses found in previously published metagenomes. Three assembly strategies, spanning six assemblers, were tested for performance: the overlap-layout-consensus algorithms Newbler, Celera and Minimo; the de Bruijn graph algorithms Velvet and MetaVelvet; and the read probabilistic model Genovo. The performance of the assemblies was measured by the length of the resulting contigs (using N50), the percentage of reads assembled and the overall accuracy when compared against the corresponding reference genomes. Additionally, the number of chimeras per contig and the lowest common ancestor were estimated in order to assess the effect of assembly on taxonomic and functional annotation. The functional classification of the reads was evaluated by counting the reads that correctly matched the functional data previously reported for the original genomes and by calculating the number of over-represented functional categories in chimeric contigs. The sensitivity and specificity of tBLASTx, PhymmBL and the k-mer frequencies were measured in terms of accurate predictions when comparing simulated reads against the NCBI Virus genomes RefSeq database.

CONCLUSIONS: Assembly improves functional annotation by increasing accurate assignments and decreasing ambiguous hits between viruses and bacteria. However, its success is limited by the chimeric contigs that occur at all taxonomic levels. The assembler and its parameters should be selected based on the focus of each study. Minimo's non-chimeric contigs and Genovo's long contigs excelled in taxonomic assignment and functional annotation, respectively. tBLASTx stood out as the best approach for taxonomic annotation for virus identification. PhymmBL proved useful in datasets in which no related sequences are present, as it uses genomic features that may help identify distant taxa. The k-mer frequencies underperformed in all viral datasets.
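The N50 statistic used above to summarize contig length can be illustrated with a short sketch. It assumes the common definition of N50 (the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembled bases). The function name `n50` and the example lengths are hypothetical and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Assumes the common definition: sort contigs from longest to shortest
    and report the length of the contig at which the running total first
    reaches at least half of the summed length of all contigs.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    half_total = sum(lengths) / 2
    running_total = 0
    for length in lengths:
        running_total += length
        if running_total >= half_total:
            return length


# Hypothetical contig lengths (in bases), for illustration only.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
example_n50 = n50(example_lengths)
```

Because a few long contigs can dominate N50, the abstract pairs it with the percentage of reads assembled and the accuracy against the reference genomes rather than relying on it alone.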

Subjects

Algorithms; Computational Biology/methods; Databases, Genetic; Intestines/virology; Metagenomics; Viruses/genetics; Bacteria/classification; Bacteria/genetics; Cluster Analysis; Computational Biology/standards; Computer Simulation; Contig Mapping; Humans; Internet; Intestines/microbiology; Principal Component Analysis; User-Computer Interface; Viruses/classification

Full text: 1 Database: MEDLINE Main subject: Viruses / Algorithms / Computational Biology / Databases, Genetic / Metagenomics / Intestines Type of study: Prognostic_studies Limits: Humans Language: En Journal: BMC Genomics Journal subject: GENETICS Year of publication: 2014 Document type: Article