Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 2 de 2
Filtrar
Mais filtros

Base de dados
Ano de publicação
Tipo de documento
Intervalo de ano de publicação
1.
Bioinformatics ; 30(17): 2423-31, 2014 Sep 01.
Artigo em Inglês | MEDLINE | ID: mdl-24812337

RESUMO

MOTIVATION: Estimation of bacterial community composition from a high-throughput sequenced sample is an important task in metagenomics applications. As the sample sequence data typically harbors reads of variable lengths and different levels of biological and technical noise, accurate statistical analysis of such data is challenging. Currently popular estimation methods are typically time-consuming in a desktop computing environment. RESULTS: Using sparsity enforcing methods from the general sparse signal processing field (such as compressed sensing), we derive a solution to the community composition estimation problem by a simultaneous assignment of all sample reads to a pre-processed reference database. A general statistical model based on kernel density estimation techniques is introduced for the assignment task, and the model solution is obtained using convex optimization tools. Further, we design a greedy algorithm solution for a fast solution. Our approach offers a reasonably fast community composition estimation method, which is shown to be more robust to input data variation than a recently introduced related method. AVAILABILITY AND IMPLEMENTATION: A platform-independent Matlab implementation of the method is freely available at http://www.ee.kth.se/ctsoftware; source code that does not require access to Matlab is currently being tested and will be made available later through the above Web site.


Assuntos
Bactérias/classificação , Metagenômica/métodos , Algoritmos , Bactérias/genética , Sequenciamento de Nucleotídeos em Larga Escala , Modelos Estatísticos , RNA Ribossômico 16S/genética , Análise de Sequência de DNA
2.
PLoS One ; 10(10): e0140644, 2015.
Artigo em Inglês | MEDLINE | ID: mdl-26496191

RESUMO

MOTIVATION: Estimation of bacterial community composition from high-throughput sequenced 16S rRNA gene amplicons is a key task in microbial ecology. Since the sequence data from each sample typically consist of a large number of reads and are adversely impacted by different levels of biological and technical noise, accurate analysis of such large datasets is challenging. RESULTS: There has been a recent surge of interest in using compressed sensing inspired and convex-optimization based methods to solve the estimation problem for bacterial community composition. These methods typically rely on summarizing the sequence data by frequencies of low-order k-mers and matching this information statistically with a taxonomically structured database. Here we show that the accuracy of the resulting community composition estimates can be substantially improved by aggregating the reads from a sample with an unsupervised machine learning approach prior to the estimation phase. The aggregation of reads is a pre-processing approach where we use a standard K-means clustering algorithm that partitions a large set of reads into subsets with reasonable computational cost to provide several vectors of first order statistics instead of only single statistical summarization in terms of k-mer frequencies. The output of the clustering is then processed further to obtain the final estimate for each sample. The resulting method is called Aggregation of Reads by K-means (ARK), and it is based on a statistical argument via mixture density formulation. ARK is found to improve the fidelity and robustness of several recently introduced methods, with only a modest increase in computational complexity. AVAILABILITY: An open source, platform-independent implementation of the method in the Julia programming language is freely available at https://github.com/dkoslicki/ARK. A Matlab implementation is available at http://www.ee.kth.se/ctsoftware.


Assuntos
Algoritmos , Bactérias/genética , Metagenômica/métodos , Microbiota/genética , Bactérias/classificação , Análise por Conglomerados , DNA Bacteriano/química , DNA Bacteriano/genética , Fezes/microbiologia , Humanos , Internet , Reação em Cadeia da Polimerase , RNA Ribossômico 16S/genética , Reprodutibilidade dos Testes , Análise de Sequência de DNA
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA