1.
Nat Methods ; 10(12): 1196-9, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24141494

ABSTRACT

To quantify known and unknown microorganisms at species-level resolution using shotgun sequencing data, we developed a method that establishes metagenomic operational taxonomic units (mOTUs) based on single-copy phylogenetic marker genes. Applied to 252 human fecal samples, the method revealed that on average 43% of the species abundance and 58% of the richness cannot be captured by current reference genome-based methods. An implementation of the method is available at http://www.bork.embl.de/software/mOTU/.


Subject(s)
Metagenomics , Microbiota , Sequence Alignment/methods , Algorithms , Calibration , Cluster Analysis , Computational Biology/methods , Ribosomal DNA/genetics , Genetic Linkage , Genetic Markers , Genome , Humans , Intestines/microbiology , Phylogeny , 16S Ribosomal RNA/genetics , DNA Sequence Analysis/methods
2.
Mol Biol Evol ; 31(4): 993-1009, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24473288

ABSTRACT

Nucleotide positions in the hypervariable V4 and V9 regions of the small subunit (SSU)-rDNA locus are normally difficult to align and are usually removed before standard phylogenetic analyses. Yet, with next-generation sequencing data, amplicons of these regions are all that are available to answer ecological and evolutionary questions that rely on phylogenetic inferences. With ciliates, we asked how inclusion of the V4 or V9 regions, regardless of alignment quality, affects tree topologies using distinct phylogenetic methods (including PairDist that is introduced here). Results show that the best approach is to place V4 amplicons into an alignment of full-length Sanger SSU-rDNA sequences and to infer the phylogenetic tree with RAxML. A sliding window algorithm as implemented in RAxML shows, though, that not all nucleotide positions in the V4 region are better than V9 at inferring the ciliate tree. With this approach and an ancestral-state reconstruction, we use V4 amplicons from European nearshore sampling sites to infer that rather than being primarily terrestrial and freshwater, colpodean ciliates may have repeatedly transitioned from terrestrial/freshwater to marine environments.


Subject(s)
Ciliophora/genetics , Water Microbiology , Bayes Theorem , Ribosomal Spacer DNA/genetics , Molecular Evolution , Fresh Water/microbiology , Protozoan Genes , Genetic Speciation , Genetic Variation , High-Throughput Nucleotide Sequencing , Genetic Models , Phylogeny , Small Ribosome Subunits/genetics , Seawater/microbiology , DNA Sequence Analysis
3.
J Mol Evol ; 78(2): 148-62, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24343640

ABSTRACT

The internal transcribed spacer region (ITS) of the nuclear rDNA cistron represents the barcoding locus for Fungi. Intragenomic variation of this multicopy gene can interfere with accurate phylogenetic reconstruction of biological entities. We investigated the amount and nature of this variation for the lichenized fungus Cora inversa in the Hygrophoraceae (Basidiomycota: Agaricales), analyzing base call and length variation in ITS1 454 pyrosequencing data of three samples of the target mycobiont, for a total of 16,665 reads obtained from three separate repeats of the same samples under different conditions. Using multiple fixed alignment methods (PaPaRa) and maximum likelihood phylogenetic analysis (RAxML), we assessed phylogenetic relationships of the obtained reads, together with Sanger ITS sequences from the same samples. Phylogenetic analysis showed that all ITS1 reads belonged to a single species, C. inversa. Pyrosequencing data showed 266 insertion sites in addition to the 325 sites expected from Sanger sequences, for a total of 15,654 insertions (0.94 insertions per read). An additional 3,279 substitutions relative to the Sanger sequences were detected in the dataset, out of 5,461,125 bases to be called. Up to 99.3% of the observed indels in the dataset could be interpreted as 454 pyrosequencing errors, approximately 65% corresponding to incorrectly recovered homopolymer segments, and 35% to carry-forward-incomplete-extension errors. Comparison of automated clustering and alignment-based phylogenetic analysis demonstrated that clustering of these reads produced a 35-fold overestimation of biological diversity in the dataset at the 95% similarity threshold level, whereas phylogenetic analysis using a maximum likelihood approach accurately recovered a single biological entity. 
We conclude that variation detected in 454 pyrosequencing data must be interpreted with great care and that a combination of a sufficiently large number of reads per taxon, a set of Sanger references for the same taxon, and at least two runs under different emulsion PCR and sequencing conditions, are necessary to reliably separate biological variation from 454 sequencing errors. Our study shows that clustering methods are highly sensitive to artifactual sequence variation and inadequate to properly recover biological diversity in a dataset, if sequencing errors are substantial and not removed prior to clustering analysis.
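
The per-read and per-base rates quoted in this abstract can be rechecked from the counts it reports. The following is a minimal Python sketch of that arithmetic; the variable names are illustrative and the counts are copied from the abstract, not from the underlying dataset.

```python
# Counts as reported in the abstract (454 pyrosequencing of Cora inversa ITS1).
reads = 16_665            # total reads
insertions = 15_654       # total insertions relative to Sanger sequences
substitutions = 3_279     # additional substitutions
bases_called = 5_461_125  # bases to be called

# Average number of insertions per read (the abstract reports 0.94).
insertions_per_read = insertions / reads

# Substitutions per called base (not stated in the abstract; derived here).
substitution_rate = substitutions / bases_called

print(f"insertions per read: {insertions_per_read:.2f}")
print(f"substitutions per called base: {substitution_rate:.2e}")
```

The first figure reproduces the 0.94 insertions per read given in the abstract. The second is a derived value of roughly six substitutions per 10,000 called bases.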


Subject(s)
Basidiomycota/genetics , Ribosomal Spacer DNA , Fungal Genome , Haplotypes , Basidiomycota/classification , Biodiversity , Molecular Evolution , Genetic Variation , Insertional Mutagenesis
4.
BMC Bioinformatics ; 13: 196, 2012 Aug 09.
Article in English | MEDLINE | ID: mdl-22876807

ABSTRACT

BACKGROUND: Aligning short DNA reads to a reference sequence alignment is a prerequisite for detecting their biological origin and analyzing them in a phylogenetic context. With the PaPaRa tool we introduced a dedicated dynamic programming algorithm for simultaneously aligning short reads to reference alignments and corresponding evolutionary reference trees. The algorithm aligns short reads to phylogenetic profiles that correspond to the branches of such a reference tree. The algorithm needs to perform an immense number of pairwise alignments. Therefore, we explore vector intrinsics and GPUs to accelerate the PaPaRa alignment kernel. RESULTS: We optimized and parallelized PaPaRa on CPUs and GPUs. Via SSE 4.1 SIMD (Single Instruction, Multiple Data) intrinsics for x86 SIMD architectures and multi-threading, we obtained a 9-fold acceleration on a single core as well as linear speedups with respect to the number of cores. The peak CPU performance amounts to 18.1 GCUPS (Giga Cell Updates per Second) using all four physical cores on an Intel i7 2600 CPU running at 3.4 GHz. The average CPU performance (averaged over all test runs) is 12.33 GCUPS. We also used OpenCL to execute PaPaRa on a GPU SIMT (Single Instruction, Multiple Threads) architecture. An NVIDIA GeForce 560 GPU delivered peak and average performance of 22.1 and 18.4 GCUPS respectively. Finally, we combined the SIMD and SIMT implementations into a hybrid CPU-GPU system that achieved an accumulated peak performance of 33.8 GCUPS. CONCLUSIONS: This accelerated version of PaPaRa (available at http://www.exelixis-lab.org/software.html) provides a significant performance improvement that allows for analyzing larger datasets in less time. We observe that state-of-the-art SIMD and SIMT architectures deliver comparable performance for this dynamic programming kernel when the "competing programmer approach" is deployed.
Finally, we show that overall performance can be substantially increased by designing a hybrid CPU-GPU system with appropriate load distribution mechanisms.
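
The GCUPS figures in this abstract can be read as follows. This is a minimal Python sketch that assumes a "cell" is one entry of a dynamic programming matrix (the usual meaning of the term; the abstract only expands the acronym). The function name and the example workload are hypothetical; only the three peak values come from the abstract.

```python
def gcups(query_len: int, ref_len: int, n_alignments: int, seconds: float) -> float:
    """Giga cell updates per second, assuming each pairwise alignment
    fills a full query_len x ref_len dynamic programming matrix."""
    cells = query_len * ref_len * n_alignments
    return cells / seconds / 1e9


# Hypothetical workload: one million reads of length 100 aligned against
# profiles of length 1,500 in 10 seconds.
rate = gcups(100, 1_500, 1_000_000, 10.0)
print(f"example throughput: {rate:.1f} GCUPS")

# Peak values reported in the abstract.
cpu_peak, gpu_peak, hybrid_peak = 18.1, 22.1, 33.8

# Fraction of the summed individual peaks reached by the hybrid system.
# Treating the sum as a reference point is an illustrative choice, not
# a metric used in the abstract.
efficiency = hybrid_peak / (cpu_peak + gpu_peak)
print(f"hybrid peak / (CPU peak + GPU peak): {efficiency:.2f}")
```

Under these assumptions the hybrid system reaches about 84% of the sum of the individual CPU and GPU peaks, which is consistent with the abstract's point that load distribution between the two devices matters.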


Subject(s)
Phylogeny , Sequence Alignment/methods , DNA Sequence Analysis/methods , Software , Algorithms
5.
Bioinformatics ; 27(15): 2068-75, 2011 Aug 01.
Article in English | MEDLINE | ID: mdl-21636595

ABSTRACT

MOTIVATION: Likelihood-based methods for placing short read sequences from metagenomic samples into reference phylogenies have been recently introduced. At present, it is unclear how to align those reads with respect to the reference alignment that was deployed to infer the reference phylogeny. Moreover, the adaptability of such alignment methods with respect to the underlying reference alignment strategies/philosophies has not been explored. It has also not been assessed if the reference phylogeny can be deployed in conjunction with the reference alignment to improve alignment accuracy in this context. RESULTS: We assess different strategies for short read alignment and propose a novel phylogeny-aware alignment procedure. Our alignment method can improve the accuracy of subsequent phylogenetic placement of the reads into a reference phylogeny by up to 5.8 times compared with phylogeny-agnostic methods. It can be deployed to align reads to alignments generated by using fundamentally different alignment strategies (e.g. PRANK(+F) versus MUSCLE). AVAILABILITY: http://www.exelixis-lab.org/software.html


Subject(s)
Phylogeny , Sequence Alignment/methods , Software , Algorithms , Base Sequence , Likelihood Functions , Reference Standards
6.
Syst Biol ; 60(3): 291-302, 2011 May.
Article in English | MEDLINE | ID: mdl-21436105

ABSTRACT

We present an evolutionary placement algorithm (EPA) and a Web server for the rapid assignment of sequence fragments (short reads) to edges of a given phylogenetic tree under the maximum-likelihood model. The accuracy of the algorithm is evaluated on several real-world data sets and compared with placement by pair-wise sequence comparison, using edit distances and BLAST. We introduce a slow and accurate as well as a fast and less accurate placement algorithm. For the slow algorithm, we develop additional heuristic techniques that yield almost the same run times as the fast version with only a small loss of accuracy. When those additional heuristics are employed, the run time of the more accurate algorithm is comparable with that of a simple BLAST search for data sets with a high number of short query sequences. Moreover, the accuracy of the EPA is significantly higher, in particular when the sample of taxa in the reference topology is sparse or inadequate. Our algorithm, which has been integrated into RAxML, therefore provides an equally fast but more accurate alternative to BLAST for tree-based inference of the evolutionary origin and composition of short sequence reads. We are also actively developing a Web server that offers a freely available service for computing read placements on trees using the EPA.
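
The abstract compares the EPA against placement by pairwise sequence comparison using edit distances. The following is a minimal Python sketch of that kind of baseline only, not of the EPA or its likelihood computations. The function names, taxon labels, and toy sequences are hypothetical, and a real comparison would also have to handle reads much shorter than the references.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def nearest_reference(read: str, references: dict[str, str]) -> str:
    """Assign a read to the reference sequence with the smallest edit distance."""
    return min(references, key=lambda name: edit_distance(read, references[name]))


# Toy reference sequences (hypothetical, for illustration only).
references = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "ACGTTCGAAC",
    "taxon_C": "TTGTACCTAG",
}
read = "ACGTTCGAAG"

best = nearest_reference(read, references)
print(best, edit_distance(read, references[best]))
```

A baseline like this can only assign a read to a sequence present in the reference set. The EPA, by contrast, assigns reads to edges of the reference tree, which is where the abstract reports its accuracy advantage when taxon sampling is sparse.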


Subject(s)
Algorithms , Molecular Evolution , Likelihood Functions , Phylogeny , Sequence Alignment/methods , Amino Acid Sequence , Base Sequence , Computer Simulation , Internet , DNA Sequence Analysis/methods , Protein Sequence Analysis/methods , RNA Sequence Analysis/methods , Software
7.
BMC Genomics ; 11: 461, 2010 Aug 05.
Article in English | MEDLINE | ID: mdl-20687950

ABSTRACT

BACKGROUND: Shotgun sequencing of environmental DNA is an essential technique for characterizing uncultivated microbes in situ. However, the taxonomic and functional assignment of the obtained sequence fragments remains a pressing problem. RESULTS: Existing algorithms are largely optimized for speed and coverage; in contrast, we present here a software framework that focuses on a restricted set of informative gene families, using Maximum Likelihood to assign these with the best possible accuracy. This framework ('MLTreeMap'; http://mltreemap.org/) uses raw nucleotide sequences as input, and includes hand-curated, extensible reference information. CONCLUSIONS: We discuss how we validated our pipeline using complete genomes as well as simulated and actual environmental sequences.


Subject(s)
DNA/analysis , Phylogeny , DNA Sequence Analysis/methods , Software Design , Algorithms , DNA/classification , DNA/genetics , Internet , Likelihood Functions