1.
Nature; 602(7895): 142-147, 2022 Feb.
Article in English | MEDLINE | ID: mdl-35082445

ABSTRACT

Public databases contain a planetary collection of nucleic acid sequences, but their systematic exploration has been inhibited by a lack of efficient methods for searching this corpus, which (at the time of writing) exceeds 20 petabases and is growing exponentially. Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over 10^5 novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, respectively, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics.


Subject(s)
Cloud Computing , Genetic Databases , RNA Viruses/genetics , RNA Viruses/isolation & purification , Sequence Alignment/methods , Virology/methods , Virome/genetics , Animals , Archives , Bacteriophages/enzymology , Bacteriophages/genetics , Biodiversity , Coronavirus/classification , Coronavirus/enzymology , Coronavirus/genetics , Molecular Evolution , Hepatitis Delta Virus/enzymology , Hepatitis Delta Virus/genetics , Humans , Molecular Models , RNA Viruses/classification , RNA Viruses/enzymology , RNA-Dependent RNA Polymerase/chemistry , RNA-Dependent RNA Polymerase/genetics , Software
2.
Nat Methods; 18(4): 366-368, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33828273

ABSTRACT

We are at the beginning of a genomic revolution in which all known species are planned to be sequenced. Accessing such data for comparative analyses is crucial in this new age of data-driven biology. Here, we introduce an improved version of DIAMOND that greatly exceeds previous search performances and harnesses supercomputing to perform tree-of-life scale protein alignments in hours, while matching the sensitivity of the gold standard BLASTP.


Subject(s)
Computational Biology/methods , Proteins/chemistry , Sequence Alignment , Algorithms
3.
Nat Methods; 12(1): 59-60, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25402007

ABSTRACT

The alignment of sequencing reads against a protein reference database is a major computational bottleneck in metagenomics and data-intensive evolutionary projects. Although recent tools offer improved performance over the gold standard BLASTX, they exhibit only a modest speedup or low sensitivity. We introduce DIAMOND, an open-source algorithm based on double indexing that is 20,000 times faster than BLASTX on short reads and has a similar degree of sensitivity.


Subject(s)
Metagenomics/methods , Sequence Alignment/methods , Software , Algorithms , Base Sequence , Humans , Microbiota/genetics , Sensitivity and Specificity , DNA Sequence Analysis
4.
Genome Biol; 24(1): 168, 2023 Jul 17.
Article in English | MEDLINE | ID: mdl-37461051

ABSTRACT

Sequence alignments are the foundations of life science research, but most innovation so far focuses on optimal alignments, while information derived from suboptimal solutions is ignored. We argue that one optimal alignment per pairwise sequence comparison is a reasonable approximation when dealing with very similar sequences but is insufficient when exploring the biodiversity of the protein universe at tree-of-life scale. To overcome this limitation, we introduce pairwise alignment-safety to uncover the amino acid positions robustly shared across all suboptimal solutions. We implement EMERALD, a software library for alignment-safety inference, and apply it to 400k sequences from the SwissProt database.


Subject(s)
Algorithms , Software , Animals , Amino Acid Sequence , Sequence Alignment , Proteins/genetics , Proteins/chemistry , Birds
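
Illustrative note on record 4: the abstract describes "alignment-safety" as finding positions shared across all suboptimal alignments. The sketch below is not EMERALD's algorithm or API. It is a deliberately simplified toy that, under assumed scoring (match +1, mismatch -1, linear gap -1), reports the residue pairs that appear in every optimal global alignment of two short strings, by counting optimal paths forwards and backwards through a Needleman-Wunsch table. The function name `safe_pairs` and all parameters are hypothetical, and the restriction to optimal (rather than suboptimal) alignments is an assumption made to keep the example short.

```python
def safe_pairs(a, b, match=1, mismatch=-1, gap=-1):
    """Return 0-based (i, j) pairs aligned in EVERY optimal global alignment.

    Toy illustration only: integer scores, linear gaps, optimal alignments.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        return match if a[i] == b[j] else mismatch

    def fill(cell, order_i, order_j, start, moves):
        # Generic DP fill that tracks best score and number of optimal paths.
        score = [[0] * (m + 1) for _ in range(n + 1)]
        count = [[0] * (m + 1) for _ in range(n + 1)]
        count[start[0]][start[1]] = 1
        for i in order_i:
            for j in order_j:
                if (i, j) == start:
                    continue
                cands = [(score[pi][pj] + s, count[pi][pj])
                         for pi, pj, s in moves(i, j)]
                score[i][j] = max(s for s, _ in cands)
                count[i][j] = sum(c for s, c in cands if s == score[i][j])
        return score, count

    def fwd_moves(i, j):
        # Predecessors of (i, j) when aligning prefixes a[:i], b[:j].
        if i and j:
            yield i - 1, j - 1, sub(i - 1, j - 1)
        if i:
            yield i - 1, j, gap
        if j:
            yield i, j - 1, gap

    def bwd_moves(i, j):
        # Successors of (i, j) when aligning suffixes a[i:], b[j:].
        if i < n and j < m:
            yield i + 1, j + 1, sub(i, j)
        if i < n:
            yield i + 1, j, gap
        if j < m:
            yield i, j + 1, gap

    fwd, nf = fill(None, range(n + 1), range(m + 1), (0, 0), fwd_moves)
    bwd, nb = fill(None, range(n, -1, -1), range(m, -1, -1), (n, m), bwd_moves)

    best, total = fwd[n][m], nf[n][m]
    # A pair is "safe" here if every optimal path uses its diagonal step.
    return [(i, j) for i in range(n) for j in range(m)
            if fwd[i][j] + sub(i, j) + bwd[i + 1][j + 1] == best
            and nf[i][j] * nb[i + 1][j + 1] == total]


print(safe_pairs("AAB", "AB"))  # [(2, 1)]
```

For "AAB" against "AB" there are two optimal alignments, which differ in which "A" is matched, so only the final "B"/"B" pair is common to both. Extending the idea to suboptimal alignments, as the abstract describes, would require a different criterion than the exact-optimum test used here.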
5.
Microbiome; 7(1): 61, 2019 Apr 16.
Article in English | MEDLINE | ID: mdl-30992083

ABSTRACT

BACKGROUND: Short-read sequencing technologies have long been the work-horse of microbiome analysis. Continuing technological advances are making the application of long-read sequencing to metagenomic samples increasingly feasible. RESULTS: We demonstrate that whole bacterial chromosomes can be obtained from an enriched community, by application of MinION sequencing to a sample from an EBPR bioreactor, producing 6 Gb of sequence that assembles into multiple closed bacterial chromosomes. We provide a simple pipeline for processing such data, which includes a new approach to correcting erroneous frame-shifts. CONCLUSIONS: Advances in long-read sequencing technology and corresponding algorithms will allow the routine extraction of whole chromosomes from environmental samples, providing a more detailed picture of individual members of a microbiome.


Subject(s)
Bacterial Chromosomes , Metagenomics/methods , DNA Sequence Analysis/methods , Algorithms , Bioreactors/microbiology , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation