Search | VHL Search Portal

Petabase-scale sequence alignment catalyses viral discovery.

Edgar, Robert C; Taylor, Brie; Lin, Victor; Altman, Tomer; Barbera, Pierre; Meleshko, Dmitry; Lohr, Dan; Novakovsky, Gherman; Buchfink, Benjamin; Al-Shayeb, Basem; Banfield, Jillian F; de la Peña, Marcos; Korobeynikov, Anton; Chikhi, Rayan; Babaian, Artem.

Nature ; 602(7895): 142-147, 2022 02.

Article in English | MEDLINE | ID: mdl-35082445

ABSTRACT

Public databases contain a planetary collection of nucleic acid sequences, but their systematic exploration has been inhibited by a lack of efficient methods for searching this corpus, which (at the time of writing) exceeds 20 petabases and is growing exponentially [1]. Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over 10^5 novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, respectively, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics.

Subject(s)

Cloud Computing , Databases, Genetic , RNA Viruses/genetics , RNA Viruses/isolation & purification , Sequence Alignment/methods , Virology/methods , Virome/genetics , Animals , Archives , Bacteriophages/enzymology , Bacteriophages/genetics , Biodiversity , Coronavirus/classification , Coronavirus/enzymology , Coronavirus/genetics , Evolution, Molecular , Hepatitis Delta Virus/enzymology , Hepatitis Delta Virus/genetics , Humans , Models, Molecular , RNA Viruses/classification , RNA Viruses/enzymology , RNA-Dependent RNA Polymerase/chemistry , RNA-Dependent RNA Polymerase/genetics , Software

Sensitive protein alignments at tree-of-life scale using DIAMOND.

Buchfink, Benjamin; Reuter, Klaus; Drost, Hajk-Georg.

Nat Methods ; 18(4): 366-368, 2021 04.

Article in English | MEDLINE | ID: mdl-33828273

ABSTRACT

We are at the beginning of a genomic revolution in which all known species are planned to be sequenced. Accessing such data for comparative analyses is crucial in this new age of data-driven biology. Here, we introduce an improved version of DIAMOND that greatly exceeds previous search performances and harnesses supercomputing to perform tree-of-life scale protein alignments in hours, while matching the sensitivity of the gold standard BLASTP.

Subject(s)

Computational Biology/methods , Proteins/chemistry , Sequence Alignment , Algorithms

Fast and sensitive protein alignment using DIAMOND.

Buchfink, Benjamin; Xie, Chao; Huson, Daniel H.

Nat Methods ; 12(1): 59-60, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25402007

ABSTRACT

The alignment of sequencing reads against a protein reference database is a major computational bottleneck in metagenomics and data-intensive evolutionary projects. Although recent tools offer improved performance over the gold standard BLASTX, they exhibit only a modest speedup or low sensitivity. We introduce DIAMOND, an open-source algorithm based on double indexing that is 20,000 times faster than BLASTX on short reads and has a similar degree of sensitivity.

Subject(s)

Metagenomics/methods , Sequence Alignment/methods , Software , Algorithms , Base Sequence , Humans , Microbiota/genetics , Sensitivity and Specificity , Sequence Analysis, DNA

Sensitive inference of alignment-safe intervals from biodiverse protein sequence clusters using EMERALD.

Grigorjew, Andreas; Gynter, Artur; Dias, Fernando H C; Buchfink, Benjamin; Drost, Hajk-Georg; Tomescu, Alexandru I.

Genome Biol ; 24(1): 168, 2023 07 17.

Article in English | MEDLINE | ID: mdl-37461051

ABSTRACT

Sequence alignments are the foundations of life science research, but most innovation so far focuses on optimal alignments, while information derived from suboptimal solutions is ignored. We argue that one optimal alignment per pairwise sequence comparison is a reasonable approximation when dealing with very similar sequences but is insufficient when exploring the biodiversity of the protein universe at tree-of-life scale. To overcome this limitation, we introduce pairwise alignment-safety to uncover the amino acid positions robustly shared across all suboptimal solutions. We implement EMERALD, a software library for alignment-safety inference, and apply it to 400k sequences from the SwissProt database.

Subject(s)

Algorithms , Software , Animals , Amino Acid Sequence , Sequence Alignment , Proteins/genetics , Proteins/chemistry , Birds

Annotated bacterial chromosomes from frame-shift-corrected long-read metagenomic data.

Arumugam, Krithika; Bagci, Caner; Bessarab, Irina; Beier, Sina; Buchfink, Benjamin; Górska, Anna; Qiu, Guanglei; Huson, Daniel H; Williams, Rohan B H.

Microbiome ; 7(1): 61, 2019 04 16.

Article in English | MEDLINE | ID: mdl-30992083

ABSTRACT

BACKGROUND: Short-read sequencing technologies have long been the work-horse of microbiome analysis. Continuing technological advances are making the application of long-read sequencing to metagenomic samples increasingly feasible. RESULTS: We demonstrate that whole bacterial chromosomes can be obtained from an enriched community, by application of MinION sequencing to a sample from an EBPR bioreactor, producing 6 Gb of sequence that assembles into multiple closed bacterial chromosomes. We provide a simple pipeline for processing such data, which includes a new approach to correcting erroneous frame-shifts. CONCLUSIONS: Advances in long-read sequencing technology and corresponding algorithms will allow the routine extraction of whole chromosomes from environmental samples, providing a more detailed picture of individual members of a microbiome.

Subject(s)

Chromosomes, Bacterial , Metagenomics/methods , Sequence Analysis, DNA/methods , Algorithms , Bioreactors/microbiology , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation
