Petabase-scale sequence alignment catalyses viral discovery.
Edgar, Robert C; Taylor, Brie; Lin, Victor; Altman, Tomer; Barbera, Pierre; Meleshko, Dmitry; Lohr, Dan; Novakovsky, Gherman; Buchfink, Benjamin; Al-Shayeb, Basem; Banfield, Jillian F; de la Peña, Marcos; Korobeynikov, Anton; Chikhi, Rayan; Babaian, Artem.
Affiliation
  • Edgar RC; Independent researcher, Corte Madera, CA, USA.
  • Taylor B; Independent researcher, Vancouver, British Columbia, Canada.
  • Lin V; Independent researcher, Seattle, WA, USA.
  • Altman T; Altman Analytics, San Francisco, CA, USA.
  • Barbera P; Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.
  • Meleshko D; Center for Algorithmic Biotechnology, St Petersburg State University, St Petersburg, Russia.
  • Lohr D; Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, New York, NY, USA.
  • Novakovsky G; Unaffiliated, Atlanta, GA, USA.
  • Buchfink B; Bioinformatics Graduate Program, University of British Columbia, Vancouver, British Columbia, Canada.
  • Al-Shayeb B; Computational Biology Group, Max Planck Institute for Biology, Tübingen, Germany.
  • Banfield JF; Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA, USA.
  • de la Peña M; Department of Earth and Planetary Science, University of California, Berkeley, Berkeley, CA, USA.
  • Korobeynikov A; Instituto de Biología Molecular y Celular de Plantas, Universidad Politécnica de Valencia-CSIC, Valencia, Spain.
  • Chikhi R; Center for Algorithmic Biotechnology, St Petersburg State University, St Petersburg, Russia.
  • Babaian A; Department of Statistical Modelling, St Petersburg State University, St Petersburg, Russia.
Nature; 602(7895): 142-147, 2022 Feb.
Article in English | MEDLINE | ID: mdl-35082445
ABSTRACT
Public databases contain a planetary collection of nucleic acid sequences, but their systematic exploration has been inhibited by a lack of efficient methods for searching this corpus, which (at the time of writing) exceeds 20 petabases and is growing exponentially [1]. Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over 10⁵ novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, respectively, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics.
Subject(s)

Full text: 1 Databases: MEDLINE Main subject: RNA Viruses / Virology / Sequence Alignment / Databases, Genetic / Cloud Computing / Virome Limits: Animals / Humans Language: En Journal: Nature Year: 2022 Document type: Article Affiliation country: United States
