Indexing and real-time user-friendly queries in terabyte-sized complex genomic datasets with kmindex and ORA.
Lemane, Téo; Lezzoche, Nolan; Lecubin, Julien; Pelletier, Eric; Lescot, Magali; Chikhi, Rayan; Peterlongo, Pierre.
Affiliations
  • Lemane T; Univ. Rennes, Inria, CNRS, IRISA - UMR 6074, Rennes, France. teo.lemane@genoscope.cns.fr.
  • Lezzoche N; Génomique Métabolique, Genoscope, Institut de Biologie François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, Evry, France. teo.lemane@genoscope.cns.fr.
  • Lecubin J; Aix-Marseille Université, Université de Toulon, IRD, CNRS, Mediterranean Institute of Oceanography (MIO), UM 110, Marseille, France.
  • Pelletier E; SIP, OSU PYTHEAS, Marseille, France.
  • Lescot M; Génomique Métabolique, Genoscope, Institut de Biologie François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, Evry, France.
  • Chikhi R; Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara Oceans GO-SEE, CNRS, Paris, France.
  • Peterlongo P; Aix-Marseille Université, Université de Toulon, IRD, CNRS, Mediterranean Institute of Oceanography (MIO), UM 110, Marseille, France.
Nat Comput Sci; 4(2): 104-109, 2024 Feb.
Article in En | MEDLINE | ID: mdl-38413777
ABSTRACT
Public sequencing databases contain vast amounts of biological information, yet they are largely underutilized as it is challenging to efficiently search them for any sequence(s) of interest. We present kmindex, an approach that can index thousands of metagenomes and perform sequence searches in a fraction of a second. The index construction is an order of magnitude faster than previous methods, while search times are two orders of magnitude faster. With negligible false positive rates below 0.01%, kmindex outperforms the precision of existing approaches by four orders of magnitude. Here we demonstrate the scalability of kmindex by successfully indexing 1,393 marine seawater metagenome samples from the Tara Oceans project. Additionally, we introduce the publicly accessible web server Ocean Read Atlas, which enables real-time queries on the Tara Oceans dataset.
Subjects

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Seawater / Genomics | Language: En | Journal: Nat Comput Sci | Publication year: 2024 | Document type: Article | Country of affiliation: France
