MetaProFi: an ultrafast chunked Bloom filter for storing and querying protein and nucleotide sequence data for accurate identification of functionally relevant genetic variants.
Srikakulam, Sanjay K; Keller, Sebastian; Dabbaghie, Fawaz; Bals, Robert; Kalinina, Olga V.
Affiliation
  • Srikakulam SK; Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), 66123 Saarbrücken, Germany.
  • Keller S; Graduate School of Computer Science, Saarland University, 66123 Saarbrücken, Germany.
  • Dabbaghie F; Interdisciplinary Graduate School of Natural Product Research, Saarland University, 66123 Saarbrücken, Germany.
  • Bals R; Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), 66123 Saarbrücken, Germany.
  • Kalinina OV; Graduate School of Computer Science, Saarland University, 66123 Saarbrücken, Germany.
Bioinformatics; 39(3), 2023 Mar 01.
Article in English | MEDLINE | ID: mdl-36825843
MOTIVATION: Bloom filters are a popular data structure that allows rapid searches in large sequence datasets. So far, all tools work with nucleotide sequences; however, protein sequences are conserved over longer evolutionary distances, and only mutations on the protein level may have any functional significance. RESULTS: We present MetaProFi, a Bloom filter-based tool that, for the first time, offers the functionality to build indexes of amino acid sequences and query them with both amino acid and nucleotide sequences, thus bringing sequence comparison to the biologically relevant protein level. MetaProFi implements additional efficient engineering solutions, such as a shared memory system, chunked data storage and efficient compression. In addition to its conceptual novelty, MetaProFi demonstrates state-of-the-art performance and excellent memory consumption-to-speed ratio when applied to various large datasets. AVAILABILITY AND IMPLEMENTATION: Source code in Python is available at https://github.com/kalininalab/metaprofi.
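The idea in the abstract, indexing amino acid k-mers in a Bloom filter and querying the index with protein or nucleotide sequences, can be illustrated with a small sketch. This is not MetaProFi's code or API: the class and function names (`BloomFilter`, `index_protein`, `kmer_hit_fraction`, `translate`), the double-hashing scheme, the filter size and the k-mer length are all assumptions chosen for illustration. The sketch also assumes that a nucleotide query is translated to amino acids before lookup, and it translates only the first forward reading frame with the standard genetic code.

```python
import hashlib
from itertools import product


class BloomFilter:
    """Toy Bloom filter over strings (illustrative, not MetaProFi's format)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all bit positions from two 64-bit halves
        # of a single BLAKE2b digest.
        digest = hashlib.blake2b(item.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq, k):
    """Return all overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def index_protein(seq, k=5, **filter_args):
    """Build a Bloom filter holding every amino acid k-mer of seq."""
    bf = BloomFilter(**filter_args)
    for kmer in kmers(seq, k):
        bf.add(kmer)
    return bf


def kmer_hit_fraction(bf, query, k=5):
    """Fraction of the query's k-mers reported as present in the filter."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        return 0.0
    return sum(kmer in bf for kmer in query_kmers) / len(query_kmers)


# Standard genetic code, codons enumerated in T, C, A, G order.
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), _AMINO_ACIDS)
}


def translate(dna):
    """Translate the first forward frame; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )
```

A protein query would be passed to `kmer_hit_fraction` directly, while a nucleotide query would go through `translate` first. Because a Bloom filter has no false negatives, a query drawn from the indexed sequence always scores 1.0; unrelated queries can still score above zero because of false positives, whose rate depends on the filter size and number of hash functions.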
Subjects

Full text: 1 Database: MEDLINE Main subject: Algorithms / Data Compression Type of study: Diagnostic_studies Language: En Publication year: 2023 Document type: Article