SICaRiO: short indel call filtering with boosting.
Bhuyan, Md Shariful Islam; Pe'er, Itsik; Rahman, M Sohel.
Affiliations
  • Bhuyan MSI; Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh.
  • Pe'er I; Department of Computer Science, Fu Foundation School of Engineering, and the Chair at the Center for Health Analytics, Data Science Institute, Columbia University, New York, USA.
  • Rahman MS; Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh.
Brief Bioinform; 22(4), 2021 Jul 20.
Article in English | MEDLINE | ID: mdl-33003198
ABSTRACT
Despite impressive improvements in next-generation sequencing technology, reliable detection of indels remains difficult. Recognition of true indels is of prime importance in many applications, such as personalized health care, disease genomics and population genetics. Recently, advanced machine learning techniques have been successfully applied to classification problems with large-scale data. In this paper, we present SICaRiO, a gradient boosting classifier for the reliable detection of true indels, trained with the gold-standard dataset from the 'Genome in a Bottle' (GIAB) consortium. Our filtering scheme significantly improves the performance of each variant calling pipeline used in GIAB and beyond. SICaRiO uses genomic features that can be computed from publicly available resources, i.e. it does not require sequencing pipeline-specific information (e.g. read depth). This study also sheds light on prior genomic contexts responsible for the erroneous calling of indels by sequencing pipelines. We have compared prediction difficulty for three categories of indels across different sequencing pipelines. We have also ranked genomic features according to their predictive power in determining false positives.
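To make the idea of filtering indel calls with gradient boosting concrete, here is a minimal sketch in plain Python. It is not the SICaRiO implementation: the two features (`homopolymer_len`, `gc_content`), the synthetic labelling rule, the decision-stump weak learner and all hyperparameters are assumptions chosen purely for illustration, and no GIAB data is used.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic indel calls (hypothetical features, NOT the paper's feature set).
# Label 1 = true indel, 0 = false positive. The labelling rule is invented:
# calls inside long homopolymer runs are treated as errors.
X, y = [], []
for _ in range(120):
    homopolymer_len = random.randint(1, 12)
    gc_content = random.random()  # pure noise in this toy example
    X.append([homopolymer_len, gc_content])
    y.append(1 if homopolymer_len <= 5 else 0)

def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) for the
    one-split regression tree with the lowest squared error on residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]

def fit_boosting(X, y, rounds=20, learning_rate=0.3):
    """Gradient boosting for log loss with stumps as weak learners."""
    p = sum(y) / len(y)
    base = math.log(p / (1.0 - p))  # initial score: log-odds of class 1
    scores = [base] * len(X)
    stumps = []
    for _ in range(rounds):
        # Negative gradient of the log loss with respect to the score.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((j, thr, lm, rm))
        scores = [s + learning_rate * (lm if row[j] <= thr else rm)
                  for row, s in zip(X, scores)]
    return base, learning_rate, stumps

def predict_proba(model, row):
    """Estimated probability that a call is a true indel."""
    base, lr, stumps = model
    score = base + sum(lr * (lm if row[j] <= thr else rm)
                       for j, thr, lm, rm in stumps)
    return sigmoid(score)

model = fit_boosting(X, y)

# Filtering step: keep only calls the classifier considers true indels.
kept = [row for row in X if predict_proba(model, row) >= 0.5]
accuracy = sum((predict_proba(model, row) >= 0.5) == bool(label)
               for row, label in zip(X, y)) / len(X)
print(f"kept {len(kept)} of {len(X)} calls, training accuracy {accuracy:.2f}")
```

In practice one would more likely use an established gradient boosting library and real genomic annotations; the sketch only shows the shape of the approach: score each candidate call from features that do not depend on the sequencing pipeline, then discard calls scored as likely false positives.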

Full text: 1 | Database: MEDLINE | Main subject: Software / Databases, Nucleic Acid / INDEL Mutation / High-Throughput Nucleotide Sequencing / Machine Learning | Language: En | Journal: Brief Bioinform | Journal subject: BIOLOGY / MEDICAL INFORMATICS | Publication year: 2021 | Document type: Article | Affiliation country: Bangladesh
