Pesquisa | BVS Integralidade em Saúde

SICaRiO: short indel call filtering with boosting.

Bhuyan, Md Shariful Islam; Pe'er, Itsik; Rahman, M Sohel.

Brief Bioinform ; 22(4)2021 07 20.

Artigo em Inglês | MEDLINE | ID: mdl-33003198

RESUMO

Despite impressive improvement in the next-generation sequencing technology, reliable detection of indels is still a difficult endeavour. Recognition of true indels is of prime importance in many applications, such as personalized health care, disease genomics and population genetics. Recently, advanced machine learning techniques have been successfully applied to classification problems with large-scale data. In this paper, we present SICaRiO, a gradient boosting classifier for the reliable detection of true indels, trained with the gold-standard dataset from 'Genome in a Bottle' (GIAB) consortium. Our filtering scheme significantly improves the performance of each variant calling pipeline used in GIAB and beyond. SICaRiO uses genomic features that can be computed from publicly available resources, i.e. it does not require sequencing pipeline-specific information (e.g. read depth). This study also sheds lights on prior genomic contexts responsible for the erroneous calling of indels made by sequencing pipelines. We have compared prediction difficulty for three categories of indels over different sequencing pipelines. We have also ranked genomic features according to their predictivity in determining false positives.

Assuntos

Bases de Dados de Ácidos Nucleicos , Sequenciamento de Nucleotídeos em Larga Escala , Mutação INDEL , Aprendizado de Máquina , Software

A protein-dependent side-chain rotamer library.

Bhuyan, Md Shariful Islam; Gao, Xin.

BMC Bioinformatics ; 12 Suppl 14: S10, 2011 Dec 14.

Artigo em Inglês | MEDLINE | ID: mdl-22373394

RESUMO

BACKGROUND: Protein side-chain packing problem has remained one of the key open problems in bioinformatics. The three main components of protein side-chain prediction methods are a rotamer library, an energy function and a search algorithm. Rotamer libraries summarize the existing knowledge of the experimentally determined structures quantitatively. Depending on how much contextual information is encoded, there are backbone-independent rotamer libraries and backbone-dependent rotamer libraries. Backbone-independent libraries only encode sequential information, whereas backbone-dependent libraries encode both sequential and locally structural information. However, side-chain conformations are determined by spatially local information, rather than sequentially local information. Since in the side-chain prediction problem, the backbone structure is given, spatially local information should ideally be encoded into the rotamer libraries. METHODS: In this paper, we propose a new type of backbone-dependent rotamer library, which encodes structural information of all the spatially neighboring residues. We call it protein-dependent rotamer libraries. Given any rotamer library and a protein backbone structure, we first model the protein structure as a Markov random field. Then the marginal distributions are estimated by the inference algorithms, without doing global optimization or search. The rotamers from the given library are then re-ranked and associated with the updated probabilities. RESULTS: Experimental results demonstrate that the proposed protein-dependent libraries significantly outperform the widely used backbone-dependent libraries in terms of the side-chain prediction accuracy and the rotamer ranking ability. Furthermore, without global optimization/search, the side-chain prediction power of the protein-dependent library is still comparable to the global-search-based side-chain prediction methods.

Assuntos

Biologia Computacional/métodos , Proteínas/química , Algoritmos , Modelos Moleculares , Biblioteca de Peptídeos , Probabilidade , Conformação Proteica

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

Detalhe da pesquisa