Search | BVS Enfermagem Research Portal

Human genotype-to-phenotype predictions: Boosting accuracy with nonlinear models.

Medvedev, Aleksandr; Mishra Sharma, Satyarth; Tsatsorin, Evgenii; Nabieva, Elena; Yarotsky, Dmitry.

PLoS One ; 17(8): e0273293, 2022.

Article in English | MEDLINE | ID: mdl-36044406

ABSTRACT

Genotype-to-phenotype prediction is a central problem of human genetics. In recent years, it has become possible to construct complex predictive models for phenotypes, thanks to the availability of large genome data sets as well as efficient and scalable machine learning tools. In this paper, we make a threefold contribution to this problem. First, we ask if state-of-the-art nonlinear predictive models, such as boosted decision trees, can be more efficient for phenotype prediction than conventional linear models. We find that this is indeed the case if model features include a sufficiently rich set of covariates, but probably not otherwise. Second, we ask if the conventional selection of single nucleotide polymorphisms (SNPs) by genome-wide association studies (GWAS) can be replaced by a more efficient procedure that takes into account information in previously selected SNPs. We propose such a procedure, based on sequential feature importance estimation with decision trees, and show that this approach indeed produces informative SNP sets that are much more compact than those selected with GWAS. Finally, we show that the highest prediction accuracy can ultimately be achieved by ensembling individual linear and nonlinear models. To the best of our knowledge, for some of the phenotypes that we consider (asthma, hypothyroidism), our results set a new state of the art.
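To make the idea of sequential SNP selection concrete, here is a minimal sketch. It is not the authors' procedure: it assumes a simplified stand-in in which each round fits a one-split decision stump to the current residuals, so that a SNP carrying the same signal as an already selected one gains nothing. The function names (`best_stump`, `sequential_select`) and the 0/1/2 genotype encoding are assumptions made for illustration.

```python
def best_stump(column, residual):
    """Find the best single split of one SNP column (genotypes coded 0/1/2).

    Returns (gain, threshold, left_mean, right_mean), where gain is the
    reduction in squared error obtained by splitting the residuals.
    """
    n = len(residual)
    total = sum(residual)
    best = (0.0, None, 0.0, 0.0)
    for threshold in (0, 1):  # split: genotype <= threshold vs. > threshold
        left = [r for g, r in zip(column, residual) if g <= threshold]
        if not left or len(left) == n:
            continue  # split puts every sample on one side
        s_left, n_left = sum(left), len(left)
        s_right, n_right = total - s_left, n - n_left
        gain = s_left**2 / n_left + s_right**2 / n_right - total**2 / n
        if gain > best[0]:
            best = (gain, threshold, s_left / n_left, s_right / n_right)
    return best


def sequential_select(genotypes, phenotype, n_rounds, learning_rate=1.0):
    """Greedy selection: each round picks the SNP whose stump best explains
    what the previously selected SNPs have left unexplained.

    genotypes is a list of columns, one list of genotype codes per SNP.
    """
    mean = sum(phenotype) / len(phenotype)
    residual = [y - mean for y in phenotype]
    selected = []
    for _ in range(n_rounds):
        scored = [(best_stump(col, residual), j) for j, col in enumerate(genotypes)]
        (gain, threshold, left_mean, right_mean), j = max(scored, key=lambda t: t[0][0])
        if threshold is None:
            break  # no SNP improves the fit any further
        if j not in selected:
            selected.append(j)
        # Subtract the stump's prediction so later rounds see only new signal.
        residual = [
            r - learning_rate * (left_mean if g <= threshold else right_mean)
            for g, r in zip(genotypes[j], residual)
        ]
    return selected
```

In this toy setting, a marginal (GWAS-style) ranking would score two perfectly correlated SNPs equally and keep both, whereas the sequential rule keeps one of them and then moves on to a SNP that carries independent signal.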

Subjects

Genome-Wide Association Study , Genotype , Phenotype , Genome-Wide Association Study/methods , Humans , Genetic Models , Nonlinear Dynamics , Single Nucleotide Polymorphism

Accurate fetal variant calling in the presence of maternal cell contamination.

Nabieva, Elena; Sharma, Satyarth Mishra; Kapushev, Yermek; Garushyants, Sofya K; Fedotova, Anna V; Moskalenko, Viktoria N; Serebrenikova, Tatyana E; Glazyrina, Eugene; Kanivets, Ilya V; Pyankov, Denis V; Neretina, Tatyana V; Logacheva, Maria D; Bazykin, Georgii A; Yarotsky, Dmitry.

Eur J Hum Genet ; 28(11): 1615-1623, 2020 Nov.

Article in English | MEDLINE | ID: mdl-32728107

ABSTRACT

High-throughput sequencing of fetal DNA is a promising and increasingly common method for the discovery of all (or all coding) genetic variants in the fetus, either as part of prenatal screening or diagnosis, or for genetic diagnosis of spontaneous abortions. In many cases, the fetal DNA (from chorionic villi, amniotic fluid, or abortive tissue) can be contaminated with maternal cells, resulting in a mixture of fetal and maternal DNA. This maternal cell contamination (MCC) undermines the assumption, made by traditional variant callers, that each allele in a heterozygous site is covered, on average, by 50% of the reads, and therefore can lead to erroneous genotype calls. We present a panel of methods for reducing the genotyping error in the presence of MCC. All methods start with the output of GATK HaplotypeCaller on the sequencing data for the (contaminated) fetal sample and both of its parents, and additionally rely on information about the MCC fraction (which itself is readily estimated from the high-throughput sequencing data). The first of these methods uses a Bayesian probabilistic model to correct the fetal genotype calls produced by MCC-unaware HaplotypeCaller. The other two methods "learn" the genotype-correction model from examples. We use simulated contaminated fetal data to train and test the models. Using the test sets, we show that all three methods lead to substantially improved accuracy when compared with the original MCC-unaware HaplotypeCaller calls. We then apply the best-performing method to three chorionic villus samples from spontaneously terminated pregnancies.
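The effect of MCC on the 50% assumption can be shown with a toy calculation. The sketch below is not the Bayesian model from the paper. It assumes that a fraction `alpha` of the sequenced DNA is maternal, that read counts are binomial, that the prior over fetal genotypes is uniform, and that sequencing error has a fixed rate. All function and parameter names are hypothetical.

```python
from math import comb


def expected_alt_fraction(g_fetal, g_mother, alpha):
    """Expected share of reads carrying the alternate allele.

    g_fetal and g_mother count alternate alleles (0, 1 or 2);
    alpha is the assumed maternal share of the sequenced DNA.
    """
    return (1 - alpha) * g_fetal / 2 + alpha * g_mother / 2


def fetal_genotype_posterior(alt_reads, depth, g_mother, alpha,
                             prior=(1 / 3, 1 / 3, 1 / 3), error=0.01):
    """Posterior over fetal genotypes 0/1/2 under a binomial read model.

    The uniform prior and the fixed error rate are illustrative assumptions.
    """
    weights = []
    for g_fetal, p in enumerate(prior):
        f = expected_alt_fraction(g_fetal, g_mother, alpha)
        f = f * (1 - error) + (1 - f) * error  # allow for miscalled bases
        likelihood = comb(depth, alt_reads) * f**alt_reads * (1 - f)**(depth - alt_reads)
        weights.append(p * likelihood)
    total = sum(weights)
    return [w / total for w in weights]
```

For example, with `alpha = 0.3` and a heterozygous mother, a fetus that is homozygous for the reference allele is still expected to show about 15% alternate reads. A contamination-unaware caller may read that as a heterozygous site, while the contamination-aware posterior above favours the homozygous-reference genotype.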

Subjects

Chorionic Villi Sampling/methods , DNA Contamination , Genetic Testing/methods , DNA Sequence Analysis/methods , Adult , Bayes Theorem , Chorionic Villi Sampling/standards , Female , Genetic Testing/standards , Humans , Machine Learning , Mutation , Pregnancy , DNA Sequence Analysis/standards , Signal-to-Noise Ratio

Error bounds for approximations with deep ReLU networks.

Yarotsky, Dmitry.

Neural Netw ; 94: 103-114, 2017 Oct.

Article in English | MEDLINE | ID: mdl-28756334

ABSTRACT

We study the expressive power of shallow and deep neural networks with piecewise linear activation functions. We establish new rigorous upper and lower bounds for the network complexity in the setting of approximations in Sobolev spaces. In particular, we prove that deep ReLU networks more efficiently approximate smooth functions than shallow networks. In the case of approximations of 1D Lipschitz functions we describe adaptive depth-6 network architectures more efficient than the standard shallow architecture.
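One way to see why depth helps is the sawtooth construction for approximating x^2 on [0, 1] with ReLU units. The abstract does not describe this construction, so the sketch below is an illustration only, with hypothetical function names. Composing a ReLU "hat" function m times and subtracting the scaled results from x gives the piecewise linear interpolant of x^2 on 2^m equal intervals, so the error shrinks as 2^(-2m-2) while the number of layers grows only linearly in m.

```python
def relu(x):
    return max(0.0, x)


def hat(x):
    """Tent map on [0, 1] written with three ReLU units:
    rises from 0 to 1 on [0, 0.5], falls back to 0 on [0.5, 1]."""
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)


def approx_square(x, m):
    """Approximate x**2 on [0, 1] using m composed hat layers."""
    result = x
    g = x
    for s in range(1, m + 1):
        g = hat(g)            # s-fold composition: a sawtooth with 2**(s-1) teeth
        result -= g / 4**s
    return result


def max_error(m, grid=10001):
    """Largest deviation from x**2 over a uniform grid on [0, 1]."""
    points = (i / (grid - 1) for i in range(grid))
    return max(abs(approx_square(x, m) - x * x) for x in points)
```

Each additional layer divides the worst-case error by four. A shallow piecewise linear approximation needs the number of linear pieces, and hence units, to double for the same gain.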

Subjects

Neural Networks, Computer
