Search | BVS - MINISTÉRIO DA SAÚDE

1.

Computationally Efficient Demographic History Inference from Allele Frequencies with Supervised Machine Learning.

Tran, Linh N; Sun, Connie K; Struck, Travis J; Sajan, Mathews; Gutenkunst, Ryan N.

Mol Biol Evol ; 41(5), 2024 May 03.

Article in English | MEDLINE | ID: mdl-38636507

ABSTRACT

Inferring past demographic history of natural populations from genomic data is of central concern in many studies across research fields. Previously, our group had developed dadi, a widely used demographic history inference method based on the allele frequency spectrum (AFS) and maximum composite-likelihood optimization. However, dadi's optimization procedure can be computationally expensive. Here, we present donni (demography optimization via neural network inference), a new inference method based on dadi that is more efficient while maintaining comparable inference accuracy. For each dadi-supported demographic model, donni simulates the expected AFS for a range of model parameters then trains a set of Mean Variance Estimation neural networks using the simulated AFS. Trained networks can then be used to instantaneously infer the model parameters from future genomic data summarized by an AFS. We demonstrate that for many demographic models, donni can infer some parameters, such as population size changes, very well and other parameters, such as migration rates and times of demographic events, fairly well. Importantly, donni provides both parameter and confidence interval estimates from input AFS with accuracy comparable to parameters inferred by dadi's likelihood optimization while bypassing its long and computationally intensive evaluation process. donni's performance demonstrates that supervised machine learning algorithms may be a promising avenue for developing more sustainable and computationally efficient demographic history inference methods.

Subjects

Gene frequency, Genetic models, Supervised machine learning, Population genetics/methods, Computer neural networks, Humans
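The abstract above treats the allele frequency spectrum (AFS) as the summary of genomic data that inference starts from. As a minimal sketch of what that summary is, assuming each variable site is described only by its derived-allele count in a sample of `n_chromosomes` chromosomes, the function below tallies an unfolded or folded spectrum. The function name and input layout are hypothetical illustrations and are not taken from dadi or donni.

```python
from collections import Counter


def allele_frequency_spectrum(derived_counts, n_chromosomes, folded=False):
    """Tally sites by derived-allele count (hypothetical helper, not dadi's API).

    Entry k of the unfolded result is the number of sites at which the
    derived allele appears on exactly k of the sampled chromosomes.
    """
    if any(c < 0 or c > n_chromosomes for c in derived_counts):
        raise ValueError("each count must lie between 0 and n_chromosomes")

    tally = Counter(derived_counts)
    afs = [tally.get(k, 0) for k in range(n_chromosomes + 1)]
    if not folded:
        return afs

    # Folding merges k and n - k, which is what remains observable when
    # the ancestral allele is unknown and only the minor allele can be counted.
    folded_afs = [0] * (n_chromosomes // 2 + 1)
    for k, n_sites in enumerate(afs):
        folded_afs[min(k, n_chromosomes - k)] += n_sites
    return folded_afs


if __name__ == "__main__":
    counts = [1, 1, 2, 3, 1, 4, 2]  # made-up toy data: one count per site
    print(allele_frequency_spectrum(counts, 4))
    print(allele_frequency_spectrum(counts, 4, folded=True))
```

With the toy counts shown, the unfolded spectrum has five entries (counts 0 through 4) and the folded spectrum has three (minor-allele counts 0 through 2).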

2.

Computationally efficient demographic history inference from allele frequencies with supervised machine learning.

Tran, Linh N; Sun, Connie K; Struck, Travis J; Sajan, Mathews; Gutenkunst, Ryan N.

bioRxiv ; 2024 Feb 15.

Article in English | MEDLINE | ID: mdl-38405827

ABSTRACT

Inferring past demographic history of natural populations from genomic data is of central concern in many studies across research fields. Previously, our group had developed dadi, a widely used demographic history inference method based on the allele frequency spectrum (AFS) and maximum composite likelihood optimization. However, dadi's optimization procedure can be computationally expensive. Here, we developed donni (demography optimization via neural network inference), a new inference method based on dadi that is more efficient while maintaining comparable inference accuracy. For each dadi-supported demographic model, donni simulates the expected AFS for a range of model parameters then trains a set of Mean Variance Estimation neural networks using the simulated AFS. Trained networks can then be used to instantaneously infer the model parameters from future input data AFS. We demonstrated that for many demographic models, donni can infer some parameters, such as population size changes, very well and other parameters, such as migration rates and times of demographic events, fairly well. Importantly, donni provides both parameter and confidence interval estimates from input AFS with accuracy comparable to parameters inferred by dadi's likelihood optimization while bypassing its long and computationally intensive evaluation process. donni's performance demonstrates that supervised machine learning algorithms may be a promising avenue for developing more sustainable and computationally efficient demographic history inference methods.
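The abstract above mentions Mean Variance Estimation networks that yield both parameter and confidence interval estimates. As a rough sketch of the general idea only, assuming a network that outputs a predicted mean and variance for each parameter, the usual training objective is a Gaussian negative log-likelihood, and an interval follows from the predicted variance. The function names and the 1.96 multiplier for a nominal 95% interval are assumptions for illustration; the abstract does not describe donni's actual implementation.

```python
import math


def gaussian_nll(y_true, mean, variance):
    """Negative log-likelihood of y_true under a normal N(mean, variance).

    A network trained to minimize this learns to report a larger variance
    where its mean prediction tends to be further from the truth.
    """
    if variance <= 0:
        raise ValueError("variance must be positive")
    return 0.5 * (math.log(2 * math.pi * variance)
                  + (y_true - mean) ** 2 / variance)


def confidence_interval(mean, variance, z=1.96):
    """Symmetric interval mean +/- z * sd; z = 1.96 is a nominal 95% level."""
    half_width = z * math.sqrt(variance)
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    # Made-up numbers: a true parameter value of 1.2 and a prediction
    # with mean 1.0 and variance 0.25.
    print(gaussian_nll(1.2, 1.0, 0.25))
    print(confidence_interval(1.0, 0.25))
```

The loss penalizes both a mean that misses the target and a variance that is too small or too large for the observed error, which is why the predicted variance can be read as an uncertainty estimate.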

3.

dadi-cli: Automated and distributed population genetic model inference from allele frequency spectra.

Huang, Xin; Struck, Travis J; Davey, Sean W; Gutenkunst, Ryan N.

bioRxiv ; 2023 Jun 16.

Article in English | MEDLINE | ID: mdl-37398279

ABSTRACT

Summary: dadi is a popular software package for inferring models of demographic history and natural selection from population genomic data. But using dadi requires Python scripting and manual parallelization of optimization jobs. We developed dadi-cli to simplify dadi usage and also enable straightforward distributed computing. Availability and Implementation: dadi-cli is implemented in Python and released under the Apache License 2.0. The source code is available at https://github.com/xin-huang/dadi-cli . dadi-cli can be installed via PyPI and conda, and is also available through Cacao on Jetstream2 https://cacao.jetstream-cloud.org/ .

4.

Demes: a standard format for demographic models.

Gower, Graham; Ragsdale, Aaron P; Bisschop, Gertjan; Gutenkunst, Ryan N; Hartfield, Matthew; Noskova, Ekaterina; Schiffels, Stephan; Struck, Travis J; Kelleher, Jerome; Thornton, Kevin R.

Genetics ; 222(3), 2022 11 01.

Article in English | MEDLINE | ID: mdl-36173327

ABSTRACT

Understanding the demographic history of populations is a key goal in population genetics, and with improving methods and data, ever more complex models are being proposed and tested. Demographic models of current interest typically consist of a set of discrete populations, their sizes and growth rates, and continuous and pulse migrations between those populations over a number of epochs, which can require dozens of parameters to fully describe. There is currently no standard format to define such models, significantly hampering progress in the field. In particular, the important task of translating the model descriptions in published work into input suitable for population genetic simulators is labor intensive and error prone. We propose the Demes data model and file format, built on widely used technologies, to alleviate these issues. Demes provides a well-defined and unambiguous model of populations and their properties that is straightforward to implement in software, and a text file format that is designed for simplicity and clarity. We provide thoroughly tested implementations of Demes parsers in multiple languages including Python and C, and showcase initial support in several simulators and inference methods. An introduction to the file format and a detailed specification are available at https://popsim-consortium.github.io/demes-spec-docs/.

Subjects

Population genetics, Software, Demography

5.

The genomic origins of the world's first farmers.

Marchi, Nina; Winkelbach, Laura; Schulz, Ilektra; Brami, Maxime; Hofmanová, Zuzana; Blöcher, Jens; Reyna-Blanco, Carlos S; Diekmann, Yoan; Thiéry, Alexandre; Kapopoulou, Adamandia; Link, Vivian; Piuz, Valérie; Kreutzer, Susanne; Figarska, Sylwia M; Ganiatsou, Elissavet; Pukaj, Albert; Struck, Travis J; Gutenkunst, Ryan N; Karul, Necmi; Gerritsen, Fokke; Pechtl, Joachim; Peters, Joris; Zeeb-Lanz, Andrea; Lenneis, Eva; Teschler-Nicola, Maria; Triantaphyllou, Sevasti; Stefanovic, Sofija; Papageorgopoulou, Christina; Wegmann, Daniel; Burger, Joachim; Excoffier, Laurent.

Cell ; 185(11): 1842-1859.e18, 2022 05 26.

Article in English | MEDLINE | ID: mdl-35561686

ABSTRACT

The precise genetic origins of the first Neolithic farming populations in Europe and Southwest Asia, as well as the processes and the timing of their differentiation, remain largely unknown. Demogenomic modeling of high-quality ancient genomes reveals that the early farmers of Anatolia and Europe emerged from a multiphase mixing of a Southwest Asian population with a strongly bottlenecked western hunter-gatherer population after the last glacial maximum. Moreover, the ancestors of the first farmers of Europe and Anatolia went through a period of extreme genetic drift during their westward range expansion, contributing highly to their genetic distinctiveness. This modeling elucidates the demographic processes at the root of the Neolithic transition and leads to a spatial interpretation of the population history of Southwest Asia and Europe during the late Pleistocene and early Holocene.

Subjects

Farmers, Genome, Agriculture, Mitochondrial DNA/genetics, Europe, Genetic drift, Genomics, Ancient history, Human migration, Humans

6.

Inferring Genome-Wide Correlations of Mutation Fitness Effects between Populations.

Huang, Xin; Fortier, Alyssa Lyn; Coffman, Alec J; Struck, Travis J; Irby, Megan N; James, Jennifer E; León-Burguete, José E; Ragsdale, Aaron P; Gutenkunst, Ryan N.

Mol Biol Evol ; 38(10): 4588-4602, 2021 09 27.

Article in English | MEDLINE | ID: mdl-34043790

ABSTRACT

The effect of a mutation on fitness may differ between populations depending on environmental and genetic context, but little is known about the factors that underlie such differences. To quantify genome-wide correlations in mutation fitness effects, we developed a novel concept called a joint distribution of fitness effects (DFE) between populations. We then proposed a new statistic w to measure the DFE correlation between populations. Using simulation, we showed that inferring the DFE correlation from the joint allele frequency spectrum is statistically precise and robust. Using population genomic data, we inferred DFE correlations of populations in humans, Drosophila melanogaster, and wild tomatoes. In these species, we found that the overall correlation of the joint DFE was inversely related to genetic differentiation. In humans and D. melanogaster, deleterious mutations had a lower DFE correlation than tolerated mutations, indicating a complex joint DFE. Altogether, the DFE correlation can be reliably inferred, and it offers extensive insight into the genetics of population divergence.

Subjects

Drosophila melanogaster, Genetic fitness, Animals, Drosophila melanogaster/genetics, Gene frequency, Genome, Genetic models, Mutation

7.

A community-maintained standard library of population genetic models.

Adrion, Jeffrey R; Cole, Christopher B; Dukler, Noah; Galloway, Jared G; Gladstein, Ariella L; Gower, Graham; Kyriazis, Christopher C; Ragsdale, Aaron P; Tsambos, Georgia; Baumdicker, Franz; Carlson, Jedidiah; Cartwright, Reed A; Durvasula, Arun; Gronau, Ilan; Kim, Bernard Y; McKenzie, Patrick; Messer, Philipp W; Noskova, Ekaterina; Ortega-Del Vecchyo, Diego; Racimo, Fernando; Struck, Travis J; Gravel, Simon; Gutenkunst, Ryan N; Lohmueller, Kirk E; Ralph, Peter L; Schrider, Daniel R; Siepel, Adam; Kelleher, Jerome; Kern, Andrew D.

Elife ; 9, 2020 06 23.

Article in English | MEDLINE | ID: mdl-32573438

ABSTRACT

The explosion in population genomic data demands ever more complex modes of analysis, and increasingly, these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here, we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open source project, which provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented Python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.

Subjects

Population genetics, Genomic library, Genetic models, Animals, Arabidopsis/genetics, Dogs/genetics, Drosophila melanogaster/genetics, Escherichia coli/genetics, Population genetics/methods, Population genetics/organization & administration, Genome/genetics, Human genome/genetics, Humans, Pongo abelii/genetics

8.

The impact of genome-wide association studies on biomedical research publications.

Struck, Travis J; Mannakee, Brian K; Gutenkunst, Ryan N.

Hum Genomics ; 12(1): 38, 2018 08 13.

Article in English | MEDLINE | ID: mdl-30103832

ABSTRACT

The past decade has seen major investment in genome-wide association studies (GWAS). Among the many goals of GWAS, a major one is to identify and motivate research on novel genes involved in complex human disease. To assess whether this goal is being met, we quantified the effect of GWAS on the overall distribution of biomedical research publications and on the subsequent publication history of genes newly associated with complex disease. We found that the historical skew of publications toward genes involved in Mendelian disease has not changed since the advent of GWAS. Genes newly implicated by GWAS in complex disease do experience additional publications compared to control genes, and they are more likely to become exceptionally studied. But the magnitude of both effects has declined over the past decade. Our results suggest that reforms to encourage follow-up studies may be needed for GWAS to most successfully guide biomedical research toward the molecular mechanisms underlying complex human disease.

Subjects

Biomedical research, Inborn genetic diseases/genetics, Genome-wide association study, Gene expression regulation/genetics, Humans, Single nucleotide polymorphism, Publications

9.

Triallelic Population Genomics for Inferring Correlated Fitness Effects of Same Site Nonsynonymous Mutations.

Ragsdale, Aaron P; Coffman, Alec J; Hsieh, PingHsun; Struck, Travis J; Gutenkunst, Ryan N.

Genetics ; 203(1): 513-23, 2016 05.

Article in English | MEDLINE | ID: mdl-27029732

ABSTRACT

The distribution of mutational effects on fitness is central to evolutionary genetics. Typical univariate distributions, however, cannot model the effects of multiple mutations at the same site, so we introduce a model in which mutations at the same site have correlated fitness effects. To infer the strength of that correlation, we developed a diffusion approximation to the triallelic frequency spectrum, which we applied to data from Drosophila melanogaster. We found a moderate positive correlation between the fitness effects of nonsynonymous mutations at the same codon, suggesting that both mutation identity and location are important for determining fitness effects in proteins. We validated our approach by comparing it to biochemical mutational scanning experiments, finding strong quantitative agreement, even between different organisms. We also found that the correlation of mutational fitness effects was not affected by protein solvent exposure or structural disorder. Together, our results suggest that the correlation of fitness effects at the same site is a previously overlooked yet fundamental property of protein evolution.

Subjects

Gene frequency, Genetic fitness, Mutation, Animals, Codon/genetics, Drosophila melanogaster/genetics, Molecular evolution, Insect genome, Genetic models

10.

Testing whether metazoan tyrosine loss was driven by selection against promiscuous phosphorylation.

Pandya, Siddharth; Struck, Travis J; Mannakee, Brian K; Paniscus, Mary; Gutenkunst, Ryan N.

Mol Biol Evol ; 32(1): 144-52, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25312910

ABSTRACT

Protein tyrosine phosphorylation is a key regulatory modification in metazoans, and the corresponding kinase enzymes have diversified dramatically. This diversification is correlated with a genome-wide reduction in protein tyrosine content, and it was recently suggested that this reduction was driven by selection to avoid promiscuous phosphorylation that might be deleterious. We tested three predictions of this intriguing hypothesis. 1) Selection should be stronger on residues that are more likely to be phosphorylated due to local solvent accessibility or structural disorder. 2) Selection should be stronger on proteins that are more likely to be promiscuously phosphorylated because they are abundant. We tested these predictions by comparing distributions of tyrosine within and among human and yeast orthologous proteins. 3) Selection should be stronger against mutations that create tyrosine versus remove tyrosine. We tested this prediction using human population genomic variation data. We found that all three predicted effects are modest for tyrosine when compared with the other amino acids, suggesting that selection against deleterious phosphorylation was not dominant in driving metazoan tyrosine loss.

Subjects

Molecular evolution, Protein kinases/metabolism, Proteins/chemistry, Proteins/genetics, Tyrosine/genetics, Tyrosine/metabolism, Yeasts/metabolism, Animals, Fungal proteins/chemistry, Fungal proteins/genetics, Fungal proteins/metabolism, Gene frequency, Fungal genome, Human genome, Humans, Mutation, Phosphorylation, Genetic selection, Yeasts/genetics
