Results 1 - 20 of 28
1.
bioRxiv ; 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38948697

ABSTRACT

Natural selection on complex traits is difficult to study in part due to the ascertainment inherent to genome-wide association studies (GWAS). The power to detect a trait-associated variant in GWAS is a function of frequency and effect size - but for traits under selection, the effect size of a variant determines the strength of selection against it, constraining its frequency. To account for GWAS ascertainment, we propose studying the joint distribution of allele frequencies across populations, conditional on the frequencies in the GWAS cohort. Before considering these conditional frequency spectra, we first characterized the impact of selection and non-equilibrium demography on allele frequency dynamics forwards and backwards in time. We then used these results to understand conditional frequency spectra under realistic human demography. Finally, we investigated empirical conditional frequency spectra for GWAS variants associated with 106 complex traits, finding compelling evidence for either stabilizing or purifying selection. Our results provide insight into polygenic score portability and other properties of variants ascertained with GWAS, highlighting the utility of conditional frequency spectra.
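
The dependence of GWAS power on frequency and effect size mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' method: it assumes an additive variant tested with a two-sided z-test on a trait standardized to unit variance, so that the test statistic has mean roughly |beta| * sqrt(2 * N * p * (1 - p)). The function name `gwas_power`, the threshold 5e-8, and the example values are illustrative assumptions.

```python
import math
from statistics import NormalDist


def gwas_power(n, freq, beta, alpha=5e-8):
    """Approximate power to detect one additive variant in a GWAS.

    Assumptions (not from the abstract): two-sided z-test, trait scaled to
    unit variance, and an effect small enough that residual variance is ~1.
    n     : number of individuals in the cohort
    freq  : allele frequency in the cohort
    beta  : per-allele effect in trait standard deviations
    alpha : significance threshold
    """
    std_normal = NormalDist()
    # Expected magnitude of the z-statistic under the alternative.
    expected_z = abs(beta) * math.sqrt(2.0 * n * freq * (1.0 - freq))
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the statistic lands in either rejection tail.
    return std_normal.cdf(expected_z - z_crit) + std_normal.cdf(-expected_z - z_crit)


# Same effect size, different frequencies: the rarer variant is far harder to detect.
for freq in (0.01, 0.05, 0.3):
    print(freq, gwas_power(n=100_000, freq=freq, beta=0.05))
```

Under these assumptions, a large-effect variant held at low frequency by selection can be as hard to detect as a common variant of small effect, which is the ascertainment issue the abstract describes.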

2.
Nat Genet ; 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38977852

ABSTRACT

Measures of selective constraint on genes have been used for many applications, including clinical interpretation of rare coding variants, disease gene discovery and studies of genome evolution. However, widely used metrics are severely underpowered at detecting constraints for the shortest ~25% of genes, potentially causing important pathogenic mutations to be overlooked. Here we developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease and other phenotypes, especially for short genes. Our estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve the estimation of many gene-level properties, such as rare variant burden or gene expression differences.

3.
bioRxiv ; 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-39005431

ABSTRACT

Gene regulatory networks (GRNs) govern many core developmental and biological processes underlying human complex traits. Even with broad-scale efforts to characterize the effects of molecular perturbations and interpret gene coexpression, it remains challenging to infer the architecture of gene regulation in a precise and efficient manner. Key properties of GRNs, like hierarchical structure, modular organization, and sparsity, provide both challenges and opportunities for this objective. Here, we seek to better understand properties of GRNs using a new approach to simulate their structure and model their function. We produce realistic network structures with a novel generating algorithm based on insights from small-world network theory, and we model gene expression regulation using stochastic differential equations formulated to accommodate modeling molecular perturbations. With these tools, we systematically describe the effects of gene knockouts within and across GRNs, finding a subset of networks that recapitulate features of a recent genome-scale perturbation study. With deeper analysis of these exemplar networks, we consider future avenues to map the architecture of gene expression regulation using data from cells in perturbed and unperturbed states, finding that while perturbation data are critical to discover specific regulatory interactions, data from unperturbed cells may be sufficient to reveal regulatory programs.

4.
Ann Appl Stat ; 18(1): 858-881, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38784669

ABSTRACT

In scientific studies involving analyses of multivariate data, basic but important questions often arise for the researcher: Is the sample exchangeable, meaning that the joint distribution of the sample is invariant to the ordering of the units? Are the features independent of one another, or perhaps the features can be grouped so that the groups are mutually independent? In statistical genomics, these considerations are fundamental to downstream tasks such as demographic inference and the construction of polygenic risk scores. We propose a non-parametric approach, which we call the V test, to address these two questions, namely, a test of sample exchangeability given dependency structure of features, and a test of feature independence given sample exchangeability. Our test is conceptually simple, yet fast and flexible. It controls the Type I error across realistic scenarios, and handles data of arbitrary dimensions by leveraging large-sample asymptotics. Through extensive simulations and a comparison against unsupervised tests of stratification based on random matrix theory, we find that our test compares favorably in various scenarios of interest. We apply the test to data from the 1000 Genomes Project, demonstrating how it can be employed to assess exchangeability of the genetic sample, or find optimal linkage disequilibrium (LD) splits for downstream analysis. For exchangeability assessment, we find that removing rare variants can substantially increase the p-value of the test statistic. For optimal LD splitting, the V test reports different optimal splits than previous approaches not relying on hypothesis testing. Software for our methods is available in R (CRAN: flintyR) and Python (PyPI: flintyPy).

5.
Elife ; 13, 2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38288729

ABSTRACT

Ancient DNA research in the past decade has revealed that European population structure changed dramatically in the prehistoric period (14,000-3000 years before present, YBP), reflecting the widespread introduction of Neolithic farmer and Bronze Age Steppe ancestries. However, little is known about how population structure changed from the historical period onward (3000 YBP - present). To address this, we collected whole genomes from 204 individuals from Europe and the Mediterranean, many of which are the first historical period genomes from their region (e.g. Armenia and France). We found that most regions show remarkable inter-individual heterogeneity. At least 7% of historical individuals carry ancestry uncommon in the region where they were sampled, some indicating cross-Mediterranean contacts. Despite this high level of mobility, overall population structure across western Eurasia is relatively stable through the historical period up to the present, mirroring geography. We show that, under standard population genetics models with local panmixia, the observed level of dispersal would lead to a collapse of population structure. Persistent population structure thus suggests a lower effective migration rate than indicated by the observed dispersal. We hypothesize that this phenomenon can be explained by extensive transient dispersal arising from drastically improved transportation networks and the Roman Empire's mobilization of people for trade, labor, and military. This work highlights the utility of ancient DNA in elucidating finer scale human population dynamics in recent history.


Subjects
DNA, Ancient , Genome, Human , Humans , Europe , France , Genetics, Population , Population Dynamics , Human Migration
6.
bioRxiv ; 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-37292653

ABSTRACT

Measures of selective constraint on genes have been used for many applications including clinical interpretation of rare coding variants, disease gene discovery, and studies of genome evolution. However, widely-used metrics are severely underpowered at detecting constraint for the shortest ~25% of genes, potentially causing important pathogenic mutations to be overlooked. We developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease, and other phenotypes, especially for short genes. Our new estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve estimation of many gene-level properties, such as rare variant burden or gene expression differences.

7.
bioRxiv ; 2023 Oct 03.
Article in English | MEDLINE | ID: mdl-37873127

ABSTRACT

Epigenetic regulation orchestrates mammalian transcription, but functional links between them remain elusive. To tackle this problem, we here use epigenomic and transcriptomic data from 13 ENCODE cell types to train machine learning models to predict gene expression from histone post-translational modifications (PTMs), achieving transcriptome-wide correlations of ~ 0.70 - 0.79 for most samples. In addition to recapitulating known associations between histone PTMs and expression patterns, our models predict that acetylation of histone subunit H3 lysine residue 27 (H3K27ac) near the transcription start site (TSS) significantly increases expression levels. To validate this prediction experimentally and investigate how engineered vs. natural deposition of H3K27ac might differentially affect expression, we apply the synthetic dCas9-p300 histone acetyltransferase system to 8 genes in the HEK293T cell line. Further, to facilitate model building, we perform MNase-seq to map genome-wide nucleosome occupancy levels in HEK293T. We observe that our models perform well in accurately ranking relative fold changes among genes in response to the dCas9-p300 system; however, their ability to rank fold changes within individual genes is noticeably diminished compared to predicting expression across cell types from their native epigenetic signatures. Our findings highlight the need for more comprehensive genome-scale epigenome editing datasets, better understanding of the actual modifications made by epigenome editing tools, and improved causal models that transfer better from endogenous cellular measurements to perturbation experiments. Together these improvements would facilitate the ability to understand and predictably control the dynamic human epigenome with consequences for human health.

8.
Nat Genet ; 55(11): 1866-1875, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37857933

ABSTRACT

Most signals in genome-wide association studies (GWAS) of complex traits implicate noncoding genetic variants with putative gene regulatory effects. However, currently identified regulatory variants, notably expression quantitative trait loci (eQTLs), explain only a small fraction of GWAS signals. Here, we show that GWAS and cis-eQTL hits are systematically different: eQTLs cluster strongly near transcription start sites, whereas GWAS hits do not. Genes near GWAS hits are enriched in key functional annotations, are under strong selective constraint and have complex regulatory landscapes across different tissue/cell types, whereas genes near eQTLs are depleted of most functional annotations, show relaxed constraint, and have simpler regulatory landscapes. We describe a model to understand these observations, including how natural selection on complex traits hinders discovery of functionally relevant eQTLs. Our results imply that GWAS and eQTL studies are systematically biased toward different types of variant, and support the use of complementary functional approaches alongside the next generation of eQTL studies.


Subjects
Genome-Wide Association Study , Multifactorial Inheritance , Gene Expression Regulation/genetics , Quantitative Trait Loci/genetics , Gene Expression , Polymorphism, Single Nucleotide/genetics
9.
Genetics ; 225(3), 2023 11 01.
Article in English | MEDLINE | ID: mdl-37724741

ABSTRACT

The discrete-time Wright-Fisher (DTWF) model and its diffusion limit are central to population genetics. These models can describe the forward-in-time evolution of allele frequencies in a population resulting from genetic drift, mutation, and selection. Computing likelihoods under the diffusion process is feasible, but the diffusion approximation breaks down for large samples or in the presence of strong selection. Existing methods for computing likelihoods under the DTWF model do not scale to current exome sequencing sample sizes in the hundreds of thousands. Here, we present a scalable algorithm that approximates the DTWF model with provably bounded error. Our approach relies on two key observations about the DTWF model. The first is that transition probabilities under the model are approximately sparse. The second is that transition distributions for similar starting allele frequencies are extremely close as distributions. Together, these observations enable approximate matrix-vector multiplication in linear (as opposed to the usual quadratic) time. We prove similar properties for Hypergeometric distributions, enabling fast computation of likelihoods for subsamples of the population. We show theoretically and in practice that this approximation is highly accurate and can scale to population sizes in the tens of millions, paving the way for rigorous biobank-scale inference. Finally, we use our results to estimate the impact of larger samples on estimating selection coefficients for loss-of-function variants. We find that increasing sample sizes beyond existing large exome sequencing cohorts will provide essentially no additional information except for genes with the most extreme fitness effects.


Subjects
Biological Specimen Banks , Genetics, Population , Gene Frequency , Genetic Drift , Probability , Models, Genetic , Selection, Genetic
10.
Nat Ecol Evol ; 7(9): 1515-1524, 2023 09.
Article in English | MEDLINE | ID: mdl-37592021

ABSTRACT

The Iron Age was a dynamic period in central Mediterranean history, with the expansion of Greek and Phoenician colonies and the growth of Carthage into the dominant maritime power of the Mediterranean. These events were facilitated by the ease of long-distance travel following major advances in seafaring. We know from the archaeological record that trade goods and materials were moving across great distances in unprecedented quantities, but it is unclear how these patterns correlate with human mobility. Here, to investigate population mobility and interactions directly, we sequenced the genomes of 30 ancient individuals from coastal cities around the central Mediterranean, in Tunisia, Sardinia and central Italy. We observe a meaningful contribution of autochthonous populations, as well as highly heterogeneous ancestry including many individuals with non-local ancestries from other parts of the Mediterranean region. These results highlight both the role of local populations and the extreme interconnectedness of populations in the Iron Age Mediterranean. By studying these trans-Mediterranean neighbours together, we explore the complex interplay between local continuity and mobility that shaped the Iron Age societies of the central Mediterranean.


Subjects
DNA, Ancient , Human Migration , Mediterranean Region , Archaeology , Human Migration/history , Humans , Principal Component Analysis , Human Genetics , DNA, Ancient/analysis , Sequence Analysis, DNA , Burial , Anthropology , History, Ancient
11.
Res Sq ; 2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37398424

ABSTRACT

Measures of selective constraint on genes have been used for many applications including clinical interpretation of rare coding variants, disease gene discovery, and studies of genome evolution. However, widely-used metrics are severely underpowered at detecting constraint for the shortest ~25% of genes, potentially causing important pathogenic mutations to be overlooked. We developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease, and other phenotypes, especially for short genes. Our new estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve estimation of many gene-level properties, such as rare variant burden or gene expression differences.

12.
Elife ; 12, 2023 06 21.
Article in English | MEDLINE | ID: mdl-37342968

ABSTRACT

Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed the best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. 
These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone.


Subjects
Genome , Software , Computer Simulation , Genetics, Population , Genomics
13.
bioRxiv ; 2023 May 22.
Article in English | MEDLINE | ID: mdl-37293115

ABSTRACT

The Discrete-Time Wright-Fisher (DTWF) model and its large population diffusion limit are central to population genetics. These models describe the forward-in-time evolution of the frequency of an allele in a population and can include the fundamental forces of genetic drift, mutation, and selection. Computing likelihoods under the diffusion process is feasible, but the diffusion approximation breaks down for large sample sizes or in the presence of strong selection. Unfortunately, existing methods for computing likelihoods under the DTWF model do not scale to current exome sequencing sample sizes in the hundreds of thousands. Here we present an algorithm that approximates the DTWF model with provably bounded error and runs in time linear in the size of the population. Our approach relies on two key observations about Binomial distributions. The first is that Binomial distributions are approximately sparse. The second is that Binomial distributions with similar success probabilities are extremely close as distributions, allowing us to approximate the DTWF Markov transition matrix as a very low rank matrix. Together, these observations enable matrix-vector multiplication in linear (as opposed to the usual quadratic) time. We prove similar properties for Hypergeometric distributions, enabling fast computation of likelihoods for subsamples of the population. We show theoretically and in practice that this approximation is highly accurate and can scale to population sizes in the billions, paving the way for rigorous biobank-scale population genetic inference. Finally, we use our results to estimate how increasing sample sizes will improve the estimation of selection coefficients acting on loss-of-function variants. We find that increasing sample sizes beyond existing large exome sequencing cohorts will provide essentially no additional information except for genes with the most extreme fitness effects.
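
The first observation in this abstract, that the Binomial rows of the DTWF transition matrix are approximately sparse, can be seen in a small Python sketch. This is an illustration only, not the authors' algorithm: it builds a single exact transition row for a toy population and counts how many entries are non-negligible. The helper names, the haploid selection model, the absence of mutation, and the tolerance `1e-12` are all assumptions made for the example.

```python
import math


def binom_pmf(n, k, p):
    """Binomial(n, p) probability of k successes, evaluated in log space."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def dtwf_row(n, i, s=0.0):
    """One row of a DTWF transition matrix for n haploid individuals.

    Given i copies of the allele now, the count in the next generation is
    Binomial(n, p). Here p is the frequency after (assumed) haploid selection
    with coefficient s against the allele; mutation is ignored.
    """
    x = i / n
    p = x * (1.0 - s) / (1.0 - s * x)
    return [binom_pmf(n, k, p) for k in range(n + 1)]


def effective_support(row, tol=1e-12):
    """Number of entries in the row that exceed the tolerance."""
    return sum(1 for value in row if value > tol)


n = 2000
row = dtwf_row(n, i=100)
print("row sums to", sum(row))
print("entries above tolerance:", effective_support(row), "of", n + 1)
```

Because only a narrow band of entries around the expected count carries appreciable mass, a matrix-vector product can skip most of each row, which is the intuition behind the linear-time claim.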

14.
Genome Biol ; 24(1): 79, 2023 04 18.
Article in English | MEDLINE | ID: mdl-37072822

ABSTRACT

A promising alternative to comprehensively performing genomics experiments is to, instead, perform a subset of experiments and use computational methods to impute the remainder. However, identifying the best imputation methods and what measures meaningfully evaluate performance are open questions. We address these questions by comprehensively analyzing 23 methods from the ENCODE Imputation Challenge. We find that imputation evaluations are challenging and confounded by distributional shifts from differences in data collection and processing over time, the amount of available data, and redundancy among performance measures. Our analyses suggest simple steps for overcoming these issues and promising directions for more robust research.


Subjects
Algorithms , Epigenomics , Genomics/methods
15.
Cell ; 186(5): 923-939.e14, 2023 03 02.
Article in English | MEDLINE | ID: mdl-36868214

ABSTRACT

We conduct high coverage (>30×) whole-genome sequencing of 180 individuals from 12 indigenous African populations. We identify millions of unreported variants, many predicted to be functionally important. We observe that the ancestors of southern African San and central African rainforest hunter-gatherers (RHG) diverged from other populations >200 kya and maintained a large effective population size. We observe evidence for ancient population structure in Africa and for multiple introgression events from "ghost" populations with highly diverged genetic lineages. Although currently geographically isolated, we observe evidence for gene flow between eastern and southern Khoesan-speaking hunter-gatherer populations lasting until ∼12 kya. We identify signatures of local adaptation for traits related to skin color, immune response, height, and metabolic processes. We identify a positively selected variant in the lightly pigmented San that influences pigmentation in vitro by regulating the enhancer activity and gene expression of PDPK1.


Subjects
Acclimatization , Skin Pigmentation , Humans , Whole Genome Sequencing , Population Density , Africa , 3-Phosphoinositide-Dependent Protein Kinases
16.
bioRxiv ; 2023 Dec 04.
Article in English | MEDLINE | ID: mdl-36778251

ABSTRACT

With hundreds of copies of ribosomal DNA (rDNA) it is unknown whether they possess sequence variations that ultimately form different types of ribosomes. Here, we developed an algorithm for variant-calling between paralog genes (termed RGA) and compared rDNA variations with rRNA variations from long-read sequencing of translating ribosomes (RIBO-RT). Our analyses identified dozens of highly abundant rRNA variants, largely indels, that are incorporated into translationally active ribosomes and assemble into distinct ribosome subtypes encoded on different chromosomes. We developed an in-situ rRNA sequencing method (SWITCH-seq) revealing that variants are co-expressed within individual cells and found that they possess different structures. Lastly, we observed tissue-specific rRNA-subtype expression and linked specific rRNA variants to cancer. This study therefore reveals the variation landscape of translating ribosomes within human cells.

17.
Science ; 377(6613): 1431-1435, 2022 09 23.
Article in English | MEDLINE | ID: mdl-36137047

ABSTRACT

Anthropogenic habitat loss and climate change are reducing species' geographic ranges, increasing extinction risk and losses of species' genetic diversity. Although preserving genetic diversity is key to maintaining species' adaptability, we lack predictive tools and global estimates of genetic diversity loss across ecosystems. We introduce a mathematical framework that bridges biodiversity theory and population genetics to understand the loss of naturally occurring DNA mutations with decreasing habitat. By analyzing genomic variation of 10,095 georeferenced individuals from 20 plant and animal species, we show that genome-wide diversity follows a mutations-area relationship power law with geographic area, which can predict genetic diversity loss from local population extinctions. We estimate that more than 10% of genetic diversity may already be lost for many threatened and nonthreatened species, surpassing the United Nations' post-2020 targets for genetic preservation.


Subjects
Anthropogenic Effects , Climate Change , Extinction, Biological , Genetic Variation , Animals , Biodiversity
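
The mutations-area relationship described in this abstract can be sketched in a few lines of Python. The sketch assumes the power law takes the form M = c * A**z, where M is the number of mutations, A is geographic area, and c and z are constants. Under that assumption, the fraction of mutations lost when habitat shrinks depends only on the fraction of area lost and on z. The exponent used below is a hypothetical value chosen for illustration and is not taken from the abstract.

```python
def fraction_diversity_lost(area_lost, z):
    """Fraction of mutations lost when a fraction `area_lost` of habitat is lost.

    Assumes M = c * A**z, so the surviving fraction of mutations is
    (1 - area_lost)**z and the constant c cancels out.
    """
    if not 0.0 <= area_lost <= 1.0:
        raise ValueError("area_lost must be between 0 and 1")
    if z <= 0.0:
        raise ValueError("z must be positive")
    return 1.0 - (1.0 - area_lost) ** z


# Hypothetical exponent, for illustration only.
Z_ILLUSTRATIVE = 0.3

for area_lost in (0.1, 0.5, 0.9):
    lost = fraction_diversity_lost(area_lost, Z_ILLUSTRATIVE)
    print(f"area lost {area_lost:.0%} -> diversity lost {lost:.1%}")
```

With an exponent below 1, diversity declines more slowly than area at first and then falls steeply as the remaining habitat approaches zero.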
18.
Am J Hum Genet ; 109(7): 1286-1297, 2022 07 07.
Article in English | MEDLINE | ID: mdl-35716666

ABSTRACT

Despite the growing number of genome-wide association studies (GWASs), it remains unclear to what extent gene-by-gene and gene-by-environment interactions influence complex traits in humans. The magnitude of genetic interactions in complex traits has been difficult to quantify because GWASs are generally underpowered to detect individual interactions of small effect. Here, we develop a method to test for genetic interactions that aggregates information across all trait-associated loci. Specifically, we test whether SNPs in regions of European ancestry shared between European American and admixed African American individuals have the same causal effect sizes. We hypothesize that in African Americans, the presence of genetic interactions will drive the causal effect sizes of SNPs in regions of European ancestry to be more similar to those of SNPs in regions of African ancestry. We apply our method to two traits: gene expression in 296 African Americans and 482 European Americans in the Multi-Ethnic Study of Atherosclerosis (MESA) and low-density lipoprotein cholesterol (LDL-C) in 74K African Americans and 296K European Americans in the Million Veteran Program (MVP). We find significant evidence for genetic interactions in our analysis of gene expression; for LDL-C, we observe a similar point estimate, although this is not significant, most likely due to lower statistical power. These results suggest that gene-by-gene or gene-by-environment interactions modify the effect sizes of causal variants in human complex traits.


Subjects
Genome-Wide Association Study , Multifactorial Inheritance , Cholesterol, LDL , Gene Expression , Humans , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics , White People/genetics
19.
Nat Genet ; 53(6): 830-839, 2021 06.
Article in English | MEDLINE | ID: mdl-33821002

ABSTRACT

Evidence from model organisms and clinical genetics suggests coordination between the developing brain and face, but the role of this link in common genetic variation remains unknown. We performed a multivariate genome-wide association study of cortical surface morphology in 19,644 individuals of European ancestry, identifying 472 genomic loci influencing brain shape, of which 76 are also linked to face shape. Shared loci include transcription factors involved in craniofacial development, as well as members of signaling pathways implicated in brain-face cross-talk. Brain shape heritability is equivalently enriched near regulatory regions active in either forebrain organoids or facial progenitors. However, we do not detect significant overlap between shared brain-face genome-wide association study signals and variants affecting behavioral-cognitive traits. These results suggest that early in embryogenesis, the face and brain mutually shape each other through both structural effects and paracrine signaling, but this interplay may not impact later brain development associated with cognitive function.


Subjects
Brain/anatomy & histology , Face/anatomy & histology , Inheritance Patterns/genetics , Adult , Aged , Behavior , Cognition , Female , Genetic Loci , Genome-Wide Association Study , Humans , Magnetic Resonance Imaging , Male , Mental Disorders/genetics , Middle Aged , Multivariate Analysis
20.
Sci Adv ; 5(10): eaaw9206, 2019 10.
Article in English | MEDLINE | ID: mdl-31681842

ABSTRACT

Fine-scale rates of meiotic recombination vary by orders of magnitude across the genome and differ between species and even populations. Studying cross-population differences has been stymied by the confounding effects of demographic history. To address this problem, we developed a demography-aware method to infer fine-scale recombination rates and applied it to 26 diverse human populations, inferring population-specific recombination maps. These maps recapitulate many aspects of the history of these populations including signatures of the trans-Atlantic slave trade and the Iberian colonization of the Americas. We also investigated modulators of the local recombination rate, finding further evidence that Polycomb group proteins and the trimethylation of H3K27 elevate recombination rates. Further differences in the recombination landscape across the genome and between populations are driven by variation in the gene that encodes the DNA binding protein PRDM9, and we quantify the weak effect of meiotic drive acting to remove its binding sites.


Subjects
Chromosome Mapping , Genetics, Population , Recombination, Genetic , Binding Sites , Chromatin/metabolism , Computer Simulation , Demography , Gene Conversion , Histone-Lysine N-Methyltransferase/genetics , Humans