Results 1 - 14 of 14
1.
Am J Hum Genet ; 109(7): 1286-1297, 2022 07 07.
Article in English | MEDLINE | ID: mdl-35716666

ABSTRACT

Despite the growing number of genome-wide association studies (GWASs), it remains unclear to what extent gene-by-gene and gene-by-environment interactions influence complex traits in humans. The magnitude of genetic interactions in complex traits has been difficult to quantify because GWASs are generally underpowered to detect individual interactions of small effect. Here, we develop a method to test for genetic interactions that aggregates information across all trait-associated loci. Specifically, we test whether SNPs in regions of European ancestry shared between European American and admixed African American individuals have the same causal effect sizes. We hypothesize that in African Americans, the presence of genetic interactions will drive the causal effect sizes of SNPs in regions of European ancestry to be more similar to those of SNPs in regions of African ancestry. We apply our method to two traits: gene expression in 296 African Americans and 482 European Americans in the Multi-Ethnic Study of Atherosclerosis (MESA) and low-density lipoprotein cholesterol (LDL-C) in 74K African Americans and 296K European Americans in the Million Veteran Program (MVP). We find significant evidence for genetic interactions in our analysis of gene expression; for LDL-C, we observe a similar point estimate, although this is not significant, most likely due to lower statistical power. These results suggest that gene-by-gene or gene-by-environment interactions modify the effect sizes of causal variants in human complex traits.


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , LDL Cholesterol , Gene Expression , Humans , Multifactorial Inheritance/genetics , Single Nucleotide Polymorphism/genetics , White People/genetics
2.
Am J Hum Genet ; 108(12): 2354-2367, 2021 12 02.
Article in English | MEDLINE | ID: mdl-34822764

ABSTRACT

Whole-genome sequencing studies applied to large populations or biobanks with extensive phenotyping raise new analytic challenges. The need to consider many variants at a locus or group of genes simultaneously and the potential to study many correlated phenotypes with shared genetic architecture provide opportunities for discovery not addressed by the traditional one variant, one phenotype association study. Here, we introduce a Bayesian model comparison approach called MRP (multiple rare variants and phenotypes) for rare-variant association studies that considers correlation, scale, and direction of genetic effects across a group of genetic variants, phenotypes, and studies, requiring only summary statistic data. We apply our method to exome sequencing data (n = 184,698) across 2,019 traits from the UK Biobank, aggregating signals in genes. MRP demonstrates an ability to recover signals such as associations between PCSK9 and LDL cholesterol levels. We additionally find MRP effective in conducting meta-analyses in exome data. Non-biomarker findings include associations between MC1R and red hair color and skin color, IL17RA and monocyte count, and IQGAP2 and mean platelet volume. Finally, we apply MRP in a multi-phenotype setting; after clustering the 35 biomarker phenotypes based on genetic correlation estimates, we find that joint analysis of these phenotypes results in substantial power gains for gene-trait associations, such as in TNFRSF13B in one of the clusters containing diabetes- and lipid-related traits. Overall, we show that the MRP model comparison approach improves upon useful features from widely used meta-analysis approaches for rare-variant association analyses and prioritizes protective modifiers of disease risk.


Subject(s)
Genetic Variation , Genome-Wide Association Study , Genetic Models , Bayes Theorem , Female , Humans , Male , Phenotype
3.
PLoS Biol ; 15(9): e2002458, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28873088

ABSTRACT

A number of open questions in human evolutionary genetics would become tractable if we were able to directly measure evolutionary fitness. As a step towards this goal, we developed a method to examine whether individual genetic variants, or sets of genetic variants, currently influence viability. The approach consists in testing whether the frequency of an allele varies across ages, accounting for variation in ancestry. We applied it to the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort and to the parents of participants in the UK Biobank. Across the genome, we found only a few common variants with large effects on age-specific mortality: tagging the APOE ε4 allele and near CHRNA3. These results suggest that when large, even late-onset effects are kept at low frequency by purifying selection. Testing viability effects of sets of genetic variants that jointly influence 1 of 42 traits, we detected a number of strong signals. In participants of the UK Biobank of British ancestry, we found that variants that delay puberty timing are associated with a longer parental life span (P ~ 6.2 × 10⁻⁶ for fathers and P ~ 2.0 × 10⁻³ for mothers), consistent with epidemiological studies. Similarly, variants associated with later age at first birth are associated with a longer maternal life span (P ~ 1.4 × 10⁻³). Signals are also observed for variants influencing cholesterol levels, risk of coronary artery disease (CAD), body mass index, as well as risk of asthma. These signals exhibit consistent effects in the GERA cohort and among participants of the UK Biobank of non-British ancestry. We also found marked differences between males and females, most notably at the CHRNA3 locus, and variants associated with risk of CAD and cholesterol levels. Beyond our findings, the analysis serves as a proof of principle for how upcoming biomedical data sets can be used to learn about selection effects in contemporary humans.


Subject(s)
Molecular Evolution , Genetic Fitness , Population Genetics/methods , Genetic Models , Genetic Selection , Cohort Studies , Female , Gene Frequency , Genetic Variation , Humans , Male
4.
Proc Natl Acad Sci U S A ; 114(21): 5455-5460, 2017 05 23.
Article in English | MEDLINE | ID: mdl-28490503

ABSTRACT

SNARE proteins are the core of the cell's fusion machinery and mediate virtually all known intracellular membrane fusion reactions on which exocytosis and trafficking depend. Fusion is catalyzed when vesicle-associated v-SNAREs form trans-SNARE complexes ("SNAREpins") with target membrane-associated t-SNAREs, a zippering-like process releasing ∼65 kT per SNAREpin. Fusion requires several SNAREpins, but how they cooperate is unknown and reports of the number required vary widely. To capture the collective behavior on the long timescales of fusion, we developed a highly coarse-grained model that retains key biophysical SNARE properties such as the zippering energy landscape and the surface charge distribution. In simulations the ∼65-kT zippering energy was almost entirely dissipated, with fully assembled SNARE motifs but uncomplexed linker domains. The SNAREpins self-organized into a circular cluster at the fusion site, driven by entropic forces that originate in steric-electrostatic interactions among SNAREpins and membranes. Cooperative entropic forces expanded the cluster and pulled the membranes together at the center point with high force. We find that there is no critical number of SNAREs required for fusion, but instead the fusion rate increases rapidly with the number of SNAREpins due to increasing entropic forces. We hypothesize that this principle finds physiological use to boost fusion rates to meet the demanding timescales of neurotransmission, exploiting the large number of v-SNAREs available in synaptic vesicles. Once in an unfettered cluster, we estimate ≥15 SNAREpins are required for fusion within the ∼1-ms timescale of neurotransmitter release.


Subject(s)
Exocytosis , Membrane Fusion , Biological Models , SNARE Proteins/metabolism , Entropy , Monte Carlo Method
5.
bioRxiv ; 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-37292653

ABSTRACT

Measures of selective constraint on genes have been used for many applications including clinical interpretation of rare coding variants, disease gene discovery, and studies of genome evolution. However, widely-used metrics are severely underpowered at detecting constraint for the shortest ~25% of genes, potentially causing important pathogenic mutations to be overlooked. We developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease, and other phenotypes, especially for short genes. Our new estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve estimation of many gene-level properties, such as rare variant burden or gene expression differences.

6.
Nat Genet ; 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38977852

ABSTRACT

Measures of selective constraint on genes have been used for many applications, including clinical interpretation of rare coding variants, disease gene discovery and studies of genome evolution. However, widely used metrics are severely underpowered at detecting constraints for the shortest ~25% of genes, potentially causing important pathogenic mutations to be overlooked. Here we developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease and other phenotypes, especially for short genes. Our estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve the estimation of many gene-level properties, such as rare variant burden or gene expression differences.

7.
bioRxiv ; 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38948697

ABSTRACT

Natural selection on complex traits is difficult to study in part due to the ascertainment inherent to genome-wide association studies (GWAS). The power to detect a trait-associated variant in GWAS is a function of frequency and effect size - but for traits under selection, the effect size of a variant determines the strength of selection against it, constraining its frequency. To account for GWAS ascertainment, we propose studying the joint distribution of allele frequencies across populations, conditional on the frequencies in the GWAS cohort. Before considering these conditional frequency spectra, we first characterized the impact of selection and non-equilibrium demography on allele frequency dynamics forwards and backwards in time. We then used these results to understand conditional frequency spectra under realistic human demography. Finally, we investigated empirical conditional frequency spectra for GWAS variants associated with 106 complex traits, finding compelling evidence for either stabilizing or purifying selection. Our results provide insight into polygenic score portability and other properties of variants ascertained with GWAS, highlighting the utility of conditional frequency spectra.

8.
Res Sq ; 2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37398424

ABSTRACT

Measures of selective constraint on genes have been used for many applications including clinical interpretation of rare coding variants, disease gene discovery, and studies of genome evolution. However, widely-used metrics are severely underpowered at detecting constraint for the shortest ~25% of genes, potentially causing important pathogenic mutations to be overlooked. We developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease, and other phenotypes, especially for short genes. Our new estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve estimation of many gene-level properties, such as rare variant burden or gene expression differences.

9.
Genetics ; 225(3), 2023 11 01.
Article in English | MEDLINE | ID: mdl-37724741

ABSTRACT

The discrete-time Wright-Fisher (DTWF) model and its diffusion limit are central to population genetics. These models can describe the forward-in-time evolution of allele frequencies in a population resulting from genetic drift, mutation, and selection. Computing likelihoods under the diffusion process is feasible, but the diffusion approximation breaks down for large samples or in the presence of strong selection. Existing methods for computing likelihoods under the DTWF model do not scale to current exome sequencing sample sizes in the hundreds of thousands. Here, we present a scalable algorithm that approximates the DTWF model with provably bounded error. Our approach relies on two key observations about the DTWF model. The first is that transition probabilities under the model are approximately sparse. The second is that transition distributions for similar starting allele frequencies are extremely close as distributions. Together, these observations enable approximate matrix-vector multiplication in linear (as opposed to the usual quadratic) time. We prove similar properties for Hypergeometric distributions, enabling fast computation of likelihoods for subsamples of the population. We show theoretically and in practice that this approximation is highly accurate and can scale to population sizes in the tens of millions, paving the way for rigorous biobank-scale inference. Finally, we use our results to estimate the impact of larger samples on estimating selection coefficients for loss-of-function variants. We find that increasing sample sizes beyond existing large exome sequencing cohorts will provide essentially no additional information except for genes with the most extreme fitness effects.


Subject(s)
Biological Specimen Banks , Population Genetics , Gene Frequency , Genetic Drift , Probability , Genetic Models , Genetic Selection
10.
bioRxiv ; 2023 May 22.
Article in English | MEDLINE | ID: mdl-37293115

ABSTRACT

The Discrete-Time Wright-Fisher (DTWF) model and its large population diffusion limit are central to population genetics. These models describe the forward-in-time evolution of the frequency of an allele in a population and can include the fundamental forces of genetic drift, mutation, and selection. Computing likelihoods under the diffusion process is feasible, but the diffusion approximation breaks down for large sample sizes or in the presence of strong selection. Unfortunately, existing methods for computing likelihoods under the DTWF model do not scale to current exome sequencing sample sizes in the hundreds of thousands. Here we present an algorithm that approximates the DTWF model with provably bounded error and runs in time linear in the size of the population. Our approach relies on two key observations about Binomial distributions. The first is that Binomial distributions are approximately sparse. The second is that Binomial distributions with similar success probabilities are extremely close as distributions, allowing us to approximate the DTWF Markov transition matrix as a very low rank matrix. Together, these observations enable matrix-vector multiplication in linear (as opposed to the usual quadratic) time. We prove similar properties for Hypergeometric distributions, enabling fast computation of likelihoods for subsamples of the population. We show theoretically and in practice that this approximation is highly accurate and can scale to population sizes in the billions, paving the way for rigorous biobank-scale population genetic inference. Finally, we use our results to estimate how increasing sample sizes will improve the estimation of selection coefficients acting on loss-of-function variants. We find that increasing sample sizes beyond existing large exome sequencing cohorts will provide essentially no additional information except for genes with the most extreme fitness effects.
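
To make the "usual quadratic" cost mentioned in the abstract concrete, here is a minimal sketch of one exact DTWF generation for a small haploid population, written as a dense Binomial matrix-vector product. It is not the authors' linear-time algorithm. The function names, the haploid setup, and the particular selection and one-way mutation parameterization are assumptions chosen for illustration.

```python
import math


def binom_pmf(n, k, p):
    """Probability of k successes under Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def dtwf_step(dist, s=0.0, mu=0.0):
    """Advance an allele-count distribution by one DTWF generation.

    dist[i] is the probability that i of the n = len(dist) - 1 haploid
    individuals carry the allele. s (selection against the allele) and
    mu (one-way mutation toward the allele) are hypothetical parameters.
    """
    n = len(dist) - 1
    new = [0.0] * (n + 1)
    for i, mass in enumerate(dist):
        if mass == 0.0:
            continue
        x = i / n
        # Expected allele frequency after selection, then mutation.
        x_sel = x * (1.0 - s) / (1.0 - s * x)
        p = x_sel + mu * (1.0 - x_sel)
        # Each starting count spreads over all n + 1 outcomes, so a dense
        # starting distribution costs O(n^2) work per generation.
        for j in range(n + 1):
            new[j] += mass * binom_pmf(n, j, p)
    return new


# Usage: start with 10 copies out of 20 and take one neutral step.
start = [0.0] * 21
start[10] = 1.0
after = dtwf_step(start)
```

The two observations in the abstract (approximate sparsity of each Binomial row, and near-identical rows for similar starting frequencies) are what allow this double loop to be replaced by a much cheaper approximation.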

11.
Nat Genet ; 55(11): 1866-1875, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37857933

ABSTRACT

Most signals in genome-wide association studies (GWAS) of complex traits implicate noncoding genetic variants with putative gene regulatory effects. However, currently identified regulatory variants, notably expression quantitative trait loci (eQTLs), explain only a small fraction of GWAS signals. Here, we show that GWAS and cis-eQTL hits are systematically different: eQTLs cluster strongly near transcription start sites, whereas GWAS hits do not. Genes near GWAS hits are enriched in key functional annotations, are under strong selective constraint and have complex regulatory landscapes across different tissue/cell types, whereas genes near eQTLs are depleted of most functional annotations, show relaxed constraint, and have simpler regulatory landscapes. We describe a model to understand these observations, including how natural selection on complex traits hinders discovery of functionally relevant eQTLs. Our results imply that GWAS and eQTL studies are systematically biased toward different types of variant, and support the use of complementary functional approaches alongside the next generation of eQTL studies.


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Gene Expression Regulation/genetics , Quantitative Trait Loci/genetics , Gene Expression , Single Nucleotide Polymorphism/genetics
12.
Elife ; 9, 2020 01 30.
Article in English | MEDLINE | ID: mdl-31999256

ABSTRACT

Fields as diverse as human genetics and sociology are increasingly using polygenic scores based on genome-wide association studies (GWAS) for phenotypic prediction. However, recent work has shown that polygenic scores have limited portability across groups of different genetic ancestries, restricting the contexts in which they can be used reliably and potentially creating serious inequities in future clinical applications. Using the UK Biobank data, we demonstrate that even within a single ancestry group (i.e., when there are negligible differences in linkage disequilibrium or in causal allele frequencies), the prediction accuracy of polygenic scores can depend on characteristics such as the socio-economic status, age or sex of the individuals in which the GWAS and the prediction were conducted, as well as on the GWAS design. Our findings highlight both the complexities of interpreting polygenic scores and underappreciated obstacles to their broad use.


Complex diseases like cancer and heart disease are caused by the interplay of many factors: the variants of genes we inherit, the lifestyles we lead and the environments we inhabit, plus the interaction of all these factors. In fact, almost every trait, even how many years we will spend studying, is influenced both by our environment and our genes. To identify some of the genetic factors at play, scientists perform analyses known as genome-wide association studies, or GWAS for short. In these studies, the genomes from many different people are scanned to look for genetic differences associated with differences in traits. By summing up all the small genetic differences, so-called "polygenic scores" can be calculated. When there is a large genetic component to a trait, polygenic scores can be useful predictive tools. But there is a catch: polygenic scores make less accurate predictions for individuals of a different ancestry than those involved in the GWAS, which limits the use of these tools around the world. Mostafavi, Harpak et al. set out to understand if there are other factors in addition to ancestry that could influence the performance of polygenic scores. Using data from the UK Biobank, an international health resource that pairs genomic data and clinical information, Mostafavi, Harpak et al. examined polygenic scores among individuals that share a single, common ancestry. These polygenic scores were used to predict three traits (blood pressure, body mass index and educational attainment) in individuals and the predictions were then compared to the actual trait values to see how accurate they were. The analysis revealed that even within a group of people with similar ancestry, the accuracy of polygenic scores can vary, depending on characteristics such as the sex, age or socioeconomic status of the individuals. This analysis emphasises how variable GWAS and their predictive value can be even within seemingly similar population groups. 
It further highlights both the complexities of interpreting polygenic scores and underappreciated obstacles to their broad use in medical and social sciences.
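
The summary above describes a polygenic score as a sum of many small genetic effects, with accuracy judged by comparing predictions to observed trait values within groups. A minimal sketch of that bookkeeping follows. The function names, the use of squared Pearson correlation as the accuracy measure, and the toy numbers are assumptions for illustration, not the study's actual pipeline.

```python
def polygenic_score(dosages, effects):
    """Weighted sum of allele dosages (0, 1 or 2) by per-variant effect estimates."""
    return sum(g * b for g, b in zip(dosages, effects))


def r_squared(pred, obs):
    """Squared Pearson correlation between predictions and observations."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov * cov / (var_p * var_o)


def accuracy_by_group(scores, phenotypes, groups):
    """Prediction accuracy computed separately within each group label."""
    buckets = {}
    for s, y, g in zip(scores, phenotypes, groups):
        preds, obs = buckets.setdefault(g, ([], []))
        preds.append(s)
        obs.append(y)
    return {g: r_squared(preds, obs) for g, (preds, obs) in buckets.items()}


# Usage with made-up values: the same scores predict well in group "a"
# and poorly in group "b".
scores = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
phenotypes = [1.0, 2.0, 3.0, 1.0, 3.0, 2.0]
groups = ["a", "a", "a", "b", "b", "b"]
result = accuracy_by_group(scores, phenotypes, groups)
```

Splitting the accuracy calculation by a label such as sex, age band, or socioeconomic stratum is the kind of comparison the abstract reports, here reduced to its simplest form.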


Subject(s)
Population Genetics/methods , Genome-Wide Association Study/methods , Multifactorial Inheritance/genetics , Adult , Age Factors , Aged , Female , Gene Frequency/genetics , Humans , Male , Middle Aged , Single Nucleotide Polymorphism/genetics , Sex Factors , Socioeconomic Factors , United Kingdom
13.
Nat Genet ; 51(5): 772-776, 2019 05.
Article in English | MEDLINE | ID: mdl-30962618

ABSTRACT

In numerous applications, from working with animal models to mapping the genetic basis of human disease susceptibility, knowing whether a single disrupting mutation in a gene is likely to be deleterious is useful. With this goal in mind, a number of measures have been developed to identify genes in which protein-truncating variants (PTVs), or other types of mutations, are absent or kept at very low frequency in large population samples, that is, genes that appear 'intolerant' to mutation. One measure in particular, the probability of being loss-of-function intolerant (pLI), has been widely adopted. This measure was designed to classify genes into three categories, null, recessive and haploinsufficient, on the basis of the contrast between observed and expected numbers of PTVs. Such population-genetic approaches can be useful in many applications. As we clarify, however, they reflect the strength of selection acting on heterozygotes and not dominance or haploinsufficiency.


Subject(s)
Mutation , Animals , Gene Frequency , Recessive Genes , Genetic Drift , Population Genetics , Haploinsufficiency , Heterozygote , Humans , Loss of Function Mutation , Genetic Models , Genetic Selection
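
The claim in item 13 that observed-versus-expected PTV counts reflect selection on heterozygotes can be illustrated with the standard mutation-selection balance approximation. This is a textbook sketch under stated assumptions, not a formula quoted from the abstract. The symbols u (per-gene PTV mutation rate), q (equilibrium cumulative PTV frequency), n (number of sampled individuals) and s_het (selection coefficient against heterozygotes) are notation introduced here, and the approximation assumes deterministic dynamics with selection on heterozygotes strong enough that homozygotes can be ignored.

```latex
% Assumed setting: deterministic mutation-selection balance, selection acting on heterozygotes.
% u: per-gene PTV mutation rate, q: equilibrium PTV frequency, n: sampled individuals.
q \approx \frac{u}{s_{\mathrm{het}}},
\qquad
\mathbb{E}[\text{observed PTV copies}] \approx 2n\,q = \frac{2n\,u}{s_{\mathrm{het}}}
```

Under this approximation, a deficit of observed PTVs relative to the neutral expectation depends on s_het alone, which is why such measures cannot by themselves distinguish dominance or haploinsufficiency.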
14.
Elife ; 8, 2019 03 21.
Article in English | MEDLINE | ID: mdl-30895923

ABSTRACT

Several recent papers have reported strong signals of selection on European polygenic height scores. These analyses used height effect estimates from the GIANT consortium and replication studies. Here, we describe a new analysis based on the UK Biobank (UKB), a large, independent dataset. We find that the signals of selection using UKB effect estimates are strongly attenuated or absent. We also provide evidence that previous analyses were confounded by population stratification. Therefore, the conclusion of strong polygenic adaptation now lacks support. Moreover, these discrepancies highlight (1) that methods for correcting for population stratification in GWAS may not always be sufficient for polygenic trait analyses, and (2) that claims of differences in polygenic scores between populations should be treated with caution until these issues are better understood. Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).


Subject(s)
Biological Adaptation , Body Height , Multifactorial Inheritance , Genetic Selection , Biostatistics , Factual Databases , Europe , Humans