Results 1 - 20 of 23
1.
Nat Genet ; 55(6): 952-963, 2023 06.
Article in English | MEDLINE | ID: mdl-37231098

ABSTRACT

We explored ancestry-related differences in the genetic architecture of whole-blood gene expression using whole-genome and RNA sequencing data from 2,733 African Americans, Puerto Ricans and Mexican Americans. We found that heritability of gene expression significantly increased with greater proportions of African genetic ancestry and decreased with higher proportions of Indigenous American ancestry, reflecting the relationship between heterozygosity and genetic variance. Among heritable protein-coding genes, the prevalence of ancestry-specific expression quantitative trait loci (anc-eQTLs) was 30% in African ancestry and 8% for Indigenous American ancestry segments. Most anc-eQTLs (89%) were driven by population differences in allele frequency. Transcriptome-wide association analyses of multi-ancestry summary statistics for 28 traits identified 79% more gene-trait associations using transcriptome prediction models trained in our admixed population than models trained using data from the Genotype-Tissue Expression project. Our study highlights the importance of measuring gene expression across large and ancestrally diverse populations for enabling new discoveries and reducing disparities.


Subject(s)
Black or African American , Hispanic or Latino , Mexican Americans , Humans , Black or African American/genetics , Genome-Wide Association Study , Hispanic or Latino/genetics , Mexican Americans/genetics , Phenotype , Polymorphism, Single Nucleotide , Transcriptome
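The relationship between heterozygosity and genetic variance noted in the abstract above can be illustrated with the standard single-locus result: under Hardy-Weinberg equilibrium, a biallelic variant with allele frequency p and additive effect β contributes 2p(1 − p)β² to the genetic variance, where 2p(1 − p) is the expected heterozygosity. The following is a minimal Python sketch of that textbook formula, not the authors' analysis; the function name and the example frequencies are hypothetical.

```python
def additive_variance(p: float, beta: float) -> float:
    """Additive genetic variance contributed by one biallelic locus,
    assuming Hardy-Weinberg equilibrium (textbook result, not the paper's code)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    heterozygosity = 2.0 * p * (1.0 - p)  # expected heterozygosity at the locus
    return heterozygosity * beta ** 2


# Hypothetical allele frequencies with the same per-allele effect size
for freq in (0.05, 0.20, 0.50):
    print(freq, additive_variance(freq, beta=1.0))
```

For a fixed effect size, the contribution grows as the allele frequency approaches 0.5, which is one way greater heterozygosity can translate into greater genetic variance.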
2.
Science ; 378(6621): 754-761, 2022 11 18.
Article in English | MEDLINE | ID: mdl-36395242

ABSTRACT

The observation of genetic correlations between disparate human traits has been interpreted as evidence of widespread pleiotropy. Here, we introduce cross-trait assortative mating (xAM) as an alternative explanation. We observe that xAM affects many phenotypes and that phenotypic cross-mate correlation estimates are strongly associated with genetic correlation estimates (R² = 74%). We demonstrate that existing xAM plausibly accounts for substantial fractions of genetic correlation estimates and that previously reported genetic correlation estimates between some pairs of psychiatric disorders are congruent with xAM alone. Finally, we provide evidence for a history of xAM at the genetic level using cross-trait even/odd chromosome polygenic score correlations. Together, our results demonstrate that previous reports have likely overestimated the true genetic similarity between many phenotypes.


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Humans , Cell Communication , Phenotype
4.
Nat Commun ; 13(1): 1632, 2022 03 28.
Article in English | MEDLINE | ID: mdl-35347136

ABSTRACT

To identify genetic determinants of airway dysfunction, we performed a transcriptome-wide association study for asthma by combining RNA-seq data from the nasal airway epithelium of 681 children, with UK Biobank genetic association data. Our airway analysis identified 95 asthma genes, 58 of which were not identified by transcriptome-wide association analyses using other asthma-relevant tissues. Among these genes were MUC5AC, an airway mucin, and FOXA3, a transcriptional driver of mucus metaplasia. Muco-ciliary epithelial cultures from genotyped donors revealed that the MUC5AC risk variant increases MUC5AC protein secretion and mucus secretory cell frequency. Airway transcriptome-wide association analyses for mucus production and chronic cough also identified MUC5AC. These cis-expression variants were associated with trans effects on expression; the MUC5AC variant was associated with upregulation of non-inflammatory mucus secretory network genes, while the FOXA3 variant was associated with upregulation of type-2 inflammation-induced mucus-metaplasia pathway genes. Our results reveal genetic mechanisms of airway mucus pathobiology.


Subject(s)
Asthma , Transcriptome , Asthma/genetics , Asthma/metabolism , Child , Epithelium/metabolism , Humans , Metaplasia/metabolism , Mucin 5AC/genetics , Mucin 5AC/metabolism , Mucus/metabolism
5.
Front Genet ; 12: 673167, 2021.
Article in English | MEDLINE | ID: mdl-34108994

ABSTRACT

Genome-wide association studies (GWAS) are primarily conducted in single-ancestry settings. The low transferability of results has limited our understanding of human genetic architecture across a range of complex traits. In contrast to homogeneous populations, admixed populations provide an opportunity to capture genetic architecture contributed from multiple source populations and thus improve statistical power. Here, we provide a mechanistic simulation framework to investigate the statistical power and transferability of GWAS under directional polygenic selection or varying divergence. We focus on a two-way admixed population and show that GWAS in admixed populations can be enriched for power in discovery by up to 2-fold compared to the ancestral populations under similar sample size. Moreover, higher accuracy of cross-population polygenic score estimates is also observed if variants and weights are trained in the admixed group rather than in the ancestral groups. Common variant associations are also more likely to replicate if first discovered in the admixed group and then transferred to an ancestral population, than the other way around (across 50 iterations with 1,000 causal SNPs, training on 10,000 individuals, testing on 1,000 in each population, p = 3.78e-6, 6.19e-101, ∼0 for FST = 0.2, 0.5, 0.8, respectively). While some of these FST values may appear extreme, we demonstrate that they are found across the entire phenome in the GWAS catalog. This framework demonstrates that investigation of admixed populations harbors significant advantages over GWAS in single-ancestry cohorts for uncovering the genetic architecture of traits and will improve downstream applications such as personalized medicine across diverse populations.

6.
Cell ; 184(8): 2068-2083.e11, 2021 04 15.
Article in English | MEDLINE | ID: mdl-33861964

ABSTRACT

Understanding population health disparities is an essential component of equitable precision health efforts. Epidemiology research often relies on definitions of race and ethnicity, but these population labels may not adequately capture disease burdens and environmental factors impacting specific sub-populations. Here, we propose a framework for repurposing data from electronic health records (EHRs) in concert with genomic data to explore the demographic ties that can impact disease burdens. Using data from a diverse biobank in New York City, we identified 17 communities sharing recent genetic ancestry. We observed 1,177 health outcomes that were statistically associated with a specific group and demonstrated significant differences in the segregation of genetic variants contributing to Mendelian diseases. We also demonstrated that fine-scale population structure can impact the prediction of complex disease risk within groups. This work reinforces the utility of linking genomic data to EHRs and provides a framework toward fine-scale monitoring of population health.


Subject(s)
Ethnicity/genetics , Population Health , Databases, Genetic , Electronic Health Records , Genomics , Humans , Self Report
7.
Am J Hum Genet ; 108(2): 219-239, 2021 02 04.
Article in English | MEDLINE | ID: mdl-33440170

ABSTRACT

We present a full-likelihood method to infer polygenic adaptation from DNA sequence variation and GWAS summary statistics to quantify recent transient directional selection acting on a complex trait. Through simulations of polygenic trait architecture evolution and GWASs, we show the method substantially improves power over current methods. We examine the robustness of the method under stratification, uncertainty and bias in marginal effects, uncertainty in the causal SNPs, allelic heterogeneity, negative selection, and low GWAS sample size. The method can quantify selection acting on correlated traits, controlling for pleiotropy even among traits with strong genetic correlation (|r_g| = 80%) while retaining high power to attribute selection to the causal trait. When the causal trait is excluded from analysis, selection is attributed to its closest proxy. We discuss limitations of the method, cautioning against strongly causal interpretations of the results, and the possibility of undetectable gene-by-environment (GxE) interactions. We apply the method to 56 human polygenic traits, revealing signals of directional selection on pigmentation, life history, glycated hemoglobin (HbA1c), and other traits. We also conduct joint testing of 137 pairs of genetically correlated traits, revealing widespread correlated response acting on these traits (2.6-fold enrichment, p = 1.5 × 10⁻⁷). Signs of selection on some traits previously reported as adaptive (e.g., educational attainment and hair color) are largely attributable to correlated response (p = 2.9 × 10⁻⁶ and 1.7 × 10⁻⁴, respectively). Lastly, our joint test shows antagonistic selection has increased type 2 diabetes risk and decreased HbA1c (p = 1.5 × 10⁻⁵).


Subject(s)
Genome, Human , Multifactorial Inheritance , Selection, Genetic , Computer Simulation , Diabetes Mellitus, Type 2/genetics , Evolution, Molecular , Gene-Environment Interaction , Genetic Heterogeneity , Genetic Pleiotropy , Genome-Wide Association Study , Glycated Hemoglobin/genetics , Humans , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide , Sample Size
9.
Cell Rep ; 31(1): 107489, 2020 04 07.
Article in English | MEDLINE | ID: mdl-32268104

ABSTRACT

Gene expression levels vary across developmental stage, cell type, and region in the brain. Genomic variants also contribute to the variation in expression, and some neuropsychiatric disorder loci may exert their effects through this mechanism. To investigate these relationships, we present BrainVar, a unique resource of paired whole-genome and bulk tissue RNA sequencing from the dorsolateral prefrontal cortex of 176 individuals across prenatal and postnatal development. Here we identify common variants that alter gene expression (expression quantitative trait loci [eQTLs]) constantly across development or predominantly during prenatal or postnatal stages. Both "constant" and "temporal-predominant" eQTLs are enriched for loci associated with neuropsychiatric traits and disorders and colocalize with specific variants. Expression levels of more than 12,000 genes rise or fall in a concerted late-fetal transition, with the transitional genes enriched for cell-type-specific genes and neuropsychiatric risk loci, underscoring the importance of cataloging developmental trajectories in understanding cortical physiology and pathology.


Subject(s)
Brain/embryology , Computational Biology/methods , Prefrontal Cortex/metabolism , Base Sequence/genetics , Brain/growth & development , Brain/metabolism , Databases, Genetic , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Genome-Wide Association Study/methods , Genomics/methods , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Sequence Analysis, RNA/methods , Transcriptome/genetics , Exome Sequencing/methods , Whole Genome Sequencing/methods
10.
Evol Lett ; 3(1): 69-79, 2019 Feb.
Article in English | MEDLINE | ID: mdl-30788143

ABSTRACT

Selection and mutation shape the genetic variation underlying human traits, but the specific evolutionary mechanisms driving complex trait variation are largely unknown. We developed a statistical method that uses polarized genome-wide association study (GWAS) summary statistics from a single population to detect signals of mutational bias and selection. We found evidence for nonneutral signals on variation underlying several traits (body mass index [BMI], schizophrenia, Crohn's disease, educational attainment, and height). We then used simulations that incorporate simultaneous negative and positive selection to show that these signals are consistent with mutational bias and shifts in the fitness-phenotype relationship, but not stabilizing selection or mutational bias alone. We additionally replicate two of our top three signals (BMI and educational attainment) in an external cohort, and show that population stratification may have confounded GWAS summary statistics for height in the GIANT cohort. Our results provide a flexible and powerful framework for evolutionary analysis of complex phenotypes in humans and other species, and offer insights into the evolutionary mechanisms driving variation in human polygenic traits.

11.
Am J Hum Genet ; 100(1): 31-39, 2017 Jan 05.
Article in English | MEDLINE | ID: mdl-28017371

ABSTRACT

Mixed models have become the tool of choice for genetic association studies; however, standard mixed model methods may be poorly calibrated or underpowered under family sampling bias and/or case-control ascertainment. Previously, we introduced a liability threshold-based mixed model association statistic (LTMLM) to address case-control ascertainment in unrelated samples. Here, we consider family-biased case-control ascertainment, where case and control subjects are ascertained non-randomly with respect to family relatedness. Previous work has shown that this type of ascertainment can severely bias heritability estimates; we show here that it also impacts mixed model association statistics. We introduce a family-based association statistic (LT-Fam) that is robust to this problem. Similar to LTMLM, LT-Fam is computed from posterior mean liabilities (PML) under a liability threshold model; however, LT-Fam uses published narrow-sense heritability estimates to avoid the problem of biased heritability estimation, enabling correct calibration. In simulations with family-biased case-control ascertainment, LT-Fam was correctly calibrated (average χ² = 1.00-1.02 for null SNPs), whereas the Armitage trend test (ATT), standard mixed model association (MLM), and case-control retrospective association test (CARAT) were mis-calibrated (e.g., average χ² = 0.50-1.22 for MLM, 0.89-2.65 for CARAT). LT-Fam also attained higher power than other methods in some settings. In 1,259 type 2 diabetes-affected case subjects and 5,765 control subjects from the CARe cohort, downsampled to induce family-biased ascertainment, LT-Fam was correctly calibrated whereas ATT, MLM, and CARAT were again mis-calibrated. Our results highlight the importance of modeling family sampling bias in case-control datasets with related samples.


Subject(s)
Family , Genetic Association Studies/methods , Models, Genetic , Bias , Calibration , Diabetes Mellitus, Type 2/genetics , Genotype , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics , Retrospective Studies
12.
Genome Res ; 26(7): 863-73, 2016 07.
Article in English | MEDLINE | ID: mdl-27197206

ABSTRACT

The role of rare alleles in complex phenotypes has been hotly debated, but most rare variant association tests (RVATs) do not account for the evolutionary forces that affect genetic architecture. Here, we use simulation and numerical algorithms to show that explosive population growth, as experienced by human populations, can dramatically increase the impact of very rare alleles on trait variance. We then assess the ability of RVATs to detect causal loci using simulations and human RNA-seq data. Surprisingly, we find that statistical performance is worst for phenotypes in which genetic variance is due mainly to rare alleles, and explosive population growth decreases power. Although many studies have attempted to identify causal rare variants, few have reported novel associations. This has sometimes been interpreted to mean that rare variants make negligible contributions to complex trait heritability. Our work shows that RVATs are not robust to realistic human evolutionary forces, so general conclusions about the impact of rare variants on complex traits may be premature.


Subject(s)
Evolution, Molecular , Models, Genetic , Alleles , Chromosomes, Human/genetics , Genetic Variation , Genetics, Medical , Humans , Phenotype , Population Growth , White People/genetics
13.
Genome Res ; 25(7): 927-36, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25953952

ABSTRACT

Genomic imprinting is an important regulatory mechanism that silences one of the parental copies of a gene. To systematically characterize this phenomenon, we analyze tissue specificity of imprinting from allelic expression data in 1582 primary tissue samples from 178 individuals from the Genotype-Tissue Expression (GTEx) project. We characterize imprinting in 42 genes, including both novel and previously identified genes. Tissue specificity of imprinting is widespread, and gender-specific effects are revealed in a small number of genes in muscle with stronger imprinting in males. IGF2 shows maternal expression in the brain instead of the canonical paternal expression elsewhere. Imprinting appears to have only a subtle impact on tissue-specific expression levels, with genes lacking a systematic expression difference between tissues with imprinted and biallelic expression. In summary, our systematic characterization of imprinting in adult tissues highlights variation in imprinting between genes, individuals, and tissues.


Subject(s)
Genomic Imprinting , Genomics , Adult , Alleles , Cluster Analysis , DNA Methylation , Databases, Nucleic Acid , Female , Gene Expression Regulation , Genetic Variation , Genotype , Humans , Male , Organ Specificity/genetics , Polymorphism, Single Nucleotide , Reproducibility of Results , Sex Factors
14.
Am J Hum Genet ; 96(5): 720-30, 2015 May 07.
Article in English | MEDLINE | ID: mdl-25892111

ABSTRACT

We introduce a liability-threshold mixed linear model (LTMLM) association statistic for case-control studies and show that it has a well-controlled false-positive rate and more power than existing mixed-model methods for diseases with low prevalence. Existing mixed-model methods suffer a loss in power under case-control ascertainment, but no solution has been proposed. Here, we solve this problem by using a χ² score statistic computed from posterior mean liabilities (PMLs) under the liability-threshold model. Each individual's PML is conditional not only on that individual's case-control status but also on every individual's case-control status and the genetic relationship matrix (GRM) obtained from the data. The PMLs are estimated with a multivariate Gibbs sampler; the liability-scale phenotypic covariance matrix is based on the GRM, and a heritability parameter is estimated via Haseman-Elston regression on case-control phenotypes and then transformed to the liability scale. In simulations of unrelated individuals, the LTMLM statistic was correctly calibrated and achieved higher power than existing mixed-model methods for diseases with low prevalence, and the magnitude of the improvement depended on sample size and severity of case-control ascertainment. In a Wellcome Trust Case Control Consortium 2 multiple sclerosis dataset with >10,000 samples, LTMLM was correctly calibrated and attained a 4.3% improvement (p = 0.005) in χ² statistics over existing mixed-model methods at 75 known associated SNPs, consistent with simulations. Larger increases in power are expected at larger sample sizes. In conclusion, case-control studies of diseases with low prevalence can achieve power higher than that in existing mixed-model methods.


Subject(s)
Genetic Association Studies , Models, Genetic , Models, Theoretical , Case-Control Studies , Chromosome Mapping , Computer Simulation , Humans , Multiple Sclerosis/genetics , Multiple Sclerosis/pathology , Phenotype , Polymorphism, Single Nucleotide , Sample Size
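The abstract above mentions transforming a heritability estimate from the observed case-control scale to the liability scale. The following is a minimal sketch of one widely used transformation for ascertained samples; it is an assumption that this particular formula matches the one the authors applied, since the abstract does not state it. The function and parameter names are hypothetical.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs: float, prevalence: float, case_fraction: float) -> float:
    """Convert observed-scale heritability to the liability scale.

    prevalence    -- population prevalence K of the disease (assumed known)
    case_fraction -- proportion P of cases in the ascertained sample
    This is a sketch of a common correction, not necessarily the paper's exact procedure.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)  # liability threshold t
    z = std_normal.pdf(threshold)                     # normal density at t
    k, p = prevalence, case_fraction
    return h2_obs * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


# Hypothetical inputs: 1% prevalence, balanced case-control sample
print(observed_to_liability(0.2, prevalence=0.01, case_fraction=0.5))
```

When the sample case fraction equals the population prevalence (no ascertainment), the expression reduces to h2_obs · K(1 − K) / z².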
15.
Bioinformatics ; 31(15): 2497-504, 2015 Aug 01.
Article in English | MEDLINE | ID: mdl-25819081

ABSTRACT

MOTIVATION: RNA sequencing enables allele-specific expression (ASE) studies that complement standard genotype expression studies for common variants and, importantly, also allow measuring the regulatory impact of rare variants. The Genotype-Tissue Expression (GTEx) project is collecting RNA-seq data on multiple tissues from the same set of individuals, and novel methods are required for the analysis of these data. RESULTS: We present a statistical method to compare different patterns of ASE across tissues and to classify genetic variants according to their impact on the tissue-wide expression profile. We focus on strong ASE effects that we expect to see for protein-truncating variants, but our method can also be adjusted for other types of ASE effects. We illustrate the method with a real data example on a tissue-wide expression profile of a variant causal for lipoid proteinosis, and with a simulation study to assess our method more generally.


Subject(s)
Extracellular Matrix Proteins/metabolism , High-Throughput Nucleotide Sequencing/methods , Lipoid Proteinosis of Urbach and Wiethe/metabolism , Polymorphism, Single Nucleotide/genetics , RNA/analysis , Alleles , Extracellular Matrix Proteins/genetics , Humans , Lipoid Proteinosis of Urbach and Wiethe/genetics , Organ Specificity , Protein Isoforms , RNA/genetics
16.
Nat Genet ; 46(2): 100-6, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24473328

ABSTRACT

Mixed linear models are emerging as a method of choice for conducting genetic association studies in humans and other organisms. The advantages of the mixed-linear-model association (MLMA) method include the prevention of false positive associations due to population or relatedness structure and an increase in power obtained through the application of a correction that is specific to this structure. An underappreciated point is that MLMA can also increase power in studies without sample structure by implicitly conditioning on associated loci other than the candidate locus. Numerous variations on the standard MLMA approach have recently been published, with a focus on reducing computational cost. These advances provide researchers applying MLMA methods with many options to choose from, but we caution that MLMA methods are still subject to potential pitfalls. Here we describe and quantify the advantages and pitfalls of MLMA methods as a function of study design and provide recommendations for the application of these methods in practical settings.


Subject(s)
Genetic Association Studies/methods , Linear Models , Research Design , Colitis, Ulcerative/genetics , Computer Simulation , Genetic Association Studies/statistics & numerical data , Humans , Multiple Sclerosis/genetics
18.
Nat Rev Genet ; 11(7): 459-63, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20548291

ABSTRACT

Genome-wide association (GWA) studies are an effective approach for identifying genetic variants associated with disease risk. GWA studies can be confounded by population stratification (systematic ancestry differences between cases and controls), which has previously been addressed by methods that infer genetic ancestry. Those methods perform well in data sets in which population structure is the only kind of structure present but are inadequate in data sets that also contain family structure or cryptic relatedness. Here, we review recent progress on methods that correct for stratification while accounting for these additional complexities.


Subject(s)
Genome-Wide Association Study/methods , Models, Genetic , Computer Simulation , Humans , Polymorphism, Single Nucleotide
19.
J Comput Biol ; 17(3): 547-60, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20377463

ABSTRACT

Genome-wide association studies have proven to be a highly successful method for identification of genetic loci for complex phenotypes in both humans and model organisms. These large-scale studies rely on the collection of hundreds of thousands of single nucleotide polymorphisms (SNPs) across the genome. Standard high-throughput genotyping technologies capture only a fraction of the total genetic variation. Recent efforts have shown that it is possible to "impute" with high accuracy the genotypes of SNPs that are not collected in the study, provided that they are present in a reference data set which contains both SNPs collected in the study as well as other SNPs. We here introduce a novel HMM-based technique to solve the imputation problem that addresses several shortcomings of existing methods. First, our method is adaptive, which lets it estimate population genetic parameters from the data and be applied to model organisms that have very different evolutionary histories. Compared to previous methods, our method is up to ten times more accurate on model organisms such as mouse. Second, our algorithm scales in memory usage in the number of collected markers as opposed to the number of known SNPs. This issue is very relevant due to the size of the reference data sets currently being generated. We compare our method over mouse and human data sets to existing methods, and show that each has either comparable or better performance and much lower memory usage. The method is available for download at http://genetics.cs.ucla.edu/eminim.


Subject(s)
Algorithms , Haplotypes/genetics , Animals , Case-Control Studies , Diploidy , Humans , Markov Chains , Mice , Mice, Inbred Strains , Models, Genetic , Polymorphism, Single Nucleotide/genetics
20.
Nat Genet ; 42(4): 348-54, 2010 Apr.
Article in English | MEDLINE | ID: mdl-20208533

ABSTRACT

Although genome-wide association studies (GWASs) have identified numerous loci associated with complex traits, imprecise modeling of the genetic relatedness within study samples may cause substantial inflation of test statistics and possibly spurious associations. Variance component approaches, such as efficient mixed-model association (EMMA), can correct for a wide range of sample structures by explicitly accounting for pairwise relatedness between individuals, using high-density markers to model the phenotype distribution; but such approaches are computationally impractical. We report here a variance component approach implemented in publicly available software, EMMA eXpedited (EMMAX), that reduces the computational time for analyzing large GWAS data sets from years to hours. We apply this method to two human GWAS data sets, performing association analysis for ten quantitative traits from the Northern Finland Birth Cohort and seven common diseases from the Wellcome Trust Case Control Consortium. We find that EMMAX outperforms both principal component analysis and genomic control in correcting for sample structure.


Subject(s)
Genome-Wide Association Study , Models, Statistical , Population Groups/genetics , Humans , Models, Genetic , Polymorphism, Single Nucleotide , Principal Component Analysis , Quantitative Trait Loci , Software