Results 1 - 20 of 132
1.
Cell ; 184(8): 2068-2083.e11, 2021 04 15.
Article in English | MEDLINE | ID: mdl-33861964

ABSTRACT

Understanding population health disparities is an essential component of equitable precision health efforts. Epidemiology research often relies on definitions of race and ethnicity, but these population labels may not adequately capture disease burdens and environmental factors impacting specific sub-populations. Here, we propose a framework for repurposing data from electronic health records (EHRs) in concert with genomic data to explore the demographic ties that can impact disease burdens. Using data from a diverse biobank in New York City, we identified 17 communities sharing recent genetic ancestry. We observed 1,177 health outcomes that were statistically associated with a specific group and demonstrated significant differences in the segregation of genetic variants contributing to Mendelian diseases. We also demonstrated that fine-scale population structure can impact the prediction of complex disease risk within groups. This work reinforces the utility of linking genomic data to EHRs and provides a framework toward fine-scale monitoring of population health.


Subject(s)
Ethnicity/genetics , Population Health , Genetic Databases , Electronic Health Records , Genomics , Humans , Self Report
2.
Am J Hum Genet ; 111(7): 1462-1480, 2024 Jul 11.
Article in English | MEDLINE | ID: mdl-38866020

ABSTRACT

Understanding the contribution of gene-environment interactions (GxE) to complex trait variation can provide insights into disease mechanisms, explain sources of heritability, and improve genetic risk prediction. While large biobanks with genetic and deep phenotypic data hold promise for obtaining novel insights into GxE, our understanding of GxE architecture in complex traits remains limited. We introduce a method to estimate the proportion of trait variance explained by GxE (GxE heritability) and additive genetic effects (additive heritability) across the genome and within specific genomic annotations. We show that our method is accurate in simulations and computationally efficient for biobank-scale datasets. We applied our method to common array SNPs (MAF ≥1%), fifty quantitative traits, and four environmental variables (smoking, sex, age, and statin usage) in unrelated white British individuals in the UK Biobank. We found 68 trait-E pairs with significant genome-wide GxE heritability (p<0.05/200) with a ratio of GxE to additive heritability of ≈6.8% on average. Analyzing ≈8 million imputed SNPs (MAF ≥0.1%), we documented an approximate 28% increase in genome-wide GxE heritability compared to array SNPs. We partitioned GxE heritability across minor allele frequency (MAF) and local linkage disequilibrium (LD) values, revealing that, like additive allelic effects, GxE allelic effects tend to increase with decreasing MAF and LD. Analyzing GxE heritability near genes highly expressed in specific tissues, we find significant brain-specific enrichment for body mass index (BMI) and basal metabolic rate in the context of smoking and adipose-specific enrichment for waist-hip ratio (WHR) in the context of sex.


Subject(s)
Gene-Environment Interaction , Genome-Wide Association Study , Multifactorial Inheritance , Single Nucleotide Polymorphism , Humans , Multifactorial Inheritance/genetics , Male , Female , Heritable Quantitative Trait , Phenotype , Genetic Models , Quantitative Trait Loci
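
The significance threshold quoted in the abstract of entry 2 (p<0.05/200) reads as a Bonferroni correction over 50 traits × 4 environmental variables = 200 trait-E pairs. The sketch below only reproduces that arithmetic; the variable and function names are illustrative assumptions and are not taken from the authors' software.

```python
# Illustrative arithmetic only: a Bonferroni threshold for the
# 50 traits x 4 environments described in the abstract.
N_TRAITS = 50          # quantitative traits analysed
N_ENVIRONMENTS = 4     # smoking, sex, age, statin usage
ALPHA = 0.05           # family-wise error rate

n_tests = N_TRAITS * N_ENVIRONMENTS   # number of trait-E pairs
threshold = ALPHA / n_tests           # per-test significance threshold


def is_significant(p_value: float) -> bool:
    """Return True if a trait-E pair passes the Bonferroni threshold."""
    return p_value < threshold


if __name__ == "__main__":
    print(f"{n_tests} tests, per-test threshold = {threshold:.2e}")
```

Under this reading, a trait-E pair is reported as significant only when its p-value falls below 2.5 × 10⁻⁴.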
3.
Am J Hum Genet ; 111(2): 242-258, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38211585

ABSTRACT

Tumor mutational burden (TMB), the total number of somatic mutations in the tumor, and copy number burden (CNB), the corresponding measure of aneuploidy, are established fundamental somatic features and emerging biomarkers for immunotherapy. However, the genetic and non-genetic influences on TMB/CNB and, critically, the manner by which they influence patient outcomes remain poorly understood. Here, we present a large germline-somatic study of TMB/CNB with >23,000 individuals across 17 cancer types, of which 12,000 also have extensive clinical, treatment, and overall survival (OS) measurements available. We report dozens of clinical associations with TMB/CNB, observing older age and male sex to have a strong effect on TMB and weaker impact on CNB. We additionally identified significant germline influences on TMB/CNB, including fine-scale European ancestry and germline polygenic risk scores (PRSs) for smoking, tanning, white blood cell counts, and educational attainment. We quantify the causal effect of exposures on somatic mutational processes using Mendelian randomization. Many of the identified features associated with TMB/CNB were additionally associated with OS for individuals treated at a single tertiary cancer center. For individuals receiving immunotherapy, we observed a complex relationship between PRSs for educational attainment, self-reported college attainment, TMB, and survival, suggesting that the influence of this biomarker may be substantially modified by socioeconomic status. While the accumulation of somatic alterations is a stochastic process, our work demonstrates that it can be shaped by host characteristics including germline genetics.


Subject(s)
Neoplasms , Humans , Male , Mutation/genetics , Neoplasms/genetics , Neoplasms/pathology , Immunotherapy , Tumor Biomarkers/genetics , Germ Cells/pathology
4.
Am J Hum Genet ; 111(2): 323-337, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38306997

ABSTRACT

Genome-wide association studies (GWASs) have uncovered susceptibility loci associated with psychiatric disorders such as bipolar disorder (BP) and schizophrenia (SCZ). However, most of these loci are in non-coding regions of the genome, and the causal mechanisms of the link between genetic variation and disease risk are unknown. Expression quantitative trait locus (eQTL) analysis of bulk tissue is a common approach used for deciphering underlying mechanisms, although this can obscure cell-type-specific signals and thus mask trait-relevant mechanisms. Although single-cell sequencing can be prohibitively expensive in large cohorts, computationally inferred cell-type proportions and cell-type gene expression estimates have the potential to overcome these problems and advance mechanistic studies. Using bulk RNA-seq from 1,730 samples derived from whole blood in a cohort ascertained from individuals with BP and SCZ, this study estimated cell-type proportions and their relation with disease status and medication. For each cell type, we found between 2,875 and 4,629 eGenes (genes with an associated eQTL), including 1,211 that are not found on the basis of bulk expression alone. We performed a colocalization test between cell-type eQTLs and various traits and identified hundreds of associations that occur between cell-type eQTLs and GWASs but that are not detected in bulk eQTLs. Finally, we investigated the effects of lithium use on the regulation of cell-type expression loci and found examples of genes that are differentially regulated according to lithium use. Our study suggests that applying computational methods to large bulk RNA-seq datasets of non-brain tissue can identify disease-relevant, cell-type-specific biology of psychiatric disorders and psychiatric medication.


Subject(s)
Genome-Wide Association Study , Lithium , Humans , Genome-Wide Association Study/methods , RNA-Seq , Quantitative Trait Loci/genetics , Phenotype , Single Nucleotide Polymorphism , Genetic Predisposition to Disease
5.
Am J Hum Genet ; 110(8): 1319-1329, 2023 08 03.
Article in English | MEDLINE | ID: mdl-37490908

ABSTRACT

Polygenic scores (PGSs) have emerged as a standard approach to predict phenotypes from genotype data in a wide array of applications from socio-genomics to personalized medicine. Traditional PGSs assume genotype data to be error-free, ignoring possible errors and uncertainties introduced from genotyping, sequencing, and/or imputation. In this work, we investigate the effects of genotyping error due to low-coverage sequencing on PGS estimation. We leverage SNP array and low-coverage whole-genome sequencing data (lcWGS, median coverage 0.04×) of 802 individuals from the Dana-Farber PROFILE cohort to show that PGS error correlates with sequencing depth (p = 1.2 × 10⁻⁷). We develop a probabilistic approach that incorporates genotype error in PGS estimation to produce well-calibrated PGS credible intervals and show that the probabilistic approach increases classification accuracy by up to 6% as compared to traditional PGSs that ignore genotyping error. Finally, we use simulations to explore the combined effect of genotyping and effect size errors and their implication on PGS-based risk-stratification. Our results illustrate the importance of considering genotyping error as a source of PGS error, especially for cohorts with varying genotyping technologies and/or low-coverage sequencing.


Subject(s)
Genomics , Single Nucleotide Polymorphism , Uncertainty , Genotype , Genomics/methods , Whole Genome Sequencing , Single Nucleotide Polymorphism/genetics
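
To make the idea in entry 5 concrete, the sketch below shows one simple way genotype uncertainty can be carried into a polygenic score: each SNP contributes its expected dosage, and the dosage variance is propagated to the score. This is not the authors' probabilistic method, which the abstract does not specify. The function names, the input format (per-SNP genotype probabilities), and the assumption that genotype errors are independent across SNPs are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

# Per-SNP genotype probabilities: (P(g=0), P(g=1), P(g=2)) copies of the effect allele.
GenotypeProbs = Tuple[float, float, float]


def dosage_mean_var(probs: GenotypeProbs) -> Tuple[float, float]:
    """Mean and variance of the allele count under the genotype probabilities."""
    _, p1, p2 = probs
    mean = p1 + 2.0 * p2
    second_moment = p1 + 4.0 * p2
    return mean, second_moment - mean * mean


def pgs_mean_var(
    weights: Sequence[float], genotype_probs: Sequence[GenotypeProbs]
) -> Tuple[float, float]:
    """Expected PGS and its variance, ASSUMING independent errors across SNPs."""
    score_mean = 0.0
    score_var = 0.0
    for beta, probs in zip(weights, genotype_probs):
        mean, var = dosage_mean_var(probs)
        score_mean += beta * mean        # expected contribution of this SNP
        score_var += beta * beta * var   # uncertainty contributed by this SNP
    return score_mean, score_var


if __name__ == "__main__":
    # Hypothetical effect sizes and genotype probabilities for three SNPs.
    weights = [0.5, -0.2, 0.1]
    probs = [(0.0, 0.0, 1.0), (0.25, 0.5, 0.25), (0.9, 0.1, 0.0)]
    print(pgs_mean_var(weights, probs))
```

A confidently called genotype contributes no variance, so in this sketch the score's uncertainty grows only with the number and weight of poorly covered SNPs.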
6.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38856173

ABSTRACT

Multivariate analysis is becoming central in studies investigating high-throughput molecular data, yet some important features of these data are seldom explored. Here, we present MANOCCA (Multivariate Analysis of Conditional CovAriance), a powerful method to test for the effect of a predictor on the covariance matrix of a multivariate outcome. The proposed test is by construction orthogonal to tests based on the mean and variance and is able to capture effects that are missed by both approaches. We first compare the performance of MANOCCA with existing correlation-based methods and show that MANOCCA is the only test correctly calibrated in simulations mimicking omics data. We then investigate the impact of reducing the dimensionality of the data using principal component analysis when the sample size is smaller than the number of pairwise covariance terms analysed. We show that, in many realistic scenarios, the maximum power can be achieved with a limited number of components. Finally, we apply MANOCCA to 1000 healthy individuals from the Milieu Interieur cohort to assess the effect of health, lifestyle and genetic factors on the covariance of two sets of phenotypes: blood biomarkers and flow cytometry-based immune phenotypes. Our analyses identify significant associations between multiple factors and the covariance of both omics data.


Subject(s)
Principal Component Analysis , Humans , Multivariate Analysis , Computational Biology/methods , Phenotype , Algorithms , Genomics/methods , Biomarkers/blood , Computer Simulation
7.
Nucleic Acids Res ; 52(11): e50, 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38797520

ABSTRACT

Whole-genome bisulfite sequencing (BS-Seq) measures cytosine methylation changes at single-base resolution and can be used to profile cell-free DNA (cfDNA). In plasma, ultrashort single-stranded cfDNA (uscfDNA, ∼50 nt) has been identified together with 167 bp double-stranded mononucleosomal cell-free DNA (mncfDNA). However, the methylation profile of uscfDNA has not been described. Conventional BS-Seq workflows may not be helpful because bisulfite conversion degrades larger DNA into smaller fragments, leading to erroneous categorization as uscfDNA. We describe the '5mCAdpBS-Seq' workflow in which pre-methylated 5mC (5-methylcytosine) single-stranded adapters are ligated to heat-denatured cfDNA before bisulfite conversion. This method retains only DNA fragments that are unaltered by bisulfite treatment, resulting in less biased uscfDNA methylation analysis. Using 5mCAdpBS-Seq, uscfDNA had lower levels of DNA methylation (∼15%) compared to mncfDNA and was enriched in promoters and CpG islands. Hypomethylated uscfDNA fragments were enriched in upstream transcription start sites (TSSs), and the intensity of enrichment was correlated with expressed genes of hemopoietic cells. Using tissue-of-origin deconvolution, we inferred that uscfDNA is derived primarily from eosinophils, neutrophils, and monocytes. As proof-of-principle, we show that characteristics of the methylation profile of uscfDNA can distinguish non-small cell lung carcinoma from non-cancer samples. The 5mCAdpBS-Seq workflow is recommended for any cfDNA methylation-based investigations.


Subject(s)
5-Methylcytosine , Cell-Free Nucleic Acids , CpG Islands , DNA Methylation , Single-Stranded DNA , Humans , Cell-Free Nucleic Acids/blood , Cell-Free Nucleic Acids/genetics , Single-Stranded DNA/metabolism , Single-Stranded DNA/genetics , Single-Stranded DNA/blood , 5-Methylcytosine/metabolism , Lung Neoplasms/genetics , Lung Neoplasms/blood , Sulfites/chemistry , Genetic Promoter Regions , DNA Sequence Analysis/methods , Whole Genome Sequencing/methods
8.
PLoS Genet ; 18(11): e1010447, 2022 11.
Article in English | MEDLINE | ID: mdl-36342933

ABSTRACT

We introduce the pleiotropic association test (PAT) for joint analysis of multiple traits using genome-wide association study (GWAS) summary statistics. The method utilizes the decomposition of phenotypic covariation into genetic and environmental components to create a likelihood ratio test statistic for each genetic variant. Though PAT does not directly interpret which trait(s) drive the association, a per-trait interpretation of the omnibus p-value is provided through an extension to the meta-analysis framework, m-values. In simulations, we show PAT controls the false positive rate, increases statistical power, and is robust to model misspecifications of genetic effect. Additionally, simulations comparing PAT to three multi-trait methods, HIPO, MTAG, and ASSET, show PAT identified 15.3% more omnibus associations than the next best method. When these associations were interpreted on a per-trait level using m-values, PAT had 37.5% more true per-trait interpretations with a 0.92% false positive assignment rate. When analyzing four traits from the UK Biobank, PAT discovered 22,095 novel variants. Through the m-values interpretation framework, the number of per-trait associations was almost tripled for two traits and nearly doubled for another trait relative to the original single-trait GWAS.


Subject(s)
Genome-Wide Association Study , Single Nucleotide Polymorphism , Genetic Pleiotropy , Genome-Wide Association Study/methods , Phenotype , Single Nucleotide Polymorphism/genetics , Meta-Analysis as Topic
9.
Am J Hum Genet ; 108(2): 219-239, 2021 02 04.
Article in English | MEDLINE | ID: mdl-33440170

ABSTRACT

We present a full-likelihood method to infer polygenic adaptation from DNA sequence variation and GWAS summary statistics to quantify recent transient directional selection acting on a complex trait. Through simulations of polygenic trait architecture evolution and GWASs, we show the method substantially improves power over current methods. We examine the robustness of the method under stratification, uncertainty and bias in marginal effects, uncertainty in the causal SNPs, allelic heterogeneity, negative selection, and low GWAS sample size. The method can quantify selection acting on correlated traits, controlling for pleiotropy even among traits with strong genetic correlation (|r_g| = 80%) while retaining high power to attribute selection to the causal trait. When the causal trait is excluded from analysis, selection is attributed to its closest proxy. We discuss limitations of the method, cautioning against strongly causal interpretations of the results, and the possibility of undetectable gene-by-environment (GxE) interactions. We apply the method to 56 human polygenic traits, revealing signals of directional selection on pigmentation, life history, glycated hemoglobin (HbA1c), and other traits. We also conduct joint testing of 137 pairs of genetically correlated traits, revealing widespread correlated response acting on these traits (2.6-fold enrichment, p = 1.5 × 10⁻⁷). Signs of selection on some traits previously reported as adaptive (e.g., educational attainment and hair color) are largely attributable to correlated response (p = 2.9 × 10⁻⁶ and 1.7 × 10⁻⁴, respectively). Lastly, our joint test shows antagonistic selection has increased type 2 diabetes risk and decreased HbA1c (p = 1.5 × 10⁻⁵).


Subject(s)
Human Genome , Multifactorial Inheritance , Genetic Selection , Computer Simulation , Type 2 Diabetes Mellitus/genetics , Molecular Evolution , Gene-Environment Interaction , Genetic Heterogeneity , Genetic Pleiotropy , Genome-Wide Association Study , Glycated Hemoglobin/genetics , Humans , Genetic Models , Phenotype , Single Nucleotide Polymorphism , Sample Size
10.
Proc Natl Acad Sci U S A ; 118(15)2021 04 13.
Article in English | MEDLINE | ID: mdl-33833052

ABSTRACT

Epistasis, the interaction between genetic variants, is pervasive in model systems and can profoundly impact evolutionary adaptation, population disease dynamics, genetic mapping, and precision medicine efforts. In this work, we develop a model for structured polygenic epistasis, called coordinated epistasis (CE), and prove that several recent theories of genetic architecture fall under the formal umbrella of CE. Unlike standard epistasis models that assume epistasis and main effects are independent, CE captures systematic correlations between epistasis and main effects that result from pathway-level epistasis, on balance skewing the penetrance of genetic effects. To test for the existence of CE, we propose the even-odd (EO) test and prove it is calibrated in a range of realistic biological models. Applying the EO test in the UK Biobank, we find evidence of CE in 18 of 26 traits spanning disease, anthropometric, and blood categories. Finally, we extend the EO test to tissue-specific enrichment and identify several plausible tissue-trait pairs. Overall, CE is a dimension of genetic architecture that can capture structured, systemic forms of epistasis in complex human traits.


Subject(s)
Genetic Epistasis , Genetic Models , Multifactorial Inheritance/genetics , Molecular Evolution , Genetic Predisposition to Disease , Humans , Heritable Quantitative Trait
11.
J Allergy Clin Immunol ; 151(6): 1503-1512, 2023 06.
Article in English | MEDLINE | ID: mdl-36796456

ABSTRACT

BACKGROUND: Albuterol is the drug most widely used as asthma treatment among African Americans despite having a lower bronchodilator drug response (BDR) than other populations. Although BDR is affected by gene and environmental factors, the influence of DNA methylation is unknown. OBJECTIVE: This study aimed to identify epigenetic markers in whole blood associated with BDR, study their functional consequences by multi-omic integration, and assess their clinical applicability in admixed populations with a high asthma burden. METHODS: We studied 414 children and young adults (8-21 years old) with asthma in a discovery and replication design. We performed an epigenome-wide association study on 221 African Americans and replicated the results on 193 Latinos. Functional consequences were assessed by integrating epigenomics with genomics, transcriptomics, and environmental exposure data. Machine learning was used to develop a panel of epigenetic markers to classify treatment response. RESULTS: We identified 5 differentially methylated regions and 2 CpGs genome-wide significantly associated with BDR in African Americans located in FGL2 (cg08241295, P = 6.8 × 10⁻⁹) and DNASE2 (cg15341340, P = 7.8 × 10⁻⁸), which were regulated by genetic variation and/or associated with gene expression of nearby genes (false discovery rate < 0.05). The CpG cg15341340 was replicated in Latinos (P = 3.5 × 10⁻³). Moreover, a panel of 70 CpGs showed good classification for those with response and nonresponse to albuterol therapy in African American and Latino children (area under the receiver operating characteristic curve for training, 0.99; for validation, 0.70-0.71). The DNA methylation model showed similar discrimination as clinical predictors (P > .05). CONCLUSIONS: We report novel associations of epigenetic markers with BDR in pediatric asthma and demonstrate for the first time the applicability of pharmacoepigenetics in precision medicine of respiratory diseases.


Subject(s)
Asthma , Bronchodilator Agents , Child , Young Adult , Humans , Adolescent , Adult , Bronchodilator Agents/therapeutic use , Epigenome , Multiomics , Asthma/drug therapy , Asthma/genetics , Asthma/metabolism , Albuterol/therapeutic use , DNA Methylation , Genome-Wide Association Study , Fibrinogen/metabolism
12.
Annu Rev Genomics Hum Genet ; 21: 413-435, 2020 08 31.
Article in English | MEDLINE | ID: mdl-32873077

ABSTRACT

Disease classification, or nosology, was historically driven by careful examination of clinical features of patients. As technologies to measure and understand human phenotypes advanced, so too did classifications of disease, and the advent of genetic data has led to a surge in genetic subtyping in the past decades. Although the fundamental process of refining disease definitions and subtypes is shared across diverse fields, each field is driven by its own goals and technological expertise, leading to inconsistent and conflicting definitions of disease subtypes. Here, we review several classical and recent subtypes and subtyping approaches and provide concrete definitions to delineate subtypes. In particular, we focus on subtypes with distinct causal disease biology, which are of primary interest to scientists, and subtypes with pragmatic medical benefits, which are of primary interest to physicians. We propose genetic heterogeneity as a gold standard for establishing biologically distinct subtypes of complex polygenic disease. We focus especially on methods to find and validate genetic subtypes, emphasizing common pitfalls and how to avoid them.


Subject(s)
Biomarkers/analysis , Inborn Genetic Diseases/genetics , Genetic Predisposition to Disease , Multifactorial Inheritance , Mutation , Neoplasms/genetics , Neoplastic Gene Expression Regulation , Genetic Association Studies , Inborn Genetic Diseases/classification , Inborn Genetic Diseases/pathology , Humans , Neoplasms/classification , Neoplasms/pathology
13.
Am J Hum Genet ; 106(1): 71-91, 2020 01 02.
Article in English | MEDLINE | ID: mdl-31901249

ABSTRACT

Gene-environment interactions (GxE) can be fundamental in applications ranging from functional genomics to precision medicine and are a conjectured source of substantial heritability. However, unbiased methods to profile GxE genome-wide are nascent and, as we show, cannot accommodate general environment variables, modest sample sizes, heterogeneous noise, and binary traits. To address this gap, we propose a simple, unifying mixed model for gene-environment interaction (GxEMM). In simulations and theory, we show that GxEMM can dramatically improve estimates and eliminate false positives when the assumptions of existing methods fail. We apply GxEMM to a range of human and model organism datasets and find broad evidence of context-specific genetic effects, including GxSex, GxAdversity, and GxDisease interactions across thousands of clinical and molecular phenotypes. Overall, GxEMM is broadly applicable for testing and quantifying polygenic interactions, which can be useful for explaining heritability and invaluable for determining biologically relevant environments.


Subject(s)
Gene-Environment Interaction , Genetic Markers , Mental Disorders/genetics , Mental Disorders/pathology , Genetic Models , Multifactorial Inheritance/genetics , Adult , Animals , Computer Simulation , Female , Genome-Wide Association Study , Humans , Male , Middle Aged , Phenomics , Phenotype , Rats
14.
Graefes Arch Clin Exp Ophthalmol ; 261(8): 2245-2255, 2023 Aug.
Article in English | MEDLINE | ID: mdl-36917316

ABSTRACT

BACKGROUND: This study evaluated the relationship between statin use and the age of onset of age-related macular degeneration (AMD). METHODS: Electronic Health Records from 52,840 patients evaluated at University of California Los Angeles (UCLA) Ophthalmology Clinics and 9,977 patients evaluated at University of California San Francisco (UCSF) Ophthalmology Clinics were screened. Survival analysis was performed using Cox proportional hazards regression models and visualized using Kaplan-Meier survival curves, with the following covariates: sex, ethnicity, smoking history, fluoxetine use, obesity, diabetes mellitus, and hypertension. RESULTS: 5,498 of 52,840 patients at UCLA were diagnosed with AMD. Statin use was associated with a later AMD onset (HR = 0.8823, p < 0.0001), while female sex (HR = 1.0852, p = 00,035), obesity (HR = 1.4555, p < 0.0001), and fluoxetine (HR = 1.3797, p = 0.0003) were associated with an earlier AMD onset. Non-Hispanic Black (HR = 0.5687, p < 0.0001) and Hispanic ethnicities (HR = 0.8269, p = 0.0028) were associated with a later AMD onset. When stratifying for ethnicity, statins, fluoxetine, sex, and obesity were significant only within non-Hispanic White subjects. Statin use was significant among patients with dry AMD (HR = 0.8410, p = 0.0001) but not wet AMD (HR = 0.9188, p = 0.0351). In the replication cohort, 526 of 9,977 patients at UCSF had AMD. Associations between statins (HR = 0.7643, p = 0.0033), non-Hispanic Black ethnicity (HR = 0.5043, p = 0.0035), and obesity (HR = 1.9602, p < 0.0001) and AMD onset were confirmed. CONCLUSIONS: In both cohorts, statin use and non-Hispanic Black ethnicity are associated with a later AMD onset, while obesity is associated with an earlier AMD onset.


Subject(s)
Hydroxymethylglutaryl-CoA Reductase Inhibitors , Macular Degeneration , Humans , Female , Retrospective Studies , Age of Onset , Fluoxetine , Risk Factors , Obesity
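
The hazard ratios (HRs) reported in entry 14 come from Cox proportional hazards models, in which a covariate's HR is the exponential of its regression coefficient. The sketch below converts a few of the reported HRs into log-hazard coefficients and percent changes in hazard. The helper names are illustrative assumptions, and the interpretation holds only under the proportional hazards assumption. It does not refit the authors' model.

```python
import math


def hr_to_coefficient(hazard_ratio: float) -> float:
    """Cox regression coefficient (log hazard ratio) for a given HR."""
    return math.log(hazard_ratio)


def percent_change_in_hazard(hazard_ratio: float) -> float:
    """Relative change in hazard versus the reference group, in percent."""
    return (hazard_ratio - 1.0) * 100.0


if __name__ == "__main__":
    # HRs copied from the UCLA cohort results in the abstract.
    reported = {"statin use": 0.8823, "obesity": 1.4555, "fluoxetine": 1.3797}
    for covariate, hr in reported.items():
        beta = hr_to_coefficient(hr)
        change = percent_change_in_hazard(hr)
        print(f"{covariate}: beta = {beta:+.4f}, hazard change = {change:+.2f}%")
```

An HR below 1 (statin use) corresponds to a negative coefficient and a lower hazard of AMD onset at any given age, which matches the later onset the abstract describes.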
15.
PLoS Genet ; 16(10): e1009165, 2020 10.
Article in English | MEDLINE | ID: mdl-33104702

ABSTRACT

BACKGROUND: The majority of quantitative genetic models used to map complex traits assume that alleles have similar effects across all individuals. Significant evidence suggests, however, that epistatic interactions modulate the impact of many alleles. Nevertheless, identifying epistatic interactions remains computationally and statistically challenging. In this work, we address some of these challenges by developing a statistical test for polygenic epistasis that determines whether the effect of an allele is altered by the global genetic ancestry proportion from distinct progenitors. RESULTS: We applied our method to data from mice and yeast. For the mice, we observed 49 significant genotype-by-ancestry interaction associations across 14 phenotypes as well as over 1,400 Bonferroni-corrected genotype-by-ancestry interaction associations for mouse gene expression data. For the yeast, we observed 92 significant genotype-by-ancestry interactions across 38 phenotypes. Given this evidence of epistasis, we test for and observe evidence of rapid selection pressure on ancestry specific polymorphisms within one of the cohorts, consistent with epistatic selection. CONCLUSIONS: Unlike our prior work in human populations, we observe widespread evidence of ancestry-modified SNP effects, perhaps reflecting the greater divergence present in crosses using mice and yeast.


Subject(s)
Genetic Epistasis , Molecular Evolution , Multifactorial Inheritance/genetics , Genetic Selection/genetics , Alleles , Animals , Genotype , Humans , Mice , Genetic Models , Phenotype , Quantitative Trait Loci/genetics , Saccharomyces cerevisiae/genetics
16.
PLoS Genet ; 16(8): e1008927, 2020 08.
Article in English | MEDLINE | ID: mdl-32797036

ABSTRACT

The genetic control of gene expression is a core component of human physiology. For the past several years, transcriptome-wide association studies have leveraged large datasets of linked genotype and RNA sequencing information to create a powerful gene-based test of association that has been used in dozens of studies. While numerous discoveries have been made, the populations in the training data are overwhelmingly of European descent, and little is known about the generalizability of these models to other populations. Here, we test for cross-population generalizability of gene expression prediction models using a dataset of African American individuals with RNA-Seq data in whole blood. We find that the default models trained in large datasets such as GTEx and DGN fare poorly in African Americans, with a notable reduction in prediction accuracy when compared to European Americans. We replicate these limitations in cross-population generalizability using the five populations in the GEUVADIS dataset. Via realistic simulations of both populations and gene expression, we show that accurate cross-population generalizability of transcriptome prediction only arises when eQTL architecture is substantially shared across populations. In contrast, models with non-identical eQTLs showed patterns similar to real-world data. Therefore, generating RNA-Seq data in diverse populations is a critical step towards multi-ethnic utility of gene expression prediction.


Subject(s)
Black or African American/genetics , Genome-Wide Association Study/methods , Genetic Models , Transcriptome , Gene Expression Profiling/methods , Gene Expression Profiling/standards , Genome-Wide Association Study/standards , Humans , Quantitative Trait Loci , RNA-Seq/methods , RNA-Seq/standards , Reference Standards
17.
PLoS Genet ; 15(4): e1008009, 2019 04.
Article in English | MEDLINE | ID: mdl-30951530

ABSTRACT

Recent and classical work has revealed biologically and medically significant subtypes in complex diseases and traits. However, relevant subtypes are often unknown, unmeasured, or actively debated, making automated statistical approaches to subtype definition valuable. We propose reverse GWAS (RGWAS) to identify and validate subtypes using genetics and multiple traits: while GWAS seeks the genetic basis of a given trait, RGWAS seeks to define trait subtypes with distinct genetic bases. Unlike existing approaches relying on off-the-shelf clustering methods, RGWAS uses a novel decomposition, MFMR, to model covariates, binary traits, and population structure. We use extensive simulations to show that modelling these features can be crucial for power and calibration. We validate RGWAS in practice by recovering a recently discovered stress subtype in major depression. We then show the utility of RGWAS by identifying three novel subtypes of metabolic traits. We biologically validate these metabolic subtypes with SNP-level tests and a novel polygenic test: the former recover known metabolic GxE SNPs; the latter suggests subtypes may explain substantial missing heritability. Crucially, statins, which are widely prescribed and theorized to increase diabetes risk, have opposing effects on blood glucose across metabolic subtypes, suggesting the subtypes have potential translational value.


Subject(s)
Genome-Wide Association Study/methods , Genetic Models , Multifactorial Inheritance , Phenotype , Algorithms , Blood Glucose/drug effects , Blood Glucose/genetics , Cluster Analysis , Computer Simulation , Coronary Disease/blood , Coronary Disease/drug therapy , Coronary Disease/genetics , Major Depressive Disorder/classification , Major Depressive Disorder/genetics , Type 2 Diabetes Mellitus/blood , Type 2 Diabetes Mellitus/drug therapy , Type 2 Diabetes Mellitus/genetics , Genome-Wide Association Study/statistics & numerical data , Humans , Hydroxymethylglutaryl-CoA Reductase Inhibitors/pharmacology , Lipids/blood , Single Nucleotide Polymorphism , Prediabetic State/genetics , Quantitative Trait Loci
18.
PLoS Genet ; 15(3): e1008018, 2019 03.
Article in English | MEDLINE | ID: mdl-30849075

ABSTRACT

Several bacteria in the gut microbiota have been shown to be associated with inflammatory bowel disease (IBD), and dozens of IBD genetic variants have been identified in genome-wide association studies. However, the role of the microbiota in the etiology of IBD in terms of host genetic susceptibility remains unclear. Here, we studied the association between four major genetic variants associated with an increased risk of IBD and bacterial taxa in up to 633 IBD cases. We performed systematic screening for associations, identifying and replicating associations between NOD2 variants and two taxa: the Roseburia genus and the Faecalibacterium prausnitzii species. By exploring the overall association patterns between genes and bacteria, we found that IBD risk alleles were significantly enriched for associations concordant with bacteria-IBD associations. To understand the significance of this pattern in terms of the study design and known effects from the literature, we used counterfactual principles to assess the fitness of a few parsimonious gene-bacteria-IBD causal models. Our analyses showed evidence that the disease risk conferred by these genetic variants was likely to be partially mediated by the microbiome. We confirmed these results in extensive simulation studies and sensitivity analyses using the association between NOD2 and F. prausnitzii as a case study.
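
The abstract does not say which test established that risk alleles were enriched for concordant associations. As a hedged illustration only, one simple stand-in is a one-sided sign test: count the gene-taxon pairs whose direction of effect agrees with the taxon's association with IBD, and compare that count with the 50% expected by chance. The function name `sign_test_p_value` and the counts below are hypothetical, and the test assumes the pairs are independent, which correlated taxa would violate.

```python
from math import comb


def sign_test_p_value(n_concordant: int, n_total: int) -> float:
    """One-sided exact binomial p-value P(X >= n_concordant) for X ~ Binomial(n_total, 0.5).

    Illustrative stand-in for an enrichment test; not the method used in the paper.
    """
    if not 0 <= n_concordant <= n_total:
        raise ValueError("n_concordant must lie between 0 and n_total")
    # Sum the upper tail of the Binomial(n_total, 0.5) distribution exactly.
    tail = sum(comb(n_total, k) for k in range(n_concordant, n_total + 1))
    return tail / 2 ** n_total


# Hypothetical counts: 30 of 40 gene-taxon pairs are concordant.
p_value = sign_test_p_value(30, 40)
print(f"one-sided sign-test p-value: {p_value:.4g}")
```

A small p-value here would only indicate more concordance than chance predicts; it says nothing about mediation, which the authors assess separately with causal models.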


Subject(s)
Gastrointestinal Microbiome/genetics; Host Microbial Interactions/genetics; Inflammatory Bowel Diseases/genetics; Inflammatory Bowel Diseases/microbiology; Adult; CARD Signaling Adaptor Proteins/genetics; Clostridiales/genetics; Clostridiales/isolation & purification; Clostridiales/pathogenicity; Faecalibacterium prausnitzii/genetics; Faecalibacterium prausnitzii/isolation & purification; Faecalibacterium prausnitzii/pathogenicity; Female; Genetic Association Studies; Genetic Predisposition to Disease; Genetic Variation; Humans; Inflammatory Bowel Diseases/etiology; Male; Middle Aged; Models, Genetic; Nod2 Signaling Adaptor Protein/genetics; Polymorphism, Single Nucleotide
19.
Genet Epidemiol ; 43(2): 180-188, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30474154

ABSTRACT

Recent studies have examined the genetic correlations of single-nucleotide polymorphism (SNP) effect sizes across pairs of populations to better understand the genetic architectures of complex traits. These studies have estimated ρ_g, the cross-population correlation of joint-fit effect sizes at genotyped SNPs. However, the value of ρ_g depends both on the cross-population correlation of true causal effect sizes (ρ_b) and on the similarity in linkage disequilibrium (LD) patterns in the two populations, which drive tagging effects. Here, we derive the value of the ratio ρ_g/ρ_b as a function of LD in each population. By applying existing methods to obtain estimates of ρ_g, we can use this ratio to estimate ρ_b. Our estimates of ρ_b were equal to 0.55 (SE = 0.14) between Europeans and East Asians averaged across nine traits in the Genetic Epidemiology Research on Adult Health and Aging data set, 0.54 (SE = 0.18) between Europeans and South Asians averaged across 13 traits in the UK Biobank data set, and 0.48 (SE = 0.06) and 0.65 (SE = 0.09) between Europeans and East Asians in summary statistic data sets for type 2 diabetes and rheumatoid arthritis, respectively. These results implicate substantially different causal genetic architectures across continental populations.
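
The final step described above, turning an estimate of ρ_g into an estimate of ρ_b, amounts to dividing by the LD-derived ratio. The sketch below shows only that arithmetic. It assumes the ratio ρ_g/ρ_b has already been computed from LD and treats it as a known constant, so the standard error is simply rescaled; the abstract does not describe how the authors propagate uncertainty. The function name `estimate_rho_b` and all input values are hypothetical.

```python
def estimate_rho_b(rho_g: float, se_rho_g: float, ld_ratio: float) -> tuple[float, float]:
    """Rescale an estimate of rho_g into an estimate of rho_b.

    ld_ratio is the value of rho_g / rho_b implied by LD in the two populations.
    Assumption: ld_ratio is treated as a fixed, known constant, so the standard
    error scales by 1 / |ld_ratio|.
    """
    if ld_ratio == 0:
        raise ValueError("ld_ratio must be non-zero")
    rho_b = rho_g / ld_ratio
    se_rho_b = se_rho_g / abs(ld_ratio)
    return rho_b, se_rho_b


# Hypothetical inputs, not values from the paper.
rho_b, se_rho_b = estimate_rho_b(rho_g=0.45, se_rho_g=0.09, ld_ratio=0.9)
print(f"rho_b = {rho_b:.2f} (SE = {se_rho_b:.2f})")
```

Because differing LD patterns weaken tagging, a ratio below 1 means the observed ρ_g understates the correlation of causal effects, and the rescaling moves the estimate upward.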


Subject(s)
Genetics, Population; Adult; Aging/genetics; Arthritis, Rheumatoid/genetics; Biological Specimen Banks; Databases, Genetic; Diabetes Mellitus, Type 2/genetics; Genotype; Humans; Phenotype; Quantitative Trait, Heritable; United Kingdom
20.
Am J Hum Genet ; 100(1): 31-39, 2017 Jan 05.
Article in English | MEDLINE | ID: mdl-28017371

ABSTRACT

Mixed models have become the tool of choice for genetic association studies; however, standard mixed model methods may be poorly calibrated or underpowered under family sampling bias and/or case-control ascertainment. Previously, we introduced a liability threshold-based mixed model association statistic (LTMLM) to address case-control ascertainment in unrelated samples. Here, we consider family-biased case-control ascertainment, where case and control subjects are ascertained non-randomly with respect to family relatedness. Previous work has shown that this type of ascertainment can severely bias heritability estimates; we show here that it also impacts mixed model association statistics. We introduce a family-based association statistic (LT-Fam) that is robust to this problem. Similar to LTMLM, LT-Fam is computed from posterior mean liabilities (PML) under a liability threshold model; however, LT-Fam uses published narrow-sense heritability estimates to avoid the problem of biased heritability estimation, enabling correct calibration. In simulations with family-biased case-control ascertainment, LT-Fam was correctly calibrated (average χ² = 1.00-1.02 for null SNPs), whereas the Armitage trend test (ATT), standard mixed model association (MLM), and case-control retrospective association test (CARAT) were mis-calibrated (e.g., average χ² = 0.50-1.22 for MLM, 0.89-2.65 for CARAT). LT-Fam also attained higher power than other methods in some settings. In 1,259 type 2 diabetes-affected case subjects and 5,765 control subjects from the CARe cohort, downsampled to induce family-biased ascertainment, LT-Fam was correctly calibrated whereas ATT, MLM, and CARAT were again mis-calibrated. Our results highlight the importance of modeling family sampling bias in case-control datasets with related samples.
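
The posterior mean liabilities mentioned above have a closed form in the simplest setting: one unrelated individual, no covariates, and a standard normal liability with cases defined as liability above a threshold. The sketch below computes that textbook truncated-normal mean. It is not the LT-Fam computation, which additionally conditions on relatives' phenotypes and uses published heritability estimates. The function name `posterior_mean_liabilities` and the prevalence of 0.08 are hypothetical.

```python
from statistics import NormalDist


def posterior_mean_liabilities(prevalence: float) -> tuple[float, float, float]:
    """Return (threshold, PML for a case, PML for a control).

    Assumes liability ~ N(0, 1) and case status = liability > threshold,
    with the threshold chosen so that P(case) equals the prevalence.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    density = std_normal.pdf(threshold)
    # Mean of a standard normal truncated above or below the threshold.
    pml_case = density / prevalence
    pml_control = -density / (1.0 - prevalence)
    return threshold, pml_case, pml_control


# Hypothetical disease prevalence of 8%.
threshold, pml_case, pml_control = posterior_mean_liabilities(0.08)
print(f"threshold={threshold:.3f}, case PML={pml_case:.3f}, control PML={pml_control:.3f}")
```

The prevalence-weighted average of the two values is zero, matching the population mean liability. In a family-based setting each individual's value would shift according to the phenotypes of relatives, which is why heritability enters the statistic.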


Subject(s)
Family; Genetic Association Studies/methods; Models, Genetic; Bias; Calibration; Diabetes Mellitus, Type 2/genetics; Genotype; Humans; Phenotype; Polymorphism, Single Nucleotide/genetics; Retrospective Studies