ABSTRACT
Increased blood lipid levels are heritable risk factors of cardiovascular disease with varied prevalence worldwide owing to different dietary patterns and medication use [1]. Despite advances in prevention and treatment, in particular through reducing low-density lipoprotein cholesterol levels [2], heart disease remains the leading cause of death worldwide [3]. Genome-wide association studies (GWAS) of blood lipid levels have led to important biological and clinical insights, as well as new drug targets, for cardiovascular disease. However, most previous GWAS [4-23] have been conducted in European ancestry populations and may have missed genetic variants that contribute to lipid-level variation in other ancestry groups. These include differences in allele frequencies, effect sizes and linkage-disequilibrium patterns [24]. Here we conduct a multi-ancestry, genome-wide genetic discovery meta-analysis of lipid levels in approximately 1.65 million individuals, including 350,000 of non-European ancestries. We quantify the gain in studying non-European ancestries and provide evidence to support the expansion of recruitment of additional ancestries, even with relatively small sample sizes. We find that increasing diversity rather than studying additional individuals of European ancestry results in substantial improvements in fine-mapping functional variants and portability of polygenic prediction (evaluated in approximately 295,000 individuals from 7 ancestry groupings). Modest gains in the number of discovered loci and ancestry-specific variants were also achieved. As GWAS expand emphasis beyond the identification of genes and fundamental biology towards the use of genetic variants for preventive and precision medicine [25], we anticipate that increased diversity of participants will lead to more accurate and equitable [26] application of polygenic scores in clinical practice.
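For readers unfamiliar with the polygenic scores mentioned above, the following is a minimal sketch of the usual additive form: a weighted sum of allele dosages, with per-variant weights taken from GWAS effect estimates. The function name, the example weights, and the dosages are illustrative assumptions and are not taken from the study.

```python
def polygenic_score(dosages, weights):
    """Additive polygenic score for one individual.

    dosages: effect-allele counts per variant (0, 1, or 2; imputed values may be fractional)
    weights: per-variant effect sizes (hypothetical values here)
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    # Sum of effect size times effect-allele dosage over all variants.
    return sum(w * g for w, g in zip(weights, dosages))


# Hypothetical individual genotyped at three variants.
score = polygenic_score([2, 1, 0], [0.1, 0.3, -0.2])
```

Portability across ancestries, as evaluated in the abstract, concerns how well weights estimated in one group predict the trait in another; the sketch only shows how a score is assembled once weights are chosen.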
Subject(s)
Cardiovascular Diseases; Genome-Wide Association Study; Cardiovascular Diseases/genetics; Genetic Predisposition to Disease/genetics; Genome-Wide Association Study/methods; Humans; Linkage Disequilibrium; Multifactorial Inheritance; Polymorphism, Single Nucleotide/genetics; Population Groups
ABSTRACT
Leveraging linkage disequilibrium (LD) patterns as representative of population substructure enables the discovery of additive association signals in genome-wide association studies (GWASs). Standard GWASs are well-powered to interrogate additive models; however, new approaches are required for investigating other modes of inheritance such as dominance and epistasis. Epistasis, or non-additive interaction between genes, exists across the genome but often goes undetected because of a lack of statistical power. Furthermore, the adoption of LD pruning as customary in standard GWASs excludes detection of sites that are in LD but might underlie the genetic architecture of complex traits. We hypothesize that uncovering long-range interactions between loci with strong LD due to epistatic selection can elucidate genetic mechanisms underlying common diseases. To investigate this hypothesis, we tested for associations between 23 common diseases and 5,625,845 epistatic SNP-SNP pairs (determined by Ohta's D statistics) in long-range LD (>0.25 cM). Across five disease phenotypes, we identified one significant and four near-significant associations that replicated in two large genotype-phenotype datasets (UK Biobank and eMERGE). The genes that were most likely involved in the replicated associations were (1) members of highly conserved gene families with complex roles in multiple pathways, (2) essential genes, and/or (3) genes that were associated in the literature with complex traits that display variable expressivity. These results support the highly pleiotropic and conserved nature of variants in long-range LD under epistatic selection. Our work supports the hypothesis that epistatic interactions regulate diverse clinical mechanisms and might especially be driving factors in conditions with a wide range of phenotypic outcomes.
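As background for the LD terminology above, the following is a minimal sketch of the standard pairwise LD coefficient D and the derived r² for two biallelic loci, computed from haplotype and allele frequencies. It is not Ohta's D statistics, which partition LD within and between subpopulations; the function name and the example frequencies are illustrative assumptions.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a:  frequency of allele A
    p_b:  frequency of allele B
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    # D is the deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b
    # r2 scales D squared by the product of the allelic variances.
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, r2


# Hypothetical frequencies: alleles A and B always travel together.
d, r2 = pairwise_ld(0.5, 0.5, 0.5)
```

Under independence (p_ab equal to p_a times p_b) both values are zero, which is the baseline against which strong long-range LD stands out.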
Subject(s)
Epistasis, Genetic; Genome-Wide Association Study; Linkage Disequilibrium/genetics; Genotype; Biological Specimen Banks; United Kingdom; Polymorphism, Single Nucleotide/genetics
ABSTRACT
Normal and pathologic neurobiological processes influence brain morphology in coordinated ways that give rise to patterns of structural covariance (PSC) across brain regions and individuals during brain aging and diseases. The genetic underpinnings of these patterns remain largely unknown. We apply a stochastic multivariate factorization method to a diverse population of 50,699 individuals (12 studies and 130 sites) and derive data-driven, multi-scale PSCs of regional brain size. PSCs were significantly correlated with 915 genomic loci in the discovery set, 617 of which are newly identified, and 72% were independently replicated. Key pathways influencing PSCs involve reelin signaling, apoptosis, neurogenesis, and appendage development, while pathways of breast cancer indicate potential interplays between brain metastasis and PSCs associated with neurodegeneration and dementia. Using support vector machines, multi-scale PSCs effectively derive imaging signatures of several brain diseases. Our results elucidate genetic and biological underpinnings that influence structural covariance patterns in the human brain.
Subject(s)
Brain Neoplasms; Magnetic Resonance Imaging; Humans; Magnetic Resonance Imaging/methods; Brain/pathology; Brain Mapping/methods; Genomics; Brain Neoplasms/pathology
ABSTRACT
A major challenge of genome-wide association studies (GWASs) is to translate phenotypic associations into biological insights. Here, we integrate a large GWAS on blood lipids involving 1.6 million individuals from five ancestries with a wide array of functional genomic datasets to discover regulatory mechanisms underlying lipid associations. We first prioritize lipid-associated genes with expression quantitative trait locus (eQTL) colocalizations and then add chromatin interaction data to narrow the search for functional genes. Polygenic enrichment analysis across 697 annotations from a host of tissues and cell types confirms the central role of the liver in lipid levels and highlights the selective enrichment of adipose-specific chromatin marks in high-density lipoprotein cholesterol and triglycerides. Overlapping transcription factor (TF) binding sites with lipid-associated loci identifies TFs relevant in lipid biology. In addition, we present an integrative framework to prioritize causal variants at GWAS loci, producing a comprehensive list of candidate causal genes and variants with multiple layers of functional evidence. We highlight two of the prioritized genes, CREBRF and RRBP1, which show convergent evidence across functional datasets supporting their roles in lipid biology.
Subject(s)
Genome-Wide Association Study; Polymorphism, Single Nucleotide; Chromatin/genetics; Genomics; Humans; Lipids/genetics; Polymorphism, Single Nucleotide/genetics
ABSTRACT
As a relatively new methodology, the transcriptome-wide association study (TWAS) has gained interest owing to its capacity for gene-level association testing. However, the development of TWAS has outpaced statistical evaluation of TWAS gene prioritization performance. Current TWAS methods vary in underlying biological assumptions about tissue specificity of transcriptional regulatory mechanisms. In a previous study from our group, this may have affected whether TWAS methods better identified associations in single tissues versus multiple tissues. We therefore designed simulation analyses to examine how the interplay between particular TWAS methods and tissue specificity of gene expression affects power and type I error rates for gene prioritization. We found that cross-tissue identification of expression quantitative trait loci (eQTLs) improved TWAS power. Single-tissue TWAS (i.e., PrediXcan) had robust power to identify genes expressed in single tissues but often also found significant associations in the wrong tissues, and therefore had high false-positive rates. Cross-tissue TWAS (i.e., UTMOST) had overall equal or greater power and controlled type I error rates for genes expressed in multiple tissues. Based on these simulation results, we applied a tissue specificity-aware TWAS (TSA-TWAS) analytic framework to look for gene-based associations with pre-treatment laboratory values from AIDS Clinical Trials Group (ACTG) studies. We replicated several proof-of-concept transcriptionally regulated gene-trait associations, including UGT1A1 (encoding bilirubin uridine diphosphate glucuronosyltransferase enzyme) and total bilirubin levels (p = 3.59×10⁻¹²), and CETP (cholesteryl ester transfer protein) with high-density lipoprotein cholesterol (p = 4.49×10⁻¹²). We also identified several novel genes associated with metabolic and virologic traits, as well as pleiotropic genes that linked plasma viral load, absolute basophil count, and/or triglyceride levels. By highlighting the advantages of different TWAS methods, our simulation study promotes a tissue specificity-aware TWAS analytic framework that revealed novel aspects of HIV-related traits.
Subject(s)
Genetic Predisposition to Disease; Genome-Wide Association Study; Quantitative Trait Loci/genetics; Transcriptome/genetics; Computer Simulation; Gene Expression Regulation/genetics; Humans; Organ Specificity/genetics; Polymorphism, Single Nucleotide/genetics
ABSTRACT
Genome-wide association studies (GWAS) have detected large numbers of variants associated with complex human traits and diseases. However, the proportion of variance explained by GWAS-significant single nucleotide polymorphisms has usually been small. This has prompted interest in the use of whole-genome regression (WGR) methods. However, there has been limited research on the factors that affect prediction accuracy (PA) of WGRs when applied to human data of distantly related individuals. Here, we examine, using real human genotypes and simulated phenotypes, how trait complexity, marker-quantitative trait loci (QTL) linkage disequilibrium (LD), and the model used affect the performance of WGRs. Our results indicated that the estimated rate of missing heritability is dependent on the extent of marker-QTL LD. However, this parameter was not greatly affected by trait complexity. Regarding PA, our results indicated that: (a) under perfect marker-QTL LD, WGR can achieve moderately high prediction accuracy, and with simple genetic architectures variable selection methods outperform shrinkage procedures; and (b) under imperfect marker-QTL LD, variable selection methods can achieve reasonably good PA with simple or moderately complex genetic architectures; however, the PA of these methods deteriorated as trait complexity increased, and with highly complex traits variable selection and shrinkage methods both performed poorly. This was confirmed with an analysis of human height.
Subject(s)
Disease/genetics; Genome, Human; Models, Genetic; Quantitative Trait Loci; Computer Simulation; Genome-Wide Association Study; Humans; Linkage Disequilibrium; Regression Analysis
ABSTRACT
The complex biological mechanisms underlying human brain aging remain incompletely understood. This study investigated the genetic architecture of three brain age gaps (BAG) derived from gray matter volume (GM-BAG), white matter microstructure (WM-BAG), and functional connectivity (FC-BAG). We identified sixteen genomic loci that reached genome-wide significance (P-value < 5×10⁻⁸). A gene-drug-disease network highlighted genes linked to GM-BAG for treating neurodegenerative and neuropsychiatric disorders and WM-BAG genes for cancer therapy. GM-BAG displayed the most pronounced heritability enrichment in genetic variants within conserved regions. Oligodendrocytes and astrocytes, but not neurons, exhibited notable heritability enrichment in WM and FC-BAG, respectively. Mendelian randomization identified potential causal effects of several chronic diseases on brain aging, such as type 2 diabetes on GM-BAG and AD on WM-BAG. Our results provide insights into the genetics of human brain aging, with clinical implications for potential lifestyle and therapeutic interventions. All results are publicly available at https://labs.loni.usc.edu/medicine.
Subject(s)
Diabetes Mellitus, Type 2; White Matter; Humans; Brain; Gray Matter; Magnetic Resonance Imaging/methods; White Matter/physiology; Mendelian Randomization Analysis
ABSTRACT
This PSB 2023 session discusses challenges in clinical implication and application of risk prediction models, which includes but is not limited to: implementation of risk models, responsible use of polygenic risk scores (PGS), and other risk prediction strategies. We focus on the development and use of new, scalable methods for harmonizing and refining risk prediction models by incorporating genetic and non-genetic risk factors, applying new phenotyping strategies, and integrating clinical factors and biomarkers. Lastly, we will discuss innovation in expanding the utility of these prediction models to underrepresented populations. This session focuses on the overarching theme of enabling early diagnosis, and treatment and preventive measures related to complex diseases and comorbidities.
Subject(s)
Computational Biology; Multifactorial Inheritance; Humans; Risk Factors; Genetic Predisposition to Disease
ABSTRACT
The complex biological mechanisms underlying human brain aging remain incompletely understood, involving multiple body organs and chronic diseases. In this study, we used multimodal magnetic resonance imaging and artificial intelligence to examine the genetic architecture of the brain age gap (BAG) derived from gray matter volume (GM-BAG, N=31,557 European ancestry), white matter microstructure (WM-BAG, N=31,674), and functional connectivity (FC-BAG, N=32,017). We identified sixteen genomic loci that reached genome-wide significance (P-value<5×10⁻⁸). A gene-drug-disease network highlighted genes linked to GM-BAG for treating neurodegenerative and neuropsychiatric disorders and WM-BAG genes for cancer therapy. GM-BAG showed the highest heritability enrichment for genetic variants in conserved regions, whereas WM-BAG exhibited the highest heritability enrichment in the 5' untranslated regions; oligodendrocytes and astrocytes, but not neurons, showed significant heritability enrichment in WM and FC-BAG, respectively. Mendelian randomization identified potential causal effects of several exposure variables on brain aging, such as type 2 diabetes on GM-BAG (odds ratio=1.05 [1.01, 1.09], P-value=1.96×10⁻²) and AD on WM-BAG (odds ratio=1.04 [1.02, 1.05], P-value=7.18×10⁻⁵). Overall, our results provide valuable insights into the genetics of human brain aging, with clinical implications for potential lifestyle and therapeutic interventions. All results are publicly available at the MEDICINE knowledge portal: https://labs.loni.usc.edu/medicine.
ABSTRACT
The mixed linear model (MLM) is an advanced statistical technique applicable to many fields of science. The multivariate MLM can be used to model longitudinal data, such as repeated ratings of disease resistance taken across time. In this study, using an example data set from a multi-environment trial of northern leaf blight disease on 290 maize lines with diverse levels of resistance, multivariate MLM analysis was performed and its utility was examined. In the population and environments tested, genotypic effects were highly correlated across disease ratings and followed an autoregressive pattern of correlation decay. Because longitudinal data are often converted to the univariate measure of area under the disease progress curve (AUDPC), comparisons between univariate MLM analysis of AUDPC and multivariate MLM analysis of longitudinal data were made. Univariate analysis had the advantage of simplicity and reduced computational demand, whereas multivariate analysis enabled a comprehensive perspective on disease development, providing the opportunity for unique insights into disease resistance. To aid in the application of multivariate MLM analysis of longitudinal data on disease resistance, annotated program syntax for model fitting is provided for the software ASReml.
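To make the univariate summary concrete, the following is a minimal sketch of AUDPC computed with the commonly used trapezoidal rule over successive rating dates. The abstract does not state which AUDPC formula the authors used, so the trapezoidal form is an assumption, as are the function name and the example ratings.

```python
def audpc(times, ratings):
    """Area under the disease progress curve by the trapezoidal rule.

    times:   assessment times in increasing order (e.g., days after inoculation)
    ratings: disease ratings recorded at those times
    """
    if len(times) != len(ratings) or len(times) < 2:
        raise ValueError("need at least two matching time/rating pairs")
    total = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Mean rating over the interval times the interval length.
        total += 0.5 * (ratings[i] + ratings[i + 1]) * dt
    return total


# Hypothetical line rated at 0, 7, and 14 days with 0%, 10%, and 30% leaf area affected.
area = audpc([0, 7, 14], [0, 10, 30])
```

Collapsing the ratings into one number in this way is what gives the univariate analysis its simplicity; the multivariate MLM instead keeps each rating date as a separate, correlated response.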
Subject(s)
Ascomycota/immunology; Disease Resistance; Linear Models; Plant Diseases/immunology; Zea mays/immunology; Ascomycota/physiology; Computer Simulation; Data Interpretation, Statistical; Genotype; Longitudinal Studies; Multivariate Analysis; Plant Diseases/microbiology; Research Design; Software; Zea mays/microbiology
ABSTRACT
Importance: Late-life depression (LLD) is characterized by considerable heterogeneity in clinical manifestation. Unraveling such heterogeneity might aid in elucidating etiological mechanisms and support precision and individualized medicine. Objective: To cross-sectionally and longitudinally delineate disease-related heterogeneity in LLD associated with neuroanatomy, cognitive functioning, clinical symptoms, and genetic profiles. Design, Setting, and Participants: The Imaging-Based Coordinate System for Aging and Neurodegenerative Diseases (iSTAGING) study is an international multicenter consortium investigating brain aging in pooled and harmonized data from 13 studies with more than 35 000 participants, including a subset of individuals with major depressive disorder. Multimodal data from a multicenter sample (N = 996), including neuroimaging, neurocognitive assessments, and genetics, were analyzed in this study. A semisupervised clustering method (heterogeneity through discriminative analysis) was applied to regional gray matter (GM) brain volumes to derive dimensional representations. Data were collected from July 2017 to July 2020 and analyzed from July 2020 to December 2021. Main Outcomes and Measures: Two dimensions were identified to delineate LLD-associated heterogeneity in voxelwise GM maps, white matter (WM) fractional anisotropy, neurocognitive functioning, clinical phenotype, and genetics. Results: A total of 501 participants with LLD (mean [SD] age, 67.39 [5.56] years; 332 women) and 495 healthy control individuals (mean [SD] age, 66.53 [5.16] years; 333 women) were included. Patients in dimension 1 demonstrated relatively preserved brain anatomy without WM disruptions relative to healthy control individuals. In contrast, patients in dimension 2 showed widespread brain atrophy and WM integrity disruptions, along with cognitive impairment and higher depression severity. Moreover, 1 de novo independent genetic variant (rs13120336; chromosome: 4, 186387714; minor allele, G) was significantly associated with dimension 1 (odds ratio, 2.35; SE, 0.15; P = 3.14 × 10⁻⁸) but not with dimension 2. The 2 dimensions demonstrated significant single-nucleotide variant-based heritability of 18% to 27% within the general population (N = 12 518 in UK Biobank). In a subset of individuals having longitudinal measurements, those in dimension 2 experienced a more rapid longitudinal change in GM and brain age (Cohen f² = 0.03; P = .02) and were more likely to progress to Alzheimer disease (Cohen f² = 0.03; P = .03) compared with those in dimension 1 (N = 1431 participants and 7224 scans from the Alzheimer's Disease Neuroimaging Initiative [ADNI], Baltimore Longitudinal Study of Aging [BLSA], and Biomarkers for Older Controls at Risk for Dementia [BIOCARD] data sets). Conclusions and Relevance: This study characterized heterogeneity in LLD into 2 dimensions with distinct neuroanatomical, cognitive, clinical, and genetic profiles. This dimensional approach provides a potential mechanism for investigating the heterogeneity of LLD and the relevance of the latent dimensions to possible disease mechanisms, clinical outcomes, and responses to interventions.
Subject(s)
Alzheimer Disease; Depressive Disorder, Major; Alzheimer Disease/diagnostic imaging; Alzheimer Disease/genetics; Brain/diagnostic imaging; Cognition; Depression; Depressive Disorder, Major/diagnostic imaging; Depressive Disorder, Major/genetics; Female; Humans; Longitudinal Studies; Magnetic Resonance Imaging/methods; Male; Neuroimaging
ABSTRACT
Clinical and epidemiological studies have shown that circulatory system diseases and nervous system disorders often co-occur in patients. However, genetic susceptibility factors shared between these disease categories remain largely unknown. Here, we characterized pleiotropy across 107 circulatory system and 40 nervous system traits using an ensemble of methods in the eMERGE Network and UK Biobank. Using a formal test of pleiotropy, five genomic loci demonstrated statistically significant evidence of pleiotropy. We observed region-specific patterns of direction of genetic effects for the two disease categories, suggesting potential antagonistic and synergistic pleiotropy. Our findings provide insights into the relationship between circulatory system diseases and nervous system disorders which can provide context for future prevention and treatment strategies.
Subject(s)
Cardiovascular Diseases; Nervous System Diseases; Cardiovascular Diseases/genetics; Genetic Pleiotropy; Genetic Predisposition to Disease; Genome-Wide Association Study; Genomics; Humans; Nervous System Diseases/genetics; Polymorphism, Single Nucleotide
ABSTRACT
BACKGROUND: Genetic variants within nearly 1000 loci are known to contribute to modulation of blood lipid levels. However, the biological pathways underlying these associations are frequently unknown, limiting understanding of these findings and hindering downstream translational efforts such as drug target discovery. RESULTS: To expand our understanding of the underlying biological pathways and mechanisms controlling blood lipid levels, we leverage a large multi-ancestry meta-analysis (N = 1,654,960) of blood lipids to prioritize putative causal genes for 2286 lipid associations using six gene prediction approaches. Using phenome-wide association (PheWAS) scans, we identify relationships of genetically predicted lipid levels to other diseases and conditions. We confirm known pleiotropic associations with cardiovascular phenotypes and determine novel associations, notably with cholelithiasis risk. We perform sex-stratified GWAS meta-analysis of lipid levels and show that 3-5% of autosomal lipid-associated loci demonstrate sex-biased effects. Finally, we report 21 novel lipid loci identified on the X chromosome. Many of the sex-biased autosomal and X chromosome lipid loci show pleiotropic associations with sex hormones, emphasizing the role of hormone regulation in lipid metabolism. CONCLUSIONS: Taken together, our findings provide insights into the biological mechanisms through which associated variants lead to altered lipid levels and potentially cardiovascular disease risk.
Subject(s)
Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Sex Characteristics; Phenotype; Lipids/genetics; Polymorphism, Single Nucleotide; Genetic Pleiotropy
ABSTRACT
Plasma lipids are known heritable risk factors for cardiovascular disease, but increasing evidence also supports shared genetics with diseases of other organ systems. We devised a comprehensive three-phase framework to identify new lipid-associated genes and study the relationships among lipids, genotypes, gene expression and hundreds of complex human diseases from the Electronic Medical Records and Genomics (347 traits) and the UK Biobank (549 traits). Aside from 67 new lipid-associated genes with strong replication, we found evidence for pleiotropic SNPs/genes between lipids and diseases across the phenome. These include discordant pleiotropy in the HLA region between lipids and multiple sclerosis and putative causal paths between triglycerides and gout, among several others. Our findings give insights into the genetic basis of the relationship between plasma lipids and diseases on a phenome-wide scale and can provide context for future prevention and treatment strategies.
Subject(s)
Biomarkers; Disease Susceptibility; Electronic Health Records; Lipids/blood; Alleles; Biological Specimen Banks; Genetic Association Studies; Genetic Predisposition to Disease; Humans; Polymorphism, Single Nucleotide; Public Health Surveillance; Quantitative Trait, Heritable; United Kingdom
ABSTRACT
Characterizing how variation at the level of individual nucleotides contributes to traits and diseases has been an area of growing interest since the completion of sequencing the first human genome. Understanding how a single nucleotide polymorphism (SNP) leads to a pathogenic phenotype on a genome-wide scale is a fruitful endeavor for anyone interested in developing diagnostic tests or therapeutics, or simply wanting to understand the etiology of a disease or trait. To this end, many datasets and algorithms have been developed as resources and tools to annotate SNPs. One of the most common practices is to annotate coding SNPs that affect the protein sequence. Synonymous variants are often grouped as one type of variant; however, there are in fact many tools available to dissect their effects on gene expression. More recently, large consortia such as ENCODE and GTEx have made it possible to annotate non-coding regions. Although annotating variants is a common technique among human geneticists, the constant advances in tools and biology surrounding SNPs require an updated summary of what is known and the trajectory of the field. This review will discuss the history behind SNP annotation, commonly used tools, and newer strategies for SNP annotation. Additionally, we will comment on the caveats that distinguish approaches from one another, along with gaps in the current state of knowledge and potential future directions. We do not intend for this to be a comprehensive review for any specific area of SNP annotation; rather, it will be a resource for those unfamiliar with computational tools used to functionally characterize SNPs. In summary, this review will help illustrate how each SNP annotation method impacts the way in which the genetic and molecular etiology of a disease is explored in silico.
ABSTRACT
In humans, most genome-wide association studies have been conducted using data from Caucasians and many of the reported findings have not replicated in other populations. This lack of replication may be due to statistical issues (small sample sizes or confounding) or perhaps more fundamentally to differences in the genetic architecture of traits between ethnically diverse subpopulations. What aspects of the genetic architecture of traits vary between subpopulations and how can this be quantified? We consider studying effect heterogeneity using Bayesian random effect interaction models. The proposed methodology can be applied using shrinkage and variable selection methods, and produces useful information about effect heterogeneity in the form of whole-genome summaries (e.g., the proportions of variance of a complex trait explained by a set of SNPs and the average correlation of effects) as well as SNP-specific attributes. Using simulations, we show that the proposed methodology yields (nearly) unbiased estimates when the sample size is not too small relative to the number of SNPs used. Subsequently, we used the methodology for the analyses of four complex human traits (standing height, high-density lipoprotein, low-density lipoprotein, and serum urate levels) in European-Americans (EAs) and African-Americans (AAs). The estimated correlations of effects between the two subpopulations were well below unity for all the traits, ranging from 0.50 to 0.73. The extent of effect heterogeneity varied between traits and SNP sets. Height showed fewer differences in SNP effects between AAs and EAs, whereas HDL, a trait highly influenced by lifestyle, exhibited a greater extent of effect heterogeneity. For all the traits, we observed substantial variability in effect heterogeneity across SNPs, suggesting that effect heterogeneity varies between regions of the genome.
Subject(s)
Ethnicity/genetics; Genetic Heterogeneity; Models, Genetic; Population/genetics; Quantitative Trait, Heritable; Genome-Wide Association Study/methods; Genome-Wide Association Study/standards; Humans; Polymorphism, Single Nucleotide
ABSTRACT
Transcriptome-wide association studies (TWAS) have recently gained great attention due to their ability to prioritize complex trait-associated genes and promote potential therapeutics development for complex human diseases. TWAS integrates genotypic data with expression quantitative trait loci (eQTLs) to predict genetically regulated gene expression components and associates predictions with a trait of interest. As such, TWAS can prioritize genes whose differential expressions contribute to the trait of interest and provide mechanistic explanation of complex trait(s). Tissue-specific eQTL information grants TWAS the ability to perform association analysis on tissues whose gene expression profiles are otherwise hard to obtain, such as liver and heart. However, as eQTLs are tissue context-dependent, whether and how the tissue-specificity of eQTLs influences TWAS gene prioritization has not been fully investigated. In this study, we addressed this question by adopting two distinct TWAS methods, PrediXcan and UTMOST, which assume single tissue and integrative tissue effects of eQTLs, respectively. Thirty-eight baseline laboratory traits in 4,360 antiretroviral treatment-naïve individuals from the AIDS Clinical Trials Group (ACTG) studies comprised the input dataset for TWAS. We performed TWAS in a tissue-specific manner and obtained a total of 430 significant gene-trait associations (q-value < 0.05) across multiple tissues. Single tissue-based analysis by PrediXcan contributed 116 of the 430 associations including 64 unique gene-trait pairs in 28 tissues. Integrative tissue-based analysis by UTMOST found the other 314 significant associations that include 50 unique gene-trait pairs across all 44 tissues. Both analyses were able to replicate some associations identified in past variant-based genome-wide association studies (GWAS), such as high-density lipoprotein (HDL) and CETP (PrediXcan, q-value = 3.2e-16). Both analyses also identified novel associations. 
Moreover, single tissue-based and integrative tissue-based analysis shared 11 of 103 unique gene-trait pairs, for example, PSRC1-low-density lipoprotein (PrediXcan's lowest q-value = 8.5e-06; UTMOST's lowest q-value = 1.8e-05). This study suggests that single tissue-based analysis may have performed better at discovering gene-trait associations when combining results from all tissues. Integrative tissue-based analysis was better at prioritizing genes in multiple tissues and in trait-related tissue. Additional exploration is needed to confirm this conclusion. Finally, although single tissue-based and integrative tissue-based analysis shared significant novel discoveries, tissue context-dependency of eQTLs impacted TWAS gene prioritization. This study provides preliminary data to support continued work on tissue context-dependency of eQTL studies and TWAS.
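The abstract reports significance as q-value < 0.05 but does not say how the q-values were computed. As a hedged illustration of false discovery rate control in general, the sketch below applies the Benjamini-Hochberg step-up adjustment, which is a related but distinct procedure from Storey's q-value and is swapped in here only because it is simple to show. The function name and the example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four gene-trait tests.
adj = bh_adjust([0.01, 0.04, 0.03, 0.5])
significant = [a < 0.05 for a in adj]
```

In this toy input only the first test passes the 0.05 threshold after adjustment, which shows why raw p-values below 0.05 do not all survive multiple-testing correction.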
Subject(s)
Gene Expression Profiling/statistics & numerical data; Genome-Wide Association Study/statistics & numerical data; Organ Specificity/genetics; Quantitative Trait Loci; Transcriptome; Anti-HIV Agents/therapeutic use; Computational Biology; Gene Expression Profiling/methods; Genetic Predisposition to Disease; Genome-Wide Association Study/methods; Genotype; HIV Infections/drug therapy; HIV Infections/genetics; Humans; Pharmacogenomic Variants; Polymorphism, Single Nucleotide
ABSTRACT
The link between cardiovascular diseases and neurological disorders has been widely observed in the aging population. Disease prevention and treatment rely on understanding the potential genetic nexus of multiple diseases in these categories. In this study, we aimed to detect pleiotropy, the phenomenon in which a genetic variant influences more than one phenotype. Marker-phenotype association approaches can be grouped into univariate, bivariate, and multivariate categories based on the number of phenotypes considered at one time. Here we applied one statistical method per category, followed by an eQTL colocalization analysis, to identify potential pleiotropic variants that contribute to the link between cardiovascular and neurological diseases. We performed our analyses on ~530,000 common SNPs coupled with 65 electronic health record (EHR)-based phenotypes in 43,870 unrelated European adults from the Electronic Medical Records and Genomics (eMERGE) network. All three methods identified 31 variants that showed significant associations across late-onset cardiac and neurologic diseases. We further investigated the functional implications of gene expression on the detected "lead SNPs" via colocalization analysis, providing a deeper understanding of the discovered associations. In summary, we present the framework and landscape for detecting potential pleiotropy using univariate, bivariate, multivariate, and colocalization methods. Further exploration of these potentially pleiotropic genetic variants will work toward understanding disease-causing mechanisms across cardiovascular and neurological diseases and may assist in considering disease prevention as well as drug repositioning in future research.
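The consensus step described above, keeping only variants flagged by all three association approaches, amounts to a set intersection. The sketch below assumes each method has already produced a list of significant variant identifiers. The method labels and the `snp_*` identifiers are placeholders, not results from the study.

```python
def consensus_variants(hits_by_method):
    """Return the variants reported as significant by every method.

    hits_by_method maps a method label to an iterable of variant IDs.
    """
    hit_sets = [set(hits) for hits in hits_by_method.values()]
    if not hit_sets:
        return set()
    return set.intersection(*hit_sets)

# Hypothetical significant hits per approach (placeholder identifiers).
hits = {
    "univariate": ["snp_a", "snp_b", "snp_c"],
    "bivariate": ["snp_b", "snp_c", "snp_d"],
    "multivariate": ["snp_b", "snp_e"],
}
shared = consensus_variants(hits)
print(sorted(shared))
```

Variants that survive the intersection would then be the candidates carried forward to colocalization.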
Subject(s)
Cardiovascular Diseases/genetics , Genetic Pleiotropy , Nervous System Diseases/genetics , Adult , Aged , Computational Biology , Electronic Health Records , Female , Genetic Association Studies , Genetic Predisposition to Disease , Humans , Male , Middle Aged , Multivariate Analysis , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait Loci
Abstract
Transcriptome-wide association studies (TWAS) have recently been employed as an approach that draws on the advantages of genome-wide association studies (GWAS) and gene expression studies to identify genes associated with complex traits. Unlike standard GWAS, TWAS can be run on summary-level data alone and offer improved statistical power. Two popular TWAS methods are (a) imputing the cis genetic component of gene expression from smaller studies (using multi-SNP prediction, or MP) into the much larger effective sample sizes afforded by GWAS (TWAS-MP) and (b) summary-based Mendelian randomization (TWAS-SMR). Although these methods have been effective at detecting functional variants, it remains unclear how extensive variability in the genetic architecture of complex traits and diseases impacts TWAS results. Our goal was to investigate the scenarios under which these methods yielded enough power to detect significant expression-trait associations. In this study, we conducted extensive simulations based on 6000 randomly chosen, unrelated Caucasian males from Geisinger's MyCode population to compare the power to detect cis expression-trait associations (within 500 kb of a gene) using the above-described approaches. To test TWAS across varying genetic backgrounds, we simulated gene expression and phenotype using different numbers of quantitative trait loci per gene and different cis-expression/trait heritabilities under genetic models that differentiate the effect of causality from that of pleiotropy. For each gene, on a training set ranging from 100 to 1000 individuals, we either (a) estimated regression coefficients with gene expression as the response using five different methods (LASSO, elastic net, Bayesian LASSO, Bayesian spike-slab, and Bayesian ridge regression) or (b) performed eQTL analysis. 
We then sampled with replacement 50,000, 150,000, and 300,000 individuals, respectively, from the testing set of the remaining 5000 individuals and conducted GWAS on each set. Subsequently, we integrated the GWAS summary statistics derived from the testing set with the weights (or eQTLs) derived from the training set to identify expression-trait associations using (a) TWAS-MP, (b) TWAS-SMR, (c) eQTL-based GWAS, or (d) standalone GWAS. Finally, we examined the power to detect functionally relevant genes using the different approaches under the considered simulation scenarios. In general, we observed great similarities among TWAS-MP methods, although the Bayesian methods resulted in improved power in comparison to LASSO and elastic net as the trait architecture grew more complex while training sample sizes and expression heritability remained small. Overall, we observed high power under causality but very low to moderate power under pleiotropy.
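The integration of GWAS summary statistics with training-set weights (or eQTLs) can be illustrated with the test statistics commonly written for these two families of methods. The sketch below assumes the usual summary-based forms: a weighted sum of GWAS z-scores scaled by the linkage-disequilibrium (LD) matrix for TWAS-MP, and an approximate 1-df chi-square for SMR at a single top cis-eQTL. The abstract does not state the exact formulas used in the simulations, so the function names and inputs are illustrative only.

```python
import numpy as np

def twas_mp_z(weights, gwas_z, ld):
    """Summary-based multi-SNP TWAS statistic: w'z / sqrt(w' R w).

    weights: expression prediction weights for the cis SNPs (length m)
    gwas_z:  GWAS z-scores for the same SNPs (length m)
    ld:      m x m SNP correlation (LD) matrix R
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    r = np.asarray(ld, dtype=float)
    variance = w @ r @ w
    if variance <= 0:
        raise ValueError("weights give zero predicted-expression variance")
    return float(w @ z / np.sqrt(variance))

def smr_chi2(z_eqtl, z_gwas):
    """Approximate SMR statistic at the top cis-eQTL (about chi-square with 1 df under the null)."""
    a, b = z_eqtl ** 2, z_gwas ** 2
    return a * b / (a + b)

# Toy inputs (assumptions): two uncorrelated SNPs, only the first carries weight.
z_mp = twas_mp_z([1.0, 0.0], [3.0, 1.0], np.eye(2))
t_smr = smr_chi2(4.0, 3.0)
print(z_mp, t_smr)
```

With a single weighted SNP the TWAS-MP statistic reduces to that SNP's GWAS z-score, which makes the toy case easy to check by hand.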