ABSTRACT
Large-scale, multi-ethnic whole-genome sequencing (WGS) studies, such as the National Human Genome Research Institute Genome Sequencing Program's Centers for Common Disease Genomics (CCDG), play an important role in increasing diversity for genetic research. Before performing association analyses, assessing Hardy-Weinberg equilibrium (HWE) is a crucial step in quality control procedures to remove low quality variants and ensure valid downstream analyses. Diverse WGS studies contain ancestrally heterogeneous samples; however, commonly used HWE methods assume that the samples are homogeneous. Therefore, directly applying these to the whole dataset can yield statistically invalid results. To account for this heterogeneity, HWE can be tested on subsets of samples that have genetically homogeneous ancestries and the results aggregated at each variant. To facilitate valid HWE subset testing, we developed a semi-supervised learning approach that predicts homogeneous ancestries based on the genotype. This method provides a convenient tool for estimating HWE in the presence of population structure and missing self-reported race and ethnicities in diverse WGS studies. In addition, assessing HWE within the homogeneous ancestries provides reliable HWE estimates that will directly benefit downstream analyses, including association analyses in WGS studies. We applied our proposed method on the CCDG dataset, predicting homogeneous genetic ancestry groups for 60,545 multi-ethnic WGS samples to assess HWE within each group.
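The subset-testing idea can be illustrated with a minimal sketch. It assumes a 1-degree-of-freedom chi-square HWE test and a minimum-p-value rule for aggregating across groups. The abstract names neither a test nor an aggregation rule, so both are illustrative choices, and the function names and group labels are hypothetical.

```python
import math

def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) HWE p-value from genotype counts at one variant."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic in this group: nothing to test
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))

def min_group_pvalue(counts_by_group):
    """Assumed aggregation rule: smallest per-group HWE p-value."""
    return min(hwe_chisq_pvalue(*c) for c in counts_by_group.values())

# Two hypothetical ancestry groups, each fixed for a different allele.
groups = {"group_1": (100, 0, 0), "group_2": (0, 0, 100)}
pooled = tuple(sum(c[i] for c in groups.values()) for i in range(3))

print(hwe_chisq_pvalue(*pooled))  # very small: pooling mimics an HWE violation
print(min_group_pvalue(groups))   # 1.0: no departure within either group
```

In this toy example the pooled sample shows a strong apparent departure from HWE purely because of population structure, while neither homogeneous group does.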
Subjects
Supervised Machine Learning; Whole Genome Sequencing; Humans; Whole Genome Sequencing/methods; Genome, Human; Genetics, Population/methods; Ethnicity/genetics; Genome-Wide Association Study/methods; Polymorphism, Single Nucleotide; Genotype
ABSTRACT
Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration, which leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for predominantly European and African ancestral individuals based on their inferred global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest MagicalRsq-X outperforms Rsq in almost every setting, with 7.3%-14.4% improvement in squared Pearson correlation with true R², corresponding to 85-218 K variant gains. We further developed a metric to quantify the genetic distance of a target cohort relative to a reference cohort and showed that this metric largely explained the performance of MagicalRsq-X models. Finally, we found MagicalRsq-X saved up to 53 known genome-wide significant variants in one of the largest blood cell trait GWASs that would be missed using the original Rsq for QC. In conclusion, MagicalRsq-X shows superiority for post-imputation QC and benefits genetic studies by distinguishing well and poorly imputed lower-frequency variants.
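The evaluation metric named above, the squared Pearson correlation between a quality score and the true R², can be sketched as follows. This is only a generic implementation of the statistic, not the authors' evaluation code; the function name and the example values are hypothetical.

```python
def squared_pearson(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy * sxy / (sxx * syy)

# Hypothetical per-variant values: a quality score and the true R².
true_r2 = [0.10, 0.35, 0.60, 0.90]
quality_score = [0.20, 0.30, 0.70, 0.85]
print(squared_pearson(quality_score, true_r2))
```

A score that tracks the true R² more closely yields a value nearer to 1, which is the sense in which one QC metric is said to outperform another.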
Subjects
Gene Frequency; Genotype; Polymorphism, Single Nucleotide; Software; Humans; Cohort Studies; Linkage Disequilibrium; Genome-Wide Association Study/methods; Genome, Human; Quality Control; Machine Learning; Whole Genome Sequencing/standards; Whole Genome Sequencing/methods
ABSTRACT
Inflammation biomarkers can provide valuable insight into the role of inflammatory processes in many diseases and conditions. Sequencing-based analyses of such biomarkers can also serve as an exemplar of the genetic architecture of quantitative traits. To evaluate the biological insight that a multi-ancestry, whole-genome-based association study can provide, we performed a comprehensive analysis of 21 inflammation biomarkers from up to 38,465 individuals with whole-genome sequencing from the Trans-Omics for Precision Medicine (TOPMed) program (with varying sample size by trait, where the minimum sample size was n = 737 for MMP-1). We identified 22 distinct single-variant associations across 6 traits (E-selectin, intercellular adhesion molecule 1, interleukin-6, lipoprotein-associated phospholipase A2 activity and mass, and P-selectin) that remained significant after conditioning on previously identified associations for these inflammatory biomarkers. We further expanded upon known biomarker associations by pairing the single-variant analysis with a rare variant set-based analysis that identified 19 significant rare variant set-based associations with 5 traits. These signals were distinct from both significant single-variant association signals within TOPMed and genetic signals observed in prior studies, demonstrating the complementary value of performing both single and rare variant analyses when analyzing quantitative traits. We also confirm several previously reported signals from semi-quantitative proteomics platforms. Many of these signals demonstrate the extensive allelic heterogeneity and ancestry-differentiated variant-trait associations common for inflammation biomarkers, a characteristic we hypothesize will be increasingly observed with well-powered, large-scale analyses of complex traits.
Subjects
Biomarkers; Genome-Wide Association Study; Inflammation; Precision Medicine; Whole Genome Sequencing; Humans; Precision Medicine/methods; Inflammation/genetics; Genome-Wide Association Study/methods; Whole Genome Sequencing/methods; Polymorphism, Single Nucleotide; Quantitative Trait Loci; Genetic Predisposition to Disease; Female; Interleukin-6/genetics
ABSTRACT
Long non-coding RNAs (lncRNAs) are known to perform important regulatory functions in lipid metabolism. Large-scale whole-genome sequencing (WGS) studies and new statistical methods for variant set tests now provide an opportunity to assess more associations between rare variants in lncRNA genes and complex traits across the genome. In this study, we used high-coverage WGS from 66,329 participants of diverse ancestries with measurement of blood lipids and lipoproteins (LDL-C, HDL-C, TC, and TG) in the National Heart, Lung, and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed) program to investigate the role of lncRNAs in lipid variability. We aggregated rare variants for 165,375 lncRNA genes based on their genomic locations and conducted rare-variant aggregate association tests using the STAAR (variant-set test for association using annotation information) framework. We performed STAAR conditional analysis adjusting for common variants in known lipid GWAS loci and rare-coding variants in nearby protein-coding genes. Our analyses revealed 83 rare lncRNA variant sets significantly associated with blood lipid levels, all of which were located in known lipid GWAS loci (in a ±500-kb window of a Global Lipids Genetics Consortium index variant). Notably, 61 out of 83 signals (73%) were conditionally independent of common regulatory variation and rare protein-coding variation at the same loci. We replicated 34 out of 61 (56%) conditionally independent associations using the independent UK Biobank WGS data. Our results expand the genetic architecture of blood lipids to rare variants in lncRNAs.
Subjects
RNA, Long Noncoding; Humans; RNA, Long Noncoding/genetics; Genome-Wide Association Study; Precision Medicine; Whole Genome Sequencing/methods; Lipids/genetics; Polymorphism, Single Nucleotide/genetics
ABSTRACT
Attempts to identify and prioritize functional DNA elements in coding and non-coding regions, particularly through use of in silico functional annotation data, continue to increase in popularity. However, specific functional roles can vary widely from one variant to another, making it challenging to summarize different aspects of variant function with a one-dimensional rating. Here we propose multi-dimensional annotation-class integrative estimation (MACIE), an unsupervised multivariate mixed-model framework capable of integrating annotations of diverse origin to assess multi-dimensional functional roles for both coding and non-coding variants. Unlike existing one-dimensional scoring methods, MACIE views variant functionality as a composite attribute encompassing multiple characteristics and estimates the joint posterior functional probabilities of each genomic position. This estimate offers more comprehensive and interpretable information in the presence of multiple aspects of functionality. Applied to a variety of independent coding and non-coding datasets, MACIE demonstrates powerful and robust performance in discriminating between functional and non-functional variants. We also show an application of MACIE to fine-mapping and heritability enrichment analysis by using the lipids GWAS summary statistics data from the European Network for Genetic and Genomic Epidemiology Consortium.
Subjects
Genome, Human; Genome-Wide Association Study; Genome, Human/genetics; Genome-Wide Association Study/methods; Genomics; Humans; Molecular Sequence Annotation; Polymorphism, Single Nucleotide/genetics; Probability
ABSTRACT
Large-scale whole-genome sequencing studies have enabled analysis of noncoding rare-variant (RV) associations with complex human diseases and traits. Variant-set analysis is a powerful approach to study RV association. However, existing methods have limited ability in analyzing the noncoding genome. We propose a computationally efficient and robust noncoding RV association detection framework, STAARpipeline, to automatically annotate a whole-genome sequencing study and perform flexible noncoding RV association analysis, including gene-centric analysis and fixed window-based and dynamic window-based non-gene-centric analysis by incorporating variant functional annotations. In gene-centric analysis, STAARpipeline uses STAAR to group noncoding variants based on functional categories of genes and incorporate multiple functional annotations. In non-gene-centric analysis, STAARpipeline uses SCANG-STAAR to incorporate dynamic window sizes and multiple functional annotations. We apply STAARpipeline to identify noncoding RV sets associated with four lipid traits in 21,015 discovery samples from the Trans-Omics for Precision Medicine (TOPMed) program and replicate several of them in an additional 9,123 TOPMed samples. We also analyze five non-lipid TOPMed traits.
Subjects
Genome-Wide Association Study; Genome; Humans; Genome-Wide Association Study/methods; Whole Genome Sequencing/methods; Phenotype; Genetic Variation
ABSTRACT
Large biobank-scale whole genome sequencing (WGS) studies are rapidly identifying a multitude of coding and non-coding variants. They provide an unprecedented resource for illuminating the genetic basis of human diseases. Variant functional annotations play a critical role in WGS analysis, result interpretation, and prioritization of disease- or trait-associated causal variants. Existing functional annotation databases have limited scope to perform online queries and functionally annotate the genotype data of large biobank-scale WGS studies. We develop the Functional Annotation of Variants Online Resources (FAVOR) to meet these pressing needs. FAVOR provides a comprehensive multi-faceted variant functional annotation online portal that summarizes and visualizes findings of all possible nine billion single nucleotide variants (SNVs) across the genome. It allows for rapid variant-, gene- and region-level queries of variant functional annotations. FAVOR integrates variant functional information from multiple sources to describe the functional characteristics of variants and facilitates prioritizing plausible causal variants influencing human phenotypes. Furthermore, we provide a scalable annotation tool, FAVORannotator, to functionally annotate large-scale WGS studies and efficiently store the genotype and their variant functional annotation data in a single file using the annotated Genomic Data Structure (aGDS) format, making downstream analysis more convenient. FAVOR and FAVORannotator are available at https://favor.genohub.org.
Subjects
Genome, Human; Software; Humans; Molecular Sequence Annotation; Genomics; Genotype; Genetic Variation
ABSTRACT
SUMMARY: We developed the variant-Set Test for Association using Annotation infoRmation (STAAR) workflow description language (WDL) workflow to facilitate the analysis of rare variants in whole genome sequencing association studies. The open-access STAAR workflow written in the WDL allows a user to perform rare variant testing for both gene-centric and genetic region approaches, enabling genome-wide, candidate and conditional analyses. It incorporates functional annotations into the workflow as introduced in the STAAR method in order to boost the rare variant analysis power. This tool was specifically developed and optimized to be implemented on cloud-based platforms such as BioData Catalyst Powered by Terra. It provides easy-to-use functionality for rare variant analysis that can be incorporated into an exhaustive whole genome sequencing analysis pipeline. AVAILABILITY AND IMPLEMENTATION: The workflow is freely available from https://dockstore.org/workflows/github.com/sheilagaynor/STAAR_workflow. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Cloud Computing; Software; Workflow; Genome; Genome-Wide Association Study
ABSTRACT
Clinical trial results have recently demonstrated that inhibiting inflammation by targeting the interleukin-1β pathway can offer a significant reduction in lung cancer incidence and mortality, highlighting a pressing and unmet need to understand the benefits of inflammation-focused lung cancer therapies at the genetic level. While numerous genome-wide association studies (GWAS) have explored the genetic etiology of lung cancer, there remains a large gap between the type of information that may be gleaned from an association study and the depth of understanding necessary to explain and drive translational findings. Thus, in this study we jointly model and integrate extensive multiomics data sources, utilizing a total of 40 genome-wide functional annotations that augment previously published results from the International Lung Cancer Consortium (ILCCO) GWAS, to prioritize and characterize single nucleotide polymorphisms (SNPs) that increase risk of squamous cell lung cancer through the inflammatory and immune responses. Our work bridges the gap between correlative analysis and translational follow-up research, refining GWAS association measures in an interpretable and systematic manner. In particular, reanalysis of the ILCCO data highlights the impact of highly associated SNPs from nuclear factor-κB signaling pathway genes as well as major histocompatibility complex mediated variation in immune responses. One consequence of prioritizing likely functional SNPs is the pruning of variants that might be selected for follow-up work by over an order of magnitude, from potentially tens of thousands to hundreds. The strategies we introduce provide informative and interpretable approaches for incorporating extensive genome-wide annotation data in analysis of genetic association studies.
Subjects
Genome-Wide Association Study; Lung Neoplasms; Epithelial Cells; Genetic Predisposition to Disease; Humans; Inflammation/genetics; Lung Neoplasms/genetics; Models, Genetic; Polymorphism, Single Nucleotide
ABSTRACT
Whole-genome sequencing (WGS) studies are being widely conducted in order to identify rare variants associated with human diseases and disease-related traits. Classical single-marker association analyses for rare variants have limited power, and variant-set-based analyses are commonly used by researchers for analyzing rare variants. However, existing variant-set-based approaches need to pre-specify genetic regions for analysis; hence, they are not directly applicable to WGS data because of the large number of intergenic and intron regions that consist of a massive number of non-coding variants. The commonly used sliding-window method requires the pre-specification of fixed window sizes, which are often unknown a priori, are difficult to specify in practice, and are subject to limitations given that the sizes of genetic-association regions are likely to vary across the genome and phenotypes. We propose a computationally efficient and dynamic scan-statistic method (Scan the Genome [SCANG]) for analyzing WGS data; this method flexibly detects the sizes and the locations of rare-variant association regions without the need to specify a fixed window size in advance. The proposed method controls for the genome-wise type I error rate and accounts for the linkage disequilibrium among genetic variants. It allows the detected sizes of rare-variant association regions to vary across the genome. Through extensive simulated studies that consider a wide variety of scenarios, we show that SCANG substantially outperforms several alternative methods for detecting rare-variant associations while controlling for the genome-wise type I error rates. We illustrate SCANG by analyzing the WGS lipids data from the Atherosclerosis Risk in Communities (ARIC) study.
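The contrast between a fixed window size and a range of window sizes can be shown with a toy scan. This is not the SCANG statistic: it assumes one hypothetical association score per variant and uses a simple normalised sum, with no type I error control and no adjustment for linkage disequilibrium.

```python
def best_window(scores, min_size, max_size):
    """Return (start, end, statistic) of the highest-scoring window.

    Every window size from min_size to max_size (in variants) is scanned.
    The statistic is sum(scores[start:end]) / sqrt(window size), a toy
    choice made only for illustration.
    """
    prefix = [0.0]
    for s in scores:
        prefix.append(prefix[-1] + s)
    best = None
    for size in range(min_size, max_size + 1):
        for start in range(len(scores) - size + 1):
            total = prefix[start + size] - prefix[start]
            stat = total / size ** 0.5
            if best is None or stat > best[2]:
                best = (start, start + size, stat)
    return best

# Hypothetical per-variant scores with signal in positions 2 to 4.
scores = [0, 0, 3, 3, 3, 0, 0, 0]

fixed = best_window(scores, 2, 2)    # window size fixed at 2 variants
dynamic = best_window(scores, 2, 5)  # any size from 2 to 5 variants
print(fixed[:2], dynamic[:2])
```

With the size fixed at 2, only part of the signal region can be covered; scanning several sizes recovers the full three-variant region in this example.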
Subjects
Algorithms; Computational Biology/methods; Genetic Variation; Genome, Human; Genome-Wide Association Study; Whole Genome Sequencing/methods; Humans; Linkage Disequilibrium; Models, Genetic
ABSTRACT
A growing number of studies clearly demonstrate a substantial link between metabolic dysfunction and the risk of Alzheimer's disease (AD), especially glucose-related dysfunction; one hypothesis for this comorbidity is the presence of a common genetic etiology. We conducted a large-scale cross-trait GWAS to investigate the genetic overlap between AD and ten metabolic traits. Among all the metabolic traits, fasting glucose, fasting insulin and HDL were found to be genetically associated with AD. Local genetic covariance analysis found that the 19q13 region had strong local genetic correlation between AD and T2D (P = 6.78 × 10⁻²²), LDL (P = 1.74 × 10⁻²⁵³) and HDL (P = 7.94 × 10⁻¹⁸). Cross-trait meta-analysis identified 4 loci that were associated with AD and fasting glucose, 3 loci that were associated with AD and fasting insulin, and 20 loci that were associated with AD and HDL (Pmeta < 1.6 × 10⁻⁸, single trait P < 0.05). Functional analysis revealed that the shared genes are enriched in amyloid metabolic process, lipoprotein remodeling and other related biological pathways, and also in pancreas, liver, blood and other tissues. Our work identifies common genetic architectures shared between AD and fasting glucose, fasting insulin and HDL, and sheds light on molecular mechanisms underlying the association between metabolic dysregulation and AD.
Subjects
Alzheimer Disease/diagnosis; Alzheimer Disease/genetics; Energy Metabolism/genetics; Genetic Predisposition to Disease; Genome-Wide Association Study; Quantitative Trait, Heritable; Alzheimer Disease/metabolism; Blood Glucose; Fasting; Genetic Association Studies; Humans; Insulin/blood; Linkage Disequilibrium; Metabolic Networks and Pathways; Phenotype; Quantitative Trait Loci
ABSTRACT
BACKGROUND: A growing number of studies clearly demonstrate a substantial association between chronic obstructive pulmonary disease (COPD) and cardiovascular diseases (CVD), although little is known about the shared genetics that contribute to this association. METHODS: We conducted a large-scale cross-trait genome-wide association study to investigate genetic overlap between COPD (Ncase = 12,550, Ncontrol = 46,368) from the International COPD Genetics Consortium and four primary cardiac traits: resting heart rate (RHR) (N = 458,969), high blood pressure (HBP) (Ncase = 144,793, Ncontrol = 313,761), coronary artery disease (CAD) (Ncase = 60,801, Ncontrol = 123,504), and stroke (Ncase = 40,585, Ncontrol = 406,111) from UK Biobank, CARDIoGRAMplusC4D Consortium, and International Stroke Genetics Consortium data. RESULTS: At a genome-wide level, RHR and HBP had modest genetic correlation with COPD, and CAD showed borderline evidence of genetic correlation with COPD. We found evidence of local genetic correlation in particular regions of the genome. Cross-trait meta-analysis of COPD identified 21 loci jointly associated with RHR, 22 loci with HBP, and 3 loci with CAD. Functional analysis revealed that shared genes were enriched in smoking-related pathways and in cardiovascular, nervous, and immune system tissues. An examination of smoking-related genetic variants identified SNPs located in the 15q25.1 region associated with cigarettes per day, with effects on RHR and CAD. A Mendelian randomization analysis showed a significant positive causal effect of COPD on RHR (causal estimate = 0.1374, P = 0.008). CONCLUSION: In a set of large-scale GWAS, we identify evidence of shared genetics between COPD and cardiac traits.
Subjects
Cardiovascular Diseases/genetics; Genetic Variation/genetics; Genome-Wide Association Study/methods; Polymorphism, Single Nucleotide/genetics; Pulmonary Disease, Chronic Obstructive/genetics; Cardiovascular Diseases/diagnosis; Cardiovascular Diseases/epidemiology; Databases, Genetic/trends; Genetic Predisposition to Disease/epidemiology; Genetic Predisposition to Disease/genetics; Humans; Pulmonary Disease, Chronic Obstructive/diagnosis; Pulmonary Disease, Chronic Obstructive/epidemiology; Quantitative Trait, Heritable
ABSTRACT
BACKGROUND: China and other developing countries in Asia follow similar economic growth patterns described by the flying geese (FG) model, which explains the "catching-up" process of industrialization in latecomer economies. Japan, newly industrialized economies, and China have followed this path, with similar economic development trajectories. Based on the FG model, we postulated a "flying S" hypothesis stating that if a country is located within an FG region and its energy matrix is relatively constant, its per capita CO2 emission curve will mirror that of "leading geese" countries in the same FG group. METHOD: Historical CO2 emissions data were obtained from literature review and national reports and were calculated using bottom-up methods. A sigmoid-shaped, non-linear mixed effect model was applied to examine ex post data with 1000 simulated predictions to construct 95% empirical bands from these fits. By multiplying by estimated population, we predicted total emissions of selected FG countries. RESULTS: Per capita CO2 emissions from the same FG group mirror each other, especially among second and third industrial sectors. We estimated an annual 18,252.24 million tons of CO2 emissions (MtCO2) (95% CI = 9458.88-23,972.88) in China and 8281.76 MtCO2 (95% CI = 2765.68-14,959.12) in India in 2030. CONCLUSION: This study bridges the macroeconomic FG paradigm to study climate change and proposes a "flying S" hypothesis to predict greenhouse gas emissions in East Asia. By applying our theory to empirical data, we provide an alternative framework to predict CO2 emissions in 2030 and beyond.
Subjects
Carbon Dioxide; Carbon; Asia; China; India; Japan
ABSTRACT
Over a decade of genome-wide association studies has made great strides toward the detection of genes and genetic mechanisms underlying complex traits. However, the majority of associated loci reside in non-coding regions that are, in general, functionally uncharacterized. Now, the availability of large-scale tissue- and cell-type-specific transcriptome and epigenome data enables us to elucidate how non-coding genetic variants can affect gene expression and are associated with phenotypic changes. Here, we provide an overview of this emerging field in human genomics, summarizing available data resources and state-of-the-art analytic methods to facilitate in silico prioritization of non-coding regulatory mutations. We also highlight the limitations of current approaches and discuss the direction of much-needed future research.
Subjects
Gene Regulatory Networks; Genetic Variation; Genome, Human; Genomics; Genetic Association Studies; Genetic Loci; High-Throughput Nucleotide Sequencing; Humans; Mutation
ABSTRACT
The KRAS mutation is the most common oncogenic driver in patients with non-small cell lung cancer (NSCLC). However, a detailed understanding of how self-reported race and/or ethnicity (SIRE), genetically inferred ancestry (GIA), and their interaction affect KRAS mutation is largely unknown. Here, we investigated the associations between SIRE, quantitative GIA, and KRAS mutation and its allele-specific subtypes in a multi-ethnic cohort of 3,918 patients from the Boston Lung Cancer Survival cohort and the Chinese OrigiMed cohort with an independent validation cohort of 1,450 patients with NSCLC. This comprehensive analysis included detailed covariates such as age at diagnosis, sex, clinical stage, cancer histology, and smoking status. We report that SIRE is significantly associated with KRAS mutations, modified by sex, with SIRE-Asian patients showing lower rates of KRAS mutation, transversion substitution, and the allele-specific subtype KRAS G12C compared to SIRE-White patients after adjusting for potential confounders. Moreover, GIA was found to correlate with KRAS mutations, where patients with a higher proportion of European ancestry had an increased risk of KRAS mutations, especially more transition substitutions and KRAS G12D. Notably, among SIRE-White patients, an increase in European ancestry was linked to a higher likelihood of KRAS mutations, whereas an increase in admixed American ancestry was associated with a reduced likelihood, suggesting that quantitative GIA offers additional information beyond SIRE. The association of SIRE, GIA, and their interplay with KRAS driver mutations in NSCLC highlights the importance of incorporating both into population-based cancer research, aiming to refine clinical decision-making processes and mitigate health disparities.
Subjects
Alleles; Carcinoma, Non-Small-Cell Lung; Lung Neoplasms; Mutation; Proto-Oncogene Proteins p21(ras); Humans; Carcinoma, Non-Small-Cell Lung/genetics; Carcinoma, Non-Small-Cell Lung/ethnology; Carcinoma, Non-Small-Cell Lung/pathology; Proto-Oncogene Proteins p21(ras)/genetics; Lung Neoplasms/genetics; Lung Neoplasms/ethnology; Lung Neoplasms/pathology; Male; Female; Middle Aged; Aged; Prevalence; Ethnicity/genetics; Racial Groups/genetics; Genetic Predisposition to Disease
ABSTRACT
The role of rare non-coding variation in complex human phenotypes is still largely unknown. To elucidate the impact of rare variants in regulatory elements, we performed a whole-genome sequencing association analysis for height using 333,100 individuals from three datasets: UK Biobank (N = 200,003), TOPMed (N = 87,652) and All of Us (N = 45,445). We performed rare (minor allele frequency < 0.1%) single-variant and aggregate testing of non-coding variants in regulatory regions based on proximal-regulatory, intergenic-regulatory and deep-intronic annotation. We observed 29 independent variants associated with height at P < 6 × 10⁻¹⁰ after conditioning on previously reported variants, with effect sizes ranging from -7 cm to +4.7 cm. We also identified and replicated non-coding aggregate-based associations proximal to HMGA1 containing variants associated with a 5 cm taller height and of highly-conserved variants in MIR497HG on chromosome 17. We have developed an approach for identifying non-coding rare variants in regulatory regions with large effects from whole-genome sequencing data associated with complex traits.
Subjects
Body Height; Genome-Wide Association Study; Polymorphism, Single Nucleotide; Whole Genome Sequencing; Humans; Body Height/genetics; Male; Female; Gene Frequency; Genome, Human; Genetic Variation; Phenotype
ABSTRACT
Circulating metabolite levels may reflect the state of the human organism in health and disease; however, the genetic architecture of metabolites is not fully understood. We performed a whole-genome sequencing association analysis of both common and rare variants in up to 11,840 multi-ethnic participants from five studies with up to 1666 circulating metabolites. We discovered 1985 novel variant-metabolite associations and validated 761 locus-metabolite associations reported previously. Seventy-nine novel variant-metabolite associations were replicated, including three genetic loci on the X chromosome, demonstrating its involvement in metabolic regulation. Gene-based analysis provided further support for seven metabolite-replicated loci pairs and their biologically plausible genes. Among the novel replicated variant-metabolite pairs, follow-up analyses revealed that 26 metabolites colocalized with 21 tissues, seven metabolite-disease outcome associations were putatively causal, and 7 metabolites might be regulated by plasma protein levels. Our results depict the genetic contribution to circulating metabolite levels, providing additional insights into human disease.
Subjects
Ethnicity; Quantitative Trait Loci; Humans; Ethnicity/genetics; Metabolome/genetics; Genome-Wide Association Study; Polymorphism, Single Nucleotide
ABSTRACT
BACKGROUND: Individuals with type 2 diabetes (T2D) have an increased risk of coronary artery disease (CAD), but questions remain about the underlying pathology. Identifying which CAD loci are modified by T2D in the development of subclinical atherosclerosis (coronary artery calcification [CAC], carotid intima-media thickness, or carotid plaque) may improve our understanding of the mechanisms leading to the increased CAD in T2D. METHODS: We compared the common and rare variant associations of known CAD loci from the literature on CAC, carotid intima-media thickness, and carotid plaque in up to 29,670 participants, including up to 24,157 normoglycemic controls and 5513 T2D cases, leveraging whole-genome sequencing data from the Trans-Omics for Precision Medicine program. We included first-order T2D interaction terms in each model to determine whether CAD loci were modified by T2D. The genetic main and interaction effects were assessed using a joint test to determine whether a CAD variant, or gene-based rare variant set, was associated with the respective subclinical atherosclerosis measures; we then determined whether these loci had a significant interaction test. RESULTS: Using a Bonferroni-corrected significance threshold of P < 1.6 × 10⁻⁴, we identified 3 genes (ATP1B1, ARVCF, and LIPG) associated with CAC and 2 genes (ABCG8 and EIF2B2) associated with carotid intima-media thickness and carotid plaque, respectively, through gene-based rare variant set analysis. Both ATP1B1 and ARVCF also had significantly different associations for CAC in T2D cases versus controls. No significant interaction tests were identified through the candidate single-variant analysis. CONCLUSIONS: These results highlight T2D as an important modifier of rare variant associations in CAD loci with CAC.