ABSTRACT
Multiple sclerosis (MS) is an autoimmune disorder where T cells attack neurons in the central nervous system (CNS) leading to demyelination and neurological deficits. A driver of increased MS risk is the soluble form of the interleukin-7 receptor alpha chain gene (sIL7R) produced by alternative splicing of IL7R exon 6. Here, we identified the RNA helicase DDX39B as a potent activator of this exon and consequently a repressor of sIL7R, and we found strong genetic association of DDX39B with MS risk. Indeed, we showed that a genetic variant in the 5' UTR of DDX39B reduces translation of DDX39B mRNAs and increases MS risk. Importantly, this DDX39B variant showed strong genetic and functional epistasis with allelic variants in IL7R exon 6. This study establishes the occurrence of biological epistasis in humans and provides mechanistic insight into the regulation of IL7R exon 6 splicing and its impact on MS risk.
Subject(s)
DEAD-box RNA Helicases/metabolism , Epistasis, Genetic , Interleukin-7 Receptor alpha Subunit/genetics , RNA Splicing , DEAD-box RNA Helicases/genetics , Exons , HeLa Cells , Humans , Multiple Sclerosis/genetics , Protein Biosynthesis , RNA, Small Interfering/metabolism , T-Lymphocytes/immunology
ABSTRACT
Since the identification of sickle cell trait as a heritable form of resistance to malaria, candidate gene studies, linkage analysis paired with sequencing, and genome-wide association (GWA) studies have revealed many examples of genetic resistance and susceptibility to infectious diseases. GWA studies enabled the identification of many common variants associated with small shifts in susceptibility to infectious diseases. This is exemplified by multiple loci associated with leprosy, malaria, HIV, tuberculosis, and coronavirus disease 2019 (COVID-19), which illuminate genetic architecture and implicate pathways underlying pathophysiology. Despite these successes, most of the heritability of infectious diseases remains to be explained. As the field advances, current limitations may be overcome by applying methodological innovations such as cellular GWA studies and phenome-wide association (PheWA) studies as well as by improving methodological rigor with more precise case definitions, deeper phenotyping, increased cohort diversity, and functional validation of candidate loci in the laboratory or human challenge studies.
Subject(s)
COVID-19 , Communicable Diseases , Humans , Genome-Wide Association Study , COVID-19/genetics , Communicable Diseases/genetics , Human Genetics
ABSTRACT
Variability in quantitative traits has clinical, ecological, and evolutionary significance. Most genetic variants identified for complex quantitative traits have a detectable effect only on the mean of the trait. We have developed the mean-variance test (MVtest) to simultaneously model the mean and log-variance of a quantitative trait as functions of genotypes and covariates by using estimating equations. The advantages of MVtest include the facts that it can detect effect modification, that multiple testing can follow conventional thresholds, that it is robust to non-normal outcomes, and that association statistics can be meta-analyzed. In simulations, we show control of type I error of MVtest over several alternatives. We identified 51 and 37 previously unreported associations for effects on blood-pressure variance and mean, respectively, in the UK Biobank. Transcriptome-wide association studies revealed 633 significant unique gene associations with blood-pressure mean variance. MVtest is broadly applicable to studies of complex quantitative traits and provides an important opportunity to detect novel loci.
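As a rough sketch of what jointly modelling the mean and log-variance can look like, the equations below use an assumed notation that is not taken from the abstract: trait value y_i, genotype g_i, covariate vector x_i, and coefficient names chosen here for illustration. MVtest's exact parameterization and estimating equations are not given in the abstract.

```latex
\begin{aligned}
\mathbb{E}[y_i \mid g_i, \mathbf{x}_i] &= \mu_i = \beta_0 + \beta_g\, g_i + \mathbf{x}_i^{\top}\boldsymbol{\beta}_x, \\
\log \operatorname{Var}(y_i \mid g_i, \mathbf{x}_i) &= \log \sigma_i^2 = \gamma_0 + \gamma_g\, g_i + \mathbf{x}_i^{\top}\boldsymbol{\gamma}_x .
\end{aligned}
```

Under this sketch, a nonzero \beta_g corresponds to a genotype effect on the trait mean and a nonzero \gamma_g to a genotype effect on trait variability.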
Subject(s)
Blood Pressure , Genome-Wide Association Study , Quantitative Trait Loci , Humans , Blood Pressure/genetics , Polymorphism, Single Nucleotide , Models, Genetic , Genotype , Genetic Variation , Computer Simulation , Phenotype
ABSTRACT
Integrative genetic association methods have shown great promise in post-GWAS (genome-wide association study) analyses, in which one of the most challenging tasks is identifying putative causal genes and uncovering molecular mechanisms of complex traits. Recent studies suggest that prevailing computational approaches, including transcriptome-wide association studies (TWASs) and colocalization analysis, are individually imperfect, but their joint usage can yield robust and powerful inference results. This paper presents INTACT, a computational framework to integrate probabilistic evidence from these distinct types of analyses and implicate putative causal genes. This procedure is flexible and can work with a wide range of existing integrative analysis approaches. It has the unique ability to quantify the uncertainty of implicated genes, enabling rigorous control of false-positive discoveries. Taking advantage of this highly desirable feature, we further propose an efficient algorithm, INTACT-GSE, for gene set enrichment analysis based on the integrated probabilistic evidence. We examine the proposed computational methods and illustrate their improved performance over the existing approaches through simulation studies. We apply the proposed methods to analyze the multi-tissue eQTL data from the GTEx project and eight large-scale complex- and molecular-trait GWAS datasets from multiple consortia and the UK Biobank. Overall, we find that the proposed methods markedly improve the existing putative gene implication methods and are particularly advantageous in evaluating and identifying key gene sets and biological pathways underlying complex traits.
Subject(s)
Genome-Wide Association Study , Transcriptome , Humans , Transcriptome/genetics , Genome-Wide Association Study/methods , Multifactorial Inheritance/genetics , Quantitative Trait Loci/genetics , Computer Simulation , Polymorphism, Single Nucleotide/genetics , Genetic Predisposition to Disease
ABSTRACT
Variants in cis-regulatory elements link the noncoding genome to human pathology; however, detailed analytic tools for understanding the association between cell-level brain pathology and noncoding variants are lacking. CWAS-Plus, adapted from a Python package for category-wide association testing (CWAS), enhances noncoding variant analysis by integrating both whole-genome sequencing (WGS) and user-provided functional data. With simplified parameter settings and an efficient multiple testing correction method, CWAS-Plus conducts the CWAS workflow 50 times faster than CWAS, making it more accessible and user-friendly for researchers. Here, we used a single-nuclei assay for transposase-accessible chromatin with sequencing to facilitate CWAS-guided noncoding variant analysis at cell-type-specific enhancers and promoters. Examining autism spectrum disorder WGS data (n = 7280), CWAS-Plus identified noncoding de novo variant associations in transcription factor binding sites within conserved loci. Independently, in Alzheimer's disease WGS data (n = 1087), CWAS-Plus detected rare noncoding variant associations in microglia-specific regulatory elements. These findings highlight CWAS-Plus's utility in genomic disorders and scalability for processing large-scale WGS data and in multiple-testing corrections. CWAS-Plus and its user manual are available at https://github.com/joonan-lab/cwas/ and https://cwas-plus.readthedocs.io/en/latest/, respectively.
Subject(s)
Whole Genome Sequencing , Humans , Whole Genome Sequencing/methods , Alzheimer Disease/genetics , Genome-Wide Association Study/methods , Autism Spectrum Disorder/genetics , Genetic Variation , Software , Chromatin/genetics , Chromatin/metabolism , Genome, Human
ABSTRACT
Genome-wide association study (GWAS) methods have identified individual single-nucleotide polymorphisms (SNPs) significantly associated with specific phenotypes. Nonetheless, many complex diseases are polygenic and are controlled by multiple genetic variants that are usually non-linearly dependent. These genetic variants have weaker marginal effects and remain undetected in GWAS analysis. Kernel-based tests (KBT), which evaluate the joint effect of a group of genetic variants, are therefore critical for complex disease analysis. However, the choice of kernel function in KBT can significantly influence type I error control and power, and selecting the optimal kernel remains a statistically challenging task. The few existing methods suffer from inflated type I errors, limited scalability, inferior power, or ambiguous conclusions. Here, we present a new Bayesian framework, BayesKAT (https://github.com/wangjr03/BayesKAT), which overcomes these kernel specification issues by selecting the optimal composite kernel adaptively from the data while testing genetic associations simultaneously. Furthermore, BayesKAT implements a scalable computational strategy to boost its applicability, especially for high-dimensional cases where other methods become less effective. Based on a series of performance comparisons using both simulated and real large-scale genetics data, BayesKAT outperforms the available methods in detecting complex group-level associations and controlling type I errors simultaneously. Applied to a variety of groups of functionally related genetic variants based on biological pathways, co-expression gene modules, and protein complexes, BayesKAT deciphers the complex genetic basis and provides mechanistic insights into human diseases.
Subject(s)
Bayes Theorem , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Genome-Wide Association Study/methods , Genetic Predisposition to Disease , Algorithms , Software , Computational Biology/methods , Genetic Association Studies/methods
ABSTRACT
Advances in proteomic assay technologies have significantly increased coverage and throughput, enabling recent increases in the number of large-scale population-based proteomic studies of human plasma and serum. Improvements in multiplexed protein assays have facilitated the quantification of thousands of proteins over a large dynamic range, a key requirement for detecting the lowest-ranging, and potentially the most disease-relevant, blood-circulating proteins. In this perspective, we examine how populational proteomic datasets in conjunction with other concurrent omic measures can be leveraged to better understand the genomic and non-genomic correlates of the soluble proteome, constructing biomarker panels for disease prediction, among others. Mass spectrometry workflows are discussed as they are becoming increasingly competitive with affinity-based array platforms in terms of speed, cost, and proteome coverage due to advances in both instrumentation and workflows. Despite much success, there remain considerable challenges such as orthogonal validation and absolute quantification. We also highlight emergent challenges associated with study design, analytical considerations, and data integration as population-scale studies are run in batches and may involve longitudinal samples collated over many years. Lastly, we take a look at the future of what the nascent next-generation proteomic technologies might provide to the analysis of large sets of blood samples, as well as the difficulties in designing large-scale studies that will likely require participation from multiple and complex funding sources and where data sharing, study designs, and financing must be solved.
Subject(s)
Proteomics , Humans , Biomarkers/blood , Mass Spectrometry/methods , Proteome/metabolism , Proteomics/methods
ABSTRACT
Persistent opioid use after surgery is a common morbidity outcome associated with subsequent opioid use disorder, overdose, and death. While phenotypic associations have been described, genetic associations remain unidentified. Here, we conducted the largest genetic study of persistent opioid use after surgery, comprising ~40,000 non-Hispanic, European-ancestry Michigan Genomics Initiative (MGI) participants (3198 cases and 36,321 surgically exposed controls). Our study primarily focused on the reproducibility and reliability of 72 genetic studies of opioid use disorder phenotypes. Nominal associations (p < 0.05) occurred at 12 of 80 unique (r² < 0.8) signals from these studies. Six occurred in OPRM1 (most significant: rs79704991-T, OR = 1.17, p = 8.7 × 10⁻⁵), with two surviving multiple testing correction. Other associations were rs640561-LRRIQ3 (p = 0.015), rs4680-COMT (p = 0.016), rs9478495 (p = 0.017, intergenic), rs10886472-GRK5 (p = 0.028), rs9291211-SLC30A9/BEND4 (p = 0.043), and rs112068658-KCNN1 (p = 0.048). Two highly referenced genes, OPRD1 and DRD2/ANKK1, had no signals in MGI. Associations at previously identified OPRM1 variants suggest common biology between persistent opioid use and opioid use disorder, further demonstrating connections between opioid dependence and addiction phenotypes. Lack of significant associations at other variants challenges previous studies' reliability.
ABSTRACT
Transcriptome-wide association studies and colocalization analysis are popular computational approaches for integrating genetic-association data from molecular and complex traits. They show the unique ability to go beyond variant-level genetic-association evidence and implicate critical functional units, e.g., genes, in disease etiology. However, in practice, when the two approaches are applied to the same molecular and complex-trait data, the inference results can be markedly different. This paper systematically investigates the inferential reproducibility between the two approaches through theoretical derivation, numerical experiments, and analyses of four complex trait GWAS and GTEx eQTL data. We identify two classes of inconsistent inference results. We find that the first class of inconsistent results (i.e., genes with strong colocalization but weak transcriptome-wide association study [TWAS] signals) might suggest an interesting biological phenomenon, i.e., horizontal pleiotropy; thus, the two approaches are truly complementary. The inconsistency in the second class (i.e., genes with weak colocalization but strong TWAS signals) can be understood and effectively reconciled. To this end, we propose a computational approach for locus-level colocalization analysis. We demonstrate that the joint TWAS and locus-level colocalization analysis improves specificity and sensitivity for implicating biologically relevant genes.
Subject(s)
Genome-Wide Association Study , Transcriptome , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Humans , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Reproducibility of Results , Transcriptome/genetics
ABSTRACT
Identifying genotype-by-environment interaction (GEI) is challenging because the GEI analysis generally has low power. Large-scale consortium-based studies are ultimately needed to achieve adequate power for identifying GEI. We introduce Multi-Trait Analysis of Gene-Environment Interactions (MTAGEI), a powerful, robust, and computationally efficient framework to test gene-environment interactions on multiple traits in large data sets, such as the UK Biobank (UKB). To facilitate the meta-analysis of GEI studies in a consortium, MTAGEI efficiently generates summary statistics of genetic associations for multiple traits under different environmental conditions and integrates the summary statistics for GEI analysis. MTAGEI enhances the power of GEI analysis by aggregating GEI signals across multiple traits and variants that would otherwise be difficult to detect individually. MTAGEI achieves robustness by combining complementary tests under a wide spectrum of genetic architectures. We demonstrate the advantages of MTAGEI over existing single-trait-based GEI tests through extensive simulation studies and the analysis of the whole exome sequencing data from the UKB.
Subject(s)
Gene-Environment Interaction , Genome-Wide Association Study , Humans , Phenotype , Computer Simulation
ABSTRACT
Over the past years, progress in next-generation sequencing technologies and bioinformatics has sparked a surge in association studies. In particular, genome-wide association studies (GWASs) have demonstrated their effectiveness in identifying disease associations with common genetic variants. Yet, rare variants can contribute to additional disease risk or trait heterogeneity. Because GWASs are underpowered for detecting association with such variants, numerous statistical methods have been recently proposed. Aggregation tests collapse multiple rare variants within a genetic region (e.g. gene, gene set, genomic locus) to test for association. An increasing number of studies using such methods have successfully identified trait-associated rare variants and led to a better understanding of the underlying disease mechanism. In this review, we compare existing aggregation tests, their statistical features and scope of application, splitting them into the five classical classes: burden, adaptive burden, variance-component, omnibus and other. Finally, we describe some limitations of current aggregation tests, highlighting potential directions for further investigation.
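To make the idea of collapsing rare variants concrete, here is a minimal sketch of one very simple member of the burden class: each individual is reduced to a carrier indicator (at least one rare allele in the region), and carrier counts in cases and controls are compared with Fisher's exact test. The function names and the toy genotypes are hypothetical and are not taken from the review, which covers far more refined tests.

```python
from math import comb


def carrier_indicator(genotypes):
    """Collapse a region: 1 if any rare-variant allele count is > 0, else 0."""
    return int(any(g > 0 for g in genotypes))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers among cases.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical toy data: rows are individuals, columns are rare variants in one gene.
cases = [[0, 1, 0], [1, 0, 0], [0, 0, 2], [0, 0, 0]]
controls = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]

a = sum(carrier_indicator(g) for g in cases)      # carrier cases
b = len(cases) - a                                # non-carrier cases
c = sum(carrier_indicator(g) for g in controls)   # carrier controls
d = len(controls) - c                             # non-carrier controls

p_value = fisher_exact_two_sided(a, b, c, d)
print(f"carriers: cases={a}, controls={c}, p={p_value:.4f}")
```

Collapsing to a single indicator assumes that all rare variants in the region act in the same direction, which is the main weakness that motivates the adaptive burden and variance-component classes.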
Subject(s)
Genetic Variation , Genome-Wide Association Study , Humans , Phenotype , Case-Control Studies , Models, Genetic
ABSTRACT
Population stratification (PS) is one major source of confounding in both single nucleotide polymorphism (SNP) and haplotype association studies. To address PS, principal component regression (PCR) and linear mixed model (LMM) are the current standards for SNP associations, which are also commonly borrowed for haplotype studies. However, the underfitting and overfitting problems introduced by PCR and LMM, respectively, have yet to be addressed. Furthermore, there have been only a few theoretical approaches proposed to address PS specifically for haplotypes. In this paper, we propose a new method under the Bayesian LASSO framework, QBLstrat, to account for PS in identifying rare and common haplotypes associated with a continuous trait of interest. QBLstrat utilizes a large number of principal components (PCs) with appropriate priors to sufficiently correct for PS, while shrinking the estimates of unassociated haplotypes and PCs. We compare the performance of QBLstrat with the Bayesian counterparts of PCR and LMM and a current method, haplo.stats. Extensive simulation studies and real data analyses show that QBLstrat is superior in controlling false positives while maintaining competitive power for identifying true positives under PS.
Subject(s)
Models, Genetic , Polymorphism, Single Nucleotide , Haplotypes , Bayes Theorem , Phenotype , Genome-Wide Association Study
ABSTRACT
BACKGROUND: Adolescent idiopathic scoliosis (AIS), the predominant genetically influenced scoliosis, results in spinal deformities without vertebral malformations. However, the molecular aetiology of AIS remains unclear. METHODS: Using genome/exome sequencing, we studied 368 patients with severe AIS (Cobb angle >40°) and 3794 controls from a Han Chinese cohort. We performed gene-based and pathway-based weighted rare variant association tests to assess the mutational burden of genes and established biological pathways. Differential expression analysis of muscle tissues from 14 patients with AIS and 15 controls served as validation. RESULTS: SLC16A8, a lactate transporter linked to retinal glucose metabolism, was identified as a novel severe AIS-associated gene (p=3.08E-06, false discovery rate=0.009). Most AIS cases with deleterious SLC16A8 variants demonstrated early-onset high myopia preceding scoliosis. The pathway-based burden test also revealed a significant enrichment in multiple carbohydrate metabolism pathways, especially galactose metabolism. Patients with deleterious variants in these genes demonstrated a significantly larger spinal curve. Genes related to catabolic processes and nutrient response showed divergent expression between AIS cases and controls, reinforcing our genomic findings. CONCLUSION: This study uncovers the pivotal role of genetic variants in carbohydrate metabolism in the development of AIS, unveiling new insights into its aetiology and potential treatment.
Subject(s)
Carbohydrate Metabolism , Scoliosis , Humans , Scoliosis/genetics , Scoliosis/pathology , Adolescent , Female , Male , Carbohydrate Metabolism/genetics , Genetic Predisposition to Disease , Child , Exome Sequencing , Monocarboxylic Acid Transporters/genetics , Case-Control Studies , Genetic Association Studies , Mutation
ABSTRACT
BACKGROUND: Colorectal cancer (CRC) has the third-highest incidence among cancers and is the leading cause of cancer mortality worldwide. Metastasis to distal organs is the major cause of cancer mortality. However, the underlying genetic factors are unclear. This study aimed to identify metastasis-relevant genes and pathways for better management of metastasis-prone patients. METHODS: A case-case genome-wide association study comprising 2677 sporadic Chinese CRC cases (1282 metastasis-positive vs 1395 metastasis-negative) was performed using the Human SNP6 microarray platform and analysed with the correlation/trend test based on the additive model. SNP variants with association testing -log10 p value ≥5 were imported into Functional Mapping and Annotation (FUMA) for functional annotation. RESULTS: Glycolysis was uncovered as the top hallmark gene set. Transcripts from two of the five genes profiled, hematopoietic substrate 1 associated protein X 1 (HAX1) and hyaluronan-mediated motility receptor (HMMR), were significantly upregulated in the metastasis-positive tumours. In contrast to disease-risk variants, HAX1 appeared to act synergistically with HMMR in significantly impacting metastasis-free survival. Examining the subtype datasets with FUMA and Ingenuity Pathway Analysis (IPA) identified distinct pathways demonstrating sexual dimorphism in CRC metastasis. CONCLUSIONS: Combining genome-wide association testing with in silico functional annotation and wet-bench validation identified metastasis-relevant genes that could serve as features to develop subtype-specific metastasis-risk signatures for tailored management of patients with stage I-III CRC.
Subject(s)
Colorectal Neoplasms , Genome-Wide Association Study , Humans , Genetic Predisposition to Disease , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , Genes, Neoplasm , Polymorphism, Single Nucleotide/genetics , Adaptor Proteins, Signal Transducing/genetics
ABSTRACT
Populations of non-European ancestry are substantially underrepresented in genome-wide association studies (GWAS). As genetic effects can differ between ancestries due to possibly different causal variants or linkage disequilibrium patterns, a meta-analysis that includes GWAS of all populations yields biased estimation in each of the populations, and the bias disproportionately impacts non-European ancestry populations. This is because meta-analysis combines study-specific estimates with inverse variance as the weights, which biases the result towards the studies with the largest sample sizes, typically those of European-ancestry populations. In this paper, we propose two empirical Bayes (EB) estimators to borrow the strength of information across populations while accounting for between-population heterogeneity. Extensive simulation studies show that the proposed EB estimators are largely unbiased and improve efficiency compared to the population-specific estimator. In contrast, even though the meta-analysis estimator has a much smaller variance, it yields significant bias when the genetic effect is heterogeneous across populations. We apply the proposed EB estimators to a large-scale trans-ancestry GWAS of stroke and demonstrate that the EB estimators reduce the variance of the population-specific estimator substantially, with the effect estimates close to the population-specific estimates.
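The pull of inverse-variance weighting towards the largest study can be seen in a small worked example. The sketch below implements the standard fixed-effect inverse-variance estimator; the effect sizes and standard errors are invented for illustration and do not come from the stroke GWAS, and the function name is hypothetical. The proposed EB estimators themselves are not reproduced here because the abstract does not specify them.

```python
def inverse_variance_meta(estimates, std_errors):
    """Fixed-effect meta-analysis: weight each study by 1 / SE^2."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = (1.0 / total) ** 0.5
    shares = [w / total for w in weights]  # fraction of total weight per study
    return pooled, pooled_se, shares


# Hypothetical inputs: a large study (small SE) and a small study (large SE)
# whose true effects differ between populations.
betas = [0.10, 0.30]
ses = [0.01, 0.05]

pooled, pooled_se, shares = inverse_variance_meta(betas, ses)
print(f"pooled beta = {pooled:.4f}, SE = {pooled_se:.4f}")
print(f"weight shares = {[round(s, 3) for s in shares]}")
```

With these made-up inputs the large study carries about 96% of the weight, so the pooled estimate sits near 0.108, far from the smaller study's 0.30. This is the bias the abstract describes when effects are heterogeneous.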
Subject(s)
Genome-Wide Association Study , Models, Genetic , Humans , Bayes Theorem , Computer Simulation , Linkage Disequilibrium
ABSTRACT
In genome-wide association studies (GWAS) for thousands of phenotypes in biobanks, most binary phenotypes have substantially fewer cases than controls. Many widely used approaches for joint analysis of multiple phenotypes produce inflated type I error rates for such extremely unbalanced case-control phenotypes. In this research, we develop a method to jointly analyze multiple unbalanced case-control phenotypes to circumvent this issue. We first group multiple phenotypes into different clusters based on a hierarchical clustering method, then we merge phenotypes in each cluster into a single phenotype. In each cluster, we use the saddlepoint approximation to estimate the p value of an association test between the merged phenotype and a single nucleotide polymorphism (SNP) which eliminates the issue of inflated type I error rate of the test for extremely unbalanced case-control phenotypes. Finally, we use the Cauchy combination method to obtain an integrated p value for all clusters to test the association between multiple phenotypes and a SNP. We use extensive simulation studies to evaluate the performance of the proposed approach. The results show that the proposed approach can control type I error rate very well and is more powerful than other available methods. We also apply the proposed approach to phenotypes in category IX (diseases of the circulatory system) in the UK Biobank. We find that the proposed approach can identify more significant SNPs than the other viable methods we compared with.
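The final combination step can be sketched with the usual form of the Cauchy combination statistic, which maps each p value to a Cauchy variate, takes a weighted average, and converts the result back to a p value. Equal weights are an assumption made here; the abstract does not say how clusters are weighted, and the function name and example p values are hypothetical.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p values with the Cauchy combination statistic.

    Each p value must lie strictly between 0 and 1.
    Weights default to equal (an assumption of this sketch).
    """
    n = len(p_values)
    if weights is None:
        weights = [1.0 / n] * n
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p values must lie strictly between 0 and 1")
    # Weighted sum of Cauchy-transformed p values.
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))
    # Tail probability of a standard Cauchy at the normalised statistic.
    return 0.5 - math.atan(t / sum(weights)) / math.pi


# Hypothetical per-cluster p values for one SNP.
cluster_p = [0.001, 0.20, 0.65]
combined = cauchy_combination(cluster_p)
print(f"combined p = {combined:.5f}")
```

A single p value is returned unchanged by this formula, and the combined value is driven mostly by the smallest inputs, which is why the method suits settings where only a few clusters carry signal.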
Subject(s)
Genome-Wide Association Study , Models, Genetic , Humans , Genome-Wide Association Study/methods , Phenotype , Case-Control Studies , Polymorphism, Single Nucleotide
ABSTRACT
BACKGROUND: Both genetic factors and environmental air pollution contribute to the risk of stroke. However, it is unknown whether the association between air pollution and stroke risk is influenced by the genetic susceptibilities of stroke and its risk factors. METHODS: This prospective cohort study included 40 827 Chinese adults without stroke history. Satellite-based monthly fine particulate matter (PM2.5) estimation at 1-km resolution was used for exposure assessment. Based on 534 identified genetic variants from genome-wide association studies in East Asians, we constructed 6 polygenic risk scores for stroke and its risk factors, including atrial fibrillation, blood pressure, type 2 diabetes, body mass index, and triglyceride. The Cox proportional hazards model was applied to evaluate the hazard ratios and 95% CIs for the associations of PM2.5 and polygenic risk score with incident stroke and the potential effect modifications. RESULTS: Over a median follow-up of 12.06 years, 3147 incident stroke cases were documented. Compared with the lowest quartile of PM2.5 exposure, the hazard ratio (95% CI) for stroke in the highest quartile group was 2.72 (2.42-3.06). Among individuals at high genetic risk, the relative risk of stroke was 57% (1.57; 1.40-1.76) higher than those at low genetic risk. Although no statistically significant interaction was found, participants with both the highest PM2.5 and high genetic risk showed the highest risk of stroke, with ≈4× that of the lowest PM2.5 and low genetic risk group (hazard ratio, 3.55 [95% CI, 2.84-4.44]). Similar upward gradients were observed in the risk of stroke when assessing the joint effects of PM2.5 and genetic risks of blood pressure, type 2 diabetes, body mass index, atrial fibrillation, and triglyceride. CONCLUSIONS: Long-term exposure to PM2.5 was associated with a higher risk of incident stroke across different genetic susceptibilities. Our findings highlighted the great importance of comprehensive assessment of air pollution and genetic risk in the prevention of stroke.
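For readers unfamiliar with polygenic risk scores, a common construction is a weighted sum of risk-allele counts, with per-variant weights taken from published GWAS effect sizes. The sketch below assumes that construction; the abstract does not state how the six scores were weighted, and the function name, weights, and genotypes are invented for illustration.

```python
def polygenic_risk_score(allele_counts, effect_weights):
    """Weighted sum of risk-allele counts (0, 1, or 2 per variant)."""
    if len(allele_counts) != len(effect_weights):
        raise ValueError("one weight is required per variant")
    return sum(w * g for w, g in zip(effect_weights, allele_counts))


# Hypothetical per-variant weights (e.g. log odds ratios) for three variants.
weights = [0.12, 0.05, 0.20]

# Hypothetical genotypes for two participants.
participant_a = [2, 0, 1]
participant_b = [0, 1, 0]

score_a = polygenic_risk_score(participant_a, weights)
score_b = polygenic_risk_score(participant_b, weights)
print(f"PRS A = {score_a:.2f}, PRS B = {score_b:.2f}")
```

In a cohort analysis, scores like these are typically grouped into risk categories (for example by quantile) before being entered into a survival model alongside the exposure.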
Subject(s)
Air Pollutants , Air Pollution , Atrial Fibrillation , Diabetes Mellitus, Type 2 , Stroke , Adult , Humans , Particulate Matter/adverse effects , Particulate Matter/analysis , Prospective Studies , Atrial Fibrillation/complications , Genome-Wide Association Study , Environmental Exposure/adverse effects , Incidence , Stroke/epidemiology , Stroke/genetics , Stroke/chemically induced , Air Pollution/adverse effects , Risk Factors , Genetic Predisposition to Disease , Triglycerides , Air Pollutants/adverse effects
ABSTRACT
BACKGROUND: An increasing number of monogenic conditions underlying stroke are being identified. We explored the possibilities of increasing the diagnostic yield of monogenic stroke in a population under 56 years of age. METHODS: Fifty probands ≤55 years at their first stroke episode were characterized clinically and investigated by whole genome sequencing. Probands had one or more of: (1) one or more first to second degree relatives with stroke under 60 years or same stroke-causing condition/disease; (2) no hypertension, hypercholesterolemia, diabetes, heart disease, or smoking; or (3) either multiple stroke episodes or multiple arterial dissections. Variants with minor allele frequency under 0.01, identified by using our stroke gene panels, were assessed. The stroke subtypes, including large artery atherosclerotic, large artery nonatherosclerotic (tortuosity, dolichoectasia, aneurysm, nonatherosclerotic dissection, or occlusion), cerebral small vessel disease, cardioembolic (arrhythmia, heart defect, or cardiomyopathy), coagulation dysfunctions (venous thrombosis, arterial thrombosis, or bleeding tendency), intracerebral hemorrhage, vascular malformations (cavernoma or arteriovenous malformations), metabolic disorders, or cryptogenic embolic, were used for genotype-phenotype correlation. In a final step, we combined genetic and clinical information to determine if the genetic variant likely was the cause of stroke in the patients. RESULTS: Whole genome sequencing of younger patients with stroke identified 17 clinically matching genetic variants in 15 of 50 (30%) patients, while a stronger clinical correlation with stroke was established in only 6 (12%) of them. Stroke-related genetic variants were identified in 4 of 5 (80%) patients with cardioembolic stroke subtype, 3 of 4 (75%) with intracerebral hemorrhage, 7 of 18 (39%) with cryptogenic embolic stroke, 1 of 6 (17%) with small vessel disease, and 3 of 15 (20%) patients with nonatherosclerotic large artery stroke, including 1 of 11 (9%) with cervical dissection stroke. CONCLUSIONS: Careful clinical interpretation of whole genome data using stroke gene panels can detect monogenic causes of early stroke, allowing individualized follow-up and opening new possibilities for potential treatment.
ABSTRACT
BACKGROUND: Chronic Obstructive Pulmonary Disease (COPD) describes a group of progressive lung diseases causing breathing difficulties. While COPD development typically involves a complex interplay between genetic and environmental factors, genetics play a role in disease susceptibility. This study used genome-wide association studies (GWAS) and polygenic risk score (PRS) to elucidate the genetic basis for COPD in Taiwanese patients. RESULTS: GWAS was performed on a Taiwanese COPD case-control cohort with a sample size of 5,442 cases and 17,681 controls. Additionally, the PRS was calculated and assessed in our target groups. GWAS results indicate that although there were no single nucleotide polymorphisms (SNPs) of genome-wide significance, prominent COPD susceptibility loci on or nearby genes such as WWTR1, EXT1, INTU, MAP3K7CL, MAMDC2, BZW1/CLK1, LINC01197, LINC01894, and CFAP95 (C9orf135) were identified, which had not been reported in previous studies. Thirteen susceptibility loci, such as CHRNA4, AFAP1, and DTWD1, previously reported in other populations were replicated and confirmed to be associated with COPD in Taiwanese populations. The PRS was determined in the target groups using the summary statistics from our base group, yielding an effective association with COPD (odds ratio [OR] 1.09, 95% confidence interval [CI] 1.02-1.17, p = 0.011). Furthermore, replicating a previous lung function trait PRS model in our target group showed a significant association of COPD susceptibility with the PRS of Forced Expiratory Volume in one second (FEV1)/Forced Vital Capacity (FVC) (OR 0.89, 95% CI 0.83-0.95, p = 0.001). CONCLUSIONS: Novel COPD-related genes were identified in the studied Taiwanese population. The PRS model, based on COPD or lung function traits, enables disease risk estimation and enhances prediction before disease onset. These results offer new perspectives on the genetics of COPD and serve as a basis for future research.
Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Pulmonary Disease, Chronic Obstructive , Pulmonary Disease, Chronic Obstructive/genetics , Humans , Taiwan , Male , Female , Aged , Multifactorial Inheritance , Case-Control Studies , Middle Aged , Risk Factors , Genetic Loci , Asian People/genetics , Genetic Risk Score
ABSTRACT
Several factors have given momentum to candidate gene association studies (CGASs), including the generalized use of next-generation sequencing (NGS), growing opportunities for hospital-based research, and the availability of open-source databases and bioinformatics tools. This article summarizes the general principles and analytical methods as a guide to CGASs in today's favorable context.