ABSTRACT
Genetic factors play a fundamental role in disease development. Studying genetic associations with clinical outcomes is critical for understanding disease biology and identifying novel treatment targets. However, genetic variants are often infrequent, making it difficult to examine them one by one. Moreover, clinical outcomes are complex, including patients' survival time and other binary or continuous outcomes such as recurrence and lymph node count, and how to effectively analyze genetic associations with these outcomes remains unclear. In this article, we propose a structured test statistic for testing genetic association with mixed types of survival, binary, and continuous outcomes. The structured test incorporates known biological information on variants while allowing for their heterogeneous effects, and is a powerful strategy for analyzing infrequent genetic factors. Simulation studies show that the proposed test statistic has correct type I error and is highly effective in detecting significant genetic variants. We applied our approach to a uterine corpus endometrial carcinoma study and identified several genetic pathways associated with the clinical outcomes.
ABSTRACT
Despite interest in the joint modeling of multiple functional responses, such as diffusion properties in neuroimaging, robust statistical methods appropriate for this task are lacking. To address this need, we propose a varying coefficient quantile regression model able to handle bivariate functional responses. Our work supports innovative insights into biomedical data by modeling the joint distribution of functional variables over their domains and across clinical covariates. We propose an estimation procedure based on the alternating direction method of multipliers and propagation separation algorithms to estimate varying coefficients using a B-spline basis and an $L_2$ smoothness penalty that encourages interpretability. A simulation study and an application to a real-world neurodevelopmental data set demonstrate the performance of our model and the insights provided by modeling functional fractional anisotropy and mean diffusivity jointly, together with their association with gestational age and sex.
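Quantile regression replaces squared error with the check loss rho_tau(u) = u(tau - 1{u < 0}); minimizing it over a constant yields a tau-th sample quantile. The sketch below illustrates only this loss under hypothetical data. The paper's bivariate model, its ADMM and propagation-separation steps, the B-spline expansion, and the L2 penalty are not reproduced, and the function names are illustrative.

```python
# Minimal sketch of the quantile "check" loss used in quantile regression.
# This is NOT the paper's ADMM / propagation-separation estimator; it only
# illustrates the loss such estimators minimise.


def check_loss(tau: float, u: float) -> float:
    """rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(values, tau, candidates):
    """Return the candidate c minimising sum_i rho_tau(y_i - c)."""
    return min(candidates, key=lambda c: sum(check_loss(tau, y - c) for y in values))


# Hypothetical data with one outlier; at tau = 0.5 the minimiser is the median,
# which, unlike the mean, is not pulled toward the outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
print(best_constant(data, 0.5, data))
```

In a varying coefficient version, each coefficient function would be expanded in a B-spline basis and the summed check loss, plus a roughness penalty on the spline coefficients, minimized jointly.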
Subject(s)
Algorithms , Diffusion Tensor Imaging , Humans , Diffusion Tensor Imaging/methods , Computer Simulation , Neuroimaging
ABSTRACT
We aimed to evaluate differences in dietary factors between young-onset (diagnosed at age <50 years) and older-onset colorectal cancer (CRC). CRC patients diagnosed from 1998 to 2018 and reported to the Puget Sound Surveillance, Epidemiology, and End Results registry were recruited by mail and telephone. Consented patients completed questionnaires assessing demographics, medical history, and CRC risk factors, including dietary factors. We used multivariable logistic regression to calculate adjusted odds ratios (ORs) and 95% confidence intervals (CIs) comparing dietary intake in young-onset vs. older-onset CRC. Analyses included 1,087 young-onset and 2,554 older-onset CRC patients. Compared to older-onset CRC patients, young-onset CRC patients had lower intake of vegetables (highest vs. lowest intake, OR = 0.59; 95% CI: 0.55, 0.64) and fruit (highest vs. lowest intake, OR = 0.94; 95% CI: 0.88, 0.99) and higher intake of processed meat (highest vs. lowest intake, OR = 1.82; 95% CI: 1.11, 2.99) and spicy food (highest vs. lowest intake, OR = 1.69; 95% CI: 1.09, 2.61). There was no statistically significant difference between young- and older-onset CRC patients in red meat consumption. Dietary patterns differed between young- and older-onset CRC; young-onset CRC patients had lower intake of vegetables and fruit and higher intake of processed meat and spicy food.
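Adjusted odds ratios from logistic regression are conventionally reported as exp(beta) with a Wald 95% CI of exp(beta ± 1.96·SE). Assuming the intervals above are Wald intervals (the abstract does not say so), the standard error on the log scale can be backed out from a reported CI and the interval rebuilt. This sketch does that for the processed-meat estimate; the helper names are hypothetical.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def wald_or_ci(beta: float, se: float, z: float = Z_975):
    """Odds ratio and Wald 95% CI from a log-odds coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower: float, upper: float, z: float = Z_975) -> float:
    """Back out SE(log OR) from a reported 95% CI.

    Assumes the CI is symmetric on the log scale, as a Wald interval is.
    """
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Processed meat, highest vs. lowest intake: OR 1.82 (95% CI 1.11, 2.99)
se = se_from_ci(1.11, 2.99)
odds_ratio, lo, hi = wald_or_ci(math.log(1.82), se)
print(f"SE(log OR) = {se:.3f}; rebuilt CI = ({lo:.2f}, {hi:.2f})")
```

Small mismatches between the rebuilt and reported limits reflect rounding of the published OR and CI.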
Subject(s)
Colorectal Neoplasms , Dietary Patterns , Humans , Fruit , Meat , Odds Ratio , Vegetables , Colorectal Neoplasms/epidemiology , Colorectal Neoplasms/etiology
ABSTRACT
Accurate colorectal cancer (CRC) risk prediction models are critical for identifying individuals at low and high risk of developing CRC, who can then be offered targeted screening and interventions to address their risk (if they are in a high-risk group) or spared unnecessary screening and interventions (if they are in a low-risk group). As thousands of genetic variants likely contribute to CRC risk, it is clinically important to investigate whether these variants can be used jointly for CRC risk prediction. In this paper, we derived and compared different approaches to generating predictive polygenic risk scores (PRS) from genome-wide association studies (GWASs) including 55,105 CRC-affected case subjects and 65,079 control subjects of European ancestry. We built the PRS in three ways, using (1) 140 previously identified and validated CRC loci; (2) SNP selection based on linkage disequilibrium (LD) clumping followed by machine-learning approaches; and (3) LDpred, a Bayesian approach for genome-wide risk prediction. We tested the PRS in an independent cohort of 101,987 individuals with 1,699 CRC-affected case subjects. The discriminatory accuracy, calculated as the age- and sex-adjusted area under the receiver operating characteristic curve (AUC), was highest for the LDpred-derived PRS (AUC = 0.654), which included nearly 1.2 M genetic variants (assuming a proportion of causal genetic variants for CRC of 0.003), whereas the PRS of the 140 known variants identified from GWASs had the lowest AUC (AUC = 0.629). Based on the LDpred-derived PRS, we were able to identify 30% of individuals without a family history as having a risk for CRC similar to that of individuals with a family history of CRC, whereas the PRS based on known GWAS variants identified only the top 10% as having a similar relative risk.
About 90% of these individuals have no family history and would have been considered average risk under current screening guidelines, but might benefit from earlier screening. The developed PRS offers a basis for risk-stratified CRC screening and other targeted interventions.
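A PRS is a weighted sum of effect-allele dosages, and the AUC equals the probability that a randomly chosen case scores higher than a randomly chosen control (the Mann-Whitney estimate, ties counted as one half). The sketch below computes both under hypothetical weights and genotypes. It is unadjusted, whereas the AUCs reported above are age- and sex-adjusted, and it does not reproduce how the weights were derived (known loci, clumping plus machine learning, or LDpred).

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """PRS = sum_j w_j * g_j, with g_j the effect-allele dosage (0..2) of variant j."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(g * w for g, w in zip(dosages, weights))


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Mann-Whitney AUC: P(case score > control score), ties counted as 1/2."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


weights = [0.12, -0.05, 0.30]  # hypothetical per-allele log-OR weights
case_scores = [polygenic_risk_score(g, weights) for g in ([2, 0, 1], [1, 1, 2])]
control_scores = [polygenic_risk_score(g, weights) for g in ([0, 2, 0], [1, 0, 1], [0, 1, 1])]
print(auc(case_scores, control_scores))
```

The pairwise loop is quadratic in sample size; for a cohort of about 100,000 people a rank-based formula would be used instead.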
Subject(s)
Colorectal Neoplasms/epidemiology , Genetic Predisposition to Disease , Genome, Human/genetics , Risk Assessment , Aged , Asian People/genetics , Bayes Theorem , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , Female , Genome-Wide Association Study , Humans , Male , Middle Aged , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics , Risk Factors
ABSTRACT
Genome-wide association studies (GWAS) have successfully identified tens of thousands of genetic variants associated with various phenotypes, but together these explain only a fraction of heritability, suggesting that many variants have yet to be discovered. Recently, it has been recognized that incorporating functional information on genetic variants can improve power to identify novel loci. For example, S-PrediXcan and TWAS test the association of predicted gene expression with phenotypes based on GWAS summary statistics, leveraging information on the genetic regulation of gene expression, and have found many novel loci. However, as genetic variants may affect more than one gene and act through different mechanisms, these methods likely capture only part of the total effects of these variants. In this paper, we propose a summary statistics-based mixed effects score test (sMiST) that tests the total effect of variants, comprising both the effect mediated through genetically predicted gene expression (as in S-PrediXcan and TWAS) and the direct effects of individual variants. It allows for multiple functional annotations and multiple genetically predicted mediators. It can also perform conditional association analysis while adjusting for other genetic variants (e.g., known loci for the phenotype). Extensive simulation and real data analyses demonstrate that sMiST yields p-values that agree well with those obtained from individual-level data, but with substantially improved computational speed. Importantly, sMiST can be applied broadly to GWAS, as only summary statistics of genetic variant associations are required. We apply sMiST to a large-scale GWAS of colorectal cancer using summary statistics from ~120,000 study participants and gene expression data from the Genotype-Tissue Expression (GTEx) project. We identify several novel and secondary independent genetic loci.
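The mediator component that sMiST shares with S-PrediXcan can be illustrated with the S-PrediXcan approximation z_g ≈ Σ_l w_l (σ_l / σ_g) z_l, where w_l are expression-prediction weights, z_l the GWAS z-scores, σ_l the SNP dosage standard deviations, and σ_g² = wᵀΓw the variance of predicted expression under the reference-panel LD covariance Γ. This sketch is not sMiST itself, which additionally tests direct variant effects through a mixed effects score test; the function and argument names and the example numbers are hypothetical.

```python
import math
from typing import Sequence


def spredixcan_z(weights: Sequence[float],
                 snp_z: Sequence[float],
                 snp_cov: Sequence[Sequence[float]]) -> float:
    """Approximate gene-level z-score from GWAS summary statistics.

    weights : prediction weights w_l of each SNP for the gene's expression
    snp_z   : GWAS z-scores (beta_l / se_l) for the same SNPs
    snp_cov : covariance matrix of SNP dosages from a reference panel (LD)
    """
    n = len(weights)
    # Variance of genetically predicted expression: sigma_g^2 = w' Gamma w
    var_g = sum(weights[i] * weights[j] * snp_cov[i][j]
                for i in range(n) for j in range(n))
    sd_g = math.sqrt(var_g)
    # z_g ~= sum_l w_l * (sigma_l / sigma_g) * z_l
    return sum(weights[l] * math.sqrt(snp_cov[l][l]) / sd_g * snp_z[l]
               for l in range(n))


# Two hypothetical SNPs in modest LD
print(spredixcan_z([0.3, -0.1], [4.2, -1.5], [[0.40, 0.10], [0.10, 0.25]]))
```

With a single SNP and a positive weight the gene-level z-score reduces to that SNP's z-score, which is a quick sanity check on the scaling.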
Subject(s)
Colorectal Neoplasms/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Quantitative Trait Loci/genetics , Colorectal Neoplasms/pathology , Computational Biology , Gene Expression Regulation, Neoplastic/genetics , Genetic Variation/genetics , Genotype , Humans , Models, Statistical , Phenotype , Polymorphism, Single Nucleotide/genetics
ABSTRACT
BACKGROUND: T cell receptors (TCRs) play critical roles in adaptive immune responses, and recent advances in genome technology have made it possible to examine the TCR repertoire at the individual sequence level. Analysis of the TCR repertoire with respect to clinical phenotypes can yield novel insights into the etiology and progression of immune-mediated diseases. However, methods for association analysis of the TCR repertoire have not been well developed. METHODS: We introduce an analysis tool, TCR-L, for evaluating the association between the TCR repertoire and disease outcomes. Our approach is developed within a mixed effects modeling framework, where the fixed effect represents features that can be explicitly extracted from TCR sequences, while the random effect represents features that are hidden in TCR sequences and are difficult to extract. Statistical tests are developed to examine the two types of effects independently, and the resulting p values are then combined. RESULTS: Simulation studies demonstrate that (1) the proposed approach controls the type I error well; and (2) the power of the proposed approach is greater than that of approaches that consider only the fixed effect or only the random effect. The analysis of real data from a skin cutaneous melanoma study identifies an association between the TCR repertoire and the short/long-term survival of patients. CONCLUSION: TCR-L can accommodate features that can be extracted as well as features that are hidden in TCR sequences, and provides a powerful approach for identifying associations between the TCR repertoire and disease outcomes.
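The abstract states that the fixed-effect and random-effect p values are combined but does not name the rule. As one common choice, assumed here for illustration only, Fisher's method uses X = -2 Σ log p_i, which follows a chi-square distribution with 2k degrees of freedom under the null when the k p values are independent. For even degrees of freedom the tail probability has a closed form, so the sketch needs only the standard library; the p values in the usage line are hypothetical.

```python
import math
from typing import Sequence


def fisher_combine(pvalues: Sequence[float]) -> float:
    """Fisher's method: X = -2 * sum(log p_i) ~ chi-square(2k) under H0.

    Assumes the k p-values are independent and each lies in (0, 1].
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    # Chi-square survival function with even df 2k:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Hypothetical p values from a fixed-effect test and a random-effect test
print(fisher_combine([0.04, 0.03]))
```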
Subject(s)
Melanoma , Skin Neoplasms , Humans , Melanoma/genetics , Phenotype , Receptors, Antigen, T-Cell
ABSTRACT
INTRODUCTION: We aimed to combine the fibrosis (FIB)-4 score and fibroscan-derived liver stiffness (LS) into a single score (FIB-5) that predicts incident complications of portal hypertension (PH) in persons with compensated liver disease. METHODS: In this retrospective cohort study, we identified 5,849 US veterans who underwent LS measurement from May 01, 2014 to June 30, 2019, and had laboratory tests enabling FIB-4 calculation within 6 months of LS measurement. Patients were followed up from the LS measurement date until February 05, 2020, for incident complications of PH. We combined LS values and the individual components of the FIB-4 score (i.e., age, aspartate aminotransferase, alanine aminotransferase, and platelet count) using multivariable Cox proportional hazards modeling and the machine learning algorithm eXtreme gradient boosting to develop the C-FIB-5 and X-FIB-5 models, respectively. Models were internally validated using optimism-corrected measures. RESULTS: Among 5,849 patients, the mean age was 62.8 years, 95.9% were men, and the mean follow-up time was 2.14 ± 1.21 years. Within 3 years after the LS measurement date, 116 (2.0%) patients developed complications of PH. The X-FIB-5 (area under the receiver operating characteristic curve [AUROC] 0.845) and C-FIB-5 scores (AUROC 0.868) demonstrated superior discrimination over LS (AUROC 0.688) and FIB-4 (AUROC 0.672) for predicting incident complications of PH. Both the X-FIB-5 and C-FIB-5 models demonstrated higher classification accuracy across all sensitivity cutoffs when compared with LS or FIB-4 alone. DISCUSSION: We combined LS and the individual components of the FIB-4 into a single scoring system (FIB-5, www.fib5.net), which can help identify patients with compensated liver disease at risk of developing complications of PH.
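FIB-5 builds on the components of the FIB-4 score, whose standard formula is FIB-4 = (age × AST) / (platelet count × √ALT), with age in years, AST and ALT in U/L, and platelets in 10^9/L. The sketch computes only FIB-4. The C-FIB-5 Cox coefficients and the X-FIB-5 boosted trees are not given in the abstract and are not reproduced; the function name and laboratory values are hypothetical.

```python
import math


def fib4(age_years: float, ast_u_per_l: float, alt_u_per_l: float,
         platelets_1e9_per_l: float) -> float:
    """FIB-4 = (age * AST) / (platelets * sqrt(ALT)).

    Units: age in years, AST and ALT in U/L, platelets in 10^9/L.
    """
    if min(age_years, ast_u_per_l, alt_u_per_l, platelets_1e9_per_l) <= 0:
        raise ValueError("all inputs must be positive")
    return (age_years * ast_u_per_l) / (platelets_1e9_per_l * math.sqrt(alt_u_per_l))


# Hypothetical patient: 60 years, AST 40 U/L, ALT 25 U/L, platelets 200 x 10^9/L
print(round(fib4(60, 40, 25, 200), 2))
```

Note that age enters FIB-4 linearly, which is one reason the FIB-5 models re-estimate the contribution of each component rather than reusing the FIB-4 ratio.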
Subject(s)
Elasticity Imaging Techniques , Hypertension, Portal , Male , Humans , Middle Aged , Female , Liver Cirrhosis/complications , Liver Cirrhosis/diagnosis , Retrospective Studies , Hypertension, Portal/complications , Hypertension, Portal/diagnosis , Aspartate Aminotransferases , Liver/diagnostic imaging , Biomarkers , Biopsy
ABSTRACT
MOTIVATION: Cancer is a highly heterogeneous disease, and virtually all types of cancer have subtypes. Understanding the association between cancer subtypes and genetic variation is fundamental to the development of targeted therapies for patients. Somatic mutations play important roles in tumor development and have emerged as a new type of genetic variation for studying associations with cancer subtypes. However, the low prevalence of individual mutations poses a tremendous challenge to the related statistical analysis. RESULTS: In this article, we propose an approach, subtype analysis with somatic mutations (SASOM), for the association analysis of cancer subtypes with somatic mutations. Our approach tests the association between a set of somatic mutations (from a genetic pathway) and subtypes, while incorporating functional information on the mutations into the analysis. We further propose a robust p-value combination procedure, DAPC, to synthesize statistical significance from different sources. Simulation studies show that the proposed approach has correct type I error and tends to be more powerful than possible alternative methods. In a real data application, we examine the somatic mutations from a cutaneous melanoma dataset and identify a genetic pathway that is associated with immune-related subtypes. AVAILABILITY AND IMPLEMENTATION: The SASOM R package is available at https://github.com/rksyouyou/SASOM-pkg. R scripts and data are available at https://github.com/rksyouyou/SASOM-analysis. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
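The construction of DAPC is not described in the abstract. As a swapped-in illustration of a robust p-value combination (explicitly not DAPC), the Cauchy combination test maps each p value to a standard Cauchy variate, T = Σ w_i tan((0.5 - p_i)π) with weights summing to 1, and reports p = 0.5 - arctan(T)/π; it is known to be approximately valid under dependence among the p values, especially in the small-p tail. Function names and inputs are hypothetical.

```python
import math
from typing import Optional, Sequence


def cauchy_combine(pvalues: Sequence[float],
                   weights: Optional[Sequence[float]] = None) -> float:
    """Cauchy combination of p-values (illustration only; not DAPC).

    Each p must lie strictly in (0, 1); weights default to equal and must sum to 1.
    """
    k = len(pvalues)
    if weights is None:
        weights = [1.0 / k] * k
    stat = 0.0
    for p, w in zip(pvalues, weights):
        if not 0.0 < p < 1.0:
            raise ValueError("p-values must lie strictly between 0 and 1")
        if p < 1e-15:
            # tan((0.5 - p) * pi) ~ 1 / (p * pi) for tiny p; avoids float loss
            stat += w / (p * math.pi)
        else:
            stat += w * math.tan((0.5 - p) * math.pi)
    # Upper-tail probability of a standard Cauchy at stat
    return 0.5 - math.atan(stat) / math.pi


# Hypothetical p values from two sources of evidence
print(cauchy_combine([0.001, 0.9]))
```

Because the Cauchy tail is heavy, one very small p value dominates the statistic, which is what makes the combination powerful when signal is concentrated in a single source.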
ABSTRACT
Pathway analysis, i.e., grouping analysis, has important applications in genomic studies. Existing pathway analysis approaches mostly focus on a single response and are not suitable for analyzing complex diseases, which are often related to multiple response variables. Although a handful of approaches have been developed for multiple responses, these methods are mainly designed for pathways with a moderate number of features. A multi-response pathway analysis approach that can conduct statistical inference when the dimension is potentially higher than the sample size is introduced. Asymptotic properties of the test statistic are established, and a theoretical investigation of statistical power is conducted. Simulation studies and real data analysis show that the proposed approach performs well in identifying important pathways that influence multiple expression quantitative trait loci (eQTL).
ABSTRACT
Genome-wide association studies (GWAS) of esophageal adenocarcinoma (EAC) and its precursor, Barrett's esophagus (BE), have uncovered significant genetic components of risk, but most heritability remains unexplained. Targeted assessment of genetic variation in biologically relevant pathways using novel analytical approaches may identify missed susceptibility signals. Central obesity, a key BE/EAC risk factor, is linked to systemic inflammation, altered hormonal signaling and insulin-like growth factor (IGF) axis dysfunction. Here, we assessed IGF-related genetic variation and risk of BE and EAC. Principal component analysis was employed to evaluate pathway-level and gene-level associations with BE/EAC, using genotypes for 270 single-nucleotide polymorphisms (SNPs) in or near 12 IGF-related genes, ascertained from 3295 BE cases, 2515 EAC cases and 3207 controls in the Barrett's and Esophageal Adenocarcinoma Consortium (BEACON) GWAS. Gene-level signals were assessed using Multi-marker Analysis of GenoMic Annotation (MAGMA) and SNP summary statistics from BEACON and an expanded GWAS meta-analysis (6167 BE cases, 4112 EAC cases, 17 159 controls). Global variation in the IGF pathway was associated with risk of BE (P = 0.0015). Gene-level associations with BE were observed for GHR (growth hormone receptor; P = 0.00046, false discovery rate q = 0.0056) and IGF1R (IGF1 receptor; P = 0.0090, q = 0.0542). These gene-level signals remained significant at q < 0.1 when assessed using data from the largest available BE/EAC GWAS meta-analysis. No significant associations were observed for EAC. This study represents the most comprehensive evaluation to date of inherited genetic variation in the IGF pathway and BE/EAC risk, providing novel evidence that variation in two genes encoding cell-surface receptors, GHR and IGF1R, may influence risk of BE.
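The gene-level results above are reported with false discovery rate q-values. The abstract does not name the FDR procedure; assuming the common Benjamini-Hochberg adjustment, q-values can be computed by scaling each sorted p value by m/rank and enforcing monotonicity from the largest p downward. The function name and example p values are hypothetical.

```python
from typing import List, Sequence


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p downward so q-values are monotone in p
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical gene-level p values for four genes in a pathway
print(bh_qvalues([0.01, 0.04, 0.03, 0.5]))
```

Genes would then be declared significant at a chosen threshold on q (for example q < 0.1), rather than on the raw p values.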
Subject(s)
Adenocarcinoma/genetics , Barrett Esophagus/genetics , Biomarkers, Tumor/genetics , Esophageal Neoplasms/genetics , Somatomedins/metabolism , Adenocarcinoma/pathology , Aged , Barrett Esophagus/pathology , Biomarkers, Tumor/metabolism , Carrier Proteins/genetics , Carrier Proteins/metabolism , Esophageal Neoplasms/pathology , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Germ-Line Mutation , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide , Receptor, IGF Type 1/genetics , Receptor, IGF Type 1/metabolism , Risk Factors , Signal Transduction/genetics
ABSTRACT
[This corrects the article DOI: 10.1371/journal.pgen.1007746.].
ABSTRACT
Somatic mutations drive the growth of tumor cells and are pivotal biomarkers for many cancer treatments. Genetic association analysis using somatic mutations is an effective approach to studying their functional impact. However, standard regression methods are not appropriate for somatic mutation association studies because somatic mutation calls often have non-ignorable false positive and/or false negative rates. Large-scale association analysis using somatic mutations has recently become feasible, thanks to improved sequencing techniques and reduced sequencing costs, and there is an urgent need for a statistical method designed for somatic mutation association analysis. We propose such a method, with a computationally efficient software implementation: Somatic mutation Association test with Measurement Errors (SAME). SAME accounts for somatic mutation calling uncertainty using a likelihood-based approach. It can be used to assess the associations between continuous or dichotomous outcomes and individual mutations or gene-level mutations. Through simulation studies across a wide range of realistic scenarios, we show that SAME can significantly improve statistical power over the naive generalized linear model that ignores mutation calling uncertainty. Finally, using data collected by The Cancer Genome Atlas (TCGA) project, we apply SAME to study the associations between somatic mutations and gene expression in 12 cancer types, as well as the associations between somatic mutations and colon cancer subtypes defined by DNA methylation data. SAME recovered some interesting findings that were missed by the generalized linear model. In addition, we demonstrate that mutation-level and gene-level analyses are often more appropriate for oncogenes and tumor-suppressor genes, respectively.
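The abstract does not spell out SAME's likelihood. The sketch below shows only the generic misclassification relationship that motivates such methods, assuming a true mutation prevalence π, a false positive rate FPR = P(call | no mutation), and a false negative rate FNR = P(no call | mutation): the observed call probability is (1 - FNR)π + FPR(1 - π), and Bayes' rule gives the probability that a called mutation is real. All names and numbers are hypothetical.

```python
def observed_call_prob(prevalence: float, fpr: float, fnr: float) -> float:
    """P(mutation called) = (1 - FNR) * pi + FPR * (1 - pi).

    fpr = P(called | not truly mutated); fnr = P(not called | truly mutated).
    """
    return (1.0 - fnr) * prevalence + fpr * (1.0 - prevalence)


def prob_true_given_call(prevalence: float, fpr: float, fnr: float) -> float:
    """Posterior P(truly mutated | called), by Bayes' rule."""
    return (1.0 - fnr) * prevalence / observed_call_prob(prevalence, fpr, fnr)


# Hypothetical rare mutation with imperfect calling
pi, fpr, fnr = 0.10, 0.01, 0.20
print(observed_call_prob(pi, fpr, fnr), prob_true_given_call(pi, fpr, fnr))
```

Here roughly one in ten calls is a false positive, so treating calls as exact covariates in a generalized linear model dilutes the estimated association, which is the power loss the abstract attributes to the naive approach.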
ABSTRACT
Bronchiolitis obliterans syndrome (BOS) after allogeneic hematopoietic cell transplantation (allo-HCT) is often diagnosed at a late stage, when lung dysfunction is severe and irreversible. Identifying at-risk patients early after transplantation may enable strategies for early detection that could avert the morbidity and mortality of BOS. This study aimed to determine whether a decline in lung function before and early after (days +80 to +100) allo-HCT is associated with a risk of BOS beyond 6 months post-transplantation. In a single-center cohort of 2941 allo-HCT recipients, 186 (6%) met National Institutes of Health criteria for BOS. Pretransplantation and post-transplantation day +80 spirometric parameters were analyzed as continuous variables and included in a multivariable model with other factors, including donor source, graft source, conditioning regimen, use of total body irradiation, and immunoglobulin levels. Pretransplantation forced expiratory flow between 25% and 75% of maximum (FEF25-75), day +80 forced expiratory volume in 1 second (FEV1), and day +80 FEF25-75 had the strongest associations with increased risk of BOS. Assessment of the multivariable model showed that a decline in day +80 FEF25-75 added risk to the day +80 FEV1 model (P = .03), whereas FEV1 decline at day +80 added no additional risk to the day +80 FEF25-75 model (P = .645). Moreover, day +80 FEF25-75 conferred additional risk when considered together with pretransplantation FEF25-75. These results suggest that day +80 FEF25-75 may be more important than FEV1 in predicting the development of BOS. This study highlights the importance of obtaining early post-transplantation pulmonary function tests for the potential risk stratification of patients at risk for BOS.
Subject(s)
Bronchiolitis Obliterans , Hematopoietic Stem Cell Transplantation , Lung Transplantation , Bronchiolitis Obliterans/diagnosis , Bronchiolitis Obliterans/etiology , Forced Expiratory Volume , Hematopoietic Stem Cell Transplantation/adverse effects , Humans , Retrospective Studies , Spirometry
ABSTRACT
Azithromycin exposure during the early phase of allogeneic hematopoietic cell transplantation (HCT) has been associated with an increased incidence of hematologic relapse. We assessed the impact of azithromycin exposure on the occurrence of relapse or new subsequent neoplasm (SN) in patients with bronchiolitis obliterans syndrome (BOS) after HCT, who are commonly treated with azithromycin alone or in combination with other agents. In a retrospective study of patients with BOS from 2 large allograft centers, the effect of azithromycin exposure on the risk of relapse or SN was estimated from a Cox model with a time-dependent variable for treatment initiation. The Cox model was adjusted for time-fixed covariates measured at cohort entry, selected for their potential prognostic value. Similar models were used to assess the effect of exposure on the cause-specific hazards of relapse, SN, and death free of those events. Sensitivity analyses were performed using propensity score matching. Among 316 patients, 227 (71.8%) were exposed to azithromycin after BOS diagnosis. The adjusted hazard ratio (HR) for patients exposed versus unexposed to azithromycin was 1.51 (95% confidence interval [CI], 0.90 to 2.55) for relapse or SN, 0.82 (95% CI, 0.37 to 1.83) for relapse, and 2.00 (95% CI, 1.01 to 3.99) for SN. Patients exposed to azithromycin had a significantly lower cause-specific hazard of death free of neoplasm and relapse (adjusted HR, 0.54; 95% CI, 0.34 to 0.89). In conclusion, in patients with BOS after HCT, exposure to azithromycin was associated with an increased risk of SN but not of relapse.
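A Cox model with a time-dependent treatment variable is usually fitted on data in counting-process (start, stop] format: each patient's follow-up from cohort entry is split at treatment initiation so that time before azithromycin counts as unexposed. Classifying patients at baseline by whether they were ever exposed would instead introduce immortal time bias. The sketch below performs that split under hypothetical field names; the resulting rows could be passed to any Cox routine that accepts (start, stop] data, such as lifelines' CoxTimeVaryingFitter. Covariate adjustment and competing-risk handling are omitted.

```python
from typing import List, NamedTuple, Optional


class Interval(NamedTuple):
    patient_id: int
    start: float   # time since cohort entry (e.g. BOS diagnosis)
    stop: float
    exposed: int   # 1 once azithromycin has been started
    event: int     # 1 if the event occurs at `stop`


def split_at_treatment(patient_id: int, follow_up: float, event: int,
                       treat_time: Optional[float]) -> List[Interval]:
    """Split one patient's follow-up into unexposed/exposed (start, stop] rows."""
    if treat_time is None or treat_time >= follow_up:
        # Never treated during follow-up: a single unexposed interval
        return [Interval(patient_id, 0.0, follow_up, 0, event)]
    if treat_time <= 0:
        # Treated from cohort entry: a single exposed interval
        return [Interval(patient_id, 0.0, follow_up, 1, event)]
    # Unexposed until treatment starts, exposed afterwards; event on the last row
    return [Interval(patient_id, 0.0, treat_time, 0, 0),
            Interval(patient_id, treat_time, follow_up, 1, event)]


# Hypothetical patient: treated at month 4, event at month 10
for row in split_at_treatment(1, 10.0, 1, 4.0):
    print(row)
```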
Subject(s)
Bronchiolitis Obliterans , Hematopoietic Stem Cell Transplantation , Lung Transplantation , Neoplasms , Azithromycin/adverse effects , Bronchiolitis Obliterans/etiology , Hematopoietic Stem Cell Transplantation/adverse effects , Humans , Neoplasms/therapy , Retrospective Studies , Transplantation, Homologous
ABSTRACT
Background and aim: Alpha-momorcharin (α-MMC) is a type I ribosome-inactivating protein (RIP) purified from Momordica charantia. Despite its strong antitumor activity, α-MMC exerts undesirable immunotoxic effects, namely hypersensitivity or immunosuppression. Because α-MMC is a plant protein, its application in vivo can easily induce hypersensitivity, but the mechanism of its immunosuppression is still unclear. Materials and methods: The toxicity of α-MMC to peripheral blood cells and cytokine expression in peripheral blood mononuclear cells (PBMCs) and spleen immune cells were measured in rats. For further confirmation, in vitro experiments were performed with the monocytic cell line THP-1, the B lymphocyte cell line WIL2-S and the T lymphocyte cell line Jurkat. Results: High-dose α-MMC (3.0 mg/kg) resulted in weight loss in rats, a decreased percentage of monocytes, and increased percentages of eosinophils and basophils. Both high-dose and low-dose (1.0 mg/kg) α-MMC inhibited cytokine expression in PBMCs and increased cytokine expression in spleen T cells. In vitro, α-MMC mainly acted on THP-1 cells, with high doses inducing apoptosis and low doses regulating the expression of inhibitory cytokines. Conclusions: The action of α-MMC on immune cells mainly affects monocytes, thereby eliciting its immunosuppressive effect. Its mode of action is to guide functional immunosuppressive regulation at low doses and to induce apoptosis at high doses. Because monocytes are recruited into tumor tissues and polarized into tumor-associated macrophages, the selective cytotoxicity and cytokine release regulation of α-MMC in monocytes may be an important mechanism of its antitumor effects.
Subject(s)
Apoptosis/drug effects , Cytokines/immunology , Gene Expression Regulation/drug effects , Monocytes/immunology , Ribosome Inactivating Proteins/pharmacology , Animals , Apoptosis/immunology , Dose-Response Relationship, Drug , Female , Gene Expression Regulation/immunology , Humans , Jurkat Cells , Monocytes/pathology , Rats , Rats, Sprague-Dawley , THP-1 Cells
ABSTRACT
Somatic mutations are the driving forces of tumor development, and recent advances in cancer genome sequencing have made it feasible to evaluate the association between somatic mutations and cancer-related traits in large samples. However, despite increasingly large sample sizes, it remains challenging to conduct statistical analysis of somatic mutations, because the vast majority of somatic mutations occur at very low frequencies. Furthermore, cancer is a complex disease that is often accompanied by multiple traits reflecting various aspects of cancer; how to combine the information from these traits to identify important somatic mutations poses additional challenges. In this article, we introduce a statistical approach, named SOMAT, for detecting somatic mutations associated with multiple cancer-related traits. Our approach provides a flexible framework for analyzing continuous, binary, or a mixture of both types of traits, and is statistically powerful and computationally efficient. In addition, we propose a data-adaptive procedure, which is grid-search free, for effectively combining test statistics to enhance statistical power. We conduct an extensive study and show that the proposed approach maintains correct type I error and is more powerful than existing approaches under the scenarios considered. We also apply our approach to an exome-sequencing study of liver tumors for illustration.
Subject(s)
Genome-Wide Association Study , Models, Statistical , Multivariate Analysis , Mutation , Humans , Liver Neoplasms/genetics , Neoplasms/genetics , Exome Sequencing
ABSTRACT
OBJECTIVE: Oesophageal adenocarcinoma (OA) incidence has risen sharply in Western countries over recent decades. Local and systemic inflammation is considered an important contributor to OA pathogenesis. Established risk factors for OA and its precursor, Barrett's oesophagus (BE), include symptomatic reflux, obesity and smoking. The role of inherited genetic susceptibility remains an area of active investigation. Here, we explore whether germline variation related to inflammatory processes influences susceptibility to BE/OA. DESIGN: We used data from a genomewide association study of 2515 OA cases, 3295 BE cases and 3207 controls. Our analysis included 7863 single-nucleotide polymorphisms (SNPs) in 449 genes assigned to five pathways: cyclooxygenase (COX), cytokine signalling, oxidative stress, human leucocyte antigen and nuclear factor-κB. A principal components-based analytic framework was employed to evaluate pathway-level and gene-level associations with disease risk. RESULTS: We identified a significant signal for the COX pathway in relation to BE risk (p=0.0059, false discovery rate q=0.03), and in gene-level analyses found an association with microsomal glutathione-S-transferase 1 (MGST1; p=0.0005, q=0.005). Assessment of 36 MGST1 SNPs identified 14 variants associated with elevated BE risk (q<0.05). Four of these were subsequently confirmed (p<5.5×10⁻⁵) in a meta-analysis encompassing an independent set of 1851 BE cases and 3496 controls, and are known strong expression quantitative trait loci for MGST1. Three such variants were associated with similar elevations in OA risk. CONCLUSIONS: This study provides the most comprehensive evaluation of inflammation-related germline variation in relation to risk of BE/OA and suggests that variants in MGST1 influence disease susceptibility.
Subject(s)
Adenocarcinoma/genetics , Barrett Esophagus/genetics , Esophageal Neoplasms/genetics , Germ-Line Mutation , Glutathione Transferase/genetics , Aged , Cytokines/metabolism , Female , Gene-Environment Interaction , Genetic Predisposition to Disease , Genome-Wide Association Study , HLA Antigens/metabolism , Humans , Inflammation/genetics , Male , Middle Aged , NF-kappa B/metabolism , Oxidative Stress , Polymorphism, Single Nucleotide , Principal Component Analysis , Prostaglandin-Endoperoxide Synthases/metabolism , Risk Factors , Signal Transduction/genetics
ABSTRACT
Kernel machine learning methods, such as the SNP-set kernel association test (SKAT), have been widely used to test associations between traits and genetic polymorphisms. In contrast to traditional single-SNP analysis methods, these methods examine the joint effect of a set of related SNPs (such as a group of SNPs within a gene or a pathway) and can identify sets of SNPs that are associated with the trait of interest. However, as with many multi-SNP testing approaches, kernel machine testing can draw conclusions only at the SNP-set level and does not directly indicate which SNP(s) in an identified set actually drive the association. A recently proposed procedure, KerNel Iterative Feature Extraction (KNIFE), provides a general framework for incorporating variable selection into kernel machine methods. In this article, we focus on quantitative traits and relatively common SNPs, adapt the KNIFE procedure to genetic association studies, and propose an approach to identify driver SNPs after SKAT has been applied in gene set analysis. Our approach accommodates several kernels that are widely used in SNP analysis, such as the linear kernel and the Identity by State (IBS) kernel. The proposed approach provides practical tools for prioritizing SNPs and fills the gap between SNP-set analysis and biological functional studies. Both simulation studies and a real data application are used to demonstrate the proposed approach.
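For a quantitative trait with an intercept-only null model, the SKAT score statistic takes the form Q = (y - ȳ)ᵀ K (y - ȳ), where K is an n × n matrix of genotype similarity between individuals. The sketch builds the two kernels named above for genotypes coded 0/1/2: the linear kernel K = GGᵀ and the IBS kernel K_ij = Σ_k (2 - |g_ik - g_jk|) / (2p) over p SNPs. It omits SNP weights, covariate adjustment, the mixture-of-chi-squares p value, and the KNIFE selection step itself; all names and data are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def linear_kernel(genotypes: Sequence[Sequence[int]]) -> Matrix:
    """K = G G^T for an n x p genotype matrix G."""
    return [[float(sum(a * b for a, b in zip(gi, gj))) for gj in genotypes]
            for gi in genotypes]


def ibs_kernel(genotypes: Sequence[Sequence[int]]) -> Matrix:
    """IBS similarity: K_ij = sum_k (2 - |g_ik - g_jk|) / (2p), genotypes 0/1/2."""
    n, p = len(genotypes), len(genotypes[0])
    return [[sum(2 - abs(a - b) for a, b in zip(genotypes[i], genotypes[j])) / (2 * p)
             for j in range(n)] for i in range(n)]


def skat_q(y: Sequence[float], kernel: Matrix) -> float:
    """Q = r^T K r with residuals r = y - mean(y) (no covariates)."""
    ybar = sum(y) / len(y)
    r = [v - ybar for v in y]
    n = len(r)
    return sum(r[i] * kernel[i][j] * r[j] for i in range(n) for j in range(n))


# Hypothetical: 2 individuals, 2 SNPs
G = [[0, 1], [2, 1]]
y = [1.0, 3.0]
print(skat_q(y, ibs_kernel(G)), skat_q(y, linear_kernel(G)))
```

Large Q means individuals with similar genotypes also have similar trait residuals; the choice of kernel encodes what "similar genotypes" means.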