ABSTRACT
Massive genetic compendiums such as the UK Biobank have become an invaluable resource for identifying genetic variants that are associated with complex diseases. Due to the difficulties of massive data collection, a common practice of these compendiums is to collect interval-censored data. One challenge in analyzing such data is the lack of methodology available for genetic association studies with interval-censored data. Genetic effects are difficult to detect because of their rare and weak nature, and often the time-to-event outcomes are transformed to binary phenotypes for access to more powerful signal detection approaches. However transforming the data to binary outcomes can result in loss of valuable information. To alleviate such challenges, this work develops methodology to associate genetic variant sets with multiple interval-censored outcomes. Testing sets of variants such as genes or pathways is a common approach in genetic association settings to lower the multiple testing burden, aggregate small effects, and improve interpretations of results. Instead of performing inference with only a single outcome, utilizing multiple outcomes can increase statistical power by aggregating information across multiple correlated phenotypes. Simulations show that the proposed strategy can offer significant power gains over a single outcome approach. We apply the proposed test to the investigation that motivated this study, a search for the genes that perturb risks of bone fractures and falls in the UK Biobank.
Subject(s)
Computer Simulation , Humans , Genetic Association Studies/methods , Models, Statistical , Phenotype , Genetic Variation , Fractures, Bone/genetics , FemaleABSTRACT
Set-based association tests are widely popular in genetic association settings for their ability to aggregate weak signals and reduce multiple testing burdens. In particular, a class of set-based tests including the Higher Criticism, Berk-Jones, and other statistics have recently been popularized for reaching a so-called detection boundary when signals are rare and weak. Such tests have been applied in two subtly different settings: (a) associating a genetic variant set with a single phenotype and (b) associating a single genetic variant with a phenotype set. A significant issue in practice is the choice of test, especially when deciding between innovated and generalized type methods for detection boundary tests. Conflicting guidance is present in the literature. This work describes how correlation structures generate marked differences in relative operating characteristics for settings (a) and (b). The implications for study design are significant. We also develop novel power bounds that facilitate the aforementioned calculations and allow for analysis of individual testing settings. In more concrete terms, our investigation is motivated by translational expression quantitative trait loci (eQTL) studies in lung cancer. These studies involve both testing for groups of variants associated with a single gene expression (multiple explanatory factors) and testing whether a single variant is associated with a group of gene expressions (multiple outcomes). Results are supported by a collection of simulation studies and illustrated through lung cancer eQTL examples.
ABSTRACT
The rapid acceleration of genetic data collection in biomedical settings has recently resulted in the rise of genetic compendiums filled with rich longitudinal disease data. One common feature of these data sets is their plethora of interval-censored outcomes. However, very few tools are available for the analysis of genetic data sets with interval-censored outcomes, and in particular, there is a lack of methodology available for set-based inference. Set-based inference is used to associate a gene, biological pathway, or other genetic construct with outcomes and is one of the most popular strategies in genetics research. This work develops three such tests for interval-censored settings beginning with a variance components test for interval-censored outcomes, the interval-censored sequence kernel association test (ICSKAT). We also provide the interval-censored version of the Burden test, and then we integrate ICSKAT and Burden to construct the interval censored sequence kernel association test-optimal (ICSKATO) combination. These tests unlock set-based analysis of interval-censored data sets with analogs of three highly popular set-based tools commonly applied to continuous and binary outcomes. Simulation studies illustrate the advantages of the developed methods over ad hoc alternatives, including protection of the type I error rate at very low levels and increased power. The proposed approaches are applied to the investigation that motivated this study, an examination of the genes associated with bone mineral density deficiency and fracture risk.