ABSTRACT
Biobanks linked to massive, longitudinal electronic health record (EHR) data make numerous new genetic research questions feasible. One of these is the study of biomarker trajectories. For example, high blood pressure measurements over visits strongly predict stroke onset, and consistently high fasting glucose and HbA1c levels define diabetes. Recent research reveals that not only the mean level of biomarker trajectories but also their fluctuations, or within-subject (WS) variability, are risk factors for many diseases. Glycemic variation, for instance, has recently come to be considered an important clinical metric in diabetes management. It is crucial to identify the genetic factors that shift the mean or alter the WS variability of a biomarker trajectory. Compared to traditional cross-sectional studies, trajectory analysis utilizes more data points and captures a more complete picture of the impact of time-varying factors, including medication history and lifestyle. Currently, there are no efficient tools for genome-wide association studies (GWASs) of biomarker trajectories at the biobank scale, even for just mean effects. We propose TrajGWAS, a linear mixed effect model-based method for testing genetic effects that shift the mean or alter the WS variability of a biomarker trajectory. It is scalable to biobank data with 100,000 to 1,000,000 individuals and many longitudinal measurements, and it is robust to distributional assumptions. Simulation studies corroborate that TrajGWAS controls the type I error rate and is powerful. Analysis of eleven biomarkers measured longitudinally and extracted from UK Biobank primary care data for more than 150,000 participants with 1,800,000 observations reveals loci that significantly alter the mean or WS variability.
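To make the distinction between the mean level and the WS variability of a trajectory concrete, the following is a minimal descriptive sketch. It is not the TrajGWAS model, which the abstract describes as linear mixed effect model-based. The function name `summarize_trajectories` and the `(subject_id, value)` record layout are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean, variance


def summarize_trajectories(records):
    """Return {subject_id: (mean, sample_variance)} per subject.

    `records` is an iterable of (subject_id, value) pairs, a hypothetical
    layout chosen for this sketch. Subjects with fewer than two
    measurements are skipped because a sample variance is undefined.
    """
    by_subject = defaultdict(list)
    for subject_id, value in records:
        by_subject[subject_id].append(value)
    return {
        sid: (mean(values), variance(values))
        for sid, values in by_subject.items()
        if len(values) >= 2
    }


if __name__ == "__main__":
    # Two subjects with the same mean but different fluctuation.
    readings = [
        ("stable", 120), ("stable", 120), ("stable", 120),
        ("variable", 100), ("variable", 140), ("variable", 120),
    ]
    for sid, (m, v) in summarize_trajectories(readings).items():
        print(sid, m, v)
```

Two subjects can share the same mean while differing in variance, which is the kind of difference a cross-sectional summary would miss.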
Subjects
Biological Specimen Banks, Genome-Wide Association Study, Biomarkers, Cross-Sectional Studies, Electronic Health Records, Humans, Longitudinal Studies
ABSTRACT
The availability of vast amounts of longitudinal data from electronic health records (EHRs) and personal wearable devices opens the door to numerous new research questions. In many studies, individual variability of a longitudinal outcome is as important as the mean. Blood pressure fluctuations, glycemic variations, and mood swings are prime examples where it is critical to identify factors that affect the within-individual variability. We propose a scalable method, within-subject variance estimator by robust regression (WiSER), for the estimation and inference of the effects of both time-varying and time-invariant predictors on within-subject variance. It is robust against the misspecification of the conditional distribution of responses or the distribution of random effects. It performs similarly to the correctly specified likelihood methods but is 10^3 to 10^5 times faster. The estimation algorithm scales linearly in the total number of observations, making it applicable to massive longitudinal data sets. The effectiveness of WiSER is evaluated in extensive simulation studies. Its broad applicability is illustrated using the accelerometry data from the Women's Health Study and a clinical trial for longitudinal diabetes care.
Subjects
Algorithms, Statistical Models, Humans, Female, Computer Simulation, Probability, Longitudinal Studies
ABSTRACT
BACKGROUND: Statistical geneticists employ simulation to estimate the power of proposed studies, test new analysis tools, and evaluate properties of causal models. Although there are existing trait simulators, there is ample room for modernization. For example, most phenotype simulators are limited to Gaussian traits or traits transformable to normality, while ignoring qualitative traits and realistic, non-normal trait distributions. Also, modern computer languages, such as Julia, that accommodate parallelization and cloud-based computing are now mainstream but rarely used in older applications. To meet the challenges of contemporary big studies, it is important for geneticists to adopt new computational tools. RESULTS: We present TraitSimulation, an open-source Julia package that makes it trivial to quickly simulate phenotypes under a variety of genetic architectures. This package is integrated into our OpenMendel suite for easy downstream analyses. Julia was purpose-built for scientific programming and provides tremendous speed and memory efficiency, easy access to multi-CPU and GPU hardware, and to distributed and cloud-based parallelization. TraitSimulation is designed to encourage flexible trait simulation, including via the standard devices of applied statistics, generalized linear models (GLMs) and generalized linear mixed models (GLMMs). TraitSimulation also accommodates many study designs: unrelateds, sibships, pedigrees, or a mixture of all three. (Of course, for data with pedigrees or cryptic relationships, the simulation process must include the genetic dependencies among the individuals.) We consider an assortment of trait models and study designs to illustrate integrated simulation and analysis pipelines. Step-by-step instructions for these analyses are available in our electronic Jupyter notebooks on GitHub. These interactive notebooks are ideal for reproducible research. CONCLUSION: The TraitSimulation package has three main advantages. (1) It leverages the computational efficiency and ease of use of Julia to provide extremely fast, straightforward simulation of even the most complex genetic models, including GLMs and GLMMs. (2) It can be operated entirely within, but is not limited to, the integrated analysis pipeline of OpenMendel. And finally (3), by allowing a wider range of more realistic phenotype models, TraitSimulation brings power calculations and diagnostic tools closer to what investigators might see in real-world analyses.
Subjects
Cloud Computing, Genetic Testing, Aged, Computer Simulation, Humans, Pedigree, Phenotype
ABSTRACT
Logistic regression is the primary analysis tool for binary traits in genome-wide association studies (GWAS). Multinomial regression extends logistic regression to multiple categories. However, many phenotypes more naturally take ordered, discrete values. Examples include (a) subtypes defined from multiple sources of clinical information and (b) derived phenotypes generated by specific phenotyping algorithms for electronic health records (EHR). GWAS of ordinal traits have been problematic. Dichotomizing can lead to a range of arbitrary cutoff values, generating inconsistent, hard-to-interpret results. Using multinomial regression ignores trait value hierarchy and potentially loses power. Treating ordinal data as quantitative can lead to misleading inference. To address these issues, we analyze ordinal traits with an ordered, multinomial model. This approach increases power and leads to more interpretable results. We derive efficient algorithms for computing test statistics, making ordinal trait GWAS computationally practical for biobank-scale data. Our method is available as a Julia package, OrdinalGWAS.jl. Application to a COPDGene study confirms previously found signals based on binary case-control status, but with more significance. Additionally, we demonstrate the capability of our package to run on UK Biobank data by analyzing hypertension as an ordinal trait.
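The idea of an ordered, multinomial model can be sketched with the cumulative logit (proportional odds) form. This is an assumption for illustration: the abstract does not state which link function is used, and the code below is plain Python, not the OrdinalGWAS.jl API. The names `ordinal_probs` and `log_likelihood` are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probs(eta, cutpoints):
    """Category probabilities P(Y = j), j = 0..J-1, for one individual.

    Assumes the cumulative logit form P(Y <= j) = sigmoid(theta_j - eta),
    where `cutpoints` holds the J-1 increasing thresholds theta_j and
    `eta` is the linear predictor (covariates plus genotype effect).
    """
    cumulative = [sigmoid(theta - eta) for theta in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs


def log_likelihood(ys, etas, cutpoints):
    """Sum of log P(Y_i = y_i) over individuals; ys are 0-based categories."""
    return sum(
        math.log(ordinal_probs(eta, cutpoints)[y]) for y, eta in zip(ys, etas)
    )


if __name__ == "__main__":
    # Three ordered categories, two thresholds.
    print(ordinal_probs(0.0, [-1.0, 1.0]))
    print(ordinal_probs(2.0, [-1.0, 1.0]))
```

A single coefficient shifts the linear predictor and moves probability mass toward higher or lower categories. A multinomial model, by contrast, needs a separate coefficient per category, which is consistent with the abstract's point about lost power.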
Subjects
Biological Specimen Banks, Genome-Wide Association Study, Algorithms, Case-Control Studies, Computer Simulation, Humans, Hypertension/genetics, Genetic Models, Phenotype, Single Nucleotide Polymorphism/genetics, Chronic Obstructive Pulmonary Disease/genetics, Chronic Obstructive Pulmonary Disease/physiopathology, Regression Analysis, Respiratory Function Tests
ABSTRACT
Statistical methods for genome-wide association studies (GWAS) continue to improve. However, the increasing volume and variety of genetic and genomic data make computational speed and ease of data manipulation mandatory in future software. In our view, a collaborative effort of statistical geneticists is required to develop open source software targeted to genetic epidemiology. Our attempt to meet this need is called the OPENMENDEL project (https://openmendel.github.io). It aims to (1) enable interactive and reproducible analyses with informative intermediate results, (2) scale to big data analytics, (3) embrace parallel and distributed computing, (4) adapt to rapid hardware evolution, (5) allow cloud computing, (6) allow integration of varied genetic data types, and (7) foster easy communication between clinicians, geneticists, statisticians, and computer scientists. This article reviews and makes recommendations to the genetic epidemiology community in the context of the OPENMENDEL project.
Subjects
Computational Biology/methods, Human Genome, Genome-Wide Association Study, Statistical Models, Programming Languages, Algorithms, Humans, Single Nucleotide Polymorphism, Software
ABSTRACT
Aggression is a quantitative trait deeply entwined with individual fitness. Mapping the genomic architecture underlying such traits is complicated by complex inheritance patterns, social structure, pedigree information and gene pleiotropy. Here, we leveraged the pedigree of a reintroduced population of grey wolves (Canis lupus) in Yellowstone National Park, Wyoming, USA, to examine the heritability of and the genetic variation associated with aggression. Since their reintroduction, many ecological and behavioural aspects have been documented, providing unmatched records of aggressive behaviour across multiple generations of a wild population of wolves. Using a linear mixed model, a robust genetic relationship matrix, 12,288 single nucleotide polymorphisms (SNPs) and 111 wolves, we estimated the SNP-based heritability of aggression to be 37% and an additional 14% of the phenotypic variation explained by shared environmental exposures. We identified 598 SNP genotypes from 425 grey wolves to resolve a consensus pedigree that was included in a heritability analysis of 141 individuals with SNP genotype, metadata and aggression data. The pedigree-based heritability estimate for aggression is 14%, and an additional 16% of the phenotypic variation was explained by shared environmental exposures. We find strong effects of breeding status and relative pack size on aggression. Through an integrative approach, these results provide a framework for understanding the genetic architecture of a complex trait that influences individual fitness, with linkages to reproduction, in a social carnivore. Along with a few other studies, we show here the incredible utility of a pedigreed natural population for dissecting a complex, fitness-related behavioural trait.
Subjects
Aggression, Wolves, Animals, Animal Behavior, Pedigree, Single Nucleotide Polymorphism, Reproduction, United States, Wolves/genetics, Wyoming
ABSTRACT
BACKGROUND: Consecutive testing of single nucleotide polymorphisms (SNPs) is usually employed to identify genetic variants associated with complex traits. Ideally one should model all covariates in unison, but most existing analysis methods for genome-wide association studies (GWAS) perform only univariate regression. RESULTS: We extend and efficiently implement iterative hard thresholding (IHT) for multiple regression, treating all SNPs simultaneously. Our extensions accommodate generalized linear models, prior information on genetic variants, and grouping of variants. In our simulations, IHT recovers up to 30% more true predictors than SNP-by-SNP association testing and exhibits a 2-3 orders of magnitude decrease in false-positive rates compared with lasso regression. We also test IHT on the UK Biobank hypertension phenotypes and the Northern Finland Birth Cohort of 1966 cardiovascular phenotypes. We find that IHT scales to the large datasets of contemporary human genetics and recovers the plausible genetic variants identified by previous studies. CONCLUSIONS: Our real data analysis and simulation studies suggest that IHT can (i) recover highly correlated predictors, (ii) avoid over-fitting, (iii) deliver better true-positive and false-positive rates than either marginal testing or lasso regression, (iv) recover unbiased regression coefficients, (v) exploit prior information and group-sparsity, and (vi) be used with biobank-sized datasets. Although these advances are studied for genome-wide association studies inference, our extensions are pertinent to other regression problems with large numbers of predictors.
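For readers unfamiliar with iterative hard thresholding, the basic least-squares version can be sketched as a gradient step followed by projection onto the k largest-magnitude coefficients. This is only the textbook core under stated assumptions: a fixed step size and a plain linear model. It does not include the generalized linear model, prior-weight, or group extensions the abstract describes, and the function names are hypothetical.

```python
def hard_threshold(beta, k):
    """Keep the k largest-magnitude entries of beta and set the rest to zero."""
    keep = sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)[:k]
    projected = [0.0] * len(beta)
    for j in keep:
        projected[j] = beta[j]
    return projected


def iht(X, y, k, step, iters=100):
    """Basic IHT for least squares with at most k nonzero coefficients.

    X is a list of n rows with p entries each, y a list of n responses.
    `step` is a fixed step size (an assumption of this sketch); it should
    be small relative to the largest eigenvalue of X^T X for the
    iteration to be stable.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        # Residuals r = y - X beta.
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Descent direction X^T r (negative gradient of 0.5 * ||y - X beta||^2).
        direction = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Gradient step, then projection onto k-sparse vectors.
        beta = hard_threshold([beta[j] + step * direction[j] for j in range(p)], k)
    return beta


if __name__ == "__main__":
    # Orthonormal toy design: the k-sparse solution keeps the k largest responses.
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    print(iht(identity, [3.0, 0.1, -2.0, 0.05], k=2, step=1.0))
```

Because every coefficient is updated jointly before thresholding, all predictors are treated simultaneously. Marginal SNP-by-SNP testing instead fits one predictor at a time.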
Subjects
Computational Biology/methods, Genome-Wide Association Study/methods, Linear Models, Algorithms, Genetic Predisposition to Disease, Humans, Phenotype, Single Nucleotide Polymorphism, Reproducibility of Results
ABSTRACT
Background. Road traffic accidents are the fourth leading cause of death in the entire population, and the first among the youth (ages 15-19 years) in Thailand. The situation in Thailand is worse than in neighboring low- to middle-income countries in the Southeast Asia region. Seventy-three percent of the deaths in the country are motorcycle drivers or passengers. Although motorcyclists (both drivers and passengers) have been obligated to wear helmets by law, the prevalence of helmet use nationwide is not high (43.7% in 2010). Methods. We performed a systematic review to examine potential social determinants of helmet use behavior (observational studies) and to summarize previous intervention studies to promote helmet use (interventional studies) in the country. Studies were identified in PubMed and Web of Science, and by additional review of Thai-written literature. Results. We identified 16 relevant studies for social determinants of helmet use and 5 relevant studies for promoting helmet use in Thailand. Our review shows that several factors such as teens and children (age), women (gender), rural areas (geography), and alcohol drinking (interaction with another behavior) are associated with non-helmet use. We also identified 4 interventional studies implemented in Thailand: 1 law enforcement program and 4 community-based educational programs. Although all the studies improved the prevalence of helmet use after the interventions, only 2 studies exceeded 50%. Conclusion. There is consistent evidence that being younger, being a woman, living in non-Bangkok areas, and drinking alcohol are associated with non-helmet use among motorcycle users in Thailand. We also observed that the effect of past intervention programs is limited.
Subjects
Traffic Accidents/statistics & numerical data, Head Protective Devices/statistics & numerical data, Motorcycles, Wounds and Injuries/prevention & control, Traffic Accidents/mortality, Humans, Observational Studies as Topic, Risk Factors, Thailand/epidemiology, Wounds and Injuries/epidemiology
ABSTRACT
PURPOSE: To investigate whether social support and social trust are associated with DED. METHODS: Cross-sectional data from the Japan Public Health Center-Based Prospective Study for the Next Generation (JPHC-NEXT) were used. Subjects are 96,227 Japanese men and women aged 40 to 74. Data from respondents included information on DED, social support and social trust. DED was defined as the presence of clinically diagnosed DED or severe symptoms. Social support was measured by emotional support and tangible support. Social trust was measured by level of general trust in others. Multiple logistic regression analysis was conducted to assess the association of social determinants for DED. RESULTS: Individuals with high levels of social support and social trust were less likely to have severe symptoms of DED and clinically diagnosed DED (P for trend < 0.001 in both cases). Those with the highest levels of social support and social trust were least likely to have DED (odds ratios [OR] = 0.64 [0.61-0.67], 95% confidence interval [CI] = 0.63 [0.60-0.67] for severe symptoms of DED; OR = 0.88 [0.83-0.93] and 0.85 [0.80-0.91] for clinically diagnosed DED). CONCLUSIONS: High levels of social support and social trust were associated with a lower prevalence of DED.