ABSTRACT
Early or late pubertal onset can lead to disease in adulthood, including cancer, obesity, type 2 diabetes, metabolic disorders, bone fractures, and psychopathologies. Thus, knowing the age at which puberty is attained is crucial, as it can serve as a risk factor for future diseases. Pubertal development is divided into five stages of sexual maturation in boys and girls according to the standardized Tanner scale. We performed genome-wide association studies (GWAS) on the "Growth and Obesity Chilean Cohort Study" cohort, composed of admixed children with mainly European and Native American ancestry. Using joint models that integrate time-to-event data with longitudinal trajectories of body mass index (BMI), we identified genetic variants associated with phenotypic transitions between pairs of Tanner stages. We identified 42 novel significant associations, most of them in boys. The GWAS on the Tanner 3→4 transition in boys captured an association peak around the growth-related genes LARS2 and LIMD1, the former of which causes ovarian dysfunction when mutated. The associated variants are expression and splicing quantitative trait loci regulating gene expression and alternative splicing in multiple tissues. Further, higher individual Native American genetic ancestry proportions predicted a significantly earlier puberty onset in boys but not in girls. Finally, the joint models identified a longitudinal BMI parameter significantly associated with several Tanner stage transitions, confirming the association of BMI with pubertal timing.
Subject(s)
Body Mass Index , Genome-Wide Association Study , Puberty , Humans , Male , Puberty/genetics , Female , Chile , Child , Adolescent , Single Nucleotide Polymorphism/genetics , Quantitative Trait Loci , Sexual Maturation/genetics , Cohort Studies , Obesity/genetics
ABSTRACT
Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide association studies (GWASs) are a powerful way to find genetic loci associated with phenotypes. GWASs are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix (local eGRM) given the ARG. Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to analyze two chromosomes containing known body size loci in a sample of Native Hawaiians. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.
Subject(s)
Population Genetics , Genome-Wide Association Study , Quantitative Trait Loci , Humans , Chromosome Mapping/methods , Genetic Models , Phenotype , Quantitative Trait Loci/genetics , Native Hawaiians and Other Pacific Islanders/genetics
ABSTRACT
Biomedical research now commonly integrates diverse data types or views from the same individuals to better understand the pathobiology of complex diseases, but the challenge lies in meaningfully integrating these diverse views. Existing methods often require the same type of data from all views (cross-sectional data only or longitudinal data only) or do not consider any class outcome in the integration method, which presents limitations. To overcome these limitations, we have developed a pipeline that harnesses the power of statistical and deep learning methods to integrate cross-sectional and longitudinal data from multiple sources. In addition, it identifies key variables that contribute to the association between views and the separation between classes, providing deeper biological insights. This pipeline includes variable selection/ranking using linear and nonlinear methods, feature extraction using functional principal component analysis and Euler characteristics, and joint integration and classification using dense feed-forward networks for cross-sectional data and recurrent neural networks for longitudinal data. We applied this pipeline to cross-sectional and longitudinal multiomics data (metagenomics, transcriptomics and metabolomics) from an inflammatory bowel disease (IBD) study and identified microbial pathways, metabolites and genes that discriminate by IBD status, providing information on the etiology of IBD. We conducted simulations to compare the two feature extraction methods.
Subject(s)
Deep Learning , Inflammatory Bowel Diseases , Humans , Cross-Sectional Studies , Inflammatory Bowel Diseases/classification , Inflammatory Bowel Diseases/genetics , Longitudinal Studies , Discriminant Analysis , Metabolomics/methods , Computational Biology/methods
ABSTRACT
Given the observed deterioration in mental health among Australians over the past decade, this study investigates to what extent this differs among people born in different decades, i.e., possible birth cohort differences in the mental health of Australians. Using 20 y of data from a large, nationally representative panel survey (N = 27,572), we find strong evidence that cohort effects are driving the increase in population-level mental ill-health. Deteriorating mental health is particularly pronounced among people born in the 1990s and seen to a lesser extent among the 1980s cohort. There is little evidence that mental health is worsening with age for people born prior to the 1980s. The findings from this study highlight that it is the poorer mental health of Millennials that is driving the apparent deterioration in population-level mental health. Understanding the context and changes in society that have differentially affected younger people may inform efforts to ameliorate this trend and prevent it from continuing for emerging cohorts.
Subject(s)
Mental Health , Humans , Australia/epidemiology , Surveys and Questionnaires
ABSTRACT
The climate crisis impairs the yield and quality of crucial crops like potatoes. We investigated the effects of heat stress on five morpho-physiological parameters in a diverse panel of 178 potato cultivars under glasshouse conditions. Overall, heat stress increased shoot elongation and green fresh weight, but reduced tuber yield, starch content and harvest index. Genomic information was obtained from 258 tetraploid and three diploid cultivars by a genotyping-by-sequencing approach using methylation-sensitive restriction enzymes. This resulted in an enrichment of sequences in gene-rich regions. Population structure analyses using genetic distances and hierarchical clustering revealed strong kinship but weak overall population structure among cultivars. A genome-wide association study (GWAS) was conducted with a subset of 20K stringently filtered SNPs to identify quantitative trait loci (QTL) linked to heat tolerance. We identified 67 QTL and established haploblock boundaries to narrow down the number of candidate genes. Additionally, GO-enrichment analyses provided insights into gene functions. Heritability estimation and genomic prediction were conducted to assess the usability of the collected data for selecting breeding material. The detected QTL might be exploited in marker-assisted selection to develop heat-resilient potato cultivars.
ABSTRACT
The proportion of variation in complex traits that can be attributed to non-additive genetic effects has been a topic of intense debate. The availability of biobank-scale datasets of genotype and trait data from unrelated individuals opens up the possibility of obtaining precise estimates of the contribution of non-additive genetic effects. We present an efficient method to estimate the variation in a complex trait that can be attributed to additive effects (additive heritability) and dominance deviations (dominance heritability) across all genotyped SNPs in a large collection of unrelated individuals. Over a wide range of genetic architectures, our method yields unbiased estimates of additive and dominance heritability. We applied our method to array genotypes as well as imputed genotypes (at common SNPs with minor allele frequency [MAF] > 1%) and 50 quantitative traits measured in 291,273 unrelated white British individuals in the UK Biobank. Averaged across these 50 traits, we find that additive heritability on array SNPs is 21.86% while dominance heritability is 0.13% (about 0.48% of the additive heritability), with qualitatively similar results for imputed genotypes. We find no statistically significant evidence for dominance heritability (p < 0.05/50, accounting for the number of traits tested) and estimate that dominance heritability is unlikely to exceed 1% for the traits analyzed. Our analyses indicate a limited contribution of dominance heritability to complex trait variation.
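As a rough illustration of how additive and dominance-deviation genotype encodings are typically constructed for this kind of variance-component analysis, here is a minimal Python sketch on simulated genotypes. This is not code from the study, and the exact dominance parameterization varies across papers; the version below makes the dominance term orthogonal to the additive term under Hardy-Weinberg equilibrium.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated genotypes: n individuals x m SNPs, coded 0/1/2 copies of the minor allele.
n, m = 500, 200
freqs = rng.uniform(0.05, 0.5, size=m)
geno = rng.binomial(2, freqs, size=(n, m))

# Additive encoding: standardized allele counts.
p = geno.mean(axis=0) / 2
add = (geno - 2 * p) / np.sqrt(2 * p * (1 - p))

# Dominance-deviation encoding (one common parameterization):
# genotype 0 -> -p/(1-p), 1 -> 1, 2 -> -(1-p)/p, then centered and scaled.
dom_raw = np.where(geno == 1, 1.0, np.where(geno == 0, -p / (1 - p), -(1 - p) / p))
dom = (dom_raw - dom_raw.mean(axis=0)) / dom_raw.std(axis=0)

# Genetic relatedness matrices: one per variance component (additive, dominance).
K_add = add @ add.T / m
K_dom = dom @ dom.T / m
```

Additive and dominance heritability would then be estimated by fitting a model with both K_add and K_dom as variance components, e.g. via Haseman-Elston regression or REML.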
Subject(s)
Biological Specimen Banks , Datasets as Topic , Dominant Genes/genetics , Genetic Variation , Multifactorial Inheritance/genetics , Female , Humans , Male , Genetic Models , Single Nucleotide Polymorphism/genetics
ABSTRACT
With the steadily increasing abundance of longitudinal neuroimaging studies with large sample sizes and multiple repeated measures, questions arise regarding the appropriate modeling of variance and covariance. The current study examined the influence of standard classes of variance-covariance structures in linear mixed effects (LME) modeling of fMRI data from patients with pediatric mild traumatic brain injury (pmTBI; N = 181) and healthy controls (N = 162). During two visits, participants performed a cognitive control fMRI paradigm that compared congruent and incongruent stimuli. The hemodynamic response function was parsed into peak and late peak phases. Data were analyzed with a 4-way (GROUP×VISIT×CONGRUENCY×PHASE) LME using AFNI's 3dLME and compound symmetry (CS), autoregressive process of order 1 (AR1), and unstructured (UN) variance-covariance matrices. Voxel-wise results varied dramatically both within the cognitive control network (UN>CS for the CONGRUENCY effect) and in broader brain regions (CS>UN for the GROUP×VISIT interaction) depending on the variance-covariance matrix that was selected. Additional testing indicated that both model fit and estimated standard error were superior for the UN matrix, likely as a result of the modeling of individual variance and covariance terms. In summary, current findings suggest that the interpretation of results from complex designs is highly dependent on the selection of the variance-covariance structure in LME modeling.
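To make the three variance-covariance classes concrete, here is a minimal Python sketch (illustrative only, not AFNI code) that builds CS and AR(1) covariance matrices for four repeated measures and counts the free parameters an unstructured matrix would estimate instead.

```python
import numpy as np

def cov_cs(sigma2, rho, t):
    """Compound symmetry: equal variance, equal correlation between any two time points."""
    C = np.full((t, t), rho)
    np.fill_diagonal(C, 1.0)
    return sigma2 * C

def cov_ar1(sigma2, rho, t):
    """AR(1): correlation decays as rho**|i-j| with time-point separation."""
    idx = np.arange(t)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

t = 4
cs = cov_cs(1.0, 0.5, t)
ar1 = cov_ar1(1.0, 0.5, t)

# An unstructured (UN) matrix estimates all variances and covariances freely:
# t*(t+1)/2 parameters (10 here) versus 2 for CS or AR(1).
n_un_params = t * (t + 1) // 2
```

The trade-off the abstract describes is visible in the parameter counts: UN is the most flexible but the most expensive, while CS and AR(1) impose strong structure that may be wrong for a given design.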
Subject(s)
Magnetic Resonance Imaging , Humans , Male , Female , Adolescent , Child , Brain Concussion/diagnostic imaging , Brain Concussion/physiopathology , Linear Models , Brain/diagnostic imaging , Brain/physiology , Brain Mapping/methods , Executive Function/physiology
ABSTRACT
The linear mixed-effects model (LME) is a versatile approach to account for dependence among observations. Many large-scale neuroimaging datasets with complex designs have increased the need for LME; however, LME has seldom been used in whole-brain imaging analyses due to its heavy computational requirements. In this paper, we introduce a fast and efficient mixed-effects algorithm (FEMA) that makes whole-brain vertex-wise, voxel-wise, and connectome-wide LME analyses in large samples possible. We validate FEMA with extensive simulations, showing that the estimates of the fixed effects are equivalent to standard maximum likelihood estimates but obtained with orders-of-magnitude improvement in computational speed. We demonstrate the applicability of FEMA by studying the cross-sectional and longitudinal effects of age on region-of-interest level and vertex-wise cortical thickness, as well as connectome-wide functional connectivity values derived from resting-state functional MRI, using longitudinal imaging data from the Adolescent Brain Cognitive Development (ABCD) Study release 4.0. Our analyses reveal distinct spatial patterns for the annualized changes in vertex-wise cortical thickness and connectome-wide connectivity values in early adolescence, highlighting a critical time of brain maturation. The simulations and application to real data show that FEMA enables advanced investigation of the relationships between large numbers of neuroimaging metrics and variables of interest while considering complex study designs, including repeated measures and family structures, in a fast and efficient manner. The source code for FEMA is available at https://github.com/cmig-research-group/cmig_tools/.
Subject(s)
Connectome , Magnetic Resonance Imaging , Adolescent , Humans , Magnetic Resonance Imaging/methods , Cross-Sectional Studies , Brain/diagnostic imaging , Neuroimaging/methods , Connectome/methods , Algorithms
ABSTRACT
With the advances in high-throughput biotechnologies, high-dimensional multi-layer omics data become increasingly available. They can provide both confirmatory and complementary information to disease risk and thus have offered unprecedented opportunities for risk prediction studies. However, the high dimensionality and complex inter/intra-relationships among multi-omics data have brought tremendous analytical challenges. Here we present a computationally efficient penalized linear mixed model with a generalized method of moments estimator (MpLMMGMM) for prediction analysis on multi-omics data. Our method extends the widely used linear mixed model proposed for genomic risk predictions to model multi-omics data, where kernel functions are used to capture various types of predictive effects from different layers of omics data and penalty terms are introduced to reduce the impact of noise. Compared with existing penalized linear mixed models, the proposed method adopts the generalized method of moments estimator, making it much more computationally efficient. Through extensive simulation studies and the analysis of positron emission tomography imaging outcomes, we have demonstrated that MpLMMGMM can simultaneously consider a large number of variables and efficiently select those that are predictive from the corresponding omics layers. It can capture both linear and nonlinear predictive effects and achieves better prediction performance than competing methods.
Subject(s)
Algorithms , Genomics , Genome , Genomics/methods , Linear Models , Research Design
ABSTRACT
Pseudoreplication compromises the validity of research by treating non-independent samples as independent replicates. This review examines the prevalence of pseudoreplication in host-microbiota studies, highlighting the critical need for rigorous experimental design and appropriate statistical analysis. We systematically reviewed 115 manuscripts on host-microbiota interactions. Our analysis revealed that 22% of the papers contained pseudoreplication, primarily due to co-housed organisms, whereas 52% lacked sufficient methodological details. The remaining 26% adequately addressed pseudoreplication through proper experimental design or statistical analysis. The high incidence of pseudoreplication and insufficient information underscores the importance of methodological reporting and statistical rigor to ensure reproducibility of host-microbiota research.
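A minimal simulated example of why co-housing causes pseudoreplication (a Python sketch with assumed variance values, not data from the review): a shared cage effect correlates co-housed animals, so a naive standard error that treats every animal as an independent replicate is smaller than the cluster-aware one computed at the level of the cage, the true unit of replication.

```python
import numpy as np

rng = np.random.default_rng(3)
n_cages, mice_per_cage = 10, 5

# Shared cage effect induces correlation among co-housed mice.
cage_effect = rng.normal(scale=1.0, size=n_cages)
y = np.repeat(cage_effect, mice_per_cage) + rng.normal(scale=0.5, size=n_cages * mice_per_cage)

# Naive SE treats all 50 mice as independent replicates (pseudoreplication).
se_naive = y.std(ddof=1) / np.sqrt(len(y))

# Cluster-aware SE uses cage means: the cage, not the mouse, is the replicate.
cage_means = y.reshape(n_cages, mice_per_cage).mean(axis=1)
se_cluster = cage_means.std(ddof=1) / np.sqrt(n_cages)
```

With a non-trivial cage effect, se_naive understates the true uncertainty, which is how pseudoreplication inflates false-positive rates.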
Subject(s)
Host-Microbiota Interactions , Microbiota , Animals , Humans , Reproducibility of Results , Research Design
ABSTRACT
PURPOSE: To investigate whether intraocular pressure (IOP) fluctuation is associated independently with the rate of visual field (VF) progression in the United Kingdom Glaucoma Treatment Study. DESIGN: Randomized, double-masked, placebo-controlled multicenter trial. PARTICIPANTS: Participants with ≥5 VFs (213 placebo, 217 treatment). METHODS: Associations between IOP metrics and VF progression rates (mean deviation [MD] and five fastest locations) were assessed with linear mixed models. Fluctuation variables were mean Pascal ocular pulse amplitude (OPA), standard deviation (SD) of diurnal Goldmann IOP (diurnal fluctuation), and SD of Goldmann IOP at all visits (long-term fluctuation). Fluctuation values were normalized for mean IOP to make them independent of the mean IOP. Correlated nonfluctuation IOP metrics (baseline, peak, mean, supine, and peak phasing IOP) were combined with principal component analysis, and principal component 1 (PC1) was included as a covariate. Interactions between covariates and time from baseline modeled the effect of the variables on VF rates. Analyses were conducted separately in the two treatment arms. MAIN OUTCOME MEASURES: Associations between IOP fluctuation metrics and rates of MD and the five fastest test locations. RESULTS: In the placebo arm, only PC1 was associated significantly with the MD rate (estimate, -0.19 dB/year [standard error (SE), 0.04 dB/year]; P < 0.001), whereas normalized IOP fluctuation metrics were not. No variable was associated significantly with MD rates in the treatment arm. For the fastest five locations in the placebo group, PC1 (estimate, -0.58 dB/year [SE, 0.16 dB/year]; P < 0.001), central corneal thickness (estimate, 0.26 dB/year [SE, 0.10 dB/year] for 10 µm thicker; P = 0.01) and normalized OPA (estimate, -3.50 dB/year [SE, 1.04 dB/year]; P = 0.001) were associated with rates of progression; normalized diurnal and long-term IOP fluctuations were not.
In the treatment group, only PC1 (estimate, -0.27 dB/year [SE, 0.12 dB/year]; P = 0.028) was associated with the rates of progression. CONCLUSIONS: There is no evidence that either diurnal or long-term IOP fluctuation, as measured in clinical practice, is an independent factor for glaucoma progression; other aspects of IOP, including mean IOP and peak IOP, may be more informative. Ocular pulse amplitude may be an independent factor for faster glaucoma progression. FINANCIAL DISCLOSURE(S): Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
Subject(s)
Antihypertensive Agents , Disease Progression , Open-Angle Glaucoma , Intraocular Pressure , Ocular Tonometry , Visual Fields , Humans , Intraocular Pressure/physiology , Visual Fields/physiology , Double-Blind Method , Antihypertensive Agents/therapeutic use , Male , Female , Aged , Open-Angle Glaucoma/physiopathology , Open-Angle Glaucoma/drug therapy , United Kingdom , Middle Aged , Visual Field Tests , Vision Disorders/physiopathology , Latanoprost/therapeutic use , Circadian Rhythm/physiology
ABSTRACT
Linear mixed models (LMMs) are a commonly used method for genome-wide association studies (GWAS) that aim to detect associations between genetic markers and phenotypic measurements in a population of individuals while accounting for population structure and cryptic relatedness. In a standard GWAS, hundreds of thousands to millions of statistical tests are performed, requiring control for multiple hypothesis testing. Typically, static corrections that penalize the number of tests performed are used to control for the family-wise error rate, which is the probability of making at least one false positive. However, it has been shown that in practice this threshold is too conservative for normally distributed phenotypes and not stringent enough for non-normally distributed phenotypes. Therefore, permutation-based LMM approaches have recently been proposed to provide a more realistic threshold that takes phenotypic distributions into account. In this work, we discuss the advantages of permutation-based GWAS approaches, including new simulations and results from a re-analysis of all publicly available Arabidopsis phenotypes from the AraPheno database.
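A stripped-down Python sketch of the permutation idea (illustrative only; it uses simple marker-phenotype correlations and ignores the relatedness correction an LMM would provide): the phenotype is permuted, the genome-wide maximum statistic is recorded each time, and the 95th percentile of those maxima gives a 5% family-wise error threshold that adapts to the phenotype's distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 300, 50          # individuals, markers (small for illustration)
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
y = rng.normal(size=n)  # null phenotype

def max_abs_corr(y, G):
    """Largest absolute marker-phenotype correlation, a simple association statistic."""
    Gs = (G - G.mean(axis=0)) / G.std(axis=0)
    ys = (y - y.mean()) / y.std()
    return np.max(np.abs(Gs.T @ ys) / len(y))

# Permutation null: shuffle the phenotype, record the genome-wide maximum statistic.
n_perm = 200
null_max = np.array([max_abs_corr(rng.permutation(y), G) for _ in range(n_perm)])
threshold = np.quantile(null_max, 0.95)  # 5% family-wise error threshold
```

Because the maximum is taken across all markers in each permutation, the threshold automatically accounts for the correlation among tests, unlike a static Bonferroni correction.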
Subject(s)
Arabidopsis , Genome-Wide Association Study , Phenotype , Arabidopsis/genetics , Genetic Models , Linear Models , Computer Simulation
ABSTRACT
Modern biomedical datasets are increasingly high-dimensional and exhibit complex correlation structures. Generalized linear mixed models (GLMMs) have long been employed to account for such dependencies. However, proper specification of the fixed and random effects in GLMMs is increasingly difficult in high dimensions, and computational complexity grows with increasing dimension of the random effects. We present a novel reformulation of the GLMM using a factor model decomposition of the random effects, enabling scalable computation of GLMMs in high dimensions by reducing the latent space from a large number of random effects to a smaller set of latent factors. We also extend our prior work to estimate model parameters using a modified Monte Carlo Expectation Conditional Minimization algorithm, allowing us to perform variable selection on both the fixed and random effects simultaneously. We show through simulation that through this factor model decomposition, our method can fit high-dimensional penalized GLMMs faster than comparable methods and more easily scale to larger dimensions not previously seen in existing approaches.
Subject(s)
Algorithms , Computer Simulation , Linear Models , Monte Carlo Method
ABSTRACT
In many medical studies, the outcome measure (such as quality of life, QOL) for some study participants becomes informatively truncated (censored, missing, or unobserved) due to death or other forms of dropout, creating a nonignorable missing data problem. In such cases, the use of a composite outcome or imputation methods that fill in unmeasurable QOL values for those who died rely on strong and untestable assumptions and may be conceptually unappealing to certain stakeholders when estimating a treatment effect. The survivor average causal effect (SACE) is an alternative causal estimand that surmounts some of these issues. While principal stratification has been applied to estimate the SACE in individually randomized trials, methods for estimating the SACE in cluster-randomized trials are currently limited. To address this gap, we develop a mixed model approach along with an expectation-maximization algorithm to estimate the SACE in cluster-randomized trials. We model the continuous outcome measure with a random intercept to account for intracluster correlations due to cluster-level randomization, and model the principal strata membership both with and without a random intercept. In simulations, we compare the performance of our approaches with an existing fixed-effects approach to illustrate the importance of accounting for clustering in cluster-randomized trials. The methodology is then illustrated using a cluster-randomized trial of telecare and assistive technology on health-related QOL in the elderly.
Subject(s)
Statistical Models , Quality of Life , Humans , Aged , Randomized Controlled Trials as Topic , Outcome Assessment in Health Care , Survivors
ABSTRACT
When analyzing multivariate longitudinal binary data, we estimate the effects of covariates on the responses while accounting for three types of complex correlations present in the data. These include the correlations within separate responses over time, cross-correlations between different responses at different times, and correlations between different responses at each time point. The number of parameters thus increases quadratically with the dimension of the correlation matrix, making parameter estimation difficult; the estimated correlation matrix must also meet the positive definiteness constraint. The correlation matrix may additionally be heteroscedastic; however, the matrix structure is commonly assumed to be homoscedastic and constrained, such as exchangeable or autoregressive of order one. These assumptions are overly strong, resulting in biased estimates of the covariate effects on the responses. Hence, we propose probit linear mixed models for multivariate longitudinal binary data, where the correlation matrix is estimated using hypersphere decomposition instead of the strong assumptions noted above. Simulations and real examples are used to demonstrate the proposed methods. An open-source R package, BayesMGLM, is made available on GitHub at https://github.com/kuojunglee/BayesMGLM/ with full documentation to produce the results.
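A minimal Python sketch of the hypersphere decomposition idea (illustrative only, not the BayesMGLM implementation): writing the Cholesky-like factor of the correlation matrix in spherical coordinates means any set of angles in (0, π) maps to a valid positive-definite correlation matrix with unit diagonal, which is what makes the parameterization attractive for unconstrained estimation.

```python
import numpy as np

def corr_from_angles(theta):
    """Build a correlation matrix R = B @ B.T from angles in (0, pi).

    theta is a list of rows: row i holds the i angles parameterizing row i+1
    of the lower-triangular factor B, whose rows lie on the unit sphere.
    """
    t = len(theta) + 1
    B = np.zeros((t, t))
    B[0, 0] = 1.0
    for i in range(1, t):
        ang = np.asarray(theta[i - 1], dtype=float)
        prod_sin = 1.0
        for j in range(i):
            B[i, j] = np.cos(ang[j]) * prod_sin
            prod_sin *= np.sin(ang[j])
        B[i, i] = prod_sin  # remaining product of sines keeps the row norm at 1
    return B @ B.T

theta = [[0.8], [1.2, 0.6], [0.4, 1.0, 1.5]]  # arbitrary angles for a 4x4 matrix
R = corr_from_angles(theta)
```

Each row of B has unit norm by construction, so the diagonal of R is exactly 1, and positive sines keep B full rank, so R is positive definite for any interior angles.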
Subject(s)
Linear Models , Humans
ABSTRACT
Analyzing longitudinal data in health studies is challenging due to sparse and error-prone measurements, strong within-individual correlation, missing data and various trajectory shapes. While mixed-effects models (MM) effectively address these challenges, they remain parametric models and may incur computational costs. In contrast, functional principal component analysis (FPCA) is a non-parametric approach developed for regular and dense functional data that flexibly describes temporal trajectories at a potentially lower computational cost. This article presents an empirical simulation study evaluating the behavior of FPCA with sparse and error-prone repeated measures and its robustness under different missing data schemes in comparison with MM. The results show that FPCA is well-suited in the presence of missing-at-random data caused by dropout, except in scenarios involving the most frequent and systematic dropout. Like MM, FPCA fails under a missing-not-at-random mechanism. FPCA was applied to describe the trajectories of four cognitive functions before clinical dementia and to contrast them with those of matched controls in a case-control study nested in a population-based aging cohort. The average cognitive declines of future dementia cases showed a sudden divergence from those of their matched controls, with a sharp acceleration 5 to 2.5 years prior to diagnosis.
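On a dense, regular grid, FPCA reduces to an eigendecomposition of the sample covariance of the curves. A minimal Python sketch on simulated trajectories follows (illustrative only; handling sparse, error-prone measurements as in the article requires additional machinery such as covariance smoothing):

```python
import numpy as np

rng = np.random.default_rng(2)
n, t = 200, 20
grid = np.linspace(0, 1, t)

# Simulated trajectories: mean curve plus two smooth modes of variation and noise.
mean_curve = np.sin(2 * np.pi * grid)
phi1, phi2 = np.cos(np.pi * grid), np.cos(2 * np.pi * grid)
scores = rng.normal(scale=[2.0, 0.7], size=(n, 2))
X = mean_curve + scores @ np.vstack([phi1, phi2]) + rng.normal(scale=0.1, size=(n, t))

# FPCA on a dense regular grid is PCA of the sample covariance of the curves:
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (n - 1)
evals, evecs = np.linalg.eigh(cov)
evals, evecs = evals[::-1], evecs[:, ::-1]   # sort eigenvalues descending
var_explained = evals / evals.sum()
```

With two dominant simulated modes, the first two eigenfunctions should capture most of the variance; the individual scores (projections of centered curves onto the eigenfunctions) then summarize each trajectory in a few numbers.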
Subject(s)
Computer Simulation , Statistical Models , Principal Component Analysis , Humans , Longitudinal Studies , Dementia , Case-Control Studies , Statistical Data Interpretation
ABSTRACT
Longitudinal data from clinical trials are commonly analyzed using mixed models for repeated measures (MMRM) when the time variable is categorical, or linear mixed-effects models (i.e., random-effects models) when the time variable is continuous. In these models, statistical inference is typically based on the absolute difference in the adjusted mean change (for categorical time) or the rate of change (for continuous time). Previously, we proposed a novel approach: modeling the percentage reduction in disease progression associated with the treatment relative to the placebo decline using proportional models. This concept of proportionality provides an innovative and flexible method for simultaneously modeling different cohorts, multivariate endpoints, and jointly modeling continuous and survival endpoints. Through simulated data, we demonstrate the implementation of these models using SAS procedures in both frequentist and Bayesian approaches. Additionally, we introduce a novel method for implementing MMRM models (i.e., analysis of response profiles) using the NLMIXED procedure.
Subject(s)
Bayes Theorem , Clinical Trials as Topic , Computer Simulation , Statistical Models , Humans , Longitudinal Studies , Clinical Trials as Topic/methods , Nonlinear Dynamics , Proportional Hazards Models , Statistical Data Interpretation
ABSTRACT
The stepped wedge design is a popular research design that enables a rigorous evaluation of candidate interventions by using a staggered cluster randomization strategy. While analytical methods have been developed for designing stepped wedge trials, the prior focus has been solely on testing for the average treatment effect. With growing interest in formal evaluation of the heterogeneity of treatment effects across patient subpopulations, trial planning efforts need appropriate methods to accurately identify sample sizes or design configurations that can generate evidence for both the average treatment effect and variations in subgroup treatment effects. To fill this important gap, this article derives novel variance formulas for confirmatory analyses of treatment effect heterogeneity that are applicable to both cross-sectional and closed-cohort stepped wedge designs. We additionally point out that the same framework can be used for more efficient average treatment effect analyses via covariate adjustment, and allows the use of familiar power formulas for average treatment effect analyses to proceed. Our results further shed light on optimal allocations of clusters to maximize the weighted precision for assessing both the average and heterogeneous treatment effects. We apply the new methods to the Lumbar Imaging with Reporting of Epidemiology Trial, and carry out a simulation study to validate our new methods.
Subject(s)
Research Design , Treatment Effect Heterogeneity , Humans , Cross-Sectional Studies , Randomized Controlled Trials as Topic , Computer Simulation , Sample Size , Cluster Analysis
ABSTRACT
Postmarket drug safety databases like the Vaccine Adverse Event Reporting System (VAERS) collect thousands of spontaneous reports annually, with each report recording occurrences of any adverse events (AEs) and use of vaccines. Using such data, we aim to identify signal vaccine-AE pairs, for which certain vaccines are statistically associated with certain AEs. Thus, the outcomes of interest are multiple AEs, which are binary outcomes that could be correlated because they might share certain latent factors; the primary covariates are vaccines. Appropriately accounting for the complex correlation among AEs could improve the sensitivity and specificity of identifying signal vaccine-AE pairs. We propose a two-step approach in which we first estimate the shared latent factors among AEs using a working multivariate logistic regression model, and then use univariate logistic regression models to examine the vaccine-AE associations after controlling for the latent factors. Our simulation studies show that this approach outperforms current approaches in terms of sensitivity and specificity. We apply our approach to VAERS data and report our findings.
Subject(s)
Adverse Drug Reaction Reporting Systems , Vaccines , Humans , United States , Vaccines/adverse effects , Factual Databases , Computer Simulation , Software
ABSTRACT
Statistical models with random intercepts and slopes (RIAS models) are commonly used to analyze longitudinal data. Fitting such models sometimes results in negative estimates of variance components or estimates on parameter space boundaries. This can be an unlucky chance occurrence, but can also occur because certain marginal distributions are mathematically identical to those from RIAS models with negative intercept and/or slope variance components and/or intercept-slope correlations greater than one in magnitude. We term such parameters "pseudo-variances" and "pseudo-correlations," and the models "non-regular." We use eigenvalue theory to explore how and when such non-regular RIAS models arise, showing: (i) A small number of measurements, short follow-up, and large residual variance increase the parameter space for which data (with a positive semidefinite marginal variance-covariance matrix) are compatible with non-regular RIAS models. (ii) Non-regular RIAS models can arise from model misspecification, when non-linearity in fixed effects is ignored or when random effects are omitted. (iii) A non-regular RIAS model can sometimes be interpreted as a regular linear mixed model with one or more additional random effects, which may not be identifiable from the data. (iv) Particular parameterizations of non-regular RIAS models have no generality for all possible numbers of measurements over time. Because of this lack of generality, we conclude that non-regular RIAS models can only be regarded as plausible data-generating mechanisms in some situations. Nevertheless, fitting a non-regular RIAS model can be acceptable, allowing unbiased inference on fixed effects where commonly recommended alternatives such as dropping the random slope result in bias.
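A small Python sketch of the key observation (illustrative only, with arbitrary parameter values): the marginal covariance implied by a random-intercept-and-slope model, Var(y) = Z G Z' + σ²I, can remain positive definite even when the slope "variance" component is negative, because a sufficiently large residual variance dominates the negative contribution.

```python
import numpy as np

times = np.arange(4, dtype=float)                   # 4 measurement occasions
Z = np.column_stack([np.ones_like(times), times])   # random intercept + slope design

def marginal_cov(var_int, var_slope, corr, var_resid):
    """Implied marginal covariance Var(y) = Z G Z' + sigma^2 I for a RIAS model.

    abs() in the sqrt is a sketch-level convenience so that a negative
    pseudo-variance does not produce NaN; here corr=0 so it is inert.
    """
    cov = corr * np.sqrt(abs(var_int) * abs(var_slope))
    G = np.array([[var_int, cov], [cov, var_slope]])
    return Z @ G @ Z.T + var_resid * np.eye(len(times))

# A negative slope pseudo-variance with a large residual variance still yields
# a positive-definite marginal covariance: a valid data-generating distribution
# even though G itself is not a valid random-effects covariance matrix.
V = marginal_cov(var_int=1.0, var_slope=-0.05, corr=0.0, var_resid=2.0)
eigs = np.linalg.eigvalsh(V)
```

This mirrors the paper's point (iii): the same marginal distribution could also arise from a regular mixed model with different random effects, so the non-regular parameterization is not identifiable from the marginal covariance alone.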