Results 1 - 20 of 221
1.
Proc Natl Acad Sci U S A ; 120(12): e2216218120, 2023 03 21.
Article in English | MEDLINE | ID: mdl-36927152

ABSTRACT

The concept of fitness is central to evolution, but it quantifies only the expected number of offspring an individual will produce. The actual number of offspring is also subject to demographic stochasticity, that is, randomness associated with birth and death processes. In nature, individuals who are more fecund tend to have greater variance in their offspring number. Here, we develop a model for the evolution of two types competing in a population of nonconstant size. The fitness of each type is determined by pairwise interactions in a prisoner's dilemma game, and the variance in offspring number depends upon its mean. Although defectors are preferred by natural selection in classical population models, since they always have greater fitness than cooperators, we show that sufficiently large offspring variance can reverse the direction of evolution and favor cooperation. Large offspring variance produces qualitatively new dynamics for other types of social interactions, as well, which cannot arise in populations with a fixed size or with a Poisson offspring distribution.


Subject(s)
Cooperative Behavior , Game Theory , Humans , Population Dynamics , Population Density , Selection, Genetic
2.
Genet Epidemiol ; 48(4): 151-163, 2024 06.
Article in English | MEDLINE | ID: mdl-38379245

ABSTRACT

Phenotypic heterogeneity at genomic loci encoding drug targets can be exploited by multivariable Mendelian randomization to provide insight into the pathways by which pharmacological interventions may affect disease risk. However, statistical inference in such investigations may be poor if overdispersion heterogeneity in measured genetic associations is unaccounted for. In this work, we first develop conditional F statistics for dimension-reduced genetic associations that enable more accurate measurement of phenotypic heterogeneity. We then develop a novel extension for two-sample multivariable Mendelian randomization that accounts for overdispersion heterogeneity in dimension-reduced genetic associations. Our empirical focus is to use genetic variants in the GLP1R gene region to understand the mechanism by which GLP1R agonism affects coronary artery disease (CAD) risk. Colocalization analyses indicate that distinct variants in the GLP1R gene region are associated with body mass index and type 2 diabetes (T2D). Multivariable Mendelian randomization analyses that were corrected for overdispersion heterogeneity suggest that bodyweight lowering rather than T2D liability lowering effects of GLP1R agonism are more likely contributing to reduced CAD risk. Tissue-specific analyses prioritized brain tissue as the most likely to be relevant for CAD risk, of the tissues considered. We hope the multivariable Mendelian randomization approach illustrated here is widely applicable to better understand mechanisms linking drug targets to disease outcomes, and hence to guide drug development efforts.


Subject(s)
Body Mass Index , Coronary Artery Disease , Diabetes Mellitus, Type 2 , Glucagon-Like Peptide-1 Receptor , Mendelian Randomization Analysis , Phenotype , Humans , Glucagon-Like Peptide-1 Receptor/genetics , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/drug therapy , Coronary Artery Disease/genetics , Coronary Artery Disease/drug therapy , Polymorphism, Single Nucleotide , Genetic Predisposition to Disease
3.
Genet Epidemiol ; 47(5): 379-393, 2023 07.
Article in English | MEDLINE | ID: mdl-37042632

ABSTRACT

Variation in RNA-Seq data creates modeling challenges for differential gene expression (DE) analysis. Statistical approaches address conventional small sample sizes and implement empirical Bayes or non-parametric tests, but frequently produce different conclusions. Increasing sample sizes enable proposal of alternative DE paradigms. Here we develop RoPE, which uses a data-driven adjustment for variation and a robust profile likelihood ratio DE test. Simulation studies show RoPE can have improved performance over existing tools as sample size increases and has the most reliable control of error rates. Application of RoPE demonstrates that an active Pseudomonas aeruginosa infection downregulates the SLC9A3 Cystic Fibrosis modifier gene.


Subject(s)
Gene Expression Profiling , Models, Genetic , Humans , Likelihood Functions , Gene Expression Profiling/methods , Bayes Theorem , Computer Simulation
4.
Biometrics ; 80(2)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38682464

ABSTRACT

The current Poisson factor models often assume that the factors are unknown, which overlooks the explanatory potential of certain observable covariates. This study focuses on high dimensional settings, where the number of the count response variables and/or covariates can diverge as the sample size increases. A covariate-augmented overdispersed Poisson factor model is proposed to jointly perform a high-dimensional Poisson factor analysis and estimate a large coefficient matrix for overdispersed count data. A group of identifiability conditions is provided to theoretically guarantee computational identifiability. We incorporate the interdependence of both response variables and covariates by imposing a low-rank constraint on the large coefficient matrix. To address the computation challenges posed by nonlinearity, two high-dimensional latent matrices, and the low-rank constraint, we propose a novel variational estimation scheme that combines Laplace and Taylor approximations. We also develop a criterion based on a singular value ratio to determine the number of factors and the rank of the coefficient matrix. Comprehensive simulation studies demonstrate that the proposed method outperforms the state-of-the-art methods in estimation accuracy and computational efficiency. The practical merit of our method is demonstrated by an application to the CITE-seq dataset. A flexible implementation of our proposed method is available in the R package COAP.


Subject(s)
Computer Simulation , Models, Statistical , Poisson Distribution , Humans , Sample Size , Biometry/methods , Factor Analysis, Statistical
5.
Stat Med ; 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39237124

ABSTRACT

The current high-dimensional linear factor models fail to account for the different types of variables, while high-dimensional nonlinear factor models often overlook the overdispersion present in mixed-type data. However, overdispersion is prevalent in practical applications, particularly in fields like biomedical and genomics studies. To address this practical demand, we propose an overdispersed generalized factor model (OverGFM) for performing high-dimensional nonlinear factor analysis on overdispersed mixed-type data. Our approach incorporates an additional error term to capture the overdispersion that cannot be accounted for by factors alone. However, this introduces significant computational challenges due to the involvement of two high-dimensional latent random matrices in the nonlinear model. To overcome these challenges, we propose a novel variational EM algorithm that integrates Laplace and Taylor approximations. This algorithm provides iterative explicit solutions for the complex variational parameters and is proven to possess excellent convergence properties. We also develop a criterion based on the singular value ratio to determine the optimal number of factors. Numerical results demonstrate the effectiveness of this criterion. Through comprehensive simulation studies, we show that OverGFM outperforms state-of-the-art methods in terms of estimation accuracy and computational efficiency. Furthermore, we demonstrate the practical merit of our method through its application to two datasets from genomics. To facilitate its usage, we have integrated the implementation of OverGFM into the R package GFM.

6.
J Anim Ecol ; 93(4): 501-516, 2024 04.
Article in English | MEDLINE | ID: mdl-38409804

ABSTRACT

Tropical rainforest trees host a diverse arthropod fauna that can be characterised by their functional diversity (FD) and phylogenetic diversity (PD). Human disturbance degrades tropical forests, often coinciding with species invasion and altered assembly that leads to a decrease in FD and PD. Tree canopies are thought to be particularly vulnerable, but rarely investigated. Here, we studied the effects of forest disturbance on an ecologically important invertebrate group, the ants, in a lowland rainforest in New Guinea. We compared an early successional disturbed plot (secondary forest) to an old-growth plot (primary forest) by exhaustively sampling their ant communities in a total of 852 trees. We expected that for each tree community (1) disturbance would decrease FD and PD in tree-dwelling ants, mediated through species invasion. (2) Disturbance would decrease ant trait variation due to a more homogeneous environment. (3) The main drivers behind these changes would be different contributions of true tree-nesting species and visiting species. We calculated FD and PD based on a species-level phylogeny and 10 ecomorphological traits. Furthermore, we assessed by data exclusion the influence of species, which were not nesting in individual trees (visitors) or only nesting species (nesters), and of non-native species on FD and PD. Primary forests had higher ant species richness and PD than secondary forest. However, we consistently found increased FD in secondary forest. This pattern was robust even if we decoupled functional and phylogenetic signals, or if non-native ant species were excluded from the data. Visitors did not contribute strongly to FD, but they increased PD and their community weighted trait means often varied from nesters. Moreover, all community-weighted trait means changed after forest disturbance. Our finding of contradictory FD and PD patterns highlights the importance of integrative measures of diversity. 
Our results indicate that the tree community trait diversity is not negatively affected, but possibly even enhanced by disturbance. Therefore, the functional diversity of arboreal ants is relatively robust when compared between old-growth and young trees. However, further study with higher plot-replication is necessary to solidify and generalise our findings.


Subject(s)
Ants , Biodiversity , Humans , Animals , Phylogeny , Forests , Rainforest , Ecosystem
7.
Proc Natl Acad Sci U S A ; 118(17)2021 04 27.
Article in English | MEDLINE | ID: mdl-33833080

ABSTRACT

Epidemics generally spread through a succession of waves that reflect factors on multiple timescales. On short timescales, superspreading events lead to burstiness and overdispersion, whereas long-term persistent heterogeneity in susceptibility is expected to lead to a reduction in both the infection peak and the herd immunity threshold (HIT). Here, we develop a general approach to encompass both timescales, including time variations in individual social activity, and demonstrate how to incorporate them phenomenologically into a wide class of epidemiological models through reparameterization. We derive a nonlinear dependence of the effective reproduction number [Formula: see text] on the susceptible population fraction S. We show that a state of transient collective immunity (TCI) emerges well below the HIT during early, high-paced stages of the epidemic. However, this is a fragile state that wanes over time due to changing levels of social activity, and so the infection peak is not an indication of long-lasting herd immunity: Subsequent waves may emerge due to behavioral changes in the population, driven by, for example, seasonal factors. Transient and long-term levels of heterogeneity are estimated using empirical data from the COVID-19 epidemic and from real-life face-to-face contact networks. These results suggest that the hardest hit areas, such as New York City, have achieved TCI following the first wave of the epidemic, but likely remain below the long-term HIT. Thus, in contrast to some previous claims, these regions can still experience subsequent waves.


Subject(s)
COVID-19 , Epidemics , Immunity, Herd , Models, Immunological , SARS-CoV-2/immunology , COVID-19/epidemiology , COVID-19/immunology , COVID-19/transmission , Humans , United States/epidemiology
8.
Proc Natl Acad Sci U S A ; 118(14)2021 04 06.
Article in English | MEDLINE | ID: mdl-33741734

ABSTRACT

Increasing evidence indicates that superspreading plays a dominant role in COVID-19 transmission. Recent estimates suggest that the dispersion parameter k for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is on the order of 0.1, which corresponds to about 10% of cases being the source of 80% of infections. To investigate how overdispersion might affect the outcome of various mitigation strategies, we developed an agent-based model with a social network that allows transmission through contact in three sectors: "close" (a small, unchanging group of mutual contacts as might be found in a household), "regular" (a larger, unchanging group as might be found in a workplace or school), and "random" (drawn from the entire model population and not repeated regularly). We assigned individual infectivity from a gamma distribution with dispersion parameter k. We found that when k was low (i.e., greater heterogeneity, more superspreading events), reducing random sector contacts had a far greater impact on the epidemic trajectory than did reducing regular contacts; when k was high (i.e., less heterogeneity, no superspreading events), that difference disappeared. These results suggest that overdispersion of COVID-19 transmission gives the virus an Achilles' heel: Reducing contacts between people who do not regularly meet would substantially reduce the pandemic, while reducing repeated contacts in defined social groups would be less effective.


Subject(s)
COVID-19/epidemiology , COVID-19/transmission , Contact Tracing/statistics & numerical data , Models, Statistical , Pandemics , Physical Distancing , Age Factors , COVID-19/prevention & control , COVID-19/virology , Computer Simulation , Humans , Quarantine/statistics & numerical data , SARS-CoV-2/pathogenicity , SARS-CoV-2/physiology , Social Networking
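
The link quoted in the abstract of entry 8 between a dispersion parameter k near 0.1 and roughly 10% of cases causing 80% of infections can be checked with a small simulation. The sketch below is not the authors' agent-based model: it assumes a plain gamma-Poisson (negative binomial) offspring distribution, a mean of 2.5 secondary cases, and hypothetical helper names (`poisson`, `share_causing`), all chosen only for illustration.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def share_causing(fraction_of_infections, k, r0, n_cases=50_000, seed=1):
    """Smallest share of cases that accounts for the given fraction of
    secondary infections, when each case's expected number of secondary
    cases is Gamma(shape=k, scale=r0/k) and the realised number is Poisson."""
    rng = random.Random(seed)
    offspring = [poisson(rng.gammavariate(k, r0 / k), rng) for _ in range(n_cases)]
    offspring.sort(reverse=True)  # most infectious cases first
    target = fraction_of_infections * sum(offspring)
    running = 0
    for i, n in enumerate(offspring, start=1):
        running += n
        if running >= target:
            return i / n_cases
    return 1.0


# r0 = 2.5 is an assumed illustrative value, not taken from the abstract.
low_k = share_causing(0.8, k=0.1, r0=2.5)
high_k = share_causing(0.8, k=10.0, r0=2.5)
print(f"k = 0.1: {low_k:.1%} of cases cause 80% of infections")
print(f"k = 10:  {high_k:.1%} of cases cause 80% of infections")
```

Under these assumptions the low-k share should land in the neighbourhood of the 10% figure, while the high-k share is several times larger, which is the contrast the abstract draws between strong and weak heterogeneity.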
9.
Risk Anal ; 44(8): 1839-1849, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38331570

ABSTRACT

Biological invasions are a growing threat to biodiversity, food security, and economies. Rising pressure from increased global trade requires improving border inspection efficiency. Here, we depart from the conventional consignment-by-consignment approach advocated in current inspection standards. Instead, we suggest a broader perspective: evaluating border inspection regimes based on their ability to reduce propagule pressure across entire pathways. Additionally, we demonstrate that most biosecurity pathways exhibit superspreading behavior, that is, consignments from the same pathway have varying infestation rates and contain rare right-tail events (also called overdispersion). We show that greater overdispersion leads to more pronounced diminishing returns, with consequences on the optimal allocation of sampling effort. We leverage these two insights to develop a simple and efficient border inspection regime that can significantly reduce propagule pressure compared to current standards. Our analysis revealed that consignment size is a key driver of biosecurity risk and that sampling proportional to the square root of consignment size is near optimal. In testing, our framework reduced propagule pressure by 31 to 38% compared to current standards. We also identified opportunities to further improve inspection efficiency by considering additional pathway characteristics (i.e., overdispersion parameters, zero inflation, relative risk, sampling cost, detectability) and developed solutions for these more complex scenarios. We anticipate our result will mitigate biological invasion risk with significant implications for biodiversity conservation, food security, and economies worldwide.


Subject(s)
Biosecurity , Introduced Species , Risk Assessment/methods , Humans , Biodiversity , Commerce , Food Security , Animals
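
The square-root rule reported in the abstract of entry 9 can be sketched as a simple allocation of a fixed inspection budget. This is only an illustration of "sampling proportional to the square root of consignment size"; the function name `sqrt_allocation`, the budget, and the consignment sizes are hypothetical, and the paper's full regime (overdispersion parameters, detectability, costs) is not reproduced.

```python
import math


def sqrt_allocation(consignment_sizes, total_samples):
    """Split total_samples across consignments in proportion to sqrt(size).

    Returns fractional sample counts; rounding and capping at the
    consignment size are left to the caller in this sketch.
    """
    weights = [math.sqrt(size) for size in consignment_sizes]
    total_weight = sum(weights)
    return [total_samples * w / total_weight for w in weights]


# Hypothetical pathway with three consignments of 100, 400 and 1600 units.
sizes = [100, 400, 1600]
allocation = sqrt_allocation(sizes, total_samples=700)
for size, n in zip(sizes, allocation):
    print(f"consignment of {size:>4} units -> {n:.0f} samples")
```

A consignment that is 16 times larger receives only 4 times as many samples here, which reflects the diminishing returns to sampling effort that the abstract describes.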
10.
Pharm Stat ; 23(1): 46-59, 2024.
Article in English | MEDLINE | ID: mdl-38267827

ABSTRACT

Count outcomes are collected in clinical trials for new drug development in several therapeutic areas and the event rate is commonly used as a single primary endpoint. Count outcomes whose variance is greater than the mean are termed overdispersed; thus, count outcomes are assumed to have a negative binomial distribution. However, in clinical trials for treating asthma and chronic obstructive pulmonary disease (COPD), a regulatory agency has suggested that a continuous endpoint related to lung function must be evaluated as a primary endpoint in addition to the event rate. The two co-primary endpoints that need to be evaluated include overdispersed count and continuous outcomes. Some researchers have proposed sample size calculation methods in the context of co-primary endpoints for various outcome types. However, methodologies for sample size calculation in trials with two co-primary endpoints, including overdispersed count and continuous outcomes, required when planning clinical trials for treating asthma and COPD, remain to be proposed. In this study, we aimed to develop a hypothesis-testing method and a corresponding sample size calculation method with two co-primary endpoints including overdispersed count and continuous outcomes. In a simulation, we demonstrated that the proposed sample size calculation method has adequate power accuracy. In addition, we illustrated an application of the proposed sample size calculation method to a placebo-controlled Phase 3 trial for patients with COPD.


Subject(s)
Asthma , Pulmonary Disease, Chronic Obstructive , Humans , Sample Size , Asthma/drug therapy , Pulmonary Disease, Chronic Obstructive/diagnosis , Pulmonary Disease, Chronic Obstructive/drug therapy , Binomial Distribution , Computer Simulation
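
For readers of entry 10, the sense in which a negative binomial outcome is overdispersed can be written out explicitly. The notation is an assumption, not taken from the abstract: Y is the count, \mu its mean, and k > 0 the dispersion (size) parameter of the standard mean-dispersion parameterization.

```latex
\operatorname{E}[Y] = \mu,
\qquad
\operatorname{Var}(Y) = \mu + \frac{\mu^{2}}{k} > \mu
\quad \text{for } 0 < k < \infty .
```

As k grows, the extra term \mu^{2}/k vanishes and the variance approaches the Poisson value \mu; small k means strong overdispersion.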
11.
Behav Res Methods ; 56(4): 2765-2781, 2024 04.
Article in English | MEDLINE | ID: mdl-38383801

ABSTRACT

Count outcomes are frequently encountered in single-case experimental designs (SCEDs). Generalized linear mixed models (GLMMs) have shown promise in handling overdispersed count data. However, the presence of excessive zeros in the baseline phase of SCEDs introduces a more complex issue known as zero-inflation, often overlooked by researchers. This study aimed to deal with zero-inflated and overdispersed count data within a multiple-baseline design (MBD) in single-case studies. It examined the performance of various GLMMs (Poisson, negative binomial [NB], zero-inflated Poisson [ZIP], and zero-inflated negative binomial [ZINB] models) in estimating treatment effects and generating inferential statistics. Additionally, a real example was used to demonstrate the analysis of zero-inflated and overdispersed count data. The simulation results indicated that the ZINB model provided accurate estimates for treatment effects, while the other three models yielded biased estimates. The inferential statistics obtained from the ZINB model were reliable when the baseline rate was low. However, when the data were overdispersed but not zero-inflated, both the ZINB and ZIP models exhibited poor performance in accurately estimating treatment effects. These findings contribute to our understanding of using GLMMs to handle zero-inflated and overdispersed count data in SCEDs. The implications, limitations, and future research directions are also discussed.


Subject(s)
Single-Case Studies as Topic , Humans , Linear Models , Multilevel Analysis/methods , Data Interpretation, Statistical , Models, Statistical , Poisson Distribution , Computer Simulation , Research Design
12.
Behav Res Methods ; 56(7): 7963-7984, 2024 10.
Article in English | MEDLINE | ID: mdl-38987450

ABSTRACT

Generalized linear mixed models (GLMMs) have great potential to deal with count data in single-case experimental designs (SCEDs). However, applied researchers have faced challenges in making various statistical decisions when using such advanced statistical techniques in their own research. This study focused on a critical issue by investigating the selection of an appropriate distribution to handle different types of count data in SCEDs due to overdispersion and/or zero-inflation. To achieve this, I proposed two model selection frameworks, one based on calculating information criteria (AIC and BIC) and another based on utilizing a multistage-model selection procedure. Four data scenarios were simulated including Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), and zero-inflated negative binomial (ZINB). The same set of models (i.e., Poisson, NB, ZIP, and ZINB) were fitted for each scenario. In the simulation, I evaluated 10 model selection strategies within the two frameworks by assessing the model selection bias and its consequences on the accuracy of the treatment effect estimates and inferential statistics. Based on the simulation results and previous work, I provide recommendations regarding which model selection methods should be adopted in different scenarios. The implications, limitations, and future research directions are also discussed.


Subject(s)
Monte Carlo Method , Linear Models , Humans , Single-Case Studies as Topic , Computer Simulation , Data Interpretation, Statistical , Models, Statistical , Poisson Distribution , Research Design
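
The information criteria named in the abstract of entry 12 follow the standard definitions AIC = 2p - 2 ln L and BIC = p ln(n) - 2 ln L, with p parameters, n observations, and maximized likelihood L. The sketch below applies them to an intercept-only Poisson model only; it is not the GLMM setup of the study, the helper names are hypothetical, and the counts are made up for illustration.

```python
import math


def poisson_loglik(counts):
    """Maximized log-likelihood of an intercept-only Poisson model.

    The maximum likelihood estimate of the rate is the sample mean;
    the sketch assumes at least one non-zero count.
    """
    rate = sum(counts) / len(counts)
    return sum(y * math.log(rate) - rate - math.lgamma(y + 1) for y in counts)


def aic(loglik, n_params):
    """Akaike information criterion: 2p - 2 ln L."""
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: p ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * loglik


# Made-up session counts with many zeros, as in a baseline phase.
counts = [0, 0, 0, 1, 2, 5]
loglik = poisson_loglik(counts)
print(f"log-likelihood = {loglik:.3f}")
print(f"AIC = {aic(loglik, 1):.3f}, BIC = {bic(loglik, 1, len(counts)):.3f}")
```

Comparing candidate distributions then amounts to computing the same two quantities from each fitted model's log-likelihood and parameter count and preferring the smaller value.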
13.
J Neurosci ; 42(26): 5268-5280, 2022 06 29.
Article in English | MEDLINE | ID: mdl-35641190

ABSTRACT

Hippocampal place cells form a map of the environment of an animal. Changes in the hippocampal map can be brought about in a number of ways, including changes to the environment, task, internal state of the subject, and the passage of time. These changes in the hippocampal map have been called remapping. In this study, we examine remapping during repeated exposure to the same environment. Different animals can have different remapping responses to the same changes. This variability across animals in remapping behavior is not well understood. In this work, we analyzed electrophysiological recordings from the CA3 region of the hippocampus performed by Alme et al. (2014), in which five male rats were exposed to 11 different environments, including a variety of repetitions of those environments. To compare the hippocampal maps between two experiences, we computed average rate map correlation coefficients. We found changes in the hippocampal maps between different sessions in the same environment. These changes consisted of partial remapping, a form of remapping in which some place cells maintain their place fields, whereas other place cells remap their place fields. Each animal exhibited partial remapping differently. We discovered that the heterogeneity in hippocampal representational changes across animals is structured; individual animals had consistently different levels of partial remapping across a range of independent comparisons. Our findings highlight that partial hippocampal remapping between repeated environments depends on animal-specific factors. SIGNIFICANCE STATEMENT: Context identification is a difficult problem. Animals are not provided with objective context identity labels, so they must infer which experiences come from which contexts. Different animals may have different strategies for performing this inference. We find that different animals have stereotypically different extents of partial hippocampal remapping, a neural correlate of subjective assessment of context identity.


Subject(s)
Hippocampus , Place Cells , Animals , CA1 Region, Hippocampal , Hippocampus/physiology , Male , Rats , Space Perception
14.
J Biopharm Stat ; 33(3): 335-356, 2023 05 04.
Article in English | MEDLINE | ID: mdl-36662165

ABSTRACT

Based on the well-known Poisson (P) distribution and the new generalized Lindley distribution (NGLD), developed by using gamma (α,θ) and gamma (α-1,θ) distributions, this paper proposes a new compound two-parameter Poisson generalized Lindley (TPPGL) distribution and systematically explores its mathematical properties. Closed-form expressions are derived for properties including the probability generating function, moments, skewness, and kurtosis. A likelihood-based method is used to estimate the parameters, followed by a broad Monte Carlo simulation study. To further motivate the proposed model, a count regression model and a first-order integer-valued autoregressive process are constructed based on the novel TPPGL distribution. The empirical importance of the proposed models is confirmed through application to four real datasets.


Subject(s)
Likelihood Functions , Humans , Computer Simulation , Poisson Distribution , Monte Carlo Method
15.
BMC Public Health ; 23(1): 1003, 2023 05 30.
Article in English | MEDLINE | ID: mdl-37254143

ABSTRACT

BACKGROUND: A recurrent feature of infectious diseases is the observation that different individuals show different levels of secondary transmission. This inter-individual variation in transmission potential is often quantified by the dispersion parameter k. Low values of k indicate a high degree of variability and a greater probability of superspreading events. Understanding k for COVID-19 across contexts can help policy makers prepare for future pandemics. METHODS: A literature search following a systematic approach was carried out in PubMed, Embase, Web of Science, Cochrane Library, medRxiv, bioRxiv and arXiv to identify publications containing epidemiological findings on superspreading in COVID-19. Study characteristics, epidemiological data, including estimates for k and R0, and public health recommendations were extracted from relevant records. RESULTS: The literature search yielded 28 peer-reviewed studies. The mean k estimates ranged from 0.04 to 2.97. Among the 28 studies, 93% reported mean k estimates lower than one, which is considered to indicate marked heterogeneity in inter-individual transmission potential. Recommended control measures were specifically aimed at preventing superspreading events. The combination of forward and backward contact tracing, timely confirmation of cases, rapid case isolation, vaccination and preventive measures were suggested as important components to suppress superspreading. CONCLUSIONS: Superspreading events were a major feature in the pandemic of SARS-CoV-2. On the one hand, this made outbreaks potentially more explosive but on the other hand also more responsive to public health interventions. Going forward, understanding k is critical for tailoring public health measures to high-risk groups and settings where superspreading events occur.


Subject(s)
COVID-19 , Humans , COVID-19/epidemiology , SARS-CoV-2 , Pandemics/prevention & control , Public Health , Contact Tracing
16.
BMC Health Serv Res ; 23(1): 23, 2023 Jan 10.
Article in English | MEDLINE | ID: mdl-36627627

ABSTRACT

BACKGROUND: Institutions or clinicians (units) are often compared according to a performance indicator such as in-hospital mortality. Several approaches have been proposed for the detection of outlying units, whose performance deviates from the overall performance. METHODS: We provide an overview of three approaches commonly used to monitor institutional performances for outlier detection. These are the common-mean model, the 'Normal-Poisson' random effects model and the 'Logistic' random effects model. For the latter we also propose a visualisation technique. The common-mean model assumes that the underlying true performance of all units is equal and that any observed variation between units is due to chance. Even after applying case-mix adjustment, this assumption is often violated due to overdispersion and a post-hoc correction may need to be applied. The random effects models relax this assumption and explicitly allow the true performance to differ between units, thus offering a more flexible approach. We discuss the strengths and weaknesses of each approach and illustrate their application using audit data from England and Wales on Adult Cardiac Surgery (ACS) and Percutaneous Coronary Intervention (PCI). RESULTS: In general, the overdispersion-corrected common-mean model and the random effects approaches produced similar p-values for the detection of outliers. For the ACS dataset (41 hospitals) three outliers were identified in total but only one was identified by all methods above. For the PCI dataset (88 hospitals), seven outliers were identified in total but only two were identified by all methods. The common-mean model uncorrected for overdispersion produced several more outliers. 
The reason for observing similar p-values for all three approaches could be attributed to the fact that the between-hospital variance was relatively small in both datasets, resulting only in a mild violation of the common-mean assumption; in this situation, the overdispersion correction worked well. CONCLUSION: If the common-mean assumption is likely to hold, all three methods are appropriate to use for outlier detection and their results should be similar. Random effect methods may be the preferred approach when the common-mean assumption is likely to be violated.


Subject(s)
Percutaneous Coronary Intervention , Humans , Hospitals , Risk Adjustment , Logistic Models , England
17.
Entropy (Basel) ; 25(1)2023 Jan 07.
Article in English | MEDLINE | ID: mdl-36673267

ABSTRACT

Binomial autoregressive models are frequently used for modeling bounded time series counts. However, they are not well developed for more complex bounded time series counts of the occurrence of n exchangeable and dependent units, which are becoming increasingly common in practice. To fill this gap, this paper first constructs an exchangeable Conway-Maxwell-Poisson-binomial (CMPB) thinning operator and then establishes the Conway-Maxwell-Poisson-binomial AR (CMPBAR) model. We establish its stationarity and ergodicity, discuss the conditional maximum likelihood (CML) estimate of the model's parameters, and establish the asymptotic normality of the CML estimator. In a simulation study, boxplots illustrate that the CML estimator is consistent, and Q-Q plots show its asymptotic normality. In the real data example, our model attains smaller AIC and BIC values than its main competitors.

18.
Brief Bioinform ; 21(5): 1756-1765, 2020 09 25.
Article in English | MEDLINE | ID: mdl-31688892

ABSTRACT

Mapping of expression quantitative trait loci (eQTLs) facilitates interpretation of the regulatory path from genetic variants to their associated diseases or traits. High-throughput sequencing of RNA (RNA-seq) has expedited the exploration of these regulatory variants. However, eQTL mapping is usually confronted with the analysis challenges caused by overdispersion and excessive dropouts in RNA-seq. The heavy-tailed distribution of gene expression violates the assumption of Gaussian distributed errors in linear regression for eQTL detection, which results in increased Type I or Type II errors. Applying rank-based inverse normal transformation (INT) can make the expression values more normally distributed. However, INT causes information loss and leads to uninterpretable effect size estimation. After comprehensive examination of the impact of overdispersion and excessive dropouts, we propose to apply a robust model, quantile regression, to map eQTLs for genes with a high degree of overdispersion or a large number of dropouts. Simulation studies show that quantile regression has the desired robustness to outliers and dropouts, and it significantly improves eQTL mapping. In a real data analysis, the most significant eQTL discoveries differ between quantile regression and the conventional linear model. This discrepancy becomes more prominent when the dropout effect or the overdispersion effect is large. All the results suggest that quantile regression provides more reliable and accurate eQTL mapping than conventional linear models. It deserves more attention for large-scale eQTL mapping.


Subject(s)
Quantitative Trait Loci , Computer Simulation , Gene Expression , Genetic Variation , Humans , Linear Models , Sequence Analysis, RNA/methods
19.
Stat Med ; 41(15): 2804-2821, 2022 07 10.
Article in English | MEDLINE | ID: mdl-35417078

ABSTRACT

Recently developed actigraphy devices have made continuous and objective monitoring of sleep over multiple nights possible. Sleep variables captured by wrist actigraphy devices include sleep onset, sleep end, total sleep time, wake time after sleep onset, number of awakenings, etc. Currently available statistical methods to analyze such actigraphy data have limitations. First, averages over multiple nights are used to summarize sleep activities, ignoring variability over multiple nights from the same subject. Second, sleep variables are often analyzed independently. However, sleep variables tend to be correlated with each other. For example, how long a subject sleeps at night can be correlated with how long and how frequently he or she wakes up during that night. It is important to understand these inter-relationships. We therefore propose a joint mixed effect model on total sleep time, number of awakenings, and wake time. We develop an estimating procedure based upon a sequence of generalized linear mixed effects models, which can be implemented using existing software. The use of these models not only avoids the computational intensity and instability that may arise from directly applying a numerical algorithm to a complicated joint likelihood function, but also provides additional insights into sleep activities. We demonstrated in simulation studies that the proposed estimating procedure performed well in estimating both fixed and random effects' parameters. We applied the proposed model to data from the Women's Interagency HIV Sleep Study to examine the association of employment status and age with overall sleep quality assessed by several actigraphy-measured sleep variables.


Subject(s)
Actigraphy , Wrist , Actigraphy/methods , Female , Humans , Polysomnography/methods , Sleep
20.
J Biomed Inform ; 131: 104097, 2022 07.
Article in English | MEDLINE | ID: mdl-35643272

ABSTRACT

BACKGROUND: Observational studies incorporating real-world data from multiple institutions facilitate study of rare outcomes or exposures and improve generalizability of results. Due to privacy concerns surrounding patient-level data sharing across institutions, methods for performing regression analyses distributively are desirable. Meta-analysis of institution-specific estimates is commonly used, but has been shown to produce biased estimates in certain settings. While distributed regression methods are increasingly available, methods for analyzing count outcomes are currently limited. Count data in practice are commonly subject to overdispersion, exhibiting greater variability than expected under a given statistical model. OBJECTIVE: We propose a novel computational method, a one-shot distributed algorithm for quasi-Poisson regression (ODAP), to distributively model count outcomes while accounting for overdispersion. METHODS: ODAP incorporates a surrogate likelihood approach to perform distributed quasi-Poisson regression without requiring patient-level data sharing, only requiring sharing of aggregate data from each participating institution. ODAP requires at most three rounds of non-iterative communication among institutions to generate coefficient estimates and corresponding standard errors. In simulations, we evaluate ODAP under several data scenarios possible in multi-site analyses, comparing ODAP and meta-analysis estimates in terms of error relative to pooled regression estimates, considered the gold standard. In a proof-of-concept real-world data analysis, we similarly compare ODAP and meta-analysis in terms of relative error to pooled estimation using data from the OneFlorida Clinical Research Consortium, modeling length of stay in COVID-19 patients as a function of various patient characteristics. In a second proof-of-concept analysis, using the same outcome and covariates, we incorporate data from the UnitedHealth Group Clinical Discovery Database together with the OneFlorida data in a distributed analysis to compare estimates produced by ODAP and meta-analysis. RESULTS: In simulations, ODAP exhibited negligible error relative to pooled regression estimates across all settings explored. Meta-analysis estimates, while largely unbiased, were increasingly variable as heterogeneity in the outcome increased across institutions. When baseline expected count was 0.2, relative error for meta-analysis was above 5% in 25% of iterations (250/1000), while the largest relative error for ODAP in any iteration was 3.59%. In our proof-of-concept analysis using only OneFlorida data, ODAP estimates were closer to pooled regression estimates than those produced by meta-analysis for all 15 covariates. In our distributed analysis incorporating data from both OneFlorida and the UnitedHealth Group Clinical Discovery Database, ODAP and meta-analysis estimates were largely similar, while some differences in estimates (as large as 13.8%) could be indicative of bias in meta-analytic estimates. CONCLUSIONS: ODAP performs privacy-preserving, communication-efficient distributed quasi-Poisson regression to analyze count outcomes using data stored within multiple institutions. Our method produces estimates nearly matching pooled regression estimates and sometimes more accurate than meta-analysis estimates, most notably in settings with relatively low counts and high outcome heterogeneity across institutions.
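
As a reminder of what the quasi-Poisson adjustment for overdispersion does, the sketch below fits an intercept-only log-linear count model on a single pooled dataset and rescales the Poisson standard error by the Pearson dispersion estimate. It is not the ODAP algorithm and involves no distributed communication; the function name and the example counts are illustrative assumptions.

```python
import math


def quasi_poisson_intercept(counts):
    """Intercept-only quasi-Poisson fit: log-mean, Pearson dispersion, and standard errors."""
    n = len(counts)
    mu = sum(counts) / n  # maximum likelihood estimate of the mean
    # Pearson chi-square statistic divided by residual degrees of freedom (n - 1).
    pearson = sum((y - mu) ** 2 / mu for y in counts)
    dispersion = pearson / (n - 1)
    # Under a Poisson model, Var(log-mean estimate) = 1 / (n * mu).
    se_poisson = math.sqrt(1.0 / (n * mu))
    # Quasi-Poisson inflates the variance by the dispersion factor.
    se_quasi = se_poisson * math.sqrt(dispersion)
    return {
        "log_mean": math.log(mu),
        "dispersion": dispersion,
        "se_poisson": se_poisson,
        "se_quasi": se_quasi,
    }


if __name__ == "__main__":
    # Hypothetical counts with one large value to induce overdispersion.
    print(quasi_poisson_intercept([0, 0, 1, 1, 2, 8]))
```

For the hypothetical counts in the usage example the dispersion estimate is 4.6, so the quasi-Poisson standard error is wider than the Poisson one, while the point estimate is unchanged.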


Subject(s)
COVID-19 , Algorithms , COVID-19/epidemiology , Humans , Likelihood Functions , Models, Statistical , Regression Analysis