Results 1 - 20 of 137
1.
BMC Infect Dis ; 24(1): 1006, 2024 Sep 19.
Article in English | MEDLINE | ID: mdl-39300391

ABSTRACT

BACKGROUND: It is difficult to detect outbreaks of emerging infectious diseases with the existing surveillance system. Here we investigate the utility of the Baidu Search Index, an indicator of a keyword's search volume on Baidu, for early warning and prediction of the epidemic trend of COVID-19. METHODS: The daily number of cases and the Baidu Search Index of 8 keywords (weighted by population) from December 1, 2019 to March 15, 2020 were collected and analyzed with time series and Spearman correlation at different time lags. To predict the daily number of COVID-19 cases from the Baidu Search Index, zero-inflated negative binomial regression was used in phase 1 and negative binomial regression was used in phases 2 and 3, based on the characteristics of the independent variable. RESULTS: The Baidu Search Index of all keywords was significantly higher in Wuhan than in Hubei (excluding Wuhan) and China (excluding Hubei). Before the causative pathogen was identified, the search volume of "Influenza" and "Pneumonia" in Wuhan increased with the number of new-onset cases, with correlation coefficients of 0.69 and 0.59, respectively. After the pathogen was made public but before COVID-19 was classified as a notifiable disease, the search volume of "SARS", "Pneumonia", and "Coronavirus" in all study areas increased with the number of new-onset cases, with correlation coefficients of 0.69 to 0.89, while "Influenza" became negatively correlated (rs: -0.56 to -0.64). After COVID-19 was closely monitored, the Baidu Search Index of "COVID-19", "Pneumonia", "Coronavirus", "SARS" and "Mask" could predict the epidemic trend with lead times of 15 days, 5 days and 6 days in Wuhan, Hubei (excluding Wuhan) and China (excluding Hubei), respectively. From 21 January to 9 February, the predicted number of cases was 1.84-fold and 4.81-fold higher than the actual number of cases in Wuhan and Hubei (excluding Wuhan), respectively.
CONCLUSION: The Baidu Search Index could be used for early warning and prediction of the epidemic trend of COVID-19, but the informative search keywords changed across periods. Considering the time lag from onset to diagnosis, especially in areas with a shortage of medical resources, internet search data can be a highly effective supplement to the existing surveillance system.


Subjects
COVID-19 , Statistical Models , SARS-CoV-2 , Humans , COVID-19/epidemiology , COVID-19/diagnosis , China/epidemiology , SARS-CoV-2/isolation & purification , Regression Analysis , Disease Outbreaks
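
The lagged Spearman analysis described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the two series are made-up toy data, the function names (`ranks`, `spearman`, `lagged_spearman`) are hypothetical, and the lag convention (searches on day t paired with cases on day t + lag) is an assumption.

```python
import math

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def lagged_spearman(search, cases, max_lag):
    """Pair search[t] with cases[t + lag] for each lag in 0..max_lag."""
    by_lag = {}
    for lag in range(max_lag + 1):
        shifted_search = search[: len(search) - lag]
        shifted_cases = cases[lag:]
        by_lag[lag] = spearman(shifted_search, shifted_cases)
    return by_lag

# Toy data (hypothetical): the case curve is the search curve delayed by 2 days.
search_index = [1, 3, 7, 4, 2, 1, 1, 1]
daily_cases = [0, 0, 1, 3, 7, 4, 2, 1]

by_lag = lagged_spearman(search_index, daily_cases, max_lag=3)
best_lag = max(by_lag, key=by_lag.get)
```

In this toy series the correlation peaks at the lag that matches the built-in two-day delay, which is the sense in which a search index can "lead" the case curve.
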
2.
Dent Res J (Isfahan) ; 21: 26, 2024.
Article in English | MEDLINE | ID: mdl-39188390

ABSTRACT

Background: Pregnant women have poor knowledge of oral hygiene during pregnancy. One problem with the follow-up of dental caries in this group is zero accumulation in the decayed, missing, and filled teeth (DMFT) index, for which some models must be used to achieve valid results. The studied population may be heterogeneous in longitudinal studies, leading to biased estimates. We aimed to assess the impact of oral health education on dental caries in pregnant women using a suitable model in a longitudinal experimental study with heterogeneous random effects. Materials and Methods: This longitudinal, experimental research was carried out on pregnant women who visited medical centers in Tehran. The educational group (236 cases) received education for three sessions. The control group (200 cases) received only standard training. The DMFT index assessed oral and dental health at baseline, 6 months, and 24 months after delivery. The Chi-square test was used for comparing nominal variables and the Mann-Whitney U test for ordinal variables. The zero-inflated Poisson (ZIP) model was applied under heterogeneous and homogeneous random effects using R 4.2.1, SPSS 26, and SAS 9.4. The level of significance was set at 0.05. Results: Data from 436 women aged 15 years and older were analyzed. Zero accumulation in the DMFT was mainly related to the filled teeth (51%). The heterogeneous ZIP model fitted better to the data. On average, the intervention group exhibited a higher rate of change in filled teeth over time than the control group (P = 0.021). Conclusion: The proposed ZIP model is a suitable model for predicting filled teeth in pregnant women. An educational intervention during pregnancy can improve oral health in the long-term follow-up.

3.
medRxiv ; 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39108504

ABSTRACT

Double-zero-event studies (DZS) pose a challenge for accurately estimating the overall treatment effect in meta-analysis. Current approaches, such as continuity correction or omission of DZS, are commonly employed, yet these ad hoc methods can yield biased conclusions. Although the standard bivariate generalized linear mixed model can accommodate DZS, it fails to address the potential systemic differences between DZS and other studies. In this paper, we propose a zero-inflated bivariate generalized linear mixed model (ZIBGLMM) to tackle this issue. This two-component finite mixture model includes zero-inflation for a subpopulation with negligible or extremely low risk. We develop both frequentist and Bayesian versions of ZIBGLMM and examine its performance in estimating risk ratios (RRs) against the bivariate generalized linear mixed model and conventional two-stage meta-analysis that excludes DZS. Through extensive simulation studies and real-world meta-analysis case studies, we demonstrate that ZIBGLMM outperforms the bivariate generalized linear mixed model and conventional two-stage meta-analysis that excludes DZS in estimating the true effect size with substantially less bias and comparable coverage probability.
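
To see why the abstract above calls continuity correction an ad hoc fix, the sketch below computes a single-study risk ratio with the common convention of adding 0.5 to every cell when any cell is zero. This convention and the function name `risk_ratio` are assumptions for illustration; the example does not implement ZIBGLMM.

```python
def risk_ratio(events_t, n_t, events_c, n_c, correction=0.5):
    """Risk ratio for one study (treatment vs control).

    If any cell of the 2x2 table is zero, `correction` is added to
    every cell (a common convention, assumed here for illustration).
    """
    a, b = events_t, n_t - events_t   # treatment: events, non-events
    c, d = events_c, n_c - events_c   # control: events, non-events
    if 0 in (a, b, c, d):
        a, b, c, d = (cell + correction for cell in (a, b, c, d))
    return (a / (a + b)) / (c / (c + d))

# An ordinary study: 5/50 vs 10/100 events, equal risks.
rr_regular = risk_ratio(5, 50, 10, 100)

# A double-zero-event study: 0/50 vs 0/100 events.
rr_double_zero = risk_ratio(0, 50, 0, 100)
```

For the double-zero study the corrected risk ratio equals (100 + 1) / (50 + 1), roughly 1.98. It is determined entirely by the arm sizes and carries no information about the treatment, which is one way such corrections can bias a pooled estimate.
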

4.
Stat Med ; 2024 Aug 28.
Article in English | MEDLINE | ID: mdl-39193779

ABSTRACT

BACKGROUND: Outcome measures that are count variables with excessive zeros are common in health behaviors research. Examples include the number of standard drinks consumed or alcohol-related problems experienced over time. There is a lack of empirical data about the relative performance of prevailing statistical models for assessing the efficacy of interventions when outcomes are zero-inflated, particularly compared with recently developed marginalized count regression approaches for such data. METHODS: The current simulation study examined five commonly used approaches for analyzing count outcomes, including two linear models (with outcomes on raw and log-transformed scales, respectively) and three prevailing count distribution-based models (ie, Poisson, negative binomial, and zero-inflated Poisson (ZIP) models). We also considered the marginalized zero-inflated Poisson (MZIP) model, a novel alternative that estimates the overall effects on the population mean while adjusting for zero-inflation. Motivated by alcohol misuse prevention trials, extensive simulations were conducted to evaluate and compare the statistical power and Type I error rate of the statistical models and approaches across data conditions that varied in sample size (N = 100 to 500), zero rate (0.2 to 0.8), and intervention effect sizes. RESULTS: Under zero-inflation, the Poisson model failed to control the Type I error rate, resulting in higher than expected false positive results. When the intervention effects on the zero (vs. non-zero) and count parts were in the same direction, the MZIP model had the highest statistical power, followed by the linear model with outcomes on the raw scale, negative binomial model, and ZIP model. The performance of the linear model with a log-transformed outcome variable was unsatisfactory.
CONCLUSIONS: The MZIP model demonstrated better statistical properties in detecting true intervention effects and controlling false positive results for zero-inflated count outcomes. This MZIP model may serve as an appealing analytical approach to evaluating overall intervention effects in studies with count outcomes marked by excessive zeros.

5.
Appl Psychol Meas ; 48(6): 235-256, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39166184

ABSTRACT

Clinical instruments that use a filter/follow-up response format often produce data with excess zeros, especially when administered to nonclinical samples. When the unidimensional graded response model (GRM) is then fit to these data, parameter estimates and scale scores tend to suggest that the instrument measures individual differences only among individuals with severe levels of the psychopathology. In such scenarios, alternative item response models that explicitly account for excess zeros may be more appropriate. The multivariate hurdle graded response model (MH-GRM), which has been previously proposed for handling zero-inflated questionnaire data, includes two latent variables: susceptibility, which underlies responses to the filter question, and severity, which underlies responses to the follow-up question. Using both simulated and empirical data, the current research shows that compared to unidimensional GRMs, the MH-GRM is better able to capture individual differences across a wider range of psychopathology, and that when unidimensional GRMs are fit to data from questionnaires that include filter questions, individual differences at the lower end of the severity continuum largely go unmeasured. Practical implications are discussed.

6.
Biom J ; 66(5): e202300182, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39001709

ABSTRACT

Spatial count data with an abundance of zeros arise commonly in disease mapping studies. Typically, these data are analyzed using zero-inflated models, which comprise a mixture of a point mass at zero and an ordinary count distribution, such as the Poisson or negative binomial. However, due to their mixture representation, conventional zero-inflated models are challenging to explain in practice because the parameter estimates have conditional latent-class interpretations. As an alternative, several authors have proposed marginalized zero-inflated models that simultaneously model the excess zeros and the marginal mean, leading to a parameterization that more closely aligns with ordinary count models. Motivated by a study examining predictors of COVID-19 death rates, we develop a spatiotemporal marginalized zero-inflated negative binomial model that directly models the marginal mean, thus extending marginalized zero-inflated models to the spatial setting. To capture the spatiotemporal heterogeneity in the data, we introduce region-level covariates, smooth temporal effects, and spatially correlated random effects to model both the excess zeros and the marginal mean. For estimation, we adopt a Bayesian approach that combines full-conditional Gibbs sampling and Metropolis-Hastings steps. We investigate features of the model and use the model to identify key predictors of COVID-19 deaths in the US state of Georgia during the 2021 calendar year.


Subjects
Bayes Theorem , Biometry , COVID-19 , Statistical Models , Humans , COVID-19/mortality , COVID-19/epidemiology , Georgia/epidemiology , Biometry/methods , Spatial Analysis , Binomial Distribution
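
The mixture representation mentioned in the abstract above (a point mass at zero plus an ordinary count distribution) can be sketched as follows. For brevity the sketch uses a Poisson count component rather than the negative binomial of the paper, and the names `zip_pmf`, `pi`, and `lam` are hypothetical. The marginal mean (1 - pi) * lam is the quantity that marginalized zero-inflated models place the regression on, rather than on the latent-class mean lam.

```python
import math

def zip_pmf(y, pi, lam):
    """Zero-inflated Poisson probability of count y.

    pi  : probability of an excess (structural) zero
    lam : mean of the Poisson component
    """
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    point_mass = pi if y == 0 else 0.0
    return point_mass + (1 - pi) * poisson

def zip_marginal_mean(pi, lam):
    """Population-average mean of a ZIP variable."""
    return (1 - pi) * lam

pi, lam = 0.3, 2.0
support = range(60)  # truncation point; the tail beyond it is negligible here

total_probability = sum(zip_pmf(y, pi, lam) for y in support)
numeric_mean = sum(y * zip_pmf(y, pi, lam) for y in support)
closed_form_mean = zip_marginal_mean(pi, lam)
```

With these illustrative values the latent-class mean is 2.0 but the marginal mean is 1.4, which shows why a coefficient on lam does not directly describe the effect on the overall population mean.
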
7.
Annu Rev Stat Appl ; 11(1): 483-504, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38962089

ABSTRACT

The microbiome represents a hidden world of tiny organisms populating not only our surroundings but also our own bodies. By enabling comprehensive profiling of these invisible creatures, modern genomic sequencing tools have given us an unprecedented ability to characterize these populations and uncover their outsize impact on our environment and health. Statistical analysis of microbiome data is critical to infer patterns from the observed abundances. The application and development of analytical methods in this area require careful consideration of the unique aspects of microbiome profiles. We begin this review with a brief overview of microbiome data collection and processing and describe the resulting data structure. We then provide an overview of statistical methods for key tasks in microbiome data analysis, including data visualization, comparison of microbial abundance across groups, regression modeling, and network inference. We conclude with a discussion and highlight interesting future directions.

8.
Behav Res Methods ; 56(7): 7963-7984, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38987450

ABSTRACT

Generalized linear mixed models (GLMMs) have great potential to deal with count data in single-case experimental designs (SCEDs). However, applied researchers have faced challenges in making various statistical decisions when using such advanced statistical techniques in their own research. This study focused on a critical issue by investigating the selection of an appropriate distribution to handle different types of count data in SCEDs due to overdispersion and/or zero-inflation. To achieve this, I proposed two model selection frameworks, one based on calculating information criteria (AIC and BIC) and another based on utilizing a multistage-model selection procedure. Four data scenarios were simulated including Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), and zero-inflated negative binomial (ZINB). The same set of models (i.e., Poisson, NB, ZIP, and ZINB) was fitted for each scenario. In the simulation, I evaluated 10 model selection strategies within the two frameworks by assessing the model selection bias and its consequences on the accuracy of the treatment effect estimates and inferential statistics. Based on the simulation results and previous work, I provide recommendations regarding which model selection methods should be adopted in different scenarios. The implications, limitations, and future research directions are also discussed.


Subjects
Monte Carlo Method , Linear Models , Humans , Single-Case Studies as Topic , Computer Simulation , Statistical Data Interpretation , Statistical Models , Poisson Distribution , Research Design
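
The information-criterion framework named in the abstract above can be sketched with the standard definitions AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L. The log-likelihoods, parameter counts, and sample size below are made-up placeholders, not results from the study, and the sketch does not reproduce the multistage procedure or any of the 10 strategies evaluated.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fitted models: name -> (maximized log-likelihood, parameter count).
candidates = {
    "Poisson": (-130.0, 2),
    "NB": (-112.0, 3),
    "ZIP": (-110.5, 3),
    "ZINB": (-109.8, 4),
}
n_obs = 40  # hypothetical number of observations

aic_scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
bic_scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}

best_by_aic = min(aic_scores, key=aic_scores.get)
best_by_bic = min(bic_scores, key=bic_scores.get)
```

With these placeholder values ZINB has the highest log-likelihood, yet both criteria prefer ZIP because the extra dispersion parameter does not improve the fit enough to offset its penalty.
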
9.
J R Stat Soc Ser C Appl Stat ; 73(3): 598-620, 2024 Jun.
Article in English | MEDLINE | ID: mdl-39072299

ABSTRACT

Recurrent events are common in clinical studies and are often subject to terminal events. In pragmatic trials, participants are often nested in clinics and can be susceptible or structurally unsusceptible to the recurrent events. We develop a Bayesian shared random effects model to accommodate this complex data structure. To achieve robustness, we consider the Dirichlet processes to model the residual of the accelerated failure time model for the survival process as well as the cluster-specific shared frailty distribution, along with an efficient sampling algorithm for posterior inference. Our method is applied to a recent cluster randomized trial on fall injury prevention.

10.
Front Microbiol ; 15: 1394204, 2024.
Article in English | MEDLINE | ID: mdl-38873138

ABSTRACT

Motivation: High-throughput sequencing technology facilitates the quantitative analysis of microbial communities, improving the capacity to investigate the associations between the human microbiome and diseases. Our primary motivating application is to explore the association between gut microbes and obesity. The complex characteristics of microbiome data, including high dimensionality, zero inflation, and over-dispersion, pose new statistical challenges for downstream analysis. Results: We propose a GLM-based zero-inflated generalized Poisson factor analysis (GZIGPFA) model to analyze microbiome data with complex characteristics. The GZIGPFA model is based on a zero-inflated generalized Poisson (ZIGP) distribution for modeling microbiome count data. A link function between the generalized Poisson rate and the probability of excess zeros is established within the generalized linear model (GLM) framework. The latent parameters of the GZIGPFA model constitute a low-rank matrix comprising a low-dimensional score matrix and a loading matrix. An alternating maximum likelihood algorithm is employed to estimate the unknown parameters, and cross-validation is utilized to determine the rank of the model in this study. The proposed GZIGPFA model demonstrates superior performance and advantages through comprehensive simulation studies and real data applications.

11.
J Appl Stat ; 51(9): 1792-1817, 2024.
Article in English | MEDLINE | ID: mdl-38933142

ABSTRACT

Proportional data arise frequently in a wide variety of fields of study. Such data often exhibit extra variation such as over/under dispersion, sparseness and zero inflation. For example, the hepatitis data present both sparseness and zero inflation, with 19 contributing non-zero denominators of 5 or less and with 36 of 83 annual age groups having zero seropositives. The whitefly data consist of 640 observations with 339 zeros (53%), which demonstrates extra zero inflation. The catheter management data involve excessive zeros, with over 60% zeros on average across 193 outcomes of urinary tract infections, 194 outcomes of catheter blockages and 193 outcomes of catheter displacements. However, the existing models cannot always address such features appropriately. In this paper, a new two-parameter probability distribution called the Lindley-binomial (LB) distribution is proposed to analyze proportional data with such features. Probabilistic properties of the distribution, such as its moments and moment generating function, are derived. The Fisher scoring algorithm and the EM algorithm are presented for computing parameter estimates in the proposed LB regression model. Goodness-of-fit issues for the LB model are discussed. A limited simulation study is also performed to evaluate the performance of the derived EM algorithms for parameter estimation in the model with and without covariates. The proposed model is illustrated with the three aforementioned proportional datasets.

12.
Curr Res Insect Sci ; 5: 100078, 2024.
Article in English | MEDLINE | ID: mdl-38576775

ABSTRACT

Population density and structure are critical to nature conservation and pest management. Traditional sampling methods such as capture-mark-recapture and catch-effort cannot be used in situations where catching, marking, or removing individuals is not feasible. N-mixture models use repeated count data to estimate population abundance based on detection probability. They have been widely adopted in wildlife surveys in recent years to account for imperfect detection. However, their application in entomology is relatively new. In this paper, we describe the general procedures of N-mixture models in population studies from data collection to model fitting and evaluation. Using Lycorma delicatula egg mass survey data at 28 plots in seven sites from the field, we found that detection probability (p) was negatively correlated with tree diameter at breast height (DBH) and ranged from 0.516 [95% CI: 0.470-0.561] to 0.614 [95% CI: 0.566-0.660] between the 1st and the 3rd sample period. Furthermore, egg mass abundance (λ) was positively associated with basal area (BA) for the sample unit (single tree), with more egg masses on tree of heaven (TOH) trees. More egg masses were also expected on trees of other species in TOH plots. Predicted egg mass density (masses/100 m²) ranged from 5.0 (95% CI: 3.0-16.0) (Gordon) to 276.9 (95% CI: 255.0-303.0) (Susquehannock) for TOH plots, and 11.0 (95% CI: 9.00-15.33) (Gordon) to 228.3 (95% CI: 209.7-248.3) (Burlington) for nonTOH plots. Site-specific abundance estimates from N-mixture models were generally higher than the observed maximum counts. N-mixture models could have great potential in insect population surveys in agriculture and forestry in the future.
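
The core of a basic N-mixture model can be sketched as below, assuming the standard binomial-Poisson form: latent abundance N follows a Poisson distribution with mean λ, and each repeated count is Binomial(N, p). The covariate effects (DBH, BA) and fitting procedure used in the study are not reproduced, and the function names, the truncation point `n_max`, and the numbers are hypothetical.

```python
import math

def joint_terms(counts, lam, p, n_max=200):
    """Map each feasible abundance N to P(N) * P(counts | N).

    N must be at least the largest observed count; the infinite sum
    over N is truncated at n_max.
    """
    terms = {}
    for n in range(max(counts), n_max + 1):
        log_prior = -lam + n * math.log(lam) - math.lgamma(n + 1)  # Poisson(N; lam)
        detection = 1.0
        for y in counts:
            detection *= math.comb(n, y) * p ** y * (1 - p) ** (n - y)
        terms[n] = math.exp(log_prior) * detection
    return terms

def site_likelihood(counts, lam, p, n_max=200):
    """Likelihood of one site's repeated counts, with N summed out."""
    return sum(joint_terms(counts, lam, p, n_max).values())

def posterior_mean_abundance(counts, lam, p, n_max=200):
    """E[N | counts] for fixed lam and p."""
    terms = joint_terms(counts, lam, p, n_max)
    total = sum(terms.values())
    return sum(n * t for n, t in terms.items()) / total

# Check: with a single visit the model reduces to Poisson(lam * p).
single_visit = site_likelihood([3], lam=10.0, p=0.5)
thinned_poisson = math.exp(-5.0) * 5.0 ** 3 / 6

# Three hypothetical visits to one tree.
visit_counts = [3, 4, 2]
expected_abundance = posterior_mean_abundance(visit_counts, lam=8.0, p=0.55)
```

Because N can never be smaller than the largest count and detection is imperfect, the expected abundance always exceeds the observed maximum, consistent with the abstract's remark about site-specific estimates.
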

13.
Biometrics ; 80(1), 2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38470256

ABSTRACT

Semicontinuous outcomes commonly arise in a wide variety of fields, such as insurance claims, healthcare expenditures, rainfall amounts, and alcohol consumption. Regression models, including Tobit, Tweedie, and two-part models, are widely employed to understand the relationship between semicontinuous outcomes and covariates. Given the potential detrimental consequences of model misspecification, after fitting a regression model, it is of prime importance to check the adequacy of the model. However, due to the point mass at zero, standard diagnostic tools for regression models (eg, deviance and Pearson residuals) are not informative for semicontinuous data. To bridge this gap, we propose a new type of residuals for semicontinuous outcomes that is applicable to general regression models. Under the correctly specified model, the proposed residuals converge to being uniformly distributed, and when the model is misspecified, they significantly depart from this pattern. In addition to in-sample validation, the proposed methodology can also be employed to evaluate predictive distributions. We demonstrate the effectiveness of the proposed tool using health expenditure data from the US Medical Expenditure Panel Survey.


Subjects
Health Expenditures
14.
Behav Res Methods ; 56(4): 2765-2781, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38383801

ABSTRACT

Count outcomes are frequently encountered in single-case experimental designs (SCEDs). Generalized linear mixed models (GLMMs) have shown promise in handling overdispersed count data. However, the presence of excessive zeros in the baseline phase of SCEDs introduces a more complex issue known as zero-inflation, often overlooked by researchers. This study aimed to deal with zero-inflated and overdispersed count data within a multiple-baseline design (MBD) in single-case studies. It examined the performance of various GLMMs (Poisson, negative binomial [NB], zero-inflated Poisson [ZIP], and zero-inflated negative binomial [ZINB] models) in estimating treatment effects and generating inferential statistics. Additionally, a real example was used to demonstrate the analysis of zero-inflated and overdispersed count data. The simulation results indicated that the ZINB model provided accurate estimates for treatment effects, while the other three models yielded biased estimates. The inferential statistics obtained from the ZINB model were reliable when the baseline rate was low. However, when the data were overdispersed but not zero-inflated, both the ZINB and ZIP models exhibited poor performance in accurately estimating treatment effects. These findings contribute to our understanding of using GLMMs to handle zero-inflated and overdispersed count data in SCEDs. The implications, limitations, and future research directions are also discussed.


Subjects
Single-Case Studies as Topic , Humans , Linear Models , Multilevel Analysis/methods , Statistical Data Interpretation , Statistical Models , Poisson Distribution , Computer Simulation , Research Design
15.
Stat Med ; 42(28): 5100-5112, 2023 Dec 10.
Article in English | MEDLINE | ID: mdl-37715594

ABSTRACT

Physical activity (PA) guidelines recommend that PA be accumulated in bouts of 10 minutes or more in duration. Recently, researchers have sought to better understand how participants in PA interventions increase their activity. Participants can increase their daily PA by increasing the number of PA bouts per day while keeping the duration of the bouts constant; they can keep the number of bouts constant but increase the duration of each bout; or participants can increase both the number of bouts and their duration. We propose a novel joint modeling framework for modeling PA bouts and their duration over time. Our joint model is comprised of two sub-models: a mixed-effects Poisson hurdle sub-model for the number of bouts per day and a mixed-effects location scale gamma regression sub-model to characterize the duration of the bouts and their variance. The model allows us to estimate how daily PA bouts and their duration vary together over the course of an intervention and by treatment condition and is specifically designed to capture the unique distributional features of bouted PA as measured by accelerometer: frequent measurements, zero-inflated bouts, and skewed bout durations. We apply our methods to the Make Better Choices study, a longitudinal lifestyle intervention trial to increase PA. We perform a simulation study to evaluate how well our model is able to estimate relationships between outcomes.


Subjects
Exercise , Life Style , Humans , Accelerometry/methods , Time Factors , Clinical Trials as Topic
16.
Stat Med ; 42(25): 4632-4643, 2023 Nov 10.
Article in English | MEDLINE | ID: mdl-37607718

ABSTRACT

In this article, we present a flexible model for microbiome count data. We consider a quasi-likelihood framework, in which we do not make any assumptions on the distribution of the microbiome count except that its variance is an unknown but smooth function of the mean. By comparing our model to the negative binomial generalized linear model (GLM) and Poisson GLM in simulation studies, we show that our flexible quasi-likelihood method yields valid inferential results. Using a real microbiome study, we demonstrate the utility of our method by examining the relationship between adenomas and microbiota. We also provide an R package "fql" for the application of our method.


Subjects
Microbiota , Statistical Models , Humans , Likelihood Functions , Computer Simulation , Poisson Distribution
17.
R Soc Open Sci ; 10(8): 221226, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37621657

ABSTRACT

In this paper, performance of hurdle models in rare events data is improved by modifying their binary component. The rare-event weighted logistic regression model is adopted in place of logistic regression to deal with class imbalance due to rare events. Poisson Hurdle Rare Event Weighted Logistic Regression (REWLR) and Negative Binomial Hurdle (NBH) REWLR are developed as two-part models which use the REWLR model to estimate the probability of a positive count and a Poisson or NB zero-truncated count model to estimate non-zero counts. This research aimed to develop and assess the performance of the Poisson and Negative Binomial (NB) Hurdle Rare Event Weighted Logistic Regression (REWLR) models, applied to simulated data with various degrees of zero inflation and to Nairobi county's maternal mortality data. The study data on maternal mortality were pulled from JPHES. The data contain the number of maternal deaths, which is the outcome variable, and other obstetric and demographic factors recorded in MNCH facilities in Nairobi between October 2021 and January 2022. The models were also fit and evaluated based on simulated data with varying degrees of zero inflation. The obtained results are numerically validated and then discussed from both the mathematical and the maternal mortality perspective. Numerical simulations are also presented to give a more complete representation of the model dynamics. Results obtained suggest that NB Hurdle REWLR is the best performing model for zero inflated count data due to rare events.
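
The two-part structure described in the abstract above can be sketched with a plain Poisson hurdle distribution: a binary part gives the probability of a positive count, and a zero-truncated Poisson describes the size of positive counts. The sketch takes that probability (`p_positive`) as given and does not implement the rare-event weighted logistic regression; all names and values are hypothetical.

```python
import math

def poisson_hurdle_pmf(y, p_positive, lam):
    """Poisson hurdle probability of count y.

    p_positive : probability that the count is positive (binary part)
    lam        : rate of the zero-truncated Poisson (count part)
    """
    if y == 0:
        return 1 - p_positive
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    truncated = poisson / (1 - math.exp(-lam))  # condition on y >= 1
    return p_positive * truncated

def poisson_hurdle_mean(p_positive, lam):
    """Mean of the hurdle distribution."""
    return p_positive * lam / (1 - math.exp(-lam))

# A rare-event setting: only 5% of units record any event.
p_positive, lam = 0.05, 1.5
support = range(60)  # truncation point; the tail beyond it is negligible here

total_probability = sum(poisson_hurdle_pmf(y, p_positive, lam) for y in support)
numeric_mean = sum(y * poisson_hurdle_pmf(y, p_positive, lam) for y in support)
closed_form_mean = poisson_hurdle_mean(p_positive, lam)
```

In a hurdle model every zero comes from the binary part, so a poorly calibrated binary component under class imbalance affects the whole fitted distribution. That is the component the abstract proposes to replace.
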

18.
Biom J ; 65(8): e2100408, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37439440

ABSTRACT

Count data with an excess of zeros are often encountered when modeling infectious disease occurrence. The degree of zero inflation can vary over time due to nonepidemic periods as well as by age group or region. A well-established approach to analyze multivariate incidence time series is the endemic-epidemic modeling framework, also known as the HHH approach. However, it assumes Poisson or negative binomial distributions and is thus not tailored to surveillance data with excess zeros. Here, we propose a multivariate zero-inflated endemic-epidemic model with random effects that extends HHH. Parameters of both the zero-inflation probability and the HHH part of this mixture model can be estimated jointly and efficiently via (penalized) maximum likelihood inference using analytical derivatives. We found proper convergence and good coverage of confidence intervals in simulation studies. An application to measles counts in the 16 German states, 2005-2018, showed that zero inflation is more pronounced in the Eastern states characterized by a higher vaccination coverage. Probabilistic forecasts of measles cases improved when accounting for zero inflation. We anticipate zero-inflated HHH models to be a useful extension also for other applications and provide an implementation in an R package.


Subjects
Measles , Statistical Models , Humans , Time Factors , Computer Simulation , Measles/epidemiology , Measles/prevention & control , Germany/epidemiology , Poisson Distribution
19.
Stat Med ; 42(20): 3636-3648, 2023 Sep 10.
Article in English | MEDLINE | ID: mdl-37316997

ABSTRACT

Disease mapping is a research field that estimates spatial patterns of disease risk so that areas with elevated risk levels can be identified. This article is motivated by a study of dengue fever infection, which causes seasonal epidemics almost every summer in Taiwan. For the analysis of zero-inflated data with spatial correlation and covariates, current methods either impose a computational burden or miss associations between zero and non-zero responses. In this article, we develop estimating equations for a mixture regression model that accommodates spatial dependence and zero inflation for the study of disease propagation. Asymptotic properties of the proposed estimates are established. A simulation study is conducted to evaluate the performance of the mixture estimating equations, and a dengue dataset from southern Taiwan is used to illustrate the proposed method.


Subjects
Dengue , Epidemics , Humans , Computer Simulation , Spatial Analysis , Taiwan/epidemiology , Dengue/epidemiology , Dengue/prevention & control , Statistical Models
20.
Biostatistics ; 2023 May 31.
Article in English | MEDLINE | ID: mdl-37257175

ABSTRACT

In complex tissues containing cells that are difficult to dissociate, single-nucleus RNA-sequencing (snRNA-seq) has become the preferred experimental technology over single-cell RNA-sequencing (scRNA-seq) to measure gene expression. To accurately model these data in downstream analyses, previous work has shown that droplet-based scRNA-seq data are not zero-inflated, but whether droplet-based snRNA-seq data follow the same probability distributions has not been systematically evaluated. Using pseudonegative control data from nuclei in mouse cortex sequenced with the 10x Genomics Chromium system and mouse kidney sequenced with the DropSeq system, we found that droplet-based snRNA-seq data follow a negative binomial distribution, suggesting that parametric statistical models applied to scRNA-seq are transferable to snRNA-seq. Furthermore, we found that the quantification choices in adapting quantification mapping strategies from scRNA-seq to snRNA-seq can play a significant role in downstream analyses and biological interpretation. In particular, reference transcriptomes that do not include intronic regions result in significantly smaller library sizes and incongruous cell type classifications. We also confirmed the presence of a gene length bias in snRNA-seq data, which we show is present in both exonic and intronic reads, and investigate potential causes for the bias.
