ABSTRACT
Outbreaks of emerging and zoonotic infections represent a substantial threat to human health and well-being. These outbreaks tend to be characterised by highly stochastic transmission dynamics with intense variation in transmission potential between cases. The negative binomial distribution is commonly used as a model for transmission in the early stages of an epidemic as it has a natural interpretation as a mixture of a Poisson contact process with gamma-distributed infectivity. In this study we expand upon the negative binomial model by introducing a beta-Poisson mixture model in which infectious individuals make contacts at the points of a Poisson process and then transmit infection along these contacts with a beta-distributed probability. We show that the negative binomial distribution is a limiting case of this model, as is the zero-inflated Poisson distribution obtained by combining a Poisson-distributed contact process with an additional failure probability. We assess the beta-Poisson model's applicability by fitting it to secondary case distributions (the distribution of the number of subsequent cases generated by a single case) estimated from outbreaks covering a range of pathogens and geographical settings. We find that while the beta-Poisson mixture can achieve a closer fit to the data than the negative binomial distribution, it is consistently outperformed by the negative binomial in terms of Akaike Information Criterion, making it a suboptimal choice on grounds of parsimony. The beta-Poisson performs similarly to the negative binomial model in its ability to capture features of the secondary case distribution such as overdispersion, prevalence of superspreaders, and the probability of a case generating zero subsequent cases.
Despite this possible shortcoming, the beta-Poisson distribution may still be of interest in the context of intervention modelling since its structure allows for the simulation of measures which change contact structures while leaving individual-level infectivity unchanged, and vice versa.
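The contact-then-transmit structure described above is easy to simulate directly. The sketch below draws secondary-case counts from a beta-Poisson mixture and checks that the resulting counts are overdispersed; the parameter values (contact rate 10, Beta(0.5, 4) infectivity) are illustrative assumptions, not values from the paper.

```python
import math
import random
from statistics import mean, pvariance

def poisson_sample(lam, rng):
    """Knuth's algorithm for Poisson variates (adequate for small lam)."""
    L, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= L:
            return k
        k += 1

def beta_poisson_sample(lam, a, b, n_cases, rng):
    """Secondary cases per index case: contacts ~ Poisson(lam), each
    contact infected with a per-case probability p ~ Beta(a, b)."""
    counts = []
    for _ in range(n_cases):
        p = rng.betavariate(a, b)  # heterogeneous individual infectivity
        contacts = poisson_sample(lam, rng)
        counts.append(sum(rng.random() < p for _ in range(contacts)))
    return counts

rng = random.Random(1)
y = beta_poisson_sample(10.0, 0.5, 4.0, 5000, rng)
# Mean secondary cases is lam * a / (a + b); the beta mixing pushes the
# variance above the mean (overdispersion), as in the negative binomial.
```

Interventions can then be explored by changing `lam` (contact structure) while holding `a, b` (infectivity) fixed, or the reverse, which is the structural advantage the abstract points to.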
Subject(s)
Disease Outbreaks, Statistical Models, Humans, Computer Simulation, Poisson Distribution, Binomial Distribution
ABSTRACT
OBJECTIVES: Screening blood units for compatibility constitutes a Bernoulli series. Estimating the number of units needed to be screened represents a classic waiting time problem that may be resolved using the Negative Binomial Distribution. The currently recommended method for estimating the number of units screened, n, to find a required number of compatible units, r, with a given probability, p, is n = r/p. This coincides with the mean of the Negative Binomial Distribution so that the actual number of units screened will often be underestimated by the current method. METHODS: The cumulative distribution function of the Negative Binomial Distribution provides the probability of success (compatibility), F(n;r,p), as a function of the number of trials performed (attempted crossmatches), n, the probability of success on each trial, p, and the number of successes (compatible units) required, r. Choosing a threshold cumulative probability sufficiently high, such as F ~ 0.9, for example, will provide confidence that the projected number of units screened will be underestimated less often (~10% of the time). RESULTS: With F ≥ 0.9, the estimated number of attempted crossmatches ranges from 1.3 to 2.3 times as many as the number calculated by the current method. As a rule of thumb approximately 1.6 times the current estimated number provides a similar estimate (n ~ 1.6·r/p). CONCLUSIONS: Waiting time underestimation will be reduced significantly by using the Negative Binomial Distribution solution and should be accompanied by improved customer satisfaction.
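The proposed rule can be computed exactly from the binomial/negative-binomial duality: the probability that the waiting time to the r-th compatible unit is at most n equals the probability of at least r successes in n Bernoulli trials. A minimal sketch (the numbers r = 3, p = 0.3 are illustrative, not from the article):

```python
from math import comb

def units_to_screen(r, p, confidence=0.9):
    """Smallest n such that P(at least r compatible units among n screened)
    >= confidence, i.e. the confidence-based alternative to n = r/p."""
    n = r
    while True:
        tail = sum(comb(n, k) * p**k * (1 - p)**(n - k)
                   for k in range(r, n + 1))
        if tail >= confidence:
            return n
        n += 1

# Example: r = 3 compatible units needed, p = 0.3 per-unit compatibility.
# The mean-based rule gives r/p = 10 units; a 90% guarantee needs more.
print(units_to_screen(3, 0.3))  # -> 16, i.e. 1.6 x r/p for these numbers
```

The 1.6 ratio here matches the rule of thumb quoted in the abstract, though the exact multiplier varies with r and p (the stated range is 1.3 to 2.3).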
Subject(s)
Blood Grouping and Crossmatching, Humans, Time Factors, Blood Grouping and Crossmatching/methods, Binomial Distribution, Blood Transfusion
ABSTRACT
Timeline followback (TLFB) is often used in addiction research to monitor recent substance use, such as the number of abstinent days in the past week. TLFB data usually take the form of binomial counts that exhibit overdispersion and zero inflation. Motivated by a 12-week randomized trial evaluating the efficacy of varenicline tartrate for smoking cessation among adolescents, we propose a Bayesian zero-inflated beta-binomial model for the analysis of longitudinal, bounded TLFB data. The model comprises a mixture of a point mass that accounts for zero inflation and a beta-binomial distribution for the number of days abstinent in the past week. Because treatment effects appear to level off during the study, we introduce random changepoints for each study group to reflect group-specific changes in treatment efficacy over time. The model also includes fixed and random effects that capture group- and subject-level slopes before and after the changepoints. Using the model, we can accurately estimate the mean trend for each study group, test whether the groups experience changepoints simultaneously, and identify critical windows of treatment efficacy. For posterior computation, we propose an efficient Markov chain Monte Carlo algorithm that relies on easily sampled Gibbs and Metropolis-Hastings steps. Our application shows that the varenicline group has a short-term positive effect on abstinence that tapers off after week 9.
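The observation model at the core of the approach (before the changepoint and random-effects machinery) is a two-part mixture: a structural zero with some probability, otherwise a beta-binomial count of abstinent days out of 7. A simulation sketch, with all parameter values hypothetical rather than estimated from the trial:

```python
import random
from statistics import mean, pvariance

def zibb_sample(n_days, pi0, a, b, size, rng):
    """Zero-inflated beta-binomial: a structural zero with probability pi0,
    otherwise k ~ Binomial(n_days, p) with subject-level p ~ Beta(a, b)."""
    out = []
    for _ in range(size):
        if rng.random() < pi0:
            out.append(0)                  # structural (zero-inflation) zero
        else:
            p = rng.betavariate(a, b)      # subject's abstinence probability
            out.append(sum(rng.random() < p for _ in range(n_days)))
    return out

rng = random.Random(42)
days = zibb_sample(7, 0.3, 2.0, 2.0, 4000, rng)  # days abstinent, past week
m = mean(days)
# Both zero inflation and the beta mixing push the variance well above the
# plain binomial value 7 * (m/7) * (1 - m/7), which is why TLFB counts are
# poorly served by a binomial likelihood.
```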
Subject(s)
Statistical Models, Substance-Related Disorders, Adolescent, Humans, Bayes Theorem, Binomial Distribution, Algorithms
ABSTRACT
BACKGROUND: Outcome measures that are count variables with excessive zeros are common in health behaviors research. Examples include the number of standard drinks consumed or alcohol-related problems experienced over time. There is a lack of empirical data about the relative performance of prevailing statistical models for assessing the efficacy of interventions when outcomes are zero-inflated, particularly compared with recently developed marginalized count regression approaches for such data. METHODS: The current simulation study examined five commonly used approaches for analyzing count outcomes, including two linear models (with outcomes on raw and log-transformed scales, respectively) and three prevailing count distribution-based models (i.e., Poisson, negative binomial, and zero-inflated Poisson (ZIP) models). We also considered the marginalized zero-inflated Poisson (MZIP) model, a novel alternative that estimates the overall effects on the population mean while adjusting for zero-inflation. Motivated by alcohol misuse prevention trials, extensive simulations were conducted to evaluate and compare the statistical power and Type I error rate of the statistical models and approaches across data conditions that varied in sample size (N = 100 to 500), zero rate (0.2 to 0.8), and intervention effect sizes. RESULTS: Under zero-inflation, the Poisson model failed to control the Type I error rate, resulting in higher than expected false positive results. When the intervention effects on the zero (vs. non-zero) and count parts were in the same direction, the MZIP model had the highest statistical power, followed by the linear model with outcomes on the raw scale, negative binomial model, and ZIP model. The performance of the linear model with a log-transformed outcome variable was unsatisfactory.
CONCLUSIONS: The MZIP model demonstrated better statistical properties in detecting true intervention effects and controlling false positive results for zero-inflated count outcomes. This MZIP model may serve as an appealing analytical approach to evaluating overall intervention effects in studies with count outcomes marked by excessive zeros.
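The distinction the MZIP exploits is that for zero-inflated Poisson data the overall (marginal) mean is ν = (1 − π)λ, so a treatment effect on ν is a single population-level contrast rather than two conditional ones. A simulation sketch of that identity, with hypothetical parameter values:

```python
import math
import random
from statistics import mean

def zip_sample(pi0, lam, size, rng):
    """Zero-inflated Poisson: structural zero w.p. pi0, else Poisson(lam)."""
    def pois():
        L, k, prod = math.exp(-lam), 0, 1.0
        while True:
            prod *= rng.random()
            if prod <= L:
                return k
            k += 1
    return [0 if rng.random() < pi0 else pois() for _ in range(size)]

rng = random.Random(7)
pi0, lam = 0.4, 3.0
y = zip_sample(pi0, lam, 20000, rng)
# A conventional ZIP model parameterizes (pi0, lam) separately, giving the
# effects latent-class interpretations; the MZIP instead parameterizes the
# overall mean nu = (1 - pi0) * lam directly.
print(round(mean(y), 2))  # close to (1 - 0.4) * 3.0 = 1.8
```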
Subject(s)
Computer Simulation, Statistical Models, Humans, Poisson Distribution, Linear Models, Sample Size, Outcome Assessment (Health Care)/statistics & numerical data, Data Interpretation, Statistical, Alcoholism, Alcohol Drinking/epidemiology, Binomial Distribution
ABSTRACT
PURPOSE: Outcome variables that are assumed to follow a negative binomial distribution are frequently used in both clinical and epidemiological studies. Epidemiological studies, particularly those performed by pharmaceutical companies, often aim to describe a population rather than compare treatments. Such descriptive studies are often analysed using confidence intervals. While precision calculations and sample size calculations are not always performed in these settings, they have the important role of setting expectations of what results the study may generate. Current methods for precision calculations for the negative binomial rate are based on plugging parameter values into the confidence interval formulae. This method has the downside of ignoring the randomness of the confidence interval limits. To enable better practice for precision calculations, methods are needed that address the randomness. METHODS: Using the well-known delta-method we develop a method for calculating the precision probability, that is, the probability of achieving a certain width. We assess the performance of the method in smaller samples through simulations. RESULTS: The method for the precision probability performs well in small to medium sample sizes, and the usefulness of the method is demonstrated through an example. CONCLUSIONS: We have developed a simple method for calculating the precision probability for negative binomial rates. This method can be used when planning epidemiological studies in, for example, asthma, while correctly taking the randomness of confidence intervals into account.
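The point about randomness of the interval width can be made by brute force: simulate many studies and record how often the realized CI is narrow enough. This sketch uses a simple Wald interval and a gamma-Poisson negative binomial generator; the study parameters are invented for illustration and the paper's actual method is an analytic delta-method approximation, not this simulation.

```python
import math
import random

def poisson_sample(lam, rng):
    """Knuth's Poisson sampler."""
    L, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= L:
            return k
        k += 1

def precision_probability(mu, k, n, half_width, reps, rng):
    """Fraction of simulated studies whose Wald CI for the mean rate is
    narrower than half_width. Plugging fixed parameters into the CI formula
    ignores that the realized width is random; this estimate does not."""
    hits = 0
    for _ in range(reps):
        xs = [poisson_sample(rng.gammavariate(k, mu / k), rng)  # NB draws
              for _ in range(n)]
        m = sum(xs) / n
        s2 = sum((x - m) ** 2 for x in xs) / (n - 1)
        if 1.96 * math.sqrt(s2 / n) <= half_width:
            hits += 1
    return hits / reps

rng = random.Random(2024)
# NB rate mu = 2, dispersion k = 1, n = 200 subjects per study:
p_wide = precision_probability(2.0, 1.0, 200, 0.45, 500, rng)
p_tight = precision_probability(2.0, 1.0, 200, 0.25, 500, rng)
# A generous target width is achieved almost always, a tight one rarely --
# the "precision probability" the paper computes analytically.
```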
Subject(s)
Statistical Models, Humans, Sample Size, Probability, Binomial Distribution, Confidence Intervals
ABSTRACT
Count outcomes are collected in clinical trials for new drug development in several therapeutic areas and the event rate is commonly used as a single primary endpoint. Count outcomes whose variance exceeds the mean are termed overdispersed; such outcomes are therefore commonly assumed to follow a negative binomial distribution. However, in clinical trials for treating asthma and chronic obstructive pulmonary disease (COPD), a regulatory agency has suggested that a continuous endpoint related to lung function must be evaluated as a primary endpoint in addition to the event rate. The two co-primary endpoints that need to be evaluated include overdispersed count and continuous outcomes. Some researchers have proposed sample size calculation methods in the context of co-primary endpoints for various outcome types. However, methodologies for sample size calculation in trials with two co-primary endpoints, including overdispersed count and continuous outcomes, required when planning clinical trials for treating asthma and COPD, remain to be proposed. In this study, we aimed to develop a hypothesis-testing method and a corresponding sample size calculation method with two co-primary endpoints including overdispersed count and continuous outcomes. In a simulation, we demonstrated that the proposed sample size calculation method achieves adequate accuracy in power. In addition, we illustrated an application of the proposed sample size calculation method to a placebo-controlled Phase 3 trial for patients with COPD.
Subject(s)
Asthma, Chronic Obstructive Pulmonary Disease, Humans, Sample Size, Asthma/drug therapy, Chronic Obstructive Pulmonary Disease/diagnosis, Chronic Obstructive Pulmonary Disease/drug therapy, Binomial Distribution, Computer Simulation
ABSTRACT
Spatial count data with an abundance of zeros arise commonly in disease mapping studies. Typically, these data are analyzed using zero-inflated models, which comprise a mixture of a point mass at zero and an ordinary count distribution, such as the Poisson or negative binomial. However, due to their mixture representation, conventional zero-inflated models are challenging to explain in practice because the parameter estimates have conditional latent-class interpretations. As an alternative, several authors have proposed marginalized zero-inflated models that simultaneously model the excess zeros and the marginal mean, leading to a parameterization that more closely aligns with ordinary count models. Motivated by a study examining predictors of COVID-19 death rates, we develop a spatiotemporal marginalized zero-inflated negative binomial model that directly models the marginal mean, thus extending marginalized zero-inflated models to the spatial setting. To capture the spatiotemporal heterogeneity in the data, we introduce region-level covariates, smooth temporal effects, and spatially correlated random effects to model both the excess zeros and the marginal mean. For estimation, we adopt a Bayesian approach that combines full-conditional Gibbs sampling and Metropolis-Hastings steps. We investigate features of the model and use the model to identify key predictors of COVID-19 deaths in the US state of Georgia during the 2021 calendar year.
Subject(s)
Bayes Theorem, Biometry, COVID-19, Statistical Models, Humans, COVID-19/mortality, COVID-19/epidemiology, Georgia/epidemiology, Biometry/methods, Spatial Analysis, Binomial Distribution
ABSTRACT
Existing methods for generating synthetic genotype data are ill-suited for replicating the effects of assortative mating (AM). We propose rb_dplr, a novel and computationally efficient algorithm for generating high-dimensional binary random variates that effectively recapitulates AM-induced genetic architectures using the Bahadur order-2 approximation of the multivariate Bernoulli distribution. The rBahadur R library is available through the Comprehensive R Archive Network at https://CRAN.R-project.org/package=rBahadur .
Subject(s)
Algorithms, Cell Communication, Binomial Distribution, Computer Simulation, Genotype
ABSTRACT
BACKGROUND: The spectrum of mutations in a collection of cancer genomes can be described by a mixture of a few mutational signatures. The mutational signatures can be found using non-negative matrix factorization (NMF). To extract the mutational signatures we have to assume a distribution for the observed mutational counts and a number of mutational signatures (the rank of the factorization). In most applications, the mutational counts are assumed to be Poisson distributed, and the rank is chosen by comparing the fit of several models with the same underlying distribution and different values for the rank using classical model selection procedures. However, the counts are often overdispersed, and thus the Negative Binomial distribution is more appropriate. RESULTS: We propose a Negative Binomial NMF with a patient-specific dispersion parameter to capture the variation across patients and derive the corresponding update rules for parameter estimation. We also introduce a novel model selection procedure inspired by cross-validation to determine the number of signatures. Using simulations, we study the influence of the distributional assumption on our method together with other classical model selection procedures. We also present a simulation study with a method comparison where we show that state-of-the-art methods are highly overestimating the number of signatures when overdispersion is present. We apply our proposed analysis on a wide range of simulated data and on two real data sets from breast and prostate cancer patients. On the real data we describe a residual analysis to investigate and validate the model choice. CONCLUSIONS: With our results on simulated and real data we show that our model selection procedure is more robust at determining the correct number of signatures under model misspecification. We also show that our model selection procedure is more accurate than the available methods in the literature for finding the true number of signatures.
Lastly, the residual analysis clearly emphasizes the overdispersion in the mutational count data. The code for our model selection procedure and Negative Binomial NMF is available in the R package SigMoS and can be found at https://github.com/MartaPelizzola/SigMoS .
Subject(s)
Algorithms, Breast, Male, Humans, Mutation, Binomial Distribution, Computer Simulation
ABSTRACT
Count data with excessive zeros are increasingly ubiquitous in genetic association studies, such as neuritic plaques in brain pathology for Alzheimer's disease. Here, we developed gene-based association tests to model such data by a mixture of two distributions, one for the structural zeros contributed by the Binomial distribution, and the other for the counts from the Poisson distribution. We derived the score statistics of the corresponding parameter of the rare variants in the zero-inflated Poisson regression model, and then constructed burden (ZIP-b) and kernel (ZIP-k) tests for the association tests. We evaluated omnibus tests that combined both ZIP-b and ZIP-k tests. Through simulated sequence data, we illustrated the potential power gain of our proposed method over a two-stage method that analyzes binary and non-zero continuous data separately for both burden and kernel tests. The ZIP burden test outperformed the kernel test as expected in all scenarios except for the scenario of variants with a mixture of directions in the genetic effects. We further demonstrated its applications to analyses of the neuritic plaque data in the ROSMAP cohort. We expect our proposed test to be useful in practice as more powerful than or complementary to the two-stage method.
Subject(s)
Genetic Models, Statistical Models, Binomial Distribution, Humans, Phenotype, Poisson Distribution
ABSTRACT
A major challenge emerging in genomic medicine is how best to assess disease risk from rare or novel variants found in disease-related genes. The expanding volume of data generated by very large phenotyping efforts coupled to DNA sequence data presents an opportunity to reinterpret genetic liability of disease risk. Here we propose a framework to estimate the probability of disease given the presence of a genetic variant conditioned on features of that variant. We refer to this as the penetrance, the fraction of all variant heterozygotes that will present with disease. We demonstrate this methodology using a well-established disease-gene pair, the cardiac sodium channel gene SCN5A and the heart arrhythmia Brugada syndrome. From a review of 756 publications, we developed a pattern mixture algorithm, based on a Bayesian Beta-Binomial model, to generate SCN5A penetrance probabilities for the Brugada syndrome conditioned on variant-specific attributes. These probabilities are determined from variant-specific features (e.g. function, structural context, and sequence conservation) and from observations of affected and unaffected heterozygotes. Variant functional perturbation and structural context prove most predictive of Brugada syndrome penetrance.
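The conjugate core of a Beta-Binomial penetrance estimate is a one-line update. The sketch below uses a uniform prior and invented counts; the paper instead conditions the prior on variant-specific features, which this minimal version omits.

```python
def penetrance_posterior(a0, b0, affected, unaffected):
    """Conjugate Beta-Binomial update: prior Beta(a0, b0) on penetrance,
    likelihood Binomial(affected | heterozygotes, penetrance)."""
    a, b = a0 + affected, b0 + unaffected
    return a, b, a / (a + b)          # posterior params and posterior mean

# Hypothetical variant reported in 10 heterozygotes, 3 affected, with a
# uniform Beta(1, 1) prior (the feature-informed prior of the paper would
# replace the 1, 1 here).
a, b, post_mean = penetrance_posterior(1, 1, 3, 7)
print(a, b, round(post_mean, 3))  # 4 8 0.333
```

The posterior mean shrinks the raw fraction 3/10 toward the prior, which matters most for the rare variants with very few observed heterozygotes that motivate the framework.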
Subject(s)
Brugada Syndrome/genetics, Genetic Models, NAV1.5 Voltage-Gated Sodium Channel/genetics, Penetrance, Single Nucleotide Polymorphism, Algorithms, Bayes Theorem, Binomial Distribution, Brugada Syndrome/therapy, Genetic Databases/statistics & numerical data, Datasets as Topic, Humans, Precision Medicine/methods
ABSTRACT
Common count distributions, such as the Poisson (binomial) distribution for unbounded (bounded) counts considered here, can be characterized by appropriate Stein identities. These identities, in turn, might be utilized to define a corresponding goodness-of-fit (GoF) test, the test statistic of which involves the computation of weighted means for a user-selected weight function f. Here, the choice of f should be done with respect to the relevant alternative scenario, as it will have great impact on the GoF-test's performance. We derive the asymptotics of both the Poisson and binomial Stein-type GoF-statistic for general count distributions (we also briefly consider the negative-binomial case), such that the asymptotic power is easily computed for arbitrary alternatives. This allows for an efficient implementation of optimal Stein tests, that is, which are most powerful within a given class F $\mathcal {F}$ of weight functions. The performance and application of the optimal Stein-type GoF-tests is investigated by simulations and several medical data examples.
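The Poisson case of the Stein identity states E[λ f(X+1)] = E[X f(X)] for Poisson(λ) data, so an empirical version of the difference serves as a GoF statistic. The sketch below checks this against a Poisson sample and a gamma-mixed (negative binomial) alternative; the weight function f(x) = e^(-x) and all parameters are illustrative choices, not the paper's optimal f.

```python
import math
import random

def poisson_sample(lam, rng):
    """Knuth's Poisson sampler."""
    L, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= L:
            return k
        k += 1

def stein_statistic(xs, f):
    """Empirical Poisson-Stein discrepancy: mean of
    lam_hat * f(x + 1) - x * f(x), with lam_hat the sample mean.
    The Stein identity makes this ~0 for Poisson data; a well-chosen
    weight function f pushes it away from 0 under the alternative."""
    lam_hat = sum(xs) / len(xs)
    return sum(lam_hat * f(x + 1) - x * f(x) for x in xs) / len(xs)

def f(x):               # one illustrative weight-function choice
    return math.exp(-x)

rng = random.Random(3)
pois = [poisson_sample(4.0, rng) for _ in range(20000)]
mixed = [poisson_sample(rng.gammavariate(2.0, 2.0), rng) for _ in range(20000)]
# 'mixed' is gamma-Poisson, i.e. negative binomial, with the same mean 4;
# the statistic stays near 0 for 'pois' and drifts positive for 'mixed'.
```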
Subject(s)
Statistical Models, Binomial Distribution
ABSTRACT
Cellular heterogeneity underlies cancer evolution and metastasis. Advances in single-cell technologies such as single-cell RNA sequencing and mass cytometry have enabled interrogation of cell type-specific expression profiles and abundance across heterogeneous cancer samples obtained from clinical trials and preclinical studies. However, challenges remain in determining sample sizes needed for ascertaining changes in cell type abundances in a controlled study. To address this statistical challenge, we have developed a new approach, named Sensei, to determine the number of samples and the number of cells that are required to ascertain such changes between two groups of samples in single-cell studies. Sensei extends the t-test and models the cell abundances using a beta-binomial distribution. We evaluate the mathematical accuracy of Sensei and provide practical guidelines on over 20 cell types in over 30 cancer types based on knowledge acquired from The Cancer Genome Atlas (TCGA) and prior single-cell studies. We provide a web application to enable user-friendly study design via https://kchen-lab.github.io/sensei/table_beta.html .
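The reason a beta-binomial (rather than binomial) model matters for such power calculations is its inflated variance, n·p(1−p)·[1 + (n−1)ρ] with ρ = 1/(a+b+1). The sketch below verifies that closed form against the pmf directly; the cell counts and Beta(2, 6) abundance are invented for illustration and are not Sensei's defaults.

```python
from math import comb, exp, lgamma

def log_beta(x, y):
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def beta_binom_pmf(k, n, a, b):
    """Beta-binomial pmf via log-Beta functions for numerical stability."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))

n, a, b = 30, 2.0, 6.0          # e.g. 30 cells sampled, Beta(2, 6) abundance
mu = sum(k * beta_binom_pmf(k, n, a, b) for k in range(n + 1))
var = sum((k - mu) ** 2 * beta_binom_pmf(k, n, a, b) for k in range(n + 1))
p, rho = a / (a + b), 1.0 / (a + b + 1.0)
var_closed = n * p * (1 - p) * (1 + (n - 1) * rho)
# The 1 + (n - 1) * rho inflation over the plain binomial variance is what
# a cell-abundance power calculation has to absorb into its t-test.
```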
Subject(s)
Neoplasms, Software, Binomial Distribution, Humans, Neoplasms/genetics, Research Design, Sample Size
ABSTRACT
Cluster randomized trials (CRTs) have been widely employed in medical and public health research. Many clinical count outcomes, such as the number of falls in nursing homes, exhibit excessive zero values. In the presence of zero inflation, traditional power analysis methods for count data based on Poisson or negative binomial distribution may be inadequate. In this study, we present a sample size method for CRTs with zero-inflated count outcomes. It is developed based on GEE regression directly modeling the marginal mean of a zero-inflated Poisson outcome, which avoids the challenge of testing two intervention effects under traditional modeling approaches. A closed-form sample size formula is derived which properly accounts for zero inflation, ICCs due to clustering, unbalanced randomization, and variability in cluster size. Robust approaches, including t-distribution-based approximation and Jackknife re-sampling variance estimator, are employed to enhance trial properties under small sample sizes. Extensive simulations are conducted to evaluate the performance of the proposed method. An application example is presented in a real clinical trial setting.
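The ingredients of such a formula can be sketched generically: a two-sample sample-size calculation on the ZIP marginal mean, inflated by the usual cluster design effect 1 + (m−1)·ICC. This is a simplified stand-in under balanced randomization and equal cluster sizes, not the paper's exact formula (which also handles unbalanced allocation and cluster-size variability via GEE robust variances); all numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def crt_clusters_per_arm(var1, var2, delta, m, icc, alpha=0.05, power=0.8):
    """Clusters per arm for a two-sample comparison of marginal means,
    inflating the individually-randomized sample size by the design
    effect 1 + (m - 1) * icc for clusters of size m."""
    z = NormalDist().inv_cdf
    n_ind = (z(1 - alpha / 2) + z(power)) ** 2 * (var1 + var2) / delta ** 2
    n_adj = n_ind * (1 + (m - 1) * icc)       # inflate for clustering
    return ceil(n_adj / m)

def zip_var(nu, pi0):
    """Marginal variance of a ZIP outcome with marginal mean nu and
    structural-zero probability pi0: nu + nu^2 * pi0 / (1 - pi0)."""
    return nu + nu ** 2 * pi0 / (1 - pi0)

# Hypothetical trial: marginal means 2.0 vs 1.5 falls, 30% structural
# zeros, clusters of m = 20 residents, ICC = 0.02.
pi0 = 0.3
print(crt_clusters_per_arm(zip_var(2.0, pi0), zip_var(1.5, pi0),
                           0.5, m=20, icc=0.02))
```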
Subject(s)
Statistical Models, Binomial Distribution, Cluster Analysis, Computer Simulation, Humans, Poisson Distribution, Randomized Controlled Trials as Topic, Sample Size
ABSTRACT
BACKGROUND: Hospital length of stay (LOS) is a key indicator of hospital care management efficiency, cost of care, and hospital planning. Hospital LOS is often used as a measure of a post-medical procedure outcome, as a guide to the benefit of a treatment of interest, or as an important risk factor for adverse events. Therefore, understanding hospital LOS variability is always an important healthcare focus. Hospital LOS data can be treated as count data, with discrete and non-negative values, typically right skewed, and often exhibiting excessive zeros. In this study, we compared the performance of the Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), and zero-inflated negative binomial (ZINB) regression models using simulated and empirical data. METHODS: Data were generated under different simulation scenarios with varying sample sizes, proportions of zeros, and levels of overdispersion. Analysis of hospital LOS was conducted using empirical data from the Medical Information Mart for Intensive Care database. RESULTS: Results showed that Poisson and ZIP models performed poorly in overdispersed data. ZIP outperformed the rest of the regression models when the overdispersion was due to zero-inflation only. NB and ZINB regression models faced substantial convergence issues when incorrectly used to model equidispersed data. The NB model provided the best fit in overdispersed data and outperformed the ZINB model in many simulation scenarios with combinations of zero-inflation and overdispersion, regardless of the sample size. In the empirical data analysis, we demonstrated that fitting incorrect models to overdispersed data led to incorrect regression coefficient estimates and overstated the significance of some of the predictors.
CONCLUSIONS: Based on this study, we recommend that researchers consider ZIP models for count data with zero-inflation only and NB models for overdispersed data or data with combinations of zero-inflation and overdispersion. If the researcher believes there are two different data-generating mechanisms producing zeros, then the ZINB regression model may provide greater flexibility when modeling the zero-inflation and overdispersion.
Subject(s)
Hospitals, Statistical Models, Binomial Distribution, Humans, Length of Stay, Poisson Distribution
ABSTRACT
BACKGROUND: We consider cluster size data of SARS-CoV-2 transmissions for a number of different settings from recently published data. The statistical characteristics of superspreading events are commonly described by fitting a negative binomial distribution to secondary infection and cluster size data as an alternative to the Poisson distribution, since its longer tail, governed by an extra parameter, allows the variance to exceed the mean. Here we investigate whether other long-tailed distributions from more general extended Poisson process modelling can better describe the distribution of cluster sizes for SARS-CoV-2 transmissions. METHODS: We use the extended Poisson process modelling (EPPM) approach with nested sets of models that include the Poisson and negative binomial distributions to assess the adequacy of models based on these standard distributions for the data considered. RESULTS: We confirm the inadequacy of the Poisson distribution in most cases, and demonstrate the inadequacy of the negative binomial distribution in some cases. CONCLUSIONS: The probability of a superspreading event may be underestimated by use of the negative binomial distribution as much larger tail probabilities are indicated by EPPM distributions than negative binomial alternatives. We show that the large shared accommodation, meal and work settings, of the settings considered, have the potential for more severe superspreading events than would be predicted by a negative binomial distribution. Therefore public health efforts to prevent transmission in such settings should be prioritised.
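How much the tail choice matters can be seen by comparing tail probabilities at a fixed mean. The sketch below contrasts Poisson and negative binomial tails for an illustrative mean cluster size of 3 and dispersion k = 0.3 (invented values, not fits from the paper); the paper's point is that EPPM distributions can place even more mass in the tail than the NB.

```python
from math import exp, lgamma, log

def poisson_pmf(x, mu):
    return exp(x * log(mu) - mu - lgamma(x + 1))

def nb_pmf(x, mu, k):
    """Negative binomial with mean mu and dispersion k (var = mu + mu^2/k)."""
    return exp(lgamma(x + k) - lgamma(k) - lgamma(x + 1)
               + k * log(k / (k + mu)) + x * log(mu / (k + mu)))

mu = 3.0                      # same mean cluster size under both models
tail_pois = 1.0 - sum(poisson_pmf(x, mu) for x in range(20))
tail_nb = 1.0 - sum(nb_pmf(x, mu, 0.3) for x in range(20))
# P(cluster size >= 20): negligible under the Poisson, a few percent under
# this NB -- many orders of magnitude apart at identical means.
```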
Subject(s)
COVID-19, Pandemics, Binomial Distribution, Humans, Poisson Distribution, SARS-CoV-2
ABSTRACT
BACKGROUND: Despite expected initial universal susceptibility to a novel pandemic pathogen like SARS-CoV-2, the pandemic has been characterized by higher observed incidence in older persons and lower incidence in children and adolescents. OBJECTIVE: To determine whether differential testing by age group explains observed variation in incidence. DESIGN: Population-based cohort study. SETTING: Ontario, Canada. PARTICIPANTS: Persons diagnosed with SARS-CoV-2 and those tested for SARS-CoV-2. MEASUREMENTS: Test volumes from the Ontario Laboratories Information System, number of laboratory-confirmed SARS-CoV-2 cases from the Integrated Public Health Information System, and population figures from Statistics Canada. Demographic and temporal patterns in incidence, testing rates, and test positivity were explored using negative binomial regression models and standardization. Sources of variation in standardized ratios were identified and test-adjusted standardized infection ratios (SIRs) were estimated by metaregression. RESULTS: Observed disease incidence and testing rates were highest in the oldest age group and markedly lower in those younger than 20 years; no differences in incidence were seen by sex. After adjustment for testing frequency, SIRs were lowest in children and in adults aged 70 years or older and markedly higher in adolescents and in males aged 20 to 49 years compared with the overall population. Test-adjusted SIRs were highly correlated with standardized positivity ratios (Pearson correlation coefficient, 0.87 [95% CI, 0.68 to 0.95]; P < 0.001) and provided a case identification fraction similar to that estimated with serologic testing (26.7% vs. 17.2%). LIMITATIONS: The novel methodology requires external validation. Case and testing data were not linkable at the individual level. 
CONCLUSION: Adjustment for testing frequency provides a different picture of SARS-CoV-2 infection risk by age, suggesting that younger males are an underrecognized group at high risk for SARS-CoV-2 infection. PRIMARY FUNDING SOURCE: Canadian Institutes of Health Research.
Subject(s)
COVID-19 Testing/statistics & numerical data, COVID-19/epidemiology, Adolescent, Adult, Age Distribution, Aged, Aged 80 and over, Binomial Distribution, Child, Preschool Child, Female, Humans, Incidence, Infant, Newborn Infant, Male, Middle Aged, Ontario/epidemiology, Pandemics, SARS-CoV-2, Sex Distribution, Young Adult
ABSTRACT
Speech-recognition tests are a routine component of the clinical hearing evaluation. The most common type of test uses recorded monosyllabic words presented in quiet. The interpretation of test scores relies on an understanding of the variance of repeated tests. Confidence intervals are useful for determining if two scores are significantly different or if the difference is due to the variability of test scores. Because the response to each test item is binary, either correct or incorrect, the binomial distribution has been used to estimate confidence intervals. This method requires that test scores be independent. If the scores are not independent, the binomial distribution will not accurately estimate the variance of repeated scores. A previously published dataset with repeated scores from normal-hearing and hearing-impaired listeners was used to derive confidence intervals from actual test scores in contrast to the predicted confidence intervals in earlier reports. This analysis indicates that confidence intervals predicted by the binomial distribution substantially overestimate the variance of repeated scores resulting in erroneously broad confidence intervals. High correlations were found for repeated scores, indicating that scores are not independent. The interdependence of repeated scores invalidates confidence intervals predicted by the binomial distribution. Confidence intervals and confidence levels for repeated measures were determined empirically from measured test scores to assist in interpreting differences between repeat scores.
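The binomial interval the article critiques is straightforward to compute, which makes the critique concrete: the interval below assumes independent items, and the article's data show repeated scores are positively correlated, so the true test-retest spread is narrower than this. The 80%-of-50-words example is illustrative, not from the article's dataset.

```python
from math import sqrt

def binomial_ci(score, n_items, z=1.96):
    """Wald interval for a proportion, assuming independent items -- the
    assumption the article shows is violated for repeated word lists."""
    se = sqrt(score * (1 - score) / n_items)
    return score - z * se, score + z * se

lo, hi = binomial_ci(0.80, 50)   # 80% correct on a 50-word list
print(round(lo, 3), round(hi, 3))  # 0.689 0.911
# Because repeat scores are correlated, intervals this wide overstate the
# variability of retests and can mask genuinely different scores.
```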
Subject(s)
Sensorineural Hearing Loss, Speech Perception, Binomial Distribution, Confidence Intervals, Humans, Speech, Speech Discrimination Tests/methods, Speech Perception/physiology, Speech Reception Threshold Test
ABSTRACT
The identification and treatment of "one-inflation" in estimating the size of an elusive population has received increasing attention in capture-recapture literature in recent years. The phenomenon occurs when the number of units captured exactly once clearly exceeds the expectation under a baseline count distribution. Ignoring one-inflation has serious consequences for estimation of the population size, which can be drastically overestimated. In this paper we propose a Bayesian approach for Poisson, geometric, and negative binomial one-inflated count distributions. Posterior inference for population size will be obtained applying a Gibbs sampler approach. We also provide a Bayesian approach to model selection. We illustrate the proposed methodology with simulated and real data and propose a new application in official statistics to estimate the number of people implicated in the exploitation of prostitution in Italy.
Subject(s)
Statistical Models, Bayes Theorem, Binomial Distribution, Humans, Poisson Distribution, Population Density
ABSTRACT
This article proposes a Bayesian regression model for nonlinear zero-inflated longitudinal count data that models the median count as an alternative to the mean count. The nonlinear model generalizes a recently introduced linear mixed-effects model based on the zero-inflated discrete Weibull (ZIDW) distribution. The ZIDW distribution is more robust to severe skewness in the data than conventional zero-inflated count distributions such as the zero-inflated negative binomial (ZINB) distribution. Moreover, the ZIDW distribution is attractive because of its convenience to model the median counts given its closed-form quantile function. The median is a more robust measure of central tendency than the mean when the data, for instance, zero-inflated counts, are right-skewed. In an application of the model we consider a biphasic mixed-effects model consisting of an intercept term and two slope terms. Conventionally, the ZIDW model separately specifies the predictors for the zero-inflation probability and the counting process's median count. In our application, the two latent class interpretations are not clinically plausible. Therefore, we propose a marginal ZIDW model that directly models the biphasic median counts marginally. We also consider the marginal ZINB model to make inferences about the nonlinear mean counts over time. Our simulation study shows that the models have good properties in terms of accuracy and confidence interval coverage.
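The "closed-form quantile function" convenience of the discrete Weibull can be shown directly: for CDF F(x) = 1 − q^((x+1)^β), the τ-quantile is ⌈(log(1−τ)/log q)^(1/β) − 1⌉, and zero inflation simply rescales the quantile level fed to the count component. The parameter values below are invented for illustration.

```python
from math import ceil, log

def dw_quantile(tau, q, beta):
    """Closed-form quantile of the (type I) discrete Weibull with
    CDF F(x) = 1 - q**((x + 1)**beta), x = 0, 1, 2, ..."""
    return max(0, ceil((log(1 - tau) / log(q)) ** (1 / beta) - 1))

def zidw_median(pi0, q, beta):
    """Median of the zero-inflated discrete Weibull: the point mass pi0
    at zero shifts the quantile level passed to the count component."""
    if pi0 >= 0.5:
        return 0                 # more than half the mass is already at 0
    return dw_quantile((0.5 - pi0) / (1 - pi0), q, beta)

print(zidw_median(0.2, 0.9, 0.8))  # -> 6
```

This closed form is what lets the ZIDW regression target the median count directly, rather than backing the median out of a mean model as ZINB-type approaches must.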