ABSTRACT
The concept of fitness is central to evolution, but it quantifies only the expected number of offspring an individual will produce. The actual number of offspring is also subject to demographic stochasticity, that is, randomness associated with birth and death processes. In nature, individuals who are more fecund tend to have greater variance in their offspring number. Here, we develop a model for the evolution of two types competing in a population of nonconstant size. The fitness of each type is determined by pairwise interactions in a prisoner's dilemma game, and the variance in offspring number depends upon its mean. Although defectors are favored by natural selection in classical population models, since they always have greater fitness than cooperators, we show that sufficiently large offspring variance can reverse the direction of evolution and favor cooperation. Large offspring variance produces qualitatively new dynamics for other types of social interactions as well, dynamics that cannot arise in populations with a fixed size or with a Poisson offspring distribution.
Subjects
Cooperative Behavior, Game Theory, Humans, Population Dynamics, Population Density, Genetic Selection
ABSTRACT
Based on the well-known Poisson distribution and the new generalized Lindley distribution (NGLD), which is built from gamma(α, θ) and gamma(α-1, θ) components, this paper proposes a new compound two-parameter Poisson generalized Lindley (TPPGL) distribution and systematically explores its mathematical properties. Closed-form expressions are derived for properties including the probability generating function, moments, skewness, and kurtosis. The parameters are estimated by a likelihood-based method, followed by a broad Monte Carlo simulation study. To further motivate the proposed model, a count regression model and a first-order integer-valued autoregressive process are constructed based on the novel TPPGL distribution. The empirical importance of the proposed models is confirmed through application to four real datasets.
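The compound construction above can be sketched as a two-stage simulation: draw a latent rate from the NGLD (itself a two-component gamma mixture), then a Poisson count given that rate. The mixing weights below are illustrative assumptions in the spirit of Lindley-type mixtures, not the paper's exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(42)

def r_ngld(n, alpha, theta, rng):
    # NGLD sketched as a two-component gamma mixture of gamma(alpha, rate=theta)
    # and gamma(alpha - 1, rate=theta); the weight w = theta / (1 + theta) is an
    # illustrative assumption (the paper's exact weights may differ).
    w = theta / (1.0 + theta)
    pick = rng.random(n) < w
    return np.where(pick,
                    rng.gamma(alpha, 1.0 / theta, size=n),
                    rng.gamma(alpha - 1.0, 1.0 / theta, size=n))

def r_tppgl(n, alpha, theta, rng):
    # Compound construction: latent rate from the NGLD, then Poisson given the rate.
    lam = r_ngld(n, alpha, theta, rng)
    return rng.poisson(lam)

x = r_tppgl(100_000, alpha=3.0, theta=1.5, rng=rng)
print(x.mean(), x.var())  # any Poisson mixture is over-dispersed: variance > mean
```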
Subjects
Likelihood Functions, Humans, Computer Simulation, Poisson Distribution, Monte Carlo Method
ABSTRACT
Binomial autoregressive models are frequently used for modeling bounded time series of counts. However, they are not well developed for more complex bounded counts of the occurrence of n exchangeable and dependent units, which are becoming increasingly common in practice. To fill this gap, this paper first constructs an exchangeable Conway-Maxwell-Poisson-binomial (CMPB) thinning operator and then establishes the Conway-Maxwell-Poisson-binomial AR (CMPBAR) model. We establish the model's stationarity and ergodicity, discuss the conditional maximum likelihood (CML) estimation of its parameters, and establish the asymptotic normality of the CML estimator. In a simulation study, boxplots illustrate that the CML estimator is consistent and Q-Q plots show its asymptotic normality. In a real data example, our model attains smaller AIC and BIC values than its main competitors.
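As a simple baseline for the thinning idea, the standard binomial AR(1) with independent binomial thinning (which the CMPB operator generalizes to exchangeable, dependent units) can be simulated as follows; the parameterization is the common McKenzie-type one, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def binomial_ar1(T, n, alpha, beta, rng):
    # X_t = alpha o X_{t-1} + beta o (n - X_{t-1}), where "o" is independent
    # binomial thinning; the process stays bounded in {0, ..., n} with
    # stationary mean n * beta / (1 - alpha + beta).
    x = np.empty(T, dtype=int)
    x[0] = rng.binomial(n, beta / (1.0 - alpha + beta))  # start at stationarity
    for t in range(1, T):
        survivors = rng.binomial(x[t - 1], alpha)    # alpha o X_{t-1}
        recruits = rng.binomial(n - x[t - 1], beta)  # beta o (n - X_{t-1})
        x[t] = survivors + recruits
    return x

x = binomial_ar1(5000, n=20, alpha=0.4, beta=0.2, rng=rng)
print(x.mean())  # close to the stationary mean 20 * 0.2 / 0.8 = 5
```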
ABSTRACT
Recently developed actigraphy devices have made continuous, objective monitoring of sleep over multiple nights possible. Sleep variables captured by wrist actigraphy devices include sleep onset, sleep end, total sleep time, wake time after sleep onset, number of awakenings, and others. Currently available statistical methods to analyze such actigraphy data have limitations. First, averages over multiple nights are used to summarize sleep activities, ignoring variability over multiple nights from the same subject. Second, sleep variables are often analyzed independently, although they tend to be correlated with each other. For example, how long a subject sleeps at night can be correlated with how long and how frequently they wake up during that night. It is important to understand these inter-relationships. We therefore propose a joint mixed-effects model on total sleep time, number of awakenings, and wake time. We develop an estimation procedure based upon a sequence of generalized linear mixed-effects models, which can be implemented using existing software. The use of these models not only avoids the computational intensity and instability that may occur when directly applying a numerical algorithm to a complicated joint likelihood function, but also provides additional insights into sleep activities. We demonstrate in simulation studies that the proposed estimation procedure performs well in estimating both fixed- and random-effects parameters. We applied the proposed model to data from the Women's Interagency HIV Sleep Study to examine the association of employment status and age with overall sleep quality assessed by several actigraphy-measured sleep variables.
Subjects
Actigraphy, Wrist, Actigraphy/methods, Female, Humans, Polysomnography/methods, Sleep
ABSTRACT
Pathogen exposure to multiple hurdles can produce variation in the number of survivors, which needs to be considered carefully using regression models appropriate for survivor dispersion. The aim of this study was to evaluate the impact of the hurdles on the random component of the measured variation and on its unexplained part (over- or under-dispersion), which represents the departure from randomness, i.e., non-randomness, in survivors of a multi-strain mixture of L. monocytogenes. The pathogen inactivation curves were fitted to the Weibull model within the Conway-Maxwell-Poisson process. In all 20 hurdle combinations, the surviving cells, whether they showed an upward curvature or linear kinetics, displayed randomness revealed by the degree of dispersion of the inactivation parameters (-b and p). In 15 combinations, a significant dispersion coefficient (c0), reflecting the non-random component of variation, was evident, denoting either over-dispersion (c0 > 0 in 13 combinations) or under-dispersion (c0 < 0 in 2 combinations). The observed dependence of the under- and over-dispersion conditions on the inactivation rate was confirmed by a Monte Carlo simulation based on the inactivation parameter -b. Including both randomness and non-randomness provides a more accurate estimation of survivors, which has clear implications for intervention practices.
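The Weibull primary inactivation model referred to here is commonly written on the log10 scale as log10 N(t) = log10 N0 - b·t^p, where p < 1 gives an upward curvature and p = 1 reduces to log-linear kinetics. A minimal sketch assuming this standard form and illustrative parameter values:

```python
import numpy as np

def weibull_survivors(t, log_n0, b, p):
    # Weibull primary model on the log10 scale:
    # log10 N(t) = log10 N0 - b * t**p
    # p < 1: upward (concave) curvature; p = 1: log-linear kinetics.
    return log_n0 - b * np.power(t, p)

t = np.linspace(0.0, 10.0, 6)
curve = weibull_survivors(t, log_n0=7.0, b=1.2, p=0.8)
print(curve)  # monotonically decreasing log10 counts starting at 7.0
```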
Subjects
Listeria monocytogenes, Microbial Colony Count, Computer Simulation, Food Microbiology, Humans, Kinetics, Listeria monocytogenes/physiology, Monte Carlo Method, Survivors
ABSTRACT
In employing spatial regression models for counts, we usually face two issues. First, possible inherent collinearity between covariates and the spatial effect can lead to misleading inferences. Second, real count data usually reveal over- or under-dispersion, for which the classical Poisson model is not appropriate. We propose a flexible Bayesian hierarchical modeling approach that joins nonconfounding spatial methodology with a newly reconsidered dispersed count modeling framework from renewal theory to address both issues. Specifically, we extend the methodology for analyzing spatial count data based on the gamma distribution assumption for waiting times. The model can be formulated as a latent Gaussian model, and consequently fast computation can be carried out using the integrated nested Laplace approximation method. We examine different popular approaches for handling spatial confounding and compare their performance in the presence of dispersion. Two real applications, a crime study against women in India and stomach cancer incidence in Slovenia, motivate the suggested methods. We also perform a simulation study to better understand the merits of the proposed approach. Supplementary Materials for this article are available.
Subjects
Statistical Models, Research Design, Bayes Theorem, Computer Simulation, Female, Humans, Normal Distribution, Spatial Analysis
ABSTRACT
BACKGROUND: Histone modification constitutes a basic mechanism for the genetic regulation of gene expression. In the early 2000s, a powerful technique emerged that couples chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq). This technique provides a direct survey of the DNA regions associated with these modifications. In order to realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed or adapted to analyze the massive amount of data it generates. Many of these algorithms were built around natural assumptions, such as the Poisson distribution, to model the noise in the count data. In this work we start from these natural assumptions and show that it is possible to improve upon them. RESULTS: Our comparisons on seven reference datasets of histone modifications (H3K36me3 & H3K4me3) suggest that natural assumptions are not always realistic under application conditions. We show that the unconstrained multiple changepoint detection model with alternative noise assumptions and supervised learning of the penalty parameter reduces the over-dispersion exhibited by count data. These models, implemented in the R package CROCS ( https://github.com/aLiehrmann/CROCS ), detect the peaks more accurately than algorithms which rely on natural assumptions. CONCLUSION: The segmentation models we propose can benefit researchers in the field of epigenetics by providing new high-quality peak prediction tracks for H3K36me3 and H3K4me3 histone modifications.
Subjects
Chromatin Immunoprecipitation Sequencing, High-Throughput Nucleotide Sequencing, Algorithms, Chromatin Immunoprecipitation, DNA Sequence Analysis
ABSTRACT
Species' evolutionary histories shape their present-day ecologies, but the integration of phylogenetic approaches in ecology has had a contentious history. The field of ecophylogenetics promised to reveal the process of community assembly from simple indices of phylogenetic pairwise distances: communities shaped by environmental filtering were composed of closely related species, whereas communities shaped by competition were composed of less closely related species. However, the mapping of ecology onto phylogeny proved to be not so straightforward, and the field remains mired in controversy. Nonetheless, ecophylogenetic methods have provided important advances across ecology. For example, the phylogenetic distance between species is a strong predictor of pest and pathogen sharing, and can thus inform models of species invasion, coexistence, and the disease dilution/amplification effect of biodiversity. The phylogenetic structure of communities may also provide information on niche space occupancy, helping interpret patterns of facilitation, succession, and ecosystem functioning, with relevance for conservation and restoration, as well as the dynamics among species within foodwebs and metacommunities. I suggest that leveraging advances in our understanding of the process of evolution on phylogenetic trees would allow the field to progress further, while maintaining the essence of the original vision that proved so seductive.
Subjects
Ecology, Ecosystem, Biodiversity, Phylogeny
ABSTRACT
For count data, a zero-inflated model can work well with an excess of zeroes and the generalized Poisson model can tackle over- or under-dispersion, but most models cannot simultaneously deal with both zero-inflated or zero-deflated data and over- or under-dispersion. Ear disease counts, which are important in healthcare, are an example of this kind of data. This paper introduces a generalized Poisson hurdle model that works with count data having both too many/few zeroes and a sample variance not equal to the mean. To estimate the parameters, we use the generalized method of moments, and the asymptotic normality and efficiency of these estimators are established. The model is applied to ear disease data obtained from the New South Wales Health Research Council in 1990, where it performs better than both the generalized Poisson model and the hurdle model.
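For the plain Poisson hurdle special case, the likelihood factorizes into a Bernoulli part for zero versus positive and a zero-truncated Poisson part for the positive counts, so the two pieces can be estimated separately. A minimal maximum-likelihood sketch (not the paper's generalized-method-of-moments estimator for the generalized Poisson hurdle):

```python
import math

def fit_poisson_hurdle(counts):
    # Bernoulli part: P(Y = 0) is estimated by the observed fraction of zeros.
    n = len(counts)
    zeros = sum(1 for c in counts if c == 0)
    pi_hat = zeros / n
    # Zero-truncated Poisson part: the MLE of lambda solves
    #   mean(positives) = lambda / (1 - exp(-lambda)),
    # found here by bisection (LHS of the root always lies in (0, mean)).
    pos = [c for c in counts if c > 0]
    m = sum(pos) / len(pos)
    lo, hi = 1e-8, m
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid / (1.0 - math.exp(-mid)) < m:
            lo = mid
        else:
            hi = mid
    return pi_hat, 0.5 * (lo + hi)

pi_hat, lam_hat = fit_poisson_hurdle([0, 0, 0, 1, 2, 2, 3, 1, 0, 4])
print(pi_hat, lam_hat)
```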
ABSTRACT
BACKGROUND: In malaria-endemic settings, a small proportion of children suffer repeated malaria infections, contributing to most of the malaria cases, yet the underlying factors are not fully understood. This study aimed to determine whether undernutrition predicts this over-dispersion of malaria infections in children aged 6-18 months in settings of high malaria and undernutrition prevalence. METHODS: Prospective cohort study, conducted in Mangochi, Malawi. Six-month-old infants were enrolled and had length-for-age z-scores (LAZ), weight-for-age z-scores (WAZ), and weight-for-length z-scores (WLZ) assessed. Data were collected for 'presumed', clinical, and rapid diagnostic test (RDT)-confirmed malaria until 18 months. Malaria microscopy was done at 6 and 18 months. Negative binomial regression was used for malaria incidence and modified Poisson regression for malaria prevalence. RESULTS: Of the 2723 children enrolled, 2561 (94%) had anthropometry and malaria data. The mean (standard deviation [SD]) of LAZ, WAZ, and WLZ at 6 months were - 1.4 (1.1), - 0.7 (1.2), and 0.3 (1.1), respectively. The mean (SD) incidences of 'presumed', clinical, and RDT-confirmed malaria from 6 to 18 months were 1.1 (1.6), 0.4 (0.8), and 1.3 (2.0) episodes/year, respectively. Prevalence of malaria parasitaemia was 4.8% at 6 months and 9.6% at 18 months. Higher WLZ at 6 months was associated with lower prevalence of malaria parasitaemia at 18 months (prevalence ratio [PR] = 0.80, 95% confidence interval [CI] 0.67 to 0.94, p = 0.007), but not with incidences of 'presumed' malaria (incidence rate ratio [IRR] = 0.97, 95% CI 0.92 to 1.02, p = 0.190), clinical malaria (IRR = 1.03, 95% CI 0.94 to 1.12, p = 0.571), or RDT-confirmed malaria (IRR = 1.00, 95% CI 0.94 to 1.06, p = 0.950). LAZ and WAZ at 6 months were not associated with malaria outcomes. Household assets, maternal education, and food insecurity were significantly associated with malaria.
There were significant variations in hospital-diagnosed malaria by study site. CONCLUSION: In children aged 6-18 months living in malaria-endemic settings, LAZ, WAZ, and WLZ do not predict malaria incidence. However, WLZ may be associated with prevalence of malaria. Socio-economic and micro-geographic factors may explain the variations in malaria, but these require further study. Trial registration NCT00945698. Registered July 24, 2009, https://clinicaltrials.gov/ct2/show/NCT00945698 , NCT01239693. Registered Nov 11, 2010, https://clinicaltrials.gov/ct2/show/NCT01239693.
Subjects
Anthropometry, Malaria/epidemiology, Parasitemia/epidemiology, Humans, Infant, Malaria/parasitology, Malawi/epidemiology, Parasitemia/parasitology, Prevalence, Prospective Studies
ABSTRACT
RNA-seq has become an increasingly popular high-throughput platform for identifying differentially expressed (DE) genes, and it is much more reproducible and accurate than the earlier microarray technology. Yet a number of statistical issues remain to be resolved in data analysis, largely due to the high-throughput data volume and over-dispersion of read counts. These problems become more challenging for biologists who use RNA-seq to measure genome-wide expression profiles in different combinations of sampling resources (species or genotypes) or treatments. In this paper, the author first reviews the statistical methods available for detecting DE genes, which have implemented negative binomial (NB) models and/or quasi-likelihood (QL) approaches to account for the over-dispersion problem in RNA-seq samples. The author then studies how to carry out the DE test in the context of phylogeny, i.e., when RNA-seq samples come from a range of species serving as phylogenetic replicates. The author proposes a computational framework to solve this phylo-DE problem: while an NB model is used to account for data over-dispersion within biological replicates, over-dispersion among phylogenetic replicates is taken into account by QL, together with special treatments for phylogenetic bias. This work helps to design cost-effective RNA-seq experiments in the fields of biodiversity or phenotypic plasticity that may involve hundreds of species under a phylogenetic framework.
Subjects
Algorithms, Statistical Data Interpretation, Gene Expression Profiling/methods, High-Throughput Nucleotide Sequencing/methods, RNA Sequence Analysis/methods, Software, Phylogeny, Reproducibility of Results, Sensitivity and Specificity
ABSTRACT
The community composition of any group of organisms should theoretically be determined by a combination of assembly processes including resource partitioning, competition, environmental filtering, and phylogenetic legacy. Environmental DNA studies have revealed a huge diversity of protists in all environments, raising questions about the ecological significance of such diversity and the degree to which protists obey the same rules as macroscopic organisms. The fast-growing cultivable protist species on which hypotheses are usually experimentally tested represent only a minority of protist diversity. Addressing these questions for the lesser-known majority can only be done through observational studies. We conducted an environmental DNA survey of the genus Nebela, a group of closely related testate (shelled) amoeba species, in different habitats within Sphagnum-dominated peatlands. Identification based on the mitochondrial cytochrome c oxidase 1 gene allowed species-level resolution as well as phylogenetic reconstruction. Community composition varied strongly across habitats and associated environmental gradients. Species showed little overlap in their realized niches, suggesting resource partitioning and a strong influence of environmental filtering driving community composition. Furthermore, phylogenetic clustering was observed in the most nitrogen-poor samples, supporting phylogenetic inheritance of adaptations in the group of N. guttata. This study showed that the studied free-living unicellular eukaryotes follow community assembly rules similar to those known to determine plant and animal communities; the same may be true for much of the huge functional and taxonomic diversity of protists.
Subjects
Ecosystem, Sphagnopsida, Animals, Ecology, Phylogeny, Plants
ABSTRACT
Understanding the factors that alter the composition of the human microbiota may inform personalized healthcare strategies and therapeutic drug targets. In many sequencing studies, microbial communities are characterized by a list of taxa, their counts, and their evolutionary relationships represented by a phylogenetic tree. In this article, we consider an extension of the Dirichlet multinomial distribution, called the Dirichlet-tree multinomial distribution, for multivariate, over-dispersed, and tree-structured count data. To address the relationships between these counts and a set of covariates, we propose the Dirichlet-tree multinomial regression model, for which we develop a penalized likelihood method for estimating parameters and selecting covariates. For efficient optimization, we adopt the accelerated proximal gradient approach. Simulation studies are presented to demonstrate the good performance of the proposed procedure. An analysis of a data set relating dietary nutrients to bacterial counts is used to show that incorporating the tree structure into the model helps increase prediction power.
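The over-dispersion that motivates Dirichlet(-tree) multinomial models is easy to see by simulation: drawing proportions from a Dirichlet before the multinomial inflates the marginal count variance relative to a plain multinomial with the same mean proportions. A sketch with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)

def r_dirichlet_multinomial(size, n, alpha, rng):
    # Two-stage draw: taxa proportions from a Dirichlet, then counts from a
    # multinomial given those proportions. The extra randomness in the
    # proportions makes each marginal count over-dispersed.
    p = rng.dirichlet(alpha, size=size)
    return np.array([rng.multinomial(n, pi) for pi in p])

alpha = np.array([2.0, 3.0, 5.0])
dm = r_dirichlet_multinomial(20_000, n=50, alpha=alpha, rng=rng)
mn = rng.multinomial(50, alpha / alpha.sum(), size=20_000)
print(dm[:, 0].var(), mn[:, 0].var())  # DM variance clearly exceeds the multinomial's
```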
Subjects
Gastrointestinal Microbiome, Diet, Humans, Likelihood Functions, Phylogeny, Regression Analysis, Statistical Distributions
ABSTRACT
In this paper, we develop an estimation procedure for the parameters of a zero-inflated over-dispersed/under-dispersed count model in the presence of missing responses. In particular, we deal with a zero-inflated extended negative binomial model with missing responses. A weighted expectation-maximization algorithm is used for the maximum likelihood estimation of the parameters involved. Simulations are conducted to study the properties of the estimators. The procedure is shown to be robust when count data follow other over-dispersed models, such as the log-normal mixture of the Poisson distribution, or even a zero-inflated Poisson model. An illustrative example and a discussion leading to some conclusions are given. Copyright © 2016 John Wiley & Sons, Ltd.
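For the simpler zero-inflated Poisson special case (without missing responses or weighting), the EM idea treats the indicator of a structural zero as the latent variable. A minimal sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(7)

def zip_em(y, n_iter=300):
    # EM for the zero-inflated Poisson: latent z_i = 1 when observation i is a
    # structural zero (from the degenerate component).
    y = np.asarray(y, dtype=float)
    pi, lam = 0.5, max(y.mean(), 0.1)
    for _ in range(n_iter):
        # E-step: posterior probability that each observed zero is structural.
        denom = pi + (1.0 - pi) * np.exp(-lam)
        z = np.where(y == 0, pi / denom, 0.0)
        # M-step: update the mixing weight and the Poisson mean.
        pi = z.mean()
        lam = y.sum() / (1.0 - z).sum()
    return pi, lam

# Simulated ZIP data: structural zero with probability 0.3, else Poisson(2).
n = 20_000
structural = rng.random(n) < 0.3
y = np.where(structural, 0, rng.poisson(2.0, size=n))
pi_hat, lam_hat = zip_em(y)
print(pi_hat, lam_hat)  # close to the true (0.3, 2.0)
```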
Subjects
Likelihood Functions, Poisson Distribution, Algorithms, Humans, Statistical Models
ABSTRACT
Evaluating the impacts of clinical or policy interventions on health care utilization requires addressing methodological challenges for causal inference while also analyzing highly skewed data. We examine the impact of registering with a Family Medicine Group, an integrated primary care model in Quebec, on hospitalization and emergency department visits using propensity scores to adjust for baseline characteristics and marginal structural models to account for time-varying exposures. We also evaluate the performance of different marginal structural generalized linear models in the presence of highly skewed data and conduct a simulation study to determine the robustness of alternative generalized linear models to distributional model mis-specification. Although the simulations found that the zero-inflated Poisson likelihood performed the best overall, the negative binomial likelihood gave the best fit for both outcomes in the real dataset. Our results suggest that registration to a Family Medicine Group for all 3 years caused a small reduction in the number of emergency room visits and no significant change in the number of hospitalizations in the final year.
Subjects
Likelihood Functions, Linear Models, Primary Health Care/statistics & numerical data, Propensity Score, Aged, Aged 80 and Over, Computer Simulation, Hospital Emergency Service, Hospitalization, Humans, Male, Quebec
ABSTRACT
Problems of finding confidence intervals (CIs) and prediction intervals (PIs) for two-parameter negative binomial distributions are considered. Simple CIs for the mean of a two-parameter negative binomial distribution based on some large-sample methods are proposed and compared with the likelihood CIs. The proposed CIs are not only simple to compute, but also better than the likelihood CIs for moderate sample sizes. Prediction intervals for the mean of a future sample from a two-parameter negative binomial distribution are also proposed and evaluated for their accuracy. The methods are illustrated using two examples with real-life data sets.
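One simple large-sample interval of this general kind is a Wald-type CI that uses the sample variance in place of the Poisson mean-equals-variance assumption; the exact variants studied in the paper may differ:

```python
import math

def nb_mean_wald_ci(counts, z=1.96):
    # Wald-type large-sample CI for the mean of over-dispersed counts:
    # mean +/- z * sqrt(s^2 / n), where s^2 is the sample variance
    # (appropriate when variance exceeds the mean, as for the negative binomial).
    n = len(counts)
    m = sum(counts) / n
    s2 = sum((c - m) ** 2 for c in counts) / (n - 1)
    half = z * math.sqrt(s2 / n)
    return m - half, m + half

counts = [0, 2, 5, 1, 3, 0, 8, 2, 4, 1, 0, 6]  # made-up counts for illustration
lo, hi = nb_mean_wald_ci(counts)
print(lo, hi)
```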
ABSTRACT
Changes in the human microbiome are associated with many human diseases. Next-generation sequencing technologies make it possible to quantify the microbial composition without the need for laboratory cultivation. One important problem of microbiome data analysis is to identify the environmental/biological covariates that are associated with different bacterial taxa. Taxa count data in microbiome studies are often over-dispersed and include many zeros. To account for such over-dispersion, we propose to use an additive logistic normal multinomial regression model to associate the covariates to bacterial composition. The model can naturally account for sampling variability and zero observations and also allows for a flexible covariance structure among the bacterial taxa. In order to select the relevant covariates and to estimate the corresponding regression coefficients, we propose a group ℓ1 penalized likelihood estimation method for variable selection and estimation. We develop a Monte Carlo expectation-maximization algorithm to implement the penalized likelihood estimation. Our simulation results show that the proposed method outperforms the group ℓ1 penalized multinomial logistic regression and the Dirichlet multinomial regression models in variable selection. We demonstrate the methods using a data set that links the human gut microbiome to micro-nutrients in order to identify the nutrients that are associated with the human gut microbiome enterotype.
Assuntos
Bactérias/genética , Bactérias/isolamento & purificação , Interpretação Estatística de Dados , Intestinos/microbiologia , Modelos Logísticos , Microbiota/genética , Análise de Regressão , Bactérias/classificação , Simulação por Computador , HumanosRESUMO
Inflated data and over-dispersion are two common problems when modeling count data with traditional Poisson regression models. In this study, we propose a latent class inflated Poisson (LCIP) regression model to address the unobserved heterogeneity that leads to inflation and over-dispersion. The performance of the model estimation is evaluated through simulation studies. We illustrate the usefulness of introducing a latent class variable by analyzing data from the Behavioral Risk Factor Surveillance System (BRFSS), which contain several inflated values and are characterized by over-dispersion. The new model displays a better fit for the inflated counts than the standard Poisson regression and zero-inflated Poisson regression models.
ABSTRACT
Transmission of Ross River virus (RRV) is influenced by climatic, environmental, and socio-economic factors. Accurate and robust predictions based on these factors are necessary for disease prevention and control. However, the complicated transmission cycle and the characteristics of RRV notification data present challenges. Studies to compare model performance are lacking. In this study, we used RRV notification data and exposure data from 2001 to 2020 in Queensland, Australia, and compared ten models (including generalised linear models, zero-inflated models, and generalised additive models) to predict RRV incidence in different regions of Queensland. We aimed to compare model performance and to evaluate the effect of statistical over-dispersion and zero-inflation of RRV surveillance data, and non-linearity of predictors on model fit. A variable selection strategy for screening important predictors was developed and was found to be efficient and able to generate consistent and reasonable numbers of predictors across regions and in all training sets. Negative binomial models generally exhibited better model fit than Poisson models, suggesting that over-dispersion in the data is the primary factor driving model fit compared to non-linearity of predictors and excess zeros. All models predicted the peak periods well but were unable to fit and predict the magnitude of peaks, especially when there were high numbers of cases. Adding new variables including historical RRV cases and mosquito abundance may improve model performance. The standard negative binomial generalised linear model is stable, simple, and effective in prediction, and is thus considered the best choice among all models.
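A quick diagnostic behind the Poisson-versus-negative-binomial choice described above is the index of dispersion (sample variance over sample mean), which is about 1 under a Poisson model; (n-1) times the index is approximately chi-square with n-1 degrees of freedom. A sketch with made-up counts:

```python
def dispersion_check(counts):
    # Index of dispersion: s^2 / mean. Under a Poisson model this is near 1;
    # values well above 1 indicate over-dispersion, favoring a negative
    # binomial over a Poisson model. (n - 1) * index is approximately
    # chi-square with n - 1 degrees of freedom under the Poisson null.
    n = len(counts)
    m = sum(counts) / n
    s2 = sum((c - m) ** 2 for c in counts) / (n - 1)
    index = s2 / m
    stat = (n - 1) * index
    return index, stat

counts = [0, 0, 1, 0, 12, 3, 0, 7, 0, 2]  # made-up, clearly over-dispersed counts
index, stat = dispersion_check(counts)
print(index, stat)
```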
Subjects
Alphavirus Infections, Ross River virus, Animals, Humans, Queensland/epidemiology, Incidence, Alphavirus Infections/epidemiology, Mosquito Vectors, Australia/epidemiology
ABSTRACT
Predicting pedestrian crashes on urban roads is one of the most important issues in urban traffic safety. Due to unaccounted-for spatial correlation and instability in the crash data, the statistical reliability of the Empirical Bayesian method in combining observed and predicted crash frequencies is questionable. In this study, an EB model has been developed to estimate the expected frequency of pedestrian crashes in urban areas using the over-dispersion parameter while taking into account the spatial correlation of crash data. The objective is to estimate the expected geographical frequency of pedestrian crashes with the Empirical Bayesian (EB) approach using geographically weighted regression models for pedestrian crashes in Tehran. Four models were used: geographically weighted Poisson regression (GWPR), geographically weighted zero-inflated Poisson regression (GWZIPR), geographically weighted negative binomial regression (GWNBR), and geographically weighted zero-inflated negative binomial regression (GWZINBR). The areas analyzed for the development of the EB model based on pedestrian exposure variables are traffic analysis zones (TAZs). Finally, the EB model was extended to a Geographic Empirical Bayesian (Ge-EB) model. The results showed that the GWZIPR and GWZINBR models make more accurate predictions: they had the lowest Akaike Information Criterion (AIC), cross-validation, and Root Mean Square Error (RMSE) values, and the Moran and Variance Inflation Factor (VIF) indices were within acceptable limits. The weighted negative binomial distribution could moderate the heterogeneity of the crash data to some extent. The study characterizes the dispersion and density of pedestrian crashes without requiring pedestrian volume data, and its results can support safety measures in locations prone to pedestrian crashes.
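The classic EB combination for crash counts with a negative binomial safety performance function shrinks the observed count toward the model prediction with a weight driven by the over-dispersion parameter. A minimal sketch of this standard (Hauer-type) form, which may differ in detail from the Ge-EB model developed here:

```python
def empirical_bayes_estimate(observed, predicted, k):
    # Hauer-type EB combination: with a negative binomial safety performance
    # function (variance = mu + mu^2 / k), the weight on the model prediction is
    #   w = 1 / (1 + predicted / k),
    # and the EB estimate is a convex combination of prediction and observation.
    w = 1.0 / (1.0 + predicted / k)
    return w * predicted + (1.0 - w) * observed

# Illustrative numbers: 12 observed crashes, 6 predicted by the model, k = 2.
eb = empirical_bayes_estimate(observed=12, predicted=6, k=2.0)
print(eb)  # w = 0.25, so eb = 0.25 * 6 + 0.75 * 12 = 10.5
```

Larger k (less over-dispersion) pulls the estimate toward the observed count's complement of the model, i.e., it increases trust in the prediction.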