Results 1 - 20 of 45
Chaos Solitons Fractals ; 166: 112914, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36440087


The prevalence of COVID-19 has been the most serious health challenge of the 21st century to date, concerning national health systems on a daily basis since December 2019, when it appeared in Wuhan City. Nevertheless, most of the proposed mathematical methodologies aiming to describe the dynamics of an epidemic rely on deterministic models that are not able to reflect the true nature of its spread. In this paper, we propose a SEIHCRDV model - an extension/improvement of the classic SIR compartmental model - which also takes into consideration the populations of exposed, hospitalized, admitted in intensive care units (ICU), deceased and vaccinated cases, in combination with an unscented Kalman filter (UKF), providing a dynamic estimation of the time-dependent system parameters. The stochastic approach is considered necessary, as both observations and system equations are characterized by uncertainties. Apparently, this new consideration is useful for examining various pandemics more effectively. The reliability of the model is examined on the daily recordings of COVID-19 in France, over a long period of 265 days. Two major waves of infection are observed, starting in January 2021, which signified the start of vaccinations in Europe, and the model provides quite encouraging predictive performance, based on the produced NRMSE values. Special emphasis is placed on proving the non-negativity of the SEIHCRDV model, achieving a representative basic reproductive number R0 and demonstrating the existence and stability of disease equilibria according to the formula produced to estimate R0. The model outperforms in predictive ability not only deterministic approaches but also state-of-the-art stochastic models that employ Kalman filters. Furthermore, the relevant analysis supports the importance of vaccination, as even a small increase in the daily vaccination rate could lead to a notable reduction in mortality and hospitalizations.

Can J Stat ; 51(3): 824-851, 2023 Sep.
Article in English | MEDLINE | ID: mdl-38974813


Multiple oscillating time series are typically analyzed in the frequency domain, where coherence is usually said to represent the magnitude of the correlation between two signals at a particular frequency. The correlation being referenced is complex-valued and is similar to the real-valued Pearson correlation in some ways but not others. We discuss the dependence among oscillating series in the context of the multivariate complex normal distribution, which plays a role for vectors of complex random variables analogous to the usual multivariate normal distribution for vectors of real-valued random variables. We emphasize special cases that are valuable for the neural data we are interested in and provide new variations on existing results. We then introduce a complex latent variable model for narrowly band-pass-filtered signals at some frequency, and show that the resulting maximum likelihood estimate produces a latent coherence that is equivalent to the magnitude of the complex canonical correlation at the given frequency. We also derive an equivalence between partial coherence and the magnitude of complex partial correlation, at a given frequency. Our theoretical framework leads to interpretable results for an interesting multivariate dataset from the Allen Institute for Brain Science.

Multiple oscillating time series are generally studied in the frequency domain, where coherence is often regarded as the magnitude of the correlation between two signals at a specific frequency. This correlation is complex-valued and shares similarities with the real-valued Pearson correlation while also exhibiting distinct differences. In this study, the authors explore the dependence among oscillating series using the multivariate complex normal distribution. This distribution is the counterpart of the classical multivariate normal distribution, adapted to vectors of complex random variables rather than vectors of real random variables. The authors emphasize specific cases that are of particular importance for the neural data they are interested in, while proposing new approaches and variations on existing results. They introduce a complex latent variable model for narrowly band-pass-filtered signals at a given frequency. They then show that maximum likelihood estimation in this model produces a latent coherence equivalent to the magnitude of the complex canonical correlation at the specified frequency. They also establish an equivalence between partial coherence and the magnitude of the complex partial correlation, again at a given frequency. Their theoretical approach leads to interpretable results for an interesting multivariate dataset from the Allen Institute for Brain Science.

Socioecon Plann Sci ; 87: 101549, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37255583


To address one of the most challenging problems in hospital management - patients' absenteeism without prior notice - this study analyses the risk factors associated with this event. To this end, using real data from a hospital located in the North of Portugal, a prediction model previously validated in the literature is used to infer absenteeism risk factors, and an explainable model is proposed, based on a modified CART algorithm. The latter is intended to generate a human-interpretable explanation for patient absenteeism, and its implementation is described in detail. Furthermore, given the significant impact the COVID-19 pandemic had on hospital management, a comparison between patients' profiles upon absenteeism before and during the COVID-19 pandemic situation is performed. The results obtained differ between hospital specialities and time periods, meaning that patient profiles on absenteeism change during pandemic periods and within specialities.

J Math Biol ; 85(4): 43, 2022 Sep 28.
Article in English | MEDLINE | ID: mdl-36169721


We present a unifying, tractable approach for studying the spread of viruses causing complex diseases that must be modeled using a large number of types (e.g., infective stage, clinical state, risk factor class). We show that recording each infected individual's infection age, i.e., the time elapsed since infection, has three benefits. First, regardless of the number of types, the age distribution of the population can be described by means of a first-order, one-dimensional partial differential equation (PDE) known as the McKendrick-von Foerster equation. The frequency of type i is simply obtained by integrating the probability of being in state i at a given age against the age distribution. This representation induces a simple methodology based on the additional assumption of Poisson sampling to infer and forecast the epidemic. We illustrate this technique using French data from the COVID-19 epidemic. Second, our approach generalizes and simplifies standard compartmental models using high-dimensional systems of ordinary differential equations (ODEs) to account for disease complexity. We show that such models can always be rewritten in our framework, thus providing a low-dimensional yet equivalent representation of these complex models. Third, beyond the simplicity of the approach, we show that our population model naturally appears as a universal scaling limit of a large class of fully stochastic individual-based epidemic models, where the initial condition of the PDE emerges as the limiting age structure of an exponentially growing population starting from a single individual.
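
For orientation, the age-structured description in the abstract above can be written in one common textbook form. The notation here is an assumption and is not taken from the paper: n(t,a) is the density of infected individuals with infection age a at time t, \mu(a) is a generic removal rate (possibly zero), and p_i(a) is the probability of being in state i at infection age a. The boundary condition for new infections is model-specific and is omitted.

```latex
% Assumed notation, not the paper's: transport in infection age
\partial_t n(t,a) + \partial_a n(t,a) = -\mu(a)\, n(t,a),
\qquad
% frequency of type i: integrate p_i against the age distribution
I_i(t) = \int_0^\infty p_i(a)\, n(t,a)\, \mathrm{d}a .
```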

COVID-19 , Epidemics , COVID-19/epidemiology , Forecasting , Humans , Biological Models , Probability
Article in English | MEDLINE | ID: mdl-35125572


Pathway analysis, i.e., grouping analysis, has important applications in genomic studies. Existing pathway analysis approaches are mostly focused on a single response and are not suitable for analyzing complex diseases that are often related to multiple response variables. Although a handful of approaches have been developed for multiple responses, these methods are mainly designed for pathways with a moderate number of features. A multi-response pathway analysis approach that is able to conduct statistical inference when the dimension is potentially higher than the sample size is introduced. Asymptotic properties of the test statistic are established and a theoretical investigation of the statistical power is conducted. Simulation studies and real data analysis show that the proposed approach performs well in identifying important pathways that influence multiple expression quantitative trait loci (eQTL).

Can J Stat ; 46(3): 416-428, 2018 Sep.
Article in English | MEDLINE | ID: mdl-32999527


Recurrent event data occur in many areas such as medical studies and social sciences and a great deal of literature has been established for their analysis. On the other hand, only limited research exists on the variable selection for recurrent event data, and the existing methods can be seen as direct generalizations of the available penalized procedures for linear models and may not perform as well as expected. This article discusses simultaneous parameter estimation and variable selection and presents a new method with a new penalty function, which will be referred to as the broken adaptive ridge regression approach. In addition to the establishment of the oracle property, we also show that the proposed method has the clustering or grouping effect when covariates are highly correlated. Furthermore, a numerical study is performed and indicates that the method works well for practical situations and can outperform existing methods. An application is provided.

A rich literature deals with the analysis of recurrent events, a type of data observed notably in medical studies and in social science research projects. By contrast, few research results concern variable selection for these models. Existing methods can be seen as direct generalizations of penalized procedures available for linear models and may perform below expectations. The authors propose the broken adaptive ridge regression approach, in which they simultaneously carry out parameter estimation and variable selection using a new penalty function. They prove the oracle property of their method and show that it has a grouping property when covariates are highly correlated. They present a numerical study indicating that their method works well in practical situations and can even outperform existing approaches. They also provide an application example.

J Appl Probab ; 54(2): 569-587, 2017 Jun.
Article in English | MEDLINE | ID: mdl-31156271


We consider a class of Sevastyanov branching processes with non-homogeneous Poisson immigration. These processes relax the assumption of the Bellman-Harris process that the lifespan and offspring of each individual are independent. They find applications in studies of the dynamics of cell populations. In this paper, we focus on the subcritical case and examine asymptotic properties of the process. We establish limit theorems, which generalize classical results due to Sevastyanov and others. Our key findings include a novel law of large numbers (LLN) and central limit theorem (CLT), which emerge from the non-homogeneity of the immigration process.

J Stat Comput Simul ; 87(8): 1541-1558, 2017 May 24.
Article in English | MEDLINE | ID: mdl-28515536


The linear mixed model with an added integrated Ornstein-Uhlenbeck (IOU) process (linear mixed IOU model) allows for serial correlation and estimation of the degree of derivative tracking. It is rarely used, partly due to the lack of available software. We implemented the linear mixed IOU model in Stata and using simulations we assessed the feasibility of fitting the model by restricted maximum likelihood when applied to balanced and unbalanced data. We compared different (1) optimization algorithms, (2) parameterizations of the IOU process, (3) data structures and (4) random-effects structures. Fitting the model was practical and feasible when applied to large and moderately sized balanced datasets (20,000 and 500 observations), and large unbalanced datasets with (non-informative) dropout and intermittent missingness. Analysis of a real dataset showed that the linear mixed IOU model was a better fit to the data than the standard linear mixed model (i.e. independent within-subject errors with constant variance).

J Appl Stat ; 51(6): 1023-1040, 2024.
Article in English | MEDLINE | ID: mdl-38628451


Beta distributions are commonly used to model proportion-valued response variables, often encountered in longitudinal studies. In this article, we develop semi-parametric Beta regression models for proportion-valued responses, where the aggregate covariate effect is summarized and flexibly modeled using an interpretable monotone time-varying single index transform of a linear combination of the potential covariates. We utilize the potential of single index models, which are effective dimension reduction tools and accommodate link function misspecification in generalized linear mixed models. Our Bayesian methodology incorporates the missing-at-random feature of the proportion response and utilizes Hamiltonian Monte Carlo sampling to conduct inference. We explore finite-sample frequentist properties of our estimates and assess the robustness via detailed simulation studies. Finally, we illustrate our methodology via application to a motivating longitudinal dataset on obesity research recording proportion body fat.

J Appl Stat ; 51(5): 1007-1022, 2024.
Article in English | MEDLINE | ID: mdl-38524792


Several statistical models have been proposed in recent years, among them semiparametric regression. In medicine, there are several situations in which it is impracticable to consider a linear regression for statistical modeling, especially when the data contain explanatory variables that present a nonlinear relationship with the response variable. Another common situation is when the response variable does not have a unimodal shape, and it is not possible to adopt distributions belonging to the symmetric or asymmetric classes. In this context, a semiparametric heteroskedastic regression is proposed based on an extension of the normal distribution. Then, we show the usefulness of this model to analyze the cost of prostate cancer surgery. The predictor variables refer to two groups of patients such that one group receives a multimodal local anesthetic solution (Preemptive Target Anesthetic Solution) and the second group is treated with neuraxial blockade (spinal anesthesia/traditional standard). Other relevant predictor variables are also evaluated, thus allowing for the in-depth interpretation of the predictor variables with a nonlinear effect on the dependent variable cost. The penalized maximum likelihood method is adopted to estimate the model parameters. The new regression is a useful statistical tool for analyzing medical data.

J Appl Stat ; 51(5): 845-865, 2024.
Article in English | MEDLINE | ID: mdl-38524794


Statistical learning of the structures of cellular networks, such as protein signaling pathways, is a topical research field in computational systems biology. To get the most information out of experimental data, it is often required to develop a tailored statistical approach rather than applying one of the off-the-shelf network reconstruction methods. The focus of this paper is on learning the structure of the mTOR protein signaling pathway from immunoblotting protein phosphorylation data. Under two experimental conditions eleven phosphorylation sites of eight key proteins of the mTOR pathway were measured at ten non-equidistant time points. For the statistical analysis we propose a new advanced hierarchically coupled non-homogeneous dynamic Bayesian network (NH-DBN) model, and we consider various data imputation methods for dealing with non-equidistant temporal observations. Because of the absence of a true gold standard network, we propose to use predictive probabilities in combination with a leave-one-out cross validation strategy to objectively cross-compare the accuracies of different NH-DBN models and data imputation methods. Finally, we employ the best combination of model and data imputation method for predicting the structure of the mTOR protein signaling pathway.

J Appl Stat ; 51(7): 1227-1250, 2024.
Article in English | MEDLINE | ID: mdl-38835822


The main concern of this paper is providing a flexible discrete model that captures every kind of dispersion (equi-, over- and under-dispersion). Based on the balanced discretization method, a new discrete version of the Burr-Hatke distribution is introduced with the partial moment-preserving property. Some statistical properties of the new distribution are introduced, and the applicability of the proposed model is evaluated by considering counting series. A new integer-valued autoregressive (INAR) process based on the mixing Pegram and binomial thinning operators with discrete Burr-Hatke innovations is introduced, which can model contagious data properly. The different estimation approaches for the parameters of the new process are provided and compared through a Monte Carlo simulation scheme. The performance of the proposed process is evaluated on four data sets of the daily death counts of COVID-19 in Austria, Switzerland, Nigeria and Slovenia in comparison with some competitor INAR(1) models, along with the Pearson residual analysis of the assessing model. The goodness-of-fit measures affirm the adequacy of the proposed process in modeling all COVID-19 data sets. The fundamental prediction procedures are considered for the new process by classic, modified Sieve bootstrap and Bayesian forecasting methods for all COVID-19 data sets, from which it is concluded that the Bayesian forecasting approach provides more reliable results.
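
As a rough illustration of the binomial thinning operator mentioned above, the sketch below simulates a plain INAR(1) series, X_t = alpha ∘ X_{t-1} + e_t. It is not the paper's process: the Pegram mixing is omitted, and the innovation sampler is a hypothetical placeholder (uniform on {0, 1, 2, 3}) rather than the discrete Burr-Hatke distribution. The function names are illustrative.

```python
import random


def binomial_thinning(x, alpha, rng):
    """alpha ∘ x: each of the x counts survives independently with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def simulate_inar1(n, alpha, innovation, x0=0, seed=0):
    """Simulate n steps of X_t = alpha ∘ X_{t-1} + e_t, where e_t = innovation(rng)."""
    rng = random.Random(seed)
    xs, x = [], x0
    for _ in range(n):
        x = binomial_thinning(x, alpha, rng) + innovation(rng)
        xs.append(x)
    return xs


# Placeholder innovation (an assumption, NOT Burr-Hatke): uniform on {0, 1, 2, 3}.
series = simulate_inar1(200, 0.5, lambda rng: rng.randint(0, 3))
```

Passing the innovation sampler as a callable keeps the thinning recursion separate from the choice of innovation distribution, so a different count distribution could be substituted without touching the loop.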

J Appl Stat ; 51(4): 793-807, 2024.
Article in English | MEDLINE | ID: mdl-38482195


Current methods for clustering adult obesity prevalence by state focus on creating a single map of obesity prevalence for a given year in the United States. Comparing these maps for different years may limit our understanding of the progression of state and regional obesity prevalence over time for the purpose of developing targeted regional health policies. In this application note, we adopt the non-parametric Dynamic Time Warping method for clustering longitudinal time series of obesity prevalence by state. This method captures the lead and lag relationship between the time series as part of the temporal alignment, allowing us to produce a single map that captures the regional and temporal clusters of obesity prevalence from 1990 to 2019 in the United States. We identify six regions of obesity prevalence in the United States and forecast future estimates of obesity prevalence based on ARIMA models.
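
To make the alignment idea concrete, here is a minimal sketch of the classic dynamic-programming Dynamic Time Warping distance with an absolute-difference local cost. The two series are made-up values, not obesity data, and the function name is illustrative; the clustering step that would consume these pairwise distances is not shown.

```python
from math import inf


def dtw_distance(x, y):
    """Classic DTW distance between two numeric sequences (absolute-difference cost)."""
    n, m = len(x), len(y)
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of: step in x only, step in y only, step in both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical series: b is a copy of a delayed by one time step.
a = [10.0, 12.0, 15.0, 19.0, 24.0]
b = [10.0, 10.0, 12.0, 15.0, 19.0, 24.0]
lagged_distance = dtw_distance(a, b)
```

Because the warping path may repeat a point, a series and its lagged copy align at zero cost, which is the lead and lag behaviour the abstract refers to.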

J Appl Stat ; 51(9): 1756-1771, 2024.
Article in English | MEDLINE | ID: mdl-38933137


In many biomedical applications, we are more interested in the predicted probability that a numerical outcome is above a threshold than in the predicted value of the outcome. For example, it might be known that antibody levels above a certain threshold provide immunity against a disease, or a threshold for a disease severity score might reflect conversion from the presymptomatic to the symptomatic disease stage. Accordingly, biomedical researchers often convert numerical to binary outcomes (loss of information) to conduct logistic regression (probabilistic interpretation). We address this bad statistical practice by modelling the binary outcome with logistic regression, modelling the numerical outcome with linear regression, transforming the predicted values from linear regression to predicted probabilities, and combining the predicted probabilities from logistic and linear regression. Analysing high-dimensional simulated and experimental data, namely clinical data for predicting cognitive impairment, we obtain significantly improved predictions of dichotomised outcomes. Thus, the proposed approach effectively combines binary with numerical outcomes to improve binary classification in high-dimensional settings. An implementation is available in the R package cornet on GitHub and CRAN.
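
The transform-and-combine step described above might look roughly like the following sketch. The specific choices are assumptions, not confirmed details of the cornet package: the linear prediction is turned into a probability with a Gaussian CDF using a scale parameter sigma, and the two probabilities are mixed with a single weight. All function and parameter names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prob_above(y_hat, cutoff, sigma):
    """P(Y > cutoff) under the assumption Y ~ Normal(y_hat, sigma^2)."""
    return 1.0 - normal_cdf((cutoff - y_hat) / sigma)


def combine(p_logistic, p_linear, weight):
    """Convex combination of the two predicted probabilities (weight in [0, 1])."""
    return weight * p_logistic + (1.0 - weight) * p_linear


# Hypothetical numbers: linear model predicts 1.4, threshold is 1.0, sigma is 0.5.
p_lin = prob_above(1.4, 1.0, 0.5)
p_comb = combine(0.7, p_lin, 0.5)
```

In practice sigma and the weight would need to be tuned, for example by cross-validation; that step is left out here.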

ArXiv ; 2024 Mar 22.
Article in English | MEDLINE | ID: mdl-38562445


With a single circulating vector-borne virus, the basic reproduction number incorporates contributions from tick-to-tick (co-feeding), tick-to-host and host-to-tick transmission routes. With two different circulating vector-borne viral strains, resident and invasive, and under the assumption that co-feeding is the only transmission route in a tick population, the invasion reproduction number depends on whether the model system of ordinary differential equations possesses the property of neutrality. We show that a simple model, with two populations of ticks infected with one strain, resident or invasive, and one population of co-infected ticks, does not have Alizon's neutrality property. We present model alternatives that are capable of representing the invasion potential of a novel strain by including populations of ticks dually infected with the same strain. The invasion reproduction number is analysed with the next-generation method and via numerical simulations.

J Appl Stat ; 50(11-12): 2373-2387, 2023.
Article in English | MEDLINE | ID: mdl-37529565


In this paper, we propose a Susceptible-Infected-Removal (SIR) model with time fused coefficients. In particular, our proposed model discovers the underlying time homogeneity pattern for the SIR model's transmission rate and removal rate via Bayesian shrinkage priors. MCMC sampling for the proposed method is facilitated by the nimble package in R. Extensive simulation studies are carried out to examine the empirical performance of the proposed methods. We further apply the proposed methodology to analyze different levels of COVID-19 data in the United States.
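
For readers unfamiliar with the underlying dynamics, a minimal discrete-time SIR sketch with time-varying rates is given below. It only illustrates what piecewise-constant ("fused") transmission and removal rates mean for the compartments; it does not implement the Bayesian shrinkage priors or the nimble-based MCMC of the paper, and all numbers are hypothetical.

```python
def simulate_sir(s0, i0, r0, beta, gamma):
    """Discrete-time SIR; beta[t] and gamma[t] are the rates applied at step t."""
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    path = [(s, i, r)]
    for b, g in zip(beta, gamma):
        new_infections = b * s * i / n  # S -> I
        new_removals = g * i            # I -> R
        s = s - new_infections
        i = i + new_infections - new_removals
        r = r + new_removals
        path.append((s, i, r))
    return path


# Hypothetical rates: transmission drops after day 30, removal stays constant.
beta = [0.4] * 30 + [0.15] * 30
gamma = [0.1] * 60
path = simulate_sir(9990, 10, 0, beta, gamma)
```

Here the transmission rate takes only two distinct values over 60 days; recovering such homogeneous stretches from data is the kind of pattern the fused coefficients are meant to capture.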

J Appl Stat ; 50(1): 155-169, 2023.
Article in English | MEDLINE | ID: mdl-36530783


In many medical applications, the disease status is of interest. The disease status can be related to multiple serial measurements. Nevertheless, owing to various reasons, the binary outcome can be measured incorrectly. The estimators derived from the misspecified outcome can be biased. This paper derives the complete data likelihood function to incorporate both the multiple serial measurements and the misspecified outcome. Owing to the latent variables, the EM algorithm is used to derive the maximum-likelihood estimators. Monte Carlo simulations are conducted to compare the impact of misspecification on the estimates. A retrospective dataset on the recurrence of atrial fibrillation is used to illustrate the usage of the proposed model.

J Appl Stat ; 50(14): 2889-2913, 2023.
Article in English | MEDLINE | ID: mdl-37808611


In this paper, we present an efficient statistical method (denoted as 'Adaptive Resources Allocation CUSUM') to robustly and efficiently detect the hotspot with limited sampling resources. Our main idea is to combine the multi-arm bandit (MAB) and change-point detection methods to balance the exploration and exploitation of resource allocation for hotspot detection. Further, a Bayesian weighted update is used to update the posterior distribution of the infection rate. Then, the upper confidence bound (UCB) is used for resource allocation and planning. Finally, CUSUM monitoring statistics are used to detect the change point as well as the change location. For performance evaluation, we compare the performance of the proposed method with several benchmark methods in the literature and show that the proposed algorithm is able to achieve a lower detection delay and higher detection precision. Finally, this method is applied to hotspot detection in a real case study of county-level daily positive COVID-19 cases in Washington State (WA) and demonstrates its effectiveness with very limited distributed samples.
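
The CUSUM component named above can be sketched in its textbook one-sided form, which accumulates deviations above a reference level and raises an alarm once the sum crosses a threshold. This is only the standard statistic, not the paper's adaptive method: the bandit-based sampling, the Bayesian weighted update and the UCB allocation are not shown, and the parameter names (mu0, k, h) follow common convention rather than the paper.

```python
def cusum_alarm(xs, mu0, k, h):
    """Return the first index t at which the one-sided CUSUM statistic exceeds h.

    S_t = max(0, S_{t-1} + x_t - mu0 - k), with S_{-1} = 0.
    Returns None if no alarm is raised.
    """
    s = 0.0
    for t, x in enumerate(xs):
        s = max(0.0, s + (x - mu0 - k))
        if s > h:
            return t
    return None


# Hypothetical stream: the mean shifts from 0 to 2 at index 10.
stream = [0.0] * 10 + [2.0] * 10
alarm_index = cusum_alarm(stream, mu0=0.0, k=0.5, h=4.0)
```

The allowance k trades off false alarms against detection delay: larger values ignore small drifts but react more slowly to a genuine shift.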

J Appl Stat ; 50(3): 805-826, 2023.
Article in English | MEDLINE | ID: mdl-36819087


Multi-parametric MRI (mpMRI) is a critical tool in prostate cancer (PCa) diagnosis and management. To further advance the use of mpMRI in patient care, computer-aided diagnostic methods are under continuous development for supporting/supplanting standard radiological interpretation. While voxel-wise PCa classification models are the gold standard, few if any approaches have incorporated the inherent structure of the mpMRI data, such as spatial heterogeneity and between-voxel correlation, into PCa classification. We propose a machine learning-based method to fill this gap. Our method uses an ensemble learning approach to capture regional heterogeneity in the data, where classifiers are developed at multiple resolutions and combined using the super learner algorithm, and further accounts for between-voxel correlation through a Gaussian kernel smoother. It allows any type of classifier to be the base learner and can be extended to further classify PCa sub-categories. We introduce the algorithms for binary PCa classification, as well as for classifying the ordinal clinical significance of PCa, for which a weighted likelihood approach is implemented to improve the detection of less prevalent cancer categories. The proposed method has shown important advantages over conventional modeling and machine learning approaches in simulations and application to our motivating patient data.

Axioms ; 12(2), 2023 Feb.
Article in English | MEDLINE | ID: mdl-37284612


The generation of unprecedented amounts of data brings new challenges in data management, but also an opportunity to accelerate the identification of processes of multiple science disciplines. One of these challenges is the harmonization of high-dimensional unbalanced and heterogeneous data. In this manuscript, we propose a statistical approach to combine incomplete and partially-overlapping pieces of covariance matrices that come from independent experiments. We assume that the data are a random sample of partial covariance matrices sampled from Wishart distributions and we derive an expectation-maximization algorithm for parameter estimation. We demonstrate the properties of our method by (i) using simulation studies and (ii) using empirical datasets. In general, being able to make inferences about the covariance of variables not observed in the same experiment is a valuable tool for data analysis since covariance estimation is an important step in many statistical applications, such as multivariate analysis, principal component analysis, factor analysis, and structural equation modeling.