Results 1 - 15 of 15
1.
Biom J ; 65(8): e2100302, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37853834

ABSTRACT

Human immunodeficiency virus (HIV) dynamics have been the focus of epidemiological and biostatistical research over the past decades, with the goal of understanding the progression of acquired immunodeficiency syndrome (AIDS) in the population. Although there are several approaches for modeling HIV dynamics, one of the most popular is based on Gaussian mixed-effects models because of their simplicity of implementation and interpretation. However, in some situations, Gaussian mixed-effects models cannot (a) capture the serial correlation present in longitudinal data, (b) deal with missing observations properly, and (c) accommodate the skewness and heavy tails frequently present in patients' profiles. For those cases, mixed-effects state-space models (MESSM) become a powerful tool for modeling correlated observations, including HIV dynamics, because of the flexibility with which they model both the unobserved states and the observations. Consequently, our proposal considers an MESSM in which the observation error distribution is skew-t. This new approach is more flexible and can accommodate data sets exhibiting skewness and heavy tails. Under the Bayesian paradigm, an efficient Markov chain Monte Carlo algorithm is implemented. To evaluate the properties of the proposed models, we carried out extensive simulation studies, including scenarios with missing data in the generated data sets. Finally, we illustrate our approach with an application to the AIDS Clinical Trials Group Study 315 (ACTG-315) data set.
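
As a rough illustration of the data-generating mechanism described above (not the authors' implementation), the sketch below simulates subject-level trajectories from a simple state-space model with skew-t measurement noise, using the common construction of a skew-t variate as a skew-normal draw divided by the square root of a scaled chi-squared draw; all names and parameter values are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def rskewt(size, alpha=3.0, nu=4.0):
    """Skew-t draws via the usual scale mixture: skew-normal / sqrt(chi2/nu)."""
    z = stats.skewnorm.rvs(alpha, size=size, random_state=rng)
    w = rng.chisquare(nu, size=size) / nu
    return z / np.sqrt(w)

def simulate_subject(T=20, phi=0.9, q=0.1, b_sd=0.5):
    """Local-level state x_t with AR(1) dynamics, a subject-level random
    intercept b, and skew-t measurement noise (a stand-in for a patient's
    longitudinal viral-load profile)."""
    b = rng.normal(0.0, b_sd)
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.normal(0.0, np.sqrt(q))
    return b + x + 0.3 * rskewt(T)   # skewed, heavy-tailed observation errors

panel = np.array([simulate_subject() for _ in range(50)])
print(panel.shape)  # (50 subjects, 20 time points)
```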


Subject(s)
Acquired Immunodeficiency Syndrome , HIV Infections , Humans , Acquired Immunodeficiency Syndrome/epidemiology , HIV Infections/epidemiology , Bayes Theorem , Models, Statistical , Viral Load , HIV , Longitudinal Studies
2.
Biom J ; 64(3): 539-556, 2022 Mar.
Article in English | MEDLINE | ID: mdl-34821410

ABSTRACT

In many biomedical studies and clinical trials, more than one response variable is measured repeatedly over time on the same subject. To analyze such data, we adopt a multivariate linear mixed-effects longitudinal model. Longitudinal data, moreover, often contain covariates that have no impact on the response and should be eliminated from the model. In this paper, we consider the problem of simultaneous variable selection and estimation in a multivariate t linear mixed-effects model (MtLMM) for analyzing longitudinally measured multi-outcome data. The motivation for this work comes from a cohort study of patients with primary biliary cirrhosis, where the interest is in eliminating insignificant variables using the smoothly clipped absolute deviation (SCAD) penalty function within the MtLMM. The proposed penalized model offers robustness and the flexibility to accommodate fat tails. An expectation conditional maximization algorithm is employed to compute maximum likelihood estimates of the parameters, and standard errors are calculated by an information-based method. The methodology is illustrated by an analysis of the Mayo Clinic Primary Biliary Cirrhosis sequential (PBCseq) data and a simulation study. We found that the drug and sex variables can be eliminated from the PBCseq analysis and that the disease progresses over time.
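
For reference, here is a minimal sketch of the SCAD penalty used in the variable selection step, in the standard Fan-Li parameterization; this is the generic penalty function only, not the authors' full penalized-likelihood code.

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """Smoothly clipped absolute deviation (SCAD) penalty of Fan & Li (2001),
    evaluated elementwise; a=3.7 is the conventional default."""
    b = np.abs(beta)
    small = b <= lam
    mid = (b > lam) & (b <= a * lam)
    return np.where(small, lam * b,
           np.where(mid, (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1)),
                    (a + 1) * lam**2 / 2))

# Near zero the penalty is linear (lasso-like); for large |beta| it is flat,
# so large coefficients are not biased toward zero.
print(scad_penalty(np.array([0.1, 1.0, 10.0]), lam=0.5))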


Subject(s)
Data Analysis , Liver Cirrhosis, Biliary , Algorithms , Cohort Studies , Computer Simulation , Humans , Likelihood Functions , Linear Models , Liver Cirrhosis, Biliary/genetics , Longitudinal Studies
3.
Stat Sin ; 32(4): 1767-1787, 2022 Oct.
Article in English | MEDLINE | ID: mdl-39077116

ABSTRACT

Quantile regression, as an alternative to modeling the conditional mean function, provides a comprehensive picture of the relationship between a response and covariates. It is particularly attractive in applications focused on the upper or lower conditional quantiles of the response. However, conventional quantile regression estimators are often unstable at the extreme tails owing to data sparsity, especially for heavy-tailed distributions. Assuming that the functional predictor has a linear effect on the upper quantiles of the response, we develop a novel estimator of extreme conditional quantiles using a functional composite quantile regression based on functional principal component analysis and an extrapolation technique from extreme value theory. We establish the asymptotic normality of the proposed estimator under some regularity conditions and compare it with other estimation methods using Monte Carlo simulations. Finally, we demonstrate the proposed method by empirically analyzing two real data sets.
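
The extrapolation device from extreme value theory that such estimators build on can be sketched in a few lines. The code below shows only the classical Hill/Weissman step (the functional composite regression stage is omitted), with illustrative parameter choices.

```python
import numpy as np

def hill_gamma(y, k):
    """Hill estimator of the extreme-value index from the k largest points."""
    ys = np.sort(y)
    return np.mean(np.log(ys[-k:] / ys[-k - 1]))

def weissman_quantile(y, tau_extreme, k):
    """Weissman (1978) extrapolation: push an intermediate empirical quantile
    out to tau_extreme using the estimated tail index."""
    n = len(y)
    gamma = hill_gamma(y, k)
    tau0 = 1 - k / n
    q0 = np.quantile(y, tau0)
    return q0 * ((1 - tau0) / (1 - tau_extreme)) ** gamma

rng = np.random.default_rng(1)
y = rng.pareto(2.0, size=5000) + 1   # true 0.999 quantile = 1000**0.5 = 31.6
print(weissman_quantile(y, 0.999, k=200))
```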

4.
Can J Stat ; 50(1): 267-286, 2022 Mar.
Article in English | MEDLINE | ID: mdl-38239624

ABSTRACT

In this article, we propose a novel estimator of extreme conditional quantiles in partial functional linear regression models with heavy-tailed distributions. The conventional quantile regression estimators are often unstable at the extreme tails due to data sparsity, especially for heavy-tailed distributions. We first estimate the slope function and the partially linear coefficient using a functional quantile regression based on functional principal component analysis, which is a robust alternative to the ordinary least squares regression. The extreme conditional quantiles are then estimated by using a new extrapolation technique from extreme value theory. We establish the asymptotic normality of the proposed estimator and illustrate its finite sample performance by simulation studies and an empirical analysis of diffusion tensor imaging data from a cognitive disorder study.
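
A bare-bones sketch of the functional principal component analysis step on which the first-stage estimator relies (densely observed curves on a common grid; smoothing and quadrature weights are omitted, and all names are illustrative):

```python
import numpy as np

def fpca_scores(X, n_components=3):
    """Functional PCA on densely observed curves X (n_curves x n_grid):
    eigendecompose the sample covariance and project the centered curves.
    The resulting FPC scores serve as regressors in the second stage."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(X)
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:n_components]
    phi = vecs[:, order]        # discretized eigenfunctions
    return Xc @ phi, phi

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 100)
X = rng.normal(size=(200, 1)) * np.sin(np.pi * t) \
    + 0.1 * rng.normal(size=(200, 100))
scores, phi = fpca_scores(X)
print(scores.shape)  # (200, 3)
```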



5.
Sensors (Basel) ; 21(24), 2021 Dec 20.
Article in English | MEDLINE | ID: mdl-34960579

ABSTRACT

Many real-world systems change their parameters during operation. Thus, before analyzing the data, the raw signal needs to be divided into parts that can be considered homogeneous segments. In this paper, we propose a segmentation procedure that can be applied to signals with time-varying characteristics. Moreover, we assume that the examined signal exhibits impulsive behavior and thus corresponds to the so-called heavy-tailed class of distributions. Because of this specific behavior of the data, classical algorithms known from the literature cannot be used directly in the segmentation procedure. In the considered case, the transition between homogeneous segments is smooth and nonlinear, which makes the segmentation algorithm more complex than in the classical case. We propose to apply divergence measures based on the distance between the probability density functions of the two examined distributions. The novel segmentation algorithm is applied to real acoustic signals acquired during coffee grinding. The methodology is justified experimentally and through Monte Carlo simulations of data from a model with a heavy-tailed distribution (here, the stable distribution) with time-varying parameters. Although the methodology is demonstrated for a specific case, it can be extended to any process with time-varying characteristics.
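
A minimal sketch of the windowed-divergence idea, using the Jensen-Shannon distance between histograms of adjacent windows as a stand-in for the paper's particular divergence measure; window sizes and the simulated signal are illustrative.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def divergence_profile(x, win=500, bins=50):
    """Jensen-Shannon distance between histograms of adjacent windows;
    peaks suggest candidate segment boundaries."""
    edges = np.histogram_bin_edges(x, bins=bins)
    out = []
    for i in range(win, len(x) - win, win // 4):
        p, _ = np.histogram(x[i - win:i], bins=edges, density=True)
        q, _ = np.histogram(x[i:i + win], bins=edges, density=True)
        out.append((i, jensenshannon(p + 1e-12, q + 1e-12)))
    return np.array(out)

rng = np.random.default_rng(3)
x = np.concatenate([rng.standard_t(1.5, 3000),          # impulsive segment
                    3.0 * rng.standard_t(1.5, 3000)])   # scale change
prof = divergence_profile(x)
print(prof[np.argmax(prof[:, 1])])   # should peak near index 3000
```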


Subject(s)
Acoustics , Algorithms , Likelihood Functions , Monte Carlo Method
6.
Diagnostics (Basel) ; 11(11), 2021 Nov 19.
Article in English | MEDLINE | ID: mdl-34829494

ABSTRACT

Deep learning has gained immense attention from researchers in medicine, especially in medical imaging, but the main bottleneck is the unavailability of the sufficiently large medical datasets required for deep learning models to perform well. This paper proposes a new framework consisting of one variational autoencoder (VAE), two generative adversarial networks (GANs), and one auxiliary classifier to artificially generate realistic-looking skin lesion images and improve classification performance. We first train the encoder-decoder network to obtain a latent noise vector carrying information about the image manifold, and let the generative adversarial network sample its input from this informative noise vector in order to generate skin lesion images. The use of informative noise allows the GAN to avoid mode collapse and converge faster. To improve the diversity of the generated images, we use another GAN with an auxiliary classifier, which samples the noise vector from a heavy-tailed Student t-distribution instead of a Gaussian random noise distribution. The proposed framework is named TED-GAN, with T from the t-distribution and ED from the encoder-decoder network that is part of the solution. The framework could be used in a broad range of areas in medical imaging. We used it here to generate skin lesion images and obtained an improved classification performance on the skin lesion classification task, rising from 66% average accuracy to 92.5%. The results show that TED-GAN benefits the classification task through the diverse range of images it generates, owing to the heavy-tailed t-distribution.
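
The heavy-tailed latent sampling itself is simple to reproduce. Below is a sketch comparing Gaussian and Student t latent noise for a generator input; function and parameter names are illustrative, and the GAN itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(4)

def latent_noise(batch, dim, dist="t", df=3.0):
    """Latent vectors for a generator: Gaussian vs. heavy-tailed Student t.
    Heavier tails spread samples over more of the latent space."""
    if dist == "t":
        return rng.standard_t(df, size=(batch, dim))
    return rng.normal(size=(batch, dim))

z_gauss = latent_noise(64, 128, dist="gauss")
z_t = latent_noise(64, 128, dist="t", df=3.0)
print(z_gauss.std(), z_t.std())  # t noise shows a noticeably larger spread
```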

7.
Accid Anal Prev ; 158: 106192, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34029919

ABSTRACT

Crash severity modeling is a classical topic in road safety research. The multinomial logit (MNL) model, a basic discrete outcome method, is widely applied to measure the association between crash severity and possible risk factors. However, the MNL model has several assumptions and properties that may be inconsistent with the actual crash mechanism, and therefore with the association measure for crash severity. One significant issue is variation in drivers' safety perception: risk-taking drivers tend to drive at higher speeds, which increases the likelihood of severe crashes, but the variations in speed and other aspects of driving performance make the error in the utility function more pronounced. This violates the assumption of identical error distributions across crash severity outcomes. In this paper, we propose a multinomial multiplicative (MNM) model as an alternative crash severity model. There are two possible formulations of the proposed MNM model, (1) Weibull and (2) Fréchet, according to the distributions of the random propensities and the signs of the systematic parts of the regression equation. These two heavy-tailed distributions can capture the effect of unobserved contributory factors on crash injury severity. Additionally, the MNM model can incorporate the non-identical, heavy-tailed, and asymmetric properties of the error distribution, which the conventional MNL model cannot. Several operational considerations are also addressed in this study, including the specification of the systematic parts and the interpretation of the parameters. The MNM model is further extended to a mixed MNM (MMNM) model that captures unobserved heterogeneity through random coefficients, with the mixed MNL (MMNL) model used as the benchmark. The proposed MMNM model is calibrated using a crash dataset from Guangdong Province, China. Results indicate that the MMNM model outperformed the MMNL model in this case, and the parameter estimates are informative about the factors affecting crash severity as well as the design and implementation of policies. This justifies the use of the MMNM model as an alternative crash severity model in practice. As this is the first application of the MMNM model in the traffic safety literature, exploring other advanced multiplicative models for safety analysis is worthwhile future work.
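
For intuition, here is a sketch of the closed-form choice probabilities implied by multiplicative utilities with max-stable Fréchet errors, a standard result that the Fréchet formulation of a multiplicative model builds on; this is not the paper's estimation code, and the Weibull variant differs in its sign conventions.

```python
import numpy as np

def mnm_frechet_probs(V, theta=1.0):
    """Choice probabilities for a multiplicative random-utility model:
    U_i = V_i * eps_i with eps_i iid Frechet(theta) implies, by max-stability,
    P(i) = V_i**theta / sum_j V_j**theta.
    V must contain positive systematic propensities, e.g. V = exp(X @ beta)."""
    w = V ** theta
    return w / w.sum(axis=-1, keepdims=True)

V = np.exp(np.array([0.2, -0.1, 0.5]))   # e.g., three severity outcomes
print(mnm_frechet_probs(V, theta=2.0))   # sums to 1
```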


Subject(s)
Accidents, Traffic , Wounds and Injuries , China , Humans , Logistic Models , Risk Factors
8.
J Appl Stat ; 48(4): 646-668, 2021.
Article in English | MEDLINE | ID: mdl-35706985

ABSTRACT

While there has been considerable research on the analysis of extreme values and outliers using heavy-tailed distributions, little is known about the semi-heavy-tailed behavior of data containing a few suspicious outliers. To address situations where data are skewed with semi-heavy tails, we introduce two new skewed families of hyperbolic secant distributions with appealing properties. We extend the semi-heavy-tailedness property to a linear regression model; in particular, we investigate the asymptotic properties of the ML estimators of the regression parameters when the error term has a semi-heavy-tailed distribution. We conduct simulation studies comparing the ML estimators of the regression parameters under various assumptions on the distribution of the error term, and we provide three real examples showing the advantage of a semi-heavy-tailed error term over a heavy-tailed one. Online supplementary materials for this article are available. All the newly proposed models in this work are implemented in the shs R package, available on GitHub.
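
As a sketch of the estimation problem, the code below fits a linear regression with hyperbolic secant errors by maximum likelihood; it uses scipy's symmetric hyperbolic secant density rather than the paper's skewed families (which the shs R package implements), so it illustrates the idea only.

```python
import numpy as np
from scipy import optimize, stats

def fit_hypsecant_regression(X, y):
    """ML fit of y = X @ beta + sigma * e with hyperbolic-secant errors,
    a semi-heavy-tailed alternative to Gaussian least squares."""
    n, p = X.shape

    def negloglik(theta):
        beta, log_sigma = theta[:p], theta[-1]
        resid = (y - X @ beta) / np.exp(log_sigma)
        return -(stats.hypsecant.logpdf(resid).sum() - n * log_sigma)

    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS warm start
    res = optimize.minimize(negloglik, np.append(beta0, 0.0), method="BFGS")
    return res.x[:p], np.exp(res.x[-1])

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
y = X @ np.array([1.0, 2.0]) + stats.hypsecant.rvs(size=300, random_state=rng)
print(fit_hypsecant_regression(X, y))   # recovers beta close to (1, 2)
```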

9.
Proc Natl Acad Sci U S A ; 117(50): 31754-31759, 2020 Dec 15.
Article in English | MEDLINE | ID: mdl-33257554

ABSTRACT

The duration of interaction events in a society is a fundamental measure of its collective nature and potentially reflects variability in individual behavior. Here we performed a high-throughput measurement of trophallaxis and face-to-face event durations experienced by a colony of honeybees over their entire lifetimes. The interaction time distribution is heavy-tailed, as previously reported for human face-to-face interactions. We developed a theory of pair interactions that takes into account individual variability and predicts the scaling behavior for both bee and extant human datasets. The individual variability of worker honeybees was nonzero but less than that of humans, possibly reflecting their greater genetic relatedness. Our work shows how individual differences can lead to universal patterns of behavior that transcend species and specific mechanisms for social interactions.


Subject(s)
Behavior, Animal/physiology , Biological Variation, Individual , Models, Biological , Social Behavior , Social Interaction , Animals , Bees/physiology , Datasets as Topic , High-Throughput Screening Assays , Humans , Individuality , Time Factors
10.
Mem Cognit ; 48(5): 772-787, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32078735

ABSTRACT

Free-recall tasks suggest that human memory foraging may follow a heavy-tailed process, such as a Lévy flight, patch foraging, or area-restricted search - walk procedures that are common in other activities of cognitive agents, such as food foraging in both animals and humans. To date, research has merely equated memory foraging with hunting in the physical world based on similarities in statistical structure. The current work supports the claim that memory foraging follows a heavy-tailed distribution by using categories with quantitative distances between items: countries, which have physical distances, and animals, for which cognitive distances can be derived using a multidimensional scaling (MDS) procedure. Likewise, inter-item lag times follow a heavy-tailed distribution. The current work also demonstrates that inter-item distances and times are positively correlated, suggesting that the organization of items in memory may be akin to the organization of a physical landscape. Finally, both studies show that participants' original, heavy-tailed lists of countries and animal names produce shorter overall distances traveled than random selection. Human memory foraging follows the same pattern as foraging in the natural world - perhaps because exposure to ecological settings informs our inner cognitive experience - leading to a processing and retrieval time benefit.


Subject(s)
Memory , Animals , Humans
11.
Entropy (Basel) ; 23(1), 2020 Dec 31.
Article in English | MEDLINE | ID: mdl-33396383

ABSTRACT

The Stochastic Configuration Network (SCN) has a powerful capability for regression and classification analysis. Traditionally, it is quite challenging to determine an appropriate architecture for a neural network so that the trained model achieves excellent performance in both learning and generalization. Compared with known randomized learning algorithms for single hidden layer feed-forward neural networks, such as Randomized Radial Basis Function (RBF) Networks and the Random Vector Functional-link (RVFL), the SCN randomly assigns the input weights and biases of the hidden nodes under a supervisory mechanism. Since these hidden-layer parameters are conventionally generated from a uniform distribution, the question arises whether another form of randomness is more suitable. Heavy-tailed distributions have been shown to provide optimal randomness when searching for targets in an unknown environment. Therefore, in this research, the authors used heavy-tailed distributions to randomly initialize the weights and biases, to see whether the new SCN models can achieve better performance than the original SCN. Heavy-tailed distributions such as the Lévy, Cauchy, and Weibull distributions were used, and since some mixed distributions also exhibit heavy-tailed properties, mixed Gaussian and Laplace distributions were studied as well. Experimental results showed improved performance for SCN with heavy-tailed distributions. For the regression model, SCN-Lévy, SCN-Mixture, SCN-Cauchy, and SCN-Weibull used fewer hidden nodes to achieve performance similar to the original SCN. For the classification model, SCN-Mixture, SCN-Lévy, and SCN-Cauchy achieved test accuracies of 91.5%, 91.7%, and 92.4%, respectively, all higher than the test accuracy of the original SCN.
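
A sketch of the initialization variants being compared (the SCN's supervisory acceptance mechanism for candidate nodes is omitted; shapes and scale parameters are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

def init_weights(shape, dist="uniform"):
    """Hidden-layer weights drawn from a uniform vs. heavy-tailed law,
    mirroring the SCN variants compared in the paper."""
    if dist == "uniform":
        return rng.uniform(-1, 1, size=shape)
    if dist == "cauchy":
        return rng.standard_cauchy(size=shape)
    if dist == "levy":
        return stats.levy.rvs(size=shape, random_state=rng)
    if dist == "weibull":
        return rng.weibull(1.5, size=shape)
    raise ValueError(dist)

for d in ("uniform", "cauchy", "levy", "weibull"):
    w = init_weights((25, 10), dist=d)
    print(f"{d:8s} max |w| = {np.abs(w).max():.2f}")  # tails differ sharply
```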

12.
Math Biosci Eng ; 18(1): 214-230, 2020 Nov 26.
Article in English | MEDLINE | ID: mdl-33525088

ABSTRACT

Quantile estimation with big data is still a challenging problem in statistics. In this paper, we introduce a distributed algorithm for estimating high quantiles of heavy-tailed distributions from massive datasets. The key idea of the algorithm is to apply the alternating direction method of multipliers (ADMM) to the parameter estimation of the generalized Pareto distribution in a distributed structure, and then to compute high quantiles from the parameter estimates via the peaks-over-threshold method. This paper proves that the proposed algorithm converges to a stationary solution when the step size is properly chosen. A numerical study and a real data analysis also show that the algorithm is feasible and efficient for estimating high quantiles of heavy-tailed distributions from massive datasets, and that it is a clear-cut winner for the extreme quantiles.
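
A single-machine sketch of the peaks-over-threshold step (the distributed ADMM estimation is omitted; the threshold choice and simulated data are illustrative):

```python
import numpy as np
from scipy import stats

def pot_high_quantile(x, u, p):
    """Peaks-over-threshold estimate of the p-quantile: fit a generalized
    Pareto distribution to exceedances over u, then invert the tail formula
    q_p = u + (sigma/xi) * (((n/k) * (1 - p)) ** (-xi) - 1)."""
    exc = x[x > u] - u
    xi, _, sigma = stats.genpareto.fit(exc, floc=0.0)
    n, k = len(x), len(exc)
    return u + (sigma / xi) * (((n / k) * (1 - p)) ** (-xi) - 1)

rng = np.random.default_rng(7)
x = rng.pareto(2.0, size=100_000) + 1
u = np.quantile(x, 0.95)
print(pot_high_quantile(x, u, 0.9999))   # true value: 0.0001 ** -0.5 = 100
```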

13.
SAR QSAR Environ Res ; 30(6): 417-428, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31122071

ABSTRACT

The linear regression model is frequently encountered in quantitative structure-activity relationship (QSAR) modelling. Traditional estimation of the regression parameters is based on a normality assumption for the response variable (biological activity) and is therefore sensitive to outliers and heavy-tailed distributions. Robust penalized regression methods have received considerable attention because they combine robust estimation with penalty terms, performing QSAR parameter estimation and variable selection (descriptor selection) simultaneously. In this paper, based on the bridge penalty, a robust QSAR model of influenza neuraminidase A/PR/8/34 (H1N1) inhibitors is proposed as a method resistant to outliers and heavy-tailed errors. The basic idea is to combine rank regression with the bridge penalty to produce the rank-bridge method. The rank-bridge model is internally and externally validated based on Q²int, Q²LGO, Q²Boot, MSEtrain, a Y-randomization test, Q²ext, MSEtest, and the applicability domain (AD). The validation results indicate that the rank-bridge model is robust and not due to chance correlation. In addition, the descriptor selection and prediction performance of the rank-bridge model on the training dataset outperform the other two modelling methods considered: the rank-bridge model shows the highest Q²int, Q²LGO, and Q²Boot, and the lowest MSEtrain. For the test dataset, the rank-bridge model shows a higher external validation value (Q²ext = 0.824) and a lower MSEtest than the other methods, indicating its higher predictive ability.
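
A conceptual sketch of a rank-bridge objective, combining Jaeckel's rank dispersion with Wilcoxon scores and a bridge penalty; this is illustrative only, and the paper's optimizer, tuning procedure, and descriptor data are not reproduced.

```python
import numpy as np
from scipy import optimize

def rank_bridge_fit(X, y, lam=0.5, gamma=0.5):
    """Jaeckel-type rank dispersion with Wilcoxon scores plus a bridge
    penalty lam * sum |beta_j|**gamma (0 < gamma < 1); minimized with a
    derivative-free method since the objective is non-smooth."""
    n = len(y)

    def objective(beta):
        e = y - X @ beta
        ranks = np.argsort(np.argsort(e)) + 1
        scores = np.sqrt(12) * (ranks / (n + 1) - 0.5)   # Wilcoxon scores
        return scores @ e + lam * np.sum(np.abs(beta) ** gamma)

    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]
    res = optimize.minimize(objective, beta0, method="Nelder-Mead",
                            options={"maxiter": 5000})
    return res.x

rng = np.random.default_rng(8)
X = rng.normal(size=(100, 5))
beta_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])     # sparse truth
y = X @ beta_true + rng.standard_t(2, size=100)      # heavy-tailed errors
print(np.round(rank_bridge_fit(X, y), 2))
```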


Subject(s)
Antiviral Agents/chemistry , Enzyme Inhibitors/chemistry , Influenza A Virus, H1N1 Subtype/enzymology , Neuraminidase/antagonists & inhibitors , Quantitative Structure-Activity Relationship , Humans , Linear Models , Models, Molecular , Neuraminidase/chemistry
14.
Sci Total Environ ; 633: 1480-1495, 2018 Aug 15.
Article in English | MEDLINE | ID: mdl-29758900

ABSTRACT

The heavy-tailed distribution of organic pollution data in soils raises specific problems in estimating and mapping concentrations: a few high values often strongly impact the sample variogram and extend the pollution hot-spots on the estimation maps. Non-linear geostatistical models, such as the anamorphosed Gaussian model, were proposed in the 1970s; they allow a consistent estimate of the concentrations and of the probability that concentrations exceed a cut-off. These well-founded methods are rarely used by environmental consultants, mainly because of time constraints and because the hypotheses of the models are not always satisfied. To estimate the concentrations, an empirical method widely used by environmental consultants consists of truncating the high values to gain robustness in the variogram analysis. The truncation value is arbitrary, even though it has a strong influence on the estimated concentrations. Proposed for handling heavy-tailed distributions of ore grades, the top-cut model (Rivoirard et al., 2013) justifies the use of truncated values but corrects the underestimation of the mean caused by truncation. In this model, the decomposition of the variable into three components (the truncated value, a weighted indicator at the top-cut threshold, and a residual) makes the variographic study more robust and guides the choice of the top-cut threshold. For a case of chlorinated solvent contamination, a detailed comparison between several estimation methods is performed: ordinary kriging, kriging after truncation of the highest concentrations, and estimation within the top-cut model with a structured or pure nugget residual. A sensitivity study of the top-cut threshold is performed, and the results of two implementations of cross-validation are compared. The top-cut model with nugget residual appears to be robust, even when the hypotheses of the model are not perfectly satisfied.
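
A sketch of the three-component decomposition at the heart of the top-cut model; the kriging of each component is omitted, and the exceedance weight shown here (the mean excess above the cut-off) is one simple choice that makes the parts sum exactly back to the data.

```python
import numpy as np

def top_cut_decompose(z, zc):
    """Decompose concentrations into the three top-cut components:
    truncated value, weighted indicator at the threshold, and residual
    (after Rivoirard et al., 2013). Each part would then be studied
    variographically and kriged."""
    truncated = np.minimum(z, zc)
    indicator = (z > zc).astype(float)
    # weight = mean excess above the cut-off, so the parts sum back to z
    omega = (z[z > zc] - zc).mean() if indicator.any() else 0.0
    residual = z - truncated - omega * indicator
    return truncated, omega * indicator, residual

rng = np.random.default_rng(9)
z = np.exp(rng.normal(0, 1.2, size=500))      # skewed "concentration" data
t, w, r = top_cut_decompose(z, zc=np.quantile(z, 0.95))
print(np.allclose(t + w + r, z))              # exact reconstruction: True
```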

15.
Springerplus ; 5(1): 2089, 2016.
Article in English | MEDLINE | ID: mdl-28018797

ABSTRACT

South Africa is a cornucopia of the platinum group metals, particularly platinum and palladium. These metals have many unique physical and chemical characteristics that render them indispensable to technology and industry, the markets, and the medical field. In this paper we carry out a holistic investigation of long memory (LM), structural breaks, and stylized facts in platinum and palladium return and volatility series. To investigate LM we employ a wide range of methods based on time-domain, Fourier, and wavelet techniques, and we address the dual LM phenomenon using ARFIMA-FIGARCH-type models, namely the FIGARCH, ARFIMA-FIEGARCH, ARFIMA-FIAPARCH, and ARFIMA-HYGARCH models. Our results suggest that platinum and palladium returns are mean reverting, while volatility exhibits strong LM. Using the Akaike information criterion (AIC), the ARFIMA-FIAPARCH model under the Student distribution was adjudged the best model for platinum returns, although the ARCH effect was slightly significant, whereas under the Schwarz information criterion (SIC) the ARFIMA-FIAPARCH model under the normal distribution outperforms all the other models. Further, the ARFIMA-FIEGARCH model under the skewed Student distribution and the ARFIMA-HYGARCH model under the normal distribution were able to capture the ARCH effect. In the case of palladium, based on both the AIC and SIC, the ARFIMA-FIAPARCH model under the GED distribution is selected, although the ARCH effect was slightly significant; the ARFIMA-FIEGARCH model under the GED and the ARFIMA-HYGARCH model under the normal distribution were also able to capture the ARCH effect. The best models with respect to prediction excluded the ARFIMA-FIGARCH model and were dominated by the ARFIMA-FIAPARCH model under non-normal error distributions, indicating the importance of asymmetry and heavy-tailed error distributions.
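
For a flavor of this kind of model comparison in code, the sketch below fits a FIGARCH volatility model under several error distributions using Python's arch package and ranks them by AIC; a constant mean is assumed, so the ARFIMA component of the paper's models is omitted, and the simulated series is a stand-in for real platinum or palladium returns.

```python
import numpy as np
from arch import arch_model

# Simulated heavy-tailed stand-in for daily log-returns (replace with data).
rng = np.random.default_rng(10)
returns = rng.standard_t(5, size=2000)

best = None
for dist in ("normal", "t", "skewt", "ged"):
    am = arch_model(returns, vol="FIGARCH", p=1, q=1, dist=dist)
    res = am.fit(disp="off")
    print(f"{dist:6s} AIC = {res.aic:.1f}  BIC = {res.bic:.1f}")
    if best is None or res.aic < best[1]:
        best = (dist, res.aic)
print("AIC-preferred error distribution:", best[0])
```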
