Results 1 - 20 of 27
1.
Sensors (Basel) ; 22(10)2022 May 14.
Article in English | MEDLINE | ID: mdl-35632152

ABSTRACT

In this paper, we propose a new privatization mechanism based on a naive theory of perturbing a probability with wavelets, much as noise perturbs the signal of a digital image sensor. Wavelets are employed to extract information from a wide range of data types, including audio signals and images, often sensor-related, treated as unstructured data. Specifically, we define the cumulative wavelet integral function and use it to build the perturbation on a probability. We show that an arbitrary distribution function that is additively perturbed is still a distribution function, which can be seen as a privatized distribution, with the privatization mechanism being a wavelet function. Thus, we offer a mathematical method for choosing a suitable probability distribution for data by starting from some guessed initial distribution. Examples of the proposed method are discussed. Computational experiments were carried out using a database-sensor and two related algorithms. Several knowledge areas can benefit from the new approach proposed in this investigation. The areas of artificial intelligence, machine learning, and deep learning, which are closely related to sensors, constantly need techniques for data fitting. Therefore, we believe that the proposed privatization mechanism is an important contribution to increasing the spectrum of existing techniques.


Subject(s)
Artificial Intelligence, Privatization, Algorithms, Machine Learning, Probability
2.
An Acad Bras Cienc ; 93(2): e20181019, 2021.
Article in English | MEDLINE | ID: mdl-34190839

ABSTRACT

In this paper, we introduce a new family of distributions whose probability density function is defined as a weighted sum of two probability density functions, one of which is a warped version of the other. We focus on a special case based on the exponential distribution with three parameters, a dilation transformation, and a weight with polynomial decay, leading to a new lifetime distribution. Explicit expressions for the moment generating function, moments, and quantile function of the proposed distribution are provided. The parameters are estimated by the method of maximum likelihood. Two applications with practical data sets are given.


Subject(s)
Algorithms, Statistical Models, Likelihood Functions, Statistical Distributions
3.
Entropy (Basel) ; 23(8)2021 Aug 21.
Article in English | MEDLINE | ID: mdl-34441228

ABSTRACT

In this article, the "truncated-composed" scheme was applied to the Burr X distribution to motivate a new family of univariate continuous-type distributions, called the truncated Burr X generated family. It is mathematically simple and provides more modeling freedom for any parental distribution. Additional functionality is conferred on the probability density and hazard rate functions, improving their peak, asymmetry, tail, and flatness levels. These characteristics are represented analytically and graphically with three special distributions of the family derived from the exponential, Rayleigh, and Lindley distributions. Subsequently, we conducted asymptotic, first-order stochastic dominance, series expansion, Tsallis entropy, and moment studies. Useful risk measures were also investigated. The remainder of the study was devoted to the statistical use of the associated models. In particular, we developed an adapted maximum likelihood methodology aiming to efficiently estimate the model parameters. The special distribution extending the exponential distribution was applied as a statistical model to fit two sets of actuarial and financial data. It performed better than a wide variety of selected competing non-nested models. Numerical applications for risk measures are also given.

4.
Entropy (Basel) ; 23(11)2021 Oct 24.
Article in English | MEDLINE | ID: mdl-34828091

ABSTRACT

In this article, we propose the exponentiated sine-generated family of distributions. Some important properties are demonstrated, such as the series representation of the probability density function, quantile function, moments, stress-strength reliability, and Rényi entropy. A particular member, called the exponentiated sine Weibull distribution, is highlighted; we analyze its skewness and kurtosis, moments, quantile function, residual mean and reversed mean residual life functions, order statistics, and extreme value distributions. Maximum likelihood estimation and Bayes estimation under the square error loss function are considered. Simulation studies are used to assess the techniques, and their performance gives satisfactory results as discussed by the mean square error, confidence intervals, and coverage probabilities of the estimates. The stress-strength reliability parameter of the exponentiated sine Weibull model is derived and estimated by the maximum likelihood estimation method. Also, nonparametric bootstrap techniques are used to approximate the confidence interval of the reliability parameter. A simulation is conducted to examine the mean square error, standard deviations, confidence intervals, and coverage probabilities of the reliability parameter. Finally, three real applications of the exponentiated sine Weibull model are provided. One of them considers stress-strength data.

5.
Chaos ; 30(11): 113142, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33261340

ABSTRACT

The purpose of this study is to discriminate sunflower seeds with the help of a dataset having spectral and textural features. Crop production depends on seed purity and quality, and sunflower seed is used worldwide for its oil content. In this regard, a dataset was built that categorizes six sunflower seed varieties (Syngenta CG, HS360, S278, HS30, Armani, and High Sun 33), acquired from the agricultural farms of The Islamia University of Bahawalpur, Pakistan, into six classes. For preprocessing, a new region-oriented seed-based segmentation was deployed for the automatic selection of regions and extraction of 53 multi-features from each region, while 11 optimized fused multi-features were selected using the chi-square feature selection technique. For discrimination, four supervised classifiers, namely, deep learning J4, support vector machine, random committee, and Bayes net, were employed to optimize the multi-feature dataset. We observe very promising accuracies of 98.2%, 97.5%, 96.6%, and 94.8%, respectively, when the size of a region is 180 × 180.


Subject(s)
Helianthus, Bayes Theorem, Humans, Support Vector Machine
6.
Entropy (Basel) ; 22(3)2020 Mar 17.
Article in English | MEDLINE | ID: mdl-33286120

ABSTRACT

The statistical literature lacks a general family of distributions based on the truncated Cauchy distribution. In this paper, such a family is proposed, called the truncated Cauchy power-G family. It stands out for the originality of the involved functions, its overall simplicity, and its desirable properties for modelling purposes. In particular, (i) only one parameter is added to the baseline distribution, avoiding the over-parametrization phenomenon, (ii) the related probability functions (cumulative distribution, probability density, hazard rate, and quantile functions) have tractable expressions, and (iii) thanks to the combined action of the arctangent and power functions, the flexible properties of the baseline distribution (symmetry, skewness, kurtosis, etc.) can be greatly enhanced. These aspects are discussed in detail, with the support of comprehensive numerical and graphical results. Furthermore, important mathematical features of the new family are derived, such as the moments, skewness and kurtosis, two kinds of entropy, and order statistics. On the applied side, new models can be created for fitting data sets with simple or complex structure. This last point is illustrated using the Weibull distribution as baseline, the maximum likelihood method of estimation, and two practical data sets with different skewness properties. The obtained results show that the truncated Cauchy power-G family is very competitive in comparison to other well-established general families.
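
As a rough illustration of the construction described in this abstract, the sketch below assumes the family's cumulative distribution function has the form F(x) = (4/π) arctan(G(x)^α), that is, a Cauchy distribution truncated to [0, 1] applied to a power of the baseline CDF G. This form is an assumption inferred from the abstract's mention of the arctangent and power functions; it is not quoted from the paper. The function names and parameter values are hypothetical, and the Weibull baseline follows the abstract's own illustration.

```python
import math


def weibull_cdf(x, shape, scale):
    """Baseline Weibull CDF G(x); zero for x <= 0."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def tcpg_cdf(x, alpha, shape, scale):
    """Assumed truncated Cauchy power-G CDF: (4 / pi) * arctan(G(x) ** alpha)."""
    g = weibull_cdf(x, shape, scale)
    return (4.0 / math.pi) * math.atan(g ** alpha)


def tcpg_quantile(u, alpha, shape, scale):
    """Invert the assumed CDF for 0 < u < 1.

    Solve (4 / pi) * arctan(G ** alpha) = u for G, then invert the Weibull CDF.
    """
    g = math.tan(math.pi * u / 4.0) ** (1.0 / alpha)
    return scale * (-math.log(1.0 - g)) ** (1.0 / shape)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for demonstration.
    for u in (0.1, 0.5, 0.9):
        x = tcpg_quantile(u, alpha=2.0, shape=1.5, scale=2.0)
        print(u, x, tcpg_cdf(x, alpha=2.0, shape=1.5, scale=2.0))
```

Under this assumed form, only the single parameter α is added to the baseline, and both the CDF and the quantile function stay in closed form, which is consistent with points (i) and (ii) of the abstract.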

7.
Entropy (Basel) ; 22(4)2020 Apr 15.
Article in English | MEDLINE | ID: mdl-33286223

ABSTRACT

The inverse Rayleigh distribution finds applications in many lifetime studies, but lacks the overall flexibility to model lifetime phenomena where moderately right-skewed or near-symmetrical data are observed. This paper proposes a solution by introducing a new two-parameter extension of this distribution through the use of the half-logistic transformation. The first contribution is theoretical: we provide a comprehensive account of its mathematical properties, specifically stochastic ordering results, a general linear representation for the exponentiated probability density function, raw/inverted moments, incomplete moments, skewness, kurtosis, and entropy measures. Evidence shows that the related model can accommodate lifetime data with different right-skewed features, far beyond the capability of the former inverse Rayleigh model. We illustrate this aspect by exploring the statistical inference of the new model. Five different classical methods for estimating the model parameters are employed, with a simulation study comparing the numerical behavior of the different estimates. The estimation of entropy measures is also discussed numerically. Finally, two practical data sets are used as applications to attest to the usefulness of the new model, with favorable goodness-of-fit results in comparison to three recent extended inverse Rayleigh models.

8.
Entropy (Basel) ; 22(6)2020 May 28.
Article in English | MEDLINE | ID: mdl-33286373

ABSTRACT

The inverse Lomax distribution has been widely used in many applied fields such as reliability, geophysics, economics, and engineering sciences. In this paper, an unexplored practical problem involving the inverse Lomax distribution is investigated: the estimation of its entropy when multiple censored data are observed. To reach this goal, the entropy is defined through the Rényi and q-entropies, and we estimate them by combining the maximum likelihood and plug-in methods. Then, numerical results are provided to show the behavior of the estimates at various sample sizes, with the determination of the mean squared errors, two-sided approximate confidence intervals, and the corresponding average lengths. Our numerical investigations show that, when the sample size increases, the values of the mean squared errors and average lengths decrease. Also, when the censoring level decreases, the considered Rényi and q-entropy estimates approach the true value. The obtained results validate the usefulness and efficiency of the method. An application to two real-life data sets is given.

9.
Entropy (Basel) ; 22(5)2020 May 19.
Article in English | MEDLINE | ID: mdl-33286339

ABSTRACT

The objective of this study was to demonstrate the ability of machine learning (ML) methods to segment and classify diabetic retinopathy (DR). Two-dimensional (2D) retinal fundus (RF) images were used. The DR datasets (mild, moderate, non-proliferative, proliferative, and normal human eye) were acquired from 500 patients at Bahawal Victoria Hospital (BVH), Bahawalpur, Pakistan. Five hundred RF datasets (sized 256 × 256) for each DR stage and a total of 2500 (500 × 5) datasets of the five DR stages were acquired. This research introduces a novel clustering-based automated region growing framework. For texture analysis, four types of features, namely histogram (H), wavelet (W), co-occurrence matrix (COM), and run-length matrix (RLM), were extracted, and various ML classifiers were employed, achieving 77.67%, 80%, 89.87%, and 96.33% classification accuracies, respectively. To improve classification accuracy, a fused hybrid-feature dataset was generated by applying the data fusion approach. From each image, 245 pieces of hybrid feature data (H, W, COM, and RLM) were observed, while 13 optimized features were selected after applying four different feature selection techniques, namely Fisher, correlation-based feature selection, mutual information, and probability of error plus average correlation. Five ML classifiers named sequential minimal optimization (SMO), logistic (Lg), multi-layer perceptron (MLP), logistic model tree (LMT), and simple logistic (SLg) were deployed on selected optimized features (using 10-fold cross-validation), and they showed considerably high classification accuracies of 98.53%, 99%, 99.66%, 99.73%, and 99.73%, respectively.

10.
BMC Bioinformatics ; 15: 205, 2014 Jun 19.
Article in English | MEDLINE | ID: mdl-24946781

ABSTRACT

BACKGROUND: In many applications, a family of nucleotide or protein sequences classified into several subfamilies has to be modeled. Profile Hidden Markov Models (pHMMs) are widely used for this task, modeling each subfamily separately by one pHMM. However, a major drawback of this approach is the difficulty of dealing with subfamilies composed of very few sequences. One of the most crucial bioinformatical tasks affected by the problem of small-size subfamilies is the subtyping of human immunodeficiency virus type 1 (HIV-1) sequences, i.e., HIV-1 subtypes for which only a small number of sequences is known. RESULTS: To deal with small samples for particular subfamilies of HIV-1, we introduce a novel model-based information sharing protocol. It estimates the emission probabilities of the pHMM modeling a particular subfamily not only based on the nucleotide frequencies of the respective subfamily but also incorporating the nucleotide frequencies of all available subfamilies. To this end, the underlying probabilistic model mimics the pattern of commonality and variation between the subtypes with regards to the biological characteristics of HI viruses. In order to implement the proposed protocol, we make use of an existing HMM architecture and its associated inference engine. CONCLUSIONS: We apply the modified algorithm to classify HIV-1 sequence data in the form of partial HIV-1 sequences and semi-artificial recombinants. Thereby, we demonstrate that the performance of pHMMs can be significantly improved by the proposed technique. Moreover, we show that our algorithm performs significantly better than Simplot and Bootscanning.


Subject(s)
Computational Biology/methods, HIV-1/genetics, Markov Chains, Statistical Models, Genetic Recombination, Algorithms, Base Sequence, Genetic Variation, HIV-1/physiology, Host-Pathogen Interactions, Humans, Immunity, Biological Models
11.
Sci Rep ; 14(1): 5956, 2024 03 12.
Article in English | MEDLINE | ID: mdl-38472298

ABSTRACT

Extensive research has been conducted on poverty in developing countries using conventional regression analysis, which has limited prediction capability. This study aims to address this gap by applying advanced machine learning (ML) methods to predict poverty in Somalia. Utilizing data from the first-ever 2020 Somalia Demographic and Health Survey (SDHS), a cross-sectional study design is considered. ML methods, including random forest (RF), decision tree (DT), support vector machine (SVM), and logistic regression, are tested and applied using R software version 4.1.2, while conventional methods are analyzed using STATA version 17. Evaluation metrics, such as confusion matrix, accuracy, precision, sensitivity, specificity, recall, F1 score, and area under the receiver operating characteristic (AUROC), are employed to assess the performance of predictive models. The prevalence of poverty in Somalia is notable, with approximately seven out of ten Somalis living in poverty, making it one of the highest rates in the region. Among nomadic pastoralists, agro-pastoralists, and internally displaced persons (IDPs), the poverty average stands at 69%, while urban areas have a lower poverty rate of 60%. The accuracy of prediction ranged between 67.21% and 98.36% for the advanced ML methods, with the RF model demonstrating the best performance. The results reveal geographical region, household size, respondent age group, husband employment status, age of household head, and place of residence as the top six predictors of poverty in Somalia. The findings highlight the potential of ML methods to predict poverty and uncover hidden information that traditional statistical methods cannot detect, with the RF model identified as the best classifier for predicting poverty in Somalia.


Subject(s)
Benchmarking, Machine Learning, Cross-Sectional Studies, Somalia, Poverty
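
The evaluation metrics named in this abstract can all be read off a binary confusion matrix. The sketch below is a minimal, generic illustration, not the study's R or STATA code; the function name and the label coding (1 for the positive class, 0 for the negative class) are assumptions. Note that sensitivity and recall are the same quantity, so the function reports it once.

```python
def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix counts and derived metrics for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return 0.0 when a metric is undefined (empty denominator).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # identical to recall
    specificity = ratio(tn, tn + fp)
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # Made-up labels for demonstration only; not data from the survey.
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    predicted = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(truth, predicted))
```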
12.
J Appl Stat ; 50(1): 131-154, 2023.
Article in English | MEDLINE | ID: mdl-36530782

ABSTRACT

This article introduces a new distribution with two tuning parameters defined on the unit interval. It follows from a 'hyperbolic secant transformation' of a random variable following the Weibull distribution. The study is motivated by the lack of research on hyperbolic transformations as a source of flexible distributions over the unit interval. The main structural properties of the new distribution are established. Different estimation methods for the model parameters are derived and assessed in two simulation studies. Subsequently, we develop a related quantile regression model for further statistical perspectives. We consider two real data applications based on educational measurements from OECD countries and some non-member countries. Our regression model aims to relate the desire of certain young students in OECD countries to get top grades to some components of their Education and School Life Index, such as reading performance, work environment at home, and paid work experience. The quantile regression model is shown to fit better than well-known regression models when the unit response variable has a skewed distribution, and two independent variables are statistically significant at any standard significance level for the median response.

13.
Math Biosci Eng ; 20(11): 19871-19911, 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-38052628

ABSTRACT

Recent innovations have focused on the creation of new families that extend well-known distributions while providing a great deal of practical flexibility for data modeling. Weighted distributions offer an effective approach for addressing model building and data interpretation problems. The main objective of this work is to provide a novel family based on a weighted generator, called the length-biased truncated Lomax-generated (LBTLo-G) family. The characteristics of the LBTLo-G family are discussed, including expressions for the probability density function, moments, and incomplete moments. In addition, different measures of uncertainty are determined. We provide four new sub-distributions and investigate their functionalities. Subsequently, a statistical analysis is given. The LBTLo-G family's parameters are estimated using the maximum likelihood technique on the basis of full and censored samples. A simulation study is conducted to determine the parameters of the LBTLo Weibull (LBTLoW) distribution. Four real data sets are considered to illustrate the fitting behavior of the LBTLoW distribution. In each case, the application outcomes demonstrate that the LBTLoW distribution can fit the data more accurately than other rival distributions.

14.
Biology (Basel) ; 12(7)2023 Jul 04.
Article in English | MEDLINE | ID: mdl-37508389

ABSTRACT

Predictive models based on empirical similarity are instrumental in biology and data science, where the premise is to measure the likeness of one observation with others in the same dataset. Biological datasets often encompass data that can be categorized. When using empirical similarity-based predictive models, two strategies for handling categorical covariates exist. The first strategy retains categorical covariates in their original form, applying distance measures and allocating weights to each covariate. In contrast, the second strategy creates binary variables, representing each variable level independently, and computes similarity measures solely through the Euclidean distance. This study performs a sensitivity analysis of these two strategies using computational simulations, and applies the results to a biological context. We use a linear regression model as a reference point, and consider two methods for estimating the model parameters, alongside exponential and fractional inverse similarity functions. The sensitivity is evaluated by determining the coefficient of variation of the parameter estimators across the three models as a measure of relative variability. Our results suggest that the first strategy excels over the second one in effectively dealing with categorical variables, and offers greater parsimony due to the use of fewer parameters.
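
To make the two similarity functions in this abstract concrete, the sketch below predicts a response as a similarity-weighted average of past responses, using a weighted squared distance between covariate vectors. The exact functional forms, s(d) = exp(-d) for the exponential case and s(d) = 1 / (1 + d) for the fractional inverse case, are common choices assumed here rather than taken from the paper, and all function names are hypothetical.

```python
import math


def weighted_sq_distance(x, y, weights):
    """Weighted squared distance between two covariate vectors."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))


def exponential_similarity(d):
    """Assumed exponential similarity: s(d) = exp(-d)."""
    return math.exp(-d)


def fractional_similarity(d):
    """Assumed fractional inverse similarity: s(d) = 1 / (1 + d)."""
    return 1.0 / (1.0 + d)


def predict(x_new, X, y, weights, similarity):
    """Similarity-weighted average of the observed responses y."""
    sims = [similarity(weighted_sq_distance(x_new, xi, weights)) for xi in X]
    return sum(s * yi for s, yi in zip(sims, y)) / sum(sims)


if __name__ == "__main__":
    # Made-up data for demonstration only.
    X = [[0.0], [1.0]]
    y = [0.0, 10.0]
    print(predict([0.0], X, y, [1.0], exponential_similarity))
    print(predict([0.0], X, y, [1.0], fractional_similarity))
```

In this setup, the second strategy in the abstract would correspond to replacing each categorical covariate with one binary column per level before computing distances, which adds one weight per column and therefore more parameters.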

15.
Immun Inflamm Dis ; 11(8): e981, 2023 08.
Article in English | MEDLINE | ID: mdl-37647450

ABSTRACT

BACKGROUND: Access to the immense collection of studies on noncommunicable diseases related to coronavirus disease 2019 (COVID-19) and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is an immediate focus of researchers. However, there is a scarcity of information about chronic obstructive pulmonary disease (COPD), which is associated with a high rate of infection in COVID-19 patients. Moreover, by combining the effects of the SARS-CoV-2 on COPD patients, we may be able to overcome formidable obstacles factors, and diagnosis influencers. MATERIALS AND METHODS: A retrospective study of 280 patients was conducted at DHQ Hospital Muzaffargarh in Punjab, Pakistan. Negative binomial regression describes the risk of fixed successive variables. The association is described by the Cox proportional hazard model and the model coefficient is determined through log-likelihood observation. Patients with COPD had their survival and mortality plotted on Kaplan-Meier curves. RESULTS: The increased risk of death in COPD patients was due to the effects of variables such as cough, lower respiratory tract infection (LRTI), tuberculosis (TB), and body aches being 1.369, 0.693, 0.170, and 0.217 times higher at (95% confidence interval [CI]: 0.747-1.992), (95% CI: 0.231-1.156), (95% CI: 0.008-0.332), and (95% CI: -0.07 to 0.440) while it decreased 0.396 in normal condition. CONCLUSION: We found that the symptoms of COPD (cough, LRTI, TB, and body aches) are statistically significant in patients who were most infected by SARS-CoV-2.


Subject(s)
COVID-19, Chronic Obstructive Pulmonary Disease, Respiratory Tract Infections, Humans, COVID-19/epidemiology, SARS-CoV-2, Retrospective Studies, Cough, Pakistan/epidemiology, Risk Factors, Chronic Obstructive Pulmonary Disease/epidemiology
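
The Kaplan-Meier curves mentioned in this abstract come from the standard product-limit estimator, which can be sketched in a few lines. This is a generic illustration, not the authors' analysis; the function name and the event coding (1 for an observed death, 0 for a censored observation) are assumptions.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each observed death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        n_at_risk -= removed
    return curve


if __name__ == "__main__":
    # Made-up follow-up data for demonstration only; not from the study.
    print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```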
16.
Article in English | MEDLINE | ID: mdl-34360135

ABSTRACT

Diet management or caloric restriction for diabetes mellitus patients is essential in order to reduce the disease's burden. Mathematical programming problems can help in this regard; they have a central role in optimal diet management and in the nutritional balance of food recipes. The present study employed linear optimization models such as linear, pre-emptive, and non-pre-emptive goal programming problems (LPP, PGP and NPGP) to minimize the deviations of over and under achievements of specific nutrients for optimal selection of food menus with various energy (calories) levels. Sixty-two food recipes are considered, all selected because of being commonly available for the Indian population and developed dietary intake for meal planning through optimization models. The results suggest that a variety of Indian food recipes with low glycemic values can be chosen to assist the varying glucose levels (>200 mg/dL) of Indian diabetes patients.


Subject(s)
Diabetes Mellitus, Menu Planning, Diabetes Mellitus/prevention & control, Diet, Energy Intake, Goals, Humans
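
For orientation, a generic weighted (non-pre-emptive) goal programming model of the kind this abstract describes can be written as below. The symbols are assumptions introduced for illustration, not notation from the paper: x_j is the quantity of recipe j, a_{ij} its content of nutrient i, g_i the target for nutrient i, d_i^- and d_i^+ the under- and over-achievement deviations, and w_i^-, w_i^+ their weights.

```latex
\begin{aligned}
\min_{x,\,d^-,\,d^+}\quad & \sum_{i=1}^{m} \left( w_i^- d_i^- + w_i^+ d_i^+ \right) \\
\text{s.t.}\quad & \sum_{j=1}^{n} a_{ij} x_j + d_i^- - d_i^+ = g_i, \qquad i = 1,\dots,m, \\
& x_j \ge 0, \quad d_i^- \ge 0, \quad d_i^+ \ge 0 .
\end{aligned}
```

In the pre-emptive variant, the deviations are instead minimized in strict priority order, one priority level at a time, rather than through a single weighted sum.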
17.
J Appl Stat ; 48(16): 3002-3024, 2021.
Article in English | MEDLINE | ID: mdl-35707257

ABSTRACT

In this paper, we develop a new general class of skew distributions with flexibility properties on the tails. Moreover, such class can provide heavy and light tails. Some of its mathematical properties are studied, including the quantile function, the moments, the moment generating function and the mean of deviations. New skew distributions are derived and used to construct new models capturing asymmetry inherent to data. The estimation of the class parameters is investigated by the method of maximum likelihood and the performance of the estimators is assessed by a simulation study. Applications of the proposed distribution are explored for two climate data sets. The first data set concerns the annual heat wave index and the second data set involves temperature and precipitation measures from the meteorological station located at Schiphol, Netherlands. Data fitting results show that our models perform better than the competitors.

18.
PLoS One ; 16(5): e0250790, 2021.
Article in English | MEDLINE | ID: mdl-33974643

ABSTRACT

In recent years, the trigonometric families of continuous distributions have found a prominent place in the theory and practice of statistics, with the Sin-G family as leader. In this paper, we contribute to the subject by introducing a flexible extension of the Sin-G family, called the transformed Sin-G family. It is constructed from a new polynomial-trigonometric function presenting a desirable "versatile concave/convex" property, among others. The modelling possibilities of the former Sin-G family are thus multiplied. This potential is also highlighted by a complete theoretical work, showing stochastic ordering results, studying the analytical properties of the main functions, deriving several kinds of moments, and discussing the reliability parameter as well. Then, the applied side of the proposed family is investigated, with numerical results and applications on the related models. In particular, the estimation of the unknown model parameters is performed through the use of the maximum likelihood method. Then, two real-life data sets are analyzed by a new extended Weibull model derived from the considered trigonometric mechanism. We show that it performs the best among seven comparable models, illustrating the importance of the findings.


Subject(s)
Statistics as Topic, Statistical Models
19.
PLoS One ; 16(3): e0249027, 2021.
Article in English | MEDLINE | ID: mdl-33784310

ABSTRACT

The estimation of the entropy of a random system or process is of interest in many scientific applications. The aim of this article is to analyze the entropy of the well-known Kumaraswamy distribution, an aspect which, surprising as it may seem, has not previously received particular attention. With this in mind, six different entropy measures are considered and expressed analytically via the beta function. A numerical study is performed to discuss the behavior of these measures. Subsequently, we investigate their estimation through a semi-parametric approach combining the obtained expressions and the maximum likelihood estimation approach. Maximum likelihood estimates for the considered entropy measures are thus derived. The convergence properties of these estimates are illustrated through simulated data, showing their numerical efficiency. Concrete applications to two real data sets are provided.


Subject(s)
Entropy, Statistics as Topic, Computer Simulation, Floods, Geologic Sediments/chemistry, Likelihood Functions, Computer-Assisted Numerical Analysis
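
As a small numerical check of what an analytical entropy expression looks like for this distribution, the sketch below compares the Shannon entropy of a Kumaraswamy(a, b) density, computed by midpoint integration, with the known closed form H = (1 - 1/b) + (1 - 1/a) H_b - ln(ab), where H_b is the b-th harmonic number. This closed form is used here only for integer b and covers only one entropy measure; whether Shannon entropy is among the paper's six measures, and the paper's beta-function expressions, are not stated in the abstract. The function names are hypothetical.

```python
import math


def kumaraswamy_pdf(x, a, b):
    """Kumaraswamy density on (0, 1): a * b * x**(a-1) * (1 - x**a)**(b-1)."""
    return a * b * x ** (a - 1) * (1.0 - x ** a) ** (b - 1)


def shannon_entropy_closed_form(a, b):
    """Closed-form Shannon entropy, valid here for positive integer b."""
    harmonic_b = sum(1.0 / k for k in range(1, b + 1))
    return (1.0 - 1.0 / b) + (1.0 - 1.0 / a) * harmonic_b - math.log(a * b)


def shannon_entropy_numeric(a, b, n=100_000):
    """Midpoint-rule approximation of -integral of f * ln(f) over (0, 1)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        f = kumaraswamy_pdf((i + 0.5) * h, a, b)
        if f > 0.0:
            total -= f * math.log(f) * h
    return total


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for demonstration.
    print(shannon_entropy_closed_form(2, 3), shannon_entropy_numeric(2, 3))
```

A plug-in estimate in the spirit of the abstract would substitute maximum likelihood estimates of a and b into such an expression; the fitting step itself is not shown here.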
20.
J Appl Stat ; 48(1): 124-137, 2021.
Article in English | MEDLINE | ID: mdl-35707233

ABSTRACT

In this paper, a new two-parameter discrete distribution is introduced. It belongs to the family of the weighted geometric distribution (GD), with the feature of using a particular trigonometric weight. This configuration adds an oscillating property to the former GD which can be helpful in analyzing the data with over-dispersion, as developed in this study. First, we present the basic statistical properties of the new distribution, including the cumulative distribution function, hazard rate function and moment generating function. Estimation of the related model parameters is investigated using the maximum likelihood method. A simulation study is performed to illustrate the convergence of the estimators. Applications to two practical datasets are given to show that the new model performs at least as well as some competitors.
