Results 1 - 20 of 24
1.
Heliyon ; 10(14): e33839, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39092266

ABSTRACT

This article considers domain mean estimation using enhanced direct and synthetic logarithmic-type estimators based on bivariate auxiliary information under simple random sampling (SRS). Expressions for the mean square error (MSE) of the proposed estimators are provided to the first order of approximation. Efficiency conditions are derived to exhibit the dominance of the proposed estimators. To exemplify the theoretical results, a simulation study is conducted on a hypothetical trivariate normal population generated in the R programming language. Some applications of the suggested methods are also provided by analyzing real data from the municipalities of Sweden and the acreage of paddy crop in the Mohanlal Ganj tehsil of the Indian state of Uttar Pradesh. The simulation and real-data findings show that the proposed direct and synthetic logarithmic estimators dominate the conventional direct and synthetic mean, ratio, and logarithmic estimators in terms of least MSE and highest percent relative efficiency.

2.
Front Big Data ; 7: 1382144, 2024.
Article in English | MEDLINE | ID: mdl-39015435

ABSTRACT

Low-rank tensor completion (LRTC), which aims to recover the missing entries of a partially observed tensor by exploiting its low-rank structure, has been widely used in various real-world problems. The core tensor nuclear norm minimization (CTNM) method based on Tucker decomposition is one of the common LRTC methods. However, CTNM methods based on Tucker decomposition often have a large computing cost because the general technique for solving the factor matrices involves multiple singular value decompositions (SVDs) in each loop. To address this problem, this article proposes an effective CTNM method based on thin QR decomposition (CTNM-QR) with lower computing complexity. The proposed method extends CTNM by introducing tensor versions of the auxiliary variables instead of matrices, while using the thin QR decomposition rather than the SVD to solve for the factor matrices, which can reduce the computational complexity and improve the tensor completion accuracy. In addition, the convergence and complexity of the CTNM-QR method are analyzed. Numerous experiments on synthetic data, real color images, and brain MRI data at different missing rates demonstrate that the proposed method not only outperforms most state-of-the-art LRTC methods in terms of completion accuracy and visualization, but also runs more efficiently.
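
As an illustration of the thin QR decomposition mentioned in this abstract (not the CTNM-QR algorithm itself, whose update rules are not given here), the following minimal Python sketch factors a tall matrix A into Q with orthonormal columns and a small upper-triangular R using modified Gram-Schmidt. The function name `thin_qr` and the example matrix are assumptions made for illustration only.

```python
import math


def thin_qr(a):
    """Thin QR of an m x n matrix given as a list of rows.

    Assumes m >= n and full column rank. Returns (q, r) where q is m x n
    with orthonormal columns and r is n x n upper triangular, computed
    with modified Gram-Schmidt.
    """
    m, n = len(a), len(a[0])
    # v[j] holds the (progressively orthogonalized) j-th column of a.
    v = [[a[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = math.sqrt(sum(x * x for x in v[j]))
        qj = [x / r[j][j] for x in v[j]]
        q_cols.append(qj)
        # Remove the q_j component from every remaining column.
        for k in range(j + 1, n):
            r[j][k] = sum(qj[i] * v[k][i] for i in range(m))
            v[k] = [v[k][i] - r[j][k] * qj[i] for i in range(m)]
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r


# Hypothetical 3 x 2 example: q is 3 x 2, r is 2 x 2.
a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
q, r = thin_qr(a)
print(q)
print(r)
```

The factorization is computed in a fixed number of direct steps, whereas an SVD relies on an iterative procedure, which is the kind of saving the abstract attributes to replacing SVDs with thin QR steps.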

3.
Heliyon ; 10(10): e31291, 2024 May 30.
Article in English | MEDLINE | ID: mdl-38826740

ABSTRACT

Improving the estimation of the population mean has long been an area of interest in sampling theory. Many estimators have been suggested for improved estimation of the population mean in stratified random sampling, but there is still room for estimating the population mean more closely. In this paper, the authors propose a ratio-product-cum-exponential-cum-logarithmic type estimator for the enhanced estimation of the population mean, employing one auxiliary variable in stratified random sampling and building on conventional ratio, exponential ratio, and logarithmic ratio type estimators. The suggested estimator is a generalization of ratio, exponential ratio, and logarithmic ratio type estimators, which are therefore special cases of it. The proposed estimator's bias and MSE are determined and compared with those of influential estimators, with a linear cost function used to investigate and compare alternatives. Cramer's rule is used to determine the optimal value of the proposed estimator. According to the theoretical results, the proposed estimator is more effective than other existing estimators. For various applications, we suggest using the proposed estimator with the minimal MSE, which is verified by a numerical example, so that the theoretical conclusions have practical applicability.
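
The abstract does not state the proposed estimator's formula, so the sketch below only illustrates two of the classical building blocks it names: the ratio estimator and the exponential ratio estimator of a mean, combined across strata in the separate (stratum-by-stratum) form. All function names and the stratum figures are hypothetical.

```python
import math


def ratio_estimator(y_bar, x_bar, pop_x_bar):
    """Classical ratio estimator of the mean of y using auxiliary mean of x."""
    return y_bar * pop_x_bar / x_bar


def exp_ratio_estimator(y_bar, x_bar, pop_x_bar):
    """Classical exponential ratio-type estimator of the mean of y."""
    return y_bar * math.exp((pop_x_bar - x_bar) / (pop_x_bar + x_bar))


def stratified(estimator, strata):
    """Separate stratified estimate: sum over strata of W_h * estimate_h.

    strata: iterable of (W_h, sample mean of y, sample mean of x,
    population mean of x) tuples, with the weights W_h summing to 1.
    """
    return sum(w * estimator(yb, xb, pxb) for w, yb, xb, pxb in strata)


# Hypothetical two-stratum example.
strata = [(0.4, 10.0, 4.8, 5.0), (0.6, 20.0, 8.3, 8.0)]
print(stratified(ratio_estimator, strata))
print(stratified(exp_ratio_estimator, strata))
```

Both estimators return the plain sample mean of y when the sample mean of x equals its known population mean, and adjust it up or down otherwise.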

4.
Sci Rep ; 14(1): 10255, 2024 May 04.
Article in English | MEDLINE | ID: mdl-38704410

ABSTRACT

Our study explores neutrosophic statistics, an extension of classical and fuzzy statistics, to address the challenges of data uncertainty. By leveraging accurate measurements of an auxiliary variable, we can derive precise estimates for the unknown population median. The estimators introduced in this research are particularly useful for analysing unclear, vague data or within the neutrosophic realm. Unlike traditional methods that yield single-valued outcomes, our estimators produce ranges, suggesting where the population parameter is likely to be. We present the suggested generalised estimator's bias and mean square error within a first-order approximation framework. The practicality and efficiency of these proposed neutrosophic estimators are demonstrated through real-world data applications and the simulated data set.

5.
Sci Rep ; 14(1): 11086, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38750190

ABSTRACT

Measurement errors cannot be avoided in practice, and it is well known that they degrade the conventional properties of estimators. A modified correlated measurement errors model is proposed, of which the correlated measurement errors model of Shalabh and Tsai (Commun Stat Simul Comput 46(7):5566-5593. 10.1080/03610918.2016.1165845, 2017) is a particular member. In this article, we tackle the estimation of the population mean utilizing auxiliary information under the modified correlated measurement errors model. We develop ratio and product estimators and study their properties under simple random sampling without replacement (SRSWOR) up to the first order of approximation. It is shown that the suggested ratio and product estimators are more efficient than the conventional unbiased estimator as well as the ratio and product estimators of Shalabh and Tsai (Commun Stat Simul Comput 46(7):5566-5593. 10.1080/03610918.2016.1165845, 2017) under very realistic situations. An empirical study has also been performed to demonstrate the merits of the recommended estimators over other estimators.

6.
ISA Trans ; 149: 266-280, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38627161

ABSTRACT

This paper develops two-filter particle smoothing (TFPS) algorithms for the nonlinear fixed-interval smoothing problem of a generalized hidden Markov model (GHMM), in which the current observation depends not only on the current state but also on the state one step earlier. First, using a Bayesian approach, the two-filter smoothing (TFS) formula for the GHMM is established to calculate smoothing densities. In this TFS formula, the backward information prediction density is generally not a density of the state. As a result, the standard sequential Monte Carlo (SMC) sampling technique cannot be directly applied to design TFPS algorithms based on the TFS formula. To overcome this difficulty, a generalized TFS formula for the GHMM is then proposed by introducing a sequence of artificial densities. By combining this generalized TFS formula, SMC, and the auxiliary variable sampling technique, a basic auxiliary TFPS (ATFPS) algorithm with quadratic computational complexity is proposed, and a simplified ATFPS algorithm with linear computational complexity is further developed. Finally, the effectiveness and superiority of the two proposed ATFPS algorithms for the GHMM are verified via simulation examples and real experimental data.

7.
Sci Rep ; 14(1): 8117, 2024 Apr 06.
Article in English | MEDLINE | ID: mdl-38582765

ABSTRACT

This paper offers a novel approach to formulating an efficient ratio estimator of the population variance using a transformed auxiliary variable. The impact of the transformation on the auxiliary information is also discussed. It is observed that incorporating a transformed auxiliary variable results in a high gain in efficiency. Theoretical properties of the newly developed estimators are derived. The empirical and simulation studies show that the suggested estimators outperform the existing estimators.
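
The transformation used in the paper is not described in the abstract. As a point of reference only, the sketch below shows the classical, untransformed ratio-type estimator of a population variance, which rescales the sample variance of y by the ratio of the known population variance of x to its sample variance. The function name and the sample values are hypothetical.

```python
import statistics


def ratio_variance_estimator(y_sample, x_sample, pop_var_x):
    """Classical ratio-type estimator of the population variance of y.

    y_sample, x_sample: paired sample values of the study and auxiliary
    variables. pop_var_x: known population variance of the auxiliary
    variable (assumed available from a census or register).
    """
    s2_y = statistics.variance(y_sample)  # sample variance, divisor n - 1
    s2_x = statistics.variance(x_sample)
    return s2_y * pop_var_x / s2_x


# Hypothetical paired sample and known auxiliary variance.
y = [12.1, 15.4, 9.8, 20.3, 17.6]
x = [3.0, 4.1, 2.5, 5.6, 4.9]
print(ratio_variance_estimator(y, x, pop_var_x=1.8))
```

If the sample happens to understate the spread of x, the estimator scales the sample variance of y upward by the same factor, which is where the efficiency gain comes from when x and y are strongly related.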

8.
Heliyon ; 10(7): e28891, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38601683

ABSTRACT

To estimate the unknown population median, several researchers have developed efficient estimators, but these estimators are unable to provide efficient results in the presence of outliers. With this in view, the present work suggests an enhanced class of robust estimators of the population median under simple random sampling in the case of outliers or extreme observations. The suggested estimators combine bivariate auxiliary information and robust measures with a linear combination of the decile mean, tri-mean, and Hodges-Lehmann estimator. Mathematical properties of the improved class of robust estimators are evaluated in terms of bias and mean squared error. Moreover, the performance of the suggested estimators relative to existing estimators is checked using two real-life data sets with outlier(s). In addition, a simulation study is included. From the theoretical and numerical findings, it is observed that the newly suggested estimators outperform their competitors.

9.
Heliyon ; 10(6): e27546, 2024 Mar 30.
Article in English | MEDLINE | ID: mdl-38524533

ABSTRACT

Asking direct questions about sensitive traits in face-to-face surveys is an intricate issue. One solution is the randomized response technique (RRT). As the most widely used indirect questioning technique for obtaining truthful data on sensitive traits in survey sampling, RRT has been applied in a variety of fields, including behavioral science, socio-economics, psychology, epidemiology, biomedicine, criminology, data masking, public health engineering, conservation studies, ecological studies, and many others. This paper aims at exploring methods to supplement the randomized response technique with additional information relevant to the parameter of interest. Specifically, we propose hybrid estimators that are more efficient than the existing estimator based on the (Kuk, 1990) [31] family of randomized response models. The proposed estimators incorporate pertinent information available from either historical records or expert opinion. When auxiliary information is available, the regression-cum-ratio estimator is found to be the best for further enhancing estimation under the (Kuk, 1990) [31] model, while the (Thompson, 1968) [49] shrinkage estimation is observed to yield a more precise and accurate estimator of the sensitive proportion. The findings of this study signify the importance of the proposed methodology. Additionally, to support the mathematical findings, a detailed numerical investigation is conducted to evaluate comparative performance. The performance analysis provides overwhelming evidence in favor of the proposed strategies.
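
For readers unfamiliar with the Kuk model referred to above, here is a minimal sketch of its basic single-draw form, not of the hybrid estimators proposed in the paper. Each respondent privately draws one card from each of two decks whose known shares of red cards are `theta1` and `theta2`, and reports the colour from deck 1 if they hold the sensitive trait and from deck 2 otherwise. The share of "red" reports then has expectation pi * theta1 + (1 - pi) * theta2, which can be solved for the sensitive proportion pi. Function names, deck settings, and the simulated survey are assumptions.

```python
import random


def kuk_estimate(red_share, theta1, theta2):
    """Moment estimator of the sensitive proportion from the share of 'red' reports."""
    return (red_share - theta2) / (theta1 - theta2)


def simulate_red_share(n, pi, theta1, theta2, rng):
    """Simulate n respondents and return the observed share of 'red' reports."""
    reds = 0
    for _ in range(n):
        has_trait = rng.random() < pi
        theta = theta1 if has_trait else theta2
        reds += rng.random() < theta  # True counts as 1
    return reds / n


# Hypothetical survey: true sensitive proportion 0.3, decks with 80% and 20% red.
rng = random.Random(0)
share = simulate_red_share(5000, 0.3, 0.8, 0.2, rng)
print(kuk_estimate(share, 0.8, 0.2))
```

The interviewer never learns which deck a given answer came from, so individual responses reveal nothing definite, yet the aggregate share still identifies the proportion as long as the two decks differ.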

10.
Heliyon ; 10(5): e27488, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38495208

ABSTRACT

In sampling theory, the majority of the available estimators of population variance are designed for use with non-sensitive variables only. Such estimators cannot perform efficiently when the variable of interest is of a sensitive nature, such as use of drugs, illegal income, abortion, cheating in examinations, the amount of income tax payable, or the violation of rules by employees. In the current literature, the shortage of research on variance estimators for a sensitive variable has left a large research gap and room for improvement in the efficiency of such estimators. In this paper, a new randomized scrambling technique is proposed, along with a new estimator of population variance. The new estimator achieves improvement in efficiency over the available variance estimators. The proposed estimator is designed for use with simple random sampling and uses the information on an auxiliary variable. The improvement in efficiency is shown for different choices of constants. Besides efficiency, improvement in the unified measure of estimator quality is also achieved with the proposed estimator under the new randomized response model.

11.
Heliyon ; 10(6): e26897, 2024 Mar 30.
Article in English | MEDLINE | ID: mdl-38533019

ABSTRACT

In the real world, there are various situations in which not all units can be reached to respond, which is called unit non-response. The effect of unit non-response is a tricky matter when estimating the total number of units. The present work addresses subpopulations (domains) in two cases: (i) the domain totals of the supportive (auxiliary) information are available; (ii) the domain totals of the supportive variable are not available. The government needs to provide appropriate facilities in these small domains. The supportive information is used to estimate the information of the non-respondents and to apply it to the desired domains. Sometimes the available auxiliary variable for the domains is positively skewed, so an appropriate model with positive skewness is developed. The present work focuses on an indirect method using power-based estimation with a calibration approach. By combining power-based estimation and the calibration technique, it is possible to obtain more accurate estimates for the intended small domains, even when the supportive information is positively skewed. This approach helps mitigate the effect of non-response and improves the overall reliability of the estimators. The simulation was conducted for different sizes (70 and 90) with non-response in the study variable. The results show that the investigated power-based estimator is a better option than the relevant exponential, ratio, and generalized regression estimators for the intended domains.

12.
Front Epidemiol ; 3: 1237447, 2023 Sep 15.
Article in English | MEDLINE | ID: mdl-37974561

ABSTRACT

Epidemiological studies often have missing data, which are commonly handled by multiple imputation (MI). In MI, in addition to those required for the substantive analysis, imputation models often include other variables ("auxiliary variables"). Auxiliary variables that predict the partially observed variables can reduce the standard error (SE) of the MI estimator and, if they also predict the probability that data are missing, reduce bias due to data being missing not at random. However, guidance for choosing auxiliary variables is lacking. We examine the consequences of a poorly chosen auxiliary variable: if it shares a common cause with the partially observed variable and the probability that it is missing (i.e., it is a "collider"), its inclusion can induce bias in the MI estimator and may increase the SE. We quantify, both algebraically and by simulation, the magnitude of bias and SE when either the exposure or outcome is incomplete. When the substantive analysis outcome is partially observed, the bias can be substantial, relative to the magnitude of the exposure coefficient. In settings in which a complete records analysis is valid, the bias is smaller when the exposure is partially observed. However, bias can be larger if the outcome also causes missingness in the exposure. When using MI, it is important to examine, through a combination of data exploration and considering plausible causal diagrams and missingness mechanisms, whether potential auxiliary variables are colliders.

13.
Heliyon ; 9(11): e21418, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37885711

ABSTRACT

Extremely large or extremely small values can be found in many data sets, and an estimator can yield misleading results if several of these extreme values are selected into the sample. For such cases, we propose improved estimators of the finite population mean using double sampling based on probability proportional to size (PPS) sampling. The properties of the estimators are obtained up to the first order of approximation. The PPS sampling technique may be employed when the size of the units varies widely. To determine the selection probabilities Pi under PPS, the aggregate of the auxiliary variable Xi must be known. However, the designs and estimation techniques considered so far are less effective when this information is difficult to obtain or when other information is missing. The two-phase approach is preferable and more feasible in these circumstances. To demonstrate how well the recommended estimators perform, we use three real data sets. We show mathematically and theoretically that the suggested estimators outperform alternative estimators.
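
The dependence of PPS on the aggregate of the auxiliary variable can be made concrete with a small sketch. It shows ordinary single-phase PPS sampling with replacement and the standard Hansen-Hurwitz estimator of a total, not the two-phase estimators proposed in the paper. The function names and the four-unit population are hypothetical.

```python
import random


def selection_probs(x):
    """PPS selection probabilities P_i = X_i / sum(X); needs the full aggregate of x."""
    total_x = sum(x)
    return [xi / total_x for xi in x]


def hansen_hurwitz_total(y_sampled, p_sampled):
    """Hansen-Hurwitz estimator of the population total under PPS with replacement."""
    n = len(y_sampled)
    return sum(yi / pi for yi, pi in zip(y_sampled, p_sampled)) / n


# Hypothetical population: x is the size measure, y the study variable.
x = [10, 20, 30, 40]
y = [21, 39, 62, 78]
p = selection_probs(x)

rng = random.Random(1)
idx = rng.choices(range(len(x)), weights=p, k=3)  # draws with replacement
est_total = hansen_hurwitz_total([y[i] for i in idx], [p[i] for i in idx])
print(est_total / len(x))  # estimated population mean
```

When y is exactly proportional to x, every draw gives the same value y_i / P_i and the estimator reproduces the true total; when the total of x is unknown, the probabilities cannot be formed at all, which is the situation the abstract's two-phase design is meant to address.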

14.
J Clin Epidemiol ; 154: 33-41, 2023 02.
Article in English | MEDLINE | ID: mdl-36464232

ABSTRACT

OBJECTIVES: To investigate whether a complete case logistic regression gives a biased estimate of the exposure odds ratio (OR) if missingness depends on a continuous outcome, but a binary version is used for analysis; to examine whether any bias could be reduced by including a misclassified form of the incomplete outcome as an auxiliary variable in multiple imputation (MI). STUDY DESIGN AND SETTING: Analytical investigation, simulation study, and data from a UK cohort. RESULTS: There was bias in the exposure OR when the probability of being a complete case was independently associated with the exposure and (continuous) outcome but this was generally small unless the association with the outcome was strong. Where exposure and (continuous) outcome interacted in their effect on this probability, the bias was large, particularly at high levels of missing data. Inclusion of the auxiliary variable resulted in important bias reductions when this had high sensitivity and specificity. CONCLUSION: The robustness of logistic regression to missing data is not maintained when the outcome is a binary version of an underlying continuous measure, but the bias will be small unless the association between the continuous outcome and missingness is strong.


Subjects
Logistic Models; Humans; Data Interpretation, Statistical; Probability; Bias; Computer Simulation
15.
Ann Epidemiol ; 74: 75-83, 2022 10.
Article in English | MEDLINE | ID: mdl-35940394

ABSTRACT

PURPOSE: To demonstrate improvements in the precision of inverse probability-weighted estimators by use of auxiliary variables, i.e., determinants of the outcome that are independent of treatment, missingness or selection. METHODS: First with simulated data, and then with public data from the National Health and Nutrition Examination Survey (NHANES), we estimated the mean of a continuous outcome using inverse probability weights to account for informative missingness. We assessed gains in precision resulting from the inclusion of auxiliary variables in the model for the weights. We compared the performance of robust and nonparametric bootstrap variance estimators in this setting. RESULTS: We found that the inclusion of auxiliary variables reduced the empirical variance of inverse probability-weighted estimators. However, that reduction was not captured in standard errors computed using the robust variance estimator, which is widely used in weighted analyses due to the non-independence of weighted observations. In contrast, a nonparametric bootstrap estimator properly captured the precision gain. CONCLUSIONS: Epidemiologists can leverage auxiliary data to improve the precision of weighted estimators by using bootstrap variance estimation, or a closed-form variance estimator that properly accounts for the estimation of the weights, in place of the standard robust variance estimator.
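
A minimal sketch of the general idea in this abstract follows: an inverse-probability-weighted mean whose weights are estimated from the data, with a nonparametric bootstrap that re-estimates the weights in every resample. It is not the authors' analysis. The weight model here is a simple assumption (response rates within cells defined by `strata_key`), and the record layout, function names, and simulated data are all hypothetical.

```python
import random
import statistics


def ipw_mean(records, strata_key):
    """Inverse-probability-weighted (Hajek-type) mean of y.

    records: list of dicts with 'observed' (bool) and 'y' (None if missing).
    strata_key: function mapping a record to the cell in which its
    response probability is estimated (a saturated weight model).
    """
    counts, seen = {}, {}
    for r in records:
        k = strata_key(r)
        counts[k] = counts.get(k, 0) + 1
        seen[k] = seen.get(k, 0) + r["observed"]
    num = den = 0.0
    for r in records:
        if r["observed"]:
            k = strata_key(r)
            w = counts[k] / seen[k]  # 1 / estimated response probability
            num += w * r["y"]
            den += w
    return num / den


def bootstrap_se(records, strata_key, reps, rng):
    """Nonparametric bootstrap SE; weights are re-estimated in each resample."""
    estimates = []
    for _ in range(reps):
        resample = rng.choices(records, k=len(records))
        estimates.append(ipw_mean(resample, strata_key))
    return statistics.stdev(estimates)


# Hypothetical data: z drives missingness, aux predicts y only.
rng = random.Random(0)
data = []
for _ in range(500):
    z, aux = rng.random() < 0.5, rng.random() < 0.5
    y = 1.0 * z + 2.0 * aux + rng.gauss(0, 1)
    observed = rng.random() < (0.5 if z else 0.9)
    data.append({"z": z, "aux": aux, "observed": observed,
                 "y": y if observed else None})

for key in (lambda r: r["z"], lambda r: (r["z"], r["aux"])):
    print(ipw_mean(data, key), bootstrap_se(data, key, 200, rng))
```

The second key adds the auxiliary variable to the weight model; comparing the two printed standard errors is one way to explore, on simulated data, the kind of precision gain the abstract describes. The sketch assumes every resample contains at least one observed record.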


Subjects
Models, Statistical; Causality; Computer Simulation; Humans; Nutrition Surveys; Probability
16.
Stat Med ; 40(30): 6777-6791, 2021 12 30.
Article in English | MEDLINE | ID: mdl-34585424

ABSTRACT

Multiple imputation (MI) provides us with efficient estimators in model-based methods for handling missing data under the true model. It is also well-understood that design-based estimators are robust methods that do not require accurately modeling the missing data; however, they can be inefficient. In any applied setting, it is difficult to know whether a missing data model may be good enough to win the bias-efficiency trade-off. Raking of weights is one approach that relies on constructing an auxiliary variable from data observed on the full cohort, which is then used to adjust the weights for the usual Horvitz-Thompson estimator. Computing the optimally efficient raking estimator requires evaluating the expectation of the efficient score given the full cohort data, which is generally infeasible. We demonstrate MI as a practical method to compute a raking estimator that will be optimal. We compare this estimator to common parametric and semi-parametric estimators, including standard MI. We show that while estimators, such as the semi-parametric maximum likelihood and MI estimator, obtain optimal performance under the true model, the proposed raking estimator utilizing MI maintains a better robustness-efficiency trade-off even under mild model misspecification. We also show that the standard raking estimator, without MI, is often competitive with the optimal raking estimator. We demonstrate these properties through several numerical examples and provide a theoretical discussion of conditions for asymptotically superior relative efficiency of the proposed raking estimator.


Subjects
Models, Statistical; Research Design; Bias; Cohort Studies; Data Interpretation, Statistical; Humans
17.
BMC Med Res Methodol ; 21(1): 173, 2021 08 17.
Article in English | MEDLINE | ID: mdl-34404347

ABSTRACT

BACKGROUND: The use of auxiliary variables with maximum likelihood parameter estimation for surveys that miss data by design is not a widespread approach, despite its documented improved efficiency over traditional approaches that deploy sampling weights. Although efficiency gains from the use of Normally distributed auxiliary variables in a model have been recorded in the literature, little is known about the effects of non-Normal auxiliary variables in the parameter estimation. METHODS: We simulate growth data to mimic SCALES, a two-stage survey of language development with a screening phase (stage one) for which data are observed for the whole sample and an intensive assessments phase (stage two), for which data are observed for a sub-sample, selected using stratified random sampling. In the simulation, we allow a fully observed Poisson distributed stratification criterion to be correlated with the partially observed model responses and develop five generalised structural equation growth models that host the auxiliary information from this criterion. We compare these models with each other and with a weighted growth model in terms of bias, efficiency, and coverage. We finally apply our best performing model to SCALES data and show how to obtain growth parameters and population norms. RESULTS: Parameter estimation from a model that incorporates a non-Normal auxiliary variable is unbiased and more efficient than its weighted counterpart. The auxiliary variable method is capable of producing efficient population percentile norms and velocities. CONCLUSIONS: The deployment of a fully observed variable that dominates the selection of the sample and correlates strongly with the incomplete variable of interest appears beneficial for the estimation process.


Subjects
Research Design; Bias; Computer Simulation; Humans; Surveys and Questionnaires
18.
Educ Psychol Meas ; 80(2): 389-398, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32158027

ABSTRACT

A procedure for evaluation of validity related coefficients and their differences is discussed, which is applicable when one or more frequently used assumptions in empirical educational, behavioral and social research are violated. The method is developed within the framework of the latent variable modeling methodology and accomplishes point and interval estimation of convergent and discriminant correlations as well as differences between them in cases of incomplete data sets with data not missing at random, nonnormality, and clustering effects. The procedure uses the full information maximum likelihood approach to model fitting and parameter estimation, does not assume availability of multiple indicators for underlying latent constructs, includes auxiliary variables, and accounts for within-group correlations on main response variables resulting from nesting effects involving studied respondents. The outlined procedure is illustrated on empirical data from a study using tertiary education entrance examination measures.

19.
Springerplus ; 5(1): 723, 2016.
Article in English | MEDLINE | ID: mdl-27375992

ABSTRACT

In this manuscript, we have proposed a difference-type estimator for population mean under two-phase sampling scheme using two auxiliary variables. The properties and the mean square error of the proposed estimator are derived up to first order of approximation; we have also found some efficiency comparison conditions for the proposed estimator in comparison with the other existing estimators under which the proposed estimator performed better than the other relevant existing estimators. We show that the proposed estimator is more efficient than other available estimators under the two phase sampling scheme for this one example; however, further study is needed to establish the superiority of the proposed estimator for other populations.

20.
Springerplus ; 5: 86, 2016.
Article in English | MEDLINE | ID: mdl-26848426

ABSTRACT

In this article, we propose a ratio chain-type exponential estimator for the finite population mean of the study variable under a double sampling scheme using auxiliary variables. The large-sample properties of the suggested strategy are derived up to the first order of approximation, and efficiency conditions are obtained under which the suggested estimator performs better than the other existing estimators discussed in the literature. An empirical study shows that the suggested strategy is more efficient than the other relevant competing estimators under the two-phase sampling scheme.
