ABSTRACT
This study proposes a procedure for substantive dimensionality estimation in the presence of wording effects, the inconsistent response to regular and reversed self-report items. The procedure developed consists of subtracting an approximate estimate of the wording effects variance from the sample correlation matrix and then estimating the substantive dimensionality on the residual correlation matrix. This is achieved by estimating a random intercept factor with unit loadings for all the regular and unrecoded reversed items. The accuracy of the procedure was evaluated through an extensive simulation study that manipulated nine relevant variables and employed the exploratory graph analysis (EGA) and parallel analysis (PA) retention methods. The results indicated that combining the proposed procedure with EGA or PA achieved high accuracy in estimating the substantive latent dimensionality, but that EGA was superior. Additionally, the present findings shed light on the complex ways that wording effects impact the dimensionality estimates when the response bias in the data is ignored. A tutorial on substantive dimensionality estimation with the R package EGAnet is offered, as well as practical guidelines for applied researchers.
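The procedure described above can be sketched in R. This is a minimal illustration, not the authors' code: it assumes a single substantive working factor, placeholder item names x1-x10 and data frame dat, and uses lavaan to estimate the random intercept (RI) factor with unit loadings before residualizing the correlation matrix:

library(lavaan)
library(psych)

items <- paste0("x", 1:10)
model <- paste(
  paste("F  =~", paste(items, collapse = " + ")),
  paste("RI =~", paste(paste0("1*", items), collapse = " + ")),
  "F ~~ 0*RI",
  sep = "\n"
)
fit    <- cfa(model, data = dat)
pe     <- parameterEstimates(fit)
ri_var <- pe$est[pe$lhs == "RI" & pe$op == "~~" & pe$rhs == "RI"]

S     <- cov(dat[, items])
S_res <- S - ri_var              # subtract sigma^2_RI * 11' (the same constant in every cell)
R_res <- cov2cor(S_res)          # residual correlation matrix

# Substantive dimensionality on the residual matrix; parallel analysis shown here,
# EGAnet::EGA(R_res, n = nrow(dat)) would be the EGA counterpart
fa.parallel(R_res, n.obs = nrow(dat), fa = "pc")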
Subject(s)
Psychometrics, Psychometrics/methods, Humans, Factor Analysis, Self-Report, Statistical Models, Computer Simulation, Statistical Data Interpretation
ABSTRACT
The accuracy of factor retention methods for structures with one or more general factors, such as those typically encountered in fields like intelligence, personality, and psychopathology, has often been overlooked in dimensionality research. To address this issue, we compared the performance of several factor retention methods in this context, including a network psychometrics approach developed in this study. For estimating the number of group factors, these methods were the Kaiser criterion, the empirical Kaiser criterion, parallel analysis with principal components (PAPCA) or principal axis, and exploratory graph analysis with Louvain clustering (EGALV). We then estimated the number of general factors using the factor scores of the first-order solution suggested by the best two methods, yielding a "second-order" version of PAPCA (PAPCA-FS) and of EGALV (EGALV-FS). Additionally, we examined the direct multilevel solution provided by EGALV. All the methods were evaluated in an extensive simulation manipulating nine variables of interest, including population error. The results indicated that EGALV and PAPCA displayed the best overall performance in retrieving the true number of group factors, the former being more sensitive to high cross-loadings and the latter to weak group factors and small samples. Regarding the estimation of the number of general factors, both PAPCA-FS and EGALV-FS showed close to perfect accuracy across all the conditions, whereas EGALV was inaccurate. The methods based on EGA were robust to the conditions most likely to be encountered in practice. Therefore, we highlight the particular usefulness of EGALV (group factors) and EGALV-FS (general factors) for assessing bifactor structures with multiple general factors.
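The factor-scores idea behind PAPCA-FS can be sketched with the psych package. This is a rough illustration under assumptions of my own (a placeholder data frame dat, parallel analysis with principal components at both levels), not the authors' exact implementation:

library(psych)

k  <- fa.parallel(dat, fa = "pc", plot = FALSE)$ncomp                    # number of group factors
fo <- fa(dat, nfactors = k, rotate = "oblimin", scores = "regression")   # first-order solution
fa.parallel(fo$scores, fa = "pc", plot = FALSE)$ncomp                    # general factors from the factor scores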
ABSTRACT
The number of available factor analytic techniques has been increasing in recent decades. However, the lack of clear guidelines and of exhaustive comparison studies between the techniques may prevent these valuable methodological advances from making their way into applied research. The present paper evaluates the performance of confirmatory factor analysis (CFA), CFA with sequential model modification using modification indices and the Saris procedure, exploratory factor analysis (EFA) with different rotation procedures (Geomin, target, and objectively refined target matrix), Bayesian structural equation modeling (BSEM), and a new set of procedures that, after fitting an unrestricted model (i.e., EFA, BSEM), identify and retain only the relevant loadings to provide a parsimonious CFA solution (ECFA, BCFA). By means of an exhaustive Monte Carlo simulation study and a real data illustration, it is shown that CFA and BSEM are overly restrictive and, consequently, do not appropriately recover the structure of slightly misspecified models. EFA usually provides the most accurate parameter estimates, although the choice of rotation procedure is of major importance, especially depending on whether the latent factors are correlated or not. Finally, ECFA might be a sound option whenever an a priori structure cannot be hypothesized and the latent factors are correlated. Moreover, it is shown that the pattern of results of a factor analytic technique can be predicted to some extent from its position on the confirmatory-exploratory continuum. Applied recommendations are given for selecting the most appropriate technique under different representative scenarios by means of a detailed flowchart.
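The exploratory-then-confirmatory logic of ECFA can be illustrated as follows. This is only a sketch: the |.30| salience cutoff, the three-factor setting, and the data frame dat are placeholders, and the paper's criterion for identifying relevant loadings may differ:

library(psych)
library(lavaan)

efa  <- fa(dat, nfactors = 3, rotate = "geominQ", fm = "ml")   # unrestricted model
L    <- unclass(efa$loadings)
spec <- sapply(seq_len(ncol(L)), function(j) {
  keep <- rownames(L)[abs(L[, j]) >= .30]                      # retain only salient loadings
  paste0("F", j, " =~ ", paste(keep, collapse = " + "))
})
fit_ecfa <- cfa(paste(spec, collapse = "\n"), data = dat)      # parsimonious CFA solution
summary(fit_ecfa, fit.measures = TRUE)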
ABSTRACT
Exploratory bi-factor analysis (EBFA) is a very popular approach for estimating models where specific factors are concomitant to a single general dimension. However, the models typically encountered in fields like personality, intelligence, and psychopathology involve more than one general factor. To address this circumstance, we developed an algorithm (GSLiD) based on partially specified targets to perform exploratory bi-factor analysis with multiple general factors (EBFA-MGF). In EBFA-MGF, researchers no longer need to conduct independent bi-factor analyses because several bi-factor models are estimated simultaneously in an exploratory manner, guarding against biased estimates and model misspecification errors due to unexpected cross-loadings and factor correlations. The results from an exhaustive Monte Carlo simulation manipulating nine variables of interest suggested that GSLiD outperforms the Schmid-Leiman approximation and is robust to challenging conditions involving cross-loadings and pure items of the general factors. Accordingly, we supply an R package (bifactor) to make EBFA-MGF readily available for substantive research. Finally, we use GSLiD to assess the hierarchical structure of a reduced version of the Personality Inventory for DSM-5 Short Form (PID-5-SF).
Subject(s)
Algorithms, Calcium Channels, Computer Simulation, Factor Analysis, Monte Carlo Method, Psychometrics
ABSTRACT
Multidimensional forced-choice (FC) questionnaires have been consistently found to reduce the effects of socially desirable responding and faking in noncognitive assessments. Although FC has been considered problematic for providing ipsative scores under classical test theory, item response theory (IRT) models enable the estimation of nonipsative scores from FC responses. However, while some authors indicate that blocks composed of opposite-keyed items are necessary to retrieve normative scores, others suggest that these blocks may be less robust to faking, thus impairing the assessment validity. Accordingly, this article presents a simulation study to investigate whether it is possible to retrieve normative scores using only positively keyed items in pairwise FC computerized adaptive testing (CAT). Specifically, the simulation addressed the effect of (a) different bank assembly strategies (a randomly assembled bank, an optimally assembled bank, and blocks assembled on the fly considering every possible pair of items), and (b) block selection rules (i.e., the T-rule and the Bayesian D- and A-rules) on the accuracy of the estimates and on the ipsativity and overlap rates. Moreover, different questionnaire lengths (30 and 60) and trait structures (independent or positively correlated) were studied, and a nonadaptive questionnaire was included as a baseline in each condition. In general, very good trait estimates were retrieved despite using only positively keyed items. Although the best trait accuracy and lowest ipsativity were found using the Bayesian A-rule with questionnaires assembled on the fly, the T-rule under this method led to the worst results. This points to the importance of considering both aspects when designing FC CATs.
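The block selection rules can be expressed compactly. The sketch below is conceptual rather than the study's Thurstonian IRT machinery: block_info is a hypothetical list of per-block Fisher information matrices evaluated at the current trait estimate, info_acc the information accumulated so far, and prior_cov the prior covariance of the traits:

select_block_D <- function(block_info, info_acc, prior_cov) {
  # Bayesian D-rule: pick the block maximizing the determinant of the posterior information
  post_det <- sapply(block_info, function(I_b) det(solve(prior_cov) + info_acc + I_b))
  which.max(post_det)
}
# The Bayesian A-rule would instead minimize the trace of the inverse of the same matrix,
# and the T-rule would maximize the trace of info_acc + I_b without the prior term.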
ABSTRACT
BACKGROUND: The emergence of digital technology in the field of psychological and educational measurement and assessment broadens the traditional concept of pencil-and-paper tests. New assessment models built on the proliferation of smartphones, social networks and software developments are opening up new horizons in the field. METHOD: This study is divided into four sections, each discussing the benefits and limitations of a specific type of technology-based assessment: ambulatory assessment, social networks, gamification and forced-choice testing. RESULTS: The latest developments are clearly relevant in the field of psychological and educational measurement and assessment. Among other benefits, they bring greater ecological validity to the assessment process and eliminate the bias associated with retrospective assessment. CONCLUSIONS: Some of these new approaches point to a multidisciplinary scenario whose tradition has yet to be created. Psychometrics must secure a place in this new world by contributing sound expertise in the measurement of psychological variables. The challenges and debates facing the field of psychology as it incorporates these new approaches are also discussed.
Subject(s)
Digital Technology, Software, Humans, Retrospective Studies, Psychometrics, Educational Measurement
ABSTRACT
Cognitive diagnosis models (CDMs) are used in educational, clinical, or personnel selection settings to classify respondents with respect to discrete attributes, identifying strengths and needs and thus making it possible to provide tailored training or treatment. As in any assessment, accurate reliability estimation is crucial for valid score interpretations. In this sense, most CDM reliability indices are based on the posterior probabilities of the estimated attribute profiles. These posteriors are traditionally computed using point estimates of the model parameters as approximations to their population values. If the uncertainty around these parameters is not accounted for, the posteriors may be overly peaked, resulting in overestimated reliabilities. This article presents a multiple imputation (MI) procedure to integrate out the model parameters in the estimation of the posterior distributions, thus correcting the reliability estimation. A simulation study was conducted to compare the MI procedure with the traditional reliability estimation. Five factors were manipulated: the attribute structure, the CDM (DINA and G-DINA), test length, sample size, and item quality. Additionally, an illustration using data from the Examination for the Certificate of Proficiency in English was analyzed, where the effect of sample size was studied by sampling subsets of subjects from the complete data. In both studies, the traditional reliability estimation systematically provided overestimated reliabilities, whereas the MI procedure offered more accurate results. Accordingly, practitioners in small educational or clinical settings should be aware that reliability estimation using model parameter point estimates may be positively biased. R code for the MI procedure is made available.
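The MI idea can be sketched for the DINA model with base R. This is schematic and illustrative only (hypothetical inputs: a Q-matrix, guessing/slip estimates with their standard errors, and a prior over attribute profiles); it is not the authors' code:

dina_posterior <- function(x, Q, guess, slip, prior) {
  K        <- ncol(Q)
  profiles <- as.matrix(expand.grid(rep(list(0:1), K)))    # all 2^K attribute profiles
  # eta[j, l] = 1 if profile l masters every attribute required by item j
  eta  <- apply(profiles, 1, function(a) apply(Q, 1, function(q) all(a[q == 1] == 1)))
  P    <- (1 - slip) * eta + guess * (1 - eta)              # item success probabilities per profile
  lik  <- apply(P^x * (1 - P)^(1 - x), 2, prod)             # likelihood of response vector x
  post <- prior * lik
  post / sum(post)
}

mi_posterior <- function(x, Q, g_hat, g_se, s_hat, s_se, prior, M = 100) {
  draws <- replicate(M, {
    g <- pmin(pmax(rnorm(length(g_hat), g_hat, g_se), 0.001), 0.999)
    s <- pmin(pmax(rnorm(length(s_hat), s_hat, s_se), 0.001), 0.999)
    dina_posterior(x, Q, g, s, prior)
  })
  rowMeans(draws)   # posterior averaged over the uncertainty in the item parameters
}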
Subject(s)
Awareness, Humans, Reproducibility of Results, Computer Simulation
ABSTRACT
The use of multidimensional forced-choice questionnaires has been proposed as a means of improving validity in the assessment of non-cognitive attributes in high-stakes scenarios. However, the reduced precision of trait estimates in this questionnaire format is an important drawback. Accordingly, this article presents an optimization procedure for assembling pairwise forced-choice questionnaires while maximizing posterior marginal reliabilities. This procedure is performed through the adaptation of a known genetic algorithm (GA) for combinatorial problems. In a simulation study, the efficiency of the proposed procedure was compared with a quasi-brute-force (BF) search. For this purpose, five-dimensional item pools were simulated to emulate the real problem of generating a forced-choice personality questionnaire under the five-factor model. Three factors were manipulated: (1) the length of the questionnaire, (2) the relative item pool size with respect to the questionnaire's length, and (3) the true correlations between traits. The recovery of the person parameters for each assembled questionnaire was evaluated through the squared correlation between estimated and true parameters, the root mean square error between the estimated and true parameters, the average difference between the estimated and true inter-trait correlations, and the average standard error for each trait level. The proposed GA offered more accurate trait estimates than the BF search within a reasonable computation time in every simulation condition. Such improvements were especially important when measuring correlated traits and when the relative item pool sizes were higher. A user-friendly online implementation of the algorithm was made available to the users.
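A bare-bones genetic algorithm for this kind of assembly problem looks roughly as follows. It is illustrative only: the paper's operators and fitness function are more elaborate, the pairing of items into blocks is glossed over, and fitness() is a placeholder standing in for the posterior marginal reliability criterion:

assemble_ga <- function(pool_size, n_items, fitness, pop_size = 50, generations = 200,
                        mutation_rate = 0.05) {
  pop <- replicate(pop_size, sample(pool_size, n_items), simplify = FALSE)
  for (g in seq_len(generations)) {
    fit     <- vapply(pop, fitness, numeric(1))
    parents <- pop[order(fit, decreasing = TRUE)][seq_len(pop_size / 2)]   # selection
    offspring <- replicate(pop_size / 2, {                                 # crossover
      p <- sample(parents, 2)
      sample(unique(c(p[[1]], p[[2]])), n_items)
    }, simplify = FALSE)
    offspring <- lapply(offspring, function(child) {                       # mutation
      if (runif(1) < mutation_rate) {
        candidates <- setdiff(seq_len(pool_size), child)
        child[sample(n_items, 1)] <- candidates[sample.int(length(candidates), 1)]
      }
      child
    })
    pop <- c(parents, offspring)
  }
  pop[[which.max(vapply(pop, fitness, numeric(1)))]]   # best questionnaire found
}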
Subject(s)
Algorithms, Personality, Computer Simulation, Humans, Surveys and Questionnaires
ABSTRACT
As general factor modeling continues to grow in popularity, researchers have become interested in assessing how reliable general factor scores are. Even though omega hierarchical estimation has been suggested as a useful tool in this context, little is known about how to approximate it using modern bi-factor exploratory factor analysis methods. This study is the first to compare how omega hierarchical estimates are recovered by six alternative algorithms: bi-quartimin, bi-geomin, Schmid-Leiman (SL), iterative empirical target rotation based on an initial SL solution (SLiD), direct SL (DSL), and direct bi-factor (DBF). The algorithms were tested in three Monte Carlo simulations including bi-factor and second-order structures and presenting complexities such as cross-loadings or pure indicators of the general factor, as well as structures without a general factor. Results showed that SLiD provided the best approximation to omega hierarchical under most conditions. Neither SL, bi-quartimin, nor bi-geomin produced a satisfactory recovery of omega hierarchical overall. Lastly, the performance of DSL and DBF depended upon the average discrepancy between the loadings of the general and the group factors. The re-analysis of eight classical datasets further illustrated how algorithm selection can influence judgments regarding omega hierarchical.
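For reference, omega hierarchical can be computed directly from a bifactor pattern matrix with the standard formula (general loadings in the first column, orthogonal group-factor loadings in the remaining columns); this helper is a generic illustration, not any of the six algorithms compared above:

omega_h <- function(Lambda) {
  general <- Lambda[, 1]
  groups  <- Lambda[, -1, drop = FALSE]
  uniq    <- 1 - rowSums(Lambda^2)                      # uniquenesses under orthogonal factors
  total   <- sum(general)^2 + sum(colSums(groups)^2) + sum(uniq)
  sum(general)^2 / total                                # share of total-score variance due to the general factor
}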
Subject(s)
Algorithms, Judgment, Factor Analysis, Monte Carlo Method, Rotation
ABSTRACT
BACKGROUND: Unproctored Internet Tests (UIT) are vulnerable to cheating attempts by candidates seeking higher scores. To prevent this, subsequent procedures such as a verification test (VT) are carried out. This study compares five statistics used to detect cheating in Computerized Adaptive Tests (CATs): Guo and Drasgow's Z-test, the Adaptive Measure of Change (AMC), the Likelihood Ratio Test (LRT), the Score Test, and the Modified Signed Likelihood Ratio Test (MSLRT). METHOD: We simulated data from honest and cheating candidates on the UIT and the VT. Honest candidates responded to the UIT and the VT with their real ability level, while cheating candidates responded with their real ability level only on the VT, and different levels of cheating were simulated. We applied the hypothesis tests and obtained Type I error and power rates. RESULTS: Although we found differences in Type I error rates between some of the procedures, all of them reported quite accurate results except the Score Test. The power rates obtained point to the superiority of the MSLRT in detecting cheating. CONCLUSIONS: We consider the MSLRT to be the best test, as it has the highest power rate and a suitable Type I error rate.
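One common form of the Guo and Drasgow Z-test compares the observed number-correct on the VT with its expectation under the ability estimated in the unproctored test. The sketch below assumes a 2PL model and hypothetical VT item parameters:

gd_z <- function(responses, a, b, theta_uit) {
  p <- 1 / (1 + exp(-a * (theta_uit - b)))              # 2PL success probabilities at the UIT ability
  (sum(responses) - sum(p)) / sqrt(sum(p * (1 - p)))
}
# A large negative z (VT performance well below what the inflated UIT ability predicts)
# flags a possible cheating attempt on the unproctored test.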
Subject(s)
Deception, Internet, Likelihood Functions
ABSTRACT
BACKGROUND: The inclusion of direct and reversed items in scales is a commonly used strategy to control acquiescence bias. However, this is not enough to avoid the distortions produced by this response style in the structure of the covariances and means of the scale in question. This simulation study provides evidence on the performance of two different procedures for modelling the influence of acquiescence bias on partially balanced multidimensional scales: a method based on exploratory factor analysis (EFA) with target rotation, and a method based on random intercept factor analysis (RIFA). METHOD: The independent variables analyzed in the simulation study were sample size, number of items per factor, balance of the substantive loadings of direct and reversed items, size and heterogeneity of the acquiescence loadings, and inter-factor correlation. RESULTS: The RIFA method performed better across most conditions, especially the balanced ones, although the variance of the acquiescence factor loadings had a certain impact. The EFA method, in turn, was severely affected by a low degree of balance. CONCLUSIONS: RIFA seems to be the most robust approach, but EFA remains a good alternative for medium and fully balanced scales.
Subject(s)
Research Design, Bias, Computer Simulation, Factor Analysis, Humans
ABSTRACT
BACKGROUND: Due to its flexibility and statistical properties, bi-factor Exploratory Structural Equation Modeling (bi-factor ESEM) has become an often-recommended tool in psychometrics. Unfortunately, most recent methods for approximating these structures, such as the SLiD algorithm, are not available in the leading software for performing ESEM (i.e., Mplus). To resolve this issue, we present a novel, user-friendly Shiny application that integrates the SLiD algorithm into bi-factor ESEM estimation in Mplus. To this end, a two-stage framework for conducting SLiD-based bi-factor ESEM in Mplus was developed. METHOD: The approach is presented in a step-by-step guide for applied researchers, showing the utility of the developed SLiDApp application. Using data from the Open-Source Psychometrics Project (N = 2495), we conducted a bi-factor ESEM exploration of the Generic Conspiracist Beliefs Scale. We studied whether bi-factor modelling was appropriate and whether both general and group factors were related to each personality trait. RESULTS: The application of the SLiD algorithm provided unique information regarding this factor structure and its ESEM structural parameters. CONCLUSIONS: The results illustrate the usefulness and validity of SLiD-based bi-factor ESEM, and how the proposed Shiny app could make it easier for applied researchers to use these methods.
Subject(s)
Factor Analysis, Latent Class Analysis, Humans, Psychometrics
ABSTRACT
One important problem in the measurement of non-cognitive characteristics such as personality traits and attitudes is that they have traditionally been measured with Likert scales, which are susceptible to response biases such as socially desirable (SDR) and acquiescent (ACQ) responding. Given the variability of these response styles in the population, ignoring their possible effects on the scores may compromise the fairness and validity of the assessments. Response-style-induced errors of measurement can also affect reliability estimates and inflate convergent validity, as the scores correlate more highly with other Likert-scale-based measures; conversely, they can attenuate predictive power over non-Likert-based indicators, given that the scores contain more error. This study compares the validity of Big Five personality scores obtained by: (1) ignoring SDR and ACQ in graded-scale items (GSQ), (2) accounting for SDR and ACQ with a compensatory IRT model, and (3) using forced-choice blocks with a multi-unidimensional pairwise preference model (MUPP) variant for dominance items. The overall results suggest that ignoring SDR and ACQ offered the worst validity evidence, with a higher correlation between personality and SDR scores. The two remaining strategies have their own advantages and disadvantages. The results from the empirical reliability and convergent validity analyses indicate that, when modeling social desirability with graded-scale items, the SDR factor apparently captures part of the variance of the Agreeableness factor. On the other hand, the correlation between the corrected GSQ-based Openness to Experience scores and the University Access Examination grades was higher than that with the uncorrected GSQ-based scores, and considerably higher than that using the estimates from the forced-choice data. Conversely, the criterion-related validity of the Forced-Choice Questionnaire (FCQ) scores was similar to the results found in meta-analytic studies, correlating more strongly with Conscientiousness. Nonetheless, the FCQ scores had considerably lower reliabilities and would demand administering more blocks. Finally, the results are discussed and some notes are provided for the treatment of SDR and ACQ in future studies.
ABSTRACT
There has been increased interest in assessing the quality and usefulness of short versions of Raven's Progressive Matrices. A recent proposal, composed of the last twelve matrices of the Standard Progressive Matrices (SPM-LS), has been depicted as a valid measure of g. Nonetheless, the results provided in the initial validation questioned the assumption of essential unidimensionality for SPM-LS scores. We tested this assumption with two different statistical techniques. First, we applied exploratory graph analysis to assess the dimensionality of the SPM-LS. Second, exploratory bi-factor modelling was employed to understand the extent to which potential specific factors represent significant sources of variance after a general factor has been considered. The results showed that, if modelled appropriately, SPM-LS scores are essentially unidimensional and constitute a reliable measure of g. However, an additional specific factor was systematically identified for the last six items of the test. The implications of these findings for future work on the SPM-LS are discussed.
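The two analyses described above can be approximated in R. This is a rough sketch assuming spm is a respondent-by-item scored data frame for the twelve SPM-LS items, with psych::omega standing in for the exploratory bi-factor modelling step:

library(EGAnet)
library(psych)

EGA(spm)                                   # dimensionality via exploratory graph analysis
omega(spm, nfactors = 2, fm = "minres")    # general factor g plus potential specific factors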
ABSTRACT
Multidimensional computerized adaptive testing based on the bifactor model (MCAT-B) can provide efficient assessments of multifaceted constructs. In this study, MCAT-B was compared with a short fixed-length scale and with computerized adaptive testing based on unidimensional (UCAT) and multidimensional (correlated-factors) models (MCAT) to measure the Big Five model of personality. The sample comprised 826 respondents who completed a pool of 360 personality items measuring the Big Five domains and facets. The dimensionality of the Big Five domains was also tested. With only 12 items per domain, the MCAT and MCAT-B procedures were more efficient for assessing highly multidimensional constructs (e.g., Agreeableness), whereas no differences from the UCAT and the short scale were found for traits that were essentially unidimensional (e.g., Extraversion). Furthermore, the study showed that MCAT and MCAT-B provide better content balance of the pool because, for each Big Five domain, items from all the facets are selected in similar proportions.
Subject(s)
Computer-Assisted Diagnosis/methods, Personality Assessment, Personality Disorders/diagnosis, Personality Disorders/psychology, Adolescent, Adult, Female, Humans, Male, Middle Aged, Reproducibility of Results, Students/psychology, Young Adult
ABSTRACT
PURPOSE: Developing valid and reliable instruments that can be used across countries is necessary. The present study aimed to test the comparability of quality of life scores across three European countries (Finland, Poland, and Spain). METHOD: Data from 9987 participants interviewed between 2011 and 2012 were employed, using nationally representative samples from the Collaborative Research on Ageing in Europe project. The WHOQOL-AGE questionnaire, a 13-item instrument, was employed to assess quality of life in the three countries. First, two models (a bifactor model and a two-correlated-factor model) were proposed and tested in each country by means of confirmatory factor analysis. Second, measurement invariance across the three countries was tested using multi-group confirmatory factor analysis for the model that showed the best fit. Finally, differences in latent mean scores across countries were analyzed. RESULTS: The results indicated that the bifactor model showed more satisfactory goodness-of-fit indices than the two-correlated-factor model and that the WHOQOL-AGE questionnaire is a partially scalar invariant instrument (only two items did not meet scalar invariance). Quality of life scores were higher in Finland (considered as the reference category: mean = 0, SD = 1) than in Spain (mean = -0.547, SD = 1.22) and Poland (mean = -0.927, SD = 1.26). CONCLUSIONS: Respondents from Finland, Poland, and Spain attribute the same meaning to the latent construct studied, and differences across countries can be attributed to actual differences in quality of life. According to the results, comparability across the different samples is supported and the WHOQOL-AGE showed adequate validity in terms of cross-country validation. Caution should be exercised with the two items that did not meet scalar invariance, as a potential indicator of differential item functioning.
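The invariance sequence described above maps onto a standard multi-group CFA workflow in lavaan. The bifactor specification below is a placeholder (item names who1-who13 and the 6/7 split between physical and mental facets are assumptions, not the authors' exact model):

library(lavaan)

model <- '
  G    =~ who1 + who2 + who3 + who4 + who5 + who6 + who7 + who8 + who9 + who10 + who11 + who12 + who13
  Phys =~ who1 + who2 + who3 + who4 + who5 + who6
  Ment =~ who7 + who8 + who9 + who10 + who11 + who12 + who13
  G ~~ 0*Phys
  G ~~ 0*Ment
  Phys ~~ 0*Ment
'
configural <- cfa(model, data = dat, group = "country")
metric     <- cfa(model, data = dat, group = "country", group.equal = "loadings")
scalar     <- cfa(model, data = dat, group = "country",
                  group.equal = c("loadings", "intercepts"))
lavTestLRT(configural, metric, scalar)   # compare the nested invariance models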
Subject(s)
Factor Analysis, Psychometrics/methods, Quality of Life/psychology, Female, Humans, Male, Middle Aged, Research Design, Surveys and Questionnaires
ABSTRACT
Contemporary models of personality assume a hierarchical structure in which broader traits contain narrower traits. Individual differences in response styles also constitute a source of score variance. In this study, the bifactor model is applied to separate these sources of variance for personality subscores. The procedure is illustrated using data from two personality inventories: the NEO Personality Inventory-Revised and the Zuckerman-Kuhlman-Aluja Personality Questionnaire. The inclusion of an acquiescence method factor generally improved the fit to acceptable levels for the Zuckerman-Kuhlman-Aluja Personality Questionnaire, but not for the NEO Personality Inventory-Revised. This effect was larger in subscales where the numbers of direct and reversed items were not balanced. Loadings on the specific factors were usually smaller than the loadings on the general factor. In some cases, part of the variance was due to domains other than the main one. This information is of particular interest to researchers, as they can identify which subscale scores have more potential to increase predictive validity.
Subject(s)
Psychological Models, Personality Inventory, Adolescent, Adult, Factor Analysis, Female, Humans, Male, Middle Aged, Personality Disorders/diagnosis, Psychometrics, Young Adult
ABSTRACT
BACKGROUND: The development of an effective instrument to assess the risk of partner violence is a topic of great social relevance. This study evaluates the "Predicción del Riesgo de Violencia Grave Contra la Pareja - Revisada" scale (EPV-R; Severe Intimate Partner Violence Risk Prediction Scale - Revised), a tool developed in Spain, which faces the problem of how to handle the high rate of missing values that is typical of this type of scale. METHOD: First, responses to the EPV-R in a sample of 1215 male abusers who were reported to the police were used to analyze the patterns of occurrence of missing values, as well as the factor structure. Second, we analyzed the performance of various imputation methods using simulated data that emulate the missing data mechanism found in the empirical database. RESULTS: The imputation procedure originally proposed by the authors of the scale provides acceptable results, although the application of a method based on item response theory could provide greater accuracy and offers some additional advantages. CONCLUSIONS: Item response theory appears to be a useful tool for imputing missing data in this type of questionnaire.
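An IRT-based imputation of the kind suggested above could be sketched with the mirt package, assuming the EPV-R items fit a unidimensional graded response model (the exact behavior of these functions may vary across mirt versions, so treat this as an outline):

library(mirt)

mod     <- mirt(epvr, 1, itemtype = "graded")   # calibrate on the observed (incomplete) responses
theta   <- fscores(mod)                         # person estimates computed despite the missing entries
epvr_im <- imputeMissing(mod, theta)            # plausible values drawn for the missing responses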
Subject(s)
Intimate Partner Violence/statistics & numerical data, Humans, Male, Statistical Models, Risk Assessment, Self-Report
ABSTRACT
BACKGROUND: Even though the Five Factor Model (FFM) has been the dominant paradigm in personality research for the past two decades, very few studies have measured the FFM adaptively. Thus, the purpose of this research was to build a new item pool for developing a computerized adaptive test (CAT) for personality assessment. METHOD: A pool of 480 items measuring the FFM facets was developed and administered to 826 participants. Facets were calibrated separately, and item selection was carried out with care to preserve the unidimensionality of each facet. Then, a post-hoc simulation study was conducted to test the performance of separate CATs for measuring the facets. RESULTS: The final item pool was composed of 360 items with good psychometric properties. Findings reveal that a CAT administration of four items per facet (a total length of 120 items) provides accurate facet scores while maintaining the factor structure of the FFM. CONCLUSIONS: An item pool with good psychometric properties was obtained, and the CAT simulation study demonstrated that the FFM facets can be measured with precision using a third of the items in the pool.
Subject(s)
Personality Tests, Adolescent, Adult, Female, Humans, Male, Middle Aged, Young Adult
ABSTRACT
The current study proposes a new bi-factor rotation method, Schmid-Leiman with iterative target rotation (SLi), based on the iteration of partially specified target matrices and an initial target constructed from a Schmid-Leiman (SL) orthogonalization. SLi was expected to ameliorate some of the limitations of the previously presented SL bi-factor rotations, SL and SL with target rotation (SLt), when the factor structure either includes cross-loadings, near-zero loadings, or both. A Monte Carlo simulation was carried out to test the performance of SLi, SL, SLt, and the two analytic bi-factor rotations, bi-quartimin and bi-geomin. The results revealed that SLi accurately recovered the bi-factor structures across the majority of the conditions, and generally outperformed the other rotation methods. SLi provided the biggest improvements over SL and SLt when the bi-factor structures contained cross-loadings and pure indicators of the general factor. Additionally, SLi was superior to bi-quartimin and bi-geomin, which performed inconsistently across the types of factor structures evaluated. No method produced a good recovery of the bi-factor structures when small samples (N = 200) were combined with low factor loadings (0.30-0.50) in the specific factors. Thus, it is recommended that larger samples of at least 500 observations be obtained.
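The SLi idea can be outlined with psych and GPArotation. This is an illustrative sketch under my own assumptions (a correlation matrix R, three group factors, a |.20| cutoff for building the target); the published algorithm's criteria for updating the target may differ:

library(psych)
library(GPArotation)

sl <- schmid(R, nfactors = 3, fm = "minres")            # initial Schmid-Leiman orthogonalization
L  <- unclass(fa(R, nfactors = 4, rotate = "none", fm = "minres")$loadings)

target_from <- function(loads, cut = .20) {
  tg <- matrix(0, nrow(loads), ncol(loads))
  tg[abs(loads) >= cut] <- NA                           # NA marks unspecified (free) elements
  tg
}

tg <- target_from(unclass(sl$sl[, 1:4]))                # target from the SL general + group loadings
for (i in 1:10) {
  rot    <- targetT(L, Target = tg)                     # orthogonal partially specified target rotation
  new_tg <- target_from(rot$loadings)
  if (identical(is.na(new_tg), is.na(tg))) break        # stop when the target pattern stabilizes
  tg <- new_tg
}
rot$loadings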