|

1.

Controlling for Response Biases in Self-Report Scales: Forced-Choice vs. Psychometric Modeling of Likert Items.

Kreitchmann, Rodrigo Schames; Abad, Francisco J; Ponsoda, Vicente; Nieto, Maria Dolores; Morillo, Daniel.

Front Psychol ; 10: 2309, 2019.

Article En | MEDLINE | ID: mdl-31681103

One important problem in the measurement of non-cognitive characteristics such as personality traits and attitudes is that it has traditionally been made through Likert scales, which are susceptible to response biases such as social desirability (SDR) and acquiescent (ACQ) responding. Given the variability of these response styles in the population, ignoring their possible effects on the scores may compromise the fairness and the validity of the assessments. Also, response-style-induced errors of measurement can affect the reliability estimates and overestimate convergent validity by correlating higher with other Likert-scale-based measures. Conversely, it can attenuate the predictive power over non-Likert-based indicators, given that the scores contain more errors. This study compares the validity of the Big Five personality scores obtained: (1) ignoring the SDR and ACQ in graded-scale items (GSQ), (2) accounting for SDR and ACQ with a compensatory IRT model, and (3) using forced-choice blocks with a multi-unidimensional pairwise preference model (MUPP) variant for dominance items. The overall results suggest that ignoring SDR and ACQ offered the worst validity evidence, with a higher correlation between personality and SDR scores. The two remaining strategies have their own advantages and disadvantages. The results from the empirical reliability and the convergent validity analysis indicate that when modeling social desirability with graded-scale items, the SDR factor apparently captures part of the variance of the Agreeableness factor. On the other hand, the correlation between the corrected GSQ-based Openness to Experience scores, and the University Access Examination grades was higher than the one with the uncorrected GSQ-based scores, and considerably higher than that using the estimates from the forced-choice data. Conversely, the criterion-related validity of the Forced Choice Questionnaire (FCQ) scores was similar to the results found in meta-analytic studies, correlating higher with Conscientiousness. Nonetheless, the FCQ-scores had considerably lower reliabilities and would demand administering more blocks. Finally, the results are discussed, and some notes are provided for the treatment of SDR and ACQ in future studies.

2.

The journey from Likert to forced-choice questionnaires: evidence of the invariance of item parameters / El viaje desde los cuestionarios Likert a los cuestionarios de elección forzosa: evidencia de la invarianza de los parámetros de los ítems

Morillo, Daniel; Abad, Francisco J; Kreitchmann, Rodrigo S; Leenen, Iwin; Hontangas, Pedro; Ponsoda, Vicente.

Rev. psicol. trab. organ. (1999) ; 35(2): 75-83, ago. 2019. tab, graf

Article En | IBECS | ID: ibc-184732

Multidimensional forced-choice questionnaires are widely regarded in the personnel selection literature for their ability to control response biases. Recently developed IRT models usually rely on the assumption that item parameters remain invariant when they are paired in forced-choice blocks, without giving it much consideration. This study aims to test this assumption empirically on the MUPP-2PL model, comparing the parameter estimates of the forced-choice format to their graded-scale equivalent on a Big Five personality instrument. The assumption was found to hold reasonably well, especially for the discrimination parameters. In the cases in which it was violated, we briefly discuss the likely factors that may lead to non-invariance. We conclude discussing the practical implications of the results and providing a few guidelines for the design of forced-choice questionnaires based on the invariance assumption

Los cuestionarios de elección forzosa multidimensionales son bastante apreciados en la literatura de selección de personal por su capacidad para controlar los sesgos de respuesta. Los modelos de TRI desarrollados recientemente normalmente asumen que los parámetros de los ítems permanecen invariantes cuando se emparejan en bloques de elección forzosa, sin dedicarle mucha atención. Este estudio tiene como objetivo poner a prueba empíricamente este supuesto en el modelo MUPP-2PL, comparando las estimaciones de los parámetros del formato de elección forzosa con su equivalente en escala graduada, en un instrumento de personalidad Big Five. Se encontró que el supuesto se cumplía razonablemente bien, especialmente para los parámetros de discriminación. En los casos en los que no se cumplió se discuten brevemente los posibles factores que pueden dar lugar a no invarianza. Concluimos discutiendo las implicaciones prácticas de los resultados y proponiendo algunas pautas para el diseño de cuestionarios de elección forzosa basados en el supuesto de invarianza

Humans , Personnel Selection/methods , Professional Competence , Psychometrics/methods , Test Taking Skills/psychology , Job Description , Self Efficacy , Data Interpretation, Statistical , Choice Behavior , Psychological Tests/statistics & numerical data

3.

Cheating on Unproctored Internet Test Applications: An Analysis of a Verification Test in a Real Personnel Selection Context.

Aguado, David; Vidal, Alejandro; Olea, Julio; Ponsoda, Vicente; Barrada, Juan Ramón; Abad, Francisco José.

Span J Psychol ; 21: E62, 2018 Dec 03.

Article En | MEDLINE | ID: mdl-30501646

This study analyses the extent to which cheating occurs in a real selection setting. A two-stage, unproctored and proctored, test administration was considered. Test score inconsistencies were concluded by applying a verification test (Guo and Drasgow Z-test). An initial simulation study showed that the Z-test has adequate Type I error and power rates in the specific selection settings explored. A second study applied the Z-test statistic verification procedure to a sample of 954 employment candidates. Additional external evidence based on item time response to the verification items was gathered. The results revealed a good performance of the Z-test statistic and a relatively low, but non-negligible, number of suspected cheaters that showed higher distorted ability estimates. The study with real data provided additional information on the presence of suspected cheating in unproctored applications and the viability of using item response times as an additional evidence of cheating. In the verification test, suspected cheaters spent 5.78 seconds per item more than expected considering the item difficulty and their assumed ability in the unproctored stage. We found that the percentage of suspected cheaters in the empirical study could be estimated at 13.84%. In summary, the study provides evidence of the usefulness of the Z-test in the detection of cheating in a specific setting, in which a computerized adaptive test for assessing English grammar knowledge was used for personnel selection.

Deception , Educational Measurement/standards , Internet , Personnel Selection/standards , Adult , Female , Humans , Male

4.

A Two-Dimensional Multiple-Choice Model Accounting for Omissions.

Kreitchmann, Rodrigo Schames; Abad, Francisco José; Ponsoda, Vicente.

Front Psychol ; 9: 2540, 2018.

Article En | MEDLINE | ID: mdl-30618961

This paper presents a new two-dimensional Multiple-Choice Model accounting for Omissions (MCMO). Based on Thissen and Steinberg multiple-choice models, the MCMO defines omitted responses as the result of the respondent not knowing the correct answer and deciding to omit rather than to guess given a latent propensity to omit. Firstly, using a Monte Carlo simulation, the accuracy of the parameters estimated from data with different sample sizes (500, 1,000, and 2,000 subjects), test lengths (20, 40, and 80 items) and percentages of omissions (5, 10, and 15%) were investigated. Later, the appropriateness of the MCMO to the Trends in International Mathematics and Science Study (TIMSS) Advanced 2015 mathematics and physics multiple-choice items was analyzed and compared with the Holman and Glas' Between-item Multi-dimensional IRT model (B-MIRT) and with the three-parameter logistic (3PL) model with omissions treated as incorrect responses. The results of the simulation study showed a good recovery of scale and position parameters. Pseudo-guessing parameters (d) were less accurate, but this inaccuracy did not seem to have an important effect on the estimation of abilities. The precision of the propensity to omit strongly depended on the ability values (the higher the ability, the worse the estimate of the propensity to omit). In the empirical study, the empirical reliability for ability estimates was high in both physics and mathematics. As in the simulation study, the estimates of the propensity to omit were less reliable and their precision varied with ability. Regarding the absolute item fit, the MCMO fitted the data better than the other models. Also, the MCMO offered significant increments in convergent validity between scores from multiple-choice and constructed-response items, with an increase of around 0.02 to 0.04 in R 2 in comparison with the two other methods. Finally, the high correlation between the country means of the propensity to omit in mathematics and physics suggests that (1) the propensity to omit is somehow affected by the country of residence of the examinees, and (2) the propensity to omit is independent of the test contents.

5.

Cheating on unproctored internet test applications: an analysis of a verification test in a Real personnel selection context

Aguado, David; Vidal, Alejandro; Olea, Julio; Ponsoda, Vicente; Barrada, Juan Ramón; Abad, Francisco José.

Span. j. psychol ; 21: e62.1-e62.10, 2018. tab, graf

Article En | IBECS | ID: ibc-189177

This study analyses the extent to which cheating occurs in a real selection setting. A two-stage, unproctored and proctored, test administration was considered. Test score inconsistencies were concluded by applying a verification test (Guo and Drasgow Z-test). An initial simulation study showed that the Z-test has adequate Type I error and power rates in the specific selection settings explored. A second study applied the Z-test statistic verification procedure to a sample of 954 employment candidates. Additional external evidence based on item time response to the verification items was gathered. The results revealed a good performance of the Z-test statistic and a relatively low, but non-negligible, number of suspected cheaters that showed higher distorted ability estimates. The study with real data provided additional information on the presence of suspected cheating in unproctored applications and the viability of using item response times as an additional evidence of cheating. In the verification test, suspected cheaters spent 5.78 seconds per item more than expected considering the item difficulty and their assumed ability in the unproctored stage. We found that the percentage of suspected cheaters in the empirical study could be estimated at 13.84%. In summary, the study provides evidence of the usefulness of the Z-test in the detection of cheating in a specific setting, in which a computerized adaptive test for assessing English grammar knowledge was used for personnel selection

No disponible

Humans , Male , Female , Adult , Deception , Educational Measurement/standards , Internet , Personnel Selection/standards

6.

Structural brain connectivity and cognitive ability differences: A multivariate distance matrix regression analysis.

Ponsoda, Vicente; Martínez, Kenia; Pineda-Pardo, José A; Abad, Francisco J; Olea, Julio; Román, Francisco J; Barbey, Aron K; Colom, Roberto.

Hum Brain Mapp ; 38(2): 803-816, 2017 02.

Article En | MEDLINE | ID: mdl-27726264

Neuroimaging research involves analyses of huge amounts of biological data that might or might not be related with cognition. This relationship is usually approached using univariate methods, and, therefore, correction methods are mandatory for reducing false positives. Nevertheless, the probability of false negatives is also increased. Multivariate frameworks have been proposed for helping to alleviate this balance. Here we apply multivariate distance matrix regression for the simultaneous analysis of biological and cognitive data, namely, structural connections among 82 brain regions and several latent factors estimating cognitive performance. We tested whether cognitive differences predict distances among individuals regarding their connectivity pattern. Beginning with 3,321 connections among regions, the 36 edges better predicted by the individuals' cognitive scores were selected. Cognitive scores were related to connectivity distances in both the full (3,321) and reduced (36) connectivity patterns. The selected edges connect regions distributed across the entire brain and the network defined by these edges supports high-order cognitive processes such as (a) (fluid) executive control, (b) (crystallized) recognition, learning, and language processing, and (c) visuospatial processing. This multivariate study suggests that one widespread, but limited number, of regions in the human brain, supports high-level cognitive ability differences. Hum Brain Mapp 38:803-816, 2017. © 2016 Wiley Periodicals, Inc.

Brain/diagnostic imaging , Brain/physiology , Cognition/physiology , Multivariate Analysis , Regression Analysis , Adolescent , Brain Mapping , Female , Humans , Imaging, Three-Dimensional , Magnetic Resonance Imaging , Male , Neural Pathways/diagnostic imaging , Neural Pathways/physiology , Neuropsychological Tests , Reproducibility of Results , Young Adult

7.

Revisión del modelo para evaluar la calidad de los tests utilizados en España / Assessing the quality of tests in Spain: revision of the Spanish test review model

Hernández, Ana; Ponsoda, Vicente; Muñiz, José; Prieto, Gerardo; Elosua, Paula.

Pap. psicol ; 37(3): 192-197, sept.-dic. 2016. tab

Article Es | IBECS | ID: ibc-157861

Para usar adecuadamente los tests, es necesario que los profesionales cuenten con información rigurosa de su calidad. Es por ello que, desde hace unos años, se viene aplicando el modelo español de evaluación de la calidad de los tests (Prieto y Muñiz, 2000). El objetivo de este trabajo es actualizar y revisar dicho modelo, con el fin de incorporar las recomendaciones hechas en sus aplicaciones, y para incorporar los avances psicométricos y tecnológicos que se han producido durante los últimos años. El modelo original fue revisado en varias fases, y la revisión originalmente propuesta fue revisada por un conjunto de expertos, lo que dio lugar a la versión final que se describe en este trabajo. Se espera que la aplicación del modelo revisado y la publicación de los resultados correspondientes, contribuya a seguir mejorando el uso de los tests y, con ello, la práctica profesional de la Psicología

For professionals to use tests adequately, they must have rigorous information about the quality of the tests. This is why during the last years, the Spanish test review model (Prieto & Muñiz, 2000) has been applied. The goal of this paper is to update and revise this model in order to incorporate the recommendations given after the original model was applied and, to incorporate the latest psychometric and technological innovations. The original model was revised following different steps, and the revised proposal was reviewed by a number of experts. After incorporating their suggestions, we arrived to the final version, which is described in this paper. With the application of the revised model, and the publication of the corresponding results, we expect to continue improving the use of tests, and consequently, the professional practice of Psychology

Humans , Psychological Tests/statistics & numerical data , Psychometrics/instrumentation , Psychological Techniques/instrumentation , Reproducibility of Results , /methods

8.

A multistage addaptive test of fluid intelligence / Un test adaptativo multietapa de inteligencia fluida

Martín Fernández, Manuel; Ponsoda, Vicente; Olea, Julio; Shih, Pei Chun; Revuelta, Javier.

Psicothema (Oviedo) ; 28(3): 346-352, ago. 2016. tab, ilus, graf

Article En | IBECS | ID: ibc-154633

BACKGROUND: Multistage adaptive testing has recently emerged as an alternative to the computerized adaptive test. The current study details a new multistage test to assess fluid intelligence. METHOD: An item pool of progressive matrices with constructed response format was developed, and divided into six subtests. The subtests were applied to a sample of 724 college students and their psychometric properties were studied (i.e., reliability, dimensionality and validity evidence). The item pool was calibrated under the graded response model, and two multistage structures were developed, based on the automatic test assembly principles. Finally, the test information provided by each structure was compared in order to select the most appropriate one. RESULTS: The item pool showed adequate psychometric properties. From the two compared multistage structures, the simplest structure (i.e., routing test and two modules in the next stages) were more informative across the latent trait continuum and were therefore kept. DISCUSSION: Taken together, the results of the two studies support the application of the FIMT (Fluid Intelligence Multistage Test), a multistage test to assess fluid intelligence accurately and innovatively

ANTECEDENTES: los test adaptativos multietapa han emergido recientemente como una alternativa a los test adaptativos informatizados. Se presenta en este estudio un test multietapa para evaluar la inteligencia fluida. MÉTODO: se desarrolló un banco de ítems de matrices progresivas con formato de respuesta construida que posteriormente fue dividido en seis subtests. Los ítems se administraron a un total de 724 estudiantes universitarios. Se estudiaron las propiedades psicométricas de los subtests (fiabilidad, dimensionalidad, evidencias de validez) y se calibró el banco con el modelo de respuesta graduada. Se construyeron después dos estructuras multietapa a través del ensamblaje automático de tests y se comparó la información proporcionada por cada una de ellas. RESULTADOS: los ítems mostraron unas propiedades psicométricas adecuadas. De las dos estructuras puestas a prueba, se conservó finalmente la estructura sencilla, pues resultó más informativa. DISCUSIÓN: los resultados de estos dos estudios avalan el empleo del FIMT, una herramienta que emplea este formato para evaluar de forma innovadora y precisa la inteligencia fluida

Humans , Intelligence , Intelligence Tests , Psychometrics/instrumentation , Psychological Tests , Mental Processes , Reproducibility of Results , Reproducibility of Results

9.

A multistage adaptive test of fluid intelligence.

Martín-Fernández, Manuel; Ponsoda, Vicente; Olea, Julio; Shih, Pei-Chun; Revuelta, Javier.

Psicothema ; 28(3): 346-52, 2016 Aug.

Article En | MEDLINE | ID: mdl-27448271

BACKGROUND: Multistage adaptive testing has recently emerged as an alternative to the computerized adaptive test. The current study details a new multistage test to assess fluid intelligence. METHOD: An item pool of progressive matrices with constructed response format was developed, and divided into six subtests. The subtests were applied to a sample of 724 college students and their psychometric properties were studied (i.e., reliability, dimensionality and validity evidence). The item pool was calibrated under the graded response model, and two multistage structures were developed, based on the automatic test assembly principles. Finally, the test information provided by each structure was compared in order to select the most appropriate one. RESULTS: The item pool showed adequate psychometric properties. From the two compared multistage structures, the simplest structure (i.e., routing test and two modules in the next stages) were more informative across the latent trait continuum and were therefore kept. DISCUSSION: Taken together, the results of the two studies support the application of the FIMT (Fluid Intelligence Multistage Test), a multistage test to assess fluid intelligence accurately and innovatively.

Intelligence Tests , Adolescent , Adult , Female , Humans , Male , Psychometrics , Young Adult

10.

Traditional scores versus irt estimates on forced-choice tests based on a dominance model / Puntuaciones tradicionales y estimaciones TRI en tests de elección forzosa con un modelo de dominancia

Hontangas, Pedro M; Leenen, Iwing; Torre, Jimmy de la; Ponsoda, Vicente; Morillo, Daniel; Abad, Francisco J.

Psicothema (Oviedo) ; 28(1): 76-82, feb. 2016. tab, graf

Article En | IBECS | ID: ibc-148821

BACKGROUND: Forced-choice tests (FCTs) were proposed to minimize response biases associated with Likert format items. It remains unclear whether scores based on traditional methods for scoring FCTs are appropriate for between-subjects comparisons. Recently, Hontangas et al. (2015) explored the extent to which traditional scoring of FCTs relates to the true scores and IRT estimates. The authors found certain conditions under which traditional scores (TS) can be used with FCTs when the underlying IRT model was an unfolding model. In this study, we examine to what extent the results are preserved when the underlying process becomes a dominance model. METHOD: The independent variables analyzed in a simulation study are: forced-choice format, number of blocks, discrimination of items, polarity of items, variability of intra-block difficulty, range of difficulty, and correlation between dimensions. RESULTS: A similar pattern of results was observed for both models; however, correlations between TS and true thetas are higher and the differences between TS and IRT estimates are less discrepant when a dominance model involved. CONCLUSIONS: A dominance model produces a linear relationship between TS and true scores, and the subjects with extreme thetas are better measured

ANTECEDENTES: los tests de elección forzosa (TEFs) fueron propuestos para reducir los sesgos de respuesta de ítems tipo Likert. Se cuestiona que los métodos de puntuación tradicional (PT) empleados permitan hacer comparaciones entre-sujetos. Recientemente, Hontangas et al. (2015) exploraron cómo las PTs obtenidas con diferentes TEFs se relacionan con sus puntuaciones verdaderas y estimaciones TRI, mostrando las condiciones para ser utilizadas cuando el modelo subyacente es un modelo de unfolding. El objetivo del trabajo actual es comprobar si el patrón de resultados se mantiene con un modelo de dominancia. MÉTODO: las variables independientes del estudio de simulación fueron: formato de elección forzosa, número de bloques, discriminación de los ítems, polaridad de los ítems, variabilidad de la dificultad intrabloque, rango de dificultad del test y correlación entre dimensiones. RESULTADOS: un patrón similar de resultados fue obtenido en ambos modelos, pero en el modelo de dominancia las correlaciones entre PTs y puntuaciones verdaderas son más altas y las diferencias entre PTs y estimaciones TRI se reducen. CONCLUSIONES: un modelo de dominancia produce una relación lineal entre PTs y puntuaciones verdaderas, y los sujetos con puntuaciones extremas son medidos mejor

Humans , Male , Female , Models, Psychological , 28574/classification , 28574/methods , Psychometrics/methods , Psychometrics/trends , Multivariate Analysis , Psychology, Industrial/organization & administration , Psychology, Industrial/standards

11.

Traditional scores versus IRT estimates on forced-choice tests based on a dominance model.

Hontangas, Pedro M; Leenen, Iwin; de la Torre, Jimmy; Ponsoda, Vicente; Morillo, Daniel; Abad, Francisco J.

Psicothema ; 28(1): 76-82, 2016.

Article En | MEDLINE | ID: mdl-26820428

BACKGROUND: Forced-choice tests (FCTs) were proposed to minimize response biases associated with Likert format items. It remains unclear whether scores based on traditional methods for scoring FCTs are appropriate for between-subjects comparisons. Recently, Hontangas et al. (2015) explored the extent to which traditional scoring of FCTs relates to the true scores and IRT estimates. The authors found certain conditions under which traditional scores (TS) can be used with FCTs when the underlying IRT model was an unfolding model. In this study, we examine to what extent the results are preserved when the underlying process becomes a dominance model. METHOD: The independent variables analyzed in a simulation study are: forced-choice format, number of blocks, discrimination of items, polarity of items, variability of intra-block difficulty, range of difficulty, and correlation between dimensions. RESULTS: A similar pattern of results was observed for both models; however, correlations between TS and true thetas are higher and the differences between TS and IRT estimates are less discrepant when a dominance model involved. CONCLUSIONS: A dominance model produces a linear relationship between TS and true scores, and the subjects with extreme thetas are better measured.

Choice Behavior , Models, Psychological , Humans , Psychometrics , Surveys and Questionnaires

12.

Are fit indices really fit to estimate the number of factors with categorical variables? Some cautionary findings via Monte Carlo simulation.

Garrido, Luis Eduardo; Abad, Francisco José; Ponsoda, Vicente.

Psychol Methods ; 21(1): 93-111, 2016 Mar.

Article En | MEDLINE | ID: mdl-26651983

An early step in the process of construct validation consists of establishing the fit of an unrestricted "exploratory" factorial model for a prespecified number of common factors. For this initial unrestricted model, researchers have often recommended and used fit indices to estimate the number of factors to retain. Despite the logical appeal of this approach, little is known about the actual accuracy of fit indices in the estimation of data dimensionality. The present study aimed to reduce this gap by systematically evaluating the performance of 4 commonly used fit indices-the comparative fit index (CFI), the Tucker-Lewis index (TLI), the root mean square error of approximation (RMSEA), and the standardized root mean square residual (SRMR)-in the estimation of the number of factors with categorical variables, and comparing it with what is arguably the current golden rule, Horn's (1965) parallel analysis. The results indicate that the CFI and TLI provide nearly identical estimations and are the most accurate fit indices, followed at a step below by the RMSEA, and then by the SRMR, which gives notably poor dimensionality estimates. Difficulties in establishing optimal cutoff values for the fit indices and the general superiority of parallel analysis, however, suggest that applied researchers are better served by complementing their theoretical considerations regarding dimensionality with the estimates provided by the latter method.

Data Interpretation, Statistical , Models, Statistical , Monte Carlo Method , Humans

13.

Expanding the occupational health methodology: A concatenated artificial neural network approach to model the burnout process in Chinese nurses.

Ladstätter, Felix; Garrosa, Eva; Moreno-Jiménez, Bernardo; Ponsoda, Vicente; Reales Aviles, José Manuel; Dai, Junming.

Ergonomics ; 59(2): 207-21, 2016.

Article En | MEDLINE | ID: mdl-26230967

Artificial neural networks are sophisticated modelling and prediction tools capable of extracting complex, non-linear relationships between predictor (input) and predicted (output) variables. This study explores this capacity by modelling non-linearities in the hardiness-modulated burnout process with a neural network. Specifically, two multi-layer feed-forward artificial neural networks are concatenated in an attempt to model the composite non-linear burnout process. Sensitivity analysis, a Monte Carlo-based global simulation technique, is then utilised to examine the first-order effects of the predictor variables on the burnout sub-dimensions and consequences. Results show that (1) this concatenated artificial neural network approach is feasible to model the burnout process, (2) sensitivity analysis is a prolific method to study the relative importance of predictor variables and (3) the relationships among variables involved in the development of burnout and its consequences are to different degrees non-linear. PRACTITIONER SUMMARY: Many relationships among variables (e.g., stressors and strains) are not linear, yet researchers use linear methods such as Pearson correlation or linear regression to analyse these relationships. Artificial neural network analysis is an innovative method to analyse non-linear relationships and in combination with sensitivity analysis superior to linear methods.

Burnout, Professional/psychology , Models, Theoretical , Neural Networks, Computer , Nurses/psychology , Occupational Medicine/methods , Adult , China , Female , Humans , Male , Middle Aged , Monte Carlo Method

14.

A Dominance Variant Under the Multi-Unidimensional Pairwise-Preference Framework: Model Formulation and Markov Chain Monte Carlo Estimation.

Morillo, Daniel; Leenen, Iwin; Abad, Francisco J; Hontangas, Pedro; de la Torre, Jimmy; Ponsoda, Vicente.

Appl Psychol Meas ; 40(7): 500-516, 2016 Oct.

Article En | MEDLINE | ID: mdl-29881066

Forced-choice questionnaires have been proposed as a way to control some response biases associated with traditional questionnaire formats (e.g., Likert-type scales). Whereas classical scoring methods have issues of ipsativity, item response theory (IRT) methods have been claimed to accurately account for the latent trait structure of these instruments. In this article, the authors propose the multi-unidimensional pairwise preference two-parameter logistic (MUPP-2PL) model, a variant within Stark, Chernyshenko, and Drasgow's MUPP framework for items that are assumed to fit a dominance model. They also introduce a Markov Chain Monte Carlo (MCMC) procedure for estimating the model's parameters. The authors present the results of a simulation study, which shows appropriate goodness of recovery in all studied conditions. A comparison of the newly proposed model with a Brown and Maydeu's Thurstonian IRT model led us to the conclusion that both models are theoretically very similar and that the Bayesian estimation procedure of the MUPP-2PL may provide a slightly better recovery of the latent space correlations and a more reliable assessment of the latent trait estimation errors. An application of the model to a real data set shows convergence between the two estimation procedures. However, there is also evidence that the MCMC may be advantageous regarding the item parameters and the latent trait correlations.

15.

Nuevas directrices sobre el uso de los tests: investigación, control de calidad y seguridad / New guidelines for testuse: research, quality control and securtiry of test

Muñiz, José; Hernández, Ana; Ponsoda, Vicente.

Pap. psicol ; 36(3): 161-173, sept.-dic. 2015.

Article Es | IBECS | ID: ibc-144814

Antecedentes: Para llevar a cabo una evaluación psicológica rigurosa es necesario que los profesionales que la realizan tengan una preparación adecuada, que los tests utilizados muestren unas buenas propiedades psicométricas, y que se utilicen de forma correcta. El objetivo de este trabajo es presentar las directrices recientes de la Comisión Internacional de Tests sobre el uso de los tests en tres ámbitos: investigación, control de calidad y seguridad en el manejo de las pruebas. Método: Se revisarán y comentarán los directrices recientes desarrolladas por la Comisión Internacional de Tests. Resultados: Las nuevas directrices sobre el uso de los tests ofrecen todo un conjunto de recomendaciones teórico-prácticas para guiar la utilización adecuada de los tests en contextos de investigación, para desarrollar e implementar procesos de control de calidad efectivos, y para salvaguardar la seguridad de todos los datos implicados en un proceso evaluativo. Conclusiones: Las nuevas directrices desarrolladas por la Comisión Internacional de Tests contribuirán a una adecuada utilización de los tests en contextos de investigación, a una mejora en los procesos de control de calidad de las pruebas, y a garantizar la seguridad en los procesos evaluativos

Background: In order to carry out a rigorous psychological evaluation three conditions must be met: the practitioners must have the appropriate qualifications, the tests must show good psychometric properties, and they must be used correctly. The aim of this paper is to present the recent guidelines developed by the International Test Commission on the use of tests in three areas: research, quality control, and security. Method: The guidelines developed by the International Test Commission will be analysed and discussed. Results: The new guidelines on the use of tests offer a whole range of theoretical and practical recommendations to guide the appropriate use of tests in research settings, in order to develop and implement effective quality control strategies, and to preserve the security of all of the data involved in the assessment process. Conclusions: The new guidelines developed by the International Test Commission will contribute to the correct use of tests in research settings, to an improvement in the quality control of testing, and to ensuring security in the assessment processes

Humans , Psychological Tests/standards , Psychometrics/instrumentation , Patient Safety , Quality Control , Reproducibility of Results , Reproducibility of Results

16.

Comparing Traditional and IRT Scoring of Forced-Choice Tests.

Hontangas, Pedro M; de la Torre, Jimmy; Ponsoda, Vicente; Leenen, Iwin; Morillo, Daniel; Abad, Francisco J.

Appl Psychol Meas ; 39(8): 598-612, 2015 Nov.

Article En | MEDLINE | ID: mdl-29881030

This article explores how traditional scores obtained from different forced-choice (FC) formats relate to their true scores and item response theory (IRT) estimates. Three FC formats are considered from a block of items, and respondents are asked to (a) pick the item that describes them most (PICK), (b) choose the two items that describe them the most and the least (MOLE), or (c) rank all the items in the order of their descriptiveness of the respondents (RANK). The multi-unidimensional pairwise-preference (MUPP) model, which is extended to more than two items per block and different FC formats, is applied to obtain the responses to each item block. Traditional and IRT (i.e., expected a posteriori) scores are computed from each data set and compared. The aim is to clarify the conditions under which simpler traditional scoring procedures for FC formats may be used in place of the more appropriate IRT estimates for the purpose of inter-individual comparisons. Six independent variables are considered: response format, number of items per block, correlation between the dimensions, item discrimination level, and sign-heterogeneity and variability of item difficulty parameters. Results show that the RANK response format outperforms the other formats for both the IRT estimates and traditional scores, although it is only slightly better than the MOLE format. The highest correlations between true and traditional scores are found when the test has a large number of blocks, dimensions assessed are independent, items have high discrimination and highly dispersed location parameters, and the test contains blocks formed by positive and negative items.

17.

The impact of ambiguous response categories on the factor structure of the GHQ-12.

Rey, Juan J; Abad, Francisco J; Barrada, Juan R; Garrido, Luis E; Ponsoda, Vicente.

Psychol Assess ; 26(3): 1021-30, 2014 Sep.

Article En | MEDLINE | ID: mdl-24708083

Previous research has suggested multiple factor structures for the 12-item General Health Questionnaire (GHQ-12), with contradictory evidence arising across different studies on the validity of these models. In the present research, it was hypothesized that these inconsistent findings were due to the interaction of 3 main methodological factors: ambiguous response categories in the negative items, multiple scoring schemes, and inappropriate estimation methods. Using confirmatory factor analysis with appropriate estimation methods and scores obtained from a large (n = 27,674) representative Spanish sample, we tested this hypothesis by evaluating the fit and predictive validities of 4 GHQ-12 factor models-unidimensional, Hankins' (2008a) response bias model, Andrich and Van Schoubroeck's (1989) 2-factor model, and Graetz's (1991) 3-factor model-across 3 scoring methods: standard, corrected, and Likert. In addition, the impact of method effects on the reliability of the global GHQ-12 scores was also evaluated. The combined results of this study support the view that the GHQ-12 is a unidimensional measure that contains spurious multidimensionality under certain scoring schemes (corrected and Likert) as a result of ambiguous response categories in the negative items. Therefore, it is suggested that the items be scored using the standard method and that only a global score be derived from the instrument.

Mental Disorders/diagnosis , Research Design , Adolescent , Adult , Aged , Aged, 80 and over , Factor Analysis, Statistical , Female , Humans , Male , Mental Disorders/psychology , Middle Aged , Reproducibility of Results , Surveys and Questionnaires , Young Adult

18.

Detection of Q-matrix misspecification using two criteria for validation of cognitive structures under the Least Squares Distance Model / Detección de errores de especificación en la matriz Q utilizando dos criterios de validación de estructuras cognitivas con el Modelo de las Distancias Mínimo Cuadráticas (LSDM)

Romero, Sonia J; Ordoñez, Xavier G; Ponsoda, Vicente; Revuelta, Javier.

Psicológica (Valencia, Ed. impr.) ; 35(1): 149-169, 2014. tab, ilus

Article En | IBECS | ID: ibc-118513

Cognitive Diagnostic Models (CDMs) aim to provide information about the degree to which individuals have mastered specific attributes that underlie the success of these individuals on test items. The Q-matrix is a key element in the application of CDMs, because contains links item-attributes representing the cognitive structure proposed for solve the test. Using a simulation study we investigated the performance of two model-fit statistics (MAD and LSD) to detect misspecifications in the Q-matrix within the least squares distance modeling framework. The manipulated test design factors included the number of respondents (300, 500, 1000), attributes (1, 2, 3, 4), and type of model (conjunctive vs disjunctive). We investigated MAD and LSD behavior under correct Q-matrix specification, with Qmisspecifications and in a real data application. The results shows that the two model-fit indexes were sensitive to Q-misspecifications, consequently, cut points were proposed to use in applied context (AU)

Los Modelos de Diagnóstico Cognitivo (MDC) tienen por objeto proporcionar información sobre el grado en que los individuos dominan atributos específicos para resolver correctamente los items de un test. La matriz Q es un elemento clave en la aplicación de los MDC porque contiene vínculos entre items y atributos que representan la estructura cognitiva propuesta para resolver la prueba. Por medio de un estudio de simulación, se determinó el rendimiento de dos estadísticos de ajuste (MAD y LSD) para detectar errores de especificación en la matriz Q dentro del marco del modelo de la distancia mínimo cuadrática. Los factores manipulados en el diseño del test incluyen: número de encuestados (300, 500, 1000), número de atributos (1, 2, 3, 4), y el tipo de modelo (conjuntivo vs disyuntivo). Se investigó el comportamiento de los valores MAD y LSD bajo una correcta especificación de Q, con errores de especificación en Q y en una aplicación de datos reales. Los resultados muestran que los dos índices son sensibles a los errores de especificación de Q, por este motivo se proponen puntos de corte para usar en aplicaciones del Modelo (AU)

Humans , Male , Female , Validation Studies as Topic , Psychological Tests/standards , Cognitive Behavioral Therapy/methods , Cognitive Science/instrumentation , Cognitive Science/methods , Cognitive Science/organization & administration , Cognitive Reserve

19.

Segunda evaluación de tests editados en España / No disponible

Ponsoda, Vicente; Hontangas, Pedro.

Pap. psicol ; 34(2): 82-90, mayo 2013. tab

Article Es | IBECS | ID: ibc-113743

El artículo describe los resultados de la segunda evaluación de tests psicológicos editados en España. La Comisión de Tests del Colegio Oficial de Psicólogos decidió que se evaluasen 12 tests, seleccionados principalmente por su novedad y amplio uso. Cada test ha sido evaluado por dos expertos. Al igual que en la primera evaluación (Muñiz, Fernández-Hermida, Fonseca-Pedrero, Campillo-Álvarez y Peña-Suárez, 2011), los evaluadores hacían su trabajo respondiendo a las preguntas del Cuestionario para la Evaluación de los Tests (Prieto y Muñiz, 2000), que adapta al contexto español el modelo elaborado por la Federación Europea de Asociaciones de Psicólogos Profesionales. De cada test se ha evaluado la calidad de los materiales y documentación, la fiabilidad de sus puntuaciones, la cobertura de los estudios de validación, la calidad de los baremos, etc. Los revisores informaron también acerca de la idoneidad del instrumento y proceso seguido en la evaluación. Se aportan sugerencias que pudieran ser útiles para mejorar las evaluaciones futuras (AU)

This article describes the results of the second evaluation of psychological tests published in Spain. The Committee on Testing of the Spanish Psychological Association agreed on assessing 12 tests, selected mainly for their novelty and wide use. Each test has been evaluated by two experts. As in the first evaluation (Muñiz, Fernández-Hermida, Fonseca-Pedrero, Campillo-Álvarez and Peña-Suárez,2011), assessments were made by responding to the Questionnaire for the Assessment of Tests (Prieto and Muñiz, 2000), which adapts to the Spanish context the assessment model developed by the European Federation of Psychologists Associations. Results are provided in both absolute and relative terms, as they are compared to those of the first evaluation. They refer to the quality of documentation and materials, the coverage of the validation studies, reliability, norms, etc. Reviewers were also asked about the suitability of the instrument and procedure used to conduct the assessment. Suggestions are provided that may be useful to improve next test evaluations (AU)

Humans , Psychological Tests , Sensitivity and Specificity , Psychometrics/instrumentation , Mental Disorders/diagnosis

20.

A new look at Horn's parallel analysis with ordinal variables.

Garrido, Luis Eduardo; Abad, Francisco José; Ponsoda, Vicente.

Psychol Methods ; 18(4): 454-74, 2013 Dec.

Article En | MEDLINE | ID: mdl-23046000

Previous research evaluating the performance of Horn's parallel analysis (PA) factor retention method with ordinal variables has produced unexpected findings. Specifically, PA with Pearson correlations has performed as well as or better than PA with the more theoretically appropriate polychoric correlations. Seeking to clarify these findings, the current study employed a more comprehensive simulation study that included the systematic manipulation of 7 factors related to the data (sample size, factor loading, number of variables per factor, number of factors, factor correlation, number of response categories, and skewness) as well as 3 factors related to the PA method (type of correlation matrix, extraction method, and eigenvalue percentile). The results from the simulation study show that PA with either Pearson or polychoric correlations is particularly sensitive to the sample size, factor loadings, number of variables per factor, and factor correlations. However, whereas PA with polychorics is relatively robust to the skewness of the ordinal variables, PA with Pearson correlations frequently retains difficulty factors and is generally inaccurate with large levels of skewness. In light of these findings, we recommend the use of PA with polychoric correlations for the dimensionality assessment of ordinal-level data.

Data Interpretation, Statistical , Models, Statistical , Animals