Results 1 - 20 of 36
1.
Educ Psychol Meas ; 84(1): 171-189, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38250503

ABSTRACT

Multiple imputation (MI) is one of the recommended techniques for handling missing data in ordinal factor analysis models. However, methods for computing MI-based fit indices under ordinal factor analysis models have yet to be developed. In this short note, we introduced the methods of using the standardized root mean squared residual (SRMR) and the root mean square error of approximation (RMSEA) to assess the fit of ordinal factor analysis models with multiply imputed data. Specifically, we described the procedure for computing the MI-based sample estimates and constructing the confidence intervals. Simulation results showed that the proposed methods could yield sufficiently accurate point and interval estimates for both SRMR and RMSEA, especially in conditions with larger sample sizes, less missing data, more response categories, and higher degrees of misfit. Based on the findings, implications and recommendations were discussed.

2.
Perspect Med Educ ; 12(1): 462-471, 2023.
Article in English | MEDLINE | ID: mdl-37929203

ABSTRACT

Introduction: The accurate assessment of health professionals' competence is critical for ensuring public health safety and quality of care. Computerized Adaptive Testing (CAT) based on item response theory (IRT) has the potential to improve measurement accuracy and reduce respondent burden. In this study, we conducted psychometric simulations to develop a CAT for evaluating the competence of health professional candidates. Methods: The initial CAT item bank was sourced from the Standardized Competence Test for Clinical Medicine Undergraduates (SCTCMU), a nationwide summative test in China, consisting of 300 multiple-choice items. We randomly selected response data from 2000 Chinese clinical medicine undergraduates for analysis. Two types of analyses were performed: first, evaluating the psychometric properties of all items to meet the requirements of CAT; and second, conducting multiple CAT simulations using both simulated and real response data. Results: The final CAT item bank consisted of 121 items, for which item parameters were calculated using a two-parameter logistic model (2PLM). The CAT simulations, based on both simulated and real data, revealed sufficient marginal reliability (coefficient of marginal reliability above 0.750) and criterion-related validity (Pearson's correlations between CAT scores and aggregate scores of the SCTCMU exceeding 0.850). Discussion: In national-level medical education assessment, there is an increasing need for concise yet valid evaluations of the competence of health professional candidates. The CAT developed in this study demonstrated satisfactory reliability and validity, offering a more efficient assessment of candidates' competence. The psychometric properties of the CAT could lead to shorter test durations, reduced information loss, and a decreased testing burden for participants.


Subjects
Computerized Adaptive Testing , Health Personnel , Humans , Psychometrics , Reproducibility of Results , Students
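
As a rough illustration of the two-parameter logistic model mentioned in the abstract above, the sketch below computes the 2PL response probability and the item information function, and uses them to choose the next item. The maximum-information selection rule and the small item bank in the usage note are assumptions made for illustration; the abstract does not state which selection rule or parameter values the study used.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response for ability theta,
    discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta, bank, administered):
    """Return the index of the not-yet-administered item with the largest
    information at the current ability estimate (an assumed selection rule).

    bank is a list of (a, b) tuples; administered is a set of item indices.
    """
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))
```

For a hypothetical bank `[(1.0, 2.0), (1.5, 0.0), (0.8, -1.0)]` and a current estimate of `theta = 0.0`, `select_next_item(0.0, bank, set())` picks the second item, whose difficulty matches the ability estimate and whose discrimination is highest.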
3.
Educ Psychol Meas ; 83(5): 984-1006, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37663533

ABSTRACT

The part of responses that is absent in the nonequivalent groups with anchor test (NEAT) design can be treated as a planned missing data scenario. In the context of small sample sizes, we present a machine learning (ML)-based imputation technique called chaining random forests (CRF) to perform equating tasks within the NEAT design. Specifically, seven CRF-based imputation equating methods are proposed based on different data augmentation methods. The equating performance of the proposed methods is examined through a simulation study. Five factors are considered: (a) test length (20, 30, 40, 50), (b) sample size per test form (50 versus 100), (c) ratio of common/anchor items (0.2 versus 0.3), (d) equivalent versus nonequivalent groups taking the two forms (no mean difference versus a mean difference of 0.5), and (e) three different types of anchors (random, easy, and hard), resulting in 96 conditions. In addition, five traditional equating methods, (1) Tucker method; (2) Levine observed score method; (3) equipercentile equating method; (4) circle-arc method; and (5) concurrent calibration based on Rasch model, were also considered, plus seven CRF-based imputation equating methods for a total of 12 methods in this study. The findings suggest that, benefiting from the advantages of ML techniques, CRF-based methods that incorporate the equating result of the Tucker method, such as IMP_total_Tucker, IMP_pair_Tucker, and IMP_Tucker_cirlce methods, can yield more robust and trustworthy estimates for the "missingness" in an equating task and therefore result in more accurate equated scores than their counterparts in short-length tests with small samples.
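
The count of 96 simulation conditions follows from fully crossing the five factors listed in the abstract (4 x 2 x 2 x 2 x 3). The short sketch below enumerates them; the variable names are illustrative and not taken from the study.

```python
from itertools import product

# Factor levels as listed in the abstract; names are illustrative.
test_lengths = [20, 30, 40, 50]
sample_sizes = [50, 100]
anchor_ratios = [0.2, 0.3]
mean_differences = [0.0, 0.5]
anchor_types = ["random", "easy", "hard"]

# Fully crossed design: one tuple per simulation condition.
conditions = list(
    product(test_lengths, sample_sizes, anchor_ratios, mean_differences, anchor_types)
)

print(len(conditions))  # 96
```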

4.
Aging Ment Health ; 27(11): 2238-2247, 2023.
Article in English | MEDLINE | ID: mdl-37561077

ABSTRACT

OBJECTIVES: This study draws on conservation of resources theory and transactional stress theory to guide our understanding of how social isolation, financial insecurity, and social support serve as a balance of both risk and protection for late-life depression. METHODS: Data were from the Leave-Behind Questionnaire in the 2016 (N = 4293) and 2018 (N = 4714) waves of the Health and Retirement Study. We conducted a cross-sectional path analysis via structural equation modeling, including objective and subjective perspectives. The same model was tested in both samples. RESULTS: Both social isolation and financial insecurity were associated with depression. We found several mediating risks and protective factors of these relationships. Objective financial status affected depression through both perceived financial insecurity and perceived social isolation, whereas objective isolation affected depression through perceived social support. This mediation model was significant after adjusting for confounders. CONCLUSION: This study underscores the importance of investigating the balance between risk and protection for depression, in the rising number of older adults aging alone in society. Findings suggest that objective and perceived measures offer unique windows into psychological constructs. Considering both objective and subjective perspectives may provide alternative targets for subsequent interventions to improve mental health in later life.

5.
Educ Psychol Meas ; 83(3): 586-608, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37187692

ABSTRACT

In the literature of modern psychometric modeling, mostly related to item response theory (IRT), model fit is evaluated through known indices, such as χ2, M2, and root mean square error of approximation (RMSEA) for absolute assessments as well as Akaike information criterion (AIC), consistent AIC (CAIC), and Bayesian information criterion (BIC) for relative comparisons. Recent developments show a merging trend of psychometrics and machine learning, yet there remains a gap in the model fit evaluation, specifically the use of the area under curve (AUC). This study focuses on the behaviors of AUC in fitting IRT models. Rounds of simulations were conducted to investigate AUC's appropriateness (e.g., power and Type I error rate) under various conditions. The results show that AUC possessed certain advantages under certain conditions such as high-dimensional structure with two-parameter logistic (2PL) and some three-parameter logistic (3PL) models, while disadvantages were also obvious when the true model is unidimensional. It cautions researchers about the dangers of relying solely on AUC in evaluating psychometric models.
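
For readers unfamiliar with the AUC, a minimal sketch of its computation follows. It uses the pairwise (Mann-Whitney) definition: the proportion of positive-negative pairs in which the positive case receives the higher score, with ties counted as one half. Treating observed item responses as labels and model-predicted probabilities as scores is an assumption about how the index would be applied to an IRT model; the abstract does not describe the exact procedure.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 observed outcomes (e.g., incorrect/correct).
    scores: iterable of predicted probabilities or scores, same length.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of correct and incorrect responses.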

6.
Adv Health Sci Educ Theory Pract ; 28(4): 1265-1288, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37052739

ABSTRACT

As one of the indicators reflecting student well-being in medical education practice, student satisfaction is no doubt an important topic. Instead of exploring student satisfaction from the perspectives of education quality and organizational factors, this study focused on student engagement and explored its impact on student satisfaction with medical education in China. Student engagement refers to students' actions, efforts and persistence, indicating both the time and energy students invested in educationally purposeful activities, especially academic activities. The data used in this study came from the first national survey of clinical undergraduates (the China Medical Student Survey), in which 10,062 clinical medical undergraduates in 33 medical schools participated. We developed a model of medical student engagement and satisfaction and utilized descriptive statistics, ordered logit regression, and path analysis to describe the relationship between medical student engagement and satisfaction. In this study, student engagement was categorized into behavioral, emotional and cognitive dimensions. The findings showed that medical student satisfaction was relatively low and was significantly affected by student engagement, especially the behavioral engagement in clinical rotations and professional identity of emotional engagement. These findings could offer a supplementary perspective on improving student satisfaction through student engagement, as well as notable implications for future research and practice.


Subjects
Education, Medical , Students, Medical , Humans , Students, Medical/psychology , Surveys and Questionnaires , Personal Satisfaction , China
7.
J Biophotonics ; 16(6): e202200377, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36906736

ABSTRACT

Analysis of blood species is an extremely important part in customs inspection, forensic investigation, wildlife protection and other fields. In this study, a classification method based on Siamese-like neural network (SNN) for interspecies blood (22 species) was proposed to measure Raman Spectra similarity. The average accuracy was above 99.20% in the test set of spectra (known species) that did not appear in the training set. This model could detect species not represented in the dataset underlying the model. After adding new species to the training set, we can update the training based on the original model without retraining the model from scratch. For species with lower accuracy, SNN model can be trained intensively in the form of enriched training data for that species. A single model can achieve both multiple-classification and binary classification functions. Moreover, SNN showed higher accuracy rates when trained with smaller datasets compared to other methods.


Subjects
Blood Chemical Analysis , Neural Networks, Computer , Blood Chemical Analysis/methods
8.
Appl Psychol Meas ; 47(1): 64-75, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36425286

ABSTRACT

Diagnostic classification models (DCMs) have been used to classify examinees into groups based on their possession status of a set of latent traits. In addition to traditional item-based scoring approaches, examinees may be scored based on their completion of a series of small and similar tasks. Those scores are usually considered as count variables. To model count scores, this study proposes a new class of DCMs that uses the negative binomial distribution at its core. We explained the proposed model framework and demonstrated its use through an operational example. Simulation studies were conducted to evaluate the performance of the proposed model and compare it with the Poisson-based DCM.

9.
Front Med (Lausanne) ; 10: 1301356, 2023.
Article in English | MEDLINE | ID: mdl-38259855

ABSTRACT

Introduction: This study seeks to explore validity and reliability evidence for core residency entrustable professional activities (CR-EPAs) that were developed by Peking University First Hospital (PKUFH) in 2020. Methods: A prospective cohort study was conducted in PKUFH. Trainers (raters) assessed pediatric residents on CR-EPAs over 1 academic year, bi-annually. Critical components within a validity evidence framework were examined: response process (rater perceptions), the internal structure (reliability and contributions of different variance sources), and consequences (potential use of a cutoff score). Results: In total, 37 residents were enrolled, and 111 and 99 trainers' ratings were collected in Fall 2020 and Spring 2021, respectively. For rater perceptions, all the raters considered CR-EPAs highly operational and convenient. In all ratings, individual EPAs correlate with total EPA moderately, with Spearman correlation coefficients spanning from 0.805 to 0.919. EPA 2 (select and interpret the auxiliary examinations), EPA 5 (prepare and complete medical documents), EPA 6 (provide an oral presentation of a case or a clinical encounter), and EPA 7 (identify and manage the general clinical conditions) correlated significantly with other EPAs. The results of the generalizability theory indicated that the variability due to residents is the highest (nearly 78.5%), leading to large reliability estimates. The matching results indicate that the lowest error is located at 5.933. Conclusion: The rating showed good validity and reliability. The ratings were reliable based on G-theory. CR-EPAs have a magnificent internal structure and have promising consequences. Our results indicate that CR-EPAs are a robust assessment tool in workplace-based training in a carefully designed setting.

10.
Med Educ Online ; 27(1): 2136559, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36250891

ABSTRACT

Medical education assessments are becoming more complex, resulting in the inappropriateness of traditional methods primarily consisting of direct observations, oral examinations, and multiple-choice tests. Advancements in research methods have led to the formation of new modalities, namely performance assessments, which are, on the other hand, always costly in development and implementation. Proposing using the Program Effectiveness and Cost Generalization flow within an assessment context (PRECOG-A), this brief report explores the real financial cost drivers associated with an assessment case in the context of medical education, presents the steps in bridging the effectiveness with its psychometric properties via cost-effectiveness analysis, and evaluates the two-side outcomes for further evaluation decision-making. Referentially providing a framework to investigators and researchers, the illustration of PRECOG-A in this study outlines instructional guidelines for conducting cost-effectiveness analysis in a performance assessment.


Subjects
Education, Medical , Cost-Benefit Analysis , Humans , Psychometrics
11.
Appl Psychol Meas ; 46(7): 622-639, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36131839

ABSTRACT

When developing ordinal rating scales, we may include potentially unordered response options such as "Neither Agree nor Disagree," "Neutral," "Don't Know," "No Opinion," or "Hard to Say." To handle responses to a mixture of ordered and unordered options, Huggins-Manley et al. (2018) proposed a class of semi-ordered models under the unidimensional item response theory framework. This study extends the concept of semi-ordered models into the area of diagnostic classification models. Specifically, we propose a flexible framework of semi-ordered DCMs that accommodates most earlier DCMs and allows for analyzing the relationship between those potentially unordered responses and the measured traits. Results from an operational study and two simulation studies show that the proposed framework can incorporate both ordered and non-ordered responses into the estimation of the latent traits and thus provide useful information about both the items and the respondents.

12.
Front Med (Lausanne) ; 9: 921719, 2022.
Article in English | MEDLINE | ID: mdl-35991657

ABSTRACT

Background: Analyzing distractor qualities of a pediatrics subject test in a national-level examination is vital in developing high-quality items for the discipline. Yet traditional approaches focus on key answers only and therefore are less informative. The number of distractors can also be reduced to improve item development. Materials and methods: From a pediatrics subject test at the national level, raw responses of 44,332 examinees to nineteen multiple-choice questions were analyzed, such that the distractor qualities were evaluated via traditional and advanced methods such as canonical correlation index. Additionally, a simulation study was conducted to investigate the impact of reducing the number of distractors on reliability. Results: The traditional item analysis showed that most items had acceptable psychometric properties, and two items were flagged for low item difficulty and discrimination. Distractor analysis showed that about one-third of items had poorly functioning distractors based on a relatively low choice frequency (<5%) and a small effect size of distractor discrimination. The simulation study also confirmed that shrinking distractor numbers to 4 was viable. Conclusions: Integrating multiple methods, especially the advanced ones, provides comprehensive evaluations of the item quality. Simulations can help re-consider the decision to set distractor numbers for cost-effectiveness. These proposed methods can improve further development of the pediatrics subject test.

13.
Educ Psychol Meas ; 82(4): 705-718, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35754612

ABSTRACT

Computing confidence intervals around generalizability coefficients has long been a challenging task in generalizability theory. This is a serious practical problem because generalizability coefficients are often computed from designs where some facets have small sample sizes, and researchers have little guidance regarding the trustworthiness of the coefficients. As generalizability theory can be framed as a linear mixed-effect model (LMM), bootstrap and simulation techniques from the LMM paradigm can be used to construct the confidence intervals. The purpose of this research is to examine four different LMM-based methods for computing the confidence intervals that have been proposed and to determine their accuracy under six simulated conditions based on the type of test scores (normal, dichotomous, and polytomous data) and data measurement design (p×i×r and p× [i:r]). A bootstrap technique called "parametric methods with spherical random effects" consistently produced more accurate confidence intervals than the three other LMM-based methods. Furthermore, the selected technique was compared with a model-based approach to investigate the performance at the levels of variance components via the second simulation study, where the numbers of examinees, raters, and items were varied. We conclude with the recommendation that, for generalizability coefficients, the confidence interval should accompany the point estimate.

14.
Educ Psychol Meas ; 82(3): 506-516, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35444338

ABSTRACT

Assessments with a large amount of small, similar, or often repetitive tasks are being used in educational, neurocognitive, and psychological contexts. For example, respondents are asked to recognize numbers or letters from a large pool of those and the number of correct answers is a count variable. In 1960, George Rasch developed the Rasch Poisson counts model (RPCM) to handle that type of assessment. This article extends the RPCM into the world of diagnostic classification models (DCMs) where a Poisson distribution is applied to traditional DCMs. A framework of Poisson DCMs is proposed and demonstrated through an operational dataset. This study aims to be exploratory with recommendations for future research given in the end.
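
To make the count-data idea concrete, the sketch below evaluates a Poisson probability for a count score under a Rasch-type rate. The log-linear form `rate = exp(theta + easiness)` is one common way of writing the Rasch Poisson counts model and is an assumption here; the abstract does not give the parameterization, and the diagnostic-classification extension it proposes is not reproduced.

```python
import math


def rpcm_rate(theta, easiness):
    """Expected count for a person with ability theta on a task with the
    given easiness, under an assumed log-linear parameterization."""
    return math.exp(theta + easiness)


def poisson_pmf(k, rate):
    """Probability of observing count k under a Poisson distribution
    with the given rate, computed on the log scale for stability."""
    if k < 0:
        return 0.0
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))
```

For example, `poisson_pmf(12, rpcm_rate(1.0, 1.5))` gives the probability that a respondent with ability 1.0 produces exactly 12 correct answers on a task with easiness 1.5; both values are hypothetical.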

15.
Front Med (Lausanne) ; 9: 1037897, 2022.
Article in English | MEDLINE | ID: mdl-36590939

ABSTRACT

Background: As a high-stakes national-level examination administered before students' clerkship in China, the Standardized Competence Test for Clinical Medicine Undergraduates (SCTCMU) has received much attention from the relevant educational departments and society at large. Investigating SCTCMU's validity and reliability is critical to the national healthcare profession education. Materials and methods: Raw responses from SCTCMU, answered by 44,332 examinees (4th-year undergraduate medical students) on 300 multiple-choice items, were used to evaluate the quality of the exam via psychometric methods based on item response theory (IRT). The core assumptions and model-data fit of IRT models were evaluated, as well as the item properties and information functions. Results: The IRT models were fitted to the observed assessment data, where all the required assumptions were met. The IRT analysis showed that most items had acceptable psychometric properties, and the passing score was located close to the lowest measurement error computed from the model outcomes. Conclusion: The proposed modern psychometric method provides a practical and informative approach to calibrating and analyzing medical education assessments. This work showcases a realistic depiction of the IRT analysis process and therefore facilitates the work of applied researchers wanting to conduct, interpret, and report IRT analyses on medical assessments.

16.
Med Educ Online ; 26(1): 1981198, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34569433

ABSTRACT

The purpose of this scoping review is to update the recent progress of EPAs research in GME, focusing on the topical concern of EPAs effectiveness, and to provide a reference for medical researchers in countries/regions interested in introducing EPAs. Guided by Arksey and O'Malley's framework regarding scoping reviews, the researchers, in January 2021, conducted a search in five databases to ensure the comprehensiveness of the literature. After the predetermined process, 29 articles in total were included in this study. The most common areas for the implementation and evaluation of EPAs were Surgery (n = 7, 24.1%), Pediatrics (n = 5, 17.2%) and Internal medicine (n = 4, 13.8%), a result that shows a relatively large change in the research trend of EPAs in the last two years. Prior to 2018, EPAs research focused on internal medicine, psychiatry, family medicine, and primary care. The articles in the category of EPAs implementation and evaluation had four main themes: (1) validation of EPAs (n = 16, 55.2%); (2) describing the experience of implementing EPAs (n = 11, 37.9%); (3) examining the factors and barriers that influence the implementation and evaluation of EPAs (n = 6, 20.6%); and (4) researching the experiences of faculty, interns, and other relevant personnel in using EPAs. Training programs were the most common EPAs implementation setting (n = 26, 89.6%); direct observation and evaluation (n = 12, 41.4%), and evaluation by scoring reports (n = 5, 17.2%) were the two most common means of assessing physicians' EPA levels; 19 papers (65.5%) used faculty evaluation, and nine of these papers also used self-assessment (31.0%); the most frequently used tools in the evaluation of EPAs were mainly researcher-made instruments (37.9%), assessment form (n = 7, 24.1%), and mobile application (n = 6, 20.7%). Although EPAs occupy an increasingly important place in international medical education, this study concludes that the implementation and diffusion of EPAs on a larger scale is still difficult.


Subjects
Competency-Based Education , Internship and Residency , Child , Clinical Competence , Education, Medical, Graduate , Humans , Internal Medicine/education
17.
Educ Psychol Meas ; 81(6): 1221-1233, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34565822

ABSTRACT

The costs of an objective structured clinical examination (OSCE) are of concern to health profession educators globally. As OSCEs are usually designed under the generalizability theory (G-theory) framework, this article proposes a machine-learning-based approach to optimize the costs, while maintaining the minimum required generalizability coefficient, a reliability-like index in G-theory. The authors adopted G-theory parameters yielded from an OSCE hosted by a medical school, reproduced the generalizability coefficients to prepare for optimizing manipulations, applied a simulated annealing algorithm to calculate the number of facet levels minimizing the associated costs, and conducted the analysis in various conditions via computer simulation. With a given generalizability coefficient, the proposed approach, essentially a decision-support instrument, found the optimal solution for the OSCE such that the associated costs were minimized. The computer simulation results showed how the cost reductions varied with different levels of required generalizability coefficients. Machine learning-based approaches can be used in conjunction with psychometric modeling to help plan assessment tasks more scientifically. The proposed approach is easy to adopt into practice and customize in alignment with specific testing designs. While these results are encouraging, possible pitfalls such as failure of the algorithm to converge and inadequate cost assumptions should also be avoided.
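
The optimization problem described above can be sketched in a few lines. The example assumes a fully crossed persons x stations x raters design, a simple linear cost model, and made-up variance components, none of which come from the study. It also swaps the simulated annealing algorithm named in the abstract for a plain exhaustive search, which is feasible only because this toy search space is tiny.

```python
from itertools import product


def g_coefficient(var_p, var_ps, var_pr, var_psr_e, n_s, n_r):
    """Generalizability coefficient for an assumed crossed p x s x r design:
    person variance over person variance plus relative error variance."""
    rel_error = var_ps / n_s + var_pr / n_r + var_psr_e / (n_s * n_r)
    return var_p / (var_p + rel_error)


def cheapest_design(variances, min_g, cost_station, cost_rater,
                    max_stations=20, max_raters=5):
    """Exhaustively search station/rater counts and return the cheapest
    (cost, n_stations, n_raters, g) meeting the required coefficient,
    or None if no design in the grid qualifies.

    The cost model (each station has a fixed cost plus a per-rater cost)
    is a hypothetical assumption.
    """
    best = None
    for n_s, n_r in product(range(1, max_stations + 1), range(1, max_raters + 1)):
        g = g_coefficient(*variances, n_s, n_r)
        if g < min_g:
            continue  # design does not reach the required reliability
        cost = n_s * (cost_station + n_r * cost_rater)
        if best is None or cost < best[0]:
            best = (cost, n_s, n_r, g)
    return best
```

Calling `cheapest_design((0.30, 0.20, 0.02, 0.40), 0.80, 500.0, 100.0)` would return the lowest-cost combination of stations and raters per station that keeps the coefficient at or above 0.80 under these hypothetical inputs.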

18.
Front Med (Lausanne) ; 8: 677818, 2021.
Article in English | MEDLINE | ID: mdl-34124108

ABSTRACT

Background: Assessing the preparedness of junior doctors to use vancomycin is important in medical education. Preparedness is typically evaluated by self-reported confidence surveys. Materials and Methods: An eight-item vancomycin prescribing confidence questionnaire was developed, piloted, and evaluated. The questionnaire responses were collected from 195 junior doctors, and a series of statistical techniques, such as principal component analysis and confirmatory factor analysis, were implemented to examine the validity and reliability. Results: The principal component analysis supported a one-factor structure, which was fed into a confirmatory factor analysis model resulting in a good fit [comparative fit index (CFI) = 0.99, Tucker-Lewis index (TLI) = 0.99, root mean square error of approximation (RMSEA) = 0.08, standardized root mean square residual (SRMR) = 0.04]. Ordinal-based α was 0.95, and various ωs were all above 0.93, indicating a high reliability level. The questionnaire responses were further shown to be robust to extreme response patterns via item response tree modeling. Jonckheere-Terpstra test results (z = 6.5237, p = 3.429e-11) showed that vancomycin prescribing confidence differed based on the experience in order (i.e., four ordinal independent groups: "≤10 times," "11-20 times," "21-30 times," and "≥31 times") and therefore provided external validity evidence for the questionnaire. Conclusions: The questionnaire is valid and reliable such that teaching hospitals can consider using it to assess junior doctors' vancomycin prescribing confidence. Further investigation of the questionnaire can point to the relationship between the prescribing confidence and the actual performance.

19.
Int J Behav Dev ; 45(2): 179-192, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33664535

ABSTRACT

This study investigates the performance of robust ML estimators when fitting and evaluating small sample latent growth models (LGM) with non-normal missing data. Results showed that the robust ML methods could be used to account for non-normality even when the sample size is very small (e.g., N < 100). Among the robust ML estimators, "MLR" was the optimal choice, as it was found to be robust to both non-normality and missing data while also yielding more accurate standard error estimates and growth parameter coverage. However, the choice "MLMV" produced the most accurate p values for the Chi-square test statistic under conditions studied. Regarding the goodness of fit indices, as sample size decreased, all three fit indices studied (i.e., CFI, RMSEA, and SRMR) exhibited worse fit. When the sample size was very small (e.g., N < 60), the fit indices would imply that a proposed model fit poorly, when this might not be actually the case in the population.

20.
Appl Psychol Meas ; 45(2): 95-111, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33627916

ABSTRACT

Despite their increasing popularity, cognitive diagnosis models have been criticized for limited utility for small samples. In this study, the authors proposed to use Bayes modal (BM) estimation and monotonic constraints to stabilize item parameter estimation and facilitate person classification in small samples based on the generalized deterministic input noisy "and" gate (G-DINA) model. Both a simulation study and a real data analysis were used to assess the utility of the BM estimation and monotonic constraints. Results showed that in small samples, (a) the G-DINA model with BM estimation is more likely to converge successfully, (b) when prior distributions are specified reasonably, and monotonicity is not violated, the BM estimation with monotonicity tends to produce more stable item parameter estimates and more accurate person classification, and (c) the G-DINA model using the BM estimation with monotonicity is less likely to overfit the data and shows higher predictive power.
