Results 1 - 16 of 16
1.
Am J Pharm Educ ; 87(10): 100081, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37852684

ABSTRACT

OBJECTIVE: Automatic item generation (AIG) is a new area of assessment research in which sets of multiple-choice questions (MCQs) are created using models and computer technology. Although successfully demonstrated in medicine and dentistry, AIG has not been implemented in pharmacy. The objective was to implement AIG to create a set of MCQs appropriate for inclusion in a summative, high-stakes pharmacy examination. METHODS: A 3-step process, well evidenced in AIG research, was employed to create the pharmacy MCQs. The first step was developing a cognitive model based on content within the examination blueprint. Second, an item model was developed based on the cognitive model. A process of systematic distractor generation was also incorporated to optimize distractor plausibility. Third, we used computer technology to assemble a set of test items based on the cognitive and item models. A sample of generated items was assessed against Gierl and Lai's 8 guidelines of item quality. RESULTS: More than 15,000 MCQs were generated to measure knowledge and skill in patient assessment and treatment of nausea and/or vomiting within the scope of clinical pharmacy. A sample of the generated items satisfied the requirements of content-related validity and quality after substantive review. CONCLUSION: This research demonstrates that the AIG process is a viable strategy for creating a test item bank of MCQs appropriate for inclusion in a pharmacy licensing examination.
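
The third step described above, computer-based assembly of items from an item model, can be illustrated with a minimal sketch. The template, content variables, and clinical values below are invented for illustration and are not taken from the study's cognitive model.

```python
# Minimal sketch of AIG item assembly: fill an item-model template with content
# variables permitted by a cognitive model. All content here is illustrative.
from itertools import product

ITEM_MODEL = (
    "A {age}-year-old patient presents with {symptom} lasting {duration}. "
    "Which of the following is the most appropriate initial management?"
)

# Content variables a subject-matter expert might specify in the cognitive model.
variables = {
    "age": ["25", "45", "70"],
    "symptom": ["nausea", "vomiting", "nausea and vomiting"],
    "duration": ["12 hours", "3 days"],
}

def generate_stems(template, variables):
    """Enumerate every combination of content variables allowed by the model."""
    keys = list(variables)
    for combo in product(*(variables[k] for k in keys)):
        yield template.format(**dict(zip(keys, combo)))

stems = list(generate_stems(ITEM_MODEL, variables))
print(len(stems))   # 3 * 3 * 2 = 18 candidate stems from one small model
print(stems[0])
```

Even this tiny model yields 18 stems; models with more variables and systematically generated distractors scale to the thousands of items reported above.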


Subject(s)
Education, Medical, Undergraduate; Education, Pharmacy; Pharmacy; Humans; Educational Measurement; Computers
2.
Teach Learn Med ; : 1-11, 2022 Sep 14.
Article in English | MEDLINE | ID: mdl-36106359

ABSTRACT

Issue: Automatic item generation is a method for creating medical test items using an automated, technological solution. It is a contemporary method that can scale the item development process to produce large numbers of new items, support the building of multiple forms, and allow rapid responses to changing medical content guidelines and threats to test security. The purpose of this analysis is to describe three sources of validation evidence that are required when producing high-quality medical licensure test items with automatic item generation, so that valid test score inferences can be supported. Evidence: Generated items are used to make inferences about examinees' medical knowledge, skills, and competencies. We present three sources of evidence required to evaluate whether the generated items measure the intended knowledge, skills, and competencies. These sources of evidence relate to the item definition, the item development process, and the item quality review. An item is defined as an explicit set of properties, including the parameters, constraints, and instructions used to elicit a response from the examinee. This definition allows a critique of the input used for automatic item generation. The item development process is evaluated using a validation table, whose purpose is to support verification of the model-specification assumptions made by the subject-matter expert. This table provides a succinct summary of the content and constraints that were used to create new items. The item quality review is used to evaluate the statistical quality of the generated items, which often focuses on the difficulty and the discrimination of the correct and incorrect options. Implications: Automatic item generation is an increasingly popular item development method. The items it generates must be bolstered by evidence to ensure they measure the intended knowledge, skills, and competencies. The purpose of this analysis is to describe the sources of evidence that can be used to evaluate the quality of the generated items. The important role of medical expertise in the development and evaluation of the generated items is highlighted as a crucial requirement for producing validation evidence.
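
The abstract does not show what a validation table looks like in practice; the sketch below is one hedged guess at its structure, with each row recording a content element, the values it may take, and the constraint and source that justify them. Column names and entries are invented, not taken from the article.

```python
# Hypothetical validation table: a succinct summary of the content and
# constraints used to create generated items, for subject-matter-expert review.
import pandas as pd

validation_table = pd.DataFrame(
    [
        {"element": "age",      "values": "25-75 years",      "constraint": "adult presentation only",     "source": "SME panel"},
        {"element": "symptom",  "values": "nausea; vomiting", "constraint": "must match blueprint topic",  "source": "exam blueprint"},
        {"element": "duration", "values": "12 h; 3 d",        "constraint": "acute course only",           "source": "SME panel"},
    ]
)
print(validation_table.to_string(index=False))
```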

3.
Nurse Educ Pract ; 54: 103085, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34087578

ABSTRACT

Nursing students' higher-level thinking skills are ideally assessed through constructed-response items. At the baccalaureate level in North America, however, this exam format has largely fallen into disuse owing to the labor-intensive process of scoring written exam papers. The authors sought to determine whether automated essay scoring (AES) would be an efficient and reliable alternative to human scoring. Four constructed-response exam items were administered to an initial cohort of 359 undergraduate nursing students in 2016 and to a second cohort of 40 students in 2018. The items were graded by two human raters (HR1 and HR2) and an AES software platform. AES approximated or surpassed the agreement and reliability measures achieved by HR1 and HR2 with each other, and it surpassed both human raters in efficiency. A list of answer keywords was created to increase the efficiency and reliability of AES. Low agreement between the human raters may be explained by rater drift and fatigue, and shortcomings in the development of Item 1 may have reduced its overall agreement and reliability measures. It can be concluded that AES is a reliable and cost-effective means of scoring constructed-response nursing examinations, but further studies with larger samples are needed to establish this definitively.
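
The study does not specify its exact agreement statistics; the sketch below shows the kind of rater-agreement check such a comparison typically involves (exact agreement and weighted Cohen's kappa between each pair of scorers). The score vectors are fabricated toy data.

```python
# Pairwise agreement between two human raters (HR1, HR2) and AES scores.
import numpy as np
from sklearn.metrics import cohen_kappa_score

hr1 = np.array([3, 4, 2, 5, 3, 4, 1, 2, 4, 3])
hr2 = np.array([3, 4, 3, 5, 2, 4, 1, 2, 4, 4])
aes = np.array([3, 4, 2, 5, 3, 4, 1, 3, 4, 3])

def exact_agreement(a, b):
    """Proportion of responses given identical scores by the two scorers."""
    return float(np.mean(a == b))

for name, (a, b) in {"HR1 vs HR2": (hr1, hr2),
                     "HR1 vs AES": (hr1, aes),
                     "HR2 vs AES": (hr2, aes)}.items():
    kappa = cohen_kappa_score(a, b, weights="quadratic")
    print(f"{name}: exact agreement {exact_agreement(a, b):.2f}, weighted kappa {kappa:.2f}")
```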


Subject(s)
Education, Nursing, Baccalaureate; Students, Nursing; Educational Measurement; Humans; Reproducibility of Results; Thinking
4.
Front Psychol ; 10: 825, 2019.
Article in English | MEDLINE | ID: mdl-31133911

ABSTRACT

Writing a high-quality multiple-choice test item is a complex process. Creating plausible but incorrect options for each item poses significant challenges for the content specialist because this task is often undertaken without a systematic method. In the current study, we describe and demonstrate a systematic method for creating plausible but incorrect options, also called distractors, based on students' misconceptions extracted from labeled written responses. A total of 1515 written responses from Grade 10 students to an existing constructed-response item in Biology were used to demonstrate the method. Using latent Dirichlet allocation, a topic modeling procedure commonly used in machine learning and natural language processing, 22 plausible misconceptions were identified from the students' written responses and used to produce a list of plausible distractors. These distractors, in turn, were used as part of new multiple-choice items. Implications for item development are discussed.
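
A minimal sketch of the topic-modeling step described above: fit latent Dirichlet allocation to written responses and inspect the top words per topic as candidate misconceptions. The toy responses, topic count, and preprocessing choices are assumptions, not the study's data or pipeline.

```python
# LDA over student written responses; each topic is a candidate misconception.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

responses = [
    "the cell gets energy from the sun directly",
    "plants breathe in oxygen at night to make food",
    "energy is created inside the mitochondria",
    "photosynthesis happens in the mitochondria of the plant",
    "the sun gives plants sugar through the roots",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(responses)

lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(X)

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {', '.join(top)}")   # top words summarize one misconception
```

In practice each recovered topic would be reviewed by a content specialist and rewritten as a distractor in the language of the item.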

5.
Appl Psychol Meas ; 42(1): 42-57, 2018 Jan.
Article in English | MEDLINE | ID: mdl-29881111

ABSTRACT

Computerized testing provides many benefits to support formative assessment. However, the advent of computerized formative testing has also raised formidable new challenges, particularly in the area of item development. Large numbers of diverse, high-quality test items are required because items are continuously administered to students. Hence, hundreds of items are needed to develop the banks necessary for computerized formative testing. One promising approach to this test development challenge is automatic item generation, a relatively new but rapidly evolving research area in which cognitive and psychometric modeling practices are used to produce items with the aid of computer technology. The purpose of this study is to describe a new method for generating both the items and the rationales required to solve them, thereby producing the feedback needed for computerized formative testing. The method for rationale generation is demonstrated and evaluated in the medical education domain.
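
One way to picture generating rationales alongside items is to have the generator emit, for each option, the reasoning the cognitive model attaches to it, so the feedback is produced at the same time as the item. The sketch below is a hedged illustration of that idea; the drug names, finding, and rationale text are placeholders, not the study's content or code.

```python
# Pair each generated option with a rationale drawn from the same cognitive model
# so the item can return formative feedback automatically.
from dataclasses import dataclass

@dataclass
class Option:
    text: str
    correct: bool
    rationale: str

def generate_item(finding):
    # The cognitive model maps a clinical finding to the keyed option and to the
    # reasoning behind each distractor; here it is simply a hard-coded dict.
    model = {
        "key": ("Drug A", "Drug A targets the mechanism implied by the finding."),
        "distractors": [
            ("Drug B", "Drug B treats a superficially similar presentation."),
            ("Drug C", "Drug C is contraindicated given the finding."),
        ],
    }
    stem = f"A patient presents with {finding}. Which treatment is most appropriate?"
    options = [Option(model["key"][0], True, model["key"][1])]
    options += [Option(text, False, why) for text, why in model["distractors"]]
    return stem, options

stem, options = generate_item("acute nausea after chemotherapy")
print(stem)
for o in options:
    print(f"  [{'*' if o.correct else ' '}] {o.text} -- {o.rationale}")
```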

6.
J Dent Educ ; 80(3): 339-47, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26933110

ABSTRACT

Test items created for dentistry examinations are often individually written by content experts. This approach to item development is expensive because it requires the time and effort of many content experts yet yields relatively few items. The aim of this study was to describe and illustrate how items can be generated using a systematic approach. Automatic item generation (AIG) is an alternative method that allows a small number of content experts to produce large numbers of items by integrating their domain expertise with computer technology. This article describes and illustrates how three modeling approaches to item content (item cloning, cognitive modeling, and image-anchored modeling) can be used to generate large numbers of multiple-choice test items for examinations in dentistry. Test items can be generated by combining the expertise of two content specialists with technology supported by AIG. A total of 5,467 new items were created during this study. From substitution of item content, to modeling appropriate responses based upon a cognitive model of correct responses, to generating items linked to specific graphical findings, AIG has the potential to meet increasing demands for test items. Further, the methods described in this study can be generalized and applied to many other item types. Future research applications for AIG in dental education are discussed.


Subject(s)
Education, Dental; Educational Measurement/methods; Algorithms; Anti-Bacterial Agents/therapeutic use; Clinical Competence; Cognition; Competency-Based Education; Computing Methodologies; Educational Measurement/standards; Humans; Models, Educational; Problem Solving; Radiography, Panoramic; Radiology/education
7.
Med Teach ; 38(8): 838-43, 2016 Aug.
Article in English | MEDLINE | ID: mdl-26998566

ABSTRACT

With the recent interest in competency-based education, educators are being challenged to develop more assessment opportunities. As such, there is increased demand for exam content development, which can be a very labor-intensive process. An innovative solution to this challenge has been the use of automatic item generation (AIG) to develop multiple-choice questions (MCQs). In AIG, computer technology is used to generate test items from cognitive models (i.e., representations of the knowledge and skills that are required to solve a problem). The main advantage yielded by AIG is the efficiency with which items are generated. Although the technology for AIG relies on a linear programming approach, the same principles can also be used to improve the traditional committee-based processes used in the development of MCQs. Using this approach, content experts deconstruct their clinical reasoning process to develop a cognitive model which, in turn, is used to create MCQs. This approach is appealing because it: (1) is efficient; (2) has been shown to produce items with psychometric properties comparable to those generated using a traditional approach; and (3) can be used to assess higher-order skills (i.e., application of knowledge). The purpose of this article is to provide a novel framework for the development of high-quality MCQs using cognitive models.


Subject(s)
Education, Medical, Undergraduate; Educational Measurement/methods; Educational Measurement/standards; Models, Psychological; Competency-Based Education; Humans
8.
Teach Learn Med ; 28(2): 166-73, 2016.
Article in English | MEDLINE | ID: mdl-26849247

ABSTRACT

UNLABELLED: CONSTRUCT: Automatic item generation (AIG) is an alternative method for producing large numbers of test items that integrates cognitive modeling with computer technology to systematically generate multiple-choice questions (MCQs). The purpose of our study is to describe and validate a method of generating plausible but incorrect distractors. Initial applications of AIG demonstrated its effectiveness in producing test items. However, expert review of the initial items identified a key limitation: the generation of implausible incorrect options, or distractors, might limit the applicability of items in real testing situations. BACKGROUND: Medical educators require test items in large quantities to facilitate the continual assessment of student knowledge. Traditional item development processes are time-consuming and resource intensive. Studies have validated the quality of generated items through content expert review. However, no study has yet documented how generated items perform in a test administration, and none has validated AIG through student responses to generated test items. APPROACH: To validate our refined AIG method for generating plausible distractors, we collected psychometric evidence from a field test of the generated test items. A three-step process was used to generate test items in the area of jaundice. At least 455 Canadian and international medical graduates responded to each of the 13 generated items embedded in a high-stakes exam administration. Item difficulty, discrimination, and index of discrimination estimates were calculated for the correct option as well as for each distractor. RESULTS: Item analysis results for the correct options suggest that the generated items measured candidate performances across a range of ability levels while providing a consistent level of discrimination for each item. Results for the distractors reveal that the generated items differentiated the low- from the high-performing candidates. CONCLUSIONS: Previous research on AIG highlighted how this item development method can be used to produce high-quality stems and correct options for MCQ exams. The purpose of the current study was to describe, illustrate, and evaluate a method for modeling plausible but incorrect options. Evidence provided in this study demonstrates that AIG can produce psychometrically sound test items. More importantly, by adapting the distractors to match the unique features presented in the stem and correct option, the automated generation of MCQs has the potential to produce plausible distractors and yield large numbers of high-quality items for medical education.
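
A hedged sketch of the option-level item analysis described above: classical difficulty (proportion choosing the key) and a point-biserial discrimination index for the key and each distractor. The response matrix is simulated toy data, and the specific discrimination index used in the study may differ.

```python
# Classical item analysis at the option level: difficulty and point-biserial
# discrimination for the key and each distractor, on simulated responses.
import numpy as np

rng = np.random.default_rng(0)
options = np.array(list("ABCD"))
# responses[i, j] = option chosen by candidate i on item j; "A" is the key for all items here.
responses = rng.choice(options, size=(200, 5), p=[0.55, 0.2, 0.15, 0.1])
keys = np.array(["A", "A", "A", "A", "A"])

scores = (responses == keys).astype(float)   # 0/1 scored matrix
total = scores.sum(axis=1)                   # candidate total score

def point_biserial(indicator, criterion):
    """Correlation between choosing an option (0/1) and the criterion score."""
    return float(np.corrcoef(indicator, criterion)[0, 1])

for j in range(responses.shape[1]):
    print(f"item {j}: difficulty p = {scores[:, j].mean():.2f}")
    for opt in options:
        chosen = (responses[:, j] == opt).astype(float)
        rpb = point_biserial(chosen, total - scores[:, j])   # rest score excludes the item itself
        tag = "key" if opt == keys[j] else "distractor"
        print(f"    option {opt} ({tag}): chosen {chosen.mean():.2f}, r_pb {rpb:+.2f}")
```

Well-functioning distractors show a negative point-biserial (chosen more often by low scorers), which is the pattern the study reports for its generated items.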


Subject(s)
Computer-Assisted Instruction/methods; Education, Medical, Undergraduate/methods; Educational Measurement/methods; Quality Improvement; Automation; Humans; Jaundice/diagnosis; Jaundice/therapy; Models, Educational; Psychometrics
9.
Eval Health Prof ; 39(1): 100-13, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26377072

ABSTRACT

We present a framework for technology-enhanced scoring of bilingual clinical decision-making (CDM) questions using an open-source scoring technology, and we evaluate the strength of the proposed framework using operational data from the Medical Council of Canada Qualifying Examination. Candidates' responses to six write-in CDM questions were used to develop a three-stage automated scoring framework. In Stage 1, linguistic features were extracted from the CDM responses. In Stage 2, supervised machine learning techniques were employed to develop the scoring models. In Stage 3, responses to six English and French CDM questions were scored using the scoring models from Stage 2. Of the 8,007 English and French CDM responses, 7,643 were accurately scored, an agreement rate of 95.4% between human and computer scoring. This represents an improvement of 5.4% over the human inter-rater reliability. Our framework yielded scores similar to those of expert physician markers and could be used for clinical competency assessment.
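
A minimal sketch of the three-stage framework under assumed data: (1) extract linguistic features from write-in responses, (2) train a supervised model against human scores, (3) score held-out responses and report human-computer agreement. The responses, labels, and choice of features and classifier are fabricated for illustration; the study's open-source scoring technology is not shown here.

```python
# Three-stage automated scoring sketch: features -> supervised model -> agreement.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

responses = [
    "order ct head and start iv fluids", "reassure and discharge",
    "start iv fluids and order ct head", "give oral analgesia only",
    "ct head then neurosurgery consult", "discharge with follow up",
    "iv fluids, ct head, admit", "oral analgesia and reassure",
] * 25                                     # repeat to get a workable toy sample
human_scores = [1, 0, 1, 0, 1, 0, 1, 0] * 25

X_train, X_test, y_train, y_test = train_test_split(
    responses, human_scores, test_size=0.3, random_state=0)

# Stage 1: linguistic features (here simply TF-IDF over word unigrams and bigrams).
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
F_train = vectorizer.fit_transform(X_train)
F_test = vectorizer.transform(X_test)

# Stage 2: supervised scoring model trained on human-assigned scores.
model = LogisticRegression(max_iter=1000).fit(F_train, y_train)

# Stage 3: score held-out responses and check agreement with the human raters.
predicted = model.predict(F_test)
print("human-computer agreement:", accuracy_score(y_test, predicted))
```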


Subject(s)
Clinical Competence; Educational Measurement/methods; Educational Measurement/standards; Automated Data Processing/standards; Translating; Canada; Clinical Decision-Making; Humans; Licensure, Medical; Reproducibility of Results
10.
Med Educ ; 48(10): 950-62, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25200016

ABSTRACT

CONTEXT: Constructed-response tasks, which range from short-answer tests to essay questions, are included in assessments of medical knowledge because they allow educators to measure students' ability to think, reason, solve complex problems, communicate and collaborate through their use of writing. However, constructed-response tasks are also costly to administer and challenging to score because they rely on human raters. One alternative to the manual scoring process is to integrate computer technology with writing assessment. The process of scoring written responses using computer programs is known as 'automated essay scoring' (AES). METHODS: An AES system uses a computer program that builds a scoring model by extracting linguistic features from a constructed-response prompt that has been pre-scored by human raters and then, using machine learning algorithms, maps the linguistic features to the human scores so that the computer can be used to classify (i.e. score or grade) the responses of a new group of students. The accuracy of the score classification can be evaluated using different measures of agreement. RESULTS: Automated essay scoring provides a method for scoring constructed-response tests that complements the current use of selected-response testing in medical education. The method can serve medical educators by providing the summative scores required for high-stakes testing. It can also serve medical students by providing them with detailed feedback as part of a formative assessment process. CONCLUSIONS: Automated essay scoring systems yield scores that consistently agree with those of human raters at a level as high as, if not higher than, the level of agreement among human raters themselves. The system offers medical educators many benefits for scoring constructed-response tasks, such as improving the consistency of scoring, reducing the time required for scoring and reporting, minimising the costs of scoring, and providing students with immediate feedback on constructed-response tasks.


Subject(s)
Computer-Assisted Instruction/trends; Education, Medical/methods; Education, Medical/trends; Educational Measurement/methods; Software; Clinical Competence; Humans; Writing
11.
J Nurs Meas ; 22(1): 145-63, 2014.
Article in English | MEDLINE | ID: mdl-24851670

ABSTRACT

BACKGROUND AND PURPOSE: Conceptual research utilization (CRU) is one indicator of an optimum practice environment that leads to improved patient and organizational outcomes. Yet, its measurement has not been adequately addressed. In this study, we investigated precision of scores obtained with a new CRU scale using item response theory (IRT) methods. METHODS: We analyzed the responses from 1,349 health care aides from 30 Canadian nursing homes using Samejima's (1969, 1996) graded response model (GRM). RESULTS: Findings suggest that the CRU scale is most precise at low to average trait levels with significantly less precision at higher trait levels. CONCLUSIONS: The scale showed acceptable precision at low to average trait levels. New items and/or different response options that capture higher trait levels are needed. Future development of the scale is discussed.
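A numerical sketch of the precision analysis described above: the item information function for one polytomous item under Samejima's graded response model, computed from assumed (not estimated) discrimination and threshold parameters. With low-to-moderate thresholds, information falls off at high trait levels, which mirrors the pattern the study reports.

```python
# Item information under the graded response model (GRM) at several theta values.
import numpy as np

def grm_item_information(theta, a, b):
    """Information for a GRM item with discrimination a and ordered thresholds b."""
    theta = np.asarray(theta, dtype=float)
    b = np.asarray(b, dtype=float)
    # Cumulative probabilities P*_k, padded with 1 (below lowest) and 0 (above highest).
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    p_star = np.hstack([np.ones((len(theta), 1)), p_star, np.zeros((len(theta), 1))])
    p_cat = p_star[:, :-1] - p_star[:, 1:]        # category probabilities
    d_star = a * p_star * (1.0 - p_star)          # derivatives of P* w.r.t. theta
    num = (d_star[:, :-1] - d_star[:, 1:]) ** 2
    return (num / np.clip(p_cat, 1e-12, None)).sum(axis=1)

theta = np.linspace(-3, 3, 7)
info = grm_item_information(theta, a=1.4, b=[-1.5, -0.5, 0.5])   # assumed parameters
for t, i in zip(theta, info):
    print(f"theta {t:+.1f}: information {i:.2f}")
```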


Subject(s)
Allied Health Personnel/psychology; Biomedical Research; Nursing Homes; Adult; Aged; Canada; Factor Analysis, Statistical; Female; Humans; Male; Middle Aged; Models, Theoretical; Psychometrics; Quality Improvement; Reproducibility of Results; Surveys and Questionnaires
12.
Adv Health Sci Educ Theory Pract ; 19(4): 497-506, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24449122

ABSTRACT

Objective structured clinical examinations (OSCEs) are used worldwide for summative examinations but often lack acceptable reliability. Research has shown that reliability of scores increases if OSCE checklists for medical students include only clinically relevant items. Also, checklists are often missing evidence-based items that high-achieving learners are more likely to use. The purpose of this study was to determine whether limiting checklist items to clinically discriminating items and/or adding missing evidence-based items improved score reliability in an Internal Medicine residency OSCE. Six internists reviewed the traditional checklists of four OSCE stations, classifying items as clinically discriminating or non-discriminating. Two independent reviewers augmented the checklists with missing evidence-based items. We used generalizability theory to calculate the overall reliability of faculty observer checklist scores from 45 first- and second-year residents and to predict how many 10-item stations would be required to reach a Phi coefficient of 0.8. Removing clinically non-discriminating items from the traditional checklist did not affect the number of stations (15) required to reach a Phi of 0.8 with 10 items. Focusing the checklist on only evidence-based clinically discriminating items increased test score reliability, needing 11 stations instead of 15 to reach 0.8; adding missing evidence-based clinically discriminating items to the traditional checklist modestly improved reliability (needing 14 instead of 15 stations). Checklists composed of evidence-based clinically discriminating items improved the reliability of checklist scores and reduced the number of stations needed for acceptable reliability. Educators should give preference to evidence-based items over non-evidence-based items when developing OSCE checklists.
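
A minimal decision-study sketch of the projection described above, under a simplified persons-by-stations design and assumed variance components: compute the Phi (dependability) coefficient as the number of stations grows and find how many stations reach 0.8. The variance components are illustrative values chosen for the example, not the study's estimates.

```python
# D-study projection of Phi for a persons-by-stations random-effects design.

def phi(n_stations, var_person, var_station, var_residual):
    """Dependability coefficient with absolute error averaged over n stations."""
    error = (var_station + var_residual) / n_stations
    return var_person / (var_person + error)

# Assumed G-study variance components: person, station, person-by-station residual.
var_p, var_s, var_ps = 0.30, 0.10, 1.00

n = 1
while phi(n, var_p, var_s, var_ps) < 0.8:
    n += 1
print(f"stations needed for Phi >= 0.8: {n}")
for k in (5, 10, n):
    print(f"  Phi with {k:2d} stations: {phi(k, var_p, var_s, var_ps):.2f}")
```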


Subject(s)
Checklist; Clinical Competence/standards; Education, Medical, Graduate; Evidence-Based Practice/standards; Internal Medicine/standards; Internship and Residency/standards; Physical Examination/standards; Canada; Educational Measurement/methods; Humans; Reproducibility of Results; Students, Medical
13.
Med Educ ; 47(7): 726-33, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23746162

ABSTRACT

OBJECTIVES: Computerised assessment raises formidable challenges because it requires large numbers of test items. Automatic item generation (AIG) can help address this test development problem because it yields large numbers of new items both quickly and efficiently. To date, however, the quality of the items produced using a generative approach has not been evaluated. The purpose of this study was to determine whether automatic processes yield items that meet standards of quality that are appropriate for medical testing. Quality was evaluated firstly by subjecting items created using both AIG and traditional processes to rating by a four-member expert medical panel using indicators of multiple-choice item quality, and secondly by asking the panellists to identify which items were developed using AIG in a blind review. METHODS: Fifteen items from the domain of therapeutics were created in three different experimental test development conditions. The first 15 items were created by content specialists using traditional test development methods (Group 1 Traditional). The second 15 items were created by the same content specialists using AIG methods (Group 1 AIG). The third 15 items were created by a new group of content specialists using traditional methods (Group 2 Traditional). These 45 items were then evaluated for quality by a four-member panel of medical experts and were subsequently categorised as either Traditional or AIG items. RESULTS: Three outcomes were reported: (i) the items produced using traditional and AIG processes were comparable on seven of eight indicators of multiple-choice item quality; (ii) AIG items could be differentiated from Traditional items by the quality of their distractors; and (iii) the overall predictive accuracy of the four expert medical panellists was 42%. CONCLUSIONS: Items generated by AIG methods are, for the most part, equivalent to traditionally developed items from the perspective of expert medical reviewers. While the AIG method produced comparatively fewer plausible distractors than the traditional method, medical experts cannot consistently distinguish AIG items from traditionally developed items in a blind review.


Subject(s)
Computer-Assisted Instruction/standards; Education, Medical, Undergraduate/methods; Educational Measurement/standards; Computer-Assisted Instruction/methods; Education, Medical, Undergraduate/standards; Educational Measurement/methods; Humans; Models, Educational; Research Design; Surveys and Questionnaires
14.
Nurs Res Pract ; 2013: 156782, 2013.
Article in English | MEDLINE | ID: mdl-23401759

ABSTRACT

Background and Purpose. In this paper, we present a protocol for advanced psychometric assessments of surveys based on the Standards for Educational and Psychological Testing. We use the Alberta Context Tool (ACT) as an exemplar survey to which this protocol can be applied. Methods. Data mapping, acceptability, reliability, and validity are addressed. Acceptability is assessed with missing data frequencies and the time required to complete the survey. Reliability is assessed with internal consistency coefficients and information functions. A unitary approach to validity is taken, consisting of accumulating evidence based on instrument content, response processes, internal structure, and relations to other variables. We also address assessing the performance of survey data when aggregated to higher levels (e.g., the nursing unit). Discussion. Application of the protocol to the ACT survey is underway. Psychometric assessment of any survey is essential to obtaining reliable and valid research findings, and this protocol can be adapted for use with any nursing survey.
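
A hedged sketch of one reliability check named in the protocol above: Cronbach's alpha for internal consistency, computed from a simulated matrix of Likert responses (rows are respondents, columns are survey items). The data and item count are invented, not ACT data.

```python
# Cronbach's alpha from a respondents-by-items matrix of Likert responses.
import numpy as np

def cronbach_alpha(items):
    """items: 2-D array, respondents x items, all keyed in the same direction."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars / total_var)

rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 1))   # shared trait driving all items
data = np.clip(np.round(3 + latent + rng.normal(scale=0.8, size=(300, 6))), 1, 5)
print(f"Cronbach's alpha: {cronbach_alpha(data):.2f}")
```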

15.
Med Educ ; 46(8): 757-65, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22803753

ABSTRACT

CONTEXT: Many tests of medical knowledge, from the undergraduate level to the level of certification and licensure, contain multiple-choice items. Although these are efficient in measuring examinees' knowledge and skills across diverse content areas, multiple-choice items are time-consuming and expensive to create. Changes in student assessment brought about by new forms of computer-based testing have created demand for large numbers of multiple-choice items, and our current approaches to item development cannot meet this demand. METHODS: We present a methodology for developing multiple-choice items based on automatic item generation (AIG) concepts and procedures. We describe a three-stage approach to AIG and illustrate it by generating multiple-choice items for a medical licensure test in the content area of surgery. RESULTS: To generate multiple-choice items, our method requires a three-stage process. First, a cognitive model is created by content specialists. Second, item models are developed using the content from the cognitive model. Third, items are generated from the item models using computer software. Using this methodology, we generated 1248 multiple-choice items from one item model. CONCLUSIONS: Automatic item generation is a process in which models are used to generate items with the aid of computer technology. With our method, content specialists identify and structure the content for the test items, and computer technology systematically combines that content to generate new test items. By combining these outcomes, items can be generated automatically.
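
One way a single item model can yield over a thousand items is that the software enumerates every combination of content elements but keeps only those the cognitive model permits. The sketch below illustrates that constraint-filtering idea with invented surgical content and made-up constraints; it is not the study's model or software.

```python
# Enumerate element combinations from an item model, then filter by constraints
# encoded from the cognitive model.
from itertools import product

elements = {
    "age_group": ["child", "adult", "elderly"],
    "presentation": ["acute abdomen", "post-operative fever", "GI bleed"],
    "comorbidity": ["none", "anticoagulated", "renal impairment"],
    "investigation": ["ultrasound", "CT abdomen", "endoscopy"],
}

def allowed(combo):
    # Example constraints a content specialist might encode; purely illustrative.
    if combo["age_group"] == "child" and combo["comorbidity"] == "anticoagulated":
        return False
    if combo["comorbidity"] == "renal impairment" and combo["investigation"] == "CT abdomen":
        return False
    return True

keys = list(elements)
combos = [dict(zip(keys, values)) for values in product(*elements.values())]
valid = [c for c in combos if allowed(c)]
print(f"raw combinations: {len(combos)}, after constraints: {len(valid)}")
```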


Subject(s)
Clinical Competence; Education, Medical, Undergraduate/methods; Educational Measurement/methods; General Surgery/education; Computer-Assisted Instruction/methods; Computer-Assisted Instruction/standards; Computers; Education, Medical, Undergraduate/standards; Educational Measurement/standards; Humans; Models, Educational
16.
BMC Health Serv Res ; 11: 107, 2011 May 19.
Article in English | MEDLINE | ID: mdl-21595888

ABSTRACT

BACKGROUND: There is a lack of acceptable, reliable, and valid survey instruments to measure conceptual research utilization (CRU). In this study, we investigated the psychometric properties of a newly developed scale (the CRU Scale). METHODS: We used the Standards for Educational and Psychological Testing as a validation framework to assess four sources of validity evidence: content, response processes, internal structure, and relations to other variables. A panel of nine international research utilization experts performed a formal content validity assessment. To determine response process validity, we conducted a series of one-on-one scale administration sessions with 10 healthcare aides. Internal structure and relations-to-other-variables validity were examined using CRU Scale response data from a sample of 707 healthcare aides working in 30 urban Canadian nursing homes. Principal components analysis and confirmatory factor analyses were conducted to determine internal structure. Relations to other variables were examined using: (1) bivariate correlations; (2) change in mean values of CRU with increasing levels of other kinds of research utilization; and (3) multivariate linear regression. RESULTS: Content validity index scores for the five items ranged from 0.55 to 1.00. The principal components analysis predicted a 5-item, 1-factor model. This was inconsistent with the findings from the confirmatory factor analysis, which showed best fit for a 4-item, 1-factor model. Bivariate associations between CRU and other kinds of research utilization were statistically significant (p < 0.01) for the latent CRU scale score and all five CRU items. The CRU scale score was also shown to be a significant predictor of overall research utilization in multivariate linear regression. CONCLUSIONS: The CRU scale showed acceptable initial psychometric properties with respect to responses from healthcare aides in nursing homes. Based on our validity, reliability, and acceptability analyses, we recommend using a reduced (four-item) version of the CRU scale to yield sound assessments of CRU by healthcare aides. Refinement of the wording of one item is also needed. Planned future research will include: latent scale scoring, identification of variables that predict and are outcomes of conceptual research use, and longitudinal work to determine the CRU Scale's sensitivity to change.
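
A hedged sketch of the item-level content validity index (I-CVI) computation mentioned above: the proportion of expert panelists rating each item as relevant (3 or 4 on a 4-point relevance scale), plus the averaged scale-level index. The 9-by-5 ratings matrix is invented, not the study's panel data.

```python
# Item-level and scale-level content validity indices from expert relevance ratings.
import numpy as np

# rows = 9 expert panelists, columns = 5 scale items, ratings on a 1-4 relevance scale
ratings = np.array([
    [4, 4, 3, 2, 4],
    [3, 4, 4, 3, 4],
    [4, 3, 4, 2, 4],
    [4, 4, 4, 3, 3],
    [3, 4, 3, 2, 4],
    [4, 4, 4, 4, 4],
    [4, 3, 4, 3, 4],
    [3, 4, 4, 3, 4],
    [4, 4, 3, 1, 4],
])

i_cvi = (ratings >= 3).mean(axis=0)   # item-level CVI
s_cvi_ave = i_cvi.mean()              # scale-level CVI, averaging approach
for j, v in enumerate(i_cvi, start=1):
    print(f"item {j}: I-CVI = {v:.2f}")
print(f"S-CVI/Ave = {s_cvi_ave:.2f}")
```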


Subject(s)
Educational Measurement/standards; Health Services Research/statistics & numerical data; Psychometrics/standards; Adult; Aged; Australia; Canada; Data Interpretation, Statistical; Educational Measurement/statistics & numerical data; Educational Status; Female; Humans; Male; Middle Aged; Multivariate Analysis; Principal Component Analysis; Psychological Tests; Psychometrics/statistics & numerical data; Regression Analysis; Reproducibility of Results; Statistics as Topic; Sweden; United Kingdom; United States; Young Adult