Results 1 - 10 of 10
1.
Article in English | MEDLINE | ID: mdl-37665413

ABSTRACT

Recent advances in automated scoring technology have made it practical to replace multiple-choice questions (MCQs) with short-answer questions (SAQs) in large-scale, high-stakes assessments. However, most previous research comparing these formats has used small examinee samples testing under low-stakes conditions. Additionally, previous studies have not reported the time required to respond to the two item types. This study compares the difficulty, discrimination, and time requirements of the two formats when examinees responded as part of a large-scale, high-stakes assessment. Seventy-one MCQs were converted to SAQs. These matched items were randomly assigned to examinees completing a high-stakes assessment of internal medicine; no examinee saw the same item in both formats. Items administered in the SAQ format were generally more difficult than items in the MCQ format. The discrimination index for SAQs was modestly higher than that for MCQs, and response times were substantially longer for SAQs. These results support the interchangeability of MCQs and SAQs. When it is important that the examinee generate the response rather than select it, SAQs may be preferred. The results relating to difficulty and discrimination reported in this paper are consistent with those of previous studies. The results on the relative time requirements for the two formats suggest that, with a fixed testing time, fewer SAQs can be administered; this limitation may more than offset the higher discrimination that has been reported for SAQs. We additionally examine the extent to which increased difficulty may directly impact the discrimination of SAQs.
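The difficulty and discrimination statistics discussed above can be illustrated with standard classical-test-theory indices. The abstract does not state which formulas were used, so the sketch below assumes proportion-correct difficulty and a point-biserial discrimination index:

```python
from statistics import mean, pstdev

def item_difficulty(item_scores):
    """Classical difficulty: proportion answering the item correctly
    (so a *lower* value means a *harder* item)."""
    return mean(item_scores)

def point_biserial(item_scores, total_scores):
    """Point-biserial discrimination: correlation between the 0/1 item
    score and each examinee's total test score."""
    mu, sd = mean(total_scores), pstdev(total_scores)
    p = mean(item_scores)  # proportion correct
    mu_correct = mean(t for s, t in zip(item_scores, total_scores) if s == 1)
    return (mu_correct - mu) / sd * (p / (1 - p)) ** 0.5
```

Under these definitions, an SAQ version of an item showing a lower `item_difficulty` value than its MCQ counterpart corresponds to the "more difficult" finding above.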

2.
J Biomed Inform ; 98: 103268, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31421211

ABSTRACT

OBJECTIVE: The assessment of written medical examinations is a tedious and expensive process, requiring significant amounts of time from medical experts. Our objective was to develop a natural language processing (NLP) system that can expedite the assessment of unstructured answers in medical examinations by automatically identifying relevant concepts in examinee responses. MATERIALS AND METHODS: Our NLP system, the Intelligent Clinical Text Evaluator (INCITE), is semi-supervised in nature. Learning from a limited set of fully annotated examples, it sequentially applies a series of customized text comparison and similarity functions to determine whether a text span represents an entry in a given reference standard. Combinations of fuzzy matching and set intersection-based methods capture both inexact matches and fragmented concepts. Customizable, dynamic similarity-based matching thresholds allow the system to be tailored to examinee responses of different lengths. RESULTS: INCITE achieved an average F1-score of 0.89 (precision = 0.87, recall = 0.91) against human annotations over held-out evaluation data. Fuzzy text matching, dynamic thresholding, and the incorporation of supervision using annotated data produced the largest gains in performance. DISCUSSION: Long and non-standard expressions are difficult for INCITE to detect, but the problem is mitigated by dynamic thresholding (i.e., varying the similarity threshold for a text span to be considered a match). Annotation variations within exams and disagreements between annotators were the primary causes of false positives. Small amounts of annotated data can significantly improve system performance. CONCLUSIONS: The high performance and interpretability of INCITE should significantly aid the assessment process and help mitigate the impact of manual assessment inconsistencies.
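INCITE's actual matching functions are not reproduced in the abstract, but the general approach it describes (fuzzy text comparison against a reference standard, with a length-dependent similarity threshold) can be sketched as follows. The threshold schedule here is an assumed illustration, not the system's real parameters:

```python
from difflib import SequenceMatcher

def dynamic_threshold(span, base=0.8):
    """Relax the similarity threshold slightly for longer spans
    (an assumed heuristic, floored at 0.6)."""
    words = len(span.split())
    return max(0.6, base - 0.02 * (words - 1))

def matches_reference(span, reference_entries):
    """Return the first reference-standard entry the span fuzzily
    matches at or above its dynamic threshold, else None."""
    for entry in reference_entries:
        ratio = SequenceMatcher(None, span.lower(), entry.lower()).ratio()
        if ratio >= dynamic_threshold(span):
            return entry
    return None
```

A span such as "chest pains" would still match a reference entry "chest pain" despite the inexact wording, while an unrelated span would fall below the threshold and return None.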


Subjects
Education, Medical/methods; Education, Medical/standards; Educational Measurement/methods; Licensure, Medical/standards; Natural Language Processing; Schools, Medical; Algorithms; Clinical Competence/standards; Data Collection; Data Curation/methods; Fuzzy Logic; Humans; Medical Records; Pattern Recognition, Automated; Reproducibility of Results; Software; Unified Medical Language System
3.
Eval Health Prof ; 45(4): 327-340, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34753326

ABSTRACT

One of the most challenging aspects of writing multiple-choice test questions is identifying plausible incorrect response options, i.e., distractors. To help with this task, a procedure is introduced that can mine existing item banks for potential distractors by considering the similarities between a new item's stem and answer and the stems and response options of items in the bank. This approach uses natural language processing to measure similarity and requires a substantial pool of items for constructing the generating model. The procedure is demonstrated with data from the United States Medical Licensing Examination (USMLE®). For about half the items in the study, at least one of the top three system-produced candidates matched a human-produced distractor exactly; and for about one quarter of the items, two of the top three candidates matched human-produced distractors. A study was conducted in which a sample of system-produced candidates was shown to 10 experienced item writers. Overall, participants judged that about 81% of the candidates were on topic and that 56% would help human item writers with the task of writing distractors.
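The study's generating model is not specified in the abstract, but the core idea (ranking bank items by stem similarity and harvesting their response options as candidate distractors) can be sketched with a simple token-overlap similarity. The `jaccard` measure and the toy item bank below are illustrative assumptions, not the study's NLP model:

```python
def jaccard(a, b):
    """Token-overlap similarity between two strings (0.0 to 1.0)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def candidate_distractors(new_stem, item_bank, top_n=3):
    """Rank bank items by stem similarity to the new item's stem and
    collect their response options as candidate distractors.

    item_bank: list of (stem, [response options]) pairs.
    """
    ranked = sorted(item_bank, key=lambda item: jaccard(new_stem, item[0]),
                    reverse=True)
    candidates = []
    for stem, options in ranked:
        for opt in options:
            if opt not in candidates:
                candidates.append(opt)
    return candidates[:top_n]
```

A real system would use richer similarity measures (embeddings, clinical concept matching) and would also weigh the new item's answer, but the retrieval-and-rank structure is the same.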


Subjects
Educational Measurement; Natural Language Processing; Humans; United States; Educational Measurement/methods
4.
Acad Med ; 81(10 Suppl): S56-60, 2006 Oct.
Article in English | MEDLINE | ID: mdl-17001137

ABSTRACT

BACKGROUND: Multivariate generalizability analysis was used to investigate the performance of a commonly used clinical evaluation tool. METHOD: Practicing physicians were trained to use the mini-Clinical Evaluation Exercise (mini-CEX) rating form to rate performances from the United States Medical Licensing Examination Step 2 Clinical Skills examination. RESULTS: Differences in rater stringency made the greatest contribution to measurement error; having more raters rate each examinee, even on fewer occasions, could enhance score stability. Substantial correlated error across the competencies suggests that decisions about one scale unduly influence those on others. CONCLUSIONS: Given the appearance of a halo effect across competencies, score interpretations that assume assessment of distinct dimensions of clinical performance should be made with caution. If the intention is to produce a single composite score by combining results across competencies, the presence of these effects may be less critical.


Subjects
Clinical Competence/standards; Educational Measurement/methods; Physical Examination/methods; Software; Analysis of Variance; Humans; Interviews as Topic
5.
Eval Health Prof ; 27(4): 369-82, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15492048

ABSTRACT

A primary question that must be resolved in the development of tasks to assess the quality of physicians' clinical judgment is, "What is the outcome variable?" One natural choice would seem to be the correctness of the clinical decision. In this article, we use data on the diagnosis of urinary tract infections among young girls to illustrate why, in many clinical situations, this is not a useful variable. We propose instead a judgment weighted by the relative costs of an error. This variable has the disadvantage of requiring expert judgment for scoring, but the advantage of measuring the construct of interest.
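A cost-weighted score of the kind proposed above can be sketched as follows. The cost matrix is entirely hypothetical; it only illustrates the idea that, for a condition like a urinary tract infection, a missed diagnosis can be penalized more heavily than overtreatment:

```python
def cost_weighted_score(decision, true_state, cost):
    """Score a clinical decision by the relative cost of the error it makes.

    cost[(decision, true_state)] gives the penalty; 0 means no error.
    A perfect decision scores 1.0; the costliest error scores 0.0.
    """
    worst = max(cost.values())
    return 1.0 - cost[(decision, true_state)] / worst

# Hypothetical costs (for illustration only): missing a true UTI is
# costlier than treating a patient who turns out not to have one.
COSTS = {
    ("treat", "uti"): 0,        # correct: no penalty
    ("treat", "no_uti"): 3,     # overtreatment: moderate cost
    ("no_treat", "uti"): 10,    # missed infection: highest cost
    ("no_treat", "no_uti"): 0,  # correct: no penalty
}
```

In practice, as the article notes, the cost entries themselves would require expert judgment to elicit; that is the price of measuring the construct of interest rather than raw correctness.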


Subjects
Clinical Competence; Diagnosis; Judgment; Physicians; Costs and Cost Analysis; Female; Humans; Infant; Regression Analysis; Urinary Tract Infections/diagnosis
6.
Acad Med ; 86(10 Suppl): S59-62, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21955771

ABSTRACT

BACKGROUND: Studies completed over the past decade suggest the presence of a gap between what students learn during medical school and their clinical responsibilities as first-year residents. The purpose of this survey was to verify, on a large scale, the responsibilities of residents during their initial months of training. METHOD: Practice analysis surveys were mailed in September 2009 to 1,104 residency programs for distribution to an estimated 8,793 first-year residents. Surveys were returned by 3,003 residents from 672 programs; 2,523 surveys met inclusion criteria and were analyzed. RESULTS: New residents performed a wide range of activities, from routine but important communications (e.g., obtaining informed consent) to complex procedures (e.g., thoracentesis), often without the attending physician present or otherwise involved. CONCLUSIONS: Medical school curricula and the content of competence assessments prior to residency should consider more thorough coverage of the complex knowledge and skills required early in residency.


Subjects
Internship and Residency; Professional Practice; Communication; Data Collection; United States
7.
Acad Med ; 86(10 Suppl): S63-7; quiz S68, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21955772

ABSTRACT

BACKGROUND: Multisource feedback can provide a comprehensive picture of a medical trainee's performance. The utility of a multisource feedback system could be undermined by lack of direct observation and accurate knowledge. METHOD: The National Board of Medical Examiners conducted a national survey of medical students, interns, residents, chief residents, and fellows to learn the extent to which certain behaviors were observed, to examine beliefs about knowledge of each other's performance, and to assess feedback. RESULTS: Increased direct observation is associated with the perception of more accurate knowledge, which is associated with increased feedback. Some evaluators provide feedback in the absence of accurate knowledge of a trainee's performance, and others who have accurate knowledge miss opportunities for feedback. CONCLUSIONS: Direct observation is a key component of an effective multisource feedback system. Medical educators and residency directors may be well advised to establish explicit criteria specifying a minimum number of observations for evaluations.


Subjects
Educational Measurement/methods; Feedback; Data Collection; Internship and Residency; Students, Medical; United States
8.
J Contin Educ Health Prof ; 29(4): 220-34, 2009.
Article in English | MEDLINE | ID: mdl-19998445

ABSTRACT

INTRODUCTION: Deficiencies in physician competence play an important role in medical errors and poor-quality health care. National trends toward implementation of continuous assessment of physicians hold potential for significant impact on patient care because minor deficiencies can be identified before patient safety is threatened. However, the availability of assessment methods and the quality of existing tools vary, and a better understanding of the types of deficiencies seen in physicians is required to prioritize the development and enhancement of assessment and remediation methods. METHODS: Surveys of physicians and licensing authorities and analysis of the Federation of State Medical Boards (FSMB) Board Action Data Bank were used to collect information describing the nature and types of problems seen in practicing physicians. Focus groups, in-depth interviews with key professional stakeholders, and state medical board site visits provided additional information about deficiencies in physician competence. RESULTS: Quantitative and qualitative analyses identified (1) communication skills as a priority target for assessment approaches, which should also address professional behaviors, knowledge, clinical judgment, and health-care quality; and (2) differences between the regulatory approaches of licensing and certifying bodies that contribute to a culture limiting effective self-assessment and continuous quality improvement. System problems affecting physician performance emerged as an important theme in the qualitative analysis. DISCUSSION: Considering alternative perspectives from the regulatory, education, and practice communities helps to define assessment priorities for physicians, facilitating development of a coherent and defensible approach to assessment and continuing professional development that promises to provide a more comprehensive solution to problems of health-care quality in the United States.


Subjects
Clinical Competence; Education, Medical, Continuing; Educational Measurement; Licensure, Medical; Certification; Data Collection; Focus Groups; Interviews as Topic; Physicians; Social Control, Formal; State Government; United States
9.
Acad Med ; 84(10 Suppl): S86-9, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19907395

ABSTRACT

BACKGROUND: In clinical skills, closely related skills are often combined to form a composite score. For example, history-taking and physical examination scores are typically combined. Interestingly, there is relatively little research to support this practice. METHOD: Multivariate generalizability theory was employed to examine the relationship between history-taking and physical examination scores from the United States Medical Licensing Examination (USMLE) Step 2 Clinical Skills examination. These two proficiencies are currently combined into a data-gathering score. RESULTS: The physical examination score is less generalizable than the score for history taking, and there is only a modest to moderate relationship between these two proficiencies. CONCLUSIONS: A decision about combining physical examination and history-taking proficiencies into one composite score, as well as the weighting of these components, should be driven by the intended use of the score. The choice of weights in combining physical examination and history taking makes a substantial difference in the precision of the resulting score.
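The effect of the weighting choice on composite precision can be illustrated with Mosier's classical formula for the reliability of a weighted composite. The standard deviations, reliabilities, and correlation below are assumed figures for illustration, not estimates from the USMLE data:

```python
def composite_reliability(w1, w2, s1, s2, rel1, rel2, r12):
    """Mosier's reliability of a two-part weighted composite.

    w*: weights, s*: component standard deviations, rel*: component
    reliabilities, r12: observed correlation between the components
    (measurement errors assumed uncorrelated across components).
    """
    cross = 2 * w1 * w2 * s1 * s2 * r12
    true_var = (w1 * s1) ** 2 * rel1 + (w2 * s2) ** 2 * rel2 + cross
    obs_var = (w1 * s1) ** 2 + (w2 * s2) ** 2 + cross
    return true_var / obs_var

# Assumed figures: history taking more generalizable (0.80) than the
# physical exam (0.50), with a modest correlation between them (0.30).
equal_wt = composite_reliability(0.5, 0.5, 1.0, 1.0, 0.80, 0.50, 0.30)
history_wt = composite_reliability(0.7, 0.3, 1.0, 1.0, 0.80, 0.50, 0.30)
```

With these assumed figures, up-weighting the more generalizable component raises the composite reliability (roughly 0.80 versus 0.73 for equal weights), illustrating why the choice of weights "makes a substantial difference in the precision of the resulting score."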


Subjects
Clinical Competence; Educational Measurement; Licensure, Medical; Medical History Taking; Physical Examination; Multivariate Analysis; United States
10.
Acad Med ; 83(10 Suppl): S41-4, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18820498

ABSTRACT

BACKGROUND: This research examined various sources of measurement error in the documentation score component of the United States Medical Licensing Examination (USMLE) Step 2 Clinical Skills examination. METHOD: A generalizability theory framework was employed to examine the documentation ratings for 847 examinees who completed the USMLE Step 2 Clinical Skills examination during an eight-day period in 2006. Each patient note was scored by two different raters allowing for a persons-crossed-with-raters-nested-in-cases design. RESULTS: The results suggest that inconsistent performance on the part of raters makes a substantially greater contribution to measurement error than case specificity. Double scoring the notes significantly increases precision. CONCLUSIONS: The results provide guidance for improving operational scoring of the patient notes. Double scoring of the notes may produce an increase in the precision of measurement equivalent to that achieved by lengthening the test by more than 50%. The study also cautions researchers that when examining sources of measurement error, inappropriate data-collection designs may result in inaccurate inferences.
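The claim that double scoring can rival substantial test lengthening follows from how rater error enters the error variance when raters are nested in cases. A sketch with illustrative variance components (assumed values, not the study's estimates):

```python
def abs_error_variance(var_case, var_rater, n_cases, n_raters):
    """Absolute error variance when raters are nested in cases:
    rater error is averaged over n_cases * n_raters ratings."""
    return var_case / n_cases + var_rater / (n_cases * n_raters)

def g_coefficient(var_person, error_var):
    """Generalizability (reliability-like) coefficient."""
    return var_person / (var_person + error_var)

# Illustrative variance components (assumed, not the study's estimates):
VAR_PERSON, VAR_CASE, VAR_RATER = 1.0, 0.5, 1.5

single = abs_error_variance(VAR_CASE, VAR_RATER, n_cases=10, n_raters=1)
double = abs_error_variance(VAR_CASE, VAR_RATER, n_cases=10, n_raters=2)
```

With these assumed components, double scoring cuts the error variance from 0.20 to 0.125; reaching the same precision with single scoring would require 16 cases instead of 10, a 60% lengthening, which is consistent in spirit with the abstract's "more than 50%" figure.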


Subjects
Clinical Competence; Licensure, Medical; Cohort Studies; Communication; Generalization, Psychological; Humans; Observer Variation; Patient Simulation; Physical Examination; Physician-Patient Relations; Reproducibility of Results; Sensitivity and Specificity; United States