1.
Adv Health Sci Educ Theory Pract; 24(1): 141-150, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30362027

ABSTRACT

Research suggests that the three-option format is optimal for multiple choice questions (MCQs). This conclusion is supported by numerous studies showing that most distractors (i.e., incorrect answers) are selected by so few examinees that they are essentially nonfunctional. However, nearly all studies have defined a distractor as nonfunctional if it is selected by fewer than 5% of examinees. A limitation of this definition is that the proportion of examinees available to choose a distractor depends on overall item difficulty. This is especially problematic for mastery tests, which consist of items that most examinees are expected to answer correctly. Based on the traditional definition of nonfunctional, a five-option MCQ answered correctly by greater than 90% of examinees will be constrained to have only one functional distractor. The primary purpose of the present study was to evaluate an index of nonfunctional that is sensitive to item difficulty. A secondary purpose was to extend previous research by studying distractor functionality within the context of professionally-developed credentialing tests. Data were analyzed for 840 MCQs consisting of five options per item. Results based on the traditional definition of nonfunctional were consistent with previous research indicating that most MCQs had one or two functional distractors. In contrast, the newly proposed index indicated that nearly half (47.3%) of all items had three or four functional distractors. Implications for item and test development are discussed.
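To make the two definitions concrete, the sketch below contrasts the traditional rule (a distractor is functional if chosen by at least 5% of all examinees) with a difficulty-sensitive alternative that conditions on the examinees who answered incorrectly. The conditional rule shown is an illustrative assumption, not necessarily the exact index proposed in the study.

```python
# Sketch: traditional vs. difficulty-sensitive distractor functionality.
# Hypothetical five-option item answered correctly by 92% of examinees.
n_examinees = 1000
choice_counts = {"A": 920, "B": 35, "C": 25, "D": 15, "E": 5}  # A is keyed correct

n_incorrect = n_examinees - choice_counts["A"]  # pool available to distractors

for option, count in choice_counts.items():
    if option == "A":
        continue
    traditional = count / n_examinees >= 0.05   # >= 5% of ALL examinees
    conditional = count / n_incorrect >= 0.05   # >= 5% of INCORRECT examinees
    print(f"{option}: n={count:3d}  traditional={traditional}  conditional={conditional}")
```

With only 8% of examinees answering incorrectly, at most one distractor can ever clear the traditional threshold, whereas the conditional rule classifies all four distractors in this example as functional.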


Subjects
Education, Medical/methods; Education, Medical/standards; Educational Measurement/methods; Educational Measurement/standards; Choice Behavior; Humans; Models, Statistical; Psychometrics
2.
Med Teach; 41(8): 854-861, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31017518

ABSTRACT

A test blueprint describes the key elements of a test, including the content to be covered, the amount of emphasis allocated to each content area, and other important features. This article offers practical guidelines for developing test blueprints. We first discuss the role of learning outcomes and behavioral objectives in test blueprinting, and then describe a four-stage process for creating test blueprints. The steps include identifying the major knowledge and skill domains (i.e. competencies); delineating the specific assessment objectives; determining the method of assessment to address those objectives; and establishing the amount of emphasis to allocate to each knowledge or skill domain. The article refers to and provides examples of numerous test blueprints for a wide variety of knowledge and skill domains. We conclude by discussing the role of test blueprinting in test score validation, and by summarizing some of the other ways that test blueprints support instruction and assessment.
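Operationally, a blueprint reduces to a weighted allocation of items across content domains. A minimal sketch, with hypothetical domains, weights, and test length:

```python
# Sketch: a test blueprint as a weighted mapping of content domains to
# item counts. The domains, weights, and test length are all invented.
TOTAL_ITEMS = 200

blueprint = {  # domain -> proportion of total emphasis
    "Cardiovascular": 0.20,
    "Respiratory": 0.15,
    "Gastrointestinal": 0.15,
    "Renal/Genitourinary": 0.10,
    "Neurology": 0.15,
    "Endocrine": 0.10,
    "Multisystem/General": 0.15,
}

assert abs(sum(blueprint.values()) - 1.0) < 1e-9, "emphasis must sum to 100%"

for domain, weight in blueprint.items():
    print(f"{domain:22s} {weight:4.0%} -> {round(weight * TOTAL_ITEMS)} items")
```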


Subjects
Checklist/methods; Educational Measurement/methods; Knowledge; Clinical Competence; Curriculum; Education, Medical, Undergraduate; Humans
3.
Adv Health Sci Educ Theory Pract; 17(3): 325-37, 2012 Aug.
Article in English | MEDLINE | ID: mdl-21964951

ABSTRACT

Examinees who initially fail and later repeat an SP-based clinical skills exam typically exhibit large score gains on their second attempt, suggesting the possibility that examinees were not well measured on one of those attempts. This study evaluates score precision for examinees who repeated an SP-based clinical skills test administered as part of the US Medical Licensing Examination sequence. Generalizability theory was used as the basis for computing conditional standard errors of measurement (SEM) for individual examinees. Conditional SEMs were computed for approximately 60,000 single-take examinees and 5,000 repeat examinees who completed the Step 2 Clinical Skills Examination® between 2007 and 2009. The study focused exclusively on ratings of communication and interpersonal skills. Conditional SEMs for single-take and repeat examinees were nearly indistinguishable across most of the score scale. US graduates and international medical graduates (IMGs) were measured with equal levels of precision at all score levels, as were examinees with differing levels of skill speaking English. There was no evidence that examinees with the largest score changes were measured poorly on either their first or second attempt. The large score increases for repeat examinees on this SP-based exam probably cannot be attributed to unexpectedly large errors of measurement.
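For a simple persons-by-encounters design, a conditional SEM can be estimated from the spread of one examinee's own encounter scores (in the style of Brennan's single-facet estimator). The sketch below assumes that design; the operational analysis may involve additional facets.

```python
import numpy as np

def conditional_sem(ratings):
    """Conditional SEM for one examinee in a persons-x-encounters design:
    the standard error of the examinee's mean across encounters,
    sqrt(sum of squared deviations / (n * (n - 1)))."""
    x = np.asarray(ratings, dtype=float)
    n = len(x)
    return np.sqrt(np.sum((x - x.mean()) ** 2) / (n * (n - 1)))

# Hypothetical communication ratings across 12 SP encounters:
print(conditional_sem([6, 7, 5, 6, 8, 7, 6, 5, 7, 6, 7, 6]))
```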


Subjects
Clinical Competence/standards; Educational Measurement/standards; Physical Examination; Communication; Humans; Licensure; Patient Simulation; Students, Medical; United States
4.
Adv Health Sci Educ Theory Pract; 15(4): 587-600, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20127509

ABSTRACT

The use of standardized patients to assess communication skills is now an essential part of assessing a physician's readiness for practice. To improve the reliability of communication scores, it has become increasingly common in recent years to use statistical models to adjust ratings provided by standardized patients. This study employed ordinary least squares regression to adjust ratings, and then used generalizability theory to evaluate the impact of these adjustments on score reliability and the overall standard error of measurement. In addition, conditional standard errors of measurement were computed for both observed and adjusted scores to determine whether the improvements in measurement precision were uniform across the score distribution. Results indicated that measurement was generally less precise for communication ratings toward the lower end of the score distribution, and the improvement in measurement precision afforded by statistical modeling varied slightly across the score distribution such that the most improvement occurred in the upper-middle range of the score scale. Possible reasons for these patterns in measurement precision are discussed, as are the limitations of the statistical models used for adjusting performance ratings.
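The core adjustment idea: estimate each standardized patient's severity and subtract it from the observed rating. The sketch below uses the simplest one-way version, where group-mean deviations coincide with OLS on rater dummies; the study's actual model and variables are assumptions here.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per examinee-SP encounter.
df = pd.DataFrame({
    "examinee": ["e1", "e1", "e2", "e2", "e3", "e3"],
    "sp":       ["s1", "s2", "s1", "s3", "s2", "s3"],
    "rating":   [7.0, 5.5, 8.0, 6.5, 6.0, 7.5],
})

# Each SP's severity = that SP's mean deviation from the grand mean
# (equivalent to one-way OLS with SP dummies; real models add covariates).
severity = df.groupby("sp")["rating"].mean() - df["rating"].mean()

# Adjusted rating = observed rating minus the rater's estimated severity.
df["adjusted"] = df["rating"] - df["sp"].map(severity)
print(df)
```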


Assuntos
Comunicação , Interpretação Estatística de Dados , Relações Médico-Paciente , Médicos/psicologia , Análise de Variância , Educação Médica , Avaliação Educacional , Escolaridade , Humanos , Análise dos Mínimos Quadrados , Modelos Lineares , Análise de Regressão
5.
Educ Psychol Meas; 80(1): 67-90, 2020 Feb.
Article in English | MEDLINE | ID: mdl-31933493

ABSTRACT

Conventional methods for evaluating the utility of subscores rely on traditional indices of reliability and on correlations among subscores. One limitation of correlational methods is that they do not explicitly consider variation in subtest means. An exception is an index of score profile reliability designated as G, which quantifies the ratio of true score profile variance to observed score profile variance. G has been shown to be more sensitive than correlational methods to group differences in score profile utility. However, it is a group average, representing the expected value over a population of examinees. Just as score reliability varies across individuals and subgroups, one can expect that the reliability of score profiles will vary across examinees. This article proposes two conditional indices of score profile utility grounded in multivariate generalizability theory. The first is based on the ratio of observed profile variance to the profile variance that can be attributed to random error. The second quantifies the proportion of observed variability in a score profile that can be attributed to true score profile variance. The article describes the indices, illustrates their use with two empirical examples, and evaluates their properties with simulated data. The results suggest that the proposed estimators of profile error variance are consistent with the known error in simulated score profiles and that they provide information beyond that provided by traditional measures of subscore utility. The simulation study suggests that artificially large values of the indices could occur for about 5% to 8% of examinees. The article concludes by suggesting possible applications of the indices and discusses avenues for further research.
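A rough single-examinee analogue of the second index can be sketched as the share of observed profile variance not attributable to subscore error. This is a simplification for illustration only; the paper's estimators are derived within multivariate generalizability theory and differ in detail.

```python
import numpy as np

def profile_variance(subscores):
    """Variance of an examinee's subscores around their own mean,
    i.e., how far the profile departs from a flat line."""
    x = np.asarray(subscores, dtype=float)
    return float(np.mean((x - x.mean()) ** 2))

def conditional_profile_index(subscores, error_vars):
    """Crude conditional index: proportion of observed profile variance
    beyond what random error alone would produce. error_vars holds the
    squared SEM of each subscore (assumed known from a G study)."""
    obs = profile_variance(subscores)
    err = float(np.mean(error_vars))  # simplified expected error share
    return max(0.0, 1.0 - err / obs) if obs > 0 else 0.0

# Hypothetical four-subtest profile with known error variances:
print(conditional_profile_index([52, 48, 60, 45], error_vars=[9, 9, 16, 9]))
```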

6.
Acad Med; 93(5): 781-785, 2018 May.
Article in English | MEDLINE | ID: mdl-28930764

ABSTRACT

PURPOSE: In 2007, the United States Medical Licensing Examination embedded multimedia simulations of heart sounds into multiple-choice questions. This study investigated changes in item difficulty as determined by examinee performance over time. The data reflect outcomes obtained following initial use of multimedia items from 2007 through 2012, after which an interface change occurred. METHOD: A total of 233,157 examinees responded to 1,306 cardiology test items over the six-year period; 138 items included multimedia simulations of heart sounds, while 1,168 text-based items without multimedia served as controls. The authors compared changes in difficulty of multimedia items over time with changes in difficulty of text-based cardiology items over time. Further, they compared changes in item difficulty for both groups of items between graduates of Liaison Committee on Medical Education (LCME)-accredited and non-LCME-accredited (i.e., international) medical schools. RESULTS: Examinee performance on cardiology test items with multimedia heart sounds improved by 12.4% over the six-year period, while performance on text-based cardiology items improved by approximately 1.4%. These results were similar for graduates of LCME-accredited and non-LCME-accredited medical schools. CONCLUSIONS: Examinees' ability to interpret auscultation findings in test items that include multimedia presentations increased from 2007 to 2012.
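The comparison amounts to a difference-in-differences on item percent-correct values, with the text-based items serving as controls. A toy sketch whose numbers echo the reported average gains (the table itself is invented):

```python
import pandas as pd

# Hypothetical mean percent-correct by item type and year.
items = pd.DataFrame({
    "year": [2007, 2012, 2007, 2012],
    "type": ["multimedia", "multimedia", "text", "text"],
    "pct_correct": [61.0, 73.4, 70.0, 71.4],
})

# Change per item type over the period, then the net multimedia gain
# relative to the text-based controls.
table = items.pivot(index="type", columns="year", values="pct_correct")
change = table[2012] - table[2007]
print(change)                                   # multimedia +12.4, text +1.4
print("net multimedia gain:", change["multimedia"] - change["text"])
```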


Subjects
Cardiology/education; Education, Medical/methods; Educational Measurement/methods; Heart Auscultation/methods; Simulation Training/statistics & numerical data; Adult; Clinical Competence; Female; Humans; Licensure, Medical; Male; Multimedia; Reading; Simulation Training/methods; United States
7.
Eval Health Prof; 40(2): 151-158, 2017 Jun.
Article in English | MEDLINE | ID: mdl-27760879

ABSTRACT

This study evaluated the extent to which medical students with limited English-language experience are differentially impacted by the additional reading load of test items consisting of long clinical vignettes. Participants included 25,012 examinees who completed Step 2 of the U.S. Medical Licensing Examination®. Test items were categorized into five levels based on the number of words per item, and examinee scores at each level were evaluated as a function of English-language experience (English as a second language [ESL] status and scores on a test of English-speaking proficiency). The longest items were more difficult than the shortest items across all examinee groups, and examinees with more English-language experience scored higher than those with less experience across all five levels of word count. The effect of primary interest, the interaction of word count with English-language experience, was statistically significant, indicating that score declines for longer items were larger for examinees with less English-language experience; however, the magnitude of this interaction effect was barely detectable (η² = .0004, p < .001). Additional analyses supported the conclusion that the differential effect for examinees with less English-language experience was small but worthy of continued monitoring.
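The interaction test can be reproduced in outline with a factorial ANOVA plus an eta-squared effect size. The sketch below simulates data matching the study's design (five word-count bands crossed with ESL status); every name and value is illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "wc_level": rng.integers(1, 6, size=n),  # five word-count bands
    "esl": rng.integers(0, 2, size=n),       # 1 = English as a second language
})
df["score"] = (0.75 - 0.02 * df["wc_level"] - 0.03 * df["esl"]
               - 0.004 * df["wc_level"] * df["esl"]
               + rng.normal(0, 0.08, size=n))

model = ols("score ~ C(wc_level) * C(esl)", data=df).fit()
aov = sm.stats.anova_lm(model, typ=2)

# Eta-squared = the effect's sum of squares over the total sum of squares.
eta_sq = aov["sum_sq"] / aov["sum_sq"].sum()
print(aov)
print("interaction eta^2:", eta_sq["C(wc_level):C(esl)"])
```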


Subjects
Educational Measurement/methods; Educational Measurement/statistics & numerical data; Language; Students, Medical/statistics & numerical data; Clinical Competence; Humans
8.
Acad Med; 92(4): 448-454, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28351062

ABSTRACT

One challenge when implementing case-based learning, and other approaches to contextualized learning, is determining which clinical problems to include. This article illustrates how health care utilization data, readily available from the National Center for Health Statistics (NCHS), can be incorporated into an educational needs assessment to identify medical problems physicians are likely to encounter in clinical practice. The NCHS survey data summarize patient demographics, diagnoses, and interventions for tens of thousands of patients seen in various settings, including emergency departments (EDs), clinics, and hospitals. Selected data from the National Hospital Ambulatory Medical Care Survey: Emergency Department illustrate how instructional materials can be derived from the results of such public-use health care data. Using fever as the reason for visit to the ED, the patient management path is depicted in the form of a case drill-down by exploring the most common diagnoses, blood tests, diagnostic studies, procedures, and medications associated with fever. Although these types of data are quite useful, they should not serve as the sole basis for determining which instructional cases to include. Additional sources of information should be considered to ensure the inclusion of cases that represent infrequent but high-impact problems and those that illustrate fundamental principles that generalize to other cases.
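The drill-down itself is a conditional tabulation: filter visits on the presenting problem, then rank the associated diagnoses and workup elements. The sketch below uses a toy stand-in for a public-use ED extract; the column names are illustrative, not official NHAMCS variable names.

```python
import pandas as pd

# Toy stand-in for an ED visit extract (values invented).
visits = pd.DataFrame({
    "reason_for_visit": ["fever", "chest pain", "fever", "fever", "cough"],
    "diagnosis": ["viral infection", "angina", "otitis media",
                  "pneumonia", "bronchitis"],
    "blood_test": ["CBC", "troponin", "none", "CBC", "none"],
})

# Condition on fever as the reason for visit, then tabulate what follows.
fever = visits[visits["reason_for_visit"].str.contains("fever", case=False)]
for column in ["diagnosis", "blood_test"]:
    print(fever[column].value_counts(), "\n")
```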


Subjects
Databases, Factual; Education, Medical/methods; Health Care Surveys; Health Services/statistics & numerical data; Problem-Based Learning/methods; Ambulatory Care Facilities; Curriculum; Emergency Service, Hospital; Hospitalization; Humans; National Center for Health Statistics, U.S.; Needs Assessment; Patient Acceptance of Health Care; United States
9.
Int J Radiat Oncol Biol Phys; 53(3): 729-34, 2002 Jul 1.
Article in English | MEDLINE | ID: mdl-12062619

ABSTRACT

PURPOSE: To determine if graduates of different types of educational programs obtain similar scores on the Examination in Radiation Therapy administered by the American Registry of Radiologic Technologists. The results will help inform discussions regarding educational requirements for radiation therapists. METHODS AND MATERIALS: Test scores were obtained for 531 candidates who had taken the examination for the first time in 1997, 1998, or 1999. Candidates were divided into the following three categories, based on the type of educational program attended: hospital-based certificate, associate's degree, or bachelor's degree. To determine if test scores were related to the type of educational preparation, analyses of variance were conducted separately to test for differences in total scores, section scores, and scores on test questions intended to measure critical thinking skills. RESULTS: Candidates with an associate's degree scored slightly lower than candidates with a bachelor's degree on the total test (p < 0.10) and lower than candidates with either a certificate or bachelor's degree on Section B of the examination (Treatment Planning and Delivery, p < 0.10). Baccalaureate candidates did not obtain higher scores than those prepared in certificate programs. On critical thinking questions, candidates with certificates scored higher than those with associate's degrees (p < 0.10). Some evidence suggested that candidates with a certificate scored higher on critical thinking than those with a bachelor's degree (p < 0.10), and that candidates with a bachelor's degree scored higher than candidates with an associate's degree (p < 0.10). CONCLUSIONS: Although some of the differences in the mean test scores among the three educational groups were statistically significant, all differences were small and do not support one type of educational preparation over another.
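The core analysis is a one-way analysis of variance of scores across the three program types, as in this minimal sketch with invented scores:

```python
from scipy import stats

# Hypothetical total-test scores for first-time candidates by program type.
certificate = [82, 85, 79, 88, 84, 81, 86]
associate   = [78, 80, 83, 77, 81, 79, 82]
bachelor    = [84, 83, 87, 80, 85, 82, 88]

f_stat, p_value = stats.f_oneway(certificate, associate, bachelor)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```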


Assuntos
Certificação/estatística & dados numéricos , Avaliação Educacional/estatística & dados numéricos , Escolaridade , Tecnologia Radiológica/educação , Análise de Variância , Certificação/normas , Avaliação Educacional/normas , Humanos , Tecnologia Radiológica/normas , Pensamento
10.
Int J Radiat Oncol Biol Phys; 56(5): 1405-13, 2003 Aug 1.
Article in English | MEDLINE | ID: mdl-12873687

ABSTRACT

PURPOSE: To determine whether radiation therapy department administrators prefer to hire graduates with certain types of educational preparation. The study was undertaken by the American Registry of Radiologic Technologists as part of a larger project to determine educational requirements for radiation therapists. METHODS AND MATERIALS: Forty-one department administrators evaluated applications from a pool of 984 hypothetical applicants for the position of radiation therapist. Applications were created by systematically varying eight characteristics such as years of experience, quality of educational program, and ratings from prior references. Type of educational program (baccalaureate degree, associate's degree, or hospital certificate) was of particular interest in this study. Each administrator evaluated 24 applications and assigned a rating ranging from 1 to 5 to indicate the extent to which he or she desired to interview each applicant. All ratings and applicant characteristics were coded and subjected to regression-type analyses to determine the relative importance of each applicant characteristic to administrators' decision-making policies. RESULTS: Information obtained from applicant references had the greatest impact on administrators' evaluations of applicant quality. Specifically, reference ratings of cooperation and technical skills were the two most important characteristics, followed closely by reference ratings of interpersonal skills and dependability. Quality of educational program had some influence, as did years of experience. Type of educational program had virtually no impact on interview decisions for the vast majority of administrators. CONCLUSIONS: When making hiring decisions about hypothetical applicants, department administrators place most emphasis on evidence relating to past performance and give almost no weight to type of educational preparation. The extent to which these results generalize to actual applicants is addressed in the article.
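This is a policy-capturing design: regress the interview ratings on the manipulated applicant characteristics and compare standardized coefficients as measures of importance. A simulated sketch (the characteristic names and weights are invented to mirror the reported pattern):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

n = 24 * 41  # 24 applications rated by each of 41 administrators
names = ["ref_cooperation", "ref_technical", "ref_interpersonal",
         "ref_dependability", "program_type"]
X = rng.integers(1, 6, size=(n, len(names))).astype(float)

# Simulated policy: reference ratings dominate; program type adds ~nothing.
y = X @ np.array([0.60, 0.55, 0.45, 0.40, 0.02]) + rng.normal(0, 1, n)

# Standardize both sides so coefficients are directly comparable.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
yz = (y - y.mean()) / y.std()
for name, beta in sorted(zip(names, LinearRegression().fit(Xz, yz).coef_),
                         key=lambda t: -abs(t[1])):
    print(f"{name:18s} {beta:+.2f}")
```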


Assuntos
Certificação , Escolaridade , Radioterapia , Tecnologia Radiológica/educação , Seguimentos , Humanos , Seleção de Pessoal , Tecnologia Radiológica/normas
11.
J Allied Health; 33(2): 95-103, 2004.
Article in English | MEDLINE | ID: mdl-15239407

ABSTRACT

As the practice of cardiovascular interventional technology (CVIT) has evolved over the last 50 years, so has the role of radiographers employed in this specialty. In 1991, the American Registry of Radiologic Technologists (ARRT) initiated a certification program to recognize radiologic technologists practicing in CVIT. The certification program consisted of a single examination that covered all aspects of CVIT (e.g., neurologic, cardiac, genitourinary). In 2000, the ARRT conducted a study to investigate further the nature of subspecialization occurring within CVIT. A comprehensive job analysis questionnaire was developed that consisted of 137 clinical activities organized into 19 general domains of practice. The questionnaire was completed by a national sample of 848 radiologic technologists working in CVIT, who indicated the frequency with which they performed each of the 137 activities. Responses were subjected to cluster analysis to classify technologists into homogeneous groups corresponding to different CVIT subspecialties. Results indicated that CVIT consists of two major subspecialties: one corresponding to cardiac procedures and one corresponding to procedures involving organ systems other than the heart. Other smaller subspecialties also emerged from the cluster analysis. A multidimensional scaling of the profiles suggested that CVIT subspecialization can be explained by two dimensions: (1) whether the procedures are diagnostic or interventional and (2) the type of organ system involved. The findings are discussed in terms of their implications for education, certification, and performance evaluation.
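In outline, the analysis clusters technologists on their 137-element activity-frequency profiles and then scales profiles into a low-dimensional space. A sketch with random stand-in data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import MDS

rng = np.random.default_rng(2)

# Stand-in for the survey data: 848 technologists x 137 activity frequencies.
profiles = rng.random((848, 137))

# Group technologists with similar activity profiles (two major clusters
# were reported; the study also examined smaller subspecialty groups).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(profiles)
print(np.bincount(labels))

# Multidimensional scaling of a subsample of profiles into two dimensions,
# echoing the paper's two-dimensional interpretation of subspecialization.
coords = MDS(n_components=2, random_state=0).fit_transform(profiles[:50])
print(coords[:5])
```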


Assuntos
Doenças Cardiovasculares/diagnóstico por imagem , Medicina/estatística & dados numéricos , Especialização , Análise por Conglomerados , Humanos , Radiografia Intervencionista , Inquéritos e Questionários , Estados Unidos
12.
Acad Med; 88(5): 688-92, 2013 May.
Article in English | MEDLINE | ID: mdl-23524920

ABSTRACT

PURPOSE: Previous studies on standardized patient (SP) exams reported score gains both across attempts when examinees failed and retook the exam and over multiple SP encounters within a single exam session. The authors analyzed the within-session score gains of examinees who repeated the United States Medical Licensing Examination Step 2 Clinical Skills to answer two questions: How much do scores increase within a session? Can the pattern of increasing first-attempt scores account for across-session score gains? METHOD: Data included encounter-level scores for 2,165 U.S. and Canadian medical students and graduates who took Step 2 Clinical Skills twice between April 1, 2005 and December 31, 2010. The authors modeled examinees' score patterns using smoothing and regression techniques and applied statistical tests to determine whether the patterns were the same or different across attempts. In addition, they tested whether any across-session score gains could be explained by the first-attempt within-session score trajectory. RESULTS: For the first and second attempts, the authors attributed examinees' within-session score gains to a pattern of score increases over the first three to six SP encounters followed by a leveling off. Model predictions revealed that the authors could not attribute the across-session score gains to the first-attempt within-session score gains. CONCLUSIONS: The within-session score gains over the first three to six SP encounters of both attempts indicate that there is a temporary "warm-up" effect on performance that "resets" between attempts. Across-session gains are not due to this warm-up effect and likely reflect true improvement in performance.
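The warm-up pattern can be exposed by smoothing encounter scores against within-session position. The sketch below simulates a rise-then-plateau trajectory and applies LOWESS smoothing, in the spirit of the techniques the authors describe; all parameters are invented.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(3)

# Hypothetical encounter-level scores for 100 examinees: positions 1-12
# within a session, rising over early encounters and then leveling off.
position = np.tile(np.arange(1, 13), 100)
score = 70 + 4 * (1 - np.exp(-position / 3)) + rng.normal(0, 3, position.size)

# Smooth score against encounter position to expose the trajectory.
smoothed = lowess(score, position, frac=0.5)  # (position, smoothed) pairs
print(smoothed[::100][:12])                   # one row per encounter position
```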


Subjects
Educational Measurement/methods; Licensure, Medical; Physical Examination/standards; Canada; Clinical Competence/standards; Clinical Competence/statistics & numerical data; Educational Measurement/standards; Educational Measurement/statistics & numerical data; Humans; Models, Statistical; Regression Analysis; United States
13.
Acad Med; 86(10 Suppl): S59-62, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21955771

ABSTRACT

BACKGROUND: Studies completed over the past decade suggest the presence of a gap between what students learn during medical school and their clinical responsibilities as first-year residents. The purpose of this survey was to verify on a large scale the responsibilities of residents during their initial months of training. METHOD: Practice analysis surveys were mailed in September 2009 to 1,104 residency programs for distribution to an estimated 8,793 first-year residents. Surveys were returned by 3,003 residents from 672 programs; 2,523 surveys met inclusion criteria and were analyzed. RESULTS: New residents performed a wide range of activities, from routine but important communications (e.g., obtaining informed consent) to complex procedures (e.g., thoracentesis), often without the attending physician present or otherwise involved. CONCLUSIONS: Medical school curricula and the content of competence assessments prior to residency should consider more thorough coverage of the complex knowledge and skills required early in residency.


Subjects
Internship and Residency; Professional Practice; Communication; Data Collection; United States
14.
Acad Med; 86(10): 1253-9, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21869669

ABSTRACT

PURPOSE: Prior studies report large score gains for examinees who fail and later repeat standardized patient (SP) assessments. Although research indicates that score gains on SP exams cannot be attributed to memorizing previous cases, no studies have investigated the empirical validity of scores for repeat examinees. This report compares single-take and repeat examinees in terms of both internal (construct) validity and external (criterion-related) validity. METHOD: Data consisted of test scores for examinees who took the United States Medical Licensing Examination Step 2 Clinical Skills (CS) exam between July 16, 2007, and September 12, 2009. The sample included 12,090 examinees who completed Step 2 CS on one occasion and another 4,030 examinees who completed the exam on two occasions. The internal measures included four separately scored performance domains of the Step 2 CS examination, whereas the external measures consisted of scores on three written assessments of medical knowledge (Step 1, Step 2 clinical knowledge, and Step 3). The authors subjected the four Step 2 CS domains to confirmatory factor analysis and evaluated correlations between Step 2 CS scores and the three written assessments for single-take and repeat examinees. RESULTS: The factor structure for repeat examinees on their first attempt was markedly different from the factor structure for single-take examinees, but it became more similar to that for single-take examinees by their second attempt. Scores on the second attempt correlated more highly with all three external measures. CONCLUSIONS: The findings support the validity of scores for repeat examinees on their second attempt.
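The external-validity check reduces to comparing correlations between CS scores and the written scores across groups. A simulated sketch of that comparison (the generating values are invented; only the structure mirrors the study):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)

def make_group(n, cs_signal):
    """Simulate one group: a common ability drives all four scores;
    cs_signal controls how strongly CS scores track that ability."""
    ability = rng.normal(0, 1, n)
    return pd.DataFrame({
        "cs_total": cs_signal * ability + rng.normal(0, 1, n),
        "step1": ability + rng.normal(0, 1, n),
        "step2_ck": ability + rng.normal(0, 1, n),
        "step3": ability + rng.normal(0, 1, n),
    })

for label, cs_signal in [("single-take", 0.8), ("repeat, 2nd attempt", 0.7)]:
    g = make_group(2000, cs_signal)
    r = g.corr().loc["cs_total", ["step1", "step2_ck", "step3"]]
    print(label, r.round(2).tolist())
```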


Subjects
Clinical Competence; Education, Medical, Undergraduate/standards; Educational Measurement/methods; Licensure, Medical/standards; Physical Examination/standards; Female; Humans; Male; Patient Simulation; Reproducibility of Results; Retrospective Studies; United States
15.
Eval Health Prof; 33(3): 386-403, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20801978

ABSTRACT

Years of research with high-stakes written tests indicate that although repeat examinees typically experience score gains between their first and subsequent attempts, their pass rates remain considerably lower than pass rates for first-time examinees. This outcome is consistent with expectations. Comparable studies of the performance of repeat examinees on oral examinations are lacking. The current research evaluated pass rates for more than 50,000 examinees on written and oral exams administered by six medical specialty boards over several recent years. Pass rates for first-time examinees were similar for both written and oral exams, averaging about 84% across all boards. Pass rates for repeat examinees on written exams were expectedly lower, ranging from 22% to 51%, with an average of 36%. However, pass rates for repeat examinees on oral exams were markedly higher than for written exams, ranging from 53% to 77%, with an average of 65%. Four explanations for the elevated repeat pass rates on oral exams are proposed: an increase in examinee proficiency, construct-irrelevant variance, measurement error (score unreliability), and memorization of test content. Simulated data are used to demonstrate that roughly one third of the score increase can be explained by measurement error alone. The authors suggest that a substantial portion of the score increase can also likely be attributed to construct-irrelevant variance. Results are discussed in terms of their implications for making pass-fail decisions when retesting is allowed. The article concludes by identifying areas for future research.
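The measurement-error explanation can be demonstrated with a short Monte Carlo in the spirit of the authors' simulation: examinees who fail are selected partly for unlucky negative errors, so fresh errors on the retake produce score gains and elevated pass rates even when proficiency is unchanged. All parameters below are invented.

```python
import numpy as np

rng = np.random.default_rng(4)

n, reliability, cut = 100_000, 0.85, -1.0   # cut gives a ~16% base fail rate
true = rng.normal(0, np.sqrt(reliability), n)
err_sd = np.sqrt(1 - reliability)           # observed-score variance = 1

first = true + rng.normal(0, err_sd, n)
failed = first < cut

# Retest only the failers with fresh error; true proficiency is unchanged.
second = true[failed] + rng.normal(0, err_sd, failed.sum())

print("mean gain on retake:", round(float((second - first[failed]).mean()), 3))
print("repeat pass rate:", round(float((second >= cut).mean()), 3))
```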


Assuntos
Competência Clínica/estatística & dados numéricos , Avaliação Educacional/estatística & dados numéricos , Licenciamento em Medicina/estatística & dados numéricos , Conselhos de Especialidade Profissional/estatística & dados numéricos , Estudantes de Medicina/estatística & dados numéricos , Redação , Competência Clínica/normas , Escolaridade , Humanos , Psicometria , Análise de Regressão , Conselhos de Especialidade Profissional/normas , Análise e Desempenho de Tarefas , Estados Unidos
16.
Acad Med; 84(10 Suppl): S83-5, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19907394

ABSTRACT

BACKGROUND: Previous research has shown that ratings of English proficiency on the United States Medical Licensing Examination Clinical Skills Examination are highly reliable. However, the score distributions for native and nonnative speakers of English are sufficiently different to suggest that reliability should be investigated separately for each group. METHOD: Generalizability theory was used to obtain reliability indices separately for native and nonnative speakers of English (N = 29,084). Conditional standard errors of measurement were also obtained for both groups to evaluate measurement precision for each group at specific score levels. RESULTS: Overall indices of reliability (phi) exceeded 0.90 for both native and nonnative speakers, and both groups were measured with nearly equal precision throughout the score distribution. However, measurement precision decreased at lower levels of proficiency for all examinees. CONCLUSIONS: The results of this and future studies may be helpful in understanding and minimizing sources of measurement error at particular regions of the score distribution.
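For a persons-by-raters design, the phi (dependability) coefficient reported here is computed from estimated variance components, as in the sketch below; the component values are invented.

```python
# Sketch: G-theory phi coefficient for a persons-x-raters design.
var_person = 0.90      # true variance among examinees
var_rater = 0.05       # rater main effect (severity differences)
var_residual = 0.40    # person-x-rater interaction confounded with error
n_raters = 8           # ratings averaged per examinee

# Phi uses absolute error: every non-person component counts against it.
phi = var_person / (var_person + (var_rater + var_residual) / n_raters)
print(f"phi = {phi:.3f}")  # ~0.94, above the 0.90 level reported
```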


Subjects
Clinical Competence; Educational Measurement; Language; Licensure, Medical; Clinical Competence/statistics & numerical data; Educational Measurement/statistics & numerical data; United States