ABSTRACT
Computer-based testing (CBT) has been used in Japan since 2002 to assess medical students' basic and clinical medical knowledge, based on the model core curriculum, before they start clinical clerkships. For effective CBT, multiple-choice questions must accurately assess the knowledge of students. Questions for CBT are submitted by all medical schools in Japan. However, only 40% of submitted questions are accepted for CBT and administered at random; the other 60% are rejected because of poor quality. To improve the ability of medical school staff to devise questions, workshops were held at 30 medical schools. The acceptance rate of questions from schools where workshops had been held increased significantly. The workshops were highly effective in improving the quality of questions.
ABSTRACT
Data from the first trial of the computer-based nationwide common achievement test in medicine, carried out from February through July 2002, were analyzed to evaluate the applicability of item-response theory. The trial test was designed to cover 6 areas of the core curriculum and included a total of 2791 items. For each area, 3 to 40 items were chosen randomly and administered to 5693 students in the fourth to sixth years; the responses of 5676 of these students were analyzed with specifically designed computer systems. Each student was presented with 100 items. The item-response patterns were analyzed with a 3-parameter logistic model (item discrimination, item difficulty, and guessing parameter). The main findings were: 1) Item difficulty and the percentage of correct answers were strongly correlated (r = -0.969 to -0.982). 2) Item discrimination and the point-biserial correlation were moderately correlated (r = 0.304 to 0.511). 3) The estimated abilities and the percentage of correct answers were strongly correlated (r = 0.810 to 0.945). 4) The mean ability increased with school year. 5) The correlation coefficients among the 6 curriculum-area ability scores were less than 0.6. Because the nationwide common achievement test was designed to present items to each student at random, item-response theory can be used to adjust for the differences among test sets. The first trial test was designed without considering item-response theory, but the second trial test was administered with a design better suited for comparison. Results of an analysis of the second trial will be reported soon.
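For readers unfamiliar with the model, the 3-parameter logistic item-response function combines the three item parameters named above (discrimination, difficulty, and guessing). The sketch below is a minimal illustration in Python; the scaling constant D = 1.7 and the example parameter values are assumptions chosen for illustration, not values taken from the trial analysis.

```python
import numpy as np

def p_correct_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3-parameter logistic model.

    theta : examinee ability
    a     : item discrimination
    b     : item difficulty
    c     : guessing parameter (lower asymptote)
    D     : scaling constant (1.7 approximates the normal-ogive metric)
    """
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

# Illustrative item: average difficulty (b = 0), moderate discrimination
# (a = 1.0), and a guessing floor of 0.2 (one chance in five on a
# single-best-answer, five-choice question).
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"theta = {theta:+.1f}  P(correct) = {p_correct_3pl(theta, 1.0, 0.0, 0.2):.3f}")
```

As ability theta increases, the predicted probability of a correct answer rises from the guessing floor c toward 1, which is why item difficulty is expected to correlate negatively with the percentage of correct answers, as reported above.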
ABSTRACT
The first trial of common achievement test-computer-based testing was held from January through August 2002. Of 5,693 examinees, the responses of 5,676 were analyzed. Single-best-answer, five-choice questions were used. The highest score was 92 points, the lowest score was 19 points, and the average score was 55.9 ± 10.2 points (standard deviation). Scores were distributed normally. The test sets did not differ significantly in difficulty, although the items in each test set differed for each student. The percentage of correct answers, the φ-coefficient, and the point-biserial correlation coefficient were calculated for each category of the model core curriculum. The percentage of correct answers was highest in category A of the model core curriculum, and the percentages of correct answers were similar in categories B, C, D, E, and F. The φ-coefficient and the point-biserial correlation coefficient were low in categories A and F and highest in category C. Although the percentage of correct answers in this trial was lower than expected, many test items had discriminatory power. The Test Items Evaluation Subcommittee is now evaluating test items, determining pool items, and revising new test items for the second trial, and it expects to compile a useful item bank.
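As a point of reference, the classical item statistics named here can be computed directly from a 0/1 response matrix. The sketch below is a minimal illustration in Python; the median-split dichotomization used for the φ-coefficient and the small demonstration data are assumptions for illustration only, since the abstract does not describe the subcommittee's exact procedure.

```python
import numpy as np

def item_statistics(responses):
    """Classical item statistics for a 0/1 response matrix (examinees x items).

    Returns, per item: the percentage of correct answers, the point-biserial
    correlation with the total score, and a phi coefficient against a
    high/low split of the total score at the median.
    """
    responses = np.asarray(responses, dtype=float)
    total = responses.sum(axis=1)
    pct_correct = 100.0 * responses.mean(axis=0)

    # Point-biserial: Pearson correlation between each binary item score and
    # the examinee's total score (uncorrected, i.e. the item is included in
    # the total).
    r_pb = np.array([np.corrcoef(responses[:, j], total)[0, 1]
                     for j in range(responses.shape[1])])

    # Phi coefficient: Pearson correlation between each binary item score and
    # a binary high/low indicator from a median split of the total score
    # (the split rule is an assumption for illustration).
    high = (total >= np.median(total)).astype(float)
    phi = np.array([np.corrcoef(responses[:, j], high)[0, 1]
                    for j in range(responses.shape[1])])
    return pct_correct, r_pb, phi

# Tiny worked example: 5 examinees x 3 items, 1 = correct, 0 = incorrect.
demo = np.array([[1, 0, 1],
                 [1, 1, 0],
                 [0, 0, 1],
                 [1, 1, 1],
                 [0, 0, 0]])
print(item_statistics(demo))
```

Items with values of these coefficients near zero add little discriminatory power, which is the sense in which categories A and F performed worse than category C in the trial.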
ABSTRACT
In 2002, Japanese medical students began computer-based testing (CBT) to assess their basic and clinical medical knowledge, based on the model core curriculum, before starting clinical clerkships. Of 9,919 multiple-choice questions submitted by 80 medical schools, 2,791 were used for CBT and 7,128 were rejected. To improve the quality of future CBT, we analyzed why questions were rejected. The most common reasons for rejection were problems with item difficulty, item length, and the choice of answer options. A training course may be needed to improve the ability of medical school staff to devise questions.