1.
Med Teach ; 31(5): 442-6, 2009 May.
Article in English | MEDLINE | ID: mdl-18608946

ABSTRACT

BACKGROUND: High stakes postgraduate specialist certification examinations have considerable implications for candidates' future careers. The cut score (i.e. the pass/fail mark) of such examinations needs to be determined in a defensible and credible manner. A number of methods, suitable for use with numeric scoring methods, have been described. Determining the cut score of letter-graded examinations is, however, not described in the literature. AIM: The aim of this study was to determine a defensible and credible method for deriving the cut score of a letter-graded examination. METHOD: The cut score of the Fellowship examination of the College of Physicians of South Africa was estimated using a novel method. This method was validated by comparing the results obtained to those obtained using the contrasting groups method. RESULTS: By using the examiners' decision as the 'gold standard' we found that a cut score of 50% best approximated the cutpoint of this letter-graded examination, achieving a sensitivity and specificity of 83.7% and 82.8% respectively. CONCLUSION: This paper describes a useful strategy for estimating the cut score of letter-graded examinations.


Subject(s)
Education, Medical, Graduate , Educational Measurement/methods , Clinical Competence/standards , Humans , South Africa , Specialization
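
Illustrative note on record 1: the validation step described in the abstract, comparing a candidate cut score with the examiners' decision, can be sketched as follows. This is a minimal sketch and not the authors' procedure. The function name, the scores, and the examiner decisions are all hypothetical, and treating "pass" as the positive class is an assumption.

```python
def sensitivity_specificity(scores, examiner_pass, cut):
    """Compare the rule 'score >= cut' with the examiners' pass/fail decision.

    'Pass' is treated as the positive class (an assumption for this sketch).
    """
    tp = fp = tn = fn = 0
    for score, passed in zip(scores, examiner_pass):
        predicted_pass = score >= cut
        if predicted_pass and passed:
            tp += 1  # cut score and examiners both pass the candidate
        elif predicted_pass and not passed:
            fp += 1  # cut score passes, examiners fail
        elif not predicted_pass and passed:
            fn += 1  # cut score fails, examiners pass
        else:
            tn += 1  # both fail the candidate
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical percentage scores and examiner decisions (not study data).
scores = [42, 48, 50, 55, 61, 47, 52, 66]
examiner_pass = [False, False, True, True, True, True, False, True]

sens, spec = sensitivity_specificity(scores, examiner_pass, cut=50)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Repeating the call over a range of candidate cut scores and keeping the one that best balances the two values is one simple way to locate a cutpoint under these assumptions.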
2.
Adv Health Sci Educ Theory Pract ; 13(4): 521-33, 2008 Nov.
Article in English | MEDLINE | ID: mdl-17476579

ABSTRACT

High stakes postgraduate specialist certification examinations have considerable implications for the future careers of examinees. Medical colleges and professional boards have a social and professional responsibility to ensure their fitness for purpose. To date there is a paucity of published data about the reliability of specialist certification examinations and objective methods for improvement. Such data are needed to improve current assessment practices and sustain the international credibility of specialist certification processes. To determine the component and composite reliability of the Fellowship examination of the College of Physicians of South Africa, and identify strategies for further improvement, generalizability and multivariate generalizability theory were used to estimate the reliability of examination subcomponents and the overall reliability of the composite examination. Decision studies were used to identify strategies for improving the composition of the examination. Reliability coefficients of the component subtests ranged from 0.58 to 0.64. The composite reliability of the examination was 0.72. This could be increased to 0.8 by weighting all test components equally or increasing the number of patient encounters in the clinical component of the examination. Correlations between examination components were high, suggesting that similar parameters of competence were being assessed. This composite certification examination, if equally weighted, achieved an overall reliability sufficient for high stakes examination purposes. Increasing the weighting of the clinical component decreased the reliability. This could be rectified by increasing the number of patient encounters in the examination. Practical ways of achieving this are suggested.


Subject(s)
Certification , Clinical Competence , Education, Medical , Educational Measurement/methods , Licensure , Specialization , Humans , Reproducibility of Results , South Africa
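
Illustrative note on record 2: the abstract reports that reliability could be raised by increasing the number of patient encounters. The study used generalizability and multivariate generalizability theory. The sketch below substitutes the simpler Spearman-Brown prophecy formula, which covers only the single-facet case, to show the general idea. The starting reliability of 0.60 is a hypothetical value inside the reported component range, not a figure attributed to any specific subtest.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


def length_factor_needed(reliability, target):
    """Lengthening factor required to move from reliability to target."""
    return target * (1 - reliability) / (reliability * (1 - target))


current = 0.60  # hypothetical component reliability
print(spearman_brown(current, 2))          # reliability with twice as many encounters
print(length_factor_needed(current, 0.8))  # how much longer the component must be for 0.8
```

The formula assumes the added encounters are parallel to the existing ones, which a full decision study would not need to assume.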
4.
Qual Life Res ; 16(5): 815-22, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17351823

ABSTRACT

BACKGROUND: Little consensus exists regarding the most appropriate measure of responsiveness. While most indices are variants on Cohen's effect size, the mathematical relationships among these indices have not been elucidated. Consequently, the health-related quality of life (HRQL) literature contains many publications in which a variety of different indices are computed and differences among them noted. These differences are completely predictable when the underlying analytical form of each coefficient is explicated. METHODS: In this paper, we begin with a mathematical analysis of the variance components underlying an observed change score. From this, we determine analytically the relationships among the more commonly used indices of responsiveness. CONCLUSIONS: Based on this analysis, we conclude that Cohen's effect size and the Standardized Response Mean are the two most appropriate measures, as each provides unique information and each best captures an important relation between treatment effect and variability in response. However, the latter should be interpreted with caution, as under some circumstances, any measure based on variability in change scores can give misleading information. On this basis, we recommend that future analysis of responsiveness be restricted to the Cohen effect size to ensure interpretability and comparability with treatment effects in other domains.


Subject(s)
Psychometrics/statistics & numerical data , Quality of Life , Sickness Impact Profile , Treatment Outcome , Analysis of Variance , Biomedical Research , Humans , Models, Statistical , Reproducibility of Results
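
Illustrative note on record 4: the two indices the abstract singles out differ only in their denominator. Cohen's effect size divides the mean change by the standard deviation of baseline scores, while the Standardized Response Mean divides it by the standard deviation of the change scores. The sketch below uses these standard definitions. The data and function names are hypothetical.

```python
from statistics import mean, stdev


def cohen_effect_size(baseline, follow_up):
    """Mean change divided by the SD of the baseline scores."""
    change = [f - b for b, f in zip(baseline, follow_up)]
    return mean(change) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    change = [f - b for b, f in zip(baseline, follow_up)]
    return mean(change) / stdev(change)


# Hypothetical HRQL scores for five patients before and after treatment.
baseline = [10, 12, 14, 16, 18]
follow_up = [13, 14, 18, 19, 23]

print(cohen_effect_size(baseline, follow_up))
print(standardized_response_mean(baseline, follow_up))
```

When baseline and follow-up variances are equal and the scores correlate at r, the variance of the change scores is 2(1 - r) times the baseline variance, so the two indices diverge more as r approaches 1. This is why an index based on change-score variability can look large even for a modest treatment effect.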
7.
Med Care ; 39(10): 1039-47, 2001 Oct.
Article in English | MEDLINE | ID: mdl-11567167

ABSTRACT

BACKGROUND: Approaches to interpretation of quality of life changes in clinical trials have fallen into two camps: those that rely on the distribution of changes and the Effect Size (ES), and those that use some external anchor, such as patient judgments of change, which is then used to compute a Minimally Important Difference (MID), the proportion benefiting from treatment, p(B), and the Number Needed to Treat (NNT). OBJECTIVE: To examine the relationship between the ES and p(B), and the impact of the MID on this relationship. METHODS: Simulation was used based on a normal distribution to compute the proportion of patients benefiting in both parallel group and crossover designs, for various values of the ES and the MID. The agreement of the simulation with empirical data from four studies of asthma and respiratory disease was assessed. The effect of skewness in the distributions of change scores on the relationship between ES and p(B) was also examined. RESULTS: The simulation showed a near-linear relationship between ES and p(B), which was nearly independent of the value of the MID. Agreement of the simulation with the empirical data was excellent. Although the curves differed for crossover and parallel group designs, the general form was similar. Introducing moderate skew into the distributions had minimal impact on the relationship. CONCLUSIONS: The proportion of patients who will benefit from treatment can be directly estimated from the ES, and is nearly independent of the choice of MID. Effect size and anchor based approaches provide equivalent information in this situation.


Subject(s)
Chronic Disease/rehabilitation , Models, Statistical , Outcome Assessment, Health Care/methods , Quality of Life , Research Design/statistics & numerical data , Data Interpretation, Statistical , Humans , Lung Diseases, Obstructive/rehabilitation , Quality-Adjusted Life Years , Self Efficacy , Treatment Outcome
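
Illustrative note on record 7: the abstract does not give the formula used for p(B), so the sketch below is a simplified stand-in rather than the authors' simulation. It assumes change scores are normally distributed with unit standard deviation, with mean 0 in the control group and mean ES in the treated group, and it reports the excess share of treated patients whose change exceeds the MID. The function name and the parameter values are hypothetical.

```python
from statistics import NormalDist


def excess_proportion_improved(effect_size, mid):
    """Share of treated minus share of control patients whose change exceeds mid.

    Both effect_size and mid are expressed in SD units (an assumption).
    """
    control = NormalDist(mu=0.0, sigma=1.0)
    treated = NormalDist(mu=effect_size, sigma=1.0)
    p_control = 1.0 - control.cdf(mid)
    p_treated = 1.0 - treated.cdf(mid)
    return p_treated - p_control


# Tabulate the simplified quantity for a few hypothetical ES and MID values.
for es in (0.2, 0.5, 0.8):
    row = [round(excess_proportion_improved(es, mid), 3) for mid in (0.25, 0.5)]
    print(es, row)
```

Looping over several ES and MID values in this way lets a reader inspect how strongly this simplified quantity depends on each input.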
8.
Eur Respir J ; 18(1): 38-44, 2001 Jul.
Article in English | MEDLINE | ID: mdl-11510803

ABSTRACT

With interest in health economics growing, there is a demand for valid methods for measuring health-related quality of life (HRQL) in asthma using utilities. The aims of this study were to develop disease-specific versions of the standard gamble and rating scale, to compare their measurement properties with those of the Asthma Quality of Life Questionnaire (AQLQ) and the Medical Outcomes Survey Short-Form 36 (SF-36), as well as to determine their validity for assessing asthma-specific quality of life. Forty adults with symptomatic asthma participated in a 9-week observational study. Participants completed the standard gamble, rating scale, AQLQ, SF-36 and other measures of clinical asthma status at baseline and after 1, 5 and 9 weeks. In patients whose asthma was stable between assessments, reliability was good for the rating scale (intraclass correlation coefficient (ICC)=0.89) and the AQLQ (ICC=0.95) but more modest for the SF-36 mental score (ICC=0.68), SF-36 physical score (ICC=0.65) and standard gamble (ICC=0.59). The responsiveness index was highest for the AQLQ (1.35), followed by the rating scale (0.74), the physical score of the SF-36 (0.61) and the standard gamble (0.31). Construct validity (correlation with other indices of health status) was strongest for the AQLQ and the rating scale. In conclusion, both the disease-specific rating scale and the Asthma Quality of Life Questionnaire have strong measurement properties for measuring asthma-specific quality of life; the Short-Form 36 health survey physical summary score has more modest properties. Although the disease-specific standard gamble has acceptable discriminative properties, its evaluative properties are inadequate for use in cost/utility analyses. Poor correlation between the standard gamble and the rating scale indicates that utilities cannot be derived from rating scale data.


Subject(s)
Asthma/psychology , Quality of Life , Sickness Impact Profile , Activities of Daily Living/psychology , Adaptation, Psychological , Adolescent , Adult , Aged , Female , Health Status Indicators , Humans , Male , Middle Aged , Psychometrics , Reproducibility of Results , Sick Role
9.
Med Educ ; 35(6): 530-6, 2001 Jun.
Article in English | MEDLINE | ID: mdl-11380854

ABSTRACT

CONTEXT: Tuberculosis is one of the most common infectious diseases worldwide and is responsible for the largest number of deaths from a single infectious cause. OBJECTIVE: The objective of this study was to compare the knowledge of and practices regarding tuberculosis in final-year medical students at schools from endemic and non-endemic areas. SUBJECTS: Final-year medical students at McMaster University in Canada, the Christian Medical College in India, and Makerere University in Uganda. METHODS: A questionnaire consisting of 20 multiple-choice questions assessing knowledge, practices, and exposure. A total knowledge score (maximum=13) and a total practice score (maximum=5) were created for each study site. RESULTS: 160 questionnaires were returned; the response rate was 68.4% (65/95) for McMaster University, 39.7% (23/58) for the Christian Medical College and 78.3% (72/92) for Makerere University. Students from Makerere University had the highest knowledge scores but differences were non-significant after adjustment for patient exposure and curriculum time (F(2,153)=1.80, P=0.16). Differences in practice scores, however, remained significant after adjusting for curriculum time and patient exposure (F(2,153)=5.14, P=0.006). Knowledge score (F(1,156)=5.05, P=0.02), patient exposure (F(1,153)=9.11, P=0.003), and curriculum time and patient exposure (F(2,153)=5.14, P=0.006) were statistically significant positive predictors of the total practice score. CONCLUSIONS: This study demonstrated significant differences in undergraduate exposure to tuberculosis, total knowledge, and practice competency at three medical schools in Canada, India, and Uganda. In general, the knowledge base and practice competency of all three graduating classes were adequate.


Subject(s)
Clinical Competence/standards , Students, Medical , Tuberculosis , Analysis of Variance , Canada , Curriculum , Health Knowledge, Attitudes, Practice , Humans , India , Surveys and Questionnaires , Uganda
10.
J Clin Epidemiol ; 54(6): 571-9, 2001 Jun.
Article in English | MEDLINE | ID: mdl-11377117

ABSTRACT

Reducing the number of items in a health-related quality of life instrument will enhance efficiency. However, it is important to maintain measurement properties. We determined the effect of reducing items from each domain (dyspnea, fatigue, emotion and mastery) of the 20-item Chronic Respiratory Questionnaire (CRQ). Three randomized trials of respiratory rehabilitation provided data. We removed items one at a time from each domain in three orders: by item impact, item responsiveness, and at random. Responsiveness, test-retest reliability and construct validity were evaluated at each step. Responsiveness and reliability, evaluated by intraclass correlation coefficients (ICC), were reduced marginally as the number of items was reduced to two items per domain. The deterioration was greatest when reducing from two items to one. To detect a particular effect, sample size would increase by about 10% when reducing the number of items in a domain to 2. Construct validity showed a more marked deterioration. Reducing to two items per domain would maintain responsiveness and reliability of the CRQ at an acceptable level, with a trade-off of reduced construct validity and increase in sample size requirements.


Subject(s)
Health Care Surveys/methods , Lung Diseases, Obstructive/therapy , Quality of Life , Respiratory Therapy , Factor Analysis, Statistical , Humans , Quality-Adjusted Life Years , Reproducibility of Results , Surveys and Questionnaires
11.
Teach Learn Med ; 13(2): 110-6, 2001.
Article in English | MEDLINE | ID: mdl-11302031

ABSTRACT

BACKGROUND: Medical diagnosis may be thought of as a categorization task. Research and theory in psychology as well as medical decision making indicate at least 2 processes by which this categorization task may be accomplished: (a) analytic processing, in which one makes explicit use of clinical features to reach a diagnosis, and (b) similarity-based processing, in which one makes use of past exemplars to reach a clinical diagnosis. Recent research indicates that these 2 processes are complementary. PURPOSE: We investigate the coordination of analytic and similarity-based processes in clinical decision making to examine if the relative reliance on these 2 processes is (a) amenable to instruction and (b) dependent on level of clinical experience. METHODS: The reliance on these 2 processes was indexed by the performance of 12 preclinical medical students on cases dichotomized as typical and atypical (analytic processing) and on cases dichotomized as similar or dissimilar to cases seen previously in a training phase (similarity-based processing). RESULTS: The results indicated that both processes are operative. Of particular interest was that preclinical medical students enhanced their performance by adopting a similarity-based strategy. This was especially so for atypical cases. These results are in contrast to residents, who enhanced their performance by adopting an analytic strategy. CONCLUSIONS: The relative reliance on analytic and similarity-based processes is amenable to instruction and dependent on expertise.


Subject(s)
Diagnosis, Differential , Judgment , Skin Diseases/diagnosis , Students, Medical/psychology , Clinical Competence , Decision Making , Education, Medical , Humans , Ontario , Schools, Medical
14.
Med Educ ; 34(9): 721-8, 2000 Sep.
Article in English | MEDLINE | ID: mdl-10972750

ABSTRACT

In a recent review article, Colliver concluded that there was no convincing evidence that problem-based learning was more effective than conventional methods. He then went on to lay part of the blame on cognitive psychology, claiming that 'the theory is weak, its theoretical concepts are imprecise ... the basic research is contrived and ad hoc'. This paper challenges these claims and presents evidence that (a) cognitive research is not contrived and irrelevant, (b) curriculum level interventions are doomed to fail and (c) education needs more theory-based research.


Subject(s)
Clinical Competence , Education, Medical, Undergraduate/methods , Problem-Based Learning/standards , Cognition , Curriculum , Evaluation Studies as Topic , Humans , Models, Educational , Research
15.
Fertil Steril ; 74(2): 319-24, 2000 Aug.
Article in English | MEDLINE | ID: mdl-10927051

ABSTRACT

OBJECTIVE: To determine whether a constant-sequence or an alternating-sequence design is better for the evaluation of infertility treatment efficacy when multiple cycles of treatment are undertaken. DESIGN: A simulation exercise using analytical methods. SETTING: University medical center. PATIENT(S): A hypothetical, heterogeneous population of infertile patients participating in a randomized trial comparing an experimental treatment, with effectiveness of 2.0, to no treatment. INTERVENTION(S): Comparison of a constant-sequence design in which the subject receives the same intervention or the alternating-sequence design in which experimental and control treatments are crossed over after each successive cycle. MAIN OUTCOME MEASURE(S): Relative risks of pregnancy per cycle and overall after a maximum of five cycles of treatment. RESULT(S): With both designs, the pregnancy rates in experimental and control groups showed a consistent decrease with each successive cycle. The overall effectiveness in the constant-sequence design was underestimated at 1.83, whereas in the alternating-sequence design it was overestimated at 2.06. However, by restricting the analysis in the latter design only to the odd-numbered cycles, the relative risk was precisely correct at 2.00. CONCLUSION(S): When multiple cycles of treatment are undertaken to evaluate the efficacy of infertility therapy, the alternating-sequence design with restriction of the analysis to only the odd-numbered treatment cycles provides an unbiased estimation of the treatment effect.


Subject(s)
Clinical Trials as Topic , Infertility/therapy , Computer Simulation , Female , Humans , Infertility, Female/therapy , Pregnancy , Pregnancy Rate , Randomized Controlled Trials as Topic
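
Illustrative note on record 15: the underestimation reported for the constant-sequence design can be illustrated with a small analytical calculation. This is a sketch under stated assumptions, not the authors' model. The two-stratum population, its shares and per-cycle pregnancy probabilities, and the function name are hypothetical, and the alternating-sequence design is not modeled.

```python
def cumulative_relative_risk(strata, treatment_effect, cycles):
    """Relative risk of pregnancy within a number of cycles, treated vs. untreated.

    strata: list of (population share, untreated per-cycle pregnancy probability).
    treatment_effect: multiplier applied to the per-cycle probability.
    """
    control = sum(share * (1 - (1 - p) ** cycles) for share, p in strata)
    treated = sum(
        share * (1 - (1 - treatment_effect * p) ** cycles) for share, p in strata
    )
    return treated / control


# Hypothetical heterogeneous population: half low, half higher fecundability.
strata = [(0.5, 0.02), (0.5, 0.10)]

first_cycle = cumulative_relative_risk(strata, 2.0, cycles=1)
five_cycles = cumulative_relative_risk(strata, 2.0, cycles=5)
print(first_cycle, five_cycles)
```

In the first cycle the relative risk equals the true per-cycle effect. Over five cycles it falls below that value, because the more fertile patients conceive early and leave the treated group faster than the control group.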
17.
Teach Learn Med ; 12(4): 196-200, 2000.
Article in English | MEDLINE | ID: mdl-11273369

ABSTRACT

BACKGROUND: A challenge for Problem-Based Learning (PBL) schools is to introduce reliable, valid, and cost-effective testing methods into the curriculum in such a way as to maximize the potential benefits of PBL while avoiding problems associated with assessment techniques like multiple-choice question, or MCQ, tests. PURPOSE: We document the continued development of an exam that was designed to satisfy the demands of both PBL and the scientific principles of measurement. METHODS: A total of 102 medical students wrote a clinical reasoning exercise (CRE) as a requirement for two consecutive units of instruction. Each CRE consisted of a series of 18 short clinical problems designed to assess a student's knowledge of the mechanism of diseases that were covered in three subunits located within each unit. Responses were scored by a student's tutor and a 2nd crossover tutor. RESULTS: Generalizability coefficients for raters, subunits, and individual problems were low, but the reliability of the overall test scores and the reliability of the scores across 2 units of instruction were high. Subsequent analyses found that the crossover tutor's ratings were lower than the ratings provided by one's own tutor, and the CRE correlated with the biology component of a progress test. CONCLUSION: The magnitude of the generalizability coefficients demonstrates that the CRE is capable of detecting differences in reasoning across knowledge domains and is therefore a useful evaluation tool.


Subject(s)
Education, Medical, Undergraduate , Educational Measurement/methods , Humans , Medicine , Reproducibility of Results , Specialization
18.
Psychol Sci ; 11(2): 112-7, 2000 Mar.
Article in English | MEDLINE | ID: mdl-11273417

ABSTRACT

Medical students and experts were given head-and-shoulder photographs of patients, each showing a key feature of the patient's problem. Three quarters of these pictures were taken from textbooks. Noticing these supposedly obvious features was difficult and strongly influenced by contextual factors. Both experts and students gained about 20% in diagnostic accuracy by having the key features verbally described for them, although these were clearly visible on the photographs. Conversely, both experts and students reported seeing more of these features when the correct diagnosis was suggested to them. This facilitation resulted from an increase in sensitivity to depicted features, rather than a response bias. The properties of these features that allow such failures of noticing are discussed.


Subject(s)
Attention , Diagnosis , Students, Medical/psychology , Adult , Clinical Competence , Cues , Female , Humans , Male , Mental Recall , Visual Perception