ABSTRACT
BACKGROUND: Unicompartmental knee replacements (UKRs) have become an increasingly attractive option for end-stage single-compartment knee osteoarthritis (OA). However, there remains controversy in patient selection. Natural language processing (NLP) is a form of artificial intelligence (AI). We aimed to determine whether general-purpose open-source natural language programs can make decisions regarding a patient's suitability for a total knee replacement (TKR) or a UKR and how confident AI NLP programs are in surgical decision making. METHODS: We conducted a case-based cohort study using data from a separate study, where participants (73 surgeons and AI NLP programs) were presented with 32 fictitious clinical case scenarios that simulated patients with predominantly medial knee OA who would require surgery. Using the overall UKR/TKR judgments of the 73 experienced knee surgeons as the gold standard reference, we calculated the sensitivity, specificity, and positive predictive value of AI NLP programs to identify whether a patient should undergo UKR. RESULTS: There was disagreement between the surgeons and ChatGPT in only five scenarios (15.6%). With the 73 surgeons' decision as the gold standard, the sensitivity of ChatGPT in determining whether a patient should undergo UKR was 0.91 (95% confidence interval (CI): 0.71 to 0.98). The positive predictive value for ChatGPT was 0.87 (95% CI: 0.72 to 0.94). ChatGPT was more confident in its UKR decision making (surgeon mean confidence = 1.7, ChatGPT mean confidence = 2.4). CONCLUSIONS: It has been demonstrated that ChatGPT can make surgical decisions, and exceeded the confidence of experienced knee surgeons with substantial inter-rater agreement when deciding whether a patient was most appropriate for a UKR.
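The accuracy measures named in the methods above (sensitivity, specificity, positive predictive value) all follow from a 2x2 table of program decisions against the reference decisions. A minimal Python sketch follows, assuming the surgeons' majority judgment is the reference. The abstract does not report the underlying table, so the counts are hypothetical, chosen only to sum to 32 scenarios, and the names `ConfusionCounts`, `sensitivity`, `specificity` and `positive_predictive_value` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table of program decisions against reference (surgeon) decisions."""
    tp: int  # program says UKR, reference says UKR
    fp: int  # program says UKR, reference says TKR
    fn: int  # program says TKR, reference says UKR
    tn: int  # program says TKR, reference says TKR


def sensitivity(c: ConfusionCounts) -> float:
    """Share of reference-UKR scenarios that the program also labels UKR."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of reference-TKR scenarios that the program also labels TKR."""
    return c.tn / (c.tn + c.fp)


def positive_predictive_value(c: ConfusionCounts) -> float:
    """Share of program-UKR decisions that match the reference."""
    return c.tp / (c.tp + c.fp)


def disagreement_rate(c: ConfusionCounts) -> float:
    """Share of all scenarios where program and reference differ."""
    return (c.fp + c.fn) / (c.tp + c.fp + c.fn + c.tn)


# Hypothetical counts (NOT the study's data); they only sum to 32 scenarios.
counts = ConfusionCounts(tp=18, fp=4, fn=3, tn=7)

print(f"sensitivity  = {sensitivity(counts):.2f}")
print(f"specificity  = {specificity(counts):.2f}")
print(f"PPV          = {positive_predictive_value(counts):.2f}")
print(f"disagreement = {disagreement_rate(counts):.1%}")
```

Keeping the four cells in one immutable record makes it harder to mix up denominators: sensitivity and specificity divide by reference totals, whereas PPV divides by the program's own UKR decisions.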
ABSTRACT
OBJECTIVES: Point-of-care tests (POCTs) for infection offer accurate rapid diagnostics but do not consistently improve antibiotic stewardship (ASP) of suspected ventilator-associated pneumonia. We aimed to measure the effect of a negative PCR-POCT result on intensive care unit (ICU) clinicians' antibiotic decisions and the additional effects of patient trajectory and cognitive-behavioural factors (clinician intuition, dis/interest in POCT, risk averseness). DESIGN: Observational cohort simulation study. SETTING: ICU. PARTICIPANTS: 70 ICU consultants/trainees working in UK-based teaching hospitals. METHODS: Clinicians saw four case vignettes describing patients who had completed a course of antibiotics for respiratory infection. Vignettes comprised clinical and biological data (ie, white cell count, C reactive protein), varied to create four trajectories: clinico-biological improvement (the 'improvement' case), clinico-biological worsening ('worsening'), clinical improvement/biological worsening ('discordant clin better'), clinical worsening/biological improvement ('discordant clin worse'). Based on this, clinicians made an initial antibiotics decision (stop/continue) and rated confidence (6-point Likert scale). A PCR-based POCT was then offered, which clinicians could accept or decline. All clinicians (including those who declined) were shown the result, which was negative. Clinicians updated their antibiotics decision and confidence. MEASURES: Antibiotics decisions and confidence were compared pre-POCT versus post-POCT, per vignette. RESULTS: A negative POCT result increased the proportion of stop decisions (54% pre-POCT vs 70% post-POCT, χ2(1)=25.82, p<0.001, w=0.32) in all vignettes except improvement (already high), most notably in discordant clin worse (49% pre-POCT vs 74% post-POCT). 
In a linear regression, factors that significantly reduced clinicians' inclination to stop antibiotics were a worsening trajectory (b=-0.73 (-1.33, -0.14), p=0.015), initial confidence in continuing (b=0.66 (0.56, 0.76), p<0.001) and involuntary receipt of POCT results (clinicians who accepted the POCT were more inclined to stop than clinicians who declined it, b=1.30 (0.58, 2.02), p<0.001). Clinician risk averseness was not found to influence antibiotic decisions (b=-0.01 (-0.12, 0.10), p=0.872). CONCLUSIONS: A negative PCR-POCT result can encourage antibiotic cessation in ICU, notably in cases of clinical worsening (where the inclination might otherwise be to continue). This effect may be reduced by high clinician confidence to continue and/or disinterest in POCT, perhaps due to low trust/perceived utility. Such cognitive-behavioural and trajectorial factors warrant greater consideration in future ASP study design.
Subjects
Anti-Bacterial Agents , Rapid Diagnostic Tests , Humans , Anti-Bacterial Agents/therapeutic use , Point-of-Care Testing , Polymerase Chain Reaction , Intensive Care Units , Cognition
ABSTRACT
BACKGROUND: The 'STARWAVe' clinical prediction rule (CPR) uses seven factors to guide risk assessment and antibiotic prescribing in children with cough (Short illness duration, Temperature, Age, Recession, Wheeze, Asthma, Vomiting). AIM: To assess the influence of STARWAVe factors on GPs' unaided risk assessments and prescribing decisions. DESIGN AND SETTING: Clinical vignettes administered to 188 UK GPs online. METHOD: GPs were randomly assigned to view 32 (out of a possible 64) vignettes online depicting children with cough. The vignettes comprised the seven STARWAVe factors, which were varied systematically. For each vignette, GPs assessed risk of deterioration in one of two ways (sliding-scale versus risk-category selection) and indicated whether they would prescribe antibiotics. Finally, GPs saw an additional vignette, suggesting that the parent was concerned. Mixed-effects regressions were used to measure the influence of STARWAVe factors, risk-elicitation method, and parental concern on GPs' assessments and decisions. RESULTS: Six STARWAVe risk factors correctly increased GPs' risk assessments (sliding-scale: bs ≥ 0.66; category-selection: odds ratios [ORs] ≥ 1.75; Ps ≤ 0.001), whereas one incorrectly reduced them (short illness duration: sliding-scale b = -0.30; category-selection OR = 0.80; P ≤ 0.039). Conversely, one STARWAVe factor increased prescribing odds (temperature: OR 5.22, P<0.001), whereas the rest either reduced them (short illness duration, age, and recession: ORs≤0.70, Ps<0.001) or had no significant impact (wheeze, asthma, and vomiting: Ps≥0.065). Parental concern increased risk assessments (sliding-scale b = 1.29; category-selection OR = 2.82; P ≤ 0.003) but not prescribing odds (P = 0.378). CONCLUSION: GPs use some, but not all, STARWAVe factors when making unaided risk assessments and prescribing decisions. Such discrepancies must be considered when introducing CPRs to clinical practice.
Subjects
Anti-Bacterial Agents , Cough , Child , Humans , Anti-Bacterial Agents/therapeutic use , Cough/drug therapy , Clinical Decision Rules , Practice Patterns, Physicians' , Attitude of Health Personnel
ABSTRACT
OBJECTIVE: Physicians' low adoption of diagnostic decision aids (DDAs) may be partially due to concerns about patient/public perceptions. We investigated how the UK public views DDA use and factors affecting perceptions. MATERIALS AND METHODS: In this online experiment, 730 UK adults were asked to imagine attending a medical appointment where the doctor used a computerized DDA. The DDA recommended a test to rule out serious disease. We varied the test's invasiveness, the doctor's adherence to DDA advice, and the severity of the patient's disease. Before disease severity was revealed, respondents indicated how worried they felt. Both before [t1] and after [t2] severity was revealed, we measured satisfaction with the consultation, likelihood of recommending the doctor, and suggested frequency of DDA use. RESULTS: At both timepoints, satisfaction and likelihood of recommending the doctor increased when the doctor adhered to DDA advice (P ≤ .01), and when the DDA suggested an invasive versus noninvasive test (P ≤ .05). The effect of adherence to DDA advice was stronger when participants were worried (P ≤ .05), and the disease turned out to be serious (P ≤ .01). Most respondents felt that DDAs should be used by doctors "sparingly" (34%[t1]/29%[t2]), "frequently," (43%[t1]/43%[t2]) or "always" (17%[t1]/21%[t2]). DISCUSSION: People are more satisfied when doctors adhere to DDA advice, especially when worried, and when it helps to spot serious disease. Having to undergo an invasive test does not appear to dampen satisfaction. CONCLUSION: Positive attitudes regarding DDA use and satisfaction with doctors adhering to DDA advice could encourage greater use of DDAs in consultations.
Subjects
Physician-Patient Relations , Physicians , Adult , Humans , Patient Satisfaction , United Kingdom , Decision Support Techniques , Surveys and Questionnaires
ABSTRACT
Previous research has highlighted the importance of physicians' early hypotheses for their subsequent diagnostic decisions. It has also been shown that diagnostic accuracy improves when physicians are presented with a list of diagnostic suggestions to consider at the start of the clinical encounter. The psychological mechanisms underlying this improvement in accuracy are hypothesised. It is possible that the provision of diagnostic suggestions disrupts physicians' intuitive thinking and reduces their certainty in their initial diagnostic hypotheses. This may encourage them to seek more information before reaching a diagnostic conclusion, evaluate this information more objectively, and be more open to changing their initial hypotheses. Three online experiments explored the effects of early diagnostic suggestions, provided by a hypothetical decision aid, on different aspects of the diagnostic reasoning process. Family physicians assessed up to two patient scenarios with and without suggestions. We measured effects on certainty about the initial diagnosis, information search and evaluation, and frequency of diagnostic changes. We did not find a clear and consistent effect of suggestions and detected mainly non-significant trends, some in the expected direction. We also detected a potential biasing effect: when the most likely diagnosis was included in the list of suggestions (vs. not included), physicians who gave that diagnosis initially, tended to request less information, evaluate it as more supportive of their diagnosis, become more certain about it, and change it less frequently when encountering new but ambiguous information; in other words, they seemed to validate rather than question their initial hypothesis. We conclude that further research using different methodologies and more realistic experimental situations is required to uncover both the beneficial and biasing effects of early diagnostic suggestions.
Subjects
Clinical Reasoning , Physicians, Family , Humans , Physicians, Family/psychology
ABSTRACT
BACKGROUND: In the absence of research into therapies and care pathways for long COVID, guidance based on 'emerging experience' is needed. AIM: To provide a rapid expert guide for GPs and long COVID clinical services. DESIGN AND SETTING: A Delphi study was conducted with a panel of primary and secondary care doctors. METHOD: Recommendations were generated relating to the investigation and management of long COVID. These were distributed online to a panel of UK doctors (any specialty) with an interest in, lived experience of, and/or experience treating long COVID. Over two rounds of Delphi testing, panellists indicated their agreement with each recommendation (using a five-point Likert scale) and provided comments. Recommendations eliciting a response of 'strongly agree', 'agree', or 'neither agree nor disagree' from 90% or more of responders were taken as showing consensus. RESULTS: Thirty-three clinicians representing 14 specialties reached consensus on 35 recommendations. Chiefly, GPs should consider long COVID in the presence of a wide range of presenting features (not limited to fatigue and breathlessness) and exclude differential diagnoses where appropriate. Detailed history and examination with baseline investigations should be conducted in primary care. Indications for further investigation and specific therapies (for myocarditis, postural tachycardia syndrome, mast cell disorder) include hypoxia/desaturation, chest pain, palpitations, and histamine-related symptoms. Rehabilitation should be individualised, with careful activity pacing (to avoid relapse) and multidisciplinary support. CONCLUSION: Long COVID clinics should operate as part of an integrated care system, with GPs playing a key role in the multidisciplinary team. Holistic care pathways, investigation of specific complications, management of potential symptom clusters, and tailored rehabilitation are needed.
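The consensus rule described in the method above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' analysis code: the function name `reaches_consensus`, the labels used for the two disagreeing scale points, and the example responses are all hypothetical. The three counted response options and the 90% threshold are taken from the abstract.

```python
# Response options that count towards consensus, as stated in the abstract.
CONSENSUS_RESPONSES = frozenset(
    {"strongly agree", "agree", "neither agree nor disagree"}
)


def reaches_consensus(responses: list[str], threshold: float = 0.90) -> bool:
    """Return True if the share of counted responses meets the threshold."""
    if not responses:
        raise ValueError("at least one response is required")
    counted = sum(1 for r in responses if r in CONSENSUS_RESPONSES)
    return counted / len(responses) >= threshold


# Hypothetical panel of 33 responders (illustrative data only).
panel_a = ["agree"] * 30 + ["disagree"] * 3           # 30/33, about 90.9%
panel_b = ["agree"] * 29 + ["strongly disagree"] * 4  # 29/33, about 87.9%

print(reaches_consensus(panel_a))
print(reaches_consensus(panel_b))
```

Note that under this rule a neutral response counts towards consensus, so the threshold effectively measures the absence of disagreement rather than active agreement.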
Subjects
COVID-19 , COVID-19/complications , COVID-19/diagnosis , COVID-19/therapy , Consensus , Delphi Technique , Humans , Post-Acute COVID-19 Syndrome
ABSTRACT
Background. In previous research, we employed a signal detection approach to measure the performance of general practitioners (GPs) when deciding about urgent referral for suspected lung cancer. We also explored associations between provider and organizational performance. We found that GPs from practices with higher referral positive predictive value (PPV; chance of referrals identifying cancer) were more reluctant to refer than those from practices with lower PPV. Here, we test the generalizability of our findings to a different cancer. Methods. A total of 252 GPs responded to 48 vignettes describing patients with possible colorectal cancer. For each vignette, respondents decided whether urgent referral to a specialist was needed. They then completed the 8-item Stress from Uncertainty scale. We measured GPs' discrimination (d') and response bias (criterion; c) and their associations with organizational performance and GP demographics. We also measured correlations of d' and c between the 2 studies for the 165 GPs who participated in both. Results. As in the lung study, organizational PPV was associated with response bias: in practices with higher PPV, GPs had higher criterion (b = 0.05 [0.03 to 0.07]; P < 0.001), that is, they were less inclined to refer. As in the lung study, female GPs were more inclined to refer than males (b = -0.17 [-0.30 to -0.105]; P = 0.005). In a mediation model, stress from uncertainty did not explain the gender difference. Only response bias correlated between the 2 studies (r = 0.39, P < 0.001). Conclusions. This study confirms our previous findings regarding the relationship between provider and organizational performance and strengthens the finding of gender differences in referral decision making. It also provides evidence that response bias is a relatively stable feature of GP referral decision making.
Subjects
Efficiency, Organizational , Physicians/standards , Work Performance/standards , Correlation of Data , Humans , Lung/abnormalities , Lung/diagnostic imaging , Physicians/statistics & numerical data , Referral and Consultation/standards , Signal Detection, Psychological , Work Performance/statistics & numerical data
ABSTRACT
OBJECTIVES: The validated 'STARWAVe' (Short illness duration, Temperature, Age, Recession, Wheeze, Asthma, Vomiting) clinical prediction rule (CPR) uses seven variables to guide risk assessment and antimicrobial stewardship in children presenting with cough. We aimed to compare general practitioners' (GPs) risk assessments and prescribing decisions to those of STARWAVe and assess the influence of the CPR's clinical variables. SETTING: Primary care. PARTICIPANTS: 252 GPs, currently practising in the UK. DESIGN: GPs were randomly assigned to view four (of a possible eight) clinical vignettes online. Each vignette depicted a child presenting with cough, who was described in terms of the seven STARWAVe variables. Systematically, we manipulated patient age (20 months vs 5 years), illness duration (3 vs 6 days), vomiting (present vs absent) and wheeze (present vs absent), holding the remaining STARWAVe variables constant. OUTCOME MEASURES: Per vignette, GPs assessed risk of hospitalisation and indicated whether they would prescribe antibiotics or not. RESULTS: GPs overestimated risk of hospitalisation in 9% of vignette presentations (88/1008) and underestimated it in 46% (459/1008). Despite underestimating risk, they overprescribed: 78% of prescriptions were unnecessary relative to GPs' own risk assessments (121/156), while 83% were unnecessary relative to STARWAVe risk assessments (130/156). All four of the manipulated variables influenced risk assessments, but only three influenced prescribing decisions: a shorter illness duration reduced prescribing odds (OR 0.14, 95% CI 0.08 to 0.27, p<0.001), while vomiting and wheeze increased them (vomiting: OR 2.17, 95% CI 1.32 to 3.57, p=0.002; wheeze: OR 8.98, 95% CI 4.99 to 16.15, p<0.001). CONCLUSIONS: Relative to STARWAVe, GPs underestimated risk of hospitalisation, overprescribed and appeared to misinterpret illness duration (prescribing for longer rather than shorter illnesses).
It is important to ascertain discrepancies between CPRs and current clinical practice. This has implications for the integration of CPRs into the electronic health record and the provision of intelligible explanations to decision-makers.
Subjects
Anti-Bacterial Agents/therapeutic use , Clinical Decision-Making , Cough/drug therapy , General Practitioners , Practice Patterns, Physicians'/statistics & numerical data , Antimicrobial Stewardship , Child , Hospitalization , Humans , Risk Assessment , United Kingdom
ABSTRACT
BACKGROUND: Signal detection theory (SDT) describes how respondents categorize ambiguous stimuli over repeated trials. It measures separately "discrimination" (ability to recognize a signal amid noise) and "criterion" (inclination to respond "signal" v. "noise"). This is important because respondents may produce the same accuracy rate for different reasons. We employed SDT to measure the referral decision making of general practitioners (GPs) in cases of possible lung cancer. METHODS: We constructed 44 vignettes of patients for whom lung cancer could be considered and estimated their 1-year risk. Under UK risk-based guidelines, half of the vignettes required urgent referral. We recruited 216 GPs from practices across England. Practices differed in the positive predictive value (PPV) of their urgent referrals (chance of referrals identifying cancer) and the sensitivity (chance of cancer patients being picked up via urgent referral from their practice). Participants saw the vignettes online and indicated whether they would refer each patient urgently or not. We calculated each GP's discrimination (d') and criterion (c) and regressed these on practice PPV and sensitivity, as well as on GP experience and gender. RESULTS: Criterion was associated with practice PPV: as PPV increased, GPs' c also increased, indicating lower inclination to refer (b = 0.06 [0.02 to 0.09]; P = 0.001). Female GPs were more inclined to refer than male GPs (b = -0.20 [-0.40 to -0.001]; P = 0.049). Average discrimination was modest (d' = 0.77), highly variable (range, -0.28 to 1.91), and not associated with practice referral performance. CONCLUSIONS: High referral PPV at the organizational level indicates GPs' inclination to avoid false positives, not better discrimination. Rather than bluntly mandating increases in practice PPV via more referrals, it is necessary to increase discrimination by improving the evidence base for cancer referral decisions.
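The two SDT measures used above can be illustrated with the standard equal-variance Gaussian formulas, d' = z(H) - z(F) and c = -0.5 × (z(H) + z(F)), where H is the hit rate (urgent referral when referral is required) and F is the false-alarm rate (urgent referral when it is not). The Python sketch below rests on assumptions. The abstract does not say how rates of 0 or 1 were handled, so the log-linear correction used here is an assumption; the function name `sdt_measures` and the example counts are hypothetical.

```python
from statistics import NormalDist

# Inverse of the standard normal CDF (the "z" transform).
_z = NormalDist().inv_cdf


def sdt_measures(hits: int, misses: int,
                 false_alarms: int, correct_rejections: int) -> tuple[float, float]:
    """Return (d_prime, criterion) under the equal-variance Gaussian model.

    Assumption: a log-linear correction (add 0.5 to each count and 1 to each
    total) keeps both rates strictly between 0 and 1 so that z is finite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z_hit, z_fa = _z(hit_rate), _z(fa_rate)
    d_prime = z_hit - z_fa              # discrimination: signal vs noise
    criterion = -0.5 * (z_hit + z_fa)   # > 0 means reluctance to say "refer"
    return d_prime, criterion


# Hypothetical GP: 44 vignettes, 22 requiring referral (illustrative counts).
d_prime, criterion = sdt_measures(hits=10, misses=12,
                                  false_alarms=3, correct_rejections=19)
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```

A positive `criterion` here corresponds to the pattern the abstract describes for high-PPV practices: fewer false positives bought by a lower overall inclination to refer, with no implied gain in `d_prime`.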
Subjects
Decision Making , General Practitioners/psychology , Referral and Consultation/organization & administration , Adult , Diagnostic Errors/statistics & numerical data , Female , General Practitioners/statistics & numerical data , Humans , Lung Neoplasms/therapy , Male , Middle Aged , Practice Patterns, Physicians' , Psychological Theory , Referral and Consultation/statistics & numerical data , Sex Factors , United Kingdom
ABSTRACT
"Predecisional information distortion" occurs when decision makers evaluate new information in a way that is biased towards their leading option. The phenomenon is well established, as is the method typically used to measure it, termed "stepwise evolution of preference" (SEP). An inadequacy of this method has recently come to the fore: it measures distortion as the total advantage afforded a leading option over its competitor, and therefore it cannot differentiate between distortion to strengthen a leading option ("proleader" distortion) and distortion to weaken a trailing option ("antitrailer" distortion). To address this, recent research introduced new response scales to SEP. We explore whether and how these new response scales might influence the very proleader and antitrailer processes that they were designed to capture ("reactivity"). We used the SEP method with concurrent verbal reporting: fifty family physicians verbalized their thoughts as they evaluated patient symptoms and signs ("cues") in relation to two competing diagnostic hypotheses. Twenty-five physicians evaluated each cue using the response scale traditional to SEP (a single response scale, returning a single measure of distortion); the other twenty-five did so using the response scales introduced in recent studies (two separate response scales, returning two separate measures of distortion: proleader and antitrailer). We measured proleader and antitrailer processes in verbalizations, and compared verbalizations in the single-scale and separate-scales groups. Response scales did not appear to affect proleader processes: the two groups of physicians were equally likely to bolster their leading diagnosis verbally. Response scales did, however, appear to affect antitrailer processes: the two groups denigrated their trailing diagnosis verbally to differing degrees. 
Our findings suggest that the response scales used to measure information distortion might influence its constituent processes, limiting their generalizability across and beyond experimental studies.
Subjects
Diagnosis , Cognition , Cues , Decision Support Techniques , Humans , Interviews as Topic , Surveys and Questionnaires
ABSTRACT
BACKGROUND: Computerized diagnostic decision support systems (CDDSS) have the potential to support the cognitive task of diagnosis, which is one of the areas where general practitioners have greatest difficulty and which accounts for a significant proportion of adverse events recorded in the primary care setting. OBJECTIVE: To determine the extent to which CDDSS may meet the requirements of supporting the cognitive task of diagnosis, and the currently perceived barriers that prevent the integration of CDDSS with electronic health record (EHR) systems. METHODS: We conducted a meta-review of existing systematic reviews published in English, searching MEDLINE, Embase, PsycINFO and Web of Knowledge for articles on the features and effectiveness of CDDSS for medical diagnosis published since 2004. Eligibility criteria included systematic reviews where individual clinicians were primary end users. Outcomes we were interested in were the effectiveness and identification of specific features of CDDSS on diagnostic performance. RESULTS: We identified 1970 studies and excluded 1938 because they did not fit our inclusion criteria. A total of 45 articles were identified and 12 were found suitable for meta-review. Extraction of high-level requirements identified that a more standardized computable approach is needed to knowledge representation, one that can be readily updated as new knowledge is gained. In addition, a deep integration with the EHR is needed in order to trigger at appropriate points in cognitive workflow. CONCLUSION: Developing a CDDSS that is able to utilize dynamic vocabulary tools to quickly capture and code relevant diagnostic findings, and coupling these with individualized diagnostic suggestions based on the best-available evidence has the potential to improve diagnostic accuracy, but requires evaluation.