Search | VHL Regional Portal

The impact of inconsistent human annotations on AI driven clinical decision making.

Sylolypavan, Aneeta; Sleeman, Derek; Wu, Honghan; Sim, Malcolm.

NPJ Digit Med ; 6(1): 26, 2023 Feb 21.

Article in English | MEDLINE | ID: mdl-36810915

ABSTRACT

In supervised learning model development, domain experts are often used to provide the class labels (annotations). Annotation inconsistencies commonly occur when even highly experienced clinical experts annotate the same phenomenon (e.g., medical image, diagnostics, or prognostic status), due to inherent expert bias, judgments, and slips, among other factors. While their existence is relatively well-known, the implications of such inconsistencies are largely understudied in real-world settings, when supervised learning is applied on such 'noisy' labelled data. To shed light on these issues, we conducted extensive experiments and analyses on three real-world Intensive Care Unit (ICU) datasets. Specifically, individual models were built from a common dataset, annotated independently by 11 Glasgow Queen Elizabeth University Hospital ICU consultants, and model performance estimates were compared through internal validation (Fleiss' κ = 0.383 i.e., fair agreement). Further, broad external validation (on both static and time series datasets) of these 11 classifiers was carried out on a HiRID external dataset, where the models' classifications were found to have low pairwise agreements (average Cohen's κ = 0.255 i.e., minimal agreement). Moreover, they tend to disagree more on making discharge decisions (Fleiss' κ = 0.174) than predicting mortality (Fleiss' κ = 0.267). Given these inconsistencies, further analyses were conducted to evaluate the current best practices in obtaining gold-standard models and determining consensus. The results suggest that: (a) there may not always be a "super expert" in acute clinical settings (using internal and external validation model performances as a proxy); and (b) standard consensus seeking (such as majority vote) consistently leads to suboptimal models. Further analysis, however, suggests that assessing annotation learnability and using only 'learnable' annotated datasets for determining consensus achieves optimal models in most cases.
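
The agreement statistics quoted in this abstract (Cohen's κ for a pair of raters, Fleiss' κ for several) can be computed from label counts alone. The following is a minimal sketch using the standard textbook definitions; the function names and the toy labels are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # assumption: both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = raters who put item i in category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    # Per-item agreement, then its mean over all items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_item) / n_items
    # Chance agreement from the overall share of each category.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(counts[0]))
    ]
    expected = sum(p * p for p in shares)
    if expected == 1.0:
        return 1.0  # assumption: a single category was used for every rating
    return (observed - expected) / (1.0 - expected)


# Toy data (hypothetical): two raters, four items, binary labels.
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```

Values near 0 indicate agreement no better than chance and values near 1 indicate near-perfect agreement, which is why the figures reported above (0.174 to 0.383) are read as weak.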

Pointers to earlier diagnosis of endometriosis: a nested case-control study using primary care electronic health records.

Burton, Christopher; Iversen, Lisa; Bhattacharya, Sohinee; Ayansina, Dolapo; Saraswat, Lucky; Sleeman, Derek.

Br J Gen Pract ; 67(665): e816-e823, 2017 Dec.

Article in English | MEDLINE | ID: mdl-29109114

ABSTRACT

BACKGROUND: Endometriosis is a condition with relatively non-specific symptoms, and in some cases a long time elapses from first-symptom presentation to diagnosis. AIM: To develop and test new composite pointers to a diagnosis of endometriosis in primary care electronic records. DESIGN AND SETTING: This is a nested case-control study using the Practice Team Information database of anonymised primary care electronic health records from Scotland. Data were analysed from 366 cases of endometriosis between 1994 and 2010, and two sets of age- and GP practice-matched controls: (a) 1453 randomly selected females and (b) 610 females whose records contained codes indicating consultation for gynaecological symptoms. METHOD: Composite pointers comprised patterns of symptoms, prescribing, or investigations, in combination or over time. Conditional logistic regression was used to examine the presence of both new and established pointers during the 3 years before diagnosis of endometriosis and to identify time of appearance. RESULTS: A number of composite pointers that were strongly predictive of endometriosis were observed. These included pain and menstrual symptoms occurring within the same year (odds ratio [OR] 6.5, 95% confidence interval [CI] = 3.9 to 10.6), and lower gastrointestinal symptoms occurring within 90 days of gynaecological pain (OR 6.1, 95% CI = 3.6 to 10.6). Although the association of infertility with endometriosis was only detectable in the year before diagnosis, several pain-related features were associated with endometriosis several years earlier. CONCLUSION: Useful composite pointers to a diagnosis of endometriosis in GP records were identified. Some of these were present several years before the diagnosis and may be valuable targets for diagnostic support systems.
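
To make the "OR with 95% CI" notation concrete, the sketch below computes a crude odds ratio and a Wald (Woolf) confidence interval from a 2x2 table. This is deliberately not the study's method: the abstract reports conditional logistic regression on matched sets, whereas this unmatched calculation only illustrates how an odds ratio and its interval are formed. The function name and the counts are hypothetical.

```python
import math


def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio with a Wald interval (z=1.96 gives roughly 95%)."""
    cells = (exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for this formula")
    a, b, c, d = cells
    # Odds of exposure among cases divided by odds among controls.
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
estimate, lower, upper = odds_ratio_with_ci(20, 10, 10, 20)
print(f"OR {estimate:.1f}, 95% CI = {lower:.1f} to {upper:.1f}")
```

An interval that excludes 1, as both intervals reported above do, indicates an association unlikely to be explained by sampling variation alone.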

Subject(s)

Dysmenorrhea/diagnosis , Electronic Health Records , Endometriosis/diagnosis , Gastroenteritis/diagnosis , Pelvic Pain/diagnosis , Primary Health Care , Adolescent , Adult , Age Distribution , Analgesics/therapeutic use , Non-Steroidal Anti-Inflammatory Agents/therapeutic use , Case-Control Studies , Dysmenorrhea/etiology , Endometriosis/physiopathology , Female , Gastroenteritis/etiology , Humans , Odds Ratio , Pelvic Pain/etiology , Practice Guidelines as Topic , Referral and Consultation , Risk Assessment , Scotland/epidemiology , Young Adult

Argumentation-logic for creating and explaining medical hypotheses.

Grando, Maria Adela; Moss, Laura; Sleeman, Derek; Kinsella, John.

Artif Intell Med ; 58(1): 1-13, 2013 May.

Article in English | MEDLINE | ID: mdl-23522940

ABSTRACT

OBJECTIVE: While EIRA has proved to be successful in the detection of anomalous patient responses to treatments in the Intensive Care Unit, it could not describe to clinicians the rationales behind the anomalous detections. The aim of this paper is to address this problem. METHODS: Few attempts have been made in the past to build knowledge-based medical systems that possess both argumentation and explanation capabilities. Here we propose an approach based on Dung's seminal calculus of opposition. RESULTS: We have developed a new tool, arguEIRA, which is an extension of the existing EIRA system. In this paper we extend EIRA by providing it with an argumentation-based justification system that formalizes and communicates to the clinicians the reasons why a patient response is anomalous. CONCLUSION: Our comparative evaluation of the EIRA system against the newly developed tool highlights the multiple benefits that the use of argumentation-logic can bring to the field of medical decision support and explanation.
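
For readers unfamiliar with Dung-style abstract argumentation, the sketch below computes the grounded extension of a small framework: the set of arguments that can be accepted without assuming anything, obtained by iterating the "defended by" operator from the empty set. It is an illustration of the general formalism only; it is not arguEIRA's implementation, and the argument names are hypothetical.

```python
def grounded_extension(arguments, attacks):
    """Least fixed point of Dung's characteristic function.

    arguments: iterable of argument names.
    attacks: set of (attacker, target) pairs.
    """
    arguments = set(arguments)
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}

    def defended(accepted):
        # An argument is acceptable if each of its attackers is itself
        # attacked by some already accepted argument.
        return {
            a for a in arguments
            if all(any((d, b) in attacks for d in accepted)
                   for b in attackers[a])
        }

    current = set()
    while True:
        following = defended(current)
        if following == current:
            return current
        current = following


# Hypothetical chain: "a" attacks "b", and "b" attacks "c".
# "a" is unattacked, so it is accepted and thereby defends "c".
print(sorted(grounded_extension({"a", "b", "c"}, {("a", "b"), ("b", "c")})))
# ['a', 'c']
```

When two arguments attack each other and nothing else intervenes, the grounded extension is empty, which reflects the sceptical character of this semantics.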

Subject(s)

Clinical Decision Support Systems/organization & administration , Intensive Care Units/organization & administration , Knowledge Bases , Treatment Outcome , Algorithms

Investigating the disagreement between clinicians' ratings of patients in ICUs.

Rogers, Simon; Sleeman, Derek; Kinsella, John.

IEEE J Biomed Health Inform ; 17(4): 843-52, 2013 Jul.

Article in English | MEDLINE | ID: mdl-25055313

ABSTRACT

We present a Bayesian analysis of ordinal annotations made by clinicians of patients in intensive care. In particular, we investigate the different ways in which clinicians can disagree and how their disagreement is reduced once they take part in a recently proposed procedure (INSIGHT) that aims at improving consistency. The model combines a nonparametric function (loosely interpretable as the health of the patient) with clinician-specific generative procedures for producing the observed ordinal values. Our analysis provides valuable details of the rating behavior of the individual clinicians and shows that the INSIGHT procedure is particularly effective at removing (some) clinician-specific inconsistencies and biases.

Subject(s)

Intensive Care Units/statistics & numerical data , Medical Records/standards , Physicians/statistics & numerical data , Artificial Intelligence , Computer Simulation , Humans , Statistical Models

Detecting and resolving inconsistencies between domain experts' different perspectives on (classification) tasks.

Sleeman, Derek; Moss, Laura; Aiken, Andy; Hughes, Martin; Kinsella, John; Sim, Malcolm.

Artif Intell Med ; 55(2): 71-86, 2012 Jun.

Article in English | MEDLINE | ID: mdl-22483422

ABSTRACT

OBJECTIVES: The work reported here focuses on developing novel techniques which enable an expert to detect inconsistencies in 2 (or more) perspectives that the expert might have on the same (classification) task. The high level task which the experts (physicians) had set themselves was to classify, on a 5-point severity scale (A-E), the hourly reports produced by an intensive care unit's patient management system. METHOD: The INSIGHT system has been developed to support domain experts in exploring and removing inconsistencies in their conceptualization of a task. We report here a study of intensive care physicians reconciling 2 perspectives on their patients. The 2 perspectives provided to INSIGHT were an annotated set of patient records where the expert had selected the appropriate category to describe that snapshot of the patient, and a set of rules which are able to classify the various time points on the same 5-point scale. Inconsistencies between these 2 perspectives are displayed as a confusion matrix; moreover, INSIGHT then allows the expert to revise both the annotated datasets (correcting data errors, or changing the assigned categories) and the actual rule-set. RESULTS: Each of the 3 experts achieved a very high degree of consensus (~97%) between his refined knowledge sources (i.e., annotated hourly patient records and the rule-set). We then had the experts produce a common rule-set and then refine their several sets of annotations against it; this again resulted in inter-expert agreements of ~97%. The resulting rule-set can then be used in applications with considerable confidence. CONCLUSION: This study has shown that under some circumstances, it is possible for domain experts to achieve a high degree of correlation between 2 perspectives of the same task. The experts agreed that the immediate feedback provided by INSIGHT was a significant contribution to this successful outcome.
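
The comparison of two perspectives described in this abstract can be pictured as a 5x5 confusion matrix over the A-E scale, with the share of diagonal entries giving a percent agreement. The sketch below shows that bookkeeping under stated assumptions: the function names and the sample labels are hypothetical, and whether INSIGHT measures consensus in exactly this way is not stated in the abstract.

```python
from collections import Counter

CATEGORIES = "ABCDE"  # the 5-point severity scale named in the abstract


def confusion_matrix(annotated, rule_based, categories=CATEGORIES):
    """Rows: expert annotation. Columns: category assigned by the rule-set."""
    if len(annotated) != len(rule_based):
        raise ValueError("both perspectives must cover the same time points")
    pairs = Counter(zip(annotated, rule_based))
    return [[pairs[(row, col)] for col in categories] for row in categories]


def percent_agreement(matrix):
    """Share of time points on the diagonal, as a percentage."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("the matrix contains no observations")
    agreeing = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * agreeing / total


# Hypothetical hourly records: the two perspectives differ at one time point.
annotated = list("AABCE")
rule_based = list("AABDE")
matrix = confusion_matrix(annotated, rule_based)
for label, row in zip(CATEGORIES, matrix):
    print(label, row)
print(percent_agreement(matrix))  # 80.0
```

Each off-diagonal cell points at a group of records where either the annotation or the rule-set may need revising, which is the kind of feedback the abstract credits for the high final consensus.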

Subject(s)

Artificial Intelligence , Database Management Systems/instrumentation , Electronic Health Records/instrumentation , Expert Testimony , Classification/methods , Computer-Assisted Diagnosis/methods , Information Storage and Retrieval/methods , Intensive Care Units

A comparison between clinical decisions made about lung cancer patients and those inherent in the corresponding Scottish Intercollegiate Guidelines Network (SIGN) guideline.

Sleeman, Derek; Moss, Laura; Gyftodimos, Elias; Nicolson, Marianne; Devereux, Graham.

Health Informatics J ; 16(4): 260-73, 2010 Dec.

Article in English | MEDLINE | ID: mdl-21216806

ABSTRACT

Treatment and survival for patients with lung cancer vary between and within countries. We have undertaken a multifaceted study of a clinical dataset of 635 patients, to see if clinician treatment decisions were being made consistently and in accordance with the appropriate Scottish Intercollegiate Guidelines Network (SIGN) document. Subsequently, we created a dataset of 117 patients who should have undergone surgery according to the SIGN guideline. As analyses of this dataset did not provide clear distinctions between the main treatment groups, a clinician reviewed the case notes and dataset, checking for inconsistencies. The revised dataset was processed by a decision tree algorithm which suggests clinically plausible decisions. Further, statistical analyses compared the 54 patients offered surgery with the 52 who were not. These analyses suggest that there are significant differences: the most discriminating feature is significant co-morbidity (p < 0.001). The article concludes with suggestions for how future guidelines might be enhanced.

Subject(s)

Decision Making , Guideline Adherence , Lung Neoplasms/therapy , Practice Guidelines as Topic , Algorithms , Decision Trees , Humans , Randomized Controlled Trials as Topic , Scotland

Predicting recovery in patients suffering from traumatic brain injury by using admission variables and physiological data: a comparison between decision tree analysis and logistic regression.

Andrews, Peter J D; Sleeman, Derek H; Statham, Patrick F X; McQuatt, Andrew; Corruble, Vincent; Jones, Patricia A; Howells, Timothy P; Macmillan, Carol S A.

J Neurosurg ; 97(2): 326-36, 2002 Aug.

Article in English | MEDLINE | ID: mdl-12186460

ABSTRACT

OBJECT: Decision tree analysis highlights patient subgroups and critical values in variables assessed. Importantly, the results are visually informative and often present clear clinical interpretation about risk factors faced by patients in these subgroups. The aim of this prospective study was to compare results of logistic regression with those of decision tree analysis of an observational, head-injury data set, including a wide range of secondary insults and 12-month outcomes. METHODS: One hundred twenty-four adult head-injured patients were studied during their stay in an intensive care unit by using a computerized data collection system. Verified values falling outside threshold limits were analyzed according to insult grade and duration with the aid of logistic regression. A decision tree was automatically produced from root node to target classes (Glasgow Outcome Scale [GOS] score). Among 69 patients, in whom eight insult categories could be assessed, outcome at 12 months was analyzed using logistic regression to determine the relative influence of patient age, admission Glasgow Coma Scale score, Injury Severity Score (ISS), pupillary response on admission, and insult duration. The most significant predictors of mortality in this patient set were duration of hypotensive, pyrexic, and hypoxemic insults. When good and poor outcomes were compared, hypotensive insults and pupillary response on admission were significant. Using decision tree analysis, the authors found that hypotension and low cerebral perfusion pressure (CPP) are the best predictors of death, with a 9.2% improvement in predictive accuracy (PA) over that obtained by simply predicting the largest outcome category as the outcome for each patient. Hypotension was a significant predictor of poor outcome (GOS Score 1-3). Low CPP, patient age, hypocarbia, and pupillary response were also good predictors of outcome (good/poor), with a 5.1% improvement in PA. In certain subgroups of patients pyrexia was a predictor of good outcome. CONCLUSIONS: Decision tree analysis confirmed some of the results of logistic regression and challenged others. This investigation shows that there is knowledge to be gained from analyzing observational data with the aid of decision tree analysis.
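
The "improvement in predictive accuracy" quoted in this abstract is measured against a baseline that assigns every patient to the largest outcome category. The sketch below shows that comparison on toy data. The abstract does not say whether the improvement is an absolute or a relative difference, so the sketch assumes a simple difference in accuracy; the function names and the outcome labels are hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(outcomes):
    """Accuracy of always predicting the most frequent outcome."""
    if not outcomes:
        raise ValueError("no outcomes given")
    most_common_count = Counter(outcomes).most_common(1)[0][1]
    return most_common_count / len(outcomes)


def accuracy(predicted, actual):
    """Share of patients whose predicted outcome matches the actual one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def improvement_over_baseline(predicted, actual):
    """Assumed definition: model accuracy minus majority-class accuracy."""
    return accuracy(predicted, actual) - majority_baseline_accuracy(actual)


# Hypothetical outcomes for ten patients (not data from the study).
actual = ["alive"] * 6 + ["dead"] * 4
predicted = ["alive"] * 5 + ["dead"] + ["dead"] * 3 + ["alive"]
print(round(majority_baseline_accuracy(actual), 2))            # 0.6
print(round(improvement_over_baseline(predicted, actual), 2))  # 0.2
```

Reporting the gain over this baseline, rather than raw accuracy, guards against a model looking strong merely because one outcome dominates the data set.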

Subject(s)

Brain Injuries/mortality , Brain Injuries/physiopathology , Decision Trees , Logistic Models , Health Care Outcome Assessment , Patient Admission/statistics & numerical data , Recovery of Function/physiology , Adult , Brain Injuries/therapy , Female , Glasgow Coma Scale , Humans , Injury Severity Score , Male , Predictive Value of Tests , Prospective Studies , Survival Rate , Time Factors
