Búsqueda | Portal Regional de la BVS

Fairness metrics for health AI: we have a long way to go.

Mbakwe, Amarachi B; Lourentzou, Ismini; Celi, Leo Anthony; Wu, Joy T.

EBioMedicine ; 90: 104525, 2023 04.

Artículo en Inglés | MEDLINE | ID: mdl-36924621

Asunto(s)

Benchmarking , Justicia Social , Humanos , Inteligencia Artificial

Creation and validation of a chest X-ray dataset with eye-tracking and report dictation for AI development.

Karargyris, Alexandros; Kashyap, Satyananda; Lourentzou, Ismini; Wu, Joy T; Sharma, Arjun; Tong, Matthew; Abedin, Shafiq; Beymer, David; Mukherjee, Vandana; Krupinski, Elizabeth A; Moradi, Mehdi.

Sci Data ; 8(1): 92, 2021 03 25.

Artículo en Inglés | MEDLINE | ID: mdl-33767191

RESUMEN

We developed a rich dataset of Chest X-Ray (CXR) images to assist investigators in artificial intelligence. The data were collected using an eye-tracking system while a radiologist reviewed and reported on 1,083 CXR images. The dataset contains the following aligned data: CXR image, transcribed radiology report text, radiologist's dictation audio and eye gaze coordinates data. We hope this dataset can contribute to various areas of research particularly towards explainable and multimodal deep learning/machine learning methods. Furthermore, investigators in disease classification and localization, automated radiology report generation, and human-machine interaction can benefit from these data. We report deep learning experiments that utilize the attention maps produced by the eye gaze dataset to show the potential utility of this dataset.

Asunto(s)

Aprendizaje Profundo , Tórax/diagnóstico por imagen , Humanos , Radiografía

Comparison of Chest Radiograph Interpretations by Artificial Intelligence Algorithm vs Radiology Residents.

Wu, Joy T; Wong, Ken C L; Gur, Yaniv; Ansari, Nadeem; Karargyris, Alexandros; Sharma, Arjun; Morris, Michael; Saboury, Babak; Ahmad, Hassan; Boyko, Orest; Syed, Ali; Jadhav, Ashutosh; Wang, Hongzhi; Pillai, Anup; Kashyap, Satyananda; Moradi, Mehdi; Syeda-Mahmood, Tanveer.

JAMA Netw Open ; 3(10): e2022779, 2020 10 01.

Artículo en Inglés | MEDLINE | ID: mdl-33034642

RESUMEN

Importance: Chest radiography is the most common diagnostic imaging examination performed in emergency departments (EDs). Augmenting clinicians with automated preliminary read assistants could help expedite their workflows, improve accuracy, and reduce the cost of care. Objective: To assess the performance of artificial intelligence (AI) algorithms in realistic radiology workflows by performing an objective comparative evaluation of the preliminary reads of anteroposterior (AP) frontal chest radiographs performed by an AI algorithm and radiology residents. Design, Setting, and Participants: This diagnostic study included a set of 72 findings assembled by clinical experts to constitute a full-fledged preliminary read of AP frontal chest radiographs. A novel deep learning architecture was designed for an AI algorithm to estimate the findings per image. The AI algorithm was trained using a multihospital training data set of 342â¯126 frontal chest radiographs captured in ED and urgent care settings. The training data were labeled from their associated reports. Image-based F1 score was chosen to optimize the operating point on the receiver operating characteristics (ROC) curve so as to minimize the number of missed findings and overcalls per image read. The performance of the model was compared with that of 5 radiology residents recruited from multiple institutions in the US in an objective study in which a separate data set of 1998 AP frontal chest radiographs was drawn from a hospital source representative of realistic preliminary reads in inpatient and ED settings. A triple consensus with adjudication process was used to derive the ground truth labels for the study data set. The performance of AI algorithm and radiology residents was assessed by comparing their reads with ground truth findings. All studies were conducted through a web-based clinical study application system. The triple consensus data set was collected between February and October 2018. The comparison study was preformed between January and October 2019. Data were analyzed from October to February 2020. After the first round of reviews, further analysis of the data was performed from March to July 2020. Main Outcomes and Measures: The learning performance of the AI algorithm was judged using the conventional ROC curve and the area under the curve (AUC) during training and field testing on the study data set. For the AI algorithm and radiology residents, the individual finding label performance was measured using the conventional measures of label-based sensitivity, specificity, and positive predictive value (PPV). In addition, the agreement with the ground truth on the assignment of findings to images was measured using the pooled κ statistic. The preliminary read performance was recorded for AI algorithm and radiology residents using new measures of mean image-based sensitivity, specificity, and PPV designed for recording the fraction of misses and overcalls on a per image basis. The 1-sided analysis of variance test was used to compare the means of each group (AI algorithm vs radiology residents) using the F distribution, and the null hypothesis was that the groups would have similar means. Results: The trained AI algorithm achieved a mean AUC across labels of 0.807 (weighted mean AUC, 0.841) after training. On the study data set, which had a different prevalence distribution, the mean AUC achieved was 0.772 (weighted mean AUC, 0.865). The interrater agreement with ground truth finding labels for AI algorithm predictions had pooled κ value of 0.544, and the pooled κ for radiology residents was 0.585. For the preliminary read performance, the analysis of variance test was used to compare the distributions of AI algorithm and radiology residents' mean image-based sensitivity, PPV, and specificity. The mean image-based sensitivity for AI algorithm was 0.716 (95% CI, 0.704-0.729) and for radiology residents was 0.720 (95% CI, 0.709-0.732) (P = .66), while the PPV was 0.730 (95% CI, 0.718-0.742) for the AI algorithm and 0.682 (95% CI, 0.670-0.694) for the radiology residents (P < .001), and specificity was 0.980 (95% CI, 0.980-0.981) for the AI algorithm and 0.973 (95% CI, 0.971-0.974) for the radiology residents (P < .001). Conclusions and Relevance: These findings suggest that it is possible to build AI algorithms that reach and exceed the mean level of performance of third-year radiology residents for full-fledged preliminary read of AP frontal chest radiographs. This diagnostic study also found that while the more complex findings would still benefit from expert overreads, the performance of AI algorithms was associated with the amount of data available for training rather than the level of difficulty of interpretation of the finding. Integrating such AI systems in radiology workflows for preliminary interpretations has the potential to expedite existing radiology workflows and address resource scarcity while improving overall accuracy and reducing the cost of care.

Asunto(s)

Inteligencia Artificial/normas , Internado y Residencia/normas , Interpretación de Imagen Radiográfica Asistida por Computador/normas , Tórax/diagnóstico por imagen , Algoritmos , Área Bajo la Curva , Inteligencia Artificial/estadística & datos numéricos , Humanos , Internado y Residencia/métodos , Internado y Residencia/estadística & datos numéricos , Calidad de la Atención de Salud/normas , Calidad de la Atención de Salud/estadística & datos numéricos , Curva ROC , Interpretación de Imagen Radiográfica Asistida por Computador/métodos , Interpretación de Imagen Radiográfica Asistida por Computador/estadística & datos numéricos , Radiografía/instrumentación , Radiografía/métodos

Combining Deep Learning and Knowledge-driven Reasoning for Chest X-Ray Findings Detection.

Jadhav, Ashutosh; Wong, Ken C L; Wu, Joy T; Moradi, Mehdi; Syeda-Mahmood, Tanveer.

AMIA Annu Symp Proc ; 2020: 593-601, 2020.

Artículo en Inglés | MEDLINE | ID: mdl-33936433

RESUMEN

The application of deep learning algorithms in medical imaging analysis is a steadily growing research area. While deep learning methods are thriving in the medical domain, they seldom utilize the rich knowledge associated with connected radiology reports. The knowledge derived from these reports can be utilized to enhance the performance of deep learning models. In this work, we used a comprehensive chest X-ray findings vocabulary to automatically annotate an extensive collection of chest X-rays using associated radiology reports and a vocabulary-driven concept annotation algorithm. The annotated X-rays are used to train a deep neural network classifier for finding detection. Finally, we developed a knowledge-driven reasoning algorithm that leverages knowledge learned from X-ray reports to improve upon the deep learning module's performance on finding detection. Our results suggest that combining deep learning and knowledge from radiology reports in a hybrid framework can significantly enhance overall performance in the CXR finding detection.

Asunto(s)

Radiografía Torácica/métodos , Tórax/diagnóstico por imagen , Rayos X , Algoritmos , Aprendizaje Profundo , Humanos , Redes Neurales de la Computación , Radiografía

Extracting and Learning Fine-Grained Labels from Chest Radiographs.

Syeda-Mahmood, Tanveer; Wong, K C L; Wu, Joy T; Jadhav, Ashutosh; Boyko, Orest.

AMIA Annu Symp Proc ; 2020: 1190-1199, 2020.

Artículo en Inglés | MEDLINE | ID: mdl-33936495

RESUMEN

Chest radiographs are the most common diagnostic exam in emergency rooms and intensive care units today. Recently, a number of researchers have begun working on large chest X-ray datasets to develop deep learning models for recognition of a handful of coarse finding classes such as opacities, masses and nodules. In this paper, we focus on extracting and learning fine-grained labels for chest X-ray images. Specifically we develop a new method of extracting fine-grained labels from radiology reports by combining vocabulary-driven concept extraction with phrasal grouping in dependency parse trees for association of modifiers with findings. A total of457finegrained labels depicting the largest spectrum of findings to date were selected and sufficiently large datasets acquired to train a new deep learning model designed for fine-grained classification. We show results that indicate a highly accurate label extraction process and a reliable learning of fine-grained labels. The resulting network, to our knowledge, is the first to recognize fine-grained descriptions offindings in images covering over nine modifiers including laterality, location, severity, size and appearance.

Asunto(s)

Diagnóstico por Computador/métodos , Aprendizaje Automático , Redes Neurales de la Computación , Interpretación de Imagen Radiográfica Asistida por Computador/métodos , Radiografía Torácica/métodos , Aprendizaje Profundo , Humanos , Reconocimiento de Normas Patrones Automatizadas , Tórax/diagnóstico por imagen

AI Accelerated Human-in-the-loop Structuring of Radiology Reports.

Wu, Joy T; Syed, Ali; Ahmad, Hassan; Pillai, Anup; Gur, Yaniv; Jadhav, Ashutosh; Gruhl, Daniel; Kato, Linda; Moradi, Mehdi; Syeda-Mahmood, Tanveer.

AMIA Annu Symp Proc ; 2020: 1305-1314, 2020.

Artículo en Inglés | MEDLINE | ID: mdl-33936507

RESUMEN

Rule-based Natural Language Processing (NLP) pipelines depend on robust domain knowledge. Given the long tail of important terminology in radiology reports, it is not uncommon for standard approaches to miss items critical for understanding the image. AI techniques can accelerate the concept expansion and phrasal grouping tasks to efficiently create a domain specific lexicon ontology for structuring reports. Using Chest X-ray (CXR) reports as an example, we demonstrate that with robust vocabulary, even a simple NLP pipeline can extract 83 directly mentioned abnormalities (Ave. recall=93.83%, precision=94.87%) and 47 abnormality/normality descriptions of key anatomies. The richer vocabulary enables identification of additional label mentions in 10 out of 13 labels (compared to baseline methods). Furthermore, it captures expert insight into critical differences between observed and inferred descriptions, and image quality issues in reports. Finally, we show how the CXR ontology can be used to anatomically structure labeled output.

Asunto(s)

Radiología , Bases de Datos Factuales , Humanos , Procesamiento de Lenguaje Natural , Informe de Investigación

Behind the scenes: A medical natural language processing project.

Wu, Joy T; Dernoncourt, Franck; Gehrmann, Sebastian; Tyler, Patrick D; Moseley, Edward T; Carlson, Eric T; Grant, David W; Li, Yeran; Welt, Jonathan; Celi, Leo Anthony.

Int J Med Inform ; 112: 68-73, 2018 04.

Artículo en Inglés | MEDLINE | ID: mdl-29500024

RESUMEN

Advancement of Artificial Intelligence (AI) capabilities in medicine can help address many pressing problems in healthcare. However, AI research endeavors in healthcare may not be clinically relevant, may have unrealistic expectations, or may not be explicit enough about their limitations. A diverse and well-functioning multidisciplinary team (MDT) can help identify appropriate and achievable AI research agendas in healthcare, and advance medical AI technologies by developing AI algorithms as well as addressing the shortage of appropriately labeled datasets for machine learning. In this paper, our team of engineers, clinicians and machine learning experts share their experience and lessons learned from their two-year-long collaboration on a natural language processing (NLP) research project. We highlight specific challenges encountered in cross-disciplinary teamwork, dataset creation for NLP research, and expectation setting for current medical AI technologies.

Asunto(s)

Algoritmos , Inteligencia Artificial , Toma de Decisiones Clínicas , Aprendizaje Automático , Procesamiento de Lenguaje Natural , Humanos

Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives.

Gehrmann, Sebastian; Dernoncourt, Franck; Li, Yeran; Carlson, Eric T; Wu, Joy T; Welt, Jonathan; Foote, John; Moseley, Edward T; Grant, David W; Tyler, Patrick D; Celi, Leo A.

PLoS One ; 13(2): e0192360, 2018.

Artículo en Inglés | MEDLINE | ID: mdl-29447188

RESUMEN

In secondary analysis of electronic health records, a crucial task consists in correctly identifying the patient cohort under investigation. In many cases, the most valuable and relevant information for an accurate classification of medical conditions exist only in clinical narratives. Therefore, it is necessary to use natural language processing (NLP) techniques to extract and evaluate these narratives. The most commonly used approach to this problem relies on extracting a number of clinician-defined medical concepts from text and using machine learning techniques to identify whether a particular patient has a certain condition. However, recent advances in deep learning and NLP enable models to learn a rich representation of (medical) language. Convolutional neural networks (CNN) for text classification can augment the existing techniques by leveraging the representation of language to learn which phrases in a text are relevant for a given medical condition. In this work, we compare concept extraction based methods with CNNs and other commonly used models in NLP in ten phenotyping tasks using 1,610 discharge summaries from the MIMIC-III database. We show that CNNs outperform concept extraction based methods in almost all of the tasks, with an improvement in F1-score of up to 26 and up to 7 percentage points in area under the ROC curve (AUC). We additionally assess the interpretability of both approaches by presenting and evaluating methods that calculate and extract the most salient phrases for a prediction. The results indicate that CNNs are a valid alternative to existing approaches in patient phenotyping and cohort identification, and should be further investigated. Moreover, the deep learning approach presented in this paper can be used to assist clinicians during chart review or support the extraction of billing codes from text by identifying and highlighting relevant phrases for various medical conditions.

Asunto(s)

Lenguaje , Aprendizaje , Fenotipo , Humanos

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA