ABSTRACT
It is currently not well understood how necroptosis and necroptosis-associated responses manifest in vivo. Here, we uncovered a molecular switch facilitating reprogramming between two alternative modes of necroptosis signaling in hepatocytes, fundamentally affecting immune responses and hepatocarcinogenesis. Concomitant necrosome and NF-κB activation in hepatocytes, which physiologically express low concentrations of receptor-interacting protein kinase 3 (RIPK3), did not lead to immediate cell death but forced them into a prolonged "sublethal" state with leaky membranes, functioning as secretory cells that released specific chemokines including CCL20 and MCP-1. This triggered hepatic cell proliferation as well as activation of procarcinogenic monocyte-derived macrophage cell clusters, contributing to hepatocarcinogenesis. In contrast, necrosome activation in hepatocytes with inactive NF-κB signaling caused an accelerated execution of necroptosis, limiting alarmin release and thereby preventing inflammation and hepatocarcinogenesis. Consistently, intratumoral NF-κB-necroptosis signatures were associated with poor prognosis in human hepatocarcinogenesis. Therefore, pharmacological reprogramming between these distinct forms of necroptosis may represent a promising strategy against hepatocellular carcinoma.
Subjects
Liver Neoplasms, NF-kappa B, Humans, NF-kappa B/metabolism, Protein Kinases/metabolism, Necroptosis, Inflammation/pathology, Receptor-Interacting Protein Serine-Threonine Kinases/genetics, Receptor-Interacting Protein Serine-Threonine Kinases/metabolism, Apoptosis
ABSTRACT
Deep learning applied to whole-slide histopathology images (WSIs) has the potential to enhance precision oncology and alleviate the workload of experts. However, developing these models necessitates large amounts of data with ground truth labels, which can be both time-consuming and expensive to obtain. Pathology reports are typically unstructured or poorly structured texts, and efforts to implement structured reporting templates have been unsuccessful, as they are perceived to add extra workload. In this study, we hypothesised that large language models (LLMs), such as the generative pre-trained transformer 4 (GPT-4), can extract structured data from unstructured plain language reports using a zero-shot approach without requiring any re-training. We tested this hypothesis by utilising GPT-4 to extract information from histopathological reports, focusing on two extensive sets of pathology reports for colorectal cancer and glioblastoma. We found a high concordance between LLM-generated structured data and human-generated structured data. Consequently, LLMs could potentially be employed routinely to extract ground truth data for machine learning from unstructured pathology reports in the future. © 2023 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.
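The zero-shot extraction step described above can be sketched as follows. The field list, prompt wording, and JSON parsing are illustrative assumptions, not the study's exact protocol; only the general pattern (GPT-4 via the OpenAI API, no re-training) follows the abstract.

```python
# Minimal sketch of zero-shot structured extraction from a free-text
# pathology report. Field names and prompt are hypothetical examples.
import json
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_structured(report_text: str) -> dict:
    prompt = (
        "Extract the following fields from the pathology report below and "
        "answer with JSON only: tumor_type, grade, pT_stage, pN_stage, "
        "lymphovascular_invasion (yes/no/unknown).\n\n"
        f"Report:\n{report_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # deterministic output for reproducible extraction
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the model returns bare JSON; production code would need
    # more robust parsing and validation of the returned fields.
    return json.loads(response.choices[0].message.content)
```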
Subjects
Glioblastoma, Precision Medicine, Humans, Machine Learning, United Kingdom
ABSTRACT
BACKGROUND: Homologous recombination deficiency (HRD) is recognized as a pan-cancer predictive biomarker that potentially indicates who could benefit from treatment with PARP inhibitors (PARPi). Despite its clinical significance, HRD testing is highly complex. Here, we investigated in a proof-of-concept study whether deep learning (DL) can predict HRD status solely based on routine hematoxylin & eosin (H&E) histology images across nine different cancer types. METHODS: We developed a deep learning pipeline with attention-weighted multiple instance learning (attMIL) to predict HRD status from histology images. As part of our approach, we calculated a genomic scar HRD score by combining loss of heterozygosity (LOH), telomeric allelic imbalance (TAI), and large-scale state transitions (LST) from whole genome sequencing (WGS) data of n = 5209 patients across two independent cohorts. The model's effectiveness was evaluated using the area under the receiver operating characteristic curve (AUROC), focusing on its accuracy in predicting genomic HRD against a clinically recognized cutoff value. RESULTS: Our study demonstrated the predictability of genomic HRD status in endometrial, pancreatic, and lung cancers, reaching cross-validated AUROCs of 0.79, 0.58, and 0.66, respectively. These predictions generalized well to an external cohort, with AUROCs of 0.93, 0.81, and 0.73. Moreover, a breast cancer-trained image-based HRD classifier yielded an AUROC of 0.78 in the internal validation cohort and was able to predict HRD in endometrial, prostate, and pancreatic cancer with AUROCs of 0.87, 0.84, and 0.67, indicating that a shared HRD-like phenotype occurs across these tumor entities. CONCLUSIONS: This study establishes that HRD can be directly predicted from H&E slides using attMIL, demonstrating its applicability across nine different tumor types.
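The attMIL aggregation at the core of such a pipeline can be illustrated as below: tile-level feature vectors from one slide are pooled with learned attention weights into a slide-level embedding before classification. All dimensions and the single-branch attention design are assumptions for illustration, not the published architecture.

```python
# Sketch of attention-weighted multiple instance learning (attMIL):
# a bag of tile features is collapsed into one slide-level embedding
# via learned attention weights, then classified (HRD vs. non-HRD).
import torch
import torch.nn as nn

class AttMIL(nn.Module):
    def __init__(self, feat_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),  # one attention score per tile
        )
        self.classifier = nn.Linear(feat_dim, 1)  # binary HRD logit

    def forward(self, tiles: torch.Tensor) -> torch.Tensor:
        # tiles: (n_tiles, feat_dim) features of one whole-slide image
        scores = self.attention(tiles)                  # (n_tiles, 1)
        weights = torch.softmax(scores, dim=0)          # normalize over tiles
        slide_embedding = (weights * tiles).sum(dim=0)  # (feat_dim,)
        return self.classifier(slide_embedding)         # slide-level logit

logit = AttMIL()(torch.randn(500, 768))  # e.g., 500 tiles from one slide
```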
Subjects
Deep Learning, Homologous Recombination, Neoplasms, Humans, Neoplasms/genetics, Loss of Heterozygosity
ABSTRACT
Although pathological tissue analysis is typically performed on single 2-dimensional (2D) histologic reference slides, 3-dimensional (3D) reconstruction from a sequence of histologic sections could provide novel opportunities for spatial analysis of the extracted tissue. In this review, we analyze recent works published after 2018 and report information on the extracted tissue types, the section thickness, and the number of sections used for reconstruction. By analyzing the technological requirements for 3D reconstruction, we observe that software tools exist, both free and commercial, which include the functionality to perform 3D reconstruction from a sequence of histologic images. Through the analysis of the most recent works, we provide an overview of the workflows and tools that are currently used for 3D reconstruction from histologic sections and address points for future work, such as a missing common file format or computer-aided analysis of the reconstructed model.
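As a concrete example of one workflow step covered by the reviewed tools, the sketch below aligns consecutive grayscale sections by translation-only registration before stacking them into a volume. The library choice (scikit-image/SciPy) and the purely translational motion model are simplifying assumptions; real pipelines typically also handle rotation and deformation.

```python
# Sketch of rigid (translation-only) registration of serial histologic
# sections prior to 3D stacking, using phase cross-correlation.
import numpy as np
from scipy.ndimage import shift as nd_shift
from skimage.registration import phase_cross_correlation

def align_stack(sections: list[np.ndarray]) -> np.ndarray:
    """Align each grayscale section to its predecessor and stack to 3D."""
    aligned = [sections[0]]
    for moving in sections[1:]:
        # Estimated (row, col) shift mapping `moving` onto the reference
        offset, _, _ = phase_cross_correlation(aligned[-1], moving)
        aligned.append(nd_shift(moving, offset))
    return np.stack(aligned, axis=0)  # (n_sections, height, width) volume
```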
Subjects
Three-Dimensional Imaging, Three-Dimensional Imaging/methods, Humans, Software, Animals
ABSTRACT
BACKGROUND & AIMS: Diagnosis of adenocarcinoma in the liver is a frequent scenario in routine pathology and has a critical impact on clinical decision making. However, rendering a correct diagnosis can be challenging, and often requires the integration of clinical, radiologic, and immunohistochemical information. We present a deep learning model (HEPNET) to distinguish intrahepatic cholangiocarcinoma from colorectal liver metastasis, as the most frequent primary and secondary forms of liver adenocarcinoma, with clinical grade accuracy using H&E-stained whole-slide images. METHODS: HEPNET was trained on 714,589 image tiles from 456 patients who were randomly selected in a stratified manner from a pool of 571 patients who underwent surgical resection or biopsy at Heidelberg University Hospital. Model performance was evaluated on a hold-out internal test set comprising 115 patients and externally validated on 159 patients recruited at Mainz University Hospital. RESULTS: On the hold-out internal test set, HEPNET achieved an area under the receiver operating characteristic curve of 0.994 (95% CI, 0.989-1.000) and an accuracy of 96.522% (95% CI, 94.521%-98.694%) at the patient level. Validation on the external test set yielded an area under the receiver operating characteristic curve of 0.997 (95% CI, 0.995-1.000), corresponding to an accuracy of 98.113% (95% CI, 96.907%-100.000%). HEPNET surpassed the performance of 6 pathology experts with different levels of experience in a reader study of 50 patients (P = .0005), boosted the performance of resident pathologists to the level of senior pathologists, and reduced potential downstream analyses. CONCLUSIONS: We provided a ready-to-use tool with clinical grade performance that may facilitate routine pathology by rendering a definitive diagnosis and guiding ancillary testing. The incorporation of HEPNET into pathology laboratories may optimize the diagnostic workflow, complemented by test-related labor and cost savings.
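HEPNET's exact architecture is not reproduced here, but patient-level evaluation of a tile-based classifier can be sketched as follows; the mean-pooling aggregation rule is an assumption and may differ from the published model.

```python
# Sketch of patient-level evaluation: tile-level probabilities
# (e.g., iCCA vs. CRLM) are averaged per patient, then the AUROC is
# computed over patients. Aggregation by mean pooling is assumed.
import numpy as np
from sklearn.metrics import roc_auc_score

def patient_auroc(tile_probs, patient_ids, patient_labels):
    # tile_probs: per-tile probability of the positive class
    # patient_ids: aligns each tile with its patient
    # patient_labels: dict mapping patient id -> 0/1 ground truth
    patients = sorted(patient_labels)
    scores = [np.mean([p for p, pid in zip(tile_probs, patient_ids)
                       if pid == k]) for k in patients]
    labels = [patient_labels[k] for k in patients]
    return roc_auc_score(labels, scores)
```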
ABSTRACT
Deep learning (DL) is currently the standard artificial intelligence tool for computer-based image analysis in radiology. Traditionally, DL models have been trained with strongly supervised learning methods. These methods depend on reference standard labels, typically applied manually by experts. In contrast, weakly supervised learning is more scalable. Weak supervision comprises situations in which only a portion of the data are labeled (incomplete supervision), labels refer to a whole region or case as opposed to a precisely delineated image region (inexact supervision), or labels contain errors (inaccurate supervision). In many applications, weak labels are sufficient to train useful models. Thus, weakly supervised learning can unlock a large amount of otherwise unusable data for training DL models. One example of this is using large language models to automatically extract weak labels from free-text radiology reports. Here, we outline the key concepts in weakly supervised learning and provide an overview of applications in radiologic image analysis. With more fundamental and clinical translational work, weakly supervised learning could facilitate the uptake of DL in radiology and research workflows by enabling large-scale image analysis and advancing the development of new DL-based biomarkers.
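As a minimal illustration of incomplete supervision, the sketch below computes a loss only on the labeled portion of a batch, letting unlabeled images pass through the model (e.g., for a consistency or pretraining objective, omitted here). The mask convention is an assumption.

```python
# Sketch of training under incomplete supervision: the loss is
# restricted to samples for which a (possibly weak) label exists.
import torch
import torch.nn.functional as F

def masked_bce_loss(logits: torch.Tensor, labels: torch.Tensor,
                    has_label: torch.Tensor) -> torch.Tensor:
    # logits, labels: (batch,); has_label: bool mask, True where labeled
    if has_label.sum() == 0:
        return logits.new_zeros(())  # no labeled samples in this batch
    return F.binary_cross_entropy_with_logits(
        logits[has_label], labels[has_label].float()
    )
```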
Subjects
Deep Learning, Radiology, Humans, Radiology/education, Supervised Machine Learning, Computer-Assisted Image Interpretation/methods
ABSTRACT
Background Procedural details of mechanical thrombectomy in patients with ischemic stroke are important predictors of clinical outcome and are collected for prospective studies or national stroke registries. To date, these data are collected manually by human readers, a labor-intensive task that is prone to errors. Purpose To evaluate the use of the large language models (LLMs) GPT-4 and GPT-3.5 to extract data from neuroradiology reports on mechanical thrombectomy in patients with ischemic stroke. Materials and Methods This retrospective study included consecutive reports from patients with ischemic stroke who underwent mechanical thrombectomy between November 2022 and September 2023 at institution 1 and between September 2016 and December 2019 at institution 2. A set of 20 reports was used to optimize the prompt, and the ability of the LLMs to extract procedural data from the reports was compared using the McNemar test. Data manually extracted by an interventional neuroradiologist served as the reference standard. Results A total of 100 internal reports from 100 patients (mean age, 74.7 years ± 13.2 [SD]; 53 female) and 30 external reports from 30 patients (mean age, 72.7 years ± 13.5; 18 male) were included. All reports were successfully processed by GPT-4 and GPT-3.5. Of 2800 data entries, 2631 (94.0% [95% CI: 93.0, 94.8]; range per category, 61%-100%) were correctly extracted by GPT-4 without the need for further postprocessing. With 1788 of 2800 correct data entries, GPT-3.5 produced fewer correct data entries than did GPT-4 (63.9% [95% CI: 62.0, 65.6]; range per category, 14%-99%; P < .001). For the external reports, GPT-4 extracted 760 of 840 (90.5% [95% CI: 88.3, 92.4]) correct data entries, while GPT-3.5 extracted 539 of 840 (64.2% [95% CI: 60.8, 67.4]; P < .001). Conclusion Compared with GPT-3.5, GPT-4 more frequently extracted correct procedural data from free-text reports on mechanical thrombectomy performed in patients with ischemic stroke. © RSNA, 2024 Supplemental material is available for this article.
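The paired comparison of the two models can be illustrated with a McNemar test on per-entry correctness, as sketched below with made-up example data.

```python
# Sketch of the McNemar test comparing GPT-4 vs. GPT-3.5 correctness
# on the same data entries. The example booleans are fabricated.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

gpt4_correct = np.array([True, True, False, True, True, False, True, True])
gpt35_correct = np.array([True, False, False, True, False, False, True, False])

# 2x2 table of paired outcomes:
# rows = GPT-4 correct/incorrect, columns = GPT-3.5 correct/incorrect
table = [
    [np.sum(gpt4_correct & gpt35_correct), np.sum(gpt4_correct & ~gpt35_correct)],
    [np.sum(~gpt4_correct & gpt35_correct), np.sum(~gpt4_correct & ~gpt35_correct)],
]
print(mcnemar(table, exact=True).pvalue)  # small p -> performance differs
```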
Subjects
Ischemic Stroke, Stroke, Humans, Female, Male, Aged, Ischemic Stroke/diagnostic imaging, Ischemic Stroke/surgery, Retrospective Studies, Prospective Studies, Stroke/diagnostic imaging, Stroke/surgery, Thrombectomy
ABSTRACT
Background The level of background parenchymal enhancement (BPE) at breast MRI provides predictive and prognostic information and can have diagnostic implications. However, there is a lack of standardization regarding BPE assessment. Purpose To investigate how well results of quantitative BPE assessment methods correlate among themselves and with assessments made by radiologists experienced in breast MRI. Materials and Methods In this pseudoprospective analysis of 5773 breast MRI examinations from 3207 patients (mean age, 60 years ± 10 [SD]), the level of BPE was prospectively categorized according to the Breast Imaging Reporting and Data System by radiologists experienced in breast MRI. For automated extraction of BPE, fibroglandular tissue (FGT) was segmented in an automated pipeline. Four different published methods for automated quantitative BPE extractions were used: two methods (A and B) based on enhancement intensity and two methods (C and D) based on the volume of enhanced FGT. The results from all methods were correlated, and agreement was investigated in comparison with the respective radiologist-based categorization. For surrogate validation of BPE assessment, how accurately the methods distinguished premenopausal women with (n = 50) versus without (n = 896) antihormonal treatment was determined. Results Intensity-based methods (A and B) exhibited a correlation with radiologist-based categorization of 0.56 ± 0.01 and 0.55 ± 0.01, respectively, and volume-based methods (C and D) had a correlation of 0.52 ± 0.01 and 0.50 ± 0.01 (P < .001). There were notable correlation differences (P < .001) between the BPE determined with the four methods. Among the four quantitation methods, method D offered the highest accuracy for distinguishing women with versus without antihormonal therapy (P = .01). Conclusion Results of different methods for quantitative BPE assessment agree only moderately among themselves or with visual categories reported by experienced radiologists; intensity-based methods correlate more closely with radiologists' ratings than volume-based methods. © RSNA, 2024 Supplemental material is available for this article. See also the editorial by Mann in this issue.
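The two families of quantitative BPE measures compared above can be sketched as follows; the enhancement threshold and normalization are illustrative assumptions, not the published methods A-D.

```python
# Sketch contrasting intensity-based and volume-based BPE measures
# computed within a segmented fibroglandular tissue (FGT) mask.
import numpy as np

def bpe_measures(pre: np.ndarray, post: np.ndarray, fgt_mask: np.ndarray,
                 threshold: float = 0.5):
    # pre/post: T1-weighted volumes before/after contrast
    # fgt_mask: boolean FGT segmentation; threshold: assumed example value
    rel_enh = (post[fgt_mask] - pre[fgt_mask]) / (pre[fgt_mask] + 1e-6)
    intensity_bpe = rel_enh.mean()             # intensity-based (cf. A/B)
    volume_bpe = (rel_enh > threshold).mean()  # volume-based (cf. C/D)
    return intensity_bpe, volume_bpe
```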
Subjects
Breast Neoplasms, Breast, Magnetic Resonance Imaging, Humans, Female, Middle Aged, Magnetic Resonance Imaging/methods, Breast Neoplasms/diagnostic imaging, Breast/diagnostic imaging, Computer-Assisted Image Interpretation/methods, Adult, Prospective Studies, Image Enhancement/methods, Aged, Reproducibility of Results, Retrospective Studies
ABSTRACT
Infarct size (IS) is the most robust end point for evaluating the success of preclinical studies on cardioprotection. The gold standard for IS quantification in ischemia/reperfusion (I/R) experiments is triphenyl tetrazolium chloride (TTC) staining, typically done manually. This study aimed to determine if automation through deep learning segmentation is a time-saving and valid alternative to standard IS quantification. High-resolution images from TTC-stained, macroscopic heart slices were retrospectively collected from pig experiments (n = 390) with I/R without/with cardioprotection to cover a wide IS range. Existing IS data from pig experiments, quantified using a standard method of manual and subsequent digital labeling of film-scan annotations, were used as reference. To automate the evaluation process with the aim to be more objective and save time, a deep learning pipeline was implemented; the collected images (n = 3869) were pre-processed by cropping and labeled (image annotations). To ensure their usability as training data for a deep learning segmentation model, IS was quantified from image annotations and compared to IS quantified using the existing film-scan annotations. A supervised deep learning segmentation model based on dynamic U-Net architecture was developed and trained. The evaluation of the trained model was performed by fivefold cross-validation (n = 220 experiments) and testing on an independent test set (n = 170 experiments). Performance metrics (Dice similarity coefficient [DSC], pixel accuracy [ACC], average precision [mAP]) were calculated. IS was then quantified from predictions and compared to IS quantified from image annotations (linear regression, Pearson's r; analysis of covariance; Bland-Altman plots). Performance metrics near 1 indicated a strong model performance on cross-validated data (DSC: 0.90, ACC: 0.98, mAP: 0.90) and on the test set data (DSC: 0.89, ACC: 0.98, mAP: 0.93). IS quantified from predictions correlated well with IS quantified from image annotations in all data sets (cross-validation: r = 0.98; test data set: r = 0.95) and analysis of covariance identified no significant differences. The model reduced the IS quantification time per experiment from approximately 90 min to 20 s. The model was further tested on a preliminary test set from experiments in isolated, saline-perfused rat hearts with regional I/R without/with cardioprotection (n = 27). There was also no significant difference in IS between image annotations and predictions, but the performance on the test set data from rat hearts was lower (DSC: 0.66, ACC: 0.91, mAP: 0.65). IS quantification using a deep learning segmentation model is a valid and time-efficient alternative to manual and subsequent digital labeling.
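Two of the evaluation quantities used above, the Dice similarity coefficient and infarct size as a fraction of the area at risk, can be sketched as follows (mask conventions are assumptions for illustration):

```python
# Sketch of segmentation evaluation and infarct-size quantification
# from boolean masks of one TTC-stained heart-slice image.
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    # pred/truth: boolean masks of infarcted tissue
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / (pred.sum() + truth.sum() + 1e-9)

def infarct_size(infarct_mask: np.ndarray, area_at_risk_mask: np.ndarray) -> float:
    # IS expressed as a percentage of the area at risk, as is standard
    # for ischemia/reperfusion experiments
    return 100.0 * infarct_mask.sum() / area_at_risk_mask.sum()
```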
ABSTRACT
The advent of digital pathology and the deployment of high-throughput molecular techniques are generating an unprecedented mass of data. Thanks to advances in computational sciences, artificial intelligence (AI) approaches represent a promising avenue for extracting relevant information from complex data structures. From diagnostic assistance to powerful research tools, the potential fields of application of machine learning techniques in pathology are vast and constitute the subject of considerable research work. The aim of this article is to provide an overview of the potential applications of AI in the field of haematopathology and to define the role that these emerging technologies could play in our laboratories in the short to medium term.
ABSTRACT
BACKGROUND: Artificial intelligence (AI) has numerous applications in pathology, supporting diagnosis and prognostication in cancer. However, most AI models are trained on highly selected data, typically one tissue slide per patient. In reality, especially for large surgical resection specimens, dozens of slides can be available for each patient. Manually sorting and labelling whole-slide images (WSIs) is a very time-consuming process, hindering the direct application of AI on the collected tissue samples from large cohorts. In this study, we addressed this issue by developing a deep-learning (DL)-based method for automatic curation of large pathology datasets with several slides per patient. METHODS: We collected multiple large multicentric datasets of colorectal cancer histopathological slides from the United Kingdom (FOXTROT, N = 21,384 slides; CR07, N = 7985 slides) and Germany (DACHS, N = 3606 slides). These datasets contained multiple types of tissue slides, including bowel resection specimens, endoscopic biopsies, lymph node resections, immunohistochemistry-stained slides, and tissue microarrays. We developed, trained, and tested a deep convolutional neural network model to predict the type of slide from the slide overview (thumbnail) image. The primary statistical endpoint was the macro-averaged area under the receiver operating characteristic curve (AUROC) for detection of the type of slide. RESULTS: In the primary dataset (FOXTROT), with an AUROC of 0.995 [95% confidence interval [CI]: 0.994-0.996] the algorithm achieved a high classification performance and was able to accurately predict the type of slide from the thumbnail image alone. In the two external test cohorts (CR07, DACHS) AUROCs of 0.982 [95% CI: 0.979-0.985] and 0.875 [95% CI: 0.864-0.887] were observed, indicating the generalizability of the trained model to unseen datasets. With a confidence threshold of 0.95, the model reached an accuracy of 94.6% (7331 classified cases) in CR07 and 85.1% (2752 classified cases) for the DACHS cohort. CONCLUSION: Our findings show that using the low-resolution thumbnail image is sufficient to accurately classify the type of slide in digital pathology. This can help researchers make the vast resource of existing pathology archives accessible to modern AI models with only minimal manual annotations.
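The confidence-threshold rule reported above can be sketched as follows; the class names are illustrative assumptions matching the slide types listed in the abstract.

```python
# Sketch of confidence-thresholded thumbnail classification: a slide is
# auto-labeled only when the top softmax probability reaches 0.95,
# otherwise it is deferred to manual curation.
import torch

CLASSES = ["bowel_resection", "endoscopic_biopsy", "lymph_node",
           "IHC", "tissue_microarray"]  # assumed label names

def classify_thumbnail(logits: torch.Tensor, threshold: float = 0.95):
    # logits: (n_classes,) raw model outputs for one thumbnail image
    probs = torch.softmax(logits, dim=-1)
    conf, idx = probs.max(dim=-1)
    if conf.item() < threshold:
        return None  # below threshold: leave unclassified for review
    return CLASSES[idx.item()]
```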
Subjects
Colorectal Neoplasms, Deep Learning, Humans, Colorectal Neoplasms/pathology, Colorectal Neoplasms/diagnosis, Neural Networks (Computer), Computer-Assisted Image Processing/methods, Computer-Assisted Image Interpretation/methods
ABSTRACT
BACKGROUND: Attempts to use artificial intelligence (AI) in psychiatric disorders show moderate success, highlighting the potential of incorporating information from clinical assessments to improve the models. This study focuses on using large language models (LLMs) to detect suicide risk from medical text in psychiatric care. AIMS: To extract information about suicidality status from the admission notes in electronic health records (EHRs) using privacy-sensitive, locally hosted LLMs, specifically evaluating the efficacy of Llama-2 models. METHOD: We compared the performance of several variants of the open source LLM Llama-2 in extracting suicidality status from 100 psychiatric reports against a ground truth defined by human experts, assessing accuracy, sensitivity, specificity and F1 score across different prompting strategies. RESULTS: A German fine-tuned Llama-2 model showed the highest accuracy (87.5%), sensitivity (83.0%) and specificity (91.8%) in identifying suicidality, with significant improvements in sensitivity and specificity across various prompt designs. CONCLUSIONS: The study demonstrates the capability of LLMs, particularly Llama-2, in accurately extracting information on suicidality from psychiatric records while preserving data privacy. This suggests their application in surveillance systems for psychiatric emergencies and in improving the clinical management of suicidality through strengthened systematic quality control and research.
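A locally hosted extraction step of this kind can be sketched with the Hugging Face transformers library as below. The model name, prompt, and yes/no parsing are assumptions; the study's fine-tuned German variants and prompt designs differ, and gated Llama-2 weights require Hugging Face access approval.

```python
# Sketch of privacy-preserving, local LLM extraction of suicidality
# status from an admission note. All specifics are illustrative.
from transformers import pipeline  # pip install transformers accelerate

generator = pipeline("text-generation",
                     model="meta-llama/Llama-2-7b-chat-hf",
                     device_map="auto")

def suicidality_status(note: str) -> str:
    prompt = (f"Admission note:\n{note}\n\n"
              "Does this note indicate current suicidality? Answer yes or no.")
    out = generator(prompt, max_new_tokens=5, do_sample=False)
    # generated_text contains the prompt plus the continuation
    answer = out[0]["generated_text"][len(prompt):].strip().lower()
    return "suicidal" if answer.startswith("yes") else "not suicidal"
```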
ABSTRACT
The use of techniques derived from generative artificial intelligence (AI), specifically large language models (LLMs), offers transformative potential for the management of multiple sclerosis (MS). Recent LLMs have exhibited remarkable skills in producing and understanding human-like texts. The integration of AI in imaging applications and the deployment of foundation models for the classification and prognosis of disease course, including disability progression and even therapy response, have received considerable attention. However, the use of LLMs within the context of MS remains relatively underexplored. LLMs have the potential to support several activities related to MS management. Clinical decision support systems could help select appropriate disease-modifying therapies; AI-based tools could leverage unstructured real-world data for research; and virtual tutors may provide adaptive education materials for neurologists and people with MS in the foreseeable future. In this focused review, we explore practical applications of LLMs across the continuum of MS management as an initial scope for future analyses, reflecting on regulatory hurdles and the indispensable role of human supervision.
Subjects
Artificial Intelligence, Multiple Sclerosis, Humans, Multiple Sclerosis/therapy, Clinical Decision Support Systems, Disease Management
ABSTRACT
OBJECTIVES: Structured reporting enhances comparability, readability, and content detail. Large language models (LLMs) could convert free text into structured data without disrupting radiologists' reporting workflow. This study evaluated an on-premise, privacy-preserving LLM for automatically structuring free-text radiology reports. MATERIALS AND METHODS: We developed an approach to controlling the LLM output, ensuring the validity and completeness of structured reports produced by a locally hosted Llama-2-70B-chat model. A dataset with de-identified narrative chest radiograph (CXR) reports was compiled retrospectively. It included 202 English reports from the publicly available MIMIC-CXR dataset and 197 German reports from our university hospital. A senior radiologist prepared a detailed, fully structured reporting template with 48 question-answer pairs. All reports were independently structured by the LLM and two human readers. Bayesian inference (Markov chain Monte Carlo sampling) was used to estimate the distributions of the Matthews correlation coefficient (MCC), with [-0.05, 0.05] as the region of practical equivalence (ROPE). RESULTS: The LLM generated valid structured reports in all cases, achieving an average MCC of 0.75 (94% HDI: 0.70-0.80) and F1 score of 0.70 (0.70-0.80) for English, and 0.66 (0.62-0.70) and 0.68 (0.64-0.72) for German reports, respectively. The MCC differences between the LLM and humans were within the ROPE for both languages: 0.01 (-0.05 to 0.07), 0.01 (-0.05 to 0.07) for English, and -0.01 (-0.07 to 0.05), 0.00 (-0.06 to 0.06) for German, indicating approximately comparable performance. CONCLUSION: Locally hosted, open-source LLMs can automatically structure free-text radiology reports with approximately human accuracy. However, the understanding of semantics varied across languages and imaging findings. KEY POINTS: Question Why has structured reporting not been widely adopted in radiology despite clear benefits, and how can we improve this? Findings A locally hosted large language model successfully structured narrative reports, showing variation between languages and findings. Critical relevance Structured reporting provides many benefits, but its integration into the clinical routine is limited. Automating the extraction of structured information from radiology reports enables the capture of structured data while allowing the radiologist to maintain their reporting workflow.
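The agreement analysis can be illustrated in simplified form as below: MCC against the reference answers, and an interval for the LLM-human difference checked against the ROPE. The study used Bayesian MCMC estimation; the bootstrap here is a frequentist stand-in for illustration.

```python
# Sketch of MCC-based agreement analysis with a ROPE check.
import numpy as np
from sklearn.metrics import matthews_corrcoef

def mcc_diff_vs_rope(y_true, y_llm, y_human, n_boot=2000, rope=(-0.05, 0.05)):
    y_true, y_llm, y_human = map(np.asarray, (y_true, y_llm, y_human))
    rng = np.random.default_rng(0)
    diffs = []
    for _ in range(n_boot):
        i = rng.integers(0, len(y_true), len(y_true))  # resample answer pairs
        diffs.append(matthews_corrcoef(y_true[i], y_llm[i]) -
                     matthews_corrcoef(y_true[i], y_human[i]))
    lo, hi = np.percentile(diffs, [3, 97])  # central 94% interval
    # True -> difference lies entirely within the ROPE
    return (lo, hi), rope[0] <= lo and hi <= rope[1]
```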
ABSTRACT
Structured reporting (SR) has long been a goal in radiology to standardize and improve the quality of radiology reports. Despite evidence that SR reduces errors, enhances comprehensiveness, and increases adherence to guidelines, its widespread adoption has been limited. Recently, large language models (LLMs) have emerged as a promising solution to automate and facilitate SR. Therefore, this narrative review aims to provide an overview of LLMs for SR in radiology and beyond. We found that the current literature on LLMs for SR is limited, comprising ten studies on the generative pre-trained transformer (GPT)-3.5 (n = 5) and/or GPT-4 (n = 8), while two studies additionally examined the performance of Perplexity and Bing Chat or IT5. All studies reported promising results and acknowledged the potential of LLMs for SR, with six out of ten studies demonstrating the feasibility of multilingual applications. Building upon these findings, we discuss limitations, regulatory challenges, and further applications of LLMs in radiology report processing, encompassing four main areas: documentation, translation and summarization, clinical evaluation, and data mining. In conclusion, this review underscores the transformative potential of LLMs to improve efficiency and accuracy in SR and radiology report processing. KEY POINTS: Question How can LLMs help make SR in radiology more ubiquitous? Findings Current literature leveraging LLMs for SR is sparse but shows promising results, including the feasibility of multilingual applications. Clinical relevance LLMs have the potential to transform radiology report processing and enable the widespread adoption of SR. However, their future role in clinical practice depends on overcoming current limitations and regulatory challenges, including opaque algorithms and training data.
ABSTRACT
BACKGROUND: Artificial intelligence (AI) is increasingly entering and transforming not only medical research but also clinical practice. In the last 10 years, new AI methods have enabled computers to perform visual tasks, reaching high performance and thereby potentially supporting and even outperforming human experts. This is particularly relevant for colorectal cancer (CRC), the third most common cancer type, as many complex visual tasks need to be performed along the CRC patient journey: from endoscopy through imaging to histopathology, the screening, diagnosis, and treatment of CRC involve visual image analysis tasks. SUMMARY: In all these clinical areas, AI models have shown promising results by supporting physicians, improving accuracy, and providing new biological insights and biomarkers. By predicting prognostic and predictive biomarkers from routine images/slides, AI models could lead to an improved patient stratification for precision oncology approaches in the near future. Moreover, it is conceivable that AI models, in particular together with innovative techniques such as single-cell or spatial profiling, could help identify novel clinically as well as biologically meaningful biomarkers that could pave the way to new therapeutic approaches. KEY MESSAGES: Here, we give a comprehensive overview of AI in colorectal cancer, describing and discussing these developments as well as the next steps that need to be taken to incorporate AI methods more broadly into the clinical care of CRC.
Subjects
Artificial Intelligence, Tumor Biomarkers, Colorectal Neoplasms, Early Detection of Cancer, Humans, Colorectal Neoplasms/diagnosis, Tumor Biomarkers/analysis, Early Detection of Cancer/methods, Precision Medicine/methods, Prognosis, Clinical Decision-Making/methods
ABSTRACT
INTRODUCTION: The research field of artificial intelligence (AI) in medicine and especially in gastroenterology is rapidly progressing, with the first AI tools entering routine clinical practice, for example, in colorectal cancer screening. Contrast-enhanced ultrasound (CEUS) is a highly reliable, low-risk, and low-cost diagnostic modality for the examination of the liver. However, doctors need many years of training and experience to master this technique and, despite all efforts to standardize CEUS, it is often believed to be subject to significant interrater variability. As has been shown for endoscopy, AI holds promise to support examiners at all training levels in their decision-making and efficiency. METHODS: In this systematic review, we analyzed and compared original research studies applying AI methods to CEUS examinations of the liver published between January 2010 and February 2024. We performed a structured literature search on PubMed, Web of Science, and IEEE. Two independent reviewers screened the articles and subsequently extracted relevant methodological features, e.g., cohort size, validation process, machine learning algorithm used, and indicative performance measures from the included articles. RESULTS: We included 41 studies, most applying AI methods to classification tasks related to focal liver lesions. These included distinguishing benign from malignant lesions or classifying the lesion entity itself, while a few studies tried to classify tumor grading, microvascular invasion status, or response to transcatheter arterial chemoembolization directly from CEUS. Some articles tried to segment or detect focal liver lesions, while others aimed to predict survival and recurrence after ablation. The majority (25/41) of studies used hand-picked and/or annotated images as data input to their models. We observed mostly good to high reported model performances, with accuracies ranging between 58.6% and 98.9%, while noticing a general lack of external validation. CONCLUSION: Even though multiple proof-of-concept studies for the application of AI methods to CEUS examinations of the liver exist and report high performance, more prospective, externally validated, and multicenter research is needed to bring such algorithms from desk to bedside.
ABSTRACT
Background Clinicians consider both imaging and nonimaging data when diagnosing diseases; however, current machine learning approaches primarily consider data from a single modality. Purpose To develop a neural network architecture capable of integrating multimodal patient data and compare its performance to models incorporating a single modality for diagnosing up to 25 pathologic conditions. Materials and Methods In this retrospective study, imaging and nonimaging patient data were extracted from the Medical Information Mart for Intensive Care (MIMIC) database and an internal database comprising chest radiographs and clinical parameters of inpatients in the intensive care unit (ICU) (January 2008 to December 2020). The MIMIC and internal data sets were each split into training (n = 33 893, n = 28 809), validation (n = 740, n = 7203), and test (n = 1909, n = 9004) sets. A novel transformer-based neural network architecture was trained to diagnose up to 25 conditions using nonimaging data alone, imaging data alone, or multimodal data. Diagnostic performance was assessed using area under the receiver operating characteristic curve (AUC) analysis. Results The MIMIC and internal data sets included 36 542 patients (mean age, 63 years ± 17 [SD]; 20 567 male patients) and 45 016 patients (mean age, 66 years ± 16; 27 577 male patients), respectively. The multimodal model showed improved diagnostic performance for all pathologic conditions. For the MIMIC data set, the mean AUC was 0.77 (95% CI: 0.77, 0.78) when both chest radiographs and clinical parameters were used, compared with 0.70 (95% CI: 0.69, 0.71; P < .001) for only chest radiographs and 0.72 (95% CI: 0.72, 0.73; P < .001) for only clinical parameters. These findings were confirmed on the internal data set. Conclusion A model trained on imaging and nonimaging data outperformed models trained on only one type of data for diagnosing multiple diseases in patients in an ICU setting. © RSNA, 2023 Supplemental material is available for this article. See also the editorial by Kitamura and Topol in this issue.
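The fusion idea can be sketched as a transformer over one image token and one token per clinical parameter, as below; all dimensions, the token design, and mean pooling are illustrative assumptions rather than the published architecture.

```python
# Sketch of multimodal fusion: image features and clinical parameters
# are projected into a shared token space and processed jointly by a
# transformer encoder, with one output logit per diagnosis.
import torch
import torch.nn as nn

class MultimodalDiagnoser(nn.Module):
    def __init__(self, img_dim=768, n_clinical=30, d_model=256, n_labels=25):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, d_model)  # image feature token
        self.clin_proj = nn.Linear(1, d_model)       # one token per parameter
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, n_labels)     # 25 condition logits

    def forward(self, img_feat, clinical):
        # img_feat: (B, img_dim); clinical: (B, n_clinical) lab values/vitals
        tokens = torch.cat([self.img_proj(img_feat).unsqueeze(1),
                            self.clin_proj(clinical.unsqueeze(-1))], dim=1)
        fused = self.encoder(tokens).mean(dim=1)     # pool over tokens
        return self.head(fused)
```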
Subjects
Deep Learning, Humans, Male, Middle Aged, Aged, Retrospective Studies, Radiography, Factual Databases, Inpatients
ABSTRACT
Background Reducing the amount of contrast agent needed for contrast-enhanced breast MRI is desirable. Purpose To investigate if generative adversarial networks (GANs) can recover contrast-enhanced breast MRI scans from unenhanced images and virtual low-contrast-enhanced images. Materials and Methods In this retrospective study of breast MRI performed from January 2010 to December 2019, simulated low-contrast images were produced by adding virtual noise to the existing contrast-enhanced images. GANs were then trained to recover the contrast-enhanced images from the simulated low-contrast images (approach A) or from the unenhanced T1- and T2-weighted images (approach B). Two experienced radiologists were tasked with distinguishing between real and synthesized contrast-enhanced images using both approaches. Image appearance and conspicuity of enhancing lesions on the real versus synthesized contrast-enhanced images were independently compared and rated on a five-point Likert scale. P values were calculated by using bootstrapping. Results A total of 9751 breast MRI examinations from 5086 patients (mean age, 56 years ± 10 [SD]) were included. Readers who were blinded to the nature of the images could not distinguish real from synthetic contrast-enhanced images (average accuracy of differentiation: approach A, 52 of 100; approach B, 61 of 100). The test set included images with and without enhancing lesions (29 enhancing masses and 21 nonmass enhancement; 50 total). When readers who were not blinded compared the appearance of the real versus synthetic contrast-enhanced images side by side, approach A image ratings were significantly higher than those of approach B (mean rating, 4.6 ± 0.1 vs 3.0 ± 0.2; P < .001), with the noninferiority margin met by synthetic images from approach A (P < .001) but not B (P > .99). Conclusion Generative adversarial networks may be useful to enable breast MRI with reduced contrast agent dose. © RSNA, 2023 Supplemental material is available for this article. See also the editorial by Bahl in this issue.
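The simulation of virtual low-contrast images (the input to approach A) can be sketched as attenuating the enhancement signal and adding noise; the attenuation factor and noise level are assumptions, and the paper's simulation may differ.

```python
# Sketch of simulating a virtual low-contrast-dose image from an
# unenhanced (pre) and fully enhanced (post) acquisition.
import numpy as np

def simulate_low_contrast(pre: np.ndarray, post: np.ndarray,
                          dose_factor: float = 0.25, noise_sd: float = 0.02,
                          seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    enhancement = post - pre                   # contrast-induced signal
    virtual = pre + dose_factor * enhancement  # attenuated "low-dose" uptake
    return virtual + rng.normal(0.0, noise_sd, size=pre.shape)
```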
Subjects
Contrast Media, Magnetic Resonance Imaging, Humans, Middle Aged, Retrospective Studies, Magnetic Resonance Imaging/methods, Breast, Machine Learning
ABSTRACT
Background Deep learning (DL) models can potentially improve prognostication of rectal cancer but have not been systematically assessed. Purpose To develop and validate an MRI DL model for predicting survival in patients with rectal cancer based on segmented tumor volumes from pretreatment T2-weighted MRI scans. Materials and Methods DL models were trained and validated on retrospectively collected MRI scans of patients with rectal cancer diagnosed between August 2003 and April 2021 at two centers. Patients were excluded from the study if they had concurrent malignant neoplasms, prior anticancer treatment, an incomplete course of neoadjuvant therapy, or no radical surgery performed. The Harrell C-index was used to determine the best model, which was applied to the internal and external test sets. Patients were stratified into high- and low-risk groups based on a fixed cutoff calculated in the training set. A multimodal model was also assessed, which used the DL model-computed risk score and pretreatment carcinoembryonic antigen level as input. Results The training set included 507 patients (median age, 56 years [IQR, 46-64 years]; 355 men). In the validation set (n = 218; median age, 55 years [IQR, 47-63 years]; 144 men), the best algorithm reached a C-index of 0.82 for overall survival. The best model reached hazard ratios of 3.0 (95% CI: 1.0, 9.0) in the high-risk group in the internal test set (n = 112; median age, 60 years [IQR, 52-70 years]; 76 men) and 2.3 (95% CI: 1.0, 5.4) in the external test set (n = 58; median age, 57 years [IQR, 50-67 years]; 38 men). The multimodal model further improved the performance, with C-indexes of 0.86 and 0.67 for the validation and external test sets, respectively. Conclusion A DL model based on preoperative MRI was able to predict survival of patients with rectal cancer. The model could be used as a preoperative risk stratification tool. Published under a CC BY 4.0 license. Supplemental material is available for this article. See also the editorial by Langs in this issue.
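The survival evaluation can be sketched as follows: Harrell's C-index for the DL risk score (via the lifelines package) and stratification by a fixed training-set cutoff. The cutoff is an assumed example; the negation of the risk score reflects that concordance_index expects scores concordant with longer survival.

```python
# Sketch of C-index evaluation and fixed-cutoff risk stratification.
import numpy as np
from lifelines.utils import concordance_index  # pip install lifelines

def evaluate(risk_scores, survival_months, event_observed, cutoff):
    # Higher risk should mean shorter survival, so the score is negated:
    # concordance_index measures agreement between scores and survival times.
    c_index = concordance_index(survival_months, -np.asarray(risk_scores),
                                event_observed)
    high_risk = np.asarray(risk_scores) >= cutoff  # fixed training-set cutoff
    return c_index, high_risk
```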