Search | BVS Regional Portal

1.

Automated Analysis of Split Kidney Function from CT Scans Using Deep Learning and Delta Radiomics.

Correa-Medero, Ramon Luis; Jeong, Jiwoong; Patel, Bhavik; Banerjee, Imon; Abdul-Muhsin, Haidar.

J Endourol ; 2024 May 16.

Article in English | MEDLINE | ID: mdl-38695176

ABSTRACT

Background: Differential kidney function assessment is an important part of the preoperative evaluation for various urological interventions. It is obtained through dedicated nuclear medicine imaging and is not yet available through conventional imaging. Objective: We assess whether differential kidney function can be obtained from contrast-enhanced computed tomography (CT) using a combination of deep learning and 2D and 3D radiomic features. Methods: All patients who underwent kidney nuclear scanning at Mayo Clinic sites between 2018 and 2022 were collected. CT scans of the kidneys obtained within 3 months before or after the nuclear scans were extracted. Patients who underwent a urological or radiological intervention within this time frame were excluded. A segmentation model was used to segment both kidneys. 2D and 3D radiomics features were extracted and compared between the two kidneys to compute delta radiomics and assess their ability to predict differential kidney function. Performance was reported using receiver operating characteristic analysis, sensitivity, and specificity. Results: Studies from Arizona and Rochester formed our internal dataset (n = 1,159). Studies from Florida were separately processed as an external test set to validate generalizability. We obtained 323 studies from our internal sites and 39 studies from external sites. The best results were obtained by a random forest model trained on 3D delta radiomics features. This model achieved an area under the curve (AUC) of 0.85 and 0.81 on the internal and external test sets, respectively; specificity and sensitivity were 0.84 and 0.68 on the internal set and 0.70 and 0.65 on the external set. Conclusion: The proposed automated pipeline can derive important differential kidney function information from contrast-enhanced CT and reduce the need for dedicated nuclear scans for early-stage differential kidney function assessment.
Clinical Impact: We establish a machine learning methodology for assessing differential kidney function from routine CT without the need for expensive and radioactive nuclear medicine scans.
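The delta radiomics step in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the two kidneys are compared, so this assumes each delta feature is simply the left-kidney value minus the right-kidney value. The feature names and the helpers `delta_radiomics` and `sensitivity_specificity` are hypothetical, and the random forest model itself is not reproduced.

```python
def delta_radiomics(left, right):
    """Per-feature difference (left minus right) over the feature names both kidneys share.

    Assumption: a plain difference; the study may use another comparison.
    """
    shared = sorted(set(left) & set(right))
    return {name: left[name] - right[name] for name in shared}


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Return None when a rate is undefined (no positives or no negatives).
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Hypothetical feature values, for illustration only.
left_kidney = {"volume_ml": 150.0, "mean_hu": 120.0}
right_kidney = {"volume_ml": 100.0, "mean_hu": 110.0}
deltas = delta_radiomics(left_kidney, right_kidney)
```

In a pipeline of this kind, a dictionary such as `deltas` would form one row of the feature matrix passed to a classifier.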

2.

Use of deep learning to evaluate tumor microenvironmental features for prediction of colon cancer recurrence.

Sinicrope, Frank A; Nelson, Garth D; Saberzadeh-Ardestani, Bahar; Segovia, Diana I; Graham, Rondell P; Wu, Christina; Hagen, Catherine E; Shivji, Sameer; Savage, Paul; Buchanan, Dan D; Jenkins, Mark A; Phipps, Amanda I; Swallow, Carol; LeMarchand, Loic; Gallinger, Steven; Grant, Robert C; Pai, Reetesh K; Sinicrope, Stephen N; Yan, Dongyao; Shanmugam, Kandavel; Conner, James; Cyr, David P; Kirsch, Richard; Banerjee, Imon; Alberts, Steve R; Shi, Qian; Pai, Rish K.

Cancer Res Commun ; 2024 May 06.

Article in English | MEDLINE | ID: mdl-38709069

ABSTRACT

Deep learning may detect biologically important signals embedded in tumor morphologic features that confer distinct prognoses. Tumor morphological features were quantified to enhance patient risk stratification within DNA mismatch repair (MMR) groups using deep learning. Using a quantitative segmentation algorithm (QuantCRC) that identifies 15 distinct morphological features, we analyzed 402 resected stage III colon carcinomas (191 d-MMR; 189 p-MMR) from participants in a phase III trial of FOLFOX-based adjuvant chemotherapy. Results were validated in an independent cohort (176 d-MMR; 1094 p-MMR). Association of morphological features with clinicopathologic variables, MMR, KRAS, BRAFV600E, and time-to-recurrence (TTR) was determined. Multivariable Cox proportional hazard models were developed to predict TTR. Tumor morphological features differed significantly by MMR status. Cancers with p-MMR had more immature desmoplastic stroma. Tumors with d-MMR had increased inflammatory stroma, epithelial tumor-infiltrating lymphocytes (TILs), high-grade histology, mucin, and signet ring cells. Stromal subtype did not differ by BRAFV600E or KRAS status. In p-MMR tumors, multivariable analysis identified tumor-stroma ratio (TSR) as the strongest feature associated with TTR [HRadj 2.02; 95% CI, 1.14-3.57; P = 0.018; 3-year recurrence: 40.2% vs 20.4%; Q1 vs Q2-4]. Among d-MMR tumors, extent of inflammatory stroma [continuous HRadj 0.98; 95% CI, 0.96-0.99; P = 0.028; 3-year recurrence: 13.3% vs 33.4%, Q4 vs Q1] and N stage were the most robust prognostically. Association of TSR with TTR was independently validated. In conclusion, QuantCRC can quantify morphological differences within MMR groups in routine tumor sections to determine their relative contributions to patient prognosis, and may elucidate relevant pathophysiologic mechanisms driving prognosis.

3.

Editorial Commentary: Generative Pre-trained Transformer 4 (GPT4) makes cardiovascular magnetic resonance reports easy to understand.

Banerjee, Imon; Tariq, Amara; Chao, Chieh-Ju.

J Cardiovasc Magn Reson ; 26(1): 101043, 2024 Apr 06.

Article in English | MEDLINE | ID: mdl-38588948

4.

Exposing Vulnerabilities in Clinical LLMs Through Data Poisoning Attacks: Case Study in Breast Cancer.

Das, Avisha; Tariq, Amara; Batalini, Felipe; Dhara, Boddhisattwa; Banerjee, Imon.

medRxiv ; 2024 Mar 21.

Article in English | MEDLINE | ID: mdl-38562849

ABSTRACT

Training Large Language Models (LLMs) with billions of parameters on a dataset and publishing the model for public access is currently standard practice. Despite their transformative impact on natural language processing, public LLMs present notable vulnerabilities because their training data is often web-based or crowdsourced and hence can be manipulated by perpetrators. We examine the vulnerability of clinical LLMs to data poisoning attacks, focusing on BioGPT, which is trained on publicly available biomedical literature and clinical notes from MIMIC-III. Using de-identified breast cancer clinical notes, our approach is the first to assess the extent of such attacks, and our findings reveal successful manipulation of LLM outputs. Through this work, we emphasize the urgency of understanding these vulnerabilities in LLMs and encourage mindful and responsible use of LLMs in the clinical domain.

5.

A large language model-based generative natural language processing framework fine-tuned on clinical notes accurately extracts headache frequency from electronic health records.

Chiang, Chia-Chun; Luo, Man; Dumkrieger, Gina; Trivedi, Shubham; Chen, Yi-Chieh; Chao, Chieh-Ju; Schwedt, Todd J; Sarker, Abeed; Banerjee, Imon.

Headache ; 64(4): 400-409, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38525734

ABSTRACT

OBJECTIVE: To develop a natural language processing (NLP) algorithm that can accurately extract headache frequency from free-text clinical notes. BACKGROUND: Headache frequency, defined as the number of days with any headache in a month (or 4 weeks), remains a key parameter in the evaluation of treatment response to migraine preventive medications. However, due to the variations and inconsistencies in documentation by clinicians, significant challenges exist to accurately extract headache frequency from the electronic health record (EHR) by traditional NLP algorithms. METHODS: This was a retrospective cross-sectional study with patients identified from two tertiary headache referral centers, Mayo Clinic Arizona and Mayo Clinic Rochester. All neurology consultation notes written by 15 specialized clinicians (11 headache specialists and 4 nurse practitioners) between 2012 and 2022 were extracted and 1915 notes were used for model fine-tuning (90%) and testing (10%). We employed four different NLP frameworks: (1) ClinicalBERT (Bidirectional Encoder Representations from Transformers) regression model, (2) Generative Pre-Trained Transformer-2 (GPT-2) Question Answering (QA) model zero-shot, (3) GPT-2 QA model few-shot training fine-tuned on clinical notes, and (4) GPT-2 generative model few-shot training fine-tuned on clinical notes to generate the answer by considering the context of included text. RESULTS: The mean (standard deviation) headache frequency of our training and testing datasets were 13.4 (10.9) and 14.4 (11.2), respectively. The GPT-2 generative model was the best-performing model with an accuracy of 0.92 (0.91, 0.93, 95% confidence interval [CI]) and R2 score of 0.89 (0.87, 0.90, 95% CI), and all GPT-2-based models outperformed the ClinicalBERT model in terms of exact matching accuracy. 
Although the ClinicalBERT regression model had the lowest accuracy, 0.27 (0.26, 0.28), it demonstrated a high R2 score of 0.88 (0.85, 0.89), suggesting that the ClinicalBERT model can reasonably predict headache frequency within a range of ≤ ± 3 days; its R2 score was higher than that of the GPT-2 QA zero-shot model or the GPT-2 QA few-shot fine-tuned model. CONCLUSION: We developed a robust information extraction model based on a state-of-the-art large language model, a GPT-2 generative model that can extract headache frequency from EHR free-text clinical notes with high accuracy and R2 score. It overcame several challenges related to the different ways clinicians document headache frequency that were not easily handled by traditional NLP models. We also showed that GPT-2-based frameworks outperformed ClinicalBERT in terms of accuracy in extracting headache frequency from clinical notes. To facilitate research in the field, we released the GPT-2 generative model and inference code under an open-source license for community use on GitHub. Additional fine-tuning of the algorithm might be required when applied to different health-care systems for various clinical use cases.

Subjects

Electronic Health Records , Natural Language Processing , Humans , Retrospective Studies , Cross-Sectional Studies , Male , Female , Headache , Adult , Middle Aged , Algorithms
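The headache-frequency abstract above reports a model with low exact-match accuracy but a high R2 score. A small sketch with made-up numbers shows how the two metrics can diverge: predictions that are each off by one day never match exactly, yet still track the true values closely. The helper names and all values are hypothetical and are not taken from the study.

```python
def exact_match_accuracy(y_true, y_pred):
    """Fraction of predictions that equal the reference value exactly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all reference values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical headache days per month; every prediction is off by one day.
true_days = [4, 8, 15, 30]
pred_days = [5, 7, 16, 29]

accuracy = exact_match_accuracy(true_days, pred_days)  # no exact matches
r2 = r2_score(true_days, pred_days)                    # still close to 1
```

This is why a regression-style model can score poorly on exact matching while remaining useful when an error of a few days is tolerable.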

6.

Accurate, Robust, and Scalable Machine Abstraction of Mayo Endoscopic Subscores From Colonoscopy Reports.

Silverman, Anna L; Bhasuran, Balu; Mosenia, Arman; Yasini, Fatema; Ramasamy, Gokul; Banerjee, Imon; Gupta, Saransh; Mardirossian, Taline; Narain, Rohan; Sewell, Justin; Butte, Atul J; Rudrapatna, Vivek A.

Inflamm Bowel Dis ; 2024 Mar 26.

Article in English | MEDLINE | ID: mdl-38533919

ABSTRACT

BACKGROUND: The Mayo endoscopic subscore (MES) is an important quantitative measure of disease activity in ulcerative colitis. Colonoscopy reports in routine clinical care usually characterize ulcerative colitis disease activity using free text description, limiting their utility for clinical research and quality improvement. We sought to develop algorithms to classify colonoscopy reports according to their MES. METHODS: We annotated 500 colonoscopy reports from 2 health systems. We trained and evaluated 4 classes of algorithms. Our primary outcome was accuracy in identifying scorable reports (binary) and assigning an MES (ordinal). Secondary outcomes included learning efficiency, generalizability, and fairness. RESULTS: Automated machine learning models achieved 98% and 97% accuracy on the binary and ordinal prediction tasks, outperforming other models. Binary models trained on the University of California, San Francisco data alone maintained accuracy (96%) on validation data from Zuckerberg San Francisco General. When using 80% of the training data, models remained accurate for the binary task (97% [n = 320]) but lost accuracy on the ordinal task (67% [n = 194]). We found no evidence of bias by gender (P = .65) or area deprivation index (P = .80). CONCLUSIONS: We derived a highly accurate pair of models capable of classifying reports by their MES and recognizing when to abstain from prediction. Our models were generalizable on outside institution validation. There was no evidence of algorithmic bias. Our methods have the potential to enable retrospective studies of treatment effectiveness, prospective identification of patients meeting study criteria, and quality improvement efforts in inflammatory bowel diseases.

Our accurate pair of models automatically classify colonoscopy reports by Mayo endoscopic subscore and abstain from prediction appropriately. Our methods can enable large-scale electronic health record studies of treatment effectiveness, prospective identification of patients for clinical trials, and quality improvement efforts in ulcerative colitis.

7.

Improved Risk-Stratification Scheme for Mismatch-Repair Proficient Stage II Colorectal Cancers Using the Digital Pathology Biomarker QuantCRC.

Wu, Christina; Pai, Reetesh K; Kosiorek, Heidi; Banerjee, Imon; Pfeiffer, Ashlyn; Hagen, Catherine E; Hartley, Christopher P; Graham, Rondell P; Sonbol, Mohamad B; Bekaii-Saab, Tanios; Xie, Hao; Sinicrope, Frank A; Patel, Bhavik; Westerling-Bui, Thomas; Shivji, Sameer; Conner, James; Swallow, Carol; Savage, Paul; Cyr, David P; Kirsch, Richard; Pai, Rish K.

Clin Cancer Res ; 30(9): 1811-1821, 2024 May 01.

Article in English | MEDLINE | ID: mdl-38421684

ABSTRACT

PURPOSE: There is a need to improve current risk stratification of stage II colorectal cancer to better inform risk of recurrence and guide adjuvant chemotherapy. We sought to examine whether integration of QuantCRC, a digital pathology biomarker utilizing hematoxylin and eosin-stained slides, provides improved risk stratification over current American Society of Clinical Oncology (ASCO) guidelines. EXPERIMENTAL DESIGN: ASCO and QuantCRC-integrated schemes were applied to a cohort of 398 mismatch-repair proficient (MMRP) stage II colorectal cancers from three large academic medical centers. The ASCO stage II scheme was taken from recent guidelines. The QuantCRC-integrated scheme utilized pT3 versus pT4 and a QuantCRC-derived risk classification. Evaluation of recurrence-free survival (RFS) according to these risk schemes was compared using the log-rank test and HR. RESULTS: Integration of QuantCRC provides improved risk stratification compared with the ASCO scheme for stage II MMRP colorectal cancers. The QuantCRC-integrated scheme placed more stage II tumors in the low-risk group compared with the ASCO scheme (62.5% vs. 42.2%) without compromising excellent 3-year RFS. The QuantCRC-integrated scheme provided larger HR for both intermediate-risk (2.27; 95% CI, 1.32-3.91; P = 0.003) and high-risk (3.27; 95% CI, 1.42-7.55; P = 0.006) groups compared with ASCO intermediate-risk (1.58; 95% CI, 0.87-2.87; P = 0.1) and high-risk (2.24; 95% CI, 1.09-4.62; P = 0.03) groups. The QuantCRC-integrated risk groups remained prognostic in the subgroup of patients that did not receive any adjuvant chemotherapy. CONCLUSIONS: Incorporation of QuantCRC into risk stratification provides a powerful predictor of RFS that has potential to guide subsequent treatment and surveillance for stage II MMRP colorectal cancers.

Subjects

Biomarkers, Tumor , Colorectal Neoplasms , DNA Mismatch Repair , Neoplasm Staging , Humans , Colorectal Neoplasms/pathology , Colorectal Neoplasms/diagnosis , Female , Male , Middle Aged , Risk Assessment/methods , Aged , Prognosis , Neoplasm Recurrence, Local/pathology , Adult

8.

AI Education for Fourth-Year Medical Students: Two-Year Experience of a Web-Based, Self-Guided Curriculum and Mixed Methods Study.

Abid, Areeba; Murugan, Avinash; Banerjee, Imon; Purkayastha, Saptarshi; Trivedi, Hari; Gichoya, Judy.

JMIR Med Educ ; 10: e46500, 2024 Feb 20.

Article in English | MEDLINE | ID: mdl-38376896

ABSTRACT

BACKGROUND: Artificial intelligence (AI) and machine learning (ML) are poised to have a substantial impact in the health care space. While a plethora of web-based resources exist to teach programming skills and ML model development, there are few introductory curricula specifically tailored to medical students without a background in data science or programming. Programs that do exist are often restricted to a specific specialty. OBJECTIVE: We hypothesized that a 1-month elective for fourth-year medical students, composed of high-quality existing web-based resources and a project-based structure, would empower students to learn about the impact of AI and ML in their chosen specialty and begin contributing to innovation in their field of interest. This study aims to evaluate the success of this elective in improving self-reported confidence scores in AI and ML. The authors also share our curriculum with other educators who may be interested in its adoption. METHODS: This elective was offered in 2 tracks: technical (for students who were already competent programmers) and nontechnical (with no technical prerequisites, focusing on building a conceptual understanding of AI and ML). Students established a conceptual foundation of knowledge using curated web-based resources and relevant research papers, and were then tasked with completing 3 projects in their chosen specialty: a data set analysis, a literature review, and an AI project proposal. The project-based nature of the elective was designed to be self-guided and flexible to each student's interest area and career goals. Students' success was measured by self-reported confidence in AI and ML skills in pre and postsurveys. Qualitative feedback on students' experiences was also collected. RESULTS: This web-based, self-directed elective was offered on a pass-or-fail basis each month to fourth-year students at Emory University School of Medicine beginning in May 2021. 
As of June 2022, a total of 19 students had successfully completed the elective, representing a wide range of chosen specialties: diagnostic radiology (n=3), general surgery (n=1), internal medicine (n=5), neurology (n=2), obstetrics and gynecology (n=1), ophthalmology (n=1), orthopedic surgery (n=1), otolaryngology (n=2), pathology (n=2), and pediatrics (n=1). Students' self-reported confidence scores for AI and ML rose by 66% after this 1-month elective. In qualitative surveys, students overwhelmingly reported enthusiasm and satisfaction with the course and commented that the self-direction and flexibility and the project-based design of the course were essential. CONCLUSIONS: Course participants were successful in diving deep into applications of AI in their widely-ranging specialties, produced substantial project deliverables, and generally reported satisfaction with their elective experience. The authors are hopeful that a brief, 1-month investment in AI and ML education during medical school will empower this next generation of physicians to pave the way for AI and ML innovation in health care.

Subjects

Artificial Intelligence , Education, Medical , Humans , Curriculum , Internet , Students, Medical

9.

Integrating large language models in systematic reviews: a framework and case study using ROBINS-I for risk of bias assessment.

Hasan, Bashar; Saadi, Samer; Rajjoub, Noora S; Hegazi, Moustafa; Al-Kordi, Mohammad; Fleti, Farah; Farah, Magdoleen; Riaz, Irbaz B; Banerjee, Imon; Wang, Zhen; Murad, Mohammad Hassan.

BMJ Evid Based Med ; 2024 Feb 21.

Article in English | MEDLINE | ID: mdl-38383136

ABSTRACT

Large language models (LLMs) may facilitate and expedite systematic reviews, although the approach to integrate LLMs in the review process is unclear. This study evaluates GPT-4 agreement with human reviewers in assessing the risk of bias using the Risk Of Bias In Non-randomised Studies of Interventions (ROBINS-I) tool and proposes a framework for integrating LLMs into systematic reviews. The case study demonstrated that raw per cent agreement was the highest for the ROBINS-I domain of 'Classification of Intervention'. Kendall agreement coefficient was highest for the domains of 'Participant Selection', 'Missing Data' and 'Measurement of Outcomes', suggesting moderate agreement in these domains. Raw agreement about the overall risk of bias across domains was 61% (Kendall coefficient=0.35). The proposed framework for integrating LLMs into systematic reviews consists of four domains: rationale for LLM use, protocol (task definition, model selection, prompt engineering, data entry methods, human role and success metrics), execution (iterative revisions to the protocol) and reporting. We identify five basic task types relevant to systematic reviews: selection, extraction, judgement, analysis and narration. Considering the agreement level with a human reviewer in the case study, pairing artificial intelligence with an independent human reviewer remains required.
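The raw per cent agreement reported in the abstract above can be computed as in the following minimal sketch. It covers only raw agreement; the Kendall coefficient is not reproduced because the abstract does not specify which variant was used. The function name and the example judgements are hypothetical.

```python
def raw_percent_agreement(rater_a, rater_b):
    """Percentage of items on which two raters give the same judgement."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must judge the same number of items")
    if not rater_a:
        raise ValueError("at least one item is required")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100.0 * matches / len(rater_a)


# Hypothetical overall risk-of-bias judgements for four studies.
human = ["low", "moderate", "serious", "moderate"]
model = ["low", "moderate", "serious", "critical"]
agreement = raw_percent_agreement(human, model)  # three of four items match
```

Raw agreement ignores both chance agreement and the ordering of the categories, which is one reason a rank-based coefficient is usually reported alongside it.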

10.

Opportunistic screening for coronary artery calcium deposition using chest radiographs - a multi-objective models with multi-modal data fusion.

Jeong, Jiwoong; Chao, Chieh-Ju; Arsanjani, Reza; Ayoub, Chadi; Lester, Steven J; Pereyra, Milagros; Said, Ebram F; Roarke, Michael; Tagle-Cornell, Cecilia; Koepke, Laura M; Tsai, Yi-Lin; Jung-Hsuan, Chen; Chang, Chun-Chin; Farina, Juan M; Trivedi, Hari; Patel, Bhavik N; Banerjee, Imon.

medRxiv ; 2024 Jan 11.

Article in English | MEDLINE | ID: mdl-38260571

ABSTRACT

Background: To create an opportunistic screening strategy using multitask deep learning methods to stratify prediction of coronary artery calcium (CAC) and associated cardiovascular risk from frontal chest x-rays (CXR) and minimal data from electronic health records (EHR). Methods: In this retrospective study, 2,121 patients with available computed tomography (CT) scans and corresponding CXR images were collected internally (Mayo Enterprise), with calculated CAC scores binned into 3 categories (0, 1-99, and 100+) as ground truth for model training. The internally trained model was tested on multiple external datasets, domestic (EUH) and foreign (VGHTPE), with significant racial and ethnic differences, and classification performance was compared. Findings: Classification performance across the 0, 1-99, and 100+ CAC categories was moderate on both the internal test and external datasets, reaching an average F1-score of 0.66 for Mayo, 0.62 for EUH and 0.61 for VGHTPE. For the clinically relevant binary task of 0 vs 400+ CAC classification, the performance of our model on the internal test and external datasets reached an average AUROC of 0.84. Interpretation: The fusion model trained on CXR performed better (0.84 average AUROC on internal and external datasets) than existing state-of-the-art models for predicting CAC scores on internal data only (0.73 AUROC), with robust performance on external datasets. Thus, our proposed model may be used as a robust, first-pass opportunistic screening method for cardiovascular risk from regular chest radiographs. For community use, the trained model and the inference code can be downloaded under an academic open-source license from https://github.com/jeong-jasonji/MTL_CAC_classification . Funding: The study was partially supported by National Institutes of Health award 1R01HL155410-01A1.
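The three-category binning of CAC scores mentioned in the Methods above (0, 1-99, and 100+) can be sketched as a small function. The function name is hypothetical. The abstract does not say how fractional scores between 0 and 1 are handled, so this sketch assumes any positive score below 100 falls in the middle category.

```python
def cac_category(score):
    """Map a CAC score to one of the three categories named in the abstract.

    Assumption: any positive score below 100 is treated as "1-99".
    """
    if score < 0:
        raise ValueError("CAC score cannot be negative")
    if score == 0:
        return "0"
    if score < 100:
        return "1-99"
    return "100+"


# Hypothetical scores, for illustration only.
labels = [cac_category(s) for s in (0, 57, 100, 450)]
```

Labels produced this way would serve as the ground-truth classes for a three-way classifier.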

11.

Liver fibrosis classification from ultrasound using machine learning: a systematic literature review.

Punn, Narinder Singh; Patel, Bhavik; Banerjee, Imon.

Abdom Radiol (NY) ; 49(1): 69-80, 2024 Jan.

Article in English | MEDLINE | ID: mdl-37950068

ABSTRACT

PURPOSE: Liver biopsy was considered the gold standard for diagnosing liver fibrosis; however, with advancements in medical technology and increasing awareness of potential complications, the reliance on liver biopsy has diminished. Ultrasound is gaining popularity due to its wider availability and cost-effectiveness. This study examined machine learning / deep learning (ML/DL) models for non-invasive liver fibrosis classification from ultrasound. METHODS: Following the preferred reporting items for systematic reviews and meta-analyses (PRISMA) protocol, we searched five academic databases using the query. We defined a population, intervention, comparison, outcomes, and study design (PICOS) framework for inclusion. Furthermore, the Joanna Briggs Institute (JBI) checklist for analytical cross-sectional studies was used for quality assessment. RESULTS: Among the 188 screened studies, 17 studies were selected. The methods were categorized as off-the-shelf (OTS), attention, generative, and ensemble classifiers. Most studies used OTS classifiers that combined pre-trained ML/DL methods with radiomics features to determine fibrosis staging. Although machine learning shows potential for fibrosis classification, there are limited external comparisons of interventions and prospective clinical trials, which limits their applicability. CONCLUSION: With the recent success of ML/DL in biomedical image analysis, automated solutions using ultrasound have been developed for predicting liver diseases. However, their applicability is bounded by the limited and imbalanced retrospective studies with high heterogeneity. This challenge could be addressed by establishing a standard protocol for study design that selects an appropriate population, interventions, outcomes, and comparison.

Subjects

Liver Cirrhosis , Machine Learning , Humans , Prospective Studies , Retrospective Studies , Cross-Sectional Studies , Liver Cirrhosis/diagnostic imaging , Liver Cirrhosis/pathology

12.

Echocardiography-Based Deep Learning Model to Differentiate Constrictive Pericarditis and Restrictive Cardiomyopathy.

Chao, Chieh-Ju; Jeong, Jiwoong; Arsanjani, Reza; Kim, Kihong; Tsai, Yi-Lin; Yu, Wen-Chung; Farina, Juan M; Mahmoud, Ahmed K; Ayoub, Chadi; Grogan, Martha; Kane, Garvan C; Banerjee, Imon; Oh, Jae K.

JACC Cardiovasc Imaging ; 17(4): 349-360, 2024 Apr.

Article in English | MEDLINE | ID: mdl-37943236

ABSTRACT

BACKGROUND: Constrictive pericarditis (CP) is an uncommon but reversible cause of diastolic heart failure if appropriately identified and treated. However, its diagnosis remains a challenge for clinicians. Artificial intelligence may enhance the identification of CP. OBJECTIVES: The authors proposed a deep learning approach based on transthoracic echocardiography to differentiate CP from restrictive cardiomyopathy. METHODS: Patients with a confirmed diagnosis of CP and cardiac amyloidosis (CA) (as the representative disease of restrictive cardiomyopathy) at Mayo Clinic Rochester from January 2003 to December 2021 were identified to extract baseline demographics. The apical 4-chamber view from transthoracic echocardiography studies was used as input data. The patients were split into a 60:20:20 ratio for training, validation, and held-out test sets of the ResNet50 deep learning model. The model performance (differentiating CP and CA) was evaluated in the test set with the area under the curve. GradCAM was used for model interpretation. RESULTS: A total of 381 patients were identified, including 184 (48.3%) CP, and 197 (51.7%) CA cases. The mean age was 68.7 ± 11.4 years, and 72.8% were male. ResNet50 had a performance with an area under the curve of 0.97 to differentiate the 2-class classification task (CP vs CA). The GradCAM heatmap showed activation around the ventricular septal area. CONCLUSIONS: With a standard apical 4-chamber view, our artificial intelligence model provides a platform to facilitate the detection of CP, allowing for improved workflow efficiency and prompt referral for more advanced evaluation and intervention of CP.

Subjects

Cardiomyopathy, Restrictive , Deep Learning , Pericarditis, Constrictive , Humans , Male , Middle Aged , Aged , Aged, 80 and over , Female , Cardiomyopathy, Restrictive/diagnostic imaging , Pericarditis, Constrictive/diagnostic imaging , Artificial Intelligence , Predictive Value of Tests , Echocardiography , Diagnosis, Differential
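The 60:20:20 patient split described in the echocardiography abstract above can be sketched as follows. The abstract does not describe the splitting procedure, so this assumes a seeded random shuffle at the patient level, with any rounding remainder going to the test set. The function name, the seed, and the patient identifiers are hypothetical.

```python
import random


def split_patients(patient_ids, seed=0):
    """Split unique patient IDs into train/validation/test at roughly 60:20:20.

    Splitting by patient (not by image) keeps one patient's studies in a single set.
    """
    ids = sorted(set(patient_ids))      # de-duplicate and fix the starting order
    random.Random(seed).shuffle(ids)    # reproducible shuffle
    n_train = int(len(ids) * 0.6)
    n_val = int(len(ids) * 0.2)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]        # remainder, including rounding leftovers
    return train, val, test


# Hypothetical identifiers; 381 matches the cohort size in the abstract.
train, val, test = split_patients([f"patient_{i:03d}" for i in range(381)])
```

With 381 patients this yields 228, 76, and 77 identifiers; the study's actual split sizes are not stated in the abstract and may differ.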

13.

Effects of Patient Demographics and Examination Factors on Patient Experience in Outpatient MRI Appointments.

Parikh, Parth; Klanderman, Molly; Teck, Alyssa; Kunzelman, Jackie; Banerjee, Imon; DeYoung, Dyan; Hara, Amy; Tan, Nelly; Yano, Motoyo.

J Am Coll Radiol ; 21(4): 601-608, 2024 Apr.

Article in English | MEDLINE | ID: mdl-37247830

ABSTRACT

OBJECTIVE: The objective of this article is to describe the effects of patient demographics and examination factors on patient-reported experience in outpatient MRI examinations. METHODS: This institutional review board-waived, HIPAA-compliant quality improvement study evaluated outpatient MRI appointments from March 2021 to January 2022 using a postappointment survey consisting of a 5-point emoji scale and text-based feedback. Patient demographics and examination information were extracted from electronic medical records. Ratings ≤ 3 were categorized as negative, and ratings ≥ 4 were categorized as positive. Continuous variables were analyzed using the Kruskal-Wallis test, and categorical variables were analyzed using Fisher's exact test. A P value less than .05 was considered significant. A natural language processing algorithm was trained and validated to categorize patient feedback. RESULTS: A total of 3,636 patients responded to the survey. Positive ratings had a higher proportion of male respondents compared with negative ratings (47.9% versus 37.0%, P = .004). Examination characteristics were also grouped by positive or negative rating. Patients who endured longer examination time (median 54.0 min versus 44.0 min, P < .001) and longer wait time after check-in (median 61.6 min versus 46.2 min, P < .001) were more likely to give negative ratings. The most common themes of free text feedback included excellent service (84.3%), on-time service (8.4%), and comfortable intravenous line placement (0.4%). Most common negative feedback included long wait times (10.5%), poor communication (8.4%), and physical discomfort during the examination (4.2%). CONCLUSION: Male gender, short examination duration, and on-time start were associated with positive patient ratings.

Subjects

Outpatients , Patient Satisfaction , Humans , Male , Magnetic Resonance Imaging , Patient Outcome Assessment , Demography

14.

Efficient adversarial debiasing with concept activation vector - Medical image case-studies.

Correa, Ramon; Pahwa, Khushbu; Patel, Bhavik; Vachon, Celine M; Gichoya, Judy W; Banerjee, Imon.

J Biomed Inform ; 149: 104548, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38043883

ABSTRACT

BACKGROUND: A major hurdle for the real-time deployment of AI models is ensuring their trustworthiness for unseen populations. More often than not, these complex models are black boxes in which promising results are generated. However, when scrutinized, these models begin to reveal implicit biases in their decision making, particularly for minority subgroups. METHOD: We develop an efficient adversarial de-biasing approach with partial learning by incorporating the existing concept activation vectors (CAV) methodology to reduce racial disparities while preserving performance on the targeted task. CAV was originally a model interpretability technique, which we adopted to identify the convolution layers responsible for learning race; we fine-tune only up to that layer instead of fine-tuning the complete network, limiting the drop in performance. RESULTS: The methodology was evaluated on two independent medical image case studies, chest X-rays and mammograms, and we also performed external validation on a different racial population. On the external datasets for the chest X-ray use case, debiased models (averaged AUC 0.87) outperformed the baseline convolution models (averaged AUC 0.57) as well as the models trained with the popular fine-tuning strategy (averaged AUC 0.81). Moreover, the mammogram models were debiased using a single dataset (white, black and Asian) and improved performance on external datasets (averaged AUC 0.8 to 0.86) with a completely different population (primarily Hispanic patients). CONCLUSION: In this study, we demonstrated that the adversarial models trained only with internal data performed as well as, and often better than, the standard fine-tuning strategy with data from an external setting. The adversarial training approach described can be applied regardless of the predictor's model architecture, as long as the convolution model is trained using a gradient-based method.
We release the training code with academic open-source license - https://github.com/ramon349/JBI2023_TCAV_debiasing.

Subjects

Artificial Intelligence , Clinical Decision-Making , Diagnostic Imaging , Racial Groups , Humans , Mammography , Minority Groups , Bias , Healthcare Disparities

15.

Fusion Modeling: Combining Clinical and Imaging Data to Advance Cardiac Care.

van Assen, Marly; Tariq, Amara; Razavi, Alexander C; Yang, Carl; Banerjee, Imon; De Cecco, Carlo N.

Circ Cardiovasc Imaging ; 16(12): e014533, 2023 12.

Article in English | MEDLINE | ID: mdl-38073535

ABSTRACT

In addition to the traditional clinical risk factors, an increasing number of imaging biomarkers have shown value for cardiovascular risk prediction. Clinical and imaging data are captured from a variety of data sources during multiple patient encounters and are often analyzed independently. Initial studies showed that fusion of both clinical and imaging features results in superior prognostic performance compared with traditional scores. There are different approaches to fusion modeling, which combines multiple data resources to optimize predictions, each with its own advantages and disadvantages. However, manual extraction of clinical and imaging data is time- and labor-intensive and often not feasible in clinical practice. An automated approach for clinical and imaging data extraction is highly desirable. Convolutional neural networks and natural language processing can be utilized for the extraction of electronic medical record data, imaging studies, and free-text data. This review outlines the current status of cardiovascular risk prediction and fusion modeling, and gives an overview of different artificial intelligence approaches to automatically extract data from images and electronic medical records for this purpose.

Subjects

Artificial Intelligence , Neural Networks, Computer , Humans , Electronic Health Records , Natural Language Processing , Diagnostic Imaging

16.

Sharing Patient Praises With Radiology Staff: Workflow Automation and Impact on Staff.

Deahl, Zoe; Banerjee, Imon; Nadella, Meghana; Patel, Anika; Dodoo, Christopher; Jaramillo, Iridian; Varner, Jacob; Nguyen, Evie; Tan, Nelly.

J Am Coll Radiol ; 2023 Dec 28.

Article in English | MEDLINE | ID: mdl-38159832

ABSTRACT

OBJECTIVE: This study aims to develop and evaluate a semi-automated workflow using natural language processing (NLP) for sharing positive patient feedback with radiology staff, assessing its efficiency and impact on radiology staff morale. METHODS: The HIPAA-compliant, institutional review board-waived implementation study was conducted from April 2022 to June 2023 and introduced a Patient Praises program to distribute positive patient feedback to radiology staff collected from patient surveys. The study transitioned from an initial manual workflow to a hybrid process using an NLP model trained on 1,034 annotated comments and validated on 260 holdout reports. The times to generate Patient Praises e-mails were compared between manual and hybrid workflows. Impact of Patient Praises on radiology staff was measured using a four-question Likert scale survey and an open text feedback box. Kruskal-Wallis test and post hoc Dunn's test were performed to evaluate differences in time for different workflows. RESULTS: From April 2022 to June 2023, the radiology department received 10,643 patient surveys. Of those surveys, 95.6% contained positive comments, with 9.6% (n = 978) shared as Patient Praises to staff. After implementation of the hybrid workflow in March 2023, 45.8% of Patient Praises were sent through the hybrid workflow and 54.2% were sent manually. Time efficiency analysis on 30-case subsets revealed that the hybrid workflow without edits was the most efficient, taking a median of 0.7 min per case. A high proportion of staff found the praises made them feel appreciated (94%) and valued (90%) responding with a 5/5 agreement on 5-point Likert scale responses. CONCLUSION: A hybrid workflow incorporating NLP significantly improves time efficiency for the Patient Praises program while increasing feelings of acknowledgment and value among staff.
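The abstract above compares per-case times across workflows with a Kruskal-Wallis test. As a rough illustration of what that test computes, the sketch below ranks the pooled observations and derives the H statistic. The three groups, their names, and all timing values are invented for illustration and are not the study's data. No tie correction is applied, and the p-value shortcut holds only for exactly three groups (chi-square with 2 degrees of freedom).

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic without tie correction."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical minutes per case for three workflows (invented values).
manual = [3.1, 2.8, 3.5, 4.0, 2.9]
hybrid_with_edits = [1.9, 2.2, 1.6, 2.5, 2.0]
hybrid_without_edits = [0.7, 0.6, 0.9, 0.8, 0.5]

h_stat = kruskal_wallis_h([manual, hybrid_with_edits, hybrid_without_edits])
# For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
p_value = math.exp(-h_stat / 2)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```

A significant H only says that at least one group differs; a post hoc procedure such as Dunn's test, as used in the study, is still needed to identify which pairs differ.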

17.

Artificial Intelligence-Based Prediction of Cardiovascular Diseases from Chest Radiography.

Farina, Juan M; Pereyra, Milagros; Mahmoud, Ahmed K; Scalia, Isabel G; Abbas, Mohammed Tiseer; Chao, Chieh-Ju; Barry, Timothy; Ayoub, Chadi; Banerjee, Imon; Arsanjani, Reza.

J Imaging ; 9(11)2023 Oct 26.

Article in English | MEDLINE | ID: mdl-37998083

ABSTRACT

Chest radiography (CXR) is the most frequently performed radiological test worldwide because of its wide availability, non-invasive nature, and low cost. The ability of CXR to diagnose cardiovascular diseases, give insight into cardiac function, and predict cardiovascular events is often underutilized, not clearly understood, and affected by inter- and intra-observer variability. Therefore, more sophisticated tests are generally needed to assess cardiovascular diseases. Considering the sustained increase in the incidence of cardiovascular diseases, it is critical to find accessible, fast, and reproducible tests to help diagnose these frequent conditions. The expanded focus on the application of artificial intelligence (AI) with respect to diagnostic cardiovascular imaging has also been applied to CXR, with several publications suggesting that AI models can be trained to detect cardiovascular conditions by identifying features in the CXR. Multiple models have been developed to predict mortality, cardiovascular morphology and function, coronary artery disease, valvular heart diseases, aortic diseases, arrhythmias, pulmonary hypertension, and heart failure. The available evidence demonstrates that the use of AI-based tools applied to CXR for the diagnosis of cardiovascular conditions and prognostication has the potential to transform clinical care. AI-analyzed CXRs could be utilized in the future as a complementary, easy-to-apply technology to improve diagnosis and risk stratification for cardiovascular diseases. Such advances will likely help better target more advanced investigations, which may reduce the burden of testing in some cases, as well as better identify higher-risk patients who would benefit from earlier, dedicated, and comprehensive cardiovascular evaluation.

18.

Opportunistic assessment of ischemic heart disease risk using abdominopelvic computed tomography and medical record data: a multimodal explainable artificial intelligence approach.

Zambrano Chaves, Juan M; Wentland, Andrew L; Desai, Arjun D; Banerjee, Imon; Kaur, Gurkiran; Correa, Ramon; Boutin, Robert D; Maron, David J; Rodriguez, Fatima; Sandhu, Alexander T; Rubin, Daniel; Chaudhari, Akshay S; Patel, Bhavik N.

Sci Rep ; 13(1): 21034, 2023 11 29.

Article in English | MEDLINE | ID: mdl-38030716

ABSTRACT

Current risk scores using clinical risk factors for predicting ischemic heart disease (IHD) events, the leading cause of global mortality, have known limitations and may be improved by imaging biomarkers. While body composition (BC) imaging biomarkers derived from abdominopelvic computed tomography (CT) correlate with IHD risk, they are impractical to measure manually. Here, in a retrospective cohort of 8139 contrast-enhanced abdominopelvic CT examinations undergoing up to 5 years of follow-up, we developed multimodal opportunistic risk assessment models for IHD by automatically extracting BC features from abdominal CT images and integrating these with features from each patient's electronic medical record (EMR). Our predictive methods match and, in some cases, outperform clinical risk scores currently used in IHD risk assessment. We provide clinical interpretability of our model using a new method of determining tissue-level contributions from CT along with weightings of EMR features contributing to IHD risk. We conclude that such a multimodal approach, which automatically integrates BC biomarkers and EMR data, can enhance IHD risk assessment and aid primary prevention efforts for IHD. To further promote research, we release the Opportunistic L3 Ischemic heart disease (OL3I) dataset, the first public multimodal dataset for opportunistic CT prediction of IHD.

Subjects

Artificial Intelligence , Myocardial Ischemia , Humans , Retrospective Studies , Myocardial Ischemia/diagnostic imaging , Myocardial Ischemia/etiology , Tomography, X-Ray Computed/adverse effects , Risk Factors , Risk Assessment , Biomarkers , Medical Records

19.

Challenges and solutions of echocardiography generalization for deep learning: a study in patients with constrictive pericarditis.

Jeong, Jiwoong; Chao, Chieh-Ju; Arsanjani, Reza; Kim, Kihong; Pelkey, Melissa N; Chen, Yi-Chieh; Ramzan, Raheel N; Elbahnasawy, Mohammad; Sleem, Mohamed; Ayoub, Chadi; Farina, Juan Maria M; Grogan, Martha; Kane, Garvan C; Patel, Bhavik N; Oh, Jae K; Banerjee, Imon.

J Med Imaging (Bellingham) ; 10(5): 054502, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37840850

ABSTRACT

Purpose: The inherent characteristics of transthoracic echocardiography (TTE) images such as low signal-to-noise ratio and acquisition variations can limit the direct use of TTE images in the development and generalization of deep learning models. As such, we propose an innovative automated framework to address the common challenges in the process of echocardiography deep learning model generalization on the challenging task of constrictive pericarditis (CP) and cardiac amyloidosis (CA) differentiation. Approach: Patients with a confirmed diagnosis of CP or CA and normal cases from Mayo Clinic Rochester and Arizona were identified to extract baseline demographics and the apical 4 chamber view from TTE studies. We proposed an innovative preprocessing and image generalization framework to process the images for training the ResNet50, ResNeXt101, and EfficientNetB2 models. Ablation studies were conducted to justify the effect of each proposed processing step in the final classification performance. Results: The models were initially trained and validated on 720 unique TTE studies from Mayo Rochester and further validated on 225 studies from Mayo Arizona. With our proposed generalization framework, EfficientNetB2 generalized the best with an average area under the curve (AUC) of 0.96 (±0.01) and 0.83 (±0.03) on the Rochester and Arizona test sets, respectively. Conclusions: Leveraging the proposed generalization techniques, we successfully developed an echocardiography-based deep learning model that can accurately differentiate CP from CA and normal cases and applied the model to images from two sites. The proposed framework can be further extended for the development of echocardiography-based deep learning models.

20.

A Large Language Model-Based Generative Natural Language Processing Framework Finetuned on Clinical Notes Accurately Extracts Headache Frequency from Electronic Health Records.

Chiang, Chia-Chun; Luo, Man; Dumkrieger, Gina; Trivedi, Shubham; Chen, Yi-Chieh; Chao, Chieh-Ju; Schwedt, Todd J; Sarker, Abeed; Banerjee, Imon.

medRxiv ; 2023 Oct 03.

Article in English | MEDLINE | ID: mdl-37873417

ABSTRACT

Background: Headache frequency, defined as the number of days with any headache in a month (or four weeks), remains a key parameter in the evaluation of treatment response to migraine preventive medications. However, due to variations and inconsistencies in documentation by clinicians, significant challenges exist in accurately extracting headache frequency from the electronic health record (EHR) with traditional natural language processing (NLP) algorithms. Methods: This was a retrospective cross-sectional study with human subjects identified from three tertiary headache referral centers: Mayo Clinic Arizona, Florida, and Rochester. All neurology consultation notes written by more than 10 headache specialists between 2012 and 2022 were extracted, and 1915 notes were used for model fine-tuning (90%) and testing (10%). We employed four different NLP frameworks: (1) a ClinicalBERT (Bidirectional Encoder Representations from Transformers) regression model; (2) a Generative Pre-Trained Transformer-2 (GPT-2) Question Answering (QA) model, zero-shot; (3) a GPT-2 QA model with few-shot training, fine-tuned on Mayo Clinic notes; and (4) a GPT-2 generative model with few-shot training, fine-tuned on Mayo Clinic notes to generate the answer by considering the context of the included text. Results: The GPT-2 generative model was the best-performing model, with an accuracy of 0.92 [0.91, 0.93] and an R² score of 0.89 [0.87, 0.90], and all GPT-2-based models outperformed the ClinicalBERT model in terms of exact matching accuracy. Although the ClinicalBERT regression model had the lowest accuracy, 0.27 [0.26, 0.28], it demonstrated a high R² score of 0.88 [0.85, 0.89], suggesting that the ClinicalBERT model can reasonably predict the headache frequency within a range of ≤ ± 3 days; its R² score was higher than that of the GPT-2 QA zero-shot model and the fine-tuned GPT-2 QA few-shot model. Conclusion: We developed a robust model based on a state-of-the-art large language model (LLM), a GPT-2 generative model, that can extract headache frequency from EHR free-text clinical notes with high accuracy and R² score. It overcame several challenges related to the different ways clinicians document headache frequency, which traditional NLP models did not handle easily. We also showed that GPT-2-based frameworks outperformed ClinicalBERT in terms of accuracy in extracting headache frequency from clinical notes. To facilitate research in the field, we released the GPT-2 generative model and inference code under an open-source license for community use on GitHub.
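The results above report a model with low exact-match accuracy but a high coefficient of determination (R²). The sketch below, a toy illustration rather than the study's evaluation code, shows how the two metrics can diverge when predictions are close to the reference but rarely identical. All values and function names are invented for illustration.

```python
def exact_match_accuracy(y_true, y_pred):
    """Fraction of predictions that equal the reference exactly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def within_tolerance(y_true, y_pred, tol):
    """Fraction of predictions whose absolute error is at most tol."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tol)
    return hits / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented headache-days-per-month values (not from the study).
reference = [2, 8, 15, 20, 30, 4, 12, 25]
# Regression-style output after rounding: near the reference, rarely exact.
predicted = [3, 7, 14, 22, 28, 4, 13, 24]

accuracy = exact_match_accuracy(reference, predicted)
close_enough = within_tolerance(reference, predicted, tol=3)
r2 = r2_score(reference, predicted)
print(f"exact match = {accuracy:.3f}, within 3 days = {close_enough:.3f}, R2 = {r2:.3f}")
```

In this toy case only one of eight predictions matches exactly, yet every prediction falls within 3 days and R² stays close to 1, which mirrors the pattern the abstract describes for the regression model.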
