Search | VHL Regional Portal

1.

Assessment of Pathology Domain-Specific Knowledge of ChatGPT and Comparison to Human Performance.

Wang, Andrew Y; Lin, Sherman; Tran, Christopher; Homer, Robert J; Wilsdon, Dan; Walsh, Joanna C; Goebel, Emily A; Sansano, Irene; Sonawane, Snehal; Cockenpot, Vincent; Mukhopadhyay, Sanjay; Taskin, Toros; Zahra, Nusrat; Cima, Luca; Semerci, Orhan; Özamrak, Birsen Gizem; Mishra, Pallavi; Vennavalli, Naga Sarika; Chen, Po-Hsuan Cameron; Cecchini, Matthew J.

Arch Pathol Lab Med ; 2024 Jan 20.

Article in English | MEDLINE | ID: mdl-38244054

ABSTRACT

CONTEXT: Artificial intelligence algorithms hold the potential to fundamentally change many aspects of society. Application of these tools, including the publicly available ChatGPT, has demonstrated impressive domain-specific knowledge in many areas, including medicine. OBJECTIVES: To understand the level of pathology domain-specific knowledge for ChatGPT using different underlying large language models, GPT-3.5 and the updated GPT-4. DESIGN: An international group of pathologists (n = 15) was recruited to generate pathology-specific questions at a similar level to those that could be seen on licensing (board) examinations. The questions (n = 15) were answered by GPT-3.5, GPT-4, and a staff pathologist who had recently passed their Canadian pathology licensing exams. Participants were instructed to score answers on a 5-point scale and to predict which answer was written by ChatGPT. RESULTS: GPT-3.5 performed at a similar level to the staff pathologist, while GPT-4 outperformed both. The overall score for both GPT-3.5 and GPT-4 was within the range of meeting expectations for a trainee writing licensing examinations. In all but one question, the reviewers were able to correctly identify the answers generated by GPT-3.5. CONCLUSIONS: By demonstrating the ability of ChatGPT to answer pathology-specific questions at a level similar to (GPT-3.5) or exceeding (GPT-4) a trained pathologist, this study highlights the potential of large language models to be transformative in this space. In the future, more advanced iterations of these algorithms with increased domain-specific knowledge may have the potential to assist pathologists and enhance pathology resident training.

2.

Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging.

Azizi, Shekoofeh; Culp, Laura; Freyberg, Jan; Mustafa, Basil; Baur, Sebastien; Kornblith, Simon; Chen, Ting; Tomasev, Nenad; Mitrovic, Jovana; Strachan, Patricia; Mahdavi, S Sara; Wulczyn, Ellery; Babenko, Boris; Walker, Megan; Loh, Aaron; Chen, Po-Hsuan Cameron; Liu, Yuan; Bavishi, Pinal; McKinney, Scott Mayer; Winkens, Jim; Roy, Abhijit Guha; Beaver, Zach; Ryan, Fiona; Krogue, Justin; Etemadi, Mozziyar; Telang, Umesh; Liu, Yun; Peng, Lily; Corrado, Greg S; Webster, Dale R; Fleet, David; Hinton, Geoffrey; Houlsby, Neil; Karthikesalingam, Alan; Norouzi, Mohammad; Natarajan, Vivek.

Nat Biomed Eng ; 7(6): 756-779, 2023 06.

Article in English | MEDLINE | ID: mdl-37291435

ABSTRACT

Machine-learning models for medical tasks can match or surpass the performance of clinical experts. However, in settings differing from those of the training dataset, the performance of a model can deteriorate substantially. Here we report a representation-learning strategy for machine-learning models applied to medical-imaging tasks that mitigates such 'out-of-distribution' performance problems and that improves model robustness and training efficiency. The strategy, which we named REMEDIS (for 'Robust and Efficient Medical Imaging with Self-supervision'), combines large-scale supervised transfer learning on natural images and intermediate contrastive self-supervised learning on medical images and requires minimal task-specific customization. We show the utility of REMEDIS in a range of diagnostic-imaging tasks covering six imaging domains and 15 test datasets, and by simulating three realistic out-of-distribution scenarios. REMEDIS improved in-distribution diagnostic accuracies up to 11.5% with respect to strong supervised baseline models, and in out-of-distribution settings required only 1-33% of the data for retraining to match the performance of supervised models retrained using all available data. REMEDIS may accelerate the development lifecycle of machine-learning models for medical imaging.

Subject(s)

Machine Learning , Supervised Machine Learning , Diagnostic Imaging

3.

Predicting lymph node metastasis from primary tumor histology and clinicopathologic factors in colorectal cancer using deep learning.

Krogue, Justin D; Azizi, Shekoofeh; Tan, Fraser; Flament-Auvigne, Isabelle; Brown, Trissia; Plass, Markus; Reihs, Robert; Müller, Heimo; Zatloukal, Kurt; Richeson, Pema; Corrado, Greg S; Peng, Lily H; Mermel, Craig H; Liu, Yun; Chen, Po-Hsuan Cameron; Gombar, Saurabh; Montine, Thomas; Shen, Jeanne; Steiner, David F; Wulczyn, Ellery.

Commun Med (Lond) ; 3(1): 59, 2023 Apr 24.

Article in English | MEDLINE | ID: mdl-37095223

ABSTRACT

BACKGROUND: Presence of lymph node metastasis (LNM) influences prognosis and clinical decision-making in colorectal cancer. However, detection of LNM is variable and depends on a number of external factors. Deep learning has shown success in computational pathology, but has struggled to boost performance when combined with known predictors. METHODS: Machine-learned features are created by clustering deep learning embeddings of small patches of tumor in colorectal cancer via k-means, and then selecting the top clusters that add predictive value to a logistic regression model when combined with known baseline clinicopathological variables. We then analyze performance of logistic regression models trained with and without these machine-learned features in combination with the baseline variables. RESULTS: The machine-learned extracted features provide independent signal for the presence of LNM (AUROC: 0.638, 95% CI: [0.590, 0.683]). Furthermore, the machine-learned features add predictive value to the set of 6 clinicopathologic variables in an external validation set (likelihood ratio test, p < 0.00032; AUROC: 0.740, 95% CI: [0.701, 0.780]). A model incorporating these features can also further risk-stratify patients with and without identified metastasis (p < 0.001 for both stage II and stage III). CONCLUSION: This work demonstrates an effective approach to combine deep learning with established clinicopathologic factors in order to identify independently informative features associated with LNM. Further work building on these specific results may have important impact in prognostication and therapeutic decision making for LNM. Additionally, this general computational approach may prove useful in other contexts.

When colorectal cancers spread to the lymph nodes, it can indicate a poorer prognosis. However, detecting lymph node metastasis (spread) can be difficult and depends on a number of factors such as how samples are taken and processed. Here, we show that machine learning, which involves computer software learning from patterns in data, can predict lymph node metastasis in patients with colorectal cancer from the microscopic appearance of their primary tumor and the clinical characteristics of the patients. We also show that the same approach can predict patient survival. With further work, our approach may help clinicians to inform patients about their prognosis and decide on appropriate treatments.

4.

Pathologist Validation of a Machine Learning-Derived Feature for Colon Cancer Risk Stratification.

L'Imperio, Vincenzo; Wulczyn, Ellery; Plass, Markus; Müller, Heimo; Tamini, Nicolò; Gianotti, Luca; Zucchini, Nicola; Reihs, Robert; Corrado, Greg S; Webster, Dale R; Peng, Lily H; Chen, Po-Hsuan Cameron; Lavitrano, Marialuisa; Liu, Yun; Steiner, David F; Zatloukal, Kurt; Pagni, Fabio.

JAMA Netw Open ; 6(3): e2254891, 2023 03 01.

Article in English | MEDLINE | ID: mdl-36917112

ABSTRACT

Importance: Identifying new prognostic features in colon cancer has the potential to refine histopathologic review and inform patient care. Although prognostic artificial intelligence systems have recently demonstrated significant risk stratification for several cancer types, studies have not yet shown that the machine learning-derived features associated with these prognostic artificial intelligence systems are both interpretable and usable by pathologists. Objective: To evaluate whether pathologist scoring of a histopathologic feature previously identified by machine learning is associated with survival among patients with colon cancer. Design, Setting, and Participants: This prognostic study used deidentified, archived colorectal cancer cases from January 2013 to December 2015 from the University of Milano-Bicocca. All available histologic slides from 258 consecutive colon adenocarcinoma cases were reviewed from December 2021 to February 2022 by 2 pathologists, who conducted semiquantitative scoring for tumor adipose feature (TAF), which was previously identified via a prognostic deep learning model developed with an independent colorectal cancer cohort. Main Outcomes and Measures: Prognostic value of TAF for overall survival and disease-specific survival as measured by univariable and multivariable regression analyses. Interpathologist agreement in TAF scoring was also evaluated. Results: A total of 258 colon adenocarcinoma histopathologic cases from 258 patients (138 men [53%]; median age, 67 years [IQR, 65-81 years]) with stage II (n = 119) or stage III (n = 139) cancer were included. Tumor adipose feature was identified in 120 cases (widespread in 63 cases, multifocal in 31, and unifocal in 26). 
For overall survival analysis after adjustment for tumor stage, TAF was independently prognostic in 2 ways: TAF as a binary feature (presence vs absence: hazard ratio [HR] for presence of TAF, 1.55 [95% CI, 1.07-2.25]; P = .02) and TAF as a semiquantitative categorical feature (HR for widespread TAF, 1.87 [95% CI, 1.23-2.85]; P = .004). Interpathologist agreement for widespread TAF vs lower categories (absent, unifocal, or multifocal) was 90%, corresponding to a κ metric at this threshold of 0.69 (95% CI, 0.58-0.80). Conclusions and Relevance: In this prognostic study, pathologists were able to learn and reproducibly score for TAF, providing significant risk stratification on this independent data set. Although additional work is warranted to understand the biological significance of this feature and to establish broadly reproducible TAF scoring, this work represents the first validation to date of human expert learning from machine learning in pathology. Specifically, this validation demonstrates that a computationally identified histologic feature can represent a human-identifiable, prognostic feature with the potential for integration into pathology practice.

Subject(s)

Adenocarcinoma , Colonic Neoplasms , Male , Humans , Aged , Colonic Neoplasms/diagnosis , Pathologists , Artificial Intelligence , Machine Learning , Risk Assessment
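
Entry 4 reports interpathologist agreement both as raw agreement (90%) and as a κ statistic (0.69). As a rough illustration of how the two quantities relate, the sketch below computes Cohen's κ from a square agreement table. The function name and the counts are hypothetical, chosen only so that raw agreement equals 90%; they are not the study's data, and the abstract does not state which κ variant or software the authors used.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] is the number of cases that rater A placed in
    category i and rater B placed in category j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: fraction of cases on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Marginal category frequencies for each rater.
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance if the raters were independent.
    expected = sum(row_marg[i] * col_marg[i] for i in range(k))
    return (observed - expected) / (1 - expected)


# Hypothetical counts (NOT from the study): rows are rater A, columns are
# rater B, categories are "widespread" vs "lower" (absent/unifocal/multifocal).
agreement_table = [
    [20, 5],
    [5, 70],
]
kappa = cohens_kappa(agreement_table)
print(round(kappa, 3))
```

With these made-up counts the raters agree on 90 of 100 cases, yet κ is noticeably lower than 0.90 because part of that agreement is expected by chance given how common the "lower" category is.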

5.

Deep Learning Detection of Active Pulmonary Tuberculosis at Chest Radiography Matched the Clinical Performance of Radiologists.

Kazemzadeh, Sahar; Yu, Jin; Jamshy, Shahar; Pilgrim, Rory; Nabulsi, Zaid; Chen, Christina; Beladia, Neeral; Lau, Charles; McKinney, Scott Mayer; Hughes, Thad; Kiraly, Atilla P; Kalidindi, Sreenivasa Raju; Muyoyeta, Monde; Malemela, Jameson; Shih, Ting; Corrado, Greg S; Peng, Lily; Chou, Katherine; Chen, Po-Hsuan Cameron; Liu, Yun; Eswaran, Krish; Tse, Daniel; Shetty, Shravya; Prabhakara, Shruthi.

Radiology ; 306(1): 124-137, 2023 01.

Article in English | MEDLINE | ID: mdl-36066366

ABSTRACT

Background The World Health Organization (WHO) recommends chest radiography to facilitate tuberculosis (TB) screening. However, chest radiograph interpretation expertise remains limited in many regions. Purpose To develop a deep learning system (DLS) to detect active pulmonary TB on chest radiographs and compare its performance to that of radiologists. Materials and Methods A DLS was trained and tested using retrospective chest radiographs (acquired between 1996 and 2020) from 10 countries. To improve generalization, large-scale chest radiograph pretraining, attention pooling, and semisupervised learning ("noisy-student") were incorporated. The DLS was evaluated in a four-country test set (China, India, the United States, and Zambia) and in a mining population in South Africa, with positive TB confirmed with microbiological tests or nucleic acid amplification testing (NAAT). The performance of the DLS was compared with that of 14 radiologists. The authors studied the efficacy of the DLS compared with that of nine radiologists using the Obuchowski-Rockette-Hillis procedure. Given WHO targets of 90% sensitivity and 70% specificity, the operating point of the DLS (0.45) was prespecified to favor sensitivity. Results A total of 165 754 images in 22 284 subjects (mean age, 45 years; 21% female) were used for model development and testing. In the four-country test set (1236 subjects, 17% with active TB), the receiver operating characteristic (ROC) curve of the DLS was higher than those for all nine India-based radiologists, with an area under the ROC curve of 0.89 (95% CI: 0.87, 0.91). Compared with these radiologists, at the prespecified operating point, the DLS sensitivity was higher (88% vs 75%, P < .001) and specificity was noninferior (79% vs 84%, P = .004). Trends were similar within other patient subgroups, in the South Africa data set, and across various TB-specific chest radiograph findings. 
In simulations, the use of the DLS to identify likely TB-positive chest radiographs for NAAT confirmation reduced the cost by 40%-80% per TB-positive patient detected. Conclusion A deep learning method was found to be noninferior to radiologists for the determination of active tuberculosis on digital chest radiographs. © RSNA, 2022.

Subject(s)

Deep Learning , Tuberculosis, Pulmonary , Humans , Female , Middle Aged , Male , Radiography, Thoracic/methods , Retrospective Studies , Radiography , Tuberculosis, Pulmonary/diagnostic imaging , Radiologists , Sensitivity and Specificity
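
Entry 5 evaluates the DLS at a prespecified operating point (0.45) and reports sensitivity and specificity there. The sketch below shows, under stated assumptions, how those two numbers follow from thresholding a continuous score. The scores and labels are invented for illustration, and treating a score greater than or equal to the threshold as positive is an assumption; the abstract does not specify the comparison direction.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) at a fixed score threshold.

    labels use 1 for disease-positive and 0 for disease-negative.
    A case is called positive when its score is >= threshold (assumption).
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn)  # fraction of true positives detected
    specificity = tn / (tn + fp)  # fraction of true negatives cleared
    return sensitivity, specificity


# Hypothetical model outputs and reference labels (not study data).
example_scores = [0.9, 0.6, 0.5, 0.3, 0.7, 0.4, 0.2, 0.1]
example_labels = [1, 1, 1, 1, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(example_scores, example_labels, 0.45)
print(sens, spec)
```

Lowering the threshold trades specificity for sensitivity, which is the sense in which an operating point can be chosen "to favor sensitivity".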

6.

Deep learning models for histologic grading of breast cancer and association with disease prognosis.

Jaroensri, Ronnachai; Wulczyn, Ellery; Hegde, Narayan; Brown, Trissia; Flament-Auvigne, Isabelle; Tan, Fraser; Cai, Yuannan; Nagpal, Kunal; Rakha, Emad A; Dabbs, David J; Olson, Niels; Wren, James H; Thompson, Elaine E; Seetao, Erik; Robinson, Carrie; Miao, Melissa; Beckers, Fabien; Corrado, Greg S; Peng, Lily H; Mermel, Craig H; Liu, Yun; Steiner, David F; Chen, Po-Hsuan Cameron.

NPJ Breast Cancer ; 8(1): 113, 2022 Oct 04.

Article in English | MEDLINE | ID: mdl-36192400

ABSTRACT

Histologic grading of breast cancer involves review and scoring of three well-established morphologic features: mitotic count, nuclear pleomorphism, and tubule formation. Taken together, these features form the basis of the Nottingham Grading System which is used to inform breast cancer characterization and prognosis. In this study, we develop deep learning models to perform histologic scoring of all three components using digitized hematoxylin and eosin-stained slides containing invasive breast carcinoma. We first evaluate model performance using pathologist-based reference standards for each component. To complement this typical approach to evaluation, we further evaluate the deep learning models via prognostic analyses. The individual component models perform at or above published benchmarks for algorithm-based grading approaches, achieving high concordance rates with pathologist grading. Further, prognostic performance using deep learning-based grading is on par with that of pathologists performing review of matched slides. By providing scores for each component feature, the deep-learning based approach also provides the potential to identify the grading components contributing most to prognostic value. This may enable optimized prognostic models, opportunities to improve access to consistent grading, and approaches to better understand the links between histologic features and clinical outcomes in breast cancer.

7.

Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge.

Bulten, Wouter; Kartasalo, Kimmo; Chen, Po-Hsuan Cameron; Ström, Peter; Pinckaers, Hans; Nagpal, Kunal; Cai, Yuannan; Steiner, David F; van Boven, Hester; Vink, Robert; Hulsbergen-van de Kaa, Christina; van der Laak, Jeroen; Amin, Mahul B; Evans, Andrew J; van der Kwast, Theodorus; Allan, Robert; Humphrey, Peter A; Grönberg, Henrik; Samaratunga, Hemamali; Delahunt, Brett; Tsuzuki, Toyonori; Häkkinen, Tomi; Egevad, Lars; Demkin, Maggie; Dane, Sohier; Tan, Fraser; Valkonen, Masi; Corrado, Greg S; Peng, Lily; Mermel, Craig H; Ruusuvuori, Pekka; Litjens, Geert; Eklund, Martin.

Nat Med ; 28(1): 154-163, 2022 01.

Article in English | MEDLINE | ID: mdl-35027755

ABSTRACT

Artificial intelligence (AI) has shown promise for diagnosing prostate cancer in biopsies. However, results have been limited to individual studies, lacking validation in multinational settings. Competitions have been shown to be accelerators for medical imaging innovations, but their impact is hindered by lack of reproducibility and independent validation. With this in mind, we organized the PANDA challenge (the largest histopathology competition to date, joined by 1,290 developers) to catalyze development of reproducible AI algorithms for Gleason grading using 10,616 digitized prostate biopsies. We validated that a diverse set of submitted algorithms reached pathologist-level performance on independent cross-continental cohorts, fully blinded to the algorithm developers. On United States and European external validation sets, the algorithms achieved agreements of 0.862 (quadratically weighted κ, 95% confidence interval (CI), 0.840-0.884) and 0.868 (95% CI, 0.835-0.900) with expert uropathologists. Successful generalization across different patient populations, laboratories and reference standards, achieved by a variety of algorithmic approaches, warrants evaluating AI-based Gleason grading in prospective clinical trials.

Subject(s)

Neoplasm Grading , Prostatic Neoplasms/pathology , Algorithms , Biopsy , Cohort Studies , Humans , Male , Prostatic Neoplasms/diagnosis , Reproducibility of Results
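
Entry 7 reports agreement with uropathologists as a quadratically weighted κ. A minimal sketch of that statistic follows, assuming ordinal grades coded as integers 0..K-1. The function name, the six-category coding, and the example grades are illustrative assumptions, not the challenge's evaluation code or data.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_categories):
    """Quadratically weighted kappa for two lists of ordinal grades.

    Grades must be integers in range(num_categories). Disagreements are
    penalised by the squared distance between the two grades.
    """
    k = num_categories
    n = len(rater_a)
    # Observed joint counts of (grade from A, grade from B).
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            # Expected count if the two raters graded independently.
            expected = hist_a[i] * hist_b[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical grades for eight biopsies (not challenge data).
reference_grades = [0, 1, 2, 3, 4, 5, 1, 2]
algorithm_grades = [0, 1, 2, 3, 4, 5, 2, 1]
qwk = quadratic_weighted_kappa(reference_grades, algorithm_grades, 6)
print(round(qwk, 3))
```

Because the weights grow with the square of the grade difference, an off-by-one disagreement costs far less than confusing the lowest and highest grades, which suits ordinal scales such as Gleason grade groups.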

8.

Evaluation of artificial intelligence on a reference standard based on subjective interpretation.

Chen, Po-Hsuan Cameron; Mermel, Craig H; Liu, Yun.

Lancet Digit Health ; 3(11): e693-e695, 2021 11.

Article in English | MEDLINE | ID: mdl-34561202

Subject(s)

Artificial Intelligence , Clinical Decision-Making/methods , Models, Biological , Humans , Observer Variation , Reference Standards

9.

Deep learning for distinguishing normal versus abnormal chest radiographs and generalization to two unseen diseases tuberculosis and COVID-19.

Nabulsi, Zaid; Sellergren, Andrew; Jamshy, Shahar; Lau, Charles; Santos, Edward; Kiraly, Atilla P; Ye, Wenxing; Yang, Jie; Pilgrim, Rory; Kazemzadeh, Sahar; Yu, Jin; Kalidindi, Sreenivasa Raju; Etemadi, Mozziyar; Garcia-Vicente, Florencia; Melnick, David; Corrado, Greg S; Peng, Lily; Eswaran, Krish; Tse, Daniel; Beladia, Neeral; Liu, Yun; Chen, Po-Hsuan Cameron; Shetty, Shravya.

Sci Rep ; 11(1): 15523, 2021 09 01.

Article in English | MEDLINE | ID: mdl-34471144

ABSTRACT

Chest radiography (CXR) is the most widely-used thoracic clinical imaging modality and is crucial for guiding the management of cardiothoracic conditions. The detection of specific CXR findings has been the main focus of several artificial intelligence (AI) systems. However, the wide range of possible CXR abnormalities makes it impractical to detect every possible condition by building multiple separate systems, each of which detects one or more pre-specified conditions. In this work, we developed and evaluated an AI system to classify CXRs as normal or abnormal. For training and tuning the system, we used a de-identified dataset of 248,445 patients from a multi-city hospital network in India. To assess generalizability, we evaluated our system using 6 international datasets from India, China, and the United States. Of these datasets, 4 focused on diseases that the AI was not trained to detect: 2 datasets with tuberculosis and 2 datasets with coronavirus disease 2019. Our results suggest that the AI system trained using a large dataset containing a diverse array of CXR abnormalities generalizes to new patient populations and unseen diseases. In a simulated workflow where the AI system prioritized abnormal cases, the turnaround time for abnormal cases was reduced by 7-28%. These results represent an important step towards evaluating whether AI can be safely used to flag cases in a general setting where previously unseen abnormalities exist. Lastly, to facilitate the continued development of AI models for CXR, we release our collected labels for the publicly available dataset.

Subject(s)

COVID-19/diagnostic imaging , Radiographic Image Interpretation, Computer-Assisted/methods , Tuberculosis/diagnostic imaging , Adult , Aged , Algorithms , Case-Control Studies , China , Deep Learning , Female , Humans , India , Male , Middle Aged , Radiography, Thoracic , United States

10.

Artificial Intelligence for Diagnosis and Gleason Grading of Prostate Cancer in Biopsies-Current Status and Next Steps.

Kartasalo, Kimmo; Bulten, Wouter; Delahunt, Brett; Chen, Po-Hsuan Cameron; Pinckaers, Hans; Olsson, Henrik; Ji, Xiaoyi; Mulliqi, Nita; Samaratunga, Hemamali; Tsuzuki, Toyonori; Lindberg, Johan; Rantalainen, Mattias; Wählby, Carolina; Litjens, Geert; Ruusuvuori, Pekka; Egevad, Lars; Eklund, Martin.

Eur Urol Focus ; 7(4): 687-691, 2021 07.

Article in English | MEDLINE | ID: mdl-34393083

ABSTRACT

Diagnosis and Gleason grading of prostate cancer in biopsies are critical for the clinical management of men with prostate cancer. Despite this, the high grading variability among pathologists leads to the potential for under- and overtreatment. Artificial intelligence (AI) systems have shown promise in assisting pathologists to perform Gleason grading, which could help address this problem. In this mini-review, we highlight studies reporting on the development of AI systems for cancer detection and Gleason grading, and discuss the progress needed for widespread clinical implementation, as well as anticipated future developments. PATIENT SUMMARY: This mini-review summarizes the evidence relating to the validation of artificial intelligence (AI)-assisted cancer detection and Gleason grading of prostate cancer in biopsies, and highlights the remaining steps required prior to its widespread clinical implementation. We found that, although there is strong evidence to show that AI is able to perform Gleason grading on par with experienced uropathologists, more work is needed to ensure the accuracy of results from AI systems in diverse settings across different patient populations, digitization platforms, and pathology laboratories.

Subject(s)

Artificial Intelligence , Prostatic Neoplasms , Biopsy , Humans , Image Interpretation, Computer-Assisted , Male , Neoplasm Grading , Prostatic Neoplasms/pathology

11.

Comparative analysis of machine learning approaches to classify tumor mutation burden in lung adenocarcinoma using histopathology images.

Sadhwani, Apaar; Chang, Huang-Wei; Behrooz, Ali; Brown, Trissia; Auvigne-Flament, Isabelle; Patel, Hardik; Findlater, Robert; Velez, Vanessa; Tan, Fraser; Tekiela, Kamilla; Wulczyn, Ellery; Yi, Eunhee S; Mermel, Craig H; Hanks, Debra; Chen, Po-Hsuan Cameron; Kulig, Kimary; Batenchuk, Cory; Steiner, David F; Cimermancic, Peter.

Sci Rep ; 11(1): 16605, 2021 08 16.

Article in English | MEDLINE | ID: mdl-34400666

ABSTRACT

Both histologic subtypes and tumor mutation burden (TMB) represent important biomarkers in lung cancer, with implications for patient prognosis and treatment decisions. Typically, TMB is evaluated by comprehensive genomic profiling but this requires use of finite tissue specimens and costly, time-consuming laboratory processes. Histologic subtype classification represents an established component of lung adenocarcinoma histopathology, but can be challenging and is associated with substantial inter-pathologist variability. Here we developed a deep learning system to both classify histologic patterns in lung adenocarcinoma and predict TMB status using de-identified Hematoxylin and Eosin (H&E) stained whole slide images. We first trained a convolutional neural network to map histologic features across whole slide images of lung cancer resection specimens. On evaluation using an external data source, this model achieved patch-level area under the receiver operating characteristic curve (AUC) of 0.78-0.98 across nine histologic features. We then integrated the output of this model with clinico-demographic data to develop an interpretable model for TMB classification. The resulting end-to-end system was evaluated on 172 held out cases from TCGA, achieving an AUC of 0.71 (95% CI 0.63-0.80). The benefit of using histologic features in predicting TMB is highlighted by the significant improvement this approach offers over using the clinical features alone (AUC of 0.63 [95% CI 0.53-0.72], p = 0.002). Furthermore, we found that our histologic subtype-based approach achieved performance similar to that of a weakly supervised approach (AUC of 0.72 [95% CI 0.64-0.80]). Together these results underscore that incorporating histologic patterns in biomarker prediction for lung cancer provides informative signals, and that interpretable approaches utilizing these patterns perform comparably with less interpretable, weakly supervised approaches.

Subject(s)

Adenocarcinoma of Lung/genetics , Carcinoma, Non-Small-Cell Lung/genetics , Deep Learning , Lung Neoplasms/genetics , Mutation , Adenocarcinoma of Lung/pathology , Adult , Age Factors , Aged , Aged, 80 and over , Area Under Curve , Carcinoma, Non-Small-Cell Lung/pathology , Coloring Agents , Datasets as Topic , Eosine Yellowish-(YS) , Female , Hematoxylin , Humans , Lung Neoplasms/pathology , Male , Middle Aged , ROC Curve , Sex Factors , Smoking , Staining and Labeling
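
Several entries in this listing, including entry 11, summarize classifier performance as an area under the ROC curve (AUC). One way to read that number is as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The sketch below computes AUC that way by pairwise comparison; the function name and the example scores are hypothetical, and this is not the evaluation code of any listed study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels use 1 for positive and 0 for negative. Tied scores count as
    half a correct ranking.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for three positive and three negative cases.
auc_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc_labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(auc_scores, auc_labels)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of cases, so it is meant for small illustrations; rank-based formulas give the same value more efficiently on large datasets.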

12.

Interpretable survival prediction for colorectal cancer using deep learning.

Wulczyn, Ellery; Steiner, David F; Moran, Melissa; Plass, Markus; Reihs, Robert; Tan, Fraser; Flament-Auvigne, Isabelle; Brown, Trissia; Regitnig, Peter; Chen, Po-Hsuan Cameron; Hegde, Narayan; Sadhwani, Apaar; MacDonald, Robert; Ayalew, Benny; Corrado, Greg S; Peng, Lily H; Tse, Daniel; Müller, Heimo; Xu, Zhaoyang; Liu, Yun; Stumpe, Martin C; Zatloukal, Kurt; Mermel, Craig H.

NPJ Digit Med ; 4(1): 71, 2021 Apr 19.

Article in English | MEDLINE | ID: mdl-33875798

ABSTRACT

Deriving interpretable prognostic features from deep-learning-based prognostic histopathology models remains a challenge. In this study, we developed a deep learning system (DLS) for predicting disease-specific survival for stage II and III colorectal cancer using 3652 cases (27,300 slides). When evaluated on two validation datasets containing 1239 cases (9340 slides) and 738 cases (7140 slides), respectively, the DLS achieved a 5-year disease-specific survival AUC of 0.70 (95% CI: 0.66-0.73) and 0.69 (95% CI: 0.64-0.72), and added significant predictive value to a set of nine clinicopathologic features. To interpret the DLS, we explored the ability of different human-interpretable features to explain the variance in DLS scores. We observed that clinicopathologic features such as T-category, N-category, and grade explained a small fraction of the variance in DLS scores (R² = 18% in both validation sets). Next, we generated human-interpretable histologic features by clustering embeddings from a deep-learning-based image-similarity model and showed that they explained the majority of the variance (R² of 73-80%). Furthermore, the clustering-derived feature most strongly associated with high DLS scores was also highly prognostic in isolation. With a distinct visual appearance (poorly differentiated tumor cell clusters adjacent to adipose tissue), this feature was identified by annotators with 87.0-95.5% accuracy. Our approach can be used to explain predictions from a prognostic deep learning model and uncover potentially-novel prognostic features that can be reliably identified by people for future validation studies.

13.

Predicting prostate cancer specific-mortality with artificial intelligence-based Gleason grading.

Wulczyn, Ellery; Nagpal, Kunal; Symonds, Matthew; Moran, Melissa; Plass, Markus; Reihs, Robert; Nader, Farah; Tan, Fraser; Cai, Yuannan; Brown, Trissia; Flament-Auvigne, Isabelle; Amin, Mahul B; Stumpe, Martin C; Müller, Heimo; Regitnig, Peter; Holzinger, Andreas; Corrado, Greg S; Peng, Lily H; Chen, Po-Hsuan Cameron; Steiner, David F; Zatloukal, Kurt; Liu, Yun; Mermel, Craig H.

Commun Med (Lond) ; 1: 10, 2021.

Article in English | MEDLINE | ID: mdl-35602201

ABSTRACT

Background: Gleason grading of prostate cancer is an important prognostic factor, but suffers from poor reproducibility, particularly among non-subspecialist pathologists. Although artificial intelligence (A.I.) tools have demonstrated Gleason grading on-par with expert pathologists, it remains an open question whether and to what extent A.I. grading translates to better prognostication. Methods: In this study, we developed a system to predict prostate cancer-specific mortality via A.I.-based Gleason grading and subsequently evaluated its ability to risk-stratify patients on an independent retrospective cohort of 2807 prostatectomy cases from a single European center with 5-25 years of follow-up (median: 13, interquartile range 9-17). Results: Here, we show that the A.I.'s risk scores produced a C-index of 0.84 (95% CI 0.80-0.87) for prostate cancer-specific mortality. Upon discretizing these risk scores into risk groups analogous to pathologist Grade Groups (GG), the A.I. has a C-index of 0.82 (95% CI 0.78-0.85). On the subset of cases with a GG provided in the original pathology report (n = 1517), the A.I.'s C-indices are 0.87 and 0.85 for continuous and discrete grading, respectively, compared to 0.79 (95% CI 0.71-0.86) for GG obtained from the reports. These represent improvements of 0.08 (95% CI 0.01-0.15) and 0.07 (95% CI 0.00-0.14), respectively. Conclusions: Our results suggest that A.I.-based Gleason grading can lead to effective risk stratification, and warrants further evaluation for improving disease management.
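
Entry 13 summarizes risk stratification with the C-index. As a hedged illustration, the sketch below computes a simplified Harrell's concordance index: among pairs where the patient with the shorter follow-up time had an observed event, it counts how often that patient also received the higher risk score. The function name and the example data are hypothetical, pairs with tied times are simply skipped, and the abstract does not say which C-index estimator the authors used.

```python
def concordance_index(times, events, risks):
    """Simplified Harrell's C-index for right-censored survival data.

    times  : follow-up time for each patient
    events : 1 if the event was observed, 0 if censored
    risks  : predicted risk score (higher means worse expected outcome)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            # Comparable pair: patient i had the event before patient j's time.
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical follow-up data for four patients (not study data).
follow_up_years = [2, 4, 6, 8]
event_observed = [1, 1, 0, 1]
risk_scores = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(follow_up_years, event_observed, risk_scores)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a frame of reference for the 0.79 to 0.87 range quoted in the abstract.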

14.

Determining breast cancer biomarker status and associated morphological features using deep learning.

Gamble, Paul; Jaroensri, Ronnachai; Wang, Hongwu; Tan, Fraser; Moran, Melissa; Brown, Trissia; Flament-Auvigne, Isabelle; Rakha, Emad A; Toss, Michael; Dabbs, David J; Regitnig, Peter; Olson, Niels; Wren, James H; Robinson, Carrie; Corrado, Greg S; Peng, Lily H; Liu, Yun; Mermel, Craig H; Steiner, David F; Chen, Po-Hsuan Cameron.

Commun Med (Lond) ; 1: 14, 2021.

Article in English | MEDLINE | ID: mdl-35602213

ABSTRACT

Background: Breast cancer management depends on biomarkers including estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2 (ER/PR/HER2). Though existing scoring systems are widely used and well-validated, they can involve costly preparation and variable interpretation. Additionally, discordances between histology and expected biomarker findings can prompt repeat testing to address biological, interpretative, or technical reasons for unexpected results. Methods: We developed three independent deep learning systems (DLS) to directly predict ER/PR/HER2 status for both focal tissue regions (patches) and slides using hematoxylin-and-eosin-stained (H&E) images as input. Models were trained and evaluated using pathologist annotated slides from three data sources. Areas under the receiver operator characteristic curve (AUCs) were calculated for test sets at both a patch-level (>135 million patches, 181 slides) and slide-level (n = 3274 slides, 1249 cases, 37 sites). Interpretability analyses were performed using Testing with Concept Activation Vectors (TCAV), saliency analysis, and pathologist review of clustered patches. Results: The patch-level AUCs are 0.939 (95%CI 0.936-0.941), 0.938 (0.936-0.940), and 0.808 (0.802-0.813) for ER/PR/HER2, respectively. At the slide level, AUCs are 0.86 (95%CI 0.84-0.87), 0.75 (0.73-0.77), and 0.60 (0.56-0.64) for ER/PR/HER2, respectively. Interpretability analyses show known biomarker-histomorphology associations including associations of low-grade and lobular histology with ER/PR positivity, and increased inflammatory infiltrates with triple-negative staining. Conclusions: This study presents rapid breast cancer biomarker estimation from routine H&E slides and builds on prior advances by prioritizing interpretability of computationally learned features in the context of existing pathological knowledge.

15.

BrainIAK: The Brain Imaging Analysis Kit.

Kumar, Manoj; Anderson, Michael J; Antony, James W; Baldassano, Christopher; Brooks, Paula P; Cai, Ming Bo; Chen, Po-Hsuan Cameron; Ellis, Cameron T; Henselman-Petrusek, Gregory; Huberdeau, David; Hutchinson, J Benjamin; Li, Y Peeta; Lu, Qihong; Manning, Jeremy R; Mennen, Anne C; Nastase, Samuel A; Richard, Hugo; Schapiro, Anna C; Schuck, Nicolas W; Shvartsman, Michael; Sundaram, Narayanan; Suo, Daniel; Turek, Javier S; Turner, David; Vo, Vy A; Wallace, Grant; Wang, Yida; Williams, Jamal A; Zhang, Hejia; Zhu, Xia; Capota, Mihai; Cohen, Jonathan D; Hasson, Uri; Li, Kai; Ramadge, Peter J; Turk-Browne, Nicholas B; Willke, Theodore L; Norman, Kenneth A.

Apert Neuro ; 1(4)2021.

Article in English | MEDLINE | ID: mdl-35939268

ABSTRACT

Functional magnetic resonance imaging (fMRI) offers a rich source of data for studying the neural basis of cognition. Here, we describe the Brain Imaging Analysis Kit (BrainIAK), an open-source, free Python package that provides computationally optimized solutions to key problems in advanced fMRI analysis. A variety of techniques are presently included in BrainIAK: intersubject correlation (ISC) and intersubject functional connectivity (ISFC), functional alignment via the shared response model (SRM), full correlation matrix analysis (FCMA), a Bayesian version of representational similarity analysis (BRSA), event segmentation using hidden Markov models, topographic factor analysis (TFA), inverted encoding models (IEMs), an fMRI data simulator that uses noise characteristics from real data (fmrisim), and some emerging methods. These techniques have been optimized to leverage the efficiencies of high-performance compute (HPC) clusters, and the same code can be seamlessly transferred from a laptop to a cluster. For each of the aforementioned techniques, we describe the data analysis problem that the technique is meant to solve and how it solves that problem; we also include an example Jupyter notebook for each technique and an annotated bibliography of papers that have used and/or described that technique. In addition to the sections describing various analysis techniques in BrainIAK, we have included sections describing the future applications of BrainIAK to real-time fMRI, tutorials that we have developed and shared online to facilitate learning the techniques in BrainIAK, computational innovations in BrainIAK, and how to contribute to BrainIAK. We hope that this manuscript helps readers to understand how BrainIAK might be useful in their research.

16.

Closing the translation gap: AI applications in digital pathology.

Steiner, David F; Chen, Po-Hsuan Cameron; Mermel, Craig H.

Biochim Biophys Acta Rev Cancer ; 1875(1): 188452, 2021 01.

Article in English | MEDLINE | ID: mdl-33065195

ABSTRACT

Recent advances in artificial intelligence show tremendous promise to improve the accuracy, reproducibility, and availability of medical diagnostics across a number of medical subspecialities. This is especially true in the field of digital pathology, which has recently witnessed a surge in publications describing state-of-the-art performance for machine learning models across a wide range of diagnostic applications. Nonetheless, despite this promise, there remain significant gaps in translating applications for any of these technologies into actual clinical practice. In this review, we will first give a brief overview of the recent progress in applying AI to digitized pathology images, focusing on how these tools might be applied in clinical workflows in the near term to improve the accuracy and efficiency of pathologists. Then we define and describe in detail the various factors that need to be addressed in order to successfully close the "translation gap" for AI applications in digital pathology.

Subject(s)

Artificial Intelligence/trends , Diagnosis , Diagnostic Techniques and Procedures/trends , Machine Learning/trends , Humans

17.

Current and future applications of artificial intelligence in pathology: a clinical perspective.

Rakha, Emad A; Toss, Michael; Shiino, Sho; Gamble, Paul; Jaroensri, Ronnachai; Mermel, Craig H; Chen, Po-Hsuan Cameron.

J Clin Pathol ; 74(7): 409-414, 2021 Jul.

Article in English | MEDLINE | ID: mdl-32763920

ABSTRACT

During the last decade, a dramatic rise in the development and application of artificial intelligence (AI) tools for use in pathology services has occurred. This trend is often expected to continue and reshape the field of pathology in the coming years. The deployment of computational pathology and applications of AI tools can be considered as a paradigm shift that will change pathology services, making them more efficient and capable of meeting the needs of this era of precision medicine. Despite the success of AI models, the translational process from discovery to clinical applications has been slow. The gap between self-contained research and clinical environment may be too wide and has been largely neglected. In this review, we cover the current and prospective applications of AI in pathology. We examine its applications in diagnosis and prognosis, and we offer insights for considerations that could improve clinical applicability of these tools. Then, we discuss its potential to improve workflow efficiency, and its benefits in pathologist education. Finally, we review the factors that could influence adoption in clinical practices and the associated regulatory processes.

Subject(s)

Artificial Intelligence , Pathology , Artificial Intelligence/trends , Humans , Pathology/methods , Pathology/trends

18.

Evaluation of the Use of Combined Artificial Intelligence and Pathologist Assessment to Review and Grade Prostate Biopsies.

Steiner, David F; Nagpal, Kunal; Sayres, Rory; Foote, Davis J; Wedin, Benjamin D; Pearce, Adam; Cai, Carrie J; Winter, Samantha R; Symonds, Matthew; Yatziv, Liron; Kapishnikov, Andrei; Brown, Trissia; Flament-Auvigne, Isabelle; Tan, Fraser; Stumpe, Martin C; Jiang, Pan-Pan; Liu, Yun; Chen, Po-Hsuan Cameron; Corrado, Greg S; Terry, Michael; Mermel, Craig H.

JAMA Netw Open ; 3(11): e2023267, 2020 11 02.

Article in English | MEDLINE | ID: mdl-33180129

ABSTRACT

Importance: Expert-level artificial intelligence (AI) algorithms for prostate biopsy grading have recently been developed. However, the potential impact of integrating such algorithms into pathologist workflows remains largely unexplored. Objective: To evaluate an expert-level AI-based assistive tool when used by pathologists for the grading of prostate biopsies. Design, Setting, and Participants: This diagnostic study used a fully crossed multiple-reader, multiple-case design to evaluate an AI-based assistive tool for prostate biopsy grading. Retrospective grading of prostate core needle biopsies from 2 independent medical laboratories in the US was performed between October 2019 and January 2020. A total of 20 general pathologists reviewed 240 prostate core needle biopsies from 240 patients. Each pathologist was randomized to 1 of 2 study cohorts. The 2 cohorts reviewed every case in the opposite modality (with AI assistance vs without AI assistance) to each other, with the modality switching after every 10 cases. After a minimum 4-week washout period for each batch, the pathologists reviewed the cases for a second time using the opposite modality. The pathologist-provided grade group for each biopsy was compared with the majority opinion of urologic pathology subspecialists. Exposure: An AI-based assistive tool for Gleason grading of prostate biopsies. Main Outcomes and Measures: Agreement between pathologists and subspecialists with and without the use of an AI-based assistive tool for the grading of all prostate biopsies and Gleason grade group 1 biopsies. Results: Biopsies from 240 patients (median age, 67 years; range, 39-91 years) with a median prostate-specific antigen level of 6.5 ng/mL (range, 0.6-97.0 ng/mL) were included in the analyses. 
Artificial intelligence-assisted review by pathologists was associated with a 5.6% increase (95% CI, 3.2%-7.9%; P < .001) in agreement with subspecialists (from 69.7% for unassisted reviews to 75.3% for assisted reviews) across all biopsies and a 6.2% increase (95% CI, 2.7%-9.8%; P = .001) in agreement with subspecialists (from 72.3% for unassisted reviews to 78.5% for assisted reviews) for grade group 1 biopsies. A secondary analysis indicated that AI assistance was also associated with improvements in tumor detection, mean review time, mean self-reported confidence, and interpathologist agreement. Conclusions and Relevance: In this study, the use of an AI-based assistive tool for the review of prostate biopsies was associated with improvements in the quality, efficiency, and consistency of cancer detection and grading.

Subject(s)

Artificial Intelligence/standards , Pathology, Clinical/standards , Prostatic Neoplasms/diagnosis , Adult , Aged , Aged, 80 and over , Biopsy, Large-Core Needle/statistics & numerical data , Humans , Male , Middle Aged , Neoplasm Grading , Prostatic Neoplasms/pathology , Retrospective Studies

19.

Vulnerability of Antarctica's ice shelves to meltwater-driven fracture.

Lai, Ching-Yao; Kingslake, Jonathan; Wearing, Martin G; Chen, Po-Hsuan Cameron; Gentine, Pierre; Li, Harold; Spergel, Julian J; van Wessem, J Melchior.

Nature ; 584(7822): 574-578, 2020 08.

Article in English | MEDLINE | ID: mdl-32848224

ABSTRACT

Atmospheric warming threatens to accelerate the retreat of the Antarctic Ice Sheet by increasing surface melting and facilitating 'hydrofracturing' [1-7], where meltwater flows into and enlarges fractures, potentially triggering ice-shelf collapse [3-5,8-10]. The collapse of ice shelves that buttress [11-13] the ice sheet accelerates ice flow and sea-level rise [14-16]. However, we do not know if and how much of the buttressing regions of Antarctica's ice shelves are vulnerable to hydrofracture if inundated with water. Here we provide two lines of evidence suggesting that many buttressing regions are vulnerable. First, we trained a deep convolutional neural network (DCNN) to map the surface expressions of fractures in satellite imagery across all Antarctic ice shelves. Second, we developed a stability diagram of fractures based on linear elastic fracture mechanics to predict where basal and dry surface fractures form under current stress conditions. We find close agreement between the theoretical prediction and the DCNN-mapped fractures, despite limitations associated with detecting fractures in satellite imagery. Finally, we used linear elastic fracture mechanics theory to predict where surface fractures would become unstable if filled with water. Many regions regularly inundated with meltwater today are resilient to hydrofracture: stresses are low enough that all water-filled fractures are stable. Conversely, 60 ± 10 per cent of ice shelves (by area) both buttress upstream ice and are vulnerable to hydrofracture if inundated with water. The DCNN map confirms the presence of fractures in these buttressing regions. Increased surface melting [17] could trigger hydrofracturing if it leads to water inundating the widespread vulnerable regions we identify. These regions are where atmospheric warming may have the largest impact on ice-sheet mass balance.

20.

Development and Validation of a Deep Learning Algorithm for Gleason Grading of Prostate Cancer From Biopsy Specimens.

Nagpal, Kunal; Foote, Davis; Tan, Fraser; Liu, Yun; Chen, Po-Hsuan Cameron; Steiner, David F; Manoj, Naren; Olson, Niels; Smith, Jenny L; Mohtashamian, Arash; Peterson, Brandon; Amin, Mahul B; Evans, Andrew J; Sweet, Joan W; Cheung, Carol; van der Kwast, Theodorus; Sangoi, Ankur R; Zhou, Ming; Allan, Robert; Humphrey, Peter A; Hipp, Jason D; Gadepalli, Krishna; Corrado, Greg S; Peng, Lily H; Stumpe, Martin C; Mermel, Craig H.

JAMA Oncol ; 6(9): 1372-1380, 2020 09 01.

Article in English | MEDLINE | ID: mdl-32701148

ABSTRACT

Importance: For prostate cancer, Gleason grading of the biopsy specimen plays a pivotal role in determining case management. However, Gleason grading is associated with substantial interobserver variability, resulting in a need for decision support tools to improve the reproducibility of Gleason grading in routine clinical practice. Objective: To evaluate the ability of a deep learning system (DLS) to grade diagnostic prostate biopsy specimens. Design, Setting, and Participants: The DLS was evaluated using 752 deidentified digitized images of formalin-fixed paraffin-embedded prostate needle core biopsy specimens obtained from 3 institutions in the United States, including 1 institution not used for DLS development. To obtain the Gleason grade group (GG), each specimen was first reviewed by 2 expert urologic subspecialists from a multi-institutional panel of 6 individuals (years of experience: mean, 25 years; range, 18-34 years). A third subspecialist reviewed discordant cases to arrive at a majority opinion. To reduce diagnostic uncertainty, all subspecialists had access to an immunohistochemical-stained section and 3 histologic sections for every biopsied specimen. Their review was conducted from December 2018 to June 2019. Main Outcomes and Measures: The frequency of the exact agreement of the DLS with the majority opinion of the subspecialists in categorizing each tumor-containing specimen as 1 of 5 categories: nontumor, GG1, GG2, GG3, or GG4-5. For comparison, the rate of agreement of 19 general pathologists' opinions with the subspecialists' majority opinions was also evaluated. Results: For grading tumor-containing biopsy specimens in the validation set (n = 498), the rate of agreement with subspecialists was significantly higher for the DLS (71.7%; 95% CI, 67.9%-75.3%) than for general pathologists (58.0%; 95% CI, 54.5%-61.4%) (P < .001). 
In subanalyses of biopsy specimens from an external validation set (n = 322), the Gleason grading performance of the DLS remained similar. For distinguishing nontumor from tumor-containing biopsy specimens (n = 752), the rate of agreement with subspecialists was 94.3% (95% CI, 92.4%-95.9%) for the DLS and similar at 94.7% (95% CI, 92.8%-96.3%) for general pathologists (P = .58). Conclusions and Relevance: In this study, the DLS showed higher proficiency than general pathologists at Gleason grading prostate needle core biopsy specimens and generalized to an independent institution. Future research is necessary to evaluate the potential utility of using the DLS as a decision support tool in clinical workflows and to improve the quality of prostate cancer grading for therapy decisions.

Subject(s)

Image Interpretation, Computer-Assisted , Neoplasm Grading/standards , Prostatic Neoplasms/diagnosis , Adolescent , Adult , Algorithms , Artificial Intelligence , Biopsy, Large-Core Needle/methods , Deep Learning , Humans , Male , Prostatic Neoplasms/epidemiology , Prostatic Neoplasms/pathology , Specimen Handling , United States/epidemiology , Young Adult
