Results 1 - 20 of 64
1.
J Med Imaging (Bellingham) ; 11(2): 024504, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38576536

ABSTRACT

Purpose: The Medical Imaging and Data Resource Center (MIDRC) was created to facilitate medical imaging machine learning (ML) research for tasks including early detection, diagnosis, prognosis, and assessment of treatment response related to the coronavirus disease 2019 pandemic and beyond. The purpose of this work was to create a publicly available metrology resource to assist researchers in evaluating the performance of their medical image analysis ML algorithms. Approach: An interactive decision tree, called MIDRC-MetricTree, has been developed, organized by the type of task that the ML algorithm was trained to perform. The criteria for this decision tree were that (1) users can select information such as the type of task, the nature of the reference standard, and the type of the algorithm output and (2) based on the user input, recommendations are provided regarding appropriate performance evaluation approaches and metrics, including literature references and, when possible, links to publicly available software/code as well as short tutorial videos. Results: Five types of tasks were identified for the decision tree: (a) classification, (b) detection/localization, (c) segmentation, (d) time-to-event (TTE) analysis, and (e) estimation. As an example, the classification branch of the decision tree includes two-class (binary) and multiclass classification tasks and provides suggestions for methods, metrics, software/code recommendations, and literature references for situations where the algorithm produces either binary or non-binary (e.g., continuous) output and for reference standards with negligible or non-negligible variability and unreliability. Conclusions: The publicly available decision tree is a resource to assist researchers in conducting task-specific performance evaluations, including classification, detection/localization, segmentation, TTE, and estimation tasks.

2.
BJR Artif Intell ; 1(1): ubae003, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38476957

ABSTRACT

The adoption of artificial intelligence (AI) tools in medicine poses challenges to existing clinical workflows. This commentary discusses the necessity of context-specific quality assurance (QA), emphasizing the need for robust QA measures with quality control (QC) procedures that encompass (1) acceptance testing (AT) before clinical use, (2) continuous QC monitoring, and (3) adequate user training. The discussion also covers essential components of AT and QA, illustrated with real-world examples. We also highlight what we see as the shared responsibility of manufacturers or vendors, regulators, healthcare systems, medical physicists, and clinicians to enact appropriate testing and oversight to ensure a safe and equitable transformation of medicine through AI.

3.
Med Phys ; 51(3): 1812-1821, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37602841

ABSTRACT

BACKGROUND: Artificial intelligence/computer-aided diagnosis (AI/CADx) and its use of radiomics have shown potential in the diagnosis and prognosis of breast cancer. Performance metrics such as the area under the receiver operating characteristic (ROC) curve (AUC) are frequently used as figures of merit for the evaluation of CADx. Methods for evaluating lesion-based measures of performance may enhance the assessment of AI/CADx pipelines, particularly when comparing performance between classifiers. PURPOSE: The purpose of this study was to investigate the use case of two standard classifiers to (1) compare overall classification performance of the classifiers in the task of distinguishing between benign and malignant breast lesions using radiomic features extracted from dynamic contrast-enhanced magnetic resonance (DCE-MR) images, (2) define a new repeatability metric (termed sureness), and (3) use sureness to examine if one classifier provides an advantage in AI diagnostic performance by lesion when using radiomic features. METHODS: Images of 1052 breast lesions (201 benign, 851 cancers) had been retrospectively collected under HIPAA/IRB compliance. The lesions had been segmented automatically using a fuzzy c-means method, and thirty-two radiomic features had been extracted. Classification was investigated for the task of distinguishing malignant lesions (81% of the dataset) from benign lesions (19%). Two classifiers (linear discriminant analysis, LDA, and support vector machine, SVM) were trained and tested within 0.632 bootstrap analyses (2000 iterations). Whole-set classification performance was evaluated at two levels: (1) the 0.632+ bias-corrected AUC and (2) performance metric curves, which give variability in operating sensitivity and specificity at a target operating point (here, 95% target sensitivity). Sureness was defined as one minus the width of the 95% confidence interval of the classifier output for each lesion for each classifier.
Lesion-based repeatability was evaluated at two levels: (1) repeatability profiles, which represent the distribution of sureness across the decision threshold and (2) sureness of each lesion. The latter was used to identify lesions with better sureness with one classifier over another while maintaining lesion-based performance across the bootstrap iterations. RESULTS: In classification performance assessment, the median and 95% CI of difference in AUC between the two classifiers did not show evidence of difference (ΔAUC = -0.003 [-0.031, 0.018]). Both classifiers achieved the target sensitivity. Sureness was more consistent across the classifier output range for the SVM classifier than the LDA classifier. The SVM resulted in a net gain of 33 benign lesions and 307 cancers with higher sureness and maintained lesion-based performance. However, with the LDA there was a notable percentage of benign lesions (42%) with better sureness but lower lesion-based performance. CONCLUSIONS: When there is no evidence for difference in performance between classifiers using AUC or other performance summary measures, a lesion-based sureness metric may provide additional insight into AI pipeline design. These findings present and emphasize the utility of lesion-based repeatability via sureness in AI/CADx as a complementary enhancement to other evaluation measures.
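A minimal sketch of a sureness-style computation as described in the abstract above, assuming sureness is one minus the width of the empirical 95% confidence interval of a lesion's classifier output across bootstrap iterations, with outputs scaled to [0, 1]; the variable names and simulated outputs are illustrative, not the authors' code:

```python
import numpy as np

def sureness(bootstrap_outputs, alpha=0.05):
    """Sureness of one lesion: 1 minus the width of the (1 - alpha)
    empirical confidence interval of its classifier output across
    bootstrap iterations. With outputs scaled to [0, 1], sureness also
    lies in [0, 1]; higher means a more repeatable decision."""
    outputs = np.asarray(bootstrap_outputs, dtype=float)
    lo, hi = np.percentile(outputs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return 1.0 - (hi - lo)

# A lesion whose score is stable across 2000 bootstrap iterations has
# higher sureness than one whose score fluctuates from run to run.
rng = np.random.default_rng(0)
stable_lesion = np.clip(rng.normal(0.80, 0.01, 2000), 0, 1)
noisy_lesion = np.clip(rng.normal(0.50, 0.15, 2000), 0, 1)
```

A per-lesion value like this can then be compared between two classifiers to count lesions with higher sureness under one model, as the study does.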


Assuntos
Inteligência Artificial , Neoplasias da Mama , Humanos , Feminino , Estudos Retrospectivos , Imageamento por Ressonância Magnética/métodos , Neoplasias da Mama/patologia , Aprendizado de Máquina
5.
J Med Imaging (Bellingham) ; 10(6): 064501, 2023 Nov.
Article in English | MEDLINE | ID: mdl-38074627

ABSTRACT

Purpose: The Medical Imaging and Data Resource Center (MIDRC) is a multi-institutional effort to accelerate medical imaging machine intelligence research and create a publicly available image repository/commons as well as a sequestered commons for performance evaluation and benchmarking of algorithms. After de-identification, approximately 80% of the medical images and associated metadata become part of the open commons and 20% are sequestered from the open commons. To ensure that both commons are representative of the population available, we introduced a stratified sampling method to balance the demographic characteristics across the two datasets. Approach: Our method uses multi-dimensional stratified sampling where several demographic variables of interest are sequentially used to separate the data into individual strata, each representing a unique combination of variables. Within each resulting stratum, patients are assigned to the open or sequestered commons. This algorithm was used on an example dataset containing 5000 patients using the variables of race, age, sex at birth, ethnicity, COVID-19 status, and image modality and compared resulting demographic distributions to naïve random sampling of the dataset over 2000 independent trials. Results: Resulting prevalence of each demographic variable matched the prevalence from the input dataset within one standard deviation. Mann-Whitney U test results supported the hypothesis that sequestration by stratified sampling provided more balanced subsets than naïve randomization, except for demographic subcategories with very low prevalence. Conclusions: The developed multi-dimensional stratified sampling algorithm can partition a large dataset while maintaining balance across several variables, superior to the balance achieved from naïve randomization.
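The stratification step described above can be sketched as follows; this is a simplified illustration with assumed record fields ("sex", "age") and a toy cohort, not the MIDRC implementation:

```python
import random
from collections import defaultdict

def stratified_split(patients, keys, open_fraction=0.8, seed=0):
    """Partition patient records into open and sequestered sets while
    balancing the joint distribution of the given demographic keys.
    Patients sharing the same combination of key values form a stratum;
    each stratum is shuffled and split at open_fraction."""
    strata = defaultdict(list)
    for p in patients:
        strata[tuple(p[k] for k in keys)].append(p)
    rng = random.Random(seed)
    open_set, sequestered = [], []
    for members in strata.values():
        rng.shuffle(members)
        cut = round(len(members) * open_fraction)
        open_set.extend(members[:cut])
        sequestered.extend(members[cut:])
    return open_set, sequestered

# Hypothetical toy cohort: the split preserves each stratum's share
# exactly, where a naive 80/20 shuffle only does so in expectation.
cohort = [{"sex": "F" if i % 4 else "M", "age": "50+" if i % 2 else "<50"}
          for i in range(1000)]
open_set, seq = stratified_split(cohort, keys=["sex", "age"])
```

Because the split is applied within each stratum, the prevalence of every key combination in both subsets matches the input dataset up to rounding, which is the balance property the study verifies against naive randomization.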

6.
J Med Imaging (Bellingham) ; 10(6): 064502, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37990686

ABSTRACT

Purpose: Given the dependence of radiomic-based computer-aided diagnosis artificial intelligence on accurate lesion segmentation, we assessed the performances of 2D and 3D U-Nets in breast lesion segmentation on dynamic contrast-enhanced (DCE) magnetic resonance imaging (MRI) relative to fuzzy c-means (FCM) and radiologist segmentations. Approach: Using 994 unique breast lesions imaged with DCE-MRI, three segmentation algorithms (FCM clustering, 2D and 3D U-Net convolutional neural networks) were investigated. Center slice segmentations produced by FCM, 2D U-Net, and 3D U-Net were evaluated using radiologist segmentations as truth, and volumetric segmentations produced by 2D U-Net slices and 3D U-Net were compared using FCM as a surrogate reference standard. Fivefold cross-validation by lesion was conducted on the U-Nets; Dice similarity coefficient (DSC) and Hausdorff distance (HD) served as performance metrics. Segmentation performances were compared across different input image and lesion types. Results: 2D U-Net outperformed 3D U-Net for center slice (DSC, HD p<0.001) and volume segmentations (DSC, HD p<0.001). 2D U-Net outperformed FCM in center slice segmentation (DSC p<0.001). The use of second postcontrast subtraction images showed greater performance than first postcontrast subtraction images using the 2D and 3D U-Net (DSC p<0.05). Additionally, mass segmentation outperformed nonmass segmentation from first and second postcontrast subtraction images using 2D and 3D U-Nets (DSC, HD p<0.001). Conclusions: Results suggest that 2D U-Net is promising in segmenting mass and nonmass enhancing breast lesions from first and second postcontrast subtraction MRIs and thus could be an effective alternative to FCM or 3D U-Net.
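For reference, the Dice similarity coefficient used as a performance metric above can be computed on binary masks as follows (a generic sketch, not the authors' code; the Hausdorff distance is omitted):

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks:
    2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical).
    Defined here as 1.0 when both masks are empty."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Two 4 x 4 masks of four pixels each, overlapping on two pixels,
# give DSC = 2 * 2 / (4 + 4) = 0.5.
a = np.zeros((4, 4), dtype=bool); a[:2, :2] = True
b = np.zeros((4, 4), dtype=bool); b[:2, 1:3] = True
```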

8.
J Med Imaging (Bellingham) ; 10(5): 051801, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37915406

ABSTRACT

The editorial introduces the JMI Special Section on Artificial Intelligence for Medical Imaging in Clinical Practice.

9.
Br J Radiol ; 96(1150): 20221152, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37698542

ABSTRACT

Artificial intelligence (AI), in one form or another, has been a part of medical imaging for decades. The recent evolution of AI into approaches such as deep learning has dramatically accelerated the application of AI across a wide range of radiologic settings. Despite the promises of AI, developers and users of AI technology must be fully aware of its potential biases and pitfalls, and this knowledge must be incorporated throughout the AI system development pipeline that involves training, validation, and testing. Grand challenges offer an opportunity to advance the development of AI methods for targeted applications and provide a mechanism for both directing and facilitating the development of AI systems. In the process, a grand challenge centralizes (with the challenge organizers) the burden of providing a valid benchmark test set to assess performance and generalizability of participants' models and the collection and curation of image metadata, clinical/demographic information, and the required reference standard. The most relevant grand challenges are those designed to maximize the open-science nature of the competition, with code and trained models deposited for future public access. The ultimate goal of AI grand challenges is to foster the translation of AI systems from competition to research benefit and patient care. Rather than reference the many medical imaging grand challenges that have been organized by groups such as MICCAI, RSNA, AAPM, and grand-challenge.org, this review assesses the role of grand challenges in promoting AI technologies for research advancement and for eventual clinical implementation, including their promises and limitations.


Subjects
Artificial Intelligence; Radiology; Humans; Radiography; Diagnostic Imaging; Patient Care
10.
J Med Imaging (Bellingham) ; 10(4): 044504, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37608852

ABSTRACT

Purpose: Image-based prediction of coronavirus disease 2019 (COVID-19) severity and resource needs can be an important means to address the COVID-19 pandemic. In this study, we propose an artificial intelligence/machine learning (AI/ML) COVID-19 prognosis method to predict patients' needs for intensive care by analyzing chest X-ray radiography (CXR) images using deep learning. Approach: The dataset consisted of 8357 CXR exams from 5046 COVID-19-positive patients as confirmed by reverse transcription polymerase chain reaction (RT-PCR) tests for the SARS-CoV-2 virus, with a training/validation/test split of 64%/16%/20% at the patient level. Our model involved a DenseNet121 network with a sequential transfer learning technique employed to train on a sequence of gradually more specific and complex tasks: (1) fine-tuning a model pretrained on ImageNet using a previously established CXR dataset with a broad spectrum of pathologies; (2) refining on another established dataset to detect pneumonia; and (3) fine-tuning using our in-house training/validation datasets to predict patients' needs for intensive care within 24, 48, 72, and 96 h following the CXR exams. The classification performances were evaluated on our independent test set (CXR exams of 1048 patients) using the area under the receiver operating characteristic curve (AUC) as the figure of merit in the task of distinguishing between those COVID-19-positive patients who required intensive care following the imaging exam and those who did not. Results: Our proposed AI/ML model achieved an AUC (95% confidence interval) of 0.78 (0.74, 0.81) when predicting the need for intensive care 24 h in advance, and at least 0.76 (0.73, 0.80) for 48 h or more in advance using predictions based on the AI prognostic marker derived from CXR images. Conclusions: This AI/ML prediction model for patients' needs for intensive care has the potential to support both clinical decision-making and resource management.

11.
J Med Imaging (Bellingham) ; 10(6): 061105, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37469387

ABSTRACT

Purpose: The Medical Imaging and Data Resource Center (MIDRC) open data commons was launched to accelerate the development of artificial intelligence (AI) algorithms to help address the COVID-19 pandemic. The purpose of this study was to quantify the longitudinal representativeness of the demographic characteristics of the primary MIDRC dataset compared to the United States general population (US Census) and COVID-19 positive case counts from the Centers for Disease Control and Prevention (CDC). Approach: The Jensen-Shannon distance (JSD), a measure of the similarity of two distributions, was used to longitudinally measure the representativeness of the distribution of (1) all unique patients in the MIDRC data to the 2020 US Census and (2) all unique COVID-19 positive patients in the MIDRC data to the case counts reported by the CDC. The distributions were evaluated in the demographic categories of age at index, sex, race, ethnicity, and the combination of race and ethnicity. Results: Representativeness of the MIDRC data by ethnicity and the combination of race and ethnicity was impacted by the percentage of CDC case counts for which these were not reported. The distributions by sex and race have retained their level of representativeness over time. Conclusion: The representativeness of the open medical imaging datasets in the curated public data commons at MIDRC has evolved over time as the number of contributing institutions and the overall number of subjects have grown. The use of metrics such as the JSD to support the measurement of representativeness is one step needed for fair and generalizable AI algorithm development.
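The similarity measure used above can be computed directly; a minimal sketch of the Jensen-Shannon distance for discrete demographic distributions (the example proportions are hypothetical, not MIDRC or census data):

```python
import numpy as np

def jensen_shannon_distance(p, q, base=2):
    """Jensen-Shannon distance between two discrete distributions:
    the square root of the JS divergence. With log base 2 it is bounded
    in [0, 1], where 0 means identical distributions."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    q = np.asarray(q, dtype=float)
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):  # Kullback-Leibler divergence, skipping empty bins
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask])) / np.log(base)

    return np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))

# Hypothetical dataset-vs-reference distributions over four categories.
dataset = [0.60, 0.20, 0.12, 0.08]
reference = [0.60, 0.13, 0.18, 0.09]
```

Computed at successive data-release dates, a value like this traces how representativeness evolves over time, which is the longitudinal analysis the study performs.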

12.
J Med Imaging (Bellingham) ; 10(6): 061104, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37125409

ABSTRACT

Purpose: There is increasing interest in developing medical imaging-based machine learning methods, also known as medical imaging artificial intelligence (AI), for the detection, diagnosis, prognosis, and risk assessment of disease, with the goal of clinical implementation. These tools are intended to improve upon traditional human decision-making in medical imaging. However, biases introduced in the steps toward clinical deployment may impede their intended function, potentially exacerbating inequities: medical imaging AI can propagate or amplify biases introduced in the many steps from model inception to deployment, resulting in systematic differences in the treatment of different groups. Recognizing and addressing these sources of bias is essential for algorithmic fairness and trustworthiness and for a just and equitable deployment of AI in medical imaging. Approach: Our multi-institutional team included medical physicists, medical imaging artificial intelligence/machine learning (AI/ML) researchers, experts in AI/ML bias, statisticians, physicians, and scientists from regulatory bodies. We identified sources of bias in AI/ML and mitigation strategies for these biases, and we developed recommendations for best practices in medical imaging AI/ML development. Results: Five main steps along the roadmap of medical imaging AI/ML were identified: (1) data collection, (2) data preparation and annotation, (3) model development, (4) model evaluation, and (5) model deployment. Within these steps, or bias categories, we identified 29 sources of potential bias, many of which can impact multiple steps, as well as mitigation strategies. Conclusions: Our findings provide a valuable resource to researchers, clinicians, and the public at large.

13.
JAMA Netw Open ; 6(2): e230524, 2023 Feb 1.
Article in English | MEDLINE | ID: mdl-36821110

ABSTRACT

Importance: An accurate and robust artificial intelligence (AI) algorithm for detecting cancer in digital breast tomosynthesis (DBT) could significantly improve detection accuracy and reduce health care costs worldwide. Objectives: To make training and evaluation data for the development of AI algorithms for DBT analysis available, to develop well-defined benchmarks, and to create publicly available code for existing methods. Design, Setting, and Participants: This diagnostic study is based on a multi-institutional international grand challenge in which research teams developed algorithms to detect lesions in DBT. A data set of 22 032 reconstructed DBT volumes was made available to research teams. Phase 1, in which teams were provided 700 scans from the training set, 120 from the validation set, and 180 from the test set, took place from December 2020 to January 2021, and phase 2, in which teams were given the full data set, took place from May to July 2021. Main Outcomes and Measures: The overall performance was evaluated by mean sensitivity for biopsied lesions using only DBT volumes with biopsied lesions; ties were broken by including all DBT volumes. Results: A total of 8 teams participated in the challenge. The team with the highest mean sensitivity for biopsied lesions was the NYU B-Team, with 0.957 (95% CI, 0.924-0.984), and the second-place team, ZeDuS, had a mean sensitivity of 0.926 (95% CI, 0.881-0.964). When the results were aggregated, the mean sensitivity for all submitted algorithms was 0.879; for only those who participated in phase 2, it was 0.926. Conclusions and Relevance: In this diagnostic study, an international competition produced algorithms with high sensitivity for using AI to detect lesions on DBT images. 
A standardized performance benchmark for the detection task using publicly available clinical imaging data was released, with detailed descriptions and analyses of submitted algorithms accompanied by a public release of their predictions and code for selected methods. These resources will serve as a foundation for future research on computer-assisted diagnosis methods for DBT, significantly lowering the barrier of entry for new researchers.


Subjects
Artificial Intelligence; Breast Neoplasms; Humans; Female; Benchmarking; Mammography/methods; Algorithms; Radiographic Image Interpretation, Computer-Assisted/methods; Breast Neoplasms/diagnostic imaging
14.
Med Phys ; 50(2): e1-e24, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36565447

ABSTRACT

Rapid advances in artificial intelligence (AI) and machine learning, and specifically in deep learning (DL) techniques, have enabled broad application of these methods in health care. The promise of the DL approach has spurred further interest in computer-aided diagnosis (CAD) development and applications using both "traditional" machine learning methods and newer DL-based methods. We use the term CAD-AI to refer to this expanded clinical decision support environment that uses traditional and DL-based AI methods. Numerous studies have been published to date on the development of machine learning tools for computer-aided, or AI-assisted, clinical tasks. However, most of these machine learning models are not ready for clinical deployment. It is of paramount importance to ensure that a clinical decision support tool undergoes proper training and rigorous validation of its generalizability and robustness before adoption for patient care in the clinic. To address these important issues, the American Association of Physicists in Medicine (AAPM) Computer-Aided Image Analysis Subcommittee (CADSC) is charged, in part, to develop recommendations on practices and standards for the development and performance assessment of computer-aided decision support systems. The committee has previously published two opinion papers on the evaluation of CAD systems and issues associated with user training and quality assurance of these systems in the clinic. With machine learning techniques continuing to evolve and CAD applications expanding to new stages of the patient care process, the current task group report considers the broader issues common to the development of most, if not all, CAD-AI applications and their translation from the bench to the clinic. The goal is to bring attention to the proper training and validation of machine learning algorithms that may improve their generalizability and reliability and accelerate the adoption of CAD-AI systems for clinical decision support.


Subjects
Artificial Intelligence; Diagnosis, Computer-Assisted; Humans; Reproducibility of Results; Diagnosis, Computer-Assisted/methods; Diagnostic Imaging; Machine Learning
15.
J Med Imaging (Bellingham) ; 9(3): 035502, 2022 May.
Article in English | MEDLINE | ID: mdl-35656541

ABSTRACT

Purpose: The aim of this study is to (1) demonstrate a graphical method and interpretation framework to extend performance evaluation beyond receiver operating characteristic curve analysis and (2) assess the impact of disease prevalence and variability in training and testing sets, particularly when a specific operating point is used. Approach: The proposed performance metric curves (PMCs) simultaneously assess sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), and the 95% confidence intervals thereof, as a function of the threshold for the decision variable. We investigated the utility of PMCs using six example operating points associated with commonly used methods to select operating points (including the Youden index and maximum mutual information). As an example, we applied PMCs to the task of distinguishing between malignant and benign breast lesions using human-engineered radiomic features extracted from dynamic contrast-enhanced magnetic resonance images. The dataset had 1885 lesions, with the images acquired in 2015 and 2016 serving as the training set (1450 lesions) and those acquired in 2017 as the test set (435 lesions). Our study used this dataset in two ways: (1) the clinical dataset itself and (2) simulated datasets with features based on the clinical set but with five different disease prevalences. The median and 95% CI of the number of type I (false positive) and type II (false negative) errors were determined for each operating point of interest. Results: PMCs from both the clinical and simulated datasets demonstrated that PMCs could support interpretation of the impact of decision threshold choice on type I and type II errors of classification, particularly relevant to prevalence. Conclusion: PMCs allow simultaneous evaluation of the four performance metrics of sensitivity, specificity, PPV, and NPV as a function of the decision threshold. 
This may create a better understanding of two-class classifier performance in machine learning.
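A bare-bones sketch of performance metric curves as described above, evaluating all four metrics as a function of the decision threshold (the confidence intervals reported in the study are omitted, and the variable names are illustrative):

```python
import numpy as np

def performance_metric_curves(scores, labels, thresholds):
    """Sensitivity, specificity, PPV, and NPV as a function of the
    decision threshold, with positive label 1. Returns a dict of
    arrays, one value per threshold; PPV/NPV are NaN when undefined."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    out = {"sens": [], "spec": [], "ppv": [], "npv": []}
    for t in thresholds:
        pred = scores >= t
        tp = np.sum(pred & (labels == 1))
        fn = np.sum(~pred & (labels == 1))
        tn = np.sum(~pred & (labels == 0))
        fp = np.sum(pred & (labels == 0))
        out["sens"].append(tp / (tp + fn) if tp + fn else np.nan)
        out["spec"].append(tn / (tn + fp) if tn + fp else np.nan)
        out["ppv"].append(tp / (tp + fp) if tp + fp else np.nan)
        out["npv"].append(tn / (tn + fn) if tn + fn else np.nan)
    return {k: np.array(v) for k, v in out.items()}

# Toy example: at threshold 0.5 this classifier is perfect; at 0.3
# it trades one false positive for the same sensitivity.
scores = [0.1, 0.4, 0.6, 0.9]
labels = [0, 0, 1, 1]
curves = performance_metric_curves(scores, labels, thresholds=[0.3, 0.5])
```

Because PPV and NPV depend on prevalence while sensitivity and specificity do not, plotting all four against the threshold makes the prevalence effects discussed in the abstract visible at a chosen operating point.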

16.
J Med Imaging (Bellingham) ; 8(Suppl 1): 014503, 2021 Jan.
Article in English | MEDLINE | ID: mdl-34595245

ABSTRACT

Purpose: We propose a deep learning method for the automatic diagnosis of COVID-19 at patient presentation on chest radiography (CXR) images and investigate the role of standard and soft tissue CXR in this task. Approach: The dataset consisted of the first CXR exams of 9860 patients acquired within 2 days after their initial reverse transcription polymerase chain reaction tests for the SARS-CoV-2 virus, 1523 (15.5%) of whom tested positive and 8337 (84.5%) of whom tested negative for COVID-19. A sequential transfer learning strategy was employed to fine-tune a convolutional neural network in phases on increasingly specific and complex tasks. The COVID-19 positive/negative classification was performed on standard images, soft tissue images, and both combined via feature fusion. A U-Net variant was used to segment and crop the lung region from each image prior to classification. Classification performances were evaluated and compared on a held-out test set of 1972 patients using the area under the receiver operating characteristic curve (AUC) and the DeLong test. Results: Using full standard, cropped standard, cropped soft tissue, and both types of cropped CXR yielded AUC values of 0.74 [0.70, 0.77], 0.76 [0.73, 0.79], 0.73 [0.70, 0.76], and 0.78 [0.74, 0.81], respectively. Using soft tissue images significantly underperformed standard images, and using both types of CXR failed to significantly outperform using standard images alone. Conclusions: The proposed method was able to automatically diagnose COVID-19 at patient presentation with promising performance, and the inclusion of soft tissue images did not result in a significant performance improvement.

17.
Med Phys ; 48(9): 4711-4714, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34545957

ABSTRACT

The Abstract is intended to provide a concise summary of the study and its scientific findings. For AI/ML applications in medical physics, a problem statement and rationale for utilizing these algorithms are necessary while highlighting the novelty of the approach. A brief numerical description of how the data are partitioned into subsets for training of the AI/ML algorithm, validation (including tuning of parameters), and independent testing of algorithm performance is required. This is to be followed by a summary of the results and statistical metrics that quantify the performance of the AI/ML algorithm.


Subjects
Algorithms; Artificial Intelligence; Physics
18.
J Med Imaging (Bellingham) ; 8(3): 031901, 2021 May.
Article in English | MEDLINE | ID: mdl-34179216

ABSTRACT

The editorial introduces the Special Section on Radiogenomics in Prognosis and Treatment for Volume 8 Issue 3 of the Journal of Medical Imaging.

19.
Magn Reson Imaging ; 82: 111-121, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34174331

ABSTRACT

Radiomic features extracted from breast lesion images have shown potential in the diagnosis and prognosis of breast cancer. As medical centers transition from 1.5 T to 3.0 T magnetic resonance (MR) imaging, it is beneficial to identify potentially robust radiomic features across field strengths because images acquired at different field strengths could be used in machine learning models. Dynamic contrast-enhanced MR images of benign breast lesions and hormone receptor positive/HER2-negative (HR+/HER2-) breast cancers were acquired retrospectively, yielding 612 unique cases: 150 and 99 benign lesions imaged at 1.5 T and 3.0 T, and 223 and 140 HR+/HER2- cancerous lesions imaged at 1.5 T and 3.0 T, respectively. In addition, an independent set of seven lesions imaged at both field strengths, three benign lesions and four HR+/HER2- cancers, was analyzed separately. Lesions were automatically segmented using a 4D fuzzy c-means method; thirty-eight radiomic features were extracted. Feature value distributions were compared by cancer status and imaging field strength using the Kolmogorov-Smirnov test. Features that did not demonstrate a statistically significant difference were considered to be potentially robust. The area under the receiver operating characteristic curve (AUC), for the task of classifying lesions as benign or HR+/HER2- cancer, was determined for each feature at each field strength. Three features were found to be both potentially robust across field strength and of high classification performance, i.e., AUCs statistically greater than 0.5 in the classification task: one shape feature (irregularity), one texture feature (sum average), and one enhancement variance kinetics feature (enhancement variance increasing rate). In the demonstration set of lesions imaged at both field strengths, two of the three potentially robust features showed qualitative agreement across field strength.
These findings may contribute to the development of computer-aided diagnosis models that are robust across field strength for this classification task.
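The robustness screen described above can be sketched with a two-sample Kolmogorov-Smirnov test. This sketch uses the standard asymptotic p-value approximation (inaccurate for very small samples) rather than the authors' statistical software, and the function names are illustrative:

```python
import numpy as np

def ks_2samp(x, y):
    """Two-sample Kolmogorov-Smirnov test. Returns (D, p): the maximum
    distance between the two empirical CDFs and an asymptotic
    approximation to its p-value (Numerical Recipes formulation)."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    grid = np.concatenate([x, y])
    d = np.max(np.abs(np.searchsorted(x, grid, side="right") / len(x)
                      - np.searchsorted(y, grid, side="right") / len(y)))
    if d == 0:
        return 0.0, 1.0
    en = np.sqrt(len(x) * len(y) / (len(x) + len(y)))
    lam = (en + 0.12 + 0.11 / en) * d
    terms = [(-1) ** (k - 1) * np.exp(-2 * (k * lam) ** 2)
             for k in range(1, 101)]
    return d, float(min(max(2 * np.sum(terms), 0.0), 1.0))

def potentially_robust(values_15t, values_30t, alpha=0.05):
    """Flag a feature as potentially robust across field strengths when
    the KS test finds no significant distributional difference."""
    return ks_2samp(values_15t, values_30t)[1] >= alpha
```

Applied per feature to its 1.5 T and 3.0 T value distributions, this yields the candidate-robust feature set that is then checked for classification performance via the AUC.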


Subjects
Breast Neoplasms; Magnets; Breast/diagnostic imaging; Breast Neoplasms/diagnostic imaging; Contrast Media; Female; Hormones; Humans; Magnetic Resonance Imaging; Retrospective Studies
20.
J Med Imaging (Bellingham) ; 8(3): 034501, 2021 May.
Article in English | MEDLINE | ID: mdl-33987451

ABSTRACT

Purpose: The breast pathology quantitative biomarkers (BreastPathQ) challenge was a grand challenge organized jointly by the International Society for Optics and Photonics (SPIE), the American Association of Physicists in Medicine (AAPM), the U.S. National Cancer Institute (NCI), and the U.S. Food and Drug Administration (FDA). The task of the BreastPathQ challenge was computerized estimation of tumor cellularity (TC) in breast cancer histology images following neoadjuvant treatment. Approach: A total of 39 teams developed, validated, and tested their TC estimation algorithms during the challenge. The training, validation, and testing sets consisted of 2394, 185, and 1119 image patches originating from 63, 6, and 27 scanned pathology slides from 33, 4, and 18 patients, respectively. The summary performance metric used for comparing and ranking algorithms was the average prediction probability concordance (PK) using scores from two pathologists as the TC reference standard. Results: Test PK performance ranged from 0.497 to 0.941 across the 100 submitted algorithms. The submitted algorithms generally performed well in estimating TC, with high-performing algorithms obtaining comparable results to the average interrater PK of 0.927 from the two pathologists providing the reference TC scores. Conclusions: The SPIE-AAPM-NCI BreastPathQ challenge was a success, indicating that artificial intelligence/machine learning algorithms may be able to approach human performance for cellularity assessment and may have some utility in clinical practice for improving efficiency and reducing reader variability. The BreastPathQ challenge can be accessed on the Grand Challenge website.
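The PK summary metric can be illustrated as a pairwise concordance computed over pairs whose reference cellularity scores differ. This is one common formulation of prediction probability (ties in the prediction counted as half) and may differ in detail from the challenge's exact scoring code:

```python
from itertools import combinations

def prediction_probability_pk(predictions, reference):
    """Prediction probability (PK) concordance: over all pairs whose
    reference values differ, the fraction of pairs ranked in the same
    order by the predictions, with prediction ties counted as half."""
    concordant = discordant = tied = 0
    for (p1, r1), (p2, r2) in combinations(zip(predictions, reference), 2):
        if r1 == r2:
            continue  # pairs tied on the reference are excluded
        if p1 == p2:
            tied += 1
        elif (p1 - p2) * (r1 - r2) > 0:
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant + tied
    return (concordant + 0.5 * tied) / total if total else float("nan")
```

A value of 1.0 means the algorithm's scores perfectly preserve the pathologists' ordering of tumor cellularity, 0.5 is chance-level ordering, and 0.0 is a perfectly inverted ordering.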
