Search | BVS Regional Portal

1.

AI and machine learning in medical imaging: key points from development to translation.

Samala, Ravi K; Drukker, Karen; Shukla-Dave, Amita; Chan, Heang-Ping; Sahiner, Berkman; Petrick, Nicholas; Greenspan, Hayit; Mahmood, Usman; Summers, Ronald M; Tourassi, Georgia; Deserno, Thomas M; Regge, Daniele; Näppi, Janne J; Yoshida, Hiroyuki; Huo, Zhimin; Chen, Quan; Vergara, Daniel; Cha, Kenny H; Mazurchuk, Richard; Grizzard, Kevin T; Huisman, Henkjan; Morra, Lia; Suzuki, Kenji; Armato, Samuel G; Hadjiiski, Lubomir.

BJR Artif Intell ; 1(1): ubae006, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38828430

ABSTRACT

Innovation in medical imaging artificial intelligence (AI)/machine learning (ML) demands extensive data collection, algorithmic advancements, and rigorous performance assessments encompassing aspects such as generalizability, uncertainty, bias, fairness, trustworthiness, and interpretability. Achieving widespread integration of AI/ML algorithms into diverse clinical tasks will demand a steadfast commitment to overcoming issues in model design, development, and performance assessment. The complexities of AI/ML clinical translation present substantial challenges, requiring engagement with relevant stakeholders, assessment of cost-effectiveness for user and patient benefit, timely dissemination of information relevant to robust functioning throughout the AI/ML lifecycle, consideration of regulatory compliance, and feedback loops for real-world performance evidence. This commentary addresses several hurdles for the development and adoption of AI/ML technologies in medical imaging. Comprehensive attention to these underlying and often subtle factors is critical not only for tackling the challenges but also for exploring novel opportunities for the advancement of AI in radiology.

2.

Artificial intelligence in medicine: mitigating risks and maximizing benefits via quality assurance, quality control, and acceptance testing.

Mahmood, Usman; Shukla-Dave, Amita; Chan, Heang-Ping; Drukker, Karen; Samala, Ravi K; Chen, Quan; Vergara, Daniel; Greenspan, Hayit; Petrick, Nicholas; Sahiner, Berkman; Huo, Zhimin; Summers, Ronald M; Cha, Kenny H; Tourassi, Georgia; Deserno, Thomas M; Grizzard, Kevin T; Näppi, Janne J; Yoshida, Hiroyuki; Regge, Daniele; Mazurchuk, Richard; Suzuki, Kenji; Morra, Lia; Huisman, Henkjan; Armato, Samuel G; Hadjiiski, Lubomir.

BJR Artif Intell ; 1(1): ubae003, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38476957

ABSTRACT

The adoption of artificial intelligence (AI) tools in medicine poses challenges to existing clinical workflows. This commentary discusses the necessity of context-specific quality assurance (QA), emphasizing the need for robust QA measures with quality control (QC) procedures that encompass (1) acceptance testing (AT) before clinical use, (2) continuous QC monitoring, and (3) adequate user training. The discussion also covers essential components of AT and QA, illustrated with real-world examples. We also highlight what we see as the shared responsibility of manufacturers or vendors, regulators, healthcare systems, medical physicists, and clinicians to enact appropriate testing and oversight to ensure a safe and equitable transformation of medicine through AI.

3.

Detection of Severe Lung Infection on Chest Radiographs of COVID-19 Patients: Robustness of AI Models across Multi-Institutional Data.

Sobiecki, André; Hadjiiski, Lubomir M; Chan, Heang-Ping; Samala, Ravi K; Zhou, Chuan; Stojanovska, Jadranka; Agarwal, Prachi P.

Diagnostics (Basel) ; 14(3), 2024 Feb 05.

Article in English | MEDLINE | ID: mdl-38337857

ABSTRACT

Diagnosing severe COVID-19 lung infection is important because it carries a higher risk for the patient and requires prompt treatment with oxygen therapy and hospitalization, whereas patients with less severe lung infection often remain under observation. Patients with severe infections are also more likely to have long-standing residual changes in their lungs and may need follow-up imaging. We have developed deep learning neural network models for classifying severe vs. non-severe lung infections in COVID-19 patients on chest radiographs (CXR). A deep learning U-Net model was developed to segment the lungs. Inception-v1 and Inception-v4 models were trained for the classification of severe vs. non-severe COVID-19 infection. Four CXR datasets from multi-country and multi-institutional sources were used to develop and evaluate the models. The combined dataset consisted of 5748 cases and 6193 CXR images with physicians' severity ratings as the reference standard. The area under the receiver operating characteristic curve (AUC) was used to evaluate model performance. We studied the reproducibility of classification performance using different combinations of training and validation data sets. We also evaluated the generalizability of the trained deep learning models using both independent internal and external test sets. On the independent test sets, the Inception-v1 based models achieved AUCs ranging from 0.81 ± 0.02 to 0.84 ± 0.0, while the Inception-v4 models achieved AUCs ranging from 0.85 ± 0.06 to 0.89 ± 0.01. These results demonstrate the promise of using deep learning models in differentiating COVID-19 patients with severe from non-severe lung infection on chest radiographs.

4.

Decision region analysis for generalizability of artificial intelligence models: estimating model generalizability in the case of cross-reactivity and population shift.

Burgon, Alexis; Sahiner, Berkman; Petrick, Nicholas; Pennello, Gene; Cha, Kenny H; Samala, Ravi K.

J Med Imaging (Bellingham) ; 11(1): 014501, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38283653

ABSTRACT

Purpose: Understanding an artificial intelligence (AI) model's ability to generalize to its target population is critical to ensuring the safe and effective usage of AI in medical devices. A traditional generalizability assessment relies on the availability of large, diverse datasets, which are difficult to obtain in many medical imaging applications. We present an approach for enhanced generalizability assessment by examining the decision space beyond the available testing data distribution. Approach: Vicinal distributions of virtual samples are generated by interpolating between triplets of test images. The generated virtual samples leverage the characteristics already in the test set, increasing the sample diversity while remaining close to the AI model's data manifold. We demonstrate the generalizability assessment approach on the non-clinical tasks of classifying patient sex, race, COVID status, and age group from chest x-rays. Results: Decision region composition analysis for generalizability indicated that a disproportionately large portion of the decision space belonged to a single "preferred" class for each task, despite comparable performance on the evaluation dataset. Evaluation using cross-reactivity and population shift strategies indicated a tendency to overpredict samples as belonging to the preferred class (e.g., COVID negative) for patients whose subgroup was not represented in the model development data. Conclusions: An analysis of an AI model's decision space has the potential to provide insight into model generalizability. Our approach uses the analysis of composition of the decision space to obtain an improved assessment of model generalizability in the case of limited test data.
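
As a rough illustration of the approach described in this abstract, the Python sketch below interpolates between triplets of test samples and tallies which class a model assigns to the resulting virtual samples. It is a minimal sketch under stated assumptions, not the authors' implementation: samples are short numeric vectors rather than chest x-rays, `toy_classifier` is a hypothetical stand-in for a trained model, and the function names, triplet count, and weight-sampling scheme are illustrative choices.

```python
import random
from collections import Counter


def interpolate_triplet(a, b, c, weights):
    """Convex combination of three equally sized feature vectors."""
    wa, wb, wc = weights
    return [wa * x + wb * y + wc * z for x, y, z in zip(a, b, c)]


def random_simplex_weights(rng):
    """Draw three non-negative weights that sum to 1."""
    u, v = sorted((rng.random(), rng.random()))
    return (u, v - u, 1.0 - v)


def decision_region_composition(samples, classify, n_triplets=200,
                                n_virtual=50, seed=0):
    """Fraction of virtual samples that the classifier assigns to each class."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_triplets):
        a, b, c = rng.sample(samples, 3)  # one triplet of test samples
        for _ in range(n_virtual):
            weights = random_simplex_weights(rng)
            counts[classify(interpolate_triplet(a, b, c, weights))] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical stand-in for a trained model: a fixed linear threshold in 2-D.
def toy_classifier(x):
    return "positive" if x[0] + x[1] > 1.5 else "negative"


data_rng = random.Random(1)
test_set = [[data_rng.random(), data_rng.random()] for _ in range(100)]
composition = decision_region_composition(test_set, toy_classifier)
print(composition)
```

A composition dominated by one label, despite balanced performance on the test set itself, would correspond to the "preferred" class behavior the abstract reports.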

5.

Methodology for Good Machine Learning with Multi-Omics Data.

Coroller, Thibaud; Sahiner, Berkman; Amatya, Anup; Gossmann, Alexej; Karagiannis, Konstantinos; Moloney, Conor; Samala, Ravi K; Santana-Quintero, Luis; Solovieff, Nadia; Wang, Craig; Amiri-Kordestani, Laleh; Cao, Qian; Cha, Kenny H; Charlab, Rosane; Cross, Frank H; Hu, Tingting; Huang, Ruihao; Kraft, Jeffrey; Krusche, Peter; Li, Yutong; Li, Zheng; Mazo, Ilya; Paul, Rahul; Schnakenberg, Susan; Serra, Paolo; Smith, Sean; Song, Chi; Su, Fei; Tiwari, Mohit; Vechery, Colin; Xiong, Xin; Zarate, Juan Pablo; Zhu, Hao; Chakravartty, Arunava; Liu, Qi; Ohlssen, David; Petrick, Nicholas; Schneider, Julie A; Walderhaug, Mark; Zuber, Emmanuel.

Clin Pharmacol Ther ; 115(4): 745-757, 2024 04.

Article in English | MEDLINE | ID: mdl-37965805

ABSTRACT

In 2020, Novartis Pharmaceuticals Corporation and the U.S. Food and Drug Administration (FDA) started a 4-year scientific collaboration to approach complex new data modalities and advanced analytics. The scientific question was to find novel radio-genomics-based prognostic and predictive factors for HR+/HER- metastatic breast cancer under a Research Collaboration Agreement. This collaboration has been providing valuable insights to help successfully implement future scientific projects, particularly using artificial intelligence and machine learning. This tutorial aims to provide tangible guidelines for a multi-omics project that includes multidisciplinary expert teams, spanning across different institutions. We cover key ideas, such as "maintaining effective communication" and "following good data science practices," followed by the four steps of exploratory projects, namely (1) plan, (2) design, (3) develop, and (4) disseminate. We break each step into smaller concepts with strategies for implementation and provide illustrations from our collaboration to further give the readers actionable guidance.

Subject(s)

Artificial Intelligence , Multiomics , Humans , Machine Learning , Genomics

6.

Measurement and Mitigation of Bias in Artificial Intelligence: A Narrative Literature Review for Regulatory Science.

Gray, Magnus; Samala, Ravi; Liu, Qi; Skiles, Denny; Xu, Joshua; Tong, Weida; Wu, Leihong.

Clin Pharmacol Ther ; 115(4): 687-697, 2024 04.

Article in English | MEDLINE | ID: mdl-38018360

ABSTRACT

Artificial intelligence (AI) is increasingly being used in decision making across various industries, including the public health arena. Bias in any decision-making process can significantly skew outcomes, and AI systems have been shown to exhibit biases at times. The potential for AI systems to perpetuate and even amplify biases is a growing concern. Bias, as used in this paper, refers to the tendency toward a particular characteristic or behavior, and thus, a biased AI system is one that shows biased associations between entities. In this literature review, we examine the current state of research on AI bias, including its sources, as well as the methods for measuring, benchmarking, and mitigating it. We also examine the biases and methods of mitigation specifically relevant to the healthcare field and offer a perspective on bias measurement and mitigation in regulatory science decision making.

Subject(s)

Artificial Intelligence , Benchmarking , Humans , Bias , Public Health

7.

Characterization of mechanical stiffness using additive manufacturing and finite element analysis: potential tool for bone health assessment.

Marupudi, Sriharsha; Cao, Qian; Samala, Ravi; Petrick, Nicholas.

3D Print Med ; 9(1): 32, 2023 Nov 18.

Article in English | MEDLINE | ID: mdl-37978094

ABSTRACT

BACKGROUND: Bone health and fracture risk are known to be correlated with stiffness. Both micro-finite element analysis (µFEA) and mechanical testing of additive manufactured phantoms are useful approaches for estimating mechanical properties of trabecular bone-like structures. However, it is unclear if measurements from the two approaches are consistent. The purpose of this work is to evaluate the agreement between stiffness measurements obtained from mechanical testing of additive manufactured trabecular bone phantoms and µFEA modeling. Agreement between the two methods would suggest 3D printing is a viable method for validation of µFEA modeling. METHODS: A set of 20 lumbar vertebrae regions of interest were segmented and the corresponding trabecular bone phantoms were produced using selective laser sintering. The phantoms were mechanically tested in uniaxial compression to derive their stiffness values. The stiffness values were also derived from in silico simulation, where linear elastic µFEA was applied to simulate the same compression and boundary conditions. Bland-Altman analysis was used to evaluate agreement between the mechanical testing and µFEA simulation values. Additionally, we evaluated the fidelity of the 3D printed phantoms as well as the repeatability of the 3D printing and mechanical testing process. RESULTS: We observed good agreement between the mechanically tested stiffness and µFEA stiffness, with R² of 0.84 and normalized root mean square deviation of 8.1%. We demonstrate that the overall trabecular bone structures are printed in high fidelity (Dice score of 0.97 (95% CI, [0.96,0.98])) and that mechanical testing is repeatable (coefficient of variation less than 5% for stiffness values from testing of duplicated phantoms). However, we noticed some defects in the resin microstructure of the 3D printed phantoms, which may account for the discrepancy between the stiffness values from simulation and mechanical testing. CONCLUSION: Overall, the level of agreement achieved between the mechanical stiffness and µFEA indicates that our µFEA methods may be acceptable for assessing bone mechanics of complex trabecular structures as part of an analysis of overall bone health.
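
The Bland-Altman analysis named in this abstract can be sketched in a few lines of Python. This is a generic textbook formulation (mean difference plus or minus 1.96 sample standard deviations), offered as an assumption about the usual form of the method rather than the authors' exact procedure. The stiffness values are made-up numbers for illustration, not data from the study.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) of agreement for paired data."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)                # systematic difference between methods
    spread = 1.96 * stdev(diffs)      # half-width of the 95% limits of agreement
    return bias, bias - spread, bias + spread


# Hypothetical paired stiffness values in arbitrary units (not study data).
measured = [10.2, 11.8, 9.5, 12.1, 10.9]
simulated = [10.0, 12.3, 9.1, 12.6, 11.0]

bias, lower, upper = bland_altman(measured, simulated)
print(f"bias={bias:.3f}, limits of agreement=({lower:.3f}, {upper:.3f})")
```

Narrow limits centered near zero indicate that the two methods could be used interchangeably for the quantity being measured.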

8.

Proceedings of the NHLBI Workshop on Artificial Intelligence in Cardiovascular Imaging: Translation to Patient Care.

Dey, Damini; Arnaout, Rima; Antani, Sameer; Badano, Aldo; Jacques, Louis; Li, Huiqing; Leiner, Tim; Margerrison, Edward; Samala, Ravi; Sengupta, Partho P; Shah, Sanjiv J; Slomka, Piotr; Williams, Michelle C; Bandettini, W Patricia; Sachdev, Vandana.

JACC Cardiovasc Imaging ; 16(9): 1209-1223, 2023 09.

Article in English | MEDLINE | ID: mdl-37480904

ABSTRACT

Artificial intelligence (AI) promises to revolutionize many fields, but its clinical implementation in cardiovascular imaging is still rare despite increasing research. We sought to facilitate discussion across several fields and across the lifecycle of research, development, validation, and implementation to identify challenges and opportunities to further translation of AI in cardiovascular imaging. Furthermore, it seemed apparent that a multidisciplinary effort across institutions would be essential to overcome these challenges. This paper summarizes the proceedings of the National Heart, Lung, and Blood Institute-led workshop, creating consensus around needs and opportunities for institutions at several levels to support and advance research in this field and support future translation.

Subject(s)

Artificial Intelligence , Cardiovascular System , United States , Humans , National Heart, Lung, and Blood Institute (U.S.) , Predictive Value of Tests , Patient Care

9.

Regulatory considerations for medical imaging AI/ML devices in the United States: concepts and challenges.

Petrick, Nicholas; Chen, Weijie; Delfino, Jana G; Gallas, Brandon D; Kang, Yanna; Krainak, Daniel; Sahiner, Berkman; Samala, Ravi K.

J Med Imaging (Bellingham) ; 10(5): 051804, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37361549

ABSTRACT

Purpose: To introduce developers to medical device regulatory processes and data considerations in artificial intelligence and machine learning (AI/ML) device submissions and to discuss ongoing AI/ML-related regulatory challenges and activities. Approach: AI/ML technologies are being used in an increasing number of medical imaging devices, and the fast evolution of these technologies presents novel regulatory challenges. We provide AI/ML developers with an introduction to U.S. Food and Drug Administration (FDA) regulatory concepts, processes, and fundamental assessments for a wide range of medical imaging AI/ML device types. Results: The device type for an AI/ML device and appropriate premarket regulatory pathway is based on the level of risk associated with the device and informed by both its technological characteristics and intended use. AI/ML device submissions contain a wide array of information and testing to facilitate the review process with the model description, data, nonclinical testing, and multi-reader multi-case testing being critical aspects of the AI/ML device review process for many AI/ML device submissions. The agency is also involved in AI/ML-related activities that support guidance document development, good machine learning practice development, AI/ML transparency, AI/ML regulatory research, and real-world performance assessment. Conclusion: FDA's AI/ML regulatory and scientific efforts support the joint goals of ensuring patients have access to safe and effective AI/ML devices over the entire device lifecycle and stimulating medical AI/ML innovation.

10.

Data drift in medical machine learning: implications and potential remedies.

Sahiner, Berkman; Chen, Weijie; Samala, Ravi K; Petrick, Nicholas.

Br J Radiol ; 96(1150): 20220878, 2023 Oct.

Article in English | MEDLINE | ID: mdl-36971405

ABSTRACT

Data drift refers to differences between the data used in training a machine learning (ML) model and that applied to the model in real-world operation. Medical ML systems can be exposed to various forms of data drift, including differences between the data sampled for training and used in clinical operation, differences between medical practices or context of use between training and clinical use, and time-related changes in patient populations, disease patterns, and data acquisition, to name a few. In this article, we first review the terminology used in ML literature related to data drift, define distinct types of drift, and discuss in detail potential causes within the context of medical applications with an emphasis on medical imaging. We then review the recent literature regarding the effects of data drift on medical ML systems, which overwhelmingly show that data drift can be a major cause for performance deterioration. We then discuss methods for monitoring data drift and mitigating its effects with an emphasis on pre- and post-deployment techniques. Some of the potential methods for drift detection and issues around model retraining when drift is detected are included. Based on our review, we find that data drift is a major concern in medical ML deployment and that more research is needed so that ML models can identify drift early, incorporate effective mitigation strategies and resist performance decay.

Subject(s)

Machine Learning , Medical Informatics Computing
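
The abstract of entry 10 mentions methods for monitoring data drift without naming them. As one possible illustration, the Python sketch below compares a single input feature between training data and deployment data using the two-sample Kolmogorov-Smirnov statistic. The choice of this statistic, the function name, and the simulated Gaussian feature are all assumptions made for the example and are not taken from the article.

```python
import bisect
import random


def ks_statistic(reference, current):
    """Largest gap between two empirical CDFs (two-sample KS statistic)."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    for x in ref + cur:
        # Empirical CDF value at x = share of observations <= x.
        cdf_ref = bisect.bisect_right(ref, x) / len(ref)
        cdf_cur = bisect.bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


# Simulated feature values (hypothetical; e.g. a summary intensity per image).
rng = random.Random(0)
training = [rng.gauss(0.0, 1.0) for _ in range(500)]
same_site = [rng.gauss(0.0, 1.0) for _ in range(500)]   # no drift
shifted = [rng.gauss(1.0, 1.0) for _ in range(500)]     # mean shifted by 1 SD

stable = ks_statistic(training, same_site)
drifted = ks_statistic(training, shifted)
print(f"no drift: {stable:.3f}, shifted population: {drifted:.3f}")
```

In practice a monitoring system would track such a statistic over time and per feature, with an alert threshold chosen for the application. That threshold is deliberately left out here because it depends on sample size and acceptable false-alarm rate.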

11.

AAPM task group report 273: Recommendations on best practices for AI and machine learning for computer-aided diagnosis in medical imaging.

Hadjiiski, Lubomir; Cha, Kenny; Chan, Heang-Ping; Drukker, Karen; Morra, Lia; Näppi, Janne J; Sahiner, Berkman; Yoshida, Hiroyuki; Chen, Quan; Deserno, Thomas M; Greenspan, Hayit; Huisman, Henkjan; Huo, Zhimin; Mazurchuk, Richard; Petrick, Nicholas; Regge, Daniele; Samala, Ravi; Summers, Ronald M; Suzuki, Kenji; Tourassi, Georgia; Vergara, Daniel; Armato, Samuel G.

Med Phys ; 50(2): e1-e24, 2023 Feb.

Article in English | MEDLINE | ID: mdl-36565447

ABSTRACT

Rapid advances in artificial intelligence (AI) and machine learning, and specifically in deep learning (DL) techniques, have enabled broad application of these methods in health care. The promise of the DL approach has spurred further interest in computer-aided diagnosis (CAD) development and applications using both "traditional" machine learning methods and newer DL-based methods. We use the term CAD-AI to refer to this expanded clinical decision support environment that uses traditional and DL-based AI methods. Numerous studies have been published to date on the development of machine learning tools for computer-aided, or AI-assisted, clinical tasks. However, most of these machine learning models are not ready for clinical deployment. It is of paramount importance to ensure that a clinical decision support tool undergoes proper training and rigorous validation of its generalizability and robustness before adoption for patient care in the clinic. To address these important issues, the American Association of Physicists in Medicine (AAPM) Computer-Aided Image Analysis Subcommittee (CADSC) is charged, in part, to develop recommendations on practices and standards for the development and performance assessment of computer-aided decision support systems. The committee has previously published two opinion papers on the evaluation of CAD systems and issues associated with user training and quality assurance of these systems in the clinic. With machine learning techniques continuing to evolve and CAD applications expanding to new stages of the patient care process, the current task group report considers the broader issues common to the development of most, if not all, CAD-AI applications and their translation from the bench to the clinic. The goal is to bring attention to the proper training and validation of machine learning algorithms that may improve their generalizability and reliability and accelerate the adoption of CAD-AI systems for clinical decision support.

Subject(s)

Artificial Intelligence , Diagnosis, Computer-Assisted , Humans , Reproducibility of Results , Diagnosis, Computer-Assisted/methods , Diagnostic Imaging , Machine Learning

12.

Computerized Decision Support for Bladder Cancer Treatment Response Assessment in CT Urography: Effect on Diagnostic Accuracy in Multi-Institution Multi-Specialty Study.

Sun, Di; Hadjiiski, Lubomir; Alva, Ajjai; Zakharia, Yousef; Joshi, Monika; Chan, Heang-Ping; Garje, Rohan; Pomerantz, Lauren; Elhag, Dean; Cohan, Richard H; Caoili, Elaine M; Kerr, Wesley T; Cha, Kenny H; Kirova-Nedyalkova, Galina; Davenport, Matthew S; Shankar, Prasad R; Francis, Isaac R; Shampain, Kimberly; Meyer, Nathaniel; Barkmeier, Daniel; Woolen, Sean; Palmbos, Phillip L; Weizer, Alon Z; Samala, Ravi K; Zhou, Chuan; Matuszak, Martha.

Tomography ; 8(2): 644-656, 2022 03 02.

Article in English | MEDLINE | ID: mdl-35314631

ABSTRACT

This observer study investigates the effect of a computerized artificial intelligence (AI)-based decision support system (CDSS-T) on physicians' diagnostic accuracy in assessing bladder cancer treatment response. The performance of 17 observers was evaluated when assessing bladder cancer treatment response without and with CDSS-T using pre- and post-chemotherapy CTU scans in 123 patients having 157 pre- and post-treatment cancer pairs. The impact of cancer case difficulty, observers' clinical experience, institution affiliation, specialty, and the assessment times on the observers' diagnostic performance with and without using CDSS-T was analyzed. It was found that the average performance of the 17 observers was significantly improved (p = 0.002) when aided by the CDSS-T. The cancer case difficulty, institution affiliation, specialty, and the assessment times influenced the observers' performance without CDSS-T. The AI-based decision support system has the potential to improve the diagnostic accuracy in assessing bladder cancer treatment response and result in more consistent performance among all physicians.

Subject(s)

Decision Support Systems, Clinical , Urinary Bladder Neoplasms , Artificial Intelligence , Humans , Tomography, X-Ray Computed , Urinary Bladder Neoplasms/diagnostic imaging , Urinary Bladder Neoplasms/therapy , Urography

13.

Effect of Dose Level on Radiologists' Detection of Microcalcifications in Digital Breast Tomosynthesis: An Observer Study with Breast Phantoms.

Chan, Heang-Ping; Helvie, Mark A; Klein, Katherine A; McLaughlin, Carol; Neal, Colleen H; Oudsema, Rebecca; Rahman, W Tania; Roubidoux, Marilyn A; Hadjiiski, Lubomir M; Zhou, Chuan; Samala, Ravi K.

Acad Radiol ; 29 Suppl 1: S42-S49, 2022 01.

Article in English | MEDLINE | ID: mdl-32950384

ABSTRACT

OBJECTIVES: To compare radiologists' sensitivity, confidence level, and reading efficiency in detecting microcalcifications in digital breast tomosynthesis (DBT) at two clinically relevant dose levels. MATERIALS AND METHODS: Six 5-cm-thick heterogeneous breast phantoms embedded with a total of 144 simulated microcalcification clusters of four speck sizes were imaged at two dose modes by a clinical DBT system. The DBT volumes at the two dose levels were read independently by six MQSA radiologists and one fellow with 1-33 years (median 12 years) of experience in a fully-crossed counter-balanced manner. Each radiologist located each potential cluster and rated its conspicuity and his/her confidence that the marked location contained a cluster. The differences in the results between the two dose modes were analyzed by a two-tailed paired t-test. RESULTS: Compared to the lower-dose mode, the average glandular dose in the higher-dose mode for the 5-cm phantoms increased from 1.34 to 2.07 mGy. The detection sensitivity increased for all speck sizes and significantly for the two smaller sizes (p < 0.05). An average of 13.8% fewer false positive clusters was marked. The average conspicuity rating and the radiologists' confidence level were higher for all speck sizes and reached significance (p < 0.05) for the three larger sizes. The average reading time per detected cluster decreased significantly (p < 0.05) by an average of 13.2%. CONCLUSION: For a 5-cm-thick breast, an increase in average glandular dose from 1.34 to 2.07 mGy for DBT imaging increased the conspicuity of microcalcifications, improved the detection sensitivity by radiologists, increased their confidence levels, reduced false positive detections, and increased the reading efficiency.

Subject(s)

Breast Neoplasms , Calcinosis , Breast/diagnostic imaging , Calcinosis/diagnostic imaging , Female , Humans , Male , Mammography/methods , Phantoms, Imaging , Radiologists

14.

Risks of feature leakage and sample size dependencies in deep feature extraction for breast mass classification.

Samala, Ravi K; Chan, Heang-Ping; Hadjiiski, Lubomir; Helvie, Mark A.

Med Phys ; 48(6): 2827-2837, 2021 Jun.

Article in English | MEDLINE | ID: mdl-33368376

ABSTRACT

PURPOSE: Transfer learning is commonly used in deep learning for medical imaging to alleviate the problem of limited available data. In this work, we studied the risk of feature leakage and its dependence on sample size when using a pretrained deep convolutional neural network (DCNN) as a feature extractor for classification of breast masses in mammography. METHODS: Feature leakage occurs when the training set is used for feature selection and classifier modeling while the cost function is guided by the validation performance or informed by the test performance. The high-dimensional feature space extracted from a pretrained DCNN suffers from the curse of dimensionality; feature subsets that can provide excessively optimistic performance can be found for the validation set or test set if the latter is allowed for unlimited reuse during algorithm development. We designed a simulation study to examine feature leakage when using a DCNN as a feature extractor for mass classification in mammography. Four thousand five hundred and seventy-seven unique mass lesions were partitioned by patient into three sets: 3222 for training, 508 for validation, and 847 for independent testing. Three pretrained DCNNs, AlexNet, GoogLeNet, and VGG16, were first compared using a training set in fourfold cross validation and one was selected as the feature extractor. To assess generalization errors, the independent test set was sequestered as truly unseen cases. A training set of a range of sizes from 10% to 75% was simulated by random drawing from the available training set in addition to 100% of the training set. Three commonly used feature classifiers, the linear discriminant, the support vector machine, and the random forest, were evaluated. A sequential feature selection method was used to find feature subsets that could achieve high classification performance in terms of the area under the receiver operating characteristic curve (AUC) in the validation set. The extent of feature leakage and the impact of training set size were analyzed by comparison to the performance in the unseen test set. RESULTS: All three classifiers showed large generalization error between the validation set and the independent sequestered test set at all sample sizes. The generalization error decreased as the sample size increased. At 100% of the sample size, one classifier achieved an AUC as high as 0.91 on the validation set while the corresponding performance on the unseen test set only reached an AUC of 0.72. CONCLUSIONS: Our results demonstrate that large generalization errors can occur in AI tools due to feature leakage. Without evaluation on unseen test cases, optimistically biased performance may be reported inadvertently, and can lead to unrealistic expectations and reduce confidence for clinical implementation.

Subject(s)

Mammography , Neural Networks, Computer , Algorithms , Breast/diagnostic imaging , Humans , Sample Size
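
The leakage mechanism described in the abstract of entry 14 can be reproduced in miniature. The Python sketch below is a toy simulation under strong assumptions, not the authors' study design: every feature is pure Gaussian noise, a single feature is picked by its validation AUC instead of running sequential feature selection with a trained classifier, and the set sizes are arbitrary. The point it illustrates is only that reusing the validation set for selection yields an optimistic validation AUC, while a sequestered test set stays near chance.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def noise_dataset(n_cases, n_features, rng):
    """Balanced labels with pure-noise features: no feature is informative."""
    labels = [i % 2 for i in range(n_cases)]
    features = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
                for _ in range(n_cases)]
    return features, labels


def feature_auc(x, y, j):
    """AUC obtained by using feature j alone as the decision score."""
    return auc([row[j] for row in x], y)


rng = random.Random(0)
n_features = 500
val_x, val_y = noise_dataset(30, n_features, rng)       # small validation set
test_x, test_y = noise_dataset(1000, n_features, rng)   # sequestered test set

# Leaky step: the feature is chosen by its score on the validation set.
best = max(range(n_features), key=lambda j: feature_auc(val_x, val_y, j))
val_auc = feature_auc(val_x, val_y, best)
test_auc = feature_auc(test_x, test_y, best)
print(f"validation AUC={val_auc:.2f}, sequestered test AUC={test_auc:.2f}")
```

Because the features carry no signal, any gap between the two printed values is produced entirely by selecting on the validation set.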

15.

Intraobserver Variability in Bladder Cancer Treatment Response Assessment With and Without Computerized Decision Support.

Hadjiiski, Lubomir M; Cha, Kenny H; Cohan, Richard H; Chan, Heang-Ping; Caoili, Elaine M; Davenport, Matthew S; Samala, Ravi K; Weizer, Alon Z; Alva, Ajjai; Kirova-Nedyalkova, Galina; Shampain, Kimberly; Meyer, Nathaniel; Barkmeier, Daniel; Woolen, Sean A; Shankar, Prasad R; Francis, Isaac R; Palmbos, Phillip L.

Tomography ; 6(2): 194-202, 2020 06.

Article in English | MEDLINE | ID: mdl-32548296

ABSTRACT

We evaluated the intraobserver variability of physicians aided by a computerized decision-support system for treatment response assessment (CDSS-T) to identify patients who show complete response to neoadjuvant chemotherapy for bladder cancer, and the effects of the intraobserver variability on physicians' assessment accuracy. A CDSS-T tool was developed that uses a combination of deep learning neural network and radiomic features from computed tomography (CT) scans to detect bladder cancers that have fully responded to neoadjuvant treatment. Pre- and postchemotherapy CT scans of 157 bladder cancers from 123 patients were collected. In a multireader, multicase observer study, physician-observers estimated the likelihood of pathologic T0 disease by viewing paired pre/posttreatment CT scans placed side by side on an in-house-developed graphical user interface. Five abdominal radiologists, 4 diagnostic radiology residents, 2 oncologists, and 1 urologist participated as observers. They first provided an estimate without CDSS-T and then with CDSS-T. A subset of cases was evaluated twice to study the intraobserver variability and its effects on observer consistency. The mean areas under the curves for assessment of pathologic T0 disease were 0.85 for CDSS-T alone, 0.76 for physicians without CDSS-T and improved to 0.80 for physicians with CDSS-T (P = .001) in the original evaluation, and 0.78 for physicians without CDSS-T and improved to 0.81 for physicians with CDSS-T (P = .010) in the repeated evaluation. The intraobserver variability was significantly reduced with CDSS-T (P < .0001). The CDSS-T can significantly reduce physicians' variability and improve their accuracy for identifying complete response of muscle-invasive bladder cancer to neoadjuvant chemotherapy.

Subject(s)

Decision Support Systems, Clinical , Urinary Bladder Neoplasms , Humans , Observer Variation , Physicians , Tomography, X-Ray Computed , Urinary Bladder Neoplasms/diagnostic imaging , Urinary Bladder Neoplasms/drug therapy

16.

Computer-aided diagnosis in the era of deep learning.

Chan, Heang-Ping; Hadjiiski, Lubomir M; Samala, Ravi K.

Med Phys ; 47(5): e218-e227, 2020 Jun.

Article in English | MEDLINE | ID: mdl-32418340

ABSTRACT

Computer-aided diagnosis (CAD) has been a major field of research for the past few decades. CAD uses machine learning methods to analyze imaging and/or nonimaging patient data and makes assessment of the patient's condition, which can then be used to assist clinicians in their decision-making process. The recent success of the deep learning technology in machine learning spurs new research and development efforts to improve CAD performance and to develop CAD for many other complex clinical tasks. In this paper, we discuss the potential and challenges in developing CAD tools using deep learning technology or artificial intelligence (AI) in general, the pitfalls and lessons learned from CAD in screening mammography and considerations needed for future implementation of CAD or AI in clinical use. It is hoped that the past experiences and the deep learning technology will lead to successful advancement and lasting growth in this new era of CAD, thereby enabling CAD to deliver intelligent aids to improve health care.

Subject(s)

Deep Learning; Diagnosis, Computer-Assisted/methods; Humans

17.

Generalization error analysis for deep convolutional neural network with transfer learning in breast cancer diagnosis.

Samala, Ravi K; Chan, Heang-Ping; Hadjiiski, Lubomir M; Helvie, Mark A; Richter, Caleb D.

Phys Med Biol ; 65(10): 105002, 2020 05 11.

Article in English | MEDLINE | ID: mdl-32208369

ABSTRACT

Deep convolutional neural network (DCNN), now popularly called artificial intelligence (AI), has shown the potential to improve over previous computer-assisted tools in medical imaging developed in the past decades. A DCNN has millions of free parameters that need to be trained, but the training sample set is limited in size for most medical imaging tasks so that transfer learning is typically used. Automatic data mining may be an efficient way to enlarge the collected data set but the data can be noisy such as incorrect labels or even a wrong type of image. In this work we studied the generalization error of DCNN with transfer learning in medical imaging for the task of classifying malignant and benign masses on mammograms. With a finite available data set, we simulated a training set containing corrupted data or noisy labels. The balance between learning and memorization of the DCNN was manipulated by varying the proportion of corrupted data in the training set. The generalization error of DCNN was analyzed by the area under the receiver operating characteristic curve for the training and test sets and the weight changes after transfer learning. The study demonstrates that the transfer learning strategy of DCNN for such tasks needs to be designed properly, taking into consideration the constraints of the available training set having limited size and quality for the classification task at hand, to minimize memorization and improve generalizability.
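The evaluation idea in this abstract (corrupt a fraction of the training labels, then compare AUC on the training and test sets) can be illustrated with a minimal sketch. The abstract does not say how the corruption was implemented, so flipping binary labels at random is an assumption here, and the function names `auc` and `corrupt_labels` are hypothetical.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney statistic).

    Each (positive, negative) pair counts 1 if the positive case scores
    higher, 0.5 on a tie, and 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def corrupt_labels(labels, fraction, seed=0):
    """Return a copy of binary labels with a given fraction flipped at random.

    Assumption: label noise is simulated by flipping 0 <-> 1; the paper may
    have corrupted the data differently.
    """
    rng = random.Random(seed)
    k = round(fraction * len(labels))
    flipped = list(labels)
    for i in rng.sample(range(len(labels)), k):
        flipped[i] = 1 - flipped[i]
    return flipped
```

Under these assumptions, a widening gap between the AUC measured on the (corrupted) training set and on a clean test set as `fraction` grows would suggest memorization rather than learning.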

Subject(s)

Breast Neoplasms/diagnostic imaging; Deep Learning; Image Processing, Computer-Assisted/methods; Female; Humans; Mammography; ROC Curve

18.

Deep Learning in Medical Image Analysis.

Chan, Heang-Ping; Samala, Ravi K; Hadjiiski, Lubomir M; Zhou, Chuan.

Adv Exp Med Biol ; 1213: 3-21, 2020.

Article in English | MEDLINE | ID: mdl-32030660

ABSTRACT

Deep learning is the state-of-the-art machine learning approach. The success of deep learning in many pattern recognition applications has brought excitement and high expectations that deep learning, or artificial intelligence (AI), can bring revolutionary changes in health care. Early studies of deep learning applied to lesion detection or classification have reported superior performance compared to those by conventional techniques or even better than radiologists in some tasks. The potential of applying deep-learning-based medical image analysis to computer-aided diagnosis (CAD), thus providing decision support to clinicians and improving the accuracy and efficiency of various diagnostic and treatment processes, has spurred new research and development efforts in CAD. Despite the optimism in this new era of machine learning, the development and implementation of CAD or AI tools in clinical practice face many challenges. In this chapter, we will discuss some of these issues and efforts needed to develop robust deep-learning-based CAD tools and integrate these tools into the clinical workflow, thereby advancing towards the goal of providing reliable intelligent aids for patient care.

Subject(s)

Deep Learning; Diagnosis, Computer-Assisted; Diagnostic Imaging; Image Interpretation, Computer-Assisted; Humans

19.

CAD and AI for breast cancer-recent development and challenges.

Chan, Heang-Ping; Samala, Ravi K; Hadjiiski, Lubomir M.

Br J Radiol ; 93(1108): 20190580, 2020 Apr.

Article in English | MEDLINE | ID: mdl-31742424

ABSTRACT

Computer-aided diagnosis (CAD) has been a popular area of research and development in the past few decades. In CAD, machine learning methods and multidisciplinary knowledge and techniques are used to analyze the patient information and the results can be used to assist clinicians in their decision making process. CAD may analyze imaging information alone or in combination with other clinical data. It may provide the analyzed information directly to the clinician or correlate the analyzed results with the likelihood of certain diseases based on statistical modeling of the past cases in the population. CAD systems can be developed to provide decision support for many applications in the patient care processes, such as lesion detection, characterization, cancer staging, treatment planning and response assessment, recurrence and prognosis prediction. The new state-of-the-art machine learning technique, known as deep learning (DL), has revolutionized speech and text recognition as well as computer vision. The potential of major breakthrough by DL in medical image analysis and other CAD applications for patient care has brought about unprecedented excitement of applying CAD, or artificial intelligence (AI), to medicine in general and to radiology in particular. In this paper, we will provide an overview of the recent developments of CAD using DL in breast imaging and discuss some challenges and practical issues that may impact the advancement of artificial intelligence and its integration into clinical workflow.

Subject(s)

Artificial Intelligence/trends; Breast Neoplasms/diagnostic imaging; Diagnosis, Computer-Assisted/trends; Bibliometrics; Decision Support Systems, Clinical; Deep Learning/trends; Diagnosis, Computer-Assisted/methods; Female; Humans; Image Interpretation, Computer-Assisted/methods; Magnetic Resonance Imaging/methods; Magnetic Resonance Imaging/trends; Mammography/methods; Neural Networks, Computer; Quality Assurance, Health Care; Radiology/education; Ultrasonography, Mammary/methods; Ultrasonography, Mammary/trends

20.

Breast Cancer Diagnosis in Digital Breast Tomosynthesis: Effects of Training Sample Size on Multi-Stage Transfer Learning Using Deep Neural Nets.

Samala, Ravi K; Hadjiiski, Lubomir; Helvie, Mark A; Richter, Caleb D; Cha, Kenny H.

IEEE Trans Med Imaging ; 38(3): 686-696, 2019 03.

Article in English | MEDLINE | ID: mdl-31622238

ABSTRACT

In this paper, we developed a deep convolutional neural network (CNN) for the classification of malignant and benign masses in digital breast tomosynthesis (DBT) using a multi-stage transfer learning approach that utilized data from similar auxiliary domains for intermediate-stage fine-tuning. Breast imaging data from DBT, digitized screen-film mammography, and digital mammography totaling 4039 unique regions of interest (1797 malignant and 2242 benign) were collected. Using cross validation, we selected the best transfer network from six transfer networks by varying the level up to which the convolutional layers were frozen. In a single-stage transfer learning approach, knowledge from CNN trained on the ImageNet data was fine-tuned directly with the DBT data. In a multi-stage transfer learning approach, knowledge learned from ImageNet was first fine-tuned with the mammography data and then fine-tuned with the DBT data. Two transfer networks were compared for the second-stage transfer learning by freezing most of the CNN structures versus freezing only the first convolutional layer. We studied the dependence of the classification performance on training sample size for various transfer learning and fine-tuning schemes by varying the training data from 1% to 100% of the available sets. The area under the receiver operating characteristic curve (AUC) was used as a performance measure. The view-based AUC on the test set for single-stage transfer learning was 0.85 ± 0.05 and improved significantly (p < 0.05) to 0.91 ± 0.03 for multi-stage learning. This paper demonstrated that, when the training sample size from the target domain is limited, an additional stage of transfer learning using data from a similar auxiliary domain is advantageous.
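The two second-stage freezing strategies and the sample-size sweep described above can be sketched without any deep learning framework. This is only an illustration of the bookkeeping: the six-layer backbone, the `Layer` class, and the helper names are hypothetical and do not reflect the network actually used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    trainable: bool = True


def freeze_up_to(layers, n_frozen):
    """Return a copy in which the first n_frozen layers are frozen
    and all remaining layers stay trainable for fine-tuning."""
    return [Layer(l.name, trainable=(i >= n_frozen)) for i, l in enumerate(layers)]


def training_subset_size(n_available, fraction):
    """Number of training samples when using a fraction of the available
    set (at least one sample)."""
    return max(1, round(n_available * fraction))


# Hypothetical backbone: five convolutional layers plus one classifier layer.
backbone = [Layer(f"conv{i}") for i in range(1, 6)] + [Layer("fc")]

# Strategy A: freeze only the first convolutional layer.
freeze_first_only = freeze_up_to(backbone, 1)

# Strategy B: freeze most of the network, leaving only the last layer trainable.
freeze_most = freeze_up_to(backbone, len(backbone) - 1)
```

In a real framework the `trainable` flag would correspond to whether a layer's weights receive gradient updates during the second fine-tuning stage.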

Subject(s)

Breast Neoplasms/diagnostic imaging; Machine Learning; Mammography/methods; Neural Networks, Computer; Area Under Curve; Humans; Michigan; Sample Size
