Results 1 - 20 of 223
1.
Patterns (N Y) ; 5(7): 100974, 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-39081567

ABSTRACT

Artificial intelligence (AI) shows potential to improve health care by leveraging data to build models that can inform clinical workflows. However, access to large quantities of diverse data is needed to develop robust generalizable models. Data sharing across institutions is not always feasible due to legal, security, and privacy concerns. Federated learning (FL) allows for multi-institutional training of AI models, obviating data sharing, albeit with different security and privacy concerns. Specifically, insights exchanged during FL can leak information about institutional data. In addition, FL can introduce issues when there is limited trust among the entities performing the compute. With the growing adoption of FL in health care, it is imperative to elucidate the potential risks. We thus summarize privacy-preserving FL literature in this work with special regard to health care. We draw attention to threats and review mitigation approaches. We anticipate this review to become a health-care researcher's guide to security and privacy in FL.

2.
Radiol Artif Intell ; 6(4): e240225, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38984986

ABSTRACT

The Radiological Society of North America (RSNA) and the Medical Image Computing and Computer Assisted Intervention (MICCAI) Society have led a series of joint panels and seminars focused on the present impact and future directions of artificial intelligence (AI) in radiology. These conversations have collected viewpoints from multidisciplinary experts in radiology, medical imaging, and machine learning on the current clinical penetration of AI technology in radiology and how it is impacted by trust, reproducibility, explainability, and accountability. The collective points, both practical and philosophical, define the cultural changes for radiologists and AI scientists working together and describe the challenges ahead for AI technologies to meet broad approval. This article presents the perspectives of experts from MICCAI and RSNA on the clinical, cultural, computational, and regulatory considerations, coupled with recommended reading materials, essential to adopt AI technology successfully in radiology and, more generally, in clinical practice. The report emphasizes the importance of collaboration to improve clinical deployment, highlights the need to integrate clinical and medical imaging data, and introduces strategies to ensure smooth and incentivized integration. Keywords: Adults and Pediatrics, Computer Applications-General (Informatics), Diagnosis, Prognosis © RSNA, 2024.


Subject(s)
Artificial Intelligence , Radiology , Humans , Radiology/methods , Societies, Medical
3.
Lancet Oncol ; 25(7): 879-887, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38876123

ABSTRACT

BACKGROUND: Artificial intelligence (AI) systems can potentially aid the diagnostic pathway of prostate cancer by alleviating the increasing workload, preventing overdiagnosis, and reducing the dependence on experienced radiologists. We aimed to investigate the performance of AI systems at detecting clinically significant prostate cancer on MRI in comparison with radiologists using the Prostate Imaging-Reporting and Data System version 2.1 (PI-RADS 2.1) and the standard of care in multidisciplinary routine practice at scale. METHODS: In this international, paired, non-inferiority, confirmatory study, we trained and externally validated an AI system (developed within an international consortium) for detecting Gleason grade group 2 or greater cancers using a retrospective cohort of 10 207 MRI examinations from 9129 patients. Of these examinations, 9207 cases from three centres (11 sites) based in the Netherlands were used for training and tuning, and 1000 cases from four centres (12 sites) based in the Netherlands and Norway were used for testing. In parallel, we facilitated a multireader, multicase observer study with 62 radiologists (45 centres in 20 countries; median 7 [IQR 5-10] years of experience in reading prostate MRI) using PI-RADS (2.1) on 400 paired MRI examinations from the testing cohort. Primary endpoints were the sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC) of the AI system in comparison with that of all readers using PI-RADS (2.1) and in comparison with that of the historical radiology readings made during multidisciplinary routine practice (ie, the standard of care with the aid of patient history and peer consultation). Histopathology and at least 3 years (median 5 [IQR 4-6] years) of follow-up were used to establish the reference standard. 
The statistical analysis plan was prespecified with a primary hypothesis of non-inferiority (considering a margin of 0·05) and a secondary hypothesis of superiority towards the AI system, if non-inferiority was confirmed. This study was registered at ClinicalTrials.gov, NCT05489341. FINDINGS: Of the 10 207 examinations included from Jan 1, 2012, through Dec 31, 2021, 2440 cases had histologically confirmed Gleason grade group 2 or greater prostate cancer. In the subset of 400 testing cases in which the AI system was compared with the radiologists participating in the reader study, the AI system showed a statistically superior and non-inferior AUROC of 0·91 (95% CI 0·87-0·94; p<0·0001), in comparison to the pool of 62 radiologists with an AUROC of 0·86 (0·83-0·89), with a lower boundary of the two-sided 95% Wald CI for the difference in AUROC of 0·02. At the mean PI-RADS 3 or greater operating point of all readers, the AI system detected 6·8% more cases with Gleason grade group 2 or greater cancers at the same specificity (57·7%, 95% CI 51·6-63·3), or 50·4% fewer false-positive results and 20·0% fewer cases with Gleason grade group 1 cancers at the same sensitivity (89·4%, 95% CI 85·3-92·9). In all 1000 testing cases where the AI system was compared with the radiology readings made during multidisciplinary practice, non-inferiority was not confirmed, as the AI system showed lower specificity (68·9% [95% CI 65·3-72·4] vs 69·0% [65·5-72·5]) at the same sensitivity (96·1%, 94·0-98·2) as the PI-RADS 3 or greater operating point. The lower boundary of the two-sided 95% Wald CI for the difference in specificity (-0·04) was greater than the non-inferiority margin (-0·05) and a p value below the significance threshold was reached (p<0·001). INTERPRETATION: An AI system was superior to radiologists using PI-RADS (2.1), on average, at detecting clinically significant prostate cancer and comparable to the standard of care. 
Such a system shows the potential to be a supportive tool within a primary diagnostic setting, with several associated benefits for patients and radiologists. Prospective validation is needed to test clinical applicability of this system. FUNDING: Health~Holland and EU Horizon 2020.


Subject(s)
Artificial Intelligence , Magnetic Resonance Imaging , Prostatic Neoplasms , Radiologists , Humans , Male , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology , Aged , Retrospective Studies , Middle Aged , Neoplasm Grading , Netherlands , ROC Curve
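The prespecified decision rule in the abstract above (non-inferiority against a margin, then superiority) can be sketched as a comparison of the lower confidence bound with the margin. The function name and the returned labels are illustrative assumptions, not the study's analysis code; the example reuses only the AUROC figures quoted in the abstract (lower bound 0·02, margin 0·05).

```python
def classify_difference(ci_lower: float, margin: float) -> str:
    """Classify a (new minus reference) difference from its lower CI bound.

    ci_lower: lower boundary of the two-sided 95% CI for the difference.
    margin:   non-inferiority margin, given as a positive number (e.g. 0.05).
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    if ci_lower > 0:
        # The whole interval lies above zero, which also clears the margin.
        return "superior"
    if ci_lower > -margin:
        return "non-inferior"
    return "not shown non-inferior"


# AUROC comparison from the abstract: lower bound 0.02, margin 0.05.
auroc_result = classify_difference(ci_lower=0.02, margin=0.05)
```

Because zero is always greater than the negative margin, a "superior" result under this rule implies non-inferiority, which matches the hierarchical testing order described in the abstract.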
4.
Ophthalmology ; 2024 Jun 10.
Article in English | MEDLINE | ID: mdl-38866367

ABSTRACT

PURPOSE: To evaluate whether providing clinicians with an artificial intelligence (AI)-based vascular severity score (VSS) improves consistency in the diagnosis of plus disease in retinopathy of prematurity (ROP). DESIGN: Multireader diagnostic accuracy imaging study. PARTICIPANTS: Eleven ROP experts, 9 of whom had been in practice for 10 years or more. METHODS: RetCam (Natus Medical Incorporated) fundus images were obtained from premature infants during routine ROP screening as part of the Imaging and Informatics in ROP study between January 2012 and July 2020. From all available examinations, a subset of 150 eye examinations from 110 infants were selected for grading. An AI-based VSS was assigned to each set of images using the i-ROP DL system (Siloam Vision). The clinicians were asked to diagnose plus disease for each examination and to assign an estimated VSS (range, 1-9) at baseline, and then again 1 month later with AI-based VSS assistance. A reference standard diagnosis (RSD) was assigned to each eye examination from the Imaging and Informatics in ROP study based on 3 masked expert labels and the ophthalmoscopic diagnosis. MAIN OUTCOME MEASURES: Mean linearly weighted κ value for plus disease diagnosis compared with RSD. Area under the receiver operating characteristic curve (AUC) and area under the precision-recall curve (AUPR) for labels 1 through 9 compared with RSD for plus disease. RESULTS: Expert agreement improved significantly, from substantial (κ value, 0.69 [0.59, 0.75]) to near perfect (κ value, 0.81 [0.71, 0.86]), when AI-based VSS was integrated. Additionally, a significant improvement in plus disease discrimination was achieved as measured by mean AUC (from 0.94 [95% confidence interval (CI), 0.92-0.96] to 0.98 [95% CI, 0.96-0.99]; difference, 0.04 [95% CI, 0.01-0.06]) and AUPR (from 0.86 [95% CI, 0.81-0.90] to 0.95 [95% CI, 0.91-0.97]; difference, 0.09 [95% CI, 0.03-0.14]). 
CONCLUSIONS: Providing ROP clinicians with an AI-based measurement of vascular severity in ROP was associated with both improved plus disease diagnosis and improved continuous severity labeling as compared with an RSD for plus disease. If implemented in practice, AI-based VSS could reduce interobserver variability and could standardize treatment for infants with ROP. FINANCIAL DISCLOSURE(S): Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
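The main outcome measure above is a linearly weighted κ. A minimal sketch of that statistic for two raters follows, assuming ordinal categories supplied in severity order; the category names in the example are hypothetical placeholders and the function is not the study's code.

```python
from collections import Counter


def linear_weighted_kappa(rater_a, rater_b, categories):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1)."""
    k = len(categories)
    n = len(rater_a)
    if k < 2 or n == 0 or n != len(rater_b):
        raise ValueError("need >= 2 categories and equal-length, non-empty ratings")
    index = {c: i for i, c in enumerate(categories)}
    a_idx = [index[x] for x in rater_a]
    b_idx = [index[x] for x in rater_b]

    def weight(i, j):
        return abs(i - j) / (k - 1)

    # Mean observed disagreement over the paired ratings.
    observed = sum(weight(i, j) for i, j in zip(a_idx, b_idx)) / n
    # Disagreement expected by chance from the two raters' marginal counts.
    count_a, count_b = Counter(a_idx), Counter(b_idx)
    expected = sum(
        count_a[i] * count_b[j] * weight(i, j) for i in count_a for j in count_b
    ) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1 - observed / expected


# Hypothetical ordinal labels, listed from least to most severe.
levels = ["none", "intermediate", "severe"]
kappa = linear_weighted_kappa(
    ["none", "intermediate", "severe"],
    ["severe", "intermediate", "none"],
    levels,
)
```

Linear weights penalize a one-step disagreement half as much as a two-step disagreement on a three-level scale, which is why the statistic suits ordinal grading.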

5.
Med Image Anal ; 95: 103206, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38776844

ABSTRACT

The correct interpretation of breast density is important in the assessment of breast cancer risk. AI has been shown capable of accurately predicting breast density; however, due to the differences in imaging characteristics across mammography systems, models built using data from one system do not generalize well to other systems. Though federated learning (FL) has emerged as a way to improve the generalizability of AI without the need to share data, the best way to preserve features from all training data during FL is an active area of research. To explore FL methodology, the breast density classification FL challenge was hosted in partnership with the American College of Radiology, Harvard Medical School's Mass General Brigham, University of Colorado, NVIDIA, and the National Institutes of Health National Cancer Institute. Challenge participants were able to submit Docker containers capable of implementing FL on three simulated medical facilities, each containing a unique large mammography dataset. The breast density FL challenge ran from June 15 to September 5, 2022, attracting seven finalists from around the world. The winning FL submission reached a linear kappa score of 0.653 on the challenge test data and 0.413 on an external testing dataset, scoring comparably to a model trained on the same data in a central location.


Subject(s)
Algorithms , Breast Density , Breast Neoplasms , Mammography , Humans , Female , Mammography/methods , Breast Neoplasms/diagnostic imaging , Machine Learning
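The abstract above does not state which aggregation rule the challenge submissions used. As a hedged illustration of how FL can combine models without sharing data, the sketch below shows federated averaging (FedAvg), a common baseline in which a server averages client parameters weighted by local dataset size. The function name, the flat parameter lists, and the example values are assumptions for illustration only.

```python
def federated_average(client_params, client_sizes):
    """Average per-client parameter vectors, weighted by local sample counts."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    n_params = len(client_params[0])
    if any(len(p) != n_params for p in client_params):
        raise ValueError("all clients must send the same number of parameters")
    # Only parameters and counts reach the server; raw data stays on site.
    return [
        sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical sites with 1 and 3 training cases respectively.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```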
6.
Annu Rev Biomed Eng ; 26(1): 529-560, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38594947

ABSTRACT

Despite the remarkable advances in cancer diagnosis, treatment, and management over the past decade, malignant tumors remain a major public health problem. Further progress in combating cancer may be enabled by personalizing the delivery of therapies according to the predicted response for each individual patient. The design of personalized therapies requires the integration of patient-specific information with an appropriate mathematical model of tumor response. A fundamental barrier to realizing this paradigm is the current lack of a rigorous yet practical mathematical theory of tumor initiation, development, invasion, and response to therapy. We begin this review with an overview of different approaches to modeling tumor growth and treatment, including mechanistic as well as data-driven models based on big data and artificial intelligence. We then present illustrative examples of mathematical models manifesting their utility and discuss the limitations of stand-alone mechanistic and data-driven models. We then discuss the potential of mechanistic models for not only predicting but also optimizing response to therapy on a patient-specific basis. We describe current efforts and future possibilities to integrate mechanistic and data-driven models. We conclude by proposing five fundamental challenges that must be addressed to fully realize personalized care for cancer patients driven by computational models.


Subject(s)
Artificial Intelligence , Big Data , Neoplasms , Precision Medicine , Humans , Neoplasms/therapy , Precision Medicine/methods , Computer Simulation , Models, Biological , Patient-Specific Modeling
7.
Radiol Artif Intell ; 6(3): e230227, 2024 May.
Article in English | MEDLINE | ID: mdl-38477659

ABSTRACT

The Radiological Society of North America (RSNA) has held artificial intelligence competitions to tackle real-world medical imaging problems at least annually since 2017. This article examines the challenges and processes involved in organizing these competitions, with a specific emphasis on the creation and curation of high-quality datasets. The collection of diverse and representative medical imaging data involves dealing with issues of patient privacy and data security. Furthermore, ensuring quality and consistency in data, which includes expert labeling and accounting for various patient and imaging characteristics, necessitates substantial planning and resources. Overcoming these obstacles requires meticulous project management and adherence to strict timelines. The article also highlights the potential of crowdsourced annotation to progress medical imaging research. Through the RSNA competitions, an effective global engagement has been realized, resulting in innovative solutions to complex medical imaging problems, thus potentially transforming health care by enhancing diagnostic accuracy and patient outcomes. Keywords: Use of AI in Education, Artificial Intelligence © RSNA, 2024.


Subject(s)
Artificial Intelligence , Radiology , Humans , Diagnostic Imaging/methods , Societies, Medical , North America
8.
JAMA Ophthalmol ; 142(4): 327-335, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38451496

ABSTRACT

Importance: Retinopathy of prematurity (ROP) is a leading cause of blindness in children, with significant disparities in outcomes between high-income and low-income countries, due in part to insufficient access to ROP screening. Objective: To evaluate how well autonomous artificial intelligence (AI)-based ROP screening can detect more-than-mild ROP (mtmROP) and type 1 ROP. Design, Setting, and Participants: This diagnostic study evaluated the performance of an AI algorithm, trained and calibrated using 2530 examinations from 843 infants in the Imaging and Informatics in Retinopathy of Prematurity (i-ROP) study, on 2 external datasets (6245 examinations from 1545 infants in the Stanford University Network for Diagnosis of ROP [SUNDROP] and 5635 examinations from 2699 infants in the Aravind Eye Care Systems [AECS] telemedicine programs). Data were taken from 11 and 48 neonatal care units in the US and India, respectively. Data were collected from January 2012 to July 2021, and data were analyzed from July to December 2023. Exposures: An imaging processing pipeline was created using deep learning to autonomously identify mtmROP and type 1 ROP in eye examinations performed via telemedicine. Main Outcomes and Measures: The area under the receiver operating characteristics curve (AUROC) as well as sensitivity and specificity for detection of mtmROP and type 1 ROP at the eye examination and patient levels. Results: The prevalence of mtmROP and type 1 ROP were 5.9% (91 of 1545) and 1.2% (18 of 1545), respectively, in the SUNDROP dataset and 6.2% (168 of 2699) and 2.5% (68 of 2699) in the AECS dataset. Examination-level AUROCs for mtmROP and type 1 ROP were 0.896 and 0.985, respectively, in the SUNDROP dataset and 0.920 and 0.982 in the AECS dataset. 
At the cross-sectional examination level, mtmROP detection had high sensitivity (SUNDROP: mtmROP, 83.5%; 95% CI, 76.6-87.7; type 1 ROP, 82.2%; 95% CI, 81.2-83.1; AECS: mtmROP, 80.8%; 95% CI, 76.2-84.9; type 1 ROP, 87.8%; 95% CI, 86.8-88.7). At the patient level, all infants who developed type 1 ROP screened positive (SUNDROP: 100%; 95% CI, 81.4-100; AECS: 100%; 95% CI, 94.7-100) prior to diagnosis. Conclusions and Relevance: Where and when ROP telemedicine programs can be implemented, autonomous ROP screening may be an effective force multiplier for secondary prevention of ROP.


Subject(s)
Retinopathy of Prematurity , Infant, Newborn , Infant , Child , Humans , Retinopathy of Prematurity/diagnosis , Artificial Intelligence , Cross-Sectional Studies , Gestational Age , Infant, Premature
9.
Radiol Artif Intell ; 6(1): e220231, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38197800

ABSTRACT

Purpose To present results from a literature survey on practices in deep learning segmentation algorithm evaluation and perform a study on expert quality perception of brain tumor segmentation. Materials and Methods A total of 180 articles reporting on brain tumor segmentation algorithms were surveyed for the reported quality evaluation. Additionally, ratings of segmentation quality on a four-point scale were collected from medical professionals for 60 brain tumor segmentation cases. Results Of the surveyed articles, Dice score, sensitivity, and Hausdorff distance were the most popular metrics to report segmentation performance. Notably, only 2.8% of the articles included clinical experts' evaluation of segmentation quality. The experimental results revealed a low interrater agreement (Krippendorff α, 0.34) in experts' segmentation quality perception. Furthermore, the correlations between the ratings and commonly used quantitative quality metrics were low (Kendall tau between Dice score and mean rating, 0.23; Kendall tau between Hausdorff distance and mean rating, 0.51), with large variability among the experts. Conclusion The results demonstrate that quality ratings are prone to variability due to the ambiguity of tumor boundaries and individual perceptual differences, and existing metrics do not capture the clinical perception of segmentation quality. Keywords: Brain Tumor Segmentation, Deep Learning Algorithms, Glioblastoma, Cancer, Machine Learning Clinical trial registration nos. NCT00756106 and NCT00662506 Supplemental material is available for this article. © RSNA, 2023.


Subject(s)
Brain Neoplasms , Deep Learning , Glioblastoma , Humans , Algorithms , Benchmarking , Brain Neoplasms/diagnostic imaging , Glioblastoma/diagnostic imaging
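The Dice score named above as the most commonly reported metric measures overlap between a predicted and a reference segmentation as 2|A∩B| / (|A| + |B|). A minimal sketch follows, assuming each mask is given as a collection of voxel coordinates; returning 1.0 when both masks are empty is a convention chosen here, not something the abstract specifies.

```python
def dice_score(predicted, reference):
    """Dice overlap between two masks given as iterables of voxel coordinates."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    overlap = len(predicted & reference)
    return 2 * overlap / (len(predicted) + len(reference))


# Two 3-voxel masks sharing 2 voxels.
example = dice_score({(0, 0), (0, 1), (1, 1)}, {(0, 1), (1, 1), (1, 2)})
```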
10.
Radiol Imaging Cancer ; 6(1): e230033, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38180338

ABSTRACT

Purpose To describe the design, conduct, and results of the Breast Multiparametric MRI for prediction of neoadjuvant chemotherapy Response (BMMR2) challenge. Materials and Methods The BMMR2 computational challenge opened on May 28, 2021, and closed on December 21, 2021. The goal of the challenge was to identify image-based markers derived from multiparametric breast MRI, including diffusion-weighted imaging (DWI) and dynamic contrast-enhanced (DCE) MRI, along with clinical data for predicting pathologic complete response (pCR) following neoadjuvant treatment. Data included 573 breast MRI studies from 191 women (mean age [±SD], 48.9 years ± 10.56) in the I-SPY 2/American College of Radiology Imaging Network (ACRIN) 6698 trial (ClinicalTrials.gov: NCT01042379). The challenge cohort was split into training (60%) and test (40%) sets, with teams blinded to test set pCR outcomes. Prediction performance was evaluated by area under the receiver operating characteristic curve (AUC) and compared with the benchmark established from the ACRIN 6698 primary analysis. Results Eight teams submitted final predictions. Entries from three teams had point estimators of AUC that were higher than the benchmark performance (AUC, 0.782 [95% CI: 0.670, 0.893], with AUCs of 0.803 [95% CI: 0.702, 0.904], 0.838 [95% CI: 0.748, 0.928], and 0.840 [95% CI: 0.748, 0.932]). A variety of approaches were used, ranging from extraction of individual features to deep learning and artificial intelligence methods, incorporating DCE and DWI alone or in combination. Conclusion The BMMR2 challenge identified several models with high predictive performance, which may further expand the value of multiparametric breast MRI as an early marker of treatment response. Clinical trial registration no. NCT01042379 Keywords: MRI, Breast, Tumor Response Supplemental material is available for this article. © RSNA, 2024.


Subject(s)
Breast Neoplasms , Multiparametric Magnetic Resonance Imaging , Female , Humans , Middle Aged , Artificial Intelligence , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/drug therapy , Magnetic Resonance Imaging , Neoadjuvant Therapy , Pathologic Complete Response , Adult
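Challenge entries above were ranked by area under the receiver operating characteristic curve (AUC). One way to compute it, sketched here under the assumption of binary outcomes coded 1 (for example pCR) and 0, is the pairwise definition: the fraction of positive-negative pairs in which the positive case receives the higher score, with ties counted as one half. The function name and example scores are hypothetical.

```python
def auc_pairwise(scores, labels):
    """AUC as the probability that a positive case outscores a negative case."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: one responder is ranked below a non-responder.
example_auc = auc_pairwise([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

The quadratic loop is adequate for cohorts of this size; rank-based formulas give the same value more efficiently on large datasets.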
11.
Elife ; 12: 2024 Jan 15.
Article in English | MEDLINE | ID: mdl-38224340

ABSTRACT

Background: The HPV-automated visual evaluation (PAVE) Study is an extensive, multinational initiative designed to advance cervical cancer prevention in resource-constrained regions. Cervical cancer disproportionally affects regions with limited access to preventive measures. PAVE aims to assess a novel screening-triage-treatment strategy integrating self-sampled HPV testing, deep-learning-based automated visual evaluation (AVE), and targeted therapies. Methods: Phase 1 efficacy involves screening up to 100,000 women aged 25-49 across nine countries, using self-collected vaginal samples for hierarchical HPV evaluation: HPV16, else HPV18/45, else HPV31/33/35/52/58, else HPV39/51/56/59/68, else negative. HPV-positive individuals undergo further evaluation, including pelvic exams, cervical imaging, and biopsies. AVE algorithms analyze images, assigning risk scores for precancer, validated against histologic high-grade precancer. Phase 1, however, does not integrate AVE results into patient management, contrasting them with local standard care. Phase 2 effectiveness focuses on deploying AVE software and HPV genotype data in real-time clinical decision-making, evaluating feasibility, acceptability, cost-effectiveness, and health communication of the PAVE strategy in practice. Results: Currently, sites have commenced fieldwork, and conclusive results are pending. Conclusions: The study aspires to validate a screen-triage-treat protocol utilizing innovative biomarkers to deliver an accurate, feasible, and cost-effective strategy for cervical cancer prevention in resource-limited areas. Should the study validate PAVE, its broader implementation could be recommended, potentially expanding cervical cancer prevention worldwide. Funding: The consortial sites are responsible for their own study costs. 
Research equipment and supplies, and the NCI-affiliated staff are funded by the National Cancer Institute Intramural Research Program including supplemental funding from the Cancer Cures Moonshot Initiative. No commercial support was obtained. Brian Befano was supported by NCI/ NIH under Grant T32CA09168.


Subject(s)
Papillomavirus Infections , Uterine Cervical Neoplasms , Humans , Female , Uterine Cervical Neoplasms/diagnosis , Uterine Cervical Neoplasms/prevention & control , Early Detection of Cancer , Papillomavirus Infections/diagnosis , Vagina , Algorithms
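The hierarchical HPV evaluation in the Methods above (HPV16, else HPV18/45, else HPV31/33/35/52/58, else HPV39/51/56/59/68, else negative) amounts to returning the first group in a fixed order that contains any detected type. The sketch below assumes the assay output is available as a collection of individual type numbers, which the abstract does not state; the names are illustrative.

```python
# Groups in the order given in the abstract, highest priority first.
HPV_HIERARCHY = [
    ("HPV16", {"16"}),
    ("HPV18/45", {"18", "45"}),
    ("HPV31/33/35/52/58", {"31", "33", "35", "52", "58"}),
    ("HPV39/51/56/59/68", {"39", "51", "56", "59", "68"}),
]


def hpv_risk_group(detected_types):
    """Return the highest-priority group containing any detected HPV type."""
    detected = {str(t) for t in detected_types}
    for label, members in HPV_HIERARCHY:
        if detected & members:
            return label
    return "negative"


# A sample positive for types 52 and 18 falls in the HPV18/45 group.
group = hpv_risk_group({"52", "18"})
```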
12.
Clin Cancer Res ; 30(7): 1327-1337, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38252427

ABSTRACT

PURPOSE: Adverse clinical events cause significant morbidity in patients with glioblastoma (GBM). We examined whether genomic alterations were associated with adverse events (AE) in patients with GBM. EXPERIMENTAL DESIGN: We identified adults with histologically confirmed IDH-wild-type GBM with targeted next-generation sequencing (OncoPanel) at Dana-Farber Cancer Institute from 2013 to 2019. Seizure at presentation, lymphopenia, thromboembolic events, pseudoprogression, and early progression (within 6 months of diagnosis) were identified as AE. The biologic function of genetic variants was categorized as loss-of-function (LoF), no change in function, or gain-of-function (GoF) using a somatic tumor mutation knowledge base (OncoKB) and consensus protein function predictions. Associations between functional genomic alterations and AE were examined using univariate logistic regressions and multivariable regressions adjusted for additional clinical predictors. RESULTS: Our study included 470 patients diagnosed with GBM who met the study criteria. We focused on 105 genes that had sequencing data available for ≥90% of the patients and were altered in ≥10% of the cohort. Following false-discovery rate (FDR) correction and multivariable adjustment, the TP53, RB1, IGF1R, and DIS3 LoF alterations were associated with lower odds of seizures, while EGFR, SMARCA4, GNA11, BRD4, and TCF3 GoF and SETD2 LoF alterations were associated with higher odds of seizures. For all other AE of interest, no significant associations were found with genomic alterations following FDR correction. CONCLUSIONS: Genomic biomarkers based on functional variant analysis of a routine clinical panel may help identify AE in GBM, particularly seizures. Identifying these risk factors could improve the management of patients through better supportive care and consideration of prophylactic therapies.


Subject(s)
Brain Neoplasms , Glioblastoma , Adult , Humans , Glioblastoma/genetics , Glioblastoma/pathology , Nuclear Proteins/genetics , Transcription Factors/genetics , Brain Neoplasms/genetics , Brain Neoplasms/pathology , Genomics , Seizures/genetics , Mutation , DNA Helicases/genetics , Bromodomain Containing Proteins , Cell Cycle Proteins/genetics
14.
Acad Radiol ; 31(4): 1572-1582, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37951777

ABSTRACT

RATIONALE AND OBJECTIVES: Brain tumor segmentations are integral to the clinical management of patients with glioblastoma, the deadliest primary brain tumor in adults. The manual delineation of tumors is time-consuming and highly provider-dependent. These two problems must be addressed by introducing automated, deep-learning-based segmentation tools. This study aimed to identify criteria experts use to evaluate the quality of automatically generated segmentations and their thought processes as they correct them. MATERIALS AND METHODS: Multiple methods were used to develop a detailed understanding of the complex factors that shape experts' perception of segmentation quality and their thought processes in correcting proposed segmentations. Data from a questionnaire and semistructured interview with neuro-oncologists and neuroradiologists were collected between August and December 2021 and analyzed using a combined deductive and inductive approach. RESULTS: Brain tumors are highly complex and ambiguous segmentation targets. Therefore, physicians rely heavily on the given context related to the patient and clinical context in evaluating the quality and need to correct brain tumor segmentation. Most importantly, the intended clinical application determines the segmentation quality criteria and editing decisions. Physicians' personal beliefs and preferences about the capabilities of AI algorithms and whether questionable areas should not be included are additional criteria influencing the perception of segmentation quality and appearance of an edited segmentation. CONCLUSION: Our findings on experts' perceptions of segmentation quality will allow the design of improved frameworks for expert-centered evaluation of brain tumor segmentation models. In particular, the knowledge presented here can inspire the development of brain tumor-specific metrics for segmentation model training and evaluation.


Subject(s)
Brain Neoplasms , Glioblastoma , Adult , Humans , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/pathology , Algorithms , Glioblastoma/pathology , Pattern Recognition, Automated/methods , Tumor Burden , Magnetic Resonance Imaging/methods , Image Processing, Computer-Assisted/methods
15.
J Natl Cancer Inst ; 116(1): 26-33, 2024 Jan 10.
Article in English | MEDLINE | ID: mdl-37758250

ABSTRACT

Novel screening and diagnostic tests based on artificial intelligence (AI) image recognition algorithms are proliferating. Some initial reports claim outstanding accuracy followed by disappointing lack of confirmation, including our own early work on cervical screening. This is a presentation of lessons learned, organized as a conceptual step-by-step approach to bridge the gap between the creation of an AI algorithm and clinical efficacy. The first fundamental principle is specifying rigorously what the algorithm is designed to identify and what the test is intended to measure (eg, screening, diagnostic, or prognostic). Second, designing the AI algorithm to minimize the most clinically important errors. For example, many equivocal cervical images cannot yet be labeled because the borderline between cases and controls is blurred. To avoid a misclassified case-control dichotomy, we have isolated the equivocal cases and formally included an intermediate, indeterminate class (severity order of classes: case>indeterminate>control). The third principle is evaluating AI algorithms like any other test, using clinical epidemiologic criteria. Repeatability of the algorithm at the borderline, for indeterminate images, has proven extremely informative. Distinguishing between internal and external validation is also essential. Linking the AI algorithm results to clinical risk estimation is the fourth principle. Absolute risk (not relative) is the critical metric for translating a test result into clinical use. Finally, generating risk-based guidelines for clinical use that match local resources and priorities is the last principle in our approach. We are particularly interested in applications to lower-resource settings to address health disparities. We note that similar principles apply to other domains of AI-based image analysis for medical diagnostic testing.


Subject(s)
Artificial Intelligence , Uterine Cervical Neoplasms , Female , Humans , Early Detection of Cancer , Uterine Cervical Neoplasms/diagnosis , Algorithms , Image Processing, Computer-Assisted
16.
J Low Genit Tract Dis ; 28(1): 37-42, 2024 Jan 01.
Article in English | MEDLINE | ID: mdl-37963327

ABSTRACT

OBJECTIVES/PURPOSE: The reproducibility and sensitivity of image-based colposcopy is low, but agreement on lesion presence and location remains to be explored. Here, we investigate the interobserver agreement on lesions on colposcopic images by evaluating and comparing marked lesions on digitized colposcopic images between colposcopists. METHODS: Five colposcopists reviewed images from 268 colposcopic examinations. Cases were selected based on histologic diagnosis, i.e., normal/cervical intraepithelial neoplasia (CIN)1 (n = 50), CIN2 (n = 50), CIN3 (n = 100), adenocarcinoma in situ (n = 53), and cancer (n = 15). We obtained digitized time-series images every 7-10 seconds from before acetic acid application to 2 minutes after application. Colposcopists were instructed to digitally annotate all areas with acetowhitening or suspicious for lesions. To estimate the agreement on lesion presence and location, we assessed the proportion of images with annotations and the proportion of images with overlapping annotated area by at least 4 (4+) colposcopists, respectively. RESULTS: We included images from 241 examinations (1 image from each) with adequate annotations. The proportion with at least 1 lesion annotated by 4+ colposcopists increased by severity of histologic diagnosis. Among the CIN3 cases, 84% had at least 1 lesion annotated by 4+ colposcopists, whereas 54% of normal/CIN1 cases had a lesion annotated. Notably, the proportion was 70% for adenocarcinoma in situ and 71% for cancer. Regarding lesion location, there was no linear association with severity of histologic diagnosis. CONCLUSION: Although 80% of the CIN2 and CIN3 cases were annotated by 4+ colposcopists, we did not find increasing agreement on lesion location with histology severity. This underlines the subjective nature of colposcopy.


Subject(s)
Adenocarcinoma in Situ , Uterine Cervical Dysplasia , Uterine Cervical Neoplasms , Female , Pregnancy , Humans , Colposcopy/methods , Uterine Cervical Neoplasms/diagnosis , Uterine Cervical Neoplasms/pathology , Reproducibility of Results , Uterine Cervical Dysplasia/pathology
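
The presence-agreement measure in the abstract above (the proportion of images with a lesion annotated by at least 4 of 5 colposcopists) can be illustrated with a minimal Python sketch. The function name, the data layout (one list of per-rater booleans per image), and the toy data are assumptions for illustration only; they are not taken from the study, and the sketch does not cover the overlap-based location measure.

```python
def proportion_with_agreement(annotations, min_raters=4):
    """Return the fraction of images marked by at least `min_raters` raters.

    `annotations` is a hypothetical layout: one inner list per image,
    holding one boolean per rater (True = rater annotated a lesion).
    """
    if not annotations:
        raise ValueError("at least one image is required")
    # Count images where enough raters marked a lesion.
    hits = sum(1 for raters in annotations if sum(raters) >= min_raters)
    return hits / len(annotations)


# Toy data (not from the study): 4 images, 5 raters each.
toy = [
    [True, True, True, True, True],       # 5 raters agree
    [True, True, True, True, False],      # 4 raters agree
    [True, False, False, False, False],   # 1 rater only
    [False, False, False, False, False],  # nobody annotated
]
share = proportion_with_agreement(toy, min_raters=4)
```

With this toy input, two of the four images reach the 4+ threshold.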
17.
Ophthalmol Sci ; 4(2): 100417, 2024.
Article in English | MEDLINE | ID: mdl-38059124

ABSTRACT

Purpose: Retinopathy of prematurity (ROP) is one of the leading causes of blindness in children. Although the role of oxygen in the pathophysiology of ROP is well established, a precise understanding of the dynamic relationship between oxygen exposure and ROP incidence and severity is lacking. The purpose of this study was to evaluate the correlation between time-dependent oxygen variables and the onset of ROP. Design: Retrospective cohort study. Participants: Two hundred thirty infants who were born at a single academic center and met the inclusion criteria were included. Infants were mainly born between January 2011 and October 2022. Methods: Patient data with sufficient time-dependent oxygen data were extracted from electronic health records (EHRs). Clinical outcomes for ROP were recorded as none/mild or moderate/severe (defined as type II or worse). Mixed-effects linear models were used to compare the 2 groups in terms of dynamic oxygen variables, such as the daily average and the coefficient of variation (COV) of the fraction of inspired oxygen (FiO2). Support vector machine (SVM) and long short-term memory (LSTM)-based multimodal models were trained with fivefold cross-validation to predict which infants would develop moderate/severe ROP. Gestational age (GA), birth weight, and time-dependent oxygen variables were used to develop predictive models. Main Outcome Measures: Model cross-validation performance was evaluated by computing the mean area under the receiver operating characteristic (AUROC) curve, precision, recall, and F1 score. Results: We found that both daily average and COV of FiO2 were associated with more severe ROP (adjusted P < 0.001). With fivefold cross-validation, the multimodal LSTM models had higher performance than the best static models (SVM using GA and 3 average FiO2 features) and SVM models trained on GA alone (mean AUROC = 0.89 ± 0.04 vs. 0.86 ± 0.05 vs. 0.83 ± 0.04). Conclusions: The development of severe ROP might be influenced not only by oxygen exposure but also by its fluctuation, which provides direction for future study of pathophysiological factors associated with severe ROP development. Additionally, we demonstrated that multimodal neural networks can be a method to extract useful information from time-series data, which may be a valuable methodology for the investigation of other diseases using EHR data. Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
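
The two dynamic oxygen variables named in the abstract above, the daily average and the coefficient of variation (COV) of FiO2, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the input layout (a dict mapping a day index to that day's FiO2 readings), and the use of the population standard deviation are all hypothetical choices, since the abstract does not say how the authors computed these features.

```python
from statistics import mean, pstdev


def daily_fio2_features(readings_by_day):
    """Return {day: (daily_mean, coefficient_of_variation)}.

    `readings_by_day` is a hypothetical layout: day index -> list of FiO2
    readings as fractions (0.21 = room air). COV is computed here as the
    population standard deviation divided by the mean (an assumption).
    """
    features = {}
    for day, values in readings_by_day.items():
        if not values:
            raise ValueError(f"day {day} has no readings")
        avg = mean(values)
        if avg == 0:
            raise ValueError(f"day {day} has zero mean; COV is undefined")
        features[day] = (avg, pstdev(values) / avg)
    return features


# Toy readings (not patient data).
toy_readings = {
    1: [0.21, 0.21, 0.21],  # stable on room air
    2: [0.2, 0.4],          # fluctuating oxygen requirement
}
toy_features = daily_fio2_features(toy_readings)
```

A stable day yields a COV of zero, while a fluctuating day yields a positive COV, which is the kind of contrast the abstract associates with ROP severity.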

18.
Sci Rep ; 13(1): 21772, 2023 12 08.
Article in English | MEDLINE | ID: mdl-38066031

ABSTRACT

Cervical cancer is a leading cause of cancer mortality, with approximately 90% of the 250,000 deaths per year occurring in low- and middle-income countries (LMIC). Secondary prevention with cervical screening involves detecting and treating precursor lesions; however, scaling screening efforts in LMIC has been hampered by infrastructure and cost constraints. Recent work has supported the development of an artificial intelligence (AI) pipeline on digital images of the cervix to achieve an accurate and reliable diagnosis of treatable precancerous lesions. In particular, WHO guidelines emphasize visual triage of women testing positive for human papillomavirus (HPV) as the primary screen, and AI could assist in this triage task. In this work, we implemented a comprehensive deep-learning model selection and optimization study on a large, collated, multi-geography, multi-institution, and multi-device dataset of 9462 women (17,013 images). We evaluated relative portability, repeatability, and classification performance. The top performing model, when combined with HPV type, achieved an area under the Receiver Operating Characteristics (ROC) curve (AUC) of 0.89 within our study population of interest, and a limited total extreme misclassification rate of 3.4%, on held-aside test sets. Our model also produced reliable and consistent predictions, achieving a strong quadratic weighted kappa (QWK) of 0.86 and a minimal %2-class disagreement (% 2-Cl. D.) of 0.69%, between image pairs across women. Our work is among the first efforts at designing a robust, repeatable, accurate and clinically translatable deep-learning model for cervical screening.


Subject(s)
Papillomavirus Infections , Uterine Cervical Neoplasms , Humans , Female , Cervix Uteri/pathology , Papillomavirus Infections/epidemiology , Artificial Intelligence , Early Detection of Cancer/methods , Mass Screening/methods , Neural Networks, Computer
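
The quadratic weighted kappa (QWK) reported in the abstract above is a standard agreement statistic for ordinal labels. A minimal pure-Python sketch is given below, assuming integer class labels from 0 to `n_classes - 1`; the function name and the toy labels are illustrative and are not the authors' implementation. With weights w_ij = (i - j)^2 / (k - 1)^2, observed counts O_ij, and chance-expected counts E_ij, QWK = 1 - sum(w_ij * O_ij) / sum(w_ij * E_ij).

```python
def quadratic_weighted_kappa(labels_a, labels_b, n_classes):
    """Quadratic weighted kappa between two equal-length label sequences.

    Labels are assumed to be integers in range(n_classes), n_classes >= 2.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal length")
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")
    n = len(labels_a)
    # Observed confusion matrix between the two sets of labels.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for x, y in zip(labels_a, labels_b):
        observed[x][y] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_classes))
              for j in range(n_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two label sets.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all labels fall in one class")
    return 1.0 - weighted_observed / weighted_expected


# Toy labels (not study data): 3 ordinal classes.
perfect = quadratic_weighted_kappa([0, 1, 2, 1], [0, 1, 2, 1], n_classes=3)
opposite = quadratic_weighted_kappa([0, 0, 2, 2], [2, 2, 0, 0], n_classes=3)
```

Identical label sequences give a kappa of 1, and the quadratic weights penalize disagreements between distant classes more heavily than between adjacent ones.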
19.
Neuro Oncol ; 2023 Dec 09.
Article in English | MEDLINE | ID: mdl-38070147

ABSTRACT

BACKGROUND: We recently conducted a phase 2 trial (NCT02886585) evaluating intracranial efficacy of pembrolizumab for brain metastases (BM) of diverse histologies. Our study met its primary efficacy endpoint and illustrates that pembrolizumab exerts promising activity in a select group of patients with BM. Given the importance of aberrant vasculature in mediating immunosuppression, we explored the relationship between immune checkpoint inhibitor (ICI) efficacy and vascular architecture in the hopes of identifying potential mechanisms of intracranial ICI response or resistance for BM. METHODS: Using Vessel Architectural Imaging (VAI), a histologically validated quantitative metric for in vivo tumor vascular physiology, we analyzed dual echo DSC/DCE MRI for 44 patients on trial. Tumor and peri-tumor cerebral blood volume/flow, vessel size, arterial- and venous-dominance, and vascular permeability were measured before and after treatment with pembrolizumab. RESULTS: BM that progressed on ICI were characterized by a highly aberrant vasculature dominated by large-caliber vessels. In contrast, ICI-responsive BM possessed a more structurally balanced vasculature consisting of both small and large vessels, and there was a trend towards a decrease in under-perfused tissue, suggesting a reversal of the negative effects of hypoxia. In the peri-tumor region, development of smaller blood vessels, consistent with neo-angiogenesis, was associated with tumor growth before radiographic evidence of contrast enhancement on anatomical MRI. CONCLUSIONS: This study, one of the largest functional imaging studies for BM, suggests that vascular architecture is linked with ICI efficacy. Studies identifying modulators of vascular architecture, and effects on immune activity, are warranted and may inform future combination treatments.
