ABSTRACT
Significant advances in artificial intelligence (AI) over the past decade may lead to dramatic effects on clinical practice. Digitized histology represents an area ripe for AI implementation. We describe several current needs within the world of gastrointestinal histopathology and outline, using currently studied models, how AI can potentially address them. We also highlight pitfalls as AI makes inroads into clinical practice.
Subject(s)
Artificial Intelligence , Gastrointestinal Diseases , Humans , Gastrointestinal Diseases/pathology , Gastrointestinal Diseases/diagnosis , Gastrointestinal Tract/pathology , Histocytochemistry/methods
ABSTRACT
A Food and Drug Administration (FDA)-cleared artificial intelligence (AI) algorithm misdiagnosed a finding as an intracranial hemorrhage in a patient who was ultimately diagnosed with an ischemic stroke. This scenario highlights a notable failure mode of AI tools and emphasizes the importance of human-machine interaction. In this report, the authors summarize the review processes used by the FDA for software as a medical device and the unique regulatory designs for radiologic AI/machine learning algorithms intended to ensure their safety in clinical practice. The authors then discuss the challenges that clinical implementation poses to maximizing the efficacy of these tools.
Subject(s)
Algorithms , Artificial Intelligence , United States , Humans , United States Food and Drug Administration , Software , Machine Learning
ABSTRACT
Background Multiparametric MRI can help identify clinically significant prostate cancer (csPCa) (Gleason score ≥7) but is limited by reader experience and interobserver variability. In contrast, deep learning (DL) produces deterministic outputs. Purpose To develop a DL model to predict the presence of csPCa by using patient-level labels without information about tumor location and to compare its performance with that of radiologists. Materials and Methods Data from patients without known csPCa who underwent MRI from January 2017 to December 2019 at one of multiple sites of a single academic institution were retrospectively reviewed. A convolutional neural network was trained to predict csPCa from T2-weighted images, diffusion-weighted images, apparent diffusion coefficient maps, and T1-weighted contrast-enhanced images. The reference standard was pathologic diagnosis. Radiologist performance was evaluated as follows: Radiology reports were used for the internal test set, and four radiologists' PI-RADS ratings were used for the external (ProstateX) test set. The performance was compared using areas under the receiver operating characteristic curves (AUCs) and the DeLong test. Gradient-weighted class activation maps (Grad-CAMs) were used to show tumor localization. Results Among 5735 examinations in 5215 patients (mean age, 66 years ± 8 [SD]; all male), 1514 examinations (1454 patients) showed csPCa. In the internal test set (400 examinations), the AUC was 0.89 and 0.89 for the DL classifier and radiologists, respectively (P = .88). In the external test set (204 examinations), the AUC was 0.86 and 0.84 for the DL classifier and radiologists, respectively (P = .68). DL classifier plus radiologists had an AUC of 0.89 (P < .001). Grad-CAMs demonstrated activation over the csPCa lesion in 35 of 38 and 56 of 58 true-positive examinations in internal and external test sets, respectively. Conclusion The performance of a DL model was not different from that of radiologists in the detection of csPCa at MRI, and Grad-CAMs localized the tumor. © RSNA, 2024 Supplemental material is available for this article. See also the editorial by Johnson and Chandarana in this issue.
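The study above compares classifier and radiologist AUCs with the DeLong test. As a simpler stand-in, the sketch below estimates the AUC difference with a paired bootstrap on simulated labels and scores; all data and variable names are illustrative, not from the study.

```python
# Paired bootstrap comparison of two AUCs (illustrative stand-in for the DeLong test).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=400)                 # hypothetical csPCa labels
dl_score = y * 0.6 + rng.normal(0, 0.4, 400)     # hypothetical DL classifier scores
rad_score = y * 0.5 + rng.normal(0, 0.4, 400)    # hypothetical radiologist scores (e.g., PI-RADS)

diffs = []
for _ in range(2000):
    idx = rng.integers(0, len(y), len(y))        # resample cases with replacement
    if len(np.unique(y[idx])) < 2:               # need both classes to compute an AUC
        continue
    diffs.append(roc_auc_score(y[idx], dl_score[idx]) -
                 roc_auc_score(y[idx], rad_score[idx]))

diffs = np.array(diffs)
ci = np.percentile(diffs, [2.5, 97.5])
print(f"AUC difference: {diffs.mean():.3f}, 95% CI [{ci[0]:.3f}, {ci[1]:.3f}]")
```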
Subject(s)
Deep Learning , Magnetic Resonance Imaging , Prostatic Neoplasms , Male , Humans , Prostatic Neoplasms/diagnostic imaging , Retrospective Studies , Aged , Middle Aged , Magnetic Resonance Imaging/methods , Image Interpretation, Computer-Assisted/methods , Multiparametric Magnetic Resonance Imaging/methods , Prostate/diagnostic imaging , Prostate/pathology
ABSTRACT
RATIONALE & OBJECTIVE: Simple kidney cysts, which are common and usually considered of limited clinical relevance, are associated with older age and lower glomerular filtration rate (GFR), but little has been known of their association with progressive chronic kidney disease (CKD). STUDY DESIGN: Observational cohort study. SETTING & PARTICIPANTS: Patients with presurgical computed tomography or magnetic resonance imaging who underwent a radical nephrectomy for a tumor; we reviewed the retained kidney images to characterize parenchymal cysts at least 5 mm in diameter according to size and location. EXPOSURE: Parenchymal cysts at least 5 mm in diameter in the retained kidney. Cyst characteristics were correlated with microstructural findings on kidney histology. OUTCOME: Progressive CKD defined by dialysis, kidney transplantation, a sustained ≥40% decline in eGFR for at least 3 months, or an eGFR <10 mL/min/1.73 m2 that was at least 5 mL/min/1.73 m2 below the postnephrectomy baseline for at least 3 months. ANALYTICAL APPROACH: Cox models assessed the risk of progressive CKD. Models adjusted for baseline age, sex, body mass index, hypertension, diabetes, eGFR, proteinuria, and tumor volume. Nonparametric Spearman's correlations were used to examine the association of the number and size of the cysts with clinical characteristics, kidney function, and kidney volumes. RESULTS: There were 1,195 patients with 50 progressive CKD events over a median 4.4 years of follow-up evaluation. On baseline imaging, 38% had at least 1 cyst, 34% had at least 1 cortical cyst, and 8.7% had at least 1 medullary cyst. A higher number of cysts was associated with progressive CKD and was modestly correlated with larger nephrons and more nephrosclerosis on kidney histology. The number of medullary cysts was more strongly associated with progressive CKD than the number of cortical cysts. LIMITATIONS: Patients who undergo a radical nephrectomy may differ from the general population. A radical nephrectomy may accelerate the risk of progressive CKD. Genetic testing was not performed. CONCLUSIONS: Cysts in the kidney, particularly the medulla, should be further examined as a potentially useful imaging biomarker of progressive CKD beyond the current clinical evaluation of kidney function and common CKD risk factors. PLAIN-LANGUAGE SUMMARY: Kidney cysts are common and often are considered of limited clinical relevance despite being associated with lower glomerular filtration rate. We studied a large cohort of patients who had a kidney removed due to a tumor to determine whether cysts in the retained kidney were associated with kidney health in the future. We found that more cysts in the kidney and, in particular, cysts in the deepest tissue of the kidney (the medulla) were associated with progressive kidney disease, including kidney failure where dialysis or a kidney transplantation is needed. Patients with cysts in the kidney medulla may benefit from closer monitoring.
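A minimal sketch of the kind of adjusted Cox proportional hazards model described above, using the lifelines package on synthetic data; the covariates, column names, and values are illustrative, not the study's dataset.

```python
# Cox proportional hazards sketch on simulated data (lifelines), mirroring an
# adjusted analysis of cyst count vs progressive CKD; all values are invented.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "follow_up_years": rng.exponential(4.0, n),   # time to event or censoring
    "progressive_ckd": rng.integers(0, 2, n),     # event indicator (1 = event)
    "n_cysts": rng.poisson(1.5, n),               # exposure of interest
    "age": rng.normal(64, 10, n),
    "egfr": rng.normal(75, 15, n),
    "hypertension": rng.integers(0, 2, n),
})

cph = CoxPHFitter()
cph.fit(df, duration_col="follow_up_years", event_col="progressive_ckd")
cph.print_summary()   # hazard ratios with 95% CIs for each covariate
```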
Subject(s)
Disease Progression , Glomerular Filtration Rate , Kidney Diseases, Cystic , Nephrectomy , Renal Insufficiency, Chronic , Humans , Male , Female , Middle Aged , Renal Insufficiency, Chronic/epidemiology , Renal Insufficiency, Chronic/etiology , Kidney Diseases, Cystic/diagnostic imaging , Kidney Diseases, Cystic/pathology , Kidney Diseases, Cystic/surgery , Kidney Diseases, Cystic/etiology , Aged , Kidney Neoplasms/surgery , Kidney Neoplasms/pathology , Cohort Studies , Magnetic Resonance Imaging , Postoperative Complications/etiology , Postoperative Complications/epidemiology , Retrospective Studies , Tomography, X-Ray Computed
ABSTRACT
OBJECTIVE: To develop a whole-body low-dose CT (WBLDCT) deep learning model and determine its accuracy in predicting the presence of cytogenetic abnormalities in multiple myeloma (MM). MATERIALS AND METHODS: WBLDCTs of MM patients performed within a year of diagnosis were included. Cytogenetic assessments of clonal plasma cells via fluorescent in situ hybridization (FISH) were used to risk-stratify patients as high-risk (HR) or standard-risk (SR). Presence of any of del(17p), t(14;16), t(4;14), and t(14;20) on FISH was defined as HR. The dataset was evenly divided into five groups (folds) at the individual patient level for model training. Mean and standard deviation (SD) of the area under the receiver operating characteristic curve (AUROC) across the folds were recorded. RESULTS: One hundred fifty-one patients with MM were included in the study. The model performed best for t(4;14), with a mean (SD) AUROC of 0.874 (0.073). The lowest AUROC was observed for trisomies, with an AUROC of 0.717 (0.058). Two- and 5-year survival rates for HR cytogenetics were 87% and 71%, respectively, compared to 91% and 79% for SR cytogenetics. Survival predictions by the WBLDCT deep learning model revealed 2- and 5-year survival rates for patients with HR cytogenetics as 87% and 71%, respectively, compared to 92% and 81% for SR cytogenetics. CONCLUSION: A deep learning model trained on WBLDCT scans predicted the presence of cytogenetic abnormalities used for risk stratification in MM. Assessment of the model's performance revealed good to excellent classification of the various cytogenetic abnormalities.
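The evaluation above reports mean and SD AUROC across five patient-level folds. The sketch below shows that scheme with a simple stand-in classifier on random features; the real study trained a CT deep learning model, and every value here is simulated.

```python
# Patient-level 5-fold cross-validation with mean/SD AUROC, using a stand-in classifier.
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 151                                   # one feature vector per patient, for simplicity
X = rng.normal(size=(n, 16))              # stand-in for image-derived features
y = rng.integers(0, 2, n)                 # high-risk (1) vs standard-risk (0) cytogenetics
patient_id = np.arange(n)                 # grouping keeps each patient in a single fold

aucs = []
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=patient_id):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    prob = clf.predict_proba(X[test_idx])[:, 1]
    aucs.append(roc_auc_score(y[test_idx], prob))

print(f"AUROC mean {np.mean(aucs):.3f}, SD {np.std(aucs):.3f}")
```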
ABSTRACT
BACKGROUND: This study introduces THA-Net, a deep learning inpainting algorithm for simulating postoperative total hip arthroplasty (THA) radiographs from a single preoperative pelvis radiograph input, while being able to generate predictions either unconditionally (algorithm chooses implants) or conditionally (surgeon chooses implants). METHODS: THA-Net is a deep learning algorithm that receives an input preoperative radiograph and replaces the target hip joint with THA implants to generate a synthetic yet realistic postoperative radiograph. We trained THA-Net on 356,305 pairs of radiographs from 14,357 patients from a single institution's total joint registry and evaluated the validity (quality of surgical execution) and realism (ability to differentiate real and synthetic radiographs) of its outputs against both human-based and software-based criteria. RESULTS: The surgical validity of synthetic postoperative radiographs was significantly higher than that of their real counterparts (mean difference: 0.8 to 1.1 points on a 10-point Likert scale, P < .001), yet blinded expert reviewers could not differentiate real from synthetic radiographs. Synthetic images also showed excellent validity and realism when analyzed with previously validated deep learning models. CONCLUSION: We developed a THA next-generation templating tool that can generate synthetic radiographs graded higher on ultimate surgical execution than the real radiographs in the training data. Further refinement of this tool may potentiate patient-specific surgical planning and enable technologies such as robotics, navigation, and augmented reality (an online demo of THA-Net is available at: https://demo.osail.ai/tha_net).
Subject(s)
Arthroplasty, Replacement, Hip , Deep Learning , Hip Prosthesis , Humans , Arthroplasty, Replacement, Hip/methods , Hip Joint/diagnostic imaging , Hip Joint/surgery , Radiography , Retrospective Studies
ABSTRACT
BACKGROUND: Revision total hip arthroplasty (THA) requires preoperatively identifying in situ implants, a time-consuming and sometimes unachievable task. Although deep learning (DL) tools have been developed to automate this process, existing approaches are limited: they classify only a few femoral and no acetabular components, work only on anterior-posterior (AP) radiographs, and do not report prediction uncertainty or flag outlier data. METHODS: This study introduces the Total Hip Arthroplasty Automated Implant Detector (THA-AID), a DL tool trained on 241,419 radiographs that identifies common designs of 20 femoral and 8 acetabular components from AP, lateral, or oblique views, reports prediction uncertainty using conformal prediction, and performs outlier detection using a custom framework. We evaluated THA-AID using internal, external, and out-of-domain test sets and compared its performance with that of human experts. RESULTS: THA-AID achieved internal test set accuracies of 98.9% for both femoral and acetabular components, with no significant differences based on radiographic view. The femoral classifier also achieved 97.0% accuracy on the external test set. Adding conformal prediction increased true label prediction by 0.1% for acetabular and 0.7 to 0.9% for femoral components. More than 99% of out-of-domain and more than 89% of in-domain outlier data were correctly identified by THA-AID. CONCLUSIONS: THA-AID is an automated tool for implant identification from radiographs with exceptional performance on internal and external test sets and no decrement in performance based on radiographic view. To our knowledge, this is the first study in orthopedics to include uncertainty quantification and outlier detection for a DL model.
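THA-AID reports uncertainty with conformal prediction. The sketch below shows generic split conformal prediction for a multi-class classifier, not the study's specific framework; the softmax probabilities and class count are random placeholders.

```python
# Split conformal prediction sketch: calibrate a score threshold so prediction sets
# cover the true label ~95% of the time. Probabilities are stand-ins for a trained model.
import numpy as np

rng = np.random.default_rng(0)
n_cal, n_classes, alpha = 1000, 20, 0.05               # e.g., 20 femoral component designs

cal_probs = rng.dirichlet(np.ones(n_classes), n_cal)    # calibration-set softmax outputs
cal_labels = rng.integers(0, n_classes, n_cal)

# Nonconformity score: 1 - probability assigned to the true class.
scores = 1.0 - cal_probs[np.arange(n_cal), cal_labels]
q = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal)

# Prediction set for a new case: every class whose score falls under the threshold.
test_probs = rng.dirichlet(np.ones(n_classes))
prediction_set = np.where(1.0 - test_probs <= q)[0]
print("Prediction set (class indices):", prediction_set)
```

Small prediction sets indicate confident cases; large sets flag radiographs a surgeon may want to review manually.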
Subject(s)
Arthroplasty, Replacement, Hip , Deep Learning , Hip Prosthesis , Humans , Uncertainty , Acetabulum/surgery , Retrospective Studies
ABSTRACT
Accelerated by the adoption of remote monitoring during the COVID-19 pandemic, interest in using digitally captured behavioral data to predict patient outcomes has grown; however, it is unclear how feasible digital phenotyping studies may be in patients with recent ischemic stroke or transient ischemic attack. In this perspective, we present participant feedback and relevant smartphone data metrics suggesting that digital phenotyping of post-stroke depression is feasible. Additionally, we proffer thoughtful considerations for designing feasible real-world study protocols tracking cerebrovascular dysfunction with smartphone sensors.
Subject(s)
COVID-19 , Cerebrovascular Disorders , Phenotype , Smartphone , Humans , COVID-19/virology , COVID-19/diagnosis , Cerebrovascular Disorders/diagnosis , Feasibility Studies , SARS-CoV-2/isolation & purification , Monitoring, Physiologic/methods , Monitoring, Physiologic/instrumentation , Pandemics , Male
ABSTRACT
New image-derived biomarkers for patients affected by autosomal dominant polycystic kidney disease are needed to improve current clinical management. The measurement of total kidney volume (TKV) provides critical information for clinicians to drive care decisions. However, patients with similar TKV may present with very different phenotypes, often requiring subjective decisions based on other factors (e.g., the appearance of healthy kidney parenchyma or a few cysts contributing significantly to overall TKV). In this study, we describe a new technique to individually segment cysts and quantify biometric parameters including cyst volume, cyst number, parenchyma volume, and cyst-parenchyma surface area. Using data from the Consortium for Radiologic Imaging Studies of Polycystic Kidney Disease (CRISP) study, the utility of these new parameters was explored both quantitatively and visually. Total cyst number and cyst-parenchyma surface area showed superior prediction of the slope of estimated glomerular filtration rate decline, kidney failure, and chronic kidney disease stages 3A, 3B, and 4 compared to TKV. In addition, presentations such as a few large cysts contributing significantly to overall kidney volume were much better stratified in terms of outcome predictions. Thus, these new image biomarkers, which can be obtained automatically, will have great utility in future studies and clinical care for patients affected by autosomal dominant polycystic kidney disease.
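Cyst number and per-cyst volume can be derived from a cyst segmentation by connected-component labeling; the sketch below illustrates that step on a synthetic binary mask with an assumed voxel spacing, and is not the study's segmentation pipeline.

```python
# Sketch of cyst-level biometrics from a binary cyst mask: connected-component labeling
# gives cyst number and per-cyst volume. The mask and voxel spacing below are synthetic.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
cyst_mask = rng.random((64, 64, 64)) > 0.98          # stand-in binary cyst segmentation
voxel_volume_ml = (1.5 * 1.5 * 3.0) / 1000.0         # assumed voxel spacing in mm -> mL

labeled, n_cysts = ndimage.label(cyst_mask)           # one integer label per cyst
voxels_per_cyst = np.bincount(labeled.ravel())[1:]    # drop background (label 0)
cyst_volumes_ml = voxels_per_cyst * voxel_volume_ml

print(f"cyst count: {n_cysts}")
print(f"total cyst volume: {cyst_volumes_ml.sum():.1f} mL")
print(f"largest cyst: {cyst_volumes_ml.max():.2f} mL")
```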
Subject(s)
Polycystic Kidney, Autosomal Dominant , Humans , Polycystic Kidney, Autosomal Dominant/complications , Polycystic Kidney, Autosomal Dominant/diagnostic imaging , Disease Progression , Magnetic Resonance Imaging/methods , Prognosis , Kidney/diagnostic imaging , Biomarkers , Glomerular Filtration Rate
ABSTRACT
In recent years, deep learning (DL) has shown impressive performance in radiologic image analysis. However, for a DL model to be useful in a real-world setting, its confidence in a prediction must also be known. Each DL model output has an estimated probability, and these estimated probabilities are not always reliable. Uncertainty represents the trustworthiness (validity) of estimated probabilities: the higher the uncertainty, the lower the validity. Uncertainty quantification (UQ) methods determine the uncertainty level of each prediction. Predictions made without UQ methods are generally not trustworthy. By implementing UQ in medical DL models, users can be alerted when a model does not have enough information to make a confident decision. Consequently, a medical expert could reevaluate the uncertain cases, which would ultimately build more trust in the model. This review focuses on recent trends in using UQ methods in DL radiologic image analysis within a conceptual framework. Also discussed are potential applications, challenges, and future directions of UQ in DL radiologic image analysis.
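One commonly cited UQ method is Monte Carlo dropout; the sketch below shows the idea with a tiny placeholder network and random input, not a model from the review.

```python
# Monte Carlo dropout sketch: keep dropout active at inference, run several stochastic
# forward passes, and use predictive entropy as an uncertainty score.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(), nn.Dropout(p=0.3), nn.Linear(64, 2)
)

x = torch.randn(1, 128)                    # stand-in for an image feature vector
model.train()                              # keeps dropout stochastic at inference time

with torch.no_grad():
    probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(50)])

mean_probs = probs.mean(dim=0)                                      # averaged prediction
entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum()   # predictive entropy
print(f"prediction: {mean_probs.squeeze().tolist()}, uncertainty (entropy): {entropy.item():.3f}")
```

High-entropy cases would be the ones routed back to a radiologist for review.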
Subject(s)
Deep Learning , Radiology , Humans , Uncertainty , Image Processing, Computer-Assisted
ABSTRACT
BACKGROUND: Fatty pancreas is associated with inflammatory and neoplastic pancreatic diseases. Magnetic resonance imaging (MRI) is the diagnostic modality of choice for measuring pancreatic fat. Measurements typically rely on regions of interest, which are limited by sampling and variability. We have previously described an artificial intelligence (AI)-aided approach for whole-pancreas fat estimation on computed tomography (CT). In this study, we aimed to assess the correlation between whole-pancreas MRI proton-density fat fraction (MR-PDFF) and CT attenuation. METHODS: We identified patients without pancreatic disease who underwent both MRI and CT between January 1, 2015 and June 1, 2020. A total of 158 paired MRI and CT scans were available for pancreas segmentation using an iteratively trained convolutional neural network (CNN) with manual correction. Boxplots were generated to visualize slice-by-slice variability in 2D axial-slice MR-PDFF. Correlation between whole-pancreas MR-PDFF and age, BMI, hepatic fat, and pancreas CT attenuation in Hounsfield units (CT-HU) was assessed. RESULTS: Mean pancreatic MR-PDFF showed a strong inverse correlation (Spearman -0.755) with mean CT-HU. MR-PDFF was higher in males (25.22 vs 20.87; p = 0.0015) and in subjects with diabetes mellitus (25.95 vs 22.17; p = 0.0324), and was positively correlated with age and BMI. The pancreatic 2D axial slice-to-slice MR-PDFF variability increased with increasing mean whole-pancreas MR-PDFF (Spearman 0.51; p < 0.0001). CONCLUSION: Our study demonstrates a strong inverse correlation between whole-pancreas MR-PDFF and CT-HU, indicating that both imaging modalities can be used to assess pancreatic fat. 2D axial pancreas MR-PDFF is variable across slices, underscoring the need for AI-aided whole-organ measurements for objective and reproducible estimation of pancreatic fat.
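The core statistic above is a Spearman rank correlation between whole-organ MR-PDFF and CT attenuation; a minimal sketch on simulated values follows (the study used AI-segmented organ means, not these numbers).

```python
# Spearman rank correlation between simulated pancreas MR-PDFF and CT attenuation values.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
ct_hu = rng.normal(30, 10, 158)                       # simulated mean pancreas CT attenuation (HU)
mr_pdff = 40 - 0.5 * ct_hu + rng.normal(0, 3, 158)    # simulated inverse relationship with PDFF (%)

rho, p_value = spearmanr(mr_pdff, ct_hu)
print(f"Spearman rho = {rho:.3f}, p = {p_value:.2e}")
```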
Subject(s)
Artificial Intelligence , Pancreatic Diseases , Male , Humans , Magnetic Resonance Imaging/methods , Pancreas/diagnostic imaging , Pancreas/pathology , Liver , Tomography, X-Ray Computed , Pancreatic Diseases/diagnostic imaging , Pancreatic Diseases/pathology
ABSTRACT
OBJECTIVES: While chest radiography (CXR) is the first-line imaging investigation in patients with respiratory symptoms, differentiating COVID-19 from other respiratory infections on CXR remains challenging. We developed and validated an AI system for COVID-19 detection on presenting CXR. METHODS: A deep learning model (RadGenX), trained on 168,850 CXRs, was validated on a large international test set of presenting CXRs of symptomatic patients from 9 study sites (US, Italy, and Hong Kong SAR) and 2 public datasets from the US and Europe. Performance was measured by area under the receiver operating characteristic curve (AUC). Bootstrapped simulations were performed to assess performance across a range of potential COVID-19 disease prevalence values (3.33% to 33.3%). Comparison against international radiologists was performed on an independent test set of 852 cases. RESULTS: RadGenX achieved an AUC of 0.89 on 4-fold cross-validation and an AUC of 0.79 (95% CI 0.78-0.80) on an independent test cohort of 5,894 patients. The DeLong test showed statistical differences in model performance across patients from different regions (p < 0.01), disease severity (p < 0.001), gender (p < 0.001), and age (p = 0.03). Prevalence simulations showed that the negative predictive value (NPV) increases from 86.1% at 33.3% prevalence to greater than 98.5% at any prevalence below 4.5%. Compared with radiologists, the McNemar test showed the model has higher sensitivity (p < 0.001) but lower specificity (p < 0.001). CONCLUSION: An AI model that predicts COVID-19 infection on CXR in symptomatic patients was validated on a large international cohort, providing valuable context on testing and performance expectations for AI systems that perform COVID-19 prediction on CXR. KEY POINTS: • An AI model developed using CXRs to detect COVID-19 was validated in a large multi-center cohort of 5,894 patients from 9 prospectively recruited sites and 2 public datasets. • Differences in AI model performance were seen across region, disease severity, gender, and age. • Prevalence simulations on the international test set demonstrate that the model's NPV is greater than 98.5% at any prevalence below 4.5%.
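The prevalence simulation above follows directly from Bayes' rule: for a fixed sensitivity and specificity, NPV rises as prevalence falls. The sketch below uses assumed sensitivity and specificity values, not the published model's operating point.

```python
# How NPV and PPV vary with disease prevalence for a fixed sensitivity/specificity.
import numpy as np

sens, spec = 0.85, 0.75                      # assumed operating point, for illustration
prevalence = np.array([0.033, 0.045, 0.10, 0.20, 0.333])

npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))

for p, n, v in zip(prevalence, npv, ppv):
    print(f"prevalence {p:5.1%}  NPV {n:6.1%}  PPV {v:6.1%}")
```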
Subject(s)
COVID-19 , Deep Learning , Humans , Artificial Intelligence , Radiography, Thoracic/methods , Tomography, X-Ray Computed/methods , Retrospective Studies
ABSTRACT
BACKGROUND: Whole-body low-dose CT is the recommended initial imaging modality to evaluate bone destruction as a result of multiple myeloma. Accurate interpretation of these scans to detect small lytic bone lesions is time intensive. A functional deep learning (DL) algorithm to detect lytic lesions on CT could improve the value of these scans for myeloma imaging. Our objectives were to develop a DL algorithm and determine its performance at detecting lytic lesions of multiple myeloma. METHODS: Axial slices (2-mm section thickness) from whole-body low-dose CT scans of subjects with biochemically confirmed plasma cell dyscrasias were included in the study. Data were split into train and test sets at the patient level, targeting a 90%/10% split. Two musculoskeletal radiologists annotated lytic lesions on the images with bounding boxes. Subsequently, we developed a two-step deep learning model comprising bone segmentation followed by lesion detection. U-Net and "You Only Look Once" (YOLO) models were used as the bone segmentation and lesion detection algorithms, respectively. Diagnostic performance was determined using the area under the receiver operating characteristic curve (AUROC). RESULTS: Forty whole-body low-dose CTs from 40 subjects yielded 2193 image slices. A total of 5640 lytic lesions were annotated. The two-step model achieved a sensitivity of 91.6% and a specificity of 84.6%. Lesion detection AUROC was 90.4%. CONCLUSION: We developed a deep learning model that detects lytic bone lesions of multiple myeloma on whole-body low-dose CTs with high performance. External validation is required prior to widespread adoption in clinical practice.
Subject(s)
Deep Learning , Multiple Myeloma , Osteolysis , Humans , Multiple Myeloma/diagnostic imaging , Multiple Myeloma/pathology , Algorithms , Tomography, X-Ray Computed/methods
ABSTRACT
The growth of artificial intelligence, combined with the collection and storage of large amounts of data in the electronic medical record, has created an opportunity for orthopedic research and translation into the clinical environment. Machine learning (ML) is a type of artificial intelligence tool well suited for processing the large amount of available data. Specific areas of ML frequently used by orthopedic surgeons performing total joint arthroplasty include tabular data analysis (spreadsheets), medical image processing, and natural language processing (extracting concepts from text). Previous studies have discussed models able to identify fractures on radiographs, identify implant type on radiographs, and determine the stage of osteoarthritis based on walking analysis. Despite the growing popularity of ML, there are limitations, including its reliance on "good" data, the potential for overfitting, a long life cycle for model creation, and the ability to perform only one narrow task. This educational article provides a general overview of ML, discusses these challenges, and includes examples of successfully published models.
Subject(s)
Orthopedic Procedures , Orthopedics , Humans , Artificial Intelligence , Machine Learning , Natural Language Processing
ABSTRACT
Total joint arthroplasty is becoming one of the most common surgeries within the United States, creating an abundance of analyzable data to improve patient experience and outcomes. Unfortunately, a large majority of these data are concealed in electronic health records and accessible only by manual extraction, which takes extensive time and resources. Natural language processing (NLP), a field within artificial intelligence, may offer a viable alternative to manual extraction. Using NLP, a researcher can analyze written and spoken data and extract information in an organized manner suitable for future research and clinical use. This article will first discuss common subtasks involved in an NLP pipeline, including data preparation, modeling, analysis, and external validation, followed by examples of NLP projects. Challenges and limitations of NLP will be discussed, closing with future directions of NLP projects, including large language models.
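A toy illustration of such a pipeline (data preparation, modeling, and inference) follows; the notes, labels, and extraction target are invented examples rather than registry data, and a production pipeline would add external validation.

```python
# Minimal NLP pipeline sketch: vectorize free-text notes with TF-IDF, fit a logistic
# regression classifier, and extract a concept (here, whether a note documents infection).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

notes = [
    "wound well healed, no drainage, no signs of infection",
    "purulent drainage from incision, concern for periprosthetic joint infection",
    "patient ambulating without pain, incision clean and dry",
    "positive cultures, irrigation and debridement planned for deep infection",
]
labels = [0, 1, 0, 1]   # 1 = infection documented (toy labels)

pipeline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
pipeline.fit(notes, labels)

new_note = ["erythema and drainage around the prosthesis, infection suspected"]
print(pipeline.predict(new_note), pipeline.predict_proba(new_note))
```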
Subject(s)
Artificial Intelligence , Natural Language Processing , Humans , Arthroplasty , Language , Electronic Health Records
ABSTRACT
Image data has grown exponentially as systems have increased their ability to collect and store it. Unfortunately, there are limits to human resources both in time and knowledge to fully interpret and manage that data. Computer Vision (CV) has grown in popularity as a discipline for better understanding visual data. Computer Vision has become a powerful tool for imaging analytics in orthopedic surgery, allowing computers to evaluate large volumes of image data with greater nuance than previously possible. Nevertheless, even with the growing number of uses in medicine, literature on the fundamentals of CV and its implementation is mainly oriented toward computer scientists rather than clinicians, rendering CV unapproachable for most orthopedic surgeons as a tool for clinical practice and research. The purpose of this article is to summarize and review the fundamental concepts of CV application for the orthopedic surgeon and musculoskeletal researcher.
Subject(s)
Orthopedic Procedures , Orthopedics , Humans , Arthroplasty , Computers
ABSTRACT
BACKGROUND: In this work, we applied and validated an artificial intelligence technique known as generative adversarial networks (GANs) to create large volumes of high-fidelity synthetic anteroposterior (AP) pelvis radiographs that can enable deep learning (DL)-based image analyses, while ensuring patient privacy. METHODS: AP pelvis radiographs with native hips were gathered from an institutional registry between 1998 and 2018. The data was used to train a model to create 512 × 512 pixel synthetic AP pelvis images. The network was trained on 25 million images produced through augmentation. A set of 100 random images (50/50 real/synthetic) was evaluated by 3 orthopaedic surgeons and 2 radiologists to discern real versus synthetic images. Two models (joint localization and segmentation) were trained using synthetic images and tested on real images. RESULTS: The final model was trained on 37,640 real radiographs (16,782 patients). In a computer assessment of image fidelity, the final model achieved an "excellent" rating. In a blinded review of paired images (1 real, 1 synthetic), orthopaedic surgeon reviewers were unable to correctly identify which image was synthetic (accuracy = 55%, Kappa = 0.11), highlighting synthetic image fidelity. The synthetic and real images showed equivalent performance when they were assessed by established DL models. CONCLUSION: This work shows the ability to use a DL technique to generate a large volume of high-fidelity synthetic pelvis images not discernible from real imaging by computers or experts. These images can be used for cross-institutional sharing and model pretraining, further advancing the performance of DL models without risk to patient data safety. LEVEL OF EVIDENCE: Level III.
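As a sketch of the underlying adversarial technique only (not the published network, which generated 512 × 512 pelvis radiographs), the toy generator/discriminator pair below trains on random stand-in tensors; architectures, sizes, and hyperparameters are all placeholders.

```python
# Minimal GAN training skeleton (illustrative only).
import torch
import torch.nn as nn

IMG = 32 * 32          # toy flattened image size
Z = 64                 # latent dimension

G = nn.Sequential(nn.Linear(Z, 256), nn.ReLU(), nn.Linear(256, IMG), nn.Tanh())
D = nn.Sequential(nn.Linear(IMG, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real_images = torch.rand(512, IMG) * 2 - 1   # stand-in for real AP pelvis crops

for step in range(200):
    idx = torch.randint(0, real_images.shape[0], (64,))
    real = real_images[idx]
    fake = G(torch.randn(64, Z))

    # Discriminator update: push real toward 1 and generated samples toward 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: try to fool the discriminator (generated samples toward 1).
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```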
Subject(s)
Deep Learning , Humans , Artificial Intelligence , Privacy , Image Processing, Computer-Assisted/methods , Pelvis/diagnostic imaging
ABSTRACT
Glioblastoma (GBM) is the most common primary malignant brain tumor in adults. The standard treatment for GBM consists of surgical resection followed by concurrent chemoradiotherapy and adjuvant temozolomide. O-6-methylguanine-DNA methyltransferase (MGMT) promoter methylation status is an important prognostic biomarker that predicts the response to temozolomide and guides treatment decisions. At present, the only reliable way to determine MGMT promoter methylation status is through the analysis of tumor tissue. Considering the complications of tissue-based methods, an imaging-based approach is preferred. This study aimed to compare three different deep learning-based approaches for predicting MGMT promoter methylation status. We obtained 576 T2WI with their corresponding tumor masks and MGMT promoter methylation status from the Brain Tumor Segmentation (BraTS) 2021 dataset. We developed three different models: voxel-wise, slice-wise, and whole-brain. For voxel-wise classification, tumor mask voxels were labeled 1 for methylated and 2 for unmethylated MGMT, with a background of 0. We converted each T2WI into 32 × 32 × 32 patches and trained a 3D V-Net model for tumor segmentation. After inference, we reconstructed the whole-brain volume from the patch coordinates. The final prediction of MGMT methylation status was made by majority voting over the predicted voxel values of the largest connected component. For slice-wise classification, we trained an object detection model for tumor detection and MGMT methylation status prediction and then used majority voting across slices for the final prediction. For the whole-brain approach, we trained a 3D DenseNet121 for prediction. Whole-brain, slice-wise, and voxel-wise accuracy was 65.42% (SD 3.97%), 61.37% (SD 1.48%), and 56.84% (SD 4.38%), respectively.
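The voxel-wise post-processing described above (largest connected component, then majority voting over its voxel labels) is sketched below on a random predicted volume; the array and probabilities are placeholders, not model output.

```python
# Majority vote over the largest connected component of predicted tumor voxels
# (1 = methylated, 2 = unmethylated, 0 = background); the volume here is random.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
pred = rng.choice([0, 1, 2], size=(64, 64, 64), p=[0.97, 0.02, 0.01])

labeled, n_components = ndimage.label(pred > 0)              # connected tumor regions
if n_components:
    sizes = np.bincount(labeled.ravel())[1:]                 # skip background label 0
    largest = np.argmax(sizes) + 1                           # component labels start at 1
    votes = pred[labeled == largest]
    methylation_call = 1 if np.sum(votes == 1) >= np.sum(votes == 2) else 2
    print("predicted MGMT status:", "methylated" if methylation_call == 1 else "unmethylated")
```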
Subject(s)
Brain Neoplasms , Deep Learning , Glioblastoma , Adult , Humans , Glioblastoma/diagnostic imaging , Glioblastoma/genetics , Glioblastoma/pathology , Temozolomide/therapeutic use , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/genetics , Brain Neoplasms/pathology , DNA Methylation , Brain/diagnostic imaging , Magnetic Resonance Imaging/methods , O(6)-Methylguanine-DNA Methyltransferase/genetics , DNA Modification Methylases/genetics , Tumor Suppressor Proteins/genetics , DNA Repair Enzymes/genetics
ABSTRACT
Since 2000, there have been more than 8000 publications on radiology artificial intelligence (AI). AI breakthroughs allow complex tasks to be automated and even performed beyond human capabilities. However, the lack of details on the methods and algorithm code undercuts its scientific value. Many science subfields have recently faced a reproducibility crisis, eroding trust in processes and results, and influencing the rise in retractions of scientific papers. For the same reasons, conducting research in deep learning (DL) also requires reproducibility. Although several valuable manuscript checklists for AI in medical imaging exist, they are not focused specifically on reproducibility. In this study, we conducted a systematic review of recently published papers in the field of DL to evaluate if the description of their methodology could allow the reproducibility of their findings. We focused on the Journal of Digital Imaging (JDI), a specialized journal that publishes papers on AI and medical imaging. We used the keyword "Deep Learning" and collected the articles published between January 2020 and January 2022. We screened all the articles and included the ones which reported the development of a DL tool in medical imaging. We extracted the reported details about the dataset, data handling steps, data splitting, model details, and performance metrics of each included article. We found 148 articles. Eighty were included after screening for articles that reported developing a DL model for medical image analysis. Five studies have made their code publicly available, and 35 studies have utilized publicly available datasets. We provided figures to show the ratio and absolute count of reported items from included studies. According to our cross-sectional study, in JDI publications on DL in medical imaging, authors infrequently report the key elements of their study to make it reproducible.
Subject(s)
Artificial Intelligence , Diagnostic Imaging , Humans , Cross-Sectional Studies , Reproducibility of Results , Algorithms
ABSTRACT
The aim of this study was to investigate the use of an exponential-plateau model to determine the training dataset size required to reach maximum medical image segmentation performance. CT and MR images of patients with renal tumors acquired between 1997 and 2017 were retrospectively collected from our nephrectomy registry. Modality-based datasets of 50, 100, 150, 200, 250, and 300 images were assembled to train models with an 80-20 training-validation split, evaluated against 50 randomly held-out test set images. A third experiment using the KiTS21 dataset was also performed to explore the effects of different model architectures. Exponential-plateau models were used to establish the relationship of dataset size to model generalizability performance. For segmenting non-neoplastic kidney regions on CT and MR imaging, our model yielded test Dice score plateaus of [Formula: see text] and [Formula: see text], with 54 and 122 training-validation images needed to reach the plateaus, respectively. For segmenting CT and MR tumor regions, we modeled test Dice score plateaus of [Formula: see text] and [Formula: see text], with 125 and 389 training-validation images needed to reach the plateaus. For the KiTS21 dataset, the best Dice score plateaus for the nnU-Net 2D and 3D architectures were [Formula: see text] and [Formula: see text], with 177 and 440 training-validation images needed to reach the performance plateaus. Our research validates that differing imaging modalities, target structures, and model architectures all affect the number of training images required to reach a performance plateau. The modeling approach we developed will help future researchers determine for their experiments when additional training-validation images will likely not further improve model performance.
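A minimal sketch of fitting an exponential-plateau learning curve of the form Dice = a - b*exp(-c*n) to (dataset size, test Dice) pairs follows; the exact functional form and every data point below are assumed for illustration, not the study's results.

```python
# Fit an exponential-plateau curve to invented (training set size, Dice) points and
# estimate the plateau and the dataset size needed to get within 0.5% of it.
import numpy as np
from scipy.optimize import curve_fit

def exp_plateau(n, a, b, c):
    return a - b * np.exp(-c * n)

sizes = np.array([50, 100, 150, 200, 250, 300])          # training-validation images
dice = np.array([0.78, 0.85, 0.88, 0.90, 0.905, 0.91])   # hypothetical test Dice scores

params, _ = curve_fit(exp_plateau, sizes, dice, p0=[0.9, 0.3, 0.01])
a, b, c = params
print(f"estimated plateau Dice: {a:.3f}")

n_grid = np.arange(10, 2001)
print("images to reach plateau:", n_grid[exp_plateau(n_grid, *params) >= 0.995 * a][0])
```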