Search | VHL Search Portal

1.

FDA Review of Radiologic AI Algorithms: Process and Challenges.

Zhang, Kuan; Khosravi, Bardia; Vahdati, Sanaz; Erickson, Bradley J.

Radiology ; 310(1): e230242, 2024 01.

Article in English | MEDLINE | ID: mdl-38165243

ABSTRACT

A Food and Drug Administration (FDA)-cleared artificial intelligence (AI) algorithm misdiagnosed a finding as an intracranial hemorrhage in a patient, who was finally diagnosed with an ischemic stroke. This scenario highlights a notable failure mode of AI tools, emphasizing the importance of human-machine interaction. In this report, the authors summarize the review processes by the FDA for software as a medical device and the unique regulatory designs for radiologic AI/machine learning algorithms to ensure their safety in clinical practice. Then the challenges in maximizing the efficacy of these tools posed by their clinical implementation are discussed.

Subject(s)

Algorithms , Artificial Intelligence , United States , Humans , United States Food and Drug Administration , Software , Machine Learning

2.

THA-Net: A Deep Learning Solution for Next-Generation Templating and Patient-specific Surgical Execution.

Rouzrokh, Pouria; Khosravi, Bardia; Mickley, John P; Erickson, Bradley J; Taunton, Michael J; Wyles, Cody C.

J Arthroplasty ; 39(3): 727-733.e4, 2024 Mar.

Article in English | MEDLINE | ID: mdl-37619804

ABSTRACT

BACKGROUND: This study introduces THA-Net, a deep learning inpainting algorithm for simulating postoperative total hip arthroplasty (THA) radiographs from a single preoperative pelvis radiograph input, while being able to generate predictions either unconditionally (algorithm chooses implants) or conditionally (surgeon chooses implants). METHODS: The THA-Net is a deep learning algorithm which receives an input preoperative radiograph and subsequently replaces the target hip joint with THA implants to generate a synthetic yet realistic postoperative radiograph. We trained THA-Net on 356,305 pairs of radiographs from 14,357 patients from a single institution's total joint registry and evaluated the validity (quality of surgical execution) and realism (ability to differentiate real and synthetic radiographs) of its outputs against both human-based and software-based criteria. RESULTS: The surgical validity of synthetic postoperative radiographs was significantly higher than their real counterparts (mean difference: 0.8 to 1.1 points on 10-point Likert scale, P < .001), but they were not able to be differentiated in terms of realism in blinded expert review. Synthetic images showed excellent validity and realism when analyzed with already validated deep learning models. CONCLUSION: We developed a THA next-generation templating tool that can generate synthetic radiographs graded higher on ultimate surgical execution than real radiographs from training data. Further refinement of this tool may potentiate patient-specific surgical planning and enable technologies such as robotics, navigation, and augmented reality (an online demo of THA-Net is available at: https://demo.osail.ai/tha_net).

Subject(s)

Arthroplasty, Replacement, Hip , Deep Learning , Hip Prosthesis , Humans , Arthroplasty, Replacement, Hip/methods , Hip Joint/diagnostic imaging , Hip Joint/surgery , Radiography , Retrospective Studies

3.

THA-AID: Deep Learning Tool for Total Hip Arthroplasty Automatic Implant Detection With Uncertainty and Outlier Quantification.

Rouzrokh, Pouria; Mickley, John P; Khosravi, Bardia; Faghani, Shahriar; Moassefi, Mana; Schulz, William R; Erickson, Bradley J; Taunton, Michael J; Wyles, Cody C.

J Arthroplasty ; 39(4): 966-973.e17, 2024 Apr.

Article in English | MEDLINE | ID: mdl-37770007

ABSTRACT

BACKGROUND: Revision total hip arthroplasty (THA) requires preoperatively identifying in situ implants, a time-consuming and sometimes unachievable task. Although deep learning (DL) tools have been attempted to automate this process, existing approaches are limited by classifying few femoral and zero acetabular components, only classify on anterior-posterior (AP) radiographs, and do not report prediction uncertainty or flag outlier data. METHODS: This study introduces Total Hip Arhtroplasty Automated Implant Detector (THA-AID), a DL tool trained on 241,419 radiographs that identifies common designs of 20 femoral and 8 acetabular components from AP, lateral, or oblique views and reports prediction uncertainty using conformal prediction and outlier detection using a custom framework. We evaluated THA-AID using internal, external, and out-of-domain test sets and compared its performance with human experts. RESULTS: THA-AID achieved internal test set accuracies of 98.9% for both femoral and acetabular components with no significant differences based on radiographic view. The femoral classifier also achieved 97.0% accuracy on the external test set. Adding conformal prediction increased true label prediction by 0.1% for acetabular and 0.7 to 0.9% for femoral components. More than 99% of out-of-domain and >89% of in-domain outlier data were correctly identified by THA-AID. CONCLUSIONS: The THA-AID is an automated tool for implant identification from radiographs with exceptional performance on internal and external test sets and no decrement in performance based on radiographic view. Importantly, this is the first study in orthopedics to our knowledge including uncertainty quantification and outlier detection of a DL model.

Subject(s)

Arthroplasty, Replacement, Hip , Deep Learning , Hip Prosthesis , Humans , Uncertainty , Acetabulum/surgery , Retrospective Studies

4.

Quantifying Uncertainty in Deep Learning of Radiologic Images.

Faghani, Shahriar; Moassefi, Mana; Rouzrokh, Pouria; Khosravi, Bardia; Baffour, Francis I; Ringler, Michael D; Erickson, Bradley J.

Radiology ; 308(2): e222217, 2023 08.

Article in English | MEDLINE | ID: mdl-37526541

ABSTRACT

In recent years, deep learning (DL) has shown impressive performance in radiologic image analysis. However, for a DL model to be useful in a real-world setting, its confidence in a prediction must also be known. Each DL model's output has an estimated probability, and these estimated probabilities are not always reliable. Uncertainty represents the trustworthiness (validity) of estimated probabilities. The higher the uncertainty, the lower the validity. Uncertainty quantification (UQ) methods determine the uncertainty level of each prediction. Predictions made without UQ methods are generally not trustworthy. By implementing UQ in medical DL models, users can be alerted when a model does not have enough information to make a confident decision. Consequently, a medical expert could reevaluate the uncertain cases, which would eventually lead to gaining more trust when using a model. This review focuses on recent trends using UQ methods in DL radiologic image analysis within a conceptual framework. Also discussed in this review are potential applications, challenges, and future directions of UQ in DL radiologic image analysis.

Subject(s)

Deep Learning , Radiology , Humans , Uncertainty , Image Processing, Computer-Assisted

5.

The impact of prior performance information on subsequent assessment: is there evidence of retaliation in an anonymous multisource assessment system?

Saberzadeh-Ardestani, Bahar; Sima, Ali Reza; Khosravi, Bardia; Young, Meredith; Mortaz Hejri, Sara.

Adv Health Sci Educ Theory Pract ; 2023 Jul 24.

Article in English | MEDLINE | ID: mdl-37488326

ABSTRACT

Few studies have engaged in data-driven investigations of the presence, or frequency, of what could be considered retaliatory assessor behaviour in Multi-source Feedback (MSF) systems. In this study, authors explored how assessors scored others if, before assessing others, they received their own assessment score. The authors examined assessments from an established MSF system in which all clinical team members - medical students, interns, residents, fellows, and supervisors - anonymously assessed each other. The authors identified assessments in which an assessor (i.e., any team member providing a score to another) gave an aberrant score to another individual. An aberrant score was defined as one that was more than two standard deviations from the assessment receiver's average score. Assessors who gave aberrant scores were categorized according to whether their behaviour was preceded by: (1) receiving a score or not from another individual in the MSF system (2) whether the score they received was aberrant or not. The authors used a multivariable logistic regression model to investigate the association between the type of score received and the type of score given by that same individual. In total, 367 unique assessors provided 6091 scores on the performance of 484 unique individuals. Aberrant scores were identified in 250 forms (4.1%). The chances of giving an aberrant score were 2.3 times higher for those who had received a score, compared to those who had not (odds ratio 2.30, 95% CI:1.54-3.44, P < 0.001). Individuals who had received an aberrant score were also 2.17 times more likely to give an aberrant score to others compared to those who had received a non-aberrant score (2.17, 95% CI:1.39-3.39, P < 0.005) after adjusting for all other variables. This study documents an association between receiving scores within an anonymous multi-source feedback (MSF) system and providing aberrant scores to team members. These findings suggest care must be given to designing MSF systems to protect against potential downstream consequences of providing and receiving anonymous feedback.

6.

A deep learning algorithm for detecting lytic bone lesions of multiple myeloma on CT.

Faghani, Shahriar; Baffour, Francis I; Ringler, Michael D; Hamilton-Cave, Matthew; Rouzrokh, Pouria; Moassefi, Mana; Khosravi, Bardia; Erickson, Bradley J.

Skeletal Radiol ; 52(1): 91-98, 2023 Jan.

Article in English | MEDLINE | ID: mdl-35980454

ABSTRACT

BACKGROUND: Whole-body low-dose CT is the recommended initial imaging modality to evaluate bone destruction as a result of multiple myeloma. Accurate interpretation of these scans to detect small lytic bone lesions is time intensive. A functional deep learning) algorithm to detect lytic lesions on CTs could improve the value of these CTs for myeloma imaging. Our objectives were to develop a DL algorithm and determine its performance at detecting lytic lesions of multiple myeloma. METHODS: Axial slices (2-mm section thickness) from whole-body low-dose CT scans of subjects with biochemically confirmed plasma cell dyscrasias were included in the study. Data were split into train and test sets at the patient level targeting a 90%/10% split. Two musculoskeletal radiologists annotated lytic lesions on the images with bounding boxes. Subsequently, we developed a two-step deep learning model comprising bone segmentation followed by lesion detection. Unet and "You Look Only Once" (YOLO) models were used as bone segmentation and lesion detection algorithms, respectively. Diagnostic performance was determined using the area under the receiver operating characteristic curve (AUROC). RESULTS: Forty whole-body low-dose CTs from 40 subjects yielded 2193 image slices. A total of 5640 lytic lesions were annotated. The two-step model achieved a sensitivity of 91.6% and a specificity of 84.6%. Lesion detection AUROC was 90.4%. CONCLUSION: We developed a deep learning model that detects lytic bone lesions of multiple myeloma on whole-body low-dose CTs with high performance. External validation is required prior to widespread adoption in clinical practice.

Subject(s)

Deep Learning , Multiple Myeloma , Osteolysis , Humans , Multiple Myeloma/diagnostic imaging , Multiple Myeloma/pathology , Algorithms , Tomography, X-Ray Computed/methods

7.

Demystifying Statistics and Machine Learning in Analysis of Structured Tabular Data.

Khosravi, Bardia; Weston, Alexander D; Nugen, Fred; Mickley, John P; Maradit Kremers, Hilal; Wyles, Cody C; Carter, Rickey E; Taunton, Michael J.

J Arthroplasty ; 38(10): 1943-1947, 2023 10.

Article in English | MEDLINE | ID: mdl-37598784

ABSTRACT

Electronic health records have facilitated the extraction and analysis of a vast amount of data with many variables for clinical care and research. Conventional regression-based statistical methods may not capture all the complexities in high-dimensional data analysis. Therefore, researchers are increasingly using machine learning (ML)-based methods to better handle these more challenging datasets for the discovery of hidden patterns in patients' data and for classification and predictive purposes. This article describes commonly used ML methods in structured data analysis with examples in orthopedic surgery. We present practical considerations in starting an ML project and appraising published studies in this field.

Subject(s)

Electronic Health Records , Machine Learning , Humans

8.

An Overview of Machine Learning in Orthopedic Surgery: An Educational Paper.

Padash, Sirwa; Mickley, John P; Vera Garcia, Diana V; Nugen, Fred; Khosravi, Bardia; Erickson, Bradley J; Wyles, Cody C; Taunton, Michael J.

J Arthroplasty ; 38(10): 1938-1942, 2023 10.

Article in English | MEDLINE | ID: mdl-37598786

ABSTRACT

The growth of artificial intelligence combined with the collection and storage of large amounts of data in the electronic medical record collection has created an opportunity for orthopedic research and translation into the clinical environment. Machine learning (ML) is a type of artificial intelligence tool well suited for processing the large amount of available data. Specific areas of ML frequently used by orthopedic surgeons performing total joint arthroplasty include tabular data analysis (spreadsheets), medical imaging processing, and natural language processing (extracting concepts from text). Previous studies have discussed models able to identify fractures in radiographs, identify implant type in radiographs, and determine the stage of osteoarthritis based on walking analysis. Despite the growing popularity of ML, there are limitations including its reliance on "good" data, potential for overfitting, long life cycle for creation, and ability to only perform one narrow task. This educational article will further discuss a general overview of ML, discussing these challenges and including examples of successfully published models.

Subject(s)

Orthopedic Procedures , Orthopedics , Humans , Artificial Intelligence , Machine Learning , Natural Language Processing

9.

Educational Overview of the Concept and Application of Computer Vision in Arthroplasty.

Vera-Garcia, Diana V; Nugen, Fred; Padash, Sirwa; Khosravi, Bardia; Mickley, John P; Erickson, Bradley J; Wyles, Cody C; Taunton, Michael J.

J Arthroplasty ; 38(10): 1954-1958, 2023 10.

Article in English | MEDLINE | ID: mdl-37633507

ABSTRACT

Image data has grown exponentially as systems have increased their ability to collect and store it. Unfortunately, there are limits to human resources both in time and knowledge to fully interpret and manage that data. Computer Vision (CV) has grown in popularity as a discipline for better understanding visual data. Computer Vision has become a powerful tool for imaging analytics in orthopedic surgery, allowing computers to evaluate large volumes of image data with greater nuance than previously possible. Nevertheless, even with the growing number of uses in medicine, literature on the fundamentals of CV and its implementation is mainly oriented toward computer scientists rather than clinicians, rendering CV unapproachable for most orthopedic surgeons as a tool for clinical practice and research. The purpose of this article is to summarize and review the fundamental concepts of CV application for the orthopedic surgeon and musculoskeletal researcher.

Subject(s)

Orthopedic Procedures , Orthopedics , Humans , Arthroplasty , Computers

10.

A Deep Learning Tool for Automated Landmark Annotation on Hip and Pelvis Radiographs.

Mulford, Kellen L; Johnson, Quinn J; Mujahed, Tala; Khosravi, Bardia; Rouzrokh, Pouria; Mickley, John P; Taunton, Michael J; Wyles, Cody C.

J Arthroplasty ; 38(10): 2024-2031.e1, 2023 10.

Article in English | MEDLINE | ID: mdl-37236288

ABSTRACT

BACKGROUND: Automatic methods for labeling and segmenting pelvis structures can improve the efficiency of clinical and research workflows and reduce the variability introduced with manual labeling. The purpose of this study was to develop a single deep learning model to annotate certain anatomical structures and landmarks on antero-posterior (AP) pelvis radiographs. METHODS: A total of 1,100 AP pelvis radiographs were manually annotated by 3 reviewers. These images included a mix of preoperative and postoperative images as well as a mix of AP pelvis and hip images. A convolutional neural network was trained to segment 22 different structures (7 points, 6 lines, and 9 shapes). Dice score, which measures overlap between model output and ground truth, was calculated for the shapes and lines structures. Euclidean distance error was calculated for point structures. RESULTS: Dice score averaged across all images in the test set was 0.88 and 0.80 for the shape and line structures, respectively. For the 7-point structures, average distance between real and automated annotations ranged from 1.9 mm to 5.6 mm, with all averages falling below 3.1 mm except for the structure labeling the center of the sacrococcygeal junction, where performance was low for both human and machine-produced labels. Blinded qualitative evaluation of human and machine produced segmentations did not reveal any drastic decrease in performance of the automatic method. CONCLUSION: We present a deep learning model for automated annotation of pelvis radiographs that flexibly handles a variety of views, contrasts, and operative statuses for 22 structures and landmarks.

Subject(s)

Deep Learning , Humans , Radiography , Neural Networks, Computer , Pelvis/diagnostic imaging , Postoperative Period

11.

Creating High Fidelity Synthetic Pelvis Radiographs Using Generative Adversarial Networks: Unlocking the Potential of Deep Learning Models Without Patient Privacy Concerns.

Khosravi, Bardia; Rouzrokh, Pouria; Mickley, John P; Faghani, Shahriar; Larson, A Noelle; Garner, Hillary W; Howe, Benjamin M; Erickson, Bradley J; Taunton, Michael J; Wyles, Cody C.

J Arthroplasty ; 38(10): 2037-2043.e1, 2023 10.

Article in English | MEDLINE | ID: mdl-36535448

ABSTRACT

BACKGROUND: In this work, we applied and validated an artificial intelligence technique known as generative adversarial networks (GANs) to create large volumes of high-fidelity synthetic anteroposterior (AP) pelvis radiographs that can enable deep learning (DL)-based image analyses, while ensuring patient privacy. METHODS: AP pelvis radiographs with native hips were gathered from an institutional registry between 1998 and 2018. The data was used to train a model to create 512 × 512 pixel synthetic AP pelvis images. The network was trained on 25 million images produced through augmentation. A set of 100 random images (50/50 real/synthetic) was evaluated by 3 orthopaedic surgeons and 2 radiologists to discern real versus synthetic images. Two models (joint localization and segmentation) were trained using synthetic images and tested on real images. RESULTS: The final model was trained on 37,640 real radiographs (16,782 patients). In a computer assessment of image fidelity, the final model achieved an "excellent" rating. In a blinded review of paired images (1 real, 1 synthetic), orthopaedic surgeon reviewers were unable to correctly identify which image was synthetic (accuracy = 55%, Kappa = 0.11), highlighting synthetic image fidelity. The synthetic and real images showed equivalent performance when they were assessed by established DL models. CONCLUSION: This work shows the ability to use a DL technique to generate a large volume of high-fidelity synthetic pelvis images not discernible from real imaging by computers or experts. These images can be used for cross-institutional sharing and model pretraining, further advancing the performance of DL models without risk to patient data safety. LEVEL OF EVIDENCE: Level III.

Subject(s)

Deep Learning , Humans , Artificial Intelligence , Privacy , Image Processing, Computer-Assisted/methods , Pelvis/diagnostic imaging

12.

Frank Stinchfield Award: Creation of a Patient-Specific Total Hip Arthroplasty Periprosthetic Fracture Risk Calculator.

Wyles, Cody C; Maradit-Kremers, Hilal; Fruth, Kristin M; Larson, Dirk R; Khosravi, Bardia; Rouzrokh, Pouria; Johnson, Quinn J; Berry, Daniel J; Sierra, Rafael J; Taunton, Michael J; Abdel, Matthew P.

J Arthroplasty ; 38(7S): S2-S10, 2023 07.

Article in English | MEDLINE | ID: mdl-36933678

ABSTRACT

BACKGROUND: Many risk factors have been described for periprosthetic femur fracture (PPFFx) following total hip arthroplasty (THA), yet a patient-specific risk assessment tool remains elusive. The purpose of this study was to develop a high-dimensional, patient-specific risk-stratification nomogram that allows dynamic risk modification based on operative decisions. METHODS: We evaluated 16,696 primary nononcologic THAs performed between 1998 and 2018. During a mean 6-year follow-up, 558 patients (3.3%) sustained a PPFFx. Patients were characterized by individual natural language processing-assisted chart review on nonmodifiable factors (demographics, THA indication, and comorbidities), and modifiable operative decisions (femoral fixation [cemented/uncemented], surgical approach [direct anterior, lateral, and posterior], and implant type [collared/collarless]). Multivariable Cox regression models and nomograms were developed with PPFFx as a binary outcome at 90 days, 1 year, and 5 years, postoperatively. RESULTS: Patient-specific PPFFx risk based on comorbid profile was wide-ranging from 0.4-18% at 90 days, 0.4%-20% at 1 year, and 0.5%-25% at 5 years. Among 18 evaluated patient factors, 7 were retained in multivariable analyses. The 4 significant nonmodifiable factors included the following: women (hazard ratio (HR) = 1.6), older age (HR = 1.2 per 10 years), diagnosis of osteoporosis or use of osteoporosis medications (HR = 1.7), and indication for surgery other than osteoarthritis (HR = 2.2 for fracture, HR = 1.8 for inflammatory arthritis, HR = 1.7 for osteonecrosis). The 3 modifiable surgical factors were included as follows: uncemented femoral fixation (HR = 2.5), collarless femoral implants (HR = 1.3), and surgical approach other than direct anterior (lateral HR = 2.9, posterior HR = 1.9). CONCLUSION: This patient-specific PPFFx risk calculator demonstrated a wide-ranging risk based on comorbid profile and enables surgeons to quantify risk mitigation based on operative decisions. LEVEL OF EVIDENCE: Level III, Prognostic.

Subject(s)

Arthroplasty, Replacement, Hip , Awards and Prizes , Femoral Fractures , Hip Prosthesis , Periprosthetic Fractures , Humans , Female , Arthroplasty, Replacement, Hip/adverse effects , Arthroplasty, Replacement, Hip/methods , Periprosthetic Fractures/epidemiology , Periprosthetic Fractures/etiology , Periprosthetic Fractures/surgery , Hip Prosthesis/adverse effects , Reoperation , Femoral Fractures/epidemiology , Femoral Fractures/etiology , Femoral Fractures/surgery , Risk Factors , Retrospective Studies

13.

A Comparison of Three Different Deep Learning-Based Models to Predict the MGMT Promoter Methylation Status in Glioblastoma Using Brain MRI.

Faghani, Shahriar; Khosravi, Bardia; Moassefi, Mana; Conte, Gian Marco; Erickson, Bradley J.

J Digit Imaging ; 36(3): 837-846, 2023 06.

Article in English | MEDLINE | ID: mdl-36604366

ABSTRACT

Glioblastoma (GBM) is the most common primary malignant brain tumor in adults. The standard treatment for GBM consists of surgical resection followed by concurrent chemoradiotherapy and adjuvant temozolomide. O-6-methylguanine-DNA methyltransferase (MGMT) promoter methylation status is an important prognostic biomarker that predicts the response to temozolomide and guides treatment decisions. At present, the only reliable way to determine MGMT promoter methylation status is through the analysis of tumor tissues. Considering the complications of the tissue-based methods, an imaging-based approach is preferred. This study aimed to compare three different deep learning-based approaches for predicting MGMT promoter methylation status. We obtained 576 T2WI with their corresponding tumor masks, and MGMT promoter methylation status from, The Brain Tumor Segmentation (BraTS) 2021 datasets. We developed three different models: voxel-wise, slice-wise, and whole-brain. For voxel-wise classification, methylated and unmethylated MGMT tumor masks were made into 1 and 2 with 0 background, respectively. We converted each T2WI into 32 × 32 × 32 patches. We trained a 3D-Vnet model for tumor segmentation. After inference, we constructed the whole brain volume based on the patch's coordinates. The final prediction of MGMT methylation status was made by majority voting between the predicted voxel values of the biggest connected component. For slice-wise classification, we trained an object detection model for tumor detection and MGMT methylation status prediction, then for final prediction, we used majority voting. For the whole-brain approach, we trained a 3D Densenet121 for prediction. Whole-brain, slice-wise, and voxel-wise, accuracy was 65.42% (SD 3.97%), 61.37% (SD 1.48%), and 56.84% (SD 4.38%), respectively.

Subject(s)

Brain Neoplasms , Deep Learning , Glioblastoma , Adult , Humans , Glioblastoma/diagnostic imaging , Glioblastoma/genetics , Glioblastoma/pathology , Temozolomide/therapeutic use , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/genetics , Brain Neoplasms/pathology , DNA Methylation , Brain/diagnostic imaging , Magnetic Resonance Imaging/methods , O(6)-Methylguanine-DNA Methyltransferase/genetics , DNA Modification Methylases/genetics , Tumor Suppressor Proteins/genetics , DNA Repair Enzymes/genetics

14.

Development of a deep learning model for the histologic diagnosis of dysplasia in Barrett's esophagus.

Faghani, Shahriar; Codipilly, D Chamil; Moassefi, Mana; Rouzrokh, Pouria; Khosravi, Bardia; Agarwal, Siddharth; Dhaliwal, Lovekirat; Katzka, David A; Hagen, Catherine; Lewis, Jason; Leggett, Cadman L; Erickson, Bradley J; Iyer, Prasad G.

Gastrointest Endosc ; 96(6): 918-925.e3, 2022 12.

Article in English | MEDLINE | ID: mdl-35718071

ABSTRACT

BACKGROUND AND AIMS: The risk of progression in Barrett's esophagus (BE) increases with development of dysplasia. There is a critical need to improve the diagnosis of BE dysplasia, given substantial interobserver disagreement among expert pathologists and overdiagnosis of dysplasia by community pathologists. We developed a deep learning model to predict dysplasia grade on whole-slide imaging. METHODS: We digitized nondysplastic BE (NDBE), low-grade dysplasia (LGD), and high-grade dysplasia (HGD) histology slides. Two expert pathologists confirmed all histology and digitally annotated areas of dysplasia. Training, validation, and test sets were created (by a random 70/20/10 split). We used an ensemble approach combining a "you only look once" model to identify regions of interest and histology class (NDBE, LGD, or HGD) followed by a ResNet101 model pretrained on ImageNet applied to the regions of interest. Diagnostic performance was determined for the whole slide. RESULTS: We included slides from 542 patients (164 NDBE, 226 LGD, and 152 HGD) yielding 8596 bounding boxes in the training set, 1946 bounding boxes in the validation set, and 840 boxes in the test set. When the ensemble model was used, sensitivity and specificity for LGD was 81.3% and 100%, respectively, and >90% for NDBE and HGD. The overall positive predictive value and sensitivity metric (calculated as F1 score) was .91 for NDBE, .90 for LGD, and 1.0 for HGD. CONCLUSIONS: We successfully trained and validated a deep learning model to accurately identify dysplasia on whole-slide images. This model can potentially help improve the histologic diagnosis of BE dysplasia and the appropriate application of endoscopic therapy.

Subject(s)

Adenocarcinoma , Barrett Esophagus , Deep Learning , Esophageal Neoplasms , Humans , Barrett Esophagus/diagnosis , Barrett Esophagus/pathology , Esophageal Neoplasms/pathology , Adenocarcinoma/pathology , Disease Progression , Hyperplasia

15.

A Multi-View Deep Learning Model for Thyroid Nodules Detection and Characterization in Ultrasound Imaging.

Vahdati, Sanaz; Khosravi, Bardia; Robinson, Kathryn A; Rouzrokh, Pouria; Moassefi, Mana; Akkus, Zeynettin; Erickson, Bradley J.

Bioengineering (Basel) ; 11(7)2024 Jun 25.

Article in English | MEDLINE | ID: mdl-39061730

ABSTRACT

Thyroid Ultrasound (US) is the primary method to evaluate thyroid nodules. Deep learning (DL) has been playing a significant role in evaluating thyroid cancer. We propose a DL-based pipeline to detect and classify thyroid nodules into benign or malignant groups relying on two views of US imaging. Transverse and longitudinal US images of thyroid nodules from 983 patients were collected retrospectively. Eighty-one cases were held out as a testing set, and the rest of the data were used in five-fold cross-validation (CV). Two You Look Only Once (YOLO) v5 models were trained to detect nodules and classify them. For each view, five models were developed during the CV, which was ensembled by using non-max suppression (NMS) to boost their collective generalizability. An extreme gradient boosting (XGBoost) model was trained on the outputs of the ensembled models for both views to yield a final prediction of malignancy for each nodule. The test set was evaluated by an expert radiologist using the American College of Radiology Thyroid Imaging Reporting and Data System (ACR-TIRADS). The ensemble models for each view achieved a mAP0.5 of 0.797 (transverse) and 0.716 (longitudinal). The whole pipeline reached an AUROC of 0.84 (CI 95%: 0.75-0.91) with sensitivity and specificity of 84% and 63%, respectively, while the ACR-TIRADS evaluation of the same set had a sensitivity of 76% and specificity of 34% (p-value = 0.003). Our proposed work demonstrated the potential possibility of a deep learning model to achieve diagnostic performance for thyroid nodule evaluation.

16.

Overview of Artificial Intelligence Research Within Hip and Knee Arthroplasty.

Mickley, John P; Kaji, Elizabeth S; Khosravi, Bardia; Mulford, Kellen L; Taunton, Michael J; Wyles, Cody C.

Arthroplast Today ; 27: 101396, 2024 Jun.

Article in English | MEDLINE | ID: mdl-39071822

ABSTRACT

Hip and knee arthroplasty are high-volume procedures undergoing rapid growth. The large volume of procedures generates a vast amount of data available for next-generation analytics. Techniques in the field of artificial intelligence (AI) can assist in large-scale pattern recognition and lead to clinical insights. AI methodologies have become more prevalent in orthopaedic research. This review will first describe an overview of AI in the medical field, followed by a description of the 3 arthroplasty research areas in which AI is commonly used (risk modeling, automated radiographic measurements, arthroplasty registry construction). Finally, we will discuss the next frontier of AI research focusing on model deployment and uncertainty quantification.

17.

Analyzing Racial Differences in Imaging Joint Replacement Registries Using Generative Artificial Intelligence: Advancing Orthopaedic Data Equity.

Khosravi, Bardia; Rouzrokh, Pouria; Erickson, Bradley J; Garner, Hillary W; Wenger, Doris E; Taunton, Michael J; Wyles, Cody C.

Arthroplast Today ; 29: 101503, 2024 Oct.

Article in English | MEDLINE | ID: mdl-39376670

ABSTRACT

Background: Discrepancies in medical data sets can perpetuate bias, especially when training deep learning models, potentially leading to biased outcomes in clinical applications. Understanding these biases is crucial for the development of equitable healthcare technologies. This study employs generative deep learning technology to explore and understand radiographic differences based on race among patients undergoing total hip arthroplasty. Methods: Utilizing a large institutional registry, we retrospectively analyzed pelvic radiographs from total hip arthroplasty patients, characterized by demographics and image features. Denoising diffusion probabilistic models generated radiographs conditioned on demographic and imaging characteristics. Fréchet Inception Distance assessed the generated image quality, showing the diversity and realism of the generated images. Sixty transition videos were generated that showed transforming White pelvises to their closest African American counterparts and vice versa while controlling for patients' sex, age, and body mass index. Two expert surgeons and 2 radiologists carefully studied these videos to understand the systematic differences that are present in the 2 races' radiographs. Results: Our data set included 480,407 pelvic radiographs, with a predominance of White patients over African Americans. The generative denoising diffusion probabilistic model created high-quality images and reached an Fréchet Inception Distance of 6.8. Experts identified 6 characteristics differentiating races, including interacetabular distance, osteoarthritis degree, obturator foramina shape, femoral neck-shaft angle, pelvic ring shape, and femoral cortical thickness. Conclusions: This study demonstrates the potential of generative models for understanding disparities in medical imaging data sets. By visualizing race-based differences, this method aids in identifying bias in downstream tasks, fostering the development of fairer healthcare practices.

18.

Deep learning classification of pediatric spinal radiographs for use in large scale imaging registries.

Mulford, Kellen L; Regan, Christina M; Todderud, Julia E; Nolte, Charles P; Pinter, Zachariah; Chang-Chien, Connie; Yan, Shi; Wyles, Cody; Khosravi, Bardia; Rouzrokh, Pouria; Maradit Kremers, Hilal; Larson, A Noelle.

Spine Deform ; 2024 Jul 22.

Article in English | MEDLINE | ID: mdl-39039392

ABSTRACT

PURPOSE: The purpose of this study is to develop and apply an algorithm that automatically classifies spine radiographs of pediatric scoliosis patients. METHODS: Anterior-posterior (AP) and lateral spine radiographs were extracted from the institutional picture archive for patients with scoliosis. Overall, there were 7777 AP images and 5621 lateral images. Radiographs were manually classified into ten categories: two preoperative and three postoperative categories each for AP and lateral images. The images were split into training, validation, and testing sets (70:15:15 proportional split). A deep learning classifier using the EfficientNet B6 architecture was trained on the spine training set. Hyperparameters and model architecture were tuned against the performance of the models in the validation set. RESULTS: The trained classifiers had an overall accuracy on the test set of 1.00 on 1166 AP images and 1.00 on 843 lateral images. Precision ranged from 0.98 to 1.00 in the AP images, and from 0.91 to 1.00 on the lateral images. Lower performance was observed on classes with fewer than 100 images in the dataset. Final performance metrics were calculated on the assigned test set, including accuracy, precision, recall, and F1 score (the harmonic mean of precision and recall). CONCLUSIONS: A deep learning convolutional neural network classifier was trained to a high degree of accuracy to distinguish between 10 categories pre- and postoperative spine radiographs of patients with scoliosis. Observed performance was higher in more prevalent categories. These models represent an important step in developing an automatic system for data ingestion into large, labeled imaging registries.

19.

Development of a deep learning model for the automated detection of green pixels indicative of gout on dual energy CT scan.

Faghani, Shahriar; Nicholas, Rhodes G; Patel, Soham; Baffour, Francis I; Moassefi, Mana; Rouzrokh, Pouria; Khosravi, Bardia; Powell, Garret M; Leng, Shuai; Glazebrook, Katrina N; Erickson, Bradley J; Tiegs-Heiden, Christin A.

Res Diagn Interv Imaging ; 9: 100044, 2024 Mar.

Article in English | MEDLINE | ID: mdl-39076582

ABSTRACT

Background: Dual-energy CT (DECT) is a non-invasive way to determine the presence of monosodium urate (MSU) crystals in the workup of gout. Color-coding distinguishes MSU from calcium following material decomposition and post-processing. Most software labels MSU as green and calcium as blue. There are limitations in the current image processing methods of segmenting green-encoded pixels. Additionally, identifying green foci is tedious, and automated detection would improve workflow. This study aimed to determine the optimal deep learning (DL) algorithm for segmenting green-encoded pixels of MSU crystals on DECTs. Methods: DECT images of positive and negative gout cases were retrospectively collected. The dataset was split into train (N = 28) and held-out test (N = 30) sets. To perform cross-validation, the train set was split into seven folds. The images were presented to two musculoskeletal radiologists, who independently identified green-encoded voxels. Two 3D Unet-based DL models, Segresnet and SwinUNETR, were trained, and the Dice similarity coefficient (DSC), sensitivity, and specificity were reported as the segmentation metrics. Results: Segresnet showed superior performance, achieving a DSC of 0.9999 for the background pixels, 0.7868 for the green pixels, and an average DSC of 0.8934 for both types of pixels, respectively. According to the post-processed results, the Segresnet reached voxel-level sensitivity and specificity of 98.72 % and 99.98 %, respectively. Conclusion: In this study, we compared two DL-based segmentation approaches for detecting MSU deposits in a DECT dataset. The Segresnet resulted in superior performance metrics. The developed algorithm provides a potential fast, consistent, highly sensitive and specific computer-aided diagnosis tool. Ultimately, such an algorithm could be used by radiologists to streamline DECT workflow and improve accuracy in the detection of gout.

20.

A Guideline for Open-Source Tools to Make Medical Imaging Data Ready for Artificial Intelligence Applications: A Society of Imaging Informatics in Medicine (SIIM) Survey.

Vahdati, Sanaz; Khosravi, Bardia; Mahmoudi, Elham; Zhang, Kuan; Rouzrokh, Pouria; Faghani, Shahriar; Moassefi, Mana; Tahmasebi, Aylin; Andriole, Katherine P; Chang, Peter; Farahani, Keyvan; Flores, Mona G; Folio, Les; Houshmand, Sina; Giger, Maryellen L; Gichoya, Judy W; Erickson, Bradley J.

J Imaging Inform Med ; 2024 Apr 01.

Article in English | MEDLINE | ID: mdl-38558368

ABSTRACT

In recent years, the role of Artificial Intelligence (AI) in medical imaging has become increasingly prominent, with the majority of AI applications approved by the FDA being in imaging and radiology in 2023. The surge in AI model development to tackle clinical challenges underscores the necessity for preparing high-quality medical imaging data. Proper data preparation is crucial as it fosters the creation of standardized and reproducible AI models while minimizing biases. Data curation transforms raw data into a valuable, organized, and dependable resource and is a fundamental process to the success of machine learning and analytical projects. Considering the plethora of available tools for data curation in different stages, it is crucial to stay informed about the most relevant tools within specific research areas. In the current work, we propose a descriptive outline for different steps of data curation while we furnish compilations of tools collected from a survey applied among members of the Society of Imaging Informatics (SIIM) for each of these stages. This collection has the potential to enhance the decision-making process for researchers as they select the most appropriate tool for their specific tasks.

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL