Results 1 - 20 of 142
1.
AJR Am J Roentgenol ; 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38598354

ABSTRACT

Large language models (LLMs) hold immense potential to revolutionize radiology. However, their integration into practice requires careful consideration. Artificial intelligence (AI) chatbots and general-purpose LLMs have potential pitfalls related to privacy, transparency, and accuracy, limiting their current clinical readiness. Thus, LLM-based tools must be optimized for radiology practice to overcome these limitations. While research and validation for radiology applications remain in their infancy, commercial products incorporating LLMs are becoming available alongside promises of transforming practice. To help radiologists navigate this landscape, this AJR Expert Panel Narrative Review provides a multidimensional perspective on LLMs, encompassing considerations from bench (development and optimization) to bedside (use in practice). At present, LLMs are not autonomous entities that can replace expert decision-making, and radiologists remain responsible for the content of their reports. Patient-facing tools, particularly medical AI chatbots, require additional guardrails to ensure safety and prevent misuse. Still, if responsibly implemented, LLMs are well-positioned to transform efficiency and quality in radiology. Radiologists must be well-informed and proactively involved in guiding the implementation of LLMs in practice to mitigate risks and maximize benefits to patient care.

2.
AJR Am J Roentgenol ; 222(3): e2329530, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37436032

ABSTRACT

Artificial intelligence (AI) is increasingly used in clinical practice for musculoskeletal imaging tasks, such as disease diagnosis and image reconstruction. AI applications in musculoskeletal imaging have focused primarily on radiography, CT, and MRI. Although musculoskeletal ultrasound stands to benefit from AI in similar ways, such applications have been relatively underdeveloped. In comparison with other modalities, ultrasound has unique advantages and disadvantages that must be considered in AI algorithm development and clinical translation. Challenges in developing AI for musculoskeletal ultrasound involve both clinical aspects of image acquisition and practical limitations in image processing and annotation. Solutions from other radiology subspecialties (e.g., crowdsourced annotations coordinated by professional societies), along with use cases (most commonly rotator cuff tendon tears and palpable soft-tissue masses), can be applied to musculoskeletal ultrasound to help develop AI. To facilitate creation of high-quality imaging datasets for AI model development, technologists and radiologists should focus on increasing uniformity in musculoskeletal ultrasound performance and increasing annotations of images for specific anatomic regions. This Expert Panel Narrative Review summarizes available evidence regarding AI's potential utility in musculoskeletal ultrasound and challenges facing its development. Recommendations for future AI advancement and clinical translation in musculoskeletal ultrasound are discussed.


Subject(s)
Artificial Intelligence; Tendons; Humans; Ultrasonography; Algorithms; Head
3.
Skeletal Radiol ; 53(3): 445-454, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37584757

ABSTRACT

OBJECTIVE: The purpose of this systematic review was to summarize the characteristics and performance of deep learning models for detection of knee ligament and meniscus tears on MRI. MATERIALS AND METHODS: We searched PubMed for original studies, published as of February 2, 2022, that developed and evaluated deep learning models for MRI diagnosis of knee ligament or meniscus tears. We summarized study details according to multiple criteria, including baseline article details, model creation, deep learning details, and model evaluation. RESULTS: 19 studies were included, with radiology departments leading the publications on deep learning development and implementation for detecting knee injuries on MRI. Across the included studies, reporting was non-standardized and development details were inconsistently described. However, all included studies reported consistently high model performance that significantly supplemented human reader performance. CONCLUSION: Our review found that radiology departments have been leading deep learning (DL) development for injury detection on knee MRI. Although studies inconsistently described DL model development details, all reported high model performance, indicating great promise for DL in knee MRI analysis.


Subject(s)
Anterior Cruciate Ligament Injuries; Artificial Intelligence; Ligaments, Articular; Meniscus; Humans; Anterior Cruciate Ligament Injuries/diagnostic imaging; Ligaments, Articular/diagnostic imaging; Ligaments, Articular/injuries; Magnetic Resonance Imaging/methods; Meniscus/diagnostic imaging; Meniscus/injuries
4.
Emerg Radiol ; 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39034382

ABSTRACT

PURPOSE: To evaluate whether a commercial AI tool for intracranial hemorrhage (ICH) detection on head CT exhibited sociodemographic biases. METHODS: Our retrospective study reviewed 9736 consecutive, adult non-contrast head CT scans performed between November 2021 and February 2022 in a single healthcare system. Each CT scan was evaluated by a commercial ICH AI tool and a board-certified neuroradiologist; ground truth was defined as the final radiologist determination of ICH presence/absence. After evaluating the AI tool's aggregate diagnostic performance, sub-analyses based on sociodemographic groups (age, sex, race, ethnicity, insurance status, and Area of Deprivation Index [ADI] scores) assessed for biases. The χ2 test or Fisher's exact test evaluated statistical significance, with p ≤ 0.05. RESULTS: Our patient population was 50% female (mean age, 60 ± 19 years). The AI tool had an aggregate accuracy of 93% [9060/9736], sensitivity of 85% [1140/1338], specificity of 94% [7920/8398], positive predictive value (PPV) of 71% [1140/1618], and negative predictive value (NPV) of 98% [7920/8118]. Sociodemographic biases were identified, including lower PPV for patients who were female (67.3% [441/656] vs. 72.7% [699/962], p = 0.02), Black (66.7% [454/681] vs. 73.2% [686/937], p = 0.005), or non-Hispanic/non-Latino (69.7% [1038/1490] vs. 95.4% [417/437], p = 0.009), and who had Medicaid/Medicare (69.9% [754/1078]) or private (66.5% [228/343]) primary insurance (p = 0.003). Lower sensitivity was seen for patients in the third quartile of national (78.8% [241/306], p = 0.001) and state ADI scores (79.0% [22/287], p = 0.001). CONCLUSIONS: In our healthcare system, a commercial AI tool had lower performance for ICH detection than previously reported and demonstrated several sociodemographic biases.
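A subgroup bias audit of this kind reduces to computing confusion-matrix metrics per sociodemographic group and testing whether they differ. The sketch below illustrates the approach for PPV by sex; the file and column names (ai_positive, ich_present, sex) are hypothetical stand-ins, not the study's actual code.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical columns: ai_positive (tool output), ich_present (radiologist
# ground truth), and a sociodemographic grouping column such as sex.
df = pd.read_csv("head_ct_results.csv")

def ppv(group: pd.DataFrame) -> float:
    """PPV = true positives / all AI-positive scans in the group."""
    called_pos = group[group["ai_positive"] == 1]
    return (called_pos["ich_present"] == 1).mean()

# Test whether correctness among AI-positive scans differs by subgroup:
# a 2x2 table of subgroup x (true ICH present / absent) among positives.
positives = df[df["ai_positive"] == 1]
table = pd.crosstab(positives["sex"], positives["ich_present"])
chi2, p, dof, expected = chi2_contingency(table)
print({g: round(ppv(d), 3) for g, d in df.groupby("sex")}, f"p = {p:.3f}")
```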

5.
Radiology ; 306(2): e220505, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36165796

ABSTRACT

Background Although deep learning (DL) models have demonstrated expert-level ability for pediatric bone age prediction, they have shown poor generalizability and bias in other use cases. Purpose To quantify generalizability and bias in a bone age DL model, as measured by performance on external versus internal test sets and by performance differences between demographic groups, respectively. Materials and Methods The winning DL model of the 2017 RSNA Pediatric Bone Age Challenge, trained on 12 611 pediatric hand radiographs from two U.S. hospitals, was retrospectively evaluated. The DL model was tested from September 2021 to December 2021 on an internal validation set and an external test set of pediatric hand radiographs with diverse demographic representation. Images with reported ground-truth bone age were included. Mean absolute difference (MAD) between ground-truth bone age and model-predicted bone age was calculated for each set. Generalizability was evaluated by comparing MAD between the internal and external evaluation sets with use of t tests. Bias was evaluated by comparing MAD and the clinically significant error rate (rate of errors changing the clinical diagnosis) between demographic groups with use of t tests or analysis of variance and χ2 tests, respectively (statistically significant difference defined as P < .05). Results The internal validation set had images from 1425 individuals (773 boys), and the external test set had images from 1202 individuals (mean age, 133 months ± 60 [SD]; 614 boys). The bone age model generalized well to the external test set, with no difference in MAD (6.8 months in the validation set vs 6.9 months in the external set; P = .64). Model predictions would have led to clinically significant errors in 194 of 1202 images (16%) in the external test set. The MAD was greater for girls than boys in the internal validation set (P = .01) and in the subcategories of age and Tanner stage in the external test set (P < .001 for both). Conclusion A deep learning (DL) bone age model generalized well to an external test set, although clinically significant sex-, age-, and sexual maturity-based biases in DL bone age were identified. © RSNA, 2022. Online supplemental material is available for this article. See also the editorial by Larson in this issue.
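The generalizability check here is a comparison of per-image absolute errors between two test sets. A minimal sketch of that computation, with hypothetical file names standing in for the study's prediction and ground-truth arrays:

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical arrays of model-predicted and ground-truth bone ages (months)
pred_int, truth_int = np.load("internal_pred.npy"), np.load("internal_truth.npy")
pred_ext, truth_ext = np.load("external_pred.npy"), np.load("external_truth.npy")

err_int = np.abs(pred_int - truth_int)   # per-image absolute error
err_ext = np.abs(pred_ext - truth_ext)
print(f"MAD: internal {err_int.mean():.1f} vs external {err_ext.mean():.1f} months")

# Generalizability: compare the two error distributions with a t test
t_stat, p_value = ttest_ind(err_int, err_ext, equal_var=False)
print(f"P = {p_value:.2f}")
```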


Subject(s)
Deep Learning; Male; Female; Humans; Child; Infant; Retrospective Studies; Radiography
7.
AJR Am J Roentgenol ; 219(6): 869-878, 2022 12.
Article in English | MEDLINE | ID: mdl-35731103

ABSTRACT

Fractures are common injuries that can be difficult to diagnose, with missed fractures accounting for most misdiagnoses in the emergency department. Artificial intelligence (AI) and, specifically, deep learning have shown a strong ability to accurately detect fractures and augment the performance of radiologists in proof-of-concept research settings. Although the number of real-world AI products available for clinical use continues to increase, guidance for practicing radiologists in the adoption of this new technology is limited. This review describes how AI and deep learning algorithms can help radiologists to better diagnose fractures. The article also provides an overview of commercially available U.S. FDA-cleared AI tools for fracture detection as well as considerations for the clinical adoption of these tools by radiology practices.


Subject(s)
Fractures, Bone; Radiology; Humans; Artificial Intelligence; Radiologists; Algorithms; Radiography; Fractures, Bone/diagnostic imaging
8.
AJR Am J Roentgenol ; 218(4): 714-715, 2022 04.
Article in English | MEDLINE | ID: mdl-34755522

ABSTRACT

Convolutional neural networks (CNNs) trained to identify abnormalities on upper extremity radiographs achieved an AUC of 0.844 while frequently emphasizing radiograph laterality and/or technologist labels for decision-making. Covering the labels increased the AUC to 0.857 (p = .02) and redirected CNN attention from the labels to the bones. When trained on images of the radiograph labels alone, the CNNs achieved an AUC of 0.638, indicating that radiograph labels are associated with abnormal examinations. Potential radiographic confounding features should be considered when curating data for radiology CNN development.
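"Covering the labels" amounts to masking a known image region before training. A minimal sketch under the assumption that the label's bounding box is available from manual annotation or a label detector (file name and box coordinates here are hypothetical):

```python
from PIL import Image, ImageDraw

def cover_label(path: str, box: tuple) -> Image.Image:
    """Black out a laterality/technologist label region so a CNN cannot use
    it as a shortcut feature. `box` is (left, top, right, bottom) and is
    assumed to come from manual annotation or a label detector."""
    img = Image.open(path).convert("L")
    ImageDraw.Draw(img).rectangle(box, fill=0)
    return img

# Hypothetical corner label location on one radiograph
cover_label("radiograph.png", (0, 0, 120, 60)).save("radiograph_masked.png")
```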


Subject(s)
Deep Learning; Algorithms; Humans; Neural Networks, Computer; Radiography; Upper Extremity
9.
Skeletal Radiol ; 51(2): 345-353, 2022 Feb.
Article in English | MEDLINE | ID: mdl-33576861

ABSTRACT

OBJECTIVE: To develop and evaluate a two-stage deep convolutional neural network system that mimics a radiologist's search pattern for detecting two small fractures: triquetral avulsion fractures and Segond fractures. MATERIALS AND METHODS: We obtained 231 lateral wrist radiographs and 173 anteroposterior knee radiographs from the Stanford MURA and LERA datasets and the public domain to train and validate a two-stage deep convolutional neural network system: (1) object detectors that crop the dorsal triquetrum or lateral tibial condyle, trained on control images, followed by (2) classifiers for triquetral and Segond fractures, trained on a 1:1 case:control split. A second set of classifiers was trained on uncropped images for comparison. External test sets of 50 lateral wrist radiographs and 24 anteroposterior knee radiographs were used to evaluate generalizability. Gradient-weighted class activation mapping was used to inspect the image regions of greatest importance to the final classification. RESULTS: The object detectors accurately cropped the regions of interest in all validation and test images. The two-stage system achieved cross-validated area under the receiver operating characteristic curve values of 0.959 and 0.989 on triquetral and Segond fractures, compared with 0.860 (p = 0.0086) and 0.909 (p = 0.0074), respectively, for a one-stage classifier. Two-stage cross-validation accuracies were 90.8% and 92.5% for triquetral and Segond fractures, respectively. CONCLUSION: A two-stage pipeline increases accuracy in the detection of subtle fractures on radiographs compared with a one-stage classifier and generalized well to external test data. Focusing attention on specific image regions appears to improve detection of subtle findings that may otherwise be missed.
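The detect-then-classify pattern chains a localizer's bounding box into a crop that feeds the classifier. A minimal PyTorch sketch of that inference flow, assuming (hypothetically) trained `detector` and `classifier` models where the detector returns one (x1, y1, x2, y2) box in pixel coordinates:

```python
import torch
import torch.nn.functional as F

def two_stage_predict(image: torch.Tensor, detector, classifier) -> float:
    """Stage 1: an object detector proposes the region a radiologist would
    scrutinize (e.g., the dorsal triquetrum); stage 2: a classifier scores
    the crop for fracture. `detector` and `classifier` are stand-ins for
    trained models, with `image` a C x H x W tensor."""
    with torch.no_grad():
        x1, y1, x2, y2 = [int(v) for v in detector(image.unsqueeze(0))[0]]
        crop = image[:, y1:y2, x1:x2]                      # crop the ROI
        crop = F.interpolate(crop.unsqueeze(0), size=(224, 224),
                             mode="bilinear", align_corners=False)
        return torch.sigmoid(classifier(crop)).item()      # fracture probability
```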


Subject(s)
Deep Learning; Algorithms; Humans; Neural Networks, Computer; Radiologists; Sensitivity and Specificity
10.
Skeletal Radiol ; 51(2): 407-416, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34351457

ABSTRACT

Identifying specific orthopedic implant models from imaging is an important and time-consuming task at which artificial intelligence models have demonstrated high accuracy; however, the scope and performance of prior work have not been systematically evaluated. We performed a systematic review to summarize the scope, methodology, and performance of artificial intelligence algorithms for classifying orthopedic implant models. We searched PubMed, EMBASE, and the Cochrane Library for studies published up to March 10, 2021, using search terms related to "artificial intelligence", "orthopedic", "implant", and "arthroplasty". Studies were assessed using a modified version of the methodologic index for non-randomized studies. Reported outcomes included area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity. The search identified 2689 records, of which 11 were included in the final review. The number of implant models evaluated ranged from 2 to 27. Five studies reported overall AUC across all included models, ranging from 0.94 to 1.0. Overall accuracy values ranged from 0.804 to 1.0. One study compared AI model performance with that of three surgeons, reporting similar performance. There was a large degree of variation in methodology and reporting quality. Artificial intelligence algorithms have demonstrated strong performance in classifying orthopedic implant models from radiographs. Further research is needed to compare artificial intelligence, alone and as an adjunct, with human experts in implant identification. Future studies should adhere to rigorous artificial intelligence development methods and report methods and results thoroughly and transparently.


Subject(s)
Artificial Intelligence; Orthopedics; Algorithms; Humans; ROC Curve; Radiography
11.
Skeletal Radiol ; 51(2): 271-278, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34191083

ABSTRACT

Artificial intelligence (AI) represents a broad category of algorithms, of which deep learning is currently the most impactful. Clinical musculoskeletal radiologists who want to build the foundational knowledge needed to decipher machine learning research and algorithms currently have few resources to turn to. In this article, we provide an introduction to the vital terminology, an explanation of data splits and regularization, an introduction to the statistical analyses used in AI research, a primer on what deep learning can and cannot do, and a brief overview of clinical integration methods. Our goal is to improve the reader's understanding of this field.
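Since data splits are one of the primer's core topics, a minimal sketch of the standard three-way split may help; the arrays here are random stand-ins for real features and labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 64)        # stand-in features (e.g., image embeddings)
y = np.random.randint(0, 2, 1000)   # stand-in binary labels

# First carve off 30%, then split it in half: 70/15/15 train/validation/test
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)
# The validation set guides model and hyperparameter choices (including
# regularization strength); the test set is touched once, at the end.
```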


Subject(s)
Artificial Intelligence; Radiology; Algorithms; Humans; Machine Learning; Radiologists
12.
Skeletal Radiol ; 51(2): 305-313, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34350476

ABSTRACT

Artificial intelligence (AI) and deep learning have multiple potential uses in aiding the musculoskeletal radiologist in the radiological evaluation of orthopedic implants. These include identification of implants, characterization of implants according to anatomic type, identification of specific implant models, and evaluation of implants for positioning and complications. In addition, natural language processing (NLP) can aid in the acquisition of clinical information from the medical record that can help with tasks like prepopulating radiology reports. Several proof-of-concept works have been published in the literature describing the application of deep learning toward these various tasks, with performance comparable to that of expert musculoskeletal radiologists. Although much work remains to bring these proof-of-concept algorithms into clinical deployment, AI has tremendous potential toward automating these tasks, thereby augmenting the musculoskeletal radiologist.


Subject(s)
Musculoskeletal System; Orthopedics; Algorithms; Artificial Intelligence; Humans; Radiologists
13.
Skeletal Radiol ; 51(2): 423-429, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34476558

ABSTRACT

OBJECTIVE: The purpose of this study was to evaluate agreement in predictions made by a bone age prediction application ("app") among three data input methods. METHODS: The 16Bit Bone Age app is a browser-based deep learning application for predicting bone age on pediatric hand radiographs; its recommended data input methods are direct image file upload or smartphone capture of the image. We collected 50 hand radiographs, split equally among 5 bone age groups. Three observers used the 16Bit Bone Age app to assess these images using 3 different data input methods: (1) direct image upload, (2) smartphone photo of the image in a radiology reading room, and (3) smartphone photo of the image in a clinic. RESULTS: Interobserver agreement was excellent for direct upload (ICC = 1.00) and for photos in the reading room (ICC = 0.96), and good for photos in the clinic (ICC = 0.82). Intraobserver agreement for the entire test set across the 3 data input methods was variable, with ICCs of 0.95, 0.96, and 0.57 for the 3 observers, respectively. DISCUSSION: Our findings indicate that different data input methods can result in discordant bone age predictions from the 16Bit Bone Age app. Further study is needed to determine the impact of data input methods, such as smartphone image capture, on deep learning app performance and accuracy.
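The agreement statistic used here, the intraclass correlation coefficient, can be computed from the ratings matrix directly. A self-contained sketch of one common form, ICC(2,1); the abstract does not specify which ICC variant the authors used, and the random ratings below are illustrative only:

```python
import numpy as np

def icc2_1(Y: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    Y is an n_images x k_observers matrix of bone age predictions."""
    n, k = Y.shape
    grand = Y.mean()
    msr = k * ((Y.mean(axis=1) - grand) ** 2).sum() / (n - 1)  # between-image
    msc = n * ((Y.mean(axis=0) - grand) ** 2).sum() / (k - 1)  # between-observer
    sse = ((Y - grand) ** 2).sum() - msr * (n - 1) - msc * (k - 1)
    mse = sse / ((n - 1) * (k - 1))                            # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical: 50 radiographs rated by 3 observers for one input method
ratings = np.random.normal(120, 30, size=(50, 3))
print(round(icc2_1(ratings), 2))
```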


Subject(s)
Deep Learning; Mobile Applications; Child; Humans; Smartphone
14.
Skeletal Radiol ; 51(2): 401-406, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34351456

ABSTRACT

OBJECTIVE: To evaluate the behavior of a publicly available deep convolutional neural network (DCNN) bone age algorithm when presented with inappropriate data inputs in both radiological and non-radiological domains. METHODS: We evaluated a publicly available DCNN-based bone age application. The DCNN was trained on 12,612 pediatric hand radiographs and won the 2017 RSNA Pediatric Bone Age Challenge (concordance of 0.991 with radiologist ground truth). We used the application to analyze 50 left-hand radiographs (appropriate data inputs) and seven classes of inappropriate data inputs in radiological (e.g., chest radiographs) and non-radiological (e.g., images of street numbers) domains. For each image, we noted whether (1) the application distinguished between appropriate and inappropriate data inputs and (2) the inference time per image. Mean inference times were compared using ANOVA. RESULTS: The 16Bit Bone Age application calculated bone age for all pediatric hand radiographs with a mean inference time of 1.1 s. The application did not distinguish pediatric hand radiographs from inappropriate image types in either the radiological or non-radiological domains, inappropriately calculating bone age for all inappropriate image types with a mean inference time of 1.1 s for all categories (p = 1). CONCLUSION: A publicly available DCNN-based bone age application failed to distinguish between appropriate and inappropriate data inputs and calculated bone age for inappropriate images. Awareness that inappropriate inputs yield inappropriate outputs is important if tasks such as bone age determination are automated, emphasizing the need for appropriate oversight at the data input and verification stage to avoid unrecognized erroneous results.
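One way to implement the input verification the authors call for is an auxiliary gate model that scores whether the input even looks like a pediatric hand radiograph before the bone age model runs. This is a hypothetical sketch of that guardrail, not part of the evaluated application; `gate` and `bone_age_model` are assumed trained models:

```python
import torch

def guarded_bone_age(image: torch.Tensor, gate, bone_age_model,
                     threshold: float = 0.9):
    """Run a hypothetical auxiliary classifier (`gate`) that scores whether
    the input looks like a pediatric hand radiograph before invoking the
    bone age model; refuse to predict otherwise."""
    with torch.no_grad():
        p_hand = torch.sigmoid(gate(image.unsqueeze(0))).item()
        if p_hand < threshold:
            return None  # route to human review instead of silently predicting
        return bone_age_model(image.unsqueeze(0)).item()
```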


Subject(s)
Deep Learning; Automobiles; Child; Flowers; Humans; Neural Networks, Computer; Radiography
15.
Skeletal Radiol ; 51(11): 2121-2128, 2022 Nov.
Article in English | MEDLINE | ID: mdl-35624310

ABSTRACT

OBJECTIVE: Deep learning has the potential to automatically triage orthopedic emergencies, such as joint dislocations. However, due to the rarity of these injuries, collecting large numbers of images to train algorithms may be infeasible for many centers. We evaluated whether the Internet could be used as a source of images to train convolutional neural networks (CNNs) for joint dislocations that would generalize well to real-world clinical cases. METHODS: We collected datasets from online radiology repositories of 100 radiographs each (50 dislocated, 50 located) for four joints: native shoulder, elbow, hip, and total hip arthroplasty (THA). We trained a variety of CNN binary classifiers, using both on-the-fly and static data augmentation, to identify the various joint dislocations. The best-performing classifier for each joint was evaluated on an external test set of 100 corresponding radiographs (50 dislocations) from three hospitals. CNN performance was evaluated using area under the ROC curve (AUROC). To determine the areas emphasized by the CNNs for decision-making, class activation map (CAM) heatmaps were generated for test images. RESULTS: The best-performing CNNs for elbow, hip, shoulder, and THA dislocation achieved high AUROCs on both internal and external test sets (internal/external AUROC): elbow (1.0/0.998), hip (0.993/0.880), shoulder (1.0/0.993), THA (1.0/0.950). Heatmaps demonstrated appropriate emphasis of the joints for both located and dislocated joints. CONCLUSION: With modest numbers of images, radiographs from the Internet can be used to train clinically generalizable CNNs for joint dislocations. Given the rarity of joint dislocations at many centers, online repositories may be a viable source of CNN training data.
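On-the-fly augmentation is what lets ~100 images train a usable CNN: each epoch sees freshly perturbed copies rather than a fixed pre-generated set. A minimal torchvision sketch of such a pipeline (the specific transforms and parameters are illustrative assumptions, not the study's reported recipe):

```python
import torchvision.transforms as T

# On-the-fly augmentation: every epoch sees a freshly perturbed version of
# each radiograph, stretching a ~100-image dataset much further than a
# fixed ("static") pre-generated augmented set.
train_tf = T.Compose([
    T.Grayscale(num_output_channels=3),      # radiograph -> 3-channel input
    T.RandomRotation(degrees=10),
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
# Passed as `transform=` to an ImageFolder-style dataset, so the random
# perturbations are sampled at load time on each pass through the data.
```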


Subject(s)
Crowdsourcing; Deep Learning; Joint Dislocations; Algorithms; Humans; Internet
16.
Emerg Radiol ; 29(1): 107-113, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34648114

ABSTRACT

PURPOSE: (1) To develop a deep learning system (DLS) to identify pneumonia in pediatric chest radiographs, and (2) to evaluate its generalizability by comparing its performance on internal versus external test datasets. METHODS: Radiographs of patients between 1 and 5 years old from the Guangzhou Women and Children's Medical Center (Guangzhou dataset) and the NIH ChestXray14 dataset were included. We utilized 5232 radiographs from the Guangzhou dataset to train a ResNet-50 deep convolutional neural network (DCNN) to identify pediatric pneumonia. DCNN testing was performed on a holdout set of 624 radiographs from the Guangzhou dataset (internal test set) and 383 radiographs from the NIH ChestXray14 dataset (external test set). Receiver operating characteristic curves were generated, and area under the curve (AUC) was compared via the DeLong parametric method. Colored heatmaps were generated using class activation mapping (CAM) to identify the image pixels important for DCNN decision-making. RESULTS: The DCNN achieved AUCs of 0.95 and 0.54 for identifying pneumonia on the internal and external test sets, respectively (p < 0.0001). Heatmaps generated by the DCNN showed that the algorithm focused on clinically relevant features for images from the internal test set, but not for images from the external test set. CONCLUSION: Our model had high performance when tested on an internal dataset but significantly lower performance when tested on an external dataset. Likewise, marked differences existed in the clinical relevance of the features highlighted by heatmaps generated from internal versus external datasets. This study underscores potential limitations in the generalizability of such DL systems.
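CAM heatmaps of the kind used here exploit the fact that ResNet-style networks end in global average pooling followed by a single fully connected layer, so the class score is a weighted sum of the final feature maps. A sketch of that computation on ResNet-50; the ImageNet head stands in for the study's fine-tuned pneumonia head, and the 224x224 input size is assumed:

```python
import torch
import torchvision.models as models

model = models.resnet50(weights="IMAGENET1K_V1").eval()
feats = {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(last=o))

def cam(image: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Class activation map: weight the final convolutional feature maps by
    the fc-layer weights of the target class. `image` is a 3 x 224 x 224
    tensor; the resulting 7 x 7 map is upsampled for display in practice."""
    with torch.no_grad():
        model(image.unsqueeze(0))
        A = feats["last"][0]                    # (2048, 7, 7) feature maps
        w = model.fc.weight[class_idx]          # (2048,) class weights
        heat = torch.relu(torch.einsum("c,chw->hw", w, A))
        return heat / (heat.max() + 1e-8)       # normalized heatmap in [0, 1]
```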


Subject(s)
Deep Learning; Pneumonia; Algorithms; Child; Child, Preschool; Female; Humans; Infant; Neural Networks, Computer; Pneumonia/diagnostic imaging; Retrospective Studies
17.
Emerg Radiol ; 29(2): 365-370, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35006495

ABSTRACT

BACKGROUND: Deep convolutional neural networks (DCNNs) for diagnosis of disease on chest radiographs (CXRs) have been shown to be biased against males or females if the datasets used to train them have unbalanced sex representation. Prior work has suggested that DCNNs can predict sex on CXRs, which could aid forensic evaluations but also be a source of bias. OBJECTIVE: To (1) evaluate the performance of DCNNs for predicting sex across different datasets and architectures and (2) evaluate the visual biomarkers used by DCNNs to predict sex on CXRs. MATERIALS AND METHODS: Chest radiographs were obtained from the Stanford CheXpert and NIH ChestXray14 datasets, which comprise 224,316 and 112,120 CXRs, respectively. To control for dataset size and class imbalance, random undersampling was used to reduce each dataset to 97,560 images balanced for sex. Each dataset was randomly split into training (70%), validation (10%), and test (20%) sets. Four DCNN architectures pre-trained on ImageNet were used for transfer learning. DCNNs were externally validated using a test set from the opposing dataset. Performance was evaluated using area under the receiver operating characteristic curve (AUROC). Class activation mapping (CAM) was used to generate heatmaps visualizing the regions contributing to the DCNNs' predictions. RESULTS: On the internal test sets, DCNNs achieved AUROCs ranging from 0.98 to 0.99. On external validation, the models reached peak cross-dataset performance of 0.94 for the VGG19-Stanford model and 0.95 for the InceptionV3-NIH model. Heatmaps highlighted similar regions of attention between model architectures and datasets, localizing to the mediastinal and upper rib regions, as well as to the lower chest/diaphragmatic regions. CONCLUSION: DCNNs trained on two large CXR datasets accurately predicted sex on internal and external test data, with similar heatmap localizations across DCNN architectures and datasets. These findings support the notion that DCNNs can leverage imaging biomarkers to predict sex, potentially confounding the accurate prediction of disease on CXRs and contributing to biased models. On the other hand, these DCNNs can be beneficial to emergency radiologists for forensic evaluations and for identifying the sex of patients whose identities are unknown, such as in acute trauma.
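The random undersampling step described above is a one-liner per class once the metadata is in a DataFrame. A minimal sketch, with the DataFrame and column name as hypothetical stand-ins for the datasets' metadata tables:

```python
import pandas as pd

def undersample_balanced(df: pd.DataFrame, col: str = "sex",
                         n_total: int = 97_560, seed: int = 0) -> pd.DataFrame:
    """Randomly undersample each class in `col` to n_total / n_classes rows,
    yielding a sex-balanced subset as described above. The DataFrame and
    column name are hypothetical stand-ins for the dataset metadata."""
    per_class = n_total // df[col].nunique()
    return (df.groupby(col, group_keys=False)
              .apply(lambda g: g.sample(n=per_class, random_state=seed)))
```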


Subject(s)
Deep Learning; Algorithms; Female; Humans; Male; Neural Networks, Computer; Radiography; Radiologists
18.
Emerg Radiol ; 29(5): 801-808, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35608786

ABSTRACT

OBJECTIVE: Periprosthetic dislocations of total hip arthroplasty (THA) are time-sensitive injuries: the longer diagnosis and treatment are delayed, the more difficult they are to reduce. Automated triage of radiographs with dislocations could help reduce these delays. We trained convolutional neural networks (CNNs) to detect THA dislocations and assessed their generalizability on external datasets. METHODS: We used 357 THA radiographs from a single hospital (185 with dislocation [51.8%]) to develop and internally test a variety of CNNs for identifying THA dislocation. We performed external testing of these CNNs on two datasets to evaluate generalizability. CNN performance was evaluated using area under the receiver operating characteristic curve (AUROC). Class activation mapping (CAM) was used to create heatmaps of test images for visualization of the regions emphasized by the CNNs. RESULTS: Multiple CNNs achieved AUROCs of 1.0 on both the internal and external test sets, indicating good generalizability. Heatmaps showed that the CNNs consistently emphasized the THA for both dislocated and located THAs. CONCLUSION: CNNs can be trained to recognize THA dislocation with high diagnostic performance, which supports their potential use for triage in the emergency department. Importantly, our CNNs generalized well to external data from two sources, further supporting their potential clinical utility.


Subject(s)
Arthroplasty, Replacement, Hip; Deep Learning; Joint Dislocations; Humans; Internet; Neural Networks, Computer; Retrospective Studies
19.
J Digit Imaging ; 35(3): 660-665, 2022 06.
Article in English | MEDLINE | ID: mdl-35166969

ABSTRACT

The purpose of this study was to evaluate the feasibility of translating the RadLex lexicon from English to German with Google Translate, using the RadLex ontology as ground truth; the same comparison was also performed for German to English translations. We determined the concordance rate of the Google Translate-rendered translations (in both directions) with the official German RadLex terms (translations provided by the German Radiological Society) and the English RadLex terms via character-by-character concordance analysis (string matching). Term characteristics (character count and word count) were compared between concordant and discordant translations using t-tests. Google Translate-rendered translations initially considered non-concordant (2482 English terms and 2500 German terms) were then reviewed by German- and English-speaking radiologists to further evaluate clinical utility. Overall success rates for both directions were calculated by adding the percentage of terms marked correct by string comparison to the percentage marked correct during manual review, extrapolated to the terms initially marked incorrect during string analysis. 64,632 English and 47,425 German RadLex terms were analyzed. 3507 (5.4%) of the Google Translate-rendered English to German translations were concordant with the official German RadLex terms when evaluated via character-by-character concordance. 3288 (6.9%) of the Google Translate-rendered German to English translations matched the corresponding English RadLex terms. Human review of a random sample of non-concordant machine translations revealed that 95.5% of such English to German translations were understandable, whereas only 43.9% of such German to English translations were understandable. Combining string matching and human review resulted in an overall Google Translate success rate of 95.7% for English to German translations and 47.8% for German to English translations. For certain radiologic text translation tasks, Google Translate may be a useful tool for translating multi-language radiology reports into a common language for natural language processing and subsequent labeling of datasets for machine learning. String matching analysis alone is an incomplete method for evaluating machine translation; when human review of automated translation is also incorporated, measured performance improves. Additional evaluation using longer text samples and full imaging reports is needed. The discordance between English to German and German to English results suggests that the direction of translation affects accuracy.
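The string-matching stage of this evaluation is simply an exact-match rate over paired terms. A minimal sketch; the dict-of-terms structure keyed by RadLex term ID is a hypothetical stand-in for however the lexicon is actually stored:

```python
def concordance_rate(machine: dict, official: dict) -> float:
    """Character-by-character (exact string) concordance between machine
    translations and official RadLex terms, keyed by RadLex term ID.
    The dict-of-terms structure is a hypothetical stand-in."""
    matches = sum(1 for term_id, text in machine.items()
                  if official.get(term_id) == text)
    return matches / len(machine)

# e.g., concordance_rate(google_en_to_de, radlex_de) would yield ~0.054
# for the English-to-German direction reported above.
```

As the abstract notes, this strict criterion understates clinical usability, which is why the human-review stage changes the English-to-German success rate so dramatically.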


Subject(s)
Language; Translating; Humans; Natural Language Processing; Radiologists; Translations
20.
J Neuroophthalmol ; 41(3): 368-374, 2021 Sep 01.
Article in English | MEDLINE | ID: mdl-34415271

ABSTRACT

BACKGROUND: To date, deep learning-based detection of optic disc abnormalities in color fundus photographs has mostly been limited to the field of glaucoma. However, many life-threatening systemic and neurological conditions can manifest as optic disc abnormalities. In this study, we aimed to extend the application of deep learning (DL) in optic disc analysis to detect a spectrum of nonglaucomatous optic neuropathies. METHODS: Using transfer learning, we trained a ResNet-152 deep convolutional neural network (DCNN) to distinguish between normal and abnormal optic discs in color fundus photographs (CFPs). Our training data set included 944 deidentified CFPs (abnormal 364; normal 580). Our testing data set included 151 deidentified CFPs (abnormal 71; normal 80). Both the training and testing data sets contained a wide range of optic disc abnormalities, including but not limited to ischemic optic neuropathy, atrophy, compressive optic neuropathy, hereditary optic neuropathy, hypoplasia, papilledema, and toxic optic neuropathy. The standard measures of performance (sensitivity, specificity, and area under the receiver operating characteristic curve [AUC-ROC]) were used for evaluation. RESULTS: During the 10-fold cross-validation test, our DCNN for distinguishing between normal and abnormal optic discs achieved the following mean performance: AUC-ROC 0.99 (95% CI: 0.98-0.99), sensitivity 94% (95% CI: 91%-97%), and specificity 96% (95% CI: 93%-99%). When evaluated against the external testing data set, our model achieved the following mean performance: AUC-ROC 0.87, sensitivity 90%, and specificity 69%. CONCLUSION: In summary, we have developed a deep learning algorithm capable of detecting a spectrum of optic disc abnormalities in color fundus photographs, with a focus on neuro-ophthalmological etiologies. As a next step, we plan to validate our algorithm prospectively as a focused screening tool in the emergency department; if successful, this could be beneficial because current practice patterns and training predict a future shortage of neuro-ophthalmologists and of ophthalmologists in general.
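Reporting a mean AUC-ROC with a 95% CI across cross-validation folds can be sketched as below. Note the simplification: real k-fold cross-validation retrains the model on each fold, whereas here toy scores stand in for per-fold model outputs, and the CI is a normal approximation over fold-level AUCs:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

# Toy stand-ins: real scores would come from the DCNN retrained per fold
y = np.random.randint(0, 2, 500)                 # abnormal (1) vs normal (0)
scores = 0.6 * y + 0.5 * np.random.rand(500)     # scores correlated with labels

aucs = []
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for _, test_idx in skf.split(scores.reshape(-1, 1), y):
    aucs.append(roc_auc_score(y[test_idx], scores[test_idx]))

# Mean AUC-ROC with a normal-approximation 95% CI across the 10 folds
m, s = np.mean(aucs), np.std(aucs, ddof=1)
half = 1.96 * s / np.sqrt(len(aucs))
print(f"AUC-ROC {m:.2f} (95% CI: {m - half:.2f}-{m + half:.2f})")
```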


Subject(s)
Algorithms; Deep Learning; Diagnostic Techniques, Ophthalmological; Optic Disk/abnormalities; Optic Nerve Diseases/diagnosis; Humans; Optic Disk/diagnostic imaging; ROC Curve