ABSTRACT
OBJECTIVE: This proof-of-concept study assessed how confidently an artificial intelligence (AI) model can determine the sex of a fetus from an ultrasound image. STUDY DESIGN: Analysis was performed using 19,212 ultrasound image slices from a high-volume fetal sex determination practice. This dataset was split into a training set (11,769) and a test set (7,443). A computer vision model was trained using a transfer learning approach with the EfficientNetB4 architecture as the base. The performance of the computer vision model was evaluated on the holdout test set. Accuracy, Cohen's Kappa, and multiclass receiver operating characteristic area under the curve (AUC) were used to evaluate the performance of the model. RESULTS: The AI model achieved an accuracy of 88.27% on the holdout test set and a Cohen's Kappa score of 0.843. The ROC AUC score was 0.896 for Male, 0.897 for Female, 0.916 for Unable to Assess, and 0.981 for Text Added. CONCLUSION: This novel AI model demonstrated a high rate of fetal sex capture and could be of significant use in areas where ultrasound expertise is not readily available. KEY POINTS: · This is the first proof-of-concept AI model to determine fetal sex. · This study adds to the growing research in ultrasound AI. · Our findings demonstrate AI integration into obstetric care.
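As a rough illustration of the training setup described above, the following minimal sketch assumes a TensorFlow/Keras pipeline with an ImageNet-pretrained EfficientNetB4 backbone; only the base architecture and the four output classes come from the abstract, while the classification head, input size, and hyperparameters are illustrative assumptions.

```python
# Minimal transfer-learning sketch (assumed TensorFlow/Keras pipeline).
# Only the EfficientNetB4 base and the four class labels come from the
# abstract; the head, input size, and hyperparameters are illustrative.
import tensorflow as tf
from tensorflow.keras.applications import EfficientNetB4

NUM_CLASSES = 4  # Male, Female, Unable to Assess, Text Added

base = EfficientNetB4(weights="imagenet", include_top=False,
                      input_shape=(380, 380, 3))
base.trainable = False  # freeze the pretrained backbone for transfer learning

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```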
Subjects
Artificial Intelligence; ROC Curve; Sex Determination Analysis; Ultrasonography, Prenatal; Humans; Female; Ultrasonography, Prenatal/methods; Male; Sex Determination Analysis/methods; Pregnancy; Proof of Concept Study; Area Under Curve
ABSTRACT
BACKGROUND: The purpose of this study is to assess the viability of a knee arthroplasty prediction model using 3-view X-rays that helps determine whether patients with knee pain are candidates for total knee arthroplasty (TKA), unicompartmental knee arthroplasty (UKA), or no arthroplasty. METHODS: Analysis was performed using radiographic and surgical data from a high-volume joint replacement practice. The dataset included 3 different X-ray views (anterior-posterior, lateral, and sunrise) for 2,767 patients, along with information on whether each patient underwent an arthroplasty surgery (UKA or TKA) or not. This resulted in a dataset of 8,301 images from 2,767 patients. This dataset was then split into a training set (70%) and a holdout test set (30%). A computer vision model was trained using a transfer learning approach. The performance of the computer vision model was evaluated on the holdout test set. Accuracy and multiclass receiver operating characteristic area under the curve (AUC) were used to evaluate the performance of the model. RESULTS: The artificial intelligence model achieved an accuracy of 87.8% on the holdout test set and a quadratic Cohen's kappa score of 0.811. The multiclass receiver operating characteristic AUC score was 0.97 for TKA, 0.96 for UKA, and 0.98 for No Surgery. An accuracy of 93.8% was achieved for predicting Surgery versus No Surgery, and 88% for TKA versus not TKA. CONCLUSION: The artificial intelligence/machine learning model demonstrated viability for predicting which patients are candidates for a UKA, TKA, or no surgical intervention.
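The metrics reported above can be computed with standard tooling; the sketch below assumes scikit-learn and hypothetical model outputs y_true (integer labels) and y_prob (per-class probabilities), neither of which comes from the study.

```python
# Hedged sketch of the reported metrics: accuracy, quadratic Cohen's kappa,
# per-class one-vs-rest ROC AUC, and the binary Surgery/TKA groupings.
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, roc_auc_score
from sklearn.preprocessing import label_binarize

def evaluate(y_true: np.ndarray, y_prob: np.ndarray) -> dict:
    """y_true: integer labels (0=No Surgery, 1=UKA, 2=TKA);
    y_prob: per-class probabilities, shape (n_samples, 3)."""
    y_pred = np.argmax(y_prob, axis=1)
    y_bin = label_binarize(y_true, classes=[0, 1, 2])
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # quadratic-weighted kappa, as reported in the abstract
        "quadratic_kappa": cohen_kappa_score(y_true, y_pred, weights="quadratic"),
        # one-vs-rest ROC AUC per class (No Surgery, UKA, TKA)
        "auc_per_class": [roc_auc_score(y_bin[:, k], y_prob[:, k]) for k in range(3)],
        # binary groupings reported in the abstract
        "acc_surgery_vs_none": accuracy_score(y_true != 0, y_pred != 0),
        "acc_tka_vs_not_tka": accuracy_score(y_true == 2, y_pred == 2),
    }
```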
Subjects
Arthroplasty, Replacement, Knee; Osteoarthritis, Knee; Humans; Arthroplasty, Replacement, Knee/methods; Osteoarthritis, Knee/surgery; Artificial Intelligence; Treatment Outcome; Knee Joint/diagnostic imaging; Knee Joint/surgery; Machine Learning
ABSTRACT
BACKGROUND: Chronic graft-versus-host disease (cGVHD) is a significant cause of long-term morbidity and mortality in patients after allogeneic hematopoietic cell transplantation. Skin is the most commonly affected organ, and visual assessment of cGVHD can have low reliability. Crowdsourcing data from nonexpert participants has been used for numerous medical applications, including image labeling and segmentation tasks. OBJECTIVE: This study aimed to assess the ability of crowds of nonexpert raters (individuals without any prior training in identifying or marking cGVHD) to demarcate photos of cGVHD-affected skin. We also studied the effect of training and feedback on crowd performance. METHODS: Using a Canfield Vectra H1 3D camera, 360 photographs of the skin of 36 patients with cGVHD were taken. Ground truth demarcations were provided in 3D by a trained expert and reviewed by a board-certified dermatologist. In total, 3,000 2D images (projections from various angles) were created for crowd demarcation through the DiagnosUs mobile app. Raters were split into high and low feedback groups. The performances of 4 different crowds of nonexperts were analyzed: 17 raters per image for each of the high and low feedback groups, 32-35 raters per image for the low feedback group, and the top 5 performers for each image from the low feedback group. RESULTS: Across 8 demarcation competitions, 130 raters were recruited to the high feedback group and 161 to the low feedback group. This resulted in a total of 54,887 individual demarcations from the high feedback group and 78,967 from the low feedback group. The nonexpert crowds achieved good overall performance for segmenting cGVHD-affected skin with minimal training, achieving a median surface area error of less than 12% of skin pixels for all crowds in both the high and low feedback groups. The low feedback crowds performed slightly worse than the high feedback crowd, even when a larger crowd was used. Tracking the 5 most reliable raters from the low feedback group for each image recovered a performance similar to that of the high feedback crowd. Higher variability between raters for a given image did not correlate with lower performance of the crowd consensus demarcation and therefore cannot be used as a measure of reliability. No significant learning was observed during the task as more photos and feedback were seen. CONCLUSIONS: Crowds of nonexpert raters can demarcate cGVHD images with good overall performance. Tracking the top 5 most reliable raters provided optimal results, obtaining the best performance with the lowest number of expert demarcations required for adequate training. However, agreement among individual nonexperts does not help predict whether the crowd has provided an accurate result. Future work should explore the performance of crowdsourcing on standard clinical photos and further methods to estimate the reliability of consensus demarcations.
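The abstract does not spell out how individual demarcations were aggregated or how the surface area error is defined; the sketch below shows one plausible reading, using a pixel-wise majority vote for the crowd consensus and the symmetric difference with the expert ground truth, normalized by the number of skin pixels, as the error. Both choices are assumptions, not details from the study.

```python
# Illustrative crowd-consensus and surface-area-error sketch (assumptions:
# majority-vote aggregation, symmetric-difference error over skin pixels).
import numpy as np

def consensus_mask(rater_masks: np.ndarray) -> np.ndarray:
    """rater_masks: boolean array of shape (n_raters, H, W).
    A pixel enters the consensus if more than half of the raters marked it."""
    votes = rater_masks.sum(axis=0)
    return votes > rater_masks.shape[0] / 2

def surface_area_error(pred: np.ndarray, truth: np.ndarray,
                       skin: np.ndarray) -> float:
    """Fraction of skin pixels where the consensus and the expert ground
    truth disagree (all inputs: boolean arrays of shape (H, W))."""
    disagreement = np.logical_xor(pred, truth) & skin
    return disagreement.sum() / skin.sum()
```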
ABSTRACT
Lung ultrasound (LUS) is an important imaging modality used by emergency physicians to assess pulmonary congestion at the patient bedside. B-line artifacts in LUS videos are key findings associated with pulmonary congestion. Not only can the interpretation of LUS be challenging for novice operators, but visual quantification of B-lines also remains subject to observer variability. In this work, we investigate the strengths and weaknesses of multiple deep learning approaches for automated B-line detection and localization in LUS videos. We curate and publish BEDLUS, a new ultrasound dataset comprising 1,419 videos from 113 patients with a total of 15,755 expert-annotated B-lines. Based on this dataset, we present a benchmark of established deep learning methods applied to the task of B-line detection. To pave the way for interpretable quantification of B-lines, we propose a novel "single-point" approach to B-line localization using only the point of origin. Our results show that (a) the area under the receiver operating characteristic curve ranges from 0.864 to 0.955 for the benchmarked detection methods, (b) within this range, the best performance is achieved by models that leverage multiple successive frames as input, and (c) the proposed single-point approach for B-line localization reaches an F1-score of 0.65, performing on par with the inter-observer agreement. The dataset and developed methods can facilitate further biomedical research on automated interpretation of lung ultrasound, with the potential to expand its clinical utility.
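The single-point localization described above is scored with an F1 metric; the sketch below shows one plausible way to compute such a score, greedily matching predicted B-line origin points to expert-annotated points within a pixel tolerance. The tolerance value and the greedy matching rule are assumptions, not details taken from the paper.

```python
# Hedged sketch of a point-based F1 score for B-line origin localization.
# The 15-pixel tolerance and greedy nearest-neighbor matching are assumptions.
import numpy as np

def point_f1(pred_pts, true_pts, tol: float = 15.0) -> float:
    """pred_pts, true_pts: iterables of (x, y) points; tol: match radius."""
    pred_pts = np.asarray(pred_pts, dtype=float).reshape(-1, 2)
    true_pts = np.asarray(true_pts, dtype=float).reshape(-1, 2)
    available = np.ones(len(true_pts), dtype=bool)  # unmatched ground truths
    tp = 0
    for p in pred_pts:
        if not available.any():
            break
        d = np.linalg.norm(true_pts - p, axis=1)
        d[~available] = np.inf  # each ground-truth point matches at most once
        j = int(np.argmin(d))
        if d[j] <= tol:
            available[j] = False
            tp += 1
    fp, fn = len(pred_pts) - tp, len(true_pts) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```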