Results 1 - 20 of 318
1.
J Magn Reson Imaging ; 59(2): 483-493, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37177832

ABSTRACT

BACKGROUND: The diagnosis of prenatal placenta accreta spectrum (PAS) with magnetic resonance imaging (MRI) is highly dependent on radiologists' experience. A deep learning (DL) method using the prior knowledge that PAS-related signs are generally found along the utero-placental borderline (UPB) may help radiologists, especially those with less experience, to mitigate this issue. PURPOSE: To develop a DL tool for antenatal diagnosis of PAS using T2-weighted MR images. STUDY TYPE: Retrospective. SUBJECTS: Five hundred and forty pregnant women with clinically suspected PAS disorders from two institutions, divided into training (409), internal test (103), and external test (28) datasets. FIELD STRENGTH/SEQUENCE: Sagittal T2-weighted fast spin echo sequence at 1.5 T and 3 T. ASSESSMENT: An nnU-Net was trained for placenta segmentation. The UPB straightening approach was used to extract the utero-placental boundary region. The UPB image was then fed into DenseNet-PAS for PAS diagnosis. DenseNet-PP learnt placental position information to improve the PAS diagnosis performance. Three radiologists with 8, 10, and 12 years of experience independently evaluated the images. Two radiologists marked the placenta tissue. Histopathological findings were the reference standard. STATISTICAL TESTS: Area under the curve (AUC) was used to evaluate the classification. Dice coefficient evaluated the segmentation between radiologists and the model performance. The Mann-Whitney U-test or the chi-squared test assessed the significance of differences. Decision curve analysis was used to determine clinical effectiveness. DeLong's test was used to compare AUCs. RESULTS: Of the 540 patients, 170 had PAS disorders confirmed by histopathology. The DL model using UPB images and placental position yielded the highest AUC of 0.860 and 0.897 in internal test and external test cohorts, respectively, significantly exceeding the performance of three radiologists (internal test AUC, 0.737-0.770). 
DATA CONCLUSION: By extracting the UPB image, this fully automatic DL pipeline achieved high accuracy and may assist radiologists in PAS diagnosis using MRI. LEVEL OF EVIDENCE: 3 TECHNICAL EFFICACY: Stage 2.
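The AUC used above to compare the model with the radiologists can be read as the fraction of positive/negative case pairs that a scorer ranks in the right order. The following is a minimal sketch of that interpretation, not the study's evaluation code; the function name and the scores are hypothetical and chosen only for illustration.

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model outputs, not data from the study.
pas_scores = [0.9, 0.8, 0.4]           # cases with confirmed PAS
non_pas_scores = [0.7, 0.3, 0.2, 0.1]  # cases without PAS

print(auc_from_scores(pas_scores, non_pas_scores))
```

In this toy example 11 of the 12 pairs are ordered correctly, so the AUC is 11/12.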


Subjects
Deep Learning , Placenta Accreta , Female , Pregnancy , Humans , Placenta , Placenta Accreta/diagnostic imaging , Retrospective Studies , Magnetic Resonance Imaging/methods
2.
Eur Radiol ; 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38842692

ABSTRACT

OBJECTIVES: To develop an automated pipeline for extracting prostate cancer-related information from clinical notes. MATERIALS AND METHODS: This retrospective study included 23,225 patients who underwent prostate MRI between 2017 and 2022. Cancer risk factors (family history of cancer and digital rectal exam findings), pre-MRI prostate pathology, and treatment history of prostate cancer were extracted from free-text clinical notes in English as binary or multi-class classification tasks. Any sentence containing pre-defined keywords was extracted from clinical notes within one year before the MRI. After manually creating sentence-level datasets with ground truth, Bidirectional Encoder Representations from Transformers (BERT)-based sentence-level models were fine-tuned using the extracted sentence as input and the category as output. The patient-level output was determined by compilation of multiple sentence-level outputs using tree-based models. Sentence-level classification performance was evaluated using the area under the receiver operating characteristic curve (AUC) on 15% of the sentence-level dataset (sentence-level test set). The patient-level classification performance was evaluated on the patient-level test set created by radiologists by reviewing the clinical notes of 603 patients. Accuracy and sensitivity were compared between the pipeline and radiologists. RESULTS: Sentence-level AUCs were ≥ 0.94. The pipeline showed higher patient-level sensitivity for extracting cancer risk factors (e.g., family history of prostate cancer, 96.5% vs. 77.9%, p < 0.001), but lower accuracy in classifying pre-MRI prostate pathology (92.5% vs. 95.9%, p = 0.002) and treatment history of prostate cancer (95.5% vs. 97.7%, p = 0.03) than radiologists, respectively. CONCLUSION: The proposed pipeline showed promising performance, especially for extracting cancer risk factors from patient's clinical notes. 
CLINICAL RELEVANCE STATEMENT: The natural language processing pipeline showed a higher sensitivity for extracting prostate cancer risk factors than radiologists and may help efficiently gather relevant text information when interpreting prostate MRI. KEY POINTS: When interpreting prostate MRI, it is necessary to extract prostate cancer-related information from clinical notes. This pipeline extracted the presence of prostate cancer risk factors with higher sensitivity than radiologists. Natural language processing may help radiologists efficiently gather relevant prostate cancer-related text information.
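The first step of the pipeline described above, pulling out every sentence that contains a pre-defined keyword, could look roughly like the sketch below. The keyword list, the function name, and the sample note are all hypothetical; the abstract does not list the actual keywords, and the BERT classification and tree-based aggregation steps are not shown.

```python
import re

# Hypothetical keywords; the study's actual list is not given in the abstract.
KEYWORDS = ("family history", "digital rectal exam", "dre")


def extract_keyword_sentences(note, keywords=KEYWORDS):
    """Return the sentences of a clinical note that mention any keyword."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", note.strip())
    patterns = [
        re.compile(r"\b" + re.escape(k) + r"\b", re.IGNORECASE)
        for k in keywords
    ]
    return [s for s in sentences if any(p.search(s) for p in patterns)]


note = (
    "Patient seen for elevated PSA. "
    "Family history of prostate cancer in father. "
    "DRE was unremarkable."
)
print(extract_keyword_sentences(note))
```

Each extracted sentence would then be the input to a sentence-level classifier; real clinical text would need a more robust sentence splitter than this regular expression.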

3.
Eur Radiol ; 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38861161

ABSTRACT

PURPOSE: This work aims to assess standard evaluation practices used by the research community for evaluating medical imaging classifiers, with a specific focus on the implications of class imbalance. The analysis is performed on chest X-rays as a case study and encompasses a comprehensive model performance definition, considering both discriminative capabilities and model calibration. MATERIALS AND METHODS: We conduct a concise literature review to examine prevailing scientific practices used when evaluating X-ray classifiers. Then, we perform a systematic experiment on two major chest X-ray datasets to showcase a didactic example of the behavior of several performance metrics under different class ratios and highlight how widely adopted metrics can conceal performance in the minority class. RESULTS: Our literature study confirms that: (1) even when dealing with highly imbalanced datasets, the community tends to use metrics that are dominated by the majority class; and (2) it is still uncommon to include calibration studies for chest X-ray classifiers, despite their importance in the context of healthcare. Moreover, our systematic experiments confirm that current evaluation practices may not reflect model performance in real clinical scenarios and suggest complementary metrics to better reflect the performance of the system in such scenarios. CONCLUSION: Our analysis underscores the need for enhanced evaluation practices, particularly in the context of class-imbalanced chest X-ray classifiers. We recommend the inclusion of complementary metrics such as the area under the precision-recall curve (AUC-PR), adjusted AUC-PR, and balanced Brier score, to offer a more accurate depiction of system performance in real clinical scenarios, considering metrics that reflect both discrimination and calibration performance. 
CLINICAL RELEVANCE STATEMENT: This study underscores the critical need for refined evaluation metrics in medical imaging classifiers, emphasizing that prevalent metrics may mask poor performance in minority classes, potentially impacting clinical diagnoses and healthcare outcomes. KEY POINTS: Common scientific practices in papers dealing with X-ray computer-assisted diagnosis (CAD) systems may be misleading. We highlight limitations in reporting of evaluation metrics for X-ray CAD systems in highly imbalanced scenarios. We propose adopting alternative metrics based on experimental evaluation on large-scale datasets.
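To illustrate how a majority-dominated metric can hide poor minority-class performance, the sketch below compares the ordinary Brier score with a class-balanced variant. The balanced version here is an assumption (the mean of the Brier scores computed separately within each class); the paper's exact definition may differ, and the data are hypothetical.

```python
def brier(y_true, y_prob):
    """Ordinary Brier score: mean squared error of predicted probabilities."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def balanced_brier(y_true, y_prob):
    """Assumed definition: average of the per-class Brier scores."""
    pos = [(1 - p) ** 2 for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p ** 2 for y, p in zip(y_true, y_prob) if y == 0]
    return 0.5 * (sum(pos) / len(pos) + sum(neg) / len(neg))


# Hypothetical imbalanced set: 2 positives among 100 cases, and a model
# that outputs the same low probability for everyone.
y_true = [1] * 2 + [0] * 98
y_prob = [0.02] * 100

print(round(brier(y_true, y_prob), 4))
print(round(balanced_brier(y_true, y_prob), 4))
```

The uninformative model looks well calibrated under the ordinary score (about 0.02) because negatives dominate, while the balanced variant (about 0.48) exposes its failure on the positive class.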

4.
AJR Am J Roentgenol ; 222(1): e2329655, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37493324

ABSTRACT

BACKGROUND. Screening mammography has decreased performance in patients with dense breasts. Supplementary screening ultrasound is a recommended option in such patients, although it has yielded mixed results in prior investigations. OBJECTIVE. The purpose of this article is to compare the performance characteristics of screening mammography alone, standalone artificial intelligence (AI), ultrasound alone, and mammography in combination with AI and/or ultrasound in patients with dense breasts. METHODS. This retrospective study included 1325 women (mean age, 53 years) with dense breasts who underwent both screening mammography and supplementary breast ultrasound within a 1-month interval from January 2017 to December 2017; prior mammography and prior ultrasound examinations were available for comparison in 91.2% and 91.8%, respectively. Mammography and ultrasound examinations were interpreted by one of 15 radiologists (five staff; 10 fellows); clinical reports were used for the present analysis. A commercial AI tool was used to retrospectively evaluate mammographic examinations for presence of cancer. Screening performances were compared among mammography, AI, ultrasound, and test combinations, using generalized estimating equations. Benign diagnoses required 24 months or longer of imaging stability. RESULTS. Twelve cancers (six invasive ductal carcinoma; six ductal carcinoma in situ) were diagnosed. Mammography, standalone AI, and ultrasound showed cancer detection rates (per 1000 patients) of 6.0, 6.8, and 6.0 (all p > .05); recall rates of 4.4%, 11.9%, and 9.2% (all p < .05); sensitivity of 66.7%, 75.0%, and 66.7% (all p > .05); specificity of 96.2%, 88.7%, and 91.3% (all p < .05); and accuracy of 95.9%, 88.5%, and 91.1% (all p < .05). 
Mammography with AI, mammography with ultrasound, and mammography with both ultrasound and AI showed cancer detection rates of 7.5, 9.1, and 9.1 (all p > .05); recall rates of 14.9%, 11.7%, and 21.4% (all p < .05); sensitivity of 83.3%, 100.0%, and 100.0% (all p > .05); specificity of 85.8%, 89.1%, and 79.4% (all p < .05); and accuracy of 85.7%, 89.2%, and 79.5% (all p < .05). CONCLUSION. Mammography with supplementary ultrasound showed higher accuracy, higher specificity, and lower recall rate in comparison with mammography with AI and in comparison with mammography with both ultrasound and AI. CLINICAL IMPACT. The findings fail to show benefit of AI with respect to screening mammography performed with supplementary breast ultrasound in patients with dense breasts.
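The screening metrics reported above all derive from one 2x2 table. As a worked example, the sketch below recomputes the figures for mammography alone. The counts are back-calculated from the reported percentages (1325 women, 12 cancers, 66.7% sensitivity, 96.2% specificity) and are an assumption, not numbers stated in the abstract; the recall rate is taken here to be the share of all positive reads.

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard screening metrics from a 2x2 confusion table."""
    n = tp + fp + fn + tn
    return {
        "cdr_per_1000": 1000 * tp / n,   # cancer detection rate
        "recall_rate": (tp + fp) / n,    # assumed: all positive reads
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / n,
    }


# Back-calculated (assumed) counts for mammography alone:
# 8 of 12 cancers detected, 50 false-positive reads among 1313 non-cancers.
m = screening_metrics(tp=8, fp=50, fn=4, tn=1263)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts the results round to the reported values: 6.0 per 1000, 4.4% recall, 66.7% sensitivity, 96.2% specificity, and 95.9% accuracy.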


Subjects
Breast Neoplasms , Mammography , Humans , Female , Middle Aged , Mammography/methods , Breast Density , Retrospective Studies , Artificial Intelligence , Early Detection of Cancer/methods , Mass Screening/methods
5.
Neuroradiology ; 66(7): 1153-1160, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38619571

ABSTRACT

PURPOSE: To evaluate the impact of an AI-based software trained to detect cerebral aneurysms on TOF-MRA on the diagnostic performance and reading times across readers with varying experience levels. METHODS: One hundred eighty-six MRI studies were reviewed by six readers to detect cerebral aneurysms. Initially, readings were assisted by the CNN-based software mdbrain. After 6 weeks, a second reading was conducted without software assistance. The results were compared to the consensus reading of two neuroradiological specialists and sensitivity (lesion and patient level), specificity (patient level), and false positives per case were calculated for the group of all readers, for the subgroup of physicians, and for each individual reader. Also, reading times for each reader were measured. RESULTS: The dataset contained 54 aneurysms. The readers had no experience (three medical students), 2 years experience (resident in neuroradiology), 6 years experience (radiologist), and 12 years (neuroradiologist). Significant improvements of overall specificity and the overall number of false positives per case were observed in the reading with AI support. For the physicians, we found significant improvements of sensitivity on lesion and patient level and false positives per case. Four readers experienced reduced reading times with the software, while two encountered increased times. CONCLUSION: In the reading with the AI-based software, we observed significant improvements in terms of specificity and false positives per case for the group of all readers and significant improvements of sensitivity and false positives per case for the physicians. Further studies are needed to investigate the effects of the AI-based software in a prospective setting.
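The reader metrics named above (lesion-level and patient-level sensitivity, patient-level specificity, false positives per case) can be computed from per-case counts. This is a minimal sketch under assumed definitions: a patient counts as detected if at least one true aneurysm is found, and a negative patient counts as correct only with zero false positives. The field names and the four cases are hypothetical.

```python
def reader_metrics(cases):
    """cases: dicts with 'n_lesions', 'n_detected' (true lesions found)
    and 'n_false_pos' (false-positive marks) for each study."""
    positives = [c for c in cases if c["n_lesions"] > 0]
    negatives = [c for c in cases if c["n_lesions"] == 0]
    return {
        "lesion_sensitivity": sum(c["n_detected"] for c in cases)
        / sum(c["n_lesions"] for c in cases),
        "patient_sensitivity": sum(c["n_detected"] > 0 for c in positives)
        / len(positives),
        "patient_specificity": sum(c["n_false_pos"] == 0 for c in negatives)
        / len(negatives),
        "fp_per_case": sum(c["n_false_pos"] for c in cases) / len(cases),
    }


# Hypothetical readings of four studies.
cases = [
    {"n_lesions": 2, "n_detected": 1, "n_false_pos": 0},
    {"n_lesions": 1, "n_detected": 0, "n_false_pos": 1},
    {"n_lesions": 0, "n_detected": 0, "n_false_pos": 0},
    {"n_lesions": 0, "n_detected": 0, "n_false_pos": 1},
]
print(reader_metrics(cases))
```

The example shows why the two sensitivities differ: one of three lesions is found (lesion level), but one of two affected patients is flagged (patient level).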


Subjects
Intracranial Aneurysm , Magnetic Resonance Angiography , Sensitivity and Specificity , Software , Humans , Intracranial Aneurysm/diagnostic imaging , Magnetic Resonance Angiography/methods , Male , Female , Middle Aged , Clinical Competence , Image Interpretation, Computer-Assisted/methods , Artificial Intelligence , Aged , Adult
6.
Int J Comput Vis ; 132(7): 2567-2584, 2024.
Article in English | MEDLINE | ID: mdl-38911323

ABSTRACT

Pulmonary hypertension (PH) in newborns and infants is a complex condition associated with several pulmonary, cardiac, and systemic diseases contributing to morbidity and mortality. Thus, accurate and early detection of PH and the classification of its severity is crucial for appropriate and successful management. Using echocardiography, the primary diagnostic tool in pediatrics, human assessment is both time-consuming and expertise-demanding, raising the need for an automated approach. Little effort has been directed towards automatic assessment of PH using echocardiography, and the few proposed methods only focus on binary PH classification on the adult population. In this work, we present an explainable multi-view video-based deep learning approach to predict and classify the severity of PH for a cohort of 270 newborns using echocardiograms. We use spatio-temporal convolutional architectures for the prediction of PH from each view, and aggregate the predictions of the different views using majority voting. Our results show a mean F1-score of 0.84 for severity prediction and 0.92 for binary detection using 10-fold cross-validation and 0.63 for severity prediction and 0.78 for binary detection on the held-out test set. We complement our predictions with saliency maps and show that the learned model focuses on clinically relevant cardiac structures, motivating its usage in clinical practice. To the best of our knowledge, this is the first work for an automated assessment of PH in newborns using echocardiograms.
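The aggregation step mentioned above, combining per-view predictions by majority voting, can be sketched as follows. The severity labels, the number of views, and the tie-breaking rule (first label in input order) are assumptions; the abstract does not specify them.

```python
from collections import Counter


def majority_vote(view_predictions):
    """Return the most frequent label among the per-view predictions.

    Assumed tie-break: the tied label that appears first in the input.
    """
    counts = Counter(view_predictions)
    top = max(counts.values())
    for label in view_predictions:
        if counts[label] == top:
            return label


# Hypothetical per-view outputs for one patient.
print(majority_vote(["mild", "severe", "mild", "none", "mild"]))
```

A deterministic tie-break keeps the result reproducible when an even number of views disagree; other rules, such as preferring the more severe class, would be equally plausible.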

7.
Acta Radiol ; 65(9): 1065-1079, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39043232

ABSTRACT

Radiographic measurements play a crucial role in evaluating the alignment of distal radius fractures (DRFs). Various manual methods have been used to perform the measurements, but they are susceptible to inaccuracies. Recently, computer-aided methods have become available. This review explores the methods commonly used to assess DRFs. The review introduces the different measurement techniques, discusses the sources of measurement errors and measurement reliability, and provides a recommendation for their use. Radiographic measurements used in the evaluation of DRFs are not reliable. Standardizing the measurement techniques is crucial to address this and automated image analysis could help improve accuracy and reliability.


Subjects
Radius Fractures , Humans , Radius Fractures/diagnostic imaging , Reproducibility of Results , Radiography/methods , Radiography/standards , Radius/diagnostic imaging , Wrist Fractures
8.
J Obstet Gynaecol Can ; 46(3): 102277, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37951574

ABSTRACT

The transformative power of artificial intelligence (AI) is reshaping diverse domains of medicine. Recent progress, catalyzed by computing advancements, has seen commensurate adoption of AI technologies within obstetrics and gynaecology. We explore the use and potential of AI in three focus areas: predictive modelling for pregnancy complications, deep learning-based image interpretation for precise diagnoses, and large language models enabling intelligent health care assistants. We also provide recommendations for the ethical implementation and governance of AI and promote research into AI explainability, which are crucial for responsible AI integration and deployment. AI promises a revolutionary era of personalized health care in obstetrics and gynaecology.


Subjects
Gynecology , Obstetrics , Female , Pregnancy , Humans , Artificial Intelligence , Allied Health Personnel , Health Facilities
9.
J Neuroeng Rehabil ; 21(1): 129, 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39085937

ABSTRACT

BACKGROUND: Positional preferences, asymmetry of body position and movements potentially indicate abnormal clinical conditions in infants. However, a lack of standardized nomenclature hinders accurate assessment and documentation of these preferences over time. Video tools offer a safe and reproducible method to analyze and describe infant movement patterns, aiding in physiotherapy management and goal planning. The study aimed to develop an objective classification system for infant movement patterns with particular emphasis on the specific distribution of muscle tension, using methods of computer analysis of video recordings to enhance accuracy and reproducibility in assessments. METHODS: The study involved the recording of videos of 51 infants between 6 and 15 weeks of age, born at term, with an Apgar score of at least 8 points. Based on observations of a recording of infant spontaneous movements in the supine position, experts identified postural-motor patterns: symmetry and typical asymmetry linked to the asymmetrical tonic neck reflex. Deviations from the typical postural-motor system were indicated, and subcategories of atypical patterns were distinguished. A computer-based inference system was developed to automatically classify individual patterns. RESULTS: The following division of motor patterns was used: (1) normal patterns, including (a) typical (symmetrical, asymmetrical: variants 1 and 2); and (b) atypical (variants: 1 to 4), (2) positional preference, and (3) abnormal patterns. The proposed automatic classification method achieved an expert decision mapping accuracy of 84%. For atypical patterns, the high reproducibility of the system's results was confirmed. Lower reproducibility, not exceeding 70%, was achieved with typical patterns. CONCLUSIONS: Based on the observation of infant spontaneous movements, it is possible to identify movement patterns divided into typical and atypical patterns. 
Computer-based analysis of infant movement patterns makes it possible to objectify and satisfactorily reproduce diagnostic decisions.


Subjects
Movement , Video Recording , Humans , Infant , Movement/physiology , Video Recording/methods , Female , Male , Reproducibility of Results , Posture/physiology
10.
Eur Arch Otorhinolaryngol ; 281(4): 1835-1841, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38189967

ABSTRACT

PURPOSE: This study aimed to evaluate the utility of large language model (LLM) artificial intelligence tools, Chat Generative Pre-Trained Transformer (ChatGPT) versions 3.5 and 4, in managing complex otolaryngological clinical scenarios, specifically for the multidisciplinary management of odontogenic sinusitis (ODS). METHODS: A prospective, structured multidisciplinary specialist evaluation was conducted using five ad hoc designed ODS-related clinical scenarios. LLM responses to these scenarios were critically reviewed by a multidisciplinary panel of eight specialist evaluators (2 ODS experts, 2 rhinologists, 2 general otolaryngologists, and 2 maxillofacial surgeons). Based on the level of disagreement from panel members, a Total Disagreement Score (TDS) was calculated for each LLM response, and TDS comparisons were made between ChatGPT3.5 and ChatGPT4, as well as between different evaluators. RESULTS: While disagreement to some degree was demonstrated in 73/80 evaluator reviews of LLMs' responses, TDSs were significantly lower for ChatGPT4 compared to ChatGPT3.5. Highest TDSs were found in the case of complicated ODS with orbital abscess, presumably due to increased case complexity with dental, rhinologic, and orbital factors affecting diagnostic and therapeutic options. There were no statistically significant differences in TDSs between evaluators' specialties, though ODS experts and maxillofacial surgeons tended to assign higher TDSs. CONCLUSIONS: LLMs like ChatGPT, especially newer versions, showed potential for complementing evidence-based clinical decision-making, but substantial disagreement was still demonstrated between LLMs and clinical specialists across most case examples, suggesting they are not yet optimal in aiding clinical management decisions. Future studies will be important to analyze LLMs' performance as they evolve over time.


Subjects
Artificial Intelligence , Sinusitis , Humans , Prospective Studies , Reproducibility of Results , Language
11.
Eur Arch Otorhinolaryngol ; 281(9): 5001-5006, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38795148

ABSTRACT

PURPOSE: This study evaluates the efficacy of two advanced Large Language Models (LLMs), OpenAI's ChatGPT 4 and Google's Gemini Advanced, in providing treatment recommendations for head and neck oncology cases. The aim is to assess their utility in supporting multidisciplinary oncological evaluations and decision-making processes. METHODS: This comparative analysis examined the responses of ChatGPT 4 and Gemini Advanced to five hypothetical cases of head and neck cancer, each representing a different anatomical subsite. The responses were evaluated against the latest National Comprehensive Cancer Network (NCCN) guidelines by two blinded panels using the total disagreement score (TDS) and the artificial intelligence performance instrument (AIPI). Statistical assessments were performed using the Wilcoxon signed-rank test and the Friedman test. RESULTS: Both LLMs produced relevant treatment recommendations with ChatGPT 4 generally outperforming Gemini Advanced regarding adherence to guidelines and comprehensive treatment planning. ChatGPT 4 showed higher AIPI scores (median 3 [2-4]) compared to Gemini Advanced (median 2 [2-3]), indicating better overall performance. Notably, inconsistencies were observed in the management of induction chemotherapy and surgical decisions, such as neck dissection. CONCLUSIONS: While both LLMs demonstrated the potential to aid in the multidisciplinary management of head and neck oncology, discrepancies in certain critical areas highlight the need for further refinement. The study supports the growing role of AI in enhancing clinical decision-making but also emphasizes the necessity for continuous updates and validation against current clinical standards to integrate AI into healthcare practices fully.


Subjects
Head and Neck Neoplasms , Humans , Head and Neck Neoplasms/therapy , Head and Neck Neoplasms/pathology , Reproducibility of Results , Clinical Decision-Making , Language , Artificial Intelligence
12.
BMC Med ; 21(1): 405, 2023 Oct 26.
Article in English | MEDLINE | ID: mdl-37880716

ABSTRACT

BACKGROUND: Most superficial soft-tissue masses are benign tumors, and very few are malignant tumors. However, persistent growth of both benign and malignant tumors can be painful and even life-threatening. It is necessary to improve the differential diagnosis performance for superficial soft-tissue masses by using deep learning models. This study aimed to propose a new ultrasonic deep learning model (DLM) system for the differential diagnosis of superficial soft-tissue masses. METHODS: Between January 2015 and December 2022, data for 1615 patients with superficial soft-tissue masses were retrospectively collected. Two experienced radiologists (radiologists 1 and 2 with 8 and 30 years' experience, respectively) analyzed the ultrasound images of each superficial soft-tissue mass and made a diagnosis of malignant mass or one of the five most common benign masses. After referring to the DLM results, they re-evaluated the diagnoses. The diagnostic performance and concerns of the radiologists were analyzed before and after referring to the DLM results. RESULTS: In the validation cohort, DLM-1 was trained to distinguish between benign and malignant masses, with an AUC of 0.992 (95% CI: 0.980, 1.0) and an ACC of 0.987 (95% CI: 0.968, 1.0). DLM-2 was trained to classify the five most common benign masses (lipomyoma, hemangioma, neurinoma, epidermal cyst, and calcifying epithelioma) with AUCs of 0.986, 0.993, 0.944, 0.973, and 0.903, respectively. In addition, under the condition of the DLM-assisted diagnosis, the radiologists greatly improved their accuracy of differential diagnosis between benign and malignant tumors. CONCLUSIONS: The proposed DLM system has high clinical application value in the differential diagnosis of superficial soft-tissue masses.


Subjects
Deep Learning , Soft Tissue Neoplasms , Humans , Retrospective Studies , Diagnosis, Differential , Soft Tissue Neoplasms/diagnostic imaging , Soft Tissue Neoplasms/pathology , Ultrasonography , Sensitivity and Specificity
13.
Eur Radiol ; 2023 Nov 08.
Article in English | MEDLINE | ID: mdl-37938383

ABSTRACT

OBJECTIVES: To evaluate the improvement of mammography interpretation for novice and experienced radiologists assisted by two commercial AI software. METHODS: We compared the performance of two AI software (AI-1 and AI-2) in two experienced and two novice readers for 200 mammographic examinations (80 cancer cases). Two reading sessions were conducted within 4 weeks. The readers rated the likelihood of malignancy (range, 1-7) and the percentage probability of malignancy (range, 0-100%), with and without AI assistance. Differences in AUROC, sensitivity, and specificity were analyzed. RESULTS: Mean AUROC increased in both novice (0.86 to 0.90 with AI-1 [p = 0.005]; 0.91 with AI-2 [p < 0.001]) and experienced readers (0.87 to 0.92 with AI-1 [p < 0.001]; 0.90 with AI-2 [p = 0.004]). Sensitivities increased from 81.3 to 88.8% with AI-1 (p = 0.027) and to 91.3% with AI-2 (p = 0.005) in novice readers, and from 81.9 to 90.6% with AI-1 (p = 0.001) and to 87.5% with AI-2 (p = 0.016) in experienced readers. Specificity did not decrease significantly in both novice (p > 0.999, both) and experienced readers (p > 0.999 with AI-1 and 0.282 with AI-2). There was no significant difference in the performance change depending on the type of AI software (p > 0.999). CONCLUSION: Commercial AI software improved the diagnostic performance of both novice and experienced readers. The type of AI software used did not significantly impact performance changes. Further validation with a larger number of cases and readers is needed. CLINICAL RELEVANCE STATEMENT: Commercial AI software effectively aided mammography interpretation irrespective of the experience level of human readers. KEY POINTS: • Mammography interpretation remains challenging and is subject to a wide range of interobserver variability. • In this multi-reader study, two commercial AI software improved the sensitivity of mammography interpretation by both novice and experienced readers. 
• The type of AI software used did not significantly impact performance changes. • Commercial AI software may effectively support mammography interpretation irrespective of the experience level of human readers.

14.
Eur Radiol ; 33(1): 348-359, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35751697

ABSTRACT

OBJECTIVES: To compare the performance of radiologists in characterizing and diagnosing pulmonary nodules/masses with and without deep learning (DL)-based computer-aided diagnosis (CAD). METHODS: We studied a total of 101 nodules/masses detected on CT performed between January and March 2018 at Osaka University Hospital (malignancy: 55 cases). SYNAPSE SAI Viewer V1.4 was used to analyze the nodules/masses. In total, 15 independent radiologists were grouped (n = 5 each) according to their experience: L (< 3 years), M (3-5 years), and H (> 5 years). The likelihoods of 15 characteristics, such as cavitation and calcification, and the diagnosis (malignancy) were evaluated by each radiologist with and without CAD, and the assessment time was recorded. The AUCs compared with the reference standard set by two board-certified chest radiologists were analyzed following the multi-reader multi-case method. Furthermore, interobserver agreement was compared using intraclass correlation coefficients (ICCs). RESULTS: The AUCs for ill-defined boundary, irregular margin, irregular shape, calcification, pleural contact, and malignancy in all 15 radiologists, irregular margin and irregular shape in L and ill-defined boundary and irregular margin in M improved significantly (p < 0.05); no significant improvements were found in H. L showed the greatest increase in the AUC for malignancy (not significant). The ICCs improved in all groups and for nearly all items. The median assessment time was not prolonged by CAD. CONCLUSIONS: DL-based CAD helps radiologists, particularly those with < 5 years of experience, to accurately characterize and diagnose pulmonary nodules/masses, and improves the reproducibility of findings among radiologists. KEY POINTS: • Deep learning-based computer-aided diagnosis improves the accuracy of characterizing nodules/masses and diagnosing malignancy, particularly by radiologists with < 5 years of experience. 
• Computer-aided diagnosis increases not only the accuracy but also the reproducibility of the findings across radiologists.


Subjects
Deep Learning , Lung Neoplasms , Multiple Pulmonary Nodules , Solitary Pulmonary Nodule , Humans , Observer Variation , Reproducibility of Results , Multiple Pulmonary Nodules/diagnostic imaging , Radiologists , Diagnosis, Computer-Assisted/methods , Computers , Lung Neoplasms/diagnostic imaging , Sensitivity and Specificity , Solitary Pulmonary Nodule/diagnostic imaging
15.
Eur Radiol ; 33(4): 2665-2675, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36396792

ABSTRACT

OBJECTIVES: To develop a U-Net-based deep learning model for automated segmentation of craniopharyngioma. METHODS: A total of 264 patients diagnosed with craniopharyngiomas were included in this research. Pre-treatment MRIs were collected, annotated, and used as ground truth to train and evaluate the deep learning model. Thirty-eight patients from another institution were used for independent external testing. The proposed segmentation model was constructed based on a U-Net architecture. Dice similarity coefficients (DSCs), 95th-percentile Hausdorff distance (95HD), Jaccard value, true positive rate (TPR), and false positive rate (FPR) of each case were calculated. One-way ANOVA was used to investigate whether the model performance was associated with the radiological characteristics of tumors. RESULTS: The proposed model showed a good performance in segmentation with average DSCs of 0.840, Jaccard of 0.734, TPR of 0.820, FPR of 0.000, and 95HD of 3.669 mm. It performed feasibly in the independent external test set, with average DSCs of 0.816, Jaccard of 0.704, TPR of 0.765, FPR of 0.000, and 95HD of 4.201 mm. Also, one-way ANOVA suggested the performance was not statistically associated with radiological characteristics, including predominant composition (p = 0.370), lobulated shape (p = 0.353), compressed or enclosed ICA (p = 0.809), and cavernous sinus invasion (p = 0.283). CONCLUSIONS: The proposed deep learning model shows promising results for the automated segmentation of craniopharyngioma. KEY POINTS: • The segmentation model based on U-Net showed good performance in segmentation of craniopharyngioma. • The proposed model showed good performance regardless of the radiological characteristics of craniopharyngioma. • The model achieved feasibility in the independent external dataset obtained from another center.
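The overlap metrics listed above (DSC, Jaccard, TPR, FPR) all follow from the voxel-wise confusion counts between a predicted and a reference mask. The sketch below uses flattened binary masks as plain Python lists; the function name and the toy masks are hypothetical, and the 95HD is omitted because it needs spatial coordinates.

```python
def overlap_metrics(pred, truth):
    """Voxel-wise overlap metrics for two flattened binary masks (0/1)."""
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "jaccard": tp / (tp + fp + fn),
        "tpr": tp / (tp + fn),
        "fpr": fp / (fp + tn),
    }


# Hypothetical 8-voxel masks.
pred = [1, 1, 1, 0, 0, 0, 0, 0]
truth = [1, 1, 0, 1, 0, 0, 0, 0]
print(overlap_metrics(pred, truth))
```

In real volumes the background vastly outnumbers tumor voxels, which is why a voxel-wise FPR can round to 0.000 even when the Dice score shows clear disagreement. For a single case, Jaccard equals Dice / (2 - Dice); averages over cases need not satisfy that identity.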


Subjects
Craniopharyngioma , Deep Learning , Pituitary Neoplasms , Humans , Craniopharyngioma/diagnostic imaging , Neural Networks, Computer , Magnetic Resonance Imaging/methods , Pituitary Neoplasms/diagnostic imaging , Image Processing, Computer-Assisted/methods
16.
Eur Radiol ; 33(5): 3532-3543, 2023 May.
Article in English | MEDLINE | ID: mdl-36725720

ABSTRACT

OBJECTIVES: Time-of-flight magnetic resonance angiography (TOF-MRA) is the primary non-invasive screening method for cerebral aneurysms. We aimed to develop a computer-aided aneurysm detection method to improve diagnostic efficiency and accuracy, and especially to decrease the false-positive rate. METHODS: This is a retrospective multicenter study. The dataset contained 1160 TOF-MRA examinations composed of unruptured aneurysms (n = 1096) and normal controls (n = 166) from six hospitals. A total of 1037 examinations acquired from 2013 to 2019 were used as the training set; 123 examinations acquired from 2020 to 2021 were used as the external test set. We proposed an equalized augmentation strategy based on aneurysm location and constructed a detection model based on a dual-channel SE-3D UNet. The model was trained with 5-fold cross-validation on the training set, then tested on the external test set. RESULTS: The proposed method achieved 82.46% patient-level sensitivity, 73.85% lesion-level sensitivity, and 0.88 false positives per case in the external test set. Performance did not differ significantly across subgroups defined by aneurysm site (except ACA), aneurysm size (except smaller than 3 mm), or MRI scanner. The method outperformed the basic SE-3D UNet, increasing patient-level sensitivity by 15.79% and decreasing false positives (FPs) by 4.19 per case. CONCLUSIONS: The proposed automated aneurysm detection method achieved acceptable sensitivity while keeping false positives per case fairly low. It might provide a useful auxiliary tool for MRA screening of cerebral aneurysms. KEY POINTS: • The need for automated cerebral aneurysm detection is growing. • The strategy of equalized augmentation based on aneurysm location and dual-channel input could improve the model performance. 
• The retrospective multicenter study showed that the proposed automated cerebral aneurysm detection using a dual-channel SE-3D UNet could achieve acceptable sensitivity while keeping the false-positive rate low.
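The three detection figures reported above (patient-level sensitivity, lesion-level sensitivity, false positives per case) can be sketched as follows. This is only an illustration under stated assumptions: a patient counts as detected when at least one true aneurysm is found, and the dictionary keys `n_lesions`, `n_detected`, and `n_false_positives` are hypothetical names. The abstract does not state the exact definitions the authors used.

```python
def detection_summary(cases):
    """Summarize detection results over a list of per-examination records.

    Each record is a dict with (hypothetical) keys:
      n_lesions          true aneurysms in the examination
      n_detected         true aneurysms the detector found (<= n_lesions)
      n_false_positives  detections that match no true aneurysm
    """
    if not cases:
        raise ValueError("no cases given")
    positive_cases = [c for c in cases if c["n_lesions"] > 0]
    if not positive_cases:
        raise ValueError("sensitivity is undefined without any positive case")
    # Assumption: a patient is a hit if at least one true lesion was detected.
    patients_hit = sum(1 for c in positive_cases if c["n_detected"] > 0)
    total_lesions = sum(c["n_lesions"] for c in positive_cases)
    lesions_hit = sum(c["n_detected"] for c in positive_cases)
    # False positives are averaged over all examinations, including controls.
    total_fp = sum(c["n_false_positives"] for c in cases)
    return {
        "patient_sensitivity": patients_hit / len(positive_cases),
        "lesion_sensitivity": lesions_hit / total_lesions,
        "fp_per_case": total_fp / len(cases),
    }


# Invented toy data: three aneurysm examinations and one normal control.
cases = [
    {"n_lesions": 2, "n_detected": 1, "n_false_positives": 1},
    {"n_lesions": 1, "n_detected": 1, "n_false_positives": 0},
    {"n_lesions": 1, "n_detected": 0, "n_false_positives": 2},
    {"n_lesions": 0, "n_detected": 0, "n_false_positives": 1},
]
summary = detection_summary(cases)
print(summary)
```

Patient-level sensitivity is typically at least as high as lesion-level sensitivity under this definition, because one detected aneurysm is enough to flag a patient with several.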


Subjects
Intracranial Aneurysm , Humans , Intracranial Aneurysm/pathology , Imaging, Three-Dimensional/methods , Sensitivity and Specificity , Magnetic Resonance Imaging , Magnetic Resonance Angiography/methods , Cerebral Angiography/methods , Angiography, Digital Subtraction
17.
Eur Radiol ; 33(6): 4052-4062, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36472694

ABSTRACT

OBJECTIVES: Preventing the expansion of perihematomal edema (PHE) represents a novel strategy for the improvement of neurological outcomes in intracerebral hemorrhage (ICH) patients. Our goal was to predict early and delayed PHE expansion using a machine learning approach. METHODS: We enrolled 550 patients with spontaneous ICH to study early PHE expansion, and 389 patients to study delayed expansion. Two imaging researchers rated the shape and density of hematoma in non-contrast computed tomography (NCCT). We trained a radiological machine learning (ML) model, a radiomics ML model, and a combined ML model, using data from radiomics, traditional imaging, and clinical indicators. We then validated these models on an independent dataset by using a nested 4-fold cross-validation approach. We compared models with respect to their predictive performance, which was assessed using the receiver operating characteristic curve. RESULTS: For both early and delayed PHE expansion, the combined ML model was most predictive (early/delayed AUC values were 0.840/0.705), followed by the radiomics ML model (0.799/0.663), the radiological ML model (0.779/0.631), and the imaging readers (reader 1: 0.668/0.565, reader 2: 0.700/0.617). CONCLUSION: We validated a machine learning approach with high interpretability for the prediction of early and delayed PHE expansion. This new technique may assist clinical practice for the management of neurocritical patients with ICH. KEY POINTS: • This is the first study to use artificial intelligence technology for the prediction of perihematomal edema expansion. • A combined machine learning model, trained on data from radiomics, clinical indicators, and imaging features associated with hematoma expansion, outperformed all other methods.
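The AUC values used above to rank the models can be computed without plotting a ROC curve, via the rank (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. A minimal sketch, with invented scores and a hypothetical function name `roc_auc`:

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Invented example: model outputs for five patients, label 1 = expansion occurred.
scores = [0.90, 0.80, 0.40, 0.35, 0.10]
labels = [1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for cohorts of a few hundred patients; a sort-based rank computation would be the usual choice for larger data.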


Subjects
Artificial Intelligence , Brain Edema , Humans , Brain Edema/diagnostic imaging , Brain Edema/etiology , Cerebral Hemorrhage/complications , Cerebral Hemorrhage/diagnostic imaging , Edema/diagnostic imaging , Edema/complications , Machine Learning , Hematoma/complications , Hematoma/diagnostic imaging
18.
Eur Radiol ; 33(8): 5568-5577, 2023 Aug.
Article in English | MEDLINE | ID: mdl-36894752

ABSTRACT

OBJECTIVES: To evaluate and compare the measurement accuracy of two different computer-aided diagnosis (CAD) systems regarding artificial pulmonary nodules and assess the clinical impact of volumetric inaccuracies in a phantom study. METHODS: In this phantom study, 59 different phantom arrangements with 326 artificial nodules (178 solid, 148 ground-glass) were scanned at 80 kV, 100 kV, and 120 kV. Four different nodule diameters were used: 5 mm, 8 mm, 10 mm, and 12 mm. Scans were analyzed by a deep-learning (DL)-based CAD and a standard CAD system. Relative volumetric errors (RVE) of each system vs. ground truth and the relative volume difference (RVD) DL-based vs. standard CAD were calculated. The Bland-Altman method was used to define the limits of agreement (LOA). The hypothetical impact on LungRADS classification was assessed for both systems. RESULTS: There was no difference between the three voltage groups regarding nodule volumetry. Regarding the solid nodules, the RVE of the 5-mm-, 8-mm-, 10-mm-, and 12-mm-size groups for the DL CAD/standard CAD were 12.2%/2.8%, 1.3%/-2.8%, -3.6%/1.5%, and -12.2%/-0.3%, respectively. The corresponding values for the ground-glass nodules (GGN) were 25.6%/81.0%, 9.0%/28.0%, 7.6%/20.6%, and 6.8%/21.2%. The mean RVD for solid nodules/GGN was 1.3%/-15.2%. Regarding the LungRADS classification, 88.5% and 79.8% of all solid nodules were correctly assigned by the DL CAD and the standard CAD, respectively. 14.9% of the nodules were assigned differently between the systems. CONCLUSIONS: Patient management may be affected by the volumetric inaccuracy of the CAD systems and hence demands supervision and/or manual correction by a radiologist. KEY POINTS: • The DL-based CAD system was more accurate in the volumetry of GGN and less accurate regarding solid nodules than the standard CAD system. • Nodule size and attenuation have an effect on the measurement accuracy of both systems; tube voltage has no effect on measurement accuracy. 
• Measurement inaccuracies of CAD systems can have an impact on patient management, which demands supervision by radiologists.
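The relative volumetric error and the Bland-Altman limits of agreement mentioned in this abstract can be sketched as below. Several things here are assumptions rather than details from the study: the phantom nodules are treated as ideal spheres to obtain a reference volume, the limits use the conventional mean ± 1.96 sample standard deviations, and all measured volumes are invented. The abstract does not define RVD precisely, so it is not reproduced here.

```python
import math
import statistics


def sphere_volume_mm3(diameter_mm):
    """Volume of an ideal sphere (assumed nodule shape) from its diameter."""
    return math.pi * diameter_mm ** 3 / 6.0


def relative_volume_error(measured, reference):
    """Signed error of a measured volume against the reference, in percent."""
    return 100.0 * (measured - reference) / reference


def limits_of_agreement(differences):
    """Bland-Altman limits: mean difference +/- 1.96 sample standard deviations."""
    mean_diff = statistics.mean(differences)
    sd_diff = statistics.stdev(differences)
    return mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Invented measurements (mm^3) of 10-mm nodules by a hypothetical CAD system.
reference = sphere_volume_mm3(10.0)
measured_volumes = [505.0, 512.0, 498.0, 530.0]
errors = [relative_volume_error(v, reference) for v in measured_volumes]
low, high = limits_of_agreement(errors)
print(f"reference = {reference:.1f} mm^3")
print(f"mean RVE = {statistics.mean(errors):.2f}%, LOA = [{low:.2f}%, {high:.2f}%]")
```

A negative RVE means the system underestimates the nodule volume, which matters clinically because volume thresholds drive the LungRADS category.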


Subjects
Lung Neoplasms , Multiple Pulmonary Nodules , Solitary Pulmonary Nodule , Humans , Tomography, X-Ray Computed/methods , Diagnosis, Computer-Assisted/methods , Multiple Pulmonary Nodules/diagnostic imaging , Phantoms, Imaging , Radiologists , Lung Neoplasms/diagnostic imaging , Solitary Pulmonary Nodule/diagnostic imaging , Solitary Pulmonary Nodule/therapy , Radiographic Image Interpretation, Computer-Assisted/methods , Sensitivity and Specificity
19.
Eur Radiol ; 33(7): 4822-4832, 2023 Jul.
Article in English | MEDLINE | ID: mdl-36856842

ABSTRACT

OBJECTIVES: Diagnosis of flatfoot using a radiograph is subject to intra- and inter-observer variabilities. Here, we developed a cascade convolutional neural network (CNN)-based deep learning model (DLM) for automated angle measurement for flatfoot diagnosis using landmark detection. METHODS: We used 1200 weight-bearing lateral foot radiographs from young adult Korean males for the model development. An experienced orthopedic surgeon identified 22 radiographic landmarks and measured three angles for flatfoot diagnosis that served as the ground truth (GT). Another orthopedic surgeon (OS) and a general physician (GP) independently identified the landmarks of the test dataset and measured the angles using the same method. External validation was performed using 100 and 17 radiographs acquired from a tertiary referral center and a public database, respectively. RESULTS: The DLM showed smaller absolute average errors from the GT for the three angle measurements for flatfoot diagnosis compared with both human observers. Under the guidance of the DLM, the average errors of observers OS and GP decreased from 2.35° ± 3.01° to 1.55° ± 2.09° and from 1.99° ± 2.76° to 1.56° ± 2.19°, respectively (both p < 0.001). The total measurement time decreased from 195 to 135 min in observer OS and from 205 to 155 min in observer GP. The absolute average errors of the DLM in the external validation sets were similar or superior to those of human observers in the original test dataset. CONCLUSIONS: Our CNN model had significantly better accuracy and reliability than human observers in diagnosing flatfoot, and notably improved the accuracy and reliability of human observers. KEY POINTS: • Development of a deep learning model (DLM) that allows automated angle measurements through landmark detection, based on 1200 weight-bearing lateral radiographs for diagnosing flatfoot. • Our DLM showed smaller absolute average errors for flatfoot diagnosis compared with two human observers. 
• Under the guidance of the model, the average errors of two human observers decreased and total measurement time also decreased from 195 to 135 min and from 205 to 155 min.
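Once landmarks have been located, an angle measurement reduces to plane geometry. The sketch below shows one generic way to obtain the angle between two lines, each defined by a pair of landmark coordinates. It is not the authors' method: the abstract does not name the three angles or say how they were computed, and the function name and pixel coordinates are hypothetical.

```python
import math


def angle_between_lines(a1, a2, b1, b2):
    """Angle in degrees (0 to 180) between direction a1->a2 and direction b1->b2.

    Each argument is an (x, y) landmark coordinate, for example in pixels.
    """
    ux, uy = a2[0] - a1[0], a2[1] - a1[1]
    vx, vy = b2[0] - b1[0], b2[1] - b1[1]
    if (ux == 0 and uy == 0) or (vx == 0 and vy == 0):
        raise ValueError("each line needs two distinct landmarks")
    cross = ux * vy - uy * vx   # proportional to the sine of the angle
    dot = ux * vx + uy * vy     # proportional to the cosine of the angle
    # atan2 stays numerically stable for nearly parallel lines, unlike acos.
    return math.degrees(math.atan2(abs(cross), dot))


# Invented landmark coordinates: a horizontal axis and a line rising at 45 degrees.
angle = angle_between_lines((0.0, 0.0), (100.0, 0.0), (20.0, 10.0), (70.0, 60.0))
print(f"angle = {angle:.2f} degrees")
```

Because the angle depends only on direction vectors, a small error in a single landmark shifts the result more when the two landmarks defining a line lie close together.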


Subjects
Flatfoot , Male , Young Adult , Humans , Flatfoot/diagnostic imaging , Flatfoot/surgery , Reproducibility of Results , Radiography , Neural Networks, Computer , Weight-Bearing
20.
Skin Res Technol ; 29(2): e13270, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36823506

ABSTRACT

BACKGROUND: Hyperspectral imaging (HSI) is an emerging modality for the gross pathology of the skin. Spectral signatures of HSI could discriminate malignant from benign tissue. Because of inherent redundancies in HSI and in order to facilitate the use of deep-learning models, dimension reduction is a common preprocessing step. The effects of dimension reduction choice, training scope, and number of retained dimensions have not been evaluated on skin HSI for segmentation tasks. MATERIALS AND METHODS: An in-house dataset of HSI signatures from pigmented skin lesions was prepared and labeled with histology. Eleven different dimension reduction methods were used as preprocessing for tumor margin detection with support vector machines. Cluster-wise principal component analysis (ClusterPCA), a new variant of PCA, was proposed. The scope of application for dimension reduction was also investigated. RESULTS: The components produced by ClusterPCA show good agreement with the expected optical properties of skin chromophores. Random forest importance performed best during classification. However, all methods suffered from low sensitivity and generalization. CONCLUSION: Investigation of more complex reduction and segmentation schemes with emphasis on the nature of HSI and optical properties of the skin is necessary. Insights on dimension reduction for skin tissue could facilitate the development of HSI-based systems for cancer margin detection at gross level.


Subjects
Random Forest , Support Vector Machine , Humans , Principal Component Analysis