ABSTRACT
The aim was to evaluate the accuracy of a prototypical artificial intelligence-based algorithm for automated segmentation and diameter measurement of the thoracic aorta (TA) using CT. One hundred twenty-two patients who underwent dual-source CT were retrospectively included. Ninety-three of these patients had been administered intravenous iodinated contrast. Images were evaluated using the prototypical algorithm, which segments the TA and determines the corresponding diameters at predefined anatomical locations based on the American Heart Association guidelines. The reference standard was established by two radiologists individually in a blinded, randomized fashion. Equivalency was tested and inter-reader agreement was assessed using intra-class correlation (ICC). In total, 99.2% of the parameters measured by the prototype were assessable. In nine patients, the prototype failed to determine one diameter along the vessel. Measurements along the TA did not differ between the algorithm and readers (p > 0.05), establishing equivalence. Agreement between the algorithm and the readers (ICC ≥ 0.961; 95% CI: 0.940−0.974) and between the readers (ICC ≥ 0.879; 95% CI: 0.818−0.92) was excellent. The evaluated prototypical AI-based algorithm accurately measured TA diameters at each region of interest, independent of contrast use or pathology. This indicates that the prototypical algorithm has substantial potential as a valuable tool in the rapid clinical evaluation of aortic pathology.
ABSTRACT
Purpose: Thoracic aortic (TA) dilatation (TAD) is a risk factor for acute aortic syndrome and must therefore be reported in every CT report. However, the complex anatomy of the thoracic aorta impedes TAD detection. We investigated the performance of a deep learning (DL) prototype, built to measure TA diameters, as a secondary reading tool in a large-scale cohort. Material and methods: Consecutive contrast-enhanced (CE) and non-CE chest CT exams with "normal" TA diameters according to their radiology reports were included. The DL-prototype (AIRad, Siemens Healthineers, Germany) measured the TA at nine locations according to AHA guidelines. Dilatation was defined as >45 mm at aortic sinus, sinotubular junction (STJ), ascending aorta (AA) and proximal arch and >40 mm from mid arch to abdominal aorta. A cardiovascular radiologist reviewed all cases with TAD according to AIRad. Multivariable logistic regression (MLR) was used to identify factors (demographics and scan parameters) associated with TAD classification by AIRad. Results: 18,243 CT scans (45.7% female) were successfully analyzed by AIRad. Mean age was 62.3 ± 15.9 years and 12,092 (66.3%) were CE scans. AIRad confirmed normal diameters in 17,239 exams (94.5%) and reported TAD in 1,004/18,243 exams (5.5%). Review confirmed TAD classification in 452/1,004 exams (45.0%, 2.5% total); the remaining 552 cases were false positives, which were easily identified using the visual outputs of AIRad. MLR revealed that the following factors were significantly associated with correct TAD classification by AIRad: TAD reported at AA [odds ratio (OR): 1.12, p < 0.001] and STJ (OR: 1.09, p = 0.002), TAD found at >1 location (OR: 1.42, p = 0.008), CE exams (OR: 2.1-3.1, p < 0.05), male sex (OR: 2.4, p = 0.003) and higher BMI (OR: 1.05, p = 0.01). Overall, 17,691/18,243 (97.0%) exams were correctly classified.
Conclusions: AIRad correctly assessed the presence or absence of TAD in 17,691 exams (97%), including 452 cases with previously missed TAD independent from contrast protocol. These findings suggest its usefulness as a secondary reading tool by improving report quality and efficiency.
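The dilatation rule and the reported percentages can be illustrated with a minimal Python sketch. It is not the AIRad implementation: the function names and location strings are hypothetical, any location not listed among the four proximal sites is assumed to fall under the 40 mm threshold, and counting all 17,239 exams rated normal as correct follows the tally in the abstract.

```python
# Thresholds in mm as stated in the abstract; names are illustrative only.
PROXIMAL_LOCATIONS = {
    "aortic sinus",
    "sinotubular junction",
    "ascending aorta",
    "proximal arch",
}
PROXIMAL_THRESHOLD_MM = 45.0  # aortic sinus through proximal arch
DISTAL_THRESHOLD_MM = 40.0    # mid arch through abdominal aorta (assumed default)


def is_dilated(location: str, diameter_mm: float) -> bool:
    """Return True if the diameter exceeds the threshold for this location."""
    if location in PROXIMAL_LOCATIONS:
        threshold = PROXIMAL_THRESHOLD_MM
    else:
        threshold = DISTAL_THRESHOLD_MM
    return diameter_mm > threshold


def exam_has_tad(diameters_mm: dict[str, float]) -> bool:
    """An exam is flagged if any measured location is dilated."""
    return any(is_dilated(loc, d) for loc, d in diameters_mm.items())


# Counts reported in the abstract.
total_exams = 18_243
flagged = 1_004
confirmed = 452

normal_by_tool = total_exams - flagged   # 17,239 exams rated normal
false_positive = flagged - confirmed     # 552 flagged exams not confirmed
correct = normal_by_tool + confirmed     # 17,691 correctly classified
accuracy_pct = round(100 * correct / total_exams, 1)
```

With these counts, `accuracy_pct` reproduces the reported 97.0%.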
ABSTRACT
PURPOSE: In the literature on automated phenotyping of chronic obstructive pulmonary disease (COPD), there is a multitude of isolated classical machine learning and deep learning techniques, mostly investigating individual phenotypes, with small study cohorts and heterogeneous meta-parameters, e.g., different scan protocols or segmented regions. The objective is to compare the impact of different experimental setups, i.e., varying meta-parameters related to image formation and data representation, with the impact of the learning technique for subtyping automation for a variety of phenotypes. The identified associations of these parameters with automation performance and their interactions might be a first step towards a determination of optimal meta-parameters, i.e., a meta-strategy. METHODS: A clinical cohort of 981 patients (53.8 ± 15.1 years, 554 male) was examined. The inspiratory CT images were analyzed to automate the diagnosis of 13 COPD phenotypes given by two radiologists. A benchmark feature set that integrates many quantitative criteria was extracted from the lung and used to train a variety of learning algorithms on the first 654 patients (two thirds); each trained algorithm was then retrospectively assessed on the remaining 327 patients (one third). The automation performance was evaluated by the area under the receiver operating characteristic curve (AUC). 1717 experiments were conducted with varying meta-parameters such as reconstruction kernel, segmented regions and input dimensionality, i.e., number of extracted features. The association of the meta-parameters with the automation performance was analyzed by a multivariable general linear model that decomposes the automation performance into the contributions of the meta-parameters and the learning technique. RESULTS: The automation performance varied strongly for varying meta-parameters. For emphysema-predominant phenotypes, an AUC of 93%-95% could be achieved for the best meta-configuration.
The airways-predominant phenotypes led to a lower performance of 65%-85%, while smooth kernel configurations on average were unexpectedly superior to those with sharp kernels. The performance impact of meta-parameters, even that of often neglected ones like the missing-data imputation, was in general larger than that of the learning technique. Advanced learning techniques like 3D deep learning or automated machine learning yielded inferior automation performance for non-optimal meta-configurations in comparison to simple techniques with suitable meta-configurations. The best automation performance was achieved by a combination of modern learning techniques and a suitable meta-configuration. CONCLUSIONS: Our results indicate that for COPD phenotype automation, study design parameters such as reconstruction kernel and the model input dimensionality should be adapted to the learning technique and may be more important than the technique itself. To achieve optimal automation and prediction results, the interaction between those meta-parameters and the learning technique should be considered. This might be particularly relevant for the development of specific scan protocols for novel learning algorithms, and towards an understanding of good study design for automated phenotyping.
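The evaluation metric named above, the AUC on a held-out third of the cohort, can be sketched in plain Python. This is a generic pairwise (Mann-Whitney) formulation, not the study's code; the function name and the toy labels and scores are illustrative assumptions and do not come from the study data.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count 0.5).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Cohort split as described in the abstract.
n_patients = 981
n_train = 654                    # first two thirds, used for training
n_test = n_patients - n_train    # remaining third, used for assessment

# Toy values for illustration only (not study data).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)  # 3 of 4 positive/negative pairs ordered correctly
```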
Subjects
Chronic Obstructive Pulmonary Disease , Pulmonary Emphysema , Automation , Humans , Male , Chronic Obstructive Pulmonary Disease/diagnostic imaging , Retrospective Studies , X-Ray Computed Tomography
ABSTRACT
BACKGROUND: Manually performed diameter measurements on ECG-gated CT-angiography (CTA) represent the gold standard for diagnosis of thoracic aortic dilatation. However, they are time-consuming and show high inter-reader variability. Therefore, we aimed to evaluate the accuracy of measurements of a deep learning (DL) algorithm in comparison to those of radiologists and evaluated measurement times (MT). METHODS: We retrospectively analyzed 405 ECG-gated CTA exams of 371 consecutive patients with suspected aortic dilatation between May 2010 and June 2019. The DL-algorithm prototype detected aortic landmarks (deep reinforcement learning) and segmented the lumen of the thoracic aorta (multi-layer convolutional neural network). It performed measurements according to AHA-guidelines and created visual outputs. Manual measurements were performed by radiologists using centerline technique. Human performance variability (HPV), MT and DL-performance were analyzed in a research setting using a linear mixed model based on 21 randomly selected, repeatedly measured cases. DL-algorithm results were then evaluated in a clinical setting using matched differences. If the differences were within 5 mm for all locations, the case was regarded as coherent; if there was a discrepancy >5 mm at least at one location (incl. missing values), the case was completely reviewed. RESULTS: HPV ranged up to ±3.4 mm in repeated measurements under research conditions. In the clinical setting, 2,778/3,192 (87.0%) of DL-algorithm's measurements were coherent. Mean differences of paired measurements between DL-algorithm and radiologists at aortic sinus and ascending aorta were -0.45±5.52 and -0.02±3.36 mm. Detailed analysis revealed that measurements at the aortic root were over-/underestimated due to a tilted measurement plane. In total, calculated time saved by DL-algorithm was 3:10 minutes/case.
CONCLUSIONS: The DL-algorithm provided results coherent with those of radiologists at almost 90% of measurement locations, while the majority of discrepant cases were located at the aortic root. In summary, the DL-algorithm assisted radiologists in performing AHA-compliant measurements by saving 50% of time per case.
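The coherence rule from the methods (all locations within 5 mm, with missing values counted as discrepancies) can be written as a short Python sketch. The function name, the dictionary layout and the use of `None` for a missing measurement are assumptions made for illustration, not details of the published workflow.

```python
from typing import Optional

TOLERANCE_MM = 5.0  # maximum accepted difference per location, from the abstract


def is_coherent(
    dl_mm: dict[str, Optional[float]],
    manual_mm: dict[str, float],
) -> bool:
    """Return True if every DL measurement is within 5 mm of the manual one.

    A missing DL value (absent key or None) makes the case non-coherent,
    which in the study triggered a complete review of the case.
    """
    for location, manual in manual_mm.items():
        dl = dl_mm.get(location)
        if dl is None:
            return False
        if abs(dl - manual) > TOLERANCE_MM:
            return False
    return True


# Share of coherent measurements reported in the abstract.
coherent_pct = round(100 * 2_778 / 3_192, 1)
```

A case for which `is_coherent` returns `False` would be one of those sent for complete review.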
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
PURPOSE: The purpose of this study was to validate the accuracy of an artificial intelligence (AI) prototype application in determining bone mineral density (BMD) from chest computed tomography (CT), as compared with dual-energy x-ray absorptiometry (DEXA). MATERIALS AND METHODS: In this Institutional Review Board-approved study, we analyzed the data of 65 patients (57 female, mean age: 67.4 y) who underwent both DEXA and chest CT (mean time between scans: 1.31 y). From the DEXA studies, T-scores for L1-L4 (lumbar vertebrae 1 to 4) were recorded. Patients were then divided on the basis of their T-scores into normal control, osteopenic, or osteoporotic groups. An AI algorithm based on wavelet features, AdaBoost, and local geometry constraints independently localized thoracic vertebrae from chest CT studies and automatically computed average Hounsfield Unit (HU) values with kVp-dependent spectral correction. The Pearson correlation coefficient was used to evaluate the correlation between the T-scores and HU values. The Mann-Whitney U test was used to compare the HU values of normal control versus osteoporotic patients. RESULTS: Overall, the DEXA-determined T-scores and AI-derived HU values showed a moderate correlation (r=0.55; P<0.001). This 65-patient population was divided into 3 subgroups on the basis of their T-scores. The mean T-scores for the 3 subgroups (normal control, osteopenic, osteoporotic) were 0.77±1.50, -1.51±0.04, and -3.26±0.59, respectively. The mean DEXA-determined L1-L4 BMD measures were 1.13±0.16, 0.88±0.06, and 0.68±0.06 g/cm², respectively. The mean AI-derived attenuation values were 145±42.5, 136±31.82, and 103±16.28 HU, respectively. Using these AI-derived HU values, a significant difference was found between the normal control patients and osteoporotic group (P=0.045). CONCLUSION: Our results show that this AI prototype can successfully determine BMD in moderate correlation with DEXA.
Combined with other AI algorithms directed at evaluating cardiac and lung diseases, this prototype may contribute to future comprehensive preventative care based on a single chest CT.
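The abstract does not state the T-score cutoffs used to form the three groups. The Python sketch below assumes the widely used WHO criteria (osteoporosis at T ≤ -2.5, osteopenia between -2.5 and -1.0); the function name is hypothetical, and the study may have used different boundaries.

```python
def classify_t_score(t_score: float) -> str:
    """Assign a DEXA T-score to a group using WHO cutoffs (an assumption here)."""
    if t_score <= -2.5:
        return "osteoporotic"
    if t_score < -1.0:
        return "osteopenic"
    return "normal control"


# Subgroup mean T-scores reported in the abstract, used only as sample inputs.
reported_means = {"normal control": 0.77, "osteopenic": -1.51, "osteoporotic": -3.26}
groups = {name: classify_t_score(t) for name, t in reported_means.items()}
```

Under these assumed cutoffs, each reported subgroup mean falls into the group it is named after.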
Subjects
Artificial Intelligence , Bone Density , Computer-Assisted Radiographic Image Interpretation/methods , Thoracic Radiography/methods , X-Ray Computed Tomography/methods , Photon Absorptiometry , Adult , Aged , Aged 80 and over , Female , Humans , Lumbar Vertebrae/diagnostic imaging , Male , Middle Aged , Reproducibility of Results , Retrospective Studies
ABSTRACT
The goal of radiomics is to convert medical images into a minable data space by extraction of quantitative imaging features for clinically relevant analyses, e.g. survival time prediction of a patient. One problem of radiomics from computed tomography is the impact of technical variation such as reconstruction kernel variation within a study. Additionally, what is often neglected is the impact of inter-patient technical variation, resulting from patient characteristics, even when scan and reconstruction parameters are constant. In our approach, measurements within 3D regions of interest (ROIs) are calibrated by further ROIs such as air, adipose tissue, liver, etc. that are used as control regions (CRs). Our goal is to derive general rules for an automated internal calibration that enhance prediction, based on the analysed features and a set of CRs. We define qualification criteria motivated by status-quo radiomics stability analysis techniques to collect only the information from the CRs that is relevant to a given task. These criteria are used in an optimisation to automatically derive a suitable internal calibration for prediction tasks based on the CRs. Our calibration enhanced the performance for centrilobular emphysema prediction in a COPD study and prediction of patients' one-year-survival in an oncological study.
Subjects
Biomarkers , Calibration , Computer-Assisted Image Processing/methods , Three-Dimensional Imaging/methods , X-Ray Computed Tomography/methods , Aged , Emphysema/mortality , Female , Humans , Male , Middle Aged , Predictive Value of Tests , Chronic Obstructive Pulmonary Disease/diagnostic imaging , Chronic Obstructive Pulmonary Disease/mortality , Survival Rate
ABSTRACT
RATIONALE AND OBJECTIVES: This study aimed to evaluate the potential role of computed tomography texture analysis (CTTA) of arterial and portal-venous enhancement phase image data for prediction and accurate assessment of response of hepatocellular carcinoma undergoing drug-eluting bead transarterial chemoembolization (TACE) by comparison to liver perfusion CT (PCT). MATERIALS AND METHODS: Twenty-eight patients (27 male; mean age 67.2 ± 10.4) with 56 hepatocellular carcinoma-typical liver lesions were included. Arterial and portal-venous phase CT data obtained before and after TACE with a mean time of 39.93 ± 62.21 days between examinations were analyzed. TACE was performed within 48 hours after first contrast-enhanced CT. CTTA software was a prototype. CTTA analysis was performed blinded (for results) by two observers separately. Combined results of modified Response Evaluation Criteria In Solid Tumors (mRECIST) and PCT of the liver were used as the standard of reference. Time to progression was additionally assessed for all patients. CTTA parameters included heterogeneity, intensity, average, deviation, skewness, and entropy of co-occurrence. Each parameter was compared to those of PCT (blood flow [BF], blood volume, arterial liver perfusion [ALP], portal-venous perfusion, and hepatic perfusion index) measured before and after TACE. RESULTS: mRECIST + PCT yielded 28.6% complete response (CR), 42.8% partial response, and 28.6% stable disease. Significant correlations were registered in the arterial phase in CR between changes in mean heterogeneity and BF (P = .004, r = -0.815), blood volume (P = .002, r = -0.851), and ALP (P = .002, r = -0.851), respectively. In the partial response group, changes in mean heterogeneity correlated with changes in ALP (P = .003) and to a lesser degree with hepatic perfusion index (P = .027) in the arterial phase. In the stable disease group, BF correlated with entropy of nonuniformity (P = .010). 
In the portal-venous phase, no statistically significant correlations were registered in any group. Receiver operating characteristic analysis of CTTA parameters yielded predictive cutoff values for CR in the arterial contrast-enhanced CT phase for uniformity of skewness (sensitivity: 90.0%; specificity: 45.8%), and in the portal-venous phase for uniformity of heterogeneity (sensitivity: 92.3%; specificity: 81.8%). CONCLUSIONS: Significant correlations exist between CTTA parameters and those derived from PCT both in the pre- and the post-TACE settings, and some of them have predictive value for TACE midterm outcome.