ABSTRACT
OBJECTIVES: To investigate the intra- and inter-rater reliability of the total radiomics quality score (RQS) and the reproducibility of individual RQS item scores in a large multireader study. METHODS: Nine raters with different backgrounds were randomly assigned to three groups based on their proficiency with RQS utilization: groups 1 and 2 represented the inter-rater reliability groups with or without prior training in RQS, respectively; group 3 represented the intra-rater reliability group. Thirty-three original research papers on radiomics were evaluated by raters of groups 1 and 2. Of the 33 papers, 17 were evaluated twice with an interval of 1 month by raters of group 3. The intraclass correlation coefficient (ICC) was used for continuous variables, and Fleiss' and Cohen's kappa (k) statistics were used for categorical variables. RESULTS: The inter-rater reliability was poor to moderate for total RQS (ICC 0.30-0.55, p < 0.001) and very low to good for item reproducibility (k - 0.12 to 0.75) within groups 1 and 2 for both inexperienced and experienced raters. The intra-rater reliability for total RQS was moderate for the less experienced rater (ICC 0.522, p = 0.009), whereas experienced raters showed excellent intra-rater reliability (ICC 0.91-0.99, p < 0.001) between the first and second read. Intra-rater reproducibility of RQS item scores was higher, and most of the items had moderate to good intra-rater reliability (k - 0.40 to 1). CONCLUSIONS: Reproducibility of the total RQS and the score of individual RQS items is low. There is a need for a robust and reproducible assessment method to assess the quality of radiomics research. CLINICAL RELEVANCE STATEMENT: There is a need for reproducible scoring systems to improve the quality of radiomics research and consequently close the translational gap between research and clinical implementation. KEY POINTS:
• Radiomics quality score has been widely used for the evaluation of radiomics studies.
• Although the intra-rater reliability was moderate to excellent, intra- and inter-rater reliability of total score and point-by-point scores were low with radiomics quality score.
• A robust, easy-to-use scoring system is needed for the evaluation of radiomics research.
Subjects
Radiomics, Reading, Humans, Observer Variation, Reproducibility of Results
ABSTRACT
PURPOSE: For patients with vestibular schwannomas (VS), a conservative observational approach is increasingly used. Accurate and reliable volumetric tumor monitoring is therefore important. Currently, a cutoff of a 20% increase in tumor volume is widely used to define tumor growth in VS. This study investigates how the limits of agreement (LoA) for volumetric measurements of VS depend on tumor volume, by means of an inter-observer study. METHODS: This retrospective study included 100 VS patients who underwent contrast-enhanced T1-weighted MRI. Five observers volumetrically annotated the images. Observer agreement and reliability were measured using the LoA, estimated using the limits of agreement with the mean (LOAM) method, and the intraclass correlation coefficient (ICC). RESULTS: The 100 patients had a median average tumor volume of 903 mm³ (IQR: 193-3101). Patients were divided into four volumetric size categories based on tumor volume quartile. The smallest tumor volume quartile showed a LOAM relative to the mean of 26.8% (95% CI: 23.7-33.6), whereas for the largest tumor volume quartile this figure was 7.3% (95% CI: 6.5-9.7), and 4.8% (95% CI: 4.2-6.2) when peritumoral cysts were excluded. CONCLUSION: Agreement limits for volumetric annotation of VS are affected by tumor volume: the LoA improves with increasing tumor volume. As a result, for tumors larger than 200 mm³, growth can reliably be detected at an earlier stage than with the currently widely used cutoff of 20%. For very small tumors, however, growth should be assessed with higher agreement limits than previously thought.
ABSTRACT
PURPOSE: This study aimed to demonstrate the potential clinical applicability of an organ-contour-driven auto-matching algorithm in image-guided radiotherapy. METHODS: This study included eleven consecutive patients with cervical cancer who underwent radiotherapy in 23 or 25 fractions. Daily and reference magnetic resonance images were converted into mesh models. A weight-based algorithm was implemented to optimize the distance between the mesh model vertices and surface of the reference model during the positioning process. Within the cost function, weight parameters were employed to prioritize specific organs for positioning. In this study, three scenarios with different weight parameters were prepared. The optimal translation and rotation values for the cervix and uterus were determined based on the calculated translations alone or in combination with rotations, with a rotation limit of ±3°. Subsequently, the coverage probabilities of the following two planning target volumes (PTV), an isotropic 5 mm and anisotropic margins derived from a previous study, were evaluated. RESULTS: The percentage of translations exceeding 10 mm varied from 9% to 18% depending on the scenario. For small PTV sizes, more than 80% of all fractions had a coverage of 80% or higher. In contrast, for large PTV sizes, more than 90% of all fractions had a coverage of 95% or higher. The difference between the median coverage with translational positioning alone and that with both translational and rotational positioning was 1% or less. CONCLUSION: This algorithm facilitates quantitative positioning by utilizing a cost function that prioritizes organs for positioning. Consequently, consistent displacement values were algorithmically generated. This study also revealed that the impact of rotational corrections, limited to ±3°, on PTV coverage was minimal.
Subjects
Image-Guided Radiotherapy, Intensity-Modulated Radiotherapy, Female, Humans, Image-Guided Radiotherapy/methods, Radiotherapy Dosage, Computer-Assisted Radiotherapy Planning/methods, Intensity-Modulated Radiotherapy/methods, Algorithms
ABSTRACT
BACKGROUND: Accurate breast density evaluation allows for more precise risk estimation but suffers from high inter-observer variability. PURPOSE: To evaluate the feasibility of reducing inter-observer variability of breast density assessment through artificial intelligence (AI) assisted interpretation. STUDY TYPE: Retrospective. POPULATION: Six hundred and twenty-one patients without breast prosthesis or reconstructions were randomly divided into training (N = 377), validation (N = 98), and independent test (N = 146) datasets. FIELD STRENGTH/SEQUENCE: 1.5 T and 3.0 T; T1-weighted spectral attenuated inversion recovery. ASSESSMENT: Five radiologists independently assessed each scan in the independent test set to establish the inter-observer variability baseline and to reach a reference standard. Deep learning and three radiomics models were developed for three classification tasks: (i) four Breast Imaging-Reporting and Data System (BI-RADS) breast composition categories (A-D), (ii) dense (categories C, D) vs. non-dense (categories A, B), and (iii) extremely dense (category D) vs. moderately dense (categories A-C). The models were tested against the reference standard on the independent test set. AI-assisted interpretation was performed by majority voting between the models and each radiologist's assessment. STATISTICAL TESTS: Inter-observer variability was assessed using linear-weighted kappa (κ) statistics. Kappa statistics, accuracy, and area under the receiver operating characteristic curve (AUC) were used to assess models against reference standard. RESULTS: In the independent test set, five readers showed an overall substantial agreement on tasks (i) and (ii), but moderate agreement for task (iii). The best-performing model showed substantial agreement with reference standard for tasks (i) and (ii), but moderate agreement for task (iii). 
With the assistance of the AI models, almost perfect inter-observer agreement was obtained for tasks (i) (mean κ = 0.86), (ii) (mean κ = 0.94), and (iii) (mean κ = 0.94). DATA CONCLUSION: Deep learning and radiomics models have the potential to help reduce inter-observer variability of breast density assessment. LEVEL OF EVIDENCE: 3. TECHNICAL EFFICACY: Stage 1.
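The majority-voting step described in the assessment can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the function name `assisted_reading`, the example labels, and in particular the tie-break rule (defer to the radiologist) are assumptions, since the abstract does not say how ties were resolved.

```python
from collections import Counter

def assisted_reading(model_votes, radiologist_vote):
    """Combine model predictions with one radiologist's call by majority vote.

    model_votes: list of category labels predicted by the models.
    radiologist_vote: the radiologist's own category label.
    Assumption (not from the source): on a tie, the radiologist's call is kept.
    """
    votes = list(model_votes) + [radiologist_vote]
    counts = Counter(votes)
    top = max(counts.values())
    winners = [label for label, n in counts.items() if n == top]
    if len(winners) == 1:
        return winners[0]
    # Hypothetical tie-break: defer to the human reader.
    return radiologist_vote

# Hypothetical example: three of four models say "C", the radiologist says "B".
print(assisted_reading(["C", "C", "D", "C"], "B"))
```

With an odd number of voters on a two-class task, such as dense vs. non-dense, no tie can occur; the tie-break only matters for the four-category task.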
ABSTRACT
BACKGROUND: Reliable preoperative staging of rectal cancers is crucial for treatment decision making. PURPOSE: To assess the intra- and inter-observer agreement of rectal cancer staging, including the sub-categories, with magnetic resonance imaging (MRI). MATERIAL AND METHODS: The study includes 85 patients (35.3% women; mean age = 62.2 ± 11.2 years) who underwent MRI for rectal cancer staging between August 2020 and April 2021. All the stored images were evaluated independently by two radiologists with 10-15 years of experience. For intra-observer agreement, the evaluations were done two months apart. Analyses were made using kappa, prevalence and bias-adjusted kappa (PABAK), and intraclass correlation coefficient (ICC), where appropriate. RESULTS: There was a substantial inter-observer agreement for tumor localization (kappa = 0.665, PABAK = 0.682), mesorectal fascia invasion (kappa = 0.663, PABAK = 0.822), internal and external sphincter involvement (kappa 0.804 and 0.751, PABAK 0.859 and 0.929, respectively), and moderate to substantial agreement for M-staging (kappa = 0.451, PABAK = 0.742) and extramural vascular invasion (kappa = 0.569, PABAK = 0.741). There was also a good inter-observer agreement for T staging and N staging (ICC = 0.862, 95% confidence interval [CI] = 0.788-0.911; and ICC = 0.841, 95% CI = 0.595-0.922, respectively). As expected, intra-observer agreement was better than inter-observer agreement. CONCLUSION: Intra- and inter-observer agreement for MRI staging of rectal cancers using the structured reporting template is good.
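The gap between kappa and PABAK reported above (for example 0.663 vs. 0.822 for mesorectal fascia invasion) is typical when one category is much more common than the other. The sketch below shows the standard two-rater, two-category formulas in Python. The function name and the 2x2 counts are hypothetical and are not taken from the study.

```python
def kappa_and_pabak(a, b, c, d):
    """Cohen's kappa and PABAK for a 2x2 agreement table of two raters.

    a: both raters positive, d: both raters negative,
    b and c: the two kinds of disagreement.
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement from each rater's marginal proportions.
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    kappa = (p_observed - p_expected) / (1 - p_expected)
    # PABAK for two categories fixes chance agreement at 0.5.
    pabak = 2 * p_observed - 1
    return kappa, pabak

# Hypothetical counts with a rare positive finding (10% prevalence per rater).
kappa, pabak = kappa_and_pabak(a=6, b=4, c=4, d=86)
print(f"kappa = {kappa:.3f}, PABAK = {pabak:.3f}")
```

With balanced marginals the two statistics coincide; with skewed prevalence, kappa drops although the raw agreement is unchanged.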
Subjects
Rectal Neoplasms, Humans, Female, Middle Aged, Aged, Male, Neoplasm Staging, Observer Variation, Rectal Neoplasms/diagnostic imaging, Rectal Neoplasms/pathology, Fascia/pathology, Magnetic Resonance Imaging/methods, Reproducibility of Results
ABSTRACT
INTRODUCTION: In the Library-of-Plans (LoP) approach, correct plan selection is essential for delivering radiotherapy treatment accurately. However, poor image quality of the cone-beam computed tomography (CBCT) may introduce inter-observer variability and thereby hamper accurate plan selection. In this study, we investigated whether new techniques that improve CBCT image quality and the consistency of plan selection affect the accuracy of LoP selection in cervical cancer patients. MATERIALS AND METHODS: CBCT images of 12 patients were used to investigate the inter-observer variability of plan selection based on different CBCT image types. Six observers were asked to individually select a plan based on clinical X-ray Volumetric Imaging (XVI) CBCT, iterative reconstructed CBCT (iCBCT) and synthetic CTs (sCT). Selections were performed before and after a consensus meeting with the entire group, in which guidelines were created. All observers also scored the image quality and the plan selection procedure. For plan selection, Fleiss' kappa (κ) was used to determine the inter-observer variability within one image type. RESULTS: The agreement between observers was significantly higher on sCT than on CBCT. The consensus meeting improved both the duration of plan selection and the inter-observer variability. The guidelines contributed to the overall plan selection results reported in this manuscript. Before the meeting, the gold standard was selected in 76% of the cases on XVI CBCT, 74% on iCBCT, and 76% on sCT. After the meeting, the gold standard was selected in 83% of the cases on XVI CBCT, 81% on iCBCT, and 90% on sCT. CONCLUSION: The use of sCTs can increase the agreement of plan selection among observers, and the results indicate that the gold standard was selected more often. It is important that clear guidelines for plan selection are implemented in order to benefit from the increased image quality, select plans accurately, and decrease inter-observer variability.
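Fleiss' kappa, the statistic named in the methods above, can be sketched as follows. This is a minimal Python illustration of the textbook formula, not the study's analysis code; the function name `fleiss_kappa` and the toy plan labels are hypothetical.

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa for several raters and categorical labels.

    ratings: list with one entry per case; each entry is the list of labels
    given by the raters. Every case must have the same number of raters.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for row in ratings for label in row})
    # counts[i][j]: number of raters who put case i in category j.
    counts = []
    for row in ratings:
        tally = Counter(row)
        counts.append([tally[c] for c in categories])
    # Observed agreement, averaged over cases.
    per_case = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_case) / n_cases
    # Chance agreement from the overall category proportions.
    proportions = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(len(categories))
    ]
    p_expected = sum(p * p for p in proportions)
    if p_expected == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical toy data: 3 cases, 6 observers each choosing a plan label.
toy = [
    ["full", "full", "full", "full", "full", "empty"],
    ["empty", "empty", "empty", "empty", "empty", "empty"],
    ["full", "full", "empty", "full", "full", "full"],
]
print(round(fleiss_kappa(toy), 3))
```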
Subjects
Spiral Cone-Beam Computed Tomography, Uterine Cervical Neoplasms, Female, Humans, Uterine Cervical Neoplasms/diagnostic imaging, Uterine Cervical Neoplasms/radiotherapy, Observer Variation, Computer-Assisted Radiotherapy Planning/methods, Cone-Beam Computed Tomography/methods
ABSTRACT
BACKGROUND: Ultrasonographic measurements of the diameter of the sheath of the optic nerve can be used to assess intracranial pressure indirectly. These measurements come with measurement error. OBJECTIVE: Our aim was to estimate observer's measurement error as a determinant of ultrasonographic measurement variability of the optic nerve sheath diameter. METHODS: A systematic search of the literature was conducted in Embase, Medline, Web of Science, the Cochrane Central Register of Trials, and the first 200 articles of Google Scholar up to April 19, 2021. Inclusion criteria were the following: healthy adults, B-mode ultrasonography, and measurements 3 mm behind the retina. Studies were excluded if standard error of measurement could not be calculated. Nine studies featuring 389 participants (median 40; range 15-100) and 22 observers (median 2; range 1-4) were included. Standard error of measurement and minimal detectable differences were calculated to quantify observer variability. Quality and risk of bias were assessed with the Guidelines for Reporting Reliability and Agreement Studies. RESULTS: The standard error of measurement of the intra- and interobserver variability had a range of 0.10-0.41 mm and 0.14-0.42 mm, respectively. Minimal detectable difference of a single observer was 0.28-1.1 mm. Minimal detectable difference of multiple observers (range 2-4) was 0.40-1.1 mm. Quality assessment showed room for methodological improvement of included studies. CONCLUSIONS: The standard errors of measurement and minimal detectable differences of ultrasonographic measurements of the optic nerve sheath diameter found in this review with healthy participants indicate caution should be urged when interpreting results acquired with this measurement method in clinical context.
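The relationship between the standard error of measurement (SEM) and the minimal detectable difference (MDD) can be illustrated with the commonly used formula MDD = 1.96 × √2 × SEM. Whether the review used exactly this definition is an assumption; the function name is hypothetical. The inputs below are the reported intra-observer SEM limits.

```python
import math

def minimal_detectable_difference(sem_mm, z=1.96):
    """MDD at the 95% level for a difference between two measurements.

    Assumes the common definition z * sqrt(2) * SEM, where sqrt(2) reflects
    that the difference of two measurements carries two error terms.
    """
    return z * math.sqrt(2) * sem_mm

# Intra-observer SEM range reported in the abstract: 0.10 to 0.41 mm.
for sem in (0.10, 0.41):
    print(f"SEM {sem:.2f} mm -> MDD {minimal_detectable_difference(sem):.2f} mm")
```

Under this assumed definition, the reported intra-observer SEM limits reproduce the single-observer MDD range of 0.28-1.1 mm after rounding.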
Subjects
Intracranial Pressure, Optic Nerve, Adult, Humans, Observer Variation, Optic Nerve/diagnostic imaging, Reproducibility of Results, Ultrasonography/methods
ABSTRACT
BACKGROUND: Quantification of myocardial blood flow (MBF) and myocardial flow reserve (MFR) has shown diagnostic and prognostic values for the assessment of coronary artery disease (CAD). This study aimed to evaluate in patients a highly automatic Yale-MQ (myocardial blood flow quantification) software incorporated with a novel image segmentation approach for quantification of global and regional MBF and MFR from dynamic 82Rb cardiac positron emission tomography (PET). METHODS: Global and regional MBFs and MFRs were quantified in 80 patients (18 normal and 62 CAD subjects) by two different observers using the Yale-MQ software. Lower limits of normal (LLN) values and intra- and inter-observer variabilities of MBFs and MFRs were calculated for the assessment of quantitative precision. The Yale-MQ was compared with a commercially available software (Corridor 4DM) being used as a reference. RESULTS: The Yale-MQ method provided precise assessments of LLNs of MBF and MFR. The global and regional MBFs and MFR quantified via Yale-MQ were correlated strongly with those via Corridor4DM (R ≥ 0.867). The intra- and inter-observer variabilities of MBFs and MFRs quantified via Yale-MQ were small (≤ 7.7% for MBFs and ≤ 10.0% for MFRs) with excellent correlations (R ≥ 0.980 for MBFs and R ≥ 0.976 for MFRs). CONCLUSIONS: The new Yale-MQ software associated with the automatic processing scheme provides a highly reproducible clinical tool for precise quantification of MBF and MFR in patients with reliable LLN values.
Subjects
Coronary Artery Disease/diagnostic imaging, Coronary Artery Disease/physiopathology, Myocardial Fractional Flow Reserve/physiology, Computer-Assisted Image Processing, Myocardial Perfusion Imaging, Positron-Emission Tomography, Adult, Aged, Cohort Studies, Female, Humans, Male, Middle Aged, Observer Variation, Reproducibility of Results, Software
ABSTRACT
BACKGROUND: Inter-observer variations (IOVs) arising during contouring can potentially impact plan quality and patient outcomes. Regular assessment of contouring IOV is not commonly performed in clinical practice due to the large time commitment required of clinicians from conventional methods. This work uses retrospective information from past treatment plans to facilitate a time-efficient, evidence-based intervention to reduce contouring IOV. METHODS: The contours of 492 prostate cancer treatment plans created by four radiation oncologists were analyzed in this study. Structure volumes, lengths, and DVHs were extracted from the treatment planning system and stratified based on primary oncologist and inclusion of a pelvic lymph node (PLN) target. Inter-observer variations and their dosimetric consequences were assessed using Student's t-tests. Results of this analysis were presented at an intervention meeting, where new consensus contour definitions were agreed upon. The impact of the intervention was assessed one-year later by repeating the analysis on 152 new plans. RESULTS: Significant IOV in prostate and PLN target delineation existed pre-intervention between oncologists, impacting dose to nearby OARs. IOV was also present for rectum and penile-bulb structures. Post-intervention, IOV decreased for all previously discordant structures. Dosimetric variations were also reduced. Although target contouring concordance increased significantly, some variations still persisted for PLN structures, highlighting remaining areas for improvement. CONCLUSION: We detected significant contouring IOV in routine practice using easily accessible retrospective data and successfully decreased IOV in our clinic through a reflective intervention. Continued application of this approach may aid improvements in practice standardization and enhance quality of care.
Subjects
Prostatic Neoplasms, Evidence-Based Medicine, Humans, Male, Observer Variation, Prostatic Neoplasms/radiotherapy, Computer-Assisted Radiotherapy Planning, Retrospective Studies
ABSTRACT
BACKGROUND: Endoscopic ultrasound routinely guides lymph node evaluation for the staging of a known or suspected lung cancer. Characteristics seen on B-mode imaging might help the observer decide on the lymph nodes of risk. The influence of nodal size on the predictivity of these characteristics and the agreement with which operators can combine these for malignancy risk prediction is to be determined. OBJECTIVES: We evaluated (1) if prospectively scored individual B-mode ultrasound features predict malignancy when further divided by size and (2) assessed if observers were able to reproducibly agree on still lymph node image malignancy risk. METHODS: Lymph nodes as visualized by EBUS were prospectively scored for B-mode characteristics. Still B-mode images were furthermore collected. After collection, a repeated scoring of a subset of lymph nodes was retrospectively performed (n = 11 observers). RESULTS: Analysis of 490 lymph nodes revealed the short axis size is an objective measure for stratifying risk of malignancy (ROC area under the curve 0.78). With ≥8-mm size, 210/237 malignant lymph nodes were correctly identified (89% sensitivity, 46% specificity, 61% PPV, and 81% NPV). Secondary addition of B-mode features in <8-mm nodes had limited value. Retrospective analysis of intra- and interobserver scoring furthermore revealed significant disagreement. CONCLUSIONS: Lymph nodes of ≥8-mm size and preferably even smaller should be aspirated regardless of other B-mode features. Observer disagreement in scoring both small and large lymph nodes suggests it is infeasible to include subjective features for stratification. Future research should focus on (integrating) other (semi)quantitative values for improving prediction.
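The four accuracy figures quoted for the ≥8-mm threshold all derive from one 2x2 table. The Python sketch below shows the arithmetic. Only the 210 true positives out of 237 malignant nodes come from the abstract; the false-positive and true-negative counts are hypothetical values, back-calculated on the assumption that the remaining 253 of the 490 nodes were non-malignant, and chosen to be consistent with the reported percentages.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# tp and fn follow from "210/237 malignant lymph nodes" in the abstract.
# fp and tn are ASSUMED illustrative counts (253 non-malignant nodes in total).
metrics = diagnostic_metrics(tp=210, fp=137, fn=27, tn=116)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```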
Subjects
Lung Neoplasms, Lymphoma, Bronchoscopy/methods, Endoscopic Ultrasound-Guided Fine Needle Aspiration/methods, Endosonography, Humans, Lung Neoplasms/diagnostic imaging, Lung Neoplasms/pathology, Lymph Nodes/diagnostic imaging, Lymph Nodes/pathology, Lymphatic Metastasis/diagnostic imaging, Lymphoma/pathology, Mediastinum, Neoplasm Staging, Reproducibility of Results, Retrospective Studies
ABSTRACT
PURPOSE: To compare size and morphologic features of three-dimensional aneurysm models, obtained with a semi-automated segmentation software (Stroke VCAR, GE, USA) from cerebral CT angiography (CTA) data, to three-dimensional aneurysm models obtained with digital subtraction angiography (DSA, with 3D rotational angiography acquisition, 3DRA), considered as the reference standard. METHODS: In this retrospective study, we reviewed 132 patients, with a total of 137 intracranial aneurysms, who underwent CTA and subsequent DSA examination, supplemented with 3DRA. We compared neck length, short axis and long axis measured on the 3DRA model to the same variables measured on the 3D-CTA model by two blinded readers and to the automatic software dimensions. Statistical analysis then assessed intra-observer and inter-observer variability and differences between patients with or without subarachnoid hemorrhage (SAH). RESULTS: There were no significant differences in short-axis and long-axis measurements between 3D angiographic and 3D-CTA models, while comparison of neck lengths revealed a statistically significant difference, which tended to be greater for smaller neck lengths (partial volume effect and "kissing vessels" artifact). There were significant differences between manual and automatic measurements of the same three variables, and the presence of SAH did not affect aneurysm 3D reconstruction. Inter-observer agreement was moderate for neck length and substantial for short axis and long axis. CONCLUSION: The examined 3D-CTA segmentation system is a reproducible procedure for aneurysm morphologic characterization and, in particular, for assessment of aneurysm sac dimensions, but considerable care is required in neck length interpretation.
Subjects
Digital Subtraction Angiography/methods, Computed Tomography Angiography/methods, Intracranial Aneurysm/diagnostic imaging, Adolescent, Adult, Aged, Aged 80 and over, Cerebral Angiography/methods, Female, Humans, Intracranial Aneurysm/pathology, Male, Middle Aged, Multidetector Computed Tomography/methods, Observer Variation, Reference Standards, Reproducibility of Results, Retrospective Studies, Nonparametric Statistics, Subarachnoid Hemorrhage/diagnostic imaging, Young Adult
ABSTRACT
INTRODUCTION: At the end of 2019 there was an outbreak of pneumonia caused by a new coronavirus, a disease that was called coronavirus disease 2019 (COVID-19). Computed tomography (CT) has played an important role in the diagnosis of COVID-19 patients. OBJECTIVE: To demonstrate inter-observer variability with five scales proposed for measuring the extent of COVID-19 pneumonia on tomography. METHODS: Thirty-five initial chest CT scans of patients who attended respiratory triage for suspected COVID-19 pneumonia were analyzed. Three radiologists classified the tomographic images according to the severity scales proposed by Yang (1), Yuan (2), Chun (3), Wang (4) and Instituto Nacional de Enfermedades Respiratorias-Chung-Pan (5). Agreement between the evaluators for each scale was calculated using the intraclass correlation coefficient. RESULTS: Most patients (77.1%) had involvement of all five pulmonary lobes. Scales 1, 2, 4 and 5 showed an intraclass correlation > 0.91 (p < 0.0001), with agreement thus being almost perfect. CONCLUSIONS: Scale 4 (proposed by Wang) showed the best inter-observer agreement, with a coefficient of 0.964 (p = 0.001).
Subjects
COVID-19, Pneumonia, Humans, Observer Variation, Pneumonia/diagnostic imaging, Pneumonia/epidemiology, Retrospective Studies, SARS-CoV-2, X-Ray Computed Tomography
ABSTRACT
[Purpose] To clarify the inter-rater reliability of the evaluation criteria for paraspinal muscle fat infiltration on magnetic resonance images between two examiners with different professional roles in interdisciplinary physical therapy teams. [Participants and Methods] In this retrospective study, we reviewed the clinical data of 225 patients with degenerative lumbar diseases who underwent posterior lumbar surgery at our hospital. A physical therapist and a spinal surgeon visually quantified fat infiltration of the multifidus muscles at the level of L4/5 on the preoperative magnetic resonance images of the patients using Kjaer's criteria (Grade 0: 0-10%, Grade 1: 10-50%, and Grade 2: >50%). We used the kappa coefficient to assess inter-rater reliability. [Results] The participants included 142 males and 83 females (mean age, 64.7 years; range, 21-89 years). The number of patients with grades 0/1/2 were 50/160/15, respectively, for examiner 1; and 59/155/11, respectively, for examiner 2. The kappa coefficient was 0.69, indicating a substantial agreement. [Conclusion] Our study, which is the first to assess the inter-rater reliability of Kjaer's criteria between examiners with different medical occupations, revealed that these criteria could be a reliable tool for evaluating fat infiltration in the multifidus muscles and sharing information between interdisciplinary physical therapy teams.
ABSTRACT
PURPOSE: In selective internal radiation therapy (SIRT), an accurate total liver segmentation is required for activity prescription and absorbed dose calculation. Our goal was to investigate the feasibility of using automatic liver segmentation based on a convolutional neural network (CNN) for CT imaging in SIRT, and the ability of CNN to reduce inter-observer variability of the segmentation. METHODS: A multi-scale CNN was modified for liver segmentation for SIRT patients. The CNN model was trained with 139 datasets from three liver segmentation challenges and 12 SIRT patient datasets from our hospital. Validation was performed on 13 SIRT datasets and 12 challenge datasets. The model was tested on 40 SIRT datasets. One expert manually delineated the livers and adjusted the liver segmentations from CNN for 40 test SIRT datasets. Another expert performed the same tasks for 20 datasets randomly selected from the 40 SIRT datasets. The CNN segmentations were compared with the manual and adjusted segmentations from the experts. The difference between the manual segmentations was compared with the difference between the adjusted segmentations to investigate the inter-observer variability. Segmentation difference was evaluated through dice similarity coefficient (DSC), volume ratio (RV), mean surface distance (MSD), and Hausdorff distance (HD). RESULTS: The CNN segmentation achieved a median DSC of 0.94 with the manual segmentation and of 0.98 with the manually corrected CNN segmentation, respectively. The DSC between the adjusted segmentations is 0.98, which is 0.04 higher than the DSC between the manual segmentations. CONCLUSION: The CNN model achieved good liver segmentations on CT images of good image quality, with relatively normal liver shapes and low tumor burden. 87.5% of the 40 CNN segmentations only needed slight adjustments for clinical use. 
However, the trained model failed on SIRT data with low dose or contrast, lesions with a large density difference from their surroundings, and abnormal liver position and shape. These scenarios were not adequately represented in the training data. Despite this limitation, the current CNN is already a useful clinical tool that improves inter-observer agreement and therefore contributes to the standardization of the dosimetry. Further improvement is expected when the CNN is trained with more data from SIRT patients.
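The Dice similarity coefficient (DSC) used to compare segmentations above is twice the overlap divided by the sum of the two mask sizes. The following Python sketch works on flattened binary masks. The function name and toy masks are hypothetical, and returning 1.0 for two empty masks is a convention assumed here, not something stated in the abstract.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient of two flattened binary masks.

    mask_a, mask_b: equal-length sequences of 0/1 voxel labels.
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    if size_a + size_b == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2 * overlap / (size_a + size_b)

# Hypothetical toy masks: 8 voxels, the second mask misses one liver voxel.
manual = [0, 1, 1, 1, 1, 0, 0, 0]
automatic = [0, 1, 1, 1, 0, 0, 0, 0]
print(round(dice_coefficient(manual, automatic), 3))
```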
Subjects
Deep Learning, Humans, Computer-Assisted Image Processing, Liver/diagnostic imaging, Computer Neural Networks, Observer Variation, Tumor Burden
ABSTRACT
OBJECTIVES: To evaluate the agreement among readers with different expertise in detecting suspicious lesions at prostate multiparametric MRI using Prostate Imaging Reporting and Data System (PI-RADS) version 2.1. METHODS: We evaluated 200 consecutive biopsy-naïve or previously negative biopsy men who underwent MRI for clinically suspected prostate cancer (PCa) between May and September 2017. Of them, 132 patients underwent prostate biopsy. Seven radiologists (four dedicated uro-radiologists and three non-dedicated abdominal radiologists) reviewed and scored all MRI examinations according to PI-RADS v2.1. Agreement on index lesion detection was evaluated with Conger's k coefficient, agreement coefficient 1 (AC1), percentage of agreement (PA), and indexes of specific positive and negative agreement. Clinical and radiological features that may influence variability were evaluated. RESULTS: Agreement in index lesion detection among all readers was substantial (AC1 0.738; 95% CI 0.695-0.782); dedicated radiologists showed higher agreement compared with non-dedicated readers. Clinical and radiological parameters that positively influenced agreement were PSA density ≥ 0.15 ng/mL/cc, pre-MRI high risk for PCa, positivity threshold of PI-RADS score 4 + 5, PZ lesions, homogeneous signal intensity of the PZ, and subjectively easy interpretation of MRI. Positive specific agreement was significantly higher among dedicated readers, up to 93.4% (95% CI 90.7-95.4) in patients harboring csPCa. Agreement on absence of lesions was excellent for both dedicated and non-dedicated readers (respectively 85.1% [95% CI 78.4-92.3] and 82.0% [95% CI 77.2-90.1]). CONCLUSIONS: Agreement on index lesion detection among radiologists of various experiences is substantial to excellent using PI-RADS v2.1. Concordance on absence of lesions is excellent across readers' experience. 
KEY POINTS:
• Agreement on index lesion detection among radiologists of various experiences is substantial to excellent using PI-RADS v2.1.
• Concordance between experienced readers is higher than between less-experienced readers.
• Concordance on absence of lesions is excellent across readers' experience.
Subjects
Multiparametric Magnetic Resonance Imaging/methods, Prostatic Neoplasms/diagnostic imaging, Radiologists, Aged, Biopsy, Humans, Magnetic Resonance Imaging/methods, Male, Middle Aged, Observer Variation, Prostatic Neoplasms/pathology, Radiology, Reproducibility of Results, Retrospective Studies
ABSTRACT
Pelvis size is important in preventing dystocia in cattle caused by foeto-maternal disproportion, most commonly in primiparous females. Reproducibility and repeatability are two important aspects of the reliability of pelvic measurements used to select cattle for culling. Pelvic measurements were taken with a Rice pelvimeter from 224 young cattle (180 females and 44 males) of four beef breeds in South Africa. One experienced and two inexperienced observers each measured pelvic height and width twice. The proportion of measurements differing by at most 0.5 cm, within animal, from the first measurement by the experienced observer was around 80% for the experienced observer and, for the inexperienced observers, around 50% for pelvic height and around 60% for pelvic width. Breed and sex did not affect the reliability of pelvimetry by an experienced observer. Both under- and overestimation of pelvis size were observed in inexperienced observers and appeared to be unrelated to breed and sex.
Subjects
Observer Variation, Pelvimetry/veterinary, Animals, Cattle, Cross-Sectional Studies, Female, Humans, Male, Pelvimetry/methods, Pelvis/anatomy & histology, Reproducibility of Results, Species Specificity
ABSTRACT
OBJECTIVES: To evaluate the intra- and inter-rater agreement for myometrial lesions using Morphologic Uterus Sonographic Assessment terminology. METHODS: Thirteen raters with high (n = 6) or medium experience (n = 7) assessed 30 3-dimensional ultrasound clips with (n = 20) and without (n = 10) benign myometrial lesions. Myometrial lesions were reported as poorly or well defined and then systematically evaluated for the presence of individual features. The clips were blindly assessed twice (at a 2-month interval). Intra- and inter-rater agreements were calculated with κ statistics. RESULTS: The reporting of poorly defined lesions reached moderate intra-rater agreement (κ = 0.49 [high experience] and 0.47 [medium experience]) and poor inter-rater agreement (κ = 0.39 [high experience] and 0.25 [medium experience]). The reporting of well-defined lesions reached good to very good intra-rater agreement (κ = 0.73 [high experience] and 0.82 [medium experience]) and good inter-rater agreement (κ = 0.75 [high experience] and 0.63 [medium experience]). Most individual features associated with ill-defined lesions reached moderate intra- and inter-rater agreement among highly experienced raters (κ = 0.41-0.60). The least reproducible features were myometrial cysts, hyperechoic islands, subendometrial lines and buds, and translesional flow (κ = 0.11-0.34). Most individual features associated with well-defined lesions reached moderate to good intra- and inter-rater agreement among all observers (κ = 0.41-0.80). The least reproducible features were a serosal contour, asymmetry, a hyperechoic rim, and fan-shaped shadows (κ = 0.00-0.35). CONCLUSIONS: The reporting of well-defined lesions showed excellent agreement, whereas the agreement for poorly defined lesions was low, even among highly experienced raters. The agreement on identifying individual features varied, especially for features associated with ill-defined lesions. 
Guidelines on minimum requirements for features associated with ill-defined lesions to be interpreted as poorly defined lesions may improve agreement.
Subjects
Myometrium/diagnostic imaging, Ultrasonography/methods, Uterine Neoplasms/diagnostic imaging, Adult, Female, Humans, Three-Dimensional Imaging/methods, Middle Aged, Observer Variation, Pilot Projects, Prospective Studies, Reproducibility of Results
ABSTRACT
BACKGROUND: This study, promoted by the Italian Association of Radiotherapy and Clinical Oncology (AIRO) Head and Neck Group, aimed to assess current national practice in target volume delineation, and its inter-observer variability, on a case of neck lymph node metastases from an unknown primary in a setting of primary radiotherapy. MATERIALS AND METHODS: A case of metastatic neck lymph node from an occult primary was proposed to 17 radiation oncologists. A national reference RT center was identified and considered as the benchmark. Participants were requested to delineate target volumes. A structured questionnaire was administered. The following parameters of the clinical target volumes (CTVs) were compared: centroid distances, Dice similarity index (DSI), Jaccard index and mean distance to agreement (MDA). Volume, expressed in cubic centimeters, and the cranio-caudal extension of the CTVs were also evaluated. RESULTS: Sixteen of 17 radiation oncologists recommended three CTV dose levels (CTV HD, CTV ID and CTV LD); CTV ID was not delineated by one of the participants or by the reference center. The distance between the reference centroid and the mean centroid of the CTVs HD was 1.09 cm (0.36-3.99 cm); for CTV LD, the mean centroid distance was 2.45 cm (0.27-4.83 cm). For CTV HD, the mean DSI was 0.48, the mean Jaccard index was 0.32 and the MDA was 8.89 mm. CTV LD showed a mean DSI of 0.46, a mean Jaccard index of 0.31 and an MDA of 14.87 mm when compared to the reference. CONCLUSION: Many aspects of treatment optimization for cervical node metastases from an occult primary remain unclear, and we found notable heterogeneity in overall radiotherapy management, with discordances in both target volume delineation and volume prescription.
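For readers unfamiliar with the overlap metrics named in this abstract, the sketch below computes the Dice similarity index and the Jaccard index for two delineations represented as sets of voxel coordinates. The function name and the toy volumes are hypothetical; the study's own software and contours are not described in the abstract.

```python
def dice_jaccard(volume_a, volume_b):
    """Dice similarity index and Jaccard index of two voxel sets."""
    a, b = set(volume_a), set(volume_b)
    union = len(a | b)
    if union == 0:
        raise ValueError("both volumes are empty")
    intersection = len(a & b)
    dice = 2 * intersection / (len(a) + len(b))
    jaccard = intersection / union
    return dice, jaccard


# Toy single-slice "contours" (hypothetical, not study data):
# two 4x4 voxel squares shifted by two voxels along x
reference = {(x, y, 0) for x in range(0, 4) for y in range(4)}
observer = {(x, y, 0) for x in range(2, 6) for y in range(4)}

dice, jaccard = dice_jaccard(reference, observer)
print(f"DSI={dice:.3f} Jaccard={jaccard:.3f}")
```

For any single pair of volumes the two indexes are linked by J = D / (2 - D), so a DSI of 0.5 corresponds to a Jaccard index of 1/3. Means taken over several observers need not satisfy this identity exactly.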
Subjects
Head and Neck Neoplasms/radiotherapy, Lymphatic Metastasis/radiotherapy, Unknown Primary Neoplasms/pathology, Physicians' Practice Patterns/statistics & numerical data, Female, Head and Neck Neoplasms/diagnostic imaging, Humans, Italy, Lymphatic Metastasis/diagnostic imaging, Male, Middle Aged, Neoplasm Staging, Observer Variation, Positron Emission Tomography Computed Tomography, Medical Societies, Surveys and Questionnaires, X-Ray Computed Tomography, Treatment Outcome, Whole Body Imaging
ABSTRACT
BACKGROUND: Diagnostic interpretations of melanocytic skin lesions vary widely among pathologists, yet the underlying reasons remain unclear. OBJECTIVE: Identify pathologist characteristics associated with rates of accuracy and reproducibility. METHODS: Pathologists independently interpreted the same set of biopsy specimens from melanocytic lesions on 2 occasions. Diagnoses were categorized into 1 of 5 classes according to the Melanocytic Pathology Assessment Tool and Hierarchy for Diagnosis system. Reproducibility was determined by pathologists' concordance of diagnoses across 2 occasions. Accuracy was defined by concordance with a consensus reference standard. Associations of pathologist characteristics with reproducibility and accuracy were assessed individually and in multivariable logistic regression models. RESULTS: Rates of diagnostic reproducibility and accuracy were highest among pathologists with board certification and/or fellowship training in dermatopathology and in those with 5 or more years of experience. In addition, accuracy was high among pathologists with a higher proportion of melanocytic lesions in their caseload composition and higher volume of melanocytic lesions. LIMITATIONS: Data gathered in a test set situation by using a classification tool not currently in clinical use. CONCLUSION: Diagnoses are more accurate among pathologists with specialty training and those with more experience interpreting melanocytic lesions. These findings support the practice of referring difficult cases to more experienced pathologists to improve diagnostic accuracy, although the impact of these referrals on patient outcomes requires additional research.
Subjects
Melanoma/pathology, Pathologists, Clinical Pathology/standards, Skin Neoplasms/pathology, Needle Biopsy, Clinical Competence, Consensus, Delphi Technique, Female, Humans, Male, Observer Variation, Cutaneous Malignant Melanoma
ABSTRACT
INTRODUCTION AND AIM: Transient elastography (TE) is gaining popularity as a non-invasive method for predicting liver fibrosis, but inter-observer agreement and factors influencing reproducibility have not been adequately assessed. MATERIAL AND METHODS: This cross-sectional study was conducted at the Specialized Medical Hospital and the Egyptian Liver Foundation, Mansoura, Egypt. The inclusion criteria were age older than 18 years and chronic hepatitis C infection. The exclusion criteria were the presence of ascites, a pacemaker or pregnancy. Three hundred and fifty-six patients participated in the study, so 356 pairs of exams were performed by two operators on the same day. RESULTS: The overall inter-observer agreement, measured by the intraclass correlation coefficient (ICC), was 0.921. The correlation between the two operators was excellent (Spearman's ρ = 0.808, p < 0.001). The inter-observer reliability was κ = 0.557 (p < 0.001). A non-negligible discordance in fibrosis staging between operators was observed (87 cases, 24.4%). Discordance of one stage and of two or more stages of fibrosis occurred in 60 (16.9%) and 27 cases (7.6%), respectively. Obesity (BMI ≥ 30 kg/m²) was the main factor associated with discordance (p = 0.002). CONCLUSION: Although liver stiffness measurement showed excellent correlation between the two operators, TE presented inter-observer variability that may not be negligible.
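The κ statistic reported for the two operators corrects raw agreement for the agreement expected by chance. A minimal sketch of unweighted Cohen's κ for two raters follows. The abstract does not say whether a weighted or unweighted κ was used, and the stage labels and ratings below are hypothetical rather than study data.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed proportion of cases on which the two raters agree
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal category frequencies
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical fibrosis stages assigned by two operators to eight patients
operator_a = ["F0", "F1", "F2", "F3", "F4", "F2", "F1", "F4"]
operator_b = ["F0", "F1", "F3", "F3", "F4", "F2", "F2", "F4"]

kappa = cohen_kappa(operator_a, operator_b)
print(f"kappa={kappa:.3f}")
```

Because κ penalizes chance agreement and treats every disagreement equally, it can be moderate even when a correlation coefficient or ICC computed on the continuous stiffness values is high, which is the pattern this abstract reports.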