ABSTRACT
Endometrial biopsies are important in the diagnostic workup of women who present with abnormal uterine bleeding or hereditary risk of endometrial cancer. In general, approximately 10% of all endometrial biopsies demonstrate endometrial (pre)malignancy that requires specific treatment. As the diagnostic evaluation of mostly benign cases results in a substantial workload for pathologists, artificial intelligence (AI)-assisted preselection of biopsies could optimize the workflow. This study aimed to assess the feasibility of AI-assisted diagnosis for endometrial biopsies (endometrial Pipelle biopsy computer-aided diagnosis), trained on daily-practice whole-slide images instead of highly selected images. Endometrial biopsies were classified into 6 clinically relevant categories defined as follows: nonrepresentative, normal, nonneoplastic, hyperplasia without atypia, hyperplasia with atypia, and malignant. The agreement among 15 pathologists, within these classifications, was evaluated in 91 endometrial biopsies. Next, an algorithm (trained on a total of 2819 endometrial biopsies) rated the same 91 cases, and we compared its performance using the pathologist's classification as the reference standard. The interrater reliability among pathologists was moderate with a mean Cohen's kappa of 0.51, whereas for a binary classification into benign vs (pre)malignant, the agreement was substantial with a mean Cohen's kappa of 0.66. The AI algorithm performed slightly worse for the 6 categories with a moderate Cohen's kappa of 0.43 but was comparable for the binary classification with a substantial Cohen's kappa of 0.65. AI-assisted diagnosis of endometrial biopsies was demonstrated to be feasible in discriminating between benign and (pre)malignant endometrial tissues, even when trained on unselected cases. Endometrial premalignancies remain challenging for both pathologists and AI algorithms. 
Future steps to improve reliability of the diagnosis are needed to achieve a more refined AI-assisted diagnostic solution for endometrial biopsies that covers both premalignant and malignant diagnoses.
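The agreement figures above are Cohen's kappa values, which correct raw agreement for the agreement expected by chance. A minimal sketch of that calculation follows, including the average over all rater pairs. It is not the study's code; the function names and the toy labels are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two raters labelling the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Proportion of cases on which the two raters agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from each rater's own category frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when only one category is used
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(ratings):
    """Average Cohen's kappa over every pair of raters (needs >= 2 raters)."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Toy data (hypothetical): three raters, four cases, two categories.
rater_1 = ["benign", "benign", "malignant", "malignant"]
rater_2 = ["benign", "malignant", "malignant", "malignant"]
rater_3 = ["benign", "benign", "malignant", "malignant"]

print(cohen_kappa(rater_1, rater_2))
print(mean_pairwise_kappa([rater_1, rater_2, rater_3]))
```

A binary analysis like the one reported would first map the six categories onto two before calling `cohen_kappa`. The abstract does not say which categories were grouped as (pre)malignant, so that mapping is left out here.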
Subjects
Artificial Intelligence , Computers , Humans , Female , Feasibility Studies , Hyperplasia , Reproducibility of Results , Biopsy
ABSTRACT
Grading lung squamous cell carcinoma (LUSC) is controversial and not universally accepted. The histomorphologic feature of tumor budding (TB) is an established independent prognostic factor in colorectal cancer, and its importance is growing in other solid cancers, making it a candidate for inclusion in tumor grading schemes. We aimed to compare TB between preoperative biopsies and resection specimens in pulmonary squamous cell carcinoma and assess interobserver variability. A retrospective cohort of 249 consecutive patients primarily resected with LUSC in Bern (2000-2013, n = 136) and Lausanne (2005-2020, n = 113) with available preoperative biopsies was analyzed for TB and additional histomorphologic parameters, such as spread through airspaces and desmoplasia, by 2 expert pathologists (M.M., C.N.). Results were correlated with clinicopathologic parameters and survival. In resection specimens, peritumoral budding (PTB) score was low (0-4 buds/0.785 mm2) in 47.6%, intermediate (5-9 buds/0.785 mm2) in 27.4%, and high (≥10 buds/0.785 mm2) in 25% of cases (median bud count, 5; IQR, 0-26). Both the absolute number of buds and TB score were similar when comparing tumor edge and intratumoral zone (P = .192) but significantly different from the score obtained in the biopsy (P < .001). Interobserver variability was moderate, regardless of score location (Cohen kappa, 0.59). The discrepant cases were reassessed, and consensus was reached in all cases with identification of causes of discordance. TB score was significantly associated with stage (P = .002), presence of lymph node (P = .033), and distant metastases (P = .020), without significant correlation with overall survival, tumor size, or pleural invasion. Desmoplasia was significantly associated with higher PTB (P < .001). Spread through airspaces was present in 34% and associated with lower PTB (P < .001). To conclude, despite confirming TB as a reproducible factor in LUSC, we disclose areas of scoring ambiguity. 
Preoperative biopsy evaluation was insufficient in establishing the final TB score of the resected tumor.
Subjects
Squamous Cell Carcinoma , Lung Neoplasms , Observer Variation , Humans , Female , Male , Lung Neoplasms/pathology , Lung Neoplasms/surgery , Lung Neoplasms/mortality , Aged , Retrospective Studies , Middle Aged , Squamous Cell Carcinoma/pathology , Squamous Cell Carcinoma/surgery , Squamous Cell Carcinoma/mortality , Biopsy , Neoplasm Grading , Aged, 80 and over
ABSTRACT
BACKGROUND: In patients undergoing breast-conserving therapy without surgical clip implantation, the accuracy of tumor bed identification and the consistency of clinical target volume (CTV) delineation under computed tomography (CT) simulation remain suboptimal. This study aimed to investigate the feasibility of implementing preoperative magnetic resonance (MR) simulation on delineations by assessing interobserver variability (IOV). METHODS: Preoperative MR and postoperative CT simulations were performed in patients who underwent breast-conserving surgery with no surgical clips implanted. Custom immobilization pads were used to ensure the same supine position. Three radiation oncologists independently delineated the CTV of the tumor bed on the images acquired from MR and CT simulation registration and from CT simulation alone. A cavity visualization score (CVS) was assigned to each patient based on the clarity of the tumor bed on CT simulation images. IOV was indicated by the generalized conformity index (CIgen), denoted as CIgen-CT and CIgen-MR/CT, and the distance between the centers of mass (dCOM), denoted as dCOM-CT and dCOM-MR/CT. The variation of IOV in different CVS subgroups was analyzed. RESULTS: A total of 10 patients were enrolled in this study. The median and interquartile range (IQR) of the maximum pathological diameter of the tumors in all patients were 1.55 (0.80-1.92) cm. No statistically significant difference was found between the volumes of CTVs on CT simulation and on MR/CT simulation registration images (p = 0.387). CIgen-MR/CT was significantly larger than CIgen-CT (p = 0.005). dCOM-MR/CT was significantly smaller than dCOM-CT (p = 0.037). The median and IQR of CVS in all patients were 2.34 (2.00-3.08). The difference between CIgen-MR/CT and CIgen-CT was larger in the low CVS group (p = 0.016). The difference in dCOM showed a decreasing trend when CVS was lower, although it did not reach statistical significance (p = 0.095).
CONCLUSIONS: For patients who underwent breast-conserving surgery without surgical clip implantation, the use of preoperative MR simulation in delineating the CTV of tumor bed decreased the IOV among observers. The consistency of tumor bed identification was improved especially in cases where the margins of tumor bed were challenging to visualize on CT simulation images. The study findings offer potential benefits in reducing local recurrence and minimizing tissue irritation in the surrounding areas. Future investigation in a larger patient cohort to validate our results is warranted.
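The two IOV measures named above can be illustrated on voxel sets. The sketch below assumes the commonly used definition of the generalized conformity index (summed pairwise intersection volume divided by summed pairwise union volume) and takes dCOM as the mean pairwise distance between centers of mass. The abstract does not spell out either formula, so both are assumptions, as are the function names and the toy contours.

```python
from itertools import combinations
from math import dist


def generalized_conformity_index(contours):
    """Sum of pairwise intersections over sum of pairwise unions (voxel counts)."""
    inter = union = 0
    for a, b in combinations(contours, 2):
        inter += len(a & b)
        union += len(a | b)
    return inter / union if union else 0.0


def center_of_mass(voxels):
    """Unweighted centroid of a non-empty set of (x, y, z) voxel coordinates."""
    n = len(voxels)
    return tuple(sum(v[i] for v in voxels) / n for i in range(3))


def mean_center_distance(contours):
    """Mean Euclidean distance between centers of mass over all observer pairs."""
    pairs = list(combinations(contours, 2))
    return sum(dist(center_of_mass(a), center_of_mass(b)) for a, b in pairs) / len(pairs)


# Toy contours (hypothetical) from three observers, as sets of voxel coordinates.
obs_a = {(0, 0, 0), (1, 0, 0)}
obs_b = {(1, 0, 0), (2, 0, 0)}
obs_c = {(1, 0, 0)}

print(generalized_conformity_index([obs_a, obs_b, obs_c]))
print(mean_center_distance([obs_a, obs_b, obs_c]))
```

A larger conformity index and a smaller center-of-mass distance both indicate closer agreement among observers, which is the direction of change reported for MR/CT registration.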
Subjects
Breast Neoplasms , Magnetic Resonance Imaging , Segmental Mastectomy , Observer Variation , X-Ray Computed Tomography , Humans , Female , Breast Neoplasms/surgery , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/pathology , Segmental Mastectomy/methods , Pilot Projects , Magnetic Resonance Imaging/methods , Middle Aged , X-Ray Computed Tomography/methods , Aged , Adult , Preoperative Care/methods , Tumor Burden , Feasibility Studies , Computer-Assisted Radiotherapy Planning/methods
ABSTRACT
OBJECTIVES: To explore the topic of Prostate Imaging-Reporting and Data System (PI-RADS) interobserver variability, including a discussion of major sources, mitigation approaches, and future directions. METHODS: A narrative review of PI-RADS interobserver variability. RESULTS: PI-RADS was developed in 2012 to set technical standards for prostate magnetic resonance imaging (MRI), reduce interobserver variability at interpretation, and improve diagnostic accuracy in the MRI-directed diagnostic pathway for detection of clinically significant prostate cancer. While PI-RADS has been validated in selected research cohorts with prostate cancer imaging experts, subsequent prospective studies in routine clinical practice demonstrate wide variability in diagnostic performance. Radiologist and biopsy operator experience are the most important contributing drivers of high-quality care among multiple interrelated factors including variability in MRI hardware and technique, image quality, and population and patient-specific factors such as prostate cancer disease prevalence. Iterative improvements in PI-RADS have helped flatten the curve for novice readers and reduce variability. Innovations in image quality reporting, administrative and organisational workflows, and artificial intelligence hold promise in improving variability even further. CONCLUSION: Continued research into PI-RADS is needed to facilitate benchmark creation, reader certification, and independent accreditation, which are systems-level interventions needed to uphold and maintain high-quality prostate MRI across entire populations.
Subjects
Magnetic Resonance Imaging , Observer Variation , Prostatic Neoplasms , Male , Humans , Prostatic Neoplasms/diagnostic imaging , Prostate/pathology , Prostate/diagnostic imaging , Data Systems , Radiology Information Systems
ABSTRACT
OBJECTIVES: The aim of this study is to improve the reliability of subjective IQ assessment using a pairwise comparison (PC) method instead of a Likert scale method in abdominal CT scans. METHODS: Abdominal CT scans (single-center) were retrospectively selected between September 2019 and February 2020 in a prior study. Sample variance in IQ was obtained by adding artificial noise using dedicated reconstruction software, including reconstructions with filtered backprojection and varying iterative reconstruction strengths. Two datasets (each n = 50) were composed with either higher or lower IQ variation with the 25 original scans being part of both datasets. Using in-house developed software, six observers (five radiologists, one resident) rated both datasets via both the PC method (forcing observers to choose preferred scans out of pairs of scans resulting in a ranking) and a 5-point Likert scale. The PC method was optimized using a sorting algorithm to minimize necessary comparisons. The inter- and intraobserver agreements were assessed for both methods with the intraclass correlation coefficient (ICC). RESULTS: Twenty-five patients (mean age 61 years ± 15.5; 56% men) were evaluated. The ICC for interobserver agreement for the high-variation dataset increased from 0.665 (95%CI 0.396-0.814) to 0.785 (95%CI 0.676-0.867) when the PC method was used instead of a Likert scale. For the low-variation dataset, the ICC increased from 0.276 (95%CI 0.034-0.500) to 0.562 (95%CI 0.337-0.729). Intraobserver agreement increased for four out of six observers. CONCLUSION: The PC method is more reliable for subjective IQ assessment indicated by improved inter- and intraobserver agreement. CLINICAL RELEVANCE STATEMENT: This study shows that the pairwise comparison method is a more reliable method for subjective image quality assessment. 
Improved reliability is of key importance for optimization studies, validation of automatic image quality assessment algorithms, and training of AI algorithms. KEY POINTS: • Subjective assessment of diagnostic image quality via Likert scale has limited reliability. • A pairwise comparison method improves the inter- and intraobserver agreement. • The pairwise comparison method is more reliable for CT optimization studies.
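The pairwise comparison idea described above, in which a sorting algorithm decides which pairs to show so that a full ranking needs far fewer than all possible comparisons, can be sketched as follows. This is not the authors' in-house software. The use of Python's built-in sort, the `prefer` callback, and the simulated observer are all illustrative assumptions.

```python
from functools import cmp_to_key


def rank_by_pairwise_choice(items, prefer):
    """Rank items best-first by asking prefer(a, b), True if a is preferred.

    The built-in sort chooses which pairs to present, so the number of
    questions grows roughly as n*log2(n) instead of n*(n-1)/2.
    """
    asked = 0

    def compare(a, b):
        nonlocal asked
        asked += 1  # one forced-choice question shown to the observer
        return -1 if prefer(a, b) else 1

    ranking = sorted(items, key=cmp_to_key(compare))
    return ranking, asked


# Simulated observer (hypothetical): a hidden quality score per scan, no ties.
hidden_quality = {scan: (scan * 37) % 50 for scan in range(50)}


def simulated_observer(a, b):
    return hidden_quality[a] > hidden_quality[b]


ranking, asked = rank_by_pairwise_choice(list(range(50)), simulated_observer)
print(ranking[:3], asked, "comparisons instead of", 50 * 49 // 2)
```

A real observer is not perfectly consistent, so a practical tool would need to tolerate contradictory choices. This sketch assumes consistent answers.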
Subjects
X-Ray Computed Tomography , Humans , Male , Female , X-Ray Computed Tomography/methods , Reproducibility of Results , Middle Aged , Retrospective Studies , Observer Variation , Computer-Assisted Radiographic Image Interpretation/methods , Abdominal Radiography/methods , Algorithms , Software
ABSTRACT
OBJECTIVE: This study aimed to understand reasons for interobserver variability in the grading of oral epithelial dysplasia (OED) through a survey of pathologists to provide insight for improvements in the reliability and reproducibility of OED diagnoses. METHODS: The study design included quantitative and qualitative methodology. A pre-validated 31-item questionnaire was distributed to general, head and neck, and oral and maxillofacial histopathology specialists worldwide. RESULTS: A total of 132 pathologists participated and completed the questionnaire. Over two-thirds used the three-tier grading system for OED, while about a third used both binary and three-tier systems. Regular reporters of OED preferred the three-tier system and grading architectural features. Continuing education significantly aided recognition of architectural and cytological changes. Irregular epithelial stratification and drop-shaped rete ridges had the lowest prognostic value and recognition scores, while loss of epithelial cell cohesion had the highest. Most participants used clinical information and often sought a second opinion when grading OED. CONCLUSION: Our study has found that frequency of OED reporting and attendance of CME/CPD can play an important role in grading OED. Variations in the prognostic value of individual histological features and the use of clinical information may further contribute to interobserver variability.
ABSTRACT
PURPOSE: Trochlear dysplasia is one of the main risk factors for recurrent patellar dislocation. The Dejour classification identifies four categories that can be used to classify trochlear dysplasia. The purpose of this study is to evaluate the inter- and intraobserver reliability of the Dejour classification for trochlear dysplasia. The hypothesis was that both intra- and interobserver reliability would be at least moderate. METHODS: This is a cross-sectional reliability study. Twenty-eight examiners from the International Patellofemoral Study Group 2022 meeting evaluated lateral radiographs of the knee and axial magnetic resonance images from 15 cases of patellofemoral instability with trochlear dysplasia. They classified each case according to Dejour's classification for trochlear dysplasia (A-D). There were three rounds: one with only computed radiography (CR), one with only magnetic resonance imaging (MRI) and one with both. Inter- and intraobserver reliability were calculated using the κ coefficient (0-1). RESULTS: The mean age of patients was 14.6 years; 60% were female and 53% had open physes. The interobserver reliability κ values were 0.2 (CR), 0.13 (MRI) and 0.12 (CR and MRI). The intraobserver reliability κ values were 0.45 (CR), 0.44 (MRI) and 0.65 (CR and MRI). CONCLUSION: The Dejour classification for trochlear dysplasia has slight interobserver reliability and substantial intraobserver reliability. LEVEL OF EVIDENCE: Level I.
Subjects
Magnetic Resonance Imaging , Observer Variation , Patellofemoral Joint , Humans , Cross-Sectional Studies , Female , Reproducibility of Results , Adolescent , Male , Patellofemoral Joint/diagnostic imaging , Patellofemoral Joint/pathology , Patellar Dislocation/diagnostic imaging , Patellar Dislocation/classification , Joint Instability/classification , Joint Instability/diagnostic imaging , X-Ray Computed Tomography , Femur/diagnostic imaging , Femur/pathology , Child
ABSTRACT
PURPOSE: This study aims to investigate the interobserver variability in the quantitative assessment of liver fat content using ultrasound attenuation imaging technology (USAT). METHODS: This prospective, single-center study included 96 adult patients who were either diagnosed with or suspected of having metabolic dysfunction-associated steatotic liver disease. Independent observers, blinded to each other's assessments, evaluated hepatic steatosis visually and through USAT measurements. Separate measurements were taken at five intercostal and subcostal sites, and the median values of these measurements were recorded. The correlation between USAT measurements and visual steatosis grades was examined using Spearman's correlation test. Intraclass correlation coefficient (ICC) and Bland-Altman analysis were used to evaluate the interobserver variability of USAT measurements. RESULTS: Interobserver agreement for USAT measurements was excellent for the intercostal examination and good for the subcostal examination (p < 0.001). Body mass index did not significantly affect the level of interobserver agreement. Interobserver variability in Bland-Altman plots of USAT measurements was within the 95% limits of agreement. USAT measurements correlated very strongly with the visual degree of hepatic steatosis, both intercostal and subcostal (p < 0.001). USAT measurements were also significantly different between different visual degrees of hepatic steatosis (p < 0.001). CONCLUSION: In the assessment of hepatic steatosis, USAT measurements obtained from the intercostal space showed excellent agreement in terms of interobserver reproducibility.
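The Bland-Altman analysis mentioned above summarizes interobserver differences as a bias (mean difference) and 95% limits of agreement (bias ± 1.96 standard deviations of the differences). A minimal sketch follows. The function name and the paired readings are made up for illustration and are not data from the study.

```python
from statistics import mean, stdev


def bland_altman_limits(observer_1, observer_2):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(observer_1) != len(observer_2) or len(observer_1) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(observer_1, observer_2)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - spread, bias + spread


# Hypothetical paired attenuation readings from two observers.
readings_1 = [0.70, 0.80, 0.65, 0.90]
readings_2 = [0.68, 0.83, 0.66, 0.86]

bias, lower, upper = bland_altman_limits(readings_1, readings_2)
print(f"bias={bias:.3f}, limits of agreement=({lower:.3f}, {upper:.3f})")
```

In a Bland-Altman plot, each pair's difference is drawn against the pair's mean, and points falling between `lower` and `upper` lie within the 95% limits of agreement.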
ABSTRACT
OBJECTIVE: To assess the effectiveness of periodic acid-Schiff stain and the p53 immunohistochemical marker in reducing interobserver variability in diagnosing microinvasive oral squamous cell carcinoma. METHODS: The cross-sectional study was conducted at a tertiary care diagnostic hospital in Rawalpindi, Pakistan, from March 31 to July 31, 2023, and comprised diagnostically challenging biopsy specimens. The specimens were stained first with haematoxylin and eosin, and then with periodic acid-Schiff stain and tumour protein p53 immunohistochemistry simultaneously. A preliminary diagnosis on routine staining alone and a final diagnosis with the two adjuncts were reported by two observers who were both blinded to the prior diagnosis. Data were analysed using SPSS 25. RESULTS: Of the 30 specimens diagnosed, 21 (70%) belonged to males and 9 (30%) to females. The mean age of the patients was 60.47±11.78 years. Periodic acid-Schiff staining and tumour protein p53 immunohistochemistry significantly decreased interobserver variability in the diagnosis of microinvasive oral squamous cell carcinoma, improving visualisation of basement membrane breach and identifying invading cells within the lamina propria that were masked on routine staining (p<0.05). CONCLUSION: Periodic acid-Schiff stain and tumour protein p53 immunohistochemistry could assist in reducing interobserver variability in the diagnosis of microinvasive oral squamous cell carcinoma.
Subjects
Squamous Cell Carcinoma , Immunohistochemistry , Mouth Neoplasms , Observer Variation , Periodic Acid-Schiff Reaction , Tumor Suppressor Protein p53 , Humans , Female , Male , Mouth Neoplasms/pathology , Mouth Neoplasms/diagnosis , Mouth Neoplasms/metabolism , Cross-Sectional Studies , Middle Aged , Tumor Suppressor Protein p53/metabolism , Squamous Cell Carcinoma/pathology , Squamous Cell Carcinoma/diagnosis , Squamous Cell Carcinoma/metabolism , Aged , Tumor Biomarkers/metabolism , Staining and Labeling/methods , Neoplasm Invasiveness , Pakistan
ABSTRACT
Tumor cell fraction (TCF) estimation is a common clinical task with well-established large interobserver variability. It thus provides an ideal test bed to evaluate potential impacts of employing a tumor cell fraction computer-aided diagnostic (TCFCAD) tool to support pathologists' evaluation. During a National Slide Seminar event, pathologists (n = 69) were asked to visually estimate TCF in 10 regions of interest (ROIs) from hematoxylin and eosin colorectal cancer images intentionally curated for diverse tissue compositions, cellularity, and stain intensities. Next, they re-evaluated the same ROIs while being provided a TCFCAD-created overlay highlighting predicted tumor vs nontumor cells, together with the corresponding TCF percentage. Participants also reported confidence levels in their assessments using a 5-tier scale, indicating no confidence to high confidence, respectively. The TCF ground truth (GT) was defined by manual cell-counting by experts. When assisted, interobserver variability significantly decreased, showing estimates converging to the GT. This improvement remained even when TCFCAD predictions deviated slightly from the GT. The standard deviation (SD) of the estimated TCF to the GT across ROIs was 9.9% vs 5.8% with TCFCAD (P < .0001). The intraclass correlation coefficient increased from 0.8 to 0.93 (95% CI, 0.65-0.93 vs 0.86-0.98), and pathologists stated feeling more confident when aided (3.67 ± 0.81 vs 4.17 ± 0.82 with the computer-aided diagnostic [CAD] tool). TCFCAD estimation support demonstrated improved scoring accuracy, interpathologist agreement, and scoring confidence. Interestingly, pathologists also expressed more willingness to use such a CAD tool at the end of the survey, highlighting the importance of training/education to increase adoption of CAD systems.
Subjects
Computers , Pathologists , Humans , Switzerland
ABSTRACT
OBJECTIVES: To analyze discordant and false-negatives of double reading digital breast tomosynthesis (DBT) versus digital mammography (DM) including reading times in the Oslo Tomosynthesis Screening Trial (OTST), and reclassify these in a retrospective reader study as missed, minimal sign, or true-negatives. METHODS: The prospective OTST comparing double reading DBT vs. DM had paired design with four parallel arms: DM, DM + computer aided detection, DBT + DM, and DBT + synthetic mammography. Eight radiologists interpreted images in batches using a 5-point scale. Reading time was automatically recorded. A retrospective reader study including four radiologists classified screen-detected cancers with at least one false-negative score and screening examinations of interval cancers as negative, non-specific minimal sign, significant minimal sign, and missed; the two latter groups are defined "actionable." Statistics included chi-square, Fisher's exact, McNemar's, and Mann-Whitney U tests. RESULTS: Discordant rate (cancer missed by one reader) for screen-detected cancers was overall comparable (DBT (31% [71/227]) and DM (30% [52/175]), p = .81), significantly lower at DBT for spiculated cancers (DBT, 19% [20/106] vs. DM, 36% [38/106], p = .003), but high (28/49 = 57%, p = 0.001) for DBT-only detected spiculated cancers. Reading time and sensitivity varied among readers. False-negative DBT-only detected spiculated cancers had shorter reading time than true-negatives in 46% (13/28). Retrospective evaluation classified the following DBT exams "actionable": three missed by both readers, 95% (39/41) of discordant cancers detected by both modes, all 30 discordant DBT-only cancers, 25% (13/51) of interval cancers. CONCLUSIONS: Discordant rate was overall comparable for DBT and DM, significantly lower at DBT for spiculated cancers, but high for DBT-only detected spiculated lesions. Most false-negative screen-detected DBT were classified as "actionable." 
CLINICAL RELEVANCE STATEMENT: Retrospective evaluation of false-negative interpretations from the Oslo Tomosynthesis Screening Trial shows that most discordant and several interval cancers could have been detected at screening. This underlines the potential for modern AI-based reading aids and triage, as high-volume screening is a demanding task. KEY POINTS: • Digital breast tomosynthesis (DBT) screening is more sensitive and has higher specificity than digital mammography screening, but high-volume DBT screening is a demanding task that can result in a high discordance rate among readers. • Independent double reading DBT screening had an overall discordance rate comparable to that of digital mammography, lower for spiculated masses seen on both modalities, and higher for small spiculated cancers seen only on DBT. • Almost all discordant digital breast tomosynthesis-detected cancers (72 of 74) and 25% (13 of 51) of the interval cancers in the Oslo Tomosynthesis Screening Trial were retrospectively classified as actionable and could have been detected by the readers.
ABSTRACT
OBJECTIVES: To assess interobserver variability in ultrasound-based quantitative liver fat content measurements and to determine how much time these quantitative ultrasound (QUS) techniques require. METHODS: One hundred patients with known or suspected nonalcoholic fatty liver disease were included in this prospective study. Two observers who were blinded to each other's measurements independently performed tissue attenuation imaging (TAI) and tissue scatter distribution imaging (TSI). Both observers assessed hepatic steatosis visually and obtained 5 measurements for each QUS technique, and the median values of the measurements were recorded. Spearman's correlation test was used to assess the correlation between QUS measurements and visual hepatic steatosis grades. The intraclass correlation coefficient (ICC) was used to assess interobserver variability in QUS measurements. RESULTS: The median TAI values for observers 1 and 2 were 0.75 and 0.74 dB/cm/MHz, respectively. The median TSI values for observers 1 and 2 were 93.53 and 92.58, respectively. Interobserver agreement was excellent for both TAI (ICC: 0.970) and TSI (ICC: 0.938) measurements. The mean time required for the TAI technique was 55.1 ± 7.8 and 59.9 ± 6.6 seconds for observers 1 and 2, respectively. The mean time required for the TSI technique was 49.1 ± 5.8 and 54.1 ± 5.4 seconds for observers 1 and 2, respectively. CONCLUSION: The current study revealed that both TAI and TSI techniques are highly reproducible and can be implemented in daily practice with little additional time requirement.
Subjects
Liver , Non-alcoholic Fatty Liver Disease , Humans , Observer Variation , Prospective Studies , Liver/diagnostic imaging , Non-alcoholic Fatty Liver Disease/diagnostic imaging , Ultrasonography/methods
ABSTRACT
BACKGROUND: Computerized methodologies standardize the myocardial perfusion imaging (MPI) interpretation process. METHODS: To develop an automated relative perfusion quantitation approach for 18F-flurpiridaz, PET MPI studies from all phase III trial participants of 18F-flurpiridaz were divided into 3 groups. Count distributions were obtained in N = 40 normal patients undergoing pharmacological or exercise stress. Then, N = 90 additional studies were selected in a derivation group. Following receiver operating characteristic curve analysis, various standard deviations below the mean normal were used as cutoffs for significant CAD, and interobserver variability determined. Finally, diagnostic performance was compared between blinded visual readers and blinded derivations of automated relative quantitation in the remaining N = 548 validation patients. RESULTS: Both approaches yielded comparable accuracies for the detection of global CAD, reaching 71% and 72% by visual reads, and 72% and 68% by automated relative quantitation, when using CAD ≥ 70% or ≥ 50% stenosis for significance, respectively. Similar results were observed when analyzing individual coronary territories. In both pharmacological and exercise stress, automated relative quantitation demonstrated significantly more interobserver agreement than visual reads. CONCLUSIONS: Our automated method of 18F-flurpiridaz relative perfusion analysis provides a quantitative, objective, and highly reproducible assessment of PET MPI in normal and CAD subjects undergoing either pharmacological or exercise stress.
Subjects
Coronary Artery Disease , Myocardial Perfusion Imaging , Pyridazines , Coronary Artery Disease/diagnostic imaging , Humans , Myocardial Perfusion Imaging/methods , Observer Variation , Perfusion , Positron-Emission Tomography/methods , Single-Photon Emission Computed Tomography
ABSTRACT
Background: Several reports have suggested that radiotherapy after reconstructive surgery for head and neck cancer (HNC) could have deleterious effects on the flaps with respect to functional outcomes. To predict and prevent toxicities, flap delineation should be accurate and reproducible. The objective of the present study was to evaluate the interobserver variability of frequent types of flaps used in HNC, based on the recent GORTEC atlas. Materials and methods: Each member of an international working group (WG) consisting of 14 experts delineated the flaps on a CT set from six patients. Each patient had one of the five most commonly used flaps in HNC: a regional pedicled pectoralis major myocutaneous flap, a local pedicled rotational soft tissue facial artery musculo-mucosal (FAMM) flap (2 patients), a fasciocutaneous radial forearm free flap, a soft tissue anterolateral thigh (ALT) free flap, or a fibular free flap. The WG's contours were compared to a reference contour, validated by a surgeon and a radiologist specializing in HNC. Contours were considered reproducible if the median Dice Similarity Coefficient (DSC) was > 0.7. Results: The median volumes of the six flaps delineated by the WG were close to the reference contour value, with approximately 50 cc for the pectoral, fibula, and ALT flaps, 20 cc for the radial forearm, and up to 10 cc for the FAMM. The volumetric ratio was thus close to the optimal value of 100% for all flaps. The median DSC values obtained by the WG compared to the reference for the pectoralis flap, the FAMM, the radial forearm flap, the ALT flap, and the fibular flap were 0.82, 0.40, 0.76, 0.81, and 0.76, respectively. Conclusions: This study showed that the delineation of four main flaps used for HNC was reproducible. The delineation of the FAMM, however, requires close cooperation between radiologist, surgeon and radiation oncologist because of the poor visibility of this flap on CT and its small size.
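The reproducibility criterion above (median Dice Similarity Coefficient > 0.7 against a reference contour) can be sketched as follows. The DSC formula is the standard one, 2|A ∩ B| / (|A| + |B|). The function names and the voxel sets, which stand in for delineated volumes, are illustrative assumptions.

```python
from statistics import median


def dice(a, b):
    """Dice Similarity Coefficient between two sets of voxel indices."""
    if not a and not b:
        return 1.0  # two empty contours are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def is_reproducible(observer_contours, reference, threshold=0.7):
    """Apply the median-DSC criterion; returns (median DSC, verdict)."""
    scores = [dice(contour, reference) for contour in observer_contours]
    med = median(scores)
    return med, med > threshold


# Toy example (hypothetical): a reference contour and three observers.
reference = set(range(100))
observers = [set(range(10, 100)), set(range(0, 50)), set(range(20, 120))]

med, ok = is_reproducible(observers, reference)
print(f"median DSC = {med:.2f}, reproducible = {ok}")
```

Because DSC is normalized by the combined volume, a small structure loses proportionally more overlap for the same absolute contouring error, which fits the abstract's remark about the small, poorly visible FAMM flap.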
Subjects
Carcinoma , Free Tissue Flaps , Head and Neck Neoplasms , Plastic Surgery Procedures , Head and Neck Neoplasms/surgery , Humans , Melanoma , Plastic Surgery Procedures/methods , Reproducibility of Results , Skin Neoplasms , Cutaneous Malignant Melanoma
ABSTRACT
BACKGROUND: We aimed to determine whether the histopathological grading of dysplastic nevi is an objective endeavor, considering interobserver variability, according to 2018 World Health Organization (WHO) criteria. METHODS: In total, 179 cases of dysplastic nevi, with high and moderate degrees of atypia, diagnosed and graded according to the previous criteria were reviewed by three pathologists. Then, the observers graded the dysplastic nevi as low or high according to 2018 WHO criteria. RESULTS: Grading of dysplastic nevi was in complete agreement in 99 out of 179 cases across three observers with a fair level of overall interobserver agreement (multirater κfree: 0.40). The observers showed moderate to good agreement for most of the architectural features, except for criteria regarding focal continuous basal proliferation of melanocytes, density of non-nested junctional melanocytes, and presence of dyscohesive nests of intraepidermal melanocytes, whereas fair agreement was achieved for the cytological criteria. CONCLUSIONS: The 2018 WHO criteria for dysplastic nevus will ensure a common approach to the diagnosis and grading of dysplastic nevi. However, histopathological criteria, such as cytological features and focal continuous basal proliferation of melanocytes, should be improved so as to ensure a more accurate surgical approach and risk assessment.
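The reported multirater κfree can be checked from the figures in the abstract. With three raters and two grades, every case without complete agreement is necessarily a 2-to-1 split, so 99 unanimous cases out of 179 determine the observed agreement. The sketch below assumes κfree refers to the free-marginal multirater kappa, in which chance agreement is fixed at 1/k for k categories. The function name and data layout are illustrative.

```python
def free_marginal_kappa(counts, n_categories):
    """Free-marginal multirater kappa.

    counts: one row per case, giving how many raters chose each category.
    Chance agreement is fixed at 1 / n_categories.
    """
    n_raters = sum(counts[0])
    total = 0.0
    for row in counts:
        if sum(row) != n_raters:
            raise ValueError("every case must be rated by the same number of raters")
        # Share of rater pairs that agree on this case.
        total += sum(c * (c - 1) for c in row) / (n_raters * (n_raters - 1))
    p_observed = total / len(counts)
    p_expected = 1.0 / n_categories
    return (p_observed - p_expected) / (1.0 - p_expected)


# 99 unanimous cases and 80 two-to-one splits, as implied by the abstract.
# Which grade the unanimous raters chose does not affect the statistic.
cases = [[3, 0]] * 99 + [[2, 1]] * 80

kappa_free = free_marginal_kappa(cases, n_categories=2)
print(round(kappa_free, 2))
```

Under these assumptions the result rounds to 0.40, consistent with the value reported in the abstract.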
Subjects
Dysplastic Nevus Syndrome/pathology , Skin Neoplasms/pathology , Humans , Neoplasm Grading/standards , Observer Variation , Reproducibility of Results , Retrospective Studies
ABSTRACT
OBJECTIVE: In our country, thyroid nodules are sonographically evaluated in health maintenance organization (HMO) imaging centers, and patients are referred to tertiary hospitals for ultrasound-guided fine-needle aspiration (FNA) biopsy when indicated. We evaluated the concordance in Thyroid Imaging Reporting and Data System (TI-RADS) classification reporting between these sites. METHODS: We conducted a retrospective cohort study reviewing the sonographic features of thyroid nodules evaluated both at the HMO and at a large tertiary center between January 2018 and December 2019. The primary outcome was concordance between the TI-RADS classifications at the two sites. Additional endpoints included correlation of TI-RADS with the Bethesda category following FNA and correlation of TI-RADS with malignancy on final pathology at each site. RESULTS: The records of 336 patients with 370 nodules were reviewed. The level of concordance was poor (19.8%), with 277 (74.8%) nodules demonstrating higher TI-RADS and 20 (5.4%) lower TI-RADS at the HMO compared to the hospital (P < .001; weighted κ = 0.120). FNA results were available for 236 (63.8%) nodules. The Bethesda category strongly correlated with the hospital TI-RADS (P < .001), yet not with the HMO TI-RADS (P = .123). In the 57 surgically removed nodules, a strong correlation was identified between malignancy on final pathology and the TI-RADS documented at the hospital (P < .001), yet not at the HMO (P = .259). CONCLUSIONS: There is poor agreement between TI-RADS classification on ultrasound performed in the HMO compared to a tertiary hospital. The hospital's TI-RADS strongly correlated with the Bethesda category and the final risk of malignancy, unlike the HMO's.
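For readers unfamiliar with the weighted κ reported above, the statistic can be sketched as follows. The abstract does not state which weighting scheme was used, so the linear weights here are an assumption, and the paired TI-RADS scores are hypothetical toy values rather than study data.

```python
def weighted_kappa(ratings_a, ratings_b, categories, scheme="linear"):
    """Cohen's weighted kappa for two raters over ordered categories.

    scheme="linear" uses |i - j| / (k - 1) as the disagreement weight;
    any other value uses the squared (quadratic) form.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions and their marginals.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        d = abs(i - j) / (k - 1)
        return d if scheme == "linear" else d * d

    pairs = [(i, j) for i in range(k) for j in range(k)]
    disagreement = sum(weight(i, j) * observed[i][j] for i, j in pairs)
    chance = sum(weight(i, j) * row[i] * col[j] for i, j in pairs)
    if chance == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - disagreement / chance


# Hypothetical paired scores for eight nodules (not study data).
tirads = [1, 2, 3, 4, 5]
hmo_scores = [4, 5, 4, 5, 3, 4, 5, 4]
hospital_scores = [3, 4, 4, 3, 3, 2, 4, 3]
kappa = weighted_kappa(hmo_scores, hospital_scores, tirads)
print(f"weighted kappa = {kappa:.3f}")
```

Unlike raw percent concordance, a weighted kappa penalizes a two-category disagreement more than a one-category disagreement and corrects for agreement expected by chance.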
Subjects
Thyroid Nodule , Biopsy, Fine-Needle , Humans , Retrospective Studies , Tertiary Care Centers , Thyroid Nodule/diagnostic imaging , Thyroid Nodule/pathology , Ultrasonography/methods
ABSTRACT
Indocyanine Green Fluorescence Angiography (ICGFA) has been deployed to tackle malperfusion-related anastomotic complications. This study assesses variations in operator interpretation of pre-anastomotic ICGFA inflow in the gastric conduit. Utilizing an innovative online interactive multimedia platform (Mindstamp), esophageal surgeons completed a baseline opinion-practice questionnaire and proceeded to interpret, and then digitally assign, a distal transection point on 8 ICGFA videos of esophageal resections (6 Ivor Lewis, 2 McKeown). Annotations regarding gastric conduit transection by ICGFA were compared between expert users and non-expert participants, using ImageJ to delineate longitudinal distances and Shapiro-Wilk and t-tests to ascertain significance. Expert versus non-expert correlation was assessed via Intraclass Correlation Coefficients (ICC). Thirty participants (13 consultants, 6 ICGFA experts) completed all aspects of the study. Of these, the large majority (29 participants) stated that ICGFA should be used routinely, with most (21, including 5/6 experts) stating that 11-50 cases were needed for competency in interpretation. Among users, there were wide variations in dosing (0.05-3 mg/kg) and practice impact. Agreement regarding ICGFA video interpretation concerning transection level among experts was 'moderate' (ICC = 0.717) overall but 'good' (ICC = 0.871) among seven videos with Leave One Out (LOO) exclusion of the video with the highest disagreement. Agreement among non-experts was moderate (ICC = 0.641) overall and in every subgroup, including among consultants (ICC = 0.626). Experts chose levels that preserved more gastric conduit length than non-experts in all but one video (P = 0.02). Considerable variability exists in ICGFA interpretation and indeed in its impact. Even adept users may be challenged in specific cases. Standardized training and/or computerized quantitative fluorescence may improve usage.
Subjects
Esophageal Neoplasms , Esophagectomy , Humans , Esophagectomy/adverse effects , Indocyanine Green , Anastomotic Leak/etiology , Fluorescein Angiography , Anastomosis, Surgical/adverse effects , Esophageal Neoplasms/diagnostic imaging , Esophageal Neoplasms/surgery , Esophageal Neoplasms/complications , Perfusion/adverse effects
ABSTRACT
The histologic diagnosis of acute ascending intrauterine infection permits more effective identification of both subclinical infection and clinical chorioamnionitis, but procedures for placental pathology need to adopt a unified approach and work toward reproducible grading and staging systems. We conducted a retrospective chart review of 696 placental records from single and multiple deliveries between January 2011 and February 2020. Then, we compared original diagnoses with diagnoses based on the Redline criteria, an internationally recognized system of staging and grading. Of the 696 cases available for review, 255 had complete medical records. Findings showed a strong degree of agreement (90%-100%) between the original investigators' histological diagnoses of acute ascending intrauterine infection and a review by researchers using the Redline criteria. Although interobserver agreement was good, more education on the Redline criteria is needed to avoid missed cases (primarily Stage 1), to support protocols that help pathologists and obstetricians/gynecologists determine which cases need to be investigated, and to support the development of reporting standards for acute ascending intrauterine infection and of feedback mechanisms during follow-up.
Subjects
Chorioamnionitis/diagnosis , Chorioamnionitis/pathology , Female , Humans , Observer Variation , Pathologists , Pregnancy , Retrospective Studies
ABSTRACT
BACKGROUND: The most widely used classification for hemorrhoidal disease (HD) is the Goligher classification, which ranks the presence and severity of prolapse in four grades. Since physicians base this grading on medical history and physical examination, it might be prone to interobserver variability. Furthermore, the grade influences the choice of treatment, which makes reproducibility of the utmost importance. The aim of this study was to determine the interobserver variability of the Goligher classification among surgeons in the Netherlands. METHODS: A single-choice survey was used. The first part consisted of questions concerning baseline characteristics and the use of the Goligher classification in routine clinical practice. In the second part, to assess interobserver variability, we asked gastrointestinal surgeons and residents who routinely treat HD to review 25 photographs (each labeled as taken at rest or during pushing) of patients with HD and to grade them using the Goligher classification. The survey was sent by email on April 19, 2021 and was available online until July 5, 2021. Interobserver variability was assessed using Fleiss' kappa. RESULTS: A total of 329 gastrointestinal surgeons, fellows and residents were sent an invitation email, of whom 95 (29%) completed the survey. Among the respondents, 87% indicated that they use the Goligher classification in clinical practice. Eighty-one percent found the classification helpful, and 63% classified HD according to Goligher and followed the guidelines for treatment of HD accordingly. Overall, the strength of interobserver agreement was fair, with a Fleiss' kappa (κ) of 0.376 (95% CI 0.373-0.380). There was moderate agreement for grade I and IV HD, with κ statistics of 0.466 and 0.522, respectively. For grades II and III, there was a lower (fair) strength of agreement, with κ statistics of 0.206 and 0.378, respectively. CONCLUSIONS: The merely fair interobserver agreement is disappointing and demonstrates the need for a more reliable, and internationally accepted, classification for HD. A new classification should enable more uniformity in treating HD and in comparing outcomes of future trials and prospective registries. The protocol for a Delphi study for a new classification system is currently being prepared and led by an international research group.
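The Fleiss' kappa used above generalizes chance-corrected agreement to many raters. A minimal sketch follows, assuming every photograph is graded by the same number of raters; the small count table is hypothetical and only illustrates the input shape (one row per photograph, one column per Goligher grade).

```python
def fleiss_kappa(counts):
    """Fleiss' kappa.

    counts[i][j] = number of raters who assigned subject i to category j.
    Every subject must be rated by the same number of raters (at least 2).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    total = n_subjects * n_raters

    # Overall share of ratings falling in each category.
    p_category = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Proportion of agreeing rater pairs for each subject.
    p_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_subject) / n_subjects
    p_chance = sum(p * p for p in p_category)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical table: 5 photographs, 6 raters, Goligher grades I-IV as columns.
toy_counts = [
    [5, 1, 0, 0],
    [1, 3, 2, 0],
    [0, 2, 3, 1],
    [0, 0, 1, 5],
    [2, 3, 1, 0],
]
kappa = fleiss_kappa(toy_counts)
print(f"Fleiss' kappa = {kappa:.3f}")
```

Grade-specific values such as those reported for grades I to IV are typically obtained by collapsing the table to "this grade" versus "any other grade" and applying the same formula.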
Subjects
Hemorrhoids , Hemorrhoids/diagnosis , Hemorrhoids/surgery , Humans , Observer Variation , Prospective Studies , Reproducibility of Results , Surveys and Questionnaires
ABSTRACT
PURPOSE: To retrospectively compare interpretations of Doppler ultrasound (US) in newborns with confirmed perinatal testicular torsion (PTT) by an experienced faculty (staff) pediatric radiologist (SPR), a pediatric radiology fellow (PRF), a pediatric urology fellow (PUF) and a staff pediatric urologist (SPU). METHODS: US images of 27 consecutive males with PTT between May 2000 and July 2020 were retrieved. The testicles were classified as affected or non-affected by PTT. We performed a blinded comparison of interpretation by four assessors (SPR, PRF, PUF, SPU) with respect to the US features of PTT. Paired inter-rater agreement was calculated using Cohen's kappa (κ) and overall agreement was assessed using Fleiss' kappa. RESULTS: Overall comparison using Fleiss' kappa found fair agreement for most features except testicular echogenicity and echogenic foci at the interface, for which there was poor agreement. Paired comparisons revealed better agreement between the SPR and PRF compared to the remaining two pairs, suggesting a need for the pediatric urologists (PUF and SPU) to acquaint themselves with testicular ultrasonography, as this may have an impact on patient risk stratification and the quality of information given to parents. CONCLUSION: This study highlights the need for a focused training program for pediatric urologists to attain agreement similar to that of the radiologists.