Results 1-20 of 46,165
1.
Gastroenterology ; 167(3): 493-504.e10, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38467384

ABSTRACT

BACKGROUND & AIMS: Histologic evaluation of gut biopsies is a cornerstone for diagnosis and management of celiac disease (CeD). Despite its wide use, the method depends on proper biopsy orientation, and it suffers from interobserver variability. Biopsy proteome measurements reporting on the tissue state can be obtained by mass spectrometry analysis of formalin-fixed paraffin-embedded tissue. Here we aimed to transform biopsy proteome data into numerical scores that give observer-independent measures of mucosal remodeling in CeD. METHODS: A pipeline using glass-mounted formalin-fixed paraffin-embedded sections for mass spectrometry-based proteome analysis was established. Proteome data were converted to numerical scores using 2 complementary approaches: a rank-based enrichment score and a score based on machine learning using logistic regression. The 2 scoring approaches were compared with each other and with histology by analyzing biopsies from 18 patients with CeD collected before and after treatment with a gluten-free diet, as well as biopsies from patients with CeD with varying degrees of remission (n = 22). Biopsies from individuals without CeD (n = 32) were also analyzed. RESULTS: The method yielded reliable proteome scoring of both unstained and H&E-stained glass-mounted sections. The scores of the 2 approaches were highly correlated, reflecting that both approaches pick up proteome changes in the same biological pathways. The proteome scores correlated with villus height-to-crypt depth ratio. Thus, the method is able to score biopsies with poor orientation. CONCLUSIONS: Biopsy proteome scores give reliable observer- and orientation-independent measures of mucosal remodeling in CeD. The proteomic method can readily be implemented by nonexpert laboratories in parallel to histology assessment and easily scaled for clinical trial settings.


Subject(s)
Celiac Disease , Diet, Gluten-Free , Intestinal Mucosa , Proteome , Proteomics , Celiac Disease/pathology , Celiac Disease/metabolism , Celiac Disease/diagnosis , Humans , Intestinal Mucosa/pathology , Intestinal Mucosa/metabolism , Biopsy , Proteome/analysis , Proteomics/methods , Female , Male , Adult , Machine Learning , Middle Aged , Mass Spectrometry , Observer Variation , Predictive Value of Tests , Paraffin Embedding , Reproducibility of Results , Case-Control Studies
2.
Stroke ; 55(9): 2240-2246, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39051112

ABSTRACT

BACKGROUND: Acute ischemic stroke is a leading cause of pediatric death and disability. A clinical scale adapted for children can ensure early detection of candidates for urgent acute ischemic stroke treatment. The Rapid Arterial Occlusion Evaluation (RACE) scale for adults, which scores 5 items (facial palsy 0-2; arm motor function 0-2; leg motor function 0-2; head/gaze deviation 0-1; and aphasia or agnosia 0-2), has good sensitivity and specificity in detecting large vessel occlusion. METHODS: We adapted the previously validated RACE scale for use in children as the Pediatric RACE scale. This adapted scale was tested by prehospital/emergency room staff attending to patients covered by the Catalan Pediatric Stroke Code and by child neurologists, for its correlation with the Pediatric National Institutes of Health Stroke Scale and for interrater reliability. RESULTS: The study included 50 children, 18 with confirmed strokes (7 acute ischemic strokes and 11 hemorrhagic strokes). Prehospital/emergency staff and child neurologists agreed fully on 82% of patients and on 100% of ratings for head/gaze deviation and agnosia. The Pediatric RACE scale correlated strongly with the Pediatric National Institutes of Health Stroke Scale in evaluations by child neurologists (Spearman ρ, 0.852; P<0.001) and prehospital/emergency staff (Spearman ρ, 0.781; P<0.001). The median Pediatric RACE score was significantly higher in patients with large vessel occlusion (6.5; interquartile range, 6-7) than in those with other etiologies. CONCLUSIONS: Pediatric RACE, showing good interrater reliability and correlation with the Pediatric National Institutes of Health Stroke Scale, is a simple scale to detect candidates for pediatric acute stroke treatment, designed for both prehospital and in-hospital use by non-neurologist medical staff.


Subject(s)
Ischemic Stroke , Humans , Female , Child , Male , Child, Preschool , Reproducibility of Results , Adolescent , Infant , Ischemic Stroke/diagnosis , Ischemic Stroke/therapy , Ischemic Stroke/ethnology , Observer Variation , Severity of Illness Index , Stroke/diagnosis , Stroke/therapy
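The RACE total is the sum of its five bounded items, so with the item ranges quoted above it runs from 0 to 9. The Python sketch below assumes that the Pediatric RACE keeps those adult item ranges, which the abstract does not confirm because it does not list the pediatric items. The item keys and the name `race_total` are hypothetical.

```python
# Item maxima as quoted for the adult RACE scale; assuming (hypothetically)
# that the Pediatric RACE adaptation uses the same five items and ranges.
RACE_ITEM_MAX = {
    "facial_palsy": 2,
    "arm_motor": 2,
    "leg_motor": 2,
    "head_gaze_deviation": 1,
    "aphasia_or_agnosia": 2,
}


def race_total(scores: dict) -> int:
    """Sum the five RACE items after checking each is present and in range."""
    total = 0
    for item, max_points in RACE_ITEM_MAX.items():
        value = scores[item]  # raises KeyError if an item was not scored
        if not 0 <= value <= max_points:
            raise ValueError(f"{item}={value} is outside 0-{max_points}")
        total += value
    return total


example = {"facial_palsy": 2, "arm_motor": 2, "leg_motor": 1,
           "head_gaze_deviation": 1, "aphasia_or_agnosia": 1}
print(race_total(example))  # 7
```

Validating each item before summing prevents an out-of-range entry from silently inflating the total.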
3.
Breast Cancer Res ; 26(1): 31, 2024 02 23.
Article in English | MEDLINE | ID: mdl-38395930

ABSTRACT

BACKGROUND: Accurate classification of breast cancer molecular subtypes is crucial in determining treatment strategies and predicting clinical outcomes. This classification largely depends on the assessment of human epidermal growth factor receptor 2 (HER2), estrogen receptor (ER), and progesterone receptor (PR) status. However, variability in interpretation among pathologists poses challenges to the accuracy of this classification. This study evaluates the role of artificial intelligence (AI) in enhancing the consistency of these evaluations. METHODS: AI-powered HER2 and ER/PR analyzers, consisting of cell and tissue models, were developed using 1,259 HER2-, 744 ER-, and 466 PR-stained immunohistochemistry (IHC) whole-slide images of breast cancer. An external validation cohort comprising HER2, ER, and PR IHCs of 201 breast cancer cases was analyzed with these AI-powered analyzers. Three board-certified pathologists independently assessed these cases without AI annotation. Cases with differing interpretations between pathologists and the AI analyzer were then revisited with AI assistance, in order to evaluate how AI assistance influenced concordance among pathologists in the revised evaluation compared with the initial assessment. RESULTS: Reevaluation was required in 61 (30.3%), 42 (20.9%), and 80 (39.8%) of HER2, in 15 (7.5%), 17 (8.5%), and 11 (5.5%) of ER, and in 26 (12.9%), 24 (11.9%), and 28 (13.9%) of PR evaluations by the three pathologists, respectively. Compared with initial interpretations, AI assistance increased agreement among the three pathologists on HER2 status (from 49.3 to 74.1%, p < 0.001) and PR status (from 84.6 to 91.5%, p = 0.006). The increase for ER status (from 93.0 to 96.5%, p = 0.096) was not statistically significant. The improvement was especially evident in HER2 2+ and 1+ cases, where concordance increased significantly from 46.2 to 68.4% and from 26.5 to 70.7%, respectively. Consequently, agreement on the classification of breast cancer molecular subtypes improved with AI assistance (from 58.2 to 78.6%, p < 0.001). CONCLUSIONS: This study underscores the significant role of AI analyzers in improving pathologists' concordance in the classification of breast cancer molecular subtypes.


Subject(s)
Breast Neoplasms , Humans , Female , Breast Neoplasms/diagnosis , Breast Neoplasms/metabolism , Receptors, Estrogen/metabolism , Biomarkers, Tumor/metabolism , Artificial Intelligence , Observer Variation , Receptors, Progesterone/metabolism , Receptor, ErbB-2/metabolism
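Read literally, agreement "among three pathologists" is the share of cases on which all three gave the same call. A minimal Python sketch of that reading follows. Whether the study defined agreement in exactly this way is an assumption, and `complete_agreement_rate` and the toy calls are hypothetical.

```python
def complete_agreement_rate(ratings: list) -> float:
    """Fraction of cases on which every rater gave the same category.

    ratings[i] holds all raters' calls for case i.
    """
    if not ratings:
        raise ValueError("no cases to compare")
    agreed = sum(1 for case in ratings if len(set(case)) == 1)
    return agreed / len(ratings)


# Toy HER2 calls from three raters for four cases (not study data).
cases = [
    ["2+", "2+", "2+"],
    ["1+", "0", "1+"],
    ["3+", "3+", "3+"],
    ["1+", "1+", "2+"],
]
print(complete_agreement_rate(cases))  # 0.5
```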
4.
Breast Cancer Res Treat ; 204(2): 415-422, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38157098

ABSTRACT

PURPOSE: Ki-67 expression levels in breast cancer have prognostic and predictive significance. Therefore, accurate Ki-67 evaluation is important for optimal patient care. Although an algorithm developed by the International Ki-67 in Breast Cancer Working Group (IKWG) reduces interobserver variability, it is tedious and time-consuming. In this study, we simplified the IKWG algorithm and evaluated its interobserver agreement among breast pathologists in Ki-67 evaluation. METHODS: Six subspecialized breast pathologists (4 juniors, 2 seniors) assessed the percentage of positive cells in 5% increments in 57 immunostained Ki-67 slides. The time spent on each slide was recorded. Two rounds of ring study (R1, R2) were performed before and after training with the modified IKWG algorithm (eyeballing at 400× instead of counting 100 tumor nuclei per area). Concordance was assessed using Kendall's and kappa coefficients. RESULTS: Analysis of ordinal scale ratings for all categories with 5% increments showed almost perfect agreement in R1 (0.821) and substantial agreement in R2 (0.793); seniors and juniors had substantial agreement in R1 (0.718 vs. 0.649) and R2 (0.756 vs. 0.658). In dichotomous scale analysis using 20% as the cutoff, overall agreement was moderate in R1 (0.437) and R2 (0.479), as it was among seniors (R1: 0.436; R2: 0.437) and juniors (R1: 0.445; R2: 0.505). Average scoring time per case was higher in R2 (71 vs. 37 s). CONCLUSION: The modified IKWG algorithm does not significantly improve interobserver agreement. A better algorithm or assistance from digital image analysis is needed to reduce interobserver variability in Ki-67 evaluation.


Subject(s)
Breast Neoplasms , Humans , Female , Breast Neoplasms/pathology , Ki-67 Antigen/metabolism , Observer Variation , Pathologists , Breast/pathology , Reproducibility of Results
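The dichotomous analysis at the 20% cutoff can be illustrated with unweighted Cohen's kappa for one pair of raters. This Python sketch assumes that values at or above 20% count as "high". The ratings are invented toy values, not study data, and the function names are hypothetical.

```python
from collections import Counter


def cohen_kappa(a: list, b: list) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal category frequencies.
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when both raters use one category")
    return (p_observed - p_expected) / (1 - p_expected)


def dichotomize(percentages: list, cutoff: float = 20) -> list:
    # Assumption: a Ki-67 value at or above the cutoff counts as "high".
    return ["high" if p >= cutoff else "low" for p in percentages]


# Toy Ki-67 estimates (%) from two raters on eight slides (not study data).
rater1 = [5, 15, 20, 35, 10, 25, 40, 15]
rater2 = [10, 20, 15, 30, 10, 25, 45, 10]
print(cohen_kappa(dichotomize(rater1), dichotomize(rater2)))  # 0.5
```

In this example the raters agree on 6 of 8 slides (0.75). Half of each rater's calls are "high", so chance agreement is already 0.5, and kappa comes out at 0.5.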
5.
Breast Cancer Res Treat ; 205(2): 403-411, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38441847

ABSTRACT

PURPOSE: The recent findings from the DESTINY-Breast04 trial highlighted the clinical importance of distinguishing between HER2 immunohistochemistry (IHC) scores 0 and 1+ in metastatic breast cancer (BC). However, pathologist interpretation of HER2 IHC scoring is subjective, and standardized methodology is needed. We evaluated the consistency of HER2 IHC scoring among pathologists and the accuracy of digital image analysis (DIA) in interpreting HER2 IHC staining in cases of HER2-low BC. METHODS: Fifty whole-slide biopsy specimens of BC with HER2 IHC staining were evaluated, comprising 25 cases originally reported as IHC score 0 and 25 as 1+. These slides were digitally scanned. Six pathologists with breast expertise independently reviewed and scored the scanned images, and DIA was applied. Agreement among pathologists and concordance between pathologist scores and DIA results were statistically analyzed using Kendall coefficient of concordance (W) tests. RESULTS: Substantial agreement among at least five of the six pathologists was found for 18 of the score 0 cases (72%) and 15 of the score 1+ cases (60%), indicating excellent interobserver agreement (W = 0.828). DIA scores were concordant with pathologist scores in 96% of cases (47/49), indicating excellent concordance (W = 0.959). CONCLUSION: Although breast subspecialty pathologists were relatively consistent in evaluating BC with HER2 IHC scores of 0 and 1+, DIA may be a reliable supplementary tool to enhance the standardization and quantification of HER2 IHC assessment, especially in challenging cases where results may be ambiguous (i.e., scores 0-1+). These findings hold promise for improving the accuracy and consistency of HER2 testing.


Subject(s)
Breast Neoplasms , Immunohistochemistry , Observer Variation , Receptor, ErbB-2 , Humans , Breast Neoplasms/pathology , Breast Neoplasms/metabolism , Receptor, ErbB-2/metabolism , Female , Immunohistochemistry/methods , Reproducibility of Results , Biomarkers, Tumor/metabolism , Biomarkers, Tumor/analysis , Image Processing, Computer-Assisted/methods
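Kendall's W summarizes how similarly m raters rank n cases: W = 12S / (m^2(n^3 - n) - m*sum(T)). Here S is the sum of squared deviations of each case's rank sum from its mean, and T = t^3 - t is computed for every group of t tied scores. Ties matter for HER2 because scores take only a few values. The Python sketch below follows these textbook definitions. The study's software and tie handling are not stated, and the toy calls and names are hypothetical.

```python
def average_ranks(values: list) -> list:
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1..j+1
        i = j + 1
    return ranks


def kendalls_w(scores: list) -> float:
    """Tie-corrected Kendall's W; scores[r][c] is rater r's score for case c."""
    m, n = len(scores), len(scores[0])
    rank_rows = [average_ranks(row) for row in scores]
    rank_sums = [sum(row[c] for row in rank_rows) for c in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    # Tie correction: t^3 - t for every group of t tied scores, per rater.
    ties = sum(row.count(v) ** 3 - row.count(v) for row in scores for v in set(row))
    return 12 * s / (m**2 * (n**3 - n) - m * ties)


HER2 = {"0": 0, "1+": 1, "2+": 2, "3+": 3}
calls = [["0", "1+", "1+", "2+"],
         ["0", "1+", "2+", "2+"],
         ["0", "0", "1+", "2+"]]
numeric = [[HER2[c] for c in row] for row in calls]
print(round(kendalls_w(numeric), 3))  # 0.901
```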
6.
Radiology ; 312(1): e233341, 2024 07.
Article in English | MEDLINE | ID: mdl-38980184

ABSTRACT

Background Due to conflicting findings in the literature, there are concerns about a lack of objectivity in grading knee osteoarthritis (KOA) on radiographs. Purpose To examine how artificial intelligence (AI) assistance affects the performance and interobserver agreement of radiologists and orthopedists of various experience levels when evaluating KOA on radiographs according to the established Kellgren-Lawrence (KL) grading system. Materials and Methods In this retrospective observer performance study, consecutive standing knee radiographs from patients with suspected KOA were collected from three participating European centers between April 2019 and May 2022. Each center recruited four readers across radiology and orthopedic surgery at in-training and board-certified experience levels. KL grading (KL-0 = no KOA, KL-4 = severe KOA) on the frontal view was assessed by readers with and without assistance from a commercial AI tool. The majority vote of three musculoskeletal radiology consultants established the reference standard. The ordinal receiver operating characteristic method was used to estimate grading performance. Light's kappa was used to estimate interrater agreement, and bootstrapped t statistics were used to compare groups. Results Seventy-five studies were included from each center, totaling 225 studies (mean patient age, 55 years ± 15 [SD]; 113 female patients). The KL grades were KL-0, 24.0% (n = 54); KL-1, 28.0% (n = 63); KL-2, 21.8% (n = 49); KL-3, 18.7% (n = 42); and KL-4, 7.6% (n = 17). Eleven of the 12 recruited readers completed their readings. Three of the six junior readers showed higher KL grading performance with AI assistance than without (area under the receiver operating characteristic curve without vs with AI, 0.81 ± 0.017 [SEM] vs 0.88 ± 0.011 [P < .001]; 0.76 ± 0.018 vs 0.86 ± 0.013 [P < .001]; and 0.89 ± 0.011 vs 0.91 ± 0.009 [P = .008]). Interobserver agreement for KL grading among all readers was higher with AI assistance than without (κ without vs with AI, 0.77 ± 0.018 [SEM] vs 0.85 ± 0.013; P < .001). Board-certified radiologists achieved almost perfect agreement for KL grading when assisted by AI (κ = 0.90 ± 0.01), which was higher than that achieved by the reference readers independently (κ = 0.84 ± 0.017; P = .01). Conclusion AI assistance increased junior readers' radiographic KOA grading performance and increased interobserver agreement for osteoarthritis grading across all readers and experience levels. Published under a CC BY 4.0 license. Supplemental material is available for this article.


Subject(s)
Artificial Intelligence , Observer Variation , Osteoarthritis, Knee , Humans , Female , Male , Osteoarthritis, Knee/diagnostic imaging , Middle Aged , Retrospective Studies , Radiography/methods , Aged
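Light's kappa extends Cohen's kappa to more than two readers by averaging kappa over every reader pair. The reference standard here is a majority vote of three consultants. The Python sketch below shows both ideas. It uses unweighted kappa because the abstract does not say whether a weighted form was used. The grades are toy values, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def cohen_kappa(a: list, b: list) -> float:
    """Unweighted Cohen's kappa for one pair of readers."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    p_expected = sum(count_a[g] * count_b[g] for g in count_a) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when both readers use one grade")
    return (p_observed - p_expected) / (1 - p_expected)


def lights_kappa(readers: list) -> float:
    """Light's kappa: mean of Cohen's kappa over all pairs of readers."""
    return mean(cohen_kappa(a, b) for a, b in combinations(readers, 2))


def majority_vote(calls: list) -> int:
    """Grade given by a strict majority of readers (e.g. 2 of 3)."""
    grade, count = Counter(calls).most_common(1)[0]
    if count <= len(calls) // 2:
        raise ValueError(f"no majority among {calls}")
    return grade


# Toy KL grades (0-4) from three readers on four radiographs.
r1, r2, r3 = [0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 4]
print(round(lights_kappa([r1, r2, r3]), 3))  # 0.795
print(majority_vote([2, 2, 3]))  # 2
```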
7.
Mod Pathol ; 37(8): 100535, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38852812

ABSTRACT

The DESTINY-Breast04 trial revealed survival advantages of trastuzumab deruxtecan for women with metastatic HER2-low breast cancer (1+ or 2+ immunohistochemistry [IHC], without amplification). Although this trial applied the 2018 American Society of Clinical Oncology (ASCO)/College of American Pathologists (CAP) HER2 IHC scoring criteria, the subjectivity and imprecision in IHC scoring have raised concerns that patients' treatment may be misaligned. Our group of 9 experienced breast pathologists collated a deidentified set of 60 breast cancer core biopsies from 3 laboratories, evaluated with the Ventana 4B5 HER2 assay and mostly scored locally as HER2 0 or 1+. Based on ASCO/CAP 2018 criteria and our extensive experience of reporting HER2 IHC, we specified scoring conventions for cancers with low levels of HER2 protein expression, articulating specific scoring pitfalls. Each pathologist then reviewed digitized whole slide images of the IHC slides and scored the HER2 expression for each case. At a subsequent consensus workshop, we reviewed the cases jointly to establish consensus scores for each case and determine the percentage of HER2-expressing tumor cells. Consensus was reached on all cases, with 40 classified as 1+ and 3 as 2+ (not amplified), totaling 43 (71.7%) HER2-low cancers. The remaining cases were HER2 0. In 93.3% of cases (56/60), the consensus score matched the majority opinion of pathologists' independent scores. Seven (41.2%) of the 17 cases reported locally as HER2 0 were classified as HER2 low. Conversely, among 32 cases with local scores of 1+, 7 (21.9%) were reclassified as ultralow or null. Individual pathologists' accuracy in matching the consensus scores ranged from 73.3% to 91.67% (mean, 80.74%). Among HER2-low cancers, those in which <20% of the tumor cells expressed HER2 had the lowest concordance levels. Observers' Cohen's κ coefficients for concordance were excellent for 4 observers, good for 1, and moderate for the remaining 4. This reference set of cases with expert consensus HER2 scores will be invaluable for peer training and development of our national external quality assurance program for HER2-low cancers. For assessing breast cancers at the low end of HER2 protein expression, our targeted scoring criteria and explicit instruction on pitfalls improved pathologists' accuracy and concordance.


Subject(s)
Biomarkers, Tumor , Breast Neoplasms , Immunohistochemistry , Observer Variation , Receptor, ErbB-2 , Humans , Breast Neoplasms/pathology , Breast Neoplasms/metabolism , Breast Neoplasms/drug therapy , Female , Receptor, ErbB-2/analysis , Receptor, ErbB-2/metabolism , Biomarkers, Tumor/analysis , Australia , Reproducibility of Results
8.
Ann Rheum Dis ; 83(8): 1060-1071, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-38531611

ABSTRACT

OBJECTIVES: The main objective was to generate a GLobal OMERACT Ultrasound DActylitis Score (GLOUDAS) in psoriatic arthritis and to test its reliability. To this end, we assessed the validity, feasibility and applicability of ultrasound assessment of finger entheses to incorporate them into the scoring system. METHODS: The study consisted of a stepwise process. First, in cadaveric specimens, we identified enthesis sites of the fingers by ultrasound and gross anatomy, and then verified the presence of entheseal tissue in histological samples. We then selected the entheses to be incorporated into a dactylitis scoring system through a Delphi consensus process among international experts. Next, we established and defined the ultrasound components of dactylitis and their scoring systems using Delphi methodology. Finally, we tested the interobserver and intraobserver reliability of the consensus-based scoring system in patients with psoriatic dactylitis. RESULTS: Thirty-two entheses were identified in cadaveric fingers, and the presence of entheseal tissue was confirmed in all cadaveric samples. Following the consensus process, 12 of these entheses were selected for inclusion in GLOUDAS. The ultrasound components of GLOUDAS agreed on through the Delphi process were synovitis, tenosynovitis, enthesitis, subcutaneous tissue inflammation and periextensor tendon inflammation. The scoring system for each component was also agreed on. Interobserver reliability was fair to good (κ 0.39-0.71) and intraobserver reliability good to excellent (κ 0.80-0.88) for dactylitis components. Interobserver and intraobserver agreement for the total B-mode and Doppler mode scores (sum of the scores of the individual abnormalities) were excellent (interobserver intraclass correlation coefficient (ICC) 0.98 for B-mode and 0.99 for Doppler mode; intraobserver ICC 0.98 for both modes). CONCLUSIONS: We have produced a consensus-driven ultrasound dactylitis scoring system that has shown acceptable interobserver reliability and excellent intraobserver reliability. Through anatomical knowledge, small entheses of the fingers were identified and histologically validated.


Subject(s)
Arthritis, Psoriatic , Finger Joint , Severity of Illness Index , Ultrasonography , Humans , Arthritis, Psoriatic/diagnostic imaging , Reproducibility of Results , Finger Joint/diagnostic imaging , Finger Joint/pathology , Ultrasonography/methods , Male , Female , Delphi Technique , Synovitis/diagnostic imaging , Synovitis/pathology , Middle Aged , Observer Variation , Enthesopathy/diagnostic imaging , Tenosynovitis/diagnostic imaging , Cadaver , Feasibility Studies , Adult , Aged , Fingers/diagnostic imaging , Fingers/pathology
9.
Osteoarthritis Cartilage ; 32(10): 1273-1282, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38823432

ABSTRACT

OBJECTIVE: Synovial pathology has been linked to osteoarthritis (OA) pain in patients. Microscopic grading systems for synovial changes in human OA have been described, but a standardized approach for murine models of OA is needed. We sought to develop a reproducible approach and set of minimum recommendations for reporting of synovial histopathology in mouse models of OA. METHODS: Coronal and sagittal sections from male mouse knee joints subjected to destabilization of the medial meniscus (DMM) or partial meniscectomy (PMX) were collected as part of other studies. Stains included Hematoxylin and Eosin (H&E), Toluidine Blue (T-Blue), and Safranin O/Fast Green (Saf-O). Four blinded readers graded pathological features (hyperplasia, cellularity, and fibrosis) at specific anatomic locations. Inter-reader agreement of each feature score was determined. RESULTS: There was acceptable to very good agreement when using 3-4 individual readers. After DMM and PMX, the expected medial-predominant changes in hyperplasia and cellularity were observed, with fibrosis noted at 12 weeks post-PMX. Synovial changes were consistent from section to section in the mid-joint area. When comparing stains, H&E and T-Blue resulted in better agreement than Saf-O. CONCLUSIONS: To account for the pathologic and anatomic variability in synovial pathology and allow for a more standardized evaluation that can be compared across studies, we recommend evaluating a minimum set of 3 pathological features at standardized anatomic areas. Further, we suggest reporting individual feature scores separately before relying on a single summed "synovitis" score. H&E or T-Blue staining is preferred, and inter-reader agreement for each feature should be considered.


Subject(s)
Disease Models, Animal , Menisci, Tibial , Osteoarthritis, Knee , Synovial Membrane , Animals , Synovial Membrane/pathology , Mice , Male , Osteoarthritis, Knee/pathology , Menisci, Tibial/pathology , Menisci, Tibial/surgery , Meniscectomy , Arthritis, Experimental/pathology , Hyperplasia/pathology , Fibrosis/pathology , Observer Variation , Mice, Inbred C57BL , Coloring Agents
10.
Rheumatology (Oxford) ; 63(10): 2781-2790, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-38305463

ABSTRACT

OBJECTIVES: Our aim was to introduce a standardized system for assessing the extent of giant cell arteritis (GCA) on MRI, i.e. the Magnetic Resonance Vasculitis Activity Score (MRVAS). To obtain a comprehensive view, we used an extensive MRI protocol including cranial vessels and the aorta with its branches. To test reliability, MRI was assessed by four readers with different levels of experience. METHODS: A total of 80 patients with suspected GCA underwent MRI of the cranial arteries and the aorta and its branches (20 vessel segments). Every vessel was rated dichotomously [inflamed (coded as 1) or not (coded as 0)], providing a summed score of 0-20. Blinded readers [two experienced radiologists (ExR) and two inexperienced radiologists (InR)] applied the MRVAS at the individual vessel level and at an overall level (defined as the highest of the individual vessel scores). To determine interrater agreement, Cohen's κ was calculated for pairwise comparison of each reader for individual vessel segments. Intraclass correlation coefficients (ICCs) were used for the MRVAS. RESULTS: Concordance rates were excellent for both subcohorts at the individual vessel level (GCA: ICC 0.95; non-GCA: ICC 0.96) and at the overall MRVAS level (GCA: ICC 0.96; non-GCA: ICC 1.0). Interrater agreement yielded significant concordance (P < 0.001) for all pairs (κ range 0.78-0.98). No significant differences between ExRs and InRs were observed (P = 0.38). CONCLUSION: The proposed MRVAS allows standardized scoring of inflammation in GCA and achieved high agreement rates in a prospective setting.


Subject(s)
Giant Cell Arteritis , Magnetic Resonance Imaging , Severity of Illness Index , Humans , Giant Cell Arteritis/diagnostic imaging , Female , Male , Aged , Magnetic Resonance Imaging/methods , Reproducibility of Results , Aorta/diagnostic imaging , Aorta/pathology , Middle Aged , Observer Variation , Aged, 80 and over
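The MRVAS as described reduces to two numbers per reader: the sum of 20 binary segment ratings (0-20) and the overall level, which is the maximum of those ratings. The Python sketch below assumes a flat list of 20 ratings in an arbitrary order, because the abstract does not list the segments. The name `mrvas` and the example ratings are hypothetical.

```python
N_SEGMENTS = 20  # vessel segments in the protocol described above


def mrvas(segment_flags: list) -> tuple:
    """Return (summed MRVAS, overall level) from 0/1 segment ratings.

    The summed score ranges 0-20; the overall level is the highest
    individual segment score, i.e. 1 if any segment is inflamed.
    """
    if len(segment_flags) != N_SEGMENTS:
        raise ValueError(f"expected {N_SEGMENTS} segments, got {len(segment_flags)}")
    if any(flag not in (0, 1) for flag in segment_flags):
        raise ValueError("each segment must be rated 0 (not inflamed) or 1 (inflamed)")
    return sum(segment_flags), max(segment_flags)


# Hypothetical reader: 3 of the 20 segments rated as inflamed.
flags = [0] * N_SEGMENTS
for i in (0, 5, 12):
    flags[i] = 1
print(mrvas(flags))  # (3, 1)
```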
11.
Rheumatology (Oxford) ; 63(SI2): SI219-SI227, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-38426363

ABSTRACT

OBJECTIVES: To introduce and evaluate a simple method for assessing joint inflammation and structural damage on whole-body MRI (WBMRI) in juvenile idiopathic arthritis (JIA) that is usable in clinical practice. METHODS: The proposed system utilizes post-contrast Dixon WBMRI scans. Joints are assessed for synovitis (grade 0-2) and structural damage (present/absent) at 81 sites. The synovitis grading is based on features including above-normal intensity synovial enhancement, synovial hypertrophy, joint effusion, subarticular bone marrow oedema and peri-articular soft tissue oedema. This system was evaluated in a prospective study of 60 young people (47 patients with JIA and 13 controls with non-inflammatory musculoskeletal pain) who underwent WBMRI. Three readers (blinded to diagnosis) independently reviewed all images and re-reviewed 20 individual scans. The intra- and inter-reader overall agreement (OA) and the intra- and inter-reader Gwet's agreement coefficient 2 (GAC2) were measured for the detection of (a) participants with ≥1 joint with inflammation or structural damage and (b) joint inflammation or structural damage at each joint. RESULTS: The inter-reader OA for detecting patients with ≥1 joint with inflammation, defined as grade 2 synovitis (G2), and with ≥1 joint with structural damage were 80% and 73%, respectively. The intra-reader OA for readers 1-3 was 80-90% and 75-90%, respectively. The inter-reader OA and GAC2 for joint inflammation (G2) at each joint were both ≥85% for all joints but were lower if grade 1 synovitis was included as positive. CONCLUSION: The intra- and inter-reader agreements of this WBMRI assessment system are adequate for assessing objective joint inflammation and damage in JIA.


Subject(s)
Arthritis, Juvenile , Magnetic Resonance Imaging , Synovitis , Whole Body Imaging , Humans , Arthritis, Juvenile/diagnostic imaging , Magnetic Resonance Imaging/methods , Adolescent , Female , Male , Synovitis/diagnostic imaging , Prospective Studies , Child , Whole Body Imaging/methods , Joints/diagnostic imaging , Joints/pathology , Young Adult , Severity of Illness Index , Case-Control Studies , Reproducibility of Results , Observer Variation
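For a yes/no call such as "grade 2 synovitis present", overall agreement is the proportion of matching reads. Gwet's coefficient then corrects that proportion for chance. With identity weights, Gwet's AC2 reduces to AC1, so the Python sketch below implements AC1 for two readers and binary ratings. The study's software and weighting are not stated, so treat this as an illustration. The toy reads and function names are hypothetical.

```python
def overall_agreement(a: list, b: list) -> float:
    """Proportion of joints (or participants) on which two reads agree."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty rating lists")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def gwet_ac1_binary(a: list, b: list) -> float:
    """Gwet's AC1 for two raters and binary (0/1) ratings.

    With identity weights, Gwet's AC2 reduces to this AC1 value.
    """
    p_a = overall_agreement(a, b)
    # pi: mean proportion of positive calls across both raters.
    pi = (sum(a) + sum(b)) / (2 * len(a))
    # Chance agreement for two categories: pi(1 - pi) + (1 - pi)pi.
    p_e = 2 * pi * (1 - pi)
    return (p_a - p_e) / (1 - p_e)


# Toy reads of ten joints (1 = grade 2 synovitis present).
reader1 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
reader2 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(overall_agreement(reader1, reader2))  # 0.9
print(round(gwet_ac1_binary(reader1, reader2), 3))  # 0.866
```

For the same toy reads, Cohen's kappa would be about 0.62. Gwet's coefficients are often preferred because they are less affected by a low prevalence of positive findings, although the abstract does not state the authors' reason for choosing GAC2.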
12.
J Anat ; 244(4): 620-627, 2024 04.
Article in English | MEDLINE | ID: mdl-38214341

ABSTRACT

Imaging techniques in anatomy have developed rapidly over the last decades through the emergence of various 3D scanning systems. Depending on the dissection level, non-contact or tactile contact methods can be applied to the targeted structure. The aim of this study was to assess the inter- and intra-observer reproducibility of an ArUco-based localisation stylus, that is, a manual digitalisation technique using a hand-held stylus. Ten fresh-frozen, unembalmed adult arms were used to digitalise the glenoid cartilage of the glenohumeral joint and the contour of the clavicular cartilage of the acromioclavicular joint. Three operators performed consecutive digitalisations of each cartilage contour using an ArUco-based localisation stylus recorded by a single monocular camera. The shape of each cartilage was defined by nine shape parameters. Intra-observer repeatability and inter-observer reproducibility were computed using an intra-class correlation coefficient (ICC) for each of these parameters. Overall, 35.2 ± 2.4 s and 26.6 ± 10.2 s were required by each examiner to digitalise the contour of a glenoid and an acromioclavicular cartilage, respectively. For most parameters, good-to-excellent agreement was observed for intra-observer repeatability (ICC ranging between 0.81 and 1.00) and inter-observer reproducibility (ICC ranging between 0.75 and 0.99). To conclude, through a fast and versatile process, the use of an ArUco-based localisation stylus can be a reliable low-cost alternative to conventional imaging methods to digitalise shoulder cartilage contours.


Subject(s)
Shoulder Joint , Shoulder , Adult , Humans , Reproducibility of Results , Observer Variation , Cartilage
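The abstract does not say which ICC form was used. One common choice for inter-observer reproducibility is ICC(2,1): two-way random effects, absolute agreement, single rater, built from a two-way ANOVA decomposition. The Python sketch below implements that form on a small toy matrix. The choice of form, the data and the name `icc_2_1` are assumptions.

```python
def icc_2_1(x: list) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    x[i][j] is rater j's measurement of subject i (n subjects, k raters).
    """
    n, k = len(x), len(x[0])
    grand = sum(map(sum, x)) / (n * k)
    row_means = [sum(row) / k for row in x]
    col_means = [sum(row[j] for row in x) / n for j in range(k)]
    # Partition the total sum of squares into subjects, raters and residual.
    ss_total = sum((v - grand) ** 2 for row in x for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Toy matrix: 6 specimens (rows) measured by 4 hypothetical raters (columns).
data = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(data), 3))  # 0.29
```

This form penalizes systematic offsets between raters. As a result, the toy data score only about 0.29 even though the raters order the specimens similarly. A consistency form such as ICC(3,1) gives about 0.71 on the same matrix.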
13.
J Vasc Res ; 61(3): 122-128, 2024.
Article in English | MEDLINE | ID: mdl-38547846

ABSTRACT

INTRODUCTION: We aimed to compare conventional vessel wall MR imaging techniques and quantitative susceptibility mapping (QSM) to determine the optimal sequence for detecting carotid artery calcification. METHODS: Twenty-two patients who underwent carotid vessel wall MR imaging and neck CT were enrolled. Four 6-mm slices through each internal carotid bifurcation (bilaterally) were each subdivided into 4 segments according to clock position (0-3, 3-6, 6-9, and 9-12) and assessed for calcification. Two blinded radiologists independently reviewed a total of 704 segments and scored the likelihood of calcification using a 5-point scale on spin-echo imaging, FLASH, and QSM. Observer performance for detecting calcification was evaluated by a multireader, multiple-case receiver operating characteristic study. Weighted κ statistics were calculated to assess interobserver agreement. RESULTS: QSM had a mean area under the receiver operating characteristic curve of 0.85, which was significantly higher than that of any other sequence (p < 0.01), and showed substantial interreader agreement (κ = 0.68). When segments with a score of 3-5 were defined as positive and those with a score of 1-2 as negative, the sensitivity and specificity of QSM were 0.75 and 0.87, respectively. CONCLUSION: QSM was the most reliable MR sequence for the detection of plaque calcification.


Subject(s)
Carotid Artery Diseases , Observer Variation , Plaque, Atherosclerotic , Predictive Value of Tests , Vascular Calcification , Humans , Vascular Calcification/diagnostic imaging , Vascular Calcification/pathology , Female , Male , Aged , Middle Aged , Carotid Artery Diseases/diagnostic imaging , Carotid Artery Diseases/pathology , Reproducibility of Results , Magnetic Resonance Angiography , Retrospective Studies , Aged, 80 and over , Computed Tomography Angiography , Carotid Artery, Internal/diagnostic imaging , Carotid Artery, Internal/pathology , Magnetic Resonance Imaging
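Dichotomizing the 5-point likelihood scale at 3 turns each segment into a positive or negative call, from which sensitivity and specificity follow. The Python sketch below assumes that a per-segment reference label for calcification exists, presumably from the neck CT, although the abstract does not state this explicitly. The toy scores and the name `sens_spec` are hypothetical.

```python
def sens_spec(scores: list, truth: list, threshold: int = 3) -> tuple:
    """Sensitivity and specificity when scores >= threshold count as positive.

    truth[i] is True when the reference standard shows calcification.
    """
    tp = sum(1 for s, t in zip(scores, truth) if s >= threshold and t)
    fn = sum(1 for s, t in zip(scores, truth) if s < threshold and t)
    tn = sum(1 for s, t in zip(scores, truth) if s < threshold and not t)
    fp = sum(1 for s, t in zip(scores, truth) if s >= threshold and not t)
    return tp / (tp + fn), tn / (tn + fp)


# Toy 5-point likelihood scores for eight segments (not study data).
scores = [5, 4, 2, 3, 1, 1, 2, 3]
truth = [True, True, True, False, False, False, False, False]
sensitivity, specificity = sens_spec(scores, truth)
print(round(sensitivity, 2), round(specificity, 2))  # 0.67 0.6
```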
14.
Histopathology ; 85(1): 171-181, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38571446

ABSTRACT

AIMS: Following the increased use of neoadjuvant therapy for pancreatic cancer, grading of tumour regression (TR) has become part of routine diagnostics. However, it suffers from marked interobserver variation, which is mainly ascribed to the subjectivity of the defining criteria of the categories in TR grading systems. We hypothesized that a further cause for the interobserver variation is the use of divergent and nonspecific morphological criteria to identify tumour regression. METHODS AND RESULTS: Twenty treatment-naïve pancreatic cancers and 20 pancreatic cancers treated with neoadjuvant chemotherapy were reviewed by three experienced pancreatic pathologists who, blinded to treatment status, categorized each tumour as treatment-naïve or neoadjuvantly treated, and annotated all tissue areas they considered showing tumour regression. Only 50%-65% of the cases were categorized correctly, and the annotated tissue areas were highly discrepant (only 3%-41% overlap). When the prevalence of various morphological features deemed to indicate TR was compared between treatment-naïve and neoadjuvantly treated tumours, only one pattern, characterized by reduced cancer cell density and prominent stroma affecting a large area of the tumour bed, occurred significantly more frequently, but not exclusively, in the neoadjuvantly treated group. Finally, stromal features, both morphological and biological, were investigated as possible markers for tumour regression, but failed to distinguish TR from native tumour stroma. CONCLUSION: There is considerable divergence in opinion between pathologists when it comes to the identification of tumour regression. Reliable identification of TR is only possible if it is extensive, while lesser degrees of treatment effect cannot be recognized with certainty.


Subject(s)
Neoadjuvant Therapy , Pancreatic Neoplasms , Humans , Pancreatic Neoplasms/pathology , Pancreatic Neoplasms/diagnosis , Pancreatic Neoplasms/therapy , Male , Female , Aged , Middle Aged , Observer Variation , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Neoplasm Grading
15.
Histopathology ; 85(1): 81-91, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38477366

ABSTRACT

AIMS: Immune checkpoint inhibitors targeting programmed death-ligand 1 (PD-L1) have shown promising clinical outcomes in urothelial carcinoma (UC). The combined positive score (CPS) quantifies PD-L1 22C3 expression in UC, but because it accounts for both immune and tumour cell positivity, it can vary between pathologists. METHODS AND RESULTS: An artificial intelligence (AI)-powered PD-L1 CPS analyser was developed using 1,275,907 cells and 6175.42 mm² of tissue annotated by pathologists, extracted from 400 PD-L1 22C3-stained whole slide images of UC. We validated the AI model on 543 UC PD-L1 22C3 cases collected from three institutions. In 446 cases (82.1%), the CPS results (CPS ≥10 or <10) were in complete agreement among the three pathologists, and in 486 cases (89.5%), the AI-powered CPS results matched the consensus of two or more pathologists. The pathologists' CPS assessments differed significantly depending on the source hospital (P = 0.003). The three pathologists re-evaluated discrepant cases with reference to the AI-powered CPS results. After revision using the AI as a guide, complete agreement increased to 93.9%. The AI model improved concordance between pathologists across various factors, including hospital, specimen type, pathologic T stage, histologic subtype, and dominant PD-L1-positive cell type. In the revised results, discordance among slides from different hospitals was mitigated. CONCLUSION: This study suggests that AI models can help reduce discrepancies between pathologists in quantifying immunohistochemistry, including PD-L1 22C3 CPS, especially when evaluating data from different institutions, such as in a telepathology setting.


Subject(s)
Artificial Intelligence , B7-H1 Antigen , Carcinoma, Transitional Cell , Observer Variation , Urinary Bladder Neoplasms , Humans , B7-H1 Antigen/analysis , B7-H1 Antigen/metabolism , Urinary Bladder Neoplasms/pathology , Urinary Bladder Neoplasms/diagnosis , Urinary Bladder Neoplasms/metabolism , Carcinoma, Transitional Cell/pathology , Carcinoma, Transitional Cell/metabolism , Carcinoma, Transitional Cell/diagnosis , Biomarkers, Tumor/analysis , Biomarkers, Tumor/metabolism , Urologic Neoplasms/pathology , Urologic Neoplasms/diagnosis , Male , Immunohistochemistry/methods , Female , Aged
16.
Eur J Nucl Med Mol Imaging ; 51(6): 1741-1752, 2024 May.
Article in English | MEDLINE | ID: mdl-38273003

ABSTRACT

PURPOSE: Prostate-specific membrane antigen (PSMA) positron emission tomography/computed tomography (PET/CT) is recognized as the most accurate imaging modality for detection of metastatic high-risk prostate cancer (PCa). Its role in local staging, however, remains unclear. We assessed the intra- and interobserver variability, as well as the diagnostic accuracy, of the PSMA PET/CT-based molecular imaging local tumour stage (miT-stage) for local tumour staging in a large, multicentre cohort of patients with intermediate- and high-risk primary PCa, with the radical prostatectomy specimen (pT-stage) serving as the reference standard. METHODS: A total of 600 patients who underwent staging PSMA PET/CT before robot-assisted radical prostatectomy were studied. In 579 PSMA-positive primary prostate tumours, the miT-stage as assessed by four nuclear physicians was compared with the pT-stage according to the ISUP protocol. Sensitivity, specificity and diagnostic accuracy were determined. In a representative subset of 100 patients, intra- and interobserver variability were assessed using kappa estimates. RESULTS: The sensitivity and specificity of the PSMA PET/CT-based miT-stage were 58% and 59% for pT3a-stage, 30% and 97% for ≥ pT3b-stage, and 68% and 61% for overall ≥ pT3-stage, respectively. No statistically significant differences in diagnostic accuracy were found between tracers. Intra-observer agreement for PSMA PET/CT assessment was substantial for ≥ T3-stage (κ = 0.70) and ≥ T3b-stage (κ = 0.75), whereas interobserver agreement was moderate for ≥ T3-stage (κ = 0.47) and ≥ T3b-stage (κ = 0.41). CONCLUSION: In a large, multicentre study of 600 patients with newly diagnosed intermediate- and high-risk PCa, we showed that PSMA PET/CT may have value in local tumour staging, with the pathological tumour stage in the radical prostatectomy specimen as the reference standard. Intra- and interobserver agreement in assessing tumour extent on PSMA PET/CT was moderate to substantial.


Subject(s)
Antigens, Surface , Glutamate Carboxypeptidase II , Neoplasm Staging , Observer Variation , Positron Emission Tomography Computed Tomography , Prostatic Neoplasms , Humans , Male , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology , Prostatic Neoplasms/surgery , Aged , Middle Aged , Glutamate Carboxypeptidase II/metabolism
17.
J Magn Reson Imaging ; 60(3): 1037-1048, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38100302

ABSTRACT

BACKGROUND: MR elastography (MRE) may provide quantitative imaging biomarkers of the lumbar back muscles (LBMs), complementing MRI in spinal diseases by assessing muscle mechanical properties. However, reproducibility analyses for MRE of the LBMs are lacking. PURPOSE: To assess the technical failure rate, within-day and inter-day reproducibility, robustness to excitation source positioning, and inter-observer agreement of MRE of the LBMs. STUDY TYPE: Prospective. SUBJECTS: Seventeen healthy subjects (mean age 28 ± 4 years; 11 females). FIELD STRENGTH/SEQUENCE: 1.5 T, gradient-echo MRE, T1-weighted turbo spin echo. ASSESSMENT: The pneumatic driver was centered at the L3 level. Four MRE examinations were performed during two visits 2-4 weeks apart, each visit consisting of two MRE examinations less than 10 minutes apart. At Visit 1, after the first MRE, the coil and driver were removed and then reinstalled before the MRE was repeated. At Visit 2, after the first MRE, only the driver was moved down 5 cm before the MRE was repeated. Two radiologists segmented the multifidus and erector spinae muscles. STATISTICAL TESTS: Paired t-test, analysis of variance, intraclass correlation coefficients (ICCs). P-values <0.05 were considered statistically significant. RESULTS: Mean stiffness of the LBMs ranged from 1.44 to 1.60 kPa. The mean technical failure rate was 2.5%. Inter-observer agreement was excellent (ICC ranging from 0.82 [0.64-0.96] to 0.99 [0.98-0.99] in the multifidus, and from 0.85 [0.69-0.92] to 0.99 [0.97-0.99] in the erector spinae muscles). Within-day reproducibility was fair in the multifidus (ICC: 0.53 [0.47-0.77]) and good in the erector spinae muscles (ICC: 0.74 [0.48-0.88]). Reproducibility after moving the driver was excellent in both the multifidus (ICC: 0.85 [0.69-0.93]) and erector spinae muscles (ICC: 0.84 [0.67-0.92]). Inter-day reproducibility was excellent in the multifidus (ICC: 0.76 [0.48-0.89]) and poor in the erector spinae muscles (ICC: 0.23 [-0.61 to 0.63]). 
DATA CONCLUSION: MRE of the LBMs provides stiffness measurements with fair to excellent reproducibility and excellent inter-observer agreement. However, inter-day reproducibility in the multifidus muscles indicated that the MRE protocol used here may not be optimal for this muscle. EVIDENCE LEVEL: 2 TECHNICAL EFFICACY: Stage 1.


Subject(s)
Back Muscles , Elasticity Imaging Techniques , Magnetic Resonance Imaging , Humans , Female , Elasticity Imaging Techniques/methods , Reproducibility of Results , Adult , Male , Prospective Studies , Magnetic Resonance Imaging/methods , Back Muscles/diagnostic imaging , Observer Variation , Lumbosacral Region/diagnostic imaging , Healthy Volunteers , Lumbar Vertebrae/diagnostic imaging , Young Adult
18.
Gastrointest Endosc ; 100(3): 417-428.e1, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38431105

ABSTRACT

BACKGROUND AND AIMS: The diagnosis of achalasia is associated with an average delay of 2 years. Endoscopic features may prompt an earlier diagnosis. We aimed to develop and test a novel endoscopic score, CARS, for the prediction of achalasia. METHODS: Part 1: Twenty endoscopic videos were obtained from patients undergoing endoscopy for dysphagia or reflux. A survey containing the videos and endoscopic criteria options was distributed to 6 esophagologists and 6 general gastroenterologists. Inter-rater reliability (IRR) was measured, and logistic regression was used to evaluate predictive performance. Three rounds of review were conducted to select the final 4-component score. Part 2: A retrospective review was conducted of consecutive patients who had undergone comprehensive esophageal testing. A CARS score was calculated for each patient based on the findings reported at endoscopy. RESULTS: In the video review, IRR for individual score components ranged from 0.23 to 0.57. The final CARS score comprised 4 components: Contents, Anatomy, Resistance, and Stasis. In a mixed-effects model, the mean score across raters was higher for achalasia than for nonachalasia subjects (4.44 vs 0.87; P < .01). In part 2, patients with achalasia had a higher mean CARS score than those with no motility disorder or ineffective motility (mean 4.1 vs 1.3; P < .01). CONCLUSIONS: We developed the CARS score based on reliability performance in a video-based survey and tested it in a clinical setting. The CARS score performed well in predicting achalasia.


Subject(s)
Esophageal Achalasia , Esophageal Achalasia/diagnosis , Humans , Retrospective Studies , Reproducibility of Results , Female , Video Recording , Male , Esophagoscopy/methods , Middle Aged , Deglutition Disorders/diagnosis , Deglutition Disorders/etiology , Logistic Models , Gastroesophageal Reflux/diagnosis , Adult , Observer Variation , Aged
19.
BJU Int ; 134(1): 89-95, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38627205

ABSTRACT

OBJECTIVES: To assess the intra- and inter-observer reliability of cystoscopic sphincter evaluation (CSE) in men undergoing sling surgery for urinary incontinence and, if possible, to evaluate its correlation with the final clinical decision. PATIENTS AND METHODS: Two expert urologists prospectively filmed the cystoscopies of incontinent patients according to a standard scenario. Anonymised recordings were randomly presented to the same observer twice. The observers (medical students, urology residents and urologists with 0-5, 5-10 and >10 years of practice, respectively) were asked to assess and score the recordings without knowing any of the patients' characteristics. RESULTS: In total, 37 recordings were scored twice by 26 observers. The intraclass correlation coefficient (ICC) for intra-observer reliability of the CSE was 0.54 (moderate), 0.58 (moderate) and 0.60 (substantial) for medical students, residents and urologists, respectively. However, when observers were stratified by experience, the lowest agreement values were found between experts with >10 years of experience. Inter-observer ICCs for the CSE ranged from 0.31 to 0.53, with the lowest value observed between urologists (0.31). CONCLUSIONS: The study demonstrates poor intra- and inter-observer reliability of the CSE. According to these results, CSE does not add valuable information to the clinical evaluation and should not be considered in isolation from the patient's characteristics.


Subject(s)
Cystoscopy , Observer Variation , Humans , Male , Reproducibility of Results , Prospective Studies , Suburethral Slings , Middle Aged , Aged , Adult , Urinary Incontinence/diagnosis , Clinical Competence
20.
BJU Int ; 134(4): 510-518, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38923789

ABSTRACT

OBJECTIVES: To explore Prostate Imaging-Reporting and Data System (PI-RADS) interobserver variability, including its major sources, mitigation approaches, and future directions. METHODS: A narrative review of PI-RADS interobserver variability. RESULTS: PI-RADS was developed in 2012 to set technical standards for prostate magnetic resonance imaging (MRI), reduce interobserver variability in interpretation, and improve diagnostic accuracy in the MRI-directed diagnostic pathway for detection of clinically significant prostate cancer. While PI-RADS has been validated in selected research cohorts read by prostate cancer imaging experts, subsequent prospective studies in routine clinical practice demonstrate wide variability in diagnostic performance. Among multiple interrelated factors, including variability in MRI hardware and technique, image quality, and population- and patient-specific factors such as prostate cancer prevalence, radiologist and biopsy operator experience are the most important drivers of high-quality care. Iterative improvements in PI-RADS have helped flatten the learning curve for novice readers and reduce variability. Innovations in image quality reporting, administrative and organisational workflows, and artificial intelligence hold promise for reducing variability even further. CONCLUSION: Continued research into PI-RADS is needed to facilitate benchmark creation, reader certification, and independent accreditation, which are systems-level interventions needed to maintain high-quality prostate MRI across entire populations.


Subject(s)
Magnetic Resonance Imaging , Observer Variation , Prostatic Neoplasms , Male , Humans , Prostatic Neoplasms/diagnostic imaging , Prostate/pathology , Prostate/diagnostic imaging , Data Systems , Radiology Information Systems