Results 1 - 20 of 265
1.
Mod Pathol ; : 100571, 2024 Jul 20.
Article in English | MEDLINE | ID: mdl-39038789

ABSTRACT

Grading lung squamous cell carcinoma (LUSC) is controversial and not universally accepted. The histomorphological feature of tumor budding (TB) is an established independent prognostic factor in colorectal cancer and its importance is growing in other solid cancers, making it a candidate for inclusion in tumor grading schemes. We aimed to compare TB between preoperative biopsies and resection specimens in pulmonary squamous cell carcinoma and assess interobserver variability. A retrospective cohort of 249 consecutive patients primarily resected with LUSC in Bern (2000-2013, N=136) and Lausanne (2005-2020, N=113) with available preoperative biopsies was analyzed for TB and additional histomorphological parameters such as spread through airspaces (STAS) and desmoplasia by two expert pathologists. Results were correlated with clinicopathological parameters and survival. In resection specimens, peritumoral budding (PTB) score was low (0-4 buds/0.785 mm²) in 47.6%, intermediate (5-9 buds/0.785 mm²) in 27.4%, and high (≥10 buds/0.785 mm²) in 25% of cases (median bud count = 5, IQR = 0 - 26). Both the absolute number of buds and TB score were similar when comparing tumor edge and intratumoral zone (p=0.192) but significantly different from the score obtained in the biopsy (p<0.001). Interobserver variability was moderate, regardless of score location (Cohen's kappa 0.59). The discrepant cases were reassessed, and consensus was reached in all cases with identification of causes of discordance. TB score was significantly associated with stage (p=0.002), presence of lymph node (p=0.033) and distant metastases (p=0.020), without significant correlation with overall survival, tumor size or pleural invasion. Desmoplasia was significantly associated with higher PTB (p<0.001). STAS was present in 34% and associated with lower PTB (p<0.001). To conclude, despite confirming TB as a reproducible factor in LUSC, we disclose areas of scoring ambiguity. Preoperative biopsy evaluation was insufficient in establishing the final tumor budding score of the resected tumor.
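
Note on the agreement statistic: the abstract above reports Cohen's kappa for two pathologists. As a rough illustration of how such a value is obtained, the sketch below computes kappa from two lists of categorical scores. The function name `cohen_kappa` and the example scores are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who scored the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(ratings_a)
    # Observed agreement: fraction of cases given the same category by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical budding scores for eight cases (illustration only, not study data).
pathologist_1 = ["low", "low", "high", "intermediate", "high", "low", "intermediate", "low"]
pathologist_2 = ["low", "intermediate", "high", "intermediate", "high", "low", "low", "low"]
print(round(cohen_kappa(pathologist_1, pathologist_2), 3))
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why two raters can agree on most cases and still obtain only a moderate value.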

2.
Oral Dis ; 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39039698

ABSTRACT

OBJECTIVE: This study aimed to understand reasons for interobserver variability in the grading of oral epithelial dysplasia (OED) through a survey of pathologists to provide insight for improvements in the reliability and reproducibility of OED diagnoses. METHODS: The study design included quantitative and qualitative methodology. A pre-validated 31-item questionnaire was distributed to general, head and neck, and oral and maxillofacial histopathology specialists worldwide. RESULTS: A total of 132 pathologists participated and completed the questionnaire. Over two-thirds used the three-tier grading system for OED, while about a third used both binary and three-tier systems. Regular reporters of OED preferred the three-tier system and grading architectural features. Continuing education significantly aided recognition of architectural and cytological changes. Irregular epithelial stratification and drop-shaped rete ridges had the lowest prognostic value and recognition scores, while loss of epithelial cell cohesion had the highest. Most participants used clinical information and often sought a second opinion when grading OED. CONCLUSION: Our study has found that frequency of OED reporting and attendance of CME/CPD can play an important role in grading OED. Variations in the prognostic value of individual histological features and the use of clinical information may further contribute to interobserver variability.

3.
Article in English | MEDLINE | ID: mdl-38873843

ABSTRACT

BACKGROUND: Early Barrett cancer can be curatively treated by endoscopic resection. The choice of resection technique, however, whether endoscopic mucosal resection (EMR) or submucosal dissection (ESD), largely depends on the assumed infiltration depth as judged by the endoscopist. The accuracy of endoscopic diagnosis of the degree of cancer infiltration is not known. METHODS: Three to four high-quality images (both in overview and close-up) from 202 early Barrett esophagus cancer cases (82% men, mean age 66.9 years) were selected from our endoscopy database (73.3% in stage T1a and 26.7% in stage T1b). Images were shown to 9 Barrett esophagus experts, with patients' clinical data (age, sex, Barrett esophagus length) and biopsy results. The experts were asked to predict infiltration depth (T1b vs. T1a), and to suggest the appropriate endoscopic resection technique (EMR or ESD, or surgery). Interobserver variability (kappa values) was also determined for these parameters. RESULTS: Overall positive (PPV) and negative predictive values (NPV) to diagnose T1b versus T1a infiltration were 40.7% (95% CI: 36.7, 44.8) and 79.8% (95% CI: 77.5, 81.9), respectively; kappa value was 0.41. Paris classification (kappa 0.51) and suggested treatment also varied between experts. In a post hoc analysis, only the correlation between lesions classified as invisible or flat according to the Paris classification (IIB; 25% of all cases) and the suggested resection technique was better: In this subgroup, EMR was recommended in >80% of cases, with a high complete (basal R0) resection rate (mean of 88.1%). CONCLUSIONS: Precise endoscopic distinction between mucosal and submucosal involvement of Barrett esophagus cancer by experts as a basis for choosing the resection technique has limited predictive values and high interobserver variability. It seems that mainly invisible/flat lesions may result in good resection outcomes when treated by EMR, but this stratification strategy has to be assessed in further studies.

4.
BJU Int ; 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38923789

ABSTRACT

OBJECTIVES: To explore the topic of Prostate Imaging-Reporting and Data System (PI-RADS) interobserver variability, including a discussion of major sources, mitigation approaches, and future directions. METHODS: A narrative review of PI-RADS interobserver variability. RESULTS: PI-RADS was developed in 2012 to set technical standards for prostate magnetic resonance imaging (MRI), reduce interobserver variability at interpretation, and improve diagnostic accuracy in the MRI-directed diagnostic pathway for detection of clinically significant prostate cancer. While PI-RADS has been validated in selected research cohorts with prostate cancer imaging experts, subsequent prospective studies in routine clinical practice demonstrate wide variability in diagnostic performance. Radiologist and biopsy operator experience are the most important contributing drivers of high-quality care among multiple interrelated factors including variability in MRI hardware and technique, image quality, and population and patient-specific factors such as prostate cancer disease prevalence. Iterative improvements in PI-RADS have helped flatten the curve for novice readers and reduce variability. Innovations in image quality reporting, administrative and organisational workflows, and artificial intelligence hold promise in improving variability even further. CONCLUSION: Continued research into PI-RADS is needed to facilitate benchmark creation, reader certification, and independent accreditation, which are systems-level interventions needed to uphold and maintain high-quality prostate MRI across entire populations.

5.
Children (Basel) ; 11(6)2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38929320

ABSTRACT

Accurate measurement of testicular volume (TV) in boys is an important tool in clinical practice, e.g., in varicocele treatment. This study aims to assess the degree of intra- and interobserver variability of testicular volume measurements. In a prospective study, boys between 11 and 17 years of age without testicular pathology were enrolled. Testicular ultrasound was performed by three investigators (A: pediatric radiologist; B: pediatric surgery/urology resident; C: pediatric urologist). Intraobserver variability was calculated in investigators B and C and interobserver variability between all three investigators. A total of 30 boys were enrolled. Mean intraobserver variability in both observers was +0.3% with a range of -39.6 to 51.5%. The proportion of measurements with a difference >20% was 18.6%. The mean interobserver variability was -1.0% (range: -74.1% to 62.8%). The overall proportion of measurements with a difference >20% was 35%. A lower testicular size of < 4 mL showed a significantly higher rate of >20% difference in both the intraobserver group (31.1% vs. 14.4%; p = 0.035) and the interobserver group (63.2% vs. 26.2%; p = 0.000031). Furthermore, the rate of >20% difference was significantly lower in obese compared to non-obese patients in both the intraobserver (2.8% vs. 22.4%; p = 0.0084) and the interobserver group (24% vs. 40.8%, p = 0.0427). Both intraobserver and interobserver variability in ultrasound-based TV measurements in pubertal boys contain a relevant degree of uncertainty that renders them unsuitable for individualized follow-up care. At the cohort level, however, mean differences in ultrasound-based TV measurements are low enough to make ultrasound comparisons reasonable.

6.
Cureus ; 16(5): e59647, 2024 May.
Article in English | MEDLINE | ID: mdl-38832163

ABSTRACT

Objective: To evaluate an artificial intelligence (AI) tool (AIATELLA, version 1.0; AIATELLA Oy, Helsinki, Finland) in interpreting cardiac magnetic resonance (CMR) imaging to produce measurements of the aortic root and valve, comparing its accuracy and efficiency with those of three National Health Service (NHS) cardiologists. Methods: AI-derived aortic root and valve measurements were recorded alongside manual measurements from three experienced NHS consultant cardiologists (CCs) over three separate sites in the northeast part of the United Kingdom. The study utilised a comprehensive dataset of CMR images, with the intraclass correlation coefficient (ICC) being the primary measure of concordance between the AI and the cardiologist assessments. Patient imaging was anonymised and blinded at the point of transfer to a secure data server. Results: The study demonstrates a high level of concordance between AI assessment of the aortic root and valve with NHS cardiologists (ICC of 0.98). Notably, the AI delivered results in 2.6 seconds (+/- 0.532) compared to a mean of 334.5 seconds (+/- 61.9) by the cardiologists, a statistically significant improvement in efficiency without compromising accuracy. Conclusion: AI's accuracy and speed of analysis suggest that it could be a valuable tool in cardiac diagnostics, addressing the challenges of time-consuming and variable clinician-based assessments. This research reinforces AI's role in optimising the patient journey and improving the efficiency of the diagnostic pathway.

7.
Cancers (Basel) ; 16(10)2024 May 15.
Article in English | MEDLINE | ID: mdl-38791956

ABSTRACT

The overexpression of somatostatin receptor type 2 (SSTR2) is a property of various tumor types. Hybrid imaging utilizing [68Ga]1,4,7,10-tetraazacyclododecane-1,4,7,10-tetra-acetic acid (DOTA) may improve the differentiation between tumor and healthy tissue. We conducted an experimental study on 47 anonymized patient cases including 30 meningiomas, 12 PitNET and 5 SBPGL. Four independent observers were instructed to contour the macroscopic tumor volume on planning MRI and then reassess their volumes with the additional information from DOTA-PET/CT. The conformity between observers and reference volumes was assessed. In total, 46 cases (97.9%) were DOTA-avid and included in the final analysis. In eight cases, PET/CT identified additional tumor volume that was not detected by MRI; these PET/CT findings were potentially critical for the treatment plan in four cases. For meningiomas, the interobserver and observer to reference volume conformity indices were higher with PET/CT. For PitNET, the volumes had higher conformity between observers with MRI. With regard to SBPGL, no significant trend towards conformity with the addition of PET/CT information was observed. DOTA PET/CT supports accurate tumor recognition in meningioma and PitNET and is recommended in SSTR2-expressing tumors planned for treatment with highly conformal radiation.

8.
Front Oncol ; 14: 1335623, 2024.
Article in English | MEDLINE | ID: mdl-38800394

ABSTRACT

Purpose: Differences in the contours created during magnetic resonance imaging-guided online adaptive radiotherapy (MRgOART) affect dose distribution. This study evaluated the interobserver error in delineating the organs at risk (OARs) in patients with pancreatic cancer treated with MRgOART. Moreover, we explored the effectiveness of drugs that could suppress peristalsis in restraining intra-fractional motion by evaluating OAR visualization in multiple patients. Methods: This study enrolled three patients who underwent MRgOART for pancreatic cancer. The study cohort was classified into three conditions based on the MRI sequence and butylscopolamine administration (Buscopan): 1, T2 imaging without butylscopolamine administration; 2, T2 imaging with butylscopolamine administration; and 3, multi-contrast imaging with butylscopolamine administration. Four blinded observers visualized the OARs (stomach, duodenum, small intestine, and large intestine) on MR images acquired during the initial and final MRgOART sessions. The contour was delineated on a slice area of ±2 cm surrounding the planning target volume. The Dice similarity coefficient (DSC) was used to evaluate the contour. Moreover, the OARs were visualized on both MR images acquired before and after the contour delineation process during MRgOART to evaluate whether peristalsis could be suppressed. The DSC was calculated for each OAR. Results: Interobserver errors in the OARs (stomach, duodenum, small intestine, large intestine) for the three conditions were 0.636, 0.418, 0.676, and 0.806; 0.725, 0.635, 0.762, and 0.821; and 0.841, 0.677, 0.762, and 0.807, respectively. The DSC was higher in all conditions with butylscopolamine administration compared with those without it, except for the stomach in condition 2, as observed in the last session of MR image. The DSCs for OARs (stomach, duodenum, small intestine, large intestine) extracted before and after contouring were 0.86, 0.78, 0.88, and 0.87; 0.97, 0.94, 0.90, and 0.94; and 0.94, 0.86, 0.89, and 0.91 for conditions 1, 2, and 3, respectively. Conclusion: Butylscopolamine effectively reduced interobserver error and intra-fractional motion during the MRgOART treatment.
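
Note on the overlap metric: the Dice similarity coefficient used in the abstract above is twice the overlap of two contours divided by the sum of their sizes. The following is a minimal sketch, assuming each contour is represented as a set of pixel coordinates. The function name `dice_coefficient` and the two square "contours" are hypothetical and are not taken from the study.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two contours given as sets of pixels."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty contours are treated as identical (a convention)
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical 4 x 4 pixel contours; the second is shifted by one pixel in x.
observer_1 = {(x, y) for x in range(0, 4) for y in range(0, 4)}
observer_2 = {(x, y) for x in range(1, 5) for y in range(0, 4)}
print(dice_coefficient(observer_1, observer_2))
```

A value of 1 means the two contours coincide exactly and 0 means they do not overlap at all, so a higher DSC between observers corresponds to a smaller interobserver difference.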

10.
EJNMMI Rep ; 8(1): 6, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38748042

ABSTRACT

PURPOSE: To determine the efficacy and safety of target volume determination by 18F-fluorodeoxyglucose positron emission tomography-computed tomography (PET-CT) for intensity-modulated radiation therapy (IMRT) for locally advanced head and neck squamous cell carcinoma (HNSCC) extending into the oral cavity or oropharynx. METHODS: We prospectively treated 10 consecutive consenting patients with HNSCC using IMRT, with target volumes determined by PET-CT. Gross tumor volume (GTV) and clinical target volume (CTV) at the oral level were determined by two radiation oncologists for CT, magnetic resonance imaging (MRI), and PET-CT. Differences in target volume (GTVPET, GTVCT, GTVMRI, CTVPET, CTVCT, and CTVMRI) for each modality and the interobserver variability of the target volume were evaluated using the Dice similarity coefficient and Hausdorff distance. Clinical outcomes, including acute adverse events (AEs) and local control, were evaluated. RESULTS: The mean GTV was smallest for GTVPET, followed by GTVCT and GTVMRI. There was a significant difference between GTVPET and GTVMRI, but not between the other two groups. The interobserver variability of target volume with PET-CT was significantly less than that with CT or MRI for GTV and tended to be less for CTV, but there was no significant difference in CTV between the modalities. Grade ≤ 3 acute dermatitis, mucositis, and dysphagia occurred in 55%, 88%, and 22% of patients, respectively, but no grade 4 AEs were observed. There was no local recurrence at the oral level after a median follow-up period of 37 months (range, 15-55 months). CONCLUSIONS: The results suggest that the target volume determined by PET-CT could safely reduce GTV size and interobserver variability in patients with locally advanced HNSCC extending into the oral cavity or oropharynx undergoing IMRT. Trial registration: UMIN, UMIN000033007. Registered 16 Jun 2018, https://center6.umin.ac.jp/cgi-open-bin/ctr_e/ctr_view.cgi?recptno=R000037631.

11.
Hum Pathol ; 146: 75-85, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38640986

ABSTRACT

INTRODUCTION: Semi-quantitative scoring of various parameters in renal biopsy is accepted as an important tool to assess disease activity and prognostication. There are concerns about the impact of interobserver variability on its prognostic utility, generating a need for computerized quantification. METHODS: We studied 94 patients with renal biopsies, 45 with native diseases and 49 transplant patients with index biopsies for Polyomavirus nephropathy. Chronicity scores were evaluated using two methods. A standard definition diagram was agreed after international consultation and four renal pathologists scored each parameter in a double-blinded manner. Interstitial fibrosis (IF) score was assessed with five different computerized and AI-based algorithms on trichrome and PAS stains. RESULTS: There was strong prognostic correlation with renal function and graft outcome at a median follow-up ranging from 24 to 42 months respectively, independent of moderate concordance for pathologists' scores. IF scores with two of the computerized algorithms showed significant correlation with estimated glomerular filtration rate (eGFR) at biopsy but not at the end of follow-up. There was poor concordance for AI-based platforms. CONCLUSION: Chronicity scores are robust prognostic tools despite interobserver reproducibility. AI algorithms have absolute precision but are limited by significant variation when different hardware and software algorithms are used for quantification.


Subject(s)
Artificial Intelligence, Kidney, Observer Variation, Humans, Biopsy, Reproducibility of Results, Kidney/pathology, Male, Female, Prognosis, Middle Aged, Microscopy/methods, Computer-Assisted Image Interpretation/methods, Adult, Algorithms, Glomerular Filtration Rate, Fibrosis/pathology, Predictive Value of Tests, Kidney Diseases/pathology, Kidney Diseases/diagnosis, Kidney Transplantation, Aged, Polyomavirus Infections/pathology
12.
Diagnostics (Basel) ; 14(7)2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38611692

ABSTRACT

Prior to the curative resection of colorectal carcinoma (CRC) or pancreatic ductal adenocarcinoma (PDAC), the exclusion of hepatic metastasis using cross-sectional imaging is mandatory. The Doppler perfusion index (DPI) of the liver is a promising method for detecting occult liver metastases, but the underlying visceral duplex sonography is critically viewed in terms of its reproducibility. The aim of this study was to investigate systematically the reproducibility of the measured variables, the calculated blood flow, and the DPI. Between February and September 2023, two examinations were performed on 80 subjects within a period of 0-30 days and at two previously defined quality levels, aligned to the German standards of the DEGUM. Correlation analyses were carried out using Pearson's correlation coefficient (PCC) and the intraclass correlation coefficient (ICC). The diameters, blood flow, and DPI showed a high degree of agreement (PCC of 0.9 and ICC of 0.9 for AHP). Provided that a precise standard of procedure is adhered to, the Doppler examination of AHC, AHP, and PV yields very reproducible blood flows and DPI, which is a prerequisite for a comprehensive investigation of its prognostic value for the prediction of metachronous hepatic metastasis in the context of curatively treated CRC or PDAC.

13.
Article in English | MEDLINE | ID: mdl-38613681

ABSTRACT

PURPOSE: Traffic accidents persist as a leading cause of death. European law mandates the integration of automatic emergency call systems (eCall). Our project focuses on an automated injury prediction device for car accidents, correlating technical and epidemiological input data, such as age, gender, seating position, impact on the passenger compartment, seatbelt usage, impact direction, energy equivalent speed (EES), vehicle class, and airbag deployment. This study aims to explore interobserver variability in data collection quality in real accident scenarios. The assessment will evaluate the impact of user training and measure the time needed for data collection to inform user recommendations for the prehospital assessment. Insights from this study can aid in evaluating the ability of different professional groups to identify potential accident-independent parameters at accident scenes. This includes, among other things, relaying information to dispatchers at rescue control centers, also within the context of telemedicine approaches. METHODS: During group sessions, real accident scenarios were presented both before and after a training presentation. Participants, including laypersons, accident research staff, emergency services, hospital physicians, and emergency physicians, visually assessed injury prediction parameters within a time limit. Training involved defining and explaining parameters using accident images. The study analyzed participant demographics, prediction accuracy, and time required, comparing assessment quality between professional groups and before and after training. RESULTS: In summary, the study demonstrates that training had a significantly positive impact on the quality of assessment for technical accident parameters. The processing time decreased significantly after training. A notable training effect was observed, particularly for the parameters of rigid collision object, affected passenger compartment, EES, and front and side airbags. It was recommended that individuals without prior knowledge should receive training on assessing EES. Overall, it was evident that technical parameters following a traffic accident can be well assessed through training, irrespective of the professional group. CONCLUSION: Significant differences in the assessment quality of technical accident parameters were observed based on technical and medical expertise. After user training, interdisciplinary differences were reconciled, and all professional groups yielded comparable results, indicating that training can enhance the assessment abilities of all participants in the rescue chain, while the time required for assessing accident parameters was significantly reduced with training.

14.
Knee Surg Sports Traumatol Arthrosc ; 32(6): 1363-1369, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38532466

ABSTRACT

PURPOSE: Trochlear dysplasia is one of the main risk factors for recurrent patellar dislocation. The Dejour classification identifies four categories that can be used to classify trochlear dysplasia. The purpose of this study is to evaluate the inter- and intraobserver reliability of the Dejour classification for trochlear dysplasia. The hypothesis was that both intra- and interobserver reliability would be at least moderate. METHODS: This is a cross-sectional reliability study. Twenty-eight examiners from the International Patellofemoral Study Group 2022 meeting evaluated lateral radiographs of the knee and axial magnetic resonance images from 15 cases of patellofemoral instability with trochlear dysplasia. They classified each case according to Dejour's classification for trochlear dysplasia (A-D). There were three rounds: one with only computed radiograph (CR), one with only magnetic resonance imaging (MRI) and one with both. Inter- and intraobserver reliability were calculated using the κ coefficient (0-1). RESULTS: The mean age of patients was 14.6 years; 60% were female and 53% had open physis. The interobserver reliability κ values were 0.2 (CR), 0.13 (MRI) and 0.12 (CR and MRI). The intraobserver reliability κ values were 0.45 (CR), 0.44 (MRI) and 0.65 (CR and MRI). CONCLUSION: The Dejour classification for trochlear dysplasia has slight interobserver reliability and substantial intraobserver reliability. LEVEL OF EVIDENCE: Level I.


Subject(s)
Magnetic Resonance Imaging, Observer Variation, Patellofemoral Joint, Humans, Cross-Sectional Studies, Female, Reproducibility of Results, Adolescent, Male, Patellofemoral Joint/diagnostic imaging, Patellofemoral Joint/pathology, Patellar Dislocation/diagnostic imaging, Patellar Dislocation/classification, Joint Instability/classification, Joint Instability/diagnostic imaging, X-Ray Computed Tomography, Femur/diagnostic imaging, Femur/pathology, Child
15.
Inflamm Bowel Dis ; 2024 Mar 28.
Article in English | MEDLINE | ID: mdl-38547325

ABSTRACT

BACKGROUND: Endoscopy scoring is a key component in the diagnosis of ulcerative colitis (UC) and Crohn's disease (CD). Variability in endoscopic scoring can impact patient trial eligibility and treatment effect measurement. In this study, we examine inter- and intraobserver variability of inflammatory bowel disease endoscopic scoring systems in a systematic review and meta-analysis. METHODS: We included observational studies that evaluated the inter- and intraobserver variability using UC (endoscopic Mayo Score [eMS], Ulcerative Colitis Endoscopic Index of Severity [UCEIS]) or CD (Crohn's Disease Endoscopic Index of Severity [CDEIS], Simple Endoscopic Score for Crohn's Disease [SES-CD]) systems among adults (≥18 years of age) and were published in English. The strength of agreement was categorized as fair, moderate, good, and very good. RESULTS: A total of 6003 records were identified. After screening, 13 studies were included in our analysis. The overall interobserver agreement rates were 0.58 for eMS, 0.66 for UCEIS, 0.80 for CDEIS, and 0.78 for SES-CD. The overall heterogeneity (I2) for these systems ranged from 93.2% to 99.2%. A few studies assessed the intraobserver agreement rate. The overall effect sizes were 0.75 for eMS, 0.87 for UCEIS, 0.89 for CDEIS, and 0.91 for SES-CD. CONCLUSIONS: The interobserver agreement rates for eMS, UCEIS, CDEIS, and SES-CD ranged from moderate to good. The intraobserver agreement rates for eMS, UCEIS, CDEIS, and SES-CD ranged from good to very good. Solutions to improve interobserver agreement could allow for more accurate patient assessment, leading to richer, more accurate clinical management and clinical trial data.


This study examined the inter- and intraobserver variability of inflammatory bowel disease endoscopic scoring systems (endoscopic Mayo Score, Ulcerative Colitis Endoscopic Index of Severity, Crohn's Disease Endoscopic Index of Severity, Simple Endoscopic Score for Crohn's Disease) in a systematic review and meta-analysis.

16.
Cureus ; 16(2): e54389, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38505432

ABSTRACT

INTRODUCTION: Knowledge of the morphology of the suprascapular notch is clinically beneficial in patients with suspected suprascapular nerve compression or palsy. Several classification systems have been proposed for the morphological classification of the suprascapular notch and its several anatomical variations. The purpose of this study was to evaluate the inter- and intraobserver reliability of four different classification systems for suprascapular notch typing analysing shoulder computed tomography (CT) scans. METHODS: Shoulder CT scans from 109 subjects (71.5% males) were examined by three raters of various experience levels, one senior, one experienced, and one junior orthopaedic surgeon. The CT scans were evaluated quantitatively and qualitatively and the suprascapular notch was classified according to four classification systems at two separate timepoints, four weeks apart. To determine consistency among the same or different raters, the Kappa statistic was performed and intrarater reliability for each rater between the first and the second evaluation was assessed using Cohen's kappa. Reliability across all raters at each timepoint was assessed using the Fleiss kappa. RESULTS: Agreement was almost perfect for all the classification systems and amongst all raters, regardless of their experience level. There were no significant differences between the raters on any of the evaluations. The overall interobserver agreement for all classifications was almost perfect. CONCLUSION: The four suprascapular notch classification systems are reliable, and the rater's experience level has no impact on the evaluation.
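
Note on multi-rater agreement: the abstract above uses the Fleiss kappa to summarise agreement across three raters. As a hedged illustration, the sketch below implements the standard Fleiss formula from a table of counts, where each row is a subject and each column a category. The function name `fleiss_kappa` and the example counts are hypothetical and do not reproduce the study data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters placing subject i in category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    total = n_subjects * n_raters
    # Overall proportion of ratings falling in each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Per-subject agreement: share of rater pairs that agree on that subject.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_subj) / n_subjects
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        return 1.0  # all ratings fall in a single category
    return (p_bar - p_exp) / (1.0 - p_exp)


# Hypothetical counts: 4 scans, 3 raters, 2 notch types (illustration only).
unanimous = [[3, 0], [0, 3], [3, 0], [0, 3]]
split = [[2, 1], [1, 2], [2, 1], [1, 2]]
print(fleiss_kappa(unanimous), round(fleiss_kappa(split), 3))
```

Unlike Cohen's kappa, which compares exactly two raters, this formulation only needs the number of raters choosing each category per subject, so it extends naturally to three or more raters.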

17.
Radiat Oncol J ; 42(1): 63-73, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38549385

ABSTRACT

PURPOSE: To assess the interobserver delineation variability of radiomic features of the parotid gland from computed tomography (CT) images and evaluate the correlation of these features for head and neck cancer (HNC) radiotherapy patients. MATERIALS AND METHODS: Contrast-enhanced CT images of 20 HNC patients were utilized. The parotid glands were delineated by treating radiation oncologists (ROs), a selected RO and AccuContour auto-segmentation software. Dice similarity coefficients (DSCs) between each pair of observers were calculated. A total of 107 radiomic features were extracted, whose robustness to interobserver delineation was assessed using the intraclass correlation coefficient (ICC). Pearson correlation coefficients (r) were calculated to determine the relationship between the features. The influence of excluding unrobust features from normal tissue complication probability (NTCP) modeling was investigated for severe oral mucositis (grade ≥3). RESULTS: The average DSC was 0.84 (95% confidence interval, 0.83-0.86). Most of the shape features demonstrated robustness (ICC ≥0.75), while the first-order and texture features were influenced by delineation variability. Among the three observers investigated, 42 features were sufficiently robust, out of which 36 features exhibited weak correlation (|r|<0.8). No significant difference in the robustness level was found when comparing manual segmentation by a single RO or automated segmentation with the actual clinical contour data made by treating ROs. Excluding unrobust features from the NTCP model for severe oral mucositis did not deteriorate the model performance. CONCLUSION: Interobserver delineation variability had substantial impact on radiomic features of the parotid gland. Both manual and automated segmentation methods contributed similarly to this variation.

18.
Endocrine ; 85(2): 730-736, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38372907

ABSTRACT

PURPOSE: Ultrasound evaluation of thyroid nodules is the preferred technique, but it is dependent on operator interpretation, leading to inter-observer variability. The current study aimed to determine the inter-physician consensus on nodular characteristics, risk categorization in the classification systems, and the need for fine needle aspiration puncture. METHODS: Four endocrinologists from the same center blindly evaluated 100 ultrasound images of thyroid nodules from 100 different patients. The following ultrasound features were evaluated: composition, echogenicity, margins, calcifications, and microcalcifications. Nodules were also classified according to ATA, EU-TIRADS, K-TIRADS, and ACR-TIRADS classifications. Krippendorff's alpha test was used to assess interobserver agreement. RESULTS: The interobserver agreement for ultrasound features was: Krippendorff's coefficient 0.80 (0.71-0.89) for composition, 0.59 (0.47-0.72) for echogenicity, 0.73 (0.57-0.88) for margins, 0.55 (0.40-0.69) for calcifications, and 0.50 (0.34-0.67) for microcalcifications. The concordance for the classification systems was 0.7 (0.61-0.80) for ATA, 0.63 (0.54-0.73) for EU-TIRADS, 0.64 (0.55-0.73) for K-TIRADS, and 0.68 (0.60-0.77) for ACR-TIRADS. The concordance in the indication of fine needle aspiration puncture (FNA) was 0.86 (0.71-1), 0.80 (0.71-0.88), 0.77 (0.67-0.87), and 0.73 (0.64-0.83) for the systems previously described, respectively. CONCLUSIONS: Interobserver agreement was acceptable for the identification of nodules requiring cytologic study using various classification systems. However, limited concordance was observed in risk stratification and many ultrasonographic characteristics of the nodules.


Subject(s)
Observer Variation , Thyroid Gland , Thyroid Nodule , Ultrasonography , Humans , Thyroid Nodule/diagnostic imaging , Thyroid Nodule/pathology , Thyroid Nodule/classification , Ultrasonography/methods , Female , Male , Middle Aged , Thyroid Gland/diagnostic imaging , Thyroid Gland/pathology , Adult , Aged , Biopsy, Fine-Needle
19.
Eur Radiol ; 34(7): 4494-4503, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38165429

ABSTRACT

OBJECTIVES: The aim of this study is to improve the reliability of subjective image quality (IQ) assessment using a pairwise comparison (PC) method instead of a Likert scale method in abdominal CT scans. METHODS: Abdominal CT scans (single-center) were retrospectively selected between September 2019 and February 2020 in a prior study. Sample variance in IQ was obtained by adding artificial noise using dedicated reconstruction software, including reconstructions with filtered backprojection and varying iterative reconstruction strengths. Two datasets (each n = 50) were composed with either higher or lower IQ variation, with the 25 original scans being part of both datasets. Using in-house developed software, six observers (five radiologists, one resident) rated both datasets via both the PC method (forcing observers to choose the preferred scan out of each pair of scans, resulting in a ranking) and a 5-point Likert scale. The PC method was optimized using a sorting algorithm to minimize the number of necessary comparisons. The inter- and intraobserver agreements were assessed for both methods with the intraclass correlation coefficient (ICC). RESULTS: Twenty-five patients (mean age 61 years ± 15.5; 56% men) were evaluated. The ICC for interobserver agreement for the high-variation dataset increased from 0.665 (95%CI 0.396-0.814) to 0.785 (95%CI 0.676-0.867) when the PC method was used instead of a Likert scale. For the low-variation dataset, the ICC increased from 0.276 (95%CI 0.034-0.500) to 0.562 (95%CI 0.337-0.729). Intraobserver agreement increased for four out of six observers. CONCLUSION: The PC method is more reliable for subjective IQ assessment, as indicated by improved inter- and intraobserver agreement. CLINICAL RELEVANCE STATEMENT: This study shows that the pairwise comparison method is a more reliable method for subjective image quality assessment. Improved reliability is of key importance for optimization studies, validation of automatic image quality assessment algorithms, and training of AI algorithms. KEY POINTS: • Subjective assessment of diagnostic image quality via Likert scale has limited reliability. • A pairwise comparison method improves the inter- and intraobserver agreement. • The pairwise comparison method is more reliable for CT optimization studies.
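The idea of using a sorting algorithm to limit the number of pairwise comparisons can be sketched as follows. The abstract does not name the algorithm or describe the in-house software, so this is an assumption-laden illustration only: it relies on Python's built-in sort, and the names `rank_by_pairwise_comparison`, `simulated_observer`, and `hidden_quality` are hypothetical. The observer is simulated with invented quality scores.

```python
import random
from functools import cmp_to_key


def rank_by_pairwise_comparison(scan_ids, prefer):
    """Rank scans from least to most preferred using pairwise choices.

    `prefer(a, b)` must return whichever of the two scan ids the observer
    prefers. Returns the ranking and the number of comparisons requested.
    """
    comparisons = 0

    def compare(a, b):
        nonlocal comparisons
        comparisons += 1
        # The preferred scan sorts later, i.e. ranks higher.
        return 1 if prefer(a, b) == a else -1

    ranking = sorted(scan_ids, key=cmp_to_key(compare))
    return ranking, comparisons


# Simulated observer: prefers the scan with the higher hidden quality score.
rng = random.Random(0)
scan_ids = list(range(50))
hidden_quality = dict(zip(scan_ids, rng.sample(range(1000), 50)))


def simulated_observer(a, b):
    return a if hidden_quality[a] > hidden_quality[b] else b


ranking, used = rank_by_pairwise_comparison(scan_ids, simulated_observer)
exhaustive = len(scan_ids) * (len(scan_ids) - 1) // 2  # all 1225 possible pairs
print(f"comparisons used: {used} of {exhaustive} possible pairs")
```

A comparison sort needs on the order of n log n choices rather than all n(n-1)/2 pairs, which is the practical benefit for human observers. A real observer may be inconsistent between pairs, a case this sketch does not handle.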


Subject(s)
Tomography, X-Ray Computed , Humans , Male , Female , Tomography, X-Ray Computed/methods , Reproducibility of Results , Middle Aged , Retrospective Studies , Observer Variation , Radiographic Image Interpretation, Computer-Assisted/methods , Radiography, Abdominal/methods , Algorithms , Software
20.
Virchows Arch ; 484(1): 61-69, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37924345

ABSTRACT

Hemophagocytic lymphohistiocytosis (HLH) is a rare disease with high mortality. Liver involvement is common (based on elevated liver function tests), with most patients demonstrating acute hepatitis. Liver biopsies are frequently obtained in the setting of suspected HLH to identify erythrophagocytosis, and if present, this finding is thought to suggest or support the diagnosis of HLH. However, there are problems with this approach; in particular, we do not know whether this finding is reproducible or whether it is specific to HLH. Therefore, we conducted a multi-institutional study in which experienced liver pathologists reviewed images taken from liver biopsies from patients with normal liver, acute hepatitis, possible HLH, and clinical HLH to determine if there was agreement about the presence or absence of erythrophagocytosis, and to ascertain whether the finding corresponds to a clinical diagnosis of HLH. Twelve liver pathologists reviewed 141 images in isolation (i.e., no clinical information or diagnosis provided). These came from 32 patients (five normal, 17 acute hepatitis, six HLH, four possible HLH). The pathologists classified each image as negative, equivocal, or positive for erythrophagocytosis. Kappa was 0.08 (no agreement) at the case level and 0.1 at the image level (1.4% agreement, based on two images which were universally considered negative). There was no difference in the proportion of pathologists who diagnosed erythrophagocytosis among those with different diagnoses at the case or image level (p = 0.82 and p = 0.82, respectively). Thus, erythrophagocytosis is an entirely unreliable histologic parameter in the liver, as it is irreproducible and not demonstrably associated with a clinical disease (namely, HLH). Unless and until more reliable guidelines can be established, pathologists should refrain from commenting on the presence or absence of erythrophagocytosis in liver biopsies.
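For readers unfamiliar with multi-rater kappa, the sketch below computes Fleiss' kappa for several raters and three categories. The abstract reports only "Kappa" without naming the variant, so the choice of Fleiss' kappa is an assumption; the function name `fleiss_kappa` and the toy counts are invented and do not reproduce the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    `counts[i][j]` is the number of raters who put image i in category j.
    Every image must be rated by the same number of raters (at least two).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each image needs the same number (>= 2) of ratings")

    # Per-image agreement: share of rater pairs that chose the same category.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    mean_agreement = sum(per_item) / n_items

    # Chance agreement from the overall category proportions.
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    chance_agreement = sum(p * p for p in proportions)
    if chance_agreement == 1.0:
        return 1.0  # a single category used throughout (a convention)
    return (mean_agreement - chance_agreement) / (1.0 - chance_agreement)


# Toy table: 4 images, 3 raters; columns = negative, equivocal, positive.
toy_counts = [
    [3, 0, 0],
    [1, 1, 1],
    [0, 0, 3],
    [2, 1, 0],
]
kappa = fleiss_kappa(toy_counts)
print(round(kappa, 3))  # 0.318
```

Values near 0, such as the 0.08 and 0.1 reported here, indicate agreement that is barely above what chance alone would produce.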


Subject(s)
Hepatitis , Lymphohistiocytosis, Hemophagocytic , Humans , Lymphohistiocytosis, Hemophagocytic/diagnosis , Lymphohistiocytosis, Hemophagocytic/complications , Lymphohistiocytosis, Hemophagocytic/pathology , Acute Disease , Biopsy