Results 1 - 20 of 376
1.
Int J Occup Saf Ergon ; : 1-9, 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38961651

ABSTRACT

Objectives. This study aimed to investigate the consistency between results of the American Conference of Governmental Industrial Hygienists (ACGIH) threshold limit value (TLV) for hand activity and proposed action levels of objective measurements in risk assessments of work-related musculoskeletal disorders. Methods. Wrist velocities and forearm muscular load were measured for 11 assemblers during one working day. Simultaneously, each assembler's hand activity level (HAL) during three sub-cycles was rated twice on two separate occasions by two experts, using a HAL scale. Arm/hand exertion was also rated by the assemblers themselves using a Borg scale. In total, 66 sub-cycles were assessed and assigned to three exposure categories: A) below ACGIH action limit (AL) (green); B) between AL and TLV (yellow); and C) above TLV (red). The median wrist velocity and the 90th percentile of forearm muscular load obtained from the objective measurements corresponding to the sub-cycles were calculated and assigned to two exposure categories: A) below or C) above the proposed action level. Results. The agreement between ACGIH TLV for hand activity and the proposed action level for wrist velocity was 87%. Conclusions. The proposed action level for wrist velocity is highly consistent with the TLV. Additional studies are needed to confirm the results.

2.
Laryngoscope Investig Otolaryngol ; 9(4): e1298, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38974605

ABSTRACT

Background: Dysphagia is commonly evaluated using videofluoroscopy (VFS). As its ratings are usually subjective normal-abnormal ratings, objective measurements have been developed. We compared the inter-rater reliability of the usual VFS ratings to the objective measurement VFS ratings and evaluated their clinical relevance. Methods: Two blinded raters analyzed the subjective normal-abnormal ratings of 77 patients' VFS. Two other blinded raters analyzed the objective measurements of pharyngeal aerated area with bolus held in the oral cavity (PAhold), the pharyngeal area of residual bolus during swallowing (PAmax), the pharyngeal constriction ratio (PCR), the maximum pharyngoesophageal segment opening (PESmax), pharyngoesophageal segment opening duration (POD), airway closure duration (ACD), and total pharyngeal transit time (TPT). We evaluated the inter-rater agreement in the subjective ratings and the objective measurements. Clinical utility analysis compared the measurements with the VFS findings of pharyngeal phase abnormality, penetration/aspiration, and cricopharyngeal relaxation. Results: In the pharyngeal findings, the subjective analysis inter-rater agreement was mainly moderate to strong. The strongest agreements were on the pharyngeal residues and penetration/aspiration findings. The objective measurements had fair to good inter-rater agreement. Clinical utility analysis found statistically significant connections between TPT and pharyngeal phase abnormality, normal PCR and lack of penetration/aspiration, and normal PESmax and normal cricopharyngeal relaxation. Conclusions: The subjective analysis had moderate to strong inter-rater agreement in the pharyngeal VFS findings, especially concerning pharyngeal residues and penetration/aspiration detection, reflecting the efficacy and safety of swallowing. The objective measurements had fair to good inter-observer reproducibility and could thus improve the reliability of VFS diagnostics. Level of evidence: 4.

3.
J Clin Med ; 13(12)2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38930128

ABSTRACT

Background: Chronic leg ulcers present a global challenge in healthcare, necessitating precise wound measurement for effective treatment evaluation. This study is the first to validate the "split-wound design" approach for wound studies using objective measures. We further improved this relatively new approach and combined it with a semi-automated wound measurement algorithm. Method: The algorithm is capable of plotting an objective halving line that is calculated by splitting the bounding box of the wound surface along the longest side. To evaluate this algorithm, we compared the accuracy of the subjective wound halving of manual operators of different backgrounds with the algorithm-generated halving line and the ground truth, in two separate rounds. Results: The median absolute deviation (MAD) from the ground truth of the manual wound halving was 2% and 3% in the first and second round, respectively. On the other hand, the algorithm-generated halving line showed a significantly lower deviation from the ground truth (MAD = 0.3%, p < 0.001). Conclusions: The data suggest that this wound-halving algorithm is suitable and reliable for conducting wound studies. This innovative combination of a semi-automated algorithm paired with a unique study design offers several advantages, including reduced patient recruitment needs, accelerated study planning, and cost savings, thereby expediting evidence generation in the field of wound care. Our findings highlight a promising path forward for improving wound research and clinical practice.
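The halving rule described in this abstract (split the bounding box of the wound surface along its longest side) can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: it assumes an axis-aligned bounding box, a binary mask given as a list of rows, and a line that cuts the longer side of the box at its midpoint. The names `halving_line` and `area_fractions` are hypothetical.

```python
def halving_line(mask):
    """Return the halving line of a binary wound mask.

    mask: list of rows, each a list of 0/1 values (1 = wound pixel).
    Assumption: the bounding box is axis-aligned and the line cuts the
    longer side of the box at its midpoint.
    Returns ("vertical", x) or ("horizontal", y) in pixel-edge coordinates.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        raise ValueError("mask contains no wound pixels")
    top, bottom = min(rows), max(rows)
    left, right = min(cols), max(cols)
    width = right - left + 1
    height = bottom - top + 1
    if width >= height:
        return ("vertical", left + width / 2)
    return ("horizontal", top + height / 2)


def area_fractions(mask, line):
    """Fraction of wound pixels on each side of the halving line."""
    orientation, position = line
    first = second = 0
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            if not value:
                continue
            # Compare the pixel centre with the position of the line.
            centre = (c if orientation == "vertical" else r) + 0.5
            if centre < position:
                first += 1
            else:
                second += 1
    total = first + second
    return first / total, second / total


# Hypothetical 2 x 4 wound: the box is wider than tall, so the line is vertical.
example_mask = [[1, 1, 1, 1],
                [1, 1, 1, 1]]
example_line = halving_line(example_mask)
example_split = area_fractions(example_mask, example_line)
```

For an irregular wound the two fractions will generally differ from 0.5; the deviation from an exact half is the kind of quantity the abstract compares between manual operators and the algorithm.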

4.
Bioengineering (Basel) ; 11(6)2024 May 22.
Article in English | MEDLINE | ID: mdl-38927762

ABSTRACT

Bone marrow edema-like lesions (BMEL) in the knee have been linked to the symptoms and progression of osteoarthritis (OA), a highly prevalent disease with profound public health implications. Manual and semi-automatic segmentations of BMELs in magnetic resonance images (MRI) have been used to quantify the significance of BMELs. However, their utilization is hampered by the labor-intensive and time-consuming nature of the process as well as by annotator bias, especially since BMELs exhibit various sizes and irregular shapes with diffuse signal that lead to poor intra- and inter-rater reliability. In this study, we propose a novel unsupervised method for fully automated segmentation of BMELs that leverages conditional diffusion models, multiple MRI sequences that have different contrast of BMELs, and anomaly detection, and that does not rely on costly and error-prone annotations. We also analyze BMEL segmentation annotations from multiple experts, reporting intra-/inter-rater variability and setting better benchmarks for BMEL segmentation performance.

5.
Medicina (Kaunas) ; 60(6)2024 May 23.
Article in English | MEDLINE | ID: mdl-38929468

ABSTRACT

Background and Objectives: Muscle properties are critical for performance and injury risk, with changes occurring due to physical exertion, aging, and neurological conditions. The MyotonPro device offers a non-invasive method to comprehensively assess muscle biomechanical properties. This systematic review evaluates the reliability of MyotonPro across various muscles for diagnostic purposes. Materials and Methods: Following PRISMA guidelines, a comprehensive literature search was conducted in Medline (PubMed), Ovid (Med), Epistemonikos, Embase, Cochrane Library, Clinical trials.gov, and the WHO International Clinical Trials platform. Studies assessing the reliability of MyotonPro across different muscles were included. A methodological quality assessment was performed using established tools, and reviewers independently conducted data extraction. Statistical analysis involved summarizing intra-rater and inter-rater reliability measures across muscles. Results: A total of 48 studies assessing 31 muscles were included in the systematic review. The intra-rater and inter-rater reliability were consistently high for parameters such as frequency and stiffness in muscles of the lower and upper extremities, as well as other muscle groups. Despite methodological heterogeneity and limited data on specific parameters, MyotonPro demonstrated promising reliability for diagnostic purposes across diverse patient populations. Conclusions: The findings suggest the potential of MyotonPro in clinical assessments for accurate diagnosis, treatment planning, and monitoring of muscle properties. Further research is needed to address limitations and enhance the applicability of MyotonPro in clinical practice. Reliable muscle assessments are crucial for optimizing treatment outcomes and improving patient care in various healthcare settings.


Subject(s)
Muscle, Skeletal , Humans , Reproducibility of Results , Muscle, Skeletal/physiology , Muscle, Skeletal/physiopathology , Diagnostic Tests, Routine/standards , Diagnostic Tests, Routine/methods
6.
Am J Surg ; : 115787, 2024 May 31.
Article in English | MEDLINE | ID: mdl-38944624

ABSTRACT

BACKGROUND: The American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) uses Current Procedural Terminology (CPT) codes for risk-adjusted calculations. This study evaluates the inter-rater reliability of coding colorectal resections across Canada by ACS-NSQIP surgical clinical nurse reviewers (SCNR) and its impact on risk predictions. METHODS: SCNRs in Canada were asked to code simulated operative reports. Percent agreement and free-marginal kappa correlation were calculated. The ACS-NSQIP risk calculator was utilized to illustrate its impact on risk prediction. RESULTS: Responses from 44 of 150 (29.3%) SCNRs revealed 3 to 6 different codes chosen per case, with agreement ranging from 6.7% to 62.3%. Free-marginal kappa correlation ranged from moderate agreement (0.53) to high disagreement (-0.17). ACS-NSQIP risk calculator predicted large absolute differences in risk for serious complications (0.2%-13.7%) and mortality (0.2%-6.3%). CONCLUSION: This study demonstrated low inter-rater reliability in coding ACS-NSQIP colorectal procedures in Canada among SCNRs, impacting risk predictions.
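The free-marginal kappa mentioned in this abstract fixes chance agreement at 1/k for k available categories, which is why strongly split codings can produce negative values. The Python sketch below assumes Randolph's multirater formulation; the abstract does not state which variant the authors used, and the function name `free_marginal_kappa` and the example data are hypothetical.

```python
from collections import Counter


def free_marginal_kappa(ratings_per_case, n_categories):
    """Free-marginal multirater kappa (Randolph's formulation, assumed here).

    ratings_per_case: list of cases, each a list with one label per rater.
    n_categories: number of categories the raters could choose from.
    """
    n_cases = len(ratings_per_case)
    n_raters = len(ratings_per_case[0])
    if n_raters < 2 or any(len(c) != n_raters for c in ratings_per_case):
        raise ValueError("every case needs the same number (>= 2) of ratings")
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    # Count ordered pairs of raters that chose the same label within a case.
    agreeing_pairs = sum(
        count * (count - 1)
        for case in ratings_per_case
        for count in Counter(case).values()
    )
    p_observed = agreeing_pairs / (n_cases * n_raters * (n_raters - 1))
    p_chance = 1 / n_categories  # chance agreement is fixed, not estimated
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical single case: four reviewers split evenly between two codes.
split_case = free_marginal_kappa([["A", "A", "B", "B"]], n_categories=2)
# Hypothetical single case: all four reviewers choose the same code.
unanimous_case = free_marginal_kappa([["A", "A", "A", "A"]], n_categories=4)
```

The evenly split case yields a value below zero, which is the sense in which the abstract speaks of disagreement rather than merely low agreement.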

7.
Complement Med Res ; : 1-8, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38744266

ABSTRACT

BACKGROUND: Neck reflex points or Adler-Langer points are commonly used in neural therapy to detect so-called interference fields. Chronic irritations or inflammations in the sinuses, teeth, tonsils, or ears are supposed to induce tension and tenderness of the soft tissues and short muscles in the upper cervical spine. The individual treatment strategy is based on the results of diagnostic Adler-Langer point palpation. This study investigated the inter- and intra-rater reliability and explored treatment effects. METHODS: We performed a randomized controlled trial with 104 inpatients (80.8% female, 51.8 ± 12.74 years) of a German department for internal and integrative medicine. Patients were randomized to individual neural therapy according to the pathological findings (n = 48) or no treatment (n = 56). In each patient, three experienced raters (20-45 years of experience in neural therapy) and two novice raters (medical students) rated Adler-Langer points rigidity on a standardized rating scale ("strong," "weak," "none"). The patients independently evaluated the tenderness on palpation of the eight points using the same scale. Pressure pain thresholds were assessed at the eight Adler-Langer points. All patients were retested after 30 min. The five raters were blinded to treatment allocation and assessments of the other raters. Video recordings were obtained to assess the consistency of the areas tested by the different raters. RESULTS: Agreement between patients and raters (Cohen's kappa = 0.161-0.400) and inter-rater reliability were low (Fleiss kappa = 0.132-0.150). Moreover, the individual agreement (pre-post comparisons in untreated patients) was similarly low even in experienced raters (Cohen's kappa = 0.099-0.173). Video documentation suggests that raters do not place their fingers in the correct segments (percentage of correct position: 42.0-60.6%). 
Pressure pain thresholds at five of the eight Adler-Langer points showed significant changes after treatment compared to none in the control group. CONCLUSION: Under this artificial experimental setting, this method of Adler-Langer point palpation has not proven to be a reliable diagnostic tool. But it could be shown that, as claimed by the method, the tenderness in five of eight Adler-Langer points decreased after neural therapy.

8.
J Surg Educ ; 81(7): 967-972, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38816336

ABSTRACT

OBJECTIVE: Workplace-based assessments (WBAs) play an important role in the assessment of surgical trainees. Because these assessment tools are utilized by a multitude of faculty, inter-rater reliability is important to consider when interpreting WBA data. Although there is evidence supporting the validity of many of these tools, inter-rater reliability evidence is lacking. This study aimed to evaluate the inter-rater reliability of multiple operative WBA tools utilized in general surgery residency. DESIGN: General surgery residents and teaching faculty were recorded during 6 general surgery operations. Nine faculty raters each reviewed 6 videos and rated each resident on performance (using the Society for Improving Medical Professional Learning, or SIMPL, Performance Scale as well as the operative performance rating system (OPRS) Scale), entrustment (using the ten Cate Entrustment-Supervision Scale), and autonomy (using the Zwisch Scale). The ratings were reviewed for inter-rater reliability using percent agreement and intraclass correlations. PARTICIPANTS: Nine faculty members viewed the videos and assigned ratings for multiple WBAs. RESULTS: Absolute intraclass correlation coefficients for each scale ranged from 0.33 to 0.47. CONCLUSIONS: All single-item WBA scales had low to moderate inter-rater reliability. While rater training may improve inter-rater reliability for single observations, many observations by many raters are needed to reliably assess trainee performance in the workplace.


Subject(s)
Clinical Competence , Educational Measurement , General Surgery , Internship and Residency , Workplace , General Surgery/education , Reproducibility of Results , Humans , Educational Measurement/methods , Education, Medical, Graduate/methods , Video Recording , Faculty, Medical , Male , Female
9.
Forensic Sci Res ; 9(2): owae004, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38765699

ABSTRACT

Age assessment of the living is a fundamental procedure in the process of human identification, in order to guarantee fair treatment of individuals, which has ethical, civil, legal, and medical repercussions. The careful selection of the appropriate methods requires evaluation of several parameters: accuracy, precision of the method, as well as its reproducibility. The approach proposed by Mincer et al., adapted from Demirjian et al. and exploring third molar mineralisation, is one of the most frequently considered for age estimation of the living. Thus, this work aims to assess potential bias in the data collection when applying the classification stages for dental mineralisation adapted by Mincer et al. A total of 102 orthopantomographs, of clinical origin, belonging to individuals aged between 12 and 25 years (mean = 20.12 years, SD = 3.49 years; 65 females, 37 males, all of Portuguese nationality) were included and a retrospective analysis performed by five observers with different levels of experience (high, average, and basic). The performance and agreement between five observers were evaluated using Weighted Cohen's Kappa and the Intraclass Correlation Coefficient. To assess the influence of impaction on third molar classification, variables were tested using ordinal logistic regression Generalised Linear Model. It was observed that there were variations in the number of teeth identified among the observers, but the agreement levels ranged from moderate to substantial (0.4-0.8). Upon closer examination of the results, it was observed that although there were discernible differences between highly experienced observers and those with less experience, the gap was not as significant as initially hypothesised, and a greater disparity between the classifications of the upper (0.24-0.49) and lower third molars (>0.55) was observed. When bone superimposition is present, the classification process is not significantly influenced; however, variation in teeth angulation affects the assessment. The results suggest that with an efficient preparation, the level of experience as a factor can be overcome. Mincer and colleagues' classification system can be replicated with ease and consistency, even though the classification of upper and lower third molars presents distinct challenges.

10.
Cureus ; 16(4): e59135, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38803745

ABSTRACT

Purpose: The purpose of this study was to verify the feasibility and inter-rater reliability of the Japanese version of the Intensive Care Unit Mobility Scale (IMS). Methods: A prospective observational study was conducted at two intensive care units (ICUs) in Japan. The feasibility of the Japanese version of the IMS was assessed by 25 ICU staff (12 physical therapists and 13 nurses) using a 10-item questionnaire. Inter-rater reliability was assessed by two experienced physical therapists and two experienced nurses working with 100 ICU patients using the Japanese version of the IMS. Results: In the questionnaire survey assessing feasibility, a high agreement rate was shown in 8 out of the 10 questions. All respondents could complete the IMS evaluation, and most respondents were able to complete the scoring of the IMS in a short time. The inter-rater reliability of the Japanese version of the IMS on the first day of physical therapy for ICU patients was 0.966 (95% CI: 0.94-0.99) for the weighted kappa coefficient and 0.985 (95% CI: 0.97-0.99) on the ICU discharge date assessment. The weighted κ coefficient showed an "almost perfect agreement" of 0.8 or higher. Conclusion: The Japanese version of the IMS is a feasible tool with strong inter-rater reliability for the measurement of physical activity in ICU patients.

11.
Ultrasound ; 32(2): 76-84, 2024 May.
Article in English | MEDLINE | ID: mdl-38694831

ABSTRACT

Introduction: The British Thyroid Association Ultrasound-classification is a risk stratification model which grades thyroid nodules in U2-5 based on their sonographic appearance. Existence of variability between the ultrasound operators when U-scoring is reported in the literature with some evidence found in the author's department. The aim of this study was to investigate whether there is significant disagreement in the department and identify potential reasons for variability. Methods: Eight operators, radiologists and sonographers, were recruited to grade 33 TNs and answer a tick box questionnaire using the British Thyroid Association lexicon. The inter-operator variability for the U-categories, indication for fine-needle aspiration biopsy and ultrasound features was assessed using Fleiss' kappa and Gwet-AC1. The operators' accuracy was measured against the most experienced operator in the department using Cohen's kappa and percentage agreement. Results: Fair agreement (Fleiss' K = 0.21) was obtained between the participants when U-scoring (U2-5). Fair-to-moderate agreement was noted between sonographers (K = 0.40). Significant variability was demonstrated between radiologists (p > 0.05). Indication for fine-needle aspiration biopsy reached fair to almost substantial agreement (radiologists' AC1 = 0.34, sonographers' AC1 = 0.58, overall AC1 = 0.41). No significant variability measured for echogenicity (K = 0.29), composition (K = 0.33), shape (K = 0.58), margin (K = 0.45), halo (K = 0.34) and vascularity (K = 0.44). Accuracy reached fair agreement (mean Cohen's K = 0.29) and moderate agreement (mean AC1 = 0.53) for the U-categories and fine-needle aspiration biopsy, respectively. Radiologists demonstrated lower accuracy. Conclusion: No significant inter-rater variability in U-scoring or recommending fine-needle aspiration biopsy was demonstrated between all the operators in the department. 
Radiologists showed significant variability in U-scoring and lower accuracy. Reliability and accuracy could be improved by addressing those problematic categories and features identified with this study.

12.
Cureus ; 16(3): e56407, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38638709

ABSTRACT

PURPOSE: Neck pain is a common musculoskeletal disorder. Therefore, establishing effective physical therapy for neck pain is one of the most important issues. In addition, in physical therapy for neck pain, it is important to evaluate the thoracic spine, which is an adjacent region of the neck. The lumbar-locked rotation test is designed to evaluate the rotational range of the thoracic spine. However, the reliability of the test when performed on patients with neck pain has not been confirmed. OBJECTIVE: We aimed to determine the intra- and inter-rater reliability of the lumbar-locked rotation test in patients with neck pain. METHODS: In this study involving 43 patients, two separate examiners measured thoracic spine rotation. Both examiners conducted three measurements for each side, before and after a five-minute interval. Reliability was assessed using various intra-class correlation coefficient (ICC) models. RESULTS: The intra-rater reliability showed ICC values of 0.99 for both examiners. The inter-rater reliability showed ICC values of 0.98 for both right and left thoracic rotations. CONCLUSION: The findings strongly suggest that the lumbar-locked rotation test has high within-session intra- and inter-rater reliability for patients with neck pain. This test can be considered a reliable method of measuring the thoracic spine rotational range of motion in patients with neck pain in clinical practice.
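The abstract reports intraclass correlation coefficients without naming the models used. As one common choice, the sketch below computes ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single measurement) from a subjects-by-raters table. The choice of model, the function name `icc_2_1`, and the rotation values are assumptions made for illustration, not details taken from the study.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of subjects, each a list with one measurement per rater.
    """
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 subjects and the same >= 2 raters for each")
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    rater_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in rater_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


# Hypothetical thoracic rotation angles (degrees) from two examiners.
identical = icc_2_1([[40.0, 40.0], [50.0, 50.0], [60.0, 60.0]])
# Same ranking, but the second examiner always reads 10 degrees higher.
offset = icc_2_1([[40.0, 50.0], [50.0, 60.0], [60.0, 70.0]])
```

Because this model measures absolute agreement, a constant offset between examiners lowers the coefficient even though the two examiners rank the patients identically; a consistency-type ICC would not penalise that offset.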

13.
Med Image Anal ; 94: 103141, 2024 May.
Article in English | MEDLINE | ID: mdl-38489896

ABSTRACT

In the context of automatic medical image segmentation based on statistical learning, raters' variability of ground truth segmentations in training datasets is a widely recognized issue. Indeed, the reference information is provided by experts but bias due to their knowledge may affect the quality of the ground truth data, thus hindering creation of robust and reliable datasets employed in segmentation, classification or detection tasks. In such a framework, automatic medical image segmentation would significantly benefit from utilizing some form of presegmentation during training data preparation process, which could lower the impact of experts' knowledge and reduce time-consuming labeling efforts. The present manuscript proposes a superpixels-driven procedure for annotating medical images. Three different superpixeling methods with two different number of superpixels were evaluated on three different medical segmentation tasks and compared with manual annotations. Within the superpixels-based annotation procedure medical experts interactively select superpixels of interest, apply manual corrections, when necessary, and then the accuracy of the annotations, the time needed to prepare them, and the number of manual corrections are assessed. In this study, it is proven that the proposed procedure reduces inter- and intra-rater variability leading to more reliable annotations datasets which, in turn, may be beneficial for the development of more robust classification or segmentation models. In addition, the proposed approach reduces time needed to prepare the annotations.


Subject(s)
Image Processing, Computer-Assisted , Magnetic Resonance Imaging , Humans , Reproducibility of Results , Magnetic Resonance Imaging/methods , Bias , Image Processing, Computer-Assisted/methods
14.
BMC Nephrol ; 25(1): 94, 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38481181

ABSTRACT

BACKGROUND: The evaluation of inter-rater reliability (IRR) is integral to research designs involving the assessment of observational ratings by two raters. However, existing literature is often heterogeneous in reporting statistical procedures and the evaluation of IRR, although such information can impact subsequent hypothesis testing analyses. METHODS: This paper evaluates a recent publication by Chen et al., featured in BMC Nephrology, aiming to introduce an alternative statistical approach to assessing IRR and discuss its statistical properties. The study underscores the crucial need for selecting appropriate Kappa statistics, emphasizing the accurate computation, interpretation, and reporting of commonly used IRR statistics between two raters. RESULTS: The Cohen's Kappa statistic is typically used for two raters dealing with two categories or for unordered categorical variables having three or more categories. On the other hand, when assessing the concordance between two raters for ordered categorical variables with three or more categories, the commonly employed measure is the weighted Kappa. CONCLUSION: Chen and colleagues might have underestimated the agreement between AU5800 and UN2000. Although the statistical approach adopted in Chen et al.'s research did not alter their findings, it is important to underscore the importance of researchers being discerning in their choice of statistical techniques to address their specific research inquiries.


Subject(s)
Lupus Nephritis , Humans , Creatinine , Reproducibility of Results , Lupus Nephritis/diagnosis , Observer Variation , Epithelial Cells
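The distinction drawn in the abstract above, between unweighted Cohen's kappa for unordered categories and weighted kappa for ordered ones, can be illustrated with a short Python sketch. It uses the standard disagreement-weight formulation with linear or quadratic weights; the function name `weighted_kappa` and the example ratings are hypothetical and are not data from Chen et al.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="none"):
    """Cohen's kappa for two raters, optionally weighted.

    categories: labels in their natural order (order matters for weights).
    weights: "none" (plain Cohen's kappa), "linear" or "quadratic".
    """
    if weights not in ("none", "linear", "quadratic"):
        raise ValueError("weights must be 'none', 'linear' or 'quadratic'")
    k = len(categories)
    n = len(rater_a)
    if k < 2 or n == 0 or n != len(rater_b):
        raise ValueError("need >= 2 categories and paired, non-empty ratings")
    index = {label: i for i, label in enumerate(categories)}

    # Joint distribution of the two raters' classifications.
    joint = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        joint[index[a]][index[b]] += 1 / n
    row = [sum(joint[i]) for i in range(k)]
    col = [sum(joint[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weights == "none":
            return 0.0 if i == j else 1.0
        distance = abs(i - j) / (k - 1)
        return distance if weights == "linear" else distance ** 2

    observed = sum(disagreement(i, j) * joint[i][j]
                   for i in range(k) for j in range(k))
    expected = sum(disagreement(i, j) * row[i] * col[j]
                   for i in range(k) for j in range(k))
    return 1 - observed / expected


# Hypothetical ordered grades from two analysers; the single disagreement
# is between adjacent grades.
levels = ["low", "mid", "high"]
a = ["low", "mid", "high", "low"]
b = ["low", "mid", "high", "mid"]
plain = weighted_kappa(a, b, levels)
linear = weighted_kappa(a, b, levels, weights="linear")
```

In this example the linearly weighted value is higher than the unweighted one, because the only disagreement is a near miss. That is the mechanism by which an unweighted kappa can understate agreement on ordered categories.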
15.
Asian J Psychiatr ; 93: 103944, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38364598

ABSTRACT

This study aims to adapt the Schizophrenia Proneness Instrument, Adult Version (SPI-A) for the assessment of basic symptoms to the Indonesian context (culturally and linguistically) and analyze the inter-rater reliability of the translated version. Following a specific methodology for cultural adaptation, direct and back-translations were conducted together with cognitive interviews to analyze the comprehensibility of the translated version. A linguistic expert analyzed the resulting version to determine its grammatical and linguistic adequacy. Finally, the intraclass correlation coefficient (ICC) of the three expert ratings of the samples (N = 9) was analyzed. The direct and back-translation phases showed good conceptual equivalence to the original version. The cognitive interviews revealed items that were challenging to understand and required revision. The final version also considered the judgments of a linguistic expert for grammatical and conceptual improvements. Inter-rater reliability analysis showed an excellent degree of agreement (ICC value: 0.984; 95% CI: 0.950-0.996). The translated SPI-A fits the Indonesian context and can be used in clinical settings to assess basic symptoms in help-seeking individuals in Indonesia.


Subject(s)
Schizophrenia , Adult , Humans , Schizophrenia/diagnosis , Language , Cross-Cultural Comparison , Indonesia , Reproducibility of Results , Translations , Surveys and Questionnaires
16.
J Neuroradiol ; 51(4): 101184, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38387650

ABSTRACT

BACKGROUND AND PURPOSE: To evaluate the reliability and accuracy of nonaneurysmal perimesencephalic subarachnoid hemorrhage (NAPSAH) on Noncontrast Head CT (NCCT) between numerous raters. MATERIALS AND METHODS: 45 NCCT of adult patients with SAH who also had a catheter angiography (CA) were independently evaluated by 48 diverse raters; 45 raters performed a second assessment one month later. For each case, raters were asked: 1) whether they judged the bleeding pattern to be perimesencephalic; 2) whether there was blood anterior to brainstem; 3) complete filling of the anterior interhemispheric fissure (AIF); 4) extension to the lateral part of the sylvian fissure (LSF); 5) frank intraventricular hemorrhage; 6) whether in the hypothetical presence of a negative CT angiogram they would still recommend CA. An automatic NAPSAH diagnosis was also generated by combining responses to questions 2-5. Reliability was estimated using Gwet's AC1 (κG), and the relationship between the NCCT diagnosis of NAPSAH and the recommendation to perform CA using Cramer's V test. Multi-rater accuracy of NCCT in predicting negative CA was explored. RESULTS: Inter-rater reliability for the presence of NAPSAH was moderate (κG = 0.58; 95%CI: 0.47, 0.69), but improved to substantial when automatically generated (κG = 0.70; 95%CI: 0.59, 0.81). The most reliable criteria were the absence of AIF filling (κG = 0.79) and extension to LSF (κG = 0.79). Mean intra-rater reliability was substantial (κG = 0.65). NAPSAH weakly correlated with CA decision (V = 0.50). Mean sensitivity and specificity were 58% (95%CI: 44%, 71%) and 83 % (95%CI: 72 %, 94%), respectively. CONCLUSION: NAPSAH remains a diagnosis of exclusion. The NCCT diagnosis was moderately reliable and its impact on clinical decisions modest.


Subject(s)
Subarachnoid Hemorrhage , Tomography, X-Ray Computed , Humans , Subarachnoid Hemorrhage/diagnostic imaging , Reproducibility of Results , Female , Male , Middle Aged , Tomography, X-Ray Computed/methods , Aged , Adult , Observer Variation , Sensitivity and Specificity , Computed Tomography Angiography/methods , Cerebral Angiography/methods
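Gwet's AC1, the reliability statistic used in the abstract above, replaces the chance-agreement term of kappa with one that stays small when one category dominates. The Python sketch below follows the usual multi-rater formula for a complete design (every case rated by the same number of raters). The function name `gwet_ac1` and the example data are hypothetical, and the sketch does not reproduce the confidence intervals reported in the study.

```python
from collections import Counter


def gwet_ac1(ratings_per_subject, n_categories=None):
    """Gwet's AC1 for a complete multi-rater design.

    ratings_per_subject: list of subjects, each a list with one label per rater.
    n_categories: number of possible categories; defaults to those observed.
    """
    n_subjects = len(ratings_per_subject)
    n_raters = len(ratings_per_subject[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings_per_subject):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    totals = Counter()
    agreeing_pairs = 0
    for ratings in ratings_per_subject:
        counts = Counter(ratings)
        totals.update(counts)
        # Ordered pairs of raters giving the same label to this subject.
        agreeing_pairs += sum(c * (c - 1) for c in counts.values())
    q = n_categories if n_categories is not None else len(totals)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    p_agree = agreeing_pairs / (n_subjects * n_raters * (n_raters - 1))
    n_ratings = n_subjects * n_raters
    # Chance agreement: sum of pi * (1 - pi) over categories, divided by q - 1.
    p_chance = sum(
        (t / n_ratings) * (1 - t / n_ratings) for t in totals.values()
    ) / (q - 1)
    return (p_agree - p_chance) / (1 - p_chance)


# Hypothetical readings: two raters agree on "yes" for nine scans and
# disagree on one, so one category is very prevalent.
skewed = [["yes", "yes"]] * 9 + [["yes", "no"]]
ac1_skewed = gwet_ac1(skewed)
```

With such skewed prevalence, a kappa that estimates chance agreement from the marginals would be close to zero despite 90% raw agreement, whereas AC1 remains high. This behaviour is a common reason for choosing AC1 when many raters read cases in which one finding is rare.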
17.
Spine Deform ; 12(3): 755-761, 2024 May.
Article in English | MEDLINE | ID: mdl-38336942

ABSTRACT

INTRODUCTION: Spinal measurements play an integral role in surgical planning for a variety of spine procedures. Full-length imaging eliminates distortions that can occur with stitched images. However, these images take radiologists significantly longer to read than conventional radiographs. Artificial intelligence (AI) image analysis software that can make such measurements quickly and reliably would be advantageous to surgeons, radiologists, and the entire health system. MATERIALS AND METHODS: Institutional Review Board approval was obtained for this study. Preoperative full-length standing anterior-posterior and lateral radiographs of patients that were previously measured by fellowship-trained spine surgeons at our institution were obtained. The measurements included lumbar lordosis (LL), greatest coronal Cobb angle (GCC), pelvic incidence (PI), coronal balance (CB), and T1-pelvic angle (T1PA). Inter-rater intra-class correlation (ICC) values were calculated based on an overlapping sample of 10 patients measured by surgeons. Full-length standing radiographs of an additional 100 patients were provided for AI software training. The AI algorithm then measured the radiographs and ICC values were calculated. RESULTS: ICC values for inter-rater reliability between surgeons were excellent and calculated to 0.97 for LL (95% CI 0.88-0.99), 0.78 (0.33-0.94) for GCC, 0.86 (0.55-0.96) for PI, 0.99 for CB (0.93-0.99), and 0.95 for T1PA (0.82-0.99). The algorithm computed the five selected parameters with ICC values between 0.70 and 0.94, indicating excellent reliability. Exemplary for the comparison of AI and surgeons, the ICC for LL was 0.88 (95% CI 0.83-0.92) and 0.93 for CB (0.90-0.95). GCC, PI, and T1PA could be determined with ICC values of 0.81 (0.69-0.87), 0.70 (0.60-0.78), and 0.94 (0.91-0.96) respectively. 
CONCLUSIONS: The AI algorithm presented here demonstrates excellent reliability for most of the parameters and good reliability for PI, with ICC values corresponding to measurements conducted by experienced surgeons. In future, it may facilitate the analysis of large data sets and aid physicians in diagnostics, pre-operative planning, and post-operative quality control.


Subject(s)
Algorithms , Artificial Intelligence , Radiography , Humans , Radiography/methods , Radiography/statistics & numerical data , Reproducibility of Results , Adult , Female , Male , Spine/diagnostic imaging , Spine/surgery , Lordosis/diagnostic imaging , Middle Aged , Observer Variation , Spinal Curvatures/diagnostic imaging , Spinal Curvatures/surgery
18.
Front Sports Act Living ; 6: 1282031, 2024.
Article in English | MEDLINE | ID: mdl-38304420

ABSTRACT

Introduction: The purpose of this study was to investigate inter- and intra-rater reliability as well as the inter-rater interpretation error of ultrasound measurements assessing skeletal muscle architecture and tissue organization of the gastrocnemius medialis (GM) and vastus lateralis (VL) muscles. Methods: The GM and VL of 13 healthy adults (22 ± 3 years) were examined three times with sagittal B-mode ultrasound: an intraday test-retest examination by one investigator (intra-rater) and separate examinations by two investigators (inter-rater). Additionally, images from one investigator were analysed by two interpreters (interpretation error). Muscle architecture was assessed by muscle thickness [MT], fascicle length [FL], and superior and inferior pennation angle [PA]. Muscle tissue organization was determined by spatial frequency analysis (SFA: peak spatial frequency radius, peak -6 dB width, PSFR/P6, normalized peak value of amplitude spectrum [Amax], power within peak [PWP], peak power percent). Reliability of ultrasound examination and image interpretation is presented as intraclass correlation coefficient (ICC), test-retest variability (TRV), standard error of measurement (SEM), and bias and limits of agreement. Results: For MT, GM and VL demonstrated excellent ICCs for inter- and intra-rater reliability and for interpretation error (0.91-0.99), with minimal variability (<5%) and SEM% (<5%). Systematic bias for MT was less than 1 mm. PA and FL showed poor to good ICCs for inter- and intra-rater reliability (0.41-0.90), with moderate variability (<12%), low SEM% (<10%) and systematic bias of 0.1-1.4°. Tissue organization analysis indicated moderate to good ICCs for inter- and intra-rater reliability. Notably, Amax and PWP consistently held the highest ICC values (0.77-0.87) across all analyses but with higher variability (<24%) and SEM% (<18%), compared to lower variability (<9%) and SEM% (<8%) in other tissue organization parameters. Interpretation error of all muscle tissue organization parameters showed excellent ICCs (0.96-0.999) with very low variability (≤1%) and SEM% (<2%), except Amax and PWP (TRV%: <6%; SEM%: <7%). Conclusion: Our findings demonstrated excellent inter- and intra-rater reliability for MT. However, agreement for PA, FL, and SFA parameters was not as strong. Additionally, MT and all SFA parameters exhibited excellent agreement for inter-rater interpretation error. Therefore, the SFA seems to offer the possibility of objectively and reliably evaluating ultrasound images.
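
The abstract reports standard error of measurement, bias, and limits of agreement without giving formulas. The sketch below uses common textbook definitions as an assumption: bias as the mean paired difference, limits of agreement as bias ± 1.96 standard deviations of the differences, and SEM as SD × sqrt(1 − ICC). The function names and the sample thickness values are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def bias_and_limits(first, second):
    """Mean paired difference and 95% limits of agreement.

    Assumes the usual bias +/- 1.96 * SD(differences) definition.
    """
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


def sem_and_percent(values, icc):
    """SEM = SD * sqrt(1 - ICC), plus SEM as a percentage of the mean.

    This SEM formula is an assumption; the abstract does not state one.
    """
    sem = stdev(values) * sqrt(1.0 - icc)
    return sem, 100.0 * sem / mean(values)


# Hypothetical muscle thickness readings (mm) from two raters
rater_a = [20.0, 22.0, 24.0, 26.0]
rater_b = [19.0, 22.0, 23.0, 26.0]
bias, lower, upper = bias_and_limits(rater_a, rater_b)
sem, sem_pct = sem_and_percent(rater_a, icc=0.95)
print(bias, lower, upper, sem, sem_pct)
```

Reporting SEM as a percentage of the mean makes parameters with different units, such as thickness in mm and pennation angle in degrees, comparable on one scale.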

19.
Psychometrika ; 89(2): 517-541, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38190018

ABSTRACT

Most measures of agreement are chance-corrected. They differ in three dimensions: their definition of chance agreement, their choice of disagreement function, and how they handle multiple raters. Chance agreement is usually defined in a pairwise manner, following either Cohen's kappa or Fleiss's kappa. The disagreement function is usually a nominal, quadratic, or absolute value function. How to handle multiple raters, however, is contentious, with the main contenders being Fleiss's kappa, Conger's kappa, and Hubert's kappa, the variant of Fleiss's kappa where agreement is said to occur only if every rater agrees. More generally, multi-rater agreement coefficients can be defined in a g-wise way, where the disagreement weighting function uses g raters instead of two. This paper contains two main contributions. (a) We propose using Fréchet variances to handle the case of multiple raters. The Fréchet variances are intuitive disagreement measures and turn out to generalize the nominal, quadratic, and absolute value functions to the case of more than two raters. (b) We derive the limit theory of g-wise weighted agreement coefficients, with chance agreement of the Cohen-type or Fleiss-type, for the case where every item is rated by the same number of raters. After comparing three confidence interval constructions, we recommend calculating confidence intervals using the arcsine transform or the Fisher transform.
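
For readers unfamiliar with the coefficients named above, the following sketch shows the classical Fleiss's kappa (nominal disagreement, pairwise Fleiss-type chance agreement) for items rated by the same number of raters, together with a generic Fisher-transform interval. It is not the paper's g-wise or Fréchet-variance method. The function names are hypothetical, and the standard error passed to `fisher_interval` is assumed to be supplied by the caller, since the abstract does not give the paper's limit theory.

```python
from math import atanh, tanh


def fleiss_kappa(counts):
    """Classical Fleiss's kappa.

    counts[i][j] is the number of raters who assigned item i to
    category j; every item must have the same number of raters.
    """
    n_items = len(counts)
    n_cats = len(counts[0])
    n_raters = sum(counts[0])

    # Observed agreement: share of agreeing rater pairs, averaged over items
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_obs = sum(per_item) / n_items

    # Chance agreement from the pooled category proportions
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    p_chance = sum(s * s for s in shares)

    return (p_obs - p_chance) / (1.0 - p_chance)


def fisher_interval(estimate, std_error, z=1.96):
    """Interval built on the atanh (Fisher) scale and mapped back.

    Uses the delta method: SE on the transformed scale is
    std_error / (1 - estimate**2). std_error is an assumed input.
    """
    centre = atanh(estimate)
    half = z * std_error / (1.0 - estimate ** 2)
    return tanh(centre - half), tanh(centre + half)


# Hypothetical data: 4 items, 3 raters, 2 categories
ratings = [[3, 0], [0, 3], [2, 1], [3, 0]]
kappa = fleiss_kappa(ratings)
print(kappa, fisher_interval(kappa, std_error=0.2))
```

Working on the transformed scale keeps the interval inside (-1, 1), which a plain estimate ± 1.96 × SE interval does not guarantee for coefficients near the boundary.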


Subject(s)
Psychometrics, Humans, Psychometrics/methods, Statistical Models, Observer Variation, Reproducibility of Results, Statistical Data Interpretation
20.
Nurse Educ Today ; 134: 106083, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38183907

ABSTRACT

OBJECTIVES: The Objective Structured Clinical Examination (OSCE) is an assessment format commonly used to assess undergraduate nursing students. However, in spite of its prominence, relatively little research has been conducted into how OSCE assessors form judgements about student performances, and whether divergent processes of judgement formation have the potential to negatively impact the inter-rater reliability (IRR) of awarded scores. This qualitative study aimed to uncover the cognitive processes which assessors employ when assessing OSCE performances. DESIGN, SETTING, PARTICIPANTS: In order to investigate this, a convenience, purposive sample of 12 assessors watched four videos of students completing single-station OSCEs: two videos of blood pressure measurement, and two of naso-gastric tube insertion. METHODS: Assessors were asked to "think aloud" while watching the videos, and also participated in a semi-structured interview about their assessment practices. RESULTS: Thematic analysis of the qualitative data revealed three themes: observation, processing, and integration. Within each theme, a number of sub-themes were identified, which explain the cognitive mechanisms used by assessors when watching, judging and grading student performances. CONCLUSIONS: Notably, the presence of these mechanisms was not uniform across the sample, indicating that assessors utilise different approaches when viewing and interpreting the same performances. This has the potential to threaten the IRR of awarded scores, and thus the validity of decisions made on the basis of those scores.


Subject(s)
Baccalaureate Nursing Education, Nursing Students, Humans, Reproducibility of Results, Clinical Competence, Educational Measurement, Qualitative Research, Cognition
...