Search | VHL Regional Portal

1.

Preliminary evaluation of a questionnaire for assessing fidelity of early intervention for psychosis services.

Kinkaid, Miriam; Fuhrer, Rebecca; McGowan, Stephen; Malla, Ashok.

Soc Psychiatry Psychiatr Epidemiol ; 2024 Aug 05.

Article in English | MEDLINE | ID: mdl-39102065

ABSTRACT

PURPOSE: Fast, easy, and cost-effective methods are needed for fidelity assessment, quality improvement initiatives, and population-based studies in Early Intervention for Psychosis (EIP) services. Having an online questionnaire assessing the fidelity of EIP services, completed by staff self-reports, and having evidence of reliability and validity, could fill that gap. We assess the reliability and validity of the Early Intervention for Psychosis Services Fidelity Questionnaire (EIPS-FQ), developed in Part I of this set of papers. METHODS: A convenience sample of 10 EIP teams in England was used. Two staff members completed online questionnaires assessing recent and past fidelity. An external rater completed the same questionnaire for the two time periods, using a random sample of patient medical records, program documentation, and interviews with staff. The intra-class correlation coefficient (ICC) was calculated to assess inter-rater reliability. Validity was assessed using Bland-Altman plots, absolute mean differences, and the ICC. RESULTS: The fidelity score measuring recent fidelity ranged from 54.2 to 82.7, out of a possible 100. The ICC assessing reliability of the fidelity score was 0.40 (95% CI: 0.0-0.81). The ICCs for the fidelity sub-category scores ranged from 0 to 0.76. Two sub-categories, comprehensive assessments and family involvement and intervention, had low ICCs, regardless of period examined. CONCLUSIONS: This first attempt at validating the EIPS-FQ has demonstrated that the reliability of the EIPS-FQ is moderate/low, and therefore requires modification prior to use. The next iteration of the fidelity questionnaire will clarify or remove items which had very low fidelity and add evidence-based components not identified in the Delphi exercise.
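The Bland-Altman analysis named in the methods above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `bland_altman` and the scores are hypothetical, and the 1.96 multiplier assumes roughly normally distributed differences.

```python
from statistics import mean, stdev


def bland_altman(self_report, external):
    """Return (bias, lower limit, upper limit) of agreement for paired scores."""
    diffs = [a - b for a, b in zip(self_report, external)]
    bias = mean(diffs)   # mean difference between the two methods
    sd = stdev(diffs)    # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical fidelity scores (0-100) for five teams; NOT data from the study.
staff = [60.0, 72.0, 81.0, 55.0, 68.0]
rater = [58.0, 75.0, 79.0, 50.0, 70.0]
bias, lower, upper = bland_altman(staff, rater)
```

A bias near zero with narrow limits would suggest that staff self-reports and the external rater give interchangeable scores; wide limits would suggest they do not.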

2.

Reliability of Goniometric Techniques for Measuring Hip Flexor Length Using the Modified Thomas Test.

Eimiller, Kira; Stoddard, Emma; Janes, Briana; Smith, Mason; Vincek, Andrew.

Int J Sports Phys Ther ; 19(8): 997-1002, 2024.

Article in English | MEDLINE | ID: mdl-39100940

ABSTRACT

Background: The modified Thomas test (MTT) is commonly used to assess the flexibility of hip musculature, including the iliopsoas, rectus femoris, and tensor fascia latae. This measurement is important to include in a comprehensive musculoskeletal examination. However, existing research shows conflicting results regarding its reliability, particularly due to variations in controlling pelvic tilt during testing, which may lead to inaccurate measurements of hip extension when quantifying the test outcomes. Purpose/Hypothesis: This study aimed to evaluate the intra- and inter-rater reliability of the MTT in assessing hip flexor length using a goniometer. It was hypothesized that controlling for pelvic tilt would enhance the reliability of these measurements. Study Design: Intra- and inter-rater reliability study. Methods: Sixty-four healthy individuals were recruited to participate in this study. The MTT was performed twice on each leg by both an experienced and a student physical therapist. Blinded goniometric measurements for hip extension range of motion (ROM) in the MTT position were taken with neutral pelvic tilt being enforced via palpation. A double-blind protocol was used where both examiners were unaware of each other's measurements and the goniometer was covered to blind the measuring therapist to the values as well. ROM values were entered into a Microsoft Excel spreadsheet and analyzed using SPSS software. Statistical analysis included calculating Intraclass Correlation Coefficients (ICCs) and Standard Errors of Measurement (SEMs). Results: The study included 64 participants (mean age = 23.7 ± 4.34 years). The MTT demonstrated high intra-rater reliability (ICC = 0.911) and inter-rater reliability (ICC = 0.851). The SEMs indicated minimal variability around the mean scores. The average hip extension ROM measured was 5.43 ± 9.73 degrees. Conclusion: These results suggest that the MTT is a reliable tool for assessing hip flexor length in clinical practice, particularly when pelvic tilt is controlled. These results have important implications for accurately testing orthopedic limitations that can contribute to low back, hip, and knee pain. Level of Evidence: 3.

3.

Curiosities in a Table: Learning Points for Responsible Clinical Rating.

Andrade, Chittaranjan.

Indian J Psychol Med ; 46(4): 356-357, 2024 Jul.

Article in English | MEDLINE | ID: mdl-39056029

ABSTRACT

This article presents a table containing redacted data from a real study. The table contains three curiosities: statistical significance in the absence of clinical significance, narrow standard deviations, and the absence of a placebo effect. The data in the table had been obtained by an inexperienced rater; how the inexperience compromised the data is explained. Action points for rater experience, rater training, and rating procedures are suggested.

4.

Reliability of Suicide Risk Estimates: A Vignette Study.

Kolochowski, Finn Dario; Kreckeler, Nina; Forkmann, Thomas; Teismann, Tobias.

Arch Suicide Res ; : 1-12, 2024 Jul 24.

Article in English | MEDLINE | ID: mdl-39045846

ABSTRACT

OBJECTIVE: Suicide risk assessments are obligatory when patients express a death wish in clinical practice. Yet, suicide risk estimates based on unguided risk assessments have been shown to be of low reliability. Since generalizability of previous studies is limited, the current study aimed to assess inter-rater and intra-rater reliability of risk estimates conducted by psychotherapists and psychology students using written case vignettes. METHOD: In total, N = 256 participants (psychology students, psychotherapists) were presented with 24 case vignettes describing patients at either low, moderate, severe or extreme risk of suicide. Participants were asked to assign a level of risk to each single vignette at a baseline assessment and again at a follow-up assessment two weeks later. RESULTS: Risk estimates showed a low inter-rater reliability, both for students (AC1 = .35) and for psychotherapists (AC1 = .44). Intra-rater reliability was moderate for psychotherapists (AC1 = .59) and rather low for psychology students (AC1 = .47). In general, inter- and intra-rater reliability were highest for vignettes displaying "low" and "extreme" risk. CONCLUSIONS: The results highlight that the reliability of unguided suicide risk assessments is questionable. Standardized risk assessment protocols are therefore recommended. Nonetheless, even reliable risk estimation does not imply predictive validity of risk estimates for future suicidal behavior.

Suicide risk estimates have been shown to be of low reliability. Suicide risk estimates by psychotherapists and students also showed low inter-rater and intra-rater reliability in the current study. Reliable risk estimation does not imply predictive validity of risk estimates for future suicidal behavior.

5.

Reliability of the Test of Gross Motor Development Third Edition Among Children with Developmental Coordination Disorder.

Roczniak, Laine; Jutras, Mylène; Lévesque, Caroline; Fortin, Carole.

Phys Occup Ther Pediatr ; : 1-14, 2024 Jul 15.

Article in English | MEDLINE | ID: mdl-39007754

ABSTRACT

AIM: The Test of Gross Motor Development Third Edition (TGMD-3) is used to assess the development of fundamental movement skills in children from 3 to 10 years old. This study aimed to evaluate the intra-rater, inter-rater, and test-retest reliability and to determine the minimal detectable change (MDC) value of the TGMD-3 in children with developmental coordination disorder (DCD). METHODS: The TGMD-3 was administered to 20 children with DCD. The child's fundamental movement skills were recorded using a digital video camera. Reliability was assessed at two occasions by three raters using the generalizability theory. RESULTS: The TGMD-3 demonstrates good inter-rater reliability for the locomotor skills subscale, the ball skills subscale, and the total score (φ = 0.77 - 0.91), while the intra-rater reliability was even higher (φ = 0.94 - 0.97). Test-retest reliability was also shown to be good (φ = 0.79-0.93). The MDC95 was determined to be 10 points. CONCLUSION: This study provides evidence that the TGMD-3 is a reliable test when used to evaluate fundamental movement skills in children with DCD and suggests that an increase of 10 points represents a significant change in the motor function of a child with DCD.
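The relationship between a reliability coefficient, the standard error of measurement (SEM), and the MDC95 mentioned above can be sketched with the conventional formulas SEM = SD × √(1 − reliability) and MDC95 = 1.96 × √2 × SEM. This is an assumption-laden illustration: the study used generalizability theory and may have derived its error term differently, and the input values below are hypothetical, not taken from the paper.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement from a score SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - reliability)


def mdc95(sem_value):
    """Minimal detectable change at the 95% confidence level."""
    return 1.96 * math.sqrt(2.0) * sem_value


# Hypothetical inputs: score SD of 6 points, reliability of 0.91.
s = sem(6.0, 0.91)
m = mdc95(s)
```

The √2 factor reflects that a change score combines measurement error from two test occasions.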

6.

Consistency between the ACGIH TLV for hand activity and proposed action levels for wrist velocity and forearm muscular load based on objective measurements: an example from the assembly industry.

Dahlqvist, Camilla; Arvidsson, Inger; Löfqvist, Lotta; Gremark Simonsen, Jenny.

Int J Occup Saf Ergon ; : 1-9, 2024 Jul 03.

Article in English | MEDLINE | ID: mdl-38961651

ABSTRACT

Objectives. This study aimed to investigate the consistency between results of the American Conference of Governmental Industrial Hygienists (ACGIH) threshold limit value (TLV) for hand activity and proposed action levels of objective measurements in risk assessments of work-related musculoskeletal disorders. Methods. Wrist velocities and forearm muscular load were measured for 11 assemblers during one working day. Simultaneously, each assembler's hand activity level (HAL) during three sub-cycles was rated twice on two separate occasions by two experts, using a HAL scale. Arm/hand exertion was also rated by the assemblers themselves using a Borg scale. In total, 66 sub-cycles were assessed and assigned to three exposure categories: A) below ACGIH action limit (AL) (green); B) between AL and TLV (yellow); and C) above TLV (red). The median wrist velocity and the 90th percentile of forearm muscular load obtained from the objective measurements corresponding to the sub-cycles were calculated and assigned to two exposure categories: A) below or C) above the proposed action level. Results. The agreement between ACGIH TLV for hand activity and the proposed action level for wrist velocity was 87%. Conclusions. The proposed action level for wrist velocity is highly consistent with the TLV. Additional studies are needed to confirm the results.

7.

Arthroscopic Assessment of Temporomandibular Joint Pathologies-Is It Possible for Non-Specialists in Arthroscopy? Analysis of Variability and Reliability of Dental Students' Ratings after a Comprehensive One-Semester Introduction.

Brüning, Lennard-Luca; Rösner, Yannick; Meisgeier, Axel; Neff, Andreas.

J Clin Med ; 13(14)2024 Jul 09.

Article in English | MEDLINE | ID: mdl-39064035

ABSTRACT

Background: Arthroscopy of the temporomandibular joint (TMJ) plays a long-established role in the diagnostics and therapy of patients suffering from arthrogenic temporomandibular disorders (TMDs), which do not respond adequately to conservative/non-invasive therapy. However, the interpretation of arthroscopic findings remains challenging. This study investigates the reliability and variability of assessing arthroscopic views of pathologies in patients with TMDs by non-specialists in arthroscopy and whether a standardized assessment tool may improve correctness. Methods: Following a comprehensive one-semester lecture, dental students in the clinical stage of education were asked to rate 25 arthroscopic views (freeze images and corresponding video clips) regarding the severity of synovitis, adhesions, and degenerative changes on a scale of 0-10 (T1). The results were compared to ratings stated by two European-board-qualified academic OMF surgeons. In a second round (T2), the students were asked to repeat the ratings using a 10-point rating scheme. Results: With regard to all three subcategories, congruency with the surgeons' results at T1 was at a low level (p < 0.05 in 19/75 cases) and even decreased at T2 after the implementation of the TMDs-SevS (p < 0.05 in 38/75 cases). For both T1 and T2, therefore, the inter-rater agreement was at a low level, showing only a slight agreement for all three subcategories (Fleiss' Kappa (κ) between 0.014 and 0.099). Conclusions: The judgement of the arthroscopic pathologies of the TMJ remains an area of temporomandibular surgery that requires wide experience and training in TMDs to achieve expertise in TMJ arthroscopic assessments, which cannot be transferred by theoretical instruction alone.

8.

Inter-rater reliability and clinical relevance of subjective and objective interpretation of videofluoroscopy findings.

Kuuskoski, Jonna; Vanhatalo, Jaakko; Hirvonen, Jussi; Rekola, Jami; Aaltonen, Leena-Maija; Järvenpää, Pia.

Laryngoscope Investig Otolaryngol ; 9(4): e1298, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38974605

ABSTRACT

Background: Dysphagia is commonly evaluated using videofluoroscopy (VFS). As its ratings are usually subjective normal-abnormal ratings, objective measurements have been developed. We compared the inter-rater reliability of the usual VFS ratings to the objective measurement VFS ratings and evaluated their clinical relevance. Methods: Two blinded raters analyzed the subjective normal-abnormal ratings of 77 patients' VFS. Two other blinded raters analyzed the objective measurements of pharyngeal aerated area with bolus held in the oral cavity (PAhold), the pharyngeal area of residual bolus during swallowing (PAmax), the pharyngeal constriction ratio (PCR), the maximum pharyngoesophageal segment opening (PESmax), pharyngoesophageal segment opening duration (POD), airway closure duration (ACD), and total pharyngeal transit time (TPT). We evaluated the inter-rater agreement in the subjective ratings and the objective measurements. Clinical utility analysis compared the measurements with the VFS findings of pharyngeal phase abnormality, penetration/aspiration, and cricopharyngeal relaxation. Results: In the pharyngeal findings, the subjective analysis inter-rater agreement was mainly moderate to strong. The strongest agreements were on the pharyngeal residues and penetration/aspiration findings. The objective measurements had fair to good inter-rater agreement. Clinical utility analysis found statistically significant connections between TPT and pharyngeal phase abnormality, normal PCR and lack of penetration/aspiration, and normal PESmax and normal cricopharyngeal relaxation. Conclusions: The subjective analysis had moderate to strong inter-rater agreement in the pharyngeal VFS findings, especially concerning pharyngeal residues and penetration/aspiration detection, reflecting the efficacy and safety of swallowing. The objective measurements had fair to good inter-observer reproducibility and could thus improve the reliability of VFS diagnostics. Level of evidence: 4.

9.

Inter-rater reliability of ACS-NSQIP colorectal procedure coding in Canada.

Xiong, Yingqi; Spence, Richard T; Hirsch, Greg; Walsh, Mark J; Neumann, Katerina.

Am J Surg ; : 115787, 2024 May 31.

Article in English | MEDLINE | ID: mdl-38944624

ABSTRACT

BACKGROUND: The American College of Surgeons National Surgical Quality Improvement Project (ACS-NSQIP) uses Current Procedural Terminology (CPT) codes for risk-adjusted calculations. This study evaluates the inter-rater reliability of coding colorectal resections across Canada by ACS-NSQIP surgical clinical nurse reviewers (SCNR) and its impact on risk predictions. METHODS: SCNRs in Canada were asked to code simulated operative reports. Percent agreement and free-marginal kappa correlation were calculated. The ACS-NSQIP risk calculator was utilized to illustrate its impact on risk prediction. RESULTS: Responses from 44 of 150 (29.3%) SCNRs revealed 3 to 6 different codes chosen per case, with agreement ranging from 6.7% to 62.3%. Free-marginal kappa correlation ranged from moderate agreement (0.53) to high disagreement (-0.17). ACS-NSQIP risk calculator predicted large absolute differences in risk for serious complications (0.2%-13.7%) and mortality (0.2%-6.3%). CONCLUSION: This study demonstrated low inter-rater reliability in coding ACS-NSQIP colorectal procedures in Canada among SCNRs, impacting risk predictions.
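A free-marginal kappa of the kind named above is usually computed as (Po − 1/k) / (1 − 1/k), where Po is the observed pairwise agreement and k is the number of categories raters may choose from. The sketch below assumes that standard form; the function names and the placeholder code labels are hypothetical, and the abstract does not say how the authors defined k or per-case agreement.

```python
from collections import Counter


def pairwise_agreement(codes):
    """Proportion of rater pairs that chose the same code for one case."""
    n = len(codes)
    counts = Counter(codes)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def free_marginal_kappa(cases, n_categories):
    """Free-marginal multirater kappa; chance agreement is fixed at 1/k."""
    p_o = sum(pairwise_agreement(c) for c in cases) / len(cases)
    p_e = 1.0 / n_categories
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical codes from four raters on two cases; labels are placeholders.
cases = [
    ["code_1", "code_1", "code_1", "code_2"],
    ["code_1", "code_2", "code_3", "code_3"],
]
kappa = free_marginal_kappa(cases, n_categories=3)
```

Because chance agreement does not depend on how often each code is used, the statistic turns negative whenever observed agreement falls below 1/k, which is how a value such as -0.17 can arise.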

10.

Evaluating the Reliability of MyotonPro in Assessing Muscle Properties: A Systematic Review of Diagnostic Test Accuracy.

Lettner, Jonathan; Królikowska, Aleksandra; Ramadanov, Nikolai; Oleksy, Lukasz; Hakam, Hassan Tarek; Becker, Roland; Prill, Robert.

Medicina (Kaunas) ; 60(6)2024 May 23.

Article in English | MEDLINE | ID: mdl-38929468

ABSTRACT

Background and Objectives: Muscle properties are critical for performance and injury risk, with changes occurring due to physical exertion, aging, and neurological conditions. The MyotonPro device offers a non-invasive method to comprehensively assess muscle biomechanical properties. This systematic review evaluates the reliability of MyotonPro across various muscles for diagnostic purposes. Materials and Methods: Following PRISMA guidelines, a comprehensive literature search was conducted in Medline (PubMed), Ovid (Med), Epistemonikos, Embase, Cochrane Library, Clinical trials.gov, and the WHO International Clinical Trials platform. Studies assessing the reliability of MyotonPro across different muscles were included. A methodological quality assessment was performed using established tools, and reviewers independently conducted data extraction. Statistical analysis involved summarizing intra-rater and inter-rater reliability measures across muscles. Results: A total of 48 studies assessing 31 muscles were included in the systematic review. The intra-rater and inter-rater reliability were consistently high for parameters such as frequency and stiffness in muscles of the lower and upper extremities, as well as other muscle groups. Despite methodological heterogeneity and limited data on specific parameters, MyotonPro demonstrated promising reliability for diagnostic purposes across diverse patient populations. Conclusions: The findings suggest the potential of MyotonPro in clinical assessments for accurate diagnosis, treatment planning, and monitoring of muscle properties. Further research is needed to address limitations and enhance the applicability of MyotonPro in clinical practice. Reliable muscle assessments are crucial for optimizing treatment outcomes and improving patient care in various healthcare settings.

Subject(s)

Muscle, Skeletal; Humans; Reproducibility of Results; Muscle, Skeletal/physiology; Muscle, Skeletal/physiopathology; Diagnostic Tests, Routine/standards; Diagnostic Tests, Routine/methods

11.

Evaluation of a Semi-Automated Wound-Halving Algorithm for Split-Wound Design Studies: A Step towards Enhanced Wound-Healing Assessment.

Georg, Paul Julius; Schmid, Meret Emily; Zahia, Sofia; Probst, Sebastian; Cazzaniga, Simone; Hunger, Robert; Bossart, Simon.

J Clin Med ; 13(12)2024 Jun 20.

Article in English | MEDLINE | ID: mdl-38930128

ABSTRACT

Background: Chronic leg ulcers present a global challenge in healthcare, necessitating precise wound measurement for effective treatment evaluation. This study is the first to validate the "split-wound design" approach for wound studies using objective measures. We further improved this relatively new approach and combined it with a semi-automated wound measurement algorithm. Method: The algorithm is capable of plotting an objective halving line that is calculated by splitting the bounding box of the wound surface along the longest side. To evaluate this algorithm, we compared the accuracy of the subjective wound halving of manual operators of different backgrounds with the algorithm-generated halving line and the ground truth, in two separate rounds. Results: The median absolute deviation (MAD) from the ground truth of the manual wound halving was 2% and 3% in the first and second round, respectively. On the other hand, the algorithm-generated halving line showed a significantly lower deviation from the ground truth (MAD = 0.3%, p < 0.001). Conclusions: The data suggest that this wound-halving algorithm is suitable and reliable for conducting wound studies. This innovative combination of a semi-automated algorithm paired with a unique study design offers several advantages, including reduced patient recruitment needs, accelerated study planning, and cost savings, thereby expediting evidence generation in the field of wound care. Our findings highlight a promising path forward for improving wound research and clinical practice.
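The halving step described in the method above (splitting the bounding box of the wound surface along its longest side) might look roughly like this. It is a sketch under stated assumptions: the bounding box is taken to be axis-aligned, the halving line is taken to cross the longest side at its midpoint, and `halving_line` and the outline coordinates are hypothetical. The published algorithm may differ on each of these points.

```python
def halving_line(points):
    """Return the two endpoints of a line that halves the axis-aligned bounding box.

    The line crosses the longer side of the box at its midpoint.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    width, height = x_max - x_min, y_max - y_min
    if width >= height:
        # Box is wider than tall: cut with a vertical line at mid-width.
        x_mid = (x_min + x_max) / 2
        return (x_mid, y_min), (x_mid, y_max)
    # Box is taller than wide: cut with a horizontal line at mid-height.
    y_mid = (y_min + y_max) / 2
    return (x_min, y_mid), (x_max, y_mid)


# Hypothetical wound outline as (x, y) pixel coordinates.
outline = [(0, 0), (10, 1), (8, 4), (1, 3)]
line = halving_line(outline)
```

Halving the bounding box does not guarantee two wound halves of equal area for irregular outlines, which is presumably why the study compares the line against a ground truth.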

12.

Unsupervised Segmentation of Knee Bone Marrow Edema-like Lesions Using Conditional Generative Models.

Yu, Andrew Seohwan; Yang, Mingrui; Lartey, Richard; Holden, William; Ok, Ahmet Hakan; Khan, Sameed; Kim, Jeehun; Winalski, Carl; Subhas, Naveen; Chaudhary, Vipin; Li, Xiaojuan.

Bioengineering (Basel) ; 11(6)2024 May 22.

Article in English | MEDLINE | ID: mdl-38927762

ABSTRACT

Bone marrow edema-like lesions (BMELs) in the knee have been linked to the symptoms and progression of osteoarthritis (OA), a highly prevalent disease with profound public health implications. Manual and semi-automatic segmentations of BMELs in magnetic resonance images (MRI) have been used to quantify the significance of BMELs. However, their utilization is hampered by the labor-intensive and time-consuming nature of the process as well as by annotator bias, especially since BMELs exhibit various sizes and irregular shapes with diffuse signal that lead to poor intra- and inter-rater reliability. In this study, we propose a novel unsupervised method for fully automated segmentation of BMELs that leverages conditional diffusion models, multiple MRI sequences that have different contrast of BMELs, and anomaly detection, and that does not rely on costly and error-prone annotations. We also analyze BMEL segmentation annotations from multiple experts, reporting intra-/inter-rater variability and setting better benchmarks for BMEL segmentation performance.

13.

Evaluation of the departmental inter-rater reliability when scoring thyroid nodules according to the British Thyroid Association Ultrasound-classification model: Is there significant disagreement?

Rtam, Nabil.

Ultrasound ; 32(2): 76-84, 2024 May.

Article in English | MEDLINE | ID: mdl-38694831

ABSTRACT

Introduction: The British Thyroid Association Ultrasound-classification is a risk stratification model which grades thyroid nodules into U2-5 based on their sonographic appearance. Variability between ultrasound operators when U-scoring is reported in the literature, with some evidence found in the author's department. The aim of this study was to investigate whether there is significant disagreement in the department and identify potential reasons for variability. Methods: Eight operators, radiologists and sonographers, were recruited to grade 33 thyroid nodules (TNs) and answer a tick box questionnaire using the British Thyroid Association lexicon. The inter-operator variability for the U-categories, indication for fine-needle aspiration biopsy and ultrasound features was assessed using Fleiss' kappa and Gwet-AC1. The operators' accuracy was measured against the most experienced operator in the department using Cohen's kappa and percentage agreement. Results: Fair agreement (Fleiss' K = 0.21) was obtained between the participants when U-scoring (U2-5). Fair-to-moderate agreement was noted between sonographers (K = 0.40). Significant variability was demonstrated between radiologists (p > 0.05). Indication for fine-needle aspiration biopsy reached fair to almost substantial agreement (radiologists' AC1 = 0.34, sonographers' AC1 = 0.58, overall AC1 = 0.41). No significant variability was measured for echogenicity (K = 0.29), composition (K = 0.33), shape (K = 0.58), margin (K = 0.45), halo (K = 0.34) and vascularity (K = 0.44). Accuracy reached fair agreement (mean Cohen's K = 0.29) and moderate agreement (mean AC1 = 0.53) for the U-categories and fine-needle aspiration biopsy, respectively. Radiologists demonstrated lower accuracy. Conclusion: No significant inter-rater variability in U-scoring or recommending fine-needle aspiration biopsy was demonstrated between all the operators in the department. Radiologists showed significant variability in U-scoring and lower accuracy. Reliability and accuracy could be improved by addressing those problematic categories and features identified with this study.
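Fleiss' kappa, used above for agreement among several operators, can be computed from a subjects-by-categories count table. The sketch below follows the standard textbook formula; the function name and the example table are hypothetical and are not the study's data.

```python
def fleiss_kappa(table):
    """Fleiss' kappa; table[i][j] = number of raters placing subject i in category j.

    Every subject must be rated by the same number of raters.
    """
    n_subjects = len(table)
    n_raters = sum(table[0])
    n_categories = len(table[0])
    total = n_subjects * n_raters
    # Share of all assignments that fell into each category.
    p_j = [sum(row[j] for row in table) / total for j in range(n_categories)]
    # Agreement among rater pairs within each subject.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_bar = sum(p_i) / n_subjects      # mean observed agreement
    p_e = sum(p * p for p in p_j)      # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical counts: 4 nodules, 3 operators, categories U2, U3, U4, U5.
ratings = [
    [3, 0, 0, 0],
    [0, 2, 1, 0],
    [0, 0, 3, 0],
    [0, 1, 1, 1],
]
kappa = fleiss_kappa(ratings)
```

Unlike the free-marginal variant, chance agreement here depends on how often each category is used, so skewed category use can lower kappa even when raw agreement is high. Gwet's AC1, also reported above, is one response to that sensitivity.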

14.

Evaluation of data collection bias of third molar stages of mineralisation for age estimation in the living.

de Oliveira Santos, Inês; Baptista, Isabel Poiares; da Silva, Ricardo Henrique Alves; Cunha, Eugénia.

Forensic Sci Res ; 9(2): owae004, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38765699

ABSTRACT

Age assessment of the living is a fundamental procedure in the process of human identification, in order to guarantee fair treatment of individuals, which has ethical, civil, legal, and medical repercussions. The careful selection of the appropriate methods requires evaluation of several parameters: accuracy, precision of the method, as well as its reproducibility. The approach proposed by Mincer et al., adapted from Demirjian et al. and exploring third molar mineralisation, is one of the most frequently considered for age estimation of the living. Thus, this work aims to assess potential bias in the data collection when applying the classification stages for dental mineralisation adapted by Mincer et al. A total of 102 orthopantomographs, of clinical origin, belonging to individuals aged between 12 and 25 years (mean = 20.12 years, SD = 3.49 years; 65 females, 37 males, all of Portuguese nationality) were included and a retrospective analysis performed by five observers with different levels of experience (high, average, and basic). The performance and agreement between the five observers were evaluated using Weighted Cohen's Kappa and the Intraclass Correlation Coefficient. To assess the influence of impaction on third molar classification, variables were tested using an ordinal logistic regression Generalised Linear Model. It was observed that there were variations in the number of teeth identified among the observers, but the agreement levels ranged from moderate to substantial (0.4-0.8). Upon closer examination of the results, it was observed that although there were discernible differences between highly experienced observers and those with less experience, the gap was not as significant as initially hypothesised, and a greater disparity between the classifications of the upper (0.24-0.49) and lower third molars (>0.55) was observed. When bone superimposition is present, the classification process is not significantly influenced; however, variation in teeth angulation affects the assessment. The results suggest that with an efficient preparation, the level of experience as a factor can be overcome. Mincer and colleagues' classification system can be replicated with ease and consistency, even though the classification of upper and lower third molars presents distinct challenges.

15.

How Reliable are Single-Question Workplace-Based Assessments in Surgery?

Gates, Rebecca S; Krumm, Andrew E; Cate, Olle Ten; Chen, Xilin; Marcotte, Kayla; Thelen, Angela E; Deal, Shanley B; Alseidi, Adnan; Swanson, David; George, Brian C.

J Surg Educ ; 81(7): 967-972, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38816336

ABSTRACT

OBJECTIVE: Workplace-based assessments (WBAs) play an important role in the assessment of surgical trainees. Because these assessment tools are utilized by a multitude of faculty, inter-rater reliability is important to consider when interpreting WBA data. Although there is evidence supporting the validity of many of these tools, inter-rater reliability evidence is lacking. This study aimed to evaluate the inter-rater reliability of multiple operative WBA tools utilized in general surgery residency. DESIGN: General surgery residents and teaching faculty were recorded during 6 general surgery operations. Nine faculty raters each reviewed 6 videos and rated each resident on performance (using the Society for Improving Medical Professional Learning, or SIMPL, Performance Scale as well as the operative performance rating system (OPRS) Scale), entrustment (using the ten Cate Entrustment-Supervision Scale), and autonomy (using the Zwisch Scale). The ratings were reviewed for inter-rater reliability using percent agreement and intraclass correlations. PARTICIPANTS: Nine faculty members viewed the videos and assigned ratings for multiple WBAs. RESULTS: Absolute intraclass correlation coefficients for each scale ranged from 0.33 to 0.47. CONCLUSIONS: All single-item WBA scales had low to moderate inter-rater reliability. While rater training may improve inter-rater reliability for single observations, many observations by many raters are needed to reliably assess trainee performance in the workplace.

Subject(s)

Clinical Competence; Educational Measurement; General Surgery; Internship and Residency; Workplace; General Surgery/education; Reproducibility of Results; Humans; Educational Measurement/methods; Education, Medical, Graduate/methods; Video Recording; Faculty, Medical; Male; Female

16.

Feasibility and Inter-rater Reliability of the Japanese Version of the Intensive Care Unit Mobility Scale.

Yasumura, Daisetsu; Katsukawa, Hajime; Matsuo, Ryu; Kawano, Reo; Taito, Shunsuke; Liu, Keibun; Hodgson, Carol.

Cureus ; 16(4): e59135, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38803745

ABSTRACT

Purpose: The purpose of this study was to verify the feasibility and inter-rater reliability of the Japanese version of the Intensive Care Unit Mobility Scale (IMS). Methods: A prospective observational study was conducted at two intensive care units (ICUs) in Japan. The feasibility of the Japanese version of the IMS was assessed by 25 ICU staff (12 physical therapists and 13 nurses) using a 10-item questionnaire. Inter-rater reliability was assessed by two experienced physical therapists and two experienced nurses working with 100 ICU patients using the Japanese version of the IMS. Results: In the questionnaire survey assessing feasibility, a high agreement rate was shown in 8 out of the 10 questions. All respondents could complete the IMS evaluation, and most respondents were able to complete the scoring of the IMS in a short time. The weighted kappa coefficient for inter-rater reliability of the Japanese version of the IMS was 0.966 (95% CI: 0.94-0.99) on the first day of physical therapy for ICU patients and 0.985 (95% CI: 0.97-0.99) at the ICU discharge assessment. Both weighted κ coefficients were 0.8 or higher, indicating "almost perfect agreement". Conclusion: The Japanese version of the IMS is a feasible tool with strong inter-rater reliability for the measurement of physical activity in ICU patients.
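A weighted kappa for two raters scoring an ordinal scale, as reported above, can be sketched as below. The abstract does not state whether linear or quadratic weights were used, so linear disagreement weights are an assumption here; the function name and the example scores are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, n_levels):
    """Linearly weighted Cohen's kappa for two raters; scores are ints in 0..n_levels-1.

    Undefined (division by zero) if both raters use one and the same level throughout.
    """
    n = len(rater_a)
    # Observed joint proportions of (rater A score, rater B score).
    obs = [[0.0] * n_levels for _ in range(n_levels)]
    for a, b in zip(rater_a, rater_b):
        obs[a][b] += 1.0 / n
    row = [sum(obs[i]) for i in range(n_levels)]
    col = [sum(obs[i][j] for i in range(n_levels)) for j in range(n_levels)]
    observed = expected = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            w = abs(i - j) / (n_levels - 1)  # linear disagreement weight
            observed += w * obs[i][j]
            expected += w * row[i] * col[j]
    return 1.0 - observed / expected


# Hypothetical mobility scores (0-10) from two raters for six patients.
a = [0, 3, 5, 7, 10, 4]
b = [0, 3, 6, 7, 10, 4]
kappa = weighted_kappa(a, b, n_levels=11)
```

Weighting penalises a one-level disagreement far less than a disagreement across the whole scale, which suits ordinal instruments with many levels.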

17.

Low Inter-Rater Reliability and Reproducibility of Neck Reflex/"Adler-Langer" Points in Neural Therapy Diagnostics but Increased Pressure Pain Threshold after Therapy: Results of a Randomized Controlled Observer-Blind Trial.

Choi, Kyung-Eun; Grünert, Jan; Werner, Marc; Cramer, Holger; Anheyer, Dennis; Dobos, Gustav; Saha, Felix J.

Complement Med Res ; : 1-8, 2024 May 14.

Article in English | MEDLINE | ID: mdl-38744266

ABSTRACT

BACKGROUND: Neck reflex points, or Adler-Langer points, are commonly used in neural therapy to detect so-called interference fields. Chronic irritations or inflammations in the sinuses, teeth, tonsils, or ears are supposed to induce tension and tenderness of the soft tissues and short muscles in the upper cervical spine. The individual treatment strategy is based on the results of diagnostic Adler-Langer point palpation. This study investigated the inter- and intra-rater reliability of this palpation and explored treatment effects. METHODS: We performed a randomized controlled trial with 104 inpatients (80.8% female, 51.8 ± 12.74 years) of a German department for internal and integrative medicine. Patients were randomized to individual neural therapy according to the pathological findings (n = 48) or no treatment (n = 56). In each patient, three experienced raters (20-45 years of experience in neural therapy) and two novice raters (medical students) rated Adler-Langer point rigidity on a standardized rating scale ("strong," "weak," "none"). The patients independently evaluated the tenderness on palpation of the eight points using the same scale. Pressure pain thresholds were assessed at the eight Adler-Langer points. All patients were retested after 30 min. The five raters were blinded to treatment allocation and to the assessments of the other raters. Video recordings were obtained to assess the consistency of the areas tested by the different raters. RESULTS: Agreement between patients and raters (Cohen's kappa = 0.161-0.400) and inter-rater reliability (Fleiss' kappa = 0.132-0.150) were low. Moreover, the individual agreement (pre-post comparisons in untreated patients) was similarly low even in experienced raters (Cohen's kappa = 0.099-0.173). Video documentation suggests that raters do not place their fingers in the correct segments (percentage of correct position: 42.0-60.6%). Pressure pain thresholds at five of the eight Adler-Langer points showed significant changes after treatment, compared to none in the control group. CONCLUSION: Under this artificial experimental setting, this method of Adler-Langer point palpation has not proven to be a reliable diagnostic tool. However, as claimed by the method, tenderness at five of the eight Adler-Langer points decreased after neural therapy.
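
The abstract above reports Fleiss' kappa for agreement among five raters on a three-level scale. As a rough illustration of how that statistic is computed, here is a minimal sketch in plain Python. The function name `fleiss_kappa`, the `SCALE` list, and the example ratings are hypothetical and are not taken from the study; the study's own software and settings are not stated in the abstract.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a fixed number of raters per subject.

    ratings: one list per subject, each holding one label per rater.
    categories: all labels the raters were allowed to use.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    if any(label not in categories for row in ratings for label in row):
        raise ValueError("found a label that is not in categories")

    counts = [Counter(row) for row in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over subjects.
    per_subject = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement, from the overall share of ratings in each category.
    shares = [
        sum(c[cat] for c in counts) / (n_subjects * n_raters)
        for cat in categories
    ]
    p_chance = sum(s * s for s in shares)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")

    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical ratings: 4 palpation points, 5 raters each (not study data).
SCALE = ["strong", "weak", "none"]
example = [
    ["strong", "strong", "weak", "strong", "none"],
    ["weak", "none", "weak", "weak", "strong"],
    ["none", "none", "none", "weak", "none"],
    ["strong", "weak", "none", "weak", "strong"],
]
print(round(fleiss_kappa(example, SCALE), 3))
```

Values near 0 indicate agreement close to what chance alone would produce, which is the range the abstract reports (0.132-0.150).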

18.

Intra- and Inter-rater Reliability of the Lumbar-Locked Thoracic Rotation Test in Patients With Neck Pain.

Yoshida, Ryota; Kuruma, Hironobu.

Cureus ; 16(3): e56407, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38638709

ABSTRACT

PURPOSE: Neck pain is a common musculoskeletal disorder. Therefore, establishing effective physical therapy for neck pain is one of the most important issues. In addition, in physical therapy for neck pain, it is important to evaluate the thoracic spine, which is an adjacent region of the neck. The lumbar-locked rotation test is designed to evaluate the rotational range of the thoracic spine. However, the reliability of the test when performed on patients with neck pain has not been confirmed. OBJECTIVE: We aimed to determine the intra- and inter-rater reliability of the lumbar-locked rotation test in patients with neck pain. METHODS: In this study involving 43 patients, two separate examiners measured thoracic spine rotation. Both examiners conducted three measurements for each side, before and after a five-minute interval. Reliability was assessed using various intra-class correlation coefficient (ICC) models. RESULTS: The intra-rater reliability showed ICC values of 0.99 for both examiners. The inter-rater reliability showed ICC values of 0.98 for both right and left thoracic rotations. CONCLUSION: The findings strongly suggest that the lumbar-locked rotation test has high within-session intra- and inter-rater reliability for patients with neck pain. This test can be considered a reliable method of measuring the thoracic spine rotational range of motion in patients with neck pain in clinical practice.
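
The abstract above says reliability was assessed with "various" ICC models but does not name them. As one possible illustration, the sketch below computes ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single measurement) from a subjects-by-raters table. The choice of this form, the function name `icc_2_1`, and the example angles are assumptions for illustration only and are not taken from the study.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rating.

    scores: one row per subject, one column per rater (complete table).
    """
    n = len(scores)        # subjects
    k = len(scores[0])     # raters
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)               # between-subject mean square
    ms_cols = ss_cols / (k - 1)               # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical thoracic rotation angles in degrees: 5 patients, 2 examiners.
angles = [
    [41.0, 42.5],
    [55.0, 54.0],
    [38.5, 39.0],
    [60.0, 61.5],
    [47.0, 46.0],
]
print(round(icc_2_1(angles), 3))
```

Because this form penalizes systematic offsets between raters as well as random scatter, it is a reasonable default when examiners are meant to be interchangeable; a consistency-type ICC would ignore such offsets.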

19.

Evaluating inter-rater reliability in the context of "Sysmex UN2000 detection of protein/creatinine ratio and of renal tubular epithelial cells can be used for screening lupus nephritis": a statistical examination.

Li, Ming; Gao, Qian; Yang, Jing; Yu, Tianfei.

BMC Nephrol ; 25(1): 94, 2024 Mar 13.

Article in English | MEDLINE | ID: mdl-38481181

ABSTRACT

BACKGROUND: The evaluation of inter-rater reliability (IRR) is integral to research designs involving the assessment of observational ratings by two raters. However, existing literature is often heterogeneous in reporting statistical procedures and the evaluation of IRR, although such information can impact subsequent hypothesis testing analyses. METHODS: This paper evaluates a recent publication by Chen et al., featured in BMC Nephrology, aiming to introduce an alternative statistical approach to assessing IRR and discuss its statistical properties. The study underscores the crucial need for selecting appropriate Kappa statistics, emphasizing the accurate computation, interpretation, and reporting of commonly used IRR statistics between two raters. RESULTS: The Cohen's Kappa statistic is typically used for two raters dealing with two categories or for unordered categorical variables having three or more categories. On the other hand, when assessing the concordance between two raters for ordered categorical variables with three or more categories, the commonly employed measure is the weighted Kappa. CONCLUSION: Chen and colleagues might have underestimated the agreement between AU5800 and UN2000. Although the statistical approach adopted in Chen et al.'s research did not alter their findings, it is important to underscore the importance of researchers being discerning in their choice of statistical techniques to address their specific research inquiries.

Subject(s)

Lupus Nephritis; Humans; Creatinine; Reproducibility of Results; Lupus Nephritis/diagnosis; Observer Variation; Epithelial Cells
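
The distinction drawn in the abstract above, between Cohen's kappa for unordered categories and weighted kappa for ordered ones, can be made concrete with a small sketch. The function `cohen_kappa`, its `weights` argument, and the example grades are hypothetical names and data chosen for illustration; they do not reproduce the analysis by Chen et al. or the commentary's own calculations.

```python
def cohen_kappa(rater_a, rater_b, categories, weights=None):
    """Cohen's kappa for two raters.

    categories: labels in their natural order (order matters when weighted).
    weights: None for unweighted, "linear" or "quadratic" for weighted kappa.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    if weights not in (None, "linear", "quadratic"):
        raise ValueError("weights must be None, 'linear' or 'quadratic'")

    n = len(rater_a)
    k = len(categories)
    index = {label: i for i, label in enumerate(categories)}

    # Cross-tabulate the paired ratings as integer counts.
    table = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        table[index[a]][index[b]] += 1
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def penalty(i, j):
        """Disagreement weight: 0 on the diagonal, larger for distant categories."""
        if weights is None:
            return 0.0 if i == j else 1.0
        distance = abs(i - j) / (k - 1)
        return distance if weights == "linear" else distance ** 2

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(penalty(i, j) * table[i][j] for i, j in cells) / n
    expected = sum(penalty(i, j) * row_totals[i] * col_totals[j] for i, j in cells) / n ** 2
    if expected == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - observed / expected


# Hypothetical ordered grades from two analyzers (not data from the paper).
grades = ["negative", "1+", "2+"]
device_1 = ["negative", "1+", "2+", "negative", "1+", "2+"]
device_2 = ["negative", "1+", "2+", "1+", "2+", "1+"]
print(round(cohen_kappa(device_1, device_2, grades), 3))            # 0.25
print(round(cohen_kappa(device_1, device_2, grades, "linear"), 3))  # 0.4
```

In this toy data every disagreement is between adjacent grades, so the weighted statistic is higher than the unweighted one. That is the mechanism by which an unweighted kappa can understate agreement on an ordinal scale.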

20.

Use of superpixels for improvement of inter-rater and intra-rater reliability during annotation of medical images.

Gut, Daniel; Trombini, Marco; Kucybala, Iwona; Krupa, Kamil; Rozynek, Milosz; Dellepiane, Silvana; Tabor, Zbislaw; Wojciechowski, Wadim.

Med Image Anal ; 94: 103141, 2024 May.

Article in English | MEDLINE | ID: mdl-38489896

ABSTRACT

In the context of automatic medical image segmentation based on statistical learning, raters' variability of ground truth segmentations in training datasets is a widely recognized issue. Indeed, the reference information is provided by experts, but bias due to their knowledge may affect the quality of the ground truth data, thus hindering the creation of robust and reliable datasets employed in segmentation, classification, or detection tasks. In such a framework, automatic medical image segmentation would significantly benefit from utilizing some form of presegmentation during the training data preparation process, which could lower the impact of experts' knowledge and reduce time-consuming labeling efforts. The present manuscript proposes a superpixels-driven procedure for annotating medical images. Three different superpixeling methods with two different numbers of superpixels were evaluated on three different medical segmentation tasks and compared with manual annotations. Within the superpixels-based annotation procedure, medical experts interactively select superpixels of interest and apply manual corrections when necessary; the accuracy of the annotations, the time needed to prepare them, and the number of manual corrections are then assessed. In this study, it is proven that the proposed procedure reduces inter- and intra-rater variability, leading to more reliable annotation datasets which, in turn, may be beneficial for the development of more robust classification or segmentation models. In addition, the proposed approach reduces the time needed to prepare the annotations.

Subject(s)

Image Processing, Computer-Assisted; Magnetic Resonance Imaging; Humans; Reproducibility of Results; Magnetic Resonance Imaging/methods; Bias; Image Processing, Computer-Assisted/methods
