Results 1 - 20 of 377
1.
Surg Endosc ; 38(2): 713-719, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38036765

ABSTRACT

INTRODUCTION: Gastroesophageal reflux disease affects a significant portion of the Australian and world population. Minimally invasive laparoscopic fundoplication is a highly effective treatment in appropriately selected patients, with a 90% satisfaction rate. However, up to 5% will undergo revisional surgery. Endoscopy is an important investigation in the evaluation of persistent or new symptoms after fundoplication. Our study sought to evaluate the inter-rater reliability and variability in assessing fundoplication with endoscopy. METHODS: Upper gastrointestinal (UGI) surgeons and gastroenterologists were invited to join the cohort study through their professional membership with two societies based in Australia. Participants completed a two part 25-item multiple choice questionnaire, involving the analysis of ten static endoscopic images post-fundoplication. RESULTS: A total of 101 participants were included in the study (64 UGI surgeons and 37 gastroenterologists). Over 95% of participants were consultant level, working in non-rural tertiary hospitals. Total accuracy for all 10 cases combined was 76% for UGI surgeons and 69.9% for gastroenterologists. In three of the 10 cases, UGI surgeons performed significantly better than gastroenterologists (p < 0.05). When assessing performance across each of the 4 questions for each case, UGI surgeons were more accurate than gastroenterologists in describing the integrity of the wrap (p = 0.014). Inter-rater reliability was low across both groups for most domains (kappa < 1). CONCLUSION: Our study confirms low inter-rater reliability between endoscopists and large variations in reporting. UGI surgeons performed better than gastroenterologists in certain cases, usually when describing the integrity of the fundoplication. Our study provides further support for the use of a standardized reporting system in post-fundoplication patients.


Subjects
Fundoplication, Laparoscopy, Humans, Fundoplication/methods, Cohort Studies, Reproducibility of Results, Laparoscopy/methods, Australia, Treatment Outcome
2.
BMC Nephrol ; 25(1): 94, 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38481181

ABSTRACT

BACKGROUND: The evaluation of inter-rater reliability (IRR) is integral to research designs involving the assessment of observational ratings by two raters. However, existing literature is often heterogeneous in reporting statistical procedures and the evaluation of IRR, although such information can impact subsequent hypothesis testing analyses. METHODS: This paper evaluates a recent publication by Chen et al., featured in BMC Nephrology, aiming to introduce an alternative statistical approach to assessing IRR and discuss its statistical properties. The study underscores the crucial need for selecting appropriate Kappa statistics, emphasizing the accurate computation, interpretation, and reporting of commonly used IRR statistics between two raters. RESULTS: The Cohen's Kappa statistic is typically used for two raters dealing with two categories or for unordered categorical variables having three or more categories. On the other hand, when assessing the concordance between two raters for ordered categorical variables with three or more categories, the commonly employed measure is the weighted Kappa. CONCLUSION: Chen and colleagues might have underestimated the agreement between AU5800 and UN2000. Although the statistical approach adopted in Chen et al.'s research did not alter their findings, it is important to underscore the importance of researchers being discerning in their choice of statistical techniques to address their specific research inquiries.
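
The distinction drawn in this abstract between unweighted and weighted kappa can be illustrated with a minimal sketch. It is not taken from the paper or from Chen et al.; the function name `cohen_kappa`, the weighting options, and the example ratings are hypothetical, and the calculation follows the standard textbook definition of kappa as one minus the ratio of observed to chance-expected (weighted) disagreement.

```python
def cohen_kappa(rater_a, rater_b, categories, weights=None):
    """Cohen's kappa for two raters (illustrative helper, not from the paper).

    categories: labels in their natural order (order matters when weighted).
    weights: None (unweighted), "linear" or "quadratic".
    """
    if weights not in (None, "linear", "quadratic"):
        raise ValueError("weights must be None, 'linear' or 'quadratic'")
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")

    index = {label: i for i, label in enumerate(categories)}
    n = len(rater_a)

    # Cross-tabulate the two raters' scores.
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        counts[index[a]][index[b]] += 1

    # Marginal proportions for each rater.
    row = [sum(counts[i]) / n for i in range(k)]
    col = [sum(counts[i][j] for i in range(k)) / n for j in range(k)]

    def penalty(i, j):
        # Unweighted kappa treats every disagreement as equally severe;
        # weighted kappa penalises disagreements by their ordinal distance.
        if weights is None:
            return 0.0 if i == j else 1.0
        distance = abs(i - j) / (k - 1)
        return distance if weights == "linear" else distance ** 2

    observed = sum(penalty(i, j) * counts[i][j] / n
                   for i in range(k) for j in range(k))
    expected = sum(penalty(i, j) * row[i] * col[j]
                   for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - observed / expected


# Hypothetical ordinal ratings (0 < 1 < 2) of ten items by two raters.
ratings_a = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
ratings_b = [0, 1, 1, 2, 2, 2, 0, 1, 1, 1]

kappa_unweighted = cohen_kappa(ratings_a, ratings_b, [0, 1, 2])
kappa_linear = cohen_kappa(ratings_a, ratings_b, [0, 1, 2], weights="linear")
kappa_quadratic = cohen_kappa(ratings_a, ratings_b, [0, 1, 2], weights="quadratic")
```

In this made-up example every disagreement is between adjacent categories, so the weighted coefficients come out higher than the unweighted one, which is the kind of underestimation the abstract warns about when unweighted kappa is applied to ordered categories.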


Subjects
Lupus Nephritis, Humans, Creatinine, Reproducibility of Results, Lupus Nephritis/diagnosis, Observer Variation, Epithelial Cells
3.
Acta Paediatr ; 113(2): 344-352, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37874018

ABSTRACT

AIM: The aim of this Swedish study was to evaluate the assessment of clinical signs of perceptual disorder in children with cerebral palsy (CP). METHODS: Three experienced raters assessed 56 videos of 19 children from 1 to 18 years of age with bilateral spastic CP, which were recorded by colleagues at an Italian hospital. Six signs were evaluated for inter-rater reliability and criterion validity. Clinical applicability was evaluated by assessing inter-rater reliability between 47 Swedish clinicians, who examined 15 of the videos during face-to-face and online education seminars. There were 41 physiotherapists, two occupational therapists and four doctors, with 1-37 years of clinical experience and a median of 10 years. RESULTS: The experienced raters demonstrated moderate to almost perfect inter-rater reliability (kappa 0.54-0.81) and criterion validity (0.54-0.87) for startle reaction, upper limbs in startle position, averted eye gaze and eye blinking. The clinicians recognised these signs with at least moderate reliability (0.56-0.88). Grimacing and posture freezing were less reliable (0.22-0.35) and valid (0.09-0.50). CONCLUSION: Four of the six signs of perceptual disorder were reliably recognised by experienced raters and by clinicians after education seminars. Extended education and larger study samples are needed to recognise all the signs.


Subjects
Cerebral Palsy, Perceptual Disorders, Child, Humans, Cerebral Palsy/diagnosis, Sweden, Reproducibility of Results, Movement
4.
J Neuroradiol ; 51(4): 101184, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38387650

ABSTRACT

BACKGROUND AND PURPOSE: To evaluate the reliability and accuracy of nonaneurysmal perimesencephalic subarachnoid hemorrhage (NAPSAH) on Noncontrast Head CT (NCCT) between numerous raters. MATERIALS AND METHODS: 45 NCCT of adult patients with SAH who also had a catheter angiography (CA) were independently evaluated by 48 diverse raters; 45 raters performed a second assessment one month later. For each case, raters were asked: 1) whether they judged the bleeding pattern to be perimesencephalic; 2) whether there was blood anterior to brainstem; 3) complete filling of the anterior interhemispheric fissure (AIF); 4) extension to the lateral part of the sylvian fissure (LSF); 5) frank intraventricular hemorrhage; 6) whether in the hypothetical presence of a negative CT angiogram they would still recommend CA. An automatic NAPSAH diagnosis was also generated by combining responses to questions 2-5. Reliability was estimated using Gwet's AC1 (κG), and the relationship between the NCCT diagnosis of NAPSAH and the recommendation to perform CA using Cramer's V test. Multi-rater accuracy of NCCT in predicting negative CA was explored. RESULTS: Inter-rater reliability for the presence of NAPSAH was moderate (κG = 0.58; 95%CI: 0.47, 0.69), but improved to substantial when automatically generated (κG = 0.70; 95%CI: 0.59, 0.81). The most reliable criteria were the absence of AIF filling (κG = 0.79) and extension to LSF (κG = 0.79). Mean intra-rater reliability was substantial (κG = 0.65). NAPSAH weakly correlated with CA decision (V = 0.50). Mean sensitivity and specificity were 58% (95%CI: 44%, 71%) and 83 % (95%CI: 72 %, 94%), respectively. CONCLUSION: NAPSAH remains a diagnosis of exclusion. The NCCT diagnosis was moderately reliable and its impact on clinical decisions modest.


Subjects
Subarachnoid Hemorrhage, X-Ray Computed Tomography, Humans, Subarachnoid Hemorrhage/diagnostic imaging, Reproducibility of Results, Female, Male, Middle Aged, X-Ray Computed Tomography/methods, Aged, Adult, Observer Variation, Sensitivity and Specificity, Computed Tomography Angiography/methods, Cerebral Angiography/methods
5.
Phys Occup Ther Pediatr ; : 1-14, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39007754

ABSTRACT

AIM: The Test of Gross Motor Development Third Edition (TGMD-3) is used to assess the development of fundamental movement skills in children from 3 to 10 years old. This study aimed to evaluate the intra-rater, inter-rater, and test-retest reliability and to determine the minimal detectable change (MDC) value of the TGMD-3 in children with developmental coordination disorder (DCD). METHODS: The TGMD-3 was administered to 20 children with DCD. The child's fundamental movement skills were recorded using a digital video camera. Reliability was assessed at two occasions by three raters using the generalizability theory. RESULTS: The TGMD-3 demonstrates good inter-rater reliability for the locomotor skills subscale, the ball skills subscale, and the total score (φ = 0.77 - 0.91), while the intra-rater reliability was even higher (φ = 0.94 - 0.97). Test-retest reliability was also shown to be good (φ = 0.79-0.93). The MDC95 was determined to be 10 points. CONCLUSION: This study provides evidence that the TGMD-3 is a reliable test when used to evaluate fundamental movement skills in children with DCD and suggests that an increase of 10 points represents a significant change in the motor function of a child with DCD.

6.
Medicina (Kaunas) ; 60(6)2024 May 23.
Article in English | MEDLINE | ID: mdl-38929468

ABSTRACT

Background and Objectives: Muscle properties are critical for performance and injury risk, with changes occurring due to physical exertion, aging, and neurological conditions. The MyotonPro device offers a non-invasive method to comprehensively assess muscle biomechanical properties. This systematic review evaluates the reliability of MyotonPro across various muscles for diagnostic purposes. Materials and Methods: Following PRISMA guidelines, a comprehensive literature search was conducted in Medline (PubMed), Ovid (Med), Epistemonikos, Embase, Cochrane Library, Clinical trials.gov, and the WHO International Clinical Trials platform. Studies assessing the reliability of MyotonPro across different muscles were included. A methodological quality assessment was performed using established tools, and reviewers independently conducted data extraction. Statistical analysis involved summarizing intra-rater and inter-rater reliability measures across muscles. Results: A total of 48 studies assessing 31 muscles were included in the systematic review. The intra-rater and inter-rater reliability were consistently high for parameters such as frequency and stiffness in muscles of the lower and upper extremities, as well as other muscle groups. Despite methodological heterogeneity and limited data on specific parameters, MyotonPro demonstrated promising reliability for diagnostic purposes across diverse patient populations. Conclusions: The findings suggest the potential of MyotonPro in clinical assessments for accurate diagnosis, treatment planning, and monitoring of muscle properties. Further research is needed to address limitations and enhance the applicability of MyotonPro in clinical practice. Reliable muscle assessments are crucial for optimizing treatment outcomes and improving patient care in various healthcare settings.


Subjects
Skeletal Muscle, Humans, Reproducibility of Results, Skeletal Muscle/physiology, Skeletal Muscle/physiopathology, Routine Diagnostic Tests/standards, Routine Diagnostic Tests/methods
7.
BMC Med Res Methodol ; 23(1): 3, 2023 01 05.
Article in English | MEDLINE | ID: mdl-36604617

ABSTRACT

BACKGROUND: In inter-rater agreement studies, the assessment behaviour of raters can be influenced by their experience, training levels, the degree of willingness to take risks, and the availability of clear guidelines for the assessment. When the assessment behaviour of raters differentiates for some levels of an ordinal classification, a grey zone occurs between the corresponding adjacent cells to these levels around the main diagonal of the table. A grey zone introduces a negative bias to the estimate of the agreement level between the raters. In that sense, it is crucial to detect the existence of a grey zone in an agreement table. METHODS: In this study, a framework composed of a metric and the corresponding threshold is developed to identify grey zones in an agreement table. The symmetry model and Cohen's kappa are used to define the metric, and the threshold is based on a nonlinear regression model. A numerical study is conducted to assess the accuracy of the developed framework. Real data examples are provided to illustrate the use of the metric and the impact of identifying a grey zone. RESULTS: The sensitivity and specificity of the proposed framework are shown to be very high under moderate, substantial, and near-perfect agreement levels for [Formula: see text] and [Formula: see text] tables and sample sizes greater than or equal to 100 and 50, respectively. Real data examples demonstrate that when a grey zone is detected in the table, it is possible to report a notably higher level of agreement in the studies. CONCLUSIONS: The accuracy of the proposed framework is sufficiently high; hence, it provides practitioners with a precise way to detect the grey zones in agreement tables.


Subjects
Observer Variation, Humans, Reproducibility of Results
8.
Pediatr Dev Pathol ; 26(2): 106-114, 2023.
Article in English | MEDLINE | ID: mdl-36755427

ABSTRACT

BACKGROUND: Mucosal biopsies in eosinophilic esophagitis (EoE) can exhibit lamina propria (LP) fibrosis, which may portend stenotic complications; however, the histologic diagnosis of LP fibrosis is subjective. We sought to assess and improve the consistency of LP fibrosis diagnosis among our pathologist group. METHODS: At a large pediatric hospital, 25 esophageal biopsy slides from 19 patients (16 with EoE) exhibiting a wide spectrum of LP area, artifacts, and fibrosis severity were scanned into whole-slide images. Staff pediatric pathologists (n = 8) separate from the authors classified each biopsy by LP adequacy and fibrosis severity 1 month before and after completion of an educational tutorial. Consensus was defined as >70% agreement. RESULTS: At baseline, 16/25 (64%) cases reached consensus for no fibrosis (n = 3), fibrosis (n = 7), or inadequate LP (n = 6); agreement was fair (α = 0.34). Post-tutorial, 13/25 (52%) cases reached consensus for no fibrosis (n = 2), fibrosis (n = 7), or inadequate LP (n = 4); agreement was again fair (α = 0.33). There was moderate agreement in grading of fibrosis severity (α = 0.54). CONCLUSION: We document only fair-to-moderate agreement in the diagnosis of esophageal LP fibrosis and adequacy in a large pediatric pathologist group despite targeted education, highlighting a challenge in incorporating this feature into EoE research and clinical decision-making.


Subjects
Eosinophilic Esophagitis, Humans, Child, Biopsy/methods, Eosinophilic Esophagitis/diagnosis, Eosinophilic Esophagitis/pathology, Mucous Membrane/pathology, Esophageal Mucosa/pathology, Fibrosis
9.
Surg Endosc ; 37(3): 2070-2077, 2023 03.
Article in English | MEDLINE | ID: mdl-36289088

ABSTRACT

BACKGROUND: Phase and step annotation in surgical videos is a prerequisite for surgical scene understanding and for downstream tasks like intraoperative feedback or assistance. However, most ontologies are applied on small monocentric datasets and lack external validation. To overcome these limitations an ontology for phases and steps of laparoscopic Roux-en-Y gastric bypass (LRYGB) is proposed and validated on a multicentric dataset in terms of inter- and intra-rater reliability (inter-/intra-RR). METHODS: The proposed LRYGB ontology consists of 12 phase and 46 step definitions that are hierarchically structured. Two board certified surgeons (raters) with > 10 years of clinical experience applied the proposed ontology on two datasets: (1) StraBypass40 consists of 40 LRYGB videos from Nouvel Hôpital Civil, Strasbourg, France and (2) BernBypass70 consists of 70 LRYGB videos from Inselspital, Bern University Hospital, Bern, Switzerland. To assess inter-RR the two raters' annotations of ten randomly chosen videos from StraBypass40 and BernBypass70 each, were compared. To assess intra-RR ten randomly chosen videos were annotated twice by the same rater and annotations were compared. Inter-RR was calculated using Cohen's kappa. Additionally, for inter- and intra-RR accuracy, precision, recall, F1-score, and application dependent metrics were applied. RESULTS: The mean ± SD video duration was 108 ± 33 min and 75 ± 21 min in StraBypass40 and BernBypass70, respectively. The proposed ontology shows an inter-RR of 96.8 ± 2.7% for phases and 85.4 ± 6.0% for steps on StraBypass40 and 94.9 ± 5.8% for phases and 76.1 ± 13.9% for steps on BernBypass70. The overall Cohen's kappa of inter-RR was 95.9 ± 4.3% for phases and 80.8 ± 10.0% for steps. Intra-RR showed an accuracy of 98.4 ± 1.1% for phases and 88.1 ± 8.1% for steps. CONCLUSION: The proposed ontology shows an excellent inter- and intra-RR and should therefore be implemented routinely in phase and step annotation of LRYGB.


Subjects
Gastric Bypass, Laparoscopy, Morbid Obesity, Humans, Morbid Obesity/surgery, Reproducibility of Results, Treatment Outcome, Postoperative Complications/surgery
10.
BMC Geriatr ; 23(1): 803, 2023 12 05.
Article in English | MEDLINE | ID: mdl-38053055

ABSTRACT

BACKGROUND: Worldwide, there is a large and growing group of older adults. Frailty is known as an important discriminatory factor for poor outcomes. The Clinical Frailty Scale (CFS) has become a frequently used frailty instrument in different clinical settings and health care sectors, and it has shown good predictive validity. The aims of this study were to describe and validate the translation and cultural adaptation of the CFS into Swedish (CFS-SWE), and to test the inter-rater reliability (IRR) for registered nurses using the CFS-SWE. METHODS: An observational study design was employed. The ISPOR principles were used for the translation, linguistic validation and cultural adaptation of the scale. To test the IRR, 12 participants were asked to rate 10 clinical case vignettes using the CFS-SWE. The IRR was assessed using intraclass correlation and Krippendorff's alpha agreement coefficient test. RESULTS: The Clinical Frailty Scale was translated and culturally adapted into Swedish and is presented in its final form. The IRR for all raters, measured by an intraclass correlation test, resulted in an absolute agreement value among the raters of 0.969 (95% CI: 0.929-0.991) and a consistency value of 0.979 (95% CI: 0.953-0.994), which indicates excellent reliability. Krippendorff's alpha agreement coefficient for all raters was 0.969 (95% CI: 0.917-0.988), indicating near-perfect agreement. The sensitivity of the reliability was examined by separately testing the IRR of the group of specialised registered nurses and non-specialised registered nurses respectively, with consistent and similar results. CONCLUSION: The Clinical Frailty Scale was translated, linguistically validated and culturally adapted into Swedish following a well-established standard technique. The IRR was excellent, judged by two established, separately used, reliability tests. The reliability test results did not differ between non-specialised and specialised registered nurses. 
However, the use of case vignettes might reduce the generalisability of the reliability findings to real-life settings. The CFS has the potential to be a common reference tool, especially when older adults are treated and rehabilitated in different care sectors.


Subjects
Frailty, Humans, Aged, Frailty/diagnosis, Sweden, Reproducibility of Results, Cross-Cultural Comparison
11.
Teach Learn Med ; 35(5): 609-622, 2023.
Article in English | MEDLINE | ID: mdl-35989668

ABSTRACT

PROBLEM: Some medical schools have incorporated constructed response short answer questions (CR-SAQs) into their assessment toolkits. Although CR-SAQs carry benefits for medical students and educators, the faculty perception that the amount of time required to create and score CR-SAQs is not feasible and concerns about reliable scoring may impede the use of this assessment type in medical education. INTERVENTION: Three US medical schools collaborated to write and score CR-SAQs based on a single vignette. Study participants included faculty question writers (N = 5) and three groups of scorers: faculty content experts (N = 7), faculty non-content experts (N = 6), and fourth-year medical students (N = 7). Structured interviews were performed with question writers and an online survey was administered to scorers to gather information about their process for creating and scoring CR-SAQs. A content analysis was performed on the qualitative data using Bowen's model of feasibility as a framework. To examine inter-rater reliability between the content expert and other scorers, a random selection of fifty student responses from each site were scored by each site's faculty content experts, faculty non-content experts, and student scorers. A holistic rubric (6-point Likert scale) was used by two schools and an analytic rubric (3-4 point checklist) was used by one school. Cohen's weighted kappa (κw) was used to evaluate inter-rater reliability. CONTEXT: This research study was implemented at three US medical schools that are nationally dispersed and have been administering CR-SAQ summative exams as part of their programs of assessment for at least five years. The study exam question was included in an end-of-course summative exam during the first year of medical school. IMPACT: Five question writers (100%) participated in the interviews and twelve scorers (60% response rate) completed the survey. 
Qualitative comments revealed three aspects of feasibility: practicality (time, institutional culture, teamwork), implementation (steps in the question writing and scoring process), and adaptation (feedback, rubric adjustment, continuous quality improvement). The scorers described their experience in terms of the need for outside resources, concern about lack of expertise, and value gained through scoring. Inter-rater reliability between the faculty content expert and student scorers was fair/moderate (κw=.34-.53, holistic rubrics) or substantial (κw=.67-.76, analytic rubric), but much lower between faculty content and non-content experts (κw=.18-.29, holistic rubrics; κw=.59-.66, analytic rubric). LESSONS LEARNED: Our findings show that from the faculty perspective it is feasible to include CR-SAQs in summative exams and we provide practical information for medical educators creating and scoring CR-SAQs. We also learned that CR-SAQs can be reliably scored by faculty without content expertise or senior medical students using an analytic rubric, or by senior medical students using a holistic rubric, which provides options to alleviate the faculty burden associated with grading CR-SAQs.


Subjects
Educational Measurement, Medical Students, Humans, Reproducibility of Results, Feasibility Studies, Learning
12.
J Sports Sci ; 41(19): 1779-1786, 2023 Oct.
Article in English | MEDLINE | ID: mdl-38155177

ABSTRACT

This study examined the reliability of expert tennis coaches/biomechanists to qualitatively assess selected features of the serve with the aid of two-dimensional (2D) video replays. Two expert high-performance coaches rated the serves of 150 male and 150 female players across three different age groups from two different camera viewing angles. Serve performance was rated across 13 variables that represented commonly investigated and coached (serve) mechanics using a 1-7 Likert rating scale. A total of 7800 ratings were performed. The reliability of the experts' ratings was assessed using Krippendorff's alpha. Strong agreement was shown across all age groups and genders when the experts rated the overall serve score (0.727-0.924), power or speed of the serve (0.720-0.907), rhythm (0.744-0.944), quality of the trunk action (0.775-1.000), leg drive (0.731-0.959) and the likelihood of back injury (0.703-0.934). They encountered greater difficulty in consistently rating shoulder internal rotation speed (0.688-0.717). In high-performance settings, the desire for highly precise measurement and large data sets powered by new technologies is commonplace, but this study revealed that tennis experts, through the use of 2D video, can reliably rate important mechanical features of the game's most important shot, the serve.


Subjects
Tennis, Humans, Male, Female, Biomechanical Phenomena, Reproducibility of Results, Upper Extremity, Shoulder
13.
J Digit Imaging ; 36(2): 401-413, 2023 04.
Article in English | MEDLINE | ID: mdl-36414832

ABSTRACT

Radiologists today play a central role in making diagnostic decisions and labeling images for training and benchmarking artificial intelligence (AI) algorithms. A key concern is low inter-reader reliability (IRR) seen between experts when interpreting challenging cases. While team-based decisions are known to outperform individual decisions, inter-personal biases often creep up in group interactions which limit nondominant participants from expressing true opinions. To overcome the dual problems of low consensus and interpersonal bias, we explored a solution modeled on bee swarms. Two separate cohorts, three board-certified radiologists (cohort 1) and five radiology residents (cohort 2), collaborated on a digital swarm platform in real time and in a blinded fashion, grading meniscal lesions on knee MR exams. These consensus votes were benchmarked against clinical (arthroscopy) and radiological (senior-most radiologist) standards of reference using Cohen's kappa. The IRR of the consensus votes was then compared to the IRR of the majority and most confident votes of the two cohorts. IRR was also calculated for predictions from a meniscal lesion detecting AI algorithm. The attending cohort saw an improvement of 23% in IRR of swarm votes (k = 0.34) over majority vote (k = 0.11). Similar improvement of 23% in IRR (k = 0.25) in 3-resident swarm votes over majority vote (k = 0.02) was observed. The 5-resident swarm had an even higher improvement of 30% in IRR (k = 0.37) over majority vote (k = 0.07). The swarm consensus votes outperformed individual and majority vote decisions in both the radiologist and resident cohorts. The attending and resident swarms also outperformed predictions from a state-of-the-art AI algorithm.


Subjects
Artificial Intelligence, Radiologists, Animals, Humans, Consensus, Reproducibility of Results, Intelligence
14.
Article in English | MEDLINE | ID: mdl-38157097

ABSTRACT

The Schedule for Affective Disorders and Schizophrenia for School-Age Children-Present and Lifetime version (K-SADS-PL) is a valuable tool for diagnosing mental disorders in children and adolescents. Previous studies have examined its interrater reliability, but there is limited information on individual disorders and on the updated DSM-5 version. This study aims to analyse the interrater reliability of the Icelandic translation of K-SADS-PL, DSM-5 version. K-SADS-PL was administered to a clinical sample of outpatients from the Icelandic Anxiety Centre for Children, Adolescents, and Young Adults, and The Department of Child and Adolescent Psychiatry at Landspítali, the National University Hospital in Reykjavík, Iceland. In total, 135 patients aged 6-18 were included in this study. We assessed the interrater reliability using Cohen's κ, with results ranging from poor to excellent (0.3-1.0), though most disorders showed excellent reliability (κ > 0.75). The Icelandic translation of the DSM-5 K-SADS-PL is generally reliable when used by properly trained post-graduate students, which supports its use in clinical settings.

15.
Clin Psychol Psychother ; 30(3): 611-619, 2023.
Article in English | MEDLINE | ID: mdl-36607260

ABSTRACT

INTRODUCTION: Among the elderly, the availability of tools assessing psychosomatic syndromes is limited. The present study aims at testing inter-rater reliability and concurrent validity of the semi-structured interview for the Diagnostic Criteria for Psychosomatic Research (DCPR-R-SSI) in the elderly of the general population. METHOD: One hundred eight subjects were recruited. Participants received a clinical assessment which included the DCPR-R-SSI, the Illness Attitude Scale (IAS), the Geriatric Depression Scale (GDS), the Psychosocial Index (PSI), and the Toronto Alexithymia Scale-20 (TAS-20). Analyses of inter-rater reliability of DCPR-R-SSI and concurrent validity between DCPR-R-SSI and self-administered questionnaires were conducted. RESULTS: DCPR-R-SSI showed excellent inter-rater reliability with a percent of agreement of 90.7% (Cohen's kappa: 0.856 [SE = 0.043], 95% CI: 0.77-0.94). DCPR-R demoralization showed fair concurrent validity with GDS; concurrent validity was also fair between DCPR-R Alexithymia and TAS-20, and between DCPR-R allostatic overload and PSI allostatic load, while the concurrent validity between DCPR-R Disease Phobia and IAS was moderate. CONCLUSION: DCPR-R-SSI represents a reliable and valid tool to assess psychosomatic syndromes in the elderly. DCPR-R is in need of being implemented in the elderly clinical evaluation.


Subjects
Affective Symptoms, Psychophysiologic Disorders, Humans, Aged, Reproducibility of Results, Syndrome, Psychophysiologic Disorders/diagnosis, Psychophysiologic Disorders/epidemiology, Psychophysiologic Disorders/psychology, Surveys and Questionnaires, Affective Symptoms/psychology
16.
Behav Res Methods ; 55(7): 3326-3347, 2023 10.
Article in English | MEDLINE | ID: mdl-36114386

ABSTRACT

We assessed several agreement coefficients applied in 2x2 contingency tables, which are commonly applied in research due to dichotomization. Here, we not only studied some specific estimators but also developed a general method for the study of any estimator candidate to be an agreement measurement. This method was developed in open-source R codes and it is available to the researchers. We tested this method by verifying the performance of several traditional estimators over all possible configurations with sizes ranging from 1 to 68 (total of 1,028,789 tables). Cohen's kappa showed handicapped behavior similar to Pearson's r, Yule's Q, and Yule's Y. Scott's pi and Shankar and Bangdiwala's B seem to better assess situations of disagreement than agreement between raters. Krippendorff's alpha emulates, without any advantage, Scott's pi in cases with nominal variables and two raters. Dice's F1 and McNemar's chi-squared incompletely assess the information of the contingency table, showing the poorest performance among all. We concluded that Cohen's kappa is a measurement of association and McNemar's chi-squared assesses neither association nor agreement; the only two authentic agreement estimators are Holley and Guilford's G and Gwet's AC1. The latter two estimators also showed the best performance over the range of table sizes and should be considered as the first choices for agreement measurement in 2x2 contingency tables. All procedures and data were implemented in R and are available to download from Harvard Dataverse https://doi.org/10.7910/DVN/HMYTCK.
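
To make the comparison of estimators in this abstract more concrete, the following is a minimal Python sketch of three of the coefficients for a single 2x2 table, using their standard two-rater, two-category formulas. It is not the authors' R code from the Dataverse repository; the function name `agreement_2x2`, the cell labels `a`, `b`, `c`, `d`, and the example counts are assumptions made for illustration.

```python
def agreement_2x2(a, b, c, d):
    """Agreement coefficients for a 2x2 table (illustrative, not the authors' code).

    Assumed cell layout:
        a = both raters say "yes"      b = rater 1 "yes", rater 2 "no"
        c = rater 1 "no", rater 2 "yes"  d = both raters say "no"
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table must contain at least one observation")

    po = (a + d) / n        # observed proportion of agreement
    p_row_yes = (a + b) / n  # rater 1's "yes" rate
    p_col_yes = (a + c) / n  # rater 2's "yes" rate

    # Cohen's kappa: chance agreement from the product of the raters' marginals.
    pe_kappa = p_row_yes * p_col_yes + (1 - p_row_yes) * (1 - p_col_yes)
    kappa = float("nan") if pe_kappa == 1 else (po - pe_kappa) / (1 - pe_kappa)

    # Holley and Guilford's G: chance agreement fixed at 1/2 for two categories.
    g_index = 2 * po - 1

    # Gwet's AC1: chance agreement from the average "yes" prevalence.
    pi_yes = (p_row_yes + p_col_yes) / 2
    pe_ac1 = 2 * pi_yes * (1 - pi_yes)
    ac1 = (po - pe_ac1) / (1 - pe_ac1)

    return {
        "percent_agreement": po,
        "cohen_kappa": kappa,
        "holley_guilford_g": g_index,
        "gwet_ac1": ac1,
    }


# Hypothetical table with 90% raw agreement but a very rare "no" category.
result = agreement_2x2(a=45, b=3, c=2, d=0)
```

With these made-up counts the raters agree on 45 of 50 items, yet Cohen's kappa is slightly negative while G and AC1 stay high. This well-known sensitivity of kappa to skewed marginals is one reason such estimators can diverge on the same table.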


Subjects
Dissent and Disputes, Humans, Observer Variation, Reproducibility of Results
17.
Malays J Med Sci ; 30(2): 83-89, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37102040

ABSTRACT

Background: The NEURON (Neuropsychiatry and Neuromodulation Unit) electroconvulsive therapy electroencephalogram (ECT-EEG) Algorithmic Rating Scale (NEARS) is a step-by-step approach to ictal electroencephalogram visual pattern recognition of seizure adequacy based on recruitment, amplitude, symmetry, duration and degree of post-ictal suppression. The objectives of this clinical audit were to determine the degree of agreement on the NEARS operational criteria between two neuropsychiatrists, the reliability of electroconvulsive therapy practitioners' administration of NEARS during ECT procedures and the correlation of NEARS scores with Clinical Global Impression scale scores after each ECT treatment session. Methods: Systematic random sampling was conducted. Even numbers of ictal tracings were selected for analysis from the total samples collected over 8 consecutive days of ECT overseen by a total of eight different ECT practitioners. Cohen's kappa coefficient was used to measure the inter-rater reliability of the two neuropsychiatrists and to determine the level of agreement between NEARS scores and those of the ECT practitioners. The correlation between NEARS scores and post-ECT Clinical Global Impression scores was measured with Spearman's test. The significance level was set at P < 0.05. Results: Cohen's kappa showed perfect agreement between the two neuropsychiatrists, at κ = 1.00 (SE = 0.001; P < 0.001), and strong agreement between NEARS scores of overall seizure adequacy and the scores interpreted by the ECT practitioners, at κ = 0.83 (95% CI: 0.66, 0.99; P < 0.001). Spearman's test showed a weak, non-significant negative association between NEARS scores and post-ECT Clinical Global Impression scores (r = -0.018; P = 0.900). Conclusion: NEARS may facilitate a brief, objectively reliable and practical assessment of ictal electroencephalogram quality. The scale is readily applicable by any trained ECT practitioner during an ongoing ECT procedure, especially when a prompt treatment decision is required.
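
For readers unfamiliar with the statistic reported above, the following is a minimal sketch of Cohen's kappa for two raters, written with the Python standard library only. The function name `cohen_kappa` and the ratings are hypothetical illustrations; they are not data or code from the audit.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical seizure-adequacy ratings for ten tracings (illustration only).
neuropsychiatrist = ["adequate"] * 6 + ["inadequate"] * 4
practitioner = ["adequate"] * 5 + ["inadequate"] * 5
print(round(cohen_kappa(neuropsychiatrist, practitioner), 3))  # prints 0.8
```

In this toy example the raters agree on 9 of 10 tracings (observed agreement 0.9) while chance agreement is 0.5, so kappa is (0.9 - 0.5) / (1 - 0.5) = 0.8.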

18.
Genet Med ; 24(2): 430-438, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34906486

ABSTRACT

PURPOSE: Demonstrating the clinical utility of genetic testing is fundamental to clinical adoption and reimbursement, but standardized definitions and measurement strategies for this construct do not exist. The Clinician-reported Genetic testing Utility InDEx (C-GUIDE) offers a novel measure to fill this gap. This study assessed its validity and inter-rater reliability. METHODS: Genetics professionals completed C-GUIDE after disclosure of test results to patients. Construct validity was assessed using regression analysis to measure associations between C-GUIDE and global item scores as well as potentially explanatory variables. Inter-rater reliability was assessed by administering a vignette-based survey to genetics professionals and calculating Krippendorff's α. RESULTS: On average, a 1-point increase in the global item score was associated with an increase of 3.0 in the C-GUIDE score (P < .001). Compared with diagnostic results, partially/potentially diagnostic and nondiagnostic results were associated with a reduction in C-GUIDE score of 9.5 (P < .001) and 10.2 (P < .001), respectively. Across 19 vignettes, Krippendorff's α was 0.68 (95% CI: 0.63-0.72). CONCLUSION: C-GUIDE showed acceptable validity and inter-rater reliability. Although further evaluation is required, C-GUIDE version 1.2 can be useful as a standardized approach to assess the clinical utility of genetic testing.


Subjects
Genetic Testing , Humans , Reproducibility of Results , Surveys and Questionnaires
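
As an illustration of the reliability statistic named in the abstract above, here is a minimal sketch of Krippendorff's α. It assumes the nominal difference metric and tolerates missing ratings; the abstract does not state which metric the authors used, and the function name `krippendorff_alpha_nominal` and the vignette ratings are hypothetical.

```python
from collections import Counter

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha with the nominal metric (an assumption).

    units: one list per rated unit, holding the ratings that unit
    received; None marks a missing rating.
    """
    value_counts = Counter()
    disagreement = 0.0
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # a unit with fewer than two ratings cannot be paired
        counts = Counter(values)
        value_counts.update(counts)
        # Ordered pairs of unequal values within the unit, weighted by 1/(m - 1).
        disagreement += (m * m - sum(c * c for c in counts.values())) / (m - 1)
    n = sum(value_counts.values())
    # Ordered pairs of unequal values across all pairable ratings.
    expected = n * n - sum(c * c for c in value_counts.values())
    if expected == 0:
        return float("nan")  # no variation in ratings; alpha is undefined
    return 1.0 - (n - 1) * disagreement / expected

# Hypothetical ratings of four vignettes by three raters (illustration only).
vignettes = [
    ["high", "high", "high"],
    ["low", "low", None],
    ["high", "low", "high"],
    ["low", "low", "low"],
]
print(round(krippendorff_alpha_nominal(vignettes), 3))
```

Unlike Cohen's kappa, this formulation handles any number of raters and incomplete rating matrices, which is one common reason to prefer it for survey-based designs.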
19.
BMC Geriatr ; 22(1): 137, 2022 Feb 18.
Article in English | MEDLINE | ID: mdl-35177006

ABSTRACT

BACKGROUND: Given the potential benefits of introducing ultrasound in the clinical assessment of muscle disorders, this study aimed to assess the feasibility and reliability of measuring forearm muscle thickness by ultrasound in a geriatric clinical setting. METHODS: Cross-sectional pilot study in 25 participants (12 patients aged ≥ 70 years in an acute geriatric ward and 13 healthy volunteers aged 25-50 years), assessed by three raters. Muscle thickness was measured as the distance between the subcutaneous adipose tissue-muscle interface and muscle-bone interface of the radius at 30% proximal of the distance between the styloid process and distal insertion of the biceps brachii muscle of the dominant forearm. Examinations were repeated three times by each rater and intra- and inter-rater reliability was calculated. Feasibility analysis included consideration of technological, economic, legal, operational, and scheduling (TELOS) components. RESULTS: Mean muscle-thickness measurement difference between groups was 4.4 mm (95% confidence interval [CI] 2.4 mm to 6.3 mm; p < 0.001). Intra-rater reliability of muscle-thickness assessment was excellent, with intraclass correlation coefficient (ICC) of 0.947 (95% CI 0.902 to 0.974), 0.969 (95% CI 0.942 to 0.985), and 0.950 (95% CI 0.907 to 0.975) for observers A, B, and C, respectively. Inter-rater comparison showed good agreement (ICC of 0.873 [95% CI 0.73 to 0.94]). Four of the 17 TELOS components considered led to specific recommendations to improve the procedure's feasibility in clinical practice. CONCLUSION: Our findings suggest that ultrasound is a feasible tool to assess the thickness of the forearm muscles with good inter-rater and excellent intra-rater reliability in a sample of hospitalized geriatric patients, making it a promising option for use in clinical practice.


Subjects
Forearm , Inpatients , Aged , Cross-Sectional Studies , Feasibility Studies , Forearm/diagnostic imaging , Humans , Muscle, Skeletal/diagnostic imaging , Observer Variation , Pilot Projects , Reproducibility of Results , Ultrasonography
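
The intraclass correlation coefficient reported above can be computed in several forms, and the abstract does not say which one was used. As a hedged sketch, the code below implements one common choice, ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single rater), from a subjects-by-raters table. The function name `icc_2_1` and the thickness values are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: one row per subject, one column per rater, no missing values.
    The choice of this ICC form is an assumption, not the study's method.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-subjects mean square
    msc = ss_cols / (k - 1)               # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical muscle-thickness readings in mm, three raters (illustration only).
thickness_mm = [
    [14.2, 14.6, 14.1],
    [18.9, 19.4, 18.7],
    [11.3, 11.0, 11.8],
    [16.5, 17.1, 16.4],
    [21.0, 20.6, 21.3],
]
print(round(icc_2_1(thickness_mm), 3))
```

Because this form penalizes systematic offsets between raters, it yields lower values than a consistency-type ICC whenever one rater measures consistently higher or lower than the others.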
20.
BMC Musculoskelet Disord ; 23(1): 104, 2022 Jan 31.
Article in English | MEDLINE | ID: mdl-35101020

ABSTRACT

BACKGROUND: The common manual measurement technique of spinal sagittal alignment on X-rays is susceptible to rater-dependent variability, which has not been adequately considered in previous publications. This study investigates the effect of those variations in the characterization of patients receiving lumbar spondylodesis. METHODS: General alignment parameters on pre- and postoperative X-rays were evaluated by four raters in 43 prospectively sampled patients undergoing monolevel spondylodesis. The Intra-class Correlation Coefficient (ICC) for each rater pair and all raters together was calculated for inter-rater reliability. To test the operation-induced change of the sagittal alignment in every patient, the Wilcoxon test was applied for each rater separately. RESULTS: The ICCs were "good" (>0.75) to "excellent" (>0.9) for all raters together and for 45 of the 48 single rater pairs (93.75%). All raters found a significant increase of the addressed segmental lordosis and disc height and no significant change for spinopelvic parameters and sagittal vertical axis from pre- to postoperative. The lumbar lordosis showed a significant increase through the operation of +2.5° (p = 0.014) and +3.7° (p = 0.015) in two raters and no significant difference for the other two (+2.1°, p = 0.171; -2.2°, p = 0.522). CONCLUSIONS: The pre- to postoperative change of lumbar lordosis revealed different significance levels for different raters, although the ICCs were formally good. Accordingly, the evaluation by only one rater would lead to different conclusions. Due to this susceptibility of alignment measurements to rater-dependent variability, the exact evaluation process should be described in every publication and the consistency of significant results be validated through multiple raters. TRIAL REGISTRATION: The trial was approved by the local ethics committee and listed at the national clinical trials register (DRKS00004514, date of registration: 08/11/2012).


Subjects
Lordosis , Spinal Fusion , Humans , Lordosis/diagnostic imaging , Lordosis/surgery , Observer Variation , Radiography , Reproducibility of Results