Results 1 - 20 of 46
1.
Cureus ; 16(6): e63142, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38919857

ABSTRACT

Background The evaluation of attractiveness varies from one civilization, culture, and environment to another and between individuals. Gender can also play a role in determining the standards of attractiveness. The purpose of this study was to evaluate the effect of the rater's gender on the assessment of adult facial attractiveness with a vertical and horizontal growth pattern in patients with skeletal Class I malocclusion. Methodology The study sample comprised extraoral photos taken before the treatment of 120 patients (30 males and 30 females in each group) with skeletal Class I malocclusion and vertical and horizontal growth patterns according to the Bjork sum aged between 18 and 25 years. A panel of 30 laypersons (aged 19-25 years with an average age of 23 ± 0.53 years), including raters from both genders, were selected equally using a disproportionate stratified sampling method through a computer-generated list. The raters used the visual analog scale (VAS) to provide a score for each photograph's aesthetic quality. The most attractive group, which received the greatest aesthetic score, and the least attractive group, which received the lowest aesthetic score, were the two groups formed based on each photograph's mean aesthetic scores. Overall, 13 patients were chosen for each group. Subsequently, the average assessment score for every patient photo set was determined. Independent-sample t-tests were employed to ascertain if the raters' gender made a statistically significant difference in assessing patients with vertical and horizontal growth patterns. Results There were statistically significant differences between the gender of raters in evaluating female patients with vertical growth patterns (p < 0.001), where the average rating of the female raters was significantly greater than that of the male raters in evaluating female patients. 
In addition, there were statistically significant differences between the gender of raters in evaluating female patients with horizontal growth patterns (p = 0.009), where the average rating of the male raters was significantly greater than that of the female raters in evaluating female patients. Conclusions There is a limited effect of the rater's gender in evaluating facial aesthetics. However, the facial features of female patients with long faces are preferred by females more than males, and males are more critical in evaluating these patients. On the other hand, males favor the facial features of female patients with short faces more than females, and females are more critical in evaluating these patients. These results suggest considering patients' personal characteristics with vertical and horizontal growth patterns during diagnosis and treatment planning.

2.
Children (Basel) ; 11(5)2024 May 11.
Article in English | MEDLINE | ID: mdl-38790579

ABSTRACT

Fundamental movement skills (FMS), considered as building blocks of movement, have received growing interest due to their significant impact on both present and future health. FMS are categorized into locomotor, object control and stability skills. While there has been extensive research on assessing the proficiency and reliability of locomotor and object control skills, stability skills have received comparatively less attention. For this reason, this study aimed to assess the test-retest, intrarater and interrater reliability of five stability skills included in the Alfamov app. The performance of eighty-four healthy primary school children (60.8% girls), aged 6 to 12 years (mean ± standard deviation of 8.7 ± 1.8 years), in five stability skills was evaluated and scored by four raters, including two experts and two novices. The Alfamov tool, integrating various process-oriented tests, was used for the assessment. Reliability analyses were conducted through the computation of the intraclass correlation coefficient (ICC) along with the corresponding 95% confidence intervals. Good-to-excellent intrarater reliability, excellent interrater reliability and moderate-to-good reliability in the test-retest were achieved. The results proved that Alfamov is a robust test for evaluating stability skills and can be suitable for use by different professionals with less experience in assessing children's motor competence.
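
The reliability analysis described above rests on the intraclass correlation coefficient. The abstract does not say which ICC form was used, so the following is only a minimal sketch of one common variant, ICC(2,1) (two-way random effects, absolute agreement, single rater), computed from ANOVA mean squares. The function name and the subjects-by-raters data layout are assumptions made for illustration.

```python
def icc2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of rows, one row per subject, one score per rater.
    Illustrative helper; the ICC form used in the study is not stated.
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns) and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subjects mean square
    msc = ss_cols / (k - 1)               # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical ratings: 6 subjects scored by 4 raters.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc2_1(ratings), 2))
```

A consistency-type ICC, or an average-measures form, would use the same mean squares with a different denominator, so the choice of form should follow the study design.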

3.
Int J Biostat ; 2024 Feb 22.
Article in English | MEDLINE | ID: mdl-38379532

ABSTRACT

Agreement between methods for quantitative measurements is typically assessed by computing limits of agreement between pairs of methods and/or by illustration through Bland-Altman plots. We consider the situation where the observed measurement methods are considered a random sample from a population of possible methods, and discuss how the underlying linear mixed effects model can be extended to this situation. This is relevant when, for example, the methods represent raters/judges that are used to score specific individuals or items. In the case of random methods, we are not interested in estimates pertaining to the specific methods, but are instead interested in quantifying the variation between the methods actually involved in making measurements, and accommodating this as an extra source of variation when generalizing to the clinical performance of a method. In the model we allow raters to have individual precision/skill and permit linked replicates (i.e., when the numbering, labeling or ordering of the replicates within items is important). Applications involving estimation of the limits of agreement for two datasets are shown: a dataset of spatial perception among a group of students and a dataset on consumer preference of French chocolate. The models are implemented in the MethComp package for R [Carstensen B, Gurrin L, Ekstrøm CT, Figurski M. MethComp: functions for analysis of agreement in method comparison studies; 2013. R package version 1.22, R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2012].
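
For orientation, the classical two-method limits of agreement that this abstract takes as its starting point can be sketched as follows. This is the simple fixed-pair Bland-Altman calculation (mean difference plus or minus 1.96 standard deviations of the differences), not the random-rater mixed model the paper develops and not the MethComp implementation. The function name and the data are hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(method_a, method_b, z=1.96):
    """Classical Bland-Altman limits of agreement for two paired methods.

    Returns (lower limit, bias, upper limit). Illustrative helper only;
    it ignores replicates and treats the two methods as fixed.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)   # systematic difference between the methods
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias - z * sd, bias, bias + z * sd


# Hypothetical paired measurements of the same four items.
a = [10, 12, 14, 16]
b = [9, 12, 13, 17]
lower, bias, upper = limits_of_agreement(a, b)
print(round(lower, 3), round(bias, 3), round(upper, 3))
```

Treating raters as a random sample, as the abstract proposes, would add between-rater variance to the spread of the differences, so limits computed this way would generally be too narrow for that setting.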

4.
Br J Math Stat Psychol ; 77(2): 245-260, 2024 May.
Article in English | MEDLINE | ID: mdl-38233946

ABSTRACT

Agreement studies often involve more than two raters or repeated measurements. In the presence of two raters, the proportion of agreement and of positive agreement are simple and popular agreement measures for binary scales. These measures were generalized to agreement studies involving more than two raters with statistical inference procedures proposed on an empirical basis. We present two alternatives. The first is a Wald confidence interval using standard errors obtained by the delta method. The second involves Bayesian statistical inference not requiring any specific Bayesian software. These new procedures show better statistical behaviour than the confidence intervals initially proposed. In addition, we provide analytical formulas to determine the minimum number of persons needed for a given number of raters when planning an agreement study. All methods are implemented in the R package simpleagree and the Shiny app simpleagree.
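
The two-rater measures mentioned at the start of this abstract can be illustrated with a short sketch. It computes the proportion of agreement and the proportion of positive agreement for binary ratings, with a plain binomial Wald interval for the former. This is an assumption-laden simplification: it is not the multi-rater delta-method or Bayesian procedure of the paper, and it does not use the simpleagree package. The function name and data are hypothetical.

```python
from math import sqrt


def agreement_measures(rater1, rater2, z=1.96):
    """Overall and positive agreement for two raters on a binary (0/1) scale.

    Returns (p_overall, p_positive, (ci_low, ci_high)), where the interval is
    a simple Wald interval for p_overall, truncated to [0, 1].
    Assumes at least one positive rating, otherwise p_positive is undefined.
    """
    n = len(rater1)
    pairs = list(zip(rater1, rater2))
    both_pos = sum(1 for x, y in pairs if x == 1 and y == 1)
    both_neg = sum(1 for x, y in pairs if x == 0 and y == 0)
    discordant = n - both_pos - both_neg

    p_overall = (both_pos + both_neg) / n
    p_positive = 2 * both_pos / (2 * both_pos + discordant)

    se = sqrt(p_overall * (1 - p_overall) / n)
    ci = (max(0.0, p_overall - z * se), min(1.0, p_overall + z * se))
    return p_overall, p_positive, ci


# Hypothetical ratings of ten persons by two raters.
r1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
r2 = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
p_o, p_pos, ci = agreement_measures(r1, r2)
print(p_o, p_pos, ci)
```

With only ten persons the upper limit hits the boundary at 1, which shows how poorly a plain Wald interval can behave in small samples and why sample size planning matters for agreement studies.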


Subject(s)
Models, Statistical , Software , Bayes Theorem , Sample Size
5.
Aging Clin Exp Res ; 35(10): 2173-2190, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37540380

ABSTRACT

BACKGROUND: In behavioural assessment, information can be gathered from internally referenced self-reports or from proxy informants. AIMS: This study aimed to fine-tune a brief but reliable method for evaluating proxy accuracy in cases where responses obtained from adult and older adult patients cannot be considered reliable. METHODS: We generated a set of items reflecting both overt and covert behaviours related to the basic instrumental activities of daily living. The psychometric properties of the content, factorial, and criterion validity of these items were then checked. The Proxy Reliability Questionnaire-ProRe was created. We tested the frequency of "I don't know" responses as a measure of proxy reliability in a sample of healthy older adults and their proxies, and in a second sample of proxy respondents who answered questions about their parents. RESULTS: As expected, response precision was lower for items characterizing covert behaviours; items about covert compared to overt behaviours generated more "I don't know" answers. Proxies provided fewer "I don't know" responses when evaluating the parent they claimed to know better. Moreover, we tried to validate our approach using response confidence. Encouragingly, these results also showed differences in the expected direction in confidence between overt and covert behaviours. CONCLUSIONS: The present study encourages clinicians/researchers to consider how well the proxy and the patient know each other, the tendency of proxies to exhibit, for example, response bias when responding to questions about patients' covert behaviours, and more importantly, the reliability of informants in providing a clinical assessment of neurocognitive diseases associated with aging.


Subject(s)
Activities of Daily Living , Quality of Life , Humans , Aged , Reproducibility of Results , Surveys and Questionnaires , Self Report , Psychometrics , Quality of Life/psychology
6.
MethodsX ; 11: 102230, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37383624

ABSTRACT

A low-cost quantitative continuous measurement of movements in the extremities of people with Parkinson's disease, a structured motor assessment administered by a trained examiner to a patient physically present in the same room, utilizes sensors to generate output to facilitate the evaluation of the patient. However, motor assessments with the patient and the examiner in the same room may not be feasible due to distances between the patient and the examiner and the risk of transmission of infections between the patient and the examiner. Therefore, we propose a protocol for the remote assessment by examiners in different locations of both (A) videos of patients recorded during in-person motor assessments and (B) live virtual assessments of patients in different locations from examiners. The proposed procedure provides a framework for providers, investigators, and patients in vastly diverse locations to conduct optimal motor assessments required to develop treatment plans utilizing precision medicine tailored to the specific needs of each individual patient. The proposed protocol generates the foundation for providers to remotely perform structured motor assessments necessary for optimal diagnosis and treatment of people with Parkinson's disease and related conditions.

7.
Article in English | MEDLINE | ID: mdl-36767148

ABSTRACT

In diagnostic accuracy studies, the test of interest is typically applied only once in each patient. This paper illustrates some possibilities that arise when diagnoses are carried out by a sufficiently large number of multiple raters. In a dental study, sixty-one examiners were asked to diagnose 49 tooth areas with different grades of tissue loss (minor, moderate, and advanced) to decide whether dentine was exposed (positive status) or not (negative status). The true status was determined by histology (reference). For each tooth, the rate of correct decisions reflecting the difficulty to diagnose this tooth and the positive rate reflecting the perception of the tooth by the raters was computed. Meta-analytical techniques were used to assess the inter-tooth variation and the influence of tooth-specific factors on difficulty or perception, respectively. A huge variation in diagnostic difficulty and perception could be observed. Advanced tissue loss made diagnoses more difficult. The background colour and tissue loss were associated with perception and may hint to cues used by the raters. The use of multiple raters in a diagnostic accuracy study allows detailed investigations which make it possible to obtain further insights into the decision-making process of the raters.


Subject(s)
Tooth , Humans , Dentistry
8.
J Public Health Afr ; 13(3): 2201, 2022 Sep 07.
Article in English | MEDLINE | ID: mdl-36277943

ABSTRACT

Background: As evidence supports task-shifting approaches to reduce the global mental health treatment gap, counselor competency evaluation measures are critical to ensure evidence-based therapies are administered with quality and fidelity. Objective: This article describes a training technique for evaluating lay counselors' competency for mental health lay practitioners without rating scale experience. Methods: Mental health practitioners were trained to give the Enhancing Assessment of Common Therapeutic Factors (ENACT) test to assess counselor proficiency in delivering the Common Elements Treatment Approach (CETA) in-person and over the phone using standardized video and audio recordings. A two-day in-person training was followed by a one-day remote training session. Training includes a review of item scales through didactic instructions, active learning by witnessing and scoring role-plays, peer interactions, and trainer observation and feedback. The trainees rated video and audio recordings, and ICC values were calculated. Results: The training technique presented in this research helped achieve high counselor competency scores among lay providers with no prior experience using rating scales. ICC rated both trainings satisfactory to exceptional (ICC: .71 - .89). Conclusions: Raters with no past experience with rating scales can achieve high consistency when rating counselor competency through training. Effective rater training should include didactic learning, practical learning with trainer observation and feedback, and video and audio recordings to assess consistency.

9.
Indian J Orthop ; 56(10): 1824-1833, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36034679

ABSTRACT

Objective: To propose a new method for glenoid bone loss measurement, the constellation technique (CST); determine its reliability and accuracy; and compare the validity of CST with that of the conventional technique (CVT) and standard measurements for ratio calculation. Materials and Methods: Sixty shoulders with intact glenoids and no glenohumeral instability or arthritis underwent CT scans. Simulated osteotomies were conducted on the 3D models of glenoids at two cutting locations, expressed as clock face times (2:30-4:20; 1:30-5:00). Two experienced surgeons compared three methods for glenoid bone loss measurement: CVT (best-fit circle), CST ('5S' steps), and standard measurement. Eight undergraduates remeasured five randomly chosen shoulders with moderate to severe bone loss. Intraclass correlation coefficients (ICCs) were calculated for raters. Results: With a defect range between 2:30 and 4:20, all 60 glenoids demonstrated minimal bone loss (< 15%), while between 1:30 and 5:00, 42 shoulders had moderate bone loss (15-20%) and 18 shoulders had severe bone loss (≥ 20%). For experienced raters, no significant differences were noted between protocols for all categories of bone loss (p ≥ 0.051), with good inter- and intraobserver reliability indicated by ICC. For novice raters, post hoc Tukey analysis found that CST was more accurate in one patient with a standard mean bone loss of 23.2% ± 1.9% compared with CVT. Conclusion: The CST turned the key step of glenoid defect evaluation from deciding an en face view to determining the glenoid inferior rim. The protocol is simple, accurate, and reproducible, especially for novice raters.

10.
J Am Acad Psychiatry Law ; 50(2): 221-230, 2022 06.
Article in English | MEDLINE | ID: mdl-35444057

ABSTRACT

Growing concern about the use of incarceration is driving significant reform in juvenile legal system decision-making and is likely to have a substantial impact on the role residential options play in the future continuum of care. It appears inevitable that surviving institutions or alternative residential models will be increasingly scrutinized for their impact on youth development. While rehabilitative models focused on youth development are a promising and growing part of residential institutions, few tools are available to measure quality. For institutions to sustain a focus on quality assessment, programs should use an organized and specified treatment model against which staff behavior can be assessed. This study examined the concurrent validity and item functioning of corresponding youth and expert ratings of social and therapeutic climate across multiple sites in a state-wide juvenile residential setting (n = 225 paired observations). Results suggest that the reliability of expert ratings of therapeutic climate exceeds the reliability of youth ratings, whereas reliability for other indicators of social climate is roughly equal between rater types. In addition, youth and expert ratings had weak concurrent validity. Implications for the use of youth versus expertly trained raters for measuring social and therapeutic environment are discussed.


Subject(s)
Juvenile Delinquency , Prisoners , Adolescent , Humans , Social Environment
11.
J Family Community Med ; 29(1): 56-61, 2022.
Article in English | MEDLINE | ID: mdl-35197729

ABSTRACT

BACKGROUND: Leadership is a wide concept that is rapidly developing. Diverse theories suggest different styles of leadership, with strong relationships between the different styles and their outcomes. The transformational style emphasizes motivating employees and encouraging them to find new ways of dealing with issues. The transactional (TL) style promotes ideas of rewards and punishments. The Laissez-faire style is characterized by relaxation and the tendency to leave things to happen with minimal interference. MATERIALS AND METHODS: This descriptive cross-sectional study was conducted in Primary Healthcare Centers in Riyadh, Saudi Arabia. The leadership styles were assessed using a Multi-Factor Leadership Questionnaire, which identifies the different styles of leadership. SPSS v 26.0 was used for data analysis. A t-test was employed to compare leadership style between raters and managers. A logistic regression model was used to determine the influence of leadership styles of managers. The Pearson correlation coefficient was used to determine the linear relationship between leadership styles and their domains. RESULTS: A total of 130 respondents (65 managers vs. 65 raters) took part. "Raters" refers to any person other than the manager, such as a secretary, nurse, or doctor. The "manager" rating is when the person rates himself. The global transformation mean score was 3.55, for TL it was 3.42, and for passive avoidant the mean score was 0.93. The passive avoidant (t = 2.005; P = 0.047) and management by exception (passive) (MBEP) mean scores of raters were statistically significantly higher than those of managers. In the binary regression model, MBEP was the independent significant predictor of manager. CONCLUSION: The perceived leadership style of Primary Healthcare Center managers was transformational but with TL. Transformational leadership was positively correlated with TL leadership but negatively correlated with passive avoidant (the Laissez-faire style). 
The outcome of this study demonstrated that intellectual stimulation, idealized attributes, and inspirational motivation are perhaps better than contingent reward, active management.

12.
Front Psychol ; 13: 988272, 2022.
Article in English | MEDLINE | ID: mdl-36591072

ABSTRACT

Writing assessment relies closely on scoring the excellence of a subject's thoughts. This creates a faceted measurement structure regarding rubrics, tasks, and raters. Nevertheless, most studies did not consider the differences among raters systematically. This study examines the raters' differences in association with the reliability and validity of writing rubrics using the Many-Facet Rasch measurement model (MFRM) to model these differences. A set of standards for evaluating the quality of rating based on writing assessment was examined. Rating quality was tested within four writing domains from an analytic rubric using a scale of one to three. The writing domains explored were vocabulary, grammar, language use, and organization, whereas the data were obtained from 15 Arabic essays gathered from religious secondary school students under the supervision of the Malaysia Ministry of Education. Five raters in the field of practice were selected to evaluate all the essays. As a result, (a) raters range considerably on the leniency-severity dimension, so rater variations ought to be modeled; (b) the combination of findings between raters avoids the doubt of scores, thereby reducing the measurement error which could lower the criterion validity with the external variable; and (c) MFRM adjustments effectively increased the correlations of the scores obtained from partial and full data. Predominant findings revealed that rating quality varies across analytic rubric domains. This also shows that MFRM is an effective way to model rater differences and evaluate the validity and reliability of writing rubrics.

13.
Appl Psychol Meas ; 46(1): 53-67, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34898747

ABSTRACT

Researchers in the social sciences often obtain ratings of a construct of interest provided by multiple raters. While using multiple raters provides a way to help avoid the subjectivity of any given person's responses, rater disagreement can be a problem. A variety of models exist to address rater disagreement in both structural equation modeling and item response theory frameworks. Recently, a model was developed by Bauer et al. (2013) and referred to as the "trifactor model" to provide applied researchers with a straightforward way of estimating scores that are purged of variance that is idiosyncratic by rater. Although the intent of the model is to be usable and interpretable, little is known about the circumstances under which it performs well, and those it does not. We conduct simulation studies to examine the performance of the trifactor model under a range of sample sizes and model specifications and then compare model fit, bias, and convergence rates.

14.
Acad Pediatr ; 22(2): 313-318, 2022 03.
Article in English | MEDLINE | ID: mdl-34864133

ABSTRACT

INTRODUCTION: No standardized evaluation tool for fellowship applicant assessment exists. Assessment tools are subject to biases and scoring tendencies which can skew scores and impact rankings. We aimed to develop and evaluate an objective assessment tool for fellowship applicants. METHODS: We detected rater effects in our numerically scaled assessment tool (NST), which consisted of 10 domains rated from 0 to 9. We evaluated each domain, consolidated redundant categories, and removed subjective categories. For 7 remaining domains, we described each quality and developed a question with a behaviorally-anchored rating scale (BARS). Applicants were rated by 6 attendings. Ratings from the NST in 2018 were compared with the BARS from 2020 for distribution of data, skewness, and inter-rater reliability. RESULTS: Thirty-four applicants were evaluated with the NST and 38 with the BARS. Demographics were similar between groups. The median score on the NST was 8 out of 9; scores <5 were used in less than 1% of all evaluations. Distribution of data was improved in the BARS tool. In the NST, scores from 6 of 10 domains demonstrated moderate skewness and 3 high skewness. Three of the 7 domains in the BARS showed moderate skewness and none had high skewness. Two of 10 domains in the NST vs 5 of 7 domains in the BARS achieved good inter-rater reliability. CONCLUSION: Replacing a standard numeric scale with a BARS normalized the distribution of data, reduced skewness, and enhanced inter-rater reliability in our evaluation tool. This provides some validity evidence for improved applicant assessment and ranking.
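
The skewness comparison reported above can be made concrete with a small sketch of the moment-based sample skewness. The abstract does not say which estimator or cut-offs were used. The 0.5 and 1.0 thresholds below are a common rule of thumb and are an assumption here, as are the function names and the example scores.

```python
def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n   # third central moment
    return m3 / m2 ** 1.5


def describe_skew(g1):
    """Label skewness using assumed rule-of-thumb cut-offs (0.5 and 1.0)."""
    size = abs(g1)
    if size < 0.5:
        return "approximately symmetric"
    if size < 1.0:
        return "moderate"
    return "high"


# Hypothetical scores on a 0-9 scale that cluster near the top,
# similar in spirit to a numeric scale with a median of 8 out of 9.
clustered = [9, 9, 9, 8, 8, 7, 5]
g1 = skewness(clustered)
print(round(g1, 2), describe_skew(g1))
```

Scores piled up at the top of a scale give negative skewness, which is the pattern a behaviorally anchored scale is meant to reduce by spreading ratings across more of the range.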


Subject(s)
Fellowships and Scholarships , Bias , Humans , Reproducibility of Results
15.
Multivariate Behav Res ; 57(5): 701-717, 2022.
Article in English | MEDLINE | ID: mdl-33982606

ABSTRACT

To avoid the subjectivity of having a single person evaluate a construct of interest (e.g., a student's self-efficacy in school), multiple raters are often used. Increasingly, data that use multiple raters to evaluate psychological and social-emotional constructs over time are available. While a range of models to address measurement issues that arise when using multiple raters have been presented, including a small number for longitudinal data, few if any models are available to estimate growth in the presence of multiple raters. In this study, we provide a model that removes all but the shared perceptions of raters at a given timepoint (i.e., removes unique rater variance), then adds on a latent growth curve model across timepoints. Through simulation and empirical studies, we examine the performance of the model in terms of recovering true growth parameters, and relative to more crude approaches like estimating growth based on a single rater. Our results indicate that the model we propose performs quite well along these dimensions, and shows promise for use by researchers who want to estimate growth based on longitudinal multi-rater data.


Subject(s)
Computer Simulation , Humans
16.
Cureus ; 13(7): e16374, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34408929

ABSTRACT

Introduction The Chief Resident (CR) selection process is described by many residency programs as a collective effort from the residency program leadership, key faculty members, and resident peers. Unfortunately, the literature does not show any established guidelines, methods, or psychometrically sound instruments to aid this process. The purpose of this study was to evaluate the properties of the newly developed CR selection survey across two years using the Multi-Facet Rasch Model (MFRM). Methods This study used the MFRM to analyze two-year data from the newly developed CR selection survey. After the first implementation of the tool in 2015, this instrument had its second-round evaluation process for the CR selection in 2016. We applied a three-facet Rasch model (candidates, questions, and raters). We used Facets v. 3.66 and SAS 9.4 (SAS Institute Inc., Cary, NC) for data analysis. Results In 2015, 40 out of 100 residents completed the survey to select three of the four candidates for the 2017-2018 CR positions. The mean rating for each candidate showed that Candidate 1 received the highest rating of 5.56 while Candidates 2 and 4 received the exact same ratings. The majority of survey items performed very well based on the results from the MFRM while leaving room for improvement for a few items. In 2016, 55 out of 100 residents completed the revised survey to select three of the six candidates for the 2018-2019 CR positions. The mean rating showed that Candidate 3 received the highest mean rating of 5.81 while Candidate 2 received the lowest mean rating of 5.12. The item reliability was improved from 0.70 to 0.88 based on the results from the revised survey. The results were used to help inform decisions regarding the selection of chief residents. Conclusions The CR selection process requires a fair and collective effort from program leadership, relevant faculty members, and input from the resident group. 
Our study demonstrated that the survey tool we developed is appropriate to select CR candidates and MFRM is a promising technique in survey development and the evaluation of survey items.

17.
Educ Psychol Meas ; 81(4): 728-755, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34267398

ABSTRACT

Although collecting data from multiple informants is highly recommended, methods to model the congruence and incongruence between informants are limited. Bauer and colleagues suggested the trifactor model that decomposes the variances into common factor, informant perspective factors, and item-specific factors. This study extends their work to the trifactor mixture model that combines the trifactor model and the mixture model. This combined approach allows researchers to investigate the common and unique perspectives of multiple informants on targets using latent factors and simultaneously take into account potential heterogeneity of targets using latent classes. We demonstrate this model using student self-rated and teacher-rated academic behaviors (N = 24,094). Model specification and testing procedures are explicated in detail. Methodological and practical issues in conducting the trifactor mixture analysis are discussed.

18.
Article in English | MEDLINE | ID: mdl-33572298

ABSTRACT

The Test of Gross Motor Development (TGMD) is one of the most common tools for assessing the fundamental movement skills (FMS) in children between 3 and 10 years. This study aimed to examine the intra-rater and inter-rater reliability of the TGMD-3rd Edition (TGMD-3) between expert and novice raters using live and video assessment. Five raters [2 experts and 3 novices (one of them with a BSc in Physical Education and Sport Science)] assessed and scored the TGMD-3 performance of 25 healthy children [Female: 60%; mean (standard deviation) age 9.16 (1.31)]. The schoolchildren attended one public elementary school in Santiago de Compostela (Spain) during the academic year 2019-2020. Raters scored each child's performance through two viewing modes (live and slow-motion). The ICC (Intraclass Correlation Coefficient) was used to determine the agreement between raters. Our results showed moderate-to-excellent intra-rater reliability for overall score and locomotor and ball skills subscales; moderate-to-good inter-rater reliability for overall and ball skills; and poor-to-good for locomotor subscale. Higher intra-rater reliability was achieved by the expert raters and the novice rater with a physical education background compared to the other novice raters. However, the inter-rater reliability was more variable in all the raters regardless of their experience or background. No significant differences in reliability were found when comparing live and video assessments. For clinical practice, it would be recommended that raters reach an agreement before the assessment to avoid subjective interpretations that might distort the results.


Subject(s)
Movement , Sports , Child , Female , Humans , Physical Education and Training , Reproducibility of Results , Spain
19.
Front Psychol ; 11: 562462, 2020.
Article in English | MEDLINE | ID: mdl-33071888

ABSTRACT

The assessment of text quality is a transdisciplinary issue concerning the research areas of educational assessment, language technology, and classroom instruction. Text length has been found to strongly influence human judgment of text quality. The question of whether text length is a construct-relevant aspect of writing competence or a source of judgment bias has been discussed controversially. This paper used both a correlational and an experimental approach to investigate this question. Secondary analyses were performed on a large-scale dataset with highly trained raters, showing an effect of text length beyond language proficiency. Furthermore, an experimental study found that pre-service teachers tended to undervalue text length when compared to professional ratings. The findings are discussed with respect to the role of training and context in writing assessment.

20.
Article in English | WPRIM (Western Pacific) | ID: wpr-829456

ABSTRACT

Work posture analysis is crucial in observing and reducing work-related musculoskeletal symptoms in the workplace. However, in a developing country, new raters are commonly assigned to conduct postural analysis to save on cost. This study aims to observe the validity and inter-rater reliability (defined as the degree of agreement among different raters) among new raters of three different commonly used work posture analysis methods: Rapid Upper Limb Assessment (RULA), Rapid Entire Body Assessment (REBA), and Ovako Workload Assessment System (OWAS). Fifty industrial engineering students, divided into five groups, who received prior training about the use of the methods, participated voluntarily in this study by observing ten different working postures in five different industries: the tofu, military equipment manufacturing, automotive maintenance and service, cracker, and milk-processing industries. One ergonomics expert also observed the working postures. Validity was observed based on the correlation between new raters’ ratings and the rating of the ergonomics expert. Inter-rater reliability within one group was calculated using the percentage of agreement and kappa value. The result shows high validity of RULA, REBA, and OWAS among new raters. There are insignificant differences in the inter-rater reliability of new raters among RULA, REBA, and OWAS. The implications of the result are discussed.
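
The two inter-rater statistics named in this abstract, percentage of agreement and kappa, can be sketched for the simplest case of two raters. The abstract does not state which kappa variant was applied to groups of raters, so Cohen's kappa is used here purely as an illustrative assumption. The function names and the action-level labels are hypothetical.

```python
from collections import Counter


def percent_agreement(rater1, rater2):
    """Share of items on which the two raters assign the same category."""
    return sum(x == y for x, y in zip(rater1, rater2)) / len(rater1)


def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters: agreement corrected for chance.

    Undefined (division by zero) when chance agreement equals 1, i.e. when
    both raters use a single identical category for every item.
    """
    n = len(rater1)
    p_observed = percent_agreement(rater1, rater2)
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # Expected agreement if the raters assigned categories independently.
    p_expected = sum(counts1[c] * counts2[c] for c in counts1) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical action-level categories given by two raters to four postures.
rater_a = ["low", "low", "high", "high"]
rater_b = ["low", "high", "high", "high"]
print(percent_agreement(rater_a, rater_b), cohens_kappa(rater_a, rater_b))
```

Raw agreement of 0.75 drops to a kappa of 0.5 once chance agreement is removed, which is why the two statistics are usually reported together.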
