Occupation classification model based on DistilKoBERT: using the 5th and 6th Korean Working Condition Surveys.
Kim, Tae-Yeon; Baek, Seong-Uk; Lim, Myeong-Hun; Yun, Byungyoon; Paek, Domyung; Zoh, Kyung Ehi; Youn, Kanwoo; Lee, Yun Keun; Kim, Yangho; Kim, Jungwon; Choi, Eunsuk; Kang, Mo-Yeol; Cho, YoonHo; Lee, Kyung-Eun; Sim, Juho; Oh, Juyeon; Park, Heejoo; Lee, Jian; Won, Jong-Uk; Lee, Yu-Min; Yoon, Jin-Ha.
Affiliation
  • Kim TY; Department of Occupational and Environmental Medicine, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea.
  • Baek SU; The Institute for Occupational Health, Yonsei University College of Medicine, Seoul, Korea.
  • Lim MH; Department of Public Health, Graduate School, Yonsei University, Seoul, Korea.
  • Yun B; Department of Occupational and Environmental Medicine, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea.
  • Paek D; The Institute for Occupational Health, Yonsei University College of Medicine, Seoul, Korea.
  • Zoh KE; Graduate School, Yonsei University College of Medicine, Seoul, Korea.
  • Youn K; Department of Occupational and Environmental Medicine, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea.
  • Lee YK; The Institute for Occupational Health, Yonsei University College of Medicine, Seoul, Korea.
  • Kim Y; Department of Public Health, Graduate School, Yonsei University, Seoul, Korea.
  • Kim J; Department of Preventive Medicine, Yonsei University College of Medicine, Seoul, Korea.
  • Choi E; Department of Environmental Health Sciences, Graduate School of Public Health, Seoul National University, Seoul, Korea.
  • Kang MY; Department of Environmental Health Sciences, Graduate School of Public Health, Seoul National University, Seoul, Korea.
  • Cho Y; Wonjin Green Hospital Occupational Environmental Medicine, Seoul, Korea.
  • Lee KE; Wonjin Green Hospital Occupational Environmental Medicine, Seoul, Korea.
  • Sim J; Department of Occupational and Environmental Medicine, Ulsan University Hospital, University of Ulsan College of Medicine, Ulsan, Korea.
  • Oh J; Department of Occupational and Environmental Medicine, Kosin University College of Medicine, Busan, Korea.
  • Park H; College of Nursing, Research Institute of Nursing Innovation, Kyungpook National University, Daegu, Korea.
  • Lee J; Department of Occupational and Environmental Medicine, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea.
  • Won JU; Occupational Safety and Health Research Institute, Korea Occupational Safety and Health Agency, Ulsan, Korea.
  • Lee YM; Occupational Safety and Health Research Institute, Korea Occupational Safety and Health Agency, Ulsan, Korea.
  • Yoon JH; Department of Preventive Medicine, Yonsei University College of Medicine, Seoul, Korea.
Ann Occup Environ Med; 36: e19, 2024.
Article in English | MEDLINE | ID: mdl-39188666
ABSTRACT

Background:

Accurate occupation classification is essential in various fields, including policy development and epidemiological studies. This study aims to develop an occupation classification model based on DistilKoBERT.

Methods:

This study used data from the 5th and 6th Korean Working Conditions Surveys, conducted in 2017 and 2020, respectively. A total of 99,665 survey participants, who were nationally representative of Korean workers, were included. We used their natural language responses describing their job responsibilities, together with occupational codes based on the Korean Standard Classification of Occupations (7th version, 3-digit codes). The dataset was randomly split into training and test datasets in a ratio of 7:3. The occupation classification model based on DistilKoBERT was fine-tuned using the training dataset and evaluated using the test dataset. Accuracy, precision, recall, and F1 score were calculated as evaluation metrics.
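As an illustration of the random 7:3 split described above, the following is a minimal sketch using only the Python standard library. The function name `split_by_ratio`, the seed, and the placeholder records are assumptions made for illustration; the abstract does not state how the authors implemented the split.

```python
import random


def split_by_ratio(records, train_parts=7, test_parts=3, seed=0):
    """Randomly split records into training and test lists by an integer ratio.

    Hypothetical helper: the name, the default seed, and the use of
    random.Random are illustrative assumptions, not the authors' code.
    """
    shuffled = list(records)               # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]


# Placeholder records standing in for (job description, occupation code) pairs.
records = list(range(10))
train, test = split_by_ratio(records)
print(len(train), len(test))
```

Fixing the seed keeps the partition reproducible, so the test set stays disjoint from the data used for fine-tuning across runs.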

Results:

The final model, which classified 28,996 survey participants in the test dataset into 142 occupational codes, exhibited an accuracy of 84.44%. For the evaluation metrics, the precision, recall, and F1 score of the model, calculated by weighting based on the sample size, were 0.83, 0.84, and 0.83, respectively. The model demonstrated high precision in the classification of service and sales workers yet exhibited low precision in the classification of managers. In addition, it displayed high precision in classifying occupations prominently represented in the training dataset.
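The sample-size-weighted metrics mentioned above can be sketched as follows. This assumes that "weighting based on the sample size" means weighting each class by its number of true instances in the test dataset (its support); the function name `weighted_metrics` and the toy labels "A" and "B" are hypothetical and are not taken from the study.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1.

    Assumption: each class is weighted by its share of true labels.
    """
    n = len(y_true)
    support = Counter(y_true)       # true instances per class
    predicted = Counter(y_pred)     # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        # Per-class scores; a class that is never predicted gets precision 0.
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / n
    return accuracy, precision, recall, f1


# Toy example with two hypothetical occupation codes.
y_true = ["A", "A", "A", "B"]
y_pred = ["A", "A", "B", "B"]
print(weighted_metrics(y_true, y_pred))
```

Under this weighting, recall is mathematically equal to overall accuracy, which is consistent with the reported accuracy of 84.44% and recall of 0.84.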

Conclusions:

This study developed an occupation classification system based on DistilKoBERT, which demonstrated reasonable performance. Although further efforts are needed to enhance the classification accuracy, this automated occupation classification model holds promise for advancing epidemiological studies in the fields of occupational safety and health.
Full text: 1 Database: MEDLINE Language: En Publication year: 2024 Document type: Article
