Large-scale Vietnamese point-of-interest classification using weak labeling.
Tran, Van Trung; Le, Quang Dao; Pham, Bao Son; Luu, Viet Hung; Bui, Quang Hung.
Affiliation
  • Tran VT; Center of Multidisciplinary Integrated Technologies for Field Monitoring, Vietnam National University of Engineering and Technology, Hanoi, Vietnam.
  • Le QD; NTT Hi-Tech Institute, Nguyen Tat Thanh University, Ho Chi Minh City, Vietnam.
  • Pham BS; Center of Multidisciplinary Integrated Technologies for Field Monitoring, Vietnam National University of Engineering and Technology, Hanoi, Vietnam.
  • Luu VH; NTT Hi-Tech Institute, Nguyen Tat Thanh University, Ho Chi Minh City, Vietnam.
  • Bui QH; Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam.
Front Artif Intell ; 5: 1020532, 2022.
Article in English | MEDLINE | ID: mdl-36568578
ABSTRACT
Points of interest (POIs) represent geographic locations grouped into categories (e.g., tourist sites, amenities, or shops) and play a prominent role in many location-based applications. However, most POI category labels are crowd-sourced by the community and are therefore often of low quality. In this paper, we introduce the first annotated dataset for the POI category classification task in Vietnamese. A total of 750,000 POIs were collected from WeMap, a Vietnamese digital map. Because large-scale hand-labeling is inherently time-consuming and labor-intensive, we propose a new approach based on weak labeling. The resulting dataset covers 15 categories, with 275,000 weakly labeled POIs for training and 30,000 gold-standard POIs for testing, making it the largest Vietnamese POI dataset among those currently available. We run POI category classification experiments on our dataset using a strong baseline (BERT-based fine-tuning) and find that the approach is efficient and applicable at large scale. The proposed baseline achieves an F1 score of 90% on the test set and improves the accuracy of WeMap POI data by 37 percentage points (from 56% to 93%).
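The abstract does not say how the weak labels are produced. As a rough illustration of what weak labeling can look like in general, the sketch below applies keyword rules to a POI name and takes a majority vote, abstaining when no rule fires or when the vote is tied. The rules, keywords, category names, and function names are all hypothetical and are not taken from the paper or from WeMap.

```python
import unicodedata
from collections import Counter
from typing import Callable, List, Optional

# A rule returns a category name, or None to abstain.
Rule = Callable[[str], Optional[str]]


def _norm(text: str) -> str:
    # Normalise Unicode so accented Vietnamese text compares reliably.
    return unicodedata.normalize("NFC", text).lower()


def keyword_rule(keywords: List[str], label: str) -> Rule:
    """Build a rule that votes for `label` if any keyword occurs in the name."""
    normed = [_norm(k) for k in keywords]

    def rule(name: str) -> Optional[str]:
        lowered = _norm(name)
        return label if any(k in lowered for k in normed) else None

    return rule


# Hypothetical rules for illustration only (not the paper's 15 categories).
RULES: List[Rule] = [
    keyword_rule(["cafe", "cà phê", "coffee"], "cafe"),
    keyword_rule(["khách sạn", "hotel"], "hotel"),
    keyword_rule(["nhà thuốc", "pharmacy"], "pharmacy"),
]


def weak_label(name: str, rules: List[Rule] = RULES) -> Optional[str]:
    """Majority vote over rules; None if no rule fires or the top vote is tied."""
    votes = [v for v in (r(name) for r in rules) if v is not None]
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # conflicting rules: leave the POI unlabeled
    return ranked[0][0]
```

In a setup like this, POIs that receive a label would form the weakly labeled training pool, while unlabeled or conflicting cases would be left out or sent for manual review; the held-out test set would still need hand-checked labels.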
Full text: 1 Collections: 01-international Database: MEDLINE Language: En Year of publication: 2022 Document type: Article
