Classifying Chinese Questions Related to Health Care Posted by Consumers Via the Internet.

Guo, Haihong; Na, Xu; Hou, Li; Li, Jiao

Affiliation

Guo H; Institute of Medical Information & Library, Chinese Academy of Medical Sciences, Beijing, China.
Na X; Institute of Medical Information & Library, Chinese Academy of Medical Sciences, Beijing, China.
Hou L; Institute of Medical Information & Library, Chinese Academy of Medical Sciences, Beijing, China.
Li J; Institute of Medical Information & Library, Chinese Academy of Medical Sciences, Beijing, China.

J Med Internet Res; 19(6): e220, 2017 Jun 20.

Article in English | MEDLINE | ID: mdl-28634156

ABSTRACT

BACKGROUND:

In question answering (QA) system development, question classification is crucial for identifying information needs and improving the accuracy of returned answers. Consumer health questions are domain-specific yet asked by non-professionals, which makes the question classification task more challenging.

OBJECTIVE:

This study aimed to classify health care-related questions posted by the general public (Chinese speakers) on the Internet.

METHODS:

A topic-based classification schema for health-related questions was built by manually annotating randomly selected questions. The Kappa statistic was used to measure interrater reliability across the multiple annotation results. Using the annotated corpus, we developed a machine-learning method to automatically classify these questions into one of the following six classes: Condition Management, Healthy Lifestyle, Diagnosis, Health Provider Choice, Treatment, and Epidemiology.
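The abstract does not say which variant of the Kappa statistic was used. As an illustration only, the sketch below assumes the two-rater form (Cohen's kappa); the function name `cohen_kappa` and the sample labels are hypothetical and not taken from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    Assumption: exactly two raters. The study may have used a
    multi-rater variant, which the abstract does not specify.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each rater's marginal label proportions.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; agreement is total.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of four questions by two annotators.
rater_1 = ["Diagnosis", "Treatment", "Treatment", "Diagnosis"]
rater_2 = ["Diagnosis", "Treatment", "Diagnosis", "Diagnosis"]
print(cohen_kappa(rater_1, rater_2))  # 0.5
```

In this toy example the raters agree on 3 of 4 items (0.75) and chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.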

RESULTS:

The consumer health question schema was developed with four hierarchical levels of specificity, comprising 48 quaternary categories and 35 annotation rules. The 2000 sample questions were coded with 2000 major codes and 607 minor codes. Using natural language processing techniques, we represented the Chinese questions as sets of lexical, grammatical, and semantic features, and then selected the most effective features to improve question classification performance. On the 6-category classification task, we achieved an average precision of 91.41%, recall of 89.62%, and F1 score of 90.24%.
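The abstract reports averages over the six classes without stating the averaging scheme. The sketch below assumes macro-averaging (compute each metric per class, then take the unweighted mean); the function name `macro_prf` and the sample labels are hypothetical. One hint in favor of this reading: the harmonic mean of the reported precision and recall would be about 90.51%, not 90.24%, which is what one would expect if F1 was averaged per class.

```python
def macro_prf(gold, pred, classes):
    """Macro-averaged precision, recall, and F1 over the given classes.

    Assumption: macro-averaging. The abstract says only "average".
    """
    precisions, recalls, f1_scores = [], [], []
    for c in classes:
        pairs = list(zip(gold, pred))
        tp = sum(g == c and p == c for g, p in pairs)  # correctly predicted as c
        fp = sum(g != c and p == c for g, p in pairs)  # wrongly predicted as c
        fn = sum(g == c and p != c for g, p in pairs)  # items of class c that were missed

        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)

        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    k = len(classes)
    return sum(precisions) / k, sum(recalls) / k, sum(f1_scores) / k


# Hypothetical gold labels and predictions for four questions.
gold = ["Diagnosis", "Diagnosis", "Treatment", "Treatment"]
pred = ["Diagnosis", "Treatment", "Treatment", "Treatment"]
p, r, f = macro_prf(gold, pred, ["Diagnosis", "Treatment"])
print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

Under macro-averaging every class counts equally, so a rare class such as Epidemiology would affect the average as much as a frequent one.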

CONCLUSIONS:

In this study, we developed an automatic method to classify Chinese-language health care questions posted by the general public. It enables artificial intelligence (AI) agents to understand Internet users' information needs on health care.

Subjects

Consumer Health Information/methods; Delivery of Health Care/trends; Internet/statistics & numerical data; Natural Language Processing; Asian People; Humans; Surveys and Questionnaires

Keywords

classification; consumer health information; hypertension; natural language processing

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Natural Language Processing / Internet / Delivery of Health Care / Consumer Health Information Type of study: Prognostic_studies Limits: Humans Language: En Journal: J Med Internet Res Journal subject: MEDICAL INFORMATICS Year of publication: 2017 Document type: Article Country of affiliation: China
