Development of a Consumer Health Vocabulary by Mining Health Forum Texts Based on Word Embedding: Semiautomatic Approach.
Gu, Gen; Zhang, Xingting; Zhu, Xingeng; Jian, Zhe; Chen, Ken; Wen, Dong; Gao, Li; Zhang, Shaodian; Wang, Fei; Ma, Handong; Lei, Jianbo.
Affiliation
  • Gu G; Synyi Research, Shanghai, China.
  • Zhang X; Center for Medical Informatics, Peking University, Beijing, China.
  • Zhu X; Synyi Research, Shanghai, China.
  • Jian Z; Harbin Medical University, Harbin, China.
  • Chen K; Synyi Research, Shanghai, China.
  • Wen D; Center for Medical Informatics, Peking University, Beijing, China.
  • Gao L; School of Stomatology, Peking University, Beijing, China.
  • Zhang S; Synyi Research, Shanghai, China.
  • Wang F; APEX Data & Knowledge Management Lab, Shanghai Jiao Tong University, Shanghai, China.
  • Ma H; Synyi Research, Shanghai, China.
  • Lei J; Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, NY, United States.
JMIR Med Inform ; 7(2): e12704, 2019 May 23.
Article in English | MEDLINE | ID: mdl-31124461
BACKGROUND: The vocabulary gap between consumers and professionals in the medical domain hinders information seeking and communication. Consumer health vocabularies have been developed to aid such informatics applications. This purpose is best served if the vocabulary evolves with consumers' language.
OBJECTIVE: Our objective is to develop a method for identifying and adding new terms to consumer health vocabularies, so that they can keep up with constantly evolving medical knowledge and language use.
METHODS: In this paper, we propose a consumer health term-finding framework based on a distributed word vector space model. We first learned word vectors from a large-scale text corpus without supervision. We then applied a supervised method that uses existing consumer health vocabularies to fine-tune these vector representations. In the fine-tuned word vector space, we identified pairs of professional terms and their consumer variants by their semantic distance. The extracted and labeled pairs of entities were then reviewed manually to validate the results generated by the proposed approach. The results were evaluated using mean reciprocal rank (MRR).
RESULTS: Manual evaluation showed that it is feasible to identify alternative medical concepts by using professional or consumer concepts as queries in the word vector space without fine tuning, but the results are more promising in the final fine-tuned word vector space. The MRR values indicated that, on average, a professional or consumer concept is about the 14th closest term to its counterpart in the word vector space without fine tuning and about the 8th closest in the final fine-tuned word vector space. Furthermore, the results demonstrate that our method can collect abbreviations and common typos frequently used by consumers.
CONCLUSIONS: By integrating a large amount of text information and existing consumer health vocabularies, our method outperformed several baseline ranking methods and is effective for generating a list of candidate terms for human review during consumer health vocabulary development.
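The retrieval and evaluation steps described in the abstract can be illustrated with a minimal sketch. It ranks candidate terms by cosine similarity to a query term and computes MRR, the mean of 1/rank of the correct counterpart over all query pairs. The vectors, the term pairs, and the helper names (`cosine`, `rank_neighbors`, `mean_reciprocal_rank`) are hypothetical and chosen only for illustration; the sketch does not reproduce the authors' embedding training or supervised fine-tuning.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_neighbors(query, vectors):
    """Return every other term, sorted from most to least similar to `query`."""
    q = vectors[query]
    others = [term for term in vectors if term != query]
    return sorted(others, key=lambda term: cosine(q, vectors[term]), reverse=True)

def mean_reciprocal_rank(pairs, vectors):
    """Average of 1/rank of each target term in its query's neighbor list."""
    total = 0.0
    for query, target in pairs:
        rank = rank_neighbors(query, vectors).index(target) + 1  # ranks start at 1
        total += 1.0 / rank
    return total / len(pairs)

# Hypothetical 3-dimensional vectors; real embeddings have hundreds of dimensions.
toy_vectors = {
    "myocardial infarction": [0.9, 0.1, 0.0],
    "heart attack": [0.8, 0.2, 0.1],
    "hypertension": [0.1, 0.9, 0.1],
    "high blood pressure": [0.3, 0.7, 0.4],
    "hbp": [0.2, 0.9, 0.2],
}

# Hypothetical (professional term, consumer term) pairs used as ground truth.
toy_pairs = [
    ("myocardial infarction", "heart attack"),
    ("hypertension", "high blood pressure"),
]

top_for_hypertension = rank_neighbors("hypertension", toy_vectors)[0]
mrr = mean_reciprocal_rank(toy_pairs, toy_vectors)
print(top_for_hypertension, mrr)
```

In this toy data the abbreviation "hbp" ranks above "high blood pressure" for the query "hypertension", so the second pair contributes 1/2 and the MRR is 0.75. A candidate list of this kind would then go to human reviewers, as the abstract describes.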

Full text: 1 Collection: 01-internacional Database: MEDLINE Study type: Prognostic_studies Language: En Journal: JMIR Med Inform Year: 2019 Document type: Article Affiliation country: China Country of publication: Canada