Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task.
Sarker, Abeed; Belousov, Maksim; Friedrichs, Jasper; Hakala, Kai; Kiritchenko, Svetlana; Mehryary, Farrokh; Han, Sifei; Tran, Tung; Rios, Anthony; Kavuluru, Ramakanth; de Bruijn, Berry; Ginter, Filip; Mahata, Debanjan; Mohammad, Saif M; Nenadic, Goran; Gonzalez-Hernandez, Graciela.
Affiliation
  • Sarker A; Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
  • Belousov M; School of Computer Science, University of Manchester, Manchester, UK.
  • Friedrichs J; Infosys Limited, Palo Alto, California, USA.
  • Hakala K; Turku NLP Group, Department of Future Technologies, University of Turku, Turku, Finland.
  • Kiritchenko S; The University of Turku Graduate School, University of Turku, Turku, Finland.
  • Mehryary F; Digital Technologies Research Centre, National Research Council Canada, Ottawa, Canada.
  • Han S; Turku NLP Group, Department of Future Technologies, University of Turku, Turku, Finland.
  • Tran T; The University of Turku Graduate School, University of Turku, Turku, Finland.
  • Rios A; Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA.
  • Kavuluru R; Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA.
  • de Bruijn B; Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA.
  • Ginter F; Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA.
  • Mahata D; Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, Kentucky, USA.
  • Mohammad SM; Digital Technologies Research Centre, National Research Council Canada, Ottawa, Canada.
  • Nenadic G; Turku NLP Group, Department of Future Technologies, University of Turku, Turku, Finland.
  • Gonzalez-Hernandez G; Bloomberg, New York, New York, USA.
J Am Med Inform Assoc; 25(10): 1274-1283, 2018 Oct 01.
Article in English | MEDLINE | ID: mdl-30272184
ABSTRACT

Objective:

We executed the Social Media Mining for Health (SMM4H) 2017 shared tasks to enable the community-driven development and large-scale evaluation of automatic text processing methods for the classification and normalization of health-related text from social media. An additional objective was to publicly release manually annotated data.

Materials and Methods:

We organized 3 independent subtasks: automatic classification of self-reports of 1) adverse drug reactions (ADRs) and 2) medication consumption, from medication-mentioning tweets, and 3) normalization of ADR expressions. Training data consisted of 15 717 annotated tweets for (1), 10 260 for (2), and 6650 ADR phrases and identifiers for (3), and exhibited typical properties of social-media-based health-related texts. Systems were evaluated using 9961, 7513, and 2500 instances for the 3 subtasks, respectively. Following the shared tasks, we evaluated the performance of classes of methods and of ensembles of system combinations.

Results:

Among 55 system runs, the best system scores for the 3 subtasks were 0.435 (ADR class F1-score) for subtask-1, 0.693 (micro-averaged F1-score over two classes) for subtask-2, and 88.5% (accuracy) for subtask-3. Ensembles of system combinations obtained best scores of 0.476, 0.702, and 88.7%, outperforming individual systems.
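The three metrics named above can be illustrated with a minimal sketch. It is not the shared task's official evaluation script; the function names and the toy labels are assumptions made only for illustration. It computes the F1-score of a single class (as for subtask-1), a micro-averaged F1-score over a chosen subset of classes (as for subtask-2), and plain accuracy (as for subtask-3).

```python
def _counts(gold, pred, label):
    """Return (true positives, false positives, false negatives) for one label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    return tp, fp, fn


def _f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def class_f1(gold, pred, positive):
    """F1-score for a single class of interest (e.g., the ADR class)."""
    return _f1(*_counts(gold, pred, positive))


def micro_f1(gold, pred, classes):
    """Micro-averaged F1: pool the counts of the selected classes, then score."""
    tp = fp = fn = 0
    for label in classes:
        c_tp, c_fp, c_fn = _counts(gold, pred, label)
        tp, fp, fn = tp + c_tp, fp + c_fp, fn + c_fn
    return _f1(tp, fp, fn)


def accuracy(gold, pred):
    """Fraction of instances whose predicted label matches the gold label."""
    return sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)


if __name__ == "__main__":
    # Toy labels (hypothetical): 1 = positive class, 0 = negative class.
    gold = [1, 0, 1, 1, 0]
    pred = [1, 1, 0, 1, 0]
    print(class_f1(gold, pred, positive=1))
    print(accuracy(gold, pred))
```

Scoring only the class of interest, rather than averaging over all classes, keeps a large negative class from dominating the result, which matters when the data are imbalanced.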

Discussion:

Among individual systems, support vector machines and convolutional neural networks showed high performance. Performance gains achieved by ensembles of system combinations suggest that such strategies may be suitable for operational systems relying on difficult text classification tasks (eg, subtask-1).
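The abstract does not state how system outputs were combined. As one common possibility, the sketch below assumes simple majority voting over the per-instance predictions of several systems; the function name and the toy runs are hypothetical.

```python
from collections import Counter


def majority_vote(system_predictions):
    """Combine label predictions from several systems by majority vote.

    system_predictions: list of runs, each a list of labels for the same
    instances in the same order. Ties go to the label seen first among the
    votes, so an odd number of systems is advisable for binary tasks.
    """
    if not system_predictions:
        raise ValueError("at least one system run is required")
    n_instances = len(system_predictions[0])
    if any(len(run) != n_instances for run in system_predictions):
        raise ValueError("all runs must cover the same instances")

    ensemble = []
    # zip(*runs) yields, for each instance, the tuple of votes across systems.
    for votes in zip(*system_predictions):
        ensemble.append(Counter(votes).most_common(1)[0][0])
    return ensemble


if __name__ == "__main__":
    # Three hypothetical system runs over three tweets (1 = ADR, 0 = no ADR).
    runs = [
        [1, 0, 1],
        [1, 1, 0],
        [0, 0, 1],
    ]
    print(majority_vote(runs))
```

An ensemble of this kind can only help when the systems make different errors, which is why combinations of dissimilar classifiers are usually preferred.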

Conclusions:

Data imbalance and lack of context remain challenges for natural language processing of social media text. Annotated data from the shared task have been made available as reference standards for future studies (http://dx.doi.org/10.17632/rxwfb3tysd.1).
Subject(s)

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Natural Language Processing / Neural Networks, Computer / Drug-Related Side Effects and Adverse Reactions / Support Vector Machine / Social Media Type of study: Prognostic_studies Limits: Humans Language: En Journal: J Am Med Inform Assoc Journal subject: MEDICAL INFORMATICS Year: 2018 Document type: Article Affiliation country: United States