Overview of the 8th Social Media Mining for Health Applications (#SMM4H) shared tasks at the AMIA 2023 Annual Symposium.

Klein, Ari Z; Banda, Juan M; Guo, Yuting; Schmidt, Ana Lucia; Xu, Dongfang; Flores Amaro, Ivan; Rodriguez-Esteban, Raul; Sarker, Abeed; Gonzalez-Hernandez, Graciela

Affiliation

Klein AZ; Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA 19104, United States.
Banda JM; Department of Computer Science, Georgia State University, Atlanta, GA 30302, United States.
Guo Y; Department of Biomedical Informatics, Emory University, Atlanta, GA 30322, United States.
Schmidt AL; Roche Innovation Center, 4070 Basel, Switzerland.
Xu D; Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States.
Flores Amaro I; Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States.
Rodriguez-Esteban R; Roche Innovation Center, 4070 Basel, Switzerland.
Sarker A; Department of Biomedical Informatics, Emory University, Atlanta, GA 30322, United States.
Gonzalez-Hernandez G; Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States.

J Am Med Inform Assoc ; 31(4): 991-996, 2024 Apr 03.

Article in En | MEDLINE | ID: mdl-38218723

ABSTRACT

OBJECTIVE:

The aim of the Social Media Mining for Health Applications (#SMM4H) shared tasks is to take a community-driven approach to address the natural language processing and machine learning challenges inherent to utilizing social media data for health informatics. In this paper, we present the annotated corpora, a technical summary of participants' systems, and the performance results.

METHODS:

The eighth iteration of the #SMM4H shared tasks was hosted at the AMIA 2023 Annual Symposium and consisted of 5 tasks that represented various social media platforms (Twitter and Reddit), languages (English and Spanish), methods (binary classification, multi-class classification, extraction, and normalization), and topics (COVID-19, therapies, social anxiety disorder, and adverse drug events).

RESULTS:

In total, 29 teams registered, representing 17 countries. In general, the top-performing systems used deep neural network architectures based on pre-trained transformer models. In particular, the top-performing systems for the classification tasks were based on single models that were pre-trained on social media corpora.

CONCLUSION:

To facilitate future work, the datasets (a total of 61 353 posts) will remain available by request, and the CodaLab sites will remain active for a post-evaluation phase.

Subjects

Social Media; Humans; Data Mining/methods; Machine Learning; Natural Language Processing; Neural Networks, Computer

Keywords

data mining; machine learning; natural language processing; social media

Full text: 1 Database: MEDLINE Main subject: Social Media Language: En Publication year: 2024 Document type: Article