Pesquisa | Portal Regional da BVS

1.

Fine grain emotion analysis in Spanish using linguistic features and transformers.

Salmerón-Ríos, Alejandro; García-Díaz, José Antonio; Pan, Ronghao; Valencia-García, Rafael.

PeerJ Comput Sci ; 10: e1992, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-38855234

RESUMO

Mental health issues are a global concern, with a particular focus on the rise of depression. Depression affects millions of people worldwide and is a leading cause of suicide, particularly among young people. Recent surveys indicate an increase in cases of depression during the COVID-19 pandemic, which affected approximately 5.4% of the population in Spain in 2020. Social media platforms such as X (formerly Twitter) have become important hubs for health information as more people turn to these platforms to share their struggles and seek emotional support. Researchers have discovered a link between emotions and mental illnesses such as depression. This correlation provides a valuable opportunity for automated analysis of social media data to detect changes in mental health status that might otherwise go unnoticed, thus preventing more serious health consequences. Therefore, this research explores the field of emotion analysis in Spanish towards mental disorders. There are two contributions in this area. On the one hand, the compilation, translation, evaluation and correction of a novel dataset composed of a mixture of other existing datasets in the bibliography. This dataset compares a total of 16 emotions, with an emphasis on negative emotions. On the other hand, the in-depth evaluation of this novel dataset with several state-of-the-art transformers based on encoder-only and encoder-decoder architectures. The analysis compromises monolingual, multilingual and distilled models as well as feature integration techniques. The best results are obtained with the encoder-only MarIA model, with a macro-average F1 score of 60.4771%.

2.

Evaluation of transformer models for financial targeted sentiment analysis in Spanish.

Pan, Ronghao; García-Díaz, José Antonio; Garcia-Sanchez, Francisco; Valencia-García, Rafael.

PeerJ Comput Sci ; 9: e1377, 2023.

Artigo em Inglês | MEDLINE | ID: mdl-37346571

RESUMO

Nowadays, financial data from social media plays an important role to predict the stock market. However, the exponential growth of financial information and the different polarities of sentiment that other sectors or stakeholders may have on the same information has led to the need for new technologies that automatically collect and classify large volumes of information quickly and easily for each stakeholder. In this scenario, we conduct a targeted sentiment analysis that can automatically extract the main economic target from financial texts and obtain the polarity of a text towards such main economic target, other companies and society in general. To this end, we have compiled a novel corpus of financial tweets and news headlines in Spanish, constituting a valuable resource for the Spanish-focused research community. In addition, we have carried out a performance comparison of different Spanish-specific large language models, with MarIA and BETO achieving the best results. Our best result has an overall performance of 76.04%, 74.16%, and 68.07% in macro F1-score for the sentiment classification towards the main economic target, society, and other companies, respectively, and an accuracy of 69.74% for target detection. We have also evaluated the performance of multi-label classification models in this context and obtained a performance of 71.13%.

3.

Hope speech detection in Spanish: The LGBT case.

García-Baena, Daniel; García-Cumbreras, Miguel Ángel; Jiménez-Zafra, Salud María; García-Díaz, José Antonio; Valencia-García, Rafael.

Lang Resour Eval ; : 1-28, 2023 Mar 17.

Artigo em Inglês | MEDLINE | ID: mdl-37360265

RESUMO

In recent years, systems have been developed to monitor online content and remove abusive, offensive or hateful content. Comments in online social media have been analyzed to find and stop the spread of negativity using methods such as hate speech detection, identification of offensive language or detection of abusive language. We define hope speech as the type of speech that is able to relax a hostile environment and that helps, gives suggestions and inspires for good to a number of people when they are in times of illness, stress, loneliness or depression. Detecting it automatically, in order to give greater diffusion to positive comments, can have a very significant effect when it comes to fighting against sexual or racial discrimination or when we intend to foster less bellicose environments. In this article we perform a complete study on hope speech, analyzing existing solutions and available resources. In addition, we have generated a quality resource, SpanishHopeEDI, a new Spanish Twitter dataset on LGBT community, and we have conducted some experiments that can serve as a baseline for further research.

4.

Automatic Correction of Real-Word Errors in Spanish Clinical Texts.

Bravo-Candel, Daniel; López-Hernández, Jésica; García-Díaz, José Antonio; Molina-Molina, Fernando; García-Sánchez, Francisco.

Sensors (Basel) ; 21(9)2021 Apr 21.

Artigo em Inglês | MEDLINE | ID: mdl-33919018

RESUMO

Real-word errors are characterized by being actual terms in the dictionary. By providing context, real-word errors are detected. Traditional methods to detect and correct such errors are mostly based on counting the frequency of short word sequences in a corpus. Then, the probability of a word being a real-word error is computed. On the other hand, state-of-the-art approaches make use of deep learning models to learn context by extracting semantic features from text. In this work, a deep learning model were implemented for correcting real-word errors in clinical text. Specifically, a Seq2seq Neural Machine Translation Model mapped erroneous sentences to correct them. For that, different types of error were generated in correct sentences by using rules. Different Seq2seq models were trained and evaluated on two corpora: the Wikicorpus and a collection of three clinical datasets. The medicine corpus was much smaller than the Wikicorpus due to privacy issues when dealing with patient information. Moreover, GloVe and Word2Vec pretrained word embeddings were used to study their performance. Despite the medicine corpus being much smaller than the Wikicorpus, Seq2seq models trained on the medicine corpus performed better than those models trained on the Wikicorpus. Nevertheless, a larger amount of clinical text is required to improve the results.

Assuntos

Idioma , Semântica , Humanos , Processamento de Linguagem Natural , Privacidade , Probabilidade

5.

Ontology-driven aspect-based sentiment analysis classification: An infodemiological case study regarding infectious diseases in Latin America.

García-Díaz, José Antonio; Cánovas-García, Mar; Valencia-García, Rafael.

Future Gener Comput Syst ; 112: 641-657, 2020 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-32572291

RESUMO

Infodemiology is the process of mining unstructured and textual data so as to provide public health officials and policymakers with valuable information regarding public health. The appearance of this new data source, which was previously unimaginable, has opened up a new way in which to improve public health systems, resulting in better communication policies and better detection systems. However, the unstructured nature of the Internet, along with the complexity of the infectious disease domain, prevents the information extracted from being easily understood. Moreover, when dealing with languages other than English, for which some of the most common Natural Language Processing resources are not available, the correct exploitation of this data becomes even more difficult. We intend to fill these gaps proposing an ontology-driven aspect-based sentiment analysis with which to measure the general public's opinions as regards infectious diseases when expressed in Spanish by employing a case study of tweets concerning the Zika, Dengue and Chikungunya viruses in Latin America. Our proposal is based on two technologies. We first use ontologies in order to model the infectious disease domain with concepts such as risks, symptoms, transmission methods or drugs, among other concepts. We then measure the relationship between these concepts in order to determine the degree to which one concept influences other concepts. This new information is subsequently applied in order to build an aspect-based sentiment analysis model based on statistical and linguistic features. This is done by applying deep-learning models. Our proposal is available on a web platform, where users can see the sentiment for each concept at a glance and analyse how each concept influences the sentiment of the others.

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA