Search | BVS Regional Portal

1.

Investigating topic modeling techniques through evaluation of topics discovered in short texts data across diverse domains.

Muthusami, R; Mani Kandan, N; Saritha, K; Narenthiran, B; Nagaprasad, N; Ramaswamy, Krishnaraj.

Sci Rep ; 14(1): 12003, 2024 May 25.

Article in English | MEDLINE | ID: mdl-38796483

ABSTRACT

The online channel has affected many facets of an individual's identity, commercial, social policy, and culture, among others. It implies that discovering the topics on which these brief writings are focused, as well as examining the qualities of these short texts is critical. Another key issue that has been identified is the evaluation of newly discovered topics in terms of topic quality, which includes topic separation and coherence. A topic modeling method has been shown to be an outstanding aid in the linguistic interpretation of quite tiny texts. Based on the underlying strategy, topic models are divided into two categories: probabilistic methods and non-probabilistic methods. In this research, short texts are analyzed using topic models, including latent Dirichlet allocation (LDA) for probabilistic topic modeling and non-negative matrix factorization (NMF) for non-probabilistic topic modeling. A novel approach for topic evaluation is used, such as clustering methods and silhouette analysis on both models, to investigate performance in terms of quality. The experiment results indicate that the proposed evaluation method outperforms on both LDA and NMF.
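The silhouette analysis mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each document is assigned to its highest-weight topic and that the silhouette coefficient is computed with Euclidean distance over document-topic vectors of the kind LDA or NMF would produce. The `doc_topic` values and the helper names are hypothetical.

```python
import math


def dominant_topic(doc_topic):
    """Assign each document to the topic with the largest weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic]


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical document-topic weights (rows: documents, columns: topics).
doc_topic = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
]
labels = dominant_topic(doc_topic)
score = silhouette_score(doc_topic, labels)
print(labels, round(score, 3))
```

Values near 1 suggest well-separated topic clusters and values near 0 suggest overlapping ones. The same function could be applied to the output of either model to compare them, under the assumptions stated above.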

2.

An effective short-text topic modelling with neighbourhood assistance-driven NMF in Twitter.

Athukorala, Shalani; Mohotti, Wathsala.

Soc Netw Anal Min ; 12(1): 89, 2022.

Article in English | MEDLINE | ID: mdl-35911485

ABSTRACT

Social media such as Twitter connect billions of people by allowing them to exchange their thoughts via short-text communication. Topic modelling is a widely used technique for analysing short texts. Discovering topic clusters in short-text collections faces issues with distance-based, density-based and dimensionality reduction-based methods due to their higher dimensionality and short length which results in extremely sparse text representation matrices. We propose the 'neighbourhood-based assistance'-driven non-negative matrix factorization (NMF) method to handle high-dimensional sparse short-text representation with lower-dimensional projection effectively. We utilized NMF that aligned with the natural non-negativity of text data coupled with the symmetric document affinity information to identify topic distribution in the short text. Neighbourhood information within documents is captured using Jaccard similarity to assist information loss, resulting in higher-to-lower-dimensional projection. Experimental results with Twitter data sets show that the proposed approach is able to attain high accuracy compared to state-of-the-art methods quantitatively, while qualitative analysis with case studies validates the ability of the proposed approach in generating meaningful topic clusters.
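The Jaccard-based neighbourhood information described in this abstract could look roughly like the sketch below. It only builds a symmetric document affinity matrix from token sets. How that matrix is coupled with the NMF objective is not specified in the abstract and is not attempted here. The whitespace tokenisation, the sample texts, and the function names are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def affinity_matrix(docs):
    """Symmetric document-by-document Jaccard affinity matrix."""
    # Assumption: lowercase whitespace tokenisation stands in for real preprocessing.
    token_sets = [set(doc.lower().split()) for doc in docs]
    n = len(token_sets)
    return [
        [jaccard(token_sets[i], token_sets[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical short texts.
docs = [
    "flood warning issued for the city",
    "city flood warning extended",
    "new phone released today",
]
S = affinity_matrix(docs)
for row in S:
    print([round(value, 3) for value in row])
```

Because Jaccard similarity is symmetric and bounded in [0, 1], the resulting matrix is non-negative, which fits the non-negativity that the abstract says motivates NMF for text data.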

3.

Named entity disambiguation in short texts over knowledge graphs.

Bouarroudj, Wissem; Boufaida, Zizette; Bellatreche, Ladjel.

Knowl Inf Syst ; 64(2): 325-351, 2022.

Article in English | MEDLINE | ID: mdl-35001999

ABSTRACT

The ever-growing usage of knowledge graphs (KGs) positions named entity disambiguation (NED) at the heart of designing accurate KG-driven systems such as query answering systems (QAS). According to the current research, most studies dealing with NED on KGs involve long texts, which is not the case of short text fragments, identified by their limited contexts. The accuracy of QASs strongly depends on the management of such short text. This limitation motivates this paper, which studies the NED problem on KGs, involving only short texts. First, we propose a NED approach including the following steps: (i) context expansion using WordNet to measure its similarity to the resource context; (ii) exploiting coherence between entities in queries that contain more than one entity, such as "Is Michelle Obama the wife of Barack Obama?"; (iii) taking into account the relations between words to calculate their similarity with the properties of a resource; and (iv) the use of syntactic features. The NED solution approach is compared to state-of-the-art approaches using five datasets. The experimental results show that our approach outperforms these systems by 27% in the F-measure. A system called Welink, implementing our proposal, is available on GitHub, and it is also accessible via a REST API.

4.

The Affective Norms for Polish Short Texts (ANPST) Database Properties and Impact of Participants' Population and Sex on Affective Ratings.

Imbir, Kamil K.

Front Psychol ; 8: 855, 2017.

Article in English | MEDLINE | ID: mdl-28611707

ABSTRACT

The Affective Norms for Polish Short Texts (ANPST) dataset (Imbir, 2016d) is a list of 718 affective sentence stimuli with known affective properties with respect to subjectively perceived valence, arousal, dominance, origin, subjective significance, and source. This article examines the reliability of the ANPST and the impact of population type and sex on affective ratings. The ANPST dataset was introduced to provide a recognized method of eliciting affective states with linguistic stimuli that are more complex than single words, include contextual information, and are thus less ambiguous in interpretation than single words. Analysis of the properties of the ANPST dataset showed that norms collected are reliable in terms of split-half estimation and that the distributions of ratings are similar to those obtained in other affective norms studies. The pattern of correlations was the same as that found in analysis of an affective norms dataset for words based on the same six variables. Female psychology students' valence ratings were also more polarized than those of their female student peers studying other subjects, but arousal ratings were only higher for negative words. Differences also appeared for all other measured dimensions. Women's valence ratings were found to be more polarized and arousal ratings were higher than those made by men, and differences were also present for dominance, origin, and subjective significance. The ANPST is the first Polish language list of sentence stimuli and could easily be adapted for other languages and cultures.
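The split-half estimation mentioned in this abstract can be illustrated with a small sketch. The abstract does not say how raters were split or whether a correction was applied, so the odd/even split and the Spearman-Brown correction below are assumptions, and the ratings are hypothetical toy values rather than ANPST data.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def split_half_reliability(ratings):
    """ratings[r][i] is rater r's rating of item i.

    Returns (r, corrected): the correlation between the per-item mean
    ratings of the two rater halves, and its Spearman-Brown correction.
    """
    # Assumption: raters are split by alternating position, not at random.
    half_a, half_b = ratings[0::2], ratings[1::2]
    n_items = len(ratings[0])
    mean_a = [sum(r[i] for r in half_a) / len(half_a) for i in range(n_items)]
    mean_b = [sum(r[i] for r in half_b) / len(half_b) for i in range(n_items)]
    r = pearson(mean_a, mean_b)
    return r, 2 * r / (1 + r)


# Hypothetical ratings: 4 raters x 4 sentences on a 1-9 scale.
ratings = [
    [1, 5, 3, 7],
    [2, 5, 4, 8],
    [1, 6, 3, 8],
    [2, 4, 3, 7],
]
r, corrected = split_half_reliability(ratings)
print(round(r, 3), round(corrected, 3))
```

A high correlation between the two halves indicates that the per-sentence norms do not depend much on which raters happened to provide them.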

5.

Affective Norms for 718 Polish Short Texts (ANPST): Dataset with Affective Ratings for Valence, Arousal, Dominance, Origin, Subjective Significance and Source Dimensions.

Imbir, Kamil K.

Front Psychol ; 7: 1030, 2016.

Article in English | MEDLINE | ID: mdl-27458420

ABSTRACT

Affective sciences are of burgeoning interest and are attracting more and more research attention. Three components of stimuli meaning have traditionally been distinguished: valence (degree of pleasantness), arousal (degree of intensity of sensations), and dominance (degree of control over sensations). Recently, another three dimensions have been introduced to measure qualities connected to the emotion-duality model: origin (the main component originating in the heart or in the mind), subjective significance (the degree of the subjective goal's relevance), and source (the location of the stimuli evoking the state). All six affective dimensions were assessed in our study of 718 Polish short texts (sentences of 5-23 words and 36-133 characters in length) describing situations or states in a way that can be referenced to an individual's experience. Assessments were carried out by 148 psychology students (all women for 108 sentences) and 2,091 students of different faculties (social science, engineering, life science, and science) from Warsaw colleges and universities (1,061 women and 1,030 men for all 718 sentences). Assessing sets of sentences for emotional response is especially useful for researchers interested in emotion elicitation through the use of a phrase such as "imagine that " or by simply reading emotionally charged material that is more complex and that provides better context than single pictures or words.
