Results 1 - 9 of 9
1.
Sci Rep ; 14(1): 8585, 2024 Apr 13.
Article in English | MEDLINE | ID: mdl-38615123

ABSTRACT

This paper provides an extensive examination of a sizable dataset of English tweets focusing on nine widely recognized cryptocurrencies: Cardano, Binance, Bitcoin, Dogecoin, Ethereum, Fantom, Matic, Shiba, and Ripple. Our goal was to conduct a psycholinguistic and emotional analysis of social media content associated with these cryptocurrencies. Such analysis can enable researchers and experts dealing with cryptocurrencies to make more informed decisions. Using advanced text analysis techniques, we compared linguistic characteristics across the digital coins, shedding light on the distinctive linguistic patterns emerging in each coin's community. This work also examined the interplay between these digital assets: by counting which coin pairs are mentioned together most frequently in the dataset, we identified co-mentions among different cryptocurrencies. To ensure the reliability of our findings, we initially gathered a total of 832,559 tweets from X and applied a rigorous preprocessing stage, which yielded a refined dataset of 115,899 tweets used for our analysis. Overall, our research offers valuable insight into the linguistic nuances of various digital coins' online communities and provides a deeper understanding of their interactions in the cryptocurrency space.
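
The co-mention counting described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the lowercase keyword list, the whitespace tokenization, and the function name `co_mentions` are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Assumed keyword list: the nine coins named in the abstract, lowercased.
COINS = ["cardano", "binance", "bitcoin", "dogecoin", "ethereum",
         "fantom", "matic", "shiba", "ripple"]

def co_mentions(tweets):
    """Count how often each unordered pair of coins appears in the same tweet."""
    counts = Counter()
    for text in tweets:
        tokens = set(text.lower().split())          # naive whitespace tokenization
        mentioned = sorted(c for c in COINS if c in tokens)
        counts.update(combinations(mentioned, 2))   # every pair, in sorted order
    return counts

# Made-up sample tweets, for illustration only.
sample = [
    "bitcoin and ethereum are both up today",
    "holding bitcoin ethereum and cardano",
    "dogecoin to the moon",
]
pairs = co_mentions(sample)
print(pairs.most_common(1))
```

Sorting the mentioned coins before pairing keeps each pair in a single canonical order, so `("bitcoin", "ethereum")` and `("ethereum", "bitcoin")` are never counted separately.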

2.
Sci Rep ; 13(1): 11441, 2023 Jul 15.
Article in English | MEDLINE | ID: mdl-37454207

ABSTRACT

We introduce a novel Natural Language Processing (NLP) task called guilt detection, which focuses on detecting guilt in text. We identify guilt as a complex and vital emotion that has not been previously studied in NLP, and we aim to provide a more fine-grained analysis of it. To address the lack of publicly available corpora for guilt detection, we created VIC, a dataset containing 4622 texts from three existing emotion detection datasets that we binarized into guilt and no-guilt classes. We experimented with traditional machine learning methods using bag-of-words and term frequency-inverse document frequency features, achieving a 72% F1 score with the highest-performing model. Our study provides a first step towards understanding guilt in text and opens the door for future research in this area.


Subject(s)
Machine Learning; Natural Language Processing
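
As an illustration of the term frequency-inverse document frequency features mentioned in the abstract above, the sketch below computes a plain, unsmoothed TF-IDF weighting in pure Python. The weighting variant, the whitespace tokenization, and the two sample sentences are assumptions for the example; the abstract does not say which variant or toolkit the authors used.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

# Made-up sentences, not taken from the VIC dataset.
docs = ["i feel guilty about it", "i feel happy today"]
vectors = tfidf(docs)
print(vectors[0])
```

Terms that occur in every document (here "i" and "feel") receive weight zero, while terms specific to one document keep a positive weight, which is what makes the representation useful as classifier input.
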
4.
PeerJ Comput Sci ; 9: e1282, 2023.
Article in English | MEDLINE | ID: mdl-37346646

ABSTRACT

In this article, we propose a method for the automatic retrieval of a set of semantic primitive words from an explanatory dictionary and a novel evaluation procedure for the obtained set of primitives. The approach represents the dictionary as a directed graph and solves a single-objective constrained optimization problem using a genetic algorithm with the PageRank scoring model. The problem is defined as subset selection. The algorithm searches for sets of words that fulfil several requirements: the cardinality of the set should not exceed empirically selected limits, and the PageRank word importance score is minimized with cycle prevention thresholding. In the experiments, we used the WordNet dictionary for English. The proposed method is an improvement over the previous state-of-the-art solutions.
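
To make the PageRank scoring step concrete, here is a minimal power-iteration sketch over a toy dictionary graph. It covers only the scoring model, not the genetic algorithm or the cycle prevention thresholding. The edge direction (headword to the words in its definition), the damping factor, and the toy entries are assumptions for the example.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; graph maps a node to the nodes it links to."""
    nodes = sorted(set(graph) | {w for targets in graph.values() for w in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node (no outgoing edges): spread its rank evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Toy dictionary (hypothetical): each headword points to words in its definition.
toy_dictionary = {
    "cat": ["animal", "small"],
    "dog": ["animal"],
    "animal": ["living", "thing"],
    "small": ["thing"],
}
scores = pagerank(toy_dictionary)
print(sorted(scores, key=scores.get, reverse=True))
```

With this edge direction, words that many definitions rely on accumulate a higher score than words that no definition uses.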

5.
PLoS One ; 17(7): e0267590, 2022.
Article in English | MEDLINE | ID: mdl-35857768

ABSTRACT

The analysis of an author's writing style implies the characterization and identification of the style in terms of a set of features commonly called linguistic features. The analysis can be extrinsic, where the style of an author is compared with that of other authors, or intrinsic, where the style of an author is identified through different stages of their life. Intrinsic analysis has been used, for example, to detect mental illness and the effects of aging. A key element of the analysis is the set of style markers used to model the author's writing patterns. The style markers should handle diachronic changes and be thematically independent. One of the most commonly used style markers in extrinsic style analysis is the n-gram. In this paper, we present the evaluation of traditional n-grams (words and characters) and dependency tree syntactic n-grams to solve the task of detecting changes in writing style over time. Our corpus consisted of novels by eleven English-speaking authors. The novels of each author were organized chronologically from the oldest to the most recent work according to the date of publication. Subsequently, two stages were defined: initial and final. Three novels were assigned to each stage: the oldest novels to the initial stage and the most recent novels to the final stage. To analyze changes in the writing style, novels were characterized by using four types of n-grams: characters, words, Part-Of-Speech (POS) tags, and syntactic relations. Experiments were performed with a Logistic Regression classifier. Dimension reduction techniques such as Principal Component Analysis (PCA) and Latent Semantic Analysis (LSA) were evaluated. The results obtained with the different n-grams indicated that all authors presented significant changes in writing style over time. In addition, representations using n-grams of syntactic relations achieved competitive results across different authors.


Subject(s)
Linguistics; Writing; Language; Linguistics/methods; Semantics; Supervised Machine Learning
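
The traditional character and word n-grams evaluated in the abstract above can be extracted as in the sketch below. POS-tag and syntactic n-grams need a tagger or parser and are not shown. The function names and the whitespace tokenization are assumptions for the example.

```python
from collections import Counter

def char_ngrams(text, n):
    """All overlapping character n-grams of a string, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def word_ngrams(text, n):
    """All overlapping word n-grams, using naive whitespace tokenization."""
    tokens = text.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# A frequency profile of n-grams is a common way to represent a text's style.
profile = Counter(char_ngrams("banana", 2))
bigrams = word_ngrams("the old man and the sea", 2)
print(profile.most_common(2))
print(bigrams[:2])
```

Comparing such frequency profiles between an author's early and late novels is one simple way to look for stylistic change over time.
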
6.
PeerJ Comput Sci ; 8: e896, 2022.
Article in English | MEDLINE | ID: mdl-35494831

ABSTRACT

Urdu is a widely used language in South Asia and worldwide. While there are similar datasets available in English, we created the first multi-label emotion dataset consisting of 6,043 tweets and six basic emotions in the Urdu Nastalíq script. A multi-label (ML) classification approach was adopted to detect emotions in Urdu text. The morphological and syntactic structure of Urdu makes multi-label emotion detection a challenging problem. In this paper, we build a set of baseline classifiers: machine learning algorithms (Random forest (RF), Decision tree (J48), Sequential minimal optimization (SMO), AdaBoostM1, and Bagging), deep-learning algorithms (Convolutional Neural Networks (1D-CNN), Long short-term memory (LSTM), and LSTM with CNN features), and a transformer-based baseline (BERT). We used a combination of text representations: stylometric-based features, pre-trained word embeddings, word-based n-grams, and character-based n-grams. The paper highlights the annotation guidelines, dataset characteristics, and insights into the different methodologies used for Urdu-based emotion classification. We present our best results using micro-averaged F1, macro-averaged F1, accuracy, Hamming loss (HL), and exact match (EM) for all tested methods.
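
Three of the multi-label metrics listed in this abstract can be computed as in the sketch below: Hamming loss, exact match, and micro-averaged F1. Labels are assumed to be binary indicator vectors with one slot per emotion. The three-label toy data is made up for the example and is not taken from the dataset.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for true_row, pred_row in zip(y_true, y_pred)
                for t, p in zip(true_row, pred_row))
    return wrong / total

def exact_match(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted correctly."""
    hits = sum(true_row == pred_row for true_row, pred_row in zip(y_true, y_pred))
    return hits / len(y_true)

def micro_f1(y_true, y_pred):
    """F1 computed from counts pooled over all labels and samples."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += (t == 1 and p == 1)
            fp += (t == 0 and p == 1)
            fn += (t == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

# Toy example: three samples, three labels each.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
print(hamming_loss(y_true, y_pred), exact_match(y_true, y_pred), micro_f1(y_true, y_pred))
```

Hamming loss gives credit for partially correct label sets, whereas exact match counts a sample only when every label is right, so the two can diverge sharply on the same predictions.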

7.
Sensors (Basel) ; 19(6), 2019 Mar 23.
Article in English | MEDLINE | ID: mdl-30909621

ABSTRACT

Wireless sensor networks (WSNs) consist of a large number of small devices or nodes, called micro controller units (MCUs), located in homes and/or offices and operated through the internet from anywhere, making these devices smarter and more efficient. Quality of service routing is one of the critical challenges in WSNs, especially in surveillance systems. To improve the efficiency of the network, in this article we propose a distributed learning fractal algorithm (DLFA) to design the control topology of a WSN whose nodes are MCUs distributed in a physical space and connected to share sensor parameters such as CO2 concentration, humidity, and temperature within the space, or the adjustment of light intensity inside and outside the home or office. For this, we start by defining the production rules of the L-systems that generate the Hilbert fractal, since these rules facilitate the generation of this fractal, which is a space-filling curve. Then, we model the optimization of a centralized control topology of WSNs and propose a DLFA to find the best two nodes between which a device can find a highly reliable link. Thus, we propose a software defined network (SDN) with strong mobility, since it can be reconfigured depending on the number of nodes; we also employ target coverage, because the DLFA only considers reliable links among devices. Finally, through laboratory tests and computer simulations, we demonstrate the effectiveness of our approach by means of fractal routing in WSNs, using a large number of WSN devices (from 16 to 64 sensors) for real-time monitoring of different parameters, in order to make WSNs efficient for application in a forthcoming Smart City.
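
The L-system production rules for the Hilbert curve can be sketched as below. The rules are the standard textbook ones (axiom `A`, 90-degree turns); the abstract does not list the authors' exact rules, so treat this as an illustration of the idea rather than their implementation. The function names `expand` and `trace` are made up for the example.

```python
# Standard Hilbert-curve L-system: A and B are rewritten, F/+/- are constants.
RULES = {"A": "+BF-AFA-FB+", "B": "-AF+BFB+FA-"}

def expand(axiom, order):
    """Apply the production rules `order` times to the axiom string."""
    commands = axiom
    for _ in range(order):
        commands = "".join(RULES.get(symbol, symbol) for symbol in commands)
    return commands

def trace(commands):
    """Interpret F as a unit step, + as a left turn, - as a right turn."""
    x, y = 0, 0
    dx, dy = 1, 0                      # start heading along +x
    points = [(x, y)]
    for symbol in commands:
        if symbol == "F":
            x, y = x + dx, y + dy
            points.append((x, y))
        elif symbol == "+":
            dx, dy = -dy, dx           # rotate 90 degrees counter-clockwise
        elif symbol == "-":
            dx, dy = dy, -dx           # rotate 90 degrees clockwise
    return points

order2 = trace(expand("A", 2))         # 4 x 4 grid, 16 points
order3 = trace(expand("A", 3))         # 8 x 8 grid, 64 points
print(order2)
```

An order-2 curve visits 16 grid points and an order-3 curve visits 64, which happens to match the range of network sizes mentioned in the abstract; whether the authors placed one sensor per grid point is not stated there.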

8.
Comput Intell Neurosci ; 2016: 1638936, 2016.
Article in English | MEDLINE | ID: mdl-27795703

ABSTRACT

We introduce a lexical resource for preprocessing social media data. We show that a neural network-based feature representation is enhanced by using this resource. We conducted experiments on the PAN 2015 and PAN 2016 author profiling corpora and obtained better results when performing the data preprocessing using the developed lexical resource. The resource includes dictionaries of slang words, contractions, abbreviations, and emoticons commonly used in social media. Each of the dictionaries was built for the English, Spanish, Dutch, and Italian languages. The resource is freely available.


Subject(s)
Authorship; Concept Formation/physiology; Language; Neural Networks, Computer; Semantics; Social Media; Adolescent; Adult; Age Factors; Data Mining; Female; Humans; Male; Middle Aged; Sex Factors; Terminology as Topic; Time Factors; Vocabulary; Young Adult
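
Dictionary-based preprocessing of the kind described in the abstract above might look like the following sketch. The dictionary entries, the placeholder tokens such as `<smile>`, and the function name `normalize` are hypothetical and are not taken from the published resource.

```python
# Hypothetical miniature dictionaries; the real resource is far larger
# and covers English, Spanish, Dutch, and Italian.
SLANG = {"u": "you", "gr8": "great"}
CONTRACTIONS = {"can't": "cannot", "i'm": "i am"}
EMOTICONS = {":)": "<smile>", ":(": "<sad>"}

def normalize(text):
    """Replace each whitespace-separated token found in any dictionary."""
    normalized = []
    for token in text.lower().split():
        for table in (EMOTICONS, CONTRACTIONS, SLANG):
            if token in table:
                token = table[token]
                break                  # first matching dictionary wins
        normalized.append(token)
    return " ".join(normalized)

cleaned = normalize("I'm sure u can't lose :)")
print(cleaned)
```

Mapping informal variants to canonical forms shrinks the vocabulary a downstream feature extractor has to handle.
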
9.
Sensors (Basel) ; 16(9)2016 Aug 29.
Article in English | MEDLINE | ID: mdl-27589740

ABSTRACT

We apply the integrated syntactic graph feature extraction methodology to the task of automatic authorship detection. This graph-based representation allows integrating different levels of language description into a single structure. We extract textual patterns based on features obtained from shortest path walks over integrated syntactic graphs and apply them to determine the authors of documents. On average, our method outperforms state-of-the-art approaches and, unlike existing methods, gives consistently high results across different corpora. Our results show that our textual patterns are useful for the task of authorship attribution.

...