Results 1 - 20 of 23
1.
PLoS One ; 17(3): e0265602, 2022.
Article in English | MEDLINE | ID: mdl-35298556

ABSTRACT

We address a challenging problem of identifying main sources of hate speech on Twitter. On one hand, we carefully annotate a large set of tweets for hate speech, and deploy advanced deep learning to produce high quality hate speech classification models. On the other hand, we create retweet networks, detect communities and monitor their evolution through time. This combined approach is applied to three years of Slovenian Twitter data. We report a number of interesting results. Hate speech is dominated by offensive tweets, related to political and ideological issues. The share of unacceptable tweets is moderately increasing with time, from the initial 20% to 30% by the end of 2020. Unacceptable tweets are retweeted significantly more often than acceptable tweets. About 60% of unacceptable tweets are produced by a single right-wing community of only moderate size. Institutional Twitter accounts and media accounts post significantly less unacceptable tweets than individual accounts. In fact, the main sources of unacceptable tweets are anonymous accounts, and accounts that were suspended or closed during the years 2018-2020.


Subject(s)
Communications Media, Social Media, Hate, Humans, Language, Speech
2.
Appl Netw Sci ; 6(1): 96, 2021.
Article in English | MEDLINE | ID: mdl-34957317

ABSTRACT

Twitter data exhibits several dimensions worth exploring: a network dimension in the form of links between the users, textual content of the tweets posted, and a temporal dimension as the time-stamped sequence of tweets and their retweets. In the paper, we combine analyses along all three dimensions: temporal evolution of retweet networks and communities, contents in terms of hate speech, and discussion topics. We apply the methods to a comprehensive set of all Slovenian tweets collected in the years 2018-2020. We find that politics and ideology are the prevailing topics despite the emergence of the Covid-19 pandemic. These two topics also attract the highest proportion of unacceptable tweets. Through time, the membership of retweet communities changes, but their topic distribution remains remarkably stable. Some retweet communities are strongly linked by external retweet influence and form super-communities. The super-community membership closely corresponds to the topic distribution: communities from the same super-community are very similar by the topic distribution, and communities from different super-communities are quite different in terms of discussion topics. However, we also find that even communities from the same super-community differ considerably in the proportion of unacceptable tweets they post.

3.
Sci Rep ; 11(1): 22083, 2021 Nov 11.
Article in English | MEDLINE | ID: mdl-34764344

ABSTRACT

Online debates are often characterised by extreme polarisation and heated discussions among users. The presence of hate speech online is becoming increasingly problematic, making necessary the development of appropriate countermeasures. In this work, we perform hate speech detection on a corpus of more than one million comments on YouTube videos through a machine learning model, trained and fine-tuned on a large set of hand-annotated data. Our analysis shows that there is no evidence of the presence of "pure haters", meant as active users posting exclusively hateful comments. Moreover, coherently with the echo chamber hypothesis, we find that users skewed towards one of the two categories of video channels (questionable, reliable) are more prone to use inappropriate, violent, or hateful language within their opponents' community. Interestingly, users loyal to reliable sources use on average a more toxic language than their counterpart. Finally, we find that the overall toxicity of the discussion increases with its length, measured both in terms of the number of comments and time. Our results show that, coherently with Godwin's law, online debates tend to degenerate towards increasingly toxic exchanges of views.

4.
PLoS One ; 16(9): e0256175, 2021.
Article in English | MEDLINE | ID: mdl-34469456

ABSTRACT

Communities in social networks often reflect close social ties between their members and their evolution through time. We propose an approach that tracks two aspects of community evolution in retweet networks: flow of the members in, out and between the communities, and their influence. We start with high resolution time windows, and then select several timepoints which exhibit large differences between the communities. For community detection, we propose a two-stage approach. In the first stage, we apply an enhanced Louvain algorithm, called Ensemble Louvain, to find stable communities. In the second stage, we form influence links between these communities, and identify linked super-communities. For the detected communities, we compute internal and external influence, and for individual users, the retweet h-index influence. We apply the proposed approach to three years of Twitter data of all Slovenian tweets. The analysis shows that the Slovenian tweetosphere is dominated by politics, that the left-leaning communities are larger, but that the right-leaning communities and users exhibit significantly higher impact. An interesting observation is that retweet networks change relatively gradually, despite such events as the emergence of the Covid-19 pandemic or the change of government.


Subject(s)
COVID-19/epidemiology, Online Social Networking, Pandemics, SARS-CoV-2, Social Media, Humans
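Entry 4 ranks individual users by their retweet h-index. A minimal sketch, assuming the standard Hirsch definition applied to the per-tweet retweet counts of a single user (the function name `retweet_h_index` is hypothetical, not taken from the paper):

```python
from typing import Iterable


def retweet_h_index(retweet_counts: Iterable[int]) -> int:
    """Largest h such that at least h of the user's tweets were each retweeted at least h times."""
    # Sort the per-tweet retweet counts in descending order.
    counts = sorted(retweet_counts, reverse=True)
    h = 0
    for rank, count in enumerate(counts, start=1):
        # The tweet at position `rank` needs at least `rank` retweets to raise h.
        if count >= rank:
            h = rank
        else:
            break
    return h


if __name__ == "__main__":
    print(retweet_h_index([10, 8, 5, 4, 3]))  # 4
```

The same computation underlies the Hirsch-index ranking of Twitter users in entry 8; only the per-user retweet counts need to be collected.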
5.
PLoS One ; 13(3): e0194317, 2018.
Article in English | MEDLINE | ID: mdl-29534112

ABSTRACT

Social media are becoming an increasingly important source of information about the public mood regarding issues such as elections, Brexit, stock market, etc. In this paper we focus on sentiment classification of Twitter data. Construction of sentiment classifiers is a standard text mining task, but here we address the question of how to properly evaluate them as there is no settled way to do so. Sentiment classes are ordered and unbalanced, and Twitter produces a stream of time-ordered data. The problem we address concerns the procedures used to obtain reliable estimates of performance measures, and whether the temporal ordering of the training and test data matters. We collected a large set of 1.5 million tweets in 13 European languages. We created 138 sentiment models and out-of-sample datasets, which are used as a gold standard for evaluations. The corresponding 138 in-sample datasets are used to empirically compare six different estimation procedures: three variants of cross-validation, and three variants of sequential validation (where test set always follows the training set). We find no significant difference between the best cross-validation and sequential validation. However, we observe that all cross-validation variants tend to overestimate the performance, while the sequential methods tend to underestimate it. Standard cross-validation with random selection of examples is significantly worse than the blocked cross-validation, and should not be used to evaluate classifiers in time-ordered data scenarios.


Subject(s)
Data Interpretation, Statistical, Data Mining/methods, Public Opinion, Social Media, Humans, Machine Learning, Time Factors
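Entry 5 contrasts blocked cross-validation with sequential validation on time-ordered tweets. The sketch below only generates train/test index splits for `n` time-ordered examples and `k` blocks; it does not reproduce the paper's six specific variants, and all function names are hypothetical:

```python
from typing import List, Tuple

# A split is (training indices, test indices) into a time-ordered dataset.
Split = Tuple[List[int], List[int]]


def _block_bounds(n: int, k: int) -> List[int]:
    # Boundaries of k contiguous, nearly equal blocks covering indices 0..n-1.
    return [i * n // k for i in range(k + 1)]


def blocked_cv(n: int, k: int) -> List[Split]:
    """k-fold cross-validation on contiguous blocks, without shuffling."""
    b = _block_bounds(n, k)
    splits = []
    for i in range(k):
        test = list(range(b[i], b[i + 1]))
        # Training blocks may lie both before and after the test block.
        train = list(range(0, b[i])) + list(range(b[i + 1], n))
        splits.append((train, test))
    return splits


def sequential_validation(n: int, k: int) -> List[Split]:
    """Expanding window: the test block always follows all of the training data."""
    b = _block_bounds(n, k)
    return [(list(range(0, b[i])), list(range(b[i], b[i + 1]))) for i in range(1, k)]


if __name__ == "__main__":
    for train, test in sequential_validation(10, 5):
        print(train, test)
```

Standard (random) cross-validation would shuffle the indices before forming folds, mixing earlier and later tweets in both sets, which is the setting the abstract advises against for time-ordered data.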
6.
Appl Netw Sci ; 3(1): 40, 2018.
Article in English | MEDLINE | ID: mdl-30839812

ABSTRACT

The 2008 financial crisis unveiled the intrinsic failures of the financial system as we know it. As a consequence, impact investing started to receive increasing attention, as evidenced by the high market growth rates. The goal of impact investment is to generate social and environmental impact alongside a financial return. In this paper we identify the main players in the sector and how they interact and communicate with each other. We use Twitter as a proxy of the impact investing market, and analyze relevant tweets posted over a period of ten months. We apply network, contents and sentiment analysis on the acquired dataset. Our study shows that Twitter users exhibit favourable leaning (predominantly neutral or positive) towards impact investing. Retweet communities are decentralised and include users from a variety of sectors. Despite some basic common vocabulary used by all retweet communities identified, the vocabulary and the topics discussed by each community vary largely. We note that an additional effort should be made in raising awareness about the sector, especially by policymakers and media outlets. The role of investors and the academia is also discussed, as well as the emergence of hybrid business models within the sector and its connections to the tech industry. This paper extends our previous study, one of the first analyses of Twitter activities in the impact investing market.

7.
Appl Netw Sci ; 3(1): 44, 2018.
Article in English | MEDLINE | ID: mdl-30839819

ABSTRACT

Creating a map of actors and their leanings is important for policy makers and stakeholders in the European Commission's 'Better Regulation Agenda'. We explore publicly available information about the European lobby organizations from the Transparency Register, and from the open public consultations in the area of Banking and Finance. We consider three complementary types of information about lobbying organizations: (i) their formal categorization in the Transparency Register, (ii) their responses to the public consultations, and (iii) their self-declared goals and activities. We consider responses to the consultations as the most relevant indicator of the actual leaning of an individual lobbyist. We partition and cluster the organizations according to their demonstrated interests and the similarities among their responses. Thus each lobby organization is assigned a profile which shows its prevailing interest in consultations' topics, similar organizations in interests and responses, and a prototypical question and answer. We combine methods from network analysis, clustering, and text mining to obtain these profiles. Due to the non-homogeneous consultations, we find that it is crucial to first construct a response network based on interests in consultations topics, and only then proceed with more detailed analysis of the actual answers to consultations. The results provide a first step in the understanding of how lobby organizations engage in the policy making process.

8.
Comput Soc Netw ; 4(1): 6, 2017.
Article in English | MEDLINE | ID: mdl-29266132

ABSTRACT

Social media are an important source of information about the political issues, reflecting, as well as influencing, public mood. We present an analysis of Twitter data, collected over 6 weeks before the Brexit referendum, held in the UK in June 2016. We address two questions: what is the relation between the Twitter mood and the referendum outcome, and who were the most influential Twitter users in the pro- and contra-Brexit camps? First, we construct a stance classification model by machine learning methods, and are then able to predict the stance of about one million UK-based Twitter users. The demography of Twitter users is, however, very different from the demography of the voters. By applying a simple age-adjusted mapping to the overall Twitter stance, the results show the prevalence of the pro-Brexit voters, something unexpected by most of the opinion polls. Second, we apply the Hirsch index to estimate the influence, and rank the Twitter users from both camps. We find that the most productive Twitter users are not the most influential, that the pro-Brexit camp was four times more influential, and had considerably larger impact on the campaign than the opponents. Third, we find that the top pro-Brexit communities are considerably more polarized than the contra-Brexit camp. These results show that social media provide a rich resource of data to be exploited, but accumulated knowledge and lessons learned from the opinion polls have to be adapted to the new data sources.

9.
PLoS One ; 12(2): e0173151, 2017.
Article in English | MEDLINE | ID: mdl-28235103

ABSTRACT

We investigate the relationship between social media, Twitter in particular, and stock market. We provide an in-depth analysis of the Twitter volume and sentiment about the 30 companies in the Dow Jones Industrial Average index, over a period of three years. We focus on Earnings Announcements and show that there is a considerable difference with respect to when the announcements are made: before the market opens or after the market closes. The two different timings of the Earnings Announcements were already investigated in the financial literature, but not yet in the social media. We analyze the differences in terms of the Twitter volumes, cumulative abnormal returns, trade returns, and earnings surprises. We report mixed results. On the one hand, we show that the Twitter sentiment (the collective opinion of the users) on the day of the announcement very well reflects the stock moves on the same day. We demonstrate this by applying the event study methodology, where the polarity of the Earnings Announcements is computed from the Twitter sentiment. Cumulative abnormal returns are high (2-4%) and statistically significant. On the other hand, we find only weak predictive power of the Twitter sentiment one day in advance. It turns out that it is important how to account for the announcements made after the market closes. These after-hours announcements draw high Twitter activity immediately, but volume and price changes in trading are observed only on the next day. On the day before the announcements, the Twitter volume is low, and the sentiment has very weak predictive power. A useful lesson learned is the importance of the proper alignment between the announcements, trading and Twitter data.


Subject(s)
Income/statistics & numerical data, Investments/statistics & numerical data, Social Media, Emotions, Forecasting, Humans
10.
PLoS One ; 11(11): e0166586, 2016.
Article in English | MEDLINE | ID: mdl-27835683

ABSTRACT

We study the cohesion within and the coalitions between political groups in the Eighth European Parliament (2014-2019) by analyzing two entirely different aspects of the behavior of the Members of the European Parliament (MEPs) in the policy-making processes. On one hand, we analyze their co-voting patterns and, on the other, their retweeting behavior. We make use of two diverse datasets in the analysis. The first one is the roll-call vote dataset, where cohesion is regarded as the tendency to co-vote within a group, and a coalition is formed when the members of several groups exhibit a high degree of co-voting agreement on a subject. The second dataset comes from Twitter; it captures the retweeting (i.e., endorsing) behavior of the MEPs and implies cohesion (retweets within the same group) and coalitions (retweets between groups) from a completely different perspective. We employ two different methodologies to analyze the cohesion and coalitions. The first one is based on Krippendorff's Alpha reliability, used to measure the agreement between raters in data-analysis scenarios, and the second one is based on Exponential Random Graph Models, often used in social-network analysis. We give general insights into the cohesion of political groups in the European Parliament, explore whether coalitions are formed in the same way for different policy areas, and examine to what degree the retweeting behavior of MEPs corresponds to their co-voting patterns. A novel and interesting aspect of our work is the relationship between the co-voting and retweeting patterns.


Subject(s)
European Union/organization & administration, Politics, Social Media/statistics & numerical data, Cooperative Behavior, Datasets as Topic, Humans, Policy Making, Political Systems
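Entry 10 measures co-voting cohesion with Krippendorff's Alpha. A minimal sketch for nominal data, assuming each unit (e.g., one roll-call vote) is given as the list of values (e.g., "yes", "no", "abstain") assigned by the raters who coded it, with missing values simply omitted. It uses the coincidence-matrix form alpha = 1 - D_o/D_e and is not the authors' code; the function name is hypothetical:

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Sequence


def krippendorff_alpha_nominal(units: Sequence[Sequence[Hashable]]) -> float:
    """Krippendorff's Alpha for nominal values, via the coincidence matrix."""
    o = Counter()  # o[(c, k)]: weighted count of value pairs (c, k) within units
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a unit coded by fewer than two raters is not pairable
        for a, b in permutations(range(m), 2):
            # Each ordered pair of values from different raters has weight 1/(m - 1).
            o[(values[a], values[b])] += 1.0 / (m - 1)
    n_c = Counter()  # marginal totals per value
    for (c, _k), w in o.items():
        n_c[c] += w
    n = sum(n_c.values())
    observed = sum(w for (c, k), w in o.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k)
    if expected == 0:
        raise ValueError("alpha is undefined when only one value occurs")
    # D_o = observed / n and D_e = expected / (n * (n - 1)).
    return 1.0 - (n - 1) * observed / expected
```

The same measure applies to the self- and inter-annotator agreement discussed in entry 11, with tweets as units and annotators as raters.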
11.
PLoS One ; 11(5): e0155036, 2016.
Article in English | MEDLINE | ID: mdl-27149621

ABSTRACT

What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of training data than on the type of the model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements since this improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered.


Subject(s)
Internet, Multilingualism, Humans
12.
Appl Netw Sci ; 1(1): 2, 2016.
Article in English | MEDLINE | ID: mdl-30533494

ABSTRACT

Analyzing information from social media to uncover underlying real-world phenomena is becoming widespread. The goal of this paper is to evaluate the role of Twitter in identifying communities of influence when the 'ground truth' is known. We consider the European Parliament (EP) Twitter users during a period of one year, in which they posted over 560,000 tweets. We represent the influence on Twitter by the number of retweets users get. We construct two networks of influence: (i) core, where both users are the EP members, and (ii) extended, where one user can be outside the EP. We compare the detected communities in both networks to the 'ground truth': the political group, country, and language of the EP members. The results show that the core network closely matches the political groups, while the extended network best reflects the country of origin. This provides empirical evidence that the formation of retweet networks and community detection are appropriate tools to reveal real-world relationships, and can be used to uncover hidden properties when the 'ground truth' is not known.

13.
PLoS One ; 10(12): e0144296, 2015.
Article in English | MEDLINE | ID: mdl-26641093

ABSTRACT

There is a new generation of emoticons, called emojis, that is increasingly being used in mobile communications and social media. In the past two years, over ten billion emojis were used on Twitter. Emojis are Unicode graphic symbols, used as a shorthand to express concepts and ideas. In contrast to the small number of well-known emoticons that carry clear emotional contents, there are hundreds of emojis. But what are their emotional contents? We provide the first emoji sentiment lexicon, called the Emoji Sentiment Ranking, and draw a sentiment map of the 751 most frequently used emojis. The sentiment of the emojis is computed from the sentiment of the tweets in which they occur. We engaged 83 human annotators to label over 1.6 million tweets in 13 European languages by the sentiment polarity (negative, neutral, or positive). About 4% of the annotated tweets contain emojis. The sentiment analysis of the emojis allows us to draw several interesting conclusions. It turns out that most of the emojis are positive, especially the most popular ones. The sentiment distribution of the tweets with and without emojis is significantly different. The inter-annotator agreement on the tweets with emojis is higher. Emojis tend to occur at the end of the tweets, and their sentiment polarity increases with the distance. We observe no significant differences in the emoji rankings between the 13 languages and the Emoji Sentiment Ranking. Consequently, we propose our Emoji Sentiment Ranking as a European language-independent resource for automated sentiment analysis. Finally, the paper provides a formalization of sentiment and a novel visualization in the form of a sentiment bar.


Subject(s)
Emotions, Social Media, Europe, Humans, Internet, Terminology as Topic
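Entry 13 computes each emoji's sentiment from the sentiment labels of the tweets in which it occurs. A minimal sketch, assuming the score is the mean of a discrete distribution over the polarities -1, 0 and +1, estimated with Laplace (add-one) smoothing; the paper's exact formalization may differ, and `sentiment_score` is a hypothetical name:

```python
from typing import Dict

LABELS = ("negative", "neutral", "positive")


def sentiment_score(counts: Dict[str, int]) -> float:
    """Mean polarity (-1, 0, +1) of an emoji, from the labels of the tweets it occurs in."""
    n = sum(counts.get(c, 0) for c in LABELS)
    k = len(LABELS)
    # Laplace (add-one) smoothing keeps rarely used emojis away from the extremes.
    p = {c: (counts.get(c, 0) + 1) / (n + k) for c in LABELS}
    # Mean of the distribution: -1 * p(negative) + 0 * p(neutral) + 1 * p(positive).
    return p["positive"] - p["negative"]
```

For example, an emoji seen only in 7 positive tweets scores (7 + 1)/10 - 1/10 = 0.7 rather than 1.0, reflecting the small sample.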
14.
PLoS One ; 10(9): e0138740, 2015.
Article in English | MEDLINE | ID: mdl-26422473

ABSTRACT

According to the World Economic Forum, the diffusion of unsubstantiated rumors on online social media is one of the main threats for our society. The disintermediated paradigm of content production and consumption on online social media might foster the formation of homogeneous communities (echo-chambers) around specific worldviews. Such a scenario has been shown to be a vivid environment for the diffusion of false claim. Not rarely, viral phenomena trigger naive (and funny) social responses-e.g., the recent case of Jade Helm 15 where a simple military exercise turned out to be perceived as the beginning of the civil war in the US. In this work, we address the emotional dynamics of collective debates around distinct kinds of information-i.e., science and conspiracy news-and inside and across their respective polarized communities. We find that for both kinds of content the longer the discussion the more the negativity of the sentiment. We show that comments on conspiracy posts tend to be more negative than on science posts. However, the more the engagement of users, the more they tend to negative commenting (both on science and conspiracy). Finally, zooming in at the interaction among polarized communities, we find a general negative pattern. As the number of comments increases-i.e., the discussion becomes longer-the sentiment of the post is more and more negative.


Subject(s)
Communication, Emotions, Female, Humans, Male
15.
PLoS One ; 10(9): e0138441, 2015.
Article in English | MEDLINE | ID: mdl-26390434

ABSTRACT

Social media are increasingly reflecting and influencing behavior of other complex systems. In this paper we investigate the relations between a well-known micro-blogging platform Twitter and financial markets. In particular, we consider, in a period of 15 months, the Twitter volume and sentiment about the 30 stock companies that form the Dow Jones Industrial Average (DJIA) index. We find a relatively low Pearson correlation and Granger causality between the corresponding time series over the entire time period. However, we find a significant dependence between the Twitter sentiment and abnormal returns during the peaks of Twitter volume. This is valid not only for the expected Twitter volume peaks (e.g., quarterly announcements), but also for peaks corresponding to less obvious events. We formalize the procedure by adapting the well-known "event study" from economics and finance to the analysis of Twitter data. The procedure allows to automatically identify events as Twitter volume peaks, to compute the prevailing sentiment (positive or negative) expressed in tweets at these peaks, and finally to apply the "event study" methodology to relate them to stock returns. We show that sentiment polarity of Twitter peaks implies the direction of cumulative abnormal returns. The amount of cumulative abnormal returns is relatively low (about 1-2%), but the dependence is statistically significant for several days after the events.


Subject(s)
Blogging, Financial Management, Internet, Social Media, Social Support, Commerce, Humans
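Entry 15 adapts the "event study" from economics and finance to Twitter volume peaks. A minimal sketch of the cumulative abnormal return (CAR), assuming the common market model (stock return = alpha + beta * market return, fitted by ordinary least squares on an estimation window before the event). Detecting the peaks and their sentiment polarity is out of scope, and the function names are hypothetical:

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_market_model(stock: Sequence[float], market: Sequence[float]) -> Tuple[float, float]:
    """OLS fit of R_stock = alpha + beta * R_market over an estimation window."""
    mx, my = mean(market), mean(stock)
    cov = sum((x - mx) * (y - my) for x, y in zip(market, stock))
    var = sum((x - mx) ** 2 for x in market)
    beta = cov / var
    return my - beta * mx, beta


def cumulative_abnormal_return(stock: Sequence[float], market: Sequence[float],
                               alpha: float, beta: float) -> float:
    """Sum over the event window of AR_t = R_t - (alpha + beta * R_m,t)."""
    return sum(r - (alpha + beta * rm) for r, rm in zip(stock, market))
```

In the setting of the abstract, the sign of the CAR after a peak would then be compared with the prevailing sentiment polarity of the tweets at that peak.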
16.
PLoS One ; 10(7): e0131184, 2015.
Article in English | MEDLINE | ID: mdl-26161795

ABSTRACT

Large-scale data from social media have a significant potential to describe complex phenomena in the real world and to anticipate collective behaviors such as information spreading and social trends. One specific case of study is represented by the collective attention to the action of political parties. Not surprisingly, researchers and stakeholders tried to correlate parties' presence on social media with their performances in elections. Despite the many efforts, results are still inconclusive since this kind of data is often very noisy and significant signals could be covered by (largely unknown) statistical fluctuations. In this paper we consider the number of tweets (tweet volume) of a party as a proxy of collective attention to the party, identify the dynamics of the volume, and show that this quantity has some information on the election outcome. We find that the distribution of the tweet volume for each party follows a log-normal distribution with a positive autocorrelation of the volume over short terms, which indicates the volume has large fluctuations of the log-normal distribution yet with a short-term tendency. Furthermore, by measuring the ratio of two consecutive daily tweet volumes, we find that the evolution of the daily volume of a party can be described by means of a geometric Brownian motion (i.e., the logarithm of the volume moves randomly with a trend). Finally, we determine the optimal period of averaging tweet volume for reducing fluctuations and extracting short-term tendencies. We conclude that the tweet volume is a good indicator of parties' success in the elections when considered over an optimal time window. Our study identifies the statistical nature of collective attention to political issues and sheds light on how to model the dynamics of collective attention in social media.


Subject(s)
Attention, Internet/statistics & numerical data, Politics, Social Media/statistics & numerical data, Algorithms, Information Dissemination/methods, Interpersonal Relations, Models, Theoretical
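Entry 16 models the daily tweet volume of a party as a geometric Brownian motion, based on the ratio of consecutive daily volumes. A minimal sketch of the corresponding parameter estimates, assuming a time step of one day and the standard parametrization in which the log-ratios are i.i.d. normal with mean mu - sigma^2/2 and variance sigma^2 (`fit_gbm` is a hypothetical name):

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def fit_gbm(volumes: Sequence[float]) -> Tuple[float, float]:
    """Estimate GBM drift mu and volatility sigma (per day) from a daily volume series."""
    # Logarithm of the ratio of consecutive daily volumes.
    log_ratios = [math.log(b / a) for a, b in zip(volumes, volumes[1:])]
    m = mean(log_ratios)
    s2 = variance(log_ratios)  # sample variance; needs at least two ratios
    # Under GBM: mean(log-ratio) = mu - sigma^2 / 2 and var(log-ratio) = sigma^2.
    return m + s2 / 2, math.sqrt(s2)
```

Volumes must be strictly positive; days with zero tweets would need separate handling before taking logarithms.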
17.
PLoS One ; 9(12): e99515, 2014.
Article in English | MEDLINE | ID: mdl-25470498

ABSTRACT

A stream of unstructured news can be a valuable source of hidden relations between different entities, such as financial institutions, countries, or persons. We present an approach to continuously collect online news, recognize relevant entities in them, and extract time-varying networks. The nodes of the network are the entities, and the links are their co-occurrences. We present a method to estimate the significance of co-occurrences, and a benchmark model against which their robustness is evaluated. The approach is applied to a large set of financial news, collected over a period of two years. The entities we consider are 50 countries which issue sovereign bonds, and which are insured by Credit Default Swaps (CDS) in turn. We compare the country co-occurrence networks to the CDS networks constructed from the correlations between the CDS. The results show relatively small, but significant overlap between the networks extracted from the news and those from the CDS correlations.


Subject(s)
Algorithms, Computer Communication Networks, Humans, Models, Theoretical, Online Systems
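Entry 17 builds networks whose nodes are entities and whose links are co-occurrences in news. A minimal sketch of the counting step only, assuming each article has already been reduced to the set of entities (e.g., countries) recognized in it; the significance estimate and benchmark model from the abstract are not reproduced, and the function name is hypothetical:

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set


def cooccurrence_network(docs: Iterable[Set[str]]) -> Counter:
    """Undirected weighted edges: the number of documents mentioning both entities."""
    edges = Counter()
    for entities in docs:
        # Sorting gives each unordered pair a single canonical key (a, b) with a < b.
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return edges
```

A time-varying network follows by applying the function separately to the articles of each time window.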
18.
Sci Rep ; 4: 5038, 2014 May 22.
Article in English | MEDLINE | ID: mdl-24849598

ABSTRACT

Motivated by recent financial crises, significant research efforts have been put into studying contagion effects and herding behaviour in financial markets. Much less has been said regarding the influence of financial news on financial markets. We propose a novel measure of collective behaviour based on financial news on the Web, the News Cohesiveness Index (NCI), and we demonstrate that the index can be used as a financial market volatility indicator. We evaluate the NCI using financial documents from large Web news sources on a daily basis from October 2011 to July 2013 and analyse the interplay between financial markets and finance-related news. We hypothesise that strong cohesion in financial news reflects movements in the financial markets. Our results indicate that cohesiveness in financial news is highly correlated with and driven by volatility in financial markets.

19.
Int J Comput Biol Drug Des ; 7(1): 61-79, 2014.
Article in English | MEDLINE | ID: mdl-24429503

ABSTRACT

Biologists have been investigating plant defence response to virus infections; however, a comprehensive mathematical model of this complex process has not been developed. One obstacle in developing a dynamic model, useful for simulation, is the lack of kinetic data from which the model parameters could be determined. We address this problem by proposing a methodology for iterative improvement of the model parameters until the simulation results come close to the expectation of biology experts. These expectations are formalised in the form of constraints to be satisfied by the model simulations. In three iterative steps the model converged to satisfy the biology experts. There are two results of our approach: individual simulations and optimised model parameters, which provide a deeper insight into the biological system. Our constraint-driven optimisation approach allows for an efficient exploration of the dynamic behaviour of biological models and, at the same time, increases their reliability.

20.
Nucleic Acids Res ; 42(Database issue): D1167-75, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24194592

ABSTRACT

GoMapMan (http://www.gomapman.org) is an open web-accessible resource for gene functional annotations in the plant sciences. It was developed to facilitate improvement, consolidation and visualization of gene annotations across several plant species. GoMapMan is based on the MapMan ontology, organized in the form of a hierarchical tree of biological concepts, which describe gene functions. Currently, genes of the model species Arabidopsis and three crop species (potato, tomato and rice) are included. The main features of GoMapMan are (i) dynamic and interactive gene product annotation through various curation options; (ii) consolidation of gene annotations for different plant species through the integration of orthologue group information; (iii) traceability of gene ontology changes and annotations; (iv) integration of external knowledge about genes from different public resources; and (v) providing gathered information to high-throughput analysis tools via dynamically generated export files. All of the GoMapMan functionalities are openly available, with the restriction on the curation functions, which require prior registration to ensure traceability of the implemented changes.


Subject(s)
Databases, Genetic, Gene Ontology, Genes, Plant, Molecular Sequence Annotation, Computer Graphics, Internet, Plant Proteins/genetics, Systems Integration