ABSTRACT
Online misogyny has become a fixture in female politicians' lives. Backlash theory suggests that it may represent a threat response prompted by female politicians' counterstereotypical, power-seeking behaviors. We investigated this hypothesis by analyzing Twitter references to Hillary Clinton before, during, and after her presidential campaign. We collected a corpus of over 9 million tweets from 2014 to 2018 that referred to Hillary Clinton, and employed an interrupted time series analysis on the relative frequency of misogynistic language within the corpus. Prior to 2015, the level of misogyny associated with Clinton decreased over time, but this trend reversed when she announced her presidential campaign. During the campaign, misogyny steadily increased and only plateaued after the election, when the threat of her electoral success had subsided. These findings are consistent with the notion that online misogyny towards female political nominees is a form of backlash prompted by their ambition for power in the political arena.
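An interrupted time series design like the one above is commonly fit as a segmented regression: a baseline level and slope, plus a level change and a slope change at the interruption. The sketch below fits that model by ordinary least squares in pure Python. It is a minimal sketch under stated assumptions, not the authors' model: `y` stands in for the relative frequency of misogynistic language per time bin, `t0` for the bin of the campaign announcement, and the function names are hypothetical. A real analysis would also need to handle autocorrelation and seasonality, which plain OLS ignores.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def segmented_regression(y: Sequence[float], t0: int) -> List[float]:
    """Fit y_t = b0 + b1*t + b2*D_t + b3*(t - t0)*D_t by ordinary least squares.

    D_t is 1 from the interruption index t0 onward (hypothetically, the campaign
    announcement). Returns [b0, b1, b2, b3]: baseline level, pre-interruption
    slope, immediate level change, and change in slope after the interruption.
    """
    rows = []
    for t in range(len(y)):
        d = 1.0 if t >= t0 else 0.0
        rows.append([1.0, float(t), d, (t - t0) * d])
    k = 4
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)
```

On a synthetic series whose slope turns from -0.5 to +0.3 at `t0 = 5` (with a jump of 1), the fitted coefficients recover a level change of 1 and a slope change of 0.8; a reversal of a declining trend like the one described above would show up as a positive `b3`.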
Subject(s)
Social Media, Humans, Female, Politics, Language, Administrative Personnel, Interrupted Time Series Analysis
ABSTRACT
The murder of George Floyd by police in May 2020 sparked international protests and brought unparalleled levels of attention to the Black Lives Matter movement. As we show, his death set record levels of activity and amplification on Twitter, prompted the saddest day in the platform's history, and caused his name to appear among the ten most frequently used phrases on a single day; he is the only individual to have ever received that level of attention who was not publicly known earlier that same week. Importantly, we find that the Black Lives Matter movement's rhetorical strategy of connecting and repeating the names of past Black victims of police violence, which foregrounds racial injustice as an ongoing pattern rather than a singular event, was exceptionally effective following George Floyd's death: the attention given to him extended to over 185 prior Black victims, more than at other past moments in the movement's history. We contextualize this rising tide of attention within 12 years of racial justice activism on Twitter, demonstrating how activists and allies have used attention and amplification as a recurring tactic to lift and memorialize the names of Black victims of police violence. Our results show how the Black Lives Matter movement uses social media to center past instances of police violence at an unprecedented scale and speed, while still advancing the racial justice movement's longstanding goal to "say their names."
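The "saddest day" claim rests on a daily sentiment time series. Assuming a dictionary-based happiness instrument (something in the spirit of the Hedonometer, which we have not confirmed is the instrument used here), a day's score can be sketched as the frequency-weighted mean of the scores of lexicon words appearing that day. The lexicon values below are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def average_happiness(tokens: Iterable[str], scores: Dict[str, float]) -> float:
    """Frequency-weighted mean score of the tokens that appear in the lexicon.

    Tokens missing from the lexicon are ignored; raises ValueError if none match.
    """
    counts = Counter(t for t in tokens if t in scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no lexicon words found")
    return sum(scores[w] * n for w, n in counts.items()) / total


def saddest_day(days: Dict[str, List[str]], scores: Dict[str, float]) -> str:
    """Label of the day whose token stream has the lowest average score."""
    return min(days, key=lambda d: average_happiness(days[d], scores))
```

Because the score is an average, a record-low day can arise either from more sad words or from fewer happy ones, which is why lexicon-based tools usually pair the time series with a per-word breakdown.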
Subject(s)
Black or African American, Police, Humans, Male, Racial Groups, Violence
ABSTRACT
BACKGROUND: Radiomic features, defined as quantitative features extracted from images, provide a non-invasive means of distinguishing malignant from benign pulmonary nodules. In this study, we evaluate how consistently perinodular radiomic features extracted from low-dose computed tomography images identify malignant pulmonary nodules. MATERIALS AND METHODS: Using data from the National Lung Screening Trial (NLST), we selected individuals with pulmonary nodules between 4 mm and 20 mm in diameter. Nodules were segmented to generate four distinct datasets: 1) a Tumor dataset containing tumor-specific features, 2) a 10mm Band dataset containing parenchymal features between the segmented nodule boundary and 10 mm out from the boundary, 3) a 15mm Band dataset, defined analogously with a 15 mm band, and 4) a Tumor Size dataset containing the maximum nodule diameter. Models to predict malignancy were constructed using support vector machine (SVM), random forest (RF), and least absolute shrinkage and selection operator (LASSO) approaches. Ten-fold cross-validation with 10 repetitions was used to evaluate the performance of each approach on each dataset. RESULTS: Using RF, the Tumor, 10mm Band, and 15mm Band datasets achieved areas under the receiver operating characteristic curve (AUC) of 84.44%, 84.09%, and 81.57%, respectively. Performance differed significantly between the Tumor and 15mm Band datasets (adjusted p-value < 0.001). However, when tumor-specific features were combined with perinodular features, the 10mm Band + Tumor and 15mm Band + Tumor datasets (AUC 87.87% and 86.75%, respectively) performed significantly better than the Tumor Size dataset (66.76%) or the Tumor dataset. Similarly, for the 10mm Band + Tumor dataset, the SVM and LASSO achieved AUCs of 84.71% and 88.91%, respectively. CONCLUSIONS: Across all methodologies, the combined 10mm Band + Tumor dataset improved the differentiation between benign and malignant lung nodules compared with the Tumor dataset.
This demonstrates that parenchymal features capture novel diagnostic information beyond that present in the nodule itself. (data agreement: NLST-163).
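The evaluation protocol above can be sketched in two pieces: AUC computed as the Mann-Whitney probability that a malignant nodule (label 1) outscores a benign one (label 0), and index splits for ten-fold cross-validation repeated with fresh shuffles. This is a stdlib-only illustration with hypothetical function names; the study's actual tooling is not stated, the splits here are not stratified by class, and no model fitting is shown.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen positive (label 1) scores
    higher than a randomly chosen negative (label 0); ties count as 1/2.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def repeated_kfold(
    n: int, k: int = 10, repeats: int = 10, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """(train, test) index splits for k-fold CV, reshuffled for each repetition."""
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k nearly equal, disjoint folds
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            splits.append((train, sorted(test)))
    return splits
```

With 10 folds and 10 repetitions this yields 100 train/test splits; a reported AUC would then be an average over the held-out folds or computed on pooled predictions, a choice the abstract does not specify.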
Subject(s)
Adenocarcinoma of Lung, Lung Neoplasms, Multiple Pulmonary Nodules, Humans, Lung Neoplasms/diagnostic imaging, Lung Neoplasms/pathology, Lung/pathology, Adenocarcinoma of Lung/pathology, Multiple Pulmonary Nodules/pathology, Tomography, X-Ray Computed/methods, Retrospective Studies
ABSTRACT
BACKGROUND: Mental health challenges are thought to affect approximately 10% of the global population each year, and many of those affected go untreated because of stigma and limited access to services. As social media lowers the barrier to joining difficult conversations and finding supportive groups, Twitter offers an open source of language data describing the changing experience of a stigmatized group. OBJECTIVE: By measuring changes in the conversation around mental health on Twitter, we aim to quantify the hypothesized increase in discussion and awareness of the topic, as well as the corresponding reduction in stigma around mental health. METHODS: We explored trends in words and phrases related to mental health through a collection of 1-, 2-, and 3-grams parsed from a data stream of approximately 10% of all English tweets from 2010 to 2021. We examined the temporal dynamics of mental health language and measured the positivity of the messages. Finally, we used the ratio of original tweets to retweets to quantify the fraction of appearances of mental health language that was due to social amplification. RESULTS: We found that the popularity of the phrase "mental health" increased by nearly two orders of magnitude between 2012 and 2018. Mentions of "mental health" spiked annually and reliably because of mental health awareness campaigns, as well as unpredictably in response to mass shootings, celebrities dying by suicide, and popular fictional television stories portraying suicide. The level of positivity of messages containing "mental health", while stable through the growth period, has declined recently. Finally, since 2015, mentions of "mental health" have been increasingly driven by retweets, suggesting that the stigma associated with discussing mental health on Twitter has diminished over time.
CONCLUSIONS: These results provide useful texture regarding the growing conversation around mental health on Twitter and suggest that more awareness and acceptance has been brought to the topic compared with past years.
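The amplification measure can be illustrated by counting how many occurrences of the 2-gram "mental health" come from retweets versus original tweets in each year. This sketch assumes each tweet arrives as a hypothetical `(year, is_retweet, text)` tuple and uses naive lowercase whitespace tokenization, which is much cruder than a real Twitter tokenizer. It reports the retweet share R / (R + O), which carries the same information as the original-to-retweet ratio O / R used in the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def ngrams(text: str, n: int) -> List[str]:
    """Lowercase, split on whitespace, and return the n-grams as space-joined strings."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def retweet_share_by_year(
    tweets: Iterable[Tuple[int, bool, str]], phrase: str
) -> Dict[int, float]:
    """Fraction of a phrase's occurrences that come from retweets, per year.

    The phrase's word count sets n (e.g. "mental health" is a 2-gram).
    """
    target = " ".join(phrase.lower().split())
    n = len(target.split())
    from_retweets: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for year, is_retweet, text in tweets:
        hits = ngrams(text, n).count(target)
        total[year] += hits
        if is_retweet:
            from_retweets[year] += hits
    return {y: from_retweets[y] / total[y] for y in sorted(total) if total[y] > 0}
```

A rising share over the years is what "increasingly driven by retweets" looks like in this representation.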
ABSTRACT
Measuring the specific kind, temporal ordering, diversity, and turnover rate of stories surrounding any given subject is essential to developing a complete reckoning of that subject's historical impact. Here, we use Twitter as a distributed news and opinion aggregation source to identify and track the dynamics of the dominant day-scale stories around Donald Trump, the 45th President of the United States. Working with a data set comprising around 20 billion 1-grams, we first compare each day's 1-gram and 2-gram usage frequencies with those of a year before to create day- and week-scale timelines of Trump stories for 2016-2021. We measure Trump's narrative control, the extent to which stories have been about Trump or put forward by Trump. We then quantify story turbulence and collective chronopathy, the rate at which a population's stories for a subject seem to change over time. We show that 2017 was the most turbulent year overall for Trump. In 2020, story generation slowed dramatically during the first two major waves of the COVID-19 pandemic; rapid turnover returned first with the Black Lives Matter protests following George Floyd's murder and later with events leading up to and following the 2020 US presidential election, including the storming of the US Capitol six days into 2021. Trump story turnover over 2 months during the COVID-19 pandemic was on par with that over 3 days in September 2017. Our methods may be applied to any well-discussed phenomenon and have the potential to enable the computational aspects of journalism, history, and biography.
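The year-over-year comparison and the idea of story turnover can be sketched as follows. This is a deliberately simplified stand-in, not the paper's method: stories are picked by an add-one-smoothed log ratio of today's relative frequency to that of the same day a year earlier (which may differ from the instrument used in the paper), and turnover is one minus the Jaccard similarity of consecutive days' story sets. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def top_stories(today: Dict[str, int], year_ago: Dict[str, int], k: int = 3) -> List[str]:
    """Today's n-grams ranked by add-one-smoothed log ratio of relative frequency
    versus the same day one year earlier (a hypothetical stand-in scoring rule)."""
    vocab = len(set(today) | set(year_ago))
    tot_now = sum(today.values()) + vocab
    tot_then = sum(year_ago.values()) + vocab

    def score(w: str) -> float:
        p_now = (today.get(w, 0) + 1) / tot_now
        p_then = (year_ago.get(w, 0) + 1) / tot_then
        return math.log(p_now / p_then)

    # Ties broken alphabetically so the output is deterministic.
    return sorted(today, key=lambda w: (-score(w), w))[:k]


def turnover(stories: Sequence[List[str]]) -> List[float]:
    """1 - Jaccard similarity between consecutive days' story sets (0 = no change)."""
    out = []
    for a, b in zip(stories, stories[1:]):
        sa, sb = set(a), set(b)
        union = sa | sb
        out.append(1 - len(sa & sb) / len(union) if union else 0.0)
    return out
```

Averaging such turnover values over a window gives one crude notion of how fast stories change; under this toy measure, two slow months and three fast days could indeed accumulate comparable total turnover.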
Subject(s)
Politics, COVID-19/epidemiology, COVID-19/pathology, COVID-19/virology, Humans, SARS-CoV-2/isolation & purification, United States
ABSTRACT
In real time, Twitter strongly imprints world events, popular culture, and the day-to-day, recording an ever-growing compendium of language change. Vitally, and absent from many standard corpora such as books and news archives, Twitter also encodes popularity and spreading through retweets. Here, we describe Storywrangler, an ongoing curation of over 100 billion tweets containing 1 trillion 1-grams from 2008 to 2021. For each day, we break tweets into 1-, 2-, and 3-grams across 100+ languages, generating frequencies for words, hashtags, handles, numerals, symbols, and emojis. We make the dataset available through an interactive time series viewer and as downloadable time series and daily distributions. Although Storywrangler leverages Twitter data, our method of tracking dynamic changes in n-grams can be extended to any temporally evolving corpus. Illustrating the instrument's potential, we present example use cases including social amplification, the sociotechnical dynamics of famous individuals, box office success, and social unrest.
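The per-day processing described above can be sketched as: count 1-, 2-, or 3-grams across a day's tweets, then convert the counts into relative frequencies and ranks. Assumptions: tokens are split on whitespace (the parsing behind Storywrangler distinguishes words, hashtags, handles, numerals, symbols, and emojis across 100+ languages, which this does not attempt), and tied counts share their average rank. Function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def ngram_counts(texts: Iterable[str], n: int) -> Counter:
    """Count whitespace-delimited n-grams (n = 1, 2, or 3) across one day's tweets."""
    counts: Counter = Counter()
    for text in texts:
        tokens = text.split()
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


def rank_table(counts: Counter) -> List[Tuple[str, int, float, float]]:
    """(ngram, count, relative frequency, rank) rows sorted by descending count.

    Tied counts share the average of the ranks they span.
    """
    total = sum(counts.values())
    items = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    rows = []
    i = 0
    while i < len(items):
        j = i
        while j < len(items) and items[j][1] == items[i][1]:
            j += 1
        rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        rows.extend((gram, c, c / total, rank) for gram, c in items[i:j])
        i = j
    return rows
```

Running this once per day and per n yields daily distributions; stacking one n-gram's rank or frequency across days gives a time series of the kind the viewer displays.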
ABSTRACT
We study collective attention paid to hurricanes through the lens of n-grams on Twitter, a social media platform with global reach. Using hurricane name mentions as a proxy for awareness, we find that the exogenous temporal dynamics are remarkably similar across storms, but that overall collective attention varies widely, even among storms causing comparable deaths and damage. We construct 'hurricane attention maps' and observe that hurricanes causing deaths on (or economic damage to) the continental United States generate substantially more attention in English-language tweets than those that do not. We find that a hurricane's Saffir-Simpson wind scale category assignment is strongly associated with the amount of attention it receives. Higher-category storms receive larger proportional increases in attention per proportional increase in deaths or dollars of damage than lower-category storms. Of the most damaging and deadliest storms of the 2010s, Hurricane Harvey generated the most attention and Hurricane Maria was remembered the longest. On average, a category 5 storm receives 4.6 times more attention than a category 1 storm causing the same number of deaths and amount of economic damage.
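"Proportional increase in attention per proportional increase in deaths" is an elasticity: the slope of log attention against log deaths. The sketch below fits that slope by least squares, separately for each category. It assumes strictly positive counts (storms with zero deaths would need a transform such as log(1 + x)) and at least two distinct death counts per category; the data are synthetic, the function names hypothetical, and it is not the paper's full model, which also involves economic damage.

```python
import math
from typing import Dict, List, Sequence, Tuple


def elasticity(x: Sequence[float], y: Sequence[float]) -> float:
    """OLS slope of log(y) on log(x): proportional change in y per proportional change in x."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var


def elasticity_by_category(storms: List[Tuple[int, float, float]]) -> Dict[int, float]:
    """Separate log-log slopes per category; each storm is (category, deaths, attention)."""
    groups: Dict[int, Tuple[List[float], List[float]]] = {}
    for cat, deaths, attention in storms:
        xs, ys = groups.setdefault(cat, ([], []))
        xs.append(deaths)
        ys.append(attention)
    return {cat: elasticity(xs, ys) for cat, (xs, ys) in sorted(groups.items())}
```

The finding that higher categories respond more steeply corresponds to a larger slope for category 5 than for category 1. Separately, under a model with a category-specific intercept on the natural-log scale, the reported 4.6-fold difference would correspond to an intercept gap of ln 4.6, about 1.53.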
Subject(s)
Cyclonic Storms/statistics & numerical data, Information Dissemination/methods, Natural Disasters, Social Media/statistics & numerical data, Humans, United States
ABSTRACT
The past decade has witnessed a marked increase in the use of social media by politicians, most notably exemplified by the 45th President of the United States (POTUS), Donald Trump. On Twitter, POTUS messages consistently attract high levels of engagement, as measured by likes, retweets, and replies. Here, we quantify the balance of these activities, also known as "ratios", and study their dynamics as a proxy for collective political engagement in response to presidential communications. We find that raw activity counts increase during the period leading up to the 2016 election, accompanied by a regime change in the ratio of retweets to replies connected to the transition from campaigning to governing. For the Trump account, we find that words related to fake news and the Mueller inquiry are more common in tweets with a high number of replies relative to retweets. Finally, we find that Barack Obama consistently received a higher retweet-to-reply ratio than Donald Trump. These results suggest that Trump's Twitter posts are more often controversial and subject to enduring engagement as a given news cycle unfolds.
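A minimal sketch of the ratio analysis, under our own assumptions: each tweet's retweet-to-reply balance is taken as log10((retweets + 1) / (replies + 1)), with +1 smoothing added here to avoid division by zero, and a "regime change" is located as the single split point that minimizes the squared error of a two-mean model. Neither choice is confirmed as the authors' method, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def log_ratios(retweets: Sequence[int], replies: Sequence[int]) -> List[float]:
    """log10((retweets + 1) / (replies + 1)) per tweet; the +1 avoids division by zero."""
    return [math.log10((rt + 1) / (rp + 1)) for rt, rp in zip(retweets, replies)]


def best_changepoint(series: Sequence[float]) -> Tuple[int, float, float]:
    """Split a series into two constant-mean segments minimizing squared error.

    Returns (index where the second regime starts, mean before, mean after).
    """
    best = None
    for k in range(1, len(series)):
        left, right = series[:k], series[k:]
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, k, ml, mr)
    if best is None:
        raise ValueError("need at least two points")
    return best[1], best[2], best[3]
```

Applied to a chronologically ordered series of per-tweet log ratios, a drop in the fitted mean after the split would indicate that replies gained on retweets, the pattern the abstract associates with controversial posts.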
Subject(s)
Communication, Politics, Social Media, Humans, United States
ABSTRACT
Working from a dataset of 118 billion messages running from the start of 2009 to the end of 2019, we identify and explore the relative daily use of over 150 languages on Twitter. We find that eight languages comprise 80% of all tweets, with English, Japanese, Spanish, Arabic, and Portuguese being the most dominant. To quantify social spreading in each language over time, we compute the 'contagion ratio': the balance of retweets to organic messages. We find that for the most common languages on Twitter there is a growing, though not universal, tendency to retweet rather than share new content. By the end of 2019, the contagion ratios for half of the top 30 languages, including English and Spanish, had risen above 1, the naive contagion threshold. In 2019, the top 5 languages with the highest average daily ratios were, in order, Thai (7.3), Hindi, Tamil, Urdu, and Catalan, while the bottom 5 were Russian, Swedish, Esperanto, Cebuano, and Finnish (0.26). Further, we show that over time, the contagion ratios for the most common languages are growing more strongly than those of rare languages.
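The contagion ratio defined above is simple to compute. The sketch below takes hypothetical per-day `(retweets, organic)` counts for a language, averages the daily ratios, and flags languages above the naive threshold of 1; the counts are toy values and the function names hypothetical. A mean of daily ratios generally differs from the ratio of summed counts; since the abstract reports average daily ratios, the sketch follows that.

```python
from typing import Dict, List, Tuple


def contagion_ratio(retweets: int, organic: int) -> float:
    """Retweets divided by organic (non-retweet) messages."""
    if organic <= 0:
        raise ValueError("organic count must be positive")
    return retweets / organic


def average_daily_ratio(daily: List[Tuple[int, int]]) -> float:
    """Mean of per-day contagion ratios from (retweets, organic) pairs."""
    ratios = [contagion_ratio(rt, org) for rt, org in daily]
    return sum(ratios) / len(ratios)


def above_threshold(averages: Dict[str, float], threshold: float = 1.0) -> List[str]:
    """Languages whose average ratio exceeds the naive contagion threshold."""
    return sorted(lang for lang, r in averages.items() if r > threshold)
```

A ratio above 1 simply means that, on average, more messages in that language were retweets than new posts.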
ABSTRACT
Human mortality is in part a function of multiple socioeconomic factors that differ both spatially and temporally. Adjusting for other covariates, the human lifespan is positively associated with household wealth. However, the extent to which mortality in a geographical region is a function of socioeconomic factors in both that region and its neighbors is unclear. There is also little information on the temporal components of this relationship. Using the districts of Hong Kong over multiple census years as a case study, we demonstrate that wealth indicator variables are associated with longevity differently in (a) areas that are affluent but neighbored by socially deprived districts versus (b) wealthy areas surrounded by similarly wealthy districts. We also show that including spatially distributed variables reduces uncertainty in mortality rate predictions in each census year when compared with a baseline model. Our results suggest that geographic mortality models should incorporate nonlocal information (e.g., spatial neighbors) to lower the variance of their mortality estimates, and they motivate a more in-depth analysis of sociospatial spillover effects on mortality rates.
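One common way to bring nonlocal information into such a model is a spatial lag covariate: for each district, the mean of a wealth indicator over its neighbors. The sketch below builds that covariate together with an own-minus-neighbors contrast, which separates affluent districts with deprived neighbors from affluent districts with affluent neighbors. The district labels, values, and adjacency lists are hypothetical, and the mortality model itself is not shown.

```python
from typing import Dict, List


def spatial_lag(values: Dict[str, float], neighbors: Dict[str, List[str]]) -> Dict[str, float]:
    """Row-standardized spatial lag: the mean of each district's neighbors' values.

    Districts with no listed neighbors are omitted rather than imputed.
    """
    return {
        d: sum(values[n] for n in nbrs) / len(nbrs)
        for d, nbrs in neighbors.items()
        if nbrs
    }


def design_rows(
    values: Dict[str, float], neighbors: Dict[str, List[str]]
) -> Dict[str, List[float]]:
    """Covariate rows [intercept, own value, neighbor mean, own minus neighbor mean]."""
    lag = spatial_lag(values, neighbors)
    return {d: [1.0, values[d], lag[d], values[d] - lag[d]] for d in sorted(lag)}
```

A large positive contrast flags the "affluent but surrounded by deprivation" case (a) above, while a contrast near zero for a wealthy district corresponds to case (b).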
Subject(s)
Mortality, Socioeconomic Factors, Bayes Theorem, Hong Kong/epidemiology, Humans, Models, Statistical
ABSTRACT
In confronting the global spread of the coronavirus disease COVID-19 pandemic, we must have coordinated medical, operational, and political responses. In all efforts, data is crucial. Fundamentally, and in the possible absence of a vaccine for 12 to 18 months, we need universal, well-documented testing both for the presence of the disease and for confirmed recovery through serological tests for antibodies, and we need to track major socioeconomic indices. But we also need auxiliary data of all kinds, including data on how populations are talking about the unfolding pandemic through news and stories. To help in part on the social media side, we curate a set of 2000 day-scale time series of 1- and 2-grams across 24 languages on Twitter that are most 'important' for April 2020 with respect to April 2019. We determine importance through our allotaxonometric instrument, rank-turbulence divergence. We make some basic observations about some of the time series, including a comparison to the number of confirmed deaths due to COVID-19 over time. We broadly observe across all languages a peak for the language-specific word for 'virus' in January 2020, followed by a decline through February and then a surge through March and April. The world's collective attention dropped away while the virus spread out from China. We host the time series on GitLab, updating them daily while relevant. Our main intent is for other researchers to use these time series to enhance whatever analyses may be of use during the pandemic, as well as for retrospective investigations.
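Rank-turbulence divergence compares two rank orderings and scores each type by how much its rank changed, weighting changes near the top more heavily. The sketch below computes per-type contributions of the form ((alpha + 1) / alpha) * |r1^(-alpha) - r2^(-alpha)|^(1 / (alpha + 1)) and sorts types by them. Assumptions: the normalization that bounds the published divergence to [0, 1] is omitted, so only relative sizes are meaningful; types missing from one system are placed at a tied rank just after that system's last type; and alpha = 1/4 is an illustrative default, not necessarily the value used for these time series. Function names are hypothetical.

```python
from typing import Dict, List, Tuple


def fractional_ranks(counts: Dict[str, float]) -> Dict[str, float]:
    """Rank types by descending count; tied counts share the average of their ranks."""
    items = sorted(counts.items(), key=lambda kv: -kv[1])
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(items):
        j = i
        while j < len(items) and items[j][1] == items[i][1]:
            j += 1
        for key, _ in items[i:j]:
            ranks[key] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return ranks


def rtd_contributions(
    c1: Dict[str, float], c2: Dict[str, float], alpha: float = 0.25
) -> Dict[str, float]:
    """Unnormalized per-type rank-turbulence contributions between two systems."""
    r1, r2 = fractional_ranks(c1), fractional_ranks(c2)
    only1 = [t for t in r1 if t not in r2]
    only2 = [t for t in r2 if t not in r1]
    # Assumption: types absent from a system share a tied rank placed just
    # after that system's last rank (the mean of the ranks they would occupy).
    missing_rank1 = len(r1) + (len(only2) + 1) / 2
    missing_rank2 = len(r2) + (len(only1) + 1) / 2
    out: Dict[str, float] = {}
    for t in set(r1) | set(r2):
        a = r1.get(t, missing_rank1)
        b = r2.get(t, missing_rank2)
        diff = abs(a ** -alpha - b ** -alpha)
        out[t] = (alpha + 1) / alpha * diff ** (1 / (alpha + 1))
    return out


def most_important(contrib: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """The k types with the largest contributions (ties broken alphabetically)."""
    return sorted(contrib.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

Feeding in n-gram counts for a month in 2020 and the same month in 2019 and keeping the top entries gives a list of "important" n-grams in the sense used above; a type whose rank is unchanged contributes nothing.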
Subject(s)
COVID-19/psychology, Pandemics/statistics & numerical data, Social Media/trends, Attention, COVID-19/etiology, Coronavirus Infections/etiology, Coronavirus Infections/psychology, Humans, Language, Retrospective Studies, SARS-CoV-2/pathogenicity
ABSTRACT
Sentiment-aware intelligent systems are essential to a wide array of applications. These systems are driven by language models that broadly fall into two paradigms: lexicon-based and contextual. Although recent contextual models are increasingly dominant, we still see demand for lexicon-based models because of their interpretability and ease of use. For example, lexicon-based models allow researchers to readily determine which words and phrases contribute most to a change in measured sentiment. A challenge for any lexicon-based approach is that the lexicon needs to be routinely expanded with new words and expressions. Here, we propose two models for automatic lexicon expansion. Our first model establishes a baseline, employing a simple and shallow neural network initialized with pre-trained word embeddings using a non-contextual approach. Our second model improves upon our baseline, featuring a deep Transformer-based network that brings to bear word definitions to estimate their lexical polarity. Our evaluation shows that both models are able to score new words with accuracy similar to that of reviewers from Amazon Mechanical Turk, but at a fraction of the cost.
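The interpretability point above is what a word shift provides. For a lexicon-weighted average score, each word's contribution to the change between a reference text and a comparison text can be written as (s_w - s_ref) * (p_comp(w) - p_ref(w)), and because both frequency distributions sum to 1 over lexicon words, these contributions sum exactly to the difference in averages. The sketch assumes a toy lexicon with made-up scores and a hypothetical function name; it illustrates why lexicons are interpretable and is not either of the lexicon-expansion models described above.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def word_shift(
    ref: Iterable[str], comp: Iterable[str], scores: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Per-word contributions to the change in lexicon-weighted average score.

    Word w contributes (s_w - s_ref) * (p_comp[w] - p_ref[w]), where p are
    relative frequencies over lexicon words and s_ref is the reference average.
    The contributions sum to avg(comp) - avg(ref). Sorted by magnitude.
    """
    c_ref = Counter(w for w in ref if w in scores)
    c_comp = Counter(w for w in comp if w in scores)
    n_ref, n_comp = sum(c_ref.values()), sum(c_comp.values())
    if n_ref == 0 or n_comp == 0:
        raise ValueError("both texts need at least one lexicon word")
    s_ref = sum(scores[w] * c for w, c in c_ref.items()) / n_ref
    shifts = [
        (w, (scores[w] - s_ref) * (c_comp[w] / n_comp - c_ref[w] / n_ref))
        for w in set(c_ref) | set(c_comp)
    ]
    return sorted(shifts, key=lambda kv: -abs(kv[1]))
```

A newly added lexicon entry enters this breakdown as soon as it has a score, which is one reason automatic lexicon expansion keeps such explanations current.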