Estimating geographic subjective well-being from Twitter: A comparison of dictionary and data-driven language methods.
Jaidka, Kokil; Giorgi, Salvatore; Schwartz, H Andrew; Kern, Margaret L; Ungar, Lyle H; Eichstaedt, Johannes C.
Affiliation
  • Jaidka K; Department of Communications and New Media, National University of Singapore, Singapore 117416; jaidka@nus.edu.sg, johannes.stanford@gmail.com.
  • Giorgi S; Centre for Trusted Internet and Community, National University of Singapore, Singapore 117416.
  • Schwartz HA; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104.
  • Kern ML; Department of Computer Science, Stony Brook University, Stony Brook, NY 11794.
  • Ungar LH; Melbourne Graduate School of Education, The University of Melbourne, Parkville, VIC 3010, Australia.
  • Eichstaedt JC; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104.
Proc Natl Acad Sci U S A; 117(19): 10165-10171, 2020 May 12.
Article in English | MEDLINE | ID: mdl-32341156
ABSTRACT
Researchers and policy makers worldwide are interested in measuring the subjective well-being of populations. When users post on social media, they leave behind digital traces that reflect their thoughts and feelings. Aggregation of such digital traces may make it possible to monitor well-being at large scale. However, social media-based methods need to be robust to regional effects if they are to produce reliable estimates. Using a sample of 1.53 billion geotagged English tweets, we provide a systematic evaluation of word-level and data-driven methods for text analysis for generating well-being estimates for 1,208 US counties. We compared Twitter-based county-level estimates with well-being measurements provided by the Gallup-Sharecare Well-Being Index survey through 1.73 million phone surveys. We find that word-level methods (e.g., Linguistic Inquiry and Word Count [LIWC] 2015 and Language Assessment by Mechanical Turk [LabMT]) yielded inconsistent county-level well-being measurements due to regional, cultural, and socioeconomic differences in language use. However, removing as few as three of the most frequent words led to notable improvements in well-being prediction. Data-driven methods provided robust estimates, approximating the Gallup data at up to r = 0.64. We show that the findings generalized to county socioeconomic and health outcomes and were robust when poststratifying the samples to be more representative of the general US population. Regional well-being estimation from social media data seems to be robust when supervised data-driven methods are used.

Full text: 1 Database: MEDLINE Study type: Prognostic_studies Language: En Year: 2020 Document type: Article
