Closed- and open-vocabulary approaches to text analysis: A review, quantitative comparison, and recommendations.
Eichstaedt, Johannes C; Kern, Margaret L; Yaden, David B; Schwartz, H A; Giorgi, Salvatore; Park, Gregory; Hagan, Courtney A; Tobolsky, Victoria A; Smith, Laura K; Buffone, Anneke; Iwry, Jonathan; Seligman, Martin E P; Ungar, Lyle H.
Affiliation
  • Eichstaedt JC; Department of Psychology, Stanford University.
  • Kern ML; Melbourne Graduate School of Education, The University of Melbourne.
  • Yaden DB; Department of Psychiatry and Behavioral Sciences, Johns Hopkins Medicine.
  • Schwartz HA; Department of Computer Science, Stony Brook University.
  • Giorgi S; Department of Psychology, University of Pennsylvania.
  • Park G; Department of Psychology, University of Pennsylvania.
  • Hagan CA; Department of Psychology, University of Pennsylvania.
  • Tobolsky VA; Department of Psychology, University of Pennsylvania.
  • Smith LK; Department of Psychology, University of Pennsylvania.
  • Buffone A; Department of Psychology, University of Pennsylvania.
  • Iwry J; Department of Psychology, University of Pennsylvania.
  • Seligman MEP; Department of Psychology, University of Pennsylvania.
  • Ungar LH; Department of Psychology, University of Pennsylvania.
Psychol Methods ; 26(4): 398-427, 2021 Aug.
Article in En | MEDLINE | ID: mdl-34726465
ABSTRACT
Technology now makes it possible to understand efficiently and at large scale how people use language to reveal their everyday thoughts, behaviors, and emotions. Written text has been analyzed through both theory-based, closed-vocabulary methods from the social sciences as well as data-driven, open-vocabulary methods from computer science, but these approaches have not been comprehensively compared. To provide guidance on best practices for automatically analyzing written text, this narrative review and quantitative synthesis compares five predominant closed- and open-vocabulary methods: Linguistic Inquiry and Word Count (LIWC), the General Inquirer, DICTION, Latent Dirichlet Allocation, and Differential Language Analysis. We compare the linguistic features associated with gender, age, and personality across the five methods using an existing dataset of Facebook status updates and self-reported survey data from 65,896 users. Results are fairly consistent across methods. The closed-vocabulary approaches efficiently summarize concepts and are helpful for understanding how people think, with LIWC2015 yielding the strongest, most parsimonious results. Open-vocabulary approaches reveal more specific and concrete patterns across a broad range of content domains, better address ambiguous word senses, and are less prone to misinterpretation, suggesting that they are well-suited for capturing the nuances of everyday psychological processes. We detail several errors that can occur in closed-vocabulary analyses, the impact of sample size, number of words per user and number of topics included in open-vocabulary analyses, and implications of different analytical decisions. We conclude with recommendations for researchers, advocating for a complementary approach that combines closed- and open-vocabulary methods. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
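As a rough illustration of the distinction drawn in the abstract, the sketch below contrasts a closed-vocabulary score (the share of a text's words that fall in predefined categories) with open-vocabulary features (relative frequencies of whatever words actually occur). It is not the authors' pipeline: the two-category dictionary, the tokenizer, and the function names `closed_vocab_scores` and `open_vocab_features` are hypothetical stand-ins, and real dictionaries such as LIWC are far larger.

```python
import re
from collections import Counter

# Hypothetical toy dictionary (assumption); not taken from LIWC, the General
# Inquirer, or DICTION.
TOY_DICTIONARY = {
    "positive_emotion": {"happy", "love", "great"},
    "negative_emotion": {"sad", "hate", "awful"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def closed_vocab_scores(text, dictionary=TOY_DICTIONARY):
    """Closed vocabulary: fraction of tokens that belong to each fixed category."""
    tokens = tokenize(text)
    total = len(tokens)
    if total == 0:
        return {category: 0.0 for category in dictionary}
    return {
        category: sum(token in words for token in tokens) / total
        for category, words in dictionary.items()
    }


def open_vocab_features(text):
    """Open vocabulary: relative frequency of every word observed in the text."""
    tokens = tokenize(text)
    total = len(tokens)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(tokens).items()}


if __name__ == "__main__":
    status = "I love this great day"
    print(closed_vocab_scores(status))
    print(open_vocab_features(status))
```

In a study of the kind described, either set of per-user features would then be related to outcomes such as age, gender, or personality scores; that step is omitted here.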
Subjects

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Vocabulary / Linguistics Study type: Guideline Limit: Humans Language: En Publication year: 2021 Document type: Article