Search | Portal Regional da BVS

Zipf's law holds for phrases, not words.

Williams, Jake Ryland; Lessard, Paul R; Desu, Suma; Clark, Eric M; Bagrow, James P; Danforth, Christopher M; Dodds, Peter Sheridan.

Sci Rep ; 5: 12209, 2015 Aug 11.

Article in English | MEDLINE | ID: mdl-26259699

ABSTRACT

With Zipf's law being originally and most famously observed for word frequency, it is surprisingly limited in its applicability to human language, holding over no more than three to four orders of magnitude before hitting a clear break in scaling. Here, building on the simple observation that phrases of one or more words comprise the most coherent units of meaning in language, we show empirically that Zipf's law for phrases extends over as many as nine orders of rank magnitude. In doing so, we develop a principled and scalable statistical mechanical method of random text partitioning, which opens up a rich frontier of rigorous text analysis via a rank ordering of mixed length phrases.
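The random text partitioning and rank ordering of mixed-length phrases described in this abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each gap between adjacent words is cut independently with probability `q`, and the names `random_partition`, `rank_frequency`, and `q` are hypothetical.

```python
import random
from collections import Counter


def random_partition(words, q=0.5, rng=None):
    """Split a list of words into phrases of one or more words.

    Assumption for illustration: after each word, the text is cut with
    probability q, so q=1 yields single words and q=0 yields one phrase.
    """
    rng = rng or random.Random()
    phrases, current = [], []
    for word in words:
        current.append(word)
        if rng.random() < q:
            phrases.append(" ".join(current))
            current = []
    if current:
        # Close the final phrase if no cut fell after the last word.
        phrases.append(" ".join(current))
    return phrases


def rank_frequency(items):
    """Return (rank, item, count) tuples, most frequent item first."""
    counts = Counter(items)
    return [
        (rank, item, count)
        for rank, (item, count) in enumerate(counts.most_common(), start=1)
    ]
```

Plotting count against rank on log-log axes for the output of `rank_frequency(random_partition(words))` is one way to inspect how far a Zipf-like scaling extends for phrases compared with single words.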

Subjects

Data Mining/methods; Language; Models, Theoretical; Humans

Reply to Garcia et al.: Common mistakes in measuring frequency-dependent word characteristics.

Dodds, Peter Sheridan; Clark, Eric M; Desu, Suma; Frank, Morgan R; Reagan, Andrew J; Williams, Jake Ryland; Mitchell, Lewis; Harris, Kameron Decker; Kloumann, Isabel M; Bagrow, James P; Megerdoomian, Karine; McMahon, Matthew T; Tivnan, Brian F; Danforth, Christopher M.

Proc Natl Acad Sci U S A ; 112(23): E2984-5, 2015 Jun 09.

Article in English | MEDLINE | ID: mdl-25997446

Subjects

Bias; Emotions; Language; Humans

Human language reveals a universal positivity bias.

Proc Natl Acad Sci U S A ; 112(8): 2389-94, 2015 Feb 24.

Article in English | MEDLINE | ID: mdl-25675475

ABSTRACT

Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (i) the words of natural human language possess a universal positivity bias, (ii) the estimated emotional content of words is consistent between languages under translation, and (iii) this positivity bias is strongly independent of frequency of word use. Alongside these general regularities, we describe interlanguage variations in the emotional spectrum of languages that allow us to rank corpora. We also show how our word evaluations can be used to construct physical-like instruments for both real-time and offline measurement of the emotional content of large-scale texts.
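One simple way word evaluations might be turned into a measurement of a text's emotional content is a frequency-weighted average of per-word scores. The sketch below assumes that approach for illustration only; the abstract does not specify the formula, and the function name `average_score`, the whitespace tokenization, and the example scores are all hypothetical.

```python
from collections import Counter


def average_score(text, scores):
    """Frequency-weighted mean score of the words in text.

    `scores` maps a word to a numeric rating (hypothetical values here).
    Words without a rating are ignored; returns None if no word is rated.
    """
    counts = Counter(text.lower().split())
    rated = {word: n for word, n in counts.items() if word in scores}
    total = sum(rated.values())
    if total == 0:
        return None
    return sum(scores[word] * n for word, n in rated.items()) / total


# Hypothetical ratings, not taken from the study's data.
example_scores = {"happy": 8.0, "sad": 2.0}
print(average_score("happy happy sad", example_scores))
```

Applying the same function to successive time windows of a text stream would give a rough running measurement, under the same assumptions.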

Subjects

Bias; Emotions; Language; Humans; Time Factors
