Results 1-10 of 10
1.
Open Mind (Camb) ; 8: 723-738, 2024.
Article in English | MEDLINE | ID: mdl-38828431

ABSTRACT

Recent advances in Large Language Models (LLMs) have raised the question of replacing human subjects with LLM-generated data. While some believe that LLMs capture the "wisdom of the crowd" (due to their vast training data), empirical evidence for this hypothesis remains scarce. We present a novel methodological framework to test this: the "number needed to beat" (NNB), which measures how many humans are needed for a sample's quality to rival the quality achieved by GPT-4, a state-of-the-art LLM. In a series of pre-registered experiments, we collect novel human data and demonstrate the utility of this method for four psycholinguistic datasets for English. We find that NNB > 1 for each dataset, but also that NNB varies across tasks (and in some cases is quite small, e.g., 2). We also introduce two "centaur" methods for combining LLM and human data, which outperform both stand-alone LLMs and human samples. Finally, we analyze the trade-offs in data cost and quality for each approach. While clear limitations remain, we suggest that this framework could guide decision-making about whether and how to integrate LLM-generated data into the research pipeline.

2.
Behav Res Methods ; 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38261264

ABSTRACT

Research on language and cognition relies extensively on psycholinguistic datasets or "norms". These datasets contain judgments of lexical properties like concreteness and age of acquisition, and can be used to norm experimental stimuli, discover empirical relationships in the lexicon, and stress-test computational models. However, collecting human judgments at scale is both time-consuming and expensive. This issue of scale is compounded for multi-dimensional norms and those incorporating context. The current work asks whether large language models (LLMs) can be leveraged to augment the creation of large, psycholinguistic datasets in English. I use GPT-4 to collect multiple kinds of semantic judgments (e.g., word similarity, contextualized sensorimotor associations, iconicity) for English words and compare these judgments against the human "gold standard". For each dataset, I find that GPT-4's judgments are positively correlated with human judgments, in some cases rivaling or even exceeding the average inter-annotator agreement displayed by humans. I then identify several ways in which LLM-generated norms differ from human-generated norms systematically. I also perform several "substitution analyses", which demonstrate that replacing human-generated norms with LLM-generated norms in a statistical model does not change the sign of parameter estimates (though in select cases, there are significant changes to their magnitude). I conclude by discussing the considerations and limitations associated with LLM-generated norms in general, including concerns of data contamination, the choice of LLM, external validity, construct validity, and data quality. Additionally, all of GPT-4's judgments (over 30,000 in total) are made available online for further analysis.

3.
Cogn Sci ; 47(7): e13309, 2023 07.
Article in English | MEDLINE | ID: mdl-37401923

ABSTRACT

Humans can attribute beliefs to others. However, it is unknown to what extent this ability results from an innate biological endowment or from experience accrued through child development, particularly exposure to language describing others' mental states. We test the viability of the language exposure hypothesis by assessing whether models exposed to large quantities of human language display sensitivity to the implied knowledge states of characters in written passages. In pre-registered analyses, we present a linguistic version of the False Belief Task to both human participants and a large language model, GPT-3. Both are sensitive to others' beliefs, but while the language model significantly exceeds chance behavior, it does not perform as well as the humans nor does it explain the full extent of their behavior, despite being exposed to more language than a human would in a lifetime. This suggests that while statistical learning from language exposure may in part explain how humans develop the ability to reason about the mental states of others, other mechanisms are also responsible.


Subjects
Communication, Theory of Mind, Child, Humans, Deception, Language, Child Development

4.
Psychol Rev ; 130(5): 1239-1261, 2023 Oct.
Article in English | MEDLINE | ID: mdl-36892900

ABSTRACT

Most words have multiple meanings, but there are foundationally distinct accounts for this. Categorical theories posit that humans maintain discrete entries for distinct word meanings, as in a dictionary. Continuous ones eschew discrete sense representations, arguing that word meanings are best characterized as trajectories through a continuous state space. Both kinds of approach face empirical challenges. In response, we introduce two novel "hybrid" theories, which reconcile discrete sense representations with a continuous view of word meaning. We then report on two behavioral experiments, pairing them with an analytical approach relying on neural language models to test these competing accounts. The experimental results are best explained by one of the novel hybrid accounts, which posits both distinct sense representations and a continuous meaning space. This hybrid account accommodates both the dynamic, context-dependent nature of word meaning, as well as the behavioral evidence for category-like structure in human lexical knowledge. We further develop and quantify the predictive power of several computational implementations of this hybrid account. These results raise questions for future research on lexical ambiguity, such as why and when discrete sense representations might emerge in the first place. They also connect to more general questions about the role of discrete versus gradient representations in cognitive processes and suggest that at least in this case, the best explanation is one that integrates both factors: Word meaning is both categorical and continuous. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

5.
Lang Speech ; 66(1): 118-142, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35422153

ABSTRACT

Ambiguity pervades language. The sentence "My office is really hot" could be interpreted as a complaint about the temperature or as an indirect request to turn on the air conditioning. How do comprehenders determine a speaker's intended interpretation? One possibility is that speakers and comprehenders exploit prosody to overcome the pragmatic ambiguity inherent in indirect requests. In a pre-registered behavioral experiment, we find that human listeners can successfully determine whether a given utterance was intended as a request at a rate above chance (55%), above and beyond the prior probability of a given sentence being interpreted as a request. Moreover, we find that a classifier equipped with seven acoustic features can detect the original intent of an utterance with 65% accuracy. Finally, consistent with past work, the duration, pitch, and pitch slope of an utterance emerge both as significant correlates of a speaker's original intent and as predictors of comprehenders' pragmatic interpretation. These results suggest that human and machine comprehenders alike can use prosody to enrich the meaning of ambiguous utterances, such as indirect requests.


Subjects
Language, Speech Perception, Humans, Intention, Acoustics

6.
Behav Res Methods ; 55(4): 1537-1557, 2023 Jun.
Article in English | MEDLINE | ID: mdl-35689168

ABSTRACT

For any research program examining how ambiguous words are processed in broader linguistic contexts, a first step is to establish factors relating to the frequency balance or dominance of those words' multiple meanings, as well as the similarity of those meanings to one another. Homonyms (words with divergent meanings) are one ambiguous word type commonly utilized in psycholinguistic research. In contrast, although polysemes (words with multiple related senses) are far more common in English, they have been less frequently used as tools for understanding one-to-many word-to-meaning mappings. The current paper details two norming studies of a relatively large number of ambiguous English words. In the first, offline dominance norming is detailed for 547 homonyms and polysemes via a free association task suitable for words across the ambiguity continuum, with a goal of identifying words with more equibiased meanings. The second norming assesses offline meaning similarity for a partial subset of 318 ambiguous words (including homonyms, unambiguous words, and polysemes divided into regular and irregular types) using a novel, continuous rating method reliant on the linguistic phenomenon of zeugma. In addition, we conduct computational analyses on the human similarity norming data using the BERT pretrained neural language model (Devlin et al., 2018, BERT: Pre-training of deep bidirectional transformers for language understanding. ArXiv Preprint. arXiv:1810.04805) to evaluate factors that may explain variance beyond that accounted for by dictionary-criteria ambiguity categories. Finally, we make available the summarized item dominance values and similarity ratings in resultant appendices (see supplementary material), as well as individual item and participant norming data, which can be accessed online ( https://osf.io/g7fmv/ ).


Subjects
Language, Semantics, Humans, Psycholinguistics, Linguistics, Free Association

7.
Lang Resour Eval ; : 1-25, 2022 Nov 28.
Article in English | MEDLINE | ID: mdl-36465948

ABSTRACT

Speakers enjoy considerable flexibility in how they refer to a given referent: referring expressions can vary in their form (e.g., "she" vs. "the cat"), their length (e.g., "the (big) (orange) cat"), and more. What factors drive a speaker's decisions about how to refer, and how do these decisions shape a comprehender's ability to resolve the intended referent? Answering either question presents a methodological challenge; researchers must strike a balance between experimental control and ecological validity. In this paper, we introduce the SCARFS (Spontaneous, Controlled Acts of Reference between Friends and Strangers) Database: a corpus of approximately 20,000 English nominal referring expressions (NREs), produced in the context of a communication game. For each NRE, the corpus lists the concept the speaker was trying to convey (from a set of 471 possible target concepts), formal properties of the NRE (e.g., its length), the relationship between the interlocutors (i.e., friend vs. stranger), and the communicative outcome (i.e., whether the expression was successfully resolved). Researchers from diverse disciplines may use this resource to answer questions about how speakers refer and how comprehenders resolve their intended referent, as well as other fundamental questions about dialogic speech, such as whether and how speakers tailor their utterances to the identity of their interlocutor, how second-degree associations are generated, and the predictors of communicative success. Supplementary Information: The online version contains supplementary material available at 10.1007/s10579-022-09619-y.

8.
Cognition ; 225: 105094, 2022 08.
Article in English | MEDLINE | ID: mdl-35339794

ABSTRACT

Human languages evolve to make communication more efficient. But efficiency creates trade-offs: what is efficient for speakers is not always efficient for comprehenders. How do languages balance these competing pressures? We focus on Zipf's meaning-frequency law, the observation that frequent wordforms have more meanings. On the one hand, this law could reflect a speaker-oriented pressure to reuse frequent wordforms. Yet human languages still maintain thousands of distinct wordforms, suggesting a countervailing, comprehender-oriented pressure. What balance of these pressures produces Zipf's meaning-frequency law? Using a neutral baseline, we find that frequent wordforms in real lexica have fewer homophones than predicted by their phonotactic structure: real lexica favor a comprehender-oriented pressure to reduce the cost of frequent disambiguation. These results help clarify the evolutionary drive for efficiency: human languages are subject to competing pressures for efficient communication, the relative magnitudes of which reveal how individual-level cognitive constraints shape languages over time.


Subjects
Communication, Language, Biological Evolution, Humans

9.
Psychon Bull Rev ; 29(2): 613-626, 2022 Apr.
Article in English | MEDLINE | ID: mdl-34755319

ABSTRACT

The Action-sentence Compatibility Effect (ACE) is a well-known demonstration of the role of motor activity in the comprehension of language. Participants are asked to make sensibility judgments on sentences by producing movements toward the body or away from the body. The ACE is the finding that movements are faster when the direction of the movement (e.g., toward) matches the direction of the action in the to-be-judged sentence (e.g., "Art gave you the pen" describes action toward you). We report on a pre-registered, multi-lab replication of one version of the ACE. The results show that none of the 18 labs involved in the study observed a reliable ACE, and that the meta-analytic estimate of the size of the ACE was essentially zero.


Subjects
Comprehension, Language, Humans, Movement, Reaction Time

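A pooled "meta-analytic estimate" across labs can be illustrated with fixed-effect, inverse-variance weighting. This is a deliberate simplification. Multi-lab replications often fit a random-effects model, and the abstract does not say which model was used. The function name and the per-lab effects and standard errors below are hypothetical.

```python
from math import sqrt


def fixed_effect_meta(effects, std_errors, z=1.96):
    """Inverse-variance weighted mean of per-lab effects with an approximate 95% CI."""
    weights = [1.0 / se**2 for se in std_errors]  # more precise labs get more weight
    total = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / total
    pooled_se = sqrt(1.0 / total)  # standard error of the weighted mean
    return pooled, (pooled - z * pooled_se, pooled + z * pooled_se)


# Hypothetical per-lab ACE estimates (e.g., RT differences in ms) and their standard errors.
estimate, ci = fixed_effect_meta([4.0, -3.0, 1.0], [5.0, 4.0, 6.0])
print(f"pooled = {estimate:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

A pooled estimate near zero whose interval straddles zero is the pattern the abstract summarizes as an effect that is "essentially zero".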
10.
Cognition ; 205: 104449, 2020 12.
Article in English | MEDLINE | ID: mdl-32947137

ABSTRACT

Human languages are replete with ambiguity. This is most evident in homophony, where two or more words sound the same but carry distinct meanings. For example, the wordform "bark" can denote either the sound produced by a dog or the protective outer sheath of a tree trunk. Why would a system evolved for efficient, effective communication display rampant ambiguity? Some accounts argue that ambiguity is actually a design feature of human communication systems, allowing languages to recycle their most optimal wordforms (those which are short, frequent, and phonotactically well-formed) for multiple meanings. We test this claim by constructing five series of artificial lexica matched for the phonotactics and distribution of word lengths found in five real languages (English, German, Dutch, French, and Japanese), and comparing both the quantity and concentration of homophony across the real and artificial lexica. Surprisingly, we find that the artificial lexica exhibit higher upper-bounds on homophony than their real counterparts, and that homophony is even more likely to be found among short, phonotactically plausible wordforms in the artificial than in the real lexica. These results suggest that homophony in real languages is not directly selected for, but rather, that it emerges as a natural consequence of other features of a language. In fact, homophony may even be selected against in real languages, producing lexica that better conform to other requirements of humans who need to use them. Finally, we explore the hypothesis that this is achieved by "smoothing" out dense concentrations of homophones across lexical neighborhoods, resulting in comparatively more minimal pairs in real lexica.


Subjects
Language, Animals, Dogs, Humans

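The "quantity" of homophony and its "upper bound" can be sketched by grouping meanings under each wordform. In this sketch a form with k distinct meanings contributes k - 1 homophones, and the upper bound is the largest number of meanings any single form carries. Orthographic strings stand in for phonological forms here, which is an assumption. The tiny "real" and "artificial" lexica are invented for illustration and are not the paper's data or its lexicon-generation procedure.

```python
from collections import defaultdict


def homophony_profile(lexicon):
    """Summarize homophony in a lexicon given as (wordform, meaning) pairs.

    Returns (total_homophones, max_meanings_per_form). A wordform with k
    distinct meanings contributes k - 1 homophones to the total.
    """
    meanings = defaultdict(set)
    for form, meaning in lexicon:
        meanings[form].add(meaning)
    total = sum(len(m) - 1 for m in meanings.values())
    upper_bound = max((len(m) for m in meanings.values()), default=0)
    return total, upper_bound


real = [("bark", "dog sound"), ("bark", "tree covering"), ("cat", "feline")]
# Toy "artificial" lexicon: the same meanings mapped onto hypothetical generated forms.
artificial = [("ba", "dog sound"), ("ba", "tree covering"), ("ba", "feline")]
print(homophony_profile(real), homophony_profile(artificial))  # prints (1, 2) (2, 3)
```

Comparing these two numbers between a real lexicon and many phonotactically matched artificial ones corresponds to the comparison of quantity and concentration of homophony that the abstract describes.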