ABSTRACT
When mentally exploring maps representing large-scale environments (e.g., countries or continents), humans are assumed to mainly rely on spatial information derived from direct perceptual experience (e.g., prior visual experience with the geographical map itself). In the present study, we instead tested whether temporal and linguistic information could also account for the way humans explore and ultimately represent this type of map. We quantified temporal distance as the minimum time needed to travel by train between Italian cities, while linguistic distance was retrieved from natural language through cognitively plausible AI models based on non-spatial associative learning mechanisms (i.e., distributional semantic models). In a first experiment, we show that temporal and linguistic distances capture real geographical distances with high confidence. Next, in a second behavioral experiment, we show that linguistic information can account for human performance over and above real spatial information (which plays the major role in explaining participants' performance) in a task in which participants have to judge the distance between cities (while temporal information was found not to be relevant). These findings indicate that, when exploring maps representing large-scale environments, humans do take advantage of both perceptual and linguistic information, suggesting in turn that the formation of cognitive maps possibly relies on a strict interplay between spatial and non-spatial learning principles.
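As a rough illustration of how a linguistic distance can be set against a geographical one, the sketch below computes cosine distance between word vectors and great-circle distance between city coordinates. It is a minimal sketch, not the study's pipeline: the three-dimensional vectors in `TOY_VECTORS` are made up for illustration (a real distributional model has hundreds of dimensions learned from a corpus), the coordinates are approximate, and the function names are hypothetical.

```python
import itertools
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# Approximate city coordinates (latitude, longitude) in degrees.
COORDS = {
    "Milano": (45.4642, 9.1900),
    "Napoli": (40.8518, 14.2681),
    "Roma": (41.9028, 12.4964),
}

# ASSUMPTION: made-up toy vectors, NOT taken from any trained model.
TOY_VECTORS = {
    "Milano": [0.9, 0.1, 0.2],
    "Napoli": [0.2, 0.9, 0.3],
    "Roma": [0.5, 0.6, 0.3],
}

# Print both distances for every pair of cities.
for a, b in itertools.combinations(sorted(COORDS), 2):
    geo = haversine_km(*COORDS[a], *COORDS[b])
    ling = cosine_distance(TOY_VECTORS[a], TOY_VECTORS[b])
    print(f"{a}-{b}: geographic = {geo:.0f} km, linguistic = {ling:.3f}")
```

With real vectors, the two sets of pairwise distances could then be correlated to check how well the linguistic measure tracks geography.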
Subject(s)
Judgment , Space Perception , Humans , Female , Adult , Male , Space Perception/physiology , Young Adult , Judgment/physiology , Italy , Linguistics , Time Perception/physiology
ABSTRACT
OBJECTIVE: We aimed to develop a machine learning model to infer OCEAN traits from text. BACKGROUND: The psycholexical approach allows retrieving information about personality traits from human language. However, it has rarely been applied because of methodological and practical issues that current computational advancements could overcome. METHOD: Classical taxonomies and a large Yelp corpus were leveraged to learn an embedding for each personality trait. These embeddings were used to train a feedforward neural network for predicting trait values. The model's generalization performance was evaluated through two external validation studies involving experts (N = 11) and laypeople (N = 100) in a discrimination task about the best markers of each trait and polarity. RESULTS: Intrinsic validation of the model yielded excellent results, with R2 values greater than 0.78. The validation studies showed a high proportion of matches between participants' choices and model predictions, confirming its efficacy in identifying new terms related to the OCEAN traits. The best performance was observed for agreeableness and extraversion, especially for their positive polarities. The model was less efficient in identifying the negative polarity of openness and conscientiousness. CONCLUSIONS: This innovative methodology can be considered a "psycholexical approach 2.0," contributing to research in personality and its practical applications in many fields.
ABSTRACT
The formation of false memories is one of the most widely studied topics in cognitive psychology. The Deese-Roediger-McDermott (DRM) paradigm is a powerful tool for investigating false memories and revealing the cognitive mechanisms subserving their formation. In this task, participants first memorize a list of words (encoding phase) and next have to indicate whether words presented in a new list were part of the initially memorized one (recognition phase). By employing DRM lists optimized to investigate semantic effects, previous studies highlighted a crucial role of semantic processes in false memory generation, showing that new words semantically related to the studied ones tend to be erroneously recognized more often (compared to new words less semantically related). Despite the strengths of the DRM task, this paradigm faces a major limitation in list construction due to its reliance on human-based association norms, posing both practical and theoretical concerns. To address these issues, we developed the False Memory Generator (FMG), an automated and data-driven tool for generating DRM lists, which exploits similarity relationships between items populating a vector space. Here, we present FMG and demonstrate the validity of the lists generated in successfully replicating well-known semantic effects on false memory production. FMG potentially has broad applications by allowing for testing false memory production in domains that go well beyond the current possibilities, as it can be in principle applied to any vector space encoding properties related to word referents (e.g., lexical, orthographic, phonological, sensory, affective, etc.) or other types of stimuli (e.g., images, sounds, etc.).
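The idea of building a DRM list from similarity relationships in a vector space can be sketched as a nearest-neighbor lookup around a critical lure. This is only an illustrative sketch and not the FMG implementation: the function name `make_drm_list`, the use of cosine similarity, and the two-dimensional toy vectors are all assumptions made here for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def make_drm_list(lure, vectors, k):
    """Return the k items most similar to `lure`, most similar first.

    Hypothetical helper: the lure itself is left out of the list, as in a
    classic DRM design where the critical word is never studied.
    """
    candidates = [w for w in vectors if w != lure]
    candidates.sort(key=lambda w: cosine_similarity(vectors[lure], vectors[w]),
                    reverse=True)
    return candidates[:k]


# ASSUMPTION: made-up toy vectors, for illustration only.
toy_vectors = {
    "sleep": [1.0, 0.0],
    "bed": [0.9, 0.1],
    "dream": [0.8, 0.3],
    "car": [0.0, 1.0],
    "road": [0.1, 0.9],
}

study_list = make_drm_list("sleep", toy_vectors, k=2)
print(study_list)
```

Because the function only needs a mapping from items to vectors, the same sketch would apply to any space (orthographic, affective, image embeddings, and so on), which is the generality the abstract points to.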
Subject(s)
Semantics , Software , Humans , Female , Male , Young Adult , Adult , Repression, Psychology , Recognition, Psychology/physiology , Memory/physiology , Mental Recall/physiology
ABSTRACT
The use of taboo words represents one of the most common and arguably universal linguistic behaviors, fulfilling a wide range of psychological and social functions. However, in the scientific literature, taboo language is poorly characterized, and how it is realized in different languages and populations remains largely unexplored. Here we provide a database of taboo words, collected from different linguistic communities (Study 1, N = 1046), along with their speaker-centered semantic characterization (Study 2, N = 455 for each of six rating dimensions), covering 13 languages and 17 countries from all five permanently inhabited continents. Our results show that, in all languages, taboo words are mainly characterized by extremely low valence and high arousal, and very low written frequency. However, a significant amount of cross-country variability in words' tabooness and offensiveness proves the importance of community-specific sociocultural knowledge in the study of taboo language.
Subject(s)
Language , Taboo , Humans , Semantics , Cross-Cultural Comparison
ABSTRACT
In the present study, we leveraged computational methods to explore the extent to which, relative to direct access to semantics from orthographic cues, the additional appreciation of morphological cues is advantageous while inducing the meaning of affixed pseudo-words. We re-analyzed data from a study on a lexical decision task for affixed pseudo-words. We considered a parsimonious model only including semantic variables (namely, semantic neighborhood density, entropy, magnitude, stem proximity) derived through a word-form-to-meaning approach (ngram-based). We then explored the extent to which the addition of equivalent semantic variables derived by combining semantic information from morphemes (combination-based) improved the fit of the statistical model explaining human data. Results suggest that semantic information can be extracted from arbitrary clusters of letters, yet a computational model of semantic access also including a combination-based strategy based on explicit morphological information better captures the cognitive mechanisms underlying human performance. This is particularly evident when participants recognize affixed pseudo-words as meaningful stimuli.
Subject(s)
Cues , Word Processing , Humans , Models, Statistical , Semantics
ABSTRACT
Although mouse-tracking has been seen as a real-time window into different aspects of human decision-making processes, currently little is known about how the decision process unfolds in veridical and false memory retrieval. Here, we directly investigated decision-making processes by predicting participants' performance in a mouse-tracking version of a typical Deese-Roediger-McDermott (DRM) task through distributional semantic models, a usage-based approach to meaning. Participants were required to study lists of associated words and then to perform a recognition task with the mouse. Results showed that mouse trajectories were extensively affected by the semantic similarity between the words presented in the recognition phase and the ones previously studied. In particular, the higher the semantic similarity, the larger the conflict driving the choice and the higher the irregularity in the trajectory when correctly rejecting new words (i.e., the false memory items). Conversely, regarding the temporal evolution of the decision, our results showed that semantic similarity affects more complex temporal measures indexing the online decision processes subserving task performance. Together, these findings demonstrate that semantic similarity can affect human behavior at the level of motor control, testifying to its influence on online decision-making processes. More generally, our findings complement previous seminal theories on false memory and provide insights into the impact of the semantic memory structure on different decision-making components.
Subject(s)
Memory , Semantics , Humans , Recognition, Psychology , Mental Recall
ABSTRACT
We release a database of cloze probability values, predictability ratings, and computational estimates for a sample of 205 English sentences (1726 words), aligned with previously released word-by-word reading time data (both self-paced reading and eye-movement records; Frank et al., Behavior Research Methods, 45(4), 1182-1190. 2013) and EEG responses (Frank et al., Brain and Language, 140, 1-11. 2015). Our analyses show that predictability ratings are the best predictors of the EEG signal (N400, P600, LAN), self-paced reading times, and eye movement patterns, when spillover effects are taken into account. The computational estimates are particularly effective at explaining variance in the eye-tracking data without spillover. Cloze probability estimates have decent overall psychometric accuracy and are the best predictors of early fixation patterns (first fixation duration). Our results indicate that the choice of the best measurement of word predictability in context critically depends on the processing index being considered.
ABSTRACT
Cognitive maps are assumed to be fundamentally spatial and grounded only in perceptual processes, as supported by the discovery of functionally dedicated cell types in the human brain, which tile the environment in a maplike fashion. Challenging this view, we demonstrate that spatial representations, such as large-scale geographical maps, can also be retrieved with high confidence from natural language through cognitively plausible artificial-intelligence models on the basis of nonspatial associative-learning mechanisms. More critically, we show that linguistic information accounts for the specific distortions observed in tasks in which college-age adults have to judge the geographical positions of cities, even when these positions are estimated on real maps. These findings indicate that language experience can encode and reproduce cognitive maps without the need for a dedicated spatial-representation system, thus suggesting that the formation of these maps is the result of a strict interplay between spatial- and nonspatial-learning principles.
Subject(s)
Language , Linguistics , Adult , Humans , Learning , Brain
ABSTRACT
While distributional semantic models that represent word meanings as high-dimensional vectors induced from large text corpora have been shown to successfully predict human behavior across a wide range of tasks, they have also received criticism from different directions. These include concerns over their interpretability (how can numbers specifying abstract, latent dimensions represent meaning?) and their ability to capture variation in meaning (how can a single vector representation capture multiple different interpretations for the same expression?). Here, we demonstrate that semantic vectors can indeed rise to these challenges, by training a mapping system (a simple linear regression) that predicts inter-individual variation in relational interpretations for compounds such as wood brush (for example brush FOR wood, or brush MADE OF wood) from (compositional) semantic vectors representing the meanings of these compounds. These predictions consistently beat different random baselines, both for familiar compounds (moon light, Experiment 1) and for novel compounds (wood brush, Experiment 2), demonstrating that distributional semantic vectors encode variations in qualitative interpretations that can be decoded using techniques as simple as linear regression.
Subject(s)
Semantics , Humans
ABSTRACT
Theories of grounded cognition assume that conceptual representations are grounded in sensorimotor experience. However, abstract concepts such as jealousy or childhood have no directly associated referents with which such sensorimotor experience can be made; therefore, the grounding of abstract concepts has long been a topic of debate. Here, we propose (a) that systematic relations exist between semantic representations learned from language on the one hand and perceptual experience on the other hand, (b) that these relations can be learned in a bottom-up fashion, and (c) that it is possible to extrapolate from this learning experience to predict expected perceptual representations for words even where direct experience is missing. To test this, we implement a data-driven computational model that is trained to map language-based representations (obtained from text corpora, representing language experience) onto vision-based representations (obtained from an image database, representing perceptual experience), and apply its mapping function onto language-based representations for abstract and concrete words outside the training set. In three experiments, we present participants with these words, accompanied by two images: the image predicted by the model and a random control image. Results show that participants' judgements were in line with model predictions even for the most abstract words. This preference was stronger for more concrete items and decreased for the more abstract ones. Taken together, our findings have substantial implications in support of the grounding of abstract words, suggesting that we can tap into our previous experience to create possible visual representations that we do not have.
Subject(s)
Concept Formation , Semantics , Humans , Child , Language , Cognition , Learning
ABSTRACT
Word frequency is one of the best predictors of language processing. Typically, word frequency norms are entirely based on natural-language text data, thus representing what the literature typically refers to as purely linguistic experience. This study presents Flickr frequency norms as a novel word frequency measure from a domain-specific corpus inherently tied to extra-linguistic information: words used as image tags on social media. To obtain Flickr frequency measures, we exploited the photo-sharing platform Flickr Image (containing billions of photos) and extracted the number of uploaded images tagged with each of the words considered in the lexicon. Here, we systematically examine the peculiarities of Flickr frequency norms and show that Flickr frequency is a hybrid metric, lying at the intersection between language and visual experience and with specific biases induced by being based on image-focused social media. Moreover, regression analyses indicate that Flickr frequency captures additional information beyond what is already encoded in existing norms of linguistic, sensorimotor, and affective experience. Therefore, these new norms capture aspects of language usage that are missing from traditional frequency measures: a portion of language usage capturing the interplay between language and vision, which - this study demonstrates - has its own impact on word processing. The Flickr frequency norms are openly available on the Open Science Framework (https://osf.io/2zfs3/).
ABSTRACT
Scientific studies of language behavior need to grapple with a large diversity of languages in the world and, for reading, a further variability in writing systems. Yet, the ability to form meaningful theories of reading is contingent on the availability of cross-linguistic behavioral data. This paper offers new insights into aspects of reading behavior that are shared and those that vary systematically across languages through an investigation of eye-tracking data from 13 languages recorded during text reading. We begin with reporting a bibliometric analysis of eye-tracking studies showing that the current empirical base is insufficient for cross-linguistic comparisons. We respond to this empirical lacuna by presenting the Multilingual Eye-Movement Corpus (MECO), the product of an international multi-lab collaboration. We examine which behavioral indices differentiate between reading in written languages, and which measures are stable across languages. One of the findings is that readers of different languages vary considerably in their skipping rate (i.e., the likelihood of not fixating on a word even once) and that this variability is explained by cross-linguistic differences in word length distributions. In contrast, if readers do not skip a word, they tend to spend a similar average time viewing it. We outline the implications of these findings for theories of reading. We also describe prospective uses of the publicly available MECO data, and its further development plans.
Subject(s)
Reading , Humans
ABSTRACT
People with schizophrenia spectrum disorders (SSD) show anomalies in language processing with respect to "who is doing what" in an action. This linguistic behavior is suggestive of an atypical representation of the formal concept of "Agent" in the lexical representation of a verb, i.e., its thematic grid. To test this hypothesis, we administered a silent-reading task with sentences including a semantic violation of the animacy trait of the grammatical subject to 30 people with SSD and 30 healthy control participants (HCs). When the anomalous grammatical subject was the Agent of the event, a significant increase of Gaze Duration was observed in HCs, but not in SSDs. Conversely, when the anomalous subject was a Theme, SSDs displayed an increased probability of go-back movements, unlike HCs. These results are suggestive of a higher tolerability for anomalous Agents in SSD compared to the normal population. The fact that SSD participants did not show a similar tolerability for anomalous Themes rules out the issue of an attention deficit. We suggest that general communication abilities in SSD might benefit from explicit training on deep linguistic structures.
Subject(s)
Eye Movements , Schizophrenia , Humans , Language , Linguistics , Semantics
ABSTRACT
Normative measures of verbal material are fundamental in psycholinguistic and cognitive research for the control of confounding in experimental procedures and for achieving a better comprehension of our conceptual system. Traditionally, normative studies have focused on classical psycholinguistic variables, such as concreteness and imageability. Recent works have shifted researchers' focus to perceptual strength, in which items are rated separately for each of the five senses. We present a resource that includes perceptual norms for 1,121 Italian words extracted from the Italian version of ANEW. Norms were collected from 57 native speakers. For each word, the participants provided perceptual-strength ratings for each of the five perceptual modalities. The performance of the perceptual norms in predicting human behavior was tested in two novel experiments, a lexical decision task and a naming task. Concreteness, imageability, and different composite variables representing perceptual-strength scores were considered as competing predictors in a series of linear regressions, evaluating the goodness of fit of each model. For both tasks, the model with imageability as the only predictor was found to be the best-fitting model according to the Akaike information criterion, whereas the model with the separately considered five modalities better described data according to the explained variance. These results differ from the ones previously reported for English, in which maximum perceptual strength emerged as the best predictor of behavior. We investigated this discrepancy by comparing Italian and English data for the same set of translated items, thus confirming a genuine cross-linguistic effect. We thus confirmed that perceptual experience influences linguistic processing, even though evaluations from different languages are needed to generalize this claim.
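The split verdict between the Akaike information criterion (AIC) and explained variance follows from AIC's penalty on the number of parameters. The sketch below uses the standard least-squares form of AIC (up to an additive constant), n ln(RSS/n) + 2k, to show how a one-predictor model can win on AIC while a five-predictor model wins on R2. All numbers are hypothetical and chosen only to illustrate the mechanism; they are not values from the study.

```python
import math


def aic_least_squares(rss, n, k):
    """AIC for a Gaussian linear model, up to an additive constant.

    rss: residual sum of squares; n: number of observations;
    k: number of estimated parameters (here, intercept plus slopes).
    """
    return n * math.log(rss / n) + 2 * k


def r_squared(rss, tss):
    """Proportion of variance explained, given residual and total sums of squares."""
    return 1.0 - rss / tss


# ASSUMPTION: hypothetical sums of squares for two competing regressions.
n, tss = 100, 60.0
one_predictor = {"rss": 50.0, "k": 2}    # e.g., intercept + one rating
five_predictors = {"rss": 48.0, "k": 6}  # e.g., intercept + five modality ratings

for name, m in [("one predictor", one_predictor), ("five predictors", five_predictors)]:
    aic = aic_least_squares(m["rss"], n, m["k"])
    r2 = r_squared(m["rss"], tss)
    print(f"{name}: AIC = {aic:.2f}, R2 = {r2:.3f}")
```

In this toy case the richer model explains slightly more variance, but the gain is too small to offset the penalty for four extra parameters, so the simpler model has the lower (better) AIC.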
Subject(s)
Language , Psycholinguistics , Comprehension , Humans , Italy
ABSTRACT
In the present study, we provide a comprehensive analysis and a multi-dimensional dataset of semantic transparency measures for 1810 German compound words. Compound words are considered semantically transparent when the contribution of the constituents' meaning to the compound meaning is clear (as in airport), but the degree of semantic transparency varies between compounds (compare strawberry or sandman). Our dataset includes both compositional and relatedness-based semantic transparency measures, also differentiated by constituents. The measures are obtained from a computational and fully implemented semantic model based on distributional semantics. We validate the measures using data from four behavioral experiments: Explicit transparency ratings, two different lexical decision tasks using different nonwords, and an eye-tracking study. We demonstrate that different semantic effects emerge in different behavioral tasks, which can only be captured using a multi-dimensional approach to semantic transparency. We further provide the semantic transparency measures derived from the model for a dataset of 40,475 additional German compounds, as well as for 2061 novel German compounds.
Subject(s)
Semantics
ABSTRACT
Orthography-semantics consistency (OSC) is a measure that quantifies the degree of semantic relatedness between a word and its orthographic relatives. OSC is computed as the frequency-weighted average semantic similarity between the meaning of a given word and the meanings of all the words containing that very same orthographic string, as captured by distributional semantic models. We present a resource including optimized estimates of OSC for 15,017 English words. In a series of analyses, we provide a progressive optimization of the OSC variable. We show that computing OSC from word-embeddings models (in place of traditional count models), limiting preprocessing of the corpus used for inducing semantic vectors (in particular, avoiding part-of-speech tagging and lemmatization), and relying on a wider pool of orthographic relatives provide better performance for the measure in a lexical-processing task. We further show that OSC is an important and significant predictor of reaction times in visual word recognition and word naming, one that correlates only weakly with other psycholinguistic variables (e.g., family size, word frequency), indicating that it captures a novel source of variance in lexical access. Finally, we discuss some theoretical and methodological implications of adopting OSC as one of the predictors of reaction times in studies of visual word recognition.
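The verbal definition of OSC can be written out as a small computation: a frequency-weighted mean of the cosine similarities between a target word and every word that contains its letter string. The sketch below is a minimal illustration under stated assumptions, not the resource's actual procedure: the function name `osc`, the choice to exclude the target word itself from its relatives, the use of raw frequencies as weights, and the toy vectors and frequencies are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def osc(target, vectors, frequencies):
    """Frequency-weighted average similarity between `target` and its relatives.

    Relatives are all words in `vectors` that contain `target` as a substring.
    ASSUMPTION: the target itself is excluded and raw frequencies are the weights.
    """
    relatives = [w for w in vectors if target in w and w != target]
    if not relatives:
        raise ValueError(f"no orthographic relatives found for {target!r}")
    weighted = sum(frequencies[w] * cosine_similarity(vectors[target], vectors[w])
                   for w in relatives)
    return weighted / sum(frequencies[w] for w in relatives)


# ASSUMPTION: made-up toy vectors and frequencies, for illustration only.
toy_vectors = {
    "corn": [1.0, 0.0],
    "cornfield": [2.0, 0.0],  # same direction as "corn": similarity 1
    "corner": [0.0, 1.0],     # orthogonal to "corn": similarity 0
    "table": [0.5, 0.5],      # does not contain "corn", so it is ignored
}
toy_frequencies = {"corn": 50, "cornfield": 30, "corner": 10, "table": 80}

print(osc("corn", toy_vectors, toy_frequencies))
```

A frequent but semantically unrelated relative (such as "corner" for "corn") pulls the value down, which is the intuition behind using OSC as an index of how reliably a letter string signals a meaning.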
Subject(s)
Databases, Factual , Language , Reading , Semantics , Word Association Tests , Behavioral Research/methods , Female , Humans , Male , Psycholinguistics/methods , Reaction Time , Task Performance and Analysis
ABSTRACT
Most compound words are constituted of a head constituent (e.g., light in moonlight) and a modifier constituent (e.g., moon in moonlight); the information transmitted by these head-modifier roles is fundamental for defining the grammatical and semantic properties of the compound and for identifying a correct combination of the constituents at the conceptual level. The objective of this study is to assess how lexical processing in aphasia is influenced by the head-modifier structure of nominal compounds. A picture-naming task of 35 compounds with head-initial (pescespada, swordfish, literally fishsword) and head-final (autostrada, highway, literally carroad) forms was administered to 45 Italian aphasic patients, and their accuracy in retrieving constituents was analysed with a mixed-effects logistic regression. The interaction between headedness and constituent position was significant: The modifier emerged as being more difficult to retrieve than the head, but only for head-final compounds. The results are consistent with previous data from priming experiments on healthy subjects and provide convincing evidence that compound headedness is represented at central processing levels.
Subject(s)
Aphasia/psychology , Language , Semantics , Vocabulary , Adult , Female , Humans , Male , Middle Aged , Psycholinguistics
ABSTRACT
Compound words in Romance languages may have the head either in the initial or in the final position. In the present event-related potential (ERP) study, we address the hypothesis that Italian compounds are processed differently according to their head position and that this is mostly due to the perceived change in the canonical order of syntactic elements. Compound stimuli (head-initial, head-final, or exocentric) were visually displayed in two presentation modes, as whole words or separated into their constituents, in the context of a lexical decision task. Behavioural results showed an increased split cost in head-final and exocentric compounds as compared to head-initial compounds. ERP results showed an enhanced left anterior negativity (LAN) for head-final and exocentric compounds as compared to head-initial compounds, regardless of the presentation mode. Results suggest that the analogy with syntactic order may influence the internal structure of a compound and, as a consequence, its processing, but other characteristics (such as the grammatical properties of constituents) may affect the processing itself.