ABSTRACT
The major aim of the present megastudy of picture-naming norms was to address the shortcomings of the picture data sets available for psychological and linguistic research by creating a new database of normed colour images on which researchers from around the world can rely. To do this, we employed a new form of normative study, namely a megastudy, in which a group of German speakers named and rated 1620 colour photographs of items spanning 42 semantic categories. This was done to establish the following linguistic norms: speech onset times (SOT), name agreement, accuracy, familiarity, visual complexity, valence, and arousal. The data, including over 64,000 audio files, were used to create the LinguaPix database of pictures, audio recordings, and linguistic norms, which, to our knowledge, is the largest available research tool of its kind (http://linguapix.uni-mannheim.de). In this paper, we present the tool and the analysis of the major variables.
Subject(s)
Language, Psycholinguistics, Humans, Linguistics, Recognition (Psychology), Semantics
ABSTRACT
We present a new dataset of English word recognition times for a total of 62 thousand words, called the English Crowdsourcing Project. The data were collected via an internet vocabulary test in which more than one million people participated. The present dataset is limited to native English speakers. Participants were asked to indicate which words they knew. Their response times were registered, although at no point were the participants asked to respond as quickly as possible. Still, for the words shared by both datasets, the response times correlate at around .75 with those of the English Lexicon Project. In addition, the results of virtual experiments indicate that the new response times are a valid addition to the English Lexicon Project. This means not only that we have useful response times for some 35 thousand extra words, but also that we now have data on differences in response latencies as a function of education and age.
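The validation step described above correlates the new response times with those of the English Lexicon Project, restricted to the words present in both. A minimal sketch follows. It assumes each dataset is a plain mapping from word to mean response time in milliseconds. The function names and the toy values are hypothetical and are not taken from either dataset.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def shared_word_correlation(rt_a: dict[str, float], rt_b: dict[str, float]) -> tuple[int, float]:
    """Correlate mean RTs over the words present in both datasets only."""
    shared = sorted(rt_a.keys() & rt_b.keys())
    r = pearson([rt_a[w] for w in shared], [rt_b[w] for w in shared])
    return len(shared), r


# Hypothetical mean RTs in ms; "lamp" and "ocean" occur in only one dataset.
crowd = {"table": 610.0, "dog": 590.0, "quixotic": 910.0, "zephyr": 980.0, "lamp": 640.0}
lab = {"table": 540.0, "dog": 520.0, "quixotic": 820.0, "zephyr": 860.0, "ocean": 560.0}
n_shared, r = shared_word_correlation(crowd, lab)
print(n_shared, round(r, 3))
```

A uniform slowdown of the crowdsourced times would not by itself lower the correlation. Pearson's r is unaffected by adding a constant to one of the variables.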
Subject(s)
Crowdsourcing, Decision Making, Humans, Reaction Time, Recognition (Psychology), Vocabulary
ABSTRACT
Vocabulary size seems to be affected by multiple factors, including properties of the words themselves and characteristics of the individuals assessing them. In this study, we present results from a crowdsourced lexical decision megastudy in which more than 150,000 native speakers from around 20 Spanish-speaking countries performed a lexical decision task on 70 target words selected from a list of about 45,000 Spanish words. We examined how demographic characteristics such as age, education level, and multilingualism affected participants' vocabulary size. We also explored how common word-level factors such as frequency, length, and orthographic neighbourhood influenced knowledge of a particular item. The results indicated that age contributed substantially to overall vocabulary size, with vocabulary size increasing logarithmically with age. Furthermore, a contrast between monolingual and bilingual communities within Spain revealed no significant differences in vocabulary size between the communities. Additionally, we replicated the standard effects of the words' properties and their interactions, which accurately accounted for the estimated knowledge of a particular word. These results highlight the value of crowdsourced approaches for uncovering effects that are traditionally masked in small-sample, in-lab factorial experimental designs.
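The logarithmic relation between age and vocabulary size can be illustrated with an ordinary least-squares fit of the form size = a + b·ln(age). This is a sketch under stated assumptions. The model form is a common way to express logarithmic growth, but it is not necessarily the exact model used in the study. The age and size values below are hypothetical and were generated from a known curve so that the fit can be checked.

```python
from math import log


def fit_log_growth(ages: list[float], sizes: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of size = a + b * ln(age); returns (a, b)."""
    xs = [log(age) for age in ages]
    n = len(xs)
    mx, my = sum(xs) / n, sum(sizes) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, sizes)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


# Hypothetical data: estimated percentage of words known, by participant age.
ages = [18, 25, 35, 45, 55, 65]
sizes = [-20.0 + 25.0 * log(age) for age in ages]  # generated from a known curve
a, b = fit_log_growth(ages, sizes)

# Under this model, every doubling of age adds the same amount, b * ln(2).
gain_per_doubling = b * log(2)
print(round(a, 2), round(b, 2), round(gain_per_doubling, 2))
```

Under this form, vocabulary grows by a constant amount per doubling of age. Absolute gains per year therefore shrink as people get older.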
Subject(s)
Crowdsourcing, Multilingualism, Reading, Humans, Reaction Time, Spain, Vocabulary
ABSTRACT
We present word prevalence data for 61,858 English words. Word prevalence refers to the number of people who know the word. The measure was obtained on the basis of an online crowdsourcing study involving over 220,000 people. Word prevalence data are useful for gauging the difficulty of words and, as such, for matching stimulus materials in experimental conditions or selecting stimulus materials for vocabulary tests. Word prevalence also predicts word processing times, over and above the effects of word frequency, word length, similarity to other words, and age of acquisition, in line with previous findings in the Dutch language.
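Word prevalence as defined above can be computed from yes/no vocabulary-test responses as the proportion of participants who report knowing each word. A minimal sketch follows. The probit (z-score) transform at the end is an assumption. It is a common choice for proportions because it spreads out values near 0 and 1. The clipping bound `eps` and the response data are hypothetical.

```python
from collections import defaultdict
from statistics import NormalDist


def prevalence(responses, eps=0.005):
    """responses: iterable of (word, knows) pairs, one per participant judgment.

    Returns {word: (proportion_known, probit_prevalence)}."""
    known = defaultdict(int)
    total = defaultdict(int)
    for word, knows in responses:
        total[word] += 1
        known[word] += int(knows)
    result = {}
    for word, n in total.items():
        p = known[word] / n
        # Clip so that words known by everyone (or by no one) get finite z-scores.
        p_clipped = min(max(p, eps), 1 - eps)
        result[word] = (p, NormalDist().inv_cdf(p_clipped))
    return result


# Hypothetical responses: everyone knows "dog"; 3 of 10 participants know "gloaming".
responses = [("dog", True)] * 10 + [("gloaming", True)] * 3 + [("gloaming", False)] * 7
scores = prevalence(responses)
print(scores)
```

The resulting scores could then be entered as a predictor alongside word frequency, word length, similarity to other words, and age of acquisition.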
Subject(s)
Knowledge, Vocabulary, Adult, Crowdsourcing, Female, Humans, Language Tests
ABSTRACT
The correspondence between the meaning extracted from written input and that extracted from spoken input remains to be fully understood neurobiologically. Here, in a total of 38 subjects, the functional anatomy of cross-modal semantic similarity for concrete words was determined using a dual criterion. First, a voxelwise univariate analysis had to show significant activation during a semantic task (property verification) performed with written and spoken concrete words, compared with the perceptually matched control condition. Second, within these clusters and in an independent dataset, the similarity in fMRI response pattern to two distinct entities, one presented as a written word and the other as a spoken word, had to correlate with the similarity in meaning between these entities. The left ventral occipitotemporal transition zone and ventromedial temporal cortex, retrosplenial cortex, pars orbitalis bilaterally, and the left pars triangularis were all activated in the univariate contrast. Only the left pars triangularis showed a cross-modal semantic similarity effect. There was no effect of either phonological or orthographic similarity in this region. The cross-modal semantic similarity effect was confirmed by a secondary analysis in cytoarchitectonically defined BA45. A semantic similarity effect was also present in the ventral occipital regions, but only within the visual modality, and in the anterior superior temporal cortex, but only within the auditory modality. This study provides direct evidence for the coding of word meaning in BA45 and positions its contribution to semantic processing at the confluence of input-modality-specific pathways that code for meaning within their respective input modalities.
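The second criterion is a cross-modal representational similarity test, and it can be sketched in a few lines. This is an illustrative sketch, not the authors' pipeline. It assumes one activity pattern (a list of voxel values) per entity and modality within a cluster, and it uses Pearson correlation for pattern similarity. It compares only written and spoken presentations of distinct entities (i ≠ j), so an effect cannot arise from the same word being presented twice. The function names and toy values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def cross_modal_similarity_effect(written, spoken, semantic_sim):
    """written[i], spoken[i]: voxel patterns evoked by entity i in each modality.
    semantic_sim[i][j]: similarity in meaning between entities i and j.

    Correlates pattern similarity with semantic similarity across all
    (written i, spoken j) pairs with i != j."""
    neural, semantic = [], []
    for i, w_pattern in enumerate(written):
        for j, s_pattern in enumerate(spoken):
            if i == j:
                continue  # same entity in both modalities: excluded
            neural.append(pearson(w_pattern, s_pattern))
            semantic.append(semantic_sim[i][j])
    return pearson(neural, semantic)


# Hypothetical 4-voxel patterns for three entities; entities 0 and 1 are close in meaning.
written = [[1, 2, 3, 4], [1, 2, 3, 5], [4, 3, 2, 1]]
spoken = [[1, 2, 4, 4], [1, 3, 3, 4], [4, 2, 3, 1]]
semantic_sim = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
effect = cross_modal_similarity_effect(written, spoken, semantic_sim)
print(round(effect, 3))
```

With real data, the coefficient would normally be tested against a null distribution rather than interpreted on its own. One way to build that distribution is to permute entity labels.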
Subject(s)
Broca Area/physiology, Visual Pattern Recognition/physiology, Semantics, Speech Perception/physiology, Adolescent, Adult, Brain Mapping/methods, Female, Humans, Computer-Assisted Image Interpretation, Magnetic Resonance Imaging, Male, Young Adult
ABSTRACT
We present SUBTLEX-PL, Polish word frequencies based on movie subtitles. In two lexical decision experiments, we compare the new measures with frequency estimates derived from another Polish text corpus consisting predominantly of written materials. We show that the frequencies derived from the two corpora perform best in predicting human performance in a lexical decision task when used in a complementary way. Our results suggest that the two corpora may differ in their potential to explain human performance for words in different frequency ranges, and that corpora based on written materials severely overestimate frequencies for formal words. We discuss some implications of these findings for future studies comparing different frequency estimates. In addition to word-form frequencies, SUBTLEX-PL includes measures of contextual diversity, part-of-speech-specific word frequencies, frequencies of associated lemmas, and word bigrams, providing researchers with the tools needed to conduct psycholinguistic research in Polish. The database is freely available for research purposes and may be downloaded from the authors' university Web site at http://crr.ugent.be/subtlex-pl.
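Three of the basic measures listed above are word-form frequency, contextual diversity, and word bigrams. They can be derived from a subtitle corpus roughly as follows. This sketch assumes each subtitle file has already been tokenized into lowercase word forms. It omits the handling of punctuation, casing, and part-of-speech tagging that a real pipeline would need. Contextual diversity is taken here to be the number of files in which a word occurs. The per-million normalization is a common convention, assumed here rather than taken from the database documentation.

```python
from collections import Counter


def corpus_measures(files):
    """files: list of token lists, one per subtitle file (already tokenized, lowercased).

    Returns (frequency, contextual_diversity, bigrams, total_tokens)."""
    frequency = Counter()             # occurrences of each word form
    contextual_diversity = Counter()  # number of files containing each word form
    bigrams = Counter()               # adjacent word pairs within a file
    total_tokens = 0
    for tokens in files:
        frequency.update(tokens)
        contextual_diversity.update(set(tokens))  # each word counted once per file
        bigrams.update(zip(tokens, tokens[1:]))
        total_tokens += len(tokens)
    return frequency, contextual_diversity, bigrams, total_tokens


def per_million(count, total_tokens):
    """Normalize a raw count to occurrences per million tokens."""
    return count * 1_000_000 / total_tokens


# Hypothetical toy corpus of three tokenized subtitle files.
files = [
    ["nie", "wiem", "nie", "wiem"],
    ["wiem", "co", "robisz"],
    ["co", "to", "jest"],
]
freq, cd, bigrams, total = corpus_measures(files)
print(freq["nie"], cd["nie"], freq["co"], cd["co"], per_million(freq["wiem"], total))
```

In this toy corpus, "nie" and "co" both occur twice. However, "nie" is confined to a single file, so its contextual diversity is 1 rather than 2, which shows how the two measures can diverge.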
Subject(s)
Verbal Behavior, Vocabulary, Writing, Behavioral Research/methods, Factual Databases, Humans, Poland, Psycholinguistics/methods, Speech
ABSTRACT
The SUBTLEX-US corpus has been parsed with the CLAWS tagger, so that researchers have information about the possible word classes (parts-of-speech, or PoSs) of the entries. Five new columns have been added to the SUBTLEX-US word frequency list: the dominant (most frequent) PoS for the entry, the frequency of the dominant PoS, the frequency of the dominant PoS relative to the entry's total frequency, all PoSs observed for the entry, and the respective frequencies of these PoSs. Because the current definition of lemma frequency does not seem to provide word recognition researchers with useful information (as illustrated by a comparison of the lemma frequencies and the word form frequencies from the Corpus of Contemporary American English), we have not provided a column with this variable. Instead, we hope that the full list of PoS frequencies will help researchers to collectively determine which combination of frequencies is the most informative.
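The five added columns can be derived from per-entry PoS counts produced by a tagger. The sketch below assumes the tagger output has already been aggregated into a mapping from PoS label to count for each entry. The column names, the "." separator, and the toy counts for "play" are hypothetical and do not reproduce the actual SUBTLEX-US headers. Ties for the dominant PoS are broken alphabetically, which is an arbitrary choice.

```python
def pos_columns(pos_counts: dict[str, int]) -> dict[str, object]:
    """Derive the five PoS columns for one entry from {pos_label: frequency}."""
    total = sum(pos_counts.values())
    # Sort by descending frequency; alphabetical order breaks ties deterministically.
    ordered = sorted(pos_counts.items(), key=lambda item: (-item[1], item[0]))
    dominant, dominant_freq = ordered[0]
    return {
        "dom_pos": dominant,                                    # most frequent PoS
        "dom_pos_freq": dominant_freq,                          # its frequency
        "dom_pos_share": dominant_freq / total,                 # relative to the entry's total
        "all_pos": ".".join(pos for pos, _ in ordered),         # all observed PoSs
        "all_pos_freqs": ".".join(str(c) for _, c in ordered),  # their frequencies
    }


# Hypothetical tagger counts for the entry "play".
row = pos_columns({"Verb": 12000, "Noun": 8000, "Name": 15})
print(row)
```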
Subject(s)
Algorithms, Language, Vocabulary, Humans, Psycholinguistics/methods, Terminology as Topic, Word Processing
ABSTRACT
We present a new database of lexical decision times for English words and nonwords, in which two groups of British participants each responded to 14,365 monosyllabic and disyllabic words and the same number of nonwords, for a total duration of 16 h (divided over multiple sessions). This database, called the British Lexicon Project (BLP), fills an important gap between the Dutch Lexicon Project (DLP; Keuleers, Diependaele, & Brysbaert, Frontiers in Language Sciences. Psychology, 1, 174, 2010) and the English Lexicon Project (ELP; Balota et al., 2007), because it applies the repeated measures design of the DLP to the English language. The high correlation between the BLP and ELP data indicates that a high percentage of the variance in lexical decision data sets is systematic variance rather than noise, and that the results of megastudies are fairly robust with respect to the selection and presentation of the stimuli. Because of its design, the BLP allows the same analyses as the DLP, offering researchers an interesting new data set of word-processing times for mixed effects analyses and mathematical modeling. The BLP data are available at http://crr.ugent.be/blp and as Electronic Supplementary Materials.
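A split-half reliability is one standard way to quantify how much item-level variance in such a data set is systematic rather than noise. Participants are split into two random halves, the per-word mean RTs of the two halves are correlated, and the correlation is corrected with the Spearman-Brown formula. The sketch below illustrates that generic technique. It is not necessarily the analysis reported for the BLP. It assumes complete data, meaning every participant responded to every word. The data layout (a list of per-participant dictionaries from word to RT) and the simulated numbers are hypothetical.

```python
import random
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def split_half_reliability(participants, seed=0):
    """participants: list of {word: rt_ms} dicts, each covering the same words.

    Splits participants randomly into two halves, correlates the per-word mean
    RTs of the halves, and applies the Spearman-Brown correction."""
    order = list(range(len(participants)))
    random.Random(seed).shuffle(order)
    halves = (order[: len(order) // 2], order[len(order) // 2 :])
    words = sorted(participants[0])
    item_means = [[mean(participants[p][w] for p in half) for w in words] for half in halves]
    r = pearson(item_means[0], item_means[1])
    return 2 * r / (1 + r)  # Spearman-Brown: reliability of the full sample


# Hypothetical data: 20 words of graded difficulty, 10 participants who differ
# in overall speed, plus random trial-level noise.
rng = random.Random(1)
difficulty = {f"w{i:02d}": 500 + 15 * i for i in range(20)}
participants = [
    {w: d + 40 * s + rng.gauss(0, 20) for w, d in difficulty.items()}
    for s in range(10)
]
reliability = split_half_reliability(participants)
print(round(reliability, 3))
```

A corrected reliability close to 1 means that most of the item-level variance would be expected to replicate in a new sample of participants. Differences in overall participant speed do not lower it, because they shift all item means in a half by the same amount.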
Subject(s)
Decision Making, Language, Vocabulary, Humans, Reaction Time
ABSTRACT
Emotions play a fundamental role in language learning, use, and processing. Words denoting positivity account for a larger part of the lexicon than words denoting negativity, and they also tend to be used more frequently, a phenomenon known as positivity bias. However, language experience changes over an individual's lifetime, making the examination of the emotion-laden lexicon an important topic not only across the life span but also across languages. Furthermore, existing theories predict a range of different age-related trajectories in processing valenced words. The present study pits all of these predictions against written productions (Facebook status updates from over 20,000 users) and behavioral data from three publicly available megastudies on different languages, namely English, Dutch, and Spanish, across adulthood. The production data demonstrated an increase in positive word types and tokens with advancing age. In terms of comprehension, the results showed a uniform and consistent effect of valence across languages and cohorts based on data from a visual word recognition task. The difference in reaction times to very positive and very negative words declined with age, with responses to positive words slowing down more strongly with age than responses to negative words. We argue that the results stem from lifelong learning and emotion regulation: Advancing age is accompanied by an increased type frequency of positive words in language production, which is mirrored as a discrimination penalty in comprehension. To our knowledge, this is the first study to simultaneously target both language production and comprehension across adulthood and in a cross-linguistic perspective. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
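The production analysis distinguishes positive word types (distinct words) from positive word tokens (occurrences). A small counting sketch makes that distinction concrete. It assumes a hypothetical valence lexicon on a 1 to 9 scale. Words rated at or above an arbitrary threshold of 6 are treated as positive. Neither the lexicon nor the threshold comes from the study.

```python
from collections import Counter


def positive_types_and_tokens(posts, valence, threshold=6.0):
    """posts: status updates (strings) from one age group.
    valence: {word: rating}; words rated at or above `threshold` count as positive.

    Returns (distinct positive words used, total positive word occurrences)."""
    counts = Counter()
    for post in posts:
        for token in post.lower().split():
            word = token.strip(".,!?;:\"'")
            if valence.get(word, 0.0) >= threshold:
                counts[word] += 1
    return len(counts), sum(counts.values())


# Hypothetical lexicon (1-9 scale) and posts.
valence = {"happy": 8.2, "love": 8.0, "nice": 7.1, "day": 5.6, "sad": 2.1}
posts = ["Happy day!", "Love love love this", "Sad news today.", "Nice and happy"]
types, tokens = positive_types_and_tokens(posts, valence)
print(types, tokens)
```

Here three positive types account for six tokens, because "love" and "happy" are reused. Comparing both counts across age groups, normalized by the total number of words written, separates using more distinct positive words from reusing the same few more often.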
Subject(s)
Affect, Aging, Attitude, Comprehension, Emotions, Language, Learning, Adult, Aged, Aged 80 and over, Ethnicity, Female, Humans, Language Tests, Male, Middle Aged, Reaction Time, Social Media
ABSTRACT
Pseudowords play an important role in psycholinguistic experiments, either because they are required for performing tasks, such as lexical decision, or because they are the main focus of interest, such as in nonword-reading and nonce-inflection studies. We present a pseudoword generator that improves on current methods. It allows for the generation of written polysyllabic pseudowords that obey a given language's phonotactic constraints. Given a word or nonword template, the algorithm can quickly generate pseudowords that match the template in subsyllabic structure and transition frequencies without having to search through a list with all possible candidates. Currently, the program is available for Dutch, English, German, French, Spanish, Serbian, and Basque, and, with little effort, it can be expanded to other languages.
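The core idea is to generate pseudowords that keep a template's subsyllabic structure while using only attested transitions. A much-simplified sketch illustrates it. This is not the published algorithm. It segments words orthographically into alternating consonant and vowel clusters, a crude stand-in for onsets, nuclei, and codas. It then records which segment follows which at each position in a small hypothetical lexicon. Finally, it replaces exactly one segment of the template with an alternative that is attested at that position between the same neighbours. The lexicon, the vowel set, and the function names are assumptions for illustration.

```python
import re
from collections import defaultdict

VOWELS = "aeiouy"  # assumption: orthographic vowels for this toy segmentation


def segments(word: str) -> list[str]:
    """Split a word into alternating consonant and vowel clusters."""
    return re.findall(f"[{VOWELS}]+|[^{VOWELS}]+", word)


def build_transitions(lexicon: list[str]) -> defaultdict:
    """Map (position, previous segment) -> set of segments attested next."""
    table = defaultdict(set)
    for word in lexicon:
        segs = ["^"] + segments(word) + ["$"]  # word-boundary markers
        for pos in range(1, len(segs)):
            table[(pos, segs[pos - 1])].add(segs[pos])
    return table


def pseudowords(template: str, lexicon: list[str]) -> list[str]:
    """Nonwords that differ from the template in exactly one segment, where the
    transitions into and out of the new segment are attested at that position."""
    table = build_transitions(lexicon)
    segs = ["^"] + segments(template) + ["$"]
    known = set(lexicon)
    results = set()
    for k in range(1, len(segs) - 1):
        for candidate in table[(k, segs[k - 1])]:
            if candidate in ("$", segs[k]):
                continue
            if segs[k + 1] not in table[(k + 1, candidate)]:
                continue  # the following transition must also be attested
            new = "".join(segs[1:k] + [candidate] + segs[k + 1 : -1])
            if new not in known:
                results.add(new)
    return sorted(results)


lexicon = ["basket", "market", "garden", "carpet", "button"]
print(pseudowords("garden", lexicon))
```

The generator described above also matches transition frequencies, which this toy version ignores. It considers only whether a transition is attested at all.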
Subject(s)
Algorithms, Psycholinguistics/methods, Software, Decision Making, Humans, Language
ABSTRACT
We present a new database of Dutch word frequencies based on film and television subtitles, and we validate it with a lexical decision study involving 14,000 monosyllabic and disyllabic Dutch words. The new SUBTLEX frequencies explain up to 10% more variance in the accuracies and reaction times (RTs) of the lexical decision task than the existing CELEX word frequency norms, which are based largely on edited texts. As is the case for English, an accessibility measure based on contextual diversity explains more of the variance in accuracy and RT than do raw frequency-of-occurrence counts. The database is freely available for research purposes and may be downloaded from the authors' university site at http://crr.ugent.be/subtlex-nl or from http://brm.psychonomic-journals.org/content/supplemental.
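Comparing raw frequency with contextual diversity amounts to asking which log-transformed predictor explains more variance in lexical decision RTs. A minimal sketch with simple linear regression follows. The log10(count + 1) transform is a common convention assumed here. The per-word counts and RTs are hypothetical and were constructed so that contextual diversity fits better. They show the mechanics only, not the study's result.

```python
from math import log10


def r_squared(xs, ys):
    """Proportion of variance in ys explained by a simple linear regression on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Hypothetical words: (raw frequency, contextual diversity, mean RT in ms).
data = [
    (5000, 4000, 520),
    (1200, 1100, 548),
    (900, 60, 611),   # frequent, but concentrated in few films
    (300, 250, 580),
    (80, 70, 607),
    (20, 15, 640),
]
log_freq = [log10(f + 1) for f, _, _ in data]
log_cd = [log10(cd + 1) for _, cd, _ in data]
rts = [rt for _, _, rt in data]
r2_freq = r_squared(log_freq, rts)
r2_cd = r_squared(log_cd, rts)
print(round(r2_freq, 3), round(r2_cd, 3))
```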
Subject(s)
Language, Motion Pictures, Factual Databases, Decision Making/physiology, Female, Humans, Male, Netherlands, Psychomotor Performance/physiology, Reaction Time/physiology, Reproducibility of Results, Television, Young Adult
ABSTRACT
The French Lexicon Project involved the collection of lexical decision data for 38,840 French words and the same number of nonwords. It was directly inspired by the English Lexicon Project (Balota et al., 2007) and produced frequency and word length effects very similar to those of the English project. The present article describes the methods used to collect the data, reports analyses of the word frequency and word length effects, and describes the Excel files that make the data freely available for research purposes. The word and pseudoword data from this article may be downloaded from http://brm.psychonomic-journals.org/content/supplemental.
Subject(s)
Factual Databases, Psycholinguistics/methods, Vocabulary, Adolescent, Adult, Decision Making, Female, France, Humans, Internet, Language, Male, Reaction Time, Software, Verbal Behavior
ABSTRACT
We present a new database of Dutch word recognition times for a total of 54 thousand words, called the Dutch Crowdsourcing Project. The data were collected with an internet vocabulary test. The database is limited to native Dutch speakers. Participants were asked to indicate which words they knew. Their response times were registered, even though the participants were not asked to respond as fast as possible. Still, for shared words, the response times correlate at around .7 with those of the Dutch Lexicon Projects. In addition, the results of virtual experiments indicate that the new response times are a valid addition to the Dutch Lexicon Projects. This means not only that we have useful response times for some 20 thousand extra words, but also that we now have data on differences in response latencies as a function of education and age. The new data correspond better to word use in the Netherlands.
ABSTRACT
We monitored the progress of 40 children as they first started to acquire a second language (L2) implicitly through immersion. Using a longitudinal design, we tested them before they had any knowledge of an L2 (Time 0) and after 1 school year of L2 exposure (Time 1) to determine whether cognitive abilities can predict the success of L2 learning. The task battery included measures of intelligence, cognitive control, and language skills. Initial scores on measures of inhibitory control seemed predictive of L2 Dutch vocabulary acquisition. At the same time, progress in IQ, inhibitory control, attentional shifting, and working memory was also identified as a contributing factor, suggesting a more intricate relationship between cognitive abilities and L2 learning than previously assumed. Furthermore, L1 development was mainly predicted by performance on inhibitory control and working memory. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
Subject(s)
Aptitude, Cognition/physiology, Learning/physiology, Multilingualism, Attention/physiology, Preschool Child, Female, Humans, Intelligence/physiology, Language, Male, Short-Term Memory/physiology, Neuropsychological Tests, Vocabulary
Subject(s)
Chromosome Inversion, Language, Learning, Mental Processes, Papio papio, Reading, Animals, Humans
ABSTRACT
According to a recent study, semantic similarity between concrete entities correlates with the similarity of activity patterns in the left middle IPS during category naming. We examined whether this effect replicates under passive viewing conditions, the potential role of visuoperceptual similarity, where the effect is situated relative to regions previously implicated in visuospatial attention, and how it compares with effects of object identity and location. Forty-six subjects participated and passively viewed pictures from two categories, musical instruments and vehicles. Semantic similarity between entities was estimated from a concept-feature matrix obtained in more than 1,000 subjects. Visuoperceptual similarity was modeled in three ways: with the HMAX model, with the AlexNet deep convolutional neural network, and with subjective visuoperceptual similarity ratings. Among the IPS regions examined, only the left middle IPS showed a semantic similarity effect. The effect was significant in hIP1, hIP2, and hIP3. Visuoperceptual similarity did not correlate with the similarity of activity patterns in the left middle IPS. The semantic similarity effect in the left middle IPS was significantly stronger than in the right middle IPS and also stronger than in the left or right posterior IPS. The semantic similarity effect was similar to that seen in the angular gyrus. Object identity effects were much more widespread, extending across nearly all parietal areas examined. Location effects were relatively specific to the posterior IPS and area 7 bilaterally. In conclusion, the current findings replicate the semantic similarity effect in the left middle IPS under passive viewing conditions and demonstrate its anatomical specificity within a cytoarchitectonic reference frame. We propose that the semantic similarity effect in the left middle IPS reflects the transient uploading of semantic representations into working memory.
ABSTRACT
Based on an analysis of the literature and a large-scale crowdsourcing experiment, we estimate that an average 20-year-old native speaker of American English knows 42,000 lemmas and 4,200 non-transparent multiword expressions, derived from 11,100 word families. The numbers range from 27,000 lemmas for the lowest 5% to 52,000 for the highest 5%. Between the ages of 20 and 60, the average person learns 6,000 extra lemmas, or about one new lemma every 2 days. Knowledge of a word can be as shallow as knowing that it exists. In addition, people learn tens of thousands of inflected forms and proper nouns (names), which account for the much higher numbers of 'words known' reported in other publications.
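The "one new lemma every 2 days" figure follows from simple arithmetic on the numbers above, ignoring leap days:

```latex
\frac{(60 - 20)\ \text{years} \times 365\ \text{days/year}}{6{,}000\ \text{lemmas}}
  = \frac{14{,}600\ \text{days}}{6{,}000\ \text{lemmas}}
  \approx 2.4\ \text{days per lemma}
```

This is equivalent to 6,000 / 40 = 150 new lemmas per year, so "every 2 days" is a rounded figure.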
ABSTRACT
Keuleers, Stevens, Mandera, and Brysbaert (2015) presented a new variable, word prevalence, defined as word knowledge in the population. Some words are known to more people than others. This is particularly true for low-frequency words (e.g., screenshot vs. scourage). In the present study, we examined the impact of this measure by collecting lexical decision times for 30,000 Dutch word lemmas of various lengths (the Dutch Lexicon Project 2). Word prevalence had the second-highest correlation with lexical decision times, after word frequency. Words known by everyone in the population were responded to 100 ms faster than words known to only half of the population, even after controlling for word frequency, word length, age of acquisition, similarity to other words, and concreteness. Because word prevalence correlates rather weakly with the existing measures (including word frequency), the unique variance it contributes to lexical decision times is higher than that of the other variables. We consider the reasons why word prevalence has an impact on word processing times, and we argue that it is likely to be the most important new variable protecting researchers against experimenter bias in selecting stimulus materials.
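The "unique variance" contributed by a predictor is the drop in explained variance (R²) when that predictor is removed from a multiple regression containing all predictors. The sketch below shows that computation, using ordinary least squares solved through the normal equations. The two predictors and the simulated RTs are hypothetical. A real analysis would include the full set of covariates (frequency, length, age of acquisition, similarity to other words, concreteness).

```python
import random


def ols_r2(columns, y):
    """R^2 of an ordinary least-squares regression of y on the predictor columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def unique_r2(columns, y, index):
    """Drop in R^2 when predictor `index` is removed from the full model."""
    reduced = [col for i, col in enumerate(columns) if i != index]
    return ols_r2(columns, y) - ols_r2(reduced, y)


# Hypothetical data: 200 words, RT depends on log frequency and prevalence plus noise.
rng = random.Random(42)
log_freq = [rng.uniform(0, 5) for _ in range(200)]
prevalence = [rng.uniform(-1, 2.5) for _ in range(200)]
rt = [800 - 30 * f - 40 * p + rng.gauss(0, 25) for f, p in zip(log_freq, prevalence)]
predictors = [log_freq, prevalence]
unique_freq = unique_r2(predictors, rt, 0)
unique_prev = unique_r2(predictors, rt, 1)
print(round(unique_freq, 3), round(unique_prev, 3))
```

A predictor that is only weakly correlated with the others loses little of its explained variance to them, so its unique R² stays close to its total contribution.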
Subject(s)
Psycholinguistics, Recognition (Psychology), Semantics, Vocabulary, Adolescent, Adult, Aged, Decision Making, Female, Humans, Male, Middle Aged, Netherlands, Prevalence, Reaction Time, Reading, Young Adult
ABSTRACT
This paper introduces and summarizes the special issue on megastudies, crowdsourcing, and large datasets in psycholinguistics. We provide a brief historical overview and show how the papers in this issue have extended the field by compiling new databases and making important theoretical contributions. In addition, we discuss several studies that use text corpora to build distributional semantic models to tackle various interesting problems in psycholinguistics. Finally, we highlight some methodological issues that the analyses of such datasets bring to light across the papers.
Subject(s)
Crowdsourcing, Psycholinguistics/methods, Psycholinguistics/statistics & numerical data, Psycholinguistics/trends, Crowdsourcing/statistics & numerical data, Crowdsourcing/trends, 20th Century History, 21st Century History, Humans, Psycholinguistics/history, Semantics
ABSTRACT
Subjective ratings for age of acquisition, concreteness, affective valence, and many other variables are an important element of psycholinguistic research. However, even for well-studied languages, ratings usually cover just a small part of the vocabulary. A possible solution involves using corpora to build a semantic similarity space and applying machine learning techniques to extrapolate existing ratings to previously unrated words. We conduct a systematic comparison of two extrapolation techniques, k-nearest neighbours and random forest, in combination with semantic spaces built using latent semantic analysis, a topic model, a hyperspace analogue to language (HAL)-like model, and a skip-gram model. A variant of the k-nearest neighbours method used with skip-gram word vectors gives the most accurate predictions, but the random forest method has the advantage of being able to incorporate additional predictors easily. We evaluate the usefulness of the methods in two ways. First, we explore how much of the human performance in a lexical decision task can be explained by extrapolated ratings for age of acquisition. Second, we examine how precisely words can be assigned to discrete categories based on extrapolated ratings. We find that at least some of the extrapolation methods may introduce artefacts into the data and produce results that could lead to conclusions different from those that would be reached based on the human ratings. From a practical point of view, the usefulness of ratings extrapolated with the described methods may be limited.
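The k-nearest-neighbours extrapolation compared above works as follows. For an unrated word, the method finds the k rated words whose vectors are most similar to it (cosine similarity) and averages their ratings. This is a generic version of the technique, not the specific variant that performed best in the study. The tiny 3-dimensional vectors, the age-of-acquisition ratings, and k = 2 are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def knn_extrapolate(target, vectors, ratings, k=2):
    """Predict a rating for `target` as the mean rating of its k most similar rated words.

    Returns (prediction, list of neighbours used)."""
    neighbours = sorted(
        (w for w in ratings if w != target and w in vectors),
        key=lambda w: cosine(vectors[target], vectors[w]),
        reverse=True,
    )[:k]
    return sum(ratings[w] for w in neighbours) / len(neighbours), neighbours


vectors = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.1],
    "puppy": [0.85, 0.15, 0.05],
    "theorem": [0.0, 0.2, 0.9],
    "lemma": [0.05, 0.1, 0.95],
}
aoa = {"dog": 3.0, "cat": 3.2, "theorem": 12.5, "lemma": 13.0}  # ages in years
prediction, used = knn_extrapolate("puppy", vectors, aoa)
print(round(prediction, 2), used)
```

The prediction is an average of neighbours' ratings, so extrapolated values tend to be pulled toward the middle of the rated range. This is one plausible source of the artefacts mentioned in the abstract.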