ABSTRACT
Like biological species, words in language must compete to survive. Previously, it has been shown that language changes in response to cognitive constraints and over time becomes more learnable. Here, we use two complementary research paradigms to demonstrate how the survival of existing word forms can be predicted by psycholinguistic properties that impact language production. In the first study, we analyzed the survival of words in the context of interpersonal communication. We analyzed data from a large-scale serial-reproduction experiment in which stories were passed down along a transmission chain over multiple participants. The results show that words that are acquired earlier in life, more concrete, more arousing, and more emotional are more likely to survive retellings. We reason that the same trend might scale up to language evolution over multiple generations of natural language users. If that is the case, the same set of psycholinguistic properties should also account for the change of word frequency in natural language corpora over historical time. That is what we found in two large historical-language corpora (Study 2): Early acquisition, concreteness, and high arousal all predict increasing word frequency over the past 200 y. However, the two studies diverge with respect to the impact of word valence and word length, which we take up in the discussion. By bridging micro-level behavioral preferences and macro-level language patterns, our investigation sheds light on the cognitive mechanisms underlying word competition.
Subjects
Language , Psycholinguistics , Humans , Emotions/physiology , Arousal/physiology , Cognition
ABSTRACT
There is substantial evidence that children's apparent omissions of grammatical morphemes in utterances such as "She play tennis" and "Mummy eating" are in fact errors of commission in which contextually licensed unmarked forms encountered in the input are reproduced in a context-blind fashion. So how do children stop making such errors? In this study, we test the assumption that children's ability to recover from error is related to their developing sensitivity to longer-range dependencies. We use a pre-registered corpus analysis to explore the predictive value of different cues with regard to children's verb-marking errors and observe a developmental pattern consistent with this account. We look at context-independent cues (the identity of the specific verb being used) and at the relative value of context-dependent cues (the identity of the specific subject+verb sequence being used). We find that the only consistent effect across a group of 2- to 3-year-olds and a group of 3- to 4-year-olds is the relative frequency of unmarked forms of specific subject+verb sequences being used. The relative frequency of unmarked forms of the verb alone is predictive only in the younger age group. This is consistent with an account in which children recover from making errors by becoming progressively more sensitive to context, at first the immediately preceding lexical contexts (e.g., the subject that precedes the verb) and eventually more distant grammatical markers (e.g., the fronted auxiliary that precedes the subject in questions).
RESEARCH HIGHLIGHTS: We provide a corpus analysis investigating input effects on young children's verb-marking errors (e.g., Mummy go) across development (between 2 and 4 years of age). We find evidence that these apparent errors of omission are in fact input-driven errors of commission that persist into the third year of life. We compare the relative effect on error rates of context-independent (e.g., verb) and context-dependent (e.g., subject+verb sequence) cues across developmental time. Our findings support the proposal that children recover from making verb-marking errors by becoming progressively more sensitive to preceding context.
Subjects
Child Language , Cues , Language Development , Humans , Child, Preschool , Male , Female , Age Factors
ABSTRACT
We investigate Korean-speaking children's knowledge about clause-level constructions involving a transitive event - active transitive and suffixal passive - through corpus analysis and Bayesian modelling. The analysis of Korean caregiver input and children's production in CHILDES revealed that the rates of constructional patterns produced by the children mirrored those uttered by the caregivers to a considerable degree and that the caregivers' use of case-marking was skewed towards single form-function pairings (despite the multiple form-function associations that the markers manifest). Based on these characteristics, we modelled a Bayesian learner by employing construction-based input (without considering lexical information). This simulation revealed the dominance of several constructional patterns, occupying most of the input, and their inhibitory effects on the development of the other patterns. Our findings illuminate how children shape clause-level constructional knowledge in Korean, an understudied language for this topic, as a function of input properties and domain-general learning capacities, appealing to the usage-based constructionist approach.
Subjects
Language Development , Language , Humans , Child , Bayes Theorem , Child Language , Republic of Korea
ABSTRACT
This interdisciplinary study examined the structure of humor creation in the specific context of efforts to positively reappraise stressful situations for effective coping. In a sample of n = 101 participants, a performance test was used to assess the quantity (fluency, number of generated ideas that qualified as humor) and quality (rated funniness) of humor creation in cognitive reappraisal. Linguistic mechanisms were identified and quantified using cognitive-linguistic methods of corpus analysis, and their employment was correlated with humor production performance on the level of the individual. Almost all individuals were able to come up with reappraisal ideas that qualified as humorous. Depressive symptoms, a negative mood state, and high perceptions of threat did not compromise the participants' capability to create humor. Individuals who were more serious-minded as a trait produced ideas that were rated as less funny, but their basic ability to create humor was unaffected. Metonymy (a contiguity-based principle of meaning extension) emerged as by far the most prominent semantic mechanism in the creation of humorous re-interpretations. Furthermore, its use was related to good humor creation performance in terms of quantity and quality, which is in line with its assumed importance in the extension of meaning in general and the creation of humor in particular. Further effective linguistic mechanisms and conceptual phenomena were identified. The empirical data may be valuable for the development of interventions involving the creation of humorous ideas for cognitive reappraisal.
ABSTRACT
As written language contains more complex syntax than spoken language, exposure to written language provides opportunities for children to experience language input different from everyday speech. We investigated the distribution and nature of relative clauses in three large developmental corpora: one of child-directed speech (targeted at pre-schoolers) and two of text written for children - namely, picture books targeted at pre-schoolers for shared reading and children's own reading books. Relative clauses were more common in both types of book language. Within text, relative clause usage increased with intended age, and was more frequent in nonfiction than fiction. The types of relative clause structures in text co-occurred with specific lexical properties, such as noun animacy and pronoun use. Book language provides unique access to grammar not easily encountered in speech. This has implications for the distributional lexical-syntactic features and associated discourse functions that children experience and, from this, consequences for language development.
ABSTRACT
This report introduces the Beijing Sentence Corpus (BSC). This is a Chinese sentence corpus of eye-tracking data with relatively clear word boundaries. In addition, we report predictability norms for each word in the corpus. Eye movement corpora are available in alphabetic scripts such as English, German, and French. However, there is no publicly available corpus for Chinese. Thus, to study predictive processes during reading in Chinese, it is necessary to establish such a corpus. Also, given the clear word boundaries in the sentences, BSC is especially useful to provide evidence relevant to the theoretical debate of saccade target selection in Chinese. With the large-scale predictability norms, we conducted new analyses based on 60 BSC readers, testing the influences of launch word and target word properties while controlling for visual and oculomotor constraints, as well as sentence and subject-level individual differences. We discuss implications for guidance of eye movements in Chinese reading.
Subjects
Eye Movements , Reading , Beijing , Humans , Language , Saccades
ABSTRACT
Psycholinguistic research over the past decade has suggested that children's linguistic knowledge includes dedicated representations for frequently-encountered multiword sequences. Important evidence for this comes from studies of children's production: it has been repeatedly demonstrated that children's rate of speech errors is greater for word sequences that are infrequent and thus unfamiliar to them than for those that are frequent. In this study, we investigate whether children's knowledge of multiword sequences can explain a phenomenon that has long represented a key theoretical fault line in the study of language development: errors of subject-auxiliary non-inversion in question production (e.g., "why we can't go outside?*"). In doing so we consider a type of error that has been ignored in discussion of multiword sequences to date. Previous work has focused on errors of omission - an absence of accurate productions for infrequent phrases. However, if children make use of dedicated representations for frequent sequences of words in their productions, we might also expect to see errors of commission - the appearance of frequent phrases in children's speech even when such phrases are not appropriate. Through a series of corpus analyses, we provide the first evidence that the global input frequency of multiword sequences (e.g., "she is going" as it appears in declarative utterances) is a valuable predictor of their errorful appearance (e.g., the uninverted question "what she is going to do?*") in naturalistic speech. This finding, we argue, constitutes powerful evidence that multiword sequences can be represented as linguistic units in their own right.
Subjects
Linguistics , Speech , Child , Female , Humans , Language , Language Development , Psycholinguistics
ABSTRACT
Adjectives are essential for describing and differentiating concepts. However, they have a protracted development relative to other word classes. Here we measure three- and four-year-olds' exposure to adjectives across a range of interactive and socioeconomic contexts to: (i) measure the syntactic, semantic, and pragmatic variability of adjectives in child-directed speech (CDS); and (ii) investigate how features of the input might scaffold adjective acquisition. In our novel corpus of UK English, adjectives occurred more frequently in prenominal than in postnominal (predicative) syntactic frames, though postnominal frames were more frequent for less-familiar adjectives. They occurred much more frequently with a descriptive than a contrastive function, especially for less-familiar adjectives. Our findings present a partial mismatch between the forms of adjectives found in real-world CDS and those forms that have been shown to be more useful for learning. We discuss implications for models of adjective acquisition and for clinical practice.
Subjects
Language Development , Language , Mother-Child Relations , Speech , Books , Child Language , Child, Preschool , Family , Female , Humans , Learning , Male , Play and Playthings , Semantics , Social Class
ABSTRACT
Children learn high phonological neighbourhood density words more easily than low phonological neighbourhood density words (Storkel, 2004). However, the strength of this effect relative to alternative predictors of word acquisition is unclear. We addressed this issue using communicative inventory data from 300 British English-speaking children aged 12 to 25 months. Using Bayesian regression, we modelled word understanding and production as a function of: (i) phonological neighbourhood density, (ii) frequency, (iii) length, (iv) babiness, (v) concreteness, (vi) valence, (vii) arousal, and (viii) dominance. Phonological neighbourhood density predicted word production but not word comprehension, and this effect was stronger in younger children.
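The abstract above does not say how phonological neighbourhood density was computed. As a minimal sketch, assuming the widely used one-phoneme definition (a neighbour is any word that differs from the target by a single phoneme substitution, addition, or deletion), the measure could be counted as follows. The tiny lexicon and its transcriptions are hypothetical and serve only as illustration.

```python
def is_neighbour(a, b):
    """True if phoneme sequences a and b differ by exactly one
    substitution, addition, or deletion (assumed definition)."""
    if a == b:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    # Lengths differ by one: removing one phoneme from the longer
    # sequence must yield the shorter one (addition or deletion).
    shorter, longer = sorted((a, b), key=len)
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


def neighbourhood_density(word, lexicon):
    """Number of words in the lexicon that are neighbours of `word`."""
    target = lexicon[word]
    return sum(is_neighbour(target, other)
               for name, other in lexicon.items() if name != word)


# Hypothetical toy lexicon: each word is a tuple of phoneme symbols.
toy_lexicon = {
    "cat": ("k", "æ", "t"),
    "bat": ("b", "æ", "t"),
    "cap": ("k", "æ", "p"),
    "at": ("æ", "t"),
    "cats": ("k", "æ", "t", "s"),
    "dog": ("d", "ɒ", "g"),
}

cat_density = neighbourhood_density("cat", toy_lexicon)
dog_density = neighbourhood_density("dog", toy_lexicon)
```

In this toy lexicon "cat" has four neighbours (bat, cap, at, cats) and "dog" has none, so "cat" would count as the higher-density word.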
Subjects
Comprehension , Language Development , Vocabulary , Bayes Theorem , Child, Preschool , Female , Humans , Infant , Learning , Male , Phonetics
ABSTRACT
Children learn words through an accumulation of interactions grounded in context. Although many factors in the learning environment have been shown to contribute to word learning in individual studies, no empirical synthesis connects across factors. We introduce a new ultradense corpus of audio and video recordings of a single child's life that allows us to measure the child's experience of each word in his vocabulary. This corpus provides the first direct comparison, to our knowledge, between different predictors of the child's production of individual words. We develop a series of new measures of the distinctiveness of the spatial, temporal, and linguistic contexts in which a word appears, and show that these measures are stronger predictors of learning than frequency of use and that, unlike frequency, they play a consistent role across different syntactic categories. Our findings provide a concrete instantiation of classic ideas about the role of coherent activities in word learning and demonstrate the value of multimodal data in understanding children's language acquisition.
Subjects
Child Development/physiology , Learning/physiology , Speech Intelligibility/physiology , Child, Preschool , Female , Humans , Infant , Male
ABSTRACT
As human activity and interaction increasingly take place online, the digital residues of these activities provide a valuable window into a range of psychological and social processes. A great deal of progress has been made toward utilizing these opportunities; however, the complexity of managing and analyzing the quantities of data currently available has limited both the types of analysis used and the number of researchers able to make use of these data. Although fields such as computer science have developed a range of techniques and methods for handling these difficulties, making use of those tools has often required specialized knowledge and programming experience. The Text Analysis, Crawling, and Interpretation Tool (TACIT) is designed to bridge this gap by providing an intuitive tool and interface for making use of state-of-the-art methods in text analysis and large-scale data management. Furthermore, TACIT is implemented as an open, extensible, plugin-driven architecture, which will allow other researchers to extend and expand these capabilities as new methods become available.
Subjects
Data Mining/methods , Software , Humans
ABSTRACT
Children overgeneralise verbs to ungrammatical structures early in acquisition, but retreat from these overgeneralisations as they learn semantic verb classes. In a large corpus of English locative utterances (e.g., the woman sprayed water onto the wall/wall with water), we found structural biases which changed over development and which could explain overgeneralisation behaviour. Children and adults had similar verb classes and a correspondence analysis suggested that lexical distributional regularities in the adult input could help to explain the acquisition of these classes. A connectionist model provided an explicit account of how structural biases could be learned over development and how these biases could be reduced by learning verb classes from distributional regularities.
Subjects
Language Development , Language , Psycholinguistics , Adult , Child, Preschool , Generalization, Psychological , Humans , Semantics , Verbal Behavior
ABSTRACT
This study addresses the basic structure of playfulness in adults from a psycho-lexical approach and its relationship with the sense of humor. Using items derived from a corpus analysis of written accounts in the German language, five factors were derived (N = 195); that is, (a) cheerful-engaged; (b) whimsical; (c) creative-loving; (d) intellectual; and (e) impulsive. Their contents overlap strongly with those of an earlier study using this approach. However, the correlation of the intellectual component with two current measures of adult playfulness was low, and the impulsive component was not correlated with these measures. The question arises as to whether these aspects exist only as components in the implicit psychological and linguistic theories. The sense of humor was most strongly related to the cheerful-engaged factor while some "humor skills" were particularly related to other factors; for example, finding humor under stress with the intellectual component. This study helps toward a better understanding of the basic structure of playfulness in adults.
Subjects
Play and Playthings/psychology , Psycholinguistics , Surveys and Questionnaires , Wit and Humor as Topic , Adolescent , Adult , Aged , Aged, 80 and over , Character , Female , Humans , Male , Middle Aged , Psychometrics , Self Concept , Switzerland , Young Adult
ABSTRACT
Dependency distance (DD) is an important factor in language processing and can affect the ease with which a sentence is understood. Previous studies have investigated the role of DD in L2 writing, but little is known about how the native language influences DD in L2 academic writing. This study is probably the first one that investigates, through a large dataset of over 400 million words, whether the native language of L2 writers influences the DD in their academic writings. Using a dataset of over 2.2 million abstracts of articles downloaded from Scopus in the fields of Arts & Humanities and Social Sciences, the study analyzes the DD patterns, parsed by the latest version of the syntactic parser Stanford CoreNLP 4.5.5, in the academic writing of L2 learners from different language backgrounds. It is found that native languages influence the DD of English L2 academic writings. When the mean dependency distance (MDD) of native languages is much longer than that of native English, the MDD of their English L2 academic writings will be much longer than that of English native academic writings. The findings of this study will deepen our insights into the influence of native language transfer on L2 academic writing, potentially shaping pedagogical strategies in L2 academic writing education.
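The abstract above does not spell out how MDD is calculated. A minimal sketch, assuming the commonly used definition in which a sentence's MDD is the mean absolute difference between the linear position of each word and that of its syntactic head, with the root excluded, is given below. The head indices are a hand-written, hypothetical parse and are not output from Stanford CoreNLP.

```python
def mean_dependency_distance(heads):
    """Mean absolute dependency distance of one sentence.

    heads[i] is the 1-based position of the head of word i + 1;
    0 marks the root, which has no head and is skipped.
    (Assumed definition; the abstract does not state the formula.)
    """
    distances = [abs(head - position)
                 for position, head in enumerate(heads, start=1)
                 if head != 0]
    if not distances:
        return 0.0
    return sum(distances) / len(distances)


# Hypothetical parse of "The boy ate an apple":
# The -> boy (2), boy -> ate (3), ate -> root (0),
# an -> apple (5), apple -> ate (3)
example_heads = [2, 3, 0, 5, 3]
example_mdd = mean_dependency_distance(example_heads)
```

Here the four non-root dependencies have distances 1, 1, 1, and 2, so the sentence MDD is 5 / 4 = 1.25. A corpus-level value would then be obtained by aggregating over all parsed sentences.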
ABSTRACT
Most research regarding early word learning in English tends to make the simplifying assumption that there exists a one-to-one mapping between concrete objects and their labels. In the current work, we provide evidence that runs counter to this assumption, aligning English with more morphologically-rich languages. We suggest that even in a morphologically-poor language like English, real world language input to infants does not provide tidy 1-to-1 mappings. Instead, infants encounter many variant wordforms for familiar nouns (e.g. dog~doggy~dogs). We explore this wordform variability in 44 English-learning infants' naturalistic environments using a longitudinal corpus of infant-available speech. We look at both the frequency and composition of wordform variability. We find two broad categories of variability: referent-changing alterations, where words were pluralized or compounded (e.g. coat~raincoats); and wordplay, where words changed form without a notable change in referent (e.g. bird~birdie). We further find that wordplay occurs with a limited number of lemmas that are usually early-learned, high-frequency, and shorter. When looking at all wordform variability, we find that individual words with higher levels of wordform variability are learned earlier than words with fewer wordforms, over and above the effect of frequency.
Subjects
Language Development , Speech Perception , Infant , Humans , Animals , Dogs , Language , Verbal Learning , Learning , Speech
ABSTRACT
Piantadosi, Tily, and Gibson analyzed a large-scale web-scraping corpus (the Google 1T dataset) and reported that word length is independently predicted from average information content (surprisal) calculated by a 2- to 4-gram model (hereafter, longer-span surprisal) across 11 Indo-European languages, namely, Czech, Dutch, English, French, German, Italian, Polish, Spanish, Portuguese, Romanian, and Swedish. However, a recent article by Meylan and Griffiths suggested the importance of preprocessing for studies with large-scale corpora and reanalyzed the same databases. After their preprocessing, the results in Piantadosi et al. were not replicated in Czech, Romanian, and Swedish. Additionally, a German-specific study by Koplenig, Kupietz, and Wolfer showed that the strict analysis did not replicate the result in Piantadosi et al. for that language with the preprocessing suggested by Meylan and Griffiths in a large-scale but less noisy database. These three studies provide evidence from 11 Indo-European languages and one Afro-Asiatic language, Hebrew, as relevant in this debate. However, we do not have evidence from other linguistic groups. This study provides evidence about Japanese based on a strict preprocessing of Google's web-scraping database. The results show that Japanese word length can be predicted independently by 2- to 4-gram surprisal.
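To make the notion of average information content concrete, the sketch below estimates, for each word type, the mean surprisal -log2 P(word | previous word) over its occurrences. It is only an illustration under simplifying assumptions: it uses a bigram model with unsmoothed maximum-likelihood estimates and a made-up token sequence, whereas the studies discussed above used 2- to 4-gram models on web-scale data with their own preprocessing.

```python
import math
from collections import Counter, defaultdict


def average_bigram_surprisal(tokens):
    """Mean of -log2 P(word | previous word) per word type.

    Probabilities are unsmoothed maximum-likelihood estimates taken
    from the same token sequence (an assumption of this sketch).
    """
    context_counts = Counter(tokens[:-1])
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    totals = defaultdict(float)
    occurrences = Counter()
    for (prev, word), count in bigram_counts.items():
        surprisal = -math.log2(count / context_counts[prev])
        totals[word] += count * surprisal
        occurrences[word] += count
    return {word: totals[word] / occurrences[word] for word in occurrences}


# Made-up token sequence, used purely for illustration.
toy_tokens = "a b a b a c".split()
avg_surprisal = average_bigram_surprisal(toy_tokens)
```

In this toy sequence, "a" always follows "b" and so carries no information, while "c" follows "a" only one time in three and carries log2(3) bits. The claim under debate is that, across a real lexicon, such averaged values predict word length.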
Subjects
Language , Linguistics , Japan
ABSTRACT
Understanding the reception of public health messages in public-facing communications is of key importance to health agencies in managing crises, pandemics, and other health threats. Established public health communications strategies including self-efficacy messaging, fear appeals, and moralising messaging were all used during the Coronavirus pandemic. We explore the reception of public health messages to understand the efficacy of these established messaging strategies in the COVID-19 context. Taking a community-focussed approach, we combine a corpus linguistic analysis with methods of wider engagement, namely, a public survey and interactions with a Public Involvement Panel to analyse this type of real-world public health discourse. Our findings indicate that effective health messaging content provides manageable instructions, which inspire public confidence that following the guidance is worthwhile. Messaging that appeals to the audience's morals or fears in order to provide a rationale for compliance can be polarising and divisive, producing a strongly negative emotional response from the public and potentially undermining social cohesion. Provenance of the messaging alongside text-external political factors also have an influence on messaging uptake. In addition, our findings highlight key differences in messaging uptake by audience age, which demonstrates the importance of tailored communications and the need to seek public feedback to test the efficacy of messaging with the relevant demographics. Our study illustrates the value of corpus linguistics to public health agencies and health communications professionals, and we share our recommendations for improving the public health messaging both in the context of the ongoing pandemic and for future novel and re-emerging infectious disease outbreaks.
ABSTRACT
BACKGROUND: Methamphetamine is a highly addictive stimulant that affects the central nervous system. Crystal methamphetamine is a form of the drug resembling glass fragments or shiny bluish-white rocks that can be taken through smoking, swallowing, snorting, or injecting the powder once it has been dissolved in water or alcohol. OBJECTIVE: The objective of this study is to examine how identities are socially (discursively) constructed by people who use methamphetamine within a subreddit for people who regularly use crystal meth. METHODS: Using a mixed methods approach, we analyzed 1000 threads (318,422 words) from a subreddit for regular crystal meth users. The qualitative component of the analysis used concordancing and corpus-based discourse analysis to identify discursive themes informed by assemblage theory. The quantitative portion of the analysis used corpus linguistic techniques including keyword analysis to identify words occurring with statistically marked frequency in the corpus and collocation analysis to analyze their discursive context. RESULTS: Our findings reveal that the subreddit contributors use a rich and varied lexicon to describe crystal meth and other substances, ranging from a neuroscientific register (eg, methamphetamine and dopamine) to informal vernacular (eg, meth, dope, and fent) and commercial appellations (eg, Adderall and Seroquel). They also use linguistic resources to construct symbolic boundaries between different types of methamphetamine users, differentiating between the esteemed category of "functional addicts" and relegating others to the stigmatized category of "tweakers." In addition, contributors contest the dominant view that methamphetamine use inevitably leads to psychosis, arguing instead for a more nuanced understanding that considers the interplay of factors such as sleep deprivation, poor nutrition, and neglected hygiene. 
CONCLUSIONS: The subreddit contributors' discourse offers a "set and setting" perspective, which provides a fresh viewpoint on drug-induced psychosis and can guide future harm reduction strategies and research. In contrast to this view, many previous studies overlook the real-world complexities of methamphetamine use, perhaps due to the use of controlled experimental settings. Actual drug use, intoxication, and addiction are complex, multifaceted, and elusive phenomena that defy straightforward characterization.
Subjects
Amphetamine-Related Disorders , Central Nervous System Stimulants , Methamphetamine , Humans , Methamphetamine/adverse effects , Central Nervous System Stimulants/adverse effects , Smoking , Tobacco Smoking
ABSTRACT
Corpus analyses have shown that turn-taking in conversation is much faster than laboratory studies of speech planning would predict. To explain fast turn-taking, Levinson and Torreira (2015) proposed that speakers are highly proactive: They begin to plan a response to their interlocutor's turn as soon as they have understood its gist, and launch this planned response when the turn-end is imminent. Thus, fast turn-taking is possible because speakers use the time while their partner is talking to plan their own utterance. In the present study, we asked how much time upcoming speakers actually have to plan their utterances. Following earlier psycholinguistic work, we used transcripts of spoken conversations in Dutch, German, and English. These transcripts consisted of segments, which are continuous stretches of speech by one speaker. In the psycholinguistic and phonetic literature, such segments have often been used as proxies for turns. We found that in all three corpora, large proportions of the segments consisted of only one or two words, which on our estimate does not give the next speaker enough time to fully plan a response. Further analyses showed that speakers indeed often did not respond to the immediately preceding segment of their partner, but continued an earlier segment of their own. More generally, our findings suggest that speech segments derived from transcribed corpora do not necessarily correspond to turns, and the gaps between speech segments therefore only provide limited information about the planning and timing of turns.
Subjects
Communication , Speech , Humans , Language , Phonetics , Psycholinguistics , Speech/physiology
ABSTRACT
Over their first years of life, children learn not just the words of their native languages, but how to use them to communicate. Because manual annotation of communicative intent does not scale to large corpora, our understanding of communicative act development is limited to case studies of a few children at a few time points. We present an approach to automatic identification of communicative acts using a hidden topic Markov model, applying it to the conversations of English-learning children in the CHILDES database. We first describe qualitative changes in parent-child communication over development, and then use our method to demonstrate two large-scale features of communicative development: (a) children develop a parent-like repertoire of our model's communicative acts rapidly, their learning rate peaking around 14 months of age, and (b) this period of steep repertoire change coincides with the highest predictability between parents' acts and children's, suggesting that structured interactions play a role in learning to communicate.