ABSTRACT
Human nonverbal vocalizations such as screams and cries often reflect their evolved functions. Although the universality of these putatively primordial vocal signals and their phylogenetic roots in animal calls suggest a strong reflexive foundation, many of the emotional vocalizations that we humans produce are under our voluntary control. This suggests that, like speech, volitional vocalizations may require auditory input to develop typically. Here, we acoustically analyzed hundreds of volitional vocalizations produced by profoundly deaf adults and typically hearing controls. We show that deaf adults produce unconventional and homogeneous vocalizations of aggression and pain that are unusually high-pitched and unarticulated, with extremely few harsh-sounding nonlinear phenomena compared to controls. In contrast, fear vocalizations of deaf adults are relatively acoustically typical. In four lab experiments involving a range of perception tasks with 444 participants, listeners were less accurate in identifying the intended emotions of vocalizations produced by deaf vocalizers than by controls, perceived their vocalizations as less authentic, and reliably detected deafness. Vocalizations of congenitally deaf adults with no auditory experience were the most atypical, suggesting additive effects of auditory deprivation. Vocal learning in humans may thus be required not only for speech, but also to acquire the full repertoire of volitional non-linguistic vocalizations.
ABSTRACT
Passive Acoustic Monitoring (PAM), which uses autonomous recording units to study wildlife behaviour and distribution, often requires handling large acoustic datasets collected over extended periods. While these data offer invaluable insights into wildlife, geophonic sources can make their analysis challenging. Wind-induced noise is a major issue when detecting target sounds: it can produce false positives, i.e., energy peaks from wind gusts misclassified as biological sounds, or false negatives, i.e., wind noise masking biological sounds that are present. Acoustic data dominated by wind noise make the analysis of vocal activity unreliable, compromising the detection of target sounds and, consequently, the interpretation of the results. Our work introduces a straightforward approach for detecting recordings affected by wind events using a pre-trained convolutional neural network, facilitating the identification of wind-compromised data. We consider this dataset pre-processing crucial for the reliable use of PAM data. We implemented it by leveraging YAMNet, a deep learning model for sound classification tasks. We evaluated the ability of YAMNet as-is to detect wind-induced noise and tested its performance in a transfer learning scenario using our annotated data from the Stony Point penguin colony in South Africa. While YAMNet as-is achieved a precision of 0.71 and a recall of 0.66, both metrics improved substantially after training on our annotated dataset, reaching a precision of 0.91 and a recall of 0.92, a relative increment of >28%. Our study demonstrates the promising application of YAMNet in the bioacoustics and ecoacoustics fields, addressing the need for wind-noise-free acoustic data.
We released open-access code that, combined with the efficiency and peak performance of YAMNet, can be run on standard laptops, making the approach accessible to a broad user base.
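The ">28%" figure refers to improvement relative to the as-is baseline. As a quick sanity check using only the precision and recall values reported above, the relative increments work out as follows (a minimal Python illustration, not part of the released code):

```python
def relative_increment(baseline: float, tuned: float) -> float:
    """Relative improvement of a metric over its baseline, as a fraction."""
    return (tuned - baseline) / baseline

# values reported above: precision 0.71 -> 0.91, recall 0.66 -> 0.92
print(f"precision: +{relative_increment(0.71, 0.91):.1%}")  # precision: +28.2%
print(f"recall:    +{relative_increment(0.66, 0.92):.1%}")  # recall:    +39.4%
```

Both increments exceed 28%, consistent with the ">28%" reported for the transfer-learned model.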
Subjects
Environmental Monitoring, Neural Networks (Computer), Wind, Environmental Monitoring/methods, Acoustics, South Africa, Noise, Animals
ABSTRACT
Baby cries can convey both static information related to individual identity and dynamic information related to the baby's emotional and physiological state. How do these dimensions interact? Are they transmitted independently, or do they compete with one another? Here we show that the universal acoustic expression of pain in distress cries overrides individual differences at the expense of identity signaling. Our acoustic analysis shows that pain cries, compared with discomfort cries, are characterized by a more unstable source, thus interfering with the production of identity cues. Machine learning analyses and psychoacoustic experiments reveal that while the baby's identity remains encoded in pain cries, it is considerably weaker than in discomfort cries. Our results are consistent with the prediction that the costs of failing to signal distress outweigh the costs of weakening cues to identity.
ABSTRACT
In mammals, offspring vocalizations typically encode information about identity and body condition, allowing parents to limit alloparenting and adjust care. But how do these vocalizations mediate parental behavior in species faced with the problem of rearing not one, but multiple offspring, such as domestic dogs? Comprehensive acoustic analyses of 4,400 whines recorded from 220 Beagle puppies in 40 litters revealed litter and individual (within litter) differences in call acoustic structure. By then playing resynthesized whines to mothers, we showed that they provided more care to their litters, and were more likely to carry the emitting loudspeaker to the nest, in response to whine variants derived from their own puppies than from strangers. Importantly, care provisioning was attenuated by experimentally moving the fundamental frequency (fo, perceived as pitch) of their own puppies' whines outside their litter-specific range. Within most litters, we found a negative relationship between puppies' whine fo and body weight. Consistent with this, playbacks showed that maternal care was stronger in response to high-pitched whine variants simulating relatively small offspring within their own litter's range compared to lower-pitched variants simulating larger offspring. We thus show that maternal care in a litter-rearing species relies on a dual assessment of offspring identity and condition, largely based on level-specific inter- and intra-litter variation in offspring call fo. This dual encoding system highlights how, even in a long-domesticated species, vocalizations reflect selective pressures to meet species-specific needs. Comparative work should now investigate whether similar communication systems have convergently evolved in other litter-rearing species.
Subjects
Maternal Behavior, Animal Vocalization, Animals, Dogs, Maternal Behavior/physiology, Animal Vocalization/physiology, Female, Body Weight
ABSTRACT
Across many species, a major function of vocal communication is to convey formidability, with low voice frequencies traditionally considered the main vehicle for projecting large size and aggression. Vocal loudness is often ignored, yet it might explain some puzzling exceptions to this frequency code. Here we demonstrate, through acoustic analyses of over 3,000 human vocalizations and four perceptual experiments, that vocalizers produce low frequencies when attempting to sound large, but loudness is prioritized for displays of strength and aggression. Our results show that, although being loud is effective for signaling strength and aggression, it poses a physiological trade-off with low frequencies because a loud voice is achieved by elevating pitch and opening the mouth wide into /a/-like vowels. This may explain why aggressive vocalizations are often high-pitched and why open vowels are considered "large" in sound symbolism despite their high first formant. Callers often compensate by adding vocal harshness (nonlinear vocal phenomena) to undesirably high-pitched loud vocalizations, but a combination of low and loud remains an honest predictor of both perceived and actual physical formidability. The proposed notion of a loudness-frequency trade-off thus adds a new dimension to the widely accepted frequency code and requires a fundamental rethinking of the evolutionary forces shaping the form of acoustic signals.
Subjects
Voice, Humans, Voice Quality, Aggression, Communication, Sound
ABSTRACT
Formants (vocal tract resonances) are increasingly analyzed not only by phoneticians in speech but also by behavioral scientists studying diverse phenomena such as acoustic size exaggeration and articulatory abilities of non-human animals. This often involves estimating vocal tract length acoustically and producing scale-invariant representations of formant patterns. We present a theoretical framework and practical tools for carrying out this work, including open-source software in the R packages soundgen and phonTools. Automatic formant measurement with linear predictive coding is error-prone, but formant_app provides an integrated environment for formant annotation and correction with visual and auditory feedback. Once measured, formants can be normalized using a single recording (intrinsic methods) or multiple recordings from the same individual (extrinsic methods). Intrinsic speaker normalization can be as simple as taking formant ratios and calculating the geometric mean as a measure of overall scale. The regression method implemented in the function estimateVTL calculates the apparent vocal tract length assuming a single-tube model, while its residuals provide a scale-invariant vowel space based on how far each formant deviates from equal spacing (the schwa function). Extrinsic speaker normalization provides more accurate estimates of speaker- and vowel-specific scale factors by pooling information across recordings with simple averaging or mixed models, which we illustrate with example datasets and R code. The take-home messages are to record several calls or vowels per individual, measure at least three or four formants, check formant measurements manually, treat uncertain values as missing, and use the statistical tools best suited to each modeling context.
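The single-tube logic behind estimateVTL can be sketched compactly. For a uniform tube closed at one end, the i-th resonance falls at F_i = (2i - 1) * c / (4L), so a no-intercept regression of measured formants on the odd numbers (2i - 1) has slope c / (4L). The sketch below is a Python illustration of this principle only, not the soundgen implementation, which is in R:

```python
def estimate_vtl(formants_hz, speed_of_sound=35000):
    """Apparent vocal tract length (cm) from measured formants, assuming a
    uniform tube closed at one end: F_i = (2i - 1) * c / (4 * L)."""
    odd = [2 * i - 1 for i in range(1, len(formants_hz) + 1)]  # 1, 3, 5, ...
    # least-squares slope through the origin: F_i ~ slope * (2i - 1)
    slope = sum(k * f for k, f in zip(odd, formants_hz)) / sum(k * k for k in odd)
    return speed_of_sound / (4 * slope)

# equally spaced (schwa-like) formants at 500, 1500, 2500 Hz
print(estimate_vtl([500, 1500, 2500]))  # 17.5 (cm)
```

The residuals of this fit (how far each measured formant deviates from the fitted equal spacing) correspond to what the abstract calls the schwa function, yielding a scale-invariant vowel space.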
Subjects
Software, Humans, Phonetics, Speech/physiology, Speech Acoustics, Vocal Cords/physiology, Acoustics
ABSTRACT
Cat purring, the unusual, pulsed vibration that epitomizes comfort, enjoys a special status in the world of vocal communication research. Indeed, it has long been flagged as a rare exception to the dominant theory of voice production in mammals. A new study presents histological and biomechanical evidence that purring can occur passively, without laryngeal muscle contractions controlled by an independent neural oscillator.
Subjects
Larynx, Vocal Cords, Cats, Animals, Vocal Cords/physiology, Larynx/physiology, Vibration, Animal Vocalization, Communication, Phonation, Mammals
ABSTRACT
Humans have evolved voluntary control over vocal production for speaking and singing, while preserving the phylogenetically older system of spontaneous nonverbal vocalizations such as laughs and screams. To test for systematic acoustic differences between these vocal domains, we analyzed a broad, cross-cultural corpus representing over 2 h of speech, singing, and nonverbal vocalizations. We show that, while speech is relatively low-pitched and tonal with mostly regular phonation, singing and especially nonverbal vocalizations vary enormously in pitch and often display harsh-sounding, irregular phonation owing to nonlinear phenomena. The evolution of complex supralaryngeal articulatory spectro-temporal modulation has been critical for speech, yet has not significantly constrained laryngeal source modulation. In contrast, articulation is very limited in nonverbal vocalizations, which predominantly contain minimally articulated open vowels and rapid temporal modulation in the roughness range. We infer that vocal source modulation works best for conveying affect, while vocal filter modulation mainly facilitates semantic communication.
ABSTRACT
Variation in formant frequencies has been shown to affect social interactions and sexual competition in a range of avian species. Yet, the anatomical bases of this variation are poorly understood. Here, we investigated the morphological correlates of formant production in the vocal apparatus of African penguins. We modelled the geometry of the supra-syringeal vocal tract of 20 specimens to generate a population of virtual vocal tracts with varying dimensions. We then estimated the acoustic response of these virtual vocal tracts and extracted the centre frequency of the first four predicted formants. We demonstrate that: (i) variation in length and cross-sectional area of vocal tracts strongly affects the formant pattern, (ii) the tracheal region determines most of this variation, and (iii) the skeletal size of penguins does not correlate with trachea length and consequently has relatively little effect on formants. We conclude that in African penguins, while variation in vocal tract geometry generates variation in resonant frequencies supporting the discrimination of conspecifics, such variation does not provide information on the emitter's body size. Overall, our findings advance our understanding of the role of formant frequencies in bird vocal communication.
Subjects
Spheniscidae, Animals, Spheniscidae/physiology, Animal Vocalization/physiology, Body Size, Acoustics, Communication
ABSTRACT
Global biodiversity is in rapid decline, and many seabird species have disproportionately poorer conservation statuses than terrestrial birds. A good understanding of population dynamics is necessary for successful conservation efforts, making noninvasive, cost-effective monitoring tools essential. Here, we set out to investigate whether passive acoustic monitoring (PAM) could be used to estimate the number of animals within a set area of an African penguin (Spheniscus demersus) colony in South Africa. We were able to automate the detection of ecstatic display songs (EDSs) in our recordings, thus facilitating the handling of large datasets. This allowed us to show that calling rate increased with wind speed and humidity but decreased with temperature, and to highlight apparent abundance variations between nesting habitat types. We then showed that the number of EDSs in our recordings positively correlated with the number of callers counted during visual observations, indicating that density could be estimated from calling rate. Our observations suggest that rising temperatures may adversely impact penguin calling behaviour, with potential negative consequences for population dynamics, underscoring the importance of effective conservation measures. Crucially, this study shows that PAM could be successfully used to monitor this endangered species' populations with minimal disturbance.
ABSTRACT
The ability to discriminate between individuals based on identity cues, which supports the social behaviour of many animal species, has mostly been investigated in conspecific contexts. A rare example of heterospecific individual discrimination is found in domestic dogs, who are capable of recognising their owners' voices. Here, we test whether grey wolves, the nearest wild relatives of dogs, can also distinguish familiar human voices, which would indicate that dogs' ability is not a consequence of domestication. Using the habituation-dishabituation paradigm, we presented captive wolves with playback recordings of their keepers' and strangers' voices producing either familiar or unfamiliar phrases. The duration of their response was significantly longer when presented with keepers' voices than with strangers' voices, demonstrating that wolves discriminated between familiar and unfamiliar speakers. This suggests that dogs' ability to discriminate between human voices was probably present in their common ancestor with wolves, and supports the idea that recognising heterospecific individuals is a general ability of vertebrates. Our study also provides further evidence of familiar voice discrimination in a captive wild animal, indicating that this ability may be widespread across vertebrate species.
Subjects
Voice, Wolves, Humans, Dogs, Animals, Social Behavior, Cues (Psychology), Domestication
ABSTRACT
What information is encoded in the cries of human babies? While it is widely recognized that cries can encode distress levels, whether cries reliably encode the cause of crying remains disputed. Here, we collected 39,201 cries from 24 babies recorded in their homes longitudinally, from 15 days to 3.5 months of age, a database we share publicly for reuse. Based on the parental action that stopped the crying, which matched the parental evaluation of cry cause in 75% of cases, each cry was classified as caused by discomfort, hunger, or isolation. Our analyses show that baby cries provide reliable information about age and identity. Baby voices become more tonal and less shrill with age, while individual acoustic signatures drift throughout the first months of life. In contrast, neither machine learning algorithms nor trained adult listeners can reliably recognize the causes of crying.
ABSTRACT
Because the expression of pain in babies' cries is based on universal acoustic features, it is assumed that adult listeners should be able to detect when a crying baby is experiencing pain [1-3]. We report that detecting that a baby's cry expresses pain actually requires learning through experience. Our psychoacoustic experiments reveal that adults with no experience of caring for babies are unable to identify whether a baby's cry is a pain cry induced by vaccination or a mild discomfort cry recorded during a bath, even when they are familiar with the discomfort cries of this particular baby. In contrast, people with prior experience of babies (parents or professional caregivers) identify a familiar baby's pain cries without having heard these cries before. Parents of very young children are even able to identify the pain cries of a baby who is completely unfamiliar to them. Exposure through caregiving and/or parenting thus shapes the auditory and cognitive abilities involved in decoding the information conveyed by a baby's communication signals.
Subjects
Crying, Parenting, Acoustics, Adult, Child, Child (Preschool), Crying/psychology, Humans, Infant, Learning, Pain/diagnosis
ABSTRACT
While nonlinear phenomena (NLP) are widely reported in animal vocalizations, often causing perceptual harshness and roughness, their communicative function remains debated. Several hypotheses have been put forward: attention-grabbing, communication of distress, exaggeration of body size and dominance. Here, we use state-of-the-art sound synthesis to investigate how NLP affect the perception of puppy whines by human listeners. Listeners assessed the distress, size or dominance conveyed by synthetic puppy whines with manipulated NLP, including frequency jumps and varying proportions of subharmonics, sidebands and deterministic chaos. We found that the presence of chaos increased the puppy's perceived level of distress and that this effect held across a range of representative fundamental frequency (fo) levels. Adding sidebands and subharmonics also increased perceived distress among listeners who have extensive caregiving experience with pre-weaned puppies (e.g. breeders, veterinarians). Finally, we found that whines with added chaos, subharmonics or sidebands were associated with larger and more dominant puppies, although these biases were attenuated in experienced caregivers. Together, our results show that nonlinear phenomena in puppy whines can convey rich information to human listeners and therefore may be crucial for offspring survival during breeding of a domesticated species.
Subjects
Voice, Animals, Attention, Communication, Dogs, Humans, Animal Vocalization
ABSTRACT
When producing intimidating aggressive vocalizations, humans and other animals often extend their vocal tracts to lower their voice resonance frequencies (formants) and thus sound big. Is acoustic size exaggeration more effective when the vocal tract is extended before, or during, the vocalization, and how do listeners interpret within-call changes in apparent vocal tract length? We compared perceptual effects of static and dynamic formant scaling in aggressive human speech and nonverbal vocalizations. Acoustic manipulations corresponded to elongating or shortening the vocal tract either around (Experiment 1) or from (Experiment 2) its resting position. Gradual formant scaling that preserved average frequencies conveyed the impression of smaller size and greater aggression, regardless of the direction of change. Vocal tract shortening from the original length conveyed smaller size and less aggression, whereas vocal tract elongation conveyed larger size and more aggression, and these effects were stronger for static than for dynamic scaling. Listeners familiarized with the speaker's natural voice were less often 'fooled' by formant manipulations when judging speaker size, but paid more attention to formants when judging aggressive intent. Thus, within-call vocal tract scaling conveys emotion, but a better way to sound large and intimidating is to keep the vocal tract consistently extended.
ABSTRACT
Vocal tract elongation, which uniformly lowers vocal tract resonances (formant frequencies) in animal vocalizations, has evolved independently in several vertebrate groups as a means for vocalizers to exaggerate their apparent body size. Here, we propose that smaller speech-like articulatory movements that alter only individual formants can serve a similar yet less energetically costly size-exaggerating function. To test this, we examine whether uneven formant spacing alters the perceived body size of vocalizers in synthesized human vowels and animal calls. Among six synthetic vowel patterns, those characterized by the lowest first and second formants (the vowel /u/ as in 'boot') are consistently perceived as produced by the largest vocalizer. Crucially, lowering only one or two formants in animal-like calls also conveys the impression of a larger body size, and lowering the second and third formants simultaneously exaggerates perceived size to a similar extent as rescaling all formants. As the articulatory movements required for individual formant shifts are minor compared to full vocal tract extension, they represent a rapid and energetically efficient mechanism for acoustic size exaggeration. We suggest that, by favouring the evolution of uneven formant patterns in vocal communication, this deceptive strategy may have contributed to the origins of the phonemic diversification required for articulated speech. This article is part of the theme issue 'Voice modulation: from origin and mechanism to social impact (Part II)'.
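The equivalence between lowering only F2-F3 and rescaling all formants can be illustrated with the standard single-tube estimate of apparent vocal tract length (a hedged Python sketch with hypothetical formant values, not the experimental stimuli used in the study):

```python
def apparent_vtl(formants_hz, speed_of_sound=35000):
    """Apparent vocal tract length (cm) under a uniform-tube model,
    F_i = (2i - 1) * c / (4 * L), fitted as a slope through the origin."""
    odd = [2 * i - 1 for i in range(1, len(formants_hz) + 1)]
    slope = sum(k * f for k, f in zip(odd, formants_hz)) / sum(k * k for k in odd)
    return speed_of_sound / (4 * slope)

base = [500.0, 1500.0, 2500.0]            # equally spaced, schwa-like
all_lowered = [f * 0.85 for f in base]    # every formant lowered by 15%
f2_f3_lowered = [500.0, 1275.0, 2125.0]   # only F2 and F3 lowered by 15%

print(round(apparent_vtl(base), 1))           # 17.5
print(round(apparent_vtl(all_lowered), 1))    # 20.6
print(round(apparent_vtl(f2_f3_lowered), 1))  # 20.5
```

Under this simple model, shifting just the second and third formants moves the apparent vocal tract length nearly as far as rescaling all three, consistent with the perceptual results reported above.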
Subjects
Voice, Acoustics, Animals, Body Size, Speech, Animal Vocalization
ABSTRACT
Existing evidence suggests that children from around the age of 8 years strategically alter their public image in accordance with known values and preferences of peers, through the self-descriptive information they convey. However, an important but neglected aspect of this 'self-presentation' is the medium through which such information is communicated: the voice itself. The present study explored peer audience effects on children's vocal productions. Fifty-six children (26 females, aged 8-10 years) were presented with vignettes where a fictional child, matched to the participant's age and sex, is trying to make friends with a group of same-sex peers with stereotypically masculine or feminine interests (rugby and ballet, respectively). Participants were asked to impersonate the child in that situation and, as the child, to read out loud masculine, feminine and gender-neutral self-descriptive statements to these hypothetical audiences. They also had to decide which of those self-descriptive statements would be most helpful for making friends. In line with previous research, boys and girls preferentially selected masculine or feminine self-descriptive statements depending on the audience interests. Crucially, acoustic analyses of fundamental frequency and formant frequency spacing revealed that children also spontaneously altered their vocal productions: they feminized their voices when speaking to members of the ballet club, while they masculinized their voices when speaking to members of the rugby club. Both sexes also feminized their voices when uttering feminine sentences, compared to when uttering masculine and gender-neutral sentences. Implications for the hitherto neglected role of acoustic qualities of children's vocal behaviour in peer interactions are discussed. This article is part of the theme issue 'Voice modulation: from origin and mechanism to social impact (Part II)'.
Subjects
Femininity, Voice, Acoustics, Child, Female, Humans, Male, Masculinity
ABSTRACT
Research on within-individual modulation of vocal cues is surprisingly scarce outside of human speech. Yet, voice modulation serves diverse functions in human and nonhuman nonverbal communication, from dynamically signalling motivation and emotion, to exaggerating physical traits such as body size and masculinity, to enabling song and musicality. The diversity of anatomical, neural, cognitive and behavioural adaptations necessary for the production and perception of voice modulation make it a critical target for research on the origins and functions of acoustic communication. This diversity also implicates voice modulation in numerous disciplines and technological applications. In this two-part theme issue comprising 21 articles from leading and emerging international researchers, we highlight the multidisciplinary nature of the voice sciences. Every article addresses at least two, if not several, critical topics: (i) development and mechanisms driving vocal control and modulation; (ii) cultural and other environmental factors affecting voice modulation; (iii) evolutionary origins and adaptive functions of vocal control including cross-species comparisons; (iv) social functions and real-world consequences of voice modulation; and (v) state-of-the-art in multidisciplinary methodologies and technologies in voice modulation research. With this collection of works, we aim to facilitate cross-talk across disciplines to further stimulate the burgeoning field of voice modulation. This article is part of the theme issue 'Voice modulation: from origin and mechanism to social impact (Part I)'.
Subjects
Social Change, Voice, Emotions, Humans, Male, Nonverbal Communication, Speech
ABSTRACT
The human voice carries information about a vocalizer's physical strength that listeners can perceive and that may influence mate choice and intrasexual competition. Yet, reliable acoustic correlates of strength in human speech remain unclear. Compared to speech, aggressive nonverbal vocalizations (roars) may function to maximize perceived strength, suggesting that their acoustic structure has been selected to communicate formidability, similar to the vocal threat displays of other animals. Here, we test this prediction in two non-WEIRD African samples: an urban community of Cameroonians and rural nomadic Hadza hunter-gatherers in the Tanzanian bushlands. Participants produced standardized speech and volitional roars and provided handgrip strength measures. Using acoustic analysis and information-theoretic multi-model inference and averaging techniques, we show that strength can be measured from both speech and roars, and as predicted, strength is more reliably gauged from roars than vowels, words or greetings. The acoustic structure of roars explains 40-70% of the variance in actual strength within adults of either sex. However, strength is predicted by multiple acoustic parameters whose combinations vary by sex, sample and vocal type. Thus, while roars may maximally signal strength, more research is needed to uncover consistent and likely interacting acoustic correlates of strength in the human voice. This article is part of the theme issue 'Voice modulation: from origin and mechanism to social impact (Part I)'.
Subjects
Speech, Voice, Acoustics, Aggression, Animals, Hand Strength, Humans
ABSTRACT
Distress cries are emitted by many mammal species to elicit caregiving attention. Across taxa, these calls tend to share similar acoustic structures, but not necessarily frequency ranges, raising the question of their interspecific communicative potential. As domestic dogs are highly responsive to human emotional cues and experience stress when hearing human cries, we explored whether their responses to distress cries from human infants and puppies depend on the calls falling within the conspecific frequency range or on species-specific call characteristics. We recorded adult dogs' responses to distress cries from puppies and human babies, emitted from a loudspeaker in a basket. The cries were presented both in their natural frequency range and shifted to match the other species. Crucially, regardless of species origin, calls falling into the dog call-frequency range elicited more attention. Thus, domestic dogs' responses depended strongly on frequency range. Females responded both faster and more strongly than males, potentially reflecting asymmetries in parental care investment. Our results suggest that, despite domestication leading to increased overall responsiveness to human cues, dogs still respond considerably less to calls in the natural human infant range than in the puppy range. Dogs appear to use a fast but inaccurate decision-making process to determine their response to distress-like vocalisations.