Results 1 - 17 of 17
1.
Commun Biol ; 7(1): 711, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38862808

ABSTRACT

Deepfakes are viral ingredients of digital environments, and they can trick human cognition into misperceiving the fake as real. Here, we test the neurocognitive sensitivity of 25 participants in accepting or rejecting person identities recreated in audio deepfakes. We generate high-quality voice identity clones from natural speakers using advanced deepfake technologies. During an identity matching task, participants show intermediate performance with deepfake voices, indicating both deception by and resistance to deepfake identity spoofing. On the brain level, univariate and multivariate analyses consistently reveal a central cortico-striatal network that decodes the vocal acoustic pattern and deepfake level (auditory cortex) as well as natural speaker identities (nucleus accumbens), which are valued for their social relevance. This network is embedded in a broader neural identity and object recognition network. Humans can thus be partly tricked by deepfakes, but the neurocognitive mechanisms identified during deepfake processing open windows for strengthening human resilience to fake information.


Subjects
Speech Perception, Humans, Male, Female, Adult, Young Adult, Speech Perception/physiology, Nerve Net/physiology, Auditory Cortex/physiology, Voice/physiology, Corpus Striatum/physiology
2.
JASA Express Lett ; 4(1)2024 01 01.
Article in English | MEDLINE | ID: mdl-38169314

ABSTRACT

Distinguishing shouted from non-shouted speech is crucial in communication. We examined how shouting affects temporal properties of the amplitude envelope (ENV) in a total of 720 sentences read by 18 Swiss German speakers in normal and shouted modes; shouting was characterised by maintaining a sound pressure level of at least 80 dB SPL (C-weighted) at a distance of 1 m from the mouth. Generalized additive models revealed significant temporal alterations of ENV in shouted speech, marked by a steeper ascent, a delayed peak, and extended high levels. These findings offer potential cues for identifying shouting that are particularly useful when fine-structure and dynamic-range cues are absent, for example in cochlear implant users.
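The broad-band amplitude envelope analysed in studies like this one is often approximated as the magnitude of the analytic signal, smoothed with a low-pass filter. The sketch below illustrates that generic approach only; the paper's exact ENV extraction and the generalized additive modelling are not reproduced, and the cut-off frequency and synthetic test signal are placeholder assumptions.

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def amplitude_envelope(signal, sr, cutoff_hz=10.0):
    """Approximate the amplitude envelope: magnitude of the analytic
    signal, smoothed with a low-pass Butterworth filter."""
    env = np.abs(hilbert(signal))
    b, a = butter(4, cutoff_hz / (sr / 2), btype="low")
    return filtfilt(b, a, env)

# Synthetic stand-in for a spoken sentence: an amplitude-modulated tone.
sr = 16000
t = np.arange(0, 2.0, 1 / sr)
speech_like = np.sin(2 * np.pi * 150 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t))

env = amplitude_envelope(speech_like, sr)
print(env.shape, env.max())
```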


Subjects
Cochlear Implantation, Cochlear Implants, Speech Perception, Speech, Language
3.
Sci Rep ; 13(1): 18742, 2023 10 31.
Article in English | MEDLINE | ID: mdl-37907749

ABSTRACT

Human voice recognition over telephone channels typically yields lower accuracy than recognition of higher-quality audio recorded in a studio environment. Here, we investigated the extent to which audio in video conferencing, which is subject to various lossy compression mechanisms, affects human voice recognition performance. Voice recognition performance was tested in an old-new recognition task under three audio conditions (telephone, Zoom, studio) across all matched (familiarization and test with the same audio condition) and mismatched combinations (familiarization and test with different audio conditions). Participants were familiarized with female voices presented in either studio-quality (N = 22), Zoom-quality (N = 21), or telephone-quality (N = 20) stimuli. Subsequently, all listeners performed an identical voice recognition test containing a balanced stimulus set from all three conditions. Results revealed that voice recognition performance (d') with Zoom audio was not significantly different from that with studio audio, whereas listeners performed significantly better with both Zoom and studio audio than with telephone audio. This suggests that the signal processing of the speech codec used by Zoom preserves information that is as relevant for voice recognition as studio audio. Interestingly, listeners familiarized with voices via Zoom audio showed a trend towards better recognition performance in the test (p = 0.056) compared to listeners familiarized with studio audio. We discuss future directions according to which a possible advantage of Zoom audio for voice recognition might be related to some of the speech coding mechanisms used by Zoom.
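Sensitivity (d') in an old-new recognition task is conventionally derived from hit and false-alarm rates via the inverse normal CDF. A minimal sketch follows; the log-linear correction for extreme rates and the example counts are assumptions, not details taken from the study.

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with a log-linear
    correction to avoid infinite z-scores at rates of 0 or 1."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical counts for one listener in the voice recognition test.
print(round(d_prime(hits=40, misses=10, false_alarms=12, correct_rejections=38), 2))
```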


Subjects
Speech Perception, Voice, Humans, Female, Voice Recognition, Speech, Acoustics
4.
Front Psychol ; 14: 1145572, 2023.
Article in English | MEDLINE | ID: mdl-37342649

ABSTRACT

Introduction: Cooperation, acoustically signaled through vocal convergence, is facilitated when group members are more similar. Excessive vocal convergence may, however, weaken individual recognizability. This study explored whether constraints on convergence can arise in circumstances where interlocutors need to enhance their vocal individuality. We therefore tested the effects of group size (3 and 5 interactants) on vocal convergence and individualization in a social communication scenario in which individual recognition by voice is at stake. Methods: In an interactive online game, players had to recognize each other through their voices while solving a cooperative task. Vocal similarity was quantified through similarities in speaker i-vectors obtained through probabilistic linear discriminant analysis (PLDA). Speaker recognition performance was measured through the system Equal Error Rate (EER). Results: Between-speaker vocal similarity increased with group size, indicating more cooperative vocal behavior. At the same time, the EER for the same speakers increased from the smaller to the larger group, meaning a decrease in overall recognition performance. Discussion: The decrease in vocal individualization in the larger group suggests that ingroup cooperation and social cohesion conveyed through acoustic convergence take priority over individualization in larger groups of unacquainted speakers.
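The Equal Error Rate is the operating point at which false-acceptance and false-rejection rates coincide. The sketch below computes it from hypothetical target (same-speaker) and non-target (different-speaker) similarity scores; the i-vector/PLDA scoring that would produce such scores is not reproduced.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Sweep a decision threshold over all scores and return the point
    where false-rejection and false-acceptance rates are closest."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    idx = np.argmin(np.abs(frr - far))
    return (frr[idx] + far[idx]) / 2

# Hypothetical PLDA-style similarity scores.
rng = np.random.default_rng(0)
target = rng.normal(2.0, 1.0, 200)      # same-speaker trials
nontarget = rng.normal(0.0, 1.0, 2000)  # different-speaker trials
print(f"EER ~ {equal_error_rate(target, nontarget):.3f}")
```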

5.
Anim Cogn ; 25(6): 1393-1398, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35595881

ABSTRACT

The human auditory system is capable of processing human speech even when it has been heavily degraded, such as by noise-vocoding, which strongly reduces frequency-domain cues to phonetic content. This has contributed to arguments that speech processing is highly specialized and likely a de novo evolved trait in humans. Previous comparative research has demonstrated that a language-competent chimpanzee was also capable of recognizing degraded speech, and therefore that the mechanisms underlying speech processing may not be uniquely human. However, to form a robust reconstruction of the evolutionary origins of speech processing, additional data from other closely related ape species are needed. Specifically, such data can help disentangle whether these capabilities evolved independently in humans and chimpanzees, or whether they were inherited from our last common ancestor. Here we provide evidence of processing of highly varied (degraded and computer-generated) speech in a language-competent bonobo, Kanzi. We took advantage of Kanzi's existing proficiency with touchscreens and his ability to report his understanding of human speech by interacting with arbitrary symbols called lexigrams. Specifically, we asked Kanzi to recognize both human (natural) and computer-generated forms of 40 highly familiar words that had been degraded (noise-vocoded and sinusoidal forms) using a match-to-sample paradigm. Results suggest that, apart from noise-vocoded computer-generated speech, Kanzi recognized both natural and computer-generated voices that had been degraded, at rates significantly above chance. Kanzi performed better with all forms of natural-voice speech than with computer-generated speech. This work provides additional support for the hypothesis that the processing apparatus necessary to deal with highly variable speech, including, for the first time in nonhuman animals, computer-generated speech, may be at least as old as the last common ancestor we share with bonobos and chimpanzees.
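Noise-vocoding, the degradation used here, replaces the fine structure in each frequency band with band-limited noise modulated by that band's amplitude envelope. The sketch below is a generic n-channel vocoder under common default choices (log-spaced Butterworth bands, Hilbert envelopes); it is not the stimulus-preparation pipeline used in the study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(signal, sr, n_channels=4, f_lo=80.0, f_hi=6000.0):
    """Simple noise vocoder: split the signal into log-spaced bands,
    extract each band's Hilbert envelope, and use it to modulate noise
    filtered into the same band."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    noise = np.random.default_rng(0).standard_normal(len(signal))
    out = np.zeros(len(signal))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        band = sosfiltfilt(sos, signal)
        env = np.abs(hilbert(band))
        carrier = sosfiltfilt(sos, noise)
        out += env * carrier
    return out / (np.max(np.abs(out)) + 1e-12)

# Synthetic stand-in for a word recording.
sr = 16000
t = np.arange(0, 0.5, 1 / sr)
word = np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 880 * t)
print(noise_vocode(word, sr).shape)
```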


Subjects
Hominidae, Pan paniscus, Speech Perception, Animals, Humans, Acoustic Stimulation/veterinary, Computers, Pan troglodytes, Speech
6.
J Acoust Soc Am ; 150(4): 2836, 2021 10.
Article in English | MEDLINE | ID: mdl-34717513

ABSTRACT

Foreign-accented speech typically deviates segmentally and suprasegmentally from native-accented speech. Two experiments were conducted to investigate the roles of amplitude envelope (ENV), segment duration (DUR), and speech rate (SR) in Italian listeners' ability to identify native-accented Italian in utterances produced by Zurich German speakers. In experiment 1, listeners judged in a two-alternative forced-choice perception task which of the two stimuli in a trial they perceived as more native-like. Stimuli in each trial varied only in ENV and DUR, which were retrieved either from a native Italian speaker [first language (L1) donor] or from a German speaker of Italian [second language (L2) donor]. Results revealed that listeners make use of both DUR and ENV to identify the more native-like stimuli, but the effect of ENV was more subtle. In experiment 2, SR differences (resulting from native and non-native segment duration differences in experiment 1) were normalized. This drastically reduced the effect of segment durations on perceived nativeness; however, the ENV effect remained. This was not the case in a control group of listeners without competence in Italian. Though the effects were subtle, the study shows that ENV cues contribute to the percept of nativeness in L2 speech.


Subjects
Speech Perception, Speech, Cues, Language, Phonetics
7.
J Acoust Soc Am ; 146(1): EL1, 2019 07.
Article in English | MEDLINE | ID: mdl-31370609

ABSTRACT

An unsupervised automatic clustering algorithm (k-means) classified 1282 Mel frequency cepstral coefficient (MFCC) representations of isolated steady-state vowel utterances from eight standard German vowel categories with fo between 196 and 698 Hz. Experiment I determined the number of MFCCs (1-20) and the spectral bandwidth (2-20 kHz) at which classification performance peaked (five MFCCs at 4 kHz). In experiment II, classification performance across different fo ranges revealed that fos above 500 Hz reduced classification performance, although it remained well above chance. This shows that isolated steady-state vowels with strongly undersampled spectra contain sufficient acoustic information to be classified automatically.
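A loose reconstruction of the pipeline described (MFCC extraction followed by k-means clustering) can be sketched with librosa and scikit-learn. The synthetic vowel-like tones, the use of only two vowel classes instead of eight, and the time-averaged five-coefficient features below are illustrative assumptions, not the study's configuration.

```python
import numpy as np
import librosa
from sklearn.cluster import KMeans

sr = 8000  # sampling rate caps the analysed bandwidth at sr/2 = 4 kHz

def synthetic_vowel(f0, formants, dur=0.4):
    """Crude stand-in for a steady-state vowel: harmonics of f0 weighted
    by proximity to the given formant frequencies."""
    t = np.arange(0, dur, 1 / sr)
    sig = np.zeros_like(t)
    for k in range(1, int((sr / 2) // f0)):
        w = sum(np.exp(-((k * f0 - F) / 200.0) ** 2) for F in formants)
        sig += w * np.sin(2 * np.pi * k * f0 * t)
    return sig / (np.max(np.abs(sig)) + 1e-12)

# Two illustrative vowel classes at several f0s, five MFCCs per utterance.
utterances = [synthetic_vowel(f0, formants)
              for formants in ([300, 2300], [700, 1200])   # /i/-like, /a/-like
              for f0 in (200, 300, 440, 660)]
feats = np.array([librosa.feature.mfcc(y=u, sr=sr, n_mfcc=5).mean(axis=1)
                  for u in utterances])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
print(labels)
```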

8.
Neurobiol Aging ; 80: 116-126, 2019 08.
Article in English | MEDLINE | ID: mdl-31170532

ABSTRACT

Age-related decline in speech perception may result in difficulties partaking in spoken conversation and potentially lead to social isolation and cognitive decline in older adults. It is therefore important to better understand how age-related differences in neurostructural factors such as cortical thickness (CT) and cortical surface area (CSA) are related to neurophysiological sensitivity to speech cues in younger and older adults. Age-related differences in CT and CSA of bilateral auditory-related areas were extracted using FreeSurfer in younger and older adults with normal peripheral hearing. Behavioral and neurophysiological sensitivity to prosodic speech cues (word stress and fundamental frequency of oscillation) was evaluated using discrimination tasks and a passive oddball paradigm, while EEG was recorded, to quantify mismatch negativity responses. Results revealed (a) higher neural sensitivity (i.e., larger mismatch negativity responses) to word stress in older adults compared to younger adults, suggesting a higher importance of prosodic speech cues in the speech processing of older adults, and (b) lower CT in auditory-related regions in older compared to younger individuals, suggesting neuronal loss associated with aging. Within the older age group, less neuronal loss (i.e., higher CT) in a right auditory-related area (i.e., the supratemporal sulcus) was related to better performance in fundamental frequency discrimination, while higher CSA in left auditory-related areas was associated with higher neural sensitivity toward prosodic speech cues as evident in the mismatch negativity patterns. Overall, our results offer evidence for neurostructural changes in aging that are associated with differences in the extent to which left and right auditory-related areas are involved in speech processing in older adults. We argue that exploring age-related differences in brain structure and function associated with decline in speech perception in older adults may help develop much needed rehabilitation strategies for older adults with central age-related hearing loss.
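The mismatch negativity quantified here is conventionally the deviant-minus-standard difference wave averaged over a post-stimulus window. A minimal sketch on simulated single-trial data follows; the window, amplitudes, and trial counts are placeholders rather than the study's parameters.

```python
import numpy as np

sr = 500                      # Hz, EEG sampling rate
times = np.arange(-0.1, 0.5, 1 / sr)
rng = np.random.default_rng(2)

def simulate_trials(n_trials, mmn_amp):
    """Noise plus a negative deflection around 100-250 ms for deviants."""
    base = rng.normal(0, 2.0, (n_trials, len(times)))
    bump = -mmn_amp * np.exp(-((times - 0.175) / 0.05) ** 2)
    return base + bump

standards = simulate_trials(400, mmn_amp=0.0)
deviants = simulate_trials(80, mmn_amp=3.0)   # rare oddballs

difference_wave = deviants.mean(axis=0) - standards.mean(axis=0)
window = (times >= 0.1) & (times <= 0.25)
print(f"Mean MMN amplitude in 100-250 ms: {difference_wave[window].mean():.2f} µV")
```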


Assuntos
Envelhecimento/patologia , Envelhecimento/fisiologia , Córtex Cerebral/patologia , Córtex Cerebral/fisiologia , Percepção da Fala/fisiologia , Adulto , Idoso , Idoso de 80 Anos ou mais , Córtex Auditivo/patologia , Disfunção Cognitiva/etiologia , Feminino , Perda Auditiva Central/etiologia , Humanos , Masculino , Isolamento Social , Adulto Jovem
9.
J Acoust Soc Am ; 145(3): EL209, 2019 03.
Article in English | MEDLINE | ID: mdl-31067968

ABSTRACT

First formant (F1) trajectories of vocalic intervals were divided into positive and negative dynamics. Positive F1 dynamics were defined as the speeds of F1 increases to reach the maxima, and negative F1 dynamics as the speeds of F1 decreases away from the maxima. Mean, standard deviation, and sequential variability were measured for both dynamics. Results showed that measures of negative F1 dynamics explained more between-speaker variability, which was highly congruent with a previous study using intensity dynamics [He and Dellwo (2017). J. Acoust. Soc. Am. 141, EL488-EL494]. The results may be explained by speaker idiosyncratic articulation.
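Read this way, positive and negative dynamics are rise and fall speeds around local maxima of a contour (here F1; analogously for intensity in He and Dellwo (2017), entry 12 below). The sketch below follows that reading on a toy trajectory; the peak-picking strategy and the summary statistics are illustrative assumptions.

```python
import numpy as np
from scipy.signal import argrelextrema

def contour_dynamics(values, times):
    """Split a contour into rises toward local maxima (positive dynamics)
    and falls away from them (negative dynamics), expressed as speeds."""
    peaks = argrelextrema(values, np.greater)[0]
    troughs = argrelextrema(values, np.less)[0]
    pos, neg = [], []
    for p in peaks:
        prev = troughs[troughs < p]
        nxt = troughs[troughs > p]
        if prev.size:   # speed of the rise from the preceding trough
            pos.append((values[p] - values[prev[-1]]) / (times[p] - times[prev[-1]]))
        if nxt.size:    # speed of the fall to the following trough
            neg.append((values[nxt[0]] - values[p]) / (times[nxt[0]] - times[p]))
    summary = lambda x: (np.mean(x), np.std(x), np.mean(np.abs(np.diff(x))))
    return summary(pos), summary(neg)   # (mean, SD, sequential variability)

# Toy F1-like trajectory (Hz) sampled every 10 ms.
t = np.arange(0, 1.0, 0.01)
f1 = 500 + 150 * np.sin(2 * np.pi * 3 * t) + 30 * np.sin(2 * np.pi * 7 * t)
print(contour_dynamics(f1, t))
```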


Subjects
Speech Acoustics, Adult, Biological Variation, Population, Female, Humans, Male, Phonation, Speech Recognition Software, Voice/physiology
10.
J Acoust Soc Am ; 142(4): 2419, 2017 10.
Article in English | MEDLINE | ID: mdl-29092541

ABSTRACT

The perception of stress is highly influenced by listeners' native language. In this research, the authors examined the effect of intonation and talker variability (here: phonetic variability) on the discrimination of Spanish lexical stress contrasts by native Spanish (N = 17), German (N = 21), and French (N = 27) listeners. Participants listened to 216 trials containing three Spanish disyllabic words, in which one word carried a different lexical stress from the others. The listeners' task was to identify the deviant word in each trial (odd-one-out task). The words in the trials were produced either by the same talker or by two different talkers, and carried the same or varying intonation patterns. The German listeners' performance was lower than that of the Spanish listeners but higher than that of the French listeners. French listeners performed above chance level with and without talker variability, but at chance level when intonation variability was introduced. Results are discussed in the context of the stress "deafness" hypothesis.


Subjects
Language, Phonetics, Speech Perception, Adult, Humans, Sound Spectrography, Young Adult
11.
J Acoust Soc Am ; 142(2): 1025, 2017 08.
Article in English | MEDLINE | ID: mdl-28863619

ABSTRACT

The phonological function of vowels can be maintained at fundamental frequencies (fo) up to 880 Hz [Friedrichs, Maurer, and Dellwo (2015). J. Acoust. Soc. Am. 138, EL36-EL42]. Here, the influence of talker variability and multiple response options on vowel recognition at high fos is assessed. The stimuli (n = 264) consisted of eight isolated vowels (/i y e ø ε a o u/) produced by three female native German talkers at 11 fos within a range of 220-1046 Hz. In a closed-set identification task, 21 listeners were presented with excised 700-ms vowel nuclei with quasi-flat fo contours and resonance trajectories. The results show that listeners can identify the point vowels /i a u/ at fos up to almost 1 kHz, with a significant decrease for the vowels /y ε/ and a drop to chance level for the vowels /e ø o/ toward the upper fos. Auditory excitation patterns reveal highly differentiable representations for /i a u/ that can be used as landmarks for vowel category perception at high fos. These results suggest that theories of vowel perception based on overall spectral shape provide a fuller account of vowel perception than those based solely on formant frequency patterns.

12.
J Acoust Soc Am ; 141(5): EL488, 2017 05.
Article in English | MEDLINE | ID: mdl-28599553

ABSTRACT

Intensity contours of speech signals were sub-divided into positive and negative dynamics. Positive dynamics were defined as the speed of increases in intensity from amplitude troughs to subsequent peaks, and negative dynamics as the speed of decreases in intensity from peaks to troughs. Mean, standard deviation, and sequential variability were measured for both dynamics in each sentence. Analyses showed that measures of both dynamics were separately classified and between-speaker variability was largely explained by measures of negative dynamics. This suggests that parts of the signal where intensity decreases from syllable peaks are more speaker-specific. Idiosyncratic articulation may explain such results.

13.
Brain Topogr ; 29(3): 440-58, 2016 May.
Article in English | MEDLINE | ID: mdl-26613726

ABSTRACT

This EEG study investigates age-related differences in neural oscillation patterns during the processing of temporally modulated speech. Taking a lifespan perspective, we recorded electroencephalogram (EEG) data from three age samples: young adults, middle-aged adults, and older adults. Stimuli consisted of temporally degraded sentences in Swedish, a language unfamiliar to all participants. We found age-related differences in phonetic pattern matching when participants were presented with envelope-degraded sentences, whereas no such age effect was observed for fine-structure-degraded sentences. Irrespective of age, the EEG data during speech processing revealed a relationship between envelope information and theta-band (4-8 Hz) activity. Additionally, an association between fine-structure information and gamma-band (30-48 Hz) activity was found. No interaction, however, was found between the acoustic manipulation of the stimuli and age. Importantly, our main finding was paralleled by an overall enhanced power in older adults at high frequencies (gamma: 30-48 Hz), irrespective of condition. For the most part, this result is in line with the Asymmetric Sampling in Time framework (Poeppel in Speech Commun 41:245-255, 2003), which assumes an isomorphic correspondence between frequency modulations in neurophysiological patterns and acoustic oscillations in spoken language. We conclude that speech-specific neural networks show strong stability over adulthood, despite initial processes of cortical degeneration indicated by enhanced gamma power. The results of our study therefore support the view that sensory and cognitive processes undergo multidirectional trajectories in the context of healthy aging.
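Band-limited power of the kind compared across age groups here can be estimated by integrating a Welch power spectral density over the band of interest. The sketch below does this for the theta (4-8 Hz) and gamma (30-48 Hz) bands on a simulated channel; the study's actual time-frequency analysis is not reproduced.

```python
import numpy as np
from scipy.signal import welch

def band_power(x, sr, lo, hi):
    """Approximate band power by summing the Welch PSD between lo and hi Hz."""
    freqs, psd = welch(x, fs=sr, nperseg=sr * 2)
    band = (freqs >= lo) & (freqs <= hi)
    return psd[band].sum() * (freqs[1] - freqs[0])

sr = 250
t = np.arange(0, 30, 1 / sr)
rng = np.random.default_rng(3)
# Simulated EEG: 6 Hz theta rhythm + weak 40 Hz gamma + noise.
eeg = 10 * np.sin(2 * np.pi * 6 * t) + 1.5 * np.sin(2 * np.pi * 40 * t) + rng.normal(0, 2, t.size)

print("theta (4-8 Hz): ", band_power(eeg, sr, 4, 8))
print("gamma (30-48 Hz):", band_power(eeg, sr, 30, 48))
```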


Subjects
Speech Perception/physiology, Speech/physiology, Acoustic Stimulation, Adult, Age Factors, Aged, Auditory Perception/physiology, Electroencephalography/methods, Female, Functional Laterality, Humans, Male, Middle Aged, Neurons/physiology, Oscillometry, Spatio-Temporal Analysis
14.
J Acoust Soc Am ; 138(1): EL36-42, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26233058

ABSTRACT

In a between-subject perception task, listeners either identified full words or vowels isolated from these words at F0s between 220 and 880 Hz. They received two written words as response options (minimal pair with the stimulus vowel in contrastive position). Listeners' sensitivity (A') was extremely high in both conditions at all F0s, showing that the phonological function of vowels can also be maintained at high F0s. This indicates that vowel sounds may carry strong acoustic cues departing from common formant frequencies at high F0s and that listeners do not rely on consonantal context phenomena for their identification performance.
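One common non-parametric formulation of sensitivity A' is Grier's (1971); the sketch below uses it, on made-up hit and false-alarm rates, purely to illustrate the measure reported here.

```python
def a_prime(hit_rate, fa_rate):
    """Non-parametric sensitivity A' (Grier, 1971): 0.5 = chance, 1.0 = perfect."""
    if hit_rate >= fa_rate:
        return 0.5 + ((hit_rate - fa_rate) * (1 + hit_rate - fa_rate)) / \
               (4 * hit_rate * (1 - fa_rate))
    return 0.5 - ((fa_rate - hit_rate) * (1 + fa_rate - hit_rate)) / \
           (4 * fa_rate * (1 - hit_rate))

# Hypothetical listener: 96% hits, 4% false alarms at a given F0.
print(round(a_prime(0.96, 0.04), 3))
```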


Subjects
Phonetics, Speech Acoustics, Speech Perception/physiology, Adult, Analysis of Variance, Cues, Female, Humans, Male
15.
J Acoust Soc Am ; 137(3): 1513-28, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25786962

ABSTRACT

Between-speaker variability of acoustically measurable speech rhythm [%V, ΔV(ln), ΔC(ln), and Δpeak(ln)] was investigated when within-speaker variability of (a) articulation rate and (b) linguistic structural characteristics was introduced. To study (a), 12 speakers of Standard German read seven lexically identical sentences under five different intended tempo conditions (very slow, slow, normal, fast, very fast). To study (b), 16 speakers of Zurich Swiss German produced 16 spontaneous utterances each (256 in total) for which transcripts were made and then read by all speakers (4096 sentences; 16 speaker × 256 sentences). Between-speaker variability was tested using analysis of variance with repeated measures on within-speaker factors. Results revealed strong and consistent between-speaker variability while within-speaker variability as a function of articulation rate and linguistic characteristics was typically not significant. It was concluded that between-speaker variability of acoustically measurable speech rhythm is strong and robust against various sources of within-speaker variability. Idiosyncratic articulatory movements were found to be the most plausible factor explaining between-speaker differences.
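The rhythm measures listed are computed from the durations of vocalic and consonantal intervals: %V as the vocalic proportion of total duration and the Δ measures as standard deviations of raw or log-transformed interval durations. The sketch below follows those common definitions on toy durations; the log-transform reading of the (ln) variants is an assumption, and Δpeak(ln) is omitted.

```python
import numpy as np

def rhythm_measures(voc_dur, cons_dur):
    """%V, ΔV, ΔC and their log-duration variants from interval durations (s)."""
    voc, cons = np.asarray(voc_dur), np.asarray(cons_dur)
    percent_v = 100 * voc.sum() / (voc.sum() + cons.sum())
    return {
        "%V": percent_v,
        "deltaV": voc.std(ddof=1),
        "deltaC": cons.std(ddof=1),
        "deltaV_ln": np.log(voc).std(ddof=1),
        "deltaC_ln": np.log(cons).std(ddof=1),
    }

# Toy interval durations for one read sentence.
vocalic = [0.08, 0.12, 0.06, 0.15, 0.09]
consonantal = [0.05, 0.11, 0.07, 0.09, 0.13, 0.06]
print(rhythm_measures(vocalic, consonantal))
```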


Subjects
Periodicity, Phonetics, Speech Acoustics, Voice Quality, Acoustics, Adult, Analysis of Variance, Female, Humans, Male, Middle Aged, Signal Processing, Computer-Assisted, Sound Spectrography, Speech Production Measurement, Time Factors, Young Adult
16.
Forensic Sci Int ; 238: 59-67, 2014 May.
Article in English | MEDLINE | ID: mdl-24675042

ABSTRACT

Everyday experience tells us that it is often possible to identify a familiar speaker solely by his/her voice. Such observations reveal that speakers carry individual features in their voices. The present study examines how suprasegmental temporal features contribute to speaker-individuality. Based on data of a homogeneous group of Zurich German speakers, we conducted an experiment that included speaking style variability (spontaneous vs. read speech) and channel variability (high-quality vs. mobile phone-transmitted speech), both of which are characteristic of forensic casework. Speakers demonstrated high between-speaker variability in both read and spontaneous speech, and low within-speaker variability across the two speaking styles. Results further revealed that distortions of the type introduced by mobile telephony had little effect on suprasegmental temporal characteristics. Given this evidence of speaker-individuality, we discuss suprasegmental temporal features' potential for forensic voice comparison.


Subjects
Speech Acoustics, Voice, Adult, Cell Phone, Female, Forensic Sciences, Humans, Male, Models, Biological, Young Adult
17.
Brain Topogr ; 27(6): 786-800, 2014 Nov.
Article in English | MEDLINE | ID: mdl-24271979

ABSTRACT

Integrating visual and auditory language information is critical for reading. Suppression and congruency effects in audiovisual paradigms with letters and speech sounds have provided information about low-level mechanisms of grapheme-phoneme integration during reading. However, the central question of how such processes relate to reading entire words remains unexplored. Using ERPs, we investigated whether audiovisual integration for words already occurs in beginning readers and, if so, whether this integration is reflected in differences in map strength or topography (aim 1), and whether such integration is associated with reading fluency (aim 2). A 128-channel EEG was recorded while 69 monolingual (Swiss) German-speaking first-graders performed a detection task with rare targets. Stimuli were presented in blocks either auditorily (A), visually (V), or audiovisually (matching: AVM; nonmatching: AVN). Corresponding ERPs were computed, and the unimodal ERPs were summed (A + V = sumAV). We applied TANOVAs to identify time windows with significant integration effects: suppression (sumAV-AVM) and congruency (AVN-AVM). These effects were further characterized using GFP and 3D-centroid analyses, and significant effects were correlated with reading fluency. The results suggest that audiovisual suppression effects occur for familiar German and unfamiliar English words, whereas audiovisual congruency effects are found only for familiar German words, probably due to the lexical-semantic processes involved. Moreover, congruency effects were characterized by topographic differences, indicating that different sources are active during the processing of congruent compared with incongruent audiovisual words. Furthermore, no clear associations between audiovisual integration and reading fluency were found. The degree to which such associations develop in beginning readers remains open to further investigation.
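The suppression and congruency contrasts described (sumAV-AVM and AVN-AVM) reduce to differences between condition-averaged ERPs. The sketch below illustrates this on simulated data; the shapes, time window, and amplitudes are placeholders, not the study's parameters.

```python
import numpy as np

sr = 250
times = np.arange(-0.1, 0.7, 1 / sr)
rng = np.random.default_rng(4)

def erp(amp):
    """One condition-averaged ERP: noise plus a response peak near 400 ms."""
    return rng.normal(0, 0.5, times.size) + amp * np.exp(-((times - 0.4) / 0.08) ** 2)

# Condition averages for one subject (channels collapsed for simplicity).
A, V = erp(2.0), erp(1.5)
AVM, AVN = erp(2.8), erp(3.2)   # matching / nonmatching audiovisual words

sumAV = A + V                    # summed unimodal response
suppression = sumAV - AVM        # audiovisual suppression effect
congruency = AVN - AVM           # congruency effect

window = (times >= 0.3) & (times <= 0.5)
print("suppression:", suppression[window].mean())
print("congruency: ", congruency[window].mean())
```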


Subjects
Brain/physiology, Reading, Speech Perception/physiology, Visual Perception/physiology, Child, Evoked Potentials, Female, Humans, Male