Results 1 - 20 of 82
1.
IEEE Trans Multimedia ; 25: 4573-4585, 2023.
Article in English | MEDLINE | ID: mdl-37928617

ABSTRACT

Sound event detection is an important facet of audio tagging that aims to identify sounds of interest and define both the sound category and time boundaries for each sound event in a continuous recording. With advances in deep neural networks, there has been tremendous improvement in the performance of sound event detection systems, although at the expense of costly data collection and labeling efforts. In fact, current state-of-the-art methods employ supervised training that leverages large amounts of data samples and corresponding labels in order to facilitate identification of the sound category and time stamps of events. As an alternative, the current study proposes a semi-supervised method for generating pseudo-labels from unlabeled data using a student-teacher scheme that balances self-training and cross-training. Additionally, this paper explores post-processing, which extracts sound intervals from network predictions, for further improvement in sound event detection performance. The proposed approach is evaluated on the sound event detection task of the DCASE2020 challenge. The results of these methods on both the "validation" and "public evaluation" sets of the DESED database show significant improvement compared to state-of-the-art systems in semi-supervised learning.
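
The post-processing step mentioned above lends itself to a short illustration. The Python sketch below is not the authors' implementation; the threshold, filter length, and hop size are assumed values. It binarizes frame-wise network probabilities, median-filters them to remove spurious flips, and reads off event onset/offset times:

    import numpy as np
    from scipy.ndimage import median_filter

    def extract_events(probs, threshold=0.5, filt_len=7, hop_s=0.02):
        # probs: per-frame probabilities for one sound class from the network.
        # threshold/filt_len/hop_s are assumed values, tuned per class in practice.
        active = median_filter((probs > threshold).astype(int), size=filt_len) > 0
        edges = np.diff(active.astype(int), prepend=0, append=0)
        onsets = np.where(edges == 1)[0]
        offsets = np.where(edges == -1)[0]
        return [(on * hop_s, off * hop_s) for on, off in zip(onsets, offsets)]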

2.
J Neurosci ; 41(31): 6726-6739, 2021 08 04.
Article in English | MEDLINE | ID: mdl-34193552

ABSTRACT

The human brain extracts statistical regularities embedded in real-world scenes to sift through the complexity stemming from changing dynamics and entwined uncertainty along multiple perceptual dimensions (e.g., pitch, timbre, location). While there is evidence that sensory dynamics along different auditory dimensions are tracked independently by separate cortical networks, how these statistics are integrated to give rise to unified objects remains unknown, particularly in dynamic scenes that lack conspicuous coupling between features. Using tone sequences with stochastic regularities along spectral and spatial dimensions, this study examines behavioral and electrophysiological responses from human listeners (male and female) to changing statistics in auditory sequences and uses a computational model of predictive Bayesian inference to formulate multiple hypotheses for statistical integration across features. Neural responses reveal multiplexed brain responses reflecting both local statistics along individual features in frontocentral networks and global (object-level) processing in centroparietal networks. Independent tracking of local surprisal along each acoustic feature reveals linear modulation of neural responses, while global melody-level statistics follow a nonlinear integration of statistical beliefs across features to guide perception. Near-identical results are obtained in separate experiments along spectral and spatial acoustic dimensions, suggesting a common mechanism for statistical inference in the brain. Potential variations in statistical integration strategies and memory deployment shed light on individual variability between listeners in terms of behavioral efficacy and fidelity of neural encoding of stochastic change in acoustic sequences.

SIGNIFICANCE STATEMENT: The world around us is complex and ever changing: in everyday listening, sound sources evolve along multiple dimensions, such as pitch, timbre, and spatial location, and they exhibit emergent statistical properties that change over time. In the face of this complexity, the brain builds an internal representation of the external world by collecting statistics from the sensory input along multiple dimensions. Using a Bayesian predictive inference model, this work considers alternative hypotheses for how statistics are combined across sensory dimensions. Behavioral and neural responses from human listeners show the brain multiplexes two representations, where local statistics along each feature linearly affect neural responses, and global statistics nonlinearly combine statistical beliefs across dimensions to shape perception of stochastic auditory sequences.
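
As a rough illustration of the modeling approach, the sketch below computes per-feature surprisal under a running Gaussian estimate and fuses the two features with a max rule; both the Gaussian observer and the max fusion are simplifying assumptions standing in for the paper's full Bayesian model:

    import numpy as np

    def running_surprisal(x):
        # Surprisal of each tone under a running Gaussian estimate of one feature.
        s = np.zeros(len(x))
        for t in range(2, len(x)):
            mu, sd = x[:t].mean(), x[:t].std() + 1e-6
            s[t] = 0.5 * ((x[t] - mu) / sd) ** 2 + np.log(sd * np.sqrt(2 * np.pi))
        return s

    pitch, azimuth = np.random.randn(200), np.random.randn(200)  # toy sequences
    local = {'spectral': running_surprisal(pitch),               # per-feature statistics
             'spatial': running_surprisal(azimuth)}
    global_surprisal = np.maximum(local['spectral'], local['spatial'])  # one candidate nonlinear fusion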


Subjects
Auditory Perception/physiology, Brain/physiology, Computer Simulation, Models, Neurological, Acoustic Stimulation, Adult, Bayes Theorem, Electroencephalography, Female, Humans, Male, Nerve Net/physiology
3.
Sensors (Basel) ; 22(23)2022 Nov 23.
Article in English | MEDLINE | ID: mdl-36501787

ABSTRACT

Many commercial and prototype devices are available for capturing body sounds that provide important information on the health of the lungs and heart; however, a standardized method to characterize and compare these devices has not been agreed upon. Acoustic phantoms are commonly used because they generate repeatable sounds that couple to devices through a material layer that mimics the characteristics of skin. While multiple acoustic phantoms have been presented in the literature, it is unclear how design elements, such as the driver type and coupling layer, impact the acoustical characteristics of the phantom and, therefore, the device being measured. Here, a design of experiments approach is used to compare the frequency responses of various phantom constructions. An acoustic phantom that uses a loudspeaker to generate sound and excite a gelatin layer supported by a grid is determined to have a flatter and more uniform frequency response than other possible designs with a sound exciter and plate support. When measured on an optimal acoustic phantom, three devices are shown to have more consistent measurements with added weight and differing positions compared to a non-optimal phantom. Overall, the statistical models developed here provide greater insight into acoustic phantom design for improved device characterization.
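
To make the comparison criterion concrete, here is a minimal Python sketch of one way to score frequency-response flatness from a recorded broadband excitation; the band edges and FFT length are assumed, not taken from the paper:

    import numpy as np
    from scipy.signal import welch

    def flatness_db(recording, fs, fmin=50.0, fmax=2500.0):
        # Spread (in dB) of the response over the band of interest;
        # a lower value indicates a flatter, more uniform response.
        f, pxx = welch(recording, fs=fs, nperseg=4096)
        band = (f >= fmin) & (f <= fmax)
        return np.std(10.0 * np.log10(pxx[band] + 1e-20))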


Subjects
Acoustics, Sound, Equipment Design, Phantoms, Imaging, Gelatin
4.
J Neurophysiol ; 126(5): 1772-1782, 2021 11 01.
Article in English | MEDLINE | ID: mdl-34669503

ABSTRACT

The discrimination of complex sounds is a fundamental function of the auditory system. This operation must be robust in the presence of noise and acoustic clutter. Echolocating bats are auditory specialists that discriminate sonar objects in acoustically complex environments. Bats produce brief signals, interrupted by periods of silence, rendering echo snapshots of sonar objects. Sonar object discrimination requires that bats process spatially and temporally overlapping echoes to make split-second decisions. The mechanisms that enable this discrimination are not well understood, particularly in complex environments. We explored the neural underpinnings of sonar object discrimination in the presence of acoustic scattering caused by physical clutter. We performed electrophysiological recordings in the inferior colliculus (IC) of awake big brown bats in response to broadcasts of prerecorded echoes from physical objects. We acquired single-unit responses to echoes and discovered a subpopulation of IC neurons that encode acoustic features that can be used to discriminate between sonar objects. We further investigated the effects of environmental clutter on this population's encoding of acoustic features. We discovered that the effect of background clutter on sonar object discrimination is highly variable and depends on object properties and target-clutter spatiotemporal separation. In many conditions, clutter impaired discrimination of sonar objects. However, in some instances clutter enhanced acoustic features of echo returns, enabling higher levels of discrimination. This finding suggests that environmental clutter may augment acoustic cues used for sonar target discrimination and provides further evidence, in a growing body of literature, that noise is not universally detrimental to sensory encoding.

NEW & NOTEWORTHY: Bats are powerful animal models for investigating the encoding of auditory objects under acoustically challenging conditions. Although past work has considered the effect of acoustic clutter on sonar target detection, less is known about target discrimination in clutter. Our work shows that the neural encoding of auditory objects was affected by clutter in a distance-dependent manner. These findings advance our knowledge of auditory object detection and discrimination and noise-dependent stimulus enhancement.
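
One standard way to quantify the discrimination reported here is a d' statistic on single-unit spike counts; the sketch below is a simple stand-in for the authors' full analysis, not a reproduction of it:

    import numpy as np

    def dprime(counts_a, counts_b):
        # counts_a/counts_b: spike counts across trials for echoes of two objects.
        mu_a, mu_b = np.mean(counts_a), np.mean(counts_b)
        pooled_var = 0.5 * (np.var(counts_a) + np.var(counts_b))
        return (mu_a - mu_b) / np.sqrt(pooled_var + 1e-12)

    # Comparing d' with and without clutter, at each target-clutter separation,
    # asks the paper's question: does clutter impair or enhance discrimination?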


Subjects
Auditory Perception/physiology, Discrimination, Psychological/physiology, Echolocation/physiology, Electrophysiological Phenomena/physiology, Inferior Colliculi/physiology, Animals, Chiroptera, Noise
5.
PLoS Comput Biol ; 16(4): e1007746, 2020 04.
Article in English | MEDLINE | ID: mdl-32275706

ABSTRACT

Perceptual bistability, the spontaneous, irregular fluctuation of perception between two interpretations of a stimulus, occurs when observing a large variety of ambiguous stimulus configurations. This phenomenon has the potential to serve as a tool for, among other things, understanding how function varies across individuals, given the large individual differences that manifest during perceptual bistability. Yet it remains difficult to interpret the functional processes at work without knowing where bistability arises during perception. In this study we explore the hypothesis that bistability originates from multiple sources distributed across the perceptual hierarchy. We develop a hierarchical model of auditory processing comprising three distinct levels: a Peripheral, tonotopic analysis; a Central analysis computing features found more centrally in the auditory system; and an Object analysis, where sounds are segmented into different streams. We model bistable perception within this system by introducing adaptation, inhibition, and noise into one or all of the three levels of the hierarchy. We evaluate a large ensemble of variations of this hierarchical model, where each model has a different configuration of adaptation, inhibition, and noise. This approach avoids the assumption that a single configuration must be invoked to explain the data. Each model is evaluated based on its ability to replicate two hallmarks of bistability during auditory streaming: the selectivity of bistability to specific stimulus configurations, and the characteristic log-normal pattern of perceptual switches. Consistent with a distributed origin, a broad range of model parameters across this hierarchy leads to a plausible form of perceptual bistability.
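
A minimal single-level caricature of such a model: two mutually inhibiting populations with slow adaptation and noise, which already produces irregular alternations. All parameter values below are assumed for illustration and are not the paper's:

    import numpy as np

    def switch_durations(n_steps=200000, dt=0.001, beta=1.1, g_adapt=0.5,
                         tau_r=0.01, tau_a=2.0, sigma=0.15, seed=0):
        rng = np.random.default_rng(seed)
        r, a = np.array([0.6, 0.4]), np.zeros(2)
        dominant, t_last, durations = 0, 0.0, []
        for i in range(n_steps):
            drive = np.clip(1.0 - beta * r[::-1] - g_adapt * a, 0.0, None)
            r += dt / tau_r * (drive - r) + sigma * np.sqrt(dt) * rng.standard_normal(2)
            a += dt / tau_a * (r - a)               # slow adaptation of each unit
            if (r[1] > r[0]) != bool(dominant):     # dominance switched
                durations.append(i * dt - t_last)
                dominant, t_last = 1 - dominant, i * dt
        return np.array(durations)                  # typically well fit by a log-normal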


Subjects
Auditory Perception/physiology, Computational Biology/methods, Acoustic Stimulation/methods, Adult, Female, Humans, Male, Models, Statistical, Models, Theoretical, Noise, Sound, Visual Perception/physiology
6.
J Acoust Soc Am ; 150(4): 2952, 2021 10.
Article in English | MEDLINE | ID: mdl-34717500

ABSTRACT

Salience is the quality of a sensory signal that attracts involuntary attention in humans. While it primarily reflects conspicuous physical attributes of a scene, our understanding of the processes underlying what makes a certain object or event salient remains limited. In the vision literature, experimental results, theoretical accounts, and large amounts of eye-tracking data using rich stimuli have shed light on some of the underpinnings of visual salience in the brain. In contrast, studies of auditory salience have lagged behind due to limitations in both the experimental designs and the stimulus datasets used to probe the question of salience in complex everyday soundscapes. In this work, we deploy an online platform to study salience using a dichotic listening paradigm with natural auditory stimuli. The study validates crowdsourcing as a reliable platform for collecting behavioral responses to auditory salience by comparing experimental outcomes to findings acquired in a controlled laboratory setting. A model-based analysis demonstrates the benefits of extending behavioral measures of salience to a broader selection of auditory scenes and larger pools of subjects. Overall, this effort extends our current knowledge of auditory salience in everyday soundscapes and highlights the limitations of low-level acoustic attributes in capturing the richness of natural soundscapes.


Subjects
Auditory Perception, Crowdsourcing, Attention, Brain, Humans
7.
PLoS Comput Biol ; 15(1): e1006711, 2019 01.
Article in English | MEDLINE | ID: mdl-30668568

ABSTRACT

Our current understanding of how the brain segregates auditory scenes into meaningful objects is in line with a Gestalt framework. These Gestalt principles suggest a theory of how different attributes of the soundscape are extracted and then bound together into separate groups that reflect different objects or streams present in the scene. These cues are thought to reflect the underlying statistical structure of natural sounds, in a similar way that the statistics of natural images are closely linked to the principles that guide figure-ground segregation and object segmentation in vision. In the present study, we leverage inference in stochastic neural networks to learn emergent grouping cues directly from natural soundscapes, including speech, music, and sounds in nature. The model learns a hierarchy of local and global spectro-temporal attributes reminiscent of the simultaneous and sequential Gestalt cues that underlie the organization of auditory scenes. These mappings operate at multiple time scales to analyze an incoming complex scene and are then fused using a Hebbian network that binds together coherent features into perceptually segregated auditory objects. The proposed architecture successfully emulates a wide range of well-established auditory scene segregation phenomena and quantifies the complementary role of segregation and binding cues in driving auditory scene segregation.
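
The binding stage described above can be caricatured in a few lines: a Hebbian outer-product update that strengthens connections between feature channels whose activations coincide in time. A toy sketch with an assumed learning rate, not the paper's network:

    import numpy as np

    def hebbian_binding(features, eta=0.01):
        # features: (n_channels x n_frames) nonnegative feature activations.
        n = features.shape[0]
        W = np.zeros((n, n))
        for x in features.T:
            W += eta * np.outer(x, x)    # co-active channels strengthen
        np.fill_diagonal(W, 0.0)
        # Strongly connected blocks of W correspond to channels that bind
        # into the same perceptually segregated auditory object.
        return W / features.shape[1]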


Subjects
Auditory Perception/physiology, Models, Psychological, Pattern Recognition, Physiological/physiology, Acoustic Stimulation, Auditory Cortex/physiology, Cluster Analysis, Databases, Factual, Female, Gestalt Theory, Humans, Male, Music, Psychophysics, Speech
8.
PLoS Comput Biol ; 14(5): e1006162, 2018 05.
Article in English | MEDLINE | ID: mdl-29813049

ABSTRACT

Our ability to parse our acoustic environment relies on the brain's capacity to extract statistical regularities from surrounding sounds. Previous work in regularity extraction has predominantly focused on the brain's sensitivity to predictable patterns in sound sequences. However, natural sound environments are rarely completely predictable, often containing some level of randomness, yet the brain is able to effectively interpret its surroundings by extracting useful information from stochastic sounds. It has been previously shown that the brain is sensitive to the marginal lower-order statistics of sound sequences (i.e., mean and variance). In this work, we investigate the brain's sensitivity to higher-order statistics describing temporal dependencies between sound events through a series of change detection experiments, where listeners are asked to detect changes in randomness in the pitch of tone sequences. Behavioral data indicate that listeners collect statistical estimates to process incoming sounds, and a perceptual model based on Bayesian inference shows a capacity in the brain to track higher-order statistics. Further analysis of individual subjects' behavior indicates an important role of perceptual constraints in listeners' ability to track these sensory statistics with high fidelity. In addition, the inference model facilitates analysis of neural electroencephalography (EEG) responses, anchoring the analysis relative to the statistics of each stochastic stimulus. This reveals both a deviance response and a change-related disruption in phase of the stimulus-locked response that follow the higher-order statistics. These results shed light on the brain's ability to process stochastic sound sequences.
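
The higher-order statistics at issue are transition probabilities between sound events. A minimal Bayesian tracker for them keeps Dirichlet (add-alpha) counts over a first-order transition matrix and reports the surprisal of each tone; this is a stand-in for the paper's inference model, and the prior count alpha is an assumed value:

    import numpy as np

    def transition_surprisal(seq, n_states, alpha=1.0):
        # seq: sequence of discretized pitch values (integers in 0..n_states-1).
        counts = np.full((n_states, n_states), alpha)
        s = np.zeros(len(seq))
        for t in range(1, len(seq)):
            prev, cur = seq[t - 1], seq[t]
            s[t] = -np.log(counts[prev, cur] / counts[prev].sum())
            counts[prev, cur] += 1      # update beliefs after each observation
        return s

    # A change toward randomness shows up as a sustained rise in surprisal,
    # the kind of change listeners were asked to detect.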


Subjects
Auditory Perception/physiology, Evoked Potentials, Auditory/physiology, Psychophysics/methods, Adolescent, Adult, Attention/physiology, Brain/physiology, Computational Biology, Electroencephalography, Female, Humans, Male, Sound, Stochastic Processes, Young Adult
9.
Acta Acust United Acust ; 105(1): 1-4, 2019.
Article in English | MEDLINE | ID: mdl-31929768

ABSTRACT

To understand our surroundings, we effortlessly parse our sound environment into sound sources, extracting invariant information, or regularities, over time to build an internal representation of the world around us. Previous experimental work has shown the brain is sensitive to many types of regularities in sound, but theoretical models that capture underlying principles of regularity tracking across diverse sequence structures have been few and far between. Existing efforts often focus on sound patterns rather than the stochastic nature of sequences. In the current study, we employ a perceptual model for regularity extraction based on a Bayesian framework that posits the brain collects statistical information over time. We show this model can be used to simulate various results from the literature with stimuli exhibiting a wide range of predictability. This model can provide a useful tool both for interpreting existing experimental results under a unified model and for providing predictions for new ones using more complex stimuli.

10.
J Acoust Soc Am ; 141(3): 2163, 2017 03.
Article in English | MEDLINE | ID: mdl-28372080

ABSTRACT

Salience describes the phenomenon by which an object stands out from a scene. While its underlying processes are extensively studied in vision, mechanisms of auditory salience remain largely unknown. Previous studies have used well-controlled auditory scenes to shed light on some of the acoustic attributes that drive the salience of sound events. Unfortunately, the use of constrained stimuli in addition to a lack of well-established benchmarks of salience judgments hampers the development of comprehensive theories of sensory-driven auditory attention. The present study explores auditory salience in a set of dynamic natural scenes. A behavioral measure of salience is collected by having human volunteers listen to two concurrent scenes and indicate continuously which one attracts their attention. By using natural scenes, the study takes a data-driven rather than experimenter-driven approach to exploring the parameters of auditory salience. The findings indicate that the space of auditory salience is multidimensional (spanning loudness, pitch, spectral shape, as well as other acoustic attributes), nonlinear and highly context-dependent. Importantly, the results indicate that contextual information about the entire scene over both short and long scales needs to be considered in order to properly account for perceptual judgments of salience.


Subjects
Attention, Auditory Pathways/physiology, Auditory Perception, Environment, Sound, Acoustic Stimulation, Adolescent, Adult, Dichotic Listening Tests, Female, Humans, Judgment, Loudness Perception, Male, Music, Photic Stimulation, Pitch Perception, Psychoacoustics, Pupil/physiology, Speech, Visual Perception, Young Adult
11.
PLoS Comput Biol ; 10(12): e1003985, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25521593

ABSTRACT

A new approach for the segregation of monaural sound mixtures is presented based on the principle of temporal coherence and using auditory cortical representations. Temporal coherence is the notion that perceived sources emit coherently modulated features that evoke highly coincident neural response patterns. By clustering the feature channels with coincident responses and reconstructing their input, one may segregate the underlying source from the simultaneously interfering signals that are uncorrelated with it. The proposed algorithm requires no prior information or training on the sources. It can, however, gracefully incorporate cognitive functions and influences such as memories of a target source or attention to a specific set of its attributes so as to segregate it from its background. Aside from its unusual structure and computational innovations, the proposed model provides testable hypotheses of the physiological mechanisms of this ubiquitous and remarkable perceptual ability, and of its psychophysical manifestations in navigating complex sensory environments.
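
In spirit, the clustering step can be sketched in a few lines: correlate channel envelopes, group channels whose responses coincide, and keep one group's channels for reconstruction. This is a toy two-source reduction under assumed settings, not the paper's algorithm:

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    def coherence_cluster(channels, n_sources=2):
        # channels: (n_channels x n_frames) feature-channel envelopes.
        C = np.nan_to_num(np.corrcoef(channels))         # pairwise coincidence
        D = 1.0 - C                                      # coincident channels = close
        Z = linkage(D[np.triu_indices_from(D, k=1)], method='average')
        labels = fcluster(Z, t=n_sources, criterion='maxclust')
        mask = labels == 1                               # channels of one putative source
        return labels, channels * mask[:, None]          # masked input for reconstruction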


Subjects
Auditory Cortex/physiology, Auditory Perception/physiology, Models, Neurological, Acoustic Stimulation/classification, Algorithms, Female, Humans, Male, Noise, Speech, Time Factors
12.
J Acoust Soc Am ; 137(2): 911-22, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25698023

ABSTRACT

Listeners' ability to discriminate unfamiliar voices is often susceptible to the effects of manipulations of acoustic characteristics of the utterances. This vulnerability was quantified within a task in which participants determined if two utterances were spoken by the same or different speakers. Results of this task were analyzed in relation to a set of historical and novel parameters in order to hypothesize the role of those parameters in the decision process. Listener performance was first measured in a baseline task with unmodified stimuli, and then compared to responses with resynthesized stimuli under three conditions: (1) normalized mean-pitch; (2) normalized duration; and (3) normalized linear predictive coefficients (LPCs). The results of these experiments suggest that perceptual speaker discrimination is robust to acoustic changes, though mean-pitch and LPC modifications are more detrimental to a listener's ability to successfully identify same or different speaker pairings. However, this susceptibility was also found to be partially dependent on the specific speaker and utterances.
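
Condition (3) rests on LPC analysis, which is compact enough to spell out. The sketch below implements the autocorrelation method with the Levinson-Durbin recursion; the window and model order are assumed choices, not the study's settings:

    import numpy as np

    def lpc(frame, order=12):
        frame = frame * np.hamming(len(frame))
        r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
        a, err = np.array([1.0]), r[0]
        for i in range(1, order + 1):
            k = -(r[i] + np.dot(a[1:], r[i - 1:0:-1])) / err  # reflection coefficient
            a = np.append(a, 0.0)
            a = a + k * a[::-1]                               # Levinson-Durbin update
            err *= 1.0 - k * k
        return a, err   # prediction coefficients and residual energy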


Subjects
Cues, Discrimination, Psychological, Speech Acoustics, Speech Perception, Voice Quality, Acoustic Stimulation, Acoustics, Adolescent, Audiometry, Speech, Female, Humans, Judgment, Male, Pattern Recognition, Physiological, Pitch Perception, Recognition, Psychology, Time Factors, Young Adult
13.
PLoS Comput Biol ; 9(3): e1002982, 2013.
Article in English | MEDLINE | ID: mdl-23555217

ABSTRACT

The processing characteristics of neurons in the central auditory system are directly shaped by and reflect the statistics of natural acoustic environments, but the principles that govern the relationship between natural sound ensembles and observed responses in neurophysiological studies remain unclear. In particular, accumulating evidence suggests the presence of a code based on sustained neural firing rates, where central auditory neurons exhibit strong, persistent responses to their preferred stimuli. Such a strategy can indicate the presence of ongoing sounds, is involved in parsing complex auditory scenes, and may play a role in matching neural dynamics to varying time scales in acoustic signals. In this paper, we describe a computational framework for exploring the influence of a code based on sustained firing rates on the shape of the spectro-temporal receptive field (STRF), a linear kernel that maps a spectro-temporal acoustic stimulus to the instantaneous firing rate of a central auditory neuron. We demonstrate the emergence of richly structured STRFs that capture the structure of natural sounds over a wide range of timescales, and show how the emergent ensembles resemble those commonly reported in physiological studies. Furthermore, we compare ensembles that optimize a sustained firing code with one that optimizes a sparse code, another widely considered coding strategy, and suggest how the resulting population responses are not mutually exclusive. Finally, we demonstrate how the emergent ensembles contour the high-energy spectro-temporal modulations of natural sounds, forming a discriminative representation that captures the full range of modulation statistics that characterize natural sound ensembles. These findings have direct implications for our understanding of how sensory systems encode the informative components of natural stimuli and potentially facilitate multi-sensory integration.
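
The STRF's role as a linear kernel can be written out directly: the predicted rate is the spectrogram filtered by the kernel and summed over frequency. A minimal sketch (with a final rectification that the linear definition itself does not include):

    import numpy as np
    from scipy.signal import fftconvolve

    def strf_predict(strf, spec):
        # strf: (n_freq x n_lags) kernel; spec: (n_freq x n_time) spectrogram.
        # rate(t) = sum_f sum_tau strf[f, tau] * spec[f, t - tau]
        n_time = spec.shape[1]
        rate = np.zeros(n_time)
        for f in range(strf.shape[0]):
            rate += fftconvolve(spec[f], strf[f], mode='full')[:n_time]
        return np.maximum(rate, 0.0)   # firing rates are nonnegative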


Subjects
Auditory Cortex/physiology, Auditory Perception/physiology, Computational Biology/methods, Models, Neurological, Neurons/physiology, Acoustic Stimulation, Animals, Auditory Cortex/cytology, Cluster Analysis, Female, Ferrets, Humans, Male, Noise, Speech, Vocalization, Animal
14.
Lung ; 192(5): 765-73, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24943262

ABSTRACT

PURPOSE: Lung auscultation has long been a standard of care for the diagnosis of respiratory diseases. Recent advances in electronic auscultation and signal processing have yet to find clinical acceptance; however, computerized lung sound analysis may be ideal for pediatric populations in settings where skilled healthcare providers are commonly unavailable. We described features of normal lung sounds in young children using a novel signal processing approach to lay a foundation for identifying pathologic respiratory sounds. METHODS: 186 healthy children with normal pulmonary exams and without respiratory complaints were enrolled at a tertiary care hospital in Lima, Peru. Lung sounds were recorded at eight thoracic sites using a digital stethoscope. 151 (81%) of the recordings were eligible for further analysis. Heavy-crying segments were automatically rejected, and features extracted from spectral and temporal signal representations contributed to profiling of the lung sounds. RESULTS: Mean age, height, and weight among study participants were 2.2 years (SD 1.4), 84.7 cm (SD 13.2), and 12.0 kg (SD 3.6), respectively; 47% were boys. We identified ten distinct spectral and spectro-temporal signal parameters; most demonstrated linear relationships with age, height, and weight, while no differences between sexes were noted. Older children had a faster-decaying spectrum than younger ones. Features like spectral peak width, lower-frequency Mel-frequency cepstral coefficients, and spectro-temporal modulations also showed variations with recording site. CONCLUSIONS: Features extracted from lung sounds varied significantly with child characteristics and recording site. A comparison with adult studies revealed differences in the extracted features for children. While sound-reduction techniques will improve analysis, we offer a novel, reproducible tool for sound analysis in real-world environments.
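
Among the features mentioned, Mel-frequency cepstral coefficients are straightforward to reproduce. A short sketch using librosa; the file name is a placeholder and the coefficient count is an assumed choice:

    import numpy as np
    import librosa

    y, sr = librosa.load('lung_recording.wav', sr=None)   # placeholder path
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # (13 x n_frames)
    # Profile a recording site by each coefficient's mean and spread, the kind
    # of summary features compared across age, height, weight, and site.
    profile = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])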


Subjects
Auscultation/standards, Lung/physiology, Respiratory Sounds, Age Factors, Auscultation/instrumentation, Body Height, Body Weight, Child, Preschool, Female, Humans, Infant, Male, Peru, Predictive Value of Tests, Reference Values, Sex Factors, Signal Processing, Computer-Assisted, Sound Spectrography, Stethoscopes/standards, Time Factors
16.
bioRxiv ; 2024 May 29.
Article in English | MEDLINE | ID: mdl-38854125

ABSTRACT

Binding the attributes of a sensory source is necessary to perceive it as a unified entity, one that can be attended to and extracted from its surrounding scene. In auditory perception, this is the essence of the cocktail party problem, in which a listener segregates one speaker from a mixture of voices, or a musical stream from simultaneous others. It is postulated that coherence of the temporal modulations of a source's features is necessary to bind them. The focus of this study is on the role of temporal coherence in binding and segregation, specifically as evidenced by the neural correlates of rapid plasticity that enhance cortical responses among synchronized neurons while suppressing them among desynchronized ones. In a first experiment, we find that attention to a sound sequence rapidly binds it to other coherent sequences while suppressing nearby incoherent sequences, thus enhancing the contrast between the two groups. In a second experiment, a sequence of synchronized multi-tone complexes, embedded in a background cloud of randomly dispersed, desynchronized tones, perceptually and neurally pops out after a fraction of a second, highlighting the binding among its coherent tones against the incoherent background. These findings demonstrate the role of temporal coherence in binding and segregation.

17.
bioRxiv ; 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38915590

ABSTRACT

Segregation of complex sounds such as speech, music, and animal vocalizations as they simultaneously emanate from multiple sources (referred to as the "cocktail party problem") is a remarkable ability that is common in humans and animals alike. The neural underpinnings of this process have been extensively studied behaviorally and physiologically in non-human animals, primarily with simplified sounds (tones and noise sequences). In humans, segregation experiments utilizing more complex speech mixtures are common; but physiological experiments have relied on EEG/MEG/ECoG recordings that sample activity from thousands of neurons, often obscuring the detailed processes that give rise to the observed segregation. The present study combines the insights from animal single-unit physiology with segregation of speech-like mixtures. Ferrets were trained to attend to a female voice and detect a target word, both in the presence and absence of a concurrent, equally salient male voice. Single-neuron recordings were obtained from primary and secondary ferret auditory cortical fields, as well as from frontal cortex. During task performance, representation of the female words became more enhanced relative to those of the (distractor) male in all cortical regions, especially in the higher auditory cortical field. Analysis of the temporal and spectral response characteristics during task performance reveals how speech segregation gradually emerges in the auditory cortex. A computational model evaluated on the same voice mixtures replicates and extends these results to different attentional targets (attention to the female or male voice). These findings are consistent with the temporal coherence theory, whereby attention to a target voice anchors neural activity in cortical networks, thereby binding together channels that are coherently temporally modulated with the target and ultimately forming a common auditory stream.

18.
Open Mind (Camb) ; 8: 333-365, 2024.
Article in English | MEDLINE | ID: mdl-38571530

ABSTRACT

Theories of auditory and visual scene analysis suggest the perception of scenes relies on the identification and segregation of objects within them, resembling a detail-oriented processing style. However, a more global process may occur while analyzing scenes, as has been evidenced in the visual domain. To our knowledge, a similar line of research has not been explored in the auditory domain; therefore, we evaluated the contributions of high-level global and low-level acoustic information to auditory scene perception. An additional aim was to increase the field's ecological validity by using and making available a new collection of high-quality auditory scenes. Participants rated scenes on 8 global properties (e.g., open vs. enclosed), and an acoustic analysis evaluated which low-level features predicted the ratings. We submitted the acoustic measures and average ratings of the global properties to separate exploratory factor analyses (EFAs). The EFA of the acoustic measures revealed a seven-factor structure explaining 57% of the variance in the data, while the EFA of the global property measures revealed a two-factor structure explaining 64% of the variance in the data. Regression analyses revealed each global property was predicted by at least one acoustic variable (R2 = 0.33-0.87). These findings were extended using deep neural network models, where we examined correlations between human ratings of global properties and deep embeddings from two computational models: an object-based model and a scene-based model. The results suggest that participants' ratings are more strongly explained by a global analysis of the scene setting, though the relationship between scene perception and auditory perception is multifaceted, with differing correlation patterns evident between the two models. Taken together, our results provide evidence for the ability to perceive auditory scenes from a global perspective. Some of the acoustic measures predicted ratings of global scene perception, suggesting representations of auditory objects may be transformed through many stages of processing in the ventral auditory stream, similar to what has been proposed in the ventral visual stream. These findings and the open availability of our scene collection will make future studies on perception, attention, and memory for natural auditory scenes possible.
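
The analysis pipeline (EFA on the acoustic measures, then regression of each global property on the result) can be sketched with scikit-learn; the random arrays below are placeholders for the scene measures and ratings, not the study's data:

    import numpy as np
    from sklearn.decomposition import FactorAnalysis
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 20))   # scenes x acoustic measures (placeholder)
    g = rng.standard_normal(200)         # mean rating of one global property

    scores = FactorAnalysis(n_components=7).fit_transform(X)  # seven-factor structure
    model = LinearRegression().fit(scores, g)
    print('R2 =', model.score(scores, g))  # variance in the rating explained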

19.
PLoS Comput Biol ; 8(11): e1002759, 2012.
Article in English | MEDLINE | ID: mdl-23133363

ABSTRACT

Timbre is the attribute of sound that allows humans and other animals to distinguish among different sound sources. Studies based on psychophysical judgments of musical timbre, ecological analyses of sounds' physical characteristics, as well as machine learning approaches have all suggested that timbre is a multifaceted attribute that invokes both spectral and temporal sound features. Here, we explored the neural underpinnings of musical timbre. We used a neuro-computational framework based on spectro-temporal receptive fields, recorded from over a thousand neurons in the mammalian primary auditory cortex as well as from simulated cortical neurons, augmented with a nonlinear classifier. The model was able to perform robust instrument classification irrespective of pitch and playing style, with an accuracy of 98.7%. Using the same front end, the model was also able to reproduce perceptual distance judgments between timbres as perceived by human listeners. The study demonstrates that joint spectro-temporal features, such as those observed in the mammalian primary auditory cortex, are critical to provide a representation rich enough to account for perceptual judgments of timbre by human listeners, as well as for the recognition of musical instruments.
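
The perceptual-distance comparison admits a compact sketch: compute pairwise distances between time-averaged model features for each instrument and correlate them with human dissimilarity judgments. The arrays below are placeholders standing in for the model outputs and behavioral data:

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import spearmanr

    rng = np.random.default_rng(1)
    feats = rng.standard_normal((10, 128))  # instruments x model features (placeholder)
    human = rng.random(45)                  # condensed human dissimilarities, 10*9/2 pairs

    model_dist = pdist(feats)               # pairwise Euclidean distances
    rho, p = spearmanr(model_dist, human)   # agreement with perceptual judgments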


Subjects
Auditory Cortex/physiology, Auditory Perception/physiology, Models, Neurological, Music, Acoustic Stimulation, Adult, Algorithms, Computational Biology, Female, Humans, Judgment/physiology, Male, Psychophysics, Recognition, Psychology/physiology, Sound
20.
Adv Exp Med Biol ; 787: 535-43, 2013.
Article in English | MEDLINE | ID: mdl-23716261

ABSTRACT

Humans and other animals can attend to one of multiple sounds, and follow it selectively over time. The neural underpinnings of this perceptual feat remain mysterious. Some studies have concluded that sounds are heard as separate streams when they activate well-separated populations of central auditory neurons, and that this process is largely pre-attentive. Here, we propose instead that stream formation depends primarily on temporal coherence between responses that encode various features of a sound source. Furthermore, we postulate that only when attention is directed toward a particular feature (e.g., pitch or location) do all other temporally coherent features of that source (e.g., timbre and location) become bound together as a stream that is segregated from the incoherent features of other sources. Experimental neurophysiological evidence in support of this hypothesis will be presented. The focus, however, will be on a computational realization of this idea and a discussion of the insights learned from simulations to disentangle complex sound sources such as speech and music. The model consists of a representational stage of early and cortical auditory processing that creates a multidimensional depiction of various sound attributes such as pitch, location, and spectral resolution. The following stage computes a coherence matrix that summarizes the pairwise correlations between all channels making up the cortical representation. Finally, the perceived segregated streams are extracted by decomposing the coherence matrix into its uncorrelated components. Questions raised by the model are discussed, especially on the role of attention in streaming and the search for further neural correlates of streaming percepts.
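
The final stage, decomposing the coherence matrix into uncorrelated components, maps naturally onto an eigendecomposition. A toy sketch that treats each leading eigenvector as the channel weights of one stream; the two-stream cutoff is an assumed choice:

    import numpy as np

    def stream_weights(channels, n_streams=2):
        # channels: (n_channels x n_frames) cortical-representation outputs.
        C = np.nan_to_num(np.corrcoef(channels))  # pairwise coherence matrix
        _, vecs = np.linalg.eigh(C)               # eigenvalues in ascending order
        return vecs[:, :-n_streams - 1:-1]        # strongest components first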


Subjects
Attention/physiology, Auditory Cortex/physiology, Auditory Perception/physiology, Models, Neurological, Acoustic Stimulation/methods, Acoustics, Animals, Auditory Pathways/physiology, Ferrets, Humans, Pitch Perception/physiology, Sound Localization/physiology, Time Perception/physiology