ABSTRACT
Hippocampal activity represents many behaviorally important variables, including context, an animal's location within a given environmental context, time, and reward. Using longitudinal calcium imaging in mice, multiple large virtual environments, and differing reward contingencies, we derived a unified probabilistic model of CA1 representations centered on a single feature: the field propensity. Each cell's propensity governs how many place fields it has per unit space, predicts its reward-related activity, and is preserved across distinct environments and over months. Propensity is broadly distributed (with many low-propensity and some very high-propensity cells) and thus strongly shapes hippocampal representations. This results in a range of spatial codes, from sparse to dense. Propensity varied ~10-fold between adjacent cells in salt-and-pepper fashion, indicating substantial functional differences within a presumed cell type. Intracellular recordings linked propensity to cell excitability. The stability of each cell's propensity across conditions suggests this fundamental property has anatomical, transcriptional, and/or developmental origins.
Subjects
Hippocampus/anatomy & histology; Hippocampus/physiology; Animals; Behavior, Animal/physiology; Biophysical Phenomena; Calcium/metabolism; Male; Mice, Inbred C57BL; Models, Neurological; Pyramidal Cells/physiology; Reward; Task Performance and Analysis; Time Factors
ABSTRACT
We propose that coding and decoding in the brain are achieved through digital computation using three principles: relative ordinal coding of inputs, random connections between neurons, and belief voting. Due to randomization and despite the coarseness of the relative codes, we show that these principles are sufficient for coding and decoding sequences with error-free reconstruction. In particular, the number of neurons needed grows only linearly while the size of the input repertoire grows exponentially. We illustrate our model by reconstructing sequences with repertoires on the order of a billion items. From this, we derive the Shannon equations for the capacity limit to learn and transfer information in the neural population, which is then generalized to any type of neural network. Following the maximum entropy principle of efficient coding, we show that random connections serve to decorrelate redundant information in incoming signals, creating more compact codes for neurons and therefore conveying a larger amount of information. Hence, despite the unreliability of the relative codes, only a few neurons are necessary to discriminate the original signal without error. Finally, we discuss the significance of this digital computation model with regard to neurobiological findings in the brain and, more generally, to artificial intelligence algorithms, with a view toward a neural information theory and the design of digital neural networks.
Subjects
Artificial Intelligence; Brain; Models, Neurological; Algorithms; Brain/physiology; Neural Networks, Computer; Neurons/physiology
ABSTRACT
The mechanisms involved in transforming early visual signals to curvature representations in V4 are unknown. We propose a hierarchical model that reveals V1/V2 encodings that are essential components for this transformation to the reported curvature representations in V4. Then, by relaxing the often-imposed prior of a single Gaussian, V4 shape selectivity is learned in the last layer of the hierarchy from Macaque V4 responses. We found that V4 cells integrate multiple shape parts from the full spatial extent of their receptive fields with similar excitatory and inhibitory contributions. Our results uncover new details in existing data about shape selectivity in V4 neurons that with additional experiments can enhance our understanding of processing in this area. Accordingly, we propose designs for a stimulus set that allow removing shape parts without disturbing the curvature signal to isolate part contributions to V4 responses. SIGNIFICANCE STATEMENT: Selectivity to convex and concave shape parts in V4 neurons has been repeatedly reported. Nonetheless, the mechanisms that yield such selectivities in the ventral stream remain unknown. We propose a hierarchical computational model that incorporates findings of the various visual areas involved in shape processing and suggest mechanisms that transform the shape signal from low-level features to convex/concave part representations. Learning shape selectivity from Macaque V4 responses in the final processing stage in our model, we found that V4 neurons integrate shape parts from the full spatial extent of their receptive field with both facilitatory and inhibitory contributions. These results reveal hidden information in existing V4 data that with additional experiments can enhance our understanding of processing in V4.
Subjects
Form Perception; Visual Cortex; Animals; Visual Cortex/physiology; Form Perception/physiology; Macaca; Neurons/physiology; Brain; Visual Pathways/physiology; Photic Stimulation
ABSTRACT
OBJECTIVE: The primary objective of our study is to address the challenge of confidentially sharing medical images across different centers. This is often a critical necessity in both clinical and research environments, yet restrictions typically exist due to privacy concerns. Our aim is to design a privacy-preserving data-sharing mechanism that allows medical images to be stored as encoded and obfuscated representations in the public domain without revealing any useful or recoverable content from the images. In tandem, we aim to provide authorized users with compact private keys that could be used to reconstruct the corresponding images. METHOD: Our approach involves utilizing a neural auto-encoder. The convolutional filter outputs are passed through sparsifying transformations to produce multiple compact codes. Each code is responsible for reconstructing different attributes of the image. The key privacy-preserving element in this process is obfuscation through the use of specific pseudo-random noise. When applied to the codes, it becomes computationally infeasible for an attacker to guess the correct representation for all the codes, thereby preserving the privacy of the images. RESULTS: The proposed framework was implemented and evaluated using chest X-ray images for different medical image analysis tasks, including classification, segmentation, and texture analysis. Additionally, we thoroughly assessed the robustness of our method against various attacks using both supervised and unsupervised algorithms. CONCLUSION: This study provides a novel, optimized, and privacy-assured data-sharing mechanism for medical images, enabling multi-party sharing in a secure manner. While we have demonstrated its effectiveness with chest X-ray images, the mechanism can be applied to other medical imaging modalities as well.
Subjects
Algorithms; Privacy; Information Dissemination
ABSTRACT
The nervous system is under tight energy constraints and must represent information efficiently. This is particularly relevant in the dorsal part of the medial superior temporal area (MSTd) in primates where neurons encode complex motion patterns to support a variety of behaviors. A sparse decomposition model based on a dimensionality reduction principle known as non-negative matrix factorization (NMF) was previously shown to account for a wide range of monkey MSTd visual response properties. This model resulted in sparse, parts-based representations that could be regarded as basis flow fields, a linear superposition of which accurately reconstructed the input stimuli. This model provided evidence that the seemingly complex response properties of MSTd may be a by-product of MSTd neurons performing dimensionality reduction on their input. However, an open question is how a neural circuit could carry out this function. In the current study, we propose a spiking neural network (SNN) model of MSTd based on evolved spike-timing-dependent plasticity and homeostatic synaptic scaling (STDP-H) learning rules. We demonstrate that the SNN model learns compressed and efficient representations of the input patterns similar to the patterns that emerge from NMF, resulting in MSTd-like receptive fields observed in monkeys. This SNN model suggests that STDP-H observed in the nervous system may be performing a similar function as NMF with sparsity constraints, which provides a test bed for mechanistic theories of how MSTd may efficiently encode complex patterns of visual motion to support robust self-motion perception. SIGNIFICANCE STATEMENT: The brain may use dimensionality reduction and sparse coding to efficiently represent stimuli under metabolic constraints. Neurons in monkey area MSTd respond to complex optic flow patterns resulting from self-motion. We developed a spiking neural network model that showed MSTd-like response properties can emerge from evolving spike-timing-dependent plasticity with STDP-H parameters of the connections between the middle temporal area and MSTd. Simulated MSTd neurons formed a sparse, reduced population code capable of encoding perceptual variables important for self-motion perception. This model demonstrates that complex neuronal responses observed in MSTd may emerge from efficient coding and suggests that neurobiological plasticity, like STDP-H, may contribute to reducing the dimensions of input stimuli and allowing spiking neurons to learn sparse representations.
Subjects
Motion Perception; Animals; Haplorhini; Models, Neurological; Motion Perception/physiology; Neural Networks, Computer; Neuronal Plasticity/physiology; Neurons/physiology; Photic Stimulation/methods; Primates; Temporal Lobe/physiology
ABSTRACT
Foraging is a vital behavioral task for living organisms. Behavioral strategies and abstract mathematical models thereof have been described in detail for various species. To explore the link between underlying neural circuits and computational principles, we present how a biologically detailed neural circuit model of the insect mushroom body implements sensory processing, learning, and motor control. We focus on cast and surge strategies employed by flying insects when foraging within turbulent odor plumes. Using a spike-based plasticity rule, the model rapidly learns to associate individual olfactory sensory cues paired with food in a classical conditioning paradigm. We show that, without retraining, the system dynamically recalls memories to detect relevant cues in complex sensory scenes. Accumulation of this sensory evidence on short time scales generates cast-and-surge motor commands. Our generic systems approach predicts that population sparseness facilitates learning, while temporal sparseness is required for dynamic memory recall and precise behavioral control. Our work successfully combines biological computational principles with spike-based machine learning. It shows how knowledge transfer from static to arbitrarily complex dynamic conditions can be achieved by foraging insects and may serve as inspiration for agent-based machine learning.
Subjects
Insecta/physiology; Models, Neurological; Neurons/physiology; Olfactory Receptor Neurons/physiology; Animals; Artificial Intelligence; Chemotaxis; Computational Biology; Computer Simulation; Conditioning, Classical; Drosophila melanogaster/physiology; Machine Learning; Memory/physiology; Mushroom Bodies/physiology; Neural Networks, Computer; Smell
ABSTRACT
(1) Background: The ability to recognize identities is an essential component of security. Electrocardiogram (ECG) signals have gained popularity for identity recognition because of their universal, unique, stable, and measurable characteristics. To ensure accurate identification of ECG signals, this paper proposes an approach that involves mixed feature sampling, sparse representation, and recognition. (2) Methods: This paper introduces a new method of identifying individuals through their ECG signals. The technique combines the extraction of fixed ECG features and specific frequency features to improve accuracy in ECG identity recognition. It uses the wavelet transform to extract frequency bands that contain personal information features from the ECG signals. These bands are reconstructed, and single R-peak localization determines the ECG window. The signals are segmented and standardized based on the located windows. A sparse dictionary is created using the standardized ECG signals, and the K-SVD (K-singular value decomposition) algorithm is employed to project ECG target signals into a sparse vector-matrix representation. To extract the final representation of the target signals for identification, the sparse coefficient vectors of the signals are max-pooled. For recognition, the co-dimensional bundle search method is used. (3) Results: This study uses the publicly available European ST-T database. Specifically, ECG signals from 20, 50, and 70 subjects were selected, each with 30 testing segments. The proposed method achieved recognition rates of 99.14%, 99.09%, and 99.05%, respectively. (4) Conclusion: The experiments indicate that the proposed method can accurately capture, represent, and identify ECG signals.
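The pooling step in the Methods can be illustrated with a minimal sketch. It assumes each ECG segment has already been encoded as a sparse coefficient vector over a shared dictionary, and that pooling takes the element-wise maximum of absolute coefficients. The function name `max_pool_codes`, the use of absolute values, and the example numbers are assumptions for illustration, not details given in the abstract.

```python
def max_pool_codes(codes):
    """Element-wise max pooling over a list of sparse coefficient vectors.

    codes: list of equal-length lists, one sparse vector per ECG segment.
    Returns a single pooled vector.
    Assumption: pooling is done on absolute coefficient values.
    """
    if not codes:
        raise ValueError("need at least one coefficient vector")
    length = len(codes[0])
    if any(len(c) != length for c in codes):
        raise ValueError("all coefficient vectors must have the same length")
    # For each dictionary atom j, keep the largest magnitude seen in any segment.
    return [max(abs(c[j]) for c in codes) for j in range(length)]


# Three hypothetical segment codes over a 4-atom dictionary.
segment_codes = [
    [0.0, 0.8, 0.0, -0.1],
    [0.0, 0.5, 0.0, -0.6],
    [0.2, 0.0, 0.0, 0.0],
]
pooled = max_pool_codes(segment_codes)
```

The pooled vector has one entry per dictionary atom regardless of how many segments were recorded, which is what makes it usable as a fixed-length descriptor for a subsequent matching step.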
Subjects
Biometric Identification; Humans; Biometric Identification/methods; Algorithms; Electrocardiography/methods; Wavelet Analysis; Databases, Factual
ABSTRACT
Primary motor cortex (M1) undergoes protracted development in mammals, functioning initially as a sensory structure. Throughout the first postnatal week in rats, M1 is strongly activated by self-generated forelimb movements, especially by the twitches that occur during active sleep. Here, we quantify the kinematic features of forelimb movements to reveal receptive-field properties of individual units within the forelimb region of M1. At postnatal day 8 (P8), nearly all units were strongly modulated by movement amplitude, especially during active sleep. By P12, only a minority of units continued to exhibit amplitude tuning, regardless of behavioral state. At both ages, movement direction also modulated M1 activity, though to a lesser extent. Finally, at P12, M1 population-level activity became more sparse and decorrelated, along with a substantial alteration in the statistical distribution of M1 responses to limb movements. These findings reveal a transition toward a more complex and informationally rich representation of movement long before M1 develops its motor functionality. SIGNIFICANCE STATEMENT: Primary motor cortex (M1) plays a fundamental role in the generation of voluntary movements and motor learning in adults. In early development, however, M1 functions as a prototypical sensory structure. Here, we demonstrate in infant rats that M1 codes for the kinematics of self-generated limb movements long before M1 develops its capacity to drive movements themselves. Moreover, we identify a key transition during the second postnatal week in which M1 activity becomes more informationally complex. Together, these findings further delineate the complex developmental path by which M1 develops its sensory functions in support of its later-emerging motor capacities.
Subjects
Forelimb/physiology; Motor Cortex/growth & development; Motor Cortex/physiology; Movement/physiology; Animals; Animals, Newborn; Biomechanical Phenomena; Rats; Rats, Sprague-Dawley
ABSTRACT
Models of associative memory with discrete state synapses learn new memories by forgetting old ones. In the simplest models, memories are forgotten exponentially quickly. Sparse population coding ameliorates this problem, as do complex models of synaptic plasticity that posit internal synaptic states, giving rise to synaptic metaplasticity. We examine memory lifetimes in both simple and complex models of synaptic plasticity with sparse coding. We consider our own integrative, filter-based model of synaptic plasticity, and examine the cascade and serial synapse models for comparison. We explore memory lifetimes at both the single-neuron and the population level, allowing for spontaneous activity. Memory lifetimes are defined using either a signal-to-noise ratio (SNR) approach or a first passage time (FPT) method, although we use the latter only for simple models at the single-neuron level. All studied models exhibit a decrease in the optimal single-neuron SNR memory lifetime, optimised with respect to sparseness, as the probability of synaptic updates decreases or, equivalently, as synaptic complexity increases. This holds regardless of spontaneous activity levels. In contrast, at the population level, even a low but nonzero level of spontaneous activity is critical in facilitating an increase in optimal SNR memory lifetimes with increasing synaptic complexity, but only in filter and serial models. However, SNR memory lifetimes are valid only in an asymptotic regime in which a mean field approximation is valid. By considering FPT memory lifetimes, we find that this asymptotic regime is not satisfied for very sparse coding, violating the conditions for the optimisation of single-perceptron SNR memory lifetimes with respect to sparseness. Similar violations are also expected for complex models of synaptic plasticity.
Subjects
Memory; Models, Neurological; Humans; Learning; Memory/physiology; Neuronal Plasticity/physiology; Synapses/physiology
ABSTRACT
Sparse population activity is a well-known feature of supragranular sensory neurons in neocortex. The mechanisms underlying sparseness are not well understood because a direct link between the neurons activated in vivo and their cellular properties investigated in vitro has been missing. We used two-photon calcium imaging to identify a subset of neurons in layer 2/3 (L2/3) of mouse primary somatosensory cortex that are highly active following principal whisker vibrotactile stimulation. These high responders (HRs) were then tagged using photoconvertible green fluorescent protein for subsequent targeting in the brain slice using intracellular patch-clamp recordings and biocytin staining. This approach allowed us to investigate the structural and functional properties of HRs that distinguish them from less active control cells. Compared to less responsive L2/3 neurons, HRs displayed increased levels of stimulus-evoked and spontaneous activity, elevated noise and spontaneous pairwise correlations, and stronger coupling to the population response. Intrinsic excitability was reduced in HRs, while we found no evidence for differences in other electrophysiological and morphological parameters. Thus, the choice of which neurons participate in stimulus encoding may be determined largely by network connectivity rather than by cellular structure and function.
Subjects
Neurons/physiology; Somatosensory Cortex/physiology; Animals; Green Fluorescent Proteins; Individuality; Male; Mice; Mice, Inbred C57BL; Neurons/ultrastructure; Noise; Patch-Clamp Techniques; Physical Stimulation; Somatosensory Cortex/ultrastructure; Vibrissae/innervation
ABSTRACT
Group sparse coding (GSC) uses the non-local similarity of images as a constraint, which can fully exploit the structure and group sparse features of images. However, it imposes sparsity only on the group coefficients, which limits its effectiveness in reconstructing real images. Low-rank regularized group sparse coding (LR-GSC) reduces this gap by imposing low-rankness on the group sparse coefficients. However, due to the use of non-local similarity, the edges and details of the images are over-smoothed, resulting in blocking artifacts. In this paper, we propose a low-rank matrix restoration model based on sparse coding and dual weighting. In addition, total variation (TV) regularization is integrated into the proposed model to maintain local structure smoothness and edge features. Finally, to solve the proposed optimization problem, an optimization method is developed based on the alternating direction method. Extensive experimental results show that the proposed SDWLR-GSC algorithm outperforms state-of-the-art algorithms for image restoration when the images have large and sparse noise, such as salt and pepper noise.
ABSTRACT
A central goal in theoretical neuroscience is to predict the response properties of sensory neurons from first principles. To this end, "efficient coding" posits that sensory neurons encode maximal information about their inputs given internal constraints. There exist, however, many variants of efficient coding (e.g., redundancy reduction, different formulations of predictive coding, robust coding, sparse coding, etc.), differing in their regimes of applicability, in the relevance of signals to be encoded, and in the choice of constraints. It is unclear how these types of efficient coding relate or what is expected when different coding objectives are combined. Here we present a unified framework that encompasses previously proposed efficient coding models and extends to new regimes. We show that optimizing neural responses to encode predictive information can lead them to either correlate or decorrelate their inputs, depending on the stimulus statistics; in contrast, at low noise, efficiently encoding the past always predicts decorrelation. Next, we investigate coding of naturalistic movies and show that qualitatively different types of visual motion tuning and levels of response sparsity are predicted, depending on whether the objective is to recover the past or predict the future. Our approach promises a way to explain the observed diversity of sensory neural responses as due to multiple functional goals and constraints fulfilled by different cell types and/or circuits.
Subjects
Models, Neurological; Sensory Receptor Cells/physiology; Animals; Humans
ABSTRACT
Sparse Coding (SC) has been widely studied and has shown its superiority in the fields of signal processing, statistics, and machine learning. However, due to the high computational cost of the optimization algorithms required to compute the sparse feature, the applicability of SC to real-time object recognition tasks is limited. Many deep neural networks have been constructed to rapidly estimate the sparse feature with the help of a large number of training samples, which is not suitable for small-scale datasets. Therefore, this work presents a simple and efficient fast approximation method for SC, in which a special single-hidden-layer neural network (SLNN) is constructed to perform the approximation task, and the optimal sparse features of training samples, computed exactly by a sparse coding algorithm, are used as ground truth to train the SLNN. After training, the proposed SLNN can quickly estimate sparse features for testing samples. Ten benchmark data sets taken from UCI databases and two face image datasets are used for the experiments, and the low root mean square error (RMSE) between the approximated sparse features and the optimal ones verifies the approximation performance of the proposed method. Furthermore, the recognition results demonstrate that the proposed method can effectively reduce the computational time of the testing process while maintaining recognition performance, and that it outperforms several state-of-the-art fast approximation sparse coding methods, as well as the exact sparse coding algorithms.
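To make the cost of the exact sparse feature concrete, the following is a minimal sketch of one standard iterative solver, ISTA (iterative shrinkage-thresholding), for the objective 0.5·||x − Dz||² + λ·||z||₁. The abstract does not say which exact sparse coding algorithm supplies the ground truth, so ISTA is a stand-in chosen for illustration; the names `ista` and `soft_threshold`, the parameter values, and the toy dictionary are assumptions.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(D, x, lam=0.1, n_iter=200):
    """Sparse code z minimizing 0.5*||x - D z||^2 + lam*||z||_1 via ISTA.

    D: dictionary as a list of m rows, each with n entries (atoms are columns).
    x: input signal of length m.
    Illustrative stand-in solver; not the algorithm named in the abstract.
    """
    m, n = len(D), len(D[0])
    # Step size 1/L, where L upper-bounds the largest eigenvalue of D^T D.
    # The squared Frobenius norm is a simple, always-valid bound.
    L = sum(d * d for row in D for d in row)
    z = [0.0] * n
    for _ in range(n_iter):
        # Residual r = D z - x.
        r = [sum(D[i][j] * z[j] for j in range(n)) - x[i] for i in range(m)]
        # Gradient of the quadratic term: D^T r.
        g = [sum(D[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by soft thresholding.
        z = [soft_threshold(z[j] - g[j] / L, lam / L) for j in range(n)]
    return z


# Toy example: identity dictionary, so the solution is soft_threshold(x, lam).
D = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
z = ista(D, [1.0, 0.05, 0.0], lam=0.1)
```

Each input needs many such iterations, each costing two matrix-vector products, whereas a trained single-hidden-layer network needs one forward pass. That gap is the motivation the abstract gives for learning an approximation.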
ABSTRACT
The spectral mismatch between a multispectral (MS) image and its corresponding panchromatic (PAN) image affects the pansharpening quality, especially for WorldView-2 data. To handle this problem, a pansharpening method based on graph regularized sparse coding (GRSC) and adaptive coupled dictionary is proposed in this paper. Firstly, the pansharpening process is divided into three tasks according to the degree of correlation among the MS and PAN channels and the relative spectral response of WorldView-2 sensor. Then, for each task, the image patch set from the MS channels is clustered into several subsets, and the sparse representation of each subset is estimated through the GRSC algorithm. Besides, an adaptive coupled dictionary pair for each task is constructed to effectively represent the subsets. Finally, the high-resolution image subsets for each task are obtained by multiplying the estimated sparse coefficient matrix by the corresponding dictionary. A variety of experiments are conducted on the WorldView-2 data, and the experimental results demonstrate that the proposed method achieves better performance than the existing pansharpening algorithms in both subjective analysis and objective evaluation.
ABSTRACT
With the rapid growth of the demand for location services in indoor environments, fingerprint-based indoor positioning has attracted widespread attention due to its high precision. This paper proposes a double-layer dictionary learning algorithm based on channel state information (DDLC). The DDLC system includes two stages. In the offline training stage, a two-layer dictionary learning architecture is constructed for the complex conditions of indoor scenes. In the first layer, a separate sub-dictionary is learned from the training data of each region, and incoherence-promoting terms are added to sharpen the discrimination between the sparse codes of different regions. The second layer introduces support vector discriminant terms for the fingerprint points inside each region and uses a max-margin criterion to distinguish different fingerprint points. In the online positioning stage, we first determine the region of the test point based on the reconstruction error, and then use the support vector discriminator to complete the fingerprint matching. In the experiments, we selected two representative indoor positioning environments and compared DDLC with several existing indoor positioning methods. The results show that DDLC can effectively reduce positioning errors and, because the dictionary itself is easy to maintain and update, its strong noise robustness can be better exploited in CSI-based indoor positioning.
ABSTRACT
The cellular analysis of mushroom body (MB)-dependent memory forming processes is far advanced, whereas the molecular and physiological understanding of their synaptic basis lags behind. Recent analysis of the Drosophila olfactory system showed that Unc13A, a member of the M(Unc13) release factor family, promotes a phasic, high release probability component, while Unc13B supports a slower tonic release component, reflecting their different nanoscopic positioning within individual active zones. We here use STED super-resolution microscopy of MB lobe synapses to show that Unc13A clusters closer to the active zone centre than Unc13B. Unc13A specifically supported phasic transmission and short-term plasticity of Kenyon cell:output neuron synapses, measured by combining electrophysiological recordings of output neurons with optogenetic stimulation. Knockdown of unc13A within Kenyon cells provoked drastic deficits of olfactory aversive short-term and anaesthesia-sensitive middle-term memory. Knockdown of unc13B provoked milder memory deficits. Thus, a low frequency domain transmission component is probably crucial for the proper representation of memory-associated activity patterns, consistent with sparse Kenyon cell activation during memory acquisition and retrieval. Notably, Unc13A/B ratios appeared highly diversified across MB lobes, leaving room for an interplay of activity components in memory encoding and retrieval.
Subjects
Drosophila Proteins/metabolism; Membrane Proteins/metabolism; Memory/physiology; Mushroom Bodies/metabolism; Nerve Tissue Proteins/metabolism; Neuronal Plasticity/physiology; Olfactory Perception/physiology; Animals; Drosophila; Female; Protein Isoforms; Synapses/metabolism
ABSTRACT
Sparse representation is considered an important coding strategy for cortical processing in various sensory modalities. It remains unclear how cortical sparseness arises and is regulated. Here, unbiased recordings from primary auditory cortex of awake adult mice revealed salient sparseness in layer (L)2/3, with a majority of excitatory neurons exhibiting no increased spiking in response to each of the sound types tested. Sparse representation was not observed in parvalbumin (PV) inhibitory neurons. The nonresponding neurons did receive auditory-evoked synaptic inputs, marked by weaker excitation and lower excitation/inhibition (E/I) ratios than responding cells. Sparse representation arises during development in an experience-dependent manner, accompanied by differential changes of excitatory input strength and a transition from unimodal to bimodal distribution of E/I ratios. Sparseness level could be reduced by suppressing PV or L1 inhibitory neurons. Thus, sparse representation may be dynamically regulated via modulating E/I balance, optimizing cortical representation of the external sensory world.
Subjects
Action Potentials; Auditory Cortex/physiology; Auditory Perception/physiology; Neurons/physiology; Acoustic Stimulation; Animals; Evoked Potentials, Auditory; Female; Male; Mice, Inbred C57BL; Neural Inhibition
ABSTRACT
We broaden the applicability of sparse coding, a machine learning method, to low-dose electron holography by using simulated holograms for learning and validation processes. The holograms, with shot noise, are prepared to generate a model, or a dictionary, that includes basic features representing interference fringes. The dictionary is applied to sparse representations of other simulated holograms with various signal-to-noise ratios (SNRs). Results demonstrate that this approach successfully removes noise for holograms with an extremely small SNR of 0.10, and that the denoised holograms provide an accurate phase distribution. Furthermore, this study demonstrates that the dictionary learned from the simulated holograms can be applied to denoising of experimental holograms of a p-n junction specimen recorded with different exposure times. The results indicate that the simulation-trained sparse coding is suitable for use over a wide range of imaging conditions, in particular for observing electron beam-sensitive materials.
ABSTRACT
Effective management of chronic constrictive pulmonary conditions lies in proper and timely administration of medication. As a series of studies indicates, medication adherence can effectively be monitored by successfully identifying actions performed by patients during inhaler usage. This study focuses on the recognition of inhaler audio events during usage of pressurized metered dose inhalers (pMDI). Aiming at real-time performance, we investigate deep sparse coding techniques including convolutional filter pruning, scalar pruning and vector quantization, for different convolutional neural network (CNN) architectures. The recognition performance has been assessed on three healthy subjects following both within and across subjects modeling strategies. The selected CNN architecture classified drug actuation, inhalation and exhalation events, with 100%, 92.6% and 97.9% accuracy, respectively, when assessed in a leave-one-subject-out cross-validation setting. Moreover, sparse coding of the same architecture with an increasing compression rate from 1 to 7 resulted in only a small decrease in classification accuracy (from 95.7% to 94.5%), obtained by random (subject-agnostic) cross-validation. A more thorough assessment on a larger dataset, including recordings of subjects with multiple respiratory disease manifestations, is still required in order to better evaluate the method's generalization ability and robustness.
Subjects
Nebulizers and Vaporizers; Neural Networks, Computer; Sound; Adult; Female; Humans; Male; Medication Adherence; Metered Dose Inhalers; Respiratory Distress Syndrome; Young Adult
ABSTRACT
This paper proposes a novel technique to improve a spectral statistical filter for speech enhancement, to be applied in wearable hearing devices such as hearing aids. The proposed method is implemented considering a 32-channel uniform polyphase discrete Fourier transform filter bank, for which the overall algorithm processing delay is 8 ms in accordance with the hearing device requirements. The proposed speech enhancement technique, which exploits the concepts of both non-negative sparse coding (NNSC) and spectral statistical filtering, provides an online unified framework to overcome the problem of residual noise in spectral statistical filters under noisy environments. First, the spectral gain attenuator of the statistical Wiener filter is obtained using the a priori signal-to-noise ratio (SNR) estimated through a decision-directed approach. Next, the spectrum estimated using the Wiener spectral gain attenuator is decomposed by applying the NNSC technique to the target speech and residual noise components. These components are used to develop an NNSC-based Wiener spectral gain attenuator to achieve enhanced speech. The performance of the proposed NNSC-Wiener filter was evaluated through a perceptual evaluation of the speech quality scores under various noise conditions with SNRs ranging from -5 to 20 dB. The results indicated that the proposed NNSC-Wiener filter can outperform the conventional Wiener filter and NNSC-based speech enhancement methods at all SNRs.
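The first stage described above, a Wiener gain driven by a decision-directed a priori SNR estimate, can be sketched as follows. This is a minimal illustration of the standard decision-directed recursion, not the paper's implementation: it assumes the noise power per frequency bin is already known, and the function name, the smoothing factor `alpha`, the floor `xi_min`, and the toy input are all assumed values. The NNSC decomposition stage is not shown.

```python
def decision_directed_wiener_gains(noisy_power, noise_power,
                                   alpha=0.98, xi_min=1e-3):
    """Per-frame, per-bin Wiener gains from a decision-directed SNR estimate.

    noisy_power: list of frames, each a list of |Y(k)|^2 per frequency bin.
    noise_power: list with the (assumed known) noise power per bin.
    alpha, xi_min: smoothing factor and SNR floor (illustrative values).
    """
    n_bins = len(noise_power)
    prev_clean_power = [0.0] * n_bins  # |S_hat(k)|^2 from the previous frame
    gains = []
    for frame in noisy_power:
        frame_gains = []
        for k in range(n_bins):
            # A posteriori SNR of the current frame.
            gamma = frame[k] / noise_power[k]
            # Decision-directed a priori SNR: blend the previous clean
            # estimate with the current maximum-likelihood estimate.
            xi = (alpha * prev_clean_power[k] / noise_power[k]
                  + (1.0 - alpha) * max(gamma - 1.0, 0.0))
            xi = max(xi, xi_min)
            # Wiener spectral gain.
            g = xi / (1.0 + xi)
            frame_gains.append(g)
            prev_clean_power[k] = g * g * frame[k]
        gains.append(frame_gains)
    return gains


# Toy input: bin 0 carries strong speech energy, bin 1 is noise only.
frames = [[100.0, 1.0] for _ in range(5)]
gains = decision_directed_wiener_gains(frames, noise_power=[1.0, 1.0])
```

In the toy input the gain of the speech-dominated bin rises toward 1 over a few frames while the noise-only bin stays strongly attenuated. The residual noise that survives this stage is what the abstract's NNSC step is meant to address.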