ABSTRACT
While cochlear implants (CIs) have been shown to restore speech perception to a remarkable extent, access to music remains difficult for most CI users. This work proposes a methodology for designing deep learning-based signal preprocessing strategies that simplify music signals and emphasize rhythmic information. It combines harmonic/percussive source separation and deep neural network (DNN) based source separation in a versatile source mixture model. Two neural network architectures were assessed for their applicability to this task. The method was evaluated with instrumental measures and in two listening experiments, covering both network architectures and six mixing presets. Normal-hearing subjects rated the signal quality of the processed signals relative to the original, both with and without a vocoder, which approximates auditory perception in CI listeners. Four combinations of remix models and DNNs were selected for evaluation with vocoded signals, and all were rated significantly better than the unprocessed signal. In particular, the two best-performing remix networks are promising candidates for further evaluation in CI listeners.
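As a rough illustration of the harmonic/percussive half of such a pipeline, the sketch below separates a toy magnitude spectrogram with median filtering and remixes the two parts with per-component gains. The abstract does not say which separation method or gain values were used, so the median-filter approach, the binary masking, and the names `hpss`, `remix`, and the gain tuple are all assumptions made for illustration; the DNN-based separation stage is not modelled here.

```python
from statistics import median


def median_filter_1d(values, size=3):
    """Median filter with windows truncated at the edges (size should be odd)."""
    half = size // 2
    n = len(values)
    return [median(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def hpss(spec, size=3):
    """Split spec[f][t] (magnitudes) into harmonic and percussive parts.

    Assumed method: median filtering along time enhances harmonic content,
    median filtering along frequency enhances percussive content, and each
    bin is assigned to whichever enhanced estimate is larger (binary mask).
    """
    n_freq, n_time = len(spec), len(spec[0])
    harm_est = [median_filter_1d(row, size) for row in spec]
    col_est = [median_filter_1d([spec[f][t] for f in range(n_freq)], size)
               for t in range(n_time)]
    harmonic = [[spec[f][t] if harm_est[f][t] >= col_est[t][f] else 0.0
                 for t in range(n_time)] for f in range(n_freq)]
    percussive = [[spec[f][t] - harmonic[f][t] for t in range(n_time)]
                  for f in range(n_freq)]
    return harmonic, percussive


def remix(components, gains):
    """Weighted sum of equally shaped components (a hypothetical mixing preset)."""
    n_freq, n_time = len(components[0]), len(components[0][0])
    return [[sum(g * c[f][t] for g, c in zip(gains, components))
             for t in range(n_time)] for f in range(n_freq)]


# Toy spectrogram: one sustained tone (row 2) and one click (column 2).
spec = [[1.0 if f == 2 or t == 2 else 0.0 for t in range(5)] for f in range(5)]
harmonic, percussive = hpss(spec)
# Hypothetical preset that attenuates harmonic content to emphasize rhythm.
out = remix([harmonic, percussive], gains=(0.5, 1.0))
```

In this toy case the sustained tone lands in the harmonic part and the click in the percussive part, so the preset halves the tone while leaving the click at full level.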
Subject(s)
Cochlear Implantation , Cochlear Implants , Music , Auditory Perception , Cochlear Implantation/methods , Humans , Neural Networks (Computer)
ABSTRACT
This paper investigates the approximation properties of deep neural networks with piecewise-polynomial activation functions. We derive the required depth, width, and sparsity of a deep neural network to approximate any Hölder smooth function up to a given approximation error in Hölder norms in such a way that all weights of this neural network are bounded by 1. The latter feature is essential to control generalization errors in many statistical and machine learning applications.
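To give a sense of why piecewise-polynomial activations pair well with bounded weights, the sketch below uses the rectified quadratic unit, max(0, x)^2, as one example of such an activation. The abstract does not name the activation, so this choice is an assumption, and the helper names `requ`, `square`, and `product` are illustrative. With this activation, squares and products are represented exactly by tiny networks whose weights all have absolute value at most 1.

```python
def requ(x):
    """Rectified quadratic unit: max(0, x)**2 (assumed example activation)."""
    return max(0.0, x) ** 2


def square(x):
    """x**2 as a one-hidden-layer network: hidden weights +1 and -1, output weights 1."""
    return requ(x) + requ(-x)


def product(x, y):
    """x*y via the polarisation identity xy = ((x + y)**2 - (x - y)**2) / 4.

    Hidden-layer weights are +1 or -1 and output weights are +1/4 or -1/4,
    so every weight has absolute value at most 1.
    """
    return 0.25 * (requ(x + y) + requ(-x - y) - requ(x - y) - requ(y - x))
```

Because products are exact, monomials and hence polynomials can be built by composition, which is one intuition for how a smooth function might be approximated without letting the weights grow.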