Article in English | MEDLINE | ID: mdl-29398973


Deep clustering is the first method to handle general audio separation scenarios with multiple sources of the same type and an arbitrary number of sources, performing impressively in speaker-independent speech separation tasks. However, little is known about its effectiveness in other challenging situations such as music source separation. Unlike conventional networks that directly estimate the source signals, deep clustering generates an embedding for each time-frequency bin and separates sources by clustering the bins in the embedding space. We show that deep clustering outperforms conventional networks on a singing voice separation task, in both matched and mismatched conditions, even though conventional networks have the advantage of end-to-end training for best signal approximation, presumably because deep clustering's more flexible objective engenders better regularization. Since the strengths of deep clustering and conventional network architectures appear complementary, we explore combining them in a single hybrid network trained via an approach akin to multi-task learning. Remarkably, the combination significantly outperforms either of its components.
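To make the "embed every time-frequency bin, then cluster the bins" idea concrete, here is a minimal sketch in plain Python under stated assumptions. The abstract does not give the training objective or the clustering algorithm; the loss below is the affinity objective ||V V^T - Y Y^T||_F^2 from the original deep clustering formulation (V: unit-norm embeddings, one row per bin; Y: one-hot source assignments), and k-means is assumed as the clustering step. All function names, the toy embeddings, and the mixture magnitudes are hypothetical.

```python
# Hypothetical toy example: 6 time-frequency (TF) bins, 2-D embeddings, 2 sources.

def gram_frob_sq(A, B):
    """Return ||A^T B||_F^2 for row-major matrices A (N x p) and B (N x q)."""
    total = 0.0
    for i in range(len(A[0])):
        for j in range(len(B[0])):
            s = sum(a[i] * b[j] for a, b in zip(A, B))
            total += s * s
    return total


def deep_clustering_loss(V, Y):
    """Assumed objective ||V V^T - Y Y^T||_F^2, expanded so no N x N matrix is built."""
    return gram_frob_sq(V, V) - 2.0 * gram_frob_sq(V, Y) + gram_frob_sq(Y, Y)


def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans_labels(points, k, iters=20):
    """Plain k-means (assumed clustering step) with deterministic farthest-first init."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each TF bin to its nearest center in embedding space.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Move each center to the mean of the embeddings assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def binary_masks(labels, k):
    """One 0/1 mask per cluster: masks[c][n] == 1 iff bin n was assigned to cluster c."""
    return [[1.0 if lab == c else 0.0 for lab in labels] for c in range(k)]


# Hypothetical unit-norm embeddings V (one row per TF bin) and ideal assignments Y.
raw = [(0.99, 0.14), (0.95, 0.31), (0.10, 0.99), (0.98, -0.20), (-0.05, 0.99), (0.20, 0.98)]
V = [[x / (x * x + y * y) ** 0.5, y / (x * x + y * y) ** 0.5] for x, y in raw]
Y = [[1, 0], [1, 0], [0, 1], [1, 0], [0, 1], [0, 1]]
mixture = [0.8, 0.5, 0.9, 0.3, 0.7, 0.6]  # hypothetical mixture magnitudes per bin

labels = kmeans_labels(V, k=2)
masks = binary_masks(labels, k=2)
# Each source estimate keeps only the mixture bins its cluster claimed.
estimates = [[m * x for m, x in zip(mask, mixture)] for mask in masks]
print(labels)  # -> [0, 0, 1, 0, 1, 1]
print(deep_clustering_loss(V, Y))
```

Because the embeddings only need to place bins of the same source close together, the objective is invariant to how the sources are ordered, which is one way to read the abstract's point that deep clustering's objective is more flexible than direct signal estimation. The binary masks partition the bins, so the estimates sum back to the mixture.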

IEEE Trans Pattern Anal Mach Intell; 32(2): 348-63, 2010 Feb.
Article in English | MEDLINE | ID: mdl-20075463


We present a generative model and inference algorithm for 3D nonrigid object tracking. The model, which we call G-flow, enables the joint inference of 3D position, orientation, and nonrigid deformations, as well as object texture and background texture. Optimal inference under G-flow reduces to a conditionally Gaussian stochastic filtering problem. The optimal solution to this problem reveals a new space of computer vision algorithms, of which classic approaches such as optic flow and template matching are special cases that are optimal only under special circumstances. We evaluate G-flow on the problem of tracking facial expressions and head motion in 3D from single-camera video. Previously, the lack of realistic video data with ground truth nonrigid position information has hampered the rigorous evaluation of nonrigid tracking. We introduce a practical method of obtaining such ground truth data and present a new face video data set that was created using this technique. Results on this data set show that G-flow is much more robust and accurate than current deterministic optic-flow-based approaches.
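The claim that optic flow and template matching are special cases of a Gaussian filtering solution can be illustrated with a deliberately reduced sketch. This is not the G-flow inference algorithm, which jointly infers 3D pose, deformation, object texture, and background texture. It assumes the pose is already known, so a single texel's intensity can be tracked with a scalar Kalman filter under a hypothetical random-walk texture model. The mapping to the classic methods is an illustrative assumption: with very large process noise the texture estimate collapses to the latest frame (comparing each frame to the previous one, as optic flow does), and with zero process noise it becomes a fixed appearance model averaged over all frames (as in template matching). Function names and intensities are hypothetical.

```python
def kalman_texel_step(mean, var, obs, process_var, obs_var):
    """One scalar Kalman predict/update step for a single texel's intensity."""
    prior_var = var + process_var             # predict: hypothetical random-walk drift
    gain = prior_var / (prior_var + obs_var)  # Kalman gain, in [0, 1) when obs_var > 0
    new_mean = mean + gain * (obs - mean)     # blend the prediction with the new pixel
    new_var = (1.0 - gain) * prior_var        # posterior variance shrinks after update
    return new_mean, new_var


def track_texel(observations, process_var, obs_var=1.0):
    """Filter intensities observed at one texel, assuming pose is already registered."""
    mean, var = observations[0], obs_var
    for obs in observations[1:]:
        mean, var = kalman_texel_step(mean, var, obs, process_var, obs_var)
    return mean


frames = [10.0, 12.0, 11.0, 30.0]  # hypothetical intensities of one texel over time

# Huge process noise: the estimate tracks the most recent frame (optic-flow-like).
flow_like = track_texel(frames, process_var=1e6)
# Zero process noise: the estimate is the running mean of all frames (template-like).
template_like = track_texel(frames, process_var=0.0)
print(flow_like, template_like)
```

Intermediate process-noise values blend the two behaviors, which is the intuition behind the abstract's statement that each classic approach is optimal only under special circumstances.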

Algorithms, Face/anatomy & histology, Image Processing, Computer-Assisted/methods, Movement/physiology, Normal Distribution, Pattern Recognition, Automated/methods, Humans, Stochastic Processes, Video Recording