#### Last update: 10/22/2007

## Sparsification for Monaural Source Separation

#### Hiroki Asari, Rasmus K. Olsson, Barak A. Pearlmutter, and Anthony M. Zador

**Abstract**
We explore the use of sparse representations for separation of a monaural mixture signal,
where by a sparse representation we mean one where the number of non-zero elements is
smaller than might be expected. This is a surprisingly powerful idea, as the ability to
express a signal sparsely in some known, and potentially overcomplete, basis constitutes
a strong model, while also lending itself to efficient algorithms. In the framework we
explore, the representation of the signal is linear in a vector of coefficients. However,
because many coefficient values could represent the same signal, the mapping from signal
to coefficients is nonlinear, with the coefficients being chosen to simultaneously
represent the signal and maximize a measure of sparsity. This conversion of the signal
into the coefficients using L1-optimization is viewed not as a pre-processing step
performed before the data reaches the heart of the algorithm, but rather as itself the
heart of the algorithm: after the coefficients have been found, only trivial processing
remains to be done. We show how, by suitable choice of overcomplete basis, this framework
can use a variety of cues (e.g., speaker identity, differential filtering, differential
attenuation) to accomplish monaural separation. We also discuss two radically different
algorithms for finding the required overcomplete dictionaries: one based on non-negative
matrix factorization of isolated sources, and the other based on end-to-end optimization
using automatic differentiation.
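
The core idea, linear synthesis from coefficients but nonlinear L1-based analysis over a concatenated overcomplete dictionary, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the manuscript: the solver is plain ISTA (iterative soft-thresholding), the penalty weight `lam` is arbitrary, the two per-source dictionaries are a spike basis and a DCT basis rather than learned dictionaries, and the helper names `sparse_code`, `separate`, and `dct_atoms` are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_code(y, D, lam=0.01, n_iter=2000):
    """ISTA for min_x 0.5 * ||y - D x||_2^2 + lam * ||x||_1.

    ISTA is a stand-in solver chosen for brevity (assumption).
    """
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + step * (D.T @ (y - D @ x)), step * lam)
    return x


def separate(y, dictionaries, lam=0.01, n_iter=2000):
    """Sparse-code y over the stacked dictionary, then resynthesize each source."""
    D = np.hstack(dictionaries)
    x = sparse_code(y, D, lam, n_iter)
    estimates, start = [], 0
    for Dk in dictionaries:
        stop = start + Dk.shape[1]
        estimates.append(Dk @ x[start:stop])  # the "trivial processing" step
        start = stop
    return estimates


def dct_atoms(n):
    """Orthonormal DCT-II basis, one atom per column."""
    k = np.arange(n)[None, :]  # atom (frequency) index
    t = np.arange(n)[:, None]  # sample index
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    C[:, 0] /= np.sqrt(2.0)
    return C


# Toy mixture (assumed data): source 1 is sparse in time, source 2 in frequency.
n = 128
D1, D2 = np.eye(n), dct_atoms(n)
s1 = np.zeros(n)
s1[[10, 70]] = [1.0, -0.8]
s2 = D2[:, [5, 23]] @ np.array([2.0, -1.5])
y = s1 + s2

s1_hat, s2_hat = separate(y, [D1, D2])
```

In this sketch the separation itself is just a partition of the coefficient vector: once the sparse code is found, each source estimate is the product of its own sub-dictionary with its own block of coefficients. The L1 penalty shrinks the recovered amplitudes slightly, so the estimates are close to, but not exactly equal to, the true sources.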

Draft manuscript (150KB, PDF)

See also Asari et al. (2006) and my dissertation (Chapter 2 and technical notes).