The auditory system detects sound signals and uses their time-frequency information
for sound identification, sound localization, and sound source separation. Past
studies have shown that the hair cells in the cochlea respond to input sound signals
in a frequency-dependent manner. However, little is known about how sound information
is conveyed to, and processed within, the auditory cortex.
Recently, we proposed an algorithm for monaural source separation under the assumption
that the precise head-related transfer function (HRTF) and all the sound dictionary
elements are known. To apply this algorithm to real sound signals, I propose here
that non-negative matrix factorization (NMF) of the spectrograms of sound signals
could yield a suitable set of sound dictionary elements.
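As a rough illustration of this step, the sketch below computes a magnitude spectrogram with a Hann-windowed short-time Fourier transform and factorizes it as V ≈ WH, where the columns of W play the role of basis spectra (dictionary elements) and the rows of H are their activations over time. The NMF cost function and spectrogram settings are not specified here, so the Euclidean-cost multiplicative updates of Lee and Seung, the FFT size, the hop length, and the toy two-note signal are all assumptions made for illustration.

```python
import numpy as np

def magnitude_spectrogram(x, n_fft=1024, hop=256):
    """Magnitude STFT of a 1-D signal, shape (n_fft // 2 + 1, n_frames).

    Assumed settings: Hann window, n_fft=1024, hop=256.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, axis=0))

def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate a non-negative V (freq x time) as W @ H.

    Multiplicative updates for the squared Euclidean cost (Lee & Seung);
    this cost is an assumption, not necessarily the one used in the study.
    Columns of W are basis spectra; rows of H are their time-varying gains.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "solo" signal: a 440 Hz note followed by a 660 Hz note, 0.5 s each.
sr = 16000
t = np.arange(sr // 2) / sr
x = np.concatenate([np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 660 * t)])

V = magnitude_spectrogram(x)
W, H = nmf(V, rank=2)  # rank chosen to match the two notes in the toy signal
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Choosing the rank is the main design decision: with one basis element per note, each column of W tends to capture one pitch, which mirrors the role of the "appropriate value for the rank" in the experiments described below.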
When NMF was applied to solo music signals with an appropriately chosen factorization
rank, it extracted instrument-specific patterns of basis spectrograms, each with a
peak frequency corresponding to a different note. Interestingly, the sound signals
converted back from the obtained basis spectrograms sounded more or less like the
corresponding instrument, which suggests that these basis spectrograms, or basis sound
elements, are good candidates for the sound dictionary.
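One way to check that each basis spectrogram peaks at a different note is to read off the peak frequency of each column of W and map it to the nearest equal-tempered pitch (A4 = 440 Hz, MIDI note 69). The helper name `basis_peak_notes` and the synthetic W in the usage lines are hypothetical; this is a minimal inspection sketch, not the procedure used in the study.

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def basis_peak_notes(W, sr, n_fft):
    """Return (peak frequency in Hz, nearest note name) for each column of W.

    Hypothetical helper: assumes W rows are rfft bins of an n_fft-point STFT.
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    results = []
    for k in range(W.shape[1]):
        f_peak = float(freqs[np.argmax(W[:, k])])
        if f_peak <= 0.0:
            results.append((f_peak, None))  # DC peak: no meaningful pitch
            continue
        midi = int(round(69 + 12 * np.log2(f_peak / 440.0)))
        results.append((f_peak, NOTE_NAMES[midi % 12] + str(midi // 12 - 1)))
    return results

# Synthetic basis matrix with single-bin peaks (bin spacing 16000 / 1024 = 15.625 Hz).
W_demo = np.zeros((513, 2))
W_demo[28, 0] = 1.0  # 437.5 Hz, closest to A4
W_demo[17, 1] = 1.0  # 265.625 Hz, closest to C4
notes = basis_peak_notes(W_demo, sr=16000, n_fft=1024)
```

The frequency resolution is limited by the FFT size, so the nearest-note mapping is only reliable when adjacent notes fall in clearly separated bins.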
When NMF was applied to music played by several different instruments, the obtained
basis sound elements could be sorted by hand into instrument-specific groups. Using
these categorized elements, sound signals corresponding to each instrument part of
the original music could then be reconstructed. The fact that source separation can
be achieved with basis sound elements obtained by NMF suggests that NMF is a possible
way to learn sound dictionaries from sound signals.
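The grouping and reconstruction step can be sketched as follows, assuming the by-hand categorization is given as lists of basis indices (the `groups` argument and the function name `separate_by_groups` are hypothetical). Each group's share of the spectrogram is obtained with a Wiener-like soft mask W_g H_g / (WH); this masking scheme is an assumed, commonly used choice rather than the method stated here. Applying the same masks to the complex STFT and inverting it would give time-domain signals for each instrument part.

```python
import numpy as np

def separate_by_groups(V, W, H, groups, eps=1e-10):
    """Split a magnitude spectrogram V into one part per group of basis indices.

    Each part is V masked by W_g @ H_g / (W @ H), a Wiener-like soft mask.
    """
    full = W @ H + eps
    parts = []
    for idx in groups:
        idx = list(idx)
        partial = W[:, idx] @ H[idx, :]  # contribution of this group's bases
        parts.append(V * partial / full)
    return parts

# Toy example: three basis elements, the first two assigned to one "instrument".
rng = np.random.default_rng(1)
W = rng.random((64, 3))
H = rng.random((3, 40))
V = W @ H
parts = separate_by_groups(V, W, H, groups=[[0, 1], [2]])
```

Because the masks sum to one at every time-frequency bin when the groups cover all basis elements, the separated parts add back up to the original spectrogram.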
Preprint (930 KB, PDF)
Supplementary Materials (2.4 MB, ZIP)
See also technical notes for my dissertation.