The auditory system detects sound signals and uses their temporal-frequency information for sound identification, sound localization, and sound source separation. Thanks to past studies, we know that the hair cells in the cochlea show frequency-dependent responses to input sound signals. However, little is known about how sound information is conveyed to and processed within the auditory cortex.
Recently, we proposed an algorithm for monaural source separation under the assumption that the precise head-related transfer function (HRTF) and all the elements of the sound dictionary are known. To apply this algorithm to real sound signals, here I propose that non-negative matrix factorization (NMF) applied to the spectrograms of sound signals can provide such a set of sound dictionary elements.
When NMF was applied to solo-music signals with an appropriate rank of factorization, it extracted instrument-specific basis spectrograms, each with a peak frequency corresponding to a different note. Interestingly, the sound signals converted back from the obtained basis spectrograms sounded more or less like the corresponding instrument, which suggests that these basis spectrograms, or basis sound elements, are good candidates for the sound dictionary.
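As a minimal illustration of the factorization step, the sketch below runs NMF with Lee-Seung multiplicative updates (Euclidean objective) on a toy magnitude "spectrogram" built from two synthetic note spectra; the optimizer, toy data, and rank are my own assumptions, not the settings used in the actual study.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize a non-negative matrix V (freq x time) as W @ H
    using multiplicative updates for the Euclidean objective."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + eps   # columns: basis spectra
    H = rng.random((rank, n_time)) + eps   # rows: time-varying activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy spectrogram: two "notes" with fixed spectra, alternating over time
# (hypothetical data, only meant to show that NMF recovers the two spectra).
f = np.linspace(0, 1, 64)[:, None]
note_a = np.exp(-((f - 0.2) ** 2) / 0.002)        # spectral peak near 0.2
note_b = np.exp(-((f - 0.6) ** 2) / 0.002)        # spectral peak near 0.6
act_a = np.array([1, 1, 0, 0, 1, 0, 1, 0], float)
act_b = 1 - act_a
V = note_a @ act_a[None, :] + note_b @ act_b[None, :]

W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {err:.4f}")
```

With the rank matched to the number of distinct notes, each column of `W` converges to one of the two note spectra, mirroring the peak-frequency basis spectrograms described above.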
When NMF was applied to sound signals played by several different instruments, the obtained basis sound elements could be categorized by hand into instrument-specific groups. Using the categorized elements, the sound signal corresponding to each instrument part of the original music could then be reconstructed. The fact that source separation can be achieved with the basis sound elements obtained by NMF suggests that NMF is a viable way to learn sound dictionaries from sound signals.
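The reconstruction step can be sketched as follows, assuming a mixture spectrogram `V`, an NMF factorization `W @ H` of it, and a hand-made grouping of component indices per instrument (as in the text). Reconstructing each part with a Wiener-style soft mask is one common choice; the abstract does not specify the reconstruction rule, so this is an assumption.

```python
import numpy as np

def separate(V, W, H, groups, eps=1e-9):
    """Split a mixture magnitude spectrogram V into per-source
    spectrograms, given an NMF model W @ H and a dict `groups`
    mapping source name -> component indices assigned to it by hand."""
    model = W @ H + eps
    parts = {}
    for name, idx in groups.items():
        part_model = W[:, idx] @ H[idx, :]      # this source's contribution
        parts[name] = V * (part_model / model)  # Wiener-style soft mask
    return parts

# Hypothetical factors standing in for an NMF of a two-instrument mixture.
rng = np.random.default_rng(1)
W = rng.random((16, 4))
H = rng.random((4, 10))
V = W @ H
parts = separate(V, W, H, {"piano": [0, 1], "violin": [2, 3]})
# The soft masks sum to ~1, so the parts add back up to the mixture.
print(np.allclose(sum(parts.values()), V, atol=1e-4))
```

Because the masks partition the model energy, the separated parts always sum back to the original mixture spectrogram, which is a convenient sanity check on any grouping of components.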
Preprint (930 KB, PDF)
Supplementary Materials (2.4 MB, ZIP)
See also the technical notes for my dissertation.