Last update: 10/22/2007

Auditory System Characterization

Hiroki Asari

A dissertation presented to the Watson School of Biological Sciences in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Biological Sciences at Cold Spring Harbor Laboratory (July 2007).



The world is full of sounds---the rustle of leaves, the whistle of wind, and the bustle of cities---and in our everyday lives we enjoy music and pleasant conversations with friends and family. Such auditory communication plays a highly significant role in social interactions in many animal species, including humans, songbirds, and crickets, but it could never be accomplished without a specialized module in the brain: the auditory system.

The three main functions of the auditory system are to separate, localize, and identify the sound signals we receive at the cochlea. To perform these tasks, the auditory system has grown increasingly sophisticated over the course of evolution, and has now reached such an amazing level that coincidence detectors in owls can resolve interaural time differences with microsecond precision (Moiseff and Konishi, 1981; Konishi, 2003), and hair cells in the vertebrate ear can detect a tip movement of only a tenth of a nanometer (Sellick et al., 1982; Crawford and Fettiplace, 1985; Hudspeth, 1989, 1997). Moreover, a trained ear can establish a sense of "absolute pitch" (Zatorre, 2003; Levitin and Rogers, 2005), and can even pick out and transcribe a single instrument from a musical piece played by an orchestra.

How can the brain perform such outstanding feats? Although the auditory system works so efficiently that one might almost forget how difficult the auditory tasks are, the underlying computations are extremely complex. In fact, no artificial system can compete with the brain in solving any acoustic---and any sensory---signal processing problem. The brain generally works up to the noise limit of its sensors across their entire dynamic range in all ecological conditions, whereas an artificial system requires a huge amount of tuning to solve a particular problem of interest, and yet works only in a carefully controlled demo (as I show with an example in Chapter 2) or only in a "platonic" world. Not surprisingly, then, the processing beyond the auditory nerve (i.e., the very first step of auditory processing in the brain) is not well understood despite many decades of psychological and physiological research (for the earliest electrophysiological studies on the auditory cortex, see e.g., Woolsey and Walzl, 1942; Tunturi, 1944; Bremer and Bonnet, 1949).

In this dissertation, I combined two complementary approaches to study neuronal dynamics in the auditory cortex and to ask how the brain processes acoustic signals.

  1. As a top-down theoretical analysis of neural behaviors at the population level, we exploited the idea of sparse overcomplete linear representations and developed a model for monaural blind source separation (Asari et al., 2006, 2007).

  2. As a bottom-up experimental analysis at the single-cell level, we used whole-cell recordings in vivo and examined how neural responses in the auditory cortex depend on stimulus history and its context in order to build plausible models that characterize the relationship between input sounds and output neural responses (Asari and Zador, submitted).

This thesis therefore consists of two major parts. Following the introductory Chapter 1, which gives an overview of the characteristics of the auditory system and the challenges in auditory systems research, Chapter 2 describes the theoretical part in detail. Inspired by a striking anatomical feature of many sensory systems, namely that many more neurons appear to be engaged in the internal representation of a signal than in its transduction (Section 2.1), I demonstrate an example of how sparse overcomplete linear representations can directly solve difficult acoustic signal processing problems (Sections 2.2--2.3); i.e., monaural source separation using solely the cues provided by the differential filtering imposed on a source by its path from its origin to the cochlea (the head-related transfer function; Bregman, 1990). The model of sparse representations makes several experimentally testable predictions, which can in turn be checked against experimental data (Section 2.4). This model framework can be generalized to exploit other monaural separation cues such as common onset time, and binaural information such as interaural time and level differences (Section 2.5).
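As a rough illustration of the idea, and not the model developed in Chapter 2, the following Python sketch separates a single mixed signal by finding a sparse set of coefficients over an overcomplete dictionary. Everything specific here is an assumption made for illustration: the two sub-dictionaries `D1` and `D2` are random stand-ins for features of each source as filtered along its path to the cochlea, the sparse solution is found with iterative soft-thresholding (ISTA), and the penalty weight `LAM` is arbitrary.

```python
import numpy as np

LAM = 0.05  # sparsity penalty weight (arbitrary choice for this sketch)

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(D, y, x, lam=LAM):
    """0.5 * ||y - D x||^2 + lam * ||x||_1"""
    r = y - D @ x
    return 0.5 * (r @ r) + lam * np.abs(x).sum()

def ista(D, y, lam=LAM, n_iter=500):
    """Minimize the objective above by iterative soft-thresholding."""
    L = np.linalg.norm(D, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)       # gradient of the quadratic term
        x = soft_threshold(x - grad / L, lam / L)
    return x

rng = np.random.default_rng(0)
n, k = 32, 48                          # 32 observed samples, 48 features per source

# Hypothetical per-source dictionaries; together they are overcomplete (96 > 32).
D1 = rng.standard_normal((n, k))
D2 = rng.standard_normal((n, k))
D = np.hstack([D1, D2])
D /= np.linalg.norm(D, axis=0)         # unit-norm columns

# Each source activates only a few features (sparse ground truth).
x_true = np.zeros(2 * k)
x_true[[3, 17]] = [1.0, -0.8]          # source 1
x_true[[k + 5, k + 30]] = [0.9, 1.2]   # source 2
y = D @ x_true                         # the single (monaural) mixture

x_hat = ista(D, y)
s1_hat = D[:, :k] @ x_hat[:k]          # estimate of source 1's contribution
s2_hat = D[:, k:] @ x_hat[k:]          # estimate of source 2's contribution
```

The separation step is the final grouping: once the coefficients are inferred, each source estimate is rebuilt from only the dictionary columns assigned to that source.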

The experimental part is then covered in Chapter 3. Because sensory signal processing in the brain depends on stimulus history and its context (Section 3.1), here we assessed the relevant time-scales and how past events influence the responses in the auditory cortex (Sections 3.2--3.3). We found that the context-dependence lasted four seconds or longer in some neurons, and that changes in lower-order sound properties (e.g., intensity) had larger and longer-lasting effects on the subsequent response dynamics than changes in higher-order properties (e.g., amplitude modulation; Section 3.5). The data were also analyzed from the viewpoint of model construction (Section 3.4), showing that a window length of at least several seconds was required to fully capture the stimulus-related predictable response power (Section 3.5). Although the linear model performance did not improve substantially by merely extending the window length, or even by incorporating static nonlinearities, these results suggest that complex bottom-up modulations on longer time-scales should contribute substantially to the nature of stimulus encoding in the auditory cortex and its functions (Section 3.6).
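One simple way to ask how far back stimulus history matters is to fit a linear model on a window of past stimulus values and see how the fit changes with window length. The Python sketch below shows only that procedure, on synthetic data; it is not the analysis of Chapter 3. The function names, the exponential kernel, and the noise level are all assumptions, and the synthetic response is linear by construction, so unlike the recorded data it is captured well once the window is long enough. The fit is measured on the training data, whereas a real analysis would use held-out data.

```python
import numpy as np

def lagged_design(stim, n_lags):
    """Design matrix whose column j holds the stimulus j samples in the past."""
    T = len(stim)
    X = np.zeros((T, n_lags))
    for j in range(n_lags):
        X[j:, j] = stim[:T - j]        # zero-padded before stimulus onset
    return X

def fit_linear(stim, resp, n_lags):
    """Least-squares linear filter over n_lags past samples; returns (filter, R^2)."""
    X = lagged_design(stim, n_lags)
    w, *_ = np.linalg.lstsq(X, resp, rcond=None)
    resid = resp - X @ w
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((resp - resp.mean()) ** 2)
    return w, r2

# Synthetic example: the response depends on 40 past samples of the stimulus.
rng = np.random.default_rng(1)
T = 2000
stim = rng.standard_normal(T)
kernel = np.exp(-np.arange(40) / 15.0)             # hypothetical slow decay
resp = lagged_design(stim, 40) @ kernel + 0.5 * rng.standard_normal(T)

for n_lags in (5, 20, 40):
    _, r2 = fit_linear(stim, resp, n_lags)
    print(f"window = {n_lags:2d} samples, training R^2 = {r2:.3f}")
```

Because the shorter windows use a subset of the columns of the longer ones, the training fit cannot get worse as the window grows; the question of interest is where it stops improving.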

Finally, Chapter 4 briefly recapitulates the results from Chapters 2 and 3, and discusses future challenges. I close with a general discussion of how theory and biology might be brought together to better understand the principles and algorithms underlying computation and function in the brain. This thesis is intended to be self-contained, and thus all mathematical details and conventional algorithms used for the data analysis and the simulations are explained in Appendix A.