In this paper we will use a direct method
[DeWeese, 1995, DeWeese, 1996, Stevens and Zador, 1996, de Ruyter van Steveninck et al., 1997] to
estimate the mutual information. Direct methods use another form of
the expression eq. (5) for mutual information,
\[
I = H\left(\{t_i\}\right) - \left\langle H\left(\{t_i\} \mid \{t^{\mathrm{in}}\}\right)\right\rangle .
\]
The first term is the entropy of the output spike train
itself, while the second is the conditional entropy of
the output given the inputs. The first term measures the
variability of the spike train in response to the ensemble of
different inputs, while the second measures the reliability of the
response to repeated presentations of the same inputs. The second
term depends on the reliability of the synapses and spike generating
mechanism: to the extent the same inputs produce the same outputs,
this term approaches zero.
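As a concrete illustration of this decomposition (a toy discrete channel of our own construction, not the spike-train case), both terms can be computed directly from a joint input-output distribution:

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Toy joint distribution p(input, output) for a noisy binary channel.
# Rows index the input symbol; columns index the output symbol.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

p_out = [sum(joint[i][j] for i in range(2)) for j in range(2)]  # marginal p(output)
p_in  = [sum(joint[i]) for i in range(2)]                       # marginal p(input)

H_out = entropy(p_out)  # total entropy of the output
# Conditional entropy H(output | input): entropy of each row, normalized
# to a conditional distribution, averaged over the inputs.
H_cond = sum(p_in[i] * entropy([joint[i][j] / p_in[i] for j in range(2)])
             for i in range(2))

I = H_out - H_cond  # mutual information, in bits
```

When the channel is deterministic the rows become delta functions, `H_cond` vanishes, and the mutual information equals the full output entropy, mirroring the reliability argument above.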
The direct method has two advantages over the reconstruction method in the present context. First, it does not require the construction of a ``reconstructor'' for estimating the input from the output. Although the optimal linear reconstructor is straightforward to estimate, the construction of more sophisticated (i.e. nonlinear) reconstructors can be a delicate art. Second, it provides an estimate of the information that is limited only by the errors in the estimation of the total and conditional entropies; the reconstruction method, by contrast, provides only a lower bound on the mutual information, limited by the quality of the reconstructor.
As noted above, the estimation of the total and conditional entropies can
require vast amounts of data. If, however, interspike intervals
(ISIs) in the output spike train were independent, then the entropies
could be simply expressed in terms of the entropy of the associated
ISI distributions. The information per spike is then
given simply by
\[
I_{\mathrm{spike}} = H(T) - \left\langle H\left(T \mid \{t^{\mathrm{in}}\}_m\right)\right\rangle_m , \tag{8}
\]
where $H(T)$ and $\langle H(T \mid \{t^{\mathrm{in}}\}_m)\rangle_m$ are the total and conditional entropies,
respectively, of the ISI distribution. The information rate (units:
bits/second) is then just the information per spike (units:
bits/spike) times the firing rate $R$ (units: spikes/second),
\[
I_{\mathrm{rate}} = R \, I_{\mathrm{spike}} . \tag{9}
\]
The representation of the output spike train as a sequence of firing times $t_i$ is entirely equivalent (except for edge effects) to the representation as a sequence of ISIs $T_i$, where $T_i = t_{i+1} - t_i$. The advantage of using ISIs rather than spike times is that $H(T)$ depends only on the ISI distribution $p(T)$, which is a univariate distribution. This dramatically reduces the amount of data required.
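In code, this change of representation is a one-line transformation (the spike times below are hypothetical, in seconds):

```python
def spike_times_to_isis(spike_times):
    """Convert a sequence of firing times t_i into ISIs T_i = t_{i+1} - t_i."""
    return [t2 - t1 for t1, t2 in zip(spike_times, spike_times[1:])]

# N spike times yield N-1 intervals -- the "edge effect" noted above.
spike_times = [0.002, 0.015, 0.019, 0.040, 0.061]
isis = spike_times_to_isis(spike_times)
```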
In the sequel we assume that spike times are discretized at a finite time resolution $\Delta t$. The assumption of finite precision keeps the potential information finite. If this assumption is not made, each spike has potentially infinite information capacity; for example, a message of arbitrary length could be encoded in the decimal expansion of a single ISI.
Eq. 8 represents the information per spike as the
difference between two entropies. The first term is the total entropy
per spike,
\[
H(T) = -\sum_i p(T_i) \log_2 p(T_i) , \tag{10}
\]
where $p(T_i)$ is the probability that the length of the ISI was
between $T_i$ and $T_i + \Delta t$. The distribution of ISIs can be obtained
from a single long (ideally, infinite) sequence of spike times.
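A plug-in estimate of the total entropy per spike follows directly: discretize each ISI at resolution $\Delta t$, histogram the result, and apply Eq. 10 (the ISI sample below is a hypothetical illustration):

```python
import math

def total_entropy_per_spike(isis, dt):
    """Plug-in estimate of Eq. 10: H(T) = -sum_i p(T_i) log2 p(T_i),
    where p(T_i) is the fraction of ISIs falling in the bin of width dt
    containing T_i."""
    counts = {}
    for T in isis:
        b = round(T / dt)  # discretize at resolution dt
        counts[b] = counts.get(b, 0) + 1
    n = len(isis)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical ISI sample (seconds) at dt = 1 ms:
# four equiprobable bins give exactly 2 bits.
isis = [0.010, 0.020, 0.030, 0.040] * 25
H_T = total_entropy_per_spike(isis, dt=0.001)
```

Note that the plug-in estimator is biased downward for small samples, which is one reason the long (ideally, infinite) sequence matters.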
The second term is the conditional entropy per spike. The conditional
entropy is just the entropy of the ISI distribution in response to a
particular set $m$ of input spikes $\{t^{\mathrm{in}}\}_m$, averaged over all
possible sets of input spikes,
\[
\left\langle H\left(T \mid \{t^{\mathrm{in}}\}_m\right)\right\rangle_m = \left\langle -\sum_i p\left(T_i \mid \{t^{\mathrm{in}}\}_m\right) \log_2 p\left(T_i \mid \{t^{\mathrm{in}}\}_m\right) \right\rangle_m , \tag{11}
\]
where $\langle \cdot \rangle_m$ represents the average over input sets. Here
$p(T_i \mid \{t^{\mathrm{in}}\}_m)$ is the probability of obtaining an ISI of length
$T_i$ in response to a particular set of input spikes
$\{t^{\mathrm{in}}\}_m$.
We used the following algorithm for estimating the conditional entropy:
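One plausible realization of such an estimator, sketched below, presents each input set $m$ repeatedly, estimates the entropy of the evoked ISIs, and averages over the input sets. This is our own minimal sketch, not necessarily the exact procedure used here; the function names and data are hypothetical, and the unweighted average assumes the input sets are presented equally often.

```python
import math

def plug_in_entropy(samples, dt):
    """Entropy (bits) of a sample discretized at resolution dt."""
    counts = {}
    for x in samples:
        counts[round(x / dt)] = counts.get(round(x / dt), 0) + 1
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def conditional_entropy_per_spike(isis_by_input, dt):
    """Eq. 11 with an unweighted average: entropy of the ISIs evoked by
    repeated presentations of each input set m, averaged over m."""
    per_input = [plug_in_entropy(isis, dt) for isis in isis_by_input.values()]
    return sum(per_input) / len(per_input)

# Hypothetical data: input set 0 reliably evokes a 10 ms ISI, while input
# set 1 evokes 20 ms or 30 ms with equal probability (an unreliable synapse).
isis_by_input = {0: [0.010] * 40,
                 1: [0.020] * 20 + [0.030] * 20}
H_cond = conditional_entropy_per_spike(isis_by_input, dt=0.001)
# The reliable input contributes 0 bits; the unreliable one contributes 1 bit.
```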
In summary, we have described the three steps required to compute the information rate in our model. First, the total entropy per spike is computed from Eq. 10, and the conditional entropy per spike is computed from Eq. 11. Next, the information per spike is computed from Eq. 8. Finally, the information rate (information per time) is computed from Eq. 9.
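The three steps can be assembled into a single sketch (hypothetical data and helper names; the plug-in estimator assumes equiprobable input sets and ample data):

```python
import math

def plug_in_entropy(isis, dt):
    """Entropy (bits) of ISIs discretized at resolution dt."""
    counts = {}
    for T in isis:
        counts[round(T / dt)] = counts.get(round(T / dt), 0) + 1
    n = len(isis)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# isis_by_input[m] holds the ISIs evoked by repeated presentations of input
# set m (here: two equiprobable, fully reliable hypothetical inputs).
isis_by_input = {0: [0.010] * 100, 1: [0.030] * 100}
dt = 0.001  # 1 ms resolution

all_isis = [T for isis in isis_by_input.values() for T in isis]
H_total = plug_in_entropy(all_isis, dt)                  # step 1a: Eq. 10
H_cond = sum(plug_in_entropy(v, dt)                      # step 1b: Eq. 11
             for v in isis_by_input.values()) / len(isis_by_input)
info_per_spike = H_total - H_cond                        # step 2: Eq. 8 (bits/spike)
R = len(all_isis) / sum(all_isis)                        # firing rate = 1 / mean ISI
info_rate = info_per_spike * R                           # step 3: Eq. 9 (bits/second)
```

With perfectly reliable responses the conditional term vanishes, so every bit of output variability is information about the input: 1 bit/spike at 50 spikes/s gives 50 bits/s.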