What is the principle of sparse coding? Explain its relation to other coding schemes such as dense codes or grandmother cells, and give examples of each in the nervous system. Why is sparse coding more common higher in sensory hierarchies?

Answer

Sparse Coding in General

The central hypothesis of sparse coding is that a sensing system should attempt to represent a stimulus as a combination of as few putative causes as possible. This simple idea can be expressed mathematically in very elegant ways that suggest deep connections with recent developments in signal processing, like compressive sensing and independent component analysis, but that is not the form of sparse coding that is important for this question.

Instead, this question relates to sparse coding as a hypothesis about coding in neural systems. In this view, each neuron (or perhaps an assembly of neurons) in a population represents a single causal feature of the stimulus – like an edge in an image – resulting in a statistically-optimal balance between the metabolic efficiency of the "grandmother cell" coding scheme and the storage capacity of dense, holographic codes.

Sparse coding is more common higher in sensory processing hierarchies because we have defined those hierarchies in terms of the complexity and invariance of the stimulus feature they respond to. Due to the finite bandwidth of perception, the more complex the stimulus feature is, the fewer such features can be present in a single presentation: one human body is made up of many body parts, each body part composed of thousands of visual edges.

Alternatives: Grandmothers and Holograms

In the "grandmother cell" formulation, high-level concepts are represented by single neurons – at its most absurd, this theory predicts a single cell in your brain that responds multi-modally to your grandmother, granting you the percept of a single entity connected to the sounds 'grandmother', 'grandma', 'grams', etc.; the sight of her smiling, wrinkly face from all angles; and your memories of tasty cookies, boring stories, and learning to pickle vegetables. This coding scheme is the natural extension of the Hubel-Wiesel dogma upwards from simple and complex cells through the black box of V4 into high-level perception and has the advantage of exceedingly simple read-out. It is, however, non-robust (an errant spike or two in one cell could cause you to suddenly hallucinate your grandmother in an inappropriate situation; a lesion of the cell is fearsome to contemplate, but would explain why you never call) and limits storage capacity of a population of N neurons to N values.

The precise opposite coding scheme, where all neurons in a population participate in coding a single percept, is referred to as dense coding, and comes in two flavors: population vector and holographic.

The former is very simple: each neuron in a population acts as a "vote" that the stimulus is at some value range (e.g. orientation in degrees of an edge) and the percept arises by pooling the responses together (more technically, by using ideal observer analysis on the population). This code is robust and accurate, but N neurons can only represent one feature (though they represent, in theory, all possible values of that feature). For more information on population codes see this post.

The latter is familiar to anyone who has worked with Hopfield Networks. In holographic coding, each cell is a bit – a 1 or a 0 – in a multi-bit representation of the stimulus, much like the memory of a computer. This code allows for a positively massive storage capacity (2^N for N neurons), but also suffers from a certain lack of robustness (especially at the extremes of storage, where features close in stimulus space are not likely to be close in representation space) and difficulty in read-out (any putative downstream cell needs to be connected to all N of the population's cells).

Examples

Sparse Coding in V1 - "Sparse Coding with an Over-Complete Basis Set: A Strategy Employed by V1?", Olshausen and Field, 1997.
Dense Coding in M1 - "Neuronal Population Coding of Movement Direction", Georgopoulos, et al., 1986.
- There is, however, a nice rebuttal to this interpretation of the data in this Neural Computation paper by Terry Sanger.
Grandmother Cells in the MTL - "Explicit Encoding of Multimodal Percepts by Single Neurons in the Human Brain", Quiran Quiroga et al., 2009.
- This is commonly referred to as the "Jennifer Aniston Neuron" paper. Note, however, that the authors actually repudiated that interpretation of their findings:
  - "…a Bayesian probabilistic argument was used to estimate that out of approximately one billion MTL neurons, fewer than two million represent a given percept…this is a far cry from the 'grandmother cell'."

An Aside on the Sociology of Science

Note that there are compelling empirical and theoeretical counter-arguments to either of the two extremes, so in some sense, pointing to "an example of each in the nervous system" is an impossible task. This is partly due to the polemical nature of scientific argument: the sharp distinction between "dense code" thesis and "grandmother cell" antithesis is dialectically necessary for the development of a proper synthesis, despite the physical improbability of either in pure form. It is also due to the extremely primitive nature of our understanding of neural systems: if we could say, with certainty, that a particular area of cortex encodes some information in some specific way, neuroscience would be much more mature as a field than it is.

Refs

Sparse Coding in General
- A nice Scholarpedia article by Peter Foldiak
- Olshausen and Field, 2004. "Sparse coding of sensory inputs"
Examples
- Most of the references are in-line or linked, but I'd like to add this cute little article by Charles Gross on the history of the phrase "grandmother cell", complete with references to Philip Roth:
- Gross, C 2002. Genealogy of the “Grandmother Cell”
An Aside
- Hegel, GWF 1807. Phenomenology of Spirit
- Kuhn, T 1962. The Structure of Scientific Revolutions