Foundational Neuroscience Questions – Neuroscience Questions with Neuroscience Answers
http://charlesfrye.github.io/FoundationalNeuroscience//
Tue, 24 Mar 2020 23:40:23 +0000

What is population coding? Describe the population coding model proposed by Georgopoulos in the 1980s for M1 control of arm direction.
<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<h2 id="answer">Answer</h2>
<p>Population codes are neural representations
at the level of groups of cells.
There are many examples of population codes,
including sparse codes and holographic codes.</p>
<p>One famous population coding model is the
"population vector" model from a 1986 paper
by Georgopoulos,
proposed to describe motor neuron tuning in
primary motor cortex.
In this model, each neuron in the population
has a preferred movement direction,
and the resulting movement is a
weighted average of the preferred movements,
where the average is weighted by firing rate.</p>
<p>Though popular at the time, it came in for some
substantial criticism on theoretical grounds,
which I review here.</p>
<h2 id="population-coding">Population Coding</h2>
<p><a href="/FoundationalNeuroscience//82">Information about the outside world</a>
and
<a href="/FoundationalNeuroscience//12">plans for motor action</a>
are represented by the firing patterns
of neurons –
temporal sequences of
<a href="/FoundationalNeuroscience//23">action potentials</a>,
also known as spikes.</p>
<p>Usually, questions about the nature of this representation
revolve around the behavior of single cells:
Is information contained <a href="/FoundationalNeuroscience//47">in firing rate or relative timing of spikes</a>?
How does a single synapse <a href="/FoundationalNeuroscience//29">change its efficacy</a>?
What stimulus features <a href="/FoundationalNeuroscience//52">predict firing changes in a cell</a>?</p>
<p>We ask these questions because they are more tractable,
that is,
because we might hope to find answers to them.
Unfortunately, they are essentially
the wrong questions.</p>
<p>In general, the computational properties of groups of neurons
should be emergent properties of the group,
rather than merely a concatenation
of the computational powers of individual neurons.</p>
<p>Population codes are answers to the more complicated
questions that arise when multiple cells
are considered at the same time:
Should codes involve <a href="/FoundationalNeuroscience//48">all cells or just a few</a>?
How can <a href="/FoundationalNeuroscience//51">excitatory and inhibitory cells interact to produce computation</a>?
How can we <a href="/FoundationalNeuroscience//45">store and retrieve memories</a>
using a population of neurons?</p>
<p>In the case of <em>population vector coding</em>, though,
the computations of individual neurons are combined
in a very simple way.</p>
<h2 id="population-vector-coding-georgopoulos-model">Population Vector Coding: Georgopoulos Model</h2>
<p>We'll discuss population vector coding
by means of a particular model,
which was proposed in the 1980s
to describe the behavior of neurons in the
<a href="/FoundationalNeuroscience//13">primary motor cortex</a>
of monkeys during a reaching task.</p>
<p>Our monkey has been trained to reach and press
a button in return for a juice reward.
There are eight buttons,
arranged in a circle,
and the monkey is informed
as to which button to press
by a blinking red light.
In between each button press,
the monkey returns its hand to the center.</p>
<p>We insert an
<a href="/FoundationalNeuroscience//80">extracellular electrode</a>
into the brain of this monkey
and record from one or a few neurons at a time.</p>
<p>The results from recording a particularly nice neuron
are shown below.</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/tuningCurve.png" alt="tuningCurve" /></p>
<p>On top, we see a schema of the experiment,
along with some data.
The circle in the center represents
the central region to which the monkey
returns its hand on each trial.
Each red arrow points towards the location of a button.</p>
<p>At the tip of each arrow appears a
"spike train" –
the black line is a time axis,
and the green hash marks
are action potentials.
The spike trains at the tip of an arrow were recorded
when the monkey was reaching towards the button
indicated by that arrow.</p>
<p>The data represented by the spike trains has been aggregated
into a
"tuning curve"
in the bottom half of the figure.
Heights represent spike counts
(population vector coding is a form of
<a href="/FoundationalNeuroscience//47">rate coding</a>)
and positions on the x-axis correspond to
angles of motion <script type="math/tex">\theta</script> –
that is, to button locations.
This neuron has a "preferred direction"
down and to the left.</p>
<p>Note, however, that the neuron
fires a spike or two even when
the motion is in precisely the opposite
direction to the one it prefers.
If we assume that the number of spikes fired
on any trial varies by two or three spikes,
then we have a serious problem:
seeing five spikes from this neuron
provides very little
<a href="/FoundationalNeuroscience//82">information</a>
about the intended motion direction.</p>
<p>But monkeys, like humans,
are capable of very fine motions:
a monkey would have no trouble distinguishing buttons
separated by only a few degrees,
but this neuron hardly discriminates between
angles that are 90<script type="math/tex">^{\circ}</script> apart!</p>
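<p>To see the problem quantitatively, here is a quick simulation. The tuning numbers are invented for illustration, but the moral holds: a cell firing only a handful of spikes with Poisson variability produces count distributions, for two directions 90 degrees apart, that overlap heavily.</p>

```python
import numpy as np

rng = np.random.default_rng(1)

def spike_counts(angle_deg, n_trials=10000):
    """Hypothetical cosine-tuned cell (preferred direction 0 degrees),
    firing a handful of spikes per trial with Poisson variability."""
    mean_count = 4 + 3 * np.cos(np.deg2rad(angle_deg))
    return rng.poisson(mean_count, n_trials)

counts_pref = spike_counts(0)    # mean of 7 spikes per trial
counts_orth = spike_counts(90)   # mean of 4 spikes per trial

# Fraction of probability mass shared by the two count distributions:
# on roughly half of trials, the count alone can't tell the directions apart.
hist_pref = np.bincount(np.minimum(counts_pref, 49), minlength=50)
hist_orth = np.bincount(np.minimum(counts_orth, 49), minlength=50)
overlap = np.minimum(hist_pref, hist_orth).sum() / 10000
print(round(overlap, 2))  # roughly 0.5
```

<p>That overlap is exactly the "serious problem" above: a single trial's spike count from one neuron leaves the intended direction highly ambiguous.</p>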
<p>In population vector coding
<a href="http://wexler.free.fr/library/files/georgopoulos%20(1986)%20neuronal%20population%20coding%20of%20movement%20direction.pdf">according to Georgopoulos</a>,
this apparent paradox
is resolved through a <em>population vector code</em>.
Each neuron's spikes are viewed as "votes"
for motion in its preferred movement direction.
When a movement command is generated,
the resulting votes are combined
and movement occurs in the
<em>average</em> direction.</p>
<p>In mathematical terms,
we take a weighted average of the preferred directions.
Each direction is a vector,
and so the resulting average is a <em>population vector</em>.
To get even more mathematical,
we can say that we are taking a
<em>linear combination</em>
in the basis set up by the neural tuning curves –
by assumption, a
<a href="http://www.scholarpedia.org/article/Radial_basis_function">radial basis</a>
– where the weights of the linear combination
are determined by the firing rates.</p>
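<p>As a minimal numerical sketch of this weighted average – assuming idealized cosine tuning and eight cells, with numbers and function names of my own rather than those of the original paper:</p>

```python
import numpy as np

# Eight neurons with preferred directions at the eight button angles.
preferred_angles = np.arange(8) * 2 * np.pi / 8
preferred_dirs = np.stack([np.cos(preferred_angles),
                           np.sin(preferred_angles)], axis=1)

def firing_rates(movement_angle, baseline=10.0, gain=8.0):
    """Cosine tuning: each cell fires most for its preferred direction."""
    return baseline + gain * np.cos(movement_angle - preferred_angles)

def population_vector(rates):
    """Weighted average of preferred directions, weighted by
    each cell's firing rate above its baseline ("votes")."""
    weights = rates - rates.mean()
    vec = (weights[:, None] * preferred_dirs).sum(axis=0)
    return np.arctan2(vec[1], vec[0])

# An intended movement at 30 degrees -- in between two buttons -- is
# recovered by the population, though no single neuron prefers it.
theta = np.deg2rad(30)
decoded = population_vector(firing_rates(theta))
print(np.rad2deg(decoded))  # ≈ 30
```

<p>Small changes in the rates shift the decoded angle smoothly, which is exactly how the population achieves finer resolution than any single broadly-tuned cell.</p>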
<p>This results in a substantial increase in the
possible accuracy of movement.
In the image below,
eight neurons with preferred directions pointing
to each of the buttons (red arrows) are firing,
with the preferred direction and spike count
of each cell represented by the directions and magnitudes of the green arrows.
The average, "population" vector that results (light blue)
is in between two of the preferred directions.
Small changes in firing rate can generate small changes in direction.</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/popVector.png" alt="popVector" /></p>
<h2 id="problems-with-population-vectors">Problems with Population Vectors</h2>
<p>During the 1980s,
it was reported that population vector models
could predict hand motion with a high degree of accuracy,
and so it was believed that these models accounted well for the
behavior of
<a href="/FoundationalNeuroscience//13">cortical primary motor neurons</a>.
That is to say,
it was believed that cortical motor neurons represent
preferred motion direction in Cartesian coordinates.</p>
<p>However, a close look at the mathematics of the
population vector approach,
as in, e.g.,
<a href="http://www.cs.cmu.edu/afs/cs/academic/class/15883-f13/readings/sanger-1994.pdf">Sanger's 1994 <em>Neural Computation</em> paper</a>
precludes that conclusion.</p>
<p>Without diving into the mathematical details,
the overall point is that if we assume that
1) neural firing is at least correlated with hand movement
and
2) these directions of correlation are spread out in space
then all of the reported results follow,
without any need to assume that the neurons are actually
engaging in population vector coding.</p>
<p>It's important to note that,
though Assumption #1 sounds like an assumption
that neurons are computing preferred directions,
it isn't exactly the same.
Importantly, the possibility is left open that
there are more complex statistical dependencies
than correlation between the neuron and hand motion,
so long as the correlation is non-zero.</p>
<p>The broader point is that the results would be found
whether the neurons were using Cartesian coordinates or not –
so long as they produce a complete basis for 2-D or 3-D space,
the analysis procedure used to create a population vector
will produce statistically-significant results.</p>
<p>The broadest point is to be careful with models:
usually, we are making a number of basic assumptions
about the data that we have no reason to believe are true.
We make these assumptions because they guarantee the correctness
of our model – our linear regression or our correlation analysis.
They provide these guarantees because they are powerful mathematical tools,
and we should be careful that they are not too powerful
to support our conclusions!</p>
<p>Before you think this kind of mistake is relegated to neuroscience past,
consider this:
recently, a number of papers have compared
the internal representations of
artificial neural networks that perform difficult tasks,
like object recognition,
at a human level
to the representations of cortical neurons.
When they find correspondences,
the papers speculate that this indicates
some connection between the computations of artificial neurons
and their biological counterparts:
perhaps the brain does gradient descent,
or has a
<a href="/FoundationalNeuroscience//09">convolutional architecture</a>?</p>
<p>However, these are exactly the kinds of correspondences
that could be, like the population vectors,
simply artifacts of the definition of the problem –
both of how we
<a href="/FoundationalNeuroscience//52">measure receptive fields</a>
and of how we train artificial nets.</p>
Tue, 12 Apr 2016 00:00:00 +0000
http://charlesfrye.github.io/FoundationalNeuroscience//49/

What is a central pattern generator? Choose a well-studied CPG and describe how it functions.
<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<h2 id="answer">Answer</h2>
<p>Central pattern generators or CPGs
are collections of neurons that produce
<em>rhythmic</em> behavioral outputs.</p>
<p>Due to the fundamental nature of rhythms,
both in biology and in mathematics,
we should expect to find many examples
of central pattern generators (CPGs) in neuroscience.</p>
<p>One of the best-studied CPGs is the
cat locomotion pattern generator,
which was discovered over 100 years ago
by pulling the brains out of cats!</p>
<h2 id="rhythms">Rhythms</h2>
<p>The behavior of animals extends in time.
In many cases,
this behavior is repetitive:
it could be extended forever,
with some basic template occurring
over and over again,
beginning and ending in the same way:
we walk by stepping left-right-left-right-left-…,
fish swim by contracting one side of their body, then the other,
birds fly by flapping their wings up and down and up again,
and so on.
Even outside of nature,
inside our computers,
some of the most fundamental steps of computation,
the clock cycle of the CPU
and the
core loop of the operating system,
are repeated actions.</p>
<p>Rhythms are fundamental because they express
one of the basic ways a process can unfold
over an arbitrary amount of time.
In a very real way,
oscillation represents one of only three basic options
for the behavior of an arbitrarily-long process:
it can either continue to increase in intensity forever,
it can settle down to some specific value and stay there forever,
or it can oscillate, repeating itself over and over again.</p>
<p>The first option is impossible for biological processes:
it is simply unphysical for anything to truly grow without bound,
since we live in a finite universe<sup>†</sup>.
At best, this can occur approximately over some regime,
as in
<a href="http://www.math.utah.edu/~carlson/hsp2004/dynamics.pdf">the population growth of bacteria in a dish</a>
or in
<a href="/FoundationalNeuroscience//23">the upstroke of the action potential</a>.</p>
<p>Settling to a particular value is the definition of
<a href="/FoundationalNeuroscience//30">homeostasis</a>,
meaning our second option is in fact
an important function of a myriad of biological processes.</p>
<p>Before returning to oscillations, I'd be remiss if I didn't
note that there is a fourth, more complex, option for infinite-time behavior:
<em>chaos</em>,
where small changes slowly grow,
as in the proverbial butterfly that causes a hurricane,
while big changes get smeared out into nothing.
Though chaos plays a role in biology,
e.g. in the
<a href="/FoundationalNeuroscience//32">developing embryo</a>,
as discovered by Alan Turing,
a role for
<a href="http://www.ncbi.nlm.nih.gov/pubmed/14694754">chaotic dynamics in the brain</a>
is controversial in the field.</p>
<p>So, if we wish to generate an action,
like walking, swimming, chewing, or flying,
that repeats itself <em>ad infinitum</em>,
or at least until we decide to stop,
rather than varying chaotically,
growing forever,
or slowly petering out,
we'd do well to make use of oscillations.</p>
<p>Put another way:
our neurons must generate patterns
of oscillatory behavior,
so we should expect to find pattern generators
that oscillate in the central nervous system.
We call these <em>central pattern generators</em>.</p>
<h2 id="a-model-cpg">A Model CPG</h2>
<p>We model our neurons using a toolset
from classical mechanics:
dynamical systems.
In brief, a dynamical system
is any system that can be described with a
<em>differential equation</em>,
or an equation that describes
how the system changes at each point in time.</p>
<p>Differential equations are fundamental mathematical objects:
in a very real way,
<script type="math/tex">e</script>, <script type="math/tex">\pi</script>, and the Gaussian, or Normal, distribution
<a href="https://affinemess.quora.com/What-is-math-pi-math-and-while-were-at-it-whats-math-e-math">arise from differential equations</a>.
They are also critical for computational neuroscience and neurobiology.
The
<a href="/FoundationalNeuroscience//92iii">Hodgkin-Huxley equations</a>,
which describe the evolution of the
<a href="/FoundationalNeuroscience//23">action potential</a>,
are differential equations.
The
<a href="/FoundationalNeuroscience//25">leaky integrate-and-fire model</a>
of the neuron is a differential equation.</p>
<p>Because of their importance, their generality, and their fundamental nature,
differential equations are a difficult subject, and notoriously so.
But for the exact same reason, there exist simple,
intuitive ways of explaining much of the critical understanding
about the behavior of systems governed by these equations.</p>
<p>Consider the following idealized neural circuit:</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/simpleCircuit.png" alt="simpleCircuit" /></p>
<p>Neuron <em>X</em> activates Neuron <em>Y</em>,
but Neuron <em>Y</em> inhibits Neuron <em>X</em>.
Any time Neuron <em>X</em> becomes active,
it will increase the activity of Neuron <em>Y</em>,
which will eventually decrease the activity of Neuron <em>X</em>
until it is below its baseline value.
At that point, Neuron <em>Y</em> is no longer receiving as much
activation from Neuron <em>X</em>,
and so its activity goes down.
This will eventually cause the activity of Neuron <em>X</em>
to increase again, starting the whole cycle over again.</p>
<p>Just from that explanation, it is clear that this
system is oscillatory.
We can understand it even more clearly if we use the methods
of dynamical systems analysis
<a href="http://www.scholarpedia.org/article/History_of_dynamical_systems#Poincar.C3.A9_and_Birkhoff">pioneered by Poincaré</a>
to describe the motion of the planets.</p>
<p>We represent the activity of a neuron by a single number:
large and positive means more active than baseline,
while large and negative means less active than baseline.
This means our baseline activity is represented by <em>0</em>.</p>
<p>We translate the notions of "activation" and "inhibition"
into mathematics like this:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align}
\frac{dX}{dt} &= -Y \\
\frac{dY}{dt} &= X
\end{align} %]]></script>
<p>In English, you might read the second equation as:
"the change in the activity of <em>Y</em> over time
is equal to the current activity of <em>X</em>",
or
"activity in <em>X</em> causes the activity in <em>Y</em>
to increase or decrease proportionally".</p>
<p>For any pair of values <em>X</em> and <em>Y</em>,
representing a possible state for the two neurons,
we can then describe how the state will change over time.
To each pair, we can attach an arrow,
pointing towards what the values will look like
in the next instant.</p>
<p>If we treat <em>X</em> and <em>Y</em> as an ordered pair and
draw our arrows on the Cartesian plane,
we get a picture of the system called a
<em>flow field</em>
or a
<em>phase portrait</em>.</p>
<p>The phase portrait for our neural circuit appears below:</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/phasePortrait.png" alt="phasePortrait" /></p>
<p>The blue arrows represent the pair of values
<script type="math/tex">\frac{dX}{dt}</script> and <script type="math/tex">\frac{dY}{dt}</script> –
they are <em>vectors</em>.
You should confirm for yourself that the values match up!</p>
<p>Phase portraits are useful because we can see
what the behavior of a system looks like
without having to do any number-crunching.
Just by looking at the portrait,
we can see that, if we start off with either
<em>X</em> or <em>Y</em> away from its baseline value,
the neurons will "go in circles":
first one will increase while the other decreases,
then vice versa.</p>
<p>Put your finger on the x-axis,
and then follow the arrows
side to side,
<em>without</em> leaving the x-axis.
That is, move left when the arrows point left,
ignoring whether they also point up or down,
and move right when the arrows point right.
Your finger should move back and forth from
the starting point to minus the starting point,
and then back again.
It is <em>oscillating</em>.
In fact, it's drawing a sine or cosine wave!</p>
<p>If you're the programming type,
you might understand this better by means of
the following Python code:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">dt = 0.0001
aBigNumber = 100000
x = 5; y = 0  # X starts above baseline, Y at baseline
for ii in range(aBigNumber):
    x = x + -y*dt  # dX/dt = -Y
    y = y + x*dt   # dY/dt = X</code></pre></figure>
<p>Try coding that up in your favorite programming language
and plotting the resulting <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> values.
You should get circles, so long as <code class="language-plaintext highlighter-rouge">dt</code> is small
and <code class="language-plaintext highlighter-rouge">aBigNumber</code> is big!</p>
<p>We can see how such a neural circuit
might be used for movement generation
by connecting the neural activity values
to motor commands.</p>
<p>Let's say that <em>X</em> is a
<a href="/FoundationalNeuroscience//13">speed-representing neuron</a>,
meaning that increased spiking activity in <em>X</em>
tends to cause an increase in the speed
that some muscle or group of muscles is moving.
For concreteness' sake, let's say it's
the muscles birds use to flap their wings.</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/flappyBird.png" alt="flappyBird" /></p>
<p>In the above diagram,
I've drawn how the state of a bird
using a neuron like <em>X</em>
to flap its wings will change over time.
If the neurons are both at their baseline activity,
then nothing happens,
represented by the single bird at the origin.
But if some signal comes from another neuron
and increases the activity of neuron <em>X</em>,
then the bird's wings begin to flap,
first up, then down,
and then back again.
If nothing changes,
the bird will continue to fly forever,
but if another signal comes in and
returns <em>X</em> and <em>Y</em> to <em>0</em>,
then the bird's wings will stop flapping.</p>
<h2 id="an-actual-cpg-cats">An Actual CPG: Cats!</h2>
<p>Interestingly,
the pattern generators for locomotion
are not necessarily located within the brain.
This was discovered when a group of neuroscientists
removed the entire brain of a cat,
then placed the cat on a treadmill.
The cat still walked!
Click the image below to see a video.</p>
<p style="text-align: center"><a href="http://www.youtube.com/watch?v=wPiLLplofYw" title="Cat with no brain walks!"><img src="http://img.youtube.com/vi/wPiLLplofYw/0.jpg" alt="zombieCat" /></a></p>
<p>In fact,
as long as 100 years ago,
it was known that one could "decerebrate"
(that is, cut the brain out of)
a cat,
cut its spinal cord,
and even sever most of the sensory nerves,
and it will still respond appropriately
to stimuli, like a treadmill,
that encourage motion.</p>
<p>This indicated that the central pattern generators
were to be found
<a href="/FoundationalNeuroscience//71">in the ventral horn of the spinal cord</a>.</p>
<p>Additional experiments showed that,
even when the muscles were paralyzed,
rhythmic activity occurred in response
to motion-generating stimuli,
detected using
<a href="/FoundationalNeuroscience//80">extracellular electrophysiology</a>.</p>
<h3 id="references">References</h3>
<p>For understanding dynamical systems,
the classic text is
<a href="http://www.amazon.com/Nonlinear-Dynamics-And-Chaos-Applications/dp/0738204536"><em>Nonlinear Dynamics and Chaos</em></a>,
by Steven Strogatz.
I learned about these systems in the context of biology,
and my professor for that course,
Dmitry Kondrashov, has written
<a href="http://www.amazon.com/Quantifying-Life-Symbiosis-Computation-Mathematics/dp/022637176X/">a book introducing the ideas to biologists</a>,
developing all the mathematical machinery necessary
in as clear a fashion as possible.</p>
<p>For more concrete information on CPGs, check out
<a href="https://www.cs.cmu.edu/~cga/legs/nclpt1.pdf">this review</a>
from 1998
by Duysens and Van de Crommert.</p>
<h3 id="footnotes">Footnotes</h3>
<p>† An astute reader might note that,
thanks to the second law of thermodynamics,
oscillating forever and staying at one value forever
are impossible,
since they require energy and can be used to perform work.
While this view is technically correct,
it misses a subtle difference in the timescale of failure
for a system trying to oscillate forever versus one trying to
grow forever.</p>
<p>Eternal growth <em>compounds</em>,
meaning that even modest continual growth
will quickly spiral out of control,
reaching values that are orders of magnitude greater
than the initial state in a reasonable amount of time –
compounding growth reaches <em>exponential values</em>
in only <em>linear time</em>.</p>
<p>Oscillation, on the other hand, can proceed with
miniscule amounts of energy,
especially compared with the amount of energy available in
the entire Universe.
Eventually, sure,
<a href="https://np.reddit.com/r/explainlikeimfive/comments/1hewot/eli5_how_the_universe_will_eventually_run_out/catpopo">there won't even be enough energy to vibrate a guitar string</a>,
but a guitar string could vibrate many, many trillions of times
before that happens.
On the other hand,
just 64 periods of doubling is enough to go from
<a href="http://mathforum.org/sanders/geometry/GP11Fable.html">the scale of a grain of rice to the interstellar scale</a>.</p>
Mon, 11 Apr 2016 00:00:00 +0000
http://charlesfrye.github.io/FoundationalNeuroscience//15/

How is M1 organized? What features of movement are represented in spiking of M1 neurons?
<h2 id="answer">Answer</h2>
<p>The primary motor cortex is arranged
<em>somatotopically</em> –
according to a map of the body.
The amount of cortical real estate devoted to a given body part
<a href="/FoundationalNeuroscience//04">is correlated with the fineness of motion</a>
required of that body part.</p>
<p>Spiking in neurons of the primary motor area
correlates with several basic movement properties:
direction, speed, and strength.</p>
<p>Artificial stimulation of the primary motor cortex elicits
simple movements, like twitches, flexions, and extensions.</p>
<p>Check out the
<a href="http://neuroscience.uth.tmc.edu/s3/chapter03.html">UTH online neuroscience textbook</a>
for a bunch of neat visualizations of this material!</p>
<h2 id="the-organization-of-motor-cortex">The Organization of Motor Cortex</h2>
<p>Like its partner in crime, the
<a href="/FoundationalNeuroscience//73">somatosensory cortex</a>,
motor cortex is organized according to a
topographic map of the body.</p>
<p>Topographic maps are unlike the maps we use to navigate.
On those maps, the key property to preserve is <em>distance</em> –
as we move one mile in the world, we should move one foot
on a navigational map with a 5,280:1 scale.</p>
<p>Topographic maps are more like the maps of subway systems.
Below is a map of the subway system of my home city, Chicago.
On this map, moving an inch will sometimes move you just
a dozen yards, and sometimes it will move you
hundreds of feet.</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/ctaMap.png" alt="ctaMap" /></p>
<p>The key property preserved by this map is <em>neighborhood</em>,
rather than distance.
We can see which stations are connected to which,
and how many stops are between them.</p>
<p>The map of your body in the motor cortex is like that:
at some points, moving a millimeter in motor cortex
will move you just from one part of your hand to another.
At others, moving a millimeter will take you from
a region that controls a muscle in your abdomen
to a region that controls a muscle in your neck.</p>
<p>Nearby neurons, however, will almost always control nearby parts of the body,
where nearby is on the scale of a few microns.
Mathematically, we would say that there is a <em>smooth map</em>
from the surface of your body to the surface of the motor cortex.
In neuroscience, we call such a map a <em>somatotopic map</em>.</p>
<p>What determines how much cortical area
corresponds to a particular body part?
Certain body parts need to be very carefully controlled:
for humans, these include the fingers and the lips,
while for a mouse, these might be the whisking muscles.
Other body parts need only very gross or simple control:
the trunk, for example, or the legs.</p>
<p>The finer we need to control a body part,
the more cortical area is devoted to that part.
A related principle controls
<a href="/FoundationalNeuroscience//04">how much cortical area is devoted to sensory stimuli</a>.
The result is that there is also a somatotopic map
in the <em>somatosensory</em>, or body-sensing,
part of the brain.</p>
<p>One cool visualization of this kind of map is called
"the homunculus",
which is Latin for "the little man".
The homunculus for a somatotopic map is drawn with his body parts
sized proportionally to the amount of the map
used by the body part.
Representations of the motor and somatosensory homunculi
appear below.</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/homunculi.jpg" alt="homunculi" /></p>
<p>It is important to note that,
contrary to what you might expect,
the motor cortex is not organized according to
the specific muscle controlled by a given neuron.
Within the region corresponding to the arm,
for example,
we don't find sub-regions corresponding to the
tricep, bicep, etc.</p>
<p>Instead, we find these neurons mixed in with one another,
presumably because most of our movements
require the coordination of many muscles within a body part,
rather than the motion of a single muscle.</p>
<h2 id="an-aside-on-coding">An Aside on Coding</h2>
<p>Before describing the following results,
I'd like to note that we don't know what
any part of the cortex really
"represents"
at the level of a single neuron's
<a href="/FoundationalNeuroscience//23">action potentials</a>.
One reason for this is that we still don't know
how neurons
<a href="/FoundationalNeuroscience//82">encode information</a>.
Are codes based on relative spike timing,
or on total spike count in some time window?
<a href="/FoundationalNeuroscience//47">Is that even a sensible question?</a>
Are stimuli encoded at the level of
<a href="/FoundationalNeuroscience//48">a single cell or small group of cells</a>,
or is information aggregated from an
<a href="/FoundationalNeuroscience//49">entire population of cells</a>?
One famous model of M1 function uses a population-based rate code,
but it has
<a href="/FoundationalNeuroscience//49">come in for some serious criticism</a>.
On the other hand, designs for
<a href="/FoundationalNeuroscience//17">neural prosthetics</a>
based on that model have been moderately successful,
even if they won't be performing rocket surgery.</p>
<p>We'll ignore most of that complexity below,
but I wanted to put it out there
so that I don't mislead any of my readers
into thinking we understand the brain!</p>
<h2 id="coding-in-motor-cortex">Coding in Motor Cortex</h2>
<p>Several basic movement features are represented by
the patterns of
<a href="/FoundationalNeuroscience//23">action potentials</a>
in neurons of the primary motor cortex:
strength, speed, extent, and direction.</p>
<p>Strength-, speed-, and extent-representing cells increase their firing rate
when those parameters of the movement are increased:
more spikes per second means stronger, faster, or further movement
of the given muscle.</p>
<p>You might wonder how to distinguish between
a strength-representing and a speed-representing cell.
This is best seen with <em>targeted movements</em>,
or movements to a specific location,
as when I move my coffee cup to my lips
using my arm and hand.
In these movements, speed starts at 0,
then increases to some peak,
and then decreases,
ensuring that I don't smack my face with my mug.
Note that the force is going to be proportional
to the <em>derivative</em>, or rate of change,
of the speed, thanks to Newton's Laws.</p>
<p>Some neurons fire at a rate proportional to this derivative,
the acceleration, and so are force- or strength-representing cells.
Other neurons fire at a rate proportional to the speed directly,
and so are speed-representing cells.</p>
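<p>A toy simulation makes the distinction concrete. Here I assume a smooth bell-shaped speed profile and simple rectified rate models; the profile and all the gains are made up for illustration:</p>

```python
import numpy as np

# A bell-shaped speed profile for a targeted reach: speed starts at 0,
# peaks mid-movement, and falls back to 0 (no mug-smacking).
t = np.linspace(0, 1, 1001)
speed = np.sin(np.pi * t) ** 2
accel = np.gradient(speed, t)  # force ~ rate of change of speed

# Hypothetical rate models (rectified so rates stay nonnegative).
rate_speed_cell = np.maximum(0.0, 5 + 20 * speed)
rate_force_cell = np.maximum(0.0, 5 + 5 * accel)

print(t[rate_speed_cell.argmax()])  # 0.5  -- peaks mid-reach
print(t[rate_force_cell.argmax()])  # 0.25 -- peaks while still accelerating
```

<p>The two cells peak at different times during the very same reach, which is how recordings can tell them apart.</p>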
<p>The encoding of movement direction is a separate case,
and is covered more thoroughly in
<a href="/FoundationalNeuroscience//49">a separate blog post</a>.
In short, according to a popular model,
each neuron has a preferred movement direction,
and during a motion,
neurons with many different preferred directions fire.
The resulting movement is in the direction that is the
<em>weighted average</em> of the directions preferred by the firing neurons,
with the weights determined by the number of spikes.</p>
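<p>The weighted average can be sketched in a few lines of Python
(the cosine tuning and the firing rates below are invented for
illustration, not data from the 1986 paper):</p>

```python
import numpy as np

# Eight neurons with evenly spaced preferred directions.
preferred = np.deg2rad(np.arange(0, 360, 45))
actual = np.deg2rad(90.0)   # the true movement direction

# Assume cosine tuning: rate peaks at the preferred direction.
baseline, gain = 20.0, 15.0  # made-up rates, in spikes per second
rates = baseline + gain * np.cos(preferred - actual)

# Population vector: rate-weighted sum of preferred-direction unit vectors.
pop_vec = rates @ np.column_stack([np.cos(preferred), np.sin(preferred)])
decoded = np.rad2deg(np.arctan2(pop_vec[1], pop_vec[0]))
# decoded recovers the actual direction, 90 degrees
```

<p>With cosine tuning and evenly spread preferred directions,
the baseline contributions cancel and the decoded direction
matches the actual one exactly.</p>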
<p>More complicated movement features,
like the components of a motor sequence
or the goal state of a movement, are represented by
<a href="/FoundationalNeuroscience//12">other cortical areas</a>.</p>
Sun, 10 Apr 2016 00:00:00 +0000
http://charlesfrye.github.io/FoundationalNeuroscience//13/
http://charlesfrye.github.io/FoundationalNeuroscience//13/control-of-movementDescribe the key cortical areas and subcortical structures of the motor system. Give an example of how these components contribute to the performance of a simple motor task such as reaching and how they contribute to the performance of a skilled action like playing the piano.<h2 id="answer">Answer</h2>
<p>This answer is almost criminally short
given the richness of detail
with which we know the various systems mentioned.
Many of these systems are discussed in greater detail
in other blog posts,
and so relevant links are provided to encourage deeper engagement
with the components of the motor pathway.</p>
<p>The important cortical structures for
planning and executing voluntary movement
are the primary motor cortex (M1),
the premotor and supplementary motor areas (PMA and SMA),
and the prefrontal cortex (PFC).</p>
<p>Subcortically, the most critical structures are
the <a href="/FoundationalNeuroscience//14">basal ganglia</a>,
the <a href="/FoundationalNeuroscience//16">cerebellum</a>,
and the <a href="/FoundationalNeuroscience//71">spinal cord</a>.</p>
<p><a href="/FoundationalNeuroscience//13">M1 executes movements</a>
by sending signals to the spinal cord,
wherein reside the lower motor neurons, which
<a href="/FoundationalNeuroscience//35">communicate with the muscles</a>.
Not every signal generated by M1 results in movement.
Sequences generated by M1 are gated by the
<a href="/FoundationalNeuroscience//14">basal ganglia</a>.</p>
<p>These ganglia are in close communication
with the PFC, which makes executive decisions
about which goals to pursue
on the basis of
<a href="/FoundationalNeuroscience//64">value information</a>
computed by other
<a href="/FoundationalNeuroscience//66">frontal structures</a>
and
<a href="/FoundationalNeuroscience//65">mid-brain structures</a>.</p>
<p>PFC also communicates with SMA and PMA,
sending them a continual stream of potential goals
to turn into motor plans,
without regard to whether they will eventually be
<a href="/FoundationalNeuroscience//14">approved by the basal ganglia</a>.</p>
<p>The premotor area acts, among other things, as a "staging area",
ramping up its firing during the run-up to movement.
It also contains the "mirror neurons",
which fire not only during a certain action,
but also while watching someone else perform that action,
or even when imagining that action!</p>
<p>The supplementary motor area generates motor sequences:
short, stereotyped collections of actions.
The stimulation of SMA neurons produces these stereotyped actions,
like raising the arm to a given position,
or baring the teeth.</p>
<p>Check out the
<a href="http://neuroscience.uth.tmc.edu/s3/chapter03.html">UTH online neuroscience textbook</a>
for more information about these areas, and about M1.</p>
<p>During sustained movements,
the brain must keep track of whether the resulting movement
is the one it planned for,
or whether an error has occurred,
due to, for example,
an unexpected change in load –
e.g. a toddler grabbing onto your leg –
or due to incorrect sensory information,
as when one has a bit too much to drink,
or when a cat
<a href="https://www.youtube.com/watch?v=Awf45u6zrP0">overestimates its jumping ability</a>.</p>
<p>This task falls to
<a href="/FoundationalNeuroscience//16">the cerebellum</a>,
which integrates sensory signals with motor commands
and adjusts spinal output accordingly.</p>
<h3 id="during-reaching">During reaching</h3>
<p>When the decision to reach is made by the prefrontal cortex,
it sends a signal to the PMA to prepare a reach
and another signal to the basal ganglia
encoding the value of the reach target.
The PMA recruits the SMA, which produces a series of activations
representing, e.g., the rotation of the wrist,
the lifting of the arm,
and then the bending of the elbow.
These activations take effect by recruiting specific
collections of cells in M1 – the upper motor neurons –
which,
after their activations pass the filter of the basal ganglia,
direct the lower motor neurons of the spinal cord
to activate their muscle fibers
in just such a way that the desired reaching motion is achieved.</p>
<h3 id="while-playing-the-piano">While playing the piano</h3>
<p>All of the above occurs,
with perhaps more two-way communication between the PFC
and the basal ganglia regarding the correct action selection –
is now the time for Debussy or <em>Piano Man</em>?</p>
<p>If the piece is well-rehearsed,
then the SMA will have sequences already prepared,
while an improvisation
would require the cooperation of the frontal areas
and the PMA.
Playing along with a teacher would recruit
the mirror neurons of the PMA,
as would the highly imaginative act of an air guitar solo.</p>
<p>For a complex movement like a sonata,
the feedback of the cerebellum will be critical –
especially if the piano is a novel one,
with differently weighted or spaced keys.</p>
Sat, 09 Apr 2016 00:00:00 +0000
http://charlesfrye.github.io/FoundationalNeuroscience//12/
http://charlesfrye.github.io/FoundationalNeuroscience//12/control-of-movementHow is the synapse at the vertebrate neuromuscular junction specified?<h2 id="answer">Answer</h2>
<p>The neuromuscular junction, or "NMJ",
a sort of "synapse" between a neuron and a muscle cell,
is specified by an activity-dependent process.
This process involves the recruitment of
neurotransmitter receptors,
mitochondria,
and a Schwann cell
to wrap the whole thing up tight.
It is organized by the basal lamina first,
and then mechanisms in both the muscle
and the nerve take over.</p>
<h2 id="the-neuromuscular-junction-anatomy">The Neuromuscular Junction: Anatomy</h2>
<p>Below, you'll see an electron microscope image
of a neuromuscular junction,
taken from Figure 55-7 in Kandel and Schwartz,
with added color.</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/nmjEM.jpg" alt="nmjEM" /></p>
<p>The neuro- part of the junction is on the top-right,
circled in magenta.
In green, I've highlighted the mitochondria,
the "powerhouses of the cell"
that provide the energy necessary for the cell to work.
The smaller circles are vesicles,
which identify this structure
<a href="/FoundationalNeuroscience//26">as a synapse</a>.</p>
<p>The muscular component is highlighted in orange.
The gray divots are called "junctional folds",
and they are the primary place where the magic of
neurotransmission occurs.
They are enriched with
acetylcholine receptors,
or
<a href="/FoundationalNeuroscience//19">proteins that open an ion channel</a>
in response to the presence of the neurotransmitter
acetylcholine.</p>
<p>Wrapped around the neuron, covering all of the exposed
surface of the synaptic terminal,
is a Schwann cell,
the peripheral
<a href="/FoundationalNeuroscience//68">glial cell</a>
responsible for myelination.</p>
<h2 id="construction">Construction</h2>
<h3 id="initial-setup">Initial Setup</h3>
<p>The cell body of the neuron is located in the
<a href="/FoundationalNeuroscience//71">ventral horn of the spinal cord</a>.
The axon arrives at the muscle cell by following a series of
<a href="/FoundationalNeuroscience//34">guidance cues</a>,
or local signals that cooperate with the cell's genetics
to guide growth in the right direction –
somewhat analogous to the way IPs and URLs work.
At each point, a local signal directs the axon
to a new point, closer to its destination,
where it receives a new local signal, pointing
it to the next step on its journey.</p>
<p>The final destination, instead of being a picture of a cat
stored on Google's servers
or a blog about neuroscience stored at GitHub,
is a slightly concentrated group of
acetylcholine receptors on a muscle fiber.
These receptors are concentrated in this location
due to the presence of a special form of a protein called
<em>laminin</em>.
Laminin makes up the entirety of the <em>basal lamina</em>,
a stretchy network of proteins that sheaths the muscle fibers.
At particular locations, the laminin
is slightly different,
and at these locations,
acetylcholine receptors concentrate.
In addition, laminins cause the release of a signal
that encourages axons to grow toward them.</p>
<p>The result is that the axon growth cone
docks with the muscle fiber at that location,
resulting in an incomplete but functional
neuromuscular junction.</p>
<p>Now, multiple signals pass
in both directions,
from nerve to muscle and muscle to nerve.</p>
<p>As described in
<a href="/FoundationalNeuroscience//36">another blog post</a>,
the muscle produces something called a
"trophic factor",
which deactivates a ticking time bomb inside the neuron
that, left unchecked,
would eventually lead to programmed cell death –
apoptosis.
The survival of the neuron is, of course,
critical for NMJ function.</p>
<p>The nerve produces multiple signals
to organize the production of a working NMJ.</p>
<h3 id="signals-from-the-nerve">Signals from the Nerve</h3>
<p>Before we continue,
one important point about muscle cells needs to be made.
Unlike almost all other animal cells,
they are <em>multinucleate</em>:
they contain more than one nucleus,
and thus more than one copy of the DNA.
This is because muscle fibers form from
the fusion of multiple progenitor cells,
called <em>satellite cells</em>.</p>
<p>This means that genes can be made into proteins
at many locations in the muscle fiber,
not just at a single nucleus.</p>
<p>This poses a problem for the neuron.
It has claimed this muscle for its lifelong partner,
and part of doing so involves controlling
the production of proteins,
including the aforementioned trophic factor
and acetylcholine receptors.
But some of the nuclei will be far away from the neuron,
as far as a centimeter, or over a million times further away
than the axon is wide,
and could find another axon terminal to support.
This simply will not do.</p>
<p>To fix this, the nascent pre-synaptic terminal
releases lots of acetylcholine onto the muscle cell.
This results in an
<a href="/FoundationalNeuroscience//22">electrical signal</a>
that is very similar to the classical
<a href="/FoundationalNeuroscience//23">action potential</a>
that propagates throughout the entire fiber.
Active muscle fibers produce fewer acetylcholine receptors
than do inactive muscle fibers,
and so production decreases all across the cell.</p>
<p>At the same time, the neuron is releasing a chemical,
agrin, that activates a variety of
<a href="/FoundationalNeuroscience//19">second messenger cascades</a>
to both increase the production of acetylcholine receptors
and to draw them towards the site of the synapse.</p>
<p>These activating signals are <em>chemically</em> mediated,
rather than <em>electrically</em> mediated,
as the repressive signals are,
and so they don't travel as far.
The result is that,
in the area right by the synapse,
acetylcholine receptor production is <em>increased</em>,
while far away,
production is <em>decreased</em>,
preventing other synapses from forming.</p>
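<p>As a toy illustration of how a global electrical signal and a local
chemical one can combine this way (every number below is invented,
and real receptor regulation is far messier than this):</p>

```python
import numpy as np

# Position along the muscle fiber, with the synapse at x = 0.
x = np.linspace(0.0, 10.0, 201)   # in millimeters, say

baseline = 1.0                     # receptor production, arbitrary units
electrical_suppression = 0.7       # global: the same everywhere on the fiber
agrin_boost = 5.0 * np.exp(-x / 0.5)  # chemical: decays within ~0.5 mm

production = baseline * (1 - electrical_suppression) + agrin_boost
# Near the synapse, production ends up well above baseline;
# far away, it falls below baseline, discouraging other synapses.
```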
<p>Around this time,
a Schwann cell finds its way into the picture
and attaches to the axon terminal.
This also induces the formation of the
junctional folds,
which become heavily enriched with
acetylcholine receptors.</p>
<p>And with that, all of the pieces of the mature
neuromuscular junction are in place!</p>
Fri, 08 Apr 2016 00:00:00 +0000
http://charlesfrye.github.io/FoundationalNeuroscience//35/
http://charlesfrye.github.io/FoundationalNeuroscience//35/developmentGive an example of a wiring process mediated by an activity-dependent competitive mechanism.<h2 id="answer">Answer</h2>
<p>Activity-dependent competitive mechanisms
abound in the nervous system.
Of special interest is the development
of the connection between motor neurons
and the skeletal muscles,
which relies on an activity-dependent competition
with particularly high stakes:
life or death for the neurons!</p>
<h2 id="two-cells-enter-one-cell-leaves">Two Cells Enter, One Cell Leaves</h2>
<p>In addition to constructing a miniature
model of the world inside your skull
for you to inhabit,
the brain is also tasked with generating
sequences of actions in the real world.</p>
<p>The brain's primary effectors are the skeletal muscles,
more commonly known as the muscles.
When you feel the desire to take a step forward,
reach for an object,
or scratch an itch, the
<a href="/FoundationalNeuroscience//12">motor cortex</a>
must determine how to tug on these big bundles of springs
in order to swing the bones to which they are attached
in precisely the correct fashion to produce
the desired movement.</p>
<p>These commands rely on a well-made interface
between the nervous system and the muscles.
Each muscle fiber needs to be matched to
exactly one neuron,
and all of the motor neurons need
to be matched to at least one muscle fiber.</p>
<p>To complicate matters further,
the neurons in question are born inside the spinal cord,
induced to adopt their fate by
<a href="/FoundationalNeuroscience//32">a variety of morphogens</a>,
while the muscle cells are born far away,
in a totally different germ layer,
and, in one final twist of complexity,
assemble themselves,
<a href="https://www.youtube.com/watch?v=tZZv5Z2Iz_s">Voltron-style</a>
into a single, more powerful muscle fiber.</p>
<p>So how are we to ensure that our motor neurons
and our muscle fibers are well matched?</p>
<p>One
<a href="https://www.gutenberg.org/files/1080/1080-h/1080-h.htm">modest proposal</a>
is to generate far more neurons than you need,
and any that don't manage to find a motor neuron
can just be killed.
In order to ensure that this <em>diktat</em> is followed,
nature adopts a strategy straight out of
<a href="https://en.wikipedia.org/wiki/Saw_II"><em>Saw II</em></a>:
motor neurons are, from the moment they are born,
searching frantically for the antidote
to a poison that will kill them when a timer runs out.
They are, like Biggie Smalls,
<a href="https://upload.wikimedia.org/wikipedia/en/9/97/Ready_To_Die.jpg">born ready to die</a>.
The antidote is released by muscle fibers,
but it is only released in small quantities
and to synaptically-connected neurons
that drive activity in the muscle.</p>
<p>So, the motor neurons rush out from the
<a href="/FoundationalNeuroscience//71">ventral horn of the spinal cord</a>,
making a mad dash for the nearest muscle fiber,
guided by the various
<a href="/FoundationalNeuroscience//34">axon guidance factors</a>.
Some cells find a partner and begin to form synapses,
<a href="/FoundationalNeuroscience//35">also called neuromuscular junctions</a>,
but others are not so lucky.
These unlucky cells are drawn to the spurts of antidote
that diffuse away from these immature synapses
in a desperate attempt to survive,
and then they are locked in a duel to the death
with the original tenant –
whoever can make a stronger synapse faster will
choke the other one out.</p>
<p>Only about half of the motor neurons born in the ventral horn
will survive to become functional.</p>
<p>Brutal.</p>
Thu, 07 Apr 2016 00:00:00 +0000
http://charlesfrye.github.io/FoundationalNeuroscience//36/
http://charlesfrye.github.io/FoundationalNeuroscience//36/developmentWhich mechanisms cause offspring from the same precursor cell to take on different cell fates?<h2 id="answer">Answer</h2>
<p>In this post, I focus on two specific mechanisms
representing two classes:
regulation of the precursor cell
and regulation of the offspring cells.</p>
<p>As an example of regulation of the precursor cell,
we consider the cell division of the radial
<a href="/FoundationalNeuroscience//68">glia</a>,
which gives rise to the cortical excitatory pyramidal neurons.
As an example of regulation of the offspring cells,
we consider the organization of these same cells into
the six neocortical layers.</p>
<h2 id="the-big-picture">The Big Picture</h2>
<p>In some sense, a correct answer to this question
would be
"every mechanism in development".</p>
<p>This is because developmental neurobiology is the study of
<a href="/FoundationalNeuroscience//32">how one cell, the zygote, gives rise to a whole organism</a>.
In a very real sense, the humble zygote is the precursor cell
to all the cells in your body.</p>
<p>Of course, this is about as informative as saying that
bacteria-like organisms are the ancestors
of all life on earth.
Just as your great-great-…-grandmother was a monkey,
the zygote is the great-great-…-grandcell of
your kidney cells, your muscle cells, etc.</p>
<p>This question is really asking about <em>direct</em> precursors:
the "parents" of a cell.</p>
<p>With that in mind, let's look at two mechanisms
that lead to different cell fates for "siblings",
or offspring of the same cell.</p>
<h2 id="precursor-based-radial-glia">Precursor-Based: Radial Glia</h2>
<p>Besides playing
<a href="/FoundationalNeuroscience//68">a variety of important roles in the developed brain</a>,
including myelination,
<a href="/FoundationalNeuroscience//70">forming the blood-brain barrier</a>,
<a href="/FoundationalNeuroscience//28">neurotransmitter reuptake</a>,
the <a href="/FoundationalNeuroscience//83">neurovascular response</a>,
and
<a href="/FoundationalNeuroscience//68">possibly even computation</a>,
the quote-unquote auxiliary cells
known as "glia"
are also critical for the development of
the cortical pyramidal neurons
who, ungratefully, steal their spotlight.
Life's not fair.</p>
<p>The radial glia are the first cells to appear
in the developing cortex.
They appear at a time when the brain looks a bit like
a tube of cells filled with liquid –
the cerebrospinal fluid.
Their cell bodies are close to the fluid side,
but they extend long, thin processes out
towards the outer edge of the tube,
much like the spines of a Chinese fan.</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/chinesefan.jpg" alt="chinesefan" /></p>
<p>These cells are continually dividing,
sometimes dividing into two radial glial cells,
sometimes dividing into one radial glial cell and one neuron.
In the latter case, the neuron then climbs up the
long spine of the radial glial cell,
off on its merry way to become a cortical neuron,
as described in the second half of this blog post.</p>
<p>The resulting radial glial cell then serves as a precursor
to more cells. Sometimes the next division is also
asymmetric, i.e. resulting in a neuron and a precursor,
and sometimes it is symmetric, i.e. resulting in two precursors.</p>
<p>The picture below, adapted from Figure 53-2 in the 5th edition
of Kandel and Schwartz,
shows both of these processes, symmetric on the left,
asymmetric on the right.</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/radialglia.jpg" alt="radialglia" /></p>
<p>So what gives?
How do the offspring of a precursor "decide"
whether to be neurons or precursors?</p>
<p>The key is, surprisingly,
<em>the orientation of the plane of cell division</em>!
When the cells undergo mitosis,
they split in half, with the dividing line
determined randomly at each division.</p>
<p>Sometimes that line is parallel to the glial process,
and sometimes the line is perpendicular to it.</p>
<p>Now, the plane of cell division is random for every
dividing cell in the body, but it usually doesn't
have any effect on the offspring cells.</p>
<p>But in this case, it does, thanks to a protein
called Notch that collects at the bottom of the cell,
like a sediment.</p>
<p>The processes of symmetric and asymmetric division appear
below, with the parent cell on the left and the offspring
on the right.</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/celldiv.png" alt="celldiv" /></p>
<p>In blue, we see the DNA of the cell, busy replicating itself
and then getting dragged to one end of the cell or the other
by the <em>mitotic spindle</em>.
I've drawn the plane of division in black.
In pink, we see the aggregated Notch protein.</p>
<p>In the division event on top of the image,
the plane of division goes <em>parallel</em> to the glial process
(not pictured),
perpendicular to the surface on which the cell rests.
This results in two cells with <em>equal concentrations of Notch</em>.
Both cells go on to become precursors.</p>
<p>In the division event on the bottom of the image,
the plane of division goes <em>perpendicular</em> to the glial process,
therefore parallel to the surface on which the cell rests.
This results in two cells with <em>unequal concentrations of Notch</em>.
The cell with more Notch becomes a precursor,
while the cell with less Notch becomes a neuron.</p>
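<p>The logic of this fate decision is simple enough to sketch in a few
lines of Python (the split fractions and the threshold are invented;
only the logic follows the description above):</p>

```python
def divide(total_notch, plane):
    """Split a precursor's Notch between two daughters and assign fates.

    A division plane parallel to the glial process splits the Notch
    'sediment' evenly; a perpendicular plane gives most to one daughter.
    """
    if plane == "parallel":
        shares = (0.5 * total_notch, 0.5 * total_notch)
    else:  # "perpendicular"
        shares = (0.9 * total_notch, 0.1 * total_notch)
    threshold = 0.25 * total_notch  # hypothetical fate threshold
    return tuple("precursor" if s >= threshold else "neuron"
                 for s in shares)
```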
<p>This mechanism for generating different offspring
from a single precursor relies on a mechanism
inside the precursor cell itself.</p>
<p>More commonly, external signals acting
on the offspring determine the cell fate.
We turn to this case now.</p>
<h2 id="offspring-based-cortical-layers">Offspring-Based: Cortical Layers</h2>
<p>Examples of offspring-based mechanisms abound.
One could coherently argue that, for example,
<a href="/FoundationalNeuroscience//32">differentiation in the spinal cord</a>
or even
<a href="/FoundationalNeuroscience//34">axon guidance</a>
serve as examples of this phenomenon.</p>
<p>For the sake of covering additional material,
I'll be prosaic and look at a more classical
example of environmental influence on cell fate:
the differentiation of the cortical layers.</p>
<p>To start with:
the mammalian neocortex is composed of six layers,
creatively named layers I, II, III, IV, V, and VI.
These layers are labeled from the outside in.
Each layer is functionally distinct:
input from the senses arrives in IV,
for example,
and output to other cortical regions comes from II/III.
They are also anatomically distinct,
as you can see from the diagram below,
which shows what the six layers look like
under three different stains,
which show, respectively,
a few cells in their entirety,
most cell bodies (aka grey matter),
and most myelinated processes (aka white matter).</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/layers.jpg" alt="layers" /></p>
<p style="text-align: center">From <a href="http://what-when-how.com/neuroscience/the-thalamus-and-cerebral-cortex-integrative-systems-part-1/">what-when-how</a>.</p>
<p>All of these neurons arise from genetically identical
precursors, the radial glial cells of the first half of this post,
and yet they end up in different places and with different shapes.
We ask again: what gives?</p>
<p>In this system, nurture wins out over nature:
the environment determines cell fate, rather than
the circumstances of birth, as in the division of radial glia.</p>
<p>That is, the biochemical environment into which
the newborn neurons travel as they clamber up the radial glia
changes over time, leading these identical cells to
experience different environments.</p>
<p>This temporal mechanism for determining cell layer identity
leads to a temporal progression in the development of the layers:
layer VI forms first, then layer V, and so on,
up to layer II.
Layer I, as can be gleaned from the above diagrams,
is a special case, since it contains very few cell bodies.</p>
Wed, 06 Apr 2016 00:00:00 +0000
http://charlesfrye.github.io/FoundationalNeuroscience//33/
http://charlesfrye.github.io/FoundationalNeuroscience//33/developmentHow do axon guidance molecules direct axons to their targets?<h2 id="answer">Answer</h2>
<p>There are four main classes of axon guidance molecules:
molecules can act by contact or at a distance,
and they can be attractive or repulsive.</p>
<p>In all cases, they modify the balance between
growth and collapse at the tip of the axon,
the "growth cone",
resulting in the axon moving towards or away
from the source of the signal.</p>
<h2 id="the-axon-guidance-problem">The Axon Guidance Problem</h2>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/start.png" alt="start" /></p>
<p>The image above shows a recently-born neuron
(dark blue, bottom left).
The purpose of the neuron is to connect its inputs
(not shown)
to an output neuron
(dark green, center right).
In order to do so,
it must extend a long, thin cellular process,
known as an axon,
across an expanse of space
that is about 20 times its size
in order to make a physical connection with
the output neuron,
also known as a
<a href="../26">chemical synapse</a>.</p>
<p>In some cases, this distance can be even longer:
the neurons that control your toes
extend their axons almost a meter,
even though their cell bodies are only
about 25 microns across.
For comparison, this is like connecting
two apples on the opposite ends of Manhattan
with a catheter.
Without using Google Maps!</p>
<p>In order to understand how this process works,
we first need to understand how an axon grows at all.
Then, we'll talk about how signalling molecules can
modify the growth process, allowing the axon to sprout
out of one neuron, make a variety of twists and turns,
and then finally attach to its target.</p>
<h2 id="growth-cones">Growth Cones</h2>
<p>The key structure for axon guidance is the axonal
<em>growth cone</em>,
a complex, dynamic structure that somewhat resembles The Blob
(pictured below, click for video).</p>
<p style="text-align: center"><a href="https://www.youtube.com/watch?v=TdUsyXQ8Wrs" title="It eats you alive!"><img src="http://img.youtube.com/vi/TdUsyXQ8Wrs/0.jpg" alt="blob" /></a></p>
<p>Growth cones arise at the leading edge of the axon.
Rather than having a relatively stable skeleton
made out of microtubules,
as the rest of the axon and the cell as a whole do,
the growth cone has an actin skeleton,
which, like
<a href="https://en.wikipedia.org/wiki/Red_Queen%27s_race">Lewis Carroll's Red Queen</a>
or
<a href="https://media.giphy.com/media/9FtD8pr41pYkM/giphy.gif">the Pink Panther's column</a>,
is caught in a tug-of-war between two opposite processes.
As the skeleton grows at one end,
it simultaneously is broken down at the other,
so its length does not change.</p>
<p>Its length doesn't change, that is,
if the two processes occur at the same speed.
If one occurs faster than the other,
the growth of the skeleton becomes less like
the myth of Sisyphus
and more like
the story of the frog in the well:
two hops forward, one hop back.</p>
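<p>In pseudo-quantitative terms (all rates below are invented),
the tug-of-war is just a difference of two rates:</p>

```python
def net_growth(length, assembly_rate, disassembly_rate, dt=1.0):
    """Actin assembles at the leading edge and disassembles at the rear."""
    return length + (assembly_rate - disassembly_rate) * dt

# Equal rates: treadmilling, with no net change in length.
# Faster assembly (say, an attractive cue): the finger extends.
# Faster disassembly (a repulsive cue): the finger shrinks or collapses.
```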
<p>Axon guidance occurs when outside factors
influence this growth process,
causing it to increase in speed or decrease in speed.
In particular, the growth cone has, at its tip,
a number of "fingers" that stretch out.
Fingers that encounter factors that increase growth will
get longer, while those that encounter factors that
decrease growth will get shorter or even collapse.</p>
<p>Eventually, the slower process that builds up the
solid, microtubule skeleton will catch up to this portion
of the growth cone, solidifying it in the shape
induced by the incoming signals.
The fingers have continued past this point,
now encountering new signals,
which will shape the next stretch of the axon,
and so on and on.</p>
<p>In motion, the process can be quite mesmerizing:
fingers stretch and collapse,
spreading out and around and over each other
as their growth processes speed up and slow down.
The static result is a precisely-shaped axon.
Check out the videos below to see the process in action!</p>
<p style="text-align: center"><a href="https://youtu.be/3R9SOtcSEuA" title="growth cones in action"><img src="http://img.youtube.com/vi/3R9SOtcSEuA/0.jpg" alt="growthcone" /></a></p>
<p style="text-align: center">Click for videos of growth cones in action!</p>
<h2 id="types-of-guidance-cues">Types of guidance cues</h2>
<p>External signals played an important role
in determining the course of the axon
in the above model.</p>
<p>Let's follow the neuron from the example at the beginning
of the post as it encounters all of these signalling mechanisms.</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/contact-attraction.png" alt="contact-attraction" /></p>
<p>First, it encounters the three pickle-shaped and pickle-colored cells
in the bottom-right corner.</p>
<p>The growth cone makes physical contact with these cells,
and the proteins in the cells' membranes
have the opportunity to interact.</p>
<p>In this case, these proteins encourage the growth cone
to grow more rapidly,
causing it to stay close by and increase in size.</p>
<p>This mode of axon guidance is called
"contact-mediated attraction".
It comes in three flavors, depending on
the substrate.
Classically, contact-mediated attraction
occurs with a lamina, or tightly-bound collection of cells.</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/contact-ecm.png" alt="contact-ecm" /></p>
<p>It can also occur when the neuron encounters the
<em>extra-cellular matrix</em>,
a dense network of elastic proteins,
primarily collagen (pictured in orange).</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/contact-fasciculation.png" alt="contact-fasciculation" /></p>
<p>One of the most prominent cases of contact-mediated attraction
for axon guidance is
<em>fasciculation</em>,
or the formation of a <em>fascicle</em>, or bundle.
In this case, an axon encounters another axon
and "hitches a ride",
following the so-called <em>pioneer axon</em> (purple).
Because axons often travel in bundles
(known to anatomists as <em>nerves</em>),
fasciculation is a very common component of axon guidance.</p>
<p>Fun fact: the Latin word <em>fascis</em>, or bundle, is the root of
our English word <em>fascism</em>,
thanks to <em>Il Duce</em> Mussolini's fascination
(<em>fascination</em> having no etymological relation)
with Roman authority and symbolism,
in particular the
<a href="https://en.wikipedia.org/wiki/Fasces"><em>fasces</em></a>,
or bundle of sticks,
which represented the superior strength of the Roman state
as a corporate entity compared to its constituent components,
i.e. the individual Romans.</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/contact-repulsion.png" alt="contact-repulsion" /></p>
<p>Contact does not always lead to attraction.
Above, our intrepid growth cone has encountered
some cells (yellow) expressing <em>repulsive</em> factors,
causing it to grow in the opposite direction.
This form of guidance is known as
<em>contact-mediated repulsion</em>.</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/distance-repulsion.png" alt="distance-repulsion" /></p>
<p>Not all axon guidance is mediated by direct contact.
Above, the neuron has encountered some cells (red) that secrete
a repulsive factor that diffuses out into the extracellular fluid,
producing a gradient of that factor (also red).</p>
<p>This factor can activate receptors on the surface of the cell,
modifying the internal actin dynamics and resulting in
the axon either travelling towards the source of the signal or away from it.
In the case above, the axon grew away from the source.
In this case, the mode of axon guidance is known as
"chemo-repulsion".</p>
<p>In the image below, the axon encounters an attractive signal (light blue),
and so grows toward the source.
This mode of axon guidance is known as
"chemo-attraction".</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/distance-attraction.png" alt="distance-attraction" /></p>
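<p>Gradient-following of this kind is easy to caricature in code
(the source location, the decay of the attractant, and the step sizes
are all invented for illustration):</p>

```python
import numpy as np

SOURCE = np.array([5.0, 5.0])  # the secreting cells

def concentration(pos):
    """Attractant falls off with distance from the source."""
    return np.exp(-np.linalg.norm(pos - SOURCE))

pos = np.array([0.0, 0.0])     # the growth cone's starting point
eps, step = 0.1, 0.2           # finger spread and growth per iteration
for _ in range(100):
    # The "fingers" sample the attractant on either side of the cone:
    grad = np.array([
        concentration(pos + [eps, 0.0]) - concentration(pos - [eps, 0.0]),
        concentration(pos + [0.0, eps]) - concentration(pos - [0.0, eps]),
    ])
    # Step toward the side where the fingers grow fastest.
    pos = pos + step * grad / (np.linalg.norm(grad) + 1e-12)
# The cone ends up hovering near the source.
```

<p>Flipping the sign of the step turns the same machinery into
chemo-repulsion, which is just what switching the receptor proteins
expressed in the growth cone can do.</p>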
<p>The result of the combination of all of these local signals
is that the neuron has produced an axon with
a striking global property:
it travels on the scale of about half a millimeter
to make a synapse with a specific target neuron.</p>
<h2 id="coda">Coda</h2>
<p>Note that the pioneer neuron (purple)
travels along a different path than our neuron,
even though it is exposed to many of the same signals.</p>
<p>This is a critical feature of axon guidance:
the same signal can have opposite effects,
depending on what proteins are expressed in the growth cone
of that neuron at that time.</p>
<p>If you're interested in how neurons decide which
proteins to express, check out
<a href="../33">the post on the factors that turn neural precursors into neurons</a>.
If you're interested in how cells decide to become neurons in the first place,
check out
<a href="../32">the post on morphogenesis and the spinal cord</a>.</p>
Thu, 31 Mar 2016 00:00:00 +0000
http://charlesfrye.github.io/FoundationalNeuroscience//34/
What is information theory? What does entropy measure? Mutual information? How are these quantities relevant to questions of neural coding?<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<p>View an updated version of this blog post at
<a href="https://charlesfrye.github.io/stats/2016/03/29/info-theory-surprise-entropy.html">this link</a>.</p>
<h2 id="answer">Answer</h2>
<p>Information theory provides a set of mathematical ideas and tools for
describing uncertainty about the state of a random variable
that are complementary to standard methods from probability theory.</p>
<p>Information theory is more useful than standard probability
in the cases of telecommunications and model comparison,
which just so happen to be major functions of the nervous system!</p>
<p>Among the tools of information theory we find <em>entropy</em>
and <em>mutual information</em>.
In short, the entropy of a random variable is an average measure of
the difficulty in knowing the state of that variable.
The mutual information, on the other hand, tells us how much
more we know about the state of two random variables when we think about them
together instead of considering them separately.</p>
<p>Below, I take a slightly unorthodox approach to the derivation of these quantities,
emphasizing the role of modeling and subjectivity in measures of information.</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/jamesMaxwell.png" alt="jamesMaxwell" /></p>
<h2 id="a-problem">A Problem</h2>
<p>In the world of Classical logic as laid out by Aristotle,
the comparison of competing claims is simple.
If two claims are well-formed,
their truth values can be ascertained.
A ball is either red, or it is not.
When thrown, the ball will either go through the hoop,
or it will not.
If I claim that when I throw the ball, it will go through the hoop,
and you (probably correctly) disagree,
I can simply throw the ball to find out who was correct.</p>
<p>There are many claims we wish to make, however,
that cannot be admitted to the pristine halls of logical deduction.
As David Hume famously noted, inductive reasoning –
the sun came up yesterday, and so it shall come up tomorrow –
is exactly such a claim,
posing serious difficulties for philosophers of science to chew over.</p>
<p>Far from being simply pathological cases,
these claims arise quite frequently:
for example, despite my poor performance in the past,
there was no real reason to believe that it was
absolutely, positively false that the ball I threw
would go through the hoop –
certainly it was not as false as the claim that
2+2=5, or the claim that true is false.</p>
<p>Claims about things that are neither definitely true nor definitely false
arise in matters mundane and consequential:
producing weather reports, catching the bus, predicting the outcomes
of elections, interpreting experimental results, and
betting on sports games, to name just a few.</p>
<p>So we would benefit from a method of comparing claims in these situations –
which atmospheric model produces better predictions?
Is
<a href="https://fivethirtyeight.com">Nate Silver's <em>FiveThirtyEight</em></a>
a better resource for predicting American presidential elections than
<a href="https://www.realclearpolitics.com">the RealClearPolitics aggregator</a>,
or talk radio?
Are these my lucky dice, or am I just imagining it?
Which theoretical model better explains a collection of experimental data points?</p>
<h2 id="surprise">Surprise!</h2>
<p>The goal, in all the cases above,
is to guess about something that we don't or can't know directly,
like the future, or the fundamental structure of the universe,
on the basis of things we do know,
like the present and the past, or the results of an experiment.</p>
<p>But we do not seek precise knowledge –
in many of these problems, we recognize that exact knowledge,
of the kind presupposed and manipulated in Classical logic,
is either too difficult or just plain impossible to acquire.</p>
<p>Instead, we seek in these cases to avoid <em>surprise</em>.
If I have the correct weather forecast, I am not surprised
when it rains at 2 pm – I may have even brought my umbrella.
Starlight bending as it passes the Sun is quite a surprising observation,
<a href="https://en.wikipedia.org/wiki/Gravitational_lens">unless you are Einstein</a>.
In contrast, God is never surprised, whereas the unthinkable occurring –
2+2 turning out to be 5, or a particle having negative mass –
is infinitely surprising<a href="/FoundationalNeuroscience//82#dagger"><sup>†</sup></a>.</p>
<p>Our mortal inferences, clever or dumb as they are,
must have a surprise somewhere between "totally expected", or <script type="math/tex">0</script>,
and "totally surprising", or <script type="math/tex">\infty</script>.</p>
<p>We will generally be making statements like:
"it will probably rain tomorrow",
or "nine times out of ten, the team with a better defense wins".
This motivates us to express our surprise in terms of probability.
If you want a refresher on the foundations of probability,
<a href="/FoundationalNeuroscience//11">check out this blog post on Bayes' Rule</a>.</p>
<p>A sort of Occam's Razor principle will be in effect:
if we have two competing models for a phenomenon,
like presidential elections,
and one is repeatedly declaring the results to be
"very surprising" or "probably due to chance",
while the other explains and expects the results,
we consider the latter to be more correct.
Another related concept from traditional logic would be
<a href="http://rationalwiki.org/wiki/Special_pleading">"special pleading"</a>.
The rules of probability, unlike the laws of logic,
will allow for the possibility that a freak fluctuation
explains the outcomes – that the unlikely occurs –
but our method will punish models for relying too heavily
on this form of argument.</p>
<p>Put another way, you can imagine that we are playing a prediction game.
All players will be given "surprise tokens",
which they allocate out to different outcomes.
Any time that outcome occurs,
the player takes on that many points –
this should remind you of betting<a href="/FoundationalNeuroscience//82#asterisk">*</a>.
The goal, as in golf, is to attain the lowest score –
a sort of principle of least surprise.</p>
<p>Imagine my friend and I disagree about the shooting performance
of the NBA's long-distance bucket-wizard
<a href="http://fivethirtyeight.com/features/stephen-curry-is-the-revolution/">Steph Curry</a>.
Having watched a few Golden State Warriors games, I have come to the conclusion
that Curry dominates equally from all positions.
My friend disagrees – he thinks that Curry's shot percentage is lower from
the left side of the court than it is in the center or on the right.</p>
<p>If Curry makes a shot from the left, I feel like I've won the argument,
and if Curry misses, my friend feels the same way.
In the world of Aristotle, this causes no trouble:
any statement that is part of the world of discourse is either true or false,
and we can't have Curry both making and missing shots from the left.
The real world is not so simple: if Curry takes more than one shot,
he's liable to miss some percentage and make some percentage,
leaving me and my friend both convinced we are correct.</p>
<p>We resolve this dispute by playing the "surprise game" described above.
We each receive a number of surprise tokens.
I dole out an equal number to the left, center, and right positions.
My friend puts more on the left than in the other two positions,
trying to match how much more surprising he finds a successful shot from the left.</p>
<p>We then watch Curry play for several games,
taking on surprise tokens every time he makes a shot.
After a good number of games have passed, my friend and I compare our tokens.
Whoever has fewer tokens – whoever was <em>less surprised</em> –
is declared the winner.</p>
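To make the game concrete, here is a small Python sketch of one way to play it. Nothing here comes from the original post: the shot percentages and both models' numbers are invented purely for illustration.

```python
import math
import random

def surprise(p):
    """Surprise of an event with probability p, in bits."""
    return math.log2(1 / p)

random.seed(0)

# Hypothetical numbers: suppose Curry truly makes 40% of his shots from
# the left and 45% from the center and the right.
true_make_prob = {"left": 0.40, "center": 0.45, "right": 0.45}

# My model says position doesn't matter; my friend's says the left is harder.
my_model     = {"left": 0.45, "center": 0.45, "right": 0.45}
friend_model = {"left": 0.35, "center": 0.45, "right": 0.45}

def total_surprise(model, shots):
    """Sum of the surprise tokens a model takes on over (position, made) shots."""
    total = 0.0
    for position, made in shots:
        p_make = model[position]
        total += surprise(p_make if made else 1 - p_make)
    return total

# Simulate a few hundred shots, spread evenly across positions.
positions = ["left", "center", "right"] * 100
shots = [(pos, random.random() < true_make_prob[pos]) for pos in positions]

print(total_surprise(my_model, shots), total_surprise(friend_model, shots))
# whoever accumulates fewer bits of surprise wins the argument
```

The `surprise` function here anticipates the formula derived in the next section; the game itself only requires that surprise tokens accumulate additively.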
<p>This game setup raises a number of questions:
what is the optimal strategy for winning the surprise game?
How do we allocate our surprise tokens to match our knowledge?
Can this be extended to cases where there's more information available,
like the strength of the defense Curry is facing?</p>
<h2 id="surprise-and-probability">Surprise and Probability</h2>
<p>Before we can go any further, we need to quantify
the relationship between <em>surprise</em> and <em>probability</em>.
Our intuition tells us that
surprise has some sort of inverse relationship with probability:
the victory of a team down by two touchdowns is <em>more surprising</em>
than the victory of a team down by a field goal,
and it is also <em>less probable</em>.</p>
<p>Furthermore, this increase in surprise happens <em>smoothly</em>
as I smoothly decrease probability:
if I thought it <script type="math/tex">50\%</script> likely that my postcard from home
would arrive today, it is only marginally less surprising when it does
than if I had thought it <script type="math/tex">50.00000001\%</script> likely (add <script type="math/tex">0</script>s to taste).</p>
<p>Lastly, we know that an event that is certain to occur has a probability of 1,
and when we want the probability that a pair of events that don't
influence each other's probabilities both occur,
we multiply their probabilities together. Such events are called "independent events".</p>
<p>In particular, we can look at the probability that some event <script type="math/tex">x</script> occurs
AND that something that is certain to happen happens.
This is just equal to the probability of the original event, <script type="math/tex">p(x)</script>:</p>
<center>$$
p(x \ \& \ C) = p(x)*p(C) = p(x)*1 = p(x)
$$</center>
<p>where <script type="math/tex">C</script> is our symbol for an event that is certain to occur –
pick your favorite tautology, axiom, or fact.</p>
<p>We can use similar logic to get a rule for surprise.
We know that an event that is certain has a surprise of zero.
How can we combine <script type="math/tex">0</script> and some number to get that same number back?
We add them!</p>
<center>$$
\text{Surprise}(x \ \& \ C) = S(x) + S(C) = S(x) + 0 = S(x)
$$</center>
<p>where I've introduced a symbol, <script type="math/tex">S</script>, for surprise.</p>
<p>This combination of rules <em>uniquely</em> defines a function
that takes outcomes in and spits out surprises.
That function is</p>
<center>$$
S_p(x) = \log\left(\frac{1}{p(x)}\right)
$$</center>
<p>Awesome! Check for yourself that all of our criteria above are met:
impossible events give infinite surprise, independent events
give additive surprises, etc.
Note that there is no set base for our logarithm –
surprise has no inherent units.
If we choose base 2, common in telecommunications, the units are called "bits",
or "binary digits",
whereas the choice of base <script type="math/tex">e</script>, common in statistical physics,
gives units called "nats", or "natural digits".</p>
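As a sanity check, the criteria can be verified numerically. The snippet below is my addition, not part of the original derivation; it encodes the surprise function and tests each property in turn.

```python
import math

def surprise(p, base=2):
    """Surprise (surprisal) of an event with probability p; bits by default."""
    if p == 0:
        return math.inf  # impossible events are infinitely surprising
    return math.log(1 / p, base)

# A certain event carries zero surprise.
assert surprise(1.0) == 0.0

# Independent events: probabilities multiply, surprises add.
p_x, p_y = 0.5, 0.25
assert math.isclose(surprise(p_x * p_y), surprise(p_x) + surprise(p_y))

# Less probable events are more surprising.
assert surprise(0.1) > surprise(0.9)

# Changing the base only rescales the units: bits = nats / ln(2).
assert math.isclose(surprise(0.25, base=2),
                    surprise(0.25, base=math.e) / math.log(2))
```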
<p>You may object to the parasitic definition of surprise here –
<script type="math/tex">S</script> is defined in terms of <script type="math/tex">p</script>, after all.
This is an artifact of the historical dominance
of the probability approach over the surprise approach.
In fact, one can define surprises before probabilities,
and define the probability as:</p>
<center>$$
p_S(x) = \mathrm{e}^{-S(x)}
$$</center>
<p>In order to keep the material accessible to as wide an audience as possible,
we'll stick with the parasitic definition.
If you'd like to see the major components of the derivation the other way,
check out E.T. Jaynes'
<a href="http://bayes.wustl.edu/etj/prob/book.pdf">Probability Theory: The Logic of Science</a>.</p>
<h2 id="comparing-surprises">Comparing Surprises</h2>
<h3 id="comparing-with-other-models">Comparing with Other Models</h3>
<p>Let's return to our original motivation for introducing surprise:
comparing competing predictions for events.
Every time an event occurs, we can take the probability
that each competing model assigns to that event and compute the surprise.
One might be tempted to say that the model that was less surprised
is the correct one, and move on.</p>
<p>But it is inherent in the kinds of claims we're considering
that one instance isn't enough –
a historic upset in one election isn't, by itself,
cause to throw out all of our intuitions about political science
or our data-driven models for election prediction.</p>
<p>But unlikely things shouldn't <em>usually</em> happen.
So if we wish to compare two models, we simply take a look
at how surprised they are over repeated instances –
repeated experiments, multiple election cycles,
several games of basketball.
This is precisely the "surprise tokens" game described above.</p>
<p>A lower total surprise is a mark of a better model,
and if we want a number that summarizes the surprise
we would expect, on average, from a single event,
we just divide that total surprise by the number of repetitions:</p>
<script type="math/tex; mode=display">\text{Avg.}\ S_Q = \frac{1}{N} \sum_{i=1}^N S_q(x_i)</script>
<p>where the big <script type="math/tex">N</script> gives us the number of repetitions,
the <script type="math/tex">Q</script> refers to some particular model,
and <script type="math/tex">S_q</script> refers to the surprise that the model <script type="math/tex">Q</script> assigns
to each observed event <script type="math/tex">x_i</script>, derived from its probability distribution <script type="math/tex">q</script>.</p>
<p>In the preceding argument, the notion of repetition played a key role.
Unfortunately, repetition is a surprisingly difficult concept to nail down,
much like its partner, probability.
If you're interested in that idea, check out the
<a href="/FoundationalNeuroscience//82#aside"><em>Aside on Repetition</em></a>.</p>
<h3 id="compared-with-the-truth">Compared with the Truth</h3>
<p>Above we took an average over repeated experiments,
implicitly assuming that there is a probability distribution
over results – that is, after all, the whole motivation
for considering probabilistic claims in the first place!</p>
<p>Let's call that distribution <script type="math/tex">p(x)</script>.
This is, in a very real sense, "Nature's probability distribution".
If that idea sounds strange or unbelievable to you,
check out the
<a href="/FoundationalNeuroscience//82#aside"><em>Aside on Repetition</em></a>.</p>
<p>With this idea and its new symbol, we can reformulate
the relationship above as:</p>
<center>$$
\text{Avg.}\ S_Q(P) = \sum_x p(x)*\log\left(\frac{1}{q(x)}\right)
$$</center>
<p>This is the average surprise of the model <script type="math/tex">Q</script> when its
inputs come from the distribution <script type="math/tex">p</script> of a model <script type="math/tex">P</script>.
It is also the limit of the first expression for the average surprise
when <script type="math/tex">N</script> is taken to infinity and
so the distribution of the observed events is
equal to the true distribution <script type="math/tex">p</script>.</p>
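To see that limit in action, here is a quick Python sketch (the distributions are illustrative choices of my own) showing the empirical average surprise converging to the expected value:

```python
import math
import random

random.seed(1)

# A "true" distribution p and a model q over three outcomes (invented values).
p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.4, "b": 0.4, "c": 0.2}

def expected_surprise(p, q):
    """Avg. S_Q(P): expected surprise of model q when outcomes follow p, in bits."""
    return sum(p[x] * math.log2(1 / q[x]) for x in p)

# Empirical average surprise over N draws from p.
N = 200_000
outcomes = random.choices(list(p), weights=list(p.values()), k=N)
empirical = sum(math.log2(1 / q[x]) for x in outcomes) / N

# For large N, the empirical average matches the expectation closely.
assert abs(empirical - expected_surprise(p, q)) < 0.01
```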
<p>Note that this probability distribution
is induced by our experiment and can change –
if I know the exact force I apply to a coin to cause it to flip,
the result of the toss is no longer 50/50,
and so the distribution of results has changed.</p>
<p>This form of the average surprise (you might call it <em>expected surprise</em>,
since we're working with a probability distribution
rather than an empirical estimate, so the averaging operation
is known as "expectation"), has the advantage of providing
exact answers, but the disadvantage of requiring an analytical form
for, aka a model of, <script type="math/tex">p(x)</script>, which can be hard to come by.</p>
<p>But if we do have that form, then we can put <script type="math/tex">P</script> in for <script type="math/tex">Q</script>
in the expression above and get</p>
<center>$$
\text{Avg.}\ S_P(P) = \sum_x p(x)*\log\left(\frac{1}{p(x)}\right)
$$</center>
<p>this is how surprised someone would expect to be,
on average, when they have the correct model for the random variable.</p>
<p>It seems intuitively obvious that this average surprise should be the smallest
surprise possible – one definition of the correct model might be
"the one that is least surprised by the data".
This can, in fact,
<a href="http://worrydream.com/refs/Shannon%20-%20A%20Mathematical%20Theory%20of%20Communication.pdf">be shown</a>.</p>
<p>This implies that, for every incorrect model <script type="math/tex">Q</script>,
we can figure out how much of its surprise comes
from being incorrect and how much comes from the inherent
randomness of the process under consideration.
We can compute that by just subtracting the average surprise
of the correct model from that of our incorrect model,
resulting in a measure we might call "excess surprise":</p>
<center>$$
\begin{align*}
\text{Excess}\ S_Q(P) =& \ \text{Avg.}\ S_Q(P) - \text{Avg.}\ S_P(P) \\
=& \sum_x p(x)*\log\left(\frac{1}{q(x)}\right)
- \sum_x p(x)*\log\left(\frac{1}{p(x)}\right) \\
=& \sum_x p(x)*\left[\log\left(\frac{1}{q(x)}\right)
-\log\left(\frac{1}{p(x)}\right)\right] \\
=& \sum_x p(x)*\log\left(\frac{p(x)}{q(x)}\right)
\end{align*}
$$</center>
<p>All of the algebra up there is purely to get the clean-looking
final expression – none of it changes the content of the statement,
which is that the excess surprise is just the average surprise
of the proposed model minus the average surprise of the correct model.</p>
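The whole derivation fits in a few lines of Python. The sketch below (with made-up weather distributions) computes the excess surprise and checks that it is positive for an incorrect model and exactly zero for the correct one:

```python
import math

def avg_surprise(p, q):
    """Average surprise of model q under true distribution p, in bits."""
    return sum(p[x] * math.log2(1 / q[x]) for x in p if p[x] > 0)

def excess_surprise(p, q):
    """Avg. S_Q(P) - Avg. S_P(P): the surprise that comes from being wrong."""
    return avg_surprise(p, q) - avg_surprise(p, p)

# Invented example: true rain probability vs. a coin-flip weather model.
p = {"rain": 0.7, "sun": 0.3}
q = {"rain": 0.5, "sun": 0.5}

# Excess surprise is zero only for the correct model, positive otherwise.
assert math.isclose(excess_surprise(p, p), 0.0, abs_tol=1e-12)
assert excess_surprise(p, q) > 0
```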
<h2 id="entropy-and-surprise">Entropy and Surprise</h2>
<p>Now is a good time to take a step back from our exercise here
and compare with standard notation and nomenclature.</p>
<p>The average surprise of the model <script type="math/tex">Q</script> isn't the usual
first step for derivations of information theory.
Instead, the average surprise of the <em>correct</em> model is the basic entity.
It is known as "entropy" and its symbol, inherited from physics, is <script type="math/tex">H</script>:</p>
<center>$$
H(P) = - \sum_x p(x)\log\left(p(x)\right)
$$</center>
<p>Note that the rules of <script type="math/tex">\log</script> have been used to turn
<script type="math/tex">\log(1/p)</script> into <script type="math/tex">-\log(p)</script>.
Why this notation is standard is beyond me,
since it emphasizes log-probabilities,
rather than the more central notion we call "surprise"
and the literature calls "surprisal",
when it deigns to call it anything other than
"negative log-probability".</p>
<p>Since information theory was discovered in the context
of telecommunication, and specifically in the context of
encoding, decoding, and handling unreliable communication methods,
the traditional interpretation of entropy is that it corresponds to
the minimum possible average length of an encoded message produced by a source
that selects uncoded messages according to the distribution <script type="math/tex">P</script>.</p>
<p>This feels inherently unsatisfying, to me, as a definition for
such a basic notion in our understanding of knowledge and inference.
But if the communication channel view of entropy and information
seems more sensible to you, check out
<a href="http://worrydream.com/refs/Shannon%20-%20A%20Mathematical%20Theory%20of%20Communication.pdf">Claude Shannon's surprisingly accessible original paper</a>
deriving the fundamentals of information theory from that perspective.</p>
<p>A reductive but potent view, courtesy of a purely abstract, mathematical
approach to probability distributions, interprets the entropy as a measure
of the flatness of a distribution: higher entropy means more flat
(try out the computation for yourself with a coin
that comes up heads and tails with different probabilities).
This makes entropy a sort of "summary" of the distribution,
just like the mean or the variance.</p>
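The parenthetical suggestion is easy to try for yourself. A short sketch (my own, with arbitrary coin biases) confirms that flatter distributions carry higher entropy:

```python
import math

def entropy(probs):
    """H(P) in bits; terms with zero probability contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Entropy of a coin with heads-probability p_heads, for a few biases.
for p_heads in (0.5, 0.9, 0.99, 1.0):
    print(p_heads, entropy([p_heads, 1 - p_heads]))

# The flatter (fairer) the coin, the higher the entropy: a certain outcome
# has zero entropy, and the fair coin attains the maximum of 1 bit.
assert entropy([0.5, 0.5]) == 1.0
assert entropy([1.0, 0.0]) == 0.0
assert entropy([0.9, 0.1]) < entropy([0.5, 0.5])
```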
<p>Contrary to the mathematical and theoretical bent of the above approaches,
our choice of basic entity, the average surprise of the model <script type="math/tex">Q</script>,
was motivated by an empirical bent –
we started with observations <script type="math/tex">x</script>, and models <script type="math/tex">Q</script> claiming to explain them,
rather than with the knowledge of absolute truth implied by <script type="math/tex">p</script>.</p>
<p>This average surprise does have a name in more traditional approaches:
it is the <em>cross-entropy</em> of <script type="math/tex">Q</script> on <script type="math/tex">P</script>.
You might see it written:</p>
<center>$$
H(P,Q) = -\sum_x p(x) \log(q(x))
$$</center>
<p>From the traditional, Shannon perspective, the interpretation of this quantity
is that it is the length of encoded messages using a code optimized for a distribution <script type="math/tex">Q</script>
on messages drawn from a distribution <script type="math/tex">P</script> (can you see why I prefer my approach?).
It is used as a cost function to train parametrized models that attempt
to predict discrete outputs, also known as <em>classification</em> models.</p>
<p>The final quantity that we derived above, the <em>excess surprise</em>,
also appears in both the Shannon approach and in purely mathematical probability.
There, it is known as the <em>Kullback-Leibler divergence</em> or the <em>Kullback-Leibler distance</em>.
The latter name, though at one point more popular, tends to irritate sticklers for accuracy,
since the excess surprise does not satisfy the technical conditions
for the mathematical notion of <em>distance</em>.
In any case, the most common notation looks something like:</p>
<center>$$
D_{KL}(P,Q) = H(P,Q) - H(P)
$$</center>
<p>From a mathematical perspective, it is a <em>premetric</em>
on probability distributions: it is a primitive notion
of distance that captures some basic amount of the structure
of the set of probability distributions,
and lets us define "convergence in distribution".</p>
<p>From the Shannon perspective,
the KL-divergence is "just the extra bits"
when using a sub-optimal coding scheme <script type="math/tex">Q</script>
on messages from <script type="math/tex">P</script> – a relatively direct
connection to our notion of "excess surprise".</p>
<h2 id="surprise-with-multiple-variables">Surprise with multiple variables</h2>
<p>Entirely missing from the foregoing has been the
discussion of what we can do when we have
multiple, possibly related, random variables
measured at the same time or during the same experiment –
what does the weather in San Francisco
tell us about the weather in Berkeley?
Vice versa?
What about the weather in Beijing?
We'd like to say that rain in San Francisco
makes rain in Berkeley less surprising,
while rain in Beijing does no such thing,
but we don't have the mathematical apparatus for that statement yet.</p>
<p>At the center of this idea is the idea of statistical
<em>independence</em>, or the lack of a relationship
between two random variables.
We've encountered this idea already when we defined
the additive property of surprise.</p>
<p>Our belief that rain in Berkeley is as (un)surprising
with the knowledge that there's a storm in Beijing
as it is without that knowledge
is based on just this idea of independence.</p>
<p>If there were a relationship of some kind,
we could make a better prediction of,
and so be less surprised by,
the weather in Berkeley,
provided we knew the weather in Beijing
and the relationship between the two events.</p>
<p>Since there is a relationship, due to geography,
between the weather in Berkeley and the weather in San Francisco,
we should expect to be less surprised by the weather in one
if we know the weather in the other
and the correct relationship between the two events.</p>
<p>This suggests that we should consider an excess surprise,
much like the excess surprise between an incorrect model and a correct model,
but between the best model knowing both pieces of information and <em>assuming there is no relationship</em>,
and the best model with the same information but <em>assuming the correct relationship</em>.</p>
<p>This "correct relationship" is also known as a "joint probability distribution".
Note that, if the "correct relationship" is that the two variables
have no relationship, and therefore are independent,
the two models are the same and the excess surprise is 0.
We also established above that independent probabilities multiply,
so the best model assuming independence has a probability distribution
that looks like the product of the two individual distributions.
In math, that looks like <script type="math/tex">p(x,y) = p(x)*p(y)</script>.</p>
<p>Let's write this relation out for two variables <script type="math/tex">X</script> and <script type="math/tex">Y</script>,
using <script type="math/tex">P_{x,y}</script> to refer to the model with the true joint probability distribution
and the mathematical symbol <script type="math/tex">\bot</script> to denote the model
that assumes <script type="math/tex">X</script> and <script type="math/tex">Y</script> are independent.</p>
<center>$$
\begin{align*}
\text{Excess}\ S_{\bot}(P_{x,y}) =& \ \text{Avg.}\ S_{\bot}(P_{x,y}) - \text{Avg.}\ S_{P_{x,y}}(P_{x,y}) \\
=& \sum_{x,y} p(x,y)*\log\left(\frac{1}{p(x)*p(y)}\right)
- \sum_{x,y} p(x,y)*\log\left(\frac{1}{p(x,y)}\right) \\
=& \sum_{x,y} p(x,y)*\log\left(\frac{p(x,y)}{p(x)*p(y)}\right)
\end{align*}
$$</center>
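This sum is straightforward to compute for a small discrete table. The sketch below (the weather numbers are invented for illustration) implements it and checks that the excess surprise vanishes exactly when the joint distribution is the product of its marginals:

```python
import math

def mutual_information(joint):
    """Excess surprise of the independence model, in bits,
    from a joint distribution given as {(x, y): probability}."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p  # marginal over the first variable
        p_y[y] = p_y.get(y, 0.0) + p  # marginal over the second
    return sum(p * math.log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in joint.items() if p > 0)

# Hypothetical weather table: hot days in Berkeley tend to be hot days in SF.
dependent = {("hot", "hot"): 0.25, ("hot", "mild"): 0.05,
             ("mild", "hot"): 0.05, ("mild", "mild"): 0.65}

# An independent joint with the same marginals is just their outer product.
independent = {("hot", "hot"): 0.09, ("hot", "mild"): 0.21,
               ("mild", "hot"): 0.21, ("mild", "mild"): 0.49}

assert mutual_information(dependent) > 0
assert math.isclose(mutual_information(independent), 0.0, abs_tol=1e-12)
```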
<p>OK, so what does this look like?
Let's imagine that the probability distributions for the
weather in Berkeley and San Francisco look like this:</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/marginal_berkeley.png" alt="marginal_berkeley" /> <img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/marginal_SF.png" alt="marginal_SF" /></p>
<p>In general, they are temperate cities,
but occasionally, things get a bit hot.
And what do the joint probabilities look like?</p>
<p>First, how do we visualize joint probabilities?
The joint probability distribution assigns a value to each
point in the <script type="math/tex">xy</script> plane.
Similarly, an elevation map assigns a value, the height,
to each point in the <script type="math/tex">xy</script> plane,
where <script type="math/tex">x</script> is east-west and <script type="math/tex">y</script> is north-south.
Based on this analogy, we steal a tool from cartography:
the <em>contour plot</em>.
In a contour plot, each colored line
traces out a collection of values that have
the same elevation, or probability in our case.
Darker lines correspond to lower elevations/probabilities.</p>
<p>If we assume that the weather in Berkeley
is unrelated to the weather in San Francisco
(that is, if we assume independence),
then, when we make a contour plot,
we get something that looks like the distribution on the left.</p>
<p>If we were to look at this "landscape" from either side,
we'd see a bunch of copies of the single-variable distribution,
each one scaled to a different height by the probability that
the other variable takes that value.
This comes from the fact that <script type="math/tex">p(x,y) = p(x)*p(y)</script>:
the joint distribution at any value <script type="math/tex">y</script> is just the distribution
<script type="math/tex">p(x)</script> with its height changed by <script type="math/tex">p(y)</script>.</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/joint_independent.png" alt="joint_independent" /> <img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/joint_dependent.png" alt="joint_dependent" /></p>
<p>But if we know the true relationship,
which is that hot days in Berkeley are also hot days in SF,
then we get something that looks like the distribution on the right.
Instead of simply having a bunch of scaled versions of <script type="math/tex">p(x)</script>,
we have all sorts of different distributions.
These are the <em>conditional distributions</em>, <script type="math/tex">p(x \lvert y)</script>,
each scaled by <script type="math/tex">p(y)</script>.</p>
<p>This form arises from the equation <script type="math/tex">p(x,y) = p(x \lvert y)*p(y)</script>.
The excess surprise measures how big the difference between
these two distributions is:
a bigger excess surprise means they are more different,
and so there is a stronger relationship between the variables.
Put another way, the higher the excess surprise,
the further the two variables are from being independent.</p>
<p>If you want to know more about conditional distributions,
and how Bayes' Rule is used to manipulate them
and model reasoning under uncertainty,
check out
<a href="/FoundationalNeuroscience//11">this blog post</a>.</p>
<h3 id="surprise-and-information">Surprise and Information</h3>
<p>The quantity described above as "the excess surprise
from using an independent model of a dependent system",
<script type="math/tex">\text{Excess}\ S_{\bot}(P_{x,y})</script>,
is special enough to get its own name.</p>
<p>It is called the "Mutual Information" between two variables,
and it is often denoted <script type="math/tex">I(X;Y)</script>.
Because this value expresses how much one variable can tell you about another,
it serves as a natural measure of statistical dependence –
sort of like correlation, but capable of capturing arbitrarily complex relationships.</p>
<p>It is important to note that, just as correlation does not imply causation,
mutual information does not imply causation.
The mutual information between the weather and my choice of clothing
is the same as the other way around, though only the former causes the latter,
so far as I can tell.</p>
<h2 id="entropy-information-and-neuroscience">Entropy, Information, and Neuroscience</h2>
<p>The traditional argument for connecting Information Theory and neuroscience
goes something like this:
sensory neurons communicate the state of the outside world
to the brain, where different layers and brain areas
communicate the results of their computations to each other.</p>
<p>As demonstrated by Shannon, there are optimal ways to encode information
so that it can be communicated using as little energy as possible and
as quickly as possible.
These encoding schemes confer an evolutionary advantage on organisms
that use them, and so we should expect to find neural codes
that are optimal from the perspective of information theory.</p>
<p>The foregoing presentation of the basics of information theory,
which emphasizes the role of model-building and model comparison,
suggests a deeper connection to neuroscience.
The fundamental job of the nervous system is not to merely communicate
information efficiently, but rather to construct,
from noise-corrupted, ambiguous, and incomplete sensory data,
an internal model of the outside world
that enables superior action selection for the sake of
survival and reproduction.</p>
<p>One should expect, then, that the excess surprise
about the state of the outside world
incurred by a model that uses observations of neural data,
rather than direct observations of the outside world,
should be minimized.</p>
<p>Put another way, if a neuron is representing something
about the state of the outside world,
the mutual information between that neuron's state
and that part of the world's state should be high.
In fact, one can take that as a sort of definition
of what it means for a neuron to "represent"
a piece of information about the outside world,
like the presence or absence of light at a particular location
or the existence of a rewarding stimulus nearby.</p>
<p>There are concerns with this approach:
high levels of mutual information are necessary,
but not sufficient, for a neuron to represent
the outside world.
The concerns about causation described above
come into play here:
a given neuron might be representing a different random variable
that happens to be dependent on the same causes as the recorded variable.
For example, a neuron that represents
<a href="/FoundationalNeuroscience//09">conjunctions of contours</a>
will have a high degree of mutual information with the individual contours.
More concerning still, the neuron might not be engaged in representation at all:
even though the state of the
<a href="/FoundationalNeuroscience//68">glial "helper cells"</a>
of the nervous system contains information about the state of the outside world,
that information appears as a consequence of the role of glia in regulating
gross features (mean activity, etc.) of computation in neurons,
where the "representation" truly occurs.</p>
<h2 id="end-matter">End Matter</h2>
<p><a name="aside"><em>Aside on repetition</em></a></p>
<p>In a very real way, no instance is ever repeated:
no matter how carefully we control our environment,
something will escape our grasp:
the barometric pressure,
the tone in our voice as we ask a survey question,
the arrangement of electrons on Jupiter,
or the Hawking radiation emanating from the
supermassive black hole at the center of the galaxy.</p>
<p>For all of these confounds, we have no <em>a priori</em>
reason to exclude them, and for some,
we know they have a (potentially small) effect –
Newton's law of gravitation admits no boundaries.</p>
<p>So instead of getting a <em>precisely</em> repeated instance,
we get an instance that is the same <em>so far as we care to know</em>.
This leaves nature a little wiggle room for things to be slightly different
at the start of our experiment, and so for the results to be different as well.</p>
<p>Famously, the early modern physicist and mathematician
<a href="https://en.wikipedia.org/wiki/Pierre-Simon_Laplace">Pierre-Simon Laplace</a>
stated that, if one only knew the positions, velocities, and masses
of all the particles in the world, one could both predict the future
state of the universe and know everything about the past.
This is
<a href="https://www.quora.com/The-French-scientist-Pierre-Laplace-suggested-that-if-we-knew-both-the-laws-of-physics-and-the-location-of-every-particle-in-the-universe-we-would-be-able-to-predict-everything-that-would-happen-in-the-future-Is-that-possible">untrue</a>,
but the picture is helpful.</p>
<p>We can imagine, though we cannot hope to write down,
the set of all the possible values of the positions, velocities, and masses
of all the particles of the universe.
An artist's interpretation of the idea appears below:</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/theUniverse.png" alt="theUniverse" /></p>
<p><br /></p>
<p>We can restrict the set by
accepting only those values that are consistent with
some fact that we know about some gross state of the world,
like "I have a coin perched on my thumb".
In the terms of probability, we call this
<a href="/FoundationalNeuroscience//11">"conditioning"</a>.
A sketch of the result of such a conditioning appears below.</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/particles.png" alt="particles" /></p>
<p style="text-align: center">The set of all states of the universe compatible with a statement,
e.g. "I have a coin perched on my thumb".</p>
<p><br /></p>
<p>When I perform my experiment, those values are all definite,
but unknown (ignoring, for simplicity's sake, quantum effects).
We assume that, within the subset in which we've restrained nature,
the distribution of values takes on
<a href="https://en.wikipedia.org/wiki/Maximum_entropy_probability_distribution">the flattest distribution that is compatible with our state of knowledge</a>.</p>
<p>This is more than just an act of modesty.
It is, in fact, a necessity if we are to be intellectually honest.
Any less flat distribution would
imply that we know more about the possible distributions
than we admitted to in our description of our state of knowledge.</p>
<p>As an example of such a distribution: if all we know
about the values is their mean and their variance –
how widely they spread around that mean –
the flattest distribution (also known as the "maximum entropy distribution"
or the "least informative prior")
is the Normal distribution.</p>
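<p>We can check this claim numerically with the standard closed-form
differential entropies of a few common families, each scaled to the same
variance. This illustrates, rather than proves, the maximum entropy
property of the Normal:</p>

```python
import numpy as np

# Closed-form differential entropies (in nats) of three distributions,
# each scaled to have the same variance.
var = 1.0
h_normal = 0.5 * np.log(2 * np.pi * np.e * var)   # N(0, var)
h_laplace = 1 + np.log(2 * np.sqrt(var / 2))      # Laplace with 2b^2 = var
h_uniform = np.log(2 * np.sqrt(3 * var))          # Uniform[-a, a], a^2/3 = var

# among all distributions with this mean and variance,
# the Normal is the "flattest" (highest entropy)
print(h_normal > h_laplace > h_uniform)
```

<p>The ordering holds for any choice of <code>var</code>, since changing the
variance shifts all three entropies by the same additive constant.</p>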
<p>Once we've induced a distribution over possible states of the start of an experiment,
the laws of nature can take over.
These are generally expressed as differential equations,
which we usually think of as descriptions of the motion of particles,
changing their location, velocity, and mass.
They can also move aggregates of particles –
that is, they can be applied to our whole list of positions, velocities, and masses –
and so they can evolve our initial distribution of states into a final distribution of states.</p>
<p>A possible result of this time evolution appears below.
Groups of states have been stretched, shrunk,
translated, and rotated by the action of physical laws.</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/post-flip.png" alt="post-flip" /></p>
<p style="text-align: center">The possible states of the universe after our experiment.</p>
<p><br /></p>
<p>When we, as experimenters, come in and label those states –
"heads" or "tails", "Candidate A won" or "Candidate B won",
"our rocket launched into space" or "our rocket exploded on the launch pad" –
we have the distribution over results of our experiment
that appears as <script type="math/tex">P</script> in the main text.</p>
<p>Three possible labelling schemas for
the states induced by the experiment above appear in the next figure:</p>
<p style="text-align: center"><img src="http://charlesfrye.github.io/FoundationalNeuroscience/img/withPartitions.png" alt="wthPartitions" /></p>
<h3 id="footnotes">Footnotes</h3>
<p><a name="dagger">†</a> If this gives you pause, try this out:
pick some finite value for how surprising "2+2=5" is.
Which is bigger, that value, or the surprise from a coin coming up heads
10 times in a row? 100 times? One billion times?
I can do this all day, and I'll win eventually!
The only solution is to make "2+2=5" <em>infinitely surprising</em>.</p>
<p><a name="asterisk">*</a> In fact, one of the methods for deriving the laws
of probability is based on betting. This method, due to de Finetti, is often called the
<a href="http://plato.stanford.edu/entries/dutch-book/">Dutch Book Argument</a>.
A "Dutch Book" is a betting strategy that guarantees victory against another player –
named after the Dutch due to their reputation as superior financiers,
and evincing more than a little Occupy-type sentiment.
The correct laws of probability are the only ones against which no Dutch Book exists –
a remarkably similar idea to our "surprise tokens" game.
A correctly-made model is one that will always win that game against incorrectly-made models.</p>
<h3 id="acknowledgements">Acknowledgements</h3>
<p>I'd like to thank
<a href="http://csc.ucdavis.edu/~chaos/">Jim Crutchfield</a>,
especially for his excellent course,
<a href="http://csc.ucdavis.edu/~chaos/courses/ncaso/"><em>Natural Computation and Self-Organization</em></a>
(available online)
and
<a href="http://colah.github.io">Chris Olah</a>,
whose blog posts,
<a href="http://colah.github.io/posts/2015-09-Visual-Information/">on information theory</a>
and
<a href="http://colah.gihub.io">otherwise</a>,
were extremely influential.</p>
<p>I'd also like to thank
<a href="http://biophysics.berkeley.edu/students/2015-2/ryan-zarcone/">Ryan Zarcone</a>
for valuable discussions and for comments on a draft.</p>
Tue, 29 Mar 2016 00:00:00 +0000
http://charlesfrye.github.io/FoundationalNeuroscience//82/
http://charlesfrye.github.io/FoundationalNeuroscience//82/tools-and-methods

What is a morphogen? How do morphogens generate different cell fates in the spinal cord?
<h2 id="answer">Answer</h2>
<p>A morphogen is a protein or signaling molecule
that drives the differentiation of primordial tissue
into its final form – it is the source, <em>genesis</em>,
of the shape, <em>morphê</em>.</p>
<p>In the spinal cord,
multiple morphogenetic gradients
conspire to create a broad variety of cell types.</p>
<h2 id="ontogeny-recapitulates-phylogeny">Ontogeny Recapitulates Phylogeny</h2>
<p>In the history of life on Earth,
organisms have increased in complexity,
beginning with single-celled organisms,
which bound together to create multi-cellular organisms,
then organisms with body plans,
and then finally the vertebrates and invertebrates.
If we were to trace our ancestry as humans,
we would pick up the trail at the bony fish,
pass through the earliest beasts of the land,
neither mammal nor reptile,
to our furry, milk-making ancestors,
and then finally to humans.</p>
<p>Whenever a new human being is created
from the fusion of a sperm and egg,
this evolutionary history repeats itself,
sped up billions of times
and acting on the miniature scale of a single fetus.
Though it is not a perfect reconstruction,
the similarities are striking enough
that they gave rise to the polysyllabic slogan
"Ontogeny recapitulates phylogeny" –
the process of an organism coming into being
repeats the process of evolution.</p>
<p>Due to the complexity of this task,
and the centrality of this literal
self-replication
to our understanding of biological principles,
the process of building an organism out
of one cell and a bunch of raw material
has become its own branch of biology:
<em>developmental biology</em>.</p>
<p>The development of the nervous system has drawn
particular attention not only because
the intricacy and complexity
of neural tissue necessitate
that nature use her wiliest tricks and deepest principles,
but also because the creation of the nervous system
touches directly on issues of profound philosophical import:
where does our perception come from? Why can it be trusted?
How are experience and sensory data balanced with
hard-wired, genetically-encoded principles?</p>
<p>Developmental biology has a reputation of blandness,
driven by its detail-orientation and its plethora of acronyms.
In a sense, it is to biology what biology is to the other sciences:
it smacks of
<a href="http://trhvidsten.com/docs/Hvidsten_Docent2012.pdf">"stamp-collecting"</a>.</p>
<p>This reputation is unfair.
The study of developmental biology is driven by
these feelings of wonder and passion:
wonder at the astounding observation that
intricate structures arise from an undifferentiated clump of matter,
and a passionate, pre-scientific commitment to either
nature or nurture as the determinant of human destiny.</p>
<p>So with these grand feelings in mind,
let us dive from the clouds of abstraction
into the weeds of biological detail.</p>
<h2 id="establishing-structure-reaction-diffusion-and-pizza">Establishing Structure: Reaction-Diffusion and Pizza</h2>
<p>The term <em>morphogen</em> dates to 1952,
when it was invented by
<a href="https://en.wikipedia.org/wiki/Alan_Turing">Alan Turing</a>,
father of the computer.</p>
<p>In his essay
<a href="http://www.dna.caltech.edu/courses/cs191/paperscs191/turing.pdf">"The Chemical Basis of Morphogenesis"</a>,
Turing sets out to explain the curious fact that
clumps of cells that start out spherically symmetrical
can break that symmetry and become simply bilaterally symmetric –
the same on the left side as on the right.</p>
<p>Consider: if I cut a cheese pizza in half,
the two halves look the same,
no matter where I choose to make the cut.
If I try the same thing with a T-bone steak,
the two halves won't match.
The cheese pizza has <em>circular symmetry</em>,
while the steak has <em>no symmetry</em>.
If I take a single slice of pizza,
I can't just cut willy-nilly like I could with
the whole pizza,
but if I cut through the middle of the slice,
I get two equal-sized halves.
Pizza slices
are <em>bilaterally</em> symmetric,
like squares, equilateral triangles, and vertebrates.
I used to take advantage of this symmetry to make
"pizza-sandwiches" in high school.</p>
<p>Now, say I want to make a pizza that is half pepperoni
and half cheese.
This is a simple task if you can stand above the pizza –
you simply pick a dividing line and then sprinkle the pepperoni
only on one side.
Note that a half pepperoni and half cheese pizza
is no longer circularly symmetric!</p>
<p>The problem biology faces is similar:
our circularly-symmetric object, the zygote,
has to set a dividing line between, e.g.,
the front of the organism and the back,
which gets rid of, or <em>breaks</em>, the circular symmetry.
In keeping with "ontogeny recapitulates phylogeny",
this symmetry-breaking is first used to define
where to put the anal and oral termini of the gut.</p>
<p>The trouble is, biology has no way of "standing above the pizza".
Instead, the pizza slices must decide, on their own,
whether they are on the pepperoni half or the cheese half,
and produce the relevant topping.
This is obviously a tall order!</p>
<p>In fact, with simple differential equations,
the problem is impossible to solve!
Turing solved it by considering a more complicated
class of differential equations:
the "Reaction-Diffusion" equations.
The importance of his solution is that it arises naturally
from the interactions of relatively simple components.</p>
<p>Each cell is presumed to both secrete diffusible chemicals
and also to use those chemicals to regulate internal reactions,
some of which produce other diffusible chemicals.
These are the chemicals he calls "morphogens",
and the model is known as a "Reaction-Diffusion Model".</p>
<p>The surprising result is that
there need not be any primordial asymmetry in the model
in order to generate asymmetric results.
By analyzing the form of the differential equations
that describe the dynamics of reaction-diffusion models
on a hollow sphere,
Turing discovered that tiny deviations,
like those caused by random jostling and even thermal noise,
are amplified<a href="/FoundationalNeuroscience//32#dagger">†</a>.
The eventual result is the development of an axis
defined by the location of these deviations.</p>
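<p>Turing's argument can be sketched with a linear stability analysis:
linearize a two-species reaction-diffusion system about its uniform steady
state and ask which spatial wavelengths grow. A perturbation proportional to
<script type="math/tex">e^{iqx}</script> grows at the largest eigenvalue of
<script type="math/tex">J - q^2 D</script>, where <script type="math/tex">J</script>
captures the reactions and <script type="math/tex">D</script> the diffusion.
The matrices below are hypothetical example values chosen to satisfy
Turing's conditions, not taken from any real morphogen system:</p>

```python
import numpy as np

# Linearization of a two-species reaction-diffusion system around its
# uniform steady state: du/dt = J @ u + D @ d^2u/dx^2.
# J and D are hypothetical example values, not a real morphogen system.
J = np.array([[1.0, -1.0],
              [3.0, -2.0]])   # activator self-enhances, inhibitor suppresses
D = np.diag([1.0, 10.0])      # the inhibitor diffuses much faster

def growth_rate(q):
    # a perturbation ~ exp(iqx) grows at the largest real part
    # of the eigenvalues of J - q^2 D
    return np.linalg.eigvals(J - q**2 * D).real.max()

qs = np.linspace(0, 2, 400)
rates = np.array([growth_rate(q) for q in qs])

print(growth_rate(0.0) < 0)   # the uniform (q = 0) mode is stable...
print(rates.max() > 0)        # ...but some finite wavelength grows
print(qs[rates.argmax()])     # the pattern's characteristic wavenumber
```

<p>Without diffusion the system relaxes back to uniformity, but because the
inhibitor spreads faster than the activator, perturbations at one
particular wavelength are amplified – this is exactly the amplification of
tiny deviations described above, and the winning wavelength sets the
spacing of the resulting pattern.</p>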
<p>With an axis like this in hand, it is straightforward to
construct models that use simpler mechanisms to generate
further specification.
In a zygote, the initial process sets up
a division into "front half" and "back half",
and then further refinement separates
"brain" from "spinal cord"
and "front of brain" from "back of brain",
and so on.</p>
<p>Let's turn now to a concrete example of such a model,
the differentiation of the spinal cord.</p>
<h2 id="morphogens-and-the-spinal-cord">Morphogens and the Spinal Cord</h2>
<p>One primordial axis
runs from the head of the animal to the tail:
it is the anterior-posterior axis.
Along this axis, a special structure arises,
known as the <em>notochord</em>.
The notochord is a flexible but stiff semi-compressible organ,
much like the tongue.
In the early chordates, this structure served as a
simple spinal column.</p>
<p>In vertebrates, the notochord exists solely to organize development.
It releases a morphogen called "noggin",
which causes the cells located close to the notochord to become neurons.
These cells fold into a tube, which becomes the central nervous system.
The spinal cord remains a tube, while the brain continues to fold
and twist to create the structure we are familiar with.</p>
<p>This undifferentiated tube needs to eventually take on the
dorsal-ventral pattern that characterizes the
<a href="/FoundationalNeuroscience//71">functional anatomy of the spinal cord</a> –
motor neurons on the bottom, sensory neurons on the top.</p>
<p>The notochord is also critical for this process.
The main morphogen secreted by the notochord
for specifying cell fates in the spinal cord is known as
"Sonic Hedgehog" (seriously).</p>
<p>This rather bizarre name deserves a bit of comment.
The <em>hedgehog</em> family of genes was named for the effect
of mutations to this gene family in flies.
Rather than growing sparsely and generally,
the bristles on <em>hedgehog</em> mutant
flies grew densely and on the spine,
giving the pupae the appearance of a hedgehog.
As various members of the <em>hedgehog</em> family were discovered,
they were named after species of hedgehog:
<em>Indian hedgehog</em>, <em>Pygmy hedgehog</em>, and so on.</p>
<p>Due to the relative difficulty of genetic work in vertebrates
compared to invertebrates,
the vertebrate version of <em>hedgehog</em>
was one of the last <em>hedgehog</em> genes to be discovered,
and the community had already used up the various species of hedgehogs
on the invertebrate versions.
Partly as a joke, the name <em>Sonic Hedgehog</em> was suggested,
and it stuck.
Turns out, if you want the good names,
you've gotta go fast.</p>
<p>But I digress.
Secretion of Sonic Hedgehog, or <em>Shh</em>,
causes the cells of the spinal cord that are close
to the notochord to change their gene expression patterns,
giving rise to the motor neurons of the ventral spinal cord.</p>
<p>On the other side of the spinal cord,
the <em>dorsal</em> side,
other cells release a different set of morphogens,
primarily of the <em>Bone Morphogenetic Protein</em>, or <em>BMP</em>, family.
These drive the cells of the dorsal spinal cord to become
spinal interneurons, which mediate messages to the motor neurons
from sensory neurons and from neurons projecting from the cortex.
BMPs and other morphogens also induce the neural precursor cells
that didn't make it into the tube to become the sensory neurons
of the dorsal root ganglia, among other things.</p>
<p>Of course, "Bone Morphogenetic Protein" seems like a silly name
for a protein that regulates the development of spinal cord neurons.
In fact, the BMPs have many roles in development,
not just differentiating neuron classes in the spinal cord –
they tell cells whether to become neurons or skin cells, for example,
and they are, in fact, responsible for the morphogenesis of bones.</p>
<p>This is one of the startling facts about developmental biology:
despite the fact that a bewildering array of structures must arise
to create an entire organism,
the set of distinct organizing molecules for specifying that panoply
can fit comfortably on an index card.
Discovering the secret to devising such an economical and flexible instruction set
for a self-organizing complex system is one of the goals
of developmental biology.</p>
<h3 id="footnotes">Footnotes</h3>
<p><a name="dagger">†</a> The critical role for random fluctuations
and uncertainty in changing the theoretical behavior
of the differential equations in this model
presages the eventual development of
<a href="https://en.wikipedia.org/wiki/Chaos_theory">chaos theory</a>.</p>
Sun, 20 Mar 2016 00:00:00 +0000
http://charlesfrye.github.io/FoundationalNeuroscience//32/
http://charlesfrye.github.io/FoundationalNeuroscience//32/development