Tangent Space
http://charlesfrye.github.io/
Wed, 21 Oct 2020 22:01:56 +0000 (Jekyll v3.9.0)

Webinars on Linear Algebra and Vector Calculus

<p>I’ve started doing some short webinars
on core math topics in machine learning
for
<a href="https://wandb.com">Weights &amp; Biases</a>,
a startup that offers a really cool experiment tracking,
visualization, and sharing tool.</p>
<p>The first webinar,
<a href="https://bit.ly/2RIELUW"><em>How Linear Algebra is Not Like Algebra</em></a>,
presents Linear Algebra from a programmer’s perspective:
every vector/matrix/tensor is a function, shapes are types,
and matrix multiplication is composition of functions.</p>
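<p>That “matrices are functions” view is easy to sketch in a few lines of NumPy (my own example, not one from the webinar): a matrix’s shape acts like a type signature, and matrix multiplication composes the corresponding maps.</p>

```python
import numpy as np

# Two linear maps: A sends R^3 -> R^2, B sends R^2 -> R^4.
# Shapes act like types: B @ A only "type-checks" because
# B's input dimension (2) matches A's output dimension (2).
A = np.array([[1., 0., 2.],
              [0., 1., 1.]])   # shape (2, 3)
B = np.array([[1., 1.],
              [0., 2.],
              [3., 0.],
              [1., 1.]])       # shape (4, 2)

x = np.array([1., 2., 3.])

# applying A, then B, is the same as applying the composite map B @ A
composed = B @ (A @ x)
single_map = (B @ A) @ x
assert np.allclose(composed, single_map)
```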
<p>The second webinar
<a href="https://bit.ly/3bgDpsw"><em>Look Mom, No Indices!</em></a>,
introduces an index-free style of computing gradients
for functions that take vectors and matrices as inputs.
It’s a teaser for
<a href="http://charlesfrye.github.io/math/2018/03/06/frechet-derivative-introduction.html">this blog post series</a>.</p>
<!--exc-->
Thu, 16 Apr 2020 00:00:00 +0000
http://charlesfrye.github.io/external/2020/04/16/wandb-webinars.html
Gaussians as a Log-Linear Family

<div style="text-align: center">\[\begin{align}
\nabla_\theta A(\theta, \Theta)
&= -\frac{1}{2}\Theta^{-1}\theta = \mu\\
\nabla_\Theta A(\theta, \Theta)
&= \frac{1}{4}\Theta^{-1}\theta\theta^\top\Theta^{-1}
- \frac{1}{2}\Theta^{-1} = \mu\mu^\top + \Sigma\\
\end{align}\]
</div>
<!--exc-->
<h3 id="introduction">Introduction</h3>
<p>We return to a favorite topic:
Gaussian distributions,
which we already know arise from a
<a href="http://charlesfrye.github.io/math/2017/11/22/gaussian-diff-eq.html">simple differential equation</a>,
have close connections to
<a href="http://charlesfrye.github.io/external/2017/06/09/binder-convolutions.html">convolutions</a>,
and serve as the error model for
<a href="http://charlesfrye.github.io/math/2018/03/07/frechet-least-squares.html">least-squares regression</a>.</p>
<p>In this post, we see a new side of the Gaussian:
the Gaussian as a <em>log-linear family</em>,
aka an exponential family.
As it turns out, close connections between Gaussians
and two of the problems we know how to solve,
linear systems and quadratic forms,
give (relatively) simple and computationally tractable
expressions for the quantities of interest
in doing inference in log-linear families:
the cumulant generating function,
the sufficient statistics,
the canonical and mean parameters,
and the entropy.
In this post, we’ll focus on the canonical parameters.
The main contribution of this post is to work carefully through
the (non-degenerate) multivariate Gaussian case,
encompassing both
the derivation of the canonical parameters
and computing the gradient of the cumulant generating function.</p>
<h3 id="log-linear-families">Log-Linear Families</h3>
<p>A family of distributions can be said to be <em>log-linear</em>
if its members can be generated by varying the parameters \(\theta\)
of an equation with the following form:</p>
<div style="text-align: center">\[\begin{align}
\log p(x; \theta) = \langle \theta, \phi(x) \rangle - A(\theta) + h(x)
\end{align}\]
</div>
<p>If such an equation exists,
then the parameters \(\theta\) are known as <em>canonical parameters</em>
for the family.
In plain English,
this equation states that the way that the parameters, \(\theta\),
interact with the outcome values, \(x\),
to determine the log-probability, \(\log p\)
(negative one times the
<a href="http://charlesfrye.github.io/stats/2017/11/09/the-surprise-game.html">surprise</a>),
is linear (an inner product),
once the outcomes have been suitably transformed by \(\phi\).
Importantly, \(x\), \(\theta\), and \(\phi(x)\) can all be vector-valued.</p>
<p>This may not seem like a big win,
since the linearity we have gained is tentative and limited:
changing \(\theta\), for example, changes \(A(\theta)\),
which is as yet an undefined, likely nonlinear function.
But as it turns out, even this small degree of linearity
is sufficient to give big gains in manipulating the probabilities.</p>
<h3 id="warm-up-the-univariate-gaussian-family">Warm-Up: The Univariate Gaussian Family</h3>
<p>Let’s begin by massaging the functional form of the
log-probability of a Gaussian to see if it can be written as
a member of a log-linear family.</p>
<div style="text-align: center">\[\begin{align}
\log p(x; \mu, \sigma^2)
&= -\frac{(x - \mu)^2}{2\sigma^2} - \log \sqrt{2\pi}\sigma
\end{align}\]
</div>
<p>Our goal is to <em>independently</em> apply nonlinear transformations
to \(x\) and to our parameters, so having them all
tied up inside that polynomial is no good.
So we expand, then group terms and pattern match:</p>
<div style="text-align: center">\[\begin{align}
\log p(x; \mu, \sigma^2)
&= -\frac{(x - \mu)^2}{2\sigma^2} - \log \sqrt{2\pi}\sigma \\
&= -\frac{x^2}{2\sigma^2} + 2 \frac{\mu x}{2\sigma^2} - \frac{\mu^2}{2\sigma^2}
- \log \sqrt{2\pi} - \log \sigma \\
&= \frac{\mu}{\sigma^2}x + \frac{-1}{2\sigma^2}x^2
- \left(\frac{\mu^2}{2\sigma^2} + \log \sigma\right) - \log \sqrt{2\pi}\\
&:= \theta_1 x + \theta_2 x^2 - A(\mu, \sigma) - \log \sqrt{2\pi}
\end{align}\]
</div>
<p>The final line should be taken as a definition,
for \(\theta\), \(A(\mu, \sigma)\), and \(h\).
Notice how it matches both the line above
and the definition of a log-linear family.</p>
<p>Except for one problem:
\(A\) is a function of \(\mu\) and \(\sigma\),
which are <em>not</em> our parameters \(\theta_1\) and \(\theta_2\).
One of the key insights of log-linear families is that
it matters very much how exactly you parameterize your distributions;
the different parameterizations correspond to different <em>geometries</em>,
and the problems of inference become geometric problems of transforming
from one parameterization to another.
This approach is called <em>information geometry</em>.</p>
<p>We will eat our vegetables and convert \(A\) into a <em>bona fide</em>
function of \(\theta\) in a moment.
Before we do so, though,
let’s take a look at what our \(\phi\) functions turned out to be:</p>
<div style="text-align: center">\[\begin{align}
\phi(x) &=
\left[\begin{array}{c}
x \\
x^2
\end{array}\right]
\end{align}\]
</div>
<p>For something that could be an arbitrary nonlinear function,
these have turned out rather simple indeed!
The two functions are also the <em>sufficient statistics</em>
of the Gaussian family:
if you collect i.i.d. observations from a Gaussian distribution,
the average values of these two functions are all you need to know
in order to extract all of the information those observations gave you
about what the underlying parameters of the Gaussian were.</p>
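<p>As a quick sanity check (my own simulation, with arbitrary parameter values), the sample averages of \(x\) and \(x^2\) really do estimate \(\mu\) and \(\mu^2 + \sigma^2\), which is all a sufficient statistic needs to deliver:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

# average sufficient statistics phi(x) = (x, x^2)
phi_bar = np.array([x.mean(), (x ** 2).mean()])

# they estimate E[x] = mu and E[x^2] = mu^2 + sigma^2
assert np.allclose(phi_bar, [mu, mu ** 2 + sigma ** 2], atol=0.1)
```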
<p>In fact, instead of starting with the formula for the Gaussian,
we might have started by specifying these two sufficient statistics
and then asking
“what is the distribution that has this \(\phi\) as its sufficient statistics?”.
If you’ve ever derived the Gaussian as a <em>maximum entropy distribution</em>,
then that’s another way of stating what you did.
That method requires some heftier math though, so we’ll stick with this approach.</p>
<p>Let’s finish up our derivation of the log-linear form of the univariate Gaussian.
We first write our traditional parameters, \(\mu\) and \(\sigma\),
in terms of our newly-derived canonical parameters,
\(\theta\),</p>
<div style="text-align: center">\[\begin{align}
\mu &= \frac{\theta_1}{-2\theta_2}\\
\sigma^2 &= -\frac{1}{2}\theta_2^{-1}
\end{align}\]
</div>
<p>and then substitute:</p>
<div style="text-align: center">\[\begin{align}
A(\theta) &= -\frac{1}{4}\frac{\theta_1^2}{\theta_2} - \frac{1}{2}\log\left(-2\theta_2\right)
\end{align}\]
</div>
<p>enabling us to finally write</p>
<div style="text-align: center">\[\begin{align}
\log p(x; \theta)
&:= \theta_1 x + \theta_2 x^2 - A(\theta) - \log \sqrt{2\pi}
\end{align}\]
</div>
<p>and so demonstrate that the family of univariate Gaussians
is a log-linear family.
The canonical parameter \(\theta_1\) is an arbitrary real number;
the canonical parameter \(\theta_2\) is an arbitrary negative real number,
as the definition in terms of \(\sigma^2\) makes clear.</p>
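<p>The whole univariate derivation can be checked numerically. Below is a small sketch (the helper names are mine) that converts \((\mu, \sigma^2)\) to canonical parameters and confirms that the log-linear form reproduces the usual Gaussian log-density:</p>

```python
import numpy as np

def to_canonical(mu, sigma2):
    # (mu, sigma^2) -> canonical parameters (theta_1, theta_2)
    return mu / sigma2, -1.0 / (2.0 * sigma2)

def A(t1, t2):
    # the cumulant generating function derived above
    return -0.25 * t1 ** 2 / t2 - 0.5 * np.log(-2.0 * t2)

mu, sigma2 = 0.7, 1.3
t1, t2 = to_canonical(mu, sigma2)
assert t2 < 0  # theta_2 is negative whenever sigma^2 > 0

x = 0.4
log_p_canonical = t1 * x + t2 * x ** 2 - A(t1, t2) - np.log(np.sqrt(2 * np.pi))
log_p_direct = -(x - mu) ** 2 / (2 * sigma2) - np.log(np.sqrt(2 * np.pi * sigma2))
assert np.isclose(log_p_canonical, log_p_direct)
```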
<p>We close this section by demonstrating, for this particular case,
one remarkable property of log-linear families.
In a fit of curiosity, let us calculate the gradient of \(A\).
Since \(A\) is a function of two variables,
we compute its derivative with respect to each variable separately,
and embark with \(\theta_1\),
which only touches the first, rational term:</p>
<div style="text-align: center">\[\begin{align}
\frac{\partial}{\partial \theta_1} A(\theta) &=
\frac{\theta_1}{-2\theta_2} \\
&= \mu \\
&= \mathbb{E}(\phi_1(x))
\end{align}\]
</div>
<p>How curious!
The partial derivative of our function \(A\) in its first argument
gave us the expected value of the first index of \(\phi\)!
Does this pattern continue if we try the second argument?</p>
<div style="text-align: center">\[\begin{align}
\frac{\partial}{\partial \theta_2} A(\theta) &=
\frac{1}{4}\frac{\theta_1^2}{\theta_2^2} - \frac{1}{2}\theta_2^{-1} \\
&= \mu^2 +\sigma^2 \\
&= \mathbb{E}(\phi_2(x))
\end{align}\]
</div>
<p>It does!
It would appear that the derivatives of this function \(A\)
give us the expected values of the sufficient statistics.
In fact, this function is also known as the <em>cumulant generating function</em>,
because taking its derivatives (including higher-order derivatives)
generates the <em>cumulants</em> of the sufficient statistics:
certain polynomial combinations of their moments.
This is advantageous because it exchanges integration,
which is typically difficult,
for differentiation, which is typically tractable.
The second cumulant, from the second derivative,
is the covariance of the sufficient statistics,
which in the canonical parameterization is the beloved <em>Fisher Information Matrix</em>.</p>
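<p>We can also verify the “derivatives of \(A\) give moments” property by finite differences (a sketch of my own, with arbitrary parameter values):</p>

```python
import numpy as np

def A(t1, t2):
    # cumulant generating function of the univariate Gaussian
    return -0.25 * t1 ** 2 / t2 - 0.5 * np.log(-2.0 * t2)

mu, sigma2 = 0.5, 2.0
t1, t2 = mu / sigma2, -1.0 / (2.0 * sigma2)

eps = 1e-6  # central finite differences
dA_dt1 = (A(t1 + eps, t2) - A(t1 - eps, t2)) / (2 * eps)
dA_dt2 = (A(t1, t2 + eps) - A(t1, t2 - eps)) / (2 * eps)

# dA/dtheta_1 = E[x] = mu ;  dA/dtheta_2 = E[x^2] = mu^2 + sigma^2
assert np.isclose(dA_dt1, mu, atol=1e-4)
assert np.isclose(dA_dt2, mu ** 2 + sigma2, atol=1e-4)
```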
<p>Before we move on,
note that we had to bring in a fact from calculus, namely that
\(\frac{\partial}{\partial x} \log(-x) = \frac{1}{x}\) for \(x<0\).
Outside of that, we’ll be sticking to derivatives of
linear and polynomial functions,
all of which can be fairly straightforwardly found using the
<a href="http://charlesfrye.github.io/math/2018/03/06/frechet-derivative-introduction.html">Fréchet derivative</a>
formulation, in case you’ve forgotten the rules.</p>
<h3 id="the-main-event-the-multivariate-gaussian-family">The Main Event: The Multivariate Gaussian Family</h3>
<p>Let us now repeat that same set of moves,
but with multivariate Gaussians instead of univariate Gaussians.
We’ll need to bust out our linear algebra,
as all of our simple expressions in terms of squares and ratios
will start to involve inner products and matrix inverses,
but nothing fundamental will change.
We begin with the log-probability,
which is perhaps less familiar:</p>
<div style="text-align: center">\[\begin{align}
\log p(x; \mu, \Sigma) &=
-\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu)
+ \frac{1}{2} \log \left|\Sigma^{-1}\right| - k \log \sqrt{2\pi}
\end{align}\]
</div>
<p>Once again, we’ll need to coerce this into an expression where
the only interactions between our state vector \(x\)
and our parameter tuple \((\mu, \Sigma)\) is via inner products.
Again, we expand the troublesome polynomial,
this time showing up as a quadratic form:</p>
<div style="text-align: center">\[\begin{align}
(x - \mu)^\top \Sigma^{-1} (x - \mu) &=
x^\top\Sigma^{-1}x
- \mu^\top\Sigma^{-1}x - x^\top\Sigma^{-1}\mu
+ \mu^\top\Sigma^{-1}\mu \\
&= \mathrm{tr}\left(x^\top\Sigma^{-1}x\right)
-\mathrm{tr}\left(\mu^\top\Sigma^{-1}x\right)
-\mathrm{tr}\left(x^\top\Sigma^{-1}\mu\right)
+ \mathrm{tr}\left(\mu^\top\Sigma^{-1}\mu\right)
\end{align}\]
</div>
<p>The final move we made,
familiar to anyone who read through the
<a href="http://charlesfrye.github.io/math/2018/03/06/frechet-derivative-introduction.html">Fréchet derivative series</a>,
especially the section on
<a href="http://charlesfrye.github.io/math/2018/03/07/frechet-least-squares.html">linear regression with multiple inputs and outputs</a>,
was to write a bunch of scalar values as <em>traces</em>,
\(\mathrm{tr}\).
For a matrix, the trace is the sum of the diagonal elements.
For a one-by-one matrix, aka a scalar or number,
the trace is just equal to the value.
The two key properties we need are</p>
<ol>
<li>the trace is <a href="https://math.stackexchange.com/questions/252272/is-trace-invariant-under-cyclic-permutation-with-rectangular-matrices">invariant to cyclic permutations</a>.</li>
<li>the trace is used to define the <a href="http://charlesfrye.github.io/math/2018/02/28/how-big-is-a-matrix.html">inner product of two matrices</a>.</li>
</ol>
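<p>Both properties are easy to confirm numerically on random matrices (a quick check of my own):</p>

```python
import numpy as np

rng = np.random.default_rng(1)
A_, B_, C_ = rng.normal(size=(3, 4, 4))  # three random 4x4 matrices

# 1. the trace is invariant to cyclic permutations
assert np.isclose(np.trace(A_ @ B_ @ C_), np.trace(B_ @ C_ @ A_))

# 2. the trace defines the matrix inner product <A, B> = tr(A^T B)
assert np.isclose(np.trace(A_.T @ B_), np.sum(A_ * B_))
```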
<p>We will combine these together,
rearranging the elements of our traces by “cycling” them
until we get inner products between our parameters and our \(x\)s.</p>
<div style="text-align: center">\[\begin{align}
(x - \mu)^\top \Sigma^{-1} (x - \mu) &=
\mathrm{tr}\left(x^\top\Sigma^{-1}x\right)
-2\mathrm{tr}\left(x^\top\Sigma^{-1}\mu\right)
+ \mathrm{tr}\left(\mu^\top\Sigma^{-1}\mu\right)\\
&=\mathrm{tr}\left(xx^\top\Sigma^{-1}\right)
-2\mathrm{tr}\left(x^\top\Sigma^{-1}\mu\right)
+ \mathrm{tr}\left(\mu\mu^\top\Sigma^{-1}\right)\\
&=\langle\Sigma^{-1}, xx^\top \rangle
-2 \langle \Sigma^{-1}\mu, x \rangle
+ \langle \Sigma^{-1}, \mu\mu^\top \rangle
\end{align}\]
</div>
<p>Notice that the symbol \(\langle \cdot, \cdot \rangle\) is doing double duty:
it can mean the usual inner product of vectors
or it can mean the (derived) inner product of matrices.</p>
<p>We’ve now got an expression in terms of inner products
between parameters and functions of \(x\),
so we can plug back in and get one step closer
to the log-linear form for the multivariate Gaussian:</p>
<div style="text-align: center">\[\begin{align}
\log p(x; \mu, \Sigma) &=
\langle \Sigma^{-1}\mu, x \rangle
+ \langle-\frac{1}{2} \Sigma^{-1}, xx^\top \rangle
-\left(\frac{1}{2} \langle \Sigma^{-1}, \mu\mu^\top \rangle
- \frac{1}{2} \log \left|\Sigma^{-1}\right|\right)
- k \log \sqrt{2\pi}\\
\log p(x; \theta, \Theta) &=
\langle \theta, x \rangle
+ \langle \Theta, xx^\top \rangle
- A(\theta, \Theta)
-k \log \sqrt{2\pi}
\end{align}\]
</div>
<p>On top of the shared log-linear family form
(inner products, cumulant generating function)
notice the similarities between the multivariate and the univariate case:
\(\mu^2\) becomes \(\mu\mu^\top\),
\(\frac{\mu}{\sigma^2}\) becomes \(\Sigma^{-1}\mu\), and
\(\frac{-1}{2\sigma^2}\) becomes \(\frac{-1}{2}\Sigma^{-1}\)
(capital \(\Sigma\) already denotes the covariance matrix,
hence the absence of a \(^2\) in the multivariate versions).
Tracking these patterns through derivations for univariate and multivariate cases
has helped me to build my intuition for vector operations by “porting” it from
the more familiar scalar numbers.</p>
<p>Of course, we still need to write down an explicit form for \(A\)
as a function of \(\theta\) and \(\Theta\),
not just of \(\mu\) and \(\Sigma\),
just as we had to in the univariate case.</p>
<p>We use the following substitutions
(compare them to their univariate versions!):</p>
<div style="text-align: center">\[\begin{align}
\mu &= -\frac{1}{2}\Theta^{-1}\theta\\
\Sigma^{-1} &= -2\Theta\\
\mu\mu^\top &= \frac{1}{4}\Theta^{-1}\theta\theta^\top\Theta^{-1}
\end{align}\]
</div>
<p>Note that the last one requires some algebraic legerdemain.
To show the identity, take the transpose of the right hand side twice,
then bring one transpose in, using the fact that
\(\Sigma\) and \(\Sigma^{-1}\)
(and therefore \(\Theta\) and \(\Theta^{-1}\))
are symmetric.</p>
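<p>A few lines of NumPy confirm these substitutions on a randomly generated \((\mu, \Sigma)\) pair (my check, not part of the original derivation):</p>

```python
import numpy as np

rng = np.random.default_rng(2)
mu = rng.normal(size=3)
L = rng.normal(size=(3, 3))
Sigma = L @ L.T + 3 * np.eye(3)  # a symmetric positive-definite covariance

theta = np.linalg.inv(Sigma) @ mu    # theta = Sigma^{-1} mu
Theta = -0.5 * np.linalg.inv(Sigma)  # Theta = -1/2 Sigma^{-1}
Theta_inv = np.linalg.inv(Theta)

assert np.allclose(mu, -0.5 * Theta_inv @ theta)
assert np.allclose(np.linalg.inv(Sigma), -2 * Theta)
assert np.allclose(np.outer(mu, mu),
                   0.25 * Theta_inv @ np.outer(theta, theta) @ Theta_inv)
```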
<p>These identities let us obtain, with some algebra:</p>
<div style="text-align: center">\[\begin{align}
A(\mu, \Sigma) &= \frac{1}{2} \langle \Sigma^{-1}, \mu\mu^\top \rangle
- \frac{1}{2} \log \left|\Sigma^{-1}\right|\\
A(\theta, \Theta) &=
\frac{1}{2} \langle -2 \Theta,
\frac{1}{4}\Theta^{-1}\theta\theta^\top\Theta^{-1} \rangle
- \frac{1}{2} \log \left| -2 \Theta \right|\\
&= -\frac{1}{4}\mathrm{tr}\left(\Theta\Theta^{-1}\theta\theta^\top\Theta^{-1}\right)
- \frac{1}{2} \log \left| -2 \Theta \right|\\
&= -\frac{1}{4} \langle \theta\theta^\top, \Theta^{-1} \rangle
- \frac{1}{2} \log \left| -2 \Theta \right|\\
\end{align}\]
</div>
<p>Take a deep breath.
The trace has served us well again,
helping us to obtain a much simpler expression for \(A\)
that now closely resembles the expression for the univariate case.</p>
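<p>We can likewise confirm numerically that the two expressions for \(A\) agree (again, a sketch of my own):</p>

```python
import numpy as np

rng = np.random.default_rng(4)
mu = rng.normal(size=3)
L = rng.normal(size=(3, 3))
Sigma = L @ L.T + 3 * np.eye(3)
Sigma_inv = np.linalg.inv(Sigma)

theta, Theta = Sigma_inv @ mu, -0.5 * Sigma_inv

# A in terms of (mu, Sigma) ...
A_mean = (0.5 * np.sum(Sigma_inv * np.outer(mu, mu))
          - 0.5 * np.linalg.slogdet(Sigma_inv)[1])
# ... and in terms of the canonical parameters (theta, Theta)
A_canonical = (-0.25 * np.sum(np.outer(theta, theta) * np.linalg.inv(Theta))
               - 0.5 * np.linalg.slogdet(-2 * Theta)[1])

assert np.isclose(A_mean, A_canonical)
```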
<p>As a last exercise,
we would like to compute the gradients of \(A\) and show
that the results are the expected values of the sufficient statistic functions,
\(x\) and \(xx^\top\).</p>
<p>The derivative with respect to the vector-valued parameter \(\theta\)
is almost easy:</p>
<div style="text-align: center">\[\begin{align}
\nabla_\theta A(\theta, \Theta) &= -\frac{1}{4}
\nabla_\theta \langle \theta \theta^\top, \Theta^{-1}\rangle
\end{align}\]
</div>
<p>This is the derivative of a quadratic form with respect to its argument,
which was one of the examples used in the
<a href="http://charlesfrye.github.io/math/2018/03/06/frechet-derivative-introduction.html">blog post introducing the Fréchet derivative</a>.
For completeness, we rederive it here:</p>
<div style="text-align: center">\[\begin{align}
\langle \left(\theta + \varepsilon\right)\left(\theta + \varepsilon\right)^\top, \Theta^{-1}\rangle &=
\langle \theta \theta^\top, \Theta^{-1} \rangle
+ 2\langle \Theta^{-1}\theta, \varepsilon \rangle
+ \langle \varepsilon\varepsilon^\top, \Theta^{-1}\rangle
\end{align}\]
</div>
<p>The middle term contains the gradient as the argument
to the inner product with \(\varepsilon\),
establishing that the gradient of \(A\)
with respect to its first argument is</p>
<p style="text-align: center">\(\begin{align}
\nabla_\theta A(\theta, \Theta) &= -\frac{1}{2} \Theta^{-1}\theta = \mu
\end{align}\) <!-- _--></p>
<p>as desired.</p>
<p>We’re in the home stretch!
We just need to compute the (matrix-valued) gradient of \(A(\theta, \Theta)\)
with respect to \(\Theta\).</p>
<div style="text-align: center">\[\begin{align}
\nabla_\Theta A(\theta, \Theta) &= -\frac{1}{4}
\nabla_\Theta \langle \theta \theta^\top, \Theta^{-1}\rangle
-\frac{1}{2} \nabla_\Theta \log\left|-2\Theta\right|
\end{align}\]
</div>
<p>We split this into two pieces.
First, we tackle the second term.
The derivative of the log-determinant function is,
funnily enough, just
equal to the inverse of the transposed matrix,
as can be shown
<a href="http://charlesfrye.github.io/math/2019/01/25/frechet-determinant.html">using the Fréchet derivative again</a>.
This makes our second term \(-\frac{1}{2}\Theta^{-1}\),
which is auspicious, since it corresponds to \(\Sigma\),
one of the two terms in the expectation we hope to match.</p>
<p>Now, we tackle the first term,
using the chain rule to split apart the
derivative of the inner product
(which, due to linearity, is just equal to the other argument)
from the derivative of the matrix inverse:</p>
<div style="text-align: center">\[\begin{align}
\nabla_\Theta \langle \theta \theta^\top, \Theta^{-1}\rangle
&= \theta\theta^\top \, \nabla_\Theta\Theta^{-1}
\end{align}\]
</div>
<p>One might hope, by analogy with the derivative of the inverse of a scalar,
that the answer is the matrix equivalent of
\(-\frac{1}{x^{2}}\).
<a href="https://math.stackexchange.com/questions/1471825/derivative-of-the-inverse-of-a-matrix">And so it is</a>!</p>
<p>Putting it all together, we end up with</p>
<p style="text-align: center">\(\begin{align}
\nabla_\Theta A(\theta, \Theta) &= \frac{1}{4}\Theta^{-1}\theta\theta^\top\Theta^{-1}
-\frac{1}{2}\Theta^{-1}\\
&= \mu\mu^\top + \Sigma
\end{align}\)<!-- _--></p>
<p>which is, indeed, the expected value of \(xx^\top\).</p>
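<p>As a final numerical sanity check (mine, not from the original derivation), finite differences on \(A\) recover both gradients:</p>

```python
import numpy as np

def A(theta, Theta):
    # cumulant generating function in canonical parameters
    return (-0.25 * theta @ np.linalg.inv(Theta) @ theta
            - 0.5 * np.linalg.slogdet(-2 * Theta)[1])

rng = np.random.default_rng(3)
mu = rng.normal(size=2)
L = rng.normal(size=(2, 2))
Sigma = L @ L.T + 2 * np.eye(2)

theta = np.linalg.inv(Sigma) @ mu
Theta = -0.5 * np.linalg.inv(Sigma)

eps = 1e-6
# central finite differences for the gradient in theta ...
grad_theta = np.array([
    (A(theta + eps * e, Theta) - A(theta - eps * e, Theta)) / (2 * eps)
    for e in np.eye(2)
])
# ... and, entry by entry, for the gradient in Theta
grad_Theta = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        E = np.zeros((2, 2))
        E[i, j] = eps
        grad_Theta[i, j] = (A(theta, Theta + E) - A(theta, Theta - E)) / (2 * eps)

assert np.allclose(grad_theta, mu, atol=1e-4)
assert np.allclose(grad_Theta, np.outer(mu, mu) + Sigma, atol=1e-4)
```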
Fri, 05 Jul 2019 00:00:00 +0000
http://charlesfrye.github.io/stats/2019/07/05/gaussian-log-linear.html
Short Paper on Square Roots and Critical Points

<blockquote>
<p>In the next section, we define an analogous algorithm for finding critical points. That is, we again try to solve a root-finding problem with Newton-Raphson, but this introduces a division, which we reformulate as an optimization problem.</p>
</blockquote>
<p>Today a <a href="https://arxiv.org/abs/1906.05273">short paper</a> I wrote was posted to the arXiv.
It’s on a cute connection between the algorithm I use to find the critical points of neural network losses and the algorithm used to compute square roots to high accuracy.</p>
<p>Check out <a href="https://twitter.com/charles_irl/status/1139232020519768064">this Twitter thread</a> for a layman-friendly explanation.</p>
<!--exc-->
Thu, 13 Jun 2019 00:00:00 +0000
http://charlesfrye.github.io/external/2019/06/13/newtonmr-sqrt.html
Tails You Win, One Tail You Lose

<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">I presented the math for this at the <a href="https://twitter.com/hashtag/cosyne19?src=hash&ref_src=twsrc%5Etfw">#cosyne19</a> diversity lunch today. <br /><br />Success rates for first authors with known gender: <br />Female: 83/264 accepted = 31.4%<br />Male: 255/677 accepted = 37.7%<br /><br />37.7/31.4 = a 20% higher success rate for men <a href="https://t.co/u2sF5WHHmy">https://t.co/u2sF5WHHmy</a></p>— Megan Carey (@meganinlisbon) <a href="https://twitter.com/meganinlisbon/status/1101870079858409478?ref_src=twsrc%5Etfw">March 2, 2019</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>Controversy over hypothesis testing methodology encountered in the wild
<a href="http://charlesfrye.github.io/stats/2018/01/09/hypothesis-test-example.html">a second time</a>!
At this year’s Computational and Systems Neuroscience conference, CoSyNE 2019,
there was disagreement over whether the acceptance rates indicated bias against women authors.
As it turns out, part of the dispute turned on which statistical test to run!
<!--exc--></p>
<h3 id="controversial-data">Controversial Data</h3>
<p>CoSyNe is
<a href="http://cosyne.org/c/index.php?title=Cosyne_19">an annual conference</a>
where COmputational and SYstems NEuroscientists get together.
As a conference in the intersection of two male-dominated fields,
concerns about gender bias abound.
Further, the conference uses single-blind review,
i.e. reviewers but not submitters are anonymous,
which could be expected to
<a href="https://doi.org/10.1177%2F1075547012472684">increase bias against women</a>,
though effects
<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5715744/">might be small</a>.</p>
<p>During the welcome talk, the slide below was posted
(thanks to Twitter user
<a href="https://twitter.com/neuroecology">@neuroecology</a> for sharing their image of the slide;
they have a nice
<a href="https://neuroecology.wordpress.com/2019/02/27/cosyne19-by-the-numbers/">write-up</a>
mining other CoSyNe author data)
to support the claim that bias was “not too bad”,
since the ratio of male first authors to female first authors was about the same
between submitted and accepted posters.</p>
<p style="text-align: center"><img src="https://charlesfrye.github.io/img/cosynegenderbias.jpg" alt="cosynegenderbias" /></p>
<p>However, this method of viewing the data has some problems:
the real metric for bias isn’t the final gender composition of the conference,
it’s the difference in acceptance rate across genders.
A subtle effect there would be hard to see in data plotted as above.</p>
<p>And so Twitter user
<a href="https://twitter.com/meganinlisbon/">@meganinlisbon</a>
got hold of the raw data and computed the acceptance rates and their ratio
in the following tweet:</p>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">I presented the math for this at the <a href="https://twitter.com/hashtag/cosyne19?src=hash&ref_src=twsrc%5Etfw">#cosyne19</a> diversity lunch today. <br /><br />Success rates for first authors with known gender: <br />Female: 83/264 accepted = 31.4%<br />Male: 255/677 accepted = 37.7%<br /><br />37.7/31.4 = a 20% higher success rate for men <a href="https://t.co/u2sF5WHHmy">https://t.co/u2sF5WHHmy</a></p>— Megan Carey (@meganinlisbon) <a href="https://twitter.com/meganinlisbon/status/1101870079858409478?ref_src=twsrc%5Etfw">March 2, 2019</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>Phrased as “20% higher for men”, the gender bias seems staggeringly high!</p>
<p>It seems like it’s time for statistics to come and give us a definitive answer.
Surely math can clear everything up!</p>
<h3 id="controversial-statistics">Controversial Statistics</h3>
<p>Shortly afterwards, several other Twitter users,
including
<a href="https://twitter.com/mjaztwit/status/1101899788688257024">@mjaztwit</a>
and
<a href="https://twitter.com/alexpiet/status/1101882724581822465">@alexpiet</a>
attempted to apply
<a href="http://charlesfrye.github.io/stats/2018/06/09/hypothesis-testing.html">null hypothesis significance testing</a>
to determine whether the observed gender bias was likely to be observed
in the case that there was, in fact, no bias.
Such a result is called <em>significant</em>,
and the degree of evidence for significance is quantified by a value \(p\).
For historical reasons, a value of \(0.05\) is taken as a threshold
for a binary choice about significance.</p>
<p>And they got different answers!
One found that the observation was <em>not significant</em>, with \(p \approx 0.07\),
while the other found that the observation was <em>significant</em>, with \(p \approx 0.03\).
What gives?</p>
<p>There were some slight differences in low-level, quantitative approach:
one was parametric, the other non-parametric.
But they weren’t big enough to change the \(p\) value.
The biggest difference was a choice made at a very high level:
namely, are we testing whether there was <em>any gender bias in CoSyNe acceptance</em>,
or are we testing, more specifically, whether there was <em>gender bias against women</em>?</p>
<p>The former is called a <em>two-tailed test</em> and is more standard.
Especially in sciences like biology and psychology,
we don’t know enough about our data to completely discount the possibility
that there’s an effect opposite to what we might expect.</p>
<p>Because we consider extreme events “in both directions”,
the typical effect of switching from a two to a one-tailed test
is to cut the \(p\)-value in half.
And indeed, \(0.03\) is approximately half of \(0.07\).</p>
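<p>For the curious, here is a rough sketch (mine, not necessarily the exact tests those tweets ran) of how both numbers arise from the same two-proportion \(z\)-test on the tweeted counts; only the choice of tail differs:</p>

```python
from math import erfc, sqrt

def normal_sf(z):
    # survival function of the standard normal, P(Z > z)
    return 0.5 * erfc(z / sqrt(2))

# acceptance counts from the tweet above
f_acc, f_tot = 83, 264   # female first authors
m_acc, m_tot = 255, 677  # male first authors

p_f, p_m = f_acc / f_tot, m_acc / m_tot
p_pool = (f_acc + m_acc) / (f_tot + m_tot)
se = sqrt(p_pool * (1 - p_pool) * (1 / f_tot + 1 / m_tot))
z = (p_m - p_f) / se

p_one_tailed = normal_sf(z)           # H1: men's rate is higher
p_two_tailed = 2 * normal_sf(abs(z))  # H1: the rates differ

# the two tests land on opposite sides of the conventional threshold
assert p_one_tailed < 0.05 < p_two_tailed
```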
<p>But is it reasonable to run a two-tailed test for this question?
The claims and concerns of most of the individuals concerned about bias
was framed specifically in terms of female-identifying authors
(to my recollection, choices for gender identification were
<em>male</em>, <em>female</em>, and <em>prefer not to answer</em>, making it impossible to talk
about non-binary authors with this data).
And given the other evidence for misogynist bias in this field
(the undeniably lower rate of female submissions,
the near-absence of female PIs,
the still-greater sparsity of women among top PIs)
it would be a surprising result indeed if there were bias
that favored women in just this one aspect.
Surprising enough that only very strong evidence should convince us of it,
which is approximately the demand a one-tailed test makes in that direction.</p>
<p>Even putting this question aside,
is putting this much stock in a single number like the \(p\) value sensible?
After all, the \(p\) value is calculated from our data,
and it can fluctuate from sample to sample.
If just two more female-led projects had been accepted or rejected,
the two tests would agree on which side of \(0.05\) the \(p\) value lay!</p>
<p>Indeed, the CoSyNe review process includes <em>a specific mechanism for randomness</em>,
namely that papers on the margin of acceptance due to a scoring criterion
have their acceptance or rejection determined by the output of a random number generator.</p>
<p>And the effect size expected by most is probably not
too much larger than what is reported,
since the presumption is that the effect is mostly implicit bias from many reviewers
or explicit bias from a small cohort.
In that case, adhering to a strict \(p\) cutoff is electing to have your conclusions
from this test determined <em>almost entirely by an explicitly random mechanism</em>.
This is surely foolhardy!</p>
<p>It would seem to me that the more reasonable conclusion is that
there is moderately strong evidence of a gender bias in the 2019 CoSyNe review process,
but that the number of submissions is insufficient to make a definitive determination
based on a single year’s data.
This data is unfortunately not available for previous years.</p>
<h3 id="coda">Coda</h3>
<p>At the end of the conference,
the Executive Committee announced that they had heard the complaints
of conference-goers around this bit of gender bias and others
and would be taking concrete steps to address them.
First, they would be adding chairs for Diversity and Inclusion to the committee.
Second, they would move to a system of double-blind review,
in which the authors of submissions are also anonymous to the reviewers.
Given the absence of any evidence that such a system is biased against men
and the
<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5715744/">evidence that such a system reduces biases in general</a>,
this is an unambiguously good move,
regardless of the precise \(p\) value of the data for gender bias this year.</p>
Wed, 06 Mar 2019 00:00:00 +0000
http://charlesfrye.github.io/stats/2019/03/06/cosyne19-gender-bias.html
Multiplication Made Convoluted, Part II: Python

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="k">class</span> <span class="nc">DecimalSequence</span><span class="p">():</span>

    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">iterable</span><span class="p">):</span>
        <span class="n">arr</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">atleast_1d</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">squeeze</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">iterable</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">int</span><span class="p">)))</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">arr</span> <span class="o">=</span> <span class="n">arr</span>

    <span class="k">def</span> <span class="nf">multiply</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">other</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">DecimalSequence</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">convolve</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">arr</span><span class="p">,</span> <span class="n">other</span><span class="p">.</span><span class="n">arr</span><span class="p">))</span>
</code></pre></div></div>
<!--exc-->
<h3 id="introduction">Introduction</h3>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Well, actually, this is more right than you think: <br />A multiplication *is* a convolution of one multi-digit number by another one over the digit dimension.<br />Think about it.</p>— Yann LeCun (@ylecun) <a href="https://twitter.com/ylecun/status/1053719869005447168?ref_src=twsrc%5Etfw">October 20, 2018</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>Convolutions show up in many places:
in signal processing,
in probability,
and of course in the marriage of the two,
machine learning.
Check out
<a href="http://charlesfrye.github.io/external/2016/03/27/convolutions.html">this convolution tutorial</a>
for more details.
They are just as intimately related to such deep and powerful mathematics as
<a href="http://charlesfrye.github.io/math/2017/11/22/gaussian-diff-eq.html">the Central Limit Theorem</a>
and
<a href="https://www.khanacademy.org/math/differential-equations/laplace-transform/convolution-integral/v/the-convolution-and-the-laplace-transform">the Fourier transform</a>
as they are to
<a href="https://colah.github.io/posts/2014-12-Groups-Convolution/">understanding what happens when you shuffle cards</a>.</p>
<p>The tweet above,
part of an exchange between satirical convolution fanatic
<a href="https://twitter.com/boredyannlecun">@boredyannlecun</a>
and actual convolution fanatic
<a href="https://twitter.com/ylecun">Yann LeCun</a>,
reveals an unexpected connection between convolutions
and the humble multiplication operation.
I decided to work it out thoroughly and write it up.</p>
<p>In
<a href="http://charlesfrye.github.io/math/2019/02/20/multiplication-convoluted-part-one.html">a previous blog post</a>,
we worked through the <em>Think about it.</em> phase
by first thinking about how we normally do multiplication,
then generalizing it, and then deriving the convolutional form.</p>
<p>In this blog post,
we’ll work through implementing a number type in Python
that actually makes use of this relationship to do multiplication.
Along the way, we’ll learn how to hook into Python’s built-in operators
with our own objects using what are called, almost without hyperbole,
<em>magic methods</em>.</p>
<h3 id="multiplication-convolution-and-sequences-of-digits">Multiplication, Convolution, and Sequences of Digits</h3>
<p>To think of a multi-digit number, like \(x=12345\),
as something with a “digit dimension”
is to think of it as a one-dimensional vector:</p>
<div style="text-align: center">\[\mathbf{x} = \left[5,\ 4,\ 3,\ 2,\ 1 \right]\]
</div>
<p>If we want to convert that vector back into the original number,
we add up the entries, with each multiplied by \(10\) to some power:</p>
<div style="text-align: center">\[x = \sum_{i} \mathbf{x}_i \cdot 10^i\]
</div>
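<p>As a quick check of this formula (a small illustrative snippet, not part of the original post's code), we can reconstruct \(12345\) from its digit vector:</p>

```python
# Digits of 12345 in increasing-power order, as in the vector above.
x_vec = [5, 4, 3, 2, 1]

# Apply x = sum_i x_i * 10**i to recover the original integer.
x = sum(digit * 10**i for i, digit in enumerate(x_vec))
print(x)  # 12345
```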
<p>The
<a href="http://charlesfrye.github.io/math/2019/02/20/multiplication-convoluted-part-one.html">previous blog post</a>
showed how, in this representation, we can write multiplication as</p>
<div style="text-align: center">\[\begin{align}
z_k &= \sum_{i} x_i y_{k-i}
\end{align}\]
</div>
<p>which is the form of a convolution!</p>
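<p>As a sanity check (a sketch of our own, not code from the post: the helper <code class="language-plaintext highlighter-rouge">carry</code> is an addition for illustration), we can compute \(12 \times 34\) this way. We convolve the digit vectors, then propagate carries, since the raw convolution can produce "digits" of \(10\) or more:</p>

```python
import numpy as np

# 12 -> [2, 1] and 34 -> [4, 3] in increasing-power order.
x, y = [2, 1], [4, 3]

# z_k = sum_i x_i * y_{k-i}: exactly what np.convolve computes.
z = np.convolve(x, y)  # [2*4, 2*3 + 1*4, 1*3] = [8, 10, 3]

def carry(digits, base=10):
    """Propagate carries so every entry is a valid digit in [0, base)."""
    out, c = [], 0
    for d in digits:
        c, r = divmod(d + c, base)
        out.append(r)
    while c:  # flush any leftover carry into higher places
        c, r = divmod(c, base)
        out.append(r)
    return out

print(carry(z))  # [8, 0, 4], i.e. 408 = 12 * 34
```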
<h3 id="first-pass-implementation">First Pass Implementation</h3>
<p>In order to implement our multiplication algorithm in Python,
we’ll need to make some inter-related choices:
how do we represent our sequences of digits,
how much of the possible functionality of numbers do we want to implement,
and what do we roll ourselves versus crib from libraries.</p>
<p>The lowest-effort choice is to say that we will treat any sequence as a digit sequence,
only implement a multiplication algorithm,
and use the built-in numpy function <code class="language-plaintext highlighter-rouge">convolve</code>.</p>
<p>The result is admirably terse:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="k">def</span> <span class="nf">multiply_digit_sequences</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
<span class="k">return</span> <span class="n">np</span><span class="p">.</span><span class="n">convolve</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span>
<span class="n">multiply_digit_sequences</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">])</span>
<span class="n">array</span><span class="p">([</span><span class="mi">3</span><span class="p">,</span> <span class="mi">6</span><span class="p">])</span>
</code></pre></div></div>
<p>The benefit of this approach is that the user can hook into all of the rich functionality of the types accepted by the <code class="language-plaintext highlighter-rouge">convolve</code> function,
like iteration and built-in functions.</p>
<p>But if we’re implementing our multiplication algorithm as part of a broader project that uses sequences of decimal digits, then this leaves a lot to be desired.</p>
<p>First, there’s nothing to stop a user from providing any input that’s valid for <code class="language-plaintext highlighter-rouge">np.convolve</code> as input to this function:
for example, a vector of floating-point numbers, which is difficult to interpret as a sequence of digits.
Second, we don’t have a good way to test this small nugget of code, which is always necessary in a larger, multi-user project, and adding tests would pile on heft that erases the main advantage this approach has: brevity.
Third, our function isn’t particularly user-friendly or extensible:
we rely on the user to decide how an array matches onto a decimal sequence.
For example,
one person using the above might presume sequences are stored
in the order they are read and written,
as in</p>
\[12345 \rightarrow \left[1, 2, 3, 4, 5\right]\]
<p>while another, thinking in terms of list indices,
might presume sequences are in order of increasing power of the base,
as in</p>
\[12345 \rightarrow \left[5, 4, 3, 2, 1\right]\]
<p>The function will work for each user separately,
but if they build any additional functions and try to share them,
they are liable to run into unexpected errors.</p>
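<p>To make the ambiguity concrete (a toy sketch; these helper names are ours, not the post's), here are the two interpretations applied to the same list:</p>

```python
def to_int_reading_order(digits, base=10):
    # Interpret digits as written on paper: [1, 2, 3] -> 123.
    n = 0
    for d in digits:
        n = n * base + d
    return n

def to_int_power_order(digits, base=10):
    # Interpret digits by increasing power of the base: [1, 2, 3] -> 321.
    return sum(d * base**i for i, d in enumerate(digits))

print(to_int_reading_order([1, 2, 3, 4, 5]))  # 12345
print(to_int_power_order([1, 2, 3, 4, 5]))    # 54321
```

<p>The same array denotes two different numbers, depending on which convention the user has in mind.</p>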
<h3 id="class-consciousness"><code class="language-plaintext highlighter-rouge">class</code> Consciousness</h3>
<p>We can solve all of these problems by building a <code class="language-plaintext highlighter-rouge">class</code> for our sequences:
a collection of related functions and data.
Inside this class,
we can validate inputs separately from our multiplication operation,
add functionality to make testing easy for ourselves and others,
and express our assumptions about what it means to be a sequence of digits.</p>
<p>We begin with the simplest version of this class.
Every class needs a method called <code class="language-plaintext highlighter-rouge">__init__</code>, short for <code class="language-plaintext highlighter-rouge">initialize</code>.
This method gets called when an instance of the class is created.
The presence of two underscores <code class="language-plaintext highlighter-rouge">__</code> (pronounced <em>dunder</em> by some) at the beginning and the
end of this method’s name indicates that it is a <em>magic</em> method.
While other Python functions and methods need to be explicitly called, as in <code class="language-plaintext highlighter-rouge">f(argument, other_argument)</code>, magic methods get invoked by special syntax.</p>
<p>In the case of the <code class="language-plaintext highlighter-rouge">__init__</code> for <code class="language-plaintext highlighter-rouge">ClassName</code>,
that special syntax is <code class="language-plaintext highlighter-rouge">ClassName(argument, other_argument)</code>
(don’t forget that Python methods have an “invisible” first argument,
typically called <code class="language-plaintext highlighter-rouge">self</code>,
that refers to the object whose method is being called).</p>
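<p>A minimal illustration with a toy class (our own example, not from the post):</p>

```python
class Point:
    def __init__(self, x, y):
        # Never called explicitly: the syntax Point(1, 2) invokes it for us,
        # with the freshly created object passed in as `self`.
        self.x, self.y = x, y

p = Point(1, 2)   # magic syntax; no visible call to __init__
print(p.x, p.y)   # 1 2
```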
<p>The code block below implements an <code class="language-plaintext highlighter-rouge">__init__</code> method
and multiplication for a <code class="language-plaintext highlighter-rouge">DecimalSequence</code> type.
In our <code class="language-plaintext highlighter-rouge">__init__</code> method, we allow the user to provide any iterable,
or object over which we can iterate (e.g., a list),
but then convert it to a numpy array of integers.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="k">class</span> <span class="nc">DecimalSequence</span><span class="p">():</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">iterable</span><span class="p">):</span>
<span class="n">arr</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">atleast_1d</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">squeeze</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">iterable</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">int</span><span class="p">)))</span>
<span class="bp">self</span><span class="p">.</span><span class="n">arr</span> <span class="o">=</span> <span class="n">arr</span>
<span class="k">def</span> <span class="nf">multiply</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">other</span><span class="p">):</span>
<span class="k">if</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">other</span><span class="p">,</span> <span class="n">DecimalSequence</span><span class="p">):</span>
<span class="k">raise</span> <span class="nb">TypeError</span><span class="p">(</span><span class="s">"can't multiply DecimalSequence by object of type: {}"</span>
<span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="nb">type</span><span class="p">(</span><span class="n">other</span><span class="p">)))</span>
<span class="k">return</span> <span class="n">DecimalSequence</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">convolve</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">arr</span><span class="p">,</span> <span class="n">other</span><span class="p">.</span><span class="n">arr</span><span class="p">))</span>
</code></pre></div></div>
<p>We implement our multiplication algorithm with a method called <code class="language-plaintext highlighter-rouge">multiply</code>,
which takes an input, verifies it is another <code class="language-plaintext highlighter-rouge">DecimalSequence</code>,
and then applies <code class="language-plaintext highlighter-rouge">convolve</code> to (the arrays of) both sequences.</p>
<p>We haven’t yet added any of the nice features described above.
Before we do that, let’s try out our <code class="language-plaintext highlighter-rouge">DecimalSequence</code> type.</p>
<p>The multiplication works well,
if you know that the data is stored in the <code class="language-plaintext highlighter-rouge">.arr</code> attribute:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">DecimalSequence</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">]).</span><span class="n">multiply</span><span class="p">(</span><span class="n">DecimalSequence</span><span class="p">([</span><span class="mi">3</span><span class="p">])).</span><span class="n">arr</span>
<span class="n">array</span><span class="p">([</span><span class="mi">3</span><span class="p">,</span> <span class="mi">6</span><span class="p">])</span>
</code></pre></div></div>
<p>but if you don’t,
trying to look at your answer gives something unusable:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="n">DecimalSequence</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">]).</span><span class="n">multiply</span><span class="p">(</span><span class="n">DecimalSequence</span><span class="p">([</span><span class="mi">3</span><span class="p">])))</span>
<span class="n">DecimalSequence</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">]).</span><span class="n">multiply</span><span class="p">(</span><span class="n">DecimalSequence</span><span class="p">([</span><span class="mi">3</span><span class="p">]))</span>
<span class="o"><</span><span class="n">__main__</span><span class="p">.</span><span class="n">DecimalSequence</span> <span class="nb">object</span> <span class="n">at</span> <span class="mh">0x7f81c0671780</span><span class="o">></span>
<span class="o"><</span><span class="n">__main__</span><span class="p">.</span><span class="n">DecimalSequence</span> <span class="n">at</span> <span class="mh">0x7f81c0671748</span><span class="o">></span>
</code></pre></div></div>
<p>Because we made this class ourselves, Python has no idea how to display it.</p>
<p>Furthermore, it’s usually expected that a number type works with the relevant operators,
like <code class="language-plaintext highlighter-rouge">+</code>, <code class="language-plaintext highlighter-rouge">*</code>, etc.
If someone tries that with one of our <code class="language-plaintext highlighter-rouge">DecimalSequence</code>s, they get an error:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">DecimalSequence</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">])</span> <span class="o">*</span> <span class="n">DecimalSequence</span><span class="p">([</span><span class="mi">3</span><span class="p">])</span>
<span class="nb">TypeError</span><span class="p">:</span> <span class="n">unsupported</span> <span class="n">operand</span> <span class="nb">type</span><span class="p">(</span><span class="n">s</span><span class="p">)</span> <span class="k">for</span> <span class="o">*</span><span class="p">:</span> <span class="s">'DecimalSequence'</span> <span class="ow">and</span> <span class="s">'DecimalSequence'</span>
</code></pre></div></div>
<p>And there are lots of built-in operations one might want to use that we’ve lost access to by switching from numpy arrays to <code class="language-plaintext highlighter-rouge">DecimalSequence</code>s:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">4</span> <span class="ow">in</span> <span class="n">DecimalSequence</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">])</span>
<span class="nb">TypeError</span><span class="p">:</span> <span class="n">argument</span> <span class="n">of</span> <span class="nb">type</span> <span class="s">'DecimalSequence'</span> <span class="ow">is</span> <span class="ow">not</span> <span class="n">iterable</span>
</code></pre></div></div>
<h3 id="do-you-believe-in-magic">Do You Believe in Magic?</h3>
<p>The solution to all of these problems is magic!</p>
<p>Lots of Python syntax, like mathematical operators and iteration,
can be extended by means of magic.
All you need is to know the magic word (not “please”, unfortunately).</p>
<p>For multiplication, the magic word is <code class="language-plaintext highlighter-rouge">__mul__</code>.
For printing and sending to the standard out (<code class="language-plaintext highlighter-rouge">Out[ii]:</code>), the magic words are
<code class="language-plaintext highlighter-rouge">__str__</code> and <code class="language-plaintext highlighter-rouge">__repr__</code>, respectively.
For a fully-fledged iterable, we need three magic words:
<code class="language-plaintext highlighter-rouge">__iter__</code> which gets called by things like <code class="language-plaintext highlighter-rouge">in</code> that try to loop, or <code class="language-plaintext highlighter-rouge">iter</code>ate, over our object,
<code class="language-plaintext highlighter-rouge">__len__</code>, which gets called by the <code class="language-plaintext highlighter-rouge">len</code> built-in function, and
<code class="language-plaintext highlighter-rouge">__getitem__</code>, which we can use to index and slice into our object.</p>
<p>The code block below implements these magics by “stealing” them from the array in <code class="language-plaintext highlighter-rouge">.arr</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">DecimalSequence</span><span class="p">():</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">iterable</span><span class="p">):</span>
<span class="n">arr</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">atleast_1d</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">squeeze</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">iterable</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">int</span><span class="p">)))</span>
<span class="bp">self</span><span class="p">.</span><span class="n">arr</span> <span class="o">=</span> <span class="n">arr</span>
<span class="bp">self</span><span class="p">.</span><span class="n">base</span> <span class="o">=</span> <span class="mi">10</span>
<span class="k">def</span> <span class="nf">__iter__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="s">"""magic called when we iterate over `self`, e.g. in a `for` loop"""</span>
<span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">arr</span><span class="p">.</span><span class="n">__iter__</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">__len__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="s">"""magic called by the `len(self)`"""</span>
<span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">arr</span><span class="p">.</span><span class="n">__len__</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">__getitem__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">index</span><span class="p">):</span>
<span class="s">"""magic called by the expression `self[index]` and slicing"""</span>
<span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">arr</span><span class="p">.</span><span class="n">__getitem__</span><span class="p">(</span><span class="n">index</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="s">"""magic called when self is sent to stdout"""</span>
<span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">arr</span><span class="p">.</span><span class="n">__repr__</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="s">"""magic called by `print(self)`"""</span>
<span class="k">return</span> <span class="s">" + "</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="nb">reversed</span><span class="p">(</span>
<span class="p">[</span><span class="s">"{}*{}**{}"</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">val</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">base</span><span class="p">,</span> <span class="n">k</span><span class="p">)</span>
<span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">val</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">reversed</span><span class="p">(</span><span class="bp">self</span><span class="p">))]))</span>
<span class="k">def</span> <span class="nf">__mul__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">other</span><span class="p">):</span>
<span class="s">"""magic called by the expression `self * other`"""</span>
<span class="k">if</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">other</span><span class="p">,</span> <span class="n">DecimalSequence</span><span class="p">):</span>
<span class="k">raise</span> <span class="nb">TypeError</span><span class="p">(</span><span class="s">"can't multiply DecimalSequence by object of type: {}"</span>
<span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="nb">type</span><span class="p">(</span><span class="n">other</span><span class="p">)))</span>
<span class="k">return</span> <span class="n">DecimalSequence</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">convolve</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">other</span><span class="p">))</span>
</code></pre></div></div>
<p>A few things to notice:</p>
<ul>
<li>The magics <code class="language-plaintext highlighter-rouge">__str__</code> and <code class="language-plaintext highlighter-rouge">__repr__</code> seem redundant,
since both are string representations of our object.
<code class="language-plaintext highlighter-rouge">__str__</code> is typically for communicating to users,
while <code class="language-plaintext highlighter-rouge">__repr__</code> is for debugging and communicating to programs.</li>
<li>The <code class="language-plaintext highlighter-rouge">__str__</code> method makes use of the built-in <code class="language-plaintext highlighter-rouge">reversed</code> applied to our object.
Implementing the iterable magics gave us access to this built-in for free!
Magic begets magic.</li>
<li>Our <code class="language-plaintext highlighter-rouge">__mul__</code> method also makes use of iteration:
it just passes <code class="language-plaintext highlighter-rouge">self</code> and <code class="language-plaintext highlighter-rouge">other</code>,
rather than <code class="language-plaintext highlighter-rouge">self.arr</code> and <code class="language-plaintext highlighter-rouge">other.arr</code>,
to <code class="language-plaintext highlighter-rouge">convolve</code>.
This works because <code class="language-plaintext highlighter-rouge">convolve</code> knows what to do with an iterable,
where it wouldn’t have known what to do with our first draft <code class="language-plaintext highlighter-rouge">DecimalSequence</code>.</li>
</ul>
<p>Now, the simple operations defined above work!</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="n">DecimalSequence</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">])</span> <span class="o">*</span> <span class="n">DecimalSequence</span><span class="p">([</span><span class="mi">3</span><span class="p">]))</span>
<span class="mi">4</span> <span class="ow">in</span> <span class="n">DecimalSequence</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">])</span>
<span class="mi">3</span><span class="o">*</span><span class="mi">10</span><span class="o">**</span><span class="mi">1</span> <span class="o">+</span> <span class="mi">6</span><span class="o">*</span><span class="mi">10</span><span class="o">**</span><span class="mi">0</span>
<span class="bp">False</span>
</code></pre></div></div>
<h3 id="trust-but-verify">Trust, but Verify</h3>
<p>Our <code class="language-plaintext highlighter-rouge">DecimalSequence</code>s are now much more usable,
but they are still missing documentation and verification.</p>
<p>For example, if a user provides an invalid input, like <code class="language-plaintext highlighter-rouge">[-1, 1]</code>,
the multiplication proceeds without complaint:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">DecimalSequence</span><span class="p">([</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">])</span> <span class="o">*</span> <span class="n">DecimalSequence</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">])</span>
<span class="n">array</span><span class="p">([</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">])</span>
</code></pre></div></div>
<p>While this is in some sense the right answer,
the result is not a valid <code class="language-plaintext highlighter-rouge">DecimalSequence</code>:
a mix of positive and negative digits cannot be written down in the usual positional notation for numbers.</p>
<p>We can rectify this by adding some validation to the beginning of our class.
In the code block below,
we define a method <code class="language-plaintext highlighter-rouge">check_iterable</code> that verifies that the iterable matches our assumptions:</p>
<ul>
<li>it is full of integers, in that it can be cast to an integer data type without changing value</li>
<li>it is one-dimensional. Technically, <code class="language-plaintext highlighter-rouge">convolve</code> would raise an error anyway, but it’s better to raise errors in a context where they can be clearly explained: <code class="language-plaintext highlighter-rouge">convolve</code> doesn’t know anything about why this error is occurring, but we do, and we can communicate that to the user.</li>
<li>its elements are either all non-negative or all non-positive.</li>
</ul>
<p>None of these assumptions require any information that’s specific to this <code class="language-plaintext highlighter-rouge">DecimalSequence</code>.
Therefore, we can write this method as what’s called a <code class="language-plaintext highlighter-rouge">staticmethod</code>:
it is called without the “hidden” <code class="language-plaintext highlighter-rouge">self</code> argument.
This is achieved by writing <code class="language-plaintext highlighter-rouge">@staticmethod</code> above the definition of the method.
<code class="language-plaintext highlighter-rouge">@staticmethod</code> is called a <em>decorator</em>,
because it is added, like decoration,
on top of an existing function, method, or class,
in order to extend it.</p>
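<p>A minimal sketch of <code class="language-plaintext highlighter-rouge">@staticmethod</code> on a toy class (our own example, not from the post):</p>

```python
class Circle:
    @staticmethod
    def area(radius):
        # No "hidden" `self` argument: this needs nothing from an instance.
        return 3.141592653589793 * radius ** 2

# A staticmethod can be called on the class itself or on an instance:
print(Circle.area(2.0))
print(Circle().area(2.0))
```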
<p>While we’re at it, let’s add a doc-string to our class.
A doc-string is a string describing an object or function
that is intended to be displayed to users.
They can be viewed in <code class="language-plaintext highlighter-rouge">IPython</code>/<code class="language-plaintext highlighter-rouge">Jupyter</code> with the <code class="language-plaintext highlighter-rouge">?</code> and <code class="language-plaintext highlighter-rouge">??</code> syntax.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">DecimalSequence</span><span class="p">():</span>
<span class="s">"""A sequence of decimal digits representing an integer.
Digits are in the order they would be written: 123 -> [1, 2, 3].
A digit sequence ${x_i}$ of length $k$ in base b is mapped to an integer by
$\sum_i x_i b^{k-i}$
"""</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">iterable</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">check_iterable</span><span class="p">(</span><span class="n">iterable</span><span class="p">)</span>
<span class="n">arr</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">atleast_1d</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">squeeze</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">iterable</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">int</span><span class="p">)))</span>
<span class="bp">self</span><span class="p">.</span><span class="n">arr</span> <span class="o">=</span> <span class="n">arr</span>
<span class="bp">self</span><span class="p">.</span><span class="n">base</span> <span class="o">=</span> <span class="mi">10</span>
<span class="o">@</span><span class="nb">staticmethod</span>
<span class="k">def</span> <span class="nf">check_iterable</span><span class="p">(</span><span class="n">iterable</span><span class="p">):</span>
<span class="n">error_msgs</span> <span class="o">=</span> <span class="p">[</span><span class="s">"(castable to) integers"</span><span class="p">,</span>
<span class="s">"one-dimensional"</span><span class="p">,</span>
<span class="s">"all negative or positive"</span><span class="p">]</span>
<span class="n">arr</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">iterable</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">int</span><span class="p">)</span>
<span class="n">error_checkers</span> <span class="o">=</span> <span class="p">[</span>
<span class="k">lambda</span> <span class="n">iterable</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">array_equal</span><span class="p">(</span><span class="n">arr</span><span class="p">,</span> <span class="n">iterable</span><span class="p">),</span>
<span class="k">lambda</span> <span class="n">iterable</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">arr</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span><span class="p">,</span>
<span class="k">lambda</span> <span class="n">iterable</span><span class="p">:</span> <span class="nb">all</span><span class="p">([</span><span class="n">elem</span> <span class="o">>=</span> <span class="mi">0</span> <span class="k">for</span> <span class="n">elem</span> <span class="ow">in</span> <span class="n">iterable</span><span class="p">])</span> <span class="ow">or</span>
<span class="nb">all</span><span class="p">([</span><span class="n">elem</span> <span class="o"><=</span> <span class="mi">0</span> <span class="k">for</span> <span class="n">elem</span> <span class="ow">in</span> <span class="n">iterable</span><span class="p">])]</span>
<span class="k">for</span> <span class="n">error_checker</span><span class="p">,</span> <span class="n">error_msg</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">error_checkers</span><span class="p">,</span> <span class="n">error_msgs</span><span class="p">):</span>
<span class="k">assert</span> <span class="n">error_checker</span><span class="p">(</span><span class="n">iterable</span><span class="p">),</span> <span class="p">\</span>
<span class="s">"DecimalSequences must be {}"</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">error_msg</span><span class="p">)</span>
<span class="p">...</span>
<span class="k">def</span> <span class="nf">__mul__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">other</span><span class="p">):</span>
<span class="s">"""magic called by the expression `self * other`"""</span>
<span class="k">if</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">other</span><span class="p">,</span> <span class="n">DecimalSequence</span><span class="p">):</span>
<span class="k">raise</span> <span class="nb">TypeError</span><span class="p">(</span><span class="s">"can't multiply DecimalSequence by object of type: {}"</span>
<span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="nb">type</span><span class="p">(</span><span class="n">other</span><span class="p">)))</span>
<span class="k">return</span> <span class="n">DecimalSequence</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">convolve</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">other</span><span class="p">))</span>
</code></pre></div></div>
<p>One small implementation note:
notice that the collection of checks and their error messages is defined using lists,
rather than a series of <code class="language-plaintext highlighter-rouge">try</code>/<code class="language-plaintext highlighter-rouge">except</code> blocks or <code class="language-plaintext highlighter-rouge">if</code> statements.
This style is easier to extend later (just add entries to the lists),
and it cuts down on the space the code takes up without harming readability.</p>
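This style can be sketched in isolation; the checkers and messages below are simplified stand-ins for the ones in the class, not the exact ones used above:

```python
# Simplified stand-ins for the class's checks: each checker is paired
# positionally with the message reported when it fails.
error_checkers = [
    lambda seq: all(isinstance(elem, int) for elem in seq),
    lambda seq: all(elem >= 0 for elem in seq) or all(elem <= 0 for elem in seq),
]
error_msgs = ["sequences of integers", "all negative or positive"]

def validate(seq):
    # extending validation later means appending to the two lists above
    for checker, msg in zip(error_checkers, error_msgs):
        assert checker(seq), "DecimalSequences must be {}".format(msg)

validate([1, 2, 3])  # passes silently
validate([-1, -2])   # also fine: all entries negative
```

Adding a third rule is now a two-line change, with no new branching logic.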
<p>Now, when a user, intentionally or not, tries some shenanigans with their inputs,
we can catch the problem and give them a warning:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">DecimalSequence</span><span class="p">([</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">])</span> <span class="o">*</span> <span class="n">DecimalSequence</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">])</span>
<span class="nb">AssertionError</span><span class="p">:</span> <span class="n">DecimalSequences</span> <span class="n">must</span> <span class="n">be</span> <span class="nb">all</span> <span class="n">negative</span> <span class="ow">or</span> <span class="n">positive</span>
</code></pre></div></div>
<h3 id="to-int-and-back-again">To <code class="language-plaintext highlighter-rouge">int</code> and Back Again</h3>
<p>We have one problem remaining:
the result of multiplying two valid <code class="language-plaintext highlighter-rouge">DecimalSequence</code>s is still
not guaranteed to be a valid <code class="language-plaintext highlighter-rouge">DecimalSequence</code>,
thanks to a problem noted in
the last blog post in this series:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">DecimalSequence</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span> <span class="o">*</span> <span class="n">DecimalSequence</span><span class="p">([</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">])</span>
<span class="n">array</span><span class="p">([</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">13</span><span class="p">,</span> <span class="mi">22</span><span class="p">,</span> <span class="mi">15</span><span class="p">])</span>
</code></pre></div></div>
<p>If we naively wrote those digits down side by side, we would get 4132215, which does not mean the same thing as \(123 \times 45 = 5535\).</p>
<p>The problem is that when we turn a sequence of numbers into a single number,
many sequences get mapped to the same number:
\(\left[ 4, 13, 22, 15 \right]\) and \(\left[5, 5, 3, 5\right]\), for example.</p>
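We can check that claim directly from the place-value formula in the docstring: both sequences weight their entries by powers of ten and sum to the same integer.

```python
# [4, 13, 22, 15] and [5, 5, 3, 5] both denote 5535
raw = 4 * 10**3 + 13 * 10**2 + 22 * 10**1 + 15 * 10**0
valid = 5 * 10**3 + 5 * 10**2 + 3 * 10**1 + 5 * 10**0
print(raw, valid)  # 5535 5535
```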
<p>That is, there are many <em>equivalent</em> representations
of a number as a sequence of smaller numbers,
and the valid <code class="language-plaintext highlighter-rouge">DecimalSequence</code> representation is just one.</p>
<p>Now, we could write a method that goes through a sequence and converts it,
by hand, to its valid form.
This is essentially done by “carrying”, as in “carry the one”.
However, getting this exactly correct seems tricky to me –
how are negative numbers to be handled, for example?</p>
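For a sense of what that by-hand approach involves, here is a rough sketch of carrying for sequences of non-negative digits only; negative digits are exactly the case it punts on:

```python
def carry(digits, base=10):
    """Propagate carries right-to-left. A sketch for non-negative
    digits only; negative entries would need extra handling."""
    out, c = [], 0
    for d in reversed(digits):
        total = d + c
        out.append(total % base)  # keep the digit in range
        c = total // base         # pass the overflow leftward
    while c:                      # flush any carry still in hand
        out.append(c % base)
        c //= base
    return list(reversed(out))

carry([4, 13, 22, 15])  # [5, 5, 3, 5]
```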
<p>Instead, we’re going to make use of the fact that we can map back and forth between
integers and sequences to get this “simplification” step for free!</p>
<p>We start by defining two methods:
one, <code class="language-plaintext highlighter-rouge">from_int</code>, to generate a <code class="language-plaintext highlighter-rouge">DecimalSequence</code> from an integer
and another, <code class="language-plaintext highlighter-rouge">to_int</code>, to generate an integer from a <code class="language-plaintext highlighter-rouge">DecimalSequence</code>.</p>
<p><code class="language-plaintext highlighter-rouge">from_int</code> can be used to construct new <code class="language-plaintext highlighter-rouge">DecimalSequence</code>s,
so it doesn’t make sense to have it as a typical method, attached to a specific sequence.
Luckily, there’s a decorator, <code class="language-plaintext highlighter-rouge">@classmethod</code>, designed for exactly this purpose.
It replaces the “hidden” <code class="language-plaintext highlighter-rouge">self</code> argument with a “hidden” <code class="language-plaintext highlighter-rouge">cls</code> (pronounced “class”) argument.
This argument refers to the class (here <code class="language-plaintext highlighter-rouge">DecimalSequence</code>), rather than a specific member.</p>
<p>That way, a user can write <code class="language-plaintext highlighter-rouge">DecimalSequence.from_int(integer)</code> and get a new <code class="language-plaintext highlighter-rouge">DecimalSequence</code> made from that integer.</p>
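In miniature, the pattern looks like this; the toy `Point` class is just for illustration:

```python
class Point:
    """A toy class demonstrating @classmethod as an alternate constructor."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    @classmethod
    def from_pair(cls, pair):
        # cls refers to the class itself, not to any particular instance
        return cls(pair[0], pair[1])

p = Point.from_pair((3, 4))
print(p.x, p.y)  # 3 4
```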
<p><code class="language-plaintext highlighter-rouge">to_int</code> is a more standard method, so no need for decorators there.
We just need to write a Python version of the definition in our doc-string.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">DecimalSequence</span><span class="p">():</span>
<span class="s">"""A sequence of decimal digits representing an integer.
Digits are in the order they would be written: 123 -> [1, 2, 3].
A digit sequence ${x_i}$ of length $k$ in base b is mapped to an integer by
$\sum_i x_i b^{k-i}$
"""</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">iterable</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">check_iterable</span><span class="p">(</span><span class="n">iterable</span><span class="p">)</span>
<span class="n">arr</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">atleast_1d</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">squeeze</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">iterable</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">int</span><span class="p">)))</span>
<span class="bp">self</span><span class="p">.</span><span class="n">arr</span> <span class="o">=</span> <span class="n">arr</span>
<span class="bp">self</span><span class="p">.</span><span class="n">base</span> <span class="o">=</span> <span class="mi">10</span>
<span class="o">@</span><span class="nb">classmethod</span>
<span class="k">def</span> <span class="nf">from_int</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="n">intgr</span><span class="p">,</span> <span class="n">base</span><span class="o">=</span><span class="mi">10</span><span class="p">):</span>
<span class="n">iterable</span> <span class="o">=</span> <span class="n">cls</span><span class="p">.</span><span class="n">int_to_iterable</span><span class="p">(</span><span class="n">intgr</span><span class="p">,</span> <span class="n">base</span><span class="o">=</span><span class="n">base</span><span class="p">)</span>
<span class="k">return</span> <span class="n">cls</span><span class="p">(</span><span class="n">iterable</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">to_int</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="nb">int</span><span class="p">(</span><span class="nb">sum</span><span class="p">([</span><span class="n">elem</span> <span class="o">*</span> <span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">base</span> <span class="o">**</span> <span class="n">k</span><span class="p">)</span> <span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">elem</span>
<span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">reversed</span><span class="p">(</span><span class="bp">self</span><span class="p">))]))</span>
<span class="o">@</span><span class="nb">staticmethod</span>
<span class="k">def</span> <span class="nf">int_to_iterable</span><span class="p">(</span><span class="n">intgr</span><span class="p">,</span> <span class="n">base</span><span class="o">=</span><span class="mi">10</span><span class="p">):</span>
<span class="k">assert</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">intgr</span><span class="p">,</span> <span class="nb">int</span><span class="p">),</span> <span class="s">"first argument must be integer"</span>
<span class="n">iterable</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">while</span> <span class="n">intgr</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
<span class="n">val</span> <span class="o">=</span> <span class="n">intgr</span> <span class="o">%</span> <span class="n">base</span>
<span class="n">iterable</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">val</span><span class="p">)</span>
<span class="n">intgr</span> <span class="o">=</span> <span class="n">intgr</span> <span class="o">//</span> <span class="n">base</span>
<span class="k">return</span> <span class="nb">list</span><span class="p">(</span><span class="nb">reversed</span><span class="p">(</span><span class="n">iterable</span><span class="p">))</span>
<span class="p">...</span>
<span class="k">def</span> <span class="nf">__mul__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">other</span><span class="p">):</span>
<span class="s">"""magic called by the expression `self * other`"""</span>
<span class="k">if</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">other</span><span class="p">,</span> <span class="n">DecimalSequence</span><span class="p">):</span>
<span class="k">raise</span> <span class="nb">TypeError</span><span class="p">(</span><span class="s">"can't multiply DecimalSequence by object of type: {}"</span>
<span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="nb">type</span><span class="p">(</span><span class="n">other</span><span class="p">)))</span>
<span class="k">return</span> <span class="n">DecimalSequence</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">convolve</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">other</span><span class="p">))</span>
</code></pre></div></div>
<p>As you can see, <code class="language-plaintext highlighter-rouge">from_int</code> is a bit more complicated than <code class="language-plaintext highlighter-rouge">to_int</code>.
For that reason, we split out the meatier part,
converting the integer into an iterable suitable for constructing a <code class="language-plaintext highlighter-rouge">DecimalSequence</code>,
into its own function, <code class="language-plaintext highlighter-rouge">int_to_iterable</code>,
and then leave the construction step to our <code class="language-plaintext highlighter-rouge">__init__</code> method.</p>
<p>And how exactly does this method work?
We walk from the rightmost digit of the integer to the leftmost
by using the remainder, or modulo, operation, <code class="language-plaintext highlighter-rouge">%</code>.
This “indexes” the last digit of the integer, and we <code class="language-plaintext highlighter-rouge">append</code> it to our iterable.
Then, we use floor division (<code class="language-plaintext highlighter-rouge">//</code>) to strip off that last digit.
The resulting list is in reverse order, so we flip it around once we’re done.</p>
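Traced on a small input, the loop looks like this (a standalone re-implementation of the non-negative case, for illustration):

```python
def digits_of(n, base=10):
    # mirrors int_to_iterable for non-negative n
    out = []
    while n > 0:
        out.append(n % base)  # % "indexes" the last digit
        n //= base            # // strips that digit off
    return list(reversed(out))

# 123: grab 3, leaving 12; grab 2, leaving 1; grab 1, leaving 0
digits_of(123)  # [1, 2, 3]
```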
<p>Now, we can convert between <code class="language-plaintext highlighter-rouge">DecimalSequence</code>s and integers:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="n">DecimalSequence</span><span class="p">.</span><span class="n">from_int</span><span class="p">(</span><span class="mi">123</span><span class="p">))</span>
<span class="mi">1</span><span class="o">*</span><span class="mi">10</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="mi">2</span><span class="o">*</span><span class="mi">10</span><span class="o">**</span><span class="mi">1</span> <span class="o">+</span> <span class="mi">3</span><span class="o">*</span><span class="mi">10</span><span class="o">**</span><span class="mi">0</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">DecimalSequence</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]).</span><span class="n">to_int</span><span class="p">()</span>
<span class="mi">123</span>
</code></pre></div></div>
<p>This suggests a natural test:
sending an integer to its sequence representation and back shouldn’t change it:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">intgr</span> <span class="o">=</span> <span class="mi">123</span>
<span class="k">assert</span> <span class="n">DecimalSequence</span><span class="p">.</span><span class="n">from_int</span><span class="p">(</span><span class="n">intgr</span><span class="p">).</span><span class="n">to_int</span><span class="p">()</span> <span class="o">==</span> <span class="n">intgr</span>
</code></pre></div></div>
<h3 id="reduceing-complexity-with-algebra"><code class="language-plaintext highlighter-rouge">reduce</code>ing Complexity with Algebra</h3>
<p>We are now ready to write a <code class="language-plaintext highlighter-rouge">reduce</code> function that
simplifies our <code class="language-plaintext highlighter-rouge">DecimalSequence</code>s and guarantees they are always valid.</p>
<p>By defining <code class="language-plaintext highlighter-rouge">int_to_iterable</code> as we did above,
we guaranteed that any <code class="language-plaintext highlighter-rouge">DecimalSequence</code> built from an integer will be valid.
So to coerce any sequence to be valid,
we just need to convert it to an integer and then convert the resulting integer back into a <code class="language-plaintext highlighter-rouge">DecimalSequence</code>!</p>
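Chaining the two conversions gives the simplification step in a few lines; the helpers below are standalone stand-ins for the methods above, shown here to make the composition concrete:

```python
def to_int(digits, base=10):
    # weight each digit by its place value, per the docstring's formula
    return sum(d * base ** k for k, d in enumerate(reversed(digits)))

def from_int(n, base=10):
    # peel digits off the right end with % and //
    out = []
    while n > 0:
        out.append(n % base)
        n //= base
    return list(reversed(out))

from_int(to_int([4, 13, 22, 15]))  # [5, 5, 3, 5], the valid form, for free
```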
<p>Imagine lining up all sequences of integers and all integers in two rows,
then drawing an arrow from a sequence <code class="language-plaintext highlighter-rouge">s</code> to an integer <code class="language-plaintext highlighter-rouge">x</code>
if <code class="language-plaintext highlighter-rouge">x == to_int(s)</code>.
Many arrows from different <code class="language-plaintext highlighter-rouge">s</code>s will converge onto the same <code class="language-plaintext highlighter-rouge">x</code>.
Then imagine drawing an arrow from every <code class="language-plaintext highlighter-rouge">x</code> to the valid sequence <code class="language-plaintext highlighter-rouge">s</code>
it is mapped to by <code class="language-plaintext highlighter-rouge">from_int</code>.</p>
<p>The diagram below shows what this might look like.
You’re encouraged to draw your own version to check your understanding!
Notice that multiple red arrows
(corresponding to the action of <code class="language-plaintext highlighter-rouge">to_int</code>)
converge on a single integer,
but that at most one blue arrow
(corresponding to the action of <code class="language-plaintext highlighter-rouge">from_int</code>)
touches each sequence.
Furthermore, only valid sequences are touched by an arrow.</p>
<p style="text-align: center"><img src="https://charlesfrye.github.io/img/to_int_from_int.jpg" alt="to_int_from_int" /></p>
<p>A function whose arrows, as drawn above,
touch all of the objects on the other side is called a <em>surjection</em>.
If multiple arrows land on the same object,
the function is called <em>many-to-one</em>.</p>
<p>A function whose arrows, as drawn above,
start from all of the objects on one side
and never land on the same object on the other side
is called an <em>injection</em>.</p>
<p>This motif, of an (often many-to-one) surjection
followed by an injection,
is extremely common in abstract algebra,
where it appears in a variety of “decomposition theorems”:
e.g. the
<a href="https://en.wikipedia.org/wiki/Fundamental_theorem_of_linear_algebra">Fundamental Theorem of Linear Algebra</a>
and the canonical decompositions of functions and of group homomorphisms.</p>
<p>The key insight of these decomposition theorems is that even very complicated objects,
like the process that converts an invalid <code class="language-plaintext highlighter-rouge">DecimalSequence</code> to a valid one,
can often be decomposed into a few simpler objects,
which we can understand separately and then chain together.</p>
<p>For more on how abstract algebra can provide insight into programming problems,
check out my blog post on
<a href="http://charlesfrye.github.io/math/2018/08/21/functors-film-strips.html">functors and film strips</a>
or, if you really want to take the plunge,
<a href="https://bartoszmilewski.com/2014/10/28/category-theory-for-programmers-the-preface/">Category Theory for Programmers</a>.</p>
<p>In the code block below,
we add a <code class="language-plaintext highlighter-rouge">reduce</code> method to our <code class="language-plaintext highlighter-rouge">DecimalSequence</code> class
that makes use of our “canonical decomposition” of the reduction operation.
We place it in the <code class="language-plaintext highlighter-rouge">__init__</code> method so that all <code class="language-plaintext highlighter-rouge">DecimalSequence</code>s are always in reduced form.</p>
<p>The only piece that’s not as described above is the first few lines of <code class="language-plaintext highlighter-rouge">reduce</code>,
which handle arrays that are already in reduced form
and arrays that start with a bunch of leading <code class="language-plaintext highlighter-rouge">0</code>s.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">DecimalSequence</span><span class="p">():</span>
<span class="s">"""A sequence of decimal digits representing an integer.
Digits are in the order they would be written: 123 -> [1, 2, 3].
A digit sequence ${x_i}$ of length $k$ in base b is mapped to an integer by
$\sum_i x_i b^{k-i}$
"""</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">iterable</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">check_iterable</span><span class="p">(</span><span class="n">iterable</span><span class="p">)</span>
<span class="n">arr</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">atleast_1d</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">squeeze</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">iterable</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">int</span><span class="p">)))</span>
<span class="bp">self</span><span class="p">.</span><span class="n">arr</span> <span class="o">=</span> <span class="n">arr</span>
<span class="bp">self</span><span class="p">.</span><span class="n">base</span> <span class="o">=</span> <span class="mi">10</span>
<span class="bp">self</span><span class="p">.</span><span class="n">arr</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="nb">reduce</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">reduce</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">if</span> <span class="nb">all</span><span class="p">([</span><span class="nb">len</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">elem</span><span class="p">))</span> <span class="o">==</span> <span class="mi">1</span> <span class="k">for</span> <span class="n">elem</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">]):</span>
<span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">></span> <span class="mi">1</span> <span class="ow">and</span> <span class="bp">self</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">arr</span> <span class="o">=</span> <span class="bp">self</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
<span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">arr</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span> <span class="n">DecimalSequence</span><span class="p">.</span><span class="n">from_int</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">to_int</span><span class="p">()).</span><span class="n">arr</span>
<span class="p">...</span>
<span class="k">def</span> <span class="nf">__mul__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">other</span><span class="p">):</span>
<span class="s">"""magic called by the expression `self * other`"""</span>
<span class="k">if</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">other</span><span class="p">,</span> <span class="n">DecimalSequence</span><span class="p">):</span>
<span class="k">raise</span> <span class="nb">TypeError</span><span class="p">(</span><span class="s">"can't multiply DecimalSequence by object of type: {}"</span>
<span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="nb">type</span><span class="p">(</span><span class="n">other</span><span class="p">)))</span>
<span class="k">return</span> <span class="n">DecimalSequence</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">convolve</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">other</span><span class="p">))</span>
</code></pre></div></div>
<p>Now when we multiply two <code class="language-plaintext highlighter-rouge">DecimalSequence</code>s together,
the result is in the valid form!</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">DecimalSequence</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span> <span class="o">*</span> <span class="n">DecimalSequence</span><span class="p">([</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">])</span>
<span class="n">array</span><span class="p">([</span><span class="mi">5</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">5</span><span class="p">])</span>
</code></pre></div></div>
<h3 id="fixing-a-bug-easily">Fixing a Bug, Easily</h3>
<p>We are almost done with a nice, usable, extensible <code class="language-plaintext highlighter-rouge">DecimalSequence</code> class.</p>
<p>But there’s a problem!
One of our methods has a bug:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">DecimalSequence</span><span class="p">.</span><span class="n">from_int</span><span class="p">(</span><span class="o">-</span><span class="mi">11</span><span class="p">)</span>
<span class="n">array</span><span class="p">([],</span> <span class="n">dtype</span><span class="o">=</span><span class="n">int64</span><span class="p">)</span>
</code></pre></div></div>
<p>Our implementation of <code class="language-plaintext highlighter-rouge">int_to_iterable</code> was wrong!</p>
<p>When coming up with the algorithm,
we neglected to consider how it would work on
negative integers.</p>
<p>Luckily, we can fix this issue easily.
We wrote an algorithm that works for non-negative integers.
A negative integer is just a positive integer multiplied by a minus sign,
and a negative <code class="language-plaintext highlighter-rouge">DecimalSequence</code> is just a positive <code class="language-plaintext highlighter-rouge">DecimalSequence</code>
with all of its entries multiplied by a minus sign.</p>
<p>Therefore, we can just add a short piece at the beginning of our <code class="language-plaintext highlighter-rouge">int_to_iterable</code>
method that checks whether the input is negative
and, if so, converts it to a positive integer (multiplies it by <code class="language-plaintext highlighter-rouge">-1</code>).
Then, once our algorithm is done,
we can easily convert the result back to a negative <code class="language-plaintext highlighter-rouge">DecimalSequence</code>
by multiplying each of its elements by <code class="language-plaintext highlighter-rouge">-1</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">DecimalSequence</span><span class="p">():</span>
<span class="s">"""A sequence of decimal digits representing an integer.
Digits are in the order they would be written: 123 -> [1, 2, 3].
A digit sequence ${x_i}$ of length $k$ in base b is mapped to an integer by
$\sum_i x_i b^{k-i}$
"""</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">iterable</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">check_iterable</span><span class="p">(</span><span class="n">iterable</span><span class="p">)</span>
<span class="n">arr</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">atleast_1d</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">squeeze</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">iterable</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">int</span><span class="p">)))</span>
<span class="bp">self</span><span class="p">.</span><span class="n">arr</span> <span class="o">=</span> <span class="n">arr</span>
<span class="bp">self</span><span class="p">.</span><span class="n">base</span> <span class="o">=</span> <span class="mi">10</span>
<span class="bp">self</span><span class="p">.</span><span class="n">arr</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="nb">reduce</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">reduce</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">if</span> <span class="nb">all</span><span class="p">([</span><span class="bp">self</span><span class="p">.</span><span class="n">check_element</span><span class="p">(</span><span class="n">elem</span><span class="p">)</span> <span class="k">for</span> <span class="n">elem</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">]):</span>
<span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">></span> <span class="mi">1</span> <span class="ow">and</span> <span class="bp">self</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">arr</span> <span class="o">=</span> <span class="bp">self</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
<span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">arr</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span> <span class="n">DecimalSequence</span><span class="p">.</span><span class="n">from_int</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">to_int</span><span class="p">()).</span><span class="n">arr</span>
<span class="k">def</span> <span class="nf">check_element</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">elem</span><span class="p">):</span>
<span class="k">return</span> <span class="o">-</span><span class="bp">self</span><span class="p">.</span><span class="n">base</span> <span class="o"><</span> <span class="n">elem</span> <span class="o"><</span> <span class="bp">self</span><span class="p">.</span><span class="n">base</span>
<span class="p">...</span>
<span class="o">@</span><span class="nb">staticmethod</span>
<span class="k">def</span> <span class="nf">int_to_iterable</span><span class="p">(</span><span class="n">intgr</span><span class="p">,</span> <span class="n">base</span><span class="o">=</span><span class="mi">10</span><span class="p">):</span>
<span class="k">assert</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">intgr</span><span class="p">,</span> <span class="nb">int</span><span class="p">),</span> <span class="s">"first argument must be integer"</span>
<span class="n">sign</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">if</span> <span class="n">intgr</span> <span class="o"><</span> <span class="mi">0</span><span class="p">:</span>
<span class="n">sign</span> <span class="o">*=</span> <span class="o">-</span><span class="mi">1</span>
<span class="n">intgr</span> <span class="o">*=</span> <span class="o">-</span><span class="mi">1</span>
<span class="n">iterable</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">while</span> <span class="n">intgr</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
<span class="n">val</span> <span class="o">=</span> <span class="n">intgr</span> <span class="o">%</span> <span class="n">base</span>
<span class="n">iterable</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">val</span><span class="p">)</span>
<span class="n">intgr</span> <span class="o">=</span> <span class="n">intgr</span> <span class="o">//</span> <span class="n">base</span>
<span class="n">iterable</span> <span class="o">=</span> <span class="p">[</span><span class="n">sign</span> <span class="o">*</span> <span class="n">elem</span> <span class="k">for</span> <span class="n">elem</span> <span class="ow">in</span> <span class="n">iterable</span><span class="p">]</span>
<span class="k">return</span> <span class="nb">list</span><span class="p">(</span><span class="nb">reversed</span><span class="p">(</span><span class="n">iterable</span><span class="p">))</span>
<span class="p">...</span>
<span class="k">def</span> <span class="nf">__mul__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">other</span><span class="p">):</span>
<span class="s">"""magic called by the expression `self * other`"""</span>
<span class="k">if</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">other</span><span class="p">,</span> <span class="n">DecimalSequence</span><span class="p">):</span>
<span class="k">raise</span> <span class="nb">TypeError</span><span class="p">(</span><span class="s">"can't multiply DecimalSequence by object of type: {}"</span>
<span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="nb">type</span><span class="p">(</span><span class="n">other</span><span class="p">)))</span>
<span class="k">return</span> <span class="n">DecimalSequence</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">convolve</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">other</span><span class="p">))</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">DecimalSequence</span><span class="p">.</span><span class="n">from_int</span><span class="p">(</span><span class="o">-</span><span class="mi">11</span><span class="p">)</span>
<span class="n">array</span><span class="p">([</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">])</span>
</code></pre></div></div>
<p>This is a very similar principle to the one we used to solve the <code class="language-plaintext highlighter-rouge">reduce</code> operation!</p>
<p>We had a working <code class="language-plaintext highlighter-rouge">int_to_iterable</code> method on a certain set of inputs,
and we wanted to extend it to cover more.
Instead of coming up with a more complicated method
to solve the <code class="language-plaintext highlighter-rouge">int_to_iterable</code> problem
on this expanded set of inputs,
we made a map from the inputs on which our method didn’t work
to the inputs on which it did,
and then a map from the outputs of the working method
to the desired outputs for the full method.
Again, we broke a complicated mapping down into simpler maps,
then combined them.</p>
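<p>In code, the pattern looks roughly like the following minimal standalone sketch
(the function names <code class="language-plaintext highlighter-rouge">to_int</code>, <code class="language-plaintext highlighter-rouge">from_int</code>, and <code class="language-plaintext highlighter-rouge">reduce_digits</code> are mine,
not the exact <code class="language-plaintext highlighter-rouge">DecimalSequence</code> methods):</p>

```python
# Round-trip reduction: instead of reducing an arbitrary digit array
# directly, map it to an integer (where the representation is canonical)
# and back to digits. This handles entries that overflow the base
# as well as negative entries.

def to_int(digits, base=10):
    """Collapse a digit sequence into the integer it represents."""
    value = 0
    for d in digits:
        value = value * base + d
    return value

def from_int(value, base=10):
    """Produce the canonical digit sequence of an integer."""
    sign = -1 if value < 0 else 1
    value *= sign
    digits = []
    while value > 0:
        digits.append(value % base)
        value //= base
    return [sign * d for d in reversed(digits)] or [0]

def reduce_digits(digits, base=10):
    """The complicated map, built by composing the two simpler ones."""
    return from_int(to_int(digits, base), base)

print(reduce_digits([4, 13, 22, 15]))  # [5, 5, 3, 5], i.e. 5535
```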
<p>I include this bugfix partly for pedagogical reasons,
as another example where thinking algebraically
and compositionally pays dividends,
but also because it actually happened while I was working out the problem myself!</p>
<p>A minor implementational note:
this bugfix also required a change to the part of the <code class="language-plaintext highlighter-rouge">reduce</code> operation
that handles checking whether an array is already reduced,
since that was also not designed to handle negative integers.</p>
<h3 id="testing">Testing</h3>
<p>Finally, we should write a test for our code.</p>
<p>Now that we have a map from integers to <code class="language-plaintext highlighter-rouge">DecimalSequence</code>s,
we can check that mapping an integer to a sequence,
then multiplying,
and then mapping back to an integer
gives the same result as just multiplying the integers.
Algebraically, this is essentially a test that our map to <code class="language-plaintext highlighter-rouge">DecimalSequence</code> is a
<a href="https://en.wikipedia.org/wiki/Homomorphism">homomorphism</a>
with respect to multiplication.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">random</span>
<span class="k">def</span> <span class="nf">assert_mul_correct</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
<span class="n">seq_a</span><span class="p">,</span> <span class="n">seq_b</span> <span class="o">=</span> <span class="n">DecimalSequence</span><span class="p">.</span><span class="n">from_int</span><span class="p">(</span><span class="n">a</span><span class="p">),</span> <span class="n">DecimalSequence</span><span class="p">.</span><span class="n">from_int</span><span class="p">(</span><span class="n">b</span><span class="p">)</span>
<span class="k">assert</span> <span class="p">(</span><span class="n">seq_a</span> <span class="o">*</span> <span class="n">seq_b</span><span class="p">).</span><span class="n">to_int</span><span class="p">()</span> <span class="o">==</span> <span class="n">a</span> <span class="o">*</span> <span class="n">b</span>
<span class="n">ints</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="o">-</span><span class="mi">10000</span><span class="p">,</span> <span class="mi">10000</span><span class="p">))</span>
<span class="n">test_size</span> <span class="o">=</span> <span class="mi">100</span>
<span class="p">[</span><span class="n">assert_mul_correct</span><span class="p">(</span><span class="n">random</span><span class="p">.</span><span class="n">choice</span><span class="p">(</span><span class="n">ints</span><span class="p">),</span> <span class="n">random</span><span class="p">.</span><span class="n">choice</span><span class="p">(</span><span class="n">ints</span><span class="p">))</span>
<span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">test_size</span><span class="p">)];</span>
</code></pre></div></div>
<h3 id="conclusion">Conclusion</h3>
<p>There’s much more to do to make a really excellent <code class="language-plaintext highlighter-rouge">DecimalSequence</code> type.
For one, you can hook into <code class="language-plaintext highlighter-rouge">+</code>, <code class="language-plaintext highlighter-rouge">-</code>, and <code class="language-plaintext highlighter-rouge">/</code> with <code class="language-plaintext highlighter-rouge">__add__</code>, <code class="language-plaintext highlighter-rouge">__sub__</code>, and <code class="language-plaintext highlighter-rouge">__truediv__</code>.
For another, it’s sensible to multiply <code class="language-plaintext highlighter-rouge">DecimalSequence</code>s with <code class="language-plaintext highlighter-rouge">int</code>s,
which requires extending the <code class="language-plaintext highlighter-rouge">__mul__</code> method to do some type-checking and casting
(and don’t forget about <code class="language-plaintext highlighter-rouge">__rmul__</code>!).
Lastly, a <code class="language-plaintext highlighter-rouge">DecimalSequence</code> is just a specific instantiation of a generic
<code class="language-plaintext highlighter-rouge">DigitSequence</code> type, which would allow for different choices of (positive-valued) <code class="language-plaintext highlighter-rouge">base</code>.
These would all make fun projects to test your understanding of magic methods
and digit sequences in Python!</p>
Fri, 22 Feb 2019 00:00:00 +0000
http://charlesfrye.github.io/programming/2019/02/22/multiplication-convoluted-part-two.html
http://charlesfrye.github.io/programming/2019/02/22/multiplication-convoluted-part-two.htmlprogrammingMultiplication Made Convoluted, Part I: Math<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Well, actually, this is more right than you think: <br />A multiplication *is* a convolution of one multi-digit number by another one over the digit dimension.<br />Think about it.</p>— Yann LeCun (@ylecun) <a href="https://twitter.com/ylecun/status/1053719869005447168?ref_src=twsrc%5Etfw">October 20, 2018</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<!--exc-->
<h3 id="introduction">Introduction</h3>
<p>Convolutions show up in many places:
in signal processing,
in probability,
and of course in the marriage of the two,
machine learning.
Check out
<a href="http://charlesfrye.github.io/external/2016/03/27/convolutions.html">this convolution tutorial</a>
for more details.
They are just as intimately related to such deep and powerful mathematics as
<a href="http://charlesfrye.github.io/stats/2017/11/22/gaussian-diff-eq.html">the Central Limit Theorem</a>
and
<a href="https://www.khanacademy.org/math/differential-equations/laplace-transform/convolution-integral/v/the-convolution-and-the-laplace-transform">the Fourier transform</a>
as they are to
<a href="https://colah.github.io/posts/2014-12-Groups-Convolution/">understanding what happens when you shuffle cards</a>.</p>
<p>The above tweets,
by satirical convolution fanatic
<a href="https://twitter.com/boredyannlecun">@boredyannlecun</a>
and actual convolution fanatic
<a href="https://twitter.com/ylecun">the actual Yann LeCun</a>,
reveal an unexpected connection between convolutions
and the humble multiplication operation.
I decided to work it out thoroughly and write it up.</p>
<p>In this blog post,
we’ll work together through the <em>Think about it.</em> phase
by first thinking about how we normally do multiplication,
then generalizing it, and then deriving the convolutional form.</p>
<p>In a
<a href="http://charlesfrye.github.io/programming/2019/02/22/multiplication-convoluted-part-two.html">follow-up blog post</a>,
we’ll work through implementing a number type in Python
that actually makes use of this relationship to do multiplication.
Along the way, we’ll learn how to hook into Python’s built-in operators
with our own objects using what are called, almost without hyperbole,
<em>magic methods</em>,
and we’ll see some benefits of thinking about our programs
using ideas from abstract algebra.</p>
<h3 id="the-multiplication-algorithm">The Multiplication Algorithm</h3>
<p>Briefly, let’s review the multiplication algorithm
we learned in school, taking care to emphasize some points
that will be salient later.</p>
<p>Imagine we wish to multiply two multi-digit numbers together,
e.g. \(123 \times 45\):</p>
<div style="text-align: center">\[\begin{array}{ccccc}
& & 1 & 2 & 3 \\
\times & & & 4 & 5 \\
\hline
& & & & \text{?} \\
\end{array}\]
</div>
<p>We proceed by multiplying 3 by 5, obtaining 15,
then writing a 5 in “the ones place” and “carrying the 1”
over to the next column, where we add it to the result of 2 times 5,
and so on.
When we have finished, we have multiplied the number on top, 123,
by the first number on the bottom, 5, and obtained 615.</p>
<div style="text-align: center">\[\begin{array}{ccccc}
& & 1 & 2 & 3 \\
\times & & & 4 & 5 \\
\hline
& & 6 & 1 & 5 \\
& & & & ? \\
\end{array}\]
</div>
<p>Then we begin again:
we start writing the second row, but this time we first put a 0 in the ones place.
Why do we do this?
Once again, we wish to multiply the number on top by the number on the bottom,
but now “the number on the bottom” isn’t just 4,
it’s <em>40</em>.
Multiplying by 40 is the same as multiplying by 10 and then multiplying by 4,
and writing the 0 is our way of doing that first step.</p>
<div style="text-align: center">\[\begin{array}{ccccc}
& & 1 & 2 & 3 \\
\times & & & 4 & 5 \\
\hline
& & 6 & 1 & 5 \\
+ & 4 & 9 & 2 & 0 \\
\hline
& 5 & 5 & 3 & 5
\end{array}\]
</div>
<p>To get the final result, we add our intermediate results, columnwise.</p>
<p>With the benefit of several more years of math education,
we can write this algorithm compactly and concretely
using sums.</p>
<p>First we note that it makes use of the fact that we can think of a single number,
say \(v\), as the sum of a sequence of smaller numbers, or digits,
multiplied by powers of 10:</p>
<div style="text-align: center">\[\begin{align}
v &= \sum_k {\mathbf{v}_k \cdot 10 ^ k}
\end{align}\]
</div>
<p>where an italic letter like \(v\) always means a number, while a bold-faced letter like
\(\mathbf{v}\) means the <em>sequence of digits</em> that we use to represent the number.
To refer to a digit in that number, we use a subscript, as in \(\mathbf{v}_k\).
Though this way of thinking feels so natural to us as to be unquestionable,
the idea of representing numbers this way had to be invented.
Indeed, the Romans had to use a
<a href="https://www.phy6.org/outreach/edu/roman.htm">much more complicated algorithm</a>
to multiply their numerals.</p>
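<p>This decomposition is easy to check in code; here is a quick sketch
(the helper name is mine, and the digit sequence is stored most-significant digit first,
so the digit at position \(i\) contributes at power \(n - 1 - i\)):</p>

```python
def digits_to_number(v_digits, base=10):
    """Reassemble v = sum_k v_k * base**k from its digit sequence.

    The sequence is most-significant digit first, so position i
    contributes at power (len - 1 - i).
    """
    n = len(v_digits)
    return sum(d * base ** (n - 1 - i) for i, d in enumerate(v_digits))

print(digits_to_number([1, 2, 3]))  # 123
print(digits_to_number([4, 5]))     # 45
```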
<p>This notation in hand, we can write our multiplication algorithm
for pairs of multi-digit numbers as:</p>
<div style="text-align: center">\[\begin{align}
z &= x \cdot y \\
&= \sum_j {x \cdot \mathbf{y}_j \cdot 10 ^ j}
\end{align}\]
</div>
<p>Notice that there are still multiplications inside the sum,
but they are now between a multi-digit number and a single-digit number
and between a multi-digit number and a power of ten,
which we have separate algorithms for.</p>
<h3 id="rethinking-the-multiplication-algorithm">Rethinking the Multiplication Algorithm</h3>
<p>In order to obtain our “convolution-style”
multiplication algorithm,
we need to reorganize our expression in terms of a different set of sums
and multiplications that result in the same final value.</p>
<p>First, we recognize that, for the sum over \(j\), the value of \(x\) is fixed,
so we can pull it out of the sum.
We then rewrite \(x\) the same way we rewrote \(y\),
i.e. as a sum over its digits:</p>
<div style="text-align: center">\[\begin{align}
z &= x \cdot y \\
&= \sum_j x \cdot \mathbf{y}_j \cdot 10 ^ j \\
&= x\cdot \sum_j \mathbf{y}_j \cdot 10 ^ j \\
&= \sum_i \mathbf{x}_i \cdot 10 ^ i \cdot \sum_j \mathbf{y}_j \cdot 10 ^ j
\end{align}\]
</div>
<p>Let’s write these sums out for a pair of short numbers:
e.g. 123 and 45 as (100 + 20 + 3)(40 + 5).
It should be clear that instead of adding first and then multiplying,
we can just as well multiply everything first, then add
(FOIL style, if that acronym was used in your math education).
And if we do so, we’ll have to multiply each value \(\mathbf{x}_i \cdot 10^i\)
by each value \(\mathbf{y}_j \cdot 10 ^ j\).</p>
<p>That means we’re free to write down any order of addition we want,
just so long as, once all of those additions are done, we’ve managed to include
all of the combinations \(\mathbf{x}_i\) and \(\mathbf{y}_j\).
Each order will correspond to a different choice of multiplication algorithm,
and the one we described first, corresponding to the one we were taught in school,
is just a particular, convenient choice for order of additions.</p>
<h3 id="viva-la-convolucion">Viva la Convolucion</h3>
<p>To get “multiplication as convolution in the digit domain”,
we choose the following ordering
(and simplify it with algebra in the second step):</p>
<div style="text-align: center">\[\begin{align}
z &= \sum_i \mathbf{x}_i \cdot 10 ^ i \cdot \sum_j \mathbf{y}_j \cdot 10 ^ j\\
&= \sum_k \sum_{i+j=k} \mathbf{x}_i \cdot 10^i \cdot \mathbf{y}_j \cdot 10^j \\
&= \sum_k \sum_{i+j=k} \mathbf{x}_i \mathbf{y}_j \cdot 10^k
\end{align}\]
</div>
<p>That is, we first split our pairs of \(i\) and \(j\) values according to
what \(i+j\) equals (and we call that value \(k\)),
then we add up all of the products for each pair whose indices add to \(k\),
and then finally we add up across all choices of \(k\).
Mathematically, this is expressed by the notation \(i+j=k\) at the bottom of the sum,
which means “over values of \(i\) and \(j\) such that they add together to equal \(k\)”.</p>
<p>If the connection to convolutions isn’t clear yet,
first take a look at the final line from above.
We’ve equated \(z\) with a sum over an index of something
times powers of 10 to the index.
That’s the same as our definition of the entries of \(\mathbf{z}\)!</p>
<p>Indeed, we can write</p>
<div style="text-align: center">\[\begin{align}
\mathbf{z}_k &= \sum_{i+j=k} \mathbf{x}_i \mathbf{y}_j
\end{align}\]
</div>
<p>which is one way of expressing
“\(\mathbf{z}\) is the convolution of \(\mathbf{x}\) and \(\mathbf{y}\)”,
thought of as vectors.
This particular notation is non-standard, but
<a href="https://colah.github.io/posts/2014-07-Understanding-Convolutions/">more intuitive for me</a>.
We can obtain something a bit more standard if we substitute \(j = k-i\):</p>
<div style="text-align: center">\[\begin{align}
\mathbf{z}_k &= \sum_{i} \mathbf{x}_i \mathbf{y}_{k-i}
\end{align}\]
</div>
<p>which should look familiar!</p>
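<p>We can check this digit-domain convolution directly in Python.
Below is a small sketch (function names are mine); digit sequences are stored
most-significant digit first, which works because the formula is symmetric
under reversing both inputs and the output:</p>

```python
def convolve(x, y):
    """z_k = sum over i+j=k of x_i * y_j, for digit sequences as lists."""
    z = [0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            z[i + j] += xi * yj
    return z

def seq_to_int(z, base=10):
    """Interpret a (possibly unreduced) digit sequence as a number."""
    value = 0
    for d in z:
        value = value * base + d
    return value

z = convolve([1, 2, 3], [4, 5])
print(z)              # [4, 13, 22, 15] -- unreduced "digits"
print(seq_to_int(z))  # 5535 == 123 * 45
```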
<h3 id="convolution-in-the-digit-domain">Convolution in the Digit Domain</h3>
<p>Let’s look at how this algorithm pans out in our example.</p>
<p>The convolution operation on a pair \(a,b\) is sometimes described as
“reverse \(b\), then slide it along \(a\),
multiplying the values that align and adding up the results”.</p>
<p>Below, I’ve drawn out this process for the “convolutional multiplication”
of 123 and 45.
A single line separates the two values being multiplied
from the running total of the result.
At each step, we multiply any numbers that line up
and add the results.
Double lines separate out iterations of the process:
when we cross a double line, 45 is “slid along” 123 by one increment.</p>
<div style="text-align: center">\[\begin{array}{ccccc}
& 1 & 2 & 3 & \\
& & & 5 & 4 \\
\hline
& & & 15 & \\
\hline\hline
& 1 & 2 & 3 & \\
& & 5 & 4 &\\
\hline
& & 22 & 15 & \\
\hline\hline
& 1 & 2 & 3 & \\
& 5 & 4 & &\\
\hline
& 13 & 22 & 15 & \\
\hline\hline
& 1 & 2 & 3 & \\
5 & 4 & & &\\
\hline
4 & 13 & 22 & 15 & \\
\end{array}\]
</div>
<p>Notice that there’s one tiny snag:
if \(\sum_{i+j=k} \mathbf{x}_i \cdot \mathbf{y}_j\) is 10 or greater for a given \(k\),
then \(\mathbf{z}_k\) won’t be a “digit”, as we normally think of them
(we write 5535 for the answer to \(123 \times 45\), not 4,13,22,15).
In fact, for something like \(x=5\), \(y=3\),
we end up with \(\mathbf{z}_0=15\),
rather than \(\mathbf{z}_0=5, \mathbf{z}_1=1\), as we’d like.
In
<a href="http://charlesfrye.github.io/programming/2019/02/22/multiplication-convoluted-part-two.html">the follow-up to this blog post</a>,
where we implement a \(\texttt{DecimalSequence}\) type in Python
that uses convolution to do multiplication,
we’ll see a simple way to fix this problem
by decomposing the necessary “simplification” operation
into a pair of maps to and from the integers.</p>
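<p>For nonnegative digit sequences, the carrying step can also be written directly.
Here is one sketch of such a carry pass (my own helper, not the round-trip fix
the follow-up post actually uses):</p>

```python
def carry(z, base=10):
    """Propagate carries so every entry lands in [0, base).

    Assumes nonnegative entries; the follow-up post handles the general
    (negative-entry) case by mapping through the integers instead.
    """
    z = list(z)
    for k in range(len(z) - 1, 0, -1):  # sweep right to left
        c, z[k] = divmod(z[k], base)
        z[k - 1] += c                   # pass the carry to the next column
    while z[0] >= base:                 # the leading entry may still overflow
        c, z[0] = divmod(z[0], base)
        z.insert(0, c)
    return z

print(carry([4, 13, 22, 15]))  # [5, 5, 3, 5]
print(carry([15]))             # [1, 5]
```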
Wed, 20 Feb 2019 00:00:00 +0000
http://charlesfrye.github.io/math/2019/02/20/multiplication-convoluted-part-one.html
http://charlesfrye.github.io/math/2019/02/20/multiplication-convoluted-part-one.htmlmathFréchet Derivatives 4: The Determinant<div style="text-align: center">\[\begin{align}
\nabla \det M &= \det M \cdot \left(M^{-1}\right)^\top
\end{align}\]
</div>
<!--exc-->
<h3 id="introduction">Introduction</h3>
<p>In this series of blog posts,
we’ve seen that writing derivatives in the Fréchet style
makes the computation of derivatives of functions that have
matrix- or vector-valued inputs and scalar outputs
substantially easier.
<a href="http://charlesfrye.github.io/math/2018/03/06/frechet-derivative-introduction.html">The first blog post in this series</a>
defined the Fréchet derivative and applied it to some very simple functions.
<a href="http://charlesfrye.github.io/math/2018/03/07/frechet-least-squares.html">The second blog post</a>
applied it to the derivation of the normal equations
and gradient updates for linear least squares.
<a href="http://charlesfrye.github.io/math/2019/01/18/frechet-linear-network.html">The third blog post</a>
applied the Fréchet derivative to deep linear networks,
a form of neural network with no nonlinearity.</p>
<p>That was intended to be the final blog post,
but then I came across
<a href="https://terrytao.wordpress.com/2013/01/13/matrix-identities-as-derivatives-of-determinant-identities/">this post by Terry Tao</a>
on the interesting relationships between derivatives of determinants and famous matrix identities.</p>
<p>That post is excellent, and I highly recommend you read it.
I will focus on one of the claims, which appears in the pull-out for this post:
the derivative of the determinant is the determinant times the transpose of the inverse.</p>
<h3 id="the-determinant">The Determinant</h3>
<p>The determinant measures, for real-valued square matrices,
the effect that the matrix has on volumes.
If the determinant is \(2\), then the matrix doubles volumes.
That is, if a region has volume \(x = 10\) before applying the matrix,
then it will have volume \(2x = 20\) after applying it.
Check out
<a href="https://www.youtube.com/watch?v=Ip3X9LOh2dk">this video by master explainer 3blue1brown</a>
for a nice intuitive look at the determinant.
It’s an important function that appears all over the place in linear algebra.</p>
<p>Most important for us, though, is that it takes in a matrix \(M\)
and returns a scalar.
That means we can write a relation for the derivative, \(\nabla \det\), easily and concretely in the Fréchet style:</p>
<div style="text-align: center">\[\det(M + \varepsilon) =
\det M + \mathrm{tr}\left(\nabla \det (M)^\top \varepsilon\right) + o(\varepsilon)\]
</div>
<p>See
<a href="http://charlesfrye.github.io/math/2018/02/28/how-big-is-a-matrix.html">this previous post</a>
for a bit on why the \(\mathrm{tr}(X^\top Y)\) is showing up
and the previous posts in this series for context and examples on why
we write the derivative this way.</p>
<p>For the mathier folks, I’d like to additionally note that the determinant is a
<a href="https://en.wikipedia.org/wiki/Multilinear_map">multilinear functional</a>,
that is, it is a linear functional on a tensor product space,
where the tensor product is over vectors (viewed as rows or columns of the matrix).
This is a fancy way of saying that the determinant changes linearly when a column or row is changed linearly.
Given how smoothly the Fréchet derivative handles linear and polynomial functionals,
we might hope that it smoothly handles the determinant.
And our faith is duly rewarded.</p>
<h3 id="the-derivative-of-the-determinant">The Derivative of the Determinant</h3>
<p>We begin by taking the expression on the left side and trying to find a way to expand it
so that terms that look like the right side begin to appear.</p>
<p>We don’t have a ton of options, but a sufficiently clever individual might try the following:</p>
<div style="text-align: center">\[\begin{align}
\det(M + \varepsilon) &= \det\left(M\left(I + M^{-1}\varepsilon\right)\right)\\
&= \det(M) \cdot \det\left(I + M^{-1}\varepsilon\right)
\end{align}\]
</div>
<p>First, we “pulled the \(M\) out”, incurring an \(M^{-1}\) for our trouble.
Then, we recognized that the determinant of a product of matrices
is the product of the matrices’ determinants.
Consider: if the matrix \(A\) scales volumes by \(2\), and the matrix \(B\) scales them by \(5\),
then the matrix \(AB\), which first applies the transformation \(B\), then \(A\), must scale volumes by \(10\).</p>
<p>Now, we’ve traded the original question,
which was “what does the determinant function look like at a small perturbation around the matrix \(M\)?”,
for an easier question:
“what does the determinant function look like at a small perturbation around the identity matrix \(I\)?”</p>
<p>Let’s write this perturbed identity matrix out, ignoring the \(M^{-1}\) for now:</p>
<div style="text-align: center">\[I + \varepsilon =
\begin{bmatrix}
1 + \varepsilon & \varepsilon & \dots & \varepsilon \\
\varepsilon & 1 + \varepsilon & \dots & \varepsilon \\
\vdots & \vdots & \ddots & \vdots \\
\varepsilon & \varepsilon & \dots & 1 + \varepsilon
\end{bmatrix}\]
</div>
<p>What does a matrix like this do to volumes?
It’s <em>almost</em> a matrix with only diagonal elements,
since every value off of the diagonal is \(\varepsilon\).</p>
<p>A matrix with only diagonal elements is sometimes called a <em>scaling matrix</em>.
Along each dimension, it simply scales by the diagonal element:
dimension 1 is scaled by the element in row 1 column 1,
dimension 2 by the element in row 2 column 2,
and so on.
Because each dimension is scaled independently,
it’s easy to say what happens to the volume:
it’s scaled by the product of all of the scalings.</p>
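<p>This volume fact is easy to confirm numerically; a tiny check with NumPy
(the example values are mine):</p>

```python
import numpy as np

# A diagonal (scaling) matrix scales volume by the product of its
# diagonal entries, which is exactly its determinant.
D = np.diag([2.0, 3.0, 0.5])
assert np.isclose(np.linalg.det(D), 2.0 * 3.0 * 0.5)
```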
<p>Therefore the determinant of the matrix \(I +\varepsilon\)
is equal to the \(n\)-fold product of \(1 + \varepsilon\),
plus any effects due to the off-diagonal terms,
which we will presume to be \(o(\varepsilon)\) (they are).
We write this fact down and then expand the \(n\)-fold product,
immediately noting that all of the terms after the first two
are \(o(\varepsilon)\):</p>
<div style="text-align: center">\[\begin{align}
\det \left(I + \varepsilon\right) &= (1 + \varepsilon)^n + o(\varepsilon)\\
&= 1 + n\varepsilon + o(\varepsilon) + o(\varepsilon) \\
&= 1 + n\varepsilon + o(\varepsilon)
\end{align}\]
</div>
<p>where the last line took advantage of the fact that two small things
(our \(o\) terms) added together are still a small thing.</p>
<p>How do we generalize this to the case where our matrix has different values on the diagonal?
In that case, our \(n\)-fold product is a product of \(n\) terms that look like</p>
<div style="text-align: center">\[(1 + \alpha\varepsilon)(1 + \beta\varepsilon) \dots\]
</div>
<p>and so there will be \(n\) terms that are larger than \(o(\varepsilon)\),
one for each element of the diagonal,
each scaled by the corresponding element of the diagonal.</p>
<p>So the total \(\varepsilon\) term is <em>the sum of the diagonal values of the perturbation</em>.
Or, succinctly, it is the trace, \(\mathrm{tr}\), of the perturbation.
This insight lets us write</p>
<div style="text-align: center">\[\begin{align}
\det \left(I + M^{-1}\varepsilon\right)
&= 1 + \mathrm{tr}\left(M^{-1}\varepsilon\right) + o(\varepsilon)
\end{align}\]
</div>
<p>This is very exciting!
The Fréchet strategy for finding expressions for derivatives of functions on matrices
is all about massaging until we get a trace-of-something-times-\(\varepsilon\) term.
We’re in the home stretch!</p>
<p>First, let’s remind ourselves of where we were,
then slap in our expression above:</p>
<div style="text-align: center">\[\begin{align}
\det(M + \varepsilon)
&= \det(M) \cdot \det\left(I + M^{-1}\varepsilon\right) \\
&= \det(M) \cdot \left(1 + \mathrm{tr}\left(M^{-1}\varepsilon\right) + o(\varepsilon)\right)
\end{align}\]
</div>
<p>We now distribute that \(\det(M)\) term across the remainder of the terms.
The first one is simple, it’s a \(1\).
In the second, we pull the determinant inside the trace,
making use of the fact that the trace is a linear functional (see for yourself!).
In the third, we recall that any concretely-sized thing times a small thing is a small thing
(\(\lambda o(\varepsilon) = o(\varepsilon)\))
to simply disappear the determinant, leaving us with:</p>
<div style="text-align: center">\[\begin{align}
\det(M + \varepsilon)
&= \det(M) + \mathrm{tr}\left(\det(M) M^{-1}\varepsilon\right) + o(\varepsilon)
\end{align}\]
</div>
<p>Now, we’re ready to pattern-match onto the definition of the Fréchet derivative:</p>
<div style="text-align: center">\[\det(M + \varepsilon) =
\det M + \mathrm{tr}\left(\nabla \det (M)^\top \varepsilon\right) + o(\varepsilon)\]
</div>
<p>On the left hand sides, we have the function evaluated at a point perturbed away
from \(M\) by \(\varepsilon\).
On the right hand sides, we have three terms.
The first term is equal to the function, evaluated at the point \(M\).
The third term is \(o(\varepsilon)\).
The second term is the trace of a matrix times \(\varepsilon\).
Therefore, this second term contains the derivative.
We just need to transpose the matrix to get the final answer:</p>
<div style="text-align: center">\[\begin{align}
\nabla \det M &= \det M \cdot \left(M^{-1}\right)^\top
\end{align}\]
</div>
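<p>Before trusting the algebra, we can sanity-check the identity numerically with a
central-difference approximation (a quick NumPy sketch; the random matrix,
step size, and tolerance are my own choices):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))

# the claimed gradient: det(M) times the inverse-transpose of M
grad = np.linalg.det(M) * np.linalg.inv(M).T

# central-difference estimate, perturbing one entry of M at a time
eps = 1e-6
fd = np.zeros_like(M)
for i in range(4):
    for j in range(4):
        E = np.zeros_like(M)
        E[i, j] = eps
        fd[i, j] = (np.linalg.det(M + E) - np.linalg.det(M - E)) / (2 * eps)

assert np.allclose(fd, grad, atol=1e-4)
```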
Fri, 25 Jan 2019 00:00:00 +0000
http://charlesfrye.github.io/math/2019/01/25/frechet-determinant.html
http://charlesfrye.github.io/math/2019/01/25/frechet-determinant.htmlmathFréchet Derivatives 3: Deep Linear Networks<div style="text-align: center">\[\begin{align}
\nabla_{W_k} l(W_1, \dots, W_L) = W_{k+1:}^\top \nabla L(W) W_{:k}^\top
\end{align}\]
</div>
<!--exc-->
<h3 id="introduction">Introduction</h3>
<p>The Fréchet derivative makes the computation
of certain matrix- and vector-valued derivatives by hand
substantially less painful.
<a href="http://charlesfrye.github.io/math/2018/03/06/frechet-derivative-introduction.html">The first blog post in this series</a>
defined the Fréchet derivative and applied it to some very simple functions,
while
<a href="http://charlesfrye.github.io/math/2018/03/07/frechet-least-squares.html">the second blog post</a>
applied it to the derivation of the normal equations
and gradient updates for linear least squares.</p>
<p>In this final blog post,
we will apply the Fréchet derivative to
computing the gradients for a somewhat peculiar version of linear regression:
the <em>deep linear network</em>.</p>
<h3 id="neural-networks-without-all-that-pesky-nonlinear-stuff">Neural Networks Without All That Pesky Nonlinear Stuff</h3>
<p>Neural networks are currently pushing forward the state of the art
in machine learning and artificial intelligence.
If you’re unfamiliar with them, check out
<a href="http://charlesfrye.github.io/external/2018/10/31/neural-nets-colab.html">this blogpost</a>
for a gentle introduction.</p>
<p>The most common form of neural network, a feed-forward neural network,
is a parameterized function whose parameters are matrices
and which looks like</p>
<div style="text-align: center">\[f\left(x; W_1, W_2, \dots, W_L\right) = \sigma_L \left( W_L \dots \sigma_2\left(W_2 \sigma_1\left(W_1 x\right)\right)\right)\]
</div>
<p>where each \(\sigma_i\) is some nonlinear function.
In English, we take the input \(x\) and multiply it by a series of matrices,
but in between we apply nonlinear transformations.</p>
<p>Neural networks can mimic
<a href="http://neuralnetworksanddeeplearning.com/chap4.html">just about any function you’d like</a>,
provided you carefully choose the \(W_i\)s.
This genericness is part of what makes them so powerful, but it also makes them hard to understand.</p>
<p>If we’re interested in more deeply understanding things like how neural networks learn and why they
<a href="https://ml.berkeley.edu/blog/2018/01/10/adversarial-examples/">behave strangely sometimes</a>,
we need a model that’s like a neural network in certain important ways, but which we can analyze.</p>
<p>Enter the <em>linear network</em>.
I wrote about single-layer linear networks in a blog post
<a href="http://charlesfrye.github.io/math/2017/08/17/linear-algebra-for-neuroscientists.html">introducing linear algebra ideas to neuroscientists</a>.
A linear network chooses each \(\sigma_i\) to be the identity function,
or a “pass-through” function that doesn’t touch its inputs.
The result is a network that looks like</p>
<div style="text-align: center">\[f\left(x; W_1, W_2, \dots W_{L-1}, W_L\right) = W_L W_{L-1} \dots W_2 W_1 x\]
</div>
<p>Unlike in the nonlinear case, it’s easy to write down what function the network is computing.
If we denote by \(W\) the matrix that results from multiplying all of the \(L\) matrices together,
we obtain</p>
<div style="text-align: center">\[F\left(x; W\right) = Wx\]
</div>
<p>and we know quite a bit about functions which can be represented by matrices.
These linear functions, in fact, are just about the only functions we really understand.</p>
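<p>We can check this collapse numerically. A minimal sketch in NumPy (the shapes here are arbitrary choices for illustration):</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# three "layers" with compatible shapes: 4 -> 5 -> 6 -> 2
W1 = rng.standard_normal((5, 4))
W2 = rng.standard_normal((6, 5))
W3 = rng.standard_normal((2, 6))
x = rng.standard_normal(4)

# apply the layers one at a time (identity nonlinearities)...
layer_by_layer = W3 @ (W2 @ (W1 @ x))

# ...or collapse the whole network into a single matrix W first
W = W3 @ W2 @ W1
collapsed = W @ x

assert np.allclose(layer_by_layer, collapsed)
```

<p>Both paths compute the same function; the collapsed form just makes the network's linearity explicit.</p>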
<h3 id="computing-the-loss-gradients-of-a-deep-linear-network">Computing the Loss Gradients of a Deep Linear Network</h3>
<p>This derivation is taken directly from
<a href="https://arxiv.org/abs/1712.01473">this arXiv paper</a>,
which goes on to prove some more cool stuff about linear networks.</p>
<p>When we train a neural network,
we’re interested in adjusting the parameters so as to minimize some negative outcome,
which we call the loss, \(l\).
In linear regression, this loss is often the squared prediction error.</p>
<p>For our purposes, we won’t need to know what it is.
We just need the following fact:</p>
<div style="text-align: center">\[l\left(W_L, \cdots W_1\right) = L(W)\]
</div>
<p>that is, there exists a function \(L\) that is just a function of \(W\),
our product of matrices, and which is equal to the original loss \(l\),
which is a function of all of the matrices.</p>
<p>A side note:
you might think of both of these as being functions also of the data \(x\),
but we won’t be referring to the data, so I’ve chosen to suppress that in the notation.</p>
<p>And before we take off, I’ll introduce one last piece of notation.
We’ll need to refer to abbreviated versions of our matrix product \(W\),
e.g. the product of the first 5 or last 3 matrices.
Inspired by Python, we’ll denote the products truncated at \(i\)
from the front and from the back as</p>
<div style="text-align: center">\[W_{:i}, W_{i+1:}\]
</div>
<p>respectively.
If one of these products is empty, as with \(W_{:1}\) or \(W_{L+1:}\),
the resulting matrix is taken to be the identity.</p>
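<p>Making the Python-inspired notation literal, here is a sketch of a helper for these truncated products (<code>partial_product</code> is my own name, and I assume square layer matrices so the empty product has a well-defined identity):</p>

```python
import numpy as np
from functools import reduce

def partial_product(Ws, start, stop):
    """Product of the layers Ws[start:stop], applied in network order
    (later layers multiply on the left); an empty slice gives the
    identity, matching the convention in the text."""
    n = Ws[0].shape[0]  # assume square layers for simplicity
    return reduce(lambda acc, Wi: Wi @ acc, Ws[start:stop], np.eye(n))

rng = np.random.default_rng(0)
Ws = [rng.standard_normal((3, 3)) for _ in range(4)]  # W_1, ..., W_4

W = partial_product(Ws, 0, 4)         # the full product W = W_4 W_3 W_2 W_1
W_front = partial_product(Ws, 0, 2)   # W_{:3} = W_2 W_1
W_back = partial_product(Ws, 2, 4)    # W_{3:} = W_4 W_3

assert np.allclose(W, W_back @ W_front)
assert np.allclose(partial_product(Ws, 0, 0), np.eye(3))  # empty product
```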
<p>We now recall the Fréchet definition of the derivative
\(\nabla g\),
of a matrix-to-scalar function \(g\):</p>
<div style="text-align: center">\[g(M + \epsilon) = g(M) + \langle\nabla g(M), \epsilon\rangle + o(\epsilon)\]
</div>
<p>see
<a href="http://charlesfrye.github.io/math/2018/03/06/frechet-derivative-introduction.html">the first blog post in this series</a>
for details.
Most importantly, the inner product \(\langle , \rangle\) is the
<a href="http://charlesfrye.github.io/math/2018/02/28/how-big-is-a-matrix.html">Frobenius inner product</a>,
which applies to matrices and can also be written</p>
<div style="text-align: center">\[\langle A, B \rangle = \mathrm{tr}\left(A^\top B\right)\]
</div>
<p>where \(\mathrm{tr}\) is the trace of the matrix.
This is equivalent to multiplying the matrices entrywise and summing up the results,
much like the more familiar inner product of vectors.</p>
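<p>This equivalence is easy to confirm numerically; a quick sketch:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))

trace_form = np.trace(A.T @ B)  # tr(A^T B)
entrywise = np.sum(A * B)       # multiply entrywise, then sum

assert np.isclose(trace_form, entrywise)
```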
<p>Let’s take our derivative definition and plug in our linear network loss function,
computing the derivative with respect to the \(k\)th weight matrix:</p>
<div style="text-align: center">\[\begin{align}
&l\left(W_1, \dots W_{k-1}, W_k +\epsilon, W_{k+1}, \dots W_L\right)\\
&= L(W + W_{k+1:}\epsilon W_{:k}) \\
&= L(W) + \langle \nabla L(W), W_{k+1:}\epsilon W_{:k}\rangle + o(\epsilon)
\end{align}\]
</div>
<p>where the second line follows by multiplying out the matrix product and applying our definitions.</p>
<p>Next, we swap over to the trace version of the inner product and apply the cyclic property
which was so crucial in the
<a href="http://charlesfrye.github.io/math/2018/03/07/frechet-least-squares.html">second blog post in this series</a>.
That is, underneath the trace, we are allowed to permute matrices,
letting us write</p>
<div style="text-align: center">\[\begin{align}
&\langle \nabla L(W), W_{k+1:}\epsilon W_{:k}\rangle\\
&= \mathrm{tr}\left(\nabla L(W)^\top W_{k+1:}\epsilon W_{:k}\right)\\
&= \mathrm{tr}\left( W_{:k}\nabla L(W)^\top W_{k+1:}\epsilon\right)\\
&= \langle W_{k+1:}^\top\nabla L(W) W_{:k}^\top,\epsilon\rangle
\end{align}\]
</div>
<p>and thus, pattern-matching to the definition of the derivative, we have that</p>
<div style="text-align: center">\[\begin{align}
\nabla_{W_k} l(W_1, \dots, W_L) = W_{k+1:}^\top \nabla L(W) W_{:k}^\top
\end{align}\]
</div>
<p>That is, we take the derivative with respect to the whole product of matrices
and then propagate it through every other matrix, transposed.</p>
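<p>We can sanity-check the gradient formula against finite differences. In this sketch (my own toy setup, not from the paper), I use a two-layer network and take \(L(W) = \frac{1}{2}\lVert W \rVert_F^2\), for which \(\nabla L(W) = W\):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 3))
W2 = rng.standard_normal((3, 3))
W = W2 @ W1

def loss(W1, W2):
    # l(W_1, W_2) = L(W) = 0.5 * ||W_2 W_1||_F^2
    return 0.5 * np.sum((W2 @ W1) ** 2)

# the formula from the text, for k = 1:
# grad_{W_1} l = W_{2:}^T  grad L(W)  W_{:1}^T, with W_{:1} the identity
grad_formula = W2.T @ W

# finite-difference check, entry by entry
eps = 1e-6
grad_fd = np.zeros_like(W1)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(W1)
        E[i, j] = eps
        grad_fd[i, j] = (loss(W1 + E, W2) - loss(W1 - E, W2)) / (2 * eps)

assert np.allclose(grad_formula, grad_fd, atol=1e-4)
```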
<h3 id="closing">Closing</h3>
<p>This proof may feel shockingly short and direct.
Where are all the indices and sums?
This is the advantage, which I hope this series has made clear,
of using Fréchet derivatives on polynomial and linear functions
that have scalar outputs but take in vectors or matrices.</p>
<p>For functions, like neural networks, that mix linear and nonlinear components,
the Fréchet derivative can be used to handle the linear component,
which is frequently the source of troubles, from shapes to indexing.
With that part handled, attention can go where it is most needed,
on the nonlinear piece.</p>
Fri, 18 Jan 2019 00:00:00 +0000
http://charlesfrye.github.io/math/2019/01/18/frechet-linear-network.html
http://charlesfrye.github.io/math/2019/01/18/frechet-linear-network.htmlmathGoogle Colab on Neural Networks<blockquote>
<p>The core ideas that make up neural networks are deceptively simple. The emphasis here is on deceptive.</p>
</blockquote>
<p>For a recent talk to a group of undergraduates interested in machine learning, I wrote a short tutorial on what I think are the core concepts needed to understand neural networks in such a way that they can be understood by someone with no more than high school mathematics and a passing familiarity with programming.</p>
<p>Thanks to the power of <em>the cloud</em>, you can check out the tutorial and run the examples.
Just check out <a href="https://colab.research.google.com/drive/1FwzkEGvppWtnkcUsocnAoPVCNSs3CWBH?fbclid=IwAR16cyHBK1_tT3FH_gDl8DxDPn33cvPULo2TfoGfHcBAb-7ecnyVjtnWnCE">this link</a>.
This time, I chose to use Google’s “Colaboratory”, which is like Google Drive for Jupyter notebooks.</p>
<!--exc-->
Wed, 31 Oct 2018 00:00:00 +0000
http://charlesfrye.github.io/external/2018/10/31/neural-nets-colab.html
http://charlesfrye.github.io/external/2018/10/31/neural-nets-colab.htmlexternalFunctors and Film Strips<p style="text-align: center"><img src="https://charlesfrye.github.io/img/stack_smileys.png" alt="stack_smileys" /></p>
<!--exc-->
<h2 id="summary">Summary</h2>
<p>A video can be represented in two different ways:
as a film strip, which lays out a sequence of frames across space,
or as a movie, which lays out a sequence of frames across time.
Each representation has its uses:
editing a video as it’s playing is basically impossible;
enjoying a video as a film strip is challenging.
When needed, videos are (or used to be) edited
by first turning them into strips of film,
then editing the film,
and then projecting the film as a movie.
By implementing this process in Python
and then describing it mathematically,
we’ll reinvent the concept of <em>functors</em>,
which are widely used in a branch of math called
<em>category theory</em> and in <em>functional programming languages</em>
like Haskell.</p>
<h3 id="film-editing-101">Film Editing 101</h3>
<p>While working on a machine learning project recently,
I ran into a common problem:
the code base I wanted to work with made a few assumptions
that didn’t fit my situation, and so it refused to run.
Specifically, the code was written to work with <em>image</em> data,
but I was working with <em>movie</em> data.
That is, the code expected arrays with three dimensions,
height, width, and color, but I was working with
arrays with four dimensions, time, height, width, and color.</p>
<p>I resolved this problem by using the same trick as a video editor:
I converted the movies into wide images, like strips of film,
applied the code to them,
and then converted the film strips back into videos.</p>
<p>To see how this works concretely,
imagine we’d like to alter or edit video data that
is available only as a movie.</p>
<p>First, let’s consider the movie-to-film process.
If we have a movie and we want to convert it to film,
we must record it, e.g. with a camera.
Let’s abstract this process as <code class="language-plaintext highlighter-rouge">capture</code>,
as in <code class="language-plaintext highlighter-rouge">film = capture(movie)</code>.</p>
<p>At this point, we can apply any transformation
that operates on films.
For example, we can correct the colors, frame by frame,
or we can add subtitles,
which is much easier when we’re not working in real time.
The result is an edited film, which we write
<code class="language-plaintext highlighter-rouge">edited_film = f(film)</code>,
where <code class="language-plaintext highlighter-rouge">f</code> is a function that takes in film strips
and returns film strips, here abstracting the human editing process.</p>
<p>Finally, we must convert the (edited) film strip back into a movie
if we want to enjoy the fruits of our labor.
This is done by <code class="language-plaintext highlighter-rouge">project</code>ing the film strip, by passing it
in front of a synchronized light source that converts each frame
in the strip into a projected frame in sequence.
If we wanted to write this process the
same way we did the previous two steps,
we might write
<code class="language-plaintext highlighter-rouge">movie = project(film)</code>,
where <code class="language-plaintext highlighter-rouge">film</code> can be any film strip whose aspect ratio
matches the settings on our projection apparatus.</p>
<h3 id="film-editing-for-pythons">Film Editing for Pythons</h3>
<p>Once we’ve agreed on a way to represent movies and film strips,
we can go about writing a computer-friendly version of the editing process,
which is both a nice springboard to thinking about it mathematically
and a useful exercise to check our work.
Computers are like diligent but not-too-bright students
who listen to everything we say and nothing more,
and bring nothing of their own to the table.
If we cannot get them to understand it, we’re missing something
in our description.
This level of rigor is sometimes very useful.</p>
<p>We begin with the <code class="language-plaintext highlighter-rouge">capture</code> process, which converts a movie into a film.
There’s a convenient function in <code class="language-plaintext highlighter-rouge">numpy</code> that solves this for us:
<code class="language-plaintext highlighter-rouge">hstack</code>, which takes an array of \(N\) dimensions
and returns an array of \(N-1\) dimensions in which the first dimension
has been “stacked” columnwise.</p>
<p>For example, matrices are two dimensional
and vectors are one dimensional.
So if we took the matrix <code class="language-plaintext highlighter-rouge">np.eye(3)</code>,
which is the identity matrix with three rows,
and apply <code class="language-plaintext highlighter-rouge">hstack</code>,
as below,</p>
<div style="text-align: center">\[\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \end{bmatrix}
\xrightarrow{\texttt{hstack}}
\begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1
\end{bmatrix}\]
</div>
<p>we get as output the vector on the right.
If we view the matrix as a sequence of rows,
then the <code class="language-plaintext highlighter-rouge">hstack</code> transformation takes a
sequence of short rows and returns
a single very long row.</p>
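<p>The same example, checked in code:</p>

```python
import numpy as np

m = np.eye(3)     # a "sequence" of three short rows
v = np.hstack(m)  # one very long row, one dimension lower

assert v.shape == (9,)
assert np.array_equal(v, np.array([1., 0., 0., 0., 1., 0., 0., 0., 1.]))
```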
<p>Similarly, we can provide
a three-dimensional object,
which is like a sequence of matrices,
and get back out a very wide single matrix,
as in the cartoon below.</p>
<p style="text-align: center"><img src="https://charlesfrye.github.io/img/stack_smileys.png" alt="stack_smileys" /></p>
<p>Note how similar the right hand side
looks to a film strip!
This means that the Python code is simple.
Presuming we’ve <code class="language-plaintext highlighter-rouge">import</code>ed <code class="language-plaintext highlighter-rouge">numpy as np</code>,
we simply write</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">capture</span><span class="p">(</span><span class="n">movie</span><span class="p">):</span>
<span class="n">film</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">hstack</span><span class="p">(</span><span class="n">movie</span><span class="p">)</span>
<span class="k">return</span> <span class="n">film</span>
</code></pre></div></div>
<p>Now to model “projecting” a film
to get a movie.
Reversing our description of <code class="language-plaintext highlighter-rouge">hstack</code> above,
we need a function that converts a single, very wide
array of dimension \(N\)
into an array of dimension \(N+1\)
that is like a sequence of less wide arrays of dimension \(N\).</p>
<p>The <code class="language-plaintext highlighter-rouge">numpy</code> function that achieves this is called <code class="language-plaintext highlighter-rouge">split</code>.
It needs to know which axis to split over
and it needs to know the number of frames,
but then it does exactly what we want,
as shown in the cartoon below</p>
<p style="text-align: center"><img src="https://charlesfrye.github.io/img/split_smileys.png" alt="split_smileys" /></p>
<p>Because we chose to use <code class="language-plaintext highlighter-rouge">hstack</code>,
which stacks across columns, we know that
<code class="language-plaintext highlighter-rouge">axis=1</code>, the column axis in numpy.
The other argument, the number of frames,
is a bit trickier.</p>
<p>It can’t be inferred just from the film strip,
at least as we’ve done it.
Though real film strips have regions separating
each frame, ours do not.
Therefore, the projectionist,
looking at our smiley video as a strip of film,
has no way of telling whether it’s meant to be
a simple movie of a smiley going from surprised to happy
or a more <em>avant-garde</em> movie showing two halves of a face
flickering from side to side.</p>
<p style="text-align: center"><img src="https://charlesfrye.github.io/img/smiley_dilemma.png" alt="smiley_dilemma" /></p>
<p>By making assumptions on what kinds of videos
are likely to come through, we could try to infer
the location of the splits,
but that would require us to make judgment calls,
which might make our system break for,
e.g., the
<a href="https://www.youtube.com/watch?v=s5i1pZPPzZ8">hand-painted films of Stan Brakhage</a>.
Since we don’t want to disenfranchise anybody,
we instead restrict our projector to operating with a fixed aspect ratio.
From the aspect ratio and the height and width of the film strip,
we can deduce the number of frames.
If we later have videos with different aspect ratios,
we can just define more projectors.</p>
<p>The resulting Python code is a bit more involved
than our <code class="language-plaintext highlighter-rouge">capture</code> code:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">project</span><span class="p">(</span><span class="n">film</span><span class="p">):</span>
<span class="n">aspect_ratio</span> <span class="o">=</span> <span class="mi">1</span> <span class="c1"># projector settings
</span> <span class="n">frame_height</span><span class="p">,</span> <span class="n">film_width</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">film</span><span class="p">.</span><span class="n">shape</span>
    <span class="n">num_frames</span> <span class="o">=</span> <span class="n">film_width</span> <span class="o">//</span> <span class="n">frame_height</span> <span class="o">//</span> <span class="n">aspect_ratio</span>
<span class="n">movie</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">split</span><span class="p">(</span><span class="n">film</span><span class="p">,</span> <span class="n">num_frames</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">return</span> <span class="n">np</span><span class="p">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">movie</span><span class="p">)</span>
</code></pre></div></div>
<p>But the majority of our effort is just spent on computing the
<code class="language-plaintext highlighter-rouge">num</code>ber of <code class="language-plaintext highlighter-rouge">frames</code>.
One small gotcha: <code class="language-plaintext highlighter-rouge">split</code> returns a list of equal-sized arrays,
which is subtly different from an array that has
those arrays as sub-components.
To avoid this, we need to use <code class="language-plaintext highlighter-rouge">asarray</code>
to convert the list into an array.</p>
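<p>Putting the two functions together, we can confirm that projecting a captured movie returns the original movie. A self-contained sketch, with both functions restated and the aspect ratio fixed at 1 (square frames):</p>

```python
import numpy as np

def capture(movie):
    return np.hstack(movie)

def project(film):
    aspect_ratio = 1  # projector settings: square frames
    frame_height, film_width, _ = film.shape
    num_frames = film_width // (frame_height * aspect_ratio)
    return np.asarray(np.split(film, num_frames, axis=1))

# a toy movie: 4 frames, each 8x8 pixels with 3 color channels
rng = np.random.default_rng(0)
movie = rng.random((4, 8, 8, 3))

film = capture(movie)
assert film.shape == (8, 32, 3)  # one wide strip

round_trip = project(film)
assert np.array_equal(round_trip, movie)
```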
<p>Now that we can convert films to movies and vice versa,
it’s time for the magic step, which lets us
convert any function that operates on films
(with a fixed aspect ratio)
to a function that operates on movies:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">convert_to_movie_function</span><span class="p">(</span><span class="n">film_function</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">movie_function</span><span class="p">(</span><span class="n">movie</span><span class="p">):</span>
<span class="n">edited_movie</span> <span class="o">=</span> <span class="n">project</span><span class="p">(</span><span class="n">film_function</span><span class="p">(</span><span class="n">capture</span><span class="p">(</span><span class="n">movie</span><span class="p">)))</span>
<span class="k">return</span> <span class="n">edited_movie</span>
<span class="k">return</span> <span class="n">movie_function</span>
</code></pre></div></div>
<p>This is the way to express, in Python,
the very intuitive idea that we can watch an edited movie
by converting the movie to film (<code class="language-plaintext highlighter-rouge">capture</code>),
editing the film (<code class="language-plaintext highlighter-rouge">film_function</code>),
and then projecting the edited film as a movie (<code class="language-plaintext highlighter-rouge">project</code>).</p>
<p>You should stop a moment to think about how you might write
<code class="language-plaintext highlighter-rouge">convert_to_film_function</code>, which takes a <code class="language-plaintext highlighter-rouge">movie_function</code>
and returns a <code class="language-plaintext highlighter-rouge">film_function</code>.</p>
<h3 id="generalized-abstract-nonsense">Generalized Abstract Nonsense</h3>
<p>Now that we’ve worked through a concrete example,
both intuitively and more rigorously, through programming it,
we’re ready to think abstractly about this problem.</p>
<p>Our “editing” process was really almost any transformation
of the film that didn’t change its aspect ratio
(for the diligent, restrict to <em>pure</em>, <em>computable</em>, <em>total</em> functions
that preserve aspect ratio).</p>
<p>Imagine drawing each valid film as a dot on a piece of paper,
and then drawing an arrow pointing
from a film to its edited version.
Mathematically, a collection of dots
and arrows between them is called a <em>graph</em>.
A small example appears below,
showing a film,
its subtitled version,
and the version that has been subtitled and cut down
to a shorter running time,
connected by the relevant arrows.</p>
<p style="text-align: center"><img src="https://charlesfrye.github.io/img/film_cat.png" alt="film_cat" /></p>
<p>Briefly thinking in Python terms again,
notice that if <code class="language-plaintext highlighter-rouge">subtitle</code> and <code class="language-plaintext highlighter-rouge">cut</code> are functions
whose inputs and outputs are valid films,
then we can always define a function <code class="language-plaintext highlighter-rouge">subtitle_then_cut</code>,
that looks like</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">subtitle_then_cut</span><span class="p">(</span><span class="n">film</span><span class="p">):</span>
<span class="k">return</span> <span class="n">cut</span><span class="p">(</span><span class="n">subtitle</span><span class="p">(</span><span class="n">film</span><span class="p">))</span>
</code></pre></div></div>
<p>whose inputs and outputs are always valid films,
and so our previous drawing was <em>missing an arrow</em>.</p>
<p style="text-align: center"><img src="https://charlesfrye.github.io/img/film_cat_compose.png" alt="film_cat_compose" /></p>
<p>Combining two functions into one
by applying one after the other is called <em>composition</em>.
If the functions are \(f\) and \(g\), applied in that order,
we usually write \(g \circ f\) for their composition,
which we might pronounce “\(g\) after \(f\)”.
Alternatively, we might write \(f ; g\),
pronounced “\(f\) then \(g\)”.</p>
<p>Now imagine drawing the collection of all valid
movies in a similar way.
Rather than starting over again,
build it from the film diagram.
Start with the dots.
They correspond to the projected versions of each film.
How do we determine the arrows?</p>
<p>We can generate an arrow in our movie diagram
by using <code class="language-plaintext highlighter-rouge">convert_to_movie_function</code>
to take an arrow between films and
create an arrow between movies.
A diagram of this process appears below:</p>
<p style="text-align: center"><img src="https://charlesfrye.github.io/img/film_functor.png" alt="film_functor" /></p>
<p>Dashed arrows represent parts of this process;
single blue arrows represent the action of <code class="language-plaintext highlighter-rouge">capture</code>;
double blue arrows represent the action of <code class="language-plaintext highlighter-rouge">convert_to_movie_function</code>.</p>
<p>Notice that we haven’t included
the arrow for <code class="language-plaintext highlighter-rouge">cut</code> \(\circ\) <code class="language-plaintext highlighter-rouge">subtitle</code>.
By our argument above,
that arrow can be <em>inferred</em> from the arrows
in the simpler diagram, so drawing it is superfluous.
This is one of the reasons mathematics is interested
in rules, which we might frame more positively as <em>guarantees</em>.
They allow us to be more succinct without losing clarity,
at least in principle.</p>
<p>Another thing to notice is that the same arrow
is missing from both diagrams; that is,
the inferred <code class="language-plaintext highlighter-rouge">cut</code> \(\circ\) <code class="language-plaintext highlighter-rouge">subtitle</code> arrow
also exists for movies.
Returning to our concrete implementation,
consider the difference between
<code class="language-plaintext highlighter-rouge">convert_to_movie_function(subtitle_then_cut)</code>
and the function
<code class="language-plaintext highlighter-rouge">convert_to_movie_function(cut)</code> \(\circ\) <code class="language-plaintext highlighter-rouge">convert_to_movie_function(subtitle)</code>.
The latter will repeatedly convert back and forth between film and movie,
while the former will only do that twice: once at the beginning and once at the end
of the editing pipeline.</p>
<p>If we can confirm a rule, or provide a guarantee,
that the two functions are identical,
then we can write a faster,
perhaps <em>much</em> faster, version without losing anything.
If we were software engineers designing an editing interface,
we’d know that we can take arbitrary-length lists
of editing steps and turn them into a single function
with only two conversion steps
(ponder: how did we go from two functions to an arbitrary number?
hint: composition).</p>
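<p>Here is a sketch of that guarantee in action, with two toy aspect-ratio-preserving "edits" standing in for real editing steps:</p>

```python
import numpy as np
from functools import reduce

def capture(movie):
    return np.hstack(movie)

def project(film):
    frame_height, film_width, _ = film.shape
    num_frames = film_width // frame_height  # aspect ratio 1
    return np.asarray(np.split(film, num_frames, axis=1))

def convert_to_movie_function(film_function):
    def movie_function(movie):
        return project(film_function(capture(movie)))
    return movie_function

def compose(*film_functions):
    """One film function that applies the given edits left to right."""
    return reduce(lambda f, g: (lambda film: g(f(film))), film_functions)

# toy "edits": any shape-preserving film functions will do
brighten = lambda film: np.clip(film + 0.1, 0.0, 1.0)
darken = lambda film: film * 0.5

rng = np.random.default_rng(0)
movie = rng.random((4, 8, 8, 3))

# slow: a film/movie round trip around every single edit
slow = convert_to_movie_function(darken)(
    convert_to_movie_function(brighten)(movie))

# fast: compose the edits first, so there is only one round trip
fast = convert_to_movie_function(compose(brighten, darken))(movie)

assert np.allclose(slow, fast)
```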
<p>If we become more precise in stating our assumptions,
we can say that the collection of all valid films
and the editing procedures that convert one into another
form a <em>category</em>, a structure from abstract mathematics
that encompasses just about all cases where
some kind of composition “makes sense”.
A mapping that takes one category, e.g. films and edits,
and associates it to another category, e.g. movies and edits,
is called a <em>functor</em>,
just like a mapping between two sets is called a <em>function</em>.
The only restriction is that the functor must preserve composition,
meaning that in our film example
<code class="language-plaintext highlighter-rouge">convert_to_movie_function(subtitle_then_cut)</code>
and
<code class="language-plaintext highlighter-rouge">convert_to_movie_function(cut)</code> \(\circ\) <code class="language-plaintext highlighter-rouge">convert_to_movie_function(subtitle)</code>
are the same function, just implemented differently.
Writing the names of the functions as \(f\) and \(g\),
and writing our functor as \(\Phi\), pronounced “fee”,
we can write this condition as</p>
<div style="text-align: center">\[\begin{align}
\Phi(f) \circ \Phi(g) = \Phi(f \circ g)
\end{align}\]
</div>
<p>Functors abound in the world of computer science.
There is a functor from data to lists of data,
which Python folks may have come across as a
<em>list comprehension</em>: a function that takes in elements
and returns elements is lifted into an expression that acts
on a whole list and returns a list.</p>
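<p>A sketch of the list functor in Python, with a hypothetical <code>lift</code> playing the role of \(\Phi\):</p>

```python
def lift(f):
    """Lift a function on elements to a function on lists."""
    return lambda xs: [f(x) for x in xs]

double = lambda x: 2 * x
succ = lambda x: x + 1

xs = [1, 2, 3]

# the functor law: lifting a composition equals composing the lifts
composed_then_lifted = lift(lambda x: succ(double(x)))(xs)
lifted_then_composed = lift(succ)(lift(double)(xs))

assert composed_then_lifted == lifted_then_composed == [3, 5, 7]
```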
<h3 id="more-relevant-reading">More Relevant Reading</h3>
<p>For more detail on categories and functors in computer science,
check out
<a href="https://bartoszmilewski.com/2014/10/28/category-theory-for-programmers-the-preface/">Category Theory for Programmers</a>,
which teaches the functional programming language Haskell
alongside category theory concepts.
If you prefer a lecture format,
check out
<a href="https://www.youtube.com/watch?v=I8LbkfSSR58">the author’s YouTube videos</a>.
These presume some degree of maturity with programming.</p>
<p>For a beginner-oriented pure math approach to understanding functors,
I suggest
<a href="https://graphicallinearalgebra.net">Graphical Linear Algebra</a>
by Pawel Sobocinski.
The primary interest is in presenting a novel view
of linear algebra using diagrams called <em>string diagrams</em>,
which can make even advanced abstract algebra feel intuitive.
The posts are didactic, so all of the necessary category theory
is explained as it is introduced.</p>
<p>For more quick examples of using similar ideas,
check out the blog post
<a href="https://golem.ph.utexas.edu/category/2016/06/how_the_simplex_is_a_vector_sp.html">“How the Simplex is a Vector Space”</a>,
which develops an interesting connection between probabilities, log-probabilities,
and linear algebra,
and
<a href="http://charlesfrye.github.io/math/2018/02/28/how-big-is-a-matrix.html">“How Long is a Matrix?”</a>,
which makes use of the same connection between
\(N\)-dimensional arrays and \((N-1)\)-dimensional arrays.</p>
Tue, 21 Aug 2018 00:00:00 +0000
http://charlesfrye.github.io/math/2018/08/21/functors-film-strips.html
http://charlesfrye.github.io/math/2018/08/21/functors-film-strips.htmlmath