In the next section, we define an analogous algorithm for finding critical points. That is, we again try to solve a root-finding problem with Newton-Raphson, but this introduces a division, which we reformulate as an optimization problem.

Today a short paper I wrote posted to the arXiV. It’s on a cute connection between the algorithm I use to find the critical points of neural network losses and the algorithm used to compute square roots to high accuracy.

Check out this Twitter thread for a layman-friendly explanation.