
Commit 5c665bf

[owlbook_ci] automated book compilation
1 parent e8a03fc · commit 5c665bf

15 files changed: +31 −30 lines changed

Diff for: HISTORY.log (+1 line)

@@ -43,3 +43,4 @@ Tue Jan 12 15:51:40 GMT 2021
 Wed Jan 13 15:01:27 GMT 2021
 Sun Jan 17 04:21:03 GMT 2021
 Sat Jul 3 17:06:28 BST 2021
+Mon Jan 17 14:54:56 GMT 2022

Diff for: docs/algodiff.html (+5 −5 lines)
@@ -119,7 +119,7 @@ <h2>How Algorithmic Differentiation Works</h2>
 <p>Based on this graphic representation, there are two major ways to apply the chain rule: the forward differentiation mode, and the reverse differentiation mode (not “backward differentiation”, which is a method used for solving ordinary differential equations). Next, we introduce these two methods.</p>
 <section class="level3" id="forward-mode">
 <h3>Forward Mode</h3>
-<p>Our target is to calculate <span class="math inline">\(\frac{\partial~y}{\partial~x_0}\)</span> (partial derivative regarding <span class="math inline">\(x_1\)</span> should be similar). But hold your horse, let’s start with some earlier intermediate results that might be helpful. For example, what is <span class="math inline">\(\frac{\partial~x_0}{\partial~x_1}\)</span>? 1, obviously. Equally obvious is <span class="math inline">\(\frac{\partial~x_1}{\partial~x_1} = 0\)</span>. It’s just elementary. Now, things gets a bit trickier: what is <span class="math inline">\(\frac{\partial~v_3}{\partial~x_0}\)</span>? It is a good time to use the chain rule:</p>
+<p>Our target is to calculate <span class="math inline">\(\frac{\partial~y}{\partial~x_0}\)</span> (the partial derivative with regard to <span class="math inline">\(x_1\)</span> is similar). But hold your horses, let’s start with some earlier intermediate results that might be helpful. For example, what is <span class="math inline">\(\frac{\partial~x_0}{\partial~x_1}\)</span>? It’s 0. Also, <span class="math inline">\(\frac{\partial~x_1}{\partial~x_1} = 1\)</span>. Now, things get a bit trickier: what is <span class="math inline">\(\frac{\partial~v_3}{\partial~x_0}\)</span>? This is a good time to use the chain rule:</p>
 <p><span class="math display">\[\frac{\partial~v_3}{\partial~x_0} = \frac{\partial~(x_0~x_1)}{\partial~x_0} = x_1~\frac{\partial~(x_0)}{\partial~x_0} + x_0~\frac{\partial~(x_1)}{\partial~x_0} = x_1.\]</span></p>
 <p>After calculating <span class="math inline">\(\frac{\partial~v_3}{\partial~x_0}\)</span>, we can then proceed with the derivatives of <span class="math inline">\(v_5\)</span>, <span class="math inline">\(v_6\)</span>, all the way to that of <span class="math inline">\(v_9\)</span>, which is also the output <span class="math inline">\(y\)</span> we are looking for. This process starts with the input variables and ends with the output variables. Therefore, it is called <em>forward differentiation</em>. We can simplify the math notations in this process by letting <span class="math inline">\(\dot{v_i}=\frac{\partial~(v_i)}{\partial~x_0}\)</span>. The <span class="math inline">\(\dot{v_i}\)</span> here is called the <em>tangent</em> of function <span class="math inline">\(v_i(x_0, x_1, \ldots, x_n)\)</span> with regard to the input variable <span class="math inline">\(x_0\)</span>, and the original computation results at each intermediate point are called <em>primal</em> values. The forward differentiation mode is sometimes also called “tangent linear” mode.</p>
 <p>Now we can present the full forward differentiation calculation process, as shown in tbl.&nbsp;2. Two simultaneous computing processes take place, shown as two separate columns: on the left side is the computation procedure specified by eq.&nbsp;5; the right side shows the computation of the derivative of each intermediate variable with regard to <span class="math inline">\(x_0\)</span>. Let’s find out <span class="math inline">\(\dot{y}\)</span> when setting <span class="math inline">\(x_0 = 1\)</span> and <span class="math inline">\(x_1 = 1\)</span>.</p>
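
For reference alongside the forward-mode walk-through in this hunk: the same seeding idea (tangent 1 for x_0, tangent 0 for x_1) can be driven through Owl's low-level Algodiff API roughly as in the sketch below. This is illustrative only; the function f is a stand-in rather than the chapter's running example, and it assumes the low-level primitives make_forward, tangent and tag of Owl's Algodiff.D module.

    open Owl
    open Algodiff.D

    (* stand-in function of two scalar inputs; not the chapter's example *)
    let f x0 x1 = Maths.((x0 * x1) + sin x0)

    let () =
      (* seed x0 with tangent 1 and treat x1 as a constant (tangent 0),
         so the tangent propagated to y is dy/dx0 *)
      let x0 = make_forward (F 1.) (F 1.) (tag ()) in
      let x1 = F 1. in
      let y = f x0 x1 in
      (* for this f, dy/dx0 = x1 + cos x0 = 1 + cos 1 *)
      Printf.printf "dy/dx0 = %g\n" (unpack_flt (tangent y))
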
@@ -223,7 +223,7 @@ <h3>Reverse Mode</h3>
 <p><span class="math display">\[\bar{v_i} = \frac{\partial~y}{\partial~v_i}\]</span></p>
 <p>be the derivative of the output variable <span class="math inline">\(y\)</span> with regard to the intermediate node <span class="math inline">\(v_i\)</span>. It is called the <em>adjoint</em> of variable <span class="math inline">\(v_i\)</span> with respect to the output variable <span class="math inline">\(y\)</span>. Using this notation, eq.&nbsp;6 can be expressed as:</p>
 <p><span class="math display">\[\bar{v_7} = \bar{v_9} * \frac{\partial~v_9}{\partial~v_7} = 1 * \frac{1}{v_8}.\]</span></p>
-<p>Note the difference between tangent and adjoint. In the forward mode, we know <span class="math inline">\(\dot{v_0}\)</span> and <span class="math inline">\(\dot{v_1}\)</span>, then we calculate <span class="math inline">\(\dot{v_2}\)</span>, <span class="math inline">\(\dot{v3}\)</span>, …. and then finally we have <span class="math inline">\(\dot{v_9}\)</span>, which is the target. Here, we start with knowing <span class="math inline">\(\bar{v_9} = 1\)</span>, and then we calculate <span class="math inline">\(\bar{v_8}\)</span>, <span class="math inline">\(\bar{v_7}\)</span>, …. and then finally we have <span class="math inline">\(\bar{v_0} = \frac{\partial~y}{\partial~v_0} = \frac{\partial~y}{\partial~x_0}\)</span>, which is also exactly our target. Again, <span class="math inline">\(\dot{v_9} = \bar{v_0}\)</span> in this example, given that we are talking about derivative regarding <span class="math inline">\(x_0\)</span> when we use <span class="math inline">\(\dot{v_9}\)</span>. Following this line of calculation, the reverse differentiation mode is also called <em>adjoint mode</em>.</p>
+<p>Note the difference between tangent and adjoint. In the forward mode, we know <span class="math inline">\(\dot{v_0}\)</span> and <span class="math inline">\(\dot{v_1}\)</span>, then we calculate <span class="math inline">\(\dot{v_2}\)</span>, <span class="math inline">\(\dot{v_3}\)</span>, …, and finally we have <span class="math inline">\(\dot{v_9}\)</span>, which is the target. Here, we start with knowing <span class="math inline">\(\bar{v_9} = 1\)</span>, then we calculate <span class="math inline">\(\bar{v_8}\)</span>, <span class="math inline">\(\bar{v_7}\)</span>, …, and finally we have <span class="math inline">\(\bar{v_0} = \frac{\partial~y}{\partial~v_0} = \frac{\partial~y}{\partial~x_0}\)</span>, which is also exactly our target. Again, <span class="math inline">\(\dot{v_9} = \bar{v_0}\)</span> in this example, given that <span class="math inline">\(\dot{v_9}\)</span> denotes the derivative with regard to <span class="math inline">\(x_0\)</span>. Following this line of calculation, the reverse differentiation mode is also called <em>adjoint mode</em>.</p>
 <p>With that in mind, let’s see the full steps of performing reverse differentiation. First, we need to perform a forward pass to compute the required intermediate values, as shown in tbl.&nbsp;3.</p>
 <div id="tbl:algodiff:reverse_01">
 <table style="width:44%;">
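
Similarly, for reference only, the reverse pass described above can be sketched with Owl's low-level reverse-mode primitives (make_reverse, reverse_prop, adjval). The function f is the same assumed stand-in as before; seeding the output adjoint with 1 mirrors the text.

    open Owl
    open Algodiff.D

    (* same stand-in function as in the forward-mode sketch *)
    let f x0 x1 = Maths.((x0 * x1) + sin x0)

    let () =
      let t = tag () in
      let x0 = make_reverse (F 1.) t in
      let x1 = make_reverse (F 1.) t in
      let y = f x0 x1 in
      (* seed the adjoint of the output with 1 and propagate backwards *)
      reverse_prop (F 1.) y;
      Printf.printf "dy/dx0 = %g, dy/dx1 = %g\n"
        (unpack_flt (adjval x0)) (unpack_flt (adjval x1))
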
@@ -964,7 +964,7 @@ <h3>Example: Reverse Mode</h3>
 </section>
 <section class="level2" id="high-level-apis">
 <h2>High-Level APIs</h2>
-<p>What we have seen is the basic of AD modules. There might be cases you do need to operate these low-level functions to write up your own applications (e.g., implementing a neural network), then knowing the mechanisms behind the scene is definitely a big plus. However, using these complex low level function hinders daily use of algorithmic differentiation in numerical computation task. In reality, you don’t really need to worry about forward or reverse mode if you simply use high-level APIs such as <code>diff</code>, <code>grad</code>, <code>hessian</code>, and etc. They are all built on the forward or reverse mode that we have seen, but provide clean interfaces, making a lot of details transparent to users. In this section we will introduce how to use these high level APIs.</p>
+<p>What we have seen so far are the basics of the AD module. There might be cases where you do need to use these low-level functions to write your own applications (e.g., implementing a neural network), and then knowing the mechanisms behind the scenes is definitely a big plus. However, using these complex low-level functions hinders the daily use of algorithmic differentiation in numerical computation tasks. In reality, you don’t really need to worry about forward or reverse mode if you simply use high-level APIs such as <code>diff</code>, <code>grad</code>, <code>hessian</code>, etc. They are all built on the forward and reverse modes that we have seen, but provide clean interfaces, making a lot of the details transparent to users. In this section we will introduce how to use these high-level APIs.</p>
 <section class="level3" id="derivative-and-gradient">
 <h3>Derivative and Gradient</h3>
 <p>The most basic and commonly used differentiation function calculates the <em>derivative</em> of a function. The AD module provides the <code>diff</code> function for this task. Given a function <code>f</code> that takes a scalar as input and also returns a scalar value, we can calculate its derivative at a point <code>x</code> by <code>diff f x</code>, as shown in this function signature.</p>
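
To make the diff f x usage concrete, a minimal sketch; the function sin x * cos x is chosen only for illustration.

    open Owl
    open Algodiff.D

    (* derivative of a scalar-to-scalar function at a point *)
    let f x = Maths.(sin x * cos x)

    let () =
      (* d/dx (sin x cos x) = cos 2x, so f'(1) = cos 2 *)
      let d = diff f (F 1.) in
      Printf.printf "f'(1) = %g\n" (unpack_flt d)
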
@@ -1074,7 +1074,7 @@ <h3>Hessian and Laplacian</h3>
 <p>Another way to extend the gradient is to find the second order derivatives of a multivariate function which takes <span class="math inline">\(n\)</span> input variables and outputs a scalar. These second order derivatives can be organised as a matrix:</p>
 <p><span class="math display">\[ \mathbf{H}(y) = \left[ \begin{matrix} \frac{\partial^2~y}{\partial~x_1^2} &amp; \frac{\partial^2~y}{\partial~x_1~x_2} &amp; \ldots &amp; \frac{\partial^2~y}{\partial~x_1~x_n} \\ \frac{\partial^2~y}{\partial~x_2~x_1} &amp; \frac{\partial^2~y}{\partial~x_2^2} &amp; \ldots &amp; \frac{\partial^2~y}{\partial~x_2~x_n} \\ \vdots &amp; \vdots &amp; \ddots &amp; \vdots \\ \frac{\partial^2~y}{\partial~x_n~x_1} &amp; \frac{\partial^2~y}{\partial~x_n~x_2} &amp; \ldots &amp; \frac{\partial^2~y}{\partial~x_n^2} \end{matrix} \right]\]</span></p>
 <p>This matrix is called the <em>Hessian Matrix</em>. As an example of using it, consider <em>Newton’s method</em>, which is also used for solving optimisation problems, i.e.&nbsp;finding the minimum value of a function. Instead of following only the direction of the gradient, Newton’s method combines the gradient and the second order derivatives: <span class="math inline">\(\frac{\nabla~f(x_n)}{\nabla^{2}~f(x_n)}\)</span>. Specifically, starting from a random position <span class="math inline">\(x_0\)</span>, it can be iteratively updated by repeating this procedure until convergence, as shown in eq.&nbsp;8.</p>
-<p><span id="eq:algodiff:newtons"><span class="math display">\[x_(n+1) = x_n - \alpha~\mathbf{H}^{-1}\nabla~f(x_n)\qquad(8)\]</span></span></p>
+<p><span id="eq:algodiff:newtons"><span class="math display">\[x_{n+1} = x_n - \alpha~\mathbf{H}^{-1}\nabla~f(x_n)\qquad(8)\]</span></span></p>
 <p>This process can be easily represented using the <code>Algodiff.D.hessian</code> function.</p>
 <div class="highlight">
 <pre><code class="language-ocaml">open Algodiff.D
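
The hunk cuts the book's code block off after its first line. For orientation, a Newton-style iteration built on grad and hessian might look roughly like the sketch below; the step size eta, the tolerance eps and the toy objective are assumptions for illustration, not necessarily the book's own code.

    open Owl
    open Algodiff.D

    (* illustrative Newton iteration: step along -(g * H^-1), scaled by eta,
       until the gradient norm falls below eps *)
    let rec newton ?(eta = F 0.01) ?(eps = 1e-6) f x =
      let g = grad f x in
      let h = hessian f x in
      if (Maths.l2norm' g |> unpack_flt) < eps then x
      else newton ~eta ~eps f Maths.(x - (eta * g) *@ inv h)

    let () =
      (* a toy objective over a 1x2 row vector *)
      let f x = Maths.(cos x |> sum') in
      let x_star = newton f (Mat.uniform 1 2) in
      Printf.printf "|grad| at result = %g\n"
        (Maths.l2norm' (grad f x_star) |> unpack_flt)
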
@@ -1097,7 +1097,7 @@ <h3>Hessian and Laplacian</h3>
 </section>
 <section class="level3" id="other-apis">
 <h3>Other APIs</h3>
-<p>Besides, there are also many helper functions, such as <code>jacobianv</code> for calculating jacobian vector product; <code>diff'</code> for calculating both <code>f x</code> and <code>diff f x</code>, and etc. They will come handy in certain cases for the programmers. Besides the functions we have already introduced, the complete list of APIs can be found in the table below.</p>
+<p>Besides these, there are also many helper functions, such as <code>jacobianv</code> for calculating the Jacobian-vector product, and <code>diff'</code> for calculating both <code>f x</code> and <code>diff f x</code>, etc. They will come in handy in certain cases. Beyond the functions we have already introduced, the complete list of APIs can be found in the table below.</p>
 <div id="tbl:algodiff:apis">
 <table>
 <caption>Table 5: List of other APIs in the AD module of Owl</caption>
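
A brief, illustrative sketch of two of these helpers follows; the vector function g and the random direction v are assumptions for demonstration, not examples taken from the book.

    open Owl
    open Algodiff.D

    let () =
      (* diff' returns both the primal value f x and the derivative diff f x *)
      let y, dy = diff' Maths.sin (F 1.) in
      Printf.printf "sin 1 = %g, cos 1 = %g\n" (unpack_flt y) (unpack_flt dy);

      (* jacobianv computes the Jacobian-vector product J(g)(x) v without
         materialising the full Jacobian *)
      let g x = Maths.(x * sin x) in   (* elementwise, so R^3 -> R^3 *)
      let x = Mat.uniform 1 3 in
      let v = Mat.uniform 1 3 in       (* an arbitrary direction vector *)
      let jv = jacobianv g x v in
      Printf.printf "|Jv| = %g\n" (Maths.l2norm' jv |> unpack_flt)
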
