|
1 | 1 | {
|
2 | 2 | "cells": [
|
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# Runinng a backward pass through LeNet using MNIST and Joey" |
| 8 | + ] |
| 9 | + }, |
| 10 | + { |
| 11 | + "cell_type": "markdown", |
| 12 | + "metadata": {}, |
| 13 | + "source": [ |
| 14 | + "In this notebook, we will construct LeNet using Joey and run a backward pass through it with some training data from MNIST.\n", |
| 15 | + "\n", |
| 16 | + "The aim of a backward pass is calculating gradients of all network parameters necessary for later weight updates done by a PyTorch optimizer. A backward pass follows a forward pass." |
| 17 | + ] |
| 18 | + }, |
| 19 | + { |
| 20 | + "cell_type": "markdown", |
| 21 | + "metadata": {}, |
| 22 | + "source": [ |
| 23 | + "Firstly, let's import the required prerequisites:" |
| 24 | + ] |
| 25 | + }, |
3 | 26 | {
|
4 | 27 | "cell_type": "code",
|
5 | 28 | "execution_count": 1,
|
|
11 | 34 | "import torchvision.transforms as transforms\n",
|
12 | 35 | "import joey as ml\n",
|
13 | 36 | "import matplotlib.pyplot as plt\n",
|
14 | | - "import numpy as np" |
| 37 | + "import numpy as np\n", |
| 38 | + "import torch.nn as nn\n", |
| 39 | + "import torch.nn.functional as F\n", |
| 40 | + "import torch.optim as optim" |
| 41 | + ] |
| 42 | + }, |
| 43 | + { |
| 44 | + "cell_type": "markdown", |
| 45 | + "metadata": {}, |
| 46 | + "source": [ |
| 47 | + "Then, let's define `imshow()` allowing us to look at the training data we'll use for the backward pass." |
15 | 48 | ]
|
16 | 49 | },
|
17 | 50 | {
|
|
27 | 60 | " plt.show()"
|
28 | 61 | ]
|
29 | 62 | },
|
| 63 | + { |
| 64 | + "cell_type": "markdown", |
| 65 | + "metadata": {}, |
| 66 | + "source": [ |
| 67 | + "In this particular example, every training batch will have 4 images." |
| 68 | + ] |
| 69 | + }, |
30 | 70 | {
|
31 | 71 | "cell_type": "code",
|
32 | 72 | "execution_count": 3,
|
|
36 | 76 | "batch_size = 4"
|
37 | 77 | ]
|
38 | 78 | },
|
| 79 | + { |
| 80 | + "cell_type": "markdown", |
| 81 | + "metadata": {}, |
| 82 | + "source": [ |
| 83 | + "Once we have `imshow()` and `batch_size` defined, we'll download the MNIST images using PyTorch." |
| 84 | + ] |
| 85 | + }, |
39 | 86 | {
|
40 | 87 | "cell_type": "code",
|
41 | 88 | "execution_count": 4,
|
|
53 | 100 | "dataiter = iter(trainloader)"
|
54 | 101 | ]
|
55 | 102 | },
|
| 103 | + { |
| 104 | + "cell_type": "markdown", |
| 105 | + "metadata": {}, |
| 106 | + "source": [ |
| 107 | + "In our case, only one batch will be used for the backward pass. Joey accepts only NumPy arrays, so we have to convert PyTorch tensors to their NumPy equivalents first." |
| 108 | + ] |
| 109 | + }, |
56 | 110 | {
|
57 | 111 | "cell_type": "code",
|
58 | 112 | "execution_count": 5,
|
|
63 | 117 | "input_data = images.numpy()"
|
64 | 118 | ]
|
65 | 119 | },
|
| 120 | + { |
| 121 | + "cell_type": "markdown", |
| 122 | + "metadata": {}, |
| 123 | + "source": [ |
| 124 | + "For reference, let's have a look at our training data. There are 4 images corresponding to the following digits: 5, 0, 4, 1." |
| 125 | + ] |
| 126 | + }, |
66 | 127 | {
|
67 | 128 | "cell_type": "code",
|
68 | 129 | "execution_count": 6,
|
|
85 | 146 | "imshow(torchvision.utils.make_grid(images))"
|
86 | 147 | ]
|
87 | 148 | },
|
| 149 | + { |
| 150 | + "cell_type": "markdown", |
| 151 | + "metadata": {}, |
| 152 | + "source": [ |
| 153 | + "At this point, we're ready to define `backward_pass()` running the backward pass through Joey-constructed LeNet. We'll do so using the `Conv`, `MaxPooling`, `Flat`, `FullyConnected` and `FullyConnectedSoftmax` layer classes along with the `Net` class packing everything into one network we can interact with." |
| 154 | + ] |
| 155 | + }, |
| 156 | + { |
| 157 | + "cell_type": "markdown", |
| 158 | + "metadata": {}, |
| 159 | + "source": [ |
| 160 | + "Note that a loss function has to be defined manually. Joey doesn't provide any built-in options here at the moment." |
| 161 | + ] |
| 162 | + }, |
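| + { |
| + "cell_type": "markdown", |
| + "metadata": {}, |
| + "source": [ |
| + "For reference, the `loss_grad()` function defined inside `backward_pass()` below computes the standard gradient of categorical cross-entropy combined with softmax: if $p_i$ is the softmax output for class $i$ and $y_i$ is the one-hot label, the gradient with respect to the softmax input is $p_i - y_i$, i.e. the network output with 1 subtracted at the expected class." |
| + ] |
| + }, |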
88 | 163 | {
|
89 | 164 | "cell_type": "code",
|
90 | 165 | "execution_count": 7,
|
|
95 | 170 | " # Six 3x3 filters, activation RELU\n",
|
96 | 171 | " layer1 = ml.Conv(kernel_size=(6, 3, 3),\n",
|
97 | 172 | " input_size=(batch_size, 1, 32, 32),\n",
|
98 | | - " activation=ml.activation.ReLU(),\n", |
99 | | - " generate_code=False)\n", |
| 173 | + " activation=ml.activation.ReLU())\n", |
100 | 174 | " # Max 2x2 subsampling\n",
|
101 | 175 | " layer2 = ml.MaxPooling(kernel_size=(2, 2),\n",
|
102 | 176 | " input_size=(batch_size, 6, 30, 30),\n",
|
103 | | - " stride=(2, 2),\n", |
104 | | - " generate_code=False)\n", |
| 177 | + " stride=(2, 2))\n", |
105 | 178 | " # Sixteen 3x3 filters, activation RELU\n",
|
106 | 179 | " layer3 = ml.Conv(kernel_size=(16, 3, 3),\n",
|
107 | 180 | " input_size=(batch_size, 6, 15, 15),\n",
|
108 | | - " activation=ml.activation.ReLU(),\n", |
109 | | - " generate_code=False)\n", |
| 181 | + " activation=ml.activation.ReLU())\n", |
110 | 182 | " # Max 2x2 subsampling\n",
|
111 | 183 | " layer4 = ml.MaxPooling(kernel_size=(2, 2),\n",
|
112 | 184 | " input_size=(batch_size, 16, 13, 13),\n",
|
113 | 185 | " stride=(2, 2),\n",
|
114 | | - " strict_stride_check=False,\n", |
115 | | - " generate_code=False)\n", |
| 186 | + " strict_stride_check=False)\n", |
116 | 187 | " # Full connection (16 * 6 * 6 -> 120), activation RELU\n",
|
117 | 188 | " layer5 = ml.FullyConnected(weight_size=(120, 576),\n",
|
118 | 189 | " input_size=(576, batch_size),\n",
|
119 | | - " activation=ml.activation.ReLU(),\n", |
120 | | - " generate_code=False)\n", |
| 190 | + " activation=ml.activation.ReLU())\n", |
121 | 191 | " # Full connection (120 -> 84), activation RELU\n",
|
122 | 192 | " layer6 = ml.FullyConnected(weight_size=(84, 120),\n",
|
123 | 193 | " input_size=(120, batch_size),\n",
|
124 | | - " activation=ml.activation.ReLU(),\n", |
125 | | - " generate_code=False)\n", |
| 194 | + " activation=ml.activation.ReLU())\n", |
126 | 195 | " # Full connection (84 -> 10), output layer\n",
|
127 | 196 | " layer7 = ml.FullyConnectedSoftmax(weight_size=(10, 84),\n",
|
128 | | - " input_size=(84, batch_size),\n", |
129 | | - " generate_code=False)\n", |
| 197 | + " input_size=(84, batch_size))\n", |
130 | 198 | " # Flattening layer necessary between layer 4 and 5\n",
|
131 | | - " layer_flat = ml.Flat(input_size=(batch_size, 16, 6, 6),\n", |
132 | | - " generate_code=False)\n", |
| 199 | + " layer_flat = ml.Flat(input_size=(batch_size, 16, 6, 6))\n", |
133 | 200 | " \n",
|
134 | 201 | " layers = [layer1, layer2, layer3, layer4,\n",
|
135 | 202 | " layer_flat, layer5, layer6, layer7]\n",
|
136 | 203 | " \n",
|
137 | 204 | " net = ml.Net(layers)\n",
|
138 | 205 | " outputs = net.forward(input_data)\n",
|
139 | 206 | " \n",
|
140 | | - " def loss_grad(layer, b):\n", |
| 207 | + " def loss_grad(layer, expected):\n", |
141 | 208 | " gradients = []\n",
|
142 | 209 | " \n",
|
143 | | - " for i in range(10):\n", |
144 | | - " result = layer.result.data[i, b]\n", |
145 | | - " if i == expected_results[b]:\n", |
146 | | - " result -= 1\n", |
147 | | - " gradients.append(result)\n", |
| 210 | + " for b in range(batch_size):\n", |
| 211 | + " row = []\n", |
| 212 | + " for i in range(10):\n", |
| 213 | + " result = layer.result.data[i, b]\n", |
| 214 | + " if i == expected[b]:\n", |
| 215 | + " result -= 1\n", |
| 216 | + " row.append(result)\n", |
| 217 | + " gradients.append(row)\n", |
148 | 218 | " \n",
|
149 | 219 | " return gradients\n",
|
150 | 220 | " \n",
|
151 | | - " net.backward(loss_grad)\n", |
| 221 | + " net.backward(expected_results, loss_grad)\n", |
152 | 222 | " \n",
|
153 | 223 | " return (layer1, layer2, layer3, layer4, layer_flat, layer5, layer6, layer7)"
|
154 | 224 | ]
|
155 | 225 | },
|
| 226 | + { |
| 227 | + "cell_type": "markdown", |
| 228 | + "metadata": {}, |
| 229 | + "source": [ |
| 230 | + "Afterwards, we're ready to run the backward pass." |
| 231 | + ] |
| 232 | + }, |
156 | 233 | {
|
157 | 234 | "cell_type": "code",
|
158 | 235 | "execution_count": 8,
|
|
167 | 244 | "/home/maksymilian/Desktop/UROP/devito/devito/types/grid.py:206: RuntimeWarning: divide by zero encountered in true_divide\n",
|
168 | 245 | " spacing = (np.array(self.extent) / (np.array(self.shape) - 1)).astype(self.dtype)\n",
|
169 | 246 | "Operator `Kernel` run in 0.01 s\n",
|
170 | | - "Operator `Kernel` run in 0.01 s\n", |
171 | | - "Operator `Kernel` run in 0.01 s\n", |
172 | | - "Operator `Kernel` run in 0.01 s\n", |
173 | 247 | "Operator `Kernel` run in 0.01 s\n"
|
174 | 248 | ]
|
175 | 249 | }
|
|
182 | 256 | "cell_type": "markdown",
|
183 | 257 | "metadata": {},
|
184 | 258 | "source": [
|
185 | | - "PyTorch:" |
| 259 | + "Results are stored in the `kernel_gradients` and `bias_gradients` properties of each layer (where applicable)." |
186 | 260 | ]
|
187 | 261 | },
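| + { |
| + "cell_type": "markdown", |
| + "metadata": {}, |
| + "source": [ |
| + "As a quick sanity check, we could inspect those properties directly. The sketch below assumes the tuple returned by `backward_pass()` has been saved as `layers` and that each gradient object exposes a Devito-style `.data` array; both are illustrative assumptions rather than documented Joey API." |
| + ] |
| + }, |
| + { |
| + "cell_type": "code", |
| + "execution_count": null, |
| + "metadata": {}, |
| + "outputs": [], |
| + "source": [ |
| + "# Hypothetical inspection of the gradients computed by Joey;\n", |
| + "# `layers` and the `.data` accessor are assumptions, not confirmed API.\n", |
| + "for i, layer in enumerate(layers):\n", |
| + "    if hasattr(layer, 'kernel_gradients'):\n", |
| + "        print('layers[%d] kernel gradient shape:' % i,\n", |
| + "              layer.kernel_gradients.data.shape)" |
| + ] |
| + }, |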
|
188 | 262 | {
|
189 | | - "cell_type": "code", |
190 | | - "execution_count": 9, |
| 263 | + "cell_type": "markdown", |
191 | 264 | "metadata": {},
|
192 | | - "outputs": [], |
193 | 265 | "source": [
|
194 | | - "import torch.nn as nn\n", |
195 | | - "import torch.nn.functional as F\n", |
196 | | - "import torch.optim as optim" |
| 266 | + "In order to check the numerical correctness, we'll create the same network with PyTorch, run a backward pass through it using the same initial weights and data and compare the results with Joey's." |
| 267 | + ] |
| 268 | + }, |
| 269 | + { |
| 270 | + "cell_type": "markdown", |
| 271 | + "metadata": {}, |
| 272 | + "source": [ |
| 273 | + "Here's the PyTorch code:" |
197 | 274 | ]
|
198 | 275 | },
|
199 | 276 | {
|
200 | 277 | "cell_type": "code",
|
201 | | - "execution_count": 10, |
| 278 | + "execution_count": 9, |
202 | 279 | "metadata": {},
|
203 | 280 | "outputs": [],
|
204 | 281 | "source": [
|
|
230 | 307 | },
|
231 | 308 | {
|
232 | 309 | "cell_type": "code",
|
233 | | - "execution_count": 11, |
| 310 | + "execution_count": 10, |
234 | 311 | "metadata": {},
|
235 | 312 | "outputs": [],
|
236 | 313 | "source": [
|
|
252 | 329 | },
|
253 | 330 | {
|
254 | 331 | "cell_type": "code",
|
255 | | - "execution_count": 12, |
| 332 | + "execution_count": 11, |
256 | 333 | "metadata": {},
|
257 | 334 | "outputs": [],
|
258 | 335 | "source": [
|
|
263 | 340 | "loss.backward()"
|
264 | 341 | ]
|
265 | 342 | },
|
| 343 | + { |
| 344 | + "cell_type": "markdown", |
| 345 | + "metadata": {}, |
| 346 | + "source": [ |
| 347 | + "After running the backward pass in PyTorch, we're ready to make comparisons. Let's calculate relative errors between Joey and PyTorch in terms of weight/bias gradients." |
| 348 | + ] |
| 349 | + }, |
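| + { |
| + "cell_type": "markdown", |
| + "metadata": {}, |
| + "source": [ |
| + "The relative error is computed element-wise as $|g_{Joey} - g_{PyTorch}| / |g_{PyTorch}|$. Entries where both gradients are zero produce $0/0$, which is most likely what triggers the `invalid value encountered in true_divide` warnings in the output below." |
| + ] |
| + }, |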
266 | 350 | {
|
267 | 351 | "cell_type": "code",
|
268 | | - "execution_count": 13, |
| 352 | + "execution_count": 12, |
269 | 353 | "metadata": {},
|
270 | 354 | "outputs": [
|
271 | 355 | {
|
272 | 356 | "name": "stdout",
|
273 | 357 | "output_type": "stream",
|
274 | 358 | "text": [
|
275 | | - "layers[0] maximum relative error: 1.599673499123359e-14\n", |
276 | | - "layers[1] maximum relative error: 5.710234136667345e-12\n", |
277 | | - "layers[2] maximum relative error: 1.9638017195468526e-11\n", |
278 | | - "layers[3] maximum relative error: 1.8676488586249282e-11\n", |
279 | | - "layers[4] maximum relative error: 3.4692340371450744e-13\n", |
| 359 | + "layers[0] maximum relative error: 1.4935025269750558e-14\n", |
| 360 | + "layers[1] maximum relative error: 1.0457210947850931e-13\n", |
| 361 | + "layers[2] maximum relative error: 3.0920027811804816e-12\n", |
| 362 | + "layers[3] maximum relative error: 2.615895862310905e-13\n", |
| 363 | + "layers[4] maximum relative error: 1.4951643318957554e-12\n", |
280 | 364 | "\n",
|
281 | | - "Maximum relative error is in layers[2]: 1.9638017195468526e-11\n" |
| 365 | + "Maximum relative error is in layers[2]: 3.0920027811804816e-12\n" |
282 | 366 | ]
|
283 | 367 | },
|
284 | 368 | {
|
285 | 369 | "name": "stderr",
|
286 | 370 | "output_type": "stream",
|
287 | 371 | "text": [
|
288 | | - "<ipython-input-13-c5fd7a032cbe>:11: RuntimeWarning: invalid value encountered in true_divide\n", |
| 372 | + "<ipython-input-12-c5fd7a032cbe>:11: RuntimeWarning: invalid value encountered in true_divide\n", |
289 | 373 | " kernel_error = abs(kernel_grad - pytorch_kernel_grad) / abs(pytorch_kernel_grad)\n",
|
290 | | - "<ipython-input-13-c5fd7a032cbe>:16: RuntimeWarning: invalid value encountered in true_divide\n", |
| 374 | + "<ipython-input-12-c5fd7a032cbe>:16: RuntimeWarning: invalid value encountered in true_divide\n", |
291 | 375 | " bias_error = abs(bias_grad - pytorch_bias_grad) / abs(pytorch_bias_grad)\n"
|
292 | 376 | ]
|
293 | 377 | }
|
|
320 | 404 | "print()\n",
|
321 | 405 | "print('Maximum relative error is in layers[' + str(index) + ']: ' + str(max_error))"
|
322 | 406 | ]
|
| 407 | + }, |
| 408 | + { |
| 409 | + "cell_type": "markdown", |
| 410 | + "metadata": {}, |
| 411 | + "source": [ |
| 412 | + "As we can see, the maximum error is low enough (given floating-point calculation accuracy and the complexity of our network) for Joey's results to be considered correct." |
| 413 | + ] |
323 | 414 | }
|
324 | 415 | ],
|
325 | 416 | "metadata": {
|
|