|
1 | 1 | {
|
2 | 2 | "cells": [
|
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# Runinng a backward pass through LeNet using MNIST and Joey" |
| 8 | + ] |
| 9 | + }, |
| 10 | + { |
| 11 | + "cell_type": "markdown", |
| 12 | + "metadata": {}, |
| 13 | + "source": [ |
| 14 | + "In this notebook, we will construct LeNet using Joey and run a backward pass through it with some training data from MNIST.\n", |
| 15 | + "\n", |
| 16 | + "The aim of a backward pass is calculating gradients of all network parameters necessary for later weight updates done by a PyTorch optimizer. A backward pass follows a forward pass." |
| 17 | + ] |
| 18 | + }, |
| 19 | + { |
| 20 | + "cell_type": "markdown", |
| 21 | + "metadata": {}, |
| 22 | + "source": [ |
| 23 | + "Firstly, let's import the required prerequisites:" |
| 24 | + ] |
| 25 | + }, |
3 | 26 | {
|
4 | 27 | "cell_type": "code",
|
5 | 28 | "execution_count": 1,
|
|
11 | 34 | "import torchvision.transforms as transforms\n",
|
12 | 35 | "import joey as ml\n",
|
13 | 36 | "import matplotlib.pyplot as plt\n",
|
14 | | - "import numpy as np" |
| 37 | + "import numpy as np\n", |
| 38 | + "import torch.nn as nn\n", |
| 39 | + "import torch.nn.functional as F\n", |
| 40 | + "import torch.optim as optim" |
| 41 | + ] |
| 42 | + }, |
| 43 | + { |
| 44 | + "cell_type": "markdown", |
| 45 | + "metadata": {}, |
| 46 | + "source": [ |
| 47 | + "Then, let's define `imshow()` allowing us to look at the training data we'll use for the backward pass." |
15 | 48 | ]
|
16 | 49 | },
|
17 | 50 | {
|
|
27 | 60 | " plt.show()"
|
28 | 61 | ]
|
29 | 62 | },
|
| 63 | + { |
| 64 | + "cell_type": "markdown", |
| 65 | + "metadata": {}, |
| 66 | + "source": [ |
| 67 | + "In this particular example, every training batch will have 4 images." |
| 68 | + ] |
| 69 | + }, |
30 | 70 | {
|
31 | 71 | "cell_type": "code",
|
32 | 72 | "execution_count": 3,
|
|
36 | 76 | "batch_size = 4"
|
37 | 77 | ]
|
38 | 78 | },
|
| 79 | + { |
| 80 | + "cell_type": "markdown", |
| 81 | + "metadata": {}, |
| 82 | + "source": [ |
| 83 | + "Once we have `imshow()` and `batch_size` defined, we'll download the MNIST images using PyTorch." |
| 84 | + ] |
| 85 | + }, |
39 | 86 | {
|
40 | 87 | "cell_type": "code",
|
41 | 88 | "execution_count": 4,
|
|
53 | 100 | "dataiter = iter(trainloader)"
|
54 | 101 | ]
|
55 | 102 | },
|
| 103 | + { |
| 104 | + "cell_type": "markdown", |
| 105 | + "metadata": {}, |
| 106 | + "source": [ |
| 107 | + "In our case, only one batch will be used for the backward pass. Joey accepts only NumPy arrays, so we have to convert PyTorch tensors to their NumPy equivalents first." |
| 108 | + ] |
| 109 | + }, |
56 | 110 | {
|
57 | 111 | "cell_type": "code",
|
58 | 112 | "execution_count": 5,
|
|
63 | 117 | "input_data = images.numpy()"
|
64 | 118 | ]
|
65 | 119 | },
|
| 120 | + { |
| 121 | + "cell_type": "markdown", |
| 122 | + "metadata": {}, |
| 123 | + "source": [ |
| 124 | + "For reference, let's have a look at our training data. There are 4 images corresponding to the following digits: 5, 0, 4, 1." |
| 125 | + ] |
| 126 | + }, |
66 | 127 | {
|
67 | 128 | "cell_type": "code",
|
68 | 129 | "execution_count": 6,
|
|
85 | 146 | "imshow(torchvision.utils.make_grid(images))"
|
86 | 147 | ]
|
87 | 148 | },
|
| 149 | + { |
| 150 | + "cell_type": "markdown", |
| 151 | + "metadata": {}, |
| 152 | + "source": [ |
| 153 | + "At this point, we're ready to define `backward_pass()` running the backward pass through Joey-constructed LeNet. We'll do so using the `Conv`, `MaxPooling`, `Flat`, `FullyConnected` and `FullyConnectedSoftmax` layer classes along with the `Net` class packing everything into one network we can interact with." |
| 154 | + ] |
| 155 | + }, |
| 156 | + { |
| 157 | + "cell_type": "markdown", |
| 158 | + "metadata": {}, |
| 159 | + "source": [ |
| 160 | + "Note that a loss function has to be defined manually. Joey doesn't provide any built-in options here at the moment." |
| 161 | + ] |
| 162 | + }, |
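| + { |
| + "cell_type": "markdown", |
| + "metadata": {}, |
| + "source": [ |
| + "For reference, the `loss_grad()` function defined inside `backward_pass()` below computes the standard gradient of categorical cross-entropy combined with softmax: if $p_i$ is the softmax output for class $i$ and $y_i$ is the one-hot label, the gradient with respect to the softmax input is $p_i - y_i$, i.e. the network output with 1 subtracted at the expected class." |
| + ] |
| + }, |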
88 | 163 | {
|
89 | 164 | "cell_type": "code",
|
90 | 165 | "execution_count": 7,
|
|
95 | 170 | " # Six 3x3 filters, activation RELU\n",
|
96 | 171 | " layer1 = ml.Conv(kernel_size=(6, 3, 3),\n",
|
97 | 172 | " input_size=(batch_size, 1, 32, 32),\n",
|
98 | | - " activation=ml.activation.ReLU(),\n", |
99 | | - " generate_code=False)\n", |
| 173 | + " activation=ml.activation.ReLU())\n", |
100 | 174 | " # Max 2x2 subsampling\n",
|
101 | 175 | " layer2 = ml.MaxPooling(kernel_size=(2, 2),\n",
|
102 | 176 | " input_size=(batch_size, 6, 30, 30),\n",
|
103 | | - " stride=(2, 2),\n", |
104 | | - " generate_code=False)\n", |
| 177 | + " stride=(2, 2))\n", |
105 | 178 | " # Sixteen 3x3 filters, activation RELU\n",
|
106 | 179 | " layer3 = ml.Conv(kernel_size=(16, 3, 3),\n",
|
107 | 180 | " input_size=(batch_size, 6, 15, 15),\n",
|
108 | | - " activation=ml.activation.ReLU(),\n", |
109 | | - " generate_code=False)\n", |
| 181 | + " activation=ml.activation.ReLU())\n", |
110 | 182 | " # Max 2x2 subsampling\n",
|
111 | 183 | " layer4 = ml.MaxPooling(kernel_size=(2, 2),\n",
|
112 | 184 | " input_size=(batch_size, 16, 13, 13),\n",
|
113 | 185 | " stride=(2, 2),\n",
|
114 | | - " strict_stride_check=False,\n", |
115 | | - " generate_code=False)\n", |
| 186 | + " strict_stride_check=False)\n", |
116 | 187 | " # Full connection (16 * 6 * 6 -> 120), activation RELU\n",
|
117 | 188 | " layer5 = ml.FullyConnected(weight_size=(120, 576),\n",
|
118 | 189 | " input_size=(576, batch_size),\n",
|
119 | | - " activation=ml.activation.ReLU(),\n", |
120 | | - " generate_code=False)\n", |
| 190 | + " activation=ml.activation.ReLU())\n", |
121 | 191 | " # Full connection (120 -> 84), activation RELU\n",
|
122 | 192 | " layer6 = ml.FullyConnected(weight_size=(84, 120),\n",
|
123 | 193 | " input_size=(120, batch_size),\n",
|
124 | | - " activation=ml.activation.ReLU(),\n", |
125 | | - " generate_code=False)\n", |
| 194 | + " activation=ml.activation.ReLU())\n", |
126 | 195 | " # Full connection (84 -> 10), output layer\n",
|
127 | 196 | " layer7 = ml.FullyConnectedSoftmax(weight_size=(10, 84),\n",
|
128 | | - " input_size=(84, batch_size),\n", |
129 | | - " generate_code=False)\n", |
| 197 | + " input_size=(84, batch_size))\n", |
130 | 198 | " # Flattening layer necessary between layer 4 and 5\n",
|
131 | | - " layer_flat = ml.Flat(input_size=(batch_size, 16, 6, 6),\n", |
132 | | - " generate_code=False)\n", |
| 199 | + " layer_flat = ml.Flat(input_size=(batch_size, 16, 6, 6))\n", |
133 | 200 | " \n",
|
134 | 201 | " layers = [layer1, layer2, layer3, layer4,\n",
|
135 | 202 | " layer_flat, layer5, layer6, layer7]\n",
|
136 | 203 | " \n",
|
137 | 204 | " net = ml.Net(layers)\n",
|
138 | 205 | " outputs = net.forward(input_data)\n",
|
139 | 206 | " \n",
|
140 | | - " def loss_grad(layer, b):\n", |
| 207 | + " def loss_grad(layer, expected):\n", |
141 | 208 | " gradients = []\n",
|
142 | 209 | " \n",
|
143 | | - " for i in range(10):\n", |
144 | | - " result = layer.result.data[i, b]\n", |
145 | | - " if i == expected_results[b]:\n", |
146 | | - " result -= 1\n", |
147 | | - " gradients.append(result)\n", |
| 210 | + " for b in range(batch_size):\n", |
| 211 | + " row = []\n", |
| 212 | + " for i in range(10):\n", |
| 213 | + " result = layer.result.data[i, b]\n", |
| 214 | + " if i == expected[b]:\n", |
| 215 | + " result -= 1\n", |
| 216 | + " row.append(result)\n", |
| 217 | + " gradients.append(row)\n", |
148 | 218 | " \n",
|
149 | 219 | " return gradients\n",
|
150 | 220 | " \n",
|
151 | | - " net.backward(loss_grad)\n", |
| 221 | + " net.backward(expected_results, loss_grad)\n", |
152 | 222 | " \n",
|
153 | 223 | " return (layer1, layer2, layer3, layer4, layer_flat, layer5, layer6, layer7)"
|
154 | 224 | ]
|
155 | 225 | },
|
| 226 | + { |
| 227 | + "cell_type": "markdown", |
| 228 | + "metadata": {}, |
| 229 | + "source": [ |
| 230 | + "Afterwards, we're ready to run the backward pass." |
| 231 | + ] |
| 232 | + }, |
156 | 233 | {
|
157 | 234 | "cell_type": "code",
|
158 | 235 | "execution_count": 8,
|
|
167 | 244 | "/home/maksymilian/Desktop/UROP/devito/devito/types/grid.py:206: RuntimeWarning: divide by zero encountered in true_divide\n",
|
168 | 245 | " spacing = (np.array(self.extent) / (np.array(self.shape) - 1)).astype(self.dtype)\n",
|
169 | 246 | "Operator `Kernel` run in 0.01 s\n",
|
170 | | - "Operator `Kernel` run in 0.01 s\n", |
171 | | - "Operator `Kernel` run in 0.01 s\n", |
172 | | - "Operator `Kernel` run in 0.01 s\n", |
173 | 247 | "Operator `Kernel` run in 0.01 s\n"
|
174 | 248 | ]
|
175 | 249 | }
|
|
182 | 256 | "cell_type": "markdown",
|
183 | 257 | "metadata": {},
|
184 | 258 | "source": [
|
185 | | - "PyTorch:" |
| 259 | + "Results are stored in the `kernel_gradients` and `bias_gradients` properties of each layer (where applicable)." |
186 | 260 | ]
|
187 | 261 | },
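| + { |
| + "cell_type": "markdown", |
| + "metadata": {}, |
| + "source": [ |
| + "As a quick sanity check, we could inspect those properties directly. The sketch below assumes the tuple returned by `backward_pass()` has been saved as `layers` and that each gradient object exposes a Devito-style `.data` array; both are illustrative assumptions rather than documented Joey API." |
| + ] |
| + }, |
| + { |
| + "cell_type": "code", |
| + "execution_count": null, |
| + "metadata": {}, |
| + "outputs": [], |
| + "source": [ |
| + "# Hypothetical inspection of the gradients computed by Joey;\n", |
| + "# `layers` and the `.data` accessor are assumptions, not confirmed API.\n", |
| + "for i, layer in enumerate(layers):\n", |
| + "    if hasattr(layer, 'kernel_gradients'):\n", |
| + "        print('layers[%d] kernel gradient shape:' % i,\n", |
| + "              layer.kernel_gradients.data.shape)" |
| + ] |
| + }, |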
|
188 | 262 | {
|
189 | | - "cell_type": "code", |
190 | | - "execution_count": 9, |
| 263 | + "cell_type": "markdown", |
191 | 264 | "metadata": {},
|
192 | | - "outputs": [], |
193 | 265 | "source": [
|
194 | | - "import torch.nn as nn\n", |
195 | | - "import torch.nn.functional as F\n", |
196 | | - "import torch.optim as optim" |
| 266 | + "In order to check the numerical correctness, we'll create the same network with PyTorch, run a backward pass through it using the same initial weights and data and compare the results with Joey's." |
| 267 | + ] |
| 268 | + }, |
| 269 | + { |
| 270 | + "cell_type": "markdown", |
| 271 | + "metadata": {}, |
| 272 | + "source": [ |
| 273 | + "Here's the PyTorch code:" |
197 | 274 | ]
|
198 | 275 | },
|
199 | 276 | {
|
200 | 277 | "cell_type": "code",
|
201 | | - "execution_count": 10, |
| 278 | + "execution_count": 9, |
202 | 279 | "metadata": {},
|
203 | 280 | "outputs": [],
|
204 | 281 | "source": [
|
|
230 | 307 | },
|
231 | 308 | {
|
232 | 309 | "cell_type": "code",
|
233 | | - "execution_count": 11, |
| 310 | + "execution_count": 10, |
234 | 311 | "metadata": {},
|
235 | 312 | "outputs": [],
|
236 | 313 | "source": [
|
|
252 | 329 | },
|
253 | 330 | {
|
254 | 331 | "cell_type": "code",
|
255 | | - "execution_count": 12, |
| 332 | + "execution_count": 11, |
256 | 333 | "metadata": {},
|
257 | 334 | "outputs": [],
|
258 | 335 | "source": [
|
|
263 | 340 | "loss.backward()"
|
264 | 341 | ]
|
265 | 342 | },
|
| 343 | + { |
| 344 | + "cell_type": "markdown", |
| 345 | + "metadata": {}, |
| 346 | + "source": [ |
| 347 | + "After running the backward pass in PyTorch, we're ready to make comparisons. Let's calculate relative errors between Joey and PyTorch in terms of weight/bias gradients." |
| 348 | + ] |
| 349 | + }, |
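| + { |
| + "cell_type": "markdown", |
| + "metadata": {}, |
| + "source": [ |
| + "The relative error is computed element-wise as $|g_{Joey} - g_{PyTorch}| / |g_{PyTorch}|$. Entries where both gradients are zero produce $0/0$, which is most likely what triggers the `invalid value encountered in true_divide` warnings in the output below." |
| + ] |
| + }, |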
266 | 350 | {
|
267 | 351 | "cell_type": "code",
|
268 | | - "execution_count": 13, |
| 352 | + "execution_count": 12, |
269 | 353 | "metadata": {},
|
270 | 354 | "outputs": [
|
271 | 355 | {
|
272 | 356 | "name": "stdout",
|
273 | 357 | "output_type": "stream",
|
274 | 358 | "text": [
|
275 | | - "layers[0] maximum relative error: 1.599673499123359e-14\n", |
276 | | - "layers[1] maximum relative error: 5.710234136667345e-12\n", |
277 | | - "layers[2] maximum relative error: 1.9638017195468526e-11\n", |
278 | | - "layers[3] maximum relative error: 1.8676488586249282e-11\n", |
279 | | - "layers[4] maximum relative error: 3.4692340371450744e-13\n", |
| 359 | + "layers[0] maximum relative error: 1.4935025269750558e-14\n", |
| 360 | + "layers[1] maximum relative error: 1.0457210947850931e-13\n", |
| 361 | + "layers[2] maximum relative error: 3.0920027811804816e-12\n", |
| 362 | + "layers[3] maximum relative error: 2.615895862310905e-13\n", |
| 363 | + "layers[4] maximum relative error: 1.4951643318957554e-12\n", |
280 | 364 | "\n",
|
281 | | - "Maximum relative error is in layers[2]: 1.9638017195468526e-11\n" |
| 365 | + "Maximum relative error is in layers[2]: 3.0920027811804816e-12\n" |
282 | 366 | ]
|
283 | 367 | },
|
284 | 368 | {
|
285 | 369 | "name": "stderr",
|
286 | 370 | "output_type": "stream",
|
287 | 371 | "text": [
|
288 | | - "<ipython-input-13-c5fd7a032cbe>:11: RuntimeWarning: invalid value encountered in true_divide\n", |
| 372 | + "<ipython-input-12-c5fd7a032cbe>:11: RuntimeWarning: invalid value encountered in true_divide\n", |
289 | 373 | " kernel_error = abs(kernel_grad - pytorch_kernel_grad) / abs(pytorch_kernel_grad)\n",
|
290 | | - "<ipython-input-13-c5fd7a032cbe>:16: RuntimeWarning: invalid value encountered in true_divide\n", |
| 374 | + "<ipython-input-12-c5fd7a032cbe>:16: RuntimeWarning: invalid value encountered in true_divide\n", |
291 | 375 | " bias_error = abs(bias_grad - pytorch_bias_grad) / abs(pytorch_bias_grad)\n"
|
292 | 376 | ]
|
293 | 377 | }
|
|
320 | 404 | "print()\n",
|
321 | 405 | "print('Maximum relative error is in layers[' + str(index) + ']: ' + str(max_error))"
|
322 | 406 | ]
|
| 407 | + }, |
| 408 | + { |
| 409 | + "cell_type": "markdown", |
| 410 | + "metadata": {}, |
| 411 | + "source": [ |
| 412 | + "As we can see, the maximum error is low enough (given floating-point calculation accuracy and the complexity of our network) for Joey's results to be considered correct." |
| 413 | + ] |
323 | 414 | }
|
324 | 415 | ],
|
325 | 416 | "metadata": {
|
|