Welcome to BasicGrad, a minimalist autograd engine inspired by the famous micrograd project by Andrej Karpathy. This library implements a dynamic computational graph with automatic differentiation in just a few lines of Python code. It's designed for educational purposes, making it easy to understand the core concepts behind deep learning frameworks like PyTorch.
- Added support for the `Variable` class, the `Operation` class, `AddOperation`, `MulOperation`, and `ReLUActivation`
- Added forward and backward passes for the above operations
- Added an MSE loss and an SGD optimizer (a hand-rolled scalar version is sketched after this list)
- Added support for a linear layer
- Add support for more activation functions
- Add support for more loss functions
- Add support for more optimizers
- Add support for more layers
- Add support for more operations
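The loss, optimizer, and layer items above can also be reproduced by hand with nothing more than the scalar `Variable` API shown later in this README. The sketch below is an illustration only: the built-in MSE loss, SGD optimizer, and linear layer have their own interfaces, which are not documented here, and the sketch assumes `data` and `grad` are plain Python numbers.

```python
from diff import Variable

# A one-input "linear layer" built from two scalar parameters.
w = Variable(0.5)
b = Variable(0.0)

# One training example: input x and target value.
x = Variable(3.0)
target = Variable(2.0)

# Forward pass: prediction and a hand-rolled squared error (MSE on one sample).
pred = w * x + b
error = pred + target * Variable(-1.0)  # pred - target, using only + and *
loss = error * error

# Backward pass populates .grad on every Variable in the graph.
loss.backward()

# One SGD step: nudge each parameter against its gradient.
lr = 0.1
w.data -= lr * w.grad
b.data -= lr * b.grad
```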
BasicGrad is designed around the concept of chaining variables through operations, building a dynamic computational graph that can be used to compute gradients for any scalar output. Here's a breakdown of how it all works:
- Core Idea: In BasicGrad, every operation (like addition or multiplication) creates a new `Variable` object. These operations chain together to form a computational graph, where each node (a `Variable`) knows about its predecessors, and each operation attaches its specific backward function to the result.
- Why It Matters: This chaining keeps track of the relationships between variables, which is essential for backpropagation and gradient computation.
- Manual Gradient Calculation: Each operation in BasicGrad has a specific derivative (or gradient) that must be implemented by hand. For instance, addition simply passes the incoming gradient along to both operands, while multiplication scales it by the value of the other operand.
- Flexible and Extensible: This design makes BasicGrad highly flexible, allowing you to easily extend it with new operations by defining their forward pass and corresponding backward (gradient) logic.
- Topological Order: To compute the gradients, BasicGrad first constructs a topological ordering of the nodes in the graph, ensuring that each node is processed only after its dependencies (see the sketch below).
- Reverse Pass: Backpropagation then works by traversing this order in reverse, applying the chain rule to compute the gradient of each node (variable) with respect to the final output.
- Gradient Accumulation: During backpropagation, gradients are accumulated at each node. For example, if a variable contributes to multiple downstream operations, its gradient is the sum of its contributions to those operations.
- Automatic Differentiation: By leveraging the computational graph, BasicGrad automates the differentiation process, allowing you to easily compute gradients for any scalar output.
- Layer by Layer: With these core principles, you can start building neural network layers, chaining them together using the same operations and backpropagation logic.
- Activation Functions: Implementing activation functions as operations allows them to seamlessly integrate into the graph, contributing to the gradient flow just like any other operation.
- Dynamic Computational Graph: Build and manage computational graphs on-the-fly with automatic differentiation.
- Modular Operations: Easily extendable with new operations. The current implementation supports addition (`+`) and multiplication (`*`), but you can add more by following the simple `Operation` interface.
- Operator Overloading: Perform operations using Python's native arithmetic operators, making the code intuitive and easy to use.
- Backpropagation: Efficiently compute gradients for scalar outputs using the backpropagation algorithm.
- Pythonic and Elegant: Written in a clean and modular style, leveraging Python’s object-oriented and functional capabilities.
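The topological-order, reverse-pass, and gradient-accumulation points above map almost directly onto code. The following is a minimal sketch of that traversal, not a copy of BasicGrad's actual `backward()`; it assumes each `Variable` records its predecessors in `_prev` and stores the closure returned by `_build_backward_function` as `_backward` (both are described later in this README).

```python
class Variable:
    # ... construction and operator overloading omitted ...

    def backward(self):
        # Topological order: every node is appended only after all of its
        # predecessors, so dependencies always come first in the list.
        order, visited = [], set()

        def build(node):
            if node not in visited:
                visited.add(node)
                for parent in node._prev:
                    build(parent)
                order.append(node)

        build(self)

        # Reverse pass: seed d(out)/d(out) = 1, then let each node's stored
        # closure apply the chain rule. The += inside those closures is what
        # accumulates gradients when a Variable feeds several operations.
        self.grad = 1
        for node in reversed(order):
            node._backward()
```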
You can clone this repository and use it directly. No dependencies are required!
```bash
git clone https://github.com/paulilioaica/BasicGrad
cd BasicGrad
```
Here's a simple example to get you started:
```python
from diff import Variable

# Create variables
a = Variable(2)
b = Variable(3)

# Perform operations
c = a + b
d = a * b

# Perform backpropagation (gradients accumulate across both calls)
c.backward()
d.backward()

# Inspect gradients
print(a)  # Variable(data=2, grad=4)  -> dc/da + dd/da = 1 + 3
print(b)  # Variable(data=3, grad=3)  -> dc/db + dd/db = 1 + 2
print(c)  # Variable(data=5, grad=1)
print(d)  # Variable(data=6, grad=1)
```
The `Variable` class represents a node in the computational graph. Each `Variable` holds a value (`data`), a gradient (`grad`), and references to its predecessors in the graph (`_prev`). The class handles:
- Operator Overloading: Through methods like `__add__` and `__mul__`, which enable Pythonic arithmetic operations.
- Backpropagation: Using the `backward()` method, which computes the gradient for each `Variable` by traversing the computational graph in reverse order.
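As a rough mental model (not a line-for-line copy of the source), the class could be organized as shown below. The constructor signature and the `lambda: None` default are assumptions; the `_apply` helper matches the dispatch method used in the extension example further down, and the `backward()` traversal itself was sketched earlier.

```python
class Variable:
    def __init__(self, data, _prev=()):
        self.data = data                # scalar value
        self.grad = 0                   # accumulated gradient, filled in by backward()
        self._prev = _prev              # predecessor Variables in the graph
        self._backward = lambda: None   # per-node chain-rule closure

    def _apply(self, operation, other):
        # Run the operation's forward pass, wrap the result in a new Variable,
        # and attach the backward closure built by the operation.
        data, parents = operation.forward(self, other)
        out = Variable(data, _prev=parents)
        out._backward = operation._build_backward_function(self, other, out)
        return out

    def __add__(self, other):
        return self._apply(AddOperation(), other)

    def __mul__(self, other):
        return self._apply(MulOperation(), other)
```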
The `Operation` class is an abstract base class that defines how each operation is applied and how its backward pass is handled. Two concrete implementations are provided:
- AddOperation: Handles addition and the corresponding gradient calculations.
- MulOperation: Handles multiplication and the corresponding gradient calculations.
Want to add more operations? Simply extend the `Operation` class and define the `forward` and `_build_backward_function` methods. Here's a quick example of how you might add a subtraction operation:
```python
class SubOperation(Operation):
    def forward(self, a, b):
        return a.data - b.data, (a, b)

    def _build_backward_function(self, a, b, out):
        def _backward():
            # d(a - b)/da = 1, d(a - b)/db = -1
            a.grad += out.grad
            b.grad -= out.grad
        return _backward
```
Then, integrate it into the `Variable` class:
```python
class Variable:
    # Existing methods...

    def __sub__(self, other):
        return self._apply(SubOperation(), other)
```
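With `__sub__` wired up, subtraction participates in the graph like any other operation. Given the gradients defined above, a quick check should behave roughly like this:

```python
x = Variable(5)
y = Variable(2)

z = x - y      # dispatches to SubOperation via __sub__
z.backward()

print(x)       # grad should be  1, since d(x - y)/dx = 1
print(y)       # grad should be -1, since d(x - y)/dy = -1
print(z)       # Variable(data=3, grad=1)
```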
The same applies to activation functions: simply extend the `ActivationFunction` class and define the `forward` and `_build_backward_function` methods.
```python
class ReLUActivation(ActivationFunction):
    def forward(self, input):
        return max(0, input.data), input

    def _build_backward_function(self, input, out):
        def _backward():
            # Pass the gradient through only where the ReLU output is positive
            input.grad += (out.data > 0) * out.grad
        return _backward
```
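To make the activation usable from a `Variable`, you would expose it as a method, much like `__sub__` above. The actual wiring in BasicGrad is not shown here, so the helper below is a hypothetical sketch that reuses the constructor assumptions from the `Variable` sketch earlier:

```python
class Variable:
    # Existing methods...

    def relu(self):
        # Hypothetical single-operand dispatch, analogous to _apply:
        # run the activation's forward pass, wrap the result, and attach
        # the backward closure it builds.
        activation = ReLUActivation()
        data, parent = activation.forward(self)
        out = Variable(data, _prev=(parent,))
        out._backward = activation._build_backward_function(self, out)
        return out
```

With that in place, `x.relu()` chains into the graph and contributes to the gradient flow like any other operation.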
BasicGrad is a simplified and educational framework that offers a glimpse into how deep learning libraries work under the hood. It's a great tool for:
- Learning: Understand the basics of automatic differentiation and computational graphs.
- Prototyping: Quickly experiment with new ideas and custom operations.
- Teaching: Use it as a teaching aid to explain concepts in a clear and concise manner.
BasicGrad is just the beginning! Here are some exciting features planned for future development:
- Complete Set of Operations: Implement a comprehensive range of mathematical operations, including subtraction, division, exponentiation, and more.
- Activation Functions: Add support for common activation functions such as ReLU, Sigmoid, Tanh, and more, to enable the construction of neural networks.
- Neural Network Architectures: Build foundational tools for constructing various neural network architectures, including fully connected layers, convolutional layers, and others.
- Graph Visualization: Develop tools to visualize computational graphs, making it easier to understand the structure of complex models and track gradients during backpropagation.
This project draws inspiration from Andrej Karpathy's micrograd project, which implements a tiny autograd engine in just 100 lines of code. BasicGrad takes the concepts from micrograd and adds additional modularity and extensibility.
Contributions are welcome! Whether it's adding new operations, improving documentation, or optimizing the code, your help is appreciated. Feel free to open issues or submit pull requests.
This project is licensed under the MIT License. See the LICENSE file for details.