"What I cannot create, I do not understand" -Richard Feynman

Feed-Forward Neural Network

A feed-forward neural network built from scratch (i.e., using only NumPy, no TensorFlow or PyTorch) as an exercise to learn machine learning.

Background

There are a lot of tutorials on how to build a neural network from scratch, but I challenged myself to come up with my own implementation. I watched 3Blue1Brown's series on neural networks and referenced Michael Nielsen's online textbook to get a theoretical understanding of NNs.

Implementation Details

At the moment, hyperparameters such as the number of layers, the number of neurons in each layer, and the activation functions are hardcoded. I might parameterize them later, but my main goal for this project was to learn deep learning, so I'm content with where it is right now. I tested the model on the classic optical character recognition problem using the MNIST handwritten digits dataset and achieved 86.39% test accuracy with the following architecture (a code sketch follows the list):

  • Input layer: 784 neurons
  • 1st hidden layer: 256 neurons, ReLU
  • 2nd hidden layer: 128 neurons, ReLU
  • 3rd hidden layer: 64 neurons, ReLU
  • 4th hidden layer: 32 neurons, ReLU
  • Output layer: 10 neurons, Softmax
  • Loss function: Categorical Cross-Entropy
  • Optimizer: Stochastic Gradient Descent
  • Other hyperparameters:
    • Batch size: 0.3% of the training set (≈180 images for MNIST's 60,000 training examples)
    • Epochs: 2
    • Learning rate scheduling: Reduce on Plateau
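
For concreteness, here is a minimal NumPy sketch of a forward pass through this architecture. The names (`forward`, `params`) and the He-normal initialization shown are illustrative assumptions, not the repo's actual interfaces:

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [784, 256, 128, 64, 32, 10]

# He-normal initialization (listed under Features below): weights drawn
# with std sqrt(2 / fan_in), biases start at zero.
params = [
    (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max so np.exp can't overflow
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x, params):
    """Map a (batch, 784) array of flattened MNIST images to (batch, 10) class probabilities."""
    a = x
    for W, b in params[:-1]:
        a = relu(a @ W + b)    # four ReLU hidden layers
    W, b = params[-1]
    return softmax(a @ W + b)  # softmax output layer

probs = forward(rng.random((32, 784)), params)  # dummy batch
assert probs.shape == (32, 10)
```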

Features

  • Layers
    • Dense
  • Optimizers
    • Stochastic Gradient Descent (mini-batch)
  • Initializers
    • Random Uniform
    • Xavier Normal
    • He Normal
  • Activation Functions
    • ReLU
    • Sigmoid
    • Softmax
  • Loss Functions
    • Mean Squared Error
    • Categorical Cross-Entropy
  • Learning rate scheduling
    • Step decay
    • Exponential decay
    • Reduce on Plateau
  • Other optimization techniques
    • Gradient clipping
    • Early stopping with best-model checkpointing
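
To make two of these concrete, below is a short sketch of global-norm gradient clipping and a Reduce-on-Plateau schedule. The names and defaults (clip_gradients, ReduceOnPlateau, factor 0.5, patience 3) are assumptions for illustration, not the repo's actual code:

```python
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale all gradients if their combined L2 norm exceeds max_norm."""
    total = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    if total > max_norm:
        grads = [g * (max_norm / total) for g in grads]
    return grads

class ReduceOnPlateau:
    """Multiply the learning rate by `factor` after `patience` epochs
    with no improvement in validation loss."""
    def __init__(self, lr=0.1, factor=0.5, patience=3):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr
```

In this sketch, calling step(val_loss) once per epoch lowers the learning rate whenever the validation loss stops improving, which pairs naturally with early stopping.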

Goals

  • Once the implementation is done, test it on the MNIST handwritten digits dataset and reach a relatively high accuracy (anything >80%)

    • Update: DONE, achieved 86.39% test accuracy
  • Explore some fun deep learning techniques (learning rate schedules, early stopping, different gradient descent variations, hyperparameter tuning, different initialization methods, etc.).

    • Update: DONE, tried various LR schedules, early stopping methods, initialization methods, activation functions, and model architectures. I did not implement any other optimizers, though (e.g., Adam)

Next Steps

  • Refactor the model architecture to support a flexible number of layers and neurons, and let the user select features through a CLI
  • Implement Adam optimizer, SGD with momentum, and Dropout regularization
  • (Reach goal) Build a web app to demo the model
