October, 2023

Neural Network from scratch

machine learning
ai
fully connected
neural network
python
numpy

Overview

Created a fully connected neural network from scratch in Python using NumPy. Hyperparameters are fully configurable by the user, who can set the number of layers, the number of neurons per layer, the number of inputs and outputs, and more.

This project's source code is available on

Neural Networks

This project implements a subset of neural networks (feedforward, fully connected networks) using perceptrons as the units of computation and learning. A perceptron is a unit that takes n inputs, computes a linear combination of them, feeds the result through an activation function, and produces a single output. A representation of a perceptron is presented in Image 1.

Image 1: Graphical representation of a Perceptron.


The linear combination is computed using weights, which are learnable parameters of the perceptron, and a bias (also a learnable parameter) is added to the result. The bias can be seen either as a weight multiplying a fixed input of 1 or as a number added to the linear combination. "Learning" in a perceptron means adjusting the weights and the bias to minimize the error between the perceptron's output and a target output for a given set of inputs.
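
As a rough sketch (not the project's exact code; the function and variable names here are made up for illustration), a perceptron's forward pass can be written in a few lines of NumPy:

    import numpy as np

    def perceptron_forward(x, w, b):
        """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
        z = np.dot(w, x) + b                 # linear combination plus bias
        return 1.0 / (1.0 + np.exp(-z))      # sigmoid activation

    # Example with n = 3 inputs
    x = np.array([0.5, -1.0, 2.0])
    w = np.array([0.1, 0.4, -0.2])           # learnable weights
    b = 0.05                                 # learnable bias
    print(perceptron_forward(x, w, b))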

The role of the activation function is to add nonlinearity to the behavior of the perceptron. Functions such as the sigmoid, the rectified linear unit (ReLU), and the hyperbolic tangent are widely used as activations. While a single perceptron can learn simple problems, a combination of multiple perceptrons is needed to learn complex nonlinear functions. Perceptrons can be stacked vertically (one layer with multiple perceptrons) and horizontally (multiple layers, where the output of the previous layer is fed as input to the next) to form networks with varying architectures. A visual representation of a multilayer network like the one described is shown in Image 2.
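
For reference, the textbook definitions of these activations and their derivatives (the derivatives become relevant for backpropagation later) can be written in NumPy as follows; these are standard forms, not necessarily the exact ones used in the project:

    import numpy as np

    # Common activation functions and their derivatives.
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_prime(z):
        s = sigmoid(z)
        return s * (1.0 - s)

    def relu(z):
        return np.maximum(0.0, z)

    def relu_prime(z):
        return (z > 0).astype(float)

    def tanh(z):
        return np.tanh(z)

    def tanh_prime(z):
        return 1.0 - np.tanh(z) ** 2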

Image 2: Feedforward multi-layer neural network.


The network depicted above consists of fully connected layers, meaning that each output of a layer is fed to every input of the next layer. Each large circle is a full perceptron as previously explained. This is the variation of neural networks considered within the scope of this project. The first layer is usually referred to as the input layer and consists of the inputs being fed into the network. Subsequent layers are called hidden layers, except for the last one, which is referred to as the output layer. In this case the network has n inputs and a single output, but the implementation is configurable to accommodate any number of inputs and outputs.

Gradient Descent & Backpropagation

The basis for minimizing the error (or fitting the network) is gradient descent. The network starts training from an arbitrary state, and gradient descent is the search algorithm that finds a set of weights that minimizes the error by iteratively updating the weights by a small amount in the direction opposite to the gradient of the error surface. One variant is stochastic gradient descent, where instead of computing the error over all training examples before updating the weights, the error is computed and the weights are updated for each individual sample.
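
As an illustration of a single stochastic gradient descent step, here is a minimal NumPy sketch for one linear unit with squared error; the names and the learning-rate value are placeholders, not taken from the project:

    import numpy as np

    def sgd_step(w, b, x, target, lr=0.1):
        """One stochastic gradient descent update for a single linear unit
        with squared error E = 0.5 * (y - target) ** 2."""
        y = np.dot(w, x) + b      # forward pass
        err = y - target          # dE/dy
        w = w - lr * err * x      # dE/dw = err * x; step against the gradient
        b = b - lr * err          # dE/db = err
        return w, b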

Gradient descent provides a direct way of optimizing the weights of a single layer, but in a multilayer network we need a way to propagate the weight updates to all layers. Backpropagation is an algorithm that propagates the error backward through the network so that gradient descent can update the weights of every layer, with the goal of minimizing the error between the target values and the actual outputs.
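
The sketch below illustrates the idea for a two-layer network with sigmoid activations and squared error: the output-layer delta is computed first and then propagated backward to obtain the hidden-layer delta, after which each layer's weights are updated by gradient descent. It is a simplified illustration, not the project's implementation:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def backprop_step(x, target, W1, b1, W2, b2, lr=0.1):
        """One backpropagation update for a two-layer network with sigmoid
        activations and squared error (illustration only)."""
        # Forward pass, keeping the intermediate activations
        a1 = sigmoid(W1 @ x + b1)                  # hidden layer output
        a2 = sigmoid(W2 @ a1 + b2)                 # network output

        # Backward pass: output-layer delta, then the hidden-layer delta
        delta2 = (a2 - target) * a2 * (1.0 - a2)
        delta1 = (W2.T @ delta2) * a1 * (1.0 - a1)

        # Gradient descent updates for each layer's weights and biases
        W2 -= lr * np.outer(delta2, a1)
        b2 -= lr * delta2
        W1 -= lr * np.outer(delta1, x)
        b1 -= lr * delta1
        return W1, b1, W2, b2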

Implementation

The implementation uses object-oriented programming in Python. A class named FCLayer represents a fully connected layer, including its weights and biases, the activation function, and the derivative of the activation function, which is used for backpropagation. A class named NN represents the neural network and stores a list of layers. It provides methods for training the network using backpropagation and for performing inference.
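
A skeleton of how these classes might be organized is shown below; FCLayer and NN are the actual class names, but the attributes, method names, and signatures are assumptions for illustration:

    import numpy as np

    class FCLayer:
        """Fully connected layer holding weights, biases, the activation
        function and its derivative. Attribute and method names are
        illustrative, not necessarily the project's exact API."""
        def __init__(self, n_inputs, n_neurons, activation, activation_prime):
            self.W = np.random.randn(n_neurons, n_inputs) * 0.1
            self.b = np.zeros(n_neurons)
            self.activation = activation
            self.activation_prime = activation_prime

        def forward(self, x):
            self.z = self.W @ x + self.b      # cached for the backward pass
            return self.activation(self.z)

    class NN:
        """Neural network as a list of FCLayer objects."""
        def __init__(self, layers):
            self.layers = layers

        def predict(self, x):
            # Inference: feed the input through each layer in turn
            for layer in self.layers:
                x = layer.forward(x)
            return x

        # A train() method would apply the backpropagation updates sketched
        # in the previous section over batches of training samples.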

A summary of the parameters relevant for training (a configuration sketch follows the list):

  • Network architecture: number of layers, neurons per layer
  • Batch size
  • Number of epochs: number of passes over the whole training set
  • Error threshold: error value below which training stops
  • Learning rate
  • Momentum
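
Gathered into a single configuration, a hypothetical setup with these parameters could look like the following; the key names and the epoch and error-threshold values are placeholders, while the architecture, learning rate, momentum, and batch size match the values reported in the Results section below:

    # Hypothetical training configuration; key names and the epoch and
    # error-threshold values are placeholders, not the project's exact API.
    config = {
        "layers": [1, 20, 20, 1],     # 1 input, two hidden layers of 20, 1 output
        "batch_size": 64,
        "epochs": 500,                # placeholder value
        "error_threshold": 1e-3,      # placeholder value
        "learning_rate": 0.1,
        "momentum": 0.6,
    }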

Results

The implemented network was tested on the task of learning a noisy sine function. This single-input, single-output problem is at the simple end of the functions a neural network can learn, but it is a good check of whether the network learns at all, and it was used extensively during development for fast, simple checks.

Image 3: Plots for the test of the model learning the sine function. The left plot shows the RMSE on both the training set and the validation set for each epoch. The right plot shows the ground-truth data with added noise (hence the thickness of the curve) and the predictions on both the training and validation sets.


Random values were generated in the range [0, 2π], shuffled, and passed through a sine function, and Gaussian noise was added to the result. The generated values were then normalized to the range [0, 1]. A portion (15%) of the generated samples was set aside for validation/evaluation, and the model was trained on the remaining 85%. The evolution of the error during training and the predictions on both the training and evaluation sets are shown in Image 3.
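
A sketch of this data-generation procedure is shown below; the sample count and the noise standard deviation are not stated above, so the values used here are placeholders:

    import numpy as np

    rng = np.random.default_rng(0)

    n_samples = 1000                                   # placeholder sample count
    x = rng.uniform(0.0, 2.0 * np.pi, n_samples)
    rng.shuffle(x)
    y = np.sin(x) + rng.normal(0.0, 0.1, n_samples)    # Gaussian noise, placeholder std

    # Normalize to the range [0, 1] (applied here to both inputs and targets)
    x = (x - x.min()) / (x.max() - x.min())
    y = (y - y.min()) / (y.max() - y.min())

    # Hold out 15% of the samples for validation/evaluation
    split = int(0.85 * n_samples)
    x_train, y_train = x[:split], y[:split]
    x_val, y_val = x[split:], y[split:]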

Several configurations were tested, and the final one struck a good balance between low overfitting, fast training time, and overall low error. The model used consists of three layers: two hidden layers of 20 neurons each and an output layer of 1 neuron. The learning rate was 0.1, momentum was left at its default of 0.6, and the batch size was 64 samples.