Multiclass Classification

MNIST handwritten digits classification using a feedforward neural network.

In this example, a feedforward neural network is constructed to classify handwritten digits from the MNIST dataset. This dataset contains 60,000 images of handwritten digits for training models and 10,000 images for testing.

The modules in the library required to build the network are first imported.

use neuro::activations::Activation;
use neuro::data::{ImageDataSet, ImageDataSetBuilder};
use neuro::errors::*;
use neuro::layers::{Dense, Flatten};
use neuro::losses;
use neuro::metrics::Metrics;
use neuro::models::Network;
use neuro::optimizers::Adam;
use neuro::regularizers::Regularizer;
use neuro::tensor::*;

use std::path::Path;

Then, the main function is created. The function returns a Result so that the ? operator can be used in its body.

fn main() -> Result<(), Error> {

The data are loaded using the from_dir method of the ImageDataSetBuilder struct. The path given to the builder must contain a subfolder named train, in which the samples for each class are stored in a directory named after that class, and optionally a test folder containing samples used to test the model once it has been trained. For the MNIST dataset, the following hierarchy is used:

MNIST/
  train/
    0/
      1.png
      21.png
      ...
    1/
      3.png
      6.png
      ...
    2/
      5.png
      16.png
      ...
    ...
  test/
    0/
      3.png
      10.png
      ...
    1/
      2.png
      5.png
      ...
    ...

The desired size of the images is also passed to from_dir. The labels are then one-hot encoded (the digit 3, for instance, becomes the vector [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]), the data are split into training and validation sets with 10% of the 60,000 images (that is, 6,000) held out for validation, the pixel values are scaled to lie between 0 and 1, and the dataset is finally built. Some information about the dataset is printed with the println! macro.

// Create the dataset
let path = Path::new("datasets/MNIST");
let data = ImageDataSetBuilder::from_dir(&path, (28, 28))
  .one_hot_encode()
  .valid_split(0.1)
  .scale(1./255.)
  .build()?;
println!("{}", data);

This code will print:

Loading the data...done.
=======
Dataset
=======
Samples shape: [28 28 1]
Labels shape: [10 1 1]
Number of training samples: 54000
Number of validation samples: 6000
Number of test samples: 10000
Number of classes: 10

The next step is to create a network and add some layers. For this example, a softmax cross-entropy loss is used together with an Adam optimizer with a learning rate of 0.01, and no regularizer. Since Dense layers accept one-dimensional inputs, the images are first reshaped using a Flatten layer. Two dense layers with 32 and 10 units respectively are then added to the network. The hidden layer uses a ReLU activation function and the output layer uses a softmax activation function; in neuro, the softmax cross-entropy loss requires the activation of the last layer to be a softmax. The weights and biases initializers default to a normal distribution with He scaling for the weights and zero initialization for the biases. For a one-hot encoded label $y$ and predicted class probabilities $\hat{y}$, the softmax cross-entropy loss takes the standard form:
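$$L(y, \hat{y}) = -\sum_{i=1}^{10} y_i \log \hat{y}_i, \qquad \hat{y}_i = \frac{e^{z_i}}{\sum_{j=1}^{10} e^{z_j}},$$

where $z$ denotes the pre-activation output of the last dense layer (this is the textbook definition; the library may combine the softmax and the logarithm internally for numerical stability). Information about the network is then printed using the println! macro.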

// Create the neural network
let mut nn = Network::new(Dim::new(&[28, 28, 1, 1]), losses::SoftmaxCrossEntropy, Adam::new(0.01), None)?;
nn.add(Flatten::new());
nn.add(Dense::new(32, Activation::ReLU));
nn.add(Dense::new(10, Activation::Softmax));
println!("{}", nn);

The output of this code snippet is:

=====
Model
=====
Input shape: [28, 28, 1]
Output shape: [10, 1, 1]
Optimizer: Adam

Layer      Parameters    Output shape
----------------------------------------
Flatten    0             [784, 1, 1]
Dense      25120         [32, 1, 1]
Dense      330           [10, 1, 1]
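
The parameter counts can be checked by hand: a dense layer with n inputs and m units has n × m weights and m biases, hence 784 × 32 + 32 = 25,120 parameters for the hidden layer and 32 × 10 + 10 = 330 for the output layer.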

The network is trained with a batch size of 128 for 10 epochs. The training progress is displayed at each epoch, and the accuracy is used as the metric.

// Fit the network
nn.fit(&data, 128, 10, Some(1), Some(vec![Metrics::Accuracy]));

The output of the training is:

Running on AMD_Radeon_Pro_555X_Compute_Engine using OpenCL.
[00:00:18] [##################################################] epoch: 1/10, train_loss: 0.16202234, train_metrics: [0.95073414], valid_loss: 0.18537684, valid_metrics: [0.9434128]
[00:00:16] [##################################################] epoch: 2/10, train_loss: 0.13555525, train_metrics: [0.95779294], valid_loss: 0.16868275, valid_metrics: [0.9495868]
[00:00:16] [##################################################] epoch: 3/10, train_loss: 0.11468804, train_metrics: [0.96397895], valid_loss: 0.1668769, valid_metrics: [0.95186645]
[00:00:16] [##################################################] epoch: 4/10, train_loss: 0.102836646, train_metrics: [0.96831894], valid_loss: 0.15369178, valid_metrics: [0.9558795]
[00:00:16] [##################################################] epoch: 5/10, train_loss: 0.09340673, train_metrics: [0.970522], valid_loss: 0.16604772, valid_metrics: [0.95538086]
[00:00:16] [##################################################] epoch: 6/10, train_loss: 0.07049004, train_metrics: [0.9778875], valid_loss: 0.14080979, valid_metrics: [0.9613887]
[00:00:16] [##################################################] epoch: 7/10, train_loss: 0.06628984, train_metrics: [0.97876024], valid_loss: 0.15605086, valid_metrics: [0.96221983]
[00:00:16] [##################################################] epoch: 8/10, train_loss: 0.06473273, train_metrics: [0.97910935], valid_loss: 0.14735277, valid_metrics: [0.9613174]
[00:00:16] [##################################################] epoch: 9/10, train_loss: 0.06747824, train_metrics: [0.97820747], valid_loss: 0.17169471, valid_metrics: [0.95946527]
[00:00:16] [##################################################] epoch: 10/10, train_loss: 0.055476084, train_metrics: [0.98240995], valid_loss: 0.15725411, valid_metrics: [0.9605101]

The first line indicates that the library runs on the GPU using OpenCL. For each epoch, the training loss is then printed along with the accuracy on the training set, the validation loss, and the accuracy on the validation set. After a single epoch, the network already performs quite well, with a validation accuracy of 94.34%. At the end of the 10 epochs, the network classifies the images from the validation set with 96.05% accuracy, which is pretty good considering the simple architecture of this network.

Looking at the training and validation losses for the last epochs, the gap between the two is relatively large. This indicates that the network is overfitting and that its ability to generalize is deteriorating. To prevent this from happening, some regularization can be added to the network. One way to do this is to add L2 regularization to the loss function, that is, to augment the loss with the sum of the squared L2-norms of the weights of each layer, scaled by a factor $\lambda$:
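$$L_{\text{reg}} = L + \lambda \sum_{l} \lVert W^{(l)} \rVert_2^2$$

(the exact scaling convention, such as an extra factor of 1/2, depends on the implementation). The regularization is added through the last argument of the call used previously; the learning rate is also lowered to 0.003: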

let mut nn = Network::new(Dim::new(&[28, 28, 1, 1]), losses::SoftmaxCrossEntropy, Adam::new(0.003), Some(Regularizer::L2(1e-3)))?;

where a weight of $\lambda = 10^{-3}$ is chosen. The network is then trained again for 10 epochs:

Running on AMD_Radeon_Pro_555X_Compute_Engine using OpenCL.
[00:00:18] [##################################################] epoch: 1/10, train_loss: 0.22968657, train_metrics: [0.94206476], valid_loss: 0.24208377, valid_metrics: [0.93764246]
[00:00:17] [##################################################] epoch: 2/10, train_loss: 0.18827538, train_metrics: [0.95615846], valid_loss: 0.2055092, valid_metrics: [0.9520327]
[00:00:17] [##################################################] epoch: 3/10, train_loss: 0.16081122, train_metrics: [0.9657271], valid_loss: 0.18796511, valid_metrics: [0.9546448]
[00:00:17] [##################################################] epoch: 4/10, train_loss: 0.159695, train_metrics: [0.97020984], valid_loss: 0.18822601, valid_metrics: [0.96352583]
[00:00:17] [##################################################] epoch: 5/10, train_loss: 0.15711737, train_metrics: [0.97163534], valid_loss: 0.18854602, valid_metrics: [0.96195865]
[00:00:17] [##################################################] epoch: 6/10, train_loss: 0.14395225, train_metrics: [0.9769698], valid_loss: 0.17960496, valid_metrics: [0.9644044]
[00:00:17] [##################################################] epoch: 7/10, train_loss: 0.1555237, train_metrics: [0.97727925], valid_loss: 0.20086867, valid_metrics: [0.9638346]
[00:00:17] [##################################################] epoch: 8/10, train_loss: 0.14285155, train_metrics: [0.97976255], valid_loss: 0.19195384, valid_metrics: [0.96364456]
[00:00:17] [##################################################] epoch: 9/10, train_loss: 0.14540239, train_metrics: [0.98272735], valid_loss: 0.19692291, valid_metrics: [0.96761024]
[00:00:17] [##################################################] epoch: 10/10, train_loss: 0.1420376, train_metrics: [0.9826321], valid_loss: 0.2001842, valid_metrics: [0.96578175]

The final validation accuracy is 96.58%, which is slightly better than before. The training and validation losses are also closer, indicating a better ability to generalize to unseen examples than previously. Once the network has been trained, it can be evaluated on the test set:

// Evaluate the trained model on the test set
nn.evaluate(&data, Some(vec![Metrics::Accuracy]));

which, in this case, outputs

Evaluation of the test set: loss: 0.20226437, metrics: [0.96706885]

The network can now be used to predict the class of unseen images. The load_image_vec method provided by ImageDataSet is used to load a few images from the test set; the class of each image is then predicted and printed along with the confidence of the network. Finally, Ok(()) is returned at the end of the function to exit gracefully.

// Predict the output of some images from the test set
let input = ImageDataSet::load_image_vec(&vec![
  Path::new("datasets/MNIST/test/1/5.png"),
  Path::new("datasets/MNIST/test/3/2008.png"),
  Path::new("datasets/MNIST/test/5/59.png"),
  Path::new("datasets/MNIST/test/9/104.png")
], (28, 28), data.image_ops())?;

let predictions = nn.predict_class(&input);
print_prediction(&predictions);

Ok(())
}

fn print_prediction(predictions: &[(String, PrimitiveType)]) {
  println!("Predictions:");
  // Enumerate the predictions, numbering the images from 1.
  for (index, (class, probability)) in predictions.iter().enumerate() {
    println!("image {}: class: {}, probability: {}", index + 1, class, probability);
  }
}

The output of this last snippet is:

Predictions:
image 1: class: 1, probability: 0.9992324
image 2: class: 3, probability: 0.9944198
image 3: class: 5, probability: 0.91884094
image 4: class: 9, probability: 0.92022246

The model successfully classifies all four images, as indicated by the class directories in the file paths.