Image Classification With a CNN

MNIST handwritten digit classification using a convolutional neural network.

This example demonstrates how a convolutional neural network (CNN) can be used to classify images from the MNIST dataset. In another example, a fully connected neural network was used to classify the same handwritten digits and reached an accuracy of 96.71%. Here, a CNN is designed to improve on that result.

Before the model is created, the modules used in this example are imported:

use neuro::activations::Activation;
use neuro::data::ImageDataSetBuilder;
use neuro::errors::*;
use neuro::layers::{Conv2D, Dense, Dropout, Flatten, MaxPool2D, Padding};
use neuro::losses;
use neuro::metrics::Metrics;
use neuro::models::Network;
use neuro::optimizers::Adam;
use neuro::regularizers::*;
use neuro::tensor::*; // provides Dim

use std::path::Path;

Then, the main function is defined:

fn main() -> Result<(), Error> {

The images are loaded using an ImageDataSetBuilder structure. The files are located in datasets/MNIST and are organized as follows:

MNIST/
  train/
    0/
      1.png
      21.png
      ...
    1/
      3.png
      6.png
      ...
    2/
      5.png
      16.png
      ...
    ...
  test/
    0/
      3.png
      10.png
      ...
    1/
      2.png
      5.png
      ...
    ...

The top-level directory contains a train folder and a test folder, which both contain subfolders named after the classes present in the dataset (in this case 0 to 9). Each image representing a handwritten digit is then placed in the corresponding folder. The MNIST dataset contains 60,000 images to train the model and 10,000 to test it. In this example, the original image size of 28 by 28 is unaltered. The labels are one-hot encoded, 10% of the training samples are used to validate the model, and the images are scaled such that each pixel is between 0 and 1.

// Load and preprocess the data
let path = Path::new("datasets/MNIST");
let data = ImageDataSetBuilder::from_dir(&path, (28, 28))
  .one_hot_encode()
  .valid_split(0.1)
  .scale(1./255.)
  .build()?;
println!("{}", data);

The output of the println! statement is:

Loading the data...done.
=======
Dataset
=======
Samples shape: [28 28 1]
Labels shape: [10 1 1]
Number of training samples: 54000
Number of validation samples: 6000
Number of test samples: 10000
Number of classes: 10
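
Before the network is defined, it may help to see what the preprocessing above does to a single sample. The short standalone sketch below (plain Rust, independent of the neuro API; the function names are illustrative only) shows how a digit label is one-hot encoded into a 10-element vector and how a raw pixel intensity is scaled to the range 0 to 1.

// Standalone sketch (not part of the neuro API): what one-hot encoding and
// pixel scaling do to a single sample.
fn one_hot(label: usize, num_classes: usize) -> Vec<f32> {
    let mut encoded = vec![0.0; num_classes];
    encoded[label] = 1.0; // a single 1 at the index of the class
    encoded
}

fn main() {
    // The digit 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0].
    println!("{:?}", one_hot(3, 10));

    // Scaling by 1/255 maps raw pixel intensities from [0, 255] to [0, 1].
    let pixels: [u8; 3] = [0, 128, 255];
    let scaled: Vec<f32> = pixels.iter().map(|&p| f32::from(p) / 255.0).collect();
    println!("{:?}", scaled); // [0.0, 0.5019608, 1.0]
}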

The next step is to define the CNN. A Network that accepts inputs of size 28 by 28 with a single channel is created. The cross-entropy loss function is minimized using an Adam optimizer with a learning rate of 0.001. No regularization is used in this example.

Two Conv2D layers are added to the network, with 32 and 64 filters respectively. The filters have a height and width of 3, and a vertical and horizontal stride of 1 is used for the convolution. Same padding is used, so the outputs of these layers have the same height and width as their inputs. A max pooling layer that uses a 2 by 2 moving window and a vertical and horizontal stride of 2 is then added; this operation halves the spatial dimensions and reduces the number of parameters in the following layers. In order to avoid overfitting, a dropout layer with a drop rate of 50% is added. The resulting 14 by 14 by 64 activations are flattened into a one-dimensional tensor of 12,544 elements by a Flatten layer before being fed to a fully connected layer containing 128 units and using a ReLU activation function. Lastly, a second dropout layer with a drop rate of 25% is added just before the output layer, which contains 10 units and uses a softmax activation function.

// Create the neural network
let mut nn = Network::new(Dim::new(&[28, 28, 1, 1]), losses::SoftmaxCrossEntropy, Adam::new(0.001), None)?;
nn.add(Conv2D::new(32, (3, 3), (1, 1), Padding::Same));
nn.add(Conv2D::new(64, (3, 3), (1, 1), Padding::Same));
nn.add(MaxPool2D::new((2, 2)));
nn.add(Dropout::new(0.5));
nn.add(Flatten::new());
nn.add(Dense::new(128, Activation::ReLU));
nn.add(Dropout::new(0.25));
nn.add(Dense::new(10, Activation::Softmax));
println!("{}", nn);

The output of this code snippet shows the layers in the network as well as the trainable parameters that each layer contains:

=====
Model
=====
Input shape: [28, 28, 1]
Output shape: [10, 1, 1]
Optimizer: Adam

Layer      Parameters    Output shape
----------------------------------------
Conv2D     320           [28, 28, 32]
Conv2D     18496         [28, 28, 64]
MaxPool2D  0             [14, 14, 64]
Dropout    0             [14, 14, 64]
Flatten    0             [12544, 1, 1]
Dense      1605760       [128, 1, 1]
Dropout    0             [128, 1, 1]
Dense      1290          [10, 1, 1]
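
The parameter counts in this summary can be verified by hand: a Conv2D layer has kernel height × kernel width × input channels weights per filter plus one bias per filter, and a Dense layer has one weight per input-unit pair plus one bias per unit. The standalone snippet below (illustrative only, not part of the neuro API) reproduces the four numbers from the table.

// Standalone check (not part of the neuro API) of the parameter counts
// reported in the model summary above.
fn conv2d_params(kernel: (usize, usize), in_channels: usize, filters: usize) -> usize {
    kernel.0 * kernel.1 * in_channels * filters + filters // weights + one bias per filter
}

fn dense_params(inputs: usize, units: usize) -> usize {
    inputs * units + units // weights + one bias per unit
}

fn main() {
    println!("{}", conv2d_params((3, 3), 1, 32));    // 320
    println!("{}", conv2d_params((3, 3), 32, 64));   // 18496
    println!("{}", dense_params(14 * 14 * 64, 128)); // 1605760
    println!("{}", dense_params(128, 10));           // 1290
}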

The model is trained using the dataset created previously. A batch size of 128 is selected and the model is trained for 10 epochs. The loss and accuracy are printed at each epoch. Once the training is completed, the model is saved in HDF5 format.

// Fit the model
nn.fit(&data, 128, 10, Some(1), Some(vec![Metrics::Accuracy]));
nn.save("mnist_cnn.h5")?;

The training produces the following output:

Running on AMD_Radeon_Pro_555X_Compute_Engine using OpenCL.
[00:03:40] [##################################################] epoch: 1/10, train_loss: 0.10648244, train_metrics: [0.9675757], valid_loss: 0.12886678, valid_metrics: [0.964167]
[00:03:36] [##################################################] epoch: 2/10, train_loss: 0.067097135, train_metrics: [0.97914904], valid_loss: 0.09257295, valid_metrics: [0.9736892]
[00:03:36] [##################################################] epoch: 3/10, train_loss: 0.04680932, train_metrics: [0.9851684], valid_loss: 0.07700441, valid_metrics: [0.97601634]
[00:03:36] [##################################################] epoch: 4/10, train_loss: 0.04004219, train_metrics: [0.98747194], valid_loss: 0.06959132, valid_metrics: [0.97953075]
[00:03:36] [##################################################] epoch: 5/10, train_loss: 0.052570708, train_metrics: [0.98297596], valid_loss: 0.084476784, valid_metrics: [0.9726919]
[00:03:37] [##################################################] epoch: 6/10, train_loss: 0.034354176, train_metrics: [0.9889768], valid_loss: 0.06277204, valid_metrics: [0.9815017]
[00:03:37] [##################################################] epoch: 7/10, train_loss: 0.02819305, train_metrics: [0.9913253], valid_loss: 0.061558064, valid_metrics: [0.97948325]
[00:03:37] [##################################################] epoch: 8/10, train_loss: 0.029502345, train_metrics: [0.99060863], valid_loss: 0.063200325, valid_metrics: [0.98119295]
[00:03:37] [##################################################] epoch: 9/10, train_loss: 0.027170192, train_metrics: [0.9917194], valid_loss: 0.06300245, valid_metrics: [0.9793408]
[00:03:37] [##################################################] epoch: 10/10, train_loss: 0.024771206, train_metrics: [0.99181193], valid_loss: 0.06653453, valid_metrics: [0.9825228]
Model saved in: mnist_cnn.h5

As can be seen from this output, an accuracy of 98.25% on the validation set is achieved after 10 epochs. Once the model has been trained, it can be evaluated on the test set contained in data:

// Evaluate the trained model on the test set
nn.evaluate(&data, Some(vec![Metrics::Accuracy]));

The evaluation prints:

Evaluation of the test set: loss: 0.047828082, metrics: [0.9865506]
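
The accuracy reported by the Accuracy metric is simply the fraction of samples whose predicted class, i.e. the index of the largest value in the softmax output, matches the true label. A minimal standalone sketch of that computation (illustrative only, not the neuro implementation) is shown below.

// Standalone sketch (not the neuro implementation): accuracy as the fraction
// of samples whose argmax prediction matches the true label.
fn argmax(values: &[f32]) -> usize {
    values
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(index, _)| index)
        .unwrap()
}

fn accuracy(predictions: &[Vec<f32>], labels: &[usize]) -> f32 {
    let mut correct = 0;
    for (prediction, &label) in predictions.iter().zip(labels.iter()) {
        if argmax(prediction) == label {
            correct += 1;
        }
    }
    correct as f32 / labels.len() as f32
}

fn main() {
    let predictions = vec![
        vec![0.05, 0.90, 0.05], // predicted class: 1
        vec![0.80, 0.10, 0.10], // predicted class: 0
    ];
    let labels = vec![1, 2];
    println!("{}", accuracy(&predictions, &labels)); // 0.5
}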

The accuracy that this model achieves on the test set is 98.66%, an improvement of 1.95 percentage points over the fully connected neural network. Finally, an Ok variant is returned to gracefully exit the program.

Ok(())
}