Binary Classification

Learning the XOR gate with a neural network.

In this example, a simple feedforward neural network is trained to approximate the XOR gate. The XOR function takes two inputs, each of which can be 0 or 1, and returns a single output: 1 if exactly one of the inputs is 1, and 0 otherwise. The following table shows the four training samples used to train the model.

Input 1   Input 2   Output
0         0         0
0         1         1
1         0         1
1         1         0
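
The target function itself is trivial to write directly; a one-line sketch in plain Rust (independent of neuro) makes explicit the mapping the network has to learn from data:

// The XOR gate the network is asked to approximate.
fn xor(a: bool, b: bool) -> bool {
    a != b
}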

The first step to create a neural network with neuro is to bring the required modules into scope.

use neuro::activations::Activation;
use neuro::data::TabularDataSet;
use neuro::errors::*;
use neuro::initializers::*;
use neuro::layers::Dense;
use neuro::losses;
use neuro::metrics;
use neuro::models;
use neuro::optimizers::SGD;
use neuro::tensor::*;

The main function of the program is then defined with a Result return type so that the ? operator can be used to propagate errors.

fn main() -> Result<(), Error> {

The dataset is then created from the samples listed above using the TabularDataSet::from_tensor method. Since no data is used for validation or testing, None is passed for the last four arguments.

// Create the dataset: four samples with two input features and one output each,
// stacked along the last dimension of the tensors.
let input_values = [0., 0., 0., 1., 1., 0., 1., 1.];
let x_train = Tensor::new(&input_values, Dim::new(&[2, 1, 1, 4]));
let output_values = [0., 1., 1., 0.];
let y_train = Tensor::new(&output_values, Dim::new(&[1, 1, 1, 4]));
let data = TabularDataSet::from_tensor(x_train.copy(), y_train.copy(), None, None, None, None)?;

A shallow neural network is then created. The loss function is the binary cross-entropy and an SGD optimizer with a learning rate of 0.1 is used. No regularization is applied in this example, so None is passed as the last argument. A hidden layer with two units and a ReLU activation function is added to the network. In this example, the weights are initialized with a constant value of 0.01 and the biases with zeros. The output layer contains a single unit and uses the sigmoid activation function. The network is then trained for 2000 epochs using mini-batches of 4 samples, i.e. the entire training set. The training progress is printed every 200 epochs and the classification accuracy is monitored.

// Create the neural network and add two layers
let mut nn = models::Network::new(Dim::new(&[2, 1, 1, 1]), losses::BinaryCrossEntropy, SGD::new(0.1), None)?;
nn.add(Dense::with_param(2, Activation::ReLU, Initializer::Constant(0.01), Initializer::Zeros));
nn.add(Dense::new(1, Activation::Sigmoid));

// Fit the model
nn.fit(&data, 4, 2000, Some(200), Some(vec![metrics::Metrics::Accuracy]));
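
To make explicit what this small model computes, the following is a minimal sketch in plain Rust of the forward pass (two inputs, a two-unit ReLU hidden layer, a single sigmoid output) and of the binary cross-entropy loss that SGD minimizes. The helper names (relu, sigmoid, forward, binary_cross_entropy) and the parameter layout are illustrative assumptions, not part of the neuro API.

// ReLU and sigmoid activations used by the two layers.
fn relu(x: f32) -> f32 {
    x.max(0.0)
}

fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

// Forward pass: two inputs -> two ReLU hidden units -> one sigmoid output.
// w1 and b1 are the hidden layer parameters, w2 and b2 those of the output layer.
fn forward(x: [f32; 2], w1: [[f32; 2]; 2], b1: [f32; 2], w2: [f32; 2], b2: f32) -> f32 {
    let h = [
        relu(w1[0][0] * x[0] + w1[0][1] * x[1] + b1[0]),
        relu(w1[1][0] * x[0] + w1[1][1] * x[1] + b1[1]),
    ];
    sigmoid(w2[0] * h[0] + w2[1] * h[1] + b2)
}

// Binary cross-entropy for one sample with label y and predicted probability p.
fn binary_cross_entropy(y: f32, p: f32) -> f32 {
    let eps = 1e-7_f32; // clamp to avoid ln(0)
    let p = p.clamp(eps, 1.0 - eps);
    -(y * p.ln() + (1.0 - y) * (1.0 - p).ln())
}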

The output is:

Running on AMD_Radeon_Pro_555X_Compute_Engine using OpenCL.
[00:00:01] [##################################################] epoch: 200/2000, train_loss: 0.5774144, train_metrics: [0.75]
[00:00:00] [##################################################] epoch: 400/2000, train_loss: 0.44036782, train_metrics: [0.75]
[00:00:00] [##################################################] epoch: 600/2000, train_loss: 0.227238, train_metrics: [1.0]
[00:00:00] [##################################################] epoch: 800/2000, train_loss: 0.11523885, train_metrics: [1.0]
[00:00:00] [##################################################] epoch: 1000/2000, train_loss: 0.06838377, train_metrics: [1.0]
[00:00:00] [##################################################] epoch: 1200/2000, train_loss: 0.046537183, train_metrics: [1.0]
[00:00:00] [##################################################] epoch: 1400/2000, train_loss: 0.034270532, train_metrics: [1.0]
[00:00:00] [##################################################] epoch: 1600/2000, train_loss: 0.026832152, train_metrics: [1.0]
[00:00:00] [##################################################] epoch: 1800/2000, train_loss: 0.021835616, train_metrics: [1.0]
[00:00:00] [##################################################] epoch: 2000/2000, train_loss: 0.018323554, train_metrics: [1.0]

The first line indicates that the library runs on the GPU and uses OpenCL. The training loss and accuracy are then printed every 200 epochs. The loss decreases as expected and the accuracy quickly reaches 100%. Lastly, the probability that each sample belongs to class 1 is computed:

// Compute the output for the training data
let predictions = nn.predict(&x_train);
println!("Predictions:");
Tensor::print_tensor(&predictions);

Ok(())
}

which prints:

Predictions:

[1 1 1 4]
    0.0448 


    0.9898 


    0.9898 


    0.0070 

Hence, the network assigns a high probability to class 1 for samples 2 and 3, where exactly one input is 1, and a low probability for samples 1 and 4, where zero or two inputs are 1: the model has learned the XOR gate.
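
To turn these probabilities into hard class labels, they would typically be thresholded at 0.5. Below is a minimal sketch in plain Rust operating on f32 values rather than on a neuro Tensor; the to_labels helper is illustrative, not part of the library.

// Threshold predicted probabilities at 0.5 to obtain class labels.
fn to_labels(probabilities: &[f32]) -> Vec<u8> {
    probabilities.iter().map(|&p| if p >= 0.5 { 1 } else { 0 }).collect()
}

fn main() {
    let probabilities = [0.0448_f32, 0.9898, 0.9898, 0.0070];
    println!("{:?}", to_labels(&probabilities)); // prints [0, 1, 1, 0]
}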