Regression

Regression on a mathematical function.

In this example a regression problem is considered. A neural network is trained to approximate the function

$$y = x_1x_2 + x_1x_3 - x_2x_3$$

The network will thus take three features as inputs and output a single scalar. The data used to train the model have been generated with random values for $x_1$, $x_2$, and $x_3$, and both the inputs and outputs have been normalized. The input data are stored in input.csv, with one sample per row and one feature per column. The output data are stored in output.csv, with the output for each sample on its own row.
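For readers who want to reproduce the data, a short standalone program along the following lines could generate the two CSV files. This is only a sketch: the original generation and normalization procedure is not specified here, so the inputs are simply drawn uniformly from [-1, 1] (using the rand crate) and the outputs are written as-is.

use rand::Rng;
use std::fs::File;
use std::io::{BufWriter, Write};

fn main() -> std::io::Result<()> {
    let mut rng = rand::thread_rng();
    let mut input = BufWriter::new(File::create("input.csv")?);
    let mut output = BufWriter::new(File::create("output.csv")?);

    // Header rows, since the data set is later loaded with the header flag set to true.
    writeln!(input, "x1,x2,x3")?;
    writeln!(output, "y")?;

    // 5000 samples: 4500 for training and 500 for validation after the 10% split.
    for _ in 0..5000 {
        let x1: f64 = rng.gen_range(-1.0..1.0);
        let x2: f64 = rng.gen_range(-1.0..1.0);
        let x3: f64 = rng.gen_range(-1.0..1.0);
        let y = x1 * x2 + x1 * x3 - x2 * x3;
        writeln!(input, "{},{},{}", x1, x2, x3)?;
        writeln!(output, "{}", y)?;
    }
    Ok(())
}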

The first step to create the neural network is to bring the required modules into scope:

use neuro::activations::Activation;
use neuro::data::{TabularDataSet, DataSet};
use neuro::errors::*;
use neuro::layers::Dense;
use neuro::losses;
use neuro::models::Network;
use neuro::optimizers::Adam;
use neuro::tensor::*;

use std::path::Path;

The main function of the program is then defined. Its Result return type allows the ? operator to be used in the body of the function.

fn main() -> Result<(), Error> {

The paths to the CSV files are specified and a TabularDataSet is created. 10% of the samples are used as validation data, and the last argument (true) indicates that the CSV files contain a header.

// Load the data
let inputs = Path::new("input.csv");
let outputs = Path::new("output.csv");
let data = TabularDataSet::from_csv(&inputs, &outputs, 0.1, true)?;
println!("{}", data);

The println! statement prints the following:

=======
Dataset
=======
Input shape: [3 1 1]
Output shape: [1 1 1]
Number of training samples: 4500
Number of validation samples: 500

The neural network is then created. The mean squared error loss function is minimized using an Adam optimizer with a learning rate of 0.01, and no regularization is applied to the model. Two dense hidden layers with 32 and 16 units respectively are added. These two layers use a ReLU activation function and the default weight and bias initializers. The output layer contains a single unit and uses a linear activation function.

// Create the network
let mut nn = Network::new(data.input_shape(), losses::MeanSquaredError, Adam::new(0.01), None)?;
nn.add(Dense::new(32, Activation::ReLU));
nn.add(Dense::new(16, Activation::ReLU));
nn.add(Dense::new(1, Activation::Linear));
println!("{}", nn);

The output is:

=====
Model
=====
Input shape: [3, 1, 1]
Output shape: [1, 1, 1]
Optimizer: Adam

Layer      Parameters    Output shape
---------------------------------------
Dense      128           [32, 1, 1]
Dense      528           [16, 1, 1]
Dense      17            [1, 1, 1]

The network is trained for 50 epochs using mini-batches of 64 samples. The training progress is printed every 10 epochs and no metrics are used. Once the model is trained, it is saved in HDF5 format.

// Train and save the model
nn.fit(&data, 64, 50, Some(10), None);
nn.save("feedforward.h5")?;

The output of the training process is:

Running on AMD_Radeon_Pro_555X_Compute_Engine using OpenCL.
[00:00:12] [##################################################] epoch: 10/50, train_loss: 0.0010379077, train_metrics: [], valid_loss: 0.0012252745, valid_metrics: []
[00:00:10] [##################################################] epoch: 20/50, train_loss: 0.0007133077, train_metrics: [], valid_loss: 0.0007566337, valid_metrics: []
[00:00:10] [##################################################] epoch: 30/50, train_loss: 0.00045565667, train_metrics: [], valid_loss: 0.0004974289, valid_metrics: []
[00:00:10] [##################################################] epoch: 40/50, train_loss: 0.00043236514, train_metrics: [], valid_loss: 0.0005186321, valid_metrics: []
[00:00:10] [##################################################] epoch: 50/50, train_loss: 0.00022926035, train_metrics: [], valid_loss: 0.0003253596, valid_metrics: []
Model saved in: feedforward.h5

The first line indicates that the library runs on the GPU and uses OpenCL. As can be seen, the loss decreases for both the training and validation sets as training progresses. Since the two losses stay relatively close to each other, the network is not overfitting. If overfitting did occur, several regularization methods are available in the library, such as L1/L2 regularization, dropout, or batch normalization.
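As an illustration only, the network creation step could be modified along the following lines to add an L2 penalty and a dropout layer. This is a sketch under assumptions: the regularizer constructor and the Dropout layer shown below are not confirmed names from the neuro API, so the exact types and signatures should be checked against the crate documentation.

// Hypothetical sketch: L2::new and Dropout::new are assumed names, not verified against the crate.
let mut nn = Network::new(data.input_shape(), losses::MeanSquaredError, Adam::new(0.01),
                          Some(regularizers::L2::new(0.01)))?; // assumed L2 regularizer
nn.add(Dense::new(32, Activation::ReLU));
nn.add(Dropout::new(0.2)); // assumed dropout layer discarding 20% of activations during training
nn.add(Dense::new(16, Activation::ReLU));
nn.add(Dense::new(1, Activation::Linear));

Once the model has been trained, it is used to predict the output of two new samples: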

// Predictions: create two inputs: (-0.5, 0.92, 0.35) and (0.45, -0.72, -0.12).
let inputs = Tensor::new(&[-0.5, 0.92, 0.35, 0.45, -0.72, -0.12], Dim::new(&[3, 1, 1, 2]));
let res = nn.predict(&inputs); // expected: -0.957 and -0.4644 respectively.
println!("Predictions:");
res.print_tensor();

Ok(())
}

Which displays:

Predictions:

[1 1 1 2]
   -0.9556 


   -0.4767 

With expected outputs of -0.957 and -0.4644, the predicted results are close to the true values. The accuracy could be improved by training for more epochs, increasing the complexity of the model, or tuning the optimizer's parameters.
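For reference, the expected outputs quoted above follow directly from the target function:

$$y_1 = (-0.5)(0.92) + (-0.5)(0.35) - (0.92)(0.35) = -0.46 - 0.175 - 0.322 = -0.957$$

$$y_2 = (0.45)(-0.72) + (0.45)(-0.12) - (-0.72)(-0.12) = -0.324 - 0.054 - 0.0864 = -0.4644$$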