# Regression
In this example a regression problem is considered. A neural network is trained to approximate the function

$$y = x_1x_2 + x_1x_3 - x_2x_3$$

The network thus takes three features as inputs and outputs a single scalar. The training data have been generated with random values for $x_1$, $x_2$, and $x_3$, and both the inputs and outputs have been normalized. The input data are stored in `input.csv`, with each sample on a separate row and each feature in a separate column. The output data are stored in `output.csv`, with the output for each sample on a separate row.
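The generation step can be sketched in plain Rust. This is not the script actually used to produce the CSV files; the pseudo-random source, the sample count, and the z-score normalization are assumptions for illustration:

```rust
/// The target function the network will approximate.
fn target(x1: f64, x2: f64, x3: f64) -> f64 {
    x1 * x2 + x1 * x3 - x2 * x3
}

/// Rescale a column to zero mean and unit standard deviation.
fn normalize(values: &mut [f64]) {
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let std = (values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n).sqrt();
    for v in values.iter_mut() {
        *v = (*v - mean) / std;
    }
}

fn main() {
    // Tiny linear congruential generator so the sketch has no external
    // dependencies; a real script would use the `rand` crate.
    let mut state: u64 = 1;
    let mut next = move || -> f64 {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (state >> 11) as f64 / (1u64 << 53) as f64 * 2.0 - 1.0 // in [-1, 1)
    };

    let mut ys: Vec<f64> = (0..5000)
        .map(|_| target(next(), next(), next()))
        .collect();
    normalize(&mut ys);
    // Each (x1, x2, x3) row would be written to input.csv and each y to output.csv.
    println!("first normalized outputs: {:?}", &ys[..3]);
}
```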
The first step to create the neural network is to bring the required modules into scope:
```rust
use neuro::activations::Activation;
use neuro::data::{TabularDataSet, DataSet};
use neuro::errors::*;
use neuro::layers::Dense;
use neuro::losses;
use neuro::models::Network;
use neuro::optimizers::Adam;
use neuro::tensor::*;

use std::path::Path;
```
The main function of the program is then defined. A `Result` return type is selected so that the `?` operator can be used in the body of the function.
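As a standalone reminder of what `?` does in such a function: it returns early with the error if the expression it is applied to is an `Err`, and unwraps the value otherwise. The example below is illustrative and unrelated to the crate:

```rust
use std::num::ParseIntError;

// `?` propagates the parse error to the caller; on success it unwraps the value.
fn parse_and_double(s: &str) -> Result<i32, ParseIntError> {
    let n: i32 = s.parse()?;
    Ok(n * 2)
}

fn main() {
    assert_eq!(parse_and_double("21"), Ok(42));
    assert!(parse_and_double("abc").is_err());
    println!("ok");
}
```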
```rust
fn main() -> Result<(), Error> {
```
The paths to the CSV files are specified and a `TabularDataSet` is created. 10% of the samples are used as validation data, and the last `true` argument indicates that the CSV files contain a header.
```rust
    // Load the data
    let inputs = Path::new("input.csv");
    let outputs = Path::new("output.csv");
    let data = TabularDataSet::from_csv(&inputs, &outputs, 0.1, true)?;
    println!("{}", data);
```
The `println!` statement prints the following:
```
=======
Dataset
=======
Input shape: [3 1 1]
Output shape: [1 1 1]
Number of training samples: 4500
Number of validation samples: 500
```
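The 4500/500 split follows from the 0.1 validation fraction applied to the 5000 samples. A quick sanity check (the rounding rule is an assumption; the crate may truncate instead):

```rust
/// Split a sample count into (training, validation) counts.
/// Assumption: the validation count is rounded to the nearest sample.
fn split_counts(total: usize, valid_fraction: f64) -> (usize, usize) {
    let valid = (total as f64 * valid_fraction).round() as usize;
    (total - valid, valid)
}

fn main() {
    // 10% of the 5000 generated samples are held out for validation.
    assert_eq!(split_counts(5000, 0.1), (4500, 500));
    println!("ok");
}
```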
The neural network is then created. The mean squared error loss function is minimized using an Adam optimizer with a learning rate of 0.01. No regularization is applied to the model. Two dense hidden layers with 32 and 16 units respectively are added. These two layers use the ReLU activation function and the default weight and bias initializers. The output layer contains a single unit and uses a linear activation function.
```rust
    // Create the network
    let mut nn = Network::new(data.input_shape(), losses::MeanSquaredError, Adam::new(0.01), None)?;
    nn.add(Dense::new(32, Activation::ReLU));
    nn.add(Dense::new(16, Activation::ReLU));
    nn.add(Dense::new(1, Activation::Linear));
    println!("{}", nn);
```
The output is:
```
=====
Model
=====
Input shape: [3, 1, 1]
Output shape: [1, 1, 1]
Optimizer: Adam

Layer        Parameters   Output shape
---------------------------------------
Dense        128          [32, 1, 1]
Dense        528          [16, 1, 1]
Dense        17           [1, 1, 1]
```
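The parameter counts in the table follow from the usual dense-layer arithmetic: one weight per input-output pair plus one bias per unit. This can be checked directly:

```rust
/// Number of trainable parameters in a fully connected layer.
fn dense_params(inputs: usize, units: usize) -> usize {
    inputs * units + units
}

fn main() {
    // Reproduces the table above for the 3 -> 32 -> 16 -> 1 architecture.
    assert_eq!(dense_params(3, 32), 128);
    assert_eq!(dense_params(32, 16), 528);
    assert_eq!(dense_params(16, 1), 17);
    println!("total parameters: {}", 128 + 528 + 17);
}
```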
The network is trained for 50 epochs using mini-batches of 64 samples. The training progress is printed every 10 epochs and no metrics are used. Once the model is trained, it is saved in HDF5 format.
```rust
    // Train and save the model
    nn.fit(&data, 64, 50, Some(10), None);
    nn.save("feedforward.h5")?;
```
The output of the training process is:
```
Running on AMD_Radeon_Pro_555X_Compute_Engine using OpenCL.
[00:00:12] [##################################################] epoch: 10/50, train_loss: 0.0010379077, train_metrics: [], valid_loss: 0.0012252745, valid_metrics: []
[00:00:10] [##################################################] epoch: 20/50, train_loss: 0.0007133077, train_metrics: [], valid_loss: 0.0007566337, valid_metrics: []
[00:00:10] [##################################################] epoch: 30/50, train_loss: 0.00045565667, train_metrics: [], valid_loss: 0.0004974289, valid_metrics: []
[00:00:10] [##################################################] epoch: 40/50, train_loss: 0.00043236514, train_metrics: [], valid_loss: 0.0005186321, valid_metrics: []
[00:00:10] [##################################################] epoch: 50/50, train_loss: 0.00022926035, train_metrics: [], valid_loss: 0.0003253596, valid_metrics: []
Model saved in: feedforward.h5
```
The first line indicates that the library runs on the GPU using OpenCL. As can be seen, the loss decreases on both the training and validation sets as training progresses. Since the two losses stay relatively close to each other, the network is not overfitting. If overfitting does occur, several regularization methods are available in the library, such as L1/L2 regularization, dropout, or batch normalization. Once the model has been trained, it is used to predict the output of two new samples:
```rust
    // Predictions: create two inputs: (-0.5, 0.92, 0.35) and (0.45, -0.72, -0.12).
    let inputs = Tensor::new(&[-0.5, 0.92, 0.35, 0.45, -0.72, -0.12], Dim::new(&[3, 1, 1, 2]));
    let res = nn.predict(&inputs); // expected: -0.957 and -0.4644 respectively.
    println!("Predictions:");
    res.print_tensor();

    Ok(())
}
```
This displays:

```
Predictions:
[1 1 1 2]
-0.9556
-0.4767
```
Since the expected outputs are -0.957 and -0.4644, the predicted values are close to the true ones. The accuracy could be improved by training for more epochs, increasing the capacity of the model, or tuning the optimizer's parameters.
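The quoted expected outputs are simply the target function evaluated at the two test points, which can be verified directly:

```rust
/// The target function the network was trained to approximate.
fn target(x1: f64, x2: f64, x3: f64) -> f64 {
    x1 * x2 + x1 * x3 - x2 * x3
}

fn main() {
    // (-0.5)(0.92) + (-0.5)(0.35) - (0.92)(0.35) = -0.957
    assert!((target(-0.5, 0.92, 0.35) - (-0.957)).abs() < 1e-9);
    // (0.45)(-0.72) + (0.45)(-0.12) - (-0.72)(-0.12) = -0.4644
    assert!((target(0.45, -0.72, -0.12) - (-0.4644)).abs() < 1e-9);
    println!("expected outputs verified");
}
```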