NN Training with TF

We’ve covered the digit recognition model, and we’ll go into details here on how to train it with TF. Here is a recap and an overview of what we’ve done and will do

Remember we have a 3 layer model with
25, 15 & 1 unit per layer
Given (x,y) as examples (input)
1- We sequentially string together 3 layers using the Sequential() function
2-Ask TF to compile the model using a Loss and Gradient Descent procedures, in this case we will use BinaryCrossentropy()
Binay is because we want to classify if the value is a 0 or 1
Of course if you were wanting a regression model then you can use a different function to calculate the Loss and GD
3- Call the fit() to train the model using X and Y as inputs as well as
Set the number of iterations with epochs

Train NN

Logistic Regression

So let’s break it down. Let’s go back to how we trained a logistic regression model.

Specify how to compute the output given input x and parameters w, b and therefore we specified the model f_w,b(x)=?
We then specified the loss and cost functions to use.
Remember the loss L(f_w,b(x),y) is for one example while the cost is the average summation of all the losses for all the rows and all parameters
Then we minimize the cost J_(w,b) of the function by utilizing GD

Neural Network

Define the layers
Compile the model and tell it which loss function to use and then average the summation of all the losses to calculate the cost
Fit the model to try to minimize the cost of the model

So let’ break down the 3 steps individually

Create Model

As shown above we start by defining the model
We know the parameters for the first layer are w^[1], b^[1]
for the second layer: w^[2], b^[2] and so on
we identify the number of units per layer
We are using the logistic regression function sigmoid
Now we have everything we need to create the model

Loss & Cost Functions

Since we are working on the handwritten digit classification problem, a binary classification model is what we will need where we are comparing prediction vs target
In TF the binary logistic loss function is known as Binarycrossentropy
Keras is a library that contains the Binaycrossentropy function
So we compile our model using that function to calculate the Loss and Cost of our model
Having specified the loss for one training example, TF knows that it needs to optimize the cost function for this model while training it on the sample input (training data)
Remember that the cost function J(W,B) includes the loss for all the parameters for all rows and layers
Remember that f_w,b(x) is the value for each function for each parameter applied to each X or row of training data
If we were solving a regression problem where we are predicting a number and not categories/classes, we can use a different loss/cost function
We can use for example the MeanSquaredError() function instead of the BinaryCrossentropy() function

Gradient Descent

Finally we will minimize the cost of the model for every parameter using GD
We set the iteration number
TF uses back propagation to calculate all the partial derivatives for all the parameters and input rows in the model.fit()
the fit function requires the input and label, as well as the number of iterations = epochs
The fit function basically updates the NN parameters in order to reduce the cost