NN Training with TF
We’ve covered the handwritten digit recognition model; here we go into detail on how to train it with TensorFlow (TF). Here is a recap and an overview of what we’ve done and what we will do:
- Remember we have a 3-layer model
- with 25, 15 and 1 units in the respective layers
- Given training examples (X, Y) as input:
- 1- We string the 3 layers together in order using the Sequential() function
- 2- Ask TF to compile the model, specifying the loss function (and, optionally, the optimizer that carries out the gradient-descent updates); in this case we use BinaryCrossentropy()
- Binary because we want to classify whether the digit is a 0 or a 1
- If we wanted a regression model instead, we would use a different loss function
- 3- Call fit() to train the model, passing X and Y as inputs
- Set the number of iterations with epochs (each epoch is one pass over the training data)
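The three steps can be sketched in code as follows. This is a minimal sketch assuming TensorFlow 2.x with its bundled Keras (tf.keras); the 4-feature toy data is synthetic and only stands in for real digit images, and the names X, Y, history and probs are illustrative. Because compile() is given no optimizer here, Keras falls back to its default optimizer (RMSprop, a gradient-descent variant).

```python
# Minimal sketch, assuming TensorFlow 2.x with tf.keras; the data is synthetic.
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import BinaryCrossentropy

# Hypothetical toy data: 100 rows with 4 features each and 0/1 labels
# (a real digit dataset would have one feature per pixel).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)).astype(np.float32)
Y = (X[:, 0] + X[:, 1] > 0).astype(np.float32).reshape(-1, 1)

# Step 1: string the three layers together in order (25, 15 and 1 units).
model = Sequential([
    tf.keras.Input(shape=(4,)),
    Dense(units=25, activation="sigmoid"),
    Dense(units=15, activation="sigmoid"),
    Dense(units=1, activation="sigmoid"),
])

# Step 2: compile, specifying the binary cross-entropy loss.
model.compile(loss=BinaryCrossentropy())

# Step 3: train on (X, Y); epochs sets how many passes are made over the data.
history = model.fit(X, Y, epochs=10, verbose=0)

# Predicted probabilities that each row belongs to class 1, shape (100, 1).
probs = model.predict(X, verbose=0)
print("final training loss:", history.history["loss"][-1])
```

Within each epoch Keras splits the data into mini-batches (32 rows by default) and applies one parameter update per batch.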
Train NN
Logistic Regression
So let’s break it down. Let’s go back to how we trained a logistic regression model.
- Specify how to compute the output given input x and parameters w, b, which defines the model f_w,b(x) = g(w·x + b) = 1 / (1 + e^(-(w·x + b)))
- We then specified the loss and cost functions to use.
- Remember that the loss L(f_w,b(x), y) is for one training example, while the cost J(w,b) is the average of the losses over all m training examples (rows)
- Then we minimize the cost J(w,b) using gradient descent (GD)
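As a refresher, the logistic-regression recipe (model, loss, cost, GD) can be sketched in plain Python. This is an illustrative sketch, not the course's lab code; the function names, the toy data, the learning rate alpha=0.1 and the 1000 iterations are all hypothetical choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def f_wb(x, w, b):
    # Model: f_w,b(x) = g(w . x + b), with g the sigmoid function.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def loss(f, y):
    # Logistic loss L(f_w,b(x), y) for ONE training example.
    return -y * math.log(f) - (1 - y) * math.log(1 - f)

def cost(X, Y, w, b):
    # Cost J(w,b): the average of the losses over all m examples (rows).
    return sum(loss(f_wb(x, w, b), y) for x, y in zip(X, Y)) / len(X)

def gradient_descent(X, Y, w, b, alpha, iters):
    m, n = len(X), len(w)
    for _ in range(iters):
        dj_dw, dj_db = [0.0] * n, 0.0
        for x, y in zip(X, Y):
            err = f_wb(x, w, b) - y
            for j in range(n):
                dj_dw[j] += err * x[j] / m
            dj_db += err / m
        # Update all parameters simultaneously using the gradients just computed.
        w = [wj - alpha * dj for wj, dj in zip(w, dj_dw)]
        b = b - alpha * dj_db
    return w, b

# Made-up toy data: 2 features per row, 0/1 labels.
X = [[0.5, 1.5], [1.0, 1.0], [1.5, 0.5], [3.0, 0.5], [2.0, 2.0], [1.0, 2.5]]
Y = [0, 0, 0, 1, 1, 1]

cost_before = cost(X, Y, [0.0, 0.0], 0.0)  # ln 2, about 0.6931, since every f = 0.5
w, b = gradient_descent(X, Y, [0.0, 0.0], 0.0, alpha=0.1, iters=1000)
cost_after = cost(X, Y, w, b)
print(f"cost before: {cost_before:.4f}, after: {cost_after:.4f}")
```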
Neural Network
- Define the layers
- Compile the model, telling it which loss function to use; the cost is then the average of the losses over all training examples
- Fit the model, which uses GD to minimize the cost
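Written out, the loss and cost behind the compile step for this binary classifier, and the gradient-descent update that the fit step repeats, look roughly as follows. Here m is the number of training examples, α is the learning rate, and w_j^[l], b_j^[l] are the parameters of unit j in layer l. This is the plain GD form; TF optimizers may apply refined versions of the same update.

```latex
L\big(f_{\mathbf{W},\mathbf{B}}(\vec{x}^{(i)}), y^{(i)}\big) = -y^{(i)} \log\big(f_{\mathbf{W},\mathbf{B}}(\vec{x}^{(i)})\big) - \big(1 - y^{(i)}\big) \log\big(1 - f_{\mathbf{W},\mathbf{B}}(\vec{x}^{(i)})\big)

J(\mathbf{W},\mathbf{B}) = \frac{1}{m} \sum_{i=1}^{m} L\big(f_{\mathbf{W},\mathbf{B}}(\vec{x}^{(i)}), y^{(i)}\big)

w^{[l]}_j := w^{[l]}_j - \alpha \frac{\partial J(\mathbf{W},\mathbf{B})}{\partial w^{[l]}_j}, \qquad
b^{[l]}_j := b^{[l]}_j - \alpha \frac{\partial J(\mathbf{W},\mathbf{B})}{\partial b^{[l]}_j}
```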
So let’s break down the 3 steps individually
Create Model
- As shown above we start by defining the model
- We know the parameters for the first layer are w[1], b[1]
- For the second layer, w[2], b[2], and so on
- We specify the number of units in each layer (25, 15 and 1)
- We use the sigmoid (logistic) activation function in each layer
- Now we have everything we need to create the model
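To make the per-layer parameters w[1], b[1], w[2], b[2], w[3], b[3] concrete, here is a hand-written forward pass of a 25-15-1 sigmoid network in plain Python. It is an illustrative sketch of what Sequential() sets up, not TF's internals; the helper names, the random initialization, and the 400-feature input size (e.g. a flattened 20x20 image) are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_units, rng):
    # W[i][j] connects input i to unit j (n_in x n_units); one bias b[j] per unit.
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_units)] for _ in range(n_in)]
    b = [0.0] * n_units
    return W, b

def dense(a_in, W, b):
    # a_out[j] = g(sum_i a_in[i] * W[i][j] + b[j]) with g = sigmoid
    return [
        sigmoid(sum(a_in[i] * W[i][j] for i in range(len(a_in))) + b[j])
        for j in range(len(b))
    ]

def build_model(n_features, units=(25, 15, 1), seed=0):
    # Creates (W, b) for each layer; each layer's input size is the
    # previous layer's number of units.
    rng = random.Random(seed)
    params, n_in = [], n_features
    for n_units in units:
        params.append(init_layer(n_in, n_units, rng))
        n_in = n_units
    return params

def predict(x, params):
    # Feed the activations of each layer into the next one.
    a = x
    for W, b in params:
        a = dense(a, W, b)
    return a

def count_params(params):
    return sum(len(W) * len(W[0]) + len(b) for W, b in params)

# Hypothetical input size of 400 features.
params = build_model(n_features=400)
out = predict([0.5] * 400, params)
print(count_params(params))  # 400*25+25 + 25*15+15 + 15*1+1 = 10431
print(out)                   # a single probability between 0 and 1
```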
Loss & Cost Functions
- Since we are working on the handwritten digit classification problem (is the digit a 0 or a 1?), we need a binary classification loss that compares each prediction with its target label
- In TF, the binary logistic loss function is called BinaryCrossentropy
- It comes from Keras, the high-level library bundled with TF, as tf.keras.losses.BinaryCrossentropy
- So we compile our model with that loss function, which TF then uses to compute the loss and cost of the model
- Having specified the loss for one training example, TF knows that it needs to minimize the cost function (the average loss over the training data) while training the model
- Remember that the cost function J(W,B) depends on the parameters W, B of all the layers and averages the loss over all rows (training examples)
- Here f_W,B(x) is the network's prediction for one input row x, computed using the parameters of every layer
- If we were solving a regression problem, predicting a number rather than a category/class, we would use a different loss function
- For example, we can use MeanSquaredError() instead of BinaryCrossentropy()
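To see what the two loss choices compute, here is a hand-written, pure-Python sketch of binary cross-entropy and mean squared error averaged over a batch. These helper functions are hypothetical stand-ins written for illustration, not TF's implementations; the Keras versions also clip predictions slightly for numerical stability, so results can differ a little when predictions are extremely close to 0 or 1.

```python
import math

def binary_crossentropy(y_true, y_pred):
    # Average of -y*log(p) - (1-y)*log(1-p) over examples; y is 0 or 1,
    # p is the predicted probability of class 1.
    pairs = list(zip(y_true, y_pred))
    return sum(-y * math.log(p) - (1 - y) * math.log(1 - p) for y, p in pairs) / len(pairs)

def mean_squared_error(y_true, y_pred):
    # Average of (p - y)^2 over examples; y and p can be any real numbers.
    pairs = list(zip(y_true, y_pred))
    return sum((p - y) ** 2 for y, p in pairs) / len(pairs)

# Classification: labels are classes (0/1), predictions are probabilities.
bce = binary_crossentropy([1, 0, 1], [0.9, 0.2, 0.6])
# Regression: labels and predictions are plain numbers.
mse = mean_squared_error([3.0, -0.5, 2.0], [2.5, 0.0, 2.0])
print(round(bce, 4), round(mse, 4))  # 0.2798 0.1667
```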
Gradient Descent
- Finally, we minimize the cost with respect to every parameter using GD
- We set the number of iterations
- Inside model.fit(), TF uses backpropagation to compute the partial derivatives of the cost with respect to all the parameters
- fit() takes the inputs X and labels Y, as well as the number of iterations via epochs
- fit() then repeatedly updates the NN parameters to reduce the cost
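What fit() automates can be sketched by hand for a tiny network: backpropagation applies the chain rule to get every partial derivative of the cost, and each iteration uses those derivatives for one GD update. The sketch below is a plain-Python illustration under stated assumptions, not how TF implements it: the 2-input, 3-hidden-unit network, the toy data, alpha=0.1 and 200 epochs are hypothetical, and here one "epoch" is a single full-batch GD step (Keras instead updates once per mini-batch). It also checks one backprop derivative against a finite-difference estimate.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_params(n, H, seed=0):
    # Hypothetical 2-layer net: n inputs -> H sigmoid units -> 1 sigmoid unit.
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(H)] for _ in range(n)],
        "b1": [0.0] * H,
        "w2": [rng.uniform(-1, 1) for _ in range(H)],
        "b2": 0.0,
    }

def forward(x, p):
    h = [sigmoid(sum(x[i] * p["W1"][i][j] for i in range(len(x))) + p["b1"][j])
         for j in range(len(p["b1"]))]
    a = sigmoid(sum(hj * wj for hj, wj in zip(h, p["w2"])) + p["b2"])
    return h, a

def cost(X, Y, p):
    # Average binary cross-entropy loss over all rows.
    total = 0.0
    for x, y in zip(X, Y):
        _, a = forward(x, p)
        total += -y * math.log(a) - (1 - y) * math.log(1 - a)
    return total / len(X)

def backprop(X, Y, p):
    # Partial derivatives of the cost w.r.t. every parameter, via the chain rule.
    n, H, m = len(p["W1"]), len(p["b1"]), len(X)
    g = {"W1": [[0.0] * H for _ in range(n)], "b1": [0.0] * H, "w2": [0.0] * H, "b2": 0.0}
    for x, y in zip(X, Y):
        h, a = forward(x, p)
        dz2 = a - y  # dL/dz at the output (sigmoid + cross-entropy)
        g["b2"] += dz2 / m
        for j in range(H):
            g["w2"][j] += dz2 * h[j] / m
            dz1 = dz2 * p["w2"][j] * h[j] * (1 - h[j])  # back through layer 1
            g["b1"][j] += dz1 / m
            for i in range(n):
                g["W1"][i][j] += dz1 * x[i] / m
    return g

def fit(X, Y, p, alpha=0.1, epochs=200):
    # Each epoch: one backprop pass over all rows, then one GD update of every parameter.
    for _ in range(epochs):
        g = backprop(X, Y, p)
        p["b2"] -= alpha * g["b2"]
        for j in range(len(p["b1"])):
            p["b1"][j] -= alpha * g["b1"][j]
            p["w2"][j] -= alpha * g["w2"][j]
            for i in range(len(p["W1"])):
                p["W1"][i][j] -= alpha * g["W1"][i][j]
    return p

# Made-up toy data: 2 features per row, 0/1 labels.
X = [[0.5, 1.5], [1.0, 1.0], [1.5, 0.5], [3.0, 0.5], [2.0, 2.0], [1.0, 2.5]]
Y = [0, 0, 0, 1, 1, 1]
p = init_params(n=2, H=3)

# Check one backprop derivative against a central finite difference.
eps = 1e-6
analytic = backprop(X, Y, p)["W1"][0][0]
old = p["W1"][0][0]
p["W1"][0][0] = old + eps
j_plus = cost(X, Y, p)
p["W1"][0][0] = old - eps
j_minus = cost(X, Y, p)
p["W1"][0][0] = old
numeric = (j_plus - j_minus) / (2 * eps)

cost_before = cost(X, Y, p)
fit(X, Y, p)
cost_after = cost(X, Y, p)
print(f"gradient check diff: {abs(analytic - numeric):.2e}")
print(f"cost before: {cost_before:.4f}, after: {cost_after:.4f}")
```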