Neural Networks for Classification with TensorFlow
This notebook demonstrates neural networks for binary and multi-class classification problems with TensorFlow.
- Neural Network for Classification with TensorFlow
- Classification Inputs and Outputs
- Architecture of a Classification model
- Creating data to view and fit
- Steps in Modelling
- Improving our model
- The Missing Piece: Non-linearity
- Evaluating and improving our classification model
- Plot the Loss or Training Curve
- Finding the best learning rate
- More classification evaluation methods
- Working with larger examples (Multi-class Classification)
This Notebook is an account of working through the Udemy course by Daniel Bourke: TensorFlow Developer Certificate in 2022: Zero to Mastery.
Classification problems are those where our goal is to classify given data by predicting labels for the input data, after the network has looked at a large sample of labelled examples.
Example Classification Problems:
- Is this email spam or not spam? (Binary Classification)
- Is this a photo of sushi, steak or pizza? (Multi-class Classification)
- What tags should this article have? (Multi-label Classification)
This Notebook covers:
- Architecture of a neural network classification model
- Input shapes and output shapes of a classification model (features and labels)
- Creating custom data to view and fit
- Steps in modelling
  - Creating a model
  - Compiling a model
  - Fitting a model
  - Evaluating a model
- Different classification evaluation methods
- Saving and loading models
Classification Inputs and Outputs
An image input gets represented as a tensor of shape [batch_size, width, height, colour_channels].
Example shape of the tensor encoded from an image of resolution 224x224 pixels:
Shape = [None, 224, 224, 3] or
Shape = [32, 224, 224, 3]
We set the batch_size to 32 in the second shape because it is a common choice (and in fact the default in TensorFlow). We mostly take batches of 32 so that we don't run out of memory during training; the batch size can be configured according to the memory of the machine we are working on. Other common batch sizes are 64, 128, 256, etc.
Examples of output shape:
Shape = [n], where n is the number of classes we have in our classification model.
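A quick sketch of these shapes in code (dummy tensors, just to inspect the shapes; not real image data):
import tensorflow as tf
# A batch of 32 dummy 224x224 RGB images: [batch_size, width, height, colour_channels]
images = tf.random.uniform(shape=[32, 224, 224, 3])
# One-hot encoded outputs for a 3-class problem: [batch_size, n_classes]
labels = tf.one_hot(tf.random.uniform([32], maxval=3, dtype=tf.int32), depth=3)
print(images.shape, labels.shape) # (32, 224, 224, 3) (32, 3)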
Architecture of a Classification model
Example architecture of a classification model:
# 1. Create a model (specified to your problem)
model = tf.keras.Sequential([
tf.keras.Input(shape= (224,224,3)),
tf.keras.layers.Dense(100, activation = "relu"),
tf.keras.layers.Dense(3, activation = "softmax")
])
# 2. Compile the Model
model.compile(loss = tf.keras.losses.CategoricalCrossentropy(),
optimizer = tf.keras.optimizers.Adam(),
metrics = ["accuracy"])
# 3. Fit the model
model.fit(X_train, y_train, epochs = 5)
# 4. Evaluate the model
model.evaluate(X_test, y_test)
Hyperparameter | Binary Classification | Multi-class Classification |
---|---|---|
Input layer shape | Same as number of features | Same as binary classification |
Hidden layer(s) | Problem specific | Same as binary classification |
Neurons per hidden layer | Problem specific (generally 10 to 100) | Same as binary classification |
Output layer shape | 1 (one class or the other) | 1 per class |
Hidden activation | Usually ReLU (rectified linear unit) | Same as binary classification |
Output activation | Sigmoid | Softmax |
Loss function | Cross entropy (tf.keras.losses.BinaryCrossentropy) | Cross entropy (tf.keras.losses.CategoricalCrossentropy) |
Optimizer | SGD (stochastic gradient descent), Adam | Same as binary classification |
from sklearn.datasets import make_circles
# Make 1000 examples
n_samples = 1000
# Create circles
X, y = make_circles(n_samples,
noise = 0.03,
random_state = 42)
X
y[:10]
Our data is hard to understand right now. Let's visualize it!
import pandas as pd
import numpy as np
circles = pd.DataFrame({"X0": X[:,0], "X1": X[:,1], "label": y})
circles
import matplotlib.pyplot as plt
from matplotlib import style
style.use('dark_background')
plt.scatter(X[:,0], X[:,1], c= y, cmap = plt.cm.RdYlBu );
Resources:
- Neural Networks Playground: you can tweak the hyperparameters and visualize the results in a much more interactive way.
X.shape, y.shape
len(X), len(y)
X[0], y[0]
import tensorflow as tf
print(tf.__version__)
tf.random.set_seed(42)
# 1. Create the model using Sequential API
model_1 = tf.keras.Sequential([
tf.keras.layers.Dense(1)
])
# 2. Compile the model
model_1.compile(loss = tf.keras.losses.BinaryCrossentropy(),
optimizer = tf.keras.optimizers.SGD(),
metrics = ["accuracy"])
# 3. Fit the model
model_1.fit(X,y, epochs = 5)
model_1.fit(X,y ,epochs = 200, verbose = 0)
model_1.evaluate(X,y)
Since we are working on a binary classification problem and our model is getting around ~50% accuracy, it's practically guessing.
# Add another layer and train for longer
# Set the random seed
tf.random.set_seed(42)
# 1. Create a model, this time with 2 layers
model_2 = tf.keras.Sequential([
tf.keras.layers.Dense(1),
tf.keras.layers.Dense(1)
])
# 2. Compile the model
model_2.compile(loss = tf.keras.losses.BinaryCrossentropy(),
optimizer = tf.keras.optimizers.SGD(),
metrics = ["accuracy"])
# 3. Fit the model
model_2.fit(X,y, epochs = 100, verbose = 0)
model_2.evaluate(X,y)
Improving our model
Let's see what we can change in each of the modelling steps:
- Creating a model: we might add more layers or increase the number of hidden units within a layer.
- Compiling a model: we can choose a different optimization function, like Adam instead of SGD.
- Fitting a model: we can fit our model for more epochs (train for longer).
# Set the random seed
tf.random.set_seed(42)
# 1. Create the model (this time 3 layers)
model_3 = tf.keras.Sequential([
tf.keras.layers.Dense(100), # add a layer with 100 neurons
tf.keras.layers.Dense(10), # add another layer with 10 neurons
tf.keras.layers.Dense(1)
])
# 2. Compile the model
model_3.compile(loss = tf.keras.losses.BinaryCrossentropy(),
optimizer = tf.keras.optimizers.Adam(),
metrics = ["accuracy"])
# 3. Fit the model
model_3.fit(X,y, epochs=100, verbose = 0)
model_3.evaluate(X,y)
Still ~50% accuracy... a really bad result for us.
To visualize our model's predictions, let's create a function plot_decision_boundary, which will:
- Take in a trained model, features (X) and labels (y)
- Create a meshgrid of the different X values
- Make predictions across the meshgrid
- Plot the predictions as well as a line between zones (where each unique class falls)
def plot_decision_boundary(model, X, y):
    """
    Plots the decision boundary created by a model predicting on X.
    """
    # Define the axis boundaries of the plot and create a meshgrid
    x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
    y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                         np.linspace(y_min, y_max, 100))
    # Create X values (we are going to make predictions on these)
    x_in = np.c_[xx.ravel(), yy.ravel()] # stack 2D arrays together
    # Make predictions
    y_pred = model.predict(x_in)
    # Check for multi-class
    if len(y_pred[0]) > 1:
        print("doing multi-class classification")
        # We have to reshape our predictions to get them ready for plotting
        y_pred = np.argmax(y_pred, axis=1).reshape(xx.shape)
    else:
        print("doing binary classification")
        y_pred = np.round(y_pred).reshape(xx.shape)
    # Plot the decision boundary
    plt.contourf(xx, yy, y_pred, cmap=plt.cm.RdYlBu, alpha=0.7)
    plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.RdYlBu)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
plot_decision_boundary(model_3, X, y)
It looks like our model is fitting a straight (linear) decision boundary, which is why it performs so poorly on this circular data.
# Let's see if our model can be used for a regression problem..
tf.random.set_seed(42)
# Create some regression data
X_regression = tf.range(0,1000,5)
y_regression = tf.range(100,1100, 5) # y = X + 100
# Let's split our training data into training and test splits
X_reg_train = X_regression[:150]
X_reg_test = X_regression[150:]
y_reg_train = y_regression[:150]
y_reg_test = y_regression[150:]
We compiled our model for a binary classification problem. Now we are working on a regression problem, so let's change the model to suit our data.
tf.random.set_seed(42)
# 1. Create the model
model_3 = tf.keras.Sequential([
tf.keras.layers.Dense(100),
tf.keras.layers.Dense(10),
tf.keras.layers.Dense(1)
])
# 2. Compile the model, this time with a regression-specific loss function
model_3.compile(loss = tf.keras.losses.mae,
optimizer = tf.keras.optimizers.Adam(),
metrics = ["mae"])
# 3. Fit the model
model_3.fit(tf.expand_dims(X_reg_train, axis=-1), y_reg_train, epochs =100, verbose = 0)
y_reg_preds = model_3.predict(X_reg_test)
# Plot the model's predictions against our regression data
plt.figure(figsize = (10,7))
plt.scatter(X_reg_train, y_reg_train, c= 'b', label = "Training data")
plt.scatter(X_reg_test, y_reg_test, c = 'g', label ="Test data")
plt.scatter(X_reg_test, y_reg_preds, c = 'r', label = "Predictions")
plt.legend();
Our model can fit straight lines, but on the circle data we failed to capture the non-linearity!
# Set the seed
tf.random.set_seed(42)
# 1. Create the model
model_4 = tf.keras.Sequential([
tf.keras.layers.Dense(1, activation = tf.keras.activations.linear)
])
# 2. Compile the model
model_4.compile(loss = "binary_crossentropy",
optimizer = tf.keras.optimizers.Adam(learning_rate = 0.001),
metrics = ["accuracy"])
# 3. Fit the model
history = model_4.fit(X, y, epochs = 100)
plt.scatter(X[:,0], X[:,1], c = y, cmap = plt.cm.RdYlBu);
plot_decision_boundary(model_4, X, y)
Let's try to build our first neural network with a non-linear activation function.
# Set random seed
tf.random.set_seed(42)
# 1. Create a model with non-linear activation
model_5 = tf.keras.Sequential([
tf.keras.layers.Dense(1, activation = tf.keras.activations.relu)
])
# 2. Compile the model
model_5.compile(loss= tf.keras.losses.BinaryCrossentropy(),
optimizer = tf.keras.optimizers.Adam(learning_rate = 0.001),
metrics = ["accuracy"])
# 3. Fit the model
history = model_5.fit(X,y,epochs= 100)
The result still looks bad; our model is performing no better than guessing.
#set the random seed
tf.random.set_seed(42)
# 1. Create the model
model_6 = tf.keras.Sequential([
tf.keras.layers.Dense(4, activation = "relu"),
tf.keras.layers.Dense(4, activation = 'relu'),
tf.keras.layers.Dense(1)
])
# 2. Compile the model
model_6.compile(loss = "binary_crossentropy",
optimizer = tf.keras.optimizers.Adam(learning_rate = 0.001),
metrics = ["accuracy"])
# 3. Fit the model
history = model_6.fit(X,y, epochs = 250)
Let's evaluate the model to see how it is doing
model_6.evaluate(X,y)
It's performing much worse than guessing now.
plot_decision_boundary(model_6, X, y)
It's time to fix the model and provide the missing piece: a sigmoid activation on the output layer.
#set the random seed
tf.random.set_seed(42)
# 1. Create the model
model_7 = tf.keras.Sequential([
tf.keras.layers.Dense(4, activation = "relu"),
tf.keras.layers.Dense(4, activation = "relu"),
tf.keras.layers.Dense(1, activation = "sigmoid")
])
# 2. Compile the model
model_7.compile(loss = "binary_crossentropy",
optimizer = tf.keras.optimizers.Adam(learning_rate = 0.001),
metrics = ["accuracy"])
# 3. Fit the model
history = model_7.fit(X,y, epochs = 250)
model_7.evaluate(X,y)
plot_decision_boundary(model_7,X,y)
Wow, we are very close: our model has segregated the data points, and almost all the blue points fall inside the blue decision boundary.
Note: The combination of linear (straight line) and non-linear (non-straight line) functions is one of the key fundamentals of neural networks.
A = tf.cast(tf.range(-10,10), tf.float32)
A
plt.plot(A);
It's a straight line!!
def sigmoid(x):
    return 1 / (1 + tf.exp(-x))
# Use the sigmoid function on our toy tensor
sigmoid(A)
plt.plot(sigmoid(A));
The sigmoid function transformed the linear graph into a non-linear one.
def relu(x):
    return tf.maximum(0, x)
relu(A)
plt.plot(relu(A));
tf.keras.activations.linear(A)
plt.plot(tf.keras.activations.linear(A));
A linear activation function doesn't change our input data at all, which is why a purely linear model failed to fit a good decision boundary to our non-linear data.
A brief review of standard activation functions:
- Linear: tf.keras.activations.linear(A)
- Sigmoid: tf.keras.activations.sigmoid(A)
- ReLU: tf.keras.activations.relu(A)
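A short sketch comparing all three on the toy tensor A from above:
# Plot the three activation functions side by side on the toy tensor A
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
activations = [("linear", tf.keras.activations.linear),
               ("sigmoid", tf.keras.activations.sigmoid),
               ("relu", tf.keras.activations.relu)]
for ax, (name, fn) in zip(axes, activations):
    ax.plot(fn(A))
    ax.set_title(name)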
Evaluating and improving our classification model
So far we have been training and testing on the same dataset. This risks overfitting: the model can achieve high accuracy on the training set but fail to give correct predictions on unseen data, which is disastrous for an ML application. Ideally we work with three sets of data (a sketch of such a split follows the list):
- Training set
- Validation set
- Test set
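Below we keep things simple with just a train/test split, but a three-way split might look like this (a sketch; the 80/10/10 proportions are illustrative assumptions, and the _s suffix just avoids clashing with the names used below):
n = len(X)
# 80% train, 10% validation, 10% test (illustrative proportions)
X_train_s, y_train_s = X[:int(0.8*n)], y[:int(0.8*n)]
X_val_s, y_val_s = X[int(0.8*n):int(0.9*n)], y[int(0.8*n):int(0.9*n)]
X_test_s, y_test_s = X[int(0.9*n):], y[int(0.9*n):]
len(X_train_s), len(X_val_s), len(X_test_s) # (800, 100, 100)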
X_train, y_train = X[:800], y[:800]
X_test, y_test = X[800:], y[800:]
X_train.shape, X_test.shape, y_train.shape, y_test.shape
# Let's create a model to fit on the training data and evaluate on the testing data
# Set the random seed
tf.random.set_seed(42)
# 1. Create the model (same as model_7)
model_8 = tf.keras.Sequential([
tf.keras.layers.Dense(4, activation = "relu"),
tf.keras.layers.Dense(4, activation = "relu"),
tf.keras.layers.Dense(1, activation = "sigmoid")
])
# 2. Compile the Model
model_8.compile(loss = "binary_crossentropy",
optimizer = tf.keras.optimizers.Adam(learning_rate = 0.01),
metrics = ["accuracy"])
# 3. Fit the Model
history = model_8.fit(X_train, y_train, epochs = 25)
model_8.evaluate(X_test, y_test)
plt.figure(figsize = (12,6))
plt.subplot(1,2,1)
plt.title("Train")
plot_decision_boundary(model_8, X_train, y_train)
plt.subplot(1,2,2)
plt.title("Test")
plot_decision_boundary(model_8, X_test, y_test)
Excellent!! Our model reached 100% accuracy on the test set, and in only 25 epochs (because we turned up the learning rate 10x).
model_8.summary()
history.history
# Convert the history object into a DataFrame
pd.DataFrame(history.history)
pd.DataFrame(history.history).plot()
plt.title("Loss curve of Model_8")
Note: For many problems, the loss function going down means the model is improving (the predictions it's making are getting closer to the ground truth labels).
Finding the best learning rate
To find the ideal learning rate (the learning rate where the loss decreases the most during training) we're going to use the following steps:
- A learning rate callback: you can think of a callback as an extra piece of functionality you can add to your model while it trains
- Another model (same architecture as before)
- A modified loss curve plot
# Set the random seed
tf.random.set_seed(42)
# Create a model (same as model_8)
model_9 = tf.keras.Sequential([
tf.keras.layers.Dense(4, activation = "relu"),
tf.keras.layers.Dense(4, activation = "relu"),
tf.keras.layers.Dense(1, activation = "sigmoid")
])
# Compile the Model
model_9.compile(loss = "binary_crossentropy",
optimizer = "Adam",
metrics = ["accuracy"])
# Create a learning rate call back
lr_scheduler = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-4 * 10**(epoch/20))
# Fit the model
history_9 = model_9.fit(X_train,
y_train,
epochs = 100,
callbacks = [lr_scheduler])
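To see what the callback above is doing, here's a quick sketch of the learning rates it schedules: starting at 1e-4 and multiplying by 10 every 20 epochs.
# Learning rates produced by the scheduler at a few sample epochs
for epoch in [0, 20, 40, 60, 80, 99]:
    print(epoch, 1e-4 * 10**(epoch/20))
# 0 -> 1e-4, 20 -> 1e-3, 40 -> 1e-2, 60 -> 1e-1, 80 -> 1.0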
pd.DataFrame(history_9.history).plot(figsize=(10,7), xlabel = "epochs");
# Plot the learning rate versus the loss
lrs = 1e-4 * (10**(tf.range(100)/20))
plt.figure(figsize=(10,7))
plt.semilogx(lrs, history_9.history["loss"])
plt.xlabel("Learning rate")
plt.ylabel("Loss")
plt.title("learning rate vs. Loss");
Let's analyze the plot and pick the best learning rate. You can use a plot like this to pick a learning rate when trying to improve the performance of your model, although the default learning rates that come with the pre-built optimizers generally work well too.
In this case, pick a learning rate around the value where the loss is decreasing fastest.
10**0, 10**-1, 10**-2, 10**-3, 10**-4
Pick a value near 10**-2, such as 0.02, where the loss is still decreasing steadily.
10**-2
# Let's try using a higher *ideal* learning rate with the same model
# Set the random seed
tf.random.set_seed(42)
# Create a model (same as model_8)
model_10 = tf.keras.Sequential([
tf.keras.layers.Dense(4, activation = "relu"),
tf.keras.layers.Dense(4, activation = "relu"),
tf.keras.layers.Dense(1, activation = "sigmoid")
])
# Compile the Model
model_10.compile(loss = "binary_crossentropy",
optimizer = tf.keras.optimizers.Adam(learning_rate = 0.02),
metrics = ["accuracy"])
# Fit the model with 20 epochs(5 less than before)
history_10 = model_10.fit(X_train,
y_train,
epochs = 20)
model_10.evaluate(X_test,y_test)
model_8.evaluate(X_test,y_test)
Which model should we choose? One has the best accuracy and the other has a very low loss value.
plt.figure(figsize = (12,6))
plt.subplot(1,2,1)
plt.title("Train")
plot_decision_boundary(model_10, X_train, y_train)
plt.subplot(1,2,2)
plt.title("Test")
plot_decision_boundary(model_10, X_test,y_test)
plt.show();
Classification Evaluation methods:
Keys: tp = true-positive
, tn = true-negative
, fp = false-positive
, fn = false-negative
.
Metric Name | Metric Formula | Code | When to use |
---|---|---|---|
Accuracy | $\frac{tp+tn}{tp+tn+fp+fn}$ | tf.keras.metrics.Accuracy() or sklearn.metrics.accuracy_score() | Default metric for classification problems. Not the best for imbalanced classes. |
Precision | $\frac{tp}{tp+fp}$ | tf.keras.metrics.Precision() or sklearn.metrics.precision_score() | Higher precision leads to fewer false positives. |
Recall | $\frac{tp}{tp+fn}$ | tf.keras.metrics.Recall() or sklearn.metrics.recall_score() | Higher recall leads to fewer false negatives. |
F1-score | $2\frac{precision \cdot recall}{precision + recall}$ | sklearn.metrics.f1_score() | Combination of precision and recall; usually a good overall metric for a classification model. |
Confusion matrix | NA | Custom function or sklearn.metrics.confusion_matrix() | When comparing predictions to truth labels to see where the model gets confused. Can be hard to use with a large number of classes. |
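A quick sketch of how these metric functions are called, using toy labels rather than our model's predictions:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Toy labels: tp = 2, tn = 2, fp = 1, fn = 1
y_true_toy = [0, 0, 0, 1, 1, 1]
y_pred_toy = [0, 0, 1, 1, 1, 0]
print(accuracy_score(y_true_toy, y_pred_toy)) # (tp+tn)/(tp+tn+fp+fn) = 4/6
print(precision_score(y_true_toy, y_pred_toy)) # tp/(tp+fp) = 2/3
print(recall_score(y_true_toy, y_pred_toy)) # tp/(tp+fn) = 2/3
print(f1_score(y_true_toy, y_pred_toy)) # 2PR/(P+R) = 2/3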
loss, accuracy = model_10.evaluate(X_test, y_test)
print(f"Model loss on the test set: {loss}")
print(f"Model accuracy on the test set: {(accuracy*100) : .2f}% ")
Anatomy of confusion matrix
- True positive = model predicts 1 when truth is 1
- True negative = model predicts 0 when truth is 0
- False positive = model predicts 1 when truth is 0
- False negative = model predicts 0 when truth is 1
In the confusion matrix the correct predictions are along the diagonal (i.e. true positives and true negatives).
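Here's a tiny sketch of that layout with the same toy labels as above (sklearn puts true labels on the rows and predicted labels on the columns):
from sklearn.metrics import confusion_matrix
# Rows = true label, columns = predicted label:
# [[tn, fp],
#  [fn, tp]]
confusion_matrix([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 0]) # array([[2, 1], [1, 2]])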
from sklearn.metrics import confusion_matrix
# Make predictions
y_preds = model_10.predict(X_test)
# Create confusion matrix (this will raise an error: y_preds are prediction probabilities, not labels)
confusion_matrix(y_test, y_preds)
Let's check the values of y_preds we passed in
y_preds[1:10]
Looks like our predictions array has come out in prediction probability form, the standard output of a sigmoid (or softmax) activation.
tf.round(y_preds)[:10]
confusion_matrix(y_test, tf.round(y_preds))
Now that we've got our confusion matrix, let's make it more visual and easier to understand.
import itertools
figsize = (10, 10)
# Create the confusion matrix
cm = confusion_matrix(y_test, tf.round(y_preds))
cm_norm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis] # Normalize our confusion matrix
n_classes = cm.shape[0]
# Let's make it neat and clear
fig, ax = plt.subplots(figsize=figsize)
# Create a matrix plot
cax = ax.matshow(cm, cmap=plt.cm.Blues)
fig.colorbar(cax)
# Use class names if we have them, otherwise integer labels
classes = False
if classes:
    labels = classes
else:
    labels = np.arange(cm.shape[0])
# Label the axes
ax.set(title="Confusion Matrix",
       xlabel="Predicted Label",
       ylabel="True Label",
       xticks=np.arange(n_classes),
       yticks=np.arange(n_classes),
       xticklabels=labels,
       yticklabels=labels)
# Set x-axis labels to the bottom
ax.xaxis.set_label_position("bottom")
ax.xaxis.tick_bottom()
# Adjust label size
ax.yaxis.label.set_size(20)
ax.xaxis.label.set_size(20)
ax.title.set_size(20)
# Set threshold for different text colours
threshold = (cm.max() + cm.min()) / 2.
# Plot the text on each cell
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
    plt.text(j, i, f"{cm[i,j]} ({cm_norm[i,j]*100:.1f}%)",
             horizontalalignment="center",
             color="white" if cm[i,j] > threshold else "black",
             size=15)
Let's wrap the code above in a function so we can reuse this confusion matrix plot later.
def plot_confusion_matrix(y_test, y_preds):
    import itertools
    figsize = (10, 10)
    # Create the confusion matrix
    cm = confusion_matrix(y_test, tf.round(y_preds))
    cm_norm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis] # Normalize our confusion matrix
    n_classes = cm.shape[0]
    # Let's make it neat and clear
    fig, ax = plt.subplots(figsize=figsize)
    # Create a matrix plot
    cax = ax.matshow(cm, cmap=plt.cm.Blues)
    fig.colorbar(cax)
    # Use class names if we have them, otherwise integer labels
    classes = False
    if classes:
        labels = classes
    else:
        labels = np.arange(cm.shape[0])
    # Label the axes
    ax.set(title="Confusion Matrix",
           xlabel="Predicted Label",
           ylabel="True Label",
           xticks=np.arange(n_classes),
           yticks=np.arange(n_classes),
           xticklabels=labels,
           yticklabels=labels)
    # Set x-axis labels to the bottom
    ax.xaxis.set_label_position("bottom")
    ax.xaxis.tick_bottom()
    # Adjust label size
    ax.yaxis.label.set_size(20)
    ax.xaxis.label.set_size(20)
    ax.title.set_size(20)
    # Set threshold for different text colours
    threshold = (cm.max() + cm.min()) / 2.
    # Plot the text on each cell
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, f"{cm[i,j]} ({cm_norm[i,j]*100:.1f}%)",
                 horizontalalignment="center",
                 color="white" if cm[i,j] > threshold else "black",
                 size=15)
plot_confusion_matrix(y_test,y_preds)
Great, our function works, and we can reuse it for other models from now on, saving us from rewriting all the plotting code.
Working with larger examples (Multi-class Classification)
When you have more than two classes as an option, it is known as multi-class classification.
- This means if you have 3 different classes, it's multi-class classification.
- It also means if you have 100 different classes, it's multi-class classification.
To practice multi-class classification, we are going to build a neural network to classify images of different items of clothing.
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist
# The data has already been sorted into training and test sets for us
(train_data, train_labels), (test_data, test_labels) = fashion_mnist.load_data()
train_data[2]
train_labels[0]
# Show the first training example
print(f"Training sample:\n {train_data[0]}\n")
print(f"Training label:\n {train_labels[0]}\n")
train_data[0].shape, train_labels[0].shape
import matplotlib.pyplot as plt
plt.imshow(train_data[7]);
train_labels[7]
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
"Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
len(class_names)
index_of_choice = 100
plt.imshow(train_data[index_of_choice], cmap = plt.cm.binary)
plt.title(class_names[train_labels[index_of_choice]])
import random
plt.figure(figsize = (7,7))
for i in range(4):
    ax = plt.subplot(2, 2, i + 1)
    rand_index = random.choice(range(len(train_data)))
    plt.imshow(train_data[rand_index], cmap = plt.cm.binary)
    plt.title(class_names[train_labels[rand_index]])
    plt.axis(False)
Building a Multi-class classification model
For our multi-class classification model, we can use a similar architecture to our binary classifier, however we have to tweak a few things:
- Input shape = 28 x 28 (the shape of one image)
- Output shape = 10 (one per class of clothing)
- Loss function = tf.keras.losses.CategoricalCrossentropy()
  - If your labels are one-hot encoded, use CategoricalCrossentropy()
  - If your labels are in integer form, use SparseCategoricalCrossentropy()
- Output layer activation = softmax, not sigmoid
flatten_model = tf.keras.Sequential([tf.keras.layers.Flatten(input_shape=(28,28))])
flatten_model.output_shape
The dimension 784 is the total number of pixels in a 28 x 28 greyscale image.
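A quick check (sketch): flattening turns one (28, 28) image into a vector of 28 * 28 = 784 values.
flattened = flatten_model.predict(train_data[:1])
print(train_data[:1].shape, "->", flattened.shape) # (1, 28, 28) -> (1, 784)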
train_labels[0:10]
Our training labels are in integer form, so to use CategoricalCrossentropy we need to transform them into one-hot encoded vectors. Alternatively, if we change the loss to SparseCategoricalCrossentropy, it handles integer labels directly.
tf.one_hot(train_labels[:10], depth=10), tf.one_hot(test_labels[:10], depth=10)
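As a quick sketch, SparseCategoricalCrossentropy on integer labels gives the same value as CategoricalCrossentropy on their one-hot encoded versions (toy values below):
y_true_int = tf.constant([0, 1, 2])
y_true_oh = tf.one_hot(y_true_int, depth=3)
y_prob = tf.constant([[0.8, 0.1, 0.1],
                      [0.2, 0.7, 0.1],
                      [0.1, 0.2, 0.7]]) # toy prediction probabilities
print(tf.keras.losses.SparseCategoricalCrossentropy()(y_true_int, y_prob).numpy())
print(tf.keras.losses.CategoricalCrossentropy()(y_true_oh, y_prob).numpy()) # same value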
# Set the random seed
tf.random.set_seed(42)
# Create the model
model_11 = tf.keras.Sequential([
tf.keras.layers.Flatten(input_shape = (28,28)), # flatten the input to avoid shape mismatch errors
tf.keras.layers.Dense(4, activation = "relu"),
tf.keras.layers.Dense(4, activation = "relu"),
tf.keras.layers.Dense(10, activation = tf.keras.activations.softmax) # changed to softmax for multi-class classification
])
# compile the model
model_11.compile(loss = tf.keras.losses.CategoricalCrossentropy(), # changed the loss function
optimizer = tf.keras.optimizers.Adam(),
metrics = ["accuracy"])
# Fit the model
non_norm_history = model_11.fit(train_data,
tf.one_hot(train_labels,depth=10),
epochs =10,
validation_data = (test_data, tf.one_hot(test_labels, depth=10)))
model_11.summary()
train_data.min(), train_data.max()
Neural networks prefer their data to be scaled (or normalized); that means they like the numbers in the tensors to be between 0 and 1.
train_data_norm = train_data/ 255.0
test_data_norm = test_data/ 255.0
# Check the min and max values of the scaled training data
train_data_norm.min(), train_data_norm.max()
tf.random.set_seed(42)
# Create a model (same as model_11)
model_12 = tf.keras.Sequential([
tf.keras.layers.Flatten(input_shape= (28,28)),
tf.keras.layers.Dense(4, activation = "relu"),
tf.keras.layers.Dense(4, activation = "relu"),
tf.keras.layers.Dense(10, activation = "softmax")
])
# Compile the model
model_12.compile(loss = tf.keras.losses.SparseCategoricalCrossentropy(),
optimizer = tf.keras.optimizers.Adam(),
metrics = ["accuracy"])
# Fit the Model
norm_history = model_12.fit(train_data_norm,
train_labels,
epochs = 10,
validation_data = (test_data_norm,test_labels))
Note: Neural networks tend to prefer data in numerical form as well as in scaled/normalized form.
import pandas as pd
# Plot non-normalized data loss curves
pd.DataFrame(non_norm_history.history).plot(title= "Non-normalized data")
# Plot normalized data loss curves
pd.DataFrame(norm_history.history).plot(title="Normalized data")
Note: The same model with even slightly different data can produce dramatically different results. So when you compare models, it's important to compare them on the same criteria (e.g. same architecture but different data, or same data but different architecture).
Brief description of steps in modelling:
- Turn all the data into numbers (neural networks can't handle strings)
- Make sure all of your tensors are the right shape
- Scale features (normalize or standardize, neural networks tend to prefer normalization)
# Set random seed
tf.random.set_seed(42)
# Create model
model_13 = tf.keras.Sequential([
tf.keras.layers.Flatten(input_shape = (28,28)),
tf.keras.layers.Dense(4, activation = "relu"),
tf.keras.layers.Dense(4, activation = "relu"),
tf.keras.layers.Dense(10 ,activation = "softmax")
])
# Compile the model
model_13.compile(loss = tf.keras.losses.SparseCategoricalCrossentropy(),
optimizer = tf.keras.optimizers.Adam(),
metrics = ["accuracy"])
# Create a learning rate call back
lr_scheduler = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-3 * 10**(epoch/20))
# Fit the model
find_lr_history = model_13.fit(train_data_norm,
train_labels,
epochs= 40,
validation_data = (test_data_norm, test_labels),
callbacks = [lr_scheduler])
import numpy as np
import matplotlib.pyplot as plt
lrs = 1e-3 * (10**(tf.range(40)/20))
plt.semilogx(lrs, find_lr_history.history["loss"])
plt.xlabel("Learning rate")
plt.ylabel("Loss")
plt.title("Finding the ideal learning rate")
How do we find the ideal learning rate?
- Take a look at the graph, find the value where the loss is lowest, and go back a little (to where the loss is still decreasing steeply).
So it looks like $10^{-3}$ is a good choice.
# Let's refit a model with the ideal learning rate
# Set the random seed
tf.random.set_seed(42)
# Create the model
model_14 = tf.keras.Sequential([
tf.keras.layers.Flatten(input_shape = (28,28)),
tf.keras.layers.Dense(4, activation = "relu"),
tf.keras.layers.Dense(4, activation = "relu"),
tf.keras.layers.Dense(10, activation = "softmax")
])
# Compile the model
model_14.compile(loss = tf.keras.losses.SparseCategoricalCrossentropy(),
optimizer = tf.keras.optimizers.Adam(),
metrics = ["accuracy"])
# Fit the model
history_14 = model_14.fit(train_data_norm,
train_labels,
epochs = 20,
validation_data = (test_data_norm,test_labels))
Evaluating our multi-class classification model
To evaluate our multi-class classification model we could:
- Evaluate it with more classification metrics (such as a confusion matrix)
- Assess some of its predictions (through visualizations)
- Improve its results (by training it for longer or changing the architecture)
- Save and export it for use in an application
Let's go through the first two steps of the evaluation process.
import itertools
from sklearn.metrics import confusion_matrix

def make_confusion_matrix(y_true, y_pred, classes=None, figsize=(15, 15), text_size=10):
    # Create the confusion matrix
    cm = confusion_matrix(y_true, tf.round(y_pred))
    cm_norm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis] # Normalize our confusion matrix
    n_classes = cm.shape[0]
    # Let's make it neat and clear
    fig, ax = plt.subplots(figsize=figsize)
    # Create a matrix plot
    cax = ax.matshow(cm, cmap=plt.cm.Blues)
    fig.colorbar(cax)
    # Set labels to class names if we have them, otherwise integer labels
    if classes:
        labels = classes
    else:
        labels = np.arange(cm.shape[0])
    # Label the axes
    ax.set(title="Confusion Matrix",
           xlabel="Predicted Label",
           ylabel="True Label",
           xticks=np.arange(n_classes),
           yticks=np.arange(n_classes),
           xticklabels=labels,
           yticklabels=labels)
    # Set x-axis labels to the bottom
    ax.xaxis.set_label_position("bottom")
    ax.xaxis.tick_bottom()
    # Adjust label size
    ax.yaxis.label.set_size(text_size)
    ax.xaxis.label.set_size(text_size)
    ax.title.set_size(text_size)
    # Set threshold for different text colours
    threshold = (cm.max() + cm.min()) / 2.
    # Plot the text on each cell
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, f"{cm[i,j]} ({cm_norm[i,j]*100:.1f}%)",
                 horizontalalignment="center",
                 color="white" if cm[i,j] > threshold else "black",
                 size=text_size)
class_names
y_probs = model_14.predict(test_data_norm) # probs is short for prediction probability
# View the first 5 predictions
y_probs[:5]
NOTE: Remember to make predictions on the same kind of data your model was trained on (e.g. if you trained your model on normalized data, you'll want to make predictions on normalized data).
y_probs[0], tf.argmax(y_probs[0]) , class_names[tf.argmax(y_probs[0])]
y_preds = y_probs.argmax(axis = 1)
# View the first 10 prediction labels
y_preds[:10]
from sklearn.metrics import confusion_matrix
confusion_matrix(y_true = test_labels,
y_pred = y_preds)
make_confusion_matrix(y_true = test_labels,
y_pred = y_preds,
classes = class_names,
figsize = (20,20),
text_size = 10)
Note:
Often when working with images and other forms of visual data, it's a good idea to visualize as much as possible to develop a further understanding of the data and the inputs and outputs of your models.
How about we create a function to:
- Plot a random image
- Make a prediction on said image
- Label the plot with the truth label and the predicted label
import random
def plot_random_image(model, images, true_labels, classes):
    """
    Picks a random image, plots it and labels it with a prediction and truth label.
    """
    # Set up a random integer (randint is inclusive at both ends, so stop at len(images) - 1)
    i = random.randint(0, len(images) - 1)
    # Create a prediction
    target_image = images[i]
    pred_probs = model.predict(target_image.reshape(1, 28, 28))
    pred_label = classes[pred_probs.argmax()]
    true_label = classes[true_labels[i]]
    # Plot the image
    plt.imshow(target_image, cmap=plt.cm.binary)
    # Change the colour of the title depending on whether the prediction is right or wrong
    if pred_label == true_label:
        color = "green"
    else:
        color = "red"
    # Add xlabel information (prediction / true label); note color is passed
    # to plt.xlabel, not to format
    plt.xlabel("Pred: {} {:2.0f}% (True: {})".format(pred_label,
                                                     100 * tf.reduce_max(pred_probs),
                                                     true_label),
               color=color)
plot_random_image(model= model_14,
images = test_data_norm, # always make predictions on the same kind of data your model was trained on
true_labels = test_labels,
classes = class_names)
Yeah, we finally did it! We are getting correct predictions for images in the dataset. Run the cell a few times to view predictions for random images.
model_14.layers
model_14.layers[1]
weights, biases = model_14.layers[1].get_weights()
#Shapes
weights, weights.shape
In (784, 4), 784 is the number of pixel values in each image from the dataset, and 4 is the number of neurons in that layer of the network.
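As a sanity check (a small sketch), the layer's parameter count should be one weight per input pixel per neuron, plus one bias per neuron:
n_inputs, n_neurons = weights.shape # (784, 4) from the cell above
print(n_inputs * n_neurons + n_neurons) # 3140, which should match the first Dense layer in model_14.summary()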
Reading resource: kernel_initializer and glorot_uniform in tf.keras.layers.Dense.
Now let's check out the bias vector
biases, biases.shape
The difference between the weights matrix and the bias vector: the weights matrix has one value per input feature per neuron (here 784 x 4), whereas the bias vector has one value per hidden unit of that layer (here 4).
Every layer has a bias vector paired with its weights matrix.
The bias vector gets initialized as zeros (at least in the case of a TensorFlow Dense layer).
The bias vector dictates how much the patterns within the corresponding weights matrix should influence the next layer.
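Here's a small sketch of what the first Dense layer computes with these parameters: output = relu(X @ W + b), using the weights and biases we pulled out above.
X_sample = train_data_norm[:1].reshape(1, -1).astype("float32") # one image flattened to (1, 784)
manual_out = tf.nn.relu(tf.matmul(X_sample, weights) + biases) # same computation as the first Dense layer
print(manual_out.numpy()) # the activations passed on to the next layer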
from tensorflow.keras.utils import plot_model
# See the inputs and outputs of each layer
plot_model(model_14,show_shapes= True)