Convolutional Neural Networks and Computer Vision with TensorFlow
This notebook demonstrates convolutional neural networks for computer vision applications with TensorFlow.
- Convolutional Neural Networks and Computer Vision with TensorFlow
- Multi-class Image Classification
- 1. Import data
- 2. Preprocess the data (prepare it for a model)
- 3. Create a Model (start with a baseline)
- 4. Fit the Model
- 5. Evaluate the Model
- 6. Adjust the Hyperparameters and Improve the Model (to beat baseline/reduce overfitting)
- Trying to reduce overfitting with data augmentation
- 7. Repeat until satisfied
- Making a prediction with our trained model on custom data
- Saving and Loading our Model
- Bibliography:
This notebook is an account of working through the Udemy course by Daniel Bourke: TensorFlow Developer Certificate in 2022: Zero to Mastery.
Concepts covered in this Notebook:
- Getting a dataset to work with
- Architecture of a convolutional neural network (CNN) with TensorFlow
- An end-to-end binary image classification problem
- Steps in modelling with CNNs:
- Creating a CNN
- Compiling a model
- Fitting a Model
- Evaluating a model
- An end-to-end multi-class image classification problem
- Making predictions on our own custom images.
Architecture of a CNN
Hyperparameter/Layer type | What does it do? | Typical Values |
---|---|---|
Input image(s) | Target images you'd like to discover patterns in | Whatever you can take a photo or video of |
Input layer | Takes in target images and preprocesses them for further layers | input_shape = [batch_size, image_height, image_width, color_channels] |
Convolution layer | Extracts/learns the most important features from target images | Multiple, can create with tf.keras.layers.ConvXD (X can be multiple values) |
Hidden activation | Adds non-linearity to learned features (non-straight lines) | Usually ReLU (tf.keras.activations.relu) |
Pooling layer | Reduces the dimensionality of learned image features | Average (tf.keras.layers.AvgPool2D) or Max (tf.keras.layers.MaxPool2D) |
Fully Connected layer | Further refines learned features from convolution layers | tf.keras.layers.Dense |
Output layer | Takes learned features and outputs them in shape of target labels | output_shape = [number_of_classes] (e.g. 3 for pizza, steak or sushi) |
Output activation | Adds non-linearity to output layer | tf.keras.activations.sigmoid (binary classification) or tf.keras.activations.softmax (multi-class classification) |
An example of a CNN model in TensorFlow:
# 1. Create a CNN Model (same as Tiny VGG model)
cnn_model = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(filters=10,
kernel_size = 3, # can also be (3,3)
activation = 'relu',
input_shape = (224,224,3)), # Specify the input shape(height, width, colour channels)
tf.keras.layers.Conv2D(10,3, activation = "relu"),
tf.keras.layers.MaxPool2D(pool_size= 2, # pool_size can be (2,2)
padding = "valid"), # Padding can also be 'same'
tf.keras.layers.Conv2D(10,3, activation = "relu"),
tf.keras.layers.Conv2D(10,3, activation = "relu"), # activation = 'relu' == tf.keras.layers.Activation(tf.nn.relu)
tf.keras.layers.MaxPool2D(2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(1, activation = "sigmoid") # Binary activation output
])
# 2. Compile the model
cnn_model.compile(loss = "binary_crossentropy",
optimizer = tf.keras.optimizers.Adam(),
metrics = ["accuracy"])
# 3. Fit the model
history = cnn_model.fit(train_data, epochs = 5)
Get the data
The images we're working with are from the Food101 dataset (101 different classes of food): Kaggle Food101 Dataset
However, we've modified it to use only two classes (pizza & steak) via the image_data_modification notebook.
Note: We start with a smaller dataset so we can experiment quickly and figure out what works (and what doesn't) before scaling up.
import zipfile
!wget https://storage.googleapis.com/ztm_tf_course/food_vision/pizza_steak.zip
# Unzip the downloaded file
zip_ref = zipfile.ZipFile("pizza_steak.zip")
zip_ref.extractall()
zip_ref.close()
!ls pizza_steak
!ls pizza_steak/train
# Let's check what is inside the pizza training data
!ls pizza_steak/train/pizza
import os
# Walk through pizza_steak directory and list number of files
for dirpath, dirnames, filenames, in os.walk("pizza_steak"):
print(f"There are {len(dirnames)} directories and {len(filenames)} images in {dirpath}")
!ls -la pizza_steak
num_steak_images_train = len(os.listdir("pizza_steak/train/steak"))
num_steak_images_train
Note: To visualize our images, first let's get the class names programmatically
import pathlib
import numpy as np
data_dir = pathlib.Path("pizza_steak/train")
class_names = np.array(sorted([item.name for item in data_dir.glob("*")]))
# Create a list of class names from the subdirectories
print(class_names)
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import random
plt.style.use('dark_background')
def view_random_image(target_dir, target_class):
# Setup target directory (we'll view images from here)
target_folder = target_dir+target_class
# Get a random image path
random_image = random.sample(os.listdir(target_folder), 1)
# Read in the image and plot it using matplotlib
img = mpimg.imread(target_folder + "/" + random_image[0])
plt.imshow(img)
plt.title(target_class)
plt.axis("off");
print(f"Image shape: {img.shape}") # show the shape of the image
return img
img = view_random_image(target_dir="pizza_steak/train/",
target_class="steak")
import tensorflow as tf
tf.constant(img)
img.shape # Returns height, width, colour channels
# Normalize the pixel values (Data preprocessing step)
img/255.
It took so much time for one epoch because we aren't using a GPU accelerator. It doesn't have anything to do with our code: convolution layers involve a lot of computation, and for a dataset this big a normal CPU takes a long time. GPUs are excellent at number crunching at very fast speeds, so let's change the runtime type and use a GPU accelerator.
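As a quick sanity check (a minimal sketch; tf.config.list_physical_devices is standard TensorFlow, and an empty list means the runtime is CPU-only), we can confirm whether a GPU is visible:
import tensorflow as tf
# List the physical GPU devices visible to TensorFlow
print(tf.config.list_physical_devices("GPU"))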
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Set the seed
tf.random.set_seed(42)
# Preprocess data (get all of the pixel values between 0 and 1)
# This is also called scaling/normalization.
train_datagen = ImageDataGenerator(rescale = 1./255)
valid_datagen = ImageDataGenerator(rescale = 1./255)
# Setup paths to our data directories
train_dir = "/content/pizza_steak/train"
test_dir = "pizza_steak/test"
# Import data from directories and turn it into batches
train_data = train_datagen.flow_from_directory(directory = train_dir,
batch_size = 32, # Number of images to process at a time
target_size = (224,224),
class_mode = "binary",
seed = 42)
valid_data = valid_datagen.flow_from_directory(directory = test_dir,
batch_size = 32,
target_size = (224,224),
class_mode = "binary",
seed = 42)
# Build a CNN model(same as tiny VGG on the CNN explainer website)
model_1 = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(filters = 10,
kernel_size = 3,
activation = "relu",
input_shape = (224,224,3)),
tf.keras.layers.Conv2D(10,3,activation = "relu"),
tf.keras.layers.MaxPool2D(pool_size = 2,
padding = "valid"),
tf.keras.layers.Conv2D(10,3, activation = "relu"),
tf.keras.layers.Conv2D(10,3, activation = "relu"),
tf.keras.layers.MaxPool2D(2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(1, activation = "sigmoid")
])
# Compile our CNN
model_1.compile(loss = "binary_crossentropy",
optimizer = tf.keras.optimizers.Adam(),
metrics = ["accuracy"])
# Fit the model
history_1 = model_1.fit(train_data,
epochs = 5,
steps_per_epoch= len(train_data),
validation_data = valid_data,
validation_steps = len(valid_data))
If the first epoch is taking a long time, make sure you have changed the runtime type to GPU. You can do this via the Runtime menu: under "Change runtime type", select GPU as your hardware accelerator.
Our model above performed really well. It's predicting at an accuracy of about 87% on the validation data (the model hasn't seen this data during training).
model_1.summary()
Using the same model as before (Non-CNN model)
Let's replicate the model we have built in a previous post.
The model we are building is from TensorFlow Playground
tf.random.set_seed(42)
# Create a model to replicate the TensorFlow playground model
model_2 = tf.keras.Sequential([
tf.keras.layers.Flatten(input_shape = (224,224,3)),
tf.keras.layers.Dense(4, activation = "relu"),
tf.keras.layers.Dense(4, activation = "relu"),
tf.keras.layers.Dense(1, activation = "sigmoid")
])
# Compile the model
model_2.compile(loss = "binary_crossentropy",
optimizer =tf.keras.optimizers.Adam(),
metrics = ["accuracy"])
# Fit the model
history_2 = model_2.fit(train_data,
epochs = 5,
steps_per_epoch = len(train_data),
validation_data=valid_data,
validation_steps = len(valid_data))
model_2.summary()
Despite having 20 times more parameters than our CNN (model_1), model_2 performs poorly.
tf.random.set_seed(42)
# Create the model(same as above but let's modify it for better)
model_3 = tf.keras.Sequential([
tf.keras.layers.Flatten(input_shape = (224,224,3)),
tf.keras.layers.Dense(100, activation = "relu"),
tf.keras.layers.Dense(100, activation = "relu"),
tf.keras.layers.Dense(100, activation = "relu"),
tf.keras.layers.Dense(1, activation = "sigmoid")
])
# 2. Compile the model
model_3.compile(loss = "binary_crossentropy",
optimizer = tf.keras.optimizers.Adam(),
metrics = ["accuracy"])
# 3. Fit the model
history_3 = model_3.fit(train_data,
epochs = 5,
steps_per_epoch = len(train_data),
validation_data = valid_data,
validation_steps = len(valid_data))
model_3.summary()
model_3 has around 15 million parameters, roughly 500 times more trainable parameters than model_1 (the CNN). This shows the true power of convolutional neural networks (CNNs).
Note:
You can think of trainable parameters as patterns a model can learn from data. Intuitively, you might think more is better, and in many cases it is. But the difference here is the two different styles of model we are using.
A convolutional neural network seeks to sort out and learn the most important patterns in an image. So even though our CNN has fewer parameters, those parameters are often more helpful in deciphering visual data.
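To make the comparison concrete (a quick sketch; count_params() is a standard Keras method, and the exact numbers depend on your build), we can print the parameter counts directly:
# Compare total parameter counts of the CNN and the dense models
print(f"model_1 (CNN): {model_1.count_params():,} parameters")
print(f"model_2 (dense): {model_2.count_params():,} parameters")
print(f"model_3 (dense): {model_3.count_params():,} parameters")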
Binary Classification:
- Getting more familiar with our dataset
- Preprocess the data (e.g. scaling/normalization)
- Create a model (start with a basic one)
- Fit the model
- Evaluate the model
- Adjust different parameters and improve the model (try to beat any existing benchmarks or set your own)
- Repeat the steps to get the best possible result(experiment, experiment, experiment)
import os
import matplotlib.pyplot as plt
plt.figure()
plt.subplot(1,2,1)
steak_img = view_random_image("pizza_steak/train/", "steak")
plt.subplot(1,2,2)
pizza_img = view_random_image("pizza_steak/train/" , "pizza")
train_dir = "pizza_steak/train/"
test_dir = "pizza_steak/test/"
Our next step is to turn our data into batches
A batch is a small subset of data. Rather than look at all 10,000 images at one time, a model might only look at 32 at a time.
It does this for a couple of reasons:
- 10,000 images (or more) might not fit into the memory of your processor (GPU)
- Trying to learn the patterns in 10,000 images in one hit could result in the model not being able to learn very well
Why a mini-batch size of 32?
Training with large minibatches is bad for your health. More importantly, it's bad for your test error. Friends don't let friends use minibatches larger than 32. https://t.co/hxx2rGhIG1
— Yann LeCun (@ylecun) April 26, 2018
!nvidia-smi
from tensorflow.keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale = 1/255.)
test_datagen = ImageDataGenerator(rescale = 1/255.)
train_data = train_datagen.flow_from_directory(directory = train_dir, # Target directory of images
target_size = (224,224), # Target size of images(height,width)
class_mode = "binary", # type of data you are working with
batch_size = 32) # size of mini-batches to load data into
test_data = test_datagen.flow_from_directory(directory = test_dir,
target_size = (224,224),
class_mode = "binary",
batch_size = 32)
images, labels = train_data.next() # get the "next" batch of images/labels in the batch
len(images), len(labels)
len(train_data) # Number of batches = total images / mini-batch size
# Get the first two images
images[:2], images[0].shape # we get the output as array of pixel values
labels # binary labels for the batch (cross-check with train_data.class_indices to see which class is which)
2. Create a CNN Model (start with a baseline)
A baseline is a relatively simple model or existing result that you set up when beginning a machine learning experiment. As you keep experimenting, you try to do better than your baseline.
Note: In deep learning, there is an almost infinite number of architectures you could create. So one of the best ways to get started is to begin with something simple, see if it works on your data, and then introduce complexity as required (e.g. look at which current model is performing best in the field for your problem).
Hyperparameter name | What does it do? | Typical values |
---|---|---|
Filters | Decides how many filters should pass over an input tensor (e.g. sliding windows over an image) | 10, 32, 64, 128 (higher values lead to more complex models) |
Kernel size (also called filter size) | Determines the shape of the filters (sliding windows) over the outputs | 3, 5, 7 (lower values learn smaller features, higher values learn larger features) |
Padding | Pads the target tensor with zeros (if "same") to preserve input shape, or leaves the target tensor as is (if "valid"), lowering the output shape | "same" or "valid" |
Strides | The number of steps a filter takes across an image at a time (e.g. if strides = 1, a filter moves across an image 1 pixel at a time) | 1 (default), 2 |
Resource: CNN-Explainer
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPool2D, Activation
from tensorflow.keras import Sequential
import tensorflow as tf
# Create the model
model_4 = Sequential([
Conv2D(filters = 10, # filters is the number of sliding windows going across an input
kernel_size = 3, # the size of the sliding window going across an input
strides =1, # the size of the step the sliding window takes across an input
padding = "valid", # If "same" padding output is the same shape as input but "valid" output shape gets compressed.
activation = "relu",
input_shape = (224,224,3)),
Conv2D(10,2, activation = "relu"),
Conv2D(10,3 , activation = "relu"),
Flatten(),
Dense(1, activation = "sigmoid") # Output layer(working with binary classification so only 1 output neuron)
])
# Compile the model
model_4.compile(loss = "binary_crossentropy",
optimizer = Adam(),
metrics = ["accuracy"])
model_4.summary()
len(train_data), len(test_data)
history_4 = model_4.fit(train_data,
epochs = 5,
steps_per_epoch = len(train_data),
validation_data = test_data,
validation_steps = len(test_data))
model_4.evaluate(test_data)
import pandas as pd
pd.DataFrame(history_4.history).plot(figsize = (10,7))
def plot_loss_curves(history):
"""
Returns separate loss curves for training and validation metrics
"""
loss = history.history["loss"]
val_loss = history.history["val_loss"]
accuracy = history.history["accuracy"]
val_accuracy = history.history["val_accuracy"]
epochs = range(len(history.history["loss"])) # how many epochs did we run for?
# Plot the loss
plt.plot(epochs, loss, label = "training_loss")
plt.plot(epochs, val_loss, label = "val_loss")
plt.title("loss")
plt.xlabel("epochs")
plt.legend()
# Plot the accuracy
plt.figure()
plt.plot(epochs, accuracy, label = "training_accuracy")
plt.plot(epochs, val_accuracy, label = "val_accuracy")
plt.title("accuracy")
plt.xlabel("epochs")
plt.legend()
Note: When a model's validation loss starts to increase, it's likely that the model is overfitting the training dataset. This means it's learning the patterns in the training dataset too well, so its ability to generalize to unseen data will be diminished.
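One way to act on this signal (a sketch, not part of the original notebook; tf.keras.callbacks.EarlyStopping is a standard Keras callback) is to stop training automatically once the validation loss stops improving:
# Stop training when val_loss hasn't improved for 2 epochs,
# restoring the weights from the best epoch seen so far
early_stopping = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                                  patience=2,
                                                  restore_best_weights=True)
# Example usage: model_4.fit(train_data, ..., callbacks=[early_stopping])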
plot_loss_curves(history_4)
Note: Ideally the two loss curves (training and validation) will be very similar to each other (training loss and validation loss decreasing at similar rates), when there are large differences your model may be overfitting.
6. Adjust the model parameters
Fitting a machine learning model comes in 3 steps:
- Create a baseline
- Beat the baseline by overfitting a larger model.
- Reduce the overfitting
Ways to induce overfitting:
- Increase the number of conv layers
- Increase the number of conv filters
Reduce overfitting:
- Add data augmentation
- Add regularization (such as MaxPool2D)
- Add more data...
Note: Reducing overfitting is also known as regularization.
model_5 = Sequential([
Conv2D(10,3, activation = "relu", input_shape=(224,224,3)),
MaxPool2D(pool_size = 2),
Conv2D(10,3, activation = "relu"),
MaxPool2D(),
Conv2D(10,3, activation= "relu"),
MaxPool2D(),
Flatten(),
Dense(1, activation = "sigmoid")
])
model_5.compile(loss = "binary_crossentropy",
optimizer = Adam(),
metrics = ["accuracy"])
history_5 = model_5.fit(train_data,
epochs =5,
steps_per_epoch = len(train_data),
validation_data = test_data,
validation_steps = len(test_data))
model_5.summary()
plot_loss_curves(history_5)
train_datagen_augmented = ImageDataGenerator(rescale = 1/255.,
rotation_range= 0.2, # how much do you want to rotate an image?
shear_range= 0.2, # how much do you want to shear an image?
zoom_range= 0.2, # zoom in randomly on an image
width_shift_range = 0.2,
height_shift_range=0.3,
horizontal_flip= True) # flipping the image
# Create ImageDataGenerator without data augmentation
train_datagen = ImageDataGenerator(rescale = 1/255.)
# Create ImageDataGenerator without data augmentation for the test dataset
test_datagen = ImageDataGenerator(rescale =1/255.)
Question: What is data augmentation?
Data augmentation is the process of altering our training data, giving it more diversity and in turn allowing our models to learn more generalizable patterns. Altering means adjusting the rotation of an image, flipping it, cropping it, or something similar.
Improving a model (from a data perspective)
Method to improve a model(reduce overfitting) | What does it do? |
---|---|
More data | Gives a model more of a chance to learn patterns between samples (e.g. if a model is performing poorly on images of pizza, show it more images of pizza) |
Data augmentation | Increase the diversity of your training dataset without collecting more data(e.g. take your photos of pizza and randomly rotate them 30 deg.). Increased diversity forces a model to learn more generalizable patterns |
Better data | Not all data samples are created equally. Removing poor samples from or adding better samples to your dataset can improve your model's performance |
Use transfer learning | Take a model's pre-trained patterns from one problem and tweak them to suit your own problem. For example, take a model trained on pictures of cars to recognise pictures of trucks |
Let's write some code to visualize data augmentation.
print("Augmented training images:")
train_data_augmented = train_datagen_augmented.flow_from_directory(train_dir,
target_size=(224, 224),
batch_size=32,
class_mode='binary',
shuffle=False) # Don't shuffle for demonstration purposes, usually a good thing to shuffle
# Create non-augmented data batches
print("Non-augmented training images:")
train_data = train_datagen.flow_from_directory(train_dir,
target_size=(224, 224),
batch_size=32,
class_mode='binary',
shuffle=False) # Don't shuffle for demonstration purposes
print("Unchanged test images:")
test_data = test_datagen.flow_from_directory(test_dir,
target_size=(224, 224),
batch_size=32,
class_mode='binary')
Note: Data augmentation is usually performed only on the training data. Using ImageDataGenerator's built-in augmentation parameters, our images are left as they are in the directories but are modified as they're loaded into the model.
Let's visualize some augmented data!
images, labels = train_data.next()
augmented_images, augmented_labels = train_data_augmented.next() # Note: labels aren't augmented, they stay the same
random_number = random.randint(0, 31) # batches are of size 32, so valid indices are 0-31
print(f"showing image number: {random_number}")
plt.imshow(images[random_number])
plt.title(f"Original image")
plt.axis(False)
plt.figure()
plt.imshow(augmented_images[random_number])
plt.title(f"Augmented image")
plt.axis(False);
Let's build a model and see how it learns on augmented data.
model_6 = Sequential([
Conv2D(10,3, activation = "relu"),
MaxPool2D(pool_size = 2),
Conv2D(10,3 , activation = "relu"),
MaxPool2D(),
Conv2D(10,3, activation = "relu"),
MaxPool2D(),
Flatten(),
Dense(1, activation = "sigmoid")
])
# Compile the model
model_6.compile(loss = "binary_crossentropy",
optimizer = Adam(),
metrics = ["accuracy"])
# Fit the model
history_6 = model_6.fit(train_data_augmented, # fitting model_6 on augmented data
epochs = 5,
steps_per_epoch = len(train_data_augmented),
validation_data = test_data,
validation_steps = len(test_data))
plot_loss_curves(history_6)
Let's shuffle our augmented training data and train another model (the same as before).
# Import data and augment it from training directory
print("Augmented training images:")
train_data_augmented_shuffled = train_datagen_augmented.flow_from_directory(train_dir,
target_size=(224, 224),
batch_size=32,
class_mode='binary',
shuffle=True) # Shuffle this time (usually a good idea for training data)
model_7 = Sequential([
Conv2D(10, 3, activation='relu', input_shape=(224, 224, 3)), # same input shape as our images
Conv2D(10, 3, activation='relu'),
MaxPool2D(),
Conv2D(10, 3, activation='relu'),
Conv2D(10, 3, activation='relu'),
MaxPool2D(),
Flatten(),
Dense(1, activation='sigmoid')
])
# Compile the model
model_7.compile(loss="binary_crossentropy",
optimizer=tf.keras.optimizers.Adam(),
metrics=["accuracy"])
# Fit the model
history_7 = model_7.fit(train_data_augmented_shuffled,
epochs=5,
steps_per_epoch=len(train_data_augmented_shuffled),
validation_data=test_data,
validation_steps=len(test_data))
We have hit an accuracy of around 77%!!
plot_loss_curves(history_7)
Note: When shuffling training data, the model gets exposed to all different kinds of data during training, enabling it to learn features across a wide array of images (in our case, pizza & steak at the same time instead of just pizza then steak).
7. Repeat the steps to get the best possible result(experiment, experiment, experiment)
Since we've already come a long way from our baseline, there are a few things we can try to continue to improve the model:
- Increase the number of model layers (e.g. add more Conv2D/MaxPool2D layers)
- Increase the number of filters in each convolutional layer (e.g. from 10 to 32 or even 64)
- Train for longer (more epochs)
- Find an ideal learning rate (see the sketch after this list)
- Get more data (give the model more opportunities to learn)
- Use transfer learning to leverage what another image model has learned and adjust it for our own use case
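As an illustration of the learning-rate idea (a sketch, not from the original notebook; LearningRateScheduler is a standard Keras callback), you can sweep the learning rate over a short run and look for the rate where the loss drops fastest:
# Raise the learning rate a little each epoch (10x every 20 epochs),
# then plot loss vs. learning rate to pick the best value
lr_scheduler = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-4 * 10 ** (epoch / 20))
# Example usage: model_7.fit(train_data_augmented_shuffled, epochs=20,
#                            callbacks=[lr_scheduler], ...)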
print(class_names)
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-steak.jpeg
steak = mpimg.imread("03-steak.jpeg")
plt.imshow(steak)
plt.axis(False);
steak.shape
steak
Note: We need to preprocess our custom data before predicting with our model. It is important that our custom data is preprocessed into the same format as the data our model was trained on.
def load_and_prep_image(filename, img_shape=224):
"""
Reads an image from filename, turns it into a tensor and reshapes
it to (img_shape, img_shape, colour_channels)
"""
img = tf.io.read_file(filename)
# Decode the read file into tensor
img = tf.image.decode_image(img, channels =3)
# Resize the image
img = tf.image.resize(img, size = [img_shape, img_shape])
# Rescale the image and get all values between 0 and 1
img = img/255.
return img
steak = load_and_prep_image("03-steak.jpeg",img_shape = 224)
steak
print(f"Shape before new dimension: {steak.shape}")
steak = tf.expand_dims(steak, axis=0) # add an extra dimension at axis 0
#steak = steak[tf.newaxis, ...] # alternative to the above, '...' is short for 'every other dimension'
print(f"Shape after new dimension: {steak.shape}")
steak
pred = model_7.predict(steak)
pred
Looks like our custom image is being put through our model; however, it currently outputs a prediction probability.
class_names
pred_class = class_names[int(tf.round(pred))]
pred_class
def pred_and_plot(model,filename, class_names= class_names):
"""
Imports an image located at filename, makes a prediction with model,
and plots the image with the predicted class as the title
"""
# Import the target image and preprocess it
img = load_and_prep_image(filename)
# Make a prediction
pred = model.predict(tf.expand_dims(img,axis=0))
# Get the predicted class
pred_class = class_names[int(tf.round(pred))]
# Plot the image and predicted class
plt.imshow(img)
plt.title(f"Prediction: {pred_class}")
plt.axis(False);
pred_and_plot(model_7, "03-steak.jpeg")
Wow!! Our model works on custom image data. That is so cool!
!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-pizza-dad.jpeg
pred_and_plot(model_7, "03-pizza-dad.jpeg")
Let's test on another random image pulled from the internet
!wget https://raw.githubusercontent.com/sandeshkatakam/My-Machine_learning-Blog/master/images/pizza-internet.jpg
pred_and_plot(model_7, "pizza-internet.jpg")
Yayyy! Our model is very successful for images of pizza/steak from outside the dataset too.
model_7.save('pizza_steak_detect_model.h5')
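To sanity-check the saved model (a brief sketch; tf.keras.models.load_model is the standard Keras loading function, and your evaluation numbers will depend on the run), we can load it back and evaluate it:
# Load the saved HDF5 model and confirm it scores the same as model_7
loaded_model_7 = tf.keras.models.load_model("pizza_steak_detect_model.h5")
loaded_model_7.evaluate(test_data)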
Multi-class Image Classification
We've already seen binary classification; now we're going to level up with 10 classes of food.
- Become one with the data
- Preprocess the data (get it ready for a model)
- Create a model (start with a baseline model)
- Fit the model (overfit it to make sure it works)
- Evaluate the model
- Adjust different hyperparameters and improve the model (try to beat baseline/reduce overfitting)
- Repeat until satisfied
import zipfile
!wget https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_all_data.zip
# Unzip our data
zip_ref = zipfile.ZipFile("10_food_classes_all_data.zip","r")
zip_ref.extractall()
zip_ref.close()
import os
# walk through 10 classes of food image data
for dirpath, dirnames, filenames, in os.walk("10_food_classes_all_data"):
print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'.")
!ls -la 10_food_classes_all_data/
train_dir = "10_food_classes_all_data/train/"
test_dir = "10_food_classes_all_data/test/"
import pathlib
import numpy as np
data_dir = pathlib.Path(train_dir)
class_names = np.array(sorted([item.name for item in data_dir.glob('*')]))
print(class_names)
import random
img = view_random_image(target_dir = train_dir,
target_class = random.choice(class_names))
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Rescale
train_datagen = ImageDataGenerator(rescale=1/255.)
test_datagen = ImageDataGenerator(rescale = 1/255.)
# Load data in from directories and turn it into batches
train_data = train_datagen.flow_from_directory(train_dir,
target_size=(224,224),
batch_size= 32,
class_mode ="categorical")
test_data = test_datagen.flow_from_directory(test_dir,
target_size=(224,224),
batch_size= 32,
class_mode ="categorical")
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Activation
IMG_SIZE = (224,224) # Set as global variable for reuse
model_8 = Sequential([
Conv2D(10, 3, activation='relu', input_shape=(224, 224, 3)), # same input shape as our images
Conv2D(10, 3, activation='relu'),
MaxPool2D(),
Conv2D(10, 3, activation='relu'),
Conv2D(10, 3, activation='relu'),
MaxPool2D(),
Flatten(),
Dense(10, activation='softmax') # changed to have 10 output neurons and used softmax
])
# Compile the model
model_8.compile(loss="categorical_crossentropy",
optimizer=tf.keras.optimizers.Adam(),
metrics=["accuracy"])
history_8 = model_8.fit(train_data, #now 10 different classes
epochs=5,
steps_per_epoch=len(train_data),
validation_data=test_data,
validation_steps=len(test_data))
model_8.evaluate(test_data)
plot_loss_curves(history_8)
What do these loss curves tell us?
It seems our model is overfitting the training set badly. In other words, it's getting great results on the training data but fails to generalize to unseen data, so it performs poorly on the test dataset.
6. Adjust the Hyperparameters and Improve the Model (to beat baseline/reduce overfitting)
Due to its performance on the training data, it's clear our model is learning something.
However, it's not generalizing well to unseen data (overfitting).
So, let's try to fix the overfitting by:
- Getting more data - having more data gives a model more opportunity to learn diverse patterns
- Simplifying the model - if our current model is overfitting the data, it may be too complicated a model; one way to simplify it is to reduce the number of layers or the number of hidden units per layer (hidden units try to find more complex relationships in the training data, so too many of them can cause overfitting)
- Using data augmentation - data augmentation manipulates the training data in a way that adds more diversity to it (without altering the original data)
- Using transfer learning - transfer learning leverages the patterns another model has learned on a similar dataset and uses those patterns on your own dataset
# Let's try to remove 2 Conv layers
model_9 = Sequential([
Conv2D(10, 3, activation='relu', input_shape=(224, 224, 3)),
MaxPool2D(),
Conv2D(10, 3, activation='relu'),
MaxPool2D(),
Flatten(),
Dense(10, activation='softmax')
])
# Compile the model
model_9.compile(loss="categorical_crossentropy",
optimizer=tf.keras.optimizers.Adam(),
metrics=["accuracy"])
model_9.summary()
history_9 = model_9.fit(train_data,
epochs=5,
steps_per_epoch=len(train_data),
validation_data=test_data,
validation_steps=len(test_data))
plot_loss_curves(history_9)
Looks like our model-simplification experiment didn't work: the accuracy went down and the overfitting continued.
Next, we try data augmentation.
train_datagen_augmented = ImageDataGenerator(rescale = 1/255.,
rotation_range= 0.2, # how much do you want to rotate an image?
shear_range= 0.2, # how much do you want to shear an image?
zoom_range= 0.2, # zoom in randomly on an image
width_shift_range = 0.2,
height_shift_range=0.3,
horizontal_flip= True) # flipping the image
train_data_augmented = train_datagen_augmented.flow_from_directory(train_dir,
target_size =(224,224),
batch_size = 32,
class_mode = "categorical")
# Create ImageDataGenerator without data augmentation for the test dataset
test_datagen = ImageDataGenerator(rescale =1/255.)
test_data = test_datagen.flow_from_directory(test_dir,
target_size=(224,224),
batch_size = 32,
class_mode = "categorical")
model_10 = tf.keras.models.clone_model(model_9) # replicate the architecture (clone_model copies layers, not trained weights)
# Compile the model
model_10.compile(loss = "categorical_crossentropy",
optimizer = tf.keras.optimizers.Adam(),
metrics = ["accuracy"])
history_10 = model_10.fit(train_data_augmented,
epochs=5,
steps_per_epoch=len(train_data_augmented),
validation_data=test_data,
validation_steps= len(test_data))
model_8.evaluate(test_data)
model_10.evaluate(test_data)
plot_loss_curves(history_10)
Woah!! That result is much better: the loss curves are much closer to each other than the baseline model's, and they look like they're heading in the right direction. So, if we were to train for longer, we might see further improvements.
7. Repeat until satisfied
We can still try to bring our loss curves closer together and improve the validation/test accuracy by running lots of experiments:
- restructuring our model's architecture (increasing layers/hidden units)
- try different methods of data augmentation (adjust the hyperparameters in our ImageDataGenerator instance)
- training for longer (e.g. 10 epochs or more)
- try transfer learning (see the sketch below)
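As a sketch of the transfer learning option (not part of the original notebook; tf.keras.applications.EfficientNetB0 is just one possible pre-trained backbone), it could look roughly like this:
# Use a pre-trained backbone, freeze it, and add a new classification head
base_model = tf.keras.applications.EfficientNetB0(include_top=False)
base_model.trainable = False # freeze the pre-trained patterns

inputs = tf.keras.layers.Input(shape=(224, 224, 3))
x = base_model(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
transfer_model = tf.keras.Model(inputs, outputs)

transfer_model.compile(loss="categorical_crossentropy",
                       optimizer=tf.keras.optimizers.Adam(),
                       metrics=["accuracy"])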
class_names
# Download some custom images
!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-hamburger.jpeg
!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-pizza-dad.jpeg
!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-sushi.jpeg
!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-steak.jpeg
def pred_and_plot(model,filename, class_names= class_names):
"""
Imports an image located at filename, makes a prediction with model,
and plots the image with the predicted class as the title
"""
# Import the target image and preprocess it
img = load_and_prep_image(filename)
# Make a prediction
pred = model.predict(tf.expand_dims(img,axis=0))
# Add in logic for multi-class
if len(pred[0]) > 1:
pred_class = class_names[tf.argmax(pred[0])]
else:
pred_class = class_names[int(tf.round(pred[0]))]
# Plot the image and predicted class
plt.imshow(img)
plt.title(f"Prediction: {pred_class}")
plt.axis(False);
pred_and_plot(model = model_10,
filename = "03-pizza-dad.jpeg",
class_names = class_names)
This time our model got it right!
pred_and_plot(model = model_10,
filename = "03-sushi.jpeg",
class_names = class_names)
Looks like our model got this one wrong!
That's because it only achieved ~39% accuracy on the test data, so we can expect it to perform poorly on other unseen data.
model_10.save("save_trained_model_10")
loaded_model_10 = tf.keras.models.load_model("save_trained_model_10")
loaded_model_10.evaluate(test_data)