Neural Networks for Regression with TensorFlow
This notebook demonstrates neural networks for regression problems with TensorFlow.
- Neural Network Regression Model with TensorFlow
- Regression Inputs and outputs
- Introduction to Regression with Neural Networks in TensorFlow
- Input and Output shapes
- Steps in modelling with Tensorflow
- Improving our Model
- Evaluating our model
- The 3 sets ...
- Visualizing the data
- Visualizing our model's predictions
- Evaluating our model's predictions with regression evaluation metrics
- Running experiments to improve our model
- Comparing the results of our experiments
- Tracking your experiments:
- Saving our models
- A larger example
- Preprocessing data (normalization and standardization)
- External Resources:
- Bibliography:
This notebook is a continuation of the blog post TensorFlow Fundamentals. It is an account of my work through the TensorFlow tutorial by Daniel Bourke on YouTube.
The Notebook will cover the following concepts:
- Architecture of a neural network regression model.
- Input shapes and output shapes of a regression model (features and labels).
- Creating custom data to view and fit.
- Steps in modelling
- Creating a model, compiling a model, fitting a model, evaluating a model.
- Different evaluation methods.
- Saving and loading models.
Regression Problems: A regression problem is one where the output variable is a real or continuous value, such as “salary” or “weight”. Many different models can be used; the simplest is linear regression, which tries to fit the data with the best hyperplane passing through the points. Examples:
- How much will this house sell for?
- How many people will buy this app?
- How much will my health insurance be?
- How much should I save each week for fuel?
We can also use a regression model to predict where the bounding boxes should be in an object detection problem. Object detection thus involves both regression (locating the box) and classification (labelling the image inside the box).
Architecture of a regression model:
- Hyperparameters:
- Input layer shape: same as the number of features.
- Hidden layer(s): problem specific.
- Neurons per hidden layer: problem specific.
- Output layer shape: same as the desired prediction shape.
- Hidden activation: usually ReLU (rectified linear unit), sometimes sigmoid.
- Output activation: None, ReLU, logistic/tanh.
- Loss function: MSE (mean squared error), MAE (mean absolute error), or a combination of both.
- Optimizer: SGD (stochastic gradient descent) or Adam.
Source: Adapted from page 239 of Hands-On Machine learning with Scikit-Learn, Keras & TensorFlow
Example of creating a sample regression model in TensorFlow:
# 1. Create a model(specific to your problem)
model = tf.keras.Sequential([
tf.keras.Input(shape = (3,)),
tf.keras.layers.Dense(100, activation = "relu"),
tf.keras.layers.Dense(100, activation = "relu"),
tf.keras.layers.Dense(100, activation = "relu"),
tf.keras.layers.Dense(1, activation = None)
])
# 2. Compile the model
model.compile(loss = tf.keras.losses.mae, optimizer = tf.keras.optimizers.Adam(learning_rate = 0.0001), metrics = ["mae"])
# 3. Fit the model
model.fit(X_train, Y_train, epochs = 100)
import tensorflow as tf
print(tf.__version__)
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
style.use('dark_background')
# create features
X = np.array([-7.0,-4.0,-1.0,2.0,5.0,8.0,11.0,14.0])
# Create labels
y = np.array([3.0,6.0,9.0,12.0,15.0,18.0,21.0,24.0])
# Visualize it
plt.scatter(X,y)
y == X + 10
Yay, we found the relationship just by looking at the data. Since the dataset is small and the relationship is linear, it was easy to guess.
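The same relationship can also be recovered numerically rather than by eye. The sketch below is an illustration only: it uses NumPy's least-squares line fit (`np.polyfit`), which is not part of the original notebook, on the same `X` and `y` values.

```python
import numpy as np

X = np.array([-7.0, -4.0, -1.0, 2.0, 5.0, 8.0, 11.0, 14.0])
y = np.array([3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0, 24.0])

# Fit a straight line y = slope * X + intercept by least squares.
# np.polyfit returns coefficients from the highest degree down.
slope, intercept = np.polyfit(X, y, deg=1)
print(slope, intercept)
```

The fitted slope and intercept should come out very close to 1 and 10, matching `y == X + 10`.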
house_info = tf.constant(["bedroom","bathroom", "garage"])
house_price = tf.constant([939700])
house_info, house_price
X[0], y[0]
X[1], y[1]
input_shape = X[0].shape
output_shape = y[0].shape
input_shape, output_shape
X[0].ndim
We are specifically looking at scalars here. Scalars have 0 dimensions.
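A small NumPy check makes the "0 dimensions" point concrete. This is a sketch on a shortened copy of the feature array; the variable name `single_value` is made up for illustration.

```python
import numpy as np

X = np.array([-7.0, -4.0, -1.0, 2.0])

single_value = X[0]          # one feature value taken out of the array
print(single_value.ndim)     # number of dimensions of a scalar
print(single_value.shape)    # a scalar's shape is the empty tuple
print(X.ndim, X.shape)       # the full array is a 1-D vector
```

This is why the input and output shapes above print as empty: each example is a single number predicting a single number.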
X = tf.cast(tf.constant(X), dtype = tf.float32)
y = tf.cast(tf.constant(y), dtype = tf.float32)
X.shape, y.shape
input_shape = X[0].shape
output_shape = y[0].shape
input_shape, output_shape
plt.scatter(X,y)
Steps in modelling with Tensorflow
- Creating a model - define the input and output layers, as well as the hidden layers of a deep learning model.
- Compiling a model - define the loss function (how wrong the model's predictions are), the optimizer (tells the model how to improve the patterns it is learning) and evaluation metrics (what we can use to interpret the performance of the model).
- Fitting a model - letting the model try to find the patterns between X & y (features and labels).
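To get a feel for what these three steps do under the hood, here is a rough NumPy sketch of a single weight and bias trained with plain gradient descent on the MAE loss. It is an assumption-laden illustration, not how Keras is implemented: the learning rate of 0.01, the zero initialisation and the full-batch updates are all choices made for this example.

```python
import numpy as np

X = np.array([-7.0, -4.0, -1.0, 2.0, 5.0, 8.0, 11.0, 14.0])
y = X + 10.0

def mae_loss(w, b):
    """Mean absolute error of the line w * X + b against the labels."""
    return float(np.mean(np.abs(w * X + b - y)))

w, b = 0.0, 0.0            # "creating" the model: one weight and one bias
learning_rate = 0.01       # "compiling": pick a loss (MAE) and a step size
initial_loss = mae_loss(w, b)

for epoch in range(100):   # "fitting": repeat small corrective steps
    error_sign = np.sign(w * X + b - y)                   # subgradient of |error|
    w -= learning_rate * float(np.mean(error_sign * X))   # step for the weight
    b -= learning_rate * float(np.mean(error_sign))       # step for the bias

final_loss = mae_loss(w, b)
print(initial_loss, final_loss)
```

The loss goes down, but after 100 small steps the line is still some way from `y = X + 10`, which hints at why a freshly trained model can predict poorly.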
X,y
X.shape
tf.random.set_seed(42)
# Create a model using the Sequential API
model = tf.keras.Sequential([
tf.keras.layers.Dense(1)
])
# Compile the model
model.compile(loss=tf.keras.losses.mae, # mae is short for mean absolute error
optimizer=tf.keras.optimizers.SGD(), # SGD is short for stochastic gradient descent
metrics=["mae"])
# Fit the model
# model.fit(X, y, epochs=5) # this will break with TensorFlow 2.7.0+
model.fit(tf.expand_dims(X, axis=-1), y, epochs=5)
X, y
y_pred = model.predict([17.0])
y_pred
The output is very far off from the actual value, so our model is not working well yet. Let's improve it in the next section.
Improving our Model
Let's take another look at the three steps we followed when we created the model above.
We can improve the model by altering each of these steps.
- Creating a model - here we might add more layers, increase the number of hidden units (also called neurons) within each of the hidden layers, or change the activation function of each layer.
- Compiling a model - here we might change the optimization function or perhaps the learning rate of the optimization function.
- Fitting a model - here we might fit the model for more epochs (train it for longer) or on more data (give the model more examples to learn from).
# 1. Create the model
model = tf.keras.Sequential([
tf.keras.layers.Dense(1)
])
# 2. Compile the model
model.compile(loss = tf.keras.losses.mae,
optimizer = tf.keras.optimizers.SGD(),
metrics = ["mae"])
# 3. Fit the model to our dataset
model.fit(tf.expand_dims(X, axis=-1), y, epochs=100, verbose = 0)
X , y
model.predict([17.0])
We got much closer this time: the actual value is 27, and this is a better prediction than the one from the last model we trained. But there is still room for improvement. Let's see what else we can change and how close we can get to the actual output.
# 1. Create the model
model = tf.keras.Sequential([
tf.keras.layers.Dense(1)
])
# 2. Compile the model
model.compile(loss = tf.keras.losses.mae,
              optimizer = tf.keras.optimizers.Adam(learning_rate = 0.0001), # learning_rate replaces the deprecated lr argument
metrics = ["mae"])
# 3. Fit the model to our dataset
model.fit(tf.expand_dims(X, axis=-1), y, epochs=100, verbose = 0)
model.predict([17.0]) # we are going to predict for the same input value 17
Oh..god!! This result went really bad for us.
# 1. Create the model
model = tf.keras.Sequential([
tf.keras.layers.Dense(100, activation = "relu"), # only difference we made
tf.keras.layers.Dense(1)
])
# 2. Compile the model
model.compile(loss = "mae",
optimizer = tf.keras.optimizers.SGD(),
metrics = ["mae"])
# 3. Fit the model to our dataset
model.fit(tf.expand_dims(X, axis=-1), y, epochs=100, verbose = 0) # verbose will hide the output from epochs
X , y
model.predict([17.0])
This should be 27, but the prediction is further off than the previous one.
It seems that our previous model did better than this.
The loss values are much lower than those of our previous model, yet we are still far away from the label value.
Why is that so?
A likely explanation is that our model is overfitting the dataset. That means it is learning a function that fits the provided examples closely but does not generalise to new examples.
So the mae and loss values on the training data are not the ultimate metric for improving the model, because what we need is low error on new examples that the model has not seen before.
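The gap between training error and error on unseen data can be reproduced without a neural network. The sketch below swaps in polynomial fitting (NumPy's `Polynomial.fit`) as a stand-in for a small versus an over-flexible model; the alternating noise values are made up for illustration and are not from the notebook.

```python
import numpy as np
from numpy.polynomial import Polynomial

X = np.array([-7.0, -4.0, -1.0, 2.0, 5.0, 8.0, 11.0, 14.0])
noise = np.array([0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5])  # made-up noise
y = X + 10.0 + noise

simple = Polynomial.fit(X, y, deg=1)     # a straight line
flexible = Polynomial.fit(X, y, deg=7)   # enough capacity to hit every point

def mae(model, inputs, targets):
    """Mean absolute error of a fitted polynomial on the given data."""
    return float(np.mean(np.abs(model(inputs) - targets)))

# One unseen example: the noise-free value at X = 17 is 27.
x_new, y_new = np.array([17.0]), np.array([27.0])

train_simple, train_flexible = mae(simple, X, y), mae(flexible, X, y)
new_simple, new_flexible = mae(simple, x_new, y_new), mae(flexible, x_new, y_new)
print("training MAE:", train_simple, train_flexible)
print("unseen MAE:  ", new_simple, new_flexible)
```

The flexible model wins on the training points and loses badly on the unseen one, which is the pattern described above.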
# 1. Create the model
model = tf.keras.Sequential([
tf.keras.layers.Dense(100, activation = "relu"), # only difference we made
tf.keras.layers.Dense(1)
])
# 2. Compile the model
model.compile(loss = "mae",
optimizer = tf.keras.optimizers.Adam(),
metrics = ["mae"])
# 3. Fit the model to our dataset
model.fit(tf.expand_dims(X, axis=-1), y, epochs=100, verbose = 0)# verbose will hide the epochs output
model.predict([17.0])
Still not better!!
# 1. Create the model
model = tf.keras.Sequential([
tf.keras.layers.Dense(100, activation = "relu"),
tf.keras.layers.Dense(100, activation = "relu"),
tf.keras.layers.Dense(100, activation = "relu"),# only difference we made
tf.keras.layers.Dense(1)
])
# default value of lr is 0.001
# 2. Compile the model
model.compile(loss = "mae",
              optimizer = tf.keras.optimizers.Adam(learning_rate = 0.01), # learning_rate replaces the deprecated lr argument
metrics = ["mae"])
# 3. Fit the model to our dataset
model.fit(tf.expand_dims(X, axis=-1), y, epochs=100, verbose = 0) # verbose will hide the epochs output
The learning rate is often the most important hyperparameter for a neural network.
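A toy problem shows why the learning rate matters so much. The following sketch runs plain gradient descent on the made-up function f(w) = (w - 3)^2 with three learning rates; the function, the step count and the rates are all assumptions chosen for illustration.

```python
def minimise(learning_rate, steps=50, start=0.0):
    """Gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)     # derivative of (w - 3)^2
        w -= learning_rate * gradient
    return w

too_small = minimise(0.01)    # moves in the right direction, but slowly
reasonable = minimise(0.1)    # settles very close to 3
too_large = minimise(1.1)     # overshoots more on every step and diverges

print(too_small, reasonable, too_large)
```

With the same number of steps, only the middle setting ends up near the minimum, which is why the learning rate is usually one of the first things worth tuning.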
Evaluating our model
In practice, a typical workflow you'll go through when building a neural network is:
Build a model -> fit it -> evaluate it -> tweak a model -> fit it -> evaluate it -> tweak it -> fit it
Common ways to improve a deep model:
- Adding Layers
- Increase the number of hidden units
- Change the activation functions
- Change the optimization function
- Change the learning rate
- Fitting on more data
- Train for longer (more epochs)
Because we can alter each of these, they are called hyperparameters.
When it comes to evaluation, there are 3 words you should memorize:
"Visualize, Visualize, Visualize"
It's a good idea to visualize:
- The data - what data are we working with? What does it look like?
- The model itself - what does our model look like?
- The training of a model - how does a model perform while it learns?
- The predictions of the model - how do the predictions of the model line up against the labels (original values)?
X_large = tf.range(-100,100,4)
X_large
y_large = X_large + 10
y_large
import matplotlib.pyplot as plt
plt.scatter(X_large,y_large)
The 3 sets ...
- Training set - The model learns from this data, which is typically 70-80% of the total data you have available.
- Validation set - The model gets tuned on this data, which is typically 10-15% of the data available.
- Test set - The model gets evaluated on this data to test what it has learned. This set is typically 10-15% of the data available.
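A three-way split along these lines can be sketched with NumPy. This is one possible approach, not the notebook's method: the 70/15/15 proportions, the shuffling and the seed are assumptions for the example (the notebook itself takes the first 40 values for training and the rest for testing).

```python
import numpy as np

X_all = np.arange(-100, 100, 4)      # 50 samples, same range as X_large
y_all = X_all + 10

# Shuffle the indices so each set covers the whole range of the data.
rng = np.random.default_rng(42)
order = rng.permutation(len(X_all))

n_train = len(X_all) * 70 // 100     # 70% for training
n_val = len(X_all) * 15 // 100       # 15% for validation, the rest for testing

train_idx = order[:n_train]
val_idx = order[n_train:n_train + n_val]
test_idx = order[n_train + n_val:]

X_train, y_train = X_all[train_idx], y_all[train_idx]
X_val, y_val = X_all[val_idx], y_all[val_idx]
X_test, y_test = X_all[test_idx], y_all[test_idx]

print(len(X_train), len(X_val), len(X_test))
```

Every sample lands in exactly one set, so the test set stays unseen during training and tuning.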
len(X_large)
# since the dataset is small we can skip the validation set
X_train = X_large[:40]
X_test = X_large[40:]
y_train = y_large[:40]
y_test = y_large[40:]
len(X_train), len(X_test), len(y_train), len(y_test)
plt.figure(figsize = (10,7))
# Plot the training data in blue
plt.scatter(X_train, y_train, c= 'b', label = "Training data")
# Plot the test data in green
plt.scatter(X_test, y_test, c = "g", label = "Testing data")
plt.legend();
# 1. Create the model
model = tf.keras.Sequential([
tf.keras.layers.Dense(1)
])
# default value of lr is 0.001
# 2. Compile the model
model.compile(loss = "mae",
optimizer = tf.keras.optimizers.SGD(), # lr stands for learning rate
metrics = ["mae"])
# 3. Fit the model to our dataset
#model.fit(tf.expand_dims(X_train, axis=-1), y_train, epochs=100)
Let's visualize it before fitting the model
model.summary()
model.summary() doesn't work until the model has been built, either by specifying the input shape or by fitting the model.
X[0], y[0]
tf.random.set_seed(42)
# Create a model(same as above)
model = tf.keras.Sequential([
tf.keras.layers.Dense(1, input_shape = [1]) # input_shape is 1 refer above code cell
])
# Compile the model
model.compile(loss= "mae",
optimizer = tf.keras.optimizers.SGD(),
metrics = ["mae"])
model.summary()
- Total params - total number of parameters in the model.
- Trainable parameters - these are the parameters (patterns) the model can update as it trains.
- Non-trainable parameters - these parameters aren't updated during training (this is typical when you bring in parameters from other models during transfer learning).
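The parameter counts in a summary can be checked by hand: a fully connected (Dense) layer has one weight per input-unit pair plus one bias per unit. The helper name `dense_params` below is made up for this sketch.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Dense(1) on a single input feature: 1 weight + 1 bias.
single_layer = dense_params(1, 1)

# Dense(10) on a single input feature, followed by Dense(1).
two_layer = dense_params(1, 10) + dense_params(10, 1)

print(single_layer, two_layer)
```

These totals should match the "Total params" line reported for models of those shapes.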
# 1. Create the model
model = tf.keras.Sequential([
tf.keras.layers.Dense(10, input_shape = [1], name= "input_layer"),
tf.keras.layers.Dense(1, name = "output_layer")
], name = "model_1")
# 2. Compile the model
model.compile(loss = "mae",
optimizer = tf.keras.optimizers.SGD(), # lr stands for learning rate
metrics = ["mae"])
model.summary()
We have changed the layer names and added our custom model name.
from tensorflow.keras.utils import plot_model
plot_model(model = model, to_file = 'model1.png', show_shapes = True)
# 1. Create the model
model = tf.keras.Sequential([
tf.keras.layers.Dense(100, activation = "relu"),
tf.keras.layers.Dense(100, activation = "relu"),
tf.keras.layers.Dense(100, activation = "relu"),# only difference we made
tf.keras.layers.Dense(1)
])
# default value of lr is 0.001
# 2. Compile the model
model.compile(loss = "mae",
              optimizer = tf.keras.optimizers.Adam(learning_rate = 0.01), # learning_rate replaces the deprecated lr argument
metrics = ["mae"])
# 3. Fit the model to our dataset
model.fit(tf.expand_dims(X_train, axis=-1), y_train, epochs=100, verbose = 0)
model.predict(X_test)
wow, we are so close!!!
model.summary()
from tensorflow.keras.utils import plot_model
plot_model(model = model, to_file = 'model.png', show_shapes = True)
tf.random.set_seed(42)
# Create a model (same as above)
model = tf.keras.Sequential([
tf.keras.layers.Dense(10, input_shape = [1], name = "input_layer"),
tf.keras.layers.Dense(1, name = "output_layer") # define the input_shape to our model
], name = "revised_model_1")
# Compile model (same as above)
model.compile(loss=tf.keras.losses.mae,
optimizer=tf.keras.optimizers.SGD(),
metrics=["mae"])
model.summary()
model.fit(X_train, y_train, epochs=100, verbose=0)
model.summary()
y_pred = model.predict(X_test)
tf.constant(y_pred)
These are our predictions!
y_test
These are the ground truth labels!
plot_model(model, show_shapes=True)
Note: If you feel like you're going to reuse some functionality in the future, it's a good idea to define a function so that you can reuse it whenever you need it.
def plot_predictions(train_data= X_train,
train_labels = y_train,
test_data = X_test,
test_labels =y_test,
predictions = y_pred):
"""
Plots training data, test data and compares predictions to ground truth labels
"""
plt.figure(figsize = (10,7))
# Plot training data in blue
plt.scatter(train_data, train_labels, c= "b", label = "Training data")
# Plot testing data in green
plt.scatter(test_data, test_labels, c= "g", label = "Testing data")
# Plot model's predictions in red
plt.scatter(test_data, predictions, c= "r", label = "Predictions")
# Show legends
plt.legend();
plot_predictions(train_data=X_train,
train_labels=y_train,
test_data=X_test,
test_labels=y_test,
predictions=y_pred)
We tuned our model very well this time. The predictions are really close to the actual values.
Evaluating our model's predictions with regression evaluation metrics
Depending on the problem you're working on, there will be different evaluation metrics to evaluate your model's performance.
Since we're working on a regression problem, three of the main metrics are:
- MAE - mean absolute error, "on average, how wrong is each of my model's predictions?"
  - TensorFlow code: tf.keras.losses.MAE() or tf.metrics.mean_absolute_error()
$$ MAE = \frac{1}{n} \sum_{i=1}^{n} |Y_i - \hat{Y_i}| $$
- MSE - mean squared error, "the average of the squared errors"
  - TensorFlow code: tf.keras.losses.MSE() or tf.metrics.mean_squared_error()
$$ MSE = \frac{1}{n} \sum_{i=1}^{n}(Y_i - \hat{Y_i})^2$$
$\hat{Y_i}$ is the prediction our model makes. $Y_i$ is the label value.
- Huber - a combination of MSE and MAE, less sensitive to outliers than MSE.
  - TensorFlow code: tf.keras.losses.Huber()
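The three metrics are short enough to write out by hand, which helps when checking what the TensorFlow functions return. The NumPy versions below are a sketch; the Huber threshold `delta=1.0` and the three sample values are assumptions chosen for this example.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean of the absolute errors."""
    return float(np.mean(np.abs(y_true - y_pred)))

def mse(y_true, y_pred):
    """Mean of the squared errors."""
    return float(np.mean((y_true - y_pred) ** 2))

def huber(y_true, y_pred, delta=1.0):
    """Quadratic for small errors, linear for errors larger than delta."""
    error = np.abs(y_true - y_pred)
    quadratic = 0.5 * error ** 2
    linear = delta * (error - 0.5 * delta)
    return float(np.mean(np.where(error <= delta, quadratic, linear)))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 6.0])    # only the last prediction is off, by 3

print(mae(y_true, y_pred), mse(y_true, y_pred), huber(y_true, y_pred))
```

Here the single error of 3 gives an MAE of 1, an MSE of 3 and a Huber value in between the two behaviours, since the large error is treated linearly.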
model.evaluate(X_test, y_test)
mae = tf.metrics.mean_absolute_error(y_true = y_test,
y_pred = tf.constant(y_pred))
mae
The metric values look wrong. Why did this happen?
tf.constant(y_pred)
y_test
Notice that the shape of y_pred is (10, 1) and the shape of y_test is (10,).
They might seem the same, but they are not the same shape.
Let's reshape the tensor to make the shapes equal.
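The effect of the mismatch is easiest to see with broadcasting. The NumPy sketch below uses made-up values (every prediction is off by exactly 1) to show how a (10, 1) array combined with a (10,) array silently becomes a (10, 10) grid, and how squeezing fixes it.

```python
import numpy as np

y_true = np.arange(10, dtype=np.float32)         # shape (10,)
y_pred = y_true.reshape(10, 1) + 1.0             # shape (10, 1), each off by 1

# Broadcasting compares every label with every prediction.
mismatched = np.abs(y_true - y_pred)
print(mismatched.shape)

# Dropping the extra dimension lines each prediction up with its own label.
aligned = np.abs(y_true - np.squeeze(y_pred))
print(aligned.shape, aligned.mean())
```

Only the aligned version gives the expected average error of 1; the mismatched one averages over pairs that were never meant to be compared.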
tf.squeeze(y_pred)
mae = tf.metrics.mean_absolute_error(y_true = y_test,
y_pred = tf.squeeze(y_pred))
mae
Now we get a single metric value. The mean absolute error of our model is 3.1969407.
Now, let's calculate the mean squared error and see how that goes.
mse = tf.metrics.mean_squared_error(y_true = y_test,
y_pred = tf.squeeze(y_pred))
mse
Our mean squared error is 13.070143. Remember, the mean squared error squares the error for every example in the test set and averages the values. So the mse is generally larger than the mae whenever the errors are bigger than 1.
When larger errors are more significant than smaller errors, it is best to use mse.
MAE can be used as a great starter metric for any regression problem.
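A tiny comparison shows how the two metrics treat large errors differently. The two error lists below are made up so that both have the same MAE.

```python
def mae(errors):
    """Mean of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)

def mse(errors):
    """Mean of the squared errors."""
    return sum(e * e for e in errors) / len(errors)

even_errors = [2.0, 2.0, 2.0, 2.0]      # every prediction is off by 2
one_big_error = [0.0, 0.0, 0.0, 8.0]    # three perfect predictions, one off by 8

print(mae(even_errors), mae(one_big_error))
print(mse(even_errors), mse(one_big_error))
```

MAE cannot tell the two cases apart, while MSE is four times larger for the case with the single big miss.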
We can also try Huber and see how that goes.
huber_metric = tf.losses.huber(y_true = y_test,
y_pred = tf.squeeze(y_pred))
huber_metric
def mae(y_true, y_pred):
    # Use the arguments passed in rather than the global y_test
    return tf.metrics.mean_absolute_error(y_true = y_true,
                                          y_pred = tf.squeeze(y_pred))
def mse(y_true, y_pred):
    return tf.metrics.mean_squared_error(y_true = y_true,
                                         y_pred = tf.squeeze(y_pred))
def huber(y_true, y_pred):
    return tf.losses.huber(y_true = y_true,
                           y_pred = tf.squeeze(y_pred))
Running experiments to improve our model
Build a model -> fit it -> evaluate it -> tweak a model -> fit it -> evaluate it -> tweak it -> fit it
- Get more data - get more examples for your model to train on (more opportunities to learn patterns or relationships between features and labels).
- Make your model larger (using a more complex model) - this might come in the form of more layers or more hidden units in each layer.
- Train for longer - give your model more of a chance to find patterns in the data.
Let's do a few modelling experiments:
- model_1 - same as original model, 1 layer, trained for 100 epochs.
- model_2 - 2 layers, trained for 100 epochs.
- model_3 - 2 layers, trained for 500 epochs.
You can design more experiments too, to improve the model further.
Build Model_1
X_train, y_train
tf.random.set_seed(42)
# 1. Create the model
model_1 = tf.keras.Sequential([
tf.keras.layers.Dense(1, input_shape = [1])
], name = "Model_1")
# 2. Compile the model
model_1.compile(loss = tf.keras.losses.mae,
optimizer = tf.keras.optimizers.SGD(),
metrics = ["mae"])
# 3. Fit the model
model_1.fit(X_train, y_train ,epochs = 100, verbose = 0)
model_1.summary()
y_preds_1 = model_1.predict(X_test)
plot_predictions(predictions = y_preds_1)
mae_1 = mae(y_test, y_preds_1)
mse_1 = mse(y_test, y_preds_1)
mae_1, mse_1
Build Model_2
- 2 dense layers, trained for 100 epochs
tf.random.set_seed(42)
# 1. Create the model
model_2 = tf.keras.Sequential([
tf.keras.layers.Dense(10, input_shape =[1]),
tf.keras.layers.Dense(1)
], name = "model_2")
# 2. Compile the model
model_2.compile(loss = tf.keras.losses.mae,
optimizer = tf.keras.optimizers.SGD(),
metrics = ["mse"]) # Let's build this model with mse as eval metric.
# 3. Fit the model
model_2.fit(X_train, y_train ,epochs = 100, verbose = 0)
model_2.summary()
y_preds_2 = model_2.predict(X_test)
plot_predictions(predictions = y_preds_2)
We improved this model a lot over the previous one. To compare, scroll up to the plot_predictions output of the previous model and compare it with this one.
mae_2 = mae(y_test, y_preds_2)
mse_2 = mse(y_test, y_preds_2)
mae_2, mse_2
Build Model_3
- 2 layers, trained for 500 epochs
tf.random.set_seed(42)
# 1. Create the model
model_3 = tf.keras.Sequential([
tf.keras.layers.Dense(10, input_shape =[1]),
tf.keras.layers.Dense(1)
], name = "model_3")
# 2. Compile the model
model_3.compile(loss = tf.keras.losses.mae,
optimizer = tf.keras.optimizers.SGD(),
                metrics = ["mae"])
# 3. Fit the model (model_3, not model_2)
model_3.fit(X_train, y_train ,epochs = 500, verbose = 0)
y_preds_3 = model_3.predict(X_test)
plot_predictions(predictions = y_preds_3)
This performs even worse than the first model. We have actually made the model worse. Why?
One likely explanation is that we overfitted the model by training it for much longer than we should have.
mae_3 = mae(y_test, y_preds_3)
mse_3 = mse(y_test, y_preds_3)
mae_3, mse_3
Whoa, the error is extremely high. I think the best of our models is model_2.
The Machine Learning practitioner's motto:
Experiment, experiment, experiment
Note: You want to start with small experiments (small models), make sure they work, and then increase their scale when necessary.
import pandas as pd
model_results = [["model_1", mae_1.numpy(), mse_1.numpy()],
["model_2", mae_2.numpy(), mse_2.numpy()],
["model_3", mae_3.numpy(), mse_3.numpy()]]
all_results = pd.DataFrame(model_results, columns =["model", "mae", "mse"])
all_results
It looks like model_2 performed the best. Let's look at model_2 again.
model_2.summary()
This is the model that has done the best on our dataset.
Note: One of your main goals should be to minimize the time between your experiments. The more experiments you do, the more things you will figure out which don't work and, in turn, the closer you get to figuring out what does work. Remember the machine learning practitioner's motto: "experiment, experiment, experiment".
Tracking your experiments:
One really good habit of machine learning modelling is to track the results of your experiments.
And when doing so, it can be tedious if you are running lots of experiments.
Luckily, there are tools to help us!
Resources: As you build more models, you'll want to look into using:
- TensorBoard - a component of the TensorFlow library that helps track modelling experiments.
- Weights & Biases - a tool for tracking all kinds of machine learning experiments (it plugs straight into TensorBoard).
model.save() allows us to save the model so that we can reload it later and keep working with it.
model_2.save("best_model_SavedModel_format")
If we are planning to use this model inside the TensorFlow framework, we are better off using the SavedModel format. But if we are planning to export the model and use it outside the TensorFlow framework, use the HDF5 format.
model_2.save("best_model_HDF5_format.h5")
Saving a model with SavedModel format will give us a folder with some files regarding our model.
Saving a model with HDF5 format will give us just one file with our model.
loaded_SavedModel_format = tf.keras.models.load_model("/content/best_model_SavedModel_format")
loaded_SavedModel_format.summary()
model_2.summary()
model_2_preds = model_2.predict(X_test)
loaded_SavedModel_format_preds = loaded_SavedModel_format.predict(X_test)
model_2_preds == loaded_SavedModel_format_preds
mae(y_true = y_test, y_pred = model_2_preds) == mae(y_true = y_test, y_pred = loaded_SavedModel_format_preds)
loaded_h5_model = tf.keras.models.load_model("/content/best_model_HDF5_format.h5")
loaded_h5_model.summary()
model_2.summary()
The loaded .h5 model matches our original model_2.
So, our model loading worked correctly.
model_2_preds = model_2.predict(X_test)
loaded_h5_model_preds = loaded_h5_model.predict(X_test)
model_2_preds == loaded_h5_model_preds
Download a model (or any other file) from Google Colab
If you want to download your files from Google Colab:
- You can go to the files tab, right click on the file you're after and click download.
- Use code (see the cell below).
- You can save it to Google Drive by connecting to Google Drive and copying it there.
from google.colab import files
files.download("/content/best_model_HDF5_format.h5")
!cp /content/best_model_HDF5_format.h5 /content/drive/MyDrive/tensor-flow-deep-learning
!ls /content/drive/MyDrive/tensor-flow-deep-learning
We have saved our model to our google drive !!!
A larger example
We now take a larger dataset to create a regression model. The task is insurance cost forecasting, using the Medical Cost Personal Datasets available from Kaggle.
import tensorflow as tf
import pandas as pd
import matplotlib.pyplot as plt
insurance = pd.read_csv("https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/insurance.csv")
insurance
This is quite a bit bigger than the dataset we worked with previously.
insurance_one_hot = pd.get_dummies(insurance)
insurance_one_hot.head()
X = insurance_one_hot.drop("charges", axis =1)
y = insurance_one_hot["charges"]
X.head()
y.head()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,test_size = 0.2, random_state = 42)
len(X), len(X_train), len(X_test)
X_train
insurance["smoker"] , insurance["sex"]
tf.random.set_seed(42)
# 1. Create a model
insurance_model = tf.keras.Sequential([
tf.keras.layers.Dense(10),
tf.keras.layers.Dense(1)
])
# 2. Compile the model
insurance_model.compile(loss = tf.keras.losses.mae,
optimizer = tf.keras.optimizers.SGD(),
metrics = ["mae"])
#3. Fit the model
insurance_model.fit(X_train, y_train,epochs = 100, verbose = 0)
insurance_model.evaluate(X_test,y_test)
y_train.median(), y_train.mean()
Right now it looks like our model is not performing well, so let's try and improve it.
To try and improve our model, we'll run 2 experiments:
- Add an extra layer with more hidden units and use the Adam optimizer.
- Same as above, but train for longer (200 epochs).
We can also run our own custom experiments to improve it.
tf.random.set_seed(42)
# 1. Create the model
insurance_model_2 = tf.keras.Sequential([
tf.keras.layers.Dense(100),
tf.keras.layers.Dense(10),
tf.keras.layers.Dense(1)
],name = "insurance_model_2")
# 2. Compile the model
insurance_model_2.compile(loss = tf.keras.losses.mae,
optimizer = tf.keras.optimizers.Adam(),
metrics = ["mae"])
# 3. Fit the model
insurance_model_2.fit(X_train, y_train, epochs = 100, verbose = 0)
insurance_model_2.evaluate(X_test, y_test)
tf.random.set_seed(42)
# 1. Create the model
insurance_model_3 = tf.keras.Sequential([
tf.keras.layers.Dense(100),
tf.keras.layers.Dense(10),
tf.keras.layers.Dense(1)
],name = "insurance_model_3")
# 2. Compile the model
insurance_model_3.compile(loss = tf.keras.losses.mae,
optimizer = tf.keras.optimizers.Adam(),
metrics = ["mae"])
# 3. Fit the model
history = insurance_model_3.fit(X_train, y_train, epochs = 200, verbose = 0)
insurance_model_3.evaluate(X_test, y_test)
pd.DataFrame(history.history).plot()
plt.ylabel("loss")
plt.xlabel("epochs")
plt.title("Training curve of our model")
Question: How long should you train for?
It depends on the problem you are working on. However, many people have asked this question before, so TensorFlow has a solution: the EarlyStopping callback, a TensorFlow component you can add to your model to stop training once it stops improving a certain metric.
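The idea behind early stopping can be sketched in a few lines of plain Python. This is only an illustration of the "patience" logic, not the Keras implementation: the function name `stopping_epoch` and the loss values are made up. In Keras itself the callback is `tf.keras.callbacks.EarlyStopping`, which takes arguments such as `monitor` and `patience` and is passed to `model.fit` through `callbacks`.

```python
def stopping_epoch(losses, patience):
    """Return the epoch index at which training would stop.

    Training stops once the loss has failed to improve on its best
    value for `patience` epochs in a row.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(losses) - 1   # never triggered: ran for every epoch

# Made-up loss history: improves for three epochs, then stalls.
losses = [5.0, 4.0, 3.0, 3.1, 3.2, 3.3, 2.0]
stopped_at = stopping_epoch(losses, patience=2)
print(stopped_at)
```

A larger patience would have let this run reach the later improvement, so patience is a trade-off between saving time and stopping too soon.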
Short review of our modelling steps in TensorFlow:
- Get data ready(turn into tensors)
- Build or pick a pretrained model (to suit your problem)
- Fit the model to the data and make a prediction.
- Evaluate the model.
- Improve through experimentation.
- Save and reload your trained models.
We are going to focus on step 1 to make our dataset better suited for training. Some steps involved in getting data ready:
- Turn all data into numbers (neural networks can't handle strings).
- Make sure all of your tensors are the right shape.
- Scale features (normalize or standardize; neural networks tend to prefer normalization) -- this is the one thing we haven't done while preparing our data.
If you are not sure which to use for scaling, you could try both and see which performs better.
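The first step in the list above, turning strings into numbers, is what one-hot encoding does. Here is a minimal pure-Python sketch of the idea; the helper name `one_hot` and the sample values are made up, and in practice pd.get_dummies or Scikit-Learn's OneHotEncoder does this for you.

```python
def one_hot(values):
    """Encode each string as a row with a single 1 in its category's column."""
    categories = sorted(set(values))
    encoded = [[1 if value == category else 0 for category in categories]
               for value in values]
    return categories, encoded

# A made-up column similar in spirit to the "smoker" column.
categories, encoded = one_hot(["yes", "no", "no", "yes"])
print(categories)
print(encoded)
```

Each category becomes its own column, so the model sees numbers without any artificial ordering between the categories.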
import tensorflow as tf
import pandas as pd
import matplotlib.pyplot as plt
insurance = pd.read_csv("https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/insurance.csv")
insurance
To prepare our data, we can borrow a few classes from Scikit-Learn.
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
Feature Scaling:
Scaling type | What it does | Scikit-Learn function | When to use |
---|---|---|---|
Scale (also referred to as normalization) | Converts all values to between 0 and 1 whilst preserving the original distribution | MinMaxScaler | Use as default scaler with neural networks |
Standardization | Removes the mean and divides each value by the standard deviation | StandardScaler | Transform a feature to have zero mean and unit variance (most natural when the feature is close to normally distributed) |
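Both scaling types in the table reduce to one line of arithmetic each. The NumPy sketch below applies them to a handful of made-up age values; in a real workflow the minimum, maximum, mean and standard deviation should come from the training data only, as MinMaxScaler and StandardScaler do when fitted on the training set.

```python
import numpy as np

ages = np.array([19.0, 18.0, 28.0, 33.0, 32.0])   # made-up example values

# Normalization (what MinMaxScaler does): squeeze values into [0, 1].
min_max = (ages - ages.min()) / (ages.max() - ages.min())

# Standardization (what StandardScaler does): zero mean, unit standard deviation.
standardised = (ages - ages.mean()) / ages.std()

print(min_max)
print(standardised)
```

Normalized values keep their relative spacing between 0 and 1, while standardized values are centred on 0 and can be negative.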
ct = make_column_transformer(
(MinMaxScaler(), ["age", "bmi", "children"]), # Turn all values in these columns between 0 and 1
(OneHotEncoder(handle_unknown = "ignore"), ["sex", "smoker", "region"])
)
# Create our X and Y values
# because we reimported our dataframe
X = insurance.drop("charges", axis = 1)
y = insurance["charges"]
# Build our train and test set
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2, random_state = 42)
# Fit the column transformer to our training data (only training data)
ct.fit(X_train)
# Transform training and test data with normalization(MinMaxScaler) and OneHotEncoder
X_train_normal = ct.transform(X_train)
X_test_normal = ct.transform(X_test)
X_train.loc[0]
X_train_normal[0], X_train_normal[12], X_train_normal[78]
# we have turned all our data into numerical encoding and also normalized the data
X_train.shape, X_train_normal.shape
Beautiful! Our data has been normalized and one-hot encoded. Let's build a neural network on it and see how it goes.
tf.random.set_seed(42)
# 1. Create the model
insurance_model_4 = tf.keras.Sequential([
tf.keras.layers.Dense(100),
tf.keras.layers.Dense(10),
tf.keras.layers.Dense(1)
])
# 2. Compile the model
insurance_model_4.compile(loss = tf.keras.losses.mae,
optimizer = tf.keras.optimizers.Adam(),
metrics = ["mae"])
# 3. Fit the model
history = insurance_model_4.fit(X_train_normal, y_train, epochs= 100, verbose = 0)
insurance_model_4.evaluate(X_test_normal, y_test)
insurance_model_4.summary()
pd.DataFrame(history.history).plot()
plt.ylabel("loss")
plt.xlabel("epochs")
plt.title("Training curve of insurance_model_4")
Let's plot some graphs, since we have used them the least in this notebook.
X["age"].plot(kind = "hist")
X["bmi"].plot(kind = "hist")
X["children"].value_counts()