Time Series Forecasting with TensorFlow
This notebook demonstrates time series forecasting with TensorFlow through a project called BitPredict. BitPredict is a time-series forecasting model that predicts the future price of Bitcoin given its historical data.
- Examples of Time Series problems:
- Get data
- Importing the Time series data with Pandas
- Types of Time-series:
- Visualization of our Dataset:
- Importing time series data with Python's CSV Module
- Format Dataset: Part 1
- Create train & test sets for time series (correct way!)
- Create a plotting function
- Modelling experiments we will be running:
- Model 0: Naive model (Baseline)
- Evaluating a time series model
- Other kinds of time series forecasting models which can be used for baselines and actual forecasts
- Format Data Part 2: Windowing our dataset
- Turning windows into train and test sets
- Make a modelling checkpoint
Examples of Time Series problems:
All time series data has a temporal (time) component.
Classification:
Which of the points is an anomaly?
Are the heart beats regular?
Here, the output is discrete.
Forecasting:
How much will the price of Bitcoin change tomorrow?
How many computers will we sell next year?
How many staff do we need for next week?
Here, the output is continuous.
This blog post covers the following:
- Downloading and formatting time series data (the historical price of Bitcoin).
- Writing preprocessing functions to prepare our time series data.
- Setting up multiple time series modelling experiments.
- Building a multivariate model to take in multivariate time series data.
- Replicating the N-BEATS algorithm using TensorFlow.
- Making forecasts with prediction intervals.
- Demonstrating why time series forecasting can be fruitless with the turkey problem.
Get data
!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/extras/BTC_USD_2013-10-01_2021-05-18-CoinDesk.csv
Importing the Time series data with Pandas
import pandas as pd
# Let's read in our Bitcoin data and parse the dates
df = pd.read_csv("/content/BTC_USD_2013-10-01_2021-05-18-CoinDesk.csv",
                 parse_dates=["Date"], # parse the "Date" column as datetime
                 index_col=["Date"]) # and use the dates as the index
df.head()
df.tail()
df.info()
len(df)
We've collected the historical price of Bitcoin for the past ~8 years, but there are only 2,787 samples.
Deep learning models typically like lots and lots of samples (where "lots" can mean thousands to millions).
A smaller number of samples is something you'll often run into with time series data problems.
Note: The seasonality of a time series refers to the number of samples per year. Our Bitcoin data has a daily seasonality (a value of 365), because we collect one sample per day, giving 365 samples per year.
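As a quick sanity check (a minimal sketch, not in the original notebook), pandas can infer the sampling frequency straight from the index:
# Infer the sampling frequency from the DatetimeIndex ("D" means one sample per day)
pd.infer_freq(df.index)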
Types of Time-series:
Source: Forecasting: Principles and Practice
- Trend: the time series has a clear long-term increase or decrease (may or may not be linear)
- Seasonal: the time series is affected by seasonal factors such as time of year (e.g. increased sales towards the end of year) or day of week
- Cyclic: the time series shows rises and falls over an unfixed period; these tend to be longer/more variable than seasonal patterns
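To make these patterns concrete, here's a minimal sketch (not from the original notebook) that generates a toy series combining a trend and a yearly seasonal component with NumPy:
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(365 * 3) # three years of daily timesteps
trend = 0.05 * t # clear long-term increase
seasonal = 10 * np.sin(2 * np.pi * t / 365) # repeating yearly pattern
noise = np.random.normal(scale=2, size=len(t)) # random variation
plt.figure(figsize=(10, 7))
plt.plot(t, trend + seasonal + noise)
plt.xlabel("Timestep")
plt.ylabel("Value")
plt.title("Toy series with trend + seasonality");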
Other types of Time-series data:
Univariate Time-series data: Only one variable (using the price of Bitcoin to predict the price of Bitcoin)
Multivariate Time-series data: More than one variable (using the price of Bitcoin as well as the block reward size to predict the price of BTC)
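For instance, a multivariate version of our dataset might pair the price with a second feature. A minimal sketch (the block_reward value below is a placeholder for illustration, not real data):
# Hypothetical multivariate frame: price plus a placeholder block reward column
btc_multivariate = df[["Closing Price (USD)"]].rename(columns={"Closing Price (USD)": "Price"})
btc_multivariate["block_reward"] = 6.25 # placeholder: the real block reward changes over time (halvings)
btc_multivariate.head()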
bitcoin_prices = pd.DataFrame(df["Closing Price (USD)"]).rename(columns = {"Closing Price (USD)": "Price"})
bitcoin_prices.head()
Visualization of our Dataset:
import matplotlib.pyplot as plt
plt.style.use('dark_background')
bitcoin_prices.plot(figsize = (10,7))
# Labelling
plt.ylabel("BTC Price")
plt.title("Price of Bitcoin from 1 Oct 2013 to 10 May 2021", fontsize = 16)
plt.legend(fontsize = 14);
Importing time series data with Python's CSV Module
import csv
from datetime import datetime
timesteps = []
btc_price = []
with open("/content/BTC_USD_2013-10-01_2021-05-18-CoinDesk.csv", "r") as f:
  csv_reader = csv.reader(f, delimiter=",")
  next(csv_reader) # Skip the first line (this gets rid of the column titles)
  for line in csv_reader:
    timesteps.append(datetime.strptime(line[1], "%Y-%m-%d")) # Get the dates as dates (not strings)
    btc_price.append(float(line[2])) # Get the closing price as a float
# View first 10 of each
timesteps[:10], btc_price[:10]
import matplotlib.pyplot as plt
import numpy as np
plt.figure(figsize = (10,7))
plt.plot(timesteps, btc_price)
plt.ylabel("BTC Price")
plt.title("Price of Bitcoin from 1 Oct 2013 to 10 May 2021", fontsize = 16)
Format Dataset: Part 1
timesteps = bitcoin_prices.index.to_numpy()
prices = bitcoin_prices["Price"].to_numpy()
timesteps[:10], prices[:10]
from sklearn.model_selection import train_test_split
# Note: this is the WRONG way to split time series data - train_test_split shuffles
# samples randomly, so future values leak into the training set
X_train, X_test, y_train, y_test = train_test_split(timesteps, # Dates
                                                    prices, # Bitcoin prices
                                                    test_size=0.2,
                                                    random_state=42)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
plt.figure(figsize = (10,7))
plt.scatter(X_train, y_train, s = 5, label = "Train data")
plt.scatter(X_test,y_test, s =5, label ="Test data")
plt.xlabel("Date")
plt.ylabel("BTC Prices")
plt.legend(fontsize = 14)
plt.show();
Create train & test sets for time series (correct way!)
Time series train & test sets:
Source: ZTM TensorFlow course
split_size = int(0.8 * len(prices)) # 80% train, 20% test (you can change these values)
# Create train data splits (everything before the split)
X_train, y_train = timesteps[:split_size], prices[:split_size]
# Create test data splits (everything beyond the split)
X_test, y_test = timesteps[split_size:], prices[split_size:]
len(X_train), len(X_test), len(y_train), len(y_test)
plt.figure(figsize = (10,7))
plt.scatter(X_train, y_train, s = 5, label = "Train data")
plt.scatter(X_test,y_test, s =5, label ="Test data")
plt.xlabel("Date")
plt.ylabel("BTC Prices")
plt.legend(fontsize = 14)
plt.show();
Create a plotting function
def plot_time_series(timesteps, values, format=".", start=0, end=None, label=None):
  """
  Plots timesteps (a series of points in time) against values.

  Parameters
  ----------
  timesteps : array of timestep values
  values : array of values across time
  format : style of plot, default "."
  start : where to start the plot (setting a value will index from start of timesteps)
  end : where to end the plot (similar to start but for the end)
  label : label to show on plot about values, default None
  """
  # Plot the series
  plt.plot(timesteps[start:end], values[start:end], format, label=label)
  plt.xlabel("Time")
  plt.ylabel("BTC Price")
  if label:
    plt.legend(fontsize=14) # make label bigger
  plt.grid(True)
plt.figure(figsize =(10,7)) # we create one figure to plot both plots
plot_time_series(timesteps = X_train, values = y_train, label = "Train data")
plot_time_series(timesteps = X_test, values = y_test, label = "Test data")
Modelling experiments we will be running:
Terms to be familiar with:
- Horizon = number of timesteps into the future we're going to predict
- Window size = number of timesteps we're going to use to predict the horizon
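A quick toy illustration of these two terms (a sketch, not model code): with window_size = 7 and horizon = 1, the past seven values predict the next one:
toy_series = [1, 2, 3, 4, 5, 6, 7, 8]
window, horizon = toy_series[:7], toy_series[7:] # window_size = 7, horizon = 1
print(f"Window: {window} -> Horizon: {horizon}") # Window: [1, 2, 3, 4, 5, 6, 7] -> Horizon: [8]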
Experiment | Model |
---|---|
0 | Naive model (Baseline) |
1 | Dense model (horizon = 1, window_size = 7) |
2 | Same as model 1 (horizon = 1, window_size = 30) |
3 | Same as model 1 (horizon = 7, window_size = 30) |
4 | Conv1D |
5 | LSTM |
6 | Same as model 1 (but with multivariate data) |
7 | N-BEATS algorithm |
8 | Ensemble (multiple models stacked together) |
9 | Future prediction model |
10 | Same as model 1 (but with turkey data introduced) |
Model 0: Naive model (Baseline)
The naive forecast simply uses the previous timestep's value as the prediction for the next timestep (the prediction at time t equals the value at time t-1).
y_test[:10]
naive_forecast = y_test[:-1] # Naive forecast: predict the next value as the previous value
naive_forecast[:10], naive_forecast[-10:]
y_test[-10:]
plt.figure(figsize = (10,7))
plot_time_series(timesteps= X_train, values = y_train, label= "Train data")
plot_time_series(timesteps = X_test, values = y_test, label = "Test data")
plot_time_series(timesteps = X_test[1:], values = naive_forecast, format="-", label = "Naive forecast")
plt.figure(figsize = (10,7))
#plot_time_series(timesteps= X_train, values = y_train, label= "Train data")
plot_time_series(timesteps = X_test, values = y_test, format = "-", start = 350, label = "Test data")
plot_time_series(timesteps = X_test[1:], values = naive_forecast, format="-", start = 350, label = "Naive forecast")
Evaluating a time series model
Let's look into some evaluation metrics for time series forecasting.
We're predicting a number, which means we have a form of regression problem.
Because we're working on a regression problem, we'll need some regression-like metrics.
Common regression metrics:
- MAE - Mean absolute error
- MSE - Mean squared error
The main thing we're evaluating here is: how do our model's forecasts (y_pred) compare against the actual values (y_true)?
Scale-dependent errors
These are metrics which can be used to compare time series values and forecasts that are on the same scale.
For example, Bitcoin historical prices in USD versus Bitcoin forecast values in USD.
Metric | Details | Code |
---|---|---|
MAE (mean absolute error) | Easy to interpret (a forecast is X amount different from the actual amount). Forecast methods which minimise the MAE lead to forecasts of the median. | tf.keras.metrics.mean_absolute_error() |
RMSE (root mean square error) | Forecasts which minimise the RMSE lead to forecasts of the mean. | tf.sqrt(tf.keras.metrics.mean_squared_error()) |
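The median/mean behaviour above can be checked numerically. A minimal sketch: for a fixed set of values, the constant prediction that minimises MAE is the median, while the one minimising MSE (and therefore RMSE) is the mean:
import numpy as np
y = np.array([1., 2., 3., 4., 100.]) # skewed values: median = 3, mean = 22
candidates = np.linspace(0, 100, 10001) # candidate constant predictions
best_mae = candidates[np.argmin([np.mean(np.abs(y - c)) for c in candidates])]
best_mse = candidates[np.argmin([np.mean((y - c) ** 2) for c in candidates])]
print(best_mae) # ~3.0 (the median)
print(best_mse) # ~22.0 (the mean)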
Percentage errors
Percentage errors do not have units, this means they can be used to compare forecasts across different datasets.
Metric | Details | Code |
---|---|---|
MAPE (mean absolute percentage error) | Most commonly used percentage error. May explode (not work) if y = 0. | tf.keras.metrics.mean_absolute_percentage_error() |
sMAPE (symmetric mean absolute percentage error) | Recommended not to be used by Forecasting: Principles and Practice, though it is used in forecasting competitions. | Custom implementation |
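A minimal sketch of the y = 0 failure mode mentioned in the table (Keras clips the denominator to a tiny epsilon, so the result is a huge number rather than infinity):
import tensorflow as tf
y_true = tf.constant([0., 10., 20.]) # first true value is zero
y_pred = tf.constant([1., 11., 21.]) # every forecast is off by exactly 1
# MAE would be 1.0, but MAPE explodes because of the division by y_true = 0
print(tf.keras.metrics.mean_absolute_percentage_error(y_true, y_pred).numpy())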
Scaled errors
Scaled errors are an alternative to percentage errors when comparing forecast performance across different time series.
Metric | Details | Code |
---|---|---|
MASE (mean absolute scaled error) | MASE equals one for the naive forecast (or very close to one). A forecast which performs better than the naive should get < 1 MASE. | See sktime's mase_loss() |
import tensorflow as tf
def mean_absolute_scaled_error(y_true, y_pred):
  """
  Implement MASE (assuming no seasonality in data).
  """
  mae = tf.reduce_mean(tf.abs(y_true - y_pred))
  # Find the MAE of the naive forecast (with no seasonality)
  mae_naive_no_season = tf.reduce_mean(tf.abs(y_true[1:] - y_true[:-1]))
  return mae / mae_naive_no_season
mean_absolute_scaled_error(y_true = y_test[1:], y_pred = naive_forecast).numpy()
def evaluate_preds(y_true, y_pred):
  """
  Evaluates the predictions with different metrics and stores the results in a dictionary.
  """
  # Make sure we're dealing with float32 (for metric calculations)
  y_true = tf.cast(y_true, dtype=tf.float32)
  y_pred = tf.cast(y_pred, dtype=tf.float32)
  # Calculate the various evaluation metrics
  mse = tf.keras.metrics.mean_squared_error(y_true, y_pred)
  mae = tf.keras.metrics.mean_absolute_error(y_true, y_pred)
  rmse = tf.sqrt(mse)
  mape = tf.keras.metrics.mean_absolute_percentage_error(y_true, y_pred)
  mase = mean_absolute_scaled_error(y_true, y_pred)
  return {"mse": mse.numpy(),
          "mae": mae.numpy(),
          "rmse": rmse.numpy(),
          "mape": mape.numpy(),
          "mase": mase.numpy()}
naive_results = evaluate_preds(y_true = y_test[1:], y_pred = naive_forecast)
naive_results
Other kinds of time series forecasting models which can be used for baselines and actual forecasts
Model/Library Name | Resource |
---|---|
Moving average | https://machinelearningmastery.com/moving-average-smoothing-for-time-series-forecasting-python/ |
ARIMA (Autoregression Integrated Moving Average) | https://machinelearningmastery.com/arima-for-time-series-forecasting-with-python/ |
sktime (Scikit-Learn for time series) | https://github.com/alan-turing-institute/sktime |
TensorFlow Decision Forests (random forest, gradient boosting trees) | https://www.tensorflow.org/decision_forests |
Facebook Kats (purpose-built forecasting and time series analysis library by Facebook) | https://github.com/facebookresearch/Kats |
LinkedIn Greykite (flexible, intuitive and fast forecasts) | https://github.com/linkedin/greykite |
print(f"We want to use: {btc_price[:7]} to predict this: {btc_price[7]}")
HORIZON = 1 # predict next 1 day
WINDOW_SIZE = 7 # Use the past week of Bitcoin data to make the prediction
def get_labelled_windows(x, horizon=HORIZON):
  """
  Creates labels for a windowed dataset.
  E.g. if horizon = 1:
  Input: [0, 1, 2, 3, 4, 5, 6, 7] -> Output: ([0, 1, 2, 3, 4, 5, 6], [7])
  """
  return x[:, :-horizon], x[:, -horizon:]
test_window, test_label = get_labelled_windows(tf.expand_dims(tf.range(8)+1, axis = 0))
print(f" Window : {tf.squeeze(test_window).numpy()} - > Label: {tf.squeeze(test_label).numpy()}")
Note:
- Window size (input): number of timesteps of historical data used to predict the horizon (the data)
- Horizon (output): number of timesteps to predict into the future (the label)
We now have a way to label our windowed data. However, this only works on a small scale. We need a way to do the above across our entire time series.
We could do this with Python for loops, but for large time series that'd be quite slow.
To speed things up, we'll leverage NumPy's array indexing, as shown in the sketch below.
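Here's the core NumPy trick in isolation (a minimal sketch): indexing a 1D array with a 2D array of indexes returns a 2D array of values:
import numpy as np
series = np.arange(10, 20) # toy "time series": [10, 11, ..., 19]
indexes = np.array([[0, 1, 2],
                    [1, 2, 3],
                    [2, 3, 4]]) # 2D array of window indexes
series[indexes] # -> [[10 11 12], [11 12 13], [12 13 14]]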
Our function will:
1. Create a window step of a specific window size, e.g. [0, 1, 2, 3, 4, 5, 6]
2. Use NumPy indexing to create a 2D array of multiple window steps, for example: [[0, 1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6, 7], [2, 3, 4, 5, 6, 7, 8]]
3. Use the 2D array of multiple window steps (from 2) to index on a target series (e.g. the historical price of Bitcoin)
4. Use the get_labelled_windows() function we created above to turn the window steps into windows with a specified horizon
def make_windows(x, window_size=WINDOW_SIZE, horizon=HORIZON):
  """
  Turns a 1D array into a 2D array of sequential labelled windows of window_size with horizon-size labels.
  """
  # 1. Create a window of specific window_size (add the horizon on the end for labelling later)
  window_step = np.expand_dims(np.arange(window_size + horizon), axis=0)
  # 2. Create a 2D array of multiple window steps (minus 1 to account for 0 indexing)
  window_indexes = window_step + np.expand_dims(np.arange(len(x) - (window_size + horizon - 1)), axis=0).T
  # 3. Index on the target array (a time series) with the 2D array of multiple window steps
  windowed_array = x[window_indexes]
  # 4. Get the labelled windows
  windows, labels = get_labelled_windows(windowed_array, horizon=horizon)
  return windows, labels
full_windows, full_labels = make_windows(prices, window_size = WINDOW_SIZE, horizon = HORIZON)
len(full_windows), len(full_labels)
make_windows(prices, window_size= WINDOW_SIZE, horizon = HORIZON)
for i in range(3):
  print(f"Window: {full_windows[i]} -> Label: {full_labels[i]}")
Note: there's a function which does something similar to the above built into tf.keras.preprocessing:
tf.keras.preprocessing.timeseries_dataset_from_array()
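A minimal sketch of that built-in alternative (assuming TensorFlow 2.3+; the slicing below lines the labels up so each window of WINDOW_SIZE values is paired with the value that follows it):
# Windowed tf.data.Dataset straight from the prices array
dataset = tf.keras.preprocessing.timeseries_dataset_from_array(
    data=prices[:-HORIZON], # drop the last HORIZON values so every window has a label
    targets=prices[WINDOW_SIZE:], # targets[i] labels the window starting at index i
    sequence_length=WINDOW_SIZE,
    batch_size=32)
for window_batch, label_batch in dataset.take(1):
  print(window_batch.shape, label_batch.shape) # (32, 7) (32,)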
Turning windows into train and test sets
def make_train_test_splits(windows, labels, test_split=0.2):
  """
  Splits matching pairs of windows and labels into train and test splits.
  """
  split_size = int(len(windows) * (1 - test_split)) # this will default to 80% train / 20% test splits
  train_windows = windows[:split_size]
  train_labels = labels[:split_size]
  test_windows = windows[split_size:]
  test_labels = labels[split_size:]
  return train_windows, test_windows, train_labels, test_labels
train_windows, test_windows, train_labels, test_labels = make_train_test_splits(full_windows, full_labels, test_split= 0.2)
len(train_windows), len(test_windows), len(train_labels), len(test_labels)
train_windows[:5], train_labels[:5]
test_windows[:5], test_labels[:5]
# Check that the windowed training labels line up with the original y_train data (offset by WINDOW_SIZE)
np.array_equal(np.squeeze(train_labels[:-HORIZON-1]), y_train[WINDOW_SIZE:])
Make a modelling checkpoint
Because our model's performance will fluctuate from experiment to experiment, we're going to write a model checkpoint so we can compare apples to apples.
We want to compare each model's best performance against the other models' best performance.
For example, if our model performs best on epoch 55 (but we're training for 100 epochs), we want to load and evaluate the model saved on epoch 55.
We can create a modelling checkpoint callback using tf.keras.callbacks.ModelCheckpoint (see the TensorFlow docs on ModelCheckpoint).
import os
# Create a function to implement a ModelCheckpoint callback with a specific filename
def create_model_checkpoint(model_name, save_path="model_experiments"):
  return tf.keras.callbacks.ModelCheckpoint(filepath=os.path.join(save_path, model_name),
                                            verbose=0, # only output a limited amount of text
                                            save_best_only=True) # save only the best model
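Usage sketch (hypothetical: model_1 stands in for any compiled Keras model from the experiments above; the epochs and batch size are placeholders):
# Fit a model, saving only its best weights via the checkpoint callback
history = model_1.fit(train_windows, train_labels,
                      epochs=100,
                      batch_size=128,
                      validation_data=(test_windows, test_labels), # save_best_only monitors val_loss
                      callbacks=[create_model_checkpoint(model_name="model_1")])
# Later, load the best saved version of the model for evaluation
best_model = tf.keras.models.load_model("model_experiments/model_1")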