In this post, you will see how to implement multivariate time series forecasting using an encoder-decoder deep learning architecture.

Once again, the dataset is taken from the UCI Machine Learning Repository. Briefly, as the site says: “the dataset contains 9358 instances of hourly averaged responses from an array of 5 metal oxide chemical sensors embedded in an Air Quality Chemical Multisensor Device”. For a full description of the dataset, you can follow this link.

The purpose is to forecast the future values of the variables.

The complete Jupyter notebook file is available on GitHub. Assuming you already have basic knowledge of importing data into Pandas, I will jump directly into the specific parts of the topic.

Anytime you work on time series forecasting, the first thing you need to check is the stationarity of the dataset. Stationarity is a property where the mean, variance and covariance do not vary with time. In fact, a non-stationary dataset does not allow a good prediction.

There are several ways to check the stationarity of a series depending on the number of variables. For a univariate series, you can just plot the mean, variance and covariance and visualize the trend, or run a formal test such as the one sketched below. For a multivariate series, you must check the cointegration between the variables using the Johansen cointegration test, which is implemented in the statsmodels library.
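For the univariate case, the Augmented Dickey-Fuller test from statsmodels is a common formal alternative to eyeballing the plots. A minimal sketch, assuming the dataset frame has already been loaded and cleaned; using the temperature column T is only an example:

from statsmodels.tsa.stattools import adfuller

# Augmented Dickey-Fuller test on a single column (here the temperature T);
# a p-value below 0.05 indicates the series is stationary
adf_stat, p_value = adfuller(dataset['T'])[:2]
print('ADF statistic:', adf_stat, 'p-value:', p_value)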

# keep the datetime column as index and drop it from the features
data = dataset.drop(['datetime'], axis=1)
data.index = dataset.datetime

# checking cointegration / stationarity
from statsmodels.tsa.vector_ar.vecm import coint_johansen

# the Johansen test handles at most 12 variables, so drop one column
johan_test_temp = data.drop(['CO(GT)'], axis=1)
# det_order=-1: no deterministic term, k_ar_diff=1: one lagged difference
coint_johansen(johan_test_temp, -1, 1).eig

If the Johansen test returns significant eigenvalues, it means the variables are cointegrated, i.e. they share a long-run relationship, and you can proceed to multivariate time series forecasting with better accuracy.
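For a more formal check, the same test object also exposes trace statistics and their critical values. A minimal sketch reusing the johan_test_temp frame built above; the attribute names come from the statsmodels coint_johansen result:

result = coint_johansen(johan_test_temp, -1, 1)

# a trace statistic above its 95% critical value rejects the null hypothesis
# of "at most r cointegrating relations" for the corresponding rank r
print(result.lr1)         # trace statistics, one per cointegration rank
print(result.cvt[:, 1])   # critical values at the 95% level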

Next, you need to proceed to series windowing. Basically, windowing means slicing the original dataset into pairs of past and future windows: the past observations are the model inputs and the future observations are the targets.

import numpy as np

def split_series(series, n_past, n_future):
  '''
    n_past ==> no of past observations
    n_future ==> no of future observations
  '''
  X, y = list(), list()
  for window_start in range(len(series)):
    past_end = window_start + n_past
    future_end = past_end + n_future
    if future_end > len(series):
      break
    # slicing the past and future parts of the window
    past, future = series[window_start:past_end, :], series[past_end:future_end, :]
    X.append(past)
    y.append(future)
  return np.array(X), np.array(y)
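The training and test sets used later in the post can be built from this helper along these lines. A minimal sketch: the 80/20 chronological split and the n_past/n_future values are illustrative assumptions, not necessarily the ones used in the notebook:

n_past, n_future = 10, 5            # illustrative window sizes (hours)
n_features = data.shape[1]

# simple chronological split: first 80% of the rows for training, the rest for testing
split = int(len(data) * 0.8)
train, test = data.values[:split], data.values[split:]

X_train, y_train = split_series(train, n_past, n_future)
X_test, y_test = split_series(test, n_past, n_future)

The notebook may also scale the features before windowing; the sketch above skips that step.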

For the model, we will use an encoder-decoder, as per the architecture below.

Encoder-decoder architecture (Source: Analytics Vidhya)
# E2D2: two-layer encoder, two-layer decoder
# n_features ==> no of features at each timestep in the data
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense

encoder_inputs = Input(shape=(n_past, n_features))
# first encoder layer returns the full sequence plus its final hidden and cell states
encoder_l1 = LSTM(100, return_sequences=True, return_state=True)
encoder_outputs1 = encoder_l1(encoder_inputs)
encoder_states1 = encoder_outputs1[1:]
# second encoder layer only returns its last output and final states
encoder_l2 = LSTM(100, return_state=True)
encoder_outputs2 = encoder_l2(encoder_outputs1[0])
encoder_states2 = encoder_outputs2[1:]

# repeat the encoder output n_future times to feed the decoder
decoder_inputs = RepeatVector(n_future)(encoder_outputs2[0])

# each decoder layer is initialised with the states of the matching encoder layer
decoder_l1 = LSTM(100, return_sequences=True)(decoder_inputs, initial_state=encoder_states1)
decoder_l2 = LSTM(100, return_sequences=True)(decoder_l1, initial_state=encoder_states2)
decoder_outputs2 = TimeDistributed(Dense(n_features))(decoder_l2)

model_e2d2 = tf.keras.models.Model(encoder_inputs, decoder_outputs2)

Now, we can proceed to the training. We will use the Huber loss because it is robust to outliers and well suited to this kind of regression time series forecasting.

reduce_lr = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-3 * 0.90 ** epoch, verbose=1)
model_e2d2.compile(optimizer=tf.keras.optimizers.Adam(), loss=tf.keras.losses.Huber())
history_e2d2 = model_e2d2.fit(X_train, y_train, epochs=25, validation_data=(X_test, y_test),
                              batch_size=32, verbose=1, callbacks=[reduce_lr])

We can observe that the loss decreases steadily and that the validation loss is even lower than the training loss, which suggests the model is not overfitting and is appropriate for our goal.

Loss Graph
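The loss graph above can be reproduced from the training history returned by fit. A minimal sketch, assuming matplotlib is available:

import matplotlib.pyplot as plt

# plot training vs validation loss from the Keras history object
plt.plot(history_e2d2.history['loss'], label='training loss')
plt.plot(history_e2d2.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('Huber loss')
plt.legend()
plt.show()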

Since this is regression forecasting, we need to calculate the RMSE for each variable:
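A minimal sketch of how these values can be computed, assuming pred_e2d2 holds the model's predictions on the test windows (the variable name is illustrative):

from sklearn.metrics import mean_squared_error
import numpy as np

pred_e2d2 = model_e2d2.predict(X_test)

for index, feature in enumerate(data.columns):
    # flatten the (samples, n_future) block of this feature before scoring
    rmse = np.sqrt(mean_squared_error(y_test[:, :, index].ravel(),
                                      pred_e2d2[:, :, index].ravel()))
    print('rmse value for', feature, 'is : ', rmse)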

rmse value for CO(GT) is :  2.946465301149107
rmse value for PT08.S1(CO) is :  9.406576031557988
rmse value for NMHC(GT) is :  3.1019596488449808
rmse value for C6H6(GT) is :  1.9661898791039167
rmse value for PT08.S2(NMHC) is :  10.635317731207742
rmse value for NOx(GT) is :  9.164628074932729
rmse value for PT08.S3(NOx) is :  9.932545698752552
rmse value for NO2(GT) is :  5.426479520355997
rmse value for PT08.S4(NO2) is :  10.944755292180378
rmse value for PT08.S5(O3) is :  13.594535572298128
rmse value for T is :  2.0557052764338706
rmse value for RH is :  2.919398978106424
rmse value for AH is :  1.1845951807986206

Another approach to achieve the forecasting is to use a classical statistical VAR model instead of the deep learning approach. The Jupyter notebook is available on GitHub and it gives the RMSE results below.
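A minimal sketch of how such a VAR model can be fitted with statsmodels; the lag-order selection and the hold-out split are illustrative assumptions, not necessarily what the notebook does:

from statsmodels.tsa.api import VAR

# hold out the last n_future rows as the forecast horizon (illustrative split)
train_var = data.iloc[:-n_future]

model_var = VAR(train_var)
results_var = model_var.fit(maxlags=15, ic='aic')   # lag order chosen by AIC

# forecast the held-out horizon from the last observed lags
lag_order = results_var.k_ar
forecast_var = results_var.forecast(train_var.values[-lag_order:], steps=n_future)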

rmse value for CO(GT) is :  1.066462254187852
rmse value for PT08.S1(CO) is :  13.001824321416066
rmse value for NMHC(GT) is :  2.5302971314862863
rmse value for C6H6(GT) is :  2.4820888653216437
rmse value for PT08.S2(NMHC) is :  15.422996982837953
rmse value for NOx(GT) is :  12.329249935169996
rmse value for PT08.S3(NOx) is :  13.958608690249868
rmse value for NO2(GT) is :  7.259374277148832
rmse value for PT08.S4(NO2) is :  20.86420786014703
rmse value for PT08.S5(O3) is :  19.235469130809918
rmse value for T is :  3.0662985181056874
rmse value for RH is :  3.7422152961864836
rmse value for AH is :  0.6793153728684356

References: