In this post, you will see how to implement a multivariate time series forecasting using Encoder – Decoder deep learning architecture.

Once again, the dataset is taken from ICU Machine Learning repository. Briefly, as the site says : “the dataset contains 9358 instances of hourly averaged responses from an array of 5 metal oxide chemical sensors embedded in an Air Quality Chemical Multisensor Device”. For a full description of the dataset, you can follow this link.

The purpose is to forecast the values of the values of the variables.

The complete Jypyter notebook file is available on Github. Assuming you have already basic knowledge on importing data into Pandas, I will jump directly into the specific parts of the topic.

Anytime you work on on Time Series forecasting, the first thing that you need to check is the non-stationarity of the dataset. The stationarity is a property where the mean, variance and covariance do not vary with the time. In fact, Stationary dataset doesn’t allow a better prediction.

There are several ways to check the non-stationarity of series depending on the number of variables. For univariate series, you can just plot the mean, variance and covariance and visualize the trend . For multivariate series, you must check the cointegration between variables using Johansen covariance test which is implemented on StatsModels Library.

```
data = dataset.drop(['datetime'], axis=1)
data.index = dataset.datetime
#checking stationarity
from statsmodels.tsa.vector_ar.vecm import coint_johansen
#drop one of the column to fit 12 tests
johan_test_temp = data.drop(['CO(GT)'], axis=1)
coint_johansen(johan_test_temp,-1,1).eig
```

If you can extract eigenvalues from Johansen test, it means the dataset is not stationary and you can proceed to time series forecasting with better accuracy.

Next, you need to proceed to serie windowing. Basically, windowing is slicing the original dataset into train-future datasets.

```
def split_series(series, n_past, n_future):
'''
n_past ==> no of past observations
n_future ==> no of future observations
'''
X, y = list(), list()
for window_start in range(len(series)):
past_end = window_start + n_past
future_end = past_end + n_future
if future_end > len(series):
break
# slicing the past and future parts of the window
past, future = series[window_start:past_end, :], series[past_end:future_end, :]
X.append(past)
y.append(future)
return np.array(X), np.array(y)
```

For the model, we will use encoder – decoder as per below architecture.

```
# E2D2
# n_features ==> no of features at each timestep in the data.
#
encoder_inputs = Input(shape=(n_past, n_features))
encoder_l1 = LSTM(100,return_sequences = True, return_state=True)
encoder_outputs1 = encoder_l1(encoder_inputs)
encoder_states1 = encoder_outputs1[1:]
encoder_l2 = LSTM(100, return_state=True)
encoder_outputs2 = encoder_l2(encoder_outputs1[0])
encoder_states2 = encoder_outputs2[1:]
decoder_inputs = RepeatVector(n_future)(encoder_outputs2[0])
decoder_l1 = LSTM(100, return_sequences=True)(decoder_inputs,initial_state = encoder_states1)
decoder_l2 = LSTM(100, return_sequences=True)(decoder_l1,initial_state = encoder_states2)
decoder_outputs2 = TimeDistributed(tf.keras.layers.Dense(n_features))(decoder_l2)
model_e2d2 = tf.keras.models.Model(encoder_inputs,decoder_outputs2)
```

Now, we can proceed to the training. We will use Huber As loss function because it is more suitable to handle regression Time Series forecasting.

```
reduce_lr = tf.keras.callbacks.LearningRateScheduler(lambda x: 1e-3 * 0.90 ** x, verbose = 1)
model_e2d2.compile(optimizer=tf.keras.optimizers.Adam(), loss=tf.keras.losses.Huber())
history_e2d2=model_e2d2.fit(X_train,y_train,epochs=25,validation_data=(X_test,y_test),batch_size=32,verbose=1,callbacks=[reduce_lr])
```

We can observe the loss is decreasing correctly and the testing loss is far better than the training which means the model is approriate to our goal.

Since it is a regression forecasting, we need to calculate the RMSE:

rmse value for CO(GT) is : 2.946465301149107 rmse value for PT08.S1(CO) is : 9.406576031557988 rmse value for NMHC(GT) is : 3.1019596488449808 rmse value for C6H6(GT) is : 1.9661898791039167 rmse value for PT08.S2(NMHC) is : 10.635317731207742 rmse value for NOx(GT) is : 9.164628074932729 rmse value for PT08.S3(NOx) is : 9.932545698752552 rmse value for NO2(GT) is : 5.426479520355997 rmse value for PT08.S4(NO2) is : 10.944755292180378 rmse value for PT08.S5(O3) is : 13.594535572298128 rmse value for T is : 2.0557052764338706 rmse value for RH is : 2.919398978106424 rmse value for AH is : 1.1845951807986206

Another approach to achieve the forecasting is to use regular statistic VAR instead of deep learning approach. The Jupyter notebook is available on Github and it gives below RMSE results.

rmse value for CO(GT) is : 1.066462254187852 rmse value for PT08.S1(CO) is : 13.001824321416066 rmse value for NMHC(GT) is : 2.5302971314862863 rmse value for C6H6(GT) is : 2.4820888653216437 rmse value for PT08.S2(NMHC) is : 15.422996982837953 rmse value for NOx(GT) is : 12.329249935169996 rmse value for PT08.S3(NOx) is : 13.958608690249868 rmse value for NO2(GT) is : 7.259374277148832 rmse value for PT08.S4(NO2) is : 20.86420786014703 rmse value for PT08.S5(O3) is : 19.235469130809918 rmse value for T is : 3.0662985181056874 rmse value for RH is : 3.7422152961864836 rmse value for AH is : 0.6793153728684356

**References:**