In the previous article, we demonstrated the benefit of using a statistical approach such as a Decision Tree over a DNN when the dataset is small. Now we will push further in that direction with Random Forest.

Random Forest builds on the Decision Tree algorithm but offers better predictions by reducing the risk of overfitting, i.e. high variance. As its name indicates, it relies on randomness: it draws many random subsamples of the training data (with replacement), trains a separate decision tree on each one, typically considering only a random subset of the features at each split, and decides the final output by majority vote. Because the trees are independent of each other, they can be trained concurrently. For regression, it takes the mean (or, in some variants, the median) of the trees' outputs.
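To make the mechanism concrete, here is a minimal sketch in plain Python, not the code from the notebook. It assumes a toy setup: each "tree" is a single-split stump on one randomly chosen feature, and the helper names (`bootstrap_sample`, `fit_stump`, `fit_forest`, `predict_majority`, `aggregate_regression`) are hypothetical, introduced only for illustration. A real Random Forest grows full trees and picks the best split among a random subset of features.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (a bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]


def most_common_label(labels):
    """Return the most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, rng):
    """Fit a one-split 'tree' on a single randomly chosen feature.

    rows is a list of (features, label) pairs. The split threshold is
    simply the mean of the chosen feature (an illustrative assumption,
    not an optimized split). Returns a function mapping features to a label.
    """
    feature = rng.randrange(len(rows[0][0]))  # random feature selection
    threshold = mean(x[feature] for x, _ in rows)
    left = [y for x, y in rows if x[feature] <= threshold]
    right = [y for x, y in rows if x[feature] > threshold]
    fallback = most_common_label([y for _, y in rows])
    left_label = most_common_label(left) if left else fallback
    right_label = most_common_label(right) if right else fallback
    return lambda x: left_label if x[feature] <= threshold else right_label


def fit_forest(rows, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng), rng) for _ in range(n_trees)]


def predict_majority(forest, x):
    """Classification: every tree votes, the most common label wins."""
    return most_common_label([tree(x) for tree in forest])


def aggregate_regression(outputs):
    """Regression: average the numeric outputs of the individual trees."""
    return mean(outputs)


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated classes in two dimensions.
    data = [((0.0, 0.0), 0), ((1.0, 1.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0),
            ((9.0, 9.0), 1), ((10.0, 10.0), 1), ((9.0, 10.0), 1), ((10.0, 9.0), 1)]
    forest = fit_forest(data)
    print(predict_majority(forest, (0.5, 0.5)))
    print(predict_majority(forest, (9.5, 9.5)))
```

A single stump trained on an unlucky bootstrap sample can be wrong, but the vote across many stumps smooths out those individual errors, which is the variance reduction described above.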

With Random Forest, we can reach a near-perfect accuracy of 99%.

The Jupyter notebook is available on GitHub.