In this very short article, we will once again use the diabetes dataset from the UCI Machine Learning Repository, but this time with a statistical approach: the Decision Tree classifier from scikit-learn.

We have already made this prediction using Deep Learning and XGBoost. The purpose here is to show the benefit of a statistical approach on small datasets.

Fitting a decision tree does not by itself require splitting the dataset into training and test sets. However, to measure accuracy on unseen data and compare fairly with the DNN approach, we keep 25% of the dataset aside for testing.
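A minimal sketch of the 25% hold-out split, assuming scikit-learn's `train_test_split`. The data here is a synthetic stand-in generated with `make_classification` (500 rows, 8 features, both arbitrary choices); the real notebook would load the UCI diabetes file instead. The `stratify` and `random_state` settings are illustrative assumptions, not taken from the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the diabetes data (assumption: replace with the UCI file)
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

# Keep 25% of the rows aside as a test set, preserving class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)  # (375, 8) (125, 8)
```

Stratifying on `y` keeps the positive/negative ratio the same in both parts, which matters when a dataset is small.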

After fitting, the Decision Tree reached an accuracy of 97% in less than 1 s, whereas the DNN reached 96.54% in 1.5 minutes. This shows that Deep Learning is not a silver bullet: a classical Machine Learning approach is more efficient on small datasets, while for large datasets that may require complex modeling, Deep Learning becomes the better choice.
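The accuracy and timing comparison above can be reproduced in spirit with a sketch like the one below, assuming `DecisionTreeClassifier` with default hyperparameters and `time.perf_counter` for wall-clock fit time. The dataset is again a synthetic stand-in, so the printed accuracy will not match the 97% reported for the real diabetes data.

```python
import time

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the diabetes data (assumption: the notebook loads the UCI file)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = DecisionTreeClassifier(random_state=0)

# Time only the training step
start = time.perf_counter()
clf.fit(X_train, y_train)
fit_seconds = time.perf_counter() - start

# Accuracy on the 25% held-out rows
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"accuracy={accuracy:.4f}, fit time={fit_seconds:.3f}s")
```

Timing the DNN the same way (wrapping only its training call) keeps the two numbers comparable.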

Full Jupyter notebooks for both the Decision Tree and DNN approaches are available on GitHub.