This tutorial achieves the same goal as the previous article, Binary Classification with Keras and Scikit-learn, but with the Keras-TensorFlow pair. We will only highlight the differences between the two approaches.

  • Load, explore, and visualize the dataset

Instead of using Pandas, we use TensorFlow directly to load the data and build the input pipeline:

import tensorflow as tf

# Column names used to relabel the CSV header: the 16 feature columns
# plus the label column "cls"
CSV_COLUMNS = [
    'age', 'gender', 'polyuria', 'polydipsia', 'sudden_weight_loss',
    'weakness', 'polyphagia', 'genital_thrush', 'visual_blurring',
    'itching', 'irritability', 'delayed_healing', 'partial_paresis',
    'muscle_stiffness', 'alopecia', 'obesity', 'cls']

# Download the dataset
URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/00529/diabetes_data_upload.csv"
dataset = tf.keras.utils.get_file(
    "diabetes_data_upload.csv",
    URL)

def train_input_fn():
  diabete = tf.data.experimental.make_csv_dataset(
      dataset,
      batch_size=32,
      header=True,
      column_names=CSV_COLUMNS,
      label_name="cls")
  diabete_batches = (
      diabete.cache().repeat().shuffle(100)
      .prefetch(tf.data.AUTOTUNE))
  return diabete_batches
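Conceptually, this input function streams shuffled batches of (features, label) pairs, where features is a dictionary mapping each column name to a batch of values. As a rough pure-Python analogue (a toy sketch with a hypothetical `batched_rows` helper, not how tf.data works internally), the batching and label-splitting can be pictured as:

```python
import csv
import io
import random

def batched_rows(csv_text, label_name, batch_size=4, seed=0):
    """Toy analogue of make_csv_dataset: parse CSV text and yield
    (features, labels) batches, shuffled once for illustration."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    random.Random(seed).shuffle(rows)  # stands in for .shuffle(100)
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        labels = [r.pop(label_name) for r in chunk]
        # Column-oriented features, like the dict of tensors tf.data yields
        features = {k: [r[k] for r in chunk] for k in chunk[0]}
        yield features, labels

sample = ("age,gender,cls\n"
          "40,Male,Positive\n"
          "58,Female,Negative\n"
          "33,Male,Positive\n")
features, labels = next(batched_rows(sample, "cls", batch_size=3))
```

The real pipeline additionally caches, repeats indefinitely, and prefetches batches, which is what makes it suitable for estimator training.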

  • Data preprocessing

Here, we use TensorFlow's feature_column API to preprocess the data and convert vocabulary lists into model features.

def categorical_fc(name, values):
    """Helper function to wrap categorical feature by indicator column.

    Args:
        name: str, name of feature.
        values: list, list of strings of categorical values.
    Returns:
        Indicator column of categorical feature.
    """
    cat_column = tf.feature_column.categorical_column_with_vocabulary_list(
            key=name, vocabulary_list=values)

    return tf.feature_column.indicator_column(categorical_column=cat_column)
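What the indicator column produces is essentially a one-hot encoding over the vocabulary list. A minimal pure-Python sketch of that idea (a toy `one_hot` helper, not the feature_column implementation):

```python
def one_hot(value, vocabulary):
    """Toy version of an indicator column's output: a one-hot vector
    over the vocabulary (all zeros for out-of-vocabulary values)."""
    return [1.0 if value == v else 0.0 for v in vocabulary]

one_hot("Yes", ["Yes", "No"])  # [1.0, 0.0]
one_hot("No", ["Yes", "No"])   # [0.0, 1.0]
```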

def create_feature_columns():
    """Creates dictionary of feature columns from inputs.

    Returns:
        Dictionary of feature columns.
    """
    feature_columns = {
        colname : tf.feature_column.numeric_column(key=colname)
           for colname in ["age"]
    }

    feature_columns["gender"] = categorical_fc("gender", ['Male', 'Female'])

    # "gender" is excluded from this loop: it was already added above with
    # its own Male/Female vocabulary and would otherwise be overwritten
    # with a Yes/No vocabulary
    for colname in ['polyuria', 'polydipsia', 'sudden_weight_loss',
                    'weakness', 'polyphagia', 'genital_thrush',
                    'visual_blurring', 'itching', 'irritability',
                    'delayed_healing', 'partial_paresis', 'muscle_stiffness',
                    'alopecia', 'obesity']:
        feature_columns[colname] = categorical_fc(colname, ['Yes', 'No'])

    return feature_columns

  • Develop the model

As for the model, we use DNNClassifier, which offers additional advantages such as built-in checkpointing.

import tempfile

import tensorflow as tf
from tensorflow import keras

model_dir = tempfile.mkdtemp()
model = tf.estimator.DNNClassifier(
    model_dir=model_dir,
    hidden_units=[36, 12],
    feature_columns=create_feature_columns().values(),
    n_classes=2,
    label_vocabulary=['Positive', 'Negative'],
    activation_fn=tf.nn.relu,
    optimizer=keras.optimizers.Adam(learning_rate=0.005)
)
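Under the hood this is a feed-forward network with two ReLU hidden layers. Assuming the feature columns above (1 numeric column plus 15 two-value indicator columns, i.e. 1 + 15 × 2 = 31 inputs; this count is our own back-of-the-envelope estimate), the weight-matrix shapes can be sketched as:

```python
def layer_shapes(n_inputs, hidden_units, n_classes):
    """Illustrative helper: (in, out) weight-matrix shapes for a DNN
    with the given hidden layer widths."""
    dims = [n_inputs] + list(hidden_units) + [n_classes]
    return [(dims[i], dims[i + 1]) for i in range(len(dims) - 1)]

layer_shapes(31, [36, 12], 2)  # [(31, 36), (36, 12), (12, 2)]
```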

  • Execute the training and evaluate the model

Now we can run the training and then evaluate the accuracy:

model.train(input_fn=train_input_fn, steps=8000)
result = model.evaluate(input_fn=train_input_fn, steps=10)

We observe faster execution compared to the Scikit-learn approach, as well as a better accuracy of 98.4%.

The Jupyter notebook file is available at https://github.com/erasolon/machine_learning/blob/main/Diabete_DNN_Tensorflow.ipynb
