14. Bonus example - starting from scratch#
In real deep learning tasks, the data we analyse is rarely included neatly in the libraries we use for the analysis (like Keras Datasets). Instead, we need to collect the data ourselves, and organising and preprocessing these real datasets into a form suitable for analysis can be challenging.
The purpose of this example is to introduce the (possible) steps needed when you prepare your own data for a deep learning model.
For this example, we load the data ourselves, which is somewhat laborious. We use image classification data from www.kaggle.com/c/dogs-vs-cats. Kaggle organises ML competitions, and in this one the task is to distinguish dogs from cats in images.
First, we load some libraries that are needed to manipulate the image files.
import os,shutil
I have the original training data in the “original_data” folder (under the work folder). You can download the original data from www.kaggle.com/c/dogs-vs-cats.
files = os.listdir('./original_data')
The total number of dog and cat images is 25 000.
len(files)
25000
We do this training “by the book” by dividing the data into training, validation and test parts. The validation data is used to fine-tune the hyperparameters of the model; with a separate validation set, we avoid misusing hyperparameter optimisation to optimise the test-set performance. Below, we split the dataset into separate training, validation and test folders.
The following commands build different folders for the training, validation and test data.
os.mkdir('train')
os.mkdir('validation')
os.mkdir('test')
Under the train, validation and test folders we create separate subfolders for the dog and cat pictures. This makes it much easier to use Keras' data-generation functions, as they can automatically collect observations of the different classes from the different folders. The os.path.join() function makes it easy to build directory structures: you give it the parts of the path, and it adds the separators automatically where needed.
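As a small illustration (the base path below is just a made-up example, not the actual work folder), os.path.join() simply glues the parts together with the correct separator:
import os
# Hypothetical base path, only to show what os.path.join() returns
print(os.path.join('/home/user/work', 'train', 'dogs'))
# On Linux/macOS this prints: /home/user/work/train/dogs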
# Get the current work directory
base_dir = os.getcwd()
# Dogs
os.mkdir(os.path.join(base_dir,'train','dogs'))
os.mkdir(os.path.join(base_dir,'validation','dogs'))
os.mkdir(os.path.join(base_dir,'test','dogs'))
# Cats
os.mkdir(os.path.join(base_dir,'train','cats'))
os.mkdir(os.path.join(base_dir,'validation','cats'))
os.mkdir(os.path.join(base_dir,'test','cats'))
Next, we copy the files to the correct folders. To speed up the calculations, we use only part of the data: 3000 images for training, 1000 images for validation and 1000 images for testing. The first command in each cell constructs a list of the relevant filenames. It uses Python's list comprehension, which is a great feature of the language.
Let's analyse the first one, fnames = ['dog.{}.jpg'.format(i) for i in range(1500)]:
When we put a for loop inside square brackets, Python generates a list whose values are the results of the successive rounds of the loop.
'dog.{}.jpg'.format(i) is the part that is repeated in the list, with the curly brackets replaced by the value of i.
for i in range(1500) tells which values i takes. range(1500) means the values from 0 to 1499.
More information about list comprehensions can be found at https://docs.python.org/3/tutorial/datastructures.html (section 5.1.3). A quick illustration follows below.
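As a quick illustration (not part of the actual data preparation), here is the same comprehension truncated to three values:
print(['dog.{}.jpg'.format(i) for i in range(3)])
# ['dog.0.jpg', 'dog.1.jpg', 'dog.2.jpg']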
# Train dogs
fnames = ['dog.{}.jpg'.format(i) for i in range(1500)]
for file in fnames:
    src = os.path.join(base_dir,'original_data',file)
    dst = os.path.join(base_dir,'train','dogs',file)
    shutil.copyfile(src,dst)
# Validation dogs
fnames = ['dog.{}.jpg'.format(i) for i in range(1500,2000)]
for file in fnames:
    src = os.path.join(base_dir,'original_data',file)
    dst = os.path.join(base_dir,'validation','dogs',file)
    shutil.copyfile(src,dst)
# Test dogs
fnames = ['dog.{}.jpg'.format(i) for i in range(2000,2500)]
for file in fnames:
    src = os.path.join(base_dir,'original_data',file)
    dst = os.path.join(base_dir,'test','dogs',file)
    shutil.copyfile(src,dst)
# Train cats
fnames = ['cat.{}.jpg'.format(i) for i in range(1500)]
for file in fnames:
    src = os.path.join(base_dir,'original_data',file)
    dst = os.path.join(base_dir,'train','cats',file)
    shutil.copyfile(src,dst)
# Validation cats
fnames = ['cat.{}.jpg'.format(i) for i in range(1500,2000)]
for file in fnames:
    src = os.path.join(base_dir,'original_data',file)
    dst = os.path.join(base_dir,'validation','cats',file)
    shutil.copyfile(src,dst)
# Test cats
fnames = ['cat.{}.jpg'.format(i) for i in range(2000,2500)]
for file in fnames:
    src = os.path.join(base_dir,'original_data',file)
    dst = os.path.join(base_dir,'test','cats',file)
    shutil.copyfile(src,dst)
Next, we check that everything went as planned. The dog folders should contain 1500, 500 and 500 images, and similarly for the cat folders.
# Check the dog directories
print(len(os.listdir(os.path.join(base_dir,'train','dogs'))))
print(len(os.listdir(os.path.join(base_dir,'validation','dogs'))))
print(len(os.listdir(os.path.join(base_dir,'test','dogs'))))
1500
500
500
# Check the cat directories
print(len(os.listdir(os.path.join(base_dir,'train','cats'))))
print(len(os.listdir(os.path.join(base_dir,'validation','cats'))))
print(len(os.listdir(os.path.join(base_dir,'test','cats'))))
1500
500
500
14.1. Simple CNN model#
As our preliminary model, we test a basic CNN with four convolutional layers, each followed by a max-pooling layer, then a flatten layer (12544 values) and a dense layer with 512 neurons. The output layer has one neuron with a sigmoid activation function, so the output is a prediction for one of the two classes.
First, we need the layers and models modules from Keras.
from tensorflow.keras import layers
from tensorflow.keras import models
Next, we define a sequential model and add layers using the add()-function.
model = models.Sequential()
The input images to the network are 150x150 pixel RGB images. The size of the convolution-filter is 3x3, and the layer produces 32 feature maps. The ReLU activation function is the common choice with CNNs (and many other neural network types).
model.add(layers.Conv2D(32, (3, 3), activation='relu',input_shape=(150, 150, 3)))
A max-pooling layer with a 2x2 window.
model.add(layers.MaxPooling2D((2, 2)))
Notice how the number of feature maps is increasing.
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(256, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
Overall, we have almost 7 million parameters in our model, which is way too much for a training set with 3000 images. The model will overfit as we will soon see from the results.
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 148, 148, 32) 896
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 74, 74, 32) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 72, 72, 64) 18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 36, 36, 64) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 34, 34, 128) 73856
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 17, 17, 128) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 15, 15, 256) 295168
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 7, 7, 256) 0
_________________________________________________________________
flatten (Flatten) (None, 12544) 0
_________________________________________________________________
dense (Dense) (None, 512) 6423040
_________________________________________________________________
dense_1 (Dense) (None, 1) 513
=================================================================
Total params: 6,811,969
Trainable params: 6,811,969
Non-trainable params: 0
_________________________________________________________________
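If you want to check where these parameter counts come from, here is a small sketch: a Conv2D layer has (kernel height x kernel width x input channels + 1 bias) x filters parameters, and a Dense layer has (inputs + 1 bias) x units parameters.
# First convolutional layer: 3x3 kernels, 3 input channels, 32 filters
print((3 * 3 * 3 + 1) * 32)      # 896
# Dense layer after flattening: 7*7*256 = 12544 inputs, 512 neurons
print((12544 + 1) * 512)         # 6423040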
from tensorflow.keras import optimizers
Next, we compile the model. Because we now have two classes, binary_crossentropy is the correct loss function. There are many gradient descent optimisers available, but RMSprop usually works very well. More information about RMSprop can be found here: https://keras.io/api/optimizers/rmsprop/.
We measure performance with the accuracy metric.
model.compile(loss='binary_crossentropy',optimizer=optimizers.RMSprop(),metrics=['acc'])
Getting images from a folder into a CNN model can be a very tedious task. Luckily, Keras has functions that make the job much more straightforward.
ImageDataGenerator is a Python generator that can be used to transform images from a folder to tensors that can be fed to a neural network model.
from tensorflow.keras.preprocessing.image import ImageDataGenerator
We scale the pixel values from 0-255 to 0-1. Remember: neural networks like small values.
train_datagen = ImageDataGenerator(rescale=1./255)
validation_datagen = ImageDataGenerator(rescale=1./255)
We resize the images to 150 x 150 and collect them into batches of 25; in effect, we feed (25, 150, 150, 3) tensors to the model. As you can see, the function automatically recognises the two classes, because we placed the cat and dog images in two different folders. We have to make separate generators for the training data and the validation data.
train_generator = train_datagen.flow_from_directory(os.path.join(base_dir,'train'),
                                                     target_size=(150, 150),
                                                     batch_size=25,
                                                     class_mode='binary')
Found 3000 images belonging to 2 classes.
validation_generator = validation_datagen.flow_from_directory(os.path.join(base_dir,'validation'),
                                                               target_size=(150, 150),
                                                               batch_size=25,
                                                               class_mode='binary')
Found 1000 images belonging to 2 classes.
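If you want to verify the tensor shapes the generators produce, you can pull a single batch from the training generator (a quick sanity check, not needed for the training itself):
images, labels = next(train_generator)
print(images.shape)   # (25, 150, 150, 3)
print(labels.shape)   # (25,)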
We train a little bit longer this time, 30 epochs. Instead of input data, we now give the generators to the model, and we define the validation generator and the number of validation steps separately. With batches of 25 images and 120 steps per epoch, we go through all 3000 training images in each epoch. The training progress details are saved to history.
history = model.fit(train_generator,
                    steps_per_epoch=120,
                    epochs=30,
                    validation_data=validation_generator,
                    validation_steps=40)
Epoch 1/30
120/120 [==============================] - 30s 242ms/step - loss: 1.1536 - acc: 0.5054 - val_loss: 0.6858 - val_acc: 0.5140
Epoch 2/30
120/120 [==============================] - 22s 182ms/step - loss: 0.6779 - acc: 0.5652 - val_loss: 0.6450 - val_acc: 0.5970
Epoch 3/30
120/120 [==============================] - 22s 180ms/step - loss: 0.6601 - acc: 0.6260 - val_loss: 0.6499 - val_acc: 0.6480
Epoch 4/30
120/120 [==============================] - 22s 180ms/step - loss: 0.5894 - acc: 0.6766 - val_loss: 0.6674 - val_acc: 0.6730
Epoch 5/30
120/120 [==============================] - 22s 182ms/step - loss: 0.5287 - acc: 0.7391 - val_loss: 0.5743 - val_acc: 0.7050
Epoch 6/30
120/120 [==============================] - 22s 181ms/step - loss: 0.4937 - acc: 0.7739 - val_loss: 0.6369 - val_acc: 0.7050
Epoch 7/30
120/120 [==============================] - 22s 182ms/step - loss: 0.4594 - acc: 0.7848 - val_loss: 0.5385 - val_acc: 0.7270
Epoch 8/30
120/120 [==============================] - 22s 184ms/step - loss: 0.3973 - acc: 0.8170 - val_loss: 0.7588 - val_acc: 0.6450
Epoch 9/30
120/120 [==============================] - 22s 181ms/step - loss: 0.3694 - acc: 0.8405 - val_loss: 0.5704 - val_acc: 0.7490
Epoch 10/30
120/120 [==============================] - 22s 182ms/step - loss: 0.3106 - acc: 0.8738 - val_loss: 0.6121 - val_acc: 0.7510
Epoch 11/30
120/120 [==============================] - 22s 183ms/step - loss: 0.2531 - acc: 0.8894 - val_loss: 1.0076 - val_acc: 0.7020
Epoch 12/30
120/120 [==============================] - 22s 184ms/step - loss: 0.1932 - acc: 0.9199 - val_loss: 0.8431 - val_acc: 0.7350
Epoch 13/30
120/120 [==============================] - 22s 182ms/step - loss: 0.1432 - acc: 0.9474 - val_loss: 0.9455 - val_acc: 0.7600
Epoch 14/30
120/120 [==============================] - 22s 181ms/step - loss: 0.1507 - acc: 0.9525 - val_loss: 0.9235 - val_acc: 0.7310
Epoch 15/30
120/120 [==============================] - 22s 182ms/step - loss: 0.1032 - acc: 0.9624 - val_loss: 1.1940 - val_acc: 0.7550
Epoch 16/30
120/120 [==============================] - 22s 181ms/step - loss: 0.0769 - acc: 0.9693 - val_loss: 1.0742 - val_acc: 0.7620
Epoch 17/30
120/120 [==============================] - 22s 183ms/step - loss: 0.0673 - acc: 0.9834 - val_loss: 1.6420 - val_acc: 0.7300
Epoch 18/30
120/120 [==============================] - 22s 181ms/step - loss: 0.0477 - acc: 0.9846 - val_loss: 1.6290 - val_acc: 0.7710
Epoch 19/30
120/120 [==============================] - 22s 181ms/step - loss: 0.0529 - acc: 0.9855 - val_loss: 1.6750 - val_acc: 0.7420
Epoch 20/30
120/120 [==============================] - 22s 182ms/step - loss: 0.0588 - acc: 0.9810 - val_loss: 2.1016 - val_acc: 0.7480
Epoch 21/30
120/120 [==============================] - 22s 181ms/step - loss: 0.0861 - acc: 0.9782 - val_loss: 1.7882 - val_acc: 0.7720
Epoch 22/30
120/120 [==============================] - 22s 181ms/step - loss: 0.0814 - acc: 0.9821 - val_loss: 2.3528 - val_acc: 0.7310
Epoch 23/30
120/120 [==============================] - 22s 181ms/step - loss: 0.0487 - acc: 0.9893 - val_loss: 1.6582 - val_acc: 0.7360
Epoch 24/30
120/120 [==============================] - 22s 183ms/step - loss: 0.0561 - acc: 0.9818 - val_loss: 2.3192 - val_acc: 0.7590
Epoch 25/30
120/120 [==============================] - 22s 181ms/step - loss: 0.0560 - acc: 0.9883 - val_loss: 1.7321 - val_acc: 0.7640
Epoch 26/30
120/120 [==============================] - 22s 184ms/step - loss: 0.0341 - acc: 0.9898 - val_loss: 2.0470 - val_acc: 0.7290
Epoch 27/30
120/120 [==============================] - 22s 182ms/step - loss: 0.0294 - acc: 0.9930 - val_loss: 2.0211 - val_acc: 0.7510
Epoch 28/30
120/120 [==============================] - 22s 182ms/step - loss: 0.0416 - acc: 0.9887 - val_loss: 3.0814 - val_acc: 0.7510
Epoch 29/30
120/120 [==============================] - 22s 182ms/step - loss: 0.0532 - acc: 0.9913 - val_loss: 3.2485 - val_acc: 0.7610
Epoch 30/30
120/120 [==============================] - 22s 183ms/step - loss: 0.0601 - acc: 0.9908 - val_loss: 2.5971 - val_acc: 0.7250
Let's check how it went. In a typical overfitting situation, the training accuracy quickly rises towards 1.0 while the validation accuracy stalls at a much lower level. That is also the case here: the training accuracy rises above 0.98, while the validation accuracy stays around 0.72-0.77. But still, not that bad! The model recognises cats and dogs correctly about 72 % of the time.
import matplotlib.pyplot as plt # Load plotting libraries
plt.style.use('bmh') # bmh-style is usually nice
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'r--', label='Training accuracy')
plt.plot(epochs, val_acc, 'b--', label='Validation accuracy')
plt.legend() # Matplotlib will automatically position the legend in a best possible way.
plt.figure() # This is needed to make two separate figures for loss and accuracy.
plt.plot(epochs, loss, 'r--', label='Training loss')
plt.plot(epochs, val_loss, 'b--', label='Validation loss')
plt.legend()
plt.show()


14.1.1. Augmentation and regularisation#
Let's try to improve our model. Augmentation is a common approach to “increase” the amount of data. The idea is to transform the images slightly every time they are fed to the model, so the model sees a slightly different version of each image on every pass. We are not truly creating new information, but augmentation has nevertheless proven to be an efficient way to improve results.
Image transformations are configured through the parameters of the ImageDataGenerator() function; there are many parameters that can be used to transform images. More information: keras.io/api/preprocessing/image/
datagen = ImageDataGenerator(rotation_range=40,
                             width_shift_range=0.2,
                             height_shift_range=0.2,
                             shear_range=0.2,
                             zoom_range=0.2,
                             horizontal_flip=True,
                             fill_mode='nearest')
Let’s check what kind of images we are analysing.
# Image -module to view images
from tensorflow.keras.preprocessing import image
# We pick the 16th image from the train/dogs -folder.
img_path = os.path.join(base_dir,'train','dogs',os.listdir(os.path.join(base_dir,'train','dogs'))[16])
sample_image = image.load_img(img_path, target_size=(150, 150))
Below is an example image from the original dataset: the image at index 16 in our list.
sample_image

To use ImageDataGenerator's flow() function, we need to transform our image to a NumPy array and add a batch dimension.
sample_image_np = image.img_to_array(sample_image)
sample_image_np = sample_image_np.reshape((1,) + sample_image_np.shape)
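A quick check (added here for illustration) shows that the array now has a leading batch dimension of one:
print(sample_image_np.shape)   # (1, 150, 150, 3)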
The following code transforms the image repeatedly using ImageDataGenerator() and plots eight examples. As you can see, they are slightly altered images that stay very close to the original.
fig, axs = plt.subplots(2, 4, figsize=(14, 6), squeeze=True)
i = 0
for ax, transform in zip(axs.flat, datagen.flow(sample_image_np, batch_size=1)):
    ax.imshow(image.array_to_img(transform[0]))
    i += 1
    if i % 8 == 0:
        break

Next, we define the model. Alongside augmentation, we add regularisation to the model with a dropout-layer. The Dropout layer randomly sets input units to 0 with a frequency of rate (0.5 below) at each step during training time, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/(1 - rate) such that the sum over all inputs is unchanged.
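Here is a minimal sketch of that behaviour (not part of the original notebook): a Dropout layer applied with training=True zeroes roughly half of the inputs and scales the surviving ones by 1/(1 - 0.5) = 2.
import tensorflow as tf
drop = layers.Dropout(0.5)
x = tf.ones((1, 10))
# With training=True the dropout mask is applied; expect zeros and values of 2.0
print(drop(x, training=True))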
We build our sequential model using add() calls. The only difference compared to the previous model is the dropout layer after the flatten layer (and the augmented input data).
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu',input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(256, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
The dropout layer does not change the number of parameters. It is exactly the same as in the previous model.
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_4 (Conv2D) (None, 148, 148, 32) 896
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 74, 74, 32) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 72, 72, 64) 18496
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 36, 36, 64) 0
_________________________________________________________________
conv2d_6 (Conv2D) (None, 34, 34, 128) 73856
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 17, 17, 128) 0
_________________________________________________________________
conv2d_7 (Conv2D) (None, 15, 15, 256) 295168
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 7, 7, 256) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 12544) 0
_________________________________________________________________
dropout (Dropout) (None, 12544) 0
_________________________________________________________________
dense_2 (Dense) (None, 512) 6423040
_________________________________________________________________
dense_3 (Dense) (None, 1) 513
=================================================================
Total params: 6,811,969
Trainable params: 6,811,969
Non-trainable params: 0
_________________________________________________________________
The compile-step is not changed.
model.compile(loss='binary_crossentropy',optimizer=optimizers.RMSprop(),metrics=['acc'])
We create the augmentation-enabled generators. Remember that the validation dataset should not be augmented!
train_datagen = ImageDataGenerator(rescale=1./255,
                                   rotation_range=40,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)
validation_datagen = ImageDataGenerator(rescale=1./255)
The same dataset of 3000 training images and 1000 validation images.
train_generator = train_datagen.flow_from_directory(os.path.join(base_dir,'train'),
                                                     target_size=(150, 150),
                                                     batch_size=25,
                                                     class_mode='binary')
Found 3000 images belonging to 2 classes.
validation_generator = validation_datagen.flow_from_directory(os.path.join(base_dir,'validation'),
                                                               target_size=(150, 150),
                                                               batch_size=25,
                                                               class_mode='binary')
Found 1000 images belonging to 2 classes.
Otherwise, the parameters to the model.fit() are the same as in the previous model, but we train the model a little bit longer. This is because regularisation slows down training.
history = model.fit(train_generator,
                    steps_per_epoch=120,
                    epochs=50,
                    validation_data=validation_generator,
                    validation_steps=40)
Epoch 1/50
120/120 [==============================] - 25s 208ms/step - loss: 0.7414 - acc: 0.5106 - val_loss: 0.6894 - val_acc: 0.5240
Epoch 2/50
120/120 [==============================] - 25s 208ms/step - loss: 0.6794 - acc: 0.5682 - val_loss: 0.6630 - val_acc: 0.5560
Epoch 3/50
120/120 [==============================] - 25s 206ms/step - loss: 0.6665 - acc: 0.6169 - val_loss: 0.6329 - val_acc: 0.6480
Epoch 4/50
120/120 [==============================] - 25s 206ms/step - loss: 0.6376 - acc: 0.6395 - val_loss: 0.7231 - val_acc: 0.5270
Epoch 5/50
120/120 [==============================] - 25s 206ms/step - loss: 0.6247 - acc: 0.6450 - val_loss: 0.5788 - val_acc: 0.6990
Epoch 6/50
120/120 [==============================] - 25s 206ms/step - loss: 0.6250 - acc: 0.6423 - val_loss: 0.5785 - val_acc: 0.6950
Epoch 7/50
120/120 [==============================] - 25s 207ms/step - loss: 0.6112 - acc: 0.6518 - val_loss: 0.5708 - val_acc: 0.7100
Epoch 8/50
120/120 [==============================] - 25s 208ms/step - loss: 0.5916 - acc: 0.6938 - val_loss: 0.6353 - val_acc: 0.6760
Epoch 9/50
120/120 [==============================] - 25s 207ms/step - loss: 0.5843 - acc: 0.7051 - val_loss: 0.5636 - val_acc: 0.7160
Epoch 10/50
120/120 [==============================] - 25s 207ms/step - loss: 0.5553 - acc: 0.7068 - val_loss: 0.5602 - val_acc: 0.7260
Epoch 11/50
120/120 [==============================] - 25s 207ms/step - loss: 0.5901 - acc: 0.7350 - val_loss: 0.6322 - val_acc: 0.6580
Epoch 12/50
120/120 [==============================] - 25s 206ms/step - loss: 0.5798 - acc: 0.7104 - val_loss: 0.5457 - val_acc: 0.7400
Epoch 13/50
120/120 [==============================] - 25s 206ms/step - loss: 0.5511 - acc: 0.7223 - val_loss: 0.5793 - val_acc: 0.7230
Epoch 14/50
120/120 [==============================] - 25s 207ms/step - loss: 0.5548 - acc: 0.7151 - val_loss: 0.5333 - val_acc: 0.7380
Epoch 15/50
120/120 [==============================] - 25s 207ms/step - loss: 0.5474 - acc: 0.7300 - val_loss: 0.5244 - val_acc: 0.7500
Epoch 16/50
120/120 [==============================] - 25s 206ms/step - loss: 0.5385 - acc: 0.7317 - val_loss: 0.5740 - val_acc: 0.7480
Epoch 17/50
120/120 [==============================] - 25s 206ms/step - loss: 0.5555 - acc: 0.7194 - val_loss: 0.6109 - val_acc: 0.7110
Epoch 18/50
120/120 [==============================] - 25s 206ms/step - loss: 0.5418 - acc: 0.7296 - val_loss: 0.5669 - val_acc: 0.7530
Epoch 19/50
120/120 [==============================] - 25s 207ms/step - loss: 0.5371 - acc: 0.7338 - val_loss: 0.5645 - val_acc: 0.7260
Epoch 20/50
120/120 [==============================] - 25s 208ms/step - loss: 0.5070 - acc: 0.7625 - val_loss: 0.4967 - val_acc: 0.7830
Epoch 21/50
120/120 [==============================] - 25s 206ms/step - loss: 0.5251 - acc: 0.7369 - val_loss: 0.5816 - val_acc: 0.7650
Epoch 22/50
120/120 [==============================] - 25s 206ms/step - loss: 0.5148 - acc: 0.7496 - val_loss: 0.6144 - val_acc: 0.7330
Epoch 23/50
120/120 [==============================] - 25s 207ms/step - loss: 0.5152 - acc: 0.7534 - val_loss: 0.4925 - val_acc: 0.7640
Epoch 24/50
120/120 [==============================] - 25s 206ms/step - loss: 0.4842 - acc: 0.7857 - val_loss: 0.5371 - val_acc: 0.7450
Epoch 25/50
120/120 [==============================] - 25s 207ms/step - loss: 0.5027 - acc: 0.7669 - val_loss: 0.6170 - val_acc: 0.7140
Epoch 26/50
120/120 [==============================] - 25s 208ms/step - loss: 0.5135 - acc: 0.7538 - val_loss: 0.4628 - val_acc: 0.8010
Epoch 27/50
120/120 [==============================] - 25s 206ms/step - loss: 0.4941 - acc: 0.7826 - val_loss: 0.4712 - val_acc: 0.8130
Epoch 28/50
120/120 [==============================] - 25s 206ms/step - loss: 0.4824 - acc: 0.7768 - val_loss: 0.5085 - val_acc: 0.7840
Epoch 29/50
120/120 [==============================] - 25s 208ms/step - loss: 0.4752 - acc: 0.7775 - val_loss: 0.4786 - val_acc: 0.7970
Epoch 30/50
120/120 [==============================] - 25s 207ms/step - loss: 0.4708 - acc: 0.7895 - val_loss: 0.5117 - val_acc: 0.7940
Epoch 31/50
120/120 [==============================] - 25s 208ms/step - loss: 0.4785 - acc: 0.7891 - val_loss: 0.4519 - val_acc: 0.8060
Epoch 32/50
120/120 [==============================] - 25s 206ms/step - loss: 0.4771 - acc: 0.7673 - val_loss: 0.4709 - val_acc: 0.7840
Epoch 33/50
120/120 [==============================] - 25s 208ms/step - loss: 0.4706 - acc: 0.7851 - val_loss: 0.4894 - val_acc: 0.7950
Epoch 34/50
120/120 [==============================] - 25s 206ms/step - loss: 0.4660 - acc: 0.7886 - val_loss: 0.5086 - val_acc: 0.7990
Epoch 35/50
120/120 [==============================] - 25s 208ms/step - loss: 0.4631 - acc: 0.7873 - val_loss: 0.4509 - val_acc: 0.8030
Epoch 36/50
120/120 [==============================] - 25s 208ms/step - loss: 0.4613 - acc: 0.7896 - val_loss: 0.4691 - val_acc: 0.8040
Epoch 37/50
120/120 [==============================] - 25s 207ms/step - loss: 0.4558 - acc: 0.7890 - val_loss: 0.5306 - val_acc: 0.8110
Epoch 38/50
120/120 [==============================] - 25s 208ms/step - loss: 0.4628 - acc: 0.7939 - val_loss: 0.4788 - val_acc: 0.7980
Epoch 39/50
120/120 [==============================] - 25s 208ms/step - loss: 0.4508 - acc: 0.7944 - val_loss: 0.4615 - val_acc: 0.7990
Epoch 40/50
120/120 [==============================] - 25s 210ms/step - loss: 0.4359 - acc: 0.8097 - val_loss: 0.4332 - val_acc: 0.8380
Epoch 41/50
120/120 [==============================] - 25s 208ms/step - loss: 0.4385 - acc: 0.7937 - val_loss: 0.5200 - val_acc: 0.8030
Epoch 42/50
120/120 [==============================] - 25s 206ms/step - loss: 0.4691 - acc: 0.7876 - val_loss: 0.4910 - val_acc: 0.8050
Epoch 43/50
120/120 [==============================] - 25s 207ms/step - loss: 0.4339 - acc: 0.8049 - val_loss: 0.4760 - val_acc: 0.8190
Epoch 44/50
120/120 [==============================] - 25s 206ms/step - loss: 0.4135 - acc: 0.8281 - val_loss: 0.4042 - val_acc: 0.8290
Epoch 45/50
120/120 [==============================] - 25s 207ms/step - loss: 0.4272 - acc: 0.8048 - val_loss: 0.5202 - val_acc: 0.7600
Epoch 46/50
120/120 [==============================] - 25s 207ms/step - loss: 0.4414 - acc: 0.8299 - val_loss: 0.5741 - val_acc: 0.7900
Epoch 47/50
120/120 [==============================] - 25s 206ms/step - loss: 0.4326 - acc: 0.7954 - val_loss: 0.5968 - val_acc: 0.7940
Epoch 48/50
120/120 [==============================] - 25s 206ms/step - loss: 0.4752 - acc: 0.8095 - val_loss: 0.4758 - val_acc: 0.8170
Epoch 49/50
120/120 [==============================] - 25s 208ms/step - loss: 0.4277 - acc: 0.8256 - val_loss: 0.4358 - val_acc: 0.8390
Epoch 50/50
120/120 [==============================] - 25s 208ms/step - loss: 0.4181 - acc: 0.8122 - val_loss: 0.4724 - val_acc: 0.8320
As you can see from the following figures, the overfitting has almost disappeared: the training and validation accuracies stay at approximately the same level throughout the training. The performance is also somewhat better; we now reach a validation accuracy of around 0.83.
plt.style.use('bmh')
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'r--', label='Training acc')
plt.plot(epochs, val_acc, 'b--', label='Validation acc')
plt.legend()
plt.figure()
plt.plot(epochs, loss, 'r--', label='Training loss')
plt.plot(epochs, val_loss, 'b--', label='Validation loss')
plt.legend()
plt.show()


14.2. Pre-trained model#
The next thing we could try is a pre-trained model, whose parameters have already been optimised using some other dataset. CNNs for computer vision are usually pre-trained on ImageNet data (http://www.image-net.org/), a vast collection of labelled images.
We add our own layers on top of the pre-trained architecture. As our pre-trained model, we use VGG16.
VGG16 is included in the keras.applications module.
from tensorflow.keras.applications import VGG16
When we load the VGG16 model, we set weights='imagenet' to get the pre-trained parameter weights. include_top=False drops the fully connected classifier at the top of the network, including its 1000-neuron output layer; we want our own output layer with only one neuron (the dog/cat prediction).
pretrained_base = VGG16(weights='imagenet',include_top=False,input_shape=(150, 150, 3))
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58892288/58889256 [==============================] - 17s 0us/step
VGG16 has 14.7 million parameters without the top layers. It also stacks two or three convolutional layers in a row, whereas our previous models alternated between a convolutional layer and a max-pooling layer.
pretrained_base.summary()
Model: "vgg16"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 150, 150, 3)] 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 150, 150, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 150, 150, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 75, 75, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 75, 75, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 75, 75, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 37, 37, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 37, 37, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 37, 37, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 37, 37, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 18, 18, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 18, 18, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 18, 18, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 18, 18, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 9, 9, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 9, 9, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 9, 9, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 9, 9, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 4, 4, 512) 0
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
model = models.Sequential()
When we construct the model, we add the pre-trained VGG16 base first. Then follow a flatten layer, a 256-neuron dense layer and a one-neuron output layer.
model.add(pretrained_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
Overall, our model has almost 17 million parameters. However, we will lock the pre-trained VGG16 base, which will decrease the number of trainable parameters significantly.
model.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
vgg16 (Functional) (None, 4, 4, 512) 14714688
_________________________________________________________________
flatten_2 (Flatten) (None, 8192) 0
_________________________________________________________________
dense_4 (Dense) (None, 256) 2097408
_________________________________________________________________
dense_5 (Dense) (None, 1) 257
=================================================================
Total params: 16,812,353
Trainable params: 16,812,353
Non-trainable params: 0
_________________________________________________________________
We want to keep the pre-trained ImageNet weights, so we lock (freeze) the weights of the VGG16 part.
pretrained_base.trainable = False
Now there are “only” about two million trainable parameters.
model.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
vgg16 (Functional) (None, 4, 4, 512) 14714688
_________________________________________________________________
flatten_2 (Flatten) (None, 8192) 0
_________________________________________________________________
dense_4 (Dense) (None, 256) 2097408
_________________________________________________________________
dense_5 (Dense) (None, 1) 257
=================================================================
Total params: 16,812,353
Trainable params: 2,097,665
Non-trainable params: 14,714,688
_________________________________________________________________
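If you want to compute these counts directly from the weight lists instead of reading them off the summary, here is a small sketch using Keras' count_params helper:
from tensorflow.keras import backend as K
trainable = sum(K.count_params(w) for w in model.trainable_weights)
frozen = sum(K.count_params(w) for w in model.non_trainable_weights)
print(trainable, frozen)   # expect 2097665 and 14714688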
Again, we use the augmentation of the training dataset.
train_datagen = ImageDataGenerator(rescale=1./255,
                                   rotation_range=40,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)
validation_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(os.path.join(base_dir,'train'),
                                                     target_size=(150, 150),
                                                     batch_size=25,
                                                     class_mode='binary')
Found 3000 images belonging to 2 classes.
validation_generator = validation_datagen.flow_from_directory(os.path.join(base_dir,'validation'),
                                                               target_size=(150, 150),
                                                               batch_size=25,
                                                               class_mode='binary')
Found 1000 images belonging to 2 classes.
Compile- and fit-steps do not have anything new.
model.compile(loss='binary_crossentropy',optimizer=optimizers.RMSprop(),metrics=['acc'])
history = model.fit(train_generator,
                    steps_per_epoch=120,
                    epochs=30,
                    validation_data=validation_generator,
                    validation_steps=40)
As you can see from the plots below, there is a small overfitting issue. The difference between the training accuracy and the validation accuracy increases slowly. However, the performance is excellent! Now our model can separate dogs from cats correctly 85 % of the time.
plt.style.use('bmh')
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'r--', label='Training acc')
plt.plot(epochs, val_acc, 'b--', label='Validation acc')
plt.legend()
plt.figure()
plt.plot(epochs, loss, 'r--', label='Training loss')
plt.plot(epochs, val_loss, 'b--', label='Validation loss')
plt.legend()
plt.show()
14.3. Fine tuning#
There is still (at least) one thing we can do to improve our model: we can fine-tune the pre-trained VGG16 model by unfreezing part of its weights. As our VGG16 is currently optimised for ImageNet data, its weights encode features that are useful for many different types of images. By unfreezing the last few layers, we allow the model to adapt those weights to features that are useful for separating dogs from cats in images.
First, we need to make our VGG16 model trainable again.
pretrained_base.trainable = True
Here is the summary of the VGG16 model again.
pretrained_base.summary()
Let's lock everything else but leave the layers of block5 to be fine-tuned by our dog/cat images. The following code goes through the VGG16 structure, freezes everything up to and including 'block4_pool' and leaves the layers after it trainable.
set_trainable = False
for layer in pretrained_base.layers:
    if layer.name == 'block5_conv1':
        set_trainable = True
    if set_trainable:
        layer.trainable = True
    else:
        layer.trainable = False
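To verify which layers were actually unfrozen, you can list the trainable flag of each layer (a quick check, for illustration only):
for layer in pretrained_base.layers:
    print(layer.name, layer.trainable)
# Everything up to block4_pool should print False, the block5 layers True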
There are now over 9 million trainable parameters, which may well cause overfitting, but let's see.
model.summary()
model.compile(loss='binary_crossentropy',optimizer=optimizers.RMSprop(),metrics=['acc'])
history = model.fit(train_generator,
                    steps_per_epoch=120,
                    epochs=30,
                    validation_data=validation_generator,
                    validation_steps=40)
As you can see, overfitting starts to be an issue again. But our validation performance is outstanding! The model is correct 90 % of the time.
plt.style.use('bmh')
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'r--', label='Training acc')
plt.plot(epochs, val_acc, 'b--', label='Validation acc')
plt.legend()
plt.figure()
plt.plot(epochs, loss, 'r--', label='Training loss')
plt.plot(epochs, val_loss, 'b--', label='Validation loss')
plt.legend()
plt.show()
As the last step, let’s check the model’s performance with the test set.
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(os.path.join(base_dir,'test'),
                                                   target_size=(150, 150),
                                                   batch_size=25,
                                                   class_mode='binary')
model.evaluate(test_generator)
Our model is correct 90 % of the time!
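As a final illustration (not from the original notebook), here is one way to classify a single image with the trained model. The filename below is just one example from the test folder; class 1 corresponds to dogs because flow_from_directory assigns class indices alphabetically (cats = 0, dogs = 1).
# Example file from the test set (any test image works)
img_path = os.path.join(base_dir, 'test', 'dogs', 'dog.2000.jpg')
img = image.load_img(img_path, target_size=(150, 150))
x = image.img_to_array(img) / 255.0          # same rescaling as in the generators
x = x.reshape((1,) + x.shape)                # add a batch dimension
prob = model.predict(x)[0][0]
print('dog' if prob > 0.5 else 'cat', prob)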