Dog Breed Classification using CNN

Feb 27, 2022

This project is part of Udacity’s Data Scientist program. Given an image of a dog, the goal is to identify its breed. Before classifying the breed, we first need to confirm that the image actually contains a dog.

Project GitHub Link: MrYelameli/udacity_nanodegree_dog_project: This is a part of Udacity’s Nanodegree (github.com)

Problem Introduction

In this project, we aim to classify dog images into 133 different breeds. This is a challenging task because some breeds differ only slightly in appearance. One such example is shown below.

Strategy to address this problem

We are going to implement this in three steps, followed by a final app.

Detect the Dogs
Create a CNN to classify the Dog breeds (from scratch)
Create a CNN to classify the Dog breeds (using Transfer learning)
Finally, create an app that distinguishes dogs from non-dog entities and reports the breed of the dog.

Detect the Dogs

To detect dogs, we use a pre-trained VGG-16 model that has been trained on ImageNet, a very large and popular dataset used for image classification and other vision tasks. ImageNet contains over 10 million URLs, each linking to an image containing an object from one of 1000 categories. The code for this step is in the project GitHub repository linked above.
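A pre-trained ImageNet classifier can double as a dog detector because the dog breeds occupy one contiguous block of the 1000 ImageNet class indices (151 to 268 inclusive in the standard ordering). The sketch below shows only that final check, assuming the predicted class index has already been obtained from VGG-16. The function name `is_dog_index` is hypothetical and is not taken from the project code.

```python
# Dog breeds occupy indices 151..268 (inclusive) in the standard
# 1000-class ImageNet ordering.
DOG_INDEX_FIRST = 151
DOG_INDEX_LAST = 268


def is_dog_index(predicted_index: int) -> bool:
    """Return True if an ImageNet class index falls in the dog-breed range."""
    return DOG_INDEX_FIRST <= predicted_index <= DOG_INDEX_LAST


if __name__ == "__main__":
    print(is_dog_index(207))  # an index inside the dog range
    print(is_dog_index(281))  # an index outside the dog range
```

In a full detector, the index would come from taking the argmax over the model's 1000 output scores for a pre-processed image.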

Classification metrics

In a classification task the outputs are discrete, so we need metrics that compare the predicted classes against the true classes. Some common classification metrics are listed below.

Accuracy
Precision and Recall
F1-score

Accuracy is defined as the ratio of the number of correctly classified objects to the total number of objects. It is the simplest classification metric, and it is the one we use to evaluate the models in this project.

Precision is defined as the ratio of true positives to all predicted positives.

Recall is defined as the ratio of true positives to all positives in the ground truth.

F1-score is the harmonic mean of precision and recall; it is high only when both precision and recall are high.
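The definitions above can be written out in a few lines of plain Python. This is a minimal sketch, not the project's evaluation code: the function names and the toy labels are made up for illustration, and precision, recall and F1 are computed for a single breed treated as the positive class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 with `positive` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels, invented for this example.
    y_true = ["pug", "pug", "beagle", "beagle", "boxer"]
    y_pred = ["pug", "beagle", "beagle", "beagle", "pug"]
    print(accuracy(y_true, y_pred))
    print(precision_recall_f1(y_true, y_pred, positive="beagle"))
```

With 133 breeds, per-class precision and recall would normally be averaged across classes; accuracy needs no such averaging, which is part of why it is the simplest metric to report.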

Data Exploration

There are 8351 dog images in total, spread across 133 categories. The dataset is divided into training, test and validation sets with 6680, 836 and 835 images respectively. We want our algorithm to learn a representation of the image that is invariant to changes such as size and rotation, so we apply image transformations to the dataset.
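As a quick sanity check on these numbers, the three split sizes should add up to the total, and their shares work out to roughly an 80/10/10 split. A small sketch of that arithmetic:

```python
# Split sizes as stated in the text.
splits = {"train": 6680, "test": 836, "valid": 835}

total = sum(splits.values())
assert total == 8351  # matches the stated total number of images

# Share of the whole dataset held by each split.
fractions = {name: count / total for name, count in splits.items()}

if __name__ == "__main__":
    for name, fraction in fractions.items():
        print(f"{name}: {fraction:.1%}")
```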

Statistical description of whole dataset.

Statistical description of dog breed’s dataset.

It is also important to look at a graph of how many dog images there are in each category.

Modelling

Data pre-processing

Image augmentation is an important step in deep learning projects for computer vision. The algorithm should be able to detect the object irrespective of its size or rotation in the image.

Image augmentation

Step 1 Create a CNN to classify the Dog breeds (from scratch)

Now that we can detect a dog in the image using the pre-trained VGG-16 model, as explained above, the next step is to identify the breed of the detected dog. To do this, we first build a deep learning model from scratch. The model architecture used in this project is presented below.

We use the CrossEntropyLoss function and the Adam optimizer. With this combination of model architecture, loss function and optimizer, the model reaches 12% accuracy (the target was more than 10%).
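To put these numbers in context, it helps to see what cross-entropy computes and what a model that guesses at random would score. The sketch below re-implements the per-example loss in plain Python for illustration (PyTorch's `CrossEntropyLoss` applies the same log-softmax and negative log-likelihood, averaged over the batch by default). With 133 breeds, random guessing gives an accuracy of about 0.75%, so both the 10% target and the 12% result are well above chance.

```python
import math


def cross_entropy(logits, target_index):
    """Cross-entropy for one example: -log(softmax(logits)[target_index])."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(
        sum(math.exp(z - max_logit) for z in logits)
    )
    return log_sum_exp - logits[target_index]


NUM_BREEDS = 133

# A model that knows nothing assigns the same score to every breed.
uniform_logits = [0.0] * NUM_BREEDS
chance_loss = cross_entropy(uniform_logits, target_index=0)  # equals ln(133)
chance_accuracy = 1 / NUM_BREEDS

if __name__ == "__main__":
    print(f"loss at chance: {chance_loss:.2f}")
    print(f"accuracy at chance: {chance_accuracy:.2%}")
```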

Step 2 Create a CNN to classify the Dog breeds (using Transfer learning)

Next, we use a pre-trained model and a transfer learning approach to classify the breed of the dog. Here we use ResNet50 as the pre-trained model. ResNet50 has already been trained on a large dataset covering many different kinds of objects, so its layers have already learned useful patterns; we therefore freeze them and reuse what they have learned for our own dataset. Only the last layer, the fully connected layer, is replaced so that it predicts 133 classes. As before, we use the CrossEntropyLoss function and the Adam optimizer.
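The freeze-and-replace step can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: `prepare_for_transfer`, `make_optimizer` and `TinyBackbone` are hypothetical names, the learning rate is a placeholder, and a small stand-in network is used in place of ResNet50 so that the example runs without downloading weights. The stand-in exposes its final layer as `fc`, which is also the attribute name of the final layer in torchvision's ResNet50.

```python
import torch
from torch import nn


def prepare_for_transfer(backbone: nn.Module, num_classes: int) -> nn.Module:
    """Freeze a pre-trained backbone and replace its final `fc` layer."""
    for param in backbone.parameters():
        param.requires_grad = False  # keep the pre-trained weights fixed
    in_features = backbone.fc.in_features
    # A freshly created layer has requires_grad=True, so only it is trained.
    backbone.fc = nn.Linear(in_features, num_classes)
    return backbone


def make_optimizer(model: nn.Module, lr: float = 1e-3) -> torch.optim.Optimizer:
    """Adam over the trainable (unfrozen) parameters only; lr is assumed."""
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=lr)


class TinyBackbone(nn.Module):
    """Stand-in network with an `fc` head, used here instead of ResNet50."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 8 * 8, 16), nn.ReLU()
        )
        self.fc = nn.Linear(16, 1000)

    def forward(self, x):
        return self.fc(self.features(x))


if __name__ == "__main__":
    model = prepare_for_transfer(TinyBackbone(), num_classes=133)
    optimizer = make_optimizer(model)
    criterion = nn.CrossEntropyLoss()

    # One training step on random data, just to show the pieces fit together.
    logits = model(torch.randn(4, 3, 8, 8))
    loss = criterion(logits, torch.tensor([0, 1, 2, 3]))
    loss.backward()
    optimizer.step()
    print(logits.shape)
```

With torchvision installed, the same function could be applied to a ResNet50 loaded with pre-trained weights. Passing only the unfrozen parameters to Adam keeps the optimizer state small and makes it explicit which layer is being trained.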

Fig. Training and validation loss over each epoch

This combination of model architecture, loss function, optimizer and learning rate gives 84% accuracy on the test dataset.

Results

Finally, we wrote an algorithm that can be used as an app to detect a dog and classify its breed. Some example outputs are shown below.

In fig. 3 above, we can see that the algorithm easily differentiates between dogs and humans and also successfully identifies the breed of the dog. If there is neither a dog nor a human in the image, it shows an error message.
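The decision logic described here can be sketched as a small function that takes the detectors and the breed classifier as arguments. The function name `run_app`, the message wording and the stub detectors are all hypothetical; in the real app the three callables would wrap the VGG-16 dog detector, a human detector and the transfer learning model.

```python
from typing import Callable


def run_app(
    image_path: str,
    dog_detector: Callable[[str], bool],
    human_detector: Callable[[str], bool],
    breed_predictor: Callable[[str], str],
) -> str:
    """Describe an image as a dog (with breed), a human, or neither."""
    if dog_detector(image_path):
        return f"Dog detected. Predicted breed: {breed_predictor(image_path)}"
    if human_detector(image_path):
        return "Human detected."
    return "Error: neither a dog nor a human was detected in the image."


if __name__ == "__main__":
    # Stub detectors that only look at the file name, for demonstration.
    def is_dog(path: str) -> bool:
        return "dog" in path

    def is_human(path: str) -> bool:
        return "human" in path

    def predict_breed(path: str) -> str:
        return "Labrador retriever"

    for path in ["dog_01.jpg", "human_01.jpg", "car_01.jpg"]:
        print(path, "->", run_app(path, is_dog, is_human, predict_breed))
```

Passing the models in as callables keeps the branching logic easy to test without loading any network weights.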

Conclusion

In this project, we addressed the problem with two approaches: first we built a CNN from scratch to classify the images, and then we used transfer learning. The second approach gave much better accuracy than the first (84% versus 12%). For transfer learning, we used a pre-trained ResNet50 model and replaced its last layer so that it could be re-trained on the new dataset.

Further Improvements

It would be interesting to improve the model that was built from scratch. One way to do this is to increase the amount of relevant training data, since deep learning algorithms generally perform better as the dataset grows.