Categorizing Cassava Leaf Diseases (Part 2)
by Annie Phan, Drew Solomon, and Roma Coffin
Continuing from our previous post, our main objective is now to improve on the results of our baseline model. In our last blog post we went over our EDA process as well as our initial attempt at classifying cassava leaf diseases. With our initial VGG16 baseline model, we achieved a validation accuracy of 66.3% and a training accuracy of 65.4%. (Using a simple majority classifier, the baseline was 61.5%.) Here, we improve on that baseline by fine-tuning VGG16, ResNet, and DenseNet models.
For this project, we used TFRecords (from TensorFlow) to load, preprocess, store, and augment the image data before training CNNs. Since this is a fairly large dataset, a binary file format for storing the data helps both training time and model performance. We will not go into much more detail about TFRecords, since we focus on the three model approaches we attempted; please refer to the main source for our code here.
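As a rough sketch of how such records work, a minimal TFRecord round trip is shown below. Note that the feature names "image" and "target" are assumptions for illustration, not necessarily the competition files' actual schema.

```python
# Minimal TFRecord round trip: serialize one record, write it to a
# binary file, then parse it back with tf.data.
# The feature names "image"/"target" are illustrative assumptions.
import os
import tempfile
import tensorflow as tf

def serialize_example(image_bytes, label):
    # Pack raw image bytes and an integer label into one tf.train.Example.
    feature = {
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        "target": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()

def parse_example(serialized):
    # Inverse of serialize_example: recover the two features.
    spec = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "target": tf.io.FixedLenFeature([], tf.int64),
    }
    return tf.io.parse_single_example(serialized, spec)

# Round-trip a single fake record through a temporary TFRecord file.
path = os.path.join(tempfile.mkdtemp(), "toy.tfrec")
with tf.io.TFRecordWriter(path) as writer:
    writer.write(serialize_example(b"\xff\xd8fake-jpeg", 3))

dataset = tf.data.TFRecordDataset(path).map(parse_example)
record = next(iter(dataset))
print(int(record["target"]))  # prints 3, the label we stored
```

In practice one record per image lets `tf.data` stream and decode batches in parallel rather than opening thousands of individual image files.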
Although a fairly large dataset was provided, we leveraged data augmentation to further diversify the data. Data augmentation creates more images to train on, reduces dependence on specific training examples, and thus mitigates overfitting. We used augmentation methods such as random rotation, translation, flipping, contrast, random zoom (to get different parts of each image and focus on leaf patterns), and resizing (to fit the fine-tuned models' expected input size). In particular, our augmentation is based on our EDA-driven observations: factors such as orientation and angle matter little for our predictions, but factors such as color (e.g. yellow) and patterns on the leaves likely do.
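The steps above can be sketched with Keras preprocessing layers; the factors and target size below are illustrative placeholders, not our tuned values.

```python
# Sketch of an augmentation pipeline with Keras preprocessing layers.
# Factors (0.2, 0.1, ...) and IMG_SIZE are illustrative assumptions.
import tensorflow as tf

IMG_SIZE = 224  # resize target for the fine-tuned backbones

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),  # orientation doesn't matter for leaves
    tf.keras.layers.RandomRotation(0.2),
    tf.keras.layers.RandomTranslation(0.1, 0.1),
    tf.keras.layers.RandomContrast(0.2),
    tf.keras.layers.RandomZoom(0.2),                 # focus on different leaf regions
    tf.keras.layers.Resizing(IMG_SIZE, IMG_SIZE),    # fit the backbone's input size
])

batch = tf.random.uniform((4, 600, 800, 3))  # stand-in batch of leaf images
out = augment(batch, training=True)          # training=True enables the randomness
print(out.shape)  # (4, 224, 224, 3)
```

Because these are layers, the same pipeline can be applied inside a `tf.data` map or as the first layers of the model itself; the random transforms are active only when `training=True`, so validation images pass through unchanged except for resizing.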
To boost our models’ accuracy, we used fine-tuning, a popular technique in transfer learning. By leveraging a model that has already been trained (on lots of data) for one task and fine-tuning it to perform a similar task on a target dataset, we can enhance performance. Here, we tried VGG16, ResNet, and DenseNet pre-trained models as base layers for our fine-tuned models, building on their existing ImageNet training to detect image patterns. We froze the base models’ layers to keep their weights fixed, adjusting only our added layers; this reduces the total number of trainable parameters, saving computation time. We added layers to train on our specific task (predicting cassava leaf diseases) and to mitigate overfitting. For example, we included global average pooling to average the values in each channel, since color is very important for predicting leaf diseases, which appear more yellow in our EDA. We also included dropout and weight decay (L2 regularization) to prevent overfitting.
We adjusted our models’ hyperparameters to control both model complexity and the learning process. To control overfitting, we adjusted the weight for L2 regularization (which penalizes larger weights and biases) and the dropout rate (which controls the probability of dropping the output of each node in a hidden layer). For our model structure, we chose the number of layers and the number of neurons in each dense layer. For the training hyperparameters, we adjusted the number of epochs (20), batch size (64), and initial learning rate (0.0001).
We also used the Adam optimizer to adaptively adjust the learning rate during training using estimates of the first and second moments of the gradients. We included early stopping, whereby the model stops training if the validation accuracy stops improving. Our models’ loss was sparse categorical cross-entropy, given our categorical labels in integer format. We also measured training and validation accuracy.
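A minimal sketch of this training configuration is below; the patience value is an illustrative choice, not something we stated above.

```python
# Sketch of the training configuration: Adam with the initial learning
# rate above, sparse categorical cross-entropy on integer labels, and
# early stopping on validation accuracy (patience=3 is illustrative).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(5, activation="softmax"),  # stand-in for the fine-tuned model
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="sparse_categorical_crossentropy",   # labels are integers 0-4, not one-hot
    metrics=["accuracy"],
)

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy", patience=3, restore_best_weights=True
)

# In training we would then call something like:
# model.fit(train_ds, validation_data=val_ds, epochs=20, callbacks=[early_stop])
```

`restore_best_weights=True` means the model keeps the weights from its best validation epoch rather than the last one, which matters when accuracy dips late in training.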
Although three approaches were used (VGG16, ResNet, and DenseNet), example code for VGG16 fine-tuning is shown below.
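The VGG16 fine-tuning structure can be sketched as follows. We pass `weights=None` here so the sketch runs without downloading anything; in practice `weights="imagenet"` loads the pre-trained features, and the L2 weight of 1e-4 is an illustrative value.

```python
# Sketch of the fine-tuned VGG16: frozen base, global average pooling,
# two dense blocks (Dense + L2, batch norm, dropout 0.5, ReLU), and a
# 5-way softmax head. weights=None keeps the sketch offline; use
# weights="imagenet" for the actual pre-trained features.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

base = tf.keras.applications.VGG16(
    weights=None, include_top=False, input_shape=(224, 224, 3)
)
base.trainable = False  # freeze the pre-trained weights

def dense_block(x, units):
    # Dense with L2 weight decay, then batch norm, dropout, and ReLU.
    x = layers.Dense(units, kernel_regularizer=regularizers.l2(1e-4))(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.5)(x)
    return layers.Activation("relu")(x)

inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)          # keep batch-norm stats frozen too
x = layers.GlobalAveragePooling2D()(x)    # average each channel to a scalar
x = dense_block(x, 128)
x = dense_block(x, 64)
outputs = layers.Dense(5, activation="softmax")(x)  # 5 disease classes
model = tf.keras.Model(inputs, outputs)
```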
As shown above, we used a GlobalAveragePooling layer to average each channel of the base model’s output, turning it into scalar inputs. We then added two blocks, each consisting of a dense layer with L2 regularization, a batch normalization layer (normalizing the inputs and stabilizing training), a dropout layer with a rate of 0.5, and a ReLU activation. The first block’s dense layer has 128 neurons and the second has 64. The final output layer has 5 neurons, one for each class, with a softmax activation to convert the outputs to class probabilities. We used a similar fine-tuning structure across all three models.
VGG16 Model Results:
Please refer to the fine-tuned VGG16 model accuracy graph below for additional context. From the fine-tuned VGG16, we achieved a training accuracy of 66.5% and a validation accuracy of 65.3%. Although the VGG16 validation score is quite close to the baseline model’s, we still prefer the VGG16: the graph shows appropriate behavior over the 20 epochs, with the training accuracy rising and then leveling off. Additionally, comparing the training and validation accuracies, we do not see overfitting. Our next step is to think more about fine-tuning and about which other layers to unfreeze to help boost the accuracy score.
ResNet Model Results:
From ResNet, we achieved a training accuracy of 62.1% and a validation accuracy of 46.0% over 20 epochs. ResNet uses residual blocks, which let inputs propagate more easily through the network via shortcut (residual) connections. It is interesting that, even with early stopping, the final validation accuracy dipped so low.
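The residual idea can be illustrated with a minimal block (illustrative, not ResNet50's exact block): the input is added element-wise to the block's output, giving gradients a shortcut path.

```python
# Toy residual block: two convolutions plus an identity shortcut.
# Filter count and kernel size are illustrative, not ResNet50's.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([shortcut, y])   # element-wise addition: shapes must match
    return layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(32, 32, 16))
outputs = residual_block(inputs, 16)
model = tf.keras.Model(inputs, outputs)
print(model.output_shape)  # (None, 32, 32, 16): addition preserves the shape
```

Because the shortcut is an identity, the block only has to learn a residual correction to its input, which is what makes very deep ResNets trainable.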
DenseNet Model Results:
The next model we tried was DenseNet (see the structure image above for more details). In contrast to ResNet, where inputs and outputs are added element-wise, DenseNet concatenates inputs and outputs along the channel dimension. DenseNet is mainly made up of dense blocks and transition layers. With DenseNet we achieved our highest training accuracy, 69.4%, and a validation accuracy of 69.8%, over 20 epochs.
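The channel-wise concatenation can be illustrated with a toy dense block; the growth rate and depth below are illustrative, not DenseNet121's actual configuration.

```python
# Toy DenseNet-style block: each layer's output is concatenated onto its
# input along the channel axis, so the channel count grows layer by layer.
import tensorflow as tf
from tensorflow.keras import layers

def densenet_block(x, growth_rate, num_layers):
    for _ in range(num_layers):
        y = layers.Conv2D(growth_rate, 3, padding="same", activation="relu")(x)
        x = layers.Concatenate(axis=-1)([x, y])  # stack on the channel dimension
    return x

inputs = tf.keras.Input(shape=(32, 32, 16))
outputs = densenet_block(inputs, growth_rate=8, num_layers=3)
model = tf.keras.Model(inputs, outputs)
print(model.output_shape)  # channels grow: 16 + 3 * 8 = 40
```

This is why DenseNet needs transition layers between dense blocks: they use 1x1 convolutions and pooling to shrink the accumulated channels and spatial size back down.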
Our next step is to continue tuning and experimenting with our best model (DenseNet) and to try other base architectures for fine-tuning, with the goal of improving our prediction accuracy. We will further adjust the hyperparameters and refine our model structure to push performance higher. We plan to explore modifying regularization techniques such as our dropout methodology and to refine our current image data augmentation to improve the models’ performance and generalizability. Additionally, we hope to unfreeze some of the last blocks inside the base models for more learning capacity and to try different CNN architectures (e.g. MobileNet, Inception).
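Unfreezing the last block can be sketched as follows for VGG16, whose final convolutional block is named "block5" in Keras. Again, `weights=None` keeps the sketch offline; in practice the ImageNet weights would be loaded, and a lower learning rate is typically used once pre-trained layers are trainable.

```python
# Sketch of partial unfreezing: make only VGG16's last conv block
# ("block5_*" layers) trainable, keeping the earlier blocks frozen.
# weights=None avoids a download; use weights="imagenet" in practice.
import tensorflow as tf

base = tf.keras.applications.VGG16(weights=None, include_top=False)
base.trainable = True
for layer in base.layers:
    # Freeze everything except the final convolutional block.
    layer.trainable = layer.name.startswith("block5")

trainable = [l.name for l in base.layers if l.trainable]
print(trainable)  # the block5 layers only
```

Unfreezing late blocks is the usual choice because early convolutional layers capture generic edges and textures, while later ones encode the task-specific patterns we want to adapt.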
Team Name: RAD
Baseline Kaggle Submission: https://www.kaggle.com/romacoffin/notebookbdbca07efc