Categorizing Cassava Leaf Diseases (Part 3)

by Annie Phan, Drew Solomon, and Roma Coffin


Image classification using deep learning may help identify diseased plants efficiently and prevent crop loss (Makerere University AI Lab, 2020). In this post, we summarize our approaches to classifying cassava leaf diseases using deep learning. We describe our EDA, data augmentation, model selection, fine-tuning, and hyperparameter tuning. Finally, we identify possible next steps to further improve our accuracy.

In our first blog post, we introduced the Kaggle competition and the Cassava Leaf Data, went over our EDA process, and made a first attempt at classifying cassava leaf diseases. From our EDA, we determined the class balance (shown in the pie chart below) and examined leaf images by disease category. Our initial VGG16 baseline model achieved a validation accuracy of 66.3% and a training accuracy of 65.4%. In our second blog post, we improved upon that baseline by fine-tuning VGG16, ResNet, and DenseNet models. Since then, we have explored additional base models for fine-tuning (MobileNet, ResNet101V2, DenseNet201, and InceptionResNetV2), improved our image augmentation techniques, and adjusted our fine-tuning layers to limit overfitting and achieve higher accuracy. Through these approaches, we improved our validation accuracy from 66.3% to 80.0% with our best model, DenseNet201 (we achieved a private score of 77.3% and a public score of 77.4% on Kaggle). We believe further hyperparameter tuning and image augmentation may yield more accurate predictions, and we identify further areas for improvement at the end of this post. Overall, our transfer learning approaches have achieved substantially greater accuracy than our baseline model, moving us toward our goal of accurately identifying diseases in new (unseen) cassava plant images.

Cassava Disease Class Balance (Normalized): Pie Chart Produced During EDA

Data Augmentation:

Based on our exploratory data analysis (EDA) and experiments with Keras' built-in augmentation functions, we adjusted our data augmentation methods to yield better performance. A deeper understanding of the individual transformations (random rotation, flipping, contrast, brightness, saturation, and random zoom, which exposes different parts of each image and focuses on leaf patterns) let us set more appropriate parameters for each of these functions.
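To make the transformations listed above concrete, here is a minimal sketch of how such a pipeline could be assembled from Keras preprocessing layers and `tf.image` functions. The function names `build_augmenter` and `augment` are hypothetical, and every parameter value is an illustrative assumption rather than the setting used for the results in this post. The sketch also assumes a recent TensorFlow release and images scaled to [0, 1]; older releases expose the same layers under `tf.keras.layers.experimental.preprocessing`.

```python
import tensorflow as tf


def build_augmenter():
    """Geometric and contrast augmentations (all factors are assumed values)."""
    return tf.keras.Sequential([
        tf.keras.layers.RandomFlip("horizontal_and_vertical"),
        tf.keras.layers.RandomRotation(0.2),                     # up to +/- 20% of a full turn
        tf.keras.layers.RandomZoom(height_factor=(-0.3, 0.0)),   # negative values zoom in
        tf.keras.layers.RandomContrast(0.2),
    ])


def augment(images, augmenter):
    """Apply the augmenter plus brightness and saturation jitter to a float batch in [0, 1]."""
    images = augmenter(images, training=True)   # training=True switches the random layers on
    images = tf.image.random_brightness(images, max_delta=0.1)
    images = tf.image.random_saturation(images, lower=0.8, upper=1.2)
    return tf.clip_by_value(images, 0.0, 1.0)   # keep pixel values in a valid range
```

In a `tf.data` pipeline, a function like `augment` would typically be mapped over the training batches only, leaving the validation images untouched.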

Model Selection:

To boost our models’ accuracy further and to better control model complexity, the learning process, and overfitting, we continued to fine-tune the models from our last post. In addition, we experimented with new base models: MobileNet, ResNet101V2, DenseNet201, and InceptionResNetV2. We moved from ResNet50 to ResNet101V2 and from DenseNet121 to DenseNet201 because the latter are deeper variants that we expected to be more accurate, at the cost of more parameters and computation. All of our base models make use of convolutional layers, which preserve the spatial structure of images and leverage local patterns. Below, we describe how each model’s architecture benefits learning and prediction. Through trial and error, we found which fine-tuned models worked best for our particular classification task.

We chose VGG16 as the baseline model because VGG is composed of simple convolutional blocks that stack small (3x3) convolutions followed by max pooling layers. This simple structure makes it a good ‘naive’ classifier for our problem. However, VGG relies on the idea that stacking more layers leads to better representations, and without skip connections such plain deep stacks are prone to vanishing and exploding gradients.

We tried MobileNet because it uses depthwise convolutions, which filter each input channel separately. We thought this would be useful for picking up subtle differences in color features between images. The drawback is that MobileNet is a small, low-latency, low-power model designed for constrained devices, so it has less capacity than the larger networks we tried.

We tried ResNet because it introduces “identity shortcut connections” that skip one or more layers. These shortcuts reduce the problem of vanishing gradients as well as the degradation (accuracy saturation) problem endemic in deeper networks. This is particularly useful because we have seen overfitting in our baseline classifier and we have a small validation dataset.

We decided to try DenseNet because it consists of dense blocks, which concatenate each layer’s input and output along the channel dimension, and transition layers, which shrink the feature maps between blocks. This lets DenseNet grow its channels linearly through the layers of a dense block while 1x1 convolutional layers keep the computation in check. The 1x1 convolutions also mix information across channels, which we hoped would help with color features. Moreover, DenseNet uses far fewer parameters than ResNet, because every layer can reuse the features computed before it. One drawback is that DenseNet’s many connections can reduce the network’s computational and parameter efficiency and can make it more prone to overfitting.

Finally, we included InceptionResNet in our model selection because it combines the Inception structure with residual connections. In an Inception-ResNet block, convolutional filters of multiple sizes are combined with residual connections. The Inception block clusters correlated outputs to exploit sparsity, processing visual and spatial information at several scales in parallel and then aggregating the results, while the residual connections avoid the degradation problem caused by deep structures and reduce training time. However, the drawback of InceptionResNet is that non-uniform sparse matrix calculations are expensive.


To enhance our models’ performance further, we explored freezing the base models to different depths. Freezing the base models’ layers kept their pre-trained weights fixed, preserved the patterns learned on ImageNet, and limited the number of trainable parameters. Conversely, unfreezing the top layers of the base models increased our models’ complexity and their capacity to learn patterns specific to our task. We therefore aimed to strike a balance between reusing learned patterns and learning new ones by unfreezing different layers. While we originally froze the entire base model (for our second blog post), we ultimately unfroze the last 1–2 blocks to attain greater accuracy while avoiding overfitting. In addition, we incorporated global average pooling, which averages each channel’s feature map into a single value; we chose it to highlight color, a key factor in predicting leaf diseases. We also added a dense layer with 64 neurons and a normal kernel initializer, and integrated regularization techniques such as dropout, L2 regularization, and batch normalization into our model. Finally, we used a ReLU activation because it helps mitigate vanishing gradients.
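The head and freezing strategy described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook code behind our results: the function name `build_finetune_model`, the `conv5_` name prefix used to pick the last block, the `he_normal` initializer (one possible reading of "normal" initialization), the L2 strength, the dropout rate, and the five-class output are all assumptions.

```python
import tensorflow as tf

layers = tf.keras.layers
regularizers = tf.keras.regularizers


def build_finetune_model(base, num_classes=5, unfreeze_prefixes=("conv5_",),
                         l2_strength=1e-4, dropout_rate=0.3):
    """Attach a small classification head to `base` and unfreeze only its last block(s)."""
    # Freeze every base layer except those whose names start with one of the prefixes.
    for layer in base.layers:
        layer.trainable = layer.name.startswith(tuple(unfreeze_prefixes))

    # Global average pooling collapses each channel's feature map to one value.
    x = layers.GlobalAveragePooling2D()(base.output)
    x = layers.Dense(64, kernel_initializer="he_normal",
                     kernel_regularizer=regularizers.l2(l2_strength))(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Dropout(dropout_rate)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs=base.input, outputs=outputs)
```

With a pre-trained base such as `tf.keras.applications.DenseNet201(include_top=False, weights="imagenet", input_shape=(224, 224, 3))`, whose last dense block uses layer names beginning with `conv5_`, calling `build_finetune_model(base)` would leave only that block and the new head trainable. Passing additional prefixes unfreezes more of the network.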

We were initially surprised when our implementation of batch normalization resulted in worse performance. Batch normalization standardizes the inputs to a layer over each mini-batch, so we expected it to stabilize learning and thereby reduce the number of epochs needed for training. We realized that the poor performance was not due to batch normalization itself but to the model structure and how we had tuned it overall: we had added two dense layers with too much regularization, had not frozen any layers, and had not used a learning rate schedule. After adjusting the architecture and applying an appropriate fine-tuning methodology, batch normalization behaved as expected.
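For readers who want to see what "standardizing over a mini-batch" means, here is a small plain-Python sketch of the training-time computation for a single feature. It is only an illustration: the function name `batch_norm` is hypothetical, `gamma` and `beta` stand in for the learned scale and shift, and the running statistics that a real layer keeps for inference are left out.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize one feature across a mini-batch, then scale and shift it."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    # eps guards against division by zero when the batch has no spread.
    return [gamma * (x - mean) / math.sqrt(variance + eps) + beta for x in batch]
```

After this step the feature has roughly zero mean and unit variance within the batch, which is what keeps the inputs to the next layer on a stable scale as the earlier weights change.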

Hyperparameter Selection:

When it came to adjusting hyperparameters, we increased the maximum number of epochs from 20 to 100 to allow more passes over the training data, combined with early stopping. With early stopping (with a patience of 10 epochs), the models stopped training once the validation accuracy had not improved for 10 consecutive epochs, thus limiting overfitting. We also increased the batch size from 64 to 128 to speed up training and to reduce the variance of the gradient estimates. To achieve a suitable learning rate, we used a reduce-on-plateau learning rate schedule (ReduceLROnPlateau) with a factor of 0.1 and started from a higher learning rate (0.005, up from 0.0001). This schedule reduced the learning rate whenever optimization stagnated, helping convergence. We also used the Adam optimizer, which adapts the step size of each parameter during training using running estimates of the first and second moments of the mini-batch gradients.
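To show how early stopping and a reduce-on-plateau schedule interact, the following plain-Python sketch replays a list of per-epoch validation accuracies through both rules. It is a simplified simulation, not the Keras callbacks themselves: the function name `simulate_training_control` is hypothetical, the learning-rate patience of 3 epochs and the minimum rate are assumptions, and for simplicity both rules watch validation accuracy here.

```python
def simulate_training_control(val_accuracies, initial_lr=0.005, factor=0.1,
                              lr_patience=3, stop_patience=10, min_lr=1e-7):
    """Return (epochs_run, learning_rates), where learning_rates[i] was used in epoch i."""
    lr = initial_lr
    best = float("-inf")
    epochs_since_best = 0     # drives early stopping
    epochs_since_lr_cut = 0   # drives the learning-rate reduction
    learning_rates = []
    for epoch, accuracy in enumerate(val_accuracies):
        learning_rates.append(lr)
        if accuracy > best:
            best = accuracy
            epochs_since_best = 0
            epochs_since_lr_cut = 0
        else:
            epochs_since_best += 1
            epochs_since_lr_cut += 1
        if epochs_since_best >= stop_patience:
            return epoch + 1, learning_rates           # stop: no improvement for too long
        if epochs_since_lr_cut >= lr_patience:
            lr = max(lr * factor, min_lr)              # plateau: shrink the step size
            epochs_since_lr_cut = 0
    return len(val_accuracies), learning_rates
```

For example, a history that improves for three epochs and then stalls sees the learning rate cut by a factor of 10 every three stalled epochs, and training ends ten epochs after the last improvement, well before the 100-epoch cap.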

Model Results:

Across our four models, we achieved validation accuracies ranging from 76.4% to 80.0%. Our best performing model was DenseNet201, with a validation accuracy of 80.0% (we achieved a private score of 77.3% and a public score of 77.4% on Kaggle), followed by ResNet101V2 at 79.5%, InceptionResNetV2 at 78.2%, and finally MobileNet at 76.4%. DenseNet201 also needed the fewest training epochs, finishing after 31 epochs, followed by ResNet101V2 at 32 epochs, MobileNet at 47 epochs, and InceptionResNetV2 at 53 epochs. This is consistent with our discussion above of the pros and cons of each model.

Best Model:

Below are the fine-tuning code and the structure of our best model, the fine-tuned DenseNet201.

Fine-tuning code for Fine-tuned DenseNet Model
Model Structure for Fine-tuned DenseNet Model

Overall, DenseNet consistently performed the best, followed very closely by ResNet, then InceptionResNet, and finally MobileNet. With DenseNet we achieved our highest training accuracy, 83.7%, and a validation accuracy of 80.0% (we achieved a private score of 77.3% and a public score of 77.4% on Kaggle), over 31 epochs. The validation accuracy and loss curves fluctuate fairly sharply because we have a small number of samples in our validation dataset and because the ReduceLROnPlateau schedule changes the learning rate in steps. While training without ReduceLROnPlateau was smoother, the resulting models did not perform as well. Otherwise, the overall trend in training went as expected: the accuracy curves increase quickly in the first few epochs and then ‘flatten’ out towards the end, and the loss curves decrease quickly in the first few epochs and then ‘flatten’ out towards the end. Moreover, compared to the accuracy and loss curves of the other models, the DenseNet curves are relatively smooth.

Final Validation Accuracy for Fine-tuned DenseNet Model
Epochs vs Accuracy Graph for Fine-tuned DenseNet Model
Epochs vs Loss Graph for Fine-tuned DenseNet Model

Potential Next Steps:

Reflecting on our results, we believe we could improve further by exploring splitting and cross-validation techniques for imbalanced data, ensembling, and customized image augmentation. Given the class imbalance in the cassava leaf diseases, we think it could help to try stratified k-fold cross-validation to ensure consistent class balances across the train and validation data, and to achieve more robust model evaluation. Additionally, we may further improve our models’ performance by trying ensembling methods, that is, weighting predictions across different fine-tuned models. Lastly, although we have made strides in the right direction with data augmentation, we wish we had more time to customize it further, for example by balancing out brightness (instead of randomly changing it) and segmenting out the background. Since the leaf images vary in sunlight levels, correcting for sunlight may make disease patterns more visible, as might filtering out the background. While there is always room for improvement, we are pleased with our progress towards classifying cassava leaf diseases, and with our (deep) learning along the way.
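Two of these next steps, stratified k-fold splitting and weighted ensembling, are simple enough to sketch in plain Python. The code below is a hedged illustration of the ideas rather than something we have run on the competition data: the function names `stratified_kfold` and `weighted_ensemble` are hypothetical, and in practice a library routine such as scikit-learn's `StratifiedKFold` would be the more usual choice for the splitting step.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Return k (train_indices, val_indices) pairs with similar class proportions per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class, key=str):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class round-robin across folds; the offset spreads leftovers evenly.
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)
        offset += len(indices)

    splits = []
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(index for j, fold in enumerate(folds) if j != i for index in fold)
        splits.append((train, val))
    return splits


def weighted_ensemble(model_probabilities, weights):
    """Blend per-model class probabilities and return the predicted class for each sample."""
    total_weight = sum(weights)
    num_samples = len(model_probabilities[0])
    num_classes = len(model_probabilities[0][0])
    predictions = []
    for s in range(num_samples):
        blended = [
            sum(w * probs[s][c] for w, probs in zip(weights, model_probabilities)) / total_weight
            for c in range(num_classes)
        ]
        predictions.append(max(range(num_classes), key=blended.__getitem__))
    return predictions
```

The ensemble weights could, for instance, be set in proportion to each model's validation accuracy, so that a stronger model such as DenseNet201 has more say than MobileNet.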

Team Name: RAD

Kaggle Submission:

Google Colab Link:

Github Repo:



