Global Journal of Computer Science and Technology, G: Interdisciplinary, Volume 23 Issue 1

A Combination of Data Augmentation Techniques for Mango Leaf Diseases Classification

Demba Faye α, Idy Diop σ, Nalla Mbaye ρ & Doudou Dione Ѡ

Abstract- Mango is one of the most traded fruits in the world. However, mango production suffers from several pests and diseases, which reduce the yield and quality of mangoes and their price in local and international markets. Several solutions for the automatic diagnosis of these pests and diseases have been proposed by researchers in the last decade. These solutions are based on Machine Learning (ML) and Deep Learning (DL) algorithms. In recent years, Convolutional Neural Networks (CNNs) have achieved impressive results in image classification and are considered the leading methods for this task. However, one of the most significant issues facing solutions for classifying mango pests and diseases is the lack of large, labeled datasets. Data augmentation is one of the solutions that have been successfully reported in the literature. This paper studies six data augmentation techniques, namely blur, contrast, flip, noise, zoom and affine transformation, in order to determine, on the one hand, the impact of each technique on the performance of a ResNet50 CNN trained from a small initial dataset and, on the other hand, the combination of techniques that gives the DL network its best performance. Results show that the best combination for classifying mango leaf diseases is ‘Contrast & Flip & Affine transformation’, which gives the model a training accuracy of 98.54% and a testing accuracy of 97.80%, with an f1_score > 0.9.

Keywords: data augmentation, mango, disease, classification, deep learning, resnet50.

I. Introduction

Mango, or Mangifera indica L. (scientific name), is a lucrative fruit widely cultivated in tropical countries. It belongs to the family Anacardiaceae. Its overall consumption in 2017 was estimated at 50.65 million metric tons [1].
In 2021, mango was the third most traded tropical fruit by export volume, after pineapple and avocado [2]. Mango is highly appreciated for its richness in nutrients (vitamins A, B, C, K, ...), flavorful pulp and alluring aroma [3,4]. It brings substantial economic benefits to exporting countries and mango growers.

Author α σ Ѡ: Cheikh Anta DIOP University of Dakar (UCAD), Dakar, Senegal, Polytechnic School of Dakar. e-mails: demba.faye@esp.sn, idy.diop@esp.sn, doudou2.dione@ucad.edu.sn
Author ρ: Cheikh Anta DIOP University of Dakar (UCAD), Dakar, Senegal, Department of Plant Biology of the Faculty of Science and Technology (FST) of UCAD. e-mail: nalla.mbaye@ucad.edu.sn

However, mango production suffers severely from pests and diseases, which reduce both quality and quantity. This influences the price of mango in the international market. In the last decade, several solutions for the automatic diagnosis of these pests and diseases have been proposed by researchers. These solutions were first based on image processing (IP) and machine learning (ML) techniques and then, in the last five years, on deep learning (DL) algorithms. DL-based solutions have achieved state-of-the-art performance on ImageNet and other benchmark datasets [5]. In recent years, Convolutional Neural Networks (CNNs) have achieved impressive results in image classification and are considered the leading methods for object detection in computer vision [5,6].

However, one of the biggest issues facing solutions for identifying mango pests and diseases is the lack of large, labeled datasets [7,8,9,10]. Limited training data inhibits the performance of DL-based models, which need large amounts of data to train well, avoid overfitting and improve their generalization ability. Overfitting happens when the training accuracy is markedly higher than the accuracy on the validation/test set.
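As a rough illustration of the overfitting criterion above, the gap between training and test accuracy can be computed directly. The sketch below uses the accuracies reported in the abstract for the best combination (98.54% and 97.80%); the function names and the 5-point tolerance are illustrative assumptions and are not part of the method described in this paper.

```python
def generalization_gap(train_acc: float, test_acc: float) -> float:
    """Training accuracy minus test accuracy, in percentage points."""
    return train_acc - test_acc


def looks_overfit(train_acc: float, test_acc: float, tolerance: float = 5.0) -> bool:
    """Flag a model whose training accuracy exceeds its test accuracy by
    more than `tolerance` points. The default tolerance is an arbitrary
    illustrative value, not a threshold taken from the paper."""
    return generalization_gap(train_acc, test_acc) > tolerance


# Accuracies reported in the abstract for 'Contrast & Flip & Affine transformation'.
gap = generalization_gap(98.54, 97.80)
print(f"gap = {gap:.2f} points, overfit = {looks_overfit(98.54, 97.80)}")
```

With these figures the gap is about 0.74 percentage points, well under the assumed tolerance, so the check does not flag the model.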
The generalizability of a model is measured by the difference in performance it exhibits when evaluated on training data (known data) versus test data (unknown data). Data augmentation is one of the solutions to this problem that have been successfully reported in the literature [1]. This remedy for overfitting generates a more comprehensive training set, which reduces the distance between the training and validation sets. This paper presents a data augmentation process based on image manipulation to improve the quality of the small mango leaf dataset presented in [1]. The specific contributions of the paper include:

• Generate a dataset for every data augmentation strategy except affine transformation. The DL model is trained on each generated dataset to measure the impact of each data augmentation technique on the performance of the model.
• Generate multiple datasets from pairwise sequential combinations of data augmentation techniques, namely blur, contrast, flip, noise and