Global Journal of Computer Science and Technology, G: Interdisciplinary, Volume 23 Issue 1

zoom. The aim is to identify the combinations that give the DL model its best performance.
• Apply the affine transformation technique to the best previous combinations to determine the final combination best suited to classifying diseased mango leaves.

The rest of the paper is organized as follows: Section 2 reviews the related literature, Section 3 describes the data acquisition, the data augmentation techniques and the CNN model used, and Section 4 presents and discusses the results obtained with the data augmentation techniques. The last section concludes the paper and outlines the authors' future work.

II. Related Works

The literature review presented in this paper covers only the data augmentation strategies used for mango pest or disease classification and for mango or other fruit quality grading. Shorten et al. [11] presented a survey of image data augmentation algorithms such as color space augmentations, geometric transformations, image mixing, kernel filters, random erasing, adversarial training, feature space augmentation, generative adversarial networks (GANs), meta-learning and neural style transfer. They also discussed applications of GAN-based augmentation methods and other aspects of data augmentation such as curriculum learning, test-time augmentation, the impact of resolution, and final dataset size. Dandavate et al. [12] applied data augmentation techniques, namely rotation, scaling and image translation, to a fruit dataset to avoid overfitting and improve the performance of their simple CNN model. Agastya et al. [13] used VGG-16 and VGG-19 for automatic batik classification. By applying random rotation within a certain range of degrees, scaling and shearing, they improved the accuracy of their models by up to 10%. Bargoti et al. [14] presented a fruit (mango, apple and almond) detection system using Faster R-CNN.
They used image flipping and scaling to improve the performance of their model, achieving an F1-score above 0.9 for mangoes and apples. Wu et al. [15] investigated several deep learning-based methods for mango quality grading and found VGG-16 to be the best model for this task. During training, the authors randomly applied, at each epoch, data augmentation strategies such as horizontal or vertical flipping, rotation, brightness and contrast adjustment, and zooming in/out. Zang et al. [16] developed a fruit category identification system using a 13-layer CNN and three data augmentation strategies, namely noise injection, image rotation and gamma correction. The final overall accuracy obtained was 94.94%, at least 5 percentage points higher than state-of-the-art approaches. Supekar et al. [17] developed a mango grading system based on ripeness, size, shape and defects, using K-means clustering for defect segmentation and Random Forest classifiers. To avoid overfitting with an initial training dataset of 69 images, the authors applied image rotations of 90°, 180° and 270°. The resulting training dataset of 522 images allowed their model to reach an overall accuracy of 88.88%.

III. Methodology and Model

a) Data acquisition

The dataset used in this paper is a part of the 'MangoLeafBD' dataset produced by Ahmed et al. [18], which can be downloaded from the 'Mendeley Data' platform (https://data.mendeley.com/datasets/hxsnvwty3r). MangoLeafBD contains eight classes: seven correspond to mango leaf diseases and one contains healthy leaves. In this paper, four diseases, namely Anthracnose, Gall Midge, Powdery Mildew and Sooty Mold, are treated, as they are among the mango leaf diseases most studied by researchers during the last five years [19] (Fig. 1 and Fig. 2). The dataset used therefore contains five classes: one for each of these four diseases and one for healthy leaves.
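The class selection described above (keeping four disease classes plus the healthy class out of the eight MangoLeafBD classes) can be sketched as a small file-filtering step. This is a minimal sketch, not the authors' code: it assumes the archive is extracted with one sub-folder per class, and the folder names used here (including the spelling "Sooty Mould") are hypothetical and should be checked against the actual download.

```python
from pathlib import Path

# HYPOTHETICAL folder names: assumes the MangoLeafBD archive is extracted
# with one sub-folder per class. Verify the exact names against the download.
SELECTED_CLASSES = ["Anthracnose", "Gall Midge", "Powdery Mildew", "Sooty Mould", "Healthy"]
IMAGE_EXTENSIONS = (".jpg", ".jpeg")


def collect_subset(root, classes=SELECTED_CLASSES):
    """Return (image_path, label_index) pairs for the selected classes only.

    Folders of the other MangoLeafBD classes are never visited,
    so they are excluded from the subset.
    """
    root = Path(root)
    samples = []
    for label, name in enumerate(classes):
        class_dir = root / name
        if not class_dir.is_dir():
            raise FileNotFoundError(f"missing class folder: {class_dir}")
        # Sort for a deterministic order and keep only JPG files.
        for path in sorted(class_dir.iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((path, label))
    return samples


def class_counts(samples, classes=SELECTED_CLASSES):
    """Count images per class name, e.g. to check the expected size of each class."""
    counts = {name: 0 for name in classes}
    for _, label in samples:
        counts[classes[label]] += 1
    return counts
```

With the five selected folders each holding 500 JPG images, `collect_subset` on the (hypothetical) extraction root would return 2,500 pairs, and `class_counts` gives a quick balance check before augmentation.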
There are 500 RGB leaf images of 240x320 pixels in each class, for a total of 2,500 images, all in JPG format.

b) Data augmentation

Data augmentation is a powerful remedy for overfitting: it allows a model trained on a small dataset to become more robust and to generalize better. There are two categories of data augmentation: the first is based on image manipulations and the second on DL (generative adversarial networks (GANs), feature space augmentation, adversarial training, neural style transfer, meta-learning data augmentation) [11]. This research focuses on the first category because i) the second is generally used to generate synthetic images from a fairly large dataset, and ii) mango leaf images taken under real-world conditions suffer mainly from temperature variation, shadowing, overlapping leaves and the presence of multiple objects, and the first category can generate images that reproduce these conditions. This paper deals with the following techniques:
• Noise injection
Image noise is a random disturbance in the brightness and color of an image. Noise injection is an effective way to avoid overfitting and improves the test performance of a machine learning model [13]. There are several ways to add noise to an image (e.g. Gaussian noise, salt-and-pepper noise, speckle noise, …). Here, Gaussian noise is added with the mean set to 0 and the standard deviation (sigma) set to 0.05.
• Blur
Blurring an image makes it less sharp. Photographic blur occurs with movement in the model

A Combination of Data Augmentation Techniques for Mango Leaf Diseases Classification. © 2023 Global Journals.