Fig. 1: Plot of metrics for the training data

Fig. 2: Plot of metrics for the validation data

The figure above shows the metrics for the validation set. All of these metrics were evaluated for the model that uses spectrogram images as input.

Fig. 3: Metric scores for the testing data

The results above were calculated on the testing data; from them we can conclude that our model performed much better than expected and showed state-of-the-art performance on our data.

ii. MFCCs

The MFCC features were extracted from the audio files, plotted as images, and these images were saved and then loaded at model-training time. The corresponding figures show the accuracies and losses with respect to the epochs. Three coefficient counts (13, 24, and 36) were extracted, and the same model was trained on the images generated from the audio files (a minimal sketch of this image-based extraction appears at the end of this section). Training was done on the training data and validated on the validation data. The following table shows our results.

Table 1: Validation loss and validation accuracy for different features [images]

Feature             Validation Loss   Validation Accuracy
Mel spectrograms    0.0392            0.9848
MFCC 13             0.04              0.98
MFCC 24             0.031             0.99
MFCC 36             0.06              0.97

From the table above, we can see that the 24-coefficient MFCCs performed slightly better than the others on the validation data. With images as the model input, the Mel-spectrogram and MFCC 24 features performed better than the other features.

b) Experiment 2

This experiment used JSON files of extracted features as input to a CNN with one color channel. The features were extracted and saved in JSON files; no images were generated for this data. The features were then loaded back and the model was trained on them (see the second sketch at the end of this section). The table below shows the testing losses and testing accuracies for the various features extracted from the audio files.

Table 2: Testing loss and testing accuracy for various features [JSON]

Feature    Testing Loss   Testing Accuracy
MFCC 13    0.086          0.87
MFCC 24    0.110          0.87
MFCC 36    0.507          0.865

c) Experiment 3

This experiment was done by splitting the audio files into 2-second chunks. Mel-spectrogram features were extracted from the split audio and saved both as images and as JSON files (see the third sketch at the end of this section). The following losses and accuracies were calculated on the validation data.

Table 3: Validation loss and accuracy for 2-second split data

Type of Feature   Validation Loss   Validation Accuracy
Images            0.0331            0.98
JSON files        0.069             0.97

V. Conclusion

This paper proposes a solution to accent classification for the Kashmiri language using convolutional neural networks. The solution is based on deep learning techniques using CNNs, which adapt to the multi-dimensional data and thus provide a solution to this classification problem.
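The paper does not name its extraction tooling, so the following is only a minimal illustrative sketch of the image-based data preparation used in Experiment 1 and the MFCC runs, assuming the librosa and matplotlib libraries; the helper name save_feature_image and its defaults are hypothetical.

```python
# Minimal sketch of the image-based feature extraction (librosa and
# matplotlib are assumptions; the paper does not name its tooling).
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

def save_feature_image(audio_path, out_path, feature="mel", n_mfcc=24):
    """Render a Mel-spectrogram or MFCC matrix of one audio file as an image."""
    y, sr = librosa.load(audio_path, sr=None)
    if feature == "mel":
        S = librosa.feature.melspectrogram(y=y, sr=sr)
        S = librosa.power_to_db(S, ref=np.max)   # log scale, as usually plotted
    else:
        # n_mfcc is the coefficient count (13, 24, or 36 in the paper)
        S = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    fig, ax = plt.subplots()
    librosa.display.specshow(S, sr=sr, ax=ax)
    ax.set_axis_off()                            # the CNN should see pixels only
    fig.savefig(out_path, bbox_inches="tight", pad_inches=0)
    plt.close(fig)
```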
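A second sketch, corresponding to Experiment 2's preparation: raw MFCC matrices and their labels are serialized to a JSON file instead of being rendered as images. Again, librosa is an assumption and extract_mfcc_to_json is a hypothetical helper.

```python
# Minimal sketch of Experiment 2's data preparation: raw MFCC matrices are
# dumped to JSON rather than rendered as images (librosa is an assumption).
import json
import librosa

def extract_mfcc_to_json(audio_paths, labels, out_path, n_mfcc=13):
    """Store MFCC feature matrices and their accent labels in one JSON file."""
    data = {"mfcc": [], "label": []}
    for path, label in zip(audio_paths, labels):
        y, sr = librosa.load(path, sr=None)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        data["mfcc"].append(mfcc.T.tolist())   # (time, n_mfcc); JSON-serializable
        data["label"].append(label)
    with open(out_path, "w") as f:
        json.dump(data, f)
```

At training time the stored lists would be loaded back with json.load, converted to arrays, and given a trailing singleton dimension so that the CNN sees a single "color" channel, matching the one-channel setup described above.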
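A third sketch, corresponding to Experiment 3's splitting step: each recording is cut into non-overlapping 2-second chunks before feature extraction. Dropping a trailing chunk shorter than 2 seconds is an assumption here, since the paper does not say how leftover audio is handled.

```python
# Minimal sketch of Experiment 3's splitting step (librosa assumed; how a
# short trailing chunk is handled is unknown, so this simply drops it).
import librosa

def split_into_chunks(audio_path, chunk_seconds=2):
    """Cut one recording into fixed-length 2-second waveform chunks."""
    y, sr = librosa.load(audio_path, sr=None)
    chunk_len = chunk_seconds * sr
    chunks = [y[i:i + chunk_len] for i in range(0, len(y), chunk_len)]
    # Keep only full-length chunks so every input shares the same shape.
    return [c for c in chunks if len(c) == chunk_len]
```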