Global Journal of Computer Science and Technology, D: Neural & Artificial Intelligence, Volume 22 Issue 1

Acoustic Features based Accent Classification of Kashmiri Language using Deep Learning

Shehzen Sidiq Malla

Abstract- Automatic accent identification is important today, when we are surrounded by ASR systems. Accent classification is the problem of identifying a speaker’s native place from the way he or she speaks the language under consideration. Accents are present in almost all languages and form an important part of each language. They arise from prosodic and articulatory characteristics. In this research, the aim is to classify accents of the Kashmiri language, using MFCCs and Mel spectrograms as features. Much research has been done, and continues, for languages such as English, and many machine learning and deep learning models have shown state-of-the-art results, but this problem is new for the Kashmiri language. The accents in Kashmir vary from area to area, and we have chosen 6 areas as our classes. We extracted the features from the audio data, converted those features into images, and then used CNN architectures as our models. This research can serve as a base for further research on this language. Our custom models achieved a loss of 0.12 and an accuracy of 98.66% on test data using Mel spectrograms, our best result across the features considered.

Keywords: accent classification, CNN, RELU, mel-spectrograms, MFCC.

I. Introduction

Kashmiri, or Koshur, is a language of the Dardic subgroup of the Indo-Aryan languages, spoken by over seven million Kashmiris [Wikipedia]. Many accents are spoken in Kashmir, some major and some minor. This diversity adds to the language's range of sounds and variations. The aim of this research is to classify these different accents. Although many accents are spoken in this language, for this research we have classified the prominent accents belonging to Kupwara, Srinagar, Islamabad, Shopian, and Bandipora.
Author: e-mail: mallashehzen786@gmail.com

The proposed approach uses Convolutional Neural Networks (CNNs) trained on images of features extracted from the audio files. The features are Mel spectrograms and MFCCs, and the CNN serves as the classifier. Three MFCC configurations are extracted, with 13, 24, and 36 coefficients. On our dataset, the best models reached a test accuracy of 98.66% using Mel spectrograms.

Accent classification refers to the problem of inferring the native language of a speaker from his or her foreign-accented speech. Identifying idiosyncratic differences in speech production is important for improving the robustness of existing speech analysis systems. For example, automatic speech recognition (ASR) systems exhibit lower performance when evaluated on foreign-accented speech. By developing pre-processing algorithms that identify the accent, these systems can be modified to customize the recognition algorithm to the particular accent [1] [2]. In addition to ASR applications, accent identification is also useful for forensic speaker profiling, by identifying the speaker’s regional origin and ethnicity, and in applications involving targeted marketing [3] [4]. In this paper we propose a method for classification of 11 accents directly from the speech acoustics.

Several earlier studies addressed accent classification with statistical models. For example, Deshpande et al. used GMMs based on formant frequency features to discriminate between standard American English and Indian-accented English [6]. Chen et al. explored the effect of the number of components in GMMs on classification performance [7]. Tang and Ghorbani compared the performance of HMMs with Support Vector Machines (SVMs) for accent classification [8]. Kumpf and King proposed to use linear discriminant analysis (LDA) for identification of three accents in Australian English [9].
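To make the two feature types named in this section concrete, the following is a minimal sketch of how one column of a log-Mel spectrogram and the corresponding MFCCs can be computed from a single audio frame. It is an illustration under stated assumptions, not the pipeline used in this work: the 16 kHz sample rate, 256-sample frame, 40 Mel filters, Hamming window, the HTK-style Mel formula, and all function names are choices made here for the example. It uses only the Python standard library, so it is slow and intended for reading rather than production use.

```python
import cmath
import math


def hz_to_mel(hz):
    """Hz to mel, using the common HTK-style formula (an assumption here)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum of one Hamming-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points, converted to (fractional) FFT bin positions
    edges = [mel_to_hz(top * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(n_bins):
            rising = (b - lo) / (mid - lo)
            falling = (hi - b) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def log_mel_energies(frame, sample_rate, n_filters=40):
    """One column of a log-Mel spectrogram for a single frame."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Floor the energy so the logarithm is always defined
    return [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
            for row in bank]


def mfcc(log_mel, n_mfcc):
    """First n_mfcc coefficients of an unnormalized DCT-II of the log-Mel energies."""
    n = len(log_mel)
    return [sum(e * math.cos(math.pi * k * (m + 0.5) / n)
                for m, e in enumerate(log_mel))
            for k in range(n_mfcc)]


# Hypothetical input: one 256-sample frame of a 440 Hz tone at 16 kHz
sample_rate = 16000
frame = [math.sin(2.0 * math.pi * 440.0 * t / sample_rate) for t in range(256)]
log_mel = log_mel_energies(frame, sample_rate)
for n_coeffs in (13, 24, 36):
    print(n_coeffs, len(mfcc(log_mel, n_coeffs)))
```

Stacking such columns over successive frames gives a two-dimensional array that can be rendered as an image, which is the form in which features are passed to a CNN. In this sketch the 13, 24, and 36 coefficient variants differ only in how many DCT terms are kept from the same log-Mel energies.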
Artificial neural networks, especially Deep Neural Networks (DNNs), Recurrent Neural Networks (RNNs), and CNNs, have been widely used in state-of-the-art speech and image processing systems [10] [11] [12] [13]; however, in the area of accent identification, only a few studies have evaluated the performance of neural networks [14] [15]. Nonetheless, in the related area of language identification (LID), neural networks have been investigated extensively [16] [17] [18]. A recent paper [19] used spectrograms for accent classification and speaker recognition and achieved an accuracy of 92%. Inspired by that work, we also propose to use Mel spectrograms and MFCCs for our research.

The rest of the paper is organized as follows. In Section 2, we discuss the collection and preparation of the dataset. In Section 3, we present the proposed system and describe in detail the features used in our research. In Section 4, we describe the experiments and report our results. Finally, in Section 5, we conclude.

© 2022 Global Journals