Global Journal of Computer Science and Technology, G: Interdisciplinary, Volume 23 Issue 1

• English Characters Removal: During this step, all English characters in both lower and upper case (A-Z, a-z) are removed.
• Stop Words Removal: All stop words, which are unlikely to influence the tweet's meaning, are removed.
• Arabic Normalization: Characters are mapped back to their original (canonical) form.
• Arabic Diacritics Removal: All diacritics, such as Fatha, Tanwin Fath, Damma, Tanwin Damm, Kasra, Tanwin Kasr, and Sukun, are removed.
• Mentions Removal: Any user mentions (tokens beginning with @) are removed from the tweet.
• Repeated Characters Removal: Repeated characters are removed during this step.
• Punctuation Removal: All punctuation marks, such as / : ” . ’ , + ¦ ! — … “ ” –, are removed.

c) Features Extraction
Feature extraction techniques aim to represent the emotional content of a text in a way that helps classify it into the right emotion category. Feature extraction is an essential step before performing EC on the documents, and it can be carried out with methods such as Term Frequency-Inverse Document Frequency (TF-IDF). The feature extraction method used in our proposed approach is described below:
• TF-IDF
TF-IDF is one of the most widely used text feature extraction techniques because it provides useful insight into the essential features of text documents. In this paper, TF-IDF is chosen as the feature extraction technique. It computes the product of two statistics, the term frequency (TF) and the inverse document frequency (IDF), and describes how important a word is to a tweet within a collection of tweets. The TF-IDF value increases with the number of times a word appears in a tweet and is offset by the number of tweets in the collection that contain that word. The more often a term occurs in tweets belonging to a category, the more strongly it is associated with that category. TF-IDF is therefore well suited to this task, as it can identify emotion-bearing Arabic terms. Figure 5 shows the number of characters in the tweets.

d) Experiment
This experiment describes our approach to predicting users' emotions from their tweets.
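The preprocessing steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the stop-word list is a hypothetical three-word placeholder, the normalization map (alef variants to bare alef, taa marbuta to haa, alef maqsura to yaa) is an assumed common choice because the text does not list its mappings, and the order of the steps is our own (mentions are removed before English characters so that Latin-script usernames are dropped whole). Repeated characters are collapsed only in runs of three or more, an assumption made so that legitimately doubled letters, such as the لل in الله, survive.

```python
import re
import string

# Hypothetical stop-word list for illustration only; a real system would use a
# full Arabic stop-word list, written in normalized form.
STOP_WORDS = {"في", "من", "عن"}

# The seven diacritics named in the text: Tanwin Fath (U+064B), Tanwin Damm
# (U+064C), Tanwin Kasr (U+064D), Fatha (U+064E), Damma (U+064F),
# Kasra (U+0650) and Sukun (U+0652). Shadda (U+0651) is not in that list.
DIACRITICS = re.compile("[\u064B-\u0650\u0652]")

# Assumed normalization map: alef variants -> bare alef,
# taa marbuta -> haa, alef maqsura -> yaa.
NORMALIZE = str.maketrans({"أ": "ا", "إ": "ا", "آ": "ا", "ة": "ه", "ى": "ي"})

# ASCII punctuation plus the Arabic and typographic marks mentioned in the text.
PUNCTUATION = re.compile("[" + re.escape(string.punctuation + "،؛؟¦—…“”–«»") + "]")


def preprocess_tweet(text: str) -> str:
    """Apply the cleaning steps to one tweet and return the cleaned text."""
    text = re.sub(r"@\w+", " ", text)         # mentions removal (\w is Unicode-aware)
    text = re.sub(r"[A-Za-z]+", " ", text)    # English characters removal
    text = DIACRITICS.sub("", text)           # Arabic diacritics removal
    text = text.translate(NORMALIZE)          # Arabic normalization
    text = re.sub(r"(.)\1{2,}", r"\1", text)  # collapse runs of 3+ identical chars
    text = PUNCTUATION.sub(" ", text)         # punctuation removal
    # Stop-word removal on whitespace tokens; split() also drops extra spaces.
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)
```

For instance, `preprocess_tweet("مدرسة أولى")` returns `"مدرسه اولي"` under the assumed normalization map.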
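The TF-IDF weighting described in subsection c) can be made concrete with a short pure-Python sketch. It uses the classic formulation tf-idf(t, d) = tf(t, d) * log(N / df(t)), where tf(t, d) is the raw count of term t in tweet d, N is the number of tweets, and df(t) is the number of tweets that contain t. The text does not state which TF-IDF variant was used, and library implementations differ (scikit-learn's `TfidfVectorizer`, for example, smooths the IDF and L2-normalizes each row by default), so this illustrates the idea rather than the authors' exact computation; the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Classic TF-IDF for tokenized tweets: raw-count tf times log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of tweets containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this tweet
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights
```

A term that appears in every tweet gets an IDF of log(1) = 0 and so contributes nothing, which is why words common to the whole collection carry little weight, while a term concentrated in a few tweets is weighted up.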
To categorize each tweet into one of four emotions (anger, joy, sadness, and fear), we apply five machine-learning approaches: K-Nearest Neighbor (KNN), Decision Tree (DT), Support Vector Machine (SVM), Naive Bayes (NB), and Multinomial Naive Bayes (Multinomial NB). This work was implemented in Google Colab, a cloud-based environment provided by Google. The first and most essential phase of the experiment is preprocessing the tweets for the training and test sets. Arabic text generally needs more preprocessing because of its nature and structure; therefore, the preprocessing techniques are applied to every tweet in both the training and testing phases. We used the dataset of Arabic tweets presented at SemEval-2018, in which, given an emotion and a tweet, each tweet is placed into one of four categories. The dataset includes 934 tweets for the four emotions: fear, anger, sadness, and joy. TF-IDF features are extracted from the text and classified using KNN, SVM, DT, NB, and Multinomial NB. We randomly split the dataset into training and testing sets with an 80:20 ratio. The proportion of each class in the dataset is shown in Fig. 6. We used the training set to train the classifiers, whereas the test set, unseen by the model during training, was reserved for evaluating the suitability of the trained model. After the split, the training set contains 747 samples and the test set contains 187 samples. The results of the five machine learning models are compared in the Results section.
Fig. 5: The number of characters in tweets
Emotion Detection in Arabic Text using Machine Learning Methods © 2023 Global Journals
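The 80:20 split and the resulting 747/187 sample counts can be checked with a small sketch. The function name `split_80_20`, the fixed seed, and the rule of rounding the test size up are assumptions for illustration; rounding up matches how scikit-learn's `train_test_split` handles a fractional `test_size`, and with 934 tweets it gives 0.2 * 934 = 186.8, rounded to 187 test samples and 934 - 187 = 747 training samples, consistent with the counts reported in the experiment. The text does not say whether the split was stratified by emotion; this sketch is not.

```python
import math
import random


def split_80_20(samples, test_ratio=0.2, seed=42):
    """Randomly split samples into (train, test); the test size is rounded up."""
    shuffled = list(samples)               # copy so the caller's order is untouched
    random.Random(seed).shuffle(shuffled)  # seeded RNG makes the split reproducible
    n_test = math.ceil(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_80_20(range(934))
print(len(train), len(test))  # 747 187
```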
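Of the five classifiers, Multinomial NB is compact enough to sketch from scratch. The version below is a minimal illustration, not the authors' implementation, which the text does not describe: it takes one hypothetical {term: weight} dictionary per tweet, so the weights can be raw counts or TF-IDF values, and it uses additive smoothing with an assumed default of alpha = 1.0. In practice a library class such as scikit-learn's `MultinomialNB` would normally be used instead.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over non-negative term weights (raw counts or
    TF-IDF values), with additive smoothing controlled by alpha."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, features, labels):
        # features: one {term: weight} dict per tweet; labels: emotion names
        self.classes = sorted(set(labels))
        self.vocab = {t for f in features for t in f}
        label_counts = Counter(labels)
        # log P(c): share of training tweets labelled with emotion c
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        # Total weight of each term within each class
        totals = defaultdict(Counter)
        for f, c in zip(features, labels):
            totals[c].update(f)
        self.log_lik = {}
        for c in self.classes:
            denom = sum(totals[c].values()) + self.alpha * len(self.vocab)
            # log P(t | c); smoothing gives terms unseen in class c nonzero probability
            self.log_lik[c] = {
                t: math.log((totals[c][t] + self.alpha) / denom) for t in self.vocab
            }
        return self

    def predict(self, f):
        # Score each class by log P(c) + sum over terms of weight * log P(t | c);
        # terms never seen in training are ignored.
        def score(c):
            return self.log_prior[c] + sum(
                w * self.log_lik[c][t] for t, w in f.items() if t in self.vocab
            )

        return max(self.classes, key=score)
```

Calling `fit(features, labels)` with one dictionary per training tweet and then `predict(...)` on a test tweet's dictionary returns the emotion with the highest posterior score; accuracy on the 187 held-out tweets would then be the fraction of predictions that match their labels.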