Global Journal of Computer Science and Technology, C: Software & Data Engineering, Volume 22 Issue 2
Fake News Detection: Covid-19 Perspective. Global Journal of Computer Science and Technology, Volume XXII, Issue II, Version I, Year 2022. © 2022 Global Journals

Fig. 4: Workflow of fake news detection: the Covid-19 perspective

noise. To avoid such noise, all of the raw data needs to be converted into a single case, with every word either in lowercase or in uppercase. In addition, we converted our data into a set of arrays in order to perform stemming and stop-word removal.

c) Porter Stemmer

For our work, we used the Porter stemmer for stemming. This stemming method, introduced in 1980, gained much popularity and is now known as the Porter stemmer [26]. Speed and simplicity are its hallmarks, and data mining and information retrieval are its main applications. The Porter stemmer produces stems by suffix stripping. Compared with other stemmers, it produces the best output and has a lower error rate. However, its application is limited to English words. A resulting stem might not be a meaningful word, but the different forms of a word are all mapped onto the same stem.

d) Stop Words

Certain words in the English language, for example be, too, and not, have no significance for natural language processing, so they are removed during processing. In other words, stop words are words that are stripped out before natural language processing [27]. Stop words add little value to the results, so we can comfortably discard them without modifying or compromising the context of a statement.

e) Feature Extraction

Fig. 8: Count Vectorizer

Count Vectorizer: The count vectorizer is often used to transform a collection of text data into word count vectors. That is, it transforms a set of textual data into a token count matrix. As Fig. 8 shows, the count vectorizer converts each word of the text data into a count vector.
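The token count matrix described above can be illustrated with a minimal sketch in plain Python. It is not the implementation used in this work: the function names `fit_vocabulary` and `count_vectors`, the whitespace tokenization, and the two example documents (loosely modeled on the stemmed tokens of Fig. 8) are all assumptions made for illustration.

```python
from collections import Counter

def fit_vocabulary(documents):
    """Map each distinct token to a column index (alphabetical order)."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}

def count_vectors(documents, vocabulary):
    """Return one row per document; each entry counts one vocabulary token."""
    rows = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        # Dicts keep insertion order, so columns follow the vocabulary indices.
        rows.append([counts[tok] for tok in vocabulary])
    return rows

# Hypothetical, already stemmed example documents (not the paper's data).
docs = ["keep smile because life beauti", "life short live life"]
vocab = fit_vocabulary(docs)
matrix = count_vectors(docs, vocab)

print(list(vocab))  # ['beauti', 'because', 'keep', 'life', 'live', 'short', 'smile']
print(matrix)       # [[1, 1, 1, 1, 0, 0, 1], [0, 0, 0, 2, 1, 1, 0]]
```

Each row is the count vector of one document and each column corresponds to one vocabulary token, so "life" occurring twice in the second document yields a 2 in its column. Libraries such as scikit-learn offer a `CountVectorizer` class for the same purpose; the sketch avoids that dependency only to keep the mechanics visible.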
Actually, the count vectorizer counts how frequently a

[Fig. 4 diagram labels: Data Collection, Data Preprocessing, Count Vectorizer, TF-IDF, Multinomial Naïve Bayes, Support Vector Machine, Logistic Regression, Passive Aggressive Classifier, Result]

[Fig. 8 diagram content: the tokens "keep smile because life beauti life short life life beauti short live beacuse smile keep" with the counts 1 2 1 1 1 1 1]

Because our data is text, its words must be represented as integers or floating-point values before they can be used as inputs to machine learning algorithms. This process is called feature extraction, or vectorization.
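The tokens that reach vectorization come from the preprocessing steps described earlier: case normalization, stop-word removal, and suffix stripping. The sketch below illustrates them under stated assumptions. The stop-word list is a tiny hypothetical one, the regular-expression tokenizer is a simplification, and `porter_step_1a` implements only Step 1a of the Porter algorithm (plural suffixes), not the full stemmer used in this work.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "be", "too", "not", "and", "of", "to"}

def porter_step_1a(word):
    """Apply only Step 1a of the Porter algorithm (plural suffixes)."""
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word

def preprocess(text):
    """Lowercase the text, drop stop words, then strip plural suffixes."""
    tokens = re.findall(r"[a-z]+", text.lower())  # case normalization + tokenizing
    kept = [tok for tok in tokens if tok not in STOP_WORDS]
    return [porter_step_1a(tok) for tok in kept]

print(preprocess("The Ponies are NOT too fond of the cats"))
# -> ['poni', 'fond', 'cat']
```

As the output shows, a stem such as "poni" need not be a meaningful word; what matters is that "pony"-like plural forms collapse to one token before counting. The full Porter algorithm applies several further rule steps that this sketch omits.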