Detection of Hate Speech and offensive Language on Sentiment Analysis using Machine Learning Techniques
Guduri Sulakshana1, R Siva jyothi2, Aluri Lakshmi3
1Mrs. Guduri Sulakshana, Dept. of CSE, Institute of Aeronautical Engineering, Hyderabad, India.
2Ms. R Siva jyothi, Dept. of CSE, KSRM College of Engineering, Kadapa, India.
3Ms. Aluri Lakshmi, Dept. of CSE, Institute of Aeronautical Engineering, Hyderabad, India.
Manuscript received on February 10, 2020. | Revised Manuscript received on February 23, 2020. | Manuscript published on March 10, 2020. | PP: 136-139 | Volume-9 Issue-5, March 2020. | Retrieval Number: E1985039520/2020©BEIESP | DOI: 10.35940/ijitee.E1985.039520
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Toxic online content (TOC) has become a significant problem because the internet, and platforms such as Twitter, Facebook, WhatsApp, Instagram, and Telegram, are used by people from diverse cultural, social, organizational, and industry backgrounds. Much current work on text analysis still addresses single-label classification, aiming to make it less error-prone and more efficient. In recent years there has been a shift toward multi-label classification, which applies to both text and images, although text classification has received less attention from researchers than image classification. In this work, we use a dataset of short messages to train and develop a model that can assign multiple labels to each message. Detecting hate speech and offensive language is a key challenge in the automatic detection of toxic text content. In this paper, we apply term frequency–inverse document frequency (Tf-Idf) features with Random Forest, Support Vector Machine (SVM), and Naïve Bayes classifiers to classify tweets automatically. Instead of traditional techniques such as bag-of-words or raw word counts, Tf-Idf is used to measure similarity: the text is transformed into Tf-Idf vectors, and these vectors, together with the labels from the dataset, are used to train the model with supervised learning. This work also presents modular, encapsulated components that handle communication between the user and the Twitter API. After tuning, the best-performing model achieves good accuracy on the test data.
Keywords: Twitter, Toxic Text, Tf-Idf, Machine Learning.
Scope of the Article: Machine Learning.
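The Tf-Idf step described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the toy messages, labels, and function names (tokenize, fit_idf, tfidf_vector, cosine, predict) are hypothetical, the smoothed IDF formula is one common variant chosen here as an assumption, and a simple nearest-neighbour lookup stands in for the Random Forest, SVM, and Naïve Bayes classifiers named in the abstract.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for each corpus term."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document, however often it occurs there.
        doc_freq.update(set(tokenize(doc)))
    # Assumed variant: log((1 + N) / (1 + df)) + 1, so no weight is zero.
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def tfidf_vector(text, idf):
    """Map a text to a sparse, L2-normalised Tf-Idf vector (term -> weight)."""
    # Terms never seen when fitting the IDF table are ignored.
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {term: count * idf[term] for term, count in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {term: w / norm for term, w in vec.items()} if norm else {}


def cosine(u, v):
    """Cosine similarity of two L2-normalised sparse vectors (a dot product)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(term, 0.0) for term, w in u.items())


def predict(text, train_vectors, train_labels, idf):
    """Return the label of the most similar training message (1-nearest neighbour)."""
    query = tfidf_vector(text, idf)
    scores = [cosine(query, vec) for vec in train_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return train_labels[best]


# Hypothetical toy data, invented purely for illustration.
TRAIN_TEXTS = [
    "you are a wonderful person",
    "have a great day friend",
    "you are a stupid idiot",
    "shut up you idiot",
]
TRAIN_LABELS = ["neutral", "neutral", "offensive", "offensive"]

IDF = fit_idf(TRAIN_TEXTS)
TRAIN_VECTORS = [tfidf_vector(t, IDF) for t in TRAIN_TEXTS]

if __name__ == "__main__":
    print(predict("what an idiot", TRAIN_VECTORS, TRAIN_LABELS, IDF))
```

In a fuller pipeline the same Tf-Idf vectors would be passed, with their labels, to a supervised classifier; the nearest-neighbour step here only shows how similarity between Tf-Idf vectors can drive a label decision.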