A Machine Learning Model to Identify Duplicate Questions in Social Media Forums
Sandeep Kumar Panda1, Vivek Bhalerao2, Sathya A.R3

1Sandeep Kumar Panda, Department of Computer Science and Engineering, Faculty of Science and Technology, Icfai Tech, ICFAI Foundation for Higher Education, Hyderabad, India.
2Vivek Bhalerao, Department of Computer Science and Engineering, Faculty of Science and Technology, Icfai Tech, ICFAI Foundation for Higher Education, Hyderabad, India.
3Sathya AR*, Department of Computer Science and Engineering, Faculty of Science and Technology, Icfai Tech, ICFAI Foundation for Higher Education, Hyderabad, India.
Manuscript received on January 15, 2020. | Revised Manuscript received on January 20, 2020. | Manuscript published on February 10, 2020. | PP: 370-373 | Volume-9 Issue-4, February 2020. | Retrieval Number: D1362029420/2020©BEIESP | DOI: 10.35940/ijitee.D1362.029420
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: In recent years, online question-and-answer forums have attracted a growing number of users, and many of the questions posted on them are repetitive in nature. Quora released a dataset of such duplicate question pairs as a competition on Kaggle. We observe that this dataset requires substantial preprocessing before machine learning models trained on it can reach good accuracy. This preprocessing includes feature extraction, vectorization, and tokenization, after which the data is ready for training the chosen models. Analyzing each model's predictions yields information about its efficiency and other performance factors. This information is then compared across models to choose the best one, and the models can also be combined into a single model with the best accuracy. This paper proposes a machine learning model that predicts duplicate questions.
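The pipeline outlined in the abstract (tokenization, vectorization, feature extraction, then comparing models to pick the best) can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words vectors, the cosine, Jaccard, and length-difference features, the single-threshold "models", and the four hand-written question pairs are all assumptions chosen for illustration. It uses only the Python standard library.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Tokenization: lowercase the question and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(tokens: list[str]) -> Counter:
    """Vectorization (assumed bag-of-words): map each token to its count."""
    return Counter(tokens)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pair_features(q1: str, q2: str) -> dict[str, float]:
    """Feature extraction for one question pair (hypothetical feature set)."""
    t1, t2 = tokenize(q1), tokenize(q2)
    s1, s2 = set(t1), set(t2)
    union = s1 | s2
    return {
        "cosine": cosine(vectorize(t1), vectorize(t2)),
        "jaccard": len(s1 & s2) / len(union) if union else 0.0,
        "len_diff": abs(len(t1) - len(t2)),
    }


def accuracy(preds: list[int], labels: list[int]) -> float:
    """Fraction of pairs whose predicted label matches the true label."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy, hand-written pairs (label 1 = duplicate); NOT taken from the Quora dataset.
PAIRS = [
    ("How do I learn Python?", "How can I learn Python?", 1),
    ("What is the capital of France?", "What is the capital city of France?", 1),
    ("How do I learn Python?", "How do I cook rice?", 0),
    ("What is machine learning?", "What is the best laptop for gaming?", 0),
]

# Hypothetical candidate models: predict "duplicate" when one feature >= threshold.
MODELS = [("cosine", 0.7), ("jaccard", 0.7)]


def evaluate(model: tuple[str, float], pairs) -> float:
    """Score one candidate model on labeled pairs."""
    name, threshold = model
    preds = [int(pair_features(q1, q2)[name] >= threshold) for q1, q2, _ in pairs]
    labels = [y for _, _, y in pairs]
    return accuracy(preds, labels)


if __name__ == "__main__":
    # Compare the candidates and keep the one with the highest accuracy.
    scores = {model: evaluate(model, PAIRS) for model in MODELS}
    for model, acc in scores.items():
        print(model, f"{acc:.2f}")
    print("best model:", max(scores, key=scores.get))
```

On these four toy pairs the cosine model scores 1.00 and the Jaccard model 0.75, so the script selects `("cosine", 0.7)`. Accuracy on a handful of hand-written examples says nothing about real performance; on the Kaggle data one would train proper classifiers on these kinds of features and compare them on held-out pairs.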
Keywords: Machine learning, Feature extraction, Vectorization, Efficiency.
Scope of the Article: Machine learning