Advanced Data Imputation Techniques for Predicting Type 2 Diabetes using Machine Learning
Sofia Goel1, Sudhansh Sharma2

1Sofia Goel, School of Computer and Information Sciences, Indira Gandhi National Open University, New Delhi, India.
2Dr. Sudhansh Sharma, School of Computer and Information Sciences, Indira Gandhi National Open University, New Delhi, India.

Manuscript received on November 16, 2019. | Revised Manuscript received on November 20, 2019. | Manuscript published on December 10, 2019. | PP: 4142-4149 | Volume-9 Issue-2, December 2019. | Retrieval Number: B7466129219/2019©BEIESP | DOI: 10.35940/ijitee.B7466.129219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Type 2 diabetes mellitus is a serious metabolic disorder whose prevalence is rising worldwide at an alarming rate. Medical datasets often suffer from missing data and outliers. Handling missing data with traditional mean-based imputation, however, may lead to a biased model and unpredictable outcomes. Building complex models by combining multiple classifiers and other methods could increase accuracy, but this approach is time-consuming and computationally heavy, which significantly increases deployment cost. This research proposes a model that classifies the data using a class-wise imputation technique together with outlier handling. The performance of the proposed model is evaluated on nine machine learning classifiers and compared with traditional approaches such as simple mean, median, and linear regression. Experimental results show the superiority of the proposed model in terms of classification accuracy and model complexity. The proposed approach achieves an accuracy of 88.01%, which is the highest when compared with previous studies. The work aims to improve the accuracy, scalability, and overall performance of classification on medical datasets, which can ultimately be lifesaving when a diagnosis is made efficiently at an early stage.
Keywords: Type 2 diabetes, machine learning, missing values, outliers, SVM, KNN, LR, RF.
Scope of the Article: Machine Learning
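
Illustrative sketch: the abstract contrasts class-wise imputation with simple mean imputation but does not spell out the procedure. The snippet below is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that missing values are marked as `None`, that each missing value is filled with the median of the observed values in the same class, and that a class with no observed values falls back to the overall median. The function name `classwise_impute`, the column names `glucose` and `outcome`, and the toy records are all hypothetical. Outlier handling is not covered here.

```python
from collections import defaultdict
from statistics import mean, median


def classwise_impute(rows, feature, label):
    """Return copies of `rows` in which missing (None) values of `feature`
    are replaced by the median of that feature within the same class.

    Assumption: the per-class median is the fill statistic; the abstract
    does not state which statistic the authors use.
    """
    observed_by_class = defaultdict(list)
    for row in rows:
        if row[feature] is not None:
            observed_by_class[row[label]].append(row[feature])

    # Fallback for a class with no observed values: the overall median.
    all_observed = [v for vals in observed_by_class.values() for v in vals]
    overall = median(all_observed)
    fill = {cls: median(vals) for cls, vals in observed_by_class.items()}

    return [
        {**row, feature: fill.get(row[label], overall)}
        if row[feature] is None
        else dict(row)
        for row in rows
    ]


# Hypothetical toy records: outcome 0 = non-diabetic, 1 = diabetic.
rows = [
    {"glucose": 90, "outcome": 0},
    {"glucose": None, "outcome": 0},
    {"glucose": 110, "outcome": 0},
    {"glucose": 150, "outcome": 1},
    {"glucose": None, "outcome": 1},
    {"glucose": 170, "outcome": 1},
]

filled = classwise_impute(rows, "glucose", "outcome")

# A simple (global) mean would give both missing entries the same value,
# regardless of class.
global_mean = mean(r["glucose"] for r in rows if r["glucose"] is not None)
```

On this toy data the global mean assigns the same value to both missing entries, whereas the class-wise fill gives each class a value drawn from its own distribution. Note that class-wise imputation relies on the class label, so in practice it can only be applied to labelled training data.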