Multivariate Data Quality Enhancement by Ranked Imputation
Muralidharan Jayaraman¹, P. Shanmugavadivu²
¹Muralidharan Jayaraman, Research Scholar, Department of Computer Science and Applications, The Gandhigram Rural Institute (Deemed to be University), Dindigul, Tamil Nadu, India.
²Dr. P. Shanmugavadivu*, Professor, Department of Computer Science and Applications, The Gandhigram Rural Institute (Deemed to be University), Dindigul, Tamil Nadu, India.
Manuscript received on December 14, 2019. | Revised Manuscript received on December 20, 2019. | Manuscript published on January 10, 2020. | PP: 2387-2391 | Volume-9 Issue-3, January 2020. | Retrieval Number: C9027019320/2020©BEIESP | DOI: 10.35940/ijitee.C9027.019320
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Organizational decisions are based on data analysis and predictions. Effective decisions require accurate predictions, which in turn depend on the quality of the data. Real-time data is prone to inconsistencies, which degrade the quality of the predictions and create the need for data imputation techniques. This work presents a prediction-based data imputation technique, Rank Based Multivariate Imputation (RBMI), that operates on multivariate data. The proposed model consists of a ranking phase and an imputation phase. Ranking dictates the order in which attributes are imputed. The proposed model uses a tree-based approach for the actual imputation. Experiments were performed on Pima, a diabetes dataset. The data was amputed (values deliberately removed) at rates ranging from 5% to 30%. The results were compared with existing state-of-the-art models in terms of MAE and MSE. The proposed RBMI model exhibits a reduction of 0.03 in MAE and 0.001 in MSE.
Keywords: Data Imputation, Machine Learning, Multivariate, Correlation, Decision Tree.
Scope of the Article: Machine Learning
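Illustrative sketch: The two-phase procedure outlined in the abstract (rank the attributes, then impute them in that order with a tree-based predictor) could be sketched roughly as follows. This is not the authors' implementation. The abstract does not state the ranking criterion, so the sketch assumes attributes are ranked by ascending count of missing values. The tiny regression tree, the mean stand-in for missing predictor values, and all names (rank_columns, fit_tree, rbmi_sketch, mae, mse) are hypothetical and chosen only for illustration. Each attribute is assumed to have at least one observed value.

```python
from statistics import mean

def rank_columns(rows):
    """Ranking phase. ASSUMED criterion: fewest missing values first."""
    n_cols = len(rows[0])
    missing = {j: sum(r[j] is None for r in rows) for j in range(n_cols)}
    return sorted((j for j in missing if missing[j] > 0),
                  key=lambda j: (missing[j], j))

def fit_tree(X, y, depth=3, min_leaf=2):
    """Tiny regression tree: a float leaf or (feature, threshold, left, right)."""
    if depth == 0 or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)
    best = None
    for f in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted(set(x[f] for x in X))[:-1]:
            left = [yi for x, yi in zip(X, y) if x[f] <= t]
            right = [yi for x, yi in zip(X, y) if x[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            ml, mr = mean(left), mean(right)
            sse = (sum((v - ml) ** 2 for v in left)
                   + sum((v - mr) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, f, t)
    if best is None:
        return mean(y)
    _, f, t = best
    lo = [(x, yi) for x, yi in zip(X, y) if x[f] <= t]
    hi = [(x, yi) for x, yi in zip(X, y) if x[f] > t]
    return (f, t,
            fit_tree([x for x, _ in lo], [v for _, v in lo], depth - 1, min_leaf),
            fit_tree([x for x, _ in hi], [v for _, v in hi], depth - 1, min_leaf))

def predict(tree, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def rbmi_sketch(rows, depth=3):
    """Imputation phase: fill attributes in ranked order with a tree per attribute."""
    n_cols = len(rows[0])
    filled = [list(r) for r in rows]  # work on a copy; the input is left untouched
    # Observed column means, used only as stand-ins for missing predictor values.
    col_mean = [mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    for j in rank_columns(rows):
        feats = [k for k in range(n_cols) if k != j]

        def features(r):
            return [r[k] if r[k] is not None else col_mean[k] for k in feats]

        train = [r for r in filled if r[j] is not None]
        tree = fit_tree([features(r) for r in train], [r[j] for r in train], depth)
        for r in filled:
            if r[j] is None:
                # Attributes imputed earlier feed the trees of later-ranked ones.
                r[j] = predict(tree, features(r))
    return filled

def mae(true, pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(true, pred)) / len(true)

def mse(true, pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(true, pred)) / len(true)

if __name__ == "__main__":
    # Toy data, not the Pima dataset; None marks an amputed value.
    data = [[1.0, 2.0, 5.0], [2.0, None, 5.0], [3.0, 6.0, None],
            [4.0, 8.0, 7.0], [5.0, None, 7.0], [6.0, 12.0, 8.0]]
    print(rank_columns(data))
    print(rbmi_sketch(data))
```

In an evaluation of the kind the abstract describes, the amputed positions would be compared against their known original values using mae and mse. A production version would more plausibly rely on an established decision-tree library than on the hand-rolled tree used here to keep the sketch self-contained.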