Validating a Big Data for Data Quality using Single Column Data Pattern Profiling Technique
K. Makesh Babu1, K. Mohan Kumar2
1K. Makesh Babu*, Research Scholar, PG and Research Department of Computer Science, Rajah Serfoji Government College, Thanjavur, Affiliated to Bharathidasan University, Trichirappalli, Tamil Nadu, India.
2Dr. K. Mohan Kumar, Head, PG and Research Department of Computer Science, Rajah Serfoji Government College, Thanjavur, Affiliated to Bharathidasan University, Trichirappalli, Tamil Nadu, India.
Manuscript received on February 10, 2020. | Revised Manuscript received on February 20, 2020. | Manuscript published on March 10, 2020. | PP: 867-871 | Volume-9 Issue-5, March 2020. | Retrieval Number: E2749039520/2020©BEIESP | DOI: 10.35940/ijitee.E2749.039520
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Data quality is important to all private and government organizations. Data quality issues can arise in different ways. In e-governance, inconsistent, inaccurate, unreliable and lost data make retrieving accurate data a major obstacle to decision making. Big data exhibits several common data quality issues. These issues and their causes are addressed using data profiling. Data profiling methods detect errors, inconsistencies and redundancies in a dataset. Data profiling offers different types of analysis techniques to correct the data, such as single-column analysis, multi-column analysis, multi-table analysis and data dependency analysis. Single-column analysis itself comprises several kinds of analysis. Among these, the pattern matching technique is used to overcome the challenge of inconsistent data and to deliver the data quality needed for analytic results within bounded execution time. In organizations, pattern matching is generally performed manually. Pattern matching helps to discover the various value patterns within the data and to validate the values for any organization. This data pattern profiling method makes it possible to create a valid data set, which is used to generate reports for an organization's future analysis with greater accuracy. This study compares the results of the proposed data pattern logic with those of other open source tools and demonstrates the efficiency of the proposed logic.
Keywords: Big Data, Data Quality, Data Profiling, Pattern Matching, Outliers
Scope of the Article: Big Data Networking
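
The single-column pattern profiling described in the abstract can be illustrated with a minimal sketch. This is not the authors' proposed logic, which the abstract does not detail. It assumes a common convention in which every letter is mapped to "A" and every digit to "9", takes the most frequent pattern in the column as the valid one, and flags the remaining values. The function names and the sample PIN code column are hypothetical.

```python
from collections import Counter

def value_pattern(value: str) -> str:
    """Map letters to 'A' and digits to '9'; keep all other characters."""
    symbols = []
    for ch in value:
        if ch.isalpha():
            symbols.append("A")
        elif ch.isdigit():
            symbols.append("9")
        else:
            symbols.append(ch)
    return "".join(symbols)

def profile_column(values):
    """Count how often each pattern occurs in a single column."""
    return Counter(value_pattern(v) for v in values)

def flag_outliers(values, expected_pattern):
    """Return the values whose pattern differs from the expected one."""
    return [v for v in values if value_pattern(v) != expected_pattern]

# Hypothetical sample column (illustrative values, not data from the paper).
pin_codes = ["613001", "613 005", "61300A", "620001", "613007"]

profile = profile_column(pin_codes)
# Assumption: the most frequent pattern is treated as the valid one.
dominant, _ = profile.most_common(1)[0]
invalid = flag_outliers(pin_codes, dominant)

print(profile)
print(dominant, invalid)
```

In practice the expected pattern would more likely come from a business rule than from the dominant pattern, since a column in which most values are malformed would otherwise validate the wrong format.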