Faster Query Response for Streaming Data using Probabilistic Data Storage Model
Ramesh Balasubramaniam1, K. Nandhini2
1Ramesh Balasubramaniam, Research Scholar, PG and Research Department of Computer Science, Chikkanna Govt. Arts College (Bharathiyar University), Tirupur (Tamil Nadu), India.
2Dr. K. Nandhini, Assistant Professor, PG and Research Department of Computer Science, Chikkanna Govt. Arts College (Bharathiyar University), Tirupur (Tamil Nadu), India.
Manuscript received on 01 May 2019 | Revised Manuscript received on 15 May 2019 | Manuscript published on 30 May 2019 | PP: 1556-1560 | Volume-8 Issue-7, May 2019 | Retrieval Number: G5988058719/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Instead of making programmatic changes to reduce query response time, we focus on how the data is stored. For streaming data, an efficient way to speed up the delivery of results is to store the data compactly in a compressed data storage model, which benefits real-time analytics. This paper presents PDSM, a Probabilistic Data Storage Model that uses probabilistic data structures to answer real-time queries on data streams. PDSM stores streaming data using hash functions as the primary mapping method for incoming stream elements. Hash functions map a data set of arbitrary size to a data set of fixed size, so regardless of the size of the source dataset, PDSM has a smaller, fixed storage size, which makes it attractive as a storage model. PDSM uses sketches (the Count-min probabilistic data structure) to record the frequency of occurrence of stream elements and filtering (the Bloom Filter probabilistic data structure) to check the membership of elements in a stream. PDSM is used to answer the most sought-after queries, thereby reducing query time on the most frequent questions. Because PDSM uses probabilistic data structures, its results are estimates rather than exact answers. For streaming data, however, this is not a critical factor: not every element in the stream is required, and a probabilistic estimate is a close enough answer to the query. Because the storage used is minimal, this paper proposes PDSM as a solution for faster query response on streaming data and for the storage of big static datasets.
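The two structures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' PDSM implementation: the class names, the salted SHA-256 hashing scheme, the table sizes (`width`, `depth`, `size_bits`, `num_hashes`) and the sample stream are all assumptions chosen for illustration. It shows only the general idea that hashed, fixed-size tables can answer frequency and membership queries on a stream.

```python
import hashlib


def _hashes(item: str, count: int, modulus: int):
    """Yield `count` indices in [0, modulus) from salted SHA-256 digests.

    The salting scheme is an illustrative assumption, not taken from the paper.
    """
    for seed in range(count):
        digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
        yield int.from_bytes(digest[:8], "big") % modulus


class CountMinSketch:
    """Fixed-size frequency table; estimates never fall below the true count."""

    def __init__(self, width: int = 2048, depth: int = 4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def add(self, item: str, count: int = 1) -> None:
        # Increment one counter per row, at the column chosen by that row's hash.
        for row, col in enumerate(_hashes(item, self.depth, self.width)):
            self.table[row][col] += count

    def estimate(self, item: str) -> int:
        # Collisions can only inflate counters, so the minimum is the best estimate.
        return min(
            self.table[row][col]
            for row, col in enumerate(_hashes(item, self.depth, self.width))
        )


class BloomFilter:
    """Fixed-size bit array; may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def add(self, item: str) -> None:
        for idx in _hashes(item, self.num_hashes, self.size):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[idx // 8] & (1 << (idx % 8))
            for idx in _hashes(item, self.num_hashes, self.size)
        )


# Hypothetical stream of event names, used only to exercise the structures.
stream = ["login", "search", "login", "logout", "login", "search"]

cms = CountMinSketch()
bloom = BloomFilter()
for element in stream:
    cms.add(element)
    bloom.add(element)

login_estimate = cms.estimate("login")  # an upper-bounded-by-collisions estimate, at least 3
seen_login = "login" in bloom           # True: Bloom filters have no false negatives
```

Both structures keep the same memory footprint however many elements pass through them, which is the property the abstract relies on. The trade-off is the one the abstract states: a frequency estimate may be too high and a membership test may return a false positive, but neither structure undercounts or misses an element it has seen.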
Keywords: Faster Query Response, Hashing, Little Memory, Probabilistic Data Structures, Sketching.
Scope of the Article: Data Modelling, Mining and Data Analytics.