F61600486S19 - International Journal of Innovative Technology and Exploring Engineering (IJITEE)

Effective Utilization of Storage Space by Applying File Level and Block-Level Deduplication over HDFS
Sachin Arun Thanekar¹, Kodukula Subrahmanyam², Ali Akbar Bagwan³

¹Sachin Arun Thanekar, Ph.D. Scholar, Department of Computer Science & Engineering, KLEF, Vaddeswaram, Guntur, Andhra Pradesh, India.

²Kodukula Subrahmanyam, Professor, Department of Computer Science & Engineering, KLEF, Vaddeswaram, Guntur, Andhra Pradesh, India.

³Ali Akbar Bagwan, Professor, Department of Computer Engineering, Rajarshi Shahu College of Engineering, Tathwade, Pune (Maharashtra), India.

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open-access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: The Hadoop framework handles the storage and processing of very large data sets efficiently and with relative ease. It uses large clusters of commodity hardware to store and process massive data in a distributed fashion. Its open-source availability, capacity for handling massive data, and fast processing have made it very popular. The existing Hadoop framework discards the metadata of preceding jobs: it allocates DataNodes without regard to what it has processed earlier, and hence reads data from all DataNodes for each new job. It makes no provision for checking relationships between similar data blocks, which weakens Hadoop's overall performance. Uploaded big data files are partitioned into a number of blocks and distributed over the nodes of the cluster. To avoid random block distribution and data duplication, a deduplication system is used. Such a system focuses on space management and only keeps track of data files on the Hadoop Distributed File System (HDFS); it does not contribute to efficient job execution in the MapReduce environment. For efficient job execution, data locality information and job metadata are stored. Preserving job metadata reduces the time required for the next execution of the same job. An environment that combines the two produces efficient job execution together with efficient space management.
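
The distinction between file-level and block-level deduplication named in the title can be illustrated with a minimal sketch. It is not the authors' implementation: the `DedupStore` class, the use of SHA-256 fingerprints, the fixed-size chunking, and the in-memory dictionaries standing in for HDFS metadata are all assumptions made for illustration, and the 4-byte block size is chosen only to keep the example small.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory store combining file-level and block-level dedup."""

    def __init__(self, block_size=4):
        # Tiny block size for illustration only; real HDFS blocks are far larger.
        self.block_size = block_size
        self.blocks = {}      # block fingerprint -> block bytes (stored once)
        self.files = {}       # file name -> ordered list of block fingerprints
        self.file_index = {}  # whole-file fingerprint -> name of first file with that content

    @staticmethod
    def _fingerprint(data):
        # SHA-256 is an assumption; the abstract does not name a hash function.
        return hashlib.sha256(data).hexdigest()

    def put(self, name, data):
        """Store data under name; return the number of new bytes physically written."""
        file_fp = self._fingerprint(data)
        if file_fp in self.file_index:
            # File-level dedup: identical content exists, so reuse its block list.
            self.files[name] = self.files[self.file_index[file_fp]]
            return 0
        written = 0
        recipe = []
        for start in range(0, len(data), self.block_size):
            block = data[start:start + self.block_size]
            block_fp = self._fingerprint(block)
            if block_fp not in self.blocks:
                # Block-level dedup: only previously unseen blocks are written.
                self.blocks[block_fp] = block
                written += len(block)
            recipe.append(block_fp)
        self.files[name] = recipe
        self.file_index[file_fp] = name
        return written

    def get(self, name):
        """Reassemble a file from its stored blocks."""
        return b"".join(self.blocks[fp] for fp in self.files[name])
```

In this sketch, storing the same 12-byte file twice writes 12 bytes and then 0 bytes, because the second upload is caught by the whole-file fingerprint. A third file that shares its first two blocks with the first writes only its one new block.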

Keywords: HDFS, Hadoop, MapReduce, Big Data, H2Hadoop.
Scope of the Article: Computer Science and Its Applications
