Optimized Partitioning Based Genetic Algorithm For Generating Mining Frequent Patterns From Big Data Sets
Chandaka Babi1, M.Venkateswara Rao2, V.Venkateswara Rao3, Bhanuja Arketla4
1Chandaka Babi, Department of IT, Gitam University, Visakhapatnam (Andhra Pradesh), India.
2Dr M.Venkateswara Rao, Department of IT, Gitam University, Visakhapatnam (Andhra Pradesh), India.
3Dr V.Venkateswara Rao, Department of CSE, Sri Vasavi Engineering College, Tadepalligudem (Andhra Pradesh), India.
4Bhanuja Arketla, Department of CSC, Gitam University, Visakhapatnam (Andhra Pradesh), India.
Manuscript received on 01 May 2019 | Revised Manuscript received on 15 May 2019 | Manuscript published on 30 May 2019 | PP: 289-295 | Volume-8 Issue-7, May 2019 | Retrieval Number: G5193058719/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Data mining has been developed and investigated extensively in terms of both technologies and methodologies. Frequent pattern mining is one of the core data mining tasks and is applied mostly to merchandising (retail) data. Our work aims to find all sequential patterns that satisfy a user-specified minimum support threshold, where the support of a pattern is defined as the number of occurrences of that pattern in the data. This paper concentrates on problems associated with frequent pattern mining for knowledge-based systems. These problems are analyzed thoroughly, solutions are proposed for the shortcomings of previous approaches, and new techniques are developed for mining frequent patterns. The work first reviews previous research in frequent pattern mining and then proposes an Apriori algorithm optimized with a genetic algorithm for finding frequent patterns. The running time of a procedure that discovers frequent itemsets usually depends on the total number of candidates generated at each level and on the time consumed in reading the data set. The proposed method reduces both the scanning time and the number of candidate itemsets generated at each step, because the database is read only once and an intermediate data set is then constructed at each step. The association rules generated by the Apriori algorithm are further optimized using a genetic algorithm: to produce strong association rules, the algorithm applies the genetic operators selection, crossover and mutation to the rules produced by Apriori. A parallel algorithm is also proposed to mine frequent patterns with a user-specified minimum support. The job is distributed among n processors to compute the frequent itemsets, which requires communication between the processors.
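The single-scan, level-wise procedure described above can be illustrated with a short sketch. The abstract gives no pseudocode, so the Python below is an AprioriTid-style variant (a standard technique swapped in here, not necessarily the authors' exact method): the raw transactions are read once, and every later level works only on an intermediate data set that records, for each transaction, the frequent itemsets it contains. The names `apriori_single_scan` and `generate_candidates` are hypothetical, and `min_support` is assumed to be an absolute transaction count.

```python
from collections import Counter
from itertools import combinations


def generate_candidates(frequent_k, k):
    """Hypothetical helper: join frequent k-itemsets into (k+1)-candidates and
    prune any candidate with an infrequent k-subset (the Apriori property)."""
    candidates = set()
    frequent_list = list(frequent_k)
    for i in range(len(frequent_list)):
        for j in range(i + 1, len(frequent_list)):
            union = frequent_list[i] | frequent_list[j]
            if len(union) == k + 1 and all(
                frozenset(sub) in frequent_k for sub in combinations(union, k)
            ):
                candidates.add(union)
    return candidates


def apriori_single_scan(transactions, min_support):
    """Sketch only (AprioriTid-style, assumed): return {itemset: support count}
    for every itemset contained in at least min_support transactions."""
    # The only pass over the raw data: each transaction becomes the set of
    # 1-itemsets it contains (the level-1 intermediate data set).
    intermediate = [{frozenset([item]) for item in t} for t in transactions]
    counts = Counter(s for entry in intermediate for s in entry)
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 1
    while frequent:
        # Keep only the frequent k-itemsets of each transaction entry.
        intermediate = [{s for s in entry if s in frequent} for entry in intermediate]
        candidates = generate_candidates(set(frequent), k)
        counts = Counter()
        next_intermediate = []
        for entry in intermediate:
            # A (k+1)-candidate occurs in a transaction iff all of its
            # k-subsets do, so the raw transactions are never rescanned.
            contained = {
                c for c in candidates
                if all(frozenset(sub) in entry for sub in combinations(c, k))
            }
            counts.update(contained)
            if contained:
                next_intermediate.append(contained)
        intermediate = next_intermediate
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

For example, on five small market-basket transactions with `min_support=3`, `apriori_single_scan` reports `{diaper, beer}` with support 3, while the candidate `{bread, milk, diaper}` occurs in only two transactions and is discarded. The design choice is that the intermediate data set shrinks level by level, which is what reduces the per-level scanning cost.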
The time required to complete the job is much lower than with other algorithms. The key disadvantage of this procedure is execution time, since the number of processors required increases as the number of data items grows. To make it more efficient, a partition algorithm has been designed in which a separate partition is created for each set of data items. To obtain the count of a particular itemset, the entire database need not be scanned; only the relevant partition is read. Consequently, the scan time is reduced. Overall, the partition algorithm performs better than the existing algorithms.
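The abstract does not specify how the partitions are laid out. One plausible reading, assumed here and not confirmed by the text, is a vertical layout: each item gets its own partition holding the IDs of the transactions that contain it, so the support of an itemset is the size of the intersection of its items' partitions and no other part of the database is touched. The function names `build_item_partitions` and `itemset_support` are hypothetical.

```python
from collections import defaultdict


def build_item_partitions(transactions):
    """Hypothetical helper (assumed vertical layout): one pass over the data
    maps each item to the set of transaction IDs that contain it."""
    partitions = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            partitions[item].add(tid)
    return dict(partitions)


def itemset_support(partitions, itemset):
    """Support count of `itemset`, read only from the partitions of its own
    items (the intersection of their transaction-ID sets)."""
    tid_sets = [partitions.get(item, set()) for item in itemset]
    if not tid_sets:
        raise ValueError("itemset must contain at least one item")
    return len(set.intersection(*tid_sets))
```

Usage: after `partitions = build_item_partitions(transactions)`, a call such as `itemset_support(partitions, ["diaper", "beer"])` consults only the `diaper` and `beer` partitions. Items that never occur map to an empty set, so their support is 0.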
Keywords: Big Data, Data Mining, Frequent Pattern Mining, Frequent Item Sets, Hybrid Apriori, Association Rules, Partitioning Algorithm, Parallel Execution.
Scope of the Article: Data Mining.