Tanimoto Coefficient Similarity based Mean Shift Gentle Adaptive Boosted Clustering for Genomic Predictive Pattern Analytics
Marrynal S Eastaff1, V Saravanan2
1Marrynal S Eastaff*, Ph.D Research Scholar, Department of Computer Science, Hindusthan College of Arts and Science, Coimbatore, India.
2Dr. V. Saravanan, Associate Professor and Head, Department of IT, Hindusthan College of Arts and Science, Coimbatore, India.
Manuscript received on October 10, 2019. | Revised Manuscript received on 20 October, 2019. | Manuscript published on November 10, 2019. | PP: 2034-2042 | Volume-9 Issue-1, November 2019. | Retrieval Number: L38171081219/2019©BEIESP | DOI: 10.35940/ijitee.L3817.119119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Clustering gene expression data is an important problem because it reveals functional relationships among genes in a biological process, yet finding co-expressed groups of genes remains challenging. To identify interesting patterns in a given gene expression data set, a Tanimoto Coefficient Similarity based Mean Shift Gentle Adaptive Boosted Clustering (TCS-MSGABC) model is proposed. The TCS-MSGABC model comprises two processes: feature selection and clustering. In the first process, Tanimoto Coefficient Similarity Measurement based Feature Selection (TCSM-FS) identifies the relevant gene features from their similarity values. The Tanimoto coefficient similarity value ranges from ‘0’ to ‘1’, where ‘1’ is the highest similarity. Gene features with higher similarity values are retained for clustering. After feature selection, the Mean Shift Gentle Adaptive Boosted Clustering (MSGABC) algorithm clusters similar gene expression data based on the selected features. MSGABC is a boosting method that combines many weak clustering results into one strong learner. In this way, similar gene expression data are clustered with higher accuracy and in less time. The TCS-MSGABC model is evaluated experimentally on clustering accuracy, clustering time and error rate with respect to the number of gene data. The results show that the TCS-MSGABC model increases clustering accuracy and reduces clustering time for genomic predictive pattern analytics compared with state-of-the-art works.
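As an illustration of the feature selection step described in the abstract, the following minimal sketch computes the Tanimoto coefficient between two numeric vectors and keeps the features whose similarity meets a threshold. The abstract does not say what each gene feature is compared against, so the `reference` profile, the `threshold` value, the function names and the toy data are all assumptions made for this example; it is not the authors' TCSM-FS implementation. For non-negative vectors the coefficient computed here lies between 0 and 1.

```python
def tanimoto(a, b):
    """Tanimoto coefficient a.b / (|a|^2 + |b|^2 - a.b) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # Assumption: two all-zero vectors are treated as identical.
    return 1.0 if denom == 0 else dot / denom


def select_features(features, reference, threshold=0.5):
    """Return the names of features whose similarity to `reference` meets `threshold`.

    `reference` and `threshold` are hypothetical; the abstract does not specify them.
    """
    return [name for name, values in features.items()
            if tanimoto(values, reference) >= threshold]


# Toy data, invented for illustration only.
features = {
    "gene_a": [1, 1, 0, 1],
    "gene_b": [0, 0, 1, 0],
    "gene_c": [1, 0, 0, 1],
}
reference = [1, 1, 0, 1]
selected = select_features(features, reference, threshold=0.5)
print(selected)
```

With this toy data, `gene_a` matches the reference exactly, `gene_b` shares nothing with it, and `gene_c` overlaps partially, so only `gene_a` and `gene_c` pass the 0.5 threshold.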
Keywords: Genomic, Mean Shift Gentle Adaptive Boosted Clustering, Strong Learner, Tanimoto Coefficient Similarity, Weak Cluster, Weight
Scope of the Article: Clustering
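The abstract names mean shift as the clustering step that the boosting stage builds on. The sketch below shows a plain flat-kernel mean shift in pure Python, to make the idea of shifting each point toward a local density mode concrete. It covers only a single mean shift pass, not the gentle adaptive boosting that combines weak clustering results. The `bandwidth` value, the rule that merges modes closer than half the bandwidth, and the toy points are assumptions for illustration, not details taken from the paper.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift. Returns (labels, centers)."""
    modes = [list(p) for p in points]
    for _ in range(max_iter):
        moved = 0.0
        new_modes = []
        for m in modes:
            # Data points inside the window centred on the current mode estimate.
            neighbors = [p for p in points if math.dist(m, p) <= bandwidth]
            # Shift the estimate to the mean of those neighbours.
            centroid = [sum(c) / len(neighbors) for c in zip(*neighbors)]
            moved = max(moved, math.dist(m, centroid))
            new_modes.append(centroid)
        modes = new_modes
        if moved < tol:
            break
    # Assumption: modes closer than bandwidth / 2 belong to the same cluster.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers


# Toy two-dimensional expression profiles, invented for illustration only.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
labels, centers = mean_shift(points, bandwidth=1.0)
print(labels)
```

Because the number of clusters falls out of the bandwidth rather than being fixed in advance, the choice of `bandwidth` largely determines the result; here the two well-separated groups end up in two clusters.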