Optimal Classification of Lung Cancer Related Genes using Enhanced reliefF Algorithm and Multiclass Support Vector Machine
Ashok K Patil1, Siddanagouda S Patil2, M Prabhakar3
1Ashok K Patil, School of Computing and Information Technology, Reva University, Bangalore, India.
2Siddanagouda S Patil, Agril Statistics, Applied Mathematics & Computer Science, University of Agricultural Sciences, Bangalore, India.
3M Prabhakar, School of Computing and Information Technology, Reva University, Bangalore, India.
Manuscript received on 05 July 2019 | Revised Manuscript received on 09 July 2019 | Manuscript published on 30 August 2019 | PP: 771-778 | Volume-8 Issue-10, August 2019 | Retrieval Number: J89010881019/2019©BEIESP | DOI: 10.35940/ijitee.J8901.0881019
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Automatic lung cancer classification remains challenging because gene expression data are noisy and high dimensional, and the number of samples is small. To address these problems, an enhanced gene selection algorithm and a multiclass classifier are developed. In this research, lung cancer-related gene expression data (GEO IDs: GSE10245, GSE19804, GSE7670, GSE10072, and GSE6044) were collected from the Gene Expression Omnibus (GEO) database. Gene selection was then carried out with an enhanced reliefF algorithm to select the optimal genes. In the enhanced reliefF algorithm, the earth mover's distance measure and a firefly optimizer were used instead of the Manhattan distance measure to identify the nearest hit and nearest miss instances, which significantly reduces the “curse of dimensionality” problem. The selected genes were given as input to a Multiclass Support Vector Machine (MSVM) classifier to classify the sub-classes of lung cancer. The experiments showed that the proposed system improved classification accuracy by 3-10% over existing systems, with performance assessed in terms of accuracy, False Positive Rate (FPR), error rate, and True Positive Rate (TPR).
Keywords: Enhanced reliefF algorithm, firefly optimizer, gene selection, microarray gene expression, multiclass support vector machine.
Scope of the Article: Classification
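
Illustrative sketch: The abstract describes a reliefF variant in which the distance used to find the nearest hit and nearest miss is replaced. The following is a minimal sketch of a baseline Relief-style scorer with a pluggable distance function, not the authors' implementation. It assumes one nearest hit and one nearest miss per class, features scaled to [0, 1], and the standard Manhattan distance as the default. The earth mover's distance and the firefly optimizer from the paper are not implemented here; the names `relief_scores`, `manhattan`, and the toy data are hypothetical.

```python
def manhattan(a, b):
    """L1 distance between two equal-length expression vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))


def relief_scores(X, y, distance=manhattan):
    """Relief-style feature weights for multiclass data (illustrative only).

    X: list of samples, each a list of feature values scaled to [0, 1].
    y: list of class labels, one per sample.
    distance: function used to pick the nearest hit and nearest miss;
              swapping it is the hook where another measure could go.
    """
    n, d = len(X), len(X[0])
    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    weights = [0.0] * d

    for i in range(n):  # visit every sample once (deterministic)
        xi, ci = X[i], y[i]

        # Nearest hit: closest other sample of the same class.
        hits = [j for j in range(n) if j != i and y[j] == ci]
        if not hits:
            continue
        hit = min(hits, key=lambda j: distance(xi, X[j]))
        for f in range(d):
            weights[f] -= abs(xi[f] - X[hit][f]) / n

        # Nearest miss from each other class, weighted by class prior.
        for c in classes:
            if c == ci:
                continue
            misses = [j for j in range(n) if y[j] == c]
            miss = min(misses, key=lambda j: distance(xi, X[j]))
            scale = prior[c] / (1.0 - prior[ci])
            for f in range(d):
                weights[f] += scale * abs(xi[f] - X[miss][f]) / n

    return weights


# Toy data: feature 0 separates the three classes, feature 1 is noise.
X = [
    [0.00, 0.5], [0.05, 0.9],  # class A
    [0.50, 0.1], [0.55, 0.6],  # class B
    [1.00, 0.8], [0.95, 0.2],  # class C
]
y = ["A", "A", "B", "B", "C", "C"]

scores = relief_scores(X, y)
ranked = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
print(scores, ranked)  # feature 0 ranks above feature 1
```

In a pipeline like the one the abstract outlines, the top-ranked features would then be passed to a multiclass classifier; that step is omitted here to keep the sketch dependency-free.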