Neighbor Embedding Feature Selected Light Gradient Boosting Classification for Breast Cancer Detection with Gene Expression Data
S.Rajasekaran1, S.Sathyabama2

1S.Rajasekaran, Research Scholar, Bharathiar University, Coimbatore, Tamilnadu, India.

2Dr.S.Sathyabama, Assistant Professor, Department of Computer Science, Thiruvalluvar Government Arts College, Rasipuram, Namakkal, Tamilnadu, India.

Manuscript received on 11 September 2019 | Revised Manuscript received on 20 September 2019 | Manuscript Published on 11 October 2019 | PP: 645-654 | Volume-8 Issue-11S September 2019 | Retrieval Number: K110809811S19/2019©BEIESP | DOI: 10.35940/ijitee.K1108.09811S19

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open-access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Breast cancer is one of the most frequently diagnosed cancers among women worldwide. Accurate detection of breast cancer is essential for providing better treatment and minimizing risk to patients. Recently, collections of biological data such as gene expression, protein sequences, and DNA sequences have come into use for diagnosing the disease at an earlier stage, owing to improvements in accessible data mining techniques. Current state-of-the-art methods are reported to have certain limitations in their diagnostic capability. To improve breast cancer classification, an efficient technique called Gaussian Kernelized Neighbor Embedding based Light Gradient Boost Classification (GKNE-LGBC) is introduced. The GKNE-LGBC technique takes a benchmark microarray dataset and performs two processes, feature selection and classification, to detect breast cancer from gene expression data. The genes and their expression data are first collected from the microarray dataset. The Gaussian Kernelized stochastic neighbor embedding algorithm is then applied to select the relevant features (i.e., genes) and remove the irrelevant ones based on distance similarity. Next, the gene expression data are classified with the steepest descent light gradient boosting algorithm. The boosting algorithm initially constructs a number of weak learners, i.e., bivariate regression trees, to classify the input expression data as normal or cancerous using the selected features. The weak classifiers are then combined into a strong classifier by minimizing the training error. This improves breast cancer detection accuracy and reduces the false positive rate. The experimental evaluation is carried out on a gene microarray dataset using three metrics, breast cancer detection accuracy, false positive rate, and breast cancer detection time, each measured against the number of genes. The experimental results confirm that the proposed GKNE-LGBC technique identifies breast cancer with higher accuracy, lower time complexity, and a lower false positive rate than the state-of-the-art methods.
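
As a rough illustration of the two stages described in the abstract, the following Python sketch ranks genes with a Gaussian kernel on distances and then boosts regression stumps along the negative gradient of the log loss, reporting accuracy and false positive rate at the end. It is not the authors' GKNE-LGBC implementation, and it does not perform stochastic neighbor embedding itself. The scoring rule (kernel similarity between a standardized gene profile and the label vector), the one-gene stumps used in place of bivariate regression trees, the synthetic data, and all function names and parameter values are assumptions made for this example.

```python
import math
import random

def standardize(v):
    """Shift to zero mean and scale to unit variance (a std of 0 is treated as 1)."""
    m = sum(v) / len(v)
    s = math.sqrt(sum((x - m) ** 2 for x in v) / len(v)) or 1.0
    return [(x - m) / s for x in v]

def gaussian_similarity(a, b, sigma):
    """Gaussian kernel on the squared Euclidean distance between two vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def select_genes(X, y, k, sigma=None):
    """Assumed scoring rule: keep the k genes whose standardized profile is most
    similar (under the kernel) to the standardized label vector or its negation."""
    sigma = sigma or math.sqrt(len(X))
    target = standardize([float(v) for v in y])
    flipped = [-t for t in target]
    scores = []
    for g in range(len(X[0])):
        col = standardize([row[g] for row in X])
        s = max(gaussian_similarity(col, target, sigma),
                gaussian_similarity(col, flipped, sigma))
        scores.append((s, g))
    return sorted(g for _, g in sorted(scores, reverse=True)[:k])

def fit_stump(X, residuals, genes):
    """Least-squares regression stump: one gene, one threshold, two leaf means."""
    best = None
    for g in genes:
        values = sorted(set(row[g] for row in X))
        for t in values[:-1]:  # split between each value and the next larger one
            left = [r for row, r in zip(X, residuals) if row[g] <= t]
            right = [r for row, r in zip(X, residuals) if row[g] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, g, t, lm, rm)
    return best[1:]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_boosting(X, y, genes, rounds=20, lr=0.3):
    """Each round fits a stump to the negative gradient of the log loss (y - p)."""
    F = [0.0] * len(X)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        g, t, lm, rm = fit_stump(X, residuals, genes)
        stumps.append((g, t, lr * lm, lr * rm))
        F = [fi + (lr * lm if row[g] <= t else lr * rm) for fi, row in zip(F, X)]
    return stumps

def predict(stumps, row):
    """Sum the weak learners' outputs; a positive score means 'cancerous' (1)."""
    score = sum(lv if row[g] <= t else rv for g, t, lv, rv in stumps)
    return 1 if score > 0 else 0

def accuracy_and_fpr(y_true, y_pred):
    """Accuracy = correct / total; false positive rate = FP / (FP + TN)."""
    correct = sum(1 for a, b in zip(y_true, y_pred) if a == b)
    fp = sum(1 for a, b in zip(y_true, y_pred) if a == 0 and b == 1)
    tn = sum(1 for a, b in zip(y_true, y_pred) if a == 0 and b == 0)
    return correct / len(y_true), (fp / (fp + tn) if fp + tn else 0.0)

def make_data(n=120, n_genes=40, informative=(3, 17), seed=0):
    """Synthetic stand-in for a microarray: only the 'informative' genes shift."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y

X, y = make_data()
X_train, y_train, X_test, y_test = X[:80], y[:80], X[80:], y[80:]
genes = select_genes(X_train, y_train, k=5)
model = fit_boosting(X_train, y_train, genes)
y_pred = [predict(model, row) for row in X_test]
acc, fpr = accuracy_and_fpr(y_test, y_pred)
print("selected genes:", genes)
print("accuracy: %.2f, false positive rate: %.2f" % (acc, fpr))
```

Selecting genes on the training split only, as done here, keeps the held-out samples from influencing which features the classifier sees. On real microarray data a library implementation would normally replace these hand-written loops.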

Keywords: Benchmark microarray dataset, gene expression data, breast cancer detection, Gaussian Kernelized stochastic neighbor embedding, feature selection, steepest descent light gradient boosting algorithm, bivariate regression tree
Scope of the Article: Classification