Extracting Subset of Relevant Features for Breast Cancer to Improve Accuracy of Classifier
Rajesh Saturi¹, Raju Dara², P. Prem Chand³
¹ Rajesh Saturi, Research Scholar, Department of CSE, University College of Engineering, Osmania University, Hyderabad-500007, Telangana State, India.
² Dr. Raju Dara, Professor, Department of Computer Science & Engineering, Vignana Bharathi Institute of Technology, Hyderabad, Telangana, India.
³ Prof. P. Prem Chand, Professor, Department of CSE, University College of Engineering, Osmania University, Hyderabad-500007, Telangana State, India.
Manuscript received on 26 August 2019. | Revised Manuscript received on 08 September 2019. | Manuscript published on 30 September 2019. | PP: 1670-1674 | Volume-8 Issue-11, September 2019. | Retrieval Number: K15070981119/2019©BEIESP | DOI: 10.35940/ijitee.K1507.0981119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Data mining is the essential step that identifies hidden patterns in large repositories, and medical diagnosis has become a major area of current research in data mining. Machine learning techniques use statistical methods that enable a machine to improve with experience and identify hidden patterns in data. Examples include regression, clustering and classification algorithms, neural networks (ANN, CNN, DL), recommender-system algorithms, the Apriori algorithm, page-ranking algorithms, and text search and NLP (natural language processing). However, owing to a lack of evaluation, these algorithms have been unsuccessful in finding a better classifier for images to estimate classification accuracy in medical image processing. Classification is a supervised learning technique that predicts the class of an unknown object; its main purpose is to identify that class by consulting the characteristics of neighboring classes. Clustering is known as unsupervised learning because it labels objects according to the degree of similarity of their characteristics, without consulting a class label. The main principle of clustering is to measure distances (near or far) based on similarities and dissimilarities and to group the objects accordingly; hence it can be used to identify outliers (objects that lie far away from the others). Feature extraction, or variable selection, is a method of obtaining a subset of relevant characteristics from a large dataset. Too many features in a class may reduce the accuracy of a classifier; therefore, feature extraction techniques can be used to eliminate irrelevant attributes and increase the accuracy of the classifier. In this paper we performed an induction to increase the accuracy of the classifier by applying mining techniques in the WEKA tool.
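The claim that irrelevant attributes can hurt a neighbour-based classifier can be illustrated with a small sketch. It is not the authors' WEKA procedure: it assumes a hypothetical six-sample toy dataset (not the breast cancer data), a 1-nearest-neighbour classifier scored by leave-one-out accuracy, and an exhaustive "wrapper" search over feature subsets. The helper names `loo_1nn_accuracy` and `best_subset` are illustrative only.

```python
import math
from itertools import combinations

# Hypothetical toy data (not from the paper): feature 0 separates the two
# classes, feature 1 is irrelevant noise on a much larger scale.
X = [(1.0, 0.0), (1.2, 10.0), (0.8, 20.0),
     (5.0, 0.5), (5.2, 10.5), (4.8, 20.5)]
y = [0, 0, 0, 1, 1, 1]


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-nearest-neighbour using only `features`."""
    correct = 0
    for i, xi in enumerate(X):
        # Predict the held-out point's class from its nearest remaining neighbour.
        _, pred = min(
            (math.dist([xi[f] for f in features], [xj[f] for f in features]), y[j])
            for j, xj in enumerate(X) if j != i
        )
        correct += pred == y[i]
    return correct / len(X)


def best_subset(X, y):
    """Exhaustive wrapper search: score every non-empty subset, keep the best."""
    n = len(X[0])
    subsets = [c for k in range(1, n + 1) for c in combinations(range(n), k)]
    return max(subsets, key=lambda s: loo_1nn_accuracy(X, y, s))


print(loo_1nn_accuracy(X, y, (0, 1)))  # all features: noise dominates distances
print(loo_1nn_accuracy(X, y, (0,)))    # relevant feature only
print(best_subset(X, y))
```

On this toy data every leave-one-out prediction is wrong when the noisy feature is kept, and every prediction is right once it is dropped. Exhaustive search scores 2^n - 1 subsets, so with many attributes a greedy or filter-based selector is the usual practical substitute.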
The Wisconsin Breast Cancer dataset was chosen from the UCI Machine Learning Repository for analysis, and an experimental analysis was conducted in the WEKA tool on the training dataset by applying the Naïve Bayes, BayesNet, PART, ZeroR, J48 and Random Forest techniques. Finally, the classifier with the highest accuracy is presented.
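The comparison step (fit several classifiers, score each on the training data, report the most accurate) can be sketched in plain Python for two of the listed techniques. This is a minimal sketch under stated assumptions, not the WEKA implementation. ZeroR is modelled as "always predict the majority class". Naïve Bayes is assumed to be the Gaussian variant for numeric attributes. The six labelled samples are hypothetical, not the Wisconsin data. BayesNet, PART, J48 and Random Forest are omitted for brevity.

```python
import math
from collections import Counter

# Hypothetical toy data (NOT the Wisconsin dataset): two numeric features per sample.
X = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (0.9, 1.9), (5.0, 6.0), (5.5, 6.2)]
y = ["benign"] * 4 + ["malignant"] * 2


def fit_zero_r(X, y):
    """ZeroR-style baseline: ignore the features, always predict the majority class."""
    majority = Counter(y).most_common(1)[0][0]
    return lambda x: majority


def fit_gaussian_nb(X, y, var_eps=1e-9):
    """Gaussian naive Bayes: class prior times an independent normal per feature."""
    stats = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        # Population variance plus a tiny epsilon so no variance is exactly zero.
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + var_eps
                     for col, m in zip(zip(*rows), means)]
        stats[c] = (math.log(len(rows) / len(X)), means, variances)

    def predict(x):
        def log_posterior(c):
            log_prior, means, variances = stats[c]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                for v, m, var in zip(x, means, variances)
            )
        return max(stats, key=log_posterior)

    return predict


def training_accuracy(model, X, y):
    """Accuracy measured on the same data the model was fitted on."""
    return sum(model(x) == label for x, label in zip(X, y)) / len(y)


scores = {
    "ZeroR": training_accuracy(fit_zero_r(X, y), X, y),
    "NaiveBayes": training_accuracy(fit_gaussian_nb(X, y), X, y),
}
best = max(scores, key=scores.get)
print(scores, best)
```

ZeroR's score (here 4/6) is the floor any useful classifier should beat. Accuracy measured on the training set tends to be optimistic, so cross-validation or a held-out test set gives a fairer comparison between classifiers.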
Keywords: Breast cancer, UCI machine learning, classifier, supervised learning, features.
Scope of the Article: Machine Learning