Role of Pre-processing Phase in Document Clustering Technique for Gurmukhi Script
Mukesh Kumar1, Amandeep Verma2

1Mukesh Kumar, Dept. of Computer Science, Mata Gujri College, Fatehgarh Sahib, Punjab, India.
2Amandeep Verma, Punjabi University Regional Centre for Information Technology & Management, Mohali, Punjab, India.
Manuscript received on December 14, 2019. | Revised Manuscript received on December 24, 2019. | Manuscript published on January 10, 2020. | PP: 3216-3220 | Volume-9 Issue-3, January 2020. | Retrieval Number: C9105019320/2020©BEIESP | DOI: 10.35940/ijitee.C9105.019320
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Document clustering plays a central role in knowledge discovery and data mining by partitioning large data sets into a number of groups of data objects called clusters. Each cluster consists of data objects that are more similar to one another than to the data objects of other clusters. The document clustering technique for Gurmukhi script consists of two phases: 1) the pre-processing phase and 2) the processing phase. This paper concentrates on the pre-processing phase, whose purpose is to convert unstructured text into a structured format. Its sub-phases are segmentation, tokenization, stop-word removal, stemming, and normalization. The paper aims to show the significant role of the pre-processing phase in the overall performance of the document clustering technique for Gurmukhi script. The experimental results demonstrate this role both in assigning data objects to the relevant clusters and in creating a meaningful cluster title list.
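The pre-processing sub-phases listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The stop-word list (`STOP_WORDS`), the suffix rules (`SUFFIX_RULES`, `MIN_STEM`), and all function names are hypothetical. Sentence segmentation on the danda (।) and double danda (॥) is also an assumption. "Normalization" is interpreted here as Unicode NFC normalization and is applied first, so that stop-word and suffix matching compare canonical code-point sequences. The abstract lists normalization last, and the paper's step may be a different operation.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical, deliberately tiny stop-word list of common Punjabi function words;
# a real system would use a curated list. NFC-normalized so it matches normalized text.
STOP_WORDS = {unicodedata.normalize("NFC", w) for w in
              ["ਦਾ", "ਦੀ", "ਦੇ", "ਦੀਆਂ", "ਅਤੇ", "ਨੂੰ", "ਵਿੱਚ", "ਤੋਂ", "ਨੇ", "ਕਿ", "ਇਹ", "ਹੈ", "ਹਨ"]}

# Hypothetical suffix rules (suffix -> replacement), tried in order, longest first:
# plural "-ੀਆਂ" -> "-ੀ" (ਕੁੜੀਆਂ -> ਕੁੜੀ) and plural "-ਾਂ" -> "" (ਕਿਤਾਬਾਂ -> ਕਿਤਾਬ).
SUFFIX_RULES = [("ੀਆਂ", "ੀ"), ("ਾਂ", "")]
MIN_STEM = 2  # keep at least this many code points in front of a stripped suffix

# Assumed sentence boundaries: danda (U+0964), double danda (U+0965), ?, ! and newline.
SENTENCE_END = re.compile(r"[।॥?!\n]+")
# A token is a run of code points from the Gurmukhi block (letters, vowel signs, tippi,
# addak, digits) or an ASCII alphanumeric run. Python's \w excludes combining vowel
# signs (str.isalnum() is False for them), so an explicit block range is used instead.
TOKEN = re.compile(r"[\u0A00-\u0A7F]+|[0-9A-Za-z]+")


def normalize(text: str) -> str:
    """NFC maps precomposed letters such as ਸ਼ (U+0A36) and ਸ + nukta to one sequence."""
    return unicodedata.normalize("NFC", text)


def segment(text: str) -> list[str]:
    """Split a document into non-empty sentences at sentence-final punctuation."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Return the word tokens of one sentence; punctuation and spaces are dropped."""
    return TOKEN.findall(sentence)


def remove_stop_words(tokens: list[str]) -> list[str]:
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token: str) -> str:
    """Apply the first matching suffix rule if enough of the word would remain."""
    for suffix, replacement in SUFFIX_RULES:
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM:
            return token[: -len(suffix)] + replacement
    return token


def preprocess(document: str) -> list[list[str]]:
    """Normalize, segment, tokenize, drop stop words and stem; one list per sentence."""
    sentences = segment(normalize(document))
    processed = ([stem(t) for t in remove_stop_words(tokenize(s))] for s in sentences)
    return [tokens for tokens in processed if tokens]


def term_frequencies(document: str) -> Counter:
    """Bag-of-words counts: one structured representation a clustering phase could use."""
    return Counter(t for sentence in preprocess(document) for t in sentence)


if __name__ == "__main__":
    doc = "ਇਹ ਕਿਤਾਬਾਂ ਪੰਜਾਬੀ ਵਿੱਚ ਹਨ। ਕੁੜੀਆਂ ਕਿਤਾਬ ਪੜ੍ਹਦੀਆਂ ਹਨ।"
    print(preprocess(doc))
    print(term_frequencies(doc))
```

With the sample document, stop words such as ਇਹ, ਵਿੱਚ and ਹਨ are dropped, and the plural ਕਿਤਾਬਾਂ and the singular ਕਿਤਾਬ collapse to one term with a count of 2. This illustrates how pre-processing can merge inflected forms before any clustering takes place.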
Keywords: Document Clustering, Gurmukhi Script Clustering Technique, Pre-processing Phase, Punjabi Text Document Clustering, Data Mining Techniques, Machine Learning, Unsupervised Learning.
Scope of the Article: Clustering