Preprocessing for Parts of Speech (POS) Tagging in Dogri Language
Shivangi Dutta1, Bhavna Arora2
1Shivangi Dutta, Department of Computer Science & IT, Central University of Jammu, Jammu, India.
2Bhavna Arora, Department of Computer Science & IT, Central University of Jammu, Jammu, India.
Manuscript received on 08 June 2019 | Revised Manuscript received on 13 June 2019 | Manuscript Published on 08 July 2019 | PP: 114-120 | Volume-8 Issue-8S3 June 2019 | Retrieval Number: H10320688S319/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open-access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Natural language processing (NLP) is regarded as one of the most important fields of computer science, information retrieval and artificial intelligence. One challenging task in NLP is parts of speech (POS) tagging, the process of labelling each word in a corpus with its part of speech. English grammar recognises eight major parts of speech: noun, pronoun, verb, adjective, adverb, preposition, conjunction and interjection. Over the past few years, researchers have carried out a considerable amount of work on both supervised and unsupervised tagging. These tagging methods are further divided into rule-based, stochastic and hybrid approaches. The language chosen for this research is Dogri, which is written in the Devanagari script. The paper reviews related work on languages that share this script, which helps in selecting an appropriate technique for POS tagging of Dogri. The paper also presents a grammatical and inflectional analysis of Dogri, along with a few rules for designing a POS tagger. A section of the paper demonstrates the results of preprocessing, i.e. tokenization and stemming of Dogri text, which are the initial steps in POS tagging.
Keywords: Dogri language, Parts of speech tagging, stemming, tokenization.
Scope of the Article: Natural Language Processing
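The two preprocessing steps named in the abstract, tokenization and stemming, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `tokenize` and `stem`, the regular expression, and the suffix list are all assumptions made for illustration. In particular, the suffixes are generic Devanagari endings used as placeholders, not a validated Dogri suffix list, and the sample words are chosen only to exercise the code.

```python
import re

# Devanagari block minus the danda (U+0964) and double danda (U+0965), so that
# sentence-final punctuation becomes its own token. An explicit code-point
# range keeps dependent vowel signs attached to their consonants.
_DEVANAGARI_WORD = r"[\u0900-\u0963\u0966-\u097F]+"
_TOKEN_RE = re.compile(_DEVANAGARI_WORD + r"|[A-Za-z0-9]+|\S")

# ASSUMPTION: placeholder suffixes for illustration only, not taken from the
# paper and not a validated Dogri suffix list.
ILLUSTRATIVE_SUFFIXES = ("ियां", "ों", "ें", "ां")


def tokenize(text):
    """Split text into Devanagari words, Latin/digit runs, and single symbols."""
    return _TOKEN_RE.findall(text)


def stem(token, suffixes=ILLUSTRATIVE_SUFFIXES, min_stem_len=2):
    """Strip the longest matching suffix, keeping at least min_stem_len code points."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem_len:
            return token[: -len(suffix)]
    return token


if __name__ == "__main__":
    for tok in tokenize("राम घर गया।"):
        print(tok, stem(tok))
```

Longest-suffix-first stripping with a minimum stem length is a common baseline for rule-based stemming; a practical Dogri stemmer would replace the placeholder list with suffixes derived from the language's inflectional analysis.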