Preprocessing Methods for Unstructured Healthcare Text Data
Naresh Patel K M¹, Kiran P²

¹Naresh Patel K M, Assistant Professor, Department of CSE, BIET, Davangere (Karnataka), India.

²Dr. Kiran P, Associate Professor, Department of CSE, RNSIT, Bengaluru (Karnataka), India.

Manuscript received on 09 December 2019 | Revised Manuscript received on 17 December 2019 | Manuscript Published on 31 December 2019 | PP: 715-719 | Volume-9 Issue-2S December 2019 | Retrieval Number: B10241292S19/2019©BEIESP | DOI: 10.35940/ijitee.B1024.1292S19

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open-access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: The volume of unstructured text data is growing exponentially. Information retrieval (IR) from such data is challenging because users expect specific outcomes, and retrieving relevant results depends on how the data are associated and indexed. Unstructured text such as clinical data, which is rich in health information, requires demanding preprocessing; preprocessing also reduces the size of the dataset, which improves the performance of the IR system. In this paper, we propose preprocessing methods, namely data collection, data cleaning, tokenization, stemming, and stop-word removal, that help users efficiently find specific patterns in unstructured text data.
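
The steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The stop-word list, the suffix rules, the function names, and the sample sentence are all assumptions chosen for illustration, and the stemmer is a naive suffix stripper rather than an established algorithm such as Porter's. Stop words are removed before stemming here so that the list is matched against unstemmed tokens; the paper may order these steps differently.

```python
import re

# Illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is",
              "was", "with", "for", "on", "has", "had"}

# Naive suffix rules, checked in this order. This is NOT the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def clean(text):
    """Data cleaning: lowercase, drop punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Tokenization: split cleaned text on whitespace."""
    return text.split()


def remove_stop_words(tokens):
    """Stop-word removal: keep only tokens not in STOP_WORDS."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Stemming: strip the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run cleaning, tokenization, stop-word removal, and stemming in sequence."""
    tokens = tokenize(clean(text))
    tokens = remove_stop_words(tokens)
    return [stem(t) for t in tokens]


if __name__ == "__main__":
    # Hypothetical clinical-style sentence, not taken from the paper's dataset.
    print(preprocess("The patient was treated with antibiotics; fever resolved."))
```

On the sample sentence, eight raw tokens shrink to five stems, which illustrates how preprocessing reduces the amount of text an IR system has to index.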

Keywords: Information Retrieval (IR), Tokenization, Stemming, Stop Words, Unstructured Text Data.
Scope of the Article: Text Mining