Improving the Process of Identifying Internally Displaced Persons Using Big Data Technologies
Hakima Fathi Mahamoud1, Raja Rajeswari Ponnusamy2, Ho Ming Kang3, Jacob Sow Tian You4
1Hakima Fathi Mahamoud, School of Computing, Asia Pacific University of Technology and Innovation, Malaysia.
2Raja Rajeswari Ponnusamy, School of Mathematics, Actuaries and Quantitative Studies, Asia Pacific University of Technology and Innovation, Malaysia.
3Ho Ming Kang, School of Mathematics, Actuaries and Quantitative Studies, Asia Pacific University of Technology and Innovation, Malaysia.
4Jacob Sow Tian You, School of Media, Art and Design, Asia Pacific University of Technology and Innovation, Malaysia.
Manuscript received on 05 December 2018 | Revised Manuscript received on 12 December 2018 | Manuscript Published on 26 December 2018 | PP: 386-391 | Volume-8 Issue- 2S2 December 2018 | Retrieval Number: ES2124017519/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open-access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: This data-driven project aims to improve the identification of displacement caused by conflict, violence or disasters within internationally recognized state borders, known as internal displacement. Given a training set with pre-defined categories, the project addresses document classification and information retrieval through supervised machine learning. The research has three core objectives. First, to eliminate non-relevant documents by filtering out those that are not in English or that do not provide information on human mobility related to internal displacement. Second, to tag documents according to the themes the Internal Displacement Monitoring Centre (IDMC) uses to monitor the causes of internal displacement, notably conflict/violence or disasters. Third, to extract vital displacement information reported in online sources, such as location and displacement figures. Documents are analysed using a Support Vector Machine for tagging and Multinomial Naïve Bayes for information extraction, after pre-processing that relies mainly on natural language processing annotators, since the training set consists mainly of textual documents. Finally, after parameter tuning and training, the performance of each resulting model, Support Vector Machine and Multinomial Naïve Bayes, was measured on two separate test sets, one for tagging and the other for information retrieval. On the provided dataset, the models achieved 95.83% for classification and 81% for information retrieval.
Keywords: Document Classification, Information Retrieval, Support Vector Machine, Multinomial Naïve Bayes.
Scope of the Article: Classification
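As an illustration of the kind of supervised text classification the abstract describes, the following is a minimal sketch of a multinomial Naïve Bayes classifier with Laplace smoothing, written in plain Python. It is not the authors' implementation: the whitespace tokenizer stands in for the natural language processing annotators, and the four training sentences, the labels "disaster" and "conflict", and the function names are all hypothetical and chosen only for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real NLP pre-processing)."""
    return text.lower().split()


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods per class."""
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)

    log_prior = {}
    log_likelihood = {}
    for label, n_docs in class_doc_counts.items():
        log_prior[label] = math.log(n_docs / len(docs))
        total_tokens = sum(word_counts[label].values())
        denominator = total_tokens + alpha * len(vocab)
        log_likelihood[label] = {
            word: math.log((word_counts[label][word] + alpha) / denominator)
            for word in vocab
        }
    return {"vocab": vocab, "log_prior": log_prior, "log_likelihood": log_likelihood}


def predict(model, text):
    """Return the class with the highest posterior log score."""
    # Words never seen during training are ignored.
    tokens = [t for t in tokenize(text) if t in model["vocab"]]
    scores = {}
    for label, prior in model["log_prior"].items():
        likelihoods = model["log_likelihood"][label]
        scores[label] = prior + sum(likelihoods[t] for t in tokens)
    return max(scores, key=scores.get)


# Hypothetical toy training set, invented for illustration only.
train_docs = [
    "floods displaced thousands of families",
    "earthquake destroyed homes and displaced residents",
    "armed clashes forced villagers to flee",
    "fighting between militias displaced civilians",
]
train_labels = ["disaster", "disaster", "conflict", "conflict"]

model = train_multinomial_nb(train_docs, train_labels)
print(predict(model, "heavy floods destroyed homes"))
print(predict(model, "militias fighting forced civilians to flee"))
```

A real pipeline of this kind would typically use a library implementation, a much larger labelled corpus, and a held-out test set; the sketch only shows how word counts per class turn into a classification decision.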