Exploiting the Syntax-Annotated Corpus for Analysing Vietnamese Syntax
Phan Thi Ha1, Ha Hai Nam2, Đo Xuan Cho3
1Phan Thi Ha, Posts and Telecommunications Institute of Technology, Hanoi, Vietnam.
2Ha Hai Nam, Posts and Telecommunications Institute of Technology, Hanoi, Vietnam.
3Đo Xuan Cho, Posts and Telecommunications Institute of Technology, Hanoi, Vietnam.
Manuscript received on 02 November 2017 | Revised Manuscript received on 21 November 2017 | Manuscript Published on 30 December 2017 | PP: 15-18 | Volume-7 Issue-2, November 2017 | Retrieval Number: A2469097117/2017©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: This paper presents an algorithm for automatically extracting a PCFG (probabilistic context-free grammar) from Viettreebank, and an algorithm for constructing a PCFG-based parser for analysing Vietnamese sentences. The per-sentence parsing algorithm is developed from the algorithm of Jurafsky and Martin [5]. In the Vietnamese setting, the input sentence is first labeled by an available part-of-speech (POS) tagging tool, whereas the Jurafsky and Martin algorithm takes as input a sentence without POS labels, whose words are separated by white space.
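As a rough illustration of the two steps named in the abstract, the following Python sketch estimates rule probabilities from a treebank by relative frequency and then runs a probabilistic CYK (PCYK) search over a POS-tagged sentence. It is not the authors' implementation. The toy trees, the tag names (N, V, A), and the function names `extract_pcfg`, `pcyk`, and `build` are all assumptions made for this example, and the trees are assumed to be already binarized (every phrase node has exactly two children). A real treebank would need binarization and handling of unary rules first.

```python
from collections import defaultdict

# Hypothetical toy treebank: nested tuples (label, child, child); a preterminal is (tag, word).
# These trees are illustrative only and are NOT taken from Viettreebank.
TREEBANK = [
    ("S", ("N", "tôi"), ("VP", ("V", "đọc"), ("N", "sách"))),
    ("S", ("N", "mèo"), ("VP", ("V", "ăn"), ("N", "cá"))),
    ("S", ("NP", ("N", "sách"), ("A", "hay")), ("VP", ("V", "giúp"), ("N", "tôi"))),
]


def count_rules(tree, counts):
    """Count phrase-level rules; lexical rules are skipped because tags come from a tagger."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return  # preterminal (tag, word)
    rhs = tuple(child[0] for child in children)
    counts[label][rhs] += 1
    for child in children:
        count_rules(child, counts)


def extract_pcfg(treebank):
    """Relative-frequency estimate: P(A -> rhs) = count(A -> rhs) / count(A)."""
    counts = defaultdict(lambda: defaultdict(int))
    for tree in treebank:
        count_rules(tree, counts)
    return {
        (lhs, rhs): n / sum(rhs_counts.values())
        for lhs, rhs_counts in counts.items()
        for rhs, n in rhs_counts.items()
    }


def build(best, i, j, label):
    """Rebuild the best tree for span [i, j) from the stored backpointers."""
    _, back = best[i][j][label]
    if isinstance(back, str):  # leaf cell: backpointer is the word itself
        return (label, back)
    k, left, right = back
    return (label, build(best, i, k, left), build(best, k, j, right))


def pcyk(tagged, pcfg, start="S"):
    """Probabilistic CYK over (word, tag) pairs; returns (probability, tree) or None."""
    n = len(tagged)
    # best[i][j] maps a label to (probability, backpointer) for the span tagged[i:j].
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, (word, tag) in enumerate(tagged):
        best[i][i + 1][tag] = (1.0, word)  # the tagger's label is taken as given
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (lhs, rhs), p in pcfg.items():
                    if len(rhs) != 2:
                        continue  # this sketch handles binary rules only
                    left, right = rhs
                    if left in best[i][k] and right in best[k][j]:
                        prob = p * best[i][k][left][0] * best[k][j][right][0]
                        if prob > best[i][j].get(lhs, (0.0, None))[0]:
                            best[i][j][lhs] = (prob, (k, left, right))
    if start not in best[0][n]:
        return None
    return best[0][n][start][0], build(best, 0, n, start)


pcfg = extract_pcfg(TREEBANK)
result = pcyk([("sách", "N"), ("hay", "A"), ("giúp", "V"), ("tôi", "N")], pcfg)
print(result)
```

Because the input is already POS-tagged, the chart is seeded with each word's tag at probability 1.0 instead of summing over lexical rules. This is one simple way to reflect the difference from the untagged-input setting described above.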
Keywords: CFG, PCFG, CYK, PCYK, Treebank, Probabilistic Context-Free Grammar, Parser
Scope of the Article: Algorithm