
Enhanced Speech Recognition for Indonesian Geographic Dictionary Using Deep Learning
H. Hugeng1, D. Gunawan2, A. T. Kusumo3

1Dr. H. Hugeng, Assoc. Professor, Elec. Eng. Depart., Universitas Tarumanagara, Jakarta, Indonesia.
2Dr. D. Gunawan, Professor, Elec. Eng. Depart., Universitas Indonesia, Depok, Indonesia.
3A. T. Kusumo, Comp. Eng., Universitas Multimedia Nusantara, Tangerang, Indonesia.

Manuscript received on 24 August 2019. | Revised Manuscript received on 09 September 2019. | Manuscript published on 30 September 2019. | PP: 2594-2598 | Volume-8 Issue-11, September 2019. | Retrieval Number: K18860981119/2019©BEIESP | DOI: 10.35940/ijitee.K1886.0981119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Speech recognition technology has been developing rapidly in recent years. One of its applications is looking up the meaning of terms in a geographic dictionary: when a subject speaks a word to the system, the system outputs the word together with its meaning and explanation. Many methods have been applied to speech recognition. One method that can improve recognition accuracy is deep learning, specifically the Convolutional Neural Network (CNN). In this research, the speech recognition accuracy of a CNN on an Indonesian geographic dictionary is analyzed to show that a CNN can improve accuracy compared with speech recognition based on a Gaussian mixture model and hidden Markov model (GMM-HMM). The CNN analyzes and finds similarities in Mel-frequency cepstral coefficients (MFCC) extracted from sound waves. Models of the spoken words are built with a CNN implemented in Python with TensorFlow. The CNN is trained on speech data collected and prepared from 20 students (19 men and 1 woman) aged 19 to 23 years. The vocabulary of the database consists of 50 words. The result of this research is a desktop application that implements the trained models and recognizes the words spoken by subjects. The trained models were tested to measure the accuracy of the resulting speech recognition system. The CNN speech recognition system for the Indonesian geographic dictionary achieves 80% accuracy for isolated words and 72.67% for continuous words.
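The abstract states that the CNN operates on MFCC features extracted from the sound waves. As an illustration only, the following is a minimal NumPy sketch of a conventional MFCC pipeline (pre-emphasis, framing with a Hamming window, power spectrum, triangular mel filterbank, log compression, orthonormal DCT-II). The function names (`mfcc`, `mel_filterbank`, `dct_ii`) and all parameter values (16 kHz sampling, 25 ms frames, 10 ms hop, 512-point FFT, 26 filters, 13 coefficients, pre-emphasis 0.97) are common defaults assumed here, not settings reported by the authors.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (assumed convention)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge, peak value 1 at center
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    out = (x @ basis.T) * np.sqrt(2.0 / n)
    out[..., 0] /= np.sqrt(2.0)  # extra scaling of the DC term for orthonormality
    return out

def mfcc(signal, sample_rate=16000, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_ceps=13, preemph=0.97):
    """Return an (n_frames, n_ceps) MFCC matrix for a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    if len(emphasized) < flen:
        emphasized = np.pad(emphasized, (0, flen - len(emphasized)))
    n_frames = 1 + (len(emphasized) - flen) // fstep
    # Overlapping frames via index arithmetic, each multiplied by a Hamming window
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(flen)
    power = (np.abs(np.fft.rfft(frames, n=n_fft)) ** 2) / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, np.finfo(float).eps))
    return dct_ii(log_energies, n_ceps)
```

For a one-second, 16 kHz recording these defaults yield a 98 × 13 matrix. A 2-D array of this kind can be fed to a CNN as a single-channel "image" and classified into one of the 50 vocabulary words; how the authors arranged and padded their inputs is not described in the abstract, so that step is left out of this sketch.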
Keywords: Speech Recognition; Convolutional Neural Network; Deep Learning; Indonesian Geographic Dictionary.
Scope of the Article: Deep Learning