Speech Recognition using Multiscale Scattering of Audio Signals and Long Short-Term Memory of Neural Networks
Haribharath Mahalingam1, M.P. Rajakumar2
1Haribharath Mahalingam, Dept. of Computer Science and Engineering, St. Joseph’s College of Engineering, Chennai, Tamil Nadu, India.
2Dr M. P. Rajakumar, Professor, Dept. of Computer Science and Engineering, St. Joseph’s College of Engineering, Chennai, Tamil Nadu, India.
Manuscript received on 27 August 2019. | Revised Manuscript received on 07 September 2019. | Manuscript published on 30 September 2019. | PP: 2955-2961 | Volume-8 Issue-11, September 2019. | Retrieval Number: K22700981119/2019©BEIESP | DOI: 10.35940/ijitee.K2270.0981119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Communication is one of the key elements of interaction. To understand spoken human language, machines use various techniques to convert speech into a machine-readable form, a process called speech recognition. This paper addresses one of the classic problems in the speech recognition domain, spoken digit recognition. Recognition is performed using a technique called wavelet scattering, which first extracts useful information from the signals and then passes it to a Long Short-Term Memory (LSTM) network that classifies them. A major advantage of the LSTM is that it mitigates the vanishing gradient problem. The proposed technique can be used in applications such as numerical data entry for blind people. The method achieves higher accuracy than standard methods that use Mel-frequency cepstral coefficients (MFCC) with an LSTM network to recognize digits. The primary objective of this work, to validate the effectiveness of the wavelet scattering technique and LSTM networks for spoken digit recognition, was achieved.
Keywords: Deep Learning, Long Short-Term Memory (LSTM), Multi-Scale Scattering, Neural Networks, Speech Recognition.
Scope of the Article: Deep Learning