Idhazhi: A Min-Max Algorithm for Viseme to Phoneme Mapping
Suriyah M.1, Aarthy Anandan2, Madhan Karky Vairamuthu3

1Suriyah M*., Research Writer, Department of Computer Science and Engineering, Karky Research Foundation, Chennai, India.
2Aarthy Anandan, Software Developer, Department of Computer Applications, Karky Research Foundation, Chennai, India.
3Madhan Karky Vairamuthu, Founder and Research Head, Department of Computer Science, Karky Research Foundation, Chennai, India.
Manuscript received on January 12, 2020. | Revised Manuscript received on January 22, 2020. | Manuscript published on February 10, 2020. | PP: 588-594 | Volume-9 Issue-4, February 2020. | Retrieval Number: D1414029420/2020©BEIESP | DOI: 10.35940/ijitee.D1414.029420
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: With the advent of language tools, the barrier of language is dissolving fast. Content from one language is converted to other languages on different platforms, especially in the entertainment industry. In this age, it is essential to increase the naturalness of syncing video and audio content while dubbing. Studies suggest that many people around the world prefer listening to audio content rather than reading it in a language known to them. Detecting the viseme sequences of words in a video and identifying words from another language that match the detected viseme sequence therefore becomes a valuable application. In this paper we propose Idhazhi, an algorithm which suggests words, as phoneme sequences, that match a particular viseme sequence at four levels – perfect, optimized, semi-perfect and compacted. This is done by mapping specific oral positions of the lips, teeth and tongue to the symbols of the IPA. The system was tested by recording 50 videos of testers pronouncing one word each and playing them muted to a group of 12 testers, who evaluated how relevant the words suggested by the system were to the viseme sequence in each video; the accuracy was 0.73 after approximations. Apart from the suggested application, this algorithm can have wide application in security, for finding the list of words that may match a viseme sequence in a video such as CCTV footage. It may also be extended to help persons with vocal disabilities by generating speech from their vocal movements.
Keywords: Dubbing, Natural Language Processing, Viseme-Phoneme Mapping.
Scope of the Article: Natural Language Processing
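The core idea in the abstract – that several phonemes share one visible mouth shape, so a viseme sequence maps to many candidate words – can be sketched as follows. This is a minimal illustration, not the authors' Idhazhi implementation: the viseme class names, the phoneme-to-viseme table and the tiny lexicon are all invented for the example, and a real system would use IPA symbols and a full pronunciation dictionary.

```python
# Hypothetical phoneme -> viseme table: phonemes with the same visible
# articulation (lips, teeth, tongue position) collapse to one viseme class.
VISEME_OF = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar", "n": "alveolar",
    "a": "open", "i": "spread", "u": "rounded",
}

def viseme_sequence(phonemes):
    """Map a phoneme sequence to the corresponding viseme sequence."""
    return [VISEME_OF[p] for p in phonemes]

def matching_words(target_visemes, lexicon):
    """Return every word whose pronunciation yields the target viseme sequence.

    Because the phoneme -> viseme mapping is many-to-one, several words can
    match one viseme sequence; a dubbing tool would then rank these candidates.
    """
    return [word for word, phonemes in lexicon.items()
            if viseme_sequence(phonemes) == target_visemes]

# Toy lexicon of (word, phoneme sequence) pairs, invented for illustration.
lexicon = {
    "pat": ["p", "a", "t"],
    "bad": ["b", "a", "d"],
    "mat": ["m", "a", "t"],
    "fit": ["f", "i", "t"],
}

target = ["bilabial", "open", "alveolar"]
print(matching_words(target, lexicon))  # pat, bad and mat look alike on the lips
```

This exact-match lookup corresponds only to the "perfect" level mentioned in the abstract; the optimized, semi-perfect and compacted levels would relax the comparison, which the paper itself specifies.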