Text Independent Speaker Identification with Prosody Features in Presence of Noise
S.M. Jagdale1, A.A. Shinde2, J.S. Chitode3
1S.M. Jagdale, Ph.D. Research Scholar, Bharati Vidyapeeth (Deemed to be University) COE, Pune, Maharashtra, India.
2A.A. Shinde, Department of Electronics, Bharati Vidyapeeth (Deemed to be University) COE, Pune, Maharashtra, India.
3J.S. Chitode, Department of Electronics, Bharati Vidyapeeth (Deemed to be University) COE, Pune, Maharashtra, India.
Manuscript received on 02 July 2019 | Revised Manuscript received on 16 July 2019 | Manuscript Published on 23 August 2019 | PP: 124-127 | Volume-8 Issue-9S3 August 2019 | Retrieval Number: I30250789S319/2019©BEIESP | DOI: 10.35940/ijitee.I3025.0789S319
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open-access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Automatically recognizing a speaker's metadata, beyond recognizing identity alone, is a challenging task, since such metadata conveys rich behavioral characteristics of a person. Most work in speaker recognition has been done on low-level spectral features, which give good accuracy with low error but ignore other information about the speaker. These features also give degraded performance under spectral, session, and channel variations. State-of-the-art systems for text-independent speaker identification use Mel-frequency cepstral coefficients (MFCCs) as the main features. Such systems generally perform very well under clean conditions and acceptably under matched conditions. Under mismatched conditions, however, performance deteriorates significantly. One of the principal reasons for poor performance in these conditions is the nature of low-level features: being spectral, they are susceptible to spectral variations due to noise and channel effects. Prosodic features have been used successfully under these variations as well as in the presence of noise. In this paper, a multi-SNR environment is considered. Recognition accuracy is calculated at different SNR levels, namely 15 dB, 25 dB, and 35 dB. Results are also tested with different types of noise: traffic noise, cockpit noise, babble noise, and fan noise. It is found that combining prosodic features such as pitch, energy, and formants gives improved performance.
Keywords: Prosodic, Spectral, MFCC, SNR
Scope of the Article: Computer Vision
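The multi-SNR setup described in the abstract can be illustrated with a minimal sketch. It assumes that noisy test material is produced by scaling a noise recording and adding it to clean speech, and it approximates two of the prosodic features (frame energy and pitch) with a plain autocorrelation method. The function names `mix_at_snr`, `measured_snr_db`, and `frame_energy_and_pitch`, the 60 to 400 Hz pitch search range, and the synthetic signals are assumptions made for illustration; the abstract does not specify the authors' extraction method, and formant estimation is omitted here.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # Repeat or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR_dB = 10*log10(p_speech / (gain^2 * p_noise)), solved for gain.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def measured_snr_db(clean, noisy):
    """SNR in dB of `noisy` relative to the known clean signal."""
    residual = np.asarray(noisy) - np.asarray(clean)
    return 10.0 * np.log10(np.mean(np.square(clean)) / np.mean(residual ** 2))


def frame_energy_and_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Return (energy, pitch in Hz) for one voiced frame.

    Pitch is taken from the strongest autocorrelation peak whose lag lies
    in the range corresponding to [f_min, f_max] (an assumed search range).
    """
    frame = np.asarray(frame, dtype=np.float64)
    frame = frame - np.mean(frame)
    energy = float(np.sum(frame ** 2))
    # Keep only non-negative lags of the autocorrelation.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    lag = lag_min + int(np.argmax(ac[lag_min: lag_max + 1]))
    return energy, sample_rate / lag


# Synthetic stand-ins: a 200 Hz tone for voiced speech, white noise for the
# noise recording (real traffic, cockpit, babble, or fan noise would be loaded
# from files instead).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
speech = np.sin(2.0 * np.pi * 200.0 * t)
noise = np.random.default_rng(0).standard_normal(sample_rate // 2)

for snr_db in (15, 25, 35):
    noisy = mix_at_snr(speech, noise, snr_db)
    energy, pitch = frame_energy_and_pitch(noisy[:640], sample_rate)
    print(snr_db, measured_snr_db(speech, noisy), energy, pitch)
```

Scaling the noise rather than the speech keeps the speech level identical across the 15 dB, 25 dB, and 35 dB conditions, so any change in the extracted features can be attributed to the noise alone.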