Vocal Data Assessment to Envision Distinctive Features of an Individual
Arnav Garg1, Kushal Agrawal2, P. Akilandeshwari3
1Arnav Garg*, Department of Computer Science Engineering, SRM IST, Kanchipuram Tamil Nadu.
2Kushal Agrawal, Department of Computer Science Engineering, SRM IST, Kanchipuram Tamil Nadu.
3Mrs. P. Akilandeshwari, Department of Computer Science Engineering, SRM IST, Kanchipuram Tamil Nadu.
Manuscript received on March 15, 2020. | Revised Manuscript received on March 28, 2020. | Manuscript published on April 10, 2020. | PP: 1335-1338 | Volume-9 Issue-6, April 2020. | Retrieval Number: F3771049620/2020©BEIESP | DOI: 10.35940/ijitee.F3771.049620
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Large volumes of audio data are generated every day, and most of it is discarded without being processed. Processed properly, this data can serve many purposes. Vocal data is unstructured, which makes it harder to process, so it must undergo thorough pre-processing to convert it into a machine-understandable form. We analyze the human voice to extract meaningful data and predict the speaker's age, gender, and accent. The developed system uses Mel-frequency cepstral coefficients (MFCC), zero-crossing rate (ZCR), chroma_stft, spectral_centroid, spectral_bandwidth, and spectral_rolloff for feature extraction. The algorithms used for making inferences are support vector machine (SVM), K-nearest neighbors (KNN), and support vector regression (SVR). The work can be extended by combining video data with the audio data for analysis. The system can also be improved by increasing the number of languages it can detect.
Keywords: Feature Extraction, Speech Processing, Age-Gender Classification, Accent Classification, Mel-frequency Cepstral Coefficient, Zero-Crossing Rate, SVM, KNN.
Scope of the Article: Signal and Speech Processing
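The extract-then-classify pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not say which library or parameters the authors used, so this sketch relies on the Python standard library only. It covers just two of the listed features (zero-crossing rate and spectral centroid) and a small K-nearest-neighbors vote. The sample rate, frame length, helper names, synthetic tones, and the "low"/"high" labels are all assumptions made for illustration; they stand in for real speech frames and for classes such as gender or accent.

```python
import cmath
import math
from collections import Counter

SAMPLE_RATE = 8000  # Hz; assumed for this sketch, not taken from the paper
FRAME_LEN = 256     # samples per frame; assumed


def sine(freq_hz, sample_rate=SAMPLE_RATE, n=FRAME_LEN):
    """Synthetic tone standing in for one voiced frame (not real speech)."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def spectral_centroid(samples, sample_rate=SAMPLE_RATE):
    """Magnitude-weighted mean frequency in Hz, using a naive O(n^2) DFT."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        mags.append(abs(acc))
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def features(samples, sample_rate=SAMPLE_RATE):
    """Two-element feature vector; the centroid is scaled by the Nyquist frequency."""
    return (
        zero_crossing_rate(samples),
        spectral_centroid(samples, sample_rate) / (sample_rate / 2),
    )


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: tones at DFT-bin-aligned frequencies
# (multiples of 8000 / 256 = 31.25 Hz) so the centroid has no spectral leakage.
train = [(features(sine(f)), "low") for f in (187.5, 250.0, 312.5)]
train += [(features(sine(f)), "high") for f in (937.5, 1000.0, 1062.5)]
```

Calling `knn_predict(train, features(sine(281.25)))` classifies a new tone by its nearest neighbors in feature space. A real system would compute these features (plus MFCC, chroma, bandwidth, and roll-off) per frame on recorded speech, typically with a signal-processing library and a fast Fourier transform rather than the naive DFT used here.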