Scattering Wavelet Hash Fingerprints for Musical Audio Recognition
Evren Kanalici1, Gokhan Bilgin2

1Evren Kanalici, Department of Computer Engineering, Yildiz Technical University, Istanbul, Turkey.

2Gokhan Bilgin, Department of Computer Engineering, Yildiz Technical University, Istanbul, Turkey.

Manuscript received on 08 August 2019 | Revised Manuscript received on 14 August 2019 | Manuscript Published on 26 August 2019 | PP: 1011-1015 | Volume-8 Issue-9S August 2019 | Retrieval Number: I11620789S19/19©BEIESP | DOI: 10.35940/ijitee.I1162.0789S19

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open-access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Fingerprint design is the cornerstone of audio recognition systems, which aim for robustness and fast retrieval. Short-time Fourier transform and Mel-spectral representations are common for this task; however, these extraction methods suffer from instability and limited spectral-spatial resolution. The scattering wavelet transform (SWT) addresses these limitations by recovering the lost information while ensuring translation invariance and stability. We propose a two-stage feature extraction framework that couples SWT with a deep Siamese hashing model for musical audio recognition. Similarity-preserving hashes are the final fingerprints, and similarity in the projected embedding space is defined by a distance metric. The hashing model is trained on roughly aligned and non-matching audio snippets to model musical audio data via the two-layer scattering spectrum. The proposed framework achieves competitive performance in identifying audio signals superimposed with environmental noise, which models real-world obstacles to music recognition. With a very compact storage footprint (256 bytes/sec), we achieve a 98.2% ROC AUC score on the GTZAN dataset.
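
As a rough illustration of how similarity-preserving hashes can be compared, the sketch below binarizes a real-valued embedding and measures similarity by Hamming distance. The abstract does not specify the thresholding rule, the hash length, or the distance metric, so the sign threshold, the Hamming metric, the 8-dimensional toy embeddings, and the helper names `binarize`, `pack_bits`, and `hamming_distance` are all assumptions made for illustration and are not the authors' implementation.

```python
# Illustrative sketch only: the thresholding rule, metric, and toy values
# below are assumptions, not taken from the paper.

def binarize(embedding):
    """Map a real-valued embedding to a list of bits by sign (assumed rule)."""
    return [1 if x > 0 else 0 for x in embedding]

def pack_bits(bits):
    """Pack bits into bytes, 8 bits per byte, most significant bit first."""
    if len(bits) % 8 != 0:
        raise ValueError("bit count must be a multiple of 8")
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)

def hamming_distance(a, b):
    """Count differing bits between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("hashes must have equal length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Toy embeddings standing in for the output of a hashing model (hypothetical).
query = pack_bits(binarize([0.9, -0.2, 0.4, -0.7, 0.1, 0.3, -0.5, 0.8]))
match = pack_bits(binarize([0.8, -0.1, 0.5, -0.6, 0.2, -0.3, -0.4, 0.7]))
other = pack_bits(binarize([-0.9, 0.2, -0.4, 0.7, -0.1, 0.3, 0.5, -0.8]))

# A small distance suggests a matching snippet; a large one, a non-match.
d_match = hamming_distance(query, match)
d_other = hamming_distance(query, other)
```

Under these assumptions, a storage footprint of 256 bytes/sec corresponds to 2048 hash bits per second of audio, and retrieval reduces to finding stored hashes with a small bit-level distance to the query.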

Keywords: Audio Fingerprinting, CNNs, Scattering Wavelet Transform, Siamese Networks, Embedding Hash Models.
Scope of the Article: Communication