Hilbert Spectrum Based Features for Speech/Music Classification

Arvind Kumar; Sandeep Singh Solanki; Mahesh Chandra

doi:10.2298/SJEE2202239K

Published: Jul 8, 2022

Keywords:

EMD, Hilbert Spectrum, Hilbert Huang Transform, Cepstral Features, Speech/Music Classification

Abstract

Automatic speech/music classification uses signal processing techniques to categorize multimedia content into classes. The proposed work explores the Hilbert Spectrum (HS), obtained from the AM-FM components of an audio signal, also called Intrinsic Mode Functions (IMFs), to classify an incoming audio signal as speech or music. The HS is a two-dimensional representation of the instantaneous energies (IE) and instantaneous frequencies (IF) obtained by applying the Hilbert Transform to the IMFs. The HS is further processed using a Mel filter bank and the Discrete Cosine Transform (DCT) to generate novel cepstral features based on IF and Instantaneous Amplitude (IA). The results were validated on three databases: the Slaney database (S&S), GTZAN, and MUSAN. To evaluate the general applicability of the proposed features, extensive experiments were conducted on different combinations of audio files from the S&S, GTZAN, and MUSAN databases, and promising results were achieved. Finally, the performance of the system is compared with that of existing cepstral features and with previous work in this domain.
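
The processing chain summarized in the abstract (IMFs, Hilbert Transform, instantaneous amplitude and frequency, Mel filter bank, DCT) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The EMD step is skipped and two synthetic AM-FM components stand in for IMFs. The function names (`analytic_signal`, `instantaneous_amp_freq`, `hs_cepstrum`), the filter count, the number of coefficients, and the choice to pool instantaneous energy into Mel bands before the logarithm and DCT are all assumptions made for illustration only.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (standard discrete Hilbert construction)."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0                       # keep DC once
    if n % 2 == 0:
        gain[n // 2] = 1.0              # keep Nyquist once
        gain[1:n // 2] = 2.0            # double positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

def instantaneous_amp_freq(imf, fs):
    """Instantaneous amplitude and frequency (Hz) of one component."""
    z = analytic_signal(imf)
    amp = np.abs(z)
    phase = np.unwrap(np.angle(z))
    freq = np.diff(phase) * fs / (2.0 * np.pi)
    return amp[:-1], freq               # align lengths after the difference

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def hs_cepstrum(imfs, fs, n_filters=20, n_ceps=12):
    """Assumed feature recipe: pool instantaneous energy into triangular
    Mel bands indexed by instantaneous frequency, then log and DCT-II."""
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    band_energy = np.zeros(n_filters)
    for imf in imfs:
        amp, freq = instantaneous_amp_freq(imf, fs)
        energy = amp ** 2
        for k in range(n_filters):
            lo, mid, hi = edges[k], edges[k + 1], edges[k + 2]
            rising = (freq - lo) / (mid - lo)
            falling = (hi - freq) / (hi - mid)
            weight = np.clip(np.minimum(rising, falling), 0.0, None)
            band_energy[k] += np.sum(weight * energy)
    log_energy = np.log(band_energy + 1e-12)   # small floor avoids log(0)
    # Unnormalized DCT-II written as a matrix product
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                    np.arange(n_filters) + 0.5) / n_filters)
    return basis @ log_energy

# Hypothetical stand-ins for IMFs: a pure tone and an AM-modulated tone.
fs = 8000
t = np.arange(800) / fs
imfs = [np.cos(2 * np.pi * 440 * t),
        (1.0 + 0.5 * np.cos(2 * np.pi * 10 * t)) * np.cos(2 * np.pi * 1200 * t)]
features = hs_cepstrum(imfs, fs)
print(features.shape)
```

In practice the components would come from an EMD implementation applied frame by frame, and the resulting per-frame feature vectors would be passed to a classifier. Neither step is shown here.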

Issue

Vol 19 No 2 (2022)

Section

Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
