Hilbert Spectrum Based Features for Speech/Music Classification


Arvind Kumar
Sandeep Singh Solanki
Mahesh Chandra

Abstract

Automatic speech/music classification applies signal processing techniques to sort multimedia content into classes. This work explores the Hilbert Spectrum (HS), obtained from the AM-FM components of an audio signal, also called Intrinsic Mode Functions (IMFs), to classify an incoming audio signal as speech or music. The HS is a two-dimensional representation of the instantaneous energies (IE) and instantaneous frequencies (IF) obtained by applying the Hilbert transform to the IMFs. The HS is then processed with a Mel filter bank and the Discrete Cosine Transform (DCT) to generate novel cepstral features based on IF and Instantaneous Amplitude (IA). The results were validated on three databases: the Scheirer-Slaney (S&S) database, GTZAN, and MUSAN. To evaluate the general applicability of the proposed features, extensive experiments were conducted on different combinations of audio files from the S&S, GTZAN, and MUSAN databases, with promising results. Finally, the performance of the system is compared with that of existing cepstral features and with previous work in this domain.
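The processing chain described in the abstract (Hilbert transform of each component, instantaneous amplitude and frequency, Mel filter bank, DCT) can be illustrated with a minimal sketch. It is not the authors' implementation. Several things here are assumptions made only for illustration: two synthetic AM-FM signals stand in for IMFs (no decomposition into IMFs is performed), the sampling rate, the 20 filters and 13 coefficients are arbitrary, the way energy is accumulated into Mel bands is one plausible reading of "processing the HS with a Mel filter bank", and all function names are hypothetical. A naive DFT keeps the sketch dependency-free; in practice a routine such as `scipy.signal.hilbert` would be used.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of x via a naive O(N^2) DFT (fine for a short demo)."""
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    for k in range(1, n):
        if k < (n + 1) // 2:
            spec[k] *= 2                      # double positive frequencies
        elif not (n % 2 == 0 and k == n // 2):
            spec[k] = 0                       # zero negative frequencies
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def inst_amp_freq(x, fs):
    """Instantaneous amplitude and frequency (Hz) of one component."""
    z = analytic_signal(x)
    ia = [abs(v) for v in z[:-1]]
    # Phase advance between neighbouring samples, converted to Hz.
    inst_f = [cmath.phase(z[t + 1] * z[t].conjugate()) * fs / (2 * math.pi)
              for t in range(len(z) - 1)]
    return ia, inst_f


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_band_energies(components, fs, n_filters=20):
    """Accumulate instantaneous energy (IA^2) into triangular Mel bands by IF.

    Assumption: this binning is an illustrative choice, not the paper's recipe.
    """
    step = hz_to_mel(fs / 2) / (n_filters + 1)   # spacing of filter centres
    bands = [0.0] * n_filters
    for x in components:
        ia, inst_f = inst_amp_freq(x, fs)
        for a, f in zip(ia, inst_f):
            if f <= 0:
                continue
            pos = hz_to_mel(f) / step            # position in filter-spacing units
            for m in range(n_filters):
                w = 1.0 - abs(pos - (m + 1))     # triangular weight of filter m
                if w > 0:
                    bands[m] += w * a * a
    return bands


def cepstra(bands, n_ceps=13):
    """Log band energies followed by an unnormalised DCT-II."""
    logs = [math.log(b + 1e-10) for b in bands]
    n = len(logs)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n) for i, v in enumerate(logs))
            for c in range(n_ceps)]


# Hypothetical stand-ins for two IMFs of one 256-sample frame at 8 kHz.
fs, n = 8000, 256
imf_a = [(1 + 0.5 * math.cos(2 * math.pi * t / n)) * math.cos(2 * math.pi * 1000 * t / fs)
         for t in range(n)]
imf_b = [0.3 * math.cos(2 * math.pi * 250 * t / fs) for t in range(n)]
features = cepstra(mel_band_energies([imf_a, imf_b], fs))
print(len(features))
```

In a full system, a feature vector like this would be computed per frame from the actual IMFs and passed to a classifier; the abstract does not specify those details, so they are left out here.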
