Easy MFCCs

Mel-frequency cepstral coefficients, also known as MFCCs, are a set of features derived from a digital audio signal, typically 12-20 coefficients per frame, that describe the overall shape of the spectral envelope. One can extract more or fewer MFCCs depending on the application. MFCCs are computed per frame rather than per sample: at a sample rate of 32000, each second of audio contains 32000 samples, which are grouped into short overlapping frames, and the MFCC computation condenses each frame into a handful of values. This lets us capture a wealth of information in far fewer datapoints.
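As a rough illustration of how much MFCCs condense a signal, the sketch below counts the frames in one second of audio at 32000 Hz. It assumes librosa's default hop of 512 samples with centered frames, and 12 coefficients per frame; `frame_count` is a hypothetical helper written for this example, not a librosa function.

```python
def frame_count(n_samples, hop_length=512):
    """Frames produced by a centered short-time analysis (one frame every hop_length samples)."""
    return 1 + n_samples // hop_length

sr = 32000           # samples per second
n_samples = sr * 1   # one second of audio
n_mfcc = 12          # coefficients kept per frame (assumed for this example)

n_frames = frame_count(n_samples)
n_values = n_frames * n_mfcc

print(n_frames)   # 63 frames instead of 32000 raw samples
print(n_values)   # 756 MFCC values in total
```

Under these assumptions, one second of audio shrinks from 32000 samples to 63 frames of 12 values each.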

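import re

import librosa
import librosa.display
import matplotlib.pyplot as plt

def plot_mfccs(file):
    # Load the audio, resampled to 32 kHz
    y, sr = librosa.load(file, sr=32000)
    # Keep 12 coefficients per frame; ignore frequencies above 13 kHz
    S = librosa.feature.mfcc(y=y, sr=sr,
                             n_mfcc=12,
                             fmax=13000)

    plt.figure(figsize=(8,5))
    librosa.display.specshow(S, sr=sr, x_axis='time')
    # Strip the local folder prefix so only the file name is used as the title
    title = re.sub(r'C:\\Users\\root\\Projects\\new_birds\\New Folder\\', '', file)
    plt.colorbar()
    plt.title(title)
    plt.show()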

These can be used to visualize the spectral envelope of the audio signal in question through plotting as above. I plotted three different types of bird calls using MFCCs to provide an example:

Librosa allows quick extraction of MFCCs. Behind the scenes, the signal is split into short overlapping frames, and the power spectrum of each frame is estimated as a periodogram. The power spectrum is then passed through a bank of mel-scaled filters, and the logarithm of the energy in each band is taken. Finally, the DCT, or discrete cosine transform, is applied to the log energies, and the resulting coefficients are what we know as MFCCs.
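The steps behind MFCC extraction can be sketched in plain NumPy. This is a simplified illustration, not librosa's implementation: it assumes the common HTK-style mel formula, a natural log instead of decibels, an unnormalized DCT-II, and no padding at the signal edges, so its numbers will not match librosa's output. The function names and default parameters here are made up for the example.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fft_freqs = np.linspace(0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def simple_mfcc(y, sr, n_fft=2048, hop_length=512, n_mels=40, n_mfcc=12):
    # 1. Split the signal into overlapping, Hann-windowed frames
    n_frames = 1 + (len(y) - n_fft) // hop_length
    window = np.hanning(n_fft)
    frames = np.stack([y[i * hop_length: i * hop_length + n_fft] * window
                       for i in range(n_frames)])
    # 2. Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # 3. Energy in each mel band
    mel_energy = power @ mel_filterbank(sr, n_fft, n_mels).T
    # 4. Logarithm of the band energies (small offset avoids log(0))
    log_mel = np.log(mel_energy + 1e-10)
    # 5. DCT-II across the mel bands, keeping the first n_mfcc coefficients
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return (log_mel @ dct_basis.T).T   # shape (n_mfcc, n_frames)

# Usage: one second of a 440 Hz tone sampled at 32 kHz
sr = 32000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)
mfcc = simple_mfcc(y, sr)
print(mfcc.shape)   # (12, 59)
```

In practice `librosa.feature.mfcc` handles all of this in one call; the sketch only shows where each step fits.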

MFCCs are used in speech recognition and other audio classification tasks, and combined with spectral and linear features they provide a well-rounded view of a signal.

Should you prefer to build a feature vector from the resulting MFCCs, they can be extracted and collected into a NumPy array for use in an LSTM classification model or in a multitude of other ways.

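import librosa

SR = 32000      # assumed sample rate, matching the plotting example
data = ''       # placeholder: root folder of the dataset
mapping = []    # one label per processed file
mfcc_data = []  # one (frames x coefficients) list per processed file

def get_mfcc(path):
    # save label in mappings
    pathname = path.replace(f'{data}', '')
    pn = pathname.split("/")
    semanticLabel = pn[-1]
    mapping.append(semanticLabel)
    signal, sr = librosa.load(path, sr=SR)
    start_sample = 1
    end_sample = 90000
    # y must be passed by keyword in recent librosa releases
    S = librosa.feature.melspectrogram(y=signal[start_sample:end_sample], sr=sr, n_fft=4096,
                                       hop_length=512, win_length=200, power=2,
                                       fmin=100, fmax=13000, window='hann')

    # convert power to decibels, then take the DCT to get the MFCCs
    mfcc = librosa.feature.mfcc(S=librosa.power_to_db(S))
    # store as (frames, coefficients)
    mfcc_data.append(mfcc.T.tolist())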
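Once every file has been processed, the collected lists can be stacked into a single array of shape (clips, frames, coefficients), which is the layout most LSTM layers expect. The sketch below is one possible way to do that, assuming shorter clips should be zero-padded; `stack_mfccs` is a hypothetical helper, and the random data only stands in for real MFCCs (176 frames of 20 coefficients is what the slice and defaults above would give for a full-length clip).

```python
import numpy as np

def stack_mfccs(mfcc_data, max_frames=None):
    """Pad or truncate each (frames x coefficients) list and stack into one 3-D array."""
    if max_frames is None:
        max_frames = max(len(m) for m in mfcc_data)
    n_coeffs = len(mfcc_data[0][0])
    X = np.zeros((len(mfcc_data), max_frames, n_coeffs), dtype=np.float32)
    for i, m in enumerate(mfcc_data):
        arr = np.asarray(m, dtype=np.float32)[:max_frames]
        X[i, :arr.shape[0], :] = arr   # remaining frames stay zero (padding)
    return X

# Stand-in data: one full-length clip and one shorter clip
rng = np.random.default_rng(0)
mfcc_data = [rng.normal(size=(176, 20)).tolist(),
             rng.normal(size=(150, 20)).tolist()]

X = stack_mfccs(mfcc_data)
print(X.shape)   # (2, 176, 20)
```

If every clip is at least 90000 samples long, all entries already share the same number of frames and `np.array(mfcc_data)` would be enough; the padding only matters for shorter recordings.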
