| Author |
Message |
Adam Chapman Guest
|
Posted: Sat Nov 15, 2008 3:49 pm Post subject: Mel filter bank output looks funny |
|
|
Hi,
I've never done any real audio processing, so there might actually be
no problem here; I just think my output looks funny.
I'm basically trying to copy the filter for generating "mel-frequency
cepstrum coefficients" in the paper
"Comparison of Parametric Representations for Monosyllabic Word
Recognition in Continuously Spoken Sentences" by Davis & Mermelstein.
I've copied the only relevant paragraph and the graph of their filter to
(http://personalpages.manchester.ac.uk/student/adam.chapman/paperpart.jpg).
First of all, their filter looks funny to me. Why does the right-hand
leg of their last filter bin only go to 4600 Hz? I have implemented
mine so the right leg goes to 5 kHz; incidentally, the peak of that last
bin is at around 4600 Hz, and I've used the same overlap scheme as
depicted in their figure.
I put a sample output of my implementation at
(http://personalpages.manchester.ac.uk/student/adam.chapman/filteroutput.jpg).
Frequency is on the x axis in Hz.
The top figure shows the Fourier power spectrum of a short audio
sample, with my filter bank overlaid (weights exaggerated for viewing
clarity).
The second figure down shows the log of the weighted sum of each
filter bin. This is the bit that I find strange. The wider filters at
higher frequencies always have a higher output than the lower-frequency
filters, even if the signal is much stronger at lower
frequencies. Taking the log of the filter outputs helps, but the
higher-frequency bins still always give a larger output. Should the filter
outputs perhaps be normalised by dividing each output by its filter bin width?
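To make the width effect concrete, here is a minimal sketch in Python with NumPy. It is not the implementation described above: the 10 Hz bin spacing, the edge frequencies, and the helper name `triangular_filterbank` are all assumptions chosen for illustration. It feeds a perfectly flat power spectrum through unit-height triangular filters and compares the raw weighted sums with sums divided by each filter's total weight.

```python
import numpy as np

def triangular_filterbank(edges_hz, freqs_hz):
    """Build one unit-height triangular filter per interior edge.

    Filter k rises from edges_hz[k] to a peak of 1.0 at edges_hz[k + 1]
    and falls back to zero at edges_hz[k + 2], so neighbours overlap by half.
    """
    n_filters = len(edges_hz) - 2
    bank = np.zeros((n_filters, len(freqs_hz)))
    for k in range(n_filters):
        lo, mid, hi = edges_hz[k], edges_hz[k + 1], edges_hz[k + 2]
        rising = (freqs_hz - lo) / (mid - lo)
        falling = (hi - freqs_hz) / (hi - mid)
        bank[k] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

# Assumed layout (hypothetical): FFT bins every 10 Hz up to 5 kHz,
# edges linear up to 1 kHz and logarithmic from 1 kHz to 5 kHz.
freqs_hz = np.arange(0.0, 5000.0 + 1e-9, 10.0)
edges_hz = np.concatenate([
    np.linspace(0.0, 1000.0, 11),
    np.geomspace(1000.0, 5000.0, 12)[1:],
])
bank = triangular_filterbank(edges_hz, freqs_hz)

# A flat power spectrum: every bin carries the same power.
flat_power = np.ones_like(freqs_hz)

raw = bank @ flat_power               # plain weighted sum per filter
weights = bank.sum(axis=1)            # total weight of each filter
normalised = raw / weights            # weighted average per filter

print("raw:       ", np.round(raw, 1))
print("normalised:", np.round(normalised, 3))
```

With equal-height triangles, the raw sums grow with filter width even though the input is flat, while the normalised outputs are all equal. Whether to normalise is a design choice rather than a correctness issue: dividing by the total weight turns each output into an average power per bin, which in the log domain only subtracts a fixed per-filter constant.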
I'm not sure about the MFCC results, basically because I've never seen
any to compare with.
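In the absence of reference values, one sanity check is to run the cosine transform on inputs whose result is known in advance. The sketch below assumes the cepstrum formula as I understand it from the paper, MFCC_i = sum over k = 1..N of X_k cos(i (k - 1/2) pi / N), where X_k is the log energy of filter k; the function name `mfcc_from_log_energies` and the test inputs are hypothetical.

```python
import numpy as np

def mfcc_from_log_energies(log_energies, n_coeffs):
    """Cosine transform of log filter-bank energies.

    Computes c_i = sum_k X_k * cos(i * (k - 1/2) * pi / N)
    for i = 1..n_coeffs, with k = 1..N running over the filters.
    """
    log_energies = np.asarray(log_energies, dtype=float)
    n_filters = log_energies.shape[0]
    i = np.arange(1, n_coeffs + 1)[:, None]     # coefficient index (column)
    k = np.arange(1, n_filters + 1)[None, :]    # filter index (row)
    basis = np.cos(i * (k - 0.5) * np.pi / n_filters)
    return basis @ log_energies

# Same log energy in every filter: all coefficients should vanish.
flat = mfcc_from_log_energies(np.full(20, 3.0), 6)

# Log energy rising steadily towards the high-frequency filters.
tilted = mfcc_from_log_energies(np.linspace(0.0, 2.0, 20), 6)

print(np.round(flat, 6))
print(np.round(tilted, 3))
```

A constant offset across all filters leaves these coefficients at zero, whereas a steady upward tilt towards the wide high-frequency filters shows up mainly as a negative first coefficient. If every frame of real data has a large negative first coefficient, that may simply reflect the filter-width bias rather than the signal.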
Thanks in advance for any response.
Adam |
Adam Chapman Guest
|
Posted: Tue Nov 18, 2008 12:01 am Post subject: Re: Mel filter bank output looks funny |
|
|
Perhaps I should have squared the output of the FFT; I'm not quite
sure whether that is needed to get the power spectrum or not. |
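For reference, a power spectrum is conventionally the squared magnitude of the FFT output, so if the existing code takes only the magnitude, squaring it would give power. A minimal sketch with NumPy follows; the 10 kHz sample rate, frame length, and 500 Hz test tone are assumptions for illustration only.

```python
import numpy as np

fs = 10000.0                      # assumed sample rate in Hz (hypothetical)
n = 1000                          # assumed frame length in samples
t = np.arange(n) / fs
x = np.sin(2.0 * np.pi * 500.0 * t)   # 500 Hz test tone

spectrum = np.fft.rfft(x)         # complex FFT of a real signal
magnitude = np.abs(spectrum)      # magnitude spectrum
power = magnitude ** 2            # power spectrum = squared magnitude

freqs = np.fft.rfftfreq(n, d=1.0 / fs)   # frequency of each bin in Hz
peak_hz = freqs[np.argmax(power)]
print(peak_hz)
```

For a single bin, the log of the power is just twice the log of the magnitude, but the filter bank sums bins before the log is taken, so summing magnitudes and summing powers give genuinely different filter outputs.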