DSPRelated.com
Forums

speech segmentation / filtering questions

Started by Erns...@gmx.at June 26, 2006
Hi!

First off, I'm pretty new to the field, so bear with me ...

I'm trying to implement a method of speech segmentation described in "Landmark detection for distinctive feature-based speech recognition" by Sharlene Liu (JASA, Nov. 96) and ran into a couple of questions ...

First, I need to compute the spectrogram of the signal I'm investigating, using a 6-ms Hamming window every 1 ms. No problem, I just use specgram with the appropriate settings.

Then I multiply the spectrogram by its conjugate and separate the result into a couple of frequency bands.

Now each of these bands needs to be smoothed over 20 ms. So I just use 'filter(ones(20,1),1,signal)', since I already have 1 ms per frame from the spectrogram. But what about the overlap?

Also, the paper states that "a 20-ms average of the squared magnitude of the spectrogram, centered about the time of interest, is computed every 1 ms." ... What does "centered about the time of interest" mean, and how can that be implemented?

Then I need to compute the first difference of that 20-ms-averaged signal "every 1ms using a 50ms-time step", again "centered about the time of interest".
I use 'filter([1 zeros(50-2,1) -1] / 50, 1, signal)' for that, but I'm not sure what I'm doing is right ...

Then I need to find peaks in that first difference that are bigger than +/- 9 dB.
Now, I've been reading all over the place that dB is a relative measure and that I need some reference value to measure against in order to compute it. But if the mean of the signal is 0, can't I just do 20*log10(signal) to get the signal in dB? That's what I'm doing now, but the values don't seem to be right. Also, I had to change that to 20*log10(signal + 1) to get rid of the really small values ... which to me seems like a really unclean way of handling a signal ...

if someone could point me in the right direction, any help would be greatly appreciated!

thanks!
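On the dB question: the quantity here is energy (squared magnitude), so one possible reading is to convert each smoothed band to dB with 10*log10 first and take the difference afterwards. A difference of two dB values is the log of a ratio, so no absolute reference is needed. The Python sketch below works under that assumption about the paper, which nobody in the thread confirms. The name `to_db`, the `floor` value and the sample numbers are made up for illustration; the floor stands in for the `+ 1` workaround.

```python
import math

def to_db(energy, floor=1e-12):
    """Convert band energies (squared magnitudes) to dB, clamping at a small floor."""
    # 10*log10 because the input is already energy; 20*log10 is for amplitudes.
    return [10.0 * math.log10(max(e, floor)) for e in energy]

# Hypothetical band-energy track: the energy rises by a factor of 10 at frame 3.
energy = [0.02, 0.02, 0.02, 0.2, 0.2, 0.2]
energy_db = to_db(energy)

# Difference of dB values = 10*log10(energy ratio), whatever the reference level.
step = energy_db[3] - energy_db[2]
print(round(step, 6))
```

Scaling the whole track by any constant shifts every dB value by the same amount and leaves `step` unchanged, which is why a threshold such as 9 dB can be applied to the difference directly.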
On 26.06.2006 at 16:38, Jeff Brower wrote:

> Ernst-
>
> Not sure but... that sounds like a running average -- sometimes this is
> called a 'boxcar' filter. In that case, there would be no overlap (no
> recursive input). To center at the time of interest, add the previous 10
> magnitude frames and the next 10 frames. For each new average calculation,
> drop the oldest and add the newest.
>
> -Jeff
>

Hm, with overlap, I didn't mean the filter's "a" coefficients, but the
overlap I used to calculate the specgram.

And about the centering: can this be done with filters or do I really
need to iterate through the signal?

thanks,
Ernst
Ernst-

> Now each of these bands needs to be smoothed over 20 ms. So I just use
> 'filter(ones(20,1),1,signal)', since I already have 1 ms per frame from
> the spectrogram. But what about the overlap?
>
> Also, the paper states that "a 20-ms average of the squared magnitude of
> the spectrogram, centered about the time of interest, is computed every 1
> ms." ... what does "centered about the time of interest" mean? and how
> can that be implemented?

Not sure but... that sounds like a running average -- sometimes this is
called a 'boxcar' filter. In that case, there would be no overlap (no
recursive input). To center at the time of interest, add the previous 10
magnitude frames and the next 10 frames. For each new average calculation,
drop the oldest and add the newest.

-Jeff
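A minimal Python sketch of the centered running average described above, assuming one value per 1-ms frame for a single band. The function name and the edge handling (shorter windows at both ends) are illustrative choices, not taken from the paper. With an even width the window cannot be exactly symmetric; here it covers frames n-10 to n+9.

```python
def centered_average(frames, width=20):
    """Average `width` frames around each index; edges use the frames available."""
    half = width // 2
    out = []
    for n in range(len(frames)):
        lo = max(0, n - half)
        hi = min(len(frames), n + half)  # window covers n-half .. n+half-1
        window = frames[lo:hi]
        out.append(sum(window) / len(window))
    return out

# A single spike at frame 50 is spread over the 20 frames whose window contains it.
spike = [0.0] * 100
spike[50] = 1.0
smoothed = centered_average(spike)
print(smoothed[41], smoothed[60], smoothed[61])
```

The slicing recomputes each sum for clarity; the drop-the-oldest, add-the-newest update gives the same numbers with less work. In filter terms this is a causal 20-tap average whose output is shifted earlier by about 10 frames, so no explicit loop over the signal is required.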

Ernst-

> Hm, with overlap, I didn't mean the filter's "a" coefficients, but the
> overlap I used to calculate the specgram.

What is the concern about overlap? You mentioned using Sharlene Liu's instructions
and said "I just use specgram with the appropriate settings", so I don't understand the
concern. Averaging the resulting specgram frames is effective regardless of whether the
STFT used overlap.

> And about the centering: can this be done with filters or do I really
> need to iterate through the signal?

Please clarify... what iteration? With an N-frame running average you add the most
recent frame and subtract the oldest. M adds are needed per specgram, where M is FFT
size.

-Jeff
On 27.06.2006 at 17:11, Jeff Brower wrote:

> Ernst-
>
>> Hm, with overlap, I didn't mean the filter's "a" coefficients, but the
>> overlap I used to calculate the specgram.
>
> What is the concern about overlap? You mentioned using Sharlene Liu's
> instructions and said "I just use specgram with the appropriate settings",
> so I don't understand the concern. Averaging the resulting specgram frames
> is effective regardless of whether the STFT used overlap.

I meant that I'm "telling MATLAB" to compute the spectrum of a 6 ms
window every 1 ms.
Thus, if I for example average two such consecutive frames of the
spectrum, I'm actually averaging the spectrum of 7 ms of the original
signal ... right?
(Or more precisely, 1 ms + 5 ms overlap + 1 ms + 5 ms overlap. So if "t" is
the first instant and t+1 the ms after, the first frame covers

x(t) + x(t+1) + x(t+2) + x(t+3) + x(t+4) + x(t+5)

and the second frame covers

x(t+1) + x(t+2) + ... + x(t+6)

which together give

x(t) + 2*(x(t+1) + x(t+2) + ... + x(t+5)) + x(t+6)

)

So of course averaging is effective, but isn't there some kind of
redundant information that corrupts the results?
>
>> And about the centering: can this be done with filters or do I really
>> need to iterate through the signal?
>
> Please clarify... what iteration? With an N-frame running average you
> add the most recent frame and subtract the oldest. M adds are needed per
> specgram, where M is FFT size.
>
> -Jeff

Exactly. I'm just unsure about the "centered about the time of
interest".
With "iterating", I meant that I could, for a signal x at each instant
t, compute

abs(x(t)-x(t-n)) + abs(x(t) - x(t+n))

that would somehow be "centered at the time of interest", since it uses
frames before AND after the current instant to compute the average...
but that can't be done with filters, since I can't "look into the
future" of a signal with a filter, right?
thanks
-Ernst
Ernst-

> so, of course averaging is effective, but isn't there some kind of
> redundant information that corrupts the results?

It may be redundant, but I don't see an issue with corruption. It would be like
calculating a running sum of 10 samples from an input stream, then calculating a
second running sum using initial running sum values. It's a linear operation, and
you could come up with a formula for it.
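The formula for two cascaded running sums can be checked numerically. The Python sketch below treats each frame as a plain sum of six 1-ms chunks, as in Ernst's example, and ignores the Hamming window and the FFT, so it only shows which parts of the input get counted twice. `boxcar_sum` is a hypothetical helper.

```python
def boxcar_sum(x, width):
    """Sum of `width` consecutive values starting at each index (full windows only)."""
    return [sum(x[i:i + width]) for i in range(len(x) - width + 1)]

# Feed a unit impulse through both stages to read off the effective weights.
impulse = [0.0] * 6 + [1.0] + [0.0] * 6
frames = boxcar_sum(impulse, 6)    # stage 1: 6-chunk frames with a hop of 1
combined = boxcar_sum(frames, 2)   # stage 2: add two consecutive frames
weights = [w for w in combined if w != 0.0]
print(weights)  # [1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 1.0]
```

The weights match x(t) + 2*(x(t+1) + ... + x(t+5)) + x(t+6): the overlap changes the weighting of the input but remains a fixed linear combination.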

Part of the point of overlap is not just averaging. If you are applying a window
to time-domain data prior to the STFT, then you need a minimum of 50% overlap to treat all
input data "equally" and counteract windowing truncation. The advantage is the
reduction in wideband (edge) noise you get from windowing; the disadvantage is a loss
in frequency resolution in the resulting spectrogram.

> Exactly. I'm just unsure about the "centered about the time of
> interest".
> With "iterating", I meant that I could, for a signal x at each instant
> t, compute
>
> abs(x(t)-x(t-n)) + abs(x(t) - x(t+n))
>
> that would somehow be "centered at the time of interest", since it uses
> frames before AND after the current instant to compute the average...
> but that can't be done with filters, since I can't "look into the
> future" of a signal with a filter, right?

Sure you can if you are buffering the input -- which you are by calculating the STFT.
An N-point buffer adds a delay (latency) of N/Fs sec to your processing; if you look at
the middle of each buffer, then from that perspective you're looking N/2 points into the
past and N/2 into the future.

-Jeff
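A minimal Python sketch of looking N/2 points into the past and N/2 into the future, applied to the 50-ms first difference from the original question. It assumes one value per 1-ms frame that has already been smoothed (and converted to dB, if that is how the threshold is meant). `centered_difference` and its None padding at the edges are illustrative choices.

```python
def centered_difference(x, step=50):
    """x[n + step/2] - x[n - step/2]; positions lacking either neighbour are None."""
    half = step // 2
    out = [None] * len(x)
    for n in range(half, len(x) - half):
        out[n] = x[n + half] - x[n - half]
    return out

# A ramp rising by 1 per frame has a 50-frame difference of 50 wherever it is defined.
ramp = list(range(100))
diff = centered_difference(ramp)
print(diff[24], diff[25], diff[74], diff[75])
```

The same values come from a causal difference x[n] - x[n-50] whose output is shifted 25 frames earlier. A 50-frame step needs 51 filter taps (1, then 49 zeros, then -1); the 50-tap vector in the original question spans only 49 frames.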