Hi!

First off, I'm pretty new to the field so bear with me ...

I'm trying to implement a method of speech segmentation described in "Landmark
detection for distinctive feature-based speech recognition" by Sharlene Lie
(JASA Nov. 96) and ran into a couple of questions ...

first, I need to compute the spectrogram of the signal I investigate, using a
6-ms hamming window every 1 ms. no prob, I just use specgram with the appropriate
settings
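
For reference, the "appropriate settings" come down to converting the two durations into sample counts. A minimal Python sketch, assuming a 16 kHz sampling rate (the post does not state one); `frame_params` is a hypothetical helper name:

```python
def frame_params(fs_hz, window_ms=6.0, hop_ms=1.0):
    """Convert window length and frame step from ms to samples.

    fs_hz is an assumed sampling rate; the post does not state one.
    """
    window = round(fs_hz * window_ms / 1000.0)   # samples per analysis window
    hop = round(fs_hz * hop_ms / 1000.0)         # samples between frame starts
    overlap = window - hop                       # samples shared by neighbours
    return window, hop, overlap


window, hop, overlap = frame_params(16000)
print(window, hop, overlap)  # 96 16 80
```

With these numbers, a 6-ms window every 1 ms means each frame shares 5 ms (80 samples) with the next one.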

then, I take the specgram times its conjugate and separate that into a couple
of frequency bands.
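
A small Python sketch of this step for a single frame. The band edges and the toy frame are made up for illustration, and summing the bins in a band is an assumption (the post does not say how the bins inside a band are combined):

```python
def band_energy(frame, fs_hz, nfft, f_lo, f_hi):
    """Sum |X[k]|^2 over the bins whose frequency lies in [f_lo, f_hi).

    frame holds the complex STFT values for bins 0 .. nfft/2 of one frame.
    """
    total = 0.0
    for k, value in enumerate(frame):
        freq = k * fs_hz / nfft                 # centre frequency of bin k
        if f_lo <= freq < f_hi:
            # X times its conjugate is real and equals |X|^2
            total += (value * value.conjugate()).real
    return total


# Toy frame: nfft = 8 at fs = 8000 Hz, so bins sit at 0, 1000, ..., 4000 Hz.
frame = [1 + 0j, 3 + 4j, 0 + 2j, 1 - 1j, 0.5 + 0j]
print(band_energy(frame, 8000, 8, 800, 2500))  # 29.0
```

Applying this to every frame gives one energy track per band, sampled once per frame.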

now, each of these bands needs to be smoothed by 20ms. So I just use
'filter(ones(20,1),1,signal)' since I already have 1ms per frame from the
spectrogram. But what about the overlap?

Also, the paper states that "a 20-ms average of the squared magnitude of the
spectrogram, centered about the time of interest, is computed every 1 ms." ...
what does "centered about the time of interest" mean? and how can that be
implemented?

then, I need to compute the first difference of that 20ms-averaged-signal "every
1ms using a 50ms-time step", again "centered about the time of interest".

I use 'filter([1 zeros(50-2,1) -1] / 50, 1, signal)' for that, but I'm not sure
what I'm doing is right ...
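
One possible reading of "centered, 50-ms time step" is d[n] = e[n+25] - e[n-25] on the 1-ms frame grid. For comparison, a filter with coefficients `[1, 0, ..., 0, -1]` of length 50 computes x[n] - x[n-49], which is a 49-frame step that trails the current frame. A minimal Python sketch of the centered version, assuming the input is already a per-frame level in dB (`centered_difference` is a hypothetical name):

```python
def centered_difference(energy_db, step=50):
    """d[n] = energy_db[n + step/2] - energy_db[n - step/2].

    Frames are assumed to be 1 ms apart, so step is in frames and in ms.
    Positions without a full span on both sides are set to None.
    """
    half = step // 2
    out = [None] * len(energy_db)
    for n in range(half, len(energy_db) - half):
        out[n] = energy_db[n + half] - energy_db[n - half]
    return out


# A 20 dB step between frame 59 and frame 60 of a 120-frame track.
track = [40.0] * 60 + [60.0] * 60
diff = centered_difference(track)
print(diff[34], diff[35], diff[60], diff[84], diff[85])  # 0.0 20.0 20.0 20.0 0.0
```

The result is left as a plain difference in dB here; dividing by 50 would turn it into a slope per frame and rescale any threshold with it.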

then, I need to find peaks in that first difference that are bigger than +/-
9dB.
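
A minimal Python sketch of such a peak picker, assuming the first difference is already expressed in dB. The neighbour comparison used to define a "peak" is a simple assumption, not necessarily the paper's rule, and `find_peaks` is a hypothetical name:

```python
def find_peaks(diff_db, threshold_db=9.0):
    """Return (index, value) for local extrema of diff_db beyond the threshold.

    A frame counts as a peak when its absolute value is at least
    threshold_db, larger than the previous frame's and not smaller than
    the next frame's. Two-frame plateaus are reported once, at their first frame.
    """
    peaks = []
    for n in range(1, len(diff_db) - 1):
        mag = abs(diff_db[n])
        if (mag >= threshold_db
                and mag > abs(diff_db[n - 1])
                and mag >= abs(diff_db[n + 1])):
            peaks.append((n, diff_db[n]))
    return peaks


diff_db = [0.0, 2.0, 10.0, 12.0, 4.0, -3.0, -11.0, -11.0, -2.0, 5.0]
print(find_peaks(diff_db))  # [(3, 12.0), (6, -11.0)]
```

The sign of each returned value tells a rise (+) from a fall (-).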

Now, I've been reading all over the place that dB is a relative measure and I
need some value to measure it against to compute it. But if the mean of the
signal is 0, can't I just do 20*log10(signal) to get the signal in dB? That's
what I'm doing now, but the values don't seem to be right. also, I had to change
that to 20*log10(signal + 1) to get rid of the really small values ... which to
me seems like a really unclean way of handling a signal ...
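
A few points that may help here, with a Python sketch. A dB value is 10*log10 of a power or energy ratio (20*log10 is for amplitudes), and the band signal here is already a squared magnitude. A difference of two dB values is the dB of their ratio, so whatever reference is used cancels out. That suggests converting the smoothed energy to dB first and differencing afterwards, rather than taking the log of a difference that can be zero or negative. The `floor` value is an assumption to guard against log10(0), in place of adding 1:

```python
import math


def to_db(energy, floor=1e-12):
    """10*log10 of an energy (squared-magnitude) value.

    floor is an assumed small constant that guards against log10(0).
    """
    return 10.0 * math.log10(max(energy, floor))


def db_change(e_before, e_after):
    """Level change in dB; any common reference value cancels out."""
    return to_db(e_after) - to_db(e_before)


print(round(db_change(1.0, 8.0), 2))        # 9.03
print(round(db_change(500.0, 4000.0), 2))   # 9.03
print(round(db_change(8.0, 1.0), 2))        # -9.03
```

In other words, a 9 dB change corresponds to the energy changing by a factor of about 8, whatever the absolute scale of the signal.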

if someone could point me in the right direction, any help would be greatly
appreciated!

thanks!

# speech segmentation / filtering questions

Started by ●June 26, 2006

Reply by ●June 27, 2006

On 26.06.2006 at 16:38, Jeff Brower wrote:

> Ernst-
>
>> First off, I'm pretty new to the field so bear with me ...
>>
>> I'm trying to implement a method of speech segmentation described in
>> "Landmark detection for distinctive feature-based speech recognition"
>> by Sharlene Lie (JASA Nov. 96) and ran into a couple of questions ...
>>
>> first, I need to compute the spectrogram of the signal I investigate,
>> using a 6-ms hamming window every 1 ms. no prob, I just use specgram
>> with the appropriate settings
>>
>> then, I take the specgram times its conjugate and separate that into
>> a couple of frequency bands.
>>
>> now, each of these bands needs to be smoothed by 20ms. So I just use
>> 'filter(ones(20,1),1,signal)' since I already have 1ms per frame from
>> the spectrogram. But what about the overlap?
>>
>> Also, the paper states that "a 20-ms average of the squared magnitude
>> of the spectrogram, centered about the time of interest, is computed
>> every 1 ms." ... what does "centered about the time of interest" mean?
>> and how can that be implemented?
>
> Not sure but... that sounds like a running average -- sometimes this is
> called 'boxcar' filter. In that case, there would be no overlap (no
> recursive input). To center at time of interest, add previous 10
> magnitude frames and next 10 frames. For each new average calculation,
> drop the oldest and add the newest.
>
> -Jeff

Hm, with overlap, I didn't mean the filter's "a" coefficients, but the
overlap I used to calculate the specgram.

And about the centering: can this be done with filters or do I really
need to iterate through the signal?

thanks,
Ernst

Reply by ●June 27, 2006

Ernst-

> First off, I'm pretty new to the field so bear with me ...
>
> I'm trying to implement a method of speech segmentation described in
> "Landmark detection for distinctive feature-based speech recognition" by
> Sharlene Lie (JASA Nov. 96) and ran into a couple of questions ...
>
> first, I need to compute the spectrogram of the signal I investigate,
> using a 6-ms hamming window every 1 ms. no prob, I just use specgram with
> the appropriate settings
>
> then, I take the specgram times its conjugate and separate that into a
> couple of frequency bands.
>
> now, each of these bands needs to be smoothed by 20ms. So I just use
> 'filter(ones(20,1),1,signal)' since I already have 1ms per frame from
> the spectrogram. But what about the overlap?
>
> Also, the paper states that "a 20-ms average of the squared magnitude of
> the spectrogram, centered about the time of interest, is computed every 1
> ms." ... what does "centered about the time of interest" mean? and how
> can that be implemented?

Not sure but... that sounds like a running average -- sometimes this is
called 'boxcar' filter. In that case, there would be no overlap (no
recursive input). To center at time of interest, add previous 10
magnitude frames and next 10 frames. For each new average calculation,
drop the oldest and add the newest.

-Jeff
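
A minimal Python sketch of such a centered running average, computed directly on a list of per-frame values (`centered_average` is a hypothetical name). One detail is an assumption: with an even width of 20 the window cannot be exactly symmetric, so it spans 10 frames before and 9 after the current one. A width of 21 gives the symmetric "previous 10 and next 10" window.

```python
def centered_average(x, width=20):
    """Average `width` frames centred on each index.

    For an even width the window spans n - width/2 .. n + width/2 - 1,
    which is as close to symmetric as an even length allows.
    Frames without a full window are set to None.
    """
    before = width // 2
    after = width - before - 1
    out = [None] * len(x)
    for n in range(before, len(x) - after):
        window = x[n - before:n + after + 1]
        out[n] = sum(window) / width
    return out


ramp = [float(i) for i in range(40)]
smooth = centered_average(ramp)
print(smooth[9], smooth[10], smooth[30], smooth[31])  # None 9.5 29.5 None
```

Note the division by the width: without it the result is a running sum, 20 times larger than the average.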

> then, I need to compute the first difference of that 20ms-averaged-signal
> "every 1ms using a 50ms-time step", again "centered about the time of
> interest".
> I use 'filter([1 zeros(50-2,1) -1] / 50, 1, signal)' for that, but I'm not
> sure what I'm doing is right ...
>
> then, I need to find peaks in that first difference that are bigger than
> +/- 9dB.
> Now, I've been reading all over the place that dB is a relative measure
> and I need some value to measure it against to compute it. But if the mean
> of the signal is 0, can't I just do 20*log10(signal) to get the signal in
> dB? That's what I'm doing now, but the values don't seem to be right.
> also, I had to change that to 20*log10(signal + 1) to get rid of the
> really small values ... which to me seems like a really unclean way of
> handling a signal ...
>
> if someone could point me in the right direction, any help would be
> greatly appreciated!
>
> thanks!

Reply by ●June 28, 2006

Ernst-

> Hm, with overlap, I didn't mean the filter's "a" coefficients, but the
> overlap I used to calculate the specgram.

What is the concern about overlap? You mentioned using Sharlene Lie's instructions
and said "I just use specgram with the appropriate settings" so I don't understand the
concern. Averaging resulting specgram frames is effective regardless of whether the
STFFT used overlap.

> And about the centering: can this be done with filters or do I really
> need to iterate through the signal?

Please clarify... what iteration? With an N-frame running average you add the most
recent frame and subtract the oldest. M adds are needed per specgram, where M is FFT
size.

-Jeff
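
The add-newest, subtract-oldest update can be sketched in Python for a single track of values; for a spectrogram the same update would run once per frequency bin. `running_sum` is a hypothetical name, and the first few outputs use a partly filled window, like a filter that starts from a zero state:

```python
def running_sum(x, n_frames):
    """Running sum over the last n_frames values, updated recursively.

    Each step adds the newest value and subtracts the one that just
    left the window, so the cost per output does not depend on n_frames.
    """
    out = []
    acc = 0.0
    for i, value in enumerate(x):
        acc += value                       # add the newest frame
        if i >= n_frames:
            acc -= x[i - n_frames]         # drop the oldest frame
        out.append(acc)
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(running_sum(x, 3))  # [1.0, 3.0, 6.0, 9.0, 12.0, 15.0]
```

Dividing each output by `n_frames` turns the running sum into the running average.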

Reply by ●June 28, 2006

On 27.06.2006 at 17:11, Jeff Brower wrote:

> Ernst-
>
>> Hm, with overlap, I didn't mean the filter's "a" coefficients, but the
>> overlap I used to calculate the specgram.
>
> What is the concern about overlap? You mentioned using Sharlene Lie's
> instructions and said "I just use specgram with the appropriate settings"
> so I don't understand the concern. Averaging resulting specgram frames
> is effective regardless of whether the STFFT used overlap.

I meant that I'm "telling matlab" to compute the spectrum of a 6 ms
window every 1 ms.

Thus, if I for example average two such consecutive frames of the
spectrum, I'm actually averaging the spectrum of 7 ms of the original
signal ... right?

(or more accurately, 1ms + 5ms overlap + 1ms + 5ms overlap, so if "t" is
the first instant, and t+1 the ms after,

x(t) + x(t+1) + x(t+2) + x(t+3) + x(t+4) + x(t+5) for the first ms,
plus
x(t+1) + x(t+2) + ... + x(t+6)
= x(t) + 2*(x(t+1) + x(t+2) + ... + x(t+5)) + x(t+6)

)

so, of course averaging is effective, but isn't there some kind of
redundant information that corrupts the results?

>> And about the centering: can this be done with filters or do I really
>> need to iterate through the signal?
>
> Please clarify... what iteration? With an N-frame running average you
> add the most recent frame and subtract the oldest. M adds are needed
> per specgram, where M is FFT size.
>
> -Jeff

Exactly. I'm just unsure about the "centered about the time of
interest".

With "iterating", I meant that I could, for a signal x at each instant
t, compute

abs(x(t)-x(t-n)) + abs(x(t) - x(t+n))

that would somehow be "centered at the time of interest", since it uses
frames before AND after the current instant to compute the average...
but that can't be done with filters, since I can't "look into the
future" of a signal with a filter, right?

thanks
-Ernst

Reply by ●July 9, 2006

Ernst-

> I meant that I'm "telling matlab" to compute the spectrum of a 6 ms
> window every 1 ms.
> Thus, if I for example average two such consecutive frames of the
> spectrum, I'm actually averaging the spectrum of 7 ms of the original
> signal ... right?
> (or more actually, 1ms + 5ms overlap + 1ms + 5ms overlap, so if "t" is
> the first instant, and t+1 the ms after,
>
> x(t) + x(t+1) + x(t+2) + x(t+3) + x(t+4) + x(t+5) for the first ms,
> plus
> x(t+1) + x(t+2) + ... + x(t+6)
> = x(t) + 2*(x(t+1) + x(t+2) + ... + x(t+5)) + x(t+6)
>
> )
>
> so, of course averaging is effective, but isn't there some kind of
> redundant information that corrupts the results?

It may be redundant, but I don't see an issue with corruption. It would be like
calculating a running sum of 10 samples from an input stream, then calculating a
second running sum using initial running sum values. It's a linear operation, and
you could come up with a formula for it.
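
The "formula" for a running sum of running sums is the convolution of the two sets of weights. A minimal Python sketch, treating each frame as a plain sum of 6 samples as in the quoted example (this ignores the window shape and the squared magnitude, so it is only an analogy for the real spectrogram):

```python
def convolve(a, b):
    """Full discrete convolution of two coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


six_sample_sum = [1.0] * 6      # one "frame": sum of 6 consecutive samples
two_frame_sum = [1.0] * 2       # adding two consecutive frames

print(convolve(six_sample_sum, two_frame_sum))
# [1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 1.0]
```

The combined weights 1, 2, 2, 2, 2, 2, 1 match the x(t) + 2*(x(t+1) + ... + x(t+5)) + x(t+6) worked out in the quoted text: the samples in the overlap are weighted more heavily, but nothing is lost or distorted.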

Part of the point with overlap is not just averaging. If you are applying a window
to time domain data prior to STFFT, then you need minimum 50% overlap to treat all
input data "equally" and counteract windowing truncation. The advantage is the
reduction in wideband (edge) noise you get from windowing, the disadvantage is loss
in freq resolution in the resulting spectrogram.

> Exactly. I'm just unsure about the "centered about the time of
> interest".
> With "iterating", I meant that I could, for a signal x at each instant
> t, compute
>
> abs(x(t)-x(t-n)) + abs(x(t) - x(t+n))
>
> that would somehow be "centered at the time of interest", since it uses
> frames before AND after the current instant to compute the average...
> but that can't be done with filters, since I can't "look into the
> future" of a signal with a filter, right?

Sure you can if you are buffering the input -- which you are by calculating STFFT.
An N-point buffer adds delay (latency) of N/Fs sec to your processing; if you look in
the middle of each buffer then from that perspective you're looking N/2 points in the
past and N/2 in the future.

-Jeff
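
For offline data this amounts to running an ordinary causal filter and then shifting the output back by the filter's delay. A minimal Python sketch with an odd width of 5 so the delay is a whole number of frames; `causal_average` and `shift_left` are hypothetical helper names:

```python
def causal_average(x, width):
    """Causal moving average: out[n] uses x[n - width + 1 .. n] (zeros before 0)."""
    out = []
    for n in range(len(x)):
        start = max(0, n - width + 1)
        out.append(sum(x[start:n + 1]) / width)
    return out


def shift_left(y, delay):
    """Advance y by `delay` frames, padding the end with None."""
    return y[delay:] + [None] * delay


x = [0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0]
delayed = causal_average(x, 5)
centered = shift_left(delayed, 2)        # (width - 1) / 2 = 2 frames of delay

print(delayed)   # [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(centered)  # [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, None, None]
```

The pulse at frame 4 ends up spread over frames 2 to 6 after the shift, symmetric about its true position, which is what "centered about the time of interest" asks for.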
