overlapping in speech analysis

Started by ducn...@yahoo.com September 17, 2009
Hi, I am a newcomer to speech processing and have a small question: why do we often split a long speech signal into shorter blocks that overlap? For example, the first block runs from 0 to 20 ms (along the time axis), the second from 10 ms to 30 ms, and so on, so that consecutive blocks overlap by 10 ms. I just want to know what effect the overlap has.
One more question: we often use the DFT (FFT) to analyze speech under the assumption that the signal is periodic. If we partition the signal as above (with or without overlap), that assumption seems to fail: if the long original signal is assumed periodic, the blocks are not periodic in the same way, so it seems we cannot use the DFT or FFT on them. Can we still recover the properties (spectrum) of the signal from the properties of the shorter blocks? Or does this rely on some assumptions that I do not know about?

Thank you for reading and explaining,
I am looking forward to hearing from you soon,

Duc, Nguyen Anh
Hi Abhishek Ballaney,

Thank you very much for your explanation. I totally agree with you that we have to split a long observed signal into segments, but I am just wondering why this goes together with overlapping; in fact, this looks more like overlap-save than overlap-add. I guess it is for better time resolution in the STFT (the time resolution vs. frequency resolution issue in STFT analysis), but I need more details, especially on how the segment length and the overlap length relate to the pitch of the speech. Could you give me some details?

I am very happy to discuss this and get help from you,

Best Wishes,

Duc, Nguyen Anh

--- On Tue, 9/22/09, Abhishek Ballaney wrote:

From: Abhishek Ballaney
Subject: Re: [audiodsp] overlapping in speech analysis
To: d...@yahoo.com
Date: Tuesday, September 22, 2009, 1:14 AM

Dear Duc,

The overlap-add method is used to break long signals into smaller
segments for easier processing. There are many DSP applications where
a long signal must be filtered in segments. With high data rate signals
like video or hi-fi digital audio, it is common for computers to have
insufficient memory to simultaneously hold the entire signal to be
processed. There are also systems that process segment-by-segment
because they operate in real time.

Regards,
abhishek

_____________________________________
Duc-

> Thank you very much for your explanation, I totally agree with you
> that we have to segment a long observed signal into
> segments, but I am just wondering why this goes with overlapping,
> in fact, this is more like overlap-save than
> overlap-add.

Overlap-add and overlap-save have to do with accounting for FFT periodicity (circular convolution) when the FFT is
applied to segmented or non-periodic data, for example when performing convolution or correlation in the frequency
domain. In such cases, exact reconstruction of the segmented data is required, so the overlap-add or overlap-save
bookkeeping is applied to the time domain data *after* the inverse FFT.
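
As a rough illustration of the overlap-add case described above, here is a minimal NumPy sketch. The function name `overlap_add_filter`, the block length, and the 3-tap filter are assumptions made for the example, not anything specified in this thread. Each block is zero-padded so the FFT product is a linear (not circular) convolution, and the overlapping output tails are summed after the inverse FFT.

```python
import numpy as np

def overlap_add_filter(x, h, block_len=64):
    """Filter x with FIR h block by block in the frequency domain (overlap-add)."""
    m = len(h)
    n_fft = 1
    while n_fft < block_len + m - 1:
        n_fft *= 2                      # FFT long enough to avoid circular wrap-around
    H = np.fft.rfft(h, n_fft)           # filter spectrum, computed once
    y = np.zeros(len(x) + m - 1)
    for start in range(0, len(x), block_len):
        block = x[start:start + block_len]
        # rfft zero-pads the block to n_fft; the product is a linear convolution
        seg = np.fft.irfft(np.fft.rfft(block, n_fft) * H, n_fft)
        seg_len = len(block) + m - 1
        # the tail of this block's output overlaps the next block's output: add it
        y[start:start + seg_len] += seg[:seg_len]
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)           # stand-in for a long signal
h = np.array([0.25, 0.5, 0.25])         # hypothetical smoothing filter
y_ola = overlap_add_filter(x, h)
y_ref = np.convolve(x, h)               # direct time domain convolution
max_err = np.max(np.abs(y_ola - y_ref))
print("max difference from direct convolution:", max_err)
```

The block-wise result should match direct convolution to within floating-point rounding, which is the "exact reconstruction" requirement mentioned above.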

Overlap *prior* to FFT, done in the time domain and used in STFT analysis (as one example), is more basic. Normally
overlap is combined with a time domain window (Hamming, Hanning, Blackman, etc) to avoid "edge noise"; i.e. noise
effects due to arbitrarily segmenting continuous time domain data (like speech or other audio). Typically a
combination like 50% overlap and Hanning window is used... this largely suppresses wide-band noise due to
segmentation, while still allowing each time domain sample to "contribute equally" to the final STFT result (i.e. the
overlap compensates for the window weighting). The tradeoff is some loss of frequency resolution, because the window
widens the main lobe of each spectral peak.
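
The "contribute equally" point can be checked numerically. The sketch below is only an illustration under assumed values (128-sample frames, 64-sample hop, random test data, and a helper named `frame_signal` that is not from this thread). It uses the periodic form of the Hann window, for which windows shifted by half a frame sum to exactly 1.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames (rows); trailing samples that do not fill a frame are dropped."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])

frame_len, hop = 128, 64                       # 50% overlap
n = np.arange(frame_len)
window = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)   # periodic Hann window

x = np.random.default_rng(1).standard_normal(1024)       # stand-in for speech samples
frames = frame_signal(x, frame_len, hop) * window        # window every frame
spectra = np.fft.rfft(frames, axis=1)                    # one spectrum per frame (an STFT)

# Total window weight applied to each input sample across all frames
weight = np.zeros(len(x))
for i in range(frames.shape[0]):
    weight[i * hop:i * hop + frame_len] += window
interior = weight[hop:-hop]                    # samples covered by two frames
print(spectra.shape, interior.min(), interior.max())
```

Away from the first and last half frame, every sample receives a total weight of 1, so no part of the signal is systematically de-emphasized by the window.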

> I guess this is for better time resolution in STFT
> (the issue time resolution vs. frequency resolution in
> STFT analysis), but I need more details, especially the
> relation between the lengths, segment length and overlapping
> length, with the pitch of the speech. Could you give me
> some details?

Typically segment length is decided based on the nature of the data. For example speech is considered
quasi-stationary for about 15 msec, so Abhishek might use a 128 pt frame size if his sampling rate is 8 kHz
(128 / 8000 = 16 msec). That's just one example -- Abhishek has to decide based on his system parameters.
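
The arithmetic behind that example can be sketched as follows. Rounding up to a power of two is an assumption here (it is a common choice for FFT sizes and would explain the 128 pt figure), and the helper name `frame_params` is hypothetical.

```python
def frame_params(fs_hz, stationary_ms):
    """Pick a power-of-two frame size covering roughly stationary_ms of signal."""
    samples = int(fs_hz * stationary_ms / 1000)        # samples in the stationary span
    n_fft = 1 << (samples - 1).bit_length()            # next power of two >= samples
    return {
        "frame_len": n_fft,
        "frame_ms": 1000 * n_fft / fs_hz,              # actual frame duration
        "bin_hz": fs_hz / n_fft,                       # spacing between FFT bins
        "hop_50pct": n_fft // 2,                       # hop size for 50% overlap
    }

params = frame_params(8000, 15)
print(params)
```

For 8 kHz and 15 msec this gives a 128-sample (16 msec) frame with 62.5 Hz bin spacing. A longer frame would give finer bin spacing but would span more than one quasi-stationary stretch of speech.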

These pages have additional details:

http://www.statemaster.com/encyclopedia/Short_time-Fourier-transform

http://en.wikipedia.org/wiki/Window_function

http://en.wikipedia.org/wiki/Spectrogram

-Jeff

Dear Jeff Brower,

Thank you very much for your prompt reply.

> Overlap *prior* to FFT, done in the time domain and used in STFT analysis (as one example), is more basic. Normally
> overlap is combined with a time domain window (Hamming, Hanning, Blackman, etc) to avoid "edge noise"; i.e. noise
> effects due to arbitrarily segmenting continuous time domain data (like speech or other audio). Typically a
> combination like 50% overlap and Hanning window is used... this largely suppresses wide-band noise due to
> segmentation, while still allowing each time domain sample to "contribute equally" to the final STFT result

I am still a bit unclear about the purpose of overlapping here. Could you please explain more, or give me some references on "edge noise", "wide-band noise", and the effect of overlapping when analyzing signals? Thank you in advance.

Nice to hear from you again,

Best Regards,

Duc, Nguyen Anh

_____________________________________
Duc-

Sorry I am slow to reply. To think about edge noise, consider that you give an FFT the following shape:

      __
     |  |
 ____|  |____

The sides of the pulse are "edges". What frequencies are contained in an edge? Or more precisely, what frequencies
are contained in this shape:

     |
 ____|____

which, in the limit of infinitely large amplitude and infinitely narrow pulse width, is the Dirac delta function.

If you know these answers, then you can see the problem if you don't apply a window to your time domain speech frames
prior to FFT. For example if your speech frame prior to FFT looks like this:

         _
  /\    / \_
 |  \__/    |
 |          |

 |--- Fr ---|     Fr = frame size

then the FFT will see two "edges" -- do you want frequencies due to those edges in your results? Are they actually
there in the original data? If not, then people might call those frequencies "noise" (re. your 'wide band noise'
question above). Now think about what happens if you apply a window prior to the FFT.
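
One way to try this experiment is the NumPy sketch below. The test tone, frame size, and the helper `far_leakage_db` are assumptions chosen for illustration: the tone completes 7.5 cycles in a 128-sample frame, so the frame is cut at a point where its two ends do not match. The code compares how much energy shows up far from the true tone frequency with and without a Hann window.

```python
import numpy as np

fs = 8000                                   # assumed sampling rate in Hz
frame_len = 128
n = np.arange(frame_len)
f_tone = 7.5 * fs / frame_len               # 468.75 Hz: 7.5 cycles per frame
x = np.cos(2 * np.pi * f_tone * n / fs)     # the frame ends do not line up: an "edge"
hann = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)   # periodic Hann window

def far_leakage_db(frame):
    """Largest magnitude at least 10 bins above the peak, in dB relative to the peak."""
    mag = np.abs(np.fft.rfft(frame))
    k = int(np.argmax(mag))                 # bin closest to the tone
    far = mag[k + 10:]                      # bins well away from the tone
    return 20 * np.log10(far.max() / mag.max())

rect_db = far_leakage_db(x)                 # no window (rectangular truncation)
hann_db = far_leakage_db(x * hann)          # frame tapered to zero at both ends
print(f"rectangular: {rect_db:.1f} dB, Hann: {hann_db:.1f} dB")
```

The unwindowed frame should show much more energy far from the tone than the Hann-windowed frame. That far-off energy is the wide-band "edge noise": it comes from the abrupt cut, not from the signal itself.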

-Jeff
