Hi, I am a newcomer to speech processing and I have a small question: why do we
often split a long speech signal into shorter blocks with an overlap? For
example, the first block runs from 0 to 20 ms (along the time axis), the second
block from 10 ms to 30 ms, and so on, with a 10 ms overlap between consecutive
blocks. I just want to know the effect of this overlap.
One more question: we often use the DFT (FFT) to analyze a speech signal under
the assumption that the signal is periodic, but if we partition a signal as
above (with or without overlapping), this assumption becomes false (even if we
assume the long original signal is periodic, the blocks are not periodic in
relation to the original one, so it seems we cannot use the DFT or FFT on
them). Can we still recover the properties (spectrum) of the signal from the
properties of the shorter blocks, or does this work under some assumptions I do
not know?
Thank you for reading and explaining,
I am looking forward to hearing from you soon,
Duc, Nguyen Anh
overlapping in speech analysis
Started by ●September 17, 2009
Reply by ●September 23, 2009
Hi Abhishek Ballaney,
Thank you very much for your explanation. I totally agree with you that we have to segment a long observed signal into segments, but I am just wondering why this goes with overlapping; in fact, this looks more like overlap-save than overlap-add. I guess this is for better time resolution in the STFT (the time resolution vs. frequency resolution issue in STFT analysis), but I need more details, especially the relation between the lengths (segment length and overlap length) and the pitch of the speech. Could you give me some details?
I am very happy to discuss and get help from you,
Best Wishes,
Duc, Nguyen Anh
 On Tue, 9/22/09, Abhishek Ballaney wrote:
From: Abhishek Ballaney
Subject: Re: [audiodsp] overlapping in speech analysis
To: d...@yahoo.com
Date: Tuesday, September 22, 2009, 1:14 AM
Dear Duc,
The overlap-add method is used to break long signals into smaller
segments for easier processing. There are many DSP applications where a long signal must be filtered in segments. With high data rate signals like video or hi-fi digital audio, it is common
for computers to have insufficient memory to simultaneously hold the
entire signal to be processed. There are also systems that process
segment by segment because they operate in real time.
Regards,
abhishek
 On Thu, 17/9/09, d...@yahoo.com wrote:
From: d...@yahoo.com
Subject: [audiodsp] overlapping in speech analysis
To: a...
Date: Thursday, 17 September, 2009, 7:48 AM
Hi, I am a newcomer to speech processing and I have a small question: why do we often split a long speech signal into shorter blocks with an overlap? For example, the first block runs from 0 to 20 ms (along the time axis), the second block from 10 ms to 30 ms, and so on, with a 10 ms overlap between consecutive blocks. I just want to know the effect of this overlap.
One more question: we often use the DFT (FFT) to analyze a speech signal under the assumption that the signal is periodic, but if we partition a signal as above (with or without overlapping), this assumption becomes false (even if we assume the long original signal is periodic, the blocks are not periodic in relation to the original one, so it seems we cannot use the DFT or FFT on them). Can we still recover the properties (spectrum) of the signal from the properties of the shorter blocks, or does this work under some assumptions I do not know?
Thank you for reading and explaining,
I am looking forward to hearing from you soon,
Duc, Nguyen Anh
Reply by ●September 24, 2009
Duc
> Thank you very much for your explanation, I totally agree with you
> that we have to segment a long observed signal into
> segments, but I am just wondering why this goes with overlapping;
> in fact, this is more like overlap-save than
> overlap-add.
Overlap-add and overlap-save have to do with accounting for FFT periodicity when applied to segmented or non-periodic
data, for example when performing convolution or correlation in the frequency domain. In such cases, perfect
reconstruction of the segmented data is required, so overlap-add or overlap-save is applied to the time domain data *after* the
inverse FFT.
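That distinction can be illustrated with a toy overlap-add block filter. This is only a sketch (the block size and signals are made-up values, and direct convolution stands in for the FFT/inverse-FFT step): each segment is filtered independently, and the tails of adjacent segments are added, reconstructing exactly what one long convolution would give.

```python
# Toy overlap-add block convolution: filter a long signal in short
# segments, then add the overlapping tails so the result matches one
# full convolution. (Direct convolution stands in for the FFT step;
# block size and signals are made-up values.)
def conv(x, h):
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def overlap_add(x, h, block=4):
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = conv(x[start:start + block], h)  # each block's tail is len(h)-1 samples
        for k, v in enumerate(seg):
            y[start + k] += v                  # tails overlap and add
    return y

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
h = [1.0, -1.0, 0.5]
print(overlap_add(x, h) == conv(x, h))  # block filtering reproduces the full result
```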
Overlap *prior* to the FFT, done in the time domain and used in STFT analysis (as one example), is more basic. Normally
overlap is combined with a time domain window (Hamming, Hanning, Blackman, etc.) to avoid "edge noise"; i.e. noise
effects due to arbitrarily segmenting continuous time domain data (like speech or other audio). Typically a
combination like 50% overlap and a Hanning window is used... this eliminates wideband noise due to segmentation, while
still allowing each time domain sample to "contribute equally" to the final STFT result (i.e. compensating for the window
weighting). The tradeoff is some loss in frequency domain precision.
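The "contribute equally" point can be checked numerically. A minimal pure-Python sketch (toy frame length; the periodic form of the Hann/Hanning window is assumed): at 50% overlap the shifted windows sum to a constant, so every sample in the steady-state region carries the same total weight.

```python
import math

# Periodic Hann ("Hanning") window: w[n] = 0.5 - 0.5*cos(2*pi*n/N).
def hann(N):
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

N = 8          # toy frame length
hop = N // 2   # 50% overlap
w = hann(N)

# Accumulate the window weight seen by each output sample over 4 frames.
total = [0.0] * (N + 3 * hop)
for f in range(4):
    for n in range(N):
        total[f * hop + n] += w[n]

# In the steady-state region every sample gets total weight 1.0, i.e.
# each time domain sample "contributes equally" to the STFT.
print([round(t, 6) for t in total[N:2 * N]])
```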
> I guess this is for better time resolution in STFT
> (the issue time resolution vs. frequency resolution in
> STFT analysis), but I need more details, especially the
> relation between the lengths, segment length and overlapping
> length, with the pitch of the speech. Could you give me
> some details?
Typically the segment length is decided based on the nature of the data. For example, speech is considered
quasi-stationary for about 15 msec, so Abhishek might use a 128-point frame size if his sampling rate is 8 kHz. That's
just one example -- Abhishek has to decide based on his system parameters.
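As a sketch of that sizing arithmetic (the helper name and the round-up-to-power-of-two choice are illustrative, not from the thread):

```python
# Hypothetical sizing helper: pick a frame size from the sample rate and
# the ~15 ms quasi-stationarity interval of speech.
def frame_size(sample_rate_hz, stationary_ms=15):
    samples = sample_rate_hz * stationary_ms // 1000   # e.g. 8000 * 15 / 1000 = 120
    n = 1
    while n < samples:                                 # round up so an FFT applies directly
        n *= 2
    return n

print(frame_size(8000))   # 120 samples -> 128-point frame, as in the example above
print(frame_size(16000))  # 240 samples -> 256-point frame
```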
These pages have additional details:
http://www.statemaster.com/encyclopedia/Short_timeFouriertransform
http://en.wikipedia.org/wiki/Window_function
http://en.wikipedia.org/wiki/Spectrogram
Jeff
>  On Tue, 9/22/09, Abhishek Ballaney wrote:
>
> From: Abhishek Ballaney
> Subject: Re: [audiodsp] overlapping in speech analysis
> To: d...@yahoo.com
> Date: Tuesday, September 22, 2009, 1:14 AM
>
> Dear Duc,
>
> The overlap-add method is used to break long signals into smaller
> segments for easier processing. There are many DSP applications where a long signal must be filtered in segments.
> With high data rate signals like video or hi-fi digital audio, it is common
> for computers to have insufficient memory to simultaneously hold the
> entire signal to be processed. There are also systems that process
> segment by segment because they operate in real time.
>
> Regards,
> abhishek
>
>  On Thu, 17/9/09, d...@yahoo.com wrote:
>
> From: d...@yahoo.com
> Subject: [audiodsp] overlapping in speech analysis
> To: a...
> Date: Thursday, 17 September, 2009, 7:48 AM
>
>
> Hi, I am a newcomer to speech processing and I have a small question: why do we often split
> a long speech signal into shorter blocks with an overlap? For example, the first block runs from 0 to 20 ms
> (along the time axis), the second block from 10 ms to 30 ms, and so on, with a 10 ms overlap between
> consecutive blocks. I just want to know the effect of this overlap.
>
> One more question: we often use the DFT (FFT) to analyze a speech signal under the assumption that the signal is
> periodic, but if we partition a signal as above (with or without overlapping), this assumption becomes false
> (even if we assume the long original signal is periodic, the blocks are not periodic in relation to the original one,
> so it seems we cannot use the DFT or FFT on them). Can we still recover the properties (spectrum) of the signal
> from the properties of the shorter blocks, or does this work under some assumptions I do not know?
>
> Thank you for reading and explaining,
>
> I am looking forward to hearing from you soon,
>
> Duc, Nguyen Anh
Reply by ●September 25, 2009
Dear Jeff Brower,
Thank you very much for your prompt reply.
> Overlap *prior* to the FFT, done in the time domain and used in STFT analysis (as
> one example), is more basic. Normally
> overlap is combined with a time domain window (Hamming, Hanning, Blackman, etc.)
> to avoid "edge noise"; i.e. noise
> effects due to arbitrarily segmenting continuous time domain data (like speech or
> other audio). Typically a
> combination like 50% overlap and a Hanning window is used... this eliminates
> wideband noise due to segmentation, while
> still allowing each time domain sample to "contribute equally" to the
> final STFT result (i.e. compensating for the window
> weighting). The tradeoff is some loss in frequency domain precision.
I am still a bit unclear about the purpose of overlapping here. Could you please explain more, or give me some references regarding "edge noise", "wideband noise", and the effect of overlapping in analyzing signals? Thank you in advance.
Nice to hear from you again,
Best Regards,
Duc, Nguyen Anh
 On Wed, 9/23/09, Jeff Brower wrote:
From: Jeff Brower
Subject: Re: [audiodsp] overlapping in speech analysis
To: "Duc Nguyen Anh"
Cc: "Abhishek Ballaney" , a...
Date: Wednesday, September 23, 2009, 1:15 PM
Duc
> Thank you very much for your explanation, I totally agree with you
> that we have to segment a long observed signal into
> segments, but I am just wondering why this goes with overlapping;
> in fact, this is more like overlap-save than
> overlap-add.
Overlap-add and overlap-save have to do with accounting for FFT periodicity when applied to segmented or non-periodic
data, for example when performing convolution or correlation in the frequency domain. In such cases, perfect
reconstruction of the segmented data is required, so overlap-add or overlap-save is applied to the time domain data *after* the
inverse FFT.
Overlap *prior* to the FFT, done in the time domain and used in STFT analysis (as one example), is more basic. Normally
overlap is combined with a time domain window (Hamming, Hanning, Blackman, etc.) to avoid "edge noise"; i.e. noise
effects due to arbitrarily segmenting continuous time domain data (like speech or other audio). Typically a
combination like 50% overlap and a Hanning window is used... this eliminates wideband noise due to segmentation, while
still allowing each time domain sample to "contribute equally" to the final STFT result (i.e. compensating for the window
weighting). The tradeoff is some loss in frequency domain precision.
> I guess this is for better time resolution in STFT
> (the issue time resolution vs. frequency resolution in
> STFT analysis), but I need more details, especially the
> relation between the lengths, segment length and overlapping
> length, with the pitch of the speech. Could you give me
> some details?
Typically the segment length is decided based on the nature of the data. For example, speech is considered
quasi-stationary for about 15 msec, so Abhishek might use a 128-point frame size if his sampling rate is 8 kHz. That's
just one example -- Abhishek has to decide based on his system parameters.
These pages have additional details:
http://www.statemaster.com/encyclopedia/Short_timeFouriertransform
http://en.wikipedia.org/wiki/Window_function
http://en.wikipedia.org/wiki/Spectrogram
Jeff
>  On Tue, 9/22/09, Abhishek Ballaney wrote:
>
> From: Abhishek Ballaney
> Subject: Re: [audiodsp] overlapping in speech analysis
> To: ducna80@yahoo.com
> Date: Tuesday, September 22, 2009, 1:14 AM
>
> Dear Duc,
>
> The overlap-add method is used to break long signals into smaller
> segments for easier processing. There are many DSP applications where a long signal must be filtered in segments.
> With high data rate signals like video or hi-fi digital audio, it is common
> for computers to have insufficient memory to simultaneously hold the
> entire signal to be processed. There are also systems that process
> segment by segment because they operate in real time.
>
> Regards,
> abhishek
>
>  On Thu, 17/9/09, ducna80@yahoo.com wrote:
>
> From: ducna80@yahoo.com
> Subject: [audiodsp] overlapping in speech analysis
> To: audiodsp@yahoogroups.com
> Date: Thursday, 17 September, 2009, 7:48 AM
>
> Hi, I am a newcomer to speech processing and I have a small question: why do we often split
> a long speech signal into shorter blocks with an overlap? For example, the first block runs from 0 to 20 ms
> (along the time axis), the second block from 10 ms to 30 ms, and so on, with a 10 ms overlap between
> consecutive blocks. I just want to know the effect of this overlap.
>
> One more question: we often use the DFT (FFT) to analyze a speech signal under the assumption that the signal is
> periodic, but if we partition a signal as above (with or without overlapping), this assumption becomes false
> (even if we assume the long original signal is periodic, the blocks are not periodic in relation to the original one,
> so it seems we cannot use the DFT or FFT on them). Can we still recover the properties (spectrum) of the signal
> from the properties of the shorter blocks, or does this work under some assumptions I do not know?
>
> Thank you for reading and explaining,
>
> I am looking forward to hearing from you soon,
>
> Duc, Nguyen Anh
Reply by ●October 9, 2009
Duc
> Thank you very much for your prompt reply.
> Overlap *prior* to the FFT, done in the time domain and used in STFT analysis (as
> one example), is more basic. Normally
> overlap is combined with a time domain window (Hamming, Hanning, Blackman, etc.)
> to avoid "edge noise"; i.e. noise
> effects due to arbitrarily segmenting continuous time domain data (like speech or
> other audio). Typically a
> combination like 50% overlap and a Hanning window is used... this eliminates
> wideband noise due to segmentation, while
> still allowing each time domain sample to "contribute equally" to the
> final STFT result (i.e. compensating for the window
> weighting). The tradeoff is some loss in frequency domain precision.
> I am still a bit unclear about the purpose of overlapping here. Could you
> please explain more, or give me some
> references regarding "edge noise", "wideband noise",
> and the effect of overlapping in analyzing signals? Thank you in advance.
Sorry I am slow to reply. To think about edge noise, consider that you give an FFT the following shape:

         __
        |  |
    ____|  |____

The sides of the pulse are "edges". What frequencies are contained in an edge? Or more precisely, what frequencies
are contained in this shape:

         |
    _____|_____

which is also called a Dirac delta function when the amplitude is infinitely large and the pulse width is infinitely narrow.
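That question has a concrete numerical answer: a unit impulse contains every frequency equally. A small pure-Python check (the DFT size is an arbitrary toy value):

```python
import cmath
import math

# A unit impulse ("edge" taken to its limit) contains every frequency
# equally: its DFT magnitude is flat across all bins. (Toy DFT size.)
N = 16
impulse = [1.0] + [0.0] * (N - 1)
mags = [abs(sum(impulse[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N)))
        for k in range(N)]
print(mags)  # every bin has magnitude 1.0
```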
If you know these answers, then you can see the problem if you don't apply a window to your time domain speech frames
prior to FFT. For example if your speech frame prior to FFT looks like this:
       /\        _
      /  \      / \_
     |    \____/    |
     |              |
     |<---- Fr ---->|      Fr = frame size
then the FFT will see two "edges" -- do you want frequencies due to those edges in your results? Are they actually
there in the original data? If not, then people might call those frequencies "noise" (re. your 'wideband noise'
question above). Now think about what happens if you apply a window prior to the FFT.
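This can be demonstrated with a small pure-Python experiment (the tone frequency and sizes are arbitrary choices, not from the thread): a sinusoid that does not complete a whole number of cycles in the frame has mismatched edges, and its rectangular-frame spectrum leaks energy far from the tone; applying a Hann window before the DFT suppresses that leakage.

```python
import cmath
import math

def dft_mag(x):
    # Magnitude of the DFT of a real sequence (direct O(N^2) computation).
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N)))
            for k in range(N)]

N = 64
f = 5.3   # deliberately NOT bin-aligned, so the frame has mismatched edges
tone = [math.sin(2 * math.pi * f * n / N) for n in range(N)]
w = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]  # Hann window

rect_spec = dft_mag(tone)                                # rectangular frame
hann_spec = dft_mag([t * wn for t, wn in zip(tone, w)])  # windowed frame

# Far from the tone, the rectangular frame leaks much more energy
# ("wideband noise" from the frame edges) than the windowed one.
print(rect_spec[20], hann_spec[20])
```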
Jeff
>  On Wed, 9/23/09, Jeff Brower wrote:
>
> From: Jeff Brower
> Subject: Re: [audiodsp] overlapping in speech analysis
> To: "Duc Nguyen Anh"
> Cc: "Abhishek Ballaney" , a...
> Date: Wednesday, September 23, 2009, 1:15 PM
>
> Duc
>
>> Thank you very much for your explanation, I totally agree with you
>> that we have to segment a long observed signal into
>> segments, but I am just wondering why this goes with overlapping;
>> in fact, this is more like overlap-save than
>> overlap-add.
>
> Overlap-add and overlap-save have to do with accounting for FFT periodicity when applied to segmented or non-periodic
> data, for example when performing convolution or correlation in the frequency domain. In such cases, perfect
> reconstruction of the segmented data is required, so overlap-add or overlap-save is applied to the time domain data *after* the
> inverse FFT.
>
> Overlap *prior* to the FFT, done in the time domain and used in STFT analysis (as one example), is more basic. Normally
> overlap is combined with a time domain window (Hamming, Hanning, Blackman, etc.) to avoid "edge noise"; i.e. noise
> effects due to arbitrarily segmenting continuous time domain data (like speech or other audio). Typically a
> combination like 50% overlap and a Hanning window is used... this eliminates wideband noise due to segmentation, while
> still allowing each time domain sample to "contribute equally" to the final STFT result (i.e. compensating for the window
> weighting). The tradeoff is some loss in frequency domain precision.
>
>> I guess this is for better time resolution in STFT
>> (the issue time resolution vs. frequency resolution in
>> STFT analysis), but I need more details, especially the
>> relation between the lengths, segment length and overlapping
>> length, with the pitch of the speech. Could you give me
>> some details?
>
> Typically the segment length is decided based on the nature of the data. For example, speech is considered
> quasi-stationary for about 15 msec, so Abhishek might use a 128-point frame size if his sampling rate is 8 kHz. That's
> just one example -- Abhishek has to decide based on his system parameters.
> These pages have additional details:
>
> http://www.statemaster.com/encyclopedia/Short_timeFouriertransform
>
> http://en.wikipedia.org/wiki/Window_function
>
> http://en.wikipedia.org/wiki/Spectrogram
>
> Jeff
>
>>  On Tue, 9/22/09, Abhishek Ballaney wrote:
>
>> From: Abhishek Ballaney
>> Subject: Re: [audiodsp] overlapping in speech analysis
>
>> To: ducna80@yahoo.com
>
>> Date: Tuesday, September 22, 2009, 1:14 AM
>
>>> Dear Duc,
>
>> The overlap-add method is used to break long signals into smaller
>> segments for easier processing. There are many DSP applications where a long signal must be filtered in segments.
>> With high data rate signals like video or hi-fi digital audio, it is common
>> for computers to have insufficient memory to simultaneously hold the
>> entire signal to be processed. There are also systems that process
>> segment by segment because they operate in real time.
>
>> Regards,
>
>> abhishek
>
>>  On Thu, 17/9/09, ducna80@yahoo.com wrote:
>
>> From: ducna80@yahoo.com
>> Subject: [audiodsp] overlapping in speech analysis
>
>> To: audiodsp@yahoogroups.com
>
>> Date: Thursday, 17 September, 2009, 7:48 AM
>
>> Hi, I am a newcomer to speech processing and I have a small question: why do we often split
>> a long speech signal into shorter blocks with an overlap? For example, the first block runs from 0 to 20 ms
>> (along the time axis), the second block from 10 ms to 30 ms, and so on, with a 10 ms overlap between
>> consecutive blocks. I just want to know the effect of this overlap.
>
>> One more question: we often use the DFT (FFT) to analyze a speech signal under the assumption that the signal is
>> periodic, but if we partition a signal as above (with or without overlapping), this assumption becomes false
>> (even if we assume the long original signal is periodic, the blocks are not periodic in relation to the original one,
>> so it seems we cannot use the DFT or FFT on them). Can we still recover the properties (spectrum) of the signal
>> from the properties of the shorter blocks, or does this work under some assumptions I do not know?
>
>> Thank you for reading and explaining,
>
>> I am looking forward to hearing from you soon,
>
>> Duc, Nguyen Anh
> Thank you very much for your prompt reply.
> Overlap *prior* to FFT, done in the time domain and used in STFT analysis (as
> one example), is more basic. Normally
> overlap is combined with a time domain window (Hamming, Hanning, Blackman, etc)
> to avoid "edge noise"; i.e. noise
> effects due to arbitarily segmenting continuous time domain data (like speech or
> other audio). Typically a
> combination like 50% overlap and Hanning window is used... this eliminates
> wideband noise due to segmentation, while
> still allowing each time domain sample to "contribute equally" to the
> final STFT result (i.e. compensate for window
> weighting). The tradeoff is some loss in frequency domain precision.
> I am still a bit unclear about the purpose of overlapping here, could you
> please explain more or give me some
> references to get more information regarding "edge noise", "wideband noise"
> and the effect of overlapping in analyzing signals. Thank you in advance.
Sorry I am slow to reply. To think about edge noise, consider that you give an FFT the following shape:

         ____
        |    |
    ____|    |____

The sides of the pulse are "edges". What frequencies are contained in an edge? Or more precisely, what frequencies are contained in this shape:

          _
         | |
    _____| |_____

which is also called a Dirac delta function when the amplitude is infinitely large and the pulse width is infinitely narrow.
If you know these answers, then you can see the problem if you don't apply a window to your time domain speech frames prior to FFT. For example, if your speech frame prior to FFT looks like this:

       _
      / \      /\_
     /   \____/
    |<---- Fr ---->|    Fr = frame size

then the FFT will see two "edges" -- do you want frequencies due to those edges in your results? Are they actually there in the original data? If not, then people might call those frequencies "noise" (re. your 'wideband noise' question above). Now think about what happens if you apply a window prior to the FFT.
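[Editor's note: Jeff's point can be seen numerically. The sketch below is my own illustration, with assumed values for the sample rate and tone frequency; it measures how much of a tone's energy "leaks" across the band when the frame is cut with sharp edges versus tapered by a Hanning window.]

```python
import numpy as np

fs = 8000                         # assumed sample rate, Hz
N = 512                           # assumed frame size
t = np.arange(N) / fs
# A tone that does not complete a whole number of cycles in the frame,
# so the raw (rectangular) frame ends in two abrupt "edges":
x = np.sin(2 * np.pi * 1007.8 * t)

rect = np.abs(np.fft.rfft(x))                   # no window: sharp edges
hann = np.abs(np.fft.rfft(x * np.hanning(N)))   # tapered ends: edges removed

peak = int(np.argmax(rect))                     # FFT bin nearest the tone

def leakage(mag):
    """Fraction of spectral energy outside the 5 bins around the peak."""
    e = mag ** 2
    near = e[peak - 2 : peak + 3].sum()
    return 1.0 - near / e.sum()

# The un-windowed frame spreads far more energy across the whole band:
print(f"rect: {leakage(rect):.4f}  hann: {leakage(hann):.6f}")
```

The energy that ends up far from the tone in the rectangular case is the "wideband noise" due to segmentation that Jeff describes; windowing suppresses it at the cost of a wider main lobe (the loss in frequency precision).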
Jeff
>  On Wed, 9/23/09, Jeff Brower wrote:
>
> From: Jeff Brower
> Subject: Re: [audiodsp] overlapping in speech analysis
> To: "Duc Nguyen Anh"
> Cc: "Abhishek Ballaney" , a...
> Date: Wednesday, September 23, 2009, 1:15 PM
>
> Duc
>
>> Thank you very much for your explanation. I totally agree with you that we have to segment a
>> long observed signal into segments, but I am just wondering why this goes with overlapping; in
>> fact, this is more like overlap-save than overlap-add.
>
> Overlap-add and overlap-save have to do with accounting for FFT periodicity when applied to
> segmented or non-periodic data, for example when performing convolution or correlation in the
> frequency domain. In such cases, perfect reconstruction of the segmented data is required, so
> overlap-add or overlap-save is applied to the time domain data *after* the inverse FFT.
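[Editor's note: as a concrete illustration of the overlap-add case Jeff describes, here is a minimal block-wise FFT filtering sketch (my own, not from the thread) in which the segment tails are added back together after the inverse FFT to reconstruct the full convolution.]

```python
import numpy as np

def overlap_add_filter(x, h, block=256):
    """Filter a long signal x with FIR h one block at a time (overlap-add)."""
    # FFT size big enough that circular convolution equals linear convolution
    nfft = 1 << (block + len(h) - 1).bit_length()
    H = np.fft.rfft(h, nfft)
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        conv = np.fft.irfft(np.fft.rfft(seg, nfft) * H, nfft)
        n = len(seg) + len(h) - 1
        y[start:start + n] += conv[:n]   # the "add" of overlap-add
    return y

# Perfect reconstruction: matches direct time-domain convolution
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
h = np.array([0.25, 0.5, 0.25])
assert np.allclose(overlap_add_filter(x, h), np.convolve(x, h))
```

Note this overlap happens *after* the inverse FFT, which is exactly what distinguishes it from the analysis-side overlap (prior to the FFT) that Jeff discusses next.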
>
> Overlap *prior* to FFT, done in the time domain and used in STFT analysis (as one example), is
> more basic. Normally overlap is combined with a time domain window (Hamming, Hanning, Blackman,
> etc.) to avoid "edge noise"; i.e. noise effects due to arbitrarily segmenting continuous time
> domain data (like speech or other audio). Typically a combination like 50% overlap and a Hanning
> window is used... this eliminates wideband noise due to segmentation, while still allowing each
> time domain sample to "contribute equally" to the final STFT result (i.e. compensate for the
> window weighting). The tradeoff is some loss in frequency domain precision.
>
>> I guess this is for better time resolution in STFT (the time resolution vs. frequency
>> resolution issue in STFT analysis), but I need more details, especially the relation between
>> the lengths (segment length and overlap length) and the pitch of the speech. Could you give
>> me some details?
>
> Typically the segment length is decided based on the nature of the data. For example, speech is
> considered quasi-stationary for about 15 msec, so Abhishek might use a 128-pt frame size if his
> sampling rate is 8 kHz. That's just one example -- Abhishek has to decide based on his system
> parameters.
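[Editor's note: checking Jeff's numbers: 128 samples at 8 kHz works out to 16 ms, close to the 15 msec quasi-stationarity figure. The 50% hop below is my own assumed example, not something Jeff specifies.]

```python
fs = 8000            # sampling rate, Hz (Jeff's example)
frame = 128          # frame size in samples (Jeff's example)
hop = frame // 2     # assumed 50% overlap

frame_ms = frame / fs * 1000   # 128 / 8000 s = 16 ms per frame
hop_ms = hop / fs * 1000       # a new frame starts every 8 ms
print(frame_ms, hop_ms)
```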
> These pages have additional details:
>
> http://www.statemaster.com/encyclopedia/Short_time-Fourier_transform
>
> http://en.wikipedia.org/wiki/Window_function
>
> http://en.wikipedia.org/wiki/Spectrogram
>
> Jeff
>
>> On Tue, 9/22/09, Abhishek Ballaney wrote:
>>
>> From: Abhishek Ballaney
>> Subject: Re: [audiodsp] overlapping in speech analysis
>> To: ducna80@yahoo.com
>> Date: Tuesday, September 22, 2009, 1:14 AM
>
>> Dear Duc,
>>
>> The overlap-add method is used to break long signals into smaller segments for easier
>> processing. There are many DSP applications where a long signal must be filtered in segments.
>> With high data rate signals like video or hi-fi digital audio, it is common for computers to
>> have insufficient memory to simultaneously hold the entire signal to be processed. There are
>> also systems that process segment-by-segment because they operate in real time.
>>
>> Regards,
>> abhishek
>
>> On Thu, 17/9/09, ducna80@yahoo.com wrote:
>>
>> From: ducna80@yahoo.com
>> Subject: [audiodsp] overlapping in speech analysis
>> To: audiodsp@yahoogroups.com
>> Date: Thursday, 17 September, 2009, 7:48 AM