> Thank you very much for your prompt reply.
> Overlap *prior* to FFT, done in the time domain and used in STFT analysis (as
> one example), is more basic. Normally overlap is combined with a time domain
> window (Hamming, Hanning, Blackman, etc.) to avoid "edge noise"; i.e. noise
> effects due to arbitrarily segmenting continuous time domain data (like speech
> or other audio). Typically a combination like 50% overlap and Hanning window is
> used... this eliminates wide-band noise due to segmentation, while still
> allowing each time domain sample to "contribute equally" to the final STFT
> result (i.e. compensate for window weighting). The tradeoff is some loss in
> frequency domain precision.
> I am still a bit unclear about the purpose of overlapping here. Could you
> please explain more, or give me some references with more information on "edge
> noise", "wide-band noise", and the effect of overlapping in analyzing signals?
> Thank you in advance.
Sorry I am slow to reply. To think about edge noise, consider that you give an
FFT the following shape:
__
| |
____| |____
The sides of the pulse are "edges". What frequencies are contained in an edge?
Or more precisely, what frequencies
are contained in this shape:
|
____|____
which is also called a Dirac delta function when the amplitude is infinitely
large and the pulse width is infinitely narrow.
If you know these answers, then you can see the problem if you don't apply
a window to your time domain speech frames
prior to FFT. For example if your speech frame prior to FFT looks like this:
_
/\ / \_
| \__/ |
| |
|--- Fr ---| Fr = frame size
then the FFT will see two "edges" -- do you want frequencies due to those edges
in your results? Are they actually
there in the original data? If not, then people might call those frequencies
"noise" (re. your 'wide band noise'
question above). Now think about what happens if you apply a window prior to
the FFT.
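As a concrete (if simplified) illustration of the above, the sketch below DFTs one frame of a sine that is cut off mid-cycle, with and without a Hanning window. The frame size, the frequency, and the `dft_mag` helper are arbitrary choices for illustration, and a plain O(N^2) DFT is used to keep the sketch dependency-free:

```python
import cmath
import math

def dft_mag(x):
    # plain O(N^2) DFT magnitude, to stay dependency-free
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N))) for k in range(N)]

N = 64
# a sine at 10.5 cycles per frame: it is cut off mid-cycle, so the
# frame boundaries act like the "edges" described above
x = [math.sin(2 * math.pi * 10.5 * n / N) for n in range(N)]
# periodic Hanning window
w = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

rect = dft_mag(x)                                   # no window
hann = dft_mag([xi * wi for xi, wi in zip(x, w)])   # windowed

# sum the magnitude in bins far from the true frequency (~bin 10.5);
# this is the wide-band "edge noise" spread across the spectrum
far_bins = list(range(0, 5)) + list(range(20, N // 2))
leak_rect = sum(rect[k] for k in far_bins)
leak_hann = sum(hann[k] for k in far_bins)
print(leak_hann < leak_rect)  # True: the window suppresses the leakage
```

Without the window, energy from the two frame edges spreads across the whole spectrum even though the original data contains only one frequency; with the window, the far-off bins drop by orders of magnitude.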
-Jeff
> --- On Wed, 9/23/09, Jeff Brower wrote:
>
> From: Jeff Brower
> Subject: Re: [audiodsp] overlapping in speech analysis
> To: "Duc Nguyen Anh"
> Cc: "Abhishek Ballaney" , a...
> Date: Wednesday, September 23, 2009, 1:15 PM
>
> Duc-
>
>> Thank you very much for your explanation. I totally agree with you
>> that we have to segment a long observed signal into
>> segments, but I am just wondering why this goes with overlapping;
>> in fact, this is more like overlap-save than
>> overlap-add.
>
> Overlap-add and overlap-save have to do with accounting for FFT periodicity
> when applied to segmented or non-periodic data, for example when performing
> convolution or correlation in the frequency domain. In such cases, perfect
> reconstruction of the segmented data is required, so overlap-add or
> overlap-save is applied to the time domain data *after* the inverse FFT.
>
> Overlap *prior* to FFT, done in the time domain and used in STFT analysis (as
> one example), is more basic. Normally overlap is combined with a time domain
> window (Hamming, Hanning, Blackman, etc.) to avoid "edge noise"; i.e. noise
> effects due to arbitrarily segmenting continuous time domain data (like speech
> or other audio). Typically a combination like 50% overlap and Hanning window is
> used... this eliminates wide-band noise due to segmentation, while still
> allowing each time domain sample to "contribute equally" to the final STFT
> result (i.e. compensate for window weighting). The tradeoff is some loss in
> frequency domain precision.
>
>> I guess this is for better time resolution in STFT
>> (the issue of time resolution vs. frequency resolution in
>> STFT analysis), but I need more details, especially the
>> relation between the segment length, the overlap length,
>> and the pitch of the speech. Could you give me
>> some details?
>
> Typically segment length is decided based on the nature of the data. For
> example, speech is considered quasi-stationary for about 15 msec, so Abhishek
> might use a 128-pt frame size if his sampling rate is 8 kHz. That's just one
> example -- Abhishek has to decide based on his system parameters.
> These pages have additional details:
>
> http://www.statemaster.com/encyclopedia/Short_time-Fourier-transform
>
> http://en.wikipedia.org/wiki/Window_function
>
> http://en.wikipedia.org/wiki/Spectrogram
>
> -Jeff
>
Reply by Duc Nguyen Anh ● September 25, 2009
Dear Jeff Brower,
Thank you very much for your prompt reply.
Overlap *prior* to FFT, done in the time domain and used in STFT analysis (as
one example), is more basic. Normally overlap is combined with a time domain
window (Hamming, Hanning, Blackman, etc.) to avoid "edge noise"; i.e. noise
effects due to arbitrarily segmenting continuous time domain data (like speech
or other audio). Typically a combination like 50% overlap and Hanning window is
used... this eliminates wide-band noise due to segmentation, while still
allowing each time domain sample to "contribute equally" to the final STFT
result (i.e. compensate for window weighting). The tradeoff is some loss in
frequency domain precision.
I am still a bit unclear about the purpose of overlapping here. Could you
please explain more, or give me some references with more information on "edge
noise", "wide-band noise", and the effect of overlapping in analyzing signals?
Thank you in advance.
Nice to hear from you again,
Best Regards,
Duc, Nguyen Anh
_____________________________________
Reply by Jeff Brower ● September 24, 2009
Duc-
> Thank you very much for your explanation. I totally agree with you
> that we have to segment a long observed signal into
> segments, but I am just wondering why this goes with overlapping;
> in fact, this is more like overlap-save than
> overlap-add.
Overlap-add and overlap-save have to do with accounting for FFT periodicity
when applied to segmented or non-periodic data, for example when performing
convolution or correlation in the frequency domain. In such cases, perfect
reconstruction of the segmented data is required, so overlap-add or
overlap-save is applied to the time domain data *after* the inverse FFT.
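A minimal time-domain sketch of the overlap-add idea, to make the "perfect reconstruction" point concrete (in practice each block would be convolved via FFT/inverse FFT; here direct convolution keeps the sketch short, and the signal, filter, and block size are arbitrary illustrative values):

```python
def conv(x, h):
    # direct linear convolution of the whole signal
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def overlap_add(x, h, block):
    # filter x in segments of `block` samples; each segment's
    # convolution tail overlaps (adds into) the next segment's output
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = conv(x[start:start + block], h)
        for j, v in enumerate(seg):
            y[start + j] += v
    return y

x = [1.0, 2.0, -1.0, 3.0, 0.5, -2.0, 1.5, 4.0]
h = [0.5, 0.25, 0.125]
print(overlap_add(x, h, 3) == conv(x, h))  # True: block result matches
```

Segment-by-segment filtering with the tails added back in reproduces the full convolution exactly, which is why overlap-add belongs *after* the inverse FFT rather than being an analysis-window trick.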
Overlap *prior* to FFT, done in the time domain and used in STFT analysis (as
one example), is more basic. Normally overlap is combined with a time domain
window (Hamming, Hanning, Blackman, etc.) to avoid "edge noise"; i.e. noise
effects due to arbitrarily segmenting continuous time domain data (like speech
or other audio). Typically a combination like 50% overlap and Hanning window is
used... this eliminates wide-band noise due to segmentation, while still
allowing each time domain sample to "contribute equally" to the final STFT
result (i.e. compensate for window weighting). The tradeoff is some loss in
frequency domain precision.
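The "contribute equally" point can be checked directly: Hanning windows hopped by half the frame length sum to a constant weight at every sample. A small sketch (the frame size is an arbitrary choice; note it is the periodic form of the window that sums exactly flat):

```python
import math

N = 8          # frame size (small, for illustration)
hop = N // 2   # 50% overlap
# periodic Hanning window (the symmetric variant does not sum exactly flat)
hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

# add up the window weight each sample receives across overlapping frames
total = [0.0] * (4 * N)
for start in range(0, len(total) - N + 1, hop):
    for n in range(N):
        total[start + n] += hann[n]

# away from the very first and last half-frame, every sample gets weight 1.0
middle = total[N:-N]
print(all(abs(t - 1.0) < 1e-9 for t in middle))  # True
```

So every sample inside the analyzed region carries the same total weight into the STFT, which is what compensates for the window taper.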
> I guess this is for better time resolution in STFT
> (the issue of time resolution vs. frequency resolution in
> STFT analysis), but I need more details, especially the
> relation between the segment length, the overlap length,
> and the pitch of the speech. Could you give me
> some details?
Typically segment length is decided based on the nature of the data. For
example, speech is considered quasi-stationary for about 15 msec, so Abhishek
might use a 128-pt frame size if his sampling rate is 8 kHz. That's just one
example -- Abhishek has to decide based on his system parameters.
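Working through those numbers (the power-of-two rounding is an assumption added here, chosen because FFT implementations usually prefer power-of-two sizes):

```python
SAMPLE_RATE = 8000        # Hz, as in the example above
QUASI_STATIONARY_MS = 15  # speech is roughly stationary over ~15 msec

samples = SAMPLE_RATE * QUASI_STATIONARY_MS // 1000  # 120 samples in 15 ms
frame = 1
while frame < samples:    # round up to a power of two for the FFT
    frame *= 2
hop = frame // 2          # hop size for 50% overlap

print(samples, frame, hop)  # 120 samples -> 128-pt frame, 64-sample hop
```

That gives a 16 ms frame and an 8 ms hop, i.e. a new spectrum every 8 ms while each frame still stays within the quasi-stationary span.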
Reply by Duc Nguyen Anh ● September 23, 2009
Hi Abhishek Ballaney,
Thank you very much for your explanation. I totally agree with you that we have
to segment a long observed signal into segments, but I am just wondering why
this goes with overlapping; in fact, this is more like overlap-save than
overlap-add. I guess this is for better time resolution in STFT (the issue of
time resolution vs. frequency resolution in STFT analysis), but I need more
details, especially the relation between the segment length, the overlap
length, and the pitch of the speech. Could you give me some details?
I am very happy to discuss and get help from you,
Best Wishes,
Duc, Nguyen Anh
--- On Tue, 9/22/09, Abhishek Ballaney wrote:
From: Abhishek Ballaney
Subject: Re: [audiodsp] overlapping in speech analysis
To: d...@yahoo.com
Date: Tuesday, September 22, 2009, 1:14 AM
Dear Duc,
The overlap-add method is used to break long signals into smaller
segments for easier processing. There are many DSP applications where a long
signal must be filtered in segments. With high data rate signals like video or
hi-fi digital audio, it is common
for computers to have insufficient memory to simultaneously hold the
entire signal to be processed. There are also systems that process
segment-by-segment because they operate in real time.
Regards,
abhishek
_____________________________________
Reply by ducn...@yahoo.com ● September 17, 2009
Hi, I am a newcomer in speech processing. I have a little question - why do we
often have to split a long speech signal into shorter blocks with an overlap?
For example - first block from 0 to 20 ms (along the time axis), second block
from 10 ms to 30 ms ... and so on, with an overlap of 10 ms between 2
consecutive blocks. I just want to know the effect of this overlap.
And one more question: we often use the DFT (FFT) to analyze the speech signal
with the assumption that signals are periodic, but if we partition a signal
like above (with or without overlapping), this assumption becomes false (we
assume the long original signal is periodic, so the blocks are not periodic in
correlation with the original one, so we cannot use the DFT or FFT for these
blocks). So can we still maintain/get properties (spectrum) of signals from
properties of the shorter blocks? Or do we do this with some assumptions that I
do not know?
Thank you for reading and explaining,
I am looking forward to hearing from you soon,