Hi everyone!

Working on a signal processing problem, I read a little deeper into the work of Portnoff, Crochiere and others about the short-time Fourier transform. Well, I can see their points about analysis windows etc., but the reconstruction window stuff is something that I still don't fully get. Can someone give me some illuminating hints? (please :-)

What I do is quite simple: take a frame from some infinitely long input signal in time domain, apply an analysis window, zero-pad it at the end of each frame, analyse it by discrete STFT, apply some filter in the frequency domain, inverse DFT, overlap-add, output the filtered signal. Easy, hmmm? Well, there are still some questions I simply can't answer to my own satisfaction:

(1) I generally understand the reconstruction window as an average in time for each single frequency bin of my DFT. Right or wrong?

(2) If I use some window function for analysis which adds up to perfect reconstruction when using the appropriate overlap (e.g. Hann window with 50% overlap), do I need a reconstruction window at all?

(3) Also, if I don't want to average in time (since my analysis window is already doing this more than required due to the very long required frame length), do I just use a rectangular window with length = 1 frame as reconstruction window (i.e. in the end forget about the reconstruction window stuff...)?

(4) How do I add up other windows, e.g. in the case of a Hamming window, for perfect reconstruction? I can only see the reconstruction window in that case as some "inverting" of the analysis window, to get a "one" when multiplying the windows for analysis and reconstruction. But that gives quite large errors on the boundaries of the window in time, where the values are close to zero...

(5) Doing some filtering in the frequency domain will extend my input frame due to convolution effects, which I take care of by padding some zeros to the end of the frame.
In some theoretical case, when filtering just some single frequencies from the DFT, the inverse DFT gives me: the filtered version of my input values (wanted) + some convolution extension of the frame to the right (unavoidable and treated by overlap-add) + an infinite signal, containing the filtered sinusoids (at least this is what I get from MATLAB simulation). Do I just cut that last part off? How do I know where the end of the convolution effects is if I don't know the exact filter length?

If anyone has really read this far and could give me some ideas on how to find an answer to these questions, I would be very happy.

Thanks in advance
m.
question about reconstruction windows with STFT-synthesis
Started by ●March 27, 2006
Reply by ●March 28, 2006
mono.kultur wrote:
> Hi everyone!
> ..
> What I do is quite simple: take a frame from some infinitely long input
> signal in time domain, apply an analysis window, zero-pad it at the end
> of each frame, analyse it by discrete STFT, apply some filter in the
> frequency domain, inverse DFT, add-overlap, output the filtered signal.
> Easy, hmmm?

Perhaps, but you didn't get it right. You must differentiate between two fundamentally different actions:

1. Spectrum estimation by
   - Welch's method (averaging several windowed DFTed segments) or
   - spectrograms.
2. Frequency domain FIR filtering using overlap-add block processing.

If you are interested in 2., then you may not apply windowing in time-domain. There is no point to it (as you are not interested in analysis but filtering), and it messes up your filtering with periodic modulation of the filter kernel.

Only when you really know what you are doing may you combine steps 1. and 2. together. In that case, it is best to look for windows with sparse DFT coefficients. You will be interested in a classic paper [1] that describes such windows.

Regards,
Andor

[1] A. H. Nuttall: "Some Windows with Very Good Sidelobe Behavior", IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol ASSP-29, No. 1, Feb. 1981
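Editorial sketch of case 2 above (frequency-domain FIR filtering by overlap-add block processing), assuming NumPy. The function name `overlap_add_fir` and the block length are illustrative choices, not something given in the thread. Note that no time window is applied: each block is zero-padded so that one FFT can hold the block's complete linear convolution, and the tails are added into the next block.

```python
import numpy as np

def overlap_add_fir(x, h, block_len=256):
    """FIR-filter x with h by FFT block processing (overlap-add), no time window."""
    m = len(h)
    # FFT size must hold block_len + m - 1 samples to avoid circular wrap-around
    n_fft = 1
    while n_fft < block_len + m - 1:
        n_fft *= 2
    H = np.fft.rfft(h, n_fft)
    y = np.zeros(len(x) + m - 1)
    for start in range(0, len(x), block_len):
        block = x[start:start + block_len]
        seg = np.fft.irfft(np.fft.rfft(block, n_fft) * H, n_fft)
        n_out = len(block) + m - 1          # length of this block's linear convolution
        y[start:start + n_out] += seg[:n_out]   # tail overlaps into the next block
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
h = rng.standard_normal(31)
y = overlap_add_fir(x, h)
```

Under these assumptions the result should agree with direct convolution (`np.convolve(x, h)`) up to rounding, which is what "glitch-free" means here.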
Reply by ●March 28, 2006
Hi Andor,

Thank you very much for your posting, I'm going to get that paper as well and try to figure that out. The problem is, since I have extremely large amplitude dynamics in the frequency domain (up to 50dB from one spectral line to the neighbouring bin), I need to time-window my signal to get the desired dynamics in the Fourier transformed signal (otherwise, when using rectangular windows, I just get 13dB dynamics to the next spectral bin). I try to "find" a signal in the spectral domain which is corrupted by single spectral lines that are ~50dB larger in magnitude.

Also, I still don't get the other stuff above... Can anyone help?

Thank you again,
m.
Reply by ●March 28, 2006
mono.kultur wrote:
...
> The problem is, since I have extremely
> large amplitude dynamics in the frequency domain (up to 50dB from one
> spectral line to the neighbouring bin), I need to time-window my signal
> to get the desired dynamics in the Fourier transformed signal
> (otherwise, when using rectangular windows, I just get 13dB dynamics to
> the next spectral bin). I try to "find" a signal in the spectral domain
> which is corrupted by single spectral lines that are ~50dB larger in
> magnitude.

You have to differentiate between searching for the signal in the spectrum (which is the analysis part) and extracting the signal (the filtering part). For the analysis, you may use a window. Not so for the filtering.

> Also, I still don't get the other stuff above...

Which stuff?

Regards,
Andor
Reply by ●March 28, 2006
Hi Andor,

thanks again for your answer.

> You have to differentiate between searching for the signal in the
> spectrum (which is the analysis part) and extracting the signal (the
> filtering part).

Ok, assuming I have broadband signal 1, corrupted by signal 2 which is +50dB in some frequency bins and zero in the rest. Do you mean that I should have an analysis part of the system (using time windowing), where I identify my signal 2; and a filter part (without time windows), where I use the information from the analysis to separate signal 1 and signal 2?

>> Also, I still don't get the other stuff above...
> Which stuff?

The questions formulated under (1) to (4) about reconstruction of a frequency-domain-filtered signal, especially about the meaning of the reconstruction window and the estimation of the length of a frequency-domain filter.

Sincerely,
m.
Reply by ●March 28, 2006
stereo wrote:
> Hi Andor,
>
> thanks again for your answer.
>
> > You have to differentiate between searching for the signal in the
> > spectrum (which is the analysis part) and extracting the signal (the
> > filtering part).
>
> Ok, assuming I have broadband signal 1, corrupted by signal 2 which is
> +50dB in some frequency bins and zero in the rest. Do you mean that I
> should have an analysis part of the system (using time windowing),
> where I identify my signal 2; and a filter part (without time windows),
> where I use the information from the analysis to separate signal 1 and
> signal 2?

Yes! If you do it correctly (with the windows described in that paper) you can do the analysis and the filtering with just one FFT operation per frame - you simply apply the window in frequency domain via convolution. That is why you need windows with sparse DFT coefficients, to have small convolution kernels.

> > >> Also, I still don't get the other stuff above...
> > Which stuff?
>
> The questions formulated under (1) to (4) about reconstruction of a
> frequency-domain-filtered signal, especially about the meaning of the
> reconstruction window and the estimation of the length of a
> frequency-domain filter.

Overlap-add, and even better overlap-save, is well described in many textbooks, for example http://www.dspguide.com

Regards,
Andor
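Editorial sketch of "applying the window in frequency domain via convolution", assuming NumPy and, as the simplest example, a periodic Hann window rather than one of the Nuttall windows from the cited paper. The periodic Hann has only three non-zero DFT coefficients (0.5, -0.25, -0.25), so windowing in time is equivalent to a 3-tap circular convolution across bins. Variable names are illustrative.

```python
import numpy as np

N = 64
rng = np.random.default_rng(1)
x = rng.standard_normal(N)

n = np.arange(N)
hann = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)   # periodic Hann window

# One FFT of the unwindowed frame (this spectrum is what the filter would act on)
X = np.fft.fft(x)

# Windowed spectrum for analysis, obtained without a second FFT:
# X_w[k] = 0.5*X[k] - 0.25*X[k-1] - 0.25*X[k+1]  (indices modulo N)
X_windowed = 0.5 * X - 0.25 * np.roll(X, 1) - 0.25 * np.roll(X, -1)

# Reference: window in time, then FFT
X_reference = np.fft.fft(hann * x)
```

The two spectra should match to rounding error. Windows with more cosine terms would need correspondingly longer kernels, which is presumably why sparse DFT coefficients are recommended above.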
Reply by ●March 29, 2006
Andor wrote:
> stereo wrote:
> ...
> > The questions formulated under (1) to (4) about reconstruction of a
> > frequency-domain-filtered signal, especially about the meaning of the
> > reconstruction window and the estimation of the length of a
> > frequency-domain filter.
>
> Overlap-add, and even better overlap-save, is well described in many
> textbooks, for example

Andor, you might already know this but this response is confusing. "overlap-add" to refer to an algorithm has two somewhat unrelated meanings.

when "overlap-add" is used together with "overlap-save", the context is "fast convolution" using the FFT to do fast circular convolution and the overlap-add or overlap-save operations are means to cope with this circular tool and somehow press it into use performing linear convolution. and that is described in textbooks. the "windowing" is rectangular (or none at all) and the overlap-add is about overlap adding the tails resulting from the circular convolution of a zero-stuffed input from adjacent frames.

when "overlap-add" is used in the context of STFT or Portnoff, it is about a frame-by-frame, windowing, analysis, processing/resynthesis, possibly a second windowing (the reconstruction window), and overlap adding the rise of the current frame to the fall of the previous frame.

now, i suppose one can use this STFT to do simple filtering in the frequency domain, but when compared to the prudent FIR filtering or the "fast convolution" previously mentioned, i suspect there are framing and windowing artifacts. it won't be perfectly clean. usually we do this STFT overlap-add stuff when we want to pitch-shift or time-scale complex material and such an operation has a little alchemy in it and won't be perfectly clean either.

stereo, to get a meaningful answer, you might need to work with us a little to frame your question in such a way that we know what it is you're asking about specifically.

r b-j
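Editorial sketch of the STFT sense of overlap-add, where the rise of one frame's window is added to the fall of the previous one. It assumes NumPy, 50% overlap, and the periodic form of the windows (the symmetric, endpoint-inclusive forms do not sum to an exact constant at a hop of N/2). The helper `overlap_sum` is a hypothetical name.

```python
import numpy as np

def overlap_sum(window, hop, n_frames=8):
    """Sum of shifted copies of a window, trimmed to the fully overlapped region."""
    total = np.zeros(hop * (n_frames - 1) + len(window))
    for i in range(n_frames):
        total[i * hop:i * hop + len(window)] += window
    # Drop the ends, where fewer frames overlap
    return total[len(window):-len(window)]

N = 512
n = np.arange(N)
hann = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)        # periodic Hann
hamming = 0.54 - 0.46 * np.cos(2 * np.pi * n / N)   # periodic Hamming

hann_sum = overlap_sum(hann, N // 2)
hamming_sum = overlap_sum(hamming, N // 2)
```

With these definitions the Hann sum is 1 everywhere in the interior. The periodic Hamming sum is also constant, but at 1.08 rather than 1, so it only adds to one after rescaling by 1/1.08.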
Reply by ●March 30, 2006
Hi everyone,

thank you, and special thanks to r b-j for your answer, differentiating between overlap-add with fast convolution and with STFT signal processing. I'll try to improve my style of problem formulation to give you a better idea of my current problem.

What I want to do is the second case in r b-j's posting: using the STFT to analyse a signal, then manipulate it in the frequency domain and resynthesize the processed signal to get a "filtered output". I don't want to pitch-shift it or so, only manipulation of the spectral content and output at the same frame rate.

The input signals have, as described earlier, quite large dynamics in the magnitude of the spectral bins, therefore I use some windowing in time domain to analyse it properly (analysis window). Reading through the STFT papers I still don't get a really good idea what the reconstruction window is for. I have the idea that it is used (1) to undo the window effect and (2) to add some averaging in time for every spectral bin. Any comments on whether this is correct or not?

I wonder if I can have perfect reconstruction using windows like Hamming. The only idea I have is to "invert" the window effect, which in my opinion gives large errors at the frame boundaries where the window value is very small. I'll try to illustrate this: imagine x(n), windowed by h(n), giving s(n). The values of s(1) and s(end) are very small, since h(1) and h(end) are very small. When trying to undo the windowing I need to multiply s(n) by f(n)=1/h(n). f(1) and f(end) are very large. Now suppose some error on s(n), due to noise, roundoff error, convolution effects from filtering in the frequency domain etc. This error, which might be of the magnitude of the signal, is then multiplied by some large coefficient, ending up in a complete mess at the frame boundaries.

Finally, I try to figure out if there is a way to estimate the length of a filter from its frequency domain description.
Imagine a magnitude curve, describing the filter function in the frequency domain. Can I somehow estimate the required length of the filter in time domain to realise that filter function?

I hope that I was able to define my questions a little more comprehensibly... any comments are appreciated very much.

Thanks in advance
stereo
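Editorial sketch of the analyse / modify / resynthesize loop described in this post, assuming NumPy and one common arrangement that avoids the 1/h(n) inversion altogether: the same square-root periodic Hann window is used for analysis and for reconstruction, so their product is a Hann window that overlap-adds to one at 50% overlap. This is an illustration under those assumptions, not necessarily the scheme of the Portnoff or Crochiere papers; `stft_process` and `modify` are hypothetical names.

```python
import numpy as np

def stft_process(x, modify, frame_len=512):
    """Window, FFT, modify spectrum, inverse FFT, window again, overlap-add."""
    hop = frame_len // 2
    n = np.arange(frame_len)
    # sqrt of periodic Hann: analysis window * reconstruction window = Hann
    win = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len))
    y = np.zeros(len(x))
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * win           # analysis window
        spec = modify(np.fft.rfft(frame))                  # spectral manipulation
        out = np.fft.irfft(spec, frame_len) * win          # reconstruction window
        y[start:start + frame_len] += out                  # overlap-add
    return y

rng = np.random.default_rng(2)
x = rng.standard_normal(4096)
y = stft_process(x, lambda spec: spec)   # identity modification
```

With the identity modification the interior of `y` should reproduce `x` to rounding error, and nothing is ever divided by a near-zero window value. Once `modify` actually changes the spectrum, the framing artifacts mentioned in the replies can appear.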
Reply by ●March 30, 2006
in article 1143713068.225542.152720@v46g2000cwv.googlegroups.com, stereo at leben.in.stereo@googlemail.com wrote on 03/30/2006 05:04:

> Hi everyone,
>
> thank you, and special thanks to r b-j for your answer,
> differentiating between ov-add with fast convolution and with STFT
> signal processing. I'll try to improve my style of problem formulation
> to give you a better idea of my current problem.
>
> What I want to do is the second case in r b-j's posting: Using the STFT
> to analyse a signal, then manipulate it in the Frequency domain and
> resynthesize the processed signal to get a "filtered output". I don't
> want to pitch-shift it or so, only manipulation of the spectral content
> and output at the same frame rate.

okay, nonetheless, even though you want to only filter the signal by adjusting the amplitude of different frequency components, if you seek to accomplish that using the same STFT method used to do more exotic processing like pitch-shifting, time-scaling, vocoding, etc, you still run the risk of some kind of small artifacts that will possibly be faintly heard at the frame rate.

> The input signals have, as described earlier, quite large dynamics in
> the magnitude of the spectral bins, therefore I use some windowing in
> time domain to analyse it properly (analysis window).
> Reading through the STFT papers I still don't get a real good idea what
> the reconstruction window is for.

the analysis window is used to window off a segment of the input signal, and then to analyze it with some tool, usually a DFT/FFT. the issues in choosing this window have to do with what sort of artifacts one is willing to deal with in the frequency-domain data after the FFT. these artifacts would be the width of the "main lobe" and the appearance of "side lobes" in the resulting spectrum due to a single sinusoid (or "frequency component"). the side lobes of one frequency component will overlap (and add to) the main lobe of another frequency component.
if the latter frequency component is much weaker than the former, the former's side lobe interference will possibly completely obscure the weaker main lobe. when the main lobe is wide, that causes some ambiguity about exactly what the true frequency is of the sinusoidal component associated with it.

to narrow the width of the main lobe, the analysis window can be made much wider than the frame size. that is an essential difference from the reconstruction window (if there is one). even though the analysis window is centered with the frame, the width of it may be completely independent of the frame width. also, the analysis window can be chosen to reduce the side lobes. e.g. a gaussian window

    w(t) = 1/T * exp(-pi*(t/T)^2)

has the nice property that the Fourier Transform of it is another gaussian function:

    W(f) = exp(-pi*(T*f)^2)

so this window has no side lobes, it is all main lobe, getting lower and lower. (because the gaussian function goes on forever and we have to truncate it somewhere to be of practical use, there are teeny-weeny side lobes.)

now the reconstruction window (if there is one) is for a different purpose. after all this analysis, depending on what the algorithm or task is, a bunch of sinusoidal components are generated (this is the so-called "sinusoidal modeling" method). you have a bunch of sinusoids generated for the previous frame and a bunch of sinusoids generated for the current frame. assuming that you've done your best to align the phases of matching sinusoids between the two frames, you want to somehow cross-fade from the previous frame to the current. the fade-out of the previous frame is the trailing half (or falling half) of the previous frame's reconstruction window, the fade-in of the current frame is the leading half (or rising half) of the current frame's reconstruction window. so these reconstruction windows will *always* have a non-zero width of twice the frame hop.
in addition, the falling half of the previous frame's window and the rising half of the current frame's window sum to the number 1 (this is the complementary property of some windows - note the gaussian window does not have that property). usually the Hann window is the choice:

    w(t) = 1/2 * (1 + cos(pi*t/H))   for |t| < H, zero otherwise

where H is the frame hop length. sometimes a triangular window is used:

    w(t) = 1 - |t|/H   for |t| < H, zero otherwise

> I have the idea that it is used (1) to undo the window effect

in the phase vocoder (not the sinusoidal modeling method), the modified spectrum might still contain the effect of the analysis window. depending on what the modification was, say, it was just changing the phase of frequency components, it's likely that when you inverse FFT back to the time domain, there remains the envelope of the original analysis window on the output. if the analysis window was not complementary, then you want to undo it and then apply a complementary window like the Hann or triangular windows above. since both are accomplished by multiplication, you can combine dividing by the analysis window function with multiplying by the reconstruction window in the same action.

> and (2) to add some averaging in time for every spectral bin.

i dunno about that. it's to fade out the spectral component of the previous frame while fading in the corresponding component of the current frame. if the frequency of that spectral component is changing in time, those two spectral components may not be at exactly the same DFT bin.

> Any comments whether this is correct or not?
>
> I wonder if I can have perfect reconstruction, using windows like
> Hamming.

no, not complementary.
a closely related window is the Hann which *is* complementary.

> The only idea I have is to "inverse" the window effect, which
> in my opinion gives large errors at the frame boundaries where the
> window value is very small.

that's another good reason that the analysis window should be wider than the reconstruction window.

> I'll try to illustrate this: Imagine x(n),
> windowed by h(n) giving s(n). The value for s(1) and s(end) are very
> small, since h(1) and h(end) are very small. When trying to undo the
> windowing I need to multiply s(n) by f(n)=1/h(n). f(1) and f(end) are
> very large. Now suppose some error on s(n), due to noise,
> roundoff-error, convolution effects from filtering in Frequency domain
> etc. This error, which might be in the magnitude of the signal, is then
> multiplied by some large coefficient, ending up in a complete mess at
> the frame boundaries.

you have an idea what's going on, but your h[n] appears to me to be an analysis window. that's a good reason that the analysis window should be wider than the reconstruction window. you won't be dividing by nearly zero if the analysis window is wider than the reconstruction window.

> Finally, I try to figure out if there is a way to estimate the length
> of a filter from its frequency domain description. Imagine a magnitude
> curve, describing the filter function in the frequency domain. Can I
> somehow estimate the required length of the filter in time domain to
> realise that filter function?

in a computer program (MATLAB, whatever), draw out your frequency response in an array where the delta_f between the DFT bins (Fs/N) is pretty small. inverse DFT that and look at your impulse response. it can mathematically be as long as N, which you made a much larger number than you expect the length to be. now apply a window (Hamming would be okay, but Kaiser would be better) to that impulse response and you get a shorter, slightly different impulse response.
then DFT back to the frequency domain and see if your resulting spectrum is close enough to what you started with to be tolerable. if it is, the filter length is the length of that windowed impulse response.

now, for straight filtering, you should use the "fast convolution" method and either overlap-add or overlap-save. if you use that general STFT, analysis window, DFT, modify spectrum, inverse DFT, reconstruction window, overlap-add method, if the frequency response you use to modify the spectrum is a frequency response that corresponds to an impulse response that is very long, you will get small artifacts in the framing and reconstruction of the data. but if your impulse response is of a known length, an FIR, you can use another version of overlap-add (or overlap-save) that is used in so-called "fast convolution" and get completely glitch-free filtering and reconstruction.

> I hope that I was able to define my questions a little more
> comprehensible...any comments are appreciated very much.

try to get a good book like O&S to help you with the "fast convolution" method. the STFT overlap-add method is different, and i'm not so sure where that is in the textbooks. it *is* in published lit about the phase-vocoder (the Portnoff stuff, etc.), but i am not sure of what textbook has it.

--
r b-j
rbj@audioimagination.com
"Imagination is more important than knowledge."
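Editorial sketch of the length-estimation procedure described in this reply, assuming NumPy. The target response (a low-pass with a linear transition band between 0.10 and 0.15 cycles/sample), the Kaiser `beta`, and the trial lengths are all hypothetical choices made for illustration. The steps are: sample the desired response on a dense grid, inverse DFT, window the impulse response down to a trial length, DFT back, and compare.

```python
import numpy as np

n_fft = 4096                                   # dense grid, much longer than the expected filter
freqs = np.fft.rfftfreq(n_fft)                 # 0 .. 0.5 cycles/sample
# Hypothetical desired magnitude: 1 below 0.10, 0 above 0.15, linear in between
desired = np.clip((0.15 - freqs) / 0.05, 0.0, 1.0)

# Zero-phase impulse response; roll so that its peak sits in the middle of the array
h_long = np.roll(np.fft.irfft(desired, n_fft), n_fft // 2)

def truncated_error(length, beta=8.0):
    """Max deviation from `desired` after Kaiser-windowing h_long to `length` taps (odd)."""
    half = length // 2
    centre = n_fft // 2
    h = h_long[centre - half:centre + half + 1] * np.kaiser(2 * half + 1, beta)
    # Put the centre tap back at index 0 so the spectrum is real (zero phase)
    padded = np.zeros(n_fft)
    padded[:half + 1] = h[half:]
    padded[-half:] = h[:half]
    achieved = np.fft.rfft(padded).real
    return np.max(np.abs(achieved - desired))

errors = {length: truncated_error(length) for length in (31, 63, 127, 255)}
```

One way to use this is to pick the shortest trial length whose error is tolerable for the application. That threshold is a judgment call, which matches the "close enough to be tolerable" criterion in the reply.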
Reply by ●March 31, 2006
Hi r b-j,

first of all, thank you very much for your extensive and understandable explanations. Very helpful indeed.

The concepts described under "analysis window" are fully understood, and are the reason why I need to apply a window other than rectangular to my input signal at all - simply to get a chance to detect my tiny signal next to some huge peaks in the frequency domain.

> to narrow the width of the main lobe, the analysis window can be made to be
> much wider than the frame size, that is an essential difference from the
> reconstruction window (if there is one). even though the analysis window is
> centered with the frame, the width of it may be completely independent of
> the frame width.

I guess you are talking about zero-padding on both sides of the windowed part of the input signal, therefore making the analysis window effectively something like [some_zeros-Hamming-some_zeros]?

Thanks for the explanation of reconstruction windows with the sinusoidal modelling method - I think I get the idea behind it.

> ...your h[n] appears to me to be an
> analysis window.
> that's a good reason that the analysis window should be
> wider than the reconstruction window. you won't be dividing by nearly zero
> if the analysis window is wider than the reconstruction window.

Yes, h(n) is my analysis window. After reading your posting this morning I again had a look at some of my papers and I think I have somehow sorted out the reconstruction-window things now. Sometimes you just need to let things rest for some days before picking them up again... and some people to share questions and ideas with (yes, thank YOU! :-).

>> I wonder if I can have perfect reconstruction, using windows like
>> Hamming.
> no, not complementary. a closely related window is the Hann which *is*
> complementary.

Thanks for a clear answer. Can't find that anywhere else :-)

About the filter length, what you describe seems to me like the windowed sampling method.
What I was thinking about was something like: if you have some frequency response, is there a method to *estimate* the "relevant" length of the impulse response? I can only find the method you described, which is fine but also has some uncertainties with it (length N of the IFFT etc.).

> try to get a good book like O&S...

Is that Oppenheim & Schafer? Right next to my right elbow :-) For the Portnoff stuff et al. I had a look at the relevant conference paper.

Well, there is something else I had in my original posting, but have not sorted out properly so far: imagine an experiment with MATLAB, where I filter some of the frequencies by simply manipulating the magnitude value in the frequency domain (by the way, why is the term "magnitude" used here, and "amplitude" in another case?). Doing any filtering in the frequency domain will extend my input frame due to convolution effects, which I take care of by padding some zeros to the end of the frame. In the case when I just filter, let's say, three very narrow frequency sections from the DFT, the inverse DFT of my MATLAB experiment gives me:

- the filtered version of my input values (wanted), plus
- some convolution extension of the frame to the right (unavoidable, and treated by overlap-add), plus
- an infinite signal, containing the filtered sinusoids.

My problem is in understanding that last part, where it comes from and why it extends "to infinity", as it looks like. Do I just cut that last part off? How do I know where the end of the convolution effects is if I don't know the exact filter length of my arbitrarily constructed filter?

Ok, thanks again... and a very nice weekend!

Cheers!
stereo
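Editorial sketch of one plausible reading of the "infinite signal" described above, assuming NumPy and a hypothetical filter that zeroes three isolated bins. Such a gain curve corresponds to an impulse response as long as the DFT itself: a unit impulse minus three full-length cosines that never decay. Multiplying spectra is circular convolution, so the result fills the entire zero-padded region and is periodic with the DFT length. The bin indices and sizes are illustrative.

```python
import numpy as np

n_fft = 1024
frame_len = 512

# Hypothetical filter: unity gain everywhere except three isolated bins set to zero
gain = np.ones(n_fft // 2 + 1)
gain[[40, 100, 200]] = 0.0

# Implied impulse response: a unit impulse minus three cosines spanning all n_fft samples
h = np.fft.irfft(gain, n_fft)
h_mid_peak = np.max(np.abs(h[n_fft // 4: 3 * n_fft // 4]))   # no decay far from n = 0

# Zero-padded input frame, filtered by multiplying its spectrum with the gain
rng = np.random.default_rng(3)
frame = np.zeros(n_fft)
frame[:frame_len] = rng.standard_normal(frame_len)
out = np.fft.irfft(np.fft.rfft(frame) * gain, n_fft)

# Energy that ends up where the input was zero: the "sinusoids that go on forever"
tail_energy = np.sum(out[frame_len:] ** 2)
```

Under this reading there is no point after which the convolution effects end, because the implied filter is as long as the transform. That is the situation the windowed-impulse-response procedure in the earlier reply is meant to avoid: shortening the impulse response to a known length makes the zero padding sufficient.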






