Forums

FFT upsampling filter

Started by blueskull May 16, 2015
On Sat, 16 May 2015 15:57:53 -0500, "Cedron" <103185@DSPRelated>
wrote:

>[...snip...]
>
>>
>>Wait, what?  I think the only real assumption is that one needs to
>>know what they're doing.  Whether the FFT output is folded or not, or
>>whether the input is real or complex, or all the other things that
>>need to be handled, need to be handled.
>
>No argument there.  It always helps to know what you are doing.
>
>>
>>I don't think what you're assuming is assumed is really assumed.
>>
>
>[...snip...]
>
>>Don't even assume that something is assumed.
>>
>>Or so I assume.   ;)
>>
>>>Ced
>>>---------------------------------------
>>>Posted through http://www.DSPRelated.com
>>
>>Eric Jacobsen
>>Anchor Hill Communications
>>http://www.anchorhill.com
>
>I guess I'll have to assume that 'assume' is the wrong word.  Maybe if I
>had said 'acts as if' it would have been clearer.
>
>Let me illustrate my point with a simple concrete example.  Suppose the
>signal is
>
>S_k = cos( k 2pi/16 )   0 <= k < 16
>
>N=16 and M=100.  When you take the FFT, bin number one will be one half
>and bin 15 will be one half.  All other bins will be zero.  Now if you
>end pad and take the inverse DFT the output signal will be
>
>S_k = 1/2 cos( k 2pi/100 ) + 1/2 cos( 15k 2pi/100 )   0 <= k < 100
>
>The latter being the alias frequency within the original context, but
>renderable within the output context.
>
>If you slide the one half up from bin 15 to bin 99 before you do the
>inverse DFT, you will get the desired results:
>
>S_k = 1/2 cos( k 2pi/100 ) + 1/2 cos( 99k 2pi/100 )
>S_k = 1/2 cos( k 2pi/100 ) + 1/2 cos( (100-1)k 2pi/100 )
>S_k = 1/2 cos( k 2pi/100 ) + 1/2 cos( -k 2pi/100 )
>S_k = 1/2 cos( k 2pi/100 ) + 1/2 cos( k 2pi/100 )
>S_k = cos( k 2pi/100 )
>
>See what I mean?
>
>Ced
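Ced's bin arithmetic above checks out numerically. Here is an illustrative numpy sketch (not part of the original post) using the 1/N-forward, unit-inverse convention he describes:

```python
import numpy as np

N, M = 16, 100
k = np.arange(N)
s = np.cos(2 * np.pi * k / N)

X = np.fft.fft(s) / N            # 1/N-normalized DFT: bins 1 and 15 are 1/2

# End-padding leaves the 1/2 at bin 15 -> alias at 15 cycles per 100 samples
Y_naive = np.zeros(M, dtype=complex)
Y_naive[:N] = X
s_naive = (np.fft.ifft(Y_naive) * M).real   # unit-normalized inverse DFT

# Sliding the 1/2 from bin 15 up to bin 99 recovers the pure cosine
Y = np.zeros(M, dtype=complex)
Y[1] = X[1]
Y[M - 1] = X[15]
s_up = (np.fft.ifft(Y) * M).real

kk = np.arange(M)
print(np.allclose(s_up, np.cos(2 * np.pi * kk / M)))   # True
```

The end-padded result matches Ced's two-cosine formula, and moving the conjugate bin gives the single cosine, exactly as derived above.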
So you are assuming the FFT output is not folded.  That's fine.  What
you describe is the proper way to zero-pad a non-folded FFT output,
i.e., stuff the zeros in the middle around the Nyquist frequency.

If the FFT output were folded, so that DC (or zero frequency, if you
prefer) was in the middle, then you'd pad the ends.

None of this corrects anything in the post to which you were replying,
specifically the procedure:

1. FFT
2. Zero pad to desired output length.
3. Inverse FFT at the longer vector length.

That procedure is correct as written.
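For a general non-folded FFT output, "stuff the zeros in the middle around Nyquist" can be sketched as below. This is illustrative code, not from the thread; the function name and the M/N amplitude rescaling are my own, and it assumes a real input with M >= N:

```python
import numpy as np

def fft_upsample(x, M):
    """Upsample a length-N real signal x to length M by zero-padding
    the (non-folded) FFT in the middle, around the Nyquist frequency."""
    N = len(x)
    X = np.fft.fft(x)
    h = (N + 1) // 2                  # index of first negative-frequency bin
    Y = np.zeros(M, dtype=complex)
    Y[:h] = X[:h]                     # DC and positive frequencies stay put
    Y[M - (N - h):] = X[h:]           # negative frequencies slide to the top
    if N % 2 == 0:                    # even N: split the Nyquist bin in half
        Y[N // 2] = X[N // 2] / 2
        Y[M - N // 2] = X[N // 2] / 2
    return np.fft.ifft(Y).real * (M / N)   # rescale so amplitudes match
```

With the N=16 cosine from the example above, `fft_upsample(x, 100)` returns cos(2 pi k/100) to machine precision.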
>---------------------------------------
>Posted through http://www.DSPRelated.com
Eric Jacobsen
Anchor Hill Communications
http://www.anchorhill.com
>On Sat, 16 May 2015 15:57:53 -0500, "Cedron" <103185@DSPRelated>
>wrote:
>
>>[...snip...]
>>
>>>
>>>Wait, what?  I think the only real assumption is that one needs to
>>>know what they're doing.  Whether the FFT output is folded or not,
[...snip...]
>So you are assuming the FFT output is not folded.  That's fine.
>What you describe is the proper way to zero-pad a non-folded FFT
>output, i.e., stuff the zeros in the middle around the Nyquist
>frequency.
>
>If the FFT output were folded, so that DC (or zero frequency, if you
>prefer) was in the middle, then you'd pad the ends.
>
>None of this corrects anything in the post to which you were replying,
>specifically the procedure:
>
>1. FFT
>2. Zero pad to desired output length.
>3. Inverse FFT at the longer vector length.
>
>That procedure is correct as written.
>
Given your clarifications, the procedure as written is correct, although
also given the varying possible conventions, it is ambiguous.

To make sure I am on solid footing making this reply, I went to the
foremost authority on the net: Wikipedia.

"Assuming periodicity (see Periodicity), the customary domain of k
actually computed is [0, N-1]. That is always the case when the DFT is
implemented via the Fast Fourier transform algorithm. But other common
domains are [-N/2, N/2-1] (N even) and [-(N-1)/2, (N-1)/2] (N odd), as
when the left and right halves of an FFT output sequence are swapped."

They never use the terms folded or not folded.  The "customary domain"
is not folded, and you are correct, that is what I had assumed.  Note
also, they say the output of an FFT is always not folded.

In either case, how the zeroes should be stuffed in the DFT has been
sufficiently explained.

The other convention left unstated is the values of the normalization
factors.  Again from the flawless authority of Wikipedia (please detect
the sarcasm):

"The normalization factor multiplying the DFT and IDFT (here 1 and 1/N)
and the signs of the exponents are merely conventions, and differ in
some treatments. The only requirements of these conventions are that the
DFT and IDFT have opposite-sign exponents and that the product of their
normalization factors be 1/N. A normalization of sqrt(1/N) for both the
DFT and IDFT, for instance, makes the transforms unitary."

In order for your procedure to work, the normalization factors have to
be 1/N and 1.  To me, these are the best ones.  The reason for this is
that the coefficients turn out to be comparable no matter how many
samples you select for a given frame.  The second choice for me would be
the square root versions.  These make the most sense when you are
considering the DFT matrix as a vector basis and your signal as a
vector.  Having an orthonormal basis is preferable to an orthogonal
basis.
The convention that Wikipedia uses, 1 and 1/N, makes the least sense to
me.

The other big issue which is left unstated in your procedure, which is
of huge importance in the original problem, is how to combine multiple
frames.  The OP wishes to minimize the number of calculations, so
treating the entire signal with one huge DFT is not going to be the best
answer.

The problem comes in with the extrapolation of the reconstructed signal
past the last input point.  The reconstructed signal is going to lead to
a value of S_M = S_0.  Unless the first signal value of the next frame
happens to equal the first signal value of the current frame, this will
introduce a discontinuity, and a poor fit in the neighborhood.
Previously, I suggested an overlap and sliding average to solve this.

I did a little research on "polyphase filter interpolation".  I disagree
with the OP that there wasn't much out there.  I found plenty.  You can
also search under "resampling rate".

The OP said he wanted to minimize calculations and was upsampling to a
much higher rate.  Another approach he might try is to build two sets of
Legendre polynomial basis vectors.  One with four elements, and the
other at four times the upsampling ratio.  Then use a four point sliding
window on the input signal, calculate the Legendre coefficients with the
first basis set, and construct the section between the two middle points
with the coefficients and the second basis set.

A similar, if not identical, approach would be to use cubic splines.

Ced
---------------------------------------
Posted through http://www.DSPRelated.com
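The sliding-window Legendre idea can be sketched roughly as below. This is illustrative numpy, not from the thread; the helper name and node placement are assumptions. It fits an exact cubic (expressed in the Legendre basis) through each 4-point window and renders only the middle interval, with the interval's right endpoint supplied by the next window:

```python
import numpy as np
from numpy.polynomial import legendre as leg

def legendre_upsample(x, r):
    """Render r output points per input sample between the two middle
    points of each sliding 4-point window.  (Hypothetical helper.)"""
    t4 = np.array([-1.0, -1.0 / 3.0, 1.0 / 3.0, 1.0])   # 4 input nodes
    tm = np.linspace(-1.0, 1.0, r + 1) / 3.0            # middle interval
    V = leg.legvander(t4, 3)                            # 4x4 Legendre basis
    out = []
    for i in range(len(x) - 3):
        c = np.linalg.solve(V, x[i:i + 4])   # exact cubic through 4 points
        out.append(leg.legval(tm[:-1], c))   # drop endpoint: next window has it
    return np.concatenate(out)
```

Because the fit is an exact cubic, any input sampled from a polynomial of degree three or less is reproduced exactly at the upsampled points; e.g., `legendre_upsample(np.arange(6.0)**2, 2)` returns k^2 at k = 1, 1.5, 2, ..., 3.5.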
>much higher rate.  Another approach he might try is to build two sets of
>Legendre polynomial basis vectors.  One with four elements, and the other
>at four times the upsampling ratio.  Then use a four
Another oops on my part.  Nobody is perfect, right?  There are three
intervals within the four points, so the output basis should have three
times the upsampling ratio plus one.  You want to match on the
endpoints.

Ced
---------------------------------------
Posted through http://www.DSPRelated.com
Cedron <103185@dsprelated> wrote:

(snip)
> Given your clarifications, the procedure as written is correct, although
> also given the varying possible conventions, it is ambiguous.
> To make sure I am on solid footing making this reply, I went to the
> foremost authority on the net: Wikipedia.
> "Assuming periodicity (see Periodicity), the customary domain of k
> actually computed is [0, N-1]. That is always the case when the DFT is
> implemented via the Fast Fourier transform algorithm. But other common
> domains are [-N/2, N/2-1] (N even) and [-(N-1)/2, (N-1)/2] (N odd),
> as when the left and right halves of an FFT output sequence are swapped."
(snip)
> The other convention left unstated are the values of the normalization
> factors.  Again from the flawless authority of Wikipedia (please detect
> the sarcasm):
> "The normalization factor multiplying the DFT and IDFT (here 1 and 1/N)
> and the signs of the exponents are merely conventions, and differ in some
> treatments. The only requirements of these conventions are that the DFT
> and IDFT have opposite-sign exponents and that the product of their
> normalization factors be 1/N. A normalization of sqrt(1/N) for both the
> DFT and IDFT, for instance, makes the transforms unitary."
In floating point, this is just convention. In fixed point, it can make a big difference in the results.
> In order for your procedure to work, the normalization factors have to be
> 1/N and 1.  To me, these are the best ones.  The reason for this is that
> the coefficients turn out to be comparable no matter how many samples you
> select for a given frame.  The second choice for me would be the square
> root versions.  These make the most sense when you are considering the DFT
> matrix as a vector basis and your signal as a vector.  Having an
> orthonormal basis is preferable to an orthogonal basis.  The convention
> that Wikipedia uses, 1 and 1/N, makes the least sense to me.
With fixed point, and a normalization of 1, the resulting values can be
up to N times the input values, so need log2(N) more bits to store.  If
you have enough bits, that might be the best way.  Then test for the
largest resulting coefficient before any additional normalization.

-- glen
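glen's bit-growth point is easy to illustrate: with a forward normalization of 1, a DC input of all ones concentrates all N samples into one bin. (Illustrative numpy, with floating point standing in for a fixed-point accumulator.)

```python
import numpy as np

N = 1024
x = np.ones(N)                   # worst-case input for the DC bin
X = np.fft.fft(x)                # forward normalization factor of 1
peak = np.abs(X).max()           # grows to N times the input amplitude
extra_bits = int(np.ceil(np.log2(peak)))
print(peak, extra_bits)          # 1024.0 10
```

So a 1024-point transform needs log2(1024) = 10 extra integer bits of headroom if no per-stage scaling is applied.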
On Sun, 17 May 2015 09:18:59 -0500, "Cedron" <103185@DSPRelated>
wrote:

>>On Sat, 16 May 2015 15:57:53 -0500, "Cedron" <103185@DSPRelated>
>>wrote:
>>
>>>[...snip...]
>>>
>>>>
>>>>Wait, what?  I think the only real assumption is that one needs to
>>>>know what they're doing.  Whether the FFT output is folded or not,
>[...snip...]
>
>>So you are assuming the FFT output is not folded.  That's fine.
>>What you describe is the proper way to zero-pad a non-folded FFT
>>output, i.e., stuff the zeros in the middle around the Nyquist
>>frequency.
>>
>>If the FFT output were folded, so that DC (or zero frequency, if you
>>prefer) was in the middle, then you'd pad the ends.
>>
>>None of this corrects anything in the post to which you were replying,
>>specifically the procedure:
>>
>>1. FFT
>>2. Zero pad to desired output length.
>>3. Inverse FFT at the longer vector length.
>>
>>That procedure is correct as written.
>>
>
>Given your clarifications, the procedure as written is correct, although
>also given the varying possible conventions, it is ambiguous.
>
>To make sure I am on solid footing making this reply, I went to the
>foremost authority on the net: Wikipedia.
Of course. ;)
>"Assuming periodicity (see Periodicity), the customary domain of k
>actually computed is [0, N-1]. That is always the case when the DFT is
>implemented via the Fast Fourier transform algorithm. But other common
>domains are [-N/2, N/2-1] (N even) and [-(N-1)/2, (N-1)/2] (N odd),
>as when the left and right halves of an FFT output sequence are swapped."
>
>They never use the terms folded or not folded.  The "customary domain" is
>not folded, and you are correct, that is what I had assumed.  Note also,
>they say the output of an FFT is always not folded.
>
>In either case, how the zeroes should be stuffed in the DFT has been
>sufficiently explained.
>
>The other convention left unstated are the values of the normalization
>factors.  Again from the flawless authority of Wikipedia (please detect
>the sarcasm):
>
>"The normalization factor multiplying the DFT and IDFT (here 1 and 1/N)
>and the signs of the exponents are merely conventions, and differ in some
>treatments. The only requirements of these conventions are that the DFT
>and IDFT have opposite-sign exponents and that the product of their
>normalization factors be 1/N. A normalization of sqrt(1/N) for both the
>DFT and IDFT, for instance, makes the transforms unitary."
>
>In order for your procedure to work, the normalization factors have to be
>1/N and 1.  To me, these are the best ones.  The reason for this is that
>the coefficients turn out to be comparable no matter how many samples you
>select for a given frame.  The second choice for me would be the square
>root versions.  These make the most sense when you are considering the DFT
>matrix as a vector basis and your signal as a vector.  Having an
>orthonormal basis is preferable to an orthogonal basis.  The convention
>that Wikipedia uses, 1 and 1/N, makes the least sense to me.
>
>The other big issue which is left unstated in your procedure, which is of
>huge importance in the original problem, is how to combine multiple
>frames.  The OP wishes to minimize the number of calculations, so treating
>the entire signal with one huge DFT is not going to be the best answer.
It's not "my" procedure. It's just common treatment of practical zero-padding that has been around since the process was invented or discovered or whatever you want to call it. I learned it decades ago from papers and books and other people who'd been doing it for a long time.
>The problem comes in with the extrapolation of the reconstructed signal
>past the last input point.  The reconstructed signal is going to lead to a
>value of S_M = S_0.  Unless the first signal value of the next frame
>happens to equal the first signal value of the current frame, this will
>introduce a discontinuity, and a poor fit in the neighborhood.
>Previously, I suggested an overlap and sliding average to solve this.
>
>I did a little research on "polyphase filter interpolation".  I disagree
>with the OP that there wasn't much out there.  I found plenty.  You can
>also search under "resampling rate".
Or "multirate filter", which seems to be the popular term these days. As you mentioned, this can include Legendre interpolators and things like Farrow filters. There are lots of ways to solve the problem, but since the OP was asking a general question it didn't make sense to me to go into any detail until there was more clarification or follow-up questions. That doesn't prevent this sort of digression, though, which is fun.
>The OP said he wanted to minimize calculations and was upsampling to a
>much higher rate.  Another approach he might try is to build two sets of
>Legendre polynomial basis vectors.  One with four elements, and the other
>at four times the upsampling ratio.  Then use a four point sliding window
>on the input signal, calculate the Legendre coefficients with the first
>basis set, and construct the section between the two middle points with
>the coefficients and the second basis set.
As has been discussed here before, doing very (e.g., VERY) large FFTs and DFTs offline is practical with the computing power and memory available these days. The OP hasn't come back to clarify whether he's interested in real-time or off-line computation, so his preference isn't clear to me.
>A similar, if not identical, approach would be to use cubic splines.
My personal experience with spline interpolation is that it pretty much sucks. Maybe I was working in an application where it isn't suitable, but I've just always had good success with other methods since I tried that.
>Ced
>---------------------------------------
>Posted through http://www.DSPRelated.com
Eric Jacobsen
Anchor Hill Communications
http://www.anchorhill.com
On 5/16/15 4:15 PM, Eric Jacobsen wrote:
>
> I'm not sure what you're trying to describe, unless, perhaps, you're
> assuming that the FFT output is not folded?
just to be clear about semantics, by "folded" do you mean what MATLAB (and Octave) call "fftshift()"?
> In that case, then, yes,
> pad in the middle.  Otherwise, pad at the ends.  Don't assume.
> Don't even assume that something is assumed.
>
sounds very Rumsfeldian.
> Or so I assume.  ;)
>
--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
[...snip...]

>
>It's not "my" procedure.  It's just common treatment of practical
>zero-padding that has been around since the process was invented or
>discovered or whatever you want to call it.  I learned it decades ago
>from papers and books and other people who'd been doing it for a long
>time.
Sorry, I didn't mean to give you credit for inventing/discovering it.

[...snip...]
>
>As has been discussed here before, doing very (e.g., VERY) large FFTs
>and DFTs offline is practical with the computing power and memory
>available these days.  The OP hasn't come back to clarify whether he's
>interested in real-time or off-line computation, so his preference
>isn't clear to me.
It's absolutely insane how much computing power and capacity have
increased since I started programming.  (Teletype to an IBM 360, mid
70's)

The OP wrote: "If so, this will reduce the calculation complexity by a
whole LOT."  Upon rereading, it is unclear to me too whether this is a
quality or quantity measure.  My initial impression was that he wanted
to achieve real-time.

[...snip...]
>
>Eric Jacobsen
>Anchor Hill Communications
>http://www.anchorhill.com
I don't really have anything else to add.

Ced
---------------------------------------
Posted through http://www.DSPRelated.com
One thing to watch out for: if you use a huge FFT (big enough to
transform an entire song, for example), impose a near-brick-wall filter
in the frequency domain, and then transform back to the time domain, you
will have created a filter whose impulse response is as long as the
entire song.  Since your ear is doing some sort of sliding transform
with a time window in the range of 10 ms-40 ms, such a long impulse
response could be quite audible.  (It depends on a lot of factors; for
example, if the passband ripple is low enough then maybe all the ringing
is at the filter cutoff, which may be ultrasonic.)
You can of course get around this by imposing a more gentle roll off.
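Bob's warning is easy to see numerically: the impulse response of a brick-wall mask imposed on a song-sized FFT is a periodic-sinc that decays only like 1/t, so its ringing spans the whole transform. (Illustrative numpy sketch; the sizes are arbitrary.)

```python
import numpy as np

N = 65536                        # "song-sized" FFT
H = np.zeros(N)
H[:N // 8] = 1.0                 # near-brick-wall lowpass, non-folded layout
H[-(N // 8) + 1:] = 1.0          # matching negative-frequency bins
h = np.fft.ifft(H).real          # impulse response of the imposed filter

# samples in the middle half of the buffer, far from t = 0
tail = np.abs(h[N // 4 : 3 * N // 4])
print(tail.max() > 1e-6 * np.abs(h).max())   # True: ringing never dies out
```

A gentler roll-off (tapering H instead of stepping it) concentrates the impulse response near t = 0, which is Bob's suggested fix.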


Bob
On Sun, 17 May 2015 15:00:25 -0400, robert bristow-johnson
<rbj@audioimagination.com> wrote:

>On 5/16/15 4:15 PM, Eric Jacobsen wrote:
>>
>> I'm not sure what you're trying to describe, unless, perhaps, you're
>> assuming that the FFT output is not folded?
>
>just to be clear about semantics, by "folded" do you mean what MATLAB
>(and Octave) call "fftshift()"?
Yup.  It's essentially just swapping the halves of the output vector.
We used to do this in hardware by inverting the MSB of the address when
the vector was read out.  If you do that inversion with an exclusive-OR
gate, it is programmable and costs only the exor gate in added
complexity.
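The address-MSB trick is exactly fftshift for a power-of-two length, which a short numpy sketch (not from the thread) confirms:

```python
import numpy as np

N = 16                       # power-of-two vector length
msb = N >> 1                 # the address MSB
x = np.arange(N)             # stand-in for the FFT output vector
addr = np.arange(N) ^ msb    # invert the MSB of each read address (XOR)

# reading through the XORed addresses swaps the halves of the vector
print(np.array_equal(x[addr], np.fft.fftshift(x)))   # True
```

So one exclusive-OR gate per read address reproduces the folded (DC-in-the-middle) ordering with no data movement at all.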
>> In that case, then, yes,
>> pad in the middle.  Otherwise, pad at the ends.  Don't assume.
>> Don't even assume that something is assumed.
>>
>
>sounds very Rumsfeldian.
>> Or so I assume.  ;)
>>
>
>-->
>r b-j                  rbj@audioimagination.com
>
>"Imagination is more important than knowledge."
Eric Jacobsen
Anchor Hill Communications
http://www.anchorhill.com
On 5/17/15 7:27 PM, Eric Jacobsen wrote:
> On Sun, 17 May 2015 15:00:25 -0400, robert bristow-johnson
> <rbj@audioimagination.com> wrote:
>
>> On 5/16/15 4:15 PM, Eric Jacobsen wrote:
>>>
>>> I'm not sure what you're trying to describe, unless, perhaps, you're
>>> assuming that the FFT output is not folded?
>>
>> just to be clear about semantics, by "folded" do you mean what MATLAB
>> (and Octave) call "fftshift()"?
>
> Yup.  It's essentially just swapping the halves of the output vector.
> We used to do this in hardware by inverting the MSB of the address bit
> when the vector was read out.  If you do that inversion with an
> exclusive-OR gate, it is programmable and costs only the exor gate in
> added complexity.
#define harp_mode 1

if you do it in software, it takes a whole instruction.  but then,
unlike your hardware case, the indices are wrong (by 2^MSB).

this is another decisive reason why math products like MATLAB should
allow for user-definable index origin.  we should be able to define an
x[n] with positive and negative time, pass that to an FFT and get an
X[k], also with positive and negative indices (this time in frequency).
i mean, i have (after 2 decades) given up on MathWorks to do what's
right, but it is still a pain in arse to be juggling these index values.

i also like zero-padding my loaded-in audio buffers on both left and
right side.  i do that so that my very first frame (or the very last
frame) need not have different code than all of the other frames (that
have nice neighbors).  but then i have to mess with this "zeroPad"
offset all the time.  having to always do that is just inexcusable.

there's this C++ audio framework library called JUCE
( https://en.wikipedia.org/wiki/JUCE ) and i had a few email
conversations with the owner (Jules somebody, i think he's German)
where i suggested that he extend (in a backward compatible manner) a
class called AudioSampleBuffer.

there were two members he was missing.  one was a float called
SampleRate, which would get set by whatever utility reads in an audio
sample buffer from a file or a stream or an audio port A/D or whatever.
we would have to read that Fs value separately and pass it all around
along with the AudioSampleBuffer, and it just should have been built in
to it.  the AudioSampleBuffer should have all of the information it
needs to physically describe it.  if you send it to a sound playback
utility, you shouldn't also have to send Fs so that it knows how to
play it back.  if you send it to a biquad filtering utility (where you
specify parameters, like resonant frequency in Hz), you shouldn't have
to send it Fs for it to use in coefficient calculation.
the other modification i wanted Jules to put in was a built-in ZeroPad.
if ZeroPad=0, then the AudioSampleBuffer would be just like before.  if
you go negative on the sample indices, you get crap.  but if ZeroPad>0
then you could go negative on x[n] as far as x[-ZeroPad].  the 0th
sample would remain the same (which would give you backward
compatibility).  we could have a utility that changes ZeroPad (and
might move the audio samples internally to accommodate an increased
ZeroPad).

then you could pass audio around to analysis routines including
frame-based analysis routines and not have to worry about pushing up
against the wall either at x[0] or x[N-1] (N is the buffer length not
counting the zero pad).  and the padding wouldn't *have* to have zeros.
it *could* have valid samples from adjacent frames if you're splitting
frames up into multiple AudioSampleBuffers.  the point is you could
just plop down the center of the frame window at x[n] and not worry
about what is to the left or right of that sample.  then your code
would look the same for *any* frame and it would look the same as it
would processing frames coming from a real-time stream of audio.

but the whole idea is so that our code is most natural and explanatory.
we don't (or shouldn't) have to do our own zero-padding (and accounting
for it in our indices), we don't (or shouldn't) have to fuck with
fftshift(), we don't (or shouldn't) have to fuck with DC having a
frequency of 1.  we shouldn't have to fuck with that shit.

what'sa matter with these people?

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."