# FFT upsampling filter

Started May 16, 2015
```On Sat, 16 May 2015 15:57:53 -0500, "Cedron" <103185@DSPRelated>
wrote:

>[...snip...]
>
>>
>>Wait, what?    I think the only real assumption is that one needs to
>>know what they're doing.   Whether the FFT output is folded or not, or
>>whether the input is real or complex, or all the other things that
>>need to be handled, need to be handled.
>
>No argument there.  It always helps to know what you are doing.
>
>>
>>I don't think what you're assuming is assumed is really assumed.
>>
>
>
>[...snip...]
>>Don't even assume that something is assumed.
>>
>>Or so I assume.  ;)
>>
>>>Ced
>>>---------------------------------------
>>>Posted through http://www.DSPRelated.com
>>
>>Eric Jacobsen
>>Anchor Hill Communications
>>http://www.anchorhill.com
>
>I guess I'll have to assume that 'assume' is the wrong word.  Maybe if I
>had said 'acts as if' it would have been clearer.
>
>Let me illustrate my point with a simple concrete example.  Suppose the
>signal is
>
>S_k = cos( k 2pi/16 )   0 <= k < 16
>
>N=16 and M=100.  When you take the FFT, bin number one will be one half
>and bin 15 will be one half.  All other bins will be zero.  Now if you
>pad at the end and take the inverse DFT, the output signal will be
>
>S_k = 1/2 cos( k 2pi/100 ) + 1/2 cos( 15k 2pi/100 )
>0 <= k < 100
>
>The latter being the alias frequency within the original context, but
>renderable within the output context.
>
>If you slide the one half up from bin 15 to bin 99 before you do the
>inverse DFT, you will get the desired results:
>
>S_k = 1/2 cos( k 2pi/100 ) + 1/2 cos( 99k 2pi/100 )
>S_k = 1/2 cos( k 2pi/100 ) + 1/2 cos( (100-1)k 2pi/100 )
>S_k = 1/2 cos( k 2pi/100 ) + 1/2 cos( -k 2pi/100 )
>S_k = 1/2 cos( k 2pi/100 ) + 1/2 cos( k 2pi/100 )
>S_k = cos( k 2pi/100 )
>
>See what I mean?
>
>Ced

So you are assuming the FFT output is not folded.   That's fine.
What you describe is the proper way to zero-pad a non-folded
FFT output, i.e., stuff the zeros in the middle around the Nyquist
frequency.

If the FFT output were folded, so that DC (or zero frequency, if you
prefer) was in the middle, then you'd pad the ends.

None of this corrects anything in the post to which you were replying,
specifically the procedure:

1. FFT
2. Zero pad to desired output length.
3. Inverse FFT at the longer vector length.

That procedure is correct as written.

>---------------------------------------
>Posted through http://www.DSPRelated.com

Eric Jacobsen
Anchor Hill Communications
http://www.anchorhill.com
```
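Cedron's 16-point cosine example is easy to check numerically. The sketch below (NumPy; a minimal sketch, with the scale factor M/N compensating for the unnormalized FFT convention) stuffs the zeros in the middle, around the Nyquist frequency, exactly as described:

```python
import numpy as np

N, M = 16, 100
x = np.cos(2 * np.pi * np.arange(N) / N)   # one cycle over 16 samples

X = np.fft.fft(x)                          # unnormalized: bins 1 and 15 hold N/2 = 8

# Stuff the zeros in the middle, around the Nyquist frequency:
Y = np.zeros(M, dtype=complex)
Y[:N // 2] = X[:N // 2]                    # DC and positive frequencies
Y[M - N // 2:] = X[N // 2:]                # negative frequencies (bin 15 -> bin 99)
# (X[N//2], the Nyquist bin, is zero for this signal; in general it
#  must be split half-and-half between Y[N//2] and Y[M - N//2].)

y = np.fft.ifft(Y).real * (M / N)          # rescale for the longer length

# y reproduces cos(k 2pi/100), matching the worked derivation above
err = np.max(np.abs(y - np.cos(2 * np.pi * np.arange(M) / M)))
```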
```>On Sat, 16 May 2015 15:57:53 -0500, "Cedron" <103185@DSPRelated>
>wrote:
>
>>[...snip...]
>>
>>>
>>>Wait, what?    I think the only real assumption is that one needs to
>>>know what they're doing.   Whether the FFT output is folded or not,
[...snip...]

>So you are assuming the FFT output is not folded.   That's fine.
>What you describe is the proper way to zero-pad a non-folded
>FFT output, i.e., stuff the zeros in the middle around the Nyquist
>frequency.
>
>If the FFT output were folded, so that DC (or zero frequency, if you
>prefer) was in the middle, then you'd pad the ends.
>
>None of this corrects anything in the post to which you were replying,
>specifically the procedure:
>
>1. FFT
>2. Zero pad to desired output length.
>3. Inverse FFT at the longer vector length.
>
>That procedure is correct as written.
>

Given your clarifications, the procedure as written is correct, although
also given the varying possible conventions, it is ambiguous.

To make sure I am on solid footing making this reply, I went to the
foremost authority on the net: Wikipedia.

"Assuming periodicity (see Periodicity), the customary domain of k
actually computed is [0, N-1]. That is always the case when the DFT is
implemented via the Fast Fourier transform algorithm. But other common
domains are  [-N/2, N/2-1]  (N even)  and  [-(N-1)/2, (N-1)/2]  (N odd),
as when the left and right halves of an FFT output sequence are swapped."

They never use the terms folded or not folded.  The "customary domain" is
not folded, and you are correct, that is what I had assumed.  Note also,
they say the output of an FFT is always not folded.

In either case, how the zeroes should be stuffed in the DFT has been
sufficiently explained.

The other convention left unstated is the values of the normalization
factors.  Again from the flawless authority of Wikipedia (please detect
the sarcasm):

"The normalization factor multiplying the DFT and IDFT (here 1 and 1/N)
and the signs of the exponents are merely conventions, and differ in some
treatments. The only requirements of these conventions are that the DFT
and IDFT have opposite-sign exponents and that the product of their
normalization factors be 1/N.  A normalization of sqrt(1/N) for both the
DFT and IDFT, for instance, makes the transforms unitary."

In order for your procedure to work, the normalization factors have to be
1/N and 1.  To me, these are the best ones.  The reason for this is that
the coefficients turn out to be comparable no matter how many samples you
select for a given frame.  The second choice for me would be the square
root versions.  These make the most sense when you are considering the DFT
matrix as a vector basis and your signal as a vector.  Having an
orthonormal basis is preferable to an orthogonal basis.  The convention
that Wikipedia uses, 1 and 1/N, makes the least sense to me.

The other big issue left unstated in your procedure, and one of huge
importance in the original problem, is how to combine multiple frames.
The OP wishes to minimize the number of calculations, so treating the
entire signal with one huge DFT is not going to be the best answer.

The problem comes in with the extrapolation of the reconstructed signal
past the last input point.  The reconstructed signal is going to lead to a
value of S_M = S_0.  Unless the first signal value of the next frame
happens to equal the first signal value of the current frame, this will
introduce a discontinuity, and a poor fit in the neighborhood.
Previously, I suggested an overlap and sliding average to solve this.

I did a little research on "polyphase filter interpolation".  I disagree
with the OP that there wasn't much out there.  I found plenty.  You can
also search under "resampling rate".

The OP said he wanted to minimize calculations and was upsampling to a
much higher rate.  Another approach he might try is to build two sets of
Legendre polynomial basis vectors.  One with four elements, and the other
at four times the upsampling ratio.  Then use a four point sliding window
on the input signal, calculate the Legendre coefficients with the first
basis set, and construct the section between the two middle points with
the coefficients and the second basis set.

A similar, if not identical, approach would be to use cubic splines.

Ced
---------------------------------------
Posted through http://www.DSPRelated.com
```
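The frame-boundary problem Ced raises can be seen in a quick experiment: FFT-interpolate a ramp (a frame whose last value is far from its first) and the periodic reconstruction still passes through every input sample, but rings badly near the edges where it is forced back toward S_0. A minimal NumPy sketch (the 16/64 lengths are arbitrary):

```python
import numpy as np

N, M = 16, 64                              # integer ratio M/N = 4
x = np.arange(N, dtype=float)              # ramp: S_0 = 0, last sample = 15

X = np.fft.fft(x)
Y = np.zeros(M, dtype=complex)
Y[:N // 2] = X[:N // 2]
Y[N // 2] = X[N // 2] / 2                  # split the Nyquist bin...
Y[M - N // 2] = X[N // 2] / 2              # ...half-and-half
Y[M - N // 2 + 1:] = X[N // 2 + 1:]
y = np.fft.ifft(Y).real * (M / N)

# The interpolant still passes through every original sample...
hit = np.max(np.abs(y[::4] - x))
# ...but the periodic extension forces the curve back toward S_0 = 0
# past the last point, so the fit near the frame edges is poor:
edge_err = np.max(np.abs(y - np.arange(M) / 4.0))
```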
```>much higher rate.  Another approach he might try is to build two sets of
>Legendre polynomial basis vectors.  One with four elements, and the
>other
>at four times the upsampling ratio.  Then use a four

Another oops on my part.  Nobody is perfect, right?

There are three intervals within the four points, so the output basis
should have three times the upsampling ratio plus one points.  You want
to match on the endpoints.

Ced
---------------------------------------
Posted through http://www.DSPRelated.com
```
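Ced's corrected recipe (fit the four window samples, keep only the middle interval, match on the endpoints) can be sketched with an ordinary cubic fit, which spans the same space as a four-term Legendre basis. The function name and the use of `np.polyfit` in place of explicit Legendre basis vectors are my own choices, not something from the thread:

```python
import numpy as np

def upsample_middle_interval(x, R):
    """Slide a 4-point window over x; fit a cubic (equivalent to a
    4-term Legendre expansion, since both span all cubics) and keep
    the R new samples on the interval between the two middle points."""
    t = np.arange(4.0)                     # coarse positions 0, 1, 2, 3
    fine = 1.0 + np.arange(R) / R          # [1, 2): R points per interval
    out = []
    for i in range(len(x) - 3):
        c = np.polyfit(t, x[i:i + 4], 3)   # exact through the 4 samples
        out.append(np.polyval(c, fine))
    out.append(np.array([x[-2]]))          # close on the final endpoint
    return np.concatenate(out)
```

Because a cubic through four points is exact, any cubic input is reproduced exactly, and with R output points per interval the output basis indeed needs three times the ratio plus one points, as the correction above says.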
```Cedron <103185@dsprelated> wrote:

(snip)
> Given your clarifications, the procedure as written is correct, although
> also given the varying possible conventions, it is ambiguous.

> To make sure I am on solid footing making this reply, I went to the
> foremost authority on the net: Wikipedia.

> "Assuming periodicity (see Periodicity), the customary domain of k
> actually computed is [0, N-1]. That is always the case when the DFT is
> implemented via the Fast Fourier transform algorithm. But other common
> domains are  [-N/2, N/2-1]  (N even)  and  [-(N-1)/2, (N-1)/2]  (N odd),
> as when the left and right halves of an FFT output sequence are swapped."

(snip)
> The other convention left unstated are the values of the normalization
> factors.  Again from the flawless authority of Wikipedia (please detect
> the sarcasm):

> "The normalization factor multiplying the DFT and IDFT (here 1 and 1/N)
> and the signs of the exponents are merely conventions, and differ in some
> treatments. The only requirements of these conventions are that the DFT
> and IDFT have opposite-sign exponents and that the product of their
> normalization factors be 1/N.  A normalization of sqrt(1/N) for both the
> DFT and IDFT, for instance, makes the transforms unitary."

In floating point, this is just convention. In fixed point, it can
make a big difference in the results.

> In order for your procedure to work, the normalization factors have to be
> 1/N and 1.  To me, these are the best ones.  The reason for this is that
> the coefficients turn out to be comparable no matter how many samples you
> select for a given frame.  The second choice for me would be the square
> root versions.  These make the most sense when you are considering the DFT
> matrix as a vector basis and your signal as a vector.  Having an
> orthonormal basis is preferable to an orthogonal basis.  The convention
> that Wikipedia uses, 1 and 1/N, makes the least sense to me.

With fixed point, and a normalization of 1, the resulting values can
be up to N times the input values, so need log2(N) more bits to store.
If you have enough bits, that might be the best way.  Then test for the
largest resulting coefficient before any additional normalization.

-- glen
```
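glen's point about fixed-point growth is easy to quantify: with a forward normalization of 1, a length-N transform can scale a full-scale input up by a factor of N, i.e. log2(N) extra bits. A two-line illustration of the worst case:

```python
import numpy as np

N = 1024
x = np.ones(N)                 # worst case: all the energy lands in one bin
X = np.fft.fft(x)              # unnormalized (factor 1) forward transform
peak = np.max(np.abs(X))       # DC bin = N -> log2(N) = 10 extra bits needed
```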
```On Sun, 17 May 2015 09:18:59 -0500, "Cedron" <103185@DSPRelated>
wrote:

>>On Sat, 16 May 2015 15:57:53 -0500, "Cedron" <103185@DSPRelated>
>>wrote:
>>
>>>[...snip...]
>>>
>>>>
>>>>Wait, what?    I think the only real assumption is that one needs to
>>>>know what they're doing.   Whether the FFT output is folded or not,
>[...snip...]
>
>>So you are assuming the FFT output is not folded.   That's fine.
>>What you describe is the proper way to zero-pad a non-folded
>>FFToutput, i.e., stuff the zeros in the middle around the Nyquist
>>frequency.
>>
>>If the FFT output were folded, so that DC (or zero frequency, if you
>>prefer) was in the middle, then you'd pad the ends.
>>
>>None of this corrects anything in the post to which you were replying,
>>specifically the procedure:
>>
>>1. FFT
>>2. Zero pad to desired output length.
>>3. Inverse FFT at the longer vector length.
>>
>>That procedure is correct as written.
>>
>
>Given your clarifications, the procedure as written is correct, although
>also given the varying possible conventions, it is ambiguous.
>
>To make sure I am on solid footing making this reply, I went to the
>foremost authority on the net: Wikipedia.

Of course.  ;)

>"Assuming periodicity (see Periodicity), the customary domain of k
>actually computed is [0, N-1]. That is always the case when the DFT is
>implemented via the Fast Fourier transform algorithm. But other common
>domains are  [-N/2, N/2-1]  (N even)  and  [-(N-1)/2, (N-1)/2]  (N odd),
>as when the left and right halves of an FFT output sequence are swapped."
>
>They never use the terms folded or not folded.  The "customary domain" is
>not folded, and you are correct, that is what I had assumed.  Note also,
>they say the output of an FFT is always not folded.
>
>In either case, how the zeroes should be stuffed in the DFT has been
>sufficiently explained.
>
>The other convention left unstated are the values of the normalization
>factors.  Again from the flawless authority of Wikipedia (please detect
>the sarcasm):
>
>"The normalization factor multiplying the DFT and IDFT (here 1 and 1/N)
>and the signs of the exponents are merely conventions, and differ in some
>treatments. The only requirements of these conventions are that the DFT
>and IDFT have opposite-sign exponents and that the product of their
>normalization factors be 1/N.  A normalization of sqrt(1/N) for both the
>DFT and IDFT, for instance, makes the transforms unitary."
>
>In order for your procedure to work, the normalization factors have to be
>1/N and 1.  To me, these are the best ones.  The reason for this is that
>the coefficients turn out to be comparable no matter how many samples you
>select for a given frame.  The second choice for me would be the square
>root versions.  These make the most sense when you are considering the DFT
>matrix as a vector basis and your signal as a vector.  Having an
>orthonormal basis is preferable to an orthogonal basis.  The convention
>that Wikipedia uses, 1 and 1/N, makes the least sense to me.
>
>The other big issue which is left unstated in your procedure, which is of
>huge importance in the original problem, is how to combine multiple
>frames.  The OP wishes to minimize the number of calculations, so treating
>the entire signal with one huge DFT is not going to be the best answer.

It's not "my" procedure.  It's just common treatment of practical
zero-padding that has been around since the process was invented or
discovered or whatever you want to call it.   I learned it decades ago
from papers and books and other people who'd been doing it for a long
time.

>The problem comes in with the extrapolation of the reconstructed signal
>past the last input point.  The reconstructed signal is going to lead to a
>value of S_M = S_0.  Unless the first signal value of the next frame
>happens to equal the first signal value of the current frame, this will
>introduce a discontinuity, and a poor fit in the neighborhood.
>Previously, I suggested an overlap and sliding average to solve this.
>
>I did a little research on "polyphase filter interpolation".  I disagree
>with the OP that there wasn't much out there.  I found plenty.  You can
>also search under "resampling rate".

Or "multirate filter", which seems to be the popular term these days.
As you mentioned, this can include Legendre interpolators and things
like Farrow filters.   There are lots of ways to solve the problem,
but since the OP was asking a general question it didn't make sense to
me to go into any detail until there was more clarification or
follow-up questions.

That doesn't prevent this sort of digression, though, which is fun.

>The OP said he wanted to minimize calculations and was upsampling to a
>much higher rate.  Another approach he might try is to build two sets of
>Legendre polynomial basis vectors.  One with four elements, and the other
>at four times the upsampling ratio.  Then use a four point sliding window
>on the input signal, calculate the Legendre coefficients with the first
>basis set, and construct the section between the two middle points with
>the coefficients and the second basis set.

As has been discussed here before, doing very (e.g., VERY) large FFTs
and DFTs offline is practical with the computing power and memory
available these days.  The OP hasn't come back to clarify whether he's
interested in real-time or off-line computation, so his preference
isn't clear to me.

>A similar, if not identical, approach would be to use cubic splines.

My personal experience with spline interpolation is that it pretty
much sucks.   Maybe I was working in an application where it isn't
suitable, but I've just always had good success with other methods
since I tried that.

>Ced
>---------------------------------------
>Posted through http://www.DSPRelated.com

Eric Jacobsen
Anchor Hill Communications
http://www.anchorhill.com
```
```On 5/16/15 4:15 PM, Eric Jacobsen wrote:
>
> I'm not sure what you're trying to describe, unless, perhaps, you're
> assuming that the FFT output is not folded?

just to be clear about semantics, by "folded" do you mean what MATLAB
(and Octave) call "fftshift()"?

>   In that case, then, yes,
> pad in the middle.   Otherwise, pad at the ends.   Don't assume.
> Don't even assume that something is assumed.
>

sounds very Rumsfeldian.

> Or so I assume.  ;)
>

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

```
```[...snip...]

>
>It's not "my" procedure.  It's just common treatment of practical
>zero-padding that has been around since the process was invented or
>discovered or whatever you want to call it.   I learned it decades ago
>from papers and books and other people who'd been doing it for a long
>time.

Sorry, I didn't mean to give you credit for inventing/discovering it.

[...snip...]
>
>As has been discussed here before, doing very (e.g., VERY) large FFTs
>and DFTs offline is practical with the computing power and memory
>available these days.  The OP hasn't come back to clarify whether he's
>interested in real-time or off-line computation, so his preference
>isn't clear to me.

It's absolutely insane how much computing power and capacity have
increased since I started programming.  (Teletype to an IBM 360, mid
70's)

The OP wrote:

"If so, this will reduce the calculation complexity by a whole LOT."

Upon rereading, it is unclear to me too whether this is a quality or
quantity measure.  My initial impression was that he wanted to achieve
real-time.

[...snip...]

>
>Eric Jacobsen
>Anchor Hill Communications
>http://www.anchorhill.com

I don't really have anything else to add.

Ced
---------------------------------------
Posted through http://www.DSPRelated.com
```
```One thing to watch out for is that if you use a huge fft ( big enough to transform an entire song, for example) and then you impose a near-brick-wall filter in the frequency domain, and then transform back to the time domain, you will have created a filter whose impulse response is as long as the entire song. Since your ear is doing some sort of sliding transform with a time window in the range of 10ms-40ms, such a long impulse response could be quite audible (it depends on a lot of factors, for example if the passband ripple is low enough then maybe all the ringing is at the filter cutoff which may be ultrasonic ).
You can of course get around this by imposing a more gentle roll off.

Bob
```
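Bob's warning can be made concrete: invert a brick-wall frequency mask and the impulse response is a Dirichlet (periodic sinc) whose tails decay only like 1/t, so they are still non-negligible half a record away from t=0. A rough check (the length and cutoff here are arbitrary):

```python
import numpy as np

N = 4096
H = np.zeros(N)
cut = N // 8
H[:cut] = 1.0                  # brick-wall low-pass: passband ...
H[-(cut - 1):] = 1.0           # ... plus its negative frequencies

h = np.fft.ifft(H).real        # impulse response of the mask
# Half a record away from t=0 the response is still on the order of
# 1/N -- the filter rings for the whole record, not just near t=0:
tail = np.max(np.abs(h[N // 2 - 16:N // 2 + 16]))
```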
```On Sun, 17 May 2015 15:00:25 -0400, robert bristow-johnson
<rbj@audioimagination.com> wrote:

>On 5/16/15 4:15 PM, Eric Jacobsen wrote:
>>
>> I'm not sure what you're trying to describe, unless, perhaps, you're
>> assuming that the FFT output is not folded?
>
>just to be clear about semantics, by "folded" do you mean what MATLAB
>(and Octave) call "fftshift()"?

Yup.   It's essentially just swapping the halves of the output vector.
We used to do this in hardware by inverting the MSB of the address
when the vector was read out.   If you do that inversion with an
exclusive-OR gate, it is programmable and costs only the exor gate in

>>   In that case, then, yes,
>> pad in the middle.   Otherwise, pad at the ends.   Don't assume.
>> Don't even assume that something is assumed.
>>
>
>sounds very Rumsfeldian.

>> Or so I assume.  ;)
>>
>
>
>--
>
>r b-j                  rbj@audioimagination.com
>
>"Imagination is more important than knowledge."
>
>

Eric Jacobsen
Anchor Hill Communications
http://www.anchorhill.com
```
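The address-MSB trick Eric describes works because, for a power-of-two length, adding N/2 modulo N and XORing the top address bit are the same operation, so it reproduces fftshift() exactly. A quick demonstration, with NumPy standing in for the hardware address generator:

```python
import numpy as np

N = 16                                    # any power of two
v = np.arange(N)                          # a vector to read out
msb = N >> 1                              # the address MSB

swapped = v[np.arange(N) ^ msb]           # invert the MSB of each read address
# identical to swapping the halves, i.e. MATLAB/Octave fftshift():
same = np.array_equal(swapped, np.fft.fftshift(v))
```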
```On 5/17/15 7:27 PM, Eric Jacobsen wrote:
> On Sun, 17 May 2015 15:00:25 -0400, robert bristow-johnson
> <rbj@audioimagination.com>  wrote:
>
>> On 5/16/15 4:15 PM, Eric Jacobsen wrote:
>>>
>>> I'm not sure what you're trying to describe, unless, perhaps, you're
>>> assuming that the FFT output is not folded?
>>
>> just to be clear about semantics, by "folded" do you mean what MATLAB
>> (and Octave) call "fftshift()"?
>
> Yup.   It's essentially just swapping the halves of the output vector.
> We used to do this in hardware by inverting the MSB of the address bit
> when the vector was read out.   If you do that inversion with an
> exclusive-OR gate, it is programmable and costs only the exor gate in

#define harp_mode 1

if you do it in software, it takes a whole instruction.  but then,
unlike your hardware case, the indices are wrong (by 2^MSB).  this is
another decisive reason why math products like MATLAB should allow for
user-definable index origin.  we should be able to define an x[n] with
positive and negative time, pass that to an FFT and get an X[k], also
with positive and negative indices (this time in frequency).

i mean, i have (after 2 decades) given up on MathWorks to do what's
right, but it is still a pain in the arse to be juggling these index values.
i also like zero-padding my loaded-in audio buffers on both left and
right side.  i do that so that my very first frame (or the very last
frame) need not have different code than all of the other frames (that
have nice neighbors).  but then i have to mess with this "zeroPad"
offset all the time.  having to always do that is just inexcusable.

there's this C++ audio framework library called JUCE (
https://en.wikipedia.org/wiki/JUCE ) and i had a few email conversations
with the owner (Jules somebody, i think he's German) where i suggested
that he extend (in a backward compatible manner) a class called
AudioSampleBuffer.  there were two members he was missing, one was a
float called SampleRate which would get set by whatever utility reads in
an audio sample buffer from a file or a stream or an audio port A/D or
whatever.  we would have to read that Fs value separately and pass it
all around along with the AudioSampleBuffer and it just should have been
built in to it.  the AudioSampleBuffer should have all of the
information it needs to physically describe it.  if you send it to a
sound playback utility, you shouldn't also have to send Fs so that it
knows how to play it back.  if you send it to a biquad filtering utility
(where you specify parameters, like resonant frequency in Hz), you
shouldn't have to send it Fs for it to use in coefficient calculation.

the other modification i wanted Jules to put in was a built-in ZeroPad.
if ZeroPad=0, then the AudioSampleBuffer would be just like before.
if you go negative on the sample indices, you get crap.  but if
ZeroPad>0 then you could go negative on x[n] as far as x[-ZeroPad].
the 0th sample would remain the same (which would give you backward
compatibility).  we could have a utility that changes ZeroPad (and might
move the audio samples internally to accommodate an increased ZeroPad).
then you could pass audio around to analysis routines including
frame-based analysis routines and not have to worry about pushing up
against the wall either at x[0] or x[N-1] (N is the buffer length not
counting the zero pad).  and the padding wouldn't *have* to have zeros.
it *could* have valid samples from adjacent frames if you're splitting
frames up into multiple AudioSampleBuffers.  the point is you could just
plop down the center of the frame window at x[n] and not worry about
what is to the left or right of that sample.  then your code would look
the same for *any* frame and it would look the same as it would
processing frames coming from a real-time stream of audio.

but the whole idea is so that our code is most natural and explanatory.
we don't (or shouldn't) have to do our own zero-padding (and
accounting for it in our indices), we don't (or shouldn't) have to fuck
with fftshift(), we don't (or shouldn't) have to fuck with DC having a
frequency of 1.

we shouldn't have to fuck with that shit.

what'sa matter with these people?

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

```
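r b-j's proposed buffer extension can be sketched in a few lines. The class name `PaddedBuffer` and its members are hypothetical stand-ins for the SampleRate and ZeroPad fields he describes, not actual JUCE API:

```python
class PaddedBuffer:
    """Hypothetical sketch of the buffer described above: it carries its
    own sample rate, and a zero_pad margin makes every index from
    -zero_pad through length-1+zero_pad valid."""

    def __init__(self, samples, sample_rate, zero_pad=0):
        self.sample_rate = sample_rate     # travels with the audio
        self.zero_pad = zero_pad
        self.length = len(samples)         # N, not counting the padding
        self._data = [0.0] * zero_pad + list(samples) + [0.0] * zero_pad

    def __getitem__(self, n):
        # n = 0 is still the first real sample (backward compatible);
        # negative n reaches the left pad instead of being an error
        if not -self.zero_pad <= n < self.length + self.zero_pad:
            raise IndexError(n)
        return self._data[n + self.zero_pad]

# a frame window can now be centered anywhere without edge-case code:
buf = PaddedBuffer([0.5, 0.25, -0.5], 48000.0, zero_pad=2)
frame = [buf[n] for n in range(-2, 5)]
```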