DSPRelated.com
Forums

What is the result of FFT exactly?

Started by Rock Lobster June 22, 2007
Hello,

I'm rather new to DSP stuff, so I've got some little questions.

At the moment, I coded a little audio player which transforms the audio
data to a spectrum using the FFT and then back to sample data before
playing it back.

So far, so good. And the way I understand it, the result of the FFT is
an array of amplitudes for every possible sine wave. But how should
those numbers be interpreted? Are these linear values?

To experiment a little, I multiplied each value by a slowly increasing
factor to create a fade-in (starting from 0.0f and then increasing by
0.005f each cycle). The fade seems to be linear (to my ears at least),
so I assume the amplitudes are linear as well?

There's another thing that's weird to me: as soon as the factor gets
bigger than 1.0f, the song starts clipping immediately. Then again, if I
increase the volume with an audio tool to 120%, there's no clipping (even
though I enabled the option "allow clipping"). Since I increase EVERY
single value of my result array (and of course all of them by multiplying
with the same value), I expect it to come out just as perfect as with the
audio tool, but it won't.

And the third thing that I don't understand is the following: I picked
out a little frequency band and left the values inside it unchanged, but
I changed the rest of the signal to zero. Looking at my sonogram, I'd
expect to see a little colorful stripe while the rest is pitch-black.
The stripe is there in fact, but the rest isn't black; it ranges from
dark purple to dark blue. Why is there any signal when I set all the
amplitudes to zero?

I hope my questions are understandable :)
Thank you in advance

Chris


On 22 Jun, 12:58, "Rock Lobster" <e...@christian-gleinser.de> wrote:
> ...
> So far, so good. And the way I understand, the result of the FFT is an
> array of amplitudes for every possible sine wave.
Not every *possible* sine wave, but close enough... by selecting the
parameters of the FFT one can tune the resolution of the spectrum, i.e.
how many sines are computed inside a given bandwidth. But leave that for
now.
> But how should those numbers be interpreted? Are these linear values?
Eh, yes, inasmuch as the FFT is linear. Or do you mean "dB" versus a
linear scale? What comes out of the FFT is linear. There may be
conversions to a logarithmic dB scale between the FFT and the display,
though.

Rune
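Rune's distinction (the FFT output itself is linear; dB only appears if you convert for display) can be sketched in a few lines of plain Python. This uses a naive DFT rather than a real FFT library, and the 8-point cosine is just an illustrative signal, not anything from the thread:

```python
import cmath, math

def dft(x):
    """Naive DFT; fine for illustrating scale, far too slow for real audio."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N)) for k in range(N)]

N = 8
x = [math.cos(2 * math.pi * n / N) for n in range(N)]  # amplitude-1.0 cosine
X = dft(x)

linear_mag = abs(X[1])                # linear amplitude (N/2 = 4.0 here)
db_mag = 20 * math.log10(linear_mag)  # dB only exists after this conversion

# Linearity: doubling the input doubles the linear magnitude (adds ~6 dB).
X2 = dft([2.0 * s for s in x])
assert abs(abs(X2[1]) - 2.0 * linear_mag) < 1e-9
```

Whether your display looks linear or logarithmic depends entirely on whether a conversion like `db_mag` sits between the FFT and the screen.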
"Rock Lobster" <email@christian-gleinser.de> wrote in message 
news:ncudnRVCq_GmNubbnZ2dnUVZ_ompnZ2d@giganews.com...
...............

> To experiment a little, I multiplied each value with a slowly increasing
> factor to create a fade in (starting from 0.0f and then increasing by
> 0.005f each cycle). The fade seems to be linear (to my ears at least), so
> I assume, the amplitudes are linear as well?
>
> There's another thing that's weird to me: as soon as the factor gets
> bigger than 1.0f, the song starts clipping immediately.
***This isn't clear to me because you don't say if the multiplication is on the time samples or the frequency samples. And, what is "f"?
> Then again, if I increase the volume with an audio tool to 120%, there's
> no clipping (even though I enabled the option "allow clipping"). Since I
> increase EVERY single value of my result array (and of course all of them
> by multiplying with the same value), I expect it to come out just as
> perfect as with the audio tool, but it won't.
***The clipping suggests that you've increased the signal amplitude considerably - but since the process is unclear, one couldn't say why. The 120% scaling suggests that there remains adequate dynamic range to do that without much clipping.
> And the third thing that I don't understand is the following: I picked out
> a little frequency band and left the values inside it unchanged, but I
> changed the rest of the signal to zero. Looking at my sonagram, I'd expect
> to have a little colorful stripe while the rest being pitch-black. The
> stripe is there in fact, but the rest isn't black, it's ranging from dark
> purple to dark blue. Why is there any signal, when I set all the
> amplitudes to zero?
You don't say how you "picked out" the band. If you zeroed it in the
frequency domain, and you're also plotting in the frequency domain, then
zeros should stay zeros, shouldn't they? How colors are assigned is just
a detail.

Fred
On Jun 22, 3:58 am, "Rock Lobster" <e...@christian-gleinser.de> wrote:
> ...
> So far, so good. And the way I understand, the result of the FFT is an
> array of amplitudes for every possible sine wave.
Not every possible sine wave, but only sinusoids (consisting of a mix of
sine waves and cosine waves) whose periods are exact submultiples of the
FFT width. There are only a finite number of these sinusoid frequencies
exactly represented in the result (let's call them "bin" frequencies).

So what happens to the periodic waveforms that lie between these bin
frequencies? They get broken up into components and splattered all over
the FFT result (not just the closest bin). So if you play with only one
bin frequency, you are only playing with a fractional portion of some
sine wave (except in the rare case when that sinusoid's period is an
exact match with the FFT width).

That's why when you zero a range of bins, you don't zero all the
frequencies in that range: the portions of those frequencies which are
not exactly centered in those bins are splattered elsewhere, and thus
will still show up in the result. You will also end up munging
frequencies well away from those bins, since portions of them will be
splattered into the modified bins.

IMHO. YMMV.

--
rhn A.T nicholson d.0.t C-o-M
http://www.nicholson.com/rhn/dsp.html
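Ron's bin-frequency point is easy to demonstrate numerically: a sinusoid whose period exactly divides the FFT width lands in a single bin pair, while one between bins splatters into essentially every bin. A rough sketch with a naive plain-Python DFT (the frequencies 5.0 and 5.5 cycles per frame are arbitrary examples):

```python
import cmath, math

def dft(x):
    """Naive DFT, adequate for a 64-point demonstration."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N)) for k in range(N)]

N = 64
on_bin = [math.sin(2 * math.pi * 5.0 * n / N) for n in range(N)]   # period divides N
off_bin = [math.sin(2 * math.pi * 5.5 * n / N) for n in range(N)]  # between bins 5 and 6

def significant_bins(X, thresh=1e-6):
    """Count bins whose magnitude is above a small numerical threshold."""
    return sum(1 for c in X if abs(c) > thresh)

on_count = significant_bins(dft(on_bin))    # 2: only bins 5 and N-5
off_count = significant_bins(dft(off_bin))  # far more: energy splatters everywhere
```

The on-bin sine occupies exactly one positive/negative frequency pair; the half-bin-offset sine leaves measurable energy in every bin, which is the "splatter" that survives when you try to zero a band.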
"Ron N." <rhnlogic@yahoo.com> wrote in message 
news:1182529924.762171.13430@z28g2000prd.googlegroups.com...
> ...
> That's why when you zero a range of bins, you don't zero all
> the frequencies in that range, since portions of those
> frequencies which are not exactly centered in those bins is
> splattered elsewhere, and thus will still show up in the
> result. You will also end up munging frequencies well away
> from those bins, since portions of them will be spattered into
> the modified bins.
Ron,

I think I know what you're referring to, but zeroed samples are zeros
nonetheless, and there should be no contribution to the other samples
from zeroing. Now, if you want to talk about what splattering happens in
the time domain as a result, then yes.

Fred
On Jun 24, 1:48 pm, "Fred Marshall" <fmarshallx@remove_the_x.acm.org>
wrote:
> "Ron N." <rhnlo...@yahoo.com> wrote in message > news:1182529924.762171.13430@z28g2000prd.googlegroups.com...
...
> ...
> Ron,
>
> I think I know what you're referring to but zeroed samples are zeroes
> nonetheless. And there should be no contribution to the other samples by
> zeroing. Now, if you want to talk about what splattering happens in the
> time domain as a result, then yes.
If you zero a sample either in the time domain or the frequency domain,
you might not change the value of the continuous time or spectrum
waveform as it passes through the other sample points, but you could
cause that time-domain or spectral waveform to bounce around wildly
between sample points, perhaps even at places far removed from the
one(s) you've zeroed.

The "splattering" (window convolution) happens almost identically in
both the time and frequency domains (unless you happen to have a DFT/FFT
aperture exactly synchronized to the waveform periodicities).

IMHO. YMMV.

--
rhn A.T nicholson d.0.t C-o-M
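A minimal sketch of that effect, again with a naive plain-Python DFT/IDFT (the off-bin test signal and the choice of which bin pair to zero are arbitrary): zeroing a single bin pair of an off-bin sine and transforming back changes essentially every time-domain sample, not just a few.

```python
import cmath, math

def dft(x, sign=-1):
    """Naive DFT (sign=-1) or un-normalized inverse kernel (sign=+1)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def idft(X):
    """Inverse DFT with the 1/N normalization."""
    return [c / len(X) for c in dft(X, sign=+1)]

N = 32
x = [math.sin(2 * math.pi * 3.3 * n / N) for n in range(N)]  # off-bin sine
X = dft(x)
X[7] = 0.0      # zero one bin...
X[N - 7] = 0.0  # ...and its mirror, so the inverse transform stays real
y = idft(X)

# Nearly every time-domain sample moves, not just samples near some edge:
changed = sum(1 for a, b in zip(x, y) if abs(a - b.real) > 1e-9)
```

Removing one bin pair subtracts a full-length sinusoid from the frame, so the change is spread across all 32 samples.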
First of all, thanks for your answers!

Well, since I posted the thread, I experimented a little more, and now I
managed to build a little equalizer which uses Gaussian curves, and the
clipping was reduced a little.


To clarify my questions a bit:
1) The main question was whether the frequencies in my output array are
linear (which should mean that multiplying them by 3 should give 300%
volume), or logarithmic (dB-like, though I'm not too familiar with
that)? Now I assume it's the latter, judging from my equalizing
experiment, but I'm not quite sure, since from my first experiments I
thought they were linear.

2) The clipping occurred when I multiplied the frequency values, not the
sample points. I programmed a loop that multiplied each frequency band
value by a factor that slowly increased by 0.0005f (the f simply stands
for float), and once the factor was bigger than 1.0f (which would mean
100% (original) volume), I got massive clipping.

3) For the zeroing thing, I divided my frequency array into six parts.
Let's say the whole array carries 4096 floats; then I first divided it
into two parts with 2048 floats each, since the FFT result is symmetric.
Then each of those parts was again divided into three parts, of which
one was left completely unchanged (let's say the floats from 128 to 256
and their corresponding ones in the other part, from 3840 to 3968), and
the other parts' float values were simply set to zero. That way, I saw a
nice thin frequency band in a sonogram view, but the rest wasn't black;
it was purple and blue (meaning there's very low activity), which wasn't
what I expected.

I hope now it's a little bit clearer. The most important part for me would
be 1), but I'd be glad if someone could explain the other two as well.

Since I want to further develop my equalizer, it's actually important to
know if the frequency values are linear or logarithmic, so that my
parameters do not change their characteristics once the master volume is
changed.

Thank you very much!
On Jun 24, 11:07 pm, "Rock Lobster" <e...@christian-gleinser.de>
wrote:
> ...
> Since I want to further develop my equalizer, it's actually important to
> know if the frequency values are linear or logarithmic, so that my
> parameters do not change their characteristics once the master volume is
> changed.
Common FFT code is linear in both the time and frequency domains. Human
hearing is closer to logarithmic in response.

You should end up with two FFT result arrays, real and imaginary (or
cosine and sine correlations). The sine array should be antisymmetric,
not symmetric.

Clipping due to small changes in a multiplier around 1.0 could indicate
an arithmetic, type-conversion or numeric-format bug of some sort.

A Gaussian filter curve will result in much less frequency-domain
"splatter" than zeroing selected bins, so your sonogram should look
quieter in the far stop band with a Gaussian equalizer.

IMHO. YMMV.

--
rhn A.T nicholson d.0.t C-o-M
http://www.nicholson.com/rhn/dsp.html
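The linearity and symmetry Ron describes can be checked directly. A sketch with a naive plain-Python DFT on a random real "audio" frame (the frame length and seed are arbitrary): for real input, the real (cosine) parts are symmetric and the imaginary (sine) parts antisymmetric, and scaling the input scales every bin by the same factor.

```python
import cmath, math, random

def dft(x):
    """Naive DFT, adequate for a 16-point check."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N)) for k in range(N)]

N = 16
random.seed(1)
x = [random.uniform(-1.0, 1.0) for _ in range(N)]  # random real "audio" frame
X = dft(x)

for k in range(1, N):
    assert abs(X[k].real - X[N - k].real) < 1e-9  # cosine part: symmetric
    assert abs(X[k].imag + X[N - k].imag) < 1e-9  # sine part: antisymmetric

# Linearity: halving every input sample halves every bin, so scaling the
# spectrum by 0.5 really is exactly half amplitude after the inverse FFT.
X_half = dft([0.5 * s for s in x])
assert all(abs(a - 0.5 * b) < 1e-9 for a, b in zip(X_half, X))
```

This is the mathematical half of the OP's volume question; perceived loudness is a separate, roughly logarithmic matter, as Ron notes.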
Ahh yes, I just tried the frequency band thing again with my Gaussian
equalizer, and it's indeed much more what I'd expect ;)

Another question to linearity:
The frequencies are linear (from 0 to 22 kHz), and as you said, the
volume as well. Would that mean that multiplying the entire array by
0.5f is exactly half volume, and multiplying by 2.0f would be double
volume? Or is that just mathematical, and would the human ear judge
otherwise?

And since the frequencies are linear, but a doubled frequency means an
increase of one octave, my equalizer should in fact work non-linearly
(at least in the frequency domain). How could I accomplish this? The
Gaussian bell curve should get wider the higher the basic frequency
gets. And depending on my above question (about linearity of volume),
the height should also be affected the higher the overall volume gets.
What would be the correct factors to calculate that?


About the array:
In my case, I use an FFT method that I didn't code myself, and it just
returns a symmetric float array. The method is called smsFft(), but I
don't know what sms stands for in that case.
On Jun 25, 12:24 am, "Rock Lobster" <e...@christian-gleinser.de>
wrote:
> About the array:
> In my case, I use a FFT method that I didn't code by myself, and it just
> returns a symmetric float array. The method is called smsFft() but I don't
> know what sms stands for in that case.
There is a routine on the web named smsFft() which takes and returns
vectors with the complex components interleaved, in both the time and
frequency domains, e.g.: real[0], imag[0], real[1], imag[1]... So your
vector may actually be data pairs, and your real time-domain sound data
should be in every other vector element.

IMHO. YMMV.
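If the routine really does interleave real and imaginary parts as Ron suggests, deinterleaving is straightforward. The numbers below are made-up example data, not output of any actual smsFft() call; check your routine's documentation for its real layout:

```python
# Example interleaved buffer: real[0], imag[0], real[1], imag[1], ...
interleaved = [1.0, 0.0, 0.5, -0.25, 0.0, 2.0, -1.0, 0.125]

reals = interleaved[0::2]  # even indices: cosine (real) correlations
imags = interleaved[1::2]  # odd indices:  sine (imaginary) correlations
bins = [complex(r, i) for r, i in zip(reals, imags)]

magnitudes = [abs(c) for c in bins]  # linear amplitude per bin
```

Treating such a buffer as plain real magnitudes (or as a "symmetric float array") would explain some of the confusing symmetry observations: what looks like one 4096-entry spectrum may really be 2048 complex pairs.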