producing spectrogram from audio file.. confusion with FFT

Started by louis October 10, 2006
Hi there,

Okay, so I had already put up a post about how, in order to find the
spectrogram of an audio file, I was going to perform a series of FFTs and
compute the spectrogram from these.

I was confused about whether to use a real or a complex FFT, so I used
the real FFT function from kiss_fft, i.e. kiss_fftr().

The spectrogram I get has the right shape overall, however it looks
terribly noisy. (I used a 1024 point FFT with a 50% overlapping Hanning
window, i.e. a hop size of 512.)

I know that it shouldn't look this noisy. Can anyone think of any bug that
might cause my spectrogram to look noisy..?  :S

Thank you..

-louis
louis wrote:
> Hi there,
>
> Okay so I had already put up a post how in order to find the spectrogram
> of an audio file, I was going to perform a series of FFT's and from these
> compute the spectrogram..
>
> I was confused about whether to use real fft or complex etc.. so I used
> the real fft function from kiss_fft, i.e. kiss_fftr( )
>
> The spectrogram I get has the right shape overall, however it looks
> terribly noisy.. (I had used a 1024 point FFT, 50% overlapping Hanning
> window, i.e. hop size of 512)
>
> I know that it shouldn't look this noisy, can anyone think of any bug that
> might cause my spectrogram to look noisy..? :S
First, you don't mention the sampling rate, so I assume we are talking
about 44.1 kHz. A 1024 pt FFT would give a frequency resolution
on the order of 44 Hz.

Second, there is the issue of overlap. Usually, the frames used as input
to the FFTs in a spectrogram overlap. The more overlap, the smoother the
spectrogram, but this comes at a higher computational cost.

Third, use the complex-valued FFT and plot the magnitude (I assume
you work off-line and can afford the extra computational time).

Last -- what sound did you analyze?

Rune
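For reference, the time/frequency grid implied by these settings can be worked out directly. This is only a small Python sketch, assuming the 44100 Hz rate, 1024-point FFT and hop of 512 mentioned in the thread; the function name `stft_grid` is made up for illustration:

```python
def stft_grid(fs_hz, n_fft, hop):
    """Return the spacing of a spectrogram grid for the given settings."""
    return {
        "bin_spacing_hz": fs_hz / n_fft,      # distance between FFT bins
        "frame_ms": 1000.0 * n_fft / fs_hz,   # duration covered by one FFT
        "hop_ms": 1000.0 * hop / fs_hz,       # time step between columns
        "n_bins": n_fft // 2 + 1,             # non-redundant bins for real input
    }

grid = stft_grid(44100, 1024, 512)
for name, value in grid.items():
    print(name, value)
```

With these assumed settings the bin spacing is about 43.07 Hz, each frame covers about 23.2 ms, and successive spectrogram columns are about 11.6 ms apart.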
>First, you don't mention the sampling rate so I assume we are talking
>about 44.1 kHz. A 1024 pt FFT would give a frequency resolution
>on the order of 44 Hz.
Yes, that's right, I use a 44100 Hz sampling rate.
>Third, use the complex-valued FFT and plot the magnitude (I assume
>you work off-line and can afford the extra computational time).
Will the magnitude tell me anything about the noise..?
>Last -- what sound did you analyze?
The sound I analyzed is actually a synthesis of the output from a Doppler ultrasound machine. The synthesized audio sounds very close to the real thing and doesn't sound very noisy at all, so I expected its spectrogram to appear smoother.
louis wrote:
> Hi there,
>
> Okay so I had already put up a post how in order to find the spectrogram
> of an audio file, I was going to perform a series of FFT's and from these
> compute the spectrogram..
>
> I was confused about whether to use real fft or complex etc.. so I used
> the real fft function from kiss_fft, i.e. kiss_fftr( )
>
> The spectrogram I get has the right shape overall, however it looks
> terribly noisy.. (I had used a 1024 point FFT, 50% overlapping Hanning
> window, i.e. hop size of 512)
>
> I know that it shouldn't look this noisy, can anyone think of any bug that
> might cause my spectrogram to look noisy..? :S
>
> Thank you..
>
> -louis
Are you using kissfft in floating point or fixed mode? Is it consistent
throughout the build?

Do you have a byte-swapping issue?

Do you have uninitialized data in some buffer?

If these don't point you in the right direction, try putting a pure tone
into the FFT input and see what your spectrogram does.

--
Mark Borgerding
3dB Labs, Inc
Innovate. Develop. Deliver.
louis wrote:
> >Third, use the complex-valued FFT and plot the magnitude (I assume
> >you work off-line and can afford the extra computational time).
>
> Will the magnitude tell me anything about the noise..?
What noise?

If you plot the real part of the spectrum, you only use half
the available information in your plots. The missing part might
well account for what you think is noise.

Plotting the magnitude of the complex-valued spectra uses
all the available information and is very likely to produce
better plots.

Rune
Thanks so much for your response

>Are you using kissfft in floating point or fixed mode? Is it consistent
>throughout the build?
I am using it in floating point mode. How would I know if it is consistent throughout the build, though?
>Do you have a byte-swapping issue?
I don't think so, I don't get garbage values anywhere?
>Do you have uninitialized data in some buffer?
I don't think so, but I will double-check.
>If these don't point you in the right direction, try putting a pure tone
>into the fft input and see what your spectrogram does.
Okay, I will have to get back to you on this one. I am currently reimplementing everything I did over the last few days (my computer failed me at the worst possible time).
Thank-you kindly for your response

>If you plot the real part of the spectrum, you only use half
>the available information in your plots. The missing part might
>well account for what you think is noise.
>
>Plotting the magnitude of the complex-valued spectra uses
>all the available information and is very likely to produce
>better plots.
Hmm.. I tried performing complex FFTs instead of real FFTs, and this gives me the exact same spectrogram but a mirror image of it as well. (Basically to create my spectrogram, for every FFT bin, I take the square of the real part and the square of the imaginary part and sum these together. I then have an array with 1024 magnitudes (or 512 if I'm throwing away half of it). I repeat this for 80 time steps and so I end up with a 2D array, which is my spectrogram. )
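The mirror image is what a complex FFT of real-valued input is expected to give: for real input of length N the bins satisfy X[N-k] = conj(X[k]), so the upper half of the power values repeats the lower half in reverse. A minimal Python sketch of that property, using a naive DFT as a stand-in for kiss_fft (the length 16 and the test signal are arbitrary choices for illustration, not values from the thread):

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT, used here only to keep the example self-contained."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

n = 16
# Real-valued test signal: one tone on bin 3 and a weaker one on bin 5.
x = [math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.cos(2 * math.pi * 5 * t / n)
     for t in range(n)]

X = dft(x)
power = [v.real ** 2 + v.imag ** 2 for v in X]

# Largest deviation from conjugate symmetry; only rounding error is expected.
mirror_error = max(abs(X[k] - X[n - k].conjugate()) for k in range(1, n // 2))
print(mirror_error)
print([round(p, 6) for p in power])
```

This is why keeping only bins 0 to N/2 loses nothing when the input is real.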
louis wrote:
> Thank-you kindly for your response
>
> >If you plot the real part of the spectrum, you only use half
> >the available information in your plots. The missing part might
> >well account for what you think is noise.
> >
> >Plotting the magnitude of the complex-valued spectra uses
> >all the available information and is very likely to produce
> >better plots.
>
> Hmm.. I tried performing complex FFTs instead of real FFTs, and this gives
> me the exact same spectrogram but a mirror image of it as well.
Makes some sort of sense. Exactly what do you plot?

Both FFTs produce complex-valued data. The complex-valued FFT
produces a "mirrored" image when fed real-valued data, so that
part is good. Either variant produces complex-valued output when
fed real-valued input. My previous question was related to how you
plot these complex-valued data.
> (Basically to create my spectrogram, for every FFT bin, I take the square
> of the real part and the square of the imaginary part and sum these
> together. I then have an array with 1024 magnitudes (or 512 if I'm
> throwing away half of it). I repeat this for 80 time steps and so I end up
> with a 2D array, which is my spectrogram. )
This is correct.

What remains, then, is to test your spectrogram with simpler data,
as somebody else already suggested. Try steady-state signals first,
then simple frequency sweeps. Next add noise. Last, try with coarser
sampling and less overlap.

Rune
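The procedure described above, followed by the suggested steady-state test, could look roughly like this in Python. It is a sketch under assumptions, not the poster's code: the recursive `fft` helper stands in for kiss_fft, and the function name `power_spectrogram`, the periodic Hann window and the 1 kHz test tone are all illustrative choices.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def power_spectrogram(samples, n_fft=1024, hop=512):
    """Hann-windowed, overlapping frames -> re^2 + im^2 for bins 0..n_fft/2."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]
    columns = []
    for start in range(0, len(samples) - n_fft + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(n_fft)]
        spec = fft(frame)
        columns.append([v.real ** 2 + v.imag ** 2 for v in spec[:n_fft // 2 + 1]])
    return columns

fs = 44100
# Steady 1 kHz tone, long enough for 8 frames at a hop of 512.
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(1024 + 7 * 512)]

spec = power_spectrogram(tone)
peak_bins = [max(range(len(col)), key=col.__getitem__) for col in spec]
print(len(spec), "frames, peak bin per frame:", peak_bins)
```

For a steady tone, every column should peak at the same bin (the one nearest 1000 / 43.07, i.e. bin 23). A peak that wanders from frame to frame would point to a framing or buffer problem rather than to the signal.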
"louis" <lost_bits1110@hotmail.com> writes:

> The spectrogram I get has the right shape overall, however it looks
> terribly noisy.. (I had used a 1024 point FFT, 50% overlapping Hanning
> window, i.e. hop size of 512)
>
> I know that it shouldn't look this noisy, can anyone think of any bug that
> might cause my spectrogram to look noisy..? :S
Spectrograms that are published, or those generated by wavesurfer/cool
edit, are much less noisy than a direct plot of the raw FFT bin values.
This is because the most likely value of the power (squared magnitude) of
an FFT bin is zero: assuming the signal is Gaussian, the real and
imaginary parts of the FFT bin are Gaussian, so the squared magnitude is
exponentially distributed with its peak at zero. Some averaging over
time/frequency is needed to produce a nice looking spectrogram from raw
FFT output.

This may not be your problem right now, but it may come up later.

If anyone knows what averaging in the time/frequency domains looks
prettiest, I'm interested.

Tony
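The effect of averaging can be checked numerically. The Python sketch below feeds white Gaussian noise through an FFT and compares the relative spread (standard deviation divided by mean) of the raw bin powers with the spread after averaging 16 consecutive frames. The frame length of 64, the frame count, the group size and the recursive `fft` helper are all assumptions made for a quick illustration, and plain time averaging is only one of several possible smoothing schemes.

```python
import cmath
import random

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def bin_powers(frame):
    """Power of bins 1..N/2-1 (DC and Nyquist are skipped: they are purely real)."""
    spec = fft(frame)
    return [v.real ** 2 + v.imag ** 2 for v in spec[1:len(frame) // 2]]

def relative_spread(rows):
    values = [v for row in rows for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return var ** 0.5 / mean

random.seed(0)
n_fft, n_frames, group = 64, 256, 16
raw = [bin_powers([random.gauss(0.0, 1.0) for _ in range(n_fft)])
       for _ in range(n_frames)]

# Average each bin over blocks of `group` consecutive frames.
averaged = [[sum(raw[f][k] for f in range(g, g + group)) / group
             for k in range(len(raw[0]))]
            for g in range(0, n_frames, group)]

raw_spread = relative_spread(raw)
avg_spread = relative_spread(averaged)
print(raw_spread, avg_spread)
```

For exponentially distributed powers the raw spread should come out near 1, meaning the fluctuation is as large as the mean itself. Averaging 16 independent frames should bring it down to roughly 1/sqrt(16) = 0.25, at the cost of time resolution.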
Hi everyone,

Okay, so I just plotted the spectrogram of the superposition of two pure
tones, one at 1 kHz and one at 3 kHz, using my FFT/spectrogram code.

I'm not even sure if it's right. It looks right: I see two lines going
across, one at the 3 kHz mark and the other at the 1 kHz mark. However,
they seem to be fading in and out at the edges. It doesn't look bad, but
I'm just wondering if this is accurate for the case of 2 pure tones.
Shouldn't it be just a solid line, or is this the effect of the Hann
window?

Then I tried getting rid of the overlapping Hann window and instead just
took FFTs of one time segment after the next, i.e. from 0 to 1023, then
from 1024 to 2047, etc. (so no overlaps). I didn't plot this, but I looked
at the individual power spectra and was assuming I would see only 2
non-zero values, i.e. one at the 1 kHz index and the other at the 3 kHz
index. Instead I see a whole bunch of values that are around 10^8, and
then around the 1 and 3 kHz indices they are 10^10 or something.

I can't seem to figure out what I'm doing wrong, and I really don't have
intuition with this stuff. Maybe it's a problem with kiss_fft, though I
would assume that it's reliable? I never figured out the answer to the
following, which someone posted:

>Are you using kissfft in floating point or fixed mode? Is it consistent
>throughout the build?
I'm not sure how to check.. Thanks again..

LD
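One effect that may account for at least part of the non-zero values away from the two tones is spectral leakage, although nothing in the thread confirms that this is the cause. At 44100 Hz with a 1024-point FFT, neither 1 kHz nor 3 kHz falls exactly on a bin centre (they sit near bins 23.2 and 69.7), so an unwindowed frame spreads energy into every bin. The Python sketch below compares the leakage floor with and without a Hann window. The recursive `fft` helper stands in for kiss_fft, and the helper names and the choice of bins 150 and up as "far from the tones" are illustrative assumptions.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def power_spectrum(frame):
    spec = fft(frame)
    return [v.real ** 2 + v.imag ** 2 for v in spec[:len(frame) // 2 + 1]]

def far_to_peak_db(power, first_far_bin=150):
    """Mean power of bins well above both tones, relative to the peak, in dB."""
    far = power[first_far_bin:]
    return 10 * math.log10((sum(far) / len(far)) / max(power))

fs, n_fft = 44100, 1024
frame = [math.sin(2 * math.pi * 1000 * t / fs) + math.sin(2 * math.pi * 3000 * t / fs)
         for t in range(n_fft)]
hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]

rect_db = far_to_peak_db(power_spectrum(frame))                         # no window
hann_db = far_to_peak_db(power_spectrum([s * w for s, w in zip(frame, hann)]))
print("rectangular:", rect_db, "dB   Hann:", hann_db, "dB")
```

In this sketch the unwindowed frame leaves a clearly non-zero floor across the spectrum, while the Hann-windowed frame pushes the far bins much lower. Plotting power in dB rather than on a linear scale also makes it easier to judge how far below the peaks such a floor really is.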