DSPRelated.com
Forums

producing spectrogram from audio file.. confusion with FFT

Started by louis October 10, 2006
Hi everyone,

Okay so I just plotted the spectrogram of the superposition of two pure
tones - a 1khz and a 3 khz, using my FFT/spectrogram code.. 

I'm not evne sure if its right - it looks right I see two lines - one at
the 3khz makr and the other at the 1khz mark which go across, however they
seem to be fading in and out at the edges.. It doesnt look bad but i'm just
wondering if this is accurate for the case of 2 pure tones..?? shoudlnt it
be just a solid line , or is htis the effect of the Hann window??

Then I tried getting rid of the overlapping Hann window and instead just
took FFT's of one time segment to the next , i.e. from 0-1023, then from
1024 to 2047 etc.. (so no overlaps). I didnt plot this but I just looked
at the individual power spectra, and was assuming I would see only 2
non-zero values , i.e. one at the 1khz index and the other where teh 3khz
index.. but instead I see whole bunch values that are to the 10^8, and
then around the 1 and 3khz index, they are 10^10 or something..  

I cant seem to figure out what I'm doing wrong and I really dont have
intuition wiht this stuff maybe its a problem wiht kiss_fft though i would
assume that its reliable?? I never figured the answer to the following
which someone posted though:

>Are you using kissfft in floating point or fixed mode? Is it consistent >throughout the build?
but I'm not sure how to check.. Thanks again.. LD
Hi again,

Okay actually nevermind about the float/fixed point issue - I am using
floating point and I believe this is the default with kissfft so this
shouldnlt be the problem..

Basically this is what I do to get my spectrogram:

fftsize = 1024;
buffsize = 441000;
hopsize = 512;

for( index=0; index<buffsize-fftsize; index+=hopsize )
{
	//-------------- Apply the Hann window -----------------//
	for( s=0; s<fftsize; s++ )
	{
		input_window[s] = audiobuffer[index+s]*hanning[s];
		rin[s] = input_window[s];				
	}

	//------------------ Perform FFT -----------------------//
	kiss_fftr( kiss_fftr_state, rin, sout );

	//------------------ Compute Magnitude -----------------//
	for( s=0; s<fftsize; s++ )
	{
	        output_frame[s] = powf( sout[s].r , 2.0 ) + powf( sout[s].i , 2.0
);
	}
		
	//----------------- Assort power values ----------------//
	for( s=0; s<fftsize/2; s++ )
	{
		assorted_output[s+fftsize/2] = output_frame[s];
	}
		
	for( s=fftsize; s>=fftsize/2; s-- )
	{
		assorted_output[s-fftsize/2] = output_frame[s];
	}

		
	//------------------ Write out to file -----------------//
	//---writing out--//
	
	//------------------ Increment loop -------------------//
	lp++;
}
	
I can't see any flaw with it? But there must be a bug in here because my
power spectra don't look completely right as I described in my last post..
especially for the case when I dont overlap with Hann and just do:

for( index=0; index<buffsize-fftsize; index+=fftsize )
{
	for( s=0; s<fftsize; s++ )
	{
		input_window[s] = audiobuffer[index+s];
		rin[s] = input_window[s];				
	}
           
        // etc...
}
"louis" <lost_bits1110@hotmail.com> writes:

> for( s=fftsize; s>=fftsize/2; s-- ) > { > assorted_output[s-fftsize/2] = output_frame[s]; > }
I haven't been following the thread, but in any case you have an error here. Should be for( s=fftsize-1; s>=fftsize/2; s-- ) { assorted_output[s-fftsize/2] = output_frame[s]; } -- % Randy Yates % "So now it's getting late, %% Fuquay-Varina, NC % and those who hesitate %%% 919-577-9882 % got no one..." %%%% <yates@ieee.org> % 'Waterfall', *Face The Music*, ELO http://home.earthlink.net/~yatescr
Thanks so much, you're right! I added that '-1'.. however the output still
seems wrong.. :S
I guess I'll have to try fftw or something (though I'm sure there
shouldn't be anything wrong with kissfft??) but i guess with programming
you can never be 100% sure.. sigh.

>"louis" <lost_bits1110@hotmail.com> writes: > >> for( s=fftsize; s>=fftsize/2; s-- ) >> { >> assorted_output[s-fftsize/2] = output_frame[s]; >> } > >I haven't been following the thread, but in any case you >have an error here. Should be > > for( s=fftsize-1; s>=fftsize/2; s-- ) > { > assorted_output[s-fftsize/2] = output_frame[s]; > }
"louis" <lost_bits1110@hotmail.com> writes:

> Thanks so much, you're right! I added that '-1'.. however the output still > seems wrong.. :S > I guess I'll have to try fftw or something (though I'm sure there > shouldn't be anything wrong with kissfft??) but i guess with programming > you can never be 100% sure.. sigh.
Hey Louis, You're welcome! However, I feel like I could help you more (and there's probably others here too) if you would include the entire listing instead of a piece. For example, I don't know what data types you're using for s, fftsize, etc. I also have no idea what the lp++ is for at the end of the loop. Also, what are you using input_window[] for? It looks useless from the code you've provided. Your basic approach seems sound - there may be a problem in the coding, or in your understanding of what to expect. For the latter, I suggest you use something like octave (http://www.octave.org) to take the fft of your input data and plot the resulting magnitude squared. --Randy
> >>"louis" <lost_bits1110@hotmail.com> writes: >> >>> for( s=fftsize; s>=fftsize/2; s-- ) >>> { >>> assorted_output[s-fftsize/2] = output_frame[s]; >>> } >> >>I haven't been following the thread, but in any case you >>have an error here. Should be >> >> for( s=fftsize-1; s>=fftsize/2; s-- ) >> { >> assorted_output[s-fftsize/2] = output_frame[s]; >> } > > >
-- % Randy Yates % "Rollin' and riding and slippin' and %% Fuquay-Varina, NC % sliding, it's magic." %%% 919-577-9882 % %%%% <yates@ieee.org> % 'Living' Thing', *A New World Record*, ELO http://home.earthlink.net/~yatescr
>Hey Louis, > >You're welcome! However, I feel like I could help you more (and there's >probably others here too) if you would include the entire listing
instead
>of a piece. For example, I don't know what data types you're using for >s, fftsize, etc. I also have no idea what the lp++ is for at the end >of the loop. Also, what are you using input_window[] for? It looks >useless from the code you've provided. > >Your basic approach seems sound - there may be a problem in the coding, >or in your understanding of what to expect. For the latter, I >suggest you use something like octave (http://www.octave.org) to >take the fft of your input data and plot the resulting magnitude >squared.
Hi Randy, Thanks so much for your help! (I can't believe I'm having so much trouble getting a silly spectrogram!! argh!) Here is my full code programmed in VC++ using a current version of kissfft: #include <iostream> #include <fstream> #include <string> using namespace std; #include <math.h> #include <stdlib.h> #include "_kiss_fft_guts.h" #include "kiss_fft.h" #include "kiss_fftr.h" const int fftsize = 1024; const int hopsize = 512; // buffsize hardcoded for now: 10 audio cycles x 44100 sampling rate x 0.92 seconds per audio cycle const int buffsize = 405720; const float filter=40; // all values less than 40 Hz should be set to 0. const int samplerate= 44100; float hanning[fftsize]; float input_window[fftsize]; float output_frame[fftsize]; float assorted_output[fftsize]; float audiobuffer[buffsize]; kiss_fft_scalar rin[fftsize+2]; kiss_fft_cpx sout[fftsize]; int main() { kiss_fftr_cfg kiss_fftr_state; int temp, loops, i, j, k, lp, s, index, bins; char outfilename[128]; float frequencyResolution; ifstream audiofile; ofstream powerspectfile; kiss_fftr_state = kiss_fftr_alloc(fftsize,0,0,0); kiss_fft_scalar zero; memset(&zero,0,sizeof(zero) ); filename = "audio.txt"; audiofile.open( filename, ios::in ); //******* Create Hann Window ****************// for (k = 0; k < fftsize; k++) { hanning[k] = 0.5 - 0.5 * cos(6.2832 * k / fftsize); } //****** Read-in samples from ascii audio file ******// for( i=0; i<buffsize; i++ ) { audiofile>>audiobuffer[i]; } audiofile.close(); loops = buffsize/hopsize; lp = 0; //********Loop FFT **************************// for( index=0; index<buffsize-fftsize; index+=hopsize ) { //-------------- Apply the Hann window -----------------// for( s=0; s<fftsize; s++ ) { input_window[s] = audiobuffer[index+s]*hanning[s]; rin[s] = input_window[s]; } //------------------ Perform FFT -----------------------// kiss_fftr( kiss_fftr_state, rin, sout ); //------------------ Compute Magnitude -----------------// for( s=0; s<fftsize; s++ ) { output_frame[s] = powf( sout[s].r , 2.0 ) + powf( sout[s].i , 2.0 ); } //----------------- Assort power values ----------------// for( s=0; s<fftsize/2; s++ ) { assorted_output[s+fftsize/2] = output_frame[s]; } for( s=fftsize-1; s>=fftsize/2; s-- ) { assorted_output[s-fftsize/2] = output_frame[s]; } //-----------Remove frequencies less than 40 Hz -------// frequencyResolution = (float)samplerate/(float)fftsize; bins = ceil(filter/frequencyResolution); for( j=0; j<=wallbins; j++ ) { assorted_output[ fftsize/2 - j ] = 0; assorted_output[ fftsize/2 + j ] = 0; } //------------------ Write out to file -----------------// sprintf( outfilename, "%s.powspec%5.5d", "cca", lp ); powerspectfile.open(outfilename, ios::out); for ( j = 0; j < fftsize; j++ ) { powerspectfile<<assorted_output[j]<<endl; } powerspectfile.close(); //------------------ Increment loop -------------------// lp++; } free(kiss_fft_state); free(kiss_fftr_state); return 0; }
btw.. "loop" is useless in the code i just posted, that was just for output
purposes.

So just to clarify, that code i posted basically creates a power spectra
at each "lp" value. Then I have other code which combines all the
powerspectra to make the spectrogram..

I can't figure out why its so noisy.. the audio doesn't sound noisy at
all.. but my spectrogram shows otherwise.. :(
louis wrote:

[...]
> //------------------ Perform FFT -----------------------// > kiss_fftr( kiss_fftr_state, rin, sout ); > > //------------------ Compute Magnitude -----------------// > for( s=0; s<fftsize; s++ ) > { > output_frame[s] = powf( sout[s].r , 2.0 ) + powf( sout[s].i , 2.0 ); > }
The output of the real forward fft is roughly half the FFT [0:pi]. The transform is conjugate symmetric. The second half can be easily inferred from the first half. So your frequency domain calculations should be using (fftsize/2+1) complex pairs, not fftsize. -- Mark Borgerding
Thanks for your response, you're totally right. I changed that but I still
see the same spectrogram :s

btw I made a mistake when posting my code, below "wallbins" should be just
"bins" so:

"//-----------Remove frequencies less than 40 Hz -------//

frequencyResolution = (float)samplerate/(float)fftsize;
bins = ceil(filter/frequencyResolution);

for( j=0; j<=bins; j++ )         //<-----------
{
assorted_output[ fftsize/2 - j ] = 0;
assorted_output[ fftsize/2 + j ] = 0;
}"

not a huge deal but just in case there was confusion.
louis skrev:
> Hi everyone, > > Okay so I just plotted the spectrogram of the superposition of two pure > tones - a 1khz and a 3 khz, using my FFT/spectrogram code.. > > I'm not evne sure if its right - it looks right I see two lines - one at > the 3khz makr and the other at the 1khz mark which go across, however they > seem to be fading in and out at the edges.. It doesnt look bad but i'm just > wondering if this is accurate for the case of 2 pure tones..?? shoudlnt it > be just a solid line , or is htis the effect of the Hann window??
It is not an effect of the Hann window, it might be an effect of partially filled buffers at the ends. Rune