FFT and Decibels

Started by Fred A. November 14, 2006
I would like to have an amplitude display as well as a decibel display of 
16-bit stereo audio in my program.

1) First I take the data from both channels as L,R,L,R,etc and feed it into 
the FFT as one signal.

2) Next I separate the FFT data into alternating left and right channels. 
For example Right = FFT[0], FFT[2], FFT[4]; Left = FFT[1], FFT[3], FFT[5];

3) I take the left channel and perform: 
(sqrt((re*re)+(im*im))/HALF_FFT_LENGTH)%256 to get the frequency intensity 
scaled to 0..256 values. I realize the scaling is wrong but I am not sure 
how to scale the wide range of values to the 256 range.

4) I take the right channel and perform: 10.0 * 
log10(double(mag_sqrd(re,im))) where mag_sqrd = (re*re+im*im) to obtain the 
decibel values of the frequency components.

The problem is all the frequency components seem to have approximately the 
same decibel values for each frequency and do not respond to peaks in bass 
or treble as would be expected. How can I fix this? What am I doing wrong? 


Did you use a window function on your data before the FFTs? If not, then
this could be the problem.
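
To illustrate the windowing suggestion, here is a minimal sketch that multiplies a frame by a Hann window before transforming it. The helper names (`hann`, `dft`, `windowed_spectrum`) are made up for this example, and the naive DFT stands in for whatever FFT routine the program really uses. The test tone deliberately falls between two bins so that leakage is visible.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def dft(x):
    """Naive O(N^2) DFT; a stand-in for a real FFT routine."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def windowed_spectrum(frame):
    """Apply the Hann window sample by sample, then transform."""
    w = hann(len(frame))
    return dft([s * wi for s, wi in zip(frame, w)])

# A tone with 10.5 cycles per frame, i.e. halfway between bins 10 and 11.
N = 64
tone = [math.sin(2 * math.pi * 10.5 * t / N) for t in range(N)]

raw = [abs(c) for c in dft(tone)]
win = [abs(c) for c in windowed_spectrum(tone)]

# Energy far away from the tone (bin 25), relative to each spectrum's peak.
leak_raw = raw[25] / max(raw)
leak_win = win[25] / max(win)
print(leak_raw, leak_win)
```

With the window applied, the relative level far from the tone should come out much lower than without it, at the cost of a somewhat wider main peak.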
I just take the wave sampled audio and send it directly to the FFT. I do not 
use a window function on the stereo audio. Also I do not send the data to 
the FFT as individual channels, I send the L,R,L,R,L,R pattern to the FFT.


"Jeff Caunter" <jeffcaunter@sparkysworld.co.uk> wrote in message 
news:sfidnQGjGcTPwMfYnZ2dnUVZ_rGdnZ2d@giganews.com...
> Did you use a window function on your data before the FFTs? If not, then
> this could be the problem.
Fred A. wrote:
> I would like to have an amplitude as well as a decibel display in my program
> of 16-bit stereo audio.
>
> 1) First I take the data from both channels as L,R,L,R,etc and feed it into
> the FFT as one signal.
Don't. Either use x[n] = (L+R)/2 or FFT the two channels individually and then compute the average.
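
A small sketch of both options mentioned here, assuming the audio arrives as an interleaved L,R,L,R,... list: split it into channels first, then either mix to mono or transform each channel on its own. The function names are hypothetical and the naive DFT is only a placeholder for a real FFT.

```python
import cmath
import math

def deinterleave(samples):
    """Split interleaved L,R,L,R,... samples into (left, right) lists."""
    return samples[0::2], samples[1::2]

def mono_mix(left, right):
    """Average the two channels sample by sample: x[n] = (L + R) / 2."""
    return [(l + r) / 2 for l, r in zip(left, right)]

def dft(x):
    """Naive O(N^2) DFT; a real program would call an FFT library instead."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def peak_bin(x):
    """Index of the strongest bin in the lower half of the spectrum."""
    mags = [abs(c) for c in dft(x)[: len(x) // 2]]
    return mags.index(max(mags))

# Test signal: 3 cycles per frame on the left, 7 cycles per frame on the right.
N = 32
left = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]
right = [math.sin(2 * math.pi * 7 * t / N) for t in range(N)]
interleaved = [s for pair in zip(left, right) for s in pair]

l2, r2 = deinterleave(interleaved)
# Each channel is transformed separately, so each tone lands in its own bin.
print(peak_bin(l2), peak_bin(r2))
```

Here the left channel peaks in bin 3 and the right channel in bin 7, which is the per-channel behaviour the display needs.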
> 2) Next I separate the FFT data into alternating left and right channels.
> For example Right = FFT[0], FFT[2], FFT[4]; Left = FFT[1], FFT[3], FFT[5];
Wrong.
> 3) I take the left channel and perform:
> (sqrt((re*re)+(im*im))/HALF_FFT_LENGTH)%256 to get the frequency intensity
> scaled to 0..256 values. I realize the scaling is wrong but I am not sure
> how to scale the wide range of values to the 256 range.
Your idea may be correct, for visualization purposes. But this should be the very last step before plotting the spectrogram.
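
One possible way to do that last step is sketched below. It assumes the values have already been converted to dB and maps a chosen dB range onto 0..255 by clamping. The -96 dB floor is an assumption for this example (roughly the dynamic range of 16-bit audio), not something specified in the thread. Note that the `%256` in the original wraps large values around instead of saturating them, which is probably part of why the scaling looks wrong.

```python
def db_to_byte(db, floor_db=-96.0, ceil_db=0.0):
    """Linearly map [floor_db, ceil_db] onto 0..255, clamping values outside it."""
    t = (db - floor_db) / (ceil_db - floor_db)
    t = min(1.0, max(0.0, t))  # saturate instead of wrapping
    return int(round(t * 255))

# Modulo wraps: a value of 300 becomes a small number again.
wrapped = 300 % 256

print(db_to_byte(0.0), db_to_byte(-96.0), db_to_byte(-200.0), wrapped)
```

The floor and ceiling are display choices, so they can be tuned to taste without touching the spectrum computation.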
> 4) I take the right channel and perform: 10.0 *
> log10(double(mag_sqrd(re,im))) where mag_sqrd = (re*re+im*im) to achieve the
> decibels of the frequency components.
Correct, with some provisos. This step will give a log-scaled spectrogram, but be aware that the numerical values will not correspond to the dB scales known from audiology. To achieve that, you need a calibrated sensor and some scaling coefficients.

Rune
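
Without a calibrated sensor, a common fallback is to express each bin relative to digital full scale. The sketch below assumes 16-bit samples, an unwindowed frame, and a reference equal to the bin magnitude of a full-scale sine centred on a bin (N/2 times 32768). The name `bin_dbfs` and the small `eps` guard against log10(0) are choices made for this example.

```python
import math

FULL_SCALE = 32768.0  # magnitude of the largest 16-bit sample (assumption: 16-bit PCM)

def bin_dbfs(re, im, fft_len, eps=1e-20):
    """Power of one FFT bin in dB relative to a full-scale, bin-centred sine.

    Assumes no window was applied; a window would lower the reference gain.
    """
    ref = (fft_len / 2) * FULL_SCALE
    power = (re * re + im * im) / (ref * ref)
    return 10.0 * math.log10(power + eps)  # eps avoids log10(0) on silent bins

N = 1024
print(bin_dbfs((N / 2) * FULL_SCALE, 0.0, N))        # full-scale sine, about 0 dB
print(bin_dbfs((N / 2) * FULL_SCALE / 2, 0.0, N))    # half amplitude, about -6 dB
```

These numbers are relative to the converter's range only, so they still say nothing about sound pressure level.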
> I just take the wave sampled audio and send it directly to the FFT. I do not
> use a window function on the stereo audio. Also I do not send the data to
> the FFT as individual channels, I send the L,R,L,R,L,R pattern to the FFT.
Do you mean that each input frame to your FFT has LRLRLR... data in it, or do you mean that your FFT processes L and R data alternately?

If the former, this is a very strange thing to do. If you are not using a window function either, then your spectrum is going to be a broadband mess.

Jeff
Jeff,

As I read it, he is loading the L data into the real part and the R
data into the imaginary part of the complex input, performing the
N-point FFT, then separating the results into what he would have gotten
if he had performed an N-point FFT on L and an N-point FFT on R.  Not
really strange. One way to reduce total calculations.
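
The trick described above can be sketched as follows, assuming L goes into the real part and R into the imaginary part of one complex input. Because the spectrum of a real signal is conjugate-symmetric, the two spectra can be recovered from Z[k] and the conjugate of Z[N-k]. The function names are hypothetical and the naive DFT is a placeholder for a real FFT. Note that the separation is not a matter of taking alternating output bins.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT; a stand-in for a real complex FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def two_real_ffts(left, right):
    """Transform two real sequences with a single complex transform."""
    n = len(left)
    z = dft([complex(l, r) for l, r in zip(left, right)])
    spec_l, spec_r = [], []
    for k in range(n):
        zc = z[(n - k) % n].conjugate()
        spec_l.append((z[k] + zc) / 2)    # even (conjugate-symmetric) part -> L
        spec_r.append((z[k] - zc) / 2j)   # odd part, divided by j -> R
    return spec_l, spec_r

left = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0, 0.0, 1.5]
right = [0.0, -1.0, 4.0, 2.0, -0.5, 1.0, 2.5, -3.0]
spec_l, spec_r = two_real_ffts(left, right)

# Largest deviation from transforming each channel on its own.
err_l = max(abs(a - b) for a, b in zip(spec_l, dft(left)))
err_r = max(abs(a - b) for a, b in zip(spec_r, dft(right)))
print(err_l, err_r)
```

Both deviations should be at rounding-error level, confirming that one N-point complex transform yields the two N-point real spectra.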

Dirk

Dirk Bell
DSP Consultant

Rune Allnor wrote:
> Fred A. wrote:
>> I would like to have an amplitude as well as a decibel display in my program
>> of 16-bit stereo audio.
>>
>> 1) First I take the data from both channels as L,R,L,R,etc and feed it into
>> the FFT as one signal.
>
> Don't. Either use x[n] = (L+R)/2 or FFT the two channels individually
> and then compute the average.
Why bother dividing by 2? That's just a scale factor.
>> 2) Next I separate the FFT data into alternating left and right channels.
>> For example Right = FFT[0], FFT[2], FFT[4]; Left = FFT[1], FFT[3], FFT[5];
>
> Wrong.
>
>> 3) I take the left channel and perform:
>> (sqrt((re*re)+(im*im))/HALF_FFT_LENGTH)%256 to get the frequency intensity
>> scaled to 0..256 values. I realize the scaling is wrong but I am not sure
>> how to scale the wide range of values to the 256 range.
No need for the square root. For the logarithm to come, that's just a scale factor.
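
A quick check of that remark: taking the square root only halves the logarithm, so 20*log10 of the magnitude equals 10*log10 of the squared magnitude. The two function names below are made up for the comparison.

```python
import math

def db_from_power(re, im):
    """dB computed directly from the squared magnitude (no square root)."""
    return 10.0 * math.log10(re * re + im * im)

def db_from_magnitude(re, im):
    """dB computed from the magnitude; the sqrt just turns 10*log10 into 20*log10."""
    return 20.0 * math.log10(math.sqrt(re * re + im * im))

print(db_from_power(3.0, 4.0), db_from_magnitude(3.0, 4.0))
```

Both calls give the same value up to rounding, so the cheaper squared-magnitude form is enough for a dB display.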
> Your idea may be correct, for visualization purposes. But this
> should be the very last step before plotting the spectrogram.
>
>> 4) I take the right channel and perform: 10.0 *
>> log10(double(mag_sqrd(re,im))) where mag_sqrd = (re*re+im*im) to achieve the
>> decibels of the frequency components.
>
> Correct, with some provisos. This step will give a log-scaled
> spectrogram, but be aware that the numerical values will not
> correspond to the dB scales known from audiology. To achieve that,
> you need a calibrated sensor and some scaling coefficients.
Jerry
--
Engineering is the art of making what you want from things you can get.
> > Don't. Either use x[n] = (L+R)/2 or FFT the two channels individually
> > and then compute the average.
>
> Why bother dividing by 2? That's just a scale factor.
If you're implementing this in a fixed-point system (although it sounds like the original problem is on a PC), then you could have overflow issues if L and R were both full-scale. Dividing by 2 avoids this and gives a more natural "average" of the two channels.
cincydsp@gmail.com wrote:
>>> Don't. Either use x[n] = (L+R)/2 or FFT the two channels individually
>>> and then compute the average.
>> Why bother dividing by 2? That's just a scale factor.
>
> If you're implementing this in a fixed-point system (although it sounds
> like the original problem is on a PC), then you could have overflow
> issues if L and R were both full-scale. Dividing by 2 avoids this and
> gives a more natural "average" of the two channels.
(L + R)/2 overflows just as easily as L + R. If overflow is a concern, you have to sacrifice accuracy with L/2 + R/2, and then get the accuracy back with saved remainders if it's worth the extra code.

Jerry
--
Engineering is the art of making what you want from things you can get.
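
The three variants discussed here can be compared in a small simulation. Python integers never overflow, so the sketch uses a hypothetical `wrap16` helper to mimic a 16-bit two's complement register; the "saved remainder" is modelled as the carry that is lost when both low bits are set. This is one way to read the remark, not necessarily the exact code the poster had in mind.

```python
def wrap16(x):
    """Wrap an integer to signed 16 bits, as a 16-bit register would."""
    return (x + 0x8000) % 0x10000 - 0x8000

def avg_naive(l, r):
    """(L + R) / 2 where the sum itself is held in 16 bits, so it can overflow."""
    return wrap16(l + r) >> 1

def avg_prescaled(l, r):
    """L/2 + R/2: cannot overflow, but discards the low bit of each input."""
    return (l >> 1) + (r >> 1)

def avg_with_remainder(l, r):
    """L/2 + R/2 plus the carry lost when both low bits were 1."""
    return (l >> 1) + (r >> 1) + (l & r & 1)

full = 32767
print(avg_naive(full, full))           # overflowed result
print(avg_prescaled(full, full))       # off by one, but no overflow
print(avg_with_remainder(full, full))  # exact floor of the true average
```

For two full-scale samples the naive version wraps to a negative number, the prescaled version is one count low, and the remainder version matches the exact average.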
Jerry Avins <jya@ieee.org> writes:

> cincydsp@gmail.com wrote:
>>>> Don't. Either use x[n] = (L+R)/2 or FFT the two channels individually
>>>> and then compute the average.
>>> Why bother dividing by 2? That's just a scale factor.
>> If you're implementing this in a fixed-point system (although it sounds
>> like the original problem is on a PC), then you could have overflow
>> issues if L and R were both full-scale. Dividing by 2 avoids this and
>> gives a more natural "average" of the two channels.
>
> (L + R)/2 overflows just as easily as L + R.
Not in most DSP architectures, which have 2N+G bits in the accumulator, where N is the native datapath length. The extra G bits keep you from overflowing.

--
%  Randy Yates                  % "She has an IQ of 1001, she has a jumpsuit
%% Fuquay-Varina, NC            %  on, and she's also a telephone."
%%% 919-577-9882                %
%%%% <yates@ieee.org>           % 'Yours Truly, 2095', *Time*, ELO
http://home.earthlink.net/~yatescr
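
The effect of guard bits can be simulated along the following lines. The widths are assumptions chosen for illustration: N = 16, so products need 2N = 32 bits, and G = 8 guard bits give a 40-bit accumulator. The `wrap` and `mac` helpers are hypothetical and only model the wrap-around of a fixed-width register.

```python
def wrap(x, bits):
    """Wrap an integer to a signed two's complement register of the given width."""
    half = 1 << (bits - 1)
    return (x + half) % (1 << bits) - half

def mac(pairs, acc_bits):
    """Multiply-accumulate a list of (a, b) pairs in an acc_bits-wide accumulator."""
    acc = 0
    for a, b in pairs:
        acc = wrap(acc + a * b, acc_bits)
    return acc

# Four full-scale 16-bit products: too large for 32 bits, fine with 8 guard bits.
pairs = [(32767, 32767)] * 4
exact = sum(a * b for a, b in pairs)

no_guard = mac(pairs, 32)      # 2N bits only
with_guard = mac(pairs, 40)    # 2N + G bits, with G = 8 (assumed)

# The same idea applied to (L + R) / 2: the sum is formed in the wide
# accumulator, so it does not wrap before the divide.
avg = wrap(32767 + 32767, 40) >> 1
print(no_guard == exact, with_guard == exact, avg)
```

Without guard bits the running sum wraps, while the 40-bit accumulator holds the exact total, and the average of two full-scale samples comes out as 32767.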