Problems with Sonogram

Started by Rock Lobster January 15, 2008
Hello,

I'm programming an audio application that should display a sonagram. I
convert audio samples by FFT into the frequency data, so far so good.

The problem is that the values now are in a logarithmic scale of course
and I want to transform them to a linear scale. But I don't know how. I've
got 256 colors, so I'd like to have 256 linear steps from 0 to 255, so
maybe it would be best to re-calculate the frequency data into floating
point values from 0 to 1 and then multiply by 255. But to do this, I
guess I'd have to know what the maximum value of my frequency data could
be, but how can I calculate that? Also, does that depend on the sample
rate and bit resolution?

I've now made a test file containing a sine wave at maximum volume. When I
scan the FFT data for the highest value, I get values
around 3,083,540.0 as the maximum and the same value (only negative) as
the minimum.

Is that really the maximum volume for a frequency? Or do I have to
calculate it somehow, and if so, how?

Thanks
Rock Lobster
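
On the question of the maximum value: for an unnormalized DFT of N real samples, a sine of amplitude A that fits a whole number of cycles into the frame puts a magnitude of A*N/2 into its bin. The ceiling therefore depends on the FFT length and on the sample scale (bit resolution), not on the sample rate. The sketch below checks this with a naive pure-Python DFT. The names `dft` and `full_scale_peak`, the frame size of 64 and the 16-bit amplitude are illustrative assumptions, not details from the post, and it assumes no window function and no 1/N scaling inside the transform.

```python
import cmath
import math


def dft(samples):
    """Naive unnormalized DFT, O(N^2); a stand-in for whatever FFT is in use."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


def full_scale_peak(n, amplitude):
    """Expected bin magnitude for a bin-centred sine of the given amplitude."""
    return amplitude * n / 2


N = 64             # hypothetical frame size
AMPLITUDE = 32767  # assumed: 16-bit full scale
CYCLES = 5         # the sine completes exactly 5 cycles per frame

sine = [AMPLITUDE * math.sin(2 * math.pi * CYCLES * i / N) for i in range(N)]
spectrum = dft(sine)

# Only bins 0..N/2 carry independent information for real input.
magnitudes = [abs(c) for c in spectrum[: N // 2 + 1]]
peak = max(magnitudes)

print(peak, full_scale_peak(N, AMPLITUDE))
```

For this sine the imaginary part of the peak bin is -A*N/2 and its mirror bin holds +A*N/2, which may be why the raw output shows the same value with both signs. Taking the magnitude of each complex bin removes the sign.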
Rock Lobster wrote:
> Hello,
>
> I'm programming an audio application that should display a sonagram. I
> convert audio samples by FFT into the frequency data, so far so good.
>
> The problem is that the values now are in a logarithmic scale of course
The results of an FFT are presented logarithmically only if your program takes their logarithms. Don't do that.
> and I want to transform them to a linear scale. But I don't know how.
If x = log(y), then y = exp(x). I thought that was elementary. ...
> I made a testfile with a sine wave now that has maximum volume. If I
> analyze the FFT data to find out what's the highest value I get values
> around 3,083,540.0 as the maximum and the same value (only negative) as
> the minimum.
Your data are evidently signed 32-bit integers. Such values range over +/- 2^31.
> Is that really the maximum volume for a frequency? Or do I have to
> calculate it somehow, and if so, how?
You need to apply an appropriate scale factor.

Jerry
--
Engineering is the art of making what you want from things you can get.
Jerry Avins wrote:

   ...
Typo: 23, not 32.
> Your data are evidently signed 32-bit integers. Such values range over
> +/- 2^22.
...

Jerry
--
Engineering is the art of making what you want from things you can get.
On Jan 15, 4:46 am, "Rock Lobster" <em...@christian-gleinser.de>
wrote:
> Hello,
>
> I'm programming an audio application that should display a sonagram. I
> convert audio samples by FFT into the frequency data, so far so good.
>
> The problem is that the values now are in a logarithmic scale of course
> and I want to transform them to a linear scale. But I don't know how. I've
> got 256 colors, so I'd like to have 256 linear steps from 0 to 255, so
> maybe it would be best to re-calculate the frequency data into floating
> point values from 0 to 1 and then multiplicate with 255. But to do this, I
> guess I'd have to know what the maximum value of my frequency data could
> be, but how can i calculate that? Also, does that depend on the sample
> rate and bit resolution?
>
> I made a testfile with a sine wave now that has maximum volume. If I
> analyze the FFT data to find out what's the highest value I get values
> around 3,083,540.0 as the maximum and the same value (only negative) as
> the minimum.
>
> Is that really the maximum volume for a frequency? Or do I have to
> calculate it somehow, and if so, how?
>
> Thanks
> Rock Lobster
Rock Lobster,

I assume you mean a spectrogram for looking at audio, rather than a sonogram (often used for looking at unborn babies).

Do you really want a linear scale, which would remove a great deal of observable signal detail, or do you want to convert the logarithmic scale to the range of 0-255 because that is what you can plot?

A little more info would help.

Dirk
On 15 Jan, 10:46, "Rock Lobster" <em...@christian-gleinser.de> wrote:
> Hello,
>
> I'm programming an audio application that should display a sonagram. I
> convert audio samples by FFT into the frequency data, so far so good.
>
> The problem is that the values now are in a logarithmic scale of course
> and I want to transform them to a linear scale.
What FFT are you using? Most implementations produce data that are linear in scale and amplitude. If your results are logarithmic, the easiest way to get what you want would be to switch to a standard FFT.

Rune
dbell wrote...

> I assume you mean a spectrogram for looking at audio, rather than a
> sonogram (often used for looking at unborn babies).
I think he means a Sonogram, as produced by the Kay Sonograph since the 1950s - long before the invention of medical ultrasound scanning ;-) I'm afraid that the term Sonogram is as deeply ensconced in the Bioacoustics world as is the name "Hoover" as applied to any vacuum cleaner!
Well, I do indeed mean a spectrogram from 0 to 20 kHz, but as far as I
know this is called a sonagram, or not?

I'm not too good at mathematics, that's one reason why I posted here. I
can't deal too well with logarithms and so on :/

I experimented with various numbers and now get a fairly good result,
but it's the result of trial and error, not of calculation. I'd like a
mathematically correct result with correct values. So, the highest
number can be 23 bits long, but why? I don't understand how this range is
connected to my audio data. Can this be calculated, or is it always 23
bits?

At the moment I use a similar number (23 bits would be 8388608;
I use 770732.25, which I found by experimenting). I take the values
from the FFT, divide them by this number, and then use log() to get them
onto a logarithmic scale. I add 4.5 to avoid numbers lower than 0
(which doesn't always work, but does for the most part). My numbers are now
mostly between 0 and 5, and I multiply them by (255/5) to
fit them into my 256-color scale. The result looks nice and is almost
the same as in the main audio program I use, but as you can see, it's the
result of trial and error, not something "mathematically correct".
Just as an example, this is what it currently looks like:

http://www.dr-wuro.com/zeha/mm.png

The song is "Rock is Dead" by Marilyn Manson, it's the "God is in the TV"
part (where the arpeggio can be seen very nicely). I think it's already
pretty usable, but not perfect yet. But it looks already pretty close to
sonagrams generated by other audio programs.
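
The divide, log, offset and scale steps described above amount to a decibel mapping with a fixed dynamic range. A minimal sketch of that idea follows, assuming the magnitude of each complex FFT bin has already been taken. The function name `magnitude_to_color`, the `reference` argument (the magnitude that should map to the brightest color) and the 90 dB default are hypothetical choices, not the values used in the post.

```python
import math


def magnitude_to_color(magnitude, reference, dynamic_range_db=90.0):
    """Map a linear FFT magnitude to a palette index 0..255 on a dB scale.

    reference        -- assumed full-scale magnitude, shown as index 255
    dynamic_range_db -- assumed display range; anything quieter maps to 0
    """
    if magnitude <= 0.0:
        return 0  # log of zero is undefined; treat silence as the darkest color

    # 0 dB at the reference level, negative values below it.
    db = 20.0 * math.log10(magnitude / reference)

    # Shift so that -dynamic_range_db..0 dB becomes 0..1, then clamp.
    level = (db + dynamic_range_db) / dynamic_range_db
    level = min(1.0, max(0.0, level))
    return int(round(level * 255))


reference = 1_000_000.0  # hypothetical full-scale magnitude
for m in (reference, reference / 10, reference / 1000, 0.0):
    print(m, magnitude_to_color(m, reference))
```

With this form the offset and the scale factor are no longer found by experiment: the offset is the chosen dynamic range, and the reference is the largest magnitude the FFT can produce for a full-scale input.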
On 21 Jan, 10:02, "Rock Lobster" <em...@christian-gleinser.de> wrote:
> Well, indeed a mean a spectrogram from 0 to 20 kHz, but as far as I'm
> concerned this is called a sonagram, or not?
That's a spectrogram.
> I'm not too good at mathematics, that's one reason why I posted here. I
> can't deal too well with logarithms and so on :/
Then you are in trouble. If you want to implement these things you need to be familiar with complex numbers. And you need to be familiar with logarithms and trigonometry before you start with complex numbers.
> I tried by playing around with certain numbers and now I got a fairly well
> result, but it's a result of trying, not of calculating. So I'd like to
> have a mathematical correct result with correct values. So, the highest
> number can be 23 bit long, but why? I don't understand how this range is
> connected to my audio data. Can this be calculated or is it always 23
> bit?
>
> At the moment I indeed got a very similar number (23 bit would be 8388608,
> I use 770732,25 which is a result of experimenting). So I take the values
> from the FFT, divide it by this number, and then use log() to get it into
> logarithmic scale. But I add 4.5 to avoid getting numbers lower than 0
> (which isn't always possible but for the most part). My numbers are now
> mostly between 0 and 5, and then I multiply those numbers with (255/5) to
> fit them into my 256-color-scale. The result is looking nice and almost
> the same as in my main audio program I'm using, but as you can see, it's a
> result of trying and experimenting, not something "mathematically correct".
If I understand you correctly, you want a spectrogram which is logarithmic in both frequency and amplitude. If that's the case, you can implement the spectrogram as a constant-Q filterbank. Not particularly difficult to get something quick'n dirty up and running, but you need to know the basics of filtering and FFTs. And logarithms.

Rune
>If I understand you correctly, you want a spectrogram which is
>logarithmic in both frequency and amplitude. If that's the case,
>you can implement the spectrogram as a constant-Q filterbank.
>Not particularly difficult to get something quick'n dirty up
>and running, but you need to know the basics of filtering and
>FFTs. And logarithms.
Well, in the Y direction I display the frequency linearly from 0 to 20 kHz, and that's perfect for me. For the amplitude I've got a linear indexed color spectrum with colors from 0 to 255. And because of the logarithmic nature of the amplitudes for each frequency band, I need to scale those amplitudes into this linear color range, which is why I use ln() to get them back into linear values.

I do understand the principles of logarithms and of what the FFT gives me as a result (otherwise I wouldn't have managed to program a spectrogram at all). I'm just not too familiar with actually using them, because I'm not a mathematician. So of course I've got some questions, and I think they are specific enough to ask here, since there should be a lot of people around who understand this.

My questions are basically these:

- How can I know (i.e. calculate) that my highest frequency band amplitude value can be 2^23?
- How can I manage to "press" all my amplitude values into linear values between 0 and 1?

And a new question (related to the second one and maybe answering it):

- ln(x) has its most characteristic change in shape between x=0 and x=2, but how do I know which range on the x axis is best for re-shaping my amplitude data? If I use x between 0.1 and 100, around 75% of all the values will be very similar. If I use x between 0.1 and 5.0, there will be a very significant curve to my data. So what is the correct range? Is there a correct answer?
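
On the last question: since ln(k*x) = ln(k) + ln(x), rescaling the input only shifts the logarithm by a constant, so no stretch of the x axis is special. What shapes the picture is the ratio between the largest and the smallest magnitude that should be distinguishable. A sketch under that assumption, with `log_normalize`, `floor` and `ceiling` as illustrative names rather than anything from the thread:

```python
import math


def log_normalize(value, floor, ceiling):
    """Map value logarithmically from [floor, ceiling] to [0, 1], clamping outside.

    floor and ceiling are the assumed smallest and largest magnitudes to display;
    both must be positive, with ceiling > floor.
    """
    value = min(ceiling, max(floor, value))
    return math.log(value / floor) / math.log(ceiling / floor)


# Scaling value, floor and ceiling by the same factor gives the same result,
# so only the ratio ceiling/floor matters, not where the range sits on the x axis.
a = log_normalize(1.0, 0.1, 100.0)
b = log_normalize(1000.0, 100.0, 100000.0)
print(a, b)
```

A range of 0.1 to 100 is a ratio of 1000 (60 dB in amplitude), while 0.1 to 5 is a ratio of 50 (about 34 dB). Which ratio to use is a display preference, chosen to suit the material, rather than something with a single correct answer.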