# Problems with Sonogram

Started by Rock Lobster, January 15, 2008
```Hello,

I'm programming an audio application that should display a sonagram. I
convert audio samples by FFT into the frequency data, so far so good.

The problem is that the values now are in a logarithmic scale of course
and I want to transform them to a linear scale. But I don't know how. I've
got 256 colors, so I'd like to have 256 linear steps from 0 to 255, so
maybe it would be best to re-calculate the frequency data into floating
point values from 0 to 1 and then multiply by 255. But to do this, I
guess I'd have to know what the maximum value of my frequency data could
be, but how can I calculate that? Also, does that depend on the sample
rate and bit resolution?

I made a test file with a sine wave that has maximum volume. If I
analyze the FFT data to find the highest value, I get values
around 3,083,540.0 as the maximum and the same value (only negative) as
the minimum.

Is that really the maximum volume for a frequency? Or do I have to
calculate it somehow, and if so, how?

Thanks
Rock Lobster
```
```Rock Lobster wrote:
> Hello,
>
> I'm programming an audio application that should display a sonagram. I
> convert audio samples by FFT into the frequency data, so far so good.
>
> The problem is that the values now are in a logarithmic scale of course

The results of an FFT are presented logarithmically only if your program
takes their logarithms. Don't do that.

> and I want to transform them to a linear scale. But I don't know how.

If x = log(y), then y = exp(x). I thought that was elementary.

...

> I made a test file with a sine wave that has maximum volume. If I
> analyze the FFT data to find the highest value, I get values
> around 3,083,540.0 as the maximum and the same value (only negative) as
> the minimum.

Your data are evidently signed 32-bit integers. Such values range over
+/- 2^31.

> Is that really the maximum volume for a frequency? Or do I have to
> calculate it somehow, and if so, how?

You need to apply an appropriate scale factor.

Jerry
--
Engineering is the art of making what you want from things you can get.

```
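As a rough sketch of what "an appropriate scale factor" could look like: for an unnormalized N-point DFT with no window (rectangular), a sine of amplitude A that sits exactly on a bin produces a peak magnitude of A*N/2. The largest value therefore depends on the sample range (bit resolution) and the FFT length, not on the sample rate. The names `FULL_SCALE`, `N`, and `BIN` below are assumptions for illustration, and the slow `dft` helper only stands in for whatever FFT routine is actually in use.

```python
import cmath
import math

def dft(samples):
    """Naive unnormalized DFT, O(N^2); only meant for a small demonstration."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]

def peak_magnitude(samples):
    """Largest bin magnitude in the non-redundant half of the spectrum."""
    spectrum = dft(samples)
    return max(abs(c) for c in spectrum[: len(samples) // 2 + 1])

N = 64              # hypothetical FFT length
FULL_SCALE = 32767  # assumption: 16-bit signed samples
BIN = 5             # sine placed exactly on bin 5, so there is no leakage

sine = [FULL_SCALE * math.sin(2 * math.pi * BIN * t / N) for t in range(N)]

peak = peak_magnitude(sine)       # close to FULL_SCALE * N / 2
scale = 2.0 / (N * FULL_SCALE)    # maps a full-scale, bin-centered sine to 1.0
normalized_peak = peak * scale

print(peak, normalized_peak)
```

If a window other than rectangular is applied before the FFT, the peak shrinks by the window's average value (its coherent gain), so the scale factor would need to be divided by that as well.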
```Jerry Avins wrote:

...
Typo: 23, not 32.
> Your data are evidently signed 32-bit integers. Such values range over
> +/- 2^22.

...

Jerry
--
Engineering is the art of making what you want from things you can get.
```
```On Jan 15, 4:46 am, "Rock Lobster" <em...@christian-gleinser.de>
wrote:
> Hello,
>
> I'm programming an audio application that should display a sonagram. I
> convert audio samples by FFT into the frequency data, so far so good.
>
> The problem is that the values now are in a logarithmic scale of course
> and I want to transform them to a linear scale. But I don't know how. I've
> got 256 colors, so I'd like to have 256 linear steps from 0 to 255, so
> maybe it would be best to re-calculate the frequency data into floating
> point values from 0 to 1 and then multiply by 255. But to do this, I
> guess I'd have to know what the maximum value of my frequency data could
> be, but how can I calculate that? Also, does that depend on the sample
> rate and bit resolution?
>
> I made a test file with a sine wave that has maximum volume. If I
> analyze the FFT data to find the highest value, I get values
> around 3,083,540.0 as the maximum and the same value (only negative) as
> the minimum.
>
> Is that really the maximum volume for a frequency? Or do I have to
> calculate it somehow, and if so, how?
>
> Thanks
> Rock Lobster

Rock Lobster,

I assume you mean a spectrogram for looking at audio, rather than a
sonogram (often used for looking at unborn babies).

Do you really want a linear scale, which would remove a great deal of
observable signal detail, or do you want to convert the logarithmic
scale to the range of 0-255 because that is what you can plot?

Dirk

```
```On 15 Jan, 10:46, "Rock Lobster" <em...@christian-gleinser.de> wrote:
> Hello,
>
> I'm programming an audio application that should display a sonagram. I
> convert audio samples by FFT into the frequency data, so far so good.
>
> The problem is that the values now are in a logarithmic scale of course
> and I want to transform them to a linear scale.

What FFT are you using? Most implementations produce data
that are linear in scale and amplitude. If your results
are logarithmic, the easiest way to get what you want
would be to switch to a standard FFT.

Rune
```
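A small sketch of the point about linear output, assuming the positive and negative extremes reported earlier come from reading the raw real and imaginary parts of the bins. Those parts change sign with the phase of the tone, while the magnitude sqrt(re^2 + im^2) stays the same and scales linearly with the input amplitude. The helper `dft_bin` and the constants `N`, `K`, and `A` are made up for this example.

```python
import cmath
import math

def dft_bin(samples, k):
    """Single bin k of an unnormalized DFT."""
    n = len(samples)
    return sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))

N = 32      # hypothetical transform length
K = 3       # bin on which the test tone sits
A = 1000.0  # tone amplitude

results = []
for phase in (0.0, math.pi / 2, math.pi, 3 * math.pi / 2):
    tone = [A * math.cos(2 * math.pi * K * t / N + phase) for t in range(N)]
    c = dft_bin(tone, K)
    # Real and imaginary parts swing between positive and negative values
    # as the phase changes; the magnitude abs(c) does not.
    results.append((c.real, c.imag, abs(c)))

for re, im, mag in results:
    print(f"re={re:10.1f}  im={im:10.1f}  magnitude={mag:10.1f}")
```

For a spectrogram it is the magnitude (or its square, the power) that gets mapped to a color, never the real or imaginary part on its own.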
```dbell wrote...

> I assume you mean a spectrogram for looking at audio, rather than a
> sonogram (often used for looking at unborn babies).

I think he means a Sonogram, as produced by the Kay Sonograph since the
1950s - long before the invention of medical ultrasound scanning  ;-)

I'm afraid that the term Sonogram is as deeply ensconced in the
Bioacoustics world as is the name "Hoover" as applied to any vacuum
cleaner!

```
```Well, I do indeed mean a spectrogram from 0 to 20 kHz, but as far as I
know this is called a sonagram, isn't it?

I'm not too good at mathematics, that's one reason why I posted here. I
can't deal too well with logarithms and so on :/

I tried playing around with certain numbers and now I get a fairly good
result, but it's a result of trying, not of calculating. So I'd like to
have a mathematically correct result with correct values. So, the highest
number can be 23 bit long, but why? I don't understand how this range is
connected to my audio data. Can this be calculated or is it always 23
bit?

At the moment I indeed have a very similar number (23 bit would be 8388608,
I use 770732.25, which is a result of experimenting). So I take the values
from the FFT, divide them by this number, and then use log() to get them into
logarithmic scale. But I add 4.5 to avoid getting numbers lower than 0
(which doesn't always work, but does for the most part). My numbers are now
mostly between 0 and 5, and then I multiply those numbers by (255/5) to
fit them into my 256-color scale. The result looks nice and almost
the same as in my main audio program I'm using, but as you can see, it's a
result of trying and experimenting, not something "mathematically correct".
```
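One common way to replace the hand-tuned constants is to express each magnitude in decibels relative to a reference and then map a fixed dynamic range onto 0 to 255. The sketch below assumes the reference is the magnitude of a full-scale sine (for example A*N/2 for an unwindowed N-point FFT) and picks 96 dB as the displayed range. Both are choices, not requirements, and the function name is made up. The offset of 4.5 and the span of 5 in the post above play the same roles as the floor and the range here: a span of 5 in natural-log units is about 43 dB.

```python
import math

def magnitude_to_color_index(magnitude, reference, dynamic_range_db=96.0):
    """Map a linear FFT magnitude to a palette index in 0..255.

    reference        magnitude that should map to the brightest color (0 dB)
    dynamic_range_db how far below the reference the darkest color sits
                     (assumed value; adjust to taste)
    """
    if magnitude <= 0.0:
        return 0  # log of zero is undefined; treat as the floor
    level_db = 20.0 * math.log10(magnitude / reference)
    # Clamp to the displayed range [-dynamic_range_db, 0].
    level_db = max(-dynamic_range_db, min(0.0, level_db))
    return round(255 * (level_db + dynamic_range_db) / dynamic_range_db)

reference = 1048544.0  # hypothetical: 32767 * 64 / 2
brightest = magnitude_to_color_index(reference, reference)
darkest = magnitude_to_color_index(reference * 10 ** (-96.0 / 20.0), reference)
silent = magnitude_to_color_index(0.0, reference)
print(brightest, darkest, silent)
```

A smaller `dynamic_range_db` gives more contrast among loud components, while a larger one reveals quieter detail at the cost of contrast.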
```Just as an example, this is what it currently looks like:

http://www.dr-wuro.com/zeha/mm.png

The song is "Rock is Dead" by Marilyn Manson, it's the "God is in the TV"
part (where the arpeggio can be seen very nicely). I think it's already
pretty usable, but not perfect yet. But it already looks pretty close to
sonagrams generated by other audio programs.
```
```On 21 Jan, 10:02, "Rock Lobster" <em...@christian-gleinser.de> wrote:
> Well, I do indeed mean a spectrogram from 0 to 20 kHz, but as far as I
> know this is called a sonagram, isn't it?

That's a spectrogram.

> I'm not too good at mathematics, that's one reason why I posted here. I
> can't deal too well with logarithms and so on :/

Then you are in trouble. If you want to implement these things
you need to be familiar with complex numbers. And you need
to be familiar with logarithms and trigonometry before you

> I tried playing around with certain numbers and now I get a fairly good
> result, but it's a result of trying, not of calculating. So I'd like to
> have a mathematically correct result with correct values. So, the highest
> number can be 23 bit long, but why? I don't understand how this range is
> connected to my audio data. Can this be calculated or is it always 23
> bit?
>
> At the moment I indeed have a very similar number (23 bit would be 8388608,
> I use 770732.25, which is a result of experimenting). So I take the values
> from the FFT, divide them by this number, and then use log() to get them into
> logarithmic scale. But I add 4.5 to avoid getting numbers lower than 0
> (which doesn't always work, but does for the most part). My numbers are now
> mostly between 0 and 5, and then I multiply those numbers by (255/5) to
> fit them into my 256-color scale. The result looks nice and almost
> the same as in my main audio program I'm using, but as you can see, it's a
> result of trying and experimenting, not something "mathematically correct".

If I understand you correctly, you want a spectrogram which is
logarithmic in both frequency and amplitude. If that's the case,
you can implement the spectrogram as a constant-Q filterbank.
Not particularly difficult to get something quick'n dirty up
and running, but you need to know the basics of filtering and
FFTs. And logarithms.

Rune

```
```>
>If I understand you correctly, you want a spectrogram which is
>logarithmic in both frequency and amplitude. If that's the case,
>you can implement the spectrogram as a constant-Q filterbank.
>Not particularly difficult to get something quick'n dirty up
>and running, but you need to know the basics of filtering and
>FFTs. And logarithms.
>

Well, in Y direction I display the frequency linearly from 0 to 20 kHz,
and that's perfect for me. For the amplitude I've got a linear indexed
color spectrum with colors from 0 to 255. And because of the logarithmic
nature of the amplitudes for each frequency band, I need to scale those
amplitudes into this linear color range, which is why I use ln() to get it
back into linear values.

I do understand the principles of logarithms and I do understand the
principles of what the FFT gives me as a result (otherwise I wouldn't have
managed to program a spectrogram anyway), I'm just not too familiar with
really using them, because I'm not a mathematician. And so, I've got of
course some questions about it, and personally I think those questions are
specific enough to ask them here, since I think there should be a lot of
people around who understand it.

So, my questions are basically those:
- How can I know (i.e. calculate) that my highest frequency band amplitude
value can be 2^23 ?
- How can I manage to "press" all my amplitude values into linear values
between 0 and 1?

And a new question (related to the second one and maybe answering it):
- ln(x) has its most characteristic change in shape between x=0 and x=2,
but how do I know which range in the x axis is best for re-shaping my
amplitude data? If I use x between 0.1 and 100, around 75% of all the
values will be very similar. If I use x between 0.1 and 5.0, there will be
a very significant curve to my data. So what is the correct range? Is there