comp.dsp | Problems with Sonogram

Hello,

I'm programming an audio application that should display a sonagram. I
convert audio samples by FFT into the frequency data, so far so good.

The problem is that the values now are in a logarithmic scale of course
and I want to transform them to a linear scale. But I don't know how. I've
got 256 colors, so I'd like to have 256 linear steps from 0 to 255, so
maybe it would be best to re-calculate the frequency data into floating
point values from 0 to 1 and then multiply by 255. But to do this, I
guess I'd have to know what the maximum value of my frequency data could
be, but how can I calculate that? Also, does that depend on the sample
rate and bit resolution?

I made a testfile with a sine wave now that has maximum volume. If I
analyze the FFT data to find out what's the highest value I get values
around 3,083,540.0 as the maximum and the same value (only negative) as
the minimum.

Is that really the maximum volume for a frequency? Or do I have to
calculate it somehow, and if so, how?

Thanks
Rock Lobster

Reply by Jerry Avins ● January 15, 2008

Rock Lobster wrote:
> Hello,
> 
> I'm programming an audio application that should display a sonagram. I
> convert audio samples by FFT into the frequency data, so far so good.
> 
> The problem is that the values now are in a logarithmic scale of course

The results of an FFT are presented logarithmically only if your program 
takes their logarithms. Don't do that.

> and I want to transform them to a linear scale. But I don't know how.

If x = log(y), then y = exp(x). I thought that was elementary.

   ...

> I made a testfile with a sine wave now that has maximum volume. If I
> analyze the FFT data to find out what's the highest value I get values
> around 3,083,540.0 as the maximum and the same value (only negative) as
> the minimum.

Your data are evidently signed 32-bit integers. Such values range over 
+/- 2^22.

> Is that really the maximum volume for a frequency? Or do I have to
> calculate it somehow, and if so, how?

You need to apply an appropriate scale factor.

Jerry
-- 
Engineering is the art of making what you want from things you can get.
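
A minimal sketch of where such a scale factor can come from, under stated assumptions: the FFT is unnormalized (no 1/N factor), no window is applied, and the test sine falls exactly on a bin. In that case the largest bin magnitude is amplitude * N / 2, so it depends on the FFT length and the bit depth but not on the sample rate. The helper names dft_magnitudes and full_scale_peak are made up for this illustration, and the slow O(N^2) DFT only stands in for a real FFT.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Naive O(N^2) DFT; returns |X[k]| for k = 0 .. N/2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(abs(acc))
    return mags


def full_scale_peak(n, bits):
    """Largest bin magnitude of a full-scale, bin-centred sine (no window)."""
    amplitude = 2 ** (bits - 1) - 1      # e.g. 32767 for 16-bit audio
    cycles = n // 8                      # whole number of cycles, so no leakage
    sine = [amplitude * math.sin(2 * math.pi * cycles * i / n)
            for i in range(n)]
    return max(dft_magnitudes(sine))


for n in (64, 128):
    # Measured peak next to the predicted amplitude * N / 2
    print(n, round(full_scale_peak(n, bits=16)), 32767 * n // 2)
```

If a window is applied before the FFT, the expected peak shrinks by the window's mean value, so the divisor has to be adjusted accordingly.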

Reply by Jerry Avins ● January 15, 2008

Jerry Avins wrote:

   ...
Typo: 23, not 32.
> Your data are evidently signed 32-bit integers. Such values range over 
> +/- 2^22.

   ...

Jerry
-- 
Engineering is the art of making what you want from things you can get.

Reply by dbell ● January 15, 2008

On Jan 15, 4:46 am, "Rock Lobster" <em...@christian-gleinser.de>
wrote:
> Hello,
>
> I'm programming an audio application that should display a sonagram. I
> convert audio samples by FFT into the frequency data, so far so good.
>
> The problem is that the values now are in a logarithmic scale of course
> and I want to transform them to a linear scale. But I don't know how. I've
> got 256 colors, so I'd like to have 256 linear steps from 0 to 255, so
> maybe it would be best to re-calculate the frequency data into floating
> point values from 0 to 1 and then multiply by 255. But to do this, I
> guess I'd have to know what the maximum value of my frequency data could
> be, but how can I calculate that? Also, does that depend on the sample
> rate and bit resolution?
>
> I made a testfile with a sine wave now that has maximum volume. If I
> analyze the FFT data to find out what's the highest value I get values
> around 3,083,540.0 as the maximum and the same value (only negative) as
> the minimum.
>
> Is that really the maximum volume for a frequency? Or do I have to
> calculate it somehow, and if so, how?
>
> Thanks
> Rock Lobster

Rock Lobster,

I assume you mean a spectrogram for looking at audio, rather than a
sonogram (often used for looking at unborn babies).

Do you really want a linear scale, which would remove a great deal of
observable signal detail, or do you want to convert the logarithmic
scale to the range of 0-255 because that is what you can plot?

A little more info would help.

Dirk
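
The second option, keeping the logarithmic scale and fitting it to 0-255, can be sketched as follows. This is only an illustration: the function name magnitude_to_color, the full_scale argument (the largest magnitude the FFT can produce) and the 60 dB default range are assumptions, not anything stated in the thread.

```python
import math


def magnitude_to_color(magnitude, full_scale, dynamic_range_db=60.0):
    """Map a linear FFT magnitude to a palette index 0..255 on a dB scale."""
    if magnitude <= 0.0:
        return 0                                    # log of 0 is undefined
    db = 20.0 * math.log10(magnitude / full_scale)  # 0 dB means full scale
    # Shift [-dynamic_range_db, 0] to [0, 1] and clamp anything outside it
    fraction = (db + dynamic_range_db) / dynamic_range_db
    fraction = min(1.0, max(0.0, fraction))
    return round(fraction * 255)


print(magnitude_to_color(1.0, 1.0))    # full scale
print(magnitude_to_color(0.1, 1.0))    # 20 dB below full scale
print(magnitude_to_color(0.001, 1.0))  # 60 dB below full scale
```

Everything quieter than the chosen range collapses to color 0, which is the trade-off for making the remaining detail visible.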


Reply by Rune Allnor ● January 15, 2008

On 15 Jan, 10:46, "Rock Lobster" <em...@christian-gleinser.de> wrote:
> Hello,
>
> I'm programming an audio application that should display a sonagram. I
> convert audio samples by FFT into the frequency data, so far so good.
>
> The problem is that the values now are in a logarithmic scale of course
> and I want to transform them to a linear scale.

What FFT are you using? Most implementations produce data
that are linear in scale and amplitude. If your results
are logarithmic, the easiest way to get what you want
would be to switch to a standard FFT.

Rune

Reply by David Lee ● January 16, 2008

dbell wrote...

> I assume you mean a spectrogram for looking at audio, rather than a
> sonogram (often used for looking at unborn babies).

I think he means a Sonogram, as produced by the Kay Sonograph since the 
1950s - long before the invention of medical ultrasound scanning  ;-)

I'm afraid that the term Sonogram is as deeply ensconced in the 
Bioacoustics world as is the name "Hoover" as applied to any vacuum 
cleaner!

Reply by Rock Lobster ● January 21, 2008

Well, indeed I mean a spectrogram from 0 to 20 kHz, but as far as I
know this is called a sonagram, isn't it?

I'm not too good at mathematics, that's one reason why I posted here. I
can't deal too well with logarithms and so on :/

I tried playing around with certain numbers and now I get a fairly good
result, but it's a result of trying, not of calculating. So I'd like to
have a mathematically correct result with correct values. So, the highest
number can be 23 bits long, but why? I don't understand how this range is
connected to my audio data. Can this be calculated or is it always 23
bits?

At the moment I indeed use a similar number (23 bits would be 8388608,
I use 770732.25, which is a result of experimenting). So I take the values
from the FFT, divide them by this number, and then use log() to get them
into logarithmic scale. But I add 4.5 to avoid getting numbers lower than 0
(which isn't always possible, but it works for the most part). My numbers
are now mostly between 0 and 5, and then I multiply those numbers by (255/5) to
fit them into my 256-color-scale. The result is looking nice and almost
the same as in my main audio program I'm using, but as you can see, it's a
result of trying and experimenting, not something "mathematically correct".
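
The recipe above can be written out to see which input range it actually displays. This is a sketch under two assumptions that the post does not confirm: log() is the natural logarithm, and the FFT values are magnitudes. The clamping is added here for out-of-range values; the constant and function names are only illustrative.

```python
import math

REFERENCE = 770732.25   # divisor from the post (found by experiment)
OFFSET = 4.5            # added after taking the natural log
TOP = 5.0               # value that is mapped to color 255


def recipe(value):
    """The mapping described above, with clamping added."""
    if value <= 0.0:
        return 0
    scaled = math.log(value / REFERENCE) + OFFSET
    scaled = min(TOP, max(0.0, scaled))
    return round(scaled * 255.0 / TOP)


floor = REFERENCE * math.exp(-OFFSET)          # input that maps to color 0
ceiling = REFERENCE * math.exp(TOP - OFFSET)   # input that maps to color 255
span_db = 20.0 * math.log10(ceiling / floor)   # displayed dynamic range

print(f"floor   = {floor:.0f}")
print(f"ceiling = {ceiling:.0f}")
print(f"span    = {span_db:.1f} dB")
```

Seen this way, the three experimental numbers amount to two choices: which magnitude counts as the brightest color, and how far below it the display reaches (about 43 dB here, since 5 natural-log units are 5 * 8.686 dB).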

Reply by Rock Lobster ● January 21, 2008

Just as an example, this is what it currently looks like:

http://www.dr-wuro.com/zeha/mm.png

The song is "Rock is Dead" by Marilyn Manson, it's the "God is in the TV"
part (where the arpeggio can be seen very nicely). I think it's already
pretty usable, but not perfect yet. But it looks already pretty close to
sonagrams generated by other audio programs.

Reply by Rune Allnor ● January 21, 2008

On 21 Jan, 10:02, "Rock Lobster" <em...@christian-gleinser.de> wrote:
> Well, indeed I mean a spectrogram from 0 to 20 kHz, but as far as I
> know this is called a sonagram, isn't it?

That's a spectrogram.

> I'm not too good at mathematics, that's one reason why I posted here. I
> can't deal too well with logarithms and so on :/

Then you are in trouble. If you want to implement these things
you need to be familiar with complex numbers. And you need
to be familiar with logarithms and trigonometry before you
start with complex numbers.

> I tried playing around with certain numbers and now I get a fairly good
> result, but it's a result of trying, not of calculating. So I'd like to
> have a mathematically correct result with correct values. So, the highest
> number can be 23 bits long, but why? I don't understand how this range is
> connected to my audio data. Can this be calculated or is it always 23
> bits?
>
> At the moment I indeed use a similar number (23 bits would be 8388608,
> I use 770732.25, which is a result of experimenting). So I take the values
> from the FFT, divide them by this number, and then use log() to get them
> into logarithmic scale. But I add 4.5 to avoid getting numbers lower than 0
> (which isn't always possible, but it works for the most part). My numbers
> are now mostly between 0 and 5, and then I multiply those numbers by (255/5) to
> fit them into my 256-color-scale. The result is looking nice and almost
> the same as in my main audio program I'm using, but as you can see, it's a
> result of trying and experimenting, not something "mathematically correct".

If I understand you correctly, you want a spectrogram which is
logarithmic in both frequency and amplitude. If that's the case,
you can implement the spectrogram as a constant-Q filterbank.
Not particularly difficult to get something quick'n dirty up
and running, but you need to know the basics of filtering and
FFTs. And logarithms.

Rune
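
For reference, the band layout of a constant-Q filterbank can be sketched like this. It only computes center frequencies and bandwidths, not the filters themselves; the function name and the choice of 12 bands per octave in the usage line are assumptions made for the example.

```python
def constant_q_bands(f_min, f_max, bins_per_octave):
    """Return (center_frequency, bandwidth) pairs with a constant ratio Q."""
    ratio = 2.0 ** (1.0 / bins_per_octave)   # spacing between adjacent bands
    q = 1.0 / (ratio - 1.0)                  # Q = center frequency / bandwidth
    bands = []
    f = f_min
    while f <= f_max:
        bands.append((f, f / q))
        f *= ratio
    return bands


for center, width in constant_q_bands(55.0, 20000.0, 12)[:4]:
    print(f"{center:8.2f} Hz  bandwidth {width:6.2f} Hz")
```

Each band is wider than the previous one by the same factor, which is what makes the frequency axis logarithmic.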

Reply by Rock Lobster ● January 21, 2008

>
>If I understand you correctly, you want a spectrogram which is
>logarithmic in both frequency and amplitude. If that's the case,
>you can implement the spectrogram as a constant-Q filterbank.
>Not particularly difficult to get something quick'n dirty up
>and running, but you need to know the basics of filtering and
>FFTs. And logarithms.
>

Well, in Y direction I display the frequency linearly from 0 to 20 kHz,
and that's perfect for me. For the amplitude I've got a linear indexed
color spectrum with colors from 0 to 255. And because of the logarithmic
nature of the amplitudes for each frequency band, I need to scale those
amplitudes into this linear color range, which is why I use ln() to get it
back into linear values.

I do understand the principles of logarithms and I do understand the
principles of what the FFT gives me as a result (otherwise I wouldn't have
managed to program a spectrogram anyway), I'm just not too familiar with
really using them, because I'm not a mathematician. And so, of course, I've
got some questions about it, and personally I think those questions are
specific enough to ask here, since I think there should be a lot of
people around who understand it.

So, my questions are basically those:
- How can I know (i.e. calculate) that my highest frequency band amplitude
value can be 2^23 ?
- How can I manage to "press" all my amplitude values into linear values
between 0 and 1?

And a new question (related to the second one and maybe answering it):
- ln(x) has its most characteristic change in shape between x=0 and x=2,
but how do I know which range in the x axis is best for re-shaping my
amplitude data? If I use x between 0.1 and 100, around 75% of all the
values will be very similar. If I use x between 0.1 and 5.0, there will be
a very significant curve to my data. So what is the correct range? Is there
a correct answer?
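
On the last question, one way to look at it: when mapping through a logarithm, only the ratio between the top and the bottom of the chosen x range matters, and for amplitude data that ratio is a dynamic range in dB. The sketch below uses hypothetical helper names; the two ranges are the ones mentioned in the question.

```python
import math


def normalise_log(value, lowest, highest):
    """Map value in [lowest, highest] to 0..1 on a logarithmic scale."""
    value = min(highest, max(lowest, value))   # clamp into the range
    return math.log(value / lowest) / math.log(highest / lowest)


def range_in_db(lowest, highest):
    """Dynamic range covered by [lowest, highest] for amplitude data."""
    return 20.0 * math.log10(highest / lowest)


print(range_in_db(0.1, 100.0))   # x from 0.1 to 100: ratio 1000
print(range_in_db(0.1, 5.0))     # x from 0.1 to 5: ratio 50
```

So x from 0.1 to 100 shows 60 dB of range and x from 0.1 to 5 shows about 34 dB. Neither is "the" correct one; it is a display choice, much like a contrast setting, and the base of the logarithm cancels out in normalise_log.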
