comp.dsp | Amplitude Compression

I am working on audio, samples are 16 bits, values ranging from -1 to +1
for each sample.  I want to implement what a paper I am reading calls
"amplitude compression."  These are the 3 specifications given for the
amplitude compression I want to implement:

(1) Compression ratio of 8.94:1 for |A| >= -28.6 dB
(2) Compression ratio of 1.73:1 for -46.4 dB < |A| < -28.6 dB
(3) Compression ratio of 1:1.61 for |A| <= -46.4 dB

So my question is, what exactly does this mean?  

Take spec (1), for example. Does this mean that for all values of the
signal >= -28.6dB (=0.00138), we scale down the amplitude of the signal to
about 1/9 of its current value?  

And for (3), we have 1:1.61 (not 1.61:1)...  So does that mean we increase
the amplitude by a factor of 1.61 for those very small sample sizes?

I'm just trying to get a feel for what these specs mean, so I can
implement them on my sound data.

Thanks!
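
A note on the dB figures in the question: the linear value depends on whether the dB number describes an amplitude (20*log10) or a power (10*log10). The small sketch below shows both conversions for the two thresholds; the function names are made up for illustration. Under the amplitude convention -28.6 dB is about 0.0372, while 0.00138 is what the power convention gives.

```python
def db_to_amplitude(db):
    """Convert dB to a linear amplitude ratio (20*log10 convention)."""
    return 10.0 ** (db / 20.0)


def db_to_power(db):
    """Convert dB to a linear power ratio (10*log10 convention)."""
    return 10.0 ** (db / 10.0)


# Print both interpretations for the two thresholds in the spec.
for threshold_db in (-28.6, -46.4):
    print(threshold_db,
          round(db_to_amplitude(threshold_db), 5),
          round(db_to_power(threshold_db), 7))
```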

Reply by John O'Flaherty ● August 5, 2008

On Mon, 04 Aug 2008 21:54:04 -0500, "bogfrog" <aj00mcgraw@gmail.com>
wrote:

>I am working on audio, samples are 16 bits, values ranging from -1 to +1
>for each sample.  I want to implement what a paper I am reading calls
>"amplitude compression."  These are the 3 specifications given for the
>amplitude compression I want to implement:
>
>(1) Compression ratio of 8.94:1 for |A| >= -28.6 dB
>(2) Compression ratio of 1.73:1 for -46.4 dB < |A| < -28.6 dB
>(3) Compression ratio of 1:1.61 for |A| <= -46.4 dB
>
>So my question is, what exactly does this mean?  
>
>Take spec (1), for example. Does this mean that for all values of the
>signal >= -28.6dB (=0.00138), we scale down the amplitude of the signal to
>about 1/9 of its current value?  
>
>And for (3), we have 1:1.61 (not 1.61:1)...  So does that mean we increase
>the amplitude by a factor of 1.61 for those very small sample sizes?
>
>I'm just trying to get a feel for what these specs mean, so I can
>implement them on my sound data.

It seems that if it was meant the way you interpret it, then for some
signals, there would be no way to know which class they came from. For
example, if a compressed sample is 7e-4, was it originally -22 db
(6e-3), or -29 db (1e-3)?
-- 
John

Reply by bogfrog ● August 5, 2008

>It seems that if it was meant the way you interpret it, then for some
>signals, there would be no way to know which class they came from. For
>example, if a compressed sample is 7e-4, was it originally -22 db
>(6e-3), or -29 db (1e-3)?

Yes, that is true.

But the application does not intend to decompress the samples.  It is
trying to simulate signal degradations, so what you've pointed out would
not be a problem if my interpretation were correct.  

With that in mind, do you think my interpretation is correct?  How would
you interpret it?

Reply by John O'Flaherty ● August 5, 2008

On Tue, 05 Aug 2008 00:42:53 -0500, "bogfrog" <aj00mcgraw@gmail.com>
wrote:

>>It seems that if it was meant the way you interpret it, then for some
>>signals, there would be no way to know which class they came from. For
>>example, if a compressed sample is 7e-4, was it originally -22 db
>>(6e-3), or -29 db (1e-3)?
>
>Yes, that is true.
>
>But the application does not intend to decompress the samples.  It is
>trying to simulate signal degradations, so what you've pointed out would
>not be a problem if my interpretation were correct.  
>
>With that in mind, do you think my interpretation is correct?  How would
>you interpret it?

 Maybe it means that the amount of the signal over the threshold is
treated differently than the amount under the threshold. Taking a
simpler case, if you had no compression up to 0 dB, and then had a
compression ratio of 10:1 above 0 dB, it might mean that a sample at 1
dB (1.26 V) would be compressed as 1 + .26/10 = 1.026 V. Then the
output would still be a reversible function of the input for all
values. 
 Doesn't the paper describing this compression give a mathematical
specification of what is meant?
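
The simpler case above (no change up to a threshold, then 10:1 on the excess) can be sketched per sample as follows. This is only an illustration of that reading, working directly on linear values; the function name and defaults are hypothetical.

```python
def compress_above_threshold(x, threshold=1.0, ratio=10.0):
    """Leave |x| <= threshold unchanged; divide the excess above it by `ratio`."""
    magnitude = abs(x)
    if magnitude <= threshold:
        return x
    compressed = threshold + (magnitude - threshold) / ratio
    # Restore the sign of the input sample.
    return compressed if x >= 0 else -compressed


# The 1.26 V sample from the example: 1 + 0.26 / 10
print(compress_above_threshold(1.26))
```

Because the mapping is strictly increasing, each output value corresponds to exactly one input value.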

-- 
John

Reply by mboigner ● August 5, 2008

Hello,

What's missing is a reference point. For example, your output at one of
your threshold points t1_input=-46.4dB (<- which reference?) is
t1_output=-46.4dB, which would mean 0dB gain.
With that you can calculate your output powers.
For example the output at your other threshold point t2_input = -28.6dB:
t2_output = (-28.6+46.4)/1.73 -46.4 = -36.1110.
When you put t1 and t2 in a graph and draw a line between them you have
your behaviour in that region (you called it (2)). There the gain decreases
with increasing input (=compression)
In your region (3) you have expansion (increasing gain with increasing
input power). Calculate a point there: t3_input = t1_input - 10 = -56.4;
t3_output = -46.4 + (-10) * 1.61 = -62.5.
Draw a line from t1 through t3. There you see the behaviour in the
expansion case.
Last but not least your region (1) which has stronger compression than
(2).
Calculate a t4_input = -28.6+10 = -18.6; t4_output = t2_output + 10/8.94 =
-36.1110 + 1.1186 = -34.9924. Draw again a line from t2 to t4 -> strong
compression finished.

Now you have the shape of your IO curve - You will see that it is
one-to-one mapped input to output and reversible.
If you have different gains at a certain input point you only have to move
this curve up or down in y direction (add/subtract an offset).
If you have plotted the curve described above you will have at input -46.4
an output of -46.4
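
The curve described above can be sketched as a piecewise-linear function in the dB domain. This is a rough illustration only: the anchor (0 dB gain at -46.4 dB) is the assumption made in this post, not something the paper states, and the segment above -28.6 dB is started from the output at -28.6 dB so that the curve stays continuous. All names are hypothetical.

```python
T_LOW, T_HIGH = -46.4, -28.6   # thresholds in dB, from the spec
SLOPE_LOW = 1.61               # 1:1.61 (expansion) below T_LOW
SLOPE_MID = 1.0 / 1.73         # 1.73:1 between the thresholds
SLOPE_HIGH = 1.0 / 8.94        # 8.94:1 above T_HIGH
ANCHOR_OUT = -46.4             # assumed: output equals input at T_LOW


def output_db(input_db):
    """Map an input level in dB to an output level in dB."""
    if input_db <= T_LOW:
        return ANCHOR_OUT + (input_db - T_LOW) * SLOPE_LOW
    if input_db < T_HIGH:
        return ANCHOR_OUT + (input_db - T_LOW) * SLOPE_MID
    # Continue from the output reached at the upper threshold.
    out_at_high = ANCHOR_OUT + (T_HIGH - T_LOW) * SLOPE_MID
    return out_at_high + (input_db - T_HIGH) * SLOPE_HIGH


for level in (-56.4, -46.4, -28.6, -18.6):
    print(level, round(output_db(level), 4))
```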

Hope that helps,
markus

www.two-pi.com

Reply by bogfrog ● August 5, 2008

> Doesn't the paper describing this compression give a mathematical
>specification of what is meant?


Nope, the 3 conditions I listed are all that it gives.  It's just one out
of a list of different signal degradations, simulated to test for
robustness.

In fact, if you want, you can take a look at the paper here:

http://www.cs.northwestern.edu/~pardo/courses/eecs352/papers/audio%20fingerprint%20-%20haitsma.pdf

Take a look at the beginning of section 4.4 (page 5 of the .pdf).

Reply by Richard Dobson ● August 5, 2008

bogfrog wrote:
> I am working on audio, samples are 16 bits, values ranging from -1 to +1
> for each sample.  I want to implement what a paper I am reading calls
> "amplitude compression."  These are the 3 specifications given for the
> amplitude compression I want to implement:
> 
> (1) Compression ratio of 8.94:1 for |A| >= -28.6 dB
> (2) Compression ratio of 1.73:1 for -46.4 dB < |A| < -28.6 dB
> (3) Compression ratio of 1:1.61 for |A| <= -46.4 dB
> 
> So my question is, what exactly does this mean?  
> 

See, for example, http://en.wikipedia.org/wiki/Audio_level_compression

Note that compression of this kind is not applied sample by sample, but 
with respect to the detected overall amplitude envelope, using a window 
that might be 5-15msecs long, or much longer (e.g. up to 300msecs for an 
rms tracker or simple AGC). The task is to reduce the overall dynamic 
range of the signal by passing lower-level sounds more-or-less 
unchanged, while reducing higher-level signals pro rata - like very 
rapid "fader riding" on a mixing desk.  Audio compressors have 'attack' 
and 'release' parameters, which determine, for example, how quickly the 
compressor acts on a new transient (drum, guitar pluck, etc), and how 
quickly the level recovers when the input falls below the threshold. A 
delay may be applied to the input signal path so that a hard attack 
transient can be acted on as a whole at the outset. A simple example in 
broadcast is the "ducker", which drops the level of a music track 
automatically when someone speaks over it.

In short: signal->envelope_detector->level_control->output.
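
A minimal sketch of that chain, assuming a one-pole peak envelope follower and a single threshold. The threshold and ratio are borrowed from spec (1) purely as example values; the attack/release times and all names are assumptions, not anything given in the paper.

```python
import math


def compress_with_envelope(samples, sample_rate, threshold_db=-28.6,
                           ratio=8.94, attack_ms=10.0, release_ms=100.0):
    """Envelope-driven compressor: gain follows the tracked level, not each sample."""
    # One-pole smoothing coefficients derived from the time constants.
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    threshold = 10.0 ** (threshold_db / 20.0)
    envelope = 0.0
    out = []
    for x in samples:
        level = abs(x)
        # Rise with the attack time, fall with the release time.
        coeff = attack if level > envelope else release
        envelope = coeff * envelope + (1.0 - coeff) * level
        if envelope > threshold:
            # In dB: out = T + (env - T) / ratio, so gain = (env - T) * (1/ratio - 1).
            gain = (envelope / threshold) ** (1.0 / ratio - 1.0)
        else:
            gain = 1.0
        out.append(x * gain)
    return out
```

With a constant loud input, the first samples pass unchanged until the envelope has risen past the threshold, which is the attack behaviour described above.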

A complementary effect is the expander/gate, which reduces the level of 
quiet material, primarily to remove underlying system noise in the gaps 
between sound events.

Such a process applied sample by sample is what computer musicians call 
"waveshaping", where (typically) an input sinusoid is warped by a 
transfer function into some completely other periodic shape.

This is all distinct from audio compression by a-law, mu-law etc, which 
is used per sample to obtain a greater dynamic range from a small sample 
wordsize - e.g. 8 bits (associated mainly with file formats, for which 
a-law and mu-law standards are defined). Not really required for 16bit 
and beyond.
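
For comparison, the continuous mu-law companding curve can be sketched like this, with mu = 255 as used in telephony. This shows only the per-sample curve on values in [-1, 1]; the quantisation to 8 bits and the segmented encoding of the actual standard are left out, and the function names are hypothetical.

```python
import math

MU = 255.0


def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] through the mu-law curve (still a float, not quantised)."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


# Small values are boosted relative to large ones before quantisation.
for x in (0.01, 0.1, 0.5, 1.0):
    print(x, round(mu_law_compress(x), 4))
```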

Lots more to it, of course; may well be worth asking on the musicdsp 
list - they also have a code archive.

Richard Dobson

Reply by bogfrog ● August 5, 2008

>Hello,
>
>What's missing is a reference point. 

Thank you for the reply.  I'm not sure I follow, though.  Let me comment
on some of what you wrote:

>For example your output @ one of
>your threshold points t1_input=-46.4dB (<- which reference?) is
>t1_output=-46.4dB, which would mean 0dB gain.

I'm not sure I understand this.  At -46.4dB the ratio is 1.73:1, so for
there to be a 0dB gain, I compute the following:

1.73:1 ratio => 1/1.73 fraction = -2.38 dB

So we want:   -46.4dB -2.38dB + REF = 0dB  =>   REF = 48.78dB

Is this what you had in mind for the reference point?

>With that you can calculate your output powers.
>For example the output at your other threshold point t2_input = -28.6dB:
>t2_output = (-28.6+46.4)/1.73 -46.4 = -36.1110.

At this point I'm pretty much lost.  I don't see how or why you are
dividing -28.6+46.4 by 1.73.  1/1.73 is a fraction, so I don't understand
how you can mix it with the decibel values.

I'm confused, but I'll read your post again in the morning, and hopefully
it will make better sense. :)

Reply by John O'Flaherty ● August 5, 2008

On Tue, 05 Aug 2008 03:20:25 -0500, "bogfrog" <aj00mcgraw@gmail.com>
wrote:

>
>> Doesn't the paper describing this compression give a mathematical
>>specification of what is meant?
>
>
>Nope, the 3 conditions I listed are all that it gives.  It's just one out
>of a list of different signal degradations, simulated to test for
>robustness.
>
>In fact, if you want, you can take a look at the paper here:
>
>http://www.cs.northwestern.edu/~pardo/courses/eecs352/papers/audio%20fingerprint%20-%20haitsma.pdf
>
>Take a look at the beginning of section 4.4 (page 5 of the .pdf).

 I don't think the interpretation you gave is that intended, because
it is such an unlikely thing to happen to a signal as a natural
degradation. That leaves either instantaneous compression as I
described, or intentional gain compression as described in Richard
Dobson's post. I think what is intended was more likely the
instantaneous version, because no attack/decay parameters are given
for a gain shift, and the paper seems to be trying to give a complete
description of what was done. I would also reject a dB interpretation
of the numbers in the table, since they are expressed as ratios, and
dB are already ratios.
 In my opinion, they intend a gain curve that rises from a base value
of 1.61 at -oo, changes slope to 1/1.73 = 0.578 at -46.4 dB, and
changes slope again to 1/8.94 = 0.111 at -28.6 dB. That would mean
calculating the net gain by seeing where you are on the curve, as I
described in my post.
For example,
 The threshold points in voltage are 
-46.4 dB : 0.00478 V
-28.6 dB : 0.03715 V

For an input sample at -20 dB, the voltage would be +/- 0.1 V. The
degraded sample would have an instantaneous amplitude of 
(.00478 * 1.61) + (0.03715 - .00478) * 0.578 + (0.1 - .03715) * 0.111
= +/- 0.03338 V.
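
As a rough sketch of this instantaneous reading, taking the slopes as described in the text (1.61 below -46.4 dB, 1/1.73 up to -28.6 dB, 1/8.94 above) and applying them to linear sample values. The function name is made up, and this is one possible interpretation, not the paper's confirmed method.

```python
T_LOW = 10.0 ** (-46.4 / 20.0)    # about 0.00479
T_HIGH = 10.0 ** (-28.6 / 20.0)   # about 0.03715
GAIN_LOW = 1.61                   # slope below T_LOW
SLOPE_MID = 1.0 / 1.73            # slope between the thresholds
SLOPE_HIGH = 1.0 / 8.94           # slope above T_HIGH


def degrade_sample(x):
    """Apply the three-segment curve to one sample, preserving its sign."""
    magnitude = abs(x)
    if magnitude <= T_LOW:
        out = magnitude * GAIN_LOW
    elif magnitude <= T_HIGH:
        out = T_LOW * GAIN_LOW + (magnitude - T_LOW) * SLOPE_MID
    else:
        out = (T_LOW * GAIN_LOW
               + (T_HIGH - T_LOW) * SLOPE_MID
               + (magnitude - T_HIGH) * SLOPE_HIGH)
    return out if x >= 0 else -out


print(degrade_sample(0.1))   # the -20 dB example sample
```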

That's my opinion, at any rate. I see that the authors' email
addresses are in the paper. Since the paper itself is in English,
though they appear to be Dutch, you might consider emailing them to
ask exactly what they meant.
-- 
John

Reply by mboigner ● August 5, 2008

>
>Is this what you had in mind for the reference point?

No. I will try to explain again:

1) What is the reference (in the paper) when they speak about dB values?
dB is always relative; for example, if you speak of the gain of a system,
your reference is the input of the system. If we speak of dBmicrovolt
(dBuV), the reference is 1 uV. I think that they mean dB full scale, which
means that their reference is 1 (= full scale). So when they speak of
-46.4 dB, that is 0.0048 linear (if rms).

2) I had a short look through the paper and saw that a reference point IS
missing. To plot your IO curve you need to know, at exactly ONE input power,
the output power your system delivers (or you know the gain, which is
output power minus input power, both in dB).

>>With that you can calculate your output powers.
>>For example the output at your other threshold point t2_input = -28.6dB:
>>t2_output = (-28.6+46.4)/1.73 -46.4 = -36.1110.
>
>At this point I'm pretty much lost.  I don't see how or why you are
>dividing -28.6+46.4 by 1.73.  1/1.73 is a fraction, so I don't understand
>how you can mix it with the decibel values.

3) About the ratios:
The gradient k = (output power 2 - output power 1) /
(input power 2 - input power 1), with all powers in dB and
input power 2 > input power 1. A compression ratio of R:1 means k = 1/R,
and an expansion ratio of 1:R means k = R.
So if a compression or expansion ratio is given for a certain region of your
input power, the gradient k of the straight line y = k * x + d is given!

In the article your k is given for the different regions of your IO curve but
NOT the d (which is the offset). As the two straight lines to the left and
right of each threshold have to be connected at the threshold point, you need
a d for only one of the 3 regions, but this is not given.
So my bad example defined some d for one of the two threshold points.
Maybe that was somewhat confusing.

I think, but that's really a guess, that since the maximum amplitude is 1 you
could choose output = 0dB(fullscale) for input = 0dB(fullscale). I
think that would make sense, but you have to decide if that is really
useful for your problem.
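
To make the role of d concrete, here is a small sketch that derives the offset of each region from a single anchor point, assuming the guess above (input 0 dB full scale maps to output 0 dB full scale) and that the lines join at the thresholds. The anchor is assumed to lie in the top region (input >= -28.6 dB); the names are hypothetical.

```python
T_LOW, T_HIGH = -46.4, -28.6                  # thresholds in dB
K_LOW, K_MID, K_HIGH = 1.61, 1.0 / 1.73, 1.0 / 8.94   # gradients per region


def region_offsets(anchor_in_db=0.0, anchor_out_db=0.0):
    """Return (d_low, d_mid, d_high) so that y = k * x + d is continuous."""
    # The anchor fixes the offset of the top region.
    d_high = anchor_out_db - K_HIGH * anchor_in_db
    # Continuity at T_HIGH: K_HIGH * T + d_high == K_MID * T + d_mid
    d_mid = d_high + (K_HIGH - K_MID) * T_HIGH
    # Continuity at T_LOW: K_MID * T + d_mid == K_LOW * T + d_low
    d_low = d_mid + (K_MID - K_LOW) * T_LOW
    return d_low, d_mid, d_high


d_low, d_mid, d_high = region_offsets()
print(round(d_low, 4), round(d_mid, 4), round(d_high, 4))
```

Choosing a different anchor only shifts all three offsets by the same amount, which is the "move the curve up or down" described earlier in the thread.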

Please also note Richard's post about the estimation of the power levels.
If this is meant to be an audio compressor, you would need this.

I hope this post helps more,

Regards,
Markus

www.two-pi.com
