I am working on audio; samples are 16 bits, with values ranging from -1 to +1. I want to implement what a paper I am reading calls "amplitude compression." These are the 3 specifications given for the amplitude compression I want to implement:

(1) Compression ratio of 8.94:1 for |A| >= -28.6 dB
(2) Compression ratio of 1.73:1 for -46.4 dB < |A| < -28.6 dB
(3) Compression ratio of 1:1.61 for |A| <= -46.4 dB

So my question is, what exactly does this mean?

Take spec (1), for example. Does this mean that for all values of the signal >= -28.6 dB (=0.00138), we scale down the amplitude of the signal to about 1/9 of its current value?

And for (3), we have 1:1.61 (not 1.61:1)... So does that mean we increase the amplitude by a factor of 1.61 for those very small sample values?

I'm just trying to get a feel for what these specs mean, so I can implement them on my sound data. Thanks!
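For reference, a level in dB converts to a linear amplitude (a sample value relative to full scale 1.0) with 10^(dB/20), while 10^(dB/10) gives a power ratio. The small sketch below shows both; the 0.00138 quoted for -28.6 dB is the power ratio, whereas the corresponding sample amplitude is about 0.0372. The function names are illustrative only.

```python
def db_to_amplitude(level_db: float) -> float:
    """Linear amplitude (relative to full scale 1.0) for a level in dB."""
    return 10.0 ** (level_db / 20.0)


def db_to_power_ratio(level_db: float) -> float:
    """Power ratio for a level in dB."""
    return 10.0 ** (level_db / 10.0)


if __name__ == "__main__":
    for level in (-28.6, -46.4):
        print(f"{level:6.1f} dB -> amplitude {db_to_amplitude(level):.5f}, "
              f"power ratio {db_to_power_ratio(level):.5f}")
```

Since the samples are amplitudes, the 20·log10 convention is the one that matches the -1 to +1 sample values.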

# Amplitude Compression

Started on August 4, 2008

Reply on August 5, 2008

On Mon, 04 Aug 2008 21:54:04 -0500, "bogfrog" <aj00mcgraw@gmail.com> wrote:

>Take spec (1), for example. Does this mean that for all values of the
>signal >= -28.6dB (=0.00138), we scale down the amplitude of the signal to
>about 1/9 of its current value?
>
>And for (3), we have 1:1.61 (not 1.61:1)... So does that mean we increase
>the amplitude by a factor of 1.61 for those very small sample sizes?

It seems that if it was meant the way you interpret it, then for some signals there would be no way to know which class they came from. For example, if a compressed sample is 7e-4, was it originally -22 dB (6e-3) or -29 dB (1e-3)?

-- John

Reply on August 5, 2008

>It seems that if it was meant the way you interpret it, then for some
>signals there would be no way to know which class they came from. For
>example, if a compressed sample is 7e-4, was it originally -22 dB
>(6e-3) or -29 dB (1e-3)?

Yes, that is true.

But the application does not intend to decompress the samples. It is trying to simulate signal degradations, so what you've pointed out would not be a problem if my interpretation were correct.

With that in mind, do you think my interpretation is correct? How would you interpret it?

Reply on August 5, 2008

On Tue, 05 Aug 2008 00:42:53 -0500, "bogfrog" <aj00mcgraw@gmail.com> wrote:

>With that in mind, do you think my interpretation is correct? How would
>you interpret it?

Maybe it means that the part of the signal above the threshold is treated differently from the part below it. Taking a simpler case: if you had no compression up to 0 dB and a compression ratio of 10:1 above 0 dB, a sample at 1 dB (1.122 V) might be compressed to 1 + 0.122/10 = 1.0122 V. The output would then still be a reversible function of the input for all values.

Doesn't the paper describing this compression give a mathematical specification of what is meant?

-- John
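John's suggestion, that only the part of a sample above the threshold is divided by the ratio, can be sketched per sample as below. This is a minimal illustration of his simpler "unity up to 1.0, 10:1 above" case, not the paper's method; the function names and default values are hypothetical. Using the amplitude convention, 1 dB above a 1.0 V threshold is 10^(1/20), about 1.122 V. The inverse function is included to show why this kind of mapping stays reversible.

```python
import math


def compress_excess(x: float, threshold: float = 1.0, ratio: float = 10.0) -> float:
    """Unity gain up to `threshold`; above it, only the excess is divided by `ratio`."""
    mag = abs(x)
    if mag <= threshold:
        return x
    return math.copysign(threshold + (mag - threshold) / ratio, x)


def expand_excess(y: float, threshold: float = 1.0, ratio: float = 10.0) -> float:
    """Inverse of compress_excess: multiply the excess back up by `ratio`."""
    mag = abs(y)
    if mag <= threshold:
        return y
    return math.copysign(threshold + (mag - threshold) * ratio, y)


if __name__ == "__main__":
    x = 10 ** (1 / 20)  # 1 dB above a 1.0 V threshold, about 1.122 V
    y = compress_excess(x)
    print(f"{x:.4f} V -> {y:.4f} V -> back to {expand_excess(y):.4f} V")
```

Because every input maps to exactly one output and the curve never folds back, the original sample can always be recovered, which avoids the ambiguity raised earlier in the thread.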

Reply on August 5, 2008

Hello,

What's missing is a reference point. For example, suppose your output at one of your threshold points, t1_input = -46.4 dB (<- relative to which reference?), is t1_output = -46.4 dB, which would mean 0 dB gain there. With that you can calculate your output powers.

For example, the output at your other threshold point t2_input = -28.6 dB is:
t2_output = (-28.6 + 46.4)/1.73 - 46.4 = -36.1110.
When you put t1 and t2 in a graph and draw a line between them, you have your behaviour in that region (the one you called (2)). There the gain decreases with increasing input (= compression).

In your region (3) you have expansion (increasing gain with increasing input power). Calculate a point there: t3_input = t1_input - 10 = -56.4; t3_output = -46.4 + (-10) * 1.61 = -62.5. Draw a line from t1 through t3. There you see your behaviour in the expansion case.

Last but not least, your region (1) has stronger compression than (2). Calculate t4_input = -28.6 + 10 = -18.6; t4_output = t2_output + 10/8.94 = -36.1110 + 1.1186 = -34.9924. Draw again a line from t2 to t4 -> strong compression finished.

Now you have the shape of your IO curve. You will see that it maps input to output one-to-one, so it is reversible. If you want a different gain at a certain input point, you only have to move this curve up or down in the y direction (add/subtract an offset). With the curve described above, an input of -46.4 dB gives an output of -46.4 dB.

Hope that helps,
markus
www.two-pi.com
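Markus's construction (straight lines in the dB domain, joined at the two thresholds) can be sketched as below. The reference point is the same assumption he makes, 0 dB gain at the lower threshold; the constant and function names are illustrative.

```python
# Straight lines in the dB domain, joined at the thresholds.
T_LOW_DB = -46.4    # boundary between regions (3) and (2)
T_HIGH_DB = -28.6   # boundary between regions (2) and (1)
SLOPE_3 = 1.61      # region (3), 1:1.61 expansion: 1.61 dB out per dB in
SLOPE_2 = 1 / 1.73  # region (2), 1.73:1 compression
SLOPE_1 = 1 / 8.94  # region (1), 8.94:1 compression

# Assumed reference point: 0 dB gain at the lower threshold (t1_output = t1_input).
OUT_AT_LOW = T_LOW_DB
OUT_AT_HIGH = OUT_AT_LOW + (T_HIGH_DB - T_LOW_DB) * SLOPE_2


def io_curve_db(in_db: float) -> float:
    """Output level in dB for an input level in dB."""
    if in_db <= T_LOW_DB:
        return OUT_AT_LOW + (in_db - T_LOW_DB) * SLOPE_3
    if in_db < T_HIGH_DB:
        return OUT_AT_LOW + (in_db - T_LOW_DB) * SLOPE_2
    return OUT_AT_HIGH + (in_db - T_HIGH_DB) * SLOPE_1


if __name__ == "__main__":
    for name, level in (("t3", -56.4), ("t1", -46.4), ("t2", -28.6), ("t4", -18.6)):
        print(f"{name}: input {level:6.1f} dB -> output {io_curve_db(level):8.4f} dB")
```

With this anchor, t2 comes out at about -36.11 dB, and t4, taken on the region (1) line that starts from t2, at about -34.99 dB. Choosing a different reference point only shifts the whole curve up or down.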

Reply on August 5, 2008

>Doesn't the paper describing this compression give a mathematical
>specification of what is meant?

Nope, the 3 conditions I listed are all that it gives. It's just one of a list of signal degradations simulated to test for robustness.

In fact, if you want, you can take a look at the paper here:

http://www.cs.northwestern.edu/~pardo/courses/eecs352/papers/audio%20fingerprint%20-%20haitsma.pdf

Take a look at the beginning of section 4.4 (page 5 of the .pdf).

Reply on August 5, 2008

bogfrog wrote:
> (1) Compression ratio of 8.94:1 for |A| >= -28.6 dB
> (2) Compression ratio of 1.73:1 for -46.4 dB < |A| < -28.6 dB
> (3) Compression ratio of 1:1.61 for |A| <= -46.4 dB
>
> So my question is, what exactly does this mean?

See, for example, http://en.wikipedia.org/wiki/Audio_level_compression

Note that compression of this kind is not applied sample by sample, but with respect to the detected overall amplitude envelope, using a window that might be 5-15 ms long, or much longer (e.g. up to 300 ms for an RMS tracker or simple AGC). The task is to reduce the overall dynamic range of the signal by passing lower-level sounds more or less unchanged while reducing higher-level signals pro rata, like very rapid "fader riding" on a mixing desk.

Audio compressors have "attack" and "release" parameters, which determine, for example, how quickly the compressor acts on a new transient (drum, guitar pluck, etc.) and how quickly the level recovers when the input falls below the threshold. A delay may be applied to the input signal path so that a hard attack transient can be acted on as a whole at the outset. A simple example in broadcast is the "ducker", which drops the level of a music track automatically when someone speaks over it. In short: signal -> envelope_detector -> level_control -> output.

A complementary effect is the expander/gate, which reduces the level of quiet material, primarily to remove underlying system noise in the gaps between sound events.

Such a process applied sample by sample is what computer musicians call "waveshaping", where (typically) an input sinusoid is warped by a transfer function into some completely different periodic shape.

This is all distinct from audio compression by A-law, mu-law, etc., which is used per sample to obtain a greater dynamic range from a small sample word size, e.g. 8 bits (associated mainly with file formats, for which A-law and mu-law standards are defined). It is not really required for 16 bits and beyond.

Lots more to it, of course; it may well be worth asking on the musicdsp list, which also has a code archive.

Richard Dobson
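Richard's signal -> envelope_detector -> level_control -> output chain can be sketched as a basic feed-forward compressor: a one-pole peak envelope follower with separate attack and release times, a static gain computer, and a gain multiplier. The threshold and ratio defaults reuse spec (1) from the question purely for illustration; the attack/release times, the peak (rather than RMS) detector, and the function name are assumptions, not anything taken from the paper.

```python
import math


def envelope_compressor(samples, fs, threshold_db=-28.6, ratio=8.94,
                        attack_ms=5.0, release_ms=100.0):
    """signal -> envelope detector -> gain computer -> level control -> output."""
    # One-pole smoothing coefficients for a rising (attack) and falling (release) envelope
    att = math.exp(-1.0 / (fs * attack_ms / 1000.0))
    rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    env = 0.0
    out = []
    for x in samples:
        level = abs(x)
        coeff = att if level > env else rel
        env = coeff * env + (1.0 - coeff) * level
        env_db = 20.0 * math.log10(max(env, 1e-12))  # guard against log10(0)
        over_db = env_db - threshold_db
        # Above threshold the output level rises only 1/ratio dB per input dB
        gain_db = -over_db * (1.0 - 1.0 / ratio) if over_db > 0.0 else 0.0
        out.append(x * 10.0 ** (gain_db / 20.0))
    return out


if __name__ == "__main__":
    fs = 48000
    loud = [0.5 * math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs // 2)]
    squashed = envelope_compressor(loud, fs)
    print(f"input peak {max(map(abs, loud)):.3f}, "
          f"output peak (last 0.1 s) {max(map(abs, squashed[-fs // 10:])):.3f}")
```

The gain here follows the smoothed envelope rather than each individual sample, which is the distinction drawn above between level compression and per-sample waveshaping; quiet material whose envelope stays below the threshold passes through unchanged.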

Reply on August 5, 2008

>Hello,
>
>What's missing is a reference point.

Thank you for the reply. I'm not sure I follow, though. Let me comment on some of what you wrote:

>For example your output @ one of
>your threshold points t1_input=-46.4dB (<- which reference?) is
>t1_output=-46.4dB, which would mean 0dB gain.

I'm not sure I understand this. At -46.4 dB the ratio is 1.73:1, so for there to be a 0 dB gain, I compute the following:

1.73:1 ratio => 1/1.73 fraction = -2.38 dB

So we want:

-46.4 dB - 2.38 dB + REF = 0 dB => REF = 48.78 dB

Is this what you had in mind for the reference point?

>With that you can calculate your output powers.
>For example the output at your other threshold point t2_input = -28.6dB:
>t2_output = (-28.6+46.4)/1.73 -46.4 = -36.1110.

At this point I'm pretty much lost. I don't see how or why you are dividing -28.6+46.4 by 1.73. 1/1.73 is a fraction, so I don't understand how you can mix it with the decibel values.

I'm confused, but I'll read your post again in the morning, and hopefully it will make better sense. :)

Reply on August 5, 2008

On Tue, 05 Aug 2008 03:20:25 -0500, "bogfrog" <aj00mcgraw@gmail.com> wrote:

>Nope, the 3 conditions I listed are all that it gives. It's just one out
>of a list of different signal degradations, simulated to test for
>robustness.
>
>Take a look at the beginning of section 4.4 (page 5 of the .pdf).

I don't think the interpretation you gave is the one intended, because it is such an unlikely thing to happen to a signal as a natural degradation. That leaves either instantaneous compression as I described, or intentional gain compression as described in Richard Dobson's post. I think the instantaneous version is more likely what is intended, because no attack/decay parameters are given for a gain shift, and the paper seems to be trying to give a complete description of what was done.

I would also reject a dB interpretation of the numbers in the table, since they are expressed as ratios, and dB are already ratios. In my opinion, they intend an input/output curve whose slope is 1.61 from -∞ dB up to -46.4 dB, changes to 1/1.73 = 0.578 at -46.4 dB, and changes again to 1/8.94 = 0.112 at -28.6 dB. That would mean calculating the net gain by seeing where you are on the curve, as I described in my earlier post.

For example, the threshold points in voltage are:

-46.4 dB : 0.00478 V
-28.6 dB : 0.03715 V

For an input sample at -20 dB, the voltage would be +/- 0.1 V. The degraded sample would have an instantaneous amplitude of

(0.00478 * 1.61) + (0.03715 - 0.00478) * 0.578 + (0.1 - 0.03715) * 0.112 = +/- 0.0334 V.

That's my opinion, at any rate. I see that the authors' email addresses are in the paper. Since the paper itself is in English, though they appear to be Dutch, you might consider emailing them to ask exactly what they meant.

-- John
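John's reading, applied instantaneously to each sample's magnitude, can be sketched as below. The slopes follow his interpretation (1.61 below -46.4 dB, 1/1.73 between the thresholds, 1/8.94 above -28.6 dB), with the thresholds converted to amplitude by 10^(dB/20); the function name is hypothetical, and this remains one possible reading of the paper rather than a confirmed one.

```python
import math

T1 = 10 ** (-46.4 / 20)  # lower threshold as a linear amplitude, about 0.00479
T2 = 10 ** (-28.6 / 20)  # upper threshold as a linear amplitude, about 0.03715


def degrade_sample(x: float) -> float:
    """Piecewise-linear curve in the amplitude domain: slope 1.61, then 1/1.73, then 1/8.94."""
    mag = abs(x)
    if mag <= T1:
        out = 1.61 * mag
    elif mag <= T2:
        out = 1.61 * T1 + (mag - T1) / 1.73
    else:
        out = 1.61 * T1 + (T2 - T1) / 1.73 + (mag - T2) / 8.94
    return math.copysign(out, x)


if __name__ == "__main__":
    for x in (0.1, -0.1, 1.0):
        print(f"{x:+.3f} -> {degrade_sample(x):+.5f}")
```

This reproduces the worked example: a sample of 0.1 (-20 dB) comes out at about 0.0334. With this reading a full-scale input of 1.0 comes out at roughly 0.134, so the degraded signal is also much quieter overall; the curve is monotonic, so it stays invertible.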

Reply on August 5, 2008

>Is this what you had in mind for the reference point?

No. I will try to explain again:

1) What is the reference (in the paper) when they speak about dB values? dB is always relative. For example, if you speak of the gain of a system, your reference is the input of the system. If we speak of dBµV, the reference is 1 µV. I think they mean dB full scale (dBFS), which means their reference is 1 (= full scale). So when they speak of -46.4 dB, that is 0.0048 linear (if RMS).

2) I had a short look through the paper and saw that a reference point IS missing. To plot your IO curve you need to know the output power your system delivers at exactly ONE input power (or the gain there, which is output power minus input power, in dB).

>>With that you can calculate your output powers.
>>For example the output at your other threshold point t2_input = -28.6dB:
>>t2_output = (-28.6+46.4)/1.73 -46.4 = -36.1110.
>
>At this point I'm pretty much lost. I don't see how or why you are
>dividing -28.6+46.4 by 1.73. 1/1.73 is a fraction, so I don't understand
>how you can mix it with the decibel values.

3) To the ratios: with all levels in dB, a compression or expansion ratio of R:1 means (input level 2 - input level 1) / (output level 2 - output level 1) = R, where input level 2 > input level 1. So if a compression or expansion ratio is given for a certain region of input power, the gradient k = 1/R of the straight line y = k * x + d (x and y in dB) is given! That is why the dB difference is divided by 1.73: it gives the slope of the line, not a factor applied to the samples. For 1:1.61, k = 1.61.

In the article, k is given for the different regions of your IO curve, but NOT d (the offset). Since the straight lines of neighbouring regions have to meet at each threshold point, you need d for only one of the 3 regions, but even this is not given. In my earlier example I simply chose d so that the output equals the input at one of the two threshold points. Maybe that was somewhat confusing.

I think (but that's really a guess) that, as the maximum amplitude is 1, you could choose output = 0 dBFS for input = 0 dBFS. I think that would make sense, but you have to decide whether that is really useful for your problem.

Please also note Richard's post about estimating the power levels. If this is meant to be an audio compressor, you would need that.

I hope this post helps more,
Regards,
Markus
www.two-pi.com
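Markus's slope-and-offset description, with his guessed anchor of 0 dBFS in -> 0 dBFS out, can be sketched as below. The anchor is his guess, not the paper's. Applying the curve directly to each sample's magnitude is an additional assumption: per Richard's post, a real compressor would apply it to a smoothed level estimate instead. All names are hypothetical.

```python
import math

T_LOW_DB, T_HIGH_DB = -46.4, -28.6
K1, K2, K3 = 1 / 8.94, 1 / 1.73, 1.61  # slopes k (dB out per dB in) for regions (1), (2), (3)

# Offsets d fixed by the assumed anchor 0 dBFS in -> 0 dBFS out, and by requiring
# neighbouring lines to meet at each threshold.
OUT_AT_HIGH = K1 * T_HIGH_DB  # region (1): y = K1 * x, so d = 0
OUT_AT_LOW = OUT_AT_HIGH + K2 * (T_LOW_DB - T_HIGH_DB)


def curve_db(x_db: float) -> float:
    """Output level in dBFS for an input level in dBFS."""
    if x_db >= T_HIGH_DB:
        return K1 * x_db
    if x_db > T_LOW_DB:
        return OUT_AT_HIGH + K2 * (x_db - T_HIGH_DB)
    return OUT_AT_LOW + K3 * (x_db - T_LOW_DB)


def process_sample(x: float) -> float:
    """Apply the dB-domain curve to one sample's magnitude, keeping its sign."""
    if x == 0.0:
        return 0.0
    y_db = curve_db(20.0 * math.log10(abs(x)))
    return math.copysign(10.0 ** (y_db / 20.0), x)


if __name__ == "__main__":
    print(f"output at -28.6 dB: {OUT_AT_HIGH:.3f} dB, at -46.4 dB: {OUT_AT_LOW:.3f} dB")
    for x in (1.0, 0.1, 0.01, 0.001):
        print(f"{x:.3f} -> {process_sample(x):.5f}")
```

This anchor leaves full scale unchanged but lifts quiet material a lot: an input at -46.4 dB comes out at about -13.5 dB, roughly 33 dB of gain, which is the kind of consequence to check before adopting it.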