Reply by bogfrog August 5, 2008
Thanks for the input, everyone.  I'll take some time to digest what's been
written.

I'm an undergrad student.  I've finished implementing the algorithms in this
paper, and I'm not required to do every single robustness test, though I'd
like to.  The other stuff is easier (filtering, resampling, noise, etc.), so
maybe I'll finish implementing all those first and come back to this.

Thanks again!




Reply by John O'Flaherty August 5, 2008
On Tue, 05 Aug 2008 08:23:28 -0500, John O'Flaherty
<quiasmox@yeeha.com> wrote:

>On Tue, 05 Aug 2008 03:20:25 -0500, "bogfrog" <aj00mcgraw@gmail.com>
>wrote:
>
>>> Doesn't the paper describing this compression give a mathematical
>>>specification of what is meant?
>>
>>Nope, the 3 conditions I listed are all that it gives.  It's just one out
>>of a list of different signal degradations, simulated to test for
>>robustness.
>>
>>In fact, if you want, you can take a look at the paper here:
>>
>>http://www.cs.northwestern.edu/~pardo/courses/eecs352/papers/audio%20fingerprint%20-%20haitsma.pdf
>>
>>Take a look at the beginning of section 4.4 (page 5 of the .pdf).
>
> I don't think the interpretation you gave is the one intended, because
>it is such an unlikely thing to happen to a signal as a natural
>degradation.  That leaves either instantaneous compression as I
>described, or intentional gain compression as described in Richard
>Dobson's post.  I think what is intended was more likely the
>instantaneous version, because no attack/decay parameters are given
>for a gain shift, and the paper seems to be trying to give a complete
>description of what was done.  I would also reject a dB interpretation
>of the numbers in the table, since they are expressed as ratios, and
>dB are already ratios.
> In my opinion, they intend a gain curve that rises from a base value
>of 1.61 at -oo, changes slope to 1/1.73 = 0.578 at -46.4 dB, and
>changes slope again to 1/8.94 = 0.111 at -28.6 dB.  That would mean
>calculating the net gain by seeing where you are on the curve, as I
>described in my post.
>For example,
> The threshold points in voltage are
>-46.4 dB : 0.00478 V
>-28.6 dB : 0.03715 V
>
>For an input sample at -20 dB, the voltage would be +/- 0.1 V.  The
>degraded sample would have an instantaneous amplitude of
>(.00478 * 1.61) + (0.03715 - .00478) * 0.578 + (0.1 - .03715) * 0.111
>= +/- 0.03338 V.
>
>That's my opinion, at any rate.  I see that the authors' email
>addresses are in the paper.  Since the paper itself is in English,
>though they appear to be Dutch, you might consider emailing them to
>ask exactly what they meant.

The others who pointed out the need for a reference are correct.  The
calculations I gave assume dBV, quite possibly an unwarranted assumption.
Apart from that, the method may be right.
--
John
Reply by Richard Dobson August 5, 2008
mboigner wrote:
>> Is this what you had in mind for the reference point?
>
> No. I will try to explain again:
>
> 1) What is the reference (in the paper) when they speak about dB values?
> dB is always relative, for example if you speak of the gain of a system
> your reference is the input of the system. If we speak of dBmicrovolt
> (dBuV) the reference is 1uV. I think that they speak of dB_full_scale,
> which means that their reference is 1 (= full scale). So when they speak
> of -46.4dB => 0.0048 linear (if rms).
>
> 2) I had a short look through the paper and saw that a reference point IS
> missing. To plot your IO curve you need to know, at exactly ONE input
> power, the output power your system delivers (or you know the gain, which
> is output power - input power)

All audio dB values are by convention relative to 0dB=digital full-scale
(0dBFS). Hence all working dB values are negative. The only exception to
this is in pro mixing systems where they like to keep some quasi-analog
"headroom" available, and define (say) -18dBFS as nominal 0dB. That seems
unlikely in this case.

The dB values (and ratios) listed in that paper seem more than a little
arbitrary - they have no special audio significance that I can identify. I
see that they list Winamp in the references; I would not be surprised if
they simply ran some audio through a Winamp plugin, fiddled a bit and just
read off whatever values the parameters displayed. There is at least one
Winamp compressor plugin that does offer a dual-knee model so they can, say,
boost low sigs as well as reduce high-level ones.

It is a relatively small component of the paper as a whole, but IMO they are
nevertheless remiss in not giving more comprehensive details of what they
used. Even the question of whether they used (if available) a "soft knee" at
the transition points is surely relevant to the topic.

Richard Dobson
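
A minimal sketch of the dBFS convention described above, assuming amplitude (20*log10) decibels and samples normalised so that full scale is 1.0. The helper names and the 16-bit full-scale constant are illustrative choices, not taken from the paper.

```python
import math

FULL_SCALE_16BIT = 32767  # assumed positive full-scale value for 16-bit PCM


def dbfs_to_linear(level_db):
    """Convert a level in dBFS to a linear amplitude, with 0 dBFS == 1.0."""
    return 10.0 ** (level_db / 20.0)


def linear_to_dbfs(amplitude):
    """Convert a non-zero linear amplitude to dBFS."""
    return 20.0 * math.log10(abs(amplitude))


# The two thresholds quoted in the thread, read as dBFS.
for threshold_db in (-46.4, -28.6):
    amp = dbfs_to_linear(threshold_db)
    print(f"{threshold_db:6.1f} dBFS -> {amp:.5f} of full scale, "
          f"about {amp * FULL_SCALE_16BIT:.0f} in 16-bit sample units")
```

Under this reading the thresholds come out near 0.0048 and 0.0372 of full scale, which matches the linear values quoted elsewhere in the thread.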
Reply by mboigner August 5, 2008
>Is this what you had in mind for the reference point?

No. I will try to explain again:

1) What is the reference (in the paper) when they speak about dB values?
dB is always relative, for example if you speak of the gain of a system your
reference is the input of the system. If we speak of dBmicrovolt (dBuV) the
reference is 1uV. I think that they speak of dB_full_scale, which means that
their reference is 1 (= full scale). So when they speak of -46.4dB => 0.0048
linear (if rms).

2) I had a short look through the paper and saw that a reference point IS
missing. To plot your IO curve you need to know, at exactly ONE input power,
the output power your system delivers (or you know the gain, which is
output power - input power)

>>With that you can calculate your output powers.
>>For example the output at your other threshold point t2_input = -28.6dB:
>>t2_output = (-28.6+46.4)/1.73 -46.4 = -36.1110.
>
>At this point I'm pretty much lost. I don't see how or why you are
>dividing -28.6+46.4 by 1.73. 1/1.73 is a fraction, so I don't understand
>how you can mix it with the decibel values.

3) To the ratios:
The slope of the IO curve, with all powers in dB, is
k = (output power 2 - output power 1) / (input power 2 - input power 1),
where input power 2 > input power 1. A compression ratio of 1.73:1 means
k = 1/1.73, and an expansion ratio of 1:1.61 means k = 1.61. That is why the
dB difference is divided by 1.73.
So if a compression or expansion ratio in a certain region of your input
power is given, the gradient k of the linear curve y = k * x + d is given!
In the article your k is given for different regions of your IO curve but
NOT the d (which is the offset). As the two straight lines to the left and
right of each threshold have to be connected at the threshold points, you
need a d for only one of the 3 regions, but this is not given.
So my bad example defined some d for one of the two threshold points. Maybe
that was somehow confusing.
I think, but that's really a guess, that as the maximum amplitude is 1 you
could choose output = 0dB(fullscale) for input = 0dB(fullscale). I think
that would make sense, but you have to decide if that is really useful for
your problem.

Please also note Richard's post about the estimation of the power levels. If
this is supposed to be an audio compressor you would need that.

I hope this post helps more,
Regards,
Markus

www.two-pi.com
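
The slope-and-offset description above can be sketched as follows. This is only an illustration under the guess made in the post, namely that 0 dB(fullscale) in maps to 0 dB(fullscale) out; the paper does not state this, and all names are hypothetical.

```python
# Thresholds and ratios as listed in the thread; slopes are in dB per dB.
T1_DB, T2_DB = -46.4, -28.6
K_LOW = 1.61          # expansion 1:1.61 at or below T1
K_MID = 1.0 / 1.73    # compression 1.73:1 between T1 and T2
K_HIGH = 1.0 / 8.94   # compression 8.94:1 at or above T2

# Assumption (not stated in the paper): 0 dBFS in gives 0 dBFS out.
ANCHOR_IN_DB, ANCHOR_OUT_DB = 0.0, 0.0

# Output levels at the two thresholds, walking down from the anchor point.
OUT_T2_DB = ANCHOR_OUT_DB + K_HIGH * (T2_DB - ANCHOR_IN_DB)
OUT_T1_DB = OUT_T2_DB + K_MID * (T1_DB - T2_DB)

# Offsets d of y = k * x + d for each region, fixed by continuity.
D_HIGH = OUT_T2_DB - K_HIGH * T2_DB
D_MID = OUT_T2_DB - K_MID * T2_DB
D_LOW = OUT_T1_DB - K_LOW * T1_DB


def output_level_db(in_db):
    """Map an input level in dB to an output level in dB."""
    if in_db >= T2_DB:
        return K_HIGH * in_db + D_HIGH
    if in_db > T1_DB:
        return K_MID * in_db + D_MID
    return K_LOW * in_db + D_LOW


for level in (0.0, T2_DB, T1_DB, -60.0):
    print(f"in {level:6.1f} dB -> out {output_level_db(level):8.3f} dB")
```

Choosing a different anchor point only shifts the whole curve up or down by a constant number of dB; the slopes stay the same.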
Reply by John O'Flaherty August 5, 2008
On Tue, 05 Aug 2008 03:20:25 -0500, "bogfrog" <aj00mcgraw@gmail.com>
wrote:

>> Doesn't the paper describing this compression give a mathematical
>>specification of what is meant?
>
>Nope, the 3 conditions I listed are all that it gives.  It's just one out
>of a list of different signal degradations, simulated to test for
>robustness.
>
>In fact, if you want, you can take a look at the paper here:
>
>http://www.cs.northwestern.edu/~pardo/courses/eecs352/papers/audio%20fingerprint%20-%20haitsma.pdf
>
>Take a look at the beginning of section 4.4 (page 5 of the .pdf).

 I don't think the interpretation you gave is the one intended, because
it is such an unlikely thing to happen to a signal as a natural
degradation.  That leaves either instantaneous compression as I
described, or intentional gain compression as described in Richard
Dobson's post.  I think what is intended was more likely the
instantaneous version, because no attack/decay parameters are given
for a gain shift, and the paper seems to be trying to give a complete
description of what was done.  I would also reject a dB interpretation
of the numbers in the table, since they are expressed as ratios, and
dB are already ratios.
 In my opinion, they intend a gain curve that rises from a base value
of 1.61 at -oo, changes slope to 1/1.73 = 0.578 at -46.4 dB, and
changes slope again to 1/8.94 = 0.111 at -28.6 dB.  That would mean
calculating the net gain by seeing where you are on the curve, as I
described in my post.
For example,
 The threshold points in voltage are
-46.4 dB : 0.00478 V
-28.6 dB : 0.03715 V

For an input sample at -20 dB, the voltage would be +/- 0.1 V.  The
degraded sample would have an instantaneous amplitude of
(.00478 * 1.61) + (0.03715 - .00478) * 0.578 + (0.1 - .03715) * 0.111
= +/- 0.03338 V.

That's my opinion, at any rate.  I see that the authors' email
addresses are in the paper.  Since the paper itself is in English,
though they appear to be Dutch, you might consider emailing them to
ask exactly what they meant.
--
John
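
The instantaneous (sample-by-sample) reading described in this post can be sketched as a piecewise-linear curve on the sample magnitude. This is one possible interpretation, not the paper's confirmed method; it assumes amplitude (20*log10) decibels referenced to 1.0, and the function and parameter names are made up for illustration. The three slopes are exposed as parameters so other readings can be tried.

```python
import math


def db_to_amplitude(level_db):
    """Convert dB (20*log10 convention, reference 1.0) to linear amplitude."""
    return 10.0 ** (level_db / 20.0)


def waveshape(sample, low_slope=1.61, mid_slope=1 / 1.73, high_slope=1 / 8.94,
              t1_db=-46.4, t2_db=-28.6):
    """Apply a three-segment piecewise-linear curve to one sample."""
    t1 = db_to_amplitude(t1_db)
    t2 = db_to_amplitude(t2_db)
    mag = abs(sample)
    # Accumulate the contribution of each segment the magnitude passes through.
    out = low_slope * min(mag, t1)
    if mag > t1:
        out += mid_slope * (min(mag, t2) - t1)
    if mag > t2:
        out += high_slope * (mag - t2)
    # Restore the sign so that the curve is odd-symmetric.
    return math.copysign(out, sample)


print(f"{waveshape(0.1):.5f}")   # input at -20 dB
print(f"{waveshape(-0.1):.5f}")  # same magnitude, negative sign
```

Because the slopes here are not rounded to three digits, the printed value differs slightly in the fourth significant digit from a hand calculation using 0.578 and 0.111.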
Reply by bogfrog August 5, 2008
>Hello,
>
>What's missing is a reference point.

Thank you for the reply.  I'm not sure I follow, though.  Let me comment on
some of what you wrote:

>For example your output @ one of
>your threshold points t1_input=-46.4dB (<- which reference?) is
>t1_output=-46.4dB, which would mean 0dB gain.

I'm not sure I understand this.  At -46.4dB the ratio is 1.73:1, so for
there to be a 0dB gain, I compute the following:

1.73:1 ratio => 1/1.73 fraction = -2.38 dB

So we want:

-46.4dB - 2.38dB + REF = 0dB  =>  REF = 48.78dB

Is this what you had in mind for the reference point?

>With that you can calculate your output powers.
>For example the output at your other threshold point t2_input = -28.6dB:
>t2_output = (-28.6+46.4)/1.73 -46.4 = -36.1110.

At this point I'm pretty much lost.  I don't see how or why you are
dividing -28.6+46.4 by 1.73.  1/1.73 is a fraction, so I don't understand
how you can mix it with the decibel values.

I'm confused, but I'll read your post again in the morning, and hopefully
it will make better sense. :)
Reply by Richard Dobson August 5, 2008
bogfrog wrote:
> I am working on audio, samples are 16 bits, values ranging from -1 to +1
> for each sample.  I want to implement what a paper I am reading calls
> "amplitude compression."  These are the 3 specifications given for the
> amplitude compression I want to implement:
>
> (1) Compression ratio of 8.94:1 for |A| >= -28.6 dB
> (2) Compression ratio of 1.73:1 for -46.4 dB < |A| < -28.6 dB
> (3) Compression ratio of 1:1.61 for |A| <= -46.4 dB
>
> So my question is, what exactly does this mean?
>

See, for example,

http://en.wikipedia.org/wiki/Audio_level_compression

Note that compression of this kind is not applied sample by sample, but with
respect to the detected overall amplitude envelope, using a window that
might be 5-15msecs long, or much longer (e.g. up to 300msecs for an rms
tracker or simple AGC). The task is to reduce the overall dynamic range of
the signal by passing lower-level sounds more-or-less unchanged, while
reducing higher-level signals pro rata - like very rapid "fader riding" on a
mixing desk.

Audio compressors have "attack" and "release" parameters, which determine,
for example, how quickly the compressor acts on a new transient (drum,
guitar pluck, etc), and how quickly the level recovers when the input falls
below the threshold. A delay may be applied to the input signal path so that
a hard attack transient can be acted on as a whole at the outset. A simple
example in broadcast is the "ducker", which drops the level of a music track
automatically when someone speaks over it.

In short: signal->envelope_detector->level_control->output.

A complementary effect is the expander/gate, which reduces the level of
quiet material, primarily to remove underlying system noise in the gaps
between sound events.

Such a process applied sample by sample is what computer musicians call
"waveshaping", where (typically) an input sinusoid is warped by a transfer
function into some completely other periodic shape.

This is all distinct from audio compression by a-law, mu-law etc, which is
used per sample to obtain a greater dynamic range from a small sample
wordsize - e.g. 8 bits (associated mainly with file formats, for which a-law
and mu-law standards are defined). Not really required for 16bit and beyond.

Lots more to it, of course; may well be worth asking on the musicdsp list -
they also have a code archive.

Richard Dobson
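
The chain signal->envelope_detector->level_control->output can be sketched roughly as below. This is a bare-bones illustration, not a production compressor and not what the paper's authors are known to have used: it assumes a one-pole peak envelope follower, a single hard-knee threshold, and hypothetical attack and release times picked from the ranges mentioned above. The threshold and ratio defaults simply reuse region (1) from the question.

```python
import math


def compress(samples, sample_rate=44100, threshold_db=-28.6, ratio=8.94,
             attack_ms=10.0, release_ms=300.0):
    """Envelope-driven gain reduction above a single threshold."""
    # One-pole smoothing coefficients for rising and falling levels.
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    envelope = 0.0
    out = []
    for x in samples:
        level = abs(x)
        # Envelope detector: follow rises quickly, falls slowly.
        coeff = attack if level > envelope else release
        envelope = coeff * envelope + (1.0 - coeff) * level
        # Level control: reduce gain only for the part above the threshold.
        env_db = 20.0 * math.log10(max(envelope, 1e-10))
        over_db = env_db - threshold_db
        gain_db = -over_db * (1.0 - 1.0 / ratio) if over_db > 0.0 else 0.0
        out.append(x * 10.0 ** (gain_db / 20.0))
    return out


steady = compress([0.5] * 44100)
print(f"first sample {steady[0]:.4f}, settled sample {steady[-1]:.4f}")
```

The first sample passes almost untouched because the envelope has not yet risen, which is the attack behaviour described above; the gain then settles as the envelope catches up with the input level.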
Reply by bogfrog August 5, 2008
> Doesn't the paper describing this compression give a mathematical
>specification of what is meant?

Nope, the 3 conditions I listed are all that it gives.  It's just one out
of a list of different signal degradations, simulated to test for
robustness.

In fact, if you want, you can take a look at the paper here:

http://www.cs.northwestern.edu/~pardo/courses/eecs352/papers/audio%20fingerprint%20-%20haitsma.pdf

Take a look at the beginning of section 4.4 (page 5 of the .pdf).
Reply by mboigner August 5, 2008
Hello,

What's missing is a reference point. For example your output @ one of
your threshold points t1_input=-46.4dB (<- which reference?) is
t1_output=-46.4dB, which would mean 0dB gain.
With that you can calculate your output powers.
For example the output at your other threshold point t2_input = -28.6dB:
t2_output = (-28.6+46.4)/1.73 -46.4 = -36.1110.
When you put t1 and t2 in a graph and draw a line between them you have
your behaviour in that region (you called it (2)). There the gain decreases
with increasing input (=compression).
In your region (3) you have expansion (increasing gain with increasing
input power). Calculate there a point t3_input = t1_input -10 = -56.4;
t3_output = -46.4 + (-10) * 1.61 = -62.5;
Draw a line from t1 through t3. There you see your behaviour in the
expansion case.
Last but not least your region (1) which has stronger compression than
(2).
Calculate a t4_input = -28.6+10=-18.6; t4_output = t2_output + 10 /8.94 =
-34.9924. Draw again a line from t2 to t4 -> Strong compression finished.

Now you have the shape of your IO curve - You will see that it is
one-to-one mapped input to output and reversible.
If you have different gains at a certain input point you only have to move
this curve up or down in y direction (add/subtract an offset).
If you have plotted the curve described above you will have at input -46.4
an output of -46.4

Hope that helps,
markus

www.two-pi.com
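
The construction above can be checked numerically with a short sketch. It assumes the reference point chosen in the post (0 dB gain at t1) and anchors each segment at the output level of the previous threshold so that the curve is continuous. The `interpolate` helper is an illustrative name and only covers the plotted range t3..t4.

```python
T1_IN = T1_OUT = -46.4  # reference point assumed in the post: 0 dB gain at t1
RATIO_LOW, RATIO_MID, RATIO_HIGH = 1.61, 1.73, 8.94

t2_in = -28.6
t2_out = T1_OUT + (t2_in - T1_IN) / RATIO_MID    # region (2): compression
t3_in = T1_IN - 10
t3_out = T1_OUT + (t3_in - T1_IN) * RATIO_LOW    # region (3): expansion
t4_in = t2_in + 10
t4_out = t2_out + (t4_in - t2_in) / RATIO_HIGH   # region (1): strong compression

points = [(t3_in, t3_out), (T1_IN, T1_OUT), (t2_in, t2_out), (t4_in, t4_out)]


def interpolate(x, pts):
    """Linear interpolation along a list of (x, y) points sorted by x."""
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the plotted range")


# Swapping the axes gives the inverse curve, since outputs increase strictly.
inverse_points = [(y, x) for x, y in points]

for name, (x, y) in zip(("t3", "t1", "t2", "t4"), points):
    print(f"{name}: in {x:7.2f} dB -> out {y:9.4f} dB")

level_out = interpolate(-40.0, points)
print(f"round trip of -40 dB: {interpolate(level_out, inverse_points):.4f} dB")
```

The round trip illustrates the point made in the post that the curve is one-to-one and therefore reversible.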

Reply by John O'Flaherty August 5, 2008
On Tue, 05 Aug 2008 00:42:53 -0500, "bogfrog" <aj00mcgraw@gmail.com>
wrote:

>>It seems that if it was meant the way you interpret it, then for some
>>signals, there would be no way to know which class they came from. For
>>example, if a compressed sample is 7e-4, was it originally -22 db
>>(6e-3), or -29 db (1e-3)?
>
>Yes, that is true.
>
>But the application does not intend to decompress the samples.  It is
>trying to simulate signal degradations, so what you've pointed out would
>not be a problem if my interpretation were correct.
>
>With that in mind, do you think my interpretation is correct?  How would
>you interpret it?

Maybe it means that the amount of the signal over the threshold is
treated differently than the amount under the threshold.  Taking a
simpler case, if you had no compression up to 0 dB (1 V), and then had a
compression ratio of 10:1 above 0 dB, it might mean that a sample at
1 dB (1.12 V) would be compressed as 1 + .12/10 = 1.012 V.  Then the
output would still be a reversible function of the input for all
values.
 Doesn't the paper describing this compression give a mathematical
specification of what is meant?
--
John
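
The simpler single-threshold case can be sketched as below, together with its inverse to show that the mapping stays reversible. The function names are illustrative, and the dB conversion assumes the amplitude (20*log10) convention referenced to 1 V, under which 1 dB corresponds to roughly 1.122 V.

```python
import math


def compress_above(v, threshold=1.0, ratio=10.0):
    """Leave |v| <= threshold alone; divide the excess above it by ratio."""
    mag = abs(v)
    if mag <= threshold:
        return v
    return math.copysign(threshold + (mag - threshold) / ratio, v)


def expand_above(v, threshold=1.0, ratio=10.0):
    """Inverse of compress_above: multiply the excess back by ratio."""
    mag = abs(v)
    if mag <= threshold:
        return v
    return math.copysign(threshold + (mag - threshold) * ratio, v)


one_db = 10.0 ** (1.0 / 20.0)  # 1 dB above 1 V, amplitude convention
squeezed = compress_above(one_db)
print(f"{one_db:.4f} V -> {squeezed:.4f} V -> {expand_above(squeezed):.4f} V")
```

Every output value corresponds to exactly one input value, so no ambiguity of the kind raised in the quoted question arises under this reading.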