Thanks for the input, everyone. I'll take some time to digest what's been
written.
I'm an undergrad student, and although I've finished implementing the
algorithms in this paper, I am not required to do every single robustness
test, even though I'd like to. The other degradations are easier (filtering,
resampling, noise, etc.), so maybe I'll finish implementing all of those
first and come back to this one.
Thanks again!
Reply by John O'Flaherty, August 5, 2008
On Tue, 05 Aug 2008 08:23:28 -0500, John O'Flaherty
<quiasmox@yeeha.com> wrote:
>On Tue, 05 Aug 2008 03:20:25 -0500, "bogfrog" <aj00mcgraw@gmail.com>
>wrote:
>
>>
>>> Doesn't the paper describing this compression give a mathematical
>>>specification of what is meant?
>>
>>
>>Nope, the 3 conditions I listed are all that it gives. It's just one out
>>of a list of different signal degradations, simulated to test for
>>robustness.
>>
>>In fact, if you want, you can take a look at the paper here:
>>
>>http://www.cs.northwestern.edu/~pardo/courses/eecs352/papers/audio%20fingerprint%20-%20haitsma.pdf
>>
>>Take a look at the beginning of section 4.4 (page 5 of the .pdf).
>
> I don't think the interpretation you gave is the intended one, because
>it is such an unlikely thing to happen to a signal as a natural
>degradation. That leaves either instantaneous compression as I
>described, or intentional gain compression as described in Richard
>Dobson's post. I think what is intended was more likely the
>instantaneous version, because no attack/decay parameters are given
>for a gain shift, and the paper seems to be trying to give a complete
>description of what was done. I would also reject a dB interpretation
>of the numbers in the table, since they are expressed as ratios, and
>dB are already ratios.
> In my opinion, they intend a transfer curve whose slope is 1.61 from
>-infinity up to -46.4 dB, changes to 1/1.73 = 0.578 at -46.4 dB, and
>changes again to 1/8.94 = 0.111 at -28.6 dB. That would mean
>calculating the net gain by seeing where you are on the curve, as I
>described in my post.
>For example, the threshold points in voltage are
>-46.4 dB : 0.00478 V
>-28.6 dB : 0.03715 V
>
>For an input sample at -20 dB, the voltage would be +/- 0.1 V. The
>degraded sample would have an instantaneous amplitude of
>(.00478 * 1.73) + (0.03715 - .00478) * 0.578 + (0.1 - .03715) * 0.111
>= +/- 0.03395 V.
>
>That's my opinion, at any rate. I see that the authors' email
>addresses are in the paper. Since the paper itself is in English,
>though they appear to be Dutch, you might consider emailing them to
>ask exactly what they meant.
The others who pointed out the need for a reference are correct. The
calculations I gave assume dBV, quite possibly an unwarranted
assumption. Apart from that, the method may be right.
--
John
Reply by Richard Dobson, August 5, 2008
mboigner wrote:
>> Is this what you had in mind for the reference point?
>
> No. I will try to explain again:
>
> 1) What is the reference (in the paper) when they speak about dB values?
> dB is always relative. For example, if you speak of the gain of a system, your
> reference is the input of the system. If we speak of dBmicrovolt (dBuV),
> the reference is 1 uV. I think that they mean dB full scale (dBFS), which
> means that their reference is 1 (= full scale). So -46.4 dB corresponds to
> 0.0048 linear (if RMS).
>
> 2) I had a short look through the paper and saw that a reference point IS
> missing. To plot your IO curve you need to know, at exactly ONE input power,
> the output power your system delivers (or you know the gain, which is
> output power minus input power, in dB).
All audio dB values are by convention relative to 0dB=digital full-scale
(0dBFS). Hence all working dB values are negative. The only exception
to this is in pro mixing systems where they like to keep some
quasi-analog "headroom" available, and define (say) -18dBFS as nominal
0dB. That seems unlikely in this case.
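As a rough illustration of that convention, here is a Python sketch that converts the paper's two thresholds into sample values. It assumes the paper's levels are amplitude levels (20*log10, relative to full scale = 1.0), which the paper does not state explicitly; the function names are just illustrative.

```python
import math

def dbfs_to_linear(db):
    """Amplitude relative to digital full scale (1.0), assuming 20*log10 levels."""
    return 10.0 ** (db / 20.0)

def linear_to_dbfs(a):
    """Level in dB relative to full scale for an amplitude a > 0."""
    return 20.0 * math.log10(a)

print(round(dbfs_to_linear(-46.4), 5))  # 0.00479
print(round(dbfs_to_linear(-28.6), 5))  # 0.03715

# The RMS value of a full-scale sine is 1/sqrt(2), about 3.01 dB below the
# full-scale peak, so whether a threshold means a peak or an RMS level matters.
print(round(linear_to_dbfs(1 / math.sqrt(2)), 2))  # -3.01
```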
The dB values (and ratios) listed in that paper seem more than a little
arbitrary - they have no special audio significance that I can identify.
I see that they list Winamp in the references; I would not be surprised
if they simply ran some audio through a Winamp plugin, fiddled a bit and
just read off whatever values the parameters displayed. There is at
least one Winamp compressor plugin that does offer a dual-knee model, so
they can, say, boost low-level signals as well as reduce high-level ones.
It is a relatively small component of the paper as a whole, but IMO they
are nevertheless remiss in not giving more comprehensive details of what
they used. Even the question of whether they used (if available) a "soft
knee" at the transition points is surely relevant to the topic.
Richard Dobson
Reply by mboigner, August 5, 2008
>
>Is this what you had in mind for the reference point?
No. I will try to explain again:
1) What is the reference (in the paper) when they speak about dB values?
dB is always relative. For example, if you speak of the gain of a system, your
reference is the input of the system. If we speak of dBmicrovolt (dBuV),
the reference is 1 uV. I think that they mean dB full scale (dBFS), which
means that their reference is 1 (= full scale). So -46.4 dB corresponds to
0.0048 linear (if RMS).
2) I had a short look through the paper and saw that a reference point IS
missing. To plot your IO curve you need to know, at exactly ONE input power,
the output power your system delivers (or you know the gain, which is
output power minus input power, in dB).
>>With that you can calculate your output powers.
>>For example the output at your other threshold point t2_input = -28.6dB:
>>t2_output = (-28.6+46.4)/1.73 -46.4 = -36.1110.
>
>At this point I'm pretty much lost. I don't see how or why you are
>dividing -28.6+46.4 by 1.73. 1/1.73 is a fraction, so I don't understand
>how you can mix it with the decibel values.
3) To the ratios:
Compression (or expansion) ratio = (input power 2 - input power 1) /
(output power 2 - output power 1), with all powers in dB and
input power 2 > input power 1.
So if a compression or expansion ratio is given for a certain region of
your input power, the gradient k of the straight line y = k * x + d is
fixed: k = 1/8.94 for 8.94:1, k = 1/1.73 for 1.73:1 and k = 1.61 for 1:1.61.
In the article your k is given for the different regions of your IO curve,
but NOT the d (which is the offset). Since the straight lines of two
neighbouring regions have to be connected at each threshold point, you need
the d for only one of the 3 regions, but this is not given.
So my bad example defined some d at one of the two threshold points.
Maybe that was somewhat confusing.
I think, but that's really a guess: since the maximum amplitude is 1, maybe
you could choose output = 0 dB (full scale) for input = 0 dB (full scale). I
think that would make sense, but you have to decide if that is really
useful for your problem.
Please also note Richard's post about estimating the power levels.
If this is meant to be an audio compressor, you would need that.
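Point 3 can be sketched in Python: each region is a straight line y = k*x + d in dB, the ratios fix the k values (treating an N:1 ratio as N dB of input change per 1 dB of output change, so k = 1/N, and 1:1.61 as k = 1.61), and continuity at the two thresholds fixes every d once a single anchor is chosen. The default anchor of 0 dBFS in giving 0 dBFS out is only the guess suggested above, not something the paper states, and it is assumed to lie in region (1).

```python
# Slopes k (output dB per input dB) for the three regions.
K_HIGH = 1 / 8.94   # region (1): x >= -28.6 dB
K_MID = 1 / 1.73    # region (2): -46.4 dB < x < -28.6 dB
K_LOW = 1.61        # region (3): x <= -46.4 dB (expansion)
T_HIGH, T_LOW = -28.6, -46.4

def offsets_from_anchor(x0=0.0, y0=0.0):
    """Return (d_high, d_mid, d_low) for y = k*x + d in each region.

    The region (1) line is forced through the anchor (x0, y0); the other
    offsets follow from requiring the curve to be continuous at T_HIGH
    and T_LOW. The default anchor (0 dBFS in -> 0 dBFS out) is a guess.
    """
    d_high = y0 - K_HIGH * x0
    d_mid = (K_HIGH - K_MID) * T_HIGH + d_high  # lines meet at T_HIGH
    d_low = (K_MID - K_LOW) * T_LOW + d_mid     # lines meet at T_LOW
    return d_high, d_mid, d_low

def io_curve_db(x, offsets):
    """Output level in dB for an input level x in dB."""
    d_high, d_mid, d_low = offsets
    if x >= T_HIGH:
        return K_HIGH * x + d_high
    if x > T_LOW:
        return K_MID * x + d_mid
    return K_LOW * x + d_low

d = offsets_from_anchor()
for x in (0.0, -28.6, -46.4, -60.0):
    print(f"{x:6.1f} dB in -> {io_curve_db(x, d):8.3f} dB out")
```

With this particular anchor, a -46.4 dB input comes out at roughly -13.5 dB, so quiet material gets about 33 dB of gain; choosing a different anchor only shifts the whole curve up or down.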
I hope this post helps more,
Regards,
Markus
www.two-pi.com
Reply by John O'Flaherty, August 5, 2008
On Tue, 05 Aug 2008 03:20:25 -0500, "bogfrog" <aj00mcgraw@gmail.com>
wrote:
>
>> Doesn't the paper describing this compression give a mathematical
>>specification of what is meant?
>
>
>Nope, the 3 conditions I listed are all that it gives. It's just one out
>of a list of different signal degradations, simulated to test for
>robustness.
>
>In fact, if you want, you can take a look at the paper here:
>
>http://www.cs.northwestern.edu/~pardo/courses/eecs352/papers/audio%20fingerprint%20-%20haitsma.pdf
>
>Take a look at the beginning of section 4.4 (page 5 of the .pdf).
I don't think the interpretation you gave is the intended one, because
it is such an unlikely thing to happen to a signal as a natural
degradation. That leaves either instantaneous compression as I
described, or intentional gain compression as described in Richard
Dobson's post. I think what is intended was more likely the
instantaneous version, because no attack/decay parameters are given
for a gain shift, and the paper seems to be trying to give a complete
description of what was done. I would also reject a dB interpretation
of the numbers in the table, since they are expressed as ratios, and
dB are already ratios.
In my opinion, they intend a transfer curve whose slope is 1.61 from
-infinity up to -46.4 dB, changes to 1/1.73 = 0.578 at -46.4 dB, and
changes again to 1/8.94 = 0.111 at -28.6 dB. That would mean
calculating the net gain by seeing where you are on the curve, as I
described in my post.
For example, the threshold points in voltage are
-46.4 dB : 0.00478 V
-28.6 dB : 0.03715 V
For an input sample at -20 dB, the voltage would be +/- 0.1 V. The
degraded sample would have an instantaneous amplitude of
(.00478 * 1.73) + (0.03715 - .00478) * 0.578 + (0.1 - .03715) * 0.111
= +/- 0.03395 V.
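A minimal Python sketch of this instantaneous (sample-by-sample) reading, assuming dBV as the reference (0 dB = 1 V, amplitude = 10^(dB/20)) and applying the curve to the sample magnitude while keeping its sign. It uses the stated slope of 1.61 for the bottom segment; the worked example above multiplies that segment by 1.73 instead, so this sketch gives about 0.0334 V for the -20 dB sample rather than 0.03395 V. The function names are illustrative only.

```python
import math

def db_to_volts(db):
    """Amplitude for a level in dB, assuming a 1 V (dBV) reference."""
    return 10.0 ** (db / 20.0)

T1 = db_to_volts(-46.4)   # about 0.00479 V
T2 = db_to_volts(-28.6)   # about 0.03715 V
SLOPES = (1.61, 1 / 1.73, 1 / 8.94)  # below T1, between T1 and T2, above T2

def instantaneous_compress(sample):
    """Memoryless piecewise-linear curve applied to |sample|; sign preserved."""
    a = abs(sample)
    y = min(a, T1) * SLOPES[0]            # part of |sample| below T1
    if a > T1:
        y += (min(a, T2) - T1) * SLOPES[1]  # part between T1 and T2
    if a > T2:
        y += (a - T2) * SLOPES[2]           # part above T2
    return math.copysign(y, sample)

print(round(instantaneous_compress(0.1), 4))   # 0.0334
print(round(instantaneous_compress(-0.1), 4))  # -0.0334
```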
That's my opinion, at any rate. I see that the authors' email
addresses are in the paper. Since the paper itself is in English,
though they appear to be Dutch, you might consider emailing them to
ask exactly what they meant.
--
John
Reply by bogfrog, August 5, 2008
>Hello,
>
>What's missing is a reference point.
Thank you for the reply. I'm not sure I follow, though. Let me comment
on some of what you wrote:
>For example, your output at one of your threshold points,
>t1_input = -46.4dB (<- which reference?), is
>t1_output = -46.4dB, which would mean 0dB gain.
I'm not sure I understand this. At -46.4dB the ratio is 1.73:1, so for
there to be a 0dB gain, I compute the following:
1.73:1 ratio => 1/1.73 fraction = -2.38 dB
So we want: -46.4dB -2.38dB + REF = 0dB => REF = 48.78dB
Is this what you had in mind for the reference point?
>With that you can calculate your output powers.
>For example the output at your other threshold point t2_input = -28.6dB:
>t2_output = (-28.6+46.4)/1.73 -46.4 = -36.1110.
At this point I'm pretty much lost. I don't see how or why you are
dividing -28.6+46.4 by 1.73. 1/1.73 is a fraction, so I don't understand
how you can mix it with the decibel values.
I'm confused, but I'll read your post again in the morning, and hopefully
it will make better sense. :)
Reply by Richard Dobson, August 5, 2008
bogfrog wrote:
> I am working on audio, samples are 16 bits, values ranging from -1 to +1
> for each sample. I want to implement what a paper I am reading calls
> "amplitude compression." These are the 3 specifications given for the
> amplitude compression I want to implement:
>
> (1) Compression ratio of 8.94:1 for |A| >= -28.6 dB
> (2) Compression ratio of 1.73:1 for -46.4 dB < |A| < -28.6 dB
> (3) Compression ratio of 1:1.61 for |A| <= -46.4 dB
>
> So my question is, what exactly does this mean?
>
See, for example, http://en.wikipedia.org/wiki/Audio_level_compression
Note that compression of this kind is not applied sample by sample, but
with respect to the detected overall amplitude envelope, using a window
that might be 5-15 msecs long, or much longer (e.g. up to 300 msecs for an
RMS tracker or simple AGC). The task is to reduce the overall dynamic
range of the signal by passing lower-level sounds more or less
unchanged, while reducing higher-level signals pro rata, like very
rapid "fader riding" on a mixing desk. Audio compressors have 'attack'
and 'release' parameters, which determine, for example, how quickly the
compressor acts on a new transient (drum, guitar pluck, etc), and how
quickly the level recovers when the input falls below the threshold. A
delay may be applied to the input signal path so that a hard attack
transient can be acted on as a whole at the outset. A simple example in
broadcast is the "ducker", which drops the level of a music track
automatically when someone speaks over it.
In short: signal->envelope_detector->level_control->output.
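A minimal Python sketch of that chain, assuming a peak envelope follower with one-pole attack/release smoothing and a single hard knee. The threshold (-28.6 dB) and ratio (8.94:1) are borrowed from the paper's top region purely for illustration; the 44.1 kHz rate and the 5 ms / 100 ms attack and release times are arbitrary assumptions, and there is no look-ahead delay. The function name is hypothetical.

```python
import math

def compress_envelope(samples, fs=44100.0, threshold_db=-28.6, ratio=8.94,
                      attack_ms=5.0, release_ms=100.0):
    """Feed-forward compressor: envelope detector -> gain computer -> gain.

    All defaults are illustrative assumptions, not values from the paper.
    """
    # One-pole smoothing coefficients for a rising (attack) or
    # falling (release) envelope.
    a_att = math.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    env = 0.0
    out = []
    for x in samples:
        level = abs(x)
        coeff = a_att if level > env else a_rel
        env = coeff * env + (1.0 - coeff) * level          # envelope detector
        env_db = 20.0 * math.log10(max(env, 1e-12))
        if env_db > threshold_db:                          # gain computer
            target_db = threshold_db + (env_db - threshold_db) / ratio
            gain_db = target_db - env_db
        else:
            gain_db = 0.0
        out.append(x * 10.0 ** (gain_db / 20.0))           # level control
    return out

tone = [0.5 * math.sin(2 * math.pi * 1000 * n / 44100) for n in range(44100)]
y = compress_envelope(tone)
print(max(abs(v) for v in y[-4410:]))  # settled output peak, well below 0.5
```

A sine whose level stays below the threshold passes through unchanged, because its envelope never exceeds the threshold and the gain stays at 0 dB.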
A complementary effect is the expander/gate, which reduces the level of
quiet material, primarily to remove underlying system noise in the gaps
between sound events.
Such a process applied sample by sample is what computer musicians call
"waveshaping", where (typically) an input sinusoid is warped by a
transfer function into some completely other periodic shape.
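To make the distinction concrete, here is a small Python sketch that pushes exactly one cycle of a sinusoid through an arbitrary odd-symmetric transfer function, tanh(3x), chosen purely for illustration, and measures harmonic amplitudes with a single-bin DFT. The clean sine has essentially no third harmonic; the waveshaped version has a strong one and, because the curve is odd-symmetric, still no even harmonics.

```python
import math

N = 256  # samples in exactly one period of the test sinusoid

def harmonic_amplitude(signal, k):
    """Amplitude of harmonic k for a signal holding exactly one period."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n

def waveshape(x):
    """Arbitrary memoryless transfer function, used only as an example."""
    return math.tanh(3.0 * x)

sine = [math.sin(2 * math.pi * i / N) for i in range(N)]
shaped = [waveshape(v) for v in sine]

for k in (1, 2, 3, 5):
    print(k, round(harmonic_amplitude(sine, k), 4),
          round(harmonic_amplitude(shaped, k), 4))
```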
This is all distinct from audio compression by a-law, mu-law etc, which
is used per sample to obtain a greater dynamic range from a small sample
wordsize - e.g. 8 bits (associated mainly with file formats, for which
a-law and mu-law standards are defined). Not really required for 16bit
and beyond.
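For contrast, a Python sketch of the continuous mu-law curve (mu = 255, the value used for mu-law in ITU-T G.711; the standard itself uses a segmented 8-bit approximation of this curve, which is not reproduced here). It is applied per sample and is exactly invertible, which is the whole point of companding.

```python
import math

MU = 255.0

def mu_law_compress(x):
    """Map x in [-1, 1] to [-1, 1] with the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Exact inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

for x in (0.001, 0.01, 0.1, 1.0):
    y = mu_law_compress(x)
    print(f"{x:6.3f} -> {y:.4f} -> {mu_law_expand(y):.6f}")
```

Small inputs are boosted strongly (0.01 maps to roughly 0.23), which is what lets an 8-bit word cover a wide dynamic range.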
Lots more to it, of course; may well be worth asking on the musicdsp
list - they also have a code archive.
Richard Dobson
Reply by bogfrog, August 5, 2008
> Doesn't the paper describing this compression give a mathematical
>specification of what is meant?
Hello,
What's missing is a reference point. For example, your output at one of
your threshold points, t1_input = -46.4dB (<- which reference?), is
t1_output = -46.4dB, which would mean 0dB gain.
With that you can calculate your output powers.
For example the output at your other threshold point t2_input = -28.6dB:
t2_output = (-28.6+46.4)/1.73 -46.4 = -36.1110.
When you put t1 and t2 in a graph and draw a line between them you have
your behaviour in that region (you called it (2)). There the gain decreases
with increasing input (=compression)
In your region (3) you have expansion (increasing gain with increasing
input power). Calculate there a point t3_input = t1_input - 10 = -56.4;
t3_output = -46.4 + (-10) * 1.61 = -62.5.
Draw a line from t1 through t3. There you see your behaviour in the
expansion case.
Last but not least, your region (1), which has stronger compression than
(2). Calculate a t4_input = -28.6 + 10 = -18.6;
t4_output = t2_output + 10/8.94 = -36.1110 + 1.1186 = -34.9924.
Draw again a line from t2 to t4 -> strong compression finished.
Now you have the shape of your IO curve - You will see that it is
one-to-one mapped input to output and reversible.
If you have different gains at a certain input point, you only have to move
this curve up or down in the y direction (add/subtract an offset).
If you have plotted the curve described above you will have at input -46.4
an output of -46.4
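The construction above can be sketched in Python as a static input/output curve in dB. The reference point (0 dB gain at t1, i.e. -46.4 dB in giving -46.4 dB out) is the assumption made in the post, not something the paper specifies. Note that the region (1) segment has to start from t2_output so that the curve stays continuous; evaluated that way, -18.6 dB maps to about -34.99 dB.

```python
T1, T2 = -46.4, -28.6   # threshold input levels in dB (the paper's table)
T1_OUT = -46.4          # assumed reference point: 0 dB gain at t1

def io_curve_db(x_db):
    """Static input/output level curve in dB, continuous at both thresholds."""
    t2_out = T1_OUT + (T2 - T1) / 1.73          # end of the 1.73:1 segment
    if x_db <= T1:
        return T1_OUT + (x_db - T1) * 1.61      # 1:1.61 expansion
    if x_db < T2:
        return T1_OUT + (x_db - T1) / 1.73      # 1.73:1 compression
    return t2_out + (x_db - T2) / 8.94          # 8.94:1 compression from t2_out

for x in (-56.4, -46.4, -28.6, -18.6):
    print(f"{x:6.1f} dB in -> {io_curve_db(x):9.4f} dB out")
```

This reproduces t1, t2 and t3 from the post (-46.4, about -36.111 and -62.5 dB). Shifting T1_OUT moves the whole curve up or down, which is exactly the missing reference point.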
Hope that helps,
markus
www.two-pi.com
Reply by John O'Flaherty, August 5, 2008
On Tue, 05 Aug 2008 00:42:53 -0500, "bogfrog" <aj00mcgraw@gmail.com>
wrote:
>>It seems that if it was meant the way you interpret it, then for some
>>signals, there would be no way to know which class they came from. For
>>example, if a compressed sample is 7e-4, was it originally -22 db
>>(6e-3), or -29 db (1e-3)?
>
>Yes, that is true.
>
>But the application does not intend to decompress the samples. It is
>trying to simulate signal degradations, so what you've pointed out would
>not be a problem if my interpretation were correct.
>
>With that in mind, do you think my interpretation is correct? How would
>you interpret it?
Maybe it means that the amount of the signal over the threshold is
treated differently than the amount under the threshold. Taking a
simpler case, if you had no compression up to 0 dB, and then had a
compression ratio of 10:1 above 0 dB, it might mean that a sample at 1
dB (1.26 V) would be compressed as 1 + .26/10 = 1.026 V. Then the
output would still be a reversible function of the input for all
values.
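That single-threshold reading is easy to check numerically. The Python sketch below works directly on linear amplitudes (threshold 1.0, ratio 10:1, the numbers from the example) and includes the inverse, to show that the mapping stays one-to-one. The function names are illustrative only.

```python
import math

def compress_above(x, threshold=1.0, ratio=10.0):
    """Leave |x| <= threshold unchanged; divide only the excess above it by ratio."""
    a = abs(x)
    if a <= threshold:
        return x
    return math.copysign(threshold + (a - threshold) / ratio, x)

def expand_above(y, threshold=1.0, ratio=10.0):
    """Inverse of compress_above: scale the excess back up by ratio."""
    b = abs(y)
    if b <= threshold:
        return y
    return math.copysign(threshold + (b - threshold) * ratio, y)

print(round(compress_above(1.26), 6))                # 1.026
print(round(expand_above(compress_above(1.26)), 6))  # 1.26
```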
Doesn't the paper describing this compression give a mathematical
specification of what is meant?
--
John