Rate-distortion optimisation

Started by stephen henry September 22, 2004
Hi all,

Could someone please help me figure out how to use rate-distortion
optimisation?

I'm working on a video codec at the moment and, from what I
understand, the encoding mode is decided from the
sum-of-absolute-differences (SAD) value, which is effectively a
measure of the distortion, and the number of bits required to encode
the residual. These two values are summed, one being weighted by a
lambda, and the mode that minimises the result is chosen.

The reason, I am told, that the sum of distortion and bit-rate is
used is so the encoder can make a trade-off between image quality and
bit-rate. So, for example, if the encoder encountered a highly complex
macroblock that would suffer from a great deal of distortion if it
used the same quantisation parameter as before, it could lower the
quantisation value to reduce the distortion, even though the
macroblock would then require more bits to encode.

The question I have is that I'm not really sure where this lambda
value comes from. I've read a couple of papers on the codec in
question (H.264) and they suggest an empirically derived formula,
dependent on the quantisation parameter, for calculating it. What I
don't understand is how one can calculate the lambda value if the
quantisation parameter is not fixed, and on what basis the
quantisation parameter is changed in response to the outcome of the
encoding process.

I apologise for the admittedly vague question, but I'm really having
difficulty understanding exactly what rate-distortion optimisation in
video codecs is trying to achieve.

Thanks,

Stephen Henry
Hi Stephen.

stephen> Hi all, Could someone please help me figure out how to use
stephen> rate-distortion optimisation?

stephen> I'm working on a video codec at the moment and, from what I
stephen> understand, the encoding mode is decided from the
stephen> sum-of-absolute-differences (SAD) value, which is effectively
stephen> a measure of the distortion, and the number of bits required
stephen> to encode the residual. These two values are summed, one
stephen> being weighted by a lambda, and the mode that minimises the
stephen> result is chosen.

stephen> The reason, I am told, that the sum of distortion and
stephen> bit-rate is used is so the encoder can make a trade-off
stephen> between image quality and bit-rate. So, for example, if the
stephen> encoder encountered a highly complex macroblock that would
stephen> suffer from a great deal of distortion if it used the same
stephen> quantisation parameter as before, it could lower the
stephen> quantisation value to reduce the distortion, even though the
stephen> macroblock would then require more bits to encode.

stephen> The question I have is that I'm not really sure where this
stephen> lambda value comes from. I've read a couple of papers on the
stephen> codec in question (H.264) and they suggest an empirically
stephen> derived formula, dependent on the quantisation parameter, for
stephen> calculating it. What I don't understand is how one can
stephen> calculate the lambda value if the quantisation parameter is
stephen> not fixed, and on what basis the quantisation parameter is
stephen> changed in response to the outcome of the encoding process.

stephen> I apologise for the admittedly vague question, but I'm really
stephen> having difficulty understanding exactly what rate-distortion
stephen> optimisation in video codecs is trying to achieve.

The problem of minimizing the distortion given a bit budget is a
constrained optimization problem and can be solved efficiently using
the Lagrange multiplier method: for a fixed lambda you minimize
distortion + lambda * rate, which needs no explicit constraint. In
practice you then sweep over lambda until your rate constraint is met.
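
To make the sweep concrete, here is a minimal Python sketch, not
taken from any particular codec. It assumes each coding unit (say, a
macroblock) offers a small table of (rate, distortion) options; the
function names, the bisection search and all the numbers are
illustrative assumptions. For a fixed lambda every unit independently
picks the option minimising D + lambda * R, and lambda is then
adjusted until the total rate fits the budget.

```python
from typing import List, Sequence, Tuple

# One option of a coding unit: (rate in bits, distortion). Hypothetical data.
Option = Tuple[float, float]
Units = Sequence[Sequence[Option]]


def choose_options(units: Units, lam: float) -> List[int]:
    """For a fixed lambda, pick per unit the option minimising D + lam * R."""
    return [
        min(range(len(opts)), key=lambda i: opts[i][1] + lam * opts[i][0])
        for opts in units
    ]


def total_rate(units: Units, choice: Sequence[int]) -> float:
    return sum(units[u][i][0] for u, i in enumerate(choice))


def total_distortion(units: Units, choice: Sequence[int]) -> float:
    return sum(units[u][i][1] for u, i in enumerate(choice))


def sweep_lambda(units: Units, budget: float,
                 lam_hi: float = 1e6, iters: int = 60):
    """Bisect on lambda; return (lambda, choice) with total rate <= budget.

    A larger lambda penalises rate more heavily, so the total rate falls
    as lambda grows. lam_hi is an assumed upper bound for the search.
    """
    lam_lo = 0.0
    best = choose_options(units, lam_hi)
    if total_rate(units, best) > budget:
        raise ValueError("budget cannot be met even at the largest lambda")
    for _ in range(iters):
        lam = 0.5 * (lam_lo + lam_hi)
        choice = choose_options(units, lam)
        if total_rate(units, choice) <= budget:
            lam_hi, best = lam, choice   # feasible: try a smaller lambda
        else:
            lam_lo = lam                 # too many bits: raise lambda
    return lam_hi, best


# Two units with three options each (made-up numbers), budget of 40 bits.
units = [
    [(10, 100), (20, 40), (40, 10)],
    [(8, 50), (16, 30), (32, 25)],
]
lam, choice = sweep_lambda(units, budget=40)
print(lam, choice, total_rate(units, choice), total_distortion(units, choice))
```

Note that the search only reaches operating points on the convex hull
of the rate-distortion options, so the budget is generally met with
some bits to spare rather than exactly (here 36 of the 40 bits).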

Have a look at 

Yair Shoham, Allen Gersho; Efficient bit allocation for an arbitrary
set of quantizers, IEEE Trans. Acoust., Speech, Signal Processing,
vol. 36, pp. 1445 - 1453, September 1988.

Or if you're looking for an easier read

Paolo Prandoni, Martin Vetterli; R/D Optimal linear prediction, IEEE
Trans. Speech Audio Processing, vol. 8, pp. 646 - 655, November 2000. 

Also, Antonio Ortega has done some work on this in video/image
coding.

-- 
/Mads (http://kom.aau.dk/~mgc)