Hi all,

Could someone please help me figure out how to use rate-distortion optimisation?

I'm working on a video codec at the moment and, from what I understand, the encoding modes are decided on the sum-of-absolute-differences (SAD) value, which is effectively a measure of the distortion, and the number of bits required to encode the residual. These two values are summed, one being weighted by a lambda, and the mode that minimises the result is chosen.

The reason, I am told, that the sum of distortion and bit-rate is used is so the encoder can make a trade-off between image quality and bit-rate. So, for example, if the encoder encountered a highly complex macroblock that would suffer from a great deal of distortion if it used the same quantisation parameter as before, it could lower the quantisation value to reduce the distortion, even though it would require more bits to encode.

The question I have is that I'm not really sure where this lambda value comes from. I've read a couple of papers on the codec in question (H.264) and they suggest an empirically derived formula, dependent on the quantisation parameter, for calculating it. What I am having difficulty understanding is how one can calculate the lambda value if the quantisation parameter is not fixed, and on what basis the quantisation parameter is changed based on the outcome of the encoding process.

I apologise for the admittedly vague question, but I'm really having difficulty understanding exactly what rate-distortion optimisation in video codecs is trying to achieve.

Thanks,
Stephen Henry
Rate-distortion optimisation
Started by ●September 22, 2004
Reply by ●September 27, 2004
Hi Stephen.
The problem of minimizing the distortion given a bit budget is a constrained optimization problem and can be solved efficiently using the Lagrange multiplier method. In practice, you sweep over lambda until your rate constraint is met.

Have a look at

Yair Shoham, Allen Gersho; Efficient bit allocation for an arbitrary set of quantizers, IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, pp. 1445-1453, September 1988.

Or, if you're looking for an easier read,

Paolo Prandoni, Martin Vetterli; R/D optimal linear prediction, IEEE Trans. Speech Audio Processing, vol. 8, pp. 646-655, November 2000.

Also, Antonio Ortega has done some work on this in video/image coding.

--
/Mads (http://kom.aau.dk/~mgc)
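To make the "sweep over lambda" idea in the reply concrete, here is a minimal Python sketch. It is not taken from H.264 or from the cited papers. It assumes each block (e.g. a macroblock) already has a small table of candidate operating points, given as (rate in bits, distortion) pairs, one per mode or quantiser choice. The names `pick_modes` and `sweep_lambda`, and the numbers in the example, are made up for illustration. For a fixed lambda every block independently picks the point minimising D + lambda * R; lambda is then bisected until the total rate fits the budget.

```python
from typing import List, Tuple

Point = Tuple[int, float]  # (rate in bits, distortion), hypothetical per-mode measurements


def pick_modes(blocks: List[List[Point]], lam: float):
    """For a fixed lambda, choose per block the point minimising D + lam * R."""
    choice = [
        min(range(len(pts)), key=lambda i: pts[i][1] + lam * pts[i][0])
        for pts in blocks
    ]
    rate = sum(blocks[b][i][0] for b, i in enumerate(choice))
    dist = sum(blocks[b][i][1] for b, i in enumerate(choice))
    return choice, rate, dist


def sweep_lambda(blocks: List[List[Point]], budget: int, iters: int = 50) -> float:
    """Bisect on lambda until the total rate meets the budget.

    The total rate is non-increasing in lambda, so bisection applies.
    """
    min_rate = sum(min(r for r, _ in pts) for pts in blocks)
    if min_rate > budget:
        raise ValueError("budget is below the smallest achievable rate")
    if pick_modes(blocks, 0.0)[1] <= budget:
        return 0.0  # the minimum-distortion choice already fits
    lo, hi = 0.0, 1.0
    while pick_modes(blocks, hi)[1] > budget:  # grow hi until it is feasible
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pick_modes(blocks, mid)[1] > budget:
            lo = mid  # too many bits: penalise rate more
        else:
            hi = mid  # fits: try a smaller penalty
    return hi  # hi always satisfies the rate constraint


# Illustrative numbers only: two blocks, three candidate modes each.
blocks = [
    [(10, 100.0), (20, 40.0), (40, 10.0)],
    [(5, 50.0), (15, 20.0), (30, 5.0)],
]
lam = sweep_lambda(blocks, budget=40)
choice, rate, dist = pick_modes(blocks, lam)
print(lam, choice, rate, dist)
```

Note that this kind of search only reaches operating points on the convex hull of the rate-distortion points, so the selected rate may end up somewhat below the budget rather than meeting it exactly.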