Entropy before and after Huffman coding vs. channel capacity

Started by qfran July 3, 2015
On 03.07.2015 at 19:41, qfran wrote:
> Dear all,
> please help me to interpret some strange(?) behaviour of Huffman coding
> and entropy. Let me explain it with an example:
> Consider a source emitting four symbols x1-x4 with probabilities
> P(x1)=1/3, P(x2)=1/3, P(x3)=1/6, P(x4)=1/6.
> Calculating the entropy with H = -sum(P(xi)*log2(P(xi))) gives us: H = 1.918
> Bit/symbol. So far so good.
>
> We perform Huffman coding, arranging the symbols in the order of their
> decreasing probabilities x1 - x4. We aggregate x3 and x4, so that all nodes
> now have probability 1/3. Then we have two possibilities:
> 1. We first aggregate x2 with (x3+x4) and get the Huffman code:
> x1=0, x2=10, x3=110, x4=111.
> 2. We first aggregate x1 with x2 and get the Huffman code:
> x1=00, x2=01, x3=10, x4=11.
Actually, that's not a Huffman code (since Huffman coding starts by merging the two smallest probabilities), but it is a prefix-free code with the same average code length.
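If you want to check the numbers yourself, here is a quick Python sketch (my own, just for illustration) that recomputes the source entropy and verifies that both codes above are prefix-free with average length 2:

from math import log2

# Source model from the post: four symbols with the given probabilities
probs = {"x1": 1/3, "x2": 1/3, "x3": 1/6, "x4": 1/6}

# Source entropy H = -sum p*log2(p)
H = -sum(p * log2(p) for p in probs.values())
print(f"source entropy: {H:.3f} Bit/symbol")      # ~1.918

# The two codes discussed above
code1 = {"x1": "0",  "x2": "10", "x3": "110", "x4": "111"}
code2 = {"x1": "00", "x2": "01", "x3": "10",  "x4": "11"}

def is_prefix_free(code):
    # No codeword may be a prefix of another codeword
    words = list(code.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

for name, code in (("code 1", code1), ("code 2", code2)):
    L = sum(probs[s] * len(cw) for s, cw in code.items())
    print(f"{name}: prefix-free = {is_prefix_free(code)}, average length = {L} Bit/symbol")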
> Both codes have an average code length:
> L1 = 1*1/3 + 2*1/3 + 3*1/6 + 3*1/6 = 2
> L2 = 2*1/3 + 2*1/3 + 2*1/6 + 2*1/6 = 2
> So it's the same, which was to be expected and which tells us that we
> could choose either one, it doesn't matter.
> The fact that the Huffman codes don't gain anything with respect to
> natural encoding is of no importance here.
>
> Now let's say that we want to transmit the data over a binary channel.
> From the point of view of the channel, we see a transparent bit stream of
> 0 and 1. The channel doesn't know that some Huffman coding took place and
> it just looks at the data to calculate the a-priori probabilities for the
> 0 and 1.
>
> In the case of the first Huffman code, the 0 comes with probability:
> P(0) = 1/3 + 1/3*1/2 + 1/6*1/3 = 10/18 = 5/9 = 0.5556
> In the case of the second Huffman code, the 0 comes with probability:
> P(0) = 1/3*2/2 + 1/3*1/2 + 1/6*1/2 = 7/12 = 0.5833
> They are different! OK, one sees from the codewords that they are
> different. But does this mean that the two Huffman codes are not
> equivalent!?
Actually, the problem here is that you now re-interpret the output of the Huffman code, which comes from a model of four symbols x1, x2, x3, x4, as a *random source* of zero order with two binary symbols 0 and 1, and of course you get a different entropy. The entropy is different because your model is different. The first model had the alphabet x1, x2, x3, x4, and this model has the alphabet 0, 1. However, the second model surely does not reflect reality but is a simplification thereof, as we *know* how the zeros and ones were generated in the first place.
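To make that concrete, a short sketch (again just Python for illustration) that reproduces the quoted P(0) values, using the same per-codeword weighting as in the post, and the resulting zero-order binary entropies:

from math import log2

probs = {"x1": 1/3, "x2": 1/3, "x3": 1/6, "x4": 1/6}
code1 = {"x1": "0",  "x2": "10", "x3": "110", "x4": "111"}
code2 = {"x1": "00", "x2": "01", "x3": "10",  "x4": "11"}

def binary_entropy(p):
    # Entropy of a zero-order binary source with P(0) = p
    return -p * log2(p) - (1 - p) * log2(1 - p)

for name, code in (("code 1", code1), ("code 2", code2)):
    # P(0) weighted per codeword, exactly as in the quoted calculation
    p0 = sum(probs[s] * cw.count("0") / len(cw) for s, cw in code.items())
    print(f"{name}: P(0) = {p0:.4f}, zero-order entropy = {binary_entropy(p0):.4f} Bit/bit")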
> The corresponding entropies (if we consider the two bits 0 and 1 with
> their above calculated probabilities) are:
> after coding with Huffman code 1: H = 0.9911 Bit/bit
> after coding with Huffman code 2: H = 0.9799 Bit/bit
Yes, but entropy relative to a different model.
> If we take the entropy of the source and divide it by two (the average
> codeword length of the Huffman code), we get: H = 0.9591 Bit/bit. Smaller
> than both. Of course, since I can't lose information.
> But after Huffman coding it seems that I would have gained information.
> Yes, the Huffman code was not able to remove some of the redundancy in
> this case. But why has part of the redundancy (and in the two cases,
> different parts) apparently turned into information?
It did not turn into information. Remember, entropy measures the minimum average number of binary questions needed to guess the value of a symbol. For the original source, we know as much as having an alphabet of four symbols with given probabilities. Now you only look at the channel and say, "hey, I only have binary symbols here, and I forget where they came from." If you now need to guess whether we get a zero or a one, you ignore part of the information you already had before (namely that this is a source of four symbols), and hence, on average, you need to ask more questions. It's quite natural!
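A quick numeric check of this (my own sketch, using the P(0) values quoted above): multiply the per-bit entropy of the weaker binary model by the average codeword length, and for both codes you land above the 1.918 Bit/symbol of the four-symbol model, never below it.

from math import log2

H_source = -sum(p * log2(p) for p in (1/3, 1/3, 1/6, 1/6))   # ~1.918 Bit/symbol
avg_len = 2.0                                                # both codes above

def binary_entropy(p):
    return -p * log2(p) - (1 - p) * log2(1 - p)

# P(0) values as quoted in the post
for name, p0 in (("code 1", 5/9), ("code 2", 7/12)):
    # Bits per source symbol if we pretend the stream is a zero-order binary source
    bits_per_symbol = binary_entropy(p0) * avg_len
    print(f"{name}: {bits_per_symbol:.4f} >= {H_source:.4f}")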
> Shannon tells me that if my channel capacity is larger than the entropy
> of the source, I will be able to find a code to transmit the data over the
> channel without losses (of course, in the ideal case).
In the case of a noiseless channel, yes.
> Which is now the entropy to compare with my channel capacity?
That depends on the model, of course. In the first case, you already have the "side information" on the source, so the decoder knows that we have a source (on the sender side) that consists of four symbols. *That* source requires a lower channel capacity than the source where you just forgot this information. In other words, the fact that you *seem* to require a larger channel capacity when you just look at the raw binary values is simply a consequence of not knowing the source well enough and therefore not being able to make use of this information.

Another example? Consider a source that emits 512x512 images. What is the required channel capacity for that? If we model such sources as if each pixel were the result of a zero-order random source, the required channel capacity is enormous (eight bits per pixel, i.e. one byte per symbol). However, if we know that the image stems from the JPEG AIC1 test corpus (~30 images), we can indeed represent each image by five bits! Whoa, what a compression! Knowledge of the source gives better models, and that means better compression. What the channel capacity needs to be for lossless transmission depends on the model, of course, and on your knowledge of the source.
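The arithmetic behind that toy image example, as a sketch (the ~30-image corpus size and the 512x512, 8-bit format are taken from above; the "just send an index" model is only an illustration):

from math import ceil, log2

width, height, bits_per_pixel = 512, 512, 8
corpus_size = 30                        # rough size of the AIC1 test set mentioned above

# Zero-order pixel model: every pixel is an independent 8-bit symbol
naive_bits = width * height * bits_per_pixel     # 2,097,152 bits per image

# Model that knows the image is one of ~30 known images: just send an index
index_bits = ceil(log2(corpus_size))             # 5 bits per image

print(naive_bits, "bits vs.", index_bits, "bits per image")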
> The original one of the source (because there is no more information
> available than that)? Or the apparent entropy after the Huffman encoding,
> because it's the data after the Huffman code which enters the channel?
The original one, in the absence of an even better model. Or rather, with this model the channel capacity needs to be *at least* 1.918 bits per symbol; with a refined model, probably even less would suffice. That the Huffman code requires more is the result of a sub-optimal code (arithmetic coding would come close to the ideal). That you seem to require even more if you just look at the stream of zeros and ones is simply the result of using an even weaker model, namely that of a binary zero-order source.
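To see where the gap between 2 bits and 1.918 bits comes from: the ideal codeword lengths are -log2(P(xi)), which are not integers here, and a per-symbol prefix code must round them up to whole bits, while an arithmetic coder can effectively spend fractional bits. A small sketch:

from math import log2

probs = {"x1": 1/3, "x2": 1/3, "x3": 1/6, "x4": 1/6}
code1 = {"x1": "0", "x2": "10", "x3": "110", "x4": "111"}

for s, p in probs.items():
    # Ideal (fractional) codeword length vs. the integer length Huffman must use
    print(f"{s}: ideal {-log2(p):.3f} bits, Huffman {len(code1[s])} bits")

H = -sum(p * log2(p) for p in probs.values())
L = sum(probs[s] * len(cw) for s, cw in code1.items())
print(f"entropy {H:.3f} Bit/symbol vs. Huffman average {L} Bit/symbol")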
> And in the latter case, it would mean that the average codeword length of
> the Huffman code is of no interest at all (although this is the usual
> measure to compare source codes), but more important would be the
> probability distribution of the 0 and 1 in the resulting bitstream!
Neither is of interest as far as the channel capacity is concerned; the entropy of the source is what matters. The Huffman code is a non-ideal code, so you spend on average more bits than required. Shannon did not say that the Huffman code is the right answer for transmitting this sequence over a channel of limited capacity, just that there is an n-block code, with n large enough, that does. So in essence, you would need to merge n symbols of your source into a meta-symbol, compute the probabilities of such n-blocks, find the ideal encoding for those, and send n to infinity. That gives you a code whose rate lies just an epsilon above the source entropy, with epsilon as small as you like, if you just allow n to be large enough. In reality, use arithmetic coding.

Greetings,
Thomas
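P.S.: Here is a small sketch of the n-block idea (the Huffman routine below is only my illustration, nothing from the thread): merge n source symbols into one meta-symbol, build a Huffman code for the meta-symbols, and watch the average length per original symbol drop from 2.0 towards the entropy of 1.918 as n grows.

import heapq
from itertools import product
from math import log2

probs = {"x1": 1/3, "x2": 1/3, "x3": 1/6, "x4": 1/6}
H = -sum(p * log2(p) for p in probs.values())          # ~1.918 Bit/symbol

def huffman_lengths(pmf):
    # Return the codeword length of each symbol in a Huffman code for pmf
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(pmf.items())]
    heapq.heapify(heap)
    tie = len(heap)                    # tie-breaker so the dicts are never compared
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    return heap[0][2]

for n in (1, 2, 3):
    # A meta-symbol is an n-tuple of independent source symbols
    blocks = {blk: 1.0 for blk in product(probs, repeat=n)}
    for blk in blocks:
        for s in blk:
            blocks[blk] *= probs[s]
    lengths = huffman_lengths(blocks)
    avg = sum(blocks[b] * lengths[b] for b in blocks) / n
    print(f"n = {n}: {avg:.4f} Bit/symbol  (entropy: {H:.4f})")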