Entropy before and after Huffman coding vs. channel capacity

Started by qfran July 3, 2015
On 03.07.2015 at 19:41, qfran wrote:
> Dear all,
> please help me to interpret some strange(?) behaviour of Huffman coding
> and entropy. Let me explain it with an example:
> Consider a source emitting four symbols x1-x4 with probabilities
> P(x1)=1/3, P(x2)=1/3, P(x3)=1/6, P(x4)=1/6.
> Calculating the entropy with H = -sum(P(xi)*log2(P(xi))) gives us: H = 1.918
> Bit/symbol. So far so good.
>
> We perform Huffman coding, arranging the symbols in the order of their
> decreasing probabilities x1 - x4. We aggregate x3 and x4, so that all nodes
> now have probability 1/3. Then we have two possibilities:
> 1. We first aggregate x2 with (x3+x4) and get the Huffman code:
> x1=0, x2=10, x3=110, x4=111.
> 2. We first aggregate x1 with x2 and get the Huffman code:
> x1=00, x2=01, x3=10, x4=11.
Actually, that's not a Huffman code (since Huffman coding starts by merging the two smallest probabilities), but it is a prefix-free code with the same average code length.
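If you want to check the numbers yourself, here is a quick Python sketch (my own, just for illustration) that recomputes the source entropy and verifies that both codes above are prefix-free with average length 2:

from math import log2

# Source model from the post: four symbols with the given probabilities
probs = {"x1": 1/3, "x2": 1/3, "x3": 1/6, "x4": 1/6}

# Source entropy H = -sum p*log2(p)
H = -sum(p * log2(p) for p in probs.values())
print(f"source entropy: {H:.3f} Bit/symbol")      # ~1.918

# The two codes discussed above
code1 = {"x1": "0",  "x2": "10", "x3": "110", "x4": "111"}
code2 = {"x1": "00", "x2": "01", "x3": "10",  "x4": "11"}

def is_prefix_free(code):
    # No codeword may be a prefix of another codeword
    words = list(code.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

for name, code in (("code 1", code1), ("code 2", code2)):
    L = sum(probs[s] * len(cw) for s, cw in code.items())
    print(f"{name}: prefix-free = {is_prefix_free(code)}, average length = {L} Bit/symbol")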
> Both codes have an average code length:
> L1 = 1*1/3 + 2*1/3 + 3*1/6 + 3*1/6 = 2
> L2 = 2*1/3 + 2*1/3 + 2*1/6 + 2*1/6 = 2
> So it's the same, which was to be expected and which tells us that we
> could choose either one, it doesn't matter.
> The fact that the Huffman codes don't gain anything with respect to
> natural encoding is of no importance here.
>
> Now let's say that we want to transmit the data over a binary channel.
> From the point of view of the channel, we see a transparent bit stream of
> 0 and 1. The channel doesn't know that some Huffman coding took place and
> it just looks at the data to calculate the a-priori probabilities for the
> 0 and 1.
>
> In the case of the first Huffman code, the 0 comes with probability:
> P(0) = 1/3 + 1/3*1/2 + 1/6*1/3 = 10/18 = 5/9 = 0.5556
> In the case of the second Huffman code, the 0 comes with probability:
> P(0) = 1/3*2/2 + 1/3*1/2 + 1/6*1/2 = 7/12 = 0.5833
> They are different! OK, one sees from the codewords that they are
> different. But does this mean that the two Huffman codes are not
> equivalent!?
Actually, the problem here is that you now re-interpret the output of the Huffman code, which comes from a model of four symbols x1, x2, x3, x4, as a *random source* of zero order with two binary symbols 0 and 1, and of course you get a different entropy. The entropy is different because your model is different. The first model had the alphabet x1, x2, x3, x4, and this model has the alphabet 0, 1. However, the second model surely does not reflect reality but is a simplification thereof, as we *know* how the zeros and ones were generated in the first place.
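To make that concrete, a short sketch (again just Python for illustration) that reproduces the quoted P(0) values, using the same per-codeword weighting as in the post, and the resulting zero-order binary entropies:

from math import log2

probs = {"x1": 1/3, "x2": 1/3, "x3": 1/6, "x4": 1/6}
code1 = {"x1": "0",  "x2": "10", "x3": "110", "x4": "111"}
code2 = {"x1": "00", "x2": "01", "x3": "10",  "x4": "11"}

def binary_entropy(p):
    # Entropy of a zero-order binary source with P(0) = p
    return -p * log2(p) - (1 - p) * log2(1 - p)

for name, code in (("code 1", code1), ("code 2", code2)):
    # P(0) weighted per codeword, exactly as in the quoted calculation
    p0 = sum(probs[s] * cw.count("0") / len(cw) for s, cw in code.items())
    print(f"{name}: P(0) = {p0:.4f}, zero-order entropy = {binary_entropy(p0):.4f} Bit/bit")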
> The corresponding entropies (if we consider the two bits 0 and 1 with
> their above calculated probabilities) are:
> after coding with Huffman code 1: H = 0.9911 Bit/bit
> after coding with Huffman code 2: H = 0.9799 Bit/bit
Yes, but entropy relative to a different model.
> If we take the entropy of the source and divide it by two (the average
> codeword length of the Huffman code), we get: H = 0.9591 Bit/bit. Smaller
> than both. Of course, since I can't lose information.
> But after Huffman coding it seems that I would have gained information.
> Yes, the Huffman code was not able to remove some of the redundancy in
> this case. But why has part of the redundancy (and in the two cases,
> different parts) apparently turned into information?
It did not turn into information. Remember, entropy measures the minimum average number of binary questions needed to guess the value of a symbol. For the original source, we know as much as having an alphabet of four symbols with given probabilities. Now you only look at the channel and say, "hey, I only have binary symbols here, and I forget where they came from." If you now need to guess whether we get a zero or a one, you ignore part of the information you already had before (namely that this is a source of four symbols), and hence, on average, you need to ask more questions. It's quite natural!
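A quick numeric check of this (my own sketch, using the P(0) values quoted above): multiply the per-bit entropy of the weaker binary model by the average codeword length, and for both codes you land above the 1.918 Bit/symbol of the four-symbol model, never below it.

from math import log2

H_source = -sum(p * log2(p) for p in (1/3, 1/3, 1/6, 1/6))   # ~1.918 Bit/symbol
avg_len = 2.0                                                # both codes above

def binary_entropy(p):
    return -p * log2(p) - (1 - p) * log2(1 - p)

# P(0) values as quoted in the post
for name, p0 in (("code 1", 5/9), ("code 2", 7/12)):
    # Bits per source symbol if we pretend the stream is a zero-order binary source
    bits_per_symbol = binary_entropy(p0) * avg_len
    print(f"{name}: {bits_per_symbol:.4f} >= {H_source:.4f}")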
> Shannon tells me that if my channel capacity is larger than the entropy
> of the source, I will be able to find a code to transmit the data over the
> channel without losses (of course, in the ideal case).
In the case of a noiseless channel, yes.
> Which is now the entropy to compare with my channel capacity?
That depends on the model, of course. In the first case, you already have the "side information" on the source, so the decoder knows that we have a source (on the sender side) that consists of four symbols. *That* source requires a lower channel capacity than the source where you just forgot this information. In other words, the fact that you *seem* to require a larger channel capacity when you just look at the raw binary values is simply a consequence of not knowing the source well enough and therefore not being able to make use of this information.

Another example? Consider a source that emits 512x512 images. What is the required channel capacity for that? If we model such sources as if each pixel were the result of a zero-order random source, the required channel capacity is enormous (eight bits per pixel, i.e. one byte per symbol). However, if we know that the image stems from the JPEG AIC1 test corpus (~30 images), we can indeed represent each image by five bits! Whoa, what a compression! Knowledge of the source gives better models, and that means better compression. What the channel capacity needs to be for lossless transmission depends on the model, of course, and on your knowledge of the source.
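The arithmetic behind that toy image example, as a sketch (the ~30-image corpus size and the 512x512, 8-bit format are taken from above; the "just send an index" model is only an illustration):

from math import ceil, log2

width, height, bits_per_pixel = 512, 512, 8
corpus_size = 30                        # rough size of the AIC1 test set mentioned above

# Zero-order pixel model: every pixel is an independent 8-bit symbol
naive_bits = width * height * bits_per_pixel     # 2,097,152 bits per image

# Model that knows the image is one of ~30 known images: just send an index
index_bits = ceil(log2(corpus_size))             # 5 bits per image

print(naive_bits, "bits vs.", index_bits, "bits per image")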
> The original one of the source (because there is no more information
> available than that)? Or the apparent entropy after the Huffman encoding,
> because it's the data after the Huffman code which enters the channel?
The original one, in the absence of an even better model. Or rather, with this model the channel capacity needs to be *at least* 1.918 bits per symbol; with a refined model, probably even less would suffice. That the Huffman code requires more is the result of a sub-optimal code (arithmetic coding would come close to the ideal). That you seem to require even more if you just look at the stream of zeros and ones is simply the result of using an even weaker model, namely that of a binary zero-order source.
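To see where the gap between 2 bits and 1.918 bits comes from: the ideal codeword lengths are -log2(P(xi)), which are not integers here, and a per-symbol prefix code must round them up to whole bits, while an arithmetic coder can effectively spend fractional bits. A small sketch:

from math import log2

probs = {"x1": 1/3, "x2": 1/3, "x3": 1/6, "x4": 1/6}
code1 = {"x1": "0", "x2": "10", "x3": "110", "x4": "111"}

for s, p in probs.items():
    # Ideal (fractional) codeword length vs. the integer length Huffman must use
    print(f"{s}: ideal {-log2(p):.3f} bits, Huffman {len(code1[s])} bits")

H = -sum(p * log2(p) for p in probs.values())
L = sum(probs[s] * len(cw) for s, cw in code1.items())
print(f"entropy {H:.3f} Bit/symbol vs. Huffman average {L} Bit/symbol")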
> And in the latter case, it would mean that the average codeword length of
> the Huffman code is of no interest at all (although this is the usual
> measure to compare source codes), but more important would be the
> probability distribution of the 0 and 1 in the resulting bitstream!
Neither is of interest as far as the channel capacity is concerned; the entropy of the source is what matters. The Huffman code is a non-ideal code, so you spend on average more bits than required. Shannon did not say that the Huffman code is the right answer for transmitting this sequence over a channel of limited capacity, just that there is an n-block code, with n large enough, that does. So in essence, you would need to merge n symbols of your source into a meta-symbol, compute the probabilities of such n-blocks, find the ideal encoding for those, and send n to infinity. That gives you a code whose rate lies just an epsilon above the source entropy, with epsilon as small as you like, if you just allow n to be large enough. In reality, use arithmetic coding.

Greetings,
Thomas
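P.S.: Here is a small sketch of the n-block idea (the Huffman routine below is only my illustration, nothing from the thread): merge n source symbols into one meta-symbol, build a Huffman code for the meta-symbols, and watch the average length per original symbol drop from 2.0 towards the entropy of 1.918 as n grows.

import heapq
from itertools import product
from math import log2

probs = {"x1": 1/3, "x2": 1/3, "x3": 1/6, "x4": 1/6}
H = -sum(p * log2(p) for p in probs.values())          # ~1.918 Bit/symbol

def huffman_lengths(pmf):
    # Return the codeword length of each symbol in a Huffman code for pmf
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(pmf.items())]
    heapq.heapify(heap)
    tie = len(heap)                    # tie-breaker so the dicts are never compared
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    return heap[0][2]

for n in (1, 2, 3):
    # A meta-symbol is an n-tuple of independent source symbols
    blocks = {blk: 1.0 for blk in product(probs, repeat=n)}
    for blk in blocks:
        for s in blk:
            blocks[blk] *= probs[s]
    lengths = huffman_lengths(blocks)
    avg = sum(blocks[b] * lengths[b] for b in blocks) / n
    print(f"n = {n}: {avg:.4f} Bit/symbol  (entropy: {H:.4f})")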