Polar Coding Notes: A Simple Proof

Lyons ZhangNovember 8, 2018

For any B-DMC $W$, the channels $\{W_N^{(i)}\}$ polarize in the sense that, for any fixed $\delta \in (0, 1)$, as $N$ goes to infinity through powers of two, the fraction of indices $i \in \{1, \dots, N\}$ for which $I(W_N^{(i)}) \in (1 − \delta, 1]$ goes to $I(W)$ and the fraction for which $I(W_N^{(i)}) \in [0, \delta)$ goes to $1−I(W)^{[1]}$.

Mrs. Gerber’s Lemma

Mrs. Gerber’s Lemma provides a lower bound on the entropy of the modulo-$2$ sum of two binary random vectors$^{[2][3]}$.  

Let $h^{-1} : [0, 1] \to [0, 1/2]$ be the inverse of the binary entropy function $h(p) = -p\log p - (1-p)\log(1-p)$.  

Here we set a binary symmetric channal with crossover probability $p_0$.  

The convolution of $a$ and $b$ is denoted by  

$a \ast b := a(1 − b) + (1 − a)b$

Convex: The function $f(u) = h(h^{-1}(u)\ast p_0), u \in [0,1]$ is convex in $u$ for every fixed $p_0 \in (0,1/2]^{[2]}$.  

Scalar MGL: Let $X$ be a binary random variable and let $U$ be an arbitrary random variable. If $Z \sim Bern(p)$ is independent of $(X, U)$ and $Y = X \oplus Z$, then

$h(Y|U) \ge h(h^{−1}(h(X|U)) \ast p)$

Vector MGL: Let $X^n$ be a binary random vector and $U$ be an arbitrary random variable. If $Z^n$ is a vector of independent and identically distributed $Bern(p)$ random variables independent of $(X^n, U)$ and $Y^n = X^n \oplus Z^n$, then  

${h(Y^n|U) \over n} \ge h(h^{−1}({h(X^n|U) \over n}) \ast p)$

Let $X,Y$ are two binary random variable, from$^{[2]}$  

$H(X|Y) = \sum_{y\in\{0,1\}} p(Y=y)H(X|Y=y) $  


$\beta_0(y) = p(X=1|Y=y) \tag{1}$
$1 - \beta_0(y) = p(X=0|Y=y)$

Then from conditional entropy definition   

$\begin{align} H(X|Y=y) &= \sum_{x\in\{0,1\}} p(x|y)\log p(x|y) \\&= h(\beta_0(y)) \tag{2}\end{align}$

So that  

$h^{-1}(H(X|Y=y)) = p(X=1|Y=y) \tag{3}$

and consider $\beta_0(y)=p(X=1|Y)$ as a RV

$\begin{align} H(X|Y) &= \sum_{y\in\{0,1\}} p(Y=y)H(X|Y=y) \\&= \sum_{y\in\{0,1\}} p(Y=y)h(\beta_0(y)) \\&= Eh(\beta_0) \end{align}$

Let us consider first  

$\begin{align} H(X_1,X_2, ... ,X_n) &= \sum_{i=1}^n H(X_i|X_{i-1}, ... , X_1) \\&= \sum_{i=1}^n Eh(\beta_k) \end{align}$

Here $\beta_k = p(X_k=1 | X_1, ... , X_{k-1}), 1 \le k \le n$. Now we have  

$\begin{align} p(Y_k=1 | X_1, ... , X_{k-1}) &= \sum_{X_k \in \{0,1\}} p(Y_k | X_k) p(X_k | X_1, ... , X_{k-1}) \\&= p_0 \ast \beta_k \end{align}$


$H(Y_k=1 | X_1, ... , X_{k-1}) = Eh(p_0 \ast \beta_k)$

Strict Polarization for Binary-Input Channels

First notice that  

$\begin{align} I(W^-) &= I(U_1; Y_1,Y_2) \\&=  I(X_1+X_2; Y_1,Y_2) \\&= H(X_1+X_2) - H(X_1+X_2|Y_1,Y_2) \end{align}$


$I(W) = I(X_1; Y_1) = 1- H(X_1|Y_1)$

Since $H(X_1|Y_1) \in (0, 1)$, there exists an $\alpha \in (0, 1/2)$ such that $H(X_1|Y_1) = h(\alpha)$.  

$\begin{align} H(X_1+X_2|Y_1,Y_2) &= \sum_{y_1,y_2} p(y_1)p(y_2)H(X_1+X_2|y_1y_2) \\&= \sum_{y_1,y_2} p(y_1)p(y_2)h\big(\beta_0(y_1,y_2)\big) \tag{from (2)} \\&= \sum_{y_1,y_2} p(y_1)p(y_2)h\big(p(X_1+X_2 = 1|y_1y_2)\big) \tag{from (1)} \\&= \sum_{y_1,y_2} p(y_1)p(y_2)h\big(p(X_1=1|y_1)p(X_2=0|y_2) + p(X_1=0|y_1)p(X_2=1|y_2)\big) \\&= \sum_{y_1,y_2} p(y_1)p(y_2)h\big(p(X_1=1|y_1)\ast p(X_2=1|y_2)\big) \\&= \sum_{y_1,y_2} p(y_1)p(y_2)h\Big(h^{-1}\big(H(X_1|y_1)\big)\ast h^{-1}\big(H(X_2|y_2)\big)\Big) \tag{from (3)} \\&\ge \sum_{y_2} p(y_2)h\Big(h^{-1}\big(\sum_{y_1}p(y_1)H(X_1|y_1)\big)\ast h^{-1}\big(H(X_2|y_2)\big)\Big) \tag{Jensen’s inequality} \\&= \sum_{y_2} p(y_2)h\Big(h^{-1}\big(H(X_1|Y_1)\big)\ast h^{-1}\big(H(X_2|y_2)\big)\Big) \\&\ge h\Big(h^{-1}\big(H(X_1|Y_1)\big)\ast h^{-1}\big(H(X_2|Y_2)\big)\Big) \tag{Jensen’s inequality} \\&= h(\alpha \ast \alpha) \end{align}$

Thus, we have that  

$I(W^-) \le 1 − h(\alpha \ast \alpha) < 1 − h(\alpha) = I(W)$

So what we have concluded is that for every $\delta > 0$, there exists $\kappa(\delta) > 0$ such that if $I(W) \in (\delta, 1 − \delta)$, we have  

$\Delta(W) = {1\over 2}[I(W^+) - I(W^-) \ge \kappa(\delta) >0$

Proof of Channel Polarization 

Given $W$ and $\delta > 0$, define$^{[4][5]}$  

$\begin{align} \theta_n(\delta) &:= {1\over 2^n} \#\big\{s \in \{+, −\}^n : I(W^s) \in (\delta, 1-\delta)\big\} \tag{$\#$ means the cardinality of a set} \\&=  {1\over 2^n} \sum_{s \in \{\pm\}^n} \mathbb{1}_{\{I(W^s) \in (\delta, 1-\delta])\}} \tag{$\mathbb{1}_{\{\cdot\}}=1$ if {$\cdot$} is true or esle $=0$} \end{align}$


$\begin{align} \mu_n &= {1\over 2^n} \sum_{s \in \{\pm\}^n} I(W^s) \tag{e.g. $\mu_1 = {1\over 2}[I(W^+)+I(W^-)]$} \\ \nu_n &= {1\over 2^n} \sum_{s \in \{\pm\}^n} [I(W^s)]^2 \tag{e.g. $\nu_1 = {1\over 2} [I^2(W^+)+I^2(W^-)]$} \end{align}$

We have 

$\begin{align}\mu_{n+1} &= {1\over 2^{n+1}} \sum_{s \in \{\pm\}^{n+1}} I(W^s) \\&= {1\over 2^n} \sum_{t \in \{\pm\}^n} {1\over 2} [I(W^{t+})+I(W^{t-})] \\&= {1\over 2^n} \sum_{t \in \{\pm\}^n} I(W^t) \\&= \mu_n = \mu_0 = I(W) \\ \nu_{n+1} &= {1\over 2^{n+1}} \sum_{s \in \{\pm\}^{n+1}} [I(W^s)]^2 \\&= {1\over 2^n} \sum_{t \in \{\pm\}^n} {[I(W^{t+})]^2+[I(W^{t-}]^2 \over 2} \\&= {1\over 2^n} \sum_{t \in \{\pm\}^n} [I(W^t)]^2+[\Delta(W^t)]^2 \tag{${a^2+b^2 \over 2} = ({a+b \over 2})^2+({a-b \over 2})^2$} \\&\ge \nu_n + \theta_n(\delta)\kappa(\delta)^2 \tag{definition of $\theta_n(\delta)$} \end{align}$

The sequence $\nu_n$ is thus bounded and monotone and consequently convergent; in particular $\nu_{n+1}−\nu_n$ converges to zero. As $\theta_n$ is sandwiched by  

$0 \le \theta_n(\delta) \le {\nu_{n+1} - \nu_n \over \kappa(\delta)^2}$

between two quantities both convergent to zero, we conclude  

$\lim\limits_{n \to \infty} \theta_n(\delta) = 0, \forall\delta > 0$

This means that for large enough $n$, the fraction of mediocre channels (i.e., those with symmetric capacities in $(\delta, 1 − \delta)$) vanishes to zero. But by preservation of mutual information, we also know that if we define  

$\alpha_n(\delta) = {1\over 2^n} \sum_{s \in \{\pm\}^n} \mathbb{1}_{\{I(W^s) \ge 1-\delta\}}$
$\beta_n(\delta) = {1\over 2^n} \sum_{s \in \{\pm\}^n} \mathbb{1}_{\{I(W^s) \le \delta\}}$

we automatically have  

$\lim\limits_{n \to \infty} \alpha_n(\delta) = I(W)$
$\lim\limits_{n \to \infty} \beta_n(\delta) = 1 - I(W)$

This M.Alsan and E.Telatar's proof is much simpler than the martingale convergence theorem used by Arıkan$^{[1]}$.


1. E. Arikan. Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels. IEEE Trans. on Information Theory, vol.55, no.7, pp.3051–3073, July 2009.  

2. A. D. Wyner and J. Ziv. A theorem on the entropy of certain binary sequences and applications (Part I). IEEE Trans.Inform.Theory, vol.19, no.6, pp.769-772, Nov.1973.  

3. Abbas El Gamal and Young-Han Kim. Network Information Theory. Cambridge University Press. 2011.  

4. M.Alsan and E.Telatar. A simple proof of polarization and polarization for non-stationary memoryless channels. IEEE Trans.Info.Theory, vol.62, no.9,pp.4873-4878. 2016.  

5. Vincent Y. F. Tan. EE5139R: Information Theory for Communication Systems:2016/7, Semester 1. https://www.ece.nus.edu.sg/  

6. Eren Sasoglu. Polarization and Polar Codes. Foundations and Trends in Communications and Information Theory Vol. 8, No. 4 (2011) 259–381 

To post reply to a comment, click on the 'reply' button attached to each comment. To post a new comment (not a reply to a comment) check out the 'Write a Comment' tab at the top of the comments.

Please login (on the right) if you already have an account on this platform.

Otherwise, please use this form to register (free) an join one of the largest online community for Electrical/Embedded/DSP/FPGA/ML engineers: