Polar Coding Notes: A Simple Proof

Lyons Zhang, November 8, 2018

For any B-DMC $W$, the channels $\{W_N^{(i)}\}$ polarize in the sense that, for any fixed $\delta \in (0, 1)$, as $N$ goes to infinity through powers of two, the fraction of indices $i \in \{1, \dots, N\}$ for which $I(W_N^{(i)}) \in (1 - \delta, 1]$ goes to $I(W)$ and the fraction for which $I(W_N^{(i)}) \in [0, \delta)$ goes to $1 - I(W)$ $^{[1]}$.

Mrs. Gerber’s Lemma

Mrs. Gerber’s Lemma provides a lower bound on the entropy of the modulo-$2$ sum of two binary random vectors$^{[2][3]}$.  

Let $h^{-1} : [0, 1] \to [0, 1/2]$ be the inverse of the binary entropy function $h(p) = -p\log p - (1-p)\log(1-p)$.  

Throughout, we consider a binary symmetric channel (BSC) with crossover probability $p_0$.  

The convolution of $a$ and $b$ is denoted by  

$a \ast b := a(1-b) + (1-a)b$
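
As a quick sanity check, here is a minimal Python sketch of the binary entropy function, a numerical inverse on $[0, 1/2]$, and the binary convolution. The helper names `h`, `h_inv`, and `conv` are my own and are reused in the sketches below.

```python
import numpy as np

def h(p):
    """Binary entropy in bits; h(0) = h(1) = 0 by convention."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def h_inv(u, tol=1e-12):
    """Numerical inverse of h restricted to [0, 1/2], by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) < u:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def conv(a, b):
    """Binary convolution a * b = a(1-b) + (1-a)b."""
    return a * (1 - b) + (1 - a) * b

print(h(0.11))           # ~0.5 bit
print(h_inv(0.5))        # ~0.11, since h(0.11) ~ 0.5
print(conv(0.2, 0.3))    # 0.2*0.7 + 0.8*0.3 = 0.38
```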
  

Convexity$^{[2]}$: The function $f(u) = h(h^{-1}(u)\ast p_0), u \in [0,1]$, is convex in $u$ for every fixed $p_0 \in (0,1/2]$.  
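
A quick numerical spot-check of this convexity claim (not a proof), reusing the `h`, `h_inv`, and `conv` helpers sketched above:

```python
import numpy as np

# Uses h, h_inv, conv from the sketch above.
p0 = 0.2                                      # any fixed value in (0, 1/2]
f = lambda u: h(conv(h_inv(u), p0))

# Midpoint convexity on a grid: f((u + v)/2) <= (f(u) + f(v))/2.
us = np.linspace(0.0, 1.0, 101)
ok = all(f((u + v) / 2) <= (f(u) + f(v)) / 2 + 1e-9
         for u in us for v in us)
print(ok)                                     # expected: True
```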

Scalar MGL: Let $X$ be a binary random variable and let $U$ be an arbitrary random variable. If $Z \sim Bern(p)$ is independent of $(X, U)$ and $Y = X \oplus Z$, then

$h(Y|U) \ge h(h^{-1}(h(X|U)) \ast p)$
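
Here is a brute-force numerical check of the scalar statement on a randomly drawn joint distribution, taking $U$ binary purely for simplicity (the random pmf and the variable names are my own choices), again reusing the helpers above:

```python
import numpy as np

# Uses h, h_inv, conv from the sketch above.
rng = np.random.default_rng(0)
p = 0.15                                   # Z ~ Bern(p), independent of (X, U)

# A random joint pmf of (U, X), with U binary for simplicity.
pux = rng.random((2, 2))
pux /= pux.sum()
pu = pux.sum(axis=1)                       # marginal of U

# H(X|U), and H(Y|U) for Y = X xor Z: P(Y=1|U=u) = P(X=1|U=u) * p (binary convolution).
H_X_given_U = sum(pu[u] * h(pux[u, 1] / pu[u]) for u in range(2))
H_Y_given_U = sum(pu[u] * h(conv(pux[u, 1] / pu[u], p)) for u in range(2))

print(H_Y_given_U >= h(conv(h_inv(H_X_given_U), p)) - 1e-9)   # expected: True
```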
 

Vector MGL: Let $X^n$ be a binary random vector and $U$ be an arbitrary random variable. If $Z^n$ is a vector of independent and identically distributed $Bern(p)$ random variables independent of $(X^n, U)$ and $Y^n = X^n \oplus Z^n$, then  

${h(Y^n|U) \over n} \ge h(h^{-1}({h(X^n|U) \over n}) \ast p)$
  

Let $X, Y$ be two binary random variables. Following$^{[2]}$, write  

$H(X|Y) = \sum_{y\in\{0,1\}} p(Y=y)H(X|Y=y) $  

Let  

$\beta_0(y) = p(X=1|Y=y) \tag{1}$
  
$1 - \beta_0(y) = p(X=0|Y=y)$
  

Then, from the definition of conditional entropy,   

$\begin{align} H(X|Y=y) &= -\sum_{x\in\{0,1\}} p(x|y)\log p(x|y) \\&= h(\beta_0(y)) \tag{2}\end{align}$
  

So that  

$h^{-1}(H(X|Y=y)) = p(X=1|Y=y) \tag{3}$

(strictly speaking, $h^{-1}(H(X|Y=y)) = \min\{\beta_0(y), 1-\beta_0(y)\}$ since $h^{-1}$ maps into $[0, 1/2]$, but this distinction will not matter below because $h(a \ast b) = h((1-a) \ast b)$).
  

Now consider $\beta_0(Y)=p(X=1|Y)$ as a random variable:

$\begin{align} H(X|Y) &= \sum_{y\in\{0,1\}} p(Y=y)H(X|Y=y) \\&= \sum_{y\in\{0,1\}} p(Y=y)h(\beta_0(y)) \\&= Eh(\beta_0) \end{align}$
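
A minimal worked example of $H(X|Y) = E\,h(\beta_0)$, taking $W$ to be a BSC with crossover probability $p_0$ and a uniform input (an assumption of the example, not of the lemma):

```python
import numpy as np

# Uses h from the sketch above. Worked example: BSC(p0) with a uniform input.
p0 = 0.11
pxy = np.array([[0.5 * (1 - p0), 0.5 * p0],    # p(x=0, y=0), p(x=0, y=1)
                [0.5 * p0, 0.5 * (1 - p0)]])   # p(x=1, y=0), p(x=1, y=1)

py = pxy.sum(axis=0)                            # marginal of Y
beta0 = pxy[1, :] / py                          # beta_0(y) = P(X=1 | Y=y)
H_X_given_Y = sum(py[y] * h(beta0[y]) for y in range(2))
print(H_X_given_Y, h(p0))                       # both equal h(p0) here
```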
  

Next, consider the chain rule expansion  

$\begin{align} H(X_1,X_2, \dots ,X_n) &= \sum_{k=1}^n H(X_k|X_1, \dots , X_{k-1}) \\&= \sum_{k=1}^n Eh(\beta_k) \end{align}$
  

Here $\beta_k = p(X_k=1 | X_1, ... , X_{k-1}), 1 \le k \le n$. Now we have  

$\begin{align} p(Y_k=1 | X_1, \dots , X_{k-1}) &= \sum_{x_k \in \{0,1\}} p(Y_k=1 | X_k=x_k)\, p(X_k=x_k | X_1, \dots , X_{k-1}) \\&= p_0 \ast \beta_k \end{align}$
  

and  

$H(Y_k | X_1, \dots , X_{k-1}) = Eh(p_0 \ast \beta_k)$
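
A small numerical check of these two identities for $n = 2$, $k = 2$, on a randomly drawn joint pmf of $(X_1, X_2)$ (the pmf and names are my own choices), reusing the helpers above:

```python
import numpy as np

# Uses h, conv from the sketch above. Check with n = 2, k = 2.
rng = np.random.default_rng(1)
p0 = 0.2                                   # Z_k ~ Bern(p0), i.i.d.
px = rng.random((2, 2)); px /= px.sum()    # joint pmf of (X1, X2)

# Direct route: build p(x1, y2) with Y2 = X2 xor Z2, then compute H(Y2 | X1).
p_x1y2 = np.zeros((2, 2))
for x1 in range(2):
    for x2 in range(2):
        for z in range(2):
            p_z = p0 if z == 1 else 1 - p0
            p_x1y2[x1, x2 ^ z] += px[x1, x2] * p_z
p_x1 = p_x1y2.sum(axis=1)
H_direct = sum(p_x1[x1] * h(p_x1y2[x1, 1] / p_x1[x1]) for x1 in range(2))

# Formula route: E h(p0 * beta_2) with beta_2(x1) = P(X2 = 1 | X1 = x1).
beta2 = px[:, 1] / px.sum(axis=1)
H_formula = sum(px.sum(axis=1)[x1] * h(conv(p0, beta2[x1])) for x1 in range(2))
print(np.isclose(H_direct, H_formula))     # expected: True
```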
  

Strict Polarization for Binary-Input Channels

First notice that  

$\begin{align} I(W^-) &= I(U_1; Y_1,Y_2) \\&=  I(X_1+X_2; Y_1,Y_2) \\&= H(X_1+X_2) - H(X_1+X_2|Y_1,Y_2) \\&= 1 - H(X_1+X_2|Y_1,Y_2) \tag{$X_1+X_2$ is uniform for uniform, independent inputs} \end{align}$
  

and  

$I(W) = I(X_1; Y_1) = 1- H(X_1|Y_1)$
  

Since $H(X_1|Y_1) \in (0, 1)$, there exists an $\alpha \in (0, 1/2)$ such that $H(X_1|Y_1) = h(\alpha)$.  

$\begin{align} H(X_1+X_2|Y_1,Y_2) &= \sum_{y_1,y_2} p(y_1)p(y_2)H(X_1+X_2|y_1y_2) \\&= \sum_{y_1,y_2} p(y_1)p(y_2)h\big(\beta_0(y_1,y_2)\big) \tag{from (2)} \\&= \sum_{y_1,y_2} p(y_1)p(y_2)h\big(p(X_1+X_2 = 1|y_1y_2)\big) \tag{from (1)} \\&= \sum_{y_1,y_2} p(y_1)p(y_2)h\big(p(X_1=1|y_1)p(X_2=0|y_2) + p(X_1=0|y_1)p(X_2=1|y_2)\big) \\&= \sum_{y_1,y_2} p(y_1)p(y_2)h\big(p(X_1=1|y_1)\ast p(X_2=1|y_2)\big) \\&= \sum_{y_1,y_2} p(y_1)p(y_2)h\Big(h^{-1}\big(H(X_1|y_1)\big)\ast h^{-1}\big(H(X_2|y_2)\big)\Big) \tag{from (3)} \\&\ge \sum_{y_2} p(y_2)h\Big(h^{-1}\big(\sum_{y_1}p(y_1)H(X_1|y_1)\big)\ast h^{-1}\big(H(X_2|y_2)\big)\Big) \tag{Jensen’s inequality} \\&= \sum_{y_2} p(y_2)h\Big(h^{-1}\big(H(X_1|Y_1)\big)\ast h^{-1}\big(H(X_2|y_2)\big)\Big) \\&\ge h\Big(h^{-1}\big(H(X_1|Y_1)\big)\ast h^{-1}\big(H(X_2|Y_2)\big)\Big) \tag{Jensen’s inequality} \\&= h(\alpha \ast \alpha) \end{align}$
  

Thus, we have that  

$I(W^-) \le 1 - h(\alpha \ast \alpha) < 1 - h(\alpha) = I(W)$

where the strict inequality holds because $\alpha \ast \alpha = 2\alpha(1-\alpha) \in (\alpha, 1/2]$ for $\alpha \in (0, 1/2)$.
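
For a concrete instance, take $W$ to be a BSC with crossover probability $p$ and uniform input; then $\alpha = p$, and for the BSC the bound above is in fact tight, so the gap is easy to evaluate numerically. A small sketch, reusing the helpers above:

```python
import numpy as np

# Uses h, conv from the sketch above. W = BSC(p) with uniform input.
p = 0.11
alpha = p                                  # H(X1|Y1) = h(p), so alpha = p here
I_W = 1 - h(p)
I_W_minus = 1 - h(conv(p, p))              # exact for the BSC; equals the bound 1 - h(alpha*alpha)
print(I_W_minus, I_W)                      # ~0.29 < ~0.50, i.e. I(W^-) < I(W)
```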
  

In summary, for every $\delta > 0$ there exists $\kappa(\delta) > 0$ such that if $I(W) \in (\delta, 1 - \delta)$, then  

$\Delta(W) := {1\over 2}\big[I(W^+) - I(W^-)\big] \ge \kappa(\delta) > 0$

Indeed, since $I(W^+) + I(W^-) = 2I(W)$, we have $\Delta(W) = I(W) - I(W^-) \ge h(\alpha \ast \alpha) - h(\alpha)$, and the right-hand side is bounded away from zero when $\alpha$ stays away from $0$ and $1/2$, i.e. when $I(W) = 1 - h(\alpha) \in (\delta, 1 - \delta)$.
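
The sketch below illustrates this for the BSC family only (an assumption of the illustration, not of the proof): sweeping $p$, the gap $\Delta(W) = I(W) - I(W^-)$ stays bounded away from zero whenever $I(W) \in (\delta, 1-\delta)$.

```python
import numpy as np

# Uses h, conv from the sketch above. BSC family only, as an illustration.
delta = 0.1
ps = np.linspace(1e-4, 0.5, 2000)
I = 1 - h(ps)
Delta = h(conv(ps, ps)) - h(ps)     # Delta(W) = I(W) - I(W^-), since I(W^+) + I(W^-) = 2 I(W)
mask = (I > delta) & (I < 1 - delta)
print(Delta[mask].min())            # strictly positive: an empirical kappa(delta) for BSCs
```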
  

Proof of Channel Polarization 

Given $W$ and $\delta > 0$, define$^{[4][5]}$  

$\begin{align} \theta_n(\delta) &:= {1\over 2^n} \#\big\{s \in \{+, -\}^n : I(W^s) \in (\delta, 1-\delta)\big\} \tag{$\#$ denotes the cardinality of a set} \\&=  {1\over 2^n} \sum_{s \in \{\pm\}^n} \mathbb{1}_{\{I(W^s) \in (\delta, 1-\delta)\}} \tag{$\mathbb{1}_{\{\cdot\}}=1$ if $\{\cdot\}$ holds} \end{align}$
  

Let  

$\begin{align} \mu_n &= {1\over 2^n} \sum_{s \in \{\pm\}^n} I(W^s) \tag{e.g. $\mu_1 = {1\over 2}[I(W^+)+I(W^-)]$} \\ \nu_n &= {1\over 2^n} \sum_{s \in \{\pm\}^n} [I(W^s)]^2 \tag{e.g. $\nu_1 = {1\over 2} [I^2(W^+)+I^2(W^-)]$} \end{align}$

We have 

$\begin{align}\mu_{n+1} &= {1\over 2^{n+1}} \sum_{s \in \{\pm\}^{n+1}} I(W^s) \\&= {1\over 2^n} \sum_{t \in \{\pm\}^n} {1\over 2} \big[I(W^{t+})+I(W^{t-})\big] \\&= {1\over 2^n} \sum_{t \in \{\pm\}^n} I(W^t) \tag{preservation: $I(W^{t+})+I(W^{t-}) = 2I(W^t)$} \\&= \mu_n = \mu_0 = I(W) \\ \nu_{n+1} &= {1\over 2^{n+1}} \sum_{s \in \{\pm\}^{n+1}} [I(W^s)]^2 \\&= {1\over 2^n} \sum_{t \in \{\pm\}^n} {[I(W^{t+})]^2+[I(W^{t-})]^2 \over 2} \\&= {1\over 2^n} \sum_{t \in \{\pm\}^n} \Big\{[I(W^t)]^2+[\Delta(W^t)]^2\Big\} \tag{${a^2+b^2 \over 2} = ({a+b \over 2})^2+({a-b \over 2})^2$} \\&\ge \nu_n + \theta_n(\delta)\kappa(\delta)^2 \tag{$\Delta(W^t) \ge \kappa(\delta)$ whenever $I(W^t) \in (\delta, 1-\delta)$} \end{align}$
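
To see both recursions at work numerically, here is a small sketch using a BEC, for which the one-step transforms have the simple closed form $I(W^-) = I^2$ and $I(W^+) = 2I - I^2$ (this closed form is an assumption of the illustration, not something the proof uses):

```python
import numpy as np

def polarize(Is):
    """One polarization step on an array of symmetric capacities (BEC closed form)."""
    return np.concatenate([Is**2, 2*Is - Is**2])   # I(W^-) = I^2, I(W^+) = 2I - I^2

Is = np.array([0.6])                               # I(W) for a BEC with erasure prob. 0.4
for n in range(1, 11):
    Is = polarize(Is)
    mu, nu = Is.mean(), (Is**2).mean()
    print(n, round(float(mu), 6), round(float(nu), 6))
# mu_n stays fixed at I(W) = 0.6, while nu_n is nondecreasing in n.
```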

The sequence $\nu_n$ is thus bounded and monotone and consequently convergent; in particular $\nu_{n+1}-\nu_n$ converges to zero. As $\theta_n$ is sandwiched by  

$0 \le \theta_n(\delta) \le {\nu_{n+1} - \nu_n \over \kappa(\delta)^2}$
  

between two quantities both convergent to zero, we conclude  

$\lim\limits_{n \to \infty} \theta_n(\delta) = 0, \forall\delta > 0$
  

This means that as $n$ grows, the fraction of mediocre channels (i.e., those with symmetric capacity in $(\delta, 1 - \delta)$) vanishes. But by preservation of mutual information, we also know that if we define  

$\alpha_n(\delta) = {1\over 2^n} \sum_{s \in \{\pm\}^n} \mathbb{1}_{\{I(W^s) \ge 1-\delta\}}$
  
$\beta_n(\delta) = {1\over 2^n} \sum_{s \in \{\pm\}^n} \mathbb{1}_{\{I(W^s) \le \delta\}}$
  

we automatically have (using $\alpha_n(\delta) + \beta_n(\delta) + \theta_n(\delta) = 1$ together with the preserved mean $\mu_n = I(W)$)  

$\lim\limits_{n \to \infty} \alpha_n(\delta) = I(W)$
  
$\lim\limits_{n \to \infty} \beta_n(\delta) = 1 - I(W)$
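
Continuing the BEC sketch above (same `polarize` function and the same illustrative assumptions), one can also watch $\theta_n$, $\alpha_n$, and $\beta_n$:

```python
import numpy as np

# Continues the BEC sketch above: same polarize(), I(W) = 0.6, delta = 0.1.
delta = 0.1
Is = np.array([0.6])
for n in range(1, 21):
    Is = polarize(Is)
    theta = np.mean((Is > delta) & (Is < 1 - delta))
    alpha = np.mean(Is >= 1 - delta)
    beta  = np.mean(Is <= delta)
    if n % 4 == 0:
        print(n, float(theta), float(alpha), float(beta))
# theta_n shrinks toward 0, while alpha_n and beta_n approach I(W) = 0.6 and 1 - I(W) = 0.4.
```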

This proof, due to M. Alsan and E. Telatar$^{[4]}$, is much simpler than Arıkan's original argument, which relies on the martingale convergence theorem$^{[1]}$.


References:  

1. E. Arıkan. Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels. IEEE Trans. Inform. Theory, vol. 55, no. 7, pp. 3051–3073, Jul. 2009.  

2. A. D. Wyner and J. Ziv. A theorem on the entropy of certain binary sequences and applications: Part I. IEEE Trans. Inform. Theory, vol. 19, no. 6, pp. 769–772, Nov. 1973.  

3. A. El Gamal and Y.-H. Kim. Network Information Theory. Cambridge University Press, 2011.  

4. M. Alsan and E. Telatar. A simple proof of polarization and polarization for non-stationary memoryless channels. IEEE Trans. Inform. Theory, vol. 62, no. 9, pp. 4873–4878, Sep. 2016.  

5. Vincent Y. F. Tan. EE5139R: Information Theory for Communication Systems, 2016/7, Semester 1. https://www.ece.nus.edu.sg/  

6. E. Sasoglu. Polarization and Polar Codes. Foundations and Trends in Communications and Information Theory, vol. 8, no. 4, pp. 259–381, 2011.


