Lecture 11 - 2025/4/29
Define the channel capacity $C := \max_{P(X)} I(X; Y)$.
Channel Coding Theorem
If $\text{Rate} < C$, the error probability $\to 0$ (as the block length $n \to \infty$).
If $\text{Rate} > C$, the error probability stays bounded away from zero: $\text{error} \ge \varepsilon_0$ for some $\varepsilon_0 > 0$.
Asymptotic Equipartition Property (AEP)
The law of large numbers
For $X_1, \cdots, X_n$ i.i.d., $\frac{1}{n}\sum_{i=1}^{n} X_i \to E(X)$ in probability, i.e.
$$P\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i - E(X) \right| \ge \varepsilon \right) \xrightarrow{n \to \infty} 0$$
More generally, for $X_1, \cdots, X_n$ i.i.d. and any function $g$ with $E|g(X)| < \infty$,
$$P\left( \left| \frac{1}{n} \sum_{i=1}^{n} g(X_i) - E(g(X)) \right| \ge \varepsilon \right) \to 0$$
Now let $X$ be a discrete r.v. with pmf $p(x) := \Pr(X = x)$, and take $g(x) := \log \frac{1}{p(x)}$, so that $E(g(X)) = H(X)$. Then
$$P\left( \left| \frac{1}{n} \sum_{i=1}^{n} \log \frac{1}{p(X_i)} - E\left(\log \frac{1}{p(X)}\right) \right| \ge \varepsilon \right) \to 0$$
Since the $X_i$ are i.i.d., $\frac{1}{n} \sum_{i=1}^{n} \log \frac{1}{p(X_i)} = -\frac{1}{n} \log P(X_1, \cdots, X_n)$, so w.p. $\approx 1$,
$$P(X_1, \cdots, X_n) \in 2^{-n(H(X) \pm \varepsilon)}$$
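As a quick sanity check (my own illustration, not from the lecture), here is a minimal Python sketch, assuming a Bernoulli$(0.3)$ source, showing $-\frac{1}{n}\log_2 P(X_1, \cdots, X_n)$ concentrating around $H(X)$ as $n$ grows:

```python
import numpy as np

p = 0.3                                            # assumed Bernoulli parameter
H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))   # H(X) in bits

rng = np.random.default_rng(0)
for n in [100, 1_000, 10_000, 100_000]:
    x = rng.random(n) < p                          # i.i.d. Bernoulli(p) draws
    # -(1/n) log2 P(x_1, ..., x_n) = (1/n) sum_i log2(1 / p(x_i))
    sample_entropy = -np.mean(np.where(x, np.log2(p), np.log2(1 - p)))
    print(f"n = {n:6d}   -(1/n) log2 P = {sample_entropy:.4f}   H(X) = {H:.4f}")
```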
This motivates the typical set (over a binary alphabet): $A \subseteq \{0, 1\}^n$ with
$$A := \{ (x_1, \cdots, x_n) : P(x_1, \cdots, x_n) \in 2^{-n(H(X) \pm \varepsilon)} \}, \qquad A^c = \{0, 1\}^n \setminus A,$$
so $P(A) \approx 1$ and $P(A^c) \approx 0$.
We only need to care about $A$: every sequence in it has $P(x_1, \cdots, x_n) \approx 2^{-nH(X)}$, and since $P(A) \approx 1$, it follows that $|A| \approx 2^{nH(X)}$.
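For small $n$ one can verify this by brute force (again my own sketch, assuming a Bernoulli$(0.3)$ source): enumerate $\{0,1\}^n$, collect $A$, and compare $P(A)$ and $|A|$ with the predictions.

```python
import itertools
import numpy as np

p, n, eps = 0.3, 16, 0.1                  # assumed source, block length, slack
H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

prob_A, size_A = 0.0, 0
for x in itertools.product([0, 1], repeat=n):
    k = sum(x)                            # number of ones
    prob = p**k * (1 - p)**(n - k)        # P(x_1, ..., x_n)
    # typical  <=>  2^{-n(H+eps)} <= prob <= 2^{-n(H-eps)}
    if 2 ** (-n * (H + eps)) <= prob <= 2 ** (-n * (H - eps)):
        prob_A += prob
        size_A += 1

print(f"P(A) = {prob_A:.3f}")             # -> 1 only as n grows
print(f"|A| = {size_A},  2^(nH) = {2 ** (n * H):.0f}")
```

For $n$ this small the approximations are loose ($P(A)$ comes out well below $1$); both estimates tighten as $n \to \infty$.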
Jointly Typical Sequences
Let $(X, Y) \sim p(x, y)$, and let $(X_1, Y_1), \cdots, (X_n, Y_n)$ be i.i.d. $\sim p(x, y)$.
Jointly Typical Set
$$A := \left\{ (x_1, y_1, \cdots, x_n, y_n) : \begin{aligned} P_{XY}(x_1, y_1, \cdots, x_n, y_n) &\in 2^{-n(H(X, Y) \pm \varepsilon)} \\ P_X(x_1, \cdots, x_n) &\in 2^{-n(H(X) \pm \varepsilon)} \\ P_Y(y_1, \cdots, y_n) &\in 2^{-n(H(Y) \pm \varepsilon)} \end{aligned} \right\}$$
$P(A) \approx 1$, and $\forall (x_1, \cdots, y_n) \in A$, $P(x_1, \cdots, y_n) \approx 2^{-nH(X, Y)}$, so $|A| \approx 2^{nH(X, Y)}$.
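A quick worked example (numbers of my own choosing): if $X \sim$ Bernoulli$(1/2)$ and $Y$ is $X$ passed through a binary symmetric channel with crossover probability $0.1$, then $H(X) = H(Y) = 1$ and $H(X, Y) = H(X) + H(Y \mid X) = 1 + h(0.1) \approx 1.469$ bits, so $|A| \approx 2^{1.469 n}$, far fewer than the $2^{2n}$ possible pairs of sequences.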
How to generate $X_1, Y_1, \cdots, X_n, Y_n$? Either draw each pair jointly, $(X_i, Y_i) \sim P(X, Y)$, or equivalently draw $X_1, \cdots, X_n \sim P(X)$ first and then draw each $Y_i \sim P(Y \mid X = X_i)$.
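A minimal sketch of the second, two-stage recipe, assuming for concreteness $X \sim$ Bernoulli$(1/2)$ and $P(Y \mid X)$ given by a binary symmetric channel with crossover probability $0.1$ (my choice of example):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p_x, p_flip = 10, 0.5, 0.1            # assumed: Bernoulli(1/2) input, BSC(0.1)

x = (rng.random(n) < p_x).astype(int)    # X_1, ..., X_n ~ P(X)
flip = (rng.random(n) < p_flip).astype(int)
y = x ^ flip                             # Y_i ~ P(Y | X = X_i): flip X_i w.p. 0.1

print("x =", x)
print("y =", y)
```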
For a fixed typical sequence $x_1, x_2, \cdots, x_n$, the number of sequences $y_1, \cdots, y_n$ jointly typical with it is $\approx 2^{nH(Y \mid X)}$.
If we instead pick typical sequences of $X$ and $Y$ independently, the probability that they form a jointly typical pair is
$$\frac{2^{nH(X, Y)}}{2^{n(H(X) + H(Y))}} = 2^{-n I(X; Y)}$$
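To see this rate concretely, a rough Monte Carlo sketch (my own, reusing the Bernoulli$(1/2)$-through-BSC$(0.1)$ example): draw $x^n \sim P_X$ and, independently, $y^n \sim P_Y$, then count how often the pair lands in the jointly typical set.

```python
import numpy as np

def h(q):                                   # binary entropy in bits
    return -(q * np.log2(q) + (1 - q) * np.log2(1 - q))

rng = np.random.default_rng(0)
n, eps, trials = 20, 0.1, 200_000           # assumed parameters

H_XY = 1.0 + h(0.1)                         # H(X,Y) = H(X) + H(Y|X) ~= 1.469
I = 1.0 + 1.0 - H_XY                        # I(X;Y) = H(X) + H(Y) - H(X,Y)

hits = 0
for _ in range(trials):
    x = rng.integers(0, 2, n)               # x^n ~ P_X
    y = rng.integers(0, 2, n)               # y^n ~ P_Y, independent of x^n
    # Marginal typicality is automatic here: P_X(x^n) = P_Y(y^n) = 2^{-n}.
    k = np.sum(x == y)                      # matches; P_XY is 0.45 or 0.05 per symbol
    log2_pxy = k * np.log2(0.45) + (n - k) * np.log2(0.05)
    if abs(-log2_pxy / n - H_XY) <= eps:    # joint typicality check
        hits += 1

print(f"empirical = {hits / trials:.2e},  2^(-nI) = {2 ** (-n * I):.2e}")
```

The two numbers agree only up to the $2^{\pm n\varepsilon}$ slack that the $\pm \varepsilon$ in the definition allows, which is the expected looseness at this block length.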