Lecture 5 - 2025 / 3 / 18

For general $P = (p_1, \cdots, p_n)$, $Q = (q_1, \cdots, q_n)$:

Let $\tilde P = \left( \sum_{p_i > q_i} p_i, \sum_{p_i \le q_i} p_i \right)$, $\tilde Q = \left( \sum_{p_i > q_i} q_i, \sum_{p_i \le q_i} q_i \right)$

Then $\Vert P - Q \Vert_1 = \Vert \tilde P - \tilde Q \Vert_1$
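A quick numerical check of this reduction (a sketch; the distributions `P`, `Q` below are arbitrary examples, not from the lecture):

```python
import math

def l1(P, Q):
    """L1 distance between two probability vectors of equal length."""
    return sum(abs(p - q) for p, q in zip(P, Q))

def reduce_to_two(P, Q):
    """Collapse P and Q onto the two events {i : p_i > q_i} and {i : p_i <= q_i}."""
    hi = [i for i in range(len(P)) if P[i] > Q[i]]
    lo = [i for i in range(len(P)) if P[i] <= Q[i]]
    P2 = (sum(P[i] for i in hi), sum(P[i] for i in lo))
    Q2 = (sum(Q[i] for i in hi), sum(Q[i] for i in lo))
    return P2, Q2

P = (0.5, 0.2, 0.2, 0.1)
Q = (0.25, 0.25, 0.25, 0.25)
P2, Q2 = reduce_to_two(P, Q)
print(l1(P, Q), l1(P2, Q2))  # the two L1 distances coincide
```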

Entropy Rate

r.v. $X$, $P = (0.01, 0.99)$

(1) $H(X) \approx 0.08$ bit

(2) Min average code length $= 1$ bit

Ratio $= \dfrac{1}{H(X)} \approx 12.4$, too large!
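The numbers above can be checked directly (a minimal sketch, entropy in base-2 logs):

```python
import math

def H(P):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in P if p > 0)

P = (0.01, 0.99)
print(H(P))      # ~0.0808 bits per symbol
# any prefix-free code for a single symbol still needs at least 1 bit,
# so the overhead ratio is 1 / H(P) ~ 12.4
print(1 / H(P))
```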

$X_1, X_2, \cdots, X_t, \cdots$ i.i.d. $\sim P$

Pack $(X_1, \cdots, X_T)$ together:

(1) $H(X_1, \cdots, X_T) = T \cdot H(X)$

(2) Min average code length $\le T \cdot H(X) + 1$ bit

Ratio $= \dfrac{T \cdot H(X) + 1}{T \cdot H(X)} = 1 + O\!\left(\dfrac{1}{T}\right)$

Per r.v.: $\dfrac{T \cdot H(X) + 1}{T} \to H(X)$
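The $+1$ overhead amortizes over the block. A sketch of the per-symbol upper bound as $T$ grows:

```python
import math

def H(P):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in P if p > 0)

h = H((0.01, 0.99))
for T in (1, 10, 100, 1000):
    # coding a block of T symbols costs at most T*h + 1 bits,
    # i.e. at most h + 1/T bits per symbol
    print(T, (T * h + 1) / T)
```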

Source $\mathcal X : X_1, X_2, \cdots, X_t, \cdots$

$H(\mathcal X) = \lim_{T \to \infty} \dfrac{1}{T} H(X_1, \cdots, X_T)$

$H(\mathcal X) = \lim_{T \to \infty} H(X_T \mid X_1, \cdots, X_{T-1})$ (for a stationary source, both limits exist and coincide)
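For an i.i.d. source both definitions reduce to $H(X)$, since the joint pmf is the product distribution. A small sketch (the alphabet and pmf are arbitrary examples):

```python
import math
from itertools import product

def H(P):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in P if p > 0)

P = (0.3, 0.7)
for T in (1, 2, 3):
    # joint pmf of T i.i.d. draws = product distribution over tuples
    joint = [math.prod(c) for c in product(P, repeat=T)]
    print(T, H(joint) / T)  # equals H(P) for every T in the i.i.d. case
```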

Differential Entropy

Continuous r.v. $X$ with pdf $f(x)$:

$h(X) := - \displaystyle\int_{-\infty}^{+\infty} f(x) \log f(x) \, \mathrm{d}x$

Continuous r.v. $X$ with pdf $f(x)$ $\Rightarrow$ discrete r.v. $X_\Delta$ ($\Delta > 0$), where $P(X_\Delta = i) = \displaystyle\int_{(i-1)\Delta}^{i\Delta} f(x) \, \mathrm{d}x$

$h(X) + \log \dfrac{1}{\Delta} \approx H(X_\Delta)$ (as $\Delta \to 0$)
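A numerical check of this relation for a standard normal, using the midpoint approximation $P(X_\Delta = i) \approx f(i\Delta)\,\Delta$ and natural logs (a sketch, not part of the lecture):

```python
import math

def phi(x):
    """pdf of the standard normal N(0, 1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

delta = 0.01
# midpoint approximation of the cell probabilities, covering |x| <= 10
cells = [phi(i * delta) * delta for i in range(-1000, 1001)]
H_delta = -sum(p * math.log(p) for p in cells if p > 0)   # H(X_Delta), nats
h = 0.5 + math.log(math.sqrt(2 * math.pi))                # h(X) for N(0,1), nats
print(H_delta, h + math.log(1 / delta))  # the two sides are close
```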

Discrete r.v. $X$, r.v. $Z = aX$, $a > 0$: then $H(X) = H(Z)$

Continuous r.v. $X$, r.v. $Y = aX$, $a > 0$: then $h(Y) = h(X) + \log a \ne h(X)$ in general
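The uniform distribution makes the shift visible exactly: $h(\mathrm{Unif}[0, L]) = \log L$, so scaling by $a$ adds $\log a$. A sketch in nats:

```python
import math

# differential entropy of Uniform[0, L] in nats:
#   h = -integral over [0, L] of (1/L) * log(1/L) dx = log L
def h_uniform(L):
    return math.log(L)

a = 4.0
print(h_uniform(1.0))                  # h(X) = 0 for X ~ Unif[0, 1]
print(h_uniform(a))                    # h(aX) = log a, not equal to h(X)
print(h_uniform(a) - h_uniform(1.0))   # the shift is exactly log a
```

In contrast, relabeling a discrete alphabet by $x \mapsto ax$ permutes nothing in the pmf, so $H$ is unchanged.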

r.v. $X \sim N(\mu, \sigma^2)$: $h(X) = \dfrac{1}{2} + \log(\sqrt{2\pi}\,\sigma)$ (in nats)
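The formula can be verified by numerically integrating $-\int f \log f$ (a sketch; $\mu$ and $\sigma$ are arbitrary choices):

```python
import math

mu, sigma = 0.0, 2.0

def f(x):
    """pdf of N(mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Riemann-sum approximation of h(X) = -∫ f(x) log f(x) dx, in nats
dx = 0.001
n = int(20 * sigma / dx)  # grid covering mu +/- 10 sigma
h_num = -sum(f(mu - 10 * sigma + i * dx) * math.log(f(mu - 10 * sigma + i * dx)) * dx
             for i in range(n))
h_formula = 0.5 + math.log(math.sqrt(2 * math.pi) * sigma)
print(h_num, h_formula)  # agree to several decimals
```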