Lecture 4 - 2025 / 3 / 11

Conditioning reduces entropy

$H(Y) \ge H(Y | X)$
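
A minimal numerical sketch of this inequality (the small joint PMF below is made up for illustration; it is not from the lecture):

```python
# Check H(Y) >= H(Y|X) on an arbitrary made-up joint PMF.
import math

# joint[i][j] = P(X = x_i, Y = y_j)
joint = [[0.3, 0.1],
         [0.1, 0.5]]

def H(dist):
    """Shannon entropy in bits of a probability vector."""
    return sum(p * math.log2(1 / p) for p in dist if p > 0)

p_x = [sum(row) for row in joint]          # marginal of X
p_y = [sum(col) for col in zip(*joint)]    # marginal of Y

# H(Y|X) = sum_i P(X = x_i) * H(Y | X = x_i)
H_Y_given_X = sum(
    px * H([pxy / px for pxy in row])
    for px, row in zip(p_x, joint) if px > 0
)

print(f"H(Y)   = {H(p_y):.4f} bits")
print(f"H(Y|X) = {H_Y_given_X:.4f} bits")
assert H(p_y) >= H_Y_given_X - 1e-12       # conditioning reduces entropy
```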

Mutual Information

Definition: For r.v. $X, Y$, their mutual information is defined as $I(X;Y) := H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)$

$I(X; Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} P(X=x_i, Y=y_j) \log \dfrac{P(X=x_i, Y=y_j)}{P(X=x_i) P(Y=y_j)}$

$I(X; Y) \ge 0$
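
A short sketch, again on a made-up joint PMF, computing $I(X;Y)$ both from the double-sum formula and from the entropy identity, and checking $I(X;Y) \ge 0$:

```python
# Compute I(X;Y) two ways and confirm they agree and are nonnegative.
import math

joint = [[0.3, 0.1],
         [0.1, 0.5]]           # illustrative joint PMF

def H(dist):
    return sum(p * math.log2(1 / p) for p in dist if p > 0)

p_x = [sum(row) for row in joint]
p_y = [sum(col) for col in zip(*joint)]

# Double-sum definition: sum_{i,j} P(x_i, y_j) log [ P(x_i, y_j) / (P(x_i) P(y_j)) ]
I_sum = sum(
    pxy * math.log2(pxy / (p_x[i] * p_y[j]))
    for i, row in enumerate(joint)
    for j, pxy in enumerate(row) if pxy > 0
)

# Entropy identity: I(X;Y) = H(X) + H(Y) - H(X,Y)
I_ent = H(p_x) + H(p_y) - H([pxy for row in joint for pxy in row])

print(f"I(X;Y) via double sum : {I_sum:.4f} bits")
print(f"I(X;Y) via entropies  : {I_ent:.4f} bits")
assert abs(I_sum - I_ent) < 1e-12 and I_sum >= -1e-12
```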

Joint Entropy for r.v. $X_1, \cdots, X_n$

$H(X_1, \cdots, X_n) = \sum_{i_1, \cdots, i_n} p_{i_1, \cdots, i_n} \log_2 \dfrac{1}{p_{i_1, \cdots, i_n}}$

Conditional Entropy for r.v. $X_1, \cdots, X_n, Y_1, \cdots, Y_n$ ...

Mutual Information for r.v. $X_1, \cdots, X_n, Y_1, \cdots, Y_n$ ...

Chain rule: $H(X_1, \cdots, X_n) = H(X_1) + H(X_2 | X_1) + \cdots + H(X_n | X_1 \cdots X_{n-1})$
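
A sanity check of the chain rule (the joint PMF over three binary variables is randomly generated, purely for illustration); each conditional entropy is computed directly from its definition and the sum is compared with the joint entropy:

```python
# Verify H(X1,X2,X3) = H(X1) + H(X2|X1) + H(X3|X1,X2) on a random joint PMF.
import itertools, math, random

random.seed(0)
X = [0, 1]
outcomes = list(itertools.product(X, repeat=3))      # all (x1, x2, x3) triples
w = [random.random() for _ in outcomes]
s = sum(w)
joint = {o: wi / s for o, wi in zip(outcomes, w)}    # random joint PMF (illustrative)

def prob(prefix):
    """P(X_1..X_k = prefix), marginalizing out the remaining coordinates."""
    return sum(p for o, p in joint.items() if o[:len(prefix)] == prefix)

def H_next_given_prefix(k):
    """H(X_{k+1} | X_1..X_k), computed directly from the definition."""
    h = 0.0
    for prefix in itertools.product(X, repeat=k):
        p_prefix = prob(prefix)
        if p_prefix == 0:
            continue
        for x in X:
            p = prob(prefix + (x,)) / p_prefix       # P(X_{k+1}=x | X_1..X_k=prefix)
            if p > 0:
                h += p_prefix * p * math.log2(1 / p)
    return h

lhs = sum(p * math.log2(1 / p) for p in joint.values() if p > 0)   # H(X1,X2,X3)
rhs = sum(H_next_given_prefix(k) for k in range(3))                # H(X1)+H(X2|X1)+H(X3|X1,X2)
print(f"H(X1,X2,X3) = {lhs:.6f}, chain-rule sum = {rhs:.6f}")
assert abs(lhs - rhs) < 1e-10
```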

Kullback-Leibler Divergence (KL-divergence) / Relative Entropy

r.v. $X$ with true PMF $P = (p_1, \cdots, p_n)$ (unknown), and an estimate (approximation) $Q = (q_1, \cdots, q_n)$

Definition (KL-divergence / relative entropy): For PMFs $P = (p_1, \cdots, p_n)$, $Q = (q_1, \cdots, q_n)$,
$D(P \Vert Q) := \sum_{i} p_i \log_2 \dfrac{p_i}{q_i}$
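
A small sketch of the definition on made-up $P$ and $Q$, illustrating that $D(P \Vert Q) \ge 0$ and that $D$ is not symmetric:

```python
# KL divergence from the definition, on illustrative PMFs.
import math

def D(P, Q):
    """KL divergence in bits; assumes q_i > 0 wherever p_i > 0."""
    return sum(p * math.log2(p / q) for p, q in zip(P, Q) if p > 0)

P = [0.5, 0.3, 0.2]   # "true" PMF (illustrative)
Q = [0.4, 0.4, 0.2]   # approximation

print(f"D(P||Q) = {D(P, Q):.4f} bits")
print(f"D(Q||P) = {D(Q, P):.4f} bits")   # generally different: D is not symmetric
assert D(P, Q) >= 0 and D(Q, P) >= 0     # relative entropy is nonnegative
assert D(P, P) == 0                      # and equals zero when the PMFs coincide
```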

Entropy is concave:
$H(\lambda P + (1-\lambda) P') \ge \lambda H(P) + (1-\lambda) H(P')$
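
A numerical spot-check of concavity on random PMFs (random trials only, not a proof):

```python
# Spot-check H(lam*P + (1-lam)*P') >= lam*H(P) + (1-lam)*H(P') on random PMFs.
import math, random

random.seed(1)

def H(P):
    return sum(p * math.log2(1 / p) for p in P if p > 0)

def random_pmf(n):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

for _ in range(1000):
    P, P2 = random_pmf(4), random_pmf(4)
    lam = random.random()
    mix = [lam * p + (1 - lam) * q for p, q in zip(P, P2)]
    assert H(mix) >= lam * H(P) + (1 - lam) * H(P2) - 1e-12
print("concavity held on 1000 random trials")
```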

Is KL divergence convex or concave?

KL divergence is jointly convex in the pair $(P, Q)$:
$D(\lambda P + (1-\lambda) P' \Vert \lambda Q + (1 - \lambda) Q') \le \lambda D(P \Vert Q) + (1-\lambda) D(P' \Vert Q')$
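
The same kind of spot-check for joint convexity (random trials only, not a proof):

```python
# Spot-check D(mix of P's || mix of Q's) <= lam*D(P||Q) + (1-lam)*D(P'||Q').
import math, random

random.seed(2)

def D(P, Q):
    return sum(p * math.log2(p / q) for p, q in zip(P, Q) if p > 0)

def random_pmf(n):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

for _ in range(1000):
    P, P2, Q, Q2 = (random_pmf(4) for _ in range(4))
    lam = random.random()
    mixP = [lam * a + (1 - lam) * b for a, b in zip(P, P2)]
    mixQ = [lam * a + (1 - lam) * b for a, b in zip(Q, Q2)]
    assert D(mixP, mixQ) <= lam * D(P, Q) + (1 - lam) * D(P2, Q2) + 1e-12
print("joint convexity held on 1000 random trials")
```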

What is the relation between $\Vert P - Q \Vert_1 = \sum_{i=1}^{n} |p_i - q_i|$ and $D(P \Vert Q)$?
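
For reference, one well-known answer (not stated in these notes) is Pinsker's inequality; with $D$ measured in bits as defined above, it reads:

$\Vert P - Q \Vert_1 \le \sqrt{2 \ln 2 \cdot D(P \Vert Q)}$, i.e. $D(P \Vert Q) \ge \dfrac{1}{2 \ln 2} \Vert P - Q \Vert_1^2$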