Lecture 4 - 2025/3/11
Conditioning reduces entropy
H(Y)≥H(Y∣X)
Definition: For r.v. X, Y, define their mutual information as I(X;Y) := H(X) − H(X∣Y) = H(Y) − H(Y∣X) = H(X) + H(Y) − H(X,Y)
I(X;Y) = ∑_{i=1}^{n} ∑_{j=1}^{m} P(X=xi, Y=yj) log ( P(X=xi, Y=yj) / (P(X=xi) P(Y=yj)) )
I(X;Y) ≥ 0, with equality iff X and Y are independent
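Below is a minimal numeric sketch (the joint PMF and the helper names `H`, `P_XY` are my own, not from the lecture) checking on one example that conditioning reduces entropy and that the three expressions for I(X;Y) and the double-sum formula agree and are nonnegative.

```python
import numpy as np

# Joint PMF of (X, Y); rows index values of X, columns index values of Y.
P_XY = np.array([[0.3, 0.1],
                 [0.2, 0.4]])

P_X = P_XY.sum(axis=1)   # marginal PMF of X
P_Y = P_XY.sum(axis=0)   # marginal PMF of Y

def H(p):
    """Shannon entropy in bits of a PMF given as a NumPy array."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Conditional entropies computed directly: H(Y|X) = sum_x P(x) * H(Y | X=x)
H_Y_given_X = sum(P_X[i] * H(P_XY[i, :] / P_X[i]) for i in range(2))
H_X_given_Y = sum(P_Y[j] * H(P_XY[:, j] / P_Y[j]) for j in range(2))

# Conditioning reduces entropy: H(Y) >= H(Y|X)
assert H(P_Y) >= H_Y_given_X

# The three expressions for I(X;Y) agree, and match the double-sum form
I1 = H(P_X) - H_X_given_Y
I2 = H(P_Y) - H_Y_given_X
I3 = H(P_X) + H(P_Y) - H(P_XY.ravel())
I4 = sum(P_XY[i, j] * np.log2(P_XY[i, j] / (P_X[i] * P_Y[j]))
         for i in range(2) for j in range(2) if P_XY[i, j] > 0)
assert np.allclose([I1, I2, I3], I4) and I4 >= 0

print(f"H(Y) = {H(P_Y):.4f}, H(Y|X) = {H_Y_given_X:.4f}, I(X;Y) = {I4:.4f}")
```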
Joint Entropy for r.v. X1,⋯,Xn
H(X1,⋯,Xn) = ∑_{i1,⋯,in} p_{i1,⋯,in} log2 (1 / p_{i1,⋯,in})
Conditional entropy for r.v. X1,⋯,Xn, Y1,⋯,Yn: defined analogously...
Mutual information for r.v. X1,⋯,Xn, Y1,⋯,Yn: defined analogously...
Chain rule: H(X1,⋯,Xn) = H(X1) + H(X2∣X1) + ⋯ + H(Xn∣X1,⋯,Xn−1)
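A quick numeric check of the chain rule for n = 3, on a randomly generated joint PMF of my own choosing; the conditional entropies are computed directly from the conditional distributions rather than via the identity being tested.

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((2, 3, 2))
p /= p.sum()                       # joint PMF of (X1, X2, X3)

def H(q):
    """Entropy in bits of a PMF given as a NumPy array of any shape."""
    q = q[q > 0]
    return -np.sum(q * np.log2(q))

p1  = p.sum(axis=(1, 2))           # P(X1)
p12 = p.sum(axis=2)                # P(X1, X2)

# H(X2|X1) = sum_{x1} P(x1) * H( P(X2 | X1 = x1) )
H_2_given_1 = sum(p1[i] * H(p12[i, :] / p1[i]) for i in range(p.shape[0]))

# H(X3|X1,X2) = sum_{x1,x2} P(x1,x2) * H( P(X3 | X1 = x1, X2 = x2) )
H_3_given_12 = sum(p12[i, j] * H(p[i, j, :] / p12[i, j])
                   for i in range(p.shape[0]) for j in range(p.shape[1]))

# Chain rule: H(X1,X2,X3) = H(X1) + H(X2|X1) + H(X3|X1,X2)
assert np.isclose(H(p), H(p1) + H_2_given_1 + H_3_given_12)
print(f"H(X1,X2,X3) = {H(p):.4f}")
```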
Kullback-Leibler Divergence (KL-divergence) / Relative Entropy
For a r.v. X, the true PMF is P=(p1,⋯,pn) (unknown); Q=(q1,⋯,qn) is an estimate (approximation) of it.
Definition (KL-divergence / relative entropy): For PMFs P=(p1,⋯,pn), Q=(q1,⋯,qn),
D(P∥Q) := ∑_i pi log2 (pi / qi)
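A short sketch of the definition with made-up PMFs P and Q (the helper `kl` is my own): it illustrates that D(P∥Q) ≥ 0, that D(P∥P) = 0, and that the divergence is not symmetric, so it is not a metric.

```python
import numpy as np

def kl(p, q):
    """D(P||Q) in bits; assumes q_i > 0 wherever p_i > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

P = [0.5, 0.3, 0.2]   # stand-in for the unknown true PMF (toy values)
Q = [0.4, 0.4, 0.2]   # an approximation of P

print(kl(P, Q))       # nonnegative
print(kl(Q, P))       # generally differs from kl(P, Q): not symmetric
print(kl(P, P))       # zero, since the two PMFs coincide
```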
Entropy is concave:
H(λP+(1−λ)P′) ≥ λH(P) + (1−λ)H(P′) for all λ ∈ [0,1]
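A numeric spot-check of concavity on two made-up PMFs, sweeping λ over [0, 1]; the entropy of the mixture should dominate the mixture of the entropies at every λ.

```python
import numpy as np

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

P  = np.array([0.7, 0.2, 0.1])
Pp = np.array([0.1, 0.3, 0.6])

for lam in np.linspace(0.0, 1.0, 11):
    mix = lam * P + (1 - lam) * Pp
    # Entropy of the mixture dominates the mixture of entropies
    assert H(mix) >= lam * H(P) + (1 - lam) * H(Pp) - 1e-12
```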
In what sense is KL divergence convex?
- With Q fixed, is it convex in P?
- With P fixed, is it convex in Q?
- Is it jointly convex in the pair (P,Q)?
KL divergence is jointly convex in the pair (P,Q):
D(λP+(1−λ)P′ ∥ λQ+(1−λ)Q′) ≤ λD(P∥Q) + (1−λ)D(P′∥Q′) for all λ ∈ [0,1].
In particular, joint convexity implies convexity in P with Q fixed, and in Q with P fixed.
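The same style of spot-check for joint convexity, with made-up pairs (P, Q) and (P′, Q′) and λ swept over [0, 1].

```python
import numpy as np

def kl(p, q):
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

P,  Q  = np.array([0.6, 0.3, 0.1]), np.array([0.3, 0.3, 0.4])
Pp, Qp = np.array([0.2, 0.5, 0.3]), np.array([0.4, 0.4, 0.2])

for lam in np.linspace(0.0, 1.0, 11):
    lhs = kl(lam * P + (1 - lam) * Pp, lam * Q + (1 - lam) * Qp)
    rhs = lam * kl(P, Q) + (1 - lam) * kl(Pp, Qp)
    assert lhs <= rhs + 1e-12   # joint convexity of D(.||.)
```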
What is the relation between ∥P−Q∥_1 = ∑_{i=1}^n ∣pi−qi∣ and D(P∥Q)?
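The question is left open here; as a way to build intuition (not an answer), one can tabulate both quantities for a few made-up pairs of PMFs:

```python
import numpy as np

def kl(p, q):
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

pairs = [
    (np.array([0.5, 0.5]), np.array([0.6, 0.4])),
    (np.array([0.9, 0.1]), np.array([0.5, 0.5])),
    (np.array([0.5, 0.3, 0.2]), np.array([0.2, 0.3, 0.5])),
]
for P, Q in pairs:
    l1 = np.sum(np.abs(P - Q))   # total variation-style L1 distance
    print(f"||P-Q||_1 = {l1:.4f},  D(P||Q) = {kl(P, Q):.4f}")
```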