Lecture 3 - 2025/3/4
r.v. $X = (m_1, \cdots, m_n)$ with pmf $p = (p_1, \cdots, p_n)$
$$H(X) := \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i} \quad \text{(bits)}$$
Information
- r.v. $X$: $H(X)$ bits
- message $m_i$: $\log_2 \frac{1}{p_i}$ bits (see the sketch below)
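To make this concrete, here is a minimal Python sketch (the pmf values are made up for illustration) that computes the per-message information $\log_2 \frac{1}{p_i}$ and the entropy $H(X)$:

```python
import math

# Hypothetical pmf for illustration
p = [0.5, 0.25, 0.125, 0.125]

# Information content of each message m_i: log2(1/p_i) bits
info = [math.log2(1 / pi) for pi in p]

# Entropy H(X) = sum_i p_i * log2(1/p_i): the expected information per message
H = sum(pi * math.log2(1 / pi) for pi in p)

print(info)  # [1.0, 2.0, 3.0, 3.0]
print(H)     # 1.75
```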
$X: p_1, p_2, \cdots, p_{n-1}, p_n$, where $p_n = q_1 + q_2$
$Y: \frac{q_1}{q_1 + q_2}, \frac{q_2}{q_1 + q_2}$
$Z: p_1, p_2, \cdots, p_{n-1}, q_1, q_2$
$$H(Z) = H(X) + p_n H(Y)$$
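A quick numerical check of this grouping identity, with made-up probabilities:

```python
import math

def H(p):
    # Entropy in bits; zero-probability terms are skipped
    return sum(pi * math.log2(1 / pi) for pi in p if pi > 0)

q1, q2 = 0.1, 0.2
X = [0.4, 0.3, q1 + q2]               # last symbol has p_n = q1 + q2 = 0.3
Y = [q1 / (q1 + q2), q2 / (q1 + q2)]  # how the last symbol splits
Z = [0.4, 0.3, q1, q2]

pn = q1 + q2
print(H(Z), H(X) + pn * H(Y))  # both ≈ 1.8464
```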
Huffman Encoding
r.v. $X$, $P = (p_1, \cdots, p_n)$
Code $C = (c_1, \cdots, c_n)$, $c_i \in \{0,1\}^*$
Goal: $\sum_i p_i |c_i|$ is minimal; such a $C$ is called an optimal code.
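One standard way to build such a code is the Huffman construction. Below is a minimal Python sketch (the distribution is made up, and ties may be broken differently, so the exact codewords can vary):

```python
import heapq

def huffman(probs):
    """Return a prefix code (list of bit strings) minimizing sum p_i * |c_i|."""
    # Heap entries: (probability, tie-breaking counter, symbol indices in this subtree)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    codes = [''] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)   # two least probable subtrees
        p2, _, s2 = heapq.heappop(heap)
        for i in s1:
            codes[i] = '0' + codes[i]     # prepend one bit for this merge step
        for i in s2:
            codes[i] = '1' + codes[i]
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return codes

p = [0.4, 0.3, 0.2, 0.1]                           # made-up distribution
c = huffman(p)
print(c)                                           # e.g. ['0', '10', '111', '110']
print(sum(pi * len(ci) for pi, ci in zip(p, c)))   # expected length 1.9 bits
```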
Properties of Optimal Code
- Assume $p_1 \ge p_2 \ge \cdots \ge p_n$; then $|c_1| \le |c_2| \le \cdots \le |c_n|$.
- $|c_n| = |c_{n-1}|$, and $c_{n-1}$, $c_n$ can be taken to be sibling nodes in the code tree (they differ only in the last bit).
- If $(c_1, \cdots, c_n)$ is an optimal code for $(p_1, \cdots, p_n)$, then $(c_1, \cdots, c_{n-2}, \tilde{c}_{n-1})$ is an optimal code for $(p_1, \cdots, p_{n-2}, p_{n-1} + p_n)$, where $\tilde{c}_{n-1}$ is the common prefix of $c_{n-1}$ and $c_n$ (i.e., $c_{n-1}$ with its last bit removed).
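For a concrete (made-up) example of the last property: the distribution $(0.4, 0.3, 0.2, 0.1)$ has optimal code $(0, 10, 110, 111)$. Replacing the two longest codewords by their common prefix $\tilde{c}_3 = 11$ gives $(0, 10, 11)$, which is optimal for the merged distribution $(0.4, 0.3, 0.3)$, and the expected lengths are related by
$$\sum_i p_i |c_i| = 1.9 = 1.6 + (0.2 + 0.1).$$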
Joint Entropy
r.v. $X, Y$ with joint probability distribution $P_{XY} = (p_{ij})_{m \times n}$
$$H(X,Y) := \sum_{i,j} p_{ij} \log_2 \frac{1}{p_{ij}} \quad \text{(bits)}$$
When $X, Y$ are independent:
$$H(X,Y) = H(X) + H(Y)$$
When $X = Y$:
$$H(X,Y) = H(X) = H(Y)$$
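A minimal Python check of the independent case, using made-up marginals:

```python
import math

def H(ps):
    # Entropy in bits of a flat list of probabilities
    return sum(p * math.log2(1 / p) for p in ps if p > 0)

px = [0.5, 0.5]            # made-up marginal for X
py = [0.25, 0.25, 0.5]     # made-up marginal for Y

# Joint pmf p_ij = px_i * py_j when X and Y are independent
pxy = [pi * pj for pi in px for pj in py]

print(H(pxy), H(px) + H(py))  # both equal 2.5
```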
Conditional Entropy
r.v. $X, Y$
$$H(Y \mid X = x_i) := \sum_{j=1}^{n} P(Y = y_j \mid X = x_i) \log_2 \frac{1}{P(Y = y_j \mid X = x_i)}$$
Definition (conditional entropy)
$$H(Y \mid X) = \sum_{i=1}^{m} P(X = x_i) \cdot H(Y \mid X = x_i)$$
We have:
$$H(X,Y) = H(Y \mid X) + H(X) = H(X \mid Y) + H(Y)$$
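A minimal Python sketch that verifies this chain rule on a made-up joint distribution:

```python
import math

def H(ps):
    # Entropy in bits of a flat list of probabilities
    return sum(p * math.log2(1 / p) for p in ps if p > 0)

# Made-up 2x2 joint distribution P(X = x_i, Y = y_j)
pxy = [[0.3, 0.2],
       [0.1, 0.4]]

px = [sum(row) for row in pxy]         # marginal of X
py = [sum(col) for col in zip(*pxy)]   # marginal of Y
joint = [p for row in pxy for p in row]

# H(Y|X) = sum_i P(X = x_i) * H(Y | X = x_i)
HYgX = sum(px[i] * H([pxy[i][j] / px[i] for j in range(2)]) for i in range(2))

print(H(joint))          # H(X,Y) ≈ 1.8464
print(HYgX + H(px))      # H(Y|X) + H(X), same value
print(H(joint) - H(py))  # H(X|Y), by the chain rule
```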