Lecture 3 - 2025 / 3 / 4

r.v. $X = (m_1, \cdots, m_n)$, pmf $p = (p_1, \cdots, p_n)$

$$H(X) := \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i} \qquad (\text{bits})$$

Information

  1. r.v. $X$: $H(X)$
  2. message $m_i$: $\log_2 \dfrac{1}{p_i}$
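
As a quick sanity check of these two quantities, here is a small Python sketch; the pmf `p` below is just an illustrative example, not from the lecture.

```python
from math import log2

# Example pmf p = (p_1, ..., p_n); the numbers are illustrative only.
p = [0.5, 0.25, 0.125, 0.125]

# Information of a single message m_i: log2(1/p_i) bits.
info = [log2(1 / pi) for pi in p]          # [1.0, 2.0, 3.0, 3.0]

# Entropy of X: the expected information, H(X) = sum_i p_i * log2(1/p_i).
H = sum(pi * log2(1 / pi) for pi in p)     # 1.75 bits

print(info, H)
```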

$X : p_1, p_2, \cdots, p_{n-1}, p_n$, where $p_n = q_1 + q_2$

$Y : \dfrac{q_1}{q_1 + q_2}, \dfrac{q_2}{q_1 + q_2}$

$Z : p_1, p_2, \cdots, p_{n-1}, q_1, q_2$

$$H(Z) = H(X) + p_n H(Y)$$
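
This is the grouping property of entropy. A short numerical check in Python, with illustrative values for the $p_i$ and $q_1, q_2$ (chosen only so that $p_n = q_1 + q_2$):

```python
from math import log2

def H(dist):
    """Entropy in bits of a pmf given as a list of probabilities."""
    return sum(p * log2(1 / p) for p in dist if p > 0)

# Illustrative numbers with p_n = q_1 + q_2.
p = [0.4, 0.3, 0.3]                     # X : p_1, p_2, p_n
q1, q2 = 0.1, 0.2                       # q_1 + q_2 = p_n = 0.3
Y = [q1 / (q1 + q2), q2 / (q1 + q2)]
Z = p[:-1] + [q1, q2]

# Both sides agree (up to floating point): H(Z) = H(X) + p_n * H(Y).
print(H(Z), H(p) + p[-1] * H(Y))
```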

Huffman Encoding

r.v. $X$, $P = (p_1, \cdots, p_n)$

Code $C = (c_1, \cdots, c_n)$, $c_i \in \{0, 1\}^*$

Goal: $\sum_i p_i |c_i|$ is minimal; a code achieving this minimum is called an optimal code.

Properties of Optimal Code

  1. Assume $p_1 \ge p_2 \ge \cdots \ge p_n$; then $|c_1| \le |c_2| \le \cdots \le |c_n|$.

  2. $|c_n| = |c_{n-1}|$, and WLOG $c_{n-1}$, $c_n$ are sibling nodes in the code tree (they differ only in the last bit).

  3. If $(c_1, \cdots, c_n)$ is an optimal code for $(p_1, \cdots, p_n)$, then $(c_1, \cdots, c_{n-2}, \tilde c_{n-1})$ is an optimal code for $(p_1, \cdots, p_{n-2}, p_{n-1} + p_n)$, where $\tilde c_{n-1}$ is the common prefix of $c_{n-1}$ and $c_n$ (their last bit removed).
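
These properties justify the usual greedy construction: repeatedly merge the two least probable symbols and prepend one bit to each side. A minimal sketch of that construction in Python (the function name `huffman` and the sample pmf are my own, not from the lecture):

```python
import heapq
from math import log2

def huffman(probs):
    """Return binary codewords minimizing sum_i p_i * |c_i|."""
    # Heap entries: (probability, tie-breaker, symbol indices in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    codes = [""] * len(probs)
    while len(heap) > 1:
        # Merge the two least probable subtrees (properties 2 and 3).
        p1, t1, left = heapq.heappop(heap)
        p2, t2, right = heapq.heappop(heap)
        for i in left:
            codes[i] = "0" + codes[i]
        for i in right:
            codes[i] = "1" + codes[i]
        heapq.heappush(heap, (p1 + p2, min(t1, t2), left + right))
    return codes

p = [0.4, 0.3, 0.2, 0.1]                       # illustrative pmf
c = huffman(p)
avg_len = sum(pi * len(ci) for pi, ci in zip(p, c))
entropy = sum(pi * log2(1 / pi) for pi in p)
print(c, avg_len, entropy)                     # H(X) <= avg_len < H(X) + 1
```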

Joint Entropy

r.v. $X, Y$ with joint probability distribution $P_{XY} = (p_{ij})_{m \times n}$

$$H(X, Y) := \sum_{i,j} p_{ij} \log_2 \dfrac{1}{p_{ij}} \qquad (\text{bits})$$

When $X, Y$ are independent:
$$H(X, Y) = H(X) + H(Y)$$

When $X = Y$:
$$H(X, Y) = H(X) = H(Y)$$
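
A quick numerical illustration of both extreme cases, with made-up marginals:

```python
from math import log2

def H(dist):
    """Entropy in bits of a pmf given as a list of probabilities."""
    return sum(p * log2(1 / p) for p in dist if p > 0)

px = [0.5, 0.5]
py = [0.25, 0.75]

# Independent: p_ij = p_i * q_j, so H(X, Y) = H(X) + H(Y).
joint_indep = [pi * pj for pi in px for pj in py]
print(H(joint_indep), H(px) + H(py))

# X = Y: all mass on the diagonal, so H(X, Y) = H(X) = H(Y).
joint_equal = [0.5, 0.0, 0.0, 0.5]    # p_ij = p_i if i == j else 0
print(H(joint_equal), H(px))
```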

Conditional Entropy

r.v. $X, Y$
$$H(Y | X = x_i) := \sum_{j=1}^{n} P(Y = y_j | X = x_i) \log_2 \dfrac{1}{P(Y = y_j | X = x_i)}$$

Definition (conditional entropy)
$$H(Y | X) = \sum_{i=1}^{m} P(X = x_i) \cdot H(Y | X = x_i)$$

We have:
$$H(X, Y) = H(Y | X) + H(X) = H(X | Y) + H(Y)$$
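
To check the chain rule numerically, here is a short sketch with a made-up $2 \times 2$ joint distribution $P_{XY}$:

```python
from math import log2

def H(dist):
    """Entropy in bits of a pmf given as a list of probabilities."""
    return sum(p * log2(1 / p) for p in dist if p > 0)

# Made-up joint pmf P_XY (rows: values of X, columns: values of Y).
P = [[0.3, 0.2],
     [0.1, 0.4]]

px = [sum(row) for row in P]                      # marginal of X
H_XY = H([p for row in P for p in row])           # joint entropy H(X, Y)

# H(Y | X) = sum_i P(X = x_i) * H(Y | X = x_i).
H_Y_given_X = sum(px[i] * H([p / px[i] for p in P[i]]) for i in range(len(P)))

# Chain rule: both sides agree (up to floating point).
print(H_XY, H_Y_given_X + H(px))
```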