Lecture 13 - 2025/5/20

Fano's Inequality

Theorem. Let $X, Y$ be discrete random variables, $X \in \mathcal H$, $|\mathcal H| < \infty$. We want to estimate $X$ using $Y$; that is, we use $\hat X = g(Y)$ as an estimate of $X$. Our goal is to minimize the error probability $P_e := P(\hat X \ne X)$.
[ $P_e$ is related to $P(X \mid Y)$ and $H(X \mid Y)$ ]

$$P_e \ge \frac{H(X \mid Y) - 1}{\log |\mathcal H|}$$
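
A minimal numerical sanity check (the joint distribution `p` below is a made-up example, not from the lecture): we compute $H(X \mid Y)$, the error probability of the best (MAP) estimator, and the Fano lower bound, all in bits.

```python
import numpy as np

# Toy joint distribution p(x, y): rows index x, columns index y.
# Values are illustrative assumptions; here |H| = 3.
p = np.array([[0.20, 0.05, 0.05],
              [0.05, 0.20, 0.05],
              [0.05, 0.05, 0.30]])

p_y = p.sum(axis=0)                        # marginal of Y
H_cond = -(p * np.log2(p / p_y)).sum()     # H(X | Y) in bits

# Best estimator g(y) = argmax_x p(x | y); its error probability
P_e = 1.0 - p.max(axis=0).sum()

bound = (H_cond - 1.0) / np.log2(3)        # Fano lower bound
print(f"P_e = {P_e:.3f} >= {bound:.3f}")   # 0.300 >= 0.111
```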

Channel Coding Theorem part 2 ($R > C$)

$$H(M \mid Y_1, \cdots, Y_n) = H(M) - I(M; Y_1, \cdots, Y_n)$$

With $M$ uniform over $2^{nR}$ messages, $H(M) = nR$, and the channel gives $I(M; Y_1, \cdots, Y_n) \le nC$, so $H(M \mid Y_1, \cdots, Y_n) \ge n(R - C)$. Applying Fano's inequality with $|\mathcal H| = 2^{nR}$:

$$P_e \ge \frac{n(R - C) - 1}{nR} \approx \frac{R - C}{R} = \varepsilon_0 > 0$$
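
Plugging in illustrative numbers (the values of $R$, $C$, $n$ below are assumptions, not from the lecture) shows the bound stays bounded away from $0$ as $n \to \infty$:

```python
# Rate R above capacity C, blocklength n (all values assumed for illustration)
R, C, n = 0.6, 0.5, 1000
print((n * (R - C) - 1) / (n * R))   # 0.165; tends to (R - C)/R = 1/6 as n grows
```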

Fisher Information and Cramér-Rao Inequality

1. Unbiased Estimation

Sample $X = (X_1, \cdots, X_n)$, typically i.i.d. $X$ has density function $f(X; \theta) = \prod_{i=1}^{n} f(X_i; \theta)$.

Our goal: estimate $\theta$ from $X_1, \cdots, X_n$ using $\hat\theta = \phi(X_1, \cdots, X_n)$, $\phi : \mathcal X \to \mathbb R$.

An estimator is unbiased if $E(\phi(X)) = \theta$; we want to give $Var(\phi(X))$ a lower bound.
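
A quick simulation sketch of the setup (a standard example; the Gaussian family and all parameter values are my illustration): the sample mean of i.i.d. $N(\theta, 1)$ draws is unbiased, and its variance is exactly the quantity we want to bound below.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, trials = 2.0, 50, 100_000   # assumed values, for illustration

# X_i ~ N(theta, 1) i.i.d.; phi(X) = sample mean, an unbiased estimator
X = rng.normal(theta, 1.0, size=(trials, n))
phi = X.mean(axis=1)

print(phi.mean())   # ~2.0, consistent with E(phi(X)) = theta
print(phi.var())    # ~0.02 = 1/n, the variance we want to lower-bound
```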

2. Fisher Information

Definition (score function): $S(X; \theta) := \dfrac{\partial}{\partial \theta} \ln f(X; \theta)$

$$\begin{aligned} E(S(X; \theta)) &= \int S(x; \theta) f(x; \theta) \,\mathrm d x \\ &= \int \frac{\partial}{\partial \theta} \ln f(x; \theta) \cdot f(x; \theta) \,\mathrm d x \\ &= \int \frac{\partial}{\partial \theta} f(x; \theta) \,\mathrm d x \\ &= \frac{\partial}{\partial \theta} \int f(x; \theta) \,\mathrm d x = \frac{\partial}{\partial \theta} 1 = 0 \end{aligned}$$

(The last step interchanges differentiation and integration, which requires mild regularity conditions on $f$.)

Since $E(S(X; \theta)) = 0$, we have $E(S^2(X; \theta)) = Var(S(X; \theta))$.

Definition (Fisher information): $I(\theta) := Var(S(X; \theta))$

Proposition: $I(\theta) = -E\left( \dfrac{\partial^2}{\partial \theta^2} \ln f(X; \theta) \right)$
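
A standard worked example (this particular family is my illustration, not from the lecture): let $X_1, \cdots, X_n$ be i.i.d. $N(\theta, \sigma^2)$ with $\sigma^2$ known. The score is

$$S(x; \theta) = \frac{\partial}{\partial \theta} \left( -\sum_{i=1}^{n} \frac{(x_i - \theta)^2}{2\sigma^2} + \text{const} \right) = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \theta),$$

and both formulas for the Fisher information agree:

$$I(\theta) = Var(S(X; \theta)) = \frac{n \sigma^2}{\sigma^4} = \frac{n}{\sigma^2}, \qquad -E\left( \frac{\partial^2}{\partial \theta^2} \ln f(X; \theta) \right) = -\left( -\frac{n}{\sigma^2} \right) = \frac{n}{\sigma^2}.$$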

3. Cramér-Rao Inequality

Theorem. For any unbiased estimator $\phi : \mathcal X \to \mathbb R$, we have $Var(\phi(X)) \ge \dfrac{1}{I(\theta)}$.
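
A Monte Carlo sanity check (the Bernoulli setup and all numbers below are assumptions chosen for illustration): for i.i.d. $\text{Bernoulli}(\theta)$ samples the sample mean is unbiased with variance $\theta(1-\theta)/n$, which equals $1/I(\theta)$, so the Cramér-Rao bound is tight in this case.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, trials = 0.3, 40, 200_000   # assumed Bernoulli setup

# Unbiased estimator phi(X) = sample mean of n i.i.d. Bernoulli(theta) draws
X = rng.binomial(1, theta, size=(trials, n))
var_phi = X.mean(axis=1).var()

I = n / (theta * (1 - theta))   # Fisher information of the whole sample
print(var_phi, ">=", 1 / I)     # ~0.00525 vs 0.00525: the bound is achieved
```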

Generally, for $d$-dimensional $\theta$, the Fisher information is the $d \times d$ matrix $I(\theta)_{ij} = E(S_i S_j)$ with $S_i = \frac{\partial}{\partial \theta_i} \ln f(X; \theta)$, and any unbiased estimator satisfies $Cov(\phi(X)) \succeq I(\theta)^{-1}$ in the positive semidefinite order.