Lecture 13 - 2025/5/20
Fano's Inequality
Theorem. Let $X, Y$ be discrete random variables with $X \in \mathcal{H}$, $|\mathcal{H}| < \infty$. We want to estimate $X$ using $Y$; that is, we use $\hat{X} = g(Y)$ as an estimate of $X$. Our goal is to minimize the error probability $P_e := P(\hat{X} \neq X)$.
[$P_e$ is related to $P(X \mid Y)$ and $H(X \mid Y)$]
$$P_e \ge \frac{H(X \mid Y) - 1}{\log |\mathcal{H}|}$$
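As a quick numerical check (my own sketch, not from the lecture), the snippet below builds a small random joint pmf for $(X, Y)$ with $|\mathcal{H}| = 4$, uses the MAP guess $\hat{X}(y) = \arg\max_x P(X = x \mid Y = y)$, and confirms that the resulting $P_e$ respects the bound; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random joint pmf over 4 values of X and 3 values of Y.
p_xy = rng.random((4, 3))
p_xy /= p_xy.sum()

p_y = p_xy.sum(axis=0)          # marginal P(Y = y)
p_x_given_y = p_xy / p_y        # column y holds P(X = x | Y = y)

# Conditional entropy H(X|Y) in bits.
H_X_given_Y = -(p_xy * np.log2(p_x_given_y)).sum()

# MAP estimator: guess the most likely x for each y; P_e = P(X_hat != X).
p_correct = (p_x_given_y.max(axis=0) * p_y).sum()
P_e = 1.0 - p_correct

# Fano-type bound from above: P_e >= (H(X|Y) - 1) / log|H|.
bound = (H_X_given_Y - 1.0) / np.log2(4)
print(f"P_e = {P_e:.4f}, bound = {bound:.4f}, holds: {P_e >= bound}")
```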
Channel Coding Theorem, part 2 ($R > C$)
$$H(M \mid Y_1, \dots, Y_n) = H(M) - I(M; Y_1, \dots, Y_n)$$
$$H(M) = nR, \qquad I(M; Y_1, \dots, Y_n) \le nC, \qquad \text{so } H(M \mid Y_1, \dots, Y_n) \ge n(R - C)$$
$$P_e \ge \frac{n(R - C) - 1}{nR} \approx \frac{R - C}{R} = \varepsilon_0$$
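As a concrete instance (illustrative, not from the lecture): for a binary symmetric channel with crossover probability $p = 0.11$, the capacity is $C = 1 - H_b(0.11) \approx 0.5$ bits per channel use. Trying to communicate at rate $R = 0.75 > C$ forces

$$P_e \gtrsim \varepsilon_0 = \frac{R - C}{R} \approx \frac{0.75 - 0.5}{0.75} = \frac{1}{3},$$

no matter how the code is designed.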
1. Unbiased Estimation
Sample $X = (X_1, \dots, X_n)$, typically i.i.d., so $X$ has density function $f(X; \theta) = \prod_{i=1}^{n} f(X_i; \theta)$.
Our goal: estimate $\theta$ from $X_1, \dots, X_n$ via $\hat{\theta} = \phi(X_1, \dots, X_n)$, where $\phi: \mathcal{X} \to \mathbb{R}$.
An unbiased estimator satisfies $E(\phi(X)) = \theta$ (for every $\theta$); we want to give $\mathrm{Var}(\phi(X))$ a lower bound.
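As a running example (my addition): if $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} N(\theta, \sigma^2)$ with $\sigma^2$ known, the sample mean $\phi(X) = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ has $E(\bar{X}) = \theta$, so it is unbiased, and $\mathrm{Var}(\bar{X}) = \sigma^2 / n$; the question is whether any unbiased estimator can have smaller variance.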
2. Score Function and Fisher Information

Definition (score function): $S(X; \theta) := \frac{\partial}{\partial \theta} \ln f(X; \theta)$
$$E(S(X; \theta)) = \int S(x; \theta) f(x; \theta)\, dx = \int \frac{\partial}{\partial \theta} \ln f(x; \theta) \cdot f(x; \theta)\, dx = \int \frac{\partial}{\partial \theta} f(x; \theta)\, dx = \frac{\partial}{\partial \theta} \int f(x; \theta)\, dx = \frac{\partial}{\partial \theta} 1 = 0$$

(assuming regularity conditions that allow exchanging differentiation and integration).
$E(S^2(X; \theta)) = \mathrm{Var}(S(X; \theta))$, since $E(S(X; \theta)) = 0$.
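For instance (continuing the Gaussian example above, with a single observation $X \sim N(\theta, \sigma^2)$): $\ln f(X; \theta) = -\frac{(X - \theta)^2}{2\sigma^2} + \text{const}$, so $S(X; \theta) = \frac{X - \theta}{\sigma^2}$, which indeed has $E(S) = 0$ and $\mathrm{Var}(S) = 1/\sigma^2$.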
Definition (Fisher Information): $I(\theta) = \mathrm{Var}(S(X; \theta))$
Proposition: $I(\theta) = -E\left(\frac{\partial^2}{\partial \theta^2} \ln f(X; \theta)\right)$
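Check on the Gaussian example: $\frac{\partial^2}{\partial \theta^2} \ln f(X; \theta) = -\frac{1}{\sigma^2}$, so $-E\left(\frac{\partial^2}{\partial \theta^2} \ln f(X; \theta)\right) = \frac{1}{\sigma^2} = \mathrm{Var}(S(X; \theta))$, consistent with the proposition. For $n$ i.i.d. observations the log-likelihoods add, so $I(\theta) = n/\sigma^2$.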
3. Cramér-Rao Inequality
Theorem. For any unbiased estimator $\phi: \mathcal{X} \to \mathbb{R}$, we have $\mathrm{Var}(\phi(X)) \ge \frac{1}{I(\theta)}$.
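In the Gaussian example, $I(\theta) = n/\sigma^2$, so the bound reads $\mathrm{Var}(\phi(X)) \ge \sigma^2/n$. The sample mean attains this with equality, so $\bar{X}$ is an efficient unbiased estimator of $\theta$.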
More generally, for $d$-dimensional $\theta \in \mathbb{R}^d$:
- Score function: $S(X; \theta) = \nabla_\theta \ln f(X; \theta)$, with $E(S(X; \theta)) = 0$
- Fisher information: $I(\theta) = -E\left(\nabla_\theta^2 \ln f(X; \theta)\right)$
- Cramér-Rao: $\mathrm{Cov}(\phi(X)) \succeq I(\theta)^{-1}$, i.e., $\mathrm{Cov}(\phi(X)) - I(\theta)^{-1}$ is positive semidefinite
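A minimal multivariate illustration (my addition, assuming $\Sigma$ is known and invertible): for $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} N(\theta, \Sigma)$ with $\theta \in \mathbb{R}^d$, the score of the sample is $S(X; \theta) = \Sigma^{-1} \sum_{i=1}^{n} (X_i - \theta)$, the Fisher information is $I(\theta) = n \Sigma^{-1}$, and the sample mean $\bar{X}$ has $\mathrm{Cov}(\bar{X}) = \Sigma / n = I(\theta)^{-1}$, so the matrix Cramér-Rao bound holds with equality.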