Lecture 12 - 2025 / 3 / 27

Balls and Bins (2)

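Lemma: Let $\mathcal E$ be an event defined in terms of the bin loads, and suppose $\Pr[\mathcal E]$ is monotone (non-decreasing or non-increasing) in the number of balls $m$. Then $\Pr_X[\mathcal E] \le 4 \Pr_Y[\mathcal E]$, where $X$ denotes the loads in the Balls and Bins model and $Y$ denotes $n$ independent Poisson random variables $Y_i \sim \pi(m/n)$.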

Without loss of generality, assume $\Pr[\mathcal E]$ is non-decreasing in $m$ (the non-increasing case is symmetric). Then

$\begin{aligned} \Pr_Y[\mathcal E] & = \sum_{k=0}^{\infty} \Pr_Y\left[\mathcal E \mid \sum_{i=1}^{n} Y_i = k\right] \Pr\left[ \sum_{i=1}^{n} Y_i = k \right] \\ & \ge \sum_{k=m}^{\infty} \Pr_Y\left[\mathcal E \mid \sum_{i=1}^{n} Y_i = m\right] \Pr\left[ \sum_{i=1}^{n} Y_i = k \right]\\ & \ge \Pr_Y\left[\mathcal E \mid \sum_{i=1}^{n} Y_i = m\right] \Pr\left[ \sum_{i=1}^{n} Y_i \ge m \right]\\ & \ge \Pr_X [\mathcal E] \cdot \frac{1}{4} \end{aligned}$

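The first inequality drops the terms with $k < m$ and uses monotonicity for $k \ge m$. The last step uses two facts. First, conditioned on $\sum_{i=1}^{n} Y_i = m$, the vector $(Y_1, \cdots, Y_n)$ has the same distribution as the bin loads $X$. Second, $\sum_{i=1}^{n} Y_i \sim \pi(m)$, and for $\lambda \in \N$ and $Z \sim \pi(\lambda)$ we have $\Pr[Z \ge \lambda] \ge 1/4$.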
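The Poisson tail fact used in the last step can be checked numerically. The following is a minimal sketch, not part of the lecture; the function name `poisson_tail_at_mean` is made up for illustration, and it only evaluates $\Pr[\mathrm{Poisson}(\lambda) \ge \lambda]$ for a few integer $\lambda$ rather than proving anything.

```python
import math

def poisson_tail_at_mean(lam: int) -> float:
    """Return Pr[Z >= lam] for Z ~ Poisson(lam), lam a positive integer."""
    # Pr[Z >= lam] = 1 - sum_{j < lam} e^{-lam} * lam^j / j!
    term = math.exp(-lam)  # the j = 0 term of the pmf
    below = 0.0
    for j in range(lam):
        below += term
        term *= lam / (j + 1)  # turn the j-th pmf term into the (j+1)-th
    return 1.0 - below

if __name__ == "__main__":
    for lam in (1, 2, 5, 10, 50, 200):
        print(lam, round(poisson_tail_at_mean(lam), 4))
```

In this range every value printed stays above $1/4$, which is all the lemma needs.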

Corollary: $\Pr[\forall i, X_i \le c] \le 4 \Pr[\forall i, Y_i \le c]$

Theorem: If $n$ balls are thrown independently and uniformly at random into $n$ bins, then the maximum load is w.h.p. $\Omega(\dfrac{\ln n}{\ln \ln n})$.

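Let $\mathcal{E}_2$ be the event that every $Y_i \le (1-\varepsilon)\dfrac{\ln n}{\ln \ln n}$, where the $Y_i \sim \pi(1)$ are independent. Writing $k = \left\lfloor (1-\varepsilon)\dfrac{\ln n}{\ln \ln n} \right\rfloor + 1$, this is the event that every $Y_i < k$. We need to show that $\Pr [\mathcal{E}_2] \le 1 / \text{poly}(n)$; by the corollary above, the same then holds for the true loads $X_i$ up to a factor of $4$.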

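Since $Y_1 \sim \pi(1)$, we have $\Pr[Y_1 \ge k] = \sum_{j = k}^{\infty} \dfrac{e^{-1}}{j!} \le \dfrac{1}{k!}$. This holds because $j! \ge k!\,(j-k)!$ and $e = \sum_{i \ge 0} \dfrac{1}{i!} = 1 + 1 + \dfrac{1}{2!} + \dfrac{1}{3!} + \cdots$, so $\sum_{j \ge k} \dfrac{1}{j!} \le \dfrac{e}{k!}$. In the other direction, keeping only the $j = k$ term gives the bound we actually need here: $\Pr[Y_1 \ge k] \ge \dfrac{1}{ek!}$.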
$\begin{aligned} \Pr[\mathcal E_2] & = (1 - \Pr[Y_1 \ge k])^n\\ & \le \left( 1 - \frac{1}{ek!} \right)^n\\ & \le \exp\left(-\frac{n}{ek!} \right) \\ & \le \exp(-\exp(\Theta(\varepsilon \ln n))) \\ & = \exp(-n^{\Theta(\varepsilon)}) \end{aligned}$

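Here we used $k! \le k^k \le n^{1-\varepsilon+o(1)}$ for $k \approx (1-\varepsilon)\dfrac{\ln n}{\ln \ln n}$, so $\dfrac{n}{ek!} \ge n^{\varepsilon - o(1)}$. Hence $\Pr[\mathcal E_2]$ tends to $0$ exponentially fast in $n^{\Theta(\varepsilon)}$, which is far smaller than $1/\text{poly}(n)$.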

In summary, the maximum load is w.h.p. $\Theta(\dfrac{\ln n}{\ln \ln n})$.
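To get a feeling for the $\dfrac{\ln n}{\ln \ln n}$ scale, one can simulate the process directly. This is a rough sketch under stated assumptions: the function name `max_load_one_choice`, the seed and the values of $n$ are arbitrary choices for illustration, and a single run per $n$ says nothing about the hidden constant.

```python
import math
import random

def max_load_one_choice(n: int, rng: random.Random) -> int:
    """Throw n balls into n bins uniformly at random; return the maximum load."""
    loads = [0] * n
    for _ in range(n):
        loads[rng.randrange(n)] += 1  # each ball picks one uniform bin
    return max(loads)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily
    for n in (10**3, 10**4, 10**5):
        scale = math.log(n) / math.log(math.log(n))
        # columns: n, observed maximum load, ln n / ln ln n
        print(n, max_load_one_choice(n, rng), round(scale, 2))
```

The observed maximum should grow very slowly with $n$, staying within a constant factor of the last column.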

Stochastic Dominance

Definition (SD w.r.t. random variables): For two random variables $X, Y$ taking values in $[a, b]$, if $\forall c \in [a, b], \Pr[Y \ge c] \ge \Pr[X \ge c]$, then we say $Y$ stochastically dominates $X$, written $X \preceq Y$.

Definition (SD w.r.t. functions): For two functions $f, g$ on $[a, b]$, if for all $c \in [a, b]$
$\int_{x \ge c} f(x) \text d x \le \int_{y \ge c} g(y) \text d y$

then we say $g$ stochastically dominates $f$, written $f \preceq g$.

Lemma: If $X_1 \preceq Y_1$ and $X_2 \preceq Y_2$, with $X_1, X_2$ independent and $Y_1, Y_2$ independent, then $X_1 + X_2 \preceq Y_1 + Y_2$.

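It suffices to show $Y_1 + X_2 \preceq Y_1 + Y_2$, where $X_2$ is taken independent of $Y_1$; by symmetry $X_1 + X_2 \preceq Y_1 + X_2$, and the claim follows by transitivity. For any $c$,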
$\begin{aligned} \Pr[Y_1 + Y_2 \ge c] & = \sum_{y_1} \Pr[Y_1 = y_1] \Pr[Y_2 \ge c - y_1] \\ & \ge \sum_{y_1} \Pr[Y_1 = y_1] \Pr[X_2 \ge c - y_1] \\ & = \Pr[Y_1 + X_2 \ge c] \end{aligned}$
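The lemma can be sanity-checked on small discrete distributions. The sketch below is illustrative only: the helper names (`tail`, `dominated_by`, `convolve`) and the four example distributions are made up, and exact rational arithmetic is used to avoid floating-point ties.

```python
from fractions import Fraction as F

def tail(pmf, c):
    """Pr[Z >= c] for a finite pmf given as {value: probability}."""
    return sum(p for v, p in pmf.items() if v >= c)

def dominated_by(p, q):
    """True iff p ⪯ q, i.e. Pr_q[Z >= c] >= Pr_p[Z >= c] for every c."""
    # For finitely supported pmfs it suffices to test c at the support points,
    # because both tail functions are constant between consecutive points.
    return all(tail(q, c) >= tail(p, c) for c in set(p) | set(q))

def convolve(p, q):
    """pmf of the sum of two independent variables with pmfs p and q."""
    out = {}
    for a, pa in p.items():
        for b, pb in q.items():
            out[a + b] = out.get(a + b, F(0)) + pa * pb
    return out

# Hypothetical example distributions with x1 ⪯ y1 and x2 ⪯ y2.
x1 = {0: F(1, 2), 1: F(1, 2)}
y1 = {0: F(1, 4), 1: F(3, 4)}
x2 = {0: F(1, 3), 1: F(1, 3), 2: F(1, 3)}
y2 = {0: F(1, 6), 1: F(1, 3), 2: F(1, 2)}

if __name__ == "__main__":
    print(dominated_by(x1, y1), dominated_by(x2, y2))
    print(dominated_by(convolve(x1, x2), convolve(y1, y2)))
```

Here `convolve` plays the role of the sum over $y_1$ in the derivation: it is exactly the distribution of a sum of independent variables.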

Corollary: If the sequences of functions $\{g_j\}_{j=1}^{m}$ and $\{f_j\}_{j=1}^{m}$ satisfy $f_j(\cdot; x_1, \cdots, x_{j-1}) \preceq g_j(\cdot)$ for every $j$ and every $x_1, \cdots, x_{j-1}$, then
$\int_{\sum x_j \ge c} f_1(x_1)\cdots f_m(x_m; x_1, \cdots, x_{m-1})\text dx \le \int_{\sum x_j \ge c} g_1(x_1)\cdots g_m(x_m) \text dx$

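The proof is by induction: first fix $x_1, \cdots, x_{m-1}$ and replace $f_m(\cdot; x_1, \cdots, x_{m-1})$ by $g_m(\cdot)$, which can only increase the integral; then repeat the same step for the remaining factors.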

Power of 2 Choices (1)

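Place $m$ balls into $n$ bins one at a time: each ball picks two bins independently and uniformly at random and goes into the one with the smaller load.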

Theorem: When $m = n$, the maximum load is w.h.p. at most $\dfrac{\ln \ln n}{\ln 2} + \Theta(1)$.
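The two-choice process above is easy to simulate. This is a minimal sketch with several assumptions not fixed by the notes: the two bins are drawn with replacement, ties go to the first choice, and the name `two_choice_loads`, the seed and the values of $n$ are arbitrary.

```python
import math
import random

def two_choice_loads(m: int, n: int, rng: random.Random) -> list:
    """Place m balls into n bins; each ball goes to the lighter of two random bins."""
    loads = [0] * n
    for _ in range(m):
        a, b = rng.randrange(n), rng.randrange(n)  # two independent uniform choices
        target = a if loads[a] <= loads[b] else b  # assumption: ties go to the first choice
        loads[target] += 1
    return loads

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily
    for n in (10**3, 10**4, 10**5):
        reference = math.log(math.log(n)) / math.log(2)
        # columns: n, observed maximum load, ln ln n / ln 2
        print(n, max(two_choice_loads(n, n, rng)), round(reference, 2))
```

The observed maximum should differ from the last column by only a small additive amount, in contrast to the $\dfrac{\ln n}{\ln \ln n}$ growth of the one-choice process.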

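The rough idea of the proof is as follows. Let $B_i$ be the number of bins with load $\ge i$. We try to find a sequence of bounds $\beta_i$ such that w.h.p. $B_i \le \beta_i$. Given $B_i \le \beta_i$, any particular ball lands in a bin with load $\ge i$ only if both of its choices have load $\ge i$, which happens with probability $\le \left( \dfrac{\beta_i}{n} \right)^2$. Since every bin counted by $B_{i+1}$ must have received such a ball, $B_{i+1} \preceq \mathcal B(n, (\beta_i / n)^2)$, whose mean is $\beta_i^2 / n$. By a Chernoff bound we can take $\beta_{i+1} = c \beta_i^2 /n$ for a suitable constant $c$, so $\dfrac{\beta_{i+1}}{n} = c \left( \dfrac{\beta_i}{n} \right)^2$; that is, $\beta_i / n$ is squared at every step, up to the constant $c$. When $i \approx \dfrac{\ln \ln n}{\ln 2}$ we get $\beta_i < 1$, and this $i$ is the bound on the maximum load.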
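The recursion $\beta_{i+1} = c \beta_i^2 / n$ can be iterated numerically to see how few levels it takes to drop below $1$. This is only a sketch of the arithmetic: the constant `c = 2.0` and the starting point `start_fraction = 0.25` (the initial value of $\beta_i / n$) are hypothetical values chosen so that the sequence shrinks, not the constants from the actual proof.

```python
import math

def levels_until_below_one(n: int, c: float = 2.0, start_fraction: float = 0.25) -> int:
    """Count steps of beta <- c * beta^2 / n, from beta = start_fraction * n, until beta < 1."""
    if not c * start_fraction < 1:
        raise ValueError("need c * start_fraction < 1, otherwise beta does not shrink")
    beta = start_fraction * n
    levels = 0
    while beta >= 1:
        beta = c * beta * beta / n  # one application of the recursion
        levels += 1
    return levels

if __name__ == "__main__":
    for n in (10**3, 10**6, 10**12):
        reference = math.log(math.log(n)) / math.log(2)
        # columns: n, number of levels, ln ln n / ln 2
        print(n, levels_until_below_one(n), round(reference, 2))
```

With these values, $c \cdot \beta_i / n = (1/2)^{2^i}$, so the count is the smallest $i$ with $2^i > \log_2(n/c)$, which is $\dfrac{\ln \ln n}{\ln 2} + O(1)$.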