Statistics

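Definition: Let \((X_1,\dots,X_n)\) be a sample drawn from a population. If the function \(g(X_1,\dots,X_n)\) contains no unknown parameters of the population distribution, then \(g(X_1,\dots,X_n)\) is called a statistic.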

Sample mean and sample variance

\[ \begin{gather} \text{Sample mean: } \bar X = \frac{1}{n} \sum_{i=1}^n X_i \\ \text{Sample variance: } S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X)^2 = \frac{1}{n-1} \left( \sum_{i=1}^n X_i^2 - n \bar X^2 \right) \end{gather} \]
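As a quick illustration, the sketch below computes \(\bar X\) and \(S^2\) both by definition and by the shortcut form, on a small made-up data set. The helper names are hypothetical; `statistics.variance` from the Python standard library also uses the \(n-1\) denominator, so it serves as a cross-check.

```python
import statistics

def sample_mean(xs):
    # \bar X = (1/n) * sum of X_i
    return sum(xs) / len(xs)

def sample_variance(xs):
    # S^2 by definition: squared deviations from \bar X, divided by n - 1
    n = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    # S^2 = (sum of X_i^2 - n * \bar X^2) / (n - 1)
    n = len(xs)
    xbar = sample_mean(xs)
    return (sum(x * x for x in xs) - n * xbar ** 2) / (n - 1)

# Illustrative data, not taken from the text
data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
print(sample_mean(data))
print(sample_variance(data))
print(sample_variance_shortcut(data))
print(statistics.variance(data))  # standard-library sample variance (n - 1)
```

The shortcut form is convenient by hand, but in floating point it can lose precision when the mean is large relative to the spread, because it subtracts two nearly equal quantities.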

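\(m_k = \frac{1}{n} \sum_{i=1}^n X_i^k\) is called the sample \(k\)-th raw moment (moment about the origin), and \(a_k = \frac{1}{n} \sum_{i=1}^n (X_i - \bar X)^k\) the sample \(k\)-th central moment; these are also statistics. In particular, for \(k=2\) we write \(S_n^2 \triangleq a_2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar X)^2 = \frac{1}{n} \sum_{i=1}^n X_i^2 - \bar X^2\), so that \(S_n^2 = \frac{n-1}{n} S^2\).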
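A minimal sketch of the two kinds of sample moments, using exact rational arithmetic (`fractions.Fraction`) so the identities \(a_2 = m_2 - m_1^2\) and \(S^2 = \frac{n}{n-1} S_n^2\) hold exactly rather than up to rounding. The function names and the data are illustrative assumptions.

```python
from fractions import Fraction

def raw_moment(xs, k):
    # m_k = (1/n) * sum of X_i^k
    return sum(Fraction(x) ** k for x in xs) / len(xs)

def central_moment(xs, k):
    # a_k = (1/n) * sum of (X_i - \bar X)^k, with \bar X = m_1
    xbar = raw_moment(xs, 1)
    return sum((Fraction(x) - xbar) ** k for x in xs) / len(xs)

data = [2, 4, 4, 5, 7, 8]  # illustrative data
n = len(data)
s_n2 = central_moment(data, 2)                              # S_n^2 = a_2
shortcut = raw_moment(data, 2) - raw_moment(data, 1) ** 2   # m_2 - \bar X^2
s2 = s_n2 * n / (n - 1)                                     # S^2 = n/(n-1) * S_n^2
print(s_n2, shortcut, s2)  # prints: 4 4 24/5
```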

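Since a statistic is a function of random variables, it is itself a random variable. Let the population \(X\) have expectation \(\E(X) = \mu\) and variance \(\Var(X) = \sigma^2\), and let \(X_1, \dots, X_n\) be a simple random sample (independent, each distributed as \(X\)). Then the statistics satisfy the following theorem: \[ \begin{gather} \E(\bar X) = \mu, \quad \Var(\bar X) = \frac{\sigma^2}{n} \\[1ex] \E(S^2) = \sigma^2, \quad \E(S_n^2) = \frac{n-1}{n} \sigma^2 \\[1ex] \bar X \stackrel{P}{\to} \mu, \quad S^2 \stackrel{P}{\to} \sigma^2, \quad S_n^2 \stackrel{P}{\to} \sigma^2 \end{gather} \] Proof. The first identity uses linearity of expectation; the second also uses the independence of the \(X_i\), so that the variance of the sum is the sum of the variances: \[ \begin{gather} \E(\bar X) = \E \left(\frac{1}{n} \sum_{i=1}^n X_i\right) = \frac{1}{n} \sum_{i=1}^n \E(X_i) = \mu \\ \Var(\bar X) = \Var\left(\frac{1}{n} \sum_{i=1}^n X_i\right) = \frac{1}{n^2} \sum_{i=1}^n \Var(X_i) = \frac{\sigma^2}{n} \end{gather} \]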

\[ \begin{gather} \begin{aligned}[t] \E(S^2) &= \E \left[\frac{1}{n-1} \left(\sum_{i=1}^n X_i^2 - n \bar X^2\right)\right] \\ &= \frac{1}{n-1} \left( \sum_{i=1}^n \E (X_i^2) - n \E(\bar X^2) \right) \\ &\Downarrow_ {\E(X_i^2) = \Var(X_i) + \E^2(X_i) = \sigma^2 + \mu^2,\ \E(\bar X^2) = \Var(\bar X) + \E^2(\bar X) = \frac{\sigma^2}{n} + \mu^2} \\ &= \frac{1}{n-1} \left( \sum_{i=1}^n (\sigma^2 + \mu^2) - n \left(\frac{\sigma^2}{n} + \mu^2\right) \right) \\ &= \frac{1}{n-1} (n-1) \sigma^2 = \sigma^2 \end{aligned} \qquad \begin{aligned}[t] \E(S_n^2) &= \E \left[\frac{n-1}{n} \cdot \frac{1}{n-1} \left(\sum_{i=1}^n X_i^2 - n \bar X^2\right)\right] \\ &= \frac{n-1}{n} \E(S^2) \\ &= \frac{n-1}{n} \sigma^2 \end{aligned} \end{gather} \]
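Because these expectation identities are exact, they can be checked without simulation by enumerating every possible sample from a small discrete population. The sketch below assumes, purely for illustration, a population uniform on \(\{0, 1, 2\}\) (so \(\mu = 1\), \(\sigma^2 = 2/3\)) and an i.i.d. sample of size \(n = 3\); the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed population for illustration: X uniform on {0, 1, 2}
support = [0, 1, 2]
prob = Fraction(1, len(support))
mu = sum(prob * x for x in support)                  # population mean
sigma2 = sum(prob * (x - mu) ** 2 for x in support)  # population variance

def expectation(stat, n):
    # Exact E[stat(X_1, ..., X_n)] for an i.i.d. sample:
    # every n-tuple of support values has probability prob**n.
    return sum(prob ** n * stat(s) for s in product(support, repeat=n))

def xbar(sample):
    return Fraction(sum(sample), len(sample))

def s2(sample):
    # sample variance with denominator n - 1
    m = xbar(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)

def sn2(sample):
    # second sample central moment, denominator n
    m = xbar(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)

n = 3
e_xbar = expectation(xbar, n)
var_xbar = expectation(lambda s: (xbar(s) - mu) ** 2, n)
print(mu, sigma2)                               # population mean and variance
print(e_xbar, var_xbar)                         # compare with mu and sigma2 / n
print(expectation(s2, n), expectation(sn2, n))  # compare with sigma2 and (n-1)/n * sigma2
```

With \(n = 3\) the three lines print `1 2/3`, `1 2/9` and `2/3 4/9`, matching \(\mu\), \(\sigma^2/n\), \(\sigma^2\) and \(\frac{n-1}{n}\sigma^2\).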

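Ultimately, the sample variance uses \(\frac{1}{n-1}\) rather than \(\frac{1}{n}\) because its deviations are measured from \(\bar X\) rather than from \(\mu\): replacing \(\mu\) by the estimate \(\bar X\) costs one degree of freedom. If instead \(\mu\) is known, we can define a new statistic \(S'^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2 = \frac{1}{n} \left(n \mu^2 - 2n\mu \bar X + \sum_{i=1}^n X_i^2\right)\), and we find that \(\E(S'^2) = \sigma^2\): \[ \begin{aligned} \E(S'^2) &= \frac{1}{n} \E\left(n \mu^2 - 2n\mu \bar X + \sum_{i=1}^n X_i^2\right) \\ &= \frac{1}{n} \left(n \mu^2 - 2n\mu \E(\bar X) + \sum_{i=1}^n \E (X_i^2)\right) \\ &= \frac{1}{n} \left(n \mu^2 - 2n\mu^2 + \sum_{i=1}^n (\sigma^2 + \mu^2)\right) \\ &= \sigma^2 \end{aligned} \]

As for the convergence in probability of the three statistics, the weak law of large numbers for i.i.d. random variables (Khinchin's law), applied to \(X_1, \dots, X_n\) and to \(X_1^2, \dots, X_n^2\), gives \[ \begin{gather} \bar X = \frac{1}{n} \sum_{i=1}^n X_i \stackrel{P}{\to} \mu \\ \frac{1}{n} \sum_{i=1}^n X_i^2 \stackrel{P}{\to} \E(X^2) = \sigma^2 + \mu^2 \end{gather} \]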

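For any \(\epsilon, \delta > 0\), there exist \(N_1, N_2 > 0\) such that for all \(n > \max(N_1, N_2)\), \[ \begin{gather} 0 \le P\left(\left|\frac{1}{n} \sum_{i=1}^n X_i^2 - (\sigma^2 + \mu^2)\right| \ge \epsilon / 2\right) < \delta / 2 \\ 0 \le P(|\mu^2 - \bar X^2| \ge \epsilon / 2) < \delta / 2 \end{gather} \] The first bound follows from the second limit above; the second holds because \(\bar X \stackrel{P}{\to} \mu\) and \(x \mapsto x^2\) is continuous, so \(\bar X^2 \stackrel{P}{\to} \mu^2\). Let \(A\) be the event \(|\frac{1}{n} \sum_{i=1}^n X_i^2 - (\sigma^2 + \mu^2)| \ge \epsilon / 2\), \(B\) the event \(|\bar X^2 - \mu^2| \ge \epsilon / 2\), and \(C\) the event \(|\frac{1}{n} \sum_{i=1}^n X_i^2 - \bar X^2 - \sigma^2| \ge \epsilon\). By the triangle inequality, \(|\frac{1}{n} \sum_{i=1}^n X_i^2 - (\sigma^2 + \mu^2)| + |\mu^2 - \bar X^2| \ge |\frac{1}{n} \sum_{i=1}^n X_i^2 - \bar X^2 - \sigma^2|\), so whenever \(C\) occurs, at least one of \(A\) and \(B\) occurs; that is, \(C \subseteq A \cup B\). Hence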

\[ 0 \le P(C) \le P(A \cup B) \le P(A) + P(B) < \delta \] that is, \(0 \le P(|\frac{1}{n} \sum_{i=1}^n X_i^2 - \bar X^2 - \sigma^2| \ge \epsilon) < \delta\) for all \(n > \max(N_1, N_2)\). Since this holds for every \(\epsilon, \delta > 0\), \[ \begin{gathered} \lim_{n \to \infty} P\left(\left|\underbrace{\frac{1}{n} \sum_{i=1}^n X_i^2 - \bar X^2}_{S_n^2} - \sigma^2\right| \ge \epsilon\right) = 0 \ \text{for every } \epsilon > 0 \iff \\ S_n^2 \stackrel{P}{\to} \sigma^2 \end{gathered} \] A similar \(\epsilon\)-\(\delta\) argument, using \(\frac{n}{n-1} \to 1\), shows that \(S^2 = \frac{n}{n-1} S_n^2 \stackrel{P}{\to} \sigma^2\).
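Convergence in probability can be illustrated (not proved) by simulation: estimate \(P(|S_n^2 - \sigma^2| \ge \epsilon)\) for growing \(n\) and watch it shrink toward 0. The sketch assumes a Uniform(0, 1) population (\(\sigma^2 = 1/12\)), \(\epsilon = 0.02\), 500 trials per sample size and a fixed seed; all of these are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import random

def sn2(xs):
    # S_n^2 = (1/n) * sum of (X_i - \bar X)^2
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n

def tail_prob(n, eps, trials, rng):
    # Monte Carlo estimate of P(|S_n^2 - sigma^2| >= eps) for samples of size n
    # drawn from Uniform(0, 1), whose variance is sigma^2 = 1/12.
    sigma2 = 1 / 12
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        if abs(sn2(sample) - sigma2) >= eps:
            hits += 1
    return hits / trials

rng = random.Random(2024)  # fixed seed so the run is reproducible
sizes = [10, 100, 1000]
probs = [tail_prob(n, eps=0.02, trials=500, rng=rng) for n in sizes]
for n, p in zip(sizes, probs):
    print(n, p)  # estimated tail probability shrinks as n grows
```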

Order statistics

\((X_{(1)}, \dots, X_{(n)})\)为样本\((X_1, \dots, X_n)\)排序后的结果,则\(X_{(1)} = \min (X_1, \dots, X_n), X_{(n)} = \max (X_1, \dots, X_n)\)亦是统计量。

\(X_{(1)}, X_{(n)}\)的概率密度函数分别为\(p_{X_{(1)}}, p_{X_{(n)}}\),则 \[ \begin{gather} p_{X_{(1)}}(u) = n \big( 1 - P_X(u) \big)^{n-1} p_X(u) \\ p_{X_{(n)}}(u) = n \big( P_X(u) \big)^{n-1} p_X(u) \end{gather} \]\(X_{(k)}\)的概率密度函数为\(p_{X_{(k)}}\),则…
