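The Three Major Distributions and Sampling Distributions for Normal Populations
The \(\chi^2\), \(t\), and \(F\) distributions are all derived from the normal distribution. Under the assumption of a normal population, the common statistics all follow distributions related to these three, so they play a central role in statistical inference for normal populations.
Prerequisites
The Gamma function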
\[ \Gamma (x) = \int_0^{+\infty} e^{-t} t^{x-1} dt \quad (x > 0) \]
The \(\Gamma\) function satisfies \(\Gamma(x + 1) = x \Gamma(x)\). Integrating by parts:
\[ \Gamma (x+1) = \int_0^{+\infty} e^{-t} t^{x} dt = [-e^{-t} t^x] \bigg|^{+\infty}_{t=0} - \int_0^{+\infty} -e^{-t} xt^{x-1} dt \] By L'Hôpital's rule (applied repeatedly until the power of \(t\) is no longer positive), \(\lim_{t \to +\infty} \frac{-t^x}{e^t} = 0\), and the boundary term also vanishes at \(t = 0\) because \(x > 0\). Hence \[ \Gamma (x+1) = 0 + x\int_0^{+\infty} e^{-t} t^{x-1} dt = x \Gamma(x) \]
\(\Gamma(1) = \int_0^{+\infty} e^{-t} dt = 1\), so when \(x\) is a positive integer, \(\Gamma(x) = (x-1)!\);
\(\Gamma(1/2) = \sqrt \pi\), so when \(x = 2k + 1\) is a positive odd integer, \(\Gamma(\frac{x}{2}) = \sqrt \pi \prod_{i=0}^{k-1} \frac{2i + 1}{2}\):
\[ \begin{gather} \Gamma (\frac{1}{2}) = \int_0^{+\infty} e^{-t} t^{-\frac{1}{2}} \d t \stackrel{u = t^\frac{1}{2}}{\Longrightarrow} \int_0^{+\infty} e^{-u^2} u^{-1} \;2u \d u = 2 \int_0^{+\infty} e^{-u^2} \d u = \int_{-\infty}^{+\infty} e^{-u^2} \d u \\ \notag \\ \begin{aligned} &\Gamma^2(\frac{1}{2}) = (\int_{-\infty}^{+\infty} e^{-u^2} \d u)^2 \\ &= (\int_{-\infty}^{+\infty} e^{-u^2} du)(\int_{-\infty}^{+\infty} e^{-v^2} \d v) \\ &= \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} e^{-(u^2+v^2)} \d u \d v \\ &\downarrow_{u = r\sin\theta, v = r\cos\theta} \\ &= \int_{0}^{2\pi} \int_{0}^{+\infty} e^{-r^2}\; r \d r \d \theta \\ &= \int_{0}^{2\pi} [-\frac{1}{2}e^{-r^2}] \bigg|_{r=0}^{+\infty} \d \theta \\ &= \int_{0}^{2\pi} \frac{1}{2} \d \theta \\ &= \pi \\ &\Gamma (\frac{1}{2}) = \sqrt \pi \end{aligned} \end{gather} \]
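The Gamma function identities above can be spot-checked numerically. The following is a minimal sketch using Python's standard `math.gamma`; the helper name `gamma_half_odd` and the test point `x = 3.7` are illustrative choices, not part of the original notes.

```python
import math


def gamma_half_odd(k: int) -> float:
    """Gamma(k + 1/2) from the product formula sqrt(pi) * prod_{i<k} (2i + 1) / 2."""
    result = math.sqrt(math.pi)
    for i in range(k):
        result *= (2 * i + 1) / 2
    return result


# Recurrence Gamma(x + 1) = x * Gamma(x), checked at a non-integer point.
x = 3.7
recurrence_gap = abs(math.gamma(x + 1) - x * math.gamma(x))

# Gamma(n) = (n - 1)! for positive integers n.
factorial_ok = all(
    math.isclose(math.gamma(n), math.factorial(n - 1)) for n in range(1, 10)
)

# Half-integer values Gamma(k + 1/2) against the product formula.
half_ok = all(
    math.isclose(math.gamma(k + 0.5), gamma_half_odd(k)) for k in range(8)
)

print(recurrence_gap, factorial_ok, half_ok)
```

A numerical check of this kind only confirms the identities at the sampled points; the derivations above are what establish them in general.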
References
The Gamma Function || The derivation of \(\Gamma(1/2)\)
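The \(\chi^2\) distribution
Let \(X_1,\dots,X_n\) be mutually independent standard normal random variables, i.e. \(X_i \sim N(0,1)\). Then \(Y = X_1^2 + \dots + X_n^2\) is said to follow the \(\chi^2\) distribution with \(n\) degrees of freedom, written \(Y \sim \chi^2(n)\), and \(\E[Y] = n, \Var[Y] = 2n\).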
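The moments \(\E[Y] = n\) and \(\Var[Y] = 2n\) can be illustrated by simulation. This is a rough Monte Carlo sketch using only the standard library; the seed, the choice \(n = 5\), the number of draws, and the helper name `sample_chi2` are arbitrary assumptions made for the illustration.

```python
import random
import statistics


def sample_chi2(n: int, rng: random.Random) -> float:
    """One draw of Y = X_1^2 + ... + X_n^2 with independent X_i ~ N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 5
draws = [sample_chi2(n, rng) for _ in range(20000)]

mean_hat = statistics.fmean(draws)    # should be close to n
var_hat = statistics.variance(draws)  # should be close to 2n

print(mean_hat, var_hat)
```

The estimates fluctuate with the seed and sample size, so they should only be expected to land near \(n\) and \(2n\), not to match them exactly.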
When \(n=1\), it is easy to see that \(\forall y \le 0, P_Y(y) = 0, p_Y(y)= 0\). For \(y > 0\), writing \(P_X\) for the distribution function of \(X \sim N(0,1)\): \[ \begin{aligned} P_Y(y) &= P(-\sqrt y \le X \le \sqrt y) = 2P_X(\sqrt y) - 1 \\ p_Y(y) &= 2P_X'(\sqrt y) \frac 1 {2 \sqrt y} \\ &= \frac 1 {\sqrt {2\pi y}} e^{-\frac 1 2 y} \\ &= \frac 1 {2^\frac{1}{2} \Gamma(\frac 1 2)} y^{\frac 1 2 - 1} e^{-\frac y 2} \end{aligned} \] The density of the \(\chi^2(1)\) distribution is therefore:
\[ p_Y(y) = \begin{cases} \frac 1 {2^\frac{1}{2} \Gamma(\frac 1 2)} y^{\frac 1 2 - 1} e^{-\frac y 2}, &y > 0 \\ 0, & y \le 0 \end{cases} \]
When \(n=k\), regard \((X_1,\dots,X_k)\) as a point in \(k\)-dimensional space, and write \(N(0,1,x_i)\) for the standard normal density evaluated at \(x_i\). Then \[ \begin{aligned} P_Y(y) = P_Y(Y \le y) &= \int_{\mathcal V} \prod_{i=1}^k N(0,1,x_i)\; \d x_1 \dots \d x_k \\ &= \int_{\mathcal V} \frac{e^{-\frac 1 2 (x_1^2 + \dots + x_k^2)}} {(2\pi)^{k / 2}}\ \d x_1 \dots \d x_k \\ \end{aligned} \] where \(\mathcal V\) denotes the region of integration \(\sum_{i=1}^k x_i^2 \le y\). This region is a \(k\)-dimensional ball of radius \(R = \sqrt y\), so we change to \(k\)-dimensional spherical coordinates: \[ \begin{aligned} &P_Y(Y \le y) = \int_{\mathcal V} \prod_{i=1}^k N(0,1,x_i)\; \d x_1 \dots \d x_k = \int_{\mathcal V} \frac{e^{-\frac 1 2 (x_1^2 + \dots + x_k^2)}} {(2\pi)^{k / 2}}\ \d x_1 \dots \d x_k \\ &= \int_0^{2\pi} \underbrace{\int_0^\pi \dots \int_0^\pi}_{k-2} \int_0^{\sqrt{y}} \\ &\quad\quad\quad\frac{e^{-\frac 1 2 (r^2\cos^2 \varphi_1 + r^2\sin^2 \varphi_1 \cos^2 \varphi_2 + \dots + r^2\sin^2 \varphi_1 \dots \sin^2 \varphi_{k-2} \cos^2 \varphi_{k-1} + r^2\sin^2 \varphi_1 \dots \sin^2 \varphi_{k-2} \sin^2 \varphi_{k-1})} } {(2\pi)^{k / 2}}\\ &\quad\quad\quad r^{k-1} \sin^{k-2}(\varphi_1) \sin^{k-3}(\varphi_2) \dots \sin(\varphi_{k-2}) \ \d r\ \d \varphi_1 \dots \d \varphi_{k-1} \\ &= \int_0^{2\pi} \underbrace{\int_0^\pi \dots \int_0^\pi}_{k-2} \int_0^{\sqrt{y}} \frac{e^{-\frac 1 2 r^2}} {(2\pi)^{k / 2}} r^{k-1} \sin^{k-2}(\varphi_1) \sin^{k-3}(\varphi_2) \dots \sin(\varphi_{k-2}) \ \d r\ \d \varphi_1 \dots \d \varphi_{k-1} \\ &= \int_0^{\sqrt{y}} \underbrace{ \int_0^{2\pi} \underbrace{\int_0^\pi \dots \int_0^\pi}_{k-2} \frac{1} {(2\pi)^{k / 2}} \sin^{k-2}(\varphi_1) \sin^{k-3}(\varphi_2) \dots \sin(\varphi_{k-2})\ \d \varphi_1 \dots \d \varphi_{k-1}}_{c_k} e^{-\frac 1 2 r^2} r^{k-1} \d r \\ &= c_k \int_0^{\sqrt{y}} e^{-\frac 1 2 r^2} r^{k-1} \d r \\ \end{aligned} \]
Here \(c_k\) is a constant that depends only on \(k\). Since \(P_Y(Y \le \infty) = 1\), we have \[ \begin{aligned} 1 &= c_k \int_0^\infty e^{-\frac 1 2 r^2} r^{k-1} \d r \\ &\Downarrow_{r = \sqrt{2t}} \\ 1 &= c_k \int_0^\infty e^{-t} (\sqrt{2t})^{k-1} \frac{1}{\sqrt{2t}} \d t \\ 1 &= 2^{(k-2)/2} c_k \int_0^\infty e^{-t} t^{(k-2)/2} \d t \\ 1 &= 2^{(k-2)/2} c_k \Gamma(\frac{k}{2}) \\ c_k &= \frac{1} {2^{(k-2)/2} \Gamma(\frac{k}{2})} \end{aligned} \]
Differentiating the distribution function then gives the density for \(y > 0\): \[ \begin{aligned} &p_Y(y) = \frac{\d P_Y(Y \le y)}{\d y} \\ &= \frac{\d [c_k \int_0^{\sqrt{y}} e^{-\frac 1 2 r^2} r^{k-1} \d r]}{\d y} \\ &= \frac{1} {2^{(k-2)/2} \Gamma(\frac{k}{2})} e^{-\frac y 2} y^\frac{k-1}{2} \frac{1}{2\sqrt y} \\ &= \frac{1} {2^{\frac k 2} \Gamma(\frac{k}{2})} e^{-\frac y 2} y^{\frac{k}{2} - 1} \end{aligned} \]
Writing \(n\) for the degrees of freedom again, the density of \(\chi^2(n)\) is: \[ p_Y(y) = \begin{cases} \frac 1 {2^\frac{n}{2} \Gamma(\frac n 2)} e^{-\frac y 2} y^{\frac n 2 - 1}, &y > 0 \\ 0, & y \le 0 \end{cases} \] This density can also be verified by mathematical induction.
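As a sanity check on the \(\chi^2(n)\) density, the sketch below evaluates it with the standard library and integrates it with a simple midpoint rule. The function names `chi2_pdf` and `midpoint_integral`, the cutoff at 80, and the step count are illustrative assumptions; the tail beyond the cutoff is negligible for small \(n\).

```python
import math


def chi2_pdf(y: float, n: int) -> float:
    """Density of chi^2(n): y^(n/2 - 1) e^(-y/2) / (2^(n/2) Gamma(n/2)) for y > 0."""
    if y <= 0:
        return 0.0
    return y ** (n / 2 - 1) * math.exp(-y / 2) / (2 ** (n / 2) * math.gamma(n / 2))


def midpoint_integral(f, a: float, b: float, steps: int) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


n = 3
total_mass = midpoint_integral(lambda y: chi2_pdf(y, n), 0.0, 80.0, 80000)
mean = midpoint_integral(lambda y: y * chi2_pdf(y, n), 0.0, 80.0, 80000)

print(total_mass, mean)  # expected to be close to 1 and to n
```

For \(n = 2\) the formula reduces to \(\frac 1 2 e^{-y/2}\), the exponential density with mean 2, which gives a second quick check.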
References
Chi Squared Distribution || Generating Function of Chi Squared Distribution || Direct derivation || Proof by mathematical induction
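The \(t\) distribution
Let the random variables \(X\) and \(Y\) be independent, with \(X \sim N(0,1), Y \sim \chi^2(n)\). Then \(Z = \frac{X}{\sqrt{Y/n}}\) is said to follow the \(t\) distribution with \(n\) degrees of freedom, written \(Z \sim t(n)\). Its density is: \[ p_Z(z) = \frac{\Gamma((n+1)/2)} {\sqrt{n\pi} \Gamma(n/2)} \big( 1 + \frac{z^2} n \big)^{-(n+1)/2} \]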
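Two well-known special cases give a quick check of the \(t(n)\) density: for \(n = 1\) it is the standard Cauchy density \(\frac{1}{\pi(1+z^2)}\), and for large \(n\) it approaches the standard normal density. The sketch below compares them at one point; the names `t_pdf` and `normal_pdf`, the evaluation point \(z = 0.8\), and the choice \(n = 200\) are assumptions made for illustration.

```python
import math


def t_pdf(z: float, n: int) -> float:
    """Density of t(n) as given by the formula in the text."""
    coef = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return coef * (1 + z * z / n) ** (-(n + 1) / 2)


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


z = 0.8
# n = 1: the t density coincides with the standard Cauchy density.
cauchy_gap = abs(t_pdf(z, 1) - 1 / (math.pi * (1 + z * z)))
# Large n: the t density is close to the standard normal density.
normal_gap = abs(t_pdf(z, 200) - normal_pdf(z))

print(cauchy_gap, normal_gap)
```

Note that `math.gamma` overflows for arguments above roughly 171, so this direct formula is only usable for moderate \(n\); a log-Gamma formulation would be needed beyond that.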
The \(F\) distribution
Let the random variables \(X\) and \(Y\) be independent, with \(X \sim \chi^2(m), Y \sim \chi^2(n)\). Then \(Z = \frac{X/m}{Y/n}\) is said to follow the \(F\) distribution with \((m,n)\) degrees of freedom, written \(Z \sim F(m,n)\). Its density is: \[ p_Z(z) = \begin{cases} \frac{\Gamma((m+n)/2)} {\Gamma(m/2) \Gamma(n/2)} \left(\frac m n\right)^{\frac m 2} z^{\frac m 2 - 1} (1 + \frac m n z)^{-\frac{m+n}{2}}, &z > 0 \\ 0, &\text{otherwise} \end{cases} \]
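The definitions imply that if \(T \sim t(n)\) then \(T^2 \sim F(1,n)\), because \(T^2 = \frac{X^2/1}{Y/n}\) with \(X^2 \sim \chi^2(1)\). By the change-of-variables rule this means \(p_F(z) = p_T(\sqrt z)/\sqrt z\) for \(z > 0\). The sketch below checks that relation numerically; the function names and the test values \(z = 1.7\), \(n = 6\) are illustrative assumptions.

```python
import math


def t_pdf(z: float, n: int) -> float:
    """Density of t(n)."""
    coef = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return coef * (1 + z * z / n) ** (-(n + 1) / 2)


def f_pdf(z: float, m: int, n: int) -> float:
    """Density of F(m, n), with the factor (m / n)^(m / 2)."""
    if z <= 0:
        return 0.0
    coef = math.gamma((m + n) / 2) / (math.gamma(m / 2) * math.gamma(n / 2))
    return (
        coef
        * (m / n) ** (m / 2)
        * z ** (m / 2 - 1)
        * (1 + m * z / n) ** (-(m + n) / 2)
    )


z, n = 1.7, 6
lhs = f_pdf(z, 1, n)                         # density of F(1, n) at z
rhs = t_pdf(math.sqrt(z), n) / math.sqrt(z)  # density of T^2 at z

print(lhs, rhs)
```

Another easy case is \(F(2,2)\), whose density simplifies to \(1/(1+z)^2\).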
Sampling distributions for a normal population
Let \(X_1, \dots, X_n\) be a sample drawn from the normal population \(N(\mu, \sigma^2)\), with sample mean \(\bar X = \frac 1 n \sum_{i=1}^n X_i\), sample variance \(S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X)^2\), and \(S_n^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar X)^2\). Then \[ \begin{gather} \bar X \sim N(\mu, \frac{\sigma^2}{n}) \\ X_i - \bar X \sim N(0, \frac{(n-1) \sigma^2}{n}) \\ \frac{\sum_{i=1}^n (X_i - \bar X)^2}{\sigma^2} \sim \chi^2(n - 1), \quad \text{that is,} \quad \frac{(n-1) S^2}{\sigma^2} = \frac{n S_n^2}{\sigma^2} \sim \chi^2(n-1) \\ \text{$\bar X$ and $S^2$ are independent, and $\bar X$ and $S_n^2$ are independent} \label{independence} \\ \frac{(\bar X - \mu)\sqrt{n}}{S} \sim t(n-1) \end{gather} \]
In fact, \(\eqref{independence}\) holds if and only if the population is normally distributed.
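The sampling distributions above can be illustrated by simulation. Since \(X_i - \bar X = \frac{n-1}{n} X_i - \frac 1 n \sum_{j \ne i} X_j\), its variance is \(\frac{(n-1)^2}{n^2}\sigma^2 + \frac{n-1}{n^2}\sigma^2 = \frac{(n-1)\sigma^2}{n}\). The rough Monte Carlo sketch below estimates this variance together with the mean and variance of \(\frac{(n-1)S^2}{\sigma^2}\) and the variance of \(\bar X\); the parameter values, the seed, and the helper name `sample_variance` are arbitrary assumptions for the illustration.

```python
import random


def sample_variance(values):
    """Unbiased sample variance with divisor len(values) - 1."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


rng = random.Random(1)  # fixed seed so the run is reproducible
mu, sigma, n, trials = 2.0, 3.0, 6, 20000

scaled_ss = []   # (n - 1) S^2 / sigma^2, one value per simulated sample
deviations = []  # X_1 - X_bar, one value per simulated sample
means = []       # X_bar, one value per simulated sample

for _ in range(trials):
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(xs) / n
    scaled_ss.append((n - 1) * sample_variance(xs) / sigma ** 2)
    deviations.append(xs[0] - x_bar)
    means.append(x_bar)

chi2_mean = sum(scaled_ss) / trials   # near n - 1 = 5
chi2_var = sample_variance(scaled_ss)  # near 2 (n - 1) = 10
dev_var = sample_variance(deviations)  # near (n - 1) sigma^2 / n = 7.5
mean_var = sample_variance(means)      # near sigma^2 / n = 1.5

print(chi2_mean, chi2_var, dev_var, mean_var)
```

Matching the first two moments is consistent with the stated distributions but does not prove them, and it says nothing about independence of \(\bar X\) and \(S^2\).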
Sampling distributions for two independent normal populations
Let \(X_1, \dots, X_{n_X}\) be a sample drawn from the normal population \(N(\mu_X, \sigma_X^2)\), with sample mean \(\bar X = \frac 1 {n_X} \sum_{i=1}^{n_X} X_i\) and sample variance \(S_X^2 = \frac{1}{n_X-1} \sum_{i=1}^{n_X} (X_i - \bar X)^2\). Let \(Y_1, \dots, Y_{n_Y}\) be a sample drawn from the normal population \(N(\mu_Y, \sigma_Y^2)\), with sample mean \(\bar Y = \frac 1 {n_Y} \sum_{i=1}^{n_Y} Y_i\) and sample variance \(S_Y^2 = \frac{1}{n_Y-1} \sum_{i=1}^{n_Y} (Y_i - \bar Y)^2\). Assume the two populations are independent, and define the pooled variance \(S_w^2 = \frac{1}{n_X + n_Y - 2} (\sum_{i=1}^{n_X} (X_i - \bar X)^2 + \sum_{i=1}^{n_Y} (Y_i - \bar Y)^2)\) with \(S_w = \sqrt{S_w^2}\). Then \[ \begin{gather} \bar X - \bar Y \sim N(\mu_X - \mu_Y, \frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}) \\ X_i - Y_j \sim N(\mu_X - \mu_Y, \sigma_X^2 + \sigma_Y^2) \\ \frac{(n_X - 1) S_X^2}{\sigma_X^2} + \frac{(n_Y - 1) S_Y^2}{\sigma_Y^2} \sim \chi^2(n_X + n_Y - 2) \\ \frac{S_X^2 / \sigma_X^2}{S_Y^2 / \sigma_Y^2} = \frac{S_X^2 / S_Y^2}{\sigma_X^2 / \sigma_Y^2} \sim F(n_X - 1, n_Y - 1) \\ \text{when $\sigma_X^2 = \sigma_Y^2 = \sigma^2$,} \quad \frac{\bar X - \bar Y - (\mu_X - \mu_Y)}{S_w \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}} \sim t(n_X + n_Y - 2) \end{gather} \]
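The equal-variance statistic in the last line can be computed directly from two samples. The sketch below takes \(S_w\) to be the square root of the pooled variance, so that the denominator has the same units as \(\bar X - \bar Y\); the function name `pooled_t_statistic`, its `mean_diff` parameter (standing for the hypothesized \(\mu_X - \mu_Y\)), and the sample data are hypothetical.

```python
import math


def pooled_t_statistic(xs, ys, mean_diff=0.0):
    """(X_bar - Y_bar - mean_diff) / (S_w * sqrt(1/n_X + 1/n_Y)).

    S_w is the square root of the pooled variance; equal population
    variances are assumed, as in the text.
    """
    n_x, n_y = len(xs), len(ys)
    x_bar = sum(xs) / n_x
    y_bar = sum(ys) / n_y
    ss_x = sum((x - x_bar) ** 2 for x in xs)  # sum of squares around X_bar
    ss_y = sum((y - y_bar) ** 2 for y in ys)  # sum of squares around Y_bar
    s_w = math.sqrt((ss_x + ss_y) / (n_x + n_y - 2))
    return (x_bar - y_bar - mean_diff) / (s_w * math.sqrt(1 / n_x + 1 / n_y))


# Made-up data: ss_x = 2, ss_y = 8, so S_w^2 = 10 / 4 = 2.5.
t_value = pooled_t_statistic([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(t_value)
```

Under the stated assumptions the resulting value would be compared against the \(t(n_X + n_Y - 2)\) distribution, here with 4 degrees of freedom.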
Distribution calculators
Normal Distribution Applet/Calculator (uiowa.edu)
Chi-Square Distribution Applet/Calculator (uiowa.edu)