12 比例检验

12.1 单总体

如果样本比较小，则使用二项分布进行统计. 在R中，对于小样本，采用 binom.test()，对于大样本使用正态分布近似二项分布，利用 prop.test()进行分析。在单样本比例检验中，我们关心的是具有同种特性的两个群体，在该特性总体中所占有的比例情况。

\[ Z=\frac{p-\pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}\sim N(0,1) \]

CI：\(\bar p \pm z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}}\)

对于小样本，可以连续校正

\[ Z_{corr}=\frac{|p-\pi_0|-1/(2n)}{\sqrt{\pi_0(1-\pi_0)/n}}\sim N(0,1) \]

例如，小鼠中公鼠母鼠各有一半，有100只患有某种疾病，其中有公鼠60只，母鼠40只。想知道是否公鼠患病率比母鼠高。在该问题中成功次数为公鼠患病数60，总次数为100，预期比例为50% ( 公母鼠数量相等)

Code

p <- 0.6
pi_0 <- 0.5
n <- 100
Z <- (p-pi_0)/sqrt(pi_0*(1-pi_0)/n)
Z_corr <- (p-pi_0-1/(2*n))/sqrt(pi_0*(1-pi_0)/n)

Code

# 示例数据
successes <- 60
total <- 100
p0 <- 0.5

# 使用 prop.test() 函数进行比例检验
prop.test(
    x = successes,
    n = total,
    p = p0,
    alternative = "greater",
    correct = F
)
#> 
#>  1-sample proportions test without continuity correction
#> 
#> data:  successes out of total, null probability p0
#> X-squared = 4, df = 1, p-value = 0.02275
#> alternative hypothesis: true p is greater than 0.5
#> 95 percent confidence interval:
#>  0.5178095 1.0000000
#> sample estimates:
#>   p 
#> 0.6

Code

prop.test(
    x = successes,
    n = total,
    p = p0,
    alternative = "greater",
    correct = T
)
#> 
#>  1-sample proportions test with continuity correction
#> 
#> data:  successes out of total, null probability p0
#> X-squared = 3.61, df = 1, p-value = 0.02872
#> alternative hypothesis: true p is greater than 0.5
#> 95 percent confidence interval:
#>  0.5127842 1.0000000
#> sample estimates:
#>   p 
#> 0.6

12.2 两总体

当样本量较小时(所有np和n(1-p)都小于5)，通常采用非参数检验 Fisher Exact probability test 进行分析。当样本量较大时，使用近似正态分布z检验来进行预测。

当\(n_ip_i(1-p_i)≥5,(i=1,2)\)时，\(p_i\dot\sim N(\pi_i,\frac{\pi_i(1-\pi_i)}{n_i})\)

\[ (p_1-p_2)\dot\sim N(\pi_1-\pi_2,\frac{\pi_1(1-\pi_1)}{n_1}+\frac{\pi_2(1-\pi_2)}{n_2}) \]

\(H_0:\pi_1=\pi_2\)

\[ Z=\frac{(p_1-p_2)-(\pi_1-\pi_2)}{S_{p_1-p_2}}=\frac{p_1-p_2}{\sqrt{p_C(1-p_C)(\frac{1}{n_1}+\frac{1}{n_2})}}\dot\sim N(0,1) \]

其中,合并比例\(p_C=\frac{n_1p_1+n_2p_2}{n_1+n_2}\) 。

如果我们已知两组具有不同特性(A和B)样本的样本量和这两样本中具有某种共同特性(C)的个体数量(也就是知道了C特性各自群体比例和总体比例)，想要计算具有C特性的个体在A特性群体和B特性群体中的比例是否一样，就需要用到双比例检验。

例如，男生500人，女生500人，其中喜欢阅读的男生有400人，喜欢阅读的女生有460人。男生喜欢阅读的比例是否比女生高。我们假设男生喜欢阅读的比例比女生高，则备择假设是男生喜欢阅读的比例比女生低。

Code

# 示例数据
successes1 <- 400
total1 <- 500
successes2 <- 460
total2 <- 500

# 使用 prop.test() 函数进行两总体比例检验
prop.test(x = c(successes1, successes2), n = c(total1, total2), alternative = "less")
#> 
#>  2-sample test for equality of proportions with continuity correction
#> 
#> data:  c(successes1, successes2) out of c(total1, total2)
#> X-squared = 28.912, df = 1, p-value = 3.787e-08
#> alternative hypothesis: less
#> 95 percent confidence interval:
#>  -1.0000000 -0.0824468
#> sample estimates:
#> prop 1 prop 2 
#>   0.80   0.92

功效分析

\(1-\beta\)
\(n1,n_2=kn_1\)