2 Descriptive Statistics for Qualitative Data

Modified

November 20, 2024

2.1 Rate

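A rate is the number of times a phenomenon occurs within a defined space or time period divided by the total number of times it could have occurred. It describes how frequently the phenomenon occurs.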
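As a minimal sketch, a rate can be computed by dividing the number of events by the number at risk and scaling by a convenient multiplier. The counts below are the exposed-group counts from the 2x2 table used later in this chapter; the variable names and the per-1,000 multiplier are illustrative choices.

```r
# Counts for the exposed group from the 2x2 table used later in this chapter
events  <- 156          # number of people with the disease
at_risk <- 156 + 9421   # everyone in the group who could have developed it

rate <- events / at_risk       # rate as a fraction
rate_per_1000 <- 1000 * rate   # same rate per 1,000 people (the multiplier is a choice)
round(rate_per_1000, 2)
#> [1] 16.29
```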

Standardized rate
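One common way to standardize a rate is direct standardization: the stratum-specific rates of the study population are weighted by the composition of a standard population. The sketch below assumes that approach, and every number in it is hypothetical, chosen only to make the arithmetic easy to follow.

```r
# Hypothetical age-specific data (illustrative numbers, not from any real study)
age_group  <- c("0-39", "40-59", "60+")
events     <- c(10, 30, 60)          # events observed in each age group
population <- c(5000, 3000, 2000)    # study population in each age group
std_pop    <- c(6000, 3000, 1000)    # hypothetical standard population

crude_rate <- sum(events) / sum(population)   # ignores the age structure
age_rates  <- events / population             # age-specific rates

# Direct standardization: weight each age-specific rate by the
# standard population's share of that age group
std_rate <- sum(age_rates * std_pop / sum(std_pop))

c(crude = crude_rate, standardized = std_rate)
```

Here the standardized rate is lower than the crude rate because the hypothetical standard population is younger than the study population.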

2.2 Proportion

Proportion (constituent ratio)
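A proportion in this sense is the share of a whole contributed by each of its parts, so the shares always sum to 1. As a small sketch, the disease counts from the three-row table in section 2.3.2 can be broken down by exposure group; the variable names are illustrative.

```r
# Disease counts by exposure group (from the 3x2 table in section 2.3.2)
cases <- c(Exposure1 = 30, Exposure2 = 76, Unexposed = 82)

composition <- cases / sum(cases)   # share of all cases in each group
round(100 * composition, 1)         # the same shares as percentages
sum(composition)                    # parts of a whole sum to 1
```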

2.3 Relative ratio

A relative ratio is the ratio of two related indicators, A and B.

2.3.1 RR

Using R for Biomedical Statistics

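Relative risk (RR) is the ratio of the incidence in the exposed group to the incidence in the unexposed group. RR reflects the strength of the association between an exposure and an outcome, and it ranges from 0 to infinity. RR = 1 indicates no association between exposure and outcome; RR < 1 indicates that exposure is associated with a lower incidence of the outcome; RR > 1 indicates that exposure is associated with a higher incidence. Relative risk is appropriate for prospective cohort studies.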

x <- matrix(c(156,9421,1531,14797),nrow=2,byrow=TRUE,
             dimnames = list(c("Exposed","Unexposed"),c("Disease","Control")))

x
#>           Disease Control
#> Exposed       156    9421
#> Unexposed    1531   14797
# RR: incidence in the exposed group divided by incidence in the unexposed group
(156 / (156 + 9421)) / (1531 / (1531 + 14797))
#> [1] 0.1737212
source("function/calcRelativeRisk.R")
calcRelativeRisk(x,alpha=0.05)
#> [1] "category = Exposed , relative risk =  0.173721236521721"
#> [1] "category = Exposed ,  95 % confidence interval = [ 0.147624440337197 , 0.204431379720742 ]"

# OR: odds of disease in the exposed group divided by odds in the unexposed group
(156 / 9421) / (1531 / 14797)
#> [1] 0.1600391
source("function/calcOddsRatio.R")
calcOddsRatio(x,alpha = 0.05)
#> [1] "category = Exposed , odds ratio =  0.160039091621751"
#> [1] "category = Exposed ,  95 % confidence interval = [ 0.135460641900536 , 0.189077140693912 ]"
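The relative-risk interval printed above is consistent with the usual Wald interval computed on the log scale. The sketch below is an assumption about the method, not the code of the sourced calcRelativeRisk(), which is not shown here; the helper name rr_wald is hypothetical.

```r
# Hypothetical helper: relative risk with a log-scale Wald confidence interval.
# Assumes a 2x2 matrix with rows (exposed, unexposed) and columns (disease, control).
rr_wald <- function(tab, alpha = 0.05) {
  cases1 <- tab[1, 1]; n1 <- sum(tab[1, ])   # cases and total, exposed row
  cases0 <- tab[2, 1]; n0 <- sum(tab[2, ])   # cases and total, unexposed row
  rr     <- (cases1 / n1) / (cases0 / n0)
  se_log <- sqrt(1 / cases1 - 1 / n1 + 1 / cases0 - 1 / n0)   # SE of log(RR)
  z      <- qnorm(1 - alpha / 2)
  c(RR = rr,
    lower = exp(log(rr) - z * se_log),
    upper = exp(log(rr) + z * se_log))
}

x <- matrix(c(156, 9421, 1531, 14797), nrow = 2, byrow = TRUE,
            dimnames = list(c("Exposed", "Unexposed"), c("Disease", "Control")))
rr_wald(x)
```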

2.3.2 OR

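The odds ratio (OR) is the ratio of cases to non-cases in the exposed group divided by the ratio of cases to non-cases in the unexposed group. OR also ranges from 0 to infinity. OR = 1 indicates no association. OR > 1 indicates that the outcome is more likely under exposure, that is, the exposure is a risk factor; OR < 1 indicates that the outcome is less likely under exposure, that is, the exposure is a protective factor. The odds ratio is appropriate for both cohort studies and case-control studies.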
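For a 2x2 table the definition above can be turned into a short calculation, together with a Wald confidence interval on the log scale. This is a sketch under that assumption and not the code of the sourced calcOddsRatio(); the helper name or_wald is hypothetical. The table is the one used in section 2.3.1.

```r
# Hypothetical helper: odds ratio with a log-scale Wald confidence interval.
# Assumes a 2x2 matrix with rows (exposed, unexposed) and columns (disease, control).
or_wald <- function(tab, alpha = 0.05) {
  odds1  <- tab[1, 1] / tab[1, 2]   # odds of disease, exposed row
  odds0  <- tab[2, 1] / tab[2, 2]   # odds of disease, unexposed row
  or     <- odds1 / odds0
  se_log <- sqrt(sum(1 / tab))      # SE of log(OR): sqrt(1/a + 1/b + 1/c + 1/d)
  z      <- qnorm(1 - alpha / 2)
  c(OR = or,
    lower = exp(log(or) - z * se_log),
    upper = exp(log(or) + z * se_log))
}

x <- matrix(c(156, 9421, 1531, 14797), nrow = 2, byrow = TRUE,
            dimnames = list(c("Exposed", "Unexposed"), c("Disease", "Control")))
or_wald(x)
```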

y <- matrix(c(30,24,76,241,82,509),nrow=3,byrow=TRUE,
            dimnames = list(c("Exposure1","Exposure2","Unexposed"),
                            c("Disease","Control")))
y
#>           Disease Control
#> Exposure1      30      24
#> Exposure2      76     241
#> Unexposed      82     509
calcOddsRatio(y, referencerow=3)
#> [1] "category = Exposure1 , odds ratio =  7.75914634146342"
#> [1] "category = Exposure1 ,  95 % confidence interval = [ 4.32163714854064 , 13.9309131884372 ]"
#> [1] "category = Exposure2 , odds ratio =  1.95749418075094"
#> [1] "category = Exposure2 ,  95 % confidence interval = [ 1.38263094540732 , 2.77137111707344 ]"
calcRelativeRisk(y, referencerow=3)
#> [1] "category = Exposure1 , relative risk =  4.00406504065041"
#> [1] "category = Exposure1 ,  95 % confidence interval = [ 2.93130744422409 , 5.46941498113737 ]"
#> [1] "category = Exposure2 , relative risk =  1.72793721628068"
#> [1] "category = Exposure2 ,  95 % confidence interval = [ 1.30507489771431 , 2.2878127750653 ]"
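With more than two exposure levels, each level is compared against a chosen reference row. The point estimates above can be reproduced with base R alone, as in the following sketch; the variable names are illustrative, and the confidence intervals are left to the sourced functions.

```r
y <- matrix(c(30, 24, 76, 241, 82, 509), nrow = 3, byrow = TRUE,
            dimnames = list(c("Exposure1", "Exposure2", "Unexposed"),
                            c("Disease", "Control")))
ref <- 3                                  # index of the reference (unexposed) row

risk <- y[, "Disease"] / rowSums(y)       # incidence in each row
odds <- y[, "Disease"] / y[, "Control"]   # odds of disease in each row

rr <- risk[-ref] / risk[ref]              # each exposure level vs the reference row
or <- odds[-ref] / odds[ref]
round(rbind(RR = rr, OR = or), 3)
```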

2.4 Contingency tables

eg <- matrix(c(156,9421,1531,14797),nrow=2,byrow=TRUE)
colnames(eg) <- c("Disease","Control")
rownames(eg) <- c("Exposed","Unexposed")
print(eg)
#>           Disease Control
#> Exposed       156    9421
#> Unexposed    1531   14797
prop.table(eg)          # each cell as a proportion of the grand total
#>               Disease   Control
#> Exposed   0.006022003 0.3636750
#> Unexposed 0.059100560 0.5712025
prop.table(eg,margin = 1)        # row proportions (each row sums to 1)
#>              Disease   Control
#> Exposed   0.01628903 0.9837110
#> Unexposed 0.09376531 0.9062347
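Two related operations may be useful alongside the row proportions above: column proportions via margin = 2, and converting between the table and a long (one row per cell) data frame. The sketch below uses only base R; the dimension names Exposure and Outcome are labels added here for illustration.

```r
eg <- matrix(c(156, 9421, 1531, 14797), nrow = 2, byrow = TRUE,
             dimnames = list(Exposure = c("Exposed", "Unexposed"),
                             Outcome  = c("Disease", "Control")))

# Column proportions: each column sums to 1
col_prop <- prop.table(eg, margin = 2)
round(col_prop, 4)
colSums(col_prop)

# Long format: one row per cell, with the count in a Freq column
long <- as.data.frame(as.table(eg))
long

# Rebuild the contingency table from the long format
rebuilt <- xtabs(Freq ~ Exposure + Outcome, data = long)
rebuilt
```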

2.5 Marginal tables

# Margins
margin.table(x=eg,margin = 2)      # column totals
#> Disease Control 
#>    1687   24218
addmargins(eg)          # add both row and column totals
#>           Disease Control   Sum
#> Exposed       156    9421  9577
#> Unexposed    1531   14797 16328
#> Sum          1687   24218 25905
addmargins(eg,1)        # add column totals (a Sum row)
#>           Disease Control
#> Exposed       156    9421
#> Unexposed    1531   14797
#> Sum          1687   24218
addmargins(eg,2)        # add row totals (a Sum column)
#>           Disease Control   Sum
#> Exposed       156    9421  9577
#> Unexposed    1531   14797 16328
addmargins(prop.table(eg,1))   # note: the Sum row adds row proportions across rows and is not meaningful
#>              Disease   Control Sum
#> Exposed   0.01628903 0.9837110   1
#> Unexposed 0.09376531 0.9062347   1
#> Sum       0.11005434 1.8899457   2

ftable(eg)   # "flat" contingency table
#>            Disease Control
#>                           
#> Exposed        156    9421
#> Unexposed     1531   14797
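As a short sketch of how the margin argument behaves: margin = 1 refers to rows and margin = 2 to columns, so margin.table(eg, 1) agrees with rowSums(eg), and passing margin = 2 to addmargins() appends row totals only. For row proportions, that yields a Sum column of 1 without a Sum row.

```r
eg <- matrix(c(156, 9421, 1531, 14797), nrow = 2, byrow = TRUE,
             dimnames = list(c("Exposed", "Unexposed"), c("Disease", "Control")))

# margin = 1 gives one total per row, the same values as rowSums()
all.equal(as.vector(margin.table(eg, margin = 1)), as.vector(rowSums(eg)))

# Append row totals only: every row of row proportions sums to 1
row_prop <- addmargins(prop.table(eg, margin = 1), margin = 2)
row_prop
```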

2.6 Measures of association

2.6.1 Chi-squared statistic

chisq.test(eg)
#> 
#>  Pearson's Chi-squared test with Yates' continuity correction
#> 
#> data:  eg
#> X-squared = 593.88, df = 1, p-value < 2.2e-16
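The output above includes Yates' continuity correction, which chisq.test() applies by default to 2x2 tables. The following sketch requests the uncorrected Pearson statistic and recomputes it from its definition using the expected counts; the variable names are illustrative.

```r
eg <- matrix(c(156, 9421, 1531, 14797), nrow = 2, byrow = TRUE,
             dimnames = list(c("Exposed", "Unexposed"), c("Disease", "Control")))

res <- chisq.test(eg, correct = FALSE)   # Pearson statistic without Yates' correction
res$expected                             # expected counts under independence

# Same statistic from its definition: sum of (observed - expected)^2 / expected
x2_manual <- sum((eg - res$expected)^2 / res$expected)
c(chisq_test = unname(res$statistic), manual = x2_manual)
```

The uncorrected value is the one that association coefficients such as phi and Cramer's V are conventionally based on.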

2.6.2 Contingency coefficient

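# Pearson's contingency coefficient: sqrt(X^2 / (X^2 + n)).
# correct = FALSE uses the uncorrected Pearson statistic; the default Yates
# correction for 2x2 tables would give a slightly smaller value (0.1497052).
Contingency <- function(x) {
    chi <- chisq.test(x, correct = FALSE)
    unname(sqrt(chi$statistic / (chi$statistic + sum(x))))
}
Contingency(eg)
#> [1] 0.1498618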

library(DescTools)

ContCoef(eg)
#> [1] 0.1498618

2.6.3 Phi and Cramer's V coefficients

# Phi coefficient: sqrt(X^2 / n), from the uncorrected Pearson statistic
PhiCoef <- function(x){
    unname(sqrt(chisq.test(x, correct = FALSE)$statistic / sum(x)))
}

# Cramer's V coefficient: sqrt(X^2 / (n * (min(rows, cols) - 1)))
V <- function(x) {
    unname(sqrt(chisq.test(x, correct = FALSE)$statistic / (sum(x) * (min(dim(x)) - 1))))
}

PhiCoef(eg)
#> [1] 0.1515735
V(eg)
#> [1] 0.1515735

library(DescTools)
Phi(eg)
#> [1] 0.1515735
CramerV(eg)
#> [1] 0.1515735

library(vcd)

assocstats(eg)
#>                     X^2 df P(> X^2)
#> Likelihood Ratio 722.30  1        0
#> Pearson          595.16  1        0
#> 
#> Phi-Coefficient   : 0.152 
#> Contingency Coeff.: 0.15 
#> Cramer's V        : 0.152

2.7 Mosaic plot

mosaicplot(eg)
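A slightly more readable version of the same plot can be produced by naming the table's dimensions and passing a title and axis labels. This is a sketch using base graphics only; the title and label text are illustrative choices.

```r
eg <- matrix(c(156, 9421, 1531, 14797), nrow = 2, byrow = TRUE,
             dimnames = list(Exposure = c("Exposed", "Unexposed"),
                             Outcome  = c("Disease", "Control")))

# Tile widths reflect the row totals; tile heights reflect the outcome
# split within each exposure group
mosaicplot(eg,
           main  = "Disease status by exposure",
           xlab  = "Exposure",
           ylab  = "Outcome",
           color = TRUE)
```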