---
title: "研究间异质性"
---
Between-Study Heterogeneity $\zeta_k$
## Cochran's Q
Cochran's Q定义为每一项研究与总体效应的偏差**加权平方和** (weighted sum of squares, WSS)。
$$
Q=\sum_{k=1}^K \omega_k(\hat \theta_k - \hat\theta)^2
$$
当研究之间没有异质性即$\zeta_k =0$ 仅有抽样误差,假设
$$
\hat\theta_k-\hat\theta\sim N(0,1)
$$
```{r}
set.seed (123 ) # needed to reproduce results
rnorm (n = 40 , mean = 0 , sd = 1 )
```
```{r}
set.seed (123 )
error_fixed <- replicate (n = 10000 , rnorm (40 ))
hist (error_fixed,
xlab = expression (hat (theta[k])~- ~ hat (theta)), prob = TRUE ,
breaks = 100 , ylim = c (0 , .45 ), xlim = c (- 4 ,4 ),
main = "No Heterogeneity" )
lines (seq (- 4 , 4 , 0.01 ), dnorm (seq (- 4 , 4 , 0.01 )),
col = "blue" , lwd = 2 )
```
假设研究之间存在异质性 ($\zeta_k$)和抽样误差($\epsilon_k$)
```{r}
set.seed (123 )
error_random <- replicate (n = 10000 , rnorm (40 ) + rnorm (40 ))
hist (error_random,
xlab = expression (hat (theta[k])~- ~ hat (theta)), prob = TRUE ,
breaks = 100 ,ylim = c (0 , .45 ), xlim = c (- 4 ,4 ),
main = "Heterogeneity" )
lines (seq (- 4 , 4 , 0.01 ), dnorm (seq (- 4 , 4 , 0.01 )),
col = "blue" , lwd = 2 )
```
计算Q,假设加权=1
```{r}
set.seed (123 )
Q_fixed <- replicate (10000 , sum (rnorm (40 )^ 2 ))
Q_random <- replicate (10000 , sum ((rnorm (40 ) + rnorm (40 ))^ 2 ))
```
自由度为K−1的卡方分布(其中K是我们荟萃分析中的研究数量)
```{r}
df <- 40-1
hist (Q_fixed, xlab = expression (italic ("Q" )), prob = TRUE ,
breaks = 100 , ylim = c (0 , .06 ),xlim = c (0 ,160 ),
main = "No Heterogeneity" )
lines (seq (0 , 100 , 0.01 ), dchisq (seq (0 , 100 , 0.01 ), df = df),
col = "blue" , lwd = 2 )
hist (Q_random, xlab = expression (italic ("Q" )), prob = TRUE ,
breaks = 100 , ylim = c (0 , .06 ), xlim = c (0 ,160 ),
main = "Heterogeneity" )
lines (seq (0 , 100 , 0.01 ), dchisq (seq (0 , 100 , 0.01 ), df = df),
col = "blue" , lwd = 2 )
```
## **Higgins & Thompson’s** $I^2$Statistic
$I^2$ 是另一种估计异质性的方法,基于Cochran's Q。 它被定义为不是由抽样误差引起的效应量的变异百分比
$I^2$ 基于以下假设:在无异质性的零假设下,Q遵循具有K-1自由度的χ2分布。它以百分比的形式量化了在没有异质性(即K-1)的情况下,观察到的Q值超过预期Q值的程度。
$$
I^2=\frac{Q-(K-1)}{Q}
$$
I2的值不能低于0%,因此如果Q恰好小于K-1,我们只需使用0而不是负值
```{r}
# Display the value of the 10th simulation of Q
Q_fixed[10 ]
# Define k
k <- 40
# Calculate I^2
(Q_fixed[10 ] - (k-1 ))/ Q_fixed[10 ]
```
$I^2$ = 0% 说明效应量的0%是由研究间的异质性引起的
```{r}
(Q_random[10 ] - (k-1 ))/ Q_random[10 ]
```
大约一半的差异是由于研究之间的异质性造成的。这也符合我们的预期,因为此示例中的变异同等程度地基于模拟的抽样误差和研究之间的异质性。
- I2 = 25%: low heterogeneity
- I2 = 50%: moderate heterogeneity
- I2 = 75%: substantial heterogeneity.
## $H^2$ statistic
它描述了由Q测量的观察到的变化与由于采样误差引起的预期变化的比率
$$
H^2=\frac{Q}{K-1}
$$
当研究之间没有异质性时,H2等于1(或更小)。大于1的值表示研究之间存在异质性。
## **Heterogeneity Variance** $\tau^2$& Standard Deviation $\tau$
```{r}
library (meta)
m.gen <- metagen (TE = TE,
seTE = seTE,
studlab = Author,
data = dmetar:: ThirdWave,
sm = "SMD" ,
fixed = FALSE ,
random = TRUE ,
method.tau = "REML" ,
method.random.ci = "HK" ,
title = "Third Wave Psychotherapies" )
```
```{r}
# Pooled effect
m.gen$ TE.random
# Estimate of tau
m.gen$ tau
```
我们看到了g=0.58 和τ=0.29. 根据这些数据,我们可以计算出 95% 真实效应量置信区间的下限和上限:0.58−1.96×0.29 = 0.01 和 0.58 1.96+×0.29 = 1.15。
## Prediction intervals (PIs)
95% 预测区间的公式如下所示:
 {fig-align="center"}
```{r}
m.gen <- update (m.gen, prediction = TRUE )
summary (m.gen)
```
在合并效应的直接作用下,我们看到了预测区间。其范围为g=-0.06至1.21。这意味着,根据目前的证据,未来的一些研究可能会发现负面的治疗效果。然而,间隔相当宽,这意味着也可能产生非常高的效果。
## **异常值**
### 删除
```{r}
dmetar:: find.outliers (m.gen)
```
### 影响分析
```{r}
m.gen.inf <- dmetar:: InfluenceAnalysis (m.gen, random = TRUE )
```
```{r}
plot (m.gen.inf, "baujat" )
```
```{r}
plot (m.gen.inf, "influence" )
```
```{r}
plot (m.gen.inf, "es" )
plot (m.gen.inf, "i2" )
```
### **GOSH Plot Analysis**
Graphic Display of Heterogeneity (GOSH)
```{r}
library (metafor)
m.rma <- rma (yi = m.gen$ TE,
sei = m.gen$ seTE,
method = m.gen$ method.tau,
test = "knha" )
```
```{r eval=FALSE}
res.gosh <- gosh(m.rma)
save(res.gosh,file = "data/res.gosh.RData")
```
```{r}
load ("data/res.gosh.RData" )
plot (res.gosh, alpha = 0.01 )
```
GOSH 诊断图
```{r}
#' @usage gosh.diagnostics(data, km = TRUE, db = TRUE, gmm = TRUE,
#' km.params = list(centers = 3,
#' iter.max = 10, nstart = 1,
#' algorithm = c("Hartigan-Wong",
#' "Lloyd", "Forgy","MacQueen"),
#' trace = FALSE),
#' db.params = list(eps = 0.15, MinPts = 5,
#' method = c("hybrid", "raw", "dist")),
#' gmm.params = list(G = NULL, modelNames = NULL,
#' prior = NULL, control = emControl(),
#' initialization = list(hcPairs = NULL,
#' subset = NULL,
#' noise = NULL),
#' Vinv = NULL,
#' warn = mclust.options("warn"),
#' x = NULL, verbose = FALSE),
#' seed = 123,
#' verbose = TRUE)
```
```{r eval=FALSE}
res.gosh.diag <- dmetar::gosh.diagnostics(res.gosh,
km.params = list(centers = 2),
db.params = list(eps = 0.08,
MinPts = 50))
```
```{r}
data ("m.gosh" ,package = "dmetar" )
res.diag <- dmetar:: gosh.diagnostics (m.gosh)
res.diag
plot (res.diag)
```
```{r}
update (m.gen, exclude = c (3 , 4 , 16 )) %>%
summary ()
```