Code
Code
ggplot(data = mtcars, aes(x = mpg)) +
geom_histogram(color = "black", binwidth = diff(range(mtcars$mpg)) / 9)
极差(Range) : \(R=X_{max}-X_{min}\)
组数 (Number of Bins) \(k\) : 通常选择 \(8\) 到 \(15\) 之间的值。
组距 (Bin Width) : \(interval=\frac{R}{k}\)
频数 (Frequency) : \(Frequency = count\)
频率 (Relative Frequency): \(Relative\ Frequency = \frac{count}{n} \times 100\%\)
总体方差除以 n
样本方差除以(n-1),R 计算的是样本方差
注意:函数rstatix::get_mode()
可能返回多个众数,如果存在多个众数,请检查其处理方式。
# 值域
range(mtcars$mpg)
#> [1] 10.4 33.9
# 极差 or 全距
diff(range(mtcars$mpg) )
#> [1] 23.5
# 分位数
quantile(mtcars$mpg,probs = c(0,0.1,0.25,0.5,0.75,1))
#> 0% 10% 25% 50% 75% 100%
#> 10.400 14.340 15.425 19.200 22.800 33.900
# 四分位数间距
IQR(mtcars$mpg)
#> [1] 7.375
# 方差 variance
var(mtcars$mpg)
#> [1] 36.3241
# 标准差 standard deviation
sd(mtcars$mpg)
#> [1] 6.026948
# 变异系数 Coefficient of Variation
CV <- function(x, na.rm = TRUE) {
if (na.rm) x <- x[!is.na(x)]
CV = sd(x) / mean(x) * 100
sprintf("%.6f%%", CV)
}
CV(mtcars$mpg)
#> [1] "29.998808%"
# 绝对中位差 median absolute deviation
mad(mtcars$mpg,constant = 1.4826)
#> [1] 5.41149
median(abs(mtcars$mpg-median(mtcars$mpg)))
#> [1] 3.65
median(abs(mtcars$mpg-median(mtcars$mpg)))*1.4826
#> [1] 5.41149
说明:mad()
计算时乘以比例因子 constant = 1.4826
以实现渐进正态一致性。
表示随机变量概率分布的不对称性。
https://www.macroption.com/skewness-formula/
三阶中心矩。二阶中心矩即方差。
\[ Population\ Skewness (X) = \frac{E(X_i-E(X))^3}{Var(X)^{\frac{3}{2}}} =E [(\frac{X_i-\mu}{\sigma})^3]= \frac{1}{n} \sum_{i=1}^{n} (\frac{X_i-\mu}{\sigma} )^3 \]
偏度的取值范围: \((-\infty,+\infty)\)
Skew>0,正偏态分布,右偏 = 尾部向右延伸。Mode < Median < Mean;
Skew=0,数据相对均匀的分布在均值两侧;
Skew<0,负偏态分布,左偏 = 尾部向左延伸;Mode > Median > Mean。
x <- c(1,2,3,5,6,10)
skewness <- function(x,na.rm=TRUE){
if(na.rm) x <- x[!is.na(x)]
n=length(x)
μ=mean(x)
SD=sd(x)
return(c(population_sknewness = mean(((x-μ)/SD)^3),
sample_sknewness = sum(((x-μ)/SD)^3)*n/(n-1)/(n-2)))
}
skewness(x)
#> population_sknewness sample_sknewness
#> 0.5142767 0.9256980
e1071::skewness(x,type = 2) # 样本偏度
#> [1] 0.925698
e1071::skewness(x,type = 3) # 总体偏度
#> [1] 0.5142767
e1071::skewness(x,type = 1) # 无偏偏度
#> [1] 0.6760343
moments::skewness(x)
#> [1] 0.6760343
\[ Sample\ Skewness(X) = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left [\frac{X_i-\bar X}{S} \right ]^3 \]
表示随机变量概率分布的尖峭程度。四阶中心矩与方差平方的比值。
https://www.macroption.com/kurtosis-formula/
超额峰度 excess kurtosis :四阶中心矩与方差平方的比值减3。
https://www.macroption.com/excess-kurtosis/
\[ Population\ Kurtosis(X) = \frac{E(X_i-E(X))^4}{Var(X)^{2}}-3= E [(\frac{X_i-\mu}{\sigma})^4] - 3= \frac{1}{n} \sum_{i=1}^{n} (\frac{X_i-\mu}{\sigma} )^4-3 \]
超额峰度的取值范围:\([-2,+\infty)\)
超额峰度<0,数据分布与正态分布相比较为扁平;
超额峰度=0,正态分布;
超额峰度>0,数据分布与正态分布相比较为高尖。
kurtosis<-function(x,na.rm=TRUE){
if(na.rm) x<-x[!is.na(x)]
n=length(x)
μ=mean(x)
SD=sd(x)
return(c(population_kurtosis= mean(((x-μ)/SD)^4)-3,
sample_kurtosis = sum(((x-μ)/SD)^4)*n*(n+1)/(n-1)/(n-2)/(n-3)-3*(n-1)^2/(n-2)/(n-3)))
}
kurtosis(x)
#> population_kurtosis sample_kurtosis
#> -1.377770 0.563368
e1071::kurtosis(x,type = 3)# 默认
#> [1] -1.37777
e1071::kurtosis(x,type = 2)
#> [1] 0.563368
\[ Sample \ Kurtosis(X) = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left [\frac{X_i-\bar X}{S} \right]^4-\frac{3(n-1)^2}{(n-2)(n-3)} \]
summary(mtcars$mpg)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 10.40 15.43 19.20 20.09 22.80 33.90
rstatix::get_summary_stats(mtcars,mpg,type = "full")
variable | n | min | max | median | q1 | q3 | iqr | mad | mean | sd | se | ci |
---|---|---|---|---|---|---|---|---|---|---|---|---|
mpg | 32 | 10.4 | 33.9 | 19.2 | 15.425 | 22.8 | 7.375 | 5.411 | 20.091 | 6.027 | 1.065 | 2.173 |
psych::describeBy(mtcars$mpg,group =NULL)
vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
X1 | 1 | 32 | 20.09062 | 6.026948 | 19.2 | 19.69615 | 5.41149 | 10.4 | 33.9 | 23.5 | 0.610655 | -0.372766 | 1.065424 |