1 t-SNE

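t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensionality reduction technique that represents a high-dimensional dataset in a low-dimensional space of two or three dimensions so that it can be visualized. Unlike other dimensionality reduction algorithms such as PCA, t-SNE builds a reduced feature space in which similar samples are modeled by nearby points and dissimilar samples are, with high probability, modeled by distant points.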
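To make "similar samples are modeled by nearby points" concrete, here is a minimal sketch of the neighbor probabilities t-SNE starts from in the high-dimensional space. It is an illustration under simplifying assumptions, not the code of the tsne package: the names `X`, `D2`, `sigma` and `P` are hypothetical, and a single fixed `sigma` is assumed for every point, whereas t-SNE tunes one bandwidth per point to match the requested perplexity.

```r
# Sketch only: conditional neighbor probabilities p(j|i) with one fixed bandwidth.
X <- as.matrix(iris[, 1:4])
D2 <- as.matrix(dist(X))^2         # squared Euclidean distances between flowers
sigma <- 0.7                       # assumed value, for illustration only
P <- exp(-D2 / (2 * sigma^2))      # Gaussian kernel: close points get large values
diag(P) <- 0                       # a point is not its own neighbor
P <- P / rowSums(P)                # normalize so each row sums to 1

# Indices of the five most probable neighbors of flower 1
top5 <- head(order(P[1, ], decreasing = TRUE), 5)
iris$Species[top5]
```

Each row of `P` is a probability distribution over the other flowers. The embedding step then looks for low-dimensional points whose pairwise similarities, measured with a heavy-tailed Student t kernel, match these probabilities as closely as possible.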


Code
library(tsne)

head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

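# one color per species
colors = rainbow(length(unique(iris$Species)))
names(colors) = unique(iris$Species)
# callback passed to tsne(): redraws the current embedding at each reporting epoch
ecb = function(x, y) {
    plot(x, type = 'n')  # empty axes; the points are drawn as text labels below
    text(x, labels = iris$Species, col = colors[iris$Species])
}
# embed the four numeric measurements (2-D output by default)
tsne_iris = tsne(iris[,1:4], epoch_callback = ecb, perplexity = 50)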
#> sigma summary: Min. : 0.565012665854053 |1st Qu. : 0.681985646004023 |Median : 0.713004330336136 |Mean : 0.716213420895748 |3rd Qu. : 0.74581655363904 |Max. : 0.874979764925049 |
#> Epoch: Iteration #100 error is: 12.0521968962333
#> Epoch: Iteration #200 error is: 0.278293775495887
#> Epoch: Iteration #300 error is: 0.277972566238466
#> Epoch: Iteration #400 error is: 0.277972360425316
#> Epoch: Iteration #500 error is: 0.277972360400364
#> Epoch: Iteration #600 error is: 0.277972360400287
#> Epoch: Iteration #700 error is: 0.277972360400287
#> Epoch: Iteration #800 error is: 0.277972360400287
#> Epoch: Iteration #900 error is: 0.277972360400287
#> Epoch: Iteration #1000 error is: 0.277972360400287

Code

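# compare to PCA
dev.new()  # open a new graphics device so the t-SNE plot is not overwritten
# scores on the first two principal components
pca_iris = princomp(iris[,1:4])$scores[,1:2]
plot(pca_iris, type = 'n')  # empty axes; the points are drawn as text labels below
text(pca_iris, labels = iris$Species, col = colors[iris$Species])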
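
When reading the PCA plot, it can help to know how much of the total variance the two plotted components retain. The following is a small sketch using only base R; the names `pca_fit`, `var_explained` and `cum2` are hypothetical and not part of the original example.

```r
# Sketch: share of total variance captured by the first two principal components.
pca_fit <- princomp(iris[, 1:4])
var_explained <- pca_fit$sdev^2 / sum(pca_fit$sdev^2)  # per-component share
cum2 <- sum(var_explained[1:2])                        # share kept by the 2-D plot
round(var_explained, 4)
cum2
```

A value of `cum2` close to 1 means the linear projection loses little variance. t-SNE has no comparable summary, because it preserves local neighborhoods rather than variance, so the two plots are best compared visually.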