t-SNE
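t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensionality reduction technique that represents a high-dimensional dataset in a low-dimensional space of two or three dimensions so that it can be visualized. Unlike other dimensionality reduction algorithms such as PCA, t-SNE builds a reduced feature space in which similar samples are modeled by nearby points and dissimilar samples are, with high probability, modeled by distant points.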
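The idea of "similar samples modeled by nearby points" can be made concrete with a small sketch in base R. It is a toy illustration, not the `tsne` package's implementation: the helper names `gaussian_affinities` and `student_t_affinities` are hypothetical, and a single fixed bandwidth `sigma` is assumed for every point, whereas the real algorithm tunes one bandwidth per point to match the chosen perplexity.

```r
# Toy illustration of the two similarity kernels used by t-SNE.
# Assumption: one fixed bandwidth `sigma` for all points (the real
# algorithm searches a per-point bandwidth that matches the perplexity).
X <- as.matrix(iris[, 1:4])

# Conditional probabilities p_{j|i} from a Gaussian kernel on the input space.
gaussian_affinities <- function(X, sigma = 1) {
  D2 <- as.matrix(dist(X))^2          # squared Euclidean distances
  P <- exp(-D2 / (2 * sigma^2))
  diag(P) <- 0                        # a point is not its own neighbour
  P / rowSums(P)                      # normalise so each row sums to 1
}

# Joint probabilities q_{ij} from a Student-t kernel (1 degree of freedom)
# on the low-dimensional embedding.
student_t_affinities <- function(Y) {
  D2 <- as.matrix(dist(Y))^2
  Q <- 1 / (1 + D2)
  diag(Q) <- 0
  Q / sum(Q)                          # normalise so the whole matrix sums to 1
}

P_cond <- gaussian_affinities(X)
P <- (P_cond + t(P_cond)) / (2 * nrow(X))  # symmetrised joint probabilities

set.seed(1)
Y <- matrix(rnorm(nrow(X) * 2, sd = 1e-2), ncol = 2)  # random 2-D starting layout
Q <- student_t_affinities(Y)

# Kullback-Leibler divergence KL(P || Q): the cost t-SNE lowers by moving Y.
keep <- P > 0
kl <- sum(P[keep] * log(P[keep] / Q[keep]))
kl
```

A random layout `Y` gives a comparatively large divergence; the optimisation then moves the points so that `Q` resembles `P`, which is what pulls similar samples together.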
t-Distributed Stochastic Neighbor Embedding (t-SNE)
Code
library(tsne)
head(iris)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
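# One colour per species, named by species so lookups are explicit
colors = rainbow(length(unique(iris$Species)))
names(colors) = unique(iris$Species)
# Callback run after each epoch: redraw the current embedding as species labels
ecb = function(x, y) {
  plot(x, type = 'n')
  text(x, labels = iris$Species, col = colors[as.character(iris$Species)])
}
# Embed the four numeric columns; the callback plots intermediate layouts
tsne_iris = tsne(iris[, 1:4], epoch_callback = ecb, perplexity = 50)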
#> sigma summary: Min. : 0.565012665854053 |1st Qu. : 0.681985646004023 |Median : 0.713004330336136 |Mean : 0.716213420895748 |3rd Qu. : 0.74581655363904 |Max. : 0.874979764925049 |
#> Epoch: Iteration #100 error is: 12.0521968962333
#> Epoch: Iteration #200 error is: 0.278293775495887
#> Epoch: Iteration #300 error is: 0.277972566238466
#> Epoch: Iteration #400 error is: 0.277972360425316
#> Epoch: Iteration #500 error is: 0.277972360400364
#> Epoch: Iteration #600 error is: 0.277972360400287
#> Epoch: Iteration #700 error is: 0.277972360400287
#> Epoch: Iteration #800 error is: 0.277972360400287
#> Epoch: Iteration #900 error is: 0.277972360400287
#> Epoch: Iteration #1000 error is: 0.277972360400287
Code
# Compare to PCA: project onto the first two principal components
dev.new()
pca_iris = princomp(iris[, 1:4])$scores[, 1:2]
plot(pca_iris, type = 'n')
text(pca_iris, labels = iris$Species, col = colors[as.character(iris$Species)])
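To put the PCA plot in context, it can help to check how much of the total variance the two plotted components retain. The following is a minimal sketch using only base R; the variable names `pca_fit`, `var_share`, `cum_share`, and `two_pc_share` are introduced here for illustration.

```r
# Share of total variance captured by each principal component of iris.
pca_fit <- princomp(iris[, 1:4])
var_share <- pca_fit$sdev^2 / sum(pca_fit$sdev^2)

# Cumulative share: the second entry is what the 2-D PCA plot retains.
cum_share <- cumsum(var_share)
two_pc_share <- unname(cum_share[2])
round(cum_share, 3)
```

A high cumulative share means the PCA scatter loses little linear structure. t-SNE has no equivalent measure, since it optimises the preservation of local neighbourhoods rather than variance.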