# UMAP

[Uniform Manifold Approximation and Projection (UMAP)](https://umap-learn.readthedocs.io/en/latest/){.uri}

<https://github.com/jlmelville/uwot>

<https://github.com/tkonopka/umap>

UMAP (Uniform Manifold Approximation and Projection for Dimension Reduction) was proposed by McInnes et al. in 2018. It is a dimension reduction technique that, like t-SNE, can be used for visualization, but it can also be used for general non-linear dimension reduction. The algorithm rests on three assumptions about the data:

- The data is uniformly distributed on a Riemannian manifold;
- The Riemannian metric is locally constant (or can be approximated as such);
- The manifold is locally connected.

<https://jlmelville.github.io/uwot/index.html>

```{r}
library(uwot)
library(ggplot2)
library(tibble)

head(iris)

# One color per species, named so it can be looked up by species
colors <- rainbow(length(unique(iris$Species)))
names(colors) <- unique(iris$Species)

# umap2 is a version of the umap() function with better defaults
iris_umap2 <- umap2(iris[1:4])
colnames(iris_umap2) <- c("V1", "V2") # name the columns so as_tibble() does not warn
iris_umap2 <- as_tibble(iris_umap2)

ggplot(iris_umap2, aes(V1, V2)) +
  geom_text(aes(label = iris$Species), color = colors[as.character(iris$Species)])
```

```{r}
# but you can still use the umap function (which most of the existing
# documentation does)
iris_umap <- umap(iris[1:4])
colnames(iris_umap) <- c("V1", "V2")
iris_umap <- as_tibble(iris_umap)

ggplot(iris_umap, aes(V1, V2)) +
  geom_text(aes(label = iris$Species), color = colors[as.character(iris$Species)])
```

```{r}
library(uwot)
library(ggplot2)

set.seed(42) # for reproducible results
uwot_result <- umap(iris[1:4])

# Convert the result to a data frame
uwot_df <- as.data.frame(uwot_result)
colnames(uwot_df) <- c("UMAP1", "UMAP2")
uwot_df$Species <- iris$Species

# Visualize
ggplot(uwot_df, aes(x = UMAP1, y = UMAP2, color = Species)) +
  geom_point(size = 2) +
  labs(
    title = "UMAP of Iris Dataset (uwot)",
    x = "UMAP Dimension 1",
    y = "UMAP Dimension 2"
  ) +
  theme_minimal()
```
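
Since UMAP can serve as a general non-linear dimension reduction step and not only as a plotting aid, the following is a minimal sketch of that use with `uwot`: fit a model on part of the data, then project unseen rows into the same embedding. The train/test split is hypothetical, and the values chosen for `n_neighbors`, `min_dist`, and `n_components` are illustrative assumptions, not recommendations from the original text.

```{r}
library(uwot)

set.seed(42) # for reproducible results

# Hypothetical split: hold out 30 rows to play the role of "new" data
test_idx <- sample(nrow(iris), 30)
train <- iris[-test_idx, 1:4]
test <- iris[test_idx, 1:4]

# Fit on the training rows. ret_model = TRUE returns a list that keeps
# what is needed to embed new points later.
# The parameter values below are illustrative assumptions.
model <- umap(
  train,
  n_neighbors = 15,   # size of the local neighborhood
  min_dist = 0.1,     # how tightly points may be packed in the embedding
  n_components = 3,   # more than 2 dimensions, i.e. not just for plotting
  ret_model = TRUE
)

# Project the held-out rows into the existing embedding
test_embedding <- umap_transform(test, model)

dim(model$embedding)
dim(test_embedding)
```

The coordinates in `model$embedding` and `test_embedding` can then be passed to a downstream step such as clustering or a classifier. Whether three components are enough depends on the data, so treat that value as a starting point.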