First, we analyze a simulated spatial transcriptomics data from ST
platform, which is a Seurat object format. It is noted that the
meta.data
must include spatial coordinates in columns named
“row” (x coordinates) and “col” (y coordinates)! This data is available
here.
This preprocessing includes Log-normalization and feature selection. Here we select highly variable genes for example first. The selected genes’ names are saved in “seu@assays$RNA@var.features”.
For function DR.SC
, users can specify the number of
clusters \(K\) or set K
to
be an integer vector by using modified BIC(MBIC) to determine \(K\). First, we try using user-specified
number of clusters. Then we show the version chosen by MBIC.
After finishing model fitting, we use ajusted rand index
(ARI
) to check the performance of clustering
Next, we show the application of DR-SC in visualization. First, we can visualize the clusters from DR-SC on the spatial coordinates.
We can also visualize the clusters from DR-SC on the two-dimensional tSNE based on the extracted features from DR-SC.
Show the UMAP plot based on the extracted features from DR-SC.
Use MBIC to choose number of clusters:
First, we select the spatilly variable genes using funciton
FindSVGs
.
### Given K
seu <- NormalizeData(seu, verbose=F)
# choose 400 spatially variable features using FindSVGs
seus <- FindSVGs(seu, nfeatures = 400, verbose = F)
seu2 <- DR.SC(seus, q=15, K=4, platform = 'ST', verbose=F)
Using ARI to check the performance of clustering
Show the spatial scatter plot for clusters
Show the tSNE plot based on the extracted features from DR-SC.
Show the UMAP plot based on the extracted features from DR-SC.
Use MBIC to choose number of clusters:
Conduct visualization of marker gene expression. ### Ridge plots Visualize single cell expression distributions in each cluster from Seruat.
dat <- FindAllMarkers(seu2)
suppressPackageStartupMessages(library(dplyr) )
# Find the top 1 marker genes, user can change n to access more marker genes
dat %>%group_by(cluster) %>%
top_n(n = 1, wt = avg_log2FC) -> top
genes <- top$gene
RidgePlot(seu2, features = genes, ncol = 2)
Visualize single cell expression distributions in each cluster
We extract tSNE based on the features from DR-SC and then visualize feature expression in the low-dimensional space
The size of the dot corresponds to the percentage of cells expressing the feature in each cluster. The color represents the average expression level
Single cell heatmap of feature expression
# standard scaling (no regression)
dat %>%group_by(cluster) %>%
top_n(n = 30, wt = avg_log2FC) -> top
### select the marker genes that are also the variable genes.
genes <- intersect(top$gene, VariableFeatures(seu2))
## Change the HVGs to SVGs
# <- topSVGs(seu2, 400)
seu2 <- ScaleData(seu2, verbose = F)
DoHeatmap(subset(seu2, downsample = 500),features = genes, size = 5)