dSVD)
In this vignette, we consider approximating a matrix as a product of two low-rank matrices (also known as factor matrices).
Test data is available from toyModel.
library("dcTensor")
X <- dcTensor::toyModel("dSVD")
The data matrix contains five blocks, as the following plot shows.
suppressMessages(library("fields"))
image.plot(X, main="Original Data", legend.mar=8)
Here, we introduce ternary regularization so that the entries of \(U\) take values in {-1,0,1}, as below:
\[
X \approx U V' \ \mathrm{s.t.}\ U \in \{-1,0,1\},
\] where \(X\) (\(N \times M\)) is a data matrix, \(U\) (\(N \times J\)) is a ternary score matrix, and \(V\) (\(M \times J\)) is a loading matrix. In the dcTensor package, the objective function is optimized by combining a gradient-descent algorithm (Tsuyuzaki 2020) with ternary regularization.
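
To make the model concrete, the following is a minimal base-R sketch, not the algorithm used by dcTensor. It builds a toy ternary \(U\) and a real-valued \(V\), forms \(X = U V'\), and then recovers \(V\) by ordinary least squares with \(U\) held fixed. The sizes, the seed, and the names `V_hat` and `X_toy` are assumptions chosen only for illustration.

```r
# Toy illustration of X ~= U V' with a ternary U (not dcTensor's algorithm).
set.seed(1)
N <- 6; M <- 4; J <- 2

# Ternary score matrix U (N x J): every entry is -1, 0, or 1
U <- matrix(c( 1,  1, 0, 0, -1, -1,
               0, -1, 1, 1,  0,  1), nrow = N, ncol = J)

# Real-valued loading matrix V (M x J)
V <- matrix(rnorm(M * J), nrow = M, ncol = J)

# Data matrix that follows the model exactly
X_toy <- U %*% t(V)

# With U fixed, V' is the least-squares solution of U V' = X
V_hat <- t(solve(crossprod(U), crossprod(U, X_toy)))

max(abs(V_hat - V))  # close to zero, up to floating-point error
```

Only \(U\) is constrained here; \(V\) is free to take any real values, which is why the factorization is only partly ternary.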
In STMF, the rank parameter \(J\) (\(\leq \min(N, M)\)) must be set in advance. Other settings, such as the number of iterations (num.iter), are also available. For details of the arguments of dSVD, see ?dSVD. After the calculation, dSVD returns several objects. STMF is achieved by setting the ternary regularization parameter (Ter_U) to a large value, as shown below:
set.seed(123456)
out_STMF <- dSVD(X, Ter_U=1E+10, J=5)
str(out_STMF, 2)
## List of 6
##  $ U            : num [1:100, 1:5] 0.00592 0.00582 0.00626 0.00641 0.00611 ...
##  $ V            : num [1:300, 1:5] 89.8 94.8 93.6 101 87.6 ...
##  $ RecError     : Named num [1:101] 1.00e-09 4.24e+05 3.67e+05 3.63e+05 3.65e+05 ...
##   ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
##  $ TrainRecError: Named num [1:101] 1.00e-09 4.24e+05 3.67e+05 3.63e+05 3.65e+05 ...
##   ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
##  $ TestRecError : Named num [1:101] 1e-09 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 ...
##   ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
##  $ RelChange    : Named num [1:101] 1.00e-09 9.70e-01 1.55e-01 1.25e-02 4.53e-03 ...
##   ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
The reconstruction error (RecError) and the relative change (RelChange, the amount of change from the reconstruction error in the previous step) can be used to diagnose whether the calculation has converged.
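
As a numeric complement to plotting these curves, the following minimal sketch finds the first iteration at which the relative change drops below a tolerance. The helper `first_converged`, the tolerance `1e-3`, and the example values are assumptions for illustration; none of them is part of dcTensor.

```r
# Hypothetical helper (not part of dcTensor): index of the first iteration
# whose relative change is below `tol`, or NA if no iteration qualifies.
first_converged <- function(rel_change, tol = 1e-3) {
  idx <- which(rel_change < tol)
  if (length(idx) == 0) NA_integer_ else idx[[1]]
}

# Made-up relative-change values, for illustration only
rel_change <- c(9.7e-01, 1.5e-01, 1.2e-02, 4.5e-03, 8.0e-04, 2.0e-04)

first_converged(rel_change)              # 5
first_converged(rel_change, tol = 1e-6)  # NA
```

Applied to the result above, one would pass `out_STMF$RelChange[-1]`, dropping the first ("offset") element in the same way as the plotting code.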
layout(t(1:2))
plot(log10(out_STMF$RecError[-1]), type="b", main="Reconstruction Error")
plot(log10(out_STMF$RelChange[-1]), type="b", main="Relative Change")
The product of \(U\) and \(V'\) shows whether dSVD recovers the original data well.
recX <- out_STMF$U %*% t(out_STMF$V)
layout(t(1:2))
image.plot(X, main="Original Data", legend.mar=8)
image.plot(recX, main="Reconstructed Data (STMF)", legend.mar=8)
The histograms of \(U\) and \(V\) show that \(U\) looks ternary but \(V\) does not.
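
The visual impression from a histogram can be backed by a number. The following minimal sketch measures the largest distance between any entry of a matrix and the nearest value in {-1,0,1}. The helper `ternary_gap` and the toy matrices are assumptions for illustration and are not part of dcTensor.

```r
# Hypothetical helper (not part of dcTensor): largest distance from any entry
# of a matrix to the nearest value in {-1, 0, 1}.
ternary_gap <- function(A) {
  nearest <- pmax(pmin(round(A), 1), -1)  # snap each entry to -1, 0, or 1
  max(abs(A - nearest))
}

# Made-up matrices: one nearly ternary, one clearly not
U_toy <- matrix(c(-1, 0.02, 0.98, -0.97, 0, 1), nrow = 3)
V_toy <- matrix(c(89.8, 94.8, 93.6, 101.0, 87.6, 90.2), nrow = 3)

ternary_gap(U_toy)  # about 0.03: entries sit close to -1, 0, or 1
ternary_gap(V_toy)  # 100: entries are far from the ternary values
```

A gap near zero indicates an (almost) ternary matrix, while a large gap indicates unconstrained real values.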
layout(t(1:2))
hist(out_STMF$U, breaks=100)
hist(out_STMF$V, breaks=100)
## R version 4.4.3 (2025-02-28)
## Platform: x86_64-pc-linux-gnu
## Running under: Rocky Linux 9.5 (Blue Onyx)
## 
## Matrix products: default
## BLAS:   /opt/R/4.4.3/lib64/R/lib/libRblas.so 
## LAPACK: /opt/R/4.4.3/lib64/R/lib/libRlapack.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Asia/Tokyo
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] nnTensor_1.3.0    fields_16.3.1     viridisLite_0.4.2 spam_2.11-1      
## [5] dcTensor_1.3.1   
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6       jsonlite_1.8.9     dplyr_1.1.4        compiler_4.4.3    
##  [5] maps_3.4.3         tidyselect_1.2.1   Rcpp_1.1.0         plot3D_1.4.2      
##  [9] tagcloud_0.7.0     jquerylib_0.1.4    scales_1.3.0       yaml_2.3.10       
## [13] fastmap_1.2.0      ggplot2_3.5.1      R6_2.6.1           generics_0.1.3    
## [17] tcltk_4.4.3        knitr_1.50         MASS_7.3-65        dotCall64_1.1-1   
## [21] misc3d_0.9-1       tibble_3.3.0       munsell_0.5.1      pillar_1.10.1     
## [25] bslib_0.9.0        RColorBrewer_1.1-3 rlang_1.1.6        cachem_1.1.0      
## [29] xfun_0.53          sass_0.4.10        cli_3.6.5          magrittr_2.0.3    
## [33] digest_0.6.37      grid_4.4.3         rTensor_1.4.9      lifecycle_1.0.4   
## [37] vctrs_0.6.5        evaluate_1.0.3     glue_1.8.0         colorspace_2.1-1  
## [41] rmarkdown_2.29     pkgconfig_2.0.3    tools_4.4.3        htmltools_0.5.8.1
Tsuyuzaki, K. et al. 2020. “Benchmarking Principal Component Analysis for Large-Scale Single-Cell RNA-Sequencing.” Genome Biology 21(1): 9.