A bunch of R functions related to spectral clustering.
The SpecClustPack package can be installed in R directly from GitHub by using devtools.
```r
library(devtools)
install_github("norbertbin/SpecClustPack")
```
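The simSBM function simulates the adjacency matrix of an undirected stochastic block model from a block probability matrix, blockPMat, and the number of nodes in each block, nMembers. The result is a sparse symmetric matrix.

```r
library(SpecClustPack)

# two blocks: edge probability .6 within a block and .2 between blocks
blockPMat = matrix(c(.6,.2,.2,.6), nrow=2)
# five nodes in each block
nMembers = c(5,5)
adjMat = simSBM(blockPMat, nMembers)
adjMat
## 10 x 10 sparse Matrix of class "dsCMatrix"
##
## [1,] . . 1 1 1 . . . . 1
## [2,] . . 1 1 1 . . 1 . .
## [3,] 1 1 . . . . . . . .
## [4,] 1 1 . . 1 . . . 1 .
## [5,] 1 1 . 1 . . . . . .
## [6,] . . . . . . 1 1 . 1
## [7,] . . . . . 1 . 1 . 1
## [8,] . 1 . . . 1 1 . . 1
## [9,] . . . 1 . . . . . .
## [10,] 1 . . . . 1 1 1 . .
```

```r
plotSBM(blockPMat, nMembers)
plotAdj(adjMat)
```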
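For intuition, the sampling scheme of a stochastic block model can be sketched in base R: nodes i and j are connected independently with the probability that blockPMat assigns to their pair of blocks. This is a minimal sketch, not the package's code; the name simSBMSketch is hypothetical, and unlike simSBM it returns a dense base R matrix rather than a sparse Matrix.

```r
# simSBMSketch is a hypothetical helper, not SpecClustPack's simSBM.
simSBMSketch = function(blockPMat, nMembers) {
  # block label of each node, nodes ordered by block: 1,...,1,2,...,2,...
  z = rep(seq_along(nMembers), times = nMembers)
  n = length(z)
  # edge probability for every pair of nodes
  P = blockPMat[z, z]
  # draw each pair once (upper triangle), then mirror it:
  # the graph is undirected and has no self-loops
  upper = upper.tri(P)
  A = matrix(0, n, n)
  A[upper] = rbinom(sum(upper), size = 1, prob = P[upper])
  A + t(A)
}

set.seed(1)
A = simSBMSketch(matrix(c(.6, .2, .2, .6), nrow = 2), c(5, 5))
```

Drawing only the upper triangle and adding the transpose is what makes the simulated graph symmetric.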
By default, the specClust function uses regularized spectral clustering (Qin and Rohe, 2013) with row normalization; this can be changed through the method and rowNorm parameters.
```r
(clusters = specClust(adjMat, nBlocks = 2))
## [1] 1 1 1 1 1 2 2 2 1 2
```
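The steps of regularized spectral clustering with row normalization can be sketched as follows: add a regularizer tau to every degree, form L_tau = D_tau^{-1/2} A D_tau^{-1/2}, take the eigenvectors of the nBlocks largest eigenvalues, scale each row to unit length, and run k-means on the rows. This is an illustrative sketch under assumptions, not specClust itself: the name regSpecClustSketch is hypothetical, and the default tau (the average degree) is an assumption that may differ from the package's choice.

```r
# Hypothetical sketch of regularized spectral clustering with row
# normalization; not the package's specClust implementation.
regSpecClustSketch = function(A, nBlocks, tau = mean(rowSums(A))) {
  A = as.matrix(A)
  # regularized degrees d_i + tau
  dTau = rowSums(A) + tau
  # L_tau = D_tau^{-1/2} A D_tau^{-1/2}
  L = A / sqrt(outer(dTau, dTau))
  # eigen() sorts eigenvalues in decreasing order, so these columns
  # belong to the nBlocks largest eigenvalues
  X = eigen(L, symmetric = TRUE)$vectors[, 1:nBlocks, drop = FALSE]
  # row normalization (guarding against all-zero rows)
  rowLen = sqrt(rowSums(X^2))
  X = X / ifelse(rowLen > 0, rowLen, 1)
  kmeans(X, centers = nBlocks, nstart = 10)$cluster
}

# two 4-node cliques joined by a single edge
A = matrix(0, 8, 8)
A[1:4, 1:4] = 1
A[5:8, 5:8] = 1
diag(A) = 0
A[4, 5] = A[5, 4] = 1
set.seed(1)
cl = regSpecClustSketch(A, nBlocks = 2)
```

The regularizer keeps low-degree nodes from dominating the leading eigenvectors, which is the motivation Qin and Rohe give for adding tau.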
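The function misClustRate computes the proportion of misclustered nodes, up to a permutation of the cluster labels, given the true block sizes nMembers (so the nodes are assumed to be ordered by block, as in the simulated network above).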
```r
misClustRate(clusters, nMembers)
## [1] 0.1
```
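"Up to a permutation of the labels" means that the cluster labels themselves carry no meaning, so the error is measured under the relabeling that best matches the truth. A brute-force sketch of that idea, assuming the true labels are ordered by block; misClustRateSketch and allPerms are hypothetical names, not the package's implementation, and enumerating all permutations is only practical for a handful of clusters.

```r
# Hypothetical helper: list all permutations of a vector.
allPerms = function(v) {
  if (length(v) <= 1) return(list(v))
  out = list()
  for (i in seq_along(v)) {
    for (p in allPerms(v[-i])) out[[length(out) + 1]] = c(v[i], p)
  }
  out
}

# Hypothetical sketch (not the package's misClustRate): fraction of nodes
# whose label disagrees with the truth under the best relabeling.
misClustRateSketch = function(clusters, nMembers) {
  truth = rep(seq_along(nMembers), times = nMembers)
  K = max(length(nMembers), max(clusters))
  best = 0
  # relabel estimated cluster k as p[k] and count agreements
  for (p in allPerms(seq_len(K))) {
    best = max(best, sum(p[clusters] == truth))
  }
  1 - best / length(truth)
}

rate = misClustRateSketch(c(1, 1, 1, 1, 1, 2, 2, 2, 1, 2), c(5, 5))
```

With the clusters shown above, only node 9 disagrees with its block, which gives the same rate of 0.1.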
The function estSBM estimates the block probability matrix given the adjacency matrix and the cluster assignments.
```r
estSBM(adjMat, clusters)
## [,1] [,2]
## [1,] 1.00000000 0.08333333
## [2,] 0.08333333 0.53333333
```
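A natural estimator of each entry of the block probability matrix is the observed edge density between two clusters: edges divided by the number of possible node pairs. The sketch below assumes an undirected graph without self-loops; estSBMSketch is a hypothetical name, and the package's estSBM may differ in details such as label ordering or the handling of singleton clusters.

```r
# Hypothetical sketch (not the package's estSBM): edge density between
# each pair of clusters.
estSBMSketch = function(A, clusters) {
  A = as.matrix(A)
  K = max(clusters)
  B = matrix(0, K, K)
  for (k in 1:K) {
    for (l in 1:K) {
      ik = which(clusters == k)
      il = which(clusters == l)
      edges = sum(A[ik, il])
      if (k == l) {
        # each within-cluster edge is counted twice and there are
        # nk * (nk - 1) / 2 pairs, so the factors of 2 cancel
        nk = length(ik)
        B[k, l] = if (nk < 2) NA else edges / (nk * (nk - 1))
      } else {
        B[k, l] = edges / (length(ik) * length(il))
      }
    }
  }
  B
}

# two 4-node cliques joined by a single edge
A = matrix(0, 8, 8)
A[1:4, 1:4] = 1
A[5:8, 5:8] = 1
diag(A) = 0
A[4, 5] = A[5, 4] = 1
B = estSBMSketch(A, rep(1:2, each = 4))
```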
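The simBernCovar function simulates a matrix of binary node covariates, where covProbMat gives the probability that each covariate equals 1 for the nodes in each block.

```r
covProbMat = matrix(c(.8,.2,.2,.8), nrow=2)
nMembers = c(5,5)
covMat = simBernCovar(covProbMat, nMembers)
covMat
## [1,] 1 .
## [2,] 1 1
## [3,] 1 .
## [4,] . .
## [5,] 1 1
## [6,] . .
## [7,] . 1
## [8,] . 1
## [9,] 1 1
## [10,] . 1
```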
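A sketch of how such block-dependent Bernoulli covariates can be drawn, assuming that row k of covProbMat holds the probabilities for the nodes of block k and column j corresponds to covariate j (the example matrix above is symmetric, so the orientation cannot be read off it). simBernCovarSketch is a hypothetical name and returns a dense matrix, unlike the sparse output shown above.

```r
# Hypothetical sketch (not the package's simBernCovar): node i in block k
# has covariate j equal to 1 with probability covProbMat[k, j].
simBernCovarSketch = function(covProbMat, nMembers) {
  z = rep(seq_along(nMembers), times = nMembers)
  # one row of probabilities per node, copied from its block's row
  P = covProbMat[z, , drop = FALSE]
  # independent Bernoulli draws; matrix() refills P's column-major order
  matrix(rbinom(length(P), size = 1, prob = P), nrow = nrow(P))
}

set.seed(2)
X = simBernCovarSketch(matrix(c(.8, .2, .2, .8), nrow = 2), c(5, 5))
```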
The casc function performs covariate-assisted spectral clustering. Its required inputs are an adjacency matrix (adjMat), a node covariate matrix (covMat), and the number of blocks to recover (nBlocks). For more details, see the function documentation (?casc).
```r
casc(adjMat, covMat, nBlocks=2)
## $cluster
## [1] 1 1 1 1 1 2 2 2 2 2
##
## $h
## [1] 0.08101691
##
## $wcss
## [1] 0.1789759
##
## $eigenGap
## [1] 0.06532486
```
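The core idea of covariate-assisted spectral clustering can be sketched by adding a weighted covariate similarity to the squared regularized graph Laplacian, L_tau L_tau + h X X^T, and clustering the leading eigenvectors of that sum. This is a minimal sketch under assumptions: cascSketch is a hypothetical name, the weight h is a fixed user-supplied argument (casc reports its own value of h, which this sketch does not try to reproduce), and the row normalization and default tau mirror the regularized spectral clustering described above rather than anything confirmed about casc.

```r
# Hypothetical sketch of covariate-assisted spectral clustering; not casc.
cascSketch = function(A, X, nBlocks, h, tau = mean(rowSums(A))) {
  A = as.matrix(A)
  X = as.matrix(X)
  dTau = rowSums(A) + tau
  # regularized Laplacian L_tau = D_tau^{-1/2} A D_tau^{-1/2}
  L = A / sqrt(outer(dTau, dTau))
  # network similarity plus weighted covariate similarity X X^T
  S = L %*% L + h * tcrossprod(X)
  U = eigen(S, symmetric = TRUE)$vectors[, 1:nBlocks, drop = FALSE]
  rowLen = sqrt(rowSums(U^2))
  U = U / ifelse(rowLen > 0, rowLen, 1)
  kmeans(U, centers = nBlocks, nstart = 10)$cluster
}

# two 4-node cliques joined by one edge, with block-indicator covariates
A = matrix(0, 8, 8)
A[1:4, 1:4] = 1
A[5:8, 5:8] = 1
diag(A) = 0
A[4, 5] = A[5, 4] = 1
X = cbind(rep(c(1, 0), each = 4), rep(c(0, 1), each = 4))
set.seed(1)
cl = cascSketch(A, X, nBlocks = 2, h = 0.1)
```

Larger values of h give the covariates more influence relative to the graph; h = 0 reduces to clustering on L_tau L_tau alone.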
The partSpecClust function computes an eigendecomposition only for the submatrix of the adjacency matrix induced by the highest-degree nodes in the network, and uses the Nyström extension to approximate the full eigenvectors (Belabbas and Wolfe, 2009). The approximate eigenvectors are then used for spectral clustering. The parameter subSampleSize specifies how many of the top-degree nodes to use.
```r
(clusters = partSpecClust(adjMat, nBlocks = 2, subSampleSize = 8))
## [1] 1 1 1 1 1 2 2 2 1 2
```
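The Nyström step can be sketched as follows: with S the sampled high-degree nodes, eigendecompose W = A[S, S] as W U = U Lambda, then approximate the eigenvectors for all nodes by A[, S] U Lambda^{-1}, which reproduces U exactly on the sampled rows. This is a sketch under stated assumptions, not partSpecClust itself: partSpecClustSketch is a hypothetical name, and the row normalization before k-means is an assumption carried over from the specClust default.

```r
# Hypothetical sketch of spectral clustering with a Nystrom approximation;
# not the package's partSpecClust.
partSpecClustSketch = function(A, nBlocks, subSampleSize) {
  A = as.matrix(A)
  # indices of the subSampleSize highest-degree nodes
  S = order(rowSums(A), decreasing = TRUE)[1:subSampleSize]
  eig = eigen(A[S, S], symmetric = TRUE)
  # eigenpairs for the nBlocks largest eigenvalues of the submatrix
  lambda = eig$values[1:nBlocks]
  U = eig$vectors[, 1:nBlocks, drop = FALSE]
  # Nystrom extension: approximate eigenvectors for all n nodes
  Uhat = A[, S, drop = FALSE] %*% U %*% diag(1 / lambda, nrow = nBlocks)
  rowLen = sqrt(rowSums(Uhat^2))
  Uhat = Uhat / ifelse(rowLen > 0, rowLen, 1)
  kmeans(Uhat, centers = nBlocks, nstart = 10)$cluster
}

# a 5-node clique and a separate 4-node clique
A = matrix(0, 9, 9)
A[1:5, 1:5] = 1
A[6:9, 6:9] = 1
diag(A) = 0
set.seed(1)
cl = partSpecClustSketch(A, nBlocks = 2, subSampleSize = 7)
```

Only a subSampleSize x subSampleSize eigenproblem is solved, which is the source of the savings; the approximation degrades when a retained eigenvalue of the submatrix is close to zero, because of the division by lambda.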