26/01/2016
This is now officially the OLD version of SC3 and is not maintained anymore. The newest version (developmental) can be found and installed from this GitHub repository:
https://github.com/hemberg-lab/SC3
The current stable version can be installed directly from Bioconductor (the package name is SC3).
14/01/2016
SC3 manuscript is submitted and also available on bioRxiv: http://biorxiv.org/content/early/2016/01/13/036558
06/01/2016
SC3 is under review at Bioconductor. This GitHub package will not be updated anymore. Once SC3 is released by Bioconductor, I will post a link to it here. All further updates will be via Bioconductor.
25/11/2015
Started writing a manual. Meanwhile, if you have any questions on the usage of SC3 please send your questions to Vladimir Kiselev.
24/11/2015
SC3 (Single-Cell Consensus Clustering) is an interactive tool (implemented as an R package) for the unsupervised clustering of cells from single cell RNA-Seq experiments.
Please keep in mind that this is a developmental version of SC3 and some of the functionality may not work on some systems (so far it has been fully tested on MacOS only). We are in the process of submitting SC3 to Bioconductor. Once that is done, we will provide a link to it here. If you have any problems installing or running SC3, please contact Vladimir Kiselev.
We are also in the process of submitting the manuscript describing SC3 to a journal. Meanwhile, if you need more technical details of the tool, please send a request to Vladimir Kiselev and he will share the technical part of the manuscript with you.
There is also a poster and a presentation available (note that they are already a couple of months old):
Start R and then type:
install.packages("devtools")
devtools::install_github("hemberg-lab/SC3")
library(SC3)
RSelenium::checkForServer()
To test that the package has been installed successfully please run the following command:
library(SC3)
sc3(treutlein, ks = 3:7, cell.filter = TRUE)
It should open SC3 in a browser window without reporting any errors. If there is an error, please send it to Vladimir Kiselev.
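If the test run fails, a quick first check is whether the packages named in the installation steps are actually installed. The following is a minimal sketch in base R; the helper name `check_installed` is hypothetical and not part of SC3.

```r
# Hypothetical helper (not part of SC3): report which packages are installed.
check_installed <- function(pkgs) {
  # requireNamespace() returns TRUE or FALSE and does not attach the package
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

# Packages taken from the installation commands above
status <- check_installed(c("devtools", "RSelenium", "SC3"))
print(status)
```

Any package reported as FALSE would need to be installed again before rerunning the test command.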
There are two built-in datasets that are automatically loaded with SC3:
Dataset | Source | N cells | k clusters |
---|---|---|---|
Treutlein | Distal lung epithelium | 80 | 5 |
Deng | Mouse embryos | 268 | 10 |
One can explore clusterings of these datasets by running the following commands (the ks parameter defines the range of k to be investigated; see the parameter descriptions below):
sc3(treutlein, ks = 3:7)
sc3(deng, ks = 8:12)
To run SC3 please use the following function:
sc3(dataset, ks = k.min:k.max,
cell.filter = FALSE, cell.filter.genes = 2000,
interactivity = TRUE,
svm.num.cells = 1000,
show.original.labels = FALSE,
d.region.min = 0.04,
d.region.max = 0.07,
chisq.quantile = 0.9999)
- dataset is either an R matrix / data.frame / data.table object OR a path to your input file containing an expression matrix.
- ks is a range of the number of clusters that needs to be tested. k.min is the minimum number of clusters (default is 3). k.max is the maximum number of clusters (default is 7).
- (optional) cell.filter is used to filter out cells that express fewer than cell.filter.genes genes (lowly expressed cells). By default it is OFF. To switch it ON, set it to TRUE as in the Test run above. It should be used when the original cells cannot be clustered properly, since filtering out lowly expressed cells usually improves clustering.
- (optional) cell.filter.genes - if cell.filter is used then this parameter defines the minimum number of genes that have to be expressed in each cell (i.e. have more than zero reads). If there are fewer, the cell will be removed from the analysis. The default is 2000.
- (optional) interactivity defines whether an interactive browser window should be opened after all computation is done. By default it is ON. To switch it OFF, set it to FALSE. This option can be used to separate clustering calculations from visualisation, e.g. long and time-consuming clustering of really big datasets can be run on a computing cluster and the visualisation can be done on a personal laptop afterwards. If interactivity is OFF, all clustering results will be saved to the dataset.rds file. To run the interactive visualisation with the precomputed clustering results, please use:
sc3_interactive(readRDS("dataset.rds"))
- (optional) svm.num.cells - if the number of cells in your dataset is greater than this parameter, then an SVM prediction will be used. The default is 1000.
- (optional) show.original.labels - if cell labels in the dataset are not unique, but represent clusters expected from the experiment, they can be visualised by setting show.original.labels to TRUE. The default is FALSE.
- (optional) d.region.min - the lower boundary of the optimum region of d. The default is 0.04.
- (optional) d.region.max - the upper boundary of the optimum region of d. The default is 0.07.
- (optional) chisq.quantile - a threshold used for the detection of cell outliers. The default is 0.9999.
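To make the cell.filter rule described above concrete, here is a small base-R sketch of the same idea on a toy matrix. It is an illustration only and not SC3's internal code; the variable names (`expr`, `min_genes`, `genes_per_cell`, `filtered`) are assumptions made for this example, and `min_genes` is set to 3 only because the toy matrix is tiny.

```r
# Illustration only: a base-R version of the filtering rule described above,
# not SC3's internal implementation.

# Toy expression matrix: 6 genes (rows) x 4 cells (columns)
expr <- matrix(
  c(5, 0, 0, 2,
    3, 0, 1, 0,
    0, 0, 4, 0,
    7, 1, 0, 0,
    2, 0, 6, 0,
    1, 0, 3, 0),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), paste0("cell", 1:4))
)

# Stands in for cell.filter.genes (whose default in SC3 is 2000)
min_genes <- 3

# Number of genes with more than zero reads in each cell
genes_per_cell <- colSums(expr > 0)

# Keep only cells that express at least min_genes genes
filtered <- expr[, genes_per_cell >= min_genes, drop = FALSE]

print(genes_per_cell)
print(colnames(filtered))
```

Counting expressed genes this way before running SC3 can help decide whether the default of 2000 is a sensible cutoff for a given dataset.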
Usage example: if you would like to check clustering of your dataset for ks from 2 to 5, then you need to run the following:
sc3(dataset, ks = 2:5) # without filtering of lowly expressed cells
sc3(dataset, ks = 2:5, cell.filter = TRUE) # with filtering of lowly expressed cells
sc3(dataset, ks = 2:5, interactivity = FALSE) # without interactive visualisation
To run SC3 on an input file containing an expression matrix, one needs to preprocess the input file so that it looks as follows:
| | cell1 | cell2 | cell3 | cell4 | cell5 |
|---|---|---|---|---|---|
| gene1 | 1 | 2 | 3 | 4 | 5 |
| gene2 | 1 | 2 | 3 | 4 | 5 |
| gene3 | 1 | 2 | 3 | 4 | 5 |
The first row of the expression matrix (with cell labels, e.g. cell1, cell2, etc.) should contain one fewer field than all other rows. Separators should be either spaces or tabs. If the separators are commas (,) then the file extension must be .csv. For example, if the path to your input file is "/path/to/input/file/expression-matrix.txt", run SC3 as follows:
sc3("/path/to/input/file/expression-matrix.txt", ks = 2:5)
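One way to produce a file in this layout from an R matrix is base R's `write.table()`, which by default writes a header row with one fewer field than the data rows. The sketch below uses a toy matrix and a temporary file; the variable names and the file location are assumptions made for this example, and how SC3 itself reads the file is not shown here.

```r
# Toy expression matrix: genes in rows, cells in columns
expr <- matrix(
  rep(1:5, each = 3), nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:5))
)

# Hypothetical output location (a temporary file for this example)
path <- tempfile(fileext = ".txt")

# With the default row.names = TRUE and col.names = TRUE, the header line
# contains only the cell labels, i.e. one fewer field than the data rows.
write.table(expr, file = path, sep = "\t", quote = FALSE)

# Count tab-separated fields in the header and in the first data row
first_lines <- strsplit(readLines(path, n = 2), "\t")
field_counts <- lengths(first_lines)
print(field_counts)

# read.table() detects the shorter header and uses the first column as row names
expr_back <- read.table(path, sep = "\t")
print(expr_back)
```

A file written this way could then be passed to sc3() by its path, in the same way as the call above.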
GPL-3