An interactive tool for the unsupervised clustering of cells from single cell RNA-Seq experiments.

wikiselev/SC3-old

NEWS

26/01/2016

This is now officially the OLD version of SC3 and is no longer maintained. The newest (development) version can be found in and installed from this GitHub repository:
https://github.com/hemberg-lab/SC3

The current stable version can be installed directly from Bioconductor (the package name is SC3).

14/01/2016

The SC3 manuscript has been submitted and is also available on bioRxiv: http://biorxiv.org/content/early/2016/01/13/036558

06/01/2016

SC3 is under review at Bioconductor. This GitHub package will not be updated anymore. Once SC3 is released by Bioconductor I will post a link to it here. All further updates will be via Bioconductor.

25/11/2015

Started writing a manual. Meanwhile, if you have any questions about using SC3, please send them to Vladimir Kiselev.

24/11/2015

SC3 (Single-Cell Consensus Clustering) is an interactive tool (implemented as an R package) for the unsupervised clustering of cells from single cell RNA-Seq experiments.

Please keep in mind that this is a development version of SC3 and some of the functionality may not work on some systems (so far it has been fully tested on macOS only). We are in the process of submitting SC3 to Bioconductor. Once that is done we will provide a link to it here. If you have any problems installing or running SC3, please contact Vladimir Kiselev.

We are also in the process of submitting the manuscript describing SC3 to a journal. Meanwhile, if you need more technical details of the tool, please send a request to Vladimir Kiselev and he will share the technical part of the manuscript with you.

There is also a poster and a presentation available (note that they are already a couple of months old):

SC3 poster

SC3 presentation

1. Installation

Start R and then type:

install.packages("devtools")
devtools::install_github("hemberg-lab/SC3")
library(SC3)
RSelenium::checkForServer()

2. Test run

To test that the package has been installed successfully please run the following command:

library(SC3)
sc3(treutlein, ks = 3:7, cell.filter = TRUE)

It should open SC3 in a browser window without any errors. If an error occurs, please send it to Vladimir Kiselev.

3. "Built-in" datasets

There are two built-in datasets that are automatically loaded with SC3:

| Dataset | Source | N cells | k clusters |
|---|---|---|---|
| Treutlein | Distal lung epithelium | 80 | 5 |
| Deng | Mouse embryos | 268 | 10 |

One can explore clusterings of these datasets by running the following commands (the ks parameter defines the range of k to investigate; see the next section):

sc3(treutlein, ks = 3:7)
sc3(deng, ks = 8:12)

4. Running SC3

The SC3 pipeline:

To run SC3 please use the following function:

sc3(dataset, ks = k.min:k.max,
    cell.filter = FALSE, cell.filter.genes = 2000,
    interactivity = TRUE,
    svm.num.cells = 1000,
    show.original.labels = FALSE,
    d.region.min = 0.04,
    d.region.max = 0.07,
    chisq.quantile = 0.9999)
  • dataset is either an R matrix / data.frame / data.table object OR a path to your input file containing an expression matrix.
  • ks is the range of cluster numbers to test. k.min is the minimum number of clusters (default is 3). k.max is the maximum number of clusters (default is 7).
  • (optional) cell.filter is used to filter out cells that express fewer than cell.filter.genes genes (lowly expressed cells). By default it is OFF. To switch it ON please use the TRUE value, as in the Test run above. It should be used when the original cells cannot be clustered properly: filtering out lowly expressed cells usually improves clustering.
  • (optional) cell.filter.genes - if cell.filter is used then this parameter defines the minimum number of genes that have to be expressed in each cell (i.e. have more than zero reads). If there are fewer, the cell will be removed from the analysis. The default is 2000.
  • (optional) interactivity defines whether an interactive browser window should be opened after all computation is done. By default it is ON. To switch it OFF please use the FALSE value. This option can be used to separate clustering calculations from visualisation: for example, time-consuming clustering of very large datasets can be run on a computing cluster, and the visualisation can be done on a personal laptop afterwards. If interactivity is OFF, all clustering results are saved to the dataset.rds file. To run the interactive visualisation with the precomputed clustering results please use sc3_interactive(readRDS("dataset.rds")).
  • (optional) svm.num.cells - if the number of cells in your dataset is greater than this parameter, then an SVM prediction will be used. The default is 1000.
  • (optional) show.original.labels - if cell labels in the dataset are not unique, but represent clusters expected from the experiment, they can be visualised by setting show.original.labels to TRUE. The default is FALSE.
  • (optional) d.region.min - the lower boundary of the optimum region of d. The default is 0.04.
  • (optional) d.region.max - the upper boundary of the optimum region of d. The default is 0.07.
  • (optional) chisq.quantile - a threshold used for cell outlier detection. The default is 0.9999.
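
To make the cell.filter rule described above concrete, here is a minimal base-R sketch of the same idea on a toy matrix. It is only an illustration of the rule as stated in the parameter list (a gene counts as expressed when it has more than zero reads, and cells with fewer expressed genes than the cutoff are dropped); it is not SC3's internal code. The helper name filter_cells, its min.genes argument, and the toy counts are hypothetical.

```r
# Toy count matrix: rows are genes, columns are cells (made-up numbers).
counts <- matrix(
  c(5, 0, 0,
    3, 0, 2,
    0, 0, 7,
    1, 4, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

# Hypothetical helper, not part of SC3: keep the cells that express at least
# `min.genes` genes, where "expressed" means more than zero reads.
filter_cells <- function(x, min.genes = 2000) {
  genes.per.cell <- colSums(x > 0)           # expressed genes in each cell
  x[, genes.per.cell >= min.genes, drop = FALSE]
}

# A cutoff of 2 is used here only because the toy matrix has 4 genes.
filtered <- filter_cells(counts, min.genes = 2)
colnames(filtered)
```

With real data the cutoff would normally stay near the default of 2000, since most single-cell datasets contain thousands of genes.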

Usage example: if you would like to check clustering of your dataset for ks from 2 to 5, then you need to run the following:

sc3(dataset, ks = 2:5)                        # without filtering of lowly expressed cells
sc3(dataset, ks = 2:5, cell.filter = TRUE)    # with filtering of lowly expressed cells
sc3(dataset, ks = 2:5, interactivity = FALSE) # without interactive visualisation

5. Input file format

To run SC3 on an input file containing an expression matrix, one needs to preprocess the input file so that it looks as follows:

cell1 cell2 cell3 cell4 cell5
gene1 1 2 3 4 5
gene2 1 2 3 4 5
gene3 1 2 3 4 5

The first row of the expression matrix (with cell labels, e.g. cell1, cell2, etc.) should contain one fewer field than all other rows. Separators should be either spaces or tabs. If the separators are commas (,) then the file extension must be .csv. If the path to your input file is "/path/to/input/file/expression-matrix.txt", run SC3 on it as follows:

sc3("/path/to/input/file/expression-matrix.txt", ks = 2:5)
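
If the expression matrix already lives in R, one possible way to produce a file in the layout described above is base R's write.table, which by default writes no column name for the row-name column, so the header ends up with one fewer field than the data rows. The sketch below assumes a tab-separated file in a temporary directory; the matrix values and the file location are made up for illustration.

```r
# Toy expression matrix with gene names as row names and cell labels as
# column names (same values as the layout shown above).
expr <- matrix(
  rep(1:5, times = 3), nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:5))
)

# Write it tab-separated and unquoted; the row names go in the first column
# and the header row gets no entry for that column.
path <- file.path(tempdir(), "expression-matrix.txt")
write.table(expr, file = path, sep = "\t", quote = FALSE)

# Check the layout: the header has one fewer field than the data rows.
file.lines <- readLines(path)
header.fields <- strsplit(file.lines[1], "\t")[[1]]
first.row.fields <- strsplit(file.lines[2], "\t")[[1]]

# read.table detects this layout and restores the row and column names.
round.trip <- as.matrix(read.table(path))
```

The resulting path can then be passed to sc3() in the same way as the path in the command above.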

6. License

GPL-3
