
Sub-cellular spatial transcriptomics Cell Segmentation


chenhcs/SCS


SCS

License: MIT

SCS (Sub-cellular spatial transcriptomics Cell Segmentation) is a method that combines sequencing and staining data to accurately identify cell boundaries in high-resolution spatial transcriptomics data.

System requirements

Operating system

The software has been tested on CentOS Linux 7.

Software requirements

  • Python 3.9
  • anndata
  • matplotlib
  • numpy
  • pandas
  • scanpy
  • scikit-learn
  • scipy
  • tensorflow
  • tensorflow_addons
  • imagecodecs
  • scikit-misc
  • spateo

Installation

We recommend creating a virtual environment with Conda. After installing Anaconda or Miniconda, create the environment from the provided environment.yml file, then install the spateo package manually:

conda env create -f environment.yml
conda activate SCS
pip install spateo-release
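To confirm that the environment is complete before running the examples, a quick check like the one below can help. It is a minimal sketch, not part of SCS: it looks up the import name of each package in the requirements list without importing it, assuming the usual import names (for example sklearn for scikit-learn, skmisc for scikit-misc, and spateo for spateo-release). The names REQUIRED_MODULES and missing_packages are hypothetical.

```python
import importlib.util

# Package name (as listed under "Software requirements") -> import name.
# The import names are assumptions based on each package's usual convention.
REQUIRED_MODULES = {
    "anndata": "anndata",
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "pandas": "pandas",
    "scanpy": "scanpy",
    "scikit-learn": "sklearn",
    "scipy": "scipy",
    "tensorflow": "tensorflow",
    "tensorflow_addons": "tensorflow_addons",
    "imagecodecs": "imagecodecs",
    "scikit-misc": "skmisc",
    "spateo": "spateo",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the package names whose import module cannot be found."""
    return [pkg for pkg, mod in modules.items()
            if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("All requirements found." if not missing
          else "Missing: " + ", ".join(missing))
```

Run it inside the activated SCS environment; an empty result only means the modules can be located, not that GPU support in TensorFlow is working.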

Usage

Example

This section shows how to use SCS to perform cell segmentation on high-resolution spatial transcriptomics data.

The example uses one adult mouse brain section generated with the Stereo-seq platform. To run it, download the Mouse_brain_Adult_GEM_bin1.tsv.gz file from the MOSTA data portal, save it to the data folder under the project directory, and decompress it with the following command:

gunzip data/Mouse_brain_Adult_GEM_bin1.tsv.gz

The file of detected RNAs must be tab-delimited, with the following columns:

geneID  row  column  counts
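As a sketch of how a file in this format can be loaded for inspection, the code below reads it with pandas and totals the counts per spot. It assumes the file has a header line and that the first four columns appear in the order above (they are renamed positionally, so the exact header names do not matter). The helpers load_spots and spot_totals are hypothetical and not part of SCS.

```python
import pandas as pd

COLUMNS = ["geneID", "row", "column", "counts"]


def load_spots(path, header=True):
    """Read a tab-delimited spot file into a DataFrame with the documented columns.

    Assumes the first four columns are geneID, row, column, counts, in that order.
    """
    df = pd.read_csv(path, sep="\t", header=0 if header else None)
    df = df.iloc[:, :4]
    df.columns = COLUMNS
    return df.astype({"row": int, "column": int, "counts": int})


def spot_totals(df):
    """Total RNA counts per spot, indexed by (row, column)."""
    return df.groupby(["row", "column"])["counts"].sum()
```

For example, spots = load_spots('data/Mouse_brain_Adult_GEM_bin1.tsv') loads the whole section; note that this file is large, so it needs a correspondingly large amount of memory.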

The corresponding staining image data is already in the data folder. To extract one patch from the whole section as an example, run the following script from the project home directory:

python patch_cut.py
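patch_cut.py handles this step for the example. For other regions, the spot-table part of the idea can be sketched as a square window filter. The function below is hypothetical, assumes a pandas DataFrame with row and column columns as in the file format above, leaves coordinates unshifted, and handles only the spot table, not the staining image.

```python
def cut_patch(spots, row_start, col_start, patch_size):
    """Keep spots inside a patch_size x patch_size window starting at (row_start, col_start).

    Coordinates are kept as-is (not shifted to the patch origin).
    """
    inside = (
        spots["row"].between(row_start, row_start + patch_size - 1)
        & spots["column"].between(col_start, col_start + patch_size - 1)
    )
    return spots.loc[inside].reset_index(drop=True)
```

For instance, cut_patch(spots, 5700, 5700, 1200) selects the same 1200 x 1200 window used in the evaluation command further below, and patch.to_csv('data/my_patch.tsv', sep='\t', index=False) (a hypothetical path) saves it in the tab-delimited format.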

Then run SCS on the example patch with the following Python code, or run the example.py script:

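from src import scs

# Patch files produced by patch_cut.py
bin_file = 'data/Mouse_brain_Adult_GEM_bin1_sub.tsv'
image_file = 'data/Mouse_brain_Adult_sub.tif'
# Run the full pipeline; align='rigid' sets the alignment mode (see help(scs.segment_cells))
scs.segment_cells(bin_file, image_file, align='rigid')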

Run help(scs.segment_cells) in Python for details on all parameters.

The segment_cells function segments the provided patch in three steps: (i) preprocessing, i.e., identifying nuclei and preparing data for the transformer, (ii) training the transformer and running inference on all spots in the patch, and (iii) postprocessing, i.e., gradient flow tracking. On the demo patch, preprocessing takes about 10 minutes, transformer training takes roughly 1 hour on an NVIDIA GeForce 10 series graphics card, and postprocessing takes about 5 minutes.
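To give a feel for the gradient flow tracking step, here is a rough illustration of the general technique, not SCS's implementation. It assumes each spot carries a predicted 2-D vector pointing toward the nucleus of its cell, and assigns each foreground spot to the first nucleus reached by repeatedly stepping along those vectors on the spot grid. The function name, array layouts, and the max_steps limit are all assumptions.

```python
import numpy as np


def track_flow(directions, nucleus_labels, foreground, max_steps=50):
    """Assign foreground spots to nuclei by following a per-spot direction field.

    directions:     (H, W, 2) float array of (d_row, d_col) vectors, assumed to
                    point toward the nucleus of the cell each spot belongs to
    nucleus_labels: (H, W) int array, nucleus id (> 0) on nucleus spots, 0 elsewhere
    foreground:     (H, W) bool array, True for spots predicted to lie in a cell
    Returns an (H, W) int array of cell ids, 0 for unassigned spots.
    """
    h, w = nucleus_labels.shape
    cells = nucleus_labels.copy()  # nucleus spots keep their own id
    for r in range(h):
        for c in range(w):
            if not foreground[r, c] or nucleus_labels[r, c] > 0:
                continue
            y, x = r, c
            # max_steps also ends paths that loop without reaching a nucleus.
            for _ in range(max_steps):
                d_row, d_col = directions[y, x]
                # Take one grid step along the rounded direction, staying in bounds.
                ny = int(np.clip(y + np.rint(d_row), 0, h - 1))
                nx = int(np.clip(x + np.rint(d_col), 0, w - 1))
                if (ny, nx) == (y, x):
                    break  # zero or very short vector: the path is stuck
                y, x = ny, nx
                if nucleus_labels[y, x] > 0:
                    cells[r, c] = nucleus_labels[y, x]
                    break
    return cells
```

In this sketch a path may pass through background spots on its way to a nucleus; stopping as soon as a path leaves the foreground is an equally plausible design choice.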

Processing large-scale data

SCS can process large-scale spatial data by splitting the section into patches and processing them one at a time. This makes prediction on very large datasets feasible on ordinary computers.

The following example runs SCS on the whole Stereo-seq mouse brain section. Before running it, download the transcriptomics data Mouse_brain_Adult_GEM_bin1.tsv.gz, save it to the data folder under the project directory, and decompress it. Also download the corresponding image, Mouse_brain_Adult.tif, to the same data folder.

Next, run the following code, or the large_scale.py script, from the project home directory to run SCS on the whole mouse brain section. SCS will split the section into patches of 1200 x 1200 spots (set by patch_size) and make predictions patch by patch.

from src import scs

bin_file = 'data/Mouse_brain_Adult_GEM_bin1.tsv'
image_file = 'data/Mouse_brain_Adult.tif'
scs.segment_cells(bin_file, image_file, align='rigid', patch_size=1200)

The patch_size parameter sets the side length, in spots, of each square patch.
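To see how patch_size divides a section, the sketch below lists the top-left corner of each non-overlapping square patch covering a bounding box of spot coordinates. Whether SCS uses exactly this grid (its origin, or any overlap between neighboring patches) is an assumption, and the function name is hypothetical.

```python
def patch_origins(row_min, row_max, col_min, col_max, patch_size):
    """Top-left (row, column) of each non-overlapping square patch.

    The patches tile the inclusive bounding box [row_min, row_max] x
    [col_min, col_max]; patches along the bottom and right edges may extend
    past the data and then simply contain fewer spots.
    """
    return [
        (r, c)
        for r in range(row_min, row_max + 1, patch_size)
        for c in range(col_min, col_max + 1, patch_size)
    ]
```

For example, patch_origins(0, 2499, 0, 1299, 1200) yields six origins: three rows of patches by two columns. A larger patch_size means fewer patches but more memory per patch.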

We also advise users to save patches to separate files, as in the "Example" section, and to run SCS on the patches in parallel on different CPUs/GPUs.
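One way to do this on a single multi-GPU machine is sketched below. It launches one Python process per saved patch and pins each process to one GPU through the CUDA_VISIBLE_DEVICES environment variable, so no two concurrent runs share a device. It assumes the patches are saved as (bin_file, image_file) pairs, that the script is run from the project home directory so that from src import scs resolves, and that align='rigid' is appropriate, as in the examples; scs_command and run_patches are hypothetical helpers. Check that concurrent runs do not overwrite each other's files in the results directory.

```python
import os
import queue
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def scs_command(bin_file, image_file):
    """Command that runs SCS on one patch in a fresh Python process."""
    code = (
        "from src import scs; "
        f"scs.segment_cells({bin_file!r}, {image_file!r}, align='rigid')"
    )
    return [sys.executable, "-c", code]


def run_patches(patches, gpu_ids, make_command=scs_command):
    """Run one process per (bin_file, image_file) patch, one job per GPU at a time.

    Returns the exit codes in the order of `patches`.
    """
    free_gpus = queue.Queue()
    for gpu in gpu_ids:
        free_gpus.put(gpu)

    def run_one(patch):
        gpu = free_gpus.get()  # blocks until some GPU is free
        try:
            env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
            return subprocess.run(make_command(*patch), env=env).returncode
        finally:
            free_gpus.put(gpu)  # hand the GPU to the next waiting job

    # Threads only wait on subprocesses, so the GIL is not a bottleneck here.
    with ThreadPoolExecutor(max_workers=len(gpu_ids)) as pool:
        return list(pool.map(run_one, patches))
```

For CPU-only runs, an empty CUDA_VISIBLE_DEVICES value hides all GPUs, so gpu_ids=["", ""] would run two patches at a time on the CPU.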

Reproducing cell segmentations for the Stereo-seq and Seq-Scope datasets

The cell segmentations for the whole Stereo-seq section can be generated by following the instructions in the "Processing large-scale data" section.

Follow the instructions below to generate cell segmentations for the Seq-Scope mouse liver dataset. Download the Seq-Scope transcriptomics data from GEO, save the three files from that GEO entry to the data folder, and decompress the tsv.gz files. Download the file of sequencing-spot coordinates from Deep Blue Data and save it to the data folder as well. Then run the following script to convert the data format, or use the processed .tsv files already saved in the data folder directly:

python format.py
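format.py performs the actual conversion. To illustrate what converting to the geneID/row/column/counts format involves, the sketch below assumes the expression data has already been read into sparse (gene, barcode, count) entries, as in a Matrix Market matrix with separate gene and barcode lists, and that the coordinate file maps each barcode to a row and column. These input layouts and the function names are assumptions, not a description of format.py or of the GEO files.

```python
def to_spot_table(triplets, genes, barcode_xy):
    """Convert sparse (gene, barcode, count) entries into geneID/row/column/counts rows.

    triplets:   DataFrame with 0-based gene_idx, barcode_idx and counts columns
    genes:      sequence of gene IDs, indexed by gene_idx
    barcode_xy: DataFrame with barcode_idx, row and column columns
    Barcodes without coordinates are dropped; counts of the same gene at the
    same spot are summed.
    """
    df = triplets.merge(barcode_xy, on="barcode_idx", how="inner")
    df["geneID"] = [genes[i] for i in df["gene_idx"]]
    out = df.groupby(["geneID", "row", "column"], as_index=False)["counts"].sum()
    return out[["geneID", "row", "column", "counts"]]


def write_spot_table(table, path):
    """Write the table as a tab-delimited file with a header line."""
    table.to_csv(path, sep="\t", index=False)
```

Matrix Market indices are 1-based, so they would need to be shifted by one before calling to_spot_table.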

The paired H&E images can be found at Deep Blue Data; the processed images corresponding to tiles 2104-2107 have already been saved to the data folder. Run the following script to make predictions for these four tiles (2104-2107) of the Seq-Scope data:

python seqscope.py

Output

Results are saved to the results directory.

The output file cell_masks.png visualizes cell boundaries in the sequencing section.

The output file spot2cell.txt maps spot coordinates to cell indices.

Each line has the following format:

row:column  cell_id

where row:column gives the row and column of a spot, counted from the upper-left corner, and cell_id is the index of the cell to which the spot belongs.
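For downstream analysis, spot2cell.txt can be read back into Python. The sketch below assumes the two fields on each line are separated by whitespace (tab or spaces) and keeps cell IDs as strings; read_spot2cell and cell_sizes are hypothetical helpers.

```python
from collections import Counter


def read_spot2cell(lines):
    """Parse 'row:column<whitespace>cell_id' lines into {(row, column): cell_id}."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        coord, cell_id = parts[0], parts[1]
        row, col = coord.split(":")
        mapping[(int(row), int(col))] = cell_id
    return mapping


def cell_sizes(mapping):
    """Number of spots assigned to each cell."""
    return Counter(mapping.values())
```

Usage: with open('results/spot2cell.txt') as f: mapping = read_spot2cell(f), after which len(cell_sizes(mapping)) gives the number of cells with at least one spot.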

A statistical summary of the segmented cells, cell_stats.txt, including the number of cells identified and cell size statistics, is also saved to the results directory.

Evaluation

Run the following script for an example comparing SCS segmentation with watershed segmentation:

python evaluation.py data/Mouse_brain_Adult_GEM_bin1.tsv 5700 5700 1200 results/spot2nucl_5700:5700:1200:1200.txt results/spot2cell_SCS_5700:5700:1200:1200.txt results/spot2cell_watershed_5700:5700:1200:1200.txt

The script takes seven arguments: (i) the gene counts of the spots, (ii) the starting row index of the patch, (iii) the starting column index of the patch, (iv) the patch size, (v) the nucleus segmentation, i.e., the mapping from spots to nuclei, (vi) the cell segmentation from method 1, i.e., the mapping from spots to cells, and (vii) the cell segmentation from method 2, in the same format. In the example above, method 1 is SCS and method 2 is watershed.

The script prints Pearson correlation statistics and saves a boxplot summarizing the correlations to the results folder.
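The exact statistic is defined in evaluation.py. As a hedged illustration of one such comparison, the sketch below computes, for each cell, the Pearson correlation between the summed expression of its nuclear spots and that of its remaining (cytoplasmic) spots, using the gene counts, the nucleus segmentation, and one cell segmentation. Treating this as the script's metric is an assumption; the function name and input layouts (a pandas DataFrame of spots and dicts keyed by (row, column)) are hypothetical.

```python
import numpy as np


def nucleus_cytoplasm_correlations(spots, spot2nucl, spot2cell):
    """Per-cell Pearson r between nuclear and cytoplasmic expression profiles.

    spots:     DataFrame with geneID, row, column, counts columns
    spot2nucl: dict (row, column) -> nucleus id; only membership is used here
    spot2cell: dict (row, column) -> cell id
    """
    keys = list(zip(spots["row"], spots["column"]))
    in_cell = [k in spot2cell for k in keys]
    kept = [k for k, keep in zip(keys, in_cell) if keep]
    df = spots.loc[in_cell].copy()
    df["cell"] = [spot2cell[k] for k in kept]
    df["compartment"] = ["nucleus" if k in spot2nucl else "cytoplasm" for k in kept]
    # One summed gene profile per (cell, compartment) pair.
    profiles = df.pivot_table(index=["cell", "compartment"], columns="geneID",
                              values="counts", aggfunc="sum", fill_value=0)
    result = {}
    for cell in profiles.index.get_level_values("cell").unique():
        if ((cell, "nucleus") not in profiles.index
                or (cell, "cytoplasm") not in profiles.index):
            continue  # both compartments are needed for a correlation
        nuc = profiles.loc[(cell, "nucleus")].to_numpy(dtype=float)
        cyt = profiles.loc[(cell, "cytoplasm")].to_numpy(dtype=float)
        if nuc.std() == 0 or cyt.std() == 0:
            continue  # Pearson r is undefined for a constant profile
        result[cell] = float(np.corrcoef(nuc, cyt)[0, 1])
    return result
```

Running this once per segmentation method gives two lists of per-cell correlations that can be compared side by side, for example in a boxplot.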

Credits

The software is an implementation of the method SCS, jointly developed by Hao Chen, Dongshunyi Li, and Ziv Bar-Joseph from the System Biology Group @ Carnegie Mellon University.

Contact

Contact us if you have any questions:
Hao Chen: hchen4 at andrew.cmu.edu
Ziv Bar-Joseph: zivbj at andrew.cmu.edu

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you find SCS useful for your research, please cite the following paper:

Chen, H., Li, D. & Bar-Joseph, Z.
SCS: cell segmentation for high-resolution spatial transcriptomics.
Nat Methods (2023). https://doi.org/10.1038/s41592-023-01939-3
