building doc tutorials
quentinblampey committed Jan 9, 2024
1 parent 89ec381 commit 1fc358a
Showing 9 changed files with 283 additions and 34 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -8,6 +8,7 @@ explore
sandbox
*.html
.env
tuto.*

# OS related
.DS_Store
8 changes: 4 additions & 4 deletions docs/getting_started.md
@@ -45,12 +45,12 @@ Choose one of the following, depending on your needs (it should take at most a f
```

!!! note "Baysor usage"
Even though `pip install 'sopa[baysor]'` will install some dependencies related to baysor, you still have to install the `baysor` command line (see the [official repository](https://github.com/kharchenkolab/Baysor)) if you want to use it inside the [snakemake pipeline](../pipeline)
Even though `pip install 'sopa[baysor]'` will install some dependencies related to baysor, you still have to install the `baysor` command line (see the [official repository](https://github.com/kharchenkolab/Baysor)) if you want to use it inside the [snakemake pipeline](../tutorials/snakemake)

## Usage

Sopa comes with three different flavors, each corresponding to a different use case:

- `Snakemake pipeline`: choose a config, and run our pipeline on your spatial data in a couple of minutes. See our [snakemake guide](../pipeline).
- `CLI`: use our [command-line-interface](../cli) to prototype quickly your own pipeline
- `API`: use directly `sopa` as a python package for full flexibility and customization (see the API documentation on the sidebar)
- `Snakemake pipeline`: choose a config, and run our pipeline on your spatial data in a couple of minutes. See our [snakemake guide](../tutorials/snakemake).
- `CLI`: use our [command-line-interface](../tutorials/cli_usage) to quickly prototype your own pipeline
- `API`: use `sopa` directly as a Python package for full flexibility and customization (see a tutorial [here](../tutorials/api_usage))
100 changes: 79 additions & 21 deletions docs/tutorials/advanced_segmentation.md
@@ -2,71 +2,119 @@ For staining-based segmentation, the Sopa CLI and pipeline are based on [Cellpos

## Multi-step segmentation

### Using the CLI [WIP]
Multi-step segmentation consists of running Cellpose multiple times over the whole slide with different parameters. For instance, we can first run a nucleus segmentation using DAPI, then another round using DAPI and a membrane staining, and finally a round using DAPI and a cell boundary staining. This can make the segmentation more robust. Note that the results of the multiple steps are combined into one segmentation result.

### 1. Save your data

For this tutorial, we use a generated dataset. The command below will generate it and save it on disk (you can change the path `tuto.zarr` to save it somewhere else). See [here](../../cli/#sopa-read) for details on using your own technology.

```sh
sopa read . --sdata-path tuto.zarr --technology uniform
```

### 2. Run multi-step segmentation

Then, generate the bounding boxes of the patches on which Cellpose will be run. Here, the patches have a width and height of 1500 pixels, and an overlap of 50 pixels. We advise bigger sizes for real datasets (see our default parameters in one of our [config files](https://github.com/gustaveroussy/sopa/tree/master/workflow/config)). On the toy dataset, this will generate **4** patches.

```sh
sopa patchify image tuto.zarr --patch-width-pixel 1500 --patch-overlap-pixel 50
```

```sh
sopa segmentation cellpose tuto.zarr \
--channels DAPI --channels CK \
--patch-index 0 \
--patch-dir tuto.zarr/.sopa_cache/cellpose_CK \
--diameter 40 \
--min-area 1000 --clip-limit 0.01 # these are optional parameters
```
Now, we can run Cellpose on each individual patch, and for each "segmentation step" we want. On this toy example, we run 3 steps (don't forget to execute the three steps), with (i) DAPI + CK, (ii) DAPI + CD3, and (iii) DAPI + CD20.

!!! Advice
Running the commands below manually involves many consecutive commands, so we recommend automating them, for instance with Snakemake or Nextflow. This also helps you parallelize the work, since each task can run in a separate job or thread.

=== "Step 1"

Execute the following command line on all `patch-index` (i.e., `0`, `1`, `2`, and `3`) to run Cellpose using DAPI + CK

```sh
sopa segmentation cellpose tuto.zarr \
--channels DAPI --channels CK \
--patch-dir tuto.zarr/.sopa_cache/cellpose_CK \
--diameter 35 \
--min-area 2000 \
--patch-index 0
```

=== "Step 2"

Execute the following command line on all `patch-index` (i.e., `0`, `1`, `2`, and `3`) to run Cellpose using DAPI + CD3

```sh
sopa segmentation cellpose tuto.zarr \
--channels DAPI --channels CD3 \
--patch-dir tuto.zarr/.sopa_cache/cellpose_CD3 \
--diameter 35 \
--min-area 2000 \
--patch-index 0
```

Same for CD3 and CD20
=== "Step 3"

Execute the following command line on all `patch-index` (i.e., `0`, `1`, `2`, and `3`) to run Cellpose using DAPI + CD20

```sh
sopa segmentation cellpose tuto.zarr \
--channels DAPI --channels CD20 \
--patch-dir tuto.zarr/.sopa_cache/cellpose_CD20 \
--diameter 35 \
--min-area 2000 \
--patch-index 0
```

!!! Note
In the above commands, the `--diameter` and `--min-area` parameters are specific to the data type we work on. For your own data, consider using the default parameters from one of our [config files](https://github.com/gustaveroussy/sopa/tree/master/workflow/config). Here, `min-area` is in pixels^2.

At this stage, you have run Cellpose 12 times (4 times for each of the three steps). Now, we need to resolve the conflicts, i.e., merge the three segmentations into one. Note that we give the paths to the temporary boundaries that we made above.
```sh
sopa resolve cellpose tuto.zarr \
--patch-dir tuto.zarr/.sopa_cache/cellpose_CK \
--patch-dir tuto.zarr/.sopa_cache/cellpose_CD3 \
--patch-dir tuto.zarr/.sopa_cache/cellpose_CD20
```
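
The 12 Cellpose runs and the final resolve step above can also be generated programmatically. The sketch below is one possible way to do it in Python and is not part of Sopa itself: the helper names `cellpose_commands` and `resolve_command` are hypothetical, and the flags are copied from the commands above. It only builds and prints the command lines; nothing is executed.

```python
import shlex


def cellpose_commands(sdata_path, channels, n_patches, diameter=35, min_area=2000):
    """Build one Cellpose command per (segmentation step, patch index) pair."""
    commands = []
    for channel in channels:
        # One cache directory per step, as in the commands above
        patch_dir = f"{sdata_path}/.sopa_cache/cellpose_{channel}"
        for patch_index in range(n_patches):
            commands.append([
                "sopa", "segmentation", "cellpose", sdata_path,
                "--channels", "DAPI", "--channels", channel,
                "--patch-dir", patch_dir,
                "--diameter", str(diameter),
                "--min-area", str(min_area),
                "--patch-index", str(patch_index),
            ])
    return commands


def resolve_command(sdata_path, channels):
    """Build the command that merges the segmentations of all steps."""
    command = ["sopa", "resolve", "cellpose", sdata_path]
    for channel in channels:
        command += ["--patch-dir", f"{sdata_path}/.sopa_cache/cellpose_{channel}"]
    return command


steps = ["CK", "CD3", "CD20"]  # second channel of each step (DAPI is always used)
commands = cellpose_commands("tuto.zarr", steps, n_patches=4)
resolve = resolve_command("tuto.zarr", steps)

# Print shell-ready command lines instead of running them
for command in commands + [resolve]:
    print(shlex.join(command))
```

Each printed line can be pasted into a shell, or each list can be passed to `subprocess.run(command, check=True)` once `sopa` is installed and the patches have been generated.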

### 3. Post-segmentation

Now, we can count the transcripts inside each cell (by providing the gene column of the points dataframe, see `--gene-column genes` below) and average the channel intensities inside each cell (using `--average-intensities`). In the example below, we also filter out cells whose average intensity is lower than `0.25 * Q90`, where Q90 is the 90th percentile (**Warning**: this may remove a lot of cells; we advise not using this parameter when trying Sopa for the first time).
```sh
sopa aggregate tuto.zarr --gene-column genes --average-intensities --min-intensity-ratio 0.25
```
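
To make the `0.25 * Q90` rule concrete, here is a small NumPy illustration on made-up per-cell average intensities. It is only a sketch of the rule as described above, not Sopa's actual implementation; the array values and the variable names are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical average intensity of 11 cells (made-up values: 0.0, 1.0, ..., 10.0)
mean_intensities = np.arange(11, dtype=float)

min_intensity_ratio = 0.25                 # same value as in the command above
q90 = np.quantile(mean_intensities, 0.9)   # 90th percentile over all cells
threshold = min_intensity_ratio * q90

# Cells below the threshold would be filtered out
keep = mean_intensities >= threshold
print(f"Q90={q90}, threshold={threshold}, kept {keep.sum()} of {keep.size} cells")
```

On real data the distribution of intensities is usually far more skewed than in this toy array, which is why the warning above applies.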

Other post-segmentation methods are available [here](../../cli). Among them, one can be used to convert the results for the Xenium Explorer:
```sh
sopa explorer write tuto.zarr --gene-column genes
```

If you have downloaded the Xenium Explorer, you can now open the results in the explorer: `open tuto.explorer/experiment.xenium`
If you have downloaded the Xenium Explorer, you can now open the results in the explorer: `open tuto.explorer/experiment.xenium` (if using a Unix operating system), or double click on the latter file.

You can also use the file `tuto.explorer/adata.h5ad` if you prefer the `AnnData` object instead of the full `SpatialData` object.

This can be automatized in a snakemake pipeline, ...
!!! Note
You can also use the file `tuto.explorer/adata.h5ad` if you prefer the `AnnData` object instead of the full `SpatialData` object.

## Custom staining-based segmentation

You can use your own segmentation model and plug it into Sopa to benefit from all the other functionalities. In particular, it will scale the segmentation, since Sopa runs it on small patches.

For this, you need a python function as described below:
### 1. Define your segmentation function

You need a python function as described below:

- The function input is an image of shape `(C, Y, X)` (`C` is the number of desired channels, it can be one if you want DAPI only)

- The function output is a mask of shape `(Y, X)`. This mask should contain positive values representing the segmented cells, and contain `0` outside of the cells. For instance, if 4 cells are segmented, the mask **should** contain the values 1, 2, 3, 4, and possibly 0 (where there is no cell).
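
As an illustration of this input/output contract, the toy function below thresholds the first channel and labels each connected bright region. It is a minimal sketch, not a recommended segmentation method: the name `threshold_segmentation` and its `threshold` parameter are hypothetical, and the way a function must be wrapped to be used by Sopa should follow the `dummy_method` example of the repository rather than this snippet.

```python
from collections import deque

import numpy as np


def threshold_segmentation(image, threshold=0.5):
    """Toy segmentation: label connected bright regions of the first channel.

    image: array of shape (C, Y, X).
    Returns an integer mask of shape (Y, X): 0 is background, 1..N are the N objects.
    """
    foreground = image[0] > threshold  # boolean array of shape (Y, X)
    mask = np.zeros(foreground.shape, dtype=np.int32)
    n_rows, n_cols = foreground.shape
    current_label = 0

    for y in range(n_rows):
        for x in range(n_cols):
            if not foreground[y, x] or mask[y, x] != 0:
                continue  # background pixel, or pixel already labeled
            current_label += 1
            mask[y, x] = current_label
            queue = deque([(y, x)])
            # Breadth-first search over the 4-connected neighbors
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < n_rows and 0 <= nx < n_cols
                    if inside and foreground[ny, nx] and mask[ny, nx] == 0:
                        mask[ny, nx] = current_label
                        queue.append((ny, nx))
    return mask


# Tiny synthetic image with one channel and two bright squares
image = np.zeros((1, 6, 6))
image[0, 0:2, 0:2] = 1.0
image[0, 4:6, 4:6] = 1.0

mask = threshold_segmentation(image)
print(mask)
```

The returned mask has the same `(Y, X)` shape as one channel of the input, with `0` for background and one positive integer per object, which is the format described in the two bullet points above.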

### Using the API

An example of custom segmentation using the API is detailed [here](../../api/segmentation/stainings/#sopa.segmentation.stainings.StainingSegmentation).
If you want to use our API, you can find a detailed example of custom segmentation [here](../../api/segmentation/stainings/#sopa.segmentation.stainings.StainingSegmentation). Otherwise, if you want to use the CLI, continue below.

### Using the CLI
### 2. Setup

To use the CLI here, you'll need to clone the repository:
To use the CLI, you'll need to clone the repository:
```sh
git clone https://github.com/gustaveroussy/sopa.git
cd sopa
```

Then, add your method inside the `sopa/segmentation/methods.py` file. An example function, called `dummy_method`, is given.
Then, add the function that you defined above to the `sopa/segmentation/methods.py` file. An example function, called `dummy_method`, is given.

Now, install `sopa` to have your new method in the installation:
```sh
@@ -77,6 +125,16 @@ pip install -e .`
pip install -e '.[cellpose,baysor,...]'
```

### 3. Save your data

For this tutorial, we use a generated dataset. The command below will generate it and save it on disk (you can change the path `tuto.zarr` to save it somewhere else). See [here](../../cli/#sopa-read) for details on using your own technology.

```sh
sopa read . --sdata-path tuto.zarr --technology uniform
```

### 4. Run your custom segmentation

Afterwards, simply call the CLI by providing the name of your function as the `<FUNCTION_NAME>` in the following commands:

- `sopa segmentation generic-staining <SDATA_PATH> --method-name <FUNCTION_NAME> ...` (see [here](../../cli/#sopa-segmentation-generic-staining) for CLI details)
9 changes: 9 additions & 0 deletions docs/tutorials/api_usage.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
Coming soon

<!-- ```
from sopa.utils.data import uniform
sdata = uniform()
```
For more details, see the [function documentation](../api/utils/data/#sopa.utils.data.uniform). -->
143 changes: 143 additions & 0 deletions docs/tutorials/cli_usage.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,143 @@
## Save the `SpatialData` object


For this tutorial, we use a generated dataset. The command below will generate it and save it on disk (you can change the path `tuto.zarr` to save it somewhere else). See [here](../../cli/#sopa-read) for details on using your own technology.

```sh
# this generates a 'tuto.zarr' directory
sopa read . --sdata-path tuto.zarr --technology uniform
```

!!! Note
This generates a `.zarr` directory corresponding to a [`SpatialData` object](https://github.com/scverse/spatialdata).

## (Optional) ROI selection

Sometimes, your slide may contain a region with low quality data, and we want to run the analysis only on the good quality region. For this, we can interactively select a region of interest (ROI), and Sopa will only run on the selected ROI.

=== "If working locally"
Run the following command line, and follow the instructions displayed in the console:
```sh
sopa crop --sdata-path tuto.zarr --channels DAPI
```

=== "If working on a machine without interactive mode"
When interactive mode is not available, the ROI selection will be performed in three steps.

1. On the machine where the data is stored, save a light resized view of the original image (here, it will create a file called `image.zarr.zip`):
```sh
sopa crop --sdata-path tuto.zarr --channels DAPI --intermediate-image image.zarr.zip
```

2. Download the `image.zarr.zip` file locally (or on a machine with interactive mode), and select the ROI. Here, it will create a file called `roi.zarr.zip`:
```sh
sopa crop --intermediate-image image.zarr.zip --intermediate-polygon roi.zarr.zip
```

3. Upload the `roi.zarr.zip` file, and save it inside the `SpatialData` object:
```sh
sopa crop --sdata-path tuto.zarr --intermediate-polygon roi.zarr.zip
```

## Run segmentation

### Option 1: Cellpose

Then, generate the bounding boxes of the patches on which Cellpose will be run. Here, the patches have a width and height of 1500 pixels, and an overlap of 50 pixels. We advise bigger sizes for real datasets (see our default parameters in one of our [config files](https://github.com/gustaveroussy/sopa/tree/master/workflow/config)). On the toy dataset, this will generate **4** patches.

```sh
sopa patchify image tuto.zarr --patch-width-pixel 1500 --patch-overlap-pixel 50
```

Now, we can run Cellpose on each individual patch. On this toy example, we run a single segmentation step using DAPI only.

!!! Advice
Running the commands below manually involves many consecutive commands, so we recommend automating them, for instance with Snakemake or Nextflow. This also helps you parallelize the work, since each task can run in a separate job or thread. You can also see how we do it in the [Sopa Snakemake pipeline](https://github.com/gustaveroussy/sopa/blob/master/workflow/Snakefile).

To automatically get the number of patches, you can either open the `tuto.zarr/.sopa_cache/patches_file_image` file, or compute `len(sdata['sopa_patches'])` in Python.

Execute the following command line on all `patch-index` (i.e., `0`, `1`, `2`, and `3`) to run Cellpose using DAPI only

```sh
sopa segmentation cellpose tuto.zarr \
--channels DAPI \
--patch-dir tuto.zarr/.sopa_cache/cellpose \
--diameter 35 \
--min-area 2000 \
--patch-index 0
```

!!! Note
In the above commands, the `--diameter` and `--min-area` parameters are specific to the data type we work on. For your own data, consider using the default parameters from one of our [config files](https://github.com/gustaveroussy/sopa/tree/master/workflow/config). Here, `min-area` is in pixels^2.

At this stage, you have run Cellpose 4 times. Now, we need to resolve the conflicts, i.e., where boundaries overlap due to segmentation on multiple patches.
```sh
sopa resolve cellpose tuto.zarr --patch-dir tuto.zarr/.sopa_cache/cellpose
```
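
Since each patch is independent, the per-patch commands can be run in parallel before the resolve step. The sketch below shows one way to do this with a thread pool from the Python standard library; it is an assumption-laden example rather than the method used by Sopa (the Snakemake pipeline is the reference). The helper names `cellpose_command`, `run_command`, and `run_in_parallel` are hypothetical, and the flags are copied from the command above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def cellpose_command(sdata_path, patch_index):
    """Build the per-patch Cellpose command shown above (DAPI only)."""
    return [
        "sopa", "segmentation", "cellpose", sdata_path,
        "--channels", "DAPI",
        "--patch-dir", f"{sdata_path}/.sopa_cache/cellpose",
        "--diameter", "35",
        "--min-area", "2000",
        "--patch-index", str(patch_index),
    ]


def run_command(command):
    """Run one command and return its exit code (raises if the command fails)."""
    return subprocess.run(command, check=True).returncode


def run_in_parallel(commands, runner=run_command, max_workers=4):
    """Apply `runner` to every command using a pool of threads, keeping the order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, commands))


n_patches = 4  # toy dataset; on your data, use len(sdata['sopa_patches'])
commands = [cellpose_command("tuto.zarr", i) for i in range(n_patches)]

# Uncomment once `sopa` is installed and the patches have been generated:
# run_in_parallel(commands)
```

The `runner` argument is injectable so that the orchestration can be checked without calling `sopa`, for example with `run_in_parallel(commands, runner=print)`. The resolve command must only be run after all patches have finished.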

### Option 2: Baysor

## Aggregation

To turn the data into an `AnnData` object, we can count the transcripts inside each cell, and/or average each channel intensity inside each cell boundary.

Below, we count the transcripts inside each cell (by providing the gene column of the points dataframe, see `--gene-column genes`), and average the channel intensities inside each cell (using `--average-intensities`).
```sh
sopa aggregate tuto.zarr --gene-column genes --average-intensities
```

## Annotation

Currently, we support Tangram for transcript-based annotation, and a simple scoring approach for channel-based annotation (called channel z-score).

=== "Tangram annotation"
...
=== "Channel Z-score annotation"
...


## Pipeline report

You can create an HTML report of the pipeline run (on the example below, we save it under `report.html`). It contains some quality controls about your data.

```sh
sopa report tuto.zarr report.html
```

## Visualization (Xenium Explorer)
The Xenium Explorer is software developed by 10X Genomics for visualizing spatial data, and it can be downloaded freely [here](https://www.10xgenomics.com/support/software/xenium-explorer/latest). Sopa allows conversion to the Xenium Explorer format, whatever the type of spatial data you worked on.

```sh
sopa explorer write tuto.zarr --gene-column genes
```

If you have downloaded the Xenium Explorer, you can now open the results in the explorer: `open tuto.explorer/experiment.xenium` (if using a Unix operating system), or double click on the latter file.

!!! note "Time efficiency"
Creating the image needed by the Xenium Explorer can be time consuming. Therefore, we recommend performing one run for the image generation (below) and another to save the transcripts/boundaries/observations.
```sh
# this can be done directly after saving the raw data in a .zarr directory
sopa explorer write tuto.zarr --mode '+i' --no-save-h5ad
```

After running everything with Sopa, you can finally save all the other Xenium Explorer inputs (e.g. boundaries and cell categories):
```sh
# this should be done after aggregation and an optional annotation step
sopa explorer write tuto.zarr --mode '-i'
```
For more details and customization, refer to the [command line helper](../../cli/#sopa-explorer-write).

## Geometric and spatial statistics

All functions to compute geometric and spatial statistics are detailed in the `sopa.stats` [API](../../api/stats). You can also read [this tutorial](../stats).

## Further analysis

- If you are familiar with the [`spatialdata` library](https://github.com/scverse/spatialdata), you can directly use the `tuto.zarr` directory, corresponding to a `SpatialData` object:
```python
import spatialdata

sdata = spatialdata.read_zarr("tuto.zarr")
```
- You can use [Squidpy](https://squidpy.readthedocs.io/en/latest/index.html), which operates on either the `SpatialData` object or the `AnnData` object, or use other tools of the `scverse` ecosystem such as [`scanpy`](https://scanpy.readthedocs.io/en/stable/index.html).
- You can also use the file `tuto.explorer/adata.h5ad` if you prefer the `AnnData` object instead of the full `SpatialData` object.
33 changes: 32 additions & 1 deletion docs/pipeline.md → docs/tutorials/snakemake.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Snakemake pipeline

If you don't want to dig into the CLI/API, you can directly use our existing [Snakemake](https://snakemake.readthedocs.io/en/stable/) pipeline. It will not involve any coding, but requires some setup for `snakemake`.
Sopa comes with an existing [Snakemake](https://snakemake.readthedocs.io/en/stable/) pipeline to get started quickly. This will not involve any coding, but requires some setup specific to `snakemake`.

## Setup

@@ -61,6 +61,37 @@ cd sopa/workflow # your own personal path to the workflow directory

For more customization, see the [snakemake CLI documentation](https://snakemake.readthedocs.io/en/stable/executing/cli.html).

## Toy example

In the example below, we run the pipeline on a generated toy dataset. Running it locally can help you test a new pipeline or a new config.

Make sure you have set up everything as detailed in this tutorial, and then run the following command lines:

=== "Cellpose usage"
Make sure you have installed sopa with the Cellpose extra
```sh
conda activate sopa # or an environment that has `snakemake`
cd sopa/workflow # your own personal path to the workflow directory

# replace tuto.zarr by the path where you want the data to be saved
snakemake --config data_path=. sdata_path=tuto.zarr --configfile=config/toy/uniform_cellpose.yaml --cores 1 --use-conda
```

=== "Baysor usage"
Make sure you have installed sopa with the Baysor extra, and that you have installed the `baysor` command
```sh
conda activate sopa # or an environment that has `snakemake`
cd sopa/workflow # your own personal path to the workflow directory

# replace tuto.zarr by the path where you want the data to be saved
snakemake --config data_path=. sdata_path=tuto.zarr --configfile=config/toy/uniform_baysor.yaml --cores 1 --use-conda
```

!!! notes
In the above example, snakemake is executed sequentially (one core), which is enough for debugging purposes.

You can then check `tuto.explorer` for output files. Notably, if you have installed the [Xenium Explorer](https://www.10xgenomics.com/support/software/xenium-explorer), double-click on `experiment.xenium` to visualize the results.

## Create your own config

If the existing `config` files are not suited for your project, you can update an existing one, or create a whole new one. For this, use [this commented config](https://github.com/gustaveroussy/sopa/blob/master/workflow/config/example_commented.yaml) to understand the purpose of each argument. Note that some sections are optional: in this case, just remove the section or the argument, and sopa will not run it.
1 change: 1 addition & 0 deletions docs/tutorials/stats.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Coming soon