building doc tutorials

loic-lb · Jan 9, 2024 · 1fc358a · 1fc358a
1 parent 89ec381
commit 1fc358a
Showing 9 changed files with 283 additions and 34 deletions.
diff --git a/.gitignore b/.gitignore
@@ -8,6 +8,7 @@ explore
 sandbox
 *.html
 .env
+tuto.*
 
 # OS related
 .DS_Store

diff --git a/docs/getting_started.md b/docs/getting_started.md
@@ -45,12 +45,12 @@ Choose one of the following, depending on your needs (it should take at most a f
     ```
 
 !!! note "Baysor usage"
-    Even though `pip install 'sopa[baysor]'` will install some dependencies related to baysor, you still have to install the `baysor` command line (see the [official repository](https://github.com/kharchenkolab/Baysor)) if you want to use it inside the [snakemake pipeline](../pipeline)
+    Even though `pip install 'sopa[baysor]'` will install some dependencies related to baysor, you still have to install the `baysor` command line (see the [official repository](https://github.com/kharchenkolab/Baysor)) if you want to use it inside the [snakemake pipeline](../tutorials/snakemake)
 
 ## Usage
 
 Sopa comes with three different flavors, each corresponding to a different use case:
 
-- `Snakemake pipeline`: choose a config, and run our pipeline on your spatial data in a couple of minutes. See our [snakemake guide](../pipeline).
-- `CLI`: use our [command-line-interface](../cli) to prototype quickly your own pipeline
-- `API`: use directly `sopa` as a python package for full flexibility and customization (see the API documentation on the sidebar)
+- `Snakemake pipeline`: choose a config, and run our pipeline on your spatial data in a couple of minutes. See our [snakemake guide](../tutorials/snakemake).
+- `CLI`: use our [command-line-interface](../tutorials/cli_usage) to quickly prototype your own pipeline
+- `API`: use `sopa` directly as a python package for full flexibility and customization (see a tutorial [here](../tutorials/api_usage))
diff --git a/docs/tutorials/advanced_segmentation.md b/docs/tutorials/advanced_segmentation.md
@@ -2,71 +2,119 @@ For staining-based segmentation, the Sopa CLI and pipeline are based on [Cellpos
 
 ## Multi-step segmentation
 
-### Using the CLI [WIP]
+Multi-step segmentation consists of running Cellpose multiple times over the whole slide with different parameters. For instance, we can first run a nucleus segmentation using DAPI, then another round using DAPI and a membrane staining, and finally a round using DAPI and a cell boundary staining. This can make the segmentation more robust. Note that the results of the multiple steps are combined into one segmentation result.
+
+### 1. Save your data
+
+For this tutorial, we use a generated dataset. The command below will generate it and save it on disk (you can change the path `tuto.zarr` to save it somewhere else). See [here](../../cli/#sopa-read) for details on using your own technology.
 
 ```sh
 sopa read . --sdata-path tuto.zarr --technology uniform
 ```
 
+### 2. Run multi-step segmentation
+
+Then, generate the bounding boxes of the patches on which Cellpose will be run. Here, the patches have a width and height of 1500 pixels, and an overlap of 50 pixels. We advise bigger sizes for real datasets (see our default parameters in one of our [config files](https://github.com/gustaveroussy/sopa/tree/master/workflow/config)). On the toy dataset, this will generate **4** patches.
+
 ```sh
 sopa patchify image tuto.zarr --patch-width-pixel 1500 --patch-overlap-pixel 50
 ```
 
-```sh
-sopa segmentation cellpose tuto.zarr \
-    --channels DAPI --channels CK \
-    --patch-index 0 \
-    --patch-dir tuto.zarr/.sopa_cache/cellpose_CK \
-    --diameter 40 \
-    --min-area 1000 --clip-limit 0.01 # these are optional parameters
-```
+Now, we can run Cellpose on each individual patch, and for each "segmentation step" we want. On this toy example, we run 3 steps (don't forget to execute the three steps), with (i) DAPI + CK, (ii) DAPI + CD3, and (iii) DAPI + CD20.
+
+!!! Advice
+    Running the commands below manually involves many consecutive commands, so we recommend automating it. For instance, this can be done using Snakemake or Nextflow. Mainly, this will help you parallelize it, since you can run each task in a separate job, or use multithreading.
+
+=== "Step 1"
+
+    Execute the following command line on all `patch-index` (i.e., `0`, `1`, `2`, and `3`) to run Cellpose using DAPI + CK
+
+    ```sh
+    sopa segmentation cellpose tuto.zarr \
+        --channels DAPI --channels CK \
+        --patch-dir tuto.zarr/.sopa_cache/cellpose_CK \
+        --diameter 35 \
+        --min-area 2000 \
+        --patch-index 0
+    ```
+
+=== "Step 2"
+
+    Execute the following command line on all `patch-index` (i.e., `0`, `1`, `2`, and `3`) to run Cellpose using DAPI + CD3
+
+    ```sh
+    sopa segmentation cellpose tuto.zarr \
+        --channels DAPI --channels CD3 \
+        --patch-dir tuto.zarr/.sopa_cache/cellpose_CD3 \
+        --diameter 35 \
+        --min-area 2000 \
+        --patch-index 0
+    ```
 
-Same for CD3 and CD20
+=== "Step 3"
 
+    Execute the following command line on all `patch-index` (i.e., `0`, `1`, `2`, and `3`) to run Cellpose using DAPI + CD20
+
+    ```sh
+    sopa segmentation cellpose tuto.zarr \
+        --channels DAPI --channels CD20 \
+        --patch-dir tuto.zarr/.sopa_cache/cellpose_CD20 \
+        --diameter 35 \
+        --min-area 2000 \
+        --patch-index 0
+    ```
+
+!!! Note
+    In the above commands, the `--diameter` and `--min-area` parameters are specific to the data type we work on. For your own data, consider using the default parameters from one of our [config files](https://github.com/gustaveroussy/sopa/tree/master/workflow/config). Here, `min-area` is in pixels^2.
+
+At this stage, you have run Cellpose 12 times (4 times for each of the three steps). Now, we need to resolve the conflicts, i.e. merge the three segmentations into one. Note that we give the paths to the temporary boundaries that we made above.
 ```sh
 sopa resolve cellpose tuto.zarr \
     --patch-dir tuto.zarr/.sopa_cache/cellpose_CK \
     --patch-dir tuto.zarr/.sopa_cache/cellpose_CD3 \
     --patch-dir tuto.zarr/.sopa_cache/cellpose_CD20
 ```
 
+### 3. Post-segmentation
+
+Now, we can count the transcripts inside each cell (by providing the name of the gene column of the points dataframe, see `--gene-column genes` below), and average the channel intensities inside each cell (using `--average-intensities`). In the example below, we also filter out cells whose average intensity is lower than `0.25 * Q90`, where Q90 is the 90th percentile (**Warning**: this may remove a lot of cells, so we advise against using this parameter when trying Sopa for the first time).
 ```sh
 sopa aggregate tuto.zarr --gene-column genes --average-intensities --min-intensity-ratio 0.25
 ```
 
+Other post-segmentation methods are available [here](../../cli). Among them, one can be used to convert the results for the Xenium Explorer:
 ```sh
 sopa explorer write tuto.zarr --gene-column genes
 ```
 
-If you have downloaded the Xenium Explorer, you can now open the results in the explorer: `open tuto.explorer/experiment.xenium`
+If you have downloaded the Xenium Explorer, you can now open the results in the explorer: `open tuto.explorer/experiment.xenium` (on macOS), or double-click on that file.
 
-You can also use the file `tuto.explorer/adata.h5ad` if you prefer the `AnnData` object instead of the full `SpatialData` object.
-
-This can be automatized in a snakemake pipeline, ...
+!!! Note
+    You can also use the file `tuto.explorer/adata.h5ad` if you prefer the `AnnData` object instead of the full `SpatialData` object.
 
 ## Custom staining-based segmentation
 
 You can use your own segmentation model and plug it into Sopa to benefit from all the others functionnalities. Especially, it will scale the segmentation, since Sopa will be run on small patches.
 
-For this, you need a python function as described below:
+### 1. Define your segmentation function
+
+You need a python function as described below:
 
 - The function input is an image of shape `(C, Y, X)` (`C` is the number of desired channels, it can be one if you want DAPI only)
 
 - The function output is a mask of shape `(Y, X)`. This mask should contain positive values representing the segmented cells, and contain `0` outside of the cells. For instance, if 4 cells are segmented, the mask **should** contain the values 1, 2, 3, and eventually 0 (where there is no cell).
 
-### Using the API
-
-An example of custom segmentation using the API is detailed [here](../../api/segmentation/stainings/#sopa.segmentation.stainings.StainingSegmentation).
+If you want to use our API, you can find a detailed example of custom segmentation [here](../../api/segmentation/stainings/#sopa.segmentation.stainings.StainingSegmentation). Otherwise, if you want to use the CLI, continue below.
 
-### Using the CLI
+### 2. Setup
 
-To use the CLI here, you'll need to clone the repository:
+To use the CLI, you'll need to clone the repository:
 ```sh
 git clone https://github.com/gustaveroussy/sopa.git
 cd sopa
 ```
 
-Then, add your method inside the `sopa/segmentation/methods.py` file. An example function, called `dummy_method`, is given.
+Then, add the function that you defined above to the `sopa/segmentation/methods.py` file. An example function, called `dummy_method`, is given.
 
 Now, install `sopa` to have your new method in the installation:
 ```sh
@@ -77,6 +125,16 @@ pip install -e .`
 pip install -e '.[cellpose,baysor,...]'
 ```
 
+### 3. Save your data
+
+For this tutorial, we use a generated dataset. The command below will generate it and save it on disk (you can change the path `tuto.zarr` to save it somewhere else). See [here](../../cli/#sopa-read) for details on using your own technology.
+
+```sh
+sopa read . --sdata-path tuto.zarr --technology uniform
+```
+
+### 4. Run your custom segmentation
+
 Afterwards, simply call the CLI by providing the name of your function as the `<FUNCTION_NAME>` in the following commands:
 
 - `sopa segmentation generic-staining <SDATA_PATH> --method-name <FUNCTION_NAME> ...` (see [here](../../cli/#sopa-segmentation-generic-staining) for CLI details)
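
As an illustration of the input/output contract described in "1. Define your segmentation function", here is a minimal sketch of a toy method: it thresholds the first channel and labels the connected regions. The name `threshold_method`, the `threshold` parameter, and the thresholding approach itself are assumptions made for this example and are not part of Sopa. How such a function has to be wrapped or registered in `sopa/segmentation/methods.py` is best checked against the provided `dummy_method`.

```python
import numpy as np


def threshold_method(image: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Toy segmentation (hypothetical, for illustration only).

    image: array of shape (C, Y, X).
    Returns an integer mask of shape (Y, X): 0 outside of the cells,
    and one positive label per segmented region.
    """
    foreground = image[0] > threshold  # (Y, X) boolean map from the first channel
    mask = np.zeros(foreground.shape, dtype=np.int32)
    n_cells = 0

    for y, x in zip(*np.nonzero(foreground)):
        if mask[y, x]:  # pixel already belongs to a labelled region
            continue
        n_cells += 1
        mask[y, x] = n_cells
        stack = [(y, x)]
        while stack:  # 4-connected flood fill
            cy, cx = stack.pop()
            for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                inside = 0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                if inside and foreground[ny, nx] and not mask[ny, nx]:
                    mask[ny, nx] = n_cells
                    stack.append((ny, nx))
    return mask


# Two bright rectangles on a dark single-channel image
image = np.zeros((1, 8, 8))
image[0, 1:3, 1:3] = 1.0
image[0, 5:7, 4:7] = 1.0
mask = threshold_method(image)
print(mask)
```

A real model would replace the thresholding and flood fill, but the shapes stay the same: `(C, Y, X)` in, `(Y, X)` out, with `0` as background.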

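The multi-step tutorial in `advanced_segmentation.md` asks the reader to run the same Cellpose command for every patch index and every step, then to resolve. The sketch below only builds and prints those commands (a dry run), reusing the flags and paths shown in the tutorial. The helper names `cellpose_commands` and `resolve_command` are hypothetical, and the number of patches is hard-coded to the 4 patches of the toy dataset.

```python
import shlex

SDATA_PATH = "tuto.zarr"
N_PATCHES = 4                    # the toy dataset yields 4 patches
MARKERS = ["CK", "CD3", "CD20"]  # one segmentation step per marker, each combined with DAPI


def cellpose_commands(sdata_path, markers, n_patches):
    """One `sopa segmentation cellpose` command per (step, patch index) pair."""
    commands = []
    for marker in markers:
        patch_dir = f"{sdata_path}/.sopa_cache/cellpose_{marker}"
        for patch_index in range(n_patches):
            commands.append([
                "sopa", "segmentation", "cellpose", sdata_path,
                "--channels", "DAPI", "--channels", marker,
                "--patch-dir", patch_dir,
                "--diameter", "35",
                "--min-area", "2000",
                "--patch-index", str(patch_index),
            ])
    return commands


def resolve_command(sdata_path, markers):
    """The final `sopa resolve cellpose` command, with one --patch-dir per step."""
    cmd = ["sopa", "resolve", "cellpose", sdata_path]
    for marker in markers:
        cmd += ["--patch-dir", f"{sdata_path}/.sopa_cache/cellpose_{marker}"]
    return cmd


commands = cellpose_commands(SDATA_PATH, MARKERS, N_PATCHES)
for cmd in commands + [resolve_command(SDATA_PATH, MARKERS)]:
    # Dry run: print only. To execute, pass `cmd` to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Each printed segmentation command is independent of the others, which is why a workflow manager can run them as separate jobs; only the resolve command has to wait for all of them.
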
diff --git a/docs/tutorials/api_usage.md b/docs/tutorials/api_usage.md
@@ -0,0 +1,9 @@
+Coming soon
+
+<!-- ```
+from sopa.utils.data import uniform
+
+sdata = uniform()
+```
+
+For more details, see the [function documentation](../api/utils/data/#sopa.utils.data.uniform). -->
diff --git a/docs/tutorials/cli_usage.md b/docs/tutorials/cli_usage.md
@@ -0,0 +1,143 @@
+## Save the `SpatialData` object
+
+
+For this tutorial, we use a generated dataset. The command below will generate it and save it on disk (you can change the path `tuto.zarr` to save it somewhere else). See [here](../../cli/#sopa-read) for details on using your own technology.
+
+```sh
+# this generates a 'tuto.zarr' directory
+sopa read . --sdata-path tuto.zarr --technology uniform
+```
+
+!!! Note
+    This generates a `.zarr` directory corresponding to a [`SpatialData` object](https://github.com/scverse/spatialdata).
+
+## (Optional) ROI selection
+
+Sometimes, your slide may contain a region with low-quality data, and you may want to run the analysis only on the good-quality region. For this, you can interactively select a region of interest (ROI), and Sopa will only run on the selected ROI.
+
+=== "If working locally"
+    Run the following command line, and follow the instructions displayed in the console:
+    ```sh
+    sopa crop --sdata-path tuto.zarr --channels DAPI
+    ```
+
+=== "If working on a machine without interactive mode"
+    When interactive mode is not available, the ROI selection will be performed in three steps.
+
+    1. On the machine where the data is stored, save a light resized view of the original image (here, it will create a file called `image.zarr.zip`):
+    ```sh
+    sopa crop --sdata-path tuto.zarr --channels DAPI --intermediate-image image.zarr.zip
+    ```
+
+    2. Download the `image.zarr.zip` file locally (or on a machine with interactive mode), and select the ROI. Here, it will create a file called `roi.zarr.zip`:
+    ```sh
+    sopa crop --intermediate-image image.zarr.zip --intermediate-polygon roi.zarr.zip
+    ```
+
+    3. Upload the `roi.zarr.zip` file, and save it inside the `SpatialData` object:
+    ```sh
+    sopa crop --sdata-path tuto.zarr --intermediate-polygon roi.zarr.zip
+    ```
+
+## Run segmentation
+
+### Option 1: Cellpose
+
+Then, generate the bounding boxes of the patches on which Cellpose will be run. Here, the patches have a width and height of 1500 pixels, and an overlap of 50 pixels. We advise bigger sizes for real datasets (see our default parameters in one of our [config files](https://github.com/gustaveroussy/sopa/tree/master/workflow/config)). On the toy dataset, this will generate **4** patches.
+
+```sh
+sopa patchify image tuto.zarr --patch-width-pixel 1500 --patch-overlap-pixel 50
+```
+
+Now, we can run Cellpose on each individual patch. On this toy example, we run a single segmentation step, using DAPI only.
+
+!!! Advice
+    Running the commands below manually involves many consecutive commands, so we recommend automating it. For instance, this can be done using Snakemake or Nextflow. Mainly, this will help you parallelize it, since you can run each task in a separate job, or use multithreading. You can also see how we do it in the [Sopa Snakemake pipeline](https://github.com/gustaveroussy/sopa/blob/master/workflow/Snakefile).
+
+    To automatically get the number of patches, you can either open the `tuto.zarr/.sopa_cache/patches_file_image` file, or compute `len(sdata['sopa_patches'])` in Python.
+
+Execute the following command line on all `patch-index` (i.e., `0`, `1`, `2`, and `3`) to run Cellpose using DAPI only
+
+```sh
+sopa segmentation cellpose tuto.zarr \
+    --channels DAPI \
+    --patch-dir tuto.zarr/.sopa_cache/cellpose \
+    --diameter 35 \
+    --min-area 2000 \
+    --patch-index 0
+```
+
+!!! Note
+    In the above commands, the `--diameter` and `--min-area` parameters are specific to the data type we work on. For your own data, consider using the default parameters from one of our [config files](https://github.com/gustaveroussy/sopa/tree/master/workflow/config). Here, `min-area` is in pixels^2.
+
+At this stage, you have run Cellpose 4 times. Now, we need to resolve the conflicts, i.e. where boundaries overlap due to segmentation on multiple patches.
+```sh
+sopa resolve cellpose tuto.zarr --patch-dir tuto.zarr/.sopa_cache/cellpose
+```
+
+### Option 2: Baysor
+
+## Aggregation
+
+To turn the data into an `AnnData` object, we can count the transcripts inside each cell, and/or average each channel intensity inside each cell boundary.
+
+In the command below, we count the transcripts inside each cell (by providing the name of the gene column of the points dataframe, see `--gene-column genes`), and average the channel intensities inside each cell (using `--average-intensities`).
+```sh
+sopa aggregate tuto.zarr --gene-column genes --average-intensities
+```
+
+## Annotation
+
+Currently, we support Tangram for transcript-based annotation, and a simple scoring approach for channel-based annotation (called channel z-score).
+
+=== "Tangram annotation"
+    ...
+=== "Channel Z-score annotation"
+    ...   
+
+
+## Pipeline report
+
+You can create an HTML report of the pipeline run (on the example below, we save it under `report.html`). It contains some quality controls about your data.
+
+```sh
+sopa report tuto.zarr report.html
+```
+
+## Visualization (Xenium Explorer)
+The Xenium Explorer is software developed by 10X Genomics for visualizing spatial data, and it can be downloaded freely [here](https://www.10xgenomics.com/support/software/xenium-explorer/latest). Sopa allows conversion to the Xenium Explorer format, whatever the type of spatial data you worked on.
+
+```sh
+sopa explorer write tuto.zarr --gene-column genes
+```
+
+If you have downloaded the Xenium Explorer, you can now open the results in the explorer: `open tuto.explorer/experiment.xenium` (on macOS), or double-click on that file.
+
+!!! note "Time efficiency"
+    Creating the image needed by the Xenium Explorer can be time-consuming. Therefore, we recommend performing one run for the image generation (below) and another to save the transcripts/boundaries/observations.
+    ```sh
+    # this can be done directly after saving the raw data in a .zarr directory
+    sopa explorer write tuto.zarr --mode '+i' --no-save-h5ad
+    ```
+
+    After running everything with Sopa, you can finally save all the other Xenium Explorer input (e.g. boundaries and cell categories):
+    ```sh
+    # this should be done after aggregation and an optional annotation
+    sopa explorer write tuto.zarr --mode '-i'
+    ```
+    For more details and customization, refer to the [command line helper](../../cli/#sopa-explorer-write).
+
+## Geometric and spatial statistics
+
+All functions to compute geometric and spatial statistics are detailed in the `sopa.stats` [API](../../api/stats). You can also read [this tutorial](../stats).
+
+## Further analysis
+
+- If you are familiar with the [`spatialdata` library](https://github.com/scverse/spatialdata), you can directly use the `tuto.zarr` directory, corresponding to a `SpatialData` object:
+```python
+import spatialdata
+
+sdata = spatialdata.read_zarr("tuto.zarr")
+```
+- You can use [Squidpy](https://squidpy.readthedocs.io/en/latest/index.html), which operates on either the `SpatialData` object or the `AnnData` object, or use other tools of the `scverse` ecosystem such as [`scanpy`](https://scanpy.readthedocs.io/en/stable/index.html).
+- You can also use the file `tuto.explorer/adata.h5ad` if you prefer the `AnnData` object instead of the full `SpatialData` object.
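
The patchify step in `cli_usage.md` uses patches of 1500 pixels with a 50-pixel overlap and reports 4 patches on the toy dataset. The sketch below is a generic tiling estimate, not necessarily how Sopa computes its patches, and the image side of 2048 pixels is a hypothetical value that is not stated in the tutorial. It only illustrates why a modest image is covered by a 2 x 2 grid with these parameters.

```python
import math


def patches_per_axis(length: int, patch_width: int, overlap: int) -> int:
    """Number of patches of `patch_width` pixels needed to cover `length` pixels
    when consecutive patches share `overlap` pixels (generic estimate)."""
    if length <= patch_width:
        return 1
    stride = patch_width - overlap  # how far each new patch advances
    return math.ceil((length - overlap) / stride)


IMAGE_SIDE = 2048  # hypothetical image size, not taken from the tutorial
n_patches = patches_per_axis(IMAGE_SIDE, 1500, 50) ** 2
print(n_patches, "patches, i.e. patch indices", list(range(n_patches)))
```

For real data, the number of patches should be read from the patchify output as described in the tutorial rather than estimated.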
diff --git a/docs/pipeline.md → docs/tutorials/snakemake.md b/docs/pipeline.md → docs/tutorials/snakemake.md
@@ -1,6 +1,6 @@
 # Snakemake pipeline
 
-If you don't want to dig into the CLI/API, you can directly use our existing [Snakemake](https://snakemake.readthedocs.io/en/stable/) pipeline. It will not involve any coding, but requires some setup for `snakemake`.
+Sopa comes with an existing [Snakemake](https://snakemake.readthedocs.io/en/stable/) pipeline to get started quickly. This will not involve any coding, but requires some setup specific to `snakemake`.
 
 ## Setup
 
@@ -61,6 +61,37 @@ cd sopa/workflow       # your own personal path to the workflow directory
 
 For more customization, see the [snakemake CLI documentation](https://snakemake.readthedocs.io/en/stable/executing/cli.html).
 
+## Toy example
+
+In the example below, we run the pipeline on a generated toy dataset. Running it locally can help you test a new pipeline or a new config.
+
+Make sure you have set up everything as detailed in this tutorial, and then run the following command lines:
+
+=== "Cellpose usage"
+    Make sure you have installed sopa with the Cellpose extra
+    ```sh
+    conda activate sopa    # or an environment that has `snakemake`
+    cd sopa/workflow       # your own personal path to the workflow directory
+
+    # replace tuto.zarr by the path where you want the data to be saved
+    snakemake --config data_path=. sdata_path=tuto.zarr --configfile=config/toy/uniform_cellpose.yaml --cores 1 --use-conda
+    ```
+
+=== "Baysor usage"
+    Make sure you have installed sopa with the Baysor extra, and that you have installed the `baysor` command
+    ```sh
+    conda activate sopa    # or an environment that has `snakemake`
+    cd sopa/workflow       # your own personal path to the workflow directory
+
+    # replace tuto.zarr by the path where you want the data to be saved
+    snakemake --config data_path=. sdata_path=tuto.zarr --configfile=config/toy/uniform_baysor.yaml --cores 1 --use-conda
+    ```
+
+!!! notes
+    In the above example, snakemake is executed sequentially (one core), which is enough for debugging purposes
+
+You can then check `tuto.explorer` for output files. Notably, if you have installed the [Xenium Explorer](https://www.10xgenomics.com/support/software/xenium-explorer), double-click on `experiment.xenium` to visualize the results.
+
 ## Create your own config
 
 If the existing `config` files are not suited for your project, you can update an existing one, or create a whole new one. For this, use [this commented config](https://github.com/gustaveroussy/sopa/blob/master/workflow/config/example_commented.yaml) to understand the purpose of each argument. Note that some sections are optional: in this case, just remove the section or the argument, and sopa will not run it.
diff --git a/docs/tutorials/stats.md b/docs/tutorials/stats.md
@@ -0,0 +1 @@
+Coming soon