Import the Scanpy API as:
import scanpy.api as sc
The typical workflow consists of successive calls to data analysis tools in sc.tl, e.g.:
sc.tl.louvain(adata, **tool_params) # cluster cells using Louvain clustering
where adata is an :class:`~scanpy.api.AnnData` object. Each of these calls adds annotation to an expression matrix X, which stores n_obs observations of n_vars gene expression variables. For each tool, there is at least one associated plotting function in sc.pl, which retrieves and plots the added annotation:
sc.pl.louvain(adata, **plotting_params)
If you pass show=False, a matplotlib.axes.Axes instance is returned, which gives you access to all of matplotlib's detailed configuration options.
To facilitate writing memory-efficient pipelines, Scanpy tools by default operate in place on adata and return None. This also makes it easy to transition to out-of-memory pipelines. If you want to return a copy of the :class:`~scanpy.api.AnnData` object and leave the passed adata unchanged, pass copy=True.
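The in-place convention can be illustrated with a toy stand-in. The sketch below does not use Scanpy: `toy_tool` and the dict-based `data` object are hypothetical and only mimic the behaviour described above, namely modifying the input and returning None by default, or returning a modified copy when copy=True.

```python
import copy as copy_module


def toy_tool(data, copy=False):
    """Hypothetical tool that mimics the in-place / copy convention."""
    # Work on a deep copy only if the caller explicitly asks for one.
    target = copy_module.deepcopy(data) if copy else data
    # "Annotate" the object, much as a tool would add a result to adata.
    target["obs"]["cluster"] = ["0" for _ in target["X"]]
    # Return the copy if requested, otherwise None (in-place semantics).
    return target if copy else None


data = {"X": [[1.0, 0.0], [0.0, 2.0]], "obs": {}}

# With copy=True the input is left alone and an annotated copy is returned.
result = toy_tool(data, copy=True)
untouched_after_copy = "cluster" not in data["obs"]

# By default the input itself is annotated and nothing is returned.
in_place = toy_tool(data)
```

The default avoids duplicating a potentially large data matrix at every step; the copy variant is mainly useful when the original object has to be preserved for comparison.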
Scanpy is based on anndata, which provides the :class:`~scanpy.api.AnnData` class.
At the most basic level, an :class:`~scanpy.api.AnnData` object adata stores a data matrix (adata.X), dataframe-like annotation of observations (adata.obs) and variables (adata.var), and unstructured dict-like annotation (adata.uns). Values can be retrieved and appended via adata.obs['key1'] and adata.var['key2']. Names of observations and variables can be accessed via adata.obs_names and adata.var_names, respectively. :class:`~scanpy.api.AnnData` objects can be sliced like dataframes, for example, adata_subset = adata[:, list_of_gene_names].
For more, see this blog post.
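The layout described above can be pictured with plain NumPy and pandas objects. This is only a sketch of how the pieces relate to each other, not the AnnData class itself; the cell names, gene names, and the `key1`/`key2` values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Data matrix: n_obs = 3 observations (rows) x n_vars = 4 variables (columns).
X = np.arange(12, dtype=float).reshape(3, 4)

# Per-observation and per-variable annotation, indexed by their names.
obs = pd.DataFrame({"key1": ["a", "b", "a"]},
                   index=["cell_0", "cell_1", "cell_2"])
var = pd.DataFrame({"key2": [True, False, True, False]},
                   index=["gene_a", "gene_b", "gene_c", "gene_d"])
uns = {"note": "unstructured, dict-like annotation"}

# Selecting variables by name amounts to translating the names into column
# positions and subsetting X and var consistently.
list_of_gene_names = ["gene_c", "gene_a"]
cols = var.index.get_indexer(list_of_gene_names)
X_subset = X[:, cols]
var_subset = var.iloc[cols]
```

The point of keeping these pieces in one object is that a slice such as adata[:, list_of_gene_names] keeps the matrix and its annotations consistent, instead of requiring the manual bookkeeping shown here.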
To read a data file into an :class:`~scanpy.api.AnnData` object, call:
adata = sc.read(filename)
This initializes an :class:`~scanpy.api.AnnData` object. You can then add further annotation using, e.g., pd.read_csv:
import pandas as pd
anno = pd.read_csv(filename_sample_annotation)
adata.obs['cell_groups'] = anno['cell_groups']  # categorical annotation of type pandas.Categorical
adata.obs['time'] = anno['time']  # numerical annotation of type float
# alternatively, you could also set the whole dataframe
# adata.obs = anno
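One detail worth checking when adding annotation this way: pandas aligns a Series on its index when it is assigned as a column, so the annotation table should be indexed by the same names as the observations. The sketch below uses plain pandas with a stand-in `obs` dataframe rather than a real AnnData object. The cell names and the inline CSV are made up, and it assumes that the first CSV column holds the observation names.

```python
import io

import pandas as pd

# Stand-in for adata.obs: an empty dataframe indexed by observation names.
obs = pd.DataFrame(index=["cell_a", "cell_b", "cell_c"])

# Hypothetical annotation file; rows are in a different order than `obs`.
csv_text = "cell,cell_groups,time\ncell_c,B,2.0\ncell_a,A,0.0\ncell_b,A,1.0\n"

# index_col=0 makes the cell names the index, so assignment aligns by name.
anno = pd.read_csv(io.StringIO(csv_text), index_col=0)

obs["cell_groups"] = anno["cell_groups"].astype("category")
obs["time"] = anno["time"]

# Without index_col=0 the index is 0, 1, 2, no label matches the cell names,
# and the assigned column ends up filled with NaN.
unaligned = pd.read_csv(io.StringIO(csv_text))["time"]
mismatch = pd.DataFrame(index=obs.index)
mismatch["time"] = unaligned
```

Because the alignment is by name, the row order of the annotation file does not matter as long as the names match.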
To write, use:
adata.write(filename)
adata.write_csvs(filename)
adata.write_loom(filename)