.. note::

   Also see the release notes of :mod:`anndata`.
- :func:`~scanpy.api.pp.calculate_qc_metrics` calculates a number of quality control metrics, similar to calculateQCMetrics from Scater [McCarthy17]_ thanks to I Virshup
- :func:`~scanpy.api.pp.read_10x_h5` and :func:`~scanpy.api.pp.read_10x_mtx` read Cell Ranger 3.0 outputs, see here thanks to Q. Gong
- :func:`~scanpy.api.pp.highly_variable_genes` replaces :func:`~scanpy.api.pp.filter_genes_dispersion`; it gives the same results but, by default, expects logarithmized data and does not subset thanks to S. Rybakov
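As an illustration of the kind of per-cell quantities that quality control metrics like those mentioned above summarize, here is a minimal pure-Python sketch. It computes the total counts and the number of detected genes for each cell of a toy matrix. The helper ``per_cell_qc`` and the toy data are hypothetical; this is not the implementation of ``calculate_qc_metrics``.

```python
def per_cell_qc(counts):
    """Return (total_counts, n_genes_detected) for each cell (row)."""
    metrics = []
    for row in counts:
        total = sum(row)                                   # all counts in this cell
        n_detected = sum(1 for value in row if value > 0)  # genes with >= 1 count
        metrics.append((total, n_detected))
    return metrics


# Toy count matrix (hypothetical): 3 cells (rows) x 4 genes (columns).
counts = [
    [5, 0, 2, 1],
    [0, 0, 0, 3],
    [1, 1, 1, 1],
]
qc = per_cell_qc(counts)
```

Cells with unusually low totals or few detected genes are typical candidates for filtering; the thresholds depend on the dataset.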
RNA velocity in single cells [Manno18]_:
- Scanpy and AnnData support loom's layers so that computations for single-cell RNA velocity [Manno18]_ become feasible thanks to S Rybakov and V Bergen
- the package scvelo integrates with Scanpy and processes loom files with splicing information produced by Velocyto [Manno18]_; it runs a lot faster than the count matrix analysis of Velocyto and provides several conceptual developments (preprint to come)
Plotting of marker genes and quality control (see this section and scroll down); a few examples are:
- :func:`~scanpy.api.pl.dotplot` for visualizing genes across conditions and clusters, see here thanks to F Ramirez
- :func:`~scanpy.api.pl.heatmap` for pretty heatmaps, see here thanks to F Ramirez
- :func:`~scanpy.api.pl.violin` now produces very compact overview figures with many panels, see here thanks to F Ramirez
- :func:`~scanpy.api.pl.highest_expr_genes` for quality control, see here; plot genes with highest mean fraction of cells, similar to plotQC of Scater [McCarthy17]_ thanks to F Ramirez
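The ranking behind a "highest expressed genes" plot can be sketched in a few lines: convert each cell's counts to fractions of that cell's total, average the fractions over cells, and keep the top genes. The helper ``top_genes_by_mean_fraction`` and the toy data below are hypothetical; this is a sketch under those assumptions and not Scanpy's implementation.

```python
def top_genes_by_mean_fraction(counts, gene_names, n_top=2):
    """Rank genes by the mean, over cells, of their fraction of a cell's counts."""
    sums = [0.0] * len(gene_names)
    n_cells = 0
    for row in counts:
        total = sum(row)
        if total == 0:
            continue  # skip empty cells to avoid division by zero
        n_cells += 1
        for g, value in enumerate(row):
            sums[g] += value / total
    if n_cells == 0:
        return []
    means = [s / n_cells for s in sums]
    ranked = sorted(zip(gene_names, means), key=lambda pair: pair[1], reverse=True)
    return ranked[:n_top]


# Toy data (hypothetical): 2 cells x 3 genes.
genes = ["A", "B", "C"]
counts = [
    [8, 1, 1],
    [6, 3, 1],
]
top = top_genes_by_mean_fraction(counts, genes, n_top=2)
```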
There is a section on imputation:
- :func:`~scanpy.api.pp.magic` for imputation using data diffusion [vanDijk18]_ thanks to S Gigante
- :func:`~scanpy.api.pp.dca` for imputation and latent space construction using an autoencoder [Eraslan18]_
Further changes:
- frameon=False enables easy removal of frames in scatter plots and in :func:`~scanpy.api.set_figure_params`
And several consistency fixes.
- :func:`~scanpy.api.tl.paga` improved, see theislab/paga; the default model changed, and the previous default model can be restored by passing ``model='v1.0'``
- :func:`~scanpy.api.set_figure_params` by default passes ``vector_friendly=True`` and allows you to produce reasonably sized PDFs by rasterizing large scatter plots
- :func:`~scanpy.api.tl.draw_graph` now defaults to the ForceAtlas2 layout [Jacomy14]_ [Chippada18]_, which is often more visually appealing and whose computation is much faster thanks to S Wollock
- :func:`~scanpy.api.pl.scatter` also plots along variables axis thanks to MD Luecken
- :func:`~scanpy.api.pp.pca` and :func:`~scanpy.api.pp.log1p` support chunk processing thanks to S Rybakov
- :func:`~scanpy.api.pp.regress_out` is back to multiprocessing thanks to F Ramirez
- :func:`~scanpy.api.read` reads compressed text files thanks to G Eraslan
- :func:`~scanpy.api.queries.mitochondrial_genes` for querying mito genes thanks to FG Brundu
- :func:`~scanpy.api.pp.mnn_correct` for batch correction [Haghverdi18]_ [Kang18]_
- :func:`~scanpy.api.tl.phate` for low-dimensional embedding [Moon17]_ thanks to S Gigante
- :func:`~scanpy.api.tl.sandbag`, :func:`~scanpy.api.tl.cyclone` for scoring genes [Scialdone15]_ [Fechtner18]_
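The chunk processing mentioned above for ``log1p`` follows a general pattern: transform a fixed number of rows at a time so that only one chunk needs to be held in working memory. A minimal pure-Python sketch of that pattern follows; ``iter_chunks`` and ``log1p_chunked`` are hypothetical helpers and say nothing about how Scanpy implements chunking internally.

```python
import math


def iter_chunks(rows, chunk_size):
    """Yield consecutive slices of at most chunk_size rows."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def log1p_chunked(rows, chunk_size=2):
    """Apply log(1 + x) element-wise, one chunk of rows at a time."""
    result = []
    for chunk in iter_chunks(rows, chunk_size):
        result.extend([[math.log1p(value) for value in row] for row in chunk])
    return result


# Toy data (hypothetical): 5 cells x 2 genes, processed in chunks of 2 rows.
data = [[0, 1], [3, 0], [7, 2], [0, 0], [1, 1]]
chunks = list(iter_chunks(data, 2))
logged = log1p_chunked(data, chunk_size=2)
```

The result is identical to transforming all rows at once; only the peak amount of data touched per step changes.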
Scanpy is much faster and more memory efficient. Preprocess, cluster and visualize 1.3M cells in 6 h, 130K cells in 14 min and 68K cells in 3 min.
The API gained a preprocessing function :func:`~scanpy.api.pp.neighbors` and a class :class:`~scanpy.api.Neighbors` to which all basic graph computations are delegated.
Upgrading to 1.0 is not fully backwards compatible; the following changes may affect existing code:
- the graph-based tools :func:`~scanpy.api.tl.louvain`,
  :func:`~scanpy.api.tl.dpt`, :func:`~scanpy.api.tl.draw_graph`,
  :func:`~scanpy.api.tl.umap`, :func:`~scanpy.api.tl.diffmap` and
  :func:`~scanpy.api.tl.paga` now require prior computation of the graph:
  ``sc.pp.neighbors(adata, n_neighbors=5); sc.tl.louvain(adata)`` instead of
  the previous ``sc.tl.louvain(adata, n_neighbors=5)``
- install numba via ``conda install numba``, which replaces cython
- the default connectivity measure changed (dpt will look different using default settings); setting ``method='gauss'`` in ``sc.pp.neighbors`` uses gauss kernel connectivities and reproduces the previous behavior, see, for instance, this example
- namings of returned annotation have changed for less bloated AnnData objects, which means that some of the unstructured annotation of old AnnData files is not recognized anymore
- replace occurrences of ``group_by`` with ``groupby`` (consistency with pandas)
- it is worth checking out the notebook examples to see changes, e.g., here
- upgrading scikit-learn from 0.18 to 0.19 changed the implementation of PCA, some results might therefore look slightly different
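The first item in the list above moves graph construction into a single preprocessing step whose result the downstream tools reuse. A minimal pure-Python sketch of that compute-once, reuse-many pattern follows. The helpers ``knn_graph`` and ``connected_components`` are hypothetical stand-ins chosen for illustration; they do not reproduce Scanpy's neighbor search or Louvain clustering.

```python
import math


def knn_graph(points, n_neighbors):
    """Brute-force k-nearest-neighbor graph: index -> list of neighbor indices."""
    graph = {}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = [j for _, j in dists[:n_neighbors]]
    return graph


def connected_components(graph):
    """Toy 'clustering': label nodes by connected component of the symmetrized graph."""
    undirected = {i: set(nbrs) for i, nbrs in graph.items()}
    for i, nbrs in graph.items():
        for j in nbrs:
            undirected[j].add(i)
    labels = {}
    current = 0
    for start in sorted(undirected):
        if start in labels:
            continue
        stack = [start]
        while stack:
            node = stack.pop()
            if node in labels:
                continue
            labels[node] = current
            stack.extend(n for n in undirected[node] if n not in labels)
        current += 1
    return [labels[i] for i in sorted(labels)]


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Step 1: compute the graph once.
graph = knn_graph(points, n_neighbors=2)

# Step 2: several "tools" reuse the same graph without recomputing neighbors.
labels = connected_components(graph)
out_degree = [len(graph[i]) for i in sorted(graph)]
```

Separating the two steps means the neighbor search, usually the expensive part, runs once, and every tool sees exactly the same graph.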
Further changes are:
- UMAP [McInnes18]_ can serve as a first visualization of the data just as tSNE; in contrast to tSNE, UMAP directly embeds the single-cell graph and is faster. UMAP is now also used for measuring connectivities and computing neighbors, see :func:`~scanpy.api.pp.neighbors`
- graph abstraction: AGA is renamed to PAGA, see :func:`~scanpy.api.tl.paga`. It now only measures connectivities between partitions of the single-cell graph; pseudotime and clustering need to be computed separately via :func:`~scanpy.api.tl.louvain` and :func:`~scanpy.api.tl.dpt`. The connectivity measure has been improved
- logistic regression for finding marker genes: :func:`~scanpy.api.tl.rank_genes_groups` with parameter ``method='logreg'``
- :func:`~scanpy.api.tl.louvain` now provides a better implementation for reclustering via ``restrict_to``
- scanpy no longer modifies ``rcParams`` upon import; call ``settings.set_figure_params`` to set the 'scanpy style'
- default cache directory is ``./cache/``, set ``settings.cachedir`` to change this; nested directories in this are now avoided
- show edges in scatter plots based on graph visualization :func:`~scanpy.api.tl.draw_graph` and :func:`~scanpy.api.tl.umap` by passing ``edges=True``
- :func:`~scanpy.api.pp.downsample_counts` for downsampling counts thanks to MD Luecken
- default 'louvain_groups' are now called 'louvain'
- 'X_diffmap' now contains the zero component, plotting remains unchanged
- embed cells using :func:`~scanpy.api.tl.umap` [McInnes18]_: examples
- score sets of genes, e.g. for cell cycle, using :func:`~scanpy.api.tl.score_genes` [Satija15]_: notebook
- :func:`~scanpy.api.pl.clustermap`: heatmap from hierarchical clustering, based on :func:`seaborn.clustermap` [Waskom16]_
- only return ``matplotlib.Axis`` in plotting functions of ``sc.pl`` when ``show=False``, otherwise ``None``
- amendments in PAGA and its plotting functions
- export to SPRING [Weinreb17]_ for interactive visualization of data: tutorial, docs
- finding marker genes via :func:`~scanpy.api.pl.rank_genes_groups_violin` improved: example
- :class:`~anndata.AnnData` objects can be concatenated via :meth:`~anndata.AnnData.concatenate`.
- :class:`~anndata.AnnData` is available as a separate package
- results of PAGA are simplified
Initial release of partition-based graph abstraction (PAGA).
Scanpy now includes preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing and simulation of gene regulatory networks. The implementation efficiently deals with datasets of more than one million cells.
Scanpy computationally outperforms the Cell Ranger R kit and allows reproducing most of Seurat's guided clustering tutorial.