Commit f539870: Use typing.Literal (scverse#878)

flying-sheep authored Oct 18, 2019
1 parent ffb0685 commit f539870
Showing 34 changed files with 390 additions and 306 deletions.
52 changes: 30 additions & 22 deletions CONTRIBUTING.md
@@ -36,31 +36,31 @@ We use the numpydoc style for writing docstrings.
Look at [`sc.tl.louvain`][] as an example for everything mentioned here:

The `Params` abbreviation is a legit replacement for `Parameters`.
There are two ways of documenting parameter types:

1. In most cases, the type annotations you add to function parameters should be enough.
Use the [`typing`](https://docs.python.org/3/library/typing.html) module for containers,
e.g. `Sequence`s (like `list`), `Iterable`s (like `set`), and `Mapping`s (like `dict`).
Always specify what these contain, e.g. `{'a': (1, 2)}` → `Mapping[str, Tuple[int, int]]`.
If you can’t use one of those, use a concrete class like `AnnData`.
2. If your parameter only accepts an enumeration of strings, specify them like so:
``{`'elem-1'`, `'elem-2'`}``. These contain `a`–`z`, `0`-`9`, and sometimes `.`, `_` or `-`.


To document parameter types use type annotations on function parameters.
Use the [`typing`][] module for containers, e.g. `Sequence`s (like `list`),
`Iterable`s (like `set`), and `Mapping`s (like `dict`). Always specify
what these contain, e.g. `{'a': (1, 2)}``Mapping[str, Tuple[int, int]]`.
If you can’t use one of those, use a concrete class like `AnnData`.
If your parameter only accepts an enumeration of strings, specify them like so:
`Literal['elem-1', 'elem-2']`.
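
For instance, a signature following these rules might look like this
(a minimal sketch: the function and parameter names are invented):

```python
from typing import Mapping, Optional, Sequence, Tuple

from anndata import AnnData

try:  # typing.Literal exists from Python 3.8 on
    from typing import Literal
except ImportError:
    from typing_extensions import Literal


def illustrate(
    adata: AnnData,
    groups: Optional[Sequence[str]] = None,  # container from typing
    ranges: Optional[Mapping[str, Tuple[int, int]]] = None,  # contents specified
    method: Literal['elem-1', 'elem-2'] = 'elem-1',  # enumeration of strings
) -> AnnData:
    """Hypothetical function demonstrating the annotation conventions."""
    ...
```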

The `Returns` section deserves special attention:
There are three types of return sections – prose, tuple, and a mix of both.

1. Prose is for simple cases.
2. Tuple return sections are formatted like parameters.
Unlike in numpydoc, each tuple is first characterized by the identifier name
and *not* by its type. You can provide the type annotation in the function header
or by separating it with a colon, as in parameters.
Unlike in numpydoc, each tuple is first characterized by the identifier
and *not* by its type. Provide the type annotation in the function header, as sketched below.
3. Mix of prose and tuple is relevant in complicated cases,
e.g. when you want to describe that you *added something as annotation to an `AnnData` object*.
e.g. when you want to describe that you
*added something as annotation to an `AnnData` object*.

[`sc.tl.louvain`]: https://github.com/theislab/scanpy/blob/a811fee0ef44fcaecbde0cad6336336bce649484/scanpy/tools/_louvain.py#L22-L90
[`typing`]: https://docs.python.org/3/library/typing.html
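
For the tuple case, a hypothetical example putting the types in the header
and the identifiers in the `Returns` section:

```python
from typing import Tuple

import numpy as np


def split_halves(x: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """\
    Returns
    -------
    first_half
        The first half of ``x``.
    second_half
        The second half of ``x``.
    """
    mid = len(x) // 2
    return x[:mid], x[mid:]
```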

#### Examples
For simple cases, use prose as in [`pp.normalize_total`](https://scanpy.readthedocs.io/en/latest/api/scanpy.pp.normalize_total.html)
For simple cases, use prose as in [`pp.normalize_total`][].

```rst
Returns
@@ -70,9 +70,9 @@ or updates ``adata`` with normalized versions of the original
``adata.X`` and ``adata.layers``, depending on ``inplace``.
```

You can use the standard numpydoc way of populating it, e.g. as in
[`pp.calculate_qc_metrics`](https://scanpy.readthedocs.io/en/latest/api/scanpy.pp.calculate_qc_metrics.html).
If you just use a plain type name here, there will be an automatically created link.
You can use the standard numpydoc way of populating it,
e.g. as in [`pp.calculate_qc_metrics`][].
If you use a plain type name here, a link will be created.

```rst
Returns
@@ -84,7 +84,7 @@ second_identifier : another.module.and_type
```

Many functions also just modify parts of the passed AnnData object,
like e.g. [`tl.dpt`](https://scanpy.readthedocs.io/en/latest/api/scanpy.tl.dpt.html).
like e.g. [`tl.dpt`][].
You can then combine prose and lists to best describe what happens.

```rst
@@ -103,10 +103,18 @@ dpt_groups : :class:`pandas.Series` (``adata.obs``, dtype ``category``)
'progenitor cells', 'undecided cells' or 'branches' of a process.
```

[`pp.normalize_total`]: https://scanpy.readthedocs.io/en/latest/api/scanpy.pp.normalize_total.html
[`pp.calculate_qc_metrics`]: https://scanpy.readthedocs.io/en/latest/api/scanpy.pp.calculate_qc_metrics.html
[`tl.dpt`]: https://scanpy.readthedocs.io/en/latest/api/scanpy.tl.dpt.html

### Performance

We defer loading a few modules until they’re first needed.
If you want realistic performance measures, be sure to import them before running scanpy functions:
If you want realistic performance measures,
be sure to import them before running scanpy functions:

- Check the list in `test_deferred_imports()` from [`scanpy.tests.test_performance`][]
- Everything in [`scanpy.external`][] wraps a 3rd party import.

- Check the list in `test_deferred_imports()` from [`scanpy/tests/test_performance.py`](https://github.com/theislab/scanpy/blob/master/scanpy/tests/test_performance.py)
- Everything in [`scanpy.external`](https://scanpy.readthedocs.io/en/stable/external/) wraps a 3rd party import.
[`scanpy.tests.test_performance`]: https://github.com/theislab/scanpy/blob/master/scanpy/tests/test_performance.py
[`scanpy.external`]: https://scanpy.readthedocs.io/en/stable/external/
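
A minimal timing sketch under these caveats (which modules are deferred is
an assumption here; `test_deferred_imports()` has the authoritative list):

```python
import time

import scanpy as sc

# Pre-import a lazily loaded third-party module so its import time is not
# billed to the first scanpy call that needs it (assumed example module).
import statsmodels.api  # noqa: F401

adata = sc.datasets.pbmc68k_reduced()

start = time.perf_counter()
sc.pp.neighbors(adata)
print(f'sc.pp.neighbors: {time.perf_counter() - start:.2f}s')
```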
2 changes: 1 addition & 1 deletion docs/requirements.txt
@@ -4,7 +4,7 @@ sphinx_rtd_theme>=0.3.1
# Sphinx 2 has nicer looking sections
sphinx>=2.0.1
sphinx-autodoc-typehints
scanpydoc>=0.4.1
scanpydoc>=0.4.2
# same as ../requires.txt, but omitting the c++ packages
anndata>=0.6.18
matplotlib>=2.2
15 changes: 15 additions & 0 deletions scanpy/_compat.py
@@ -0,0 +1,15 @@
try:
from typing import Literal
except ImportError:
try:
from typing_extensions import Literal
except ImportError:

class LiteralMeta(type):
def __getitem__(cls, values):
if not isinstance(values, tuple):
values = (values,)
return type('Literal_', (Literal,), dict(params=values))

class Literal(metaclass=LiteralMeta):
pass
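
A quick sketch of how the shim behaves (the `save` function is hypothetical;
only the metaclass fallback records the values on `params`, while the real
`typing.Literal` exposes them via `__args__`):

```python
from scanpy._compat import Literal

Format = Literal['png', 'pdf', 'svg']

# Works with either implementation: fallback classes carry `params`,
# typing.Literal carries `__args__`; neither enforces values at runtime.
print(getattr(Format, 'params', None) or getattr(Format, '__args__', None))


def save(fmt: Format = 'pdf') -> None:
    """Hypothetical consumer of the alias."""
```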
14 changes: 11 additions & 3 deletions scanpy/_settings.py
@@ -10,6 +10,7 @@

from . import logging
from .logging import _set_log_level, _set_log_file, _RootLogger
from ._compat import Literal

_VERBOSITY_TO_LOGLEVEL = {
'error': 'ERROR',
@@ -378,6 +379,13 @@ def categories_to_ignore(self, categories_to_ignore: Iterable[str]):
# Functions
# --------------------------------------------------------------------------------

# Collected from the print_* functions in matplotlib.backends
_Format = Literal[
'png', 'jpg', 'tif', 'tiff',
'pdf', 'ps', 'eps', 'svg', 'svgz', 'pgf',
'raw', 'rgba',
]

def set_figure_params(
self,
scanpy: bool = True,
@@ -387,7 +395,7 @@ def set_figure_params(
vector_friendly: bool = True,
fontsize: int = 14,
color_map: Optional[str] = None,
format: Union[str, Iterable[str]] = "pdf",
format: _Format = "pdf",
transparent: bool = False,
ipython_format: str = "png2x",
):
@@ -399,7 +407,7 @@
scanpy
Init default values for :obj:`matplotlib.rcParams` suited for Scanpy.
dpi
Resolution of rendered figures - this influences the size of figures in notebooks.
Resolution of rendered figures – this influences the size of figures in notebooks.
dpi_save
Resolution of saved figures. This should typically be higher to achieve
publication quality.
@@ -411,7 +419,7 @@
Set the fontsize for several `rcParams` entries. Ignored if `scanpy=False`.
color_map
Convenience method for setting the default color map. Ignored if `scanpy=False`.
format: {`'png'`, `'pdf'`, `'svg'`, etc.}, optional (default: `'pdf'`)
format
This sets the default format for saving figures: `file_format_figs`.
transparent
Save figures with transparent background. Sets
5 changes: 3 additions & 2 deletions scanpy/_utils.py
@@ -16,6 +16,7 @@
from textwrap import dedent

from ._settings import settings
from ._compat import Literal
from . import logging as logg

EPS = 1e-15
@@ -204,7 +205,7 @@ def compute_association_matrix_of_groups(
adata: AnnData,
prediction: str,
reference: str,
normalization: str = 'prediction',
normalization: Literal['prediction', 'reference'] = 'prediction',
threshold: float = 0.01,
max_n_names: Optional[int] = 2,
):
@@ -219,7 +220,7 @@
Field name of adata.obs.
reference
Field name of adata.obs.
normalization: {`'prediction'`, `'reference'`}
normalization
Whether to normalize with respect to the predicted groups or the
reference groups.
threshold
2 changes: 1 addition & 1 deletion scanpy/api/__init__.py
@@ -32,7 +32,7 @@
# it would be nice to make the simple data types "properties of the
# module"... putting setters and getters for all of them wouldn't be very nice
from .._settings import settings
# for now - or maybe as the permanently favored solution - put the single function here
# for now – or maybe as the permanently favored solution – put the single function here
# from ..settings import set_figure_params
set_figure_params = settings.set_figure_params

34 changes: 19 additions & 15 deletions scanpy/external/pp/_dca.py
@@ -115,24 +115,28 @@ def dca(
-------
If `copy` is true and `return_model` is false, an AnnData object is returned.
In "denoise" mode, `adata.X` is overwritten with the denoised values. In "latent" mode, latent\
low dimensional representation of cells are stored in `adata.obsm['X_dca']` and `adata.X`\
is not modified. Note that these values are not corrected for library size effects.
If `return_info` is true, all estimated distribution parameters are stored in AnnData such as:
- `.obsm["X_dca_dropout"]` which is the mixture coefficient (pi) of the zero component\
in ZINB, i.e. dropout probability (only if `ae_type` is `zinb` or `zinb-conddisp`).
- `.obsm["X_dca_dispersion"]` which is the dispersion parameter of NB.
- `.uns["dca_loss_history"]` which stores the loss history of the training. See `.history`\
attribute of Keras History class for more details.
In "denoise" mode, `adata.X` is overwritten with the denoised values.
In "latent" mode, latent low dimensional representation of cells are stored
in `adata.obsm['X_dca']` and `adata.X` is not modified.
Note that these values are not corrected for library size effects.
If `return_info` is true, all estimated distribution parameters are stored
in AnnData like this:
`.obsm["X_dca_dropout"]`
The mixture coefficient (pi) of the zero component in ZINB,
i.e. dropout probability (if `ae_type` is `zinb` or `zinb-conddisp`).
`.obsm["X_dca_dispersion"]`
The dispersion parameter of NB.
`.uns["dca_loss_history"]`
The loss history of the training.
See `.history` attribute of Keras History class for more details.
Finally, the raw counts are stored in `.raw` attribute of AnnData object.
If `return_model` is given, trained model is returned. When both `copy` and `return_model`\
are true, a tuple of anndata and model is returned in that order.
If `return_model` is given, the trained model is returned.
When both `copy` and `return_model` are true,
a tuple of anndata and model is returned in that order.
"""

try:
2 changes: 1 addition & 1 deletion scanpy/external/pp/_magic.py
@@ -153,7 +153,7 @@ def magic(
)
# update AnnData instance
if name_list == "pca_only":
# special case - update adata.obsm with smoothed values
# special case – update adata.obsm with smoothed values
adata.obsm["X_magic"] = X_magic.X
elif copy:
# just return X_magic
5 changes: 3 additions & 2 deletions scanpy/external/pp/_mnn_correct.py
@@ -5,6 +5,7 @@
from anndata import AnnData

from ..._settings import settings
from ..._compat import Literal


def mnn_correct(
@@ -22,7 +23,7 @@ def mnn_correct(
var_adj: bool = True,
compute_angle: bool = False,
mnn_order: Optional[Sequence[int]] = None,
svd_mode: str = 'rsvd',
svd_mode: Literal['svd', 'rsvd', 'irlb'] = 'rsvd',
do_concatenate: bool = True,
save_raw: bool = False,
n_jobs: Optional[int] = None,
@@ -93,7 +94,7 @@
mnn_order
The order in which batches are to be corrected. When set to None, datasets
are corrected sequentially.
svd_mode : {`'svd'`, `'rsvd'`, `'irlb'`}
svd_mode
`'svd'` computes SVD using a non-randomized SVD-via-ID algorithm,
while `'rsvd'` uses a randomized version. `'irlb'` performs
truncated SVD by implicitly restarted Lanczos bidiagonalization
10 changes: 2 additions & 8 deletions scanpy/external/tl/_palantir.py
@@ -48,30 +48,24 @@ def palantir(
-------
`.uns['palantir_norm_data']`
A `data_df` copy of adata if normalized
`pca_results`
PCA projections and explained variance ratio of adata:
- `.uns['palantir_pca_results']['pca_projections']`
- `.uns['palantir_pca_results']['variance_ratio']`
`dm_res`
Diffusion components, corresponding eigen values and diffusion operator:
- `.uns['palantir_diff_maps']['EigenVectors']`
- `.uns['palantir_diff_maps']['EigenValues']`
- `.uns['palantir_diff_maps']['T']`
`.uns['palantir_ms_data']`
The `ms_data` - Multi scale data matrix
The `ms_data` – Multi scale data matrix
`.uns['palantir_tsne']` : `tsne`
tSNE on diffusion maps
`.uns['palantir_imp_df']` : `imp_df`
Imputed data matrix (MAGIC imputation)
Example
-------
>>> import scanpy.external as sce
>>> import scanpy as sc
Expand All @@ -90,7 +84,7 @@ def palantir(
>>> d = sce.tl.palantir(adata)
At this point, a new class object, `d`, will be instantiated. If the data
needs pre-processing - filtering low genes/cells counts, or normalization,
needs pre-processing – filtering low genes/cells counts, or normalization,
or log transformation, set the `filter_low`, `normalize`, or `log_transform`
to `True`:
5 changes: 3 additions & 2 deletions scanpy/external/tl/_phate.py
@@ -7,6 +7,7 @@
from numpy.random.mtrand import RandomState

from ..._settings import settings
from ..._compat import Literal
from ... import logging as logg


@@ -21,7 +22,7 @@ def phate(
n_pca: int = 100,
knn_dist: str = 'euclidean',
mds_dist: str = 'euclidean',
mds: str = 'metric',
mds: Literal['classic', 'metric', 'nonmetric'] = 'metric',
n_jobs: Optional[int] = None,
random_state: Optional[Union[int, RandomState]] = None,
verbose: Union[bool, int, None] = None,
@@ -76,7 +77,7 @@
recommended values: 'euclidean' and 'cosine'
Any metric from `scipy.spatial.distance` can be used
distance metric for MDS
mds : {`'classic'`, `'metric'`, `'nonmetric'`}
mds
Selects which MDS algorithm is used for dimensionality reduction.
n_jobs
The number of jobs to use for the computation.
10 changes: 6 additions & 4 deletions scanpy/external/tl/_phenograph.py
@@ -7,6 +7,8 @@
from anndata import AnnData
from scipy.sparse import spmatrix

from ...neighbors import _Metric
from ..._compat import Literal
from ... import logging as logg


@@ -18,11 +20,11 @@ def phenograph(
prune: bool = False,
min_cluster_size: int = 10,
jaccard: bool = True,
primary_metric: str = 'euclidean',
primary_metric: _Metric = 'euclidean',
n_jobs: int = -1,
q_tol: float = 1e-3,
louvain_time_limit: int = 2000,
nn_method: str = 'kdtree',
nn_method: Literal['kdtree', 'brute'] = 'kdtree',
) -> Tuple[np.ndarray, spmatrix, float]:
"""\
PhenoGraph clustering [Levine15]_.
@@ -48,7 +50,7 @@
jaccard
If `True`, use Jaccard metric between k-neighborhoods to build graph.
If `False`, use a Gaussian kernel.
primary_metric : {`'euclidean'`, `'manhattan'`, `'correlation'`, `'cosine'`}
primary_metric
Distance metric to define nearest neighbors.
Note that performance will be slower for correlation and cosine.
n_jobs
@@ -59,7 +61,7 @@
louvain_time_limit
Maximum number of seconds to run modularity optimization.
If exceeded the best result so far is returned.
nn_method : {`'kdtree'`, `'brute'`}
nn_method
Whether to use brute force or kdtree for nearest neighbor search.
For very large high-dimensional data sets, brute force
(with parallel computation) performs faster than kdtree.