Commit f539870: Use typing.Literal (scverse#878)

flying-sheep authored Oct 18, 2019
1 parent ffb0685 commit f539870
Showing 34 changed files with 390 additions and 306 deletions.
52 changes: 30 additions & 22 deletions CONTRIBUTING.md
@@ -36,31 +36,31 @@ We use the numpydoc style for writing docstrings.
Look at [`sc.tl.louvain`][] as an example for everything mentioned here:

The `Params` abbreviation is a legit replacement for `Parameters`.
There are two ways of documenting parameter types:

1. In most cases, the type annotations you add to function parameters should be enough.
Use the [`typing`](https://docs.python.org/3/library/typing.html) module for containers,
e.g. `Sequence`s (like `list`), `Iterable`s (like `set`), and `Mapping`s (like `dict`).
Always specify what these contain, e.g. `{'a': (1, 2)}` → `Mapping[str, Tuple[int, int]]`.
If you can’t use one of those, use a concrete class like `AnnData`.
2. If your parameter only accepts an enumeration of strings, specify them like so:
``{`'elem-1'`, `'elem-2'`}``. These contain `a`–`z`, `0`-`9`, and sometimes `.`, `_` or `-`.


To document parameter types use type annotations on function parameters.
Use the [`typing`][] module for containers, e.g. `Sequence`s (like `list`),
`Iterable`s (like `set`), and `Mapping`s (like `dict`). Always specify
what these contain, e.g. `{'a': (1, 2)}``Mapping[str, Tuple[int, int]]`.
If you can’t use one of those, use a concrete class like `AnnData`.
If your parameter only accepts an enumeration of strings, specify them like so:
`Literal['elem-1', 'elem-2']`.
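
For instance, a signature following these rules might look like this
(a minimal sketch: the function and parameter names are invented):

```python
from typing import Mapping, Optional, Sequence, Tuple

from anndata import AnnData

try:  # typing.Literal exists from Python 3.8 on
    from typing import Literal
except ImportError:
    from typing_extensions import Literal


def illustrate(
    adata: AnnData,
    groups: Optional[Sequence[str]] = None,  # container from typing
    ranges: Optional[Mapping[str, Tuple[int, int]]] = None,  # contents specified
    method: Literal['elem-1', 'elem-2'] = 'elem-1',  # enumeration of strings
) -> AnnData:
    """Hypothetical function demonstrating the annotation conventions."""
    ...
```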

The `Returns` section deserves special attention:
There are three types of return sections – prose, tuple, and a mix of both.

1. Prose is for simple cases.
2. Tuple return sections are formatted like parameters.
Unlike in numpydoc, each tuple is first characterized by the identifier name
and *not* by its type. You can provide the type annotation in the function header
or by separating it with a colon, as in parameters.
Unlike in numpydoc, each tuple is first characterized by the identifier
and *not* by its type. Provide the type annotation in the function header, as sketched below.
3. Mix of prose and tuple is relevant in complicated cases,
e.g. when you want to describe that you *added something as annotation to an `AnnData` object*.
e.g. when you want to describe that you
*added something as annotation to an `AnnData` object*.

[`sc.tl.louvain`]: https://github.com/theislab/scanpy/blob/a811fee0ef44fcaecbde0cad6336336bce649484/scanpy/tools/_louvain.py#L22-L90
[`typing`]: https://docs.python.org/3/library/typing.html
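
For the tuple case, a hypothetical example putting the types in the header
and the identifiers in the `Returns` section:

```python
from typing import Tuple

import numpy as np


def split_halves(x: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """\
    Returns
    -------
    first_half
        The first half of ``x``.
    second_half
        The second half of ``x``.
    """
    mid = len(x) // 2
    return x[:mid], x[mid:]
```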

#### Examples
For simple cases, use prose as in [`pp.normalize_total`](https://scanpy.readthedocs.io/en/latest/api/scanpy.pp.normalize_total.html)
For simple cases, use prose as in [`pp.normalize_total`][].

```rst
Returns
@@ -70,9 +70,9 @@ or updates ``adata`` with normalized versions of the original
``adata.X`` and ``adata.layers``, depending on ``inplace``.
```

You can use the standard numpydoc way of populating it, e.g. as in
[`pp.calculate_qc_metrics`](https://scanpy.readthedocs.io/en/latest/api/scanpy.pp.calculate_qc_metrics.html).
If you just use a plain type name here, there will be an automatically created link.
You can use the standard numpydoc way of populating it,
e.g. as in [`pp.calculate_qc_metrics`][].
If you use a plain type name here, a link will be created.

```rst
Returns
@@ -84,7 +84,7 @@ second_identifier : another.module.and_type
```

Many functions also just modify parts of the passed AnnData object,
like e.g. [`tl.dpt`](https://scanpy.readthedocs.io/en/latest/api/scanpy.tl.dpt.html).
like e.g. [`tl.dpt`][].
You can then combine prose and lists to best describe what happens.

```rst
@@ -103,10 +103,18 @@ dpt_groups : :class:`pandas.Series` (``adata.obs``, dtype ``category``)
'progenitor cells', 'undecided cells' or 'branches' of a process.
```

[`pp.normalize_total`]: https://scanpy.readthedocs.io/en/latest/api/scanpy.pp.normalize_total.html
[`pp.calculate_qc_metrics`]: https://scanpy.readthedocs.io/en/latest/api/scanpy.pp.calculate_qc_metrics.html
[`tl.dpt`]: https://scanpy.readthedocs.io/en/latest/api/scanpy.tl.dpt.html

### Performance

We defer loading a few modules until they’re first needed.
If you want realistic performance measures, be sure to import them before running scanpy functions:
If you want realistic performance measures,
be sure to import them before running scanpy functions:

- Check the list in `test_deferred_imports()` from [`scanpy.tests.test_performance`][]
- Everything in [`scanpy.external`][] wraps a 3rd party import.

- Check the list in `test_deferred_imports()` from [`scanpy/tests/test_performance.py`](https://github.com/theislab/scanpy/blob/master/scanpy/tests/test_performance.py)
- Everything in [`scanpy.external`](https://scanpy.readthedocs.io/en/stable/external/) wraps a 3rd party import.
[`scanpy.tests.test_performance`]: https://github.com/theislab/scanpy/blob/master/scanpy/tests/test_performance.py
[`scanpy.external`]: https://scanpy.readthedocs.io/en/stable/external/
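
A minimal timing sketch under these caveats (which modules are deferred is
an assumption here; `test_deferred_imports()` has the authoritative list):

```python
import time

import scanpy as sc

# Pre-import a lazily loaded third-party module so its import time is not
# billed to the first scanpy call that needs it (assumed example module).
import statsmodels.api  # noqa: F401

adata = sc.datasets.pbmc68k_reduced()

start = time.perf_counter()
sc.pp.neighbors(adata)
print(f'sc.pp.neighbors: {time.perf_counter() - start:.2f}s')
```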
2 changes: 1 addition & 1 deletion docs/requirements.txt
@@ -4,7 +4,7 @@ sphinx_rtd_theme>=0.3.1
# Sphinx 2 has nicer looking sections
sphinx>=2.0.1
sphinx-autodoc-typehints
scanpydoc>=0.4.1
scanpydoc>=0.4.2
# same as ../requires.txt, but omitting the c++ packages
anndata>=0.6.18
matplotlib>=2.2
15 changes: 15 additions & 0 deletions scanpy/_compat.py
@@ -0,0 +1,15 @@
try:
from typing import Literal
except ImportError:
try:
from typing_extensions import Literal
except ImportError:

class LiteralMeta(type):
def __getitem__(cls, values):
if not isinstance(values, tuple):
values = (values,)
return type('Literal_', (Literal,), dict(params=values))

class Literal(metaclass=LiteralMeta):
pass
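
A quick sketch of how the shim behaves (the `save` function is hypothetical;
only the metaclass fallback records the values on `params`, while the real
`typing.Literal` exposes them via `__args__`):

```python
from scanpy._compat import Literal

Format = Literal['png', 'pdf', 'svg']

# Works with either implementation: fallback classes carry `params`,
# typing.Literal carries `__args__`; neither enforces values at runtime.
print(getattr(Format, 'params', None) or getattr(Format, '__args__', None))


def save(fmt: Format = 'pdf') -> None:
    """Hypothetical consumer of the alias."""
```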
14 changes: 11 additions & 3 deletions scanpy/_settings.py
@@ -10,6 +10,7 @@

from . import logging
from .logging import _set_log_level, _set_log_file, _RootLogger
from ._compat import Literal

_VERBOSITY_TO_LOGLEVEL = {
'error': 'ERROR',
@@ -378,6 +379,13 @@ def categories_to_ignore(self, categories_to_ignore: Iterable[str]):
# Functions
# --------------------------------------------------------------------------------

# Collected from the print_* functions in matplotlib.backends
_Format = Literal[
'png', 'jpg', 'tif', 'tiff',
'pdf', 'ps', 'eps', 'svg', 'svgz', 'pgf',
'raw', 'rgba',
]

def set_figure_params(
self,
scanpy: bool = True,
@@ -387,7 +395,7 @@ def set_figure_params(
vector_friendly: bool = True,
fontsize: int = 14,
color_map: Optional[str] = None,
format: Union[str, Iterable[str]] = "pdf",
format: _Format = "pdf",
transparent: bool = False,
ipython_format: str = "png2x",
):
@@ -399,7 +407,7 @@
scanpy
Init default values for :obj:`matplotlib.rcParams` suited for Scanpy.
dpi
Resolution of rendered figures - this influences the size of figures in notebooks.
Resolution of rendered figures – this influences the size of figures in notebooks.
dpi_save
Resolution of saved figures. This should typically be higher to achieve
publication quality.
@@ -411,7 +419,7 @@
Set the fontsize for several `rcParams` entries. Ignored if `scanpy=False`.
color_map
Convenience method for setting the default color map. Ignored if `scanpy=False`.
format: {`'png'`, `'pdf'`, `'svg'`, etc.}, optional (default: `'pdf'`)
format
This sets the default format for saving figures: `file_format_figs`.
transparent
Save figures with transparent background. Sets
5 changes: 3 additions & 2 deletions scanpy/_utils.py
@@ -16,6 +16,7 @@
from textwrap import dedent

from ._settings import settings
from ._compat import Literal
from . import logging as logg

EPS = 1e-15
@@ -204,7 +205,7 @@ def compute_association_matrix_of_groups(
adata: AnnData,
prediction: str,
reference: str,
normalization: str = 'prediction',
normalization: Literal['prediction', 'reference'] = 'prediction',
threshold: float = 0.01,
max_n_names: Optional[int] = 2,
):
@@ -219,7 +220,7 @@
Field name of adata.obs.
reference
Field name of adata.obs.
normalization: {`'prediction'`, `'reference'`}
normalization
Whether to normalize with respect to the predicted groups or the
reference groups.
threshold
2 changes: 1 addition & 1 deletion scanpy/api/__init__.py
@@ -32,7 +32,7 @@
# it would be nice to make the simple data types "properties of the
# module"... putting setters and getters for all of them wouldn't be very nice
from .._settings import settings
# for now - or maybe as the permanently favored solution - put the single function here
# for now – or maybe as the permanently favored solution – put the single function here
# from ..settings import set_figure_params
set_figure_params = settings.set_figure_params

34 changes: 19 additions & 15 deletions scanpy/external/pp/_dca.py
@@ -115,24 +115,28 @@ def dca(
-------
If `copy` is true and `return_model` is false, an AnnData object is returned.
In "denoise" mode, `adata.X` is overwritten with the denoised values. In "latent" mode, latent\
low dimensional representation of cells are stored in `adata.obsm['X_dca']` and `adata.X`\
is not modified. Note that these values are not corrected for library size effects.
If `return_info` is true, all estimated distribution parameters are stored in AnnData such as:
- `.obsm["X_dca_dropout"]` which is the mixture coefficient (pi) of the zero component\
in ZINB, i.e. dropout probability (only if `ae_type` is `zinb` or `zinb-conddisp`).
- `.obsm["X_dca_dispersion"]` which is the dispersion parameter of NB.
- `.uns["dca_loss_history"]` which stores the loss history of the training. See `.history`\
attribute of Keras History class for more details.
In "denoise" mode, `adata.X` is overwritten with the denoised values.
In "latent" mode, latent low dimensional representation of cells are stored
in `adata.obsm['X_dca']` and `adata.X` is not modified.
Note that these values are not corrected for library size effects.
If `return_info` is true, all estimated distribution parameters are stored
in AnnData like this:
`.obsm["X_dca_dropout"]`
The mixture coefficient (pi) of the zero component in ZINB,
i.e. dropout probability (if `ae_type` is `zinb` or `zinb-conddisp`).
`.obsm["X_dca_dispersion"]`
The dispersion parameter of NB.
`.uns["dca_loss_history"]`
The loss history of the training.
See `.history` attribute of Keras History class for more details.
Finally, the raw counts are stored in `.raw` attribute of AnnData object.
If `return_model` is given, trained model is returned. When both `copy` and `return_model`\
are true, a tuple of anndata and model is returned in that order.
If `return_model` is given, the trained model is returned.
When both `copy` and `return_model` are true,
a tuple of anndata and model is returned in that order.
"""

try:
2 changes: 1 addition & 1 deletion scanpy/external/pp/_magic.py
@@ -153,7 +153,7 @@ def magic(
)
# update AnnData instance
if name_list == "pca_only":
# special case - update adata.obsm with smoothed values
# special case – update adata.obsm with smoothed values
adata.obsm["X_magic"] = X_magic.X
elif copy:
# just return X_magic
5 changes: 3 additions & 2 deletions scanpy/external/pp/_mnn_correct.py
@@ -5,6 +5,7 @@
from anndata import AnnData

from ..._settings import settings
from ..._compat import Literal


def mnn_correct(
@@ -22,7 +23,7 @@ def mnn_correct(
var_adj: bool = True,
compute_angle: bool = False,
mnn_order: Optional[Sequence[int]] = None,
svd_mode: str = 'rsvd',
svd_mode: Literal['svd', 'rsvd', 'irlb'] = 'rsvd',
do_concatenate: bool = True,
save_raw: bool = False,
n_jobs: Optional[int] = None,
@@ -93,7 +94,7 @@
mnn_order
The order in which batches are to be corrected. When set to None, datasets
are corrected sequentially.
svd_mode : {`'svd'`, `'rsvd'`, `'irlb'`}
svd_mode
`'svd'` computes SVD using a non-randomized SVD-via-ID algorithm,
while `'rsvd'` uses a randomized version. `'irlb'` performs
truncated SVD by implicitly restarted Lanczos bidiagonalization
10 changes: 2 additions & 8 deletions scanpy/external/tl/_palantir.py
@@ -48,30 +48,24 @@ def palantir(
-------
`.uns['palantir_norm_data']`
A `data_df` copy of adata if normalized
`pca_results`
PCA projections and explained variance ratio of adata:
- `.uns['palantir_pca_results']['pca_projections']`
- `.uns['palantir_pca_results']['variance_ratio']`
`dm_res`
Diffusion components, corresponding eigen values and diffusion operator:
- `.uns['palantir_diff_maps']['EigenVectors']`
- `.uns['palantir_diff_maps']['EigenValues']`
- `.uns['palantir_diff_maps']['T']`
`.uns['palantir_ms_data']`
The `ms_data` - Multi scale data matrix
The `ms_data` – Multi scale data matrix
`.uns['palantir_tsne']` : `tsne`
tSNE on diffusion maps
`.uns['palantir_imp_df']` : `imp_df`
Imputed data matrix (MAGIC imputation)
Example
-------
>>> import scanpy.external as sce
>>> import scanpy as sc
Expand All @@ -90,7 +84,7 @@ def palantir(
>>> d = sce.tl.palantir(adata)
At this point, a new class object, `d`, will be instantiated. If the data
needs pre-processing - filtering low genes/cells counts, or normalization,
needs pre-processing – filtering low genes/cells counts, or normalization,
or log transformation, set the `filter_low`, `normalize`, or `log_transform`
to `True`:
5 changes: 3 additions & 2 deletions scanpy/external/tl/_phate.py
@@ -7,6 +7,7 @@
from numpy.random.mtrand import RandomState

from ..._settings import settings
from ..._compat import Literal
from ... import logging as logg


@@ -21,7 +22,7 @@ def phate(
n_pca: int = 100,
knn_dist: str = 'euclidean',
mds_dist: str = 'euclidean',
mds: str = 'metric',
mds: Literal['classic', 'metric', 'nonmetric'] = 'metric',
n_jobs: Optional[int] = None,
random_state: Optional[Union[int, RandomState]] = None,
verbose: Union[bool, int, None] = None,
@@ -76,7 +77,7 @@
recommended values: 'euclidean' and 'cosine'
Any metric from `scipy.spatial.distance` can be used
distance metric for MDS
mds : {`'classic'`, `'metric'`, `'nonmetric'`}
mds
Selects which MDS algorithm is used for dimensionality reduction.
n_jobs
The number of jobs to use for the computation.
10 changes: 6 additions & 4 deletions scanpy/external/tl/_phenograph.py
@@ -7,6 +7,8 @@
from anndata import AnnData
from scipy.sparse import spmatrix

from ...neighbors import _Metric
from ..._compat import Literal
from ... import logging as logg


@@ -18,11 +20,11 @@ def phenograph(
prune: bool = False,
min_cluster_size: int = 10,
jaccard: bool = True,
primary_metric: str = 'euclidean',
primary_metric: _Metric = 'euclidean',
n_jobs: int = -1,
q_tol: float = 1e-3,
louvain_time_limit: int = 2000,
nn_method: str = 'kdtree',
nn_method: Literal['kdtree', 'brute'] = 'kdtree',
) -> Tuple[np.ndarray, spmatrix, float]:
"""\
PhenoGraph clustering [Levine15]_.
@@ -48,7 +50,7 @@
jaccard
If `True`, use Jaccard metric between k-neighborhoods to build graph.
If `False`, use a Gaussian kernel.
primary_metric : {`'euclidean'`, `'manhattan'`, `'correlation'`, `'cosine'`}
primary_metric
Distance metric to define nearest neighbors.
Note that performance will be slower for correlation and cosine.
n_jobs
@@ -59,7 +61,7 @@
louvain_time_limit
Maximum number of seconds to run modularity optimization.
If exceeded the best result so far is returned.
nn_method : {`'kdtree'`, `'brute'`}
nn_method
Whether to use brute force or kdtree for nearest neighbor search.
For very large high-dimensional data sets, brute force
(with parallel computation) performs faster than kdtree.