This document explains how to set up a development environment for contributing to cleanlab.
While this is not required, we recommend that you develop and test in a virtual environment. Several tools can create one, including virtualenv, pipenv, and venv; you can compare them and choose what is right for you. Here, we'll explain how to get set up with venv, which is built into Python 3.
$ python3 -m venv ./ENV # create a new virtual environment in the directory ENV
$ source ./ENV/bin/activate # switch to using the virtual environment
You only need to create the virtual environment once, but you will need to
activate it every time you start a new shell. Once the virtual environment is
activated, the `pip install` commands below will install dependencies into the
virtual environment rather than your system Python installation.
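If you are unsure whether the virtual environment is active in the current shell, a quick check from Python can help. The sketch below relies on the fact that venv sets `sys.prefix` to the environment directory while `sys.base_prefix` keeps pointing at the base installation. The helper name `in_virtual_environment` is hypothetical and not part of cleanlab, and environments created by other tools may behave differently.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix is the environment directory, whereas
    # sys.base_prefix still refers to the Python installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; pip would target the base Python.")
```

Running this with the same `python` that your shell resolves shows where `pip install` will place packages.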
Run the following commands in the repository's root directory.
- Install development requirements with
  $ pip install -r requirements-dev.txt
- Install cleanlab as an editable package with
  $ pip install -e .
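After an editable install, imports of cleanlab should resolve to your checkout rather than to a copy inside site-packages. The following is a minimal sketch for confirming that, using only the standard library; the helper name `package_location` is hypothetical and not part of cleanlab.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_location(name: str) -> Optional[Path]:
    """Return the directory a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    # find_spec returns None for packages that are not installed; namespace
    # packages have no single origin file, so they are reported as None too.
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve().parent


if __name__ == "__main__":
    # With an editable install, this is expected to print a path inside your clone.
    print(package_location("cleanlab"))
```

If the printed path is not inside your clone, the editable install was probably made in a different environment from the one you are running.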
Run all the tests:
$ pytest
Run a specific file or test:
$ pytest -k <filename or filter expression>
Run with verbose output:
$ pytest --verbose
Run with code coverage:
$ pytest --cov=cleanlab/ --cov-config .coveragerc --cov-report=html
The coverage report will be available in coverage_html_report/index.html, which you can open with your web browser.
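To make the `-k` filter above more concrete, here is a small illustrative test module. The file name `test_example.py`, the `normalize` helper, and the test names are all hypothetical and do not exist in the cleanlab repository.

```python
# Hypothetical contents of test_example.py (for illustration only).


def normalize(values):
    """Scale a list of non-negative numbers so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def test_normalize_sums_to_one():
    # Floating-point sums are compared with a small tolerance.
    assert abs(sum(normalize([1, 2, 3])) - 1.0) < 1e-12


def test_normalize_preserves_order():
    result = normalize([1, 3])
    assert result[0] < result[1]


def test_single_value():
    assert normalize([5]) == [1.0]
```

Because `-k` matches its expression against test names, `pytest -k normalize` would be expected to select the first two tests and skip `test_single_value`. An expression such as `pytest -k "normalize and not order"` should narrow the selection to `test_normalize_sums_to_one`.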
After changing cleanlab, you can check that the examples still work by manually testing them.
You can build the docs from your local cleanlab version by following these instructions.
cleanlab follows the Black code style. This is enforced by CI, so please format your code before submitting a pull request.
This repo uses the pre-commit framework to easily set up code style checks that run automatically whenever you make a commit. You can install the git hook scripts with:
$ pre-commit install
This repo uses EditorConfig to keep code style consistent across editors and IDEs. You can install a plugin for your editor, and then your editor will automatically ensure that indentation and line endings match the project style.
cleanlab uses NumPy style docstrings (example).
Aspects that are not covered in the NumPy style or that are different from the NumPy style are documented below:
- Referring to the cleanlab package: we refer to cleanlab without any special formatting, so no `cleanlab`, just cleanlab.
- Cross-referencing: when mentioning functions/classes/methods, always cross-reference them to create a clickable link. Cross-referencing code from Jupyter notebooks is not currently supported.
- Variable, module, function, and class names: when not cross-references, should be written between single back-ticks, like `pred_probs`. Such names in Jupyter notebooks (Markdown) can be written between single back-ticks as well.
- Math: We support LaTeX math with the inline :math:`x+y` or the block:

      .. math::

         \sum_{0}^{n} 2n+1

- Pseudocode vs math: prefer pseudocode in double backticks over LaTeX math.
- Bold vs italics: Use italics when defining a term, and use bold sparingly for extra emphasis.
- Shapes: Do not include shapes in the type of parameters, instead use `np.array` or `array_like` as the type and specify allowed shapes in the description. See, for example, the documentation for cleanlab.classification.CleanLearning.fit().
- Optional arguments: for the most part, just put `, optional` in the type.
- Type unions: if a parameter or return type is something like "a numpy array or None", you can use "or" to separate types, e.g. `np.array or None`, and it'll be parsed correctly.
- Parameterized types: Use standard Python type hints for referring to parameters and parameterized types in docs, e.g. `Iterable[int]` or `list[float]`.
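The conventions above can be seen together in a single docstring. The function below is a hypothetical helper written only for illustration. It is not part of cleanlab, and it is implemented with plain Python sequences so that it runs without extra dependencies. It keeps shapes in the parameter descriptions, marks the optional argument with `, optional`, writes a type union with "or", and uses standard type hints for parameterized types.

```python
from typing import Iterable, List, Optional, Sequence


def mean_confidence(
    pred_probs: Sequence[Sequence[float]],
    class_indices: Optional[Iterable[int]] = None,
) -> List[float]:
    """Compute the mean predicted probability over selected classes.

    Hypothetical helper used only to illustrate the docstring conventions.

    Parameters
    ----------
    pred_probs : array_like
        Predicted class probabilities of shape ``(N, K)``, where ``N`` is the
        number of examples and ``K`` is the number of classes.
    class_indices : Iterable[int] or None, optional
        Columns of `pred_probs` to average over. If ``None``, all ``K``
        classes are used.

    Returns
    -------
    list[float]
        One mean probability per example, so the result has length ``N``.
    """
    # Materialize the indices once so that a generator can be reused per row.
    columns = None if class_indices is None else list(class_indices)
    means = []
    for row in pred_probs:
        selected = list(row) if columns is None else [row[k] for k in columns]
        means.append(sum(selected) / len(selected))
    return means
```

For example, `mean_confidence([[0.2, 0.8]], class_indices=[1])` averages over the second column only. Note that the shape ``(N, K)`` appears in the description rather than in the type, and that `pred_probs` is written between single back-ticks because it is a variable name, not a cross-reference.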