Skip to content

Commit

Permalink
support using @DataClass for engine params (pantsbuild#8181)
Browse files Browse the repository at this point in the history
### Problem

A first step to fixing pantsbuild#7074. We would like to be able to make use of python 3 `@dataclass`es, partially because they are the canonical way to make typed tuples in python now, and partially because it allows us to use mypy to resolve type errors at "compile" time instead of performing type checks at runtime when the tuple is constructed, which is what we do now, e.g.: https://github.com/pantsbuild/pants/blob/b4e316f4badc2703cea566267886d8b094d0f569/src/python/pants/util/objects.py#L120-L125

### Solution

- Edit the engine `README.md` to demonstrate how to use `@dataclass` and remove mentions of the old `datatype()`.
- Add an integration test to verify that mypy will correctly detect type errors when constructing `@dataclass` objects.

### Result

Users are able to use a very canonical `@dataclass` declaration syntax for engine params which is statically type-checked with mypy!
  • Loading branch information
cosmicexplorer authored Oct 20, 2019
1 parent b720e69 commit 00c4f8c
Show file tree
Hide file tree
Showing 14 changed files with 187 additions and 64 deletions.
1 change: 1 addition & 0 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ env:
- AWS_ACCESS_KEY_ID__TO_BE_REEXPORTED_ON_DEPLOYS=AKIAV6A6G7RQWPRUWIXR
- secure: hFVAQGLVkexzTd3f9NF+JoG1dE+CPICKqOcdvQYv8+YB2rwwqu0/J6MnqKUZSmec4AM4ZvyPUBIHnSw8aMJysYs+GZ6iG/8ZRRmdbmo2WBPbSZ+ThRZxx/F6AjmovUmf8Zt366ZAZXpc9NHKREkTUGl6UL7FFe9+ouVnb90asdw=
- RUST_BACKTRACE="all"
- PYTHONWARNINGS=ignore::DeprecationWarning
matrix:
include:
- addons:
Expand Down
1 change: 1 addition & 0 deletions build-support/bin/generate_travis_yml.py
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,7 @@ def all_entries(cls) -> List[Dict[str, str]]:
# https://docs.travis-ci.com/user/environment-variables#defining-encrypted-variables-in-travisyml.
{"secure": "hFVAQGLVkexzTd3f9NF+JoG1dE+CPICKqOcdvQYv8+YB2rwwqu0/J6MnqKUZSmec4AM4ZvyPUBIHnSw8aMJysYs+GZ6iG/8ZRRmdbmo2WBPbSZ+ThRZxx/F6AjmovUmf8Zt366ZAZXpc9NHKREkTUGl6UL7FFe9+ouVnb90asdw="},
'RUST_BACKTRACE="all"',
'PYTHONWARNINGS=ignore::DeprecationWarning',
]

# ----------------------------------------------------------------------
Expand Down
6 changes: 4 additions & 2 deletions src/python/pants/base/deprecated.py
Original file line number Diff line number Diff line change
Expand Up @@ -234,7 +234,8 @@ def deprecated(
removal_version: str,
hint_message: Optional[str] = None,
subject: Optional[str] = None,
ensure_stderr: bool = False
ensure_stderr: bool = False,
stacklevel: int = 3
):
"""Marks a function or method as deprecated.
Expand Down Expand Up @@ -268,7 +269,8 @@ def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
warn_or_error(removal_version, subject or func_full_name, hint_message,
ensure_stderr=ensure_stderr)
ensure_stderr=ensure_stderr,
stacklevel=stacklevel)
return func(*args, **kwargs)
return wrapper
return decorator
Expand Down
5 changes: 3 additions & 2 deletions src/python/pants/bin/pants_loader.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,8 @@ def setup_warnings():
#
# However, we do turn off deprecation warnings for libraries that Pants uses for which we do
# not have a fixed upstream version, typically because the library is no longer maintained.
warnings.simplefilter('default', category=DeprecationWarning)
if not os.environ.get('PYTHONWARNINGS'):
warnings.simplefilter('default', category=DeprecationWarning)
# TODO: Eric-Arellano has emailed the author to see if he is willing to accept a PR fixing the
# deprecation warnings and to release the fix. If he says yes, remove this once fixed.
warnings.filterwarnings('ignore', category=DeprecationWarning, module="ansicolors")
Expand All @@ -47,7 +48,7 @@ def ensure_locale(cls):
Fix it by setting the LC_* and LANG environment settings. Example:
LC_ALL=en_US.UTF-8
LANG=en_US.UTF-8
Or, bypass it by setting the below environment variable.
Or, bypass it by setting the below environment variable.
{}=1
Note: we cannot guarantee consistent behavior with this bypass enabled.
""".format(encoding, cls.ENCODING_IGNORE_ENV_VAR)
Expand Down
1 change: 1 addition & 0 deletions src/python/pants/engine/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -164,6 +164,7 @@ python_library(
sources=['rules.py'],
dependencies=[
'3rdparty/python:asttokens',
'3rdparty/python:dataclasses',
'3rdparty/python/twitter/commons:twitter.common.collections',
':goal',
':selectors',
Expand Down
88 changes: 54 additions & 34 deletions src/python/pants/engine/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ produce the requested value.
### Products and Params

The engine executes your `@rule`s in order to (recursively) compute a `Product` of the requested
type for a set of `Param`s. This recursive type search leads to a loosely coupled (and yet still
type for a set of `Param`s. This recursive type search leads to a loosely coupled (and yet still
statically checked) form of dependency injection.

When an `@rule` runs, it requires a set of `Param`s that the engine has determined are needed to
Expand Down Expand Up @@ -76,52 +76,72 @@ In cases where this search detects any ambiguity (generally because there are tw
that can provide the same product with the same number of parameters), rule graph compilation will
fail with a useful error message.

### Datatypes
### How to Declare Engine `Param` Types

In practical use, builtin types like `str` or `int` do not provide enough information to
disambiguate between various types of data in `@rule` signatures, so declaring small `datatype`
definitions to provide a unique and descriptive type is highly recommended:
disambiguate between various types of data in `@rule` signatures, so declaring small, unique classes
to encapsulate different states can help to make `@rule` sets with complex control flow between rules
more declarative and self-documenting.

#### Requirements for an Engine `Param`

To use an instance of some class `C` as a `Param` to the engine:
1. instances of `C` must be immutable,
2. `__hash__` and `__eq__` must be implemented such that when constructing two separate instances of `C` with the same argument list `...`, `C(...) == C(...)` and `hash(C(...)) == hash(C(...))`.
- This can be ignored for singleton `Param`s.

#### Benefits of using `@dataclass` for a `Param`
[Python 3 `@dataclass`es](https://docs.python.org/3/library/dataclasses.html) satisfy the above requirements for engine `Param`s. Using `@dataclass` to declare engine `Param` types also provides additional benefits compared to alternatives:
1. a compact and high-performance representation which can be stably shared across FFI boundaries,
2. static type checking via [the `dataclasses` mypy plugin](https://github.com/python/mypy/blob/master/mypy/plugins/dataclasses.py),
3. a concise, standard, and Pythonic way to declare classes.


#### Example Usage of `@dataclass` for Engine `Param`s

*Note that the [3rdparty `dataclasses` library](https://github.com/ericvsmith/dataclasses) must be in your BUILD file and `import`ed if you're running Pants with Python 3.6!*

```python
class FormattedInt(datatype(['content'])): pass
from dataclasses import dataclass

@rule
def int_to_str(value: int) -> FormattedInt:
return FormattedInt('{}'.format(value))
from pants.util.meta import frozen_after_init

# Field values can be specified with positional and/or keyword arguments in the constructor:
x = FormattedInt('a string')
x = FormattedInt(content='a string')
# Pants requires that engine Params have a stable hash. This can be accomplished with the
# `frozen=True` argument set in the `@dataclass` decorator.
@dataclass(frozen=True)
class FormattedInt:
content: str

# Field values can be accessed after construction by name or index:
print(x.content) # 'a string'
print(x[0]) # 'a string'
# `@frozen_after_init` can also be used with `unsafe_hash=True` to create engine Params which can
# modify their fields within the `__init__()` method:
@frozen_after_init
@dataclass(unsafe_hash=True)
class ValidatedInt:
x: int

# datatype objects can be easily inspected:
print(x) # 'FormattedInt(content=a string)'
```
def __init__(self, x):
self.x = x

#### Types of Fields
@rule
def int_to_str(value: ValidatedInt) -> FormattedInt:
return FormattedInt('{}'.format(value.x))

`datatype()` accepts a list of *field declarations*, and returns a type which can be subclassed. A
*field declaration* can just be a string (e.g. `'field_name'`), which is then used as the field
name, as with `FormattedInt` above. A field can also be declared with a tuple of two elements: the
field name string, and a `TypeConstraint` for the field (e.g. `('field_name',
Exactly(FieldType))`). The bare type name (e.g. `FieldType`) can also be used as a shorthand for
`Exactly(FieldType)`. If the tuple form is used, the constructor will create your object, then raise
an error if the field value does not satisfy the type constraint.
# Instances can be constructed and accessed the same way as for normal `@dataclass`es.
x = FormattedInt('a string')
assert x == FormattedInt(content='a string')
print(x.content) # 'a string'

``` python
class TypedDatatype(datatype([('field_name', Exactly(str, int))])):
# `@dataclass` objects can be easily inspected:
print(x) # 'FormattedInt(content="a string")'

# All mypy types, including parameterized types, may be used in field definitions.
@dataclass(frozen=True)
class TypedDatatype:
"""Example of a datatype with a more complex field type constraint."""
```
field_name: Union[str, int]

Assigning a specific type to a field can be somewhat unidiomatic in Python, and may be unexpected or
unnatural to use. However, regardless of whether the object is created directly with type-checked
fields or whether it's produced from a set of rules by the engine's dependency injection, it is
extremely useful to formalize the assumptions made about the value of an object into a specific
type, even if the type just wraps a single field. The `datatype()` function makes it simple and
efficient to apply that strategy.
print(TypedDatatype("huh")) # 'TypedDatatype(field_name=huh)'
```

### Gets and RootRules

Expand Down
37 changes: 29 additions & 8 deletions src/python/pants/engine/rules.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
# Licensed under the Apache License, Version 2.0 (see LICENSE).

import ast
import dataclasses
import inspect
import itertools
import logging
Expand All @@ -12,7 +13,7 @@
from collections.abc import Iterable
from dataclasses import dataclass
from textwrap import dedent
from typing import Any, Callable, Dict, Optional, Tuple, Type, cast
from typing import Any, Callable, Dict, Optional, Set, Tuple, Type, cast

import asttokens
from twitter.common.collections import OrderedSet
Expand Down Expand Up @@ -465,6 +466,19 @@ def dependency_optionables(self):
"""A tuple of Optionable classes that are known to be necessary to run this rule."""
return ()

@classmethod
def _validate_type_field(cls, type_obj, description):
if not isinstance(type_obj, type):
raise TypeError(f'{description} provided to @rules must be types! Was: {type_obj}.')
if dataclasses.is_dataclass(type_obj):
if not (type_obj.__dataclass_params__.frozen or
getattr(type_obj, frozen_after_init.sentinel_attr, False)):
raise TypeError(
f'{description} {type_obj} is a dataclass declared without `frozen=True`, or without '
'both `unsafe_hash=True` and the `@frozen_after_init` decorator! '
'The engine requires that fields in params are immutable for stable hashing!')
return type_obj


@frozen_after_init
@dataclass(unsafe_hash=True)
Expand Down Expand Up @@ -493,8 +507,16 @@ def __init__(
dependency_optionables: Optional[Tuple] = None,
cacheable: bool = True,
):
self._output_type = output_type
self.input_selectors = input_selectors
self._output_type = self._validate_type_field(output_type, '@rule output type')
self.input_selectors = tuple(
self._validate_type_field(t, '@rule input selector')
for t in input_selectors
)
for g in input_gets:
product_type = g.product
subject_type = g.subject_declared_type
self._validate_type_field(product_type, 'Get product type')
self._validate_type_field(subject_type, 'Get subject type')
self.input_gets = input_gets
self.func = func
self._dependency_rules = dependency_rules or ()
Expand Down Expand Up @@ -534,7 +556,7 @@ class RootRule(Rule):
_output_type: Type

def __init__(self, output_type: Type) -> None:
self._output_type = output_type
self._output_type = self._validate_type_field(output_type, 'RootRule declared type')

@property
def output_type(self):
Expand All @@ -549,13 +571,12 @@ def dependency_optionables(self):
return tuple()


# TODO: add typechecking here -- use dicts for `union_rules`.
@dataclass(frozen=True)
class RuleIndex:
"""Holds a normalized index of Rules used to instantiate Nodes."""
rules: Any
roots: Any
union_rules: Any
rules: Dict[Any, Any]
roots: Set[Any]
union_rules: Dict[Any, Any]

@classmethod
def create(cls, rule_entries, union_rules=None):
Expand Down
2 changes: 2 additions & 0 deletions src/python/pants/util/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -98,6 +98,8 @@ python_library(
sources = ['objects.py'],
dependencies = [
'3rdparty/python/twitter/commons:twitter.common.collections',
'src/python/pants/base:deprecated',
':collections',
':strutil',
':memo',
':meta',
Expand Down
4 changes: 4 additions & 0 deletions src/python/pants/util/meta.py
Original file line number Diff line number Diff line change
Expand Up @@ -137,4 +137,8 @@ def new_setattr(self, key: str, value: Any) -> None:

cls.__init__ = new_init # type: ignore
cls.__setattr__ = new_setattr # type: ignore
setattr(cls, frozen_after_init.sentinel_attr, True)

return cls

frozen_after_init.sentinel_attr = '_frozen_after_init'
15 changes: 3 additions & 12 deletions src/python/pants/util/objects.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
from collections import OrderedDict, namedtuple
from collections.abc import Iterable

from pants.base.deprecated import deprecated
from pants.util.memo import memoized_classproperty
from pants.util.meta import classproperty
from pants.util.strutil import pluralize
Expand Down Expand Up @@ -43,12 +44,8 @@ def make_type_error(cls, msg, *args, **kwargs):
"""
return cls.type_check_error_type(cls.__name__, msg, *args, **kwargs)

@abstractmethod
def copy(self, **kwargs):
"""Return a new object of the same type, replacing specified fields with new values"""


# TODO(#7074): Migrate to python 3 dataclasses!
@deprecated('1.24.0.dev2', hint_message='Use @dataclass to declare typed named tuples instead!', stacklevel=5)
def datatype(field_decls, superclass_name=None, **kwargs):
"""A wrapper for `namedtuple` that accounts for the type of the object in equality.
Expand Down Expand Up @@ -401,15 +398,9 @@ def exclude_iterable_constraint(cls):
def __init__(self, constraint):
"""Create a `TypeConstraint` which validates each member of a collection with `constraint`.
:param TypeOnlyConstraint constraint: the `TypeConstraint` to apply to each element. This is
currently required to be a `TypeOnlyConstraint` to avoid
complex prototypal type relationships.
:param TypeConstraint constraint: the `TypeConstraint` to apply to each element.
"""

if not isinstance(constraint, TypeOnlyConstraint):
raise TypeError("constraint for collection must be a {}! was: {}"
.format(TypeOnlyConstraint.__name__, constraint))

description = '{}({})'.format(type(self).__name__, constraint)

self._constraint = constraint
Expand Down
2 changes: 1 addition & 1 deletion tests/python/pants_test/engine/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -154,7 +154,7 @@ python_tests(
'src/python/pants/util:dirutil',
'tests/python/pants_test/engine/examples:mapper_test',
'tests/python/pants_test/engine/examples:parsers',
]
],
)

python_tests(
Expand Down
Loading

0 comments on commit 00c4f8c

Please sign in to comment.