-
Notifications
You must be signed in to change notification settings - Fork 560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More type hints for rdflib.graph
and related
#1853
More type hints for rdflib.graph
and related
#1853
Conversation
16455d3
to
575d4df
Compare
575d4df
to
e8e90fb
Compare
rdflib.graph
and related
@ajnelson-nist please have a look if you can spare some time and feel comfortable doing a review. |
e8e90fb
to
71a1446
Compare
It seems a shame the I'm also surprised so many of the type references need to be quoted strings, but I'll trust this was an effect of some need of the type-checking system. Overall, this looks like good work in the right direction. I'm happier with the distinct type names for the subject, predicate, and object positions. |
b8161a2
to
faf95a9
Compare
I double checked now, and the second overload was actually wrong and I fixed it in faf95a9, overloads are now as follow: Lines 473 to 498 in faf95a9
The third overload is identical to the implementation, and a catch all for cases where it receives an undifferentiated predicate which is needed for the implementations of
They don't really, but I had to move it in and out of
In part I went for this because I could not really decide what the best restrictions are for them. Currently it is: Lines 49 to 58 in 71a1446
However, Thanks for having a look. |
@eggplants any feedback from you would also be appreciated. |
N3 goes well beyond the RDF spec: def test_triple_types():
from rdflib import _typing
from rdflib.graph import QuotedGraph
from rdflib.paths import SequencePath
from rdflib.term import Variable
# In Notation3, predicates can be one of a variety of Path subclasses
g = Graph()
g = g.parse(data='''
@prefix : <ex:> .
:a :p1 :c ; :p2 :f .
:c :p2 :e ; :p3 :g .
:g :p3 :h ; :p2 :j .
:h :p3 :a ; :p2 :g .
:q :px :q .
''', format='n3')
e = Namespace('ex:')
triple = (e.a, e.p1/e.p2, e.e)
assert triple in g
with pytest.raises(AssertionError):
assert type(triple) == _typing._TripleType
assert type(triple[1]) == SequencePath
# In Notation3 subjects and objects can be QuotedGraphs
g = Graph()
g.parse(data="""
@prefix log: <http://www.w3.org/2000/10/swap/log#> .
:bob :hasmother :alice .
:john :hasmother :alice .
@forAll :x, :y, :z .
{ :x :hasmother :z . :y :hasmother :z } log:implies { :x :issibling :y } .
""", format="n3")
triple = list(g.triples(
(None, URIRef("http://www.w3.org/2000/10/swap/log#implies"), None)))[0]
with pytest.raises(AssertionError):
assert type(triple) == _typing._TripleType
assert type(triple[0]) == QuotedGraph
# In QuotedGraphs, subjects and objects can be Variable
triple = list(sorted([t for t in triple[0]]))[0]
with pytest.raises(AssertionError):
assert type(triple) == _typing._TripleType
assert type(triple[0]) == Variable |
@gjhiggins good point, I was actually just noticing this when working on #1701 I think we have to expand the types as follow to accommodate N3: _SubjectType = Union[Identifier, "QuotedGraph"]
_PredicateType = Union[Identifier, "QuotedGraph"]
_ObjectType = Union[Identifier, "QuotedGraph"] |
A bit of a pity that there is no abstract syntax spec for N3's graph model. |
To clarify why: AFAIU, variables can be parts of graphs in N3, and Literals and BNodes can be any of subject, predicate or object. Possibly we could create a helper Protocol that exposes an interface which only allows RDF 1.1 compliant triples but as Graph can contain much more we have to keep the typing broad enough. |
Marking this as draft, will be still be good to get some more input here but I will add some more tests for quotedgraph first and ensure that the typing actually accommodates this. |
LTGM. I can't wait to get type annotations on all of the rdflib! tips: I recommend |
There sort of is. There's an EBNF expression in the more extensive description of Notation3 which references a downloadable ebnf file. IIRC you mentioned the possibility of using the Lark parser, fwiw I took a pass by pymantic's use of the Lark parser (in which they feed lark with a string containing the EBNF expression) in the context of the EBNF descriptions provided in the RDF Star spec. |
_ObjectType = Identifier | ||
|
||
_TripleType = Tuple["_SubjectType", "_PredicateType", "_ObjectType"] | ||
_QuadType = Tuple["_SubjectType", "_PredicateType", "_ObjectType", "Graph"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may have to be updated if @gjhiggins PR's come through!
I will republish this after #1858 is merged, likely the types will become: _SubjectType = Node
_PredicateType = Node
_ObjectType = Node This is mainly because N3 allows a much broader range of subjects, predicates and objects. |
4a7ea04
to
5cb10c6
Compare
#1858 is merged and I updated this to support N3, the typing is now broader. The highlight is likely this: Lines 43 to 63 in 5cb10c6
|
5cb10c6
to
7d0da30
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
This patch primarily adds more type hints for `rdflib.graph`, but also adds type hints to some related modules in order to work with the new type hints for `rdflib.graph`. I'm mainly doing this as a baseline for adding type hints to `rdflib.store`. I have created type aliases to make it easier to type everything cosnsitently and to make type hints easier easier to change in the future. The type aliases are private however (i.e. `_`-prefixed) and should be kept as such for now. This patch only contains typing changes and does not change runtime behaviour. Broken off from RDFLib#1850
- Fixed second overload of `_TripleType` to return `Path` instead of `_PredicateType`.
Changed type hints based on more information discovered in mean time.
7d0da30
to
a79650d
Compare
I'm a little disappointed (I hope this doesn't come across as being stern! or upset! Just bummed.) with this line: _SubjectType = Node My original goal with nudging type review was to assist with catching thing-or-string errors, particularly easy to do through various context dictionary and/or typo scenarios in JSON-LD. Moving Meanwhile, I've thought about this remark earlier, and I'm not sure it will work:
I'm not so sure about the However, I am aware that subclassing So, there's potentially a significant (to the type system) downstream consequence to supporting One solution to this (caveat - given not too much thought) might be implementing everything but the iterators in an abstract base class |
Thinking about it, I'm not so sure it will work either without an error, but it could still work using
On this, there are quite a lot of Liskov Substitution Principle violations already, we should not add more, but fixing them will take time, my primary goal with contributions at the moment is to get things into the best state we can have it with the current design and interface, and have as much validation as possible, both static and dynamic, and only then start re-architecting. Your suggestion of having a more narrow/constained Graph base class makes a lot of sense, and we should try and head towards that so we can fix all the type violations. If we could move towards that while maintaining backwards compatibility it will be ideal. There may also be some options with generics, something like this:
Ideally in such cases the Graph should do some strict validation also, but I can't say I have the perfect answer as it is now. In short term I think the easiest will be a Protocol that one can cast to, but even if I make this I will only keep it in For now I have little choice but to go with it as is, but it also does not sit right with me, as this significantly reduces the level of static validation offered to users. |
Summary of changes
This patch primarily adds more type hints for
rdflib.graph
, but alsoadds type hints to some related modules in order to work with the new
type hints for
rdflib.graph
.I'm mainly doing this as a baseline for adding type hints to
rdflib.store
.I have created type aliases to make it easier to type everything
cosnsitently and to make type hints easier easier to change in the
future. The type aliases are private however (i.e.
_
-prefixed) andshould be kept as such for now.
This patch only contains typing changes and does not change runtime
behaviour.
Broken off from #1850
Checklist
the same change.
so maintainers can fix minor issues and keep your PR up to date.