Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/1369 custom resolver #1385

Draft
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

alexdutton
Copy link
Contributor

Fixes #1369

Proposed Changes

This adds a new Resolver framework, and updates the JSON-LD parser to use it.

As a result, implementors have much more control over which URLs referenced in JSON-LD documents will be resolved.

This PR implements a default-deny approach to URL resolution, which will need explaining in release notes and documentation. Implementors can revert to the old behaviour via environment variable if they are happy with the risks.

This also provides a base for future extensibility, e.g. caching or using locally-held copies of external resources.

No tests or documentation yet, but all existing tests pass when using the new PermissiveResolver.

@alexdutton alexdutton force-pushed the fix/1369-custom-resolver branch 2 times, most recently from 6ecbac9 to f511df9 Compare August 4, 2021 13:53
This adds a new Resolver class which can be overridden with a call to
`set_default_resolver()`.

This is the start of resolving RDFLib#1369, and we'll add more functionality
to the Resolver class in subsequent commits.
This adds individual methods for resolving file: and http[s]: URLs, and
a mechanism to extend for additional schemes and behaviour in
subclasses.
To maintain the existing behaviour of create_input_source, we add a
`trust` parameter which bypasses the new `is_resolution_allowed()`
check.

Our default otherwise is to say no, with other behaviour currently
configurable using `set_default_resolver()`.

At the moment there is no code that uses resolvers directly, except for
`create_input_source()`. Soon, we'll update the JSON-LD parser to use it
directly (i.e. with `trust=False`), at which point the new resolution
restrictions will take effect.
These provide, respectively:

1. a straightforward way to allow-list schemes and URLs by environment
   variable or instantiation parameter, as the default resolver.
2. a way to allow all resolution, which was the pre-6.0.x behaviour.
People deploying can now use e.g.
`RDFLIB_DEFAULT_RESOLVER_CLASS=rdflib.resolver.PermissiveResolver` to
change resolution behaviour without having to resort to code.

It doesn't provide a way to specify instantiation arguments for the
given resolver, but that's fine.

Setting the resolver is now performed while holding a
`threading.Lock()`.

We find the default resolver lazily so to avoid any import loops where a
custom resolver (necessarily) imports the `rdflib.resolver` module.
We don't need to return anything other than the InputSource. We ensure
that the system ID is set, as that should contain the absolute_location.

The previous return values can be reconstructed in the internal
`_create_input_source_from_location` function.
We pass an explicit resolver around, defaulting to
`get_default_resolver()` if not specified.

`source_to_json()` has been removed as it uses `create_input_source()`,
which trusts all locations passed to it. Instead, we parse json
explicitly in the two places where it was called.

Resolves RDFLib#1369.
@alexdutton alexdutton force-pushed the fix/1369-custom-resolver branch from f511df9 to 9455370 Compare August 4, 2021 14:11
@nicholascar nicholascar requested review from nicholascar, ashleysommer and white-gecko and removed request for nicholascar August 28, 2021 06:39
@nicholascar
Copy link
Member

Hi @alexdutton I am watching this PR with interest - I think it's a great idea to have "everything" available to RDFlib users, and, if scary, off by default - so look forward to you bringing it out of draft.

Just a couple of questions:

  1. Has this work of yours been motivated by the JSON-LD parser & serializer now being in RDFlib core?
  2. Do you have your own code like this in action already, just not in RDFlib?
  3. Can you please add some notes to the top of the new file rdflib/resolver.py to explains what it's doing, perhaps something like rdflib/compare.py (though doesn't have to be as long!) This just helps with people and file browsing

@alexdutton
Copy link
Contributor Author

Hi @nicholascar

  1. I didn't realise the JSON-LD support had been moved into core when I first realised this was an issue, but still saw the fix as being in core as that's where the resolution behaviour was.
  2. No, I don't have similar code elsewhere. This was precipitated by needing to implement a Linked Data Notifications inbox for the COAR Notify initiative, which involves receiving JSON-LD documents to a publicly accessible HTTP endpoint.
  3. Yep, I'll do that now :-)

This is currently the main documentation for the resolver functionality.
Before this, there were two different parameter names across the
`is_resolution_allowed` and `resolve_*` methods. They are now both
called `url`, and are now explicitly both passed as `str` objects
instead of `URIRef` objects.
@nicholascar
Copy link
Member

@alexdutton this PR is still in draft. Are you planning on taking it out of draft soon so we can review it formally?

@ramcaat
Copy link

ramcaat commented Dec 7, 2022

This is a critical vulnerability. Will this be fixed in 6.3 release? Do we have a date for it?

@aucampia
Copy link
Member

This is a critical vulnerability. Will this be fixed in 6.3 release? Do we have a date for it?

The plan is to fix in in 6.3, but sadly I have very little time to work on this. I will be having some days off work later this month and I hope to finish it then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants