-
Notifications
You must be signed in to change notification settings - Fork 559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/1369 custom resolver #1385
base: main
Are you sure you want to change the base?
Fix/1369 custom resolver #1385
Conversation
6ecbac9
to
f511df9
Compare
This adds a new Resolver class which can be overridden with a call to `set_default_resolver()`. This is the start of resolving RDFLib#1369, and we'll add more functionality to the Resolver class in subsequent commits.
This adds individual methods for resolving file: and http[s]: URLs, and a mechanism to extend for additional schemes and behaviour in subclasses.
To maintain the existing behaviour of create_input_source, we add a `trust` parameter which bypasses the new `is_resolution_allowed()` check. Our default otherwise is to say no, with other behaviour currently configurable using `set_default_resolver()`. At the moment there is no code that uses resolvers directly, except for `create_input_source()`. Soon, we'll update the JSON-LD parser to use it directly (i.e. with `trust=False`), at which point the new resolution restrictions will take effect.
These provide, respectively: 1. a straightforward way to allow-list schemes and URLs by environment variable or instantiation parameter, as the default resolver. 2. a way to allow all resolution, which was the pre-6.0.x behaviour.
People deploying can now use e.g. `RDFLIB_DEFAULT_RESOLVER_CLASS=rdflib.resolver.PermissiveResolver` to change resolution behaviour without having to resort to code. It doesn't provide a way to specify instantiation arguments for the given resolver, but that's fine. Setting the resolver is now performed while holding a `threading.Lock()`. We find the default resolver lazily so to avoid any import loops where a custom resolver (necessarily) imports the `rdflib.resolver` module.
We don't need to return anything other than the InputSource. We ensure that the system ID is set, as that should contain the absolute_location. The previous return values can be reconstructed in the internal `_create_input_source_from_location` function.
We pass an explicit resolver around, defaulting to `get_default_resolver()` if not specified. `source_to_json()` has been removed as it uses `create_input_source()`, which trusts all locations passed to it. Instead, we parse json explicitly in the two places where it was called. Resolves RDFLib#1369.
f511df9
to
9455370
Compare
Hi @alexdutton I am watching this PR with interest - I think it's a great idea to have "everything" available to RDFlib users, and, if scary, off by default - so look forward to you bringing it out of draft. Just a couple of questions:
|
Hi @nicholascar
|
This is currently the main documentation for the resolver functionality.
Before this, there were two different parameter names across the `is_resolution_allowed` and `resolve_*` methods. They are now both called `url`, and are now explicitly both passed as `str` objects instead of `URIRef` objects.
@alexdutton this PR is still in draft. Are you planning on taking it out of draft soon so we can review it formally? |
This is a critical vulnerability. Will this be fixed in 6.3 release? Do we have a date for it? |
The plan is to fix in in 6.3, but sadly I have very little time to work on this. I will be having some days off work later this month and I hope to finish it then. |
Fixes #1369
Proposed Changes
This adds a new Resolver framework, and updates the JSON-LD parser to use it.
As a result, implementors have much more control over which URLs referenced in JSON-LD documents will be resolved.
This PR implements a default-deny approach to URL resolution, which will need explaining in release notes and documentation. Implementors can revert to the old behaviour via environment variable if they are happy with the risks.
This also provides a base for future extensibility, e.g. caching or using locally-held copies of external resources.
No tests or documentation yet, but all existing tests pass when using the new
PermissiveResolver
.