Python Importers and Hooks

The primary description of the import system is the reference documentation. This more accurately describes the current system than either of the PEPs below, and should probably be read first.

PEP 302 "New Import Hooks" describes the import system introduced in Python 2.3. This was heavily modified (with backward compatibility retained) by PEP 451 "A ModuleSpec Type for the Import System" (Python 3.4), but much documentation still references only the (now not entirely accurate) PEP 302.

The importlib library implements the above and documents some further details.

Readings:

Importer Protocol

An importer builds modules, usually by loading Python code from some location. It consists of two logically distinct objects: the finder, which identifies the loader to use (and sometimes other information), and the loader, which actually does the loading. Under PEP 451, the finder creates a spec that includes the loader along with additional information.

Finders

Finder objects are normally added to sys.meta_path and/or sys.path_hooks (see below).

PEP 302

The finder interface of PEP 302 (deprecated since 3.3) has a find_module(fullname, path=None) method taking the fully qualified (dotted) name of the module and a path argument of None for a top-level module or the parent module's package.__path__ for submodules. It returns the loader. If no loader is found, it returns None or, before Python 3.4, raises NotImplementedError.

It appears that PEP 302 finders may also return (None, portions) to indicate part of a possible namespace package. XXX Find further docs on this.

PEP 451

The finder interface of PEP 451 (often importlib.abc's MetaPathFinder or PathEntryFinder) has a find_spec(fullname, path, target=None) method taking the name and path as above and an optional target module object used if reloading. It returns the ModuleSpec for the module (which includes the loader) or None if no loader was found (or no loader that could reload into target).

PEP 451 finders may cache data related to module searches; if so they can be invalidated with their invalidate_caches() method. If the finder has no caches this returns None or, before Python 3.4, NotImplemented.
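As a rough illustration of why cache invalidation matters, the sketch below creates a module file after the directory's finder has already been used, then calls importlib.invalidate_caches() (which asks the finders to drop their caches) before importing it. The directory and the module names cache_demo_a and cache_demo_b are made up for the example.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    sys.path.insert(0, tmp)
    try:
        # First import: a path entry finder for tmp is created and may
        # cache the directory listing.
        with open(os.path.join(tmp, "cache_demo_a.py"), "w") as f:
            f.write("NAME = 'a'\n")
        mod_a = importlib.import_module("cache_demo_a")

        # A new file appears while the program is running, so ask the
        # finders to forget what they cached before importing it.
        with open(os.path.join(tmp, "cache_demo_b.py"), "w") as f:
            f.write("NAME = 'b'\n")
        importlib.invalidate_caches()
        mod_b = importlib.import_module("cache_demo_b")
    finally:
        sys.path.remove(tmp)

print(mod_a.NAME, mod_b.NAME)
```

Without the invalidate_caches() call the second import may or may not succeed, depending on the filesystem's timestamp granularity, which is why the call is recommended after creating modules at runtime.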

For backwards compatibility, importlib.abc.MetaPathFinder and PathEntryFinder implement a find_module method that returns the loader in the spec returned by find_spec(). (Having specs implement the loader interface was considered an unnecessary complication.)

Python 3.4 and above offer some factory functions, spec_from_file_location and spec_from_loader, to help build specs.
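A minimal sketch of the file-based factory, assuming a throwaway source file written just for the example (the name demo_mod and its contents are hypothetical):

```python
import importlib.util
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_mod.py")
    with open(path, "w") as f:
        f.write("ANSWER = 42\n")

    # The loader is chosen from the file suffix; the spec records the
    # module name, the loader and the file's location.
    spec = importlib.util.spec_from_file_location("demo_mod", path)

    # Create an empty module from the spec, then run the file's code in it.
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

print(spec.name, type(spec.loader).__name__, module.ANSWER)
```

Note that this does not add the module to sys.modules; the caller decides whether to register it.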

Loaders

Loaders (unlike finders) can depend on parents having been imported and existing in sys.modules; e.g., when load_module('foo.bar.baz') is called, foo and foo.bar are already imported.

PEP 302

The loader interface of PEP 302 has a load_module(fullname) method that returns the loaded module or raises an exception, usually ImportError if no other exception is being propagated. load_module is responsible for some significant work (see the link above and the PEP 451 loader description) including various kinds of validation and setup. Methods in the Python 2 import libraries and Python 3 importlib.util (particularly importlib.util.module_for_loader()) are designed to help with this.

PEP 451

A PEP 451 loader has an exec_module(module) method that, given a module object, executes the module code within it to finish building (or reloading) the module. It must handle being called more than once on the same module object, though it may do this by raising ImportError on calls after the first.

These loaders also have a create_module(spec) method that can create and return the new module object to be passed to exec_module. If it returns None the import system will create the module object itself in the default way.

Neither of these should set any import-related attributes on the module.
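A minimal sketch of such a loader, assuming the module's code is held in a string rather than a file. The class name StringLoader and the module name string_demo are invented for illustration.

```python
from importlib.abc import Loader
from importlib.util import module_from_spec, spec_from_loader


class StringLoader(Loader):
    """Hypothetical loader that executes source code held in memory."""

    def __init__(self, source):
        self.source = source

    def create_module(self, spec):
        # None asks the import system to create the module object itself.
        return None

    def exec_module(self, module):
        # Run the code in the module's namespace; import-related
        # attributes such as __spec__ are left to the import system.
        exec(self.source, module.__dict__)


spec = spec_from_loader("string_demo", StringLoader("GREETING = 'hello'"))
module = module_from_spec(spec)   # calls create_module(), sets attributes
spec.loader.exec_module(module)   # fills in the module's contents

print(module.GREETING, module.__spec__ is spec)
```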

The PEP 420 module_repr() method is deprecated but if it exists on a loader it will be used exclusively.

Module Search and Load Process

Modules are loaded only after their parent package modules are loaded. The following search process will be followed first for the highest-level unloaded module in the full module name and then for each immediate child module.

Cache Lookup

Modules are first looked up in the module cache, sys.modules, which contains all explicitly and automatically loaded modules. (Thus, import a.b.c will insert a, a.b and a.b.c into the cache if a had not been previously loaded.) The cache is writable, so deleting a key will force module creation anew on next import, though other modules will still have references to the old module object. (Use importlib.reload() to have the existing module object reloaded.) Setting a key's value to None does not force a reload; it makes the next import of that name raise ModuleNotFoundError.
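The cache behaviour can be observed directly. The sketch below uses the standard library's colorsys and xml.dom.minidom modules only because they are small and safe to re-import; any pure-Python module would do.

```python
import importlib
import sys

# Importing a dotted name caches every parent package as well.
importlib.import_module("xml.dom.minidom")
parents_cached = all(
    name in sys.modules for name in ("xml", "xml.dom", "xml.dom.minidom")
)

first = importlib.import_module("colorsys")

# Deleting the cache entry makes the next import build a new module object.
del sys.modules["colorsys"]
second = importlib.import_module("colorsys")
fresh_object = second is not first

# A None entry blocks the import instead of triggering a reload.
sys.modules["colorsys"] = None
try:
    importlib.import_module("colorsys")
    blocked = False
except ModuleNotFoundError:
    blocked = True
finally:
    sys.modules["colorsys"] = second  # restore a usable entry

print(parents_cached, fresh_object, blocked)
```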

Meta-path Searches

If a module must be loaded, the interpreter does a meta-path search, walking through the list of finder objects in sys.meta_path and calling on each its find_spec() method (or, if that is not present, find_module()). (See below for the arguments.) If all of these fail, a ModuleNotFoundError is raised.

The default meta_path in ≥3.4 includes the following finders:

[<class '_frozen_importlib.BuiltinImporter'>,
 <class '_frozen_importlib.FrozenImporter'>,
 <class '_frozen_importlib_external.PathFinder'>]

(In Python <3.4, the default meta_path is empty and the system internally tries the hardcoded equivalent procedures of the above finders when no finder in meta_path is successful.)

The arguments to find_spec() are:

  • The fully-qualified name of the module. (str)
  • Path entries to use for module search: (iter)
    • None if it's a top-level module.
    • a.b.__path__ where a.b is the parent module. If the parent module's __path__ attribute is None or missing, ModuleNotFoundError is raised.
  • Only when reloading, an existing module object that will be the target of the reload. (module)
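To see these arguments in practice, a finder that only records what it is asked and always declines can be placed at the front of sys.meta_path. This is a sketch: RecordingFinder and the module name no_such_module_for_demo are invented for the example.

```python
import importlib
import sys
from importlib.abc import MetaPathFinder


class RecordingFinder(MetaPathFinder):
    """Hypothetical finder that logs its arguments and finds nothing."""

    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path, target=None):
        self.seen.append((fullname, path))
        return None  # decline, so the next finder in meta_path is tried


finder = RecordingFinder()
sys.meta_path.insert(0, finder)
try:
    importlib.import_module("no_such_module_for_demo")
except ModuleNotFoundError:
    pass  # every finder declined, so the import fails
finally:
    sys.meta_path.remove(finder)

# The top-level name was offered with path=None.
print(finder.seen)
```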

Path Searches

The _frozen_importlib_external.PathFinder (≥3.4) or Python internally (<3.4) does a path search in the following manner.

The search path is sys.path for top-level modules or the parent module's __path__ for child modules. In either case it consists of an iterable of strings representing locations.

For each location in turn, a finder for that location (if any) is queried by calling its find_spec() method or, if not present, find_module(). If that finder returns a spec (or loader), it is used, otherwise the search moves on to the next location in the path.

To get the finder, the location is first looked up in the sys.path_importer_cache dictionary. If the key is present the value is either a finder, which is used, or None, in which case this location is skipped because a previous search for a finder for it failed.

If the key is not found, the location is passed in turn to each hook (a callable object) in sys.path_hooks. The first hook that returns a finder rather than throwing ImportError is stored in the cache. If no hook returns a finder, None is stored in the cache.
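The lookup described above can be approximated in a few lines. This is a simplified sketch of the behaviour, not the actual importlib code (which, for instance, also treats an empty string as the current directory); finder_for_location and the file hook_demo.py are hypothetical names.

```python
import os
import sys
import tempfile


def finder_for_location(location):
    """Return the finder for a path entry, or None if no hook accepts it."""
    try:
        return sys.path_importer_cache[location]
    except KeyError:
        pass
    finder = None
    for hook in sys.path_hooks:
        try:
            finder = hook(location)
            break
        except ImportError:
            continue  # this hook cannot handle the location
    sys.path_importer_cache[location] = finder  # None is cached too
    return finder


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "hook_demo.py"), "w") as f:
        f.write("X = 1\n")
    missing = os.path.join(tmp, "missing")

    dir_finder = finder_for_location(tmp)          # a directory finder
    found = dir_finder.find_spec("hook_demo")      # spec for hook_demo.py
    missing_finder = finder_for_location(missing)  # no hook accepts this

    # Remove the demo entries so the real cache is left as it was.
    sys.path_importer_cache.pop(tmp, None)
    sys.path_importer_cache.pop(missing, None)

print(type(dir_finder).__name__, found.name, missing_finder)
```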

Import-related Libraries

Various import-related libraries are listed in Importing Modules in the standard library documentation.

importlib replaced the deprecated imp library in 3.4. For example, to load a module from an explicit file path:

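import sys
from importlib.machinery import SourceFileLoader
from importlib.util import module_from_spec, spec_from_loader

# Build a spec around an explicit loader for the file at the given path.
spec = spec_from_loader("foobar",
    SourceFileLoader("foobar", "/path/to/foobar"))
# Create the (still empty) module object, then run the file's code in it.
foobar = module_from_spec(spec)
spec.loader.exec_module(foobar)

# To keep importing by name after first load:
sys.modules['foobar'] = foobar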

Building a Custom Finder/Loader

Import almost anything in Python gives a more detailed example; this is a brief summary.

importlib.abc provides useful abstract classes with the interfaces you need to implement below. (The example above doesn't use these.)

Create a finder object with a find_spec() method returning an importlib.machinery.ModuleSpec object or None. The ModuleSpec must have a loader implementing the [PEP 302] interface.

The loader's create_module(spec) method (which, in ≥3.6, must exist if exec_module() below is defined) normally returns None for default module creation semantics: i.e., let the system create the module object. But you can return a custom object you've created yourself, if necessary.

The loader's exec_module(module) method takes a partially-constructed module object (basically, just has __name__ defined on it) and must finish construction, usually by doing any further setup and exec'ing the code in the module (e.g., with exec(code, module.__dict__)), if it has any.

All packages must have __path__ set; a module without __path__ is not a package and cannot have submodules. For a namespace package this can be set to information its loader needs, even if just []. For a regular package, this will normally point to a path whence information used to construct the package was loaded.

The attributes of a module are set by adding entries to its __dict__ attribute; from foo import bar will go through the above process to generate the foo module and then return foo.bar which is foo.__dict__['bar'].
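A small sketch of that equivalence, using a hand-built module object registered under the made-up name foo_demo:

```python
import sys
import types

foo = types.ModuleType("foo_demo")
foo.__dict__["bar"] = "hello"   # same effect as: foo.bar = "hello"

# Registering the module lets the import statement find it in the cache.
sys.modules["foo_demo"] = foo
try:
    from foo_demo import bar
finally:
    del sys.modules["foo_demo"]

print(bar is foo.bar, foo.bar is foo.__dict__["bar"])
```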

The load_module(fullname) method for backwards compatibility is provided automatically when exec_module() is defined.
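Putting the steps of this section together, here is a minimal sketch of a finder and loader that serve modules from an in-memory dictionary. Everything named here (SOURCES, MemoryLoader, MemoryFinder, the memdemo package) is hypothetical and exists only for the example.

```python
import importlib
import sys
from importlib.abc import Loader, MetaPathFinder
from importlib.util import spec_from_loader

# Hypothetical in-memory "filesystem": module name -> source code.
SOURCES = {
    "memdemo": "NAME = 'package'\n",
    "memdemo.child": "VALUE = 21 * 2\n",
}


class MemoryLoader(Loader):
    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        exec(SOURCES[module.__name__], module.__dict__)


class MemoryFinder(MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if fullname not in SOURCES:
            return None  # not ours; let other finders try
        # A name with children is treated as a package, which makes the
        # import system set __path__ (here to an empty list).
        is_package = any(n.startswith(fullname + ".") for n in SOURCES)
        return spec_from_loader(fullname, MemoryLoader(),
                                is_package=is_package)


sys.meta_path.append(MemoryFinder())

# The parent package is imported first, then the child.
child = importlib.import_module("memdemo.child")
print(child.VALUE, sys.modules["memdemo"].__path__)
```

Appending the finder to the end of sys.meta_path means the standard finders are consulted first, so this sketch cannot shadow real modules; inserting at position 0 would reverse that priority.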

Other sources of information:

XXX To-do

XXX Bring into this doc a glossary? E.g., https://www.python.org/dev/peps/pep-0451/#terms-and-concepts

XXX [from PEP 302]: The built-in __import__ function (known as PyImport_ImportModuleEx() in import.c) will then check to see whether the module doing the import is a package or a submodule of a package. If it is indeed a (submodule of a) package, it first tries to do the import relative to the package (the parent package for a submodule). For example, if a package named "spam" does "import eggs", it will first look for a module named "spam.eggs". If that fails, the import continues as an absolute import

XXX import searches sys.path for directories and files from which to build modules. Symlinks are dereferenced before calculating names and paths. The default sys.path includes the directory containing the input script (or current directory if no file specified), the $PYTHONPATH environment variable paths and installation-dependent defaults.

XXX Writable paths may have __pycache__ directories created with the "compiled" code (cpython-34.pyc, machine-portable) underneath, if the "source" was not already compiled code. Compiled files will be read from directories that contain no source.
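The location of the compiled file for a given source file can be computed with importlib.util.cache_from_source(). The source path below is a made-up example, and the exact tag in the result depends on the interpreter that runs it.

```python
import importlib.util
import os

source_path = os.path.join("pkg", "mod.py")   # hypothetical source file
pyc_path = importlib.util.cache_from_source(source_path)

# Typically something like pkg/__pycache__/mod.<tag>.pyc, where <tag>
# identifies the interpreter and version (see sys.implementation.cache_tag).
print(pyc_path)
```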

XXX As well as directories, ZIP files containing source code or compiled source code (not binary shared libs) may be specified in the path.
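A short sketch of importing from a ZIP file placed on sys.path, assuming an archive built on the fly; bundle.zip and zipped_demo are invented names.

```python
import importlib
import os
import sys
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "bundle.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("zipped_demo.py", "ORIGIN = 'from a zip'\n")

    # The archive itself, not its directory, goes on the search path.
    sys.path.insert(0, archive)
    try:
        zipped_demo = importlib.import_module("zipped_demo")
    finally:
        sys.path.remove(archive)

# __file__ points inside the archive.
print(zipped_demo.ORIGIN, zipped_demo.__file__)
```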