The Python packaging utilities allow users to analyze their Python scripts and create Conda environments that contain the dependencies their applications need to run. In distributed computing systems such as Work Queue, it is often difficult to maintain homogeneous work environments for Python applications, because the scripts depend on many outside resources at runtime, such as Python interpreters and imported libraries. The Python packaging collection provides three tools that address this problem, helping users analyze their Python programs and build Conda environments that ensure consistent runtimes within the Work Queue system.
The python_package_analyze tool analyzes a Python script to determine all its top-level module dependencies and the interpreter version it uses. It then generates a concise, human-readable JSON output file containing the information required to build a self-contained Conda virtual environment for the Python script.
The python_package_create tool takes the output JSON file generated by python_package_analyze and creates the Conda environment it describes, preinstalled with all the necessary libraries and the correct Python interpreter version. It then generates a packaged tarball of the environment that can be relocated to a different machine within the system to run the Python task.
The python_package_run tool acts as a wrapper script for the Python task, unpacking and activating the Conda environment and running the task within the environment.
python_package_analyze - command-line utility for analyzing a Python script for library and interpreter dependencies
python_package_analyze [options] <python-script> <json-output-file>
python_package_analyze is a simple command-line utility for analyzing Python scripts for the necessary external dependencies. It generates an output file that can be used with python_package_create to build a self-contained Conda environment for the Python application.
The python-script argument is the path (relative or absolute) to the Python script to be analyzed. The json-output-file argument is the path (relative or absolute) to the output JSON file that will be generated by the command. The file does not need to exist; if a file with the same name already exists, it is overwritten.
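As an illustration of the kind of analysis described above, the following is a minimal sketch that collects the top-level module names a script imports, using the standard ast module. It is not the implementation of python_package_analyze; the function name imported_modules is hypothetical, and the sketch does not separate standard library modules from third-party ones or detect the interpreter version.

```python
import ast


def imported_modules(source):
    """Return the sorted top-level module names imported anywhere in source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b.c" depends on the top-level package "a".
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # "from a.b import c" depends on "a"; relative imports are skipped.
            names.add(node.module.split(".")[0])
    return sorted(names)


# Hypothetical input used only to exercise the sketch.
EXAMPLE = """
import os
import sys
import pickle
import antigravity
import matplotlib
"""

print(imported_modules(EXAMPLE))
# ['antigravity', 'matplotlib', 'os', 'pickle', 'sys']
```

Parsing the source rather than importing it means the script is never executed during analysis, which is why a syntax tree walk is a common choice for this kind of tool.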
    -h, --help                  Show this help message.
    --toplevel                  Only include imports at the top level of the script.
    --function FUNCTION         Only include imports in the given function.
    --pkg-mapping IMPORT=NAME   Specify that the module imported as IMPORT in the code is provided by the pip/conda package NAME.
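The IMPORT=NAME form taken by --pkg-mapping can be pictured as a lookup table from import names to package names. The sketch below is an assumption about how such values might be parsed and applied, not the tool's own code; the helper names and the sample mappings (cv2 to opencv, yaml to pyyaml) are illustrative only.

```python
def parse_pkg_mappings(values):
    """Turn ["IMPORT=NAME", ...] into {"IMPORT": "NAME", ...}."""
    mapping = {}
    for value in values:
        import_name, sep, package = value.partition("=")
        if not sep or not import_name or not package:
            raise ValueError(f"expected IMPORT=NAME, got {value!r}")
        mapping[import_name] = package
    return mapping


def to_package_names(modules, mapping):
    """Replace each imported module name by its package name when one is mapped."""
    return sorted({mapping.get(module, module) for module in modules})


# Illustrative mappings; unmapped modules keep their import name.
mapping = parse_pkg_mappings(["cv2=opencv", "yaml=pyyaml"])
print(to_package_names(["cv2", "yaml", "matplotlib"], mapping))
# ['matplotlib', 'opencv', 'pyyaml']
```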
On success, returns zero. On failure, returns non-zero.
An example Python script example.py contains the following code:
import os
import sys
import pickle
import antigravity
import matplotlib
if __name__ == "__main__":
    print("example")
To analyze the example.py script for its dependencies and generate the output JSON dependencies file dependencies.json, run the following command:
$ python_package_analyze example.py dependencies.json
Once the command completes, the dependencies.json file within the current working directory will contain a Conda environment specification (suitable to use with conda env create).
Note that system-level modules are not included, as they are automatically installed into Conda virtual environments. Additionally, imports not managed by Pip or Conda are not allowed.
python_package_create - command-line utility for creating a Conda virtual environment given a Python dependencies file
python_package_create [options] <dependency-file> <output-path>
python_package_create is a simple command-line utility that creates a local Conda environment from an input JSON dependency file generated by python_package_analyze.
The command creates an environment tarball at output-path that can be sent to and run on different machines with the same architecture.
The dependency-file argument is the path (relative or absolute) to the JSON dependency file that was created by python_package_analyze. The output-path argument specifies the path for the environment tarball that is created (it should usually end in .tar.gz).
-h Show this help message
On success, returns zero. On failure, returns non-zero.
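Because the tools report failure only through a non-zero exit status, a calling script may want to check that status before moving on to the next step. The following is a minimal sketch using the standard subprocess module; run_checked is a hypothetical helper, and a stand-in command is used so the sketch runs without the packaging tools installed.

```python
import subprocess
import sys


def run_checked(argv):
    """Run argv and raise RuntimeError if the exit status is non-zero."""
    result = subprocess.run(argv)
    if result.returncode != 0:
        raise RuntimeError(f"{argv[0]} failed with exit status {result.returncode}")
    return result.returncode


# Stand-in command that exits with status 0. In practice argv would be
# something like ["python_package_create", "dependencies.json", "example_venv.tar.gz"].
run_checked([sys.executable, "-c", "raise SystemExit(0)"])
```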
A dependencies file dependencies.json should first be generated with python_package_analyze.
To generate a Conda environment named example_venv, with the Python 3.7.3 interpreter and the antigravity and matplotlib modules preinstalled, run the following command:
$ python_package_create dependencies.json example_venv.tar.gz
This will create an example_venv.tar.gz environment tarball within the current working directory, which can then be exported to different machines for execution.
python_package_run - wrapper script that executes a Python script within an isolated Conda environment
python_package_run [options] --environment <file> command and args ...
The python_package_run tool acts as a wrapper script for a Python task, running the task within the specified Conda environment. python_package_run can be used on different machines within the Work Queue system to unpack and activate a Conda environment, and run a task within the isolated environment.
The --environment <file> argument names the tarball file containing the Conda environment in which to run the Python task.
command and args (the COMMAND) are interpreted as ARGV for a command to be run inside the Conda environment.
By default, the Conda environment is unpacked into a temporary directory which is removed at the end of execution. If the --unpack-to <dir> option is given, then the environment is unpacked to <dir>, and it is not removed at the end of execution. Further (even simultaneous) executions of python_package_run will not unpack the environment if <dir> is already populated. Instances of python_package_run coordinate via a writing lock. By default, the wait for a writing lock is 300 seconds, but this can be modified with the --wait-for-lock <secs> option.
If the argument to --unpack-to does not exist, then it is created as an empty directory. If it is an existing directory, but it is not empty, then unpacking is not performed, regardless of whether this directory contains a valid Conda environment.
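The rules above for deciding whether to unpack can be sketched as a small function. This is only an illustration of the described behavior, not the code of python_package_run; the name should_unpack is hypothetical, and the writing lock that coordinates simultaneous instances is deliberately left out.

```python
import os
import tempfile


def should_unpack(unpack_dir):
    """Return True if the environment would be unpacked into unpack_dir."""
    if not os.path.exists(unpack_dir):
        # A missing directory is created empty, so unpacking proceeds.
        os.makedirs(unpack_dir)
        return True
    # An existing directory is unpacked into only when it is empty; its
    # contents are not checked for a valid Conda environment.
    return len(os.listdir(unpack_dir)) == 0


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "my_persistent_env")
    print(should_unpack(target))  # directory missing: created, then unpacked
    open(os.path.join(target, "marker"), "w").close()
    print(should_unpack(target))  # directory populated: unpacking skipped
```

One consequence of the emptiness check is that a directory left half-populated by an interrupted run would be treated as already unpacked, so it may be worth clearing such a directory by hand.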
    -e, --environment <file>     Conda environment as a tar file. (Required.)
    -d, --unpack-to <dir>        Directory to unpack the environment. If not given, a temporary directory is used.
    -w, --wait-for-lock <secs>   Number of seconds to wait to get a writing lock on <dir>. Default is 300.
    -h, --help                   Show the help screen.
    command and args             Command to execute inside the given environment.

On success, returns 0. On failure, returns non-zero.
Suppose a Python script example.py has been analyzed using python_package_analyze and a corresponding Conda environment tarball named example_venv.tar.gz has been created, with all the necessary dependencies preinstalled. To execute the script within the environment, run the following command:
python_package_run --environment example_venv.tar.gz python3 example.py
This will run the command python3 example.py within the Conda environment in example_venv.tar.gz. Note that this command can be performed either locally, on the same machine that analyzed the script and created the environment, or remotely, on a different machine that contains the Conda environment tarball and the example.py script.
python_package_run --unpack-to my_persistent_env --environment example_venv.tar.gz python3 example.py
The previous command will run faster the second time it is executed, as the environment is only unpacked once to my_persistent_env.
Desired Python script to run: hi.py
./python_package_analyze hi.py output.json
- Generates the appropriate JSON file in the current working directory
./python_package_create output.json venv.tar.gz
- Creates a packed tarball of the environment named venv.tar.gz in the current working directory
./python_package_run --environment venv.tar.gz python3 hi.py
- Runs the python3 hi.py task command within the venv.tar.gz Conda environment
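The three steps above can also be assembled programmatically. The sketch below only builds and prints the argument lists, so it runs without the tools installed; workflow_commands is a hypothetical helper, and it assumes the tools are on PATH rather than invoked as ./python_package_analyze and so on.

```python
def workflow_commands(script, spec="output.json", tarball="venv.tar.gz"):
    """Return the argv lists for the analyze, create, and run steps."""
    return [
        ["python_package_analyze", script, spec],
        ["python_package_create", spec, tarball],
        ["python_package_run", "--environment", tarball, "python3", script],
    ]


# Print each step; each list could instead be handed to subprocess.run.
for argv in workflow_commands("hi.py"):
    print(" ".join(argv))
```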