
Python Packaging Utilities

The Python packaging utilities let users analyze their Python scripts and create Conda environments that contain the dependencies their applications need to run. In distributed computing systems such as Work Queue, it is often difficult to maintain homogeneous environments for Python applications, because scripts rely on many external resources at runtime, such as the Python interpreter and imported libraries. The Python packaging collection provides three tools that address this problem by helping users analyze their Python programs and build Conda environments that ensure consistent runtimes within the Work Queue system.

The python_package_analyze tool analyzes a Python script to determine all its top-level module dependencies and the interpreter version it uses. It then generates a concise, human-readable JSON output file containing the information required to build a self-contained Conda virtual environment for the Python script.

The python_package_create tool takes the output JSON file generated by python_package_analyze and creates this Conda environment, preinstalled with all the necessary libraries and the correct Python interpreter version. It then generates a packaged tarball of the environment that can be easily relocated to a different machine within the system to run the Python task.

The python_package_run tool acts as a wrapper script for the Python task, unpacking and activating the Conda environment and running the task within the environment.

python_package_analyze(1)

NAME

python_package_analyze - command-line utility for analyzing a Python script for library and interpreter dependencies

SYNOPSIS

python_package_analyze [options] <python-script> <json-output-file>

DESCRIPTION

python_package_analyze is a simple command-line utility that analyzes a Python script for its external dependencies. It generates an output file that can be used with python_package_create to build a self-contained Conda environment for the Python application.

The python-script argument is the path (relative or absolute) to the Python script to be analyzed. The json-output-file argument is the path (relative or absolute) to the output JSON file that the command generates. The file does not need to exist; if a file with the same name already exists, it is overwritten.

OPTIONS

-h, --help                  Show this help message.
--toplevel                  Only include imports at the top level of the script.
--function FUNCTION         Only include imports in the given function.
--pkg-mapping IMPORT=NAME   Specify that the module imported as IMPORT in the code is provided by the pip/conda package NAME.
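To make the difference between the default behavior, --toplevel, and --function concrete, the following is a minimal sketch of how imports could be collected with Python's standard ast module. It is not the tool's actual implementation: the function collect_imports and its parameters are hypothetical, and it assumes that "top level" means statements directly in the module body.

```python
import ast


def collect_imports(source, toplevel=False, function=None):
    """Return the sorted top-level module names imported by `source`.

    toplevel=True  -> only imports directly in the module body
    function=NAME  -> only imports inside functions named NAME
    (hypothetical parameters mirroring --toplevel and --function)
    """
    tree = ast.parse(source)  # parses only; nothing is imported or executed

    if function is not None:
        # Restrict the search to definitions of the named function.
        roots = [
            node for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
            and node.name == function
        ]
        nodes = [n for root in roots for n in ast.walk(root)]
    elif toplevel:
        # Only statements that sit directly in the module body.
        nodes = list(tree.body)
    else:
        # Every node in the script, at any nesting depth.
        nodes = list(ast.walk(tree))

    modules = set()
    for node in nodes:
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from . import x"
            modules.add(node.module.split(".")[0])
    return sorted(modules)


SCRIPT = '''
import os
import matplotlib.pyplot as plt

def work():
    import numpy as np
    from scipy import stats
'''

print(collect_imports(SCRIPT))                   # ['matplotlib', 'numpy', 'os', 'scipy']
print(collect_imports(SCRIPT, toplevel=True))    # ['matplotlib', 'os']
print(collect_imports(SCRIPT, function="work"))  # ['numpy', 'scipy']
```

Restricting the analysis to one function is useful when only that function is shipped to a remote worker, since imports used elsewhere in the script would otherwise inflate the environment.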

EXIT STATUS

On success, returns zero. On failure, returns non-zero.

EXAMPLE

An example Python script example.py contains the following code:

import os
import sys
import pickle

import antigravity
import matplotlib


if __name__ == "__main__":
    print("example")

To analyze the example.py script for its dependencies and generate the output JSON dependencies file dependencies.json, run the following command:

$ python_package_analyze example.py dependencies.json

Once the command completes, the dependencies.json file within the current working directory will contain a Conda environment specification (suitable to use with conda env create).

Note that system-level modules are not included, as they are automatically installed into Conda virtual environments. Additionally, imports not managed by Pip or Conda are not allowed.
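Some modules are provided by a package whose name differs from the import name (for example, the yaml module comes from the pyyaml package and cv2 from opencv-python on pip), which is the case the --pkg-mapping IMPORT=NAME option covers. The sketch below illustrates the idea of such a mapping; parse_pkg_mappings and to_packages are hypothetical helpers written for this example, not functions from the tool.

```python
def parse_pkg_mappings(pairs):
    """Turn ["IMPORT=NAME", ...] into a dict {IMPORT: NAME}."""
    mapping = {}
    for pair in pairs:
        import_name, sep, package = pair.partition("=")
        if not sep or not import_name or not package:
            raise ValueError(f"expected IMPORT=NAME, got {pair!r}")
        mapping[import_name] = package
    return mapping


def to_packages(imports, mapping):
    """Replace each import name with its mapped package name, if one is given."""
    return sorted({mapping.get(name, name) for name in imports})


mapping = parse_pkg_mappings(["yaml=pyyaml", "cv2=opencv-python"])
print(to_packages(["yaml", "cv2", "matplotlib"], mapping))
# ['matplotlib', 'opencv-python', 'pyyaml']
```

Imports without a mapping are passed through unchanged, on the assumption that the package name matches the import name.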

python_package_create(1)

NAME

python_package_create - command-line utility for creating a Conda virtual environment given a Python dependencies file

SYNOPSIS

python_package_create [options] <dependency-file> <output-path>

DESCRIPTION

python_package_create is a simple command-line utility that creates a local Conda environment from an input JSON dependency file, generated by python_package_analyze. The command creates an environment tarball at output-path that can be sent to and run on different machines with the same architecture.

The dependency-file argument is the path (relative or absolute) to the JSON dependency file that was created by python_package_analyze. The output-path argument specifies the path for the environment tarball that is created (should usually end in .tar.gz).

OPTIONS

-h Show this help message

EXIT STATUS

On success, returns zero. On failure, returns non-zero.

EXAMPLE

A dependencies file dependencies.json should first be generated with python_package_analyze.

To create a Conda environment named example_venv, with the Python 3.7.3 interpreter and the antigravity and matplotlib modules preinstalled, run the following command:

$ python_package_create dependencies.json example_venv.tar.gz

This will create an example_venv.tar.gz environment tarball within the current working directory, which can then be exported to different machines for execution.

python_package_run(1)

NAME

python_package_run - wrapper script that executes a Python script within an isolated Conda environment

SYNOPSIS

python_package_run [options] --environment <file> command and args ...

DESCRIPTION

The python_package_run tool acts as a wrapper script for a Python task, running the task within the specified Conda environment. python_package_run can be utilized on different machines within the Work Queue system to unpack and activate a Conda environment, and run a task within the isolated environment.

The --environment <file> argument is the path to the Conda environment tarball in which to run the Python task. command and args (the COMMAND) are interpreted as ARGV for a command to be run inside the Conda environment.

By default, the Conda environment is unpacked into a temporary directory, which is removed at the end of execution. If --unpack-to <dir> is given, the environment is unpacked to <dir> and is not removed at the end of execution. Further (even simultaneous) executions of python_package_run will not unpack the environment again if <dir> is already populated. Instances of python_package_run coordinate via a writing lock. By default, an instance waits up to 300 seconds for the writing lock; this can be changed with the --wait-for-lock <secs> option.

If the argument to --unpack-to does not exist, it is created as an empty directory. If it is an existing directory that is not empty, unpacking is not performed, regardless of whether the directory contains a valid Conda environment.
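The unpack-once behavior described above can be sketched as follows. This is an illustration of the directory and lock rules only, not the wrapper's actual code: the function unpack_once, the lock file location (the target directory name plus ".lock"), and the use of fcntl.flock (Unix only) are all assumptions of this example, and activating the environment after unpacking is left out.

```python
import fcntl
import os
import tarfile
import time


def unpack_once(tarball, dest, wait_for_lock=300):
    """Unpack `tarball` into `dest` unless `dest` is already populated.

    Returns True if this call unpacked the archive, False if it skipped it.
    The lock file path (dest + ".lock") is an assumption of this sketch.
    """
    lock_path = dest.rstrip(os.sep) + ".lock"
    deadline = time.monotonic() + wait_for_lock

    with open(lock_path, "w") as lock_file:
        # Poll for an exclusive (writing) lock until the deadline passes.
        while True:
            try:
                fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"could not lock {lock_path}")
                time.sleep(0.5)
        try:
            os.makedirs(dest, exist_ok=True)  # create the directory if missing
            if os.listdir(dest):
                # Non-empty directory: never unpack, whatever it contains.
                return False
            with tarfile.open(tarball) as tar:
                tar.extractall(dest)
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Checking whether the directory is empty while holding the lock is what lets simultaneous instances share one directory: the first instance unpacks, and the others wait for the lock and then find the directory populated.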

OPTIONS

-e, --environment <file>    Conda environment as a tar file. (Required.)
-d, --unpack-to <dir>       Directory to unpack the environment. If not given, a temporary directory is used.
-w, --wait-for-lock <secs>  Number of seconds to wait to get a writing lock on <dir>. Default is 300.
-h, --help                  Show the help screen.
command and args            Command to execute inside the given environment.

EXIT STATUS

On success, returns 0. On failure, returns non-zero.

EXAMPLE

A Python script example.py has been analyzed using python_package_analyze and a corresponding Conda environment named example_venv.tar.gz has been created, with all the necessary dependencies preinstalled. To execute the script within the environment, run the following command:

$ python_package_run --environment example_venv.tar.gz python3 example.py

This will run the command python3 example.py within the Conda environment in example_venv.tar.gz. Note that this command can be performed either locally, on the same machine that analyzed the script and created the environment, or remotely, on a different machine that contains the Conda environment tarball and the example.py script.

To reuse the unpacked environment across runs, pass the --unpack-to option:

$ python_package_run --unpack-to my_persistent_env --environment example_venv.tar.gz python3 example.py

This command runs faster the second time it is executed, because the environment is unpacked to my_persistent_env only once.

HOW TO TEST OVERALL FUNCTIONALITY

Desired Python script to run: hi.py

  1. ./python_package_analyze hi.py output.json
     • Generates the appropriate JSON file in the current working directory
  2. ./python_package_create output.json venv.tar.gz
     • Creates a packed tarball of the environment named venv.tar.gz in the current working directory
  3. ./python_package_run --environment venv.tar.gz python3 hi.py
     • Runs the python3 hi.py task command within the venv.tar.gz Conda environment
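The three steps above can also be driven from Python, for instance as part of a test script. The sketch below is one possible way to do so, assuming the three tools are on PATH (rather than in the current directory); the helper names pipeline_commands and run_pipeline are invented for this example.

```python
import subprocess


def pipeline_commands(script, spec="output.json", tarball="venv.tar.gz"):
    """Return the analyze, create, and run command lines as argument lists."""
    return [
        ["python_package_analyze", script, spec],
        ["python_package_create", spec, tarball],
        ["python_package_run", "--environment", tarball, "python3", script],
    ]


def run_pipeline(script):
    """Run each step in order, stopping at the first non-zero exit status."""
    for argv in pipeline_commands(script):
        # check=True raises CalledProcessError if the tool returns non-zero.
        subprocess.run(argv, check=True)


# Print the commands that run_pipeline("hi.py") would execute.
for argv in pipeline_commands("hi.py"):
    print(" ".join(argv))
```

Because each tool returns zero on success and non-zero on failure, check=True is enough to stop the pipeline as soon as one step fails.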