Creating standalone executable bundles

This document provides a short description of how to produce ahead-of-time compiled executable bundles. The motivation for this work is to remove the cost of compilation at run-time by allowing the users of Glow to compile the model ahead of time.

Overview

A bundle is a self-contained compiled network model that can be used to execute the model in a standalone mode. After following the instructions in this document and the CMakeLists.txt in the example directory, you will be able to compile convolutional neural networks into small executables. Example:

  $cmake -G Ninja <other cmake flags> -DGLOW_WITH_BUNDLES=ON -DGLOW_WITH_CPU=ON
  ...

  $ninja ResNet50Bundle
  ...

  $./resnet50 cat.png
  Result: 285

Producing a bundle

It is possible to use the Glow library to produce bundles. On the CPU, the bundles are object files that can be linked into an executable. On other architectures, the bundle may look completely different.

This document demonstrates how to produce a bundle for the host CPU using the 'image-classifier' tool. We use the flag -emit-bundle to specify the output directory.

$image-classifier image.png -image-mode=0to1 -m=resnet50 -model-input-name=gpu_0/data -backend=CPU -emit-bundle build/

The command above compiles the neural network model described by the files init_net.pb and predict_net.pb located in the resnet50 model directory, and generates a bundle consisting of two files in the output directory (build/ in this case): <network_name>.o and <network_name>.weights.bin. Here <network_name> defaults to the last directory name in the model path (resnet50 in this case) and can be changed using -network-name=<network_name>. predict_net.pb describes the network model using the protobuf format of the ONNX or Caffe2 representation, and init_net.pb contains the weights used by the network, also in protobuf format.

The first generated file is named <network_name>.o and contains the compiled code of the network model. By default, this is a non-relocatable object file that can be linked with other files in your project. It is possible to control the relocation model with the command line option -relocation-model=<mode>.

This option supports two modes:

  • static: (Default) Produce non-relocatable code.
  • pic: Produce position independent code.
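
Position independent code is required, for example, when the bundle object file is linked into a shared library. Assuming the bundle was compiled with -relocation-model=pic, a hypothetical link step could look like:

$cc -shared resnet50.o -o libresnet50.so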

The second generated file is named <network_name>.weights.bin and contains the weights required to run the compiled model.

Another tool is model-compiler, which compiles a model into a bundle. This tool is more generic (it is not tied to image classification applications) and can compile models with any number of inputs. There is a difference when using this tool with ONNX versus Caffe2 models:

  • when using ONNX models the tool can automatically infer the inputs of the model, since the description of the input tensors is part of the model. We can use this tool simply as:
    $model-compiler -model=<onnx-model-path> -backend=CPU -emit-bundle=<bundle-dir>
    
  • when using Caffe2 models the user must explicitly provide the description of the input tensors (which is not part of the model) using the -model-input option:
    $model-compiler -model=<caffe2-model-path> -backend=CPU -emit-bundle=<bundle-dir> \
        -model-input=<inputName1>,<inputType1>,<inputShape1> \
        -model-input=<inputName2>,<inputType2>,<inputShape2> \
        ...
    
    For quantized types the format of the -model-input option is slightly different, since the scale and offset parameters must also be provided:
    -model-input=<name>,<type>,<scale>,<offset>,<shape>
    
    For example, we can provide one or more inputs with:
    -model-input=input_03_data,float,[1]
    -model-input=data_bias,int32,[1,32,32]
    -model-input=data,int8q,0.123,-13,[1,10]
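
    For instance, a complete invocation for a hypothetical Caffe2 model with a single float image input named data (the input name and shape are illustrative) could look like:
    $model-compiler -model=<caffe2-model-path> -backend=CPU -emit-bundle=build/ \
        -model-input=data,float,[1,3,224,224]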
    

For more information about the options of model-compiler, type:

$model-compiler -help

Cross-compile a bundle for a specific architecture

Since the CPU backend is based on LLVM, the Glow tools can be used to cross-compile bundles for different target architectures. To specify the target architecture you must use the -target and -mcpu flags (if no target flags are provided, the bundle is generated by default for the native architecture, that is, the one running Glow). For example, to cross-compile a bundle for the ARM Cortex-M7 architecture you must specify these extra flags:

-target=arm -mcpu=cortex-m7
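
For example, a complete command that cross-compiles an ONNX model into a Cortex-M7 bundle could look like this (the model path is a placeholder):

$model-compiler -model=<onnx-model-path> -backend=CPU -emit-bundle=<bundle-dir> -target=arm -mcpu=cortex-m7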

The bundle can be cross-compiled for any target architecture supported by LLVM. For the complete list of LLVM target architectures, run the llc -version command (assuming you have LLVM installed). For example, LLVM 8.0.1 has the following supported architectures:

LLVM (http://llvm.org/):
  LLVM version 8.0.1
  
  Optimized build.
  Default target: x86_64-pc-linux-gnu
  Host CPU: skylake

  Registered Targets:
    aarch64    - AArch64 (little endian)
    aarch64_be - AArch64 (big endian)
    amdgcn     - AMD GCN GPUs
    arm        - ARM
    arm64      - ARM64 (little endian)
    armeb      - ARM (big endian)
    avr        - Atmel AVR Microcontroller
    bpf        - BPF (host endian)
    bpfeb      - BPF (big endian)
    bpfel      - BPF (little endian)
    hexagon    - Hexagon
    lanai      - Lanai
    mips       - MIPS (32-bit big endian)
    mips64     - MIPS (64-bit big endian)
    mips64el   - MIPS (64-bit little endian)
    mipsel     - MIPS (32-bit little endian)
    msp430     - MSP430 [experimental]
    nvptx      - NVIDIA PTX 32-bit
    nvptx64    - NVIDIA PTX 64-bit
    ppc32      - PowerPC 32
    ppc64      - PowerPC 64
    ppc64le    - PowerPC 64 LE
    r600       - AMD GPUs HD2XXX-HD6XXX
    sparc      - Sparc
    sparcel    - Sparc LE
    sparcv9    - Sparc V9
    systemz    - SystemZ
    thumb      - Thumb
    thumbeb    - Thumb (big endian)
    wasm32     - WebAssembly 32-bit
    wasm64     - WebAssembly 64-bit
    x86        - 32-bit X86: Pentium-Pro and above
    x86-64     - 64-bit X86: EM64T and AMD64
    xcore      - XCore

Extra options

  • When cross-compiling bundles for some target architectures you might be interested in generating a bundle compatible with a given float ABI (Application Binary Interface) type (soft or hard). The LLVM backend can be instructed to generate an object file with a specific float ABI by using the option -float-abi=hard or -float-abi=soft.

  • When compiling the bundle it is useful to view the final form of the graph after all the transformations and optimizations performed by Glow (which might differ from the initial model). You can generate the graph's visual representation in .dot format by using the -dump-graph-DAG option like this:

    -dump-graph-DAG=graph.dot
    

    Additionally, you can convert the .dot file to .pdf format using the dot utility available on Linux like this:

    dot -Tpdf graph.dot -o graph.pdf
    

Bundle memory layout

The memory of a bundle is organized in three separate memory regions which must be allocated by the user application code and provided through the bundle interface:

  • constantWeight - contains the model constant weights. The user application must:

    • allocate this memory region (statically or dynamically)
    • initialize this memory region with the content of the generated weights file in one of two possible formats:
      • binary format (<network_name>.weights.bin), used to initialize this memory region (allocated statically or dynamically) by loading the binary file at run-time using standard C functions like fopen (a sketch of this follows the alignment example below).
      • text format (<network_name>.weights.txt), used to initialize this memory region (only if statically allocated) by including the text file at compile-time as a C array using the #include pre-processor directive. This format is suitable for target architectures that do not have a file system (for example microcontrollers).
    • provide the base address of this memory region to the inference function
  • mutableWeight - contains all the model inputs and outputs (graph placeholders). The tensors corresponding to different inputs and outputs are identified using offsets relative to the base address of this memory region. The user application must:

    • allocate this memory region (statically or dynamically)
    • initialize the model input tensors from this memory region with the desired input data before running the inference
    • provide the base address of this memory region to the inference function
    • read the model output tensors from this memory region after running the inference
  • activations - this memory region is scratch memory required by the bundle code to store the intermediate results of the graph computation (activations). The user application must:

    • allocate this memory region (statically or dynamically)
    • provide the base address of this memory region to the inference function

    This memory region does NOT need to be initialized.

The required sizes for all the memory regions described above are provided in the bundle interface. Additionally, all the memory regions must be allocated with a minimum alignment, which is also provided in the interface (typically 64 bytes). For example, to align a statically allocated buffer one can use the following C syntax:

__attribute__((aligned(64)))
uint8_t aligned_buffer[BUFFER_SIZE];
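
Building on the example above, here is a minimal sketch of filling such a statically allocated region from the binary weights file at run-time (the helper name is ours and the file path is supplied by the caller; error handling is kept minimal):

#include <stdio.h>

// Load the generated <network_name>.weights.bin file into the
// statically allocated buffer; returns 1 on success, 0 on failure.
int loadWeights(const char *path) {
  FILE *f = fopen(path, "rb");
  if (!f)
    return 0;
  size_t bytesRead = fread(aligned_buffer, 1, sizeof(aligned_buffer), f);
  fclose(f);
  return bytesRead == sizeof(aligned_buffer);
}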

Static bundle API

This is the default bundle API, obtained by generating the bundle with the option -bundle-api=static. Below is an example of what the auto-generated header file looks like for the LeNet MNIST model:

// Placeholder address offsets within mutable buffer (bytes)
#define LENET_MNIST_data        0
#define LENET_MNIST_softmax__1  3136

// Memory sizes (bytes)
#define LENET_MNIST_CONSTANT_MEM_SIZE     1724672
#define LENET_MNIST_MUTABLE_MEM_SIZE      3200
#define LENET_MNIST_ACTIVATIONS_MEM_SIZE  57600

// Memory alignment (bytes)
#define LENET_MNIST_MEM_ALIGN  64

// Bundle entry point (inference function)
void lenet_mnist(uint8_t *constantWeight, uint8_t *mutableWeight, uint8_t *activations);

The header file contains all the information required to run the bundle, defined in a static manner using macro defines:

  • the offsets of all the placeholders (graph inputs/outputs) within the mutableWeight memory
  • the sizes for all the memory regions
  • the alignment required for allocating the memory regions
  • the inference function prototype

All the definition names (the macros and the inference function) are prefixed with the model name, in this example lenet_mnist. If you want to change the model name you can use the command line option -network-name, for example -network-name=my_bundle.

The auto-generated header file also contains some extra defines to help with writing the user application code:

// Memory alignment definition with given alignment size
// for static allocation of memory.
#define GLOW_MEM_ALIGN(size)  __attribute__((aligned(size)))

// Macro function to get the absolute address of a
// placeholder using the base address of the mutable
// weight buffer and placeholder offset definition.
#define GLOW_GET_ADDR(mutableBaseAddr, placeholderOff)  (((uint8_t*)(mutableBaseAddr)) + placeholderOff)

For example, in order to allocate and initialize all the memory regions, you need to write the following in the user application (lenet_mnist.weights.txt is the file containing the model weights serialized as text):

GLOW_MEM_ALIGN(LENET_MNIST_MEM_ALIGN)
uint8_t constantWeight[LENET_MNIST_CONSTANT_MEM_SIZE] = {
#include "lenet_mnist.weights.txt"
};

GLOW_MEM_ALIGN(LENET_MNIST_MEM_ALIGN)
uint8_t mutableWeight[LENET_MNIST_MUTABLE_MEM_SIZE];

GLOW_MEM_ALIGN(LENET_MNIST_MEM_ALIGN)
uint8_t activations[LENET_MNIST_ACTIVATIONS_MEM_SIZE];

In order to obtain the absolute addresses of the model inputs/outputs you need to write the following in the user application:

uint8_t *inputAddr  = GLOW_GET_ADDR(mutableWeight, LENET_MNIST_data);
uint8_t *outputAddr = GLOW_GET_ADDR(mutableWeight, LENET_MNIST_softmax__1);
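
Putting it together, here is a minimal sketch of running one inference with the static API, using the regions and addresses defined above (filling the input with real data is elided, and the float output type is an assumption based on this model):

// Run a single inference using the statically allocated memory regions.
void runInference(void) {
  // ... write the pre-processed input data through inputAddr (elided) ...

  // Invoke the bundle entry point with the three memory regions.
  lenet_mnist(constantWeight, mutableWeight, activations);

  // The results can now be read through outputAddr, for example:
  // const float *probs = (const float *)outputAddr;
}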

Dynamic bundle API

This is the bundle API obtained by generating the bundle with the option -bundle-api=dynamic. Below is an example of what the auto-generated header file looks like for the ResNet50 model:

// Bundle memory configuration (memory layout)
extern BundleConfig resnet50_config;

// Bundle entry point (inference function)
void resnet50(uint8_t *constantWeight, uint8_t *mutableWeight, uint8_t *activations);

This API has all the information about the memory configuration encapsulated in a structure named <network_name>_config. The layout of this structure is defined by the type BundleConfig which is also included in the generated header file:

// Type describing the config of a generated bundle.
struct BundleConfig {
  // Size of the constant weight variables memory area.
  uint64_t constantWeightVarsMemSize;
  // Size of the mutable weight variables memory area.
  uint64_t mutableWeightVarsMemSize;
  // Size of the activations memory area.
  uint64_t activationsMemSize;
  // Alignment to be used for weights and activations.
  uint64_t alignment;
  // Number of symbols in the symbol table.
  uint64_t numSymbols;
  // Symbol table.
  const SymbolTableEntry *symbolTable;
};

Similar to the static API, this structure contains:

  • the sizes for all the memory regions
  • the alignment required for allocating all the memory regions
  • the number of symbols
  • the descriptions of all the symbols as an array of symbol entries

In this case the notion of symbol might include not only the model placeholders but also the model constant weights. Each symbol is described according to the SymbolTableEntry structure definition (also included in the header file):

// Type describing a symbol table entry of a generated bundle.
struct SymbolTableEntry {
  // Name of a variable.
  const char *name;
  // Offset of the variable inside the memory area.
  uint64_t offset;
  // The number of elements inside this variable.
  uint64_t size;
  // Variable kind: 1 if it is a mutable variable, 0 otherwise.
  char kind;
};

For each symbol the following information is registered:

  • the symbol name
  • the symbol kind: whether it is mutable (a placeholder) or not (a constant)
  • the size: the number of elements in the variable
  • the offset: if the symbol is mutable this is the offset of the variable within the mutableWeight buffer; otherwise it is the offset of the variable within the constantWeight buffer

The user has to look up the symbol entries at run-time to find the model variables (placeholders or constants) dynamically.
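
A minimal lookup sketch (the getSymbol helper is ours, not part of the generated API; it assumes the generated bundle header has been included):

#include <stdint.h>
#include <string.h>

// Search the bundle symbol table for a variable by name;
// returns NULL if no symbol with that name exists.
const SymbolTableEntry *getSymbol(const BundleConfig *config, const char *name) {
  for (uint64_t i = 0; i < config->numSymbols; i++) {
    if (!strcmp(config->symbolTable[i].name, name))
      return &config->symbolTable[i];
  }
  return NULL;
}

The returned entry's offset can then be added to the base address of the mutableWeight buffer (for a mutable symbol) or of the constantWeight buffer (for a constant) to obtain the variable's address.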

How to use the bundle

This section describes the use of the CPU bundle. Other targets may have different interfaces.

To integrate the artifacts generated by the image-classifier into your project, you generally need to do the following (a sketch of a minimal host program follows this list):

  • You need to link with the generated object file <network_name>.o.
  • You need to allocate the memory for the constant weights variables, mutable weights variables (i.e. inputs and outputs) and activations, based on the memory area sizes provided by <network_name>_config.
  • You need to load the content of the auto-generated <network_name>.weights.bin file into the constant weights variables memory area.
  • You need to initialize the mutable weights area with the inputs (e.g. image data).
  • Finally, you need to invoke the <network_name> function with three parameters: the base addresses of the memory areas for the constant weights variables, mutable weights variables, and activations.
  • After <network_name> has returned, you can find the results in the mutable weights variables area.
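
As a rough end-to-end illustration, a minimal host program using the dynamic bundle API might look like the sketch below (the header name, weights file name and input handling are illustrative, and most error handling is elided):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include "resnet50.h" // auto-generated bundle header (name is illustrative)

// Allocate one memory region with the alignment required by the bundle.
// aligned_alloc (C11) requires the size to be a multiple of the alignment.
static uint8_t *allocRegion(uint64_t size, uint64_t align) {
  uint64_t paddedSize = (size + align - 1) / align * align;
  return (uint8_t *)aligned_alloc(align, paddedSize);
}

int main(void) {
  uint64_t align = resnet50_config.alignment;
  uint8_t *constantWeight =
      allocRegion(resnet50_config.constantWeightVarsMemSize, align);
  uint8_t *mutableWeight =
      allocRegion(resnet50_config.mutableWeightVarsMemSize, align);
  uint8_t *activations =
      allocRegion(resnet50_config.activationsMemSize, align);

  // Load the serialized weights into the constant region.
  FILE *f = fopen("resnet50.weights.bin", "rb");
  fread(constantWeight, 1, resnet50_config.constantWeightVarsMemSize, f);
  fclose(f);

  // ... write the pre-processed input into the mutable region at the
  // offset of the input placeholder (found via the symbol table) ...

  // Run the inference.
  resnet50(constantWeight, mutableWeight, activations);

  // ... read the output tensor from the mutable region ...

  free(constantWeight);
  free(mutableWeight);
  free(activations);
  return 0;
}

The program is then compiled and linked with the generated object file, for example with cc main.c resnet50.o -o resnet50.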

A step-by-step example of the Resnet50 network model

Concrete examples of integrating a network model into a project are located in the examples/bundles/ directory of the Glow repository. You can enable the compilation of these bundles by invoking cmake with -DGLOW_WITH_BUNDLES=ON -DGLOW_WITH_CPU=ON.

Floating point network

To build and run the example, you just need to execute:

  • cmake -G Ninja <other cmake flags> -DGLOW_WITH_BUNDLES=ON -DGLOW_WITH_CPU=ON
  • ninja RunResNet50Bundle

The CMakeLists.txt provides the following targets:

  • ResNet50BundleNetFiles: it downloads the Resnet50 network model in the Caffe2 format.
  • ResNet50BundleNet: it generates the bundle files using the Glow image-classifier as described above. The concrete command line looks like this:
    image-classifier tests/images/imagenet/cat_285.png -image-mode=0to1 -m=resnet50 -model-input-name=gpu_0/data -backend=CPU -emit-bundle <build_dir>
    It reads the network model from resnet50 and generates the resnet50.o and resnet50.weights.bin files in the <build_dir> directory.
  • ResNet50BundleMain: it compiles main.cpp, the main file of the project. This source file gives a good idea of how to interface with an auto-generated bundle:
    • It allocates the memory areas based on their memory sizes provided in resnet50_config.
    • Then it loads the weights from the auto-generated resnet50.weights.bin file.
    • It loads the input image, pre-processes it and puts it into the mutable weight variables memory area.
    • Once everything is set up, it invokes the compiled network model by calling the resnet50 function from the resnet50.o object file.
  • ResNet50Bundle: it links the user-defined main.o and the auto-generated resnet50.o into a standalone executable file called resnet50.

Quantized network

All of the aforementioned targets have quantized versions in CMakeLists.txt, named QuantizedResNet50BundleNet and QuantizedResNet50Bundle.

This run performs almost the same steps as the non-quantized ResNet50 version, except that it emits the bundle based on a quantization profile:

image-classifier tests/images/imagenet/cat_285.png -image-mode=0to1 -m=resnet50 -model-input-name=gpu_0/data -load-profile=profile.yml -backend=CPU -emit-bundle build

The profile.yml itself is captured at a prior step by executing image-classifier with the -dump-profile option:

image-classifier tests/images/imagenet/*.png -image-mode=0to1 -m=resnet50 -model-input-name=gpu_0/data -dump-profile=profile.yml

See the CMakeLists.txt for details.