(trivial) Change bundle weights file names to ".weights.bin" / ".weights.txt" (pytorch#3721)

Summary:
**Summary**
The bundle generates the weights in 2 files with 2 formats:
- binary: the file name was "<bundle_name>.weights"
- text (C buffer array format, intended to be included in C code): the file name was "<bundle_name>.inc"

I changed the generated file names for more clarity:
- <bundle_name>.weights.bin - for the binary format
- <bundle_name>.weights.txt - for the text format

**Documentation**
Small updates.

**Test Plan**
None
Pull Request resolved: pytorch#3721

Reviewed By: shajrawi

Differential Revision: D18299883

Pulled By: opti-mix

fbshipit-source-id: 1f736a6168b6e342ecbb021f82eeddfb029b2e50
mciprian13 authored and facebook-github-bot committed Nov 4, 2019
1 parent 57a4427 commit 1c6782f
Showing 5 changed files with 259 additions and 41 deletions.
278 changes: 247 additions & 31 deletions docs/AOT.md
@@ -42,7 +42,7 @@ $image-classifier image.png -image-mode=0to1 -m=resnet50 -model-input-name=gpu_0
The command above would compile the neural network model described by the files
`init_net.pb` and `predict_net.pb` located in the `network_model_directory_name`
directory and generate a bundle consisting of two files in the directory
`output_directory_name`, `<network_name>.o` and `<network_name>.weights.bin`, where
`<network_name>` by default equals the name of the last directory in the model path,
i.e., `resnet50` in this case, and can be changed using
`-network-name=<network_name>`.
@@ -59,7 +59,7 @@ This option supports two modes:
- `static`: (Default) Produce non-relocatable code.
- `pic`: Produce position independent code.

The second generated file is named `<network_name>.weights.bin` and
contains the weights required to run the compiled model.

Another tool is the `model-compiler` which is used to compile a model into a bundle.
@@ -96,29 +96,232 @@ For more information about the options of the model-compiler type:
$model-compiler -help
```

## Cross-compile a bundle for a specific architecture

Since the CPU backend is based on LLVM, the Glow tools can be used to
cross-compile bundles for different target architectures. To specify
the target architecture you must use the `-target` and `-mcpu` flags
(if no target flags are provided, the bundle is generated by default
for the native architecture, i.e. the one running Glow). For example,
to cross-compile a bundle for the ARM Cortex-M7 architecture you must
specify these extra flags:
```
-target=arm -mcpu=cortex-m7
```

The bundle can be cross-compiled for any target architecture supported by
LLVM. For the complete list of LLVM target architectures you can run the
`llc -version` command on Linux (assuming you have LLVM installed). For
example, LLVM 8.0.1 has the following registered targets:

```
LLVM (http://llvm.org/):
LLVM version 8.0.1
Optimized build.
Default target: x86_64-pc-linux-gnu
Host CPU: skylake
Registered Targets:
aarch64 - AArch64 (little endian)
aarch64_be - AArch64 (big endian)
amdgcn - AMD GCN GPUs
arm - ARM
arm64 - ARM64 (little endian)
armeb - ARM (big endian)
avr - Atmel AVR Microcontroller
bpf - BPF (host endian)
bpfeb - BPF (big endian)
bpfel - BPF (little endian)
hexagon - Hexagon
lanai - Lanai
mips - MIPS (32-bit big endian)
mips64 - MIPS (64-bit big endian)
mips64el - MIPS (64-bit little endian)
mipsel - MIPS (32-bit little endian)
msp430 - MSP430 [experimental]
nvptx - NVIDIA PTX 32-bit
nvptx64 - NVIDIA PTX 64-bit
ppc32 - PowerPC 32
ppc64 - PowerPC 64
ppc64le - PowerPC 64 LE
r600 - AMD GPUs HD2XXX-HD6XXX
sparc - Sparc
sparcel - Sparc LE
sparcv9 - Sparc V9
systemz - SystemZ
thumb - Thumb
thumbeb - Thumb (big endian)
wasm32 - WebAssembly 32-bit
wasm64 - WebAssembly 64-bit
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
xcore - XCore
```

## Extra options

- When cross-compiling bundles for some target architectures you might
be interested in generating a bundle compatible with a given float ABI
(Application Binary Interface) type (*soft* or *hard*). The LLVM backend
can be instructed to generate an object file using a specific float ABI
by using the option `-float-abi=hard` or `-float-abi=soft`.

- When compiling the bundle it is useful to view the final form of the
graph after all the transformations and optimizations performed by Glow
(which might differ from the initial model). You can generate the graph
visual representation in *.dot* format by using the `-dump-graph-DAG`
option, like this:
```
-dump-graph-DAG=graph.dot
```
Additionally, you can convert the *.dot* file to *.pdf* format using the
*dot* utility available on Linux like this:
```
dot -Tpdf graph.dot -o graph.pdf
```

## Bundle memory layout

The memory of a bundle is organized in three separate memory regions which must be
allocated by the user application code and provided through the bundle interface:

- `constantWeight` - contains the model constant weights. The user application must:
- allocate this memory region (statically or dynamically)
- initialize this memory region with the content of the generated weights file in
one of two possible formats:
- binary format (`<network_name>.weights.bin`) used to initialize this memory
region (allocated statically or dynamically) by loading the binary file
dynamically at run-time using standard C functions like **fopen** (see the sketch at the end of this section).
- text format (`<network_name>.weights.txt`) used to initialize this memory
region (only if statically allocated) by including the text file statically
at compile-time as a C array using the **#include** pre-processor directive.
This format is suitable for target architectures which do not have file systems
(for example microcontrollers).
- provide the base address of this memory region to the inference function

- `mutableWeight` - contains all the model inputs and outputs (graph placeholders).
The tensors corresponding to different inputs and outputs are identified using offsets
relative to the base address of this memory region. The user application must:
- allocate this memory region (statically or dynamically)
- initialize the model input tensors from this memory region with the desired input
data before running the inference
- provide the base address of this memory region to the inference function
- read the model output tensors from this memory region after running the inference

- `activations` - this memory region is a scratch memory required for the bundle code
to store the intermediate results of the graph computation (activations). The user
application must:
- allocate this memory region (statically or dynamically)
- provide the base address of this memory region to the inference function
- this memory region is NOT required to be initialized

The required sizes for all the memory regions described above are provided in the bundle
interface. Also, all the memory regions must be allocated with a minimum alignment, which
is also provided in the interface (typically 64 bytes). For example, for aligning a
statically allocated buffer one can use the following C syntax:

```c++
__attribute__((aligned(64)))
uint8_t aligned_buffer[BUFFER_SIZE];
```
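
For the binary weights format, a minimal sketch of loading `<network_name>.weights.bin`
into the `constantWeight` region at run-time could look like this (the function name
and error handling are illustrative only and not part of the generated bundle API):

```c++
#include <stdint.h>
#include <stdio.h>

// Read the generated "<network_name>.weights.bin" file into a pre-allocated
// constant weights buffer. Returns 0 on success, -1 on failure.
static int loadConstantWeights(const char *fileName, uint8_t *constantWeight,
                               size_t constantWeightSize) {
  FILE *file = fopen(fileName, "rb");
  if (!file) {
    return -1;
  }
  size_t bytesRead = fread(constantWeight, 1, constantWeightSize, file);
  fclose(file);
  return (bytesRead == constantWeightSize) ? 0 : -1;
}
```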
## Static bundle API
This is the default bundle API obtained by generating the bundle with the option
`-bundle-api=static`. Below is an example of what the auto-generated header file
looks like for the LeNet MNIST model:
```c++
// Placeholder address offsets within mutable buffer (bytes)
#define LENET_MNIST_data 0
#define LENET_MNIST_softmax__1 3136
// Memory sizes (bytes)
#define LENET_MNIST_CONSTANT_MEM_SIZE 1724672
#define LENET_MNIST_MUTABLE_MEM_SIZE 3200
#define LENET_MNIST_ACTIVATIONS_MEM_SIZE 57600
// Memory alignment (bytes)
#define LENET_MNIST_MEM_ALIGN 64
// Bundle entry point (inference function)
void lenet_mnist(uint8_t *constantWeight, uint8_t *mutableWeight, uint8_t *activations);
```

The header file contains all the information required to run the bundle,
defined in a static manner using macro defines:
- the offsets of all the placeholders (graph inputs/outputs) within the
`mutableWeight` memory
- the sizes for all the memory regions
- the alignment required for allocating the memory regions
- the inference function prototype

All the definition names (the macros and the inference function) are prefixed
with the model name, in this example with *lenet_mnist*. If you want to change
the model name you can use the command line option `-network-name`, for example
`-network-name=my_bundle`.

The auto-generated header file also contains some extra defines to
help with writing the user application code:

```c++
// Memory alignment definition with given alignment size
// for static allocation of memory.
#define GLOW_MEM_ALIGN(size) __attribute__((aligned(size)))

// Macro function to get the absolute address of a
// placeholder using the base address of the mutable
// weight buffer and placeholder offset definition.
#define GLOW_GET_ADDR(mutableBaseAddr, placeholderOff) (((uint8_t*)(mutableBaseAddr)) + placeholderOff)
```
For example, in order to allocate and initialize all the memory regions, you need
to write the following in the user application (*lenet_mnist.weights.txt* is the
file containing the model weights serialized as text):
```c++
GLOW_MEM_ALIGN(LENET_MNIST_MEM_ALIGN)
uint8_t constantWeight[LENET_MNIST_CONSTANT_MEM_SIZE] = {
#include "lenet_mnist.weights.txt"
};
GLOW_MEM_ALIGN(LENET_MNIST_MEM_ALIGN)
uint8_t mutableWeight[LENET_MNIST_MUTABLE_MEM_SIZE];
GLOW_MEM_ALIGN(LENET_MNIST_MEM_ALIGN)
uint8_t activations[LENET_MNIST_ACTIVATIONS_MEM_SIZE];
```

In order to obtain the absolute addresses of the model inputs/outputs
you need to write the following in the user application:

```c++
uint8_t *inputAddr = GLOW_GET_ADDR(mutableWeight, LENET_MNIST_data);
uint8_t *outputAddr = GLOW_GET_ADDR(mutableWeight, LENET_MNIST_softmax__1);
```
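
With the memory regions allocated and the input/output addresses resolved as shown
above, running the model reduces to a single call to the generated inference
function. This is only a sketch; the output element type and layout assumed below
are model-specific:

```c++
// Run the inference: the bundle entry point takes the base addresses of the
// three memory regions allocated above.
lenet_mnist(constantWeight, mutableWeight, activations);

// After the call returns, the results are available in the output tensor,
// e.g. as float scores at outputAddr (exact type/layout depends on the model).
const float *scores = (const float *)outputAddr;
```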

## Dynamic bundle API

This is the bundle API obtained by generating the bundle with the option
`-bundle-api=dynamic`. Below is an example of what the auto-generated header
file looks like for the ResNet50 model:

```c++
extern "C" void network_name(uint8_t *constantWeightVars,
uint8_t *mutableWeightVars,
uint8_t *activations);
// Bundle memory configuration (memory layout)
extern BundleConfig resnet50_config;

// Bundle entry point (inference function)
void resnet50(uint8_t *constantWeight, uint8_t *mutableWeight, uint8_t *activations);
```
This API has all the information about the memory configuration encapsulated
in a structure named `<network_name>_config`. The layout of this structure is
defined by the type `BundleConfig` which is also included in the generated
header file:
```c++
// Type describing the config of a generated bundle.
struct BundleConfig {
// Size of the constant weight variables memory area.
uint64_t constantWeightVarsMemSize;
@@ -134,29 +337,42 @@ struct BundleConfig {
const SymbolTableEntry *symbolTable;
};
```
Similar to the static API, this structure contains:
- the sizes for all the memory regions
- the alignment required for allocating all the memory regions
- the number of symbols
- the descriptions of all the symbols as an array of symbol entries

In this case the notion of *symbol* might include not only the model
placeholders but also the model constant weights. Each symbol is
described according to the `SymbolTableEntry` structure definition
(included also in the header file):

```c++
// Type describing a symbol table entry of a generated bundle.
struct SymbolTableEntry {
// Name of a variable.
const char *name;
// Offset of the variable inside the memory area.
uint64_t offset;
// The number of elements inside this variable.
uint64_t size;
// Variable kind: 1 if it is a mutable variable, 0 otherwise.
char kind;
};
```
For each symbol the following information is registered:
- the symbol name
- the symbol kind: whether it is mutable (a placeholder) or not (a constant)
- the size in bytes
- the offset: if the symbol is mutable this is the offset of the variable
within the `mutableWeight` buffer, otherwise this is the offset of the
variable within the `constantWeight` buffer
The user has to look up the symbol entries to find the model variables
(placeholders or constants) at run-time (dynamically).
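
As an illustration, a run-time lookup over the symbol table could be written as
follows. This is only a sketch: it assumes the `BundleConfig` field holding the
number of entries is named `numSymbols` (that field is not visible in the excerpt
above), and the caller passes the base addresses of the two weight regions:

```c++
#include <cstdint>
#include <cstring>

// Find a symbol by name and return its absolute address, or nullptr if the
// symbol is not present in the bundle's symbol table.
static uint8_t *getSymbolAddress(const BundleConfig &config, const char *name,
                                 uint8_t *constantWeight,
                                 uint8_t *mutableWeight) {
  for (uint64_t i = 0; i < config.numSymbols; i++) {
    const SymbolTableEntry &entry = config.symbolTable[i];
    if (std::strcmp(entry.name, name) == 0) {
      // kind == 1 -> mutable variable (placeholder), kind == 0 -> constant.
      uint8_t *base = entry.kind ? mutableWeight : constantWeight;
      return base + entry.offset;
    }
  }
  return nullptr;
}
```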
## How to use the bundle
@@ -169,7 +385,7 @@ generally need to do the following:
* You need to allocate the memory for constant weights variables,
mutable weights variables (i.e. inputs and outputs) and activations based on the
memory area sizes provided by `<network_name>_config`.
* You need to load the content of the auto-generated `network_model_name.weights.bin`
file into the constant weights variables memory area.
* You need to initialize the mutable weights area with inputs (e.g. image data).
* And finally, you need to invoke the `<network_name>` function with 3
@@ -193,12 +409,12 @@ The CMakeLists.txt provides the following targets:
The concrete command line looks like this:
`image-classifier tests/images/imagenet/cat_285.png -image-mode=0to1 -m=resnet50 -model-input-name=gpu_0/data -backend=CPU -emit-bundle <build_dir>`
It reads the network model from `resnet50` and generates the `resnet50.o`
and `resnet50.weights.bin` files into the `build_dir` directory.
* `ResNet50BundleMain`: it compiles the `main.cpp` file, which is the main file of the project.
This source file gives a good idea about how to interface with an auto-generated bundle:
* It allocates the memory areas based on the memory sizes provided in `resnet50_config`.
* Then it loads the weights from the auto-generated `resnet50.weights.bin` file.
* It loads the input image, pre-processes it and puts it into the mutable weight variables
memory area.
* Once everything is set up, it invokes the compiled network model by calling the
2 changes: 1 addition & 1 deletion examples/bundles/lenet_mnist/main.cpp
@@ -207,7 +207,7 @@ void parseCommandLineOptions(int argc, char **argv) {
/// initialize.
GLOW_MEM_ALIGN(LENET_MNIST_MEM_ALIGN)
uint8_t constantWeight[LENET_MNIST_CONSTANT_MEM_SIZE] = {
#include "lenet_mnist.inc"
#include "lenet_mnist.weights.txt"
};

/// Statically allocate memory for mutable weights (model input/output data).
4 changes: 2 additions & 2 deletions examples/bundles/resnet50/CMakeLists.txt
@@ -37,7 +37,7 @@ add_custom_command(
COMMAND
image-classifier ${IMAGES}/dog_207.png -g -image-mode=0to1
-m=${RESNET50_BUNDLE_DIR}/resnet50 -model-input-name=${MODEL_INPUT_NAME}
-backend=CPU -emit-bundle ${BUNDLE_OUTPUT_DIRECTORY} -bundle-api=dynamic
DEPENDS
image-classifier ResNet50BundleDir
)
@@ -63,7 +63,7 @@ add_custom_command(
COMMAND
image-classifier ${IMAGES}/dog_207.png -g -i=0to1 -load-profile=profile.yml -assert-all-nodes-quantized -keep-original-precision-for-nodes=SoftMax
-m=${RESNET50_BUNDLE_DIR}/resnet50 -model-input-name=${MODEL_INPUT_NAME}
-backend=CPU -emit-bundle ${QUANTIZED_BUNDLE_OUTPUT_DIRECTORY} -bundle-api=dynamic
DEPENDS
image-classifier ResNet50BundleDir
)
2 changes: 1 addition & 1 deletion examples/bundles/resnet50/main.cpp
@@ -343,7 +343,7 @@ int main(int argc, char **argv) {
parseCommandLineOptions(argc, argv);
// Allocate and initialize constant and mutable weights.
uint8_t *constantWeightVarsAddr =
initConstantWeights("resnet50.weights", resnet50_config);
initConstantWeights("resnet50.weights.bin", resnet50_config);
uint8_t *mutableWeightVarsAddr = initMutableWeightVars(resnet50_config);
uint8_t *activationsAddr = initActivations(resnet50_config);

