(trivial) Change bundle weights file names to ".weights.bin" / ".weights.txt" (pytorch#3721)

Summary:
**Summary**
The bundle generates the weights in 2 files with 2 formats:
- binary: the file name was "<bundle_name>.weights"
- text (C buffer array format, intended to be included in C code): the file name was "<bundle_name>.inc"

I changed the generated file names for more clarity:
- <bundle_name>.weights.bin - for the binary format
- <bundle_name>.weights.txt - for the text format

**Documentation**
Small updates.

**Test Plan**
None
Pull Request resolved: pytorch#3721

Reviewed By: shajrawi

Differential Revision: D18299883

Pulled By: opti-mix

fbshipit-source-id: 1f736a6168b6e342ecbb021f82eeddfb029b2e50
mciprian13 authored and facebook-github-bot committed Nov 4, 2019
1 parent 57a4427 commit 1c6782f
Showing 5 changed files with 259 additions and 41 deletions.
278 changes: 247 additions & 31 deletions docs/AOT.md
@@ -42,7 +42,7 @@ $image-classifier image.png -image-mode=0to1 -m=resnet50 -model-input-name=gpu_0
The command above would compile the neural network model described by the files
`init_net.pb` and `predict_net.pb` located in the `network_model_directory_name`
directory and generate a bundle consisting of two files in the directory
`output_directory_name`, `<network_name>.o` and `<network_name>.weights.bin`, where
`<network_name>` by default equals the name of the last directory in the model path,
i.e., `resnet50` in this case, and can be changed using
`-network-name=<network_name>`.
@@ -59,7 +59,7 @@ This option supports two modes:
- `static`: (Default) Produce non-relocatable code.
- `pic`: Produce position independent code.

The second generated file is named `<network_name>.weights.bin` and
contains the weights required to run the compiled model.

Another tool is the `model-compiler` which is used to compile a model into a bundle.
@@ -96,29 +96,232 @@ For more information about the options of the model-compiler type:
$model-compiler -help
```

## Cross-compile a bundle for a specific architecture

Since the CPU backend is based on LLVM, the Glow tools can be used to
cross-compile bundles for different target architectures. To specify
the target architecture you must use the `-target` and `-mcpu` flags
(if no target flags are provided, the bundle is generated by default
for the native architecture, i.e. the one running Glow). For example,
to cross-compile a bundle for the ARM Cortex-M7 architecture you must
specify these extra flags:
```
-target=arm -mcpu=cortex-m7
```

The bundle can be cross-compiled for any target architecture supported by
LLVM. For the complete list of LLVM target architectures you can run the
`llc -version` command on Linux (assuming you have LLVM installed). For
example, LLVM 8.0.1 has the following registered targets:

```
LLVM (http://llvm.org/):
LLVM version 8.0.1
Optimized build.
Default target: x86_64-pc-linux-gnu
Host CPU: skylake
Registered Targets:
aarch64 - AArch64 (little endian)
aarch64_be - AArch64 (big endian)
amdgcn - AMD GCN GPUs
arm - ARM
arm64 - ARM64 (little endian)
armeb - ARM (big endian)
avr - Atmel AVR Microcontroller
bpf - BPF (host endian)
bpfeb - BPF (big endian)
bpfel - BPF (little endian)
hexagon - Hexagon
lanai - Lanai
mips - MIPS (32-bit big endian)
mips64 - MIPS (64-bit big endian)
mips64el - MIPS (64-bit little endian)
mipsel - MIPS (32-bit little endian)
msp430 - MSP430 [experimental]
nvptx - NVIDIA PTX 32-bit
nvptx64 - NVIDIA PTX 64-bit
ppc32 - PowerPC 32
ppc64 - PowerPC 64
ppc64le - PowerPC 64 LE
r600 - AMD GPUs HD2XXX-HD6XXX
sparc - Sparc
sparcel - Sparc LE
sparcv9 - Sparc V9
systemz - SystemZ
thumb - Thumb
thumbeb - Thumb (big endian)
wasm32 - WebAssembly 32-bit
wasm64 - WebAssembly 64-bit
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
xcore - XCore
```

## Extra options

- When cross-compiling bundles for some target architectures you might
be interested in generating a bundle compatible with a given float ABI
(Application Binary Interface) type (*soft* or *hard*). The LLVM backend
can be instructed to generate an object file using a specific float ABI
by using the option `-float-abi=hard` or `-float-abi=soft`.

- When compiling the bundle it is useful to view the final form of the
graph after all the transformations and optimizations performed by Glow
(which might differ from the initial model). You can generate the graph
visual representation in *.dot* format by using the `-dump-graph-DAG`
option, like this:
```
-dump-graph-DAG=graph.dot
```
Additionally, you can convert the *.dot* file to *.pdf* format using the
*dot* utility available on Linux like this:
```
dot -Tpdf graph.dot -o graph.pdf
```

## Bundle memory layout

The memory of a bundle is organized in three separate memory regions which must be
allocated by the user application code and provided through the bundle interface:

- `constantWeight` - contains the model constant weights. The user application must:
- allocate this memory region (statically or dynamically)
- initialize this memory region with the content of the generated weights file in
one of two possible formats:
- binary format (`<network_name>.weights.bin`) used to initialize this memory
region (allocated statically or dynamically) by loading the binary file
dynamically at run-time using standard C functions like **fopen** (see the sketch at the end of this section).
- text format (`<network_name>.weights.txt`) used to initialize this memory
region (only if statically allocated) by including the text file statically
at compile-time as a C array using the **#include** pre-processor directive.
This format is suitable for target architectures which do not have file systems
(for example microcontrollers).
- provide the base address of this memory region to the inference function

- `mutableWeight` - contains all the model inputs and outputs (graph placeholders).
The tensors corresponding to different inputs and outputs are identified using offsets
relative to the base address of this memory region. The user application must:
- allocate this memory region (statically or dynamically)
- initialize the model input tensors from this memory region with the desired input
data before running the inference
- provide the base address of this memory region to the inference function
- read the model output tensors from this memory region after running the inference

- `activations` - this memory region is a scratch memory required for the bundle code
to store the intermediate results of the graph computation (activations). The user
application must:
- allocate this memory region (statically or dynamically)
- provide the base address of this memory region to the inference function
- this memory region is NOT required to be initialized

The required sizes for all the memory regions described above are provided in the bundle
interface. Also, all the memory regions must be allocated with a minimum alignment, which
is also provided in the interface (typically 64 bytes). For example, for aligning a
statically allocated buffer one can use the following C syntax:

```c++
__attribute__((aligned(64)))
uint8_t aligned_buffer[BUFFER_SIZE];
```
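
For the binary weights format, a minimal sketch of loading `<network_name>.weights.bin`
into the `constantWeight` region at run-time could look like this (the function name
and error handling are illustrative only and not part of the generated bundle API):

```c++
#include <stdint.h>
#include <stdio.h>

// Read the generated "<network_name>.weights.bin" file into a pre-allocated
// constant weights buffer. Returns 0 on success, -1 on failure.
static int loadConstantWeights(const char *fileName, uint8_t *constantWeight,
                               size_t constantWeightSize) {
  FILE *file = fopen(fileName, "rb");
  if (!file) {
    return -1;
  }
  size_t bytesRead = fread(constantWeight, 1, constantWeightSize, file);
  fclose(file);
  return (bytesRead == constantWeightSize) ? 0 : -1;
}
```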
## Static bundle API
This is the default bundle API obtained by generating the bundle with the option
`-bundle-api=static`. Below is an example of what the auto-generated header file
looks like for the LeNet MNIST model:
```c++
// Placeholder address offsets within mutable buffer (bytes)
#define LENET_MNIST_data 0
#define LENET_MNIST_softmax__1 3136
// Memory sizes (bytes)
#define LENET_MNIST_CONSTANT_MEM_SIZE 1724672
#define LENET_MNIST_MUTABLE_MEM_SIZE 3200
#define LENET_MNIST_ACTIVATIONS_MEM_SIZE 57600
// Memory alignment (bytes)
#define LENET_MNIST_MEM_ALIGN 64
// Bundle entry point (inference function)
void lenet_mnist(uint8_t *constantWeight, uint8_t *mutableWeight, uint8_t *activations);
```

The header file contains all the information required to run the bundle,
defined in a static manner using macro defines:
- the offsets of all the placeholders (graph inputs/outputs) within the
`mutableWeight` memory
- the sizes for all the memory regions
- the alignment required for allocating the memory regions
- the inference function prototype

All the definition names (the macros and the inference function) are prefixed
with the model name, in this example with *lenet_mnist*. If you want to change
the model name you can use the command line option `-network-name`, for example
`-network-name=my_bundle`.

The auto-generated header file also contains some extra defines to
help with writing the user application code:

```c++
// Memory alignment definition with given alignment size
// for static allocation of memory.
#define GLOW_MEM_ALIGN(size) __attribute__((aligned(size)))

// Macro function to get the absolute address of a
// placeholder using the base address of the mutable
// weight buffer and placeholder offset definition.
#define GLOW_GET_ADDR(mutableBaseAddr, placeholderOff) (((uint8_t*)(mutableBaseAddr)) + placeholderOff)
```
For example, in order to allocate and initialize all the memory regions, you need
to write the following in the user application (*lenet_mnist.weights.txt* is the
file containing the model weights serialized as text):
```c++
GLOW_MEM_ALIGN(LENET_MNIST_MEM_ALIGN)
uint8_t constantWeight[LENET_MNIST_CONSTANT_MEM_SIZE] = {
#include "lenet_mnist.weights.txt"
};
GLOW_MEM_ALIGN(LENET_MNIST_MEM_ALIGN)
uint8_t mutableWeight[LENET_MNIST_MUTABLE_MEM_SIZE];
GLOW_MEM_ALIGN(LENET_MNIST_MEM_ALIGN)
uint8_t activations[LENET_MNIST_ACTIVATIONS_MEM_SIZE];
```

In order to obtain the absolute addresses of the model inputs/outputs
you need to write the following in the user application:

```c++
uint8_t *inputAddr = GLOW_GET_ADDR(mutableWeight, LENET_MNIST_data);
uint8_t *outputAddr = GLOW_GET_ADDR(mutableWeight, LENET_MNIST_softmax__1);
```
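
With the memory regions allocated and the input/output addresses resolved as shown
above, running the model reduces to a single call to the generated inference
function. This is only a sketch; the output element type and layout assumed below
are model-specific:

```c++
// Run the inference: the bundle entry point takes the base addresses of the
// three memory regions allocated above.
lenet_mnist(constantWeight, mutableWeight, activations);

// After the call returns, the results are available in the output tensor,
// e.g. as float scores at outputAddr (exact type/layout depends on the model).
const float *scores = (const float *)outputAddr;
```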

## Dynamic bundle API

This is the bundle API obtained by generating the bundle with the option
`-bundle-api=dynamic`. Below is an example of what the auto-generated header
file looks like for the ResNet50 model:

```c++
extern "C" void network_name(uint8_t *constantWeightVars,
uint8_t *mutableWeightVars,
uint8_t *activations);
// Bundle memory configuration (memory layout)
extern BundleConfig resnet50_config;

// Bundle entry point (inference function)
void resnet50(uint8_t *constantWeight, uint8_t *mutableWeight, uint8_t *activations);
```
This API has all the information about the memory configuration encapsulated
in a structure named `<network_name>_config`. The layout of this structure is
defined by the type `BundleConfig` which is also included in the generated
header file:
```c++
// Type describing the config of a generated bundle.
struct BundleConfig {
// Size of the constant weight variables memory area.
uint64_t constantWeightVarsMemSize;
@@ -134,29 +337,42 @@ struct BundleConfig {
const SymbolTableEntry *symbolTable;
};
```
Similar to the static API, this structure contains:
- the sizes for all the memory regions
- the alignment required for allocating all the memory regions
- the number of symbols
- the descriptions of all the symbols as an array of symbol entries

In this case the notion of *symbol* might include not only the model
placeholders but also the model constant weights. Each symbol is
described according to the `SymbolTableEntry` structure definition
(included also in the header file):

```c++
// Type describing a symbol table entry of a generated bundle.
struct SymbolTableEntry {
// Name of a variable.
const char *name;
// Offset of the variable inside the memory area.
uint64_t offset;
// The number of elements inside this variable.
uint64_t size;
// Variable kind: 1 if it is a mutable variable, 0 otherwise.
char kind;
};
```
For each symbol the following information is registered:
- the symbol name
- the symbol kind: whether it is mutable (a placeholder) or not (a constant)
- the size in bytes
- the offset: if the symbol is mutable this is the offset of the variable
within the `mutableWeight` buffer, otherwise this is the offset of the
variable within the `constantWeight` buffer
The user has to look up the symbol entries to find the model variables
(placeholders or constants) at run-time (dynamically).
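
As an illustration, a run-time lookup over the symbol table could be written as
follows. This is only a sketch: it assumes the `BundleConfig` field holding the
number of entries is named `numSymbols` (that field is not visible in the excerpt
above), and the caller passes the base addresses of the two weight regions:

```c++
#include <cstdint>
#include <cstring>

// Find a symbol by name and return its absolute address, or nullptr if the
// symbol is not present in the bundle's symbol table.
static uint8_t *getSymbolAddress(const BundleConfig &config, const char *name,
                                 uint8_t *constantWeight,
                                 uint8_t *mutableWeight) {
  for (uint64_t i = 0; i < config.numSymbols; i++) {
    const SymbolTableEntry &entry = config.symbolTable[i];
    if (std::strcmp(entry.name, name) == 0) {
      // kind == 1 -> mutable variable (placeholder), kind == 0 -> constant.
      uint8_t *base = entry.kind ? mutableWeight : constantWeight;
      return base + entry.offset;
    }
  }
  return nullptr;
}
```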
## How to use the bundle
@@ -169,7 +385,7 @@ generally need to do the following:
* You need to allocate the memory for constant weights variables,
mutable weights variables (i.e. inputs and outputs) and activations based on the
memory area sizes provided by `<network_name>_config`.
* You need to load the content of the auto-generated `network_model_name.weights.bin`
file into the constant weights variables memory area.
* You need to initialize the mutable weights area with inputs (e.g. image data).
* And finally, you need to invoke the `<network_name>` function with 3
@@ -193,12 +409,12 @@ The CMakeLists.txt provides the following targets:
The concrete command line looks like this:
`image-classifier tests/images/imagenet/cat_285.png -image-mode=0to1 -m=resnet50 -model-input-name=gpu_0/data -backend=CPU -emit-bundle <build_dir>`
It reads the network model from `resnet50` and generates the `resnet50.o`
and `resnet50.weights.bin` files into the `build_dir` directory.
* `ResNet50BundleMain`: it compiles the `main.cpp` file, which is the main file of the project.
This source file gives a good idea about how to interface with an auto-generated bundle:
* It allocates the memory areas based on the memory sizes provided in `resnet50_config`.
* Then it loads the weights from the auto-generated `resnet50.weights.bin` file.
* It loads the input image, pre-processes it and puts it into the mutable weight variables
memory area.
* Once everything is set up, it invokes the compiled network model by calling the
2 changes: 1 addition & 1 deletion examples/bundles/lenet_mnist/main.cpp
@@ -207,7 +207,7 @@ void parseCommandLineOptions(int argc, char **argv) {
/// initialize.
GLOW_MEM_ALIGN(LENET_MNIST_MEM_ALIGN)
uint8_t constantWeight[LENET_MNIST_CONSTANT_MEM_SIZE] = {
#include "lenet_mnist.inc"
#include "lenet_mnist.weights.txt"
};

/// Statically allocate memory for mutable weights (model input/output data).
4 changes: 2 additions & 2 deletions examples/bundles/resnet50/CMakeLists.txt
@@ -37,7 +37,7 @@ add_custom_command(
COMMAND
image-classifier ${IMAGES}/dog_207.png -g -image-mode=0to1
-m=${RESNET50_BUNDLE_DIR}/resnet50 -model-input-name=${MODEL_INPUT_NAME}
-backend=CPU -emit-bundle ${BUNDLE_OUTPUT_DIRECTORY} -bundle-api=dynamic
DEPENDS
image-classifier ResNet50BundleDir
)
@@ -63,7 +63,7 @@ add_custom_command(
COMMAND
image-classifier ${IMAGES}/dog_207.png -g -i=0to1 -load-profile=profile.yml -assert-all-nodes-quantized -keep-original-precision-for-nodes=SoftMax
-m=${RESNET50_BUNDLE_DIR}/resnet50 -model-input-name=${MODEL_INPUT_NAME}
-backend=CPU -emit-bundle ${QUANTIZED_BUNDLE_OUTPUT_DIRECTORY} -bundle-api=dynamic
DEPENDS
image-classifier ResNet50BundleDir
)
2 changes: 1 addition & 1 deletion examples/bundles/resnet50/main.cpp
@@ -343,7 +343,7 @@ int main(int argc, char **argv) {
parseCommandLineOptions(argc, argv);
// Allocate and initialize constant and mutable weights.
uint8_t *constantWeightVarsAddr =
initConstantWeights("resnet50.weights", resnet50_config);
initConstantWeights("resnet50.weights.bin", resnet50_config);
uint8_t *mutableWeightVarsAddr = initMutableWeightVars(resnet50_config);
uint8_t *activationsAddr = initActivations(resnet50_config);

