Below is the layout of the examples/mediatek
directory, which includes the necessary files for the example applications:
```
examples/mediatek
├── aot_utils                      # Utils for AoT export
│   ├── llm_utils                  # Utils for LLM models
│   │   ├── preformatter_templates # Model-specific prompt preformatter templates
│   │   ├── prompts                # Calibration prompts
│   │   └── tokenizers_            # Model tokenizer scripts
│   └── oss_utils                  # Utils for OSS models
├── eval_utils                     # Utils for evaluating OSS models
├── model_export_scripts           # Model-specific export scripts
├── models                         # Model definitions
│   └── llm_models                 # LLM model definitions
│       └── weights                # LLM model weights location (offline): place config.json, the relevant tokenizer files, and the .bin or .safetensors weight file(s) here
├── executor_runner                # Example C++ wrapper for the ExecuTorch runtime
├── pte                            # Generated .pte files location
├── shell_scripts                  # Shell scripts to quickly run model-specific exports
├── CMakeLists.txt                 # CMake build configuration file for compiling examples
├── requirements.txt               # MTK and other required packages
├── mtk_build_examples.sh          # Script for building the MediaTek backend and the examples
└── README.md                      # Documentation for the examples (this file)
```
- Follow the Prerequisites and Setup instructions in `backends/mediatek/scripts/README.md`.
- Build the backend and the examples by executing the script:

```bash
./mtk_build_examples.sh
```
- Exporting LLM models to `.pte`: in the `examples/mediatek` directory, run:

```bash
source shell_scripts/export_llama.sh <model_name> <num_chunks> <prompt_num_tokens> <cache_size> <calibration_set_name>
```
- Defaults:
  - `model_name` = llama3
  - `num_chunks` = 4
  - `prompt_num_tokens` = 128
  - `cache_size` = 1024
  - `calibration_set_name` = None
- Argument Explanations/Options:
  - `model_name`: llama2/llama3. Note: currently only tested on Llama2 7B Chat and Llama3 8B Instruct.
  - `num_chunks`: Number of chunks to split the model into. Each chunk contains the same number of decoder layers, and `num_chunks` `.pte` files will be generated. Typical values are 1, 2 and 4.
  - `prompt_num_tokens`: Number of tokens (> 1) consumed in each forward pass during the prompt processing stage.
  - `cache_size`: Size of the cache.
  - `calibration_set_name`: Name of the calibration dataset, with extension, found inside the `aot_utils/llm_utils/prompts` directory (e.g. `alpaca.txt`). If `"None"`, dummy data is used for calibration. Note: the export script has only been tested with `.txt` files.
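To make the `prompt_num_tokens` knob concrete: the prompt stage consumes that many tokens per forward pass, so a prompt of length `L` is processed in `ceil(L / prompt_num_tokens)` passes. A small sketch of that arithmetic (a hypothetical helper for illustration, not part of the export scripts):

```python
import math

def prompt_forward_passes(prompt_len: int, prompt_num_tokens: int = 128) -> int:
    """Forward passes needed to consume a prompt when each pass
    handles prompt_num_tokens tokens (illustrative helper only)."""
    return math.ceil(prompt_len / prompt_num_tokens)

print(prompt_forward_passes(300))  # 3 passes at the default 128 tokens/pass
```

Larger `prompt_num_tokens` values mean fewer passes for long prompts at the cost of a larger fixed input shape per pass.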
- `.pte` files will be generated in `examples/mediatek/pte`.
  - Users should expect `num_chunks*2` `.pte` files (half of them for prompt processing and half of them for generation).
  - Generation `.pte` files have "`1t`" in their names.
  - Additionally, an embedding bin file will be generated in the weights folder where the `config.json` can be found [`examples/mediatek/models/llm_models/weights/<model_name>/embedding_<model_config_folder>_fp32.bin`].
    - e.g. for `llama3-8B-instruct`, the embedding bin is generated in `examples/mediatek/models/llm_models/weights/llama3-8B-instruct/`.
- The AoT flow takes roughly 2.5 hours to complete and uses about 114GB of RAM for `num_chunks=4` (results will vary by device/hardware configuration).
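As a quick sanity check on the output count described above: each chunk yields one prompt-mode `.pte` and one generation-mode ("1t") `.pte`, so `num_chunks*2` files in total. A trivial sketch (hypothetical helper, not part of the export scripts):

```python
def expected_pte_count(num_chunks: int) -> int:
    """Each chunk produces one prompt .pte and one generation ("1t") .pte."""
    return num_chunks * 2

print(expected_pte_count(4))  # 8 files for the default num_chunks=4
```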
- Exporting OSS models to `.pte`: run:

```bash
bash shell_scripts/export_oss.sh <model_name>
```

- Argument Options:
  - `model_name`: deeplabv3/edsr/inceptionv3/inceptionv4/mobilenetv2/mobilenetv3/resnet18/resnet50
To set up the build environment for the `mtk_executor_runner`:

- Navigate to the `backends/mediatek/scripts` directory within the repository.
- Follow the detailed build steps provided in that location.
- Upon successful completion of the build steps, the `mtk_executor_runner` binary will be generated.
Transfer the `.pte` model files and the `mtk_executor_runner` binary to your Android device using the following commands:

```bash
adb push mtk_executor_runner <PHONE_PATH, e.g. /data/local/tmp>
adb push <MODEL_NAME>.pte <PHONE_PATH, e.g. /data/local/tmp>
```

Make sure to replace `<MODEL_NAME>` with the actual name of your model file, and replace `<PHONE_PATH>` with the desired destination on the device.

For OSS models, push the OSS runner, the input list, and the input bin files as well:

```bash
adb push mtk_oss_executor_runner <PHONE_PATH, e.g. /data/local/tmp>
adb push input_list.txt <PHONE_PATH, e.g. /data/local/tmp>
for i in input*bin; do adb push "$i" <PHONE_PATH, e.g. /data/local/tmp>; done;
```
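For a sense of what those `input*bin` payloads are: they are raw tensor buffers produced by the export flow. The exact dtype and layout come from that flow; the flat little-endian float32 layout below is an assumption for illustration only. A minimal round-trip sketch in pure Python:

```python
import os
import struct
import tempfile

def write_f32_bin(path, values):
    """Write a flat list of floats as raw little-endian float32 bytes
    (layout assumed for illustration)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}f", *values))

def read_f32_bin(path):
    """Read a raw little-endian float32 buffer back into a list of floats."""
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack(f"<{len(data) // 4}f", data))

# Round-trip a tiny dummy input tensor.
path = os.path.join(tempfile.gettempdir(), "input_demo_0.bin")
write_f32_bin(path, [0.0, 0.5, 1.0])
print(read_f32_bin(path))  # [0.0, 0.5, 1.0]
```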
Execute the model on your Android device by running:

```bash
adb shell "/data/local/tmp/mtk_executor_runner --model_path /data/local/tmp/<MODEL_NAME>.pte --iteration <ITER_TIMES>"
```

In the command above, replace `<MODEL_NAME>` with the name of your model file and `<ITER_TIMES>` with the desired number of iterations to run the model.

Note: for llama models, please use `mtk_llama_executor_runner`. Refer to `examples/mediatek/executor_runner/run_llama3_sample.sh` for reference.

For OSS models, run `mtk_oss_executor_runner` and pull the outputs back to the host:

```bash
adb shell "/data/local/tmp/mtk_oss_executor_runner --model_path /data/local/tmp/<MODEL_NAME>.pte --input_list /data/local/tmp/input_list.txt --output_folder /data/local/tmp/output_<MODEL_NAME>"
adb pull "/data/local/tmp/output_<MODEL_NAME> ./"
```
To evaluate OSS model outputs against golden data, run:

```bash
python3 eval_utils/eval_oss_result.py --eval_type <eval_type> --target_f <golden_folder> --output_f <prediction_folder>
```

For example:

```bash
python3 eval_utils/eval_oss_result.py --eval_type piq --target_f edsr --output_f output_edsr
```

- Argument Options:
  - `eval_type`: topk/piq/segmentation
  - `target_f`: folder containing the golden data files, named `golden_<data_idx>_0.bin`
  - `output_f`: folder containing the model output data files, named `output_<data_idx>_0.bin`