std::bad_alloc Exception When Loading Large Model on iOS with MediaPipe #5757
@lightScout, The MediaPipe LLM Inference API offers well-defined usage guidelines. To troubleshoot the issue you're encountering, please refer to the official documentation: LLM Inference. If the problem persists after following the guidelines, please let us know, and we'll be happy to assist further.
Thank you for your prompt response. I appreciate your reference to the official documentation for the MediaPipe LLM Inference API. I have thoroughly reviewed the guidelines provided in the documentation. However, I believe that the guidelines do not fully address the specific issue I am encountering.

As mentioned in my previous message, I am experiencing a std::bad_alloc exception when attempting to load a large .task model (~2.16 GB) using MediaPipe on an iPhone 16 Pro. Despite following the recommended practices in the documentation, the app crashes during model initialisation due to memory allocation issues.

Key points:
Memory constraints on iOS devices: iOS imposes strict per-app memory limits, which seem to be exceeded when loading large models. I understand that mobile devices have inherent limitations, but I was hoping to utilise MediaPipe's capabilities for on-device inference with larger models.

Given that the official guidelines do not cover this scenario in detail, I kindly request further investigation into this issue.
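For what it's worth, the best I can do on my side today is a pre-flight check so the app fails gracefully instead of crashing. This is a minimal sketch, assuming the model file is already on disk; the hasHeadroom helper is hypothetical (mine, not MediaPipe's), and os_proc_available_memory() is the iOS 13+ call that reports the remaining per-process allowance:

import Foundation
import os

// Hypothetical pre-flight check: only attempt the load when the remaining
// per-process memory allowance comfortably exceeds the model file size.
func hasHeadroom(for modelURL: URL, safetyFactor: Double = 1.5) -> Bool {
    let attrs = try? FileManager.default.attributesOfItem(atPath: modelURL.path)
    guard let bytes = (attrs?[.size] as? NSNumber)?.uint64Value else { return false }
    let available = os_proc_available_memory()  // iOS 13+, bytes still allocatable
    return Double(available) > Double(bytes) * safetyFactor
}

This only avoids the crash, of course; it does not make a ~2.16 GB model fit, and the safety factor is a guess since I don't know the engine's peak allocation during initialisation.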
Hi @lightScout, We are actively working with our internal team to diagnose the root cause of the issue. We will provide a resolution as soon as our investigation is complete.
Two thoughts:
This issue has been marked stale because it has had no activity for 7 days. It will be closed if no further activity occurs. Thank you.
I'm experiencing a std::bad_alloc exception when attempting to load a large model (~2.16 GB) using MediaPipe's LLM inference capabilities on an iPhone 16 Pro. The app crashes during model initialization due to what appears to be a memory allocation issue.
Environment:
Device: iPhone 16 Pro
iOS Version: latest
MediaPipe Version: latest
Xcode Version: 16.1
Steps to Reproduce:
Model Preparation:
Use a large .task model file approximately 2.16 GB in size (e.g., Llama-3.2-1b-q8.task).
The model is downloaded at runtime and stored in the app's documents directory to avoid bundling it with the app (a sketch of this download step follows the reproduction steps below).
Model Initialization Code:
Initialize the model using the following code snippet:
init(model: Model) throws {
    // Point the engine at the downloaded .task file in the documents directory.
    let options = LlmInference.Options(modelPath: model.modelFileURL.path)
    options.maxTokens = 512

    // The std::bad_alloc crash appears to occur during this initialization,
    // while the native engine loads the ~2.16 GB model into memory.
    inference = try LlmInference(options: options)

    // Session configuration (sampling temperature and seed).
    let sessionOptions = LlmInference.Session.Options()
    sessionOptions.temperature = 0.2
    sessionOptions.randomSeed = 2222
    session = try LlmInference.Session(llmInference: inference, options: sessionOptions)
}
Run the App:
Launch the app on the iPhone 16 Pro.
The app attempts to initialize the model using the above code.
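For completeness, the runtime download in step 1 looks roughly like this. It is a sketch: downloadModel is a helper of mine (not a MediaPipe API), the remote URL is supplied elsewhere, and error handling is trimmed to the essentials:

import Foundation

// Hypothetical helper: fetch the .task file once and persist it in the
// app's documents directory so it is not bundled with the app.
func downloadModel(from remoteURL: URL,
                   completion: @escaping (Result<URL, Error>) -> Void) {
    let destination = FileManager.default
        .urls(for: .documentDirectory, in: .userDomainMask)[0]
        .appendingPathComponent(remoteURL.lastPathComponent)

    // Reuse an existing copy instead of re-downloading ~2 GB on every launch.
    if FileManager.default.fileExists(atPath: destination.path) {
        completion(.success(destination))
        return
    }

    URLSession.shared.downloadTask(with: remoteURL) { tempURL, _, error in
        if let error { completion(.failure(error)); return }
        guard let tempURL else { return }
        do {
            try FileManager.default.moveItem(at: tempURL, to: destination)
            completion(.success(destination))
        } catch {
            completion(.failure(error))
        }
    }.resume()
}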
Expected Behavior:
The model should initialize successfully, allowing for on-device inference using MediaPipe's LLM capabilities.
Actual Behavior:
The app crashes with a std::bad_alloc exception during model initialization.
Here are the relevant logs and error messages:
Questions:
Is there a recommended way to load large models using MediaPipe on iOS devices without exceeding memory limits?
Are there any best practices or techniques within MediaPipe or TensorFlow Lite to handle large models efficiently on mobile devices?
Can MediaPipe support loading models in a way that mitigates high memory consumption, such as streaming parts of the model or more efficient memory management during initialization?
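In the meantime, the only mitigation I can apply app-side is to drop the session when iOS signals memory pressure and rebuild it lazily before the next prompt. This is a sketch, assuming the MediaPipeTasksGenAI module name from the iOS setup guide and a hypothetical ModelHolder wrapper of mine; it helps the steady-state footprint but obviously cannot prevent the allocation failure during initialization itself:

import UIKit
import MediaPipeTasksGenAI  // assumed module name for the MediaPipe LLM Inference API

// Hypothetical wrapper: releases the session's native buffers under memory
// pressure; callers recreate the session before the next prompt.
final class ModelHolder {
    let inference: LlmInference
    private(set) var session: LlmInference.Session?
    private var observer: NSObjectProtocol?

    init(inference: LlmInference) {
        self.inference = inference
        observer = NotificationCenter.default.addObserver(
            forName: UIApplication.didReceiveMemoryWarningNotification,
            object: nil,
            queue: .main
        ) { [weak self] _ in
            self?.session = nil  // free per-session allocations
        }
    }

    // Rebuild the session on demand after it was dropped under pressure.
    func makeSessionIfNeeded() throws -> LlmInference.Session {
        if let session { return session }
        let s = try LlmInference.Session(llmInference: inference,
                                         options: LlmInference.Session.Options())
        session = s
        return s
    }

    deinit {
        if let observer { NotificationCenter.default.removeObserver(observer) }
    }
}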