[Vision][Fix] Enable image processing kernel on non-CUDA backends (mlc-ai#2923)

Prior to this PR, compiling/running phi3.5-vision on a non-CUDA backend such as Metal ran into the following issues:

- Shape inference would overflow int32 (CUDA does not hit this because we use int64 there), leading to a runtime error:

  ```
  TVMError: Assert fail: (T.Div(new_h - 2147483185, 336) - -6391320) * 336 == T.Cast("int32", resize2d1_var_lv4_shape[1]), Argument resize2d1.var_lv4.shape[1] has an unsatisfied constraint: new_h + T.Div((new_h + 336 - 1) // 336 * 336 - new_h, 2) + ((new_h + 336 - 1) // 336 * 336 - new_h - T.Div((new_h + 336 - 1) // 336 * 336 - new_h, 2)) == T.Cast("int32", resize2d1_var_lv4_shape[1])
  ```

- Naively keeping int64 on Metal instead fails with:

  `TVMError: Check failed: blockSize <= maxTotalThreadsPerThreadgroup (1024 vs. 896) :`

  This is because heavy register usage reduces the number of threads available per block (to 896 here).

This PR fixes the issues above. In addition, we rename `std` to `stddev` to avoid reserved-name issues on backends such as WGSL.

Tested on Metal with:

```
python python/mlc_llm/testing/debug_chat.py "List the objects you can identify in this image succinctly." --generate-len 256 --model dist/phi-3_5-vision-q4f16_1 --model-lib dist/libs/phi-3_5-vision-q4f16_1-metal.so --debug-dir debug/ --image-url https://www.islandvulnerability.org/borders/ai8699.jpg --disable-instrument
```

---------

Co-authored-by: Ruihang Lai <[email protected]>
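For reference, the constraint in the failing assert encodes a pad-to-multiple-of-336 computation on the image height. The following is a minimal Python sketch (not mlc-llm code; the function name and tile size default are illustrative) of that arithmetic: the height is rounded up to the next multiple of the tile size, with the padding split between top and bottom, and the kernel asserts the output height equals `new_h` plus both pads.

```python
# Sketch of the padding arithmetic behind the failing assert above.
# Not actual mlc-llm code; names are illustrative.
def padded_height(new_h: int, tile: int = 336) -> int:
    target = (new_h + tile - 1) // tile * tile  # round up to a multiple of tile
    pad = target - new_h                        # total padding to add
    top = pad // 2                              # T.Div(pad, 2) in the assert
    bottom = pad - top                          # remainder goes to the bottom
    # The kernel's shape constraint: output height == new_h + top + bottom.
    return new_h + top + bottom

# The result is always the next multiple of 336:
assert padded_height(500) == 672
assert padded_height(672) == 672
```

When these intermediate expressions are simplified in int32 rather than int64, constants such as `-2147483185` appear in the lowered assert, which is the overflow this PR avoids on non-CUDA backends.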