[PyTorch Edge][QNNPack] Enable Depthwise Specific Conv3d Kernel for Kernel Size 3x3x3 (pytorch#69315)

Summary:
Pull Request resolved: pytorch#69315

Uses kernels and setup modifications from earlier diffs in this stack
ghstack-source-id: 146346780

Test Plan:
**Correctness**
- Test using QNNPack Operator-Level Test:
-- Neon Kernel: As in test plan of D32217846, all tests pass
-- SSE2 Kernel: ```buck test xplat/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack:pytorch_qnnpack_test```, all tests pass
- Test by Printing Results of Model-Level Test: D32122020

**Performance**
*Operator Level tests from convolution.cc in D32217846*
||Before (V23 of D32217846, without newly added kernel)|After (V48 of D31966574, with newly added kernel)|
|depthwise 3x3x3 static|184 ms|134 ms|
|depthwise 3x3x3 runtime|181 ms|134 ms|
|depthwise 3x3x3s2 static|30 ms|22 ms|
|depthwise 3x3x3s2 runtime|30 ms|23 ms|
|depthwise 3x3x3s1x2 static|97 ms|70 ms|
|depthwise 3x3x3s1x2 runtime|96 ms|70 ms|
|depthwise 3x3x3s2x1 static|53 ms|38 ms|
|depthwise 3x3x3s2x1 runtime|53 ms|38 ms|
|depthwise 3x3x3d2 static|104 ms|74 ms|
|depthwise 3x3x3d2 runtime|103 ms|75 ms|
|depthwise 3x3x3d1x2 static|158 ms|116 ms|
|depthwise 3x3x3d1x2 runtime|157 ms|115 ms|
|depthwise 3x3x3d2x1 static|120 ms|86 ms|
|depthwise 3x3x3d2x1 runtime|120 ms|87 ms|
|depthwise 3x3x3 per channel static|182 ms|134 ms|
|depthwise 3x3x3 per channel runtime|184 ms|134 ms|
|depthwise 3x3x3s2 per channel static|30 ms|22 ms|
|depthwise 3x3x3s2 per channel runtime|31 ms|23 ms|
|depthwise 3x3x3s1x2 per channel static|95 ms|70 ms|
|depthwise 3x3x3s1x2 per channel runtime|95 ms|71 ms|
|depthwise 3x3x3s2x1 per channel static|53 ms|39 ms|
|depthwise 3x3x3s2x1 per channel runtime|55 ms|39 ms|
|depthwise 3x3x3d2 per channel static|105 ms|75 ms|
|depthwise 3x3x3d2 per channel runtime|103 ms|75 ms|
|depthwise 3x3x3d1x2 per channel static|158 ms|116 ms|
|depthwise 3x3x3d1x2 per channel runtime|158 ms|116 ms|
|depthwise 3x3x3d2x1 per channel static|118 ms|87 ms|
|depthwise 3x3x3d2x1 per channel runtime|119 ms|87 ms|
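
The commit message does not say how the summary percentage for this table is computed. As a rough sketch, the Python below recomputes an average change from the 28 before/after pairs in two ways. Normalizing each row by its "after" time appears to reproduce the reported -36.96%, while normalizing by the "before" time gives roughly -27%. The function names are made up for illustration, and the choice of formula is an assumption, not something taken from the benchmarking tool.

```python
from statistics import mean

# (before_ms, after_ms) pairs copied from the operator-level table, in row order.
ROWS = [
    (184, 134), (181, 134), (30, 22), (30, 23), (97, 70), (96, 70),
    (53, 38), (53, 38), (104, 74), (103, 75), (158, 116), (157, 115),
    (120, 86), (120, 87),
    # per-channel variants
    (182, 134), (184, 134), (30, 22), (31, 23), (95, 70), (95, 71),
    (53, 39), (55, 39), (105, 75), (103, 75), (158, 116), (158, 116),
    (118, 87), (119, 87),
]


def mean_change_vs_after(rows):
    """Mean of (after - before) / after, in percent (assumed convention)."""
    return 100 * mean((after - before) / after for before, after in rows)


def mean_change_vs_before(rows):
    """Mean of (after - before) / before, in percent (plain time reduction)."""
    return 100 * mean((after - before) / before for before, after in rows)


if __name__ == "__main__":
    print(f"relative to after:  {mean_change_vs_after(ROWS):.2f}%")
    print(f"relative to before: {mean_change_vs_before(ROWS):.2f}%")
```

The two conventions answer different questions. The first measures how much slower the old path is relative to the new kernel. The second measures how much wall-clock time the new kernel saves per call.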
Average Change: -36.96%
(Generated with https://www.internalfb.com/intern/anp/view/?id=1371846&revision_id=291376782898627)

*Model Level Test on Synthesized Conv3d Model*
Model Details:
- 21 channels, input size: 9 x 12 x 7, kernel size: 3x3x3
- Config added in D31928710
- Model generated with https://www.internalfb.com/intern/anp/view/?id=1313660&revision_id=248658657303993

```buck run aibench:run_bench -- -b dw_conv_3d_3x3x3_big_2b.json --platform android/arm64 --framework pytorch --remote --devices Pixel-4a-11-30```

- Before (V23 of D32217846): [0.0935 ms](https://our.intern.facebook.com/intern/aibench/details/768298420366437)
- After (V48 of D31966574): [0.0665 ms](https://our.intern.facebook.com/intern/aibench/details/67271954298132) (29% faster)

*Model Level Test on Video Model-like Inputs (provided by liyilui)*
- D33000199
- 87.5% faster

Reviewed By: kimishpatel

Differential Revision: D31966574

fbshipit-source-id: 6554a878401c1120054f6b02241456e8fb44b152