[coreml] Introducing Quantization (pytorch#78108)
Summary:
Adds a quantization mode to preprocess, which allows Core ML models to be run through quantization.

Test Plan:
https://fburl.com/anp/r0ntsbq0

Notebook running through the quantization workflow: created a custom Bento kernel to run it through Core ML

```
bento_kernel(
    name = "coreml",
    deps = [
        "fbsource//third-party/pypi/coremltools:coremltools",
        "//caffe2:coreml_backend",
        "//caffe2:coreml_backend_cpp",
        "//caffe2:torch",
        "//caffe2/torch/fb/mobile/model_exporter:model_exporter",
    ],
)
```

Initial benchmarks on iPhone 11:

FP32 Core ML Model:
https://our.intern.facebook.com/intern/aibench/details/203998485252700

Quantized Core ML Model:
https://our.intern.facebook.com/intern/aibench/details/927584023592505

High End Quantized Model:
https://our.intern.facebook.com/intern/aibench/details/396271714697929

Summarized Results

| Backend | Quantization | p50 net latency | Model Size |
|---------|--------------|-----------------|------------|
| Core ML | No           | 1.2200          | 1.2 MB     |
| Core ML | Yes          | 1.2135          | 385 KB     |
| CPU     | Yes          | 3.1720          | 426 KB     |

Reviewed By: SS-JIA

Differential Revision: D36559966

Pull Request resolved: pytorch#78108
Approved by: https://github.com/jmdetloff
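
To make the summarized results easier to compare, the ratios implied by the table can be worked out with a short script. This is only an illustrative sketch, not part of the change itself: the `RESULTS` dictionary and the `ratio` helper are hypothetical names, the sizes are assumed to be decimal units (1 MB = 1000 KB), and the latency values are assumed to share a single unit, which the table does not state.

```python
# Values copied from the "Summarized Results" table.
# Assumption: "1.2mb" means 1200 KB (decimal units); the source does not say.
# Assumption: all p50 latencies use the same (unstated) unit.
RESULTS = {
    ("Core ML", "No"): {"p50_latency": 1.2200, "size_kb": 1200.0},
    ("Core ML", "Yes"): {"p50_latency": 1.2135, "size_kb": 385.0},
    ("CPU", "Yes"): {"p50_latency": 3.1720, "size_kb": 426.0},
}


def ratio(baseline, candidate, field):
    """Return baseline/candidate for one field; > 1 means candidate is smaller."""
    return RESULTS[baseline][field] / RESULTS[candidate][field]


# Model size: FP32 Core ML vs. quantized Core ML.
size_reduction = ratio(("Core ML", "No"), ("Core ML", "Yes"), "size_kb")

# Latency: quantized CPU vs. quantized Core ML.
latency_vs_cpu = ratio(("CPU", "Yes"), ("Core ML", "Yes"), "p50_latency")

# Latency: FP32 Core ML vs. quantized Core ML.
latency_vs_fp32 = ratio(("Core ML", "No"), ("Core ML", "Yes"), "p50_latency")

print(f"size reduction from quantization: {size_reduction:.2f}x")
print(f"latency, CPU vs. Core ML (both quantized): {latency_vs_cpu:.2f}x")
print(f"latency, FP32 vs. quantized Core ML: {latency_vs_fp32:.3f}x")
```

Under those assumptions, quantization shrinks the Core ML model to roughly a third of its FP32 size while leaving p50 latency essentially unchanged, and the quantized Core ML model runs at well under half the p50 latency of the quantized CPU model.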