Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request opencv#25167 from plctlab:rvp_3rdparty
3rdparty: NDSRVP - A New 3rdparty Library with Optimizations Based on RISC-V P Extension v0.5.2 - Part 1: Basic Functions opencv#25167 # Summary ### Previous context From PR opencv#24556: >> * As you wrote, the P-extension differs from RVV thus can not be easily implemented via Universal Intrinsics mechanism, but there is another HAL mechanism for lower-level CPU optimizations which is used by the [Carotene](https://github.com/opencv/opencv/tree/4.x/3rdparty/carotene) library on ARM platforms. I suggest moving all non-dnn code to similar third-party component. For example, FAST algorithm should allow such optimization-shortcut: see https://github.com/opencv/opencv/blob/4.x/modules/features2d/src/hal_replacement.hpp >> Reference documentation is here: >> >> * https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html >> * https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html >> * https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html >> * Carotene library is turned on here: https://github.com/opencv/opencv/blob/8bbf08f0de9c387c12afefdb05af7780d989e4c3/CMakeLists.txt#L906-L911 > As a test outside of this PR, A 3rdparty component called ndsrvp is created, containing one of the non-dnn code (integral_SIMD), and it works very well. > All the non-dnn code in this PR have been removed, currently this PR can be focused on dnn optinizations. > This HAL mechanism is quite suitable for rvp optimizations, all the non-dnn code is expected to be moved into ndsrvp soon. ### Progress #### Part 1 (This PR) - [Core](https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html) - [x] Element-wise add and subtract - [x] Element-wise minimum or maximum - [x] Element-wise absolute difference - [x] Bitwise logical operations - [x] Element-wise compare - [ImgProc](https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html) - [x] Integral - [x] Threshold - [x] WarpAffine - [x] WarpPerspective - [Features2D](https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html) #### Part 2 (Next PR) **Rough Estimate. Todo List May Change.** - [Core](https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html) - [ImgProc](https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html) - smaller remap HAL interface - AdaptiveThreshold - BoxFilter - Canny - Convert - Filter - GaussianBlur - MedianBlur - Morph - Pyrdown - Resize - Scharr - SepFilter - Sobel - [Features2D](https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html) - FAST ### Performance Tests The optimization does not contain floating point opreations. **Absolute Difference** Geometric mean (ms) |Name of Test|opencv perf core Absdiff|opencv perf core Absdiff|opencv perf core Absdiff vs opencv perf core Absdiff (x-factor)| |---|:-:|:-:|:-:| |Absdiff::OCL_AbsDiffFixture::(640x480, 8UC1)|23.104|5.972|3.87| |Absdiff::OCL_AbsDiffFixture::(640x480, 32FC1)|39.500|40.830|0.97| |Absdiff::OCL_AbsDiffFixture::(640x480, 8UC3)|69.155|15.051|4.59| |Absdiff::OCL_AbsDiffFixture::(640x480, 32FC3)|118.715|120.509|0.99| |Absdiff::OCL_AbsDiffFixture::(640x480, 8UC4)|93.001|19.770|4.70| |Absdiff::OCL_AbsDiffFixture::(640x480, 32FC4)|161.136|160.791|1.00| |Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC1)|69.211|15.140|4.57| |Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC1)|118.762|119.263|1.00| |Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC3)|212.414|44.692|4.75| |Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC3)|367.512|366.569|1.00| |Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC4)|285.337|59.708|4.78| |Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC4)|490.395|491.118|1.00| |Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC1)|158.827|33.462|4.75| |Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC1)|273.503|273.668|1.00| |Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC3)|484.175|100.520|4.82| |Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC3)|828.758|829.689|1.00| |Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC4)|648.592|137.195|4.73| |Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC4)|1116.755|1109.587|1.01| |Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC1)|648.715|134.875|4.81| |Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC1)|1115.939|1113.818|1.00| |Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC3)|1944.791|413.420|4.70| |Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC3)|3354.193|3324.672|1.01| |Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC4)|2594.585|553.486|4.69| |Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC4)|4473.543|4438.453|1.01| **Bitwise Operation** Geometric mean (ms) |Name of Test|opencv perf core Bit|opencv perf core Bit|opencv perf core Bit vs opencv perf core Bit (x-factor)| |---|:-:|:-:|:-:| |Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC1)|22.542|4.971|4.53| |Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC1)|90.210|19.917|4.53| |Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC3)|68.429|15.037|4.55| |Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC3)|280.168|59.239|4.73| |Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC4)|90.565|19.735|4.59| |Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC4)|374.695|79.257|4.73| |Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC1)|67.824|14.873|4.56| |Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC1)|279.514|59.232|4.72| |Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC3)|208.337|44.234|4.71| |Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC3)|851.211|182.522|4.66| |Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC4)|279.529|59.095|4.73| |Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC4)|1132.065|244.877|4.62| |Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC1)|155.685|33.078|4.71| |Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC1)|635.253|137.482|4.62| |Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC3)|474.494|100.166|4.74| |Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC3)|1907.340|412.841|4.62| |Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC4)|635.538|134.544|4.72| |Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC4)|2552.666|556.397|4.59| |Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC1)|634.736|136.355|4.66| |Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC1)|2548.283|561.827|4.54| |Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC3)|1911.454|421.571|4.53| |Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC3)|7663.803|1677.289|4.57| |Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC4)|2543.983|562.780|4.52| |Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC4)|10211.693|2237.393|4.56| |Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC1)|22.341|4.811|4.64| |Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC1)|89.975|19.288|4.66| |Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC3)|67.237|14.643|4.59| |Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC3)|276.324|58.609|4.71| |Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC4)|89.587|19.554|4.58| |Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC4)|370.986|77.136|4.81| |Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC1)|67.227|14.541|4.62| |Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC1)|276.357|58.076|4.76| |Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC3)|206.752|43.376|4.77| |Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC3)|841.638|177.787|4.73| |Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC4)|276.773|57.784|4.79| |Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC4)|1127.740|237.472|4.75| |Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC1)|153.808|32.531|4.73| |Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC1)|627.765|129.990|4.83| |Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC3)|469.799|98.249|4.78| |Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC3)|1893.591|403.694|4.69| |Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC4)|627.724|129.962|4.83| |Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC4)|2529.967|540.744|4.68| |Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC1)|628.089|130.277|4.82| |Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC1)|2521.817|540.146|4.67| |Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC3)|1905.004|404.704|4.71| |Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC3)|7567.971|1627.898|4.65| |Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC4)|2531.476|540.181|4.69| |Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC4)|10075.594|2181.654|4.62| |Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC1)|22.566|5.076|4.45| |Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC1)|90.391|19.928|4.54| |Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC3)|67.758|14.740|4.60| |Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC3)|279.253|59.844|4.67| |Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC4)|90.296|19.802|4.56| |Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC4)|373.972|79.815|4.69| |Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC1)|67.815|14.865|4.56| |Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC1)|279.398|60.054|4.65| |Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC3)|208.643|45.043|4.63| |Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC3)|850.042|180.985|4.70| |Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC4)|279.363|60.385|4.63| |Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC4)|1134.858|243.062|4.67| |Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC1)|155.212|33.155|4.68| |Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC1)|634.985|134.911|4.71| |Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC3)|474.648|100.407|4.73| |Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC3)|1912.049|414.184|4.62| |Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC4)|635.252|132.587|4.79| |Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC4)|2544.471|560.737|4.54| |Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC1)|634.574|134.966|4.70| |Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC1)|2545.129|561.498|4.53| |Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC3)|1910.900|419.365|4.56| |Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC3)|7662.603|1685.812|4.55| |Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC4)|2548.971|560.787|4.55| |Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC4)|10201.407|2237.552|4.56| |Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC1)|22.718|4.961|4.58| |Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC1)|91.496|19.831|4.61| |Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC3)|67.910|15.151|4.48| |Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC3)|279.612|59.792|4.68| |Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC4)|91.073|19.853|4.59| |Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC4)|374.641|79.155|4.73| |Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC1)|67.704|15.008|4.51| |Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC1)|279.229|60.088|4.65| |Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC3)|208.156|44.426|4.69| |Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC3)|849.501|180.848|4.70| |Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC4)|279.642|59.728|4.68| |Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC4)|1129.826|242.880|4.65| |Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC1)|155.585|33.354|4.66| |Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC1)|634.090|134.995|4.70| |Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC3)|474.931|99.598|4.77| |Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC3)|1910.519|413.138|4.62| |Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC4)|635.026|135.155|4.70| |Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC4)|2560.167|560.838|4.56| |Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC1)|634.893|134.883|4.71| |Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC1)|2548.166|560.831|4.54| |Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC3)|1911.392|419.816|4.55| |Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC3)|7646.634|1677.988|4.56| |Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC4)|2560.637|560.805|4.57| |Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC4)|10227.044|2249.458|4.55| ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake
- Loading branch information