Merge pull request opencv#7960 from catree:tutorial_parallel_for_
Add OpenCV parallel_for_ tutorial.
Showing 6 changed files with 312 additions and 0 deletions.
183 changes: 183 additions & 0 deletions
...s/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.markdown
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,183 @@ | ||
How to use the OpenCV parallel_for_ to parallelize your code {#tutorial_how_to_use_OpenCV_parallel_for_} | ||
================================================================== | ||
|
||
Goal | ||
---- | ||
|
||
The goal of this tutorial is to show you how to use the OpenCV `parallel_for_` framework to easily | ||
parallelize your code. To illustrate the concept, we will write a program that draws a Mandelbrot set | ||
while using almost all of the available CPU capacity. | ||
The full tutorial code is [here](https://github.com/opencv/opencv/blob/master/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp). | ||
For more information about multithreading, please refer to a reference book or course, as this tutorial is intended | ||
to remain simple. | ||
|
||
Precondition | ||
---- | ||
|
||
The first precondition is to have OpenCV built with a parallel framework. | ||
In OpenCV 3.2, the following parallel frameworks are available, in this order: | ||
1. Intel Threading Building Blocks (third-party library, should be explicitly enabled) | ||
2. C= Parallel C/C++ Programming Language Extension (third-party library, should be explicitly enabled) | ||
3. OpenMP (integrated into the compiler, should be explicitly enabled) | ||
4. APPLE GCD (system wide, used automatically (APPLE only)) | ||
5. Windows RT concurrency (system wide, used automatically (Windows RT only)) | ||
6. Windows concurrency (part of runtime, used automatically (Windows only - MSVC++ >= 10)) | ||
7. Pthreads (if available) | ||
|
||
As you can see, several parallel frameworks can be used in the OpenCV library. Some parallel libraries | ||
are third-party libraries and have to be explicitly built and enabled in CMake (e.g. TBB, C=), while others are | ||
automatically available with the platform (e.g. APPLE GCD). In most cases you should be able to | ||
access a parallel framework, either directly or by enabling the option in CMake and rebuilding the library. | ||
|
||
The second (weak) precondition relates to the task you want to achieve, as not all computations | ||
are suitable for, or can be adapted to, parallel execution. To keep things simple, tasks that can be split | ||
into multiple elementary operations with no memory dependency (no possible race condition) are easily | ||
parallelizable. Computer vision processing is often easily parallelizable because, most of the time, the processing of | ||
one pixel does not depend on the state of other pixels. | ||
|
||
Simple example: drawing a Mandelbrot set | ||
---- | ||
|
||
We will use the example of drawing a Mandelbrot set to show how regular sequential code can easily be adapted | ||
to parallelize the computation. | ||
|
||
Theory | ||
----------- | ||
|
||
The Mandelbrot set was named in tribute to the mathematician Benoit Mandelbrot by the mathematician | ||
Adrien Douady. It has become famous outside the field of mathematics because its image representation is an example of a | ||
class of fractals, a mathematical set that exhibits a repeating pattern displayed at every scale (moreover, a | ||
Mandelbrot set is self-similar, as the whole shape can be seen repeatedly at different scales). For a more in-depth | ||
introduction, you can look at the corresponding [Wikipedia article](https://en.wikipedia.org/wiki/Mandelbrot_set). | ||
Here, we will just introduce the formula to draw the Mandelbrot set (from the mentioned Wikipedia article). | ||
|
||
> The Mandelbrot set is the set of values of \f$ c \f$ in the complex plane for which the orbit of 0 under iteration | ||
> of the quadratic map | ||
> \f[\begin{cases} z_0 = 0 \\ z_{n+1} = z_n^2 + c \end{cases}\f] | ||
> remains bounded. | ||
> That is, a complex number \f$ c \f$ is part of the Mandelbrot set if, when starting with \f$ z_0 = 0 \f$ and applying | ||
> the iteration repeatedly, the absolute value of \f$ z_n \f$ remains bounded however large \f$ n \f$ gets. | ||
> This can also be represented as | ||
> \f[\limsup_{n\to\infty}|z_{n+1}|\leqslant2\f] | ||
Pseudocode | ||
----------- | ||
|
||
A simple algorithm to generate a representation of the Mandelbrot set is called the | ||
["escape time algorithm"](https://en.wikipedia.org/wiki/Mandelbrot_set#Escape_time_algorithm). | ||
For each pixel in the rendered image, we use the recurrence relation to test whether the corresponding complex number | ||
stays bounded within a maximum number of iterations. Pixels that do not belong to the Mandelbrot set escape quickly, whereas | ||
a pixel that has not escaped after the fixed maximum number of iterations is assumed to be in the set. A higher iteration limit | ||
produces a more detailed image, but the computation time increases accordingly. We use the number of iterations | ||
needed to "escape" to determine the pixel value in the image. | ||
|
||
``` | ||
For each pixel (Px, Py) on the screen, do: | ||
{ | ||
    x0 = scaled x coordinate of pixel (scaled to lie in the Mandelbrot X scale (-2, 1)) | ||
    y0 = scaled y coordinate of pixel (scaled to lie in the Mandelbrot Y scale (-1, 1)) | ||
    x = 0.0 | ||
    y = 0.0 | ||
    iteration = 0 | ||
    max_iteration = 1000 | ||
    while (x*x + y*y < 2*2 AND iteration < max_iteration) { | ||
        xtemp = x*x - y*y + x0 | ||
        y = 2*x*y + y0 | ||
        x = xtemp | ||
        iteration = iteration + 1 | ||
    } | ||
    color = palette[iteration] | ||
    plot(Px, Py, color) | ||
} | ||
``` | ||
|
||
The pseudocode relates to the theory as follows: | ||
* \f$ z = x + iy \f$ | ||
* \f$ z^2 = x^2 + i2xy - y^2 \f$ | ||
* \f$ c = x_0 + iy_0 \f$ | ||
|
||
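Spelling out one step of the recurrence in terms of real and imaginary parts may make the link with the pseudocode clearer. The derivation below only uses the identities listed above; the primed symbols \f$ x' \f$ and \f$ y' \f$ are notation introduced here to denote the values after one iteration.

\f[\begin{aligned} z^2 + c &= (x + iy)^2 + (x_0 + iy_0) \\ &= (x^2 - y^2 + x_0) + i\,(2xy + y_0) \\ \Rightarrow\quad x' &= x^2 - y^2 + x_0, \qquad y' = 2xy + y_0 \end{aligned}\f]

These two expressions correspond to the `xtemp` and `y` updates in the loop body, and the loop condition `x*x + y*y < 2*2` is the test \f$ |z|^2 < 4 \f$, that is \f$ |z| < 2 \f$.
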
![](images/how_to_use_OpenCV_parallel_for_640px-Mandelset_hires.png) | ||
|
||
In this figure, recall that the real part of a complex number is on the x-axis and the imaginary part is on the y-axis. | ||
You can see that the whole shape reappears when zooming in at particular locations. | ||
|
||
Implementation | ||
----------- | ||
|
||
Escape time algorithm implementation | ||
-------------------------- | ||
|
||
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-escape-time-algorithm | ||
|
||
Here, we used the [`std::complex`](http://en.cppreference.com/w/cpp/numeric/complex) template class to represent a | ||
complex number. This function tests whether the pixel is in the set and returns the "escaped" iteration. | ||
|
||
Sequential Mandelbrot implementation | ||
-------------------------- | ||
|
||
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-sequential | ||
|
||
In this implementation, we sequentially iterate over the pixels in the rendered image to perform the test to check if the | ||
pixel is likely to belong to the Mandelbrot set or not. | ||
|
||
We also need to transform the pixel coordinates into the Mandelbrot set space with: | ||
|
||
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-transformation | ||
|
||
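To make the coordinate transformation concrete, here is a minimal standalone sketch that does not depend on OpenCV. The helper names `pixelsPerUnit` and `pixelToComplex` are hypothetical and introduced only for illustration; the arithmetic mirrors the `scaleX`, `scaleY`, `x0` and `y0` expressions of the sample code.

```cpp
#include <complex>

// Hypothetical helper: number of pixels per unit of the complex plane,
// mirroring scaleX = cols / (x2 - x1) and scaleY = rows / (y2 - y1).
float pixelsPerUnit(int sizeInPixels, float minValue, float maxValue)
{
    return sizeInPixels / (maxValue - minValue);
}

// Hypothetical helper: maps the pixel at row i, column j to a point of the
// complex plane, mirroring x0 = j / scaleX + x1 and y0 = i / scaleY + y1.
std::complex<float> pixelToComplex(int i, int j,
                                   float x1, float y1,
                                   float scaleX, float scaleY)
{
    const float x0 = j / scaleX + x1; // real part, from the column index
    const float y0 = i / scaleY + y1; // imaginary part, from the row index
    return std::complex<float>(x0, y0);
}
```

With the values used in the sample (a 5400 x 4800 image, x from -2.1 to 0.6 and y from -1.2 to 1.2), both scale factors come out at roughly 2000 pixels per unit, so the top-left pixel maps to about -2.1 - 1.2i and the pixel at row 2400, column 4200 maps to about the origin.
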
Finally, to assign the grayscale value to the pixels, we use the following rule: | ||
* a pixel is black if it reaches the maximum number of iterations (pixel is assumed to be in the Mandelbrot set), | ||
* otherwise we assign a grayscale value depending on the escaped iteration and scaled to fit the grayscale range. | ||
|
||
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-grayscale-value | ||
|
||
Using a linear scale transformation is not enough to perceive the grayscale variation. To overcome this, we will boost | ||
the perception by using a square root scale transformation (borrowed from Jeremy D. Frens in his | ||
[blog post](http://www.programming-during-recess.net/2016/06/26/color-schemes-for-mandelbrot-sets/)): | ||
\f$ f \left( x \right) = \sqrt{\frac{x}{\text{maxIter}}} \times 255 \f$ | ||
|
||
![](images/how_to_use_OpenCV_parallel_for_sqrt_scale_transformation.png) | ||
|
||
The green curve corresponds to a simple linear scale transformation and the blue one to a square root scale transformation. | ||
The steeper slope of the blue curve near the origin shows how the lowest values are boosted. | ||
|
||
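The difference between the two scale transformations can be checked numerically with a small standalone sketch. The function names `linearGray` and `sqrtGray` are hypothetical, and `std::lround` is used here as a stand-in for `cvRound` so that the example does not depend on OpenCV (the two may differ on exact ties). Unlike the tutorial code, this sketch does not force pixels that reach the maximum number of iterations to black.

```cpp
#include <cmath>

// Hypothetical helper: linear mapping of the escape iteration to [0, 255].
int linearGray(int value, int maxIter)
{
    return static_cast<int>(std::lround(255.0 * value / maxIter));
}

// Hypothetical helper: square root mapping, f(x) = sqrt(x / maxIter) * 255.
int sqrtGray(int value, int maxIter)
{
    const double ratio = static_cast<double>(value) / maxIter;
    return static_cast<int>(std::lround(std::sqrt(ratio) * 255.0));
}
```

For instance, with `maxIter = 500`, an escape iteration of 20 maps to a gray level of 10 with the linear rule and 51 with the square root rule, which is why low iteration counts become easier to distinguish.
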
Parallel Mandelbrot implementation | ||
-------------------------- | ||
|
||
Looking at the sequential implementation, we can notice that each pixel is computed independently. To optimize the | ||
computation, we can perform multiple pixel calculations in parallel by exploiting the multi-core architecture of modern | ||
processors. To achieve this easily, we will use the OpenCV @ref cv::parallel_for_ framework. | ||
|
||
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-parallel | ||
|
||
The first thing is to declare a custom class that inherits from @ref cv::ParallelLoopBody and to override the | ||
`virtual void operator ()(const cv::Range& range) const`. | ||
|
||
The range in the `operator ()` represents the subset of pixels that will be processed by an individual thread. | ||
This splitting is done automatically to distribute the computation load equally. We have to convert the pixel index | ||
to a 2D `[row, col]` coordinate. Also note that we have to keep a reference to the `Mat` image to be able to modify | ||
the image in place. | ||
|
||
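The index conversion can be illustrated in isolation. The following sketch uses hypothetical helper names (`indexToRowCol`, `rowColToIndex`) and plain integers instead of a `cv::Mat`; it performs the same `r / cols` and `r % cols` computation as the loop body of the parallel class.

```cpp
#include <utility>

// Hypothetical helper: converts a linear pixel index r into a (row, col) pair
// for an image that is `cols` pixels wide (row-major order).
std::pair<int, int> indexToRowCol(int r, int cols)
{
    return std::make_pair(r / cols, r % cols);
}

// Hypothetical helper: the inverse mapping, from (row, col) back to the index.
int rowColToIndex(int row, int col, int cols)
{
    return row * cols + col;
}
```

For an image 5400 pixels wide, index 5399 is the last pixel of row 0 and index 5400 is the first pixel of row 1, so a contiguous range of indices covers whole rows plus, possibly, partial rows at its two ends.
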
The parallel execution is called with: | ||
|
||
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-parallel-call | ||
|
||
Here, the range represents the total number of operations to be executed, that is, the total number of pixels in the image. | ||
To set the number of threads, you can use @ref cv::setNumThreads. You can also specify the number of splits using the | ||
nstripes parameter of @ref cv::parallel_for_. For instance, if your processor has 4 threads, calling `cv::setNumThreads(2)` | ||
or setting `nstripes=2` should give the same result: by default, all available processor threads are used, but with | ||
`nstripes=2` the workload is split across only two of them. | ||
|
||
Results | ||
----------- | ||
|
||
You can find the full tutorial code [here](https://github.com/opencv/opencv/blob/master/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp). | ||
The performance of the parallel implementation depends on the type of CPU you have. For instance, on a 4-core / 8-thread | ||
CPU, you can expect a speed-up of around 6.9X. Many factors explain why we do not achieve a speed-up of almost 8X. | ||
The main reasons are most likely: | ||
* the overhead to create and manage the threads, | ||
* background processes running in parallel, | ||
* the difference between 4 hardware cores with 2 logical threads for each core and 8 hardware cores. | ||
|
||
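As a rough way to read the figure quoted above, the speed-up is the ratio of the sequential and parallel run times, which is what the sample program prints as `t2/t1`. The symbols \f$ S \f$, \f$ E \f$ and \f$ N \f$ are notation introduced here for illustration, with \f$ N = 8 \f$ the number of logical threads:

\f[ S = \frac{t_{\text{sequential}}}{t_{\text{parallel}}} \approx 6.9, \qquad E = \frac{S}{N} = \frac{6.9}{8} \approx 0.86 \f]

In other words, under this simple measure each logical thread delivers about 86% of what an ideal, overhead-free thread would.
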
The resulting image produced by the tutorial code (you can modify the code to use more iterations and assign a pixel color | ||
depending on the escaped iteration and using a color palette to get more aesthetic images): | ||
![Mandelbrot set with xMin=-2.1, xMax=0.6, yMin=-1.2, yMax=1.2, maxIterations=500](images/how_to_use_OpenCV_parallel_for_Mandelbrot.png) |
Binary file added (+16.4 KB)
...V_parallel_for_/images/how_to_use_OpenCV_parallel_for_640px-Mandelset_hires.png
Binary file added (+61.8 KB)
...o_use_OpenCV_parallel_for_/images/how_to_use_OpenCV_parallel_for_Mandelbrot.png
Binary file added (+33 KB)
...rallel_for_/images/how_to_use_OpenCV_parallel_for_sqrt_scale_transformation.png
122 changes: 122 additions & 0 deletions
...pp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,122 @@ | ||
#include <cmath> | ||
#include <complex> | ||
#include <cstdlib> | ||
#include <iostream> | ||
#include <opencv2/core.hpp> | ||
#include <opencv2/imgcodecs.hpp> | ||
|
||
using namespace std; | ||
using namespace cv; | ||
|
||
namespace | ||
{ | ||
//! [mandelbrot-escape-time-algorithm] | ||
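int mandelbrot(const complex<float> &z0, const int max) | ||
{ | ||
    complex<float> z = z0; | ||
    for (int t = 0; t < max; t++) | ||
    { | ||
        // Squared modulus above 4 (modulus above 2): the orbit escapes at iteration t | ||
        if (z.real()*z.real() + z.imag()*z.imag() > 4.0f) return t; | ||
        z = z*z + z0; | ||
    } | ||
|
||
    // No escape within max iterations: the point is assumed to be in the set | ||
    return max; | ||
} | ||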
//! [mandelbrot-escape-time-algorithm] | ||
|
||
//! [mandelbrot-grayscale-value] | ||
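int mandelbrotFormula(const complex<float> &z0, const int maxIter=500) { | ||
    int value = mandelbrot(z0, maxIter); | ||
    // Points that never escaped are drawn in black | ||
    if(maxIter - value == 0) | ||
    { | ||
        return 0; | ||
    } | ||
|
||
    // Square root scale transformation to the [0, 255] grayscale range | ||
    return cvRound(sqrt(value / (float) maxIter) * 255); | ||
} | ||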
//! [mandelbrot-grayscale-value] | ||
|
||
//! [mandelbrot-parallel] | ||
class ParallelMandelbrot : public ParallelLoopBody | ||
{ | ||
public: | ||
    ParallelMandelbrot (Mat &img, const float x1, const float y1, const float scaleX, const float scaleY) | ||
        : m_img(img), m_x1(x1), m_y1(y1), m_scaleX(scaleX), m_scaleY(scaleY) | ||
    { | ||
    } | ||
|
||
    virtual void operator ()(const Range& range) const | ||
    { | ||
        for (int r = range.start; r < range.end; r++) | ||
        { | ||
            // Convert the linear pixel index into a 2D (row, col) coordinate | ||
            int i = r / m_img.cols; | ||
            int j = r % m_img.cols; | ||
|
||
            // Map the pixel onto the complex plane | ||
            float x0 = j / m_scaleX + m_x1; | ||
            float y0 = i / m_scaleY + m_y1; | ||
|
||
            complex<float> z0(x0, y0); | ||
            uchar value = (uchar) mandelbrotFormula(z0); | ||
            m_img.ptr<uchar>(i)[j] = value; | ||
        } | ||
    } | ||
|
||
    ParallelMandelbrot& operator=(const ParallelMandelbrot &) { | ||
        return *this; | ||
    } | ||
|
||
private: | ||
    Mat &m_img; | ||
    float m_x1; | ||
    float m_y1; | ||
    float m_scaleX; | ||
    float m_scaleY; | ||
}; | ||
//! [mandelbrot-parallel] | ||
|
||
//! [mandelbrot-sequential] | ||
void sequentialMandelbrot(Mat &img, const float x1, const float y1, const float scaleX, const float scaleY) | ||
{ | ||
    for (int i = 0; i < img.rows; i++) | ||
    { | ||
        for (int j = 0; j < img.cols; j++) | ||
        { | ||
            float x0 = j / scaleX + x1; | ||
            float y0 = i / scaleY + y1; | ||
|
||
            complex<float> z0(x0, y0); | ||
            uchar value = (uchar) mandelbrotFormula(z0); | ||
            img.ptr<uchar>(i)[j] = value; | ||
        } | ||
    } | ||
} | ||
//! [mandelbrot-sequential] | ||
} | ||
|
||
int main() | ||
{ | ||
    //! [mandelbrot-transformation] | ||
    Mat mandelbrotImg(4800, 5400, CV_8U); | ||
    float x1 = -2.1f, x2 = 0.6f; | ||
    float y1 = -1.2f, y2 = 1.2f; | ||
    float scaleX = mandelbrotImg.cols / (x2 - x1); | ||
    float scaleY = mandelbrotImg.rows / (y2 - y1); | ||
    //! [mandelbrot-transformation] | ||
|
||
    double t1 = (double) getTickCount(); | ||
    //! [mandelbrot-parallel-call] | ||
    ParallelMandelbrot parallelMandelbrot(mandelbrotImg, x1, y1, scaleX, scaleY); | ||
    parallel_for_(Range(0, mandelbrotImg.rows*mandelbrotImg.cols), parallelMandelbrot); | ||
    //! [mandelbrot-parallel-call] | ||
    t1 = ((double) getTickCount() - t1) / getTickFrequency(); | ||
    cout << "Parallel Mandelbrot: " << t1 << " s" << endl; | ||
|
||
    Mat mandelbrotImgSequential(4800, 5400, CV_8U); | ||
    double t2 = (double) getTickCount(); | ||
    sequentialMandelbrot(mandelbrotImgSequential, x1, y1, scaleX, scaleY); | ||
    t2 = ((double) getTickCount() - t2) / getTickFrequency(); | ||
    cout << "Sequential Mandelbrot: " << t2 << " s" << endl; | ||
    cout << "Speed-up: " << t2/t1 << " X" << endl; | ||
|
||
    imwrite("Mandelbrot_parallel.png", mandelbrotImg); | ||
    imwrite("Mandelbrot_sequential.png", mandelbrotImgSequential); | ||
|
||
    return EXIT_SUCCESS; | ||
} |