Suppose that we spawn
$\text{P-FIB}(n - 2)$ in line 4 of$\text{P-FIB}$ , rather than calling it as is done in the code. What is the impact on the asymptotic work, span, and parallelism?
(Removed)
Draw the computation dag that results from executing
$\text{P-FIB}(5)$ . Assuming that each strand in the computation takes unit time, what are the work, span, and parallelism of the computation? Show how to schedule the dag on 3 processors using greedy scheduling by labeling each strand with the time step in which it is executed.
- Work:
$T_1 = 29$ . - Span:
$T_\infty = 10$ . - Parallelism:
$T_1 / T_\infty \approx 2.9$ .
Prove that a greedy scheduler achieves the following time bound, which is slightly stronger than the bound proven in Theorem 27.1:
$$T_P \le \frac{T_1 - T_\infty}{P} + T_\infty. \tag{27.5}$$
Suppose that there are x incomplete steps in a run of the program. Since each of these steps causes at least one unit of work to be done, we have that there is at most
This is a contradiction because there are only
Where the second inequality comes by noting that the middle expression, as a function of
Construct a computation dag for which one execution of a greedy scheduler can take nearly twice the time of another execution of a greedy scheduler on the same number of processors. Describe how the two executions would proceed.
The computation is given in the image below. Let vertex
Professor Karan measures her deterministic multithreaded algorithm on
$4$ ,$10$ , and$64$ processors of an ideal parallel computer using a greedy scheduler. She claims that the three runs yielded$T_4 = 80$ seconds,$T_{10} = 42$ seconds, and$T_{64} = 10$ seconds. Argue that the professor is either lying or incompetent. ($\textit{Hint:}$ Use the work law$\text{(27.2)}$ , the span law$\text{(27.3)}$ , and inequality$\text{(27.5)}$ from Exercise 27.1-3.)
(Removed)
Give a multithreaded algorithm to multiply an
$n \times n$ matrix by an$n$ -vector that achieves$\Theta(n^2 / \lg n)$ parallelism while maintaining$\Theta(n^2)$ work.
(Removed)
Consider the following multithreaded pseudocode for transposing an
$n \times n$ matrix$A$ in place:P-TRANSPOSE(A) n = A.rows parallel for j = 2 to n parallel for i = 1 to j - 1 exchange a[i, j] with a[j, i]Analyze the work, span, and parallelism of this algorithm.
(Removed)
Suppose that we replace the parallel for loop in line 3 of
$\text{P-TRANSPOSE}$ (see Exercise 27.1-7) with an ordinary for loop. Analyze the work, span, and parallelism of the resulting algorithm.
(Removed)
For how many processors do the two versions of the chess programs run equally fast, assuming that
$T_P = T_1 / P + T_\infty$ ?
(Removed)