forked from liuxinyu95/AlgoXY
-
Notifications
You must be signed in to change notification settings - Fork 0
/
search-en.tex
3668 lines (3085 loc) · 153 KB
/
search-en.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\ifx\wholebook\relax \else
\documentclass[b5paper]{article}
\usepackage[nomarginpar
%, margin=.5in
]{geometry}
\addtolength{\oddsidemargin}{-0.05in}
\addtolength{\evensidemargin}{-0.05in}
\addtolength{\textwidth}{0.1in}
\usepackage[en]{../prelude}
\setcounter{page}{1}
\begin{document}
\title{Solution search}
\author{Xinyu~LIU
\thanks{{\bfseries Xinyu LIU} \newline
Email: [email protected] \newline}
}
\maketitle
\fi
\markboth{Solution search}{Elementary Algorithms}
\ifx\wholebook\relax
\chapter{Solution search}
\numberwithin{Exercise}{chapter}
\fi
\def\includetikz{}
Computers enables people to search the solution for many problems: we build robot to search and pick the gadget in assembly lane; we develop car navigator to search the map for the best route; we make smart phone application to search the best shopping plan. This chapter is about the elementary lookup, matching, and solution search algorithms.
\section{$k$ selection problem}
\index{Selection algorithm}
A selection problem is to find the $k$-th smallest (or largest, $k > 0$) element in a collection $xs$ (list or array). The ordering is abstract, denoted as $\leq$. The intuitive method repeatedly finds the minimum for $k$ times. It takes $O(n)$ time to find the minimum, where $n = |xs|$ is the size. The total performance is bound to $O(kn)$. Alternatively, we can use heap to update, access the top element in $O(\lg n)$ time, hence find the $k$-th element in $O(k \lg n)$ time.
\be
top\ k\ xs = find\ k\ (\textit{heapify}\ xs)
\ee
Or in Curried form:
\be
top\ k\ = (find\ k) \circ \textit{heapify}
\label{eq:kth-heap1}
\ee
Where:
\be
\begin{array}{rcl}
find\ 1 & = & top \\
find\ k & = & (find\ (k - 1)) \circ pop
\end{array}
\label{eq:kth-heap2}
\ee
We can do it even better with the divide and conquer method. Pick an arbitrary element $p$ in $xs$, split $xs$ into $as$ and $bs$ with $p$ ($X = as \doubleplus [p] \doubleplus bs$), where $as = [x \gets xs, x \leq p]$ and $B = [x \gets xs, x > p]$. Let $m = |as|$ be the size of $as$, compare $m$ and $k$:
\begin{enumerate}
\item If $m = k - 1$, then $p$ is the $k$-th element;
\item If $m < k - 1$, the $k$-th element is in $as$, drop $bs $ and recursively search in $as$;
\item If $m > k - 1$, the $k$-th element is in $bs$, drop $as$ and recursively search the $(k-m)$-th element in $bs$.
\end{enumerate}
Reuse the \textit{part} function in quick sort (see \cref{eq:qsort-partition}):
\be
top\ k\ (x \cons xs) = \begin{cases}
m = k - 1: & x, \text{where}\ m = |as|, (as, bs) = part\ (\leq x)\ xs \\
m < k - 1: & top\ (k - m)\ bs \\
\text{otherwise}: & top\ k\ as \\
\end{cases}
\ee
In ideal case, the split is balanced (the sizes of $as$ and $bs$ are almost same), halves the size every time. The performance is $O(n + n/2 + n/4 +...) = O(n)$. Same as the quick sort algorithm, the worst case happens when the partition is always unbalanced. The performance downgrades to $O(kn)$ or $O((n-k)n)$. In average case, we can find the $k$-th element in linear time. Most engineering practices in quick sort are applicable too, like the `media of three'\footnote{Blum, Floyd, Pratt, Rivest, and Tarjan developed a linear time algorithm in 1973\cite{CLRS}\cite{median-of-median}. Split the elements into groups, each has 5 elements at most. It gives $n/5$ medians. Repeat this to pick the median of median.}, and randomly pivot:
\begin{algorithmic}[1]
\Function{Top}{$k, xs, l, u$}
\State \textproc{Exchange} $xs[l] \leftrightarrow xs[$ \Call{Random}{$l, u$} $]$ \Comment{Randomly select in $[l, u]$}
\State $p \gets$ \Call{Partition}{$xs, l, u$}
\If{$p - l + 1 = k$}
\State \Return $xs[p]$
\EndIf
\If{$k < p - l + 1$}
\State \Return \Call{Top}{$k, xs, l, p-1$}
\EndIf
\State \Return \Call{Top}{$k - p + l - 1, xs, p + 1, u$}
\EndFunction
\end{algorithmic}
We can change to return all the top $k$ elements (in arbitrary order), as below example program:
\lstset{frame = single}
\begin{Haskell}
tops _ [] = []
tops 0 _ = []
tops n (x:xs) | len == k = as
| len < n = as ++ [x] ++ tops (n - len - 1) bs
| otherwise = tops n as
where
(as, bs) = partition (<= x) xs
len = length as
\end{Haskell}
\section{Binary search}
\index{Binary search}
My high school teacher once played a `math magic'. He asked a student to pick a number from 0 to 1000 in mind. He asked 10 questions, then figured out that number based on the yes/no answers from the student. For example: is it even? is it prime? can it be divided by 3? and etc. If halves the numbers with every question, one can find any number within 1000 because $2^{10} = 1024 > 1000$. The question of whether it is even, perfectly halves numbers\footnote{There's a `mind reading' game in social network. One thinks about a person in mind. The AI robot asks 16 questions, and tells who that person is from the yes/no answers}. The game becomes not so interesting when the player guess like: 1000, high; 50, low; 750, low; 890, low; 990, correct! This is the {\em binary search} method. To find $x$ in an ordered sequence $A$, one firstly tries the middle point $y$. Done if $x = y$; if $x < y$, then drop the second half of $A$ as it's ordered; otherwise drop the first half. When $A = [\ ]$, then $x$ doesn't exist. $A$ need be ordered, I often see people are struggled with unordered data, confusing why the binary search does not work. {\em `Although the basic idea of binary search is comparatively straightforward, the details can be surprisingly tricky'} said Donald Knuth. Jon Bentley said most binary search implementations had error, including the one he gave in `{\em Programming pearls}'. He corrected the error after two decades\cite{Bentley}. Below is the binary search definition, where the lower and upper bounds of $A$ is $l$ and $u$ (exclude $u$).
\be
\textit{bsearch}\ x\ A\ (l, u) = \begin{cases}
u < l: & \textit{Nothing} \\
x = A[m]: & \textit{Just}\ m, \text{where}\ m = l + \lfloor \dfrac{u - l}{2} \rfloor \\
x < A[m]: & \textit{bsearch}\ x\ A\ (l, m - 1) \\
\text{otherwise}: & \textit{bsearch}\ x\ A\ (m + 1, u) \\
\end{cases}
\ee
Or implement with loops:
\begin{algorithmic}[1]
\Function{Binary-Search}{$x, A, l, u$}
\While{$l < u$}
\State $m \gets l + \lfloor \dfrac{u - l}{2} \rfloor$ \Comment{avoid $\lfloor \dfrac{l+u}{2} \rfloor$ overflow}
\If{$A[m] = x$}
\State \Return $m$
\EndIf
\If{$x < A[m]$}
\State $u \gets m - 1$
\Else
\State $l \gets m + 1$
\EndIf
\EndWhile
\State Not found
\EndFunction
\end{algorithmic}
The performance of binary search is bound to $O(\lg n)$ because it halves $A$ every time. We can extend it to solve equation of monotone functions, for example $a^x = y$, where $a \leq y$, $a$ and $y$ are nature numbers. To find the integral $x$, exhaust $a^0, a^1, a^2, ...$, till $a^i = y$ or $a^i < y < a^{i+1}$ (no solution). If $a$ and $x$ are big numbers, it's expensive to compute $a^x$ in loops\footnote{One can reuse the result of $a^n$ to compute $a^{n + 1} = a a^n$. We consider generic monotone $f(n)$.}. Let's apply binary search. As $a^y \geq y$, we search in $[0, 1, ..., y]$. Function $f(x) = a^x$ is monotone, fix $x$, we examine the middle point: $x_m = \lfloor \dfrac{0 + y}{2} \rfloor$. If $a^{x_m} = y$, then $x_m$ is the solution; if $a^{m} < y$, we discard the range before $x_m$; otherwise discard the range after $x_m$. Both halve the search range. When the range becomes empty, it means no solution. Below is the implementation. Denote the monotone function as $f$, call $bsearch\ f\ y\ (0, y)$, where $f(x) = a^x$ to start. This method computes $f(x)$ for $O(\lg y)$ times, better than the exhaustive search.
\be
\textit{bsearch}\ f\ y\ (l, u) = \begin{cases}
u < l: & \textit{Nothing} \\
f(m) = y: & \textit{Just}\ m, \text{where}\ m = \lfloor \dfrac{l + u}{2} \rfloor \\
f(m) < y: & \textit{bsearch}\ f\ y\ (m + 1, u) \\
f(m) > y: & \textit{bsearch}\ f\ y\ (l, m - 1) \\
\end{cases}
\label{eq:bsearch}
\ee
\subsection{2D search}
Extend binary search to 2D or even higher dimension. Consider matrix $M$ of size $m \times n$. The elements in each row, column are ascending nature numbers as shown in \cref{fig:matrix-eg}. How to locate all elements equal to $z$? i.e. find all locations of $(i, j)$, such that $M_{i,j} = z$.
\be
[(x, y) | x \gets [1, 2,..., m], y \gets [1, 2,..., n], M_{x, y} = z]
\label{eq:bsearch-brute}
\ee
\begin{figure}[htbp]
\centering
\[
\left [
\begin{array}{ccccc}
1 & 2 & 3 & 4 & ... \\
2 & 4 & 5 & 6 & ... \\
3 & 5 & 7 & 8 & ... \\
4 & 6 & 8 & 9 & ... \\
... \\
\end{array}
\right ]
\]
\caption{Each row, column is ascending.}
\label{fig:matrix-eg}
\end{figure}
Richard Bird used to interview students with this question\cite{fp-pearls}. Those who had programming experience at school tended to apply binary search. But it was easy to get stuck. One often checks the middle point $M_{\frac{m}{2}, \frac{n}{2}}$. If it is less than $z$, then drop the top-left rectangle; if greater than $z$ then drop the bottom-right rectangle, as shown in \cref{fig:bsearch-2D}, discard the shaded rectangle. Both cases lead to a L-shape search area, where one can't apply recursive search directly any more. Define the 2D search as: given $f(x, y)$, search integer solution $(x, y)$, such that $f(x, y) = z$. The matrix search is just a special case as:
\begin{figure}[htbp]
\centering
\includegraphics[scale=0.5]{img/binary-search-2d}
\caption{Left: the middle point $< z$, all shaded rectangle $< z$; Right: the middle point $> z$, all shaded rectangle $> z$.}
\label{fig:bsearch-2D}
\end{figure}
\[
f(x, y) = \begin{cases}
1 \leq x \leq m, 1 \leq y \leq n: & M_{x, y} \\
\text{otherwise}: & -1 \\
\end{cases}
\]
\index{Saddle back search}
For monotone function $f(x, y)$, e.g., $f(x, y) = x^a + y^b$, where $a, b$ are nature numbers, the effective method is to search from the top-left, but not bottom-left\cite{saddle-back}. As shown in \cref{fig:saddleback-1}, start from $(0, z)$, for each location $(p, q)$, compare $f(p, q)$ and $z$:
\begin{enumerate}
\item If $f(p, q) < z$, since $f$ is monotone increasing, $f(p, y) < z$ for all $0 \leq y < q$. Drop all points in the vertical line segment ({\color{red}red});
\item If $f(p, q) > z$, then $f(x, q) > z$ for all $p < x \leq z$. Drop all points in the horizontal line segment ({\color{blue}blue});
\item If $f(p, q) = z$, then $(p, q)$ is a solution. Drop both line segments.
\end{enumerate}
Reduce the search rectangle line by line, every time drop a row, or a column, or both.
\begin{figure}[htbp]
\centering
\includegraphics[scale=0.5]{img/saddle-back-start}
\caption{Search from top-left.}
\label{fig:saddleback-1}
\end{figure}
Define \textit{search} function as below, and pass the top-left corner: $search\ f\ z\ 0\ z$
\be
\textit{search}\ f\ z\ p\ q\ = \begin{cases}
p > z\ \text{or}\ q < 0: & [\ ] \\
f(p, q) < z: & \textit{search}\ f\ z\ (p + 1)\ q \\
f(p, q) > z: & \textit{search}\ f\ z\ p\ (q - 1) \\
f(p, q) = z: & (p, q) : \textit{search}\ f\ z\ (p + 1)\ (q - 1) \\
\end{cases}
\ee
Every time, at least one of $p$ and $q$ advances towards the bottom or right by one. It needs at most $2(z+1)$ steps. There are three best cases: (1) both $p$ and $q$ advance one a time, there are total $z + 1$ steps. As in \cref{fig:saddleback-1-cases} (a), all points in the diagonal line $(x, z-x)$ satisfy $f(x, z-x) = z$. It reaches to $(z, 0)$ in $z + 1$ steps; (2) move to the right horizontally till $p$ exceeds $z$. As in \cref{fig:saddleback-1-cases} (b), all points in the top horizontal line $(x, z)$ satisfy $f(x, z) < z$. It terminates after $z + 1$ steps; (3) move down vertically till $q$ becomes negative. As in \cref{fig:saddleback-1-cases} (c), all points in the left vertical line $(0, x)$ satisfy $f(0, x) > z$. It terminates after $z + 1$ steps; (d) is the worst case. If project all the horizontal sections in the search path to $x$ axis, all the vertical sections to $y$ axis, it gives the total steps of $2(z+1)$. This method improves the performance of exhaustive search from $O(z^2)$ to $O(z)$.
\begin{figure}[htbp]
\centering
\includegraphics[scale=0.5]{img/saddle-back-paths}
\caption{The best and worst cases.}
\label{fig:saddleback-1-cases}
\end{figure}
This algorithm is called the {\em `saddle back'} search. The plot image of $f$ has the smallest bottom-left and the largest top-right. It looks like a saddle with two wings as shown in \cref{fig:saddleback-frame}. We can further reduce the search rectangle of $(0, z)- (z, 0)$. Since $f$ is monotone increasing, find the maximum $m$ along $y$ axis satisfying $f(0, m) \leq z$; find the maximum $n$ along $x$ axis satisfying $f(n, 0) \leq z$. Reduce the search rectangle to $(0, m) - (n, 0)$, as shown in \cref{fig:saddleback-2}.
\begin{figure}[htbp]
\centering
\includegraphics[scale=0.8]{img/saddleback-xx-yy}
\caption{Plot of $f(x, y) = x^2 + y^2$.}
\label{fig:saddleback-frame}
\end{figure}
\begin{figure}[htbp]
\centering
\includegraphics[scale=0.6]{img/saddle-back-area}
\caption{Reduced search rectangle.}
\label{fig:saddleback-2}
\end{figure}
\be
\begin{cases}
m & = \max\ \{0 \leq y \leq z, f(0, y) \leq z\} \\
n & = \max\ \{0 \leq x \leq z, f(x, 0) \leq z\}
\end{cases}
\ee
We can apply binary search to find $m$, $n$ (fix $x = 0$ to search $m$, fix $y = 0$ to search $n$). Modify \cref{eq:bsearch}, search $l \leq x \leq u$ satisfying $f(x) \leq y < f(x+1)$.
\be
\textit{bsearch}\ f\ y\ (l, u) = \begin{cases}
u \leq l: & l \\
f(m) \leq y < f(m + 1): & m, \text{where}\ m = \lfloor \dfrac{l + u}{2} \rfloor \\
f(m) \leq y: & \textit{bsearch}\ f\ y\ (m + 1, u) \\
f(m) > y : & \textit{bsearch}\ f\ y\ (l, m - 1) \\
\end{cases}
\label{eq:bsearch-general}
\ee
Then determine $m$, $n$ with binary search:
\be
\begin{cases}
m & = \textit{bsearch}\ (y \mapsto f(0, y))\ z\ (0, z) \\
n & = \textit{bsearch}\ (x \mapsto f(x, 0))\ z\ (0, z) \\
\end{cases}
\label{eq:bsearch-boundaries}
\ee
Finally, apply saddle back search in this smaller rectangle: $solve(f, z) = search\ f\ z\ 0\ \pmb{m}$
\be
\textit{search}\ f\ z\ p\ q\ = \begin{cases}
p > \pmb{n}\ \text{or}\ q < 0: & [\ ] \\
f(p, q) < z: & \textit{search}\ f\ z\ (p + 1)\ q \\
f(p, q) > z: & \textit{search}\ f\ z\ p\ (q - 1) \\
f(p, q) = z: & (p, q) : \textit{search}\ f\ z\ (p + 1)\ (q - 1) \\
\end{cases}
\ee
We apply two rounds of binary search to find $m$, $n$, each round computes $f$ for $O(\lg z)$ times; The saddle back search computes $f$ for $O(m + n)$ times in the worst case; it's $O(\min(m, n))$ in the best case as below table. For functions like $f(x, y) = x^a + y^b$, $a, b \in \mathbb{N}$, the boundary $m$, $n$ are very small. The total performance is close to $O(\lg z)$.
\btab{|l|l|}
\hline
& steps to compute $f$ \\
\hline
worst & $2 \log z + m + n$ \\
\hline
best & $2 \log z + \min(m, n)$ \\
\hline
\etab
As shown in \cref{fig:saddleback-drop}, for a point $(p, q)$ in rectangle $(a, b) - (c, d)$, if $f(p, q) \neq z$, we can only discard the shaded part ($\leq 1/4$). If $f(p, q) = z$, we can discard the bottom-left, top-right parts, and all points in row $p$ and column $q$ since $f$ is monotone. Hence reduce the search rectangle by 1/2. To find the point satisfying $f(p, q) = z$, we apply binary search along the horizontal or vertical central line. Because the performance is bound to $O(\lg |L|)$ for line $L$, we chose the shorter central line as shown in \cref{fig:saddleback-centerline}.
\begin{figure}[htbp]
\centering
\subcaptionbox{If $f(p, q) \neq z$, we can only drop the shaded area, the remaining is a 'L' shape.}{\includegraphics[scale=0.6]{img/saddleback-l-area}} \\
\subcaptionbox{If $f(p, q) = z$, we can drop 1/2 rectangle.}{\includegraphics[scale=0.6]{img/saddleback-half-area}}
\caption{Reduce the search rectangle.}
\label{fig:saddleback-drop}
\end{figure}
\begin{figure}[htbp]
\centering
\includegraphics[scale=0.6]{img/saddleback-mid-line}
\caption{Chose the shorter center line.}
\label{fig:saddleback-centerline}
\end{figure}
If there is no point satisfying $f(p, q) = z$, find a point, such that $f(p, q) < z < f(p + 1, q)$ in the horizontal central line (or $f(p, q) < z < f(p, q + 1)$ for vertical central line). We can't discard all points in row $p$ and column $q$. In summary, we apply binary search along horizontal central line for the point: $f(p, q) \leq z < f(p + 1, q)$; or search the vertical central line for the point: $f(p, q) \leq z < f(p, q + 1)$. If all points in the line segment are $f(p, q) < z$, then return the upper bound; if all are $f(p, q) > z$, then return the lower bound. We discard half side in this case. Below is the improved saddle back search:
\begin{enumerate}
\item Apply binary search along the $x, y$ axes for the search rectangle $(0, m) - (n, 0)$;
\item For rectangle $(a, b) - (c, d)$, if the height > width, apply binary search along the horizontal central line; otherwise search along the vertical central line for the point $(p, q)$;
\item If $f(p, q) = z$, it is a solution. Recursively search rectangles $(a, b) - (p-1, q+1)$ and $(p+1, q-1) - (c, d)$;
\item If $f(p, q) \neq z$, recursively search the two rectangles and a line section, either $(p, q+1) - (p, b)$ in \cref{fig:include-line} (a); or $(p+1, q) - (c, q)$ in \cref{fig:include-line} (b).
\end{enumerate}
\begin{figure}[htbp]
\centering
\includegraphics[scale=0.6]{img/saddleback-mid-line-inc}
\caption{Recursively search the shaded parts, include the bold line if $f(p, q) \neq z$.}
\label{fig:include-line}
\end{figure}
\be
\textit{search}\ (a, b)\ (c, d) = \begin{cases}
c < a\ \text{or}\ d < b: & [\ ] \\
c - a < b - d: & \textit{csearch} \\
\text{otherwise}: & \textit{rsearch} \\
\end{cases}
\ee
Where $csearch$ apply binary search to the horizontal central line for point $(p, q)$, such that $f(p, q) \leq z < f(p+1, q)$, as shown in \cref{fig:include-line} (a). If all function values are greater than $z$, then return the lower bound $(a, \lfloor \dfrac{b + d}{2} \rfloor)$. Drop the above side (include the central line) as shown in \cref{fig:saddleback-edge-cases} (a).
\begin{figure}[htbp]
\centering
\includegraphics[scale=0.6]{img/saddleback-halve}
\caption{Special case.}
\label{fig:saddleback-edge-cases}
\end{figure}
Let
\[
\begin{cases}
q = \lfloor \dfrac{b + d}{2} \rfloor \\
p = \textit{bsearch}\ (x \mapsto f(x, q))\ z\ (a, c) \\
\end{cases}
\]
\be
\resizebox{\textwidth}{!}{\ensuremath{
\textit{csearch} = \begin{cases}
f(p, q) > z: & \textit{search}\ (p, q - 1)\ (c, d) \\
f(p, q) = z: & \textit{search}\ (a, b)\ (p - 1, q + 1) \doubleplus [(p, q)] \doubleplus \textit{search}\ (p + 1, q - 1)\ (c, d) \\
f(p, q) < z: & \textit{search}\ (a, b)\ (p, q + 1) \doubleplus \textit{search}\ (p + 1, q - 1)\ (c, d) \\
\end{cases}
}}
\ee
Function $rsearch$ is symmetric along the vertical central line. Below example program implements the improved saddle back search:
\begin{Haskell}
solve f z = search f z (0, m) (n, 0) where
m = bsearch (f 0) z (0, z)
n = bsearch (\x -> f x 0) z (0, z)
search f z (a, b) (c, d)
| c < a || b < d = []
| c - a < b - d = let q = (b + d) `div` 2 in
csearch (bsearch (\x -> f x q) z (a, c), q)
| otherwise = let p = (a + c) `div` 2 in
rsearch (p, bsearch (f p) z (d, b))
where
csearch (p, q)
| z < f p q = search f z (p, q - 1) (c, d)
| f p q == z = search f z (a, b) (p - 1, q + 1) ++
(p, q) : search f z (p + 1, q - 1) (c, d)
| otherwise = search f z (a, b) (p, q + 1) ++
search f z (p + 1, q - 1) (c, d)
rsearch (p, q)
| z < f p q = search f z (a, b) (p - 1, q)
| f p q == z = search f z (a, b) (p - 1, q + 1) ++
(p, q) : search f z (p + 1, q - 1) (c, d)
| otherwise = search f z (a, b) (p - 1, q + 1) ++
search f z (p + 1, q) (c, d)
\end{Haskell}
As we halve the rectangle every time, we search $O(\lg (mn))$ rounds. We apply binary search along the central line for $(p, q)$, compute $f$ for $O(\lg (\min(m, n)))$ times. Let the time be $T(m, n)$ when search $m \times n$ rectangle. We have the following recursive equation:
\be
T(m, n) = \lg(min(m, n)) + 2 T(\dfrac{m}{2}, \dfrac{n}{2})
\ee
Suppose $m = 2^i > n = 2^j$, use telescope method:
\be
\begin{array}{rl}
T(2^i, 2^j) & = j + 2 T(2^{i-1}, 2^{j-1}) \\
& = \displaystyle \sum_{k=0}^{i-1} 2^k(j - k) \\
& = O(2^i(j-i)) \\
& = O(m \lg (n/m))
\end{array}
\ee
Richard Bird proves this is asymptotically optimal by a lower bound of searching a given value in $m \times n$ rectangle \cite{fp-pearls}.
\begin{Exercise}\label{ex:binary-search}
\Question{Prove the performance of $k$-selection problem is $O(n)$ in average.}
\Question{To find the top $k$ element in $A$, we can search $x = \max\ (take\ k\ A), y = \min\ (drop\ k\ A)$. If $x < y$, then the first $k$ elements in $A$ is the answer; otherwise, we partition the first $k$ elements with $x$, partition the rest with $y$, then recursively find in sub-sequence $[a \gets A, x < a < y]$ for the top $k'$ elements, where $k' = k - |[a \gets A, a \leq x]|$. Implement this solution, and evaluate its performance.}
\Question{Find the `simplified' median of two sorted arrays $A$ and $B$ in $O(\lg (m + n))$ time, where $m = |A|, n = |B|$. The array index starts from 0. The simplified median is defined as $median(A, B) = C[\lfloor \dfrac{m + n}{2} \rfloor]$, where $C = merge(A, B)$ is the merged sorted array\footnote{In statistics, the median of an ascending data set $x$ with $n$ elements is defined as:
\[
median(x) = \begin{cases}
odd(n): & x[\frac{n + 1}{2}] \\
even(n): & \dfrac{1}{2}(x[\dfrac{n}{2}] + x[\dfrac{n}{2}+1]) \\
\end{cases}
\]
}.}
\Question{For the saddle back search, eliminate recursion, implement it in loops to update the boundaries.}
\Question{For 2D search, let the bottom-left be the minimum, the top-right be the maximum. if $z$ is less than the minimum or greater than the maximum, then no solution; otherwise cut the rectangle into 4 parts with a horizontal line and a vertical line crossed at the center. then recursive search in these 4 small rectangles. Implement this solution and evaluate its performance.}
\end{Exercise}
\begin{Answer}
\Question{Prove the performance of $k$-selection problem is $O(n)$ in average.
Refer to the performance analysis of quick sort in \cref{sec:quick-sort-big-o}.
}
\Question{To find the top $k$ element in $A$, we can search $x = \max\ (take\ k\ A), y = \min\ (drop\ k\ A)$. If $x < y$, then the first $k$ elements in $A$ is the answer; otherwise, we partition the first $k$ elements with $x$, partition the rest with $y$, then recursively find in sub-sequence $[a \gets A, x < a < y]$ for the top $k'$ elements, where $k' = k - |[a \gets A, a \leq x]|$. Implement this solution, and evaluate its performance.
\begin{algorithmic}[1]
\Procedure{Tops}{$k, A$}
\State $l \gets 1$
\State $u \gets |A|$
\Loop
\State $i \gets$ \Call{Max-At}{$A[l..k]$}
\State $j \gets$ \Call{Min-At}{$A[k+1..u]$}
\If{$A[i] < A[j]$}
\State break
\EndIf
\State \textproc{Exchange} $A[l] \leftrightarrow A[j]$
\State \textproc{Exchange} $A[k+1] \leftrightarrow A[i]$
\State $l \gets$ \Call{Partition}{$A, l, k$}
\State $u \gets$ \Call{Partition}{$A, k+1, u$}
\EndLoop
\EndProcedure
\end{algorithmic}
The performance is $O(n)$ in average. Every loop takes linear time to locate the min $i$, max $j$. Then partitions two rounds in linear time. If the partition is balanced, we discard half elements in average, hence the total time is bound to: $O(n + n/2 + n/4...) = O(n)$.
}
\Question{Find the `simplified' median of two sorted arrays $A$ and $B$ in $O(\lg (m + n))$ time, where $m = |A|, n = |B|$. The array index starts from 0. The simplified median is defined as $median(A, B) = C[\lfloor \dfrac{m + n}{2} \rfloor]$, where $C = merge(A, B)$ is the merged sorted array\footnote{In statistics, the median of an ascending data set $x$ with $n$ elements is defined as:
\[
median(x) = \begin{cases}
odd(n): & x[\frac{n + 1}{2}] \\
even(n): & \dfrac{1}{2}(x[\dfrac{n}{2}] + x[\dfrac{n}{2}+1]) \\
\end{cases}
\]
}.
We give two solutions. The first applies binary search to each array. Let $l = 0$ and $u = m$ be the lower and upper bounds respectively. We guess the median in $A$ is at index $i = \lfloor \dfrac{l + u}{2} \rfloor$. According to the definition of the simplified median, there are total $h = \lfloor \dfrac{m + n}{2} \rfloor$ elements before it. Where there are $i$ elements before $A[i]$ in $A$. If we guess right, then there are $j = h - i$ elements before $A[i]$ in $B$. If $B[j] \leq A[i] \leq B[j + 1]$ holds, then the guessed $A[i]$ is the median. Otherwise, we need update the boundaries $l$ or $u$ if the guess is too big or small. Below example program implements this solution:
\begin{Bourbaki}
K median([K] a, [K] b) {
if a == [] then return b[length(b) / 2]
if b == [] then return a[length(a) / 2]
Int i = medianOf(a, b)
return if i == -1 then return median(b, a) else a[i]
}
Int medianOf([K] a, [K] b) {
Int l = 0, u = length(a)
while l < u {
var i = (l + u) / 2
var j = (length(a) + length(b)) / 2 - i
if j < 1 or j >= len(b) {
if (j == 0 and a[i] <= b[0]) or
(j == len(b) and b[j - 1] <= a[i]) then return i
if j >= len(b) then l = i + 1 else u = i
} else {
if b[j - 1] <= a[i] and a[i] <= b[j] then return i
if a[i] < b[j - 1] then l = i + 1 else u = i
}
}
return -1
}
\end{Bourbaki}
The second solution is to develop a generic function that looks for the $k$-th element. Assume $m \geq n$ (otherwise swap $A$ and $B$), If either array is empty, then return the $k$-th element of the other array. If $k = 1$, then return the smaller one between $A[0]$ and $B[0]$. Otherwise guess $j = min(k/2, n)$ and $i = k - j$, then check $A[i]$ and $B[j]$. If $A[i] < B[j]$, we drop all elements before $A[i]$ and after $B[j]$, then recursively find the $(k - i)$-th element of the remaining; otherwise, drop all before $B[j]$ and after $A[i]$ for recursively finding.
\begin{Bourbaki}
K median([K] xs, [K] ys) {
Int n = length(xs), m = length(ys)
return kth(xs, 0, n, ys, 0, m, (m + n) / 2 + 1)
}
K kth([K] xs, Int x0, Int x1, [K] ys, Int y0, Int y1, Int k) {
if x1 - x0 < y1 - y0 then return kth(ys, y0, y1, xs, x0, x1, k)
if x1 <= x0 then return ys[y0 + k - 1]
if y1 <= y0 then return xs[x0 + k - 1]
if k == 1 then return min(xs[x0], ys[y0])
var j = min(k / 2, y1 - y0), i = k - j
i = x0 + i, j = y0 + j
if xs[i - 1] < ys[j - 1] then
return kth(xs, i, x1, ys, y0, j, k - i + x0)
else
return kth(xs, x0, i, ys, j, y1, k - j + y0)
}
\end{Bourbaki}
We can't define the simplified median as the $m \in A \doubleplus B$, satisfying:
\[
|[y \gets A \doubleplus B, y < m]| - |[y \gets A \doubleplus B, y > m]| = 0, \pm 1
\]
Meaning there are same amount of elements that are greater than $m$ and less than $m$, or they only differ by 1. Consider the counter example: $[0, 1, 2, 3, 3, 3, 3, 3, 5]$. It doesn't work even if we use $\leq$ and $\geq$ instead.
}
\Question{For the saddle back search, eliminate recursion, implement it in loops to update the boundary.
\begin{algorithmic}[1]
\Function{Solve}{$f, z$}
\State $p \gets 0, q \gets z$
\State $S \gets \phi$
\While{$p \leq z$ and $q \geq 0$}
\State $z' \gets f(p, q)$
\If{$z' < z$}
\State $p \gets p + 1$
\ElsIf{$z' > z$}
\State $q \gets q - 1$
\Else
\State $S \gets S \cup \{(p, q)\}$
\State $p \gets p + 1, q \gets q - 1$
\EndIf
\EndWhile
\State \Return $S$
\EndFunction
\end{algorithmic}
}
\Question{For 2D search, let the bottom-left be the minimum, the top-right be the maximum. if $z$ is less than the minimum or greater than the maximum, then no solution; otherwise cut the rectangle into 4 parts with a horizontal line and a vertical line crossed at the center. then recursive search in these 4 small rectangles. Implement this solution and evaluate its performance.
\begin{algorithmic}[1]
\Procedure{Search}{$f, z, a, b, c, d$} \Comment{$(a, b)$: bottom-left $(c, d)$: top-right}
\If {$z \leq f(a, b)$ or $f(c, d) \geq z$}
\If{$z = f(a, b)$}
\State record $(a, b)$ as a solution
\EndIf
\If{$z = f(c, d)$}
\State record $(c, d)$ as a solution
\EndIf
\State \Return
\EndIf
\State $p \gets \lfloor \dfrac{a + c}{2} \rfloor$
\State $q \gets \lfloor \dfrac{b + d}{2} \rfloor$
\State \Call{Search}{$f, z, a, q, p, d$}
\State \Call{Search}{$f, z, p, q, c, d$}
\State \Call{Search}{$f, z, a, b, p, q$}
\State \Call{Search}{$f, z, p, b, c, q$}
\EndProcedure
\end{algorithmic}
Let the time to search in rectangle of area $A$ be $T(A)$. We take $O(1)$ time to check whether $z \leq f(a, b)$ or $f(c, d) \geq z$, then divide into 4 smaller areas, i.e., $T(A) = 4 T(A/4) + O(c)$. We apply the master theorem, the complexity is $O(A) = O(m n)$, which is proportion to the area. It's essentially same as exhaustive search in the rectangle.
}
\end{Answer}
\section{The majority number}
\index{Boyer-Moor majority number}
People often vote and use computer to count the result. Suppose a candidate wins if and only if gets more than half votes. From the votes sequence A, B, A, C, B, B, D, ..., how to find the winner efficiently? We can use a map to count the result (see chapter 2)\footnote{There is a probabilistic sub-linear space counting algorithm published
in 2004, named as `Count-min sketch'\cite{count-min-sketch}.}.
\begin{lstlisting}[language = Bourbaki]
Optional<T> majority([T] xs) {
Map<T, Int> m
for var x in xs {
if x in m then m[x]++ else mx[x] = 0
}
var (r, v) = (Optional<T>.Nothing, length(xs) / 2 - 1)
for var (x, c) in m {
if c > v then (r, v) = (Optional.of(x), c)
}
return r
}
\end{lstlisting}
The map is often implemented as a red-black tree or Hash table. For $m$ candidates, $n$ votes, below table gives the performance:
\btab{|l|l|l|}
\hline
map & time & space \\
\hline
self-balancing tree & $O(n \lg m)$ & $O(m)$ \\
\hline
Hash table & $O(n)$ & at least $O(m)$ \\
\hline
\etab
Define the element occurs over 50\% as the `majority'. Boyer and Moore developed an algorithm in 1980, which picks the majority element in one scan if there is with constant space\cite{boyer-moore-majority}. There is at most 1 majority. Repeat dropping two different elements till all remaining are same. If the majority exists, then it is the remaining. Start from the first vote, let the candidate be the winner so far with point 1. If the next one votes the same candidate, then add the winner point by 1, otherwise -1. The candidate won't be the winner when the point reduces to 0. We pick the candidate of the next vote as the new winner and go on. As shown in below table, if there exists majority $m$, then other candidates can't beat $m$. Otherwise if the majority doesn't exist (invalid vote result, no winner), then discard the recorded `winner'. We need another scan to valid the winner.
\btab{|l|l|l|}
\hline
winner & count & position \\
\hline
A & 1 & {\bf A}, B, C, B, B, C, A, B, A, B, B, D, B \\
A & 0 & A, {\bf B}, C, B, B, C, A, B, A, B, B, D, B \\
C & 1 & A, B, {\bf C}, B, B, C, A, B, A, B, B, D, B \\
C & 0 & A, B, C, {\bf B}, B, C, A, B, A, B, B, D, B \\
B & 1 & A, B, C, B, {\bf B}, C, A, B, A, B, B, D, B \\
B & 0 & A, B, C, B, B, {\bf C}, A, B, A, B, B, D, B \\
A & 1 & A, B, C, B, B, C, {\bf A}, B, A, B, B, D, B \\
A & 0 & A, B, C, B, B, C, A, {\bf B}, A, B, B, D, B \\
A & 1 & A, B, C, B, B, C, A, B, {\bf A}, B, B, D, B \\
A & 0 & A, B, C, B, B, C, A, B, A, {\bf B}, B, D, B \\
B & 1 & A, B, C, B, B, C, A, B, A, B, {\bf B}, D, B \\
B & 0 & A, B, C, B, B, C, A, B, A, B, B, {\bf D}, B \\
B & 1 & A, B, C, B, B, C, A, B, A, B, B, D, {\bf B} \\
\hline
\etab
\be
\begin{array}{rcl}
maj\ [\ ] & = & \nil \\
maj\ (x \cons xs) & = & scan\ (x, 1)\ xs \\
\end{array}
\ee
Where $scan$ is defined as:
\be
\begin{array}{rcl}
scan\ (m, v)\ [\ ] & = & m \\
scan\ (m, v)\ (x \cons xs) & = & \begin{cases}
m = x: & scan\ (m, v + 1)\ xs \\
v = 0: & scan\ (x, 1)\ xs \\
\text{otherwise}: & scan\ (m, v - 1)\ xs \\
\end{cases}
\end{array}
\ee
Or implement with fold (Curried form): $maj = foldr\ f\ (\nil, 0)$, where:
\be
f\ x\ (m, v) = \begin{cases}
x = m: & (m, v + 1) \\
v = 0: & (x, 1) \\
\text{otherwise}: & (m, v - 1) \\
\end{cases}
\ee
Finally, verify the winner is the true majority:
\be
\textit{verify}\ m = \text{if}\ 2|filter\ (= m)\ xs| > |xs|\ \text{then}\ \textit{Just}\ m\ \text{else}\ \textit{Nothing}
\ee
Below is the corresponding iterative implementation:
\begin{algorithmic}[1]
\Function{Majority}{$A$}
\State $c \gets 0, m \gets \nil$
\For{each $a$ in $A$}
\If{$c = 0$}
\State $m \gets a$
\EndIf
\If{$a = m$}
\State $c \gets c + 1$
\Else
\State $c \gets c - 1$
\EndIf
\EndFor
\State $c \gets 0$
\For{each $a$ in $A$}
\If{$a = m$}
\State $c \gets c + 1$
\EndIf
\EndFor
\If{$c > \%50|A|$}
\State \Return \textit{Just} $m$
\Else
\State \Return \textit{Nothing}
\EndIf
\EndFunction
\end{algorithmic}
\begin{Exercise}\label{ex:majority-problem}
\Question{Extend to find $k$ majorities that occurs over $\lfloor n/k \rfloor$ times in collection $A$, where $n = |A|$. Hint: Drop $k$ different elements every time, till the remaining is less than $k$ distinct candidates. Any $k$-majority (the one over $\lfloor n/k \rfloor$) must remain in the end.}
\end{Exercise}
\begin{Answer}[ref = {ex:majority-problem}]
\Question{Extend to find $k$ majorities that occurs over $\lfloor n/k \rfloor$ times in collection $A$, where $n = |A|$.
\vspace{3mm}
We use a dictionary of $Map: T \mapsto Int$, where $T$ is the element type in $A$. It records the net-wins for candidate $a$. Start the dictionary from empty $\nil$. We scan $A$ while updating the dictionary: $foldr\ maj\ \nil\ A$, where $maj$ is defined as:
\be
maj\ a\ m = \begin{cases}
a \in m: & m[a] \gets m[a] + 1 \\
|m| < k: & m[a] \gets 1 \\
\text{otherwise}: & filter\ (b \mapsto m[b] \neq 0)\ \{b \mapsto m[b] - 1 | b \in m\}
\end{cases}
\ee
For every $a$ in $A$, if $a \notin m$ (new to the dictionary), and the candidates in $m$ is less than $k$, we add $a$ to $m$ with one net-win vote: $m[a] \gets 1$; if $a \in m$, add the vote by 1: $m[a] \gets m[a] + 1$; otherwise, if there are already $k$ candidates, we reduce the vote by 1 for every one, and remove the candidate when the vote becomes 0.
We need verify the remaining candidates at last, whether the votes $> n/k$, let $m' = \{(a, 0) | a \in m\}$. Scan $A$ again: $foldr\ cnt\ m'\ A$, where $cnt$ is defined as:
\be
cnt\ a\ m' = \text{if}\ a \in m'\ \text{then}\ m'[a] \gets m'[a] + 1\ \text{else}\ m'
\ee
After scan, $m'$ records the votes for each candidate, we filter the true winners in: $keys\ (filter\ (> n/k)\ m')$.
\begin{Haskell}
majorities k xs = verify $ foldr maj Map.empty xs where
maj x m | x `Map.member` m = Map.adjust (1+) x m
| Map.size m < k = Map.insert x 1 m
| otherwise = Map.filter (/=0) $ Map.map (-1 +) m
verify m = Map.keys $ Map.filter (> th) $ foldr cnt m' xs where
m' = Map.map (const 0) m
cnt x m = if x `Map.member` m then Map.adjust (1+) x m else m
th = (length xs) `div` k
\end{Haskell}
Below is the corresponding iterative implementation:
\begin{algorithmic}[1]
\Function{Maj}{$k, A$}
\State $m \gets \{\}$
\For{each $a$ in $A$}
\If{$a \in m$}
\State $m[a] \gets m[a] + 1$
\ElsIf{$|m| < k$}
\State $m[a] \gets 1$
\Else
\For{each $c$ in $m$}
\State $m[c] \gets m[c] - 1$
\If{$m[c] = 0$}
\State \Call{Remove}{$c, m$}
\EndIf
\EndFor
\EndIf
\EndFor
\For{each $c$ in $m$}
\State $m[c] \gets 0$
\EndFor
\For{each $a$ in $A$} \Comment{verify}
\If{$a \in m$}
\State $m[a] \gets m[a] + 1$
\EndIf
\EndFor
\State $r = [\ ], n \gets |A|$
\For{each $c$ in $m$}
\If{$m[c] > \dfrac{n}{k}$}
\State \Call{Add}{$c, r$}
\EndIf
\EndFor
\State \Return $r$
\EndFunction
\end{algorithmic}
}
\end{Answer}
\section{Maximum sum of sub-vector}
\index{Maximum sum problem}
For vector $V$, define a range $V[i...j]$ as sub-vector, the sum of sub-vector is $S = V[i] + V[i+1] + ... + V[j]$. Empty $[\ ]$ is sub-vector of any vector with sum 0. How to find the maximum sum of a given vector $V$\cite{Bentley}? For example, in vector [3, -13, 19, -12, 1, 9, 18, -16, 15, -15], the sub-vector [19, -12, 1, 9, 18] gives the maximum sum of 35. If all elements are positive, then the max is the total sum. If all are negative, then the empty vector gives the max sum of 0. Below is the exhaustive search implementation:
\begin{algorithmic}[1]
\Function{Max-Sum}{$V$}
\State $m \gets 0, n \gets |V|$
\For{$i \gets 1$ to $n$}
\State $s \gets 0$
\For{$j \gets i$ to $n$}
\State $s \gets s + V[j]$
\State $m \gets $ \Call{Max}{$m, s$}
\EndFor
\EndFor
\State \Return $m$
\EndFunction
\end{algorithmic}
The performance of exhaustive search is $O(n^2)$, where $n$ is the vector length. Similar to majority number algorithm, we scan the vector. For every position $i$, record the sum of sub-vector ends with $i$ as $A$, and the maximum sum so far as $B$. As shown in \cref{fig:max-sum-invariant}. $A$ is not necessarily equal to $B$. We maintain $B \leq A$ always hold. When $B + V[i] > A$, we replace $A$ with this greater value. When $B + V[i] < 0$, we reset $B$ to 0. Below table gives the steps when scan $[3, -13, 19, -12, 1, 9, 18, -16, 15, -15]$.
\begin{figure}[htbp]
\centering
\includegraphics[scale=0.6]{img/max-sum}
\caption{$A$: max sum so far; $B$: sum of the sub-vector ends with $i$.}
\label{fig:max-sum-invariant}
\end{figure}
\btab{|l|l|r|}
\hline
max sum & max end at $i$ & yet to scan \\
\hline
0 & 0 & $[3, -13, 19, -12, 1, 9, 18, -16, 15, -15]$ \\
3 & 3 & $[-13, 19, -12, 1, 9, 18, -16, 15, -15]$ \\
3 & 0 & $[19, -12, 1, 9, 18, -16, 15, -15]$ \\
19 & 19 & $[-12, 1, 9, 18, -16, 15, -15]$ \\
19 & 7 & $[1, 9, 18, -16, 15, -15]$ \\
19 & 8 & $[9, 18, -16, 15, -15]$ \\
19 & 17 & $[18, -16, 15, -15]$ \\
35 & 35 & $[-16, 15, -15]$ \\
35 & 19 & $[15, -15]$ \\
35 & 34 & $[-15]$ \\
35 & 19 & $[\ ]$\\
\hline
\etab
\begin{algorithmic}[1]
\Function{Max-Sum}{$V$}
\State $A \gets 0, B \gets 0, n \gets |V|$
\For{$i \gets 1$ to $n$}
\State $B \gets $ \Call{Max}{$B + V[i], 0$}
\State $A \gets $ \Call{Max}{$A, B$}
\EndFor
\State \Return $A$
\EndFunction
\end{algorithmic}
Or implement with fold (Curried form): $S_{max} = fst \circ foldr\ f\ (0, 0)$, where $f$ update the maximum sum so far:
\be
f\ x\ (S_m, S) = (S_m' = \max(S_m, S'), S' = \max(0, x + S))
\ee
\begin{Exercise}\label{ex:max-subsum}
\Question{Modify the solution that finds the max sum of sub-vector, returns the sub-vector of the maximum sum.}
\Question{Bentley gives a divide and conquer algorithm to find the max sum in $O(n \lg n)$ time\cite{Bentley}. Split the vector at middle, recursively find the max sum of the two halves, and the max sum that crosses the middle. Then pick the greatest. Implement this solution.}
\Question{Find the sub-metrics in a $m \times n$ metrics that gives the maximum sum.}
\end{Exercise}
\begin{Answer}[ref = {ex:max-subsum}]
\Question{Modify the solution that finds the max sum of sub-vector, returns the sub-vector of the maximum sum.
If want to return the sub-vector together with the maximum sum, we can maintain two pairs $P_m$ and $P$ during folding, each pair contains the sum and the sub-vector $(S, L)$.
\blre
max_s & = & 1st \circ foldr\ f\ ((0, [\ ]), (0, [\ ])) \\
\text{where}: & & f\ x\ (P_m, (S, L)) = (P_m' = \max(P_m, P'), P' = \max((0, [\ ]), (x + S, x \cons L))) \\
\elre
}
\Question{Bentley gives a divide and conquer algorithm to find the max sum in $O(n \lg n)$ time\cite{Bentley}. Split the vector at middle, recursively find the max sum of the two halves, and the max sum that crosses the middle. Then pick the greatest. Implement this solution.
\begin{algorithmic}[1]
\Function{Max-Sum}{$A$}
\If{$A = \phi$}
\State \Return 0
\ElsIf{$|A| = 1$}
\State \Return \Call{Max}{$0, A[1]$}
\Else
\State $m \gets \lfloor \frac{|A|}{2} \rfloor$
\State $a \gets$ \textproc{Max-From}(\Call{Reverse}{$A[1...m]$})
\State $b \gets$ \Call{Max-From}{$A[m+1...|A|]$}
\State $c \gets$ \Call{Max-Sum}{$A[1...m]$}
\State $d \gets$ \Call{Max-Sum}{$A[m+1...|A|$}
\State \Return \textproc{Max}($a+b, c, d$)
\EndIf
\EndFunction
\Statex
\Function{Max-From}{$A$}
\State $sum \gets 0, m \gets 0$
\For{$i \gets 1$ to $|A|$}
\State $sum \gets sum + A[i]$
\State $m \gets $ \Call{Max}{$m, sum$}
\EndFor
\State \Return $m$
\EndFunction
\end{algorithmic}
Consider the recursive equation: $T(n) = 2T(n/2) + O(n)$, from the master theorem, the performance is $O(n)$.
}
\Question{Find the sub-metrics in a $m \times n$ metrics that gives the maximum sum.
We start from the first row of the metrics, add a row per round $[M[1, *], M[2, *], ..., M[i, *]]$. Then sum the numbers in each column and convert it to a vector:
\[
V = \left [ \sum_{j=1}^i M[1, j], \sum_{j=1}^i M[2, j], ..., \sum_{j=1}^i M[n, j] \right ]
\]
Next use the $maxsum$ to find the max sum in vector $V$, and record the global maximum sum.
\begin{Haskell}
maxSum = maximum . (map maxS) . acc . rows where
rows = init . tails -- exclude the empty row
acc = concatMap (scanl1 (zipWith (+))) -- accumulated sum along columns
maxS = snd . (foldl f (0, 0)) -- max sum in a vector
f (m, s) x = let m' = max (m + x) 0
s' = max m' s in (m', s')
\end{Haskell}
Where \texttt{tails} is defined in \cref{ex:list-tails}, \texttt{zipWith} is defined in \cref{sec:list-zipwith}, \texttt{concatMap} is defined in \cref{sec:list-concatmap}. The $scanl$ is similar to $foldl$, it records the result every time in a list. $scanl1$ is a special case of $scanl$, that the initial value is the first element.
\[
\begin{array}{rcl}
scanl1\ f\ [\ ] & = & [\ ] \\
scanl1\ f\ (x \cons xs) & = & scanl\ f\ x\ xs \\
\end{array}
\]
where
\[
\begin{array}{rcl}
scanl\ f\ q\ [\ ] & = & [q] \\
scanl\ f\ (x \cons xs) & = & q : scanl\ f\ (f\ q\ x)\ xs \\
\end{array}
\]
Below is the corresponding imperative program:
\begin{Bourbaki}
K maxsum2([[K]] m) {
Int n = length(m), k = length(m[0]) // number of row, col
K maxs = 0 // max so far
for i = 0 to n - 1 {
xs = [0] * k
for j = i to n - 1 {
xs = [x + y for (x, y) in zip(xs, m[j])]
maxs = max(maxs, maxsum1(xs))
}
}
return maxs
}
K maxsum1([K] xs) {
K s = 0 // max so far
K m = 0 // max end here
for x in xs {
m = max(m + x, 0)
s = max(m, s)
}
return s
}
\end{Bourbaki}
}
\end{Answer}
\section{String matching}
\index{KMP} \index{Knuth-Morris-Pratt algorithm}
String matching is widely used in editor applications. We can use data structures like radix tree, prefix tree (see chapter 6) to search, or directly match the string as shown in \cref{fig:strstr}\footnote{Some programming environment provide match tool, like \texttt{strstr} in C library, \texttt{find} in C++ library, \texttt{indexOf} in Java library.}. We match a pattern $P$ in text $T$ character by character. At offset $s = 4$, the first 4 are same as shown in \cref{fig:strstr} (a), the 5th is `y' in $P$, but `t' in $T$. We stop here, add $s$ by 1 (move $P$ to right by 1), then restart matching `ananym' and `nantho...'. Actually, we can increase $s$ more than 1. The first two characters `an' happen to be the suffix of `anan'. We can add $s$ by 2 as shown in \cref{fig:strstr} (b). We reuse the information from the 4 matched characters, skip some positions. Knuth, Morris and Pratt develope an efficient matching algorithm from this idea \cite{kmp}, known as `KMP'. the initials of the three authors.
\begin{figure}[htbp]
\centering
\subcaptionbox{The offset $s = 4$, after matching $q=4$ characters, the 5th mismatches.}{\input{img/strstr.tex}} \\
\subcaptionbox{Move $s = 4 + 2 = 6$.}{\input{img/kmp.tex}}
\caption{Match `ananym' in `any ananthous ananym flower'.}
\label{fig:strstr}
\end{figure}
Denote the first $k$ characters of text $T$ as $T_k$ (the $k$-character prefix of $T$). To shift $P$ to the right $s$ steps as far as possible, we need reuse the information of the matched $q$ characters. As shown in \cref{fig:kmp-fallback}, if we can shift $P$ ahead, there exists some $k$, such that the first $k$ characters are same as the last $k$ characters of $P_q$, i.e., the prefix $P_k$ is suffix of $P_q$. Define empty ``'' be both the prefix and suffix of any string, hence the minimum $k = 0$ always exists. We need find the maximum $k$ for the string that is both the prefix and suffix. Define the {\em prefix function} $\pi(q)$, that gives where to fallback when the $(q + 1)$-th character doesn't match\cite{CLRS}.
\begin{figure}[htbp]
\centering
\includegraphics[scale=0.45]{img/kmp-fallback}
\caption{$P_k$ is both the prefix and suffix of $P_q$.}
\label{fig:kmp-fallback}
\end{figure}
\be
\pi(q) = \max \{ k | 0 \leq k < q, \text{and}\ P_k\ \text{is suffix of}\ P_q \}
\label{eq:prefix-function}
\ee
When match pattern $P$ against text $T$ from offset $s$, if fails after matching $q$ characters, we next look up $q' = \pi(q)$ to get a fallback position $q'$. Then retry to compare $P[q']$ with the text:
\begin{algorithmic}[1]
\Function{KMP}{$T, P$}
\State $\pi \gets $ \Call{Build-Prefixes}{$P$}
\State $n \gets |T|, m \gets |P|, q \gets 0$
\For{$i \gets 1$ to $n$}
\While{$q > 0$ and $P[q+1] \neq T[i]$}
\State $q \gets \pi(q)$
\EndWhile
\If{$P[q+1] = T[i]$}
\State $q \gets q + 1$
\EndIf
\If{$q = m$}
\State position $i - m$ is a solution
\State $q \gets \pi(q)$ \Comment{search further}
\EndIf
\EndFor
\EndFunction
\end{algorithmic}
The definition \cref{eq:prefix-function} is not practical to build $\pi(q)$. Let us pre-process $P$. If the first character doesn't match, then the longest prefix and suffix is empty: $\pi(1) = 0$, i.e., $P_k = P_0 = $ ``''. When scan the $q$-th character in $P$, the prefix function values $\pi(i)$, $i = 1, 2, ..., q-1$, are already known, and so far, the longest prefix $P_k$ is also the suffix of $P_{q-1}$. As shown in \cref{fig:kmp-prefix-func}, if $P[q] = P[k+1]$, we find a greater $k$, and increase $k$ by 1; otherwise, if $P[q] \neq P[k + 1]$, then lookup $\pi(k)$ and fallback to a shorter prefix $P_{k'}$, where $k' = \pi(k)$. Then compare the next character of this new prefix with the $q$-th character. Repeat this till $k$ becomes zero (empty string), or the $q$-th character matches. Below table gives the prefix function values of `ananym'. The $k$ column is the maximum satisfying \cref{eq:prefix-function}.
\begin{figure}[htbp]