\chapter{Sorting} \index{sorting} \key{Sorting} is a fundamental algorithm design problem. Many efficient algorithms use sorting as a subroutine, because it is often easier to process data if the elements are in a sorted order. For example, the problem ''does the array contain two equal elements?'' is easy to solve using sorting. If the array contains two equal elements, they will be next to each other after sorting, so it is easy to find them. Also the problem ''what is the most frequent element in the array?'' can be solved similarly. There are many algorithms for sorting, and they are also good examples of how to apply different algorithm design techniques. The efficient general sorting algorithms work in $O(n \log n)$ time, and many algorithms that use sorting as a subroutine also have this time complexity. \section{Sorting theory} The basic problem in sorting is as follows: \begin{framed} \noindent Given an array that contains $n$ elements, your task is to sort the elements in increasing order. 
\end{framed} \noindent For example, the array \begin{center} \begin{tikzpicture}[scale=0.7] \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$8$}; \node at (3.5,0.5) {$2$}; \node at (4.5,0.5) {$9$}; \node at (5.5,0.5) {$2$}; \node at (6.5,0.5) {$5$}; \node at (7.5,0.5) {$6$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} will be as follows after sorting: \begin{center} \begin{tikzpicture}[scale=0.7] \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$2$}; \node at (2.5,0.5) {$2$}; \node at (3.5,0.5) {$3$}; \node at (4.5,0.5) {$5$}; \node at (5.5,0.5) {$6$}; \node at (6.5,0.5) {$8$}; \node at (7.5,0.5) {$9$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} \subsubsection{$O(n^2)$ algorithms} \index{bubble sort} Simple algorithms for sorting an array work in $O(n^2)$ time. Such algorithms are short and usually consist of two nested loops. A famous $O(n^2)$ time sorting algorithm is \key{bubble sort} where the elements ''bubble'' in the array according to their values. Bubble sort consists of $n-1$ rounds. On each round, the algorithm iterates through the elements of the array. Whenever two consecutive elements are found that are not in correct order, the algorithm swaps them. 
The algorithm can be implemented as follows for an array
$\texttt{t}[1],\texttt{t}[2],\ldots,\texttt{t}[n]$:
\begin{lstlisting}
for (int i = 1; i <= n-1; i++) {
    for (int j = 1; j <= n-i; j++) {
        if (t[j] > t[j+1]) swap(t[j],t[j+1]);
    }
}
\end{lstlisting}

After the first round of the algorithm, the largest element will be in the correct position, and in general, after $k$ rounds, the $k$ largest elements will be in the correct positions. Thus, after $n-1$ rounds, the whole array will be sorted.

For example, in the array
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$8$}; \node at (3.5,0.5) {$2$}; \node at (4.5,0.5) {$9$}; \node at (5.5,0.5) {$2$}; \node at (6.5,0.5) {$5$}; \node at (7.5,0.5) {$6$};
\footnotesize
\node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
\noindent the first round of bubble sort swaps elements as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$2$}; \node at (3.5,0.5) {$8$}; \node at (4.5,0.5) {$9$}; \node at (5.5,0.5) {$2$}; \node at (6.5,0.5) {$5$}; \node at (7.5,0.5) {$6$};
\draw[thick,<->] (3.5,-0.25) .. controls (3.25,-1.00) and (2.75,-1.00) ..
(2.5,-0.25); \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} \begin{center} \begin{tikzpicture}[scale=0.7] \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$2$}; \node at (3.5,0.5) {$8$}; \node at (4.5,0.5) {$2$}; \node at (5.5,0.5) {$9$}; \node at (6.5,0.5) {$5$}; \node at (7.5,0.5) {$6$}; \draw[thick,<->] (5.5,-0.25) .. controls (5.25,-1.00) and (4.75,-1.00) .. (4.5,-0.25); \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} \begin{center} \begin{tikzpicture}[scale=0.7] \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$2$}; \node at (3.5,0.5) {$8$}; \node at (4.5,0.5) {$2$}; \node at (5.5,0.5) {$5$}; \node at (6.5,0.5) {$9$}; \node at (7.5,0.5) {$6$}; \draw[thick,<->] (6.5,-0.25) .. controls (6.25,-1.00) and (5.75,-1.00) .. (5.5,-0.25); \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} \begin{center} \begin{tikzpicture}[scale=0.7] \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$2$}; \node at (3.5,0.5) {$8$}; \node at (4.5,0.5) {$2$}; \node at (5.5,0.5) {$5$}; \node at (6.5,0.5) {$6$}; \node at (7.5,0.5) {$9$}; \draw[thick,<->] (7.5,-0.25) .. controls (7.25,-1.00) and (6.75,-1.00) .. 
(6.5,-0.25);
\footnotesize
\node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}

\subsubsection{Inversions}

\index{inversion}

Bubble sort is an example of a sorting algorithm that always swaps consecutive elements in the array. It turns out that the time complexity of such an algorithm is \emph{always} at least $O(n^2)$, because in the worst case, $O(n^2)$ swaps are required for sorting the array.

A useful concept when analyzing sorting algorithms is an \key{inversion}: a pair of elements $(\texttt{t}[a],\texttt{t}[b])$ in the array such that $a<b$ and $\texttt{t}[a]>\texttt{t}[b]$, i.e., the elements are in the wrong order. For example, in the array
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$2$}; \node at (2.5,0.5) {$2$}; \node at (3.5,0.5) {$6$}; \node at (4.5,0.5) {$3$}; \node at (5.5,0.5) {$5$}; \node at (6.5,0.5) {$9$}; \node at (7.5,0.5) {$8$};
\footnotesize
\node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
the inversions are $(6,3)$, $(6,5)$ and $(9,8)$. The number of inversions tells us how much work is needed to sort the array. An array is completely sorted when there are no inversions. On the other hand, if the array elements are in the reverse order, the number of inversions is the largest possible:
\[1+2+\cdots+(n-1)=\frac{n(n-1)}{2} = O(n^2)\]

Swapping a pair of consecutive elements that are in the wrong order removes exactly one inversion from the array. Hence, if a sorting algorithm can only swap consecutive elements, each swap removes at most one inversion and the time complexity of the algorithm is at least $O(n^2)$.
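The connection between inversions and consecutive swaps can be checked with a small experiment. The following sketch is only an illustration: the function names \texttt{countInversions} and \texttt{bubbleSortSwaps} are our own, and the code uses a zero-indexed \texttt{vector} instead of the one-indexed array \texttt{t}. The first function counts the inversions by checking every pair, and the second one counts the swaps that bubble sort performs.
\begin{lstlisting}
#include <utility>
#include <vector>
using namespace std;

// Counts the pairs (a,b) with a < b and t[a] > t[b]
// by checking every pair, in O(n^2) time.
long long countInversions(const vector<int>& t) {
    long long inversions = 0;
    for (size_t a = 0; a < t.size(); a++) {
        for (size_t b = a+1; b < t.size(); b++) {
            if (t[a] > t[b]) inversions++;
        }
    }
    return inversions;
}

// Sorts a copy of t using bubble sort and
// returns the number of swaps performed.
long long bubbleSortSwaps(vector<int> t) {
    long long swaps = 0;
    int n = t.size();
    for (int i = 0; i < n-1; i++) {
        for (int j = 0; j+1 < n-i; j++) {
            if (t[j] > t[j+1]) {
                swap(t[j], t[j+1]);
                swaps++;
            }
        }
    }
    return swaps;
}
\end{lstlisting}
For the example array $[1,2,2,6,3,5,9,8]$ both functions return 3. The two values agree for any array, because every swap made by bubble sort removes exactly one inversion.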
\subsubsection{$O(n \log n)$ algorithms} \index{merge sort} It is possible to sort an array efficiently in $O(n \log n)$ time using algorithms that are not limited to swapping consecutive elements. One such algorithm is \key{mergesort}\footnote{According to \cite{knu98}, mergesort was invented by J. von Neumann in 1945.} that is based on recursion. Mergesort sorts a subarray \texttt{t}$[a,b]$ as follows: \begin{enumerate} \item If $a=b$, do not do anything, because the subarray is already sorted. \item Calculate the position of the middle element: $k=\lfloor (a+b)/2 \rfloor$. \item Recursively sort the subarray \texttt{t}$[a,k]$. \item Recursively sort the subarray \texttt{t}$[k+1,b]$. \item \emph{Merge} the sorted subarrays \texttt{t}$[a,k]$ and \texttt{t}$[k+1,b]$ into a sorted subarray \texttt{t}$[a,b]$. \end{enumerate} Mergesort is an efficient algorithm, because it halves the size of the subarray at each step. The recursion consists of $O(\log n)$ levels, and processing each level takes $O(n)$ time. Merging the subarrays \texttt{t}$[a,k]$ and \texttt{t}$[k+1,b]$ is possible in linear time, because they are already sorted. 
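The steps above can be sketched in C++ as follows. This is one possible implementation, not the only one: the function names \texttt{mergeRanges} and \texttt{mergeSort} are our own, the array is a zero-indexed \texttt{vector}, and the merge step uses a temporary vector for simplicity.
\begin{lstlisting}
#include <vector>
using namespace std;

// Merges the sorted ranges t[a..k] and t[k+1..b]
// into a sorted range t[a..b] in linear time.
void mergeRanges(vector<int>& t, int a, int k, int b) {
    vector<int> merged;
    merged.reserve(b-a+1);
    int i = a, j = k+1;
    while (i <= k && j <= b) {
        // always take the smaller of the two front elements
        if (t[i] <= t[j]) merged.push_back(t[i++]);
        else merged.push_back(t[j++]);
    }
    // copy what is left of either range
    while (i <= k) merged.push_back(t[i++]);
    while (j <= b) merged.push_back(t[j++]);
    for (int p = 0; p < (int)merged.size(); p++) {
        t[a+p] = merged[p];
    }
}

// Sorts the subarray t[a..b] (inclusive bounds).
void mergeSort(vector<int>& t, int a, int b) {
    if (a >= b) return;   // at most one element
    int k = a+(b-a)/2;    // middle position
    mergeSort(t, a, k);
    mergeSort(t, k+1, b);
    mergeRanges(t, a, k, b);
}
\end{lstlisting}
A vector \texttt{t} of $n$ elements is sorted by calling \texttt{mergeSort(t, 0, n-1)}.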
For example, consider sorting the following array: \begin{center} \begin{tikzpicture}[scale=0.7] \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$6$}; \node at (3.5,0.5) {$2$}; \node at (4.5,0.5) {$8$}; \node at (5.5,0.5) {$2$}; \node at (6.5,0.5) {$5$}; \node at (7.5,0.5) {$9$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} The array will be divided into two subarrays as follows: \begin{center} \begin{tikzpicture}[scale=0.7] \draw (0,0) grid (4,1); \draw (5,0) grid (9,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$6$}; \node at (3.5,0.5) {$2$}; \node at (5.5,0.5) {$8$}; \node at (6.5,0.5) {$2$}; \node at (7.5,0.5) {$5$}; \node at (8.5,0.5) {$9$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (5.5,1.4) {$5$}; \node at (6.5,1.4) {$6$}; \node at (7.5,1.4) {$7$}; \node at (8.5,1.4) {$8$}; \end{tikzpicture} \end{center} Then, the subarrays will be sorted recursively as follows: \begin{center} \begin{tikzpicture}[scale=0.7] \draw (0,0) grid (4,1); \draw (5,0) grid (9,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$2$}; \node at (2.5,0.5) {$3$}; \node at (3.5,0.5) {$6$}; \node at (5.5,0.5) {$2$}; \node at (6.5,0.5) {$5$}; \node at (7.5,0.5) {$8$}; \node at (8.5,0.5) {$9$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (5.5,1.4) {$5$}; \node at (6.5,1.4) {$6$}; \node at (7.5,1.4) {$7$}; \node at (8.5,1.4) {$8$}; \end{tikzpicture} \end{center} Finally, the algorithm merges the sorted subarrays and creates the final sorted array: \begin{center} \begin{tikzpicture}[scale=0.7] \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at 
(1.5,0.5) {$2$}; \node at (2.5,0.5) {$2$}; \node at (3.5,0.5) {$3$}; \node at (4.5,0.5) {$5$}; \node at (5.5,0.5) {$6$}; \node at (6.5,0.5) {$8$}; \node at (7.5,0.5) {$9$};
\footnotesize
\node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}

\subsubsection{Sorting lower bound}

Is it possible to sort an array faster than in $O(n \log n)$ time? It turns out that this is \emph{not} possible when we restrict ourselves to sorting algorithms that are based on comparing array elements.

The lower bound for the time complexity can be proved by considering sorting as a process where each comparison of two elements gives more information about the contents of the array. The process creates the following tree:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) rectangle (3,1);
\node at (1.5,0.5) {$x < y?$};
\draw[thick,->] (1.5,0) -- (-2.5,-1.5);
\draw[thick,->] (1.5,0) -- (5.5,-1.5);
\draw (-4,-2.5) rectangle (-1,-1.5);
\draw (4,-2.5) rectangle (7,-1.5);
\node at (-2.5,-2) {$x < y?$};
\node at (5.5,-2) {$x < y?$};
\draw[thick,->] (-2.5,-2.5) -- (-4.5,-4);
\draw[thick,->] (-2.5,-2.5) -- (-0.5,-4);
\draw[thick,->] (5.5,-2.5) -- (3.5,-4);
\draw[thick,->] (5.5,-2.5) -- (7.5,-4);
\draw (-6,-5) rectangle (-3,-4);
\draw (-2,-5) rectangle (1,-4);
\draw (2,-5) rectangle (5,-4);
\draw (6,-5) rectangle (9,-4);
\node at (-4.5,-4.5) {$x < y?$};
\node at (-0.5,-4.5) {$x < y?$};
\node at (3.5,-4.5) {$x < y?$};
\node at (7.5,-4.5) {$x < y?$};
\draw[thick,->] (-4.5,-5) -- (-5.5,-6);
\draw[thick,->] (-4.5,-5) -- (-3.5,-6);
\draw[thick,->] (-0.5,-5) -- (0.5,-6);
\draw[thick,->] (-0.5,-5) -- (-1.5,-6);
\draw[thick,->] (3.5,-5) -- (2.5,-6);
\draw[thick,->] (3.5,-5) -- (4.5,-6);
\draw[thick,->] (7.5,-5) -- (6.5,-6);
\draw[thick,->] (7.5,-5) -- (8.5,-6);
\end{tikzpicture}
\end{center}

Here ''$x<y?$'' means that some elements $x$ and $y$ are compared. If $x<y$, the process continues to the left, and otherwise to the right. Each leaf of the tree corresponds to one possible order of the elements, and there are $n!$ such orders, so the height of the tree must be at least $\log_2(n!)$. Since $\log_2(n!) \ge (n/2) \log_2(n/2)$, some inputs require on the order of $n \log n$ comparisons.

\section{Sorting in C++}

For example, the following code uses the C++ standard library function \texttt{sort} to sort a vector in increasing order:
\begin{lstlisting}
vector<int> v = {4,2,5,3,5,8,3};
sort(v.begin(),v.end());
\end{lstlisting}
After the sorting, the contents of the vector will be $[2,3,3,4,5,5,8]$. The default sorting order is increasing, but a reverse order is possible as follows:
\begin{lstlisting}
sort(v.rbegin(),v.rend());
\end{lstlisting}
An ordinary array can be sorted as follows:
\begin{lstlisting}
int n = 7; // array size
int t[] = {4,2,5,3,5,8,3};
sort(t,t+n);
\end{lstlisting}
The following code sorts the string \texttt{s}:
\begin{lstlisting}
string s = "monkey";
sort(s.begin(), s.end());
\end{lstlisting}
Sorting a string means that the characters in the string are sorted. For example, the string ''monkey'' becomes ''ekmnoy''.

\subsubsection{Comparison operators}

\index{comparison operator}

The function \texttt{sort} requires that a \key{comparison operator} is defined for the data type of the elements to be sorted. During the sorting, this operator will be used whenever the order of two elements has to be determined.

Most C++ data types have a built-in comparison operator, and elements of those types can be sorted automatically. For example, numbers are sorted according to their values and strings are sorted in alphabetical order.

\index{pair@\texttt{pair}}

Pairs (\texttt{pair}) are sorted primarily by their first elements (\texttt{first}). However, if the first elements of two pairs are equal, they are sorted by their second elements (\texttt{second}):
\begin{lstlisting}
vector<pair<int,int>> v;
v.push_back({1,5});
v.push_back({2,3});
v.push_back({1,2});
sort(v.begin(), v.end());
\end{lstlisting}
After this, the order of the pairs is $(1,2)$, $(1,5)$ and $(2,3)$.
\index{tuple@\texttt{tuple}}

In a similar way, tuples (\texttt{tuple}) are sorted primarily by the first element, secondarily by the second element, etc.:
\begin{lstlisting}
vector<tuple<int,int,int>> v;
v.push_back(make_tuple(2,1,4));
v.push_back(make_tuple(1,5,3));
v.push_back(make_tuple(2,1,3));
sort(v.begin(), v.end());
\end{lstlisting}
After this, the order of the tuples is $(1,5,3)$, $(2,1,3)$ and $(2,1,4)$.

\subsubsection{User-defined structs}

User-defined structs do not have a comparison operator automatically. The operator should be defined inside the struct as a function \texttt{operator<} whose parameter is another element of the same type. The operator should return \texttt{true} if the element is smaller than the parameter, and \texttt{false} otherwise.

For example, the following struct \texttt{P} contains the x and y coordinates of a point. The comparison operator is defined so that the points are sorted primarily by the x coordinate and secondarily by the y coordinate.
\begin{lstlisting}
struct P {
    int x, y;
    // const, because comparing does not modify the point
    bool operator<(const P &p) const {
        if (x != p.x) return x < p.x;
        else return y < p.y;
    }
};
\end{lstlisting}

\subsubsection{Comparison functions}

\index{comparison function}

It is also possible to give an external \key{comparison function} to the \texttt{sort} function as a callback function. For example, the following comparison function sorts strings primarily by length and secondarily by alphabetical order:
\begin{lstlisting}
bool cmp(const string &a, const string &b) {
    if (a.size() != b.size()) return a.size() < b.size();
    return a < b;
}
\end{lstlisting}
Now a vector of strings can be sorted as follows:
\begin{lstlisting}
sort(v.begin(), v.end(), cmp);
\end{lstlisting}

\section{Binary search}

\index{binary search}

A general method for searching for an element in an array is to use a \texttt{for} loop that iterates through the elements in the array.
For example, the following code searches for an element $x$ in the array \texttt{t}:
\begin{lstlisting}
for (int i = 1; i <= n; i++) {
    if (t[i] == x) {
        // x found at index i
    }
}
\end{lstlisting}

The time complexity of this approach is $O(n)$, because in the worst case, all elements in the array have to be checked. If the order of the elements is arbitrary, this is also the best possible approach, because there is no additional information available about where in the array we should search for the element $x$.

However, if the array is \emph{sorted}, the situation is different. In this case it is possible to perform the search much faster, because the order of the elements in the array guides the search. The following \key{binary search} algorithm efficiently searches for an element in a sorted array in $O(\log n)$ time.

\subsubsection{Method 1}

The traditional way to implement binary search resembles looking for a word in a dictionary. At each step, the search halves the active region in the array, until the target element is found, or it turns out that there is no such element.

First, the search checks the middle element of the array. If the middle element is the target element, the search terminates. Otherwise, the search recursively continues to the left or right half of the array, depending on the value of the middle element.

The above idea can be implemented as follows:
\begin{lstlisting}
int a = 1, b = n;
while (a <= b) {
    int k = (a+b)/2;
    if (t[k] == x) {
        // x found at index k
        break;
    }
    if (t[k] > x) b = k-1;
    else a = k+1;
}
\end{lstlisting}

The algorithm maintains a range $a \ldots b$ that corresponds to the active region of the array. Initially, the range is $1 \ldots n$, the whole array. The algorithm halves the size of the range at each step, so the time complexity is $O(\log n)$.

\subsubsection{Method 2}

An alternative method for implementing binary search is based on an efficient way to iterate through the elements in the array.
The idea is to make jumps and slow down when we get closer to the target element.

The search goes through the array from left to right, and the initial jump length is $n/2$. At each step, the jump length will be halved: first $n/4$, then $n/8$, $n/16$, etc., until finally the length is 1. After the jumps, either the target element has been found or we know that it does not appear in the array.

The following code implements the above idea:
\begin{lstlisting}
int k = 1;
for (int b = n/2; b >= 1; b /= 2) {
    while (k+b <= n && t[k+b] <= x) k += b;
}
if (t[k] == x) {
    // x was found at index k
}
\end{lstlisting}

The variables $k$ and $b$ contain the position in the array and the jump length. If the array contains the element $x$, the position of $x$ will be in the variable $k$ after the search. The time complexity of the algorithm is $O(\log n)$, because the code in the \texttt{while} loop is performed at most twice for each jump length.

\subsubsection{Finding the smallest solution}

In practice, it is seldom necessary to implement binary search for searching elements in an array, because we can use the standard library. For example, the C++ functions \texttt{lower\_bound} and \texttt{upper\_bound} implement binary search, and the data structure \texttt{set} maintains a set of elements with $O(\log n)$ time operations.

However, an important use for binary search is to find the position where the value of a function changes. Suppose that we wish to find the smallest value $k$ that is a valid solution for a problem. We are given a function $\texttt{ok}(x)$ that returns \texttt{true} if $x$ is a valid solution and \texttt{false} otherwise. In addition, we know that $\texttt{ok}(x)$ is \texttt{false} when $x<k$ and \texttt{true} when $x \ge k$. Now, the value of $k$ can be found using binary search:
\begin{lstlisting}
int x = -1;
for (int b = z; b >= 1; b /= 2) {
    while (!ok(x+b)) x += b;
}
int k = x+1;
\end{lstlisting}

The search finds the largest value of $x$ for which $\texttt{ok}(x)$ is \texttt{false}. Thus, the next value $k=x+1$ is the smallest possible value for which $\texttt{ok}(k)$ is \texttt{true}.
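As an illustration, the search can be wrapped into a function. This is a sketch under stated assumptions: the name \texttt{smallestSolution} and the use of \texttt{std::function} are our own choices, the values are assumed to fit into \texttt{long long}, and $z$ is assumed to be a value for which $\texttt{ok}(z)$ is \texttt{true}.
\begin{lstlisting}
#include <functional>
using namespace std;

// Returns the smallest k >= 0 for which ok(k) is true.
// Assumes that ok(x) is false when x < k and true
// when x >= k, and that ok(z) is true.
long long smallestSolution(
        const function<bool(long long)>& ok,
        long long z) {
    long long x = -1;
    for (long long b = z; b >= 1; b /= 2) {
        // jump forward while we stay on the false side
        while (!ok(x+b)) x += b;
    }
    // x is the largest value for which ok(x) is false
    return x+1;
}
\end{lstlisting}
For example, if $\texttt{ok}(x)$ is \texttt{true} exactly when $x^2 \ge 50$, calling the function with $z=50$ returns 8, because $7^2=49$ and $8^2=64$.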
The initial jump length $z$ has to be large enough, for example some value for which we know beforehand that $\texttt{ok}(z)$ is \texttt{true}.

The algorithm calls the function \texttt{ok} $O(\log z)$ times, so the total time complexity depends on the function \texttt{ok}. For example, if the function works in $O(n)$ time, the total time complexity is $O(n \log z)$.

\subsubsection{Finding the maximum value}

Binary search can also be used to find the maximum value for a function that is first increasing and then decreasing. Our task is to find a value $k$ such that
\begin{itemize}
\item $f(x)<f(x+1)$ when $x<k$, and
\item $f(x)>f(x+1)$ when $x \ge k$.
\end{itemize}

The idea is to use binary search for finding the largest value of $x$ for which $f(x)<f(x+1)$. This implies that $k=x+1$ because $f(x+1)>f(x+2)$. The following code implements the search:
\begin{lstlisting}
int x = -1;
for (int b = z; b >= 1; b /= 2) {
    while (f(x+b) < f(x+b+1)) x += b;
}
int k = x+1;
\end{lstlisting}

Note that unlike in the ordinary binary search, here it is not allowed that consecutive values of the function are equal. In this case it would not be possible to know how to continue the search.
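The search for the maximum can be tried out with a small function. The following sketch makes several assumptions of its own: the name \texttt{peakPosition} and the use of \texttt{std::function} are our choices, the function $f$ is assumed to be defined for all nonnegative integers that the search visits, and the jump length $z$ is assumed to be at least $k$.
\begin{lstlisting}
#include <functional>
using namespace std;

// Returns the position k of the maximum value of f.
// Assumes that f(x) < f(x+1) when x < k and
// f(x) > f(x+1) when x >= k, and that 0 <= k <= z.
long long peakPosition(
        const function<long long(long long)>& f,
        long long z) {
    long long x = -1;
    for (long long b = z; b >= 1; b /= 2) {
        // jump forward while the function still increases
        while (f(x+b) < f(x+b+1)) x += b;
    }
    // x is the largest value with f(x) < f(x+1)
    return x+1;
}
\end{lstlisting}
For example, for the function $f(x)=-(x-7)^2$, which increases until $x=7$ and decreases after that, calling the function with $z=20$ returns 7.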