\chapter{Square root algorithms} \index{square root algorithm} A \key{square root algorithm} is an algorithm that has a square root in its time complexity. A square root can be seen as a ''poor man's logarithm'': the complexity $O(\sqrt n)$ is better than $O(n)$ but worse than $O(\log n)$. In any case, many square root algorithms are fast in practice and have small constant factors. As an example, let us consider the problem of creating a data structure that supports two operations on an array: modifying an element at a given position and calculating the sum of elements in the given range. We have previously solved the problem using a binary indexed tree and segment tree, that support both operations in $O(\log n)$ time. However, now we will solve the problem in another way using a square root structure that allows us to modify elements in $O(1)$ time and calculate sums in $O(\sqrt n)$ time. The idea is to divide the array into blocks of size $\sqrt n$ so that each block contains the sum of elements inside the block. For example, an array of 16 elements will be divided into blocks of 4 elements as follows: \begin{center} \begin{tikzpicture}[scale=0.7] \draw (0,0) grid (16,1); \draw (0,1) rectangle (4,2); \draw (4,1) rectangle (8,2); \draw (8,1) rectangle (12,2); \draw (12,1) rectangle (16,2); \node at (0.5, 0.5) {5}; \node at (1.5, 0.5) {8}; \node at (2.5, 0.5) {6}; \node at (3.5, 0.5) {3}; \node at (4.5, 0.5) {2}; \node at (5.5, 0.5) {7}; \node at (6.5, 0.5) {2}; \node at (7.5, 0.5) {6}; \node at (8.5, 0.5) {7}; \node at (9.5, 0.5) {1}; \node at (10.5, 0.5) {7}; \node at (11.5, 0.5) {5}; \node at (12.5, 0.5) {6}; \node at (13.5, 0.5) {2}; \node at (14.5, 0.5) {3}; \node at (15.5, 0.5) {2}; \node at (2, 1.5) {21}; \node at (6, 1.5) {17}; \node at (10, 1.5) {20}; \node at (14, 1.5) {13}; \end{tikzpicture} \end{center} Using this structure, it is easy to modify the array, because it is only needed to update the sum of a single block after each modification, which can be done in $O(1)$ time. For example, the following picture shows how the value of an element and the sum of the corresponding block change: \begin{center} \begin{tikzpicture}[scale=0.7] \fill[color=lightgray] (5,0) rectangle (6,1); \draw (0,0) grid (16,1); \fill[color=lightgray] (4,1) rectangle (8,2); \draw (0,1) rectangle (4,2); \draw (4,1) rectangle (8,2); \draw (8,1) rectangle (12,2); \draw (12,1) rectangle (16,2); \node at (0.5, 0.5) {5}; \node at (1.5, 0.5) {8}; \node at (2.5, 0.5) {6}; \node at (3.5, 0.5) {3}; \node at (4.5, 0.5) {2}; \node at (5.5, 0.5) {5}; \node at (6.5, 0.5) {2}; \node at (7.5, 0.5) {6}; \node at (8.5, 0.5) {7}; \node at (9.5, 0.5) {1}; \node at (10.5, 0.5) {7}; \node at (11.5, 0.5) {5}; \node at (12.5, 0.5) {6}; \node at (13.5, 0.5) {2}; \node at (14.5, 0.5) {3}; \node at (15.5, 0.5) {2}; \node at (2, 1.5) {21}; \node at (6, 1.5) {15}; \node at (10, 1.5) {20}; \node at (14, 1.5) {13}; \end{tikzpicture} \end{center} Calculating the sum of elements in a range is a bit more difficult. It turns out that we can always divide the range into three parts such that the sum consists of values of single elements and sums of blocks between them: \begin{center} \begin{tikzpicture}[scale=0.7] \fill[color=lightgray] (3,0) rectangle (4,1); \fill[color=lightgray] (12,0) rectangle (13,1); \fill[color=lightgray] (13,0) rectangle (14,1); \draw (0,0) grid (16,1); \fill[color=lightgray] (4,1) rectangle (8,2); \fill[color=lightgray] (8,1) rectangle (12,2); \draw (0,1) rectangle (4,2); \draw (4,1) rectangle (8,2); \draw (8,1) rectangle (12,2); \draw (12,1) rectangle (16,2); \node at (0.5, 0.5) {5}; \node at (1.5, 0.5) {8}; \node at (2.5, 0.5) {6}; \node at (3.5, 0.5) {3}; \node at (4.5, 0.5) {2}; \node at (5.5, 0.5) {5}; \node at (6.5, 0.5) {2}; \node at (7.5, 0.5) {6}; \node at (8.5, 0.5) {7}; \node at (9.5, 0.5) {1}; \node at (10.5, 0.5) {7}; \node at (11.5, 0.5) {5}; \node at (12.5, 0.5) {6}; \node at (13.5, 0.5) {2}; \node at (14.5, 0.5) {3}; \node at (15.5, 0.5) {2}; \node at (2, 1.5) {21}; \node at (6, 1.5) {15}; \node at (10, 1.5) {20}; \node at (14, 1.5) {13}; \draw [decoration={brace}, decorate, line width=0.5mm] (14,-0.25) -- (3,-0.25); \end{tikzpicture} \end{center} Since the number of single elements is $O(\sqrt n)$ and also the number of blocks is $O(\sqrt n)$, the time complexity of the sum query is $O(\sqrt n)$. Thus, the parameter $\sqrt n$ balances two things: the array is divided into $\sqrt n$ blocks, each of which contains $\sqrt n$ elements. In practice, it is not needed to use the exact value of $\sqrt n$ as a parameter, but it may be better to use parameters $k$ and $n/k$ where $k$ is different from $\sqrt n$. The optimal parameter depends on the problem and input. For example, if an algorithm often goes through the blocks but rarely inspects single elements inside the blocks, it may be a good idea to divide the array into $k < \sqrt n$ blocks, each of which contains $n/k > \sqrt n$ elements. \section{Batch processing} \index{batch processing} Sometimes the operations of an algorithm can be divided into batches, each of which can be processed separately. Some precalculation is done between the batches in order to process the future operations more efficiently. If there are $O(\sqrt n)$ batches of size $O(\sqrt n)$, this results in a square root algorithm. As an example, let us consider a problem where a grid of size $k \times k$ initially consists of white squares and our task is to perform $n$ operations, each of which is one of the following: \begin{itemize} \item paint square $(y,x)$ black \item find the nearest black square to square $(y,x)$ where the distance between squares $(y_1,x_1)$ and $(y_2,x_2)$ is $|y_1-y_2|+|x_1-x_2|$ \end{itemize} We can solve the problem by dividing the operations into $O(\sqrt n)$ batches, each of which consists of $O(\sqrt n)$ operations. At the beginning of each batch, we calculate for each square of the grid the smallest distance to a black square. This can be done in $O(k^2)$ time using breadth-first search. When processing a batch, we maintain a list of squares that have been painted black in the current batch. The list contains $O(\sqrt n)$ elements, because there are $O(\sqrt n)$ operations in each batch. Now, the distance from a square to the nearest black square is either the precalculated distance or the distance to a square that appears in the list. The algorithm works in $O((k^2+n) \sqrt n)$ time. First, there are $O(\sqrt n)$ breadth-first searches and each search takes $O(k^2)$ time. Second, the total number of squares processed during the algorithm is $O(n)$, and at each square, we go through a list of $O(\sqrt n)$ squares. If the algorithm would perform a breadth-first search at each operation, the time complexity would be $O(k^2 n)$. And if the algorithm would go through all painted squares at each operation, the time complexity would be $O(n^2)$. Thus, the time complexity of the square root algorithm is a combination of these time complexities, but in addition, a factor of $n$ is replaced by $\sqrt n$. \section{Subalgorithms} Some square root algorithms consists of subalgorithms that are specialized for different input parameters. Typically, there are two subalgorithms: one algorithm is efficient when some parameter is smaller than $\sqrt n$, and another algorithm is efficient when the parameter is larger than $\sqrt n$. As an example, let us consider a problem where we are given a tree of $n$ nodes, each with some color. Our task is to find two nodes that have the same color and whose distance is as large as possible. For example, in the following tree, the maximum distance is 4 between the red nodes 3 and 4: \begin{center} \begin{tikzpicture}[scale=0.9] \node[draw, circle, fill=green!40] (1) at (1,3) {$2$}; \node[draw, circle, fill=red!40] (2) at (4,3) {$3$}; \node[draw, circle, fill=red!40] (3) at (1,1) {$5$}; \node[draw, circle, fill=blue!40] (4) at (4,1) {$6$}; \node[draw, circle, fill=red!40] (5) at (-2,1) {$4$}; \node[draw, circle, fill=blue!40] (6) at (-2,3) {$1$}; \path[draw,thick,-] (1) -- (2); \path[draw,thick,-] (1) -- (3); \path[draw,thick,-] (3) -- (4); \path[draw,thick,-] (3) -- (6); \path[draw,thick,-] (5) -- (6); \end{tikzpicture} \end{center} The problem can be solved by going through all colors and calculating the maximum distance between two nodes separately for each color. Assume that the current color is $x$ and there are $c$ nodes whose color is $x$. There are two subalgorithms that are specialized for small and large values of $c$: \emph{Case 1}: $c \le \sqrt n$. If the number of nodes is small, we go through all pairs of nodes whose color is $x$ and select the pair that has the maximum distance. For each node, it is needed to calculate the distance to $O(\sqrt n)$ other nodes (see Chapter 18.3), so the total time needed for processing all nodes is $O(n \sqrt n)$. \emph{Case 2}: $c > \sqrt n$. If the number of nodes is large, we go through the whole tree and calculate the maximum distance between two nodes with color $x$. The time complexity of the tree traversal is $O(n)$, and this will be done at most $O(\sqrt n)$ times, so the total time needed is $O(n \sqrt n)$. The time complexity of the algorithm is $O(n \sqrt n)$, because both cases take a total of $O(n \sqrt n)$ time. \section{Mo's algorithm} \index{Mo's algorithm} \key{Mo's algorithm}\footnote{According to \cite{cod15}, this algorithm is named after Mo Tao, a Chinese competitive programmer, but the technique has appeared earlier in the literature.} can be used in many problems that require processing range queries in a \emph{static} array. Before processing the queries, the algorithm sorts them in a special order which guarantees that the algorithm works efficiently. At each moment in the algorithm, there is an active range and the algorithm maintains the answer to a query related to that range. The algorithm processes the queries one by one, and always moves the endpoints of the active range by inserting and removing elements. The time complexity of the algorithm is $O(n \sqrt n f(n))$ when the array contains $n$ elements, there are $n$ queries and each insertion and removal of an element takes $O(f(n))$ time. The trick in Mo's algorithm is the order in which the queries are processed: The array is divided into blocks of $O(\sqrt n)$ elements, and the queries are sorted primarily by the number of the block that contains the first element in the range, and secondarily by the position of the last element in the range. It turns out that using this order, the algorithm only performs $O(n \sqrt n)$ operations, because the left endpoint of the range moves $n$ times $O(\sqrt n)$ steps, and the right endpoint of the range moves $\sqrt n$ times $O(n)$ steps. Thus, both endpoints move a total of $O(n \sqrt n)$ steps during the algorithm. \subsubsection*{Example} As an example, consider a problem where we are given a set of queries, each of them corresponding to a range in an array, and our task is to calculate for each query the number of \emph{distinct} elements in the range. In Mo's algorithm, the queries are always sorted in the same way, but it depends on the problem how the answer to the query is maintained. In this problem, we can maintain an array \texttt{c} where $\texttt{c}[x]$ indicates the number of times an element $x$ occurs in the active range. When we move from one query to another query, the active range changes. For example, if the current range is \begin{center} \begin{tikzpicture}[scale=0.7] \fill[color=lightgray] (1,0) rectangle (5,1); \draw (0,0) grid (9,1); \node at (0.5, 0.5) {4}; \node at (1.5, 0.5) {2}; \node at (2.5, 0.5) {5}; \node at (3.5, 0.5) {4}; \node at (4.5, 0.5) {2}; \node at (5.5, 0.5) {4}; \node at (6.5, 0.5) {3}; \node at (7.5, 0.5) {3}; \node at (8.5, 0.5) {4}; \end{tikzpicture} \end{center} and the next range is \begin{center} \begin{tikzpicture}[scale=0.7] \fill[color=lightgray] (2,0) rectangle (7,1); \draw (0,0) grid (9,1); \node at (0.5, 0.5) {4}; \node at (1.5, 0.5) {2}; \node at (2.5, 0.5) {5}; \node at (3.5, 0.5) {4}; \node at (4.5, 0.5) {2}; \node at (5.5, 0.5) {4}; \node at (6.5, 0.5) {3}; \node at (7.5, 0.5) {3}; \node at (8.5, 0.5) {4}; \end{tikzpicture} \end{center} there will be three steps: the left endpoint moves one step to the right, and the right endpoint moves two steps to the right. After each step, the array \texttt{c} needs to be updated. After adding an element $x$, we increase the value of $\texttt{c}[x]$ by one, and if $\texttt{c}[x]=1$ after this, we also increase the answer to the query by one. Similarly, after removing an element $x$, we decrease the value of $\texttt{c}[x]$ by one, and if $\texttt{c}[x]=0$ after this, we also decrease the answer to the query by one. In this problem, the time needed to perform each step is $O(1)$, so the total time complexity of the algorithm is $O(n \sqrt n)$.