\chapter{Range queries}

\index{range query}
\index{sum query}
\index{minimum query}
\index{maximum query}

In a \key{range query}, a range of an array is given
and we should calculate some value from the elements in the range.
Typical range queries are:
\begin{itemize}
\item \key{sum query}: calculate the sum of elements in range $[a,b]$
\item \key{minimum query}: find the smallest element in range $[a,b]$
\item \key{maximum query}: find the largest element in range $[a,b]$
\end{itemize}
For example, in range $[4,7]$ of the following array,
the sum is $4+6+1+3=14$, the minimum is 1 and the maximum is 6:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$8$}; \node at (3.5,0.5) {$4$};
\node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$1$}; \node at (6.5,0.5) {$3$}; \node at (7.5,0.5) {$4$};
\footnotesize
\node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}

An easy way to answer a range query is
to iterate through all the elements in the range.
For example, we can answer a sum query as follows:
\begin{lstlisting}
int sum(int a, int b) {
    int s = 0;
    for (int i = a; i <= b; i++) {
        s += t[i];
    }
    return s;
}
\end{lstlisting}

The above function handles a sum query in $O(n)$ time,
which is slow if the array is large and there are a lot of queries.
In this chapter we will learn how range queries
can be answered much more efficiently.

\section{Static array queries}

We will first focus on a simple case where
the array is \key{static}, i.e.,
the elements never change between the queries.
In this case, it suffices to process the
contents of the array beforehand and construct
a data structure that can be used for answering
any possible range query efficiently.
\subsubsection{Sum query}

\index{prefix sum array}

Sum queries can be answered efficiently
by constructing a \key{sum array} (also called a prefix sum array)
that contains the sum of the range $[1,k]$
for each $k=1,2,\ldots,n$.
After this, the sum of any range $[a,b]$ of the
original array
can be calculated in $O(1)$ time using the
precalculated sum array.

For example, for the array
\begin{center}
\begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$4$}; \node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$1$}; \node at (6.5,0.5) {$4$}; \node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
the corresponding sum array is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$4$}; \node at (2.5,0.5) {$8$}; \node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$22$}; \node at (5.5,0.5) {$23$}; \node at (6.5,0.5) {$27$}; \node at (7.5,0.5) {$29$};
\footnotesize
\node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
The following code constructs a sum array
\texttt{s} from array \texttt{t} in $O(n)$ time:
\begin{lstlisting}
for (int i = 1; i <= n; i++) {
    s[i] = s[i-1]+t[i];
}
\end{lstlisting}
After this, the following function answers
a sum query in $O(1)$ time:
\begin{lstlisting}
int sum(int a, int b) {
    return s[b]-s[a-1];
}
\end{lstlisting}

The function calculates the sum of range $[a,b]$
by subtracting the sum of range $[1,a-1]$
from the
sum of range $[1,b]$. Thus, only two values from the sum array are needed, and the query takes $O(1)$ time. Note that thanks to the one-based indexing, the function also works when $a=1$ if $\texttt{s}[0]=0$. As an example, consider the range $[4,7]$: \begin{center} \begin{tikzpicture}[scale=0.7] \fill[color=lightgray] (3,0) rectangle (7,1); \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$4$}; \node at (3.5,0.5) {$8$}; \node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$1$}; \node at (6.5,0.5) {$4$}; \node at (7.5,0.5) {$2$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} The sum of the range $[4,7]$ is $8+6+1+4=19$. This can be calculated from the sum array using the sums $[1,3]$ and $[1,7]$: \begin{center} \begin{tikzpicture}[scale=0.7] \fill[color=lightgray] (2,0) rectangle (3,1); \fill[color=lightgray] (6,0) rectangle (7,1); \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$4$}; \node at (2.5,0.5) {$8$}; \node at (3.5,0.5) {$16$}; \node at (4.5,0.5) {$22$}; \node at (5.5,0.5) {$23$}; \node at (6.5,0.5) {$27$}; \node at (7.5,0.5) {$29$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} Thus, the sum of the range $[4,7]$ is $27-8=19$. We can also generalize the idea of a sum array for a two-dimensional array. In this case, it will be possible to calculate the sum of any rectangular subarray in $O(1)$ time. The sum array will contain sums for all subarrays that begin from the upper-left corner. 
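
As a rough sketch of the two-dimensional case, the code below builds such a sum array and answers a rectangle query by inclusion and exclusion of four corner sums. The function names \texttt{build2d} and \texttt{sum2d}, the use of \texttt{vector}, and the convention that the input \texttt{t} is zero-based while the sum array is one-based are assumptions made for this example only.
\begin{lstlisting}
#include <cassert>
#include <vector>
using namespace std;

// s[y][x] = sum of the first y rows and first x columns of t;
// row 0 and column 0 of s stay zero.
vector<vector<long long>> build2d(const vector<vector<int>>& t) {
    int h = t.size(), w = h ? t[0].size() : 0;
    vector<vector<long long>> s(h+1, vector<long long>(w+1, 0));
    for (int y = 1; y <= h; y++) {
        for (int x = 1; x <= w; x++) {
            s[y][x] = t[y-1][x-1]
                    + s[y-1][x] + s[y][x-1] - s[y-1][x-1];
        }
    }
    return s;
}

// Sum of the rectangle with corners (y1,x1) and (y2,x2),
// one-based and inclusive.
long long sum2d(const vector<vector<long long>>& s,
                int y1, int x1, int y2, int x2) {
    assert(1 <= y1 && y1 <= y2 && 1 <= x1 && x1 <= x2);
    return s[y2][x2] - s[y1-1][x2] - s[y2][x1-1] + s[y1-1][x1-1];
}
\end{lstlisting}
Building the sum array takes time proportional to the size of the array, and each query reads only four values.
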
\begin{samepage}
The following picture illustrates the idea:
\begin{center}
\begin{tikzpicture}[scale=0.55]
\draw[fill=lightgray] (3,2) rectangle (7,5);
\draw (0,0) grid (10,7);
%\draw[line width=2pt] (3,2) rectangle (7,5);
\node[anchor=center] at (6.5, 2.5) {$A$};
\node[anchor=center] at (2.5, 2.5) {$B$};
\node[anchor=center] at (6.5, 5.5) {$C$};
\node[anchor=center] at (2.5, 5.5) {$D$};
\end{tikzpicture}
\end{center}
\end{samepage}

The sum inside the gray subarray can be calculated
using the formula
\[S(A) - S(B) - S(C) + S(D)\]
where $S(X)$ denotes the sum in a rectangular
subarray from the upper-left corner
to the position of letter $X$.

\subsubsection{Minimum query}

It is also possible to answer a minimum query
in $O(1)$ time after preprocessing, though this is
more difficult than answering a sum query.
Note that minimum and maximum queries can always
be implemented using the same techniques,
so it suffices to focus on the minimum query.

The idea is to find the minimum element for each range
of size $2^k$ in the array.
For example, in the array
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$4$}; \node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$1$}; \node at (6.5,0.5) {$4$}; \node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
the following minima will be calculated:
\begin{center}
\begin{tabular}{ccc}
\begin{tabular}{ccc}
range & size & min \\
\hline
$[1,1]$ & 1 & 1 \\
$[2,2]$ & 1 & 3 \\
$[3,3]$ & 1 & 4 \\
$[4,4]$ & 1 & 8 \\
$[5,5]$ & 1 & 6 \\
$[6,6]$ & 1 & 1 \\
$[7,7]$ & 1 & 4 \\
$[8,8]$ & 1 & 2 \\
\end{tabular}
&
\begin{tabular}{ccc}
range & size & min \\
\hline
$[1,2]$ & 2 & 1 \\
$[2,3]$ & 2 & 3 \\
$[3,4]$ & 2 & 4 \\
$[4,5]$ & 2 & 6 \\
$[5,6]$ & 2 & 1 \\
$[6,7]$ & 2 & 1 \\
$[7,8]$ & 2 & 2 \\
\\
\end{tabular}
&
\begin{tabular}{ccc}
range & size & min \\
\hline
$[1,4]$ & 4 & 1 \\
$[2,5]$ & 4 & 3 \\
$[3,6]$ & 4 & 1 \\
$[4,7]$ & 4 & 1 \\
$[5,8]$ & 4 & 1 \\
$[1,8]$ & 8 & 1 \\
\\
\\
\end{tabular}
\end{tabular}
\end{center}

The number of $2^k$ ranges in an array is $O(n \log n)$
because there are $O(\log n)$ ranges that begin
at each array index.
The minima for all $2^k$ ranges can be calculated
in $O(n \log n)$ time because each $2^k$ range
consists of two $2^{k-1}$ ranges, so the minima
can be calculated recursively.

After this, the minimum of any range $[a,b]$ can
be calculated in $O(1)$ time as the minimum of
two $2^k$ ranges where $k=\lfloor \log_2(b-a+1) \rfloor$.
The first range begins at index $a$,
and the second range ends at index $b$.
The parameter $k$ is chosen so that
the two $2^k$ ranges cover the range $[a,b]$ entirely;
the ranges may overlap, which does not affect the minimum.
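
The preprocessing and the query described above could be sketched as follows. This is only an illustration under some assumptions: the table \texttt{m}, the functions \texttt{build} and \texttt{minimum}, and the one-based input vector with an unused element at index 0 are names and conventions chosen for this example. For simplicity, $k$ is found with a short loop, which takes $O(\log n)$ steps; a precomputed table of logarithms would keep the query at $O(1)$.
\begin{lstlisting}
#include <algorithm>
#include <cassert>
#include <vector>
using namespace std;

// m[k][i] = minimum of the range of length 2^k
// that begins at index i (one-based)
vector<vector<int>> m;

void build(const vector<int>& t) {  // t[1..n], t[0] is unused
    int n = t.size() - 1;
    m.assign(1, t);                 // ranges of length 1
    for (int k = 1; (1 << k) <= n; k++) {
        m.push_back(vector<int>(n + 1));
        for (int i = 1; i + (1 << k) - 1 <= n; i++) {
            // a 2^k range consists of two 2^(k-1) ranges
            m[k][i] = min(m[k-1][i], m[k-1][i + (1 << (k-1))]);
        }
    }
}

int minimum(int a, int b) {
    assert(a <= b);
    int k = 0;
    while ((1 << (k+1)) <= b - a + 1) k++;  // floor(log2(b-a+1))
    // first range begins at a, second range ends at b
    return min(m[k][a], m[k][b - (1 << k) + 1]);
}
\end{lstlisting}
For the example array, calling \texttt{build} and then \texttt{minimum(2,7)} combines the stored minima of the ranges $[2,5]$ and $[4,7]$.
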
As an example, consider the range $[2,7]$: \begin{center} \begin{tikzpicture}[scale=0.7] \fill[color=lightgray] (1,0) rectangle (7,1); \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$4$}; \node at (3.5,0.5) {$8$}; \node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$1$}; \node at (6.5,0.5) {$4$}; \node at (7.5,0.5) {$2$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} The length of the range $[2,7]$ is 6, and $\lfloor \log_2(6) \rfloor = 2$. Thus, the minimum can be calculated from two ranges of length 4. The ranges are $[2,5]$ and $[4,7]$: \begin{center} \begin{tikzpicture}[scale=0.7] \fill[color=lightgray] (1,0) rectangle (5,1); \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$4$}; \node at (3.5,0.5) {$8$}; \node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$1$}; \node at (6.5,0.5) {$4$}; \node at (7.5,0.5) {$2$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} \begin{center} \begin{tikzpicture}[scale=0.7] \fill[color=lightgray] (3,0) rectangle (7,1); \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$4$}; \node at (3.5,0.5) {$8$}; \node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$1$}; \node at (6.5,0.5) {$4$}; \node at (7.5,0.5) {$2$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} The minimum of the range $[2,5]$ is 3, and the 
minimum of the range $[4,7]$ is 1.
Thus, the minimum of the range $[2,7]$ is 1.

\section{Binary indexed tree}

\index{binary indexed tree}
\index{Fenwick tree}

A \key{binary indexed tree} or a \key{Fenwick tree}
is a data structure that resembles a sum array.
The supported operations are answering
a sum query for range $[a,b]$,
and updating the element at index $k$.
The time complexity of both operations is $O(\log n)$.

Unlike a sum array, a binary indexed tree
can be efficiently updated between the sum queries.
This would not be possible using a sum array
because we would have to rebuild the whole sum array
in $O(n)$ time after each update.

\subsubsection{Structure}

A binary indexed tree can be represented as an array
where index $k$ contains the sum of a range in the
original array that ends at index $k$.
The length of the range is the largest power of two
that divides $k$.
For example, if $k=6$, the length of the range is $2$
because $2$ divides $6$ but $4$ does not divide $6$.

\begin{samepage}
For example, for the array
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$4$}; \node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$1$}; \node at (6.5,0.5) {$4$}; \node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
\end{samepage}
the corresponding binary indexed tree is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$4$}; \node at (2.5,0.5) {$4$}; \node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$7$}; \node at (6.5,0.5) {$4$}; \node at (7.5,0.5) {$29$};
\footnotesize
\node at
(0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \draw[->,thick] (0.5,-0.9) -- (0.5,-0.1); \draw[->,thick] (2.5,-0.9) -- (2.5,-0.1); \draw[->,thick] (4.5,-0.9) -- (4.5,-0.1); \draw[->,thick] (6.5,-0.9) -- (6.5,-0.1); \draw[->,thick] (1.5,-1.9) -- (1.5,-0.1); \draw[->,thick] (5.5,-1.9) -- (5.5,-0.1); \draw[->,thick] (3.5,-2.9) -- (3.5,-0.1); \draw[->,thick] (7.5,-3.9) -- (7.5,-0.1); \draw (0,-1) -- (1,-1) -- (1,-1.5) -- (0,-1.5) -- (0,-1); \draw (2,-1) -- (3,-1) -- (3,-1.5) -- (2,-1.5) -- (2,-1); \draw (4,-1) -- (5,-1) -- (5,-1.5) -- (4,-1.5) -- (4,-1); \draw (6,-1) -- (7,-1) -- (7,-1.5) -- (6,-1.5) -- (6,-1); \draw (0,-2) -- (2,-2) -- (2,-2.5) -- (0,-2.5) -- (0,-2); \draw (4,-2) -- (6,-2) -- (6,-2.5) -- (4,-2.5) -- (4,-2); \draw (0,-3) -- (4,-3) -- (4,-3.5) -- (0,-3.5) -- (0,-3); \draw (0,-4) -- (8,-4) -- (8,-4.5) -- (0,-4.5) -- (0,-4); \end{tikzpicture} \end{center} For example, the binary indexed tree contains the value 7 at index 6 because the sum of the elements in the range $[5,6]$ of the original array is $6+1=7$. \subsubsection{Sum query} The basic operation in a binary indexed tree is calculating the sum of a range $[1,k]$ where $k$ is any index in the array. The sum of any range can be constructed by combining sums of subranges in the tree. 
For example, the range $[1,7]$ will be divided
into three subranges:
\begin{center}
\begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$4$}; \node at (2.5,0.5) {$4$}; \node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$7$}; \node at (6.5,0.5) {$4$}; \node at (7.5,0.5) {$29$};
\footnotesize
\node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$};
\draw[->,thick] (0.5,-0.9) -- (0.5,-0.1);
\draw[->,thick] (2.5,-0.9) -- (2.5,-0.1);
\draw[->,thick] (4.5,-0.9) -- (4.5,-0.1);
\draw[->,thick] (6.5,-0.9) -- (6.5,-0.1);
\draw[->,thick] (1.5,-1.9) -- (1.5,-0.1);
\draw[->,thick] (5.5,-1.9) -- (5.5,-0.1);
\draw[->,thick] (3.5,-2.9) -- (3.5,-0.1);
\draw[->,thick] (7.5,-3.9) -- (7.5,-0.1);
\draw (0,-1) -- (1,-1) -- (1,-1.5) -- (0,-1.5) -- (0,-1);
\draw (2,-1) -- (3,-1) -- (3,-1.5) -- (2,-1.5) -- (2,-1);
\draw (4,-1) -- (5,-1) -- (5,-1.5) -- (4,-1.5) -- (4,-1);
\draw[fill=lightgray] (6,-1) -- (7,-1) -- (7,-1.5) -- (6,-1.5) -- (6,-1);
\draw (0,-2) -- (2,-2) -- (2,-2.5) -- (0,-2.5) -- (0,-2);
\draw[fill=lightgray] (4,-2) -- (6,-2) -- (6,-2.5) -- (4,-2.5) -- (4,-2);
\draw[fill=lightgray] (0,-3) -- (4,-3) -- (4,-3.5) -- (0,-3.5) -- (0,-3);
\draw (0,-4) -- (8,-4) -- (8,-4.5) -- (0,-4.5) -- (0,-4);
\end{tikzpicture}
\end{center}
Thus, the sum of the range $[1,7]$ is $16+7+4=27$.

Because of the structure of the binary indexed tree,
the length of each subrange inside a range is distinct,
so the sum of a range
always consists of sums of $O(\log n)$ subranges.

Using the same technique that we previously used
with a sum array,
we can efficiently calculate the sum of any range
$[a,b]$ by subtracting the sum of the range $[1,a-1]$
from the sum of the range $[1,b]$.
The time complexity remains $O(\log n)$
because it suffices to calculate two sums of $[1,k]$ ranges.

\subsubsection{Array update}

When an element in the original array changes,
several sums in the binary indexed tree change.
For example, if the value at index 3 changes,
the sums of the following ranges change:
\begin{center}
\begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$4$}; \node at (2.5,0.5) {$4$}; \node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$7$}; \node at (6.5,0.5) {$4$}; \node at (7.5,0.5) {$29$};
\footnotesize
\node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$};
\draw[->,thick] (0.5,-0.9) -- (0.5,-0.1);
\draw[->,thick] (2.5,-0.9) -- (2.5,-0.1);
\draw[->,thick] (4.5,-0.9) -- (4.5,-0.1);
\draw[->,thick] (6.5,-0.9) -- (6.5,-0.1);
\draw[->,thick] (1.5,-1.9) -- (1.5,-0.1);
\draw[->,thick] (5.5,-1.9) -- (5.5,-0.1);
\draw[->,thick] (3.5,-2.9) -- (3.5,-0.1);
\draw[->,thick] (7.5,-3.9) -- (7.5,-0.1);
\draw (0,-1) -- (1,-1) -- (1,-1.5) -- (0,-1.5) -- (0,-1);
\draw[fill=lightgray] (2,-1) -- (3,-1) -- (3,-1.5) -- (2,-1.5) -- (2,-1);
\draw (4,-1) -- (5,-1) -- (5,-1.5) -- (4,-1.5) -- (4,-1);
\draw (6,-1) -- (7,-1) -- (7,-1.5) -- (6,-1.5) -- (6,-1);
\draw (0,-2) -- (2,-2) -- (2,-2.5) -- (0,-2.5) -- (0,-2);
\draw (4,-2) -- (6,-2) -- (6,-2.5) -- (4,-2.5) -- (4,-2);
\draw[fill=lightgray] (0,-3) -- (4,-3) -- (4,-3.5) -- (0,-3.5) -- (0,-3);
\draw[fill=lightgray] (0,-4) -- (8,-4) -- (8,-4.5) -- (0,-4.5) -- (0,-4);
\end{tikzpicture}
\end{center}

Also in this case, the length of each range is distinct,
so $O(\log n)$ ranges will be updated in the binary indexed tree.
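
To make the structure concrete, the following sketch builds a binary indexed tree directly from its definition and lists the tree indices whose ranges contain a given array index. It is deliberately naive and is not the efficient implementation; the helper names \texttt{rangeLength}, \texttt{build} and \texttt{affected}, as well as the one-based input vector with an unused element at index 0, are assumptions made for this illustration.
\begin{lstlisting}
#include <cassert>
#include <vector>
using namespace std;

// Length of the range stored at index k:
// the largest power of two that divides k.
int rangeLength(int k) {
    assert(k >= 1);
    int len = 1;
    while (k % (2 * len) == 0) len *= 2;
    return len;
}

// Builds the tree from the definition (for illustration only).
vector<int> build(const vector<int>& t) {  // t[1..n], t[0] unused
    int n = t.size() - 1;
    vector<int> b(n + 1, 0);
    for (int k = 1; k <= n; k++) {
        // index k stores the sum of the range that ends at k
        for (int i = k - rangeLength(k) + 1; i <= k; i++) {
            b[k] += t[i];
        }
    }
    return b;
}

// Tree indices whose range contains array index i.
vector<int> affected(int i, int n) {
    vector<int> r;
    for (int k = i; k <= n; k++) {
        if (k - rangeLength(k) + 1 <= i) r.push_back(k);
    }
    return r;
}
\end{lstlisting}
For the example array, \texttt{build} produces the values $1,4,4,16,6,7,4,29$, and \texttt{affected(3,8)} returns the indices $3$, $4$ and $8$, which correspond to the gray ranges in the picture.
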
\subsubsection{Implementation}

The operations of a binary indexed tree can be implemented
in an elegant and efficient way using bit manipulation.
The bit operation needed is $k \& -k$, which
isolates the lowest one bit of the number $k$.
For example, $6 \& -6=2$ because the number $6$
corresponds to 110 and the number $2$ corresponds to 10.

It turns out that when calculating a range sum,
the index $k$ in the binary indexed tree should be
decreased by $k \& -k$ at every step.
Correspondingly, when updating the array,
the index $k$ should be increased by $k \& -k$ at every step.

The following functions assume that the binary indexed tree
is stored in the array \texttt{b} and that it consists of indices $1 \ldots n$.

The function \texttt{sum} calculates the sum of the range $[1,k]$:
\begin{lstlisting}
int sum(int k) {
    int s = 0;
    while (k >= 1) {
        s += b[k];
        k -= k&-k;
    }
    return s;
}
\end{lstlisting}

The function \texttt{add} increases the value of element $k$ by $x$:
\begin{lstlisting}
void add(int k, int x) {
    while (k <= n) {
        b[k] += x;
        k += k&-k;
    }
}
\end{lstlisting}

The time complexity of both functions is
$O(\log n)$ because the functions access $O(\log n)$ values
in the binary indexed tree and each move to the next index
takes $O(1)$ time using the bit operation.

\section{Segment tree}

\index{segment tree}

A \key{segment tree} is a data structure
whose operations are answering a range query for a range $[a,b]$
and updating the element at index $k$.
Using a segment tree, we can implement sum queries,
minimum queries and many other queries so that
both operations work in $O(\log n)$ time.

Compared to a binary indexed tree,
the advantage of a segment tree is that it is
a more general data structure.
A binary indexed tree only supports sum queries,
but a segment tree also supports other queries.
On the other hand, a segment tree requires more
memory and is a bit more difficult to implement than
a binary indexed tree.
\subsubsection{Structure}

A segment tree contains $2n-1$ nodes so that
the bottom level has $n$ nodes that
correspond to the contents of the array,
and the upper levels contain the information
needed for range queries.

The contents of a segment tree depend on
which range query the tree should support.
We first assume that the query is the familiar sum query.
For example, the array
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$5$}; \node at (1.5,0.5) {$8$}; \node at (2.5,0.5) {$6$}; \node at (3.5,0.5) {$3$};
\node at (4.5,0.5) {$2$}; \node at (5.5,0.5) {$7$}; \node at (6.5,0.5) {$2$}; \node at (7.5,0.5) {$6$};
\footnotesize
\node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
corresponds to the following segment tree:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {2};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\node[draw, circle] (a) at (1,2.5) {13};
\path[draw,thick,-] (a) -- (0.5,1);
\path[draw,thick,-] (a) -- (1.5,1);
\node[draw, circle,minimum size=22pt] (b) at (3,2.5) {9};
\path[draw,thick,-] (b) -- (2.5,1);
\path[draw,thick,-] (b) -- (3.5,1);
\node[draw, circle,minimum size=22pt] (c) at (5,2.5) {9};
\path[draw,thick,-] (c) -- (4.5,1);
\path[draw,thick,-] (c) -- (5.5,1);
\node[draw, circle,minimum size=22pt] (d) at (7,2.5) {8};
\path[draw,thick,-] (d) -- (6.5,1);
\path[draw,thick,-] (d) -- (7.5,1);
\node[draw, circle] (i) at (2,4.5) {22};
\path[draw,thick,-] (i) -- (a);
\path[draw,thick,-] (i) -- (b);
\node[draw, circle] (j) at (6,4.5) {17};
\path[draw,thick,-] (j) -- (c);
\path[draw,thick,-]
(j) -- (d);
\node[draw, circle] (m) at (4,6.5) {39};
\path[draw,thick,-] (m) -- (i);
\path[draw,thick,-] (m) -- (j);
\end{tikzpicture}
\end{center}

Each node of a segment tree contains information
about a range of size $2^k$ in the array.
In this case, the value in a node is
the sum of the array elements in the range
that corresponds to the node.
The value of each node is obtained by adding
the values of the nodes below it on the left and on the right.

It is most convenient to build a segment tree
so that the size of the array is a power of two,
in which case the result is a complete binary tree.
In the following, we always assume that the array
fulfils this requirement.
If the size of the array is not a power of two,
it can be padded at the end with empty elements so that
the size becomes a power of two.

\subsubsection{Range query}

In a segment tree, the answer to a range query is calculated
from nodes that belong to the range and are
as high as possible in the tree.
Each node gives the answer for a subrange of the range,
and the answer to the query is obtained by combining
the information about the subranges.

For example, consider the following range:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=gray!50] (2,0) rectangle (8,1);
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {2};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\footnotesize
\node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
The sum of the elements in the range $[3,8]$ is $6+3+2+7+2+6=26$.
The sum can be calculated from the segment tree
using the following partial sums:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {2};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\node[draw, circle] (a) at (1,2.5) {13};
\path[draw,thick,-] (a) -- (0.5,1);
\path[draw,thick,-] (a) -- (1.5,1);
\node[draw, circle,fill=gray!50,minimum size=22pt] (b) at (3,2.5) {9};
\path[draw,thick,-] (b) -- (2.5,1);
\path[draw,thick,-] (b) -- (3.5,1);
\node[draw, circle,minimum size=22pt] (c) at (5,2.5) {9};
\path[draw,thick,-] (c) -- (4.5,1);
\path[draw,thick,-] (c) -- (5.5,1);
\node[draw, circle,minimum size=22pt] (d) at (7,2.5) {8};
\path[draw,thick,-] (d) -- (6.5,1);
\path[draw,thick,-] (d) -- (7.5,1);
\node[draw, circle] (i) at (2,4.5) {22};
\path[draw,thick,-] (i) -- (a);
\path[draw,thick,-] (i) -- (b);
\node[draw, circle,fill=gray!50] (j) at (6,4.5) {17};
\path[draw,thick,-] (j) -- (c);
\path[draw,thick,-] (j) -- (d);
\node[draw, circle] (m) at (4,6.5) {39};
\path[draw,thick,-] (m) -- (i);
\path[draw,thick,-] (m) -- (j);
\end{tikzpicture}
\end{center}
Thus, the sum of the range is $9+17=26$.

When the answer to a range query is calculated from
nodes that are as high as possible in the segment tree,
at most two nodes on each level of the tree
belong to the range.
Hence, the total number of nodes needed
for a range query is only $O(\log n)$.

\subsubsection{Array update}

When a value in the array changes,
we have to update all nodes in the segment tree
whose value depends on the changed array position.
This can be done by walking up the tree
to the top node and updating the nodes on the way.

\begin{samepage}
The following picture shows which nodes in the segment tree
change if the array value 7 changes.
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=gray!50] (5,0) rectangle (6,1);
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {2};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\node[draw, circle] (a) at (1,2.5) {13};
\path[draw,thick,-] (a) -- (0.5,1);
\path[draw,thick,-] (a) -- (1.5,1);
\node[draw, circle,minimum size=22pt] (b) at (3,2.5) {9};
\path[draw,thick,-] (b) -- (2.5,1);
\path[draw,thick,-] (b) -- (3.5,1);
\node[draw, circle,minimum size=22pt,fill=gray!50] (c) at (5,2.5) {9};
\path[draw,thick,-] (c) -- (4.5,1);
\path[draw,thick,-] (c) -- (5.5,1);
\node[draw, circle,minimum size=22pt] (d) at (7,2.5) {8};
\path[draw,thick,-] (d) -- (6.5,1);
\path[draw,thick,-] (d) -- (7.5,1);
\node[draw, circle] (i) at (2,4.5) {22};
\path[draw,thick,-] (i) -- (a);
\path[draw,thick,-] (i) -- (b);
\node[draw, circle,fill=gray!50] (j) at (6,4.5) {17};
\path[draw,thick,-] (j) -- (c);
\path[draw,thick,-] (j) -- (d);
\node[draw, circle,fill=gray!50] (m) at (4,6.5) {39};
\path[draw,thick,-] (m) -- (i);
\path[draw,thick,-] (m) -- (j);
\end{tikzpicture}
\end{center}
\end{samepage}

The path from the bottom of the segment tree to the top
always consists of $O(\log n)$ nodes,
so a change in an array value affects $O(\log n)$ nodes in the tree.

\subsubsection{Storing the tree}

A segment tree can be stored in memory
as an array of $2N$ elements,
where $N$ is a sufficiently large power of two.
Such a segment tree can maintain an array
whose index range is $[0,N-1]$.

Position 1 of the segment tree array
contains the value of the top node,
positions 2 and 3 contain the values of the nodes on the next level,
and so on.
The bottom level of the segment tree, i.e.,
the actual contents of the array,
is stored beginning from position $N$.
Thus, the element at position $k$ of the array
is at position $k+N$ in the segment tree array.

For example, the segment tree
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {2};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\node[draw, circle] (a) at (1,2.5) {13};
\path[draw,thick,-] (a) -- (0.5,1);
\path[draw,thick,-] (a) -- (1.5,1);
\node[draw, circle,minimum size=22pt] (b) at (3,2.5) {9};
\path[draw,thick,-] (b) -- (2.5,1);
\path[draw,thick,-] (b) -- (3.5,1);
\node[draw, circle,minimum size=22pt] (c) at (5,2.5) {9};
\path[draw,thick,-] (c) -- (4.5,1);
\path[draw,thick,-] (c) -- (5.5,1);
\node[draw, circle,minimum size=22pt] (d) at (7,2.5) {8};
\path[draw,thick,-] (d) -- (6.5,1);
\path[draw,thick,-] (d) -- (7.5,1);
\node[draw, circle] (i) at (2,4.5) {22};
\path[draw,thick,-] (i) -- (a);
\path[draw,thick,-] (i) -- (b);
\node[draw, circle] (j) at (6,4.5) {17};
\path[draw,thick,-] (j) -- (c);
\path[draw,thick,-] (j) -- (d);
\node[draw, circle] (m) at (4,6.5) {39};
\path[draw,thick,-] (m) -- (i);
\path[draw,thick,-] (m) -- (j);
\end{tikzpicture}
\end{center}
can be stored in an array as follows ($N=8$):
\begin{center}
\begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (15,1);
\node at (0.5,0.5) {$39$}; \node at (1.5,0.5) {$22$}; \node at (2.5,0.5) {$17$};
\node at (3.5,0.5) {$13$}; \node at (4.5,0.5) {$9$}; \node at (5.5,0.5) {$9$};
\node at (6.5,0.5) {$8$}; \node at (7.5,0.5) {$5$}; \node at (8.5,0.5) {$8$};
\node at (9.5,0.5) {$6$}; \node at (10.5,0.5) {$3$}; \node at (11.5,0.5) {$2$};
\node at (12.5,0.5) {$7$}; \node at (13.5,0.5) {$2$}; \node at (14.5,0.5) {$6$};
\footnotesize
\node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4)
{$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \node at (8.5,1.4) {$9$};
\node at (9.5,1.4) {$10$}; \node at (10.5,1.4) {$11$}; \node at (11.5,1.4) {$12$};
\node at (12.5,1.4) {$13$}; \node at (13.5,1.4) {$14$}; \node at (14.5,1.4) {$15$};
\end{tikzpicture}
\end{center}
Using this representation,
for a node at position $k$,
\begin{itemize}
\item the node above it is at position $\lfloor k/2 \rfloor$,
\item the left node below it is at position $2k$, and
\item the right node below it is at position $2k+1$.
\end{itemize}
Note that this implies that the position of a node
is even if it is on the left when seen from the node above,
and odd if it is on the right.

\subsubsection{Implementation}

Next we consider how the range query and the update
can be implemented in a segment tree.

The following functions assume that the segment tree
is stored in an array $\texttt{p}$ of $2N$ elements
as described above.

The function \texttt{summa} calculates the sum
of the range $a \ldots b$:
\begin{lstlisting}
int summa(int a, int b) {
    a += N; b += N;
    int s = 0;
    while (a <= b) {
        if (a%2 == 1) s += p[a++];
        if (b%2 == 0) s += p[b--];
        a /= 2; b /= 2;
    }
    return s;
}
\end{lstlisting}

The function starts calculating the sum at the bottom
of the segment tree and moves one level up at each step.
It accumulates the sum of the range in the variable $s$
by combining partial sums in the tree.
A partial sum at the border of the range is added to the sum
whenever it is not included in a partial sum on the level above.

The function \texttt{lisaa} increases the value
at position $k$ by $x$:
\begin{lstlisting}
void lisaa(int k, int x) {
    k += N;
    p[k] += x;
    for (k /= 2; k >= 1; k /= 2) {
        p[k] = p[2*k]+p[2*k+1];
    }
}
\end{lstlisting}
First the function makes the change at the bottom level
of the tree.
After this, it updates all partial sums up to the top of the tree.
Thanks to the indexing of the array \texttt{p},
the nodes below position $k$
are at positions $2k$ and $2k+1$.
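
As a self-contained sketch, the two functions can be combined with a routine that builds the tree from an array. The function \texttt{build} is not part of the text above; it is an assumption added for this example, as is the choice of padding unused leaves with zeros. The functions \texttt{summa} and \texttt{lisaa} are repeated so that the code compiles on its own, with assertions added for the zero-based index range $[0,N-1]$.
\begin{lstlisting}
#include <cassert>
#include <vector>
using namespace std;

int N;           // number of leaves, a power of two
vector<int> p;   // p[1] is the top node, leaves are p[N..2N-1]

void build(const vector<int>& t) {  // t[0..n-1]
    N = 1;
    while (N < (int)t.size()) N *= 2;
    p.assign(2 * N, 0);             // unused leaves stay 0
    for (int i = 0; i < (int)t.size(); i++) p[N + i] = t[i];
    // each node is the sum of the two nodes below it
    for (int k = N - 1; k >= 1; k--) p[k] = p[2*k] + p[2*k+1];
}

int summa(int a, int b) {
    assert(0 <= a && a <= b && b < N);
    a += N; b += N;
    int s = 0;
    while (a <= b) {
        if (a%2 == 1) s += p[a++];
        if (b%2 == 0) s += p[b--];
        a /= 2; b /= 2;
    }
    return s;
}

void lisaa(int k, int x) {
    assert(0 <= k && k < N);
    k += N;
    p[k] += x;
    for (k /= 2; k >= 1; k /= 2) {
        p[k] = p[2*k] + p[2*k+1];
    }
}
\end{lstlisting}
With the example array $5,8,6,3,2,7,2,6$, the call \texttt{summa(2,7)} corresponds to the range $[3,8]$ in the one-based pictures and combines the nodes with values 9 and 17.
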
Both operations of a segment tree work in $O(\log n)$ time
because a segment tree of $n$ elements consists of
$O(\log n)$ levels, and the operations
move one level up in the tree at each step.

\subsubsection{Other queries}

In addition to sums, a segment tree supports any range query
where the result for a range $[a,c]$ can be
calculated efficiently from the results for the adjacent ranges
$[a,b]$ and $[b+1,c]$.
Examples of such queries are
minimum and maximum, greatest common divisor,
and the bit operations and, or and xor.

\begin{samepage}
For example, the following segment tree
can be used for calculating minima of ranges of the array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {1};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\node[draw, circle,minimum size=22pt] (a) at (1,2.5) {5};
\path[draw,thick,-] (a) -- (0.5,1);
\path[draw,thick,-] (a) -- (1.5,1);
\node[draw, circle,minimum size=22pt] (b) at (3,2.5) {3};
\path[draw,thick,-] (b) -- (2.5,1);
\path[draw,thick,-] (b) -- (3.5,1);
\node[draw, circle,minimum size=22pt] (c) at (5,2.5) {1};
\path[draw,thick,-] (c) -- (4.5,1);
\path[draw,thick,-] (c) -- (5.5,1);
\node[draw, circle,minimum size=22pt] (d) at (7,2.5) {2};
\path[draw,thick,-] (d) -- (6.5,1);
\path[draw,thick,-] (d) -- (7.5,1);
\node[draw, circle,minimum size=22pt] (i) at (2,4.5) {3};
\path[draw,thick,-] (i) -- (a);
\path[draw,thick,-] (i) -- (b);
\node[draw, circle,minimum size=22pt] (j) at (6,4.5) {1};
\path[draw,thick,-] (j) -- (c);
\path[draw,thick,-] (j) -- (d);
\node[draw, circle,minimum size=22pt] (m) at (4,6.5) {1};
\path[draw,thick,-] (m) -- (i);
\path[draw,thick,-] (m) -- (j);
\end{tikzpicture}
\end{center}
\end{samepage}

In this segment tree, each node contains
the smallest element in the part of the array below it.
The top node of the tree contains the smallest
element in the whole array.
The tree can be implemented in the same way as for sums,
but instead of a sum, the minimum is calculated at every position.

\subsubsection{Binary search in a tree}

The information in a segment tree can be used
in a way that resembles binary search
by starting the search at the top of the tree.
For example, using a minimum segment tree, we can find
in $O(\log n)$ time the position of
the smallest element in the array.

For example, in the following tree the
smallest element is 1, and its position can be found
by walking down the tree from the top:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (8,0) grid (16,1);
\node[anchor=center] at (8.5, 0.5) {9};
\node[anchor=center] at (9.5, 0.5) {5};
\node[anchor=center] at (10.5, 0.5) {7};
\node[anchor=center] at (11.5, 0.5) {1};
\node[anchor=center] at (12.5, 0.5) {6};
\node[anchor=center] at (13.5, 0.5) {2};
\node[anchor=center] at (14.5, 0.5) {3};
\node[anchor=center] at (15.5, 0.5) {2};
%\node[anchor=center] at (1,2.5) {13};
\node[draw, circle,minimum size=22pt] (e) at (9,2.5) {5};
\path[draw,thick,-] (e) -- (8.5,1);
\path[draw,thick,-] (e) -- (9.5,1);
\node[draw, circle,minimum size=22pt] (f) at (11,2.5) {1};
\path[draw,thick,-] (f) -- (10.5,1);
\path[draw,thick,-] (f) -- (11.5,1);
\node[draw, circle,minimum size=22pt] (g) at (13,2.5) {2};
\path[draw,thick,-] (g) -- (12.5,1);
\path[draw,thick,-] (g) -- (13.5,1);
\node[draw, circle,minimum size=22pt] (h) at (15,2.5) {2};
\path[draw,thick,-] (h) -- (14.5,1);
\path[draw,thick,-] (h) -- (15.5,1);
\node[draw, circle,minimum size=22pt] (k) at (10,4.5) {1};
\path[draw,thick,-] (k) -- (e);
\path[draw,thick,-] (k) -- (f);
\node[draw, circle,minimum size=22pt] (l) at (14,4.5) {2};
\path[draw,thick,-] (l) -- (g);
\path[draw,thick,-] (l) -- (h);
\node[draw, circle,minimum size=22pt] (n) at (12,6.5) {1};
\path[draw,thick,-] (n) -- (k);
\path[draw,thick,-] (n) -- (l);
\path[draw=red,thick,->,line width=2pt] (n) -- (k);
\path[draw=red,thick,->,line width=2pt] (k) -- (f); \path[draw=red,thick,->,line width=2pt] (f) -- (11.5,1); \end{tikzpicture} \end{center} \section{Lisätekniikoita} \subsubsection{Indeksien pakkaus} Taulukon päälle rakennettujen tietorakenteiden rajoituksena on, että alkiot on indeksoitu kokonaisluvuin $1,2,3,$ jne. Tästä seuraa ongelmia, jos tarvittavat indeksit ovat suuria. Esimerkiksi indeksin $10^9$ käyttäminen vaatisi, että taulukossa olisi $10^9$ alkiota, mikä ei ole realistista. \index{indeksien pakkaus@indeksien pakkaus} Tätä rajoitusta on kuitenkin mahdollista kiertää usein käyttämällä \key{indeksien pakkausta}, jolloin indeksit jaetaan uudestaan niin, että ne ovat kokonaisluvut $1,2,3,$ jne. Tämä on mahdollista silloin, kun kaikki algoritmin aikana tarvittavat indeksit ovat tiedossa algoritmin alussa. Ideana on korvata jokainen alkuperäinen indeksi $x$ indeksillä $p(x)$, missä $p$ jakaa indeksit uudestaan. Vaatimuksena on, että indeksien järjestys ei muutu, eli jos $a