\chapter{Range queries} \index{range query} \index{sum query} \index{minimum query} \index{maximum query} A \key{range query} asks to calculate some information about the elements in a given range of an array. Typical range queries are: \begin{itemize} \item \key{sum query}: calculate the sum of elements in a range \item \key{minimum query}: find the smallest element in a range \item \key{maximum query}: find the largest element in a range \end{itemize} For example, consider the range $[4,7]$ in the following array: \begin{center} \begin{tikzpicture}[scale=0.7] \fill[color=lightgray] (3,0) rectangle (7,1); \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$8$}; \node at (3.5,0.5) {$4$}; \node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$1$}; \node at (6.5,0.5) {$3$}; \node at (7.5,0.5) {$4$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} In this range, the sum of elements is $4+6+1+3=16$, the minimum element is 1 and the maximum element is 6. An easy way to process range queries is to go through all the elements in the range. For example, we can calculate the sum in a range $[a,b]$ as follows: \begin{lstlisting} int sum(int a, int b) { int s = 0; for (int i = a; i <= b; i++) { s += t[i]; } return s; } \end{lstlisting} The above function works in $O(n)$ time. However, if the array is large and there are several queries, such an approach is slow. In this chapter, we will learn how range queries can be processed much more efficiently. \section{Static array queries} We first focus on a simple situation where the array is \key{static}, i.e., the elements never change between the queries. In this case, it suffices to preprocess the array and construct a data structure that can be used for finding the answer for any possible range query efficiently. \subsubsection{Sum query} \index{prefix sum array} Sum queries can be processed efficiently by constructing a \key{sum array} that contains the sum of elements in the range $[1,k]$ for each $k=1,2,\ldots,n$. Using the sum array, the sum of elements in any range $[a,b]$ of the original array can be calculated in $O(1)$ time. For example, for the array \begin{center} \begin{tikzpicture}[scale=0.7] %\fill[color=lightgray] (3,0) rectangle (7,1); \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$4$}; \node at (3.5,0.5) {$8$}; \node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$1$}; \node at (6.5,0.5) {$4$}; \node at (7.5,0.5) {$2$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} the corresponding sum array is as follows: \begin{center} \begin{tikzpicture}[scale=0.7] %\fill[color=lightgray] (3,0) rectangle (7,1); \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$4$}; \node at (2.5,0.5) {$8$}; \node at (3.5,0.5) {$16$}; \node at (4.5,0.5) {$22$}; \node at (5.5,0.5) {$23$}; \node at (6.5,0.5) {$27$}; \node at (7.5,0.5) {$29$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} The following code constructs a sum array \texttt{s} for an array \texttt{t} in $O(n)$ time: \begin{lstlisting} for (int i = 1; i <= n; i++) { s[i] = s[i-1]+t[i]; } \end{lstlisting} After this, the following function processes any sum query in $O(1)$ time: \begin{lstlisting} int sum(int a, int b) { return s[b]-s[a-1]; } \end{lstlisting} The function calculates the sum in the range $[a,b]$ by subtracting the sum in the range $[1,a-1]$ from the sum in the range $[1,b]$. Thus, only two values of the sum array are needed, and the query takes $O(1)$ time. Note that because of the one-based indexing, the function also works when $a=1$ if $\texttt{s}[0]=0$. As an example, consider the range $[4,7]$: \begin{center} \begin{tikzpicture}[scale=0.7] \fill[color=lightgray] (3,0) rectangle (7,1); \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$4$}; \node at (3.5,0.5) {$8$}; \node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$1$}; \node at (6.5,0.5) {$4$}; \node at (7.5,0.5) {$2$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} The sum in the range is $8+6+1+4=19$. This can be calculated using the precalculated sums for the ranges $[1,3]$ and $[1,7]$: \begin{center} \begin{tikzpicture}[scale=0.7] \fill[color=lightgray] (2,0) rectangle (3,1); \fill[color=lightgray] (6,0) rectangle (7,1); \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$4$}; \node at (2.5,0.5) {$8$}; \node at (3.5,0.5) {$16$}; \node at (4.5,0.5) {$22$}; \node at (5.5,0.5) {$23$}; \node at (6.5,0.5) {$27$}; \node at (7.5,0.5) {$29$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} Thus, the sum in the range $[4,7]$ is $27-8=19$. It is also possible to generalize this idea to higher dimensions. For example, we can construct a two-dimensional sum array that can be used for calculating the sum of any rectangular subarray in $O(1)$ time. Each value in such an array is the sum of a subarray that begins at the upper-left corner of the array. \begin{samepage} The following picture illustrates the idea: \begin{center} \begin{tikzpicture}[scale=0.54] \draw[fill=lightgray] (3,2) rectangle (7,5); \draw (0,0) grid (10,7); %\draw[line width=2pt] (3,2) rectangle (7,5); \node[anchor=center] at (6.5, 2.5) {$A$}; \node[anchor=center] at (2.5, 2.5) {$B$}; \node[anchor=center] at (6.5, 5.5) {$C$}; \node[anchor=center] at (2.5, 5.5) {$D$}; \end{tikzpicture} \end{center} \end{samepage} The sum of the gray subarray can be calculated using the formula \[S(A) - S(B) - S(C) + S(D),\] where $S(X)$ denotes the sum of a rectangular subarray from the upper-left corner to the position of $X$. \subsubsection{Minimum query} It is also possible to process minimum queries in $O(1)$ time after preprocessing, though it is more difficult than processing sum queries. Note that minimum and maximum queries can always be implemented using same techniques, so it suffices to focus on minimum queries. The idea is to precalculate the minimum element of each range of size $2^k$ in the array. For example, in the array \begin{center} \begin{tikzpicture}[scale=0.7] \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$4$}; \node at (3.5,0.5) {$8$}; \node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$1$}; \node at (6.5,0.5) {$4$}; \node at (7.5,0.5) {$2$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} the following minima will be calculated: \begin{center} \begin{tabular}{ccc} \begin{tabular}{ccc} range & size & min \\ \hline $[1,1]$ & 1 & 1 \\ $[2,2]$ & 1 & 3 \\ $[3,3]$ & 1 & 4 \\ $[4,4]$ & 1 & 8 \\ $[5,5]$ & 1 & 6 \\ $[6,6]$ & 1 & 1 \\ $[7,7]$ & 1 & 4 \\ $[8,8]$ & 1 & 2 \\ \end{tabular} & \begin{tabular}{ccc} range & size & min \\ \hline $[1,2]$ & 2 & 1 \\ $[2,3]$ & 2 & 3 \\ $[3,4]$ & 2 & 4 \\ $[4,5]$ & 2 & 6 \\ $[5,6]$ & 2 & 1 \\ $[6,7]$ & 2 & 1 \\ $[7,8]$ & 2 & 2 \\ \\ \end{tabular} & \begin{tabular}{ccc} range & size & min \\ \hline $[1,4]$ & 4 & 1 \\ $[2,5]$ & 4 & 3 \\ $[3,6]$ & 4 & 1 \\ $[4,7]$ & 4 & 1 \\ $[5,8]$ & 4 & 1 \\ $[1,8]$ & 8 & 1 \\ \\ \\ \end{tabular} \end{tabular} \end{center} There are $O(n \log n)$ ranges of size $2^k$, because for each array position, there are $O(\log n)$ ranges that begin at that position. The minima in all ranges of size $2^k$ can be calculated in $O(n \log n)$ time, because each range of size $2^k$ consists of two ranges of size $2^{k-1}$ and the minima can be calculated recursively. After this, the minimum in any range $[a,b]$ can be calculated in $O(1)$ time as a minimum of two ranges of size $2^k$ where $k=\lfloor \log_2(b-a+1) \rfloor$. The first range begins at index $a$, and the second range ends at index $b$. The parameter $k$ is chosen so that the two ranges of size $2^k$ fully cover the range $[a,b]$. As an example, consider the range $[2,7]$: \begin{center} \begin{tikzpicture}[scale=0.7] \fill[color=lightgray] (1,0) rectangle (7,1); \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$4$}; \node at (3.5,0.5) {$8$}; \node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$1$}; \node at (6.5,0.5) {$4$}; \node at (7.5,0.5) {$2$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} The length of the range is 6, and $\lfloor \log_2(6) \rfloor = 2$. Thus, the minimum can be calculated from two ranges of length 4. The ranges are $[2,5]$ and $[4,7]$: \begin{center} \begin{tikzpicture}[scale=0.7] \fill[color=lightgray] (1,0) rectangle (5,1); \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$4$}; \node at (3.5,0.5) {$8$}; \node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$1$}; \node at (6.5,0.5) {$4$}; \node at (7.5,0.5) {$2$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} \begin{center} \begin{tikzpicture}[scale=0.7] \fill[color=lightgray] (3,0) rectangle (7,1); \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$4$}; \node at (3.5,0.5) {$8$}; \node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$1$}; \node at (6.5,0.5) {$4$}; \node at (7.5,0.5) {$2$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} Since the minimum in the range $[2,5]$ is 3 and the minimum in the range $[4,7]$ is 1, we know that the minimum in the range $[2,7]$ is 1. \section{Binary indexed tree} \index{binary indexed tree} \index{Fenwick tree} A \key{binary indexed tree} or \key{Fenwick tree} can be seen as a dynamic version of a sum array. The tree supports two $O(\log n)$ time operations: calculating the sum of elements in a range, and modifying the value of an element. The benefit in using a binary indexed tree is that the elements of the underlying array can be efficiently updated between the queries. This would not be possible with a sum array, because after each update, we should build the whole sum array again in $O(n)$ time. \subsubsection{Structure} Given an array of $n$ elements, indexed $1 \ldots n$, the binary indexed tree for that array is an array such that the value at position $k$ equals the sum of elements in the original array in a range that ends at position $k$. The length of the range is the largest power of two that divides $k$. For example, if $k=6$, the length of the range is $2$, because $2$ divides $6$ but $4$ does not divide $6$. \begin{samepage} For example, consider the following array: \begin{center} \begin{tikzpicture}[scale=0.7] \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$3$}; \node at (2.5,0.5) {$4$}; \node at (3.5,0.5) {$8$}; \node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$1$}; \node at (6.5,0.5) {$4$}; \node at (7.5,0.5) {$2$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} \end{samepage} The corresponding binary indexed tree is as follows: \begin{center} \begin{tikzpicture}[scale=0.7] %\fill[color=lightgray] (3,0) rectangle (7,1); \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$4$}; \node at (2.5,0.5) {$4$}; \node at (3.5,0.5) {$16$}; \node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$7$}; \node at (6.5,0.5) {$4$}; \node at (7.5,0.5) {$29$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \draw[->,thick] (0.5,-0.9) -- (0.5,-0.1); \draw[->,thick] (2.5,-0.9) -- (2.5,-0.1); \draw[->,thick] (4.5,-0.9) -- (4.5,-0.1); \draw[->,thick] (6.5,-0.9) -- (6.5,-0.1); \draw[->,thick] (1.5,-1.9) -- (1.5,-0.1); \draw[->,thick] (5.5,-1.9) -- (5.5,-0.1); \draw[->,thick] (3.5,-2.9) -- (3.5,-0.1); \draw[->,thick] (7.5,-3.9) -- (7.5,-0.1); \draw (0,-1) -- (1,-1) -- (1,-1.5) -- (0,-1.5) -- (0,-1); \draw (2,-1) -- (3,-1) -- (3,-1.5) -- (2,-1.5) -- (2,-1); \draw (4,-1) -- (5,-1) -- (5,-1.5) -- (4,-1.5) -- (4,-1); \draw (6,-1) -- (7,-1) -- (7,-1.5) -- (6,-1.5) -- (6,-1); \draw (0,-2) -- (2,-2) -- (2,-2.5) -- (0,-2.5) -- (0,-2); \draw (4,-2) -- (6,-2) -- (6,-2.5) -- (4,-2.5) -- (4,-2); \draw (0,-3) -- (4,-3) -- (4,-3.5) -- (0,-3.5) -- (0,-3); \draw (0,-4) -- (8,-4) -- (8,-4.5) -- (0,-4.5) -- (0,-4); \end{tikzpicture} \end{center} For example, the value at position 6 in the binary indexed tree is 7, because the sum of elements in the range $[5,6]$ in the original array is $6+1=7$. \subsubsection{Sum query} The basic operation in a binary indexed tree is to calculate the sum of elements in a range $[1,k]$, where $k$ is any position in the array. The sum of such a range can be calculated as a sum of one or more values stored in the tree. For example, the range $[1,7]$ corresponds to the following values: \begin{center} \begin{tikzpicture}[scale=0.7] %\fill[color=lightgray] (3,0) rectangle (7,1); \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$4$}; \node at (2.5,0.5) {$4$}; \node at (3.5,0.5) {$16$}; \node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$7$}; \node at (6.5,0.5) {$4$}; \node at (7.5,0.5) {$29$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \draw[->,thick] (0.5,-0.9) -- (0.5,-0.1); \draw[->,thick] (2.5,-0.9) -- (2.5,-0.1); \draw[->,thick] (4.5,-0.9) -- (4.5,-0.1); \draw[->,thick] (6.5,-0.9) -- (6.5,-0.1); \draw[->,thick] (1.5,-1.9) -- (1.5,-0.1); \draw[->,thick] (5.5,-1.9) -- (5.5,-0.1); \draw[->,thick] (3.5,-2.9) -- (3.5,-0.1); \draw[->,thick] (7.5,-3.9) -- (7.5,-0.1); \draw (0,-1) -- (1,-1) -- (1,-1.5) -- (0,-1.5) -- (0,-1); \draw (2,-1) -- (3,-1) -- (3,-1.5) -- (2,-1.5) -- (2,-1); \draw (4,-1) -- (5,-1) -- (5,-1.5) -- (4,-1.5) -- (4,-1); \draw[fill=lightgray] (6,-1) -- (7,-1) -- (7,-1.5) -- (6,-1.5) -- (6,-1); \draw (0,-2) -- (2,-2) -- (2,-2.5) -- (0,-2.5) -- (0,-2); \draw[fill=lightgray] (4,-2) -- (6,-2) -- (6,-2.5) -- (4,-2.5) -- (4,-2); \draw[fill=lightgray] (0,-3) -- (4,-3) -- (4,-3.5) -- (0,-3.5) -- (0,-3); \draw (0,-4) -- (8,-4) -- (8,-4.5) -- (0,-4.5) -- (0,-4); \end{tikzpicture} \end{center} Hence, the sum of elements in the range $[1,7]$ is $16+7+4=27$. The structure of the binary indexed tree allows us to calculate the sum of elements in any range using only $O(\log n)$ values from the tree. Using the same technique that we previously used with a sum array, we can efficiently calculate the sum of any range $[a,b]$ by substracting the sum of the range $[1,a-1]$ from the sum of the range $[1,b]$. Also here, only $O(\log n)$ values are needed, because it suffices to calculate two sums of $[1,k]$ ranges. \subsubsection{Array update} When an element in the original array changes, several sums in the binary indexed tree change. For example, if the element at position 3 changes, the sums of the following ranges change: \begin{center} \begin{tikzpicture}[scale=0.7] %\fill[color=lightgray] (3,0) rectangle (7,1); \draw (0,0) grid (8,1); \node at (0.5,0.5) {$1$}; \node at (1.5,0.5) {$4$}; \node at (2.5,0.5) {$4$}; \node at (3.5,0.5) {$16$}; \node at (4.5,0.5) {$6$}; \node at (5.5,0.5) {$7$}; \node at (6.5,0.5) {$4$}; \node at (7.5,0.5) {$29$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \draw[->,thick] (0.5,-0.9) -- (0.5,-0.1); \draw[->,thick] (2.5,-0.9) -- (2.5,-0.1); \draw[->,thick] (4.5,-0.9) -- (4.5,-0.1); \draw[->,thick] (6.5,-0.9) -- (6.5,-0.1); \draw[->,thick] (1.5,-1.9) -- (1.5,-0.1); \draw[->,thick] (5.5,-1.9) -- (5.5,-0.1); \draw[->,thick] (3.5,-2.9) -- (3.5,-0.1); \draw[->,thick] (7.5,-3.9) -- (7.5,-0.1); \draw (0,-1) -- (1,-1) -- (1,-1.5) -- (0,-1.5) -- (0,-1); \draw[fill=lightgray] (2,-1) -- (3,-1) -- (3,-1.5) -- (2,-1.5) -- (2,-1); \draw (4,-1) -- (5,-1) -- (5,-1.5) -- (4,-1.5) -- (4,-1); \draw (6,-1) -- (7,-1) -- (7,-1.5) -- (6,-1.5) -- (6,-1); \draw (0,-2) -- (2,-2) -- (2,-2.5) -- (0,-2.5) -- (0,-2); \draw (4,-2) -- (6,-2) -- (6,-2.5) -- (4,-2.5) -- (4,-2); \draw[fill=lightgray] (0,-3) -- (4,-3) -- (4,-3.5) -- (0,-3.5) -- (0,-3); \draw[fill=lightgray] (0,-4) -- (8,-4) -- (8,-4.5) -- (0,-4.5) -- (0,-4); \end{tikzpicture} \end{center} However, it turns out that the number of values that need to be updated in the binary indexed tree is only $O(\log n)$. \subsubsection{Implementation} The operations of a binary indexed tree can be implemented in an elegant and efficient way using bit operations. The key fact needed is that $k \& -k$ isolates the last one bit in a number $k$. For example, $6 \& -6=2$ because the number $6$ corresponds to 110 and the number $2$ corresponds to 10. It turns out that when processing a range query, the position $k$ in the binary indexed tree should be decreased by $k \& -k$ at every step, and when updating the array, the position $k$ should be increased by $k \& -k$ at every step. Suppose that the binary indexed tree is stored in an array \texttt{b}. The following function calculates the sum of elements in a range $[1,k]$: \begin{lstlisting} int sum(int k) { int s = 0; while (k >= 1) { s += b[k]; k -= k&-k; } return s; } \end{lstlisting} The following function increases the value of the element at position $k$ by $x$ ($x$ can be positive or negative): \begin{lstlisting} void add(int k, int x) { while (k <= n) { b[k] += x; k += k&-k; } } \end{lstlisting} The time complexity of both the functions is $O(\log n)$, because the functions access $O(\log n)$ values in the binary indexed tree, and each transition to the next position takes $O(1)$ time using bit operations. \section{Segment tree} \index{segment tree} A \key{segment tree} is a data structure that supports two operations: processing a range query and modifying an element in the array. Segment trees can support sum queries, minimum and maximum queries and many other queries so that both operations work in $O(\log n)$ time. Compared to a binary indexed tree, the advantage of a segment tree is that it is a more general data structure. While binary indexed trees only support sum queries, segment trees also support other queries. On the other hand, a segment tree requires more memory and is a bit more difficult to implement. \subsubsection{Structure} A segment tree is a binary tree that contains $2n-1$ nodes. The nodes on the bottom level of the tree correspond to the original array, and the other nodes contain information needed for processing range queries. We will first discuss segment trees that support sum queries. As an example, consider the following array: \begin{center} \begin{tikzpicture}[scale=0.7] \draw (0,0) grid (8,1); \node at (0.5,0.5) {$5$}; \node at (1.5,0.5) {$8$}; \node at (2.5,0.5) {$6$}; \node at (3.5,0.5) {$3$}; \node at (4.5,0.5) {$2$}; \node at (5.5,0.5) {$7$}; \node at (6.5,0.5) {$2$}; \node at (7.5,0.5) {$6$}; % \footnotesize % \node at (0.5,1.4) {$1$}; % \node at (1.5,1.4) {$2$}; % \node at (2.5,1.4) {$3$}; % \node at (3.5,1.4) {$4$}; % \node at (4.5,1.4) {$5$}; % \node at (5.5,1.4) {$6$}; % \node at (6.5,1.4) {$7$}; % \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} The corresponding segment tree is as follows: \begin{center} \begin{tikzpicture}[scale=0.7] \draw (0,0) grid (8,1); \node[anchor=center] at (0.5, 0.5) {5}; \node[anchor=center] at (1.5, 0.5) {8}; \node[anchor=center] at (2.5, 0.5) {6}; \node[anchor=center] at (3.5, 0.5) {3}; \node[anchor=center] at (4.5, 0.5) {2}; \node[anchor=center] at (5.5, 0.5) {7}; \node[anchor=center] at (6.5, 0.5) {2}; \node[anchor=center] at (7.5, 0.5) {6}; \node[draw, circle] (a) at (1,2.5) {13}; \path[draw,thick,-] (a) -- (0.5,1); \path[draw,thick,-] (a) -- (1.5,1); \node[draw, circle,minimum size=22pt] (b) at (3,2.5) {9}; \path[draw,thick,-] (b) -- (2.5,1); \path[draw,thick,-] (b) -- (3.5,1); \node[draw, circle,minimum size=22pt] (c) at (5,2.5) {9}; \path[draw,thick,-] (c) -- (4.5,1); \path[draw,thick,-] (c) -- (5.5,1); \node[draw, circle,minimum size=22pt] (d) at (7,2.5) {8}; \path[draw,thick,-] (d) -- (6.5,1); \path[draw,thick,-] (d) -- (7.5,1); \node[draw, circle] (i) at (2,4.5) {22}; \path[draw,thick,-] (i) -- (a); \path[draw,thick,-] (i) -- (b); \node[draw, circle] (j) at (6,4.5) {17}; \path[draw,thick,-] (j) -- (c); \path[draw,thick,-] (j) -- (d); \node[draw, circle] (m) at (4,6.5) {39}; \path[draw,thick,-] (m) -- (i); \path[draw,thick,-] (m) -- (j); \end{tikzpicture} \end{center} Each internal node in the segment tree contains information about a range of size $2^k$ in the original array. In the above tree, the value of each internal node is the sum of the corresponding array elements, and it can be calculated as the sum of the values of its left and right child node. It is convenient to build a segment tree for an array whose size is a power of two, because in this case every internal node has a left and right child. In the sequel, we will assume that the tree is built like this. If the size of the array is not a power of two, we can always add zero elements to the array. \subsubsection{Range query} The sum of elements in a given range can be calculated as a sum of values in the segment tree. For example, consider the following range: \begin{center} \begin{tikzpicture}[scale=0.7] \fill[color=gray!50] (2,0) rectangle (8,1); \draw (0,0) grid (8,1); \node[anchor=center] at (0.5, 0.5) {5}; \node[anchor=center] at (1.5, 0.5) {8}; \node[anchor=center] at (2.5, 0.5) {6}; \node[anchor=center] at (3.5, 0.5) {3}; \node[anchor=center] at (4.5, 0.5) {2}; \node[anchor=center] at (5.5, 0.5) {7}; \node[anchor=center] at (6.5, 0.5) {2}; \node[anchor=center] at (7.5, 0.5) {6}; % % \footnotesize % \node at (0.5,1.4) {$1$}; % \node at (1.5,1.4) {$2$}; % \node at (2.5,1.4) {$3$}; % \node at (3.5,1.4) {$4$}; % \node at (4.5,1.4) {$5$}; % \node at (5.5,1.4) {$6$}; % \node at (6.5,1.4) {$7$}; % \node at (7.5,1.4) {$8$}; \end{tikzpicture} \end{center} The sum of elements in the range is $6+3+2+7+2+6=26$. The following two nodes in the tree cover the range: \begin{center} \begin{tikzpicture}[scale=0.7] \draw (0,0) grid (8,1); \node[anchor=center] at (0.5, 0.5) {5}; \node[anchor=center] at (1.5, 0.5) {8}; \node[anchor=center] at (2.5, 0.5) {6}; \node[anchor=center] at (3.5, 0.5) {3}; \node[anchor=center] at (4.5, 0.5) {2}; \node[anchor=center] at (5.5, 0.5) {7}; \node[anchor=center] at (6.5, 0.5) {2}; \node[anchor=center] at (7.5, 0.5) {6}; \node[draw, circle] (a) at (1,2.5) {13}; \path[draw,thick,-] (a) -- (0.5,1); \path[draw,thick,-] (a) -- (1.5,1); \node[draw, circle,fill=gray!50,minimum size=22pt] (b) at (3,2.5) {9}; \path[draw,thick,-] (b) -- (2.5,1); \path[draw,thick,-] (b) -- (3.5,1); \node[draw, circle,minimum size=22pt] (c) at (5,2.5) {9}; \path[draw,thick,-] (c) -- (4.5,1); \path[draw,thick,-] (c) -- (5.5,1); \node[draw, circle,minimum size=22pt] (d) at (7,2.5) {8}; \path[draw,thick,-] (d) -- (6.5,1); \path[draw,thick,-] (d) -- (7.5,1); \node[draw, circle] (i) at (2,4.5) {22}; \path[draw,thick,-] (i) -- (a); \path[draw,thick,-] (i) -- (b); \node[draw, circle,fill=gray!50] (j) at (6,4.5) {17}; \path[draw,thick,-] (j) -- (c); \path[draw,thick,-] (j) -- (d); \node[draw, circle] (m) at (4,6.5) {39}; \path[draw,thick,-] (m) -- (i); \path[draw,thick,-] (m) -- (j); \end{tikzpicture} \end{center} Thus, the sum of elements in the range is $9+17=26$. When the sum is calculated using nodes that are located as high as possible in the tree, at most two nodes on each level of the tree are needed. Hence, the total number of nodes examined is only $O(\log n)$. \subsubsection{Array update} When an element in the array changes, we should update all nodes in the tree whose value depends on the element. This can be done by traversing the path from the element to the top node and updating the nodes along the path. \begin{samepage} The following picture shows which nodes in the segment tree change if the element 7 in the array changes. \begin{center} \begin{tikzpicture}[scale=0.7] \fill[color=gray!50] (5,0) rectangle (6,1); \draw (0,0) grid (8,1); \node[anchor=center] at (0.5, 0.5) {5}; \node[anchor=center] at (1.5, 0.5) {8}; \node[anchor=center] at (2.5, 0.5) {6}; \node[anchor=center] at (3.5, 0.5) {3}; \node[anchor=center] at (4.5, 0.5) {2}; \node[anchor=center] at (5.5, 0.5) {7}; \node[anchor=center] at (6.5, 0.5) {2}; \node[anchor=center] at (7.5, 0.5) {6}; \node[draw, circle] (a) at (1,2.5) {13}; \path[draw,thick,-] (a) -- (0.5,1); \path[draw,thick,-] (a) -- (1.5,1); \node[draw, circle,minimum size=22pt] (b) at (3,2.5) {9}; \path[draw,thick,-] (b) -- (2.5,1); \path[draw,thick,-] (b) -- (3.5,1); \node[draw, circle,minimum size=22pt,fill=gray!50] (c) at (5,2.5) {9}; \path[draw,thick,-] (c) -- (4.5,1); \path[draw,thick,-] (c) -- (5.5,1); \node[draw, circle,minimum size=22pt] (d) at (7,2.5) {8}; \path[draw,thick,-] (d) -- (6.5,1); \path[draw,thick,-] (d) -- (7.5,1); \node[draw, circle] (i) at (2,4.5) {22}; \path[draw,thick,-] (i) -- (a); \path[draw,thick,-] (i) -- (b); \node[draw, circle,fill=gray!50] (j) at (6,4.5) {17}; \path[draw,thick,-] (j) -- (c); \path[draw,thick,-] (j) -- (d); \node[draw, circle,fill=gray!50] (m) at (4,6.5) {39}; \path[draw,thick,-] (m) -- (i); \path[draw,thick,-] (m) -- (j); \end{tikzpicture} \end{center} \end{samepage} The path from bottom to top always consists of $O(\log n)$ nodes, so each update changes $O(\log n)$ nodes in the tree. \subsubsection{Storing the tree} A segment tree can be stored in an array of $2N$ elements where $N$ is a power of two. From now on, we will assume that the indices of the original array are between $0$ and $N-1$. The element at position 1 in the array corresponds to the top node of the tree, the elements at positions 2 and 3 correspond to the second level of the tree, and so on. Finally, the elements at positions $N \ldots 2N-1$ correspond to the bottom level of the tree, i.e., the elements of the original array. For example, the segment tree \begin{center} \begin{tikzpicture}[scale=0.7] \draw (0,0) grid (8,1); \node[anchor=center] at (0.5, 0.5) {5}; \node[anchor=center] at (1.5, 0.5) {8}; \node[anchor=center] at (2.5, 0.5) {6}; \node[anchor=center] at (3.5, 0.5) {3}; \node[anchor=center] at (4.5, 0.5) {2}; \node[anchor=center] at (5.5, 0.5) {7}; \node[anchor=center] at (6.5, 0.5) {2}; \node[anchor=center] at (7.5, 0.5) {6}; \node[draw, circle] (a) at (1,2.5) {13}; \path[draw,thick,-] (a) -- (0.5,1); \path[draw,thick,-] (a) -- (1.5,1); \node[draw, circle,minimum size=22pt] (b) at (3,2.5) {9}; \path[draw,thick,-] (b) -- (2.5,1); \path[draw,thick,-] (b) -- (3.5,1); \node[draw, circle,minimum size=22pt] (c) at (5,2.5) {9}; \path[draw,thick,-] (c) -- (4.5,1); \path[draw,thick,-] (c) -- (5.5,1); \node[draw, circle,minimum size=22pt] (d) at (7,2.5) {8}; \path[draw,thick,-] (d) -- (6.5,1); \path[draw,thick,-] (d) -- (7.5,1); \node[draw, circle] (i) at (2,4.5) {22}; \path[draw,thick,-] (i) -- (a); \path[draw,thick,-] (i) -- (b); \node[draw, circle] (j) at (6,4.5) {17}; \path[draw,thick,-] (j) -- (c); \path[draw,thick,-] (j) -- (d); \node[draw, circle] (m) at (4,6.5) {39}; \path[draw,thick,-] (m) -- (i); \path[draw,thick,-] (m) -- (j); \end{tikzpicture} \end{center} can be stored as follows ($N=8$): \begin{center} \begin{tikzpicture}[scale=0.7] %\fill[color=lightgray] (3,0) rectangle (7,1); \draw (0,0) grid (15,1); \node at (0.5,0.5) {$39$}; \node at (1.5,0.5) {$22$}; \node at (2.5,0.5) {$17$}; \node at (3.5,0.5) {$13$}; \node at (4.5,0.5) {$9$}; \node at (5.5,0.5) {$9$}; \node at (6.5,0.5) {$8$}; \node at (7.5,0.5) {$5$}; \node at (8.5,0.5) {$8$}; \node at (9.5,0.5) {$6$}; \node at (10.5,0.5) {$3$}; \node at (11.5,0.5) {$2$}; \node at (12.5,0.5) {$7$}; \node at (13.5,0.5) {$2$}; \node at (14.5,0.5) {$6$}; \footnotesize \node at (0.5,1.4) {$1$}; \node at (1.5,1.4) {$2$}; \node at (2.5,1.4) {$3$}; \node at (3.5,1.4) {$4$}; \node at (4.5,1.4) {$5$}; \node at (5.5,1.4) {$6$}; \node at (6.5,1.4) {$7$}; \node at (7.5,1.4) {$8$}; \node at (8.5,1.4) {$9$}; \node at (9.5,1.4) {$10$}; \node at (10.5,1.4) {$11$}; \node at (11.5,1.4) {$12$}; \node at (12.5,1.4) {$13$}; \node at (13.5,1.4) {$14$}; \node at (14.5,1.4) {$15$}; \end{tikzpicture} \end{center} Using this representation, for a node at position $k$, \begin{itemize} \item the parent node is at position $\lfloor k/2 \rfloor$, \item the left child node is at position $2k$, and \item the right child node is at position $2k+1$. \end{itemize} Note that this implies that the index of a node is even if it is a left child and odd if it is a right child. \subsubsection{Functions} We assume that the segment tree is stored in an array \texttt{p}. The following function calculates the sum of elements in a range $[a,b]$: \begin{lstlisting} int sum(int a, int b) { a += N; b += N; int s = 0; while (a <= b) { if (a%2 == 1) s += p[a++]; if (b%2 == 0) s += p[b--]; a /= 2; b /= 2; } return s; } \end{lstlisting} The function maintains a range in the segment tree array. Initially the range is $[a+N,b+N]$, that corresponds to the range $[a,b]$ in the underlying array. At each step, the function adds the value of the left and right node to the sum if their parent nodes do not belong to the range. After this, the same process continues on the next level of the tree. The following function increases the value of the element at position $k$ by $x$: \begin{lstlisting} void add(int k, int x) { k += N; p[k] += x; for (k /= 2; k >= 1; k /= 2) { p[k] = p[2*k]+p[2*k+1]; } } \end{lstlisting} First the function updates the element at the bottom level of the tree. After this, the function updates the values of all internal nodes in the tree, until it reaches the top node of the tree. Both above functions work in $O(\log n)$ time, because a segment tree of $n$ elements consists of $O(\log n)$ levels, and the operations move one level forward in the tree at each step. \subsubsection{Other queries} A segment tree can support any query where the answer for a range $[a,b]$ can be calculated from the answers for ranges $[a,c]$ and $[c+1,b]$, where $c$ is some index between $a$ and $b$. Examples of such queries are minimum and maximum, greatest common divisor, and bit operations and, or and xor. \begin{samepage} For example, the following segment tree supports minimum queries: \begin{center} \begin{tikzpicture}[scale=0.7] \draw (0,0) grid (8,1); \node[anchor=center] at (0.5, 0.5) {5}; \node[anchor=center] at (1.5, 0.5) {8}; \node[anchor=center] at (2.5, 0.5) {6}; \node[anchor=center] at (3.5, 0.5) {3}; \node[anchor=center] at (4.5, 0.5) {1}; \node[anchor=center] at (5.5, 0.5) {7}; \node[anchor=center] at (6.5, 0.5) {2}; \node[anchor=center] at (7.5, 0.5) {6}; \node[draw, circle,minimum size=22pt] (a) at (1,2.5) {5}; \path[draw,thick,-] (a) -- (0.5,1); \path[draw,thick,-] (a) -- (1.5,1); \node[draw, circle,minimum size=22pt] (b) at (3,2.5) {3}; \path[draw,thick,-] (b) -- (2.5,1); \path[draw,thick,-] (b) -- (3.5,1); \node[draw, circle,minimum size=22pt] (c) at (5,2.5) {1}; \path[draw,thick,-] (c) -- (4.5,1); \path[draw,thick,-] (c) -- (5.5,1); \node[draw, circle,minimum size=22pt] (d) at (7,2.5) {2}; \path[draw,thick,-] (d) -- (6.5,1); \path[draw,thick,-] (d) -- (7.5,1); \node[draw, circle,minimum size=22pt] (i) at (2,4.5) {3}; \path[draw,thick,-] (i) -- (a); \path[draw,thick,-] (i) -- (b); \node[draw, circle,minimum size=22pt] (j) at (6,4.5) {1}; \path[draw,thick,-] (j) -- (c); \path[draw,thick,-] (j) -- (d); \node[draw, circle,minimum size=22pt] (m) at (4,6.5) {1}; \path[draw,thick,-] (m) -- (i); \path[draw,thick,-] (m) -- (j); \end{tikzpicture} \end{center} \end{samepage} In this segment tree, every node in the tree contains the smallest element in the corresponding range of the underlying array. The top node of the tree contains the smallest element in the whole array. The operations can be implemented like previously, but instead of sums, minima are calculated. \subsubsection{Binary search in tree} The structure of the segment tree allows us to use binary search for finding elements in the array. For example, if the tree supports the minimum query, we can find the position of the smallest element in $O(\log n)$ time. For example, in the following tree the smallest element 1 can be found by traversing a path downwards from the top node: \begin{center} \begin{tikzpicture}[scale=0.7] \draw (8,0) grid (16,1); \node[anchor=center] at (8.5, 0.5) {9}; \node[anchor=center] at (9.5, 0.5) {5}; \node[anchor=center] at (10.5, 0.5) {7}; \node[anchor=center] at (11.5, 0.5) {1}; \node[anchor=center] at (12.5, 0.5) {6}; \node[anchor=center] at (13.5, 0.5) {2}; \node[anchor=center] at (14.5, 0.5) {3}; \node[anchor=center] at (15.5, 0.5) {2}; %\node[anchor=center] at (1,2.5) {13}; \node[draw, circle,minimum size=22pt] (e) at (9,2.5) {5}; \path[draw,thick,-] (e) -- (8.5,1); \path[draw,thick,-] (e) -- (9.5,1); \node[draw, circle,minimum size=22pt] (f) at (11,2.5) {1}; \path[draw,thick,-] (f) -- (10.5,1); \path[draw,thick,-] (f) -- (11.5,1); \node[draw, circle,minimum size=22pt] (g) at (13,2.5) {2}; \path[draw,thick,-] (g) -- (12.5,1); \path[draw,thick,-] (g) -- (13.5,1); \node[draw, circle,minimum size=22pt] (h) at (15,2.5) {2}; \path[draw,thick,-] (h) -- (14.5,1); \path[draw,thick,-] (h) -- (15.5,1); \node[draw, circle,minimum size=22pt] (k) at (10,4.5) {1}; \path[draw,thick,-] (k) -- (e); \path[draw,thick,-] (k) -- (f); \node[draw, circle,minimum size=22pt] (l) at (14,4.5) {2}; \path[draw,thick,-] (l) -- (g); \path[draw,thick,-] (l) -- (h); \node[draw, circle,minimum size=22pt] (n) at (12,6.5) {1}; \path[draw,thick,-] (n) -- (k); \path[draw,thick,-] (n) -- (l); \path[draw=red,thick,->,line width=2pt] (n) -- (k); \path[draw=red,thick,->,line width=2pt] (k) -- (f); \path[draw=red,thick,->,line width=2pt] (f) -- (11.5,1); \end{tikzpicture} \end{center} \section{Additional techniques} \subsubsection{Index compression} A limitation in data structures that are built upon an array is that the elements are indexed using integers $1,2,3,$ etc. Difficulties arise when large indices are needed. For example, if we wish to use the index $10^9$, the array should contain $10^9$ elements which is not realistic. \index{index compression} However, we can often bypass this limitation by using \key{index compression}, where the original indices are replaced with the indices $1,2,3,$ etc. This can be done if we know all the indices needed during the algorithm beforehand. The idea is to replace each original index $x$ with $p(x)$ where $p$ is a function that compresses the indices. We require that the order of the indices does not change, so if $a