\chapter{Range queries}
\index{range query}
\index{sum query}
\index{minimum query}
\index{maximum query}
In this chapter, we discuss data structures
that allow us to efficiently process range queries.
In a \key{range query},
our task is to calculate a value
based on a subarray of an array.
Typical range queries are:
\begin{itemize}
\item $\texttt{sum}_q(a,b)$: calculate the sum of values in range $[a,b]$
\item $\texttt{min}_q(a,b)$: find the minimum value in range $[a,b]$
\item $\texttt{max}_q(a,b)$: find the maximum value in range $[a,b]$
\end{itemize}

For example, consider the range $[3,6]$ in the following array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$8$};
\node at (3.5,0.5) {$4$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$3$};
\node at (7.5,0.5) {$4$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
In this case, $\texttt{sum}_q(3,6)=14$,
$\texttt{min}_q(3,6)=1$ and $\texttt{max}_q(3,6)=6$.

A simple way to process range queries is to use
a loop that goes through all array values in the range.
For example, the following function can be
used to process sum queries on an array:
\begin{lstlisting}
// returns the sum of array[a..b] (both ends inclusive)
int sum(int a, int b) {
    int s = 0;
    for (int i = a; i <= b; i++) {
        s += array[i];
    }
    return s;
}
\end{lstlisting}
This function works in $O(n)$ time,
where $n$ is the size of the array.
Thus, we can process $q$ queries in $O(nq)$
time using the function.
However, if both $n$ and $q$ are large, this approach
is slow. Fortunately, it turns out that there are
ways to process range queries much more efficiently.
\section{Static array queries}
We first focus on a situation where
the array is \emph{static}, i.e.,
the array values are never updated between the queries.
In this case, it suffices to construct
a static data structure that tells us
the answer for any possible query.
\subsubsection{Sum queries}
\index{prefix sum array}
We can easily process
sum queries on a static array
by constructing a \key{prefix sum array}.
Each value in the prefix sum array equals
the sum of values in the original array up to that position,
i.e., the value at position $k$ is $\texttt{sum}_q(0,k)$.
The prefix sum array can be constructed in $O(n)$ time.

For example, consider the following array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
The corresponding prefix sum array is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$8$};
\node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$22$};
\node at (5.5,0.5) {$23$};
\node at (6.5,0.5) {$27$};
\node at (7.5,0.5) {$29$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
Since the prefix sum array contains all values
of $\texttt{sum}_q(0,k)$,
we can calculate any value of
$\texttt{sum}_q(a,b)$ in $O(1)$ time as follows:
\[ \texttt{sum}_q(a,b) = \texttt{sum}_q(0,b) - \texttt{sum}_q(0,a-1)\]
By defining $\texttt{sum}_q(0,-1)=0$,
the above formula also holds when $a=0$.

For example, consider the range $[3,6]$:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
In this case, $\texttt{sum}_q(3,6)=8+6+1+4=19$.
This sum can be calculated from
two values of the prefix sum array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (2,0) rectangle (3,1);
\fill[color=lightgray] (6,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$8$};
\node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$22$};
\node at (5.5,0.5) {$23$};
\node at (6.5,0.5) {$27$};
\node at (7.5,0.5) {$29$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
Thus, $\texttt{sum}_q(3,6)=\texttt{sum}_q(0,6)-\texttt{sum}_q(0,2)=27-8=19$.
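The computation above can be sketched in code as follows.
This is a minimal sketch rather than a definitive implementation:
the function names \texttt{buildPrefix} and \texttt{rangeSum}
are illustrative assumptions, and the sums are stored as
\texttt{long long} values only to reduce the risk of overflow.
\begin{lstlisting}
#include <cstddef>
#include <vector>

// Builds the prefix sum array: prefix[k] = sum_q(0,k).
// (Hypothetical helper name, chosen for this sketch.)
std::vector<long long> buildPrefix(const std::vector<int>& values) {
    std::vector<long long> prefix(values.size());
    long long running = 0;
    for (std::size_t k = 0; k < values.size(); k++) {
        running += values[k];
        prefix[k] = running;
    }
    return prefix;
}

// Returns sum_q(a,b) in O(1) time, assuming 0 <= a <= b < n.
long long rangeSum(const std::vector<long long>& prefix, int a, int b) {
    // sum_q(0,-1) is defined as 0, which covers the case a = 0
    long long before = (a == 0) ? 0 : prefix[a - 1];
    return prefix[b] - before;
}
\end{lstlisting}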
It is also possible to generalize this idea
to higher dimensions.
For example, we can construct a two-dimensional
prefix sum array that can be used to calculate
the sum of any rectangular subarray in $O(1)$ time.
Each sum in such an array corresponds to
a subarray
that begins at the upper-left corner of the array.
\begin{samepage}
The following picture illustrates the idea:
\begin{center}
\begin{tikzpicture}[scale=0.54]
\draw[fill=lightgray] (3,2) rectangle (7,5);
\draw (0,0) grid (10,7);
\node[anchor=center] at (6.5, 2.5) {$A$};
\node[anchor=center] at (2.5, 2.5) {$B$};
\node[anchor=center] at (6.5, 5.5) {$C$};
\node[anchor=center] at (2.5, 5.5) {$D$};
\end{tikzpicture}
\end{center}
\end{samepage}
The sum of the gray subarray can be calculated
using the formula
\[S(A) - S(B) - S(C) + S(D),\]
where $S(X)$ denotes the sum of values
in a rectangular
subarray from the upper-left corner
to the position of $X$.
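The two-dimensional formula can be sketched in code in the same spirit.
The sketch below assumes a rectangular grid given as a vector of rows
with the upper-left corner at row 0 and column 0;
the names \texttt{Grid}, \texttt{buildPrefix2D} and \texttt{rectSum}
are hypothetical, and the extra row and column of zeros is
a convenience chosen here to avoid special cases at the border.
The four terms in \texttt{rectSum} correspond to
$S(A)$, $S(B)$, $S(C)$ and $S(D)$, in this order.
\begin{lstlisting}
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<long long>>;

// pre[y+1][x+1] = sum of the rectangle from (0,0) to (y,x);
// row 0 and column 0 of pre contain only zeros.
Grid buildPrefix2D(const std::vector<std::vector<int>>& grid) {
    std::size_t rows = grid.size();
    std::size_t cols = rows ? grid[0].size() : 0;
    Grid pre(rows + 1, std::vector<long long>(cols + 1, 0));
    for (std::size_t y = 0; y < rows; y++) {
        for (std::size_t x = 0; x < cols; x++) {
            pre[y + 1][x + 1] = grid[y][x] + pre[y][x + 1]
                              + pre[y + 1][x] - pre[y][x];
        }
    }
    return pre;
}

// Sum of rows y1..y2 and columns x1..x2 (inclusive, zero-based).
long long rectSum(const Grid& pre, int y1, int x1, int y2, int x2) {
    return pre[y2 + 1][x2 + 1]   // S(A)
         - pre[y2 + 1][x1]       // S(B)
         - pre[y1][x2 + 1]       // S(C)
         + pre[y1][x1];          // S(D)
}
\end{lstlisting}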
\subsubsection{Minimum queries}
\index{sparse table}
Minimum queries are more difficult to process
than sum queries.
Still, there is a fairly simple
$O(n \log n)$ time preprocessing
method after which we can answer any minimum
query in $O(1)$ time\footnote{This technique
was introduced in \cite{ben00} and is sometimes
called the \key{sparse table} method.
There are also more sophisticated techniques \cite{fis06} where
the preprocessing time is only $O(n)$, but such algorithms
are not needed in competitive programming.}.
Note that since minimum and maximum queries can
be processed similarly,
we can focus on minimum queries.

The idea is to precalculate all values of
$\texttt{min}_q(a,b)$ where
$b-a+1$ (the length of the range) is a power of two.
For example, for the array
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
the following values are calculated:
\begin{center}
\begin{tabular}{ccc}
\begin{tabular}{lll}
$a$ & $b$ & $\texttt{min}_q(a,b)$ \\
\hline
0 & 0 & 1 \\
1 & 1 & 3 \\
2 & 2 & 4 \\
3 & 3 & 8 \\
4 & 4 & 6 \\
5 & 5 & 1 \\
6 & 6 & 4 \\
7 & 7 & 2 \\
\end{tabular}
&
\begin{tabular}{lll}
$a$ & $b$ & $\texttt{min}_q(a,b)$ \\
\hline
0 & 1 & 1 \\
1 & 2 & 3 \\
2 & 3 & 4 \\
3 & 4 & 6 \\
4 & 5 & 1 \\
5 & 6 & 1 \\
6 & 7 & 2 \\
\\
\end{tabular}
&
\begin{tabular}{lll}
$a$ & $b$ & $\texttt{min}_q(a,b)$ \\
\hline
0 & 3 & 1 \\
1 & 4 & 3 \\
2 & 5 & 1 \\
3 & 6 & 1 \\
4 & 7 & 1 \\
0 & 7 & 1 \\
\\
\\
\end{tabular}
\end{tabular}
\end{center}
The number of precalculated values is $O(n \log n)$,
because there are $O(\log n)$ range lengths
that are powers of two.
The values can be calculated efficiently
using the recursive formula
\[\texttt{min}_q(a,b) = \min(\texttt{min}_q(a,a+w-1),\texttt{min}_q(a+w,b)),\]
where $b-a+1 \ge 2$ is a power of two and $w=(b-a+1)/2$.
Calculating all those values takes $O(n \log n)$ time.

After this, any value of $\texttt{min}_q(a,b)$ can be calculated
in $O(1)$ time as a minimum of two precalculated values.
Let $k$ be the largest power of two that does not exceed $b-a+1$.
We can calculate the value of $\texttt{min}_q(a,b)$ using the formula
\[\texttt{min}_q(a,b) = \min(\texttt{min}_q(a,a+k-1),\texttt{min}_q(b-k+1,b)).\]
In the above formula, the range $[a,b]$ is represented
as the union of the ranges $[a,a+k-1]$ and $[b-k+1,b]$, both of length $k$.
The two ranges may overlap, which is harmless,
because the minimum does not change if some values are considered twice.

As an example, consider the range $[1,6]$:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (1,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
The length of the range is 6,
and the largest power of two that does
not exceed 6 is 4.
Thus the range $[1,6]$ is
the union of the ranges $[1,4]$ and $[3,6]$:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (1,0) rectangle (5,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
Since $\texttt{min}_q(1,4)=3$ and $\texttt{min}_q(3,6)=1$,
we conclude that $\texttt{min}_q(1,6)=1$.
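The preprocessing and the query can be sketched as follows.
This is only one possible layout, given under stated assumptions:
the struct name \texttt{SparseTable}, the indexing
\texttt{table[j][a]} for the range of length $2^j$ that begins at $a$,
and the auxiliary array \texttt{lg} are all choices made for this sketch.
The array \texttt{lg} stores the exponent of the largest power of two
that does not exceed each length, so that a query needs no loop.
\begin{lstlisting}
#include <algorithm>
#include <vector>

struct SparseTable {
    // table[j][a] = min_q(a, a + 2^j - 1)
    std::vector<std::vector<int>> table;
    // lg[len] = largest j such that 2^j <= len
    std::vector<int> lg;

    explicit SparseTable(const std::vector<int>& values) {
        int n = values.size();
        lg.assign(n + 1, 0);
        for (int len = 2; len <= n; len++) lg[len] = lg[len / 2] + 1;
        table.push_back(values);          // ranges of length 1
        for (int j = 1; (1 << j) <= n; j++) {
            int w = 1 << (j - 1);         // half of the range length
            std::vector<int> row(n - 2 * w + 1);
            for (int a = 0; a + 2 * w <= n; a++) {
                // combine the two halves [a,a+w-1] and [a+w,a+2w-1]
                row[a] = std::min(table[j - 1][a], table[j - 1][a + w]);
            }
            table.push_back(row);
        }
    }

    // Returns min_q(a,b), assuming 0 <= a <= b < n.
    int query(int a, int b) const {
        int j = lg[b - a + 1];
        int k = 1 << j;                   // largest power of two <= length
        return std::min(table[j][a], table[j][b - k + 1]);
    }
};
\end{lstlisting}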
\section{Binary indexed tree}
\index{binary indexed tree}
\index{Fenwick tree}
A \key{binary indexed tree} or a \key{Fenwick tree}\footnote{The
binary indexed tree structure was presented by P. M. Fenwick in 1994 \cite{fen94}.}
can be seen as a dynamic variant of a prefix sum array.
It supports two $O(\log n)$ time operations on an array:
processing a range sum query and updating a value.

The advantage of a binary indexed tree is
that it allows us to efficiently update
array values between sum queries.
This would not be efficient using a prefix sum array,
because after each update, up to $O(n)$ values
of the prefix sum array would have to be recalculated.
\subsubsection{Structure}
Even though the name of the structure is a binary indexed \emph{tree},
it is usually represented as an array.
In this section we assume that all arrays are one-indexed,
because it makes the implementation easier.

Let $p(k)$ denote the largest power of two that
divides $k$.
We store a binary indexed tree as an array \texttt{tree}
such that
\[ \texttt{tree}[k] = \texttt{sum}_q(k-p(k)+1,k),\]
i.e., each position $k$ contains the sum of values
in a range of the original array whose length is $p(k)$
and that ends at position $k$.
For example, since $p(6)=2$, $\texttt{tree}[6]$
contains the value of $\texttt{sum}_q(5,6)$.

As a full example, consider the following array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
The corresponding binary indexed tree is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$7$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$29$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
The following picture shows more clearly
how each value in the binary indexed tree
corresponds to a range in the original array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$7$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$29$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\draw[->,thick] (0.5,-0.9) -- (0.5,-0.1);
\draw[->,thick] (2.5,-0.9) -- (2.5,-0.1);
\draw[->,thick] (4.5,-0.9) -- (4.5,-0.1);
\draw[->,thick] (6.5,-0.9) -- (6.5,-0.1);
\draw[->,thick] (1.5,-1.9) -- (1.5,-0.1);
\draw[->,thick] (5.5,-1.9) -- (5.5,-0.1);
\draw[->,thick] (3.5,-2.9) -- (3.5,-0.1);
\draw[->,thick] (7.5,-3.9) -- (7.5,-0.1);
\draw (0,-1) -- (1,-1) -- (1,-1.5) -- (0,-1.5) -- (0,-1);
\draw (2,-1) -- (3,-1) -- (3,-1.5) -- (2,-1.5) -- (2,-1);
\draw (4,-1) -- (5,-1) -- (5,-1.5) -- (4,-1.5) -- (4,-1);
\draw (6,-1) -- (7,-1) -- (7,-1.5) -- (6,-1.5) -- (6,-1);
\draw (0,-2) -- (2,-2) -- (2,-2.5) -- (0,-2.5) -- (0,-2);
\draw (4,-2) -- (6,-2) -- (6,-2.5) -- (4,-2.5) -- (4,-2);
\draw (0,-3) -- (4,-3) -- (4,-3.5) -- (0,-3.5) -- (0,-3);
\draw (0,-4) -- (8,-4) -- (8,-4.5) -- (0,-4.5) -- (0,-4);
\end{tikzpicture}
\end{center}
Using a binary indexed tree,
any value of $\texttt{sum}_q(1,k)$
can be calculated in $O(\log n)$ time,
because a range $[1,k]$ can always be divided into
$O(\log n)$ ranges whose sums are stored in the tree.

For example, the range $[1,7]$ consists of
the following ranges:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$7$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$29$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\draw[->,thick] (0.5,-0.9) -- (0.5,-0.1);
\draw[->,thick] (2.5,-0.9) -- (2.5,-0.1);
\draw[->,thick] (4.5,-0.9) -- (4.5,-0.1);
\draw[->,thick] (6.5,-0.9) -- (6.5,-0.1);
\draw[->,thick] (1.5,-1.9) -- (1.5,-0.1);
\draw[->,thick] (5.5,-1.9) -- (5.5,-0.1);
\draw[->,thick] (3.5,-2.9) -- (3.5,-0.1);
\draw[->,thick] (7.5,-3.9) -- (7.5,-0.1);
\draw (0,-1) -- (1,-1) -- (1,-1.5) -- (0,-1.5) -- (0,-1);
\draw (2,-1) -- (3,-1) -- (3,-1.5) -- (2,-1.5) -- (2,-1);
\draw (4,-1) -- (5,-1) -- (5,-1.5) -- (4,-1.5) -- (4,-1);
\draw[fill=lightgray] (6,-1) -- (7,-1) -- (7,-1.5) -- (6,-1.5) -- (6,-1);
\draw (0,-2) -- (2,-2) -- (2,-2.5) -- (0,-2.5) -- (0,-2);
\draw[fill=lightgray] (4,-2) -- (6,-2) -- (6,-2.5) -- (4,-2.5) -- (4,-2);
\draw[fill=lightgray] (0,-3) -- (4,-3) -- (4,-3.5) -- (0,-3.5) -- (0,-3);
\draw (0,-4) -- (8,-4) -- (8,-4.5) -- (0,-4.5) -- (0,-4);
\end{tikzpicture}
\end{center}
Thus, we can calculate the corresponding sum as follows:
\[\texttt{sum}_q(1,7)=\texttt{sum}_q(1,4)+\texttt{sum}_q(5,6)+\texttt{sum}_q(7,7)=16+7+4=27\]

To calculate the value of $\texttt{sum}_q(a,b)$ where $a>1$,
we can use the same trick that we used with prefix sum arrays:
\[ \texttt{sum}_q(a,b) = \texttt{sum}_q(1,b) - \texttt{sum}_q(1,a-1).\]
Since we can calculate both $\texttt{sum}_q(1,b)$
and $\texttt{sum}_q(1,a-1)$ in $O(\log n)$ time,
the total time complexity is $O(\log n)$.

After updating a value in the original array,
several values in the binary indexed tree
must be updated as well.
For example, if the value at position 3 changes,
the sums of the following ranges change:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$7$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$29$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\draw[->,thick] (0.5,-0.9) -- (0.5,-0.1);
\draw[->,thick] (2.5,-0.9) -- (2.5,-0.1);
\draw[->,thick] (4.5,-0.9) -- (4.5,-0.1);
\draw[->,thick] (6.5,-0.9) -- (6.5,-0.1);
\draw[->,thick] (1.5,-1.9) -- (1.5,-0.1);
\draw[->,thick] (5.5,-1.9) -- (5.5,-0.1);
\draw[->,thick] (3.5,-2.9) -- (3.5,-0.1);
\draw[->,thick] (7.5,-3.9) -- (7.5,-0.1);
\draw (0,-1) -- (1,-1) -- (1,-1.5) -- (0,-1.5) -- (0,-1);
\draw[fill=lightgray] (2,-1) -- (3,-1) -- (3,-1.5) -- (2,-1.5) -- (2,-1);
\draw (4,-1) -- (5,-1) -- (5,-1.5) -- (4,-1.5) -- (4,-1);
\draw (6,-1) -- (7,-1) -- (7,-1.5) -- (6,-1.5) -- (6,-1);
\draw (0,-2) -- (2,-2) -- (2,-2.5) -- (0,-2.5) -- (0,-2);
\draw (4,-2) -- (6,-2) -- (6,-2.5) -- (4,-2.5) -- (4,-2);
\draw[fill=lightgray] (0,-3) -- (4,-3) -- (4,-3.5) -- (0,-3.5) -- (0,-3);
\draw[fill=lightgray] (0,-4) -- (8,-4) -- (8,-4.5) -- (0,-4.5) -- (0,-4);
\end{tikzpicture}
\end{center}
Since each array element belongs to $O(\log n)$
ranges in the binary indexed tree,
it suffices to update $O(\log n)$ values in the tree.
\subsubsection{Implementation}
The operations of a binary indexed tree can be
efficiently implemented using bit operations.
The key fact needed is that we can
calculate any value of $p(k)$ using the formula
\[p(k) = k \& -k.\]
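As a worked example, assuming that negative integers use
the two's complement representation (shown here with 8 bits for brevity),
the value $p(6)=2$ mentioned earlier can be verified as follows:
\[
\begin{array}{rcl}
6 & = & 00000110_2 \\
-6 & = & 11111010_2 \\
6 \,\&\, -6 & = & 00000010_2 = 2
\end{array}
\]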
The following function calculates the value of $\texttt{sum}_q(1,k)$:
\begin{lstlisting}
int sum(int k) {
    int s = 0;
    while (k >= 1) {
        s += tree[k];
        k -= k&-k; // move to the range that ends just before this one
    }
    return s;
}
\end{lstlisting}
The following function increases the
array value at position $k$ by $x$
($x$ can be positive or negative):
\begin{lstlisting}
void add(int k, int x) {
    while (k <= n) {
        tree[k] += x;
        k += k&-k; // move to the next range that contains the position
    }
}
\end{lstlisting}
The time complexity of both functions is
$O(\log n)$, because the functions access $O(\log n)$
values in the binary indexed tree, and each move
to the next position takes $O(1)$ time.
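The two functions can also be collected into one small structure
together with the range formula
$\texttt{sum}_q(a,b) = \texttt{sum}_q(1,b) - \texttt{sum}_q(1,a-1)$.
The following is a sketch under assumptions, not a definitive implementation:
the struct name \texttt{FenwickTree} and its constructor are hypothetical,
and the constructor simply calls \texttt{add} once for each element,
which takes $O(n \log n)$ time and is chosen here for simplicity.
\begin{lstlisting}
#include <vector>

struct FenwickTree {
    int n;
    std::vector<long long> tree; // one-indexed; tree[0] is unused

    // values[0] is stored at position 1, values[1] at position 2, etc.
    explicit FenwickTree(const std::vector<int>& values)
        : n(values.size()), tree(values.size() + 1, 0) {
        for (int k = 1; k <= n; k++) add(k, values[k - 1]);
    }

    // Increases the value at position k by x.
    void add(int k, long long x) {
        while (k <= n) {
            tree[k] += x;
            k += k & -k;
        }
    }

    // Returns sum_q(1,k); the result is 0 when k = 0.
    long long sum(int k) const {
        long long s = 0;
        while (k >= 1) {
            s += tree[k];
            k -= k & -k;
        }
        return s;
    }

    // Returns sum_q(a,b), assuming 1 <= a <= b <= n.
    long long sum(int a, int b) const {
        return sum(b) - sum(a - 1);
    }
};
\end{lstlisting}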
\section{Segment tree}
\index{segment tree}
A \key{segment tree}\footnote{The bottom-up implementation in this chapter corresponds to
that in \cite{sta06}. Similar structures were used
in the late 1970s to solve geometric problems \cite{ben80}.} is a data structure
that supports two operations:
processing a range query and
updating an array value.
Segment trees can support
sum queries, minimum and maximum queries and many other
queries so that both operations work in $O(\log n)$ time.

Compared to a binary indexed tree,
the advantage of a segment tree is that it is
a more general data structure.
While binary indexed trees only support
sum queries\footnote{In fact, using \emph{two} binary
indexed trees it is possible to support minimum queries \cite{dim15},
but this is more complicated than using a segment tree.},
segment trees also support other queries.
On the other hand, a segment tree requires more
memory and is a bit more difficult to implement.
\subsubsection{Structure}
A segment tree is a binary tree
such that the nodes on the bottom level of the tree
correspond to the array elements,
and the other nodes
contain information needed for processing range queries.

In this section, we assume that the size
of the array is a power of two and zero-based
indexing is used, because it is convenient to build
a segment tree for such an array.
If the size of the array is not a power of two,
we can always append extra elements to it
that do not affect the query results
(for example, zeros for sum queries).

We will first discuss segment trees that support sum queries.
As an example, consider the following array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$5$};
\node at (1.5,0.5) {$8$};
\node at (2.5,0.5) {$6$};
\node at (3.5,0.5) {$3$};
\node at (4.5,0.5) {$2$};
\node at (5.5,0.5) {$7$};
\node at (6.5,0.5) {$2$};
\node at (7.5,0.5) {$6$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
The corresponding segment tree is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {2};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\node[draw, circle] (a) at (1,2.5) {13};
\path[draw,thick,-] (a) -- (0.5,1);
\path[draw,thick,-] (a) -- (1.5,1);
\node[draw, circle,minimum size=22pt] (b) at (3,2.5) {9};
\path[draw,thick,-] (b) -- (2.5,1);
\path[draw,thick,-] (b) -- (3.5,1);
\node[draw, circle,minimum size=22pt] (c) at (5,2.5) {9};
\path[draw,thick,-] (c) -- (4.5,1);
\path[draw,thick,-] (c) -- (5.5,1);
\node[draw, circle,minimum size=22pt] (d) at (7,2.5) {8};
\path[draw,thick,-] (d) -- (6.5,1);
\path[draw,thick,-] (d) -- (7.5,1);
\node[draw, circle] (i) at (2,4.5) {22};
\path[draw,thick,-] (i) -- (a);
\path[draw,thick,-] (i) -- (b);
\node[draw, circle] (j) at (6,4.5) {17};
\path[draw,thick,-] (j) -- (c);
\path[draw,thick,-] (j) -- (d);
\node[draw, circle] (m) at (4,6.5) {39};
\path[draw,thick,-] (m) -- (i);
\path[draw,thick,-] (m) -- (j);
\end{tikzpicture}
\end{center}
Each internal tree node
corresponds to an array range
whose size is a power of two.
In the above tree, the value of each internal
node is the sum of the corresponding array values,
and it can be calculated as the sum of
the values of its left and right child nodes.

It turns out that any range $[a,b]$
can be divided into $O(\log n)$ ranges
whose values are stored in tree nodes.
For example, consider the range $[2,7]$:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=gray!50] (2,0) rectangle (8,1);
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {2};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
Here $\texttt{sum}_q(2,7)=6+3+2+7+2+6=26$.
In this case, the following two tree nodes
correspond to the range:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {2};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\node[draw, circle] (a) at (1,2.5) {13};
\path[draw,thick,-] (a) -- (0.5,1);
\path[draw,thick,-] (a) -- (1.5,1);
\node[draw, circle,fill=gray!50,minimum size=22pt] (b) at (3,2.5) {9};
\path[draw,thick,-] (b) -- (2.5,1);
\path[draw,thick,-] (b) -- (3.5,1);
\node[draw, circle,minimum size=22pt] (c) at (5,2.5) {9};
\path[draw,thick,-] (c) -- (4.5,1);
\path[draw,thick,-] (c) -- (5.5,1);
\node[draw, circle,minimum size=22pt] (d) at (7,2.5) {8};
\path[draw,thick,-] (d) -- (6.5,1);
\path[draw,thick,-] (d) -- (7.5,1);
\node[draw, circle] (i) at (2,4.5) {22};
\path[draw,thick,-] (i) -- (a);
\path[draw,thick,-] (i) -- (b);
\node[draw, circle,fill=gray!50] (j) at (6,4.5) {17};
\path[draw,thick,-] (j) -- (c);
\path[draw,thick,-] (j) -- (d);
\node[draw, circle] (m) at (4,6.5) {39};
\path[draw,thick,-] (m) -- (i);
\path[draw,thick,-] (m) -- (j);
\end{tikzpicture}
\end{center}
Thus, another way to calculate the sum is $9+17=26$.

When the sum is calculated using nodes
located as high as possible in the tree,
at most two nodes on each level
of the tree are needed.
Hence, the total number of nodes
is $O(\log n)$.

After an array update,
we should update all nodes
whose value depends on the updated value.
This can be done by traversing the path
from the updated array element to the top node
and updating the nodes along the path.
2016-12-28 23:54:51 +01:00
2017-05-21 11:34:44 +02:00
The following picture shows which tree nodes
change if the array value 7 changes:
2016-12-28 23:54:51 +01:00
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=gray!50] (5,0) rectangle (6,1);
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {2};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\node[draw, circle] (a) at (1,2.5) {13};
\path[draw,thick,-] (a) -- (0.5,1);
\path[draw,thick,-] (a) -- (1.5,1);
\node[draw, circle,minimum size=22pt] (b) at (3,2.5) {9};
\path[draw,thick,-] (b) -- (2.5,1);
\path[draw,thick,-] (b) -- (3.5,1);
\node[draw, circle,minimum size=22pt,fill=gray!50] (c) at (5,2.5) {9};
\path[draw,thick,-] (c) -- (4.5,1);
\path[draw,thick,-] (c) -- (5.5,1);
\node[draw, circle,minimum size=22pt] (d) at (7,2.5) {8};
\path[draw,thick,-] (d) -- (6.5,1);
\path[draw,thick,-] (d) -- (7.5,1);
\node[draw, circle] (i) at (2,4.5) {22};
\path[draw,thick,-] (i) -- (a);
\path[draw,thick,-] (i) -- (b);
\node[draw, circle,fill=gray!50] (j) at (6,4.5) {17};
\path[draw,thick,-] (j) -- (c);
\path[draw,thick,-] (j) -- (d);
\node[draw, circle,fill=gray!50] (m) at (4,6.5) {39};
\path[draw,thick,-] (m) -- (i);
\path[draw,thick,-] (m) -- (j);
\end{tikzpicture}
\end{center}
The path from bottom to top
always consists of $O(\log n)$ nodes,
so each update changes $O(\log n)$ nodes in the tree.

\subsubsection{Implementation}

We store a segment tree as an array
of $2n$ elements, where $n$ is the size of
the original array and is assumed to be a power of two.
The tree nodes are stored from top to bottom:
$\texttt{tree}[1]$ is the top node,
$\texttt{tree}[2]$ and $\texttt{tree}[3]$
are its children, and so on.
Finally, the values from $\texttt{tree}[n]$
to $\texttt{tree}[2n-1]$ correspond to
the values of the original array
on the bottom level of the tree.
The element $\texttt{tree}[0]$ is not used.

For example, the segment tree
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {2};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\node[draw, circle] (a) at (1,2.5) {13};
\path[draw,thick,-] (a) -- (0.5,1);
\path[draw,thick,-] (a) -- (1.5,1);
\node[draw, circle,minimum size=22pt] (b) at (3,2.5) {9};
\path[draw,thick,-] (b) -- (2.5,1);
\path[draw,thick,-] (b) -- (3.5,1);
\node[draw, circle,minimum size=22pt] (c) at (5,2.5) {9};
\path[draw,thick,-] (c) -- (4.5,1);
\path[draw,thick,-] (c) -- (5.5,1);
\node[draw, circle,minimum size=22pt] (d) at (7,2.5) {8};
\path[draw,thick,-] (d) -- (6.5,1);
\path[draw,thick,-] (d) -- (7.5,1);
\node[draw, circle] (i) at (2,4.5) {22};
\path[draw,thick,-] (i) -- (a);
\path[draw,thick,-] (i) -- (b);
\node[draw, circle] (j) at (6,4.5) {17};
\path[draw,thick,-] (j) -- (c);
\path[draw,thick,-] (j) -- (d);
\node[draw, circle] (m) at (4,6.5) {39};
\path[draw,thick,-] (m) -- (i);
\path[draw,thick,-] (m) -- (j);
\end{tikzpicture}
\end{center}
is stored as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (15,1);
\node at (0.5,0.5) {$39$};
\node at (1.5,0.5) {$22$};
\node at (2.5,0.5) {$17$};
\node at (3.5,0.5) {$13$};
\node at (4.5,0.5) {$9$};
\node at (5.5,0.5) {$9$};
\node at (6.5,0.5) {$8$};
\node at (7.5,0.5) {$5$};
\node at (8.5,0.5) {$8$};
\node at (9.5,0.5) {$6$};
\node at (10.5,0.5) {$3$};
\node at (11.5,0.5) {$2$};
\node at (12.5,0.5) {$7$};
\node at (13.5,0.5) {$2$};
\node at (14.5,0.5) {$6$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\node at (8.5,1.4) {$9$};
\node at (9.5,1.4) {$10$};
\node at (10.5,1.4) {$11$};
\node at (11.5,1.4) {$12$};
\node at (12.5,1.4) {$13$};
\node at (13.5,1.4) {$14$};
\node at (14.5,1.4) {$15$};
\end{tikzpicture}
\end{center}
Using this representation,
the parent of $\texttt{tree}[k]$
is $\texttt{tree}[\lfloor k/2 \rfloor]$,
and its children are $\texttt{tree}[2k]$
and $\texttt{tree}[2k+1]$.
Note that this implies that the position of a node
is even if it is a left child and odd if it is a right child.
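
The array representation also suggests a simple way to construct the tree.
The following is a minimal sketch of one possible construction,
assuming that $n$ is a power of two;
the function name \texttt{build} and the parameter \texttt{values}
are assumptions made for this illustration.
\begin{lstlisting}
#include <vector>
using namespace std;

int n;             // size of the original array (a power of two)
vector<int> tree;  // tree[1..2n-1]; tree[0] is not used

// hypothetical helper: builds a sum segment tree from values
void build(const vector<int>& values) {
    n = (int)values.size();
    tree.assign(2*n, 0);
    // the bottom level contains the original array values
    for (int i = 0; i < n; i++) tree[n+i] = values[i];
    // each internal node is the sum of its two children
    for (int k = n-1; k >= 1; k--) {
        tree[k] = tree[2*k]+tree[2*k+1];
    }
}
\end{lstlisting}
Since the internal nodes are processed from bottom to top,
the children of a node are always ready before the node itself,
and the construction takes $O(n)$ time.
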

The following function
calculates the value of $\texttt{sum}_q(a,b)$:
\begin{lstlisting}
int sum(int a, int b) {
    a += n; b += n;  // move to the bottom level
    int s = 0;
    while (a <= b) {
        // a is a right child: add it and step right
        if (a%2 == 1) s += tree[a++];
        // b is a left child: add it and step left
        if (b%2 == 0) s += tree[b--];
        a /= 2; b /= 2;  // move one level higher
    }
    return s;
}
\end{lstlisting}
The function maintains a range
that is initially $[a+n,b+n]$.
Then, at each step, the range is moved
one level higher in the tree,
and before that, the values of the nodes that do not
belong to the higher range are added to the sum.
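
As a worked example, consider the query $\texttt{sum}_q(2,7)$
in the above tree where $n=8$.
The following trace follows the code step by step:
\begin{itemize}
\item Initially $a=10$ and $b=15$.
Since $a$ is even and $b$ is odd, nothing is added,
and the range becomes $[5,7]$.
\item Now $a=5$ is odd, so $\texttt{tree}[5]=9$ is added
and $a$ becomes 6.
Since $b=7$ is odd, nothing else is added,
and the range becomes $[3,3]$.
\item Now $a=3$ is odd, so $\texttt{tree}[3]=17$ is added
and $a$ becomes 4.
Since $b=3$ is odd, nothing else is added.
After the division, $a=2$ and $b=1$, so the loop ends.
\end{itemize}
The result is $9+17=26$, which uses the same two nodes
that were highlighted in the earlier picture.
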
The following function increases the array value
at position $k$ by $x$:
\begin{lstlisting}
void add(int k, int x) {
    k += n;        // position on the bottom level
    tree[k] += x;
    // recalculate the nodes on the path to the top node
    for (k /= 2; k >= 1; k /= 2) {
        tree[k] = tree[2*k]+tree[2*k+1];
    }
}
\end{lstlisting}
First the function updates the value
at the bottom level of the tree.
After this, the function updates the values of the
internal tree nodes on the path upwards, until it reaches
the top node of the tree.

Both the above functions work
in $O(\log n)$ time, because a segment tree
of $n$ elements consists of $O(\log n)$ levels,
and the functions move one level higher
in the tree at each step.

\subsubsection{Other queries}

Segment trees can support all range queries
where it is possible to divide a range into two parts,
calculate the answer separately for both parts
and then efficiently combine the answers.
Examples of such queries are
minimum and maximum, greatest common divisor,
and bit operations and, or and xor.

For example, the following segment tree
supports minimum queries:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {1};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\node[draw, circle,minimum size=22pt] (a) at (1,2.5) {5};
\path[draw,thick,-] (a) -- (0.5,1);
\path[draw,thick,-] (a) -- (1.5,1);
\node[draw, circle,minimum size=22pt] (b) at (3,2.5) {3};
\path[draw,thick,-] (b) -- (2.5,1);
\path[draw,thick,-] (b) -- (3.5,1);
\node[draw, circle,minimum size=22pt] (c) at (5,2.5) {1};
\path[draw,thick,-] (c) -- (4.5,1);
\path[draw,thick,-] (c) -- (5.5,1);
\node[draw, circle,minimum size=22pt] (d) at (7,2.5) {2};
\path[draw,thick,-] (d) -- (6.5,1);
\path[draw,thick,-] (d) -- (7.5,1);
\node[draw, circle,minimum size=22pt] (i) at (2,4.5) {3};
\path[draw,thick,-] (i) -- (a);
\path[draw,thick,-] (i) -- (b);
\node[draw, circle,minimum size=22pt] (j) at (6,4.5) {1};
\path[draw,thick,-] (j) -- (c);
\path[draw,thick,-] (j) -- (d);
\node[draw, circle,minimum size=22pt] (m) at (4,6.5) {1};
\path[draw,thick,-] (m) -- (i);
\path[draw,thick,-] (m) -- (j);
\end{tikzpicture}
\end{center}
In this case, every tree node contains
the smallest value in the corresponding
array range.
The top node of the tree contains the smallest
value in the whole array.
The operations can be implemented as before,
but instead of sums, minima are calculated.
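
For instance, a minimum segment tree could be sketched as follows.
The code mirrors the sum functions above;
the names \texttt{init}, \texttt{setval} and \texttt{minq}
and the use of \texttt{INT\_MAX} as the initial value
are assumptions made for this illustration.
Here the update assigns a new value instead of adding to the old one,
which is only one possible choice.
\begin{lstlisting}
#include <algorithm>
#include <climits>
#include <vector>
using namespace std;

int n;             // assumed to be a power of two
vector<int> tree;  // tree[k] = minimum value in the range of node k

// hypothetical helper: creates a tree where all values are INT_MAX
void init(int size) {
    n = size;
    tree.assign(2*n, INT_MAX);
}

// sets the array value at position k to x
void setval(int k, int x) {
    k += n;
    tree[k] = x;
    for (k /= 2; k >= 1; k /= 2) {
        tree[k] = min(tree[2*k], tree[2*k+1]);
    }
}

// returns the minimum value in range [a,b]
int minq(int a, int b) {
    a += n; b += n;
    int s = INT_MAX;
    while (a <= b) {
        if (a%2 == 1) s = min(s, tree[a++]);
        if (b%2 == 0) s = min(s, tree[b--]);
        a /= 2; b /= 2;
    }
    return s;
}
\end{lstlisting}
The only changes compared to the sum tree are the combining operation
and the initial value of the result.
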

The structure of a segment tree also allows us
to use binary search for locating array elements.
For example, if the tree supports minimum queries,
we can find the position of an element
with the smallest value in $O(\log n)$ time.

In the above tree, an
element with the smallest value 1 can be found
by traversing a path downwards from the top node:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {1};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\node[draw, circle,minimum size=22pt] (a) at (1,2.5) {5};
\path[draw,thick,-] (a) -- (0.5,1);
\path[draw,thick,-] (a) -- (1.5,1);
\node[draw, circle,minimum size=22pt] (b) at (3,2.5) {3};
\path[draw,thick,-] (b) -- (2.5,1);
\path[draw,thick,-] (b) -- (3.5,1);
\node[draw, circle,minimum size=22pt] (c) at (5,2.5) {1};
\path[draw,thick,-] (c) -- (4.5,1);
\path[draw,thick,-] (c) -- (5.5,1);
\node[draw, circle,minimum size=22pt] (d) at (7,2.5) {2};
\path[draw,thick,-] (d) -- (6.5,1);
\path[draw,thick,-] (d) -- (7.5,1);
\node[draw, circle,minimum size=22pt] (i) at (2,4.5) {3};
\path[draw,thick,-] (i) -- (a);
\path[draw,thick,-] (i) -- (b);
\node[draw, circle,minimum size=22pt] (j) at (6,4.5) {1};
\path[draw,thick,-] (j) -- (c);
\path[draw,thick,-] (j) -- (d);
\node[draw, circle,minimum size=22pt] (m) at (4,6.5) {1};
\path[draw,thick,-] (m) -- (i);
\path[draw,thick,-] (m) -- (j);
\path[draw=red,thick,->,line width=2pt] (m) -- (j);
\path[draw=red,thick,->,line width=2pt] (j) -- (c);
\path[draw=red,thick,->,line width=2pt] (c) -- (4.5,1);
\end{tikzpicture}
\end{center}
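
The descent shown above can be sketched as follows,
assuming a minimum segment tree that is stored
in the array \texttt{tree} as described earlier.
The function name \texttt{minpos} is an assumption made for this illustration.
At each node, at least one child has the same value as the node itself,
and the function moves to such a child, preferring the left one.
\begin{lstlisting}
#include <vector>
using namespace std;

int n;             // assumed to be a power of two
vector<int> tree;  // minimum segment tree, tree[1..2n-1]

// returns the position of the leftmost array element
// whose value is the minimum of the whole array
int minpos() {
    int k = 1;
    while (k < n) {
        // the minimum of node k is found in at least one child
        if (tree[2*k] == tree[k]) k = 2*k;
        else k = 2*k+1;
    }
    return k-n;
}
\end{lstlisting}
The loop moves one level lower at each step,
so the function works in $O(\log n)$ time.
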

\section{Additional techniques}

\subsubsection{Index compression}

A limitation in data structures that
are built upon an array is that
the elements are indexed using
consecutive integers.
Difficulties arise when large indices
are needed.
For example, if we wish to use the index $10^9$,
the array would have to contain $10^9$
elements, which would require too much memory.

\index{index compression}
However, we can often bypass this limitation
by using \key{index compression},
where the original indices are replaced
with indices $1,2,3,$ etc.
This can be done if we know all the indices
needed during the algorithm beforehand.

The idea is to replace each original index $x$
with $c(x)$ where $c$ is a function that
compresses the indices.
We require that the order of the indices
does not change, so if $a<b$, then $c(a)<c(b)$.
This allows us to conveniently perform queries
even if the indices are compressed.

For example, if the original indices are
$555$, $10^9$ and $8$, the new indices are:
\[
\begin{array}{lcl}
c(8) & = & 1 \\
c(555) & = & 2 \\
c(10^9) & = & 3 \\
\end{array}
\]
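
One common way to compute such a function is to sort the needed indices,
remove duplicates and then use binary search.
The following is a minimal sketch under that assumption;
the names \texttt{prepare}, \texttt{c} and \texttt{indices}
are chosen for this illustration.
\begin{lstlisting}
#include <algorithm>
#include <vector>
using namespace std;

vector<long long> indices;  // needed original indices, sorted

// hypothetical helper: stores the indices in sorted order
// and removes duplicates
void prepare(vector<long long> v) {
    sort(v.begin(), v.end());
    v.erase(unique(v.begin(), v.end()), v.end());
    indices = v;
}

// returns the compressed index (1, 2, 3, ...) of x;
// x must be one of the indices given to prepare
int c(long long x) {
    auto it = lower_bound(indices.begin(), indices.end(), x);
    return (int)(it-indices.begin())+1;
}
\end{lstlisting}
Because the indices are sorted, the order is preserved,
and each call to \texttt{c} takes logarithmic time
in the number of distinct indices.
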

\subsubsection{Range updates}

So far, we have implemented data structures
that support range queries and updates
of single values.
Let us now consider the opposite situation,
where we should update ranges and
retrieve single values.
We focus on an operation that increases all
elements in a range $[a,b]$ by $x$.

\index{difference array}
Surprisingly, we can use the data structures
presented in this chapter also in this situation.
To do this, we build a \key{difference array}
whose values indicate
the differences between consecutive values
in the original array.
Thus, the original array is the
prefix sum array of the
difference array.
For example, consider the following array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$3$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$1$};
\node at (3.5,0.5) {$1$};
\node at (4.5,0.5) {$1$};
\node at (5.5,0.5) {$5$};
\node at (6.5,0.5) {$2$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
The difference array for the above array is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$3$};
\node at (1.5,0.5) {$0$};
\node at (2.5,0.5) {$-2$};
\node at (3.5,0.5) {$0$};
\node at (4.5,0.5) {$0$};
\node at (5.5,0.5) {$4$};
\node at (6.5,0.5) {$-3$};
\node at (7.5,0.5) {$0$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
For example, the value 2 at position 6 in the original array
corresponds to the sum $3-2+4-3=2$ in the difference array.

The advantage of the difference array is
that we can update a range
in the original array by changing just
two elements in the difference array.
For example, if we want to
increase the original array
values between positions 1 and 4 by 5,
it suffices to increase the
difference array value at position 1 by 5
and decrease the value at position 5 by 5.
The result is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$3$};
\node at (1.5,0.5) {$5$};
\node at (2.5,0.5) {$-2$};
\node at (3.5,0.5) {$0$};
\node at (4.5,0.5) {$0$};
\node at (5.5,0.5) {$-1$};
\node at (6.5,0.5) {$-3$};
\node at (7.5,0.5) {$0$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
More generally, to increase the values
in range $[a,b]$ by $x$,
we increase the value at position $a$ by $x$
and decrease the value at position $b+1$ by $x$.
Thus, we only need to update single values
and process sum queries,
so we can use a binary indexed tree or a segment tree.
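
As a hedged sketch of this idea, the following code stores
the difference array in a sum segment tree like the one in this chapter.
The names \texttt{rangeAdd} and \texttt{value} are assumptions
made for this illustration, and $n$ is assumed to be a power of two.
\begin{lstlisting}
#include <vector>
using namespace std;

int n;             // assumed to be a power of two
vector<int> tree;  // sum segment tree over the difference array

// sum of difference array values in range [a,b]
int sum(int a, int b) {
    a += n; b += n;
    int s = 0;
    while (a <= b) {
        if (a%2 == 1) s += tree[a++];
        if (b%2 == 0) s += tree[b--];
        a /= 2; b /= 2;
    }
    return s;
}

// increases the difference array value at position k by x
void add(int k, int x) {
    k += n;
    tree[k] += x;
    for (k /= 2; k >= 1; k /= 2) {
        tree[k] = tree[2*k]+tree[2*k+1];
    }
}

// hypothetical helper: increases the original array
// values in range [a,b] by x
void rangeAdd(int a, int b, int x) {
    add(a, x);
    // if b is the last position, there is nothing to cancel
    if (b+1 < n) add(b+1, -x);
}

// hypothetical helper: original array value at position k,
// which is a prefix sum of the difference array
int value(int k) {
    return sum(0, k);
}
\end{lstlisting}
Both a range update and a single value retrieval
take $O(\log n)$ time in this sketch.
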

A more difficult problem is to support both
range queries and range updates.
In Chapter 28 we will see that even this is possible.