\chapter{Range queries}
\index{range query}
\index{sum query}
\index{minimum query}
\index{maximum query}
In this chapter, we discuss data structures
that allow us to efficiently answer range queries.
In a \key{range query}, we are given two indices
of an array, and our task is to calculate some
value based on the elements in the range between the indices.
Typical range queries are:
\begin{itemize}
\item \key{sum query}: calculate the sum of elements
\item \key{minimum query}: find the smallest element
\item \key{maximum query}: find the largest element
\end{itemize}

For example, consider the range $[3,6]$ in the following array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$8$};
\node at (3.5,0.5) {$4$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$3$};
\node at (7.5,0.5) {$4$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
In this range, the sum of elements is $4+6+1+3=14$,
the minimum element is 1 and the maximum element is 6.

A simple way to process range queries is to
go through all elements in the range.
For example, the following function \texttt{sum}
calculates the sum of elements in a range
$[a,b]$ of an array $t$:
\begin{lstlisting}
// calculates t[a]+t[a+1]+...+t[b] for the array t
int sum(int a, int b) {
    int s = 0;
    for (int i = a; i <= b; i++) {
        s += t[i];
    }
    return s;
}
\end{lstlisting}
The above function works in $O(n)$ time,
where $n$ is the number of elements in the array.
Thus, we can process $q$ queries in $O(nq)$
time using the function.
However, if both $n$ and $q$ are large, this approach
is slow, and it turns out that there are
ways to process range queries much more efficiently.
\section{Static array queries}
We first focus on a situation where
the array is \key{static}, i.e.,
the elements are never modified between the queries.
In this case, it suffices to construct
a static data structure that tells us
the answer for any possible query.
\subsubsection{Sum queries}
\index{prefix sum array}
We can easily process
sum queries on a static array
using a data structure called
a \key{prefix sum array}.
Each value in such an array equals
the sum of values in the original array
up to and including that position.

For example, consider the following array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
The corresponding prefix sum array is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$8$};
\node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$22$};
\node at (5.5,0.5) {$23$};
\node at (6.5,0.5) {$27$};
\node at (7.5,0.5) {$29$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
Let $\textrm{sum}(a,b)$ denote the sum of elements
in the range $[a,b]$.
Since the prefix sum array contains all values
of $\textrm{sum}(0,k)$,
we can calculate any value of
$\textrm{sum}(a,b)$ in $O(1)$ time, because
\[ \textrm{sum}(a,b) = \textrm{sum}(0,b) - \textrm{sum}(0,a-1).\]
By defining $\textrm{sum}(0,-1)=0$,
the above formula also holds when $a=0$.

For example, consider the range $[3,6]$:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
The sum in the range is $8+6+1+4=19$.
This sum can be calculated using
two values in the prefix sum array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (2,0) rectangle (3,1);
\fill[color=lightgray] (6,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$8$};
\node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$22$};
\node at (5.5,0.5) {$23$};
\node at (6.5,0.5) {$27$};
\node at (7.5,0.5) {$29$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
Thus, the sum in the range $[3,6]$ is $27-8=19$.
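The following C++ code is a minimal sketch of how a prefix sum array could be built and used with the formula above. The function names \texttt{buildPrefix} and \texttt{rangeSum} are our own illustrative choices, and the case $a=0$ is handled by treating $\textrm{sum}(0,-1)$ as 0. For the example array, \texttt{rangeSum(p,3,6)} should return $27-8=19$:
\begin{lstlisting}
#include <cassert>
#include <vector>

// p[k] = sum(0,k), i.e., the sum of t[0], t[1], ..., t[k]
std::vector<long long> buildPrefix(const std::vector<int>& t) {
    std::vector<long long> p(t.size());
    long long s = 0;
    for (int k = 0; k < (int)t.size(); k++) {
        s += t[k];
        p[k] = s;
    }
    return p;
}

// sum(a,b) = sum(0,b) - sum(0,a-1), where sum(0,-1) = 0
long long rangeSum(const std::vector<long long>& p, int a, int b) {
    assert(0 <= a && a <= b && b < (int)p.size());
    return p[b] - (a > 0 ? p[a - 1] : 0);
}
\end{lstlisting}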
It is also possible to generalize this idea
to higher dimensions.
For example, we can construct a two-dimensional
prefix sum array that can be used for calculating
the sum of any rectangular subarray in $O(1)$ time.
Each value in such an array is the sum of a subarray
that begins at the upper-left corner of the array.

\begin{samepage}
The following picture illustrates the idea:
\begin{center}
\begin{tikzpicture}[scale=0.54]
\draw[fill=lightgray] (3,2) rectangle (7,5);
\draw (0,0) grid (10,7);
%\draw[line width=2pt] (3,2) rectangle (7,5);
\node[anchor=center] at (6.5, 2.5) {$A$};
\node[anchor=center] at (2.5, 2.5) {$B$};
\node[anchor=center] at (6.5, 5.5) {$C$};
\node[anchor=center] at (2.5, 5.5) {$D$};
\end{tikzpicture}
\end{center}
\end{samepage}
The sum of the gray subarray can be calculated
using the formula
\[S(A) - S(B) - S(C) + S(D),\]
where $S(X)$ denotes the sum of a rectangular
subarray from the upper-left corner
to the position of $X$.
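A possible C++ sketch of the two-dimensional case is shown below; the names \texttt{build2D} and \texttt{rectSum} are hypothetical. As an implementation choice of our own, the prefix sums are stored with an extra zero row and column, so that \texttt{P[i][j]} is the sum of the subarray whose lower-right corner is at row $i-1$ and column $j-1$, and no special cases are needed at the borders. Each query then evaluates $S(A)-S(B)-S(C)+S(D)$ with four lookups:
\begin{lstlisting}
#include <cassert>
#include <vector>

// P[i][j] is the sum of the subarray whose upper-left corner is (0,0)
// and lower-right corner is (i-1,j-1); row 0 and column 0 are zeros.
std::vector<std::vector<long long>> build2D(
        const std::vector<std::vector<int>>& t) {
    int n = t.size();
    int m = n > 0 ? (int)t[0].size() : 0;
    std::vector<std::vector<long long>> P(
        n + 1, std::vector<long long>(m + 1, 0));
    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= m; j++) {
            P[i][j] = t[i-1][j-1] + P[i-1][j] + P[i][j-1] - P[i-1][j-1];
        }
    }
    return P;
}

// Sum of rows y1..y2 and columns x1..x2 (zero-based, inclusive),
// computed as S(A) - S(B) - S(C) + S(D).
long long rectSum(const std::vector<std::vector<long long>>& P,
                  int y1, int x1, int y2, int x2) {
    assert(0 <= y1 && y1 <= y2 && y2 + 1 < (int)P.size());
    assert(0 <= x1 && x1 <= x2 && x2 + 1 < (int)P[0].size());
    return P[y2+1][x2+1] - P[y2+1][x1] - P[y1][x2+1] + P[y1][x1];
}
\end{lstlisting}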
\subsubsection{Minimum queries}
Next we will see how we can
process range minimum queries in $O(1)$ time
after $O(n \log n)$ time preprocessing using \index{sparse table}
a data structure called a \key{sparse table}\footnote{The
sparse table structure was introduced in \cite{ben00}.
There are also more sophisticated techniques \cite{fis06} where
the preprocessing time of the array is only $O(n)$, but such algorithms
are not needed in competitive programming.}.
Note that minimum and maximum queries can always
be processed using similar techniques,
so it suffices to focus on minimum queries.

Let $\textrm{rmq}(a,b)$ (``range minimum query'')
denote the minimum element in the range $[a,b]$.
The idea is to precalculate all values of $\textrm{rmq}(a,b)$
where $b-a+1$, the length of the range, is a power of two.
For example, for the array
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
the following values will be calculated:
\begin{center}
\begin{tabular}{ccc}
\begin{tabular}{ccc}
$a$ & $b$ & $\textrm{rmq}(a,b)$ \\
\hline
0 & 0 & 1 \\
1 & 1 & 3 \\
2 & 2 & 4 \\
3 & 3 & 8 \\
4 & 4 & 6 \\
5 & 5 & 1 \\
6 & 6 & 4 \\
7 & 7 & 2 \\
\end{tabular}
&
\begin{tabular}{ccc}
$a$ & $b$ & $\textrm{rmq}(a,b)$ \\
\hline
0 & 1 & 1 \\
1 & 2 & 3 \\
2 & 3 & 4 \\
3 & 4 & 6 \\
4 & 5 & 1 \\
5 & 6 & 1 \\
6 & 7 & 2 \\
\\
\end{tabular}
&
\begin{tabular}{ccc}
$a$ & $b$ & $\textrm{rmq}(a,b)$ \\
\hline
0 & 3 & 1 \\
1 & 4 & 3 \\
2 & 5 & 1 \\
3 & 6 & 1 \\
4 & 7 & 1 \\
0 & 7 & 1 \\
\\
\\
\end{tabular}
\end{tabular}
\end{center}
The number of precalculated values is $O(n \log n)$,
because there are $O(\log n)$ range lengths
that are powers of two.
In addition, the values can be calculated efficiently
using the recursive formula
\[\textrm{rmq}(a,b) = \min(\textrm{rmq}(a,a+w-1),\textrm{rmq}(a+w,b)),\]
where $b-a+1$ is a power of two and $w=(b-a+1)/2$.
Calculating all those values takes $O(n \log n)$ time.
After this, any value of $\textrm{rmq}(a,b)$ can be calculated
in $O(1)$ time as a minimum of two precalculated values.
Let $k$ be the largest power of two that does not exceed $b-a+1$.
We can calculate the value of $\textrm{rmq}(a,b)$ using the formula
\[\textrm{rmq}(a,b) = \min(\textrm{rmq}(a,a+k-1),\textrm{rmq}(b-k+1,b)).\]
In the above formula, the range $[a,b]$ is represented
as the union of the ranges $[a,a+k-1]$ and $[b-k+1,b]$, both of length $k$.
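A possible C++ sketch of a sparse table follows. The struct name \texttt{SparseTable} and the precomputed table \texttt{lg} of rounded-down base-2 logarithms are our own assumptions, not part of the text. The entry \texttt{st[j][a]} stores $\textrm{rmq}(a,a+2^j-1)$, the constructor applies the recursive formula with $w=2^{j-1}$, and \texttt{query} combines two overlapping ranges of length $k=2^j$:
\begin{lstlisting}
#include <algorithm>
#include <cassert>
#include <vector>

struct SparseTable {
    std::vector<int> lg;               // lg[len] = floor(log2(len))
    std::vector<std::vector<int>> st;  // st[j][a] = rmq(a, a+2^j-1)

    explicit SparseTable(const std::vector<int>& t) {
        int n = t.size();
        lg.assign(n + 1, 0);
        for (int len = 2; len <= n; len++) lg[len] = lg[len / 2] + 1;
        st.assign(lg[n] + 1, std::vector<int>(n));
        st[0] = t;  // ranges of length 1
        for (int j = 1; j <= lg[n]; j++) {
            int w = 1 << (j - 1);  // half of the range length 2^j
            for (int a = 0; a + 2 * w <= n; a++) {
                // rmq(a,b) = min(rmq(a,a+w-1), rmq(a+w,b))
                st[j][a] = std::min(st[j - 1][a], st[j - 1][a + w]);
            }
        }
    }

    // rmq(a,b) = min(rmq(a,a+k-1), rmq(b-k+1,b)), where k = 2^j is
    // the largest power of two that does not exceed b-a+1
    int query(int a, int b) const {
        assert(0 <= a && a <= b && b < (int)st[0].size());
        int j = lg[b - a + 1];
        return std::min(st[j][a], st[j][b - (1 << j) + 1]);
    }
};
\end{lstlisting}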
As an example, consider the range $[1,6]$:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (1,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
The length of the range is 6,
and the largest power of two that does
not exceed 6 is 4.
Thus the range $[1,6]$ is
the union of the ranges $[1,4]$ and $[3,6]$:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (1,0) rectangle (5,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
Since $\textrm{rmq}(1,4)=3$ and $\textrm{rmq}(3,6)=1$,
we can conclude that $\textrm{rmq}(1,6)=1$.
\section{Binary indexed trees}
\index{binary indexed tree}
\index{Fenwick tree}
A \key{binary indexed tree} or a \key{Fenwick tree}\footnote{The
binary indexed tree structure was presented by P. M. Fenwick in 1994 \cite{fen94}.}
can be seen as a dynamic version of a prefix sum array.
This data structure supports two $O(\log n)$ time operations:
calculating the sum of elements in a range
and modifying the value of an element.
The advantage of a binary indexed tree is
that it allows us to efficiently \emph{update}
array elements between sum queries.
This would not be possible using a prefix sum array,
because after each update, it would be necessary to build the
whole prefix sum array again in $O(n)$ time.
\subsubsection{Structure}
In this section we assume that one-based indexing
is used, because it makes the implementation easier.
A binary indexed tree can be represented as an array
whose value at position $x$
equals the sum of elements in the range $[x-k+1,x]$
of the original array,
where $k$ is the largest power of two that divides $x$.
For example, if $x=6$, then $k=2$, because 2 divides 6
but 4 does not divide 6.

\begin{samepage}
As an example, consider the following array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
\end{samepage}
\begin{samepage}
The corresponding binary indexed tree is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$7$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$29$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
\end{samepage}
For example, the value at position 6
in the binary indexed tree is 7,
because the sum of elements in the range $[5,6]$
of the array is $6+1=7$.
The following picture shows more clearly
how each value in the binary indexed tree
corresponds to a range in the array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$7$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$29$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\draw[->,thick] (0.5,-0.9) -- (0.5,-0.1);
\draw[->,thick] (2.5,-0.9) -- (2.5,-0.1);
\draw[->,thick] (4.5,-0.9) -- (4.5,-0.1);
\draw[->,thick] (6.5,-0.9) -- (6.5,-0.1);
\draw[->,thick] (1.5,-1.9) -- (1.5,-0.1);
\draw[->,thick] (5.5,-1.9) -- (5.5,-0.1);
\draw[->,thick] (3.5,-2.9) -- (3.5,-0.1);
\draw[->,thick] (7.5,-3.9) -- (7.5,-0.1);
\draw (0,-1) -- (1,-1) -- (1,-1.5) -- (0,-1.5) -- (0,-1);
\draw (2,-1) -- (3,-1) -- (3,-1.5) -- (2,-1.5) -- (2,-1);
\draw (4,-1) -- (5,-1) -- (5,-1.5) -- (4,-1.5) -- (4,-1);
\draw (6,-1) -- (7,-1) -- (7,-1.5) -- (6,-1.5) -- (6,-1);
\draw (0,-2) -- (2,-2) -- (2,-2.5) -- (0,-2.5) -- (0,-2);
\draw (4,-2) -- (6,-2) -- (6,-2.5) -- (4,-2.5) -- (4,-2);
\draw (0,-3) -- (4,-3) -- (4,-3.5) -- (0,-3.5) -- (0,-3);
\draw (0,-4) -- (8,-4) -- (8,-4.5) -- (0,-4.5) -- (0,-4);
\end{tikzpicture}
\end{center}
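To make the definition concrete, the following C++ sketch (the function \texttt{buildByDefinition} is hypothetical) computes each value of the binary indexed tree directly: for each position $x$ it finds the largest power of two $k$ that divides $x$ and sums the range $[x-k+1,x]$. For the example array, it should produce the values $1,4,4,16,6,7,4,29$ shown above. Since each element belongs to $O(\log n)$ ranges, this takes $O(n \log n)$ time; it is meant only as an illustration of the structure:
\begin{lstlisting}
#include <cassert>
#include <vector>

// Builds a binary indexed tree directly from the definition.
// One-based indexing: t[1..n] is the array and t[0] is unused.
std::vector<long long> buildByDefinition(const std::vector<int>& t) {
    assert(!t.empty());
    int n = (int)t.size() - 1;
    std::vector<long long> b(n + 1, 0);
    for (int x = 1; x <= n; x++) {
        int k = 1;
        while (x % (2 * k) == 0) k *= 2;  // largest power of two dividing x
        for (int i = x - k + 1; i <= x; i++) b[x] += t[i];  // range [x-k+1,x]
    }
    return b;
}
\end{lstlisting}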
\subsubsection{Sum queries}
The values in a binary indexed tree
can be used to efficiently calculate
the sum of array elements in any range $[1,k]$,
because such a range
can be divided into $O(\log n)$ ranges
whose sums are available in the binary indexed tree.

For example, the range $[1,7]$ corresponds to
the following values:
\begin{center}
\begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$7$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$29$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\draw[->,thick] (0.5,-0.9) -- (0.5,-0.1);
\draw[->,thick] (2.5,-0.9) -- (2.5,-0.1);
\draw[->,thick] (4.5,-0.9) -- (4.5,-0.1);
\draw[->,thick] (6.5,-0.9) -- (6.5,-0.1);
\draw[->,thick] (1.5,-1.9) -- (1.5,-0.1);
\draw[->,thick] (5.5,-1.9) -- (5.5,-0.1);
\draw[->,thick] (3.5,-2.9) -- (3.5,-0.1);
\draw[->,thick] (7.5,-3.9) -- (7.5,-0.1);
\draw (0,-1) -- (1,-1) -- (1,-1.5) -- (0,-1.5) -- (0,-1);
\draw (2,-1) -- (3,-1) -- (3,-1.5) -- (2,-1.5) -- (2,-1);
\draw (4,-1) -- (5,-1) -- (5,-1.5) -- (4,-1.5) -- (4,-1);
\draw[fill=lightgray] (6,-1) -- (7,-1) -- (7,-1.5) -- (6,-1.5) -- (6,-1);
\draw (0,-2) -- (2,-2) -- (2,-2.5) -- (0,-2.5) -- (0,-2);
\draw[fill=lightgray] (4,-2) -- (6,-2) -- (6,-2.5) -- (4,-2.5) -- (4,-2);
\draw[fill=lightgray] (0,-3) -- (4,-3) -- (4,-3.5) -- (0,-3.5) -- (0,-3);
\draw (0,-4) -- (8,-4) -- (8,-4.5) -- (0,-4.5) -- (0,-4);
\end{tikzpicture}
\end{center}
Hence, the sum of elements in the range $[1,7]$ is $16+7+4=27$.

To calculate the sum of elements in any range $[a,b]$,
we can use the same trick that we used with prefix sum arrays:
\[ \textrm{sum}(a,b) = \textrm{sum}(1,b) - \textrm{sum}(1,a-1).\]
Since both $\textrm{sum}(1,b)$ and $\textrm{sum}(1,a-1)$
can be calculated using $O(\log n)$ values,
only $O(\log n)$ values are needed also in this case.
\subsubsection{Array updates}
When a value in the array changes,
several values in the binary indexed tree should be updated.
For example, if the element at position 3 changes,
the sums of the following ranges change:
\begin{center}
\begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$7$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$29$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\draw[->,thick] (0.5,-0.9) -- (0.5,-0.1);
\draw[->,thick] (2.5,-0.9) -- (2.5,-0.1);
\draw[->,thick] (4.5,-0.9) -- (4.5,-0.1);
\draw[->,thick] (6.5,-0.9) -- (6.5,-0.1);
\draw[->,thick] (1.5,-1.9) -- (1.5,-0.1);
\draw[->,thick] (5.5,-1.9) -- (5.5,-0.1);
\draw[->,thick] (3.5,-2.9) -- (3.5,-0.1);
\draw[->,thick] (7.5,-3.9) -- (7.5,-0.1);
\draw (0,-1) -- (1,-1) -- (1,-1.5) -- (0,-1.5) -- (0,-1);
\draw[fill=lightgray] (2,-1) -- (3,-1) -- (3,-1.5) -- (2,-1.5) -- (2,-1);
\draw (4,-1) -- (5,-1) -- (5,-1.5) -- (4,-1.5) -- (4,-1);
\draw (6,-1) -- (7,-1) -- (7,-1.5) -- (6,-1.5) -- (6,-1);
\draw (0,-2) -- (2,-2) -- (2,-2.5) -- (0,-2.5) -- (0,-2);
\draw (4,-2) -- (6,-2) -- (6,-2.5) -- (4,-2.5) -- (4,-2);
\draw[fill=lightgray] (0,-3) -- (4,-3) -- (4,-3.5) -- (0,-3.5) -- (0,-3);
\draw[fill=lightgray] (0,-4) -- (8,-4) -- (8,-4.5) -- (0,-4.5) -- (0,-4);
\end{tikzpicture}
\end{center}
Since each array element belongs to $O(\log n)$
ranges in the binary indexed tree,
it suffices to update $O(\log n)$ values.
\subsubsection{Implementation}
The operations of a binary indexed tree can be implemented
in an elegant and efficient way using bit operations.
The key fact needed is that $k \& -k$
isolates the last one bit (the least significant set bit) of a number $k$.
For example, $26 \& -26=2$ because in binary,
the number $26$ is 11010 and the number $2$ is 10.
Thus $k \& -k$ is the largest power of two that divides $k$,
so the value at position $k$ in the binary indexed tree
is the sum of elements in the range $[k-(k \& -k)+1,k]$.

It turns out that when processing a sum query,
the position $k$ in the binary indexed tree needs to be
decreased by $k \& -k$ at every step,
and when updating the array,
the position $k$ needs to be increased by $k \& -k$ at every step.
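The following C++ sketch (with hypothetical helper names \texttt{sumPositions} and \texttt{updatePositions}) lists the positions that these two loops visit. Assuming $n=8$, it should give the positions $7,6,4$ for the range $[1,7]$ and the positions $3,4,8$ for an update at position 3, which match the highlighted ranges in the pictures above:
\begin{lstlisting}
#include <cassert>
#include <vector>

// Positions whose values are added when calculating sum(1,k).
std::vector<int> sumPositions(int k) {
    std::vector<int> pos;
    while (k >= 1) {
        pos.push_back(k);
        k -= k & -k;  // remove the last one bit
    }
    return pos;
}

// Positions that change when the element at position k changes
// (n is the size of the array, positions are 1..n).
std::vector<int> updatePositions(int k, int n) {
    assert(k >= 1);  // k = 0 would never advance, since 0 & -0 = 0
    std::vector<int> pos;
    while (k <= n) {
        pos.push_back(k);
        k += k & -k;  // add the last one bit
    }
    return pos;
}
\end{lstlisting}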
Suppose that the binary indexed tree is stored in an array \texttt{b}.
The following function calculates
the sum of elements in a range $[1,k]$:
\begin{lstlisting}
int sum(int k) {
    int s = 0;
    while (k >= 1) {
        s += b[k];
        k -= k&-k; // b[k] covered [k-(k&-k)+1,k], continue to the left
    }
    return s;
}
\end{lstlisting}
The following function increases the value
of the element at position $k$ by $x$
($x$ can be positive or negative):
\begin{lstlisting}
void add(int k, int x) {
    while (k <= n) {
        b[k] += x;
        k += k&-k; // next position whose range contains the element
    }
}
\end{lstlisting}
The time complexity of both functions is
$O(\log n)$, because the functions access $O(\log n)$
values in the binary indexed tree, and each move
to the next position
takes $O(1)$ time using bit operations.
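The functions above rely on a global array \texttt{b} and a size \texttt{n}. As a self-contained sketch, they could be wrapped in a struct (the name \texttt{BIT} and its members are our own choices) that also answers $\textrm{sum}(a,b)$ queries using $\textrm{sum}(1,b)-\textrm{sum}(1,a-1)$. Here the tree is initialized by calling \texttt{add} once for each element, which takes $O(n \log n)$ time:
\begin{lstlisting}
#include <cassert>
#include <vector>

// A binary indexed tree for an array whose positions are 1..n.
struct BIT {
    int n;
    std::vector<long long> tree;  // tree[0] is unused

    explicit BIT(int n) : n(n), tree(n + 1, 0) {}

    // sum of elements in the range [1,k]
    long long sum(int k) const {
        long long s = 0;
        while (k >= 1) {
            s += tree[k];
            k -= k & -k;
        }
        return s;
    }

    // sum(a,b) = sum(1,b) - sum(1,a-1)
    long long sum(int a, int b) const { return sum(b) - sum(a - 1); }

    // increases the element at position k by x
    void add(int k, long long x) {
        assert(k >= 1);
        while (k <= n) {
            tree[k] += x;
            k += k & -k;
        }
    }
};
\end{lstlisting}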
\section{Segment trees}
\index{segment tree}
A \key{segment tree}\footnote{Quite similar structures were used
in the late 1970s to solve geometric problems \cite{ben80}.
The bottom-up implementation in this chapter corresponds to
that in \cite{sta06}.} is a data structure
that supports two operations:
processing a range query and
modifying an element in the array.
Segment trees can support
sum queries, minimum and maximum queries and many other
queries so that both operations work in $O(\log n)$ time.
Compared to a binary indexed tree,
the advantage of a segment tree is that it is
a more general data structure.
While binary indexed trees only support
sum queries\footnote{In fact, using \emph{two} binary
indexed trees it is possible to support minimum queries \cite{dim15},
but this is more complicated than using a segment tree.},
segment trees also support other queries.
On the other hand, a segment tree requires more
memory and is a bit more difficult to implement.
\subsubsection{Structure}
A segment tree is a binary tree
such that the nodes on the bottom level of the tree
correspond to the array elements,
and the other nodes
contain information needed for processing range queries.

Throughout the section, we assume that the size
of the array is a power of two and zero-based
indexing is used, because it is convenient to build
a segment tree for such an array.
If the size of the array is not a power of two,
we can always append extra elements to it
(for sum queries, zeros).
We will first discuss segment trees that support sum queries.
As an example, consider the following array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$5$};
\node at (1.5,0.5) {$8$};
\node at (2.5,0.5) {$6$};
\node at (3.5,0.5) {$3$};
\node at (4.5,0.5) {$2$};
\node at (5.5,0.5) {$7$};
\node at (6.5,0.5) {$2$};
\node at (7.5,0.5) {$6$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\end{tikzpicture}
\end{center}
The corresponding segment tree is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {2};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\node[draw, circle] (a) at (1,2.5) {13};
\path[draw,thick,-] (a) -- (0.5,1);
\path[draw,thick,-] (a) -- (1.5,1);
\node[draw, circle,minimum size=22pt] (b) at (3,2.5) {9};
\path[draw,thick,-] (b) -- (2.5,1);
\path[draw,thick,-] (b) -- (3.5,1);
\node[draw, circle,minimum size=22pt] (c) at (5,2.5) {9};
\path[draw,thick,-] (c) -- (4.5,1);
\path[draw,thick,-] (c) -- (5.5,1);
\node[draw, circle,minimum size=22pt] (d) at (7,2.5) {8};
\path[draw,thick,-] (d) -- (6.5,1);
\path[draw,thick,-] (d) -- (7.5,1);
\node[draw, circle] (i) at (2,4.5) {22};
\path[draw,thick,-] (i) -- (a);
\path[draw,thick,-] (i) -- (b);
\node[draw, circle] (j) at (6,4.5) {17};
\path[draw,thick,-] (j) -- (c);
\path[draw,thick,-] (j) -- (d);
\node[draw, circle] (m) at (4,6.5) {39};
\path[draw,thick,-] (m) -- (i);
\path[draw,thick,-] (m) -- (j);
\end{tikzpicture}
\end{center}
Each internal node in the segment tree contains
information about a range of size $2^k$
in the original array.
In the above tree, the value of each internal
node is the sum of the corresponding array elements,
and it can be calculated as the sum of
the values of its left and right child node.
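One common way to store such a tree, assumed in the following C++ sketch, is an array \texttt{tree} of size $2n$ where node 1 is the root, the children of node $k$ are nodes $2k$ and $2k+1$, and the array elements are stored at positions $n,\ldots,2n-1$. The function name \texttt{buildSegTree} is hypothetical. For the example array, the root should get the value 39 and its children the values 22 and 17:
\begin{lstlisting}
#include <cassert>
#include <vector>

// Builds a sum segment tree for an array t whose size n is a power of two.
// Node 1 is the root, the children of node k are 2k and 2k+1,
// and the array elements are stored at positions n..2n-1.
std::vector<long long> buildSegTree(const std::vector<int>& t) {
    int n = t.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    std::vector<long long> tree(2 * n, 0);
    for (int i = 0; i < n; i++) tree[n + i] = t[i];
    for (int k = n - 1; k >= 1; k--) {
        tree[k] = tree[2 * k] + tree[2 * k + 1];  // left child + right child
    }
    return tree;
}
\end{lstlisting}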
\subsubsection{Range queries}
The sum of elements in a given range
can be calculated as a sum of values in the segment tree.
For example, consider the range $[2,7]$:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=gray!50] (2,0) rectangle (8,1);
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {2};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
%
% \footnotesize
% \node at (0.5,1.4) {$1$};
% \node at (1.5,1.4) {$2$};
% \node at (2.5,1.4) {$3$};
% \node at (3.5,1.4) {$4$};
% \node at (4.5,1.4) {$5$};
% \node at (5.5,1.4) {$6$};
% \node at (6.5,1.4) {$7$};
% \node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
The sum of elements in the range is
$6+3+2+7+2+6=26$.
The following two nodes in the tree
correspond to the range:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {2};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\node[draw, circle] (a) at (1,2.5) {13};
\path[draw,thick,-] (a) -- (0.5,1);
\path[draw,thick,-] (a) -- (1.5,1);
\node[draw, circle,fill=gray!50,minimum size=22pt] (b) at (3,2.5) {9};
\path[draw,thick,-] (b) -- (2.5,1);
\path[draw,thick,-] (b) -- (3.5,1);
\node[draw, circle,minimum size=22pt] (c) at (5,2.5) {9};
\path[draw,thick,-] (c) -- (4.5,1);
\path[draw,thick,-] (c) -- (5.5,1);
\node[draw, circle,minimum size=22pt] (d) at (7,2.5) {8};
\path[draw,thick,-] (d) -- (6.5,1);
\path[draw,thick,-] (d) -- (7.5,1);
\node[draw, circle] (i) at (2,4.5) {22};
\path[draw,thick,-] (i) -- (a);
\path[draw,thick,-] (i) -- (b);
\node[draw, circle,fill=gray!50] (j) at (6,4.5) {17};
\path[draw,thick,-] (j) -- (c);
\path[draw,thick,-] (j) -- (d);
\node[draw, circle] (m) at (4,6.5) {39};
\path[draw,thick,-] (m) -- (i);
\path[draw,thick,-] (m) -- (j);
\end{tikzpicture}
\end{center}
Thus, the sum of elements in the range is $9+17=26$.

When the sum is calculated using nodes
that are located as high as possible in the tree,
at most two nodes on each level
of the tree are needed.
Hence, since the tree has $O(\log n)$ levels,
the total number of nodes is only $O(\log n)$.
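The following sketch finds the highest nodes whose ranges together form
a given range $[a,b]$. It uses a hypothetical level-by-level representation,
assuming that the array size is a power of two: the node with index $j$ on
level $\ell$ covers the positions $j \cdot 2^\ell \ldots (j+1) \cdot 2^\ell-1$.
The recursion starts from the top node, takes a node as a whole if its range
lies inside $[a,b]$, and otherwise splits it into its two children.
The names \texttt{build}, \texttt{levels} and \texttt{collect} are assumptions.

\begin{lstlisting}
#include <vector>
using namespace std;

// levels[0] is the array (size assumed to be a power of two) and
// levels[i+1][j] = levels[i][2j]+levels[i][2j+1].
vector<vector<int>> levels;

void build(const vector<int>& t) {
    levels.assign(1, t);
    while (levels.back().size() > 1) {
        vector<int> up(levels.back().size()/2);
        for (size_t j = 0; j < up.size(); j++) {
            up[j] = levels.back()[2*j]+levels.back()[2*j+1];
        }
        levels.push_back(up);
    }
}

// Appends to used the values of the highest nodes whose ranges
// together form [a,b]. Node j on level lvl covers the positions
// j*2^lvl ... (j+1)*2^lvl-1.
void collect(int lvl, int j, int a, int b, vector<int>& used) {
    int x = j << lvl, y = ((j+1) << lvl)-1;
    if (b < x || y < a) return;         // no overlap with [a,b]
    if (a <= x && y <= b) {             // node lies inside [a,b]
        used.push_back(levels[lvl][j]);
        return;
    }
    collect(lvl-1, 2*j, a, b, used);    // partial overlap:
    collect(lvl-1, 2*j+1, a, b, used);  // split into children
}
\end{lstlisting}

For the example array and the range $[2,7]$, a call starting from the top node
(level 3, index 0) should collect the values 9 and 17, as in the picture above.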

\subsubsection{Array updates}

When an element in the array changes,
we should update all nodes in the tree
whose value depends on the element.
This can be done by traversing the path
from the element to the top node
and updating the nodes along the path.

\begin{samepage}
The following picture shows which nodes in the segment tree
change if the element 7 in the array changes.
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=gray!50] (5,0) rectangle (6,1);
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {2};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\node[draw, circle] (a) at (1,2.5) {13};
\path[draw,thick,-] (a) -- (0.5,1);
\path[draw,thick,-] (a) -- (1.5,1);
\node[draw, circle,minimum size=22pt] (b) at (3,2.5) {9};
\path[draw,thick,-] (b) -- (2.5,1);
\path[draw,thick,-] (b) -- (3.5,1);
\node[draw, circle,minimum size=22pt,fill=gray!50] (c) at (5,2.5) {9};
\path[draw,thick,-] (c) -- (4.5,1);
\path[draw,thick,-] (c) -- (5.5,1);
\node[draw, circle,minimum size=22pt] (d) at (7,2.5) {8};
\path[draw,thick,-] (d) -- (6.5,1);
\path[draw,thick,-] (d) -- (7.5,1);
\node[draw, circle] (i) at (2,4.5) {22};
\path[draw,thick,-] (i) -- (a);
\path[draw,thick,-] (i) -- (b);
\node[draw, circle,fill=gray!50] (j) at (6,4.5) {17};
\path[draw,thick,-] (j) -- (c);
\path[draw,thick,-] (j) -- (d);
\node[draw, circle,fill=gray!50] (m) at (4,6.5) {39};
\path[draw,thick,-] (m) -- (i);
\path[draw,thick,-] (m) -- (j);
\end{tikzpicture}
\end{center}
\end{samepage}
The path from the bottom to the top of the tree
contains one node on each level, i.e., $O(\log n)$ nodes,
so each update changes $O(\log n)$ nodes in the tree.
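In a hypothetical level-by-level representation (assuming that the array size
is a power of two), the path is easy to follow: the node on level $\ell$ that
contains position $k$ has index $\lfloor k/2^\ell \rfloor$. Since every node
stores a sum, increasing an element by $x$ simply increases each node on the
path by $x$. The names \texttt{build}, \texttt{levels} and \texttt{increase}
are assumptions for this sketch.

\begin{lstlisting}
#include <vector>
using namespace std;

// levels[0] is the array (size assumed to be a power of two) and
// levels[i+1][j] = levels[i][2j]+levels[i][2j+1].
vector<vector<int>> levels;

void build(const vector<int>& t) {
    levels.assign(1, t);
    while (levels.back().size() > 1) {
        vector<int> up(levels.back().size()/2);
        for (size_t j = 0; j < up.size(); j++) {
            up[j] = levels.back()[2*j]+levels.back()[2*j+1];
        }
        levels.push_back(up);
    }
}

// Increases the element at position k by x. On level lvl, the node
// containing position k has index k >> lvl, so exactly one node per
// level (the path from the leaf to the top node) changes.
void increase(int k, int x) {
    for (size_t lvl = 0; lvl < levels.size(); lvl++) {
        levels[lvl][k >> lvl] += x;
    }
}
\end{lstlisting}

For example, increasing the element 7 by 3 should change the highlighted
nodes 9, 17 and 39 to 12, 20 and 42, while all other nodes stay the same.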

\subsubsection{Storing the tree}

A segment tree can be stored in an array
of $2N$ elements, where $N$ is a power of two
and position 0 of the array is not used.
Such a tree corresponds to an original array
of $N$ elements, indexed from $0$ to $N-1$.

In the segment tree array,
the element at position 1
corresponds to the top node of the tree,
the elements at positions 2 and 3 correspond to
the second level of the tree, and so on.
Finally, the elements at positions $N \ldots 2N-1$
correspond to the bottom level of the tree, i.e.,
the elements of the original array.

For example, the segment tree
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {2};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\node[draw, circle] (a) at (1,2.5) {13};
\path[draw,thick,-] (a) -- (0.5,1);
\path[draw,thick,-] (a) -- (1.5,1);
\node[draw, circle,minimum size=22pt] (b) at (3,2.5) {9};
\path[draw,thick,-] (b) -- (2.5,1);
\path[draw,thick,-] (b) -- (3.5,1);
\node[draw, circle,minimum size=22pt] (c) at (5,2.5) {9};
\path[draw,thick,-] (c) -- (4.5,1);
\path[draw,thick,-] (c) -- (5.5,1);
\node[draw, circle,minimum size=22pt] (d) at (7,2.5) {8};
\path[draw,thick,-] (d) -- (6.5,1);
\path[draw,thick,-] (d) -- (7.5,1);
\node[draw, circle] (i) at (2,4.5) {22};
\path[draw,thick,-] (i) -- (a);
\path[draw,thick,-] (i) -- (b);
\node[draw, circle] (j) at (6,4.5) {17};
\path[draw,thick,-] (j) -- (c);
\path[draw,thick,-] (j) -- (d);
\node[draw, circle] (m) at (4,6.5) {39};
\path[draw,thick,-] (m) -- (i);
\path[draw,thick,-] (m) -- (j);
\end{tikzpicture}
\end{center}
can be stored as follows ($N=8$):
\begin{center}
\begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (15,1);
\node at (0.5,0.5) {$39$};
\node at (1.5,0.5) {$22$};
\node at (2.5,0.5) {$17$};
\node at (3.5,0.5) {$13$};
\node at (4.5,0.5) {$9$};
\node at (5.5,0.5) {$9$};
\node at (6.5,0.5) {$8$};
\node at (7.5,0.5) {$5$};
\node at (8.5,0.5) {$8$};
\node at (9.5,0.5) {$6$};
\node at (10.5,0.5) {$3$};
\node at (11.5,0.5) {$2$};
\node at (12.5,0.5) {$7$};
\node at (13.5,0.5) {$2$};
\node at (14.5,0.5) {$6$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\node at (8.5,1.4) {$9$};
\node at (9.5,1.4) {$10$};
\node at (10.5,1.4) {$11$};
\node at (11.5,1.4) {$12$};
\node at (12.5,1.4) {$13$};
\node at (13.5,1.4) {$14$};
\node at (14.5,1.4) {$15$};
\end{tikzpicture}
\end{center}
Using this representation,
for a node at position $k$,
\begin{itemize}
\item the parent node is at position $\lfloor k/2 \rfloor$,
\item the left child node is at position $2k$, and
\item the right child node is at position $2k+1$.
\end{itemize}
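Because the children of node $k$ are at positions $2k$ and $2k+1$,
the whole tree can be filled from the array by computing the internal nodes
in decreasing order of position, so that both children are ready before
their parent. The following sketch assumes $N=8$ and a global array
\texttt{p}; the name \texttt{build} is hypothetical.

\begin{lstlisting}
#include <vector>
using namespace std;

const int N = 8;  // array size, assumed to be a power of two
int p[2*N];       // p[1] is the top node, p[N..2N-1] the array

// Fills the tree from the array t[0..N-1]. The internal nodes are
// computed in decreasing order of position, so the children 2k and
// 2k+1 of node k are always ready before the node itself.
void build(const vector<int>& t) {
    for (int i = 0; i < N; i++) p[N+i] = t[i];
    for (int k = N-1; k >= 1; k--) p[k] = p[2*k]+p[2*k+1];
}
\end{lstlisting}

Calling \texttt{build} with the array $[5,8,6,3,2,7,2,6]$ should fill
positions $1 \ldots 15$ of \texttt{p} with the values of the stored tree above.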
% Note that this implies that the index of a node
% is even if it is a left child and odd if it is a right child.

\subsubsection{Functions}

Assume that the segment tree is stored
in an array \texttt{p} of $2N$ elements as described above.
The following function
calculates the sum of elements in a range $[a,b]$:
\begin{lstlisting}
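int sum(int a, int b) {
    a += N; b += N; // positions of the leaves
    int s = 0;
    while (a <= b) {
        // a is a right child: its parent extends past the range
        if (a%2 == 1) s += p[a++];
        // b is a left child: its parent extends past the range
        if (b%2 == 0) s += p[b--];
        a /= 2; b /= 2; // move one level up
    }
    return s;
}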
\end{lstlisting}
The function starts at the bottom of the tree
and moves one level up at each step.
Initially, the range $[a+N,b+N]$ corresponds
to the range $[a,b]$ in the original array.
At each step, if node $a$ is a right child ($a$ is odd),
its parent also covers an element outside the range,
so the value of node $a$ is added to the sum
and $a$ is moved one position to the right.
Similarly, if node $b$ is a left child ($b$ is even),
its value is added to the sum
and $b$ is moved one position to the left.
This process continues until the sum of the
range has been calculated.
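To see which nodes the function actually uses, the following variant
also records the positions of the nodes whose values are added.
The value $N=8$, the contents of \texttt{p} (taken from the example tree)
and the name \texttt{sum\_traced} are assumptions for this sketch.

\begin{lstlisting}
#include <vector>
using namespace std;

const int N = 8;  // assumed size of the example array
// Example tree for the array 5,8,6,3,2,7,2,6 (p[0] unused).
int p[2*N] = {0,39,22,17,13,9,9,8,5,8,6,3,2,7,2,6};

// Same loop as sum, but the positions of the added nodes are
// appended to used.
int sum_traced(int a, int b, vector<int>& used) {
    a += N; b += N;
    int s = 0;
    while (a <= b) {
        if (a%2 == 1) { used.push_back(a); s += p[a++]; }
        if (b%2 == 0) { used.push_back(b); s += p[b--]; }
        a /= 2; b /= 2;
    }
    return s;
}
\end{lstlisting}

For the range $[2,7]$, the sketch should add the nodes at positions 5 and 3,
whose values 9 and 17 are the two nodes highlighted earlier.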

The following function increases the value
of the element at position $k$ by $x$:
\begin{lstlisting}
void add(int k, int x) {
    k += N; // position of the leaf
    p[k] += x;
    // recompute all nodes on the path to the top node
    for (k /= 2; k >= 1; k /= 2) {
        p[k] = p[2*k]+p[2*k+1];
    }
}
\end{lstlisting}
First the function updates the element
at the bottom level of the tree.
After this, the function updates the values of all
internal nodes on the path from the element
to the top node of the tree.
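The following self-contained sketch combines the two functions
with a hypothetical \texttt{build} helper and a fixed size $N=8$,
so that they can be tried on the example array.

\begin{lstlisting}
#include <vector>
using namespace std;

const int N = 8;  // assumed array size (a power of two)
int p[2*N];       // p[1] is the top node, p[N..2N-1] the array

// Hypothetical helper: fills the tree from the array t.
void build(const vector<int>& t) {
    for (int i = 0; i < N; i++) p[N+i] = t[i];
    for (int k = N-1; k >= 1; k--) p[k] = p[2*k]+p[2*k+1];
}

// Sum of the original elements at positions a..b.
int sum(int a, int b) {
    a += N; b += N;
    int s = 0;
    while (a <= b) {
        if (a%2 == 1) s += p[a++];
        if (b%2 == 0) s += p[b--];
        a /= 2; b /= 2;
    }
    return s;
}

// Increases the element at position k by x.
void add(int k, int x) {
    k += N;
    p[k] += x;
    for (k /= 2; k >= 1; k /= 2) {
        p[k] = p[2*k]+p[2*k+1];
    }
}
\end{lstlisting}

For example, after \texttt{build} with $[5,8,6,3,2,7,2,6]$,
\texttt{sum(2,7)} should return 26, and after \texttt{add(5,-2)}
it should return 24.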

Both functions above work
in $O(\log n)$ time, because a segment tree
of $n$ elements consists of $O(\log n)$ levels,
and the functions move one level up in the tree at each step.
\subsubsection{Other queries}
Segment trees can support any queries
as long as we can divide a range into two parts,
calculate the answer separately for both parts
and then efficiently combine the answers.
Examples of such queries are
minimum and maximum, greatest common divisor,
and the bit operations and, or and xor.
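The following sketch shows one way to write the tree generically.
It assumes that the combining operation $f$ is associative and commutative
and has an identity element $e$ with $f(e,x)=f(x,e)=x$
(for example, 0 for sums and xor, or a large value for minima).
Commutativity is needed here because the values on the right border
are combined in reverse order. The struct name \texttt{SegTree}
and the member \texttt{set}, which assigns a new value instead of
increasing it, are hypothetical.

\begin{lstlisting}
#include <vector>
using namespace std;

// Generic iterative segment tree (hypothetical sketch). f must be
// associative and commutative, and e must satisfy f(e,x) = x.
// The array size is assumed to be a power of two.
template <class T, class F>
struct SegTree {
    int n;
    T e;
    F f;
    vector<T> p;  // p[1] is the top node, p[n..2n-1] the array

    SegTree(const vector<T>& t, T id, F op)
        : n(t.size()), e(id), f(op), p(2*t.size(), id) {
        for (int i = 0; i < n; i++) p[n+i] = t[i];
        for (int k = n-1; k >= 1; k--) p[k] = f(p[2*k], p[2*k+1]);
    }

    // Combines the values at positions a..b.
    T query(int a, int b) {
        a += n; b += n;
        T s = e;
        while (a <= b) {
            if (a%2 == 1) s = f(s, p[a++]);
            if (b%2 == 0) s = f(s, p[b--]);
            a /= 2; b /= 2;
        }
        return s;
    }

    // Assigns the value x to position k.
    void set(int k, T x) {
        k += n;
        p[k] = x;
        for (k /= 2; k >= 1; k /= 2) p[k] = f(p[2*k], p[2*k+1]);
    }
};
\end{lstlisting}

A greatest common divisor query could, for example, pass a lambda
that calls \texttt{std::gcd} (C++17) together with the identity 0.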

\begin{samepage}
For example, the following segment tree
supports minimum queries:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {1};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\node[draw, circle,minimum size=22pt] (a) at (1,2.5) {5};
\path[draw,thick,-] (a) -- (0.5,1);
\path[draw,thick,-] (a) -- (1.5,1);
\node[draw, circle,minimum size=22pt] (b) at (3,2.5) {3};
\path[draw,thick,-] (b) -- (2.5,1);
\path[draw,thick,-] (b) -- (3.5,1);
\node[draw, circle,minimum size=22pt] (c) at (5,2.5) {1};
\path[draw,thick,-] (c) -- (4.5,1);
\path[draw,thick,-] (c) -- (5.5,1);
\node[draw, circle,minimum size=22pt] (d) at (7,2.5) {2};
\path[draw,thick,-] (d) -- (6.5,1);
\path[draw,thick,-] (d) -- (7.5,1);
\node[draw, circle,minimum size=22pt] (i) at (2,4.5) {3};
\path[draw,thick,-] (i) -- (a);
\path[draw,thick,-] (i) -- (b);
\node[draw, circle,minimum size=22pt] (j) at (6,4.5) {1};
\path[draw,thick,-] (j) -- (c);
\path[draw,thick,-] (j) -- (d);
\node[draw, circle,minimum size=22pt] (m) at (4,6.5) {1};
\path[draw,thick,-] (m) -- (i);
\path[draw,thick,-] (m) -- (j);
\end{tikzpicture}
\end{center}
\end{samepage}
In this segment tree, every node in the tree
contains the smallest element in the corresponding
range of the array.
The top node of the tree contains the smallest
element of the whole array.
The operations can be implemented as before,
but minima are calculated instead of sums.
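A minimum version of the two functions can be sketched as follows.
It assumes $N=8$, a value \texttt{INF} larger than any array element,
and an update that assigns a value instead of increasing it;
the names \texttt{min\_query} and \texttt{set\_value} are hypothetical.

\begin{lstlisting}
#include <algorithm>
using namespace std;

const int N = 8;      // array size, assumed to be a power of two
const int INF = 1e9;  // assumed to exceed every array value
int p[2*N];           // minimum segment tree, p[0] unused

// Same loop as in sum, but values are combined with min.
int min_query(int a, int b) {
    a += N; b += N;
    int s = INF;
    while (a <= b) {
        if (a%2 == 1) s = min(s, p[a++]);
        if (b%2 == 0) s = min(s, p[b--]);
        a /= 2; b /= 2;
    }
    return s;
}

// Sets the element at position k to x and recomputes the minima
// on the path from the leaf to the top node.
void set_value(int k, int x) {
    k += N;
    p[k] = x;
    for (k /= 2; k >= 1; k /= 2) {
        p[k] = min(p[2*k], p[2*k+1]);
    }
}
\end{lstlisting}

Setting the values $5,8,6,3,1,7,2,6$ one by one should produce
the tree shown above, for example $1$ at position 1 and $3$ at position 2.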

\subsubsection{Binary search in a tree}

The structure of the segment tree allows us
to use binary search for finding elements in the array.
For example, if the tree supports minimum queries,
we can find the position of the smallest
element in $O(\log n)$ time.
In the following tree, the
smallest element 1 can be found
by traversing a path downwards from the top node,
always moving to a child that has the same value
as the current node:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (8,0) grid (16,1);
\node[anchor=center] at (8.5, 0.5) {9};
\node[anchor=center] at (9.5, 0.5) {5};
\node[anchor=center] at (10.5, 0.5) {7};
\node[anchor=center] at (11.5, 0.5) {1};
\node[anchor=center] at (12.5, 0.5) {6};
\node[anchor=center] at (13.5, 0.5) {2};
\node[anchor=center] at (14.5, 0.5) {3};
\node[anchor=center] at (15.5, 0.5) {2};
%\node[anchor=center] at (1,2.5) {13};
\node[draw, circle,minimum size=22pt] (e) at (9,2.5) {5};
\path[draw,thick,-] (e) -- (8.5,1);
\path[draw,thick,-] (e) -- (9.5,1);
\node[draw, circle,minimum size=22pt] (f) at (11,2.5) {1};
\path[draw,thick,-] (f) -- (10.5,1);
\path[draw,thick,-] (f) -- (11.5,1);
\node[draw, circle,minimum size=22pt] (g) at (13,2.5) {2};
\path[draw,thick,-] (g) -- (12.5,1);
\path[draw,thick,-] (g) -- (13.5,1);
\node[draw, circle,minimum size=22pt] (h) at (15,2.5) {2};
\path[draw,thick,-] (h) -- (14.5,1);
\path[draw,thick,-] (h) -- (15.5,1);
\node[draw, circle,minimum size=22pt] (k) at (10,4.5) {1};
\path[draw,thick,-] (k) -- (e);
\path[draw,thick,-] (k) -- (f);
\node[draw, circle,minimum size=22pt] (l) at (14,4.5) {2};
\path[draw,thick,-] (l) -- (g);
\path[draw,thick,-] (l) -- (h);
\node[draw, circle,minimum size=22pt] (n) at (12,6.5) {1};
\path[draw,thick,-] (n) -- (k);
\path[draw,thick,-] (n) -- (l);
\path[draw=red,thick,->,line width=2pt] (n) -- (k);
\path[draw=red,thick,->,line width=2pt] (k) -- (f);
\path[draw=red,thick,->,line width=2pt] (f) -- (11.5,1);
\end{tikzpicture}
\end{center}
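The downward search can be sketched as follows, assuming a minimum segment
tree stored in the array \texttt{p} as before with $N=8$; the names
\texttt{build} and \texttt{min\_position} are hypothetical. At each node,
the search moves to the left child if its value equals the value of the
current node, and otherwise to the right child, so it finds the leftmost
smallest element.

\begin{lstlisting}
#include <algorithm>
#include <vector>
using namespace std;

const int N = 8;  // array size, assumed to be a power of two
int p[2*N];       // minimum segment tree, p[0] unused

void build(const vector<int>& t) {
    for (int i = 0; i < N; i++) p[N+i] = t[i];
    for (int k = N-1; k >= 1; k--) p[k] = min(p[2*k], p[2*k+1]);
}

// Walks down from the top node, always moving to a child whose
// value equals the minimum (the left one if both do), and returns
// the position of that leaf in the original array.
int min_position() {
    int k = 1;
    while (k < N) {
        if (p[2*k] == p[k]) k = 2*k;
        else k = 2*k+1;
    }
    return k-N;
}
\end{lstlisting}

For the array in the picture, the search should visit the positions
1, 2, 5 and 11 of \texttt{p} and return the array position $11-8=3$.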
\section{Additional techniques}
\subsubsection{Index compression}
A limitation of data structures that
are built upon an array is that
the elements are indexed using
consecutive integers.
Difficulties arise when large indices
are needed.
For example, if we wish to use the index $10^9$,
the array should contain $10^9$
elements, which would require too much memory.

\index{index compression}
However, we can often bypass this limitation
by using \key{index compression},
where the original indices are replaced
with the indices $1,2,3,$ etc.
This can be done if we know all the indices
needed during the algorithm beforehand.

The idea is to replace each original index $x$
with $p(x)$, where $p$ is a function that
compresses the indices.
We require that the order of the indices
does not change, so if $a<b$, then $p(a)<p(b)$.
This allows us to conveniently perform queries
even if the indices are compressed.

For example, if the original indices are
$555$, $10^9$ and $8$, the new indices are:
2016-12-28 23:54:51 +01:00
\[
\begin{array}{lcl}
p(8) & = & 1 \\
p(555) & = & 2 \\
p(10^9) & = & 3 \\
\end{array}
\]
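A common way to construct such a function, sketched here under the
assumption that all indices are known in advance, is to sort the indices,
remove duplicates and map each index to its rank. The struct name
\texttt{Compressor} is hypothetical; ranks start from 1 to match the
example above.

\begin{lstlisting}
#include <algorithm>
#include <utility>
#include <vector>
using namespace std;

// Index compression sketch (hypothetical struct): all indices that
// will be needed are given to the constructor. They are sorted and
// duplicates are removed; p(x) is then the rank of x, starting from 1.
// Since the indices are sorted, a < b implies p(a) < p(b).
struct Compressor {
    vector<long long> xs;  // sorted distinct original indices

    Compressor(vector<long long> v) : xs(std::move(v)) {
        sort(xs.begin(), xs.end());
        xs.erase(unique(xs.begin(), xs.end()), xs.end());
    }

    // Compressed index p(x); x must be one of the given indices.
    int operator()(long long x) const {
        return lower_bound(xs.begin(), xs.end(), x) - xs.begin() + 1;
    }
};
\end{lstlisting}

With the indices 555, $10^9$ and 8, the sketch should give
$p(8)=1$, $p(555)=2$ and $p(10^9)=3$; each lookup is a binary search
over the sorted indices.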

\subsubsection{Range updates}

So far, we have implemented data structures
that support range queries and updates
of single values.
Let us now consider the opposite situation,
where we should update ranges and
retrieve single values.
We focus on an operation that increases all
elements in a range $[a,b]$ by $x$.

\index{difference array}
Surprisingly, we can use the data structures
presented in this chapter in this situation as well.
To do this, we build a \key{difference array}
for the array.
In such an array, each value indicates
the difference between two consecutive values
in the original array, and the first value
equals the first element of the original array.
Thus, the original array is the
prefix sum array of the
difference array.
For example, consider the following array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$3$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$1$};
\node at (3.5,0.5) {$1$};
\node at (4.5,0.5) {$1$};
\node at (5.5,0.5) {$5$};
\node at (6.5,0.5) {$2$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
2016-12-28 23:54:51 +01:00
\end{tikzpicture}
\end{center}
The difference array for the above array is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$3$};
\node at (1.5,0.5) {$0$};
\node at (2.5,0.5) {$-2$};
\node at (3.5,0.5) {$0$};
\node at (4.5,0.5) {$0$};
\node at (5.5,0.5) {$4$};
\node at (6.5,0.5) {$-3$};
\node at (7.5,0.5) {$0$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
2016-12-28 23:54:51 +01:00
\end{tikzpicture}
\end{center}
For example, the value 2 at position 6 in the original array
corresponds to the sum $3-2+4-3=2$.
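The conversion between an array and its difference array can be sketched
as follows; the function names \texttt{difference} and \texttt{prefix\_sums}
are hypothetical. The first value of the difference array is taken to be
the first element itself, as in the example above.

\begin{lstlisting}
#include <vector>
using namespace std;

// Difference array d of t: d[0] = t[0] and d[i] = t[i]-t[i-1].
vector<int> difference(const vector<int>& t) {
    vector<int> d(t.size());
    for (size_t i = 0; i < t.size(); i++) {
        d[i] = t[i] - (i > 0 ? t[i-1] : 0);
    }
    return d;
}

// Recovers the original array as the prefix sum array of d.
vector<int> prefix_sums(const vector<int>& d) {
    vector<int> t(d.size());
    int s = 0;
    for (size_t i = 0; i < d.size(); i++) {
        s += d[i];
        t[i] = s;
    }
    return t;
}
\end{lstlisting}

Applied to the array above, \texttt{difference} should give
$[3,0,-2,0,0,4,-3,0]$, and \texttt{prefix\_sums} turns it back
into the original array.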

The advantage of the difference array is
that we can update a range
in the original array by changing just
two elements in the difference array.
For example, if we want to
increase the elements in the range $[1,4]$ by 5,
it suffices to increase the value at position 1 by 5
and decrease the value at position 5 by 5.
The result is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$3$};
\node at (1.5,0.5) {$5$};
\node at (2.5,0.5) {$-2$};
\node at (3.5,0.5) {$0$};
\node at (4.5,0.5) {$0$};
\node at (5.5,0.5) {$-1$};
\node at (6.5,0.5) {$-3$};
\node at (7.5,0.5) {$0$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
2016-12-28 23:54:51 +01:00
\end{tikzpicture}
\end{center}
More generally, to increase the elements
in a range $[a,b]$ by $x$,
we increase the value at position $a$ by $x$
and decrease the value at position $b+1$ by $x$
(if $b+1$ is outside the array, this second step is skipped).
Thus, we only need to update single values
and process sum queries,
so we can use a binary indexed tree or a segment tree.
A more difficult problem is to support both
range queries and range updates.
In Chapter 28 we will see that even this is possible.
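The reduction to single-value updates and sum queries can be sketched
with the segment tree functions \texttt{sum} and \texttt{add} of this
chapter, stored over the difference array. The size $N=8$ and the names
\texttt{range\_add} and \texttt{value\_at} are assumptions; the code skips
the second update when $b+1=N$.

\begin{lstlisting}
const int N = 8;  // array size, assumed to be a power of two
int p[2*N];       // sum segment tree over the difference array

int sum(int a, int b) {
    a += N; b += N;
    int s = 0;
    while (a <= b) {
        if (a%2 == 1) s += p[a++];
        if (b%2 == 0) s += p[b--];
        a /= 2; b /= 2;
    }
    return s;
}

void add(int k, int x) {
    k += N;
    p[k] += x;
    for (k /= 2; k >= 1; k /= 2) {
        p[k] = p[2*k]+p[2*k+1];
    }
}

// Increases the original elements at positions a..b by x by changing
// two values of the difference array; when b+1 == N, there is no
// value to decrease.
void range_add(int a, int b, int x) {
    add(a, x);
    if (b+1 < N) add(b+1, -x);
}

// The original element at position k is the prefix sum of the
// difference array up to position k.
int value_at(int k) {
    return sum(0, k);
}
\end{lstlisting}

For example, an initially zero tree can be set to the array
$[3,3,1,1,1,5,2,2]$ by calling \texttt{range\_add(i,i,t[i])} for each
position $i$; increasing the range $[1,4]$ by 5 should then give the
values $[3,8,6,6,6,5,2,2]$.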