\chapter{Range queries}
\index{range query}
\index{sum query}
\index{minimum query}
\index{maximum query}
A \key{range query} asks us to calculate some information
about the elements in a given range of an array.
Typical range queries are:
\begin{itemize}
\item \key{sum query}: calculate the sum of elements in a range
\item \key{minimum query}: find the smallest element in a range
\item \key{maximum query}: find the largest element in a range
\end{itemize}
For example, consider the range $[4,7]$ in the following array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$8$};
\node at (3.5,0.5) {$4$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$3$};
\node at (7.5,0.5) {$4$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
In this range, the sum of elements is $4+6+1+3=14$,
the minimum element is 1 and the maximum element is 6.

An easy way to process range queries is
to go through all the elements in the range.
For example, we can calculate the sum
in a range $[a,b]$ as follows:
\begin{lstlisting}
// t is the array; returns t[a]+t[a+1]+...+t[b]
int sum(int a, int b) {
    int s = 0;
    for (int i = a; i <= b; i++) {
        s += t[i];
    }
    return s;
}
\end{lstlisting}
The above function works in $O(n)$ time,
where $n$ is the size of the array.
Thus, processing $q$ queries takes $O(nq)$ time
in the worst case, which is slow if both
the array and the number of queries are large.
In this chapter, we will learn how
range queries can be processed much more efficiently.
\section{Static array queries}
We first focus on a simple situation where
the array is \key{static}, i.e.,
the elements never change between the queries.
In this case, it suffices to preprocess the
array and construct a data structure
that can answer any possible range query efficiently.
\subsubsection{Sum query}
\index{prefix sum array}

Sum queries can be processed efficiently
by constructing a \key{sum array}
(also known as a prefix sum array)
that contains the sum of elements in the range $[1,k]$
for each $k=1,2,\ldots,n$.
Using the sum array, the sum of elements in
any range $[a,b]$ of the original array can
be calculated in $O(1)$ time.

For example, for the array
\begin{center}
\begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
the corresponding sum array is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$8$};
\node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$22$};
\node at (5.5,0.5) {$23$};
\node at (6.5,0.5) {$27$};
\node at (7.5,0.5) {$29$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
The following code constructs a sum array
\texttt{s} for an array \texttt{t} in $O(n)$ time,
assuming that $\texttt{s}[0]=0$:
\begin{lstlisting}
for (int i = 1; i <= n; i++) {
    s[i] = s[i-1]+t[i];
}
\end{lstlisting}
After this, the following function processes
any sum query in $O(1)$ time:
\begin{lstlisting}
int sum(int a, int b) {
    return s[b]-s[a-1];
}
\end{lstlisting}
The function calculates the sum in the range $[a,b]$
by subtracting the sum in the range $[1,a-1]$
from the sum in the range $[1,b]$.
Thus, only two values of the sum array
are needed, and the query takes $O(1)$ time.
Note that thanks to the one-based indexing,
the function also works when $a=1$,
provided that $\texttt{s}[0]=0$.

As an example, consider the range $[4,7]$:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
The sum in the range is $8+6+1+4=19$.
This can be calculated using the precalculated
sums for the ranges $[1,3]$ and $[1,7]$:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (2,0) rectangle (3,1);
\fill[color=lightgray] (6,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$8$};
\node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$22$};
\node at (5.5,0.5) {$23$};
\node at (6.5,0.5) {$27$};
\node at (7.5,0.5) {$29$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
Thus, the sum in the range $[4,7]$ is $27-8=19$.
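
The example can be checked with a short program.
The following is only a sketch:
the function names \texttt{buildSumArray} and \texttt{rangeSum}
are chosen for this illustration, and it assumes that the arrays
are stored in vectors whose position 0 is unused padding.
\begin{lstlisting}
#include <vector>
using namespace std;

// Builds the sum array s for a one-indexed array t
// (t[0] is unused padding, so s[0] = 0).
vector<long long> buildSumArray(const vector<int>& t) {
    vector<long long> s(t.size(), 0);
    for (size_t i = 1; i < t.size(); i++) {
        s[i] = s[i-1]+t[i];
    }
    return s;
}

// Returns the sum in the range [a,b], where 1 <= a <= b.
long long rangeSum(const vector<long long>& s, int a, int b) {
    return s[b]-s[a-1];
}
\end{lstlisting}
For the array above, \texttt{rangeSum(s,4,7)} evaluates
$\texttt{s}[7]-\texttt{s}[3]=27-8$.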

It is also possible to generalize this idea
to higher dimensions.
For example, we can construct a two-dimensional
sum array that can be used for calculating
the sum of any rectangular subarray in $O(1)$ time.
Each value in such an array is the sum of a subarray
that begins at the upper-left corner of the array.
\begin{samepage}
The following picture illustrates the idea:
\begin{center}
\begin{tikzpicture}[scale=0.54]
\draw[fill=lightgray] (3,2) rectangle (7,5);
\draw (0,0) grid (10,7);
%\draw[line width=2pt] (3,2) rectangle (7,5);
\node[anchor=center] at (6.5, 2.5) {$A$};
\node[anchor=center] at (2.5, 2.5) {$B$};
\node[anchor=center] at (6.5, 5.5) {$C$};
\node[anchor=center] at (2.5, 5.5) {$D$};
\end{tikzpicture}
\end{center}
\end{samepage}
The sum of the gray subarray can be calculated
using the formula
\[S(A) - S(B) - S(C) + S(D),\]
where $S(X)$ denotes the sum of a rectangular
subarray from the upper-left corner
to the position of $X$.
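The term $S(D)$ is added back because that subarray is
subtracted twice: once as a part of $S(B)$ and once as a part of $S(C)$.

As a minimal sketch of this idea, the following code builds
a two-dimensional sum array and evaluates the formula.
The names \texttt{Grid}, \texttt{buildSumArray2D} and \texttt{rectSum}
are hypothetical, and the code assumes that rows and columns are
one-indexed, with row 0 and column 0 filled with zeros.
\begin{lstlisting}
#include <vector>
using namespace std;

typedef vector<vector<long long>> Grid;

// s[r][c] is the sum of the subarray from (1,1) to (r,c).
// Row 0 and column 0 of both grids are zero padding;
// t is assumed to contain at least the padding row.
Grid buildSumArray2D(const Grid& t) {
    Grid s(t.size(), vector<long long>(t[0].size(), 0));
    for (size_t r = 1; r < t.size(); r++) {
        for (size_t c = 1; c < t[r].size(); c++) {
            s[r][c] = t[r][c]+s[r-1][c]+s[r][c-1]-s[r-1][c-1];
        }
    }
    return s;
}

// Sum of the rectangle with rows r1..r2 and columns c1..c2,
// i.e., S(A)-S(B)-S(C)+S(D) in the notation of the text.
long long rectSum(const Grid& s, int r1, int c1, int r2, int c2) {
    return s[r2][c2]-s[r2][c1-1]-s[r1-1][c2]+s[r1-1][c1-1];
}
\end{lstlisting}
Building the array takes time proportional to the number of cells,
after which each rectangle query reads four values.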
\subsubsection{Minimum query}
It is also possible to process minimum queries
in $O(1)$ time after preprocessing, though it is
more difficult than processing sum queries.
Note that minimum and maximum queries can always
be implemented using the same techniques,
so it suffices to focus on minimum queries.

The idea is to precalculate the minimum element of each range
of size $2^k$ in the array.
For example, in the array
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
the following minima will be calculated:
\begin{center}
\begin{tabular}{ccc}
\begin{tabular}{ccc}
range & size & min \\
\hline
$[1,1]$ & 1 & 1 \\
$[2,2]$ & 1 & 3 \\
$[3,3]$ & 1 & 4 \\
$[4,4]$ & 1 & 8 \\
$[5,5]$ & 1 & 6 \\
$[6,6]$ & 1 & 1 \\
$[7,7]$ & 1 & 4 \\
$[8,8]$ & 1 & 2 \\
\end{tabular}
&
\begin{tabular}{ccc}
range & size & min \\
\hline
$[1,2]$ & 2 & 1 \\
$[2,3]$ & 2 & 3 \\
$[3,4]$ & 2 & 4 \\
$[4,5]$ & 2 & 6 \\
$[5,6]$ & 2 & 1 \\
$[6,7]$ & 2 & 1 \\
$[7,8]$ & 2 & 2 \\
\\
\end{tabular}
&
\begin{tabular}{ccc}
range & size & min \\
\hline
$[1,4]$ & 4 & 1 \\
$[2,5]$ & 4 & 3 \\
$[3,6]$ & 4 & 1 \\
$[4,7]$ & 4 & 1 \\
$[5,8]$ & 4 & 1 \\
$[1,8]$ & 8 & 1 \\
\\
\\
\end{tabular}
\end{tabular}
\end{center}
There are $O(n \log n)$ ranges of size $2^k$,
because for each array position,
there are $O(\log n)$ ranges that begin at that position.
The minima in all ranges of size $2^k$ can be calculated
in $O(n \log n)$ time, because each range of size $2^k$
consists of two ranges of size $2^{k-1}$, so its minimum
is the smaller of two minima that have
already been calculated.

After this, the minimum in any range $[a,b]$
can be calculated in $O(1)$ time as the minimum of
two ranges of size $2^k$ where $k=\lfloor \log_2(b-a+1) \rfloor$.
The first range begins at index $a$,
and the second range ends at index $b$.
The parameter $k$ is chosen so that
the two ranges of size $2^k$
fully cover the range $[a,b]$.
The two ranges may overlap, but this does not matter,
because considering an element twice does not change the minimum.
As an example, consider the range $[2,7]$:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (1,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
The length of the range is 6,
and $\lfloor \log_2(6) \rfloor = 2$.
Thus, the minimum can be calculated
from two ranges of length 4.
The ranges are $[2,5]$ and $[4,7]$:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (1,0) rectangle (5,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
Since the minimum in the range $[2,5]$ is 3
and the minimum in the range $[4,7]$ is 1,
we know that the minimum in the range $[2,7]$ is 1.
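
A structure of this kind is often called a sparse table.
The following sketch shows one possible way to implement it;
the names \texttt{SparseTable}, \texttt{m} and \texttt{lg} are
chosen for this illustration, and the input vector is assumed
to be one-indexed with position 0 as unused padding.
\begin{lstlisting}
#include <algorithm>
#include <vector>
using namespace std;

struct SparseTable {
    // m[k][i] is the minimum of the range of size 2^k
    // that begins at position i (one-indexed).
    vector<vector<int>> m;
    // lg[len] is the floor of log2(len).
    vector<int> lg;

    // t[0] is unused padding and t[1..n] is the array, n >= 1.
    SparseTable(const vector<int>& t) {
        int n = t.size()-1;
        lg.assign(n+1, 0);
        for (int len = 2; len <= n; len++) {
            lg[len] = lg[len/2]+1;
        }
        m.assign(lg[n]+1, vector<int>(n+1, 0));
        m[0] = t;
        for (int k = 1; k <= lg[n]; k++) {
            for (int i = 1; i+(1<<k)-1 <= n; i++) {
                // combine two ranges of size 2^(k-1)
                m[k][i] = min(m[k-1][i], m[k-1][i+(1<<(k-1))]);
            }
        }
    }

    // Minimum in the range [a,b], where 1 <= a <= b <= n.
    int query(int a, int b) const {
        int k = lg[b-a+1];
        return min(m[k][a], m[k][b-(1<<k)+1]);
    }
};
\end{lstlisting}
The logarithms are precalculated into the array \texttt{lg}
so that a query consists only of array lookups.
For the example array, \texttt{query(2,7)} combines the stored
minima of the ranges $[2,5]$ and $[4,7]$.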
\section{Binary indexed tree}
\index{binary indexed tree}
\index{Fenwick tree}
A \key{binary indexed tree} or \key{Fenwick tree}
can be seen as a dynamic version of a sum array.
The tree supports two $O(\log n)$ time operations:
calculating the sum of elements in a range,
and modifying the value of an element.
The benefit of using a binary indexed tree is
that the elements of the underlying array
can be efficiently updated between the queries.
This would not be possible with a sum array,
because after each update, we would have to rebuild the
whole sum array in $O(n)$ time.
\subsubsection{Structure}
Given an array of $n$ elements, indexed $1 \ldots n$,
the binary indexed tree for that array
is an array such that the value at position $k$
equals the sum of elements in the original array in a range
that ends at position $k$.
The length of the range is the largest power of two
that divides $k$.
For example, if $k=6$, the length of the range is $2$,
because $2$ divides $6$ but $4$ does not divide $6$.
\begin{samepage}
As an example, consider the following array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
\end{samepage}
The corresponding binary indexed tree is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$7$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$29$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\draw[->,thick] (0.5,-0.9) -- (0.5,-0.1);
\draw[->,thick] (2.5,-0.9) -- (2.5,-0.1);
\draw[->,thick] (4.5,-0.9) -- (4.5,-0.1);
\draw[->,thick] (6.5,-0.9) -- (6.5,-0.1);
\draw[->,thick] (1.5,-1.9) -- (1.5,-0.1);
\draw[->,thick] (5.5,-1.9) -- (5.5,-0.1);
\draw[->,thick] (3.5,-2.9) -- (3.5,-0.1);
\draw[->,thick] (7.5,-3.9) -- (7.5,-0.1);
\draw (0,-1) -- (1,-1) -- (1,-1.5) -- (0,-1.5) -- (0,-1);
\draw (2,-1) -- (3,-1) -- (3,-1.5) -- (2,-1.5) -- (2,-1);
\draw (4,-1) -- (5,-1) -- (5,-1.5) -- (4,-1.5) -- (4,-1);
\draw (6,-1) -- (7,-1) -- (7,-1.5) -- (6,-1.5) -- (6,-1);
\draw (0,-2) -- (2,-2) -- (2,-2.5) -- (0,-2.5) -- (0,-2);
\draw (4,-2) -- (6,-2) -- (6,-2.5) -- (4,-2.5) -- (4,-2);
\draw (0,-3) -- (4,-3) -- (4,-3.5) -- (0,-3.5) -- (0,-3);
\draw (0,-4) -- (8,-4) -- (8,-4.5) -- (0,-4.5) -- (0,-4);
\end{tikzpicture}
\end{center}
For example, the value at position 6
in the binary indexed tree is 7,
because the sum of elements in the range $[5,6]$
in the original array is $6+1=7$.
\subsubsection{Sum query}
The basic operation in a binary indexed tree is
to calculate the sum of elements in a range $[1,k]$,
where $k$ is any position in the array.
The sum of such a range can be calculated as a
sum of one or more values stored in the tree.

For example, the range $[1,7]$ corresponds to
the following values:
\begin{center}
\begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$7$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$29$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\draw[->,thick] (0.5,-0.9) -- (0.5,-0.1);
\draw[->,thick] (2.5,-0.9) -- (2.5,-0.1);
\draw[->,thick] (4.5,-0.9) -- (4.5,-0.1);
\draw[->,thick] (6.5,-0.9) -- (6.5,-0.1);
\draw[->,thick] (1.5,-1.9) -- (1.5,-0.1);
\draw[->,thick] (5.5,-1.9) -- (5.5,-0.1);
\draw[->,thick] (3.5,-2.9) -- (3.5,-0.1);
\draw[->,thick] (7.5,-3.9) -- (7.5,-0.1);
\draw (0,-1) -- (1,-1) -- (1,-1.5) -- (0,-1.5) -- (0,-1);
\draw (2,-1) -- (3,-1) -- (3,-1.5) -- (2,-1.5) -- (2,-1);
\draw (4,-1) -- (5,-1) -- (5,-1.5) -- (4,-1.5) -- (4,-1);
\draw[fill=lightgray] (6,-1) -- (7,-1) -- (7,-1.5) -- (6,-1.5) -- (6,-1);
\draw (0,-2) -- (2,-2) -- (2,-2.5) -- (0,-2.5) -- (0,-2);
\draw[fill=lightgray] (4,-2) -- (6,-2) -- (6,-2.5) -- (4,-2.5) -- (4,-2);
\draw[fill=lightgray] (0,-3) -- (4,-3) -- (4,-3.5) -- (0,-3.5) -- (0,-3);
\draw (0,-4) -- (8,-4) -- (8,-4.5) -- (0,-4.5) -- (0,-4);
\end{tikzpicture}
\end{center}
Hence, the sum of elements in the range $[1,7]$ is $16+7+4=27$.
The structure of the binary indexed tree allows us to calculate
the sum of elements in any range $[1,k]$ using only $O(\log n)$
values from the tree: one value for each one bit
in the binary representation of $k$.
Using the same technique that we previously used
with a sum array,
we can efficiently calculate the sum of any range
$[a,b]$ by subtracting the sum of the range $[1,a-1]$
from the sum of the range $[1,b]$.
Here, too, only $O(\log n)$ values are needed,
because it suffices to calculate two sums of $[1,k]$ ranges.
\subsubsection{Array update}
When an element in the original array changes,
several sums in the binary indexed tree change.
For example, if the element at position 3 changes,
the sums of the following ranges change:
\begin{center}
\begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$7$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$29$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\draw[->,thick] (0.5,-0.9) -- (0.5,-0.1);
\draw[->,thick] (2.5,-0.9) -- (2.5,-0.1);
\draw[->,thick] (4.5,-0.9) -- (4.5,-0.1);
\draw[->,thick] (6.5,-0.9) -- (6.5,-0.1);
\draw[->,thick] (1.5,-1.9) -- (1.5,-0.1);
\draw[->,thick] (5.5,-1.9) -- (5.5,-0.1);
\draw[->,thick] (3.5,-2.9) -- (3.5,-0.1);
\draw[->,thick] (7.5,-3.9) -- (7.5,-0.1);
\draw (0,-1) -- (1,-1) -- (1,-1.5) -- (0,-1.5) -- (0,-1);
\draw[fill=lightgray] (2,-1) -- (3,-1) -- (3,-1.5) -- (2,-1.5) -- (2,-1);
\draw (4,-1) -- (5,-1) -- (5,-1.5) -- (4,-1.5) -- (4,-1);
\draw (6,-1) -- (7,-1) -- (7,-1.5) -- (6,-1.5) -- (6,-1);
\draw (0,-2) -- (2,-2) -- (2,-2.5) -- (0,-2.5) -- (0,-2);
\draw (4,-2) -- (6,-2) -- (6,-2.5) -- (4,-2.5) -- (4,-2);
\draw[fill=lightgray] (0,-3) -- (4,-3) -- (4,-3.5) -- (0,-3.5) -- (0,-3);
\draw[fill=lightgray] (0,-4) -- (8,-4) -- (8,-4.5) -- (0,-4.5) -- (0,-4);
\end{tikzpicture}
\end{center}
It turns out that each array element belongs to
only $O(\log n)$ ranges in the binary indexed tree,
so the number of values that need to be updated
is only $O(\log n)$.
\subsubsection{Implementation}
The operations of a binary indexed tree can be implemented
in an elegant and efficient way using bit operations.
The key fact needed is that $k \& -k$
isolates the lowest one bit of a number $k$.
For example, $6 \& -6=2$ because the number $6$
corresponds to 110 and the number $2$ corresponds to 10.
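The reason is that, assuming the two's complement representation
used in practice, $-k$ is obtained by inverting all bits of $k$
and adding one, so $k$ and $-k$ have only the lowest one bit of $k$ in common.
The value of $k \& -k$ is thus the largest power of two that divides $k$,
which is exactly the length of the range stored at position $k$.
This can be written as a small helper function;
the name \texttt{lowbit} is hypothetical and used here only for illustration.
\begin{lstlisting}
// Returns the value of the lowest one bit of k (k >= 1),
// i.e., the largest power of two that divides k.
int lowbit(int k) {
    return k & -k;
}
\end{lstlisting}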

It turns out that when processing a range query,
the position $k$ in the binary indexed tree should be
decreased by $k \& -k$ at every step,
and when updating the array,
the position $k$ should be increased by $k \& -k$ at every step.

Suppose that the binary indexed tree is stored in an array \texttt{b}.
The following function \texttt{sum} calculates
the sum of elements in the range $[1,k]$:
\begin{lstlisting}
int sum(int k) {
    int s = 0;
    while (k >= 1) {
        s += b[k];
        k -= k&-k;
    }
    return s;
}
\end{lstlisting}
The following function \texttt{add} increases the value
of the element at position $k$ by $x$
($x$ can be positive or negative):
\begin{lstlisting}
void add(int k, int x) {
    while (k <= n) {
        b[k] += x;
        k += k&-k;
    }
}
\end{lstlisting}
The time complexity of both functions is
$O(\log n)$, because the functions access $O(\log n)$
values in the binary indexed tree, and each move
to the next position
takes $O(1)$ time using bit operations.
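
The two functions can be combined into a small self-contained
structure, sketched below.
The name \texttt{BinaryIndexedTree} and the helper \texttt{rangeSum}
are assumptions made for this illustration.
The sketch builds the tree by calling \texttt{add} once for each
element, which takes $O(n \log n)$ time in total.
\begin{lstlisting}
#include <vector>
using namespace std;

struct BinaryIndexedTree {
    int n;
    vector<long long> b; // b[1..n] is the tree, b[0] is unused

    BinaryIndexedTree(int n) : n(n), b(n+1, 0) {}

    // Increases the element at position k by x.
    void add(int k, long long x) {
        while (k <= n) {
            b[k] += x;
            k += k&-k;
        }
    }

    // Returns the sum of elements in the range [1,k].
    long long sum(int k) const {
        long long s = 0;
        while (k >= 1) {
            s += b[k];
            k -= k&-k;
        }
        return s;
    }

    // Returns the sum of elements in the range [l,r].
    long long rangeSum(int l, int r) const {
        return sum(r)-sum(l-1);
    }
};
\end{lstlisting}
After adding the elements of the example array one by one,
the array \texttt{b} contains the tree shown earlier,
and \texttt{sum(7)} adds up the values at positions 7, 6 and 4.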
\section{Segment tree}
\index{segment tree}
A \key{segment tree} is a data structure
that supports two operations:
processing a range query for a range $[a,b]$
and updating the element at index $k$.
Using a segment tree, we can implement sum
queries, minimum queries and many other
queries so that both operations work in $O(\log n)$ time.
Compared to a binary indexed tree,
the advantage of a segment tree is that it is
a more general data structure.
While binary indexed trees only support
sum queries, segment trees also support other queries.
On the other hand, a segment tree requires more
memory and is a bit more difficult to implement.
\subsubsection{Structure}
A segment tree contains $2n-1$ nodes
so that the bottom $n$ nodes correspond
to the original array and the other nodes
contain information needed for range queries.
The values in a segment tree depend on
the supported query type.
We will first assume that the supported
query is the sum query.

For example, the array
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$5$};
\node at (1.5,0.5) {$8$};
\node at (2.5,0.5) {$6$};
\node at (3.5,0.5) {$3$};
\node at (4.5,0.5) {$2$};
\node at (5.5,0.5) {$7$};
\node at (6.5,0.5) {$2$};
\node at (7.5,0.5) {$6$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
corresponds to the following segment tree:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {2};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\node[draw, circle] (a) at (1,2.5) {13};
\path[draw,thick,-] (a) -- (0.5,1);
\path[draw,thick,-] (a) -- (1.5,1);
\node[draw, circle,minimum size=22pt] (b) at (3,2.5) {9};
\path[draw,thick,-] (b) -- (2.5,1);
\path[draw,thick,-] (b) -- (3.5,1);
\node[draw, circle,minimum size=22pt] (c) at (5,2.5) {9};
\path[draw,thick,-] (c) -- (4.5,1);
\path[draw,thick,-] (c) -- (5.5,1);
\node[draw, circle,minimum size=22pt] (d) at (7,2.5) {8};
\path[draw,thick,-] (d) -- (6.5,1);
\path[draw,thick,-] (d) -- (7.5,1);
\node[draw, circle] (i) at (2,4.5) {22};
\path[draw,thick,-] (i) -- (a);
\path[draw,thick,-] (i) -- (b);
\node[draw, circle] (j) at (6,4.5) {17};
\path[draw,thick,-] (j) -- (c);
\path[draw,thick,-] (j) -- (d);
\node[draw, circle] (m) at (4,6.5) {39};
\path[draw,thick,-] (m) -- (i);
\path[draw,thick,-] (m) -- (j);
\end{tikzpicture}
\end{center}
Each internal node in the segment tree contains
information about a range of size $2^k$
in the original array.
In the above tree, the value of each internal
node is the sum of the corresponding array elements,
and it can be calculated as the sum of
the values of its left and right child nodes.
It is convenient to build a segment tree
when the size of the array is a power of two,
so that the tree is a complete binary tree.
In what follows, we will assume that the tree
is built like this.
If the size of the array is not a power of two,
we can always extend it using zero elements.
\subsubsection{Range query}
In a segment tree, the answer for a range query
is calculated from nodes that belong to the range
and are as high as possible in the tree.
Each node gives the answer for a subrange,
and the answer for the entire range can be
calculated by combining these values.
For example, consider the following range:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=gray!50] (2,0) rectangle (8,1);
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {2};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
The sum of elements in the range $[3,8]$ is
$6+3+2+7+2+6=26$.
The sum can be calculated from the segment tree
using the following subranges:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {2};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\node[draw, circle] (a) at (1,2.5) {13};
\path[draw,thick,-] (a) -- (0.5,1);
\path[draw,thick,-] (a) -- (1.5,1);
\node[draw, circle,fill=gray!50,minimum size=22pt] (b) at (3,2.5) {9};
\path[draw,thick,-] (b) -- (2.5,1);
\path[draw,thick,-] (b) -- (3.5,1);
\node[draw, circle,minimum size=22pt] (c) at (5,2.5) {9};
\path[draw,thick,-] (c) -- (4.5,1);
\path[draw,thick,-] (c) -- (5.5,1);
\node[draw, circle,minimum size=22pt] (d) at (7,2.5) {8};
\path[draw,thick,-] (d) -- (6.5,1);
\path[draw,thick,-] (d) -- (7.5,1);
\node[draw, circle] (i) at (2,4.5) {22};
\path[draw,thick,-] (i) -- (a);
\path[draw,thick,-] (i) -- (b);
\node[draw, circle,fill=gray!50] (j) at (6,4.5) {17};
\path[draw,thick,-] (j) -- (c);
\path[draw,thick,-] (j) -- (d);
\node[draw, circle] (m) at (4,6.5) {39};
\path[draw,thick,-] (m) -- (i);
\path[draw,thick,-] (m) -- (j);
\end{tikzpicture}
\end{center}
Thus, the sum of the range is $9+17=26$.

When the answer for a range query is
calculated using nodes that are as high as possible,
at most two nodes on each level
of the segment tree are needed.
Because the tree has $O(\log n)$ levels,
the total number of nodes examined is only $O(\log n)$.
\subsubsection{Array update}
When an element in the array changes,
we should update all nodes in the segment tree
whose value depends on the changed element.
This can be done by traversing the path from the bottom
to the top of the tree and updating the nodes along the path.
\begin{samepage}
The following picture shows which nodes in the segment tree
change if the element at position 6 (whose value is 7) changes.
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=gray!50] (5,0) rectangle (6,1);
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {2};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\node[draw, circle] (a) at (1,2.5) {13};
\path[draw,thick,-] (a) -- (0.5,1);
\path[draw,thick,-] (a) -- (1.5,1);
\node[draw, circle,minimum size=22pt] (b) at (3,2.5) {9};
\path[draw,thick,-] (b) -- (2.5,1);
\path[draw,thick,-] (b) -- (3.5,1);
\node[draw, circle,minimum size=22pt,fill=gray!50] (c) at (5,2.5) {9};
\path[draw,thick,-] (c) -- (4.5,1);
\path[draw,thick,-] (c) -- (5.5,1);
\node[draw, circle,minimum size=22pt] (d) at (7,2.5) {8};
\path[draw,thick,-] (d) -- (6.5,1);
\path[draw,thick,-] (d) -- (7.5,1);
\node[draw, circle] (i) at (2,4.5) {22};
\path[draw,thick,-] (i) -- (a);
\path[draw,thick,-] (i) -- (b);
\node[draw, circle,fill=gray!50] (j) at (6,4.5) {17};
\path[draw,thick,-] (j) -- (c);
\path[draw,thick,-] (j) -- (d);
\node[draw, circle,fill=gray!50] (m) at (4,6.5) {39};
\path[draw,thick,-] (m) -- (i);
\path[draw,thick,-] (m) -- (j);
\end{tikzpicture}
\end{center}
\end{samepage}
The path from the bottom of the segment tree to the top
always consists of $O(\log n)$ nodes,
so updating the array affects $O(\log n)$ nodes in the tree.
\subsubsection{Storing the tree}
A segment tree can be stored as an array
of $2N$ elements where $N$ is a power of two;
the element at index 0 is not used.
From now on, we will assume that the indices
of the original array are between $0$ and $N-1$.
The element at index 1 in the segment tree array
contains the top node of the tree,
the elements at indices 2 and 3 correspond to
the second level of the tree, and so on.
Finally, the elements beginning from index $N$
contain the bottom level of the tree, i.e.,
the actual content of the original array.
For example, the segment tree
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {2};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\node[draw, circle] (a) at (1,2.5) {13};
\path[draw,thick,-] (a) -- (0.5,1);
\path[draw,thick,-] (a) -- (1.5,1);
\node[draw, circle,minimum size=22pt] (b) at (3,2.5) {9};
\path[draw,thick,-] (b) -- (2.5,1);
\path[draw,thick,-] (b) -- (3.5,1);
\node[draw, circle,minimum size=22pt] (c) at (5,2.5) {9};
\path[draw,thick,-] (c) -- (4.5,1);
\path[draw,thick,-] (c) -- (5.5,1);
\node[draw, circle,minimum size=22pt] (d) at (7,2.5) {8};
\path[draw,thick,-] (d) -- (6.5,1);
\path[draw,thick,-] (d) -- (7.5,1);
\node[draw, circle] (i) at (2,4.5) {22};
\path[draw,thick,-] (i) -- (a);
\path[draw,thick,-] (i) -- (b);
\node[draw, circle] (j) at (6,4.5) {17};
\path[draw,thick,-] (j) -- (c);
\path[draw,thick,-] (j) -- (d);
\node[draw, circle] (m) at (4,6.5) {39};
\path[draw,thick,-] (m) -- (i);
\path[draw,thick,-] (m) -- (j);
\end{tikzpicture}
\end{center}
can be stored as follows ($N=8$):
\begin{center}
\begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (15,1);
\node at (0.5,0.5) {$39$};
\node at (1.5,0.5) {$22$};
\node at (2.5,0.5) {$17$};
\node at (3.5,0.5) {$13$};
\node at (4.5,0.5) {$9$};
\node at (5.5,0.5) {$9$};
\node at (6.5,0.5) {$8$};
\node at (7.5,0.5) {$5$};
\node at (8.5,0.5) {$8$};
\node at (9.5,0.5) {$6$};
\node at (10.5,0.5) {$3$};
\node at (11.5,0.5) {$2$};
\node at (12.5,0.5) {$7$};
\node at (13.5,0.5) {$2$};
\node at (14.5,0.5) {$6$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\node at (8.5,1.4) {$9$};
\node at (9.5,1.4) {$10$};
\node at (10.5,1.4) {$11$};
\node at (11.5,1.4) {$12$};
\node at (12.5,1.4) {$13$};
\node at (13.5,1.4) {$14$};
\node at (14.5,1.4) {$15$};
\end{tikzpicture}
\end{center}
Using this representation,
for a node at index $k$,
\begin{itemize}
\item the parent node is at index $\lfloor k/2 \rfloor$,
\item the left child node is at index $2k$, and
\item the right child node is at index $2k+1$.
\end{itemize}
Note that this implies that the index of a node
is even if it is a left child and odd if it is a right child.
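As a minimal sketch of how this layout can be filled in,
the following code builds a sum segment tree from a given array.
The function name \texttt{build}, the vector \texttt{v},
the array name \texttt{p} and the fixed size $N=8$
are assumptions made only for this illustration.

\begin{lstlisting}
#include <cassert>
#include <vector>

const int N = 8;   // number of leaves, assumed to be a power of two
int p[2*N];        // segment tree array, index 0 is not used

// builds a sum segment tree from v (hypothetical helper)
void build(const std::vector<int>& v) {
    assert((int)v.size() <= N);
    // copy the array to the bottom level, unused leaves become 0
    for (int i = 0; i < N; i++) {
        p[N+i] = i < (int)v.size() ? v[i] : 0;
    }
    // each internal node is the sum of its two children
    for (int k = N-1; k >= 1; k--) {
        p[k] = p[2*k]+p[2*k+1];
    }
}
\end{lstlisting}

For the example array $[5,8,6,3,2,7,2,6]$,
the sketch produces the array shown above,
with the value 39 at index 1.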
\subsubsection{Functions}
We assume that the segment tree is stored
in the array \texttt{p}.
The following function calculates the sum of range $[a,b]$:
\begin{lstlisting}
int sum(int a, int b) {
    // move to the corresponding leaves
    a += N; b += N;
    int s = 0;
    while (a <= b) {
        // a is a right child: add it and step right
        if (a%2 == 1) s += p[a++];
        // b is a left child: add it and step left
        if (b%2 == 0) s += p[b--];
        // move one level up
        a /= 2; b /= 2;
    }
    return s;
}
\end{lstlisting}
The function begins from the bottom of the tree
and moves upwards level by level.
It accumulates the range sum in
the variable $s$ by combining the sums in the tree nodes.
The value of a node is added to the sum if
its parent node does not lie completely inside the range.
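As a worked example, added here for illustration,
consider the call \texttt{sum(2,7)} when $N=8$
and the array is $[5,8,6,3,2,7,2,6]$.
The following table shows the values of $a$ and $b$
at the beginning of each round of the loop,
the node that is added during the round,
and the value of $s$ after the round:

\[
\begin{array}{cccc}
a & b & \textrm{node added} & s \\
\hline
10 & 15 & \textrm{none} & 0 \\
5 & 7 & p[5]=9 & 9 \\
3 & 3 & p[3]=17 & 26 \\
\end{array}
\]

After the third round, $a=2$ and $b=1$, so the loop ends
and the function returns $26=6+3+2+7+2+6$.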
The function \texttt{add} increases the value
of element $k$ by $x$:
\begin{lstlisting}
void add(int k, int x) {
    // update the leaf that corresponds to element k
    k += N;
    p[k] += x;
    // recalculate the nodes on the path to the root
    for (k /= 2; k >= 1; k /= 2) {
        p[k] = p[2*k]+p[2*k+1];
    }
}
\end{lstlisting}
First the function updates the bottom level
of the tree that corresponds to the original array.
After this, the function updates the values of all
internal nodes in the tree, until it reaches
the root node of the tree.
Both operations in the segment tree work
in $O(\log n)$ time because a segment tree
of $n$ elements consists of $O(\log n)$ levels,
and the operations move one level higher in the tree at each step.
\subsubsection{Other queries}
Besides the sum query,
the segment tree can support any range query
where the answer for range $[a,b]$
can be efficiently calculated
from the answers for ranges $[a,c]$ and $[c+1,b]$ where
$c$ is some index between $a$ and $b$.
Such queries are, for example,
minimum and maximum, greatest common divisor,
and the bit operations and, or and xor.
\begin{samepage}
For example, the following segment tree
supports minimum queries:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node[anchor=center] at (0.5, 0.5) {5};
\node[anchor=center] at (1.5, 0.5) {8};
\node[anchor=center] at (2.5, 0.5) {6};
\node[anchor=center] at (3.5, 0.5) {3};
\node[anchor=center] at (4.5, 0.5) {1};
\node[anchor=center] at (5.5, 0.5) {7};
\node[anchor=center] at (6.5, 0.5) {2};
\node[anchor=center] at (7.5, 0.5) {6};
\node[draw, circle,minimum size=22pt] (a) at (1,2.5) {5};
\path[draw,thick,-] (a) -- (0.5,1);
\path[draw,thick,-] (a) -- (1.5,1);
\node[draw, circle,minimum size=22pt] (b) at (3,2.5) {3};
\path[draw,thick,-] (b) -- (2.5,1);
\path[draw,thick,-] (b) -- (3.5,1);
\node[draw, circle,minimum size=22pt] (c) at (5,2.5) {1};
\path[draw,thick,-] (c) -- (4.5,1);
\path[draw,thick,-] (c) -- (5.5,1);
\node[draw, circle,minimum size=22pt] (d) at (7,2.5) {2};
\path[draw,thick,-] (d) -- (6.5,1);
\path[draw,thick,-] (d) -- (7.5,1);
\node[draw, circle,minimum size=22pt] (i) at (2,4.5) {3};
\path[draw,thick,-] (i) -- (a);
\path[draw,thick,-] (i) -- (b);
\node[draw, circle,minimum size=22pt] (j) at (6,4.5) {1};
\path[draw,thick,-] (j) -- (c);
\path[draw,thick,-] (j) -- (d);
\node[draw, circle,minimum size=22pt] (m) at (4,6.5) {1};
\path[draw,thick,-] (m) -- (i);
\path[draw,thick,-] (m) -- (j);
\end{tikzpicture}
\end{center}
\end{samepage}
In this segment tree, every node in the tree
contains the smallest element in the corresponding
range of the original array.
The top node of the tree contains the smallest
element in the array.
The operations can be implemented as before,
except that minima are calculated instead of sums.
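The following sketch shows one possible way to adapt
the operations to minimum queries.
The names \texttt{getMin} and \texttt{change}
and the fixed size $N=8$ are hypothetical choices
made for this illustration.
Here \texttt{change} sets the value of element $k$ to $x$
instead of increasing it, and the tree is assumed
to be filled before any query is made.

\begin{lstlisting}
#include <algorithm>
#include <cassert>
#include <climits>

const int N = 8;   // number of leaves, a power of two
int p[2*N];        // minimum segment tree, root at index 1

// returns the smallest element in range [a,b]
int getMin(int a, int b) {
    assert(0 <= a && a <= b && b < N);
    a += N; b += N;
    int m = INT_MAX;
    while (a <= b) {
        if (a%2 == 1) m = std::min(m, p[a++]);
        if (b%2 == 0) m = std::min(m, p[b--]);
        a /= 2; b /= 2;
    }
    return m;
}

// sets the value of element k to x
void change(int k, int x) {
    k += N;
    p[k] = x;
    // recalculate the minima on the path to the root
    for (k /= 2; k >= 1; k /= 2) {
        p[k] = std::min(p[2*k], p[2*k+1]);
    }
}
\end{lstlisting}

The only differences to the sum version are the initial value
of the result and the way two values are combined.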
\subsubsection{Binary search in tree}
The structure of the segment tree makes it possible
to use binary search.
For example, if the tree supports the minimum query,
we can find the index of the smallest
element in $O(\log n)$ time.
In the following tree, the
smallest element is 1, which can be found
by following a path downwards from the top node,
always moving to a child whose value equals the minimum:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (8,0) grid (16,1);
\node[anchor=center] at (8.5, 0.5) {9};
\node[anchor=center] at (9.5, 0.5) {5};
\node[anchor=center] at (10.5, 0.5) {7};
\node[anchor=center] at (11.5, 0.5) {1};
\node[anchor=center] at (12.5, 0.5) {6};
\node[anchor=center] at (13.5, 0.5) {2};
\node[anchor=center] at (14.5, 0.5) {3};
\node[anchor=center] at (15.5, 0.5) {2};
%\node[anchor=center] at (1,2.5) {13};
\node[draw, circle,minimum size=22pt] (e) at (9,2.5) {5};
\path[draw,thick,-] (e) -- (8.5,1);
\path[draw,thick,-] (e) -- (9.5,1);
\node[draw, circle,minimum size=22pt] (f) at (11,2.5) {1};
\path[draw,thick,-] (f) -- (10.5,1);
\path[draw,thick,-] (f) -- (11.5,1);
\node[draw, circle,minimum size=22pt] (g) at (13,2.5) {2};
\path[draw,thick,-] (g) -- (12.5,1);
\path[draw,thick,-] (g) -- (13.5,1);
\node[draw, circle,minimum size=22pt] (h) at (15,2.5) {2};
\path[draw,thick,-] (h) -- (14.5,1);
\path[draw,thick,-] (h) -- (15.5,1);
\node[draw, circle,minimum size=22pt] (k) at (10,4.5) {1};
\path[draw,thick,-] (k) -- (e);
\path[draw,thick,-] (k) -- (f);
\node[draw, circle,minimum size=22pt] (l) at (14,4.5) {2};
\path[draw,thick,-] (l) -- (g);
\path[draw,thick,-] (l) -- (h);
\node[draw, circle,minimum size=22pt] (n) at (12,6.5) {1};
\path[draw,thick,-] (n) -- (k);
\path[draw,thick,-] (n) -- (l);
\path[draw=red,thick,->,line width=2pt] (n) -- (k);
\path[draw=red,thick,->,line width=2pt] (k) -- (f);
\path[draw=red,thick,->,line width=2pt] (f) -- (11.5,1);
\end{tikzpicture}
\end{center}
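The descent shown in the picture can be sketched as follows,
assuming that the minimum segment tree is stored
in an array \texttt{p} as described earlier.
The names \texttt{build} and \texttt{minIndex} are hypothetical,
and ties are resolved in favor of the left child,
so the function returns the leftmost position of the minimum.

\begin{lstlisting}
#include <algorithm>
#include <cassert>
#include <vector>

const int N = 8;   // number of leaves, a power of two
int p[2*N];        // minimum segment tree, root at index 1

// fills the tree from an array of exactly N elements
void build(const std::vector<int>& v) {
    assert((int)v.size() == N);
    for (int i = 0; i < N; i++) p[N+i] = v[i];
    for (int k = N-1; k >= 1; k--) {
        p[k] = std::min(p[2*k], p[2*k+1]);
    }
}

// returns the index (0..N-1) of the leftmost smallest element
int minIndex() {
    int k = 1;
    while (k < N) {
        // the minimum of node k is in the left child
        // exactly when the two values agree
        if (p[2*k] == p[k]) k = 2*k;
        else k = 2*k+1;
    }
    return k-N;
}
\end{lstlisting}

For the tree in the picture, the function returns 3,
i.e., the fourth element of the array.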
\section{Additional techniques}
\subsubsection{Index compression}
A limitation in data structures that have
been built upon an array is that
the elements are indexed using integers
$1,2,3,$ etc.
Difficulties arise when the indices
needed are large.
For example, using the index $10^9$ would
require the array to contain $10^9$
elements, which is not realistic.
\index{index compression}
However, we can often bypass this limitation
by using \key{index compression}
where the indices are redistributed so that
they are integers $1,2,3,$ etc.
This can be done if we know all the indices
needed during the algorithm beforehand.
The idea is to replace each original index $x$
with index $p(x)$ where $p$ is a function that
redistributes the indices.
We require that the order of the indices
doesn't change, so if $a<b$, then $p(a)<p(b)$.
Thanks to this, we can conveniently perform queries
despite the fact that the indices are compressed.
For example, if the original indices are
$555$, $10^9$ and $8$, the new indices are:
\[
\begin{array}{lcl}
p(8) & = & 1 \\
p(555) & = & 2 \\
p(10^9) & = & 3 \\
\end{array}
\]
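One common way to implement the function $p$ is to sort
the known indices, remove duplicates and use binary search.
The following is a sketch under these assumptions;
the names \texttt{prepare} and \texttt{compress} are hypothetical
and not part of any library.

\begin{lstlisting}
#include <algorithm>
#include <cassert>
#include <vector>

// returns the distinct original indices in sorted order
std::vector<long long> prepare(std::vector<long long> xs) {
    std::sort(xs.begin(), xs.end());
    xs.erase(std::unique(xs.begin(), xs.end()), xs.end());
    return xs;
}

// returns the compressed index p(x) between 1 and n,
// x must be one of the indices given to prepare
int compress(const std::vector<long long>& s, long long x) {
    auto it = std::lower_bound(s.begin(), s.end(), x);
    assert(it != s.end() && *it == x);
    return (int)(it-s.begin())+1;
}
\end{lstlisting}

Since the indices are sorted before they are numbered,
the order of the indices is preserved, and each call to
\texttt{compress} takes $O(\log n)$ time.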
\subsubsection{Range update}
So far, we have implemented data structures
that support range queries and modifications
of single values.
Let us now consider the opposite situation
where we should update ranges and
retrieve single values.
We focus on an operation that increases all
elements in range $[a,b]$ by $x$.
Surprisingly, the data structures
presented in this chapter can also be used in this situation.
This requires that we change the array so that
each element indicates the \emph{change}
with respect to the previous element.
For example, the array
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$3$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$1$};
\node at (3.5,0.5) {$1$};
\node at (4.5,0.5) {$1$};
\node at (5.5,0.5) {$5$};
\node at (6.5,0.5) {$2$};
\node at (7.5,0.5) {$2$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
becomes as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$3$};
\node at (1.5,0.5) {$0$};
\node at (2.5,0.5) {$-2$};
\node at (3.5,0.5) {$0$};
\node at (4.5,0.5) {$0$};
\node at (5.5,0.5) {$4$};
\node at (6.5,0.5) {$-3$};
\node at (7.5,0.5) {$0$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
The original array is the sum array of the new array.
Thus, any value in the original array corresponds
to a sum of elements in the new array.
For example, the value 5 at index 6 in the original array
corresponds to the sum $3+0-2+0+0+4=5$
of the first six elements in the new array.
The benefit in using the new array is
that we can update a range by changing just
two elements in the new array.
For example, if we want to
increase the elements in the range $[2,5]$ by 5,
it suffices to increase the element at index 2 by 5
and decrease the element at index 6 by 5.
The result is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$3$};
\node at (1.5,0.5) {$5$};
\node at (2.5,0.5) {$-2$};
\node at (3.5,0.5) {$0$};
\node at (4.5,0.5) {$0$};
\node at (5.5,0.5) {$-1$};
\node at (6.5,0.5) {$-3$};
\node at (7.5,0.5) {$0$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
More generally, to increase the elements in the range
$[a,b]$ by $x$,
we increase the element at index $a$ by $x$
and decrease the element at index $b+1$ by $x$.
Retrieving a single value requires calculating
the sum in a range, and an update changes single values,
so we can use a binary indexed tree or a segment tree.
A more difficult problem is to support both
range queries and range updates.
In Chapter 28 we will see that this is possible
as well.
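The technique for range updates and single-value queries
can be sketched as follows using a binary indexed tree
built on the array of changes.
The names \texttt{tree}, \texttt{add}, \texttt{value}
and \texttt{increase} and the fixed size $N=8$ are assumptions
made for this illustration, and the indices are between $1$ and $N$.

\begin{lstlisting}
#include <cassert>

const int N = 8;
long long tree[N+1];   // binary indexed tree on the changes

// increases the change at index k by x
void add(int k, long long x) {
    for (; k <= N; k += k&-k) tree[k] += x;
}

// returns the current value at index k,
// i.e., the sum of the changes at indices 1..k
long long value(int k) {
    long long s = 0;
    for (; k >= 1; k -= k&-k) s += tree[k];
    return s;
}

// increases all elements in range [a,b] by x
void increase(int a, int b, long long x) {
    assert(1 <= a && a <= b && b <= N);
    add(a, x);
    // nothing needs to be decreased if b is the last index
    if (b+1 <= N) add(b+1, -x);
}
\end{lstlisting}

Both \texttt{increase} and \texttt{value} work in $O(\log n)$ time.
If the array initially contains nonzero values, they can be
inserted by calling \texttt{increase(i,i,x)} for each index $i$
whose initial value is $x$.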