Corrections

Antti H S Laaksonen 2017-02-14 21:01:22 +02:00
parent 64fc16a2dc
commit 9854d9d6ea
1 changed file with 166 additions and 154 deletions

@ -42,14 +42,14 @@ For example, consider the range $[4,7]$ in the following array:
In this range, the sum of elements is $4+6+1+3=16$, In this range, the sum of elements is $4+6+1+3=16$,
the minimum element is 1 and the maximum element is 6. the minimum element is 1 and the maximum element is 6.
A simple way to process range queries is to
An easy way to process range queries is go through all elements in the range.
to go through all the elements in the range. For example, the following function \texttt{rsq}
For example, we can calculate the sum calculates the sum of elements in any range
in a range $[a,b]$ as follows: $[a,b]$ of an array $t$:
\begin{lstlisting} \begin{lstlisting}
int sum(int a, int b) { int rsq(int a, int b) {
int s = 0; int s = 0;
for (int i = a; i <= b; i++) { for (int i = a; i <= b; i++) {
s += t[i]; s += t[i];
@ -58,36 +58,39 @@ int sum(int a, int b) {
} }
\end{lstlisting} \end{lstlisting}
The above function works in $O(n)$ time. The above function works in $O(n)$ time,
However, if the array is large and there are several queries, where $n$ is the number of elements in the array.
such an approach is slow. Thus, we can process $q$ queries in $O(nq)$
time using the function.
If both $n$ and $q$ are large, this approach
is slow.
In this chapter, we will learn how In this chapter, we will learn how
range queries can be processed much more efficiently. range queries can be processed much more efficiently.
\section{Static array queries} \section{Static array queries}
We first focus on a simple situation where We first focus on a situation where
the array is \key{static}, i.e., the array is \key{static}, i.e.,
the elements never change between the queries. the elements are never modified between the queries.
In this case, it suffices to preprocess the In this case, it suffices to construct
array and construct a data structure that tells us
a data structure that can be used for the answer for any possible range query efficiently.
finding the answer for
any possible range query efficiently.
\subsubsection{Sum query} \subsubsection{Sum queries}
\index{prefix sum array} \index{sum array}
Sum queries can be processed efficiently Let $\textrm{rsq}(a,b)$ (``range sum query'') be the sum of
by constructing a \key{sum array} elements in the range $[a,b]$ of an array.
that contains the sum of elements in the range $[1,k]$ Our first task is to find a way to calculate any value of $\textrm{rsq}(a,b)$
for each $k=1,2,\ldots,n$. efficiently.
Using the sum array, the sum of elements in It turns out that there is a simple data structure
any range $[a,b]$ of the original array can that we can use: a \key{sum array}.
be calculated in $O(1)$ time. Such an array contains all values of the form
$\textrm{rsq}(1,k)$ where $1 \le k \le n$,
i.e., for each $k$ the sum of the first $k$ elements of the array.
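The definition above can be sketched in code as follows. This is only an illustration under stated assumptions: the array is one-indexed with an unused element at index 0, and the names \texttt{build\_sum\_array} and the three-argument \texttt{rsq} are hypothetical helpers that do not appear in the original text.

```cpp
#include <cassert>
#include <vector>

// Builds the sum array for a one-indexed array t (t[0] is unused padding).
// s[k] holds rsq(1,k); s[0] = 0 plays the role of rsq(1,0).
std::vector<long long> build_sum_array(const std::vector<int>& t) {
    std::vector<long long> s(t.size(), 0);
    for (std::size_t k = 1; k < t.size(); k++) {
        s[k] = s[k-1] + t[k];
    }
    return s;
}

// Returns rsq(a,b) = rsq(1,b) - rsq(1,a-1) using two values of the sum array.
long long rsq(const std::vector<long long>& s, int a, int b) {
    assert(1 <= a && a <= b && b < (int)s.size());
    return s[b] - s[a-1];
}
```

Building the sum array takes $O(n)$ time, and after that each query reads only two values. For the array $1,3,4,8,6,1,4,2$, the call \texttt{rsq(s,4,7)} computes $27-8=19$.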
For example, for the array For example, consider the following array:
\begin{center} \begin{center}
\begin{tikzpicture}[scale=0.7] \begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1); %\fill[color=lightgray] (3,0) rectangle (7,1);
@ -113,7 +116,7 @@ For example, for the array
\node at (7.5,1.4) {$8$}; \node at (7.5,1.4) {$8$};
\end{tikzpicture} \end{tikzpicture}
\end{center} \end{center}
the corresponding sum array is as follows: The corresponding sum array is as follows:
\begin{center} \begin{center}
\begin{tikzpicture}[scale=0.7] \begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1); %\fill[color=lightgray] (3,0) rectangle (7,1);
@ -140,30 +143,13 @@ the corresponding sum array is as follows:
\node at (7.5,1.4) {$8$}; \node at (7.5,1.4) {$8$};
\end{tikzpicture} \end{tikzpicture}
\end{center} \end{center}
The following code constructs a sum array Now we can calculate any value of
\texttt{s} for an array \texttt{t} in $O(n)$ time: $\textrm{rsq}(a,b)$ in $O(1)$ time, because
\begin{lstlisting} \[ \textrm{rsq}(a,b) = \textrm{rsq}(1,b) - \textrm{rsq}(1,a-1).\]
for (int i = 1; i <= n; i++) { It is convenient to define $\textrm{rsq}(1,0)=0$,
s[i] = s[i-1]+t[i]; so that the above formula can also be used when $a=1$.
}
\end{lstlisting}
After this, the following function processes
any sum query in $O(1)$ time:
\begin{lstlisting}
int sum(int a, int b) {
return s[b]-s[a-1];
}
\end{lstlisting}
The function calculates the sum in the range $[a,b]$ For example, consider the range $[4,7]$:
by subtracting the sum in the range $[1,a-1]$
from the sum in the range $[1,b]$.
Thus, only two values of the sum array
are needed, and the query takes $O(1)$ time.
Note that because of the one-based indexing,
the function also works when $a=1$ if $\texttt{s}[0]=0$.
As an example, consider the range $[4,7]$:
\begin{center} \begin{center}
\begin{tikzpicture}[scale=0.7] \begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (3,0) rectangle (7,1); \fill[color=lightgray] (3,0) rectangle (7,1);
@ -190,8 +176,8 @@ As an example, consider the range $[4,7]$:
\end{tikzpicture} \end{tikzpicture}
\end{center} \end{center}
The sum in the range is $8+6+1+4=19$. The sum in the range is $8+6+1+4=19$.
This can be calculated using the precalculated This sum can be calculated using
sums for the ranges $[1,3]$ and $[1,7]$: two values in the sum array:
\begin{center} \begin{center}
\begin{tikzpicture}[scale=0.7] \begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (2,0) rectangle (3,1); \fill[color=lightgray] (2,0) rectangle (3,1);
@ -251,18 +237,20 @@ where $S(X)$ denotes the sum of a rectangular
subarray from the upper-left corner subarray from the upper-left corner
to the position of $X$. to the position of $X$.
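The two-dimensional idea can be sketched as follows, assuming a rectangular one-indexed grid whose row 0 and column 0 are zero padding. The names \texttt{Grid}, \texttt{build\_sum\_grid} and \texttt{rect\_sum} are hypothetical and chosen only for this illustration. Each entry of the sum array is the sum of the subarray from the upper-left corner to that position, and a rectangle sum is obtained from four such values by inclusion and exclusion.

```cpp
#include <cassert>
#include <vector>

using Grid = std::vector<std::vector<long long>>;

// Builds the two-dimensional sum array for a one-indexed grid t
// (row 0 and column 0 are unused padding filled with zeros).
// s[y][x] is the sum of the subarray from (1,1) to (y,x).
Grid build_sum_grid(const Grid& t) {
    Grid s(t.size(), std::vector<long long>(t.empty() ? 0 : t[0].size(), 0));
    for (std::size_t y = 1; y < t.size(); y++) {
        for (std::size_t x = 1; x < t[y].size(); x++) {
            s[y][x] = t[y][x] + s[y-1][x] + s[y][x-1] - s[y-1][x-1];
        }
    }
    return s;
}

// Sum of the rectangle with corners (y1,x1) and (y2,x2), both inclusive.
long long rect_sum(const Grid& s, int y1, int x1, int y2, int x2) {
    assert(1 <= y1 && y1 <= y2 && 1 <= x1 && x1 <= x2);
    return s[y2][x2] - s[y1-1][x2] - s[y2][x1-1] + s[y1-1][x1-1];
}
```

The term \texttt{s[y1-1][x1-1]} is added back because the two subtracted subarrays overlap in the region above and to the left of the rectangle.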
\subsubsection{Minimum query} \subsubsection{Minimum queries}
It is also possible to process minimum queries Let $\textrm{rmq}(a,b)$ (``range minimum query'') be the
in $O(1)$ time after preprocessing, though it is minimum element in the range $[a,b]$ of an array.
more difficult than processing sum queries. It is also possible to process minimum queries
in $O(1)$ time, though it is more difficult than
processing sum queries.
Note that minimum and maximum queries can always Note that minimum and maximum queries can always
be implemented using same techniques, be processed using similar techniques,
so it suffices to focus on minimum queries. so it suffices to focus on minimum queries.
The idea is to precalculate the minimum element of each range The idea is to precalculate all values $\textrm{rmq}(a,b)$
of size $2^k$ in the array. where $b-a+1$, the length of the range, is a power of two.
For example, in the array For example, for the array
\begin{center} \begin{center}
\begin{tikzpicture}[scale=0.7] \begin{tikzpicture}[scale=0.7]
@ -288,74 +276,73 @@ For example, in the array
\node at (7.5,1.4) {$8$}; \node at (7.5,1.4) {$8$};
\end{tikzpicture} \end{tikzpicture}
\end{center} \end{center}
the following minima will be calculated: the following values will be calculated:
\begin{center} \begin{center}
\begin{tabular}{ccc} \begin{tabular}{ccc}
\begin{tabular}{ccc} \begin{tabular}{ccc}
range & size & min \\ $a$ & $b$ & $\textrm{rmq}(a,b)$ \\
\hline \hline
$[1,1]$ & 1 & 1 \\ 1 & 1 & 1 \\
$[2,2]$ & 1 & 3 \\ 2 & 2 & 3 \\
$[3,3]$ & 1 & 4 \\ 3 & 3 & 4 \\
$[4,4]$ & 1 & 8 \\ 4 & 4 & 8 \\
$[5,5]$ & 1 & 6 \\ 5 & 5 & 6 \\
$[6,6]$ & 1 & 1 \\ 6 & 6 & 1 \\
$[7,7]$ & 1 & 4 \\ 7 & 7 & 4 \\
$[8,8]$ & 1 & 2 \\ 8 & 8 & 2 \\
\end{tabular} \end{tabular}
& &
\begin{tabular}{ccc} \begin{tabular}{ccc}
range & size & min \\ $a$ & $b$ & $\textrm{rmq}(a,b)$ \\
\hline \hline
$[1,2]$ & 2 & 1 \\ 1 & 2 & 1 \\
$[2,3]$ & 2 & 3 \\ 2 & 3 & 3 \\
$[3,4]$ & 2 & 4 \\ 3 & 4 & 4 \\
$[4,5]$ & 2 & 6 \\ 4 & 5 & 6 \\
$[5,6]$ & 2 & 1 \\ 5 & 6 & 1 \\
$[6,7]$ & 2 & 1 \\ 6 & 7 & 1 \\
$[7,8]$ & 2 & 2 \\ 7 & 8 & 2 \\
\\ \\
\end{tabular} \end{tabular}
& &
\begin{tabular}{ccc} \begin{tabular}{ccc}
range & size & min \\ $a$ & $b$ & $\textrm{rmq}(a,b)$ \\
\hline \hline
$[1,4]$ & 4 & 1 \\ 1 & 4 & 1 \\
$[2,5]$ & 4 & 3 \\ 2 & 5 & 3 \\
$[3,6]$ & 4 & 1 \\ 3 & 6 & 1 \\
$[4,7]$ & 4 & 1 \\ 4 & 7 & 1 \\
$[5,8]$ & 4 & 1 \\ 5 & 8 & 1 \\
$[1,8]$ & 8 & 1 \\ 1 & 8 & 1 \\
\\ \\
\\ \\
\end{tabular} \end{tabular}
\end{tabular} \end{tabular}
\end{center} \end{center}
There are $O(n \log n)$ ranges of size $2^k$, The number of precalculated values is $O(n \log n)$,
because for each array position, because there are $O(\log n)$ range lengths
there are $O(\log n)$ ranges that begin at that position. that are powers of two.
The minima in all ranges of size $2^k$ can be calculated In addition, the values can be calculated efficiently
in $O(n \log n)$ time, because each range of size $2^k$ using the recursive formula
consists of two ranges of size $2^{k-1}$ and the minima \[\textrm{rmq}(a,b) = \min(\textrm{rmq}(a,a+w-1),\textrm{rmq}(a+w,b)),\]
can be calculated recursively. where $b-a+1$ is a power of two and $w=(b-a+1)/2$.
Calculating all those values takes $O(n \log n)$ time.
After this, the minimum in any range $[a,b]$ After this, any value of $\textrm{rmq}(a,b)$ can be calculated
can be calculated in $O(1)$ time as a minimum of in $O(1)$ time as a minimum of two precalculated values.
two ranges of size $2^k$ where $k=\lfloor \log_2(b-a+1) \rfloor$. Let $k$ be the largest power of two that does not exceed $b-a+1$.
The first range begins at index $a$, We can calculate the value of $\textrm{rmq}(a,b)$ using the formula
and the second range ends at index $b$. \[\textrm{rmq}(a,b) = \min(\textrm{rmq}(a,a+k-1),\textrm{rmq}(b-k+1,b)).\]
The parameter $k$ is chosen so that In the above formula, the range $[a,b]$ is represented
the two ranges of size $2^k$ as the union of the ranges $[a,a+k-1]$ and $[b-k+1,b]$, both of length $k$.
fully cover the range $[a,b]$.
As an example, consider the range $[2,7]$: As an example, consider the range $[2,7]$:
\begin{center} \begin{center}
@ -384,10 +371,10 @@ As an example, consider the range $[2,7]$:
\end{tikzpicture} \end{tikzpicture}
\end{center} \end{center}
The length of the range is 6, The length of the range is 6,
and $\lfloor \log_2(6) \rfloor = 2$. and the largest power of two that does
Thus, the minimum can be calculated not exceed 6 is 4.
from two ranges of length 4. Thus the range $[2,7]$ is
The ranges are $[2,5]$ and $[4,7]$: the union of the ranges $[2,5]$ and $[4,7]$:
\begin{center} \begin{center}
\begin{tikzpicture}[scale=0.7] \begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (1,0) rectangle (5,1); \fill[color=lightgray] (1,0) rectangle (5,1);
@ -439,9 +426,8 @@ The ranges are $[2,5]$ and $[4,7]$:
\node at (7.5,1.4) {$8$}; \node at (7.5,1.4) {$8$};
\end{tikzpicture} \end{tikzpicture}
\end{center} \end{center}
Since the minimum in the range $[2,5]$ is 3 Since $\textrm{rmq}(2,5)=3$ and $\textrm{rmq}(4,7)=1$,
and the minimum in the range $[4,7]$ is 1, we can conclude that $\textrm{rmq}(2,7)=1$.
we know that the minimum in the range $[2,7]$ is 1.
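The precalculation and the query can be sketched together as follows. The structure name \texttt{SparseTable} and its layout are assumptions made for this illustration: \texttt{m[j][a]} stores the minimum of the range of length $2^j$ that begins at position $a$, and the input array is one-indexed with an unused element at index 0. Note that this sketch finds the power of two with a short loop, which takes $O(\log n)$ steps; a precalculated table of logarithms would be needed to make the query truly $O(1)$.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Precalculated minima over a one-indexed array t (t[0] is unused padding).
// m[j][a] holds rmq(a, a+2^j-1).
struct SparseTable {
    std::vector<std::vector<int>> m;

    explicit SparseTable(const std::vector<int>& t) {
        int n = (int)t.size() - 1;
        m.push_back(t);  // ranges of length 1
        for (int j = 1; (1 << j) <= n; j++) {
            int w = 1 << (j-1);  // half of the range length
            m.push_back(std::vector<int>(n + 1));
            for (int a = 1; a + 2*w - 1 <= n; a++) {
                // A range of length 2w consists of two ranges of length w.
                m[j][a] = std::min(m[j-1][a], m[j-1][a+w]);
            }
        }
    }

    // Returns rmq(a,b) as the minimum of two overlapping ranges of length k,
    // where k is the largest power of two that does not exceed b-a+1.
    int rmq(int a, int b) const {
        assert(1 <= a && a <= b && b < (int)m[0].size());
        int j = 0;
        while ((1 << (j+1)) <= b - a + 1) j++;
        int k = 1 << j;
        return std::min(m[j][a], m[j][b-k+1]);
    }
};
```

For the array $1,3,4,8,6,1,4,2$, the call \texttt{rmq(2,7)} uses $k=4$ and returns $\min(3,1)=1$.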
\section{Binary indexed tree} \section{Binary indexed tree}
@ -449,29 +435,26 @@ we know that the minimum in the range $[2,7]$ is 1.
\index{Fenwick tree} \index{Fenwick tree}
A \key{binary indexed tree} or \key{Fenwick tree} A \key{binary indexed tree} or \key{Fenwick tree}
can be seen as a dynamic version of a sum array. can be seen as a dynamic variant of a sum array.
The tree supports two $O(\log n)$ time operations: This data structure supports two $O(\log n)$ time operations:
calculating the sum of elements in a range, calculating the sum of elements in a range
and modifying the value of an element. and modifying the value of an element.
The benefit in using a binary indexed tree is The advantage of a binary indexed tree is
that the elements of the underlying array that it allows us to efficiently update
can be efficiently updated between the queries. the array between the sum queries.
This would not be possible with a sum array, This would not be possible using a sum array,
because after each update, we would have to build the because after each update, we would have to build the
whole sum array again in $O(n)$ time. whole sum array again in $O(n)$ time.
\subsubsection{Structure} \subsubsection{Structure}
Given an array of $n$ elements, indexed $1 \ldots n$, A binary indexed tree can be represented as an array
the binary indexed tree for that array in which each value is the sum of elements in a range.
is an array such that the value at position $k$ More precisely, the value at position $x$ is $\textrm{rsq}(x-k+1,x)$,
equals the sum of elements in the original array in a range where $k$ is the largest power of two that divides $x$.
that ends at position $k$. For example, if $x=6$, then $k=2$, because 2 divides 6
The length of the range is the largest power of two but 4 does not divide 6.
that divides $k$.
For example, if $k=6$, the length of the range is $2$,
because $2$ divides $6$ but $4$ does not divide $6$.
\begin{samepage} \begin{samepage}
For example, consider the following array: For example, consider the following array:
@ -500,7 +483,43 @@ For example, consider the following array:
\end{tikzpicture} \end{tikzpicture}
\end{center} \end{center}
\end{samepage} \end{samepage}
\begin{samepage}
The corresponding binary indexed tree is as follows: The corresponding binary indexed tree is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$16$};
\node at (4.5,0.5) {$6$};
\node at (5.5,0.5) {$7$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$29$};
\footnotesize
\node at (0.5,1.4) {$1$};
\node at (1.5,1.4) {$2$};
\node at (2.5,1.4) {$3$};
\node at (3.5,1.4) {$4$};
\node at (4.5,1.4) {$5$};
\node at (5.5,1.4) {$6$};
\node at (6.5,1.4) {$7$};
\node at (7.5,1.4) {$8$};
\end{tikzpicture}
\end{center}
\end{samepage}
For example, the value at position 6
in the binary indexed tree is 7,
because the sum of elements in the range $[5,6]$
of the array is $6+1=7$.
The following picture shows more clearly
how each value in the binary indexed tree
corresponds to a range in the array:
\begin{center} \begin{center}
\begin{tikzpicture}[scale=0.7] \begin{tikzpicture}[scale=0.7]
%\fill[color=lightgray] (3,0) rectangle (7,1); %\fill[color=lightgray] (3,0) rectangle (7,1);
@ -545,18 +564,16 @@ The corresponding binary indexed tree is as follows:
\end{tikzpicture} \end{tikzpicture}
\end{center} \end{center}
For example, the value at position 6
in the binary indexed tree is 7,
because the sum of elements in the range $[5,6]$
in the original array is $6+1=7$.
\subsubsection{Sum query} \subsubsection{Sum query}
The basic operation in a binary indexed tree is The values in the binary indexed tree
to calculate the sum of elements in a range $[1,k]$, can be used to efficiently calculate
where $k$ is any position in the array. any value of $\textrm{rsq}(1,k)$:
The sum of such a range can be calculated as a the sum of elements in the range $[1,k]$
sum of one or more values stored in the tree. of the array.
It turns out that any range $[1,k]$
can be divided into $O(\log n)$ ranges
whose sums are available in the binary indexed tree.
For example, the range $[1,7]$ corresponds to For example, the range $[1,7]$ corresponds to
the following values: the following values:
@ -605,22 +622,16 @@ the following values:
\end{center} \end{center}
Hence, the sum of elements in the range $[1,7]$ is $16+7+4=27$. Hence, the sum of elements in the range $[1,7]$ is $16+7+4=27$.
The structure of the binary indexed tree allows us to calculate
the sum of elements in any range using only $O(\log n)$
values from the tree.
Using the same technique that we previously used To calculate the value of $\textrm{rsq}(a,b)$,
with a sum array, we can use the same trick that we used with sum arrays:
we can efficiently calculate the sum of any range \[ \textrm{rsq}(a,b) = \textrm{rsq}(1,b) - \textrm{rsq}(1,a-1).\]
$[a,b]$ by subtracting the sum of the range $[1,a-1]$ Also in this case, only $O(\log n)$ values are needed.
from the sum of the range $[1,b]$.
Also here, only $O(\log n)$ values are needed,
because it suffices to calculate two sums of $[1,k]$ ranges.
\subsubsection{Array update} \subsubsection{Array update}
When an element in the original array changes, When a value in the array is updated,
several sums in the binary indexed tree change. several values in the binary indexed tree should be updated.
For example, if the element at position 3 changes, For example, if the element at position 3 changes,
the sums of the following ranges change: the sums of the following ranges change:
\begin{center} \begin{center}
@ -667,24 +678,25 @@ the sums of the following ranges change:
\end{tikzpicture} \end{tikzpicture}
\end{center} \end{center}
However, it turns out that Since each array element belongs to $O(\log n)$
the number of values that need to be updated ranges in the binary indexed tree,
in the binary indexed tree is only $O(\log n)$. it suffices to update $O(\log n)$ values.
\subsubsection{Implementation} \subsubsection{Implementation}
The operations of a binary indexed tree can be implemented The operations of a binary indexed tree can be implemented
in an elegant and efficient way using bit operations. in an elegant and efficient way using bit operations.
The key fact needed is that $k \& -k$ The key fact needed is that $k \& -k$
isolates the last one bit in a number $k$. isolates the last one bit of a number $k$.
For example, $6 \& -6=2$ because the number $6$ For example, $6 \& -6=2$ because the number $6$
corresponds to 110 and the number $2$ corresponds to 10. corresponds to 110 and the number $2$ corresponds to 10.
It turns out that when processing a range query, It turns out that when processing a sum query,
the position $k$ in the binary indexed tree should be the position $k$ in the binary indexed tree needs to be
decreased by $k \& -k$ at every step, decreased by $k \& -k$ at every step,
and when updating the array, and when updating the array,
the position $k$ should be increased by $k \& -k$ at every step. the position $k$ needs to be increased by $k \& -k$ at every step.
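To make the two walks concrete, the following sketch lists the positions that are visited in each case. The helper names \texttt{query\_positions} and \texttt{update\_positions} are hypothetical and serve only to illustrate the effect of $k \& -k$; they are not part of the data structure itself.

```cpp
#include <cassert>
#include <vector>

// Positions read when calculating rsq(1,k): k decreases by k & -k.
std::vector<int> query_positions(int k) {
    assert(k >= 0);
    std::vector<int> p;
    while (k >= 1) {
        p.push_back(k);
        k -= k & -k;
    }
    return p;
}

// Positions written when the element at position k changes in an
// array of n elements: k increases by k & -k.
std::vector<int> update_positions(int k, int n) {
    assert(k >= 1);
    std::vector<int> p;
    while (k <= n) {
        p.push_back(k);
        k += k & -k;
    }
    return p;
}
```

For $k=7$, the query visits positions 7, 6 and 4, which matches the sum $16+7+4=27$ for the range $[1,7]$. For an update at position 3 in an array of 8 elements, the positions are 3, 4 and 8.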
Suppose that the binary indexed tree is stored in an array \texttt{b}. Suppose that the binary indexed tree is stored in an array \texttt{b}.
The following function calculates The following function calculates
@ -714,7 +726,7 @@ void add(int k, int x) {
The time complexity of both functions is The time complexity of both functions is
$O(\log n)$, because the functions access $O(\log n)$ $O(\log n)$, because the functions access $O(\log n)$
values in the binary indexed tree, and each transition values in the binary indexed tree, and each move
to the next position to the next position
takes $O(1)$ time using bit operations. takes $O(1)$ time using bit operations.
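As a self-contained sketch, the two operations can be wrapped into one structure and combined with the subtraction trick for arbitrary ranges. The structure name \texttt{BinaryIndexedTree} and the method \texttt{rsq} are assumptions made for this illustration; \texttt{sum}, \texttt{add} and the array \texttt{b} follow the names used in the text. The tree is built here by adding the elements one at a time, which takes $O(n \log n)$ time.

```cpp
#include <cassert>
#include <vector>

// Binary indexed tree over positions 1..n; b[0] is unused.
struct BinaryIndexedTree {
    std::vector<long long> b;

    explicit BinaryIndexedTree(int n) : b(n + 1, 0) {}

    // Increases the element at position k by x.
    void add(int k, long long x) {
        assert(k >= 1);
        while (k < (int)b.size()) {
            b[k] += x;
            k += k & -k;
        }
    }

    // Returns rsq(1,k); the result is 0 when k = 0.
    long long sum(int k) const {
        long long s = 0;
        while (k >= 1) {
            s += b[k];
            k -= k & -k;
        }
        return s;
    }

    // Returns rsq(from,to) as rsq(1,to) - rsq(1,from-1).
    long long rsq(int from, int to) const {
        assert(1 <= from && from <= to);
        return sum(to) - sum(from - 1);
    }
};
```

After adding the elements $1,3,4,8,6,1,4,2$ at positions $1 \ldots 8$, the array \texttt{b} contains the values $1,4,4,16,6,7,4,29$, and \texttt{rsq(4,7)} returns 19.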