Corrections

2017-02-14 21:01:22 +02:00 · 9854d9d6ea
parent 64fc16a2dc
commit 9854d9d6ea
1 changed files with 166 additions and 154 deletions
--- a/luku09.tex
+++ b/luku09.tex
@@ -42,14 +42,14 @@ For example, consider the range $[4,7]$ in the following array:
 In this range, the sum of elements is $4+6+1+3=14$,
 the minimum element is 1 and the maximum element is 6.

-
-An easy way to process range queries is
-to go through all the elements in the range.
-For example, we can calculate the sum
-in a range $[a,b]$ as follows:
+A simple way to process range queries is to
+go through all elements in the range.
+For example, the following function \texttt{rsq}
+calculates the sum of elements in any range
+$[a,b]$ of an array $t$:

 \begin{lstlisting}
-int sum(int a, int b) {
+int rsq(int a, int b) {
    int s = 0;
    for (int i = a; i <= b; i++) {
        s += t[i];
@@ -58,36 +58,39 @@ int sum(int a, int b) {
 }
 \end{lstlisting}

-The above function works in $O(n)$ time.
-However, if the array is large and there are several queries,
-such an approach is slow.
+The above function works in $O(n)$ time,
+where $n$ is the number of elements in the array.
+Thus, we can process $q$ queries in $O(nq)$
+time using the function.
+If both $n$ and $q$ are large, this approach
+is slow.
 In this chapter, we will learn how
 range queries can be processed much more efficiently.

 \section{Static array queries}

-We first focus on a simple situation where
+We first focus on a situation where
 the array is \key{static}, i.e.,
-the elements never change between the queries.
-In this case, it suffices to preprocess the
-array and construct
-a data structure that can be used for
-finding the answer for
-any possible range query efficiently.
+the elements are never modified between the queries.
+In this case, it suffices to construct
+a data structure that tells us
+the answer for any possible range query efficiently.

-\subsubsection{Sum query}
+\subsubsection{Sum queries}

-\index{prefix sum array}
+\index{sum array}

-Sum queries can be processed efficiently
-by constructing a \key{sum array}
-that contains the sum of elements in the range $[1,k]$
-for each $k=1,2,\ldots,n$.
-Using the sum array, the sum of elements in
-any range $[a,b]$ of the original array can
-be calculated in $O(1)$ time.
+Let $\textrm{rsq}(a,b)$ (``range sum query'') be the sum of
+elements in the range $[a,b]$ of an array.
+Our first task is to find a way to calculate any value of $\textrm{rsq}(a,b)$
+efficiently.
+It turns out that there is a simple data structure
+that we can use: a \key{sum array}.
+Such an array contains all values of the form
+$\textrm{rsq}(1,k)$ where $1 \le k \le n$,
+i.e., for each $k$ the sum of the first $k$ elements of the array.

-For example, for the array
+For example, consider the following array:
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
 %\fill[color=lightgray] (3,0) rectangle (7,1);
@@ -113,7 +116,7 @@ For example, for the array
 \node at (7.5,1.4) {$8$};
 \end{tikzpicture}
 \end{center}
-the corresponding sum array is as follows:
+The corresponding sum array is as follows:
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
 %\fill[color=lightgray] (3,0) rectangle (7,1);
@@ -140,30 +143,13 @@ the corresponding sum array is as follows:
 \node at (7.5,1.4) {$8$};
 \end{tikzpicture}
 \end{center}
-The following code constructs a sum array
-\texttt{s} for an array \texttt{t} in $O(n)$ time:
-\begin{lstlisting}
-for (int i = 1; i <= n; i++) {
-    s[i] = s[i-1]+t[i];
-}
-\end{lstlisting}
-After this, the following function processes
-any sum query in $O(1)$ time:
-\begin{lstlisting}
-int sum(int a, int b) {
-    return s[b]-s[a-1];
-}
-\end{lstlisting}
+Now we can calculate any value of
+$\textrm{rsq}(a,b)$ in $O(1)$ time, because
+\[ \textrm{rsq}(a,b) = \textrm{rsq}(1,b) - \textrm{rsq}(1,a-1).\]
+It is convenient to define $\textrm{rsq}(1,0)=0$,
+so that the above formula can also be used when $a=1$.
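
As a minimal sketch of the formula above, the following C++ code builds a sum array and answers a query with two lookups. The names \texttt{build\_sum\_array} and \texttt{range\_sum} are illustrative and do not come from the text, and the input is assumed to be stored one-based with an unused element at index 0.

```cpp
#include <cstddef>
#include <vector>

// Builds s where s[k] = t[1] + ... + t[k] and s[0] = 0.
// Assumption: t is one-based, so t[0] is an unused placeholder.
std::vector<long long> build_sum_array(const std::vector<int>& t) {
    std::vector<long long> s(t.size(), 0);
    for (std::size_t k = 1; k < t.size(); k++) {
        s[k] = s[k-1] + t[k];
    }
    return s;
}

// rsq(a,b) = rsq(1,b) - rsq(1,a-1), valid for 1 <= a <= b <= n.
// The case a = 1 works because s[0] = 0.
long long range_sum(const std::vector<long long>& s, int a, int b) {
    return s[b] - s[a-1];
}
```

Assuming the array is $[1,3,4,8,6,1,4,2]$, which is consistent with the values quoted in this section, \texttt{range\_sum(s, 4, 7)} evaluates $27-8=19$.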

-The function calculates the sum in the range $[a,b]$
-by subtracting the sum in the range $[1,a-1]$
-from the sum in the range $[1,b]$.
-Thus, only two values of the sum array
-are needed, and the query takes $O(1)$ time.
-Note that because of the one-based indexing,
-the function also works when $a=1$ if $\texttt{s}[0]=0$. 
-
-As an example, consider the range $[4,7]$:
+For example, consider the range $[4,7]$:
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
 \fill[color=lightgray] (3,0) rectangle (7,1);
@@ -190,8 +176,8 @@ As an example, consider the range $[4,7]$:
 \end{tikzpicture}
 \end{center}
 The sum in the range is $8+6+1+4=19$.
-This can be calculated using the precalculated
-sums for the ranges $[1,3]$ and $[1,7]$:
+This sum can be calculated using
+two values in the sum array:
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
 \fill[color=lightgray] (2,0) rectangle (3,1);
@@ -251,18 +237,20 @@ where $S(X)$ denotes the sum of a rectangular
 subarray from the upper-left corner
 to the position of $X$.

-\subsubsection{Minimum query}
+\subsubsection{Minimum queries}

-It is also possible to process minimum queries
-in $O(1)$ time after preprocessing, though it is
-more difficult than processing sum queries.
+Let $\textrm{rmq}(a,b)$ (``range minimum query'') be the
+minimum element in the range $[a,b]$ of an array.
+It is also possible to process minimum queries
+in $O(1)$ time after preprocessing, though it is more difficult than
+processing sum queries.
 Note that minimum and maximum queries can always
-be implemented using same techniques,
+be processed using similar techniques,
 so it suffices to focus on minimum queries.

-The idea is to precalculate the minimum element of each range
-of size $2^k$ in the array.
-For example, in the array
+The idea is to precalculate all values $\textrm{rmq}(a,b)$
+where $b-a+1$, the length of the range, is a power of two.
+For example, for the array

 \begin{center}
 \begin{tikzpicture}[scale=0.7]
@@ -288,74 +276,73 @@ For example, in the array
 \node at (7.5,1.4) {$8$};
 \end{tikzpicture}
 \end{center}
-the following minima will be calculated:
+the following values will be calculated:

 \begin{center}
 \begin{tabular}{ccc}

 \begin{tabular}{ccc}
-range & size & min \\
+$a$ & $b$ & $\textrm{rmq}(a,b)$ \\
 \hline
-$[1,1]$ & 1 & 1 \\
-$[2,2]$ & 1 & 3 \\
-$[3,3]$ & 1 & 4 \\
-$[4,4]$ & 1 & 8 \\
-$[5,5]$ & 1 & 6 \\
-$[6,6]$ & 1 & 1 \\
-$[7,7]$ & 1 & 4 \\
-$[8,8]$ & 1 & 2 \\
+1 & 1 & 1 \\
+2 & 2 & 3 \\
+3 & 3 & 4 \\
+4 & 4 & 8 \\
+5 & 5 & 6 \\
+6 & 6 & 1 \\
+7 & 7 & 4 \\
+8 & 8 & 2 \\
 \end{tabular}

 &

 \begin{tabular}{ccc}
-range & size & min \\
+$a$ & $b$ & $\textrm{rmq}(a,b)$ \\
 \hline
-$[1,2]$ & 2 & 1 \\
-$[2,3]$ & 2 & 3 \\
-$[3,4]$ & 2 & 4 \\
-$[4,5]$ & 2 & 6 \\
-$[5,6]$ & 2 & 1 \\
-$[6,7]$ & 2 & 1 \\
-$[7,8]$ & 2 & 2 \\
+1 & 2 & 1 \\
+2 & 3 & 3 \\
+3 & 4 & 4 \\
+4 & 5 & 6 \\
+5 & 6 & 1 \\
+6 & 7 & 1 \\
+7 & 8 & 2 \\
 \\
 \end{tabular}

 &

 \begin{tabular}{ccc}
-range & size & min \\
+$a$ & $b$ & $\textrm{rmq}(a,b)$ \\
 \hline
-$[1,4]$ & 4 & 1 \\
-$[2,5]$ & 4 & 3 \\
-$[3,6]$ & 4 & 1 \\
-$[4,7]$ & 4 & 1 \\
-$[5,8]$ & 4 & 1 \\
-$[1,8]$ & 8 & 1 \\
+1 & 4 & 1 \\
+2 & 5 & 3 \\
+3 & 6 & 1 \\
+4 & 7 & 1 \\
+5 & 8 & 1 \\
+1 & 8 & 1 \\
 \\
 \\
 \end{tabular}

 \end{tabular}
-
 \end{center}

-There are $O(n \log n)$ ranges of size $2^k$,
-because for each array position,
-there are $O(\log n)$ ranges that begin at that position.
-The minima in all ranges of size $2^k$ can be calculated
-in $O(n \log n)$ time, because each range of size $2^k$
-consists of two ranges of size $2^{k-1}$ and the minima
-can be calculated recursively.
+The number of precalculated values is $O(n \log n)$,
+because there are $O(\log n)$ range lengths
+that are powers of two, and at most $n$ ranges of each length.
+In addition, the values can be calculated efficiently
+using the recursive formula
+\[\textrm{rmq}(a,b) = \min(\textrm{rmq}(a,a+w-1),\textrm{rmq}(a+w,b)),\]
+where $b-a+1$ is a power of two and $w=(b-a+1)/2$.
+Calculating all those values takes $O(n \log n)$ time.
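
As a worked instance of the recursive formula, using values from the table above:
\[\textrm{rmq}(1,4) = \min(\textrm{rmq}(1,2),\textrm{rmq}(3,4)) = \min(1,4) = 1,\]
where $w=2$, and
\[\textrm{rmq}(1,8) = \min(\textrm{rmq}(1,4),\textrm{rmq}(5,8)) = \min(1,1) = 1,\]
where $w=4$.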

-After this, the minimum in any range $[a,b]$
-can be calculated in $O(1)$ time as a minimum of
-two ranges of size $2^k$ where $k=\lfloor \log_2(b-a+1) \rfloor$.
-The first range begins at index $a$,
-and the second range ends at index $b$.
-The parameter $k$ is chosen so that
-the two ranges of size $2^k$
-fully cover the range $[a,b]$.
+After this, any value of $\textrm{rmq}(a,b)$ can be calculated
+in $O(1)$ time as a minimum of two precalculated values.
+Let $k$ be the largest power of two that does not exceed $b-a+1$.
+We can calculate the value of $\textrm{rmq}(a,b)$ using the formula
+\[\textrm{rmq}(a,b) = \min(\textrm{rmq}(a,a+k-1),\textrm{rmq}(b-k+1,b)).\]
+In the above formula, the range $[a,b]$ is represented
+as the union of the ranges $[a,a+k-1]$ and $[b-k+1,b]$, both of length $k$.
+The two ranges may overlap, but this does not affect the minimum.
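
The precalculation and the query can be sketched in C++ as follows. This is only one possible layout, not the text's own implementation: the type \texttt{SparseTable} and its member \texttt{m} are hypothetical names, and the array is assumed to be one-based. For simplicity the query finds the power of two with a short loop, so it is not strictly $O(1)$; a precalculated table of logarithms would remove the loop.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper type: m[j][a] holds rmq(a, a + 2^j - 1).
// Assumption: t is one-based, so t[0] is an unused placeholder.
struct SparseTable {
    std::vector<std::vector<int>> m;

    explicit SparseTable(const std::vector<int>& t) {
        int n = (int)t.size() - 1;
        m.push_back(t);  // ranges of length 1
        for (int j = 1; (1 << j) <= n; j++) {
            int w = 1 << (j - 1);  // half of the range length 2^j
            m.push_back(std::vector<int>(n + 1, 0));
            for (int a = 1; a + 2 * w - 1 <= n; a++) {
                // rmq(a,b) = min(rmq(a,a+w-1), rmq(a+w,b))
                m[j][a] = std::min(m[j-1][a], m[j-1][a + w]);
            }
        }
    }

    // Minimum of t[a..b] for 1 <= a <= b <= n.
    int rmq(int a, int b) const {
        int j = 0;
        // After the loop, 2^j is the largest power of two <= b-a+1.
        while ((1 << (j + 1)) <= b - a + 1) j++;
        int k = 1 << j;
        return std::min(m[j][a], m[j][b - k + 1]);
    }
};
```

The table uses $O(n \log n)$ memory, which matches the number of precalculated values stated above.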

 As an example, consider the range $[2,7]$:
 \begin{center}
@@ -384,10 +371,10 @@ As an example, consider the range $[2,7]$:
 \end{tikzpicture}
 \end{center}
 The length of the range is 6,
-and $\lfloor \log_2(6) \rfloor = 2$.
-Thus, the minimum can be calculated
-from two ranges of length 4.
-The ranges are $[2,5]$ and $[4,7]$:
+and the largest power of two that does
+not exceed 6 is 4.
+Thus the range $[2,7]$ is
+the union of the ranges $[2,5]$ and $[4,7]$:
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
 \fill[color=lightgray] (1,0) rectangle (5,1);
@@ -439,9 +426,8 @@ The ranges are $[2,5]$ and $[4,7]$:
 \node at (7.5,1.4) {$8$};
 \end{tikzpicture}
 \end{center}
-Since the minimum in the range $[2,5]$ is 3
-and the minimum in the range $[4,7]$ is 1,
-we know that the minimum in the range $[2,7]$ is 1.
+Since $\textrm{rmq}(2,5)=3$ and $\textrm{rmq}(4,7)=1$,
+we can conclude that $\textrm{rmq}(2,7)=1$.

 \section{Binary indexed tree}

@@ -449,29 +435,26 @@ we know that the minimum in the range $[2,7]$ is 1.
 \index{Fenwick tree}

 A \key{binary indexed tree} or \key{Fenwick tree}
-can be seen as a dynamic version of a sum array.
-The tree supports two $O(\log n)$ time operations:
-calculating the sum of elements in a range,
+can be seen as a dynamic variant of a sum array.
+This data structure supports two $O(\log n)$ time operations:
+calculating the sum of elements in a range
 and modifying the value of an element.

-The benefit in using a binary indexed tree is
-that the elements of the underlying array
-can be efficiently updated between the queries.
-This would not be possible with a sum array,
+The advantage of a binary indexed tree is
+that it allows us to efficiently update
+the array between the sum queries.
+This would not be possible using a sum array,
 because after each update, we should build the
 whole sum array again in $O(n)$ time.

 \subsubsection{Structure}

-Given an array of $n$ elements, indexed $1 \ldots n$,
-the binary indexed tree for that array
-is an array such that the value at position $k$
-equals the sum of elements in the original array in a range
-that ends at position $k$.
-The length of the range is the largest power of two
-that divides $k$.
-For example, if $k=6$, the length of the range is $2$,
-because $2$ divides $6$ but $4$ does not divide $6$.
+A binary indexed tree can be represented as an array
+in which each value is the sum of elements in a range.
+More precisely, the value at position $x$ is $\textrm{rsq}(x-k+1,x)$,
+where $k$ is the largest power of two that divides $x$.
+For example, if $x=6$, then $k=2$, because 2 divides 6
+but 4 does not divide 6.
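
To make the definition concrete, the following sketch builds the tree directly from it, without any bit tricks. The function names are illustrative assumptions, and the array is assumed to be one-based. This is not the fastest way to construct the tree; it only mirrors the definition, and takes $O(n \log n)$ time in total.

```cpp
#include <vector>

// Largest power of two that divides x (x >= 1).
int largest_power_of_two_dividing(int x) {
    int k = 1;
    while (x % 2 == 0) {
        x /= 2;
        k *= 2;
    }
    return k;
}

// Builds the tree from the definition: tree[x] = rsq(x-k+1, x),
// where k is the largest power of two that divides x.
// Assumption: t is one-based, so t[0] is an unused placeholder.
std::vector<long long> build_tree_by_definition(const std::vector<int>& t) {
    int n = (int)t.size() - 1;
    std::vector<long long> tree(n + 1, 0);
    for (int x = 1; x <= n; x++) {
        int k = largest_power_of_two_dividing(x);
        for (int i = x - k + 1; i <= x; i++) {
            tree[x] += t[i];
        }
    }
    return tree;
}
```

For $x=6$ the helper returns $k=2$, so \texttt{tree[6]} is the sum of the elements at positions 5 and 6.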

 \begin{samepage}
 For example, consider the following array:
@@ -500,7 +483,43 @@ For example, consider the following array:
 \end{tikzpicture}
 \end{center}
 \end{samepage}
+\begin{samepage}
 The corresponding binary indexed tree is as follows:
+\begin{center}
+\begin{tikzpicture}[scale=0.7]
+\draw (0,0) grid (8,1);
+
+\node at (0.5,0.5) {$1$};
+\node at (1.5,0.5) {$4$};
+\node at (2.5,0.5) {$4$};
+\node at (3.5,0.5) {$16$};
+\node at (4.5,0.5) {$6$};
+\node at (5.5,0.5) {$7$};
+\node at (6.5,0.5) {$4$};
+\node at (7.5,0.5) {$29$};
+
+\footnotesize
+\node at (0.5,1.4) {$1$};
+\node at (1.5,1.4) {$2$};
+\node at (2.5,1.4) {$3$};
+\node at (3.5,1.4) {$4$};
+\node at (4.5,1.4) {$5$};
+\node at (5.5,1.4) {$6$};
+\node at (6.5,1.4) {$7$};
+\node at (7.5,1.4) {$8$};
+\end{tikzpicture}
+\end{center}
+\end{samepage}
+
+For example, the value at position 6
+in the binary indexed tree is 7,
+because the sum of elements in the range $[5,6]$
+of the array is $6+1=7$.
+
+The following picture shows more clearly
+how each value in the binary indexed tree
+corresponds to a range in the array:
+
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
 %\fill[color=lightgray] (3,0) rectangle (7,1);
@@ -545,18 +564,16 @@ The corresponding binary indexed tree is as follows:
 \end{tikzpicture}
 \end{center}

-For example, the value at position 6
-in the binary indexed tree is 7,
-because the sum of elements in the range $[5,6]$
-in the original array is $6+1=7$.
-
 \subsubsection{Sum query}

-The basic operation in a binary indexed tree is
-to calculate the sum of elements in a range $[1,k]$,
-where $k$ is any position in the array.
-The sum of such a range can be calculated as a
-sum of one or more values stored in the tree.
+The values in the binary indexed tree
+can be used to efficiently calculate
+any value of $\textrm{rsq}(1,k)$:
+the sum of elements in the range $[1,k]$
+of the array.
+It turns out that any range $[1,k]$
+can be divided into $O(\log n)$ ranges
+whose sums are available in the binary indexed tree.

 For example, the range $[1,7]$ corresponds to
 the following values:
@@ -605,22 +622,16 @@ the following values:
 \end{center}

 Hence, the sum of elements in the range $[1,7]$ is $16+7+4=27$.
-The structure of the binary indexed tree allows us to calculate
-the sum of elements in any range using only $O(\log n)$
-values from the tree.

-Using the same technique that we previously used
-with a sum array,
-we can efficiently calculate the sum of any range
-$[a,b]$ by substracting the sum of the range $[1,a-1]$
-from the sum of the range $[1,b]$.
-Also here, only $O(\log n)$ values are needed,
-because it suffices to calculate two sums of $[1,k]$ ranges.
+To calculate the value of $\textrm{rsq}(a,b)$,
+we can use the same trick that we used with sum arrays:
+\[ \textrm{rsq}(a,b) = \textrm{rsq}(1,b) - \textrm{rsq}(1,a-1).\]
+In this case as well, only $O(\log n)$ values are needed,
+because each of the two sums uses $O(\log n)$ values.

 \subsubsection{Array update}

-When an element in the original array changes,
-several sums in the binary indexed tree change.
+When a value in the array is updated,
+several values in the binary indexed tree should be updated.
 For example, if the element at position 3 changes,
 the sums of the following ranges change:
 \begin{center}
@@ -667,24 +678,25 @@ the sums of the following ranges change:
 \end{tikzpicture}
 \end{center}

-However, it turns out that
-the number of values that need to be updated
-in the binary indexed tree is only $O(\log n)$.
+Since each array element belongs to $O(\log n)$
+ranges in the binary indexed tree,
+it suffices to update $O(\log n)$ values.
+

 \subsubsection{Implementation}

 The operations of a binary indexed tree can be implemented
 in an elegant and efficient way using bit operations.
 The key fact needed is that $k \& -k$
-isolates the last one bit in a number $k$.
+isolates the last one bit of a number $k$.
 For example, $6 \& -6=2$ because the number $6$
 corresponds to 110 and the number $2$ corresponds to 10.

-It turns out that when processing a range query,
-the position $k$ in the binary indexed tree should be
+It turns out that when processing a sum query,
+the position $k$ in the binary indexed tree needs to be
 decreased by $k \& -k$ at every step,
 and when updating the array,
-the position $k$ should be increased by $k \& -k$ at every step.
+the position $k$ needs to be increased by $k \& -k$ at every step.
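
As a worked example with $n=8$, a sum query for the range $[1,7]$ visits the positions
\[7 \rightarrow 6 \rightarrow 4 \rightarrow 0,\]
because $7 \& -7 = 1$, $6 \& -6 = 2$ and $4 \& -4 = 4$.
The values at positions 7, 6 and 4 are exactly the ones that were added together in the earlier example, $16+7+4=27$.
Similarly, an update at position 3 visits the positions
\[3 \rightarrow 4 \rightarrow 8,\]
because $3 \& -3 = 1$ and $4 \& -4 = 4$; the next position, $8 + (8 \& -8) = 16$, is outside the tree.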

 Suppose that the binary indexed tree is stored in an array \texttt{b}.
 The following function calculates
@@ -714,7 +726,7 @@ void add(int k, int x) {

 The time complexity of both functions is
 $O(\log n)$, because the functions access $O(\log n)$
-values in the binary indexed tree, and each transition
+values in the binary indexed tree, and each move
 to the next position
 takes $O(1)$ time using bit operations.
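
For reference, the two operations and the range query can be put together in one self-contained sketch. The type \texttt{BinaryIndexedTree} and its members are assumptions made for illustration, not the chapter's own code; positions are one-based, and the values are stored as \texttt{long long} to leave room for large sums.

```cpp
#include <vector>

// Sketch of a binary indexed tree over positions 1..n.
struct BinaryIndexedTree {
    int n;
    std::vector<long long> tree;  // tree[0] is unused

    explicit BinaryIndexedTree(int n) : n(n), tree(n + 1, 0) {}

    // Adds x to the array element at position k (1 <= k <= n).
    void add(int k, long long x) {
        while (k <= n) {
            tree[k] += x;
            k += k & -k;  // move to the next range that contains position k
        }
    }

    // Returns rsq(1,k); the result for k = 0 is 0.
    long long sum(int k) const {
        long long s = 0;
        while (k >= 1) {
            s += tree[k];
            k -= k & -k;  // drop the range that ends at position k
        }
        return s;
    }

    // Returns rsq(a,b) = rsq(1,b) - rsq(1,a-1) for 1 <= a <= b <= n.
    long long range_sum(int a, int b) const {
        return sum(b) - sum(a - 1);
    }
};
```

Building the tree by calling \texttt{add} once for each array element takes $O(n \log n)$ time. After that, each update and each range query takes $O(\log n)$ time.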