Improve language

Antti H S Laaksonen 2017-05-20 12:17:11 +03:00
parent 1a1b242e9f
commit 74a0559250
1 changed files with 89 additions and 80 deletions

@@ -646,25 +646,25 @@ cases where $y=1$ or $x=1$, because we use a
one-indexed array whose values are initially zeros. one-indexed array whose values are initially zeros.
The time complexity of the algorithm is $O(n^2)$. The time complexity of the algorithm is $O(n^2)$.
\section{Knapsack} \section{Knapsack problems}
\index{knapsack} \index{knapsack}
The term \key{knapsack} refers to problems where The term \key{knapsack} refers to problems where
a set of objects is given, and we should find a set of objects is given, and
a subset of objects that has some properties. subsets with some properties
have to be found.
Knapsack problems can often be solved Knapsack problems can often be solved
using dynamic programming. using dynamic programming.
In this section, we consider the following In this section, we consider the following
problem: problem:
Suppose that there are $n$ We are given a list of weights
objects whose weights are $\{w_1,w_2,\ldots,w_n\}$, $[w_1,w_2,\ldots,w_n]$,
and we should determine all distinct weight sums and our task is to determine all distinct
that can be constructed using the objects. sums that can be constructed using the weights.
For example, if the weights are For example, if the weights are
$\{1,3,3,5\}$, the following weight $[1,3,3,5]$, the following sums are possible:
sums are possible:
\begin{center} \begin{center}
\begin{tabular}{rrrrrrrrrrrrr} \begin{tabular}{rrrrrrrrrrrrr}
@@ -674,33 +674,40 @@ sums are possible:
\end{tabular} \end{tabular}
\end{center} \end{center}
In this case, all weight sums between $0 \ldots 12$ In this case, all sums between $0 \ldots 12$
are possible, expect 2 and 10. are possible, except 2 and 10.
For example, the weight sum 7 is possible because we For example, the sum 7 is possible because we
can select the weights $\{1,3,3\}$. can select the weights $[1,3,3]$.
Note that there may be multiple objects
with the same weight.
To solve the problem, we focus on subproblems To solve the problem, we focus on subproblems
where we only use the first $k$ objects where we only use the first $k$ weights
to construct weight sums. on the list to construct sums.
Let $\texttt{possible}(k,x)=\texttt{true}$ if Let $\texttt{possible}(x,k)=\texttt{true}$ if
we can construct a weight sum $x$ we can construct a sum $x$
using the first $k$ objects, using the first $k$ weights,
and otherwise $\texttt{possible}(k,x)=\texttt{false}$. and otherwise $\texttt{possible}(x,k)=\texttt{false}$.
The values of the function can be recursively The values of the function can be recursively
calculated as follows: calculated as follows:
\[ \texttt{possible}(k,x) = \texttt{possible}(k-1,x) \lor \texttt{possible}(k-1,x-w_k) \] \[ \texttt{possible}(x,k) = \texttt{possible}(x-w_k,k-1) \lor \texttt{possible}(x,k-1) \]
This means that we can construct a weight sum $x$ The formula is based on the fact that we can
using the $k$ first objects in two ways either use or not use the weight $w_k$ in the sum.
depending on whether we If we use $w_k$, the remaining task is to
use the weight $w_k$ in the sum or not. form the sum $x-w_k$ using the first $k-1$ weights,
(The symbol ''$\lor$'' denotes the ''or'' operation.) and if we do not use $w_k$,
In addition, $\texttt{possible}(0,0)=\texttt{true}$ the remaining task is to form the sum $x$
and $\texttt{possible}(0,x)=\texttt{false}$ when $x \neq 0$. using the first $k-1$ weights.
In addition,
\begin{equation*}
\texttt{possible}(x,0) = \begin{cases}
\texttt{true} & x = 0\\
\texttt{false} & x \neq 0 \\
\end{cases}
\end{equation*}
because if no weights are used,
we can only form the sum 0.
The following table shows all values of the function The following table shows all values of the function
for the weights $\{1,3,3,5\}$ (the symbol ''X'' for the weights $[1,3,3,5]$ (the symbol ''X''
indicates the true values): indicates the true values):
\begin{center} \begin{center}
@@ -715,11 +722,11 @@ indicates the true values):
\end{tabular} \end{tabular}
\end{center} \end{center}
After calculating those values, $\texttt{possible}(n,x)$ After calculating those values, $\texttt{possible}(x,n)$
tells us whether we can construct a tells us whether we can construct a
weight sum $x$ using \emph{all} objects. sum $x$ using \emph{all} weights.
Let $W$ denote the total weight of the objects. Let $W$ denote the total sum of the weights.
The following $O(nW)$ time The following $O(nW)$ time
dynamic programming solution dynamic programming solution
corresponds to the recursive function: corresponds to the recursive function:
@@ -727,15 +734,15 @@ corresponds to the recursive function:
possible[0][0] = true; possible[0][0] = true;
for (int k = 1; k <= n; k++) { for (int k = 1; k <= n; k++) {
for (int x = 0; x <= W; x++) { for (int x = 0; x <= W; x++) {
possible[k][x] = possible[k-1][x]; if (x-w[k] >= 0) possible[x][k] |= possible[x-w[k]][k-1];
if (x-w[k] >= 0) possible[k][x] |= possible[k-1][x-w[k]]; possible[x][k] |= possible[x][k-1];
} }
} }
\end{lstlisting} \end{lstlisting}
However, here is a better implementation that only uses However, here is a better implementation that only uses
a one-dimensional array $\texttt{possible}[x]$ a one-dimensional array $\texttt{possible}[x]$
that indicates whether we can construct a subset with weight sum $x$. that indicates whether we can construct a subset with sum $x$.
The trick is to update the array from right to left for The trick is to update the array from right to left for
each new weight: each new weight:
\begin{lstlisting} \begin{lstlisting}
@@ -747,9 +754,11 @@ for (int k = 1; k <= n; k++) {
} }
\end{lstlisting} \end{lstlisting}
In some other knapsack problems, objects have both weights and values, Note that the general idea presented here can be used
and we should find a maximum-value subset whose weight is restricted. in many knapsack problems.
We can solve such problems using similar ideas as we used here. For example, if we are given objects with weights and values,
we can determine for each weight sum the maximum value
sum of a subset.
\section{Edit distance} \section{Edit distance}
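
The one-dimensional knapsack update described in the hunk above (update the array from right to left for each new weight) can be sketched in C++ as follows. This is a minimal sketch, not the book's listing, which the diff truncates. The function name `possibleSums`, the zero-indexed vector `w`, and the assumption that all weights are non-negative integers are illustrative choices and do not come from the commit.

```cpp
#include <vector>

// Hypothetical helper (not from the commit): returns a vector where
// possible[x] is true exactly when some subset of the weights sums to x,
// for 0 <= x <= W. Assumes every weight is a non-negative integer.
std::vector<bool> possibleSums(const std::vector<int>& w) {
    int W = 0;
    for (int wk : w) W += wk;            // W = total sum of the weights
    std::vector<bool> possible(W + 1, false);
    possible[0] = true;                  // with no weights, only the sum 0 can be formed
    for (int wk : w) {
        // Go from right to left so that possible[x - wk] still refers to
        // the state before wk was added, i.e. each weight is used at most once.
        for (int x = W; x >= wk; x--) {
            if (possible[x - wk]) possible[x] = true;
        }
    }
    return possible;
}
```

Iterating from left to right instead would let the same weight contribute several times to one sum, which is a different problem from the one stated in the text.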
@@ -765,55 +774,55 @@ The allowed editing operations are as follows:
\begin{itemize} \begin{itemize}
\item insert a character (e.g. \texttt{ABC} $\rightarrow$ \texttt{ABCA}) \item insert a character (e.g. \texttt{ABC} $\rightarrow$ \texttt{ABCA})
\item remove a character (e.g. \texttt{ABC} $\rightarrow$ \texttt{AC}) \item remove a character (e.g. \texttt{ABC} $\rightarrow$ \texttt{AC})
\item change a character (e.g. \texttt{ABC} $\rightarrow$ \texttt{ADC}) \item modify a character (e.g. \texttt{ABC} $\rightarrow$ \texttt{ADC})
\end{itemize} \end{itemize}
For example, the edit distance between For example, the edit distance between
\texttt{LOVE} and \texttt{MOVIE} is 2, \texttt{LOVE} and \texttt{MOVIE} is 2,
because we can first perform the operation because we can first perform the operation
\texttt{LOVE} $\rightarrow$ \texttt{MOVE} \texttt{LOVE} $\rightarrow$ \texttt{MOVE}
(change) and then the operation (modify) and then the operation
\texttt{MOVE} $\rightarrow$ \texttt{MOVIE} \texttt{MOVE} $\rightarrow$ \texttt{MOVIE}
(insertion). (insert).
This is the smallest possible number of operations, This is the smallest possible number of operations,
because it is clear that it is not possible because it is clear that only one operation is not enough.
to use only one operation.
Suppose we are given strings Suppose that we are given a string \texttt{x}
\texttt{x} and \texttt{y} that contain of length $n$ and a string \texttt{y} of length $m$,
$n$ and $m$ characters, respectively, and we want to calculate the edit distance between
and we wish to calculate the edit distance \texttt{x} and \texttt{y}.
between them. To solve the problem, we define a function
This can be done using $\texttt{distance}(a,b)$ that gives the
dynamic programming in $O(nm)$ time. edit distance between prefixes
Let $f(a,b)$ denote the edit distance $\texttt{x}[0 \ldots a]$ and $\texttt{y}[0 \ldots b]$.
between the first $a$ characters of \texttt{x} Thus, using this function, the edit distance
and the first $b$ characters of \texttt{y}. between \texttt{x} and \texttt{y} equals $\texttt{distance}(n-1,m-1)$.
Using this function, the edit distance between
\texttt{x} and \texttt{y} equals $f(n,m)$.
The base cases for the function are We can calculate the values of \texttt{distance}
\[ as follows:
\begin{array}{lcl} \begin{equation*}
f(0,b) & = & b \\ \begin{aligned}
f(a,0) & = & a \\ \texttt{distance}(a,b) & = \min(& \texttt{distance}(a,b-1)+1 & , \\
\end{array} & & \texttt{distance}(a-1,b)+1 & , \\
\] & & \texttt{distance}(a-1,b-1)+\texttt{cost}(a,b) & ).
and in the general case the formula is \end{aligned}
\[ f(a,b) = \min(f(a,b-1)+1,f(a-1,b)+1,f(a-1,b-1)+c),\] \end{equation*}
where $c=0$ if the $a$th character of \texttt{x} Here $\texttt{cost}(a,b)=0$ if $\texttt{x}[a]=\texttt{y}[b]$,
equals the $b$th character of \texttt{y}, and otherwise $\texttt{cost}(a,b)=1$.
and otherwise $c=1$. The formula considers the following ways to
The formula considers all possible ways to shorten the strings: edit the string \texttt{x}:
\begin{itemize} \begin{itemize}
\item $f(a,b-1)$ means that a character is inserted to \texttt{x} \item $\texttt{distance}(a,b-1)$: insert a character at the end of \texttt{x}
\item $f(a-1,b)$ means that a character is removed from \texttt{x} \item $\texttt{distance}(a-1,b)$: remove the last character from \texttt{x}
\item $f(a-1,b-1)$ means that \texttt{x} and \texttt{y} contain \item $\texttt{distance}(a-1,b-1)$: match or modify the last character of \texttt{x}
the same character ($c=0$),
or a character in \texttt{x} is transformed into
a character in \texttt{y} ($c=1$)
\end{itemize} \end{itemize}
The following table shows the values of $f$ In the two first cases, one editing operation is needed
(insert or remove).
In the last case, if $\texttt{x}[a]=\texttt{y}[b]$,
we can match the last characters without editing,
and otherwise one editing operation is needed (modify).
The following table shows the values of \texttt{distance}
in the example case: in the example case:
\begin{center} \begin{center}
\begin{tikzpicture}[scale=.65] \begin{tikzpicture}[scale=.65]
@@ -940,7 +949,7 @@ the edit distance between \texttt{LOV} and \texttt{MOV}, etc.
Sometimes the states of a dynamic programming solution Sometimes the states of a dynamic programming solution
are more complex than fixed combinations of numbers. are more complex than fixed combinations of numbers.
As an example, As an example,
we consider the problem of calculating consider the problem of calculating
the number of distinct ways to the number of distinct ways to
fill an $n \times m$ grid using fill an $n \times m$ grid using
$1 \times 2$ and $2 \times 1$ size tiles. $1 \times 2$ and $2 \times 1$ size tiles.
@@ -987,9 +996,9 @@ $\sqsubset \sqsupset \sqsubset \sqsupset \sqcup \sqcup \sqcap$
$\sqsubset \sqsupset \sqsubset \sqsupset \sqsubset \sqsupset \sqcup$ $\sqsubset \sqsupset \sqsubset \sqsupset \sqsubset \sqsupset \sqcup$
\end{itemize} \end{itemize}
Let $f(k,x)$ denote the number of ways to Let $\texttt{count}(k,x)$ denote the number of ways to
construct a solution for rows $1 \ldots k$ construct a solution for rows $1 \ldots k$
in the grid so that string $x$ corresponds to row $k$. of the grid such that string $x$ corresponds to row $k$.
It is possible to use dynamic programming here, It is possible to use dynamic programming here,
because the state of a row is constrained because the state of a row is constrained
only by the state of the previous row. only by the state of the previous row.
@@ -1039,7 +1048,7 @@ that worked independently.}:
This formula is very efficient, because it calculates This formula is very efficient, because it calculates
the number of tilings in $O(nm)$ time, the number of tilings in $O(nm)$ time,
but since the answer is a product of real numbers, but since the answer is a product of real numbers,
a practical problem in using the formula is a problem when using the formula is
how to store the intermediate results accurately. how to store the intermediate results accurately.