Corrections

Antti H S Laaksonen 2017-02-18 18:27:38 +02:00
parent fcfaaa6e2d
commit 3810af1386
1 changed files with 17 additions and 16 deletions

@ -5,7 +5,7 @@ for string processing.
Many string problems can be easily solved Many string problems can be easily solved
in $O(n^2)$ time, but the challenge is to in $O(n^2)$ time, but the challenge is to
find algorithms that work in $O(n)$ or $O(n \log n)$ find algorithms that work in $O(n)$ or $O(n \log n)$
time and can process long strings. time.
\index{pattern matching} \index{pattern matching}
@ -21,13 +21,13 @@ The pattern matching problem is easy to solve
in $O(nm)$ time by a brute force algorithm that in $O(nm)$ time by a brute force algorithm that
goes through all positions where the pattern may goes through all positions where the pattern may
occur in the string. occur in the string.
However, in this chapter, we will see, that there However, in this chapter, we will see that there
are more efficient algorithms that require only are more efficient algorithms that require only
$O(n+m)$ time. $O(n+m)$ time.
\index{string} \index{string}
\section{Terminology} \section{String terminology}
\index{alphabet} \index{alphabet}
@ -156,7 +156,7 @@ For example, consider the following trie:
This trie corresponds to the set This trie corresponds to the set
$\{\texttt{CANAL},\texttt{CANDY},\texttt{THE},\texttt{THERE}\}$. $\{\texttt{CANAL},\texttt{CANDY},\texttt{THE},\texttt{THERE}\}$.
The character * in a node means that The character * in a node means that
one of the string in the set ends at the node. one of the strings in the set ends at the node.
This character is needed, because a string This character is needed, because a string
may be a prefix of another string. may be a prefix of another string.
For example, in this trie, \texttt{THE} For example, in this trie, \texttt{THE}
@ -169,8 +169,9 @@ We can also add a new string to the trie
in $O(n)$ time using a similar idea. in $O(n)$ time using a similar idea.
If needed, new nodes will be added to the trie. If needed, new nodes will be added to the trie.
Using a trie, we can also find the longest prefix Using a trie, we can also find
of a string that belongs to the set. for a given string the longest prefix
that belongs to the set.
In addition, by storing additional information In addition, by storing additional information
in each node, in each node,
it is possible to calculate the number of it is possible to calculate the number of
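The trie operations described in this hunk can be sketched as follows. This is a minimal illustration rather than the book's implementation: the names TrieNode, trie_insert, trie_contains and longest_prefix_in_set are hypothetical, and the flag is_end plays the role of the character * in the figure.

```python
class TrieNode:
    """One trie node; the names used here are illustrative assumptions."""

    def __init__(self):
        self.children = {}    # maps a character to the child TrieNode
        self.is_end = False   # True if a string of the set ends here (the * marker)


def trie_insert(root, word):
    """Add word to the trie in O(n) time, creating nodes when needed."""
    node = root
    for ch in word:
        if ch not in node.children:
            node.children[ch] = TrieNode()
        node = node.children[ch]
    node.is_end = True


def trie_contains(root, word):
    """Return True if word itself (not just a prefix) belongs to the set."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_end


def longest_prefix_in_set(root, word):
    """Return the longest prefix of word that belongs to the set, or None."""
    node = root
    best = "" if root.is_end else None
    for i, ch in enumerate(word):
        node = node.children.get(ch)
        if node is None:
            break
        if node.is_end:
            best = word[: i + 1]
    return best


root = TrieNode()
for w in ["CANAL", "CANDY", "THE", "THERE"]:
    trie_insert(root, w)
```

With the set from the example, trie_contains(root, "THE") holds even though THE is a prefix of THERE, which is exactly why the end marker is needed.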
@ -281,7 +282,7 @@ can be calculated in $O(1)$ time using the formula
\subsubsection*{Using hash values} \subsubsection*{Using hash values}
We can efficiently compare strings using hash values. We can efficiently compare strings using hash values.
Instead of comparing the real contents of the strings, Instead of comparing the individual characters of the strings,
the idea is to compare their hash values. the idea is to compare their hash values.
If the hash values are equal, If the hash values are equal,
the strings are \emph{probably} equal, the strings are \emph{probably} equal,
@ -294,7 +295,7 @@ As an example, consider the pattern matching problem:
given a string $s$ and a pattern $p$, given a string $s$ and a pattern $p$,
find the positions where $p$ occurs in $s$. find the positions where $p$ occurs in $s$.
A brute force algorithm goes through all positions A brute force algorithm goes through all positions
where $p$ may occur, and compares the strings where $p$ may occur and compares the strings
character by character. character by character.
The time complexity of such an algorithm is $O(n^2)$. The time complexity of such an algorithm is $O(n^2)$.
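As a sketch of how hash values can replace most of the character-by-character comparisons, the code below assumes polynomial hashing with two arbitrarily chosen constants A and B. The constants, the function name find_occurrences and the final direct comparison that guards against collisions are assumptions made for this illustration and are not taken from the text above.

```python
A = 911382323   # assumed multiplier, chosen arbitrarily for this sketch
B = 972663749   # assumed modulus, chosen arbitrarily for this sketch


def find_occurrences(s, p):
    """Return all 0-indexed positions where p occurs in s."""
    n, m = len(s), len(p)
    if m == 0 or m > n:
        return []

    # h[k] is the hash of the prefix s[0:k], pw[k] is A^k mod B
    h = [0] * (n + 1)
    pw = [1] * (n + 1)
    for i, ch in enumerate(s):
        h[i + 1] = (h[i] * A + ord(ch)) % B
        pw[i + 1] = (pw[i] * A) % B

    # hash of the whole pattern
    hp = 0
    for ch in p:
        hp = (hp * A + ord(ch)) % B

    result = []
    for i in range(n - m + 1):
        # hash of s[i:i+m], obtained in O(1) time from the prefix hashes
        sub = (h[i + m] - h[i] * pw[m]) % B
        # equal hashes mean "probably equal"; the direct check rules out collisions
        if sub == hp and s[i:i + m] == p:
            result.append(i)
    return result
```

The direct comparison only runs when the hash values agree, so on typical inputs almost all positions are rejected in constant time.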
@ -428,8 +429,8 @@ constants of the form $2^x$ are used.
\index{Z-array} \index{Z-array}
The \key{Z-array} of a string The \key{Z-array} of a string
contains for each position $k$ in the string gives for each position $k$ in the string
the lengt of the longest substring the length of the longest substring
that begins at position $k$ and is a prefix of the string. that begins at position $k$ and is a prefix of the string.
Such an array can be efficiently constructed Such an array can be efficiently constructed
using the \key{Z-algorithm}. using the \key{Z-algorithm}.
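To make the definition concrete, the following sketch computes the Z-array directly from the definition in $O(n^2)$ time. The function name z_array_naive and the choice to store 0 at position 0 are assumptions for this illustration, and positions are 0-indexed here.

```python
def z_array_naive(s):
    """Z-array computed directly from the definition, in O(n^2) time."""
    n = len(s)
    z = [0] * n   # z[0] is left as 0 by convention in this sketch
    for k in range(1, n):
        # extend the match between the prefix and the substring starting at k
        while k + z[k] < n and s[z[k]] == s[k + z[k]]:
            z[k] += 1
    return z
```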
@ -532,11 +533,11 @@ we can use this information to calculate
values for elements in the range $[x,y]$. values for elements in the range $[x,y]$.
The time complexity of the Z-algorithm is $O(n)$, The time complexity of the Z-algorithm is $O(n)$,
because the algorithm always compares strings because the algorithm only compares strings
character by character starting at position $y+1$. character by character starting at position $y+1$.
If the characters match, the value of $y$ increases, If the characters match, the value of $y$ increases,
and it is not needed to compare the character at and it is not needed to compare the character at
position $y$ again, position $y$ again
but the information in the Z-array can be used. but the information in the Z-array can be used.
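The idea above can be sketched in code as follows. This is one common way to write the Z-algorithm and is not necessarily the book's own listing: the function name z_algorithm is an assumption, positions are 0-indexed, and position 0 is given the value 0.

```python
def z_algorithm(s):
    """Z-array of s in O(n) time, maintaining the range [x, y]."""
    n = len(s)
    z = [0] * n
    x = y = 0   # [x, y] is the rightmost range known to match a prefix
    for i in range(1, n):
        if i <= y:
            # position i lies inside [x, y]: reuse the value at i - x,
            # but never trust anything beyond position y
            z[i] = min(z[i - x], y - i + 1)
        # compare character by character only past what is already known
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] - 1 > y:
            x, y = i, i + z[i] - 1
    return z
```

Every successful comparison in the inner loop moves $y$ to the right, and $y$ never decreases, which is why the total work stays linear.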
For example, let us construct the following Z-array: For example, let us construct the following Z-array:
@ -672,7 +673,7 @@ the current $[x,y]$ range will be $[7,11]$:
\end{center} \end{center}
Now, it is possible to calculate the Now, it is possible to calculate the
subsequent values for the Z-array subsequent values of the Z-array
more efficiently, more efficiently,
because we know that because we know that
the ranges $[1,5]$ and $[7,11]$ the ranges $[1,5]$ and $[7,11]$
@ -971,9 +972,9 @@ and thus the new range $[x,y]$ is $[10,16]$:
\end{tikzpicture} \end{tikzpicture}
\end{center} \end{center}
After this, all subsequent values for the Z-array After this, all subsequent values of the Z-array
can be calculated using the values already can be calculated using the values already
calculated to the array. All the remaining values can be stored in the array. All the remaining values can be
directly retrieved from the beginning of the Z-array: directly retrieved from the beginning of the Z-array:
\begin{center} \begin{center}
@ -1059,7 +1060,7 @@ $p$\texttt{\#}$s$,
where $p$ and $s$ are separated by a special where $p$ and $s$ are separated by a special
character \texttt{\#} that does not occur character \texttt{\#} that does not occur
in the strings. in the strings.
The Z-array of $p$\texttt{\#}$s$ indicates the positions The Z-array of $p$\texttt{\#}$s$ tells us the positions
where $p$ occurs in $s$, where $p$ occurs in $s$,
because such positions contain the length of $p$. because such positions contain the length of $p$.
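A minimal sketch of this use of the Z-array is shown below. It assumes that the separator # occurs in neither string, reports 0-indexed positions in $s$, and uses the hypothetical names z_algorithm and find_with_z. A position is reported when its Z-value equals the length of $p$.

```python
def z_algorithm(s):
    """Z-array of s in O(n) time (0-indexed, z[0] = 0 in this sketch)."""
    n = len(s)
    z = [0] * n
    x = y = 0
    for i in range(1, n):
        if i <= y:
            z[i] = min(z[i - x], y - i + 1)
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] - 1 > y:
            x, y = i, i + z[i] - 1
    return z


def find_with_z(s, p, sep="#"):
    """Return the positions where p occurs in s, assuming sep is in neither string."""
    m = len(p)
    if m == 0:
        return []
    t = p + sep + s
    z = z_algorithm(t)
    # position i of t corresponds to position i - m - 1 of s
    return [i - m - 1 for i in range(m + 1, len(t)) if z[i] == m]
```

Because the separator does not occur in $p$, no Z-value after it can exceed the length of $p$, so equality with that length identifies exactly the occurrences.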