Corrections
parent
fcfaaa6e2d
commit
3810af1386
33
luku26.tex
@@ -5,7 +5,7 @@ for string processing.
 Many string problems can be easily solved
 in $O(n^2)$ time, but the challenge is to
 find algorithms that work in $O(n)$ or $O(n \log n)$
-time and can process long strings.
+time.
 
 \index{pattern matching}
 
@@ -21,13 +21,13 @@ The pattern matching problem is easy to solve
 in $O(nm)$ time by a brute force algorithm that
 goes through all positions where the pattern may
 occur in the string.
-However, in this chapter, we will see, that there
+However, in this chapter, we will see that there
 are more efficient algorithms that require only
 $O(n+m)$ time.
 
 \index{string}
 
-\section{Terminology}
+\section{String terminology}
 
 \index{alphabet}
 
@@ -156,7 +156,7 @@ For example, consider the following trie:
 This trie corresponds to the set
 $\{\texttt{CANAL},\texttt{CANDY},\texttt{THE},\texttt{THERE}\}$.
 The character * in a node means that
-one of the string in the set ends at the node.
+one of the strings in the set ends at the node.
 This character is needed, because a string
 may be a prefix of another string.
 For example, in this trie, \texttt{THE}
@@ -169,8 +169,9 @@ We can also add a new string to the trie
 in $O(n)$ time using a similar idea.
 If needed, new nodes will be added to the trie.
 
-Using a trie, we can also find the longest prefix
-of a string that belongs to the set.
+Using a trie, we can also find
+for a given string the longest prefix
+that belongs to the set.
 In addition, by storing additional information
 in each node,
 it is possible to calculate the number of
@@ -281,7 +282,7 @@ can be calculated in $O(1)$ time using the formula
 \subsubsection*{Using hash values}
 
 We can efficiently compare strings using hash values.
-Instead of comparing the real contents of the strings,
+Instead of comparing the individual characters of the strings,
 the idea is to compare their hash values.
 If the hash values are equal,
 the strings are \emph{probably} equal,
@@ -294,7 +295,7 @@ As an example, consider the pattern matching problem:
 given a string $s$ and a pattern $p$,
 find the positions where $p$ occurs in $s$.
 A brute force algorithm goes through all positions
-where $p$ may occur, and compares the strings
+where $p$ may occur and compares the strings
 character by character.
 The time complexity of such an algorithm is $O(n^2)$.
 
@@ -428,8 +429,8 @@ constants of the form $2^x$ are used.
 \index{Z-array}
 
 The \key{Z-array} of a string
-contains for each position $k$ in the string
-the lengt of the longest substring
+gives for each position $k$ in the string
+the length of the longest substring
 that begins at position $k$ and is a prefix of the string.
 Such an array can be efficiently constructed
 using the \key{Z-algorithm}.
@@ -532,11 +533,11 @@ we can use this information to calculate
 values for elements in the range $[x,y]$.
 
 The time complexity of the Z-algorithm is $O(n)$,
-because the algorithm always compares strings
+because the algorithm only compares strings
 character by character starting at position $y+1$.
 If the characters match, the value of $y$ increases,
 and it is not needed to compare the character at
-position $y$ again,
+position $y$ again
 but the information in the Z-array can be used.
 
 For example, let us construct the following Z-array:
@@ -672,7 +673,7 @@ the current $[x,y]$ range will be $[7,11]$:
 \end{center}
 
 Now, it is possible to calculate the
-subsequent values for the Z-array
+subsequent values of the Z-array
 more efficiently,
 because we know that
 the ranges $[1,5]$ and $[7,11]$
@@ -971,9 +972,9 @@ and thus the new range $[x,y]$ is $[10,16]$:
 \end{tikzpicture}
 \end{center}
 
-After this, all subsequent values for the Z-array
+After this, all subsequent values of the Z-array
 can be calculated using the values already
-calculated to the array. All the remaining values can be
+stored in the array. All the remaining values can be
 directly retrieved from the beginning of the Z-array:
 
 \begin{center}
@@ -1059,7 +1060,7 @@ $p$\texttt{\#}$s$,
 where $p$ and $s$ are separated by a special
 character \texttt{\#} that does not occur
 in the strings.
-The Z-array of $p$\texttt{\#}$s$ indicates the positions
+The Z-array of $p$\texttt{\#}$s$ tells us the positions
 where $p$ occurs in $s$,
 because such positions contain the value $p$.
 
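
The idea described in the last hunk can be illustrated with a short C++ sketch. It is not taken from luku26.tex: the function names `zArray` and `findOccurrences`, the 0-based indexing, and the convention that `z[0]` is left at 0 are assumptions made for this illustration. The sketch builds the Z-array while maintaining the range $[x,y]$ and then reports every position of $p$\texttt{\#}$s$ whose Z-value equals the length of $p$.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// z[k] = length of the longest substring of s that begins at k
// and is also a prefix of s. z[0] is left at 0 (assumed convention).
std::vector<int> zArray(const std::string& s) {
    int n = static_cast<int>(s.size());
    std::vector<int> z(n, 0);
    int x = 0, y = 0;  // s[x..y] is the rightmost range known to match a prefix
    for (int i = 1; i < n; i++) {
        // Inside the known range, reuse a value from the start of the array.
        if (i <= y) z[i] = std::min(z[i - x], y - i + 1);
        // Extend the match character by character.
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) z[i]++;
        // Remember the range if it now reaches further to the right.
        if (i + z[i] - 1 > y) {
            x = i;
            y = i + z[i] - 1;
        }
    }
    return z;
}

// Returns the 0-based positions in s where p occurs, using the Z-array of
// p + sep + s. The separator must not occur in either string.
std::vector<int> findOccurrences(const std::string& p, const std::string& s,
                                 char sep = '#') {
    assert(!p.empty());
    assert(p.find(sep) == std::string::npos);
    assert(s.find(sep) == std::string::npos);
    std::string t = p + sep + s;
    std::vector<int> z = zArray(t);
    int m = static_cast<int>(p.size());
    std::vector<int> result;
    for (int i = m + 1; i < static_cast<int>(t.size()); i++) {
        if (z[i] == m) result.push_back(i - m - 1);  // shift back into s
    }
    return result;
}
```

For instance, `findOccurrences("ATT", "HATTIVATTI")` yields the 0-based positions 1 and 6. Because the separator occurs in neither string, no Z-value after it can exceed the length of $p$, so comparing with equality is enough.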