Corrections

2017-02-11 19:11:50 +02:00 · 2017-02-11 19:11:50 +02:00 · d858bb3c42
commit d858bb3c42
parent f728fdc84f
1 changed files with 285 additions and 209 deletions
--- a/luku26.tex
+++ b/luku26.tex
@ -1,11 +1,35 @@
 \chapter{String algorithms}

-\index{string}
-\index{alphabet}
+This chapter deals with efficient algorithms
+for processing strings.
+Many string problems can be easily solved
+in $O(n^2)$ time, but the challenge is to
+find algorithms that work in $O(n)$ or $O(n \log n)$
+time and can process long strings.

-A string $s$ of length $n$
-is a sequence of characters
-$s[1],s[2],\ldots,s[n]$.
+\index{pattern matching}
+
+For example, a fundamental problem related to strings
+is the \key{pattern matching} problem:
+given a string of length $n$ and a pattern of length $m$,
+our task is to find the positions where the pattern
+occurs in the string.
+For example, the pattern \texttt{ABC} occurs two
+times in the string \texttt{ABABCBABC}.
+
+The pattern matching problem is easy to solve
+in $O(nm)$ time by a brute force algorithm that
+goes through all positions where the pattern may
+occur in the string.
+However, in this chapter, we will see that there
+are more efficient algorithms that require only
+$O(n+m)$ time.
+
+\index{string}
+
+\section{Terminology}
+
+\index{alphabet}

 An \key{alphabet} is a set of characters
 that may appear in strings.
@ -15,76 +39,73 @@ consists of the capital letters of English.

 \index{substring}

-A \key{substring} consists of consecutive
-characters in a string.
-The number of substrings in a string is $n(n+1)/2$.
-For example, \texttt{ORITH} is a substring
-in \texttt{ALGORITHM}, and it corresponds
-to \texttt{ALG\underline{ORITH}M}.
+A \key{substring} is a sequence of consecutive
+characters of a string.
+The number of substrings of a string is $n(n+1)/2$.
+For example, the substrings of the string
+\texttt{ABCD} are
+\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D},
+\texttt{AB}, \texttt{BC}, \texttt{CD},
+\texttt{ABC}, \texttt{BCD} and \texttt{ABCD}.

 \index{subsequence}

-A \key{subsequence} is a subset of characters
-in a string in their original order.
-The number of subsequences in a string is $2^n-1$.
-For example, \texttt{LGRHM} is a subsequece
-in \texttt{ALGORITHM}, and it corresponds
-to \texttt{A\underline{LG}O\underline{R}IT\underline{HM}}.
+A \key{subsequence} is a sequence of
+(not necessarily consecutive) characters
+of a string in their original order.
+The number of subsequences of a string is $2^n-1$.
+For example, the subsequences of the string
+\texttt{ABCD} are
+\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D},
+\texttt{AB}, \texttt{AC}, \texttt{AD},
+\texttt{BC}, \texttt{BD}, \texttt{CD},
+\texttt{ABC}, \texttt{ABD}, \texttt{ACD},
+\texttt{BCD} and \texttt{ABCD}.

 \index{prefix}
 \index{suffix}

-A \key{prefix} is a subtring that contains the first
-character of a string,
-and a \key{suffix} is a substring that contains the last character.
-For example, the prefixes of
-\texttt{STORY} are \texttt{S}, \texttt{ST},
-\texttt{STO}, \texttt{STOR} and \texttt{STORY},
-and the suffixes are \texttt{Y}, \texttt{RY},
-\texttt{ORY}, \texttt{TORY} and \texttt{STORY}.
-A prefix or a suffix is \key{proper}
-if it is not the whole string.
+A \key{prefix} is a substring that starts at the beginning
+of a string,
+and a \key{suffix} is a substring that ends at the end
+of a string.
+For example, for the string \texttt{ABCD},
+the prefixes are
+\texttt{A}, \texttt{AB}, \texttt{ABC} and \texttt{ABCD}
+and the suffixes are
+\texttt{D}, \texttt{CD}, \texttt{BCD} and \texttt{ABCD}.

 \index{rotation}

 A \key{rotation} can be generated by moving
-characters one by one from the beginning to the end
-in a string (or vice versa).
-For example, the rotations of \texttt{STORY} are
-\texttt{STORY},
-\texttt{TORYS},
-\texttt{ORYST},
-\texttt{RYSTO} and
-\texttt{YSTOR}.
+characters one by one from the beginning
+to the end of a string (or vice versa).
+For example, the rotations of the string
+\texttt{ABCD} are
+\texttt{ABCD}, \texttt{BCDA}, \texttt{CDAB} and \texttt{DABC}.

 \index{period}

 A \key{period} is a prefix of a string such that
-we can construct the string by repeating the period.
+the string can be constructed by repeating the period.
 The last repetition may be partial and contain
 only a prefix of the period.
-Often it is interesting to find the \key{shortest period}
-of a string.
 For example, the shortest period of
 \texttt{ABCABCA} is \texttt{ABC}.
-In this case, we first repeat the period twice
-and then partially.

 \index{border}

 A \key{border} is a string that is both
 a prefix and a suffix of a string.
-For example, the borders for \texttt{ABADABA}
-are \texttt{A}, \texttt{ABA} and \texttt{ABADABA}.
-Often we want to find the \key{longest border}
-that is not the whole string.
+For example, the borders of the string \texttt{ABACABA}
+are \texttt{A}, \texttt{ABA} and \texttt{ABACABA}.

 \index{lexicographical order}

-Usually we compare string using the \key{lexicographical order}
+Strings are usually compared using the \key{lexicographical order}
 that corresponds to the alphabetical order.
-It means that $x<y$ if either $x$ is a proper prefix of $y$,
-or there is an index $k$ such that
+It means that $x<y$ if either $x \neq y$ and $x$ is a prefix of $y$,
+or there is a position $k$ such that
 $x[i]=y[i]$ when $i<k$ and $x[k]<y[k]$.

 \section{Trie structure}
@ -93,15 +114,13 @@ $x[i]=y[i]$ when $i<k$ and $x[k]<y[k]$.

 A \key{trie} is a tree structure that
 maintains a set of strings.
-Strings are stored in a trie as chains
-of characters that start at the root
-of the tree.
+Each string in a trie corresponds to
+a chain of characters starting at
+the root node.
 If two strings have a common prefix,
-they also share a chain in the tree.
+they also have a common chain in the tree.

-For example, the following trie corresponds
-to the set
-$\{\texttt{CANAL},\texttt{CANDY},\texttt{THE},\texttt{THERE}\}$:
+For example, consider the following trie:

 \begin{center}
 \begin{tikzpicture}[scale=0.9]
@ -133,36 +152,40 @@ $\{\texttt{CANAL},\texttt{CANDY},\texttt{THE},\texttt{THERE}\}$:
 \path[draw,thick,->] (12) -- node[font=\small,label=right:\texttt{E}] {} (13);
 \end{tikzpicture}
 \end{center}
+
+This trie corresponds to the set
+$\{\texttt{CANAL},\texttt{CANDY},\texttt{THE},\texttt{THERE}\}$.
 The character * in a node means that
-a string ends at the node.
-This character is needed because a string
+one of the strings in the set ends at the node.
+This character is needed, because a string
 may be a prefix of another string.
 For example, in this trie, \texttt{THE}
-is a suffix of \texttt{THERE}.
+is a prefix of \texttt{THERE}.

-Inserting and searching a string in a trie take $O(n)$ time
-where $n$ is the length of the string.
-Both operations can be implemented by
-starting at the root node and following the
-chain of characters that appear in the string.
+We can check if a trie contains a string
+in $O(n)$ time where $n$ is the length of the string,
+because we can follow the chain that starts at the root node.
+We can also add a new string to the trie
+in $O(n)$ time using a similar idea.
 If needed, new nodes will be added to the trie.

-Tries can be used for searching both strings
-and prefixes of strings.
-In addition, it is possible to calculate numbers
-of strings that correspond to each prefix,
-which can be useful in some applications.
+Using a trie, we can also find the longest prefix
+of a string that belongs to the set.
+In addition, by storing additional information
+in each node,
+it is possible to calculate the number of
+strings that have a given prefix.

-A trie can be stored as an array
+A trie can be stored in an array
 \begin{lstlisting}
 int t[N][A];
 \end{lstlisting}
 where $N$ is the maximum number of nodes
-(the total length of the string to be stored)
+(the maximum total length of the strings in the set)
 and $A$ is the size of the alphabet.
 The nodes of a trie are numbered
 $1,2,3,\ldots$ so that the number of the root is 1,
-and $\texttt{t}[s][c]$ is the next node in chain
+and $\texttt{t}[s][c]$ is the next node in the chain
 from node $s$ using character $c$.
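
The operations described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the alphabet is assumed to be the capital letters \texttt{A}--\texttt{Z}, the names \texttt{Trie}, \texttt{add}, \texttt{contains} and \texttt{end} are hypothetical, and vectors that grow on demand are used in place of the fixed array \texttt{t[N][A]}. As in the text, the root is node 1, and the value 0 is assumed to mean that no next node exists.

```cpp
#include <string>
#include <vector>

// Sketch of a trie over the capital letters A..Z (an assumption).
// Node 1 is the root; the value 0 means "no next node".
struct Trie {
    static constexpr int A = 26;        // size of the alphabet
    std::vector<std::vector<int>> t;    // t[s][c]: next node from node s using character c
    std::vector<bool> end;              // end[s]: a string of the set ends at node s (the * mark)

    // Index 0 is unused, index 1 is the root.
    Trie() : t(2, std::vector<int>(A, 0)), end(2, false) {}

    // Adds a string in O(n) time, creating new nodes if needed.
    void add(const std::string& w) {
        int s = 1;
        for (char ch : w) {
            int c = ch - 'A';
            if (t[s][c] == 0) {
                t[s][c] = (int)t.size();
                t.push_back(std::vector<int>(A, 0));
                end.push_back(false);
            }
            s = t[s][c];
        }
        end[s] = true;
    }

    // Checks in O(n) time whether the set contains the string.
    bool contains(const std::string& w) const {
        int s = 1;
        for (char ch : w) {
            int c = ch - 'A';
            if (t[s][c] == 0) return false;
            s = t[s][c];
        }
        return end[s];
    }
};
```

Without the \texttt{end} marks, \texttt{contains} could not distinguish a stored string such as \texttt{THE} from a mere prefix such as \texttt{THER}.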

 \section{String hashing}
@ -173,7 +196,7 @@ from node $s$ using character $c$.
 \key{String hashing} is a technique that
 allows us to efficiently check whether two
 substrings in a string are equal.
-The idea is to compare hash values of the
+The idea is to compare the hash values of the
 substrings instead of their individual characters.

 \subsubsection*{Calculating hash values}
@ -190,7 +213,7 @@ which makes it possible to compare strings
 based on their hash values.

 A usual way to implement string hashing
-is to use polynomial hashing, which means
+is polynomial hashing, which means
 that the hash value is calculated using the formula
 \[(c[1] A^{n-1} + c[2] A^{n-2} + \cdots + c[n] A^0) \bmod B  ,\]
 where $c[1],c[2],\ldots,c[n]$
@ -218,7 +241,7 @@ in the string \texttt{ALLEY} are:
 \end{tikzpicture}
 \end{center}

-If $A=3$ and $B=97$, the hash value
+Thus, if $A=3$ and $B=97$, the hash value
 for the string \texttt{ALLEY} is

 \[(65 \cdot 3^4 + 76 \cdot 3^3 + 76 \cdot 3^2 + 69 \cdot 3^1 + 89 \cdot 3^0) \bmod 97 = 52.\]
@ -232,8 +255,8 @@ we can calculate the hash value of any substring
 in $O(1)$ time after an $O(n)$ time preprocessing.

 The idea is to construct an array $h$ such that
-$h[k]$ contains the hash value for the prefix
-of the string that ends at index $k$.
+$h[k]$ contains the hash value of the prefix
+of the string that ends at position $k$.
 The array values can be recursively calculated as follows:
 \[
 \begin{array}{lcl}
@ -250,9 +273,8 @@ p[k] & = & (p[k-1] A) \bmod B. \\
 \end{array}
 \]
 Constructing these arrays takes $O(n)$ time.
-After this, the hash value for a substring
-of the string
-that begins at index $a$ and ends at index $b$
+After this, the hash value of a substring
+that begins at position $a$ and ends at position $b$
 can be calculated in $O(1)$ time using the formula
 \[(h[b]-h[a-1] p[b-a+1]) \bmod B.\]
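
The preprocessing and the substring formula can be sketched as follows. The names \texttt{StringHash} and \texttt{get} are hypothetical, positions are numbered from 1 as in the text, and the base cases $h[0]=0$ and $p[0]=1$ are assumptions of this sketch. Note that in C++ the \texttt{\%} operator may produce a negative result, so the sketch adds $B$ in that case.

```cpp
#include <string>
#include <vector>

// Sketch of polynomial hashing with prefix hash values h[k] and powers p[k].
// Positions are 1-indexed; h[0] = 0 and p[0] = 1 are assumed base cases.
struct StringHash {
    long long A, B;
    std::vector<long long> h, p;

    StringHash(const std::string& s,
               long long A = 911382323, long long B = 972663749)
        : A(A), B(B), h(s.size() + 1, 0), p(s.size() + 1, 1) {
        for (size_t k = 1; k <= s.size(); k++) {
            // s[k-1] is the character at position k
            h[k] = (h[k - 1] * A + (unsigned char)s[k - 1]) % B;
            p[k] = (p[k - 1] * A) % B;
        }
    }

    // Hash value of the substring from position a to position b,
    // assuming 1 <= a <= b <= n.
    long long get(int a, int b) const {
        long long v = (h[b] - h[a - 1] * p[b - a + 1]) % B;
        return v < 0 ? v + B : v;  // make the result non-negative
    }
};
```

For example, with $A=3$ and $B=97$, \texttt{StringHash("ALLEY", 3, 97).get(1, 5)} evaluates to 52, the value calculated above.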

@ -268,16 +290,15 @@ the strings are \emph{certainly} different.

 Using hashing, we can often make a brute force
 algorithm efficient.
-As an example, let's consider a brute force
-algorithm that calculates how many times
-a string $p$ occurs as a substring in
-a string $s$.
-The algorithm goes through all locations
-where $p$ can occur, and compares the strings
+As an example, consider the pattern matching problem:
+given a string $s$ and a pattern $p$,
+find the positions where $p$ occurs in $s$.
+A brute force algorithm goes through all positions
+where $p$ may occur, and compares the strings
 character by character.
 The time complexity of such an algorithm is $O(n^2)$.

-However, we can make the algorithm more efficient
+We can make the brute force algorithm more efficient
 using hashing, because the algorithm compares
 substrings of strings.
 Using hashing, each comparison only takes $O(1)$ time,
@ -286,23 +307,24 @@ This results in an algorithm with time complexity $O(n)$,
 which is the best possible time complexity for this problem.

 By combining hashing and \emph{binary search},
-it is also possible to check the lexicographic order of
+it is also possible to find out the lexicographic order of
 two strings in logarithmic time.
-This can be done by finding out the length
+This can be done by calculating the length
 of the common prefix of the strings using binary search.
-Once we know the common prefix,
-the next character after the prefix
-indicates the order of the strings.
+Once we know the length of the common prefix,
+we can just check the next character after the prefix,
+because this determines the order of the strings.

 \subsubsection*{Collisions and parameters}

 \index{collision}

-An evident risk in comparing hash values is
-\key{collision}, which means that two strings have
+An evident risk when comparing hash values is
+a \key{collision}, which means that two strings have
 different contents but equal hash values.
-In this case, based on the hash values it seems that
-the strings are equal, but in reality they aren't,
+In this case, an algorithm that relies on
+the hash values concludes that the strings are equal,
+but in reality they are not,
 and the algorithm may give incorrect results.

 Collisions are always possible,
@ -310,49 +332,41 @@ because the number of different strings is larger
 than the number of different hash values.
 However, the probability of a collision is small
 if the constants $A$ and $B$ are carefully chosen.
-There are two goals: the hash values should be
-evenly distributed for the strings,
-and the number of different hash values should
-be large enough.
-
-A good solution is to use large random numbers
-as constants.
-A usual way is to choose constants that are
-near $10^9$, for example
+A usual way is to choose random constants
+near $10^9$, for example as follows:
 \[
 \begin{array}{lcl}
 A & = & 911382323 \\
 B & = & 972663749 \\
 \end{array}
 \]
-This choice ensures that the hash values
-are distributed evenly enough in the range $0 \ldots B-1$.
-The benefit in $10^9$ is that
-the \texttt{long long} type can be used
-for calculating the hash values,
-because the products $AB$ and $BB$ fit in \texttt{long long}.
-But is it enough to have $10^9$ different hash values?

-Let's consider three scenarios where hashing can be used:
+Using such constants,
+the \texttt{long long} type can be used
+when calculating the hash values,
+because the products $AB$ and $BB$ will fit in \texttt{long long}.
+But is it enough to have about $10^9$ different hash values?
+
+Let us consider three scenarios where hashing can be used:

 \textit{Scenario 1:} Strings $x$ and $y$ are compared with
 each other.
 The probability of a collision is $1/B$ assuming that
 all hash values are equally probable.

-\textit{Tapaus 2:} A string $x$ is compared with strings
+\textit{Scenario 2:} A string $x$ is compared with strings
 $y_1,y_2,\ldots,y_n$.
-The probability for one or more collisions is
+The probability of one or more collisions is

-\[1-(1-1/B)^n.\]
+\[1-(1-\frac{1}{B})^n.\]

-\textit{Tapaus 3:} Strings $x_1,x_2,\ldots,x_n$
+\textit{Scenario 3:} Strings $x_1,x_2,\ldots,x_n$
 are compared with each other.
-The probability for one or more collisions is
+The probability of one or more collisions is
 \[ 1 - \frac{B \cdot (B-1) \cdot (B-2) \cdots (B-n+1)}{B^n}.\]
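
As a rough check of the last two formulas, the following standard approximations can be used when $n$ is much smaller than $B$ (they are not part of the original derivation):
\[
1-\left(1-\frac{1}{B}\right)^n \approx \frac{n}{B},
\qquad
1 - \frac{B \cdot (B-1) \cdots (B-n+1)}{B^n} \approx 1 - e^{-n(n-1)/(2B)}.
\]
For example, if $n=10^6$ and $B \approx 10^9$, the first probability is about $0.001$, while in the second case the exponent is about $-500$, so the probability of a collision is practically 1.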

 The following table shows the collision probabilities
-when the value of $B$ varies and $n=10^6$:
+when $n=10^6$ and the value of $B$ varies:

 \begin{center}
 \begin{tabular}{rrrr}
@ -384,12 +398,12 @@ in a room, the probability that some two people
 have the same birthday is large even if $n$ is quite small.
 In hashing, correspondingly, when all hash values are compared
 with each other, the probability that some two
-hash values are the same is large.
+hash values are equal is large.

-A good way to make the probability of a collision
-smaller is to calculate \emph{multiple} hash values
+We can make the probability of a collision
+smaller by calculating \emph{multiple} hash values
 using different parameters.
-It is very unlikely that a collision would occur
+It is unlikely that a collision would occur
 in all hash values at the same time.
 For example, two hash values with parameter
 $B \approx 10^9$ correspond to one hash
@ -401,37 +415,25 @@ which is convenient, because operations with 32 and 64
 bit integers are calculated modulo $2^{32}$ and $2^{64}$.
 However, this is not a good choice, because it is possible
 to construct inputs that always generate collisions when
-constants of the form $2^x$ are used\footnote{
-J. Pachocki and Jakub Radoszweski:
-''Where to use and how not to use polynomial string hashing''.
-\textit{Olympiads in Informatics}, 2013.
-}.
+constants of the form $2^x$ are used.
+% \footnote{
+% J. Pachocki and Jakub Radoszewski:
+% ''Where to use and how not to use polynomial string hashing''.
+% \textit{Olympiads in Informatics}, 2013.
+% }.

 \section{Z-algorithm}

 \index{Z-algorithm}
 \index{Z-array}

-The \key{Z-algorithm} generates a \key{Z-array}
-for the string, that contains for each index $k$
-in the string the length of the longest substring
-that begins at index $k$ and is a prefix of the string.
-Many string problems can be efficiently solved
-using the Z-algorithm.
+The \key{Z-array} of a string
+contains for each position $k$ in the string
+the length of the longest substring
+that begins at position $k$ and is a prefix of the string.
+Such an array can be efficiently constructed
+using the \key{Z-algorithm}.

-It is often a matter of taste whether to use
-the Z-algorithm or string hashing.
-Unlike hashing, the Z-algorithm always works
-and there is no risk for collisions.
-On the other hand, the Z-algorithm is more difficult
-to implement and some problems can only be solved
-using hashing.
-
-\subsubsection*{Description}
-
-The Z-algorithm constructs a Z-array that
-indicates for each position the length of the
-longest substring that is also a prefix of the string.
 For example, the Z-array for the string
 \texttt{ACBACDACBACBACDA} is as follows:

@ -494,45 +496,50 @@ For example, the Z-array for the string
 \end{tikzpicture}
 \end{center}

-For example, the position 7 contains the value 5,
+For example, the value at position 7 in the
+above Z-array is 5,
 because the substring \texttt{ACBAC} of length 5
 is a prefix of the string,
 but the substring \texttt{ACBACB} of length 6
 is not a prefix of the string.

-The Z-algorithm scans the string from the left
-to the right, and calculates for each position
+It is often a matter of taste whether to use
+string hashing or the Z-algorithm.
+Unlike hashing, the Z-algorithm always works
+and there is no risk of collisions.
+On the other hand, the Z-algorithm is more difficult
+to implement and some problems can only be solved
+using hashing.
+
+\subsubsection*{Algorithm description}
+
+The Z-algorithm scans the string from left
+to right, and calculates for each position
 the length of the longest substring that
 is a prefix of the string.
-The algorithm compares the first characters
-of the string
-and the active substring with each other to
-find the length of the common prefix.
-
-A straightforward implementation would yield
-an algorithm with time complexity $O(n^2)$
-because the common prefixes may be long.
-However, the Z-algorithm has one important
+A straightforward algorithm
+would have a time complexity of $O(n^2)$,
+but the Z-algorithm has an important
 optimization which ensures that the time complexity
 is only $O(n)$.
+
 The idea is to maintain a range $[x,y]$ such that
 the substring from $x$ to $y$ is a prefix of
 the string and $y$ is as large as possible.
 Since the Z-array already contains information
 about the characters in the range $[x,y]$,
-it is not needed to process them again later in the algorithm.
+we can use this information to calculate
+values for elements in the range $[x,y]$.

 The time complexity of the Z-algorithm is $O(n)$,
-because the algorithm always compares substrings
-character by character only from index $y+1$.
+because the algorithm always compares strings
+character by character starting at position $y+1$.
 If the characters match, the value of $y$ increases,
-and it is not needed to inspect the character again,
+and there is no need to compare the character at
+position $y$ again,
 but the information in the Z-array can be used.
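
The idea above can be sketched in code as follows. This is one common way to write the algorithm, not necessarily the implementation intended here. The function name \texttt{z\_array} is hypothetical, and positions are numbered from 0 instead of 1, so the value for position $k$ of the text is stored in \texttt{z[k-1]}. The first value is left as 0, because it is not used.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sketch of the Z-algorithm with 0-indexed positions.
// z[i] is the length of the longest substring that begins at
// position i and is a prefix of s; z[0] is left as 0.
std::vector<int> z_array(const std::string& s) {
    int n = (int)s.size();
    std::vector<int> z(n, 0);
    int x = 0, y = 0;  // range [x,y]: a substring equal to a prefix, y as large as possible
    for (int i = 1; i < n; i++) {
        // Inside [x,y], position i corresponds to position i-x of the prefix,
        // so z[i-x] can be reused, but only up to the end of the range.
        if (i <= y) z[i] = std::min(z[i - x], y - i + 1);
        // Continue character by character beyond the known information.
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) z[i]++;
        // Update the range if the new match reaches further to the right.
        if (i + z[i] - 1 > y) {
            x = i;
            y = i + z[i] - 1;
        }
    }
    return z;
}
```

Each successful character comparison in the \texttt{while} loop increases $y$, which is why the total number of comparisons is $O(n)$.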

-\subsubsection*{Example}
-
-Let's construct the following Z-array using
-the Z-algorithm:
+For example, let us construct the following Z-array:

 \begin{center}
 \begin{tikzpicture}[scale=0.7]
@ -595,7 +602,8 @@ the Z-algorithm:

 The first interesting position is 7 where the
 length of the common prefix is 5.
-The corresponding range in the string is $[7,11]$:
+After calculating this value,
+the current $[x,y]$ range will be $[7,11]$:

 \begin{center}
 \begin{tikzpicture}[scale=0.7]
@ -663,14 +671,17 @@ The corresponding range in the string is $[7,11]$:
 \end{tikzpicture}
 \end{center}

-The benefit in the range $[7,11]$ is that the
-algorithm can calculate the subsequent values
-for the Z-array more efficiently.
-Since the ranges $[1,5]$ and $[7,11]$ contain
-the same characters, also the Z-array will
-contain similar values.
-First, the values at indices 8 and 9
-correspond to the values at indices 2 and 3:
+Now, it is possible to calculate the
+subsequent values for the Z-array
+more efficiently,
+because we know that
+the ranges $[1,5]$ and $[7,11]$
+contain the same characters.
+First, since the values at
+positions 2 and 3 are 0,
+we immediately know that
+the values at positions 8 and 9
+are also 0:

 \begin{center}
 \begin{tikzpicture}[scale=0.7]
@ -742,13 +753,9 @@ correspond to the values at indices 2 and 3:
 \end{tikzpicture}
 \end{center}

-After this, the value for index 10 can be
-calculated using the value at index 4.
-The value at index 4 is 2,
-so the first two characters
-in the substring match the beginning of the string.
-However, the characters after index $y=11$ have
-not been inspected yet.
+After this, we know that the value
+at position 10 will be at least 2,
+because the value at position 4 is 2:

 \begin{center}
 \begin{tikzpicture}[scale=0.7]
@ -817,13 +824,85 @@ not been inspected yet.
 \end{tikzpicture}
 \end{center}

-The algorithm compares the substring
-beginning at index $y+1=12$ character by character.
-The previous values in the Z-array cannot be used,
-because this is the first time the characters
-after index 11 are inspected.
+Since we have no information about the characters
+after position 11, we have to begin to compare the strings
+character by character:
+
+\begin{center}
+\begin{tikzpicture}[scale=0.7]
+\fill[color=lightgray] (9,0) rectangle (10,1);
+\fill[color=lightgray] (2,1) rectangle (7,2);
+\fill[color=lightgray] (11,1) rectangle (16,2);
+
+
+\draw (0,0) grid (16,2);
+
+\node at (0.5, 1.5) {A};
+\node at (1.5, 1.5) {C};
+\node at (2.5, 1.5) {B};
+\node at (3.5, 1.5) {A};
+\node at (4.5, 1.5) {C};
+\node at (5.5, 1.5) {D};
+\node at (6.5, 1.5) {A};
+\node at (7.5, 1.5) {C};
+\node at (8.5, 1.5) {B};
+\node at (9.5, 1.5) {A};
+\node at (10.5, 1.5) {C};
+\node at (11.5, 1.5) {B};
+\node at (12.5, 1.5) {A};
+\node at (13.5, 1.5) {C};
+\node at (14.5, 1.5) {D};
+\node at (15.5, 1.5) {A};
+
+\node at (0.5, 0.5) {--};
+\node at (1.5, 0.5) {0};
+\node at (2.5, 0.5) {0};
+\node at (3.5, 0.5) {2};
+\node at (4.5, 0.5) {0};
+\node at (5.5, 0.5) {0};
+\node at (6.5, 0.5) {5};
+\node at (7.5, 0.5) {0};
+\node at (8.5, 0.5) {0};
+\node at (9.5, 0.5) {?};
+\node at (10.5, 0.5) {?};
+\node at (11.5, 0.5) {?};
+\node at (12.5, 0.5) {?};
+\node at (13.5, 0.5) {?};
+\node at (14.5, 0.5) {?};
+\node at (15.5, 0.5) {?};
+
+\draw [decoration={brace}, decorate, line width=0.5mm] (6,3.00) -- (11,3.00);
+
+\node at (6.5,3.50) {$x$};
+\node at (10.5,3.50) {$y$};
+
+
+\footnotesize
+\node at (0.5, 2.5) {1};
+\node at (1.5, 2.5) {2};
+\node at (2.5, 2.5) {3};
+\node at (3.5, 2.5) {4};
+\node at (4.5, 2.5) {5};
+\node at (5.5, 2.5) {6};
+\node at (6.5, 2.5) {7};
+\node at (7.5, 2.5) {8};
+\node at (8.5, 2.5) {9};
+\node at (9.5, 2.5) {10};
+\node at (10.5, 2.5) {11};
+\node at (11.5, 2.5) {12};
+\node at (12.5, 2.5) {13};
+\node at (13.5, 2.5) {14};
+\node at (14.5, 2.5) {15};
+\node at (15.5, 2.5) {16};
+
+%\draw[thick,<->] (11.5,-0.25) .. controls (11,-1.25) and (3,-1.25) .. (2.5,-0.25);
+\end{tikzpicture}
+\end{center}
+
+
 It turns out that the length of the common
-prefix is 7, and the range $[x,y]$ will be updated:
+prefix at position 10 is 7,
+and thus the new range $[x,y]$ is $[10,16]$:

 \begin{center}
 \begin{tikzpicture}[scale=0.7]
@ -892,9 +971,9 @@ prefix is 7, and the range $[x,y]$ will be updated:
 \end{tikzpicture}
 \end{center}

-After this, all subsequent values in the Z-array
-can be calculated using the information in
-the range $[x,y]$. All the remaining values can be
+After this, all subsequent values for the Z-array
+can be calculated using the values already
+stored in the array. All the remaining values can be
 directly retrieved from the beginning of the Z-array:

 \begin{center}
@ -964,29 +1043,26 @@ directly retrieved from the beginning of the Z-array:

 \subsubsection{Using the Z-array}

-As an example, let's solve a problem
-where our task is to calculate
-the number of times a string $p$
-occurs as a substring in a string $s$.
-Previously, we solved this problem
+As an example, let us once again consider
+the pattern matching problem,
+where our task is to find the positions
+where a pattern $p$ occurs in a string $s$.
+We already solved this problem efficiently
 using string hashing, but the Z-algorithm
 provides another way to solve the problem.

-A usual idea when using the Z-algorithm
-is to construct a string that consists of
-several strings separated by special characters.
+A usual idea in string processing is to
+construct a string that consists of
+multiple strings separated by special characters.
 In this problem, we can construct a string
 $p$\texttt{\#}$s$,
 where $p$ and $s$ are separated by a special
-character \texttt{\#} that doesn't occur
+character \texttt{\#} that does not occur
 in the strings.
-After this, the Z-array for the string
-$p$\texttt{\#}$s$ indicates the positions
-where $p$ occurs in $s$.
-Such positions are those positions in the Z-array
-that contain the value $p$.
+The Z-array of $p$\texttt{\#}$s$ indicates the positions
+where $p$ occurs in $s$,
+because such positions contain the length of $p$.

-\begin{samepage}
 For example, if $s=$\texttt{HATTIVATTI} and $p=$\texttt{ATT},
 the Z-array is as follows:

@ -1041,12 +1117,12 @@ the Z-array is as follows:
 \node at (13.5, 2.5) {14};
 \end{tikzpicture}
 \end{center}
-\end{samepage}
+
 The positions 6 and 11 contain the value 3,
-which means that the substring \texttt{ATT}
+which means that the pattern \texttt{ATT}
 occurs in the corresponding positions
 in the string \texttt{HATTIVATTI}.

 The time complexity of the resulting algorithm
-is $O(n)$, because it suffices to construct and
-go through the Z-array.
+is $O(n)$, because it suffices to construct
+the Z-array and go through its values.
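
The approach can be sketched as follows. The function names \texttt{z\_array} and \texttt{find\_pattern} are hypothetical, and the sketch assumes that the character \texttt{\#} occurs in neither $p$ nor $s$. The code uses 0-indexed positions internally, but reports the positions in $s$ using the 1-indexed numbering of the text.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Z-array with 0-indexed positions; z[0] is left as 0.
std::vector<int> z_array(const std::string& s) {
    int n = (int)s.size();
    std::vector<int> z(n, 0);
    int x = 0, y = 0;
    for (int i = 1; i < n; i++) {
        if (i <= y) z[i] = std::min(z[i - x], y - i + 1);
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) z[i]++;
        if (i + z[i] - 1 > y) { x = i; y = i + z[i] - 1; }
    }
    return z;
}

// Returns the 1-indexed positions where the pattern p occurs in s,
// assuming that '#' occurs in neither string.
std::vector<int> find_pattern(const std::string& p, const std::string& s) {
    std::vector<int> result;
    int m = (int)p.size();
    if (m == 0) return result;  // an empty pattern is not handled in this sketch
    std::string t = p + "#" + s;
    std::vector<int> z = z_array(t);
    // Index i of t (0-indexed) corresponds to position i-m of s (1-indexed).
    for (int i = m + 1; i < (int)t.size(); i++) {
        if (z[i] == m) result.push_back(i - m);
    }
    return result;
}
```

For example, \texttt{find\_pattern("ATT", "HATTIVATTI")} returns the positions 2 and 7, which correspond to positions 6 and 11 of the combined string.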