Chapter 26 first version

2017-01-24 22:45:47 +02:00 · 2017-01-24 22:45:47 +02:00 · 0b6b9e335a
parent e15acf135f
commit 0b6b9e335a
1 changed files with 117 additions and 135 deletions
--- a/luku26.tex
+++ b/luku26.tex
@@ -392,7 +392,7 @@ using different parameters.
 It is very unlikely that a collision would occur
 in all hash values at the same time.
 For example, two hash values with parameter
-$B \approx 10^9$ corresponds to one hash
+$B \approx 10^9$ correspond to one hash
 value with parameter $B \approx 10^{18}$,
 which makes the probability of a collision very small.
@@ -401,40 +401,39 @@ which is convenient, because operations with 32 and 64
 bit integers are calculated modulo $2^{32}$ and $2^{64}$.
 However, this is not a good choice, because it is possible
 to construct inputs that always generate collisions when
-remainders of the form $2^x$ are used\footnote{
+constants of the form $2^x$ are used\footnote{
-J. Pachocki ja Jakub Radoszweski:
+J. Pachocki and J. Radoszewski:
 ''Where to use and how not to use polynomial string hashing''.
 \textit{Olympiads in Informatics}, 2013.
 }.
-\section{Z-algoritmi}
+\section{Z-algorithm}
-\index{Z-algoritmi}
+\index{Z-algorithm}
-\index{Z-taulukko}
+\index{Z-array}
-\key{Z-algoritmi} muodostaa merkkijonosta \key{Z-taulukon},
+The \key{Z-algorithm} generates a \key{Z-array}
-joka kertoo kullekin merkkijonon kohdalle,
+for a string; the array contains, for each index $k$
-mikä on pisin kyseisestä kohdasta alkava osajono,
+in the string the length of the longest substring
-joka on myös merkkijonon alkuosa.
+that begins at index $k$ and is a prefix of the string.
-Z-algoritmin avulla voi ratkaista tehokkaasti
+Many string problems can be efficiently solved
-monia merkkijonotehtäviä.
+using the Z-algorithm.
-Z-algoritmi ja merkkijonohajautus ovat usein
+It is often a matter of taste whether to use
-vaihtoehtoisia tekniikoita, ja on makuasia,
+the Z-algorithm or string hashing.
-kumpaa algoritmia käyttää.
+Unlike hashing, the Z-algorithm always works
-Toisin kuin hajautus, Z-algoritmi toimii
+correctly and there is no risk of collisions.
-varmasti oikein eikä siinä ole törmäysten riskiä.
+On the other hand, the Z-algorithm is more difficult
-Toisaalta Z-algoritmi on vaikeampi toteuttaa eikä
+to implement and some problems can only be solved
-se sovellu kaikkeen samaan kuin hajautus.
+using hashing.
-\subsubsection*{Algoritmin toiminta}
+\subsubsection*{Description}
-Z-algoritmi muodostaa merkkijonolle Z-taulukon,
+The Z-algorithm constructs a Z-array that
-jonka jokaisessa kohdassa lukee,
+indicates for each position the length of the
-kuinka pitkälle kohdasta
+longest substring starting there that is also a prefix of the string.
-alkava osajono vastaa merkkijonon alkuosaa.
+For example, the Z-array for the string
-Esimerkiksi Z-taulukko
+\texttt{ACBACDACBACBACDA} is as follows:
 merkkijonolle \texttt{ACBACDACBACBACDA} on seuraava:
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
@@ -495,52 +494,45 @@ merkkijonolle \texttt{ACBACDACBACBACDA} on seuraava:
 \end{tikzpicture}
 \end{center}
-Esimerkiksi kohdassa 7 on arvo 5,
+For example, position 7 contains the value 5,
-koska siitä alkava 5-merkkinen osajono
+because the substring \texttt{ACBAC} of length 5
-\texttt{ACBAC} on merkkijonon alkuosa,
+is a prefix of the string,
-mutta 6-merkkinen osajono \texttt{ACBACB}
+but the substring \texttt{ACBACB} of length 6
-ei ole enää merkkijonon alkuosa.
+is not a prefix of the string.
-Z-algoritmi käy läpi merkkijonon
+The Z-algorithm scans the string from the left
-vasemmalta oikealle ja laskee
+to the right, and calculates for each position
-jokaisessa kohdassa,
+the length of the longest substring that
-kuinka pitkälle kyseisestä kohdasta alkava
+is a prefix of the string.
-osajono täsmää merkkijonon alkuun.
+The algorithm compares the first characters
-Algoritmi laskee yhteisen
+of the string
-alkuosan pituuden vertaamalla
+and the active substring with each other to
-merkkijonon alkua ja osajonon alkua toisiinsa.
+find the length of the common prefix.
-Suoraviivaisesti toteutettuna
+A straightforward implementation would yield
-tällaisen algoritmin aikavaativuus olisi $O(n^2)$,
+an algorithm with time complexity $O(n^2)$
-koska yhteiset alkuosat voivat olla pitkiä.
+because the common prefixes may be long.
-Z-algoritmissa on kuitenkin yksi tärkeä
+However, the Z-algorithm has one important
-optimointi, jonka ansiosta algoritmin
+optimization which ensures that the time complexity
-aikavaativuus on vain $O(n)$.
+is only $O(n)$.
 The idea is to maintain a range $[x,y]$ such that
 the substring from $x$ to $y$ is a prefix of
 the string and $y$ is as large as possible.
 Since the Z-array already contains information
 about the characters in the range $[x,y]$,
 they do not need to be processed again later in the algorithm.
-Ideana on pitää muistissa väliä $[x,y]$,
+The time complexity of the Z-algorithm is $O(n)$,
-joka on aiemmin laskettu merkkijonon
+because the algorithm always compares substrings
-alkuun täsmäävä väli, jossa $y$ on 
+character by character only from index $y+1$.
-mahdollisimman suuri.
+If the characters match, the value of $y$ increases,
-Tällä välillä olevia
+and the character never needs to be inspected again;
-merkkejä ei tarvitse koskaan
+instead, the information in the Z-array can be used.
 verrata uudestaan
 merkkijonon alkuun, vaan niitä koskevan
 tiedon saa suoraan Z-taulukon lasketusta osasta.
-Z-algoritmin aikavaativuus on $O(n)$,
+\subsubsection*{Example}
 koska algoritmi aloittaa merkki kerrallaan
 vertailemisen vasta kohdasta $y+1$.
 Jos merkit täsmäävät, kohta $y$
 siirtyy eteenpäin
 eikä algoritmin tarvitse enää
 koskaan vertailla tätä kohtaa,
 vaan algoritmi pystyy hyödyntämään
 Z-taulukon alussa olevaa tietoa.
-\subsubsection*{Esimerkki}
+Let's construct the following Z-array using
-
+the Z-algorithm:
 Katsotaan nyt, miten Z-algoritmi muodostaa
 seuraavan Z-taulukon:
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
@@ -601,10 +593,9 @@ seuraavan Z-taulukon:
 \end{tikzpicture}
 \end{center}
-Ensimmäinen mielenkiintoinen kohta tulee,
+The first interesting position is 7 where the
-kun yhteisen alkuosan pituus on 5.
+length of the common prefix is 5.
-Silloin algoritmi laittaa muistiin
+The corresponding range in the string is $[7,11]$:
 välin $[7,11]$ seuraavasti:
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
@@ -672,18 +663,14 @@ välin $[7,11]$ seuraavasti:
 \end{tikzpicture}
 \end{center}
-Välin $[7,11]$ hyötynä on, että algoritmi
+The benefit of the range $[7,11]$ is that the
-voi sen avulla laskea seuraavat
+algorithm can calculate the subsequent values
-Z-taulukon arvot nopeammin.
+for the Z-array more efficiently.
-Koska välin $[7,11]$ merkit ovat samat
+Since the ranges $[1,5]$ and $[7,11]$ contain
-kuin merkkijonon alussa,
+the same characters, the Z-array will also
-myös Z-taulukon arvoissa on vastaavuutta.
+contain similar values.
-
+First, the values at indices 8 and 9
-Ensinnäkin kohdissa 8 ja 9
+correspond to the values at indices 2 and 3:
 tulee olla samat arvot kuin
 kohdissa 2 ja 3,
 koska väli $[7,11]$
 vastaa väliä $[1,5]$:
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
@@ -755,13 +742,13 @@ vastaa väliä $[1,5]$:
 \end{tikzpicture}
 \end{center}
-Seuraavaksi kohdasta 4 saa tietoa kohdan
+After this, the value for index 10 can be
-10 arvon laskemiseksi.
+calculated using the value at index 4.
-Koska kohdassa 4 on arvo 2,
+The value at index 4 is 2,
-tämä tarkoittaa, että osajono
+so the first two characters
-täsmää kohtaan $y=11$ asti,
+in the substring match the beginning of the string.
-mutta sen jälkeen on tutkimatonta
+However, the characters after index $y=11$ have
-aluetta merkkijonossa.
+not been inspected yet.
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
@@ -830,13 +817,13 @@ aluetta merkkijonossa.
 \end{tikzpicture}
 \end{center}
-Nyt algoritmi alkaa vertailla merkkejä
+The algorithm now compares characters one by one,
-kohdasta $y+1=12$ alkaen merkki kerrallaan.
+starting from index $y+1=12$.
-Algoritmi ei voi hyödyntää valmiina
+The previous values in the Z-array cannot be used,
-Z-taulukossa olevaa tietoa, koska se ei ole vielä aiemmin
+because this is the first time the characters
-tutkinut merkkijonoa näin pitkälle.
+after index 11 are inspected.
-Tuloksena osajonon pituudeksi tulee 7
+It turns out that the length of the common
-ja väli $[x,y]$ päivittyy vastaavasti:
+prefix is 7, and the range $[x,y]$ will be updated:
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
@@ -905,11 +892,10 @@ ja väli $[x,y]$ päivittyy vastaavasti:
 \end{tikzpicture}
 \end{center}
-Tämän jälkeen kaikkien seuraavien Z-taulukon
+After this, all subsequent values in the Z-array
-arvojen laskemisessa pystyy hyödyntämään
+can be calculated using the information in
-jälleen välin $[x,y]$ antamaa tietoa
+the range $[x,y]$. All the remaining values can be
-ja algoritmi saa Z-taulukon loppuun tulevat
+directly retrieved from the beginning of the Z-array:
 arvot suoraan Z-taulukon alusta:
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
@@ -976,37 +962,33 @@ arvot suoraan Z-taulukon alusta:
 \end{tikzpicture}
 \end{center}
-\subsubsection{Z-taulukon käyttäminen}
+\subsubsection{Using the Z-array}
-Ratkaistaan esimerkkinä tehtävä,
+As an example, let's solve a problem
-jossa laskettavana on,
+where our task is to calculate
-montako kertaa merkkijono $p$
+the number of times a string $p$
-esiintyy osajonona merkkijonossa $s$.
+occurs as a substring in a string $s$.
-Ratkaisimme tehtävän aiemmin tehokkaasti
+Previously, we solved this problem
-merkkijonohajautuksen avulla,
+using string hashing, but the Z-algorithm
-ja nyt Z-algoritmi tarjoaa siihen
+provides another way to solve the problem.
 vaihtoehtoisen lähestymistavan.
-Usein esiintyvä idea Z-algoritmin yhteydessä
+A common idea when using the Z-algorithm
-on muodostaa merkkijono,
+is to construct a string that consists of
-jonka osana on useita välimerkeillä
+several strings separated by special characters.
-erotettuja merkkijonoja.
+In this problem, we can construct a string
 Tässä tehtävässä sopiva merkkijono on
 $p$\texttt{\#}$s$,
-jossa merkkijonojen $p$ ja $s$ välissä on
+where $p$ and $s$ are separated by a special
-erikoismerkki \texttt{\#},
+character \texttt{\#} that doesn't occur
-jota ei esiinny merkkijonoissa.
+in the strings.
-Nyt merkkijonoa $p$\texttt{\#}$s$
+After this, the Z-array for the string
-vastaava Z-taulukko kertoo,
+$p$\texttt{\#}$s$ indicates the positions
-missä kohdissa merkkijonoa $p$
+where $p$ occurs in $s$.
-esiintyy merkkijono $s$.
+Such positions are those positions in the Z-array
-Tällaiset kohdat ovat tarkalleen ne
+that contain the length of $p$.
 Z-taulukon kohdat, joissa on
 merkkijonon $p$ pituus.
 \begin{samepage}
-Esimerkiksi jos $s=$\texttt{HATTIVATTI} ja $p=$\texttt{ATT},
+For example, if $s=$\texttt{HATTIVATTI} and $p=$\texttt{ATT},
-niin Z-taulukosta tulee:
+the Z-array is as follows:
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
@@ -1060,11 +1042,11 @@ niin Z-taulukosta tulee:
 \end{tikzpicture}
 \end{center}
 \end{samepage}
-Taulukon kohdissa 6 ja 11 on luku 3,
+Positions 6 and 11 contain the value 3,
-mikä tarkoittaa, että \texttt{ATT}
+which means that the substring \texttt{ATT}
-esiintyy vastaavissa kohdissa merkkijonossa
+occurs in the corresponding positions
-\texttt{HATTIVATTI}.
+in the string \texttt{HATTIVATTI}.
-Tuloksena olevan algoritmin aikavaativuus on
+The time complexity of the resulting algorithm
-$O(n)$, koska riittää muodostaa Z-taulukko
+is $O(n)$, because it suffices to construct and
-ja käydä se läpi.
+go through the Z-array.
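
Note (not part of the commit): the translated section describes the Z-algorithm and the $p$\texttt{\#}$s$ trick only in prose and figures. The following is a minimal C++ sketch of both, under stated assumptions: the function names `z_array` and `count_occurrences` are hypothetical, indices are 0-based (the figures in the chapter use 1-based positions), the first entry of the array is set to 0 by convention, and the separator `#` is assumed not to occur in either string.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sketch: z[k] is the length of the longest substring of s that
// begins at index k (0-based) and is also a prefix of s.
// z[0] is left as 0 by convention.
std::vector<int> z_array(const std::string& s) {
    int n = static_cast<int>(s.size());
    std::vector<int> z(n, 0);
    int x = 0, y = 0;  // [x, y]: range known to match a prefix, y as large as possible
    for (int k = 1; k < n; k++) {
        // Inside [x, y], reuse the value already computed at the mirrored index k - x.
        if (k <= y) z[k] = std::min(z[k - x], y - k + 1);
        // Extend character by character; new comparisons only happen beyond y.
        while (k + z[k] < n && s[z[k]] == s[k + z[k]]) z[k]++;
        // Update the range if this match reaches further to the right.
        if (k + z[k] - 1 > y) {
            x = k;
            y = k + z[k] - 1;
        }
    }
    return z;
}

// Sketch: counts occurrences of p in s using the Z-array of p + '#' + s.
// Assumption: '#' occurs in neither p nor s.
int count_occurrences(const std::string& p, const std::string& s) {
    if (p.empty()) return 0;
    std::vector<int> z = z_array(p + "#" + s);
    int m = static_cast<int>(p.size());
    int count = 0;
    for (int k = m + 1; k < static_cast<int>(z.size()); k++) {
        if (z[k] == m) count++;  // a full copy of p begins at this index
    }
    return count;
}
```

With these definitions, `count_occurrences("ATT", "HATTIVATTI")` corresponds to the example in the section, where two positions of the Z-array contain the length of the pattern. Because the separator cannot match any character of $s$, no Z-value in the $s$ part can exceed the length of $p$.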