Chapter 6 ready

2017-01-02 00:34:14 +02:00 · 2017-01-02 00:34:14 +02:00 · 66f167a6eb
parent c0b6c97340
commit 66f167a6eb
1 changed files with 176 additions and 169 deletions
--- a/luku06.tex
+++ b/luku06.tex
@ -264,22 +264,20 @@ possibilities to select for the next events,
 compared to the strategy that we select the
 event that ends as soon as possible.
-\section{Tehtävät ja deadlinet}
+\section{Tasks and deadlines}
-Annettuna on $n$ tehtävää,
+We are given $n$ tasks, each with a duration and a deadline.
-joista jokaisella on kesto ja deadline.
+Our task is to choose an order in which to perform the tasks.
-Tehtäväsi on valita järjestys,
+For each task, we get $d-x$ points
-jossa suoritat tehtävät.
+where $d$ is the deadline of the task
-Saat kustakin tehtävästä $d-x$ pistettä,
+and $x$ is the moment when we finish the task.
-missä $d$ on tehtävän deadline ja $x$
+What is the largest possible total score
-on tehtävän valmistumishetki.
+we can obtain?
-Mikä on suurin mahdollinen
-yhteispistemäärä, jonka voit saada tehtävistä?
-Esimerkiksi jos tehtävät ovat
+For example, if the tasks are
 \begin{center}
 \begin{tabular}{lll}
-tehtävä & kesto & deadline \\
+task & duration & deadline \\
 \hline
 $A$ & 4 & 2 \\
 $B$ & 3 & 5 \\
@ -287,8 +285,8 @@ $C$ & 2 & 7 \\
 $D$ & 4 & 5 \\
 \end{tabular}
 \end{center}
-niin optimaalinen ratkaisu on suorittaa
+then the optimal solution is to perform
-tehtävät seuraavasti:
+the tasks as follows:
 \begin{center}
 \begin{tikzpicture}[scale=.4]
  \begin{scope}
@ -314,22 +312,21 @@ tehtävät seuraavasti:
  \end{scope}
 \end{tikzpicture}
 \end{center}
-Tässä ratkaisussa $C$ tuottaa 5 pistettä,
+In this solution, $C$ yields 5 points,
-$B$ tuottaa 0 pistettä, $A$ tuottaa $-7$ pistettä
+$B$ yields 0 points, $A$ yields $-7$ points
-ja $D$ tuottaa $-8$ pistettä,
+and $D$ yields $-8$ points,
-joten yhteispistemäärä on $-10$.
+so the total score is $-10$.
-Yllättävää kyllä, tehtävän optimaalinen ratkaisu
+Surprisingly, the optimal solution for the problem
-ei riipu lainkaan deadlineista,
+does not depend on the deadlines at all,
-vaan toimiva ahne strategia on
+but a correct greedy strategy is to simply
-yksinkertaisesti
+perform the tasks \emph{sorted by their durations}
-suorittaa tehtävät \emph{järjestyksessä keston mukaan}
+in increasing order.
-lyhimmästä pisimpään.
+The reason for this is that if we ever perform
-Syynä tähän on, että jos missä tahansa vaiheessa
+two successive tasks such that the first task
-suoritetaan peräkkäin kaksi tehtävää,
+takes longer than the second task,
-joista ensimmäinen kestää toista kauemmin,
+we can obtain a better solution if we swap the tasks.
-tehtävien järjestyksen vaihtaminen parantaa ratkaisua.
+For example, if the successive tasks are
-Esimerkiksi jos peräkkäin ovat tehtävät
 \begin{center}
 \begin{tikzpicture}[scale=.4]
  \begin{scope}
@ -348,7 +345,7 @@ Esimerkiksi jos peräkkäin ovat tehtävät
  \end{scope}
 \end{tikzpicture}
 \end{center}
-ja $a>b$, niin järjestyksen muuttaminen muotoon
+and $a>b$, the swapped order of the tasks
 \begin{center}
 \begin{tikzpicture}[scale=.4]
  \begin{scope}
@ -367,97 +364,102 @@ ja $a>b$, niin järjestyksen muuttaminen muotoon
  \end{scope}
 \end{tikzpicture}
 \end{center}
-antaa $X$:lle $b$ pistettä vähemmän ja $Y$:lle $a$ pistettä enemmän,
+gives $b$ points less to $X$ and $a$ points more to $Y$,
-joten kokonaismuutos pistemäärään on $a-b > 0$.
+so the total score increases by $a-b > 0$.
-Optimiratkaisussa
+In an optimal solution,
-kaikille peräkkäin suoritettaville tehtäville
+for each two successive tasks,
-tulee päteä, että lyhyempi tulee ennen pidempää,
+it must hold that the shorter task comes
-mistä seuraa, että tehtävät tulee suorittaa
+before the longer task.
-järjestyksessä keston mukaan.
+Thus, the tasks must be performed
 sorted by their durations.
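The greedy rule can be checked on the example table with a minimal Python sketch. The function name `max_total_score` and the representation of a task as a `(duration, deadline)` pair are assumptions made for this illustration, not part of the original text.

```python
def max_total_score(tasks):
    """Return the best total score for a list of (duration, deadline) pairs."""
    time = 0
    score = 0
    # Greedy rule: perform the tasks in increasing order of duration.
    for duration, deadline in sorted(tasks, key=lambda task: task[0]):
        time += duration           # moment x when this task is finished
        score += deadline - time   # the task yields d - x points
    return score


# Tasks A, B, C and D from the example table.
example_tasks = [(4, 2), (3, 5), (2, 7), (4, 5)]
print(max_total_score(example_tasks))  # -10
```

Sorting dominates the running time, so this sketch runs in $O(n \log n)$ time.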
-\section{Keskiluvut}
+\section{Minimizing sums}
-Tarkastelemme seuraavaksi ongelmaa, jossa
+We will next consider a problem where
-annettuna on $n$ lukua $a_1,a_2,\ldots,a_n$
+we are given $n$ numbers $a_1,a_2,\ldots,a_n$
-ja tehtävänä on etsiä luku $x$ niin, että summa
+and our task is to find a value $x$
 such that the sum
 \[|a_1-x|^c+|a_2-x|^c+\cdots+|a_n-x|^c\]
-on mahdollisimman pieni.
+becomes as small as possible.
-Keskitymme tapauksiin, joissa $c=1$ tai $c=2$.
+We will focus on the cases $c=1$ and $c=2$.
-\subsubsection{Tapaus $c=1$}
+\subsubsection{Case $c=1$}
-Tässä tapauksessa minimoitavana on summa
+In this case, we should minimize the sum
 \[|a_1-x|+|a_2-x|+\cdots+|a_n-x|.\]
-Esimerkiksi jos luvut ovat $[1,2,9,2,6]$,
+For example, if the numbers are $[1,2,9,2,6]$,
-niin paras ratkaisu on valita $x=2$,
+the best solution is to select $x=2$
-jolloin summaksi tulee
+which produces the sum
 \[
 |1-2|+|2-2|+|9-2|+|2-2|+|6-2|=12.
 \]
-Yleisessä tapauksessa paras valinta $x$:n arvoksi
+In the general case, the best choice for $x$
-on lukujen \textit{mediaani}
+is the \textit{median} of the numbers,
-eli keskimmäinen luku järjestyksessä.
+i.e., the middle number after sorting.
-Esimerkiksi luvut $[1,2,9,2,6]$
+For example, the list $[1,2,9,2,6]$
-ovat järjestyksessä $[1,2,2,6,9]$,
+becomes $[1,2,2,6,9]$ after sorting,
-joten mediaani on 2.
+so the median is 2.
-Mediaanin valinta on paras ratkaisu,
+The median is the optimal choice,
-koska jos $x$ on mediaania pienempi,
+because if $x$ is smaller than the median,
-$x$:n suurentaminen pienentää summaa,
+the sum becomes smaller by increasing $x$,
-ja vastaavasti jos $x$ on mediaania suurempi,
+and if $x$ is larger than the median,
-$x$:n pienentäminen pienentää summaa.
+the sum becomes smaller by decreasing $x$.
-Niinpä $x$ kannattaa siirtää mahdollisimman
+Thus, we should move $x$ as near the median
-lähelle mediaania eli optimiratkaisu on
+as possible, so the optimal solution is that $x$
-valita $x$ mediaaniksi.
+is the median.
-Jos $n$ on parillinen ja mediaaneja on kaksi,
+If $n$ is even and there are two medians,
-kumpikin mediaani sekä kaikki niiden välillä
+both medians and all values between them
-olevat luvut tuottavat optimaalisen ratkaisun.
+are optimal solutions.
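A short Python sketch of the case $c=1$, under the assumption that the numbers are given as a list. The helper names `best_x_abs` and `abs_cost` are hypothetical. When $n$ is even, the sketch returns the lower of the two medians, which is one of the optimal choices.

```python
def best_x_abs(values):
    """Return a median of the values (the lower one when n is even)."""
    ordered = sorted(values)
    return ordered[(len(ordered) - 1) // 2]


def abs_cost(values, x):
    """Return |a_1 - x| + |a_2 - x| + ... + |a_n - x|."""
    return sum(abs(a - x) for a in values)


numbers = [1, 2, 9, 2, 6]
x = best_x_abs(numbers)
print(x, abs_cost(numbers, x))  # 2 12
```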
-\subsubsection{Tapaus $c=2$}
+\subsubsection{Case $c=2$}
-Tässä tapauksessa minimoitavana on summa
+In this case, we should minimize the sum
 \[(a_1-x)^2+(a_2-x)^2+\cdots+(a_n-x)^2.\]
-Esimerkiksi jos luvut ovat $[1,2,9,2,6]$,
+For example, if the numbers are $[1,2,9,2,6]$,
-niin paras ratkaisu on $x=4$,
+the best solution is to select $x=4$
-jolloin summaksi tulee
+which produces the sum
 \[
 (1-4)^2+(2-4)^2+(9-4)^2+(2-4)^2+(6-4)^2=46.
 \]
-Yleisessä tapauksessa paras valinta $x$:n arvoksi on lukujen
+In the general case, the best choice for $x$
-\textit{keskiarvo}.
+is the \emph{average} of the numbers.
-Esimerkissä lukujen keskiarvo on $(1+2+9+2+6)/5=4$.
+In the example, the average is $(1+2+9+2+6)/5=4$.
-Tämän tuloksen voi johtaa järjestämällä summan
+This result can be derived by presenting
-uudestaan muotoon
+the sum as follows:
 \[
 nx^2 - 2x(a_1+a_2+\cdots+a_n) + (a_1^2+a_2^2+\cdots+a_n^2).
 \]
-Viimeinen osa ei riipu $x$:stä, joten sen voi jättää huomiotta.
+The last part doesn't depend on $x$,
-Jäljelle jäävistä osista muodostuu funktio
+so we can ignore it.
-$nx^2-2xs$, kun $s=a_1+a_2+\cdots+a_n$.
+The remaining parts form a function
-Tämä on ylöspäin aukeava paraabeli,
+$nx^2-2xs$ where $s=a_1+a_2+\cdots+a_n$.
-jonka nollakohdat ovat $x=0$ ja $x=2s/n$
+This is a parabola opening upwards
-ja pienin arvo on näiden keskikohta
+with roots $x=0$ and $x=2s/n$,
-$x=s/n$ eli taulukon lukujen keskiarvo.
+and the minimum is attained at the average
 of the roots $x=s/n$, i.e.,
 the average of the numbers $a_1,a_2,\ldots,a_n$.
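The case $c=2$ can be sketched in the same way. The helper names `best_x_square` and `square_cost` are hypothetical, and `Fraction` is used here only so that the average stays exact when it is not an integer.

```python
from fractions import Fraction


def best_x_square(values):
    """Return the average s/n of the values as an exact fraction."""
    return Fraction(sum(values), len(values))


def square_cost(values, x):
    """Return (a_1 - x)^2 + (a_2 - x)^2 + ... + (a_n - x)^2."""
    return sum((a - x) ** 2 for a in values)


numbers = [1, 2, 9, 2, 6]
x = best_x_square(numbers)
print(x, square_cost(numbers, x))  # 4 46
```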
-\section{Tiedonpakkaus}
+\section{Data compression}
-\index{tiedonpakkaus}
+\index{data compression}
-\index{binxxrikoodi@binäärikoodi}
+\index{binary code}
-\index{koodisana@koodisana}
+\index{codeword}
-Annettuna on merkkijono ja tehtävänä on
+We are given a string, and our task is to
-\emph{pakata} se niin,
+\emph{compress} it so that it requires less space.
-että tilaa kuluu vähemmän.
+We will do this using a \key{binary code}
-Käytämme tähän \key{binäärikoodia},
+that determines for each character
-joka määrittää kullekin merkille
+a \key{codeword} that consists of bits.
-biteistä muodostuvan \key{koodisanan}.
+After this, we can compress the string
-Tällöin merkkijonon voi pakata
+by replacing each character by the
-korvaamalla jokaisen merkin vastaavalla koodisanalla.
+corresponding codeword.
-Esimerkiksi seuraava binäärikoodi määrittää
+For example, the following binary code
-koodisanat merkeille \texttt{A}–\texttt{D}:
+determines codewords for characters
 \texttt{A}–\texttt{D}:
 \begin{center}
 \begin{tabular}{rr}
-merkki & koodisana \\
+character & codeword \\
 \hline
 \texttt{A} & 00 \\
 \texttt{B} & 01 \\
@ -465,23 +467,25 @@ merkki & koodisana \\
 \texttt{D} & 11 \\
 \end{tabular}
 \end{center}
-Tämä koodi on \key{vakiopituinen}
+This is a \key{constant-length} code
-eli jokainen koodisana on yhtä pitkä.
+which means that the length of each
-Esimerkiksi merkkijono
+codeword is the same.
-\texttt{AABACDACA} on pakattuna
+For example, the compressed form of the string
 \texttt{AABACDACA} is
 \[000001001011001000,\]
-eli se vie tilaa 18 bittiä.
+so 18 bits are needed.
-Pakkausta on kuitenkin mahdollista parantaa
+However, we can compress the string better
-ottamalla käyttöön \key{muuttuvan pituinen} koodi,
+by using a \key{variable-length} code
-jossa koodisanojen pituus voi vaihdella.
+where codewords may have different lengths.
-Tällöin voimme antaa usein esiintyville merkeille
+Then we can give short codewords for
-lyhyen koodisanan ja harvoin esiintyville
+characters that appear often,
-merkeille pitkän koodisanan.
+and long codewords for characters
-Osoittautuu, että yllä olevalle merkkijonolle
+that appear rarely.
-\key{optimaalinen} koodi on seuraava:
+It turns out that the \key{optimal} code
 for the aforementioned string is as follows:
 \begin{center}
 \begin{tabular}{rr}
-merkki & koodisana \\
+character & codeword \\
 \hline
 \texttt{A} & 0 \\
 \texttt{B} & 110 \\
@ -489,27 +493,26 @@ merkki & koodisana \\
 \texttt{D} & 111 \\
 \end{tabular}
 \end{center}
-Optimaalinen koodi tuottaa
+The optimal code produces a compressed string
-mahdollisimman lyhyen pakatun merkkijonon.
+that is as short as possible.
-Tässä tapauksessa optimaalinen koodi
+In this case, the compressed form using
-pakkaa merkkijonon muotoon
+the optimal code is
 \[001100101110100,\]
-ja tilaa kuluu vain 15 bittiä.
+so only 15 bits are needed.
-Paremman koodin ansiosta onnistuimme siis säästämään
+Thus, thanks to a better code it was possible to
-3 bittiä tilaa pakkauksessa.
+save 3 bits in the compressed string.
-Huomaa, että koodin tulee olla aina sellainen,
+Note that it is required that no codeword
-että mikään koodisana ei ole toisen koodisanan
+is a prefix of another codeword.
-alkuosa.
+For example, a code may not contain
-Esimerkiksi ei ole sallittua, että koodissa
+both of the codewords 10
-olisi molemmat koodisanat 10 ja 1011.
+and 1011.
-Tämä rajoitus johtuu siitä,
+The reason for this is that we also want
-että haluamme myös pystyä palauttamaan
+to be able to generate the original string
-alkuperäisen merkkijonon pakkauksen jälkeen.
+from the compressed string.
-Jos koodisana voisi olla toisen alkuosa,
+If a codeword could be a prefix of another codeword,
-tämä ei välttämättä olisi mahdollista.
+this would not always be possible.
-Esimerkiksi seuraava koodi
+For example, the following code is \emph{not} valid:
-\emph{ei} ole kelvollinen:
 \begin{center}
 \begin{tabular}{rr}
 merkki & koodisana \\
@ -520,40 +523,41 @@ merkki & koodisana \\
 \texttt{D} & 111 \\
 \end{tabular}
 \end{center}
-Tätä koodia käyttäen ei olisi mahdollista tietää,
+Using this code, it would not be possible to know
-tarkoittaako pakattu merkkijono 1011
+if the compressed string 1011 means
-merkkijonoa \texttt{AB} vai merkkijonoa \texttt{C}.
+the string \texttt{AB} or the string \texttt{C}.
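The role of the prefix condition can be illustrated with a small Python sketch. The functions `encode`, `decode` and `is_prefix_free` are hypothetical helpers, and a code is assumed to be a dictionary from characters to bit strings. The two-character code at the end is an assumed example built from the codewords 10 and 1011 mentioned above.

```python
def encode(text, code):
    """Replace each character by its codeword."""
    return "".join(code[ch] for ch in text)


def decode(bits, code):
    """Decode a bit string, assuming that no codeword is a prefix of another."""
    inverse = {word: ch for ch, word in code.items()}
    result = []
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:      # a complete codeword has been read
            result.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(result)


def is_prefix_free(code):
    """Check that no codeword is a prefix of another codeword."""
    words = sorted(code.values())
    # After sorting, a codeword that is a prefix of some other codeword
    # is also a prefix of its immediate successor.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


optimal = {"A": "0", "B": "110", "C": "10", "D": "111"}
bits = encode("AABACDACA", optimal)
print(bits, len(bits))                           # 001100101110100 15
print(decode(bits, optimal))                     # AABACDACA
print(is_prefix_free(optimal))                   # True
print(is_prefix_free({"X": "10", "Y": "1011"}))  # False
```

Because the code is prefix-free, the decoder can emit a character as soon as the bits read so far match a codeword, without looking ahead.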
-\index{Huffmanin koodaus}
+\index{Huffman coding}
-\subsubsection{Huffmanin koodaus}
+\subsubsection{Huffman coding}
-\key{Huffmanin koodaus} on ahne algoritmi,
+\key{Huffman coding} is a greedy algorithm
-joka muodostaa optimaalisen koodin
+that constructs an optimal code for
-merkkijonon pakkaamista varten.
+compressing a string.
-Se muodostaa merkkien esiintymiskertojen
+The algorithm builds a binary tree
-perustella binääripuun, josta voi lukea
+based on the frequencies of the characters
-kunkin merkin koodisanan
+in the string,
-liikkumalla huipulta merkkiä vastaavaan solmuun.
+and a codeword for each character can be read
-Liikkuminen vasemmalle vastaa
+by following a path from the root to
-bittiä 0 ja liikkuminen oikealle
+the corresponding node.
-vastaa bittiä 1.
+A move to the left corresponds to bit 0,
 and a move to the right corresponds to bit 1.
-Aluksi jokaista merkkijonon merkkiä vastaa solmu,
+Initially, each character of the string is
-jonka painona on merkin esiintymiskertojen määrä merkkijonossa.
+represented by a node whose weight is the
-Sitten joka vaiheessa puusta valitaan
+number of times the character appears in the string.
-kaksi painoltaan pienintä solmua
+Then at each step two nodes with minimum weights
-ja ne yhdistetään luomalla niiden
+are selected and they are combined by creating
-yläpuolelle uusi solmu,
+a new node whose weight is the sum of the weights
-jonka paino on solmujen yhteispaino.
+of the original nodes.
-Näin jatketaan, kunnes kaikki solmut
+The process continues until all nodes have been
-on yhdistetty ja koodi on valmis.
+combined and the code is ready.
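The process described above can be sketched in Python using a priority queue. This is one possible implementation under stated assumptions, not necessarily the one the text has in mind: the function name `huffman_code` is hypothetical, ties between equal weights are broken by insertion order, and the lighter of the two selected nodes is placed on the left. The exact bits may therefore differ from a hand-drawn tree, but the codeword lengths and the total compressed length are the same.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text):
    """Return a dictionary that maps each character of text to a codeword."""
    frequencies = Counter(text)
    if not frequencies:
        return {}
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(weight, next(tiebreak), ch) for ch, weight in frequencies.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct character still needs a one-bit codeword.
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Select two nodes with minimum weights and combine them.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    code = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + "0")  # a move to the left is bit 0
            walk(node[1], prefix + "1")  # a move to the right is bit 1
        else:                            # leaf: a character
            code[node] = prefix

    walk(heap[0][2], "")
    return code


text = "AABACDACA"
code = huffman_code(text)
print(sum(len(code[ch]) for ch in text))  # 15
```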
-Tarkastellaan nyt, miten Huffmanin koodaus
+Next we will see how Huffman coding creates
-muodostaa optimaalisen koodin merkkijonolle
+the optimal code for the string
 \texttt{AABACDACA}.
-Alkutilanteessa on neljä solmua,
+Initially, there are four nodes that correspond
-jotka vastaavat merkkijonossa olevia merkkejä:
+to the characters in the string:
 \begin{center}
 \begin{tikzpicture}[scale=0.9]
@ -570,13 +574,16 @@ jotka vastaavat merkkijonossa olevia merkkejä:
 %\path[draw,thick,-] (4) -- (5);
 \end{tikzpicture}
 \end{center}
-Merkkiä \texttt{A} vastaavan solmun paino on
+The node that represents character \texttt{A}
-5, koska merkki \texttt{A} esiintyy 5 kertaa merkkijonossa.
+has weight 5 because character \texttt{A}
-Muiden solmujen painot on laskettu vastaavalla tavalla.
+appears 5 times in the string.
 The other weights have been calculated
 in the same way.
-Ensimmäinen askel on yhdistää merkkejä \texttt{B} ja \texttt{D}
+The first step is to combine the nodes that
-vastaavat solmut, joiden kummankin paino on 1.
+correspond to characters \texttt{B} and \texttt{D},
-Tuloksena on:
+both with weight 1.
 The result is:
 \begin{center}
 \begin{tikzpicture}[scale=0.9]
 \node[draw, circle] (1) at (0,0) {$5$};
@ -597,7 +604,7 @@ Tuloksena on:
 \path[draw,thick,-] (4) -- (5);
 \end{tikzpicture}
 \end{center}
-Tämän jälkeen yhdistetään solmut, joiden paino on 2:
+After this, the nodes with weight 2 are combined:
 \begin{center}
 \begin{tikzpicture}[scale=0.9]
 \node[draw, circle] (1) at (1,0) {$5$};
@ -623,7 +630,7 @@ Tämän jälkeen yhdistetään solmut, joiden paino on 2:
 \path[draw,thick,-] (5) -- (6);
 \end{tikzpicture}
 \end{center}
-Lopuksi yhdistetään kaksi viimeistä solmua:
+Finally, the two remaining nodes are combined:
 \begin{center}
 \begin{tikzpicture}[scale=0.9]
 \node[draw, circle] (1) at (2,2) {$5$};
@ -655,11 +662,11 @@ Lopuksi yhdistetään kaksi viimeistä solmua:
 \end{tikzpicture}
 \end{center}
-Nyt kaikki solmut ovat puussa, joten koodi on valmis.
+Now all nodes are in the tree, so the code is ready.
-Puusta voidaan lukea seuraavat koodisanat:
+The following codewords can be read from the tree:
 \begin{center}
 \begin{tabular}{rr}
-merkki & koodisana \\
+character & codeword \\
 \hline
 \texttt{A} & 0 \\
 \texttt{B} & 110 \\