Sorting theory

2016-12-30 00:17:22 +02:00 · 2016-12-30 00:17:22 +02:00 · a356b5014c
parent 9bac8cca9d
commit a356b5014c
1 changed files with 175 additions and 184 deletions
--- a/luku03.tex
+++ b/luku03.tex
@@ -1,41 +1,42 @@
 \chapter{Sorting}
-\index{jxrjestxminen@järjestäminen}
+\index{sorting}
-\key{Järjestäminen}
+\key{Sorting}
-on keskeinen algoritmiikan ongelma.
+is a fundamental algorithm design problem.
-Moni tehokas algoritmi
+In addition,
-perustuu järjestämiseen,
+many efficient algorithms
-koska järjestetyn tiedon
+use sorting as a subroutine,
-käsittely on helpompaa
+because it is often easier to process
-kuin sekalaisessa järjestyksessä olevan.
+data if the elements are in a sorted order.
-Esimerkiksi kysymys ''onko taulukossa kahta samaa
+For example, the question ''does the array contain
-alkiota?'' ratkeaa tehokkaasti järjestämisen avulla.
+two equal elements?'' is easy to solve using sorting.
-Jos taulukossa on kaksi samaa alkiota,
+If the array contains two equal elements,
-ne ovat järjestämisen jälkeen peräkkäin,
+they will be next to each other after sorting,
-jolloin niiden löytäminen on helppoa.
+so it is easy to find them.
-Samaan tapaan ratkeaa myös kysymys
+Also the question ''what is the most frequent element
-''mikä on yleisin alkio taulukossa?''.
+in the array?'' can be solved similarly.
-Järjestämiseen on kehitetty monia
+There are many algorithms for sorting, and they are
-algoritmeja, jotka tarjoavat hyviä
+also good examples of algorithm design techniques.
-esimerkkejä algoritmien suunnittelun tekniikoista.
+The efficient general sorting algorithms
-Tehokkaat yleiset järjestämis\-algoritmit
+work in $O(n \log n)$ time,
-toimivat ajassa $O(n \log n)$, ja tämä aikavaativuus
+and many algorithms that use sorting
-on myös monella järjestämistä käyttävällä algoritmilla.
+as a subroutine also
+have this time complexity.
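As a minimal sketch of the duplicate question mentioned above: after sorting, equal elements are adjacent, so one linear scan is enough. The function name `hasDuplicate` and the use of `std::vector<int>` are assumptions made for this illustration and do not come from the chapter.

```cpp
#include <algorithm>
#include <vector>

// Returns true if the array contains two equal elements.
// The vector is taken by value, so the caller's array is left unchanged.
bool hasDuplicate(std::vector<int> t) {
    // O(n log n): sorting places equal elements next to each other
    std::sort(t.begin(), t.end());
    // O(n): adjacent_find returns end() when no two neighbours are equal
    return std::adjacent_find(t.begin(), t.end()) != t.end();
}
```

The total running time is dominated by the sort. The most-frequent-element question could be handled in a similar way by measuring the longest run of equal neighbours.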
-\section{Järjestämisen teoriaa}
+\section{Sorting theory}
-Järjestämisen perusongelma on seuraava:
+The basic problem in sorting is as follows:
 \begin{framed}
 \noindent
-Annettuna on taulukko, jossa on $n$ alkiota.
+Given an array that contains $n$ elements,
-Tehtäväsi on järjestää alkiot pienimmästä
+your task is to sort the elements
-suurimpaan.
+in increasing order.
 \end{framed}
 \noindent
-Esimerkiksi taulukko
+For example, the array
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
 \draw (0,0) grid (8,1);
@@ -59,7 +60,7 @@ Esimerkiksi taulukko
 \node at (7.5,1.4) {$8$};
 \end{tikzpicture}
 \end{center}
-on järjestettynä seuraava:
+will be as follows after sorting:
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
 \draw (0,0) grid (8,1);
@@ -84,27 +85,26 @@ on järjestettynä seuraava:
 \end{tikzpicture}
 \end{center}
-\subsubsection{$O(n^2)$-algoritmit}
+\subsubsection{$O(n^2)$ algorithms}
-\index{kuplajxrjestxminen@kuplajärjestäminen}
+\index{bubble sort}
-Yksinkertaiset algoritmit taulukon
+Simple algorithms for sorting an array
-järjestämiseen vievät aikaa $O(n^2)$.
+work in $O(n^2)$ time.
-Tällaiset algoritmit ovat lyhyitä ja
+Such algorithms are short and usually
-muodostuvat tyypillisesti
+consist of two nested loops.
-kahdesta sisäkkäisestä silmukasta.
+A famous $O(n^2)$ time algorithm for sorting
-Tunnettu $O(n^2)$-aikainen algoritmi on
+is \key{bubble sort} where the elements
-\key{kuplajärjestäminen},
+''bubble'' forward in the array according to their values.
-jossa alkiot ''kuplivat'' eteenpäin taulukossa
-niiden suuruuden perusteella.
-Kuplajärjestäminen muodostuu $n-1$ kierroksesta,
+Bubble sort consists of $n-1$ rounds.
-joista jokainen käy taulukon läpi vasemmalta oikealle.
+On each round, the algorithm iterates through
-Aina kun taulukosta löytyy kaksi vierekkäistä
+the elements in the array.
-alkiota, joiden järjestys on väärä, algoritmi
+Whenever two successive elements are found
-korjaa niiden järjestyksen.
+that are not in correct order,
-Algoritmin voi toteuttaa seuraavasti
+the algorithm swaps them.
-taulukolle
+The algorithm can be implemented as follows
+for array
 $\texttt{t}[1],\texttt{t}[2],\ldots,\texttt{t}[n]$:
 \begin{lstlisting}
 for (int i = 1; i <= n-1; i++) {
@@ -114,13 +114,14 @@ for (int i = 1; i <= n-1; i++) {
 }
 \end{lstlisting}
-Algoritmin ensimmäisen kierroksen jälkeen suurin
+After the first round of the algorithm,
-alkio on paikallaan, toisen kierroksen jälkeen
+the largest element is in the correct place,
-kaksi suurinta alkiota on paikallaan, jne.
+after the second round the second largest
-Niinpä $n-1$ kierroksen jälkeen koko taulukko
+element is in the correct place, etc.
-on järjestyksessä.
+Thus, after $n-1$ rounds, all elements
+will be sorted.
-Esimerkiksi taulukossa
+For example, in the array
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
@@ -148,8 +149,8 @@ Esimerkiksi taulukossa
 \end{center}
 \noindent
-kuplajärjestämisen ensimmäinen
+the first round of bubble sort swaps elements
-läpikäynti tekee seuraavat vaihdot:
+as follows:
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
@@ -257,25 +258,26 @@ läpikäynti tekee seuraavat vaihdot:
 \end{tikzpicture}
 \end{center}
-\subsubsection{Inversiot}
+\subsubsection{Inversions}
-\index{inversio@inversio}
+\index{inversion}
-Kuplajärjestäminen on esimerkki algoritmista,
+Bubble sort is an example of a sorting
-joka perustuu taulukon vierekkäisten alkioiden
+algorithm that always swaps successive
-vaihtamiseen keskenään.
+elements in the array.
-Osoittautuu, että tällaisen algoritmin
+It turns out that the time complexity
-aikavaativuus on \emph{aina} vähintään $O(n^2)$,
+of this kind of algorithm is \emph{always}
-koska pahimmassa tapauksessa taulukon
+at least $O(n^2)$ because in the worst case,
-järjestäminen vaatii $O(n^2)$ alkioparin vaihtamista.
+$O(n^2)$ swaps are required for sorting the array.
-Hyödyllinen käsite järjestämisalgoritmien
+A useful concept when analyzing sorting
-analyysissa on \key{inversio}.
+algorithms is an \key{inversion}.
-Se on taulukossa oleva alkiopari
+It is a pair of elements
-$(\texttt{t}[a],\texttt{t}[b])$,
+$(\texttt{t}[a],\texttt{t}[b])$
-missä $a<b$ ja $\texttt{t}[a]>\texttt{t}[b]$
+in the array such that
-eli alkiot ovat väärässä järjestyksessä taulukossa.
+$a<b$ and $\texttt{t}[a]>\texttt{t}[b]$,
-Esimerkiksi taulukon
+i.e., the elements are in the wrong order.
+For example, in the array
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
 \draw (0,0) grid (8,1);
@@ -299,61 +301,54 @@ Esimerkiksi taulukon
 \node at (7.5,1.4) {$8$};
 \end{tikzpicture}
 \end{center}
-inversiot ovat $(6,3)$, $(6,5)$ ja $(9,8)$.
+the inversions are $(6,3)$, $(6,5)$ and $(9,8)$.
-Inversioiden määrä kuvaa, miten lähellä
+The number of inversions indicates
-järjestystä taulukko on.
+how sorted the array is.
-Taulukko on järjestyksessä tarkalleen
+An array is completely sorted when
-silloin, kun siinä ei ole yhtään inversiota.
+there are no inversions.
-Inversioiden määrä on puolestaan suurin,
+On the other hand, if the array elements
-kun taulukon järjestys on käänteinen,
+are in reverse order,
-jolloin inversioita on
+the number of inversions is maximum:
-\[1+2+\cdots+(n-1)=\frac{n(n-1)}{2} = O(n^2).\]
+\[1+2+\cdots+(n-1)=\frac{n(n-1)}{2} = O(n^2)\]
-Jos vierekkäiset taulukon alkiot
+Swapping successive elements that are
-ovat väärässä järjestyksessä,
+in the wrong order removes exactly one inversion
-niiden järjestyksen korjaaminen
+from the array.
-poistaa taulukosta tarkalleen yhden inversion.
+Thus, if a sorting algorithm can only
-Niinpä jos järjestämisalgoritmi pystyy
+swap successive elements, each swap removes
-vaihtamaan keskenään vain
+at most one inversion and the time complexity
-taulukon vierekkäisiä alkioita,
+of the algorithm is at least $O(n^2)$.
-jokainen vaihto voi poistaa enintään yhden inversion
-ja algoritmin aikavaativuus on varmasti ainakin $O(n^2)$.
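The link between inversions and adjacent swaps can be checked with a small sketch. The helper names `countInversions` and `bubbleSortSwaps` are hypothetical and the vectors are 0-indexed, unlike the 1-indexed array `t` in the text. The first function counts inversions directly from the definition. The second runs bubble sort on a copy and counts the swaps it performs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Counts pairs (a,b) with a < b and t[a] > t[b] by checking every pair.
long long countInversions(const std::vector<int>& t) {
    long long count = 0;
    for (std::size_t a = 0; a < t.size(); a++)
        for (std::size_t b = a + 1; b < t.size(); b++)
            if (t[a] > t[b]) count++;
    return count;
}

// Runs bubble sort on a copy of t and returns how many swaps were made.
long long bubbleSortSwaps(std::vector<int> t) {
    long long swaps = 0;
    int n = static_cast<int>(t.size());
    for (int round = 0; round + 1 < n; round++) {
        for (int j = 0; j + 1 < n; j++) {
            if (t[j] > t[j + 1]) {
                // swapping adjacent out-of-order elements removes one inversion
                std::swap(t[j], t[j + 1]);
                swaps++;
            }
        }
    }
    assert(std::is_sorted(t.begin(), t.end()));
    return swaps;
}
```

For an array chosen here for illustration, `{1, 2, 2, 6, 3, 5, 9, 8}`, both functions return 3. For a reversed array of $n$ elements both return $n(n-1)/2$.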
-\subsubsection{$O(n \log n)$-algoritmit}
+\subsubsection{$O(n \log n)$ algorithms}
-\index{lomitusjxrjestxminen@lomitusjärjestäminen}
+\index{merge sort}
-Taulukon järjestäminen on mahdollista
+It is possible to sort an array efficiently
-tehokkaasti ajassa $O(n \log n)$
+in $O(n \log n)$ time using an algorithm
-algoritmilla, joka ei rajoitu vierekkäisten
+that is not limited to swapping successive elements.
-alkoiden vaihtamiseen.
+One such algorithm is \key{mergesort}
-Yksi tällainen algoritmi on
+that sorts an array recursively by dividing
-\key{lomitusjärjestäminen},
+it into smaller subarrays.
-joka järjestää taulukon
-rekursiivisesti jakamalla sen
-pienemmiksi osataulukoiksi.
-Lomitusjärjestäminen järjestää taulukon välin
+Mergesort sorts the subarray $[a,b]$ as follows:
-$[a,b]$ seuraavasti:
 \begin{enumerate}
-\item Jos $a=b$, älä tee mitään, koska väli on valmiiksi järjestyksessä.
+\item If $a=b$, don't do anything because the subarray is already sorted.
-\item Valitse välin jakokohdaksi $k=\lfloor (a+b)/2 \rfloor$.
+\item Calculate the index of the middle element: $k=\lfloor (a+b)/2 \rfloor$.
-\item Järjestä rekursiivisesti välin $[a,k]$ alkiot.
+\item Recursively sort the subarray $[a,k]$.
-\item Järjestä rekursiivisesti välin $[k+1,b]$ alkiot.
+\item Recursively sort the subarray $[k+1,b]$.
-\item \emph{Lomita} järjestetyt välit $[a,k]$ ja $[k+1,b]$
+\item \emph{Merge} the sorted subarrays $[a,k]$ and $[k+1,b]$
-järjestetyksi väliksi $[a,b]$.
+into a sorted subarray $[a,b]$.
 \end{enumerate}
-Lomitusjärjestämisen tehokkuus perustuu siihen,
+Mergesort is an efficient algorithm because it
-että se puolittaa joka askeleella välin kahteen osaan.
+halves the size of the subarray at each step.
-Rekursiosta muodostuu yhteensä $O(\log n)$ tasoa
+The recursion consists of $O(\log n)$ levels,
-ja jokaisen tason käsittely vie aikaa $O(n)$.
+and processing each level takes $O(n)$ time.
-Kohdan 5 lomittaminen on mahdollista ajassa $O(n)$,
+Merging the subarrays $[a,k]$ and $[k+1,b]$
-koska välit $[a,k]$ ja $[k+1,b]$ on jo järjestetty.
+is possible in linear time because they are already sorted.
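The five steps above can be sketched as follows. This is one possible implementation under stated assumptions, not the chapter's own code: the names `mergeSort` and `mergeRanges` are hypothetical, the array is a 0-indexed `std::vector<int>`, and the merge step uses a temporary buffer.

```cpp
#include <cassert>
#include <vector>

// Merges the sorted ranges t[a..k] and t[k+1..b] into the sorted range t[a..b].
// Runs in linear time because both inputs are already sorted.
void mergeRanges(std::vector<int>& t, int a, int k, int b) {
    std::vector<int> merged;
    merged.reserve(b - a + 1);
    int i = a, j = k + 1;
    while (i <= k && j <= b) {
        // take the smaller front element; <= keeps equal elements in order
        if (t[i] <= t[j]) merged.push_back(t[i++]);
        else merged.push_back(t[j++]);
    }
    while (i <= k) merged.push_back(t[i++]);  // rest of the left range
    while (j <= b) merged.push_back(t[j++]);  // rest of the right range
    for (int p = 0; p < static_cast<int>(merged.size()); p++) t[a + p] = merged[p];
}

// Sorts the closed range t[a..b]; requires 0 <= a <= b < t.size().
void mergeSort(std::vector<int>& t, int a, int b) {
    assert(a <= b);
    if (a == b) return;          // step 1: one element is already sorted
    int k = a + (b - a) / 2;     // step 2: equals floor((a+b)/2) for a, b >= 0
    mergeSort(t, a, k);          // step 3
    mergeSort(t, k + 1, b);      // step 4
    mergeRanges(t, a, k, b);     // step 5
}
```

A call such as `mergeSort(v, 0, v.size() - 1)` sorts a non-empty vector `v`. An empty vector has to be handled by the caller, because the range $[a,b]$ is assumed to contain at least one element.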
-Tarkastellaan esimerkkinä seuraavan taulukon
+For example, consider sorting the following array:
-järjestämistä:
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
 \draw (0,0) grid (8,1);
@@ -378,8 +373,8 @@ järjestämistä:
 \end{tikzpicture}
 \end{center}
-Taulukko jakautuu ensin kahdeksi
+The array will be divided into two subarrays
-osataulukoksi seuraavasti:
+as follows:
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
 \draw (0,0) grid (4,1);
@@ -408,8 +403,8 @@ osataulukoksi seuraavasti:
 \end{tikzpicture}
 \end{center}
-Algoritmi järjestää osataulukot rekursiivisesti,
+Then, the subarrays will be sorted recursively
-jolloin tuloksena on:
+as follows:
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
 \draw (0,0) grid (4,1);
@@ -437,8 +432,8 @@ jolloin tuloksena on:
 \end{tikzpicture}
 \end{center}
-Lopuksi algoritmi lomittaa järjestetyt osataulukot,
+Finally, the algorithm merges the sorted
-jolloin syntyy lopullinen järjestetty taulukko:
+subarrays and creates the final sorted array:
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
 \draw (0,0) grid (8,1);
@@ -463,21 +458,19 @@ jolloin syntyy lopullinen järjestetty taulukko:
 \end{tikzpicture}
 \end{center}
-\subsubsection{Järjestämisen alaraja}
+\subsubsection{Sorting lower bound}
-Onko sitten mahdollista järjestää taulukkoa
+Is it possible to sort an array faster
-nopeammin kuin ajassa $O(n \log n)$?
+than in $O(n \log n)$ time?
-Osoittautuu, että tämä \emph{ei} ole mahdollista,
+It turns out that this is \emph{not} possible
-kun rajoitumme
+when we restrict ourselves to sorting algorithms
-järjestämis\-algoritmeihin,
+that are based on comparing array elements.
-jotka perustuvat taulukon alkioiden
-vertailemiseen.
-Aikavaativuuden alaraja on mahdollista todistaa
+The lower bound for the time complexity
-tarkastelemalla järjestämistä
+can be proved by examining the sorting
-prosessina, jossa jokainen kahden alkion vertailu
+as a process where each comparison of two elements
-antaa lisää tietoa taulukon sisällöstä.
+gives more information about the contents of the array.
-Prosessista muodostuu seuraavanlainen puu:
+The process creates the following tree:
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
@@ -517,47 +510,45 @@ Prosessista muodostuu seuraavanlainen puu:
 \end{tikzpicture}
 \end{center}
-Merkintä ''$x<y?$'' tarkoittaa taulukon alkioiden
+Here ''$x<y?$'' means that some elements
-$x$ ja $y$ vertailua.
+$x$ and $y$ are compared.
-Jos $x<y$, prosessi jatkaa vasemmalle,
+If $x<y$, the process continues to the left,
-ja muuten oikealle.
+and otherwise to the right.
-Prosessin tulokset ovat taulukon mahdolliset
+The results of the process are the possible
-järjestykset, joita on kaikkiaan $n!$ erilaista.
+ways to order the array, a total of $n!$ ways.
-Puun korkeuden tulee olla tämän vuoksi vähintään
+For this reason, the height of the tree
+must be at least
 \[ \log_2(n!) = \log_2(1)+\log_2(2)+\cdots+\log_2(n).\]
-Voimme arvioida tätä summaa alaspäin
+We get a lower bound for this sum
-valitsemalla summasta $n/2$
+by choosing the last $n/2$ terms and
-viimeistä termiä ja muuttamalla kunkin
+changing the value of each term to $\log_2(n/2)$.
-termin arvoksi $\log_2(n/2)$.
+This yields an estimate
-Tästä saadaan arvio
 \[ \log_2(n!) \ge (n/2) \cdot \log_2(n/2),\]
-eli puun korkeus ja sen myötä
+so the height of the tree and the minimum
-pienin mahdollinen järjestämisalgoritmin askelten
+possible number of steps in a sorting
-määrä on pahimmassa tapauksessa ainakin luokkaa $n \log n$.
+algorithm in the worst case
+is at least $n \log n$.
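The estimate can be written out step by step. The following is a sketch that assumes $n$ is even: dropping the first $n/2$ terms can only decrease the sum, and each of the remaining $n/2$ terms is at least $\log_2(n/2)$.

```latex
\[ \log_2(n!) = \sum_{i=1}^{n} \log_2(i)
   \ge \sum_{i=n/2+1}^{n} \log_2(i)
   \ge (n/2) \cdot \log_2(n/2) \]
```

For example, when $n=8$ the bound gives $4 \cdot \log_2(4) = 8$, while $\log_2(8!) = \log_2(40320) \approx 15.3$.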
-\subsubsection{Laskemisjärjestäminen}
+\subsubsection{Counting sort}
-\index{laskemisjxrjestxminen@laskemisjärjestäminen}
+\index{counting sort}
-Järjestämisen alaraja $n \log n$ ei koske algoritmeja,
+The lower bound $n \log n$ doesn't apply to
-jotka eivät perustu alkioiden vertailemiseen
+algorithms that do not compare array elements
-vaan hyödyntävät jotain muuta tietoa alkioista.
+but use some other information.
-Esimerkki tällaisesta algoritmista on
+An example of such an algorithm is
-\key{laskemisjärjestäminen}, jonka avulla
+\key{counting sort} that sorts an array in
-on mahdollista järjestää
+$O(n)$ time assuming that every element in the array
-taulukko ajassa $O(n)$ olettaen,
+is an integer in the range $0 \ldots c$, where $c$
-että jokainen taulukon alkio on
+is a small constant.
-kokonaisluku välillä $0 \ldots c$,
-missä $c$ on pieni vakio.
-Algoritmin ideana on luoda \emph{kirjanpito}, josta selviää,
+The algorithm creates a \emph{bookkeeping} array
-montako kertaa mikäkin alkio esiintyy taulukossa.
+whose indices are elements in the original array.
-Kirjanpito on taulukko, jonka indeksit ovat alkuperäisen
+The algorithm iterates through the original array
-taulukon alkioita.
+and calculates how many times each element
-Jokaisen indeksin kohdalla lukee, montako kertaa
+appears in the array.
-kyseinen alkio esiintyy alkuperäisessä taulukossa.
-Esimerkiksi taulukosta
+For example, the array
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
 \draw (0,0) grid (8,1);
@@ -581,7 +572,7 @@ Esimerkiksi taulukosta
 \node at (7.5,1.4) {$8$};
 \end{tikzpicture}
 \end{center}
-syntyy seuraava kirjanpito:
+produces the following bookkeeping array:
 \begin{center}
 \begin{tikzpicture}[scale=0.7]
 \draw (0,0) grid (9,1);
@@ -609,23 +600,23 @@ syntyy seuraava kirjanpito:
 \end{tikzpicture}
 \end{center}
-Esimerkiksi kirjanpidossa lukee indeksin 3 kohdalla 2,
+For example, the value at index 3
-koska luku 3 esiintyy kahdesti alkuperäisessä
+in the bookkeeping array is 2,
-taulukossa (indekseissä 2 ja 6).
+because the element 3 appears two times
+in the original array (indices 2 and 6).
-Kirjanpidon muodostus vie aikaa $O(n)$,
+The construction of the bookkeeping array
-koska riittää käydä taulukko läpi kerran.
+takes $O(n)$ time. After this, the sorted array
-Tämän jälkeen järjestetyn taulukon luominen
+can be created in $O(n)$ time because
-vie myös aikaa $O(n)$, koska kunkin alkion
+the number of occurrences of each element can be retrieved
-määrän saa selville suoraan kirjanpidosta.
+from the bookkeeping array.
-Niinpä laskemisjärjestämisen
+Thus, the total time complexity of counting
-kokonaisaikavaativuus on $O(n)$.
+sort is $O(n)$.
-Laskemisjärjestäminen on hyvin tehokas algoritmi,
+Counting sort is a very efficient algorithm
-mutta sen käyttäminen vaatii,
+but it can only be used when the constant $c$
-että vakio $c$ on niin pieni,
+is so small that the array elements can
-että taulukon alkioita voi käyttää
+be used as indices in the bookkeeping array.
-kirjanpidon taulukon indeksöinnissä.
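Counting sort as described above could look like the following sketch. The function name `countingSort` and its signature are assumptions made for this illustration. The bound `c` is passed in explicitly, and every element is assumed to lie in the range $0 \ldots c$.

```cpp
#include <cassert>
#include <vector>

// Sorts t in O(n + c) time, assuming every element is an integer in 0..c.
std::vector<int> countingSort(const std::vector<int>& t, int c) {
    // bookkeeping array: count[x] = number of times x appears in t
    std::vector<int> count(c + 1, 0);
    for (int x : t) {
        assert(0 <= x && x <= c);  // elements are used directly as indices
        count[x]++;
    }
    // rebuild the sorted array by writing each value count[value] times
    std::vector<int> sorted;
    sorted.reserve(t.size());
    for (int value = 0; value <= c; value++) {
        for (int k = 0; k < count[value]; k++) sorted.push_back(value);
    }
    return sorted;
}
```

The bookkeeping array has $c+1$ entries, which is why the method is only practical when $c$ is small. When $c$ is a constant, the $O(n+c)$ running time is $O(n)$, as stated above.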
 \section{Järjestäminen C++:ssa}