Compare commits

...

No commits in common. "7b0a21413d86d7e2498a02e10442b6d970068da7" and "f269ae391910742788ac0d6626df1e221281f191" have entirely different histories.

35 changed files with 25770 additions and 3 deletions

View File

@ -1,4 +1,22 @@
# cphb # Competitive Programmer's Handbook
SOI adjusted Competitive Programmer's Handbook Competitive Programmer's Handbook is a modern introduction to competitive programming.
(see https://github.com/pllk/cphb for the original) The book discusses programming tricks and algorithm design techniques relevant in competitive programming.
## CSES Problem Set
The CSES Problem Set contains a collection of competitive programming problems.
You can practice the techniques presented in the book by solving the problems.
https://cses.fi/problemset/
## License
The license of the book is Creative Commons BY-NC-SA.
## Other books
Guide to Competitive Programming is a printed book, published by Springer, based on Competitive Programmer's Handbook.
There is also a Russian edition Олимпиадное программирование (Olympiad Programming) and a Korean edition 알고리즘 트레이닝: 프로그래밍 대회 입문 가이드.
https://cses.fi/book/

BIN
book.pdf Normal file

Binary file not shown.

131
book.tex Normal file
View File

@ -0,0 +1,131 @@
\documentclass[twoside,12pt,a4paper,english]{book}
%\includeonly{chapter04,list}
\usepackage[english]{babel}
\usepackage[utf8]{inputenc}
\usepackage{listings}
\usepackage[table]{xcolor}
\usepackage{tikz}
\usepackage{multicol}
\usepackage{hyperref}
\usepackage{array}
\usepackage{microtype}
\usepackage{fouriernc}
\usepackage[T1]{fontenc}
\usepackage{graphicx}
\usepackage{framed}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{pifont}
\usepackage{ifthen}
\usepackage{makeidx}
\usepackage{enumitem}
\usepackage{titlesec}
\usepackage{skak}
\usepackage[scaled=0.95]{inconsolata}
\usetikzlibrary{patterns,snakes}
\pagestyle{plain}
\definecolor{keywords}{HTML}{44548A}
\definecolor{strings}{HTML}{00999A}
\definecolor{comments}{HTML}{990000}
\lstset{language=C++,frame=single,basicstyle=\ttfamily \small,showstringspaces=false,columns=flexible}
\lstset{
literate={ö}{{\"o}}1
{ä}{{\"a}}1
{ü}{{\"u}}1
}
\lstset{xleftmargin=20pt,xrightmargin=5pt}
\lstset{aboveskip=12pt,belowskip=8pt}
\lstset{
commentstyle=\color{comments},
keywordstyle=\color{keywords},
stringstyle=\color{strings}
}
\date{Draft \today}
\usepackage[a4paper,vmargin=30mm,hmargin=33mm,footskip=15mm]{geometry}
\title{\Huge Competitive Programmer's Handbook}
\author{\Large Antti Laaksonen}
\makeindex
\usepackage[totoc]{idxlayout}
\titleformat{\subsubsection}
{\normalfont\large\bfseries\sffamily}{\thesubsection}{1em}{}
\begin{document}
%\selectlanguage{finnish}
%\setcounter{page}{1}
%\pagenumbering{roman}
\frontmatter
\maketitle
\setcounter{tocdepth}{1}
\tableofcontents
\include{preface}
\mainmatter
\pagenumbering{arabic}
\setcounter{page}{1}
\newcommand{\key}[1] {\textbf{#1}}
\part{Basic techniques}
\include{chapter01}
\include{chapter02}
\include{chapter03}
\include{chapter04}
\include{chapter05}
\include{chapter06}
\include{chapter07}
\include{chapter08}
\include{chapter09}
\include{chapter10}
\part{Graph algorithms}
\include{chapter11}
\include{chapter12}
\include{chapter13}
\include{chapter14}
\include{chapter15}
\include{chapter16}
\include{chapter17}
\include{chapter18}
\include{chapter19}
\include{chapter20}
\part{Advanced topics}
\include{chapter21}
\include{chapter22}
\include{chapter23}
\include{chapter24}
\include{chapter25}
\include{chapter26}
\include{chapter27}
\include{chapter28}
\include{chapter29}
\include{chapter30}
\cleardoublepage
\phantomsection
\addcontentsline{toc}{chapter}{Bibliography}
\include{list}
\cleardoublepage
\printindex
\end{document}

990
chapter01.tex Normal file
View File

@ -0,0 +1,990 @@
\chapter{Introduction}
Competitive programming combines two topics:
(1) the design of algorithms and (2) the implementation of algorithms.
The \key{design of algorithms} consists of problem solving
and mathematical thinking.
Skills for analyzing problems and solving them
creatively are needed.
An algorithm for solving a problem
has to be both correct and efficient,
and the core of the problem is often
about inventing an efficient algorithm.
Theoretical knowledge of algorithms
is important to competitive programmers.
Typically, a solution to a problem is
a combination of well-known techniques and
new insights.
The techniques that appear in competitive programming
also form the basis for the scientific research
of algorithms.
The \key{implementation of algorithms} requires good
programming skills.
In competitive programming, the solutions
are graded by testing an implemented algorithm
using a set of test cases.
Thus, it is not enough that the idea of the
algorithm is correct, but the implementation also
has to be correct.
A good coding style in contests is
straightforward and concise.
Programs should be written quickly,
because there is not much time available.
Unlike in traditional software engineering,
the programs are short (usually at most a few
hundred lines of code), and they do not need to
be maintained after the contest.
\section{Programming languages}
\index{programming language}
At the moment, the most popular programming
languages used in contests are C++, Python and Java.
For example, in Google Code Jam 2017,
among the best 3,000 participants,
79 \% used C++,
16 \% used Python and
8 \% used Java \cite{goo17}.
Some participants also used several languages.
Many people think that C++ is the best choice
for a competitive programmer,
and C++ is nearly always available in
contest systems.
The benefits of using C++ are that
it is a very efficient language and
its standard library contains a
large collection
of data structures and algorithms.
On the other hand, it is good to
master several languages and understand
their strengths.
For example, if large integers are needed
in the problem,
Python can be a good choice, because it
contains built-in operations for
calculating with large integers.
Still, most problems in programming contests
are set so that
using a specific programming language
is not an unfair advantage.
All example programs in this book are written in C++,
and the standard library's
data structures and algorithms are often used.
The programs follow the C++11 standard,
which can be used in most contests nowadays.
If you cannot program in C++ yet,
now is a good time to start learning.
\subsubsection{C++ code template}
A typical C++ code template for competitive programming
looks like this:
\begin{lstlisting}
#include <bits/stdc++.h>
using namespace std;
int main() {
// solution comes here
}
\end{lstlisting}
The \texttt{\#include} line at the beginning
of the code is a feature of the \texttt{g++} compiler
that allows us to include the entire standard library.
Thus, it is not needed to separately include
libraries such as \texttt{iostream},
\texttt{vector} and \texttt{algorithm},
but rather they are available automatically.
The \texttt{using} line declares
that the classes and functions
of the standard library can be used directly
in the code.
Without the \texttt{using} line we would have
to write, for example, \texttt{std::cout},
but now it suffices to write \texttt{cout}.
The code can be compiled using the following command:
\begin{lstlisting}
g++ -std=c++11 -O2 -Wall test.cpp -o test
\end{lstlisting}
This command produces a binary file \texttt{test}
from the source code \texttt{test.cpp}.
The compiler follows the C++11 standard
(\texttt{-std=c++11}),
optimizes the code (\texttt{-O2})
and shows warnings about possible errors (\texttt{-Wall}).
\section{Input and output}
\index{input and output}
In most contests, standard streams are used for
reading input and writing output.
In C++, the standard streams are
\texttt{cin} for input and \texttt{cout} for output.
In addition, the C functions
\texttt{scanf} and \texttt{printf} can be used.
The input for the program usually consists of
numbers and strings that are separated with
spaces and newlines.
They can be read from the \texttt{cin} stream
as follows:
\begin{lstlisting}
int a, b;
string x;
cin >> a >> b >> x;
\end{lstlisting}
This kind of code always works,
assuming that there is at least one space
or newline between each element in the input.
For example, the above code can read
both of the following inputs:
\begin{lstlisting}
123 456 monkey
\end{lstlisting}
\begin{lstlisting}
123 456
monkey
\end{lstlisting}
The \texttt{cout} stream is used for output
as follows:
\begin{lstlisting}
int a = 123, b = 456;
string x = "monkey";
cout << a << " " << b << " " << x << "\n";
\end{lstlisting}
Input and output is sometimes
a bottleneck in the program.
The following lines at the beginning of the code
make input and output more efficient:
\begin{lstlisting}
ios::sync_with_stdio(0);
cin.tie(0);
\end{lstlisting}
Note that the newline \texttt{"\textbackslash n"}
works faster than \texttt{endl},
because \texttt{endl} always causes
a flush operation.
The C functions \texttt{scanf}
and \texttt{printf} are an alternative
to the C++ standard streams.
They are usually a bit faster,
but they are also more difficult to use.
The following code reads two integers from the input:
\begin{lstlisting}
int a, b;
scanf("%d %d", &a, &b);
\end{lstlisting}
The following code prints two integers:
\begin{lstlisting}
int a = 123, b = 456;
printf("%d %d\n", a, b);
\end{lstlisting}
Sometimes the program should read a whole line
from the input, possibly containing spaces.
This can be accomplished by using the
\texttt{getline} function:
\begin{lstlisting}
string s;
getline(cin, s);
\end{lstlisting}
If the amount of data is unknown, the following
loop is useful:
\begin{lstlisting}
while (cin >> x) {
// code
}
\end{lstlisting}
This loop reads elements from the input
one after another, until there is no
more data available in the input.
In some contest systems, files are used for
input and output.
An easy solution for this is to write
the code as usual using standard streams,
but add the following lines to the beginning of the code:
\begin{lstlisting}
freopen("input.txt", "r", stdin);
freopen("output.txt", "w", stdout);
\end{lstlisting}
After this, the program reads the input from the file
''input.txt'' and writes the output to the file
''output.txt''.
\section{Working with numbers}
\index{integer}
\subsubsection{Integers}
The most used integer type in competitive programming
is \texttt{int}, which is a 32-bit type with
a value range of $-2^{31} \ldots 2^{31}-1$
or about $-2 \cdot 10^9 \ldots 2 \cdot 10^9$.
If the type \texttt{int} is not enough,
the 64-bit type \texttt{long long} can be used.
It has a value range of $-2^{63} \ldots 2^{63}-1$
or about $-9 \cdot 10^{18} \ldots 9 \cdot 10^{18}$.
The following code defines a
\texttt{long long} variable:
\begin{lstlisting}
long long x = 123456789123456789LL;
\end{lstlisting}
The suffix \texttt{LL} means that the
type of the number is \texttt{long long}.
A common mistake when using the type \texttt{long long}
is that the type \texttt{int} is still used somewhere
in the code.
For example, the following code contains
a subtle error:
\begin{lstlisting}
int a = 123456789;
long long b = a*a;
cout << b << "\n"; // -1757895751
\end{lstlisting}
Even though the variable \texttt{b} is of type \texttt{long long},
both numbers in the expression \texttt{a*a}
are of type \texttt{int} and the result is
also of type \texttt{int}.
Because of this, the variable \texttt{b} will
contain a wrong result.
The problem can be solved by changing the type
of \texttt{a} to \texttt{long long} or
by changing the expression to \texttt{(long long)a*a}.
Usually contest problems are set so that the
type \texttt{long long} is enough.
Still, it is good to know that
the \texttt{g++} compiler also provides
a 128-bit type \texttt{\_\_int128\_t}
with a value range of
$-2^{127} \ldots 2^{127}-1$ or about $-10^{38} \ldots 10^{38}$.
However, this type is not available in all contest systems.
\subsubsection{Modular arithmetic}
\index{remainder}
\index{modular arithmetic}
We denote by $x \bmod m$ the remainder
when $x$ is divided by $m$.
For example, $17 \bmod 5 = 2$,
because $17 = 3 \cdot 5 + 2$.
Sometimes, the answer to a problem is a
very large number but it is enough to
output it ''modulo $m$'', i.e.,
the remainder when the answer is divided by $m$
(for example, ''modulo $10^9+7$'').
The idea is that even if the actual answer
is very large,
it suffices to use the types
\texttt{int} and \texttt{long long}.
An important property of the remainder is that
in addition, subtraction and multiplication,
the remainder can be taken before the operation:
\[
\begin{array}{rcr}
(a+b) \bmod m & = & (a \bmod m + b \bmod m) \bmod m \\
(a-b) \bmod m & = & (a \bmod m - b \bmod m) \bmod m \\
(a \cdot b) \bmod m & = & (a \bmod m \cdot b \bmod m) \bmod m
\end{array}
\]
Thus, we can take the remainder after every operation
and the numbers will never become too large.
For example, the following code calculates $n!$,
the factorial of $n$, modulo $m$:
\begin{lstlisting}
long long x = 1;
for (int i = 2; i <= n; i++) {
x = (x*i)%m;
}
cout << x%m << "\n";
\end{lstlisting}
Usually we want the remainder to always
be between $0\ldots m-1$.
However, in C++ and other languages,
the remainder of a negative number
is either zero or negative.
An easy way to make sure there
are no negative remainders is to first calculate
the remainder as usual and then add $m$
if the result is negative:
\begin{lstlisting}
x = x%m;
if (x < 0) x += m;
\end{lstlisting}
However, this is only needed when there
are subtractions in the code and the
remainder may become negative.
\subsubsection{Floating point numbers}
\index{floating point number}
The usual floating point types in
competitive programming are
the 64-bit \texttt{double}
and, as an extension in the \texttt{g++} compiler,
the 80-bit \texttt{long double}.
In most cases, \texttt{double} is enough,
but \texttt{long double} is more accurate.
The required precision of the answer
is usually given in the problem statement.
An easy way to output the answer is to use
the \texttt{printf} function
and give the number of decimal places
in the formatting string.
For example, the following code prints
the value of $x$ with 9 decimal places:
\begin{lstlisting}
printf("%.9f\n", x);
\end{lstlisting}
A difficulty when using floating point numbers
is that some numbers cannot be represented
accurately as floating point numbers,
and there will be rounding errors.
For example, the result of the following code
is surprising:
\begin{lstlisting}
double x = 0.3*3+0.1;
printf("%.20f\n", x); // 0.99999999999999988898
\end{lstlisting}
Due to a rounding error,
the value of \texttt{x} is a bit smaller than 1,
while the correct value would be 1.
It is risky to compare floating point numbers
with the \texttt{==} operator,
because it is possible that the values should be
equal but they are not because of precision errors.
A better way to compare floating point numbers
is to assume that two numbers are equal
if the difference between them is less than $\varepsilon$,
where $\varepsilon$ is a small number.
In practice, the numbers can be compared
as follows ($\varepsilon=10^{-9}$):
\begin{lstlisting}
if (abs(a-b) < 1e-9) {
// a and b are equal
}
\end{lstlisting}
Note that while floating point numbers are inaccurate,
integers up to a certain limit can still be
represented accurately.
For example, using \texttt{double},
it is possible to accurately represent all
integers whose absolute value is at most $2^{53}$.
\section{Shortening code}
Short code is ideal in competitive programming,
because programs should be written
as fast as possible.
Because of this, competitive programmers often define
shorter names for datatypes and other parts of code.
\subsubsection{Type names}
\index{tuppdef@\texttt{typedef}}
Using the command \texttt{typedef}
it is possible to give a shorter name
to a datatype.
For example, the name \texttt{long long} is long,
so we can define a shorter name \texttt{ll}:
\begin{lstlisting}
typedef long long ll;
\end{lstlisting}
After this, the code
\begin{lstlisting}
long long a = 123456789;
long long b = 987654321;
cout << a*b << "\n";
\end{lstlisting}
can be shortened as follows:
\begin{lstlisting}
ll a = 123456789;
ll b = 987654321;
cout << a*b << "\n";
\end{lstlisting}
The command \texttt{typedef}
can also be used with more complex types.
For example, the following code gives
the name \texttt{vi} for a vector of integers
and the name \texttt{pi} for a pair
that contains two integers.
\begin{lstlisting}
typedef vector<int> vi;
typedef pair<int,int> pi;
\end{lstlisting}
\subsubsection{Macros}
\index{macro}
Another way to shorten code is to define
\key{macros}.
A macro means that certain strings in
the code will be changed before the compilation.
In C++, macros are defined using the
\texttt{\#define} keyword.
For example, we can define the following macros:
\begin{lstlisting}
#define F first
#define S second
#define PB push_back
#define MP make_pair
\end{lstlisting}
After this, the code
\begin{lstlisting}
v.push_back(make_pair(y1,x1));
v.push_back(make_pair(y2,x2));
int d = v[i].first+v[i].second;
\end{lstlisting}
can be shortened as follows:
\begin{lstlisting}
v.PB(MP(y1,x1));
v.PB(MP(y2,x2));
int d = v[i].F+v[i].S;
\end{lstlisting}
A macro can also have parameters
which makes it possible to shorten loops and other
structures.
For example, we can define the following macro:
\begin{lstlisting}
#define REP(i,a,b) for (int i = a; i <= b; i++)
\end{lstlisting}
After this, the code
\begin{lstlisting}
for (int i = 1; i <= n; i++) {
search(i);
}
\end{lstlisting}
can be shortened as follows:
\begin{lstlisting}
REP(i,1,n) {
search(i);
}
\end{lstlisting}
Sometimes macros cause bugs that may be difficult
to detect. For example, consider the following macro
that calculates the square of a number:
\begin{lstlisting}
#define SQ(a) a*a
\end{lstlisting}
This macro \emph{does not} always work as expected.
For example, the code
\begin{lstlisting}
cout << SQ(3+3) << "\n";
\end{lstlisting}
corresponds to the code
\begin{lstlisting}
cout << 3+3*3+3 << "\n"; // 15
\end{lstlisting}
A better version of the macro is as follows:
\begin{lstlisting}
#define SQ(a) (a)*(a)
\end{lstlisting}
Now the code
\begin{lstlisting}
cout << SQ(3+3) << "\n";
\end{lstlisting}
corresponds to the code
\begin{lstlisting}
cout << (3+3)*(3+3) << "\n"; // 36
\end{lstlisting}
\section{Mathematics}
Mathematics plays an important role in competitive
programming, and it is not possible to become
a successful competitive programmer without
having good mathematical skills.
This section discusses some important
mathematical concepts and formulas that
are needed later in the book.
\subsubsection{Sum formulas}
Each sum of the form
\[\sum_{x=1}^n x^k = 1^k+2^k+3^k+\ldots+n^k,\]
where $k$ is a positive integer,
has a closed-form formula that is a
polynomial of degree $k+1$.
For example\footnote{\index{Faulhaber's formula}
There is even a general formula for such sums, called \key{Faulhaber's formula},
but it is too complex to be presented here.},
\[\sum_{x=1}^n x = 1+2+3+\ldots+n = \frac{n(n+1)}{2}\]
and
\[\sum_{x=1}^n x^2 = 1^2+2^2+3^2+\ldots+n^2 = \frac{n(n+1)(2n+1)}{6}.\]
An \key{arithmetic progression} is a \index{arithmetic progression}
sequence of numbers
where the difference between any two consecutive
numbers is constant.
For example,
\[3, 7, 11, 15\]
is an arithmetic progression with constant 4.
The sum of an arithmetic progression can be calculated
using the formula
\[\underbrace{a + \cdots + b}_{n \,\, \textrm{numbers}} = \frac{n(a+b)}{2}\]
where $a$ is the first number,
$b$ is the last number and
$n$ is the amount of numbers.
For example,
\[3+7+11+15=\frac{4 \cdot (3+15)}{2} = 36.\]
The formula is based on the fact
that the sum consists of $n$ numbers and
the value of each number is $(a+b)/2$ on average.
\index{geometric progression}
A \key{geometric progression} is a sequence
of numbers
where the ratio between any two consecutive
numbers is constant.
For example,
\[3,6,12,24\]
is a geometric progression with constant 2.
The sum of a geometric progression can be calculated
using the formula
\[a + ak + ak^2 + \cdots + b = \frac{bk-a}{k-1}\]
where $a$ is the first number,
$b$ is the last number and the
ratio between consecutive numbers is $k$.
For example,
\[3+6+12+24=\frac{24 \cdot 2 - 3}{2-1} = 45.\]
This formula can be derived as follows. Let
\[ S = a + ak + ak^2 + \cdots + b .\]
By multiplying both sides by $k$, we get
\[ kS = ak + ak^2 + ak^3 + \cdots + bk,\]
and solving the equation
\[ kS-S = bk-a\]
yields the formula.
A special case of a sum of a geometric progression is the formula
\[1+2+4+8+\ldots+2^{n-1}=2^n-1.\]
\index{harmonic sum}
A \key{harmonic sum} is a sum of the form
\[ \sum_{x=1}^n \frac{1}{x} = 1+\frac{1}{2}+\frac{1}{3}+\ldots+\frac{1}{n}.\]
An upper bound for a harmonic sum is $\log_2(n)+1$.
Namely, we can
modify each term $1/k$ so that $k$ becomes
the nearest power of two that does not exceed $k$.
For example, when $n=6$, we can estimate
the sum as follows:
\[ 1+\frac{1}{2}+\frac{1}{3}+\frac{1}{4}+\frac{1}{5}+\frac{1}{6} \le
1+\frac{1}{2}+\frac{1}{2}+\frac{1}{4}+\frac{1}{4}+\frac{1}{4}.\]
This upper bound consists of $\log_2(n)+1$ parts
($1$, $2 \cdot 1/2$, $4 \cdot 1/4$, etc.),
and the value of each part is at most 1.
\subsubsection{Set theory}
\index{set theory}
\index{set}
\index{intersection}
\index{union}
\index{difference}
\index{subset}
\index{universal set}
\index{complement}
A \key{set} is a collection of elements.
For example, the set
\[X=\{2,4,7\}\]
contains elements 2, 4 and 7.
The symbol $\emptyset$ denotes an empty set,
and $|S|$ denotes the size of a set $S$,
i.e., the number of elements in the set.
For example, in the above set, $|X|=3$.
If a set $S$ contains an element $x$,
we write $x \in S$,
and otherwise we write $x \notin S$.
For example, in the above set
\[4 \in X \hspace{10px}\textrm{and}\hspace{10px} 5 \notin X.\]
\begin{samepage}
New sets can be constructed using set operations:
\begin{itemize}
\item The \key{intersection} $A \cap B$ consists of elements
that are in both $A$ and $B$.
For example, if $A=\{1,2,5\}$ and $B=\{2,4\}$,
then $A \cap B = \{2\}$.
\item The \key{union} $A \cup B$ consists of elements
that are in $A$ or $B$ or both.
For example, if $A=\{3,7\}$ and $B=\{2,3,8\}$,
then $A \cup B = \{2,3,7,8\}$.
\item The \key{complement} $\bar A$ consists of elements
that are not in $A$.
The interpretation of a complement depends on
the \key{universal set}, which contains all possible elements.
For example, if $A=\{1,2,5,7\}$ and the universal set is
$\{1,2,\ldots,10\}$, then $\bar A = \{3,4,6,8,9,10\}$.
\item The \key{difference} $A \setminus B = A \cap \bar B$
consists of elements that are in $A$ but not in $B$.
Note that $B$ can contain elements that are not in $A$.
For example, if $A=\{2,3,7,8\}$ and $B=\{3,5,8\}$,
then $A \setminus B = \{2,7\}$.
\end{itemize}
\end{samepage}
If each element of $A$ also belongs to $S$,
we say that $A$ is a \key{subset} of $S$,
denoted by $A \subset S$.
A set $S$ always has $2^{|S|}$ subsets,
including the empty set.
For example, the subsets of the set $\{2,4,7\}$ are
\begin{center}
$\emptyset$,
$\{2\}$, $\{4\}$, $\{7\}$, $\{2,4\}$, $\{2,7\}$, $\{4,7\}$ and $\{2,4,7\}$.
\end{center}
Some often used sets are
$\mathbb{N}$ (natural numbers),
$\mathbb{Z}$ (integers),
$\mathbb{Q}$ (rational numbers) and
$\mathbb{R}$ (real numbers).
The set $\mathbb{N}$
can be defined in two ways, depending
on the situation:
either $\mathbb{N}=\{0,1,2,\ldots\}$
or $\mathbb{N}=\{1,2,3,...\}$.
We can also construct a set using a rule of the form
\[\{f(n) : n \in S\},\]
where $f(n)$ is some function.
This set contains all elements of the form $f(n)$,
where $n$ is an element in $S$.
For example, the set
\[X=\{2n : n \in \mathbb{Z}\}\]
contains all even integers.
\subsubsection{Logic}
\index{logic}
\index{negation}
\index{conjuction}
\index{disjunction}
\index{implication}
\index{equivalence}
The value of a logical expression is either
\key{true} (1) or \key{false} (0).
The most important logical operators are
$\lnot$ (\key{negation}),
$\land$ (\key{conjunction}),
$\lor$ (\key{disjunction}),
$\Rightarrow$ (\key{implication}) and
$\Leftrightarrow$ (\key{equivalence}).
The following table shows the meanings of these operators:
\begin{center}
\begin{tabular}{rr|rrrrrrr}
$A$ & $B$ & $\lnot A$ & $\lnot B$ & $A \land B$ & $A \lor B$ & $A \Rightarrow B$ & $A \Leftrightarrow B$ \\
\hline
0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 1 & 1 & 1 \\
\end{tabular}
\end{center}
The expression $\lnot A$ has the opposite value of $A$.
The expression $A \land B$ is true if both $A$ and $B$
are true,
and the expression $A \lor B$ is true if $A$ or $B$ or both
are true.
The expression $A \Rightarrow B$ is true
if whenever $A$ is true, also $B$ is true.
The expression $A \Leftrightarrow B$ is true
if $A$ and $B$ are both true or both false.
\index{predicate}
A \key{predicate} is an expression that is true or false
depending on its parameters.
Predicates are usually denoted by capital letters.
For example, we can define a predicate $P(x)$
that is true exactly when $x$ is a prime number.
Using this definition, $P(7)$ is true but $P(8)$ is false.
\index{quantifier}
A \key{quantifier} connects a logical expression
to the elements of a set.
The most important quantifiers are
$\forall$ (\key{for all}) and $\exists$ (\key{there is}).
For example,
\[\forall x (\exists y (y < x))\]
means that for each element $x$ in the set,
there is an element $y$ in the set
such that $y$ is smaller than $x$.
This is true in the set of integers,
but false in the set of natural numbers.
Using the notation described above,
we can express many kinds of logical propositions.
For example,
\[\forall x ((x>1 \land \lnot P(x)) \Rightarrow (\exists a (\exists b (a > 1 \land b > 1 \land x = ab))))\]
means that if a number $x$ is larger than 1
and not a prime number,
then there are numbers $a$ and $b$
that are larger than $1$ and whose product is $x$.
This proposition is true in the set of integers.
\subsubsection{Functions}
The function $\lfloor x \rfloor$ rounds the number $x$
down to an integer, and the function
$\lceil x \rceil$ rounds the number $x$
up to an integer. For example,
\[ \lfloor 3/2 \rfloor = 1 \hspace{10px} \textrm{and} \hspace{10px} \lceil 3/2 \rceil = 2.\]
The functions $\min(x_1,x_2,\ldots,x_n)$
and $\max(x_1,x_2,\ldots,x_n)$
give the smallest and largest of values
$x_1,x_2,\ldots,x_n$.
For example,
\[ \min(1,2,3)=1 \hspace{10px} \textrm{and} \hspace{10px} \max(1,2,3)=3.\]
\index{factorial}
The \key{factorial} $n!$ can be defined
\[\prod_{x=1}^n x = 1 \cdot 2 \cdot 3 \cdot \ldots \cdot n\]
or recursively
\[
\begin{array}{lcl}
0! & = & 1 \\
n! & = & n \cdot (n-1)! \\
\end{array}
\]
\index{Fibonacci number}
The \key{Fibonacci numbers}
%\footnote{Fibonacci (c. 1175--1250) was an Italian mathematician.}
arise in many situations.
They can be defined recursively as follows:
\[
\begin{array}{lcl}
f(0) & = & 0 \\
f(1) & = & 1 \\
f(n) & = & f(n-1)+f(n-2) \\
\end{array}
\]
The first Fibonacci numbers are
\[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, \ldots\]
There is also a closed-form formula
for calculating Fibonacci numbers, which is sometimes called
\index{Binet's formula} \key{Binet's formula}:
\[f(n)=\frac{(1 + \sqrt{5})^n - (1-\sqrt{5})^n}{2^n \sqrt{5}}.\]
\subsubsection{Logarithms}
\index{logarithm}
The \key{logarithm} of a number $x$
is denoted $\log_k(x)$, where $k$ is the base
of the logarithm.
According to the definition,
$\log_k(x)=a$ exactly when $k^a=x$.
A useful property of logarithms is
that $\log_k(x)$ equals the number of times
we have to divide $x$ by $k$ before we reach
the number 1.
For example, $\log_2(32)=5$
because 5 divisions by 2 are needed:
\[32 \rightarrow 16 \rightarrow 8 \rightarrow 4 \rightarrow 2 \rightarrow 1 \]
Logarithms are often used in the analysis of
algorithms, because many efficient algorithms
halve something at each step.
Hence, we can estimate the efficiency of such algorithms
using logarithms.
The logarithm of a product is
\[\log_k(ab) = \log_k(a)+\log_k(b),\]
and consequently,
\[\log_k(x^n) = n \cdot \log_k(x).\]
In addition, the logarithm of a quotient is
\[\log_k\Big(\frac{a}{b}\Big) = \log_k(a)-\log_k(b).\]
Another useful formula is
\[\log_u(x) = \frac{\log_k(x)}{\log_k(u)},\]
and using this, it is possible to calculate
logarithms to any base if there is a way to
calculate logarithms to some fixed base.
\index{natural logarithm}
The \key{natural logarithm} $\ln(x)$ of a number $x$
is a logarithm whose base is $e \approx 2.71828$.
Another property of logarithms is that
the number of digits of an integer $x$ in base $b$ is
$\lfloor \log_b(x)+1 \rfloor$.
For example, the representation of
$123$ in base $2$ is 1111011 and
$\lfloor \log_2(123)+1 \rfloor = 7$.
\section{Contests and resources}
\subsubsection{IOI}
The International Olympiad in Informatics (IOI)
is an annual programming contest for
secondary school students.
Each country is allowed to send a team of
four students to the contest.
There are usually about 300 participants
from 80 countries.
The IOI consists of two five-hour long contests.
In both contests, the participants are asked to
solve three algorithm tasks of various difficulty.
The tasks are divided into subtasks,
each of which has an assigned score.
Even if the contestants are divided into teams,
they compete as individuals.
The IOI syllabus \cite{iois} regulates the topics
that may appear in IOI tasks.
Almost all the topics in the IOI syllabus
are covered by this book.
Participants for the IOI are selected through
national contests.
Before the IOI, many regional contests are organized,
such as the Baltic Olympiad in Informatics (BOI),
the Central European Olympiad in Informatics (CEOI)
and the Asia-Pacific Informatics Olympiad (APIO).
Some countries organize online practice contests
for future IOI participants,
such as the Croatian Open Competition in Informatics \cite{coci}
and the USA Computing Olympiad \cite{usaco}.
In addition, a large collection of problems from Polish contests
is available online \cite{main}.
\subsubsection{ICPC}
The International Collegiate Programming Contest (ICPC)
is an annual programming contest for university students.
Each team in the contest consists of three students,
and unlike in the IOI, the students work together;
there is only one computer available for each team.
The ICPC consists of several stages, and finally the
best teams are invited to the World Finals.
While there are tens of thousands of participants
in the contest, there are only a small number\footnote{The exact number of final
slots varies from year to year; in 2017, there were 133 final slots.} of final slots available,
so even advancing to the finals
is a great achievement in some regions.
In each ICPC contest, the teams have five hours of time to
solve about ten algorithm problems.
A solution to a problem is accepted only if it solves
all test cases efficiently.
During the contest, competitors may view the results of other teams,
but for the last hour the scoreboard is frozen and it
is not possible to see the results of the last submissions.
The topics that may appear at the ICPC are not so well
specified as those at the IOI.
In any case, it is clear that more knowledge is needed
at the ICPC, especially more mathematical skills.
\subsubsection{Online contests}
There are also many online contests that are open for everybody.
At the moment, the most active contest site is Codeforces,
which organizes contests about weekly.
In Codeforces, participants are divided into two divisions:
beginners compete in Div2 and more experienced programmers in Div1.
Other contest sites include AtCoder, CS Academy, HackerRank and Topcoder.
Some companies organize online contests with onsite finals.
Examples of such contests are Facebook Hacker Cup,
Google Code Jam and Yandex.Algorithm.
Of course, companies also use those contests for recruiting:
performing well in a contest is a good way to prove one's skills.
\subsubsection{Books}
There are already some books (besides this book) that
focus on competitive programming and algorithmic problem solving:
\begin{itemize}
\item S. S. Skiena and M. A. Revilla:
\emph{Programming Challenges: The Programming Contest Training Manual} \cite{ski03}
\item S. Halim and F. Halim:
\emph{Competitive Programming 3: The New Lower Bound of Programming Contests} \cite{hal13}
\item K. Diks et al.: \emph{Looking for a Challenge? The Ultimate Problem Set from
the University of Warsaw Programming Competitions} \cite{dik12}
\end{itemize}
The first two books are intended for beginners,
whereas the last book contains advanced material.
Of course, general algorithm books are also suitable for
competitive programmers.
Some popular books are:
\begin{itemize}
\item T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein:
\emph{Introduction to Algorithms} \cite{cor09}
\item J. Kleinberg and É. Tardos:
\emph{Algorithm Design} \cite{kle05}
\item S. S. Skiena:
\emph{The Algorithm Design Manual} \cite{ski08}
\end{itemize}

538
chapter02.tex Normal file
View File

@ -0,0 +1,538 @@
\chapter{Time complexity}
\index{time complexity}
The efficiency of algorithms is important in competitive programming.
Usually, it is easy to design an algorithm
that solves the problem slowly,
but the real challenge is to invent a
fast algorithm.
If the algorithm is too slow, it will get only
partial points or no points at all.
The \key{time complexity} of an algorithm
estimates how much time the algorithm will use
for some input.
The idea is to represent the efficiency
as a function whose parameter is the size of the input.
By calculating the time complexity,
we can find out whether the algorithm is fast enough
without implementing it.
\section{Calculation rules}
The time complexity of an algorithm
is denoted $O(\cdots)$
where the three dots represent some
function.
Usually, the variable $n$ denotes
the input size.
For example, if the input is an array of numbers,
$n$ will be the size of the array,
and if the input is a string,
$n$ will be the length of the string.
\subsubsection*{Loops}
A common reason why an algorithm is slow is
that it contains many loops that go through the input.
The more nested loops the algorithm contains,
the slower it is.
If there are $k$ nested loops,
the time complexity is $O(n^k)$.
For example, the time complexity of the following code is $O(n)$:
\begin{lstlisting}
for (int i = 1; i <= n; i++) {
// code
}
\end{lstlisting}
And the time complexity of the following code is $O(n^2)$:
\begin{lstlisting}
for (int i = 1; i <= n; i++) {
for (int j = 1; j <= n; j++) {
// code
}
}
\end{lstlisting}
\subsubsection*{Order of magnitude}
A time complexity does not tell us the exact number
of times the code inside a loop is executed,
but it only shows the order of magnitude.
In the following examples, the code inside the loop
is executed $3n$, $n+5$ and $\lceil n/2 \rceil$ times,
but the time complexity of each code is $O(n)$.
\begin{lstlisting}
for (int i = 1; i <= 3*n; i++) {
// code
}
\end{lstlisting}
\begin{lstlisting}
for (int i = 1; i <= n+5; i++) {
// code
}
\end{lstlisting}
\begin{lstlisting}
for (int i = 1; i <= n; i += 2) {
// code
}
\end{lstlisting}
As another example,
the time complexity of the following code is $O(n^2)$:
\begin{lstlisting}
for (int i = 1; i <= n; i++) {
for (int j = i+1; j <= n; j++) {
// code
}
}
\end{lstlisting}
\subsubsection*{Phases}
If the algorithm consists of consecutive phases,
the total time complexity is the largest
time complexity of a single phase.
The reason for this is that the slowest
phase is usually the bottleneck of the code.
For example, the following code consists
of three phases with time complexities
$O(n)$, $O(n^2)$ and $O(n)$.
Thus, the total time complexity is $O(n^2)$.
\begin{lstlisting}
for (int i = 1; i <= n; i++) {
// code
}
for (int i = 1; i <= n; i++) {
for (int j = 1; j <= n; j++) {
// code
}
}
for (int i = 1; i <= n; i++) {
// code
}
\end{lstlisting}
\subsubsection*{Several variables}
Sometimes the time complexity depends on
several factors.
In this case, the time complexity formula
contains several variables.
For example, the time complexity of the
following code is $O(nm)$:
\begin{lstlisting}
for (int i = 1; i <= n; i++) {
for (int j = 1; j <= m; j++) {
// code
}
}
\end{lstlisting}
\subsubsection*{Recursion}
The time complexity of a recursive function
depends on the number of times the function is called
and the time complexity of a single call.
The total time complexity is the product of
these values.
For example, consider the following function:
\begin{lstlisting}
void f(int n) {
if (n == 1) return;
f(n-1);
}
\end{lstlisting}
The call $\texttt{f}(n)$ causes $n$ function calls,
and the time complexity of each call is $O(1)$.
Thus, the total time complexity is $O(n)$.
As another example, consider the following function:
\begin{lstlisting}
void g(int n) {
if (n == 1) return;
g(n-1);
g(n-1);
}
\end{lstlisting}
In this case each function call generates two other
calls, except for $n=1$.
Let us see what happens when $g$ is called
with parameter $n$.
The following table shows the function calls
produced by this single call:
\begin{center}
\begin{tabular}{rr}
function call & number of calls \\
\hline
$g(n)$ & 1 \\
$g(n-1)$ & 2 \\
$g(n-2)$ & 4 \\
$\cdots$ & $\cdots$ \\
$g(1)$ & $2^{n-1}$ \\
\end{tabular}
\end{center}
Based on this, the time complexity is
\[1+2+4+\cdots+2^{n-1} = 2^n-1 = O(2^n).\]
\section{Complexity classes}
\index{complexity classes}
The following list contains common time complexities
of algorithms:
\begin{description}
\item[$O(1)$]
\index{constant-time algorithm}
The running time of a \key{constant-time} algorithm
does not depend on the input size.
A typical constant-time algorithm is a direct
formula that calculates the answer.
\item[$O(\log n)$]
\index{logarithmic algorithm}
A \key{logarithmic} algorithm often halves
the input size at each step.
The running time of such an algorithm
is logarithmic, because
$\log_2 n$ equals the number of times
$n$ must be divided by 2 to get 1.
\item[$O(\sqrt n)$]
A \key{square root algorithm} is slower than
$O(\log n)$ but faster than $O(n)$.
A special property of square roots is that
$\sqrt n = n/\sqrt n$, so the square root $\sqrt n$ lies,
in some sense, in the middle of the input.
\item[$O(n)$]
\index{linear algorithm}
A \key{linear} algorithm goes through the input
a constant number of times.
This is often the best possible time complexity,
because it is usually necessary to access each
input element at least once before
reporting the answer.
\item[$O(n \log n)$]
This time complexity often indicates that the
algorithm sorts the input,
because the time complexity of efficient
sorting algorithms is $O(n \log n)$.
Another possibility is that the algorithm
uses a data structure where each operation
takes $O(\log n)$ time.
\item[$O(n^2)$]
\index{quadratic algorithm}
A \key{quadratic} algorithm often contains
two nested loops.
It is possible to go through all pairs of
the input elements in $O(n^2)$ time.
\item[$O(n^3)$]
\index{cubic algorithm}
A \key{cubic} algorithm often contains
three nested loops.
It is possible to go through all triplets of
the input elements in $O(n^3)$ time.
\item[$O(2^n)$]
This time complexity often indicates that
the algorithm iterates through all
subsets of the input elements.
For example, the subsets of $\{1,2,3\}$ are
$\emptyset$, $\{1\}$, $\{2\}$, $\{3\}$, $\{1,2\}$,
$\{1,3\}$, $\{2,3\}$ and $\{1,2,3\}$.
\item[$O(n!)$]
This time complexity often indicates that
the algorithm iterates through all
permutations of the input elements.
For example, the permutations of $\{1,2,3\}$ are
$(1,2,3)$, $(1,3,2)$, $(2,1,3)$, $(2,3,1)$,
$(3,1,2)$ and $(3,2,1)$.
\end{description}
\index{polynomial algorithm}
An algorithm is \key{polynomial}
if its time complexity is at most $O(n^k)$
where $k$ is a constant.
All the above time complexities except
$O(2^n)$ and $O(n!)$ are polynomial.
In practice, the constant $k$ is usually small,
and therefore a polynomial time complexity
roughly means that the algorithm is \emph{efficient}.
\index{NP-hard problem}
Most algorithms in this book are polynomial.
Still, there are many important problems for which
no polynomial algorithm is known, i.e.,
nobody knows how to solve them efficiently.
\key{NP-hard} problems are an important set
of problems, for which no polynomial algorithm
is known\footnote{A classic book on the topic is
M. R. Garey's and D. S. Johnson's
\emph{Computers and Intractability: A Guide to the Theory
of NP-Completeness} \cite{gar79}.}.
\section{Estimating efficiency}
By calculating the time complexity of an algorithm,
it is possible to check, before
implementing the algorithm, that it is
efficient enough for the problem.
The starting point for estimations is the fact that
a modern computer can perform some hundreds of
millions of operations in a second.
For example, assume that the time limit for
a problem is one second and the input size is $n=10^5$.
If the time complexity is $O(n^2)$,
the algorithm will perform about $(10^5)^2=10^{10}$ operations.
This should take at least some tens of seconds,
so the algorithm seems to be too slow for solving the problem.
On the other hand, given the input size,
we can try to \emph{guess}
the required time complexity of the algorithm
that solves the problem.
The following table contains some useful estimates
assuming a time limit of one second.
\begin{center}
\begin{tabular}{ll}
input size & required time complexity \\
\hline
$n \le 10$ & $O(n!)$ \\
$n \le 20$ & $O(2^n)$ \\
$n \le 500$ & $O(n^3)$ \\
$n \le 5000$ & $O(n^2)$ \\
$n \le 10^6$ & $O(n \log n)$ or $O(n)$ \\
$n$ is large & $O(1)$ or $O(\log n)$ \\
\end{tabular}
\end{center}
For example, if the input size is $n=10^5$,
it is probably expected that the time
complexity of the algorithm is $O(n)$ or $O(n \log n)$.
This information makes it easier to design the algorithm,
because it rules out approaches that would yield
an algorithm with a worse time complexity.
\index{constant factor}
Still, it is important to remember that a
time complexity is only an estimate of efficiency,
because it hides the \emph{constant factors}.
For example, an algorithm that runs in $O(n)$ time
may perform $n/2$ or $5n$ operations.
This has an important effect on the actual
running time of the algorithm.
\section{Maximum subarray sum}
\index{maximum subarray sum}
There are often several possible algorithms
for solving a problem such that their
time complexities are different.
This section discusses a classic problem that
has a straightforward $O(n^3)$ solution.
However, by designing a better algorithm, it
is possible to solve the problem in $O(n^2)$
time and even in $O(n)$ time.
Given an array of $n$ numbers,
our task is to calculate the
\key{maximum subarray sum}, i.e.,
the largest possible sum of
a sequence of consecutive values
in the array\footnote{J. Bentley's
book \emph{Programming Pearls} \cite{ben86} made the problem popular.}.
The problem is interesting when there may be
negative values in the array.
For example, in the array
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$-1$};
\node at (1.5,0.5) {$2$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$-3$};
\node at (4.5,0.5) {$5$};
\node at (5.5,0.5) {$2$};
\node at (6.5,0.5) {$-5$};
\node at (7.5,0.5) {$2$};
\end{tikzpicture}
\end{center}
\begin{samepage}
the following subarray produces the maximum sum $10$:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (1,0) rectangle (6,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$-1$};
\node at (1.5,0.5) {$2$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$-3$};
\node at (4.5,0.5) {$5$};
\node at (5.5,0.5) {$2$};
\node at (6.5,0.5) {$-5$};
\node at (7.5,0.5) {$2$};
\end{tikzpicture}
\end{center}
\end{samepage}
We assume that an empty subarray is allowed,
so the maximum subarray sum is always at least $0$.
\subsubsection{Algorithm 1}
A straightforward way to solve the problem
is to go through all possible subarrays,
calculate the sum of values in each subarray and maintain
the maximum sum.
The following code implements this algorithm:
\begin{lstlisting}
int best = 0;
for (int a = 0; a < n; a++) {
for (int b = a; b < n; b++) {
int sum = 0;
for (int k = a; k <= b; k++) {
sum += array[k];
}
best = max(best,sum);
}
}
cout << best << "\n";
\end{lstlisting}
The variables \texttt{a} and \texttt{b} fix the first and
last index of the subarray,
and the sum of values is calculated to the variable \texttt{sum}.
The variable \texttt{best} contains the maximum sum found during the search.
The time complexity of the algorithm is $O(n^3)$,
because it consists of three nested loops
that go through the input.
\subsubsection{Algorithm 2}
It is easy to make Algorithm 1 more efficient
by removing one loop from it.
This is possible by calculating the sum at the same
time when the right end of the subarray moves.
The result is the following code:
\begin{lstlisting}
int best = 0;
for (int a = 0; a < n; a++) {
int sum = 0;
for (int b = a; b < n; b++) {
sum += array[b];
best = max(best,sum);
}
}
cout << best << "\n";
\end{lstlisting}
After this change, the time complexity is $O(n^2)$.
\subsubsection{Algorithm 3}
Surprisingly, it is possible to solve the problem
in $O(n)$ time\footnote{In \cite{ben86}, this linear-time algorithm
is attributed to J. B. Kadane, and the algorithm is sometimes
called \index{Kadane's algorithm} \key{Kadane's algorithm}.}, which means
that just one loop is enough.
The idea is to calculate, for each array position,
the maximum sum of a subarray that ends at that position.
After this, the answer for the problem is the
maximum of those sums.
Consider the subproblem of finding the maximum-sum subarray
that ends at position $k$.
There are two possibilities:
\begin{enumerate}
\item The subarray only contains the element at position $k$.
\item The subarray consists of a subarray that ends
at position $k-1$, followed by the element at position $k$.
\end{enumerate}
In the latter case, since we want to
find a subarray with maximum sum,
the subarray that ends at position $k-1$
should also have the maximum sum.
Thus, we can solve the problem efficiently
by calculating the maximum subarray sum
for each ending position from left to right.
The following code implements the algorithm:
\begin{lstlisting}
int best = 0, sum = 0;
for (int k = 0; k < n; k++) {
sum = max(array[k],sum+array[k]);
best = max(best,sum);
}
cout << best << "\n";
\end{lstlisting}
The algorithm only contains one loop
that goes through the input,
so the time complexity is $O(n)$.
This is also the best possible time complexity,
because any algorithm for the problem
has to examine all array elements at least once.
\subsubsection{Efficiency comparison}
It is interesting to study how efficient
algorithms are in practice.
The following table shows the running times
of the above algorithms for different
values of $n$ on a modern computer.
In each test, the input was generated randomly.
The time needed for reading the input was not
measured.
\begin{center}
\begin{tabular}{rrrr}
array size $n$ & Algorithm 1 & Algorithm 2 & Algorithm 3 \\
\hline
$10^2$ & $0.0$ s & $0.0$ s & $0.0$ s \\
$10^3$ & $0.1$ s & $0.0$ s & $0.0$ s \\
$10^4$ & > $10.0$ s & $0.1$ s & $0.0$ s \\
$10^5$ & > $10.0$ s & $5.3$ s & $0.0$ s \\
$10^6$ & > $10.0$ s & > $10.0$ s & $0.0$ s \\
$10^7$ & > $10.0$ s & > $10.0$ s & $0.0$ s \\
\end{tabular}
\end{center}
The comparison shows that all algorithms
are efficient when the input size is small,
but larger inputs bring out remarkable
differences in the running times of the algorithms.
Algorithm 1 becomes slow
when $n=10^4$, and Algorithm 2
becomes slow when $n=10^5$.
Only Algorithm 3 is able to process
even the largest inputs instantly.

863
chapter03.tex Normal file
View File

@ -0,0 +1,863 @@
\chapter{Sorting}
\index{sorting}
\key{Sorting}
is a fundamental algorithm design problem.
Many efficient algorithms
use sorting as a subroutine,
because it is often easier to process
data if the elements are in a sorted order.
For example, the problem ''does an array contain
two equal elements?'' is easy to solve using sorting.
If the array contains two equal elements,
they will be next to each other after sorting,
so it is easy to find them.
Also, the problem ''what is the most frequent element
in an array?'' can be solved similarly.
There are many algorithms for sorting, and they are
also good examples of how to apply
different algorithm design techniques.
The efficient general sorting algorithms
work in $O(n \log n)$ time,
and many algorithms that use sorting
as a subroutine also
have this time complexity.
\section{Sorting theory}
The basic problem in sorting is as follows:
\begin{framed}
\noindent
Given an array that contains $n$ elements,
your task is to sort the elements
in increasing order.
\end{framed}
\noindent
For example, the array
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$8$};
\node at (3.5,0.5) {$2$};
\node at (4.5,0.5) {$9$};
\node at (5.5,0.5) {$2$};
\node at (6.5,0.5) {$5$};
\node at (7.5,0.5) {$6$};
\end{tikzpicture}
\end{center}
will be as follows after sorting:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$2$};
\node at (2.5,0.5) {$2$};
\node at (3.5,0.5) {$3$};
\node at (4.5,0.5) {$5$};
\node at (5.5,0.5) {$6$};
\node at (6.5,0.5) {$8$};
\node at (7.5,0.5) {$9$};
\end{tikzpicture}
\end{center}
\subsubsection{$O(n^2)$ algorithms}
\index{bubble sort}
Simple algorithms for sorting an array
work in $O(n^2)$ time.
Such algorithms are short and usually
consist of two nested loops.
A famous $O(n^2)$ time sorting algorithm
is \key{bubble sort} where the elements
''bubble'' in the array according to their values.
Bubble sort consists of $n$ rounds.
On each round, the algorithm iterates through
the elements of the array.
Whenever two consecutive elements are found
that are not in correct order,
the algorithm swaps them.
The algorithm can be implemented as follows:
\begin{lstlisting}
for (int i = 0; i < n; i++) {
for (int j = 0; j < n-1; j++) {
if (array[j] > array[j+1]) {
swap(array[j],array[j+1]);
}
}
}
\end{lstlisting}
After the first round of the algorithm,
the largest element will be in the correct position,
and in general, after $k$ rounds, the $k$ largest
elements will be in the correct positions.
Thus, after $n$ rounds, the whole array
will be sorted.
For example, in the array
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$8$};
\node at (3.5,0.5) {$2$};
\node at (4.5,0.5) {$9$};
\node at (5.5,0.5) {$2$};
\node at (6.5,0.5) {$5$};
\node at (7.5,0.5) {$6$};
\end{tikzpicture}
\end{center}
\noindent
the first round of bubble sort swaps elements
as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$2$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$9$};
\node at (5.5,0.5) {$2$};
\node at (6.5,0.5) {$5$};
\node at (7.5,0.5) {$6$};
\draw[thick,<->] (3.5,-0.25) .. controls (3.25,-1.00) and (2.75,-1.00) .. (2.5,-0.25);
\end{tikzpicture}
\end{center}
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$2$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$2$};
\node at (5.5,0.5) {$9$};
\node at (6.5,0.5) {$5$};
\node at (7.5,0.5) {$6$};
\draw[thick,<->] (5.5,-0.25) .. controls (5.25,-1.00) and (4.75,-1.00) .. (4.5,-0.25);
\end{tikzpicture}
\end{center}
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$2$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$2$};
\node at (5.5,0.5) {$5$};
\node at (6.5,0.5) {$9$};
\node at (7.5,0.5) {$6$};
\draw[thick,<->] (6.5,-0.25) .. controls (6.25,-1.00) and (5.75,-1.00) .. (5.5,-0.25);
\end{tikzpicture}
\end{center}
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$2$};
\node at (3.5,0.5) {$8$};
\node at (4.5,0.5) {$2$};
\node at (5.5,0.5) {$5$};
\node at (6.5,0.5) {$6$};
\node at (7.5,0.5) {$9$};
\draw[thick,<->] (7.5,-0.25) .. controls (7.25,-1.00) and (6.75,-1.00) .. (6.5,-0.25);
\end{tikzpicture}
\end{center}
\subsubsection{Inversions}
\index{inversion}
Bubble sort is an example of a sorting
algorithm that always swaps \emph{consecutive}
elements in the array.
It turns out that the time complexity
of such an algorithm is \emph{always}
at least $O(n^2)$, because in the worst case,
$O(n^2)$ swaps are required for sorting the array.
A useful concept when analyzing sorting
algorithms is an \key{inversion}:
a pair of array elements
$(\texttt{array}[a],\texttt{array}[b])$ such that
$a<b$ and $\texttt{array}[a]>\texttt{array}[b]$,
i.e., the elements are in the wrong order.
For example, the array
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$2$};
\node at (2.5,0.5) {$2$};
\node at (3.5,0.5) {$6$};
\node at (4.5,0.5) {$3$};
\node at (5.5,0.5) {$5$};
\node at (6.5,0.5) {$9$};
\node at (7.5,0.5) {$8$};
\end{tikzpicture}
\end{center}
has three inversions: $(6,3)$, $(6,5)$ and $(9,8)$.
The number of inversions indicates
how much work is needed to sort the array.
An array is completely sorted when
there are no inversions.
On the other hand, if the array elements
are in the reverse order,
the number of inversions is the largest possible:
\[1+2+\cdots+(n-1)=\frac{n(n-1)}{2} = O(n^2)\]
Swapping a pair of consecutive elements that are
in the wrong order removes exactly one inversion
from the array.
Hence, if a sorting algorithm can only
swap consecutive elements, each swap removes
at most one inversion, and the time complexity
of the algorithm is at least $O(n^2)$.
\subsubsection{$O(n \log n)$ algorithms}
\index{merge sort}
It is possible to sort an array efficiently
in $O(n \log n)$ time using algorithms
that are not limited to swapping consecutive elements.
One such algorithm is \key{merge sort}\footnote{According to \cite{knu983},
merge sort was invented by J. von Neumann in 1945.},
which is based on recursion.
Merge sort sorts a subarray \texttt{array}$[a \ldots b]$ as follows:
\begin{enumerate}
\item If $a=b$, do not do anything, because the subarray is already sorted.
\item Calculate the position of the middle element: $k=\lfloor (a+b)/2 \rfloor$.
\item Recursively sort the subarray \texttt{array}$[a \ldots k]$.
\item Recursively sort the subarray \texttt{array}$[k+1 \ldots b]$.
\item \emph{Merge} the sorted subarrays \texttt{array}$[a \ldots k]$ and
\texttt{array}$[k+1 \ldots b]$
into a sorted subarray \texttt{array}$[a \ldots b]$.
\end{enumerate}
Merge sort is an efficient algorithm, because it
halves the size of the subarray at each step.
The recursion consists of $O(\log n)$ levels,
and processing each level takes $O(n)$ time.
Merging the subarrays \texttt{array}$[a \ldots k]$ and \texttt{array}$[k+1 \ldots b]$
is possible in linear time, because they are already sorted.
For example, consider sorting the following array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$6$};
\node at (3.5,0.5) {$2$};
\node at (4.5,0.5) {$8$};
\node at (5.5,0.5) {$2$};
\node at (6.5,0.5) {$5$};
\node at (7.5,0.5) {$9$};
\end{tikzpicture}
\end{center}
The array will be divided into two subarrays
as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (4,1);
\draw (5,0) grid (9,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$6$};
\node at (3.5,0.5) {$2$};
\node at (5.5,0.5) {$8$};
\node at (6.5,0.5) {$2$};
\node at (7.5,0.5) {$5$};
\node at (8.5,0.5) {$9$};
\end{tikzpicture}
\end{center}
Then, the subarrays will be sorted recursively
as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (4,1);
\draw (5,0) grid (9,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$2$};
\node at (2.5,0.5) {$3$};
\node at (3.5,0.5) {$6$};
\node at (5.5,0.5) {$2$};
\node at (6.5,0.5) {$5$};
\node at (7.5,0.5) {$8$};
\node at (8.5,0.5) {$9$};
\end{tikzpicture}
\end{center}
Finally, the algorithm merges the sorted
subarrays and creates the final sorted array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$2$};
\node at (2.5,0.5) {$2$};
\node at (3.5,0.5) {$3$};
\node at (4.5,0.5) {$5$};
\node at (5.5,0.5) {$6$};
\node at (6.5,0.5) {$8$};
\node at (7.5,0.5) {$9$};
\end{tikzpicture}
\end{center}
\subsubsection{Sorting lower bound}
Is it possible to sort an array faster
than in $O(n \log n)$ time?
It turns out that this is \emph{not} possible
when we restrict ourselves to sorting algorithms
that are based on comparing array elements.
The lower bound for the time complexity
can be proved by considering sorting
as a process where each comparison of two elements
gives more information about the contents of the array.
The process creates the following tree:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) rectangle (3,1);
\node at (1.5,0.5) {$x < y?$};
\draw[thick,->] (1.5,0) -- (-2.5,-1.5);
\draw[thick,->] (1.5,0) -- (5.5,-1.5);
\draw (-4,-2.5) rectangle (-1,-1.5);
\draw (4,-2.5) rectangle (7,-1.5);
\node at (-2.5,-2) {$x < y?$};
\node at (5.5,-2) {$x < y?$};
\draw[thick,->] (-2.5,-2.5) -- (-4.5,-4);
\draw[thick,->] (-2.5,-2.5) -- (-0.5,-4);
\draw[thick,->] (5.5,-2.5) -- (3.5,-4);
\draw[thick,->] (5.5,-2.5) -- (7.5,-4);
\draw (-6,-5) rectangle (-3,-4);
\draw (-2,-5) rectangle (1,-4);
\draw (2,-5) rectangle (5,-4);
\draw (6,-5) rectangle (9,-4);
\node at (-4.5,-4.5) {$x < y?$};
\node at (-0.5,-4.5) {$x < y?$};
\node at (3.5,-4.5) {$x < y?$};
\node at (7.5,-4.5) {$x < y?$};
\draw[thick,->] (-4.5,-5) -- (-5.5,-6);
\draw[thick,->] (-4.5,-5) -- (-3.5,-6);
\draw[thick,->] (-0.5,-5) -- (0.5,-6);
\draw[thick,->] (-0.5,-5) -- (-1.5,-6);
\draw[thick,->] (3.5,-5) -- (2.5,-6);
\draw[thick,->] (3.5,-5) -- (4.5,-6);
\draw[thick,->] (7.5,-5) -- (6.5,-6);
\draw[thick,->] (7.5,-5) -- (8.5,-6);
\end{tikzpicture}
\end{center}
Here ''$x<y?$'' means that some elements
$x$ and $y$ are compared.
If $x<y$, the process continues to the left,
and otherwise to the right.
The results of the process are the possible
ways to sort the array, a total of $n!$ ways.
For this reason, the height of the tree
must be at least
\[ \log_2(n!) = \log_2(1)+\log_2(2)+\cdots+\log_2(n).\]
We get a lower bound for this sum
by choosing the last $n/2$ elements and
changing the value of each element to $\log_2(n/2)$.
This yields an estimate
\[ \log_2(n!) \ge (n/2) \cdot \log_2(n/2),\]
so the height of the tree and the minimum
possible number of steps in a sorting
algorithm in the worst case
is at least $n \log n$.
\subsubsection{Counting sort}
\index{counting sort}
The lower bound $n \log n$ does not apply to
algorithms that do not compare array elements
but use some other information.
An example of such an algorithm is
\key{counting sort} that sorts an array in
$O(n)$ time assuming that every element in the array
is an integer between $0 \ldots c$ and $c=O(n)$.
The algorithm creates a \emph{bookkeeping} array,
whose indices are elements of the original array.
The algorithm iterates through the original array
and calculates how many times each element
appears in the array.
\newpage
For example, the array
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$6$};
\node at (3.5,0.5) {$9$};
\node at (4.5,0.5) {$9$};
\node at (5.5,0.5) {$3$};
\node at (6.5,0.5) {$5$};
\node at (7.5,0.5) {$9$};
\end{tikzpicture}
\end{center}
corresponds to the following bookkeeping array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (9,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$0$};
\node at (2.5,0.5) {$2$};
\node at (3.5,0.5) {$0$};
\node at (4.5,0.5) {$1$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$0$};
\node at (7.5,0.5) {$0$};
\node at (8.5,0.5) {$3$};
\footnotesize
\node at (0.5,1.5) {$1$};
\node at (1.5,1.5) {$2$};
\node at (2.5,1.5) {$3$};
\node at (3.5,1.5) {$4$};
\node at (4.5,1.5) {$5$};
\node at (5.5,1.5) {$6$};
\node at (6.5,1.5) {$7$};
\node at (7.5,1.5) {$8$};
\node at (8.5,1.5) {$9$};
\end{tikzpicture}
\end{center}
For example, the value at position 3
in the bookkeeping array is 2,
because the element 3 appears 2 times
in the original array.
Construction of the bookkeeping array
takes $O(n)$ time. After this, the sorted array
can be created in $O(n)$ time because
the number of occurrences of each element can be retrieved
from the bookkeeping array.
Thus, the total time complexity of counting
sort is $O(n)$.
Counting sort is a very efficient algorithm
but it can only be used when the constant $c$
is small enough, so that the array elements can
be used as indices in the bookkeeping array.
\section{Sorting in C++}
\index{sort@\texttt{sort}}
It is almost never a good idea to use
a home-made sorting algorithm
in a contest, because there are good
implementations available in programming languages.
For example, the C++ standard library contains
the function \texttt{sort} that can be easily used for
sorting arrays and other data structures.
There are many benefits in using a library function.
First, it saves time because there is no need to
implement the function.
Second, the library implementation is
certainly correct and efficient: it is not probable
that a home-made sorting function would be better.
In this section we will see how to use the
C++ \texttt{sort} function.
The following code sorts
a vector in increasing order:
\begin{lstlisting}
vector<int> v = {4,2,5,3,5,8,3};
sort(v.begin(),v.end());
\end{lstlisting}
After the sorting, the contents of the
vector will be
$[2,3,3,4,5,5,8]$.
The default sorting order is increasing,
but a reverse order is possible as follows:
\begin{lstlisting}
sort(v.rbegin(),v.rend());
\end{lstlisting}
An ordinary array can be sorted as follows:
\begin{lstlisting}
int n = 7; // array size
int a[] = {4,2,5,3,5,8,3};
sort(a,a+n);
\end{lstlisting}
\newpage
The following code sorts the string \texttt{s}:
\begin{lstlisting}
string s = "monkey";
sort(s.begin(), s.end());
\end{lstlisting}
Sorting a string means that the characters
of the string are sorted.
For example, the string ''monkey'' becomes ''ekmnoy''.
\subsubsection{Comparison operators}
\index{comparison operator}
The function \texttt{sort} requires that
a \key{comparison operator} is defined for the data type
of the elements to be sorted.
When sorting, this operator will be used
whenever it is necessary to find out the order of two elements.
Most C++ data types have a built-in comparison operator,
and elements of those types can be sorted automatically.
For example, numbers are sorted according to their values
and strings are sorted in alphabetical order.
\index{pair@\texttt{pair}}
Pairs (\texttt{pair}) are sorted primarily according to their
first elements (\texttt{first}).
However, if the first elements of two pairs are equal,
they are sorted according to their second elements (\texttt{second}):
\begin{lstlisting}
vector<pair<int,int>> v;
v.push_back({1,5});
v.push_back({2,3});
v.push_back({1,2});
sort(v.begin(), v.end());
\end{lstlisting}
After this, the order of the pairs is
$(1,2)$, $(1,5)$ and $(2,3)$.
\index{tuple@\texttt{tuple}}
In a similar way, tuples (\texttt{tuple})
are sorted primarily by the first element,
secondarily by the second element, etc.\footnote{Note that in some older compilers,
the function \texttt{make\_tuple} has to be used to create a tuple instead of
braces (for example, \texttt{make\_tuple(2,1,4)} instead of \texttt{\{2,1,4\}}).}:
\begin{lstlisting}
vector<tuple<int,int,int>> v;
v.push_back({2,1,4});
v.push_back({1,5,3});
v.push_back({2,1,3});
sort(v.begin(), v.end());
\end{lstlisting}
After this, the order of the tuples is
$(1,5,3)$, $(2,1,3)$ and $(2,1,4)$.
\subsubsection{User-defined structs}
User-defined structs do not have a comparison
operator automatically.
The operator should be defined inside
the struct as a function
\texttt{operator<},
whose parameter is another element of the same type.
The operator should return \texttt{true}
if the element is smaller than the parameter,
and \texttt{false} otherwise.
For example, the following struct \texttt{P}
contains the x and y coordinates of a point.
The comparison operator is defined so that
the points are sorted primarily by the x coordinate
and secondarily by the y coordinate.
\begin{lstlisting}
struct P {
int x, y;
bool operator<(const P &p) {
if (x != p.x) return x < p.x;
else return y < p.y;
}
};
\end{lstlisting}
\subsubsection{Comparison functions}
\index{comparison function}
It is also possible to give an external
\key{comparison function} to the \texttt{sort} function
as a callback function.
For example, the following comparison function \texttt{comp}
sorts strings primarily by length and secondarily
by alphabetical order:
\begin{lstlisting}
bool comp(string a, string b) {
if (a.size() != b.size()) return a.size() < b.size();
return a < b;
}
\end{lstlisting}
Now a vector of strings can be sorted as follows:
\begin{lstlisting}
sort(v.begin(), v.end(), comp);
\end{lstlisting}
\section{Binary search}
\index{binary search}
A general method for searching for an element
in an array is to use a \texttt{for} loop
that iterates through the elements of the array.
For example, the following code searches for
an element $x$ in an array:
\begin{lstlisting}
for (int i = 0; i < n; i++) {
if (array[i] == x) {
// x found at index i
}
}
\end{lstlisting}
The time complexity of this approach is $O(n)$,
because in the worst case, it is necessary to check
all elements of the array.
If the order of the elements is arbitrary,
this is also the best possible approach, because
there is no additional information available where
in the array we should search for the element $x$.
However, if the array is \emph{sorted},
the situation is different.
In this case it is possible to perform the
search much faster, because the order of the
elements in the array guides the search.
The following \key{binary search} algorithm
efficiently searches for an element in a sorted array
in $O(\log n)$ time.
\subsubsection{Method 1}
The usual way to implement binary search
resembles looking for a word in a dictionary.
The search maintains an active region in the array,
which initially contains all array elements.
Then, a number of steps is performed,
each of which halves the size of the region.
At each step, the search checks the middle element
of the active region.
If the middle element is the target element,
the search terminates.
Otherwise, the search recursively continues
to the left or right half of the region,
depending on the value of the middle element.
The above idea can be implemented as follows:
\begin{lstlisting}
int a = 0, b = n-1;
while (a <= b) {
int k = (a+b)/2;
if (array[k] == x) {
// x found at index k
}
if (array[k] > x) b = k-1;
else a = k+1;
}
\end{lstlisting}
In this implementation, the active region is $a \ldots b$,
and initially the region is $0 \ldots n-1$.
The algorithm halves the size of the region at each step,
so the time complexity is $O(\log n)$.
\subsubsection{Method 2}
An alternative method to implement binary search
is based on an efficient way to iterate through
the elements of the array.
The idea is to make jumps and slow the speed
when we get closer to the target element.
The search goes through the array from left to
right, and the initial jump length is $n/2$.
At each step, the jump length will be halved:
first $n/4$, then $n/8$, $n/16$, etc., until
finally the length is 1.
After the jumps, either the target element has
been found or we know that it does not appear in the array.
The following code implements the above idea:
\begin{lstlisting}
int k = 0;
for (int b = n/2; b >= 1; b /= 2) {
while (k+b < n && array[k+b] <= x) k += b;
}
if (array[k] == x) {
// x found at index k
}
\end{lstlisting}
During the search, the variable $b$
contains the current jump length.
The time complexity of the algorithm is $O(\log n)$,
because the code in the \texttt{while} loop
is performed at most twice for each jump length.
\subsubsection{C++ functions}
The C++ standard library contains the following functions
that are based on binary search and work in logarithmic time:
\begin{itemize}
\item \texttt{lower\_bound} returns a pointer to the
first array element whose value is at least $x$.
\item \texttt{upper\_bound} returns a pointer to the
first array element whose value is larger than $x$.
\item \texttt{equal\_range} returns both above pointers.
\end{itemize}
The functions assume that the array is sorted.
If there is no such element, the pointer points to
the element after the last array element.
For example, the following code finds out whether
an array contains an element with value $x$:
\begin{lstlisting}
auto k = lower_bound(array,array+n,x)-array;
if (k < n && array[k] == x) {
// x found at index k
}
\end{lstlisting}
Then, the following code counts the number of elements
whose value is $x$:
\begin{lstlisting}
auto a = lower_bound(array, array+n, x);
auto b = upper_bound(array, array+n, x);
cout << b-a << "\n";
\end{lstlisting}
Using \texttt{equal\_range}, the code becomes shorter:
\begin{lstlisting}
auto r = equal_range(array, array+n, x);
cout << r.second-r.first << "\n";
\end{lstlisting}
\subsubsection{Finding the smallest solution}
An important use for binary search is
to find the position where the value of a \emph{function} changes.
Suppose that we wish to find the smallest value $k$
that is a valid solution for a problem.
We are given a function $\texttt{ok}(x)$
that returns \texttt{true} if $x$ is a valid solution
and \texttt{false} otherwise.
In addition, we know that $\texttt{ok}(x)$ is \texttt{false}
when $x<k$ and \texttt{true} when $x \ge k$.
The situation looks as follows:
\begin{center}
\begin{tabular}{r|rrrrrrrr}
$x$ & 0 & 1 & $\cdots$ & $k-1$ & $k$ & $k+1$ & $\cdots$ \\
\hline
$\texttt{ok}(x)$ & \texttt{false} & \texttt{false}
& $\cdots$ & \texttt{false} & \texttt{true} & \texttt{true} & $\cdots$ \\
\end{tabular}
\end{center}
\noindent
Now, the value of $k$ can be found using binary search:
\begin{lstlisting}
int x = -1;
for (int b = z; b >= 1; b /= 2) {
while (!ok(x+b)) x += b;
}
int k = x+1;
\end{lstlisting}
The search finds the largest value of $x$ for which
$\texttt{ok}(x)$ is \texttt{false}.
Thus, the next value $k=x+1$
is the smallest possible value for which
$\texttt{ok}(k)$ is \texttt{true}.
The initial jump length $z$ has to be
large enough, for example some value
for which we know beforehand that $\texttt{ok}(z)$ is \texttt{true}.
The algorithm calls the function \texttt{ok}
$O(\log z)$ times, so the total time complexity
depends on the function \texttt{ok}.
For example, if the function works in $O(n)$ time,
the total time complexity is $O(n \log z)$.
\subsubsection{Finding the maximum value}
Binary search can also be used to find
the maximum value for a function that is
first increasing and then decreasing.
Our task is to find a position $k$ such that
\begin{itemize}
\item
$f(x)<f(x+1)$ when $x<k$, and
\item
$f(x)>f(x+1)$ when $x \ge k$.
\end{itemize}
The idea is to use binary search
for finding the largest value of $x$
for which $f(x)<f(x+1)$.
This implies that $k=x+1$
because $f(x+1)>f(x+2)$.
The following code implements the search:
\begin{lstlisting}
int x = -1;
for (int b = z; b >= 1; b /= 2) {
while (f(x+b) < f(x+b+1)) x += b;
}
int k = x+1;
\end{lstlisting}
Note that unlike in the ordinary binary search,
here it is not allowed that consecutive values
of the function are equal.
In this case it would not be possible to know
how to continue the search.

794
chapter04.tex Normal file
View File

@ -0,0 +1,794 @@
\chapter{Data structures}
\index{data structure}
A \key{data structure} is a way to store
data in the memory of a computer.
It is important to choose an appropriate
data structure for a problem,
because each data structure has its own
advantages and disadvantages.
The crucial question is: which operations
are efficient in the chosen data structure?
This chapter introduces the most important
data structures in the C++ standard library.
It is a good idea to use the standard library
whenever possible,
because it will save a lot of time.
Later in the book we will learn about more sophisticated
data structures that are not available
in the standard library.
\section{Dynamic arrays}
\index{dynamic array}
\index{vector}
A \key{dynamic array} is an array whose
size can be changed during the execution
of the program.
The most popular dynamic array in C++ is
the \texttt{vector} structure,
which can be used almost like an ordinary array.
The following code creates an empty vector and
adds three elements to it:
\begin{lstlisting}
vector<int> v;
v.push_back(3); // [3]
v.push_back(2); // [3,2]
v.push_back(5); // [3,2,5]
\end{lstlisting}
After this, the elements can be accessed like in an ordinary array:
\begin{lstlisting}
cout << v[0] << "\n"; // 3
cout << v[1] << "\n"; // 2
cout << v[2] << "\n"; // 5
\end{lstlisting}
The function \texttt{size} returns the number of elements in the vector.
The following code iterates through
the vector and prints all elements in it:
\begin{lstlisting}
for (int i = 0; i < v.size(); i++) {
cout << v[i] << "\n";
}
\end{lstlisting}
\begin{samepage}
A shorter way to iterate through a vector is as follows:
\begin{lstlisting}
for (auto x : v) {
cout << x << "\n";
}
\end{lstlisting}
\end{samepage}
The function \texttt{back} returns the last element
in the vector, and
the function \texttt{pop\_back} removes the last element:
\begin{lstlisting}
vector<int> v;
v.push_back(5);
v.push_back(2);
cout << v.back() << "\n"; // 2
v.pop_back();
cout << v.back() << "\n"; // 5
\end{lstlisting}
The following code creates a vector with five elements:
\begin{lstlisting}
vector<int> v = {2,4,2,5,1};
\end{lstlisting}
Another way to create a vector is to give the number
of elements and the initial value for each element:
\begin{lstlisting}
// size 10, initial value 0
vector<int> v(10);
\end{lstlisting}
\begin{lstlisting}
// size 10, initial value 5
vector<int> v(10, 5);
\end{lstlisting}
The internal implementation of a vector
uses an ordinary array.
If the size of the vector increases and
the array becomes too small,
a new array is allocated and all the
elements are moved to the new array.
However, this does not happen often and the
average time complexity of
\texttt{push\_back} is $O(1)$.
\index{string}
The \texttt{string} structure
is also a dynamic array that can be used almost like a vector.
In addition, there is special syntax for strings
that is not available in other data structures.
Strings can be combined using the \texttt{+} symbol.
The function $\texttt{substr}(k,x)$ returns the substring
that begins at position $k$ and has length $x$,
and the function $\texttt{find}(\texttt{t})$ finds the position
of the first occurrence of a substring \texttt{t}.
The following code presents some string operations:
\begin{lstlisting}
string a = "hatti";
string b = a+a;
cout << b << "\n"; // hattihatti
b[5] = 'v';
cout << b << "\n"; // hattivatti
string c = b.substr(3,4);
cout << c << "\n"; // tiva
\end{lstlisting}
\section{Set structures}
\index{set}
A \key{set} is a data structure that
maintains a collection of elements.
The basic operations of sets are element
insertion, search and removal.
The C++ standard library contains two set
implementations:
The structure \texttt{set} is based on a balanced
binary tree and its operations work in $O(\log n)$ time.
The structure \texttt{unordered\_set} uses hashing,
and its operations work in $O(1)$ time on average.
The choice of which set implementation to use
is often a matter of taste.
The benefit of the \texttt{set} structure
is that it maintains the order of the elements
and provides functions that are not available
in \texttt{unordered\_set}.
On the other hand, \texttt{unordered\_set}
can be more efficient.
The following code creates a set
that contains integers,
and shows some of the operations.
The function \texttt{insert} adds an element to the set,
the function \texttt{count} returns the number of occurrences
of an element in the set,
and the function \texttt{erase} removes an element from the set.
\begin{lstlisting}
set<int> s;
s.insert(3);
s.insert(2);
s.insert(5);
cout << s.count(3) << "\n"; // 1
cout << s.count(4) << "\n"; // 0
s.erase(3);
s.insert(4);
cout << s.count(3) << "\n"; // 0
cout << s.count(4) << "\n"; // 1
\end{lstlisting}
A set can be used mostly like a vector,
but it is not possible to access
the elements using the \texttt{[]} notation.
The following code creates a set,
prints the number of elements in it, and then
iterates through all the elements:
\begin{lstlisting}
set<int> s = {2,5,6,8};
cout << s.size() << "\n"; // 4
for (auto x : s) {
cout << x << "\n";
}
\end{lstlisting}
An important property of sets is
that all their elements are \emph{distinct}.
Thus, the function \texttt{count} always returns
either 0 (the element is not in the set)
or 1 (the element is in the set),
and the function \texttt{insert} never adds
an element to the set if it is
already there.
The following code illustrates this:
\begin{lstlisting}
set<int> s;
s.insert(5);
s.insert(5);
s.insert(5);
cout << s.count(5) << "\n"; // 1
\end{lstlisting}
C++ also contains the structures
\texttt{multiset} and \texttt{unordered\_multiset}
that otherwise work like \texttt{set}
and \texttt{unordered\_set}
but they can contain multiple instances of an element.
For example, in the following code all three instances
of the number 5 are added to a multiset:
\begin{lstlisting}
multiset<int> s;
s.insert(5);
s.insert(5);
s.insert(5);
cout << s.count(5) << "\n"; // 3
\end{lstlisting}
The function \texttt{erase} removes
all instances of an element
from a multiset:
\begin{lstlisting}
s.erase(5);
cout << s.count(5) << "\n"; // 0
\end{lstlisting}
Often, only one instance should be removed,
which can be done as follows:
\begin{lstlisting}
s.erase(s.find(5));
cout << s.count(5) << "\n"; // 2
\end{lstlisting}
\section{Map structures}
\index{map}
A \key{map} is a generalized array
that consists of key-value-pairs.
While the keys in an ordinary array are always
the consecutive integers $0,1,\ldots,n-1$,
where $n$ is the size of the array,
the keys in a map can be of any data type and
they do not have to be consecutive values.
The C++ standard library contains two map
implementations that correspond to the set
implementations: the structure
\texttt{map} is based on a balanced
binary tree and accessing elements
takes $O(\log n)$ time,
while the structure
\texttt{unordered\_map} uses hashing
and accessing elements takes $O(1)$ time on average.
The following code creates a map
where the keys are strings and the values are integers:
\begin{lstlisting}
map<string,int> m;
m["monkey"] = 4;
m["banana"] = 3;
m["harpsichord"] = 9;
cout << m["banana"] << "\n"; // 3
\end{lstlisting}
If the value of a key is requested
but the map does not contain it,
the key is automatically added to the map with
a default value.
For example, in the following code,
the key ''aybabtu'' with value 0
is added to the map.
\begin{lstlisting}
map<string,int> m;
cout << m["aybabtu"] << "\n"; // 0
\end{lstlisting}
The function \texttt{count} checks
if a key exists in a map:
\begin{lstlisting}
if (m.count("aybabtu")) {
// key exists
}
\end{lstlisting}
The following code prints all the keys and values
in a map:
\begin{lstlisting}
for (auto x : m) {
cout << x.first << " " << x.second << "\n";
}
\end{lstlisting}
\section{Iterators and ranges}
\index{iterator}
Many functions in the C++ standard library
operate with iterators.
An \key{iterator} is a variable that points
to an element in a data structure.
The often used iterators \texttt{begin}
and \texttt{end} define a range that contains
all elements in a data structure.
The iterator \texttt{begin} points to
the first element in the data structure,
and the iterator \texttt{end} points to
the position \emph{after} the last element.
The situation looks as follows:
\begin{center}
\begin{tabular}{llllllllll}
\{ & 3, & 4, & 6, & 8, & 12, & 13, & 14, & 17 & \} \\
& $\uparrow$ & & & & & & & & $\uparrow$ \\
& \multicolumn{3}{l}{\texttt{s.begin()}} & & & & & & \texttt{s.end()} \\
\end{tabular}
\end{center}
Note the asymmetry in the iterators:
\texttt{s.begin()} points to an element in the data structure,
while \texttt{s.end()} points outside the data structure.
Thus, the range defined by the iterators is \emph{half-open}.
\subsubsection{Working with ranges}
Iterators are used in C++ standard library functions
that are given a range of elements in a data structure.
Usually, we want to process all elements in a
data structure, so the iterators
\texttt{begin} and \texttt{end} are given for the function.
For example, the following code sorts a vector
using the function \texttt{sort},
then reverses the order of the elements using the function
\texttt{reverse}, and finally shuffles the order of
the elements using the function \texttt{random\_shuffle}.
\index{sort@\texttt{sort}}
\index{reverse@\texttt{reverse}}
\index{random\_shuffle@\texttt{random\_shuffle}}
\begin{lstlisting}
sort(v.begin(), v.end());
reverse(v.begin(), v.end());
random_shuffle(v.begin(), v.end());
\end{lstlisting}
These functions can also be used with an ordinary array.
In this case, the functions are given pointers to the array
instead of iterators:
\newpage
\begin{lstlisting}
sort(a, a+n);
reverse(a, a+n);
random_shuffle(a, a+n);
\end{lstlisting}
\subsubsection{Set iterators}
Iterators are often used to access
elements of a set.
The following code creates an iterator
\texttt{it} that points to the smallest element in a set:
\begin{lstlisting}
set<int>::iterator it = s.begin();
\end{lstlisting}
A shorter way to write the code is as follows:
\begin{lstlisting}
auto it = s.begin();
\end{lstlisting}
The element to which an iterator points
can be accessed using the \texttt{*} symbol.
For example, the following code prints
the first element in the set:
\begin{lstlisting}
auto it = s.begin();
cout << *it << "\n";
\end{lstlisting}
Iterators can be moved using the operators
\texttt{++} (forward) and \texttt{--} (backward),
meaning that the iterator moves to the next
or previous element in the set.
The following code prints all the elements
in increasing order:
\begin{lstlisting}
for (auto it = s.begin(); it != s.end(); it++) {
cout << *it << "\n";
}
\end{lstlisting}
The following code prints the largest element in the set:
\begin{lstlisting}
auto it = s.end(); it--;
cout << *it << "\n";
\end{lstlisting}
The function $\texttt{find}(x)$ returns an iterator
that points to an element whose value is $x$.
However, if the set does not contain $x$,
the iterator will be \texttt{end}.
\begin{lstlisting}
auto it = s.find(x);
if (it == s.end()) {
// x is not found
}
\end{lstlisting}
The function $\texttt{lower\_bound}(x)$ returns
an iterator to the smallest element in the set
whose value is \emph{at least} $x$, and
the function $\texttt{upper\_bound}(x)$
returns an iterator to the smallest element in the set
whose value is \emph{larger than} $x$.
In both functions, if such an element does not exist,
the return value is \texttt{end}.
These functions are not supported by the
\texttt{unordered\_set} structure which
does not maintain the order of the elements.
\begin{samepage}
For example, the following code finds the element
nearest to $x$:
\begin{lstlisting}
auto it = s.lower_bound(x);
if (it == s.begin()) {
cout << *it << "\n";
} else if (it == s.end()) {
it--;
cout << *it << "\n";
} else {
int a = *it; it--;
int b = *it;
if (x-b < a-x) cout << b << "\n";
else cout << a << "\n";
}
\end{lstlisting}
The code assumes that the set is not empty,
and goes through all possible cases
using an iterator \texttt{it}.
First, the iterator points to the smallest
element whose value is at least $x$.
If \texttt{it} equals \texttt{begin},
the corresponding element is nearest to $x$.
If \texttt{it} equals \texttt{end},
the largest element in the set is nearest to $x$.
If none of the previous cases hold,
the element nearest to $x$ is either the
element that corresponds to \texttt{it} or the previous element.
\end{samepage}
\section{Other structures}
\subsubsection{Bitset}
\index{bitset}
A \key{bitset} is an array
whose each value is either 0 or 1.
For example, the following code creates
a bitset that contains 10 elements:
\begin{lstlisting}
bitset<10> s;
s[1] = 1;
s[3] = 1;
s[4] = 1;
s[7] = 1;
cout << s[4] << "\n"; // 1
cout << s[5] << "\n"; // 0
\end{lstlisting}
The benefit of using bitsets is that
they require less memory than ordinary arrays,
because each element in a bitset only
uses one bit of memory.
For example,
if $n$ bits are stored in an \texttt{int} array,
$32n$ bits of memory will be used,
but a corresponding bitset only requires $n$ bits of memory.
In addition, the values of a bitset
can be efficiently manipulated using
bit operators, which makes it possible to
optimize algorithms using bit sets.
The following code shows another way to create the above bitset:
\begin{lstlisting}
bitset<10> s(string("0010011010")); // from right to left
cout << s[4] << "\n"; // 1
cout << s[5] << "\n"; // 0
\end{lstlisting}
The function \texttt{count} returns the number
of ones in the bitset:
\begin{lstlisting}
bitset<10> s(string("0010011010"));
cout << s.count() << "\n"; // 4
\end{lstlisting}
The following code shows examples of using bit operations:
\begin{lstlisting}
bitset<10> a(string("0010110110"));
bitset<10> b(string("1011011000"));
cout << (a&b) << "\n"; // 0010010000
cout << (a|b) << "\n"; // 1011111110
cout << (a^b) << "\n"; // 1001101110
\end{lstlisting}
\subsubsection{Deque}
\index{deque}
A \key{deque} is a dynamic array
whose size can be efficiently
changed at both ends of the array.
Like a vector, a deque provides the functions
\texttt{push\_back} and \texttt{pop\_back}, but
it also includes the functions
\texttt{push\_front} and \texttt{pop\_front}
which are not available in a vector.
A deque can be used as follows:
\begin{lstlisting}
deque<int> d;
d.push_back(5); // [5]
d.push_back(2); // [5,2]
d.push_front(3); // [3,5,2]
d.pop_back(); // [3,5]
d.pop_front(); // [5]
\end{lstlisting}
The internal implementation of a deque
is more complex than that of a vector,
and for this reason, a deque is slower than a vector.
Still, both adding and removing
elements take $O(1)$ time on average at both ends.
\subsubsection{Stack}
\index{stack}
A \key{stack}
is a data structure that provides two
$O(1)$ time operations:
adding an element to the top,
and removing an element from the top.
It is only possible to access the top
element of a stack.
The following code shows how a stack can be used:
\begin{lstlisting}
stack<int> s;
s.push(3);
s.push(2);
s.push(5);
cout << s.top(); // 5
s.pop();
cout << s.top(); // 2
\end{lstlisting}
\subsubsection{Queue}
\index{queue}
A \key{queue} also
provides two $O(1)$ time operations:
adding an element to the end of the queue,
and removing the first element in the queue.
It is only possible to access the first
and last element of a queue.
The following code shows how a queue can be used:
\begin{lstlisting}
queue<int> q;
q.push(3);
q.push(2);
q.push(5);
cout << q.front(); // 3
q.pop();
cout << q.front(); // 2
\end{lstlisting}
\subsubsection{Priority queue}
\index{priority queue}
\index{heap}
A \key{priority queue}
maintains a set of elements.
The supported operations are insertion and,
depending on the type of the queue,
retrieval and removal of
either the minimum or maximum element.
Insertion and removal take $O(\log n)$ time,
and retrieval takes $O(1)$ time.
While an ordered set efficiently supports
all the operations of a priority queue,
the benefit of using a priority queue is
that it has smaller constant factors.
A priority queue is usually implemented using
a heap structure that is much simpler than a
balanced binary tree used in an ordered set.
\begin{samepage}
By default, the elements in a C++
priority queue are sorted in decreasing order,
and it is possible to find and remove the
largest element in the queue.
The following code illustrates this:
\begin{lstlisting}
priority_queue<int> q;
q.push(3);
q.push(5);
q.push(7);
q.push(2);
cout << q.top() << "\n"; // 7
q.pop();
cout << q.top() << "\n"; // 5
q.pop();
q.push(6);
cout << q.top() << "\n"; // 6
q.pop();
\end{lstlisting}
\end{samepage}
If we want to create a priority queue
that supports finding and removing
the smallest element,
we can do it as follows:
\begin{lstlisting}
priority_queue<int,vector<int>,greater<int>> q;
\end{lstlisting}
\subsubsection{Policy-based data structures}
The \texttt{g++} compiler also supports
some data structures that are not part
of the C++ standard library.
Such structures are called \emph{policy-based}
data structures.
To use these structures, the following lines
must be added to the code:
\begin{lstlisting}
#include <ext/pb_ds/assoc_container.hpp>
using namespace __gnu_pbds;
\end{lstlisting}
After this, we can define a data structure \texttt{indexed\_set} that
is like \texttt{set} but can be indexed like an array.
The definition for \texttt{int} values is as follows:
\begin{lstlisting}
typedef tree<int,null_type,less<int>,rb_tree_tag,
tree_order_statistics_node_update> indexed_set;
\end{lstlisting}
Now we can create a set as follows:
\begin{lstlisting}
indexed_set s;
s.insert(2);
s.insert(3);
s.insert(7);
s.insert(9);
\end{lstlisting}
The speciality of this set is that we have access to
the indices that the elements would have in a sorted array.
The function $\texttt{find\_by\_order}$ returns
an iterator to the element at a given position:
\begin{lstlisting}
auto x = s.find_by_order(2);
cout << *x << "\n"; // 7
\end{lstlisting}
And the function $\texttt{order\_of\_key}$
returns the position of a given element:
\begin{lstlisting}
cout << s.order_of_key(7) << "\n"; // 2
\end{lstlisting}
If the element does not appear in the set,
we get the position that the element would have
in the set:
\begin{lstlisting}
cout << s.order_of_key(6) << "\n"; // 2
cout << s.order_of_key(8) << "\n"; // 3
\end{lstlisting}
Both the functions work in logarithmic time.
\section{Comparison to sorting}
It is often possible to solve a problem
using either data structures or sorting.
Sometimes there are remarkable differences
in the actual efficiency of these approaches,
which may be hidden in their time complexities.
Let us consider a problem where
we are given two lists $A$ and $B$
that both contain $n$ elements.
Our task is to calculate the number of elements
that belong to both of the lists.
For example, for the lists
\[A = [5,2,8,9] \hspace{10px} \textrm{and} \hspace{10px} B = [3,2,9,5],\]
the answer is 3 because the numbers 2, 5
and 9 belong to both of the lists.
A straightforward solution to the problem is
to go through all pairs of elements in $O(n^2)$ time,
but next we will focus on
more efficient algorithms.
\subsubsection{Algorithm 1}
We construct a set of the elements that appear in $A$,
and after this, we iterate through the elements
of $B$ and check for each elements if it
also belongs to $A$.
This is efficient because the elements of $A$
are in a set.
Using the \texttt{set} structure,
the time complexity of the algorithm is $O(n \log n)$.
\subsubsection{Algorithm 2}
It is not necessary to maintain an ordered set,
so instead of the \texttt{set} structure
we can also use the \texttt{unordered\_set} structure.
This is an easy way to make the algorithm
more efficient, because we only have to change
the underlying data structure.
The time complexity of the new algorithm is $O(n)$.
\subsubsection{Algorithm 3}
Instead of data structures, we can use sorting.
First, we sort both lists $A$ and $B$.
After this, we iterate through both the lists
at the same time and find the common elements.
The time complexity of sorting is $O(n \log n)$,
and the rest of the algorithm works in $O(n)$ time,
so the total time complexity is $O(n \log n)$.
\subsubsection{Efficiency comparison}
The following table shows how efficient
the above algorithms are when $n$ varies and
the elements of the lists are random
integers between $1 \ldots 10^9$:
\begin{center}
\begin{tabular}{rrrr}
$n$ & Algorithm 1 & Algorithm 2 & Algorithm 3 \\
\hline
$10^6$ & $1.5$ s & $0.3$ s & $0.2$ s \\
$2 \cdot 10^6$ & $3.7$ s & $0.8$ s & $0.3$ s \\
$3 \cdot 10^6$ & $5.7$ s & $1.3$ s & $0.5$ s \\
$4 \cdot 10^6$ & $7.7$ s & $1.7$ s & $0.7$ s \\
$5 \cdot 10^6$ & $10.0$ s & $2.3$ s & $0.9$ s \\
\end{tabular}
\end{center}
Algorithms 1 and 2 are equal except that
they use different set structures.
In this problem, this choice has an important effect on
the running time, because Algorithm 2
is 45 times faster than Algorithm 1.
However, the most efficient algorithm is Algorithm 3
which uses sorting.
It only uses half the time compared to Algorithm 2.
Interestingly, the time complexity of both
Algorithm 1 and Algorithm 3 is $O(n \log n)$,
but despite this, Algorithm 3 is ten times faster.
This can be explained by the fact that
sorting is a simple procedure and it is done
only once at the beginning of Algorithm 3,
and the rest of the algorithm works in linear time.
On the other hand,
Algorithm 1 maintains a complex balanced binary tree
during the whole algorithm.

758
chapter05.tex Normal file
View File

@ -0,0 +1,758 @@
\chapter{Complete search}
\key{Complete search}
is a general method that can be used
to solve almost any algorithm problem.
The idea is to generate all possible
solutions to the problem using brute force,
and then select the best solution or count the
number of solutions, depending on the problem.
Complete search is a good technique
if there is enough time to go through all the solutions,
because the search is usually easy to implement
and it always gives the correct answer.
If complete search is too slow,
other techniques, such as greedy algorithms or
dynamic programming, may be needed.
\section{Generating subsets}
\index{subset}
We first consider the problem of generating
all subsets of a set of $n$ elements.
For example, the subsets of $\{0,1,2\}$ are
$\emptyset$, $\{0\}$, $\{1\}$, $\{2\}$, $\{0,1\}$,
$\{0,2\}$, $\{1,2\}$ and $\{0,1,2\}$.
There are two common methods to generate subsets:
we can either perform a recursive search
or exploit the bit representation of integers.
\subsubsection{Method 1}
An elegant way to go through all subsets
of a set is to use recursion.
The following function \texttt{search}
generates the subsets of the set
$\{0,1,\ldots,n-1\}$.
The function maintains a vector \texttt{subset}
that will contain the elements of each subset.
The search begins when the function is called
with parameter 0.
\begin{lstlisting}
void search(int k) {
if (k == n) {
// process subset
} else {
search(k+1);
subset.push_back(k);
search(k+1);
subset.pop_back();
}
}
\end{lstlisting}
When the function \texttt{search}
is called with parameter $k$,
it decides whether to include the
element $k$ in the subset or not,
and in both cases,
then calls itself with parameter $k+1$
However, if $k=n$, the function notices that
all elements have been processed
and a subset has been generated.
The following tree illustrates the function calls when $n=3$.
We can always choose either the left branch
($k$ is not included in the subset) or the right branch
($k$ is included in the subset).
\begin{center}
\begin{tikzpicture}[scale=.45]
\begin{scope}
\small
\node at (0,0) {$\texttt{search}(0)$};
\node at (-8,-4) {$\texttt{search}(1)$};
\node at (8,-4) {$\texttt{search}(1)$};
\path[draw,thick,->] (0,0-0.5) -- (-8,-4+0.5);
\path[draw,thick,->] (0,0-0.5) -- (8,-4+0.5);
\node at (-12,-8) {$\texttt{search}(2)$};
\node at (-4,-8) {$\texttt{search}(2)$};
\node at (4,-8) {$\texttt{search}(2)$};
\node at (12,-8) {$\texttt{search}(2)$};
\path[draw,thick,->] (-8,-4-0.5) -- (-12,-8+0.5);
\path[draw,thick,->] (-8,-4-0.5) -- (-4,-8+0.5);
\path[draw,thick,->] (8,-4-0.5) -- (4,-8+0.5);
\path[draw,thick,->] (8,-4-0.5) -- (12,-8+0.5);
\node at (-14,-12) {$\texttt{search}(3)$};
\node at (-10,-12) {$\texttt{search}(3)$};
\node at (-6,-12) {$\texttt{search}(3)$};
\node at (-2,-12) {$\texttt{search}(3)$};
\node at (2,-12) {$\texttt{search}(3)$};
\node at (6,-12) {$\texttt{search}(3)$};
\node at (10,-12) {$\texttt{search}(3)$};
\node at (14,-12) {$\texttt{search}(3)$};
\node at (-14,-13.5) {$\emptyset$};
\node at (-10,-13.5) {$\{2\}$};
\node at (-6,-13.5) {$\{1\}$};
\node at (-2,-13.5) {$\{1,2\}$};
\node at (2,-13.5) {$\{0\}$};
\node at (6,-13.5) {$\{0,2\}$};
\node at (10,-13.5) {$\{0,1\}$};
\node at (14,-13.5) {$\{0,1,2\}$};
\path[draw,thick,->] (-12,-8-0.5) -- (-14,-12+0.5);
\path[draw,thick,->] (-12,-8-0.5) -- (-10,-12+0.5);
\path[draw,thick,->] (-4,-8-0.5) -- (-6,-12+0.5);
\path[draw,thick,->] (-4,-8-0.5) -- (-2,-12+0.5);
\path[draw,thick,->] (4,-8-0.5) -- (2,-12+0.5);
\path[draw,thick,->] (4,-8-0.5) -- (6,-12+0.5);
\path[draw,thick,->] (12,-8-0.5) -- (10,-12+0.5);
\path[draw,thick,->] (12,-8-0.5) -- (14,-12+0.5);
\end{scope}
\end{tikzpicture}
\end{center}
\subsubsection{Method 2}
Another way to generate subsets is based on
the bit representation of integers.
Each subset of a set of $n$ elements
can be represented as a sequence of $n$ bits,
which corresponds to an integer between $0 \ldots 2^n-1$.
The ones in the bit sequence indicate
which elements are included in the subset.
The usual convention is that
the last bit corresponds to element 0,
the second last bit corresponds to element 1,
and so on.
For example, the bit representation of 25
is 11001, which corresponds to the subset $\{0,3,4\}$.
The following code goes through the subsets
of a set of $n$ elements
\begin{lstlisting}
for (int b = 0; b < (1<<n); b++) {
// process subset
}
\end{lstlisting}
The following code shows how we can find
the elements of a subset that corresponds to a bit sequence.
When processing each subset,
the code builds a vector that contains the
elements in the subset.
\begin{lstlisting}
for (int b = 0; b < (1<<n); b++) {
vector<int> subset;
for (int i = 0; i < n; i++) {
if (b&(1<<i)) subset.push_back(i);
}
}
\end{lstlisting}
\section{Generating permutations}
\index{permutation}
Next we consider the problem of generating
all permutations of a set of $n$ elements.
For example, the permutations of $\{0,1,2\}$ are
$(0,1,2)$, $(0,2,1)$, $(1,0,2)$, $(1,2,0)$,
$(2,0,1)$ and $(2,1,0)$.
Again, there are two approaches:
we can either use recursion or go through the
permutations iteratively.
\subsubsection{Method 1}
Like subsets, permutations can be generated
using recursion.
The following function \texttt{search} goes
through the permutations of the set $\{0,1,\ldots,n-1\}$.
The function builds a vector \texttt{permutation}
that contains the permutation,
and the search begins when the function is
called without parameters.
\begin{lstlisting}
void search() {
if (permutation.size() == n) {
// process permutation
} else {
for (int i = 0; i < n; i++) {
if (chosen[i]) continue;
chosen[i] = true;
permutation.push_back(i);
search();
chosen[i] = false;
permutation.pop_back();
}
}
}
\end{lstlisting}
Each function call adds a new element to
\texttt{permutation}.
The array \texttt{chosen} indicates which
elements are already included in the permutation.
If the size of \texttt{permutation} equals the size of the set,
a permutation has been generated.
\subsubsection{Method 2}
\index{next\_permutation@\texttt{next\_permutation}}
Another method for generating permutations
is to begin with the permutation
$\{0,1,\ldots,n-1\}$ and repeatedly
use a function that constructs the next permutation
in increasing order.
The C++ standard library contains the function
\texttt{next\_permutation} that can be used for this:
\begin{lstlisting}
vector<int> permutation;
for (int i = 0; i < n; i++) {
permutation.push_back(i);
}
do {
// process permutation
} while (next_permutation(permutation.begin(),permutation.end()));
\end{lstlisting}
\section{Backtracking}
\index{backtracking}
A \key{backtracking} algorithm
begins with an empty solution
and extends the solution step by step.
The search recursively
goes through all different ways how
a solution can be constructed.
\index{queen problem}
As an example, consider the problem of
calculating the number
of ways $n$ queens can be placed on
an $n \times n$ chessboard so that
no two queens attack each other.
For example, when $n=4$,
there are two possible solutions:
\begin{center}
\begin{tikzpicture}[scale=.65]
\begin{scope}
\draw (0, 0) grid (4, 4);
\node at (1.5,3.5) {\symqueen};
\node at (3.5,2.5) {\symqueen};
\node at (0.5,1.5) {\symqueen};
\node at (2.5,0.5) {\symqueen};
\draw (6, 0) grid (10, 4);
\node at (6+2.5,3.5) {\symqueen};
\node at (6+0.5,2.5) {\symqueen};
\node at (6+3.5,1.5) {\symqueen};
\node at (6+1.5,0.5) {\symqueen};
\end{scope}
\end{tikzpicture}
\end{center}
The problem can be solved using backtracking
by placing queens to the board row by row.
More precisely, exactly one queen will
be placed on each row so that no queen attacks
any of the queens placed before.
A solution has been found when all
$n$ queens have been placed on the board.
For example, when $n=4$,
some partial solutions generated by
the backtracking algorithm are as follows:
\begin{center}
\begin{tikzpicture}[scale=.55]
\begin{scope}
\draw (0, 0) grid (4, 4);
\draw (-9, -6) grid (-5, -2);
\draw (-3, -6) grid (1, -2);
\draw (3, -6) grid (7, -2);
\draw (9, -6) grid (13, -2);
\node at (-9+0.5,-3+0.5) {\symqueen};
\node at (-3+1+0.5,-3+0.5) {\symqueen};
\node at (3+2+0.5,-3+0.5) {\symqueen};
\node at (9+3+0.5,-3+0.5) {\symqueen};
\draw (2,0) -- (-7,-2);
\draw (2,0) -- (-1,-2);
\draw (2,0) -- (5,-2);
\draw (2,0) -- (11,-2);
\draw (-11, -12) grid (-7, -8);
\draw (-6, -12) grid (-2, -8);
\draw (-1, -12) grid (3, -8);
\draw (4, -12) grid (8, -8);
\draw[white] (11, -12) grid (15, -8);
\node at (-11+1+0.5,-9+0.5) {\symqueen};
\node at (-6+1+0.5,-9+0.5) {\symqueen};
\node at (-1+1+0.5,-9+0.5) {\symqueen};
\node at (4+1+0.5,-9+0.5) {\symqueen};
\node at (-11+0+0.5,-10+0.5) {\symqueen};
\node at (-6+1+0.5,-10+0.5) {\symqueen};
\node at (-1+2+0.5,-10+0.5) {\symqueen};
\node at (4+3+0.5,-10+0.5) {\symqueen};
\draw (-1,-6) -- (-9,-8);
\draw (-1,-6) -- (-4,-8);
\draw (-1,-6) -- (1,-8);
\draw (-1,-6) -- (6,-8);
\node at (-9,-13) {illegal};
\node at (-4,-13) {illegal};
\node at (1,-13) {illegal};
\node at (6,-13) {valid};
\end{scope}
\end{tikzpicture}
\end{center}
At the bottom level, the three first configurations
are illegal, because the queens attack each other.
However, the fourth configuration is valid
and it can be extended to a complete solution by
placing two more queens to the board.
There is only one way to place the two remaining queens.
\begin{samepage}
The algorithm can be implemented as follows:
\begin{lstlisting}
void search(int y) {
if (y == n) {
count++;
return;
}
for (int x = 0; x < n; x++) {
if (column[x] || diag1[x+y] || diag2[x-y+n-1]) continue;
column[x] = diag1[x+y] = diag2[x-y+n-1] = 1;
search(y+1);
column[x] = diag1[x+y] = diag2[x-y+n-1] = 0;
}
}
\end{lstlisting}
\end{samepage}
The search begins by calling \texttt{search(0)}.
The size of the board is $n \times n$,
and the code calculates the number of solutions
to \texttt{count}.
The code assumes that the rows and columns
of the board are numbered from 0 to $n-1$.
When the function \texttt{search} is
called with parameter $y$,
it places a queen on row $y$
and then calls itself with parameter $y+1$.
Then, if $y=n$, a solution has been found
and the variable \texttt{count} is increased by one.
The array \texttt{column} keeps track of columns
that contain a queen,
and the arrays \texttt{diag1} and \texttt{diag2}
keep track of diagonals.
It is not allowed to add another queen to a
column or diagonal that already contains a queen.
For example, the columns and diagonals of
the $4 \times 4$ board are numbered as follows:
\begin{center}
\begin{tikzpicture}[scale=.65]
\begin{scope}
\draw (0-6, 0) grid (4-6, 4);
\node at (-6+0.5,3.5) {$0$};
\node at (-6+1.5,3.5) {$1$};
\node at (-6+2.5,3.5) {$2$};
\node at (-6+3.5,3.5) {$3$};
\node at (-6+0.5,2.5) {$0$};
\node at (-6+1.5,2.5) {$1$};
\node at (-6+2.5,2.5) {$2$};
\node at (-6+3.5,2.5) {$3$};
\node at (-6+0.5,1.5) {$0$};
\node at (-6+1.5,1.5) {$1$};
\node at (-6+2.5,1.5) {$2$};
\node at (-6+3.5,1.5) {$3$};
\node at (-6+0.5,0.5) {$0$};
\node at (-6+1.5,0.5) {$1$};
\node at (-6+2.5,0.5) {$2$};
\node at (-6+3.5,0.5) {$3$};
\draw (0, 0) grid (4, 4);
\node at (0.5,3.5) {$0$};
\node at (1.5,3.5) {$1$};
\node at (2.5,3.5) {$2$};
\node at (3.5,3.5) {$3$};
\node at (0.5,2.5) {$1$};
\node at (1.5,2.5) {$2$};
\node at (2.5,2.5) {$3$};
\node at (3.5,2.5) {$4$};
\node at (0.5,1.5) {$2$};
\node at (1.5,1.5) {$3$};
\node at (2.5,1.5) {$4$};
\node at (3.5,1.5) {$5$};
\node at (0.5,0.5) {$3$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$5$};
\node at (3.5,0.5) {$6$};
\draw (6, 0) grid (10, 4);
\node at (6.5,3.5) {$3$};
\node at (7.5,3.5) {$4$};
\node at (8.5,3.5) {$5$};
\node at (9.5,3.5) {$6$};
\node at (6.5,2.5) {$2$};
\node at (7.5,2.5) {$3$};
\node at (8.5,2.5) {$4$};
\node at (9.5,2.5) {$5$};
\node at (6.5,1.5) {$1$};
\node at (7.5,1.5) {$2$};
\node at (8.5,1.5) {$3$};
\node at (9.5,1.5) {$4$};
\node at (6.5,0.5) {$0$};
\node at (7.5,0.5) {$1$};
\node at (8.5,0.5) {$2$};
\node at (9.5,0.5) {$3$};
\node at (-4,-1) {\texttt{column}};
\node at (2,-1) {\texttt{diag1}};
\node at (8,-1) {\texttt{diag2}};
\end{scope}
\end{tikzpicture}
\end{center}
Let $q(n)$ denote the number of ways
to place $n$ queens on an $n \times n$ chessboard.
The above backtracking
algorithm tells us that, for example, $q(8)=92$.
When $n$ increases, the search quickly becomes slow,
because the number of solutions increases
exponentially.
For example, calculating $q(16)=14772512$
using the above algorithm already takes about a minute
on a modern computer\footnote{There is no known way to efficiently
calculate larger values of $q(n)$. The current record is
$q(27)=234907967154122528$, calculated in 2016 \cite{q27}.}.
\section{Pruning the search}
We can often optimize backtracking
by pruning the search tree.
The idea is to add ''intelligence'' to the algorithm
so that it will notice as soon as possible
if a partial solution cannot be extended
to a complete solution.
Such optimizations can have a tremendous
effect on the efficiency of the search.
Let us consider the problem
of calculating the number of paths
in an $n \times n$ grid from the upper-left corner
to the lower-right corner such that the
path visits each square exactly once.
For example, in a $7 \times 7$ grid,
there are 111712 such paths.
One of the paths is as follows:
\begin{center}
\begin{tikzpicture}[scale=.55]
\begin{scope}
\draw (0, 0) grid (7, 7);
\draw[thick,->] (0.5,6.5) -- (0.5,4.5) -- (2.5,4.5) --
(2.5,3.5) -- (0.5,3.5) -- (0.5,0.5) --
(3.5,0.5) -- (3.5,1.5) -- (1.5,1.5) --
(1.5,2.5) -- (4.5,2.5) -- (4.5,0.5) --
(5.5,0.5) -- (5.5,3.5) -- (3.5,3.5) --
(3.5,5.5) -- (1.5,5.5) -- (1.5,6.5) --
(4.5,6.5) -- (4.5,4.5) -- (5.5,4.5) --
(5.5,6.5) -- (6.5,6.5) -- (6.5,0.5);
\end{scope}
\end{tikzpicture}
\end{center}
We focus on the $7 \times 7$ case,
because its level of difficulty is appropriate to our needs.
We begin with a straightforward backtracking algorithm,
and then optimize it step by step using observations
of how the search can be pruned.
After each optimization, we measure the running time
of the algorithm and the number of recursive calls,
so that we clearly see the effect of each
optimization on the efficiency of the search.
\subsubsection{Basic algorithm}
The first version of the algorithm does not contain
any optimizations. We simply use backtracking to generate
all possible paths from the upper-left corner to
the lower-right corner and count the number of such paths.
\begin{itemize}
\item
running time: 483 seconds
\item
number of recursive calls: 76 billion
\end{itemize}
\subsubsection{Optimization 1}
In any solution, we first move one step
down or right.
There are always two paths that
are symmetric
about the diagonal of the grid
after the first step.
For example, the following paths are symmetric:
\begin{center}
\begin{tabular}{ccc}
\begin{tikzpicture}[scale=.55]
\begin{scope}
\draw (0, 0) grid (7, 7);
\draw[thick,->] (0.5,6.5) -- (0.5,4.5) -- (2.5,4.5) --
(2.5,3.5) -- (0.5,3.5) -- (0.5,0.5) --
(3.5,0.5) -- (3.5,1.5) -- (1.5,1.5) --
(1.5,2.5) -- (4.5,2.5) -- (4.5,0.5) --
(5.5,0.5) -- (5.5,3.5) -- (3.5,3.5) --
(3.5,5.5) -- (1.5,5.5) -- (1.5,6.5) --
(4.5,6.5) -- (4.5,4.5) -- (5.5,4.5) --
(5.5,6.5) -- (6.5,6.5) -- (6.5,0.5);
\end{scope}
\end{tikzpicture}
& \hspace{20px}
&
\begin{tikzpicture}[scale=.55]
\begin{scope}[yscale=1,xscale=-1,rotate=-90]
\draw (0, 0) grid (7, 7);
\draw[thick,->] (0.5,6.5) -- (0.5,4.5) -- (2.5,4.5) --
(2.5,3.5) -- (0.5,3.5) -- (0.5,0.5) --
(3.5,0.5) -- (3.5,1.5) -- (1.5,1.5) --
(1.5,2.5) -- (4.5,2.5) -- (4.5,0.5) --
(5.5,0.5) -- (5.5,3.5) -- (3.5,3.5) --
(3.5,5.5) -- (1.5,5.5) -- (1.5,6.5) --
(4.5,6.5) -- (4.5,4.5) -- (5.5,4.5) --
(5.5,6.5) -- (6.5,6.5) -- (6.5,0.5);
\end{scope}
\end{tikzpicture}
\end{tabular}
\end{center}
Hence, we can decide that we always first
move one step down (or right),
and finally multiply the number of solutions by two.
\begin{itemize}
\item
running time: 244 seconds
\item
number of recursive calls: 38 billion
\end{itemize}
\subsubsection{Optimization 2}
If the path reaches the lower-right square
before it has visited all other squares of the grid,
it is clear that
it will not be possible to complete the solution.
An example of this is the following path:
\begin{center}
\begin{tikzpicture}[scale=.55]
\begin{scope}
\draw (0, 0) grid (7, 7);
\draw[thick,->] (0.5,6.5) -- (0.5,4.5) -- (2.5,4.5) --
(2.5,3.5) -- (0.5,3.5) -- (0.5,0.5) --
(3.5,0.5) -- (3.5,1.5) -- (1.5,1.5) --
(1.5,2.5) -- (4.5,2.5) -- (4.5,0.5) --
(6.5,0.5);
\end{scope}
\end{tikzpicture}
\end{center}
Using this observation, we can terminate the search
immediately if we reach the lower-right square too early.
\begin{itemize}
\item
running time: 119 seconds
\item
number of recursive calls: 20 billion
\end{itemize}
\subsubsection{Optimization 3}
If the path touches a wall
and can turn either left or right,
the grid splits into two parts
that contain unvisited squares.
For example, in the following situation,
the path can turn either left or right:
\begin{center}
\begin{tikzpicture}[scale=.55]
\begin{scope}
\draw (0, 0) grid (7, 7);
\draw[thick,->] (0.5,6.5) -- (0.5,4.5) -- (2.5,4.5) --
(2.5,3.5) -- (0.5,3.5) -- (0.5,0.5) --
(3.5,0.5) -- (3.5,1.5) -- (1.5,1.5) --
(1.5,2.5) -- (4.5,2.5) -- (4.5,0.5) --
(5.5,0.5) -- (5.5,6.5);
\end{scope}
\end{tikzpicture}
\end{center}
In this case, we cannot visit all squares anymore,
so we can terminate the search.
This optimization is very useful:
\begin{itemize}
\item
running time: 1.8 seconds
\item
number of recursive calls: 221 million
\end{itemize}
\subsubsection{Optimization 4}
The idea of Optimization 3
can be generalized:
if the path cannot continue forward
but can turn either left or right,
the grid splits into two parts
that both contain unvisited squares.
For example, consider the following path:
\begin{center}
\begin{tikzpicture}[scale=.55]
\begin{scope}
\draw (0, 0) grid (7, 7);
\draw[thick,->] (0.5,6.5) -- (0.5,4.5) -- (2.5,4.5) --
(2.5,3.5) -- (0.5,3.5) -- (0.5,0.5) --
(3.5,0.5) -- (3.5,1.5) -- (1.5,1.5) --
(1.5,2.5) -- (4.5,2.5) -- (4.5,0.5) --
(5.5,0.5) -- (5.5,4.5) -- (3.5,4.5);
\end{scope}
\end{tikzpicture}
\end{center}
It is clear that we cannot visit all squares anymore,
so we can terminate the search.
After this optimization, the search is
very efficient:
\begin{itemize}
\item
running time: 0.6 seconds
\item
number of recursive calls: 69 million
\end{itemize}
~\\
Now is a good moment to stop optimizing
the algorithm and see what we have achieved.
The running time of the original algorithm
was 483 seconds,
and now after the optimizations,
the running time is only 0.6 seconds.
Thus, the algorithm became nearly 1000 times
faster after the optimizations.
This is a usual phenomenon in backtracking,
because the search tree is usually large
and even simple observations can effectively
prune the search.
Especially useful are optimizations that
occur during the first steps of the algorithm,
i.e., at the top of the search tree.
\section{Meet in the middle}
\index{meet in the middle}
\key{Meet in the middle} is a technique
where the search space is divided into
two parts of about equal size.
A separate search is performed
for both of the parts,
and finally the results of the searches are combined.
The technique can be used
if there is an efficient way to combine the
results of the searches.
In such a situation, the two searches may require less
time than one large search.
Typically, we can turn a factor of $2^n$
into a factor of $2^{n/2}$ using the meet in the
middle technique.
As an example, consider a problem where
we are given a list of $n$ numbers and
a number $x$,
and we want to find out if it is possible
to choose some numbers from the list so that
their sum is $x$.
For example, given the list $[2,4,5,9]$ and $x=15$,
we can choose the numbers $[2,4,9]$ to get $2+4+9=15$.
However, if $x=10$ for the same list,
it is not possible to form the sum.
A simple algorithm to the problem is to
go through all subsets of the elements and
check if the sum of any of the subsets is $x$.
The running time of such an algorithm is $O(2^n)$,
because there are $2^n$ subsets.
However, using the meet in the middle technique,
we can achieve a more efficient $O(2^{n/2})$ time algorithm\footnote{This
idea was introduced in 1974 by E. Horowitz and S. Sahni \cite{hor74}.}.
Note that $O(2^n)$ and $O(2^{n/2})$ are different
complexities because $2^{n/2}$ equals $\sqrt{2^n}$.
The idea is to divide the list into
two lists $A$ and $B$ such that both
lists contain about half of the numbers.
The first search generates all subsets
of $A$ and stores their sums to a list $S_A$.
Correspondingly, the second search creates
a list $S_B$ from $B$.
After this, it suffices to check if it is possible
to choose one element from $S_A$ and another
element from $S_B$ such that their sum is $x$.
This is possible exactly when there is a way to
form the sum $x$ using the numbers of the original list.
For example, suppose that the list is $[2,4,5,9]$ and $x=15$.
First, we divide the list into $A=[2,4]$ and $B=[5,9]$.
After this, we create lists
$S_A=[0,2,4,6]$ and $S_B=[0,5,9,14]$.
In this case, the sum $x=15$ is possible to form,
because $S_A$ contains the sum $6$,
$S_B$ contains the sum $9$, and $6+9=15$.
This corresponds to the solution $[2,4,9]$.
We can implement the algorithm so that
its time complexity is $O(2^{n/2})$.
First, we generate \emph{sorted} lists $S_A$ and $S_B$,
which can be done in $O(2^{n/2})$ time using a merge-like technique.
After this, since the lists are sorted,
we can check in $O(2^{n/2})$ time if
the sum $x$ can be created from $S_A$ and $S_B$.

680
chapter06.tex Normal file
View File

@ -0,0 +1,680 @@
\chapter{Greedy algorithms}
\index{greedy algorithm}
A \key{greedy algorithm}
constructs a solution to the problem
by always making a choice that looks
the best at the moment.
A greedy algorithm never takes back
its choices, but directly constructs
the final solution.
For this reason, greedy algorithms
are usually very efficient.
The difficulty in designing greedy algorithms
is to find a greedy strategy
that always produces an optimal solution
to the problem.
The locally optimal choices in a greedy
algorithm should also be globally optimal.
It is often difficult to argue that
a greedy algorithm works.
\section{Coin problem}
As a first example, we consider a problem
where we are given a set of coins
and our task is to form a sum of money $n$
using the coins.
The values of the coins are
$\texttt{coins}=\{c_1,c_2,\ldots,c_k\}$,
and each coin can be used as many times we want.
What is the minimum number of coins needed?
For example, if the coins are the euro coins (in cents)
\[\{1,2,5,10,20,50,100,200\}\]
and $n=520$,
we need at least four coins.
The optimal solution is to select coins
$200+200+100+20$ whose sum is 520.
\subsubsection{Greedy algorithm}
A simple greedy algorithm to the problem
always selects the largest possible coin,
until the required sum of money has been constructed.
This algorithm works in the example case,
because we first select two 200 cent coins,
then one 100 cent coin and finally one 20 cent coin.
But does this algorithm always work?
It turns out that if the coins are the euro coins,
the greedy algorithm \emph{always} works, i.e.,
it always produces a solution with the fewest
possible number of coins.
The correctness of the algorithm can be
shown as follows:
First, each coin 1, 5, 10, 50 and 100 appears
at most once in an optimal solution,
because if the
solution would contain two such coins,
we could replace them by one coin and
obtain a better solution.
For example, if the solution would contain
coins $5+5$, we could replace them by coin $10$.
In the same way, coins 2 and 20 appear
at most twice in an optimal solution,
because we could replace
coins $2+2+2$ by coins $5+1$ and
coins $20+20+20$ by coins $50+10$.
Moreover, an optimal solution cannot contain
coins $2+2+1$ or $20+20+10$,
because we could replace them by coins $5$ and $50$.
Using these observations,
we can show for each coin $x$ that
it is not possible to optimally construct
a sum $x$ or any larger sum by only using coins
that are smaller than $x$.
For example, if $x=100$, the largest optimal
sum using the smaller coins is $50+20+20+5+2+2=99$.
Thus, the greedy algorithm that always selects
the largest coin produces the optimal solution.
This example shows that it can be difficult
to argue that a greedy algorithm works,
even if the algorithm itself is simple.
\subsubsection{General case}
In the general case, the coin set can contain any coins
and the greedy algorithm \emph{does not} necessarily produce
an optimal solution.
We can prove that a greedy algorithm does not work
by showing a counterexample
where the algorithm gives a wrong answer.
In this problem we can easily find a counterexample:
if the coins are $\{1,3,4\}$ and the target sum
is 6, the greedy algorithm produces the solution
$4+1+1$ while the optimal solution is $3+3$.
It is not known if the general coin problem
can be solved using any greedy algorithm\footnote{However, it is possible
to \emph{check} in polynomial time
if the greedy algorithm presented in this chapter works for
a given set of coins \cite{pea05}.}.
However, as we will see in Chapter 7,
in some cases,
the general problem can be efficiently
solved using a dynamic
programming algorithm that always gives the
correct answer.
\section{Scheduling}
Many scheduling problems can be solved
using greedy algorithms.
A classic problem is as follows:
Given $n$ events with their starting and ending
times, find a schedule
that includes as many events as possible.
It is not possible to select an event partially.
For example, consider the following events:
\begin{center}
\begin{tabular}{lll}
event & starting time & ending time \\
\hline
$A$ & 1 & 3 \\
$B$ & 2 & 5 \\
$C$ & 3 & 9 \\
$D$ & 6 & 8 \\
\end{tabular}
\end{center}
In this case the maximum number of events is two.
For example, we can select events $B$ and $D$
as follows:
\begin{center}
\begin{tikzpicture}[scale=.4]
\begin{scope}
\draw (2, 0) rectangle (6, -1);
\draw[fill=lightgray] (4, -1.5) rectangle (10, -2.5);
\draw (6, -3) rectangle (18, -4);
\draw[fill=lightgray] (12, -4.5) rectangle (16, -5.5);
\node at (2.5,-0.5) {$A$};
\node at (4.5,-2) {$B$};
\node at (6.5,-3.5) {$C$};
\node at (12.5,-5) {$D$};
\end{scope}
\end{tikzpicture}
\end{center}
It is possible to invent several greedy algorithms
for the problem, but which of them works in every case?
\subsubsection*{Algorithm 1}
The first idea is to select as \emph{short}
events as possible.
In the example case this algorithm
selects the following events:
\begin{center}
\begin{tikzpicture}[scale=.4]
\begin{scope}
\draw[fill=lightgray] (2, 0) rectangle (6, -1);
\draw (4, -1.5) rectangle (10, -2.5);
\draw (6, -3) rectangle (18, -4);
\draw[fill=lightgray] (12, -4.5) rectangle (16, -5.5);
\node at (2.5,-0.5) {$A$};
\node at (4.5,-2) {$B$};
\node at (6.5,-3.5) {$C$};
\node at (12.5,-5) {$D$};
\end{scope}
\end{tikzpicture}
\end{center}
However, selecting short events is not always
a correct strategy. For example, the algorithm fails
in the following case:
\begin{center}
\begin{tikzpicture}[scale=.4]
\begin{scope}
\draw (1, 0) rectangle (7, -1);
\draw[fill=lightgray] (6, -1.5) rectangle (9, -2.5);
\draw (8, -3) rectangle (14, -4);
\end{scope}
\end{tikzpicture}
\end{center}
If we select the short event, we can only select one event.
However, it would be possible to select both long events.
\subsubsection*{Algorithm 2}
Another idea is to always select the next possible
event that \emph{begins} as \emph{early} as possible.
This algorithm selects the following events:
\begin{center}
\begin{tikzpicture}[scale=.4]
\begin{scope}
\draw[fill=lightgray] (2, 0) rectangle (6, -1);
\draw (4, -1.5) rectangle (10, -2.5);
\draw[fill=lightgray] (6, -3) rectangle (18, -4);
\draw (12, -4.5) rectangle (16, -5.5);
\node at (2.5,-0.5) {$A$};
\node at (4.5,-2) {$B$};
\node at (6.5,-3.5) {$C$};
\node at (12.5,-5) {$D$};
\end{scope}
\end{tikzpicture}
\end{center}
However, we can find a counterexample
also for this algorithm.
For example, in the following case,
the algorithm only selects one event:
\begin{center}
\begin{tikzpicture}[scale=.4]
\begin{scope}
\draw[fill=lightgray] (1, 0) rectangle (14, -1);
\draw (3, -1.5) rectangle (7, -2.5);
\draw (8, -3) rectangle (12, -4);
\end{scope}
\end{tikzpicture}
\end{center}
If we select the first event, it is not possible
to select any other events.
However, it would be possible to select the
other two events.
\subsubsection*{Algorithm 3}
The third idea is to always select the next
possible event that \emph{ends} as \emph{early} as possible.
This algorithm selects the following events:
\begin{center}
\begin{tikzpicture}[scale=.4]
\begin{scope}
\draw[fill=lightgray] (2, 0) rectangle (6, -1);
\draw (4, -1.5) rectangle (10, -2.5);
\draw (6, -3) rectangle (18, -4);
\draw[fill=lightgray] (12, -4.5) rectangle (16, -5.5);
\node at (2.5,-0.5) {$A$};
\node at (4.5,-2) {$B$};
\node at (6.5,-3.5) {$C$};
\node at (12.5,-5) {$D$};
\end{scope}
\end{tikzpicture}
\end{center}
It turns out that this algorithm
\emph{always} produces an optimal solution.
The reason for this is that it is always an optimal choice
to first select an event that ends
as early as possible.
After this, it is an optimal choice
to select the next event
using the same strategy, etc.,
until we cannot select any more events.
One way to argue that the algorithm works
is to consider
what happens if we first select an event
that ends later than the event that ends
as early as possible.
Now, we will have at most an equal number of
choices how we can select the next event.
Hence, selecting an event that ends later
can never yield a better solution,
and the greedy algorithm is correct.
\section{Tasks and deadlines}
Let us now consider a problem where
we are given $n$ tasks with durations and deadlines
and our task is to choose an order to perform the tasks.
For each task, we earn $d-x$ points
where $d$ is the task's deadline
and $x$ is the moment when we finish the task.
What is the largest possible total score
we can obtain?
For example, suppose that the tasks are as follows:
\begin{center}
\begin{tabular}{lll}
task & duration & deadline \\
\hline
$A$ & 4 & 2 \\
$B$ & 3 & 5 \\
$C$ & 2 & 7 \\
$D$ & 4 & 5 \\
\end{tabular}
\end{center}
In this case, an optimal schedule for the tasks
is as follows:
\begin{center}
\begin{tikzpicture}[scale=.4]
\begin{scope}
\draw (0, 0) rectangle (4, -1);
\draw (4, 0) rectangle (10, -1);
\draw (10, 0) rectangle (18, -1);
\draw (18, 0) rectangle (26, -1);
\node at (0.5,-0.5) {$C$};
\node at (4.5,-0.5) {$B$};
\node at (10.5,-0.5) {$A$};
\node at (18.5,-0.5) {$D$};
\draw (0,1.5) -- (26,1.5);
\foreach \i in {0,2,...,26}
{
\draw (\i,1.25) -- (\i,1.75);
}
\footnotesize
\node at (0,2.5) {0};
\node at (10,2.5) {5};
\node at (20,2.5) {10};
\end{scope}
\end{tikzpicture}
\end{center}
In this solution, $C$ yields 5 points,
$B$ yields 0 points, $A$ yields $-7$ points
and $D$ yields $-8$ points,
so the total score is $-10$.
Surprisingly, the optimal solution to the problem
does not depend on the deadlines at all,
but a correct greedy strategy is to simply
perform the tasks \emph{sorted by their durations}
in increasing order.
The reason for this is that if we ever perform
two tasks one after another such that the first task
takes longer than the second task,
we can obtain a better solution if we swap the tasks.
For example, consider the following schedule:
\begin{center}
\begin{tikzpicture}[scale=.4]
\begin{scope}
\draw (0, 0) rectangle (8, -1);
\draw (8, 0) rectangle (12, -1);
\node at (0.5,-0.5) {$X$};
\node at (8.5,-0.5) {$Y$};
\draw [decoration={brace}, decorate, line width=0.3mm] (7.75,-1.5) -- (0.25,-1.5);
\draw [decoration={brace}, decorate, line width=0.3mm] (11.75,-1.5) -- (8.25,-1.5);
\footnotesize
\node at (4,-2.5) {$a$};
\node at (10,-2.5) {$b$};
\end{scope}
\end{tikzpicture}
\end{center}
Here $a>b$, so we should swap the tasks:
\begin{center}
\begin{tikzpicture}[scale=.4]
\begin{scope}
\draw (0, 0) rectangle (4, -1);
\draw (4, 0) rectangle (12, -1);
\node at (0.5,-0.5) {$Y$};
\node at (4.5,-0.5) {$X$};
\draw [decoration={brace}, decorate, line width=0.3mm] (3.75,-1.5) -- (0.25,-1.5);
\draw [decoration={brace}, decorate, line width=0.3mm] (11.75,-1.5) -- (4.25,-1.5);
\footnotesize
\node at (2,-2.5) {$b$};
\node at (8,-2.5) {$a$};
\end{scope}
\end{tikzpicture}
\end{center}
Now $X$ gives $b$ points less and $Y$ gives $a$ points more,
so the total score increases by $a-b > 0$.
In an optimal solution,
for any two consecutive tasks,
it must hold that the shorter task comes
before the longer task.
Thus, the tasks must be performed
sorted by their durations.
\section{Minimizing sums}
We next consider a problem where
we are given $n$ numbers $a_1,a_2,\ldots,a_n$
and our task is to find a value $x$
that minimizes the sum
\[|a_1-x|^c+|a_2-x|^c+\cdots+|a_n-x|^c.\]
We focus on the cases $c=1$ and $c=2$.
\subsubsection{Case $c=1$}
In this case, we should minimize the sum
\[|a_1-x|+|a_2-x|+\cdots+|a_n-x|.\]
For example, if the numbers are $[1,2,9,2,6]$,
the best solution is to select $x=2$
which produces the sum
\[
|1-2|+|2-2|+|9-2|+|2-2|+|6-2|=12.
\]
In the general case, the best choice for $x$
is the \textit{median} of the numbers,
i.e., the middle number after sorting.
For example, the list $[1,2,9,2,6]$
becomes $[1,2,2,6,9]$ after sorting,
so the median is 2.
The median is an optimal choice,
because if $x$ is smaller than the median,
the sum becomes smaller by increasing $x$,
and if $x$ is larger then the median,
the sum becomes smaller by decreasing $x$.
Hence, the optimal solution is that $x$
is the median.
If $n$ is even and there are two medians,
both medians and all values between them
are optimal choices.
\subsubsection{Case $c=2$}
In this case, we should minimize the sum
\[(a_1-x)^2+(a_2-x)^2+\cdots+(a_n-x)^2.\]
For example, if the numbers are $[1,2,9,2,6]$,
the best solution is to select $x=4$
which produces the sum
\[
(1-4)^2+(2-4)^2+(9-4)^2+(2-4)^2+(6-4)^2=46.
\]
In the general case, the best choice for $x$
is the \emph{average} of the numbers.
In the example the average is $(1+2+9+2+6)/5=4$.
This result can be derived by presenting
the sum as follows:
\[
nx^2 - 2x(a_1+a_2+\cdots+a_n) + (a_1^2+a_2^2+\cdots+a_n^2)
\]
The last part does not depend on $x$,
so we can ignore it.
The remaining parts form a function
$nx^2-2xs$ where $s=a_1+a_2+\cdots+a_n$.
This is a parabola opening upwards
with roots $x=0$ and $x=2s/n$,
and the minimum value is the average
of the roots $x=s/n$, i.e.,
the average of the numbers $a_1,a_2,\ldots,a_n$.
\section{Data compression}
\index{data compression}
\index{binary code}
\index{codeword}
A \key{binary code} assigns for each character
of a string a \key{codeword} that consists of bits.
We can \emph{compress} the string using the binary code
by replacing each character by the
corresponding codeword.
For example, the following binary code
assigns codewords for characters
\texttt{A}\texttt{D}:
\begin{center}
\begin{tabular}{rr}
character & codeword \\
\hline
\texttt{A} & 00 \\
\texttt{B} & 01 \\
\texttt{C} & 10 \\
\texttt{D} & 11 \\
\end{tabular}
\end{center}
This is a \key{constant-length} code
which means that the length of each
codeword is the same.
For example, we can compress the string
\texttt{AABACDACA} as follows:
\[00\,00\,01\,00\,10\,11\,00\,10\,00\]
Using this code, the length of the compressed
string is 18 bits.
However, we can compress the string better
if we use a \key{variable-length} code
where codewords may have different lengths.
Then we can give short codewords for
characters that appear often
and long codewords for characters
that appear rarely.
It turns out that an \key{optimal} code
for the above string is as follows:
\begin{center}
\begin{tabular}{rr}
character & codeword \\
\hline
\texttt{A} & 0 \\
\texttt{B} & 110 \\
\texttt{C} & 10 \\
\texttt{D} & 111 \\
\end{tabular}
\end{center}
An optimal code produces a compressed string
that is as short as possible.
In this case, the compressed string using
the optimal code is
\[0\,0\,110\,0\,10\,111\,0\,10\,0,\]
so only 15 bits are needed instead of 18 bits.
Thus, thanks to a better code it was possible to
save 3 bits in the compressed string.
We require that no codeword
is a prefix of another codeword.
For example, it is not allowed that a code
would contain both codewords 10
and 1011.
The reason for this is that we want
to be able to generate the original string
from the compressed string.
If a codeword could be a prefix of another codeword,
this would not always be possible.
For example, the following code is \emph{not} valid:
\begin{center}
\begin{tabular}{rr}
character & codeword \\
\hline
\texttt{A} & 10 \\
\texttt{B} & 11 \\
\texttt{C} & 1011 \\
\texttt{D} & 111 \\
\end{tabular}
\end{center}
Using this code, it would not be possible to know
if the compressed string 1011 corresponds to
the string \texttt{AB} or the string \texttt{C}.
\index{Huffman coding}
\subsubsection{Huffman coding}
\key{Huffman coding}\footnote{D. A. Huffman discovered this method
when solving a university course assignment
and published the algorithm in 1952 \cite{huf52}.} is a greedy algorithm
that constructs an optimal code for
compressing a given string.
The algorithm builds a binary tree
based on the frequencies of the characters
in the string,
and each character's codeword can be read
by following a path from the root to
the corresponding node.
A move to the left corresponds to bit 0,
and a move to the right corresponds to bit 1.
Initially, each character of the string is
represented by a node whose weight is the
number of times the character occurs in the string.
Then at each step two nodes with minimum weights
are combined by creating
a new node whose weight is the sum of the weights
of the original nodes.
The process continues until all nodes have been combined.
Next we will see how Huffman coding creates
the optimal code for the string
\texttt{AABACDACA}.
Initially, there are four nodes that correspond
to the characters of the string:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,0) {$5$};
\node[draw, circle] (2) at (2,0) {$1$};
\node[draw, circle] (3) at (4,0) {$2$};
\node[draw, circle] (4) at (6,0) {$1$};
\node[color=blue] at (0,-0.75) {\texttt{A}};
\node[color=blue] at (2,-0.75) {\texttt{B}};
\node[color=blue] at (4,-0.75) {\texttt{C}};
\node[color=blue] at (6,-0.75) {\texttt{D}};
%\path[draw,thick,-] (4) -- (5);
\end{tikzpicture}
\end{center}
The node that represents character \texttt{A}
has weight 5 because character \texttt{A}
appears 5 times in the string.
The other weights have been calculated
in the same way.
The first step is to combine the nodes that
correspond to characters \texttt{B} and \texttt{D},
both with weight 1.
The result is:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,0) {$5$};
\node[draw, circle] (3) at (2,0) {$2$};
\node[draw, circle] (2) at (4,0) {$1$};
\node[draw, circle] (4) at (6,0) {$1$};
\node[draw, circle] (5) at (5,1) {$2$};
\node[color=blue] at (0,-0.75) {\texttt{A}};
\node[color=blue] at (2,-0.75) {\texttt{C}};
\node[color=blue] at (4,-0.75) {\texttt{B}};
\node[color=blue] at (6,-0.75) {\texttt{D}};
\node at (4.3,0.7) {0};
\node at (5.7,0.7) {1};
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (4) -- (5);
\end{tikzpicture}
\end{center}
After this, the nodes with weight 2 are combined:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,0) {$5$};
\node[draw, circle] (3) at (3,1) {$2$};
\node[draw, circle] (2) at (4,0) {$1$};
\node[draw, circle] (4) at (6,0) {$1$};
\node[draw, circle] (5) at (5,1) {$2$};
\node[draw, circle] (6) at (4,2) {$4$};
\node[color=blue] at (1,-0.75) {\texttt{A}};
\node[color=blue] at (3,1-0.75) {\texttt{C}};
\node[color=blue] at (4,-0.75) {\texttt{B}};
\node[color=blue] at (6,-0.75) {\texttt{D}};
\node at (4.3,0.7) {0};
\node at (5.7,0.7) {1};
\node at (3.3,1.7) {0};
\node at (4.7,1.7) {1};
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (4) -- (5);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (5) -- (6);
\end{tikzpicture}
\end{center}
Finally, the two remaining nodes are combined:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (2,2) {$5$};
\node[draw, circle] (3) at (3,1) {$2$};
\node[draw, circle] (2) at (4,0) {$1$};
\node[draw, circle] (4) at (6,0) {$1$};
\node[draw, circle] (5) at (5,1) {$2$};
\node[draw, circle] (6) at (4,2) {$4$};
\node[draw, circle] (7) at (3,3) {$9$};
\node[color=blue] at (2,2-0.75) {\texttt{A}};
\node[color=blue] at (3,1-0.75) {\texttt{C}};
\node[color=blue] at (4,-0.75) {\texttt{B}};
\node[color=blue] at (6,-0.75) {\texttt{D}};
\node at (4.3,0.7) {0};
\node at (5.7,0.7) {1};
\node at (3.3,1.7) {0};
\node at (4.7,1.7) {1};
\node at (2.3,2.7) {0};
\node at (3.7,2.7) {1};
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (4) -- (5);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (5) -- (6);
\path[draw,thick,-] (1) -- (7);
\path[draw,thick,-] (6) -- (7);
\end{tikzpicture}
\end{center}
Now all nodes are in the tree, so the code is ready.
The following codewords can be read from the tree:
\begin{center}
\begin{tabular}{rr}
character & codeword \\
\hline
\texttt{A} & 0 \\
\texttt{B} & 110 \\
\texttt{C} & 10 \\
\texttt{D} & 111 \\
\end{tabular}
\end{center}

1049
chapter07.tex Normal file

File diff suppressed because it is too large Load Diff

732
chapter08.tex Normal file
View File

@ -0,0 +1,732 @@
\chapter{Amortized analysis}
\index{amortized analysis}
The time complexity of an algorithm
is often easy to analyze
just by examining the structure
of the algorithm:
what loops does the algorithm contain
and how many times the loops are performed.
However, sometimes a straightforward analysis
does not give a true picture of the efficiency of the algorithm.
\key{Amortized analysis} can be used to analyze
algorithms that contain operations whose
time complexity varies.
The idea is to estimate the total time used to
all such operations during the
execution of the algorithm, instead of focusing
on individual operations.
\section{Two pointers method}
\index{two pointers method}
In the \key{two pointers method},
two pointers are used to
iterate through the array values.
Both pointers can move to one direction only,
which ensures that the algorithm works efficiently.
Next we discuss two problems that can be solved
using the two pointers method.
\subsubsection{Subarray sum}
As the first example,
consider a problem where we are
given an array of $n$ positive integers
and a target sum $x$,
and we want to find a subarray whose sum is $x$
or report that there is no such subarray.
For example, the array
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$2$};
\node at (3.5,0.5) {$5$};
\node at (4.5,0.5) {$1$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$2$};
\node at (7.5,0.5) {$3$};
\end{tikzpicture}
\end{center}
contains a subarray whose sum is 8:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (2,0) rectangle (5,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$2$};
\node at (3.5,0.5) {$5$};
\node at (4.5,0.5) {$1$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$2$};
\node at (7.5,0.5) {$3$};
\end{tikzpicture}
\end{center}
This problem can be solved in
$O(n)$ time by using the two pointers method.
The idea is to maintain pointers that point to the
first and last value of a subarray.
On each turn, the left pointer moves one step
to the right, and the right pointer moves to the right
as long as the resulting subarray sum is at most $x$.
If the sum becomes exactly $x$,
a solution has been found.
As an example, consider the following array
and a target sum $x=8$:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$2$};
\node at (3.5,0.5) {$5$};
\node at (4.5,0.5) {$1$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$2$};
\node at (7.5,0.5) {$3$};
\end{tikzpicture}
\end{center}
The initial subarray contains the values
1, 3 and 2 whose sum is 6:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (0,0) rectangle (3,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$2$};
\node at (3.5,0.5) {$5$};
\node at (4.5,0.5) {$1$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$2$};
\node at (7.5,0.5) {$3$};
\draw[thick,->] (0.5,-0.7) -- (0.5,-0.1);
\draw[thick,->] (2.5,-0.7) -- (2.5,-0.1);
\end{tikzpicture}
\end{center}
Then, the left pointer moves one step to the right.
The right pointer does not move, because otherwise
the subarray sum would exceed $x$.
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (1,0) rectangle (3,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$2$};
\node at (3.5,0.5) {$5$};
\node at (4.5,0.5) {$1$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$2$};
\node at (7.5,0.5) {$3$};
\draw[thick,->] (1.5,-0.7) -- (1.5,-0.1);
\draw[thick,->] (2.5,-0.7) -- (2.5,-0.1);
\end{tikzpicture}
\end{center}
Again, the left pointer moves one step to the right,
and this time the right pointer moves three
steps to the right.
The subarray sum is $2+5+1=8$, so a subarray
whose sum is $x$ has been found.
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (2,0) rectangle (5,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$2$};
\node at (3.5,0.5) {$5$};
\node at (4.5,0.5) {$1$};
\node at (5.5,0.5) {$1$};
\node at (6.5,0.5) {$2$};
\node at (7.5,0.5) {$3$};
\draw[thick,->] (2.5,-0.7) -- (2.5,-0.1);
\draw[thick,->] (4.5,-0.7) -- (4.5,-0.1);
\end{tikzpicture}
\end{center}
The running time of the algorithm depends on
the number of steps the right pointer moves.
While there is no useful upper bound on how many steps the
pointer can move on a \emph{single} turn.
we know that the pointer moves \emph{a total of}
$O(n)$ steps during the algorithm,
because it only moves to the right.
Since both the left and right pointer
move $O(n)$ steps during the algorithm,
the algorithm works in $O(n)$ time.
\subsubsection{2SUM problem}
\index{2SUM problem}
Another problem that can be solved using
the two pointers method is the following problem,
also known as the \key{2SUM problem}:
given an array of $n$ numbers and
a target sum $x$, find
two array values such that their sum is $x$,
or report that no such values exist.
To solve the problem, we first
sort the array values in increasing order.
After that, we iterate through the array using
two pointers.
The left pointer starts at the first value
and moves one step to the right on each turn.
The right pointer begins at the last value
and always moves to the left until the sum of the
left and right value is at most $x$.
If the sum is exactly $x$,
a solution has been found.
For example, consider the following array
and a target sum $x=12$:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$5$};
\node at (3.5,0.5) {$6$};
\node at (4.5,0.5) {$7$};
\node at (5.5,0.5) {$9$};
\node at (6.5,0.5) {$9$};
\node at (7.5,0.5) {$10$};
\end{tikzpicture}
\end{center}
The initial positions of the pointers
are as follows.
The sum of the values is $1+10=11$
that is smaller than $x$.
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (0,0) rectangle (1,1);
\fill[color=lightgray] (7,0) rectangle (8,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$5$};
\node at (3.5,0.5) {$6$};
\node at (4.5,0.5) {$7$};
\node at (5.5,0.5) {$9$};
\node at (6.5,0.5) {$9$};
\node at (7.5,0.5) {$10$};
\draw[thick,->] (0.5,-0.7) -- (0.5,-0.1);
\draw[thick,->] (7.5,-0.7) -- (7.5,-0.1);
\end{tikzpicture}
\end{center}
Then the left pointer moves one step to the right.
The right pointer moves three steps to the left,
and the sum becomes $4+7=11$.
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (1,0) rectangle (2,1);
\fill[color=lightgray] (4,0) rectangle (5,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$5$};
\node at (3.5,0.5) {$6$};
\node at (4.5,0.5) {$7$};
\node at (5.5,0.5) {$9$};
\node at (6.5,0.5) {$9$};
\node at (7.5,0.5) {$10$};
\draw[thick,->] (1.5,-0.7) -- (1.5,-0.1);
\draw[thick,->] (4.5,-0.7) -- (4.5,-0.1);
\end{tikzpicture}
\end{center}
After this, the left pointer moves one step to the right again.
The right pointer does not move, and a solution
$5+7=12$ has been found.
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (2,0) rectangle (3,1);
\fill[color=lightgray] (4,0) rectangle (5,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$4$};
\node at (2.5,0.5) {$5$};
\node at (3.5,0.5) {$6$};
\node at (4.5,0.5) {$7$};
\node at (5.5,0.5) {$9$};
\node at (6.5,0.5) {$9$};
\node at (7.5,0.5) {$10$};
\draw[thick,->] (2.5,-0.7) -- (2.5,-0.1);
\draw[thick,->] (4.5,-0.7) -- (4.5,-0.1);
\end{tikzpicture}
\end{center}
The running time of the algorithm is
$O(n \log n)$, because it first sorts
the array in $O(n \log n)$ time,
and then both pointers move $O(n)$ steps.
Note that it is possible to solve the problem
in another way in $O(n \log n)$ time using binary search.
In such a solution, we iterate through the array
and for each array value, we try to find another
value that yields the sum $x$.
This can be done by performing $n$ binary searches,
each of which takes $O(\log n)$ time.
\index{3SUM problem}
A more difficult problem is
the \key{3SUM problem} that asks to
find \emph{three} array values
whose sum is $x$.
Using the idea of the above algorithm,
this problem can be solved in $O(n^2)$ time\footnote{For a long time,
it was thought that solving
the 3SUM problem more efficiently than in $O(n^2)$ time
would not be possible.
However, in 2014, it turned out \cite{gro14}
that this is not the case.}.
Can you see how?
\section{Nearest smaller elements}
\index{nearest smaller elements}
Amortized analysis is often used to
estimate the number of operations
performed on a data structure.
The operations may be distributed unevenly so
that most operations occur during a
certain phase of the algorithm, but the total
number of the operations is limited.
As an example, consider the problem
of finding for each array element
the \key{nearest smaller element}, i.e.,
the first smaller element that precedes the element
in the array.
It is possible that no such element exists,
in which case the algorithm should report this.
Next we will see how the problem can be
efficiently solved using a stack structure.
We go through the array from left to right
and maintain a stack of array elements.
At each array position, we remove elements from the stack
until the top element is smaller than the
current element, or the stack is empty.
Then, we report that the top element is
the nearest smaller element of the current element,
or if the stack is empty, there is no such element.
Finally, we add the current element to the stack.
As an example, consider the following array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$2$};
\node at (4.5,0.5) {$5$};
\node at (5.5,0.5) {$3$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\end{tikzpicture}
\end{center}
First, the elements 1, 3 and 4 are added to the stack,
because each element is larger than the previous element.
Thus, the nearest smaller element of 4 is 3,
and the nearest smaller element of 3 is 1.
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (2,0) rectangle (3,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$2$};
\node at (4.5,0.5) {$5$};
\node at (5.5,0.5) {$3$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\draw (0.2,0.2-1.2) rectangle (0.8,0.8-1.2);
\draw (1.2,0.2-1.2) rectangle (1.8,0.8-1.2);
\draw (2.2,0.2-1.2) rectangle (2.8,0.8-1.2);
\node at (0.5,0.5-1.2) {$1$};
\node at (1.5,0.5-1.2) {$3$};
\node at (2.5,0.5-1.2) {$4$};
\draw[->,thick] (0.8,0.5-1.2) -- (1.2,0.5-1.2);
\draw[->,thick] (1.8,0.5-1.2) -- (2.2,0.5-1.2);
\end{tikzpicture}
\end{center}
The next element 2 is smaller than the two top
elements in the stack.
Thus, the elements 3 and 4 are removed from the stack,
and then the element 2 is added to the stack.
Its nearest smaller element is 1:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (3,0) rectangle (4,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$2$};
\node at (4.5,0.5) {$5$};
\node at (5.5,0.5) {$3$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\draw (0.2,0.2-1.2) rectangle (0.8,0.8-1.2);
\draw (3.2,0.2-1.2) rectangle (3.8,0.8-1.2);
\node at (0.5,0.5-1.2) {$1$};
\node at (3.5,0.5-1.2) {$2$};
\draw[->,thick] (0.8,0.5-1.2) -- (3.2,0.5-1.2);
\end{tikzpicture}
\end{center}
Then, the element 5 is larger than the element 2,
so it will be added to the stack, and
its nearest smaller element is 2:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (4,0) rectangle (5,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$2$};
\node at (4.5,0.5) {$5$};
\node at (5.5,0.5) {$3$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\draw (0.2,0.2-1.2) rectangle (0.8,0.8-1.2);
\draw (3.2,0.2-1.2) rectangle (3.8,0.8-1.2);
\draw (4.2,0.2-1.2) rectangle (4.8,0.8-1.2);
\node at (0.5,0.5-1.2) {$1$};
\node at (3.5,0.5-1.2) {$2$};
\node at (4.5,0.5-1.2) {$5$};
\draw[->,thick] (0.8,0.5-1.2) -- (3.2,0.5-1.2);
\draw[->,thick] (3.8,0.5-1.2) -- (4.2,0.5-1.2);
\end{tikzpicture}
\end{center}
After this, the element 5 is removed from the stack
and the elements 3 and 4 are added to the stack:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (6,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$2$};
\node at (4.5,0.5) {$5$};
\node at (5.5,0.5) {$3$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\draw (0.2,0.2-1.2) rectangle (0.8,0.8-1.2);
\draw (3.2,0.2-1.2) rectangle (3.8,0.8-1.2);
\draw (5.2,0.2-1.2) rectangle (5.8,0.8-1.2);
\draw (6.2,0.2-1.2) rectangle (6.8,0.8-1.2);
\node at (0.5,0.5-1.2) {$1$};
\node at (3.5,0.5-1.2) {$2$};
\node at (5.5,0.5-1.2) {$3$};
\node at (6.5,0.5-1.2) {$4$};
\draw[->,thick] (0.8,0.5-1.2) -- (3.2,0.5-1.2);
\draw[->,thick] (3.8,0.5-1.2) -- (5.2,0.5-1.2);
\draw[->,thick] (5.8,0.5-1.2) -- (6.2,0.5-1.2);
\end{tikzpicture}
\end{center}
Finally, all elements except 1 are removed
from the stack and the last element 2
is added to the stack:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (7,0) rectangle (8,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$1$};
\node at (1.5,0.5) {$3$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$2$};
\node at (4.5,0.5) {$5$};
\node at (5.5,0.5) {$3$};
\node at (6.5,0.5) {$4$};
\node at (7.5,0.5) {$2$};
\draw (0.2,0.2-1.2) rectangle (0.8,0.8-1.2);
\draw (7.2,0.2-1.2) rectangle (7.8,0.8-1.2);
\node at (0.5,0.5-1.2) {$1$};
\node at (7.5,0.5-1.2) {$2$};
\draw[->,thick] (0.8,0.5-1.2) -- (7.2,0.5-1.2);
\end{tikzpicture}
\end{center}
The efficiency of the algorithm depends on
the total number of stack operations.
If the current element is larger than
the top element in the stack, it is directly
added to the stack, which is efficient.
However, sometimes the stack can contain several
larger elements and it takes time to remove them.
Still, each element is added \emph{exactly once} to the stack
and removed \emph{at most once} from the stack.
Thus, each element causes $O(1)$ stack operations,
and the algorithm works in $O(n)$ time.
\section{Sliding window minimum}
\index{sliding window}
\index{sliding window minimum}
A \key{sliding window} is a constant-size subarray
that moves from left to right through the array.
At each window position,
we want to calculate some information
about the elements inside the window.
In this section, we focus on the problem
of maintaining the \key{sliding window minimum},
which means that
we should report the smallest value inside each window.
The sliding window minimum can be calculated
using a similar idea that we used to calculate
the nearest smaller elements.
We maintain a queue
where each element is larger than
the previous element,
and the first element
always corresponds to the minimum element inside the window.
After each window move,
we remove elements from the end of the queue
until the last queue element
is smaller than the new window element,
or the queue becomes empty.
We also remove the first queue element
if it is not inside the window anymore.
Finally, we add the new window element
to the end of the queue.
As an example, consider the following array:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$2$};
\node at (1.5,0.5) {$1$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$5$};
\node at (4.5,0.5) {$3$};
\node at (5.5,0.5) {$4$};
\node at (6.5,0.5) {$1$};
\node at (7.5,0.5) {$2$};
\end{tikzpicture}
\end{center}
Suppose that the size of the sliding window is 4.
At the first window position, the smallest value is 1:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (0,0) rectangle (4,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$2$};
\node at (1.5,0.5) {$1$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$5$};
\node at (4.5,0.5) {$3$};
\node at (5.5,0.5) {$4$};
\node at (6.5,0.5) {$1$};
\node at (7.5,0.5) {$2$};
\draw (1.2,0.2-1.2) rectangle (1.8,0.8-1.2);
\draw (2.2,0.2-1.2) rectangle (2.8,0.8-1.2);
\draw (3.2,0.2-1.2) rectangle (3.8,0.8-1.2);
\node at (1.5,0.5-1.2) {$1$};
\node at (2.5,0.5-1.2) {$4$};
\node at (3.5,0.5-1.2) {$5$};
\draw[->,thick] (1.8,0.5-1.2) -- (2.2,0.5-1.2);
\draw[->,thick] (2.8,0.5-1.2) -- (3.2,0.5-1.2);
\end{tikzpicture}
\end{center}
Then the window moves one step right.
The new element 3 is smaller than the elements
4 and 5 in the queue, so the elements 4 and 5
are removed from the queue
and the element 3 is added to the queue.
The smallest value is still 1.
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (1,0) rectangle (5,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$2$};
\node at (1.5,0.5) {$1$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$5$};
\node at (4.5,0.5) {$3$};
\node at (5.5,0.5) {$4$};
\node at (6.5,0.5) {$1$};
\node at (7.5,0.5) {$2$};
\draw (1.2,0.2-1.2) rectangle (1.8,0.8-1.2);
\draw (4.2,0.2-1.2) rectangle (4.8,0.8-1.2);
\node at (1.5,0.5-1.2) {$1$};
\node at (4.5,0.5-1.2) {$3$};
\draw[->,thick] (1.8,0.5-1.2) -- (4.2,0.5-1.2);
\end{tikzpicture}
\end{center}
After this, the window moves again,
and the smallest element 1
does not belong to the window anymore.
Thus, it is removed from the queue and the smallest
value is now 3. Also the new element 4
is added to the queue.
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (2,0) rectangle (6,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$2$};
\node at (1.5,0.5) {$1$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$5$};
\node at (4.5,0.5) {$3$};
\node at (5.5,0.5) {$4$};
\node at (6.5,0.5) {$1$};
\node at (7.5,0.5) {$2$};
\draw (4.2,0.2-1.2) rectangle (4.8,0.8-1.2);
\draw (5.2,0.2-1.2) rectangle (5.8,0.8-1.2);
\node at (4.5,0.5-1.2) {$3$};
\node at (5.5,0.5-1.2) {$4$};
\draw[->,thick] (4.8,0.5-1.2) -- (5.2,0.5-1.2);
\end{tikzpicture}
\end{center}
The next new element 1 is smaller than all elements
in the queue.
Thus, all elements are removed from the queue
and it will only contain the element 1:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (3,0) rectangle (7,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$2$};
\node at (1.5,0.5) {$1$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$5$};
\node at (4.5,0.5) {$3$};
\node at (5.5,0.5) {$4$};
\node at (6.5,0.5) {$1$};
\node at (7.5,0.5) {$2$};
\draw (6.2,0.2-1.2) rectangle (6.8,0.8-1.2);
\node at (6.5,0.5-1.2) {$1$};
\end{tikzpicture}
\end{center}
Finally the window reaches its last position.
The element 2 is added to the queue,
but the smallest value inside the window
is still 1.
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (4,0) rectangle (8,1);
\draw (0,0) grid (8,1);
\node at (0.5,0.5) {$2$};
\node at (1.5,0.5) {$1$};
\node at (2.5,0.5) {$4$};
\node at (3.5,0.5) {$5$};
\node at (4.5,0.5) {$3$};
\node at (5.5,0.5) {$4$};
\node at (6.5,0.5) {$1$};
\node at (7.5,0.5) {$2$};
\draw (6.2,0.2-1.2) rectangle (6.8,0.8-1.2);
\draw (7.2,0.2-1.2) rectangle (7.8,0.8-1.2);
\node at (6.5,0.5-1.2) {$1$};
\node at (7.5,0.5-1.2) {$2$};
\draw[->,thick] (6.8,0.5-1.2) -- (7.2,0.5-1.2);
\end{tikzpicture}
\end{center}
Since each array element
is added to the queue exactly once and
removed from the queue at most once,
the algorithm works in $O(n)$ time.

1403
chapter09.tex Normal file

File diff suppressed because it is too large Load Diff

849
chapter10.tex Normal file
View File

@ -0,0 +1,849 @@
\chapter{Bit manipulation}
All data in computer programs is internally stored as bits,
i.e., as numbers 0 and 1.
This chapter discusses the bit representation
of integers, and shows examples
of how to use bit operations.
It turns out that there are many uses for
bit manipulation in algorithm programming.
\section{Bit representation}
\index{bit representation}
In programming, an $n$ bit integer is internally
stored as a binary number that consists of $n$ bits.
For example, the C++ type \texttt{int} is
a 32-bit type, which means that every \texttt{int}
number consists of 32 bits.
Here is the bit representation of
the \texttt{int} number 43:
\[00000000000000000000000000101011\]
The bits in the representation are indexed from right to left.
To convert a bit representation $b_k \cdots b_2 b_1 b_0$ into a number,
we can use the formula
\[b_k 2^k + \ldots + b_2 2^2 + b_1 2^1 + b_0 2^0.\]
For example,
\[1 \cdot 2^5 + 1 \cdot 2^3 + 1 \cdot 2^1 + 1 \cdot 2^0 = 43.\]
The bit representation of a number is either
\key{signed} or \key{unsigned}.
Usually a signed representation is used,
which means that both negative and positive
numbers can be represented.
A signed variable of $n$ bits can contain any
integer between $-2^{n-1}$ and $2^{n-1}-1$.
For example, the \texttt{int} type in C++ is
a signed type, so an \texttt{int} variable can contain any
integer between $-2^{31}$ and $2^{31}-1$.
The first bit in a signed representation
is the sign of the number (0 for nonnegative numbers
and 1 for negative numbers), and
the remaining $n-1$ bits contain the magnitude of the number.
\key{Two's complement} is used, which means that the
opposite number of a number is calculated by first
inverting all the bits in the number,
and then increasing the number by one.
For example, the bit representation of
the \texttt{int} number $-43$ is
\[11111111111111111111111111010101.\]
In an unsigned representation, only nonnegative
numbers can be used, but the upper bound for the values is larger.
An unsigned variable of $n$ bits can contain any
integer between $0$ and $2^n-1$.
For example, in C++, an \texttt{unsigned int} variable
can contain any integer between $0$ and $2^{32}-1$.
There is a connection between the
representations:
a signed number $-x$ equals an unsigned number $2^n-x$.
For example, the following code shows that
the signed number $x=-43$ equals the unsigned
number $y=2^{32}-43$:
\begin{lstlisting}
int x = -43;
unsigned int y = x;
cout << x << "\n"; // -43
cout << y << "\n"; // 4294967253
\end{lstlisting}
If a number is larger than the upper bound
of the bit representation, the number will overflow.
In a signed representation,
the next number after $2^{n-1}-1$ is $-2^{n-1}$,
and in an unsigned representation,
the next number after $2^n-1$ is $0$.
For example, consider the following code:
\begin{lstlisting}
int x = 2147483647
cout << x << "\n"; // 2147483647
x++;
cout << x << "\n"; // -2147483648
\end{lstlisting}
Initially, the value of $x$ is $2^{31}-1$.
This is the largest value that can be stored
in an \texttt{int} variable,
so the next number after $2^{31}-1$ is $-2^{31}$.
\section{Bit operations}
\newcommand\XOR{\mathbin{\char`\^}}
\subsubsection{And operation}
\index{and operation}
The \key{and} operation $x$ \& $y$ produces a number
that has one bits in positions where both
$x$ and $y$ have one bits.
For example, $22$ \& $26$ = 18, because
\begin{center}
\begin{tabular}{rrr}
& 10110 & (22)\\
\& & 11010 & (26) \\
\hline
= & 10010 & (18) \\
\end{tabular}
\end{center}
Using the and operation, we can check if a number
$x$ is even because
$x$ \& $1$ = 0 if $x$ is even, and
$x$ \& $1$ = 1 if $x$ is odd.
More generally, $x$ is divisible by $2^k$
exactly when $x$ \& $(2^k-1)$ = 0.
\subsubsection{Or operation}
\index{or operation}
The \key{or} operation $x$ | $y$ produces a number
that has one bits in positions where at least one
of $x$ and $y$ have one bits.
For example, $22$ | $26$ = 30, because
\begin{center}
\begin{tabular}{rrr}
& 10110 & (22)\\
| & 11010 & (26) \\
\hline
= & 11110 & (30) \\
\end{tabular}
\end{center}
\subsubsection{Xor operation}
\index{xor operation}
The \key{xor} operation $x$ $\XOR$ $y$ produces a number
that has one bits in positions where exactly one
of $x$ and $y$ have one bits.
For example, $22$ $\XOR$ $26$ = 12, because
\begin{center}
\begin{tabular}{rrr}
& 10110 & (22)\\
$\XOR$ & 11010 & (26) \\
\hline
= & 01100 & (12) \\
\end{tabular}
\end{center}
\subsubsection{Not operation}
\index{not operation}
The \key{not} operation \textasciitilde$x$
produces a number where all the bits of $x$
have been inverted.
The formula \textasciitilde$x = -x-1$ holds,
for example, \textasciitilde$29 = -30$.
The result of the not operation at the bit level
depends on the length of the bit representation,
because the operation inverts all bits.
For example, if the numbers are 32-bit
\texttt{int} numbers, the result is as follows:
\begin{center}
\begin{tabular}{rrrr}
$x$ & = & 29 & 00000000000000000000000000011101 \\
\textasciitilde$x$ & = & $-30$ & 11111111111111111111111111100010 \\
\end{tabular}
\end{center}
\subsubsection{Bit shifts}
\index{bit shift}
The left bit shift $x < < k$ appends $k$
zero bits to the number,
and the right bit shift $x > > k$
removes the $k$ last bits from the number.
For example, $14 < < 2 = 56$,
because $14$ and $56$ correspond to 1110 and 111000.
Similarly, $49 > > 3 = 6$,
because $49$ and $6$ correspond to 110001 and 110.
Note that $x < < k$
corresponds to multiplying $x$ by $2^k$,
and $x > > k$
corresponds to dividing $x$ by $2^k$
rounded down to an integer.
\subsubsection{Applications}
A number of the form $1 < < k$ has a one bit
in position $k$ and all other bits are zero,
so we can use such numbers to access single bits of numbers.
In particular, the $k$th bit of a number is one
exactly when $x$ \& $(1 < < k)$ is not zero.
The following code prints the bit representation
of an \texttt{int} number $x$:
\begin{lstlisting}
for (int i = 31; i >= 0; i--) {
if (x&(1<<i)) cout << "1";
else cout << "0";
}
\end{lstlisting}
It is also possible to modify single bits
of numbers using similar ideas.
For example, the formula $x$ | $(1 < < k)$
sets the $k$th bit of $x$ to one,
the formula
$x$ \& \textasciitilde $(1 < < k)$
sets the $k$th bit of $x$ to zero,
and the formula
$x$ $\XOR$ $(1 < < k)$
inverts the $k$th bit of $x$.
The formula $x$ \& $(x-1)$ sets the last
one bit of $x$ to zero,
and the formula $x$ \& $-x$ sets all the
one bits to zero, except for the last one bit.
The formula $x$ | $(x-1)$
inverts all the bits after the last one bit.
Also note that a positive number $x$ is
a power of two exactly when $x$ \& $(x-1) = 0$.
\subsubsection*{Additional functions}
The g++ compiler provides the following
functions for counting bits:
\begin{itemize}
\item
$\texttt{\_\_builtin\_clz}(x)$:
the number of zeros at the beginning of the number
\item
$\texttt{\_\_builtin\_ctz}(x)$:
the number of zeros at the end of the number
\item
$\texttt{\_\_builtin\_popcount}(x)$:
the number of ones in the number
\item
$\texttt{\_\_builtin\_parity}(x)$:
the parity (even or odd) of the number of ones
\end{itemize}
\begin{samepage}
The functions can be used as follows:
\begin{lstlisting}
int x = 5328; // 00000000000000000001010011010000
cout << __builtin_clz(x) << "\n"; // 19
cout << __builtin_ctz(x) << "\n"; // 4
cout << __builtin_popcount(x) << "\n"; // 5
cout << __builtin_parity(x) << "\n"; // 1
\end{lstlisting}
\end{samepage}
While the above functions only support \texttt{int} numbers,
there are also \texttt{long long} versions of
the functions available with the suffix \texttt{ll}.
\section{Representing sets}
Every subset of a set
$\{0,1,2,\ldots,n-1\}$
can be represented as an $n$ bit integer
whose one bits indicate which
elements belong to the subset.
This is an efficient way to represent sets,
because every element requires only one bit of memory,
and set operations can be implemented as bit operations.
For example, since \texttt{int} is a 32-bit type,
an \texttt{int} number can represent any subset
of the set $\{0,1,2,\ldots,31\}$.
The bit representation of the set $\{1,3,4,8\}$ is
\[00000000000000000000000100011010,\]
which corresponds to the number $2^8+2^4+2^3+2^1=282$.
\subsubsection{Set implementation}
The following code declares an \texttt{int}
variable $x$ that can contain
a subset of $\{0,1,2,\ldots,31\}$.
After this, the code adds the elements 1, 3, 4 and 8
to the set and prints the size of the set.
\begin{lstlisting}
int x = 0;
x |= (1<<1);
x |= (1<<3);
x |= (1<<4);
x |= (1<<8);
cout << __builtin_popcount(x) << "\n"; // 4
\end{lstlisting}
Then, the following code prints all
elements that belong to the set:
\begin{lstlisting}
for (int i = 0; i < 32; i++) {
if (x&(1<<i)) cout << i << " ";
}
// output: 1 3 4 8
\end{lstlisting}
\subsubsection{Set operations}
Set operations can be implemented as follows as bit operations:
\begin{center}
\begin{tabular}{lll}
& set syntax & bit syntax \\
\hline
intersection & $a \cap b$ & $a$ \& $b$ \\
union & $a \cup b$ & $a$ | $b$ \\
complement & $\bar a$ & \textasciitilde$a$ \\
difference & $a \setminus b$ & $a$ \& (\textasciitilde$b$) \\
\end{tabular}
\end{center}
For example, the following code first constructs
the sets $x=\{1,3,4,8\}$ and $y=\{3,6,8,9\}$,
and then constructs the set $z = x \cup y = \{1,3,4,6,8,9\}$:
\begin{lstlisting}
int x = (1<<1)|(1<<3)|(1<<4)|(1<<8);
int y = (1<<3)|(1<<6)|(1<<8)|(1<<9);
int z = x|y;
cout << __builtin_popcount(z) << "\n"; // 6
\end{lstlisting}
\subsubsection{Iterating through subsets}
The following code goes through
the subsets of $\{0,1,\ldots,n-1\}$:
\begin{lstlisting}
for (int b = 0; b < (1<<n); b++) {
// process subset b
}
\end{lstlisting}
The following code goes through
the subsets with exactly $k$ elements:
\begin{lstlisting}
for (int b = 0; b < (1<<n); b++) {
if (__builtin_popcount(b) == k) {
// process subset b
}
}
\end{lstlisting}
The following code goes through the subsets
of a set $x$:
\begin{lstlisting}
int b = 0;
do {
// process subset b
} while (b=(b-x)&x);
\end{lstlisting}
\section{Bit optimizations}
Many algorithms can be optimized using
bit operations.
Such optimizations do not change the
time complexity of the algorithm,
but they may have a large impact
on the actual running time of the code.
In this section we discuss examples
of such situations.
\subsubsection{Hamming distances}
\index{Hamming distance}
The \key{Hamming distance}
$\texttt{hamming}(a,b)$ between two
strings $a$ and $b$ of equal length is
the number of positions where the strings differ.
For example,
\[\texttt{hamming}(01101,11001)=2.\]
Consider the following problem: Given
a list of $n$ bit strings, each of length $k$,
calculate the minimum Hamming distance
between two strings in the list.
For example, the answer for $[00111,01101,11110]$
is 2, because
\begin{itemize}[noitemsep]
\item $\texttt{hamming}(00111,01101)=2$,
\item $\texttt{hamming}(00111,11110)=3$, and
\item $\texttt{hamming}(01101,11110)=3$.
\end{itemize}
A straightforward way to solve the problem is
to go through all pairs of strings and calculate
their Hamming distances,
which yields an $O(n^2 k)$ time algorithm.
The following function can be used to
calculate distances:
\begin{lstlisting}
int hamming(string a, string b) {
int d = 0;
for (int i = 0; i < k; i++) {
if (a[i] != b[i]) d++;
}
return d;
}
\end{lstlisting}
However, if $k$ is small, we can optimize the code
by storing the bit strings as integers and
calculating the Hamming distances using bit operations.
In particular, if $k \le 32$, we can just store
the strings as \texttt{int} values and use the
following function to calculate distances:
\begin{lstlisting}
int hamming(int a, int b) {
return __builtin_popcount(a^b);
}
\end{lstlisting}
In the above function, the xor operation constructs
a bit string that has one bits in positions
where $a$ and $b$ differ.
Then, the number of bits is calculated using
the \texttt{\_\_builtin\_popcount} function.
To compare the implementations, we generated
a list of 10000 random bit strings of length 30.
Using the first approach, the search took
13.5 seconds, and after the bit optimization,
it only took 0.5 seconds.
Thus, the bit optimized code was almost
30 times faster than the original code.
\subsubsection{Counting subgrids}
As another example, consider the
following problem:
Given an $n \times n$ grid whose
each square is either black (1) or white (0),
calculate the number of subgrids
whose all corners are black.
For example, the grid
\begin{center}
\begin{tikzpicture}[scale=0.5]
\fill[black] (1,1) rectangle (2,2);
\fill[black] (1,4) rectangle (2,5);
\fill[black] (4,1) rectangle (5,2);
\fill[black] (4,4) rectangle (5,5);
\fill[black] (1,3) rectangle (2,4);
\fill[black] (2,3) rectangle (3,4);
\fill[black] (2,1) rectangle (3,2);
\fill[black] (0,2) rectangle (1,3);
\draw (0,0) grid (5,5);
\end{tikzpicture}
\end{center}
contains two such subgrids:
\begin{center}
\begin{tikzpicture}[scale=0.5]
\fill[black] (1,1) rectangle (2,2);
\fill[black] (1,4) rectangle (2,5);
\fill[black] (4,1) rectangle (5,2);
\fill[black] (4,4) rectangle (5,5);
\fill[black] (1,3) rectangle (2,4);
\fill[black] (2,3) rectangle (3,4);
\fill[black] (2,1) rectangle (3,2);
\fill[black] (0,2) rectangle (1,3);
\draw (0,0) grid (5,5);
\fill[black] (7+1,1) rectangle (7+2,2);
\fill[black] (7+1,4) rectangle (7+2,5);
\fill[black] (7+4,1) rectangle (7+5,2);
\fill[black] (7+4,4) rectangle (7+5,5);
\fill[black] (7+1,3) rectangle (7+2,4);
\fill[black] (7+2,3) rectangle (7+3,4);
\fill[black] (7+2,1) rectangle (7+3,2);
\fill[black] (7+0,2) rectangle (7+1,3);
\draw (7+0,0) grid (7+5,5);
\draw[color=red,line width=1mm] (1,1) rectangle (3,4);
\draw[color=red,line width=1mm] (7+1,1) rectangle (7+5,5);
\end{tikzpicture}
\end{center}
There is an $O(n^3)$ time algorithm for solving the problem:
go through all $O(n^2)$ pairs of rows and for each pair
$(a,b)$ calculate the number of columns that contain a black
square in both rows in $O(n)$ time.
The following code assumes that $\texttt{color}[y][x]$
denotes the color in row $y$ and column $x$:
\begin{lstlisting}
int count = 0;
for (int i = 0; i < n; i++) {
if (color[a][i] == 1 && color[b][i] == 1) count++;
}
\end{lstlisting}
Then, those columns
account for $\texttt{count}(\texttt{count}-1)/2$ subgrids with black corners,
because we can choose any two of them to form a subgrid.
To optimize this algorithm, we divide the grid into blocks
of columns such that each block consists of $N$
consecutive columns. Then, each row is stored as
a list of $N$-bit numbers that describe the colors
of the squares. Now we can process $N$ columns at the same time
using bit operations. In the following code,
$\texttt{color}[y][k]$ represents
a block of $N$ colors as bits.
\begin{lstlisting}
int count = 0;
for (int i = 0; i <= n/N; i++) {
count += __builtin_popcount(color[a][i]&color[b][i]);
}
\end{lstlisting}
The resulting algorithm works in $O(n^3/N)$ time.
We generated a random grid of size $2500 \times 2500$
and compared the original and bit optimized implementation.
While the original code took $29.6$ seconds,
the bit optimized version only took $3.1$ seconds
with $N=32$ (\texttt{int} numbers) and $1.7$ seconds
with $N=64$ (\texttt{long long} numbers).
\section{Dynamic programming}
Bit operations provide an efficient and convenient
way to implement dynamic programming algorithms
whose states contain subsets of elements,
because such states can be stored as integers.
Next we discuss examples of combining
bit operations and dynamic programming.
\subsubsection{Optimal selection}
As a first example, consider the following problem:
We are given the prices of $k$ products
over $n$ days, and we want to buy each product
exactly once.
However, we are allowed to buy at most one product
in a day.
What is the minimum total price?
For example, consider the following scenario ($k=3$ and $n=8$):
\begin{center}
\begin{tikzpicture}[scale=.65]
\draw (0, 0) grid (8,3);
\node at (-2.5,2.5) {product 0};
\node at (-2.5,1.5) {product 1};
\node at (-2.5,0.5) {product 2};
\foreach \x in {0,...,7}
{\node at (\x+0.5,3.5) {\x};}
\foreach \x/\v in {0/6,1/9,2/5,3/2,4/8,5/9,6/1,7/6}
{\node at (\x+0.5,2.5) {\v};}
\foreach \x/\v in {0/8,1/2,2/6,3/2,4/7,5/5,6/7,7/2}
{\node at (\x+0.5,1.5) {\v};}
\foreach \x/\v in {0/5,1/3,2/9,3/7,4/3,5/5,6/1,7/4}
{\node at (\x+0.5,0.5) {\v};}
\end{tikzpicture}
\end{center}
In this scenario, the minimum total price is $5$:
\begin{center}
\begin{tikzpicture}[scale=.65]
\fill [color=lightgray] (1, 1) rectangle (2, 2);
\fill [color=lightgray] (3, 2) rectangle (4, 3);
\fill [color=lightgray] (6, 0) rectangle (7, 1);
\draw (0, 0) grid (8,3);
\node at (-2.5,2.5) {product 0};
\node at (-2.5,1.5) {product 1};
\node at (-2.5,0.5) {product 2};
\foreach \x in {0,...,7}
{\node at (\x+0.5,3.5) {\x};}
\foreach \x/\v in {0/6,1/9,2/5,3/2,4/8,5/9,6/1,7/6}
{\node at (\x+0.5,2.5) {\v};}
\foreach \x/\v in {0/8,1/2,2/6,3/2,4/7,5/5,6/7,7/2}
{\node at (\x+0.5,1.5) {\v};}
\foreach \x/\v in {0/5,1/3,2/9,3/7,4/3,5/5,6/1,7/4}
{\node at (\x+0.5,0.5) {\v};}
\end{tikzpicture}
\end{center}
Let $\texttt{price}[x][d]$ denote the price of product $x$
on day $d$.
For example, in the above scenario $\texttt{price}[2][3] = 7$.
Then, let $\texttt{total}(S,d)$ denote the minimum total
price for buying a subset $S$ of products by day $d$.
Using this function, the solution to the problem is
$\texttt{total}(\{0 \ldots k-1\},n-1)$.
First, $\texttt{total}(\emptyset,d) = 0$,
because it does not cost anything to buy an empty set,
and $\texttt{total}(\{x\},0) = \texttt{price}[x][0]$,
because there is one way to buy one product on the first day.
Then, the following recurrence can be used:
\begin{equation*}
\begin{split}
\texttt{total}(S,d) = \min( & \texttt{total}(S,d-1), \\
& \min_{x \in S} (\texttt{total}(S \setminus x,d-1)+\texttt{price}[x][d]))
\end{split}
\end{equation*}
This means that we either do not buy any product on day $d$
or buy a product $x$ that belongs to $S$.
In the latter case, we remove $x$ from $S$ and add the
price of $x$ to the total price.
The next step is to calculate the values of the function
using dynamic programming.
To store the function values, we declare an array
\begin{lstlisting}
int total[1<<K][N];
\end{lstlisting}
where $K$ and $N$ are suitably large constants.
The first dimension of the array corresponds to a bit
representation of a subset.
First, the cases where $d=0$ can be processed as follows:
\begin{lstlisting}
for (int x = 0; x < k; x++) {
total[1<<x][0] = price[x][0];
}
\end{lstlisting}
Then, the recurrence translates into the following code:
\begin{lstlisting}
for (int d = 1; d < n; d++) {
for (int s = 0; s < (1<<k); s++) {
total[s][d] = total[s][d-1];
for (int x = 0; x < k; x++) {
if (s&(1<<x)) {
total[s][d] = min(total[s][d],
total[s^(1<<x)][d-1]+price[x][d]);
}
}
}
}
\end{lstlisting}
The time complexity of the algorithm is $O(n 2^k k)$.
\subsubsection{From permutations to subsets}
Using dynamic programming, it is often possible
to change an iteration over permutations into
an iteration over subsets\footnote{This technique was introduced in 1962
by M. Held and R. M. Karp \cite{hel62}.}.
The benefit of this is that
$n!$, the number of permutations,
is much larger than $2^n$, the number of subsets.
For example, if $n=20$, then
$n! \approx 2.4 \cdot 10^{18}$ and $2^n \approx 10^6$.
Thus, for certain values of $n$,
we can efficiently go through the subsets but not through the permutations.
As an example, consider the following problem:
There is an elevator with maximum weight $x$,
and $n$ people with known weights
who want to get from the ground floor
to the top floor.
What is the minimum number of rides needed
if the people enter the elevator in an optimal order?
For example, suppose that $x=10$, $n=5$
and the weights are as follows:
\begin{center}
\begin{tabular}{ll}
person & weight \\
\hline
0 & 2 \\
1 & 3 \\
2 & 3 \\
3 & 5 \\
4 & 6 \\
\end{tabular}
\end{center}
In this case, the minimum number of rides is 2.
One optimal order is $\{0,2,3,1,4\}$,
which partitions the people into two rides:
first $\{0,2,3\}$ (total weight 10),
and then $\{1,4\}$ (total weight 9).
The problem can be easily solved in $O(n! n)$ time
by testing all possible permutations of $n$ people.
However, we can use dynamic programming to get
a more efficient $O(2^n n)$ time algorithm.
The idea is to calculate for each subset of people
two values: the minimum number of rides needed and
the minimum weight of people who ride in the last group.
Let $\texttt{weight}[p]$ denote the weight of
person $p$.
We define two functions:
$\texttt{rides}(S)$ is the minimum number of
rides for a subset $S$,
and $\texttt{last}(S)$ is the minimum weight
of the last ride.
For example, in the above scenario
\[ \texttt{rides}(\{1,3,4\})=2 \hspace{10px} \textrm{and}
\hspace{10px} \texttt{last}(\{1,3,4\})=5,\]
because the optimal rides are $\{1,4\}$ and $\{3\}$,
and the second ride has weight 5.
Of course, our final goal is to calculate the value
of $\texttt{rides}(\{0 \ldots n-1\})$.
We can calculate the values
of the functions recursively and then apply
dynamic programming.
The idea is to go through all people
who belong to $S$ and optimally
choose the last person $p$ who enters the elevator.
Each such choice yields a subproblem
for a smaller subset of people.
If $\texttt{last}(S \setminus p)+\texttt{weight}[p] \le x$,
we can add $p$ to the last ride.
Otherwise, we have to reserve a new ride
that initially only contains $p$.
To implement dynamic programming,
we declare an array
\begin{lstlisting}
pair<int,int> best[1<<N];
\end{lstlisting}
that contains for each subset $S$
a pair $(\texttt{rides}(S),\texttt{last}(S))$.
We set the value for an empty group as follows:
\begin{lstlisting}
best[0] = {1,0};
\end{lstlisting}
Then, we can fill the array as follows:
\begin{lstlisting}
for (int s = 1; s < (1<<n); s++) {
// initial value: n+1 rides are needed
best[s] = {n+1,0};
for (int p = 0; p < n; p++) {
if (s&(1<<p)) {
auto option = best[s^(1<<p)];
if (option.second+weight[p] <= x) {
// add p to an existing ride
option.second += weight[p];
} else {
// reserve a new ride for p
option.first++;
option.second = weight[p];
}
best[s] = min(best[s], option);
}
}
}
\end{lstlisting}
Note that the above loop guarantees that
for any two subsets $S_1$ and $S_2$
such that $S_1 \subset S_2$, we process $S_1$ before $S_2$.
Thus, the dynamic programming values are calculated in the
correct order.
\subsubsection{Counting subsets}
Our last problem in this chapter is as follows:
Let $X=\{0 \ldots n-1\}$, and each subset $S \subset X$
is assigned an integer $\texttt{value}[S]$.
Our task is to calculate for each $S$
\[\texttt{sum}(S) = \sum_{A \subset S} \texttt{value}[A],\]
i.e., the sum of values of subsets of $S$.
For example, suppose that $n=3$ and the values are as follows:
\begin{multicols}{2}
\begin{itemize}
\item $\texttt{value}[\emptyset] = 3$
\item $\texttt{value}[\{0\}] = 1$
\item $\texttt{value}[\{1\}] = 4$
\item $\texttt{value}[\{0,1\}] = 5$
\item $\texttt{value}[\{2\}] = 5$
\item $\texttt{value}[\{0,2\}] = 1$
\item $\texttt{value}[\{1,2\}] = 3$
\item $\texttt{value}[\{0,1,2\}] = 3$
\end{itemize}
\end{multicols}
In this case, for example,
\begin{equation*}
\begin{split}
\texttt{sum}(\{0,2\}) &= \texttt{value}[\emptyset]+\texttt{value}[\{0\}]+\texttt{value}[\{2\}]+\texttt{value}[\{0,2\}] \\
&= 3 + 1 + 5 + 1 = 10.
\end{split}
\end{equation*}
Because there are a total of $2^n$ subsets,
one possible solution is to go through all
pairs of subsets in $O(2^{2n})$ time.
However, using dynamic programming, we
can solve the problem in $O(2^n n)$ time.
The idea is to focus on sums where the
elements that may be removed from $S$ are restricted.
Let $\texttt{partial}(S,k)$ denote the sum of
values of subsets of $S$ with the restriction
that only elements $0 \ldots k$
may be removed from $S$.
For example,
\[\texttt{partial}(\{0,2\},1)=\texttt{value}[\{2\}]+\texttt{value}[\{0,2\}],\]
because we may only remove elements $0 \ldots 1$.
We can calculate values of \texttt{sum} using
values of \texttt{partial}, because
\[\texttt{sum}(S) = \texttt{partial}(S,n-1).\]
The base cases for the function are
\[\texttt{partial}(S,-1)=\texttt{value}[S],\]
because in this case no elements can be removed from $S$.
Then, in the general case we can use the following recurrence:
\begin{equation*}
\texttt{partial}(S,k) = \begin{cases}
\texttt{partial}(S,k-1) & k \notin S \\
\texttt{partial}(S,k-1) + \texttt{partial}(S \setminus \{k\},k-1) & k \in S
\end{cases}
\end{equation*}
Here we focus on the element $k$.
If $k \in S$, we have two options: we may either keep $k$ in $S$
or remove it from $S$.
There is a particularly clever way to implement the
calculation of sums. We can declare an array
\begin{lstlisting}
int sum[1<<N];
\end{lstlisting}
that will contain the sum of each subset.
The array is initialized as follows:
\begin{lstlisting}
for (int s = 0; s < (1<<n); s++) {
sum[s] = value[s];
}
\end{lstlisting}
Then, we can fill the array as follows:
\begin{lstlisting}
for (int k = 0; k < n; k++) {
for (int s = 0; s < (1<<n); s++) {
if (s&(1<<k)) sum[s] += sum[s^(1<<k)];
}
}
\end{lstlisting}
This code calculates the values of $\texttt{partial}(S,k)$
for $k=0 \ldots n-1$ to the array \texttt{sum}.
Since $\texttt{partial}(S,k)$ is always based on
$\texttt{partial}(S,k-1)$, we can reuse the array
\texttt{sum}, which yields a very efficient implementation.

764
chapter11.tex Normal file
View File

@ -0,0 +1,764 @@
\chapter{Basics of graphs}
Many programming problems can be solved by
modeling the problem as a graph problem
and using an appropriate graph algorithm.
A typical example of a graph is a network
of roads and cities in a country.
Sometimes, though, the graph is hidden
in the problem and it may be difficult to detect it.
This part of the book discusses graph algorithms,
especially focusing on topics that
are important in competitive programming.
In this chapter, we go through concepts
related to graphs,
and study different ways to represent graphs in algorithms.
\section{Graph terminology}
\index{graph}
\index{node}
\index{edge}
A \key{graph} consists of \key{nodes}
and \key{edges}. In this book,
the variable $n$ denotes the number of nodes
in a graph, and the variable $m$ denotes
the number of edges.
The nodes are numbered
using integers $1,2,\ldots,n$.
For example, the following graph consists of 5 nodes and 7 edges:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
\node[draw, circle] (3) at (1,1) {$3$};
\node[draw, circle] (4) at (4,1) {$4$};
\node[draw, circle] (5) at (6,2) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (4) -- (5);
\end{tikzpicture}
\end{center}
\index{path}
A \key{path} leads from node $a$ to node $b$
through edges of the graph.
The \key{length} of a path is the number of
edges in it.
For example, the above graph contains
a path $1 \rightarrow 3 \rightarrow 4 \rightarrow 5$
of length 3
from node 1 to node 5:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
\node[draw, circle] (3) at (1,1) {$3$};
\node[draw, circle] (4) at (4,1) {$4$};
\node[draw, circle] (5) at (6,2) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (4) -- (5);
\path[draw=red,thick,->,line width=2pt] (1) -- (3);
\path[draw=red,thick,->,line width=2pt] (3) -- (4);
\path[draw=red,thick,->,line width=2pt] (4) -- (5);
\end{tikzpicture}
\end{center}
\index{cycle}
A path is a \key{cycle} if the first and last
node is the same.
For example, the above graph contains
a cycle $1 \rightarrow 3 \rightarrow 4 \rightarrow 1$.
A path is \key{simple} if each node appears
at most once in the path.
%
% \begin{itemize}
% \item $1 \rightarrow 2 \rightarrow 5$ (length 2)
% \item $1 \rightarrow 4 \rightarrow 5$ (length 2)
% \item $1 \rightarrow 2 \rightarrow 4 \rightarrow 5$ (length 3)
% \item $1 \rightarrow 3 \rightarrow 4 \rightarrow 5$ (length 3)
% \item $1 \rightarrow 4 \rightarrow 2 \rightarrow 5$ (length 3)
% \item $1 \rightarrow 3 \rightarrow 4 \rightarrow 2 \rightarrow 5$ (length 4)
% \end{itemize}
\subsubsection{Connectivity}
\index{connected graph}
A graph is \key{connected} if there is a path
between any two nodes.
For example, the following graph is connected:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
\node[draw, circle] (3) at (1,1) {$3$};
\node[draw, circle] (4) at (4,1) {$4$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (4);
\end{tikzpicture}
\end{center}
The following graph is not connected,
because it is not possible to get
from node 4 to any other node:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
\node[draw, circle] (3) at (1,1) {$3$};
\node[draw, circle] (4) at (4,1) {$4$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (2) -- (3);
\end{tikzpicture}
\end{center}
\index{component}
The connected parts of a graph are
called its \key{components}.
For example, the following graph
contains three components:
$\{1,\,2,\,3\}$,
$\{4,\,5,\,6,\,7\}$ and
$\{8\}$.
\begin{center}
\begin{tikzpicture}[scale=0.8]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
\node[draw, circle] (3) at (1,1) {$3$};
\node[draw, circle] (6) at (6,1) {$6$};
\node[draw, circle] (7) at (9,1) {$7$};
\node[draw, circle] (4) at (6,3) {$4$};
\node[draw, circle] (5) at (9,3) {$5$};
\node[draw, circle] (8) at (11,2) {$8$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (4) -- (5);
\path[draw,thick,-] (5) -- (7);
\path[draw,thick,-] (6) -- (7);
\path[draw,thick,-] (6) -- (4);
\end{tikzpicture}
\end{center}
\index{tree}
A \key{tree} is a connected graph
that consists of $n$ nodes and $n-1$ edges.
There is a unique path
between any two nodes of a tree.
For example, the following graph is a tree:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
\node[draw, circle] (3) at (1,1) {$3$};
\node[draw, circle] (4) at (4,1) {$4$};
\node[draw, circle] (5) at (6,2) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
%\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (2) -- (4);
%\path[draw,thick,-] (4) -- (5);
\end{tikzpicture}
\end{center}
\subsubsection{Edge directions}
\index{directed graph}
A graph is \key{directed}
if the edges can be traversed
in one direction only.
For example, the following graph is directed:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
\node[draw, circle] (3) at (1,1) {$3$};
\node[draw, circle] (4) at (4,1) {$4$};
\node[draw, circle] (5) at (6,2) {$5$};
\path[draw,thick,->,>=latex] (1) -- (2);
\path[draw,thick,->,>=latex] (2) -- (4);
\path[draw,thick,->,>=latex] (2) -- (5);
\path[draw,thick,->,>=latex] (4) -- (5);
\path[draw,thick,->,>=latex] (4) -- (1);
\path[draw,thick,->,>=latex] (3) -- (1);
\end{tikzpicture}
\end{center}
The above graph contains
a path $3 \rightarrow 1 \rightarrow 2 \rightarrow 5$
from node $3$ to node $5$,
but there is no path from node $5$ to node $3$.
\subsubsection{Edge weights}
\index{weighted graph}
In a \key{weighted} graph, each edge is assigned
a \key{weight}.
The weights are often interpreted as edge lengths.
For example, the following graph is weighted:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
\node[draw, circle] (3) at (1,1) {$3$};
\node[draw, circle] (4) at (4,1) {$4$};
\node[draw, circle] (5) at (6,2) {$5$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:5] {} (2);
\path[draw,thick,-] (1) -- node[font=\small,label=left:1] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=below:7] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=above:7] {} (5);
\path[draw,thick,-] (4) -- node[font=\small,label=below:3] {} (5);
\end{tikzpicture}
\end{center}
The length of a path in a weighted graph
is the sum of the edge weights on the path.
For example, in the above graph,
the length of the path
$1 \rightarrow 2 \rightarrow 5$ is $12$,
and the length of the path
$1 \rightarrow 3 \rightarrow 4 \rightarrow 5$ is $11$.
The latter path is the \key{shortest} path from node $1$ to node $5$.
\subsubsection{Neighbors and degrees}
\index{neighbor}
\index{degree}
Two nodes are \key{neighbors} or \key{adjacent}
if there is an edge between them.
The \key{degree} of a node
is the number of its neighbors.
For example, in the following graph,
the neighbors of node 2 are 1, 4 and 5,
so its degree is 3.
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
\node[draw, circle] (3) at (1,1) {$3$};
\node[draw, circle] (4) at (4,1) {$4$};
\node[draw, circle] (5) at (6,2) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (2) -- (5);
%\path[draw,thick,-] (4) -- (5);
\end{tikzpicture}
\end{center}
The sum of degrees in a graph is always $2m$,
where $m$ is the number of edges,
because each edge
increases the degree of exactly two nodes by one.
For this reason, the sum of degrees is always even.
\index{regular graph}
\index{complete graph}
A graph is \key{regular} if the
degree of every node is a constant $d$.
A graph is \key{complete} if the
degree of every node is $n-1$, i.e.,
the graph contains all possible edges
between the nodes.
\index{indegree}
\index{outdegree}
In a directed graph, the \key{indegree}
of a node is the number of edges
that end at the node,
and the \key{outdegree} of a node
is the number of edges that start at the node.
For example, in the following graph,
the indegree of node 2 is 2,
and the outdegree of node 2 is 1.
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
\node[draw, circle] (3) at (1,1) {$3$};
\node[draw, circle] (4) at (4,1) {$4$};
\node[draw, circle] (5) at (6,2) {$5$};
\path[draw,thick,->,>=latex] (1) -- (2);
\path[draw,thick,->,>=latex] (1) -- (3);
\path[draw,thick,->,>=latex] (1) -- (4);
\path[draw,thick,->,>=latex] (3) -- (4);
\path[draw,thick,->,>=latex] (2) -- (4);
\path[draw,thick,<-,>=latex] (2) -- (5);
\end{tikzpicture}
\end{center}
\subsubsection{Colorings}
\index{coloring}
\index{bipartite graph}
In a \key{coloring} of a graph,
each node is assigned a color so that
no adjacent nodes have the same color.
A graph is \key{bipartite} if
it is possible to color it using two colors.
It turns out that a graph is bipartite
exactly when it does not contain a cycle
with an odd number of edges.
For example, the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$2$};
\node[draw, circle] (2) at (4,3) {$3$};
\node[draw, circle] (3) at (1,1) {$5$};
\node[draw, circle] (4) at (4,1) {$6$};
\node[draw, circle] (5) at (-2,1) {$4$};
\node[draw, circle] (6) at (-2,3) {$1$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (5) -- (6);
\end{tikzpicture}
\end{center}
is bipartite, because it can be colored as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle, fill=blue!40] (1) at (1,3) {$2$};
\node[draw, circle, fill=red!40] (2) at (4,3) {$3$};
\node[draw, circle, fill=red!40] (3) at (1,1) {$5$};
\node[draw, circle, fill=blue!40] (4) at (4,1) {$6$};
\node[draw, circle, fill=red!40] (5) at (-2,1) {$4$};
\node[draw, circle, fill=blue!40] (6) at (-2,3) {$1$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (5) -- (6);
\end{tikzpicture}
\end{center}
However, the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$2$};
\node[draw, circle] (2) at (4,3) {$3$};
\node[draw, circle] (3) at (1,1) {$5$};
\node[draw, circle] (4) at (4,1) {$6$};
\node[draw, circle] (5) at (-2,1) {$4$};
\node[draw, circle] (6) at (-2,3) {$1$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (5) -- (6);
\path[draw,thick,-] (1) -- (6);
\end{tikzpicture}
\end{center}
is not bipartite, because it is not possible to color
the following cycle of three nodes using two colors:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$2$};
\node[draw, circle] (2) at (4,3) {$3$};
\node[draw, circle] (3) at (1,1) {$5$};
\node[draw, circle] (4) at (4,1) {$6$};
\node[draw, circle] (5) at (-2,1) {$4$};
\node[draw, circle] (6) at (-2,3) {$1$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (5) -- (6);
\path[draw,thick,-] (1) -- (6);
\path[draw=red,thick,-,line width=2pt] (1) -- (3);
\path[draw=red,thick,-,line width=2pt] (3) -- (6);
\path[draw=red,thick,-,line width=2pt] (6) -- (1);
\end{tikzpicture}
\end{center}
\subsubsection{Simplicity}
\index{simple graph}
A graph is \key{simple}
if no edge starts and ends at the same node,
and there are no multiple
edges between two nodes.
Often we assume that graphs are simple.
For example, the following graph is \emph{not} simple:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$2$};
\node[draw, circle] (2) at (4,3) {$3$};
\node[draw, circle] (3) at (1,1) {$5$};
\node[draw, circle] (4) at (4,1) {$6$};
\node[draw, circle] (5) at (-2,1) {$4$};
\node[draw, circle] (6) at (-2,3) {$1$};
\path[draw,thick,-] (1) edge [bend right=20] (2);
\path[draw,thick,-] (2) edge [bend right=20] (1);
%\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (5) -- (6);
\tikzset{every loop/.style={in=135,out=190}}
\path[draw,thick,-] (5) edge [loop left] (5);
\end{tikzpicture}
\end{center}
\section{Graph representation}
There are several ways to represent graphs
in algorithms.
The choice of a data structure
depends on the size of the graph and
the way the algorithm processes it.
Next we will go through three common representations.
\subsubsection{Adjacency list representation}
\index{adjacency list}
In the adjacency list representation,
each node $x$ in the graph is assigned an \key{adjacency list}
that consists of nodes
to which there is an edge from $x$.
Adjacency lists are the most popular
way to represent graphs, and most algorithms can be
efficiently implemented using them.
A convenient way to store the adjacency lists is to declare
an array of vectors as follows:
\begin{lstlisting}
vector<int> adj[N];
\end{lstlisting}
The constant $N$ is chosen so that all
adjacency lists can be stored.
For example, the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (3,1) {$4$};
\path[draw,thick,->,>=latex] (1) -- (2);
\path[draw,thick,->,>=latex] (2) -- (3);
\path[draw,thick,->,>=latex] (2) -- (4);
\path[draw,thick,->,>=latex] (3) -- (4);
\path[draw,thick,->,>=latex] (4) -- (1);
\end{tikzpicture}
\end{center}
can be stored as follows:
\begin{lstlisting}
adj[1].push_back(2);
adj[2].push_back(3);
adj[2].push_back(4);
adj[3].push_back(4);
adj[4].push_back(1);
\end{lstlisting}
If the graph is undirected, it can be stored in a similar way,
but each edge is added in both directions.
For a weighted graph, the structure can be extended
as follows:
\begin{lstlisting}
vector<pair<int,int>> adj[N];
\end{lstlisting}
In this case, the adjacency list of node $a$
contains the pair $(b,w)$
always when there is an edge from node $a$ to node $b$
with weight $w$. For example, the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (3,1) {$4$};
\path[draw,thick,->,>=latex] (1) -- node[font=\small,label=above:5] {} (2);
\path[draw,thick,->,>=latex] (2) -- node[font=\small,label=above:7] {} (3);
\path[draw,thick,->,>=latex] (2) -- node[font=\small,label=left:6] {} (4);
\path[draw,thick,->,>=latex] (3) -- node[font=\small,label=right:5] {} (4);
\path[draw,thick,->,>=latex] (4) -- node[font=\small,label=left:2] {} (1);
\end{tikzpicture}
\end{center}
can be stored as follows:
\begin{lstlisting}
adj[1].push_back({2,5});
adj[2].push_back({3,7});
adj[2].push_back({4,6});
adj[3].push_back({4,5});
adj[4].push_back({1,2});
\end{lstlisting}
The benefit of using adjacency lists is that
we can efficiently find the nodes to which
we can move from a given node through an edge.
For example, the following loop goes through all nodes
to which we can move from node $s$:
\begin{lstlisting}
for (auto u : adj[s]) {
// process node u
}
\end{lstlisting}
\subsubsection{Adjacency matrix representation}
\index{adjacency matrix}
An \key{adjacency matrix} is a two-dimensional array
that indicates which edges the graph contains.
We can efficiently check from an adjacency matrix
if there is an edge between two nodes.
The matrix can be stored as an array
\begin{lstlisting}
int adj[N][N];
\end{lstlisting}
where each value $\texttt{adj}[a][b]$ indicates
whether the graph contains an edge from
node $a$ to node $b$.
If the edge is included in the graph,
then $\texttt{adj}[a][b]=1$,
and otherwise $\texttt{adj}[a][b]=0$.
For example, the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (3,1) {$4$};
\path[draw,thick,->,>=latex] (1) -- (2);
\path[draw,thick,->,>=latex] (2) -- (3);
\path[draw,thick,->,>=latex] (2) -- (4);
\path[draw,thick,->,>=latex] (3) -- (4);
\path[draw,thick,->,>=latex] (4) -- (1);
\end{tikzpicture}
\end{center}
can be represented as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (4,4);
\node at (0.5,0.5) {1};
\node at (1.5,0.5) {0};
\node at (2.5,0.5) {0};
\node at (3.5,0.5) {0};
\node at (0.5,1.5) {0};
\node at (1.5,1.5) {0};
\node at (2.5,1.5) {0};
\node at (3.5,1.5) {1};
\node at (0.5,2.5) {0};
\node at (1.5,2.5) {0};
\node at (2.5,2.5) {1};
\node at (3.5,2.5) {1};
\node at (0.5,3.5) {0};
\node at (1.5,3.5) {1};
\node at (2.5,3.5) {0};
\node at (3.5,3.5) {0};
\node at (-0.5,0.5) {4};
\node at (-0.5,1.5) {3};
\node at (-0.5,2.5) {2};
\node at (-0.5,3.5) {1};
\node at (0.5,4.5) {1};
\node at (1.5,4.5) {2};
\node at (2.5,4.5) {3};
\node at (3.5,4.5) {4};
\end{tikzpicture}
\end{center}
If the graph is weighted, the adjacency matrix
representation can be extended so that
the matrix contains the weight of the edge
if the edge exists.
Using this representation, the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (3,1) {$4$};
\path[draw,thick,->,>=latex] (1) -- node[font=\small,label=above:5] {} (2);
\path[draw,thick,->,>=latex] (2) -- node[font=\small,label=above:7] {} (3);
\path[draw,thick,->,>=latex] (2) -- node[font=\small,label=left:6] {} (4);
\path[draw,thick,->,>=latex] (3) -- node[font=\small,label=right:5] {} (4);
\path[draw,thick,->,>=latex] (4) -- node[font=\small,label=left:2] {} (1);
\end{tikzpicture}
\end{center}
\begin{samepage}
corresponds to the following matrix:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (4,4);
\node at (0.5,0.5) {2};
\node at (1.5,0.5) {0};
\node at (2.5,0.5) {0};
\node at (3.5,0.5) {0};
\node at (0.5,1.5) {0};
\node at (1.5,1.5) {0};
\node at (2.5,1.5) {0};
\node at (3.5,1.5) {5};
\node at (0.5,2.5) {0};
\node at (1.5,2.5) {0};
\node at (2.5,2.5) {7};
\node at (3.5,2.5) {6};
\node at (0.5,3.5) {0};
\node at (1.5,3.5) {5};
\node at (2.5,3.5) {0};
\node at (3.5,3.5) {0};
\node at (-0.5,0.5) {4};
\node at (-0.5,1.5) {3};
\node at (-0.5,2.5) {2};
\node at (-0.5,3.5) {1};
\node at (0.5,4.5) {1};
\node at (1.5,4.5) {2};
\node at (2.5,4.5) {3};
\node at (3.5,4.5) {4};
\end{tikzpicture}
\end{center}
\end{samepage}
The drawback of the adjacency matrix representation
is that the matrix contains $n^2$ elements,
and usually most of them are zero.
For this reason, the representation cannot be used
if the graph is large.
\subsubsection{Edge list representation}
\index{edge list}
An \key{edge list} contains all edges of a graph
in some order.
This is a convenient way to represent a graph
if the algorithm processes all edges of the graph
and it is not needed to find edges that start
at a given node.
The edge list can be stored in a vector
\begin{lstlisting}
vector<pair<int,int>> edges;
\end{lstlisting}
where each pair $(a,b)$ denotes that
there is an edge from node $a$ to node $b$.
Thus, the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (3,1) {$4$};
\path[draw,thick,->,>=latex] (1) -- (2);
\path[draw,thick,->,>=latex] (2) -- (3);
\path[draw,thick,->,>=latex] (2) -- (4);
\path[draw,thick,->,>=latex] (3) -- (4);
\path[draw,thick,->,>=latex] (4) -- (1);
\end{tikzpicture}
\end{center}
can be represented as follows:
\begin{lstlisting}
edges.push_back({1,2});
edges.push_back({2,3});
edges.push_back({2,4});
edges.push_back({3,4});
edges.push_back({4,1});
\end{lstlisting}
\noindent
If the graph is weighted, the structure can
be extended as follows:
\begin{lstlisting}
vector<tuple<int,int,int>> edges;
\end{lstlisting}
Each element in this list is of the
form $(a,b,w)$, which means that there
is an edge from node $a$ to node $b$ with weight $w$.
For example, the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (3,1) {$4$};
\path[draw,thick,->,>=latex] (1) -- node[font=\small,label=above:5] {} (2);
\path[draw,thick,->,>=latex] (2) -- node[font=\small,label=above:7] {} (3);
\path[draw,thick,->,>=latex] (2) -- node[font=\small,label=left:6] {} (4);
\path[draw,thick,->,>=latex] (3) -- node[font=\small,label=right:5] {} (4);
\path[draw,thick,->,>=latex] (4) -- node[font=\small,label=left:2] {} (1);
\end{tikzpicture}
\end{center}
\begin{samepage}
can be represented as follows\footnote{In some older compilers, the function
\texttt{make\_tuple} must be used instead of the braces (for example,
\texttt{make\_tuple(1,2,5)} instead of \texttt{\{1,2,5\}}).}:
\begin{lstlisting}
edges.push_back({1,2,5});
edges.push_back({2,3,7});
edges.push_back({2,4,6});
edges.push_back({3,4,5});
edges.push_back({4,1,2});
\end{lstlisting}
\end{samepage}

549
chapter12.tex Normal file
View File

@ -0,0 +1,549 @@
\chapter{Graph traversal}
This chapter discusses two fundamental
graph algorithms:
depth-first search and breadth-first search.
Both algorithms are given a starting
node in the graph,
and they visit all nodes that can be reached
from the starting node.
The difference in the algorithms is the order
in which they visit the nodes.
\section{Depth-first search}
\index{depth-first search}
\key{Depth-first search} (DFS)
is a straightforward graph traversal technique.
The algorithm begins at a starting node,
and proceeds to all other nodes that are
reachable from the starting node using
the edges of the graph.
Depth-first search always follows a single
path in the graph as long as it finds
new nodes.
After this, it returns to previous
nodes and begins to explore other parts of the graph.
The algorithm keeps track of visited nodes,
so that it processes each node only once.
\subsubsection*{Example}
Let us consider how depth-first search processes
the following graph:
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (1) at (1,5) {$1$};
\node[draw, circle] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,4) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (5);
\path[draw,thick,-] (2) -- (5);
\end{tikzpicture}
\end{center}
We may begin the search at any node of the graph;
now we will begin the search at node 1.
The search first proceeds to node 2:
\begin{center}
\begin{tikzpicture}
\node[draw, circle,fill=lightgray] (1) at (1,5) {$1$};
\node[draw, circle,fill=lightgray] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,4) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (5);
\path[draw,thick,-] (2) -- (5);
\path[draw=red,thick,->,line width=2pt] (1) -- (2);
\end{tikzpicture}
\end{center}
After this, nodes 3 and 5 will be visited:
\begin{center}
\begin{tikzpicture}
\node[draw, circle,fill=lightgray] (1) at (1,5) {$1$};
\node[draw, circle,fill=lightgray] (2) at (3,5) {$2$};
\node[draw, circle,fill=lightgray] (3) at (5,4) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle,fill=lightgray] (5) at (3,3) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (5);
\path[draw,thick,-] (2) -- (5);
\path[draw=red,thick,->,line width=2pt] (1) -- (2);
\path[draw=red,thick,->,line width=2pt] (2) -- (3);
\path[draw=red,thick,->,line width=2pt] (3) -- (5);
\end{tikzpicture}
\end{center}
The neighbors of node 5 are 2 and 3,
but the search has already visited both of them,
so it is time to return to the previous nodes.
Also the neighbors of nodes 3 and 2
have been visited, so we next move
from node 1 to node 4:
\begin{center}
\begin{tikzpicture}
\node[draw, circle,fill=lightgray] (1) at (1,5) {$1$};
\node[draw, circle,fill=lightgray] (2) at (3,5) {$2$};
\node[draw, circle,fill=lightgray] (3) at (5,4) {$3$};
\node[draw, circle,fill=lightgray] (4) at (1,3) {$4$};
\node[draw, circle,fill=lightgray] (5) at (3,3) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (5);
\path[draw,thick,-] (2) -- (5);
\path[draw=red,thick,->,line width=2pt] (1) -- (4);
\end{tikzpicture}
\end{center}
After this, the search terminates because it has visited
all nodes.
The time complexity of depth-first search is $O(n+m)$
where $n$ is the number of nodes and $m$ is the
number of edges,
because the algorithm processes each node and edge once.
\subsubsection*{Implementation}
Depth-first search can be conveniently
implemented using recursion.
The following function \texttt{dfs} begins
a depth-first search at a given node.
The function assumes that the graph is
stored as adjacency lists in an array
\begin{lstlisting}
vector<int> adj[N];
\end{lstlisting}
and also maintains an array
\begin{lstlisting}
bool visited[N];
\end{lstlisting}
that keeps track of the visited nodes.
Initially, each array value is \texttt{false},
and when the search arrives at node $s$,
the value of \texttt{visited}[$s$] becomes \texttt{true}.
The function can be implemented as follows:
\begin{lstlisting}
void dfs(int s) {
if (visited[s]) return;
visited[s] = true;
// process node s
for (auto u: adj[s]) {
dfs(u);
}
}
\end{lstlisting}
\section{Breadth-first search}
\index{breadth-first search}
\key{Breadth-first search} (BFS) visits the nodes
in increasing order of their distance
from the starting node.
Thus, we can calculate the distance
from the starting node to all other
nodes using breadth-first search.
However, breadth-first search is more difficult
to implement than depth-first search.
Breadth-first search goes through the nodes
one level after another.
First the search explores the nodes whose
distance from the starting node is 1,
then the nodes whose distance is 2, and so on.
This process continues until all nodes
have been visited.
\subsubsection*{Example}
Let us consider how breadth-first search processes
the following graph:
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (1) at (1,5) {$1$};
\node[draw, circle] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,5) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\node[draw, circle] (6) at (5,3) {$6$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (5) -- (6);
\end{tikzpicture}
\end{center}
Suppose that the search begins at node 1.
First, we process all nodes that can be reached
from node 1 using a single edge:
\begin{center}
\begin{tikzpicture}
\node[draw, circle,fill=lightgray] (1) at (1,5) {$1$};
\node[draw, circle,fill=lightgray] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,5) {$3$};
\node[draw, circle,fill=lightgray] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\node[draw, circle] (6) at (5,3) {$6$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (5) -- (6);
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw=red,thick,->,line width=2pt] (1) -- (2);
\path[draw=red,thick,->,line width=2pt] (1) -- (4);
\end{tikzpicture}
\end{center}
After this, we proceed to nodes 3 and 5:
\begin{center}
\begin{tikzpicture}
\node[draw, circle,fill=lightgray] (1) at (1,5) {$1$};
\node[draw, circle,fill=lightgray] (2) at (3,5) {$2$};
\node[draw, circle,fill=lightgray] (3) at (5,5) {$3$};
\node[draw, circle,fill=lightgray] (4) at (1,3) {$4$};
\node[draw, circle,fill=lightgray] (5) at (3,3) {$5$};
\node[draw, circle] (6) at (5,3) {$6$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (5) -- (6);
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw=red,thick,->,line width=2pt] (2) -- (3);
\path[draw=red,thick,->,line width=2pt] (2) -- (5);
\end{tikzpicture}
\end{center}
Finally, we visit node 6:
\begin{center}
\begin{tikzpicture}
\node[draw, circle,fill=lightgray] (1) at (1,5) {$1$};
\node[draw, circle,fill=lightgray] (2) at (3,5) {$2$};
\node[draw, circle,fill=lightgray] (3) at (5,5) {$3$};
\node[draw, circle,fill=lightgray] (4) at (1,3) {$4$};
\node[draw, circle,fill=lightgray] (5) at (3,3) {$5$};
\node[draw, circle,fill=lightgray] (6) at (5,3) {$6$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (5) -- (6);
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw=red,thick,->,line width=2pt] (3) -- (6);
\path[draw=red,thick,->,line width=2pt] (5) -- (6);
\end{tikzpicture}
\end{center}
Now we have calculated the distances
from the starting node to all nodes of the graph.
The distances are as follows:
\begin{tabular}{ll}
\\
node & distance \\
\hline
1 & 0 \\
2 & 1 \\
3 & 2 \\
4 & 1 \\
5 & 2 \\
6 & 3 \\
\\
\end{tabular}
Like in depth-first search,
the time complexity of breadth-first search
is $O(n+m)$, where $n$ is the number of nodes
and $m$ is the number of edges.
\subsubsection*{Implementation}
Breadth-first search is more difficult
to implement than depth-first search,
because the algorithm visits nodes
in different parts of the graph.
A typical implementation is based on
a queue that contains nodes.
At each step, the next node in the queue
will be processed.
The following code assumes that the graph is stored
as adjacency lists and maintains the following
data structures:
\begin{lstlisting}
queue<int> q;
bool visited[N];
int distance[N];
\end{lstlisting}
The queue \texttt{q}
contains nodes to be processed
in increasing order of their distance.
New nodes are always added to the end
of the queue, and the node at the beginning
of the queue is the next node to be processed.
The array \texttt{visited} indicates
which nodes the search has already visited,
and the array \texttt{distance} will contain the
distances from the starting node to all nodes of the graph.
The search can be implemented as follows,
starting at node $x$:
\begin{lstlisting}
visited[x] = true;
distance[x] = 0;
q.push(x);
while (!q.empty()) {
int s = q.front(); q.pop();
// process node s
for (auto u : adj[s]) {
if (visited[u]) continue;
visited[u] = true;
distance[u] = distance[s]+1;
q.push(u);
}
}
\end{lstlisting}
\section{Applications}
Using the graph traversal algorithms,
we can check many properties of graphs.
Usually, both depth-first search and
breadth-first search may be used,
but in practice, depth-first search
is a better choice, because it is
easier to implement.
In the following applications we will
assume that the graph is undirected.
\subsubsection{Connectivity check}
\index{connected graph}
A graph is connected if there is a path
between any two nodes of the graph.
Thus, we can check if a graph is connected
by starting at an arbitrary node and
finding out if we can reach all other nodes.
For example, in the graph
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (2) at (7,5) {$2$};
\node[draw, circle] (1) at (3,5) {$1$};
\node[draw, circle] (3) at (5,4) {$3$};
\node[draw, circle] (5) at (7,3) {$5$};
\node[draw, circle] (4) at (3,3) {$4$};
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (5);
\end{tikzpicture}
\end{center}
a depth-first search from node $1$ visits
the following nodes:
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (2) at (7,5) {$2$};
\node[draw, circle,fill=lightgray] (1) at (3,5) {$1$};
\node[draw, circle,fill=lightgray] (3) at (5,4) {$3$};
\node[draw, circle] (5) at (7,3) {$5$};
\node[draw, circle,fill=lightgray] (4) at (3,3) {$4$};
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw=red,thick,->,line width=2pt] (1) -- (3);
\path[draw=red,thick,->,line width=2pt] (3) -- (4);
\end{tikzpicture}
\end{center}
Since the search did not visit all the nodes,
we can conclude that the graph is not connected.
In a similar way, we can also find all connected components
of a graph by iterating through the nodes and always
starting a new depth-first search if the current node
does not belong to any component yet.
\subsubsection{Finding cycles}
\index{cycle}
A graph contains a cycle if during a graph traversal,
we find a node whose neighbor (other than the
previous node in the current path) has already been
visited.
For example, the graph
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (2) at (7,5) {$2$};
\node[draw, circle] (1) at (3,5) {$1$};
\node[draw, circle] (3) at (5,4) {$3$};
\node[draw, circle] (5) at (7,3) {$5$};
\node[draw, circle] (4) at (3,3) {$4$};
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (3) -- (5);
\end{tikzpicture}
\end{center}
contains two cycles and we can find one
of them as follows:
\begin{center}
\begin{tikzpicture}
\node[draw, circle,fill=lightgray] (2) at (7,5) {$2$};
\node[draw, circle,fill=lightgray] (1) at (3,5) {$1$};
\node[draw, circle,fill=lightgray] (3) at (5,4) {$3$};
\node[draw, circle,fill=lightgray] (5) at (7,3) {$5$};
\node[draw, circle] (4) at (3,3) {$4$};
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (3) -- (5);
\path[draw=red,thick,->,line width=2pt] (1) -- (3);
\path[draw=red,thick,->,line width=2pt] (3) -- (2);
\path[draw=red,thick,->,line width=2pt] (2) -- (5);
\end{tikzpicture}
\end{center}
After moving from node 2 to node 5 we notice that
the neighbor 3 of node 5 has already been visited.
Thus, the graph contains a cycle that goes through node 3,
for example, $3 \rightarrow 2 \rightarrow 5 \rightarrow 3$.
Another way to find out whether a graph contains a cycle
is to simply calculate the number of nodes and edges
in every component.
If a component contains $c$ nodes and no cycle,
it must contain exactly $c-1$ edges
(so it has to be a tree).
If there are $c$ or more edges, the component
surely contains a cycle.
\subsubsection{Bipartiteness check}
\index{bipartite graph}
A graph is bipartite if its nodes can be colored
using two colors so that there are no adjacent
nodes with the same color.
It is surprisingly easy to check if a graph
is bipartite using graph traversal algorithms.
The idea is to color the starting node blue,
all its neighbors red, all their neighbors blue, and so on.
If at some point of the search we notice that
two adjacent nodes have the same color,
this means that the graph is not bipartite.
Otherwise the graph is bipartite and one coloring
has been found.
For example, the graph
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (2) at (5,5) {$2$};
\node[draw, circle] (1) at (3,5) {$1$};
\node[draw, circle] (3) at (7,4) {$3$};
\node[draw, circle] (5) at (5,3) {$5$};
\node[draw, circle] (4) at (3,3) {$4$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (5) -- (4);
\path[draw,thick,-] (4) -- (1);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (5) -- (3);
\end{tikzpicture}
\end{center}
is not bipartite, because a search from node 1
proceeds as follows:
\begin{center}
\begin{tikzpicture}
\node[draw, circle,fill=red!40] (2) at (5,5) {$2$};
\node[draw, circle,fill=blue!40] (1) at (3,5) {$1$};
\node[draw, circle,fill=blue!40] (3) at (7,4) {$3$};
\node[draw, circle,fill=red!40] (5) at (5,3) {$5$};
\node[draw, circle] (4) at (3,3) {$4$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (5) -- (4);
\path[draw,thick,-] (4) -- (1);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (5) -- (3);
\path[draw=red,thick,->,line width=2pt] (1) -- (2);
\path[draw=red,thick,->,line width=2pt] (2) -- (3);
\path[draw=red,thick,->,line width=2pt] (3) -- (5);
\path[draw=red,thick,->,line width=2pt] (5) -- (2);
\end{tikzpicture}
\end{center}
We notice that the color of both nodes 2 and 5
is red, while they are adjacent nodes in the graph.
Thus, the graph is not bipartite.
This algorithm always works, because when there
are only two colors available,
the color of the starting node in a component
determines the colors of all other nodes in the component.
It does not make any difference whether the
starting node is red or blue.
Note that in the general case,
it is difficult to find out if the nodes
in a graph can be colored using $k$ colors
so that no adjacent nodes have the same color.
Even when $k=3$, no efficient algorithm is known
but the problem is NP-hard.

802
chapter13.tex Normal file
View File

@ -0,0 +1,802 @@
\chapter{Shortest paths}
\index{shortest path}
Finding a shortest path between two nodes
of a graph
is an important problem that has many
practical applications.
For example, a natural problem related to a road network
is to calculate the shortest possible length of a route
between two cities, given the lengths of the roads.
In an unweighted graph, the length of a path equals
the number of its edges, and we can
simply use breadth-first search to find
a shortest path.
However, in this chapter we focus on
weighted graphs
where more sophisticated algorithms
are needed
for finding shortest paths.
\section{BellmanFord algorithm}
\index{BellmanFord algorithm}
The \key{BellmanFord algorithm}\footnote{The algorithm is named after
R. E. Bellman and L. R. Ford who published it independently
in 1958 and 1956, respectively \cite{bel58,for56a}.} finds
shortest paths from a starting node to all
nodes of the graph.
The algorithm can process all kinds of graphs,
provided that the graph does not contain a
cycle with negative length.
If the graph contains a negative cycle,
the algorithm can detect this.
The algorithm keeps track of distances
from the starting node to all nodes of the graph.
Initially, the distance to the starting node is 0
and the distance to all other nodes in infinite.
The algorithm reduces the distances by finding
edges that shorten the paths until it is not
possible to reduce any distance.
\subsubsection{Example}
Let us consider how the BellmanFord algorithm
works in the following graph:
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (1) at (1,3) {1};
\node[draw, circle] (2) at (4,3) {2};
\node[draw, circle] (3) at (1,1) {3};
\node[draw, circle] (4) at (4,1) {4};
\node[draw, circle] (5) at (6,2) {6};
\node[color=red] at (1,3+0.55) {$0$};
\node[color=red] at (4,3+0.55) {$\infty$};
\node[color=red] at (1,1-0.55) {$\infty$};
\node[color=red] at (4,1-0.55) {$\infty$};
\node[color=red] at (6,2-0.55) {$\infty$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:5] {} (2);
\path[draw,thick,-] (1) -- node[font=\small,label=left:3] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=below:1] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=left:3] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=above:2] {} (5);
\path[draw,thick,-] (4) -- node[font=\small,label=below:2] {} (5);
\path[draw,thick,-] (1) -- node[font=\small,label=above:7] {} (4);
\end{tikzpicture}
\end{center}
Each node of the graph is assigned a distance.
Initially, the distance to the starting node is 0,
and the distance to all other nodes is infinite.
The algorithm searches for edges that reduce distances.
First, all edges from node 1 reduce distances:
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (1) at (1,3) {1};
\node[draw, circle] (2) at (4,3) {2};
\node[draw, circle] (3) at (1,1) {3};
\node[draw, circle] (4) at (4,1) {4};
\node[draw, circle] (5) at (6,2) {5};
\node[color=red] at (1,3+0.55) {$0$};
\node[color=red] at (4,3+0.55) {$5$};
\node[color=red] at (1,1-0.55) {$3$};
\node[color=red] at (4,1-0.55) {$7$};
\node[color=red] at (6,2-0.55) {$\infty$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:5] {} (2);
\path[draw,thick,-] (1) -- node[font=\small,label=left:3] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=below:1] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=left:3] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=above:2] {} (5);
\path[draw,thick,-] (4) -- node[font=\small,label=below:2] {} (5);
\path[draw,thick,-] (1) -- node[font=\small,label=above:7] {} (4);
\path[draw=red,thick,->,line width=2pt] (1) -- (2);
\path[draw=red,thick,->,line width=2pt] (1) -- (3);
\path[draw=red,thick,->,line width=2pt] (1) -- (4);
\end{tikzpicture}
\end{center}
After this, edges
$2 \rightarrow 5$ and $3 \rightarrow 4$
reduce distances:
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (1) at (1,3) {1};
\node[draw, circle] (2) at (4,3) {2};
\node[draw, circle] (3) at (1,1) {3};
\node[draw, circle] (4) at (4,1) {4};
\node[draw, circle] (5) at (6,2) {5};
\node[color=red] at (1,3+0.55) {$0$};
\node[color=red] at (4,3+0.55) {$5$};
\node[color=red] at (1,1-0.55) {$3$};
\node[color=red] at (4,1-0.55) {$4$};
\node[color=red] at (6,2-0.55) {$7$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:5] {} (2);
\path[draw,thick,-] (1) -- node[font=\small,label=left:3] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=below:1] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=left:3] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=above:2] {} (5);
\path[draw,thick,-] (4) -- node[font=\small,label=below:2] {} (5);
\path[draw,thick,-] (1) -- node[font=\small,label=above:7] {} (4);
\path[draw=red,thick,->,line width=2pt] (2) -- (5);
\path[draw=red,thick,->,line width=2pt] (3) -- (4);
\end{tikzpicture}
\end{center}
Finally, there is one more change:
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (1) at (1,3) {1};
\node[draw, circle] (2) at (4,3) {2};
\node[draw, circle] (3) at (1,1) {3};
\node[draw, circle] (4) at (4,1) {4};
\node[draw, circle] (5) at (6,2) {5};
\node[color=red] at (1,3+0.55) {$0$};
\node[color=red] at (4,3+0.55) {$5$};
\node[color=red] at (1,1-0.55) {$3$};
\node[color=red] at (4,1-0.55) {$4$};
\node[color=red] at (6,2-0.55) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:5] {} (2);
\path[draw,thick,-] (1) -- node[font=\small,label=left:3] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=below:1] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=left:3] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=above:2] {} (5);
\path[draw,thick,-] (4) -- node[font=\small,label=below:2] {} (5);
\path[draw,thick,-] (1) -- node[font=\small,label=above:7] {} (4);
\path[draw=red,thick,->,line width=2pt] (4) -- (5);
\end{tikzpicture}
\end{center}
After this, no edge can reduce any distance.
This means that the distances are final,
and we have successfully
calculated the shortest distances
from the starting node to all nodes of the graph.
For example, the shortest distance 3
from node 1 to node 5 corresponds to
the following path:
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (1) at (1,3) {1};
\node[draw, circle] (2) at (4,3) {2};
\node[draw, circle] (3) at (1,1) {3};
\node[draw, circle] (4) at (4,1) {4};
\node[draw, circle] (5) at (6,2) {5};
\node[color=red] at (1,3+0.55) {$0$};
\node[color=red] at (4,3+0.55) {$5$};
\node[color=red] at (1,1-0.55) {$3$};
\node[color=red] at (4,1-0.55) {$4$};
\node[color=red] at (6,2-0.55) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:5] {} (2);
\path[draw,thick,-] (1) -- node[font=\small,label=left:3] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=below:1] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=left:3] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=above:2] {} (5);
\path[draw,thick,-] (4) -- node[font=\small,label=below:2] {} (5);
\path[draw,thick,-] (1) -- node[font=\small,label=above:7] {} (4);
\path[draw=red,thick,->,line width=2pt] (1) -- (3);
\path[draw=red,thick,->,line width=2pt] (3) -- (4);
\path[draw=red,thick,->,line width=2pt] (4) -- (5);
\end{tikzpicture}
\end{center}
\subsubsection{Implementation}
The following implementation of the
BellmanFord algorithm determines the shortest distances
from a node $x$ to all nodes of the graph.
The code assumes that the graph is stored
as an edge list \texttt{edges}
that consists of tuples of the form $(a,b,w)$,
meaning that there is an edge from node $a$ to node $b$
with weight $w$.
The algorithm consists of $n-1$ rounds,
and on each round the algorithm goes through
all edges of the graph and tries to
reduce the distances.
The algorithm constructs an array \texttt{distance}
that will contain the distances from $x$
to all nodes of the graph.
The constant \texttt{INF} denotes an infinite distance.
\begin{lstlisting}
for (int i = 1; i <= n; i++) distance[i] = INF;
distance[x] = 0;
for (int i = 1; i <= n-1; i++) {
for (auto e : edges) {
int a, b, w;
tie(a, b, w) = e;
distance[b] = min(distance[b], distance[a]+w);
}
}
\end{lstlisting}
The time complexity of the algorithm is $O(nm)$,
because the algorithm consists of $n-1$ rounds and
iterates through all $m$ edges during a round.
If there are no negative cycles in the graph,
all distances are final after $n-1$ rounds,
because each shortest path can contain at most $n-1$ edges.
In practice, the final distances can usually
be found faster than in $n-1$ rounds.
Thus, a possible way to make the algorithm more efficient
is to stop the algorithm if no distance
can be reduced during a round.
\subsubsection{Negative cycles}
\index{negative cycle}
The BellmanFord algorithm can also be used to
check if the graph contains a cycle with negative length.
For example, the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,0) {$1$};
\node[draw, circle] (2) at (2,1) {$2$};
\node[draw, circle] (3) at (2,-1) {$3$};
\node[draw, circle] (4) at (4,0) {$4$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:$3$] {} (2);
\path[draw,thick,-] (2) -- node[font=\small,label=above:$1$] {} (4);
\path[draw,thick,-] (1) -- node[font=\small,label=below:$5$] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=below:$-7$] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=right:$2$] {} (3);
\end{tikzpicture}
\end{center}
\noindent
contains a negative cycle
$2 \rightarrow 3 \rightarrow 4 \rightarrow 2$
with length $-4$.
If the graph contains a negative cycle,
we can shorten infinitely many times
any path that contains the cycle by repeating the cycle
again and again.
Thus, the concept of a shortest path
is not meaningful in this situation.
A negative cycle can be detected
using the BellmanFord algorithm by
running the algorithm for $n$ rounds.
If the last round reduces any distance,
the graph contains a negative cycle.
Note that this algorithm can be used to
search for
a negative cycle in the whole graph
regardless of the starting node.
\subsubsection{SPFA algorithm}
\index{SPFA algorithm}
The \key{SPFA algorithm} (''Shortest Path Faster Algorithm'') \cite{fan94}
is a variant of the BellmanFord algorithm,
that is often more efficient than the original algorithm.
The SPFA algorithm does not go through all the edges on each round,
but instead, it chooses the edges to be examined
in a more intelligent way.
The algorithm maintains a queue of nodes that might
be used for reducing the distances.
First, the algorithm adds the starting node $x$
to the queue.
Then, the algorithm always processes the
first node in the queue, and when an edge
$a \rightarrow b$ reduces a distance,
node $b$ is added to the queue.
%
% The following implementation uses a
% \texttt{queue} \texttt{q}.
% In addition, an array \texttt{inqueue} indicates
% if a node is already in the queue,
% in which case the algorithm does not add
% the node to the queue again.
%
% \begin{lstlisting}
% for (int i = 1; i <= n; i++) distance[i] = INF;
% distance[x] = 0;
% q.push(x);
% while (!q.empty()) {
% int a = q.front(); q.pop();
% inqueue[a] = false;
% for (auto b : v[a]) {
% if (distance[a]+b.second < distance[b.first]) {
% distance[b.first] = distance[a]+b.second;
% if (!inqueue[b]) {q.push(b); inqueue[b] = true;}
% }
% }
% }
% \end{lstlisting}
The efficiency of the SPFA algorithm depends
on the structure of the graph:
the algorithm is often efficient,
but its worst case time complexity is still
$O(nm)$ and it is possible to create inputs
that make the algorithm as slow as the
original BellmanFord algorithm.
\section{Dijkstra's algorithm}
\index{Dijkstra's algorithm}
\key{Dijkstra's algorithm}\footnote{E. W. Dijkstra published the algorithm in 1959 \cite{dij59};
however, his original paper does not mention how to implement the algorithm efficiently.}
finds shortest
paths from the starting node to all nodes of the graph,
like the BellmanFord algorithm.
The benefit of Dijsktra's algorithm is that
it is more efficient and can be used for
processing large graphs.
However, the algorithm requires that there
are no negative weight edges in the graph.
Like the BellmanFord algorithm,
Dijkstra's algorithm maintains distances
to the nodes and reduces them during the search.
Dijkstra's algorithm is efficient, because
it only processes
each edge in the graph once, using the fact
that there are no negative edges.
\subsubsection{Example}
Let us consider how Dijkstra's algorithm
works in the following graph when the
starting node is node 1:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {3};
\node[draw, circle] (2) at (4,3) {4};
\node[draw, circle] (3) at (1,1) {2};
\node[draw, circle] (4) at (4,1) {1};
\node[draw, circle] (5) at (6,2) {5};
\node[color=red] at (1,3+0.6) {$\infty$};
\node[color=red] at (4,3+0.6) {$\infty$};
\node[color=red] at (1,1-0.6) {$\infty$};
\node[color=red] at (4,1-0.6) {$0$};
\node[color=red] at (6,2-0.6) {$\infty$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:6] {} (2);
\path[draw,thick,-] (1) -- node[font=\small,label=left:2] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=below:5] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=left:9] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=above:2] {} (5);
\path[draw,thick,-] (4) -- node[font=\small,label=below:1] {} (5);
\end{tikzpicture}
\end{center}
Like in the BellmanFord algorithm,
initially the distance to the starting node is 0
and the distance to all other nodes is infinite.
At each step, Dijkstra's algorithm selects a node
that has not been processed yet and whose distance
is as small as possible.
The first such node is node 1 with distance 0.
When a node is selected, the algorithm
goes through all edges that start at the node
and reduces the distances using them:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {3};
\node[draw, circle] (2) at (4,3) {4};
\node[draw, circle] (3) at (1,1) {2};
\node[draw, circle, fill=lightgray] (4) at (4,1) {1};
\node[draw, circle] (5) at (6,2) {5};
\node[color=red] at (1,3+0.6) {$\infty$};
\node[color=red] at (4,3+0.6) {$9$};
\node[color=red] at (1,1-0.6) {$5$};
\node[color=red] at (4,1-0.6) {$0$};
\node[color=red] at (6,2-0.6) {$1$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:6] {} (2);
\path[draw,thick,-] (1) -- node[font=\small,label=left:2] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=below:5] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=left:9] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=above:2] {} (5);
\path[draw,thick,-] (4) -- node[font=\small,label=below:1] {} (5);
\path[draw=red,thick,->,line width=2pt] (4) -- (2);
\path[draw=red,thick,->,line width=2pt] (4) -- (3);
\path[draw=red,thick,->,line width=2pt] (4) -- (5);
\end{tikzpicture}
\end{center}
In this case,
the edges from node 1 reduced the distances of
nodes 2, 4 and 5, whose distances are now 5, 9 and 1.
The next node to be processed is node 5 with distance 1.
This reduces the distance to node 4 from 9 to 3:
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (1) at (1,3) {3};
\node[draw, circle] (2) at (4,3) {4};
\node[draw, circle] (3) at (1,1) {2};
\node[draw, circle, fill=lightgray] (4) at (4,1) {1};
\node[draw, circle, fill=lightgray] (5) at (6,2) {5};
\node[color=red] at (1,3+0.6) {$\infty$};
\node[color=red] at (4,3+0.6) {$3$};
\node[color=red] at (1,1-0.6) {$5$};
\node[color=red] at (4,1-0.6) {$0$};
\node[color=red] at (6,2-0.6) {$1$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:6] {} (2);
\path[draw,thick,-] (1) -- node[font=\small,label=left:2] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=below:5] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=left:9] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=above:2] {} (5);
\path[draw,thick,-] (4) -- node[font=\small,label=below:1] {} (5);
\path[draw=red,thick,->,line width=2pt] (5) -- (2);
\end{tikzpicture}
\end{center}
After this, the next node is node 4, which reduces
the distance to node 3 to 9:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {3};
\node[draw, circle, fill=lightgray] (2) at (4,3) {4};
\node[draw, circle] (3) at (1,1) {2};
\node[draw, circle, fill=lightgray] (4) at (4,1) {1};
\node[draw, circle, fill=lightgray] (5) at (6,2) {5};
\node[color=red] at (1,3+0.6) {$9$};
\node[color=red] at (4,3+0.6) {$3$};
\node[color=red] at (1,1-0.6) {$5$};
\node[color=red] at (4,1-0.6) {$0$};
\node[color=red] at (6,2-0.6) {$1$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:6] {} (2);
\path[draw,thick,-] (1) -- node[font=\small,label=left:2] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=below:5] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=left:9] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=above:2] {} (5);
\path[draw,thick,-] (4) -- node[font=\small,label=below:1] {} (5);
\path[draw=red,thick,->,line width=2pt] (2) -- (1);
\end{tikzpicture}
\end{center}
A remarkable property in Dijkstra's algorithm is that
whenever a node is selected, its distance is final.
For example, at this point of the algorithm,
the distances 0, 1 and 3 are the final distances
to nodes 1, 5 and 4.
After this, the algorithm processes the two
remaining nodes, and the final distances are as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle, fill=lightgray] (1) at (1,3) {3};
\node[draw, circle, fill=lightgray] (2) at (4,3) {4};
\node[draw, circle, fill=lightgray] (3) at (1,1) {2};
\node[draw, circle, fill=lightgray] (4) at (4,1) {1};
\node[draw, circle, fill=lightgray] (5) at (6,2) {5};
\node[color=red] at (1,3+0.6) {$7$};
\node[color=red] at (4,3+0.6) {$3$};
\node[color=red] at (1,1-0.6) {$5$};
\node[color=red] at (4,1-0.6) {$0$};
\node[color=red] at (6,2-0.6) {$1$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:6] {} (2);
\path[draw,thick,-] (1) -- node[font=\small,label=left:2] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=below:5] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=left:9] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=above:2] {} (5);
\path[draw,thick,-] (4) -- node[font=\small,label=below:1] {} (5);
\end{tikzpicture}
\end{center}
\subsubsection{Negative edges}
The efficiency of Dijkstra's algorithm is
based on the fact that the graph does not
contain negative edges.
If there is a negative edge,
the algorithm may give incorrect results.
As an example, consider the following graph:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,0) {$1$};
\node[draw, circle] (2) at (2,1) {$2$};
\node[draw, circle] (3) at (2,-1) {$3$};
\node[draw, circle] (4) at (4,0) {$4$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:2] {} (2);
\path[draw,thick,-] (2) -- node[font=\small,label=above:3] {} (4);
\path[draw,thick,-] (1) -- node[font=\small,label=below:6] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=below:$-5$] {} (4);
\end{tikzpicture}
\end{center}
\noindent
The shortest path from node 1 to node 4 is
$1 \rightarrow 3 \rightarrow 4$
and its length is 1.
However, Dijkstra's algorithm
finds the path $1 \rightarrow 2 \rightarrow 4$
by following the minimum weight edges.
The algorithm does not take into account that
on the other path, the weight $-5$
compensates the previous large weight $6$.
\subsubsection{Implementation}
The following implementation of Dijkstra's algorithm
calculates the minimum distances from a node $x$
to other nodes of the graph.
The graph is stored as adjacency lists
so that \texttt{adj[$a$]} contains a pair $(b,w)$
always when there is an edge from node $a$ to node $b$
with weight $w$.
An efficient implementation of Dijkstra's algorithm
requires that it is possible to efficiently find the
minimum distance node that has not been processed.
An appropriate data structure for this is a priority queue
that contains the nodes ordered by their distances.
Using a priority queue, the next node to be processed
can be retrieved in logarithmic time.
In the following code, the priority queue
\texttt{q} contains pairs of the form $(-d,x)$,
meaning that the current distance to node $x$ is $d$.
The array $\texttt{distance}$ contains the distance to
each node, and the array $\texttt{processed}$ indicates
whether a node has been processed.
Initially the distance is $0$ to $x$ and $\infty$ to all other nodes.
\begin{lstlisting}
for (int i = 1; i <= n; i++) distance[i] = INF;
distance[x] = 0;
q.push({0,x});
while (!q.empty()) {
int a = q.top().second; q.pop();
if (processed[a]) continue;
processed[a] = true;
for (auto u : adj[a]) {
int b = u.first, w = u.second;
if (distance[a]+w < distance[b]) {
distance[b] = distance[a]+w;
q.push({-distance[b],b});
}
}
}
\end{lstlisting}
Note that the priority queue contains \emph{negative}
distances to nodes.
The reason for this is that the
default version of the C++ priority queue finds maximum
elements, while we want to find minimum elements.
By using negative distances,
we can directly use the default priority queue\footnote{Of
course, we could also declare the priority queue as in Chapter 4.5
and use positive distances, but the implementation would be a bit longer.}.
Also note that there may be several instances of the same
node in the priority queue; however, only the instance with the
minimum distance will be processed.
The time complexity of the above implementation is
$O(n+m \log m)$, because the algorithm goes through
all nodes of the graph and adds for each edge
at most one distance to the priority queue.
\section{FloydWarshall algorithm}
\index{FloydWarshall algorithm}
The \key{FloydWarshall algorithm}\footnote{The algorithm
is named after R. W. Floyd and S. Warshall
who published it independently in 1962 \cite{flo62,war62}.}
provides an alternative way to approach the problem
of finding shortest paths.
Unlike the other algorithms of this chapter,
it finds all shortest paths between the nodes
in a single run.
The algorithm maintains a two-dimensional array
that contains distances between the nodes.
First, distances are calculated only using
direct edges between the nodes,
and after this, the algorithm reduces distances
by using intermediate nodes in paths.
\subsubsection{Example}
Let us consider how the FloydWarshall algorithm
works in the following graph:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$3$};
\node[draw, circle] (2) at (4,3) {$4$};
\node[draw, circle] (3) at (1,1) {$2$};
\node[draw, circle] (4) at (4,1) {$1$};
\node[draw, circle] (5) at (6,2) {$5$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:7] {} (2);
\path[draw,thick,-] (1) -- node[font=\small,label=left:2] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=below:5] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=left:9] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=above:2] {} (5);
\path[draw,thick,-] (4) -- node[font=\small,label=below:1] {} (5);
\end{tikzpicture}
\end{center}
Initially, the distance from each node to itself is $0$,
and the distance between nodes $a$ and $b$ is $x$
if there is an edge between nodes $a$ and $b$ with weight $x$.
All other distances are infinite.
In this graph, the initial array is as follows:
\begin{center}
\begin{tabular}{r|rrrrr}
& 1 & 2 & 3 & 4 & 5 \\
\hline
1 & 0 & 5 & $\infty$ & 9 & 1 \\
2 & 5 & 0 & 2 & $\infty$ & $\infty$ \\
3 & $\infty$ & 2 & 0 & 7 & $\infty$ \\
4 & 9 & $\infty$ & 7 & 0 & 2 \\
5 & 1 & $\infty$ & $\infty$ & 2 & 0 \\
\end{tabular}
\end{center}
\vspace{10pt}
The algorithm consists of consecutive rounds.
On each round, the algorithm selects a new node
that can act as an intermediate node in paths from now on,
and distances are reduced using this node.
On the first round, node 1 is the new intermediate node.
There is a new path between nodes 2 and 4
with length 14, because node 1 connects them.
There is also a new path
between nodes 2 and 5 with length 6.
\begin{center}
\begin{tabular}{r|rrrrr}
& 1 & 2 & 3 & 4 & 5 \\
\hline
1 & 0 & 5 & $\infty$ & 9 & 1 \\
2 & 5 & 0 & 2 & \textbf{14} & \textbf{6} \\
3 & $\infty$ & 2 & 0 & 7 & $\infty$ \\
4 & 9 & \textbf{14} & 7 & 0 & 2 \\
5 & 1 & \textbf{6} & $\infty$ & 2 & 0 \\
\end{tabular}
\end{center}
\vspace{10pt}
On the second round, node 2 is the new intermediate node.
This creates new paths between nodes 1 and 3
and between nodes 3 and 5:
\begin{center}
\begin{tabular}{r|rrrrr}
& 1 & 2 & 3 & 4 & 5 \\
\hline
1 & 0 & 5 & \textbf{7} & 9 & 1 \\
2 & 5 & 0 & 2 & 14 & 6 \\
3 & \textbf{7} & 2 & 0 & 7 & \textbf{8} \\
4 & 9 & 14 & 7 & 0 & 2 \\
5 & 1 & 6 & \textbf{8} & 2 & 0 \\
\end{tabular}
\end{center}
\vspace{10pt}
On the third round, node 3 is the new intermediate round.
There is a new path between nodes 2 and 4:
\begin{center}
\begin{tabular}{r|rrrrr}
& 1 & 2 & 3 & 4 & 5 \\
\hline
1 & 0 & 5 & 7 & 9 & 1 \\
2 & 5 & 0 & 2 & \textbf{9} & 6 \\
3 & 7 & 2 & 0 & 7 & 8 \\
4 & 9 & \textbf{9} & 7 & 0 & 2 \\
5 & 1 & 6 & 8 & 2 & 0 \\
\end{tabular}
\end{center}
\vspace{10pt}
The algorithm continues like this,
until all nodes have been appointed intermediate nodes.
After the algorithm has finished, the array contains
the minimum distances between any two nodes:
\begin{center}
\begin{tabular}{r|rrrrr}
& 1 & 2 & 3 & 4 & 5 \\
\hline
1 & 0 & 5 & 7 & 3 & 1 \\
2 & 5 & 0 & 2 & 8 & 6 \\
3 & 7 & 2 & 0 & 7 & 8 \\
4 & 3 & 8 & 7 & 0 & 2 \\
5 & 1 & 6 & 8 & 2 & 0 \\
\end{tabular}
\end{center}
For example, the array tells us that the
shortest distance between nodes 2 and 4 is 8.
This corresponds to the following path:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$3$};
\node[draw, circle] (2) at (4,3) {$4$};
\node[draw, circle] (3) at (1,1) {$2$};
\node[draw, circle] (4) at (4,1) {$1$};
\node[draw, circle] (5) at (6,2) {$5$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:7] {} (2);
\path[draw,thick,-] (1) -- node[font=\small,label=left:2] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=below:5] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=left:9] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=above:2] {} (5);
\path[draw,thick,-] (4) -- node[font=\small,label=below:1] {} (5);
\path[draw=red,thick,->,line width=2pt] (3) -- (4);
\path[draw=red,thick,->,line width=2pt] (4) -- (5);
\path[draw=red,thick,->,line width=2pt] (5) -- (2);
\end{tikzpicture}
\end{center}
\subsubsection{Implementation}
The advantage of the
FloydWarshall algorithm that it is
easy to implement.
The following code constructs a
distance matrix where $\texttt{distance}[a][b]$
is the shortest distance between nodes $a$ and $b$.
First, the algorithm initializes \texttt{distance}
using the adjacency matrix \texttt{adj} of the graph:
\begin{lstlisting}
for (int i = 1; i <= n; i++) {
for (int j = 1; j <= n; j++) {
if (i == j) distance[i][j] = 0;
else if (adj[i][j]) distance[i][j] = adj[i][j];
else distance[i][j] = INF;
}
}
\end{lstlisting}
After this, the shortest distances can be found as follows:
\begin{lstlisting}
for (int k = 1; k <= n; k++) {
for (int i = 1; i <= n; i++) {
for (int j = 1; j <= n; j++) {
distance[i][j] = min(distance[i][j],
distance[i][k]+distance[k][j]);
}
}
}
\end{lstlisting}
The time complexity of the algorithm is $O(n^3)$,
because it contains three nested loops
that go through the nodes of the graph.
Since the implementation of the FloydWarshall
algorithm is simple, the algorithm can be
a good choice even if it is only needed to find a
single shortest path in the graph.
However, the algorithm can only be used when the graph
is so small that a cubic time complexity is fast enough.

609
chapter14.tex Normal file
View File

@ -0,0 +1,609 @@
\chapter{Tree algorithms}
\index{tree}
A \key{tree} is a connected, acyclic graph
that consists of $n$ nodes and $n-1$ edges.
Removing any edge from a tree divides it
into two components,
and adding any edge to a tree creates a cycle.
Moreover, there is always a unique path between any
two nodes of a tree.
For example, the following tree consists of 8 nodes and 7 edges:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,3) {$1$};
\node[draw, circle] (2) at (2,3) {$4$};
\node[draw, circle] (3) at (0,1) {$2$};
\node[draw, circle] (4) at (2,1) {$3$};
\node[draw, circle] (5) at (4,1) {$7$};
\node[draw, circle] (6) at (-2,3) {$5$};
\node[draw, circle] (7) at (-2,1) {$6$};
\node[draw, circle] (8) at (-4,1) {$8$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (3) -- (7);
\path[draw,thick,-] (7) -- (8);
\end{tikzpicture}
\end{center}
\index{leaf}
The \key{leaves} of a tree are the nodes
with degree 1, i.e., with only one neighbor.
For example, the leaves of the above tree
are nodes 3, 5, 7 and 8.
\index{root}
\index{rooted tree}
In a \key{rooted} tree, one of the nodes
is appointed the \key{root} of the tree,
and all other nodes are
placed underneath the root.
For example, in the following tree,
node 1 is the root node.
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,3) {$1$};
\node[draw, circle] (4) at (2,1) {$4$};
\node[draw, circle] (2) at (-2,1) {$2$};
\node[draw, circle] (3) at (0,1) {$3$};
\node[draw, circle] (7) at (2,-1) {$7$};
\node[draw, circle] (5) at (-3,-1) {$5$};
\node[draw, circle] (6) at (-1,-1) {$6$};
\node[draw, circle] (8) at (-1,-3) {$8$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (2) -- (6);
\path[draw,thick,-] (4) -- (7);
\path[draw,thick,-] (6) -- (8);
\end{tikzpicture}
\end{center}
\index{child}
\index{parent}
In a rooted tree, the \key{children} of a node
are its lower neighbors, and the \key{parent} of a node
is its upper neighbor.
Each node has exactly one parent,
except for the root that does not have a parent.
For example, in the above tree,
the children of node 2 are nodes 5 and 6,
and its parent is node 1.
\index{subtree}
The structure of a rooted tree is \emph{recursive}:
each node of the tree acts as the root of a \key{subtree}
that contains the node itself and all nodes
that are in the subtrees of its children.
For example, in the above tree, the subtree of node 2
consists of nodes 2, 5, 6 and 8:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (2) at (-2,1) {$2$};
\node[draw, circle] (5) at (-3,-1) {$5$};
\node[draw, circle] (6) at (-1,-1) {$6$};
\node[draw, circle] (8) at (-1,-3) {$8$};
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (2) -- (6);
\path[draw,thick,-] (6) -- (8);
\end{tikzpicture}
\end{center}
\section{Tree traversal}
General graph traversal algorithms
can be used to traverse the nodes of a tree.
However, the traversal of a tree is easier to implement than
that of a general graph, because
there are no cycles in the tree and it is not
possible to reach a node from multiple directions.
The typical way to traverse a tree is to start
a depth-first search at an arbitrary node.
The following recursive function can be used:
\begin{lstlisting}
void dfs(int s, int e) {
// process node s
for (auto u : adj[s]) {
if (u != e) dfs(u, s);
}
}
\end{lstlisting}
The function is given two parameters: the current node $s$
and the previous node $e$.
The purpose of the parameter $e$ is to make sure
that the search only moves to nodes
that have not been visited yet.
The following function call starts the search
at node $x$:
\begin{lstlisting}
dfs(x, 0);
\end{lstlisting}
In the first call $e=0$, because there is no
previous node, and it is allowed
to proceed to any direction in the tree.
\subsubsection{Dynamic programming}
Dynamic programming can be used to calculate
some information during a tree traversal.
Using dynamic programming, we can, for example,
calculate in $O(n)$ time for each node of a rooted tree the
number of nodes in its subtree
or the length of the longest path from the node
to a leaf.
As an example, let us calculate for each node $s$
a value $\texttt{count}[s]$: the number of nodes in its subtree.
The subtree contains the node itself and
all nodes in the subtrees of its children,
so we can calculate the number of nodes
recursively using the following code:
\begin{lstlisting}
void dfs(int s, int e) {
count[s] = 1;
for (auto u : adj[s]) {
if (u == e) continue;
dfs(u, s);
count[s] += count[u];
}
}
\end{lstlisting}
\section{Diameter}
\index{diameter}
The \key{diameter} of a tree
is the maximum length of a path between two nodes.
For example, consider the following tree:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,3) {$1$};
\node[draw, circle] (2) at (2,3) {$4$};
\node[draw, circle] (3) at (0,1) {$2$};
\node[draw, circle] (4) at (2,1) {$3$};
\node[draw, circle] (5) at (4,1) {$7$};
\node[draw, circle] (6) at (-2,3) {$5$};
\node[draw, circle] (7) at (-2,1) {$6$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (3) -- (7);
\end{tikzpicture}
\end{center}
The diameter of this tree is 4,
which corresponds to the following path:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,3) {$1$};
\node[draw, circle] (2) at (2,3) {$4$};
\node[draw, circle] (3) at (0,1) {$2$};
\node[draw, circle] (4) at (2,1) {$3$};
\node[draw, circle] (5) at (4,1) {$7$};
\node[draw, circle] (6) at (-2,3) {$5$};
\node[draw, circle] (7) at (-2,1) {$6$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (3) -- (7);
\path[draw,thick,-,color=red,line width=2pt] (7) -- (3);
\path[draw,thick,-,color=red,line width=2pt] (3) -- (1);
\path[draw,thick,-,color=red,line width=2pt] (1) -- (2);
\path[draw,thick,-,color=red,line width=2pt] (2) -- (5);
\end{tikzpicture}
\end{center}
Note that there may be several maximum-length paths.
In the above path, we could replace node 6 with node 5
to obtain another path with length 4.
Next we will discuss two $O(n)$ time algorithms
for calculating the diameter of a tree.
The first algorithm is based on dynamic programming,
and the second algorithm uses two depth-first searches.
\subsubsection{Algorithm 1}
A general way to approach many tree problems
is to first root the tree arbitrarily.
After this, we can try to solve the problem
separately for each subtree.
Our first algorithm for calculating the diameter
is based on this idea.
An important observation is that every path
in a rooted tree has a \emph{highest point}:
the highest node that belongs to the path.
Thus, we can calculate for each node the length
of the longest path whose highest point is the node.
One of those paths corresponds to the diameter of the tree.
For example, in the following tree,
node 1 is the highest point on the path
that corresponds to the diameter:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,3) {$1$};
\node[draw, circle] (2) at (2,1) {$4$};
\node[draw, circle] (3) at (-2,1) {$2$};
\node[draw, circle] (4) at (0,1) {$3$};
\node[draw, circle] (5) at (2,-1) {$7$};
\node[draw, circle] (6) at (-3,-1) {$5$};
\node[draw, circle] (7) at (-1,-1) {$6$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (3) -- (7);
\path[draw,thick,-,color=red,line width=2pt] (7) -- (3);
\path[draw,thick,-,color=red,line width=2pt] (3) -- (1);
\path[draw,thick,-,color=red,line width=2pt] (1) -- (2);
\path[draw,thick,-,color=red,line width=2pt] (2) -- (5);
\end{tikzpicture}
\end{center}
We calculate for each node $x$ two values:
\begin{itemize}
\item $\texttt{toLeaf}(x)$: the maximum length of a path from $x$ to any leaf
\item $\texttt{maxLength}(x)$: the maximum length of a path
whose highest point is $x$
\end{itemize}
For example, in the above tree,
$\texttt{toLeaf}(1)=2$, because there is a path
$1 \rightarrow 2 \rightarrow 6$,
and $\texttt{maxLength}(1)=4$,
because there is a path
$6 \rightarrow 2 \rightarrow 1 \rightarrow 4 \rightarrow 7$.
In this case, $\texttt{maxLength}(1)$ equals the diameter.
Dynamic programming can be used to calculate the above
values for all nodes in $O(n)$ time.
First, to calculate $\texttt{toLeaf}(x)$,
we go through the children of $x$,
choose a child $c$ with maximum $\texttt{toLeaf}(c)$
and add one to this value.
Then, to calculate $\texttt{maxLength}(x)$,
we choose two distinct children $a$ and $b$
such that the sum $\texttt{toLeaf}(a)+\texttt{toLeaf}(b)$
is maximum and add two to this sum.
\subsubsection{Algorithm 2}
Another efficient way to calculate the diameter
of a tree is based on two depth-first searches.
First, we choose an arbitrary node $a$ in the tree
and find the farthest node $b$ from $a$.
Then, we find the farthest node $c$ from $b$.
The diameter of the tree is the distance between $b$ and $c$.
In the following graph, $a$, $b$ and $c$ could be:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,3) {$1$};
\node[draw, circle] (2) at (2,3) {$4$};
\node[draw, circle] (3) at (0,1) {$2$};
\node[draw, circle] (4) at (2,1) {$3$};
\node[draw, circle] (5) at (4,1) {$7$};
\node[draw, circle] (6) at (-2,3) {$5$};
\node[draw, circle] (7) at (-2,1) {$6$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (3) -- (7);
\node[color=red] at (2,1.6) {$a$};
\node[color=red] at (-2,1.6) {$b$};
\node[color=red] at (4,1.6) {$c$};
\path[draw,thick,-,color=red,line width=2pt] (7) -- (3);
\path[draw,thick,-,color=red,line width=2pt] (3) -- (1);
\path[draw,thick,-,color=red,line width=2pt] (1) -- (2);
\path[draw,thick,-,color=red,line width=2pt] (2) -- (5);
\end{tikzpicture}
\end{center}
This is an elegant method, but why does it work?
It helps to draw the tree differently so that
the path that corresponds to the diameter
is horizontal, and all other
nodes hang from it:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (2,1) {$1$};
\node[draw, circle] (2) at (4,1) {$4$};
\node[draw, circle] (3) at (0,1) {$2$};
\node[draw, circle] (4) at (2,-1) {$3$};
\node[draw, circle] (5) at (6,1) {$7$};
\node[draw, circle] (6) at (0,-1) {$5$};
\node[draw, circle] (7) at (-2,1) {$6$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (3) -- (7);
\node[color=red] at (2,-1.6) {$a$};
\node[color=red] at (-2,1.6) {$b$};
\node[color=red] at (6,1.6) {$c$};
\node[color=red] at (2,1.6) {$x$};
\path[draw,thick,-,color=red,line width=2pt] (7) -- (3);
\path[draw,thick,-,color=red,line width=2pt] (3) -- (1);
\path[draw,thick,-,color=red,line width=2pt] (1) -- (2);
\path[draw,thick,-,color=red,line width=2pt] (2) -- (5);
\end{tikzpicture}
\end{center}
Node $x$ indicates the place where the path
from node $a$ joins the path that corresponds
to the diameter.
The farthest node from $a$
is node $b$, node $c$ or some other node
that is at least as far from node $x$.
Thus, this node is always a valid choice for
an endpoint of a path that corresponds to the diameter.
\section{All longest paths}
Our next problem is to calculate for every node
in the tree the maximum length of a path
that begins at the node.
This can be seen as a generalization of the
tree diameter problem, because the largest of those
lengths equals the diameter of the tree.
Also this problem can be solved in $O(n)$ time.
As an example, consider the following tree:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,0) {$1$};
\node[draw, circle] (2) at (-1.5,-1) {$4$};
\node[draw, circle] (3) at (2,0) {$2$};
\node[draw, circle] (4) at (-1.5,1) {$3$};
\node[draw, circle] (6) at (3.5,-1) {$6$};
\node[draw, circle] (7) at (3.5,1) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (3) -- (7);
\end{tikzpicture}
\end{center}
Let $\texttt{maxLength}(x)$ denote the maximum length
of a path that begins at node $x$.
For example, in the above tree,
$\texttt{maxLength}(4)=3$, because there
is a path $4 \rightarrow 1 \rightarrow 2 \rightarrow 6$.
Here is a complete table of the values:
\begin{center}
\begin{tabular}{l|lllllll}
node $x$ & 1 & 2 & 3 & 4 & 5 & 6 \\
$\texttt{maxLength}(x)$ & 2 & 2 & 3 & 3 & 3 & 3 \\
\end{tabular}
\end{center}
Also in this problem, a good starting point
for solving the problem is to root the tree arbitrarily:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,3) {$1$};
\node[draw, circle] (2) at (2,1) {$4$};
\node[draw, circle] (3) at (-2,1) {$2$};
\node[draw, circle] (4) at (0,1) {$3$};
\node[draw, circle] (6) at (-3,-1) {$5$};
\node[draw, circle] (7) at (-1,-1) {$6$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (3) -- (7);
\end{tikzpicture}
\end{center}
The first part of the problem is to calculate for every node $x$
the maximum length of a path that goes through a child of $x$.
For example, the longest path from node 1
goes through its child 2:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,3) {$1$};
\node[draw, circle] (2) at (2,1) {$4$};
\node[draw, circle] (3) at (-2,1) {$2$};
\node[draw, circle] (4) at (0,1) {$3$};
\node[draw, circle] (6) at (-3,-1) {$5$};
\node[draw, circle] (7) at (-1,-1) {$6$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (3) -- (7);
\path[draw,thick,->,color=red,line width=2pt] (1) -- (3);
\path[draw,thick,->,color=red,line width=2pt] (3) -- (6);
\end{tikzpicture}
\end{center}
This part is easy to solve in $O(n)$ time, because we can use
dynamic programming as we have done previously.
Then, the second part of the problem is to calculate
for every node $x$ the maximum length of a path
through its parent $p$.
For example, the longest path
from node 3 goes through its parent 1:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,3) {$1$};
\node[draw, circle] (2) at (2,1) {$4$};
\node[draw, circle] (3) at (-2,1) {$2$};
\node[draw, circle] (4) at (0,1) {$3$};
\node[draw, circle] (6) at (-3,-1) {$5$};
\node[draw, circle] (7) at (-1,-1) {$6$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (3) -- (7);
\path[draw,thick,->,color=red,line width=2pt] (4) -- (1);
\path[draw,thick,->,color=red,line width=2pt] (1) -- (3);
\path[draw,thick,->,color=red,line width=2pt] (3) -- (6);
\end{tikzpicture}
\end{center}
At first glance, it seems that we should choose
the longest path from $p$.
However, this \emph{does not} always work,
because the longest path from $p$
may go through $x$.
Here is an example of this situation:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,3) {$1$};
\node[draw, circle] (2) at (2,1) {$4$};
\node[draw, circle] (3) at (-2,1) {$2$};
\node[draw, circle] (4) at (0,1) {$3$};
\node[draw, circle] (6) at (-3,-1) {$5$};
\node[draw, circle] (7) at (-1,-1) {$6$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (3) -- (7);
\path[draw,thick,->,color=red,line width=2pt] (3) -- (1);
\path[draw,thick,->,color=red,line width=2pt] (1) -- (2);
\end{tikzpicture}
\end{center}
Still, we can solve the second part in
$O(n)$ time by storing \emph{two} maximum lengths
for each node $x$:
\begin{itemize}
\item $\texttt{maxLength}_1(x)$:
the maximum length of a path from $x$
\item $\texttt{maxLength}_2(x)$
the maximum length of a path from $x$
in another direction than the first path
\end{itemize}
For example, in the above graph,
$\texttt{maxLength}_1(1)=2$
using the path $1 \rightarrow 2 \rightarrow 5$,
and $\texttt{maxLength}_2(1)=1$
using the path $1 \rightarrow 3$.
Finally, if the path that corresponds to
$\texttt{maxLength}_1(p)$ goes through $x$,
we conclude that the maximum length is
$\texttt{maxLength}_2(p)+1$,
and otherwise the maximum length is
$\texttt{maxLength}_1(p)+1$.
\section{Binary trees}
\index{binary tree}
\begin{samepage}
A \key{binary tree} is a rooted tree
where each node has a left and right subtree.
It is possible that a subtree of a node is empty.
Thus, every node in a binary tree has
zero, one or two children.
For example, the following tree is a binary tree:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,0) {$1$};
\node[draw, circle] (2) at (-1.5,-1.5) {$2$};
\node[draw, circle] (3) at (1.5,-1.5) {$3$};
\node[draw, circle] (4) at (-3,-3) {$4$};
\node[draw, circle] (5) at (0,-3) {$5$};
\node[draw, circle] (6) at (-1.5,-4.5) {$6$};
\node[draw, circle] (7) at (3,-3) {$7$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (5) -- (6);
\path[draw,thick,-] (3) -- (7);
\end{tikzpicture}
\end{center}
\end{samepage}
\index{pre-order}
\index{in-order}
\index{post-order}
The nodes of a binary tree have three natural
orderings that correspond to different ways to
recursively traverse the tree:
\begin{itemize}
\item \key{pre-order}: first process the root,
then traverse the left subtree, then traverse the right subtree
\item \key{in-order}: first traverse the left subtree,
then process the root, then traverse the right subtree
\item \key{post-order}: first traverse the left subtree,
then traverse the right subtree, then process the root
\end{itemize}
For the above tree, the nodes in
pre-order are
$[1,2,4,5,6,3,7]$,
in in-order $[4,2,6,5,1,3,7]$
and in post-order $[4,6,5,2,7,3,1]$.
If we know the pre-order and in-order
of a tree, we can reconstruct the exact structure of the tree.
For example, the above tree is the only possible tree
with pre-order $[1,2,4,5,6,3,7]$ and
in-order $[4,2,6,5,1,3,7]$.
In a similar way, the post-order and in-order
also determine the structure of a tree.
However, the situation is different if we only know
the pre-order and post-order of a tree.
In this case, there may be more than one tree
that match the orderings.
For example, in both of the trees
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,0) {$1$};
\node[draw, circle] (2) at (-1.5,-1.5) {$2$};
\path[draw,thick,-] (1) -- (2);
\node[draw, circle] (1b) at (0+4,0) {$1$};
\node[draw, circle] (2b) at (1.5+4,-1.5) {$2$};
\path[draw,thick,-] (1b) -- (2b);
\end{tikzpicture}
\end{center}
the pre-order is $[1,2]$ and the post-order is $[2,1]$,
but the structures of the trees are different.

712
chapter15.tex Normal file
View File

@ -0,0 +1,712 @@
\chapter{Spanning trees}
\index{spanning tree}
A \key{spanning tree} of a graph consists of
all nodes of the graph and some of the
edges of the graph so that there is a path
between any two nodes.
Like trees in general, spanning trees are
connected and acyclic.
Usually there are several ways to construct a spanning tree.
For example, consider the following graph:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
One spanning tree for the graph is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
The weight of a spanning tree is the sum of its edge weights.
For example, the weight of the above spanning tree is
$3+5+9+3+2=22$.
\index{minimum spanning tree}
A \key{minimum spanning tree}
is a spanning tree whose weight is as small as possible.
The weight of a minimum spanning tree for the example graph
is 20, and such a tree can be constructed as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
\index{maximum spanning tree}
In a similar way, a \key{maximum spanning tree}
is a spanning tree whose weight is as large as possible.
The weight of a maximum spanning tree for the
example graph is 32:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
\end{tikzpicture}
\end{center}
Note that a graph may have several
minimum and maximum spanning trees,
so the trees are not unique.
It turns out that several greedy methods
can be used to construct minimum and maximum
spanning trees.
In this chapter, we discuss two algorithms
that process
the edges of the graph ordered by their weights.
We focus on finding minimum spanning trees,
but the same algorithms can find
maximum spanning trees by processing the edges in reverse order.
\section{Kruskal's algorithm}
\index{Kruskal's algorithm}
In \key{Kruskal's algorithm}\footnote{The algorithm was published in 1956
by J. B. Kruskal \cite{kru56}.}, the initial spanning tree
only contains the nodes of the graph
and does not contain any edges.
Then the algorithm goes through the edges
ordered by their weights, and always adds an edge
to the tree if it does not create a cycle.
The algorithm maintains the components
of the tree.
Initially, each node of the graph
belongs to a separate component.
Always when an edge is added to the tree,
two components are joined.
Finally, all nodes belong to the same component,
and a minimum spanning tree has been found.
\subsubsection{Example}
\begin{samepage}
Let us consider how Kruskal's algorithm processes the
following graph:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
\end{samepage}
\begin{samepage}
The first step of the algorithm is to sort the
edges in increasing order of their weights.
The result is the following list:
\begin{tabular}{ll}
\\
edge & weight \\
\hline
5--6 & 2 \\
1--2 & 3 \\
3--6 & 3 \\
1--5 & 5 \\
2--3 & 5 \\
2--5 & 6 \\
4--6 & 7 \\
3--4 & 9 \\
\\
\end{tabular}
\end{samepage}
After this, the algorithm goes through the list
and adds each edge to the tree if it joins
two separate components.
Initially, each node is in its own component:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
%\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
%\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
%\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
%\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
%\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
%\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
%\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
%\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
The first edge to be added to the tree is
the edge 5--6 that creates a component $\{5,6\}$
by joining the components $\{5\}$ and $\{6\}$:
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
%\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
%\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
%\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
%\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
%\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
%\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
%\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
After this, the edges 1--2, 3--6 and 1--5 are added in a similar way:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
%\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
%\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
%\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
%\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
After those steps, most components have been joined
and there are two components in the tree:
$\{1,2,3,5,6\}$ and $\{4\}$.
The next edge in the list is the edge 2--3,
but it will not be included in the tree, because
nodes 2 and 3 are already in the same component.
For the same reason, the edge 2--5 will not be included in the tree.
\begin{samepage}
Finally, the edge 4--6 will be included in the tree:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
%\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
%\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
%\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
\end{samepage}
After this, the algorithm will not add any
new edges, because the graph is connected
and there is a path between any two nodes.
The resulting graph is a minimum spanning tree
with weight $2+3+3+5+7=20$.
\subsubsection{Why does this work?}
It is a good question why Kruskal's algorithm works.
Why does the greedy strategy guarantee that we
will find a minimum spanning tree?
Let us see what happens if the minimum weight edge of
the graph is \emph{not} included in the spanning tree.
For example, suppose that a spanning tree
for the previous graph would not contain the
minimum weight edge 5--6.
We do not know the exact structure of such a spanning tree,
but in any case it has to contain some edges.
Assume that the tree would be as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-,dashed] (1) -- (2);
\path[draw,thick,-,dashed] (2) -- (5);
\path[draw,thick,-,dashed] (2) -- (3);
\path[draw,thick,-,dashed] (3) -- (4);
\path[draw,thick,-,dashed] (4) -- (6);
\end{tikzpicture}
\end{center}
However, it is not possible that the above tree
would be a minimum spanning tree for the graph.
The reason for this is that we can remove an edge
from the tree and replace it with the minimum weight edge 5--6.
This produces a spanning tree whose weight is
\emph{smaller}:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-,dashed] (1) -- (2);
\path[draw,thick,-,dashed] (2) -- (5);
\path[draw,thick,-,dashed] (3) -- (4);
\path[draw,thick,-,dashed] (4) -- (6);
\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
\end{tikzpicture}
\end{center}
For this reason, it is always optimal
to include the minimum weight edge
in the tree to produce a minimum spanning tree.
Using a similar argument, we can show that it
is also optimal to add the next edge in weight order
to the tree, and so on.
Hence, Kruskal's algorithm works correctly and
always produces a minimum spanning tree.
\subsubsection{Implementation}
When implementing Kruskal's algorithm,
it is convenient to use
the edge list representation of the graph.
The first phase of the algorithm sorts the
edges in the list in $O(m \log m)$ time.
After this, the second phase of the algorithm
builds the minimum spanning tree as follows:
\begin{lstlisting}
for (...) {
if (!same(a,b)) unite(a,b);
}
\end{lstlisting}
The loop goes through the edges in the list
and always processes an edge $a$--$b$
where $a$ and $b$ are two nodes.
Two functions are needed:
the function \texttt{same} determines
if $a$ and $b$ are in the same component,
and the function \texttt{unite}
joins the components that contain $a$ and $b$.
The problem is how to efficiently implement
the functions \texttt{same} and \texttt{unite}.
One possibility is to implement the function
\texttt{same} as a graph traversal and check if
we can get from node $a$ to node $b$.
However, the time complexity of such a function
would be $O(n+m)$
and the resulting algorithm would be slow,
because the function \texttt{same} will be called for each edge in the graph.
We will solve the problem using a union-find structure
that implements both functions in $O(\log n)$ time.
Thus, the time complexity of Kruskal's algorithm
will be $O(m \log n)$ after sorting the edge list.
\section{Union-find structure}
\index{union-find structure}
A \key{union-find structure} maintains
a collection of sets.
The sets are disjoint, so no element
belongs to more than one set.
Two $O(\log n)$ time operations are supported:
the \texttt{unite} operation joins two sets,
and the \texttt{find} operation finds the representative
of the set that contains a given element\footnote{The structure presented here
was introduced in 1971 by J. D. Hopcroft and J. D. Ullman \cite{hop71}.
Later, in 1975, R. E. Tarjan studied a more sophisticated variant
of the structure \cite{tar75} that is discussed in many algorithm
textbooks nowadays.}.
\subsubsection{Structure}
In a union-find structure, one element in each set
is the representative of the set,
and there is a chain from any other element of the
set to the representative.
For example, assume that the sets are
$\{1,4,7\}$, $\{5\}$ and $\{2,3,6,8\}$:
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (1) at (0,-1) {$1$};
\node[draw, circle] (2) at (7,0) {$2$};
\node[draw, circle] (3) at (7,-1.5) {$3$};
\node[draw, circle] (4) at (1,0) {$4$};
\node[draw, circle] (5) at (4,0) {$5$};
\node[draw, circle] (6) at (6,-2.5) {$6$};
\node[draw, circle] (7) at (2,-1) {$7$};
\node[draw, circle] (8) at (8,-2.5) {$8$};
\path[draw,thick,->] (1) -- (4);
\path[draw,thick,->] (7) -- (4);
\path[draw,thick,->] (3) -- (2);
\path[draw,thick,->] (6) -- (3);
\path[draw,thick,->] (8) -- (3);
\end{tikzpicture}
\end{center}
In this case the representatives
of the sets are 4, 5 and 2.
We can find the representative of any element
by following the chain that begins at the element.
For example, the element 2 is the representative
for the element 6, because
we follow the chain $6 \rightarrow 3 \rightarrow 2$.
Two elements belong to the same set exactly when
their representatives are the same.
Two sets can be joined by connecting the
representative of one set to the
representative of the other set.
For example, the sets
$\{1,4,7\}$ and $\{2,3,6,8\}$
can be joined as follows:
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (1) at (2,-1) {$1$};
\node[draw, circle] (2) at (7,0) {$2$};
\node[draw, circle] (3) at (7,-1.5) {$3$};
\node[draw, circle] (4) at (3,0) {$4$};
\node[draw, circle] (6) at (6,-2.5) {$6$};
\node[draw, circle] (7) at (4,-1) {$7$};
\node[draw, circle] (8) at (8,-2.5) {$8$};
\path[draw,thick,->] (1) -- (4);
\path[draw,thick,->] (7) -- (4);
\path[draw,thick,->] (3) -- (2);
\path[draw,thick,->] (6) -- (3);
\path[draw,thick,->] (8) -- (3);
\path[draw,thick,->] (4) -- (2);
\end{tikzpicture}
\end{center}
The resulting set contains the elements
$\{1,2,3,4,6,7,8\}$.
From this on, the element 2 is the representative
for the entire set and the old representative 4
points to the element 2.
The efficiency of the union-find structure depends on
how the sets are joined.
It turns out that we can follow a simple strategy:
always connect the representative of the
\emph{smaller} set to the representative of the \emph{larger} set
(or if the sets are of equal size,
we can make an arbitrary choice).
Using this strategy, the length of any chain
will be $O(\log n)$, so we can
find the representative of any element
efficiently by following the corresponding chain.
\subsubsection{Implementation}
The union-find structure can be implemented
using arrays.
In the following implementation,
the array \texttt{link} contains for each element
the next element
in the chain or the element itself if it is
a representative,
and the array \texttt{size} indicates for each representative
the size of the corresponding set.
Initially, each element belongs to a separate set:
\begin{lstlisting}
for (int i = 1; i <= n; i++) link[i] = i;
for (int i = 1; i <= n; i++) size[i] = 1;
\end{lstlisting}
The function \texttt{find} returns
the representative for an element $x$.
The representative can be found by following
the chain that begins at $x$.
\begin{lstlisting}
int find(int x) {
while (x != link[x]) x = link[x];
return x;
}
\end{lstlisting}
The function \texttt{same} checks
whether elements $a$ and $b$ belong to the same set.
This can easily be done by using the
function \texttt{find}:
\begin{lstlisting}
bool same(int a, int b) {
return find(a) == find(b);
}
\end{lstlisting}
\begin{samepage}
The function \texttt{unite} joins the sets
that contain elements $a$ and $b$
(the elements have to be in different sets).
The function first finds the representatives
of the sets and then connects the smaller
set to the larger set.
\begin{lstlisting}
void unite(int a, int b) {
a = find(a);
b = find(b);
if (size[a] < size[b]) swap(a,b);
size[a] += size[b];
link[b] = a;
}
\end{lstlisting}
\end{samepage}
The time complexity of the function \texttt{find}
is $O(\log n)$ assuming that the length of each
chain is $O(\log n)$.
In this case, the functions \texttt{same} and \texttt{unite}
also work in $O(\log n)$ time.
The function \texttt{unite} makes sure that the
length of each chain is $O(\log n)$ by connecting
the smaller set to the larger set.
\section{Prim's algorithm}
\index{Prim's algorithm}
\key{Prim's algorithm}\footnote{The algorithm is
named after R. C. Prim who published it in 1957 \cite{pri57}.
However, the same algorithm was discovered already in 1930
by V. Jarník.} is an alternative method
for finding a minimum spanning tree.
The algorithm first adds an arbitrary node
to the tree.
After this, the algorithm always chooses
a minimum-weight edge that
adds a new node to the tree.
Finally, all nodes have been added to the tree
and a minimum spanning tree has been found.
Prim's algorithm resembles Dijkstra's algorithm.
The difference is that Dijkstra's algorithm always
selects an edge whose distance from the starting
node is minimum, but Prim's algorithm simply selects
the minimum weight edge that adds a new node to the tree.
\subsubsection{Example}
Let us consider how Prim's algorithm works
in the following graph:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
%\path[draw=red,thick,-,line width=2pt] (5) -- (6);
\end{tikzpicture}
\end{center}
Initially, there are no edges between the nodes:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
%\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
%\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
%\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
%\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
%\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
%\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
%\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
%\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
An arbitrary node can be the starting node,
so let us choose node 1.
First, we add node 2 that is connected by
an edge of weight 3:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
%\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
%\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
%\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
%\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
%\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
%\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
%\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
After this, there are two edges with weight 5,
so we can add either node 3 or node 5 to the tree.
Let us add node 3 first:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
%\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
%\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
%\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
%\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
%\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
%\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
\begin{samepage}
The process continues until all nodes have been included in the tree:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
%\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
%\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
%\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
\end{samepage}
\subsubsection{Implementation}
Like Dijkstra's algorithm, Prim's algorithm can be
efficiently implemented using a priority queue.
The priority queue should contain all nodes
that can be connected to the current component using
a single edge, in increasing order of the weights
of the corresponding edges.
The time complexity of Prim's algorithm is
$O(n + m \log m)$ that equals the time complexity
of Dijkstra's algorithm.
In practice, Prim's and Kruskal's algorithms
are both efficient, and the choice of the algorithm
is a matter of taste.
Still, most competitive programmers use Kruskal's algorithm.

708
chapter16.tex Normal file
View File

@ -0,0 +1,708 @@
\chapter{Directed graphs}
In this chapter, we focus on two classes of directed graphs:
\begin{itemize}
\item \key{Acyclic graphs}:
There are no cycles in the graph,
so there is no path from any node to itself\footnote{Directed acyclic
graphs are sometimes called DAGs.}.
\item \key{Successor graphs}:
The outdegree of each node is 1,
so each node has a unique successor.
\end{itemize}
It turns out that in both cases,
we can design efficient algorithms that are based
on the special properties of the graphs.
\section{Topological sorting}
\index{topological sorting}
\index{cycle}
A \key{topological sort} is an ordering
of the nodes of a directed graph
such that if there is a path from node $a$ to node $b$,
then node $a$ appears before node $b$ in the ordering.
For example, for the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,5) {$1$};
\node[draw, circle] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,5) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\node[draw, circle] (6) at (5,3) {$6$};
\path[draw,thick,->,>=latex] (1) -- (2);
\path[draw,thick,->,>=latex] (2) -- (3);
\path[draw,thick,->,>=latex] (4) -- (1);
\path[draw,thick,->,>=latex] (4) -- (5);
\path[draw,thick,->,>=latex] (5) -- (2);
\path[draw,thick,->,>=latex] (5) -- (3);
\path[draw,thick,->,>=latex] (3) -- (6);
\end{tikzpicture}
\end{center}
one topological sort is
$[4,1,5,2,3,6]$:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (-6,0) {$1$};
\node[draw, circle] (2) at (-3,0) {$2$};
\node[draw, circle] (3) at (-1.5,0) {$3$};
\node[draw, circle] (4) at (-7.5,0) {$4$};
\node[draw, circle] (5) at (-4.5,0) {$5$};
\node[draw, circle] (6) at (-0,0) {$6$};
\path[draw,thick,->,>=latex] (1) edge [bend right=30] (2);
\path[draw,thick,->,>=latex] (2) -- (3);
\path[draw,thick,->,>=latex] (4) -- (1);
\path[draw,thick,->,>=latex] (4) edge [bend left=30] (5);
\path[draw,thick,->,>=latex] (5) -- (2);
\path[draw,thick,->,>=latex] (5) edge [bend left=30] (3);
\path[draw,thick,->,>=latex] (3) -- (6);
\end{tikzpicture}
\end{center}
An acyclic graph always has a topological sort.
However, if the graph contains a cycle,
it is not possible to form a topological sort,
because no node of the cycle can appear
before the other nodes of the cycle in the ordering.
It turns out that depth-first search can be used
to both check if a directed graph contains a cycle
and, if it does not contain a cycle, to construct a topological sort.
\subsubsection{Algorithm}
The idea is to go through the nodes of the graph
and always begin a depth-first search at the current node
if it has not been processed yet.
During the searches, the nodes have three possible states:
\begin{itemize}
\item state 0: the node has not been processed (white)
\item state 1: the node is under processing (light gray)
\item state 2: the node has been processed (dark gray)
\end{itemize}
Initially, the state of each node is 0.
When a search reaches a node for the first time,
its state becomes 1.
Finally, after all successors of the node have
been processed, its state becomes 2.
If the graph contains a cycle, we will find this out
during the search, because sooner or later
we will arrive at a node whose state is 1.
In this case, it is not possible to construct a topological sort.
If the graph does not contain a cycle, we can construct
a topological sort by
adding each node to a list when the state of the node becomes 2.
This list in reverse order is a topological sort.
\subsubsection{Example 1}
In the example graph, the search first proceeds
from node 1 to node 6:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle,fill=gray!20] (1) at (1,5) {$1$};
\node[draw, circle,fill=gray!20] (2) at (3,5) {$2$};
\node[draw, circle,fill=gray!20] (3) at (5,5) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\node[draw, circle,fill=gray!80] (6) at (5,3) {$6$};
\path[draw,thick,->,>=latex] (4) -- (1);
\path[draw,thick,->,>=latex] (4) -- (5);
\path[draw,thick,->,>=latex] (5) -- (2);
\path[draw,thick,->,>=latex] (5) -- (3);
%\path[draw,thick,->,>=latex] (3) -- (6);
\path[draw=red,thick,->,line width=2pt] (1) -- (2);
\path[draw=red,thick,->,line width=2pt] (2) -- (3);
\path[draw=red,thick,->,line width=2pt] (3) -- (6);
\end{tikzpicture}
\end{center}
Now node 6 has been processed, so it is added to the list.
After this, also nodes 3, 2 and 1 are added to the list:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle,fill=gray!80] (1) at (1,5) {$1$};
\node[draw, circle,fill=gray!80] (2) at (3,5) {$2$};
\node[draw, circle,fill=gray!80] (3) at (5,5) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\node[draw, circle,fill=gray!80] (6) at (5,3) {$6$};
\path[draw,thick,->,>=latex] (1) -- (2);
\path[draw,thick,->,>=latex] (2) -- (3);
\path[draw,thick,->,>=latex] (4) -- (1);
\path[draw,thick,->,>=latex] (4) -- (5);
\path[draw,thick,->,>=latex] (5) -- (2);
\path[draw,thick,->,>=latex] (5) -- (3);
\path[draw,thick,->,>=latex] (3) -- (6);
\end{tikzpicture}
\end{center}
At this point, the list is $[6,3,2,1]$.
The next search begins at node 4:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle,fill=gray!80] (1) at (1,5) {$1$};
\node[draw, circle,fill=gray!80] (2) at (3,5) {$2$};
\node[draw, circle,fill=gray!80] (3) at (5,5) {$3$};
\node[draw, circle,fill=gray!20] (4) at (1,3) {$4$};
\node[draw, circle,fill=gray!80] (5) at (3,3) {$5$};
\node[draw, circle,fill=gray!80] (6) at (5,3) {$6$};
\path[draw,thick,->,>=latex] (1) -- (2);
\path[draw,thick,->,>=latex] (2) -- (3);
\path[draw,thick,->,>=latex] (4) -- (1);
%\path[draw,thick,->,>=latex] (4) -- (5);
\path[draw,thick,->,>=latex] (5) -- (2);
\path[draw,thick,->,>=latex] (5) -- (3);
\path[draw,thick,->,>=latex] (3) -- (6);
\path[draw=red,thick,->,line width=2pt] (4) -- (5);
\end{tikzpicture}
\end{center}
Thus, the final list is $[6,3,2,1,5,4]$.
We have processed all nodes, so a topological sort has
been found.
The topological sort is the reverse list
$[4,5,1,2,3,6]$:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (3,0) {$1$};
\node[draw, circle] (2) at (4.5,0) {$2$};
\node[draw, circle] (3) at (6,0) {$3$};
\node[draw, circle] (4) at (0,0) {$4$};
\node[draw, circle] (5) at (1.5,0) {$5$};
\node[draw, circle] (6) at (7.5,0) {$6$};
\path[draw,thick,->,>=latex] (1) -- (2);
\path[draw,thick,->,>=latex] (2) -- (3);
\path[draw,thick,->,>=latex] (4) edge [bend left=30] (1);
\path[draw,thick,->,>=latex] (4) -- (5);
\path[draw,thick,->,>=latex] (5) edge [bend right=30] (2);
\path[draw,thick,->,>=latex] (5) edge [bend right=40] (3);
\path[draw,thick,->,>=latex] (3) -- (6);
\end{tikzpicture}
\end{center}
Note that a topological sort is not unique,
and there can be several topological sorts for a graph.
\subsubsection{Example 2}
Let us now consider a graph for which we
cannot construct a topological sort,
because the graph contains a cycle:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,5) {$1$};
\node[draw, circle] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,5) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\node[draw, circle] (6) at (5,3) {$6$};
\path[draw,thick,->,>=latex] (1) -- (2);
\path[draw,thick,->,>=latex] (2) -- (3);
\path[draw,thick,->,>=latex] (4) -- (1);
\path[draw,thick,->,>=latex] (4) -- (5);
\path[draw,thick,->,>=latex] (5) -- (2);
\path[draw,thick,->,>=latex] (3) -- (5);
\path[draw,thick,->,>=latex] (3) -- (6);
\end{tikzpicture}
\end{center}
The search proceeds as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle,fill=gray!20] (1) at (1,5) {$1$};
\node[draw, circle,fill=gray!20] (2) at (3,5) {$2$};
\node[draw, circle,fill=gray!20] (3) at (5,5) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle,fill=gray!20] (5) at (3,3) {$5$};
\node[draw, circle] (6) at (5,3) {$6$};
\path[draw,thick,->,>=latex] (4) -- (1);
\path[draw,thick,->,>=latex] (4) -- (5);
\path[draw,thick,->,>=latex] (3) -- (6);
\path[draw=red,thick,->,line width=2pt] (1) -- (2);
\path[draw=red,thick,->,line width=2pt] (2) -- (3);
\path[draw=red,thick,->,line width=2pt] (3) -- (5);
\path[draw=red,thick,->,line width=2pt] (5) -- (2);
\end{tikzpicture}
\end{center}
The search reaches node 2 whose state is 1,
which means that the graph contains a cycle.
In this example, there is a cycle
$2 \rightarrow 3 \rightarrow 5 \rightarrow 2$.
\section{Dynamic programming}
If a directed graph is acyclic,
dynamic programming can be applied to it.
For example, we can efficiently solve the following
problems concerning paths from a starting node
to an ending node:
\begin{itemize}
\item how many different paths are there?
\item what is the shortest/longest path?
\item what is the minimum/maximum number of edges in a path?
\item which nodes certainly appear in any path?
\end{itemize}
\subsubsection{Counting the number of paths}
As an example, let us calculate the number of paths
from node 1 to node 6 in the following graph:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,5) {$1$};
\node[draw, circle] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,5) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\node[draw, circle] (6) at (5,3) {$6$};
\path[draw,thick,->,>=latex] (1) -- (2);
\path[draw,thick,->,>=latex] (2) -- (3);
\path[draw,thick,->,>=latex] (1) -- (4);
\path[draw,thick,->,>=latex] (4) -- (5);
\path[draw,thick,->,>=latex] (5) -- (2);
\path[draw,thick,->,>=latex] (5) -- (3);
\path[draw,thick,->,>=latex] (3) -- (6);
\end{tikzpicture}
\end{center}
There are a total of three such paths:
\begin{itemize}
\item $1 \rightarrow 2 \rightarrow 3 \rightarrow 6$
\item $1 \rightarrow 4 \rightarrow 5 \rightarrow 2 \rightarrow 3 \rightarrow 6$
\item $1 \rightarrow 4 \rightarrow 5 \rightarrow 3 \rightarrow 6$
\end{itemize}
Let $\texttt{paths}(x)$ denote the number of paths from
node 1 to node $x$.
As a base case, $\texttt{paths}(1)=1$.
Then, to calculate other values of $\texttt{paths}(x)$,
we may use the recursion
\[\texttt{paths}(x) = \texttt{paths}(a_1)+\texttt{paths}(a_2)+\cdots+\texttt{paths}(a_k)\]
where $a_1,a_2,\ldots,a_k$ are the nodes from which there
is an edge to $x$.
Since the graph is acyclic, the values of $\texttt{paths}(x)$
can be calculated in the order of a topological sort.
A topological sort for the above graph is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,0) {$1$};
\node[draw, circle] (2) at (4.5,0) {$2$};
\node[draw, circle] (3) at (6,0) {$3$};
\node[draw, circle] (4) at (1.5,0) {$4$};
\node[draw, circle] (5) at (3,0) {$5$};
\node[draw, circle] (6) at (7.5,0) {$6$};
\path[draw,thick,->,>=latex] (1) edge [bend left=30] (2);
\path[draw,thick,->,>=latex] (2) -- (3);
\path[draw,thick,->,>=latex] (1) -- (4);
\path[draw,thick,->,>=latex] (4) -- (5);
\path[draw,thick,->,>=latex] (5) -- (2);
\path[draw,thick,->,>=latex] (5) edge [bend right=30] (3);
\path[draw,thick,->,>=latex] (3) -- (6);
\end{tikzpicture}
\end{center}
Hence, the numbers of paths are as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,5) {$1$};
\node[draw, circle] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,5) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\node[draw, circle] (6) at (5,3) {$6$};
\path[draw,thick,->,>=latex] (1) -- (2);
\path[draw,thick,->,>=latex] (2) -- (3);
\path[draw,thick,->,>=latex] (1) -- (4);
\path[draw,thick,->,>=latex] (4) -- (5);
\path[draw,thick,->,>=latex] (5) -- (2);
\path[draw,thick,->,>=latex] (5) -- (3);
\path[draw,thick,->,>=latex] (3) -- (6);
\node[color=red] at (1,2.3) {$1$};
\node[color=red] at (3,2.3) {$1$};
\node[color=red] at (5,2.3) {$3$};
\node[color=red] at (1,5.7) {$1$};
\node[color=red] at (3,5.7) {$2$};
\node[color=red] at (5,5.7) {$3$};
\end{tikzpicture}
\end{center}
For example, to calculate the value of $\texttt{paths}(3)$,
we can use the formula $\texttt{paths}(2)+\texttt{paths}(5)$,
because there are edges from nodes 2 and 5
to node 3.
Since $\texttt{paths}(2)=2$ and $\texttt{paths}(5)=1$, we conclude that $\texttt{paths}(3)=3$.
\subsubsection{Extending Dijkstra's algorithm}
\index{Dijkstra's algorithm}
A by-product of Dijkstra's algorithm is a directed, acyclic
graph that indicates for each node of the original graph
the possible ways to reach the node using a shortest path
from the starting node.
Dynamic programming can be applied to that graph.
For example, in the graph
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (1) at (0,0) {$1$};
\node[draw, circle] (2) at (2,0) {$2$};
\node[draw, circle] (3) at (0,-2) {$3$};
\node[draw, circle] (4) at (2,-2) {$4$};
\node[draw, circle] (5) at (4,-1) {$5$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
\path[draw,thick,-] (1) -- node[font=\small,label=left:5] {} (3);
\path[draw,thick,-] (2) -- node[font=\small,label=right:4] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=above:8] {} (5);
\path[draw,thick,-] (3) -- node[font=\small,label=below:2] {} (4);
\path[draw,thick,-] (4) -- node[font=\small,label=below:1] {} (5);
\path[draw,thick,-] (2) -- node[font=\small,label=above:2] {} (3);
\end{tikzpicture}
\end{center}
the shortest paths from node 1 may use the following edges:
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (1) at (0,0) {$1$};
\node[draw, circle] (2) at (2,0) {$2$};
\node[draw, circle] (3) at (0,-2) {$3$};
\node[draw, circle] (4) at (2,-2) {$4$};
\node[draw, circle] (5) at (4,-1) {$5$};
\path[draw,thick,->] (1) -- node[font=\small,label=above:3] {} (2);
\path[draw,thick,->] (1) -- node[font=\small,label=left:5] {} (3);
\path[draw,thick,->] (2) -- node[font=\small,label=right:4] {} (4);
\path[draw,thick,->] (3) -- node[font=\small,label=below:2] {} (4);
\path[draw,thick,->] (4) -- node[font=\small,label=below:1] {} (5);
\path[draw,thick,->] (2) -- node[font=\small,label=above:2] {} (3);
\end{tikzpicture}
\end{center}
Now we can, for example, calculate the number of
shortest paths from node 1 to node 5
using dynamic programming:
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (1) at (0,0) {$1$};
\node[draw, circle] (2) at (2,0) {$2$};
\node[draw, circle] (3) at (0,-2) {$3$};
\node[draw, circle] (4) at (2,-2) {$4$};
\node[draw, circle] (5) at (4,-1) {$5$};
\path[draw,thick,->] (1) -- node[font=\small,label=above:3] {} (2);
\path[draw,thick,->] (1) -- node[font=\small,label=left:5] {} (3);
\path[draw,thick,->] (2) -- node[font=\small,label=right:4] {} (4);
\path[draw,thick,->] (3) -- node[font=\small,label=below:2] {} (4);
\path[draw,thick,->] (4) -- node[font=\small,label=below:1] {} (5);
\path[draw,thick,->] (2) -- node[font=\small,label=above:2] {} (3);
\node[color=red] at (0,0.7) {$1$};
\node[color=red] at (2,0.7) {$1$};
\node[color=red] at (0,-2.7) {$2$};
\node[color=red] at (2,-2.7) {$3$};
\node[color=red] at (4,-1.7) {$3$};
\end{tikzpicture}
\end{center}
\subsubsection{Representing problems as graphs}
Actually, any dynamic programming problem
can be represented as a directed, acyclic graph.
In such a graph, each node corresponds to a dynamic programming state
and the edges indicate how the states depend on each other.
As an example, consider the problem
of forming a sum of money $n$
using coins
$\{c_1,c_2,\ldots,c_k\}$.
In this problem, we can construct a graph where
each node corresponds to a sum of money,
and the edges show how the coins can be chosen.
For example, for coins $\{1,3,4\}$ and $n=6$,
the graph is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (0) at (0,0) {$0$};
\node[draw, circle] (1) at (2,0) {$1$};
\node[draw, circle] (2) at (4,0) {$2$};
\node[draw, circle] (3) at (6,0) {$3$};
\node[draw, circle] (4) at (8,0) {$4$};
\node[draw, circle] (5) at (10,0) {$5$};
\node[draw, circle] (6) at (12,0) {$6$};
\path[draw,thick,->] (0) -- (1);
\path[draw,thick,->] (1) -- (2);
\path[draw,thick,->] (2) -- (3);
\path[draw,thick,->] (3) -- (4);
\path[draw,thick,->] (4) -- (5);
\path[draw,thick,->] (5) -- (6);
\path[draw,thick,->] (0) edge [bend right=30] (3);
\path[draw,thick,->] (1) edge [bend right=30] (4);
\path[draw,thick,->] (2) edge [bend right=30] (5);
\path[draw,thick,->] (3) edge [bend right=30] (6);
\path[draw,thick,->] (0) edge [bend left=30] (4);
\path[draw,thick,->] (1) edge [bend left=30] (5);
\path[draw,thick,->] (2) edge [bend left=30] (6);
\end{tikzpicture}
\end{center}
Using this representation,
the shortest path from node 0 to node $n$
corresponds to a solution with the minimum number of coins,
and the total number of paths from node 0 to node $n$
equals the total number of solutions.
\section{Successor paths}
\index{successor graph}
\index{functional graph}
For the rest of the chapter,
we will focus on \key{successor graphs}.
In those graphs,
the outdegree of each node is 1, i.e.,
exactly one edge starts at each node.
A successor graph consists of one or more
components, each of which contains
one cycle and some paths that lead to it.
Successor graphs are sometimes called
\key{functional graphs}.
The reason for this is that any successor graph
corresponds to a function that defines
the edges of the graph.
The parameter for the function is a node of the graph,
and the function gives the successor of that node.
For example, the function
\begin{center}
\begin{tabular}{r|rrrrrrrrr}
$x$ & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\
\hline
$\texttt{succ}(x)$ & 3 & 5 & 7 & 6 & 2 & 2 & 1 & 6 & 3 \\
\end{tabular}
\end{center}
defines the following graph:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,0) {$1$};
\node[draw, circle] (2) at (2,0) {$2$};
\node[draw, circle] (3) at (-2,0) {$3$};
\node[draw, circle] (4) at (1,-3) {$4$};
\node[draw, circle] (5) at (4,0) {$5$};
\node[draw, circle] (6) at (2,-1.5) {$6$};
\node[draw, circle] (7) at (-2,-1.5) {$7$};
\node[draw, circle] (8) at (3,-3) {$8$};
\node[draw, circle] (9) at (-4,0) {$9$};
\path[draw,thick,->] (1) -- (3);
\path[draw,thick,->] (2) edge [bend left=40] (5);
\path[draw,thick,->] (3) -- (7);
\path[draw,thick,->] (4) -- (6);
\path[draw,thick,->] (5) edge [bend left=40] (2);
\path[draw,thick,->] (6) -- (2);
\path[draw,thick,->] (7) -- (1);
\path[draw,thick,->] (8) -- (6);
\path[draw,thick,->] (9) -- (3);
\end{tikzpicture}
\end{center}
Since each node of a successor graph has a
unique successor, we can also define a function $\texttt{succ}(x,k)$
that gives the node that we will reach if
we begin at node $x$ and walk $k$ steps forward.
For example, in the above graph $\texttt{succ}(4,6)=2$,
because we will reach node 2 by walking 6 steps from node 4:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,0) {$4$};
\node[draw, circle] (2) at (1.5,0) {$6$};
\node[draw, circle] (3) at (3,0) {$2$};
\node[draw, circle] (4) at (4.5,0) {$5$};
\node[draw, circle] (5) at (6,0) {$2$};
\node[draw, circle] (6) at (7.5,0) {$5$};
\node[draw, circle] (7) at (9,0) {$2$};
\path[draw,thick,->] (1) -- (2);
\path[draw,thick,->] (2) -- (3);
\path[draw,thick,->] (3) -- (4);
\path[draw,thick,->] (4) -- (5);
\path[draw,thick,->] (5) -- (6);
\path[draw,thick,->] (6) -- (7);
\end{tikzpicture}
\end{center}
A straightforward way to calculate a value of $\texttt{succ}(x,k)$
is to start at node $x$ and walk $k$ steps forward, which takes $O(k)$ time.
However, using preprocessing, any value of $\texttt{succ}(x,k)$
can be calculated in only $O(\log k)$ time.
The idea is to precalculate all values of $\texttt{succ}(x,k)$ where
$k$ is a power of two and at most $u$, where $u$ is
the maximum number of steps we will ever walk.
This can be efficiently done, because
we can use the following recursion:
\begin{equation*}
\texttt{succ}(x,k) = \begin{cases}
\texttt{succ}(x) & k = 1\\
\texttt{succ}(\texttt{succ}(x,k/2),k/2) & k > 1\\
\end{cases}
\end{equation*}
Precalculating the values takes $O(n \log u)$ time,
because $O(\log u)$ values are calculated for each node.
In the above graph, the first values are as follows:
\begin{center}
\begin{tabular}{r|rrrrrrrrr}
$x$ & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\
\hline
$\texttt{succ}(x,1)$ & 3 & 5 & 7 & 6 & 2 & 2 & 1 & 6 & 3 \\
$\texttt{succ}(x,2)$ & 7 & 2 & 1 & 2 & 5 & 5 & 3 & 2 & 7 \\
$\texttt{succ}(x,4)$ & 3 & 2 & 7 & 2 & 5 & 5 & 1 & 2 & 3 \\
$\texttt{succ}(x,8)$ & 7 & 2 & 1 & 2 & 5 & 5 & 3 & 2 & 7 \\
$\cdots$ \\
\end{tabular}
\end{center}
After this, any value of $\texttt{succ}(x,k)$ can be calculated
by presenting the number of steps $k$ as a sum of powers of two.
For example, if we want to calculate the value of $\texttt{succ}(x,11)$,
we first form the representation $11=8+2+1$.
Using that,
\[\texttt{succ}(x,11)=\texttt{succ}(\texttt{succ}(\texttt{succ}(x,8),2),1).\]
For example, in the previous graph
\[\texttt{succ}(4,11)=\texttt{succ}(\texttt{succ}(\texttt{succ}(4,8),2),1)=5.\]
Such a representation always consists of
$O(\log k)$ parts, so calculating a value of $\texttt{succ}(x,k)$
takes $O(\log k)$ time.
\section{Cycle detection}
\index{cycle}
\index{cycle detection}
Consider a successor graph that only contains
a path that ends in a cycle.
We may ask the following questions:
if we begin our walk at the starting node,
what is the first node in the cycle
and how many nodes does the cycle contain?
For example, in the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (5) at (0,0) {$5$};
\node[draw, circle] (4) at (-2,0) {$4$};
\node[draw, circle] (6) at (-1,1.5) {$6$};
\node[draw, circle] (3) at (-4,0) {$3$};
\node[draw, circle] (2) at (-6,0) {$2$};
\node[draw, circle] (1) at (-8,0) {$1$};
\path[draw,thick,->] (1) -- (2);
\path[draw,thick,->] (2) -- (3);
\path[draw,thick,->] (3) -- (4);
\path[draw,thick,->] (4) -- (5);
\path[draw,thick,->] (5) -- (6);
\path[draw,thick,->] (6) -- (4);
\end{tikzpicture}
\end{center}
we begin our walk at node 1,
the first node that belongs to the cycle is node 4, and the cycle consists
of three nodes (4, 5 and 6).
A simple way to detect the cycle is to walk in the
graph and keep track of
all nodes that have been visited. Once a node is visited
for the second time, we can conclude
that the node is the first node in the cycle.
This method works in $O(n)$ time and also uses
$O(n)$ memory.
However, there are better algorithms for cycle detection.
The time complexity of such algorithms is still $O(n)$,
but they only use $O(1)$ memory.
This is an important improvement if $n$ is large.
Next we will discuss Floyd's algorithm that
achieves these properties.
\subsubsection{Floyd's algorithm}
\index{Floyd's algorithm}
\key{Floyd's algorithm}\footnote{The idea of the algorithm is mentioned in \cite{knu982}
and attributed to R. W. Floyd; however, it is not known if Floyd actually
discovered the algorithm.} walks forward
in the graph using two pointers $a$ and $b$.
Both pointers begin at a node $x$ that
is the starting node of the graph.
Then, on each turn, the pointer $a$ walks
one step forward and the pointer $b$
walks two steps forward.
The process continues until
the pointers meet each other:
\begin{lstlisting}
a = succ(x);
b = succ(succ(x));
while (a != b) {
a = succ(a);
b = succ(succ(b));
}
\end{lstlisting}
At this point, the pointer $a$ has walked $k$ steps
and the pointer $b$ has walked $2k$ steps,
so the length of the cycle divides $k$.
Thus, the first node that belongs to the cycle
can be found by moving the pointer $a$ to node $x$
and advancing the pointers
step by step until they meet again.
\begin{lstlisting}
a = x;
while (a != b) {
a = succ(a);
b = succ(b);
}
first = a;
\end{lstlisting}
After this, the length of the cycle
can be calculated as follows:
\begin{lstlisting}
b = succ(a);
length = 1;
while (a != b) {
b = succ(b);
length++;
}
\end{lstlisting}

563
chapter17.tex Normal file
View File

@ -0,0 +1,563 @@
\chapter{Strong connectivity}
\index{strongly connected graph}
In a directed graph,
the edges can be traversed in one direction only,
so even if the graph is connected,
this does not guarantee that there would be
a path from a node to another node.
For this reason, it is meaningful to define a new concept
that requires more than connectivity.
A graph is \key{strongly connected}
if there is a path from any node to all
other nodes in the graph.
For example, in the following picture,
the left graph is strongly connected
while the right graph is not.
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,1) {$1$};
\node[draw, circle] (2) at (3,1) {$2$};
\node[draw, circle] (3) at (1,-1) {$3$};
\node[draw, circle] (4) at (3,-1) {$4$};
\path[draw,thick,->] (1) -- (2);
\path[draw,thick,->] (2) -- (4);
\path[draw,thick,->] (4) -- (3);
\path[draw,thick,->] (3) -- (1);
\node[draw, circle] (1b) at (6,1) {$1$};
\node[draw, circle] (2b) at (8,1) {$2$};
\node[draw, circle] (3b) at (6,-1) {$3$};
\node[draw, circle] (4b) at (8,-1) {$4$};
\path[draw,thick,->] (1b) -- (2b);
\path[draw,thick,->] (2b) -- (4b);
\path[draw,thick,->] (4b) -- (3b);
\path[draw,thick,->] (1b) -- (3b);
\end{tikzpicture}
\end{center}
The right graph is not strongly connected
because, for example, there is no path
from node 2 to node 1.
\index{strongly connected component}
\index{component graph}
The \key{strongly connected components}
of a graph divide the graph into strongly connected
parts that are as large as possible.
The strongly connected components form an
acyclic \key{component graph} that represents
the deep structure of the original graph.
For example, for the graph
\begin{center}
\begin{tikzpicture}[scale=0.9,label distance=-2mm]
\node[draw, circle] (1) at (-1,1) {$7$};
\node[draw, circle] (2) at (-3,2) {$3$};
\node[draw, circle] (4) at (-5,2) {$2$};
\node[draw, circle] (6) at (-7,2) {$1$};
\node[draw, circle] (3) at (-3,0) {$6$};
\node[draw, circle] (5) at (-5,0) {$5$};
\node[draw, circle] (7) at (-7,0) {$4$};
\path[draw,thick,->] (2) -- (1);
\path[draw,thick,->] (1) -- (3);
\path[draw,thick,->] (3) -- (2);
\path[draw,thick,->] (2) -- (4);
\path[draw,thick,->] (3) -- (5);
\path[draw,thick,->] (4) edge [bend left] (6);
\path[draw,thick,->] (6) edge [bend left] (4);
\path[draw,thick,->] (4) -- (5);
\path[draw,thick,->] (5) -- (7);
\path[draw,thick,->] (6) -- (7);
\end{tikzpicture}
\end{center}
the strongly connected components are as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (-1,1) {$7$};
\node[draw, circle] (2) at (-3,2) {$3$};
\node[draw, circle] (4) at (-5,2) {$2$};
\node[draw, circle] (6) at (-7,2) {$1$};
\node[draw, circle] (3) at (-3,0) {$6$};
\node[draw, circle] (5) at (-5,0) {$5$};
\node[draw, circle] (7) at (-7,0) {$4$};
\path[draw,thick,->] (2) -- (1);
\path[draw,thick,->] (1) -- (3);
\path[draw,thick,->] (3) -- (2);
\path[draw,thick,->] (2) -- (4);
\path[draw,thick,->] (3) -- (5);
\path[draw,thick,->] (4) edge [bend left] (6);
\path[draw,thick,->] (6) edge [bend left] (4);
\path[draw,thick,->] (4) -- (5);
\path[draw,thick,->] (5) -- (7);
\path[draw,thick,->] (6) -- (7);
\draw [red,thick,dashed,line width=2pt] (-0.5,2.5) rectangle (-3.5,-0.5);
\draw [red,thick,dashed,line width=2pt] (-4.5,2.5) rectangle (-7.5,1.5);
\draw [red,thick,dashed,line width=2pt] (-4.5,0.5) rectangle (-5.5,-0.5);
\draw [red,thick,dashed,line width=2pt] (-6.5,0.5) rectangle (-7.5,-0.5);
\end{tikzpicture}
\end{center}
The corresponding component graph is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (-3,1) {$B$};
\node[draw, circle] (2) at (-6,2) {$A$};
\node[draw, circle] (3) at (-5,0) {$D$};
\node[draw, circle] (4) at (-7,0) {$C$};
\path[draw,thick,->] (1) -- (2);
\path[draw,thick,->] (1) -- (3);
\path[draw,thick,->] (2) -- (3);
\path[draw,thick,->] (2) -- (4);
\path[draw,thick,->] (3) -- (4);
\end{tikzpicture}
\end{center}
The components are $A=\{1,2\}$,
$B=\{3,6,7\}$, $C=\{4\}$ and $D=\{5\}$.
A component graph is an acyclic, directed graph,
so it is easier to process than the original graph.
Since the graph does not contain cycles,
we can always construct a topological sort and
use dynamic programming techniques like those
presented in Chapter 16.
\section{Kosaraju's algorithm}
\index{Kosaraju's algorithm}
\key{Kosaraju's algorithm}\footnote{According to \cite{aho83},
S. R. Kosaraju invented this algorithm in 1978
but did not publish it. In 1981, the same algorithm was rediscovered
and published by M. Sharir \cite{sha81}.} is an efficient
method for finding the strongly connected components
of a directed graph.
The algorithm performs two depth-first searches:
the first search constructs a list of nodes
according to the structure of the graph,
and the second search forms the strongly connected components.
\subsubsection{Search 1}
The first phase of Kosaraju's algorithm constructs
a list of nodes in the order in which a
depth-first search processes them.
The algorithm goes through the nodes,
and begins a depth-first search at each
unprocessed node.
Each node will be added to the list
after it has been processed.
In the example graph, the nodes are processed
in the following order:
\begin{center}
\begin{tikzpicture}[scale=0.9,label distance=-2mm]
\node[draw, circle] (1) at (-1,1) {$7$};
\node[draw, circle] (2) at (-3,2) {$3$};
\node[draw, circle] (4) at (-5,2) {$2$};
\node[draw, circle] (6) at (-7,2) {$1$};
\node[draw, circle] (3) at (-3,0) {$6$};
\node[draw, circle] (5) at (-5,0) {$5$};
\node[draw, circle] (7) at (-7,0) {$4$};
\node at (-7,2.75) {$1/8$};
\node at (-5,2.75) {$2/7$};
\node at (-3,2.75) {$9/14$};
\node at (-7,-0.75) {$4/5$};
\node at (-5,-0.75) {$3/6$};
\node at (-3,-0.75) {$11/12$};
\node at (-1,1.75) {$10/13$};
\path[draw,thick,->] (2) -- (1);
\path[draw,thick,->] (1) -- (3);
\path[draw,thick,->] (3) -- (2);
\path[draw,thick,->] (2) -- (4);
\path[draw,thick,->] (3) -- (5);
\path[draw,thick,->] (4) edge [bend left] (6);
\path[draw,thick,->] (6) edge [bend left] (4);
\path[draw,thick,->] (4) -- (5);
\path[draw,thick,->] (5) -- (7);
\path[draw,thick,->] (6) -- (7);
\end{tikzpicture}
\end{center}
The notation $x/y$ means that
processing the node started
at time $x$ and finished at time $y$.
Thus, the corresponding list is as follows:
\begin{tabular}{ll}
\\
node & processing time \\
\hline
4 & 5 \\
5 & 6 \\
2 & 7 \\
1 & 8 \\
6 & 12 \\
7 & 13 \\
3 & 14 \\
\\
\end{tabular}
%
% In the second phase of the algorithm,
% the nodes will be processed
% in reverse order: $[3,7,6,1,2,5,4]$.
\subsubsection{Search 2}
The second phase of the algorithm
forms the strongly connected components
of the graph.
First, the algorithm reverses every
edge in the graph.
This guarantees that during the second search,
we will always find strongly connected
components that do not have extra nodes.
After reversing the edges,
the example graph is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9,label distance=-2mm]
\node[draw, circle] (1) at (-1,1) {$7$};
\node[draw, circle] (2) at (-3,2) {$3$};
\node[draw, circle] (4) at (-5,2) {$2$};
\node[draw, circle] (6) at (-7,2) {$1$};
\node[draw, circle] (3) at (-3,0) {$6$};
\node[draw, circle] (5) at (-5,0) {$5$};
\node[draw, circle] (7) at (-7,0) {$4$};
\path[draw,thick,<-] (2) -- (1);
\path[draw,thick,<-] (1) -- (3);
\path[draw,thick,<-] (3) -- (2);
\path[draw,thick,<-] (2) -- (4);
\path[draw,thick,<-] (3) -- (5);
\path[draw,thick,<-] (4) edge [bend left] (6);
\path[draw,thick,<-] (6) edge [bend left] (4);
\path[draw,thick,<-] (4) -- (5);
\path[draw,thick,<-] (5) -- (7);
\path[draw,thick,<-] (6) -- (7);
\end{tikzpicture}
\end{center}
After this, the algorithm goes through
the list of nodes created by the first search,
in \emph{reverse} order.
If a node does not belong to a component,
the algorithm creates a new component
and starts a depth-first search
that adds all new nodes found during the search
to the new component.
In the example graph, the first component
begins at node 3:
\begin{center}
\begin{tikzpicture}[scale=0.9,label distance=-2mm]
\node[draw, circle] (1) at (-1,1) {$7$};
\node[draw, circle] (2) at (-3,2) {$3$};
\node[draw, circle] (4) at (-5,2) {$2$};
\node[draw, circle] (6) at (-7,2) {$1$};
\node[draw, circle] (3) at (-3,0) {$6$};
\node[draw, circle] (5) at (-5,0) {$5$};
\node[draw, circle] (7) at (-7,0) {$4$};
\path[draw,thick,<-] (2) -- (1);
\path[draw,thick,<-] (1) -- (3);
\path[draw,thick,<-] (3) -- (2);
\path[draw,thick,<-] (2) -- (4);
\path[draw,thick,<-] (3) -- (5);
\path[draw,thick,<-] (4) edge [bend left] (6);
\path[draw,thick,<-] (6) edge [bend left] (4);
\path[draw,thick,<-] (4) -- (5);
\path[draw,thick,<-] (5) -- (7);
\path[draw,thick,<-] (6) -- (7);
\draw [red,thick,dashed,line width=2pt] (-0.5,2.5) rectangle (-3.5,-0.5);
\end{tikzpicture}
\end{center}
Note that since all edges are reversed,
the component does not ''leak'' to other parts in the graph.
\begin{samepage}
The next nodes in the list are nodes 7 and 6,
but they already belong to a component,
so the next new component begins at node 1:
\begin{center}
\begin{tikzpicture}[scale=0.9,label distance=-2mm]
\node[draw, circle] (1) at (-1,1) {$7$};
\node[draw, circle] (2) at (-3,2) {$3$};
\node[draw, circle] (4) at (-5,2) {$2$};
\node[draw, circle] (6) at (-7,2) {$1$};
\node[draw, circle] (3) at (-3,0) {$6$};
\node[draw, circle] (5) at (-5,0) {$5$};
\node[draw, circle] (7) at (-7,0) {$4$};
\path[draw,thick,<-] (2) -- (1);
\path[draw,thick,<-] (1) -- (3);
\path[draw,thick,<-] (3) -- (2);
\path[draw,thick,<-] (2) -- (4);
\path[draw,thick,<-] (3) -- (5);
\path[draw,thick,<-] (4) edge [bend left] (6);
\path[draw,thick,<-] (6) edge [bend left] (4);
\path[draw,thick,<-] (4) -- (5);
\path[draw,thick,<-] (5) -- (7);
\path[draw,thick,<-] (6) -- (7);
\draw [red,thick,dashed,line width=2pt] (-0.5,2.5) rectangle (-3.5,-0.5);
\draw [red,thick,dashed,line width=2pt] (-4.5,2.5) rectangle (-7.5,1.5);
%\draw [red,thick,dashed,line width=2pt] (-4.5,0.5) rectangle (-5.5,-0.5);
%\draw [red,thick,dashed,line width=2pt] (-6.5,0.5) rectangle (-7.5,-0.5);
\end{tikzpicture}
\end{center}
\end{samepage}
\begin{samepage}
Finally, the algorithm processes nodes 5 and 4
that create the remaining strongly connected components:
\begin{center}
\begin{tikzpicture}[scale=0.9,label distance=-2mm]
\node[draw, circle] (1) at (-1,1) {$7$};
\node[draw, circle] (2) at (-3,2) {$3$};
\node[draw, circle] (4) at (-5,2) {$2$};
\node[draw, circle] (6) at (-7,2) {$1$};
\node[draw, circle] (3) at (-3,0) {$6$};
\node[draw, circle] (5) at (-5,0) {$5$};
\node[draw, circle] (7) at (-7,0) {$4$};
\path[draw,thick,<-] (2) -- (1);
\path[draw,thick,<-] (1) -- (3);
\path[draw,thick,<-] (3) -- (2);
\path[draw,thick,<-] (2) -- (4);
\path[draw,thick,<-] (3) -- (5);
\path[draw,thick,<-] (4) edge [bend left] (6);
\path[draw,thick,<-] (6) edge [bend left] (4);
\path[draw,thick,<-] (4) -- (5);
\path[draw,thick,<-] (5) -- (7);
\path[draw,thick,<-] (6) -- (7);
\draw [red,thick,dashed,line width=2pt] (-0.5,2.5) rectangle (-3.5,-0.5);
\draw [red,thick,dashed,line width=2pt] (-4.5,2.5) rectangle (-7.5,1.5);
\draw [red,thick,dashed,line width=2pt] (-4.5,0.5) rectangle (-5.5,-0.5);
\draw [red,thick,dashed,line width=2pt] (-6.5,0.5) rectangle (-7.5,-0.5);
\end{tikzpicture}
\end{center}
\end{samepage}
The time complexity of the algorithm is $O(n+m)$,
because the algorithm
performs two depth-first searches.
\section{2SAT problem}
\index{2SAT problem}
Strong connectivity is also linked with the
\key{2SAT problem}\footnote{The algorithm presented here was
introduced in \cite{asp79}.
There is also another well-known linear-time algorithm \cite{eve75}
that is based on backtracking.}.
In this problem, we are given a logical formula
\[
(a_1 \lor b_1) \land (a_2 \lor b_2) \land \cdots \land (a_m \lor b_m),
\]
where each $a_i$ and $b_i$ is either a logical variable
($x_1,x_2,\ldots,x_n$)
or a negation of a logical variable
($\lnot x_1, \lnot x_2, \ldots, \lnot x_n$).
The symbols ''$\land$'' and ''$\lor$'' denote
logical operators ''and'' and ''or''.
Our task is to assign each variable a value
so that the formula is true, or state
that this is not possible.
For example, the formula
\[
L_1 = (x_2 \lor \lnot x_1) \land
(\lnot x_1 \lor \lnot x_2) \land
(x_1 \lor x_3) \land
(\lnot x_2 \lor \lnot x_3) \land
(x_1 \lor x_4)
\]
is true when the variables are assigned as follows:
\[
\begin{cases}
x_1 = \textrm{false} \\
x_2 = \textrm{false} \\
x_3 = \textrm{true} \\
x_4 = \textrm{true} \\
\end{cases}
\]
However, the formula
\[
L_2 = (x_1 \lor x_2) \land
(x_1 \lor \lnot x_2) \land
(\lnot x_1 \lor x_3) \land
(\lnot x_1 \lor \lnot x_3)
\]
is always false, regardless of how we
assign the values.
The reason for this is that we cannot
choose a value for $x_1$
without creating a contradiction.
If $x_1$ is false, both $x_2$ and $\lnot x_2$
should be true which is impossible,
and if $x_1$ is true, both $x_3$ and $\lnot x_3$
should be true which is also impossible.
The 2SAT problem can be represented as a graph
whose nodes correspond to
variables $x_i$ and negations $\lnot x_i$,
and edges determine the connections
between the variables.
Each pair $(a_i \lor b_i)$ generates two edges:
$\lnot a_i \to b_i$ and $\lnot b_i \to a_i$.
This means that if $a_i$ does not hold,
$b_i$ must hold, and vice versa.
The graph for the formula $L_1$ is:
\\
\begin{center}
\begin{tikzpicture}[scale=1.0,minimum size=2pt]
\node[draw, circle, inner sep=1.3pt] (1) at (1,2) {$\lnot x_3$};
\node[draw, circle] (2) at (3,2) {$x_2$};
\node[draw, circle, inner sep=1.3pt] (3) at (1,0) {$\lnot x_4$};
\node[draw, circle] (4) at (3,0) {$x_1$};
\node[draw, circle, inner sep=1.3pt] (5) at (5,2) {$\lnot x_1$};
\node[draw, circle] (6) at (7,2) {$x_4$};
\node[draw, circle, inner sep=1.3pt] (7) at (5,0) {$\lnot x_2$};
\node[draw, circle] (8) at (7,0) {$x_3$};
\path[draw,thick,->] (1) -- (4);
\path[draw,thick,->] (4) -- (2);
\path[draw,thick,->] (2) -- (1);
\path[draw,thick,->] (3) -- (4);
\path[draw,thick,->] (2) -- (5);
\path[draw,thick,->] (4) -- (7);
\path[draw,thick,->] (5) -- (6);
\path[draw,thick,->] (5) -- (8);
\path[draw,thick,->] (8) -- (7);
\path[draw,thick,->] (7) -- (5);
\end{tikzpicture}
\end{center}
And the graph for the formula $L_2$ is:
\\
\begin{center}
\begin{tikzpicture}[scale=1.0,minimum size=2pt]
\node[draw, circle] (1) at (1,2) {$x_3$};
\node[draw, circle] (2) at (3,2) {$x_2$};
\node[draw, circle, inner sep=1.3pt] (3) at (5,2) {$\lnot x_2$};
\node[draw, circle, inner sep=1.3pt] (4) at (7,2) {$\lnot x_3$};
\node[draw, circle, inner sep=1.3pt] (5) at (4,3.5) {$\lnot x_1$};
\node[draw, circle] (6) at (4,0.5) {$x_1$};
\path[draw,thick,->] (1) -- (5);
\path[draw,thick,->] (4) -- (5);
\path[draw,thick,->] (6) -- (1);
\path[draw,thick,->] (6) -- (4);
\path[draw,thick,->] (5) -- (2);
\path[draw,thick,->] (5) -- (3);
\path[draw,thick,->] (2) -- (6);
\path[draw,thick,->] (3) -- (6);
\end{tikzpicture}
\end{center}
The structure of the graph tells us whether
it is possible to assign the values
of the variables so
that the formula is true.
It turns out that this can be done
exactly when there are no nodes
$x_i$ and $\lnot x_i$ such that
both nodes belong to the
same strongly connected component.
If there are such nodes,
the graph contains
a path from $x_i$ to $\lnot x_i$
and also a path from $\lnot x_i$ to $x_i$,
so both $x_i$ and $\lnot x_i$ should be true
which is not possible.
In the graph of the formula $L_1$
there are no nodes $x_i$ and $\lnot x_i$
such that both nodes
belong to the same strongly connected component,
so a solution exists.
In the graph of the formula $L_2$
all nodes belong to the same strongly connected component,
so a solution does not exist.
If a solution exists, the values for the variables
can be found by going through the nodes of the
component graph in a reverse topological sort order.
At each step, we process a component
that does not contain edges that lead to an
unprocessed component.
If the variables in the component
have not been assigned values,
their values will be determined
according to the values in the component,
and if they already have values,
they remain unchanged.
The process continues until each variable
has been assigned a value.
The component graph for the formula $L_1$ is as follows:
\begin{center}
\begin{tikzpicture}[scale=1.0]
\node[draw, circle] (1) at (0,0) {$A$};
\node[draw, circle] (2) at (2,0) {$B$};
\node[draw, circle] (3) at (4,0) {$C$};
\node[draw, circle] (4) at (6,0) {$D$};
\path[draw,thick,->] (1) -- (2);
\path[draw,thick,->] (2) -- (3);
\path[draw,thick,->] (3) -- (4);
\end{tikzpicture}
\end{center}
The components are
$A = \{\lnot x_4\}$,
$B = \{x_1, x_2, \lnot x_3\}$,
$C = \{\lnot x_1, \lnot x_2, x_3\}$ and
$D = \{x_4\}$.
When constructing the solution,
we first process the component $D$
where $x_4$ becomes true.
After this, we process the component $C$
where $x_1$ and $x_2$ become false
and $x_3$ becomes true.
All variables have been assigned values,
so the remaining components $A$ and $B$
do not change the variables.
Note that this method works, because the
graph has a special structure:
if there are paths from node $x_i$ to node $x_j$
and from node $x_j$ to node $\lnot x_j$,
then node $x_i$ never becomes true.
The reason for this is that there is also
a path from node $\lnot x_j$ to node $\lnot x_i$,
and both $x_i$ and $x_j$ become false.
\index{3SAT problem}
A more difficult problem is the \key{3SAT problem},
where each part of the formula is of the form
$(a_i \lor b_i \lor c_i)$.
This problem is NP-hard, so no efficient algorithm
for solving the problem is known.

1225
chapter18.tex Normal file

File diff suppressed because it is too large Load Diff

674
chapter19.tex Normal file
View File

@ -0,0 +1,674 @@
\chapter{Paths and circuits}
This chapter focuses on two types of paths in graphs:
\begin{itemize}
\item An \key{Eulerian path} is a path that
goes through each edge exactly once.
\item A \key{Hamiltonian path} is a path
that visits each node exactly once.
\end{itemize}
While Eulerian and Hamiltonian paths look like
similar concepts at first glance,
the computational problems related to them
are very different.
It turns out that there is a simple rule that
determines whether a graph contains an Eulerian path,
and there is also an efficient algorithm to
find such a path if it exists.
On the contrary, checking the existence of a Hamiltonian path is a NP-hard
problem, and no efficient algorithm is known for solving the problem.
\section{Eulerian paths}
\index{Eulerian path}
An \key{Eulerian path}\footnote{L. Euler studied such paths in 1736
when he solved the famous Königsberg bridge problem.
This was the birth of graph theory.} is a path
that goes exactly once through each edge of the graph.
For example, the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,5) {$1$};
\node[draw, circle] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,4) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (5);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (4) -- (5);
\end{tikzpicture}
\end{center}
has an Eulerian path from node 2 to node 5:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,5) {$1$};
\node[draw, circle] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,4) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (5);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (4) -- (5);
\path[draw=red,thick,->,line width=2pt] (2) -- node[font=\small,label={[red]north:1.}] {} (1);
\path[draw=red,thick,->,line width=2pt] (1) -- node[font=\small,label={[red]left:2.}] {} (4);
\path[draw=red,thick,->,line width=2pt] (4) -- node[font=\small,label={[red]south:3.}] {} (5);
\path[draw=red,thick,->,line width=2pt] (5) -- node[font=\small,label={[red]left:4.}] {} (2);
\path[draw=red,thick,->,line width=2pt] (2) -- node[font=\small,label={[red]north:5.}] {} (3);
\path[draw=red,thick,->,line width=2pt] (3) -- node[font=\small,label={[red]south:6.}] {} (5);
\end{tikzpicture}
\end{center}
\index{Eulerian circuit}
An \key{Eulerian circuit}
is an Eulerian path that starts and ends
at the same node.
For example, the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,5) {$1$};
\node[draw, circle] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,4) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (5);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (2) -- (4);
\end{tikzpicture}
\end{center}
has an Eulerian circuit that starts and ends at node 1:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,5) {$1$};
\node[draw, circle] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,4) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (5);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (2) -- (4);
\path[draw=red,thick,->,line width=2pt] (1) -- node[font=\small,label={[red]left:1.}] {} (4);
\path[draw=red,thick,->,line width=2pt] (4) -- node[font=\small,label={[red]south:2.}] {} (2);
\path[draw=red,thick,->,line width=2pt] (2) -- node[font=\small,label={[red]right:3.}] {} (5);
\path[draw=red,thick,->,line width=2pt] (5) -- node[font=\small,label={[red]south:4.}] {} (3);
\path[draw=red,thick,->,line width=2pt] (3) -- node[font=\small,label={[red]north:5.}] {} (2);
\path[draw=red,thick,->,line width=2pt] (2) -- node[font=\small,label={[red]north:6.}] {} (1);
\end{tikzpicture}
\end{center}
\subsubsection{Existence}
The existence of Eulerian paths and circuits
depends on the degrees of the nodes.
First, an undirected graph has an Eulerian path
exactly when all the edges
belong to the same connected component and
\begin{itemize}
\item the degree of each node is even \emph{or}
\item the degree of exactly two nodes is odd,
and the degree of all other nodes is even.
\end{itemize}
In the first case, each Eulerian path is also an Eulerian circuit.
In the second case, the odd-degree nodes are the starting
and ending nodes of an Eulerian path which is not an Eulerian circuit.
\begin{samepage}
For example, in the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,5) {$1$};
\node[draw, circle] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,4) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (5);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (4) -- (5);
\end{tikzpicture}
\end{center}
\end{samepage}
nodes 1, 3 and 4 have a degree of 2,
and nodes 2 and 5 have a degree of 3.
Exactly two nodes have an odd degree,
so there is an Eulerian path between nodes 2 and 5,
but the graph does not contain an Eulerian circuit.
In a directed graph,
we focus on indegrees and outdegrees
of the nodes.
A directed graph contains an Eulerian path
exactly when all the edges belong to the same
connected component and
\begin{itemize}
\item in each node, the indegree equals the outdegree, \emph{or}
\item in one node, the indegree is one larger than the outdegree,
in another node, the outdegree is one larger than the indegree,
and in all other nodes, the indegree equals the outdegree.
\end{itemize}
In the first case, each Eulerian path
is also an Eulerian circuit,
and in the second case, the graph contains an Eulerian path
that begins at the node whose outdegree is larger
and ends at the node whose indegree is larger.
For example, in the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,5) {$1$};
\node[draw, circle] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,4) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\path[draw,thick,->,>=latex] (1) -- (2);
\path[draw,thick,->,>=latex] (2) -- (3);
\path[draw,thick,->,>=latex] (4) -- (1);
\path[draw,thick,->,>=latex] (3) -- (5);
\path[draw,thick,->,>=latex] (2) -- (5);
\path[draw,thick,->,>=latex] (5) -- (4);
\end{tikzpicture}
\end{center}
nodes 1, 3 and 4 have both indegree 1 and outdegree 1,
node 2 has indegree 1 and outdegree 2,
and node 5 has indegree 2 and outdegree 1.
Hence, the graph contains an Eulerian path
from node 2 to node 5:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,5) {$1$};
\node[draw, circle] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,4) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (5);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (4) -- (5);
\path[draw=red,thick,->,line width=2pt] (2) -- node[font=\small,label={[red]north:1.}] {} (3);
\path[draw=red,thick,->,line width=2pt] (3) -- node[font=\small,label={[red]south:2.}] {} (5);
\path[draw=red,thick,->,line width=2pt] (5) -- node[font=\small,label={[red]south:3.}] {} (4);
\path[draw=red,thick,->,line width=2pt] (4) -- node[font=\small,label={[red]left:4.}] {} (1);
\path[draw=red,thick,->,line width=2pt] (1) -- node[font=\small,label={[red]north:5.}] {} (2);
\path[draw=red,thick,->,line width=2pt] (2) -- node[font=\small,label={[red]left:6.}] {} (5);
\end{tikzpicture}
\end{center}
\subsubsection{Hierholzer's algorithm}
\index{Hierholzer's algorithm}
\key{Hierholzer's algorithm}\footnote{The algorithm was published
in 1873 after Hierholzer's death \cite{hie73}.} is an efficient
method for constructing
an Eulerian circuit.
The algorithm consists of several rounds,
each of which adds new edges to the circuit.
Of course, we assume that the graph contains
an Eulerian circuit; otherwise Hierholzer's
algorithm cannot find it.
First, the algorithm constructs a circuit that contains
some (not necessarily all) of the edges of the graph.
After this, the algorithm extends the circuit
step by step by adding subcircuits to it.
The process continues until all edges have been added
to the circuit.
The algorithm extends the circuit by always finding
a node $x$ that belongs to the circuit but has
an outgoing edge that is not included in the circuit.
The algorithm constructs a new path from node $x$
that only contains edges that are not yet in the circuit.
Sooner or later,
the path will return to node $x$,
which creates a subcircuit.
If the graph only contains an Eulerian path,
we can still use Hierholzer's algorithm
to find it by adding an extra edge to the graph
and removing the edge after the circuit
has been constructed.
For example, in an undirected graph,
we add the extra edge between the two
odd-degree nodes.
Next we will see how Hierholzer's algorithm
constructs an Eulerian circuit for an undirected graph.
\subsubsection{Example}
\begin{samepage}
Let us consider the following graph:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (3,5) {$1$};
\node[draw, circle] (2) at (1,3) {$2$};
\node[draw, circle] (3) at (3,3) {$3$};
\node[draw, circle] (4) at (5,3) {$4$};
\node[draw, circle] (5) at (1,1) {$5$};
\node[draw, circle] (6) at (3,1) {$6$};
\node[draw, circle] (7) at (5,1) {$7$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (2) -- (6);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (4) -- (7);
\path[draw,thick,-] (5) -- (6);
\path[draw,thick,-] (6) -- (7);
\end{tikzpicture}
\end{center}
\end{samepage}
\begin{samepage}
Suppose that the algorithm first creates a circuit
that begins at node 1.
A possible circuit is
$1 \rightarrow 2 \rightarrow 3 \rightarrow 1$:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (3,5) {$1$};
\node[draw, circle] (2) at (1,3) {$2$};
\node[draw, circle] (3) at (3,3) {$3$};
\node[draw, circle] (4) at (5,3) {$4$};
\node[draw, circle] (5) at (1,1) {$5$};
\node[draw, circle] (6) at (3,1) {$6$};
\node[draw, circle] (7) at (5,1) {$7$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (2) -- (6);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (4) -- (7);
\path[draw,thick,-] (5) -- (6);
\path[draw,thick,-] (6) -- (7);
\path[draw=red,thick,->,line width=2pt] (1) -- node[font=\small,label={[red]north:1.}] {} (2);
\path[draw=red,thick,->,line width=2pt] (2) -- node[font=\small,label={[red]north:2.}] {} (3);
\path[draw=red,thick,->,line width=2pt] (3) -- node[font=\small,label={[red]east:3.}] {} (1);
\end{tikzpicture}
\end{center}
\end{samepage}
After this, the algorithm adds
the subcircuit
$2 \rightarrow 5 \rightarrow 6 \rightarrow 2$
to the circuit:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (3,5) {$1$};
\node[draw, circle] (2) at (1,3) {$2$};
\node[draw, circle] (3) at (3,3) {$3$};
\node[draw, circle] (4) at (5,3) {$4$};
\node[draw, circle] (5) at (1,1) {$5$};
\node[draw, circle] (6) at (3,1) {$6$};
\node[draw, circle] (7) at (5,1) {$7$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (2) -- (6);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (4) -- (7);
\path[draw,thick,-] (5) -- (6);
\path[draw,thick,-] (6) -- (7);
\path[draw=red,thick,->,line width=2pt] (1) -- node[font=\small,label={[red]north:1.}] {} (2);
\path[draw=red,thick,->,line width=2pt] (2) -- node[font=\small,label={[red]west:2.}] {} (5);
\path[draw=red,thick,->,line width=2pt] (5) -- node[font=\small,label={[red]south:3.}] {} (6);
\path[draw=red,thick,->,line width=2pt] (6) -- node[font=\small,label={[red]north:4.}] {} (2);
\path[draw=red,thick,->,line width=2pt] (2) -- node[font=\small,label={[red]north:5.}] {} (3);
\path[draw=red,thick,->,line width=2pt] (3) -- node[font=\small,label={[red]east:6.}] {} (1);
\end{tikzpicture}
\end{center}
Finally, the algorithm adds the subcircuit
$6 \rightarrow 3 \rightarrow 4 \rightarrow 7 \rightarrow 6$
to the circuit:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (3,5) {$1$};
\node[draw, circle] (2) at (1,3) {$2$};
\node[draw, circle] (3) at (3,3) {$3$};
\node[draw, circle] (4) at (5,3) {$4$};
\node[draw, circle] (5) at (1,1) {$5$};
\node[draw, circle] (6) at (3,1) {$6$};
\node[draw, circle] (7) at (5,1) {$7$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (2) -- (6);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (4) -- (7);
\path[draw,thick,-] (5) -- (6);
\path[draw,thick,-] (6) -- (7);
\path[draw=red,thick,->,line width=2pt] (1) -- node[font=\small,label={[red]north:1.}] {} (2);
\path[draw=red,thick,->,line width=2pt] (2) -- node[font=\small,label={[red]west:2.}] {} (5);
\path[draw=red,thick,->,line width=2pt] (5) -- node[font=\small,label={[red]south:3.}] {} (6);
\path[draw=red,thick,->,line width=2pt] (6) -- node[font=\small,label={[red]east:4.}] {} (3);
\path[draw=red,thick,->,line width=2pt] (3) -- node[font=\small,label={[red]north:5.}] {} (4);
\path[draw=red,thick,->,line width=2pt] (4) -- node[font=\small,label={[red]east:6.}] {} (7);
\path[draw=red,thick,->,line width=2pt] (7) -- node[font=\small,label={[red]south:7.}] {} (6);
\path[draw=red,thick,->,line width=2pt] (6) -- node[font=\small,label={[red]right:8.}] {} (2);
\path[draw=red,thick,->,line width=2pt] (2) -- node[font=\small,label={[red]north:9.}] {} (3);
\path[draw=red,thick,->,line width=2pt] (3) -- node[font=\small,label={[red]east:10.}] {} (1);
\end{tikzpicture}
\end{center}
Now all edges are included in the circuit,
so we have successfully constructed an Eulerian circuit.
\section{Hamiltonian paths}
\index{Hamiltonian path}
A \key{Hamiltonian path}
%\footnote{W. R. Hamilton (1805--1865) was an Irish mathematician.}
is a path
that visits each node of the graph exactly once.
For example, the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,5) {$1$};
\node[draw, circle] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,4) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (5);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (4) -- (5);
\end{tikzpicture}
\end{center}
contains a Hamiltonian path from node 1 to node 3:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,5) {$1$};
\node[draw, circle] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,4) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (5);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (4) -- (5);
\path[draw=red,thick,->,line width=2pt] (1) -- node[font=\small,label={[red]left:1.}] {} (4);
\path[draw=red,thick,->,line width=2pt] (4) -- node[font=\small,label={[red]south:2.}] {} (5);
\path[draw=red,thick,->,line width=2pt] (5) -- node[font=\small,label={[red]left:3.}] {} (2);
\path[draw=red,thick,->,line width=2pt] (2) -- node[font=\small,label={[red]north:4.}] {} (3);
\end{tikzpicture}
\end{center}
\index{Hamiltonian circuit}
If a Hamiltonian path begins and ends at the same node,
it is called a \key{Hamiltonian circuit}.
The graph above also has an Hamiltonian circuit
that begins and ends at node 1:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,5) {$1$};
\node[draw, circle] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,4) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (5);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (4) -- (5);
\path[draw=red,thick,->,line width=2pt] (1) -- node[font=\small,label={[red]north:1.}] {} (2);
\path[draw=red,thick,->,line width=2pt] (2) -- node[font=\small,label={[red]north:2.}] {} (3);
\path[draw=red,thick,->,line width=2pt] (3) -- node[font=\small,label={[red]south:3.}] {} (5);
\path[draw=red,thick,->,line width=2pt] (5) -- node[font=\small,label={[red]south:4.}] {} (4);
\path[draw=red,thick,->,line width=2pt] (4) -- node[font=\small,label={[red]left:5.}] {} (1);
\end{tikzpicture}
\end{center}
\subsubsection{Existence}
No efficient method is known for testing if a graph
contains a Hamiltonian path, and the problem is NP-hard.
Still, in some special cases, we can be certain
that a graph contains a Hamiltonian path.
A simple observation is that if the graph is complete,
i.e., there is an edge between all pairs of nodes,
it also contains a Hamiltonian path.
Also stronger results have been achieved:
\begin{itemize}
\item
\index{Dirac's theorem}
\key{Dirac's theorem}: %\cite{dir52}
If the degree of each node is at least $n/2$,
the graph contains a Hamiltonian path.
\item
\index{Ore's theorem}
\key{Ore's theorem}: %\cite{ore60}
If the sum of degrees of each non-adjacent pair of nodes
is at least $n$,
the graph contains a Hamiltonian path.
\end{itemize}
A common property in these theorems and other results is
that they guarantee the existence of a Hamiltonian path
if the graph has \emph{a large number} of edges.
This makes sense, because the more edges the graph contains,
the more possibilities there is to construct a Hamiltonian path.
\subsubsection{Construction}
Since there is no efficient way to check if a Hamiltonian
path exists, it is clear that there is also no method
to efficiently construct the path, because otherwise
we could just try to construct the path and see
whether it exists.
A simple way to search for a Hamiltonian path is
to use a backtracking algorithm that goes through all
possible ways to construct the path.
The time complexity of such an algorithm is at least $O(n!)$,
because there are $n!$ different ways to choose the order of $n$ nodes.
A more efficient solution is based on dynamic programming
(see Chapter 10.5).
The idea is to calculate values
of a function $\texttt{possible}(S,x)$,
where $S$ is a subset of nodes and $x$
is one of the nodes.
The function indicates whether there is a Hamiltonian path
that visits the nodes of $S$ and ends at node $x$.
It is possible to implement this solution in $O(2^n n^2)$ time.
\section{De Bruijn sequences}
\index{De Bruijn sequence}
A \key{De Bruijn sequence}
is a string that contains
every string of length $n$
exactly once as a substring, for a fixed
alphabet of $k$ characters.
The length of such a string is
$k^n+n-1$ characters.
For example, when $n=3$ and $k=2$,
an example of a De Bruijn sequence is
\[0001011100.\]
The substrings of this string are all
combinations of three bits:
000, 001, 010, 011, 100, 101, 110 and 111.
It turns out that each De Bruijn sequence
corresponds to an Eulerian path in a graph.
The idea is to construct a graph where
each node contains a string of $n-1$ characters
and each edge adds one character to the string.
The following graph corresponds to the above scenario:
\begin{center}
\begin{tikzpicture}[scale=0.8]
\node[draw, circle] (00) at (-3,0) {00};
\node[draw, circle] (11) at (3,0) {11};
\node[draw, circle] (01) at (0,2) {01};
\node[draw, circle] (10) at (0,-2) {10};
\path[draw,thick,->] (00) edge [bend left=20] node[font=\small,label=1] {} (01);
\path[draw,thick,->] (01) edge [bend left=20] node[font=\small,label=1] {} (11);
\path[draw,thick,->] (11) edge [bend left=20] node[font=\small,label=below:0] {} (10);
\path[draw,thick,->] (10) edge [bend left=20] node[font=\small,label=below:0] {} (00);
\path[draw,thick,->] (01) edge [bend left=30] node[font=\small,label=right:0] {} (10);
\path[draw,thick,->] (10) edge [bend left=30] node[font=\small,label=left:1] {} (01);
\path[draw,thick,-] (00) edge [loop left] node[font=\small,label=below:0] {} (00);
\path[draw,thick,-] (11) edge [loop right] node[font=\small,label=below:1] {} (11);
\end{tikzpicture}
\end{center}
An Eulerian path in this graph corresponds to a string
that contains all strings of length $n$.
The string contains the characters of the starting node
and all characters of the edges.
The starting node has $n-1$ characters
and there are $k^n$ characters in the edges,
so the length of the string is $k^n+n-1$.
\section{Knight's tours}
\index{knight's tour}
A \key{knight's tour} is a sequence of moves
of a knight on an $n \times n$ chessboard
following the rules of chess such that the knight
visits each square exactly once.
A knight's tour is called a \emph{closed} tour
if the knight finally returns to the starting square and
otherwise it is called an \emph{open} tour.
For example, here is an open knight's tour on a $5 \times 5$ board:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (5,5);
\node at (0.5,4.5) {$1$};
\node at (1.5,4.5) {$4$};
\node at (2.5,4.5) {$11$};
\node at (3.5,4.5) {$16$};
\node at (4.5,4.5) {$25$};
\node at (0.5,3.5) {$12$};
\node at (1.5,3.5) {$17$};
\node at (2.5,3.5) {$2$};
\node at (3.5,3.5) {$5$};
\node at (4.5,3.5) {$10$};
\node at (0.5,2.5) {$3$};
\node at (1.5,2.5) {$20$};
\node at (2.5,2.5) {$7$};
\node at (3.5,2.5) {$24$};
\node at (4.5,2.5) {$15$};
\node at (0.5,1.5) {$18$};
\node at (1.5,1.5) {$13$};
\node at (2.5,1.5) {$22$};
\node at (3.5,1.5) {$9$};
\node at (4.5,1.5) {$6$};
\node at (0.5,0.5) {$21$};
\node at (1.5,0.5) {$8$};
\node at (2.5,0.5) {$19$};
\node at (3.5,0.5) {$14$};
\node at (4.5,0.5) {$23$};
\end{tikzpicture}
\end{center}
A knight's tour corresponds to a Hamiltonian path in a graph
whose nodes represent the squares of the board,
and two nodes are connected with an edge if a knight
can move between the squares according to the rules of chess.
A natural way to construct a knight's tour is to use backtracking.
The search can be made more efficient by using
\emph{heuristics} that attempt to guide the knight so that
a complete tour will be found quickly.
\subsubsection{Warnsdorf's rule}
\index{heuristic}
\index{Warnsdorf's rule}
\key{Warnsdorf's rule} is a simple and effective heuristic
for finding a knight's tour\footnote{This heuristic was proposed
in Warnsdorf's book \cite{war23} in 1823. There are
also polynomial algorithms for finding knight's tours
\cite{par97}, but they are more complicated.}.
Using the rule, it is possible to efficiently construct a tour
even on a large board.
The idea is to always move the knight so that it ends up
in a square where the number of possible moves is as
\emph{small} as possible.
For example, in the following situation, there are five
possible squares to which the knight can move (squares $a \ldots e$):
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (5,5);
\node at (0.5,4.5) {$1$};
\node at (2.5,3.5) {$2$};
\node at (4.5,4.5) {$a$};
\node at (0.5,2.5) {$b$};
\node at (4.5,2.5) {$e$};
\node at (1.5,1.5) {$c$};
\node at (3.5,1.5) {$d$};
\end{tikzpicture}
\end{center}
In this situation, Warnsdorf's rule moves the knight to square $a$,
because after this choice, there is only a single possible move.
The other choices would move the knight to squares where
there would be three moves available.

1441
chapter20.tex Normal file

File diff suppressed because it is too large Load Diff

726
chapter21.tex Normal file
View File

@ -0,0 +1,726 @@
\chapter{Number theory}
\index{number theory}
\key{Number theory} is a branch of mathematics
that studies integers.
Number theory is a fascinating field,
because many questions involving integers
are very difficult to solve even if they
seem simple at first glance.
As an example, consider the following equation:
\[x^3 + y^3 + z^3 = 33\]
It is easy to find three real numbers $x$, $y$ and $z$
that satisfy the equation.
For example, we can choose
\[
\begin{array}{lcl}
x = 3, \\
y = \sqrt[3]{3}, \\
z = \sqrt[3]{3}.\\
\end{array}
\]
However, it is an open problem in number theory
if there are any three
\emph{integers} $x$, $y$ and $z$
that would satisfy the equation \cite{bec07}.
In this chapter, we will focus on basic concepts
and algorithms in number theory.
Throughout the chapter, we will assume that all numbers
are integers, if not otherwise stated.
\section{Primes and factors}
\index{divisibility}
\index{factor}
\index{divisor}
A number $a$ is called a \key{factor} or a \key{divisor} of a number $b$
if $a$ divides $b$.
If $a$ is a factor of $b$,
we write $a \mid b$, and otherwise we write $a \nmid b$.
For example, the factors of 24 are
1, 2, 3, 4, 6, 8, 12 and 24.
\index{prime}
\index{prime decomposition}
A number $n>1$ is a \key{prime}
if its only positive factors are 1 and $n$.
For example, 7, 19 and 41 are primes,
but 35 is not a prime, because $5 \cdot 7 = 35$.
For every number $n>1$, there is a unique
\key{prime factorization}
\[ n = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_k^{\alpha_k},\]
where $p_1,p_2,\ldots,p_k$ are distinct primes and
$\alpha_1,\alpha_2,\ldots,\alpha_k$ are positive numbers.
For example, the prime factorization for 84 is
\[84 = 2^2 \cdot 3^1 \cdot 7^1.\]
The \key{number of factors} of a number $n$ is
\[\tau(n)=\prod_{i=1}^k (\alpha_i+1),\]
because for each prime $p_i$, there are
$\alpha_i+1$ ways to choose how many times
it appears in the factor.
For example, the number of factors
of 84 is
$\tau(84)=3 \cdot 2 \cdot 2 = 12$.
The factors are
1, 2, 3, 4, 6, 7, 12, 14, 21, 28, 42 and 84.
The \key{sum of factors} of $n$ is
\[\sigma(n)=\prod_{i=1}^k (1+p_i+\ldots+p_i^{\alpha_i}) = \prod_{i=1}^k \frac{p_i^{a_i+1}-1}{p_i-1},\]
where the latter formula is based on the geometric progression formula.
For example, the sum of factors of 84 is
\[\sigma(84)=\frac{2^3-1}{2-1} \cdot \frac{3^2-1}{3-1} \cdot \frac{7^2-1}{7-1} = 7 \cdot 4 \cdot 8 = 224.\]
The \key{product of factors} of $n$ is
\[\mu(n)=n^{\tau(n)/2},\]
because we can form $\tau(n)/2$ pairs from the factors,
each with product $n$.
For example, the factors of 84
produce the pairs
$1 \cdot 84$, $2 \cdot 42$, $3 \cdot 28$, etc.,
and the product of the factors is $\mu(84)=84^6=351298031616$.
\index{perfect number}
A number $n$ is called a \key{perfect number} if $n=\sigma(n)-n$,
i.e., $n$ equals the sum of its factors
between $1$ and $n-1$.
For example, 28 is a perfect number,
because $28=1+2+4+7+14$.
\subsubsection{Number of primes}
It is easy to show that there is an infinite number
of primes.
If the number of primes would be finite,
we could construct a set $P=\{p_1,p_2,\ldots,p_n\}$
that would contain all the primes.
For example, $p_1=2$, $p_2=3$, $p_3=5$, and so on.
However, using $P$, we could form a new prime
\[p_1 p_2 \cdots p_n+1\]
that is larger than all elements in $P$.
This is a contradiction, and the number of primes
has to be infinite.
\subsubsection{Density of primes}
The density of primes means how often there are primes
among the numbers.
Let $\pi(n)$ denote the number of primes between
$1$ and $n$. For example, $\pi(10)=4$, because
there are 4 primes between $1$ and $10$: 2, 3, 5 and 7.
It is possible to show that
\[\pi(n) \approx \frac{n}{\ln n},\]
which means that primes are quite frequent.
For example, the number of primes between
$1$ and $10^6$ is $\pi(10^6)=78498$,
and $10^6 / \ln 10^6 \approx 72382$.
\subsubsection{Conjectures}
There are many \emph{conjectures} involving primes.
Most people think that the conjectures are true,
but nobody has been able to prove them.
For example, the following conjectures are famous:
\begin{itemize}
\index{Goldbach's conjecture}
\item \key{Goldbach's conjecture}:
Each even integer $n>2$ can be represented as a
sum $n=a+b$ so that both $a$ and $b$ are primes.
\index{twin prime}
\item \key{Twin prime conjecture}:
There is an infinite number of pairs
of the form $\{p,p+2\}$,
where both $p$ and $p+2$ are primes.
\index{Legendre's conjecture}
\item \key{Legendre's conjecture}:
There is always a prime between numbers
$n^2$ and $(n+1)^2$, where $n$ is any positive integer.
\end{itemize}
\subsubsection{Basic algorithms}
If a number $n$ is not prime,
it can be represented as a product $a \cdot b$,
where $a \le \sqrt n$ or $b \le \sqrt n$,
so it certainly has a factor between $2$ and $\lfloor \sqrt n \rfloor$.
Using this observation, we can both test
if a number is prime and find the prime factorization
of a number in $O(\sqrt n)$ time.
The following function \texttt{prime} checks
if the given number $n$ is prime.
The function attempts to divide $n$ by
all numbers between $2$ and $\lfloor \sqrt n \rfloor$,
and if none of them divides $n$, then $n$ is prime.
\begin{lstlisting}
bool prime(int n) {
if (n < 2) return false;
for (int x = 2; x*x <= n; x++) {
if (n%x == 0) return false;
}
return true;
}
\end{lstlisting}
\noindent
The following function \texttt{factors}
constructs a vector that contains the prime
factorization of $n$.
The function divides $n$ by its prime factors,
and adds them to the vector.
The process ends when the remaining number $n$
has no factors between $2$ and $\lfloor \sqrt n \rfloor$.
If $n>1$, it is prime and the last factor.
\begin{lstlisting}
vector<int> factors(int n) {
vector<int> f;
for (int x = 2; x*x <= n; x++) {
while (n%x == 0) {
f.push_back(x);
n /= x;
}
}
if (n > 1) f.push_back(n);
return f;
}
\end{lstlisting}
Note that each prime factor appears in the vector
as many times as it divides the number.
For example, $24=2^3 \cdot 3$,
so the result of the function is $[2,2,2,3]$.
\subsubsection{Sieve of Eratosthenes}
\index{sieve of Eratosthenes}
The \key{sieve of Eratosthenes}
%\footnote{Eratosthenes (c. 276 BC -- c. 194 BC) was a Greek mathematician.}
is a preprocessing
algorithm that builds an array using which we
can efficiently check if a given number between $2 \ldots n$
is prime and, if it is not, find one prime factor of the number.
The algorithm builds an array $\texttt{sieve}$
whose positions $2,3,\ldots,n$ are used.
The value $\texttt{sieve}[k]=0$ means
that $k$ is prime,
and the value $\texttt{sieve}[k] \neq 0$
means that $k$ is not a prime and one
of its prime factors is $\texttt{sieve}[k]$.
The algorithm iterates through the numbers
$2 \ldots n$ one by one.
Always when a new prime $x$ is found,
the algorithm records that the multiples
of $x$ ($2x,3x,4x,\ldots$) are not primes,
because the number $x$ divides them.
For example, if $n=20$, the array is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (19,1);
\node at (0.5,0.5) {$0$};
\node at (1.5,0.5) {$0$};
\node at (2.5,0.5) {$2$};
\node at (3.5,0.5) {$0$};
\node at (4.5,0.5) {$3$};
\node at (5.5,0.5) {$0$};
\node at (6.5,0.5) {$2$};
\node at (7.5,0.5) {$3$};
\node at (8.5,0.5) {$5$};
\node at (9.5,0.5) {$0$};
\node at (10.5,0.5) {$3$};
\node at (11.5,0.5) {$0$};
\node at (12.5,0.5) {$7$};
\node at (13.5,0.5) {$5$};
\node at (14.5,0.5) {$2$};
\node at (15.5,0.5) {$0$};
\node at (16.5,0.5) {$3$};
\node at (17.5,0.5) {$0$};
\node at (18.5,0.5) {$5$};
\footnotesize
\node at (0.5,1.5) {$2$};
\node at (1.5,1.5) {$3$};
\node at (2.5,1.5) {$4$};
\node at (3.5,1.5) {$5$};
\node at (4.5,1.5) {$6$};
\node at (5.5,1.5) {$7$};
\node at (6.5,1.5) {$8$};
\node at (7.5,1.5) {$9$};
\node at (8.5,1.5) {$10$};
\node at (9.5,1.5) {$11$};
\node at (10.5,1.5) {$12$};
\node at (11.5,1.5) {$13$};
\node at (12.5,1.5) {$14$};
\node at (13.5,1.5) {$15$};
\node at (14.5,1.5) {$16$};
\node at (15.5,1.5) {$17$};
\node at (16.5,1.5) {$18$};
\node at (17.5,1.5) {$19$};
\node at (18.5,1.5) {$20$};
\end{tikzpicture}
\end{center}
The following code implements the sieve of
Eratosthenes.
The code assumes that each element of
\texttt{sieve} is initially zero.
\begin{lstlisting}
for (int x = 2; x <= n; x++) {
if (sieve[x]) continue;
for (int u = 2*x; u <= n; u += x) {
sieve[u] = x;
}
}
\end{lstlisting}
The inner loop of the algorithm is executed
$n/x$ times for each value of $x$.
Thus, an upper bound for the running time
of the algorithm is the harmonic sum
\[\sum_{x=2}^n n/x = n/2 + n/3 + n/4 + \cdots + n/n = O(n \log n).\]
\index{harmonic sum}
In fact, the algorithm is more efficient,
because the inner loop will be executed only if
the number $x$ is prime.
It can be shown that the running time of the
algorithm is only $O(n \log \log n)$,
a complexity very near to $O(n)$.
\subsubsection{Euclid's algorithm}
\index{greatest common divisor}
\index{least common multiple}
\index{Euclid's algorithm}
The \key{greatest common divisor} of
numbers $a$ and $b$, $\gcd(a,b)$,
is the greatest number that divides both $a$ and $b$,
and the \key{least common multiple} of
$a$ and $b$, $\textrm{lcm}(a,b)$,
is the smallest number that is divisible by
both $a$ and $b$.
For example,
$\gcd(24,36)=12$ and
$\textrm{lcm}(24,36)=72$.
The greatest common divisor and the least common multiple
are connected as follows:
\[\textrm{lcm}(a,b)=\frac{ab}{\textrm{gcd}(a,b)}\]
\key{Euclid's algorithm}\footnote{Euclid was a Greek mathematician who
lived in about 300 BC. This is perhaps the first known algorithm in history.} provides an efficient way
to find the greatest common divisor of two numbers.
The algorithm is based on the following formula:
\begin{equation*}
\textrm{gcd}(a,b) = \begin{cases}
a & b = 0\\
\textrm{gcd}(b,a \bmod b) & b \neq 0\\
\end{cases}
\end{equation*}
For example,
\[\textrm{gcd}(24,36) = \textrm{gcd}(36,24)
= \textrm{gcd}(24,12) = \textrm{gcd}(12,0)=12.\]
The algorithm can be implemented as follows:
\begin{lstlisting}
int gcd(int a, int b) {
if (b == 0) return a;
return gcd(b, a%b);
}
\end{lstlisting}
It can be shown that Euclid's algorithm works
in $O(\log n)$ time, where $n=\min(a,b)$.
The worst case for the algorithm is
the case when $a$ and $b$ are consecutive Fibonacci numbers.
For example,
\[\textrm{gcd}(13,8)=\textrm{gcd}(8,5)
=\textrm{gcd}(5,3)=\textrm{gcd}(3,2)=\textrm{gcd}(2,1)=\textrm{gcd}(1,0)=1.\]
\subsubsection{Euler's totient function}
\index{coprime}
\index{Euler's totient function}
Numbers $a$ and $b$ are \key{coprime}
if $\textrm{gcd}(a,b)=1$.
\key{Euler's totient function} $\varphi(n)$
%\footnote{Euler presented this function in 1763.}
gives the number of coprime numbers to $n$
between $1$ and $n$.
For example, $\varphi(12)=4$,
because 1, 5, 7 and 11
are coprime to 12.
The value of $\varphi(n)$ can be calculated
from the prime factorization of $n$
using the formula
\[ \varphi(n) = \prod_{i=1}^k p_i^{\alpha_i-1}(p_i-1). \]
For example, $\varphi(12)=2^1 \cdot (2-1) \cdot 3^0 \cdot (3-1)=4$.
Note that $\varphi(n)=n-1$ if $n$ is prime.
\section{Modular arithmetic}
\index{modular arithmetic}
In \key{modular arithmetic},
the set of numbers is limited so
that only numbers $0,1,2,\ldots,m-1$ are used,
where $m$ is a constant.
Each number $x$ is
represented by the number $x \bmod m$:
the remainder after dividing $x$ by $m$.
For example, if $m=17$, then $75$
is represented by $75 \bmod 17 = 7$.
Often we can take remainders before doing
calculations.
In particular, the following formulas hold:
\[
\begin{array}{rcl}
(x+y) \bmod m & = & (x \bmod m + y \bmod m) \bmod m \\
(x-y) \bmod m & = & (x \bmod m - y \bmod m) \bmod m \\
(x \cdot y) \bmod m & = & (x \bmod m \cdot y \bmod m) \bmod m \\
x^n \bmod m & = & (x \bmod m)^n \bmod m \\
\end{array}
\]
\subsubsection{Modular exponentiation}
There is often need to efficiently calculate
the value of $x^n \bmod m$.
This can be done in $O(\log n)$ time
using the following recursion:
\begin{equation*}
x^n = \begin{cases}
1 & n = 0\\
x^{n/2} \cdot x^{n/2} & \text{$n$ is even}\\
x^{n-1} \cdot x & \text{$n$ is odd}
\end{cases}
\end{equation*}
It is important that in the case of an even $n$,
the value of $x^{n/2}$ is calculated only once.
This guarantees that the time complexity of the
algorithm is $O(\log n)$, because $n$ is always halved
when it is even.
The following function calculates the value of
$x^n \bmod m$:
\begin{lstlisting}
int modpow(int x, int n, int m) {
if (n == 0) return 1%m;
long long u = modpow(x,n/2,m);
u = (u*u)%m;
if (n%2 == 1) u = (u*x)%m;
return u;
}
\end{lstlisting}
\subsubsection{Fermat's theorem and Euler's theorem}
\index{Fermat's theorem}
\index{Euler's theorem}
\key{Fermat's theorem}
%\footnote{Fermat discovered this theorem in 1640.}
states that
\[x^{m-1} \bmod m = 1\]
when $m$ is prime and $x$ and $m$ are coprime.
This also yields
\[x^k \bmod m = x^{k \bmod (m-1)} \bmod m.\]
More generally, \key{Euler's theorem}
%\footnote{Euler published this theorem in 1763.}
states that
\[x^{\varphi(m)} \bmod m = 1\]
when $x$ and $m$ are coprime.
Fermat's theorem follows from Euler's theorem,
because if $m$ is a prime, then $\varphi(m)=m-1$.
\subsubsection{Modular inverse}
\index{modular inverse}
The inverse of $x$ modulo $m$
is a number $x^{-1}$ such that
\[ x x^{-1} \bmod m = 1. \]
For example, if $x=6$ and $m=17$,
then $x^{-1}=3$, because $6\cdot3 \bmod 17=1$.
Using modular inverses, we can divide numbers
modulo $m$, because division by $x$
corresponds to multiplication by $x^{-1}$.
For example, to evaluate the value of $36/6 \bmod 17$,
we can use the formula $2 \cdot 3 \bmod 17$,
because $36 \bmod 17 = 2$ and $6^{-1} \bmod 17 = 3$.
However, a modular inverse does not always exist.
For example, if $x=2$ and $m=4$, the equation
\[ x x^{-1} \bmod m = 1 \]
cannot be solved, because all multiples of 2
are even and the remainder can never be 1 when $m=4$.
It turns out that the value of $x^{-1} \bmod m$
can be calculated exactly when $x$ and $m$ are coprime.
If a modular inverse exists, it can be
calculated using the formula
\[
x^{-1} = x^{\varphi(m)-1}.
\]
If $m$ is prime, the formula becomes
\[
x^{-1} = x^{m-2}.
\]
For example,
\[6^{-1} \bmod 17 =6^{17-2} \bmod 17 = 3.\]
This formula allows us to efficiently calculate
modular inverses using the modular exponentation algorithm.
The formula can be derived using Euler's theorem.
First, the modular inverse should satisfy the following equation:
\[
x x^{-1} \bmod m = 1.
\]
On the other hand, according to Euler's theorem,
\[
x^{\varphi(m)} \bmod m = xx^{\varphi(m)-1} \bmod m = 1,
\]
so the numbers $x^{-1}$ and $x^{\varphi(m)-1}$ are equal.
\subsubsection{Computer arithmetic}
In programming, unsigned integers are represented modulo $2^k$,
where $k$ is the number of bits of the data type.
A usual consequence of this is that a number wraps around
if it becomes too large.
For example, in C++, numbers of type \texttt{unsigned int}
are represented modulo $2^{32}$.
The following code declares an \texttt{unsigned int}
variable whose value is $123456789$.
After this, the value will be multiplied by itself,
and the result is
$123456789^2 \bmod 2^{32} = 2537071545$.
\begin{lstlisting}
unsigned int x = 123456789;
cout << x*x << "\n"; // 2537071545
\end{lstlisting}
\section{Solving equations}
\subsubsection*{Diophantine equations}
\index{Diophantine equation}
A \key{Diophantine equation}
%\footnote{Diophantus of Alexandria was a Greek mathematician who lived in the 3th century.}
is an equation of the form
\[ ax + by = c, \]
where $a$, $b$ and $c$ are constants
and the values of $x$ and $y$ should be found.
Each number in the equation has to be an integer.
For example, one solution for the equation
$5x+2y=11$ is $x=3$ and $y=-2$.
\index{extended Euclid's algorithm}
We can efficiently solve a Diophantine equation
by using Euclid's algorithm.
It turns out that we can extend Euclid's algorithm
so that it will find numbers $x$ and $y$
that satisfy the following equation:
\[
ax + by = \textrm{gcd}(a,b)
\]
A Diophantine equation can be solved if
$c$ is divisible by
$\textrm{gcd}(a,b)$,
and otherwise it cannot be solved.
As an example, let us find numbers $x$ and $y$
that satisfy the following equation:
\[
39x + 15y = 12
\]
The equation can be solved, because
$\textrm{gcd}(39,15)=3$ and $3 \mid 12$.
When Euclid's algorithm calculates the
greatest common divisor of 39 and 15,
it produces the following sequence of function calls:
\[
\textrm{gcd}(39,15) = \textrm{gcd}(15,9)
= \textrm{gcd}(9,6) = \textrm{gcd}(6,3)
= \textrm{gcd}(3,0) = 3 \]
This corresponds to the following equations:
\[
\begin{array}{lcl}
39 - 2 \cdot 15 & = & 9 \\
15 - 1 \cdot 9 & = & 6 \\
9 - 1 \cdot 6 & = & 3 \\
\end{array}
\]
Using these equations, we can derive
\[
39 \cdot 2 + 15 \cdot (-5) = 3
\]
and by multiplying this by 4, the result is
\[
39 \cdot 8 + 15 \cdot (-20) = 12,
\]
so a solution to the equation is
$x=8$ and $y=-20$.
A solution to a Diophantine equation is not unique,
because we can form an infinite number of solutions
if we know one solution.
If a pair $(x,y)$ is a solution, then also all pairs
\[(x+\frac{kb}{\textrm{gcd}(a,b)},y-\frac{ka}{\textrm{gcd}(a,b)})\]
are solutions, where $k$ is any integer.
\subsubsection{Chinese remainder theorem}
\index{Chinese remainder theorem}
The \key{Chinese remainder theorem} solves
a group of equations of the form
\[
\begin{array}{lcl}
x & = & a_1 \bmod m_1 \\
x & = & a_2 \bmod m_2 \\
\cdots \\
x & = & a_n \bmod m_n \\
\end{array}
\]
where all pairs of $m_1,m_2,\ldots,m_n$ are coprime.
Let $x^{-1}_m$ be the inverse of $x$ modulo $m$, and
\[ X_k = \frac{m_1 m_2 \cdots m_n}{m_k}.\]
Using this notation, a solution to the equations is
\[x = a_1 X_1 {X_1}^{-1}_{m_1} + a_2 X_2 {X_2}^{-1}_{m_2} + \cdots + a_n X_n {X_n}^{-1}_{m_n}.\]
In this solution, for each $k=1,2,\ldots,n$,
\[a_k X_k {X_k}^{-1}_{m_k} \bmod m_k = a_k,\]
because
\[X_k {X_k}^{-1}_{m_k} \bmod m_k = 1.\]
Since all other terms in the sum are divisible by $m_k$,
they have no effect on the remainder,
and $x \bmod m_k = a_k$.
For example, a solution for
\[
\begin{array}{lcl}
x & = & 3 \bmod 5 \\
x & = & 4 \bmod 7 \\
x & = & 2 \bmod 3 \\
\end{array}
\]
is
\[ 3 \cdot 21 \cdot 1 + 4 \cdot 15 \cdot 1 + 2 \cdot 35 \cdot 2 = 263.\]
Once we have found a solution $x$,
we can create an infinite number of other solutions,
because all numbers of the form
\[x+m_1 m_2 \cdots m_n\]
are solutions.
\section{Other results}
\subsubsection{Lagrange's theorem}
\index{Lagrange's theorem}
\key{Lagrange's theorem}
%\footnote{J.-L. Lagrange (1736--1813) was an Italian mathematician.}
states that every positive integer
can be represented as a sum of four squares, i.e.,
$a^2+b^2+c^2+d^2$.
For example, the number 123 can be represented
as the sum $8^2+5^2+5^2+3^2$.
\subsubsection{Zeckendorf's theorem}
\index{Zeckendorf's theorem}
\index{Fibonacci number}
\key{Zeckendorf's theorem}
%\footnote{E. Zeckendorf published the theorem in 1972 \cite{zec72}; however, this was not a new result.}
states that every
positive integer has a unique representation
as a sum of Fibonacci numbers such that
no two numbers are equal or consecutive
Fibonacci numbers.
For example, the number 74 can be represented
as the sum $55+13+5+1$.
\subsubsection{Pythagorean triples}
\index{Pythagorean triple}
\index{Euclid's formula}
A \key{Pythagorean triple} is a triple $(a,b,c)$
that satisfies the Pythagorean theorem
$a^2+b^2=c^2$, which means that there is a right triangle
with side lengths $a$, $b$ and $c$.
For example, $(3,4,5)$ is a Pythagorean triple.
If $(a,b,c)$ is a Pythagorean triple,
all triples of the form $(ka,kb,kc)$
are also Pythagorean triples where $k>1$.
A Pythagorean triple is \emph{primitive} if
$a$, $b$ and $c$ are coprime,
and all Pythagorean triples can be constructed
from primitive triples using a multiplier $k$.
\key{Euclid's formula} can be used to produce
all primitive Pythagorean triples.
Each such triple is of the form
\[(n^2-m^2,2nm,n^2+m^2),\]
where $0<m<n$, $n$ and $m$ are coprime
and at least one of $n$ and $m$ is even.
For example, when $m=1$ and $n=2$, the formula
produces the smallest Pythagorean triple
\[(2^2-1^2,2\cdot2\cdot1,2^2+1^2)=(3,4,5).\]
\subsubsection{Wilson's theorem}
\index{Wilson's theorem}
\key{Wilson's theorem}
%\footnote{J. Wilson (1741--1793) was an English mathematician.}
states that a number $n$
is prime exactly when
\[(n-1)! \bmod n = n-1.\]
For example, the number 11 is prime, because
\[10! \bmod 11 = 10,\]
and the number 12 is not prime, because
\[11! \bmod 12 = 0 \neq 11.\]
Hence, Wilson's theorem can be used to find out
whether a number is prime. However, in practice, the theorem cannot be
applied to large values of $n$, because it is difficult
to calculate values of $(n-1)!$ when $n$ is large.

925
chapter22.tex Normal file
View File

@ -0,0 +1,925 @@
\chapter{Combinatorics}
\index{combinatorics}
\key{Combinatorics} studies methods for counting
combinations of objects.
Usually, the goal is to find a way to
count the combinations efficiently
without generating each combination separately.
As an example, consider the problem
of counting the number of ways to
represent an integer $n$ as a sum of positive integers.
For example, there are 8 representations
for $4$:
\begin{multicols}{2}
\begin{itemize}
\item $1+1+1+1$
\item $1+1+2$
\item $1+2+1$
\item $2+1+1$
\item $2+2$
\item $3+1$
\item $1+3$
\item $4$
\end{itemize}
\end{multicols}
A combinatorial problem can often be solved
using a recursive function.
In this problem, we can define a function $f(n)$
that gives the number of representations for $n$.
For example, $f(4)=8$ according to the above example.
The values of the function
can be recursively calculated as follows:
\begin{equation*}
f(n) = \begin{cases}
1 & n = 0\\
f(0)+f(1)+\cdots+f(n-1) & n > 0\\
\end{cases}
\end{equation*}
The base case is $f(0)=1$,
because the empty sum represents the number 0.
Then, if $n>0$, we consider all ways to
choose the first number of the sum.
If the first number is $k$,
there are $f(n-k)$ representations
for the remaining part of the sum.
Thus, we calculate the sum of all values
of the form $f(n-k)$ where $k<n$.
The first values for the function are:
\[
\begin{array}{lcl}
f(0) & = & 1 \\
f(1) & = & 1 \\
f(2) & = & 2 \\
f(3) & = & 4 \\
f(4) & = & 8 \\
\end{array}
\]
Sometimes, a recursive formula can be replaced
with a closed-form formula.
In this problem,
\[
f(n)=2^{n-1},
\]
which is based on the fact that there are $n-1$
possible positions for +-signs in the sum
and we can choose any subset of them.
\section{Binomial coefficients}
\index{binomial coefficient}
The \key{binomial coefficient} ${n \choose k}$
equals the number of ways we can choose a subset
of $k$ elements from a set of $n$ elements.
For example, ${5 \choose 3}=10$,
because the set $\{1,2,3,4,5\}$
has 10 subsets of 3 elements:
\[ \{1,2,3\}, \{1,2,4\}, \{1,2,5\}, \{1,3,4\}, \{1,3,5\},
\{1,4,5\}, \{2,3,4\}, \{2,3,5\}, \{2,4,5\}, \{3,4,5\} \]
\subsubsection{Formula 1}
Binomial coefficients can be
recursively calculated as follows:
\[
{n \choose k} = {n-1 \choose k-1} + {n-1 \choose k}
\]
The idea is to fix an element $x$ in the set.
If $x$ is included in the subset,
we have to choose $k-1$
elements from $n-1$ elements,
and if $x$ is not included in the subset,
we have to choose $k$ elements from $n-1$ elements.
The base cases for the recursion are
\[
{n \choose 0} = {n \choose n} = 1,
\]
because there is always exactly
one way to construct an empty subset
and a subset that contains all the elements.
\subsubsection{Formula 2}
Another way to calculate binomial coefficients is as follows:
\[
{n \choose k} = \frac{n!}{k!(n-k)!}.
\]
There are $n!$ permutations of $n$ elements.
We go through all permutations and always
include the first $k$ elements of the permutation
in the subset.
Since the order of the elements in the subset
and outside the subset does not matter,
the result is divided by $k!$ and $(n-k)!$
\subsubsection{Properties}
For binomial coefficients,
\[
{n \choose k} = {n \choose n-k},
\]
because we actually divide a set of $n$ elements into
two subsets: the first contains $k$ elements
and the second contains $n-k$ elements.
The sum of binomial coefficients is
\[
{n \choose 0}+{n \choose 1}+{n \choose 2}+\ldots+{n \choose n}=2^n.
\]
The reason for the name ''binomial coefficient''
can be seen when the binomial $(a+b)$ is raised to
the $n$th power:
\[ (a+b)^n =
{n \choose 0} a^n b^0 +
{n \choose 1} a^{n-1} b^1 +
\ldots +
{n \choose n-1} a^1 b^{n-1} +
{n \choose n} a^0 b^n. \]
\index{Pascal's triangle}
Binomial coefficients also appear in
\key{Pascal's triangle}
where each value equals the sum of two
above values:
\begin{center}
\begin{tikzpicture}{0.9}
\node at (0,0) {1};
\node at (-0.5,-0.5) {1};
\node at (0.5,-0.5) {1};
\node at (-1,-1) {1};
\node at (0,-1) {2};
\node at (1,-1) {1};
\node at (-1.5,-1.5) {1};
\node at (-0.5,-1.5) {3};
\node at (0.5,-1.5) {3};
\node at (1.5,-1.5) {1};
\node at (-2,-2) {1};
\node at (-1,-2) {4};
\node at (0,-2) {6};
\node at (1,-2) {4};
\node at (2,-2) {1};
\node at (-2,-2.5) {$\ldots$};
\node at (-1,-2.5) {$\ldots$};
\node at (0,-2.5) {$\ldots$};
\node at (1,-2.5) {$\ldots$};
\node at (2,-2.5) {$\ldots$};
\end{tikzpicture}
\end{center}
\subsubsection{Boxes and balls}
''Boxes and balls'' is a useful model,
where we count the ways to
place $k$ balls in $n$ boxes.
Let us consider three scenarios:
\textit{Scenario 1}: Each box can contain
at most one ball.
For example, when $n=5$ and $k=2$,
there are 10 solutions:
\begin{center}
\begin{tikzpicture}[scale=0.5]
\newcommand\lax[3]{
\path[draw,thick,-] (#1-0.5,#2+0.5) -- (#1-0.5,#2-0.5) --
(#1+0.5,#2-0.5) -- (#1+0.5,#2+0.5);
\ifthenelse{\equal{#3}{1}}{\draw[fill=black] (#1,#2-0.3) circle (0.15);}{}
\ifthenelse{\equal{#3}{2}}{\draw[fill=black] (#1-0.2,#2-0.3) circle (0.15);}{}
\ifthenelse{\equal{#3}{2}}{\draw[fill=black] (#1+0.2,#2-0.3) circle (0.15);}{}
}
\newcommand\laa[7]{
\lax{#1}{#2}{#3}
\lax{#1+1.2}{#2}{#4}
\lax{#1+2.4}{#2}{#5}
\lax{#1+3.6}{#2}{#6}
\lax{#1+4.8}{#2}{#7}
}
\laa{0}{0}{1}{1}{0}{0}{0}
\laa{0}{-2}{1}{0}{1}{0}{0}
\laa{0}{-4}{1}{0}{0}{1}{0}
\laa{0}{-6}{1}{0}{0}{0}{1}
\laa{8}{0}{0}{1}{1}{0}{0}
\laa{8}{-2}{0}{1}{0}{1}{0}
\laa{8}{-4}{0}{1}{0}{0}{1}
\laa{16}{0}{0}{0}{1}{1}{0}
\laa{16}{-2}{0}{0}{1}{0}{1}
\laa{16}{-4}{0}{0}{0}{1}{1}
\end{tikzpicture}
\end{center}
In this scenario, the answer is directly the
binomial coefficient ${n \choose k}$.
\textit{Scenario 2}: A box can contain multiple balls.
For example, when $n=5$ and $k=2$,
there are 15 solutions:
\begin{center}
\begin{tikzpicture}[scale=0.5]
\newcommand\lax[3]{
\path[draw,thick,-] (#1-0.5,#2+0.5) -- (#1-0.5,#2-0.5) --
(#1+0.5,#2-0.5) -- (#1+0.5,#2+0.5);
\ifthenelse{\equal{#3}{1}}{\draw[fill=black] (#1,#2-0.3) circle (0.15);}{}
\ifthenelse{\equal{#3}{2}}{\draw[fill=black] (#1-0.2,#2-0.3) circle (0.15);}{}
\ifthenelse{\equal{#3}{2}}{\draw[fill=black] (#1+0.2,#2-0.3) circle (0.15);}{}
}
\newcommand\laa[7]{
\lax{#1}{#2}{#3}
\lax{#1+1.2}{#2}{#4}
\lax{#1+2.4}{#2}{#5}
\lax{#1+3.6}{#2}{#6}
\lax{#1+4.8}{#2}{#7}
}
\laa{0}{0}{2}{0}{0}{0}{0}
\laa{0}{-2}{1}{1}{0}{0}{0}
\laa{0}{-4}{1}{0}{1}{0}{0}
\laa{0}{-6}{1}{0}{0}{1}{0}
\laa{0}{-8}{1}{0}{0}{0}{1}
\laa{8}{0}{0}{2}{0}{0}{0}
\laa{8}{-2}{0}{1}{1}{0}{0}
\laa{8}{-4}{0}{1}{0}{1}{0}
\laa{8}{-6}{0}{1}{0}{0}{1}
\laa{8}{-8}{0}{0}{2}{0}{0}
\laa{16}{0}{0}{0}{1}{1}{0}
\laa{16}{-2}{0}{0}{1}{0}{1}
\laa{16}{-4}{0}{0}{0}{2}{0}
\laa{16}{-6}{0}{0}{0}{1}{1}
\laa{16}{-8}{0}{0}{0}{0}{2}
\end{tikzpicture}
\end{center}
The process of placing the balls in the boxes
can be represented as a string
that consists of symbols
''o'' and ''$\rightarrow$''.
Initially, assume that we are standing at the leftmost box.
The symbol ''o'' means that we place a ball
in the current box, and the symbol
''$\rightarrow$'' means that we move to
the next box to the right.
Using this notation, each solution is a string
that contains $k$ times the symbol ''o'' and
$n-1$ times the symbol ''$\rightarrow$''.
For example, the upper-right solution
in the above picture corresponds to the string
''$\rightarrow$ $\rightarrow$ o $\rightarrow$ o $\rightarrow$''.
Thus, the number of solutions is
${k+n-1 \choose k}$.
\textit{Scenario 3}: Each box may contain at most one ball,
and in addition, no two adjacent boxes may both contain a ball.
For example, when $n=5$ and $k=2$,
there are 6 solutions:
\begin{center}
\begin{tikzpicture}[scale=0.5]
\newcommand\lax[3]{
\path[draw,thick,-] (#1-0.5,#2+0.5) -- (#1-0.5,#2-0.5) --
(#1+0.5,#2-0.5) -- (#1+0.5,#2+0.5);
\ifthenelse{\equal{#3}{1}}{\draw[fill=black] (#1,#2-0.3) circle (0.15);}{}
\ifthenelse{\equal{#3}{2}}{\draw[fill=black] (#1-0.2,#2-0.3) circle (0.15);}{}
\ifthenelse{\equal{#3}{2}}{\draw[fill=black] (#1+0.2,#2-0.3) circle (0.15);}{}
}
\newcommand\laa[7]{
\lax{#1}{#2}{#3}
\lax{#1+1.2}{#2}{#4}
\lax{#1+2.4}{#2}{#5}
\lax{#1+3.6}{#2}{#6}
\lax{#1+4.8}{#2}{#7}
}
\laa{0}{0}{1}{0}{1}{0}{0}
\laa{0}{-2}{1}{0}{0}{1}{0}
\laa{8}{0}{1}{0}{0}{0}{1}
\laa{8}{-2}{0}{1}{0}{1}{0}
\laa{16}{0}{0}{1}{0}{0}{1}
\laa{16}{-2}{0}{0}{1}{0}{1}
\end{tikzpicture}
\end{center}
In this scenario, we can assume that
$k$ balls are initially placed in boxes
and there is an empty box between each
two adjacent boxes.
The remaining task is to choose the
positions for the remaining empty boxes.
There are $n-2k+1$ such boxes and
$k+1$ positions for them.
Thus, using the formula of scenario 2,
the number of solutions is
${n-k+1 \choose n-2k+1}$.
\subsubsection{Multinomial coefficients}
\index{multinomial coefficient}
The \key{multinomial coefficient}
\[ {n \choose k_1,k_2,\ldots,k_m} = \frac{n!}{k_1! k_2! \cdots k_m!}, \]
equals the number of ways
we can divide $n$ elements into subsets
of sizes $k_1,k_2,\ldots,k_m$,
where $k_1+k_2+\cdots+k_m=n$.
Multinomial coefficients can be seen as a
generalization of binomial cofficients;
if $m=2$, the above formula
corresponds to the binomial coefficient formula.
\section{Catalan numbers}
\index{Catalan number}
The \key{Catalan number}
%\footnote{E. C. Catalan (1814--1894) was a Belgian mathematician.}
$C_n$ equals the
number of valid
parenthesis expressions that consist of
$n$ left parentheses and $n$ right parentheses.
For example, $C_3=5$, because
we can construct the following parenthesis
expressions using three
left and right parentheses:
\begin{itemize}[noitemsep]
\item \texttt{()()()}
\item \texttt{(())()}
\item \texttt{()(())}
\item \texttt{((()))}
\item \texttt{(()())}
\end{itemize}
\subsubsection{Parenthesis expressions}
\index{parenthesis expression}
What is exactly a \emph{valid parenthesis expression}?
The following rules precisely define all
valid parenthesis expressions:
\begin{itemize}
\item An empty parenthesis expression is valid.
\item If an expression $A$ is valid,
then also the expression
\texttt{(}$A$\texttt{)} is valid.
\item If expressions $A$ and $B$ are valid,
then also the expression $AB$ is valid.
\end{itemize}
Another way to characterize valid
parenthesis expressions is that if
we choose any prefix of such an expression,
it has to contain at least as many left
parentheses as right parentheses.
In addition, the complete expression has to
contain an equal number of left and right
parentheses.
\subsubsection{Formula 1}
Catalan numbers can be calculated using the formula
\[ C_n = \sum_{i=0}^{n-1} C_{i} C_{n-i-1}.\]
The sum goes through the ways to divide the
expression into two parts
such that both parts are valid
expressions and the first part is as short as possible
but not empty.
For any $i$, the first part contains $i+1$ pairs
of parentheses and the number of expressions
is the product of the following values:
\begin{itemize}
\item $C_{i}$: the number of ways to construct an expression
using the parentheses of the first part,
not counting the outermost parentheses
\item $C_{n-i-1}$: the number of ways to construct an
expression using the parentheses of the second part
\end{itemize}
The base case is $C_0=1$,
because we can construct an empty parenthesis
expression using zero pairs of parentheses.
\subsubsection{Formula 2}
Catalan numbers can also be calculated
using binomial coefficients:
\[ C_n = \frac{1}{n+1} {2n \choose n}\]
The formula can be explained as follows:
There are a total of ${2n \choose n}$ ways
to construct a (not necessarily valid)
parenthesis expression that contains $n$ left
parentheses and $n$ right parentheses.
Let us calculate the number of such
expressions that are \emph{not} valid.
If a parenthesis expression is not valid,
it has to contain a prefix where the
number of right parentheses exceeds the
number of left parentheses.
The idea is to reverse each parenthesis
that belongs to such a prefix.
For example, the expression
\texttt{())()(} contains a prefix \texttt{())},
and after reversing the prefix,
the expression becomes \texttt{)((()(}.
The resulting expression consists of $n+1$
left parentheses and $n-1$ right parentheses.
The number of such expressions is ${2n \choose n+1}$,
which equals the number of non-valid
parenthesis expressions.
Thus, the number of valid parenthesis
expressions can be calculated using the formula
\[{2n \choose n}-{2n \choose n+1} = {2n \choose n} - \frac{n}{n+1} {2n \choose n} = \frac{1}{n+1} {2n \choose n}.\]
\subsubsection{Counting trees}
Catalan numbers are also related to trees:
\begin{itemize}
\item there are $C_n$ binary trees of $n$ nodes
\item there are $C_{n-1}$ rooted trees of $n$ nodes
\end{itemize}
\noindent
For example, for $C_3=5$, the binary trees are
\begin{center}
\begin{tikzpicture}[scale=0.7]
\path[draw,thick,-] (0,0) -- (-1,-1);
\path[draw,thick,-] (0,0) -- (1,-1);
\draw[fill=white] (0,0) circle (0.3);
\draw[fill=white] (-1,-1) circle (0.3);
\draw[fill=white] (1,-1) circle (0.3);
\path[draw,thick,-] (4,0) -- (4-0.75,-1) -- (4-1.5,-2);
\draw[fill=white] (4,0) circle (0.3);
\draw[fill=white] (4-0.75,-1) circle (0.3);
\draw[fill=white] (4-1.5,-2) circle (0.3);
\path[draw,thick,-] (6.5,0) -- (6.5-0.75,-1) -- (6.5-0,-2);
\draw[fill=white] (6.5,0) circle (0.3);
\draw[fill=white] (6.5-0.75,-1) circle (0.3);
\draw[fill=white] (6.5-0,-2) circle (0.3);
\path[draw,thick,-] (9,0) -- (9+0.75,-1) -- (9-0,-2);
\draw[fill=white] (9,0) circle (0.3);
\draw[fill=white] (9+0.75,-1) circle (0.3);
\draw[fill=white] (9-0,-2) circle (0.3);
\path[draw,thick,-] (11.5,0) -- (11.5+0.75,-1) -- (11.5+1.5,-2);
\draw[fill=white] (11.5,0) circle (0.3);
\draw[fill=white] (11.5+0.75,-1) circle (0.3);
\draw[fill=white] (11.5+1.5,-2) circle (0.3);
\end{tikzpicture}
\end{center}
and the rooted trees are
\begin{center}
\begin{tikzpicture}[scale=0.7]
\path[draw,thick,-] (0,0) -- (-1,-1);
\path[draw,thick,-] (0,0) -- (0,-1);
\path[draw,thick,-] (0,0) -- (1,-1);
\draw[fill=white] (0,0) circle (0.3);
\draw[fill=white] (-1,-1) circle (0.3);
\draw[fill=white] (0,-1) circle (0.3);
\draw[fill=white] (1,-1) circle (0.3);
\path[draw,thick,-] (3,0) -- (3,-1) -- (3,-2) -- (3,-3);
\draw[fill=white] (3,0) circle (0.3);
\draw[fill=white] (3,-1) circle (0.3);
\draw[fill=white] (3,-2) circle (0.3);
\draw[fill=white] (3,-3) circle (0.3);
\path[draw,thick,-] (6+0,0) -- (6-1,-1);
\path[draw,thick,-] (6+0,0) -- (6+1,-1) -- (6+1,-2);
\draw[fill=white] (6+0,0) circle (0.3);
\draw[fill=white] (6-1,-1) circle (0.3);
\draw[fill=white] (6+1,-1) circle (0.3);
\draw[fill=white] (6+1,-2) circle (0.3);
\path[draw,thick,-] (9+0,0) -- (9+1,-1);
\path[draw,thick,-] (9+0,0) -- (9-1,-1) -- (9-1,-2);
\draw[fill=white] (9+0,0) circle (0.3);
\draw[fill=white] (9+1,-1) circle (0.3);
\draw[fill=white] (9-1,-1) circle (0.3);
\draw[fill=white] (9-1,-2) circle (0.3);
\path[draw,thick,-] (12+0,0) -- (12+0,-1) -- (12-1,-2);
\path[draw,thick,-] (12+0,0) -- (12+0,-1) -- (12+1,-2);
\draw[fill=white] (12+0,0) circle (0.3);
\draw[fill=white] (12+0,-1) circle (0.3);
\draw[fill=white] (12-1,-2) circle (0.3);
\draw[fill=white] (12+1,-2) circle (0.3);
\end{tikzpicture}
\end{center}
\section{Inclusion-exclusion}
\index{inclusion-exclusion}
\key{Inclusion-exclusion} is a technique
that can be used for counting the size
of a union of sets when the sizes of
the intersections are known, and vice versa.
A simple example of the technique is the formula
\[ |A \cup B| = |A| + |B| - |A \cap B|,\]
where $A$ and $B$ are sets and $|X|$
denotes the size of $X$.
The formula can be illustrated as follows:
\begin{center}
\begin{tikzpicture}[scale=0.8]
\draw (0,0) circle (1.5);
\draw (1.5,0) circle (1.5);
\node at (-0.75,0) {\small $A$};
\node at (2.25,0) {\small $B$};
\node at (0.75,0) {\small $A \cap B$};
\end{tikzpicture}
\end{center}
Our goal is to calculate
the size of the union $A \cup B$
that corresponds to the area of the region
that belongs to at least one circle.
The picture shows that we can calculate
the area of $A \cup B$ by first summing the
areas of $A$ and $B$ and then subtracting
the area of $A \cap B$.
The same idea can be applied when the number
of sets is larger.
When there are three sets, the inclusion-exclusion formula is
\[ |A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C| \]
and the corresponding picture is
\begin{center}
\begin{tikzpicture}[scale=0.8]
\draw (0,0) circle (1.75);
\draw (2,0) circle (1.75);
\draw (1,1.5) circle (1.75);
\node at (-0.75,-0.25) {\small $A$};
\node at (2.75,-0.25) {\small $B$};
\node at (1,2.5) {\small $C$};
\node at (1,-0.5) {\small $A \cap B$};
\node at (0,1.25) {\small $A \cap C$};
\node at (2,1.25) {\small $B \cap C$};
\node at (1,0.5) {\scriptsize $A \cap B \cap C$};
\end{tikzpicture}
\end{center}
In the general case, the size of the
union $X_1 \cup X_2 \cup \cdots \cup X_n$
can be calculated by going through all possible
intersections that contain some of the sets $X_1,X_2,\ldots,X_n$.
If the intersection contains an odd number of sets,
its size is added to the answer,
and otherwise its size is subtracted from the answer.
Note that there are similar formulas
for calculating
the size of an intersection from the sizes of
unions. For example,
\[ |A \cap B| = |A| + |B| - |A \cup B|\]
and
\[ |A \cap B \cap C| = |A| + |B| + |C| - |A \cup B| - |A \cup C| - |B \cup C| + |A \cup B \cup C| .\]
\subsubsection{Derangements}
\index{derangement}
As an example, let us count the number of \key{derangements}
of elements $\{1,2,\ldots,n\}$, i.e., permutations
where no element remains in its original place.
For example, when $n=3$, there are
two derangements: $(2,3,1)$ and $(3,1,2)$.
One approach for solving the problem is to use
inclusion-exclusion.
Let $X_k$ be the set of permutations
that contain the element $k$ at position $k$.
For example, when $n=3$, the sets are as follows:
\[
\begin{array}{lcl}
X_1 & = & \{(1,2,3),(1,3,2)\} \\
X_2 & = & \{(1,2,3),(3,2,1)\} \\
X_3 & = & \{(1,2,3),(2,1,3)\} \\
\end{array}
\]
Using these sets, the number of derangements equals
\[ n! - |X_1 \cup X_2 \cup \cdots \cup X_n|, \]
so it suffices to calculate the size of the union.
Using inclusion-exclusion, this reduces to
calculating sizes of intersections which can be
done efficiently.
For example, when $n=3$, the size of
$|X_1 \cup X_2 \cup X_3|$ is
\[
\begin{array}{lcl}
& & |X_1| + |X_2| + |X_3| - |X_1 \cap X_2| - |X_1 \cap X_3| - |X_2 \cap X_3| + |X_1 \cap X_2 \cap X_3| \\
& = & 2+2+2-1-1-1+1 \\
& = & 4, \\
\end{array}
\]
so the number of solutions is $3!-4=2$.
It turns out that the problem can also be solved
without using inclusion-exclusion.
Let $f(n)$ denote the number of derangements
for $\{1,2,\ldots,n\}$. We can use the following
recursive formula:
\begin{equation*}
f(n) = \begin{cases}
0 & n = 1\\
1 & n = 2\\
(n-1)(f(n-2) + f(n-1)) & n>2 \\
\end{cases}
\end{equation*}
The formula can be derived by considering
the possibilities how the element 1 changes
in the derangement.
There are $n-1$ ways to choose an element $x$
that replaces the element 1.
In each such choice, there are two options:
\textit{Option 1:} We also replace the element $x$
with the element 1.
After this, the remaining task is to construct
a derangement of $n-2$ elements.
\textit{Option 2:} We replace the element $x$
with some other element than 1.
Now we have to construct a derangement
of $n-1$ element, because we cannot replace
the element $x$ with the element $1$, and all other
elements must be changed.
\section{Burnside's lemma}
\index{Burnside's lemma}
\key{Burnside's lemma}
%\footnote{Actually, Burnside did not discover this lemma; he only mentioned it in his book \cite{bur97}.}
can be used to count
the number of combinations so that
only one representative is counted
for each group of symmetric combinations.
Burnside's lemma states that the number of
combinations is
\[\sum_{k=1}^n \frac{c(k)}{n},\]
where there are $n$ ways to change the
position of a combination,
and there are $c(k)$ combinations that
remain unchanged when the $k$th way is applied.
As an example, let us calculate the number of
necklaces of $n$ pearls,
where each pearl has $m$ possible colors.
Two necklaces are symmetric if they are
similar after rotating them.
For example, the necklace
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw[fill=white] (0,0) circle (1);
\draw[fill=red] (0,1) circle (0.3);
\draw[fill=blue] (1,0) circle (0.3);
\draw[fill=red] (0,-1) circle (0.3);
\draw[fill=green] (-1,0) circle (0.3);
\end{tikzpicture}
\end{center}
has the following symmetric necklaces:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw[fill=white] (0,0) circle (1);
\draw[fill=red] (0,1) circle (0.3);
\draw[fill=blue] (1,0) circle (0.3);
\draw[fill=red] (0,-1) circle (0.3);
\draw[fill=green] (-1,0) circle (0.3);
\draw[fill=white] (4,0) circle (1);
\draw[fill=green] (4+0,1) circle (0.3);
\draw[fill=red] (4+1,0) circle (0.3);
\draw[fill=blue] (4+0,-1) circle (0.3);
\draw[fill=red] (4+-1,0) circle (0.3);
\draw[fill=white] (8,0) circle (1);
\draw[fill=red] (8+0,1) circle (0.3);
\draw[fill=green] (8+1,0) circle (0.3);
\draw[fill=red] (8+0,-1) circle (0.3);
\draw[fill=blue] (8+-1,0) circle (0.3);
\draw[fill=white] (12,0) circle (1);
\draw[fill=blue] (12+0,1) circle (0.3);
\draw[fill=red] (12+1,0) circle (0.3);
\draw[fill=green] (12+0,-1) circle (0.3);
\draw[fill=red] (12+-1,0) circle (0.3);
\end{tikzpicture}
\end{center}
There are $n$ ways to change the position
of a necklace,
because we can rotate it
$0,1,\ldots,n-1$ steps clockwise.
If the number of steps is 0,
all $m^n$ necklaces remain the same,
and if the number of steps is 1,
only the $m$ necklaces where each
pearl has the same color remain the same.
More generally, when the number of steps is $k$,
a total of
\[m^{\textrm{gcd}(k,n)}\]
necklaces remain the same,
where $\textrm{gcd}(k,n)$ is the greatest common
divisor of $k$ and $n$.
The reason for this is that blocks
of pearls of size $\textrm{gcd}(k,n)$
will replace each other.
Thus, according to Burnside's lemma,
the number of necklaces is
\[\sum_{i=0}^{n-1} \frac{m^{\textrm{gcd}(i,n)}}{n}. \]
For example, the number of necklaces of length 4
with 3 colors is
\[\frac{3^4+3+3^2+3}{4} = 24. \]
\section{Cayley's formula}
\index{Cayley's formula}
\key{Cayley's formula}
% \footnote{While the formula is named after A. Cayley,
% who studied it in 1889, it was discovered earlier by C. W. Borchardt in 1860.}
states that
there are $n^{n-2}$ labeled trees
that contain $n$ nodes.
The nodes are labeled $1,2,\ldots,n$,
and two trees are different
if either their structure or
labeling is different.
\begin{samepage}
For example, when $n=4$, the number of labeled
trees is $4^{4-2}=16$:
\begin{center}
\begin{tikzpicture}[scale=0.8]
\footnotesize
\newcommand\puua[6]{
\path[draw,thick,-] (#1,#2) -- (#1-1.25,#2-1.5);
\path[draw,thick,-] (#1,#2) -- (#1,#2-1.5);
\path[draw,thick,-] (#1,#2) -- (#1+1.25,#2-1.5);
\node[draw, circle, fill=white] at (#1,#2) {#3};
\node[draw, circle, fill=white] at (#1-1.25,#2-1.5) {#4};
\node[draw, circle, fill=white] at (#1,#2-1.5) {#5};
\node[draw, circle, fill=white] at (#1+1.25,#2-1.5) {#6};
}
\newcommand\puub[6]{
\path[draw,thick,-] (#1,#2) -- (#1+1,#2);
\path[draw,thick,-] (#1+1,#2) -- (#1+2,#2);
\path[draw,thick,-] (#1+2,#2) -- (#1+3,#2);
\node[draw, circle, fill=white] at (#1,#2) {#3};
\node[draw, circle, fill=white] at (#1+1,#2) {#4};
\node[draw, circle, fill=white] at (#1+2,#2) {#5};
\node[draw, circle, fill=white] at (#1+3,#2) {#6};
}
\puua{0}{0}{1}{2}{3}{4}
\puua{4}{0}{2}{1}{3}{4}
\puua{8}{0}{3}{1}{2}{4}
\puua{12}{0}{4}{1}{2}{3}
\puub{0}{-3}{1}{2}{3}{4}
\puub{4.5}{-3}{1}{2}{4}{3}
\puub{9}{-3}{1}{3}{2}{4}
\puub{0}{-4.5}{1}{3}{4}{2}
\puub{4.5}{-4.5}{1}{4}{2}{3}
\puub{9}{-4.5}{1}{4}{3}{2}
\puub{0}{-6}{2}{1}{3}{4}
\puub{4.5}{-6}{2}{1}{4}{3}
\puub{9}{-6}{2}{3}{1}{4}
\puub{0}{-7.5}{2}{4}{1}{3}
\puub{4.5}{-7.5}{3}{1}{2}{4}
\puub{9}{-7.5}{3}{2}{1}{4}
\end{tikzpicture}
\end{center}
\end{samepage}
Next we will see how Cayley's formula can
be derived using Prüfer codes.
\subsubsection{Prüfer code}
\index{Prüfer code}
A \key{Prüfer code}
%\footnote{In 1918, H. Prüfer proved Cayley's theorem using Prüfer codes \cite{pru18}.}
is a sequence of
$n-2$ numbers that describes a labeled tree.
The code is constructed by following a process
that removes $n-2$ leaves from the tree.
At each step, the leaf with the smallest label is removed,
and the label of its only neighbor is added to the code.
For example, let us calculate the Prüfer code
of the following graph:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (2,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
\node[draw, circle] (3) at (2,1) {$3$};
\node[draw, circle] (4) at (4,1) {$4$};
\node[draw, circle] (5) at (5.5,2) {$5$};
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (2) -- (5);
\end{tikzpicture}
\end{center}
First we remove node 1 and add node 4 to the code:
\begin{center}
\begin{tikzpicture}[scale=0.9]
%\node[draw, circle] (1) at (2,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
\node[draw, circle] (3) at (2,1) {$3$};
\node[draw, circle] (4) at (4,1) {$4$};
\node[draw, circle] (5) at (5.5,2) {$5$};
%\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (2) -- (5);
\end{tikzpicture}
\end{center}
Then we remove node 3 and add node 4 to the code:
\begin{center}
\begin{tikzpicture}[scale=0.9]
%\node[draw, circle] (1) at (2,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
%\node[draw, circle] (3) at (2,1) {$3$};
\node[draw, circle] (4) at (4,1) {$4$};
\node[draw, circle] (5) at (5.5,2) {$5$};
%\path[draw,thick,-] (1) -- (4);
%\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (2) -- (5);
\end{tikzpicture}
\end{center}
Finally we remove node 4 and add node 2 to the code:
\begin{center}
\begin{tikzpicture}[scale=0.9]
%\node[draw, circle] (1) at (2,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
%\node[draw, circle] (3) at (2,1) {$3$};
%\node[draw, circle] (4) at (4,1) {$4$};
\node[draw, circle] (5) at (5.5,2) {$5$};
%\path[draw,thick,-] (1) -- (4);
%\path[draw,thick,-] (3) -- (4);
%\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (2) -- (5);
\end{tikzpicture}
\end{center}
Thus, the Prüfer code of the graph is $[4,4,2]$.
We can construct a Prüfer code for any tree,
and more importantly,
the original tree can be reconstructed
from a Prüfer code.
Hence, the number of labeled trees
of $n$ nodes equals
$n^{n-2}$, the number of Prüfer codes
of size $n$.

856
chapter23.tex Normal file
View File

@ -0,0 +1,856 @@
\chapter{Matrices}
\index{matrix}
A \key{matrix} is a mathematical concept
that corresponds to a two-dimensional array
in programming. For example,
\[
A =
\begin{bmatrix}
6 & 13 & 7 & 4 \\
7 & 0 & 8 & 2 \\
9 & 5 & 4 & 18 \\
\end{bmatrix}
\]
is a matrix of size $3 \times 4$, i.e.,
it has 3 rows and 4 columns.
The notation $[i,j]$ refers to
the element in row $i$ and column $j$
in a matrix.
For example, in the above matrix,
$A[2,3]=8$ and $A[3,1]=9$.
\index{vector}
A special case of a matrix is a \key{vector}
that is a one-dimensional matrix of size $n \times 1$.
For example,
\[
V =
\begin{bmatrix}
4 \\
7 \\
5 \\
\end{bmatrix}
\]
is a vector that contains three elements.
\index{transpose}
The \key{transpose} $A^T$ of a matrix $A$
is obtained when the rows and columns of $A$
are swapped, i.e., $A^T[i,j]=A[j,i]$:
\[
A^T =
\begin{bmatrix}
6 & 7 & 9 \\
13 & 0 & 5 \\
7 & 8 & 4 \\
4 & 2 & 18 \\
\end{bmatrix}
\]
\index{square matrix}
A matrix is a \key{square matrix} if it
has the same number of rows and columns.
For example, the following matrix is a
square matrix:
\[
S =
\begin{bmatrix}
3 & 12 & 4 \\
5 & 9 & 15 \\
0 & 2 & 4 \\
\end{bmatrix}
\]
\section{Operations}
The sum $A+B$ of matrices $A$ and $B$
is defined if the matrices are of the same size.
The result is a matrix where each element
is the sum of the corresponding elements
in $A$ and $B$.
For example,
\[
\begin{bmatrix}
6 & 1 & 4 \\
3 & 9 & 2 \\
\end{bmatrix}
+
\begin{bmatrix}
4 & 9 & 3 \\
8 & 1 & 3 \\
\end{bmatrix}
=
\begin{bmatrix}
6+4 & 1+9 & 4+3 \\
3+8 & 9+1 & 2+3 \\
\end{bmatrix}
=
\begin{bmatrix}
10 & 10 & 7 \\
11 & 10 & 5 \\
\end{bmatrix}.
\]
Multiplying a matrix $A$ by a value $x$ means
that each element of $A$ is multiplied by $x$.
For example,
\[
2 \cdot \begin{bmatrix}
6 & 1 & 4 \\
3 & 9 & 2 \\
\end{bmatrix}
=
\begin{bmatrix}
2 \cdot 6 & 2\cdot1 & 2\cdot4 \\
2\cdot3 & 2\cdot9 & 2\cdot2 \\
\end{bmatrix}
=
\begin{bmatrix}
12 & 2 & 8 \\
6 & 18 & 4 \\
\end{bmatrix}.
\]
\subsubsection{Matrix multiplication}
\index{matrix multiplication}
The product $AB$ of matrices $A$ and $B$
is defined if $A$ is of size $a \times n$
and $B$ is of size $n \times b$, i.e.,
the width of $A$ equals the height of $B$.
The result is a matrix of size $a \times b$
whose elements are calculated using the formula
\[
AB[i,j] = \sum_{k=1}^n A[i,k] \cdot B[k,j].
\]
The idea is that each element of $AB$
is a sum of products of elements of $A$ and $B$
according to the following picture:
\begin{center}
\begin{tikzpicture}[scale=0.5]
\draw (0,0) grid (4,3);
\draw (5,0) grid (10,3);
\draw (5,4) grid (10,8);
\node at (2,-1) {$A$};
\node at (7.5,-1) {$AB$};
\node at (11,6) {$B$};
\draw[thick,->,red,line width=2pt] (0,1.5) -- (4,1.5);
\draw[thick,->,red,line width=2pt] (6.5,8) -- (6.5,4);
\draw[thick,red,line width=2pt] (6.5,1.5) circle (0.4);
\end{tikzpicture}
\end{center}
For example,
\[
\begin{bmatrix}
1 & 4 \\
3 & 9 \\
8 & 6 \\
\end{bmatrix}
\cdot
\begin{bmatrix}
1 & 6 \\
2 & 9 \\
\end{bmatrix}
=
\begin{bmatrix}
1 \cdot 1 + 4 \cdot 2 & 1 \cdot 6 + 4 \cdot 9 \\
3 \cdot 1 + 9 \cdot 2 & 3 \cdot 6 + 9 \cdot 9 \\
8 \cdot 1 + 6 \cdot 2 & 8 \cdot 6 + 6 \cdot 9 \\
\end{bmatrix}
=
\begin{bmatrix}
9 & 42 \\
21 & 99 \\
20 & 102 \\
\end{bmatrix}.
\]
Matrix multiplication is associative,
so $A(BC)=(AB)C$ holds,
but it is not commutative,
so $AB = BA$ does not usually hold.
\index{identity matrix}
An \key{identity matrix} is a square matrix
where each element on the diagonal is 1
and all other elements are 0.
For example, the following matrix
is the $3 \times 3$ identity matrix:
\[
I = \begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
\end{bmatrix}
\]
\begin{samepage}
Multiplying a matrix by an identity matrix
does not change it. For example,
\[
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
\end{bmatrix}
\cdot
\begin{bmatrix}
1 & 4 \\
3 & 9 \\
8 & 6 \\
\end{bmatrix}
=
\begin{bmatrix}
1 & 4 \\
3 & 9 \\
8 & 6 \\
\end{bmatrix} \hspace{10px} \textrm{and} \hspace{10px}
\begin{bmatrix}
1 & 4 \\
3 & 9 \\
8 & 6 \\
\end{bmatrix}
\cdot
\begin{bmatrix}
1 & 0 \\
0 & 1 \\
\end{bmatrix}
=
\begin{bmatrix}
1 & 4 \\
3 & 9 \\
8 & 6 \\
\end{bmatrix}.
\]
\end{samepage}
Using a straightforward algorithm,
we can calculate the product of
two $n \times n$ matrices
in $O(n^3)$ time.
There are also more efficient algorithms
for matrix multiplication\footnote{The first such
algorithm was Strassen's algorithm,
published in 1969 \cite{str69},
whose time complexity is $O(n^{2.80735})$;
the best current algorithm \cite{gal14}
works in $O(n^{2.37286})$ time.},
but they are mostly of theoretical interest
and such algorithms are not necessary
in competitive programming.
\subsubsection{Matrix power}
\index{matrix power}
The power $A^k$ of a matrix $A$ is defined
if $A$ is a square matrix.
The definition is based on matrix multiplication:
\[ A^k = \underbrace{A \cdot A \cdot A \cdots A}_{\textrm{$k$ times}} \]
For example,
\[
\begin{bmatrix}
2 & 5 \\
1 & 4 \\
\end{bmatrix}^3 =
\begin{bmatrix}
2 & 5 \\
1 & 4 \\
\end{bmatrix} \cdot
\begin{bmatrix}
2 & 5 \\
1 & 4 \\
\end{bmatrix} \cdot
\begin{bmatrix}
2 & 5 \\
1 & 4 \\
\end{bmatrix} =
\begin{bmatrix}
48 & 165 \\
33 & 114 \\
\end{bmatrix}.
\]
In addition, $A^0$ is an identity matrix. For example,
\[
\begin{bmatrix}
2 & 5 \\
1 & 4 \\
\end{bmatrix}^0 =
\begin{bmatrix}
1 & 0 \\
0 & 1 \\
\end{bmatrix}.
\]
The matrix $A^k$ can be efficiently calculated
in $O(n^3 \log k)$ time using the
algorithm in Chapter 21.2. For example,
\[
\begin{bmatrix}
2 & 5 \\
1 & 4 \\
\end{bmatrix}^8 =
\begin{bmatrix}
2 & 5 \\
1 & 4 \\
\end{bmatrix}^4 \cdot
\begin{bmatrix}
2 & 5 \\
1 & 4 \\
\end{bmatrix}^4.
\]
\subsubsection{Determinant}
\index{determinant}
The \key{determinant} $\det(A)$ of a matrix $A$
is defined if $A$ is a square matrix.
If $A$ is of size $1 \times 1$,
then $\det(A)=A[1,1]$.
The determinant of a larger matrix is
calculated recursively using the formula \index{cofactor}
\[\det(A)=\sum_{j=1}^n A[1,j] C[1,j],\]
where $C[i,j]$ is the \key{cofactor} of $A$
at $[i,j]$.
The cofactor is calculated using the formula
\[C[i,j] = (-1)^{i+j} \det(M[i,j]),\]
where $M[i,j]$ is obtained by removing
row $i$ and column $j$ from $A$.
Due to the coefficient $(-1)^{i+j}$ in the cofactor,
every other determinant is positive
and negative.
For example,
\[
\det(
\begin{bmatrix}
3 & 4 \\
1 & 6 \\
\end{bmatrix}
) = 3 \cdot 6 - 4 \cdot 1 = 14
\]
and
\[
\det(
\begin{bmatrix}
2 & 4 & 3 \\
5 & 1 & 6 \\
7 & 2 & 4 \\
\end{bmatrix}
) =
2 \cdot
\det(
\begin{bmatrix}
1 & 6 \\
2 & 4 \\
\end{bmatrix}
)
-4 \cdot
\det(
\begin{bmatrix}
5 & 6 \\
7 & 4 \\
\end{bmatrix}
)
+3 \cdot
\det(
\begin{bmatrix}
5 & 1 \\
7 & 2 \\
\end{bmatrix}
) = 81.
\]
\index{inverse matrix}
The determinant of $A$ tells us
whether there is an \key{inverse matrix}
$A^{-1}$ such that $A \cdot A^{-1} = I$,
where $I$ is an identity matrix.
It turns out that $A^{-1}$ exists
exactly when $\det(A) \neq 0$,
and it can be calculated using the formula
\[A^{-1}[i,j] = \frac{C[j,i]}{det(A)}.\]
For example,
\[
\underbrace{
\begin{bmatrix}
2 & 4 & 3\\
5 & 1 & 6\\
7 & 2 & 4\\
\end{bmatrix}
}_{A}
\cdot
\underbrace{
\frac{1}{81}
\begin{bmatrix}
-8 & -10 & 21 \\
22 & -13 & 3 \\
3 & 24 & -18 \\
\end{bmatrix}
}_{A^{-1}}
=
\underbrace{
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
\end{bmatrix}
}_{I}.
\]
\section{Linear recurrences}
\index{linear recurrence}
A \key{linear recurrence}
is a function $f(n)$
whose initial values are
$f(0),f(1),\ldots,f(k-1)$
and larger values
are calculated recursively using the formula
\[f(n) = c_1 f(n-1) + c_2 f(n-2) + \ldots + c_k f (n-k),\]
where $c_1,c_2,\ldots,c_k$ are constant coefficients.
Dynamic programming can be used to calculate
any value of $f(n)$ in $O(kn)$ time by calculating
all values of $f(0),f(1),\ldots,f(n)$ one after another.
However, if $k$ is small, it is possible to calculate
$f(n)$ much more efficiently in $O(k^3 \log n)$
time using matrix operations.
\subsubsection{Fibonacci numbers}
\index{Fibonacci number}
A simple example of a linear recurrence is the
following function that defines the Fibonacci numbers:
\[
\begin{array}{lcl}
f(0) & = & 0 \\
f(1) & = & 1 \\
f(n) & = & f(n-1)+f(n-2) \\
\end{array}
\]
In this case, $k=2$ and $c_1=c_2=1$.
\begin{samepage}
To efficiently calculate Fibonacci numbers,
we represent the
Fibonacci formula as a
square matrix $X$ of size $2 \times 2$,
for which the following holds:
\[ X \cdot
\begin{bmatrix}
f(i) \\
f(i+1) \\
\end{bmatrix}
=
\begin{bmatrix}
f(i+1) \\
f(i+2) \\
\end{bmatrix}
\]
Thus, values $f(i)$ and $f(i+1)$ are given as
''input'' for $X$,
and $X$ calculates values $f(i+1)$ and $f(i+2)$
from them.
It turns out that such a matrix is
\[ X =
\begin{bmatrix}
0 & 1 \\
1 & 1 \\
\end{bmatrix}.
\]
\end{samepage}
\noindent
For example,
\[
\begin{bmatrix}
0 & 1 \\
1 & 1 \\
\end{bmatrix}
\cdot
\begin{bmatrix}
f(5) \\
f(6) \\
\end{bmatrix}
=
\begin{bmatrix}
0 & 1 \\
1 & 1 \\
\end{bmatrix}
\cdot
\begin{bmatrix}
5 \\
8 \\
\end{bmatrix}
=
\begin{bmatrix}
8 \\
13 \\
\end{bmatrix}
=
\begin{bmatrix}
f(6) \\
f(7) \\
\end{bmatrix}.
\]
Thus, we can calculate $f(n)$ using the formula
\[
\begin{bmatrix}
f(n) \\
f(n+1) \\
\end{bmatrix}
=
X^n \cdot
\begin{bmatrix}
f(0) \\
f(1) \\
\end{bmatrix}
=
\begin{bmatrix}
0 & 1 \\
1 & 1 \\
\end{bmatrix}^n
\cdot
\begin{bmatrix}
0 \\
1 \\
\end{bmatrix}.
\]
The value of $X^n$ can be calculated in
$O(\log n)$ time,
so the value of $f(n)$ can also be calculated
in $O(\log n)$ time.
\subsubsection{General case}
Let us now consider the general case where
$f(n)$ is any linear recurrence.
Again, our goal is to construct a matrix $X$
for which
\[ X \cdot
\begin{bmatrix}
f(i) \\
f(i+1) \\
\vdots \\
f(i+k-1) \\
\end{bmatrix}
=
\begin{bmatrix}
f(i+1) \\
f(i+2) \\
\vdots \\
f(i+k) \\
\end{bmatrix}.
\]
Such a matrix is
\[
X =
\begin{bmatrix}
0 & 1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 \\
c_k & c_{k-1} & c_{k-2} & c_{k-3} & \cdots & c_1 \\
\end{bmatrix}.
\]
In the first $k-1$ rows, each element is 0
except that one element is 1.
These rows replace $f(i)$ with $f(i+1)$,
$f(i+1)$ with $f(i+2)$, and so on.
The last row contains the coefficients of the recurrence
to calculate the new value $f(i+k)$.
\begin{samepage}
Now, $f(n)$ can be calculated in
$O(k^3 \log n)$ time using the formula
\[
\begin{bmatrix}
f(n) \\
f(n+1) \\
\vdots \\
f(n+k-1) \\
\end{bmatrix}
=
X^n \cdot
\begin{bmatrix}
f(0) \\
f(1) \\
\vdots \\
f(k-1) \\
\end{bmatrix}.
\]
\end{samepage}
\section{Graphs and matrices}
\subsubsection{Counting paths}
The powers of an adjacency matrix of a graph
have an interesting property.
When $V$ is an adjacency matrix of an unweighted graph,
the matrix $V^n$ contains the numbers of paths of
$n$ edges between the nodes in the graph.
For example, for the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (1,1) {$4$};
\node[draw, circle] (3) at (3,3) {$2$};
\node[draw, circle] (4) at (5,3) {$3$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,->,>=latex] (1) -- (2);
\path[draw,thick,->,>=latex] (2) -- (3);
\path[draw,thick,->,>=latex] (3) -- (1);
\path[draw,thick,->,>=latex] (4) -- (3);
\path[draw,thick,->,>=latex] (3) -- (5);
\path[draw,thick,->,>=latex] (3) -- (6);
\path[draw,thick,->,>=latex] (6) -- (4);
\path[draw,thick,->,>=latex] (6) -- (5);
\end{tikzpicture}
\end{center}
the adjacency matrix is
\[
V= \begin{bmatrix}
0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 \\
\end{bmatrix}.
\]
Now, for example, the matrix
\[
V^4= \begin{bmatrix}
0 & 0 & 1 & 1 & 1 & 0 \\
2 & 0 & 0 & 0 & 2 & 2 \\
0 & 2 & 0 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 0 \\
\end{bmatrix}
\]
contains the numbers of paths of 4 edges
between the nodes.
For example, $V^4[2,5]=2$,
because there are two paths of 4 edges
from node 2 to node 5:
$2 \rightarrow 1 \rightarrow 4 \rightarrow 2 \rightarrow 5$
and
$2 \rightarrow 6 \rightarrow 3 \rightarrow 2 \rightarrow 5$.
\subsubsection{Shortest paths}
Using a similar idea in a weighted graph,
we can calculate for each pair of nodes the minimum
length of a path
between them that contains exactly $n$ edges.
To calculate this, we have to define matrix multiplication
in a new way, so that we do not calculate the numbers
of paths but minimize the lengths of paths.
\begin{samepage}
As an example, consider the following graph:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (1,1) {$4$};
\node[draw, circle] (3) at (3,3) {$2$};
\node[draw, circle] (4) at (5,3) {$3$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,->,>=latex] (1) -- node[font=\small,label=left:4] {} (2);
\path[draw,thick,->,>=latex] (2) -- node[font=\small,label=left:1] {} (3);
\path[draw,thick,->,>=latex] (3) -- node[font=\small,label=north:2] {} (1);
\path[draw,thick,->,>=latex] (4) -- node[font=\small,label=north:4] {} (3);
\path[draw,thick,->,>=latex] (3) -- node[font=\small,label=left:1] {} (5);
\path[draw,thick,->,>=latex] (3) -- node[font=\small,label=left:2] {} (6);
\path[draw,thick,->,>=latex] (6) -- node[font=\small,label=right:3] {} (4);
\path[draw,thick,->,>=latex] (6) -- node[font=\small,label=below:2] {} (5);
\end{tikzpicture}
\end{center}
\end{samepage}
Let us construct an adjacency matrix where
$\infty$ means that an edge does not exist,
and other values correspond to edge weights.
The matrix is
\[
V= \begin{bmatrix}
\infty & \infty & \infty & 4 & \infty & \infty \\
2 & \infty & \infty & \infty & 1 & 2 \\
\infty & 4 & \infty & \infty & \infty & \infty \\
\infty & 1 & \infty & \infty & \infty & \infty \\
\infty & \infty & \infty & \infty & \infty & \infty \\
\infty & \infty & 3 & \infty & 2 & \infty \\
\end{bmatrix}.
\]
Instead of the formula
\[
AB[i,j] = \sum_{k=1}^n A[i,k] \cdot B[k,j]
\]
we now use the formula
\[
AB[i,j] = \min_{k=1}^n A[i,k] + B[k,j]
\]
for matrix multiplication, so we calculate
a minimum instead of a sum,
and a sum of elements instead of a product.
After this modification,
matrix powers correspond to
shortest paths in the graph.
For example, as
\[
V^4= \begin{bmatrix}
\infty & \infty & 10 & 11 & 9 & \infty \\
9 & \infty & \infty & \infty & 8 & 9 \\
\infty & 11 & \infty & \infty & \infty & \infty \\
\infty & 8 & \infty & \infty & \infty & \infty \\
\infty & \infty & \infty & \infty & \infty & \infty \\
\infty & \infty & 12 & 13 & 11 & \infty \\
\end{bmatrix},
\]
we can conclude that the minimum length of a path
of 4 edges
from node 2 to node 5 is 8.
Such a path is
$2 \rightarrow 1 \rightarrow 4 \rightarrow 2 \rightarrow 5$.
\subsubsection{Kirchhoff's theorem}
\index{Kirchhoff's theorem}
\index{spanning tree}
\key{Kirchhoff's theorem}
%\footnote{G. R. Kirchhoff (1824--1887) was a German physicist.}
provides a way
to calculate the number of spanning trees
of a graph as a determinant of a special matrix.
For example, the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (1,1) {$3$};
\node[draw, circle] (4) at (3,1) {$4$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (1) -- (4);
\end{tikzpicture}
\end{center}
has three spanning trees:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1a) at (1,3) {$1$};
\node[draw, circle] (2a) at (3,3) {$2$};
\node[draw, circle] (3a) at (1,1) {$3$};
\node[draw, circle] (4a) at (3,1) {$4$};
\path[draw,thick,-] (1a) -- (2a);
%\path[draw,thick,-] (1a) -- (3a);
\path[draw,thick,-] (3a) -- (4a);
\path[draw,thick,-] (1a) -- (4a);
\node[draw, circle] (1b) at (1+4,3) {$1$};
\node[draw, circle] (2b) at (3+4,3) {$2$};
\node[draw, circle] (3b) at (1+4,1) {$3$};
\node[draw, circle] (4b) at (3+4,1) {$4$};
\path[draw,thick,-] (1b) -- (2b);
\path[draw,thick,-] (1b) -- (3b);
%\path[draw,thick,-] (3b) -- (4b);
\path[draw,thick,-] (1b) -- (4b);
\node[draw, circle] (1c) at (1+8,3) {$1$};
\node[draw, circle] (2c) at (3+8,3) {$2$};
\node[draw, circle] (3c) at (1+8,1) {$3$};
\node[draw, circle] (4c) at (3+8,1) {$4$};
\path[draw,thick,-] (1c) -- (2c);
\path[draw,thick,-] (1c) -- (3c);
\path[draw,thick,-] (3c) -- (4c);
%\path[draw,thick,-] (1c) -- (4c);
\end{tikzpicture}
\end{center}
\index{Laplacean matrix}
To calculate the number of spanning trees,
we construct a \key{Laplacean matrix} $L$,
where $L[i,i]$ is the degree of node $i$
and $L[i,j]=-1$ if there is an edge between
nodes $i$ and $j$, and otherwise $L[i,j]=0$.
The Laplacean matrix for the above graph is as follows:
\[
L= \begin{bmatrix}
3 & -1 & -1 & -1 \\
-1 & 1 & 0 & 0 \\
-1 & 0 & 2 & -1 \\
-1 & 0 & -1 & 2 \\
\end{bmatrix}
\]
It can be shown that
the number of spanning trees equals
the determinant of a matrix that is obtained
when we remove any row and any column from $L$.
For example, if we remove the first row
and column, the result is
\[ \det(
\begin{bmatrix}
1 & 0 & 0 \\
0 & 2 & -1 \\
0 & -1 & 2 \\
\end{bmatrix}
) =3.\]
The determinant is always the same,
regardless of which row and column we remove from $L$.
Note that Cayley's formula in Chapter 22.5 is
a special case of Kirchhoff's theorem,
because in a complete graph of $n$ nodes
\[ \det(
\begin{bmatrix}
n-1 & -1 & \cdots & -1 \\
-1 & n-1 & \cdots & -1 \\
\vdots & \vdots & \ddots & \vdots \\
-1 & -1 & \cdots & n-1 \\
\end{bmatrix}
) =n^{n-2}.\]

689
chapter24.tex Normal file
View File

@ -0,0 +1,689 @@
\chapter{Probability}
\index{probability}
A \key{probability} is a real number between $0$ and $1$
that indicates how probable an event is.
If an event is certain to happen,
its probability is 1,
and if an event is impossible,
its probability is 0.
The probability of an event is denoted $P(\cdots)$
where the three dots describe the event.
For example, when throwing a dice,
the outcome is an integer between $1$ and $6$,
and the probability of each outcome is $1/6$.
For example, we can calculate the following probabilities:
\begin{itemize}[noitemsep]
\item $P(\textrm{''the outcome is 4''})=1/6$
\item $P(\textrm{''the outcome is not 6''})=5/6$
\item $P(\textrm{''the outcome is even''})=1/2$
\end{itemize}
\section{Calculation}
To calculate the probability of an event,
we can either use combinatorics
or simulate the process that generates the event.
As an example, let us calculate the probability
of drawing three cards with the same value
from a shuffled deck of cards
(for example, $\spadesuit 8$, $\clubsuit 8$ and $\diamondsuit 8$).
\subsubsection*{Method 1}
We can calculate the probability using the formula
\[\frac{\textrm{number of desired outcomes}}{\textrm{total number of outcomes}}.\]
In this problem, the desired outcomes are those
in which the value of each card is the same.
There are $13 {4 \choose 3}$ such outcomes,
because there are $13$ possibilities for the
value of the cards and ${4 \choose 3}$ ways to
choose $3$ suits from $4$ possible suits.
There are a total of ${52 \choose 3}$ outcomes,
because we choose 3 cards from 52 cards.
Thus, the probability of the event is
\[\frac{13 {4 \choose 3}}{{52 \choose 3}} = \frac{1}{425}.\]
\subsubsection*{Method 2}
Another way to calculate the probability is
to simulate the process that generates the event.
In this example, we draw three cards, so the process
consists of three steps.
We require that each step of the process is successful.
Drawing the first card certainly succeeds,
because there are no restrictions.
The second step succeeds with probability $3/51$,
because there are 51 cards left and 3 of them
have the same value as the first card.
In a similar way, the third step succeeds with probability $2/50$.
The probability that the entire process succeeds is
\[1 \cdot \frac{3}{51} \cdot \frac{2}{50} = \frac{1}{425}.\]
\section{Events}
An event in probability theory can be represented as a set
\[A \subset X,\]
where $X$ contains all possible outcomes
and $A$ is a subset of outcomes.
For example, when drawing a dice, the outcomes are
\[X = \{1,2,3,4,5,6\}.\]
Now, for example, the event ''the outcome is even''
corresponds to the set
\[A = \{2,4,6\}.\]
Each outcome $x$ is assigned a probability $p(x)$.
Then, the probability $P(A)$ of an event
$A$ can be calculated as a sum
of probabilities of outcomes using the formula
\[P(A) = \sum_{x \in A} p(x).\]
For example, when throwing a dice,
$p(x)=1/6$ for each outcome $x$,
so the probability of the event
''the outcome is even'' is
\[p(2)+p(4)+p(6)=1/2.\]
The total probability of the outcomes in $X$ must
be 1, i.e., $P(X)=1$.
Since the events in probability theory are sets,
we can manipulate them using standard set operations:
\begin{itemize}
\item The \key{complement} $\bar A$ means
''$A$ does not happen''.
For example, when throwing a dice,
the complement of $A=\{2,4,6\}$ is
$\bar A = \{1,3,5\}$.
\item The \key{union} $A \cup B$ means
''$A$ or $B$ happen''.
For example, the union of
$A=\{2,5\}$
and $B=\{4,5,6\}$ is
$A \cup B = \{2,4,5,6\}$.
\item The \key{intersection} $A \cap B$ means
''$A$ and $B$ happen''.
For example, the intersection of
$A=\{2,5\}$ and $B=\{4,5,6\}$ is
$A \cap B = \{5\}$.
\end{itemize}
\subsubsection{Complement}
The probability of the complement
$\bar A$ is calculated using the formula
\[P(\bar A)=1-P(A).\]
Sometimes, we can solve a problem easily
using complements by solving the opposite problem.
For example, the probability of getting
at least one six when throwing a dice ten times is
\[1-(5/6)^{10}.\]
Here $5/6$ is the probability that the outcome
of a single throw is not six, and
$(5/6)^{10}$ is the probability that none of
the ten throws is a six.
The complement of this is the answer to the problem.
\subsubsection{Union}
The probability of the union $A \cup B$
is calculated using the formula
\[P(A \cup B)=P(A)+P(B)-P(A \cap B).\]
For example, when throwing a dice,
the union of the events
\[A=\textrm{''the outcome is even''}\]
and
\[B=\textrm{''the outcome is less than 4''}\]
is
\[A \cup B=\textrm{''the outcome is even or less than 4''},\]
and its probability is
\[P(A \cup B) = P(A)+P(B)-P(A \cap B)=1/2+1/2-1/6=5/6.\]
If the events $A$ and $B$ are \key{disjoint}, i.e.,
$A \cap B$ is empty,
the probability of the event $A \cup B$ is simply
\[P(A \cup B)=P(A)+P(B).\]
\subsubsection{Conditional probability}
\index{conditional probability}
The \key{conditional probability}
\[P(A | B) = \frac{P(A \cap B)}{P(B)}\]
is the probability of $A$
assuming that $B$ happens.
Hence, when calculating the
probability of $A$, we only consider the outcomes
that also belong to $B$.
Using the previous sets,
\[P(A | B)= 1/3,\]
because the outcomes of $B$ are
$\{1,2,3\}$, and one of them is even.
This is the probability of an even outcome
if we know that the outcome is between $1 \ldots 3$.
\subsubsection{Intersection}
\index{independence}
Using conditional probability,
the probability of the intersection
$A \cap B$ can be calculated using the formula
\[P(A \cap B)=P(A)P(B|A).\]
Events $A$ and $B$ are \key{independent} if
\[P(A|B)=P(A) \hspace{10px}\textrm{and}\hspace{10px} P(B|A)=P(B),\]
which means that the fact that $B$ happens does not
change the probability of $A$, and vice versa.
In this case, the probability of the intersection is
\[P(A \cap B)=P(A)P(B).\]
For example, when drawing a card from a deck, the events
\[A = \textrm{''the suit is clubs''}\]
and
\[B = \textrm{''the value is four''}\]
are independent. Hence the event
\[A \cap B = \textrm{''the card is the four of clubs''}\]
happens with probability
\[P(A \cap B)=P(A)P(B)=1/4 \cdot 1/13 = 1/52.\]
\section{Random variables}
\index{random variable}
A \key{random variable} is a value that is generated
by a random process.
For example, when throwing two dice,
a possible random variable is
\[X=\textrm{''the sum of the outcomes''}.\]
For example, if the outcomes are $[4,6]$
(meaning that we first throw a four and then a six),
then the value of $X$ is 10.
We denote $P(X=x)$ the probability that
the value of a random variable $X$ is $x$.
For example, when throwing two dice,
$P(X=10)=3/36$,
because the total number of outcomes is 36
and there are three possible ways to obtain
the sum 10: $[4,6]$, $[5,5]$ and $[6,4]$.
\subsubsection{Expected value}
\index{expected value}
The \key{expected value} $E[X]$ indicates the
average value of a random variable $X$.
The expected value can be calculated as the sum
\[\sum_x P(X=x)x,\]
where $x$ goes through all possible values of $X$.
For example, when throwing a dice,
the expected outcome is
\[1/6 \cdot 1 + 1/6 \cdot 2 + 1/6 \cdot 3 + 1/6 \cdot 4 + 1/6 \cdot 5 + 1/6 \cdot 6 = 7/2.\]
A useful property of expected values is \key{linearity}.
It means that the sum
$E[X_1+X_2+\cdots+X_n]$
always equals the sum
$E[X_1]+E[X_2]+\cdots+E[X_n]$.
This formula holds even if random variables
depend on each other.
For example, when throwing two dice,
the expected sum is
\[E[X_1+X_2]=E[X_1]+E[X_2]=7/2+7/2=7.\]
Let us now consider a problem where
$n$ balls are randomly placed in $n$ boxes,
and our task is to calculate the expected
number of empty boxes.
Each ball has an equal probability to
be placed in any of the boxes.
For example, if $n=2$, the possibilities
are as follows:
\begin{center}
\begin{tikzpicture}
\draw (0,0) rectangle (1,1);
\draw (1.2,0) rectangle (2.2,1);
\draw (3,0) rectangle (4,1);
\draw (4.2,0) rectangle (5.2,1);
\draw (6,0) rectangle (7,1);
\draw (7.2,0) rectangle (8.2,1);
\draw (9,0) rectangle (10,1);
\draw (10.2,0) rectangle (11.2,1);
\draw[fill=blue] (0.5,0.2) circle (0.1);
\draw[fill=red] (1.7,0.2) circle (0.1);
\draw[fill=red] (3.5,0.2) circle (0.1);
\draw[fill=blue] (4.7,0.2) circle (0.1);
\draw[fill=blue] (6.25,0.2) circle (0.1);
\draw[fill=red] (6.75,0.2) circle (0.1);
\draw[fill=blue] (10.45,0.2) circle (0.1);
\draw[fill=red] (10.95,0.2) circle (0.1);
\end{tikzpicture}
\end{center}
In this case, the expected number of
empty boxes is
\[\frac{0+0+1+1}{4} = \frac{1}{2}.\]
In the general case, the probability that a
single box is empty is
\[\Big(\frac{n-1}{n}\Big)^n,\]
because no ball should be placed in it.
Hence, using linearity, the expected number of
empty boxes is
\[n \cdot \Big(\frac{n-1}{n}\Big)^n.\]
\subsubsection{Distributions}
\index{distribution}
The \key{distribution} of a random variable $X$
shows the probability of each value that
$X$ may have.
The distribution consists of values $P(X=x)$.
For example, when throwing two dice,
the distribution for their sum is:
\begin{center}
\small {
\begin{tabular}{r|rrrrrrrrrrrrr}
$x$ & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\
$P(X=x)$ & $1/36$ & $2/36$ & $3/36$ & $4/36$ & $5/36$ & $6/36$ & $5/36$ & $4/36$ & $3/36$ & $2/36$ & $1/36$ \\
\end{tabular}
}
\end{center}
\index{uniform distribution}
In a \key{uniform distribution},
the random variable $X$ has $n$ possible
values $a,a+1,\ldots,b$ and the probability of each value is $1/n$.
For example, when throwing a dice,
$a=1$, $b=6$ and $P(X=x)=1/6$ for each value $x$.
The expected value of $X$ in a uniform distribution is
\[E[X] = \frac{a+b}{2}.\]
\index{binomial distribution}
In a \key{binomial distribution}, $n$ attempts
are made
and the probability that a single attempt succeeds
is $p$.
The random variable $X$ counts the number of
successful attempts,
and the probability of a value $x$ is
\[P(X=x)=p^x (1-p)^{n-x} {n \choose x},\]
where $p^x$ and $(1-p)^{n-x}$ correspond to
successful and unsuccessful attemps,
and ${n \choose x}$ is the number of ways
we can choose the order of the attempts.
For example, when throwing a dice ten times,
the probability of throwing a six exactly
three times is $(1/6)^3 (5/6)^7 {10 \choose 3}$.
The expected value of $X$ in a binomial distribution is
\[E[X] = pn.\]
\index{geometric distribution}
In a \key{geometric distribution},
the probability that an attempt succeeds is $p$,
and we continue until the first success happens.
The random variable $X$ counts the number
of attempts needed, and the probability of
a value $x$ is
\[P(X=x)=(1-p)^{x-1} p,\]
where $(1-p)^{x-1}$ corresponds to the unsuccessful attemps
and $p$ corresponds to the first successful attempt.
For example, if we throw a dice until we throw a six,
the probability that the number of throws
is exactly 4 is $(5/6)^3 1/6$.
The expected value of $X$ in a geometric distribution is
\[E[X]=\frac{1}{p}.\]
\section{Markov chains}
\index{Markov chain}
A \key{Markov chain}
% \footnote{A. A. Markov (1856--1922)
% was a Russian mathematician.}
is a random process
that consists of states and transitions between them.
For each state, we know the probabilities
for moving to other states.
A Markov chain can be represented as a graph
whose nodes are states and edges are transitions.
As an example, consider a problem
where we are in floor 1 in an $n$ floor building.
At each step, we randomly walk either one floor
up or one floor down, except that we always
walk one floor up from floor 1 and one floor down
from floor $n$.
What is the probability of being in floor $m$
after $k$ steps?
In this problem, each floor of the building
corresponds to a state in a Markov chain.
For example, if $n=5$, the graph is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,0) {$1$};
\node[draw, circle] (2) at (2,0) {$2$};
\node[draw, circle] (3) at (4,0) {$3$};
\node[draw, circle] (4) at (6,0) {$4$};
\node[draw, circle] (5) at (8,0) {$5$};
\path[draw,thick,->] (1) edge [bend left=40] node[font=\small,label=$1$] {} (2);
\path[draw,thick,->] (2) edge [bend left=40] node[font=\small,label=$1/2$] {} (3);
\path[draw,thick,->] (3) edge [bend left=40] node[font=\small,label=$1/2$] {} (4);
\path[draw,thick,->] (4) edge [bend left=40] node[font=\small,label=$1/2$] {} (5);
\path[draw,thick,->] (5) edge [bend left=40] node[font=\small,label=below:$1$] {} (4);
\path[draw,thick,->] (4) edge [bend left=40] node[font=\small,label=below:$1/2$] {} (3);
\path[draw,thick,->] (3) edge [bend left=40] node[font=\small,label=below:$1/2$] {} (2);
\path[draw,thick,->] (2) edge [bend left=40] node[font=\small,label=below:$1/2$] {} (1);
%\path[draw,thick,->] (1) edge [bend left=40] node[font=\small,label=below:$1$] {} (2);
\end{tikzpicture}
\end{center}
The probability distribution
of a Markov chain is a vector
$[p_1,p_2,\ldots,p_n]$, where $p_k$ is the
probability that the current state is $k$.
The formula $p_1+p_2+\cdots+p_n=1$ always holds.
In the above scenario, the initial distribution is
$[1,0,0,0,0]$, because we always begin in floor 1.
The next distribution is $[0,1,0,0,0]$,
because we can only move from floor 1 to floor 2.
After this, we can either move one floor up
or one floor down, so the next distribution is
$[1/2,0,1/2,0,0]$, and so on.
An efficient way to simulate the walk in
a Markov chain is to use dynamic programming.
The idea is to maintain the probability distribution,
and at each step go through all possibilities
how we can move.
Using this method, we can simulate
a walk of $m$ steps in $O(n^2 m)$ time.
The transitions of a Markov chain can also be
represented as a matrix that updates the
probability distribution.
In the above scenario, the matrix is
\[
\begin{bmatrix}
0 & 1/2 & 0 & 0 & 0 \\
1 & 0 & 1/2 & 0 & 0 \\
0 & 1/2 & 0 & 1/2 & 0 \\
0 & 0 & 1/2 & 0 & 1 \\
0 & 0 & 0 & 1/2 & 0 \\
\end{bmatrix}.
\]
When we multiply a probability distribution by this matrix,
we get the new distribution after moving one step.
For example, we can move from the distribution
$[1,0,0,0,0]$ to the distribution
$[0,1,0,0,0]$ as follows:
\[
\begin{bmatrix}
0 & 1/2 & 0 & 0 & 0 \\
1 & 0 & 1/2 & 0 & 0 \\
0 & 1/2 & 0 & 1/2 & 0 \\
0 & 0 & 1/2 & 0 & 1 \\
0 & 0 & 0 & 1/2 & 0 \\
\end{bmatrix}
\begin{bmatrix}
1 \\
0 \\
0 \\
0 \\
0 \\
\end{bmatrix}
=
\begin{bmatrix}
0 \\
1 \\
0 \\
0 \\
0 \\
\end{bmatrix}.
\]
By calculating matrix powers efficiently,
we can calculate the distribution after $m$ steps
in $O(n^3 \log m)$ time.
\section{Randomized algorithms}
\index{randomized algorithm}
Sometimes we can use randomness for solving a problem,
even if the problem is not related to probabilities.
A \key{randomized algorithm} is an algorithm that
is based on randomness.
\index{Monte Carlo algorithm}
A \key{Monte Carlo algorithm} is a randomized algorithm
that may sometimes give a wrong answer.
For such an algorithm to be useful,
the probability of a wrong answer should be small.
\index{Las Vegas algorithm}
A \key{Las Vegas algorithm} is a randomized algorithm
that always gives the correct answer,
but its running time varies randomly.
The goal is to design an algorithm that is
efficient with high probability.
Next we will go through three example problems that
can be solved using randomness.
\subsubsection{Order statistics}
\index{order statistic}
The $kth$ \key{order statistic} of an array
is the element at position $k$ after sorting
the array in increasing order.
It is easy to calculate any order statistic
in $O(n \log n)$ time by first sorting the array,
but is it really needed to sort the entire array
just to find one element?
It turns out that we can find order statistics
using a randomized algorithm without sorting the array.
The algorithm, called \key{quickselect}\footnote{In 1961,
C. A. R. Hoare published two algorithms that
are efficient on average: \index{quicksort} \index{quickselect}
\key{quicksort} \cite{hoa61a} for sorting arrays and
\key{quickselect} \cite{hoa61b} for finding order statistics.}, is a Las Vegas algorithm:
its running time is usually $O(n)$
but $O(n^2)$ in the worst case.
The algorithm chooses a random element $x$
of the array, and moves elements smaller than $x$
to the left part of the array,
and all other elements to the right part of the array.
This takes $O(n)$ time when there are $n$ elements.
Assume that the left part contains $a$ elements
and the right part contains $b$ elements.
If $a=k$, element $x$ is the $k$th order statistic.
Otherwise, if $a>k$, we recursively find the $k$th order
statistic for the left part,
and if $a<k$, we recursively find the $r$th order
statistic for the right part where $r=k-a$.
The search continues in a similar way, until the element
has been found.
When each element $x$ is randomly chosen,
the size of the array about halves at each step,
so the time complexity for
finding the $k$th order statistic is about
\[n+n/2+n/4+n/8+\cdots < 2n = O(n).\]
The worst case of the algorithm requires still $O(n^2)$ time,
because it is possible that $x$ is always chosen
in such a way that it is one of the smallest or largest
elements in the array and $O(n)$ steps are needed.
However, the probability for this is so small
that this never happens in practice.
\subsubsection{Verifying matrix multiplication}
\index{matrix multiplication}
Our next problem is to \emph{verify}
if $AB=C$ holds when $A$, $B$ and $C$
are matrices of size $n \times n$.
Of course, we can solve the problem
by calculating the product $AB$ again
(in $O(n^3)$ time using the basic algorithm),
but one could hope that verifying the
answer would by easier than to calculate it from scratch.
It turns out that we can solve the problem
using a Monte Carlo algorithm\footnote{R. M. Freivalds published
this algorithm in 1977 \cite{fre77}, and it is sometimes
called \index{Freivalds' algoritm} \key{Freivalds' algorithm}.} whose
time complexity is only $O(n^2)$.
The idea is simple: we choose a random vector
$X$ of $n$ elements, and calculate the matrices
$ABX$ and $CX$. If $ABX=CX$, we report that $AB=C$,
and otherwise we report that $AB \neq C$.
The time complexity of the algorithm is
$O(n^2)$, because we can calculate the matrices
$ABX$ and $CX$ in $O(n^2)$ time.
We can calculate the matrix $ABX$ efficiently
by using the representation $A(BX)$, so only two
multiplications of $n \times n$ and $n \times 1$
size matrices are needed.
The drawback of the algorithm is
that there is a small chance that the algorithm
makes a mistake when it reports that $AB=C$.
For example,
\[
\begin{bmatrix}
6 & 8 \\
1 & 3 \\
\end{bmatrix}
\neq
\begin{bmatrix}
8 & 7 \\
3 & 2 \\
\end{bmatrix},
\]
but
\[
\begin{bmatrix}
6 & 8 \\
1 & 3 \\
\end{bmatrix}
\begin{bmatrix}
3 \\
6 \\
\end{bmatrix}
=
\begin{bmatrix}
8 & 7 \\
3 & 2 \\
\end{bmatrix}
\begin{bmatrix}
3 \\
6 \\
\end{bmatrix}.
\]
However, in practice, the probability that the
algorithm makes a mistake is small,
and we can decrease the probability by
verifying the result using multiple random vectors $X$
before reporting that $AB=C$.
\subsubsection{Graph coloring}
\index{coloring}
Given a graph that contains $n$ nodes and $m$ edges,
our task is to find a way to color the nodes
of the graph using two colors so that
for at least $m/2$ edges, the endpoints
have different colors.
For example, in the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
\node[draw, circle] (3) at (1,1) {$3$};
\node[draw, circle] (4) at (4,1) {$4$};
\node[draw, circle] (5) at (6,2) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (4) -- (5);
\end{tikzpicture}
\end{center}
a valid coloring is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle, fill=blue!40] (1) at (1,3) {$1$};
\node[draw, circle, fill=red!40] (2) at (4,3) {$2$};
\node[draw, circle, fill=red!40] (3) at (1,1) {$3$};
\node[draw, circle, fill=blue!40] (4) at (4,1) {$4$};
\node[draw, circle, fill=blue!40] (5) at (6,2) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (4) -- (5);
\end{tikzpicture}
\end{center}
The above graph contains 7 edges, and for 5 of them,
the endpoints have different colors,
so the coloring is valid.
The problem can be solved using a Las Vegas algorithm
that generates random colorings until a valid coloring
has been found.
In a random coloring, the color of each node is
independently chosen so that the probability of
both colors is $1/2$.
In a random coloring, the probability that the endpoints
of a single edge have different colors is $1/2$.
Hence, the expected number of edges whose endpoints
have different colors is $m/2$.
Since it is expected that a random coloring is valid,
we will quickly find a valid coloring in practice.

801
chapter25.tex Normal file
View File

@ -0,0 +1,801 @@
\chapter{Game theory}
In this chapter, we will focus on two-player
games that do not contain random elements.
Our goal is to find a strategy that we can
follow to win the game
no matter what the opponent does,
if such a strategy exists.
It turns out that there is a general strategy
for such games,
and we can analyze the games using the \key{nim theory}.
First, we will analyze simple games where
players remove sticks from heaps,
and after this, we will generalize the strategy
used in those games to other games.
\section{Game states}
Let us consider a game where there is initially
a heap of $n$ sticks.
Players $A$ and $B$ move alternately,
and player $A$ begins.
On each move, the player has to remove
1, 2 or 3 sticks from the heap,
and the player who removes the last stick wins the game.
For example, if $n=10$, the game may proceed as follows:
\begin{itemize}[noitemsep]
\item Player $A$ removes 2 sticks (8 sticks left).
\item Player $B$ removes 3 sticks (5 sticks left).
\item Player $A$ removes 1 stick (4 sticks left).
\item Player $B$ removes 2 sticks (2 sticks left).
\item Player $A$ removes 2 sticks and wins.
\end{itemize}
This game consists of states $0,1,2,\ldots,n$,
where the number of the state corresponds to
the number of sticks left.
\subsubsection{Winning and losing states}
\index{winning state}
\index{losing state}
A \key{winning state} is a state where
the player will win the game if they
play optimally,
and a \key{losing state} is a state
where the player will lose the game if the
opponent plays optimally.
It turns out that we can classify all states
of a game so that each state is either
a winning state or a losing state.
In the above game, state 0 is clearly a
losing state, because the player cannot make
any moves.
States 1, 2 and 3 are winning states,
because we can remove 1, 2 or 3 sticks
and win the game.
State 4, in turn, is a losing state,
because any move leads to a state that
is a winning state for the opponent.
More generally, if there is a move that leads
from the current state to a losing state,
the current state is a winning state,
and otherwise the current state is a losing state.
Using this observation, we can classify all states
of a game starting with losing states where
there are no possible moves.
The states $0 \ldots 15$ of the above game
can be classified as follows
($W$ denotes a winning state and $L$ denotes a losing state):
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (16,1);
\node at (0.5,0.5) {$L$};
\node at (1.5,0.5) {$W$};
\node at (2.5,0.5) {$W$};
\node at (3.5,0.5) {$W$};
\node at (4.5,0.5) {$L$};
\node at (5.5,0.5) {$W$};
\node at (6.5,0.5) {$W$};
\node at (7.5,0.5) {$W$};
\node at (8.5,0.5) {$L$};
\node at (9.5,0.5) {$W$};
\node at (10.5,0.5) {$W$};
\node at (11.5,0.5) {$W$};
\node at (12.5,0.5) {$L$};
\node at (13.5,0.5) {$W$};
\node at (14.5,0.5) {$W$};
\node at (15.5,0.5) {$W$};
\footnotesize
\node at (0.5,1.4) {$0$};
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\node at (8.5,1.4) {$8$};
\node at (9.5,1.4) {$9$};
\node at (10.5,1.4) {$10$};
\node at (11.5,1.4) {$11$};
\node at (12.5,1.4) {$12$};
\node at (13.5,1.4) {$13$};
\node at (14.5,1.4) {$14$};
\node at (15.5,1.4) {$15$};
\end{tikzpicture}
\end{center}
It is easy to analyze this game:
a state $k$ is a losing state if $k$ is
divisible by 4, and otherwise it
is a winning state.
An optimal way to play the game is
to always choose a move after which
the number of sticks in the heap
is divisible by 4.
Finally, there are no sticks left and
the opponent has lost.
Of course, this strategy requires that
the number of sticks is \emph{not} divisible by 4
when it is our move.
If it is, there is nothing we can do,
and the opponent will win the game if
they play optimally.
\subsubsection{State graph}
Let us now consider another stick game,
where in each state $k$, it is allowed to remove
any number $x$ of sticks such that $x$
is smaller than $k$ and divides $k$.
For example, in state 8 we may remove
1, 2 or 4 sticks, but in state 7 the only
allowed move is to remove 1 stick.
The following picture shows the states
$1 \ldots 9$ of the game as a \key{state graph},
whose nodes are the states and edges are the moves between them:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,0) {$1$};
\node[draw, circle] (2) at (2,0) {$2$};
\node[draw, circle] (3) at (3.5,-1) {$3$};
\node[draw, circle] (4) at (1.5,-2) {$4$};
\node[draw, circle] (5) at (3,-2.75) {$5$};
\node[draw, circle] (6) at (2.5,-4.5) {$6$};
\node[draw, circle] (7) at (0.5,-3.25) {$7$};
\node[draw, circle] (8) at (-1,-4) {$8$};
\node[draw, circle] (9) at (1,-5.5) {$9$};
\path[draw,thick,->,>=latex] (2) -- (1);
\path[draw,thick,->,>=latex] (3) edge [bend right=20] (2);
\path[draw,thick,->,>=latex] (4) edge [bend left=20] (2);
\path[draw,thick,->,>=latex] (4) edge [bend left=20] (3);
\path[draw,thick,->,>=latex] (5) edge [bend right=20] (4);
\path[draw,thick,->,>=latex] (6) edge [bend left=20] (5);
\path[draw,thick,->,>=latex] (6) edge [bend left=20] (4);
\path[draw,thick,->,>=latex] (6) edge [bend right=40] (3);
\path[draw,thick,->,>=latex] (7) edge [bend right=20] (6);
\path[draw,thick,->,>=latex] (8) edge [bend right=20] (7);
\path[draw,thick,->,>=latex] (8) edge [bend right=20] (6);
\path[draw,thick,->,>=latex] (8) edge [bend left=20] (4);
\path[draw,thick,->,>=latex] (9) edge [bend left=20] (8);
\path[draw,thick,->,>=latex] (9) edge [bend right=20] (6);
\end{tikzpicture}
\end{center}
The final state in this game is always state 1,
which is a losing state, because there are no
valid moves.
The classification of states $1 \ldots 9$
is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (1,0) grid (10,1);
\node at (1.5,0.5) {$L$};
\node at (2.5,0.5) {$W$};
\node at (3.5,0.5) {$L$};
\node at (4.5,0.5) {$W$};
\node at (5.5,0.5) {$L$};
\node at (6.5,0.5) {$W$};
\node at (7.5,0.5) {$L$};
\node at (8.5,0.5) {$W$};
\node at (9.5,0.5) {$L$};
\footnotesize
\node at (1.5,1.4) {$1$};
\node at (2.5,1.4) {$2$};
\node at (3.5,1.4) {$3$};
\node at (4.5,1.4) {$4$};
\node at (5.5,1.4) {$5$};
\node at (6.5,1.4) {$6$};
\node at (7.5,1.4) {$7$};
\node at (8.5,1.4) {$8$};
\node at (9.5,1.4) {$9$};
\end{tikzpicture}
\end{center}
Surprisingly, in this game,
all even-numbered states are winning states,
and all odd-numbered states are losing states.
\section{Nim game}
\index{nim game}
The \key{nim game} is a simple game that
has an important role in game theory,
because many other games can be played using
the same strategy.
First, we focus on nim,
and then we generalize the strategy
to other games.
There are $n$ heaps in nim,
and each heap contains some number of sticks.
The players move alternately,
and on each turn, the player chooses
a heap that still contains sticks
and removes any number of sticks from it.
The winner is the player who removes the last stick.
The states in nim are of the form
$[x_1,x_2,\ldots,x_n]$,
where $x_k$ denotes the number of sticks in heap $k$.
For example, $[10,12,5]$ is a game where
there are three heaps with 10, 12 and 5 sticks.
The state $[0,0,\ldots,0]$ is a losing state,
because it is not possible to remove any sticks,
and this is always the final state.
\subsubsection{Analysis}
\index{nim sum}
It turns out that we can easily classify
any nim state by calculating
the \key{nim sum} $s = x_1 \oplus x_2 \oplus \cdots \oplus x_n$,
where $\oplus$ is the xor operation\footnote{The optimal strategy
for nim was published in 1901 by C. L. Bouton \cite{bou01}.}.
The states whose nim sum is 0 are losing states,
and all other states are winning states.
For example, the nim sum of
$[10,12,5]$ is $10 \oplus 12 \oplus 5 = 3$,
so the state is a winning state.
But how is the nim sum related to the nim game?
We can explain this by looking at how the nim
sum changes when the nim state changes.
\textit{Losing states:}
The final state $[0,0,\ldots,0]$ is a losing state,
and its nim sum is 0, as expected.
In other losing states, any move leads to
a winning state, because when a single value $x_k$ changes,
the nim sum also changes, so the nim sum
is different from 0 after the move.
\textit{Winning states:}
We can move to a losing state if
there is any heap $k$ for which $x_k \oplus s < x_k$.
In this case, we can remove sticks from
heap $k$ so that it will contain $x_k \oplus s$ sticks,
which will lead to a losing state.
There is always such a heap, where $x_k$
has a one bit at the position of the leftmost
one bit of $s$.
As an example, consider the state $[10,12,5]$.
This state is a winning state,
because its nim sum is 3.
Thus, there has to be a move which
leads to a losing state.
Next we will find out such a move.
The nim sum of the state is as follows:
\begin{center}
\begin{tabular}{r|r}
10 & \texttt{1010} \\
12 & \texttt{1100} \\
5 & \texttt{0101} \\
\hline
3 & \texttt{0011} \\
\end{tabular}
\end{center}
In this case, the heap with 10 sticks
is the only heap that has a one bit
at the position of the leftmost
one bit of the nim sum:
\begin{center}
\begin{tabular}{r|r}
10 & \texttt{10\underline{1}0} \\
12 & \texttt{1100} \\
5 & \texttt{0101} \\
\hline
3 & \texttt{00\underline{1}1} \\
\end{tabular}
\end{center}
The new size of the heap has to be
$10 \oplus 3 = 9$,
so we will remove just one stick.
After this, the state will be $[9,12,5]$,
which is a losing state:
\begin{center}
\begin{tabular}{r|r}
9 & \texttt{1001} \\
12 & \texttt{1100} \\
5 & \texttt{0101} \\
\hline
0 & \texttt{0000} \\
\end{tabular}
\end{center}
\subsubsection{Misère game}
\index{misère game}
In a \key{misère game}, the goal of the game
is opposite,
so the player who removes the last stick
loses the game.
It turns out that the misère nim game can be
optimally played almost like the standard nim game.
The idea is to first play the misère game
like the standard game, but change the strategy
at the end of the game.
The new strategy will be introduced in a situation
where each heap would contain at most one stick
after the next move.
In the standard game, we should choose a move
after which there is an even number of heaps with one stick.
However, in the misère game, we choose a move so that
there is an odd number of heaps with one stick.
This strategy works because a state where the
strategy changes always appears in the game,
and this state is a winning state, because
it contains exactly one heap that has more than one stick
so the nim sum is not 0.
\section{SpragueGrundy theorem}
\index{SpragueGrundy theorem}
The \key{SpragueGrundy theorem}\footnote{The theorem was
independently discovered by R. Sprague \cite{spr35} and P. M. Grundy \cite{gru39}.} generalizes the
strategy used in nim to all games that fulfil
the following requirements:
\begin{itemize}[noitemsep]
\item There are two players who move alternately.
\item The game consists of states, and the possible moves
in a state do not depend on whose turn it is.
\item The game ends when a player cannot make a move.
\item The game surely ends sooner or later.
\item The players have complete information about
the states and allowed moves, and there is no randomness in the game.
\end{itemize}
The idea is to calculate for each game state
a Grundy number that corresponds to the number of
sticks in a nim heap.
When we know the Grundy numbers of all states,
we can play the game like the nim game.
\subsubsection{Grundy numbers}
\index{Grundy number}
\index{mex function}
The \key{Grundy number} of a game state is
\[\textrm{mex}(\{g_1,g_2,\ldots,g_n\}),\]
where $g_1,g_2,\ldots,g_n$ are the Grundy numbers of the
states to which we can move,
and the mex function gives the smallest
nonnegative number that is not in the set.
For example, $\textrm{mex}(\{0,1,3\})=2$.
If there are no possible moves in a state,
its Grundy number is 0, because
$\textrm{mex}(\emptyset)=0$.
For example, in the state graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,0) {\phantom{0}};
\node[draw, circle] (2) at (2,0) {\phantom{0}};
\node[draw, circle] (3) at (4,0) {\phantom{0}};
\node[draw, circle] (4) at (1,-2) {\phantom{0}};
\node[draw, circle] (5) at (3,-2) {\phantom{0}};
\node[draw, circle] (6) at (5,-2) {\phantom{0}};
\path[draw,thick,->,>=latex] (2) -- (1);
\path[draw,thick,->,>=latex] (3) -- (2);
\path[draw,thick,->,>=latex] (5) -- (4);
\path[draw,thick,->,>=latex] (6) -- (5);
\path[draw,thick,->,>=latex] (4) -- (1);
\path[draw,thick,->,>=latex] (4) -- (2);
\path[draw,thick,->,>=latex] (5) -- (2);
\path[draw,thick,->,>=latex] (6) -- (2);
\end{tikzpicture}
\end{center}
the Grundy numbers are as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,0) {0};
\node[draw, circle] (2) at (2,0) {1};
\node[draw, circle] (3) at (4,0) {0};
\node[draw, circle] (4) at (1,-2) {2};
\node[draw, circle] (5) at (3,-2) {0};
\node[draw, circle] (6) at (5,-2) {2};
\path[draw,thick,->,>=latex] (2) -- (1);
\path[draw,thick,->,>=latex] (3) -- (2);
\path[draw,thick,->,>=latex] (5) -- (4);
\path[draw,thick,->,>=latex] (6) -- (5);
\path[draw,thick,->,>=latex] (4) -- (1);
\path[draw,thick,->,>=latex] (4) -- (2);
\path[draw,thick,->,>=latex] (5) -- (2);
\path[draw,thick,->,>=latex] (6) -- (2);
\end{tikzpicture}
\end{center}
The Grundy number of a losing state is 0,
and the Grundy number of a winning state is
a positive number.
The Grundy number of a state corresponds to
the number of sticks in a nim heap.
If the Grundy number is 0, we can only move to
states whose Grundy numbers are positive,
and if the Grundy number is $x>0$, we can move
to states whose Grundy numbers include all numbers
$0,1,\ldots,x-1$.
As an example, consider a game where
the players move a figure in a maze.
Each square in the maze is either floor or wall.
On each turn, the player has to move
the figure some number
of steps left or up.
The winner of the game is the player who
makes the last move.
The following picture shows a possible initial state
of the game, where @ denotes the figure and *
denotes a square where it can move.
\begin{center}
\begin{tikzpicture}[scale=.65]
\begin{scope}
\fill [color=black] (0, 1) rectangle (1, 2);
\fill [color=black] (0, 3) rectangle (1, 4);
\fill [color=black] (2, 2) rectangle (3, 3);
\fill [color=black] (2, 4) rectangle (3, 5);
\fill [color=black] (4, 3) rectangle (5, 4);
\draw (0, 0) grid (5, 5);
\node at (4.5,0.5) {@};
\node at (3.5,0.5) {*};
\node at (2.5,0.5) {*};
\node at (1.5,0.5) {*};
\node at (0.5,0.5) {*};
\node at (4.5,1.5) {*};
\node at (4.5,2.5) {*};
\end{scope}
\end{tikzpicture}
\end{center}
The states of the game are all floor squares
of the maze.
In the above maze, the Grundy numbers
are as follows:
\begin{center}
\begin{tikzpicture}[scale=.65]
\begin{scope}
\fill [color=black] (0, 1) rectangle (1, 2);
\fill [color=black] (0, 3) rectangle (1, 4);
\fill [color=black] (2, 2) rectangle (3, 3);
\fill [color=black] (2, 4) rectangle (3, 5);
\fill [color=black] (4, 3) rectangle (5, 4);
\draw (0, 0) grid (5, 5);
\node at (0.5,4.5) {0};
\node at (1.5,4.5) {1};
\node at (2.5,4.5) {};
\node at (3.5,4.5) {0};
\node at (4.5,4.5) {1};
\node at (0.5,3.5) {};
\node at (1.5,3.5) {0};
\node at (2.5,3.5) {1};
\node at (3.5,3.5) {2};
\node at (4.5,3.5) {};
\node at (0.5,2.5) {0};
\node at (1.5,2.5) {2};
\node at (2.5,2.5) {};
\node at (3.5,2.5) {1};
\node at (4.5,2.5) {0};
\node at (0.5,1.5) {};
\node at (1.5,1.5) {3};
\node at (2.5,1.5) {0};
\node at (3.5,1.5) {4};
\node at (4.5,1.5) {1};
\node at (0.5,0.5) {0};
\node at (1.5,0.5) {4};
\node at (2.5,0.5) {1};
\node at (3.5,0.5) {3};
\node at (4.5,0.5) {2};
\end{scope}
\end{tikzpicture}
\end{center}
Thus, each state of the maze game
corresponds to a heap in the nim game.
For example, the Grundy number for
the lower-right square is 2,
so it is a winning state.
We can reach a losing state and
win the game by moving
either four steps left or
two steps up.
Note that unlike in the original nim game,
it may be possible to move to a state whose
Grundy number is larger than the Grundy number
of the current state.
However, the opponent can always choose a move
that cancels such a move, so it is not possible
to escape from a losing state.
\subsubsection{Subgames}
Next we will assume that our game consists
of subgames, and on each turn, the player
first chooses a subgame and then a move in the subgame.
The game ends when it is not possible to make any move
in any subgame.
In this case, the Grundy number of a game
is the nim sum of the Grundy numbers of the subgames.
The game can be played like a nim game by calculating
all Grundy numbers for subgames and then their nim sum.
As an example, consider a game that consists
of three mazes.
In this game, on each turn, the player chooses one
of the mazes and then moves the figure in the maze.
Assume that the initial state of the game is as follows:
\begin{center}
\begin{tabular}{ccc}
\begin{tikzpicture}[scale=.55]
\begin{scope}
\fill [color=black] (0, 1) rectangle (1, 2);
\fill [color=black] (0, 3) rectangle (1, 4);
\fill [color=black] (2, 2) rectangle (3, 3);
\fill [color=black] (2, 4) rectangle (3, 5);
\fill [color=black] (4, 3) rectangle (5, 4);
\draw (0, 0) grid (5, 5);
\node at (4.5,0.5) {@};
\end{scope}
\end{tikzpicture}
&
\begin{tikzpicture}[scale=.55]
\begin{scope}
\fill [color=black] (1, 1) rectangle (2, 3);
\fill [color=black] (2, 3) rectangle (3, 4);
\fill [color=black] (4, 4) rectangle (5, 5);
\draw (0, 0) grid (5, 5);
\node at (4.5,0.5) {@};
\end{scope}
\end{tikzpicture}
&
\begin{tikzpicture}[scale=.55]
\begin{scope}
\fill [color=black] (1, 1) rectangle (4, 4);
\draw (0, 0) grid (5, 5);
\node at (4.5,0.5) {@};
\end{scope}
\end{tikzpicture}
\end{tabular}
\end{center}
The Grundy numbers for the mazes are as follows:
\begin{center}
\begin{tabular}{ccc}
\begin{tikzpicture}[scale=.55]
\begin{scope}
\fill [color=black] (0, 1) rectangle (1, 2);
\fill [color=black] (0, 3) rectangle (1, 4);
\fill [color=black] (2, 2) rectangle (3, 3);
\fill [color=black] (2, 4) rectangle (3, 5);
\fill [color=black] (4, 3) rectangle (5, 4);
\draw (0, 0) grid (5, 5);
\node at (0.5,4.5) {0};
\node at (1.5,4.5) {1};
\node at (2.5,4.5) {};
\node at (3.5,4.5) {0};
\node at (4.5,4.5) {1};
\node at (0.5,3.5) {};
\node at (1.5,3.5) {0};
\node at (2.5,3.5) {1};
\node at (3.5,3.5) {2};
\node at (4.5,3.5) {};
\node at (0.5,2.5) {0};
\node at (1.5,2.5) {2};
\node at (2.5,2.5) {};
\node at (3.5,2.5) {1};
\node at (4.5,2.5) {0};
\node at (0.5,1.5) {};
\node at (1.5,1.5) {3};
\node at (2.5,1.5) {0};
\node at (3.5,1.5) {4};
\node at (4.5,1.5) {1};
\node at (0.5,0.5) {0};
\node at (1.5,0.5) {4};
\node at (2.5,0.5) {1};
\node at (3.5,0.5) {3};
\node at (4.5,0.5) {2};
\end{scope}
\end{tikzpicture}
&
\begin{tikzpicture}[scale=.55]
\begin{scope}
\fill [color=black] (1, 1) rectangle (2, 3);
\fill [color=black] (2, 3) rectangle (3, 4);
\fill [color=black] (4, 4) rectangle (5, 5);
\draw (0, 0) grid (5, 5);
\node at (0.5,4.5) {0};
\node at (1.5,4.5) {1};
\node at (2.5,4.5) {2};
\node at (3.5,4.5) {3};
\node at (4.5,4.5) {};
\node at (0.5,3.5) {1};
\node at (1.5,3.5) {0};
\node at (2.5,3.5) {};
\node at (3.5,3.5) {0};
\node at (4.5,3.5) {1};
\node at (0.5,2.5) {2};
\node at (1.5,2.5) {};
\node at (2.5,2.5) {0};
\node at (3.5,2.5) {1};
\node at (4.5,2.5) {2};
\node at (0.5,1.5) {3};
\node at (1.5,1.5) {};
\node at (2.5,1.5) {1};
\node at (3.5,1.5) {2};
\node at (4.5,1.5) {0};
\node at (0.5,0.5) {4};
\node at (1.5,0.5) {0};
\node at (2.5,0.5) {2};
\node at (3.5,0.5) {5};
\node at (4.5,0.5) {3};
\end{scope}
\end{tikzpicture}
&
\begin{tikzpicture}[scale=.55]
\begin{scope}
\fill [color=black] (1, 1) rectangle (4, 4);
\draw (0, 0) grid (5, 5);
\node at (0.5,4.5) {0};
\node at (1.5,4.5) {1};
\node at (2.5,4.5) {2};
\node at (3.5,4.5) {3};
\node at (4.5,4.5) {4};
\node at (0.5,3.5) {1};
\node at (1.5,3.5) {};
\node at (2.5,3.5) {};
\node at (3.5,3.5) {};
\node at (4.5,3.5) {0};
\node at (0.5,2.5) {2};
\node at (1.5,2.5) {};
\node at (2.5,2.5) {};
\node at (3.5,2.5) {};
\node at (4.5,2.5) {1};
\node at (0.5,1.5) {3};
\node at (1.5,1.5) {};
\node at (2.5,1.5) {};
\node at (3.5,1.5) {};
\node at (4.5,1.5) {2};
\node at (0.5,0.5) {4};
\node at (1.5,0.5) {0};
\node at (2.5,0.5) {1};
\node at (3.5,0.5) {2};
\node at (4.5,0.5) {3};
\end{scope}
\end{tikzpicture}
\end{tabular}
\end{center}
In the initial state, the nim sum of the Grundy numbers
is $2 \oplus 3 \oplus 3 = 2$, so
the first player can win the game.
One optimal move is to move two steps up
in the first maze, which produces the nim sum
$0 \oplus 3 \oplus 3 = 0$.
\subsubsection{Grundy's game}
Sometimes a move in a game divides the game
into subgames that are independent of each other.
In this case, the Grundy number of the game is
\[\textrm{mex}(\{g_1, g_2, \ldots, g_n \}),\]
where $n$ is the number of possible moves and
\[g_k = a_{k,1} \oplus a_{k,2} \oplus \ldots \oplus a_{k,m},\]
where move $k$ generates subgames with
Grundy numbers $a_{k,1},a_{k,2},\ldots,a_{k,m}$.
\index{Grundy's game}
An example of such a game is \key{Grundy's game}.
Initially, there is a single heap that contains $n$ sticks.
On each turn, the player chooses a heap and divides
it into two nonempty heaps such that the heaps
are of different size.
The player who makes the last move wins the game.
Let $f(n)$ be the Grundy number of a heap
that contains $n$ sticks.
The Grundy number can be calculated by going
through all ways to divide the heap into
two heaps.
For example, when $n=8$, the possibilities
are $1+7$, $2+6$ and $3+5$, so
\[f(8)=\textrm{mex}(\{f(1) \oplus f(7), f(2) \oplus f(6), f(3) \oplus f(5)\}).\]
In this game, the value of $f(n)$ is based on the values
of $f(1),\ldots,f(n-1)$.
The base cases are $f(1)=f(2)=0$,
because it is not possible to divide the heaps
of 1 and 2 sticks.
The first Grundy numbers are:
\[
\begin{array}{lcl}
f(1) & = & 0 \\
f(2) & = & 0 \\
f(3) & = & 1 \\
f(4) & = & 0 \\
f(5) & = & 2 \\
f(6) & = & 1 \\
f(7) & = & 0 \\
f(8) & = & 2 \\
\end{array}
\]
The Grundy number for $n=8$ is 2,
so it is possible to win the game.
The winning move is to create heaps
$1+7$, because $f(1) \oplus f(7) = 0$.

1148
chapter26.tex Normal file

File diff suppressed because it is too large Load Diff

559
chapter27.tex Normal file
View File

@ -0,0 +1,559 @@
\chapter{Square root algorithms}
\index{square root algorithm}
A \key{square root algorithm} is an algorithm
that has a square root in its time complexity.
A square root can be seen as a ''poor man's logarithm'':
the complexity $O(\sqrt n)$ is better than $O(n)$
but worse than $O(\log n)$.
In any case, many square root algorithms are fast and usable in practice.
As an example, consider the problem of
creating a data structure that supports
two operations on an array:
modifying an element at a given position
and calculating the sum of elements in the given range.
We have previously solved the problem using
binary indexed and segment trees,
that support both operations in $O(\log n)$ time.
However, now we will solve the problem
in another way using a square root structure
that allows us to modify elements in $O(1)$ time
and calculate sums in $O(\sqrt n)$ time.
The idea is to divide the array into \emph{blocks}
of size $\sqrt n$ so that each block contains
the sum of elements inside the block.
For example, an array of 16 elements will be
divided into blocks of 4 elements as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (16,1);
\draw (0,1) rectangle (4,2);
\draw (4,1) rectangle (8,2);
\draw (8,1) rectangle (12,2);
\draw (12,1) rectangle (16,2);
\node at (0.5, 0.5) {5};
\node at (1.5, 0.5) {8};
\node at (2.5, 0.5) {6};
\node at (3.5, 0.5) {3};
\node at (4.5, 0.5) {2};
\node at (5.5, 0.5) {7};
\node at (6.5, 0.5) {2};
\node at (7.5, 0.5) {6};
\node at (8.5, 0.5) {7};
\node at (9.5, 0.5) {1};
\node at (10.5, 0.5) {7};
\node at (11.5, 0.5) {5};
\node at (12.5, 0.5) {6};
\node at (13.5, 0.5) {2};
\node at (14.5, 0.5) {3};
\node at (15.5, 0.5) {2};
\node at (2, 1.5) {21};
\node at (6, 1.5) {17};
\node at (10, 1.5) {20};
\node at (14, 1.5) {13};
\end{tikzpicture}
\end{center}
In this structure,
it is easy to modify array elements,
because it is only needed to update
the sum of a single block
after each modification,
which can be done in $O(1)$ time.
For example, the following picture shows
how the value of an element and
the sum of the corresponding block change:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (5,0) rectangle (6,1);
\draw (0,0) grid (16,1);
\fill[color=lightgray] (4,1) rectangle (8,2);
\draw (0,1) rectangle (4,2);
\draw (4,1) rectangle (8,2);
\draw (8,1) rectangle (12,2);
\draw (12,1) rectangle (16,2);
\node at (0.5, 0.5) {5};
\node at (1.5, 0.5) {8};
\node at (2.5, 0.5) {6};
\node at (3.5, 0.5) {3};
\node at (4.5, 0.5) {2};
\node at (5.5, 0.5) {5};
\node at (6.5, 0.5) {2};
\node at (7.5, 0.5) {6};
\node at (8.5, 0.5) {7};
\node at (9.5, 0.5) {1};
\node at (10.5, 0.5) {7};
\node at (11.5, 0.5) {5};
\node at (12.5, 0.5) {6};
\node at (13.5, 0.5) {2};
\node at (14.5, 0.5) {3};
\node at (15.5, 0.5) {2};
\node at (2, 1.5) {21};
\node at (6, 1.5) {15};
\node at (10, 1.5) {20};
\node at (14, 1.5) {13};
\end{tikzpicture}
\end{center}
Then, to calculate the sum of elements in a range,
we divide the range into three parts such that
the sum consists of values of single elements
and sums of blocks between them:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (3,0) rectangle (4,1);
\fill[color=lightgray] (12,0) rectangle (13,1);
\fill[color=lightgray] (13,0) rectangle (14,1);
\draw (0,0) grid (16,1);
\fill[color=lightgray] (4,1) rectangle (8,2);
\fill[color=lightgray] (8,1) rectangle (12,2);
\draw (0,1) rectangle (4,2);
\draw (4,1) rectangle (8,2);
\draw (8,1) rectangle (12,2);
\draw (12,1) rectangle (16,2);
\node at (0.5, 0.5) {5};
\node at (1.5, 0.5) {8};
\node at (2.5, 0.5) {6};
\node at (3.5, 0.5) {3};
\node at (4.5, 0.5) {2};
\node at (5.5, 0.5) {5};
\node at (6.5, 0.5) {2};
\node at (7.5, 0.5) {6};
\node at (8.5, 0.5) {7};
\node at (9.5, 0.5) {1};
\node at (10.5, 0.5) {7};
\node at (11.5, 0.5) {5};
\node at (12.5, 0.5) {6};
\node at (13.5, 0.5) {2};
\node at (14.5, 0.5) {3};
\node at (15.5, 0.5) {2};
\node at (2, 1.5) {21};
\node at (6, 1.5) {15};
\node at (10, 1.5) {20};
\node at (14, 1.5) {13};
\draw [decoration={brace}, decorate, line width=0.5mm] (14,-0.25) -- (3,-0.25);
\end{tikzpicture}
\end{center}
Since the number of single elements is $O(\sqrt n)$
and the number of blocks is also $O(\sqrt n)$,
the sum query takes $O(\sqrt n)$ time.
The purpose of the block size $\sqrt n$ is
that it \emph{balances} two things:
the array is divided into $\sqrt n$ blocks,
each of which contains $\sqrt n$ elements.
In practice, it is not necessary to use the
exact value of $\sqrt n$ as a parameter,
and instead we may use parameters $k$ and $n/k$ where $k$ is
different from $\sqrt n$.
The optimal parameter depends on the problem and input.
For example, if an algorithm often goes
through the blocks but rarely inspects
single elements inside the blocks,
it may be a good idea to divide the array into
$k < \sqrt n$ blocks, each of which contains $n/k > \sqrt n$
elements.
\section{Combining algorithms}
In this section we discuss two square root algorithms
that are based on combining two algorithms into one algorithm.
In both cases, we could use either of the algorithms
without the other
and solve the problem in $O(n^2)$ time.
However, by combining the algorithms, the running
time is only $O(n \sqrt n)$.
\subsubsection{Case processing}
Suppose that we are given a two-dimensional
grid that contains $n$ cells.
Each cell is assigned a letter,
and our task is to find two cells
with the same letter whose distance is minimum,
where the distance between cells
$(x_1,y_1)$ and $(x_2,y_2)$ is $|x_1-x_2|+|y_1-y_2|$.
For example, consider the following grid:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\node at (0.5,0.5) {A};
\node at (0.5,1.5) {B};
\node at (0.5,2.5) {C};
\node at (0.5,3.5) {A};
\node at (1.5,0.5) {C};
\node at (1.5,1.5) {D};
\node at (1.5,2.5) {E};
\node at (1.5,3.5) {F};
\node at (2.5,0.5) {B};
\node at (2.5,1.5) {A};
\node at (2.5,2.5) {G};
\node at (2.5,3.5) {B};
\node at (3.5,0.5) {D};
\node at (3.5,1.5) {F};
\node at (3.5,2.5) {E};
\node at (3.5,3.5) {A};
\draw (0,0) grid (4,4);
\end{tikzpicture}
\end{center}
In this case, the minimum distance is 2 between the two 'E' letters.
We can solve the problem by considering each letter separately.
Using this approach, the new problem is to calculate
the minimum distance
between two cells with a \emph{fixed} letter $c$.
We focus on two algorithms for this:
\emph{Algorithm 1:} Go through all pairs of cells with letter $c$,
and calculate the minimum distance between such cells.
This will take $O(k^2)$ time where $k$ is the number of cells with letter $c$.
\emph{Algorithm 2:} Perform a breadth-first search that simultaneously
starts at each cell with letter $c$. The minimum distance between
two cells with letter $c$ will be calculated in $O(n)$ time.
One way to solve the problem is to choose either of the
algorithms and use it for all letters.
If we use Algorithm 1, the running time is $O(n^2)$,
because all cells may contain the same letter,
and in this case $k=n$.
Also if we use Algorithm 2, the running time is $O(n^2)$,
because all cells may have different letters,
and in this case $n$ searches are needed.
However, we can \emph{combine} the two algorithms and
use different algorithms for different letters
depending on how many times each letter appears in the grid.
Assume that a letter $c$ appears $k$ times.
If $k \le \sqrt n$, we use Algorithm 1, and if $k > \sqrt n$,
we use Algorithm 2.
It turns out that by doing this, the total running time
of the algorithm is only $O(n \sqrt n)$.
First, suppose that we use Algorithm 1 for a letter $c$.
Since $c$ appears at most $\sqrt n$ times in the grid,
we compare each cell with letter $c$ $O(\sqrt n)$ times
with other cells.
Thus, the time used for processing all such cells is $O(n \sqrt n)$.
Then, suppose that we use Algorithm 2 for a letter $c$.
There are at most $\sqrt n$ such letters,
so processing those letters also takes $O(n \sqrt n)$ time.
\subsubsection{Batch processing}
Our next problem also deals with
a two-dimensional grid that contains $n$ cells.
Initially, each cell except one is white.
We perform $n-1$ operations, each of which first
calculates the minimum distance from a given white cell
to a black cell, and then paints the white cell black.
For example, consider the following operation:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=black] (1,1) rectangle (2,2);
\fill[color=black] (3,1) rectangle (4,2);
\fill[color=black] (0,3) rectangle (1,4);
\node at (2.5,3.5) {*};
\draw (0,0) grid (4,4);
\end{tikzpicture}
\end{center}
First, we calculate the minimum distance
from the white cell marked with * to a black cell.
The minimum distance is 2, because we can move
two steps left to a black cell.
Then, we paint the white cell black:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=black] (1,1) rectangle (2,2);
\fill[color=black] (3,1) rectangle (4,2);
\fill[color=black] (0,3) rectangle (1,4);
\fill[color=black] (2,3) rectangle (3,4);
\draw (0,0) grid (4,4);
\end{tikzpicture}
\end{center}
Consider the following two algorithms:
\emph{Algorithm 1:} Use breadth-first search
to calculate
for each white cell the distance to the nearest black cell.
This takes $O(n)$ time, and after the search,
we can find the minimum distance from any white cell
to a black cell in $O(1)$ time.
\emph{Algorithm 2:} Maintain a list of cells that have been
painted black, go through this list at each operation
and then add a new cell to the list.
An operation takes $O(k)$ time where $k$ is the length of the list.
We combine the above algorithms by
dividing the operations into
$O(\sqrt n)$ \emph{batches}, each of which consists
of $O(\sqrt n)$ operations.
At the beginning of each batch,
we perform Algorithm 1.
Then, we use Algorithm 2 to process the operations
in the batch.
We clear the list of Algorithm 2 between
the batches.
At each operation,
the minimum distance to a black cell
is either the distance calculated by Algorithm 1
or the distance calculated by Algorithm 2.
The resulting algorithm works in
$O(n \sqrt n)$ time.
First, Algorithm 1 is performed $O(\sqrt n)$ times,
and each search works in $O(n)$ time.
Second, when using Algorithm 2 in a batch,
the list contains $O(\sqrt n)$ cells
(because we clear the list between the batches)
and each operation takes $O(\sqrt n)$ time.
\section{Integer partitions}
Some square root algorithms are based on
the following observation:
if a positive integer $n$ is represented as
a sum of positive integers,
such a sum always contains at most
$O(\sqrt n)$ \emph{distinct} numbers.
The reason for this is that to construct
a sum that contains a maximum number of distinct
numbers, we should choose \emph{small} numbers.
If we choose the numbers $1,2,\ldots,k$,
the resulting sum is
\[\frac{k(k+1)}{2}.\]
Thus, the maximum amount of distinct numbers is $k = O(\sqrt n)$.
Next we will discuss two problems that can be solved
efficiently using this observation.
\subsubsection{Knapsack}
Suppose that we are given a list of integer weights
whose sum is $n$.
Our task is to find out all sums that can be formed using
a subset of the weights. For example, if the weights are
$\{1,3,3\}$, the possible sums are as follows:
\begin{itemize}[noitemsep]
\item $0$ (empty set)
\item $1$
\item $3$
\item $1+3=4$
\item $3+3=6$
\item $1+3+3=7$
\end{itemize}
Using the standard knapsack approach (see Chapter 7.4),
the problem can be solved as follows:
we define a function $\texttt{possible}(x,k)$ whose value is 1
if the sum $x$ can be formed using the first $k$ weights,
and 0 otherwise.
Since the sum of the weights is $n$,
there are at most $n$ weights and
all values of the function can be calculated
in $O(n^2)$ time using dynamic programming.
However, we can make the algorithm more efficient
by using the fact that there are at most $O(\sqrt n)$
\emph{distinct} weights.
Thus, we can process the weights in groups
that consists of similar weights.
We can process each group
in $O(n)$ time, which yields an $O(n \sqrt n)$ time algorithm.
The idea is to use an array that records the sums of weights
that can be formed using the groups processed so far.
The array contains $n$ elements: element $k$ is 1 if the sum
$k$ can be formed and 0 otherwise.
To process a group of weights, we scan the array
from left to right and record the new sums of weights that
can be formed using this group and the previous groups.
\subsubsection{String construction}
Given a string \texttt{s} of length $n$
and a set of strings $D$ whose total length is $m$,
consider the problem of counting the number of ways
\texttt{s} can be formed as a concatenation of strings in $D$.
For example,
if $\texttt{s}=\texttt{ABAB}$ and
$D=\{\texttt{A},\texttt{B},\texttt{AB}\}$,
there are 4 ways:
\begin{itemize}[noitemsep]
\item $\texttt{A}+\texttt{B}+\texttt{A}+\texttt{B}$
\item $\texttt{AB}+\texttt{A}+\texttt{B}$
\item $\texttt{A}+\texttt{B}+\texttt{AB}$
\item $\texttt{AB}+\texttt{AB}$
\end{itemize}
We can solve the problem using dynamic programming:
Let $\texttt{count}(k)$ denote the number of ways to construct the prefix
$\texttt{s}[0 \ldots k]$ using the strings in $D$.
Now $\texttt{count}(n-1)$ gives the answer to the problem,
and we can solve the problem in $O(n^2)$ time
using a trie structure.
However, we can solve the problem more efficiently
by using string hashing and the fact that there
are at most $O(\sqrt m)$ distinct string lengths in $D$.
First, we construct a set $H$ that contains all
hash values of the strings in $D$.
Then, when calculating a value of $\texttt{count}(k)$,
we go through all values of $p$
such that there is a string of length $p$ in $D$,
calculate the hash value of $\texttt{s}[k-p+1 \ldots k]$
and check if it belongs to $H$.
Since there are at most $O(\sqrt m)$ distinct string lengths,
this results in an algorithm whose running time is $O(n \sqrt m)$.
\section{Mo's algorithm}
\index{Mo's algorithm}
\key{Mo's algorithm}\footnote{According to \cite{cod15}, this algorithm
is named after Mo Tao, a Chinese competitive programmer, but
the technique has appeared earlier in the literature \cite{ken06}.}
can be used in many problems
that require processing range queries in
a \emph{static} array, i.e., the array values
do not change between the queries.
In each query, we are given a range $[a,b]$,
and we should calculate a value based on the
array elements between positions $a$ and $b$.
Since the array is static,
the queries can be processed in any order,
and Mo's algorithm
processes the queries in a special order which guarantees
that the algorithm works efficiently.
Mo's algorithm maintains an \emph{active range}
of the array, and the answer to a query
concerning the active range is known at each moment.
The algorithm processes the queries one by one,
and always moves the endpoints of the
active range by inserting and removing elements.
The time complexity of the algorithm is
$O(n \sqrt n f(n))$ where the array contains
$n$ elements, there are $n$ queries
and each insertion and removal of an element
takes $O(f(n))$ time.
The trick in Mo's algorithm is the order
in which the queries are processed:
The array is divided into blocks of $k=O(\sqrt n)$
elements, and a query $[a_1,b_1]$
is processed before a query $[a_2,b_2]$
if either
\begin{itemize}
\item $\lfloor a_1/k \rfloor < \lfloor a_2/k \rfloor$ or
\item $\lfloor a_1/k \rfloor = \lfloor a_2/k \rfloor$ and $b_1 < b_2$.
\end{itemize}
Thus, all queries whose left endpoints are
in a certain block are processed one after another
sorted according to their right endpoints.
Using this order, the algorithm
only performs $O(n \sqrt n)$ operations,
because the left endpoint moves
$O(n)$ times $O(\sqrt n)$ steps,
and the right endpoint moves
$O(\sqrt n)$ times $O(n)$ steps. Thus, both
endpoints move a total of $O(n \sqrt n)$ steps during the algorithm.
\subsubsection*{Example}
As an example, consider a problem
where we are given a set of queries,
each of them corresponding to a range in an array,
and our task is to calculate for each query
the number of \emph{distinct} elements in the range.
In Mo's algorithm, the queries are always sorted
in the same way, but it depends on the problem
how the answer to the query is maintained.
In this problem, we can maintain an array
\texttt{count} where $\texttt{count}[x]$
indicates the number of times an element $x$
occurs in the active range.
When we move from one query to another query,
the active range changes.
For example, if the current range is
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (1,0) rectangle (5,1);
\draw (0,0) grid (9,1);
\node at (0.5, 0.5) {4};
\node at (1.5, 0.5) {2};
\node at (2.5, 0.5) {5};
\node at (3.5, 0.5) {4};
\node at (4.5, 0.5) {2};
\node at (5.5, 0.5) {4};
\node at (6.5, 0.5) {3};
\node at (7.5, 0.5) {3};
\node at (8.5, 0.5) {4};
\end{tikzpicture}
\end{center}
and the next range is
\begin{center}
\begin{tikzpicture}[scale=0.7]
\fill[color=lightgray] (2,0) rectangle (7,1);
\draw (0,0) grid (9,1);
\node at (0.5, 0.5) {4};
\node at (1.5, 0.5) {2};
\node at (2.5, 0.5) {5};
\node at (3.5, 0.5) {4};
\node at (4.5, 0.5) {2};
\node at (5.5, 0.5) {4};
\node at (6.5, 0.5) {3};
\node at (7.5, 0.5) {3};
\node at (8.5, 0.5) {4};
\end{tikzpicture}
\end{center}
there will be three steps:
the left endpoint moves one step to the right,
and the right endpoint moves two steps to the right.
After each step, the array \texttt{count}
needs to be updated.
After adding an element $x$,
we increase the value of
$\texttt{count}[x]$ by 1,
and if $\texttt{count}[x]=1$ after this,
we also increase the answer to the query by 1.
Similarly, after removing an element $x$,
we decrease the value of
$\texttt{count}[x]$ by 1,
and if $\texttt{count}[x]=0$ after this,
we also decrease the answer to the query by 1.
In this problem, the time needed to perform
each step is $O(1)$, so the total time complexity
of the algorithm is $O(n \sqrt n)$.

1161
chapter28.tex Normal file

File diff suppressed because it is too large Load Diff

782
chapter29.tex Normal file
View File

@ -0,0 +1,782 @@
\chapter{Geometry}
\index{geometry}
In geometric problems, it is often challenging
to find a way to approach the problem so that
the solution to the problem can be conveniently implemented
and the number of special cases is small.
As an example, consider a problem where
we are given the vertices of a quadrilateral
(a polygon that has four vertices),
and our task is to calculate its area.
For example, a possible input for the problem is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.45]
\draw[fill] (6,2) circle [radius=0.1];
\draw[fill] (5,6) circle [radius=0.1];
\draw[fill] (2,5) circle [radius=0.1];
\draw[fill] (1,1) circle [radius=0.1];
\draw[thick] (6,2) -- (5,6) -- (2,5) -- (1,1) -- (6,2);
\end{tikzpicture}
\end{center}
One way to approach the problem is to divide
the quadrilateral into two triangles by a straight
line between two opposite vertices:
\begin{center}
\begin{tikzpicture}[scale=0.45]
\draw[fill] (6,2) circle [radius=0.1];
\draw[fill] (5,6) circle [radius=0.1];
\draw[fill] (2,5) circle [radius=0.1];
\draw[fill] (1,1) circle [radius=0.1];
\draw[thick] (6,2) -- (5,6) -- (2,5) -- (1,1) -- (6,2);
\draw[dashed,thick] (2,5) -- (6,2);
\end{tikzpicture}
\end{center}
After this, it suffices to sum the areas
of the triangles.
The area of a triangle can be calculated,
for example, using \key{Heron's formula}
%\footnote{Heron of Alexandria (c. 10--70) was a Greek mathematician.}
\[ \sqrt{s (s-a) (s-b) (s-c)},\]
where $a$, $b$ and $c$ are the lengths
of the triangle's sides and
$s=(a+b+c)/2$.
\index{Heron's formula}
This is a possible way to solve the problem,
but there is one pitfall:
how to divide the quadrilateral into triangles?
It turns out that sometimes we cannot just pick
two arbitrary opposite vertices.
For example, in the following situation,
the division line is \emph{outside} the quadrilateral:
\begin{center}
\begin{tikzpicture}[scale=0.45]
\draw[fill] (6,2) circle [radius=0.1];
\draw[fill] (3,2) circle [radius=0.1];
\draw[fill] (2,5) circle [radius=0.1];
\draw[fill] (1,1) circle [radius=0.1];
\draw[thick] (6,2) -- (3,2) -- (2,5) -- (1,1) -- (6,2);
\draw[dashed,thick] (2,5) -- (6,2);
\end{tikzpicture}
\end{center}
However, another way to draw the line works:
\begin{center}
\begin{tikzpicture}[scale=0.45]
\draw[fill] (6,2) circle [radius=0.1];
\draw[fill] (3,2) circle [radius=0.1];
\draw[fill] (2,5) circle [radius=0.1];
\draw[fill] (1,1) circle [radius=0.1];
\draw[thick] (6,2) -- (3,2) -- (2,5) -- (1,1) -- (6,2);
\draw[dashed,thick] (3,2) -- (1,1);
\end{tikzpicture}
\end{center}
It is clear for a human which of the lines is the correct
choice, but the situation is difficult for a computer.
However, it turns out that we can solve the problem using
another method that is more convenient to a programmer.
Namely, there is a general formula
\[x_1y_2-x_2y_1+x_2y_3-x_3y_2+x_3y_4-x_4y_3+x_4y_1-x_1y_4,\]
that calculates the area of a quadrilateral
whose vertices are
$(x_1,y_1)$,
$(x_2,y_2)$,
$(x_3,y_3)$ and
$(x_4,y_4)$.
This formula is easy to implement, there are no special
cases, and we can even generalize the formula
to \emph{all} polygons.
\section{Complex numbers}
\index{complex number}
\index{point}
\index{vector}
A \key{complex number} is a number of the form $x+y i$,
where $i = \sqrt{-1}$ is the \key{imaginary unit}.
A geometric interpretation of a complex number is
that it represents a two-dimensional point $(x,y)$
or a vector from the origin to a point $(x,y)$.
For example, $4+2i$ corresponds to the
following point and vector:
\begin{center}
\begin{tikzpicture}[scale=0.45]
\draw[->,thick] (-5,0)--(5,0);
\draw[->,thick] (0,-5)--(0,5);
\draw[fill] (4,2) circle [radius=0.1];
\draw[->,thick] (0,0)--(4-0.1,2-0.1);
\node at (4,2.8) {$(4,2)$};
\end{tikzpicture}
\end{center}
\index{complex@\texttt{complex}}
The C++ complex number class \texttt{complex} is
useful when solving geometric problems.
Using the class we can represent points and vectors
as complex numbers, and the class contains tools
that are useful in geometry.
In the following code, \texttt{C} is the type of
a coordinate and \texttt{P} is the type of a point or a vector.
In addition, the code defines macros \texttt{X} and \texttt{Y}
that can be used to refer to x and y coordinates.
\begin{lstlisting}
typedef long long C;
typedef complex<C> P;
#define X real()
#define Y imag()
\end{lstlisting}
For example, the following code defines a point $p=(4,2)$
and prints its x and y coordinates:
\begin{lstlisting}
P p = {4,2};
cout << p.X << " " << p.Y << "\n"; // 4 2
\end{lstlisting}
The following code defines vectors $v=(3,1)$ and $u=(2,2)$,
and after that calculates the sum $s=v+u$.
\begin{lstlisting}
P v = {3,1};
P u = {2,2};
P s = v+u;
cout << s.X << " " << s.Y << "\n"; // 5 3
\end{lstlisting}
In practice,
an appropriate coordinate type is usually
\texttt{long long} (integer) or \texttt{long double}
(real number).
It is a good idea to use integer whenever possible,
because calculations with integers are exact.
If real numbers are needed,
precision errors should be taken into account
when comparing numbers.
A safe way to check if real numbers $a$ and $b$ are equal
is to compare them using $|a-b|<\epsilon$,
where $\epsilon$ is a small number (for example, $\epsilon=10^{-9}$).
\subsubsection*{Functions}
In the following examples, the coordinate type is
\texttt{long double}.
The function $\texttt{abs}(v)$ calculates the length
$|v|$ of a vector $v=(x,y)$
using the formula $\sqrt{x^2+y^2}$.
The function can also be used for
calculating the distance between points
$(x_1,y_1)$ and $(x_2,y_2)$,
because that distance equals the length
of the vector $(x_2-x_1,y_2-y_1)$.
The following code calculates the distance
between points $(4,2)$ and $(3,-1)$:
\begin{lstlisting}
P a = {4,2};
P b = {3,-1};
cout << abs(b-a) << "\n"; // 3.16228
\end{lstlisting}
The function $\texttt{arg}(v)$ calculates the
angle of a vector $v=(x,y)$ with respect to the x axis.
The function gives the angle in radians,
where $r$ radians equals $180 r/\pi$ degrees.
The angle of a vector that points to the right is 0,
and angles decrease clockwise and increase
counterclockwise.
The function $\texttt{polar}(s,a)$ constructs a vector
whose length is $s$ and that points to an angle $a$.
A vector can be rotated by an angle $a$
by multiplying it by a vector with length 1 and angle $a$.
The following code calculates the angle of
the vector $(4,2)$, rotates it $1/2$ radians
counterclockwise, and then calculates the angle again:
\begin{lstlisting}
P v = {4,2};
cout << arg(v) << "\n"; // 0.463648
v *= polar(1.0,0.5);
cout << arg(v) << "\n"; // 0.963648
\end{lstlisting}
\section{Points and lines}
\index{cross product}
The \key{cross product} $a \times b$ of vectors
$a=(x_1,y_1)$ and $b=(x_2,y_2)$ is calculated
using the formula $x_1 y_2 - x_2 y_1$.
The cross product tells us whether $b$
turns left (positive value), does not turn (zero)
or turns right (negative value)
when it is placed directly after $a$.
The following picture illustrates the above cases:
\begin{center}
\begin{tikzpicture}[scale=0.45]
\draw[->,thick] (0,0)--(4,2);
\draw[->,thick] (4,2)--(4+1,2+2);
\node at (2.5,0.5) {$a$};
\node at (5,2.5) {$b$};
\node at (3,-2) {$a \times b = 6$};
\draw[->,thick] (8+0,0)--(8+4,2);
\draw[->,thick] (8+4,2)--(8+4+2,2+1);
\node at (8+2.5,0.5) {$a$};
\node at (8+5,1.5) {$b$};
\node at (8+3,-2) {$a \times b = 0$};
\draw[->,thick] (16+0,0)--(16+4,2);
\draw[->,thick] (16+4,2)--(16+4+2,2-1);
\node at (16+2.5,0.5) {$a$};
\node at (16+5,2.5) {$b$};
\node at (16+3,-2) {$a \times b = -8$};
\end{tikzpicture}
\end{center}
\noindent
For example, in the first case
$a=(4,2)$ and $b=(1,2)$.
The following code calculates the cross product
using the class \texttt{complex}:
\begin{lstlisting}
P a = {4,2};
P b = {1,2};
C p = (conj(a)*b).Y; // 6
\end{lstlisting}
The above code works, because
the function \texttt{conj} negates the y coordinate
of a vector,
and when the vectors $(x_1,-y_1)$ and $(x_2,y_2)$
are multiplied together, the y coordinate
of the result is $x_1 y_2 - x_2 y_1$.
\subsubsection{Point location}
Cross products can be used to test
whether a point is located on the left or right
side of a line.
Assume that the line goes through points
$s_1$ and $s_2$, we are looking from $s_1$
to $s_2$ and the point is $p$.
For example, in the following picture,
$p$ is on the left side of the line:
\begin{center}
\begin{tikzpicture}[scale=0.45]
\draw[dashed,thick,->] (0,-3)--(12,6);
\draw[fill] (4,0) circle [radius=0.1];
\draw[fill] (8,3) circle [radius=0.1];
\draw[fill] (5,3) circle [radius=0.1];
\node at (4,-1) {$s_1$};
\node at (8,2) {$s_2$};
\node at (5,4) {$p$};
\end{tikzpicture}
\end{center}
The cross product $(p-s_1) \times (p-s_2)$
tells us the location of the point $p$.
If the cross product is positive,
$p$ is located on the left side,
and if the cross product is negative,
$p$ is located on the right side.
Finally, if the cross product is zero,
points $s_1$, $s_2$ and $p$ are on the same line.
\subsubsection{Line segment intersection}
\index{line segment intersection}
Next we consider the problem of testing
whether two line segments
$ab$ and $cd$ intersect. The possible cases are:
\textit{Case 1:}
The line segments are on the same line
and they overlap each other.
In this case, there is an infinite number of
intersection points.
For example, in the following picture,
all points between $c$ and $b$ are
intersection points:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\draw (1.5,1.5)--(6,3);
\draw (0,1)--(4.5,2.5);
\draw[fill] (0,1) circle [radius=0.05];
\node at (0,0.5) {$a$};
\draw[fill] (1.5,1.5) circle [radius=0.05];
\node at (6,2.5) {$d$};
\draw[fill] (4.5,2.5) circle [radius=0.05];
\node at (1.5,1) {$c$};
\draw[fill] (6,3) circle [radius=0.05];
\node at (4.5,2) {$b$};
\end{tikzpicture}
\end{center}
In this case, we can use cross products to
check if all points are on the same line.
After this, we can sort the points and check
whether the line segments overlap each other.
\textit{Case 2:}
The line segments have a common vertex
that is the only intersection point.
For example, in the following picture the
intersection point is $b=c$:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\draw (0,0)--(4,2);
\draw (4,2)--(6,1);
\draw[fill] (0,0) circle [radius=0.05];
\draw[fill] (4,2) circle [radius=0.05];
\draw[fill] (6,1) circle [radius=0.05];
\node at (0,0.5) {$a$};
\node at (4,2.5) {$b=c$};
\node at (6,1.5) {$d$};
\end{tikzpicture}
\end{center}
This case is easy to check, because
there are only four possibilities
for the intersection point:
$a=c$, $a=d$, $b=c$ and $b=d$.
\textit{Case 3:}
There is exactly one intersection point
that is not a vertex of any line segment.
In the following picture, the point $p$
is the intersection point:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\draw (0,1)--(6,3);
\draw (2,4)--(4,0);
\draw[fill] (0,1) circle [radius=0.05];
\node at (0,0.5) {$c$};
\draw[fill] (6,3) circle [radius=0.05];
\node at (6,2.5) {$d$};
\draw[fill] (2,4) circle [radius=0.05];
\node at (1.5,3.5) {$a$};
\draw[fill] (4,0) circle [radius=0.05];
\node at (4,-0.4) {$b$};
\draw[fill] (3,2) circle [radius=0.05];
\node at (3,1.5) {$p$};
\end{tikzpicture}
\end{center}
In this case, the line segments intersect
exactly when both points $c$ and $d$ are
on different sides of a line through $a$ and $b$,
and points $a$ and $b$ are on different
sides of a line through $c$ and $d$.
We can use cross products to check this.
\subsubsection{Point distance from a line}
Another feature of cross products is that
the area of a triangle can be calculated
using the formula
\[\frac{| (a-c) \times (b-c) |}{2},\]
where $a$, $b$ and $c$ are the vertices of the triangle.
Using this fact, we can derive a formula
for calculating the shortest distance between a point and a line.
For example, in the following picture $d$ is the
shortest distance between the point $p$ and the line
that is defined by the points $s_1$ and $s_2$:
\begin{center}
\begin{tikzpicture}[scale=0.75]
\draw (-2,-1)--(6,3);
\draw[dashed] (1,4)--(2.40,1.2);
\node at (0,-0.5) {$s_1$};
\node at (4,1.5) {$s_2$};
\node at (0.5,4) {$p$};
\node at (2,2.7) {$d$};
\draw[fill] (0,0) circle [radius=0.05];
\draw[fill] (4,2) circle [radius=0.05];
\draw[fill] (1,4) circle [radius=0.05];
\end{tikzpicture}
\end{center}
The area of the triangle whose vertices are
$s_1$, $s_2$ and $p$ can be calculated in two ways:
it is both
$\frac{1}{2} |s_2-s_1| d$ and
$\frac{1}{2} ((s_1-p) \times (s_2-p))$.
Thus, the shortest distance is
\[ d = \frac{(s_1-p) \times (s_2-p)}{|s_2-s_1|} .\]
\subsubsection{Point inside a polygon}
Let us now consider the problem of
testing whether a point is located inside or outside
a polygon.
For example, in the following picture point $a$
is inside the polygon and point $b$ is outside
the polygon.
\begin{center}
\begin{tikzpicture}[scale=0.75]
%\draw (0,0)--(2,-2)--(3,1)--(5,1)--(2,3)--(1,2)--(-1,2)--(1,4)--(-2,4)--(-2,1)--(-3,3)--(-4,0)--(0,0);
\draw (0,0)--(2,2)--(5,1)--(2,3)--(1,2)--(-1,2)--(1,4)--(-2,4)--(-2,1)--(-3,3)--(-4,0)--(0,0);
\draw[fill] (-3,1) circle [radius=0.05];
\node at (-3,0.5) {$a$};
\draw[fill] (1,3) circle [radius=0.05];
\node at (1,2.5) {$b$};
\end{tikzpicture}
\end{center}
A convenient way to solve the problem is to
send a \emph{ray} from the point to an arbitrary direction
and calculate the number of times it touches
the boundary of the polygon.
If the number is odd,
the point is inside the polygon,
and if the number is even,
the point is outside the polygon.
\begin{samepage}
For example, we could send the following rays:
\begin{center}
\begin{tikzpicture}[scale=0.75]
\draw (0,0)--(2,2)--(5,1)--(2,3)--(1,2)--(-1,2)--(1,4)--(-2,4)--(-2,1)--(-3,3)--(-4,0)--(0,0);
\draw[fill] (-3,1) circle [radius=0.05];
\node at (-3,0.5) {$a$};
\draw[fill] (1,3) circle [radius=0.05];
\node at (1,2.5) {$b$};
\draw[dashed,->] (-3,1)--(-6,0);
\draw[dashed,->] (-3,1)--(0,5);
\draw[dashed,->] (1,3)--(3.5,0);
\draw[dashed,->] (1,3)--(3,4);
\end{tikzpicture}
\end{center}
\end{samepage}
The rays from $a$ touch 1 and 3 times
the boundary of the polygon,
so $a$ is inside the polygon.
Correspondingly, the rays from $b$
touch 0 and 2 times the boundary of the polygon,
so $b$ is outside the polygon.
\section{Polygon area}
A general formula for calculating the area
of a polygon, sometimes called the \key{shoelace formula},
is as follows: \index{shoelace formula}
\[\frac{1}{2} |\sum_{i=1}^{n-1} (p_i \times p_{i+1})| =
\frac{1}{2} |\sum_{i=1}^{n-1} (x_i y_{i+1} - x_{i+1} y_i)|, \]
Here the vertices are
$p_1=(x_1,y_1)$, $p_2=(x_2,y_2)$, $\ldots$, $p_n=(x_n,y_n)$
in such an order that
$p_i$ and $p_{i+1}$ are adjacent vertices on the boundary
of the polygon,
and the first and last vertex is the same, i.e., $p_1=p_n$.
For example, the area of the polygon
\begin{center}
\begin{tikzpicture}[scale=0.7]
\filldraw (4,1.4) circle (2pt);
\filldraw (7,3.4) circle (2pt);
\filldraw (5,5.4) circle (2pt);
\filldraw (2,4.4) circle (2pt);
\filldraw (4,3.4) circle (2pt);
\node (1) at (4,1) {(4,1)};
\node (2) at (7.2,3) {(7,3)};
\node (3) at (5,5.8) {(5,5)};
\node (4) at (2,4) {(2,4)};
\node (5) at (3.5,3) {(4,3)};
\path[draw] (4,1.4) -- (7,3.4) -- (5,5.4) -- (2,4.4) -- (4,3.4) -- (4,1.4);
\end{tikzpicture}
\end{center}
is
\[\frac{|(2\cdot5-5\cdot4)+(5\cdot3-7\cdot5)+(7\cdot1-4\cdot3)+(4\cdot3-4\cdot1)+(4\cdot4-2\cdot3)|}{2} = 17/2.\]
The idea of the formula is to go through trapezoids
whose one side is a side of the polygon,
and another side lies on the horizontal line $y=0$.
For example:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\path[draw,fill=lightgray] (5,5.4) -- (7,3.4) -- (7,0) -- (5,0) -- (5,5.4);
\filldraw (4,1.4) circle (2pt);
\filldraw (7,3.4) circle (2pt);
\filldraw (5,5.4) circle (2pt);
\filldraw (2,4.4) circle (2pt);
\filldraw (4,3.4) circle (2pt);
\node (1) at (4,1) {(4,1)};
\node (2) at (7.2,3) {(7,3)};
\node (3) at (5,5.8) {(5,5)};
\node (4) at (2,4) {(2,4)};
\node (5) at (3.5,3) {(4,3)};
\path[draw] (4,1.4) -- (7,3.4) -- (5,5.4) -- (2,4.4) -- (4,3.4) -- (4,1.4);
\draw (0,0) -- (10,0);
\end{tikzpicture}
\end{center}
The area of such a trapezoid is
\[(x_{i+1}-x_{i}) \frac{y_i+y_{i+1}}{2},\]
where the vertices of the polygon are $p_i$ and $p_{i+1}$.
If $x_{i+1}>x_{i}$, the area is positive,
and if $x_{i+1}<x_{i}$, the area is negative.
The area of the polygon is the sum of areas of
all such trapezoids, which yields the formula
\[|\sum_{i=1}^{n-1} (x_{i+1}-x_{i}) \frac{y_i+y_{i+1}}{2}| =
\frac{1}{2} |\sum_{i=1}^{n-1} (x_i y_{i+1} - x_{i+1} y_i)|.\]
Note that the absolute value of the sum is taken,
because the value of the sum may be positive or negative,
depending on whether we walk clockwise or counterclockwise
along the boundary of the polygon.
\subsubsection{Pick's theorem}
\index{Pick's theorem}
\key{Pick's theorem} provides another way to calculate
the area of a polygon provided that all vertices
of the polygon have integer coordinates.
According to Pick's theorem, the area of the polygon is
\[ a + b/2 -1,\]
where $a$ is the number of integer points inside the polygon
and $b$ is the number of integer points on the boundary of the polygon.
For example, the area of the polygon
\begin{center}
\begin{tikzpicture}[scale=0.7]
\filldraw (4,1.4) circle (2pt);
\filldraw (7,3.4) circle (2pt);
\filldraw (5,5.4) circle (2pt);
\filldraw (2,4.4) circle (2pt);
\filldraw (4,3.4) circle (2pt);
\node (1) at (4,1) {(4,1)};
\node (2) at (7.2,3) {(7,3)};
\node (3) at (5,5.8) {(5,5)};
\node (4) at (2,4) {(2,4)};
\node (5) at (3.5,3) {(4,3)};
\path[draw] (4,1.4) -- (7,3.4) -- (5,5.4) -- (2,4.4) -- (4,3.4) -- (4,1.4);
\filldraw (2,4.4) circle (2pt);
\filldraw (3,4.4) circle (2pt);
\filldraw (4,4.4) circle (2pt);
\filldraw (5,4.4) circle (2pt);
\filldraw (6,4.4) circle (2pt);
\filldraw (4,3.4) circle (2pt);
\filldraw (5,3.4) circle (2pt);
\filldraw (6,3.4) circle (2pt);
\filldraw (7,3.4) circle (2pt);
\filldraw (4,2.4) circle (2pt);
\filldraw (5,2.4) circle (2pt);
\end{tikzpicture}
\end{center}
is $6+7/2-1=17/2$.
\section{Distance functions}
\index{distance function}
\index{Euclidean distance}
\index{Manhattan distance}
A \key{distance function} defines the distance between
two points.
The usual distance function is the
\key{Euclidean distance} where the distance between
points $(x_1,y_1)$ and $(x_2,y_2)$ is
\[\sqrt{(x_2-x_1)^2+(y_2-y_1)^2}.\]
An alternative distance function is the
\key{Manhattan distance}
where the distance between points
$(x_1,y_1)$ and $(x_2,y_2)$ is
\[|x_1-x_2|+|y_1-y_2|.\]
\begin{samepage}
For example, consider the following picture:
\begin{center}
\begin{tikzpicture}
\draw[fill] (2,1) circle [radius=0.05];
\draw[fill] (5,2) circle [radius=0.05];
\node at (2,0.5) {$(2,1)$};
\node at (5,1.5) {$(5,2)$};
\draw[dashed] (2,1) -- (5,2);
\draw[fill] (5+2,1) circle [radius=0.05];
\draw[fill] (5+5,2) circle [radius=0.05];
\node at (5+2,0.5) {$(2,1)$};
\node at (5+5,1.5) {$(5,2)$};
\draw[dashed] (5+2,1) -- (5+2,2);
\draw[dashed] (5+2,2) -- (5+5,2);
\node at (3.5,-0.5) {Euclidean distance};
\node at (5+3.5,-0.5) {Manhattan distance};
\end{tikzpicture}
\end{center}
\end{samepage}
The Euclidean distance between the points is
\[\sqrt{(5-2)^2+(2-1)^2}=\sqrt{10}\]
and the Manhattan distance is
\[|5-2|+|2-1|=4.\]
The following picture shows regions that are within a distance of 1
from the center point, using the Euclidean and Manhattan distances:
\begin{center}
\begin{tikzpicture}
\draw[fill=gray!20] (0,0) circle [radius=1];
\draw[fill] (0,0) circle [radius=0.05];
\node at (0,-1.5) {Euclidean distance};
\draw[fill=gray!20] (5+0,1) -- (5-1,0) -- (5+0,-1) -- (5+1,0) -- (5+0,1);
\draw[fill] (5,0) circle [radius=0.05];
\node at (5,-1.5) {Manhattan distance};
\end{tikzpicture}
\end{center}
\subsubsection{Rotating coordinates}
Some problems are easier to solve if
Manhattan distances are used instead of Euclidean distances.
As an example, consider a problem where we are given
$n$ points in the two-dimensional plane
and our task is to calculate the maximum Manhattan
distance between any two points.
For example, consider the following set of points:
\begin{center}
\begin{tikzpicture}[scale=0.65]
\draw[color=gray] (-1,-1) grid (4,4);
\filldraw (0,2) circle (2.5pt);
\filldraw (3,3) circle (2.5pt);
\filldraw (1,0) circle (2.5pt);
\filldraw (3,1) circle (2.5pt);
\node at (0,1.5) {$A$};
\node at (3,2.5) {$C$};
\node at (1,-0.5) {$B$};
\node at (3,0.5) {$D$};
\end{tikzpicture}
\end{center}
The maximum Manhattan distance is 5
between points $B$ and $C$:
\begin{center}
\begin{tikzpicture}[scale=0.65]
\draw[color=gray] (-1,-1) grid (4,4);
\filldraw (0,2) circle (2.5pt);
\filldraw (3,3) circle (2.5pt);
\filldraw (1,0) circle (2.5pt);
\filldraw (3,1) circle (2.5pt);
\node at (0,1.5) {$A$};
\node at (3,2.5) {$C$};
\node at (1,-0.5) {$B$};
\node at (3,0.5) {$D$};
\path[draw=red,thick,line width=2pt] (1,0) -- (1,3) -- (3,3);
\end{tikzpicture}
\end{center}
A useful technique related to Manhattan distances
is to rotate all coordinates 45 degrees so that
a point $(x,y)$ becomes $(x+y,y-x)$.
For example, after rotating the above points,
the result is:
\begin{center}
\begin{tikzpicture}[scale=0.6]
\draw[color=gray] (0,-3) grid (7,3);
\filldraw (2,2) circle (2.5pt);
\filldraw (6,0) circle (2.5pt);
\filldraw (1,-1) circle (2.5pt);
\filldraw (4,-2) circle (2.5pt);
\node at (2,1.5) {$A$};
\node at (6,-0.5) {$C$};
\node at (1,-1.5) {$B$};
\node at (4,-2.5) {$D$};
\end{tikzpicture}
\end{center}
And the maximum distance is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.6]
\draw[color=gray] (0,-3) grid (7,3);
\filldraw (2,2) circle (2.5pt);
\filldraw (6,0) circle (2.5pt);
\filldraw (1,-1) circle (2.5pt);
\filldraw (4,-2) circle (2.5pt);
\node at (2,1.5) {$A$};
\node at (6,-0.5) {$C$};
\node at (1,-1.5) {$B$};
\node at (4,-2.5) {$D$};
\path[draw=red,thick,line width=2pt] (1,-1) -- (4,2) -- (6,0);
\end{tikzpicture}
\end{center}
Consider two points $p_1=(x_1,y_1)$ and $p_2=(x_2,y_2)$ whose rotated
coordinates are $p'_1=(x'_1,y'_1)$ and $p'_2=(x'_2,y'_2)$.
Now there are two ways to express the Manhattan distance
between $p_1$ and $p_2$:
\[|x_1-x_2|+|y_1-y_2| = \max(|x'_1-x'_2|,|y'_1-y'_2|)\]
For example, if $p_1=(1,0)$ and $p_2=(3,3)$,
the rotated coordinates are $p'_1=(1,-1)$ and $p'_2=(6,0)$
and the Manhattan distance is
\[|1-3|+|0-3| = \max(|1-6|,|-1-0|) = 5.\]
The rotated coordinates provide a simple way
to operate with Manhattan distances, because we can
consider x and y coordinates separately.
To maximize the Manhattan distance between two points,
we should find two points whose
rotated coordinates maximize the value of
\[\max(|x'_1-x'_2|,|y'_1-y'_2|).\]
This is easy, because either the horizontal or vertical
difference of the rotated coordinates has to be maximum.

847
chapter30.tex Normal file
View File

@ -0,0 +1,847 @@
\chapter{Sweep line algorithms}
\index{sweep line}
Many geometric problems can be solved using
\key{sweep line} algorithms.
The idea in such algorithms is to represent
an instance of the problem as a set of events that correspond
to points in the plane.
The events are processed in increasing order
according to their x or y coordinates.
As an example, consider the following problem:
There is a company that has $n$ employees,
and we know for each employee their arrival and
leaving times on a certain day.
Our task is to calculate the maximum number of
employees that were in the office at the same time.
The problem can be solved by modeling the situation
so that each employee is assigned two events that
correspond to their arrival and leaving times.
After sorting the events, we go through them
and keep track of the number of people in the office.
For example, the table
\begin{center}
\begin{tabular}{ccc}
person & arrival time & leaving time \\
\hline
John & 10 & 15 \\
Maria & 6 & 12 \\
Peter & 14 & 16 \\
Lisa & 5 & 13 \\
\end{tabular}
\end{center}
corresponds to the following events:
\begin{center}
\begin{tikzpicture}[scale=0.6]
\draw (0,0) rectangle (17,-6.5);
\path[draw,thick,-] (10,-1) -- (15,-1);
\path[draw,thick,-] (6,-2.5) -- (12,-2.5);
\path[draw,thick,-] (14,-4) -- (16,-4);
\path[draw,thick,-] (5,-5.5) -- (13,-5.5);
\draw[fill] (10,-1) circle [radius=0.05];
\draw[fill] (15,-1) circle [radius=0.05];
\draw[fill] (6,-2.5) circle [radius=0.05];
\draw[fill] (12,-2.5) circle [radius=0.05];
\draw[fill] (14,-4) circle [radius=0.05];
\draw[fill] (16,-4) circle [radius=0.05];
\draw[fill] (5,-5.5) circle [radius=0.05];
\draw[fill] (13,-5.5) circle [radius=0.05];
\node at (2,-1) {John};
\node at (2,-2.5) {Maria};
\node at (2,-4) {Peter};
\node at (2,-5.5) {Lisa};
\end{tikzpicture}
\end{center}
We go through the events from left to right
and maintain a counter.
Always when a person arrives, we increase
the value of the counter by one,
and when a person leaves,
we decrease the value of the counter by one.
The answer to the problem is the maximum
value of the counter during the algorithm.
In the example, the events are processed as follows:
\begin{center}
\begin{tikzpicture}[scale=0.6]
\path[draw,thick,->] (0.5,0.5) -- (16.5,0.5);
\draw (0,0) rectangle (17,-6.5);
\path[draw,thick,-] (10,-1) -- (15,-1);
\path[draw,thick,-] (6,-2.5) -- (12,-2.5);
\path[draw,thick,-] (14,-4) -- (16,-4);
\path[draw,thick,-] (5,-5.5) -- (13,-5.5);
\draw[fill] (10,-1) circle [radius=0.05];
\draw[fill] (15,-1) circle [radius=0.05];
\draw[fill] (6,-2.5) circle [radius=0.05];
\draw[fill] (12,-2.5) circle [radius=0.05];
\draw[fill] (14,-4) circle [radius=0.05];
\draw[fill] (16,-4) circle [radius=0.05];
\draw[fill] (5,-5.5) circle [radius=0.05];
\draw[fill] (13,-5.5) circle [radius=0.05];
\node at (2,-1) {John};
\node at (2,-2.5) {Maria};
\node at (2,-4) {Peter};
\node at (2,-5.5) {Lisa};
\path[draw,dashed] (10,0)--(10,-6.5);
\path[draw,dashed] (15,0)--(15,-6.5);
\path[draw,dashed] (6,0)--(6,-6.5);
\path[draw,dashed] (12,0)--(12,-6.5);
\path[draw,dashed] (14,0)--(14,-6.5);
\path[draw,dashed] (16,0)--(16,-6.5);
\path[draw,dashed] (5,0)--(5,-6.5);
\path[draw,dashed] (13,0)--(13,-6.5);
\node at (10,-7) {$+$};
\node at (15,-7) {$-$};
\node at (6,-7) {$+$};
\node at (12,-7) {$-$};
\node at (14,-7) {$+$};
\node at (16,-7) {$-$};
\node at (5,-7) {$+$};
\node at (13,-7) {$-$};
\node at (10,-8) {$3$};
\node at (15,-8) {$1$};
\node at (6,-8) {$2$};
\node at (12,-8) {$2$};
\node at (14,-8) {$2$};
\node at (16,-8) {$0$};
\node at (5,-8) {$1$};
\node at (13,-8) {$1$};
\end{tikzpicture}
\end{center}
The symbols $+$ and $-$ indicate whether the
value of the counter increases or decreases,
and the value of the counter is shown below.
The maximum value of the counter is 3
between John's arrival and Maria's leaving.
The running time of the algorithm is $O(n \log n)$,
because sorting the events takes $O(n \log n)$ time
and the rest of the algorithm takes $O(n)$ time.
\section{Intersection points}
\index{intersection point}
Given a set of $n$ line segments, each of them being either
horizontal or vertical, consider the problem of
counting the total number of intersection points.
For example, when the line segments are
\begin{center}
\begin{tikzpicture}[scale=0.5]
\path[draw,thick,-] (0,2) -- (5,2);
\path[draw,thick,-] (1,4) -- (6,4);
\path[draw,thick,-] (6,3) -- (10,3);
\path[draw,thick,-] (2,1) -- (2,6);
\path[draw,thick,-] (8,2) -- (8,5);
\end{tikzpicture}
\end{center}
there are three intersection points:
\begin{center}
\begin{tikzpicture}[scale=0.5]
\path[draw,thick,-] (0,2) -- (5,2);
\path[draw,thick,-] (1,4) -- (6,4);
\path[draw,thick,-] (6,3) -- (10,3);
\path[draw,thick,-] (2,1) -- (2,6);
\path[draw,thick,-] (8,2) -- (8,5);
\draw[fill] (2,2) circle [radius=0.15];
\draw[fill] (2,4) circle [radius=0.15];
\draw[fill] (8,3) circle [radius=0.15];
\end{tikzpicture}
\end{center}
It is easy to solve the problem in $O(n^2)$ time,
because we can go through all possible pairs of line segments
and check if they intersect.
However, we can solve the problem more efficiently
in $O(n \log n)$ time using a sweep line algorithm
and a range query data structure.
The idea is to process the endpoints of the line
segments from left to right and
focus on three types of events:
\begin{enumerate}[noitemsep]
\item[(1)] horizontal segment begins
\item[(2)] horizontal segment ends
\item[(3)] vertical segment
\end{enumerate}
The following events correspond to the example:
\begin{center}
\begin{tikzpicture}[scale=0.6]
\path[draw,dashed] (0,2) -- (5,2);
\path[draw,dashed] (1,4) -- (6,4);
\path[draw,dashed] (6,3) -- (10,3);
\path[draw,dashed] (2,1) -- (2,6);
\path[draw,dashed] (8,2) -- (8,5);
\node at (0,2) {$1$};
\node at (5,2) {$2$};
\node at (1,4) {$1$};
\node at (6,4) {$2$};
\node at (6,3) {$1$};
\node at (10,3) {$2$};
\node at (2,3.5) {$3$};
\node at (8,3.5) {$3$};
\end{tikzpicture}
\end{center}
We go through the events from left to right
and use a data structure that maintains a set of
y coordinates where there is an active horizontal segment.
At event 1, we add the y coordinate of the segment
to the set, and at event 2, we remove the
y coordinate from the set.
Intersection points are calculated at event 3.
When there is a vertical segment between points
$y_1$ and $y_2$, we count the number of active
horizontal segments whose y coordinate is between
$y_1$ and $y_2$, and add this number to the total
number of intersection points.
To store y coordinates of horizontal segments,
we can use a binary indexed or segment tree,
possibly with index compression.
When such structures are used, processing each event
takes $O(\log n)$ time, so the total running
time of the algorithm is $O(n \log n)$.
\section{Closest pair problem}
\index{closest pair}
Given a set of $n$ points, our next problem is
to find two points whose Euclidean distance is minimum.
For example, if the points are
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0)--(12,0)--(12,4)--(0,4)--(0,0);
\draw (1,2) circle [radius=0.1];
\draw (3,1) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5.5,1.5) circle [radius=0.1];
\draw (6,2.5) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (9,1.5) circle [radius=0.1];
\draw (10,2) circle [radius=0.1];
\draw (1.5,3.5) circle [radius=0.1];
\draw (1.5,1) circle [radius=0.1];
\draw (2.5,3) circle [radius=0.1];
\draw (4.5,1.5) circle [radius=0.1];
\draw (5.25,0.5) circle [radius=0.1];
\draw (6.5,2) circle [radius=0.1];
\end{tikzpicture}
\end{center}
\begin{samepage}
we should find the following points:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0)--(12,0)--(12,4)--(0,4)--(0,0);
\draw (1,2) circle [radius=0.1];
\draw (3,1) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5.5,1.5) circle [radius=0.1];
\draw[fill] (6,2.5) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (9,1.5) circle [radius=0.1];
\draw (10,2) circle [radius=0.1];
\draw (1.5,3.5) circle [radius=0.1];
\draw (1.5,1) circle [radius=0.1];
\draw (2.5,3) circle [radius=0.1];
\draw (4.5,1.5) circle [radius=0.1];
\draw (5.25,0.5) circle [radius=0.1];
\draw[fill] (6.5,2) circle [radius=0.1];
\end{tikzpicture}
\end{center}
\end{samepage}
This is another example of a problem
that can be solved in $O(n \log n)$ time
using a sweep line algorithm\footnote{Besides this approach,
there is also an
$O(n \log n)$ time divide-and-conquer algorithm \cite{sha75}
that divides the points into two sets and recursively
solves the problem for both sets.}.
We go through the points from left to right
and maintain a value $d$: the minimum distance
between two points seen so far.
At each point, we find the nearest point to the left.
If the distance is less than $d$, it is the
new minimum distance and we update
the value of $d$.
If the current point is $(x,y)$
and there is a point to the left
within a distance of less than $d$,
the x coordinate of such a point must
be between $[x-d,x]$ and the y coordinate
must be between $[y-d,y+d]$.
Thus, it suffices to only consider points
that are located in those ranges,
which makes the algorithm efficient.
For example, in the following picture, the
region marked with dashed lines contains
the points that can be within a distance of $d$
from the active point:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0)--(12,0)--(12,4)--(0,4)--(0,0);
\draw (1,2) circle [radius=0.1];
\draw (3,1) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5.5,1.5) circle [radius=0.1];
\draw (6,2.5) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (9,1.5) circle [radius=0.1];
\draw (10,2) circle [radius=0.1];
\draw (1.5,3.5) circle [radius=0.1];
\draw (1.5,1) circle [radius=0.1];
\draw (2.5,3) circle [radius=0.1];
\draw (4.5,1.5) circle [radius=0.1];
\draw (5.25,0.5) circle [radius=0.1];
\draw[fill] (6.5,2) circle [radius=0.1];
\draw[dashed] (6.5,0.75)--(6.5,3.25);
\draw[dashed] (5.25,0.75)--(5.25,3.25);
\draw[dashed] (5.25,0.75)--(6.5,0.75);
\draw[dashed] (5.25,3.25)--(6.5,3.25);
\draw [decoration={brace}, decorate, line width=0.3mm] (5.25,3.5) -- (6.5,3.5);
\node at (5.875,4) {$d$};
\draw [decoration={brace}, decorate, line width=0.3mm] (6.75,3.25) -- (6.75,2);
\node at (7.25,2.625) {$d$};
\end{tikzpicture}
\end{center}
The efficiency of the algorithm is based on the fact
that the region always contains
only $O(1)$ points.
We can go through those points in $O(\log n)$ time
by maintaining a set of points whose x coordinate
is between $[x-d,x]$, in increasing order according
to their y coordinates.
The time complexity of the algorithm is $O(n \log n)$,
because we go through $n$ points and
find for each point the nearest point to the left
in $O(\log n)$ time.
\section{Convex hull problem}
A \key{convex hull} is the smallest convex polygon
that contains all points of a given set.
Convexity means that a line segment between
any two vertices of the polygon is completely
inside the polygon.
\begin{samepage}
For example, for the points
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\end{tikzpicture}
\end{center}
\end{samepage}
the convex hull is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0)--(4,-1)--(7,1)--(6,3)--(2,4)--(0,2)--(0,0);
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\end{tikzpicture}
\end{center}
\index{Andrew's algorithm}
\key{Andrew's algorithm} \cite{and79} provides
an easy way to
construct the convex hull for a set of points
in $O(n \log n)$ time.
The algorithm first locates the leftmost
and rightmost points, and then
constructs the convex hull in two parts:
first the upper hull and then the lower hull.
Both parts are similar, so we can focus on
constructing the upper hull.
First, we sort the points primarily according to
x coordinates and secondarily according to y coordinates.
After this, we go through the points and
add each point to the hull.
Always after adding a point to the hull,
we make sure that the last line segment
in the hull does not turn left.
As long as it turns left, we repeatedly remove the
second last point from the hull.
The following pictures show how
Andrew's algorithm works:
\\
\begin{tabular}{ccccccc}
\\
\begin{tikzpicture}[scale=0.3]
\draw (-1,-2)--(8,-2)--(8,5)--(-1,5)--(-1,-2);
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\draw (0,0)--(0,2);
\end{tikzpicture}
& \hspace{0.1cm} &
\begin{tikzpicture}[scale=0.3]
\draw (-1,-2)--(8,-2)--(8,5)--(-1,5)--(-1,-2);
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\draw (0,0)--(0,2)--(1,1);
\end{tikzpicture}
& \hspace{0.1cm} &
\begin{tikzpicture}[scale=0.3]
\draw (-1,-2)--(8,-2)--(8,5)--(-1,5)--(-1,-2);
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\draw (0,0)--(0,2)--(1,1)--(2,2);
\end{tikzpicture}
& \hspace{0.1cm} &
\begin{tikzpicture}[scale=0.3]
\draw (-1,-2)--(8,-2)--(8,5)--(-1,5)--(-1,-2);
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\draw (0,0)--(0,2)--(2,2);
\end{tikzpicture}
\\
1 & & 2 & & 3 & & 4 \\
\end{tabular}
\\
\begin{tabular}{ccccccc}
\begin{tikzpicture}[scale=0.3]
\draw (-1,-2)--(8,-2)--(8,5)--(-1,5)--(-1,-2);
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\draw (0,0)--(0,2)--(2,2)--(2,4);
\end{tikzpicture}
& \hspace{0.1cm} &
\begin{tikzpicture}[scale=0.3]
\draw (-1,-2)--(8,-2)--(8,5)--(-1,5)--(-1,-2);
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\draw (0,0)--(0,2)--(2,4);
\end{tikzpicture}
& \hspace{0.1cm} &
\begin{tikzpicture}[scale=0.3]
\draw (-1,-2)--(8,-2)--(8,5)--(-1,5)--(-1,-2);
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\draw (0,0)--(0,2)--(2,4)--(3,2);
\end{tikzpicture}
& \hspace{0.1cm} &
\begin{tikzpicture}[scale=0.3]
\draw (-1,-2)--(8,-2)--(8,5)--(-1,5)--(-1,-2);
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\draw (0,0)--(0,2)--(2,4)--(3,2)--(4,-1);
\end{tikzpicture}
\\
5 & & 6 & & 7 & & 8 \\
\end{tabular}
\\
\begin{tabular}{ccccccc}
\begin{tikzpicture}[scale=0.3]
\draw (-1,-2)--(8,-2)--(8,5)--(-1,5)--(-1,-2);
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\draw (0,0)--(0,2)--(2,4)--(3,2)--(4,-1)--(4,0);
\end{tikzpicture}
& \hspace{0.1cm} &
\begin{tikzpicture}[scale=0.3]
\draw (-1,-2)--(8,-2)--(8,5)--(-1,5)--(-1,-2);
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\draw (0,0)--(0,2)--(2,4)--(3,2)--(4,0);
\end{tikzpicture}
& \hspace{0.1cm} &
\begin{tikzpicture}[scale=0.3]
\draw (-1,-2)--(8,-2)--(8,5)--(-1,5)--(-1,-2);
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\draw (0,0)--(0,2)--(2,4)--(3,2)--(4,0)--(4,3);
\end{tikzpicture}
& \hspace{0.1cm} &
\begin{tikzpicture}[scale=0.3]
\draw (-1,-2)--(8,-2)--(8,5)--(-1,5)--(-1,-2);
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\draw (0,0)--(0,2)--(2,4)--(3,2)--(4,3);
\end{tikzpicture}
\\
9 & & 10 & & 11 & & 12 \\
\end{tabular}
\\
\begin{tabular}{ccccccc}
\begin{tikzpicture}[scale=0.3]
\draw (-1,-2)--(8,-2)--(8,5)--(-1,5)--(-1,-2);
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\draw (0,0)--(0,2)--(2,4)--(4,3);
\end{tikzpicture}
& \hspace{0.1cm} &
\begin{tikzpicture}[scale=0.3]
\draw (-1,-2)--(8,-2)--(8,5)--(-1,5)--(-1,-2);
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\draw (0,0)--(0,2)--(2,4)--(4,3)--(5,2);
\end{tikzpicture}
& \hspace{0.1cm} &
\begin{tikzpicture}[scale=0.3]
\draw (-1,-2)--(8,-2)--(8,5)--(-1,5)--(-1,-2);
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\draw (0,0)--(0,2)--(2,4)--(4,3)--(5,2)--(6,1);
\end{tikzpicture}
& \hspace{0.1cm} &
\begin{tikzpicture}[scale=0.3]
\draw (-1,-2)--(8,-2)--(8,5)--(-1,5)--(-1,-2);
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\draw (0,0)--(0,2)--(2,4)--(4,3)--(5,2)--(6,1)--(6,3);
\end{tikzpicture}
\\
13 & & 14 & & 15 & & 16 \\
\end{tabular}
\\
\begin{tabular}{ccccccc}
\begin{tikzpicture}[scale=0.3]
\draw (-1,-2)--(8,-2)--(8,5)--(-1,5)--(-1,-2);
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\draw (0,0)--(0,2)--(2,4)--(4,3)--(5,2)--(6,3);
\end{tikzpicture}
& \hspace{0.1cm} &
\begin{tikzpicture}[scale=0.3]
\draw (-1,-2)--(8,-2)--(8,5)--(-1,5)--(-1,-2);
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\draw (0,0)--(0,2)--(2,4)--(4,3)--(6,3);
\end{tikzpicture}
& \hspace{0.1cm} &
\begin{tikzpicture}[scale=0.3]
\draw (-1,-2)--(8,-2)--(8,5)--(-1,5)--(-1,-2);
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\draw (0,0)--(0,2)--(2,4)--(6,3);
\end{tikzpicture}
& \hspace{0.1cm} &
\begin{tikzpicture}[scale=0.3]
\draw (-1,-2)--(8,-2)--(8,5)--(-1,5)--(-1,-2);
\draw (0,0) circle [radius=0.1];
\draw (4,-1) circle [radius=0.1];
\draw (7,1) circle [radius=0.1];
\draw (6,3) circle [radius=0.1];
\draw (2,4) circle [radius=0.1];
\draw (0,2) circle [radius=0.1];
\draw (1,1) circle [radius=0.1];
\draw (2,2) circle [radius=0.1];
\draw (3,2) circle [radius=0.1];
\draw (4,0) circle [radius=0.1];
\draw (4,3) circle [radius=0.1];
\draw (5,2) circle [radius=0.1];
\draw (6,1) circle [radius=0.1];
\draw (0,0)--(0,2)--(2,4)--(6,3)--(7,1);
\end{tikzpicture}
\\
17 & & 18 & & 19 & & 20
\end{tabular}

388
list.tex Normal file
View File

@ -0,0 +1,388 @@
\begin{thebibliography}{9}
\bibitem{aho83}
A. V. Aho, J. E. Hopcroft and J. Ullman.
\emph{Data Structures and Algorithms},
Addison-Wesley, 1983.
\bibitem{ahu91}
R. K. Ahuja and J. B. Orlin.
Distance directed augmenting path algorithms for maximum flow and parametric maximum flow problems.
\emph{Naval Research Logistics}, 38(3):413--430, 1991.
\bibitem{and79}
A. M. Andrew.
Another efficient algorithm for convex hulls in two dimensions.
\emph{Information Processing Letters}, 9(5):216--219, 1979.
\bibitem{asp79}
B. Aspvall, M. F. Plass and R. E. Tarjan.
A linear-time algorithm for testing the truth of certain quantified boolean formulas.
\emph{Information Processing Letters}, 8(3):121--123, 1979.
\bibitem{bel58}
R. Bellman.
On a routing problem.
\emph{Quarterly of Applied Mathematics}, 16(1):87--90, 1958.
\bibitem{bec07}
M. Beck, E. Pine, W. Tarrat and K. Y. Jensen.
New integer representations as the sum of three cubes.
\emph{Mathematics of Computation}, 76(259):1683--1690, 2007.
\bibitem{ben00}
M. A. Bender and M. Farach-Colton.
The LCA problem revisited. In
\emph{Latin American Symposium on Theoretical Informatics}, 88--94, 2000.
\bibitem{ben86}
J. Bentley.
\emph{Programming Pearls}.
Addison-Wesley, 1999 (2nd edition).
\bibitem{ben80}
J. Bentley and D. Wood.
An optimal worst case algorithm for reporting intersections of rectangles.
\emph{IEEE Transactions on Computers}, C-29(7):571--577, 1980.
\bibitem{bou01}
C. L. Bouton.
Nim, a game with a complete mathematical theory.
\emph{Annals of Mathematics}, 3(1/4):35--39, 1901.
% \bibitem{bur97}
% W. Burnside.
% \emph{Theory of Groups of Finite Order},
% Cambridge University Press, 1897.
\bibitem{coci}
Croatian Open Competition in Informatics, \url{http://hsin.hr/coci/}
\bibitem{cod15}
Codeforces: On ''Mo's algorithm'',
\url{http://codeforces.com/blog/entry/20032}
\bibitem{cor09}
T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein.
\emph{Introduction to Algorithms}, MIT Press, 2009 (3rd edition).
\bibitem{dij59}
E. W. Dijkstra.
A note on two problems in connexion with graphs.
\emph{Numerische Mathematik}, 1(1):269--271, 1959.
\bibitem{dik12}
K. Diks et al.
\emph{Looking for a Challenge? The Ultimate Problem Set from
the University of Warsaw Programming Competitions}, University of Warsaw, 2012.
% \bibitem{dil50}
% R. P. Dilworth.
% A decomposition theorem for partially ordered sets.
% \emph{Annals of Mathematics}, 51(1):161--166, 1950.
% \bibitem{dir52}
% G. A. Dirac.
% Some theorems on abstract graphs.
% \emph{Proceedings of the London Mathematical Society}, 3(1):69--81, 1952.
\bibitem{dim15}
M. Dima and R. Ceterchi.
Efficient range minimum queries using binary indexed trees.
\emph{Olympiad in Informatics}, 9(1):39--44, 2015.
\bibitem{edm65}
J. Edmonds.
Paths, trees, and flowers.
\emph{Canadian Journal of Mathematics}, 17(3):449--467, 1965.
\bibitem{edm72}
J. Edmonds and R. M. Karp.
Theoretical improvements in algorithmic efficiency for network flow problems.
\emph{Journal of the ACM}, 19(2):248--264, 1972.
\bibitem{eve75}
S. Even, A. Itai and A. Shamir.
On the complexity of time table and multi-commodity flow problems.
\emph{16th Annual Symposium on Foundations of Computer Science}, 184--193, 1975.
\bibitem{fan94}
D. Fanding.
A faster algorithm for shortest-path -- SPFA.
\emph{Journal of Southwest Jiaotong University}, 2, 1994.
\bibitem{fen94}
P. M. Fenwick.
A new data structure for cumulative frequency tables.
\emph{Software: Practice and Experience}, 24(3):327--336, 1994.
\bibitem{fis06}
J. Fischer and V. Heun.
Theoretical and practical improvements on the RMQ-problem, with applications to LCA and LCE.
In \emph{Annual Symposium on Combinatorial Pattern Matching}, 36--48, 2006.
\bibitem{flo62}
R. W. Floyd
Algorithm 97: shortest path.
\emph{Communications of the ACM}, 5(6):345, 1962.
\bibitem{for56a}
L. R. Ford.
Network flow theory.
RAND Corporation, Santa Monica, California, 1956.
\bibitem{for56}
L. R. Ford and D. R. Fulkerson.
Maximal flow through a network.
\emph{Canadian Journal of Mathematics}, 8(3):399--404, 1956.
\bibitem{fre77}
R. Freivalds.
Probabilistic machines can use less running time.
In \emph{IFIP congress}, 839--842, 1977.
\bibitem{gal14}
F. Le Gall.
Powers of tensors and fast matrix multiplication.
In \emph{Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation},
296--303, 2014.
\bibitem{gar79}
M. R. Garey and D. S. Johnson.
\emph{Computers and Intractability:
A Guide to the Theory of NP-Completeness},
W. H. Freeman and Company, 1979.
\bibitem{goo17}
Google Code Jam Statistics (2017),
\url{https://www.go-hero.net/jam/17}
\bibitem{gro14}
A. Grønlund and S. Pettie.
Threesomes, degenerates, and love triangles.
In \emph{Proceedings of the 55th Annual Symposium on Foundations of Computer Science},
621--630, 2014.
\bibitem{gru39}
P. M. Grundy.
Mathematics and games.
\emph{Eureka}, 2(5):6--8, 1939.
\bibitem{gus97}
D. Gusfield.
\emph{Algorithms on Strings, Trees and Sequences:
Computer Science and Computational Biology},
Cambridge University Press, 1997.
% \bibitem{hal35}
% P. Hall.
% On representatives of subsets.
% \emph{Journal London Mathematical Society} 10(1):26--30, 1935.
\bibitem{hal13}
S. Halim and F. Halim.
\emph{Competitive Programming 3: The New Lower Bound of Programming Contests}, 2013.
\bibitem{hel62}
M. Held and R. M. Karp.
A dynamic programming approach to sequencing problems.
\emph{Journal of the Society for Industrial and Applied Mathematics}, 10(1):196--210, 1962.
\bibitem{hie73}
C. Hierholzer and C. Wiener.
Über die Möglichkeit, einen Linienzug ohne Wiederholung und ohne Unterbrechung zu umfahren.
\emph{Mathematische Annalen}, 6(1), 30--32, 1873.
\bibitem{hoa61a}
C. A. R. Hoare.
Algorithm 64: Quicksort.
\emph{Communications of the ACM}, 4(7):321, 1961.
\bibitem{hoa61b}
C. A. R. Hoare.
Algorithm 65: Find.
\emph{Communications of the ACM}, 4(7):321--322, 1961.
\bibitem{hop71}
J. E. Hopcroft and J. D. Ullman.
A linear list merging algorithm.
Technical report, Cornell University, 1971.
\bibitem{hor74}
E. Horowitz and S. Sahni.
Computing partitions with applications to the knapsack problem.
\emph{Journal of the ACM}, 21(2):277--292, 1974.
\bibitem{huf52}
D. A. Huffman.
A method for the construction of minimum-redundancy codes.
\emph{Proceedings of the IRE}, 40(9):1098--1101, 1952.
\bibitem{iois}
The International Olympiad in Informatics Syllabus,
\url{https://people.ksp.sk/~misof/ioi-syllabus/}
\bibitem{kar87}
R. M. Karp and M. O. Rabin.
Efficient randomized pattern-matching algorithms.
\emph{IBM Journal of Research and Development}, 31(2):249--260, 1987.
\bibitem{kas61}
P. W. Kasteleyn.
The statistics of dimers on a lattice: I. The number of dimer arrangements on a quadratic lattice.
\emph{Physica}, 27(12):1209--1225, 1961.
\bibitem{ken06}
C. Kent, G. M. Landau and M. Ziv-Ukelson.
On the complexity of sparse exon assembly.
\emph{Journal of Computational Biology}, 13(5):1013--1027, 2006.
\bibitem{kle05}
J. Kleinberg and É. Tardos.
\emph{Algorithm Design}, Pearson, 2005.
\bibitem{knu982}
D. E. Knuth.
\emph{The Art of Computer Programming. Volume 2: Seminumerical Algorithms}, AddisonWesley, 1998 (3rd edition).
\bibitem{knu983}
D. E. Knuth.
\emph{The Art of Computer Programming. Volume 3: Sorting and Searching}, AddisonWesley, 1998 (2nd edition).
% \bibitem{kon31}
% D. Kőnig.
% Gráfok és mátrixok.
% \emph{Matematikai és Fizikai Lapok}, 38(1):116--119, 1931.
\bibitem{kru56}
J. B. Kruskal.
On the shortest spanning subtree of a graph and the traveling salesman problem.
\emph{Proceedings of the American Mathematical Society}, 7(1):48--50, 1956.
\bibitem{lev66}
V. I. Levenshtein.
Binary codes capable of correcting deletions, insertions, and reversals.
\emph{Soviet physics doklady}, 10(8):707--710, 1966.
\bibitem{mai84}
M. G. Main and R. J. Lorentz.
An $O(n \log n)$ algorithm for finding all repetitions in a string.
\emph{Journal of Algorithms}, 5(3):422--432, 1984.
% \bibitem{ore60}
% Ø. Ore.
% Note on Hamilton circuits.
% \emph{The American Mathematical Monthly}, 67(1):55, 1960.
\bibitem{pac13}
J. Pachocki and J. Radoszewski.
Where to use and how not to use polynomial string hashing.
\emph{Olympiads in Informatics}, 7(1):90--100, 2013.
\bibitem{par97}
I. Parberry.
An efficient algorithm for the Knight's tour problem.
\emph{Discrete Applied Mathematics}, 73(3):251--260, 1997.
% \bibitem{pic99}
% G. Pick.
% Geometrisches zur Zahlenlehre.
% \emph{Sitzungsberichte des deutschen naturwissenschaftlich-medicinischen Vereines
% für Böhmen "Lotos" in Prag. (Neue Folge)}, 19:311--319, 1899.
\bibitem{pea05}
D. Pearson.
A polynomial-time algorithm for the change-making problem.
\emph{Operations Research Letters}, 33(3):231--234, 2005.
\bibitem{pri57}
R. C. Prim.
Shortest connection networks and some generalizations.
\emph{Bell System Technical Journal}, 36(6):1389--1401, 1957.
% \bibitem{pru18}
% H. Prüfer.
% Neuer Beweis eines Satzes über Permutationen.
% \emph{Arch. Math. Phys}, 27:742--744, 1918.
\bibitem{q27}
27-Queens Puzzle: Massively Parallel Enumeration and Solution Counting.
\url{https://github.com/preusser/q27}
\bibitem{sha75}
M. I. Shamos and D. Hoey.
Closest-point problems.
In \emph{Proceedings of the 16th Annual Symposium on Foundations of Computer Science}, 151--162, 1975.
\bibitem{sha81}
M. Sharir.
A strong-connectivity algorithm and its applications in data flow analysis.
\emph{Computers \& Mathematics with Applications}, 7(1):67--72, 1981.
\bibitem{ski08}
S. S. Skiena.
\emph{The Algorithm Design Manual}, Springer, 2008 (2nd edition).
\bibitem{ski03}
S. S. Skiena and M. A. Revilla.
\emph{Programming Challenges: The Programming Contest Training Manual},
Springer, 2003.
\bibitem{main}
SZKOpuł, \texttt{https://szkopul.edu.pl/}
\bibitem{spr35}
R. Sprague.
Über mathematische Kampfspiele.
\emph{Tohoku Mathematical Journal}, 41:438--444, 1935.
\bibitem{sta06}
P. Stańczyk.
\emph{Algorytmika praktyczna w konkursach Informatycznych},
MSc thesis, University of Warsaw, 2006.
\bibitem{str69}
V. Strassen.
Gaussian elimination is not optimal.
\emph{Numerische Mathematik}, 13(4):354--356, 1969.
\bibitem{tar75}
R. E. Tarjan.
Efficiency of a good but not linear set union algorithm.
\emph{Journal of the ACM}, 22(2):215--225, 1975.
\bibitem{tar79}
R. E. Tarjan.
Applications of path compression on balanced trees.
\emph{Journal of the ACM}, 26(4):690--715, 1979.
\bibitem{tar84}
R. E. Tarjan and U. Vishkin.
Finding biconnected componemts and computing tree functions in logarithmic parallel time.
In \emph{Proceedings of the 25th Annual Symposium on Foundations of Computer Science}, 12--20, 1984.
\bibitem{tem61}
H. N. V. Temperley and M. E. Fisher.
Dimer problem in statistical mechanics -- an exact result.
\emph{Philosophical Magazine}, 6(68):1061--1063, 1961.
\bibitem{usaco}
USA Computing Olympiad, \url{http://www.usaco.org/}
\bibitem{war23}
H. C. von Warnsdorf.
\emph{Des Rösselsprunges einfachste und allgemeinste Lösung}.
Schmalkalden, 1823.
\bibitem{war62}
S. Warshall.
A theorem on boolean matrices.
\emph{Journal of the ACM}, 9(1):11--12, 1962.
% \bibitem{zec72}
% E. Zeckendorf.
% Représentation des nombres naturels par une somme de nombres de Fibonacci ou de nombres de Lucas.
% \emph{Bull. Soc. Roy. Sci. Liege}, 41:179--182, 1972.
\end{thebibliography}

33
preface.tex Normal file
View File

@ -0,0 +1,33 @@
\chapter*{Preface}
\markboth{\MakeUppercase{Preface}}{}
\addcontentsline{toc}{chapter}{Preface}
The purpose of this book is to give you
a thorough introduction to competitive programming.
It is assumed that you already
know the basics of programming, but no previous
background in competitive programming is needed.
The book is especially intended for
students who want to learn algorithms and
possibly participate in
the International Olympiad in Informatics (IOI) or
in the International Collegiate Programming Contest (ICPC).
Of course, the book is also suitable for
anybody else interested in competitive programming.
It takes a long time to become a good competitive
programmer, but it is also an opportunity to learn a lot.
You can be sure that you will get
a good general understanding of algorithms
if you spend time reading the book,
solving problems and taking part in contests.
The book is under continuous development.
You can always send feedback on the book to
\texttt{ahslaaks@cs.helsinki.fi}.
\begin{flushright}
Helsinki, August 2019 \\
Antti Laaksonen
\end{flushright}