\chapter{Probability}

\index{probability}

A \key{probability} is a real number between $0$ and $1$
that indicates how probable an event is.
If an event is certain to happen,
its probability is 1,
and if an event is impossible,
its probability is 0.

A typical example is throwing a dice,
where the result is one of the integers $1,2,\ldots,6$.
Usually it is assumed that the probability
of each result is $1/6$,
so all results have the same probability.

The probability of an event is denoted $P(\cdots)$
where the three dots are
a description of the event.
For example, when throwing a dice,
$P(\textrm{''the result is 4''})=1/6$,
$P(\textrm{''the result is not 6''})=5/6$
and $P(\textrm{''the result is even''})=1/2$.
\section{Calculation}

There are two standard ways to calculate
probabilities: combinatorial counting
and simulating a process.
As an example, let's calculate the probability
of drawing three cards with the same value
from a shuffled deck of cards
(for example, eight of spades,
eight of clubs and eight of diamonds).

\subsubsection*{Method 1}

We can calculate the probability using
the formula

\[\frac{\textrm{desired cases}}{\textrm{all cases}}.\]

In this problem, the desired cases are those
in which the value of each card is the same.
There are $13 {4 \choose 3}$ such cases,
because there are $13$ possibilities for the
value of the cards and ${4 \choose 3}$ ways to
choose $3$ suits from $4$ possible suits.

The number of all cases is ${52 \choose 3}$,
because we choose 3 cards from 52 cards.
Thus, the probability of the event is

\[\frac{13 {4 \choose 3}}{{52 \choose 3}} = \frac{1}{425}.\]
\subsubsection*{Method 2}

Another way to calculate the probability is
to simulate the process that generates the event.
In this case, we draw three cards, so the process
consists of three steps.
We require that each step in the process is successful.

Drawing the first card certainly succeeds,
because any card will do.
After this, the value of the cards has been fixed.
The second step succeeds with probability $3/51$,
because there are 51 cards left and 3 of them
have the same value as the first card.
Finally, the third step succeeds with probability $2/50$.

The probability that the entire process succeeds is

\[1 \cdot \frac{3}{51} \cdot \frac{2}{50} = \frac{1}{425}.\]
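Both methods give the same result. The probability can also be
estimated experimentally: the following sketch (the random seed
and the number of trials are arbitrary choices) repeatedly
shuffles a deck of 52 cards and counts how often the three
topmost cards have the same value. The estimate should approach
$1/425 \approx 0.0024$.
\begin{lstlisting}
#include <bits/stdc++.h>
using namespace std;

int main() {
    // card i has value i/4 (0..12) and suit i%4 (0..3)
    vector<int> deck(52);
    iota(deck.begin(), deck.end(), 0);

    mt19937 rng(12345);
    const int trials = 1000000;
    int hits = 0;
    for (int t = 0; t < trials; t++) {
        shuffle(deck.begin(), deck.end(), rng);
        // compare the values of the three topmost cards
        if (deck[0]/4 == deck[1]/4 && deck[1]/4 == deck[2]/4) hits++;
    }
    cout << (double)hits/trials << "\n"; // about 1/425 = 0.00235
}
\end{lstlisting}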
\section{Events}

An event in probability can be represented as a set
\[A \subset X,\]
where $X$ contains all possible outcomes,
and $A$ is a subset of outcomes.
For example, when throwing a dice, the outcomes are
\[X = \{x_1,x_2,x_3,x_4,x_5,x_6\},\]
where $x_k$ means the result $k$.
Now, for example, the event ''the result is even''
corresponds to the set
\[A = \{x_2,x_4,x_6\}.\]

Each outcome $x$ is assigned a probability $p(x)$.
Furthermore, the probability $P(A)$ of an event
that corresponds to a set $A$ can be calculated as a sum
of probabilities of outcomes using the formula
\[P(A) = \sum_{x \in A} p(x).\]
For example, when throwing a dice,
$p(x)=1/6$ for each outcome $x$,
so the probability of the event
''the result is even'' is
\[p(x_2)+p(x_4)+p(x_6)=1/2.\]

The total probability of the outcomes in $X$ must
be 1, i.e., $P(X)=1$.

Since the events in probability are sets,
we can manipulate them using standard set operations:
\begin{itemize}
\item The \key{complement} $\bar A$ means
''$A$ doesn't happen''.
For example, when throwing a dice,
the complement of $A=\{x_2,x_4,x_6\}$ is
$\bar A = \{x_1,x_3,x_5\}$.
\item The \key{union} $A \cup B$ means
''$A$ or $B$ happens''.
For example, the union of
$A=\{x_2,x_5\}$
and $B=\{x_4,x_5,x_6\}$ is
$A \cup B = \{x_2,x_4,x_5,x_6\}$.
\item The \key{intersection} $A \cap B$ means
''$A$ and $B$ happen''.
For example, the intersection of
$A=\{x_2,x_5\}$ and $B=\{x_4,x_5,x_6\}$ is
$A \cap B = \{x_5\}$.
\end{itemize}
\subsubsection{Complement}

The probability of the complement
$\bar A$ is calculated using the formula
\[P(\bar A)=1-P(A).\]

Sometimes, we can solve a problem easily
using complements by solving the opposite problem.
For example, the probability of getting
at least one six when throwing a dice ten times is
\[1-(5/6)^{10}.\]

Here $5/6$ is the probability that the result
of a single throw is not six, and
$(5/6)^{10}$ is the probability that none of
the ten throws is a six.
The complement of this is the answer to the problem.
\subsubsection{Union}

The probability of the union $A \cup B$
is calculated using the formula
\[P(A \cup B)=P(A)+P(B)-P(A \cap B).\]
For example, when throwing a dice,
the union of events
\[A=\textrm{''the result is even''}\]
and
\[B=\textrm{''the result is less than 4''}\]
is
\[A \cup B=\textrm{''the result is even or less than 4''},\]
and its probability is
\[P(A \cup B) = P(A)+P(B)-P(A \cap B)=1/2+1/2-1/6=5/6.\]

If the events $A$ and $B$ are \key{disjoint}, i.e.,
$A \cap B$ is empty,
the probability of the event $A \cup B$ is simply
\[P(A \cup B)=P(A)+P(B).\]
\subsubsection{Conditional probability}

\index{conditional probability}

The \key{conditional probability}
\[P(A | B) = \frac{P(A \cap B)}{P(B)}\]
is the probability of an event $A$
assuming that an event $B$ happens.
In this case, when calculating the
probability of $A$, we only consider the outcomes
that also belong to $B$.

Using the sets in the previous example,
\[P(A | B)= 1/3,\]
because the outcomes in $B$ are
$\{x_1,x_2,x_3\}$, and one of them is even.
This is the probability of an even result
if we know that the result is between $1$ and $3$.
\subsubsection{Intersection}

\index{independence}

Using conditional probability,
the probability of the intersection
$A \cap B$ can be calculated using the formula
\[P(A \cap B)=P(A)P(B|A).\]
Events $A$ and $B$ are \key{independent} if
\[P(A|B)=P(A) \hspace{10px}\textrm{and}\hspace{10px} P(B|A)=P(B),\]
which means that the fact that $B$ happens doesn't
change the probability of $A$, and vice versa.
In this case, the probability of the intersection is
\[P(A \cap B)=P(A)P(B).\]
For example, when drawing a card from a deck, the events
\[A = \textrm{''the suit is clubs''}\]
and
\[B = \textrm{''the value is four''}\]
are independent. Hence the event
\[A \cap B = \textrm{''the card is the four of clubs''}\]
happens with probability
\[P(A \cap B)=P(A)P(B)=1/4 \cdot 1/13 = 1/52.\]
\section{Random variables}

\index{random variable}

A \key{random variable} is a value that is generated
by a random process.
For example, when throwing two dice,
a possible random variable is
\[X=\textrm{''the sum of the results''}.\]
For example, if the results are $(4,6)$,
then the value of $X$ is 10.

We denote by $P(X=x)$ the probability that
the value of a random variable $X$ is $x$.
In the previous example, $P(X=10)=3/36$,
because the total number of results is 36,
and the possible ways to obtain the sum 10 are
$(4,6)$, $(5,5)$ and $(6,4)$.
\subsubsection{Expected value}

\index{expected value}

The \key{expected value} $E[X]$ indicates the
average value of a random variable $X$.
The expected value can be calculated as the sum
\[\sum_x P(X=x)x,\]
where $x$ goes through all possible values
of $X$.

For example, when throwing a dice,
the expected value is
\[1/6 \cdot 1 + 1/6 \cdot 2 + 1/6 \cdot 3 + 1/6 \cdot 4 + 1/6 \cdot 5 + 1/6 \cdot 6 = 7/2.\]

A useful property of expected values is \key{linearity}.
It means that the sum
$E[X_1+X_2+\cdots+X_n]$
always equals the sum
$E[X_1]+E[X_2]+\cdots+E[X_n]$.
This formula holds even if the random variables
depend on each other.

For example, when throwing two dice,
the expected value of their sum is
\[E[X_1+X_2]=E[X_1]+E[X_2]=7/2+7/2=7.\]
Let's now consider a problem where
$n$ balls are randomly placed in $n$ boxes,
and our task is to calculate the expected
number of empty boxes.
Each ball has an equal probability of being
placed in any of the boxes.
For example, if $n=2$, the possibilities
are as follows:
\begin{center}
\begin{tikzpicture}
\draw (0,0) rectangle (1,1);
\draw (1.2,0) rectangle (2.2,1);
\draw (3,0) rectangle (4,1);
\draw (4.2,0) rectangle (5.2,1);
\draw (6,0) rectangle (7,1);
\draw (7.2,0) rectangle (8.2,1);
\draw (9,0) rectangle (10,1);
\draw (10.2,0) rectangle (11.2,1);

\draw[fill=blue] (0.5,0.2) circle (0.1);
\draw[fill=red] (1.7,0.2) circle (0.1);
\draw[fill=red] (3.5,0.2) circle (0.1);
\draw[fill=blue] (4.7,0.2) circle (0.1);
\draw[fill=blue] (6.25,0.2) circle (0.1);
\draw[fill=red] (6.75,0.2) circle (0.1);
\draw[fill=blue] (10.45,0.2) circle (0.1);
\draw[fill=red] (10.95,0.2) circle (0.1);
\end{tikzpicture}
\end{center}
In this case, the expected number of
empty boxes is
\[\frac{0+0+1+1}{4} = \frac{1}{2}.\]
In the general case, the probability that a
single box is empty is
\[\Big(\frac{n-1}{n}\Big)^n,\]
because no ball should be placed in it.
Hence, using linearity, the expected number of
empty boxes is
\[n \cdot \Big(\frac{n-1}{n}\Big)^n.\]
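We can check the formula against a simulation. The following
sketch (the value $n=10$ and the number of trials are arbitrary
choices) places $n$ balls into $n$ random boxes repeatedly and
averages the number of empty boxes; the simulated average and
the value given by the formula should both be about $3.49$
when $n=10$.
\begin{lstlisting}
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n = 10;
    mt19937 rng(12345);
    uniform_int_distribution<int> box(0, n-1);

    const int trials = 100000;
    long long totalEmpty = 0;
    for (int t = 0; t < trials; t++) {
        // place each ball into a random box
        vector<int> cnt(n, 0);
        for (int i = 0; i < n; i++) cnt[box(rng)]++;
        totalEmpty += count(cnt.begin(), cnt.end(), 0);
    }
    cout << (double)totalEmpty/trials << "\n"; // simulated average
    cout << n*pow((n-1.0)/n, n) << "\n";       // value given by the formula
}
\end{lstlisting}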
\subsubsection{Distributions}

\index{distribution}

The \key{distribution} of a random variable $X$
shows the probability for each value that
the random variable may have.
The distribution consists of the values $P(X=x)$.
For example, when throwing two dice,
the distribution for their sum is:
\begin{center}
\small {
\begin{tabular}{r|rrrrrrrrrrr}
$x$ & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\
$P(X=x)$ & $1/36$ & $2/36$ & $3/36$ & $4/36$ & $5/36$ & $6/36$ & $5/36$ & $4/36$ & $3/36$ & $2/36$ & $1/36$ \\
\end{tabular}
}
\end{center}
Next, we will discuss three distributions that
often arise in applications.

\index{uniform distribution}
~\\\\
In a \key{uniform distribution},
the value of a random variable is
an integer between $a$ and $b$, and the probability
of each value is the same.
For example, throwing a dice generates
a uniform distribution where
$P(X=x)=1/6$ when $x=1,2,\ldots,6$.

The expected value of $X$ in a uniform distribution is
\[E[X] = \frac{a+b}{2}.\]

\index{binomial distribution}
~\\
In a \key{binomial distribution}, $n$ attempts
are made,
and the probability that a single attempt succeeds
is $p$.
The random variable $X$ counts the number of
successful attempts,
and the probability of a value $x$ is
\[P(X=x)=p^x (1-p)^{n-x} {n \choose x},\]
where $p^x$ and $(1-p)^{n-x}$ correspond to
successful and unsuccessful attempts,
and ${n \choose x}$ is the number of ways
we can choose the order of the attempts.

For example, when throwing a dice ten times,
the probability of throwing a six exactly
three times is $(1/6)^3 (5/6)^7 {10 \choose 3}$.

The expected value of $X$ in a binomial distribution is
\[E[X] = pn.\]

\index{geometric distribution}
~\\
In a \key{geometric distribution},
the probability that an attempt succeeds is $p$,
and we make attempts until the first success happens.
The random variable $X$ counts the number
of attempts needed, and the probability of
a value $x$ is
\[P(X=x)=(1-p)^{x-1} p,\]
where $(1-p)^{x-1}$ corresponds to the unsuccessful attempts
and $p$ corresponds to the first successful attempt.

For example, if we throw a dice until we throw a six,
the probability that the number of throws
is exactly 4 is $(5/6)^3 \cdot 1/6$.

The expected value of $X$ in a geometric distribution is
\[E[X]=\frac{1}{p}.\]
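As a concrete check of the last formula, the following sketch
(the seed and the number of trials are arbitrary choices) throws
a dice until a six appears and averages the number of throws;
the average should be close to $1/p=6$.
\begin{lstlisting}
#include <bits/stdc++.h>
using namespace std;

int main() {
    mt19937 rng(12345);
    uniform_int_distribution<int> dice(1, 6);

    const int trials = 1000000;
    long long total = 0;
    for (int t = 0; t < trials; t++) {
        // number of throws until the first six (geometric, p = 1/6)
        int throws = 0;
        do { throws++; } while (dice(rng) != 6);
        total += throws;
    }
    cout << (double)total/trials << "\n"; // close to 6
}
\end{lstlisting}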
\section{Markov chains}

\index{Markov chain}

A \key{Markov chain} is a random process
that consists of states and transitions between them.
For each state, we know the probabilities
for moving to other states.
A Markov chain can be represented as a graph
whose nodes are states and edges are transitions.

As an example, let's consider a problem
where we are on floor 1 of an $n$ floor building.
At each step, we randomly walk either one floor
up or one floor down, except that we always
walk one floor up from floor 1 and one floor down
from floor $n$.
What is the probability that we are on floor $m$
after $k$ steps?

In this problem, each floor of the building
corresponds to a state in a Markov chain.
For example, if $n=5$, the graph is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,0) {$1$};
\node[draw, circle] (2) at (2,0) {$2$};
\node[draw, circle] (3) at (4,0) {$3$};
\node[draw, circle] (4) at (6,0) {$4$};
\node[draw, circle] (5) at (8,0) {$5$};

\path[draw,thick,->] (1) edge [bend left=40] node[font=\small,label=$1$] {} (2);
\path[draw,thick,->] (2) edge [bend left=40] node[font=\small,label=$1/2$] {} (3);
\path[draw,thick,->] (3) edge [bend left=40] node[font=\small,label=$1/2$] {} (4);
\path[draw,thick,->] (4) edge [bend left=40] node[font=\small,label=$1/2$] {} (5);

\path[draw,thick,->] (5) edge [bend left=40] node[font=\small,label=below:$1$] {} (4);
\path[draw,thick,->] (4) edge [bend left=40] node[font=\small,label=below:$1/2$] {} (3);
\path[draw,thick,->] (3) edge [bend left=40] node[font=\small,label=below:$1/2$] {} (2);
\path[draw,thick,->] (2) edge [bend left=40] node[font=\small,label=below:$1/2$] {} (1);
\end{tikzpicture}
\end{center}
The probability distribution
of a Markov chain is a vector
$[p_1,p_2,\ldots,p_n]$, where $p_k$ is the
probability that the current state is $k$.
The formula $p_1+p_2+\cdots+p_n=1$ always holds.

In the example, the initial distribution is
$[1,0,0,0,0]$, because we always begin on floor 1.
The next distribution is $[0,1,0,0,0]$,
because we can only move from floor 1 to floor 2.
After this, we can either move one floor up
or one floor down, so the next distribution is
$[1/2,0,1/2,0,0]$, etc.

An efficient way to simulate the walk in
a Markov chain is to use dynamic programming.
The idea is to maintain the probability distribution
and at each step go through all the possible ways
in which we can move.
Using this method, we can simulate $m$ steps
in $O(n^2 m)$ time.
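For the floor-walk example, the simulation can be implemented
roughly as follows (the values of $n$ and $m$ are arbitrary
example choices). Since every state here has at most two
outgoing transitions, the loop actually runs in $O(nm)$ time;
in a general chain each step would go through all $n$ target
states, which gives the $O(n^2 m)$ bound.
\begin{lstlisting}
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n = 5, m = 3; // number of floors and number of steps
    // prob[i] = probability that we are on floor i+1
    vector<double> prob(n, 0.0);
    prob[0] = 1.0; // we begin on floor 1

    for (int step = 0; step < m; step++) {
        vector<double> next(n, 0.0);
        for (int i = 0; i < n; i++) {
            if (prob[i] == 0) continue;
            if (i == 0) next[1] += prob[i];          // from floor 1 always up
            else if (i == n-1) next[n-2] += prob[i]; // from floor n always down
            else {
                next[i-1] += prob[i]/2;              // otherwise down or up
                next[i+1] += prob[i]/2;              // with probability 1/2 each
            }
        }
        prob = next;
    }
    for (double p : prob) cout << p << " "; // 0 0.75 0 0.25 0
    cout << "\n";
}
\end{lstlisting}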
The transitions of a Markov chain can also be
represented as a matrix that updates the
probability distribution.
In this case, the matrix is
\[
\begin{bmatrix}
0 & 1/2 & 0 & 0 & 0 \\
1 & 0 & 1/2 & 0 & 0 \\
0 & 1/2 & 0 & 1/2 & 0 \\
0 & 0 & 1/2 & 0 & 1 \\
0 & 0 & 0 & 1/2 & 0 \\
\end{bmatrix}.
\]
When we multiply a probability distribution by this matrix,
we get the new distribution after moving one step.
For example, we can move from the distribution
$[1,0,0,0,0]$ to the distribution
$[0,1,0,0,0]$ as follows:
\[
\begin{bmatrix}
0 & 1/2 & 0 & 0 & 0 \\
1 & 0 & 1/2 & 0 & 0 \\
0 & 1/2 & 0 & 1/2 & 0 \\
0 & 0 & 1/2 & 0 & 1 \\
0 & 0 & 0 & 1/2 & 0 \\
\end{bmatrix}
\begin{bmatrix}
1 \\
0 \\
0 \\
0 \\
0 \\
\end{bmatrix}
=
\begin{bmatrix}
0 \\
1 \\
0 \\
0 \\
0 \\
\end{bmatrix}.
\]
By calculating matrix powers efficiently,
we can calculate the distribution
after $m$ steps in $O(n^3 \log m)$ time.
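One possible way to implement this for the floor-walk example is
sketched below: the transition matrix is raised to the power $m$
with repeated squaring, and the distribution after $m$ steps when
starting from floor 1 is the first column of the result.
\begin{lstlisting}
#include <bits/stdc++.h>
using namespace std;
typedef vector<vector<double>> Matrix;

// multiply two n x n matrices in O(n^3) time
Matrix mul(const Matrix& a, const Matrix& b) {
    int n = a.size();
    Matrix c(n, vector<double>(n, 0.0));
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            for (int j = 0; j < n; j++)
                c[i][j] += a[i][k]*b[k][j];
    return c;
}

// compute a^m with repeated squaring in O(n^3 log m) time
Matrix matpow(Matrix a, long long m) {
    int n = a.size();
    Matrix result(n, vector<double>(n, 0.0));
    for (int i = 0; i < n; i++) result[i][i] = 1.0; // identity matrix
    while (m > 0) {
        if (m & 1) result = mul(result, a);
        a = mul(a, a);
        m >>= 1;
    }
    return result;
}

int main() {
    // transition matrix of the five-floor example
    Matrix t = {{0,0.5,0,0,0},
                {1,0,0.5,0,0},
                {0,0.5,0,0.5,0},
                {0,0,0.5,0,1},
                {0,0,0,0.5,0}};
    Matrix p = matpow(t, 3);
    // distribution after 3 steps when starting from floor 1
    for (int i = 0; i < 5; i++) cout << p[i][0] << " "; // 0 0.75 0 0.25 0
    cout << "\n";
}
\end{lstlisting}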
\section{Randomized algorithms}

\index{randomized algorithm}

Sometimes we can use randomness to solve a problem,
even if the problem is not related to random events.
A \key{randomized algorithm} is an algorithm that
is based on randomness.

\index{Monte Carlo algorithm}

A \key{Monte Carlo algorithm} is a randomized algorithm
that may sometimes give a wrong answer.
For such an algorithm to be useful,
the probability of a wrong answer should be small.

\index{Las Vegas algorithm}

A \key{Las Vegas algorithm} is a randomized algorithm
that always gives the correct answer,
but its running time varies randomly.
The goal is to design an algorithm that is
efficient with high probability.

Next we will go through three example problems that
can be solved using randomness.
\subsubsection{Order statistics}

\index{order statistic}

The $k$th \key{order statistic} of an array
is the element at index $k$ after sorting
the array in increasing order.
It's easy to calculate any order statistic
in $O(n \log n)$ time by sorting the array,
but do we really need to sort the whole array
just to find one element?

It turns out that we can find order statistics
using a randomized algorithm without sorting the array.
The algorithm is a Las Vegas algorithm:
its running time is usually $O(n)$,
but $O(n^2)$ in the worst case.

The algorithm chooses a random element $x$
in the array, and moves elements smaller than $x$
to the left part of the array,
and the other elements to the right part of the array.
This takes $O(n)$ time when there are $n$ elements.
Assume that the left part contains $a$ elements
and the right part contains $b$ elements.
If $a=k-1$, element $x$ is the $k$th order statistic.
Otherwise, if $a>k-1$, we recursively find the $k$th order
statistic for the left part,
and if $a<k-1$, we recursively find the $r$th order
statistic for the right part where $r=k-a-1$.
The search continues like this until the element
has been found.

Since each element $x$ is randomly chosen,
the size of the array roughly halves at each step,
so the time complexity for
finding the $k$th order statistic is about
\[n+n/2+n/4+n/8+\cdots=O(n).\]
The worst case for the algorithm is still $O(n^2)$,
because it is possible that $x$ is always chosen
in such a way that it is the smallest or largest
element in the array.
In this case, the size of the array decreases
only by one at each step.
However, the probability of this is so small
that it is very unlikely to happen in practice.
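A sketch of the algorithm is given below. The parameter $k$ is
1-indexed, and unlike the description above, elements equal to
$x$ are collected into their own group so that the algorithm also
works when the array contains duplicates. In practice one can
also use the standard library function \texttt{nth\_element},
which is based on a similar idea.
\begin{lstlisting}
#include <bits/stdc++.h>
using namespace std;

// returns the kth smallest element of v (k is 1-indexed)
// expected time O(n), worst case O(n^2)
int orderStatistic(vector<int> v, int k, mt19937& rng) {
    while (true) {
        // choose a random element x and partition around it
        int x = v[uniform_int_distribution<int>(0, (int)v.size()-1)(rng)];
        vector<int> left, right;
        for (int e : v) {
            if (e < x) left.push_back(e);
            else if (e > x) right.push_back(e);
        }
        int a = left.size();
        int equal = v.size() - left.size() - right.size(); // copies of x
        if (k <= a) v = left;               // answer is in the left part
        else if (k <= a+equal) return x;    // answer is x itself
        else { k -= a+equal; v = right; }   // answer is in the right part
    }
}

int main() {
    mt19937 rng(12345);
    vector<int> v = {7, 3, 9, 1, 5, 8, 2};
    cout << orderStatistic(v, 3, rng) << "\n"; // the 3rd smallest element is 3
}
\end{lstlisting}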
\subsubsection{Verifying matrix multiplication}

\index{matrix multiplication}

Our next problem is to \emph{verify}
whether $AB=C$ holds when $A$, $B$ and $C$
are matrices of size $n \times n$.
Of course, we can solve the problem
by calculating the product $AB$ again
(in $O(n^3)$ time using the basic algorithm),
but one could hope that verifying the
answer is easier than calculating it from scratch.

It turns out that we can solve the problem
using a Monte Carlo algorithm whose
time complexity is only $O(n^2)$.
The idea is simple: we choose a random vector
$X$ of $n$ elements, and calculate the matrices
$ABX$ and $CX$. If $ABX=CX$, we report that $AB=C$,
and otherwise we report that $AB \neq C$.

The time complexity of the algorithm is
$O(n^2)$, because we can calculate the matrices
$ABX$ and $CX$ in $O(n^2)$ time.
We can calculate the matrix $ABX$ efficiently
using the representation $A(BX)$, so only two
multiplications of $n \times n$ and $n \times 1$
size matrices are needed.
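The algorithm can be implemented roughly as follows. The random
vector here consists of 0/1 values, which is one possible choice,
and the test is repeated a few times to make a wrong answer less
likely; the matrices are the ones from the example below, with
$B$ chosen as an identity matrix so that $AB \neq C$.
\begin{lstlisting}
#include <bits/stdc++.h>
using namespace std;
typedef vector<vector<long long>> Matrix;

// multiply an n x n matrix by a vector of length n in O(n^2) time
vector<long long> mulVec(const Matrix& a, const vector<long long>& x) {
    int n = a.size();
    vector<long long> r(n, 0);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            r[i] += a[i][j]*x[j];
    return r;
}

// checks AB = C with one random vector: a negative answer is always
// correct, a positive answer may occasionally be wrong
bool verify(const Matrix& a, const Matrix& b, const Matrix& c, mt19937& rng) {
    int n = a.size();
    vector<long long> x(n);
    for (int i = 0; i < n; i++) x[i] = rng()%2; // random 0/1 vector
    return mulVec(a, mulVec(b, x)) == mulVec(c, x); // compare A(BX) and CX
}

int main() {
    mt19937 rng(12345);
    Matrix a = {{3,4},{1,6}}, b = {{1,0},{0,1}}, c = {{0,5},{7,4}};
    bool equal = true;
    for (int i = 0; i < 10; i++) equal = equal && verify(a, b, c, rng);
    cout << (equal ? "AB=C" : "AB!=C") << "\n"; // AB!=C with high probability
}
\end{lstlisting}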
The weakness in the algorithm is
that there is a small chance that the algorithm
makes a mistake when it reports that $AB=C$.
For example,
\[
\begin{bmatrix}
3 & 4 \\
1 & 6 \\
\end{bmatrix}
\neq
\begin{bmatrix}
0 & 5 \\
7 & 4 \\
\end{bmatrix},
\]
but
\[
\begin{bmatrix}
3 & 4 \\
1 & 6 \\
\end{bmatrix}
\begin{bmatrix}
1 \\
3 \\
\end{bmatrix}
=
\begin{bmatrix}
0 & 5 \\
7 & 4 \\
\end{bmatrix}
\begin{bmatrix}
1 \\
3 \\
\end{bmatrix}.
\]
However, in practice, the probability that the
algorithm makes a mistake is small,
and we can decrease the probability by
verifying the result using multiple random vectors $X$
before reporting the answer $AB=C$.

\subsubsection{Graph coloring}

\index{coloring}

Given a graph that contains $n$ nodes and $m$ edges,
our task is to find a way to color the nodes
of the graph using two colors so that
for at least $m/2$ edges, the end nodes
have different colors.
For example, in the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
\node[draw, circle] (3) at (1,1) {$3$};
\node[draw, circle] (4) at (4,1) {$4$};
\node[draw, circle] (5) at (6,2) {$5$};

\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (4) -- (5);
\end{tikzpicture}
\end{center}
a valid coloring is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle, fill=blue!40] (1) at (1,3) {$1$};
\node[draw, circle, fill=red!40] (2) at (4,3) {$2$};
\node[draw, circle, fill=red!40] (3) at (1,1) {$3$};
\node[draw, circle, fill=blue!40] (4) at (4,1) {$4$};
\node[draw, circle, fill=blue!40] (5) at (6,2) {$5$};

\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (4) -- (5);
\end{tikzpicture}
\end{center}
The above graph contains 7 edges, and for 5 of them,
the end nodes have different colors,
so the coloring is valid.

The problem can be solved using a Las Vegas algorithm
that generates random colorings until a valid coloring
has been found.
In a random coloring, the color of each node is
independently chosen so that the probability of
both colors is $1/2$.

In a random coloring, the probability that the end nodes
of a single edge have different colors is $1/2$.
Hence, the expected number of edges whose end nodes
have different colors is $1/2 \cdot m = m/2$.
Since it is expected that a random coloring is valid,
we'll find a valid coloring quickly in practice.
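For the example graph above, the algorithm can be sketched as
follows (the random seed is an arbitrary choice): random
colorings are generated until at least $m/2$ edges have
differently colored end nodes.
\begin{lstlisting}
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n = 5;
    vector<pair<int,int>> edges = {{1,2},{1,3},{1,4},{3,4},{2,4},{2,5},{4,5}};
    int m = edges.size();

    mt19937 rng(12345);
    vector<int> color(n+1);
    while (true) {
        // generate a random coloring: each node gets color 0 or 1
        for (int i = 1; i <= n; i++) color[i] = rng()%2;
        int good = 0;
        for (auto e : edges)
            if (color[e.first] != color[e.second]) good++;
        if (2*good >= m) break; // at least m/2 edges are satisfied
    }
    for (int i = 1; i <= n; i++) cout << color[i] << " ";
    cout << "\n";
}
\end{lstlisting}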