\chapter{Spanning trees}
\index{spanning tree}

A \key{spanning tree} of a graph consists of
all nodes of the graph and some of the
edges of the graph so that there is a path
between any two nodes.
Like trees in general, spanning trees are
connected and acyclic.
Usually there are several ways to construct a spanning tree.

For example, consider the following graph:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
One spanning tree for the graph is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
The weight of a spanning tree is the sum of its edge weights.
For example, the weight of the above spanning tree is
$3+5+9+3+2=22$.

\index{minimum spanning tree}

A \key{minimum spanning tree}
is a spanning tree whose weight is as small as possible.
The weight of a minimum spanning tree for the example graph
is 20, and such a tree can be constructed as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
\index{maximum spanning tree}

In a similar way, a \key{maximum spanning tree}
is a spanning tree whose weight is as large as possible.
The weight of a maximum spanning tree for the
example graph is 32:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
\end{tikzpicture}
\end{center}
Note that a graph may have several
minimum and maximum spanning trees,
so the trees are not unique.

It turns out that several greedy methods
can be used to construct minimum and maximum
spanning trees.
In this chapter, we discuss two algorithms
that choose the edges of the tree
according to their weights.
We focus on finding minimum spanning trees,
but the same algorithms can find
maximum spanning trees by processing the edges in reverse order.

\section{Kruskal's algorithm}
\index{Kruskal's algorithm}
In \key{Kruskal's algorithm}\footnote{The algorithm was published in 1956
by J. B. Kruskal \cite{kru56}.}, the initial spanning tree
only contains the nodes of the graph
and does not contain any edges.
Then the algorithm goes through the edges
ordered by their weights, and adds each edge
to the tree if it does not create a cycle.

The algorithm maintains the components
of the tree.
Initially, each node of the graph
belongs to a separate component.
Whenever an edge is added to the tree,
two components are joined.
Finally, all nodes belong to the same component,
and a minimum spanning tree has been found.

\subsubsection{Example}
\begin{samepage}
Let us consider how Kruskal's algorithm processes the
following graph:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
\end{samepage}
\begin{samepage}
The first step of the algorithm is to sort the
edges in increasing order of their weights.
The result is the following list:
\begin{tabular}{ll}
\\
edge & weight \\
\hline
5--6 & 2 \\
1--2 & 3 \\
3--6 & 3 \\
1--5 & 5 \\
2--3 & 5 \\
2--5 & 6 \\
4--6 & 7 \\
3--4 & 9 \\
\\
\end{tabular}
\end{samepage}
After this, the algorithm goes through the list
and adds each edge to the tree if it joins
two separate components.

Initially, each node is in its own component:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
%\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
%\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
%\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
%\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
%\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
%\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
%\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
%\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
The first edge to be added to the tree is
the edge 5--6 that creates a component $\{5,6\}$
by joining the components $\{5\}$ and $\{6\}$:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
%\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
%\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
%\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
%\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
%\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
%\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
%\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
After this, the edges 1--2, 3--6 and 1--5 are added in a similar way:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
%\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
%\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
%\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
%\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
After those steps, most components have been joined
and there are two components in the tree:
$\{1,2,3,5,6\}$ and $\{4\}$.

The next edge in the list is the edge 2--3,
but it will not be included in the tree, because
nodes 2 and 3 are already in the same component.
For the same reason, the edge 2--5 will not be included in the tree.

\begin{samepage}
Finally, the edge 4--6 will be included in the tree:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
%\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
%\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
%\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
\end{samepage}
After this, the algorithm will not add any
new edges, because the graph is connected
and there is a path between any two nodes.
The resulting graph is a minimum spanning tree
with weight $2+3+3+5+7=20$.

\subsubsection{Why does this work?}
It is natural to ask why Kruskal's algorithm works.
Why does the greedy strategy guarantee that we
will find a minimum spanning tree?

Let us see what happens if the minimum weight edge of
the graph is \emph{not} included in the spanning tree.
For example, suppose that a spanning tree
for the previous graph does not contain the
minimum weight edge 5--6.
We do not know the exact structure of such a spanning tree,
but in any case it has to contain some edges.
Assume that the tree is as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-,dashed] (1) -- (2);
\path[draw,thick,-,dashed] (2) -- (5);
\path[draw,thick,-,dashed] (2) -- (3);
\path[draw,thick,-,dashed] (3) -- (4);
\path[draw,thick,-,dashed] (4) -- (6);
\end{tikzpicture}
\end{center}
However, the above tree cannot be
a minimum spanning tree for the graph.
The reason for this is that we can remove an edge
from the tree and replace it with the minimum weight edge 5--6.
Adding the edge 5--6 creates a cycle, and removing any other
edge of that cycle, such as the edge 2--3, yields a spanning tree again.
This produces a spanning tree whose weight is
\emph{smaller}:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-,dashed] (1) -- (2);
\path[draw,thick,-,dashed] (2) -- (5);
\path[draw,thick,-,dashed] (3) -- (4);
\path[draw,thick,-,dashed] (4) -- (6);
\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
\end{tikzpicture}
\end{center}
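As a check, the weights of both trees can be read from the example graph.
Assuming that the dashed tree uses the edges 1--2, 2--5, 2--3, 3--4 and 4--6,
its weight is
\[3+6+5+9+7=30,\]
and after replacing the edge 2--3 with the edge 5--6 the weight becomes
\[3+6+9+7+2=27.\]
The weight decreases by $5-2=3$, which is the difference between
the weights of the removed edge and the added edge.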
For this reason, it is always optimal
to include the minimum weight edge
in the tree to produce a minimum spanning tree.
Using a similar argument, we can show that it
is also optimal to add the next edge in weight order
to the tree, and so on.
Hence, Kruskal's algorithm works correctly and
always produces a minimum spanning tree.

\subsubsection{Implementation}

When implementing Kruskal's algorithm,
it is convenient to use
the edge list representation of the graph.
The first phase of the algorithm sorts the
edges in the list in $O(m \log m)$ time.
After this, the second phase of the algorithm
builds the minimum spanning tree as follows:
\begin{lstlisting}
for (...) {
    if (!same(a,b)) unite(a,b);
}
\end{lstlisting}
The loop goes through the edges in the list
and always processes an edge $a$--$b$
where $a$ and $b$ are two nodes.
Two functions are needed:
the function \texttt{same} determines
if $a$ and $b$ are in the same component,
and the function \texttt{unite}
joins the components that contain $a$ and $b$.

The problem is how to efficiently implement
the functions \texttt{same} and \texttt{unite}.
One possibility is to implement the function
\texttt{same} as a graph traversal and check if
we can get from node $a$ to node $b$.
However, the time complexity of such a function
would be $O(n+m)$
and the resulting algorithm would be slow,
because the function \texttt{same} will be called for each edge in the graph.

We will solve the problem using a union-find structure
that implements both functions in $O(\log n)$ time.
Thus, the second phase of Kruskal's algorithm
takes $O(m \log n)$ time after the edge list has been sorted.

\section{Union-find structure}
\index{union-find structure}
A \key{union-find structure} maintains
a collection of sets.
The sets are disjoint, so no element
belongs to more than one set.
Two $O(\log n)$ time operations are supported:
the \texttt{unite} operation joins two sets,
and the \texttt{find} operation finds the representative
of the set that contains a given element\footnote{The structure presented here
was introduced in 1971 by J. E. Hopcroft and J. D. Ullman \cite{hop71}.
Later, in 1975, R. E. Tarjan studied a more sophisticated variant
of the structure \cite{tar75} that is discussed in many algorithm
textbooks nowadays.}.

\subsubsection{Structure}
In a union-find structure, one element in each set
is the representative of the set,
and there is a chain from any other element of the
set to the representative.
For example, assume that the sets are
$\{1,4,7\}$, $\{5\}$ and $\{2,3,6,8\}$:
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (1) at (0,-1) {$1$};
\node[draw, circle] (2) at (7,0) {$2$};
\node[draw, circle] (3) at (7,-1.5) {$3$};
\node[draw, circle] (4) at (1,0) {$4$};
\node[draw, circle] (5) at (4,0) {$5$};
\node[draw, circle] (6) at (6,-2.5) {$6$};
\node[draw, circle] (7) at (2,-1) {$7$};
\node[draw, circle] (8) at (8,-2.5) {$8$};
\path[draw,thick,->] (1) -- (4);
\path[draw,thick,->] (7) -- (4);
\path[draw,thick,->] (3) -- (2);
\path[draw,thick,->] (6) -- (3);
\path[draw,thick,->] (8) -- (3);
\end{tikzpicture}
\end{center}
In this case the representatives
of the sets are 4, 5 and 2.
We can find the representative of any element
by following the chain that begins at the element.
For example, the element 2 is the representative
for the element 6, because
we follow the chain $6 \rightarrow 3 \rightarrow 2$.
Two elements belong to the same set exactly when
their representatives are the same.

Two sets can be joined by connecting the
representative of one set to the
representative of the other set.
For example, the sets
$\{1,4,7\}$ and $\{2,3,6,8\}$
can be joined as follows:
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (1) at (2,-1) {$1$};
\node[draw, circle] (2) at (7,0) {$2$};
\node[draw, circle] (3) at (7,-1.5) {$3$};
\node[draw, circle] (4) at (3,0) {$4$};
\node[draw, circle] (6) at (6,-2.5) {$6$};
\node[draw, circle] (7) at (4,-1) {$7$};
\node[draw, circle] (8) at (8,-2.5) {$8$};
\path[draw,thick,->] (1) -- (4);
\path[draw,thick,->] (7) -- (4);
\path[draw,thick,->] (3) -- (2);
\path[draw,thick,->] (6) -- (3);
\path[draw,thick,->] (8) -- (3);
\path[draw,thick,->] (4) -- (2);
\end{tikzpicture}
\end{center}
The resulting set is
$\{1,2,3,4,6,7,8\}$.
From now on, the element 2 is the representative
for the entire set and the old representative 4
points to the element 2.

The efficiency of the union-find structure depends on
how the sets are joined.
It turns out that we can follow a simple strategy:
always connect the representative of the
\emph{smaller} set to the representative of the \emph{larger} set
(or if the sets are of equal size,
we can make an arbitrary choice).
Using this strategy, the length of any chain
will be $O(\log n)$, so we can
find the representative of any element
efficiently by following the corresponding chain.
\subsubsection{Implementation}

The union-find structure can be implemented
using arrays.
In the following implementation,
the array \texttt{link} contains for each element
the next element
in the chain or the element itself if it is
a representative,
and the array \texttt{size} indicates for each representative
the size of the corresponding set.
Initially, each element belongs to a separate set:
\begin{lstlisting}
for (int i = 1; i <= n; i++) link[i] = i;
for (int i = 1; i <= n; i++) size[i] = 1;
\end{lstlisting}
The function \texttt{find} returns
the representative for an element $x$.
The representative can be found by following
the chain that begins at $x$.
\begin{lstlisting}
int find(int x) {
    while (x != link[x]) x = link[x];
    return x;
}
\end{lstlisting}
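To illustrate the array representation, the following sketch stores
the example sets $\{1,4,7\}$, $\{5\}$ and $\{2,3,6,8\}$ in a link array
and follows the chains.
The names \texttt{representative} and \texttt{example\_link} are
assumptions made for this illustration, and the array is passed
as a parameter only to keep the example self-contained.
\begin{lstlisting}
#include <vector>

// Follows the chain that begins at x in the given link array
// and returns the representative (same loop as in find).
int representative(const std::vector<int>& link, int x) {
    while (x != link[x]) x = link[x];
    return x;
}

// Link array for the sets {1,4,7}, {5} and {2,3,6,8}.
// Index 0 is unused, because the elements are numbered 1..8.
std::vector<int> example_link() {
    //      0  1  2  3  4  5  6  7  8
    return {0, 4, 2, 2, 4, 5, 3, 4, 3};
}
\end{lstlisting}
With this array, the chain from element 6 is
$6 \rightarrow 3 \rightarrow 2$, so the representative of 6 is 2,
and the representatives 4, 5 and 2 point to themselves.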
The function \texttt{same} checks
whether elements $a$ and $b$ belong to the same set.
This can easily be done by using the
function \texttt{find}:
\begin{lstlisting}
bool same(int a, int b) {
    return find(a) == find(b);
}
\end{lstlisting}
\begin{samepage}
The function \texttt{unite} joins the sets
that contain elements $a$ and $b$
(the elements have to be in different sets).
The function first finds the representatives
of the sets and then connects the smaller
set to the larger set.
\begin{lstlisting}
void unite(int a, int b) {
    a = find(a);
    b = find(b);
    if (size[a] < size[b]) swap(a,b);
    size[a] += size[b];
    link[b] = a;
}
\end{lstlisting}
\end{samepage}
The time complexity of the function \texttt{find}
is $O(\log n)$ assuming that the length of each
chain is $O(\log n)$.
In this case, the functions \texttt{same} and \texttt{unite}
also work in $O(\log n)$ time.
The function \texttt{unite} makes sure that the
length of each chain is $O(\log n)$ by connecting
the smaller set to the larger set.

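As a sketch of how the union-find structure fits into Kruskal's algorithm,
the following code wraps the arrays \texttt{link} and \texttt{size}
into a struct and processes a sorted edge list.
The names \texttt{UnionFind}, \texttt{Edge}, \texttt{kruskal} and
\texttt{example\_graph} and the parameter \texttt{maximum} are
assumptions made for this illustration,
and the graph is assumed to be connected.
\begin{lstlisting}
#include <algorithm>
#include <array>
#include <numeric>
#include <utility>
#include <vector>

// Union-find structure with link and size arrays for elements 1..n.
struct UnionFind {
    std::vector<int> link, size;
    UnionFind(int n) : link(n + 1), size(n + 1, 1) {
        std::iota(link.begin(), link.end(), 0);  // link[i] = i
    }
    int find(int x) {
        while (x != link[x]) x = link[x];
        return x;
    }
    bool same(int a, int b) { return find(a) == find(b); }
    void unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (size[a] < size[b]) std::swap(a, b);
        size[a] += size[b];
        link[b] = a;
    }
};

// An edge is stored as {weight, a, b}, so sorting orders by weight.
using Edge = std::array<int, 3>;

// Returns the weight of a minimum spanning tree of a connected graph
// with nodes 1..n, or of a maximum spanning tree if maximum is true.
long long kruskal(int n, std::vector<Edge> edges, bool maximum = false) {
    std::sort(edges.begin(), edges.end());
    if (maximum) std::reverse(edges.begin(), edges.end());
    UnionFind uf(n);
    long long total = 0;
    for (const Edge& e : edges) {
        if (!uf.same(e[1], e[2])) {
            uf.unite(e[1], e[2]);
            total += e[0];
        }
    }
    return total;
}

// The edge list of the example graph of this chapter.
std::vector<Edge> example_graph() {
    return {{3, 1, 2}, {5, 2, 3}, {9, 3, 4}, {5, 1, 5},
            {2, 5, 6}, {7, 6, 4}, {6, 2, 5}, {3, 3, 6}};
}
\end{lstlisting}
Under these assumptions, \texttt{kruskal(6, example\_graph())}
evaluates to 20 and \texttt{kruskal(6, example\_graph(), true)}
evaluates to 32, which matches the weights of the minimum and
maximum spanning trees of the example graph.
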
\section{Prim's algorithm}
\index{Prim's algorithm}

\key{Prim's algorithm}\footnote{The algorithm is
named after R. C. Prim who published it in 1957 \cite{pri57}.
However, the same algorithm was discovered already in 1930
by V. Jarník.} is an alternative method
for finding a minimum spanning tree.
The algorithm first adds an arbitrary node
to the tree.
After this, the algorithm always chooses
a minimum-weight edge that
adds a new node to the tree.
Finally, all nodes have been added to the tree
and a minimum spanning tree has been found.

Prim's algorithm resembles Dijkstra's algorithm.
The difference is that Dijkstra's algorithm always
selects a node whose distance from the starting
node is minimum, but Prim's algorithm simply selects
the minimum weight edge that adds a new node to the tree.

\subsubsection{Example}

Let us consider how Prim's algorithm works
in the following graph:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
%\path[draw=red,thick,-,line width=2pt] (5) -- (6);
\end{tikzpicture}
\end{center}
Initially, there are no edges between the nodes:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
%\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
%\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
%\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
%\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
%\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
%\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
%\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
%\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
An arbitrary node can be the starting node,
so let us choose node 1.
First, we add node 2 that is connected by
an edge of weight 3:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
%\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
%\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
%\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
%\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
%\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
%\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
%\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
After this, there are two edges with weight 5,
so we can add either node 3 or node 5 to the tree.
Let us add node 3 first:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
%\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
%\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
%\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
%\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
%\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
%\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
\begin{samepage}
The process continues until all nodes have been included in the tree:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1.5,2) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (6.5,2) {$4$};
\node[draw, circle] (5) at (3,1) {$5$};
\node[draw, circle] (6) at (5,1) {$6$};
\path[draw,thick,-] (1) -- node[font=\small,label=above:3] {} (2);
\path[draw,thick,-] (2) -- node[font=\small,label=above:5] {} (3);
%\path[draw,thick,-] (3) -- node[font=\small,label=above:9] {} (4);
%\path[draw,thick,-] (1) -- node[font=\small,label=below:5] {} (5);
\path[draw,thick,-] (5) -- node[font=\small,label=below:2] {} (6);
\path[draw,thick,-] (6) -- node[font=\small,label=below:7] {} (4);
%\path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (5);
\path[draw,thick,-] (3) -- node[font=\small,label=left:3] {} (6);
\end{tikzpicture}
\end{center}
\end{samepage}
\subsubsection{Implementation}

Like Dijkstra's algorithm, Prim's algorithm can be
efficiently implemented using a priority queue.
The priority queue should contain all nodes
that can be connected to the current component using
a single edge, in increasing order of the weights
of the corresponding edges.

The time complexity of Prim's algorithm is
$O(n + m \log m)$, which equals the time complexity
of Dijkstra's algorithm.
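
The following code is a minimal sketch of such an implementation,
assuming a connected graph that is stored as adjacency lists.
The names \texttt{Graph}, \texttt{prim}, \texttt{in\_tree} and
\texttt{example\_graph} are assumptions made for this illustration.
Instead of updating entries in the priority queue,
the sketch allows outdated entries and skips them when they are removed.
\begin{lstlisting}
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// adj[a] contains pairs {b, w}: an edge from a to b with weight w.
using Graph = std::vector<std::vector<std::pair<int, int>>>;

// Returns the weight of a minimum spanning tree of a connected graph
// with nodes 1..n, using node 1 as the starting node.
long long prim(int n, const Graph& adj) {
    std::vector<bool> in_tree(n + 1, false);
    // Min-heap of pairs {weight, node}: the lightest edge is on top.
    std::priority_queue<std::pair<int, int>,
                        std::vector<std::pair<int, int>>,
                        std::greater<std::pair<int, int>>> q;
    q.push({0, 1});
    long long total = 0;
    while (!q.empty()) {
        int w = q.top().first, a = q.top().second;
        q.pop();
        if (in_tree[a]) continue;  // outdated entry, node already added
        in_tree[a] = true;
        total += w;
        for (const auto& edge : adj[a]) {
            if (!in_tree[edge.first]) q.push({edge.second, edge.first});
        }
    }
    return total;
}

// Builds the adjacency lists of the example graph of this chapter.
Graph example_graph() {
    Graph adj(7);
    int edges[8][3] = {{1, 2, 3}, {2, 3, 5}, {3, 4, 9}, {1, 5, 5},
                       {5, 6, 2}, {6, 4, 7}, {2, 5, 6}, {3, 6, 3}};
    for (const auto& e : edges) {
        adj[e[0]].push_back({e[1], e[2]});
        adj[e[1]].push_back({e[0], e[2]});
    }
    return adj;
}
\end{lstlisting}
Each edge is pushed to the queue at most twice, so the queue
holds $O(m)$ entries, which is consistent with the
$O(n + m \log m)$ bound.
For the example graph, \texttt{prim(6, example\_graph())}
evaluates to 20.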

In practice, Prim's and Kruskal's algorithms
are both efficient, and the choice of the algorithm
is a matter of taste.
Still, most competitive programmers use Kruskal's algorithm.