cphb/luku12.tex

531 lines
15 KiB
TeX
Raw Normal View History

2016-12-28 23:54:51 +01:00
\chapter{Graph search}
2017-01-07 17:30:14 +01:00
This chapter introduces two fundamental
graph algorithms:
depth-first search and breadth-first search.
Both algorithms are given a starting
node in the graph,
and they visit all nodes that can be reached
from the starting node.
The difference in the algorithms is the order
in which they visit the nodes.
\section{Depth-first search}
\index{depth-first search}
\key{Depth-first search} (DFS)
is a straightforward graph search technique.
The algorithm begins at a starting node,
and proceeds to all other nodes that are
reachable from the starting node using
the edges in the graph.
Depth-first search always follows a single
path in the graph as long as it finds
new nodes.
After this, it returns back to previous
nodes and begins to explore other parts of the graph.
The algorithm keeps track of visited nodes,
so that it processes each node only once.
\subsubsection*{Example}
Let's consider how depth-first search processes
the following graph:
2016-12-28 23:54:51 +01:00
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (1) at (1,5) {$1$};
\node[draw, circle] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,4) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (5);
\path[draw,thick,-] (2) -- (5);
\end{tikzpicture}
\end{center}
2017-01-07 17:30:14 +01:00
The algorithm can begin at any node in the graph,
but we will now assume that it begins
at node 1.
2016-12-28 23:54:51 +01:00
2017-01-07 17:30:14 +01:00
The search first proceeds to node 2:
2016-12-28 23:54:51 +01:00
\begin{center}
\begin{tikzpicture}
\node[draw, circle,fill=lightgray] (1) at (1,5) {$1$};
\node[draw, circle,fill=lightgray] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,4) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (5);
\path[draw,thick,-] (2) -- (5);
\path[draw=red,thick,->,line width=2pt] (1) -- (2);
\end{tikzpicture}
\end{center}
2017-01-07 17:30:14 +01:00
After this, nodes 3 and 5 will be visited:
2016-12-28 23:54:51 +01:00
\begin{center}
\begin{tikzpicture}
\node[draw, circle,fill=lightgray] (1) at (1,5) {$1$};
\node[draw, circle,fill=lightgray] (2) at (3,5) {$2$};
\node[draw, circle,fill=lightgray] (3) at (5,4) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle,fill=lightgray] (5) at (3,3) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (5);
\path[draw,thick,-] (2) -- (5);
\path[draw=red,thick,->,line width=2pt] (1) -- (2);
\path[draw=red,thick,->,line width=2pt] (2) -- (3);
\path[draw=red,thick,->,line width=2pt] (3) -- (5);
\end{tikzpicture}
\end{center}
2017-01-07 17:30:14 +01:00
The neighbors of node 5 are 2 and 3,
but the search has already visited both of them,
so it's time to return back.
Also the neighbors of nodes 3 and 2
have been visited, so we'll next proceed
from node 1 to node 4:
2016-12-28 23:54:51 +01:00
\begin{center}
\begin{tikzpicture}
\node[draw, circle,fill=lightgray] (1) at (1,5) {$1$};
\node[draw, circle,fill=lightgray] (2) at (3,5) {$2$};
\node[draw, circle,fill=lightgray] (3) at (5,4) {$3$};
\node[draw, circle,fill=lightgray] (4) at (1,3) {$4$};
\node[draw, circle,fill=lightgray] (5) at (3,3) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (5);
\path[draw,thick,-] (2) -- (5);
\path[draw=red,thick,->,line width=2pt] (1) -- (4);
\end{tikzpicture}
\end{center}
2017-01-07 17:30:14 +01:00
After this, the search terminates because it has visited
all nodes.
The time complexity of depth-first search is $O(n+m)$
where $n$ is the number of nodes and $m$ is the
number of edges,
because the algorithm processes each node and edge once.
\subsubsection*{Implementation}
Depth-first search can be conveniently
implemented using recursion.
The following function \texttt{dfs} begins
a depth-first search at a given node.
The function assumes that the graph is
stored as adjacency lists in array
2016-12-28 23:54:51 +01:00
\begin{lstlisting}
vector<int> v[N];
\end{lstlisting}
2017-01-07 17:30:14 +01:00
and also maintains an array
2016-12-28 23:54:51 +01:00
\begin{lstlisting}
int z[N];
\end{lstlisting}
2017-01-07 17:30:14 +01:00
that keeps track of the visited nodes.
Initially, each array value is 0,
and when the search arrives at node $s$,
the value of \texttt{z}[$s$] becomes 1.
The function can be implemented as follows:
2016-12-28 23:54:51 +01:00
\begin{lstlisting}
2017-01-07 17:30:14 +01:00
void dfs(int s) {
2016-12-28 23:54:51 +01:00
if (z[s]) return;
z[s] = 1;
2017-01-07 17:30:14 +01:00
// process node s
2016-12-28 23:54:51 +01:00
for (auto u: v[s]) {
2017-01-07 17:30:14 +01:00
dfs(u);
2016-12-28 23:54:51 +01:00
}
}
\end{lstlisting}
2017-01-07 17:30:14 +01:00
\section{Breadth-first search}
2016-12-28 23:54:51 +01:00
2017-01-07 17:30:14 +01:00
\index{breadth-first search}
2016-12-28 23:54:51 +01:00
2017-01-07 17:30:14 +01:00
\key{Breadth-first search} (BFS) visits the nodes
in increasing order of their distance
from the starting node.
Thus, we can calculate the distance
from the starting node to all other
nodes using breadth-first search.
However, breadth-first search is more difficult
to implement than depth-first search.
2016-12-28 23:54:51 +01:00
2017-01-07 17:30:14 +01:00
Breadth-first search goes through the nodes
one level after another.
First the search explores the nodes whose
distance from the starting node is 1,
then the nodes whose distance is 2, and so on.
This process continues until all nodes
have been visited.
2016-12-28 23:54:51 +01:00
2017-01-07 17:30:14 +01:00
\subsubsection*{Example}
2016-12-28 23:54:51 +01:00
2017-01-07 17:30:14 +01:00
Let's consider how the algorithm processes
the following graph:
2016-12-28 23:54:51 +01:00
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (1) at (1,5) {$1$};
\node[draw, circle] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,5) {$3$};
\node[draw, circle] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\node[draw, circle] (6) at (5,3) {$6$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (5) -- (6);
\end{tikzpicture}
\end{center}
2017-01-07 17:30:14 +01:00
Assume again that the search begins at node 1.
First, we process all nodes that can be reached
from node 1 using a single edge:
2016-12-28 23:54:51 +01:00
\begin{center}
\begin{tikzpicture}
\node[draw, circle,fill=lightgray] (1) at (1,5) {$1$};
\node[draw, circle,fill=lightgray] (2) at (3,5) {$2$};
\node[draw, circle] (3) at (5,5) {$3$};
\node[draw, circle,fill=lightgray] (4) at (1,3) {$4$};
\node[draw, circle] (5) at (3,3) {$5$};
\node[draw, circle] (6) at (5,3) {$6$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (5) -- (6);
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw=red,thick,->,line width=2pt] (1) -- (2);
\path[draw=red,thick,->,line width=2pt] (1) -- (4);
\end{tikzpicture}
\end{center}
2017-01-07 17:30:14 +01:00
After this, we procees to nodes 3 and 5:
2016-12-28 23:54:51 +01:00
\begin{center}
\begin{tikzpicture}
\node[draw, circle,fill=lightgray] (1) at (1,5) {$1$};
\node[draw, circle,fill=lightgray] (2) at (3,5) {$2$};
\node[draw, circle,fill=lightgray] (3) at (5,5) {$3$};
\node[draw, circle,fill=lightgray] (4) at (1,3) {$4$};
\node[draw, circle,fill=lightgray] (5) at (3,3) {$5$};
\node[draw, circle] (6) at (5,3) {$6$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (5) -- (6);
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw=red,thick,->,line width=2pt] (2) -- (3);
\path[draw=red,thick,->,line width=2pt] (2) -- (5);
\end{tikzpicture}
\end{center}
2017-01-07 17:30:14 +01:00
Finally, we visit node 6:
2016-12-28 23:54:51 +01:00
\begin{center}
\begin{tikzpicture}
\node[draw, circle,fill=lightgray] (1) at (1,5) {$1$};
\node[draw, circle,fill=lightgray] (2) at (3,5) {$2$};
\node[draw, circle,fill=lightgray] (3) at (5,5) {$3$};
\node[draw, circle,fill=lightgray] (4) at (1,3) {$4$};
\node[draw, circle,fill=lightgray] (5) at (3,3) {$5$};
\node[draw, circle,fill=lightgray] (6) at (5,3) {$6$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (5) -- (6);
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw=red,thick,->,line width=2pt] (3) -- (6);
\path[draw=red,thick,->,line width=2pt] (5) -- (6);
\end{tikzpicture}
\end{center}
2017-01-07 17:30:14 +01:00
Now we have calculated the distances
from the starting node to all nodes in the graph.
The distances are as follows:
2016-12-28 23:54:51 +01:00
\begin{tabular}{ll}
\\
2017-01-07 17:30:14 +01:00
node & distance \\
2016-12-28 23:54:51 +01:00
\hline
1 & 0 \\
2 & 1 \\
3 & 2 \\
4 & 1 \\
5 & 2 \\
6 & 3 \\
\\
\end{tabular}
2017-01-07 17:30:14 +01:00
Like in depth-first search,
the time complexity of breadth-first search
is $O(n+m)$ where $n$ is the number of nodes
and $m$ is the number of edges.
\subsubsection*{Implementation}
Breadth-first search is more difficult
to implement than depth-first search
because the algorithm visits nodes
in different parts in the graph.
A typical implementation is to maintain
a queue of nodes to be processed.
At each step, the next node in the queue
will be processed.
The following code begins a breadth-first
search at node $x$.
The code assumes that the graph is stored
as adjacency lists and maintains a queue
2016-12-28 23:54:51 +01:00
\begin{lstlisting}
queue<int> q;
\end{lstlisting}
2017-01-07 17:30:14 +01:00
that contains the nodes in increasing order
of their distance.
New nodes are always added to the end
of the queue, and the node at the beginning
of the queue is the next node to be processed.
In addition, the code uses arrays
2016-12-28 23:54:51 +01:00
\begin{lstlisting}
int z[N], e[N];
\end{lstlisting}
2017-01-07 17:30:14 +01:00
so that array \texttt{z} indicates
which nodes the search already has visited
and array \texttt{e} will contain the
minimum distance to all nodes in the graph.
The search can be implemented as follows:
2016-12-28 23:54:51 +01:00
\begin{lstlisting}
z[s] = 1; e[x] = 0;
q.push(x);
while (!q.empty()) {
int s = q.front(); q.pop();
2017-01-07 17:30:14 +01:00
// process node s
2016-12-28 23:54:51 +01:00
for (auto u : v[s]) {
if (z[u]) continue;
z[u] = 1; e[u] = e[s]+1;
q.push(u);
}
}
\end{lstlisting}
2017-01-07 18:17:37 +01:00
\section{Applications}
Using the graph search algorithms,
we can check many properties of the graph.
Usually, either depth-first search or
bredth-first search can be used,
but in practice, depth-first search
is a better choice because it is
easier to implement.
In the following applications we will
assume that the graph is undirected.
\subsubsection{Connectivity check}
\index{connected graph}
A graph is connected if there is a path
between any two nodes in the graph.
Thus, we can check if a graph is connected
by selecting an arbitrary node and
finding out if we can reach all other nodes.
For example, in the graph
2016-12-28 23:54:51 +01:00
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (2) at (7,5) {$2$};
\node[draw, circle] (1) at (3,5) {$1$};
\node[draw, circle] (3) at (5,4) {$3$};
\node[draw, circle] (5) at (7,3) {$5$};
\node[draw, circle] (4) at (3,3) {$4$};
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (5);
\end{tikzpicture}
\end{center}
2017-01-07 18:17:37 +01:00
a depth-first search from node $1$ visits
the following nodes:
2016-12-28 23:54:51 +01:00
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (2) at (7,5) {$2$};
\node[draw, circle,fill=lightgray] (1) at (3,5) {$1$};
\node[draw, circle,fill=lightgray] (3) at (5,4) {$3$};
\node[draw, circle] (5) at (7,3) {$5$};
\node[draw, circle,fill=lightgray] (4) at (3,3) {$4$};
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw=red,thick,->,line width=2pt] (1) -- (3);
\path[draw=red,thick,->,line width=2pt] (3) -- (4);
\end{tikzpicture}
\end{center}
2017-01-07 18:17:37 +01:00
Since the search didn't visit all the nodes,
we can conclude that the graph is not connected.
In a similar way, we can also find all components
in a graph by iterating trough the nodes and always
starting a new depth-first search if the node
doesn't belong to a component.
2016-12-28 23:54:51 +01:00
2017-01-07 18:17:37 +01:00
\subsubsection{Finding cycles}
2016-12-28 23:54:51 +01:00
2017-01-07 18:17:37 +01:00
\index{cycle}
2016-12-28 23:54:51 +01:00
2017-01-07 18:17:37 +01:00
A graph contains a cycle if during a graph search,
we find a node whose neighbor (other than the
previous node in the current path) has already been
visited.
For example, the graph
2016-12-28 23:54:51 +01:00
\begin{center}
\begin{tikzpicture}
\node[draw, circle,fill=lightgray] (2) at (7,5) {$2$};
\node[draw, circle,fill=lightgray] (1) at (3,5) {$1$};
\node[draw, circle,fill=lightgray] (3) at (5,4) {$3$};
\node[draw, circle,fill=lightgray] (5) at (7,3) {$5$};
\node[draw, circle] (4) at (3,3) {$4$};
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (3) -- (5);
\path[draw=red,thick,->,line width=2pt] (1) -- (3);
\path[draw=red,thick,->,line width=2pt] (3) -- (2);
\path[draw=red,thick,->,line width=2pt] (2) -- (5);
\end{tikzpicture}
\end{center}
2017-01-07 18:17:37 +01:00
contains a cycle because when we move from
node 2 to node 5 it turns out
that the neighbor node 3 has already been visited.
Thus, the graph contains a cycle that goes through node 3,
for example, $3 \rightarrow 2 \rightarrow 5 \rightarrow 3$.
Another way to find out whether a graph contains a cycle
is to simply calculate the number of nodes and edges
in every component.
If a component contains $c$ nodes and no cycle,
it must contain exactly $c-1$ edges.
If there are $c$ or more edges, the component
always contains a cycle.
\subsubsection{Bipartiteness check}
\index{bipartite graph}
A graph is bipartite if its nodes can be colored
using two colors so that there are no adjacent
nodes with same color.
It is suprisingly easy to check if a graph
is bipartite using graph search algorithms.
The idea is to color the starting node blue,
all its neighbors red, all their neighbors blue, and so on.
If at some point of the search we notice that
two adjacent nodes have the same color,
this means that the graph is not bipartite.
Otherwise the graph is bipartite and one coloring
has been found.
For example, the graph
2016-12-28 23:54:51 +01:00
\begin{center}
\begin{tikzpicture}
\node[draw, circle] (2) at (5,5) {$2$};
\node[draw, circle] (1) at (3,5) {$1$};
\node[draw, circle] (3) at (7,4) {$3$};
\node[draw, circle] (5) at (5,3) {$5$};
\node[draw, circle] (4) at (3,3) {$4$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (5) -- (4);
\path[draw,thick,-] (4) -- (1);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (5) -- (3);
\end{tikzpicture}
\end{center}
2017-01-07 18:17:37 +01:00
is not bipartite because a search from node 1
produces the following situation:
2016-12-28 23:54:51 +01:00
\begin{center}
\begin{tikzpicture}
\node[draw, circle,fill=red!40] (2) at (5,5) {$2$};
\node[draw, circle,fill=blue!40] (1) at (3,5) {$1$};
\node[draw, circle,fill=blue!40] (3) at (7,4) {$3$};
\node[draw, circle,fill=red!40] (5) at (5,3) {$5$};
\node[draw, circle] (4) at (3,3) {$4$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (5) -- (4);
\path[draw,thick,-] (4) -- (1);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (5) -- (3);
\path[draw=red,thick,->,line width=2pt] (1) -- (2);
\path[draw=red,thick,->,line width=2pt] (2) -- (3);
\path[draw=red,thick,->,line width=2pt] (3) -- (5);
\path[draw=red,thick,->,line width=2pt] (5) -- (2);
\end{tikzpicture}
\end{center}
2017-01-07 18:17:37 +01:00
We notice that the color or both node 2 and node 5
is red, while they are adjacent nodes in the graph.
Thus, the graph is not bipartite.
This algorithm always works because when there
are only two colors available,
the color of the starting node in a component
determines the colors of all other nodes in the component.
It doesn't make any difference whether the
starting node is red or blue.
Note that in the general case,
it is difficult to find out if the nodes
in a graph can be colored using $k$ colors
so that no adjacent nodes have the same color.
Even when $k=3$, no efficient algorithm is known
but the problem is NP-hard.