\chapter{Basics of graphs} Many programming problems can be solved by interpreting the problem as a graph problem and using a suitable graph algorithm. A typical example of a graph is a network of roads and cities in a country. Sometimes, though, the graph is hidden in the problem and it can be difficult to detect it. This part of the book discusses techniques and algorithms involving graphs that are important in competitive programming. We will first go through graph terminology and different ways to store graphs in algorithms. \section{Terminology} \index{graph} \index{node} \index{edge} A \key{graph} consists of \key{nodes} and \key{edges} between them. In this book, the variable $n$ denotes the number of nodes in a graph, and the variable $m$ denotes the number of edges. In addition, the nodes are numbered using integers $1,2,\ldots,n$. For example, the following graph contains 5 nodes and 7 edges: \begin{center} \begin{tikzpicture}[scale=0.9] \node[draw, circle] (1) at (1,3) {$1$}; \node[draw, circle] (2) at (4,3) {$2$}; \node[draw, circle] (3) at (1,1) {$3$}; \node[draw, circle] (4) at (4,1) {$4$}; \node[draw, circle] (5) at (6,2) {$5$}; \path[draw,thick,-] (1) -- (2); \path[draw,thick,-] (1) -- (3); \path[draw,thick,-] (1) -- (4); \path[draw,thick,-] (3) -- (4); \path[draw,thick,-] (2) -- (4); \path[draw,thick,-] (2) -- (5); \path[draw,thick,-] (4) -- (5); \end{tikzpicture} \end{center} \index{path} A \key{path} is a route from node $a$ to node $b$ that goes through the edges in the graph. The \key{length} of a path is the number of edges in the path. 
For example, in the above graph,
paths from node 1 to node 5 are:
\begin{itemize}
\item $1 \rightarrow 2 \rightarrow 5$ (length 2)
\item $1 \rightarrow 4 \rightarrow 5$ (length 2)
\item $1 \rightarrow 2 \rightarrow 4 \rightarrow 5$ (length 3)
\item $1 \rightarrow 3 \rightarrow 4 \rightarrow 5$ (length 3)
\item $1 \rightarrow 4 \rightarrow 2 \rightarrow 5$ (length 3)
\item $1 \rightarrow 3 \rightarrow 4 \rightarrow 2 \rightarrow 5$ (length 4)
\end{itemize}

\subsubsection{Connectivity}

\index{connected graph}

A graph is \key{connected} if there is a path
between any two nodes.
For example, the following graph is connected:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
\node[draw, circle] (3) at (1,1) {$3$};
\node[draw, circle] (4) at (4,1) {$4$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (4);
\end{tikzpicture}
\end{center}
The following graph is not connected
because it is not possible to get to other nodes from node 4.
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
\node[draw, circle] (3) at (1,1) {$3$};
\node[draw, circle] (4) at (4,1) {$4$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (2) -- (3);
\end{tikzpicture}
\end{center}

\index{component}

The connected parts of a graph are
its \key{components}.
For example, the following graph contains three components:
$\{1,\,2,\,3\}$, $\{4,\,5,\,6,\,7\}$ and $\{8\}$.
\begin{center}
\begin{tikzpicture}[scale=0.8]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
\node[draw, circle] (3) at (1,1) {$3$};
\node[draw, circle] (6) at (6,1) {$6$};
\node[draw, circle] (7) at (9,1) {$7$};
\node[draw, circle] (4) at (6,3) {$4$};
\node[draw, circle] (5) at (9,3) {$5$};
\node[draw, circle] (8) at (11,2) {$8$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (2) -- (3);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (4) -- (5);
\path[draw,thick,-] (5) -- (7);
\path[draw,thick,-] (6) -- (7);
\path[draw,thick,-] (6) -- (4);
\end{tikzpicture}
\end{center}

\index{tree}

A \key{tree} is a connected graph
that contains $n$ nodes and $n-1$ edges.
In a tree, there is a unique path
between any two nodes.
For example, the following graph is a tree:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
\node[draw, circle] (3) at (1,1) {$3$};
\node[draw, circle] (4) at (4,1) {$4$};
\node[draw, circle] (5) at (6,2) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
%\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (2) -- (4);
%\path[draw,thick,-] (4) -- (5);
\end{tikzpicture}
\end{center}

\subsubsection{Edge directions}

\index{directed graph}

A graph is \key{directed}
if the edges can be travelled only in one direction.
For example, the following graph is directed: \begin{center} \begin{tikzpicture}[scale=0.9] \node[draw, circle] (1) at (1,3) {$1$}; \node[draw, circle] (2) at (4,3) {$2$}; \node[draw, circle] (3) at (1,1) {$3$}; \node[draw, circle] (4) at (4,1) {$4$}; \node[draw, circle] (5) at (6,2) {$5$}; \path[draw,thick,->,>=latex] (1) -- (2); \path[draw,thick,->,>=latex] (2) -- (4); \path[draw,thick,->,>=latex] (2) -- (5); \path[draw,thick,->,>=latex] (4) -- (5); \path[draw,thick,->,>=latex] (4) -- (1); \path[draw,thick,->,>=latex] (3) -- (1); \end{tikzpicture} \end{center} The above graph contains a path from node $3$ to $5$ using edges $3 \rightarrow 1 \rightarrow 2 \rightarrow 5$. However, the graph doesn't contain a path from node $5$ to $3$. \index{cycle} \index{acyclic graph} A \key{cycle} is a path whose first and last node is the same. For example, the above graph contains a cycle $1 \rightarrow 2 \rightarrow 4 \rightarrow 1$. If a graph doesn't contain any cycles, it is called \key{acyclic}. \subsubsection{Edge weights} \index{weighted graph} In a \key{weighted} graph, each edge is assigned a \key{weight}. Often, the weights are interpreted as edge lengths. For example, the following graph is weighted: \begin{center} \begin{tikzpicture}[scale=0.9] \node[draw, circle] (1) at (1,3) {$1$}; \node[draw, circle] (2) at (4,3) {$2$}; \node[draw, circle] (3) at (1,1) {$3$}; \node[draw, circle] (4) at (4,1) {$4$}; \node[draw, circle] (5) at (6,2) {$5$}; \path[draw,thick,-] (1) -- node[font=\small,label=above:5] {} (2); \path[draw,thick,-] (1) -- node[font=\small,label=left:1] {} (3); \path[draw,thick,-] (3) -- node[font=\small,label=below:7] {} (4); \path[draw,thick,-] (2) -- node[font=\small,label=left:6] {} (4); \path[draw,thick,-] (2) -- node[font=\small,label=above:7] {} (5); \path[draw,thick,-] (4) -- node[font=\small,label=below:3] {} (5); \end{tikzpicture} \end{center} Now the length of a path is the sum of edge weights. 
For example, in the above graph
the length of path
$1 \rightarrow 2 \rightarrow 5$ is $12$,
and the length of path
$1 \rightarrow 3 \rightarrow 4 \rightarrow 5$ is $11$.
The latter is the shortest path from node $1$ to node $5$.

\subsubsection{Neighbors and degrees}

\index{neighbor}
\index{degree}

Two nodes are \key{neighbors} or \key{adjacent}
if there is an edge between them.
The \key{degree} of a node
is the number of its neighbors.
For example, in the following graph,
the neighbors of node 2 are 1, 4 and 5,
so its degree is 3.
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
\node[draw, circle] (3) at (1,1) {$3$};
\node[draw, circle] (4) at (4,1) {$4$};
\node[draw, circle] (5) at (6,2) {$5$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (2) -- (5);
%\path[draw,thick,-] (4) -- (5);
\end{tikzpicture}
\end{center}
The sum of degrees in a graph is always $2m$
where $m$ is the number of edges.
The reason for this is that each edge
increases the degree of two nodes by one.
Thus, the sum of degrees is always even.

\index{regular graph}
\index{complete graph}

A graph is \key{regular} if the
degree of every node is a constant $d$.
A graph is \key{complete} if the
degree of every node is $n-1$, i.e.,
the graph contains all possible edges
between the nodes.

\index{indegree}
\index{outdegree}

In a directed graph, the \key{indegree}
and \key{outdegree} of a node are
the numbers of edges that end and begin
at the node, respectively.
For example, in the following graph,
node 2 has indegree 2 and outdegree 1.
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (4,3) {$2$};
\node[draw, circle] (3) at (1,1) {$3$};
\node[draw, circle] (4) at (4,1) {$4$};
\node[draw, circle] (5) at (6,2) {$5$};
\path[draw,thick,->,>=latex] (1) -- (2);
\path[draw,thick,->,>=latex] (1) -- (3);
\path[draw,thick,->,>=latex] (1) -- (4);
\path[draw,thick,->,>=latex] (3) -- (4);
\path[draw,thick,->,>=latex] (2) -- (4);
\path[draw,thick,<-,>=latex] (2) -- (5);
\end{tikzpicture}
\end{center}

\subsubsection{Colorings}

\index{coloring}
\index{bipartite graph}

In a \key{coloring} of a graph,
each node is assigned a color so that
no adjacent nodes have the same color.

A graph is \key{bipartite} if
it is possible to color it using two colors.
It turns out that a graph is bipartite
exactly when it doesn't contain a cycle
with an odd number of edges.
For example, the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$2$};
\node[draw, circle] (2) at (4,3) {$3$};
\node[draw, circle] (3) at (1,1) {$5$};
\node[draw, circle] (4) at (4,1) {$6$};
\node[draw, circle] (5) at (-2,1) {$4$};
\node[draw, circle] (6) at (-2,3) {$1$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (5) -- (6);
\end{tikzpicture}
\end{center}
is bipartite because we can color it as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle, fill=blue!40] (1) at (1,3) {$2$};
\node[draw, circle, fill=red!40] (2) at (4,3) {$3$};
\node[draw, circle, fill=red!40] (3) at (1,1) {$5$};
\node[draw, circle, fill=blue!40] (4) at (4,1) {$6$};
\node[draw, circle, fill=red!40] (5) at (-2,1) {$4$};
\node[draw, circle, fill=blue!40] (6) at (-2,3) {$1$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (5) -- (6);
\end{tikzpicture}
\end{center}

\subsubsection{Simplicity}

\index{simple graph}

A graph is \key{simple}
if no edge begins and ends at the same node,
and there are no multiple
edges between two nodes.
Often we will assume that the graph is simple.
For example, the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$2$};
\node[draw, circle] (2) at (4,3) {$3$};
\node[draw, circle] (3) at (1,1) {$5$};
\node[draw, circle] (4) at (4,1) {$6$};
\node[draw, circle] (5) at (-2,1) {$4$};
\node[draw, circle] (6) at (-2,3) {$1$};
\path[draw,thick,-] (1) edge [bend right=20] (2);
\path[draw,thick,-] (2) edge [bend right=20] (1);
%\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (3) -- (4);
\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (5) -- (6);
\tikzset{every loop/.style={in=135,out=190}}
\path[draw,thick,-] (5) edge [loop left] (5);
\end{tikzpicture}
\end{center}
is \emph{not} simple because there is an edge that begins
and ends at node 4, and there are two edges
between nodes 2 and 3.

\section{Graph representation}

There are several ways to represent graphs
in memory in an algorithm.
The choice of a data structure
depends on the size of the graph and
how the algorithm manipulates it.
Next we will go through three representations.

\subsubsection{Adjacency list representation}

\index{adjacency list}

A usual way to represent a graph is
to create an \key{adjacency list} for each node.
An adjacency list contains all nodes
that can be reached from the node using a single edge.
The adjacency list representation is the most popular
way to store a graph, and most algorithms can be
efficiently implemented using it.

A good way to store the adjacency lists is to allocate an array
in which each element is a vector:
\begin{lstlisting}
vector<int> v[N];
\end{lstlisting}

The adjacency list for node $s$ is in position
$\texttt{v}[s]$ in the array.
The constant $N$ is chosen so that all
adjacency lists can be stored.
For example, the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (3,1) {$4$};
\path[draw,thick,->,>=latex] (1) -- (2);
\path[draw,thick,->,>=latex] (2) -- (3);
\path[draw,thick,->,>=latex] (2) -- (4);
\path[draw,thick,->,>=latex] (3) -- (4);
\path[draw,thick,->,>=latex] (4) -- (1);
\end{tikzpicture}
\end{center}
can be stored as follows:
\begin{lstlisting}
v[1].push_back(2);
v[2].push_back(3);
v[2].push_back(4);
v[3].push_back(4);
v[4].push_back(1);
\end{lstlisting}

If the graph is undirected, it can be stored in a similar way,
but each edge is stored in both directions.

For a weighted graph, the structure can be extended
as follows:
\begin{lstlisting}
vector<pair<int,int>> v[N];
\end{lstlisting}

Now each adjacency list contains pairs whose first
element is the target node,
and the second element is the edge weight.
For example, the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (3,1) {$4$};
\path[draw,thick,->,>=latex] (1) -- node[font=\small,label=above:5] {} (2);
\path[draw,thick,->,>=latex] (2) -- node[font=\small,label=above:7] {} (3);
\path[draw,thick,->,>=latex] (2) -- node[font=\small,label=left:6] {} (4);
\path[draw,thick,->,>=latex] (3) -- node[font=\small,label=right:5] {} (4);
\path[draw,thick,->,>=latex] (4) -- node[font=\small,label=left:2] {} (1);
\end{tikzpicture}
\end{center}
can be stored as follows:
\begin{lstlisting}
v[1].push_back({2,5});
v[2].push_back({3,7});
v[2].push_back({4,6});
v[3].push_back({4,5});
v[4].push_back({1,2});
\end{lstlisting}

The benefit of the adjacency list representation is that
we can efficiently find the nodes that can be
reached from a certain node.
For example, the following loop goes through all nodes
that can be reached from node $s$:
\begin{lstlisting}
for (auto u : v[s]) {
    // process node u
}
\end{lstlisting}

\subsubsection{Adjacency matrix representation}

\index{adjacency matrix}

An \key{adjacency matrix} is a two-dimensional array
that indicates for each possible edge
whether it is included in the graph.
Using an adjacency matrix, we can efficiently check
if there is an edge between two nodes.
On the other hand, the matrix contains $n^2$ elements,
so it takes a lot of memory if the graph is large.
We can store the matrix as an array
\begin{lstlisting}
int v[N][N];
\end{lstlisting}
where the value $\texttt{v}[a][b]$ indicates
whether the graph contains an edge
from node $a$ to node $b$.
If the edge is included in the graph,
then $\texttt{v}[a][b]=1$,
and otherwise $\texttt{v}[a][b]=0$.
For example, the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (3,1) {$4$};
\path[draw,thick,->,>=latex] (1) -- (2);
\path[draw,thick,->,>=latex] (2) -- (3);
\path[draw,thick,->,>=latex] (2) -- (4);
\path[draw,thick,->,>=latex] (3) -- (4);
\path[draw,thick,->,>=latex] (4) -- (1);
\end{tikzpicture}
\end{center}
can be represented as follows:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (4,4);
\node at (0.5,0.5) {1};
\node at (1.5,0.5) {0};
\node at (2.5,0.5) {0};
\node at (3.5,0.5) {0};
\node at (0.5,1.5) {0};
\node at (1.5,1.5) {0};
\node at (2.5,1.5) {0};
\node at (3.5,1.5) {1};
\node at (0.5,2.5) {0};
\node at (1.5,2.5) {0};
\node at (2.5,2.5) {1};
\node at (3.5,2.5) {1};
\node at (0.5,3.5) {0};
\node at (1.5,3.5) {1};
\node at (2.5,3.5) {0};
\node at (3.5,3.5) {0};
\node at (-0.5,0.5) {4};
\node at (-0.5,1.5) {3};
\node at (-0.5,2.5) {2};
\node at (-0.5,3.5) {1};
\node at (0.5,4.5) {1};
\node at (1.5,4.5) {2};
\node at (2.5,4.5) {3};
\node at (3.5,4.5) {4};
\end{tikzpicture}
\end{center}
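
As a minimal sketch, the matrix for the graph above might be filled
and queried as follows.
The names \texttt{adj}, \texttt{addEdge}, \texttt{hasEdge} and
\texttt{buildExample} and the bound $N=5$ are assumptions
made for this illustration only:
\begin{lstlisting}
#include <cassert>

const int N = 5;  // assumed bound: nodes 1..4, index 0 unused
int adj[N][N];    // global arrays are zero-initialized

// stores the directed edge a -> b
void addEdge(int a, int b) {
    assert(1 <= a && a < N && 1 <= b && b < N);
    adj[a][b] = 1;
}

// checks in constant time if the edge a -> b exists
bool hasEdge(int a, int b) {
    return adj[a][b] == 1;
}

// fills the matrix with the edges of the example graph
void buildExample() {
    addEdge(1,2);
    addEdge(2,3);
    addEdge(2,4);
    addEdge(3,4);
    addEdge(4,1);
}
\end{lstlisting}
After a call to \texttt{buildExample()}, the call \texttt{hasEdge(2,4)}
returns true but \texttt{hasEdge(4,2)} returns false,
because a directed edge is stored only in its own direction.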
If the graph is weighted, the adjacency matrix
representation can be extended so that
the matrix contains the weight of the edge
if the edge exists.
Using this representation, the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (3,1) {$4$};
\path[draw,thick,->,>=latex] (1) -- node[font=\small,label=above:5] {} (2);
\path[draw,thick,->,>=latex] (2) -- node[font=\small,label=above:7] {} (3);
\path[draw,thick,->,>=latex] (2) -- node[font=\small,label=left:6] {} (4);
\path[draw,thick,->,>=latex] (3) -- node[font=\small,label=right:5] {} (4);
\path[draw,thick,->,>=latex] (4) -- node[font=\small,label=left:2] {} (1);
\end{tikzpicture}
\end{center}
\begin{samepage}
corresponds to the following matrix:
\begin{center}
\begin{tikzpicture}[scale=0.7]
\draw (0,0) grid (4,4);
\node at (0.5,0.5) {2};
\node at (1.5,0.5) {0};
\node at (2.5,0.5) {0};
\node at (3.5,0.5) {0};
\node at (0.5,1.5) {0};
\node at (1.5,1.5) {0};
\node at (2.5,1.5) {0};
\node at (3.5,1.5) {5};
\node at (0.5,2.5) {0};
\node at (1.5,2.5) {0};
\node at (2.5,2.5) {7};
\node at (3.5,2.5) {6};
\node at (0.5,3.5) {0};
\node at (1.5,3.5) {5};
\node at (2.5,3.5) {0};
\node at (3.5,3.5) {0};
\node at (-0.5,0.5) {4};
\node at (-0.5,1.5) {3};
\node at (-0.5,2.5) {2};
\node at (-0.5,3.5) {1};
\node at (0.5,4.5) {1};
\node at (1.5,4.5) {2};
\node at (2.5,4.5) {3};
\node at (3.5,4.5) {4};
\end{tikzpicture}
\end{center}
\end{samepage}

\subsubsection{Edge list representation}

\index{edge list}

An \key{edge list} contains all edges of a graph.
This is a convenient way to represent a graph
if the algorithm will go through all edges of the graph,
and there is no need to find edges that begin
at a given node.

The edge list can be stored in a vector
\begin{lstlisting}
vector<pair<int,int>> v;
\end{lstlisting}
where each element contains the starting
and ending node of an edge.
Thus, the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (3,1) {$4$};
\path[draw,thick,->,>=latex] (1) -- (2);
\path[draw,thick,->,>=latex] (2) -- (3);
\path[draw,thick,->,>=latex] (2) -- (4);
\path[draw,thick,->,>=latex] (3) -- (4);
\path[draw,thick,->,>=latex] (4) -- (1);
\end{tikzpicture}
\end{center}
can be represented as follows:
\begin{lstlisting}
v.push_back({1,2});
v.push_back({2,3});
v.push_back({2,4});
v.push_back({3,4});
v.push_back({4,1});
\end{lstlisting}

\noindent
If the graph is weighted, we can extend the structure as follows:
\begin{lstlisting}
vector<pair<pair<int,int>,int>> v;
\end{lstlisting}
Now the list contains pairs whose first element
contains the starting and ending node of an edge,
and the second element corresponds to the edge weight.
For example, the graph
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (1,3) {$1$};
\node[draw, circle] (2) at (3,3) {$2$};
\node[draw, circle] (3) at (5,3) {$3$};
\node[draw, circle] (4) at (3,1) {$4$};
\path[draw,thick,->,>=latex] (1) -- node[font=\small,label=above:5] {} (2);
\path[draw,thick,->,>=latex] (2) -- node[font=\small,label=above:7] {} (3);
\path[draw,thick,->,>=latex] (2) -- node[font=\small,label=left:6] {} (4);
\path[draw,thick,->,>=latex] (3) -- node[font=\small,label=right:5] {} (4);
\path[draw,thick,->,>=latex] (4) -- node[font=\small,label=left:2] {} (1);
\end{tikzpicture}
\end{center}
\begin{samepage}
can be represented as follows:
\begin{lstlisting}
v.push_back({{1,2},5});
v.push_back({{2,3},7});
v.push_back({{2,4},6});
v.push_back({{3,4},5});
v.push_back({{4,1},2});
\end{lstlisting}
\end{samepage}
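
If an algorithm later needs the edges that begin at a given node,
a weighted edge list can be converted into adjacency lists
by going through the edges once.
The following is only a sketch of one possible conversion;
the function name \texttt{toAdjacencyLists}, the parameter $n$
and the use of a vector of vectors instead of a fixed-size array
are assumptions made for this illustration:
\begin{lstlisting}
#include <cassert>
#include <utility>
#include <vector>
using namespace std;

// Hypothetical helper: converts a weighted edge list
// into adjacency lists. Nodes are numbered 1..n, and
// each element of edges has the form {{a,b},w}.
vector<vector<pair<int,int>>> toAdjacencyLists(
        int n, const vector<pair<pair<int,int>,int>>& edges) {
    // position 0 is unused because nodes start from 1
    vector<vector<pair<int,int>>> adj(n+1);
    for (const auto& e : edges) {
        int a = e.first.first;
        int b = e.first.second;
        int w = e.second;
        assert(1 <= a && a <= n && 1 <= b && b <= n);
        // directed edge a -> b with weight w
        adj[a].push_back({b,w});
    }
    return adj;
}
\end{lstlisting}
The conversion takes $O(n+m)$ time.
For an undirected graph, the loop would also add
the pair $\{a,w\}$ to the list of node $b$,
so that each edge is stored in both directions.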