cphb/luku14.tex

\chapter{Tree algorithms}

\index{tree}

A \key{tree} is a connected, acyclic graph
that contains $n$ nodes and $n-1$ edges.
Removing any edge from a tree divides it
into two components,
and adding any edge to a tree creates a cycle.
Moreover, there is always a unique path between any
two nodes in a tree.

For example, the following tree contains 7 nodes and 6 edges:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,3) {$1$};
\node[draw, circle] (2) at (2,3) {$2$};
\node[draw, circle] (3) at (0,1) {$4$};
\node[draw, circle] (4) at (2,1) {$5$};
\node[draw, circle] (5) at (4,1) {$6$};
\node[draw, circle] (6) at (-2,3) {$7$};
\node[draw, circle] (7) at (-2,1) {$3$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (3) -- (7);
\end{tikzpicture}
\end{center}

\index{leaf}

The \key{leaves} of a tree are nodes
with degree 1, i.e., with only one neighbor.
For example, the leaves in the above tree
are nodes 3, 5, 6 and 7.

\index{root}
\index{rooted tree}

In a \key{rooted} tree, one of the nodes
is chosen to be a \key{root}, and all other nodes are
placed underneath the root.
For example, in the following tree,
node 1 is the root of the tree.

\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,3) {$1$};
\node[draw, circle] (2) at (2,1) {$2$};
\node[draw, circle] (3) at (-2,1) {$4$};
\node[draw, circle] (4) at (0,1) {$5$};
\node[draw, circle] (5) at (2,-1) {$6$};
\node[draw, circle] (6) at (-3,-1) {$3$};
\node[draw, circle] (7) at (-1,-1) {$7$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (3) -- (7);
\end{tikzpicture}
\end{center}

\index{child}
\index{parent}

In a rooted tree, the \key{childern} of a node
are its lower neighbors, and the \key{parent} of a node
is its upper neighbor.
Each node has exactly one parent,
except that the root doesn't have a parent.
For example, in the above tree,
the childern of node 4 are nodes 3 and 7,
and the parent is node 1.

\index{subtree}

The structure of a rooted tree is \emph{recursive}:
each node in the tree is the root of a \key{subtree}
that contains the node itself and all other nodes
that can be reached by travelling downwards in the tree.
For example, in the above tree, the subtree of node 4
contains nodes 4, 3 and 7.

\section{Tree search}

Depth-first search and breadth-first search
can be used for going through the nodes in a tree.
However, the search is easier to implement than
for a general graph, because
there are no cycles in the tree, and it is not
possible that the search would visit a node several times.

Often, we start a depth-first search from a chosen
root node.
The following recursive function implements it:

\begin{lstlisting}
void dfs(int s, int e) {
    // process node s
    for (auto u : v[s]) {
        if (u != e) dfs(u, s);
    }
}
\end{lstlisting}

The function parameters are the current node $s$
and the previous node $e$.
The idea of the parameter $e$ is to ensure
that the search only proceeds downwards in the tree
towards nodes that have not been visited yet.

The following function call starts the search
at node $x$:

\begin{lstlisting}
dfs(x, 0);
\end{lstlisting}

In the first call $e=0$ because there is no
previous node, and it is allowed
to proceed to any direction in the tree.

\subsubsection{Dynamic programming}

We can also use dynamic programming to calculate
some information from the tree during the search.
Using dynamic programming, we can, for example,
calculate in $O(n)$ time for each node the
number of nodes in its subtree,
or the length of the longest path downwards
that begins at the node.

As an example, let's calculate for each node $s$
a value $\texttt{c}[s]$: the number of nodes in its subtree.
The subtree contains the node itself and
all nodes in the subtrees of its children.
Thus, we can calculate the number of nodes
recursively using the following code:

\begin{lstlisting}
void haku(int s, int e) {
    c[s] = 1;
    for (auto u : v[s]) {
        if (u == e) continue;
        haku(u, s);
        c[s] += c[u];
    }
}
\end{lstlisting}

\section{Diameter}

\index{diameter}

The \key{diameter} of a tree
is the length of the longest path
between two nodes in the tree.
For example, in the tree
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,3) {$1$};
\node[draw, circle] (2) at (2,3) {$2$};
\node[draw, circle] (3) at (0,1) {$4$};
\node[draw, circle] (4) at (2,1) {$5$};
\node[draw, circle] (5) at (4,1) {$6$};
\node[draw, circle] (6) at (-2,3) {$7$};
\node[draw, circle] (7) at (-2,1) {$3$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (3) -- (7);
\end{tikzpicture}
\end{center}
the diameter is 4, and it corresponds to two paths:
the path between nodes 3 and 6,
and the path between nodes 7 and 6.

Next we will learn two efficient algorithms
for calculating the diameter of a tree.
Both algorithms calculate the diameter in $O(n)$ time.
The first algorithm is based on dynamic programming,
and the second algorithm uses two depth-first searches
to calculate the diameter.

\subsubsection{Algorithm 1}

First, one of the nodes is chosen to be the root.
After this, the algorithm calculates for each node
the length of the longest path that begins at some leaf,
ascends to the node and then descends to another leaf.
The length of the
longest such path equals the diameter of the tree.

In the example case, the longest path begins at node 7,
ascends to node 1, and then descends to node 6:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,3) {$1$};
\node[draw, circle] (2) at (2,1) {$2$};
\node[draw, circle] (3) at (-2,1) {$4$};
\node[draw, circle] (4) at (0,1) {$5$};
\node[draw, circle] (5) at (2,-1) {$6$};
\node[draw, circle] (6) at (-3,-1) {$3$};
\node[draw, circle] (7) at (-1,-1) {$7$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (3) -- (7);

\path[draw,thick,-,color=red,line width=2pt] (7) -- (3);
\path[draw,thick,-,color=red,line width=2pt] (3) -- (1);
\path[draw,thick,-,color=red,line width=2pt] (1) -- (2);
\path[draw,thick,-,color=red,line width=2pt] (2) -- (5);
\end{tikzpicture}
\end{center}

The algorithm first calculates using dynamic programming
for each node the length of the longest path
that goes downwards from the node.
For example, in the above tree,
the longest path from node 1 downwards has length 2
(the path can be $1 \rightarrow 4 \rightarrow 3$,
$1 \rightarrow 4 \rightarrow 7$ or $1 \rightarrow 2 \rightarrow 6$).

After this, the algorithm calculates for each node
the length of the longest path where the node
is the turning point of the path.
The longest such path can be found by selecting
two children with longest paths downwards.
For example, in the above graph,
nodes 2 and 4 are chosen for node 1.

\subsubsection{Algorithm 2}

Another efficient way to calculate the diameter
of a tree is based on two depth-first searches.
First, we choose an arbitrary node $a$ in the tree
and find a node $b$ with maximum distance to $a$.
Then, we find a node $c$ with maximum distance to $b$.
The diameter is the distance between nodes $b$ and $c$.

In the example case, $a$, $b$ and $c$ could be:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,3) {$1$};
\node[draw, circle] (2) at (2,3) {$2$};
\node[draw, circle] (3) at (0,1) {$4$};
\node[draw, circle] (4) at (2,1) {$5$};
\node[draw, circle] (5) at (4,1) {$6$};
\node[draw, circle] (6) at (-2,3) {$7$};
\node[draw, circle] (7) at (-2,1) {$3$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (3) -- (7);
\node[color=red] at (2,1.6) {$a$};
\node[color=red] at (-1.4,3) {$b$};
\node[color=red] at (4,1.6) {$c$};

\path[draw,thick,-,color=red,line width=2pt] (6) -- (3);
\path[draw,thick,-,color=red,line width=2pt] (3) -- (1);
\path[draw,thick,-,color=red,line width=2pt] (1) -- (2);
\path[draw,thick,-,color=red,line width=2pt] (2) -- (5);
\end{tikzpicture}
\end{center}

This is an elegant method, but why does it work?

It helps to draw the tree differently so that
the path that corresponds to the diameter
is horizontal, and all other
nodes hang from it:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (2,1) {$1$};
\node[draw, circle] (2) at (4,1) {$2$};
\node[draw, circle] (3) at (0,1) {$4$};
\node[draw, circle] (4) at (2,-1) {$5$};
\node[draw, circle] (5) at (6,1) {$6$};
\node[draw, circle] (6) at (0,-1) {$3$};
\node[draw, circle] (7) at (-2,1) {$7$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (3) -- (7);
\node[color=red] at (2,-1.6) {$a$};
\node[color=red] at (-2,1.6) {$b$};
\node[color=red] at (6,1.6) {$c$};
\node[color=red] at (2,1.6) {$x$};

\path[draw,thick,-,color=red,line width=2pt] (7) -- (3);
\path[draw,thick,-,color=red,line width=2pt] (3) -- (1);
\path[draw,thick,-,color=red,line width=2pt] (1) -- (2);
\path[draw,thick,-,color=red,line width=2pt] (2) -- (5);
\end{tikzpicture}
\end{center}

Node $x$ indicates the place where the path
from node $a$ joins the path that corresponds
to the diameter.
The farthest node from $a$
is node $b$, node $c$ or some other node
that is at least as far from node $x$.
Thus, this node can always be chosen for
a starting node of a path that corresponds to the diameter.

\section{Distances between nodes}

A more difficult problem is to calculate
for each node in the tree and for each direction,
the maximum distance to a node in that direction.
It turns out that this can be calculated in
$O(n)$ time as well using dynamic programming.

\begin{samepage}
In the example case, the distances are as follows:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,3) {$1$};
\node[draw, circle] (2) at (2,3) {$2$};
\node[draw, circle] (3) at (0,1) {$4$};
\node[draw, circle] (4) at (2,1) {$5$};
\node[draw, circle] (5) at (4,1) {$6$};
\node[draw, circle] (6) at (-2,3) {$7$};
\node[draw, circle] (7) at (-2,1) {$3$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (3) -- (7);
\node[color=red] at (0.5,3.2) {$2$};
\node[color=red] at (0.3,2.4) {$1$};
\node[color=red] at (-0.2,2.4) {$2$};
\node[color=red] at (-0.2,1.5) {$3$};
\node[color=red] at (-0.5,1.2) {$1$};
\node[color=red] at (-1.7,2.4) {$4$};
\node[color=red] at (-0.5,0.8) {$1$};
\node[color=red] at (-1.5,0.8) {$4$};
\node[color=red] at (1.5,3.2) {$3$};
\node[color=red] at (1.5,1.2) {$3$};
\node[color=red] at (3.5,1.2) {$4$};
\node[color=red] at (2.2,2.4) {$1$};
\end{tikzpicture}
\end{center}
\end{samepage}
For example, the furthest node from node 4
upwards is node 6, and the distance to this
node is 3 using the path
$4 \rightarrow 1 \rightarrow 2 \rightarrow 6$.

\begin{samepage}
Also in this problem, a good starting point
is to root the tree.
After this, all distances downwards can
be calculated using dynamic programming:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,3) {$1$};
\node[draw, circle] (2) at (2,1) {$2$};
\node[draw, circle] (3) at (-2,1) {$4$};
\node[draw, circle] (4) at (0,1) {$5$};
\node[draw, circle] (5) at (2,-1) {$6$};
\node[draw, circle] (6) at (-3,-1) {$3$};
\node[draw, circle] (7) at (-1,-1) {$7$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (3) -- (7);

\node[color=red] at (-2.5,0.7) {$1$};
\node[color=red] at (-1.5,0.7) {$1$};

\node[color=red] at (2.2,0.5) {$1$};

\node[color=red] at (-0.5,2.8) {$2$};
\node[color=red] at (0.2,2.5) {$1$};
\node[color=red] at (0.5,2.8) {$2$};
\end{tikzpicture}
\end{center}
\end{samepage}

The remaining task is to calculate the distances upwards.
This can be done by going through the nodes once again
and keeping track of the largest distance from the parent
of the current node to some other node in another direction.

For example, the distance from node 2 upwards
is one larger than the distance from node 1
downwards in some other direction than node 2:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,3) {$1$};
\node[draw, circle] (2) at (2,1) {$2$};
\node[draw, circle] (3) at (-2,1) {$4$};
\node[draw, circle] (4) at (0,1) {$5$};
\node[draw, circle] (5) at (2,-1) {$6$};
\node[draw, circle] (6) at (-3,-1) {$3$};
\node[draw, circle] (7) at (-1,-1) {$7$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (3) -- (7);

\path[draw,thick,-,color=red,line width=2pt] (1) -- (2);
\path[draw,thick,-,color=red,line width=2pt] (1) -- (3);
\path[draw,thick,-,color=red,line width=2pt] (3) -- (7);

\end{tikzpicture}
\end{center}
Finally, we can calculate the distances for all nodes
and all directions:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,3) {$1$};
\node[draw, circle] (2) at (2,1) {$2$};
\node[draw, circle] (3) at (-2,1) {$4$};
\node[draw, circle] (4) at (0,1) {$5$};
\node[draw, circle] (5) at (2,-1) {$6$};
\node[draw, circle] (6) at (-3,-1) {$3$};
\node[draw, circle] (7) at (-1,-1) {$7$};
\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (1) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (3) -- (6);
\path[draw,thick,-] (3) -- (7);

\node[color=red] at (-2.5,0.7) {$1$};
\node[color=red] at (-1.5,0.7) {$1$};

\node[color=red] at (2.2,0.5) {$1$};

\node[color=red] at (-0.5,2.8) {$2$};
\node[color=red] at (0.2,2.5) {$1$};
\node[color=red] at (0.5,2.8) {$2$};

\node[color=red] at (-3,-0.4) {$4$};
\node[color=red] at (-1,-0.4) {$4$};
\node[color=red] at (-2,1.6) {$3$};
\node[color=red] at (2,1.6) {$3$};

\node[color=red] at (2.2,-0.4) {$4$};
\node[color=red] at (0.2,1.6) {$3$};
\end{tikzpicture}
\end{center}

\section{Binary trees}

\index{binary tree}

\begin{samepage}
A \key{binary tree} is a rooted tree
where each node has a left subtree
and a right subtree.
It is possible that a subtree of a node is empty.
Thus, every node in a binary tree has
0, 1 or 2 children.

For example, the following tree is a binary tree:
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,0) {$1$};
\node[draw, circle] (2) at (-1.5,-1.5) {$2$};
\node[draw, circle] (3) at (1.5,-1.5) {$3$};
\node[draw, circle] (4) at (-3,-3) {$4$};
\node[draw, circle] (5) at (0,-3) {$5$};
\node[draw, circle] (6) at (-1.5,-4.5) {$6$};
\node[draw, circle] (7) at (3,-3) {$7$};

\path[draw,thick,-] (1) -- (2);
\path[draw,thick,-] (1) -- (3);
\path[draw,thick,-] (2) -- (4);
\path[draw,thick,-] (2) -- (5);
\path[draw,thick,-] (5) -- (6);
\path[draw,thick,-] (3) -- (7);
\end{tikzpicture}
\end{center}
\end{samepage}

\index{pre-order}
\index{in-order}
\index{post-order}

The nodes in a binary tree have three natural
orders that correspond to different ways to
recursively traverse the nodes:

\begin{itemize}
\item \key{pre-order}: first process the root,
then traverse the left subtree, then traverse the right subtree
\item \key{in-order}: first traverse the left subtree,
then process the root, then traverse the right subtree
\item \key{post-order}: first traverse the left subtree,
then traverse the right subtree, then process the root
\end{itemize}

For the above tree, the nodes in
pre-order are
$[1,2,4,5,6,3,7]$,
in in-order $[4,2,6,5,1,3,7]$
and in post-order $[4,6,5,2,7,3,1]$.

If we know the pre-order and the in-order
of a tree, we can find out the exact structure of the tree.
For example, the tree above is the only possible tree
with pre-order $[1,2,4,5,6,3,7]$ and
in-order $[4,2,6,5,1,3,7]$.
Correspondingly, the post-order and the in-order
also determine the structure of a tree.

However, the situation is different if we only know
the pre-order and the post-order of a tree.
In this case, there may be more than one tree
that match the orders.
For example, in both of the trees
\begin{center}
\begin{tikzpicture}[scale=0.9]
\node[draw, circle] (1) at (0,0) {$1$};
\node[draw, circle] (2) at (-1.5,-1.5) {$2$};
\path[draw,thick,-] (1) -- (2);

\node[draw, circle] (1b) at (0+4,0) {$1$};
\node[draw, circle] (2b) at (1.5+4,-1.5) {$2$};
\path[draw,thick,-] (1b) -- (2b);
\end{tikzpicture}
\end{center}
the pre-order is $[1,2]$ and the post-order is $[2,1]$
but the trees have different structures.