\chapter{Data structures}
\index{data structure}
A \key{data structure} is a way to store
data in the memory of a computer.
It is important to choose an appropriate
data structure for a problem,
because each data structure has its own
advantages and disadvantages.
The crucial question is: which operations
are efficient in the chosen data structure?
This chapter introduces the most important
data structures in the C++ standard library.
It is a good idea to use the standard library
whenever possible,
because it will save a lot of time.
Later in the book we will learn about more sophisticated
data structures that are not available
in the standard library.
\section{Dynamic arrays}
\index{dynamic array}
\index{vector}
A \key{dynamic array} is an array whose
size can be changed during the execution
of the program.
The most popular dynamic array in C++ is
the \texttt{vector} structure,
which can be used almost like an ordinary array.

The following code creates an empty vector and
adds three elements to it:
\begin{lstlisting}
vector<int> v;
v.push_back(3); // [3]
v.push_back(2); // [3,2]
v.push_back(5); // [3,2,5]
\end{lstlisting}
After this, the elements can be accessed as in an ordinary array:
\begin{lstlisting}
cout << v[0] << "\n"; // 3
cout << v[1] << "\n"; // 2
cout << v[2] << "\n"; // 5
\end{lstlisting}
The function \texttt{size} returns the number of elements in the vector.
The following code iterates through
the vector and prints all elements in it:
\begin{lstlisting}
for (int i = 0; i < v.size(); i++) {
    cout << v[i] << "\n";
}
\end{lstlisting}
\begin{samepage}
A shorter way to iterate through a vector is as follows:
\begin{lstlisting}
for (auto x : v) {
    cout << x << "\n";
}
\end{lstlisting}
\end{samepage}
The function \texttt{back} returns the last element
in the vector, and
the function \texttt{pop\_back} removes the last element:
\begin{lstlisting}
vector<int> v;
v.push_back(5);
v.push_back(2);
cout << v.back() << "\n"; // 2
v.pop_back();
cout << v.back() << "\n"; // 5
\end{lstlisting}
The following code creates a vector with five elements:
\begin{lstlisting}
vector<int> v = {2,4,2,5,1};
\end{lstlisting}
Another way to create a vector is to give the number
of elements and the initial value for each element:
\begin{lstlisting}
// size 10, initial value 0
vector<int> v(10);
\end{lstlisting}
\begin{lstlisting}
// size 10, initial value 5
vector<int> v(10, 5);
\end{lstlisting}
The internal implementation of a vector
uses an ordinary array.
If the size of the vector increases and
the array becomes too small,
a new array is allocated and all the
elements are moved to the new array.
However, this does not happen often,
because the new array is a constant factor larger
than the old one (typically 1.5 or 2 times as large),
so the amortized (average) time complexity of
\texttt{push\_back} is $O(1)$.
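To make the amortized bound concrete, here is a minimal sketch
of a growable array that doubles its capacity whenever it becomes full.
The structure \texttt{GrowArray} and its members are hypothetical names
chosen for this illustration; this is not how \texttt{vector}
is required to be implemented.
The counter \texttt{moves} records how many element copies
the reallocations cause.
\begin{lstlisting}
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical growable int array (a sketch, not std::vector)
// that doubles its capacity whenever it becomes full.
struct GrowArray {
    std::unique_ptr<int[]> data;
    std::size_t size = 0;
    std::size_t capacity = 0;
    std::size_t moves = 0;  // element copies caused by reallocations

    void push_back(int x) {
        if (size == capacity) {
            // allocate a twice as large array and copy the old elements
            std::size_t newCapacity = capacity == 0 ? 1 : 2 * capacity;
            std::unique_ptr<int[]> bigger(new int[newCapacity]);
            for (std::size_t i = 0; i < size; i++) {
                bigger[i] = data[i];
                moves++;
            }
            data = std::move(bigger);
            capacity = newCapacity;
        }
        data[size++] = x;
    }
};
\end{lstlisting}
After 1000 calls of \texttt{push\_back}, the capacity is 1024
and the reallocations have copied $1+2+4+\cdots+512=1023$ elements
in total, i.e., fewer than two copies per inserted element.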
\index{string}
The \texttt{string} structure
is also a dynamic array that can be used almost like a vector.
In addition, there is special syntax for strings
that is not available in other data structures.
Strings can be combined using the \texttt{+} symbol.
The function $\texttt{substr}(k,x)$ returns the substring
that begins at position $k$ and has length $x$,
and the function $\texttt{find}(\texttt{t})$ returns the position
of the first occurrence of a substring \texttt{t},
or the special value \texttt{string::npos}
if \texttt{t} does not occur in the string.

The following code presents some string operations:
\begin{lstlisting}
string a = "hatti";
string b = a+a;
cout << b << "\n"; // hattihatti
b[5] = 'v';
cout << b << "\n"; // hattivatti
string c = b.substr(3,4);
cout << c << "\n"; // tiva
\end{lstlisting}
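The example above does not use \texttt{find}.
As a sketch, the following function counts the possibly overlapping
occurrences of \texttt{t} in \texttt{s} by calling \texttt{find} repeatedly;
the optional second parameter of \texttt{find} gives the position
where the search starts.
The name \texttt{countOccurrences} is our own and not part of the library.
\begin{lstlisting}
#include <cstddef>
#include <string>

// Counts the (possibly overlapping) occurrences of t in s.
// find returns std::string::npos when there are no more occurrences,
// and its second parameter is the position where the search starts.
int countOccurrences(const std::string& s, const std::string& t) {
    int count = 0;
    std::size_t pos = s.find(t);
    while (pos != std::string::npos) {
        count++;
        pos = s.find(t, pos + 1);  // search again after this match starts
    }
    return count;
}
\end{lstlisting}
For example, for the strings \texttt{hattivatti} and \texttt{tti},
the function returns 2.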
\section{Set structures}
\index{set}
A \key{set} is a data structure that
maintains a collection of elements.
The basic operations of sets are element
insertion, search and removal.
The C++ standard library contains two set
implementations.
The structure \texttt{set} is based on a balanced
binary tree, and the time complexity of its
operations is $O(\log n)$.
The structure \texttt{unordered\_set} uses hashing,
and the time complexity of its operations is $O(1)$ on average.

The choice of which set implementation to use
is often a matter of taste.
The benefit of the \texttt{set} structure
is that it maintains the order of the elements
and provides functions that are not available
in \texttt{unordered\_set}.
On the other hand, \texttt{unordered\_set} is
often more efficient.

The following code creates a set
that consists of integers,
and shows some of the operations.
The function \texttt{insert} adds an element to the set,
the function \texttt{count} returns the number of occurrences
of an element,
and the function \texttt{erase} removes an element from the set.
\begin{lstlisting}
set<int> s;
s.insert(3);
s.insert(2);
s.insert(5);
cout << s.count(3) << "\n"; // 1
cout << s.count(4) << "\n"; // 0
s.erase(3);
s.insert(4);
cout << s.count(3) << "\n"; // 0
cout << s.count(4) << "\n"; // 1
\end{lstlisting}
A set can be used mostly like a vector,
but it is not possible to access
the elements using the \texttt{[]} notation.
The following code creates a set,
prints the number of elements in it, and then
iterates through all the elements:
\begin{lstlisting}
set<int> s = {2,5,6,8};
cout << s.size() << "\n"; // 4
for (auto x : s) {
    cout << x << "\n";
}
\end{lstlisting}
An important property of sets is
that all their elements are \emph{distinct}.
Thus, the function \texttt{count} always returns
either 0 (the element is not in the set)
or 1 (the element is in the set),
and the function \texttt{insert} never adds
an element to the set if it is
already there.
The following code illustrates this:
\begin{lstlisting}
set<int> s;
s.insert(5);
s.insert(5);
s.insert(5);
cout << s.count(5) << "\n"; // 1
\end{lstlisting}
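Because a set keeps only one copy of each value and
iterates through its elements in increasing order,
it gives a simple way to remove duplicates from a list.
The following sketch builds a set from a vector and copies it back;
with $n$ elements this takes $O(n \log n)$ time.
The function name \texttt{distinctSorted} is hypothetical.
\begin{lstlisting}
#include <set>
#include <vector>

// Returns the distinct values of v in increasing order.
std::vector<int> distinctSorted(const std::vector<int>& v) {
    // duplicates are ignored when the set is built
    std::set<int> s(v.begin(), v.end());
    // iterating over a set visits the elements in increasing order
    return std::vector<int>(s.begin(), s.end());
}
\end{lstlisting}
For example, for the list $[5,2,5,3,2]$ the result is $[2,3,5]$.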
C++ also contains the structures
\texttt{multiset} and \texttt{unordered\_multiset}
that otherwise work like \texttt{set}
and \texttt{unordered\_set},
but they can contain multiple instances of an element.
For example, in the following code all three instances
of the number 5 are added to a multiset:
\begin{lstlisting}
multiset<int> s;
s.insert(5);
s.insert(5);
s.insert(5);
cout << s.count(5) << "\n"; // 3
\end{lstlisting}
The function \texttt{erase} removes
all instances of an element
from a multiset:
\begin{lstlisting}
s.erase(5);
cout << s.count(5) << "\n"; // 0
\end{lstlisting}
Often, only one instance should be removed.
Starting again from a multiset that contains three instances of 5,
this can be done as follows:
\begin{lstlisting}
s.erase(s.find(5));
cout << s.count(5) << "\n"; // 2
\end{lstlisting}
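The call \texttt{s.erase(s.find(5))} assumes that 5 is in the multiset:
otherwise \texttt{find} returns \texttt{end}, and passing \texttt{end}
to \texttt{erase} is undefined behavior.
A safer variant checks the iterator first, as in the following sketch.
The helper name \texttt{eraseOne} is our own choice.
\begin{lstlisting}
#include <set>

// Removes one instance of x from the multiset s, if x is present.
// Returns true if an element was removed.
bool eraseOne(std::multiset<int>& s, int x) {
    auto it = s.find(x);
    if (it == s.end()) return false;  // x is missing: nothing to erase
    s.erase(it);  // erasing by iterator removes exactly one instance
    return true;
}
\end{lstlisting}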
\section{Map structures}

\index{map}

A \key{map} is a generalized array
that consists of key-value pairs.
While the keys in an ordinary array are always
the consecutive integers $0,1,\ldots,n-1$,
where $n$ is the size of the array,
the keys in a map can be of other data types,
such as strings,
and they do not have to be consecutive values.
The C++ standard library contains two map
implementations that correspond to the set
implementations: the structure
\texttt{map} is based on a balanced
binary tree and accessing elements
takes $O(\log n)$ time,
while the structure
\texttt{unordered\_map} uses hashing
and accessing elements takes $O(1)$ time on average.
The following code creates a map
where the keys are strings and the values are integers:
\begin{lstlisting}
map<string,int> m;
m["monkey"] = 4;
m["banana"] = 3;
m["harpsichord"] = 9;
cout << m["banana"] << "\n"; // 3
\end{lstlisting}
If the value of a key is requested
but the map does not contain it,
the key is automatically added to the map with
a default value.
For example, in the following code,
the key ``aybabtu'' with value 0
is added to the map.
\begin{lstlisting}
map<string,int> m;
cout << m["aybabtu"] << "\n"; // 0
\end{lstlisting}
The function \texttt{count} checks
if a key exists in a map:
\begin{lstlisting}
if (m.count("aybabtu")) {
    cout << "key exists in the map";
}
\end{lstlisting}
The following code prints all the keys and values
in a map (for \texttt{map}, in increasing order of the keys):
\begin{lstlisting}
for (auto x : m) {
    cout << x.first << " " << x.second << "\n";
}
\end{lstlisting}
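As a small sketch of both behaviors described above,
the following functions count word frequencies with a \texttt{map}.
The expression \texttt{m[w]++} works because a missing key is first added
with value 0, while \texttt{countOf} uses \texttt{find}
so that looking up a missing key does not insert it.
The names \texttt{wordCounts} and \texttt{countOf} are hypothetical.
\begin{lstlisting}
#include <map>
#include <sstream>
#include <string>

// Counts how many times each whitespace-separated word occurs in text.
// m[w]++ works because a missing key is added with the value 0.
std::map<std::string, int> wordCounts(const std::string& text) {
    std::map<std::string, int> m;
    std::istringstream in(text);
    std::string w;
    while (in >> w) m[w]++;
    return m;
}

// Returns the count of key without inserting it into the map:
// find returns m.end() if the key is missing.
int countOf(const std::map<std::string, int>& m, const std::string& key) {
    auto it = m.find(key);
    return it == m.end() ? 0 : it->second;
}
\end{lstlisting}
Since \texttt{map} keeps its keys ordered, iterating over the result for
the text \texttt{monkey banana monkey} visits \texttt{banana}
before \texttt{monkey}.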
\section{Iterators and ranges}

\index{iterator}
Many functions in the C++ standard library
operate with iterators.
An \key{iterator} is a variable that points
to an element in a data structure.
The frequently used iterators \texttt{begin}
and \texttt{end} define a range that contains
all elements in a data structure.
The iterator \texttt{begin} points to
the first element in the data structure,
and the iterator \texttt{end} points to
the position \emph{after} the last element.
The situation looks as follows:
\begin{center}
\begin{tabular}{llllllllll}
\{ & 3, & 4, & 6, & 8, & 12, & 13, & 14, & 17 & \} \\
& $\uparrow$ & & & & & & & & $\uparrow$ \\
& \multicolumn{3}{l}{\texttt{s.begin()}} & & & & & & \texttt{s.end()} \\
\end{tabular}
\end{center}
Note the asymmetry in the iterators:
\texttt{s.begin()} points to an element in the data structure,
while \texttt{s.end()} points outside the data structure.
Thus, the range defined by the iterators is \emph{half-open}.
\subsubsection{Working with ranges}

Iterators are used in C++ standard library functions
that are given a range of elements in a data structure.
Usually, we want to process all elements in a
data structure, so the iterators
\texttt{begin} and \texttt{end} are passed to the function.

For example, the following code sorts a vector
using the function \texttt{sort},
then reverses the order of the elements using the function
\texttt{reverse}, and finally shuffles the order of
the elements using the function \texttt{shuffle}.
The function \texttt{shuffle} is given a random number generator
as its third parameter; here we use \texttt{mt19937}
seeded by \texttt{random\_device}.
The older function \texttt{random\_shuffle}
was deprecated in C++14 and removed in C++17.

\index{sort@\texttt{sort}}
\index{reverse@\texttt{reverse}}
\index{shuffle@\texttt{shuffle}}
\begin{lstlisting}
mt19937 rng(random_device{}());
sort(v.begin(), v.end());
reverse(v.begin(), v.end());
shuffle(v.begin(), v.end(), rng);
\end{lstlisting}
These functions can also be used with an ordinary array.
In this case, the functions are given pointers to the array
instead of iterators:

\newpage
\begin{lstlisting}
sort(t, t+n);
reverse(t, t+n);
shuffle(t, t+n, rng);
\end{lstlisting}
\subsubsection{Set iterators}

Iterators are often used to access
elements of a set.
The following code creates an iterator
\texttt{it} that points to the first element in a set:
\begin{lstlisting}
set<int>::iterator it = s.begin();
\end{lstlisting}
A shorter way to write the code is as follows:
\begin{lstlisting}
auto it = s.begin();
\end{lstlisting}
The element to which an iterator points
can be accessed using the \texttt{*} symbol.
For example, the following code prints
the first element in the set:
\begin{lstlisting}
auto it = s.begin();
cout << *it << "\n";
\end{lstlisting}
Iterators can be moved using the operators
\texttt{++} (forward) and \texttt{-{}-} (backward),
meaning that the iterator moves to the next
or previous element in the set.
The following code prints all the elements in the set
in increasing order:
\begin{lstlisting}
for (auto it = s.begin(); it != s.end(); it++) {
    cout << *it << "\n";
}
\end{lstlisting}
The following code prints the last element in the set,
assuming that the set is not empty:
\begin{lstlisting}
auto it = s.end();
it--;
cout << *it << "\n";
\end{lstlisting}
The function $\texttt{find}(x)$ returns an iterator
that points to an element whose value is $x$.
However, if the set does not contain $x$,
the iterator will be \texttt{end}.
\begin{lstlisting}
auto it = s.find(x);
if (it == s.end()) cout << "x is missing";
\end{lstlisting}
The function $\texttt{lower\_bound}(x)$ returns
an iterator to the smallest element
whose value is \emph{at least} $x$, and
the function $\texttt{upper\_bound}(x)$
returns an iterator to the smallest element
whose value is \emph{larger than} $x$.
If such elements do not exist,
the return value of the functions will be \texttt{end}.
These functions are not supported by the
\texttt{unordered\_set} structure, which
does not maintain the order of the elements.
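The following sketch illustrates the two functions on the set
$\{3,4,6,8,12,13,14,17\}$ from the figure above:
both $\texttt{lower\_bound}(5)$ and $\texttt{lower\_bound}(6)$ point to 6,
while $\texttt{upper\_bound}(6)$ points to 8.
The hypothetical function \texttt{countInRange} combines them to count
the elements between \texttt{lo} and \texttt{hi};
walking from one iterator to the other takes time proportional
to the number of elements in the range.
\begin{lstlisting}
#include <set>

// Counts the elements of s in the closed interval [lo, hi].
// lower_bound(lo) points to the first element that is at least lo,
// and upper_bound(hi) to the first element that is larger than hi.
int countInRange(const std::set<int>& s, int lo, int hi) {
    if (lo > hi) return 0;  // empty interval
    auto first = s.lower_bound(lo);
    auto last = s.upper_bound(hi);
    int count = 0;
    for (auto it = first; it != last; it++) count++;  // linear in the answer
    return count;
}
\end{lstlisting}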
\begin{samepage}
For example, the following code finds the element
nearest to $x$:
\begin{lstlisting}
auto a = s.lower_bound(x);
if (a == s.begin() && a == s.end()) {
    cout << "the set is empty\n";
} else if (a == s.begin()) {
    cout << *a << "\n";
} else if (a == s.end()) {
    a--;
    cout << *a << "\n";
} else {
    auto b = a; b--;
    if (x-*b < *a-x) cout << *b << "\n";
    else cout << *a << "\n";
}
\end{lstlisting}
The code goes through all possible cases
using the iterator \texttt{a}.
First, the iterator points to the smallest
element whose value is at least $x$.
If \texttt{a} is both \texttt{begin}
and \texttt{end} at the same time, the set is empty.
If \texttt{a} equals \texttt{begin},
the corresponding element is nearest to $x$.
If \texttt{a} equals \texttt{end},
the last element in the set is nearest to $x$.
If none of the previous cases hold,
the element nearest to $x$ is either the
element that corresponds to \texttt{a} or the previous element.
If both elements are equally near to $x$,
the code prints the element that corresponds to \texttt{a}.
\end{samepage}
\section{Other structures}

\subsubsection{Bitsets}

\index{bitset}
A \texttt{bitset} is an array
where each element is either 0 or 1.
For example, the following code creates
a bitset that contains 10 elements:
\begin{lstlisting}
bitset<10> s;
s[1] = 1;
s[3] = 1;
s[4] = 1;
s[7] = 1;
cout << s[4] << "\n"; // 1
cout << s[5] << "\n"; // 0
\end{lstlisting}
The benefit of using bitsets is that
they require less memory than ordinary arrays,
because each element in a bitset only
uses one bit of memory.
For example,
if $n$ bits are stored in an \texttt{int} array,
$32n$ bits of memory will be used
(assuming a 32-bit \texttt{int}),
but a corresponding bitset only requires about $n$ bits of memory.
In addition, the values of a bitset
can be efficiently manipulated using
bit operators, which makes it possible to
optimize algorithms using bitsets.

The following code shows another way to create the above bitset:
\begin{lstlisting}
bitset<10> s(string("0010011010")); // from right to left: s[0] is the last character
cout << s[4] << "\n"; // 1
cout << s[5] << "\n"; // 0
\end{lstlisting}
The function \texttt{count} returns the number
of ones in the bitset:
\begin{lstlisting}
bitset<10> s(string("0010011010"));
cout << s.count() << "\n"; // 4
\end{lstlisting}
The following code shows examples of using bit operations:
\begin{lstlisting}
bitset<10> a(string("0010110110"));
bitset<10> b(string("1011011000"));
cout << (a&b) << "\n"; // 0010010000
cout << (a|b) << "\n"; // 1011111110
cout << (a^b) << "\n"; // 1001101110
\end{lstlisting}
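As an example of optimizing an algorithm with bit operations,
the following sketch computes all subset sums of a list of
nonnegative weights.
Shifting \texttt{reach} left by $w$ positions adds $w$ to every sum
found so far, and the OR operation merges the new sums with the old ones.
Because a bitset processes many bits in a single machine word,
each weight is handled much faster than with a loop over an ordinary array.
The bound \texttt{MAXSUM} and the function name \texttt{subsetSums}
are assumptions made for this example;
the size of a bitset must be a compile-time constant.
\begin{lstlisting}
#include <bitset>
#include <vector>

// Assumed upper bound for the sums of interest (hypothetical constant);
// the size of a bitset must be known at compile time.
const int MAXSUM = 1000;

// reach[x] is 1 if some subset of the (nonnegative) weights has sum x.
// Sums larger than MAXSUM are shifted out and lost.
std::bitset<MAXSUM + 1> subsetSums(const std::vector<int>& weights) {
    std::bitset<MAXSUM + 1> reach;
    reach[0] = 1;  // the empty subset has sum 0
    for (int w : weights) {
        // reach << w adds w to every sum found so far; |= keeps the old sums
        reach |= reach << w;
    }
    return reach;
}
\end{lstlisting}
For example, with the weights $[1,3,3,5]$, every sum
between 0 and 12 can be formed except 2 and 10.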
\subsubsection{Deques}

\index{deque}

A \texttt{deque} is a dynamic array
whose size can be changed at both ends of the array.
Like a vector, a deque provides the functions
\texttt{push\_back} and \texttt{pop\_back}, but
it also provides the functions
\texttt{push\_front} and \texttt{pop\_front},
which are not available in a vector.

A deque can be used as follows:
\begin{lstlisting}
deque<int> d;
d.push_back(5); // [5]
d.push_back(2); // [5,2]
d.push_front(3); // [3,5,2]
d.pop_back(); // [3,5]
d.pop_front(); // [5]
\end{lstlisting}
The internal implementation of a deque
is more complex than that of a vector,
and for this reason, a deque is usually slower than a vector.
Still, the time complexity of adding and removing
elements is $O(1)$ on average at both ends.
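A typical use for a deque is a structure that needs both
\texttt{pop\_back} and \texttt{pop\_front}.
The following sketch computes the maximum of each window of $k$
consecutive elements: the deque stores indices whose values decrease
from front to back, so the front is always the maximum of the current window.
Each index is added and removed at most once, so the total running time is $O(n)$.
The function name \texttt{windowMax} is hypothetical.
\begin{lstlisting}
#include <deque>
#include <vector>

// Maximum of every window of k consecutive elements
// (assumes 1 <= k <= v.size()).
std::vector<int> windowMax(const std::vector<int>& v, int k) {
    std::deque<int> d;  // indices into v, values decreasing front to back
    std::vector<int> result;
    for (int i = 0; i < (int)v.size(); i++) {
        // values not larger than v[i] can never be a window maximum again
        while (!d.empty() && v[d.back()] <= v[i]) d.pop_back();
        d.push_back(i);
        // drop the front index if it has slid out of the window [i-k+1, i]
        if (d.front() <= i - k) d.pop_front();
        if (i >= k - 1) result.push_back(v[d.front()]);
    }
    return result;
}
\end{lstlisting}
For example, for the array $[1,3,-1,-3,5,3,6,7]$ and $k=3$,
the function returns $[3,3,5,5,6,7]$.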
\subsubsection{Stacks}

\index{stack}
A \texttt{stack}
is a data structure that provides two
$O(1)$ time operations:
adding an element to the top,
and removing an element from the top.
It is only possible to access the top
element of a stack.

The following code shows how a stack can be used:
\begin{lstlisting}
stack<int> s;
s.push(3);
s.push(2);
s.push(5);
cout << s.top() << "\n"; // 5
s.pop();
cout << s.top() << "\n"; // 2
\end{lstlisting}
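A classic application of a stack is checking whether brackets
are correctly nested: the most recent unmatched opening bracket
is always on top of the stack.
In this sketch, the function name \texttt{balanced} is hypothetical,
and characters other than brackets are simply ignored.
\begin{lstlisting}
#include <stack>
#include <string>

// Checks whether the brackets (), [] and {} in s are correctly nested.
// Other characters are ignored.
bool balanced(const std::string& s) {
    std::stack<char> st;
    for (char c : s) {
        if (c == '(' || c == '[' || c == '{') {
            st.push(c);  // remember the unmatched opening bracket
        } else if (c == ')' || c == ']' || c == '}') {
            char open = c == ')' ? '(' : (c == ']' ? '[' : '{');
            // the closing bracket must match the latest unmatched opening one
            if (st.empty() || st.top() != open) return false;
            st.pop();
        }
    }
    return st.empty();  // every opening bracket must have been closed
}
\end{lstlisting}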
\subsubsection{Queues}

\index{queue}

A \texttt{queue} also
provides two $O(1)$ time operations:
adding an element to the end of the queue,
and removing the first element in the queue.
It is only possible to access the first
and last elements of a queue.

The following code shows how a queue can be used:
\begin{lstlisting}
queue<int> q;
q.push(3);
q.push(2);
q.push(5);
cout << q.front() << "\n"; // 3
q.pop();
cout << q.front() << "\n"; // 2
\end{lstlisting}
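Because a queue supports adding at the end and removing from the front,
moving the first element to the end simulates walking around a circle.
The following sketch uses this to simulate a Josephus-style game:
$n$ people numbered $1,2,\ldots,n$ stand in a circle,
and every $k$th person is removed.
The simulation takes $O(nk)$ time.
The function name \texttt{eliminationOrder} is hypothetical.
\begin{lstlisting}
#include <queue>
#include <vector>

// Simulates n people 1..n in a circle where every k-th person is removed
// (assumes n >= 1 and k >= 1). Returns the order of removal.
std::vector<int> eliminationOrder(int n, int k) {
    std::queue<int> q;
    for (int i = 1; i <= n; i++) q.push(i);
    std::vector<int> order;
    while (!q.empty()) {
        // move k-1 people from the front to the end of the queue
        for (int i = 0; i < k - 1; i++) {
            q.push(q.front());
            q.pop();
        }
        order.push_back(q.front());  // the k-th person is removed
        q.pop();
    }
    return order;
}
\end{lstlisting}
For $n=7$ and $k=3$, the order of removal is $3,6,2,7,5,1,4$.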
\subsubsection{Priority queues}

\index{priority queue}
\index{heap}

A \texttt{priority\_queue}
maintains a collection of elements.
The supported operations are insertion and,
depending on the type of the queue,
retrieval and removal of
either the minimum or maximum element.
The time complexity is $O(\log n)$
for insertion and removal and $O(1)$ for retrieval.
While an ordered set efficiently supports
all the operations of a priority queue,
the benefit of using a priority queue is
that it has smaller constant factors.
A priority queue is usually implemented using
a heap structure that is much simpler than a
balanced binary tree needed for an ordered set.
\begin{samepage}
By default, the C++ priority queue gives the highest
priority to the largest element, so it is possible
to find and remove the largest element in the queue.
The following code illustrates this:
\begin{lstlisting}
priority_queue<int> q;
q.push(3);
q.push(5);
q.push(7);
q.push(2);
cout << q.top() << "\n"; // 7
q.pop();
cout << q.top() << "\n"; // 5
q.pop();
q.push(6);
cout << q.top() << "\n"; // 6
q.pop();
\end{lstlisting}
\end{samepage}
Using the following declaration,
we can create a priority queue
that allows us to find and remove
the minimum element:
\begin{lstlisting}
priority_queue<int,vector<int>,greater<int>> q;
\end{lstlisting}
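A minimum priority queue is useful, for example, for keeping track of
the $k$ largest values in a sequence: the top of the queue is the
smallest value kept so far, and it is removed whenever the queue
contains more than $k$ values.
This takes $O(n \log k)$ time for $n$ values.
The function name \texttt{kLargest} is our own choice for this sketch.
\begin{lstlisting}
#include <algorithm>
#include <functional>
#include <queue>
#include <vector>

// Returns the k largest values of v in decreasing order (assumes k >= 0).
std::vector<int> kLargest(const std::vector<int>& v, int k) {
    // minimum priority queue: top() is the smallest value kept so far
    std::priority_queue<int, std::vector<int>, std::greater<int>> q;
    for (int x : v) {
        q.push(x);
        if ((int)q.size() > k) q.pop();  // drop the smallest kept value
    }
    std::vector<int> result;
    while (!q.empty()) {  // values come out in increasing order
        result.push_back(q.top());
        q.pop();
    }
    std::reverse(result.begin(), result.end());
    return result;
}
\end{lstlisting}
For example, for the values $[5,1,9,3,7,2]$ and $k=3$,
the result is $[9,7,5]$.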
\section{Comparison to sorting}
It is often possible to solve a problem
using either data structures or sorting.
Sometimes there are remarkable differences
in the actual efficiency of these approaches,
which may be hidden in their time complexities.
Let us consider a problem where
we are given two lists $A$ and $B$
that both contain $n$ integers.
Our task is to calculate the number of integers
that belong to both of the lists.
For example, for the lists
\[A = [5,2,8,9,4] \hspace{10px} \textrm{and} \hspace{10px} B = [3,2,9,5],\]
the answer is 3 because the numbers 2, 5
and 9 belong to both of the lists.
A straightforward solution to the problem is
to go through all pairs of numbers in $O(n^2)$ time,
but next we will concentrate on
more efficient algorithms.
\subsubsection{Algorithm 1}
We construct a set of the numbers that appear in $A$,
and after this, we iterate through the numbers
in $B$ and check for each number if it
also belongs to $A$.
This is efficient because each check
is a fast set lookup.
Using the \texttt{set} structure,
the time complexity of the algorithm is $O(n \log n)$.
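A minimal sketch of algorithm 1 could look as follows.
The function name \texttt{commonSet} is hypothetical, and the sketch
assumes that the integers in each list are distinct, as in the example
above (otherwise a number that occurs several times in $B$
would be counted several times).
\begin{lstlisting}
#include <set>
#include <vector>

// Sketch of algorithm 1: store the numbers of a in an ordered set
// (a balanced binary tree), then look up each number of b.
// Assumes that each list contains distinct integers.
int commonSet(const std::vector<int>& a, const std::vector<int>& b) {
    std::set<int> s(a.begin(), a.end());  // O(n log n) in total
    int count = 0;
    for (int x : b) {
        if (s.count(x)) count++;  // O(log n) per lookup
    }
    return count;
}
\end{lstlisting}
For the lists $A$ and $B$ above, the function returns 3.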
\subsubsection{Algorithm 2}
An ordered set is not needed here,
so instead of the \texttt{set} structure
we can also use the \texttt{unordered\_set} structure.
This is an easy way to make the algorithm
more efficient, because we only have to change
the underlying data structure.
The time complexity of the new algorithm is $O(n)$ on average.
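A sketch of algorithm 2 differs from a set-based version of
algorithm 1 only in the type of the set.
The function name \texttt{commonHash} is hypothetical,
and the sketch assumes that each list contains distinct integers.
\begin{lstlisting}
#include <unordered_set>
#include <vector>

// Sketch of algorithm 2: the same idea as algorithm 1, but the numbers
// of a are stored in a hash-based set.
// Assumes that each list contains distinct integers.
int commonHash(const std::vector<int>& a, const std::vector<int>& b) {
    std::unordered_set<int> s(a.begin(), a.end());  // O(n) on average
    int count = 0;
    for (int x : b) {
        if (s.count(x)) count++;  // O(1) per lookup on average
    }
    return count;
}
\end{lstlisting}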
\subsubsection{Algorithm 3}
Instead of data structures, we can use sorting.
First, we sort both lists $A$ and $B$.
After this, we iterate through both the lists
at the same time and find the common elements.
The time complexity of sorting is $O(n \log n)$,
and the rest of the algorithm works in $O(n)$ time,
so the total time complexity is $O(n \log n)$.
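The simultaneous iteration can be sketched as follows.
At each step, the smaller of the two current numbers cannot occur
later in the other sorted list, so its index can be advanced.
The function name \texttt{commonSorted} is hypothetical,
and the sketch assumes that each list contains distinct integers.
\begin{lstlisting}
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch of algorithm 3: sort both lists and walk through them together.
// The lists are taken by value so that sorting does not affect the caller.
// Assumes that each list contains distinct integers.
int commonSorted(std::vector<int> a, std::vector<int> b) {
    std::sort(a.begin(), a.end());  // O(n log n)
    std::sort(b.begin(), b.end());
    std::size_t i = 0, j = 0;
    int count = 0;
    // each step advances at least one index, so the loop takes O(n) time
    while (i < a.size() && j < b.size()) {
        if (a[i] == b[j]) {
            count++;
            i++;
            j++;
        } else if (a[i] < b[j]) {
            i++;  // a[i] is smaller than all remaining elements of b
        } else {
            j++;  // b[j] is smaller than all remaining elements of a
        }
    }
    return count;
}
\end{lstlisting}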
\subsubsection{Efficiency comparison}
The following table shows the running times of
the above algorithms when $n$ varies and
the elements of the lists are random
integers between $1$ and $10^9$:
\begin{center}
\begin{tabular}{rrrr}
$n$ & algorithm 1 & algorithm 2 & algorithm 3 \\
\hline
$10^6$ & $1.5$ s & $0.3$ s & $0.2$ s \\
$2 \cdot 10^6$ & $3.7$ s & $0.8$ s & $0.3$ s \\
$3 \cdot 10^6$ & $5.7$ s & $1.3$ s & $0.5$ s \\
$4 \cdot 10^6$ & $7.7$ s & $1.7$ s & $0.7$ s \\
$5 \cdot 10^6$ & $10.0$ s & $2.3$ s & $0.9$ s \\
\end{tabular}
\end{center}
Algorithms 1 and 2 are equal except that
they use different set structures.
In this problem, this choice has an important effect on
the running time, because algorithm 2
is about 4--5 times faster than algorithm 1.

However, the most efficient algorithm is algorithm 3,
which uses sorting.
It uses only about half the time of algorithm 2.
Interestingly, the time complexity of both
algorithm 1 and algorithm 3 is $O(n \log n)$,
but despite this, algorithm 3 is about ten times faster.
This can be explained by the fact that
sorting is a simple procedure and it is done
only once at the beginning of algorithm 3,
and the rest of the algorithm works in linear time.
On the other hand,
algorithm 1 maintains a complex balanced binary tree
during the whole algorithm.