\chapter{Data structures}

\index{data structure}

A \key{data structure} is a way to store
data in the memory of the computer.
It is important to choose a suitable
data structure for a problem,
because each data structure has its own
advantages and disadvantages.
The crucial question is: which operations
are efficient in the chosen data structure?

This chapter introduces the most important
data structures in the C++ standard library.
It is a good idea to use the standard library
whenever possible,
because it will save a lot of time.
Later in the book we will learn more sophisticated
data structures that are not available
in the standard library.

\section{Dynamic array}

\index{dynamic array}
\index{vector}
\index{vector@\texttt{vector}}

A \key{dynamic array} is an array whose
size can be changed during the execution
of the program.
The most popular dynamic array in C++ is
the \key{vector} structure (\texttt{vector}),
which can be used almost like a regular array.

The following code creates an empty vector and
adds three elements to it:

\begin{lstlisting}
vector<int> v;
v.push_back(3); // [3]
v.push_back(2); // [3,2]
v.push_back(5); // [3,2,5]
\end{lstlisting}

After this, the elements can be accessed like in a regular array:

\begin{lstlisting}
cout << v[0] << "\n"; // 3
cout << v[1] << "\n"; // 2
cout << v[2] << "\n"; // 5
\end{lstlisting}

The function \texttt{size} returns the number of elements in the vector.
The following code iterates through
the vector and prints all elements in it:

\begin{lstlisting}
for (int i = 0; i < v.size(); i++) {
    cout << v[i] << "\n";
}
\end{lstlisting}

\begin{samepage}
A shorter way to iterate through a vector is as follows:

\begin{lstlisting}
for (auto x : v) {
    cout << x << "\n";
}
\end{lstlisting}
\end{samepage}

The function \texttt{back} returns the last element
in the vector, and
the function \texttt{pop\_back} removes the last element:

\begin{lstlisting}
vector<int> v;
v.push_back(5);
v.push_back(2);
cout << v.back() << "\n"; // 2
v.pop_back();
cout << v.back() << "\n"; // 5
\end{lstlisting}

The following code creates a vector with five elements:

\begin{lstlisting}
vector<int> v = {2,4,2,5,1};
\end{lstlisting}

Another way to create a vector is to give the number
of elements and the initial value for each element:

\begin{lstlisting}
// size 10, initial value 0
vector<int> v(10);
\end{lstlisting}
\begin{lstlisting}
// size 10, initial value 5
vector<int> v(10, 5);
\end{lstlisting}

The internal implementation of the vector
uses a regular array.
If the size of the vector increases and
the array becomes too small,
a new array is allocated and all the
elements are copied to the new array.
However, this does not happen often, and the
time complexity of
\texttt{push\_back} is $O(1)$ on average.
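
The reallocations can be observed using the function \texttt{capacity},
which returns the current size of the internal array.
The following sketch counts how often the internal array is replaced
while $n$ elements are added.
The helper \texttt{count\_reallocations} is a hypothetical name chosen
for this illustration, and the growth policy is left to the
implementation, so the exact count varies between compilers:

\begin{lstlisting}
#include <cstddef>
#include <vector>
using namespace std;

// Hypothetical helper: counts how many times the capacity of a
// vector changes while n elements are appended one by one.
int count_reallocations(int n) {
    vector<int> v;
    int reallocations = 0;
    size_t capacity = v.capacity();
    for (int i = 0; i < n; i++) {
        v.push_back(i);
        if (v.capacity() != capacity) {
            // the internal array was replaced by a larger one
            capacity = v.capacity();
            reallocations++;
        }
    }
    return reallocations;
}
\end{lstlisting}

With common implementations the count grows only logarithmically
in $n$, which is why a single \texttt{push\_back} is cheap on average.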

\index{string}
\index{string@\texttt{string}}

The \key{string} structure (\texttt{string})
is also a dynamic array that can be used almost like a vector.
In addition, there is special syntax for strings
that is not available in other data structures.
Strings can be combined using the \texttt{+} symbol.
The function $\texttt{substr}(k,x)$ returns the substring
that begins at index $k$ and has length $x$.
The function $\texttt{find}(\texttt{t})$ finds the position
where a substring \texttt{t} first appears in the string.

The following code presents some string operations:

\begin{lstlisting}
string a = "hatti";
string b = a+a;
cout << b << "\n"; // hattihatti
b[5] = 'v';
cout << b << "\n"; // hattivatti
string c = b.substr(3,4);
cout << c << "\n"; // tiva
\end{lstlisting}
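
The function \texttt{find} can be used in a similar way.
If the substring does not appear in the string,
\texttt{find} returns the special value \texttt{string::npos}.
The following sketch wraps this in a hypothetical helper
\texttt{find\_position} (a name chosen for this example)
that returns $-1$ in that case:

\begin{lstlisting}
#include <string>
using namespace std;

// Hypothetical helper: index of the first occurrence of t in s,
// or -1 if t does not appear in s.
int find_position(const string& s, const string& t) {
    size_t pos = s.find(t);
    if (pos == string::npos) return -1; // "not found" is npos
    return (int)pos;
}
\end{lstlisting}

For example, for the string \texttt{hattivatti} and the substring
\texttt{vat}, the helper returns 5.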

\section{Set structure}

\index{set}
\index{set@\texttt{set}}
\index{unordered\_set@\texttt{unordered\_set}}

A \key{set} is a data structure that
contains a collection of elements.
The basic operations in a set are element
insertion, search and removal.

C++ contains two set implementations:
\texttt{set} and \texttt{unordered\_set}.
The structure \texttt{set} is based on a balanced
binary tree and the time complexity of its
operations is $O(\log n)$.
The structure \texttt{unordered\_set} uses a hash table,
and the time complexity of its operations is $O(1)$ on average.

The choice of which set implementation to use
is often a matter of taste.
The benefit of the \texttt{set} structure
is that it maintains the order of the elements
and provides functions that are not available
in \texttt{unordered\_set}.
On the other hand, \texttt{unordered\_set} is
often more efficient.

The following code creates a set
that consists of integers,
and shows how to use it.
The function \texttt{insert} adds an element to the set,
the function \texttt{count} returns how many times an
element appears in the set,
and the function \texttt{erase} removes an element from the set.

\begin{lstlisting}
set<int> s;
s.insert(3);
s.insert(2);
s.insert(5);
cout << s.count(3) << "\n"; // 1
cout << s.count(4) << "\n"; // 0
s.erase(3);
s.insert(4);
cout << s.count(3) << "\n"; // 0
cout << s.count(4) << "\n"; // 1
\end{lstlisting}

A set can be used mostly like a vector,
but it is not possible to access
the elements using the \texttt{[]} notation.
The following code creates a set,
prints the number of elements in it, and then
iterates through all the elements:
\begin{lstlisting}
set<int> s = {2,5,6,8};
cout << s.size() << "\n"; // 4
for (auto x : s) {
    cout << x << "\n";
}
\end{lstlisting}

An important property of a set is
that all the elements are distinct.
Thus, the function \texttt{count} always returns
either 0 (the element is not in the set)
or 1 (the element is in the set),
and the function \texttt{insert} never adds
an element to the set if it is
already in the set.
The following code illustrates this:

\begin{lstlisting}
set<int> s;
s.insert(5);
s.insert(5);
s.insert(5);
cout << s.count(5) << "\n"; // 1
\end{lstlisting}
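
Since \texttt{set} keeps its elements distinct and in increasing order,
it can be used, for instance, to collect the distinct values of a vector.
The following is a minimal sketch; the helper name
\texttt{distinct\_sorted} is an assumption made for this example:

\begin{lstlisting}
#include <set>
#include <vector>
using namespace std;

// Hypothetical helper: the distinct values of a vector
// in increasing order.
vector<int> distinct_sorted(const vector<int>& values) {
    // duplicates are dropped when the set is built
    set<int> s(values.begin(), values.end());
    // iterating a set visits the elements in increasing order
    return vector<int>(s.begin(), s.end());
}
\end{lstlisting}

For the input $[5,2,8,2,5]$ the helper returns $[2,5,8]$.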

\index{multiset@\texttt{multiset}}
\index{unordered\_multiset@\texttt{unordered\_multiset}}

C++ also contains the structures
\texttt{multiset} and \texttt{unordered\_multiset},
which otherwise work like \texttt{set}
and \texttt{unordered\_set}
but can contain multiple copies of an element.
For example, in the following code all three copies
of the number 5 are added to the multiset:

\begin{lstlisting}
multiset<int> s;
s.insert(5);
s.insert(5);
s.insert(5);
cout << s.count(5) << "\n"; // 3
\end{lstlisting}
The function \texttt{erase} removes
all instances of an element
from a \texttt{multiset}:
\begin{lstlisting}
s.erase(5);
cout << s.count(5) << "\n"; // 0
\end{lstlisting}
Often, only one instance should be removed.
Starting again from the multiset that contains three copies of 5,
this can be done as follows:
\begin{lstlisting}
s.erase(s.find(5));
cout << s.count(5) << "\n"; // 2
\end{lstlisting}
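
Note that \texttt{find} returns \texttt{end} if the element is missing,
and passing \texttt{end} to \texttt{erase} results in undefined behavior.
The following sketch shows one defensive way to remove a single instance;
the helper name \texttt{erase\_one} is hypothetical:

\begin{lstlisting}
#include <set>
using namespace std;

// Hypothetical helper: removes one instance of x from s if present.
// Returns true if an element was removed.
bool erase_one(multiset<int>& s, int x) {
    auto it = s.find(x);
    if (it == s.end()) return false; // nothing to remove
    s.erase(it); // removes only the element that it points to
    return true;
}
\end{lstlisting}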

\section{Map structure}

\index{map}
\index{map@\texttt{map}}
\index{unordered\_map@\texttt{unordered\_map}}

A \key{map} is a generalized array
that consists of key-value pairs.
While the keys in a regular array are always
the successive integers $0,1,\ldots,n-1$,
where $n$ is the size of the array,
the keys in a map can be of any data type and
they do not have to be successive values.

C++ contains two map implementations that
correspond to the set implementations:
the structure
\texttt{map} is based on a balanced
binary tree and accessing an element
takes $O(\log n)$ time,
while the structure
\texttt{unordered\_map} uses a hash table
and accessing an element takes $O(1)$ time on average.

The following code creates a map
where the keys are strings and the values are integers:

\begin{lstlisting}
map<string,int> m;
m["monkey"] = 4;
m["banana"] = 3;
m["harpsichord"] = 9;
cout << m["banana"] << "\n"; // 3
\end{lstlisting}

If the value of a key is requested
but the map does not contain it,
the key is automatically added to the map with
a default value.
For example, in the following code,
the key ``aybabtu'' with value 0
is added to the map.

\begin{lstlisting}
map<string,int> m;
cout << m["aybabtu"] << "\n"; // 0
\end{lstlisting}
The function \texttt{count} checks
whether a key exists in the map:
\begin{lstlisting}
if (m.count("aybabtu")) {
    cout << "key exists in the map";
}
\end{lstlisting}
The following code prints all keys and values
in the map:
\begin{lstlisting}
for (auto x : m) {
    cout << x.first << " " << x.second << "\n";
}
\end{lstlisting}
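
The automatic insertion of missing keys is convenient when counting.
The following sketch counts how many times each word appears in a list;
the helper name \texttt{count\_words} is an assumption made for this example:

\begin{lstlisting}
#include <map>
#include <string>
#include <vector>
using namespace std;

// Hypothetical helper: number of occurrences of each word.
map<string,int> count_words(const vector<string>& words) {
    map<string,int> m;
    for (const auto& w : words) {
        m[w]++; // a missing key is first added with value 0
    }
    return m;
}
\end{lstlisting}

Because the result is a \texttt{map}, iterating over it
visits the words in alphabetical order.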

\section{Iterators and ranges}

\index{iterator}

Many functions in the C++ standard library
take iterators as parameters,
and a pair of iterators typically defines a range of elements.
An \key{iterator} is a variable that points
to an element in a data structure.

The often used iterators \texttt{begin}
and \texttt{end} define a range that contains
all elements in a data structure.
The iterator \texttt{begin} points to
the first element in the data structure,
and the iterator \texttt{end} points to
the position \emph{after} the last element.
The situation looks as follows:

\begin{center}
\begin{tabular}{llllllllll}
\{ & 3, & 4, & 6, & 8, & 12, & 13, & 14, & 17 & \} \\
& $\uparrow$ & & & & & & & & $\uparrow$ \\
& \multicolumn{3}{l}{\texttt{s.begin()}} & & & & & & \texttt{s.end()} \\
\end{tabular}
\end{center}

Note the asymmetry in the iterators:
\texttt{s.begin()} points to an element in the data structure,
while \texttt{s.end()} points outside the data structure.
Thus, the range defined by the iterators is \emph{half-open}.

\subsubsection{Handling ranges}

Iterators are used in C++ standard library functions
that work with ranges of data structures.
Usually, we want to process all elements in a
data structure, so the iterators
\texttt{begin} and \texttt{end} are given to the function.

For example, the following code sorts a vector
using the function \texttt{sort},
then reverses the order of the elements using the function
\texttt{reverse}, and finally shuffles the order of
the elements using the function \texttt{random\_shuffle}.

\index{sort@\texttt{sort}}
\index{reverse@\texttt{reverse}}
\index{random\_shuffle@\texttt{random\_shuffle}}

\begin{lstlisting}
sort(v.begin(), v.end());
reverse(v.begin(), v.end());
random_shuffle(v.begin(), v.end());
\end{lstlisting}

These functions can also be used with a regular array.
In this case, the functions are given pointers to the array
instead of iterators:

\newpage
\begin{lstlisting}
sort(t, t+n);
reverse(t, t+n);
random_shuffle(t, t+n);
\end{lstlisting}
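
Note that \texttt{random\_shuffle} was deprecated in C++14 and
removed in C++17.
With newer compilers, the function \texttt{shuffle} can be used instead;
it takes a random number generator as a third parameter.
The following is a minimal sketch, where the helper name
\texttt{shuffled} and the explicit \texttt{seed} parameter
are assumptions made for this example:

\begin{lstlisting}
#include <algorithm>
#include <random>
#include <vector>
using namespace std;

// Hypothetical helper: returns a shuffled copy of v.
// The seed parameter is an assumption that makes the
// result reproducible for a given standard library.
vector<int> shuffled(vector<int> v, unsigned seed) {
    mt19937 rng(seed);
    shuffle(v.begin(), v.end(), rng);
    return v;
}
\end{lstlisting}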

\subsubsection{Set iterators}

Iterators are often used when accessing
elements in a set.
The following code creates an iterator
\texttt{it} that points to the first element in the set:
\begin{lstlisting}
set<int>::iterator it = s.begin();
\end{lstlisting}
A shorter way to write the code is as follows:
\begin{lstlisting}
auto it = s.begin();
\end{lstlisting}
The element to which an iterator points
can be accessed using the \texttt{*} symbol.
For example, the following code prints
the first element in the set:

\begin{lstlisting}
auto it = s.begin();
cout << *it << "\n";
\end{lstlisting}

Iterators can be moved using the operators
\texttt{++} (forward) and \texttt{--} (backward),
meaning that the iterator moves to the next
or previous element in the set.

The following code prints all elements in the set:
\begin{lstlisting}
for (auto it = s.begin(); it != s.end(); it++) {
    cout << *it << "\n";
}
\end{lstlisting}
The following code prints the last element in the set,
assuming that the set is not empty:
\begin{lstlisting}
auto it = s.end();
it--;
cout << *it << "\n";
\end{lstlisting}

The function $\texttt{find}(x)$ returns an iterator
that points to an element whose value is $x$.
However, if the set does not contain $x$,
the iterator will be \texttt{end}.

\begin{lstlisting}
auto it = s.find(x);
if (it == s.end()) cout << "x is missing";
\end{lstlisting}

The function $\texttt{lower\_bound}(x)$ returns
an iterator to the smallest element in the set
whose value is at least $x$.
Correspondingly,
the function $\texttt{upper\_bound}(x)$
returns an iterator to the smallest element
in the set whose value is larger than $x$.
If such an element does not exist,
the return value of the function will be \texttt{end}.
These functions are not supported by the
\texttt{unordered\_set} structure, which
doesn't maintain the order of the elements.
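
The difference between the two functions can be seen in the following
sketch. The helper names \texttt{first\_at\_least} and
\texttt{first\_larger} are hypothetical, and returning $-1$ for
\texttt{end} is a convention chosen for this example
(it assumes that $-1$ is never stored in the set):

\begin{lstlisting}
#include <set>
using namespace std;

// Hypothetical helpers: the value that lower_bound / upper_bound
// points to, or -1 if the returned iterator is end.
int first_at_least(const set<int>& s, int x) {
    auto it = s.lower_bound(x);
    return it == s.end() ? -1 : *it;
}

int first_larger(const set<int>& s, int x) {
    auto it = s.upper_bound(x);
    return it == s.end() ? -1 : *it;
}
\end{lstlisting}

For the set $\{3,4,6,8,12,13,14,17\}$ and $x=6$,
\texttt{first\_at\_least} returns 6 and \texttt{first\_larger} returns 8.
For $x=9$, both helpers return 12.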

\begin{samepage}
For example, the following code finds the element
nearest to $x$:

\begin{lstlisting}
auto a = s.lower_bound(x);
if (a == s.begin() && a == s.end()) {
    cout << "the set is empty\n";
} else if (a == s.begin()) {
    cout << *a << "\n";
} else if (a == s.end()) {
    a--;
    cout << *a << "\n";
} else {
    auto b = a; b--;
    if (x-*b < *a-x) cout << *b << "\n";
    else cout << *a << "\n";
}
\end{lstlisting}

The code goes through all possible cases
using the iterator \texttt{a}.
First, the iterator points to the smallest
element whose value is at least $x$.
If \texttt{a} is both \texttt{begin}
and \texttt{end} at the same time, the set is empty.
Otherwise, if \texttt{a} equals \texttt{begin},
the corresponding element is nearest to $x$.
If \texttt{a} equals \texttt{end},
the last element in the set is nearest to $x$.
If none of the previous cases is true,
the element nearest to $x$ is either the
element that corresponds to \texttt{a} or the previous element.
\end{samepage}

\section{Other data structures}

\subsubsection{Bitset}

\index{bitset}
\index{bitset@\texttt{bitset}}

A \key{bitset} (\texttt{bitset}) is an array
where each element is either 0 or 1.
For example, the following code creates
a bitset that contains 10 elements:
\begin{lstlisting}
bitset<10> s;
s[2] = 1;
s[5] = 1;
s[6] = 1;
s[8] = 1;
cout << s[4] << "\n"; // 0
cout << s[5] << "\n"; // 1
\end{lstlisting}

The benefit of using a bitset is that
it requires less memory than a regular array,
because each element in the bitset only
uses one bit of memory.
For example,
if $n$ bits are stored as an \texttt{int} array,
$32n$ bits of memory will be used (assuming a 32-bit \texttt{int}),
but a corresponding bitset only requires $n$ bits of memory.
In addition, the values in a bitset
can be efficiently manipulated using
bit operators, which makes it possible to
optimize algorithms.

The following code shows another way to create a bitset.
The string is read from right to left,
so its last character corresponds to \texttt{s[0]}:
\begin{lstlisting}
bitset<10> s(string("0010011010")); // from right to left
cout << s[4] << "\n"; // 1
cout << s[5] << "\n"; // 0
\end{lstlisting}

The function \texttt{count} returns the number
of ones in the bitset:

\begin{lstlisting}
bitset<10> s(string("0010011010"));
cout << s.count() << "\n"; // 4
\end{lstlisting}

The following code shows examples of using bit operations:
\begin{lstlisting}
bitset<10> a(string("0010110110"));
bitset<10> b(string("1011011000"));
cout << (a&b) << "\n"; // 0010010000
cout << (a|b) << "\n"; // 1011111110
cout << (a^b) << "\n"; // 1001101110
\end{lstlisting}
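
Bit operations can be combined with \texttt{count}.
As a small sketch, the following hypothetical helper
\texttt{common\_ones} (a name chosen for this example)
counts the positions where both bitsets contain a one:

\begin{lstlisting}
#include <bitset>
#include <string>
using namespace std;

// Hypothetical helper: number of positions where both
// bitsets contain a one.
int common_ones(const bitset<10>& a, const bitset<10>& b) {
    return (int)(a&b).count();
}
\end{lstlisting}

For the bitsets \texttt{0010110110} and \texttt{1011011000}
the result is 2.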

\subsubsection{Deque}

\index{deque}
\index{deque@\texttt{deque}}

A \key{deque} (\texttt{deque}) is a dynamic array
whose size can be changed efficiently
at both ends of the array.
Like a vector, a deque provides the functions
\texttt{push\_back} and \texttt{pop\_back},
but it also provides the functions
\texttt{push\_front} and \texttt{pop\_front},
which operate on the beginning of the array.

The following code shows how a deque can be used:

\begin{lstlisting}
deque<int> d;
d.push_back(5); // [5]
d.push_back(2); // [5,2]
d.push_front(3); // [3,5,2]
d.pop_back(); // [3,5]
d.pop_front(); // [5]
\end{lstlisting}

The internal implementation of a deque is more complex
than that of a vector, which makes a deque
a heavier structure than a vector.
Still, the time complexity of adding and removing
elements is $O(1)$ on average at both ends.

\subsubsection{Stack}

\index{stack}
\index{stack@\texttt{stack}}

A \key{stack} (\texttt{stack}) is a data structure
that provides two $O(1)$ time operations:
adding an element to the top
and removing an element from the top.
It is only possible to access the top
element of a stack.

The following code shows how a stack can be used:

\begin{lstlisting}
stack<int> s;
s.push(3);
s.push(2);
s.push(5);
cout << s.top(); // 5
s.pop();
cout << s.top(); // 2
\end{lstlisting}

\subsubsection{Queue}

\index{queue}
\index{queue@\texttt{queue}}

A \key{queue} (\texttt{queue}) is like a stack,
but elements are added to the end of the queue
and removed from the front of the queue.
It is only possible to access the first
and the last element of a queue.

The following code shows how a queue can be used:

\begin{lstlisting}
queue<int> q;
q.push(3);
q.push(2);
q.push(5);
cout << q.front(); // 3
q.pop();
cout << q.front(); // 2
\end{lstlisting}
%
% Note that the structures \texttt{vector} and \texttt{deque}
% can always be used instead of \texttt{stack} and \texttt{queue},
% because they can do everything the latter can and more.
% However, \texttt{stack} and \texttt{queue} are
% lighter and slightly more efficient structures
% if their operations are sufficient for implementing the algorithm.

\subsubsection{Priority queue}

\index{priority queue}
\index{heap}
\index{priority\_queue@\texttt{priority\_queue}}

A \key{priority queue} (\texttt{priority\_queue})
maintains a set of elements.
The supported operations are insertion and,
depending on the type of the queue,
retrieval and removal of
either the minimum or the maximum element.
Insertion and removal take $O(\log n)$ time,
and retrieval takes $O(1)$ time.

While the operations of a priority queue
can also be implemented using a \texttt{set} structure,
the benefit of a priority queue is
that its internal implementation, which is based on a heap,
is simpler than the balanced binary tree
used in the \texttt{set} structure.
Thus, the structure is lighter and
its operations are more efficient.

\begin{samepage}
By default, the elements in a C++
priority queue are sorted in decreasing order,
and it is possible to find and remove the
largest element in the queue.
The following code shows how a priority queue can be used:

\begin{lstlisting}
priority_queue<int> q;
q.push(3);
q.push(5);
q.push(7);
q.push(2);
cout << q.top() << "\n"; // 7
q.pop();
cout << q.top() << "\n"; // 5
q.pop();
q.push(6);
cout << q.top() << "\n"; // 6
q.pop();
\end{lstlisting}
\end{samepage}

The following declaration creates a reversed priority queue
that supports finding and removing the smallest element:

\begin{lstlisting}
priority_queue<int,vector<int>,greater<int>> q;
\end{lstlisting}
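
As an illustration of the reversed priority queue, the following sketch
returns the elements of a vector in increasing order by repeatedly
removing the smallest element.
The helper name \texttt{ascending\_order} is an assumption made for
this example, and in practice \texttt{sort} would be the simpler choice:

\begin{lstlisting}
#include <functional>
#include <queue>
#include <vector>
using namespace std;

// Hypothetical helper: returns the values in increasing order
// by repeatedly removing the smallest element from the queue.
vector<int> ascending_order(const vector<int>& values) {
    priority_queue<int,vector<int>,greater<int>> q;
    for (int x : values) q.push(x);
    vector<int> result;
    while (!q.empty()) {
        result.push_back(q.top()); // smallest remaining element
        q.pop();
    }
    return result;
}
\end{lstlisting}

For the input $[3,5,7,2]$ the helper returns $[2,3,5,7]$.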

\section{Comparison to sorting}

Many problems can be solved efficiently using
either suitable data structures
or sorting.
Even if the different approaches are all
efficient in theory, there may be
significant differences between them in practice.

Let us consider a problem where
we are given two lists $A$ and $B$
that both contain $n$ integers.
The task is to calculate how many numbers
appear in both lists.
For example, for the lists
\[A = [5,2,8,9,4] \hspace{10px} \textrm{and} \hspace{10px} B = [3,2,9,5],\]
the answer is 3, because the numbers 2, 5
and 9 appear in both lists.
A straightforward solution to the problem is to go through
all pairs of numbers in $O(n^2)$ time, but next
we will concentrate on more efficient solutions.

\subsubsection{Solution 1}

We store the numbers of list $A$ in a set,
and then go through the numbers of list $B$ and
check for each of them whether it also appears in list $A$.
Thanks to the set, it is efficient to check
whether a number of list $B$ appears in list $A$.
When the set is implemented using the \texttt{set} structure,
the time complexity of the algorithm is $O(n \log n)$.
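
The idea can be sketched as follows.
The function name \texttt{count\_common\_set} is hypothetical,
and the sketch assumes that the numbers within each list are distinct:

\begin{lstlisting}
#include <set>
#include <vector>
using namespace std;

// Hypothetical helper: how many numbers appear in both lists.
// Assumes that the numbers within each list are distinct.
int count_common_set(const vector<int>& a, const vector<int>& b) {
    set<int> in_a(a.begin(), a.end()); // O(n log n)
    int common = 0;
    for (int x : b) {
        if (in_a.count(x)) common++; // O(log n) per lookup
    }
    return common;
}
\end{lstlisting}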

\subsubsection{Solution 2}

The set does not need to keep the numbers
in order, so instead of the \texttt{set} structure
we can also use the \texttt{unordered\_set} structure.
This is an easy way to make the algorithm
more efficient, because the implementation of the algorithm
stays the same and only the data structure changes.
The time complexity of the new algorithm is $O(n)$ on average.
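
A minimal sketch of this variant is shown below.
The function name \texttt{count\_common\_unordered} is hypothetical,
and the numbers within each list are again assumed to be distinct:

\begin{lstlisting}
#include <unordered_set>
#include <vector>
using namespace std;

// Hypothetical helper: how many numbers appear in both lists.
// Assumes that the numbers within each list are distinct.
int count_common_unordered(const vector<int>& a,
                           const vector<int>& b) {
    unordered_set<int> in_a(a.begin(), a.end());
    int common = 0;
    for (int x : b) {
        if (in_a.count(x)) common++; // O(1) on average
    }
    return common;
}
\end{lstlisting}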

\subsubsection{Solution 3}

Instead of data structures, we can use sorting.
First, we sort both lists $A$ and $B$.
After this, we can find the common numbers
by going through the lists in parallel.
The time complexity of sorting is $O(n \log n)$,
and going through the lists takes $O(n)$ time,
so the total time complexity is $O(n \log n)$.
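
One possible way to go through the sorted lists in parallel
is to keep one index for each list and always advance the index
that points to the smaller number.
The function name \texttt{count\_common\_sorted} is hypothetical,
and the sketch assumes that the numbers within each list are distinct:

\begin{lstlisting}
#include <algorithm>
#include <cstddef>
#include <vector>
using namespace std;

// Hypothetical helper: how many numbers appear in both lists.
// Assumes that the numbers within each list are distinct.
// The lists are taken by value, so the originals stay unchanged.
int count_common_sorted(vector<int> a, vector<int> b) {
    sort(a.begin(), a.end());
    sort(b.begin(), b.end());
    size_t i = 0, j = 0;
    int common = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) i++;       // a[i] cannot appear in b
        else if (a[i] > b[j]) j++;  // b[j] cannot appear in a
        else { common++; i++; j++; } // common number found
    }
    return common;
}
\end{lstlisting}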

\subsubsection{Efficiency comparison}

The following table shows measured running times
of the above algorithms
when $n$ varies and the numbers in the lists are
random integers between $1 \ldots 10^9$:

\begin{center}
\begin{tabular}{rrrr}
$n$ & solution 1 & solution 2 & solution 3 \\
\hline
$10^6$ & $1.5$ s & $0.3$ s & $0.2$ s \\
$2 \cdot 10^6$ & $3.7$ s & $0.8$ s & $0.3$ s \\
$3 \cdot 10^6$ & $5.7$ s & $1.3$ s & $0.5$ s \\
$4 \cdot 10^6$ & $7.7$ s & $1.7$ s & $0.7$ s \\
$5 \cdot 10^6$ & $10.0$ s & $2.3$ s & $0.9$ s \\
\end{tabular}
\end{center}

Solutions 1 and 2 are otherwise identical,
but solution 1 uses the \texttt{set} structure
whereas solution 2 uses the
\texttt{unordered\_set} structure.
In this case, the choice has a
significant effect on the running time,
because solution 2 is 4--5 times
faster than solution 1.

However, the most efficient solution is solution 3,
which uses sorting and takes roughly half
the time of solution 2.
Interestingly, the time complexity of both
solution 1 and solution 3 is $O(n \log n)$,
but despite this,
solution 3 takes only about a tenth of the time.
This can be explained by the fact that
sorting is a light
operation and it is done only once,
at the beginning of solution 3,
after which the rest of the algorithm works in linear time.
Solution 1, on the other hand, maintains a complex
balanced binary tree during the whole algorithm.