\chapter{Introduction}

Competitive programming combines two topics:
(1) the design of algorithms and (2) the implementation of algorithms.

The \key{design of algorithms} consists of problem solving
and mathematical thinking.
Skills for analyzing problems and solving them creatively are needed.
An algorithm for solving a problem
has to be both correct and efficient,
and the core of the problem is often
how to invent an efficient algorithm.

Theoretical knowledge of algorithms
is very important to competitive programmers.
Typically, a solution to a problem is
a combination of well-known techniques and new insights.
The techniques that appear in competitive programming
also form the basis for the scientific research of algorithms.

The \key{implementation of algorithms} requires good
programming skills.
In competitive programming, the solutions
are graded by testing an implemented algorithm
using a set of test cases.
Thus, it is not enough that the idea of the
algorithm is correct: the implementation
has to be correct as well.

Good coding style in contests is
straightforward and concise.
The solutions should be written quickly,
because there is not much time available.
Unlike in traditional software engineering,
the solutions are short (usually at most some
hundreds of lines) and they do not need to be
maintained after the contest.

\section{Programming languages}

\index{programming language}

At the moment, the most popular programming
languages in contests are C++, Python and Java.
For example, in Google Code Jam 2016,
among the best 3,000 participants,
73 \% used C++,
15 \% used Python and
10 \% used Java\footnote{\url{https://www.go-hero.net/jam/16}}.
Some participants also used several languages.

Many people think that C++ is the best choice
for a competitive programmer,
and C++ is nearly always available in
contest systems.
The benefits of using C++ are that
it is a very efficient language and
its standard library contains a large collection
of data structures and algorithms.

On the other hand, it is good to
master several languages and know their strengths.
For example, if big integers are needed
in the problem,
Python can be a good choice because it
has built-in support for big integers.
Still, problems are usually designed so that
the use of a specific programming language
does not give an unfair advantage in the contest.

All examples in this book are written in C++,
and the data structures and algorithms in
the standard library are often used.
The book follows the C++11 standard,
which can be used in most contests nowadays.
If you can't program in C++ yet,
now is a good time to start learning.

\subsubsection{C++ template}

A typical C++ template for competitive programming
looks like this:

\begin{lstlisting}
#include <bits/stdc++.h>

using namespace std;

int main() {
    // solution comes here
}
\end{lstlisting}

The \texttt{\#include} line at the beginning
of the code is a feature of the \texttt{g++} compiler
that allows us to include the whole standard library.
Thus, it is not necessary to separately include
libraries such as \texttt{iostream},
\texttt{vector} and \texttt{algorithm},
because they are available automatically.

The \texttt{using} line declares that
the classes and functions
of the standard library can be used directly
in the code.
Without the \texttt{using} line we would have to
write, for example, \texttt{std::cout},
but now it is enough to write \texttt{cout}.

The code can be compiled using the following command:

\begin{lstlisting}
g++ -std=c++11 -O2 -Wall code.cpp -o code
\end{lstlisting}

This command produces a binary file \texttt{code}
from the source code \texttt{code.cpp}.
The compiler follows the C++11 standard
(\texttt{-std=c++11}),
optimizes the code (\texttt{-O2})
and shows warnings about possible errors (\texttt{-Wall}).
\section{Input and output}

\index{input and output}

In most contests, standard streams are used for
reading input and writing output.
In C++, the standard streams are
\texttt{cin} for input and \texttt{cout} for output.
In addition, the C functions
\texttt{scanf} and \texttt{printf} can be used.

The input for the program usually consists of
numbers and strings that are separated with
spaces and newlines.
They can be read from the \texttt{cin} stream
as follows:

\begin{lstlisting}
int a, b;
string x;
cin >> a >> b >> x;
\end{lstlisting}

This kind of code always works,
assuming that there is at least one space
or newline between each element in the input.
For example, the above code accepts
both of the following inputs:
\begin{lstlisting}
123 456 apina
\end{lstlisting}
\begin{lstlisting}
123    456
apina
\end{lstlisting}
The \texttt{cout} stream is used for output
as follows:
\begin{lstlisting}
int a = 123, b = 456;
string x = "apina";
cout << a << " " << b << " " << x << "\n";
\end{lstlisting}

Handling input and output is sometimes
a bottleneck in the program.
The following lines at the beginning of the code
make input and output more efficient:

\begin{lstlisting}
ios_base::sync_with_stdio(0);
cin.tie(0);
\end{lstlisting}

Note that the newline \texttt{"\textbackslash n"}
works faster than \texttt{endl},
because \texttt{endl} always causes
a flush operation.

The C functions \texttt{scanf}
and \texttt{printf} are an alternative
to the C++ standard streams.
They are usually a bit faster,
but they are also more difficult to use.
The following code reads two integers from the input:
\begin{lstlisting}
int a, b;
scanf("%d %d", &a, &b);
\end{lstlisting}
The following code prints two integers:
\begin{lstlisting}
int a = 123, b = 456;
printf("%d %d\n", a, b);
\end{lstlisting}

Sometimes the program should read a whole line
from the input, possibly with spaces.
This can be accomplished using the
\texttt{getline} function:

\begin{lstlisting}
string s;
getline(cin, s);
\end{lstlisting}

If the amount of data is unknown, the following
loop can be handy:
\begin{lstlisting}
while (cin >> x) {
    // code
}
\end{lstlisting}
This loop reads elements from the input
one after another, until there is no
more data available in the input.

In some contest systems, files are used for
input and output.
An easy solution for this is to write
the code as usual using standard streams,
but add the following lines to the beginning of the code:
\begin{lstlisting}
freopen("input.txt", "r", stdin);
freopen("output.txt", "w", stdout);
\end{lstlisting}
After this, the code reads the input from the file
``input.txt'' and writes the output to the file
``output.txt''.

\section{Handling numbers}

\index{integer}

\subsubsection{Integers}

The most popular integer type in competitive programming
is \texttt{int}.
This is usually a 32-bit type with value range
$-2^{31} \ldots 2^{31}-1$, i.e., about
$-2 \cdot 10^9 \ldots 2 \cdot 10^9$.
If the type \texttt{int} is not enough,
the 64-bit type \texttt{long long} can be used,
with value range $-2^{63} \ldots 2^{63}-1$, i.e.,
about $-9 \cdot 10^{18} \ldots 9 \cdot 10^{18}$.

The following code defines a
\texttt{long long} variable:
\begin{lstlisting}
long long x = 123456789123456789LL;
\end{lstlisting}
The suffix \texttt{LL} means that the
type of the number is \texttt{long long}.

A typical error when using the type \texttt{long long}
is that the type \texttt{int} is still used somewhere
in the code.
For example, the following code contains
a subtle error:

\begin{lstlisting}
int a = 123456789;
long long b = a*a;
cout << b << "\n"; // -1757895751
\end{lstlisting}

Even though the variable \texttt{b} is of type \texttt{long long},
both numbers in the expression \texttt{a*a}
are of type \texttt{int} and the result is
also of type \texttt{int}.
Because of this, the variable \texttt{b} will
contain a wrong result.
The problem can be solved by changing the type
of \texttt{a} to \texttt{long long} or
by changing the expression to \texttt{(long long)a*a}.

Usually, the problems are written so that the
type \texttt{long long} is enough.
Still, it is good to know that
the \texttt{g++} compiler also features
a 128-bit type \texttt{\_\_int128\_t}
with value range
$-2^{127} \ldots 2^{127}-1$, i.e., about
$-10^{38} \ldots 10^{38}$.
However, this type is not available in all contest systems.

\subsubsection{Modular arithmetic}

\index{remainder}
\index{modular arithmetic}

We denote by $x \bmod m$ the remainder
when $x$ is divided by $m$.
For example, $17 \bmod 5 = 2$,
because $17 = 3 \cdot 5 + 2$.

Sometimes, the answer to a problem is a
very big integer but it is enough to
print it ``modulo $m$'', i.e.,
the remainder when the answer is divided by $m$
(for example, ``modulo $10^9+7$'').
The idea is that even if the actual answer
may be very big,
it is enough to use the types
\texttt{int} and \texttt{long long}.

An important property of the remainder is that
in addition, subtraction and multiplication,
the remainder can be calculated before the operation:

\[
\begin{array}{rcr}
(a+b) \bmod m & = & (a \bmod m + b \bmod m) \bmod m \\
(a-b) \bmod m & = & (a \bmod m - b \bmod m) \bmod m \\
(a \cdot b) \bmod m & = & (a \bmod m \cdot b \bmod m) \bmod m
\end{array}
\]

Thus, we can calculate the remainder after every operation
and the numbers will never become too large.

For example, the following code calculates $n!$,
the factorial of $n$, modulo $m$:
\begin{lstlisting}
long long x = 1;
for (int i = 2; i <= n; i++) {
    x = (x*i)%m;
}
cout << x << "\n";
\end{lstlisting}

Usually, the answer should always be given so that
the remainder is between $0 \ldots m-1$.
However, in C++ and other languages,
the remainder of a negative number can be negative.
An easy way to make sure that this will
not happen is to first calculate
the remainder as usual and then add $m$
if the result is negative:
\begin{lstlisting}
x = x%m;
if (x < 0) x += m;
\end{lstlisting}
However, this is only needed when there
are subtractions in the code and the
remainder may become negative.

\subsubsection{Floating point numbers}

\index{floating point number}

The usual floating point types in
competitive programming are
the 64-bit \texttt{double}
and, as an extension in the \texttt{g++} compiler,
the 80-bit \texttt{long double}.
In most cases, \texttt{double} is enough,
but \texttt{long double} is more accurate.

The required precision of the answer
is usually given in the problem statement.
The easiest way to output the answer is to use
the \texttt{printf} function
and give the number of decimal places in the format string.
For example, the following code prints
the value of $x$ with 9 decimal places:

\begin{lstlisting}
printf("%.9f\n", x);
\end{lstlisting}

A difficulty when using floating point numbers
is that some numbers cannot be represented
accurately, so there will be rounding errors.
For example, the result of the following code
is surprising:

\begin{lstlisting}
double x = 0.3*3+0.1;
printf("%.20f\n", x); // 0.99999999999999988898
\end{lstlisting}

Because of a rounding error,
the value of \texttt{x} is a bit less than 1,
while the correct value would be 1.

It is risky to compare floating point numbers
with the \texttt{==} operator,
because it is possible that the values should
be equal but they are not due to rounding errors.
A better way to compare floating point numbers
is to assume that two numbers are equal
if the difference between them is less than $\varepsilon$,
where $\varepsilon$ is a small number.

In practice, the numbers can be compared
as follows ($\varepsilon=10^{-9}$):

\begin{lstlisting}
if (abs(a-b) < 1e-9) {
    // a and b are equal
}
\end{lstlisting}

Note that while floating point numbers are inaccurate,
integers up to a certain limit can still be
represented accurately.
For example, using \texttt{double},
it is possible to accurately represent all
integers whose absolute value is at most $2^{53}$.

\section{Shortening code}

Short code is ideal in competitive programming,
because the algorithm should be implemented
as quickly as possible.
Because of this, competitive programmers often define
shorter names for datatypes and other parts of code.

\subsubsection{Type names}
\index{typedef@\texttt{typedef}}
Using the command \texttt{typedef}
it is possible to give a shorter name
to a datatype.
For example, the name \texttt{long long} is long,
so we can define a shorter name \texttt{ll}:
\begin{lstlisting}
typedef long long ll;
\end{lstlisting}
After this, the code
\begin{lstlisting}
long long a = 123456789;
long long b = 987654321;
cout << a*b << "\n";
\end{lstlisting}
can be shortened as follows:
\begin{lstlisting}
ll a = 123456789;
ll b = 987654321;
cout << a*b << "\n";
\end{lstlisting}

The command \texttt{typedef}
can also be used with more complex types.
For example, the following code gives
the name \texttt{vi} for a vector of integers
and the name \texttt{pi} for a pair
that contains two integers:
\begin{lstlisting}
typedef vector<int> vi;
typedef pair<int,int> pi;
\end{lstlisting}

\subsubsection{Macros}
\index{macro}
Another way to shorten the code is to define
\key{macros}.
A macro means that certain strings in
the code will be changed before the compilation.
In C++, macros are defined using the
command \texttt{\#define}.
For example, we can define the following macros:
\begin{lstlisting}
#define F first
#define S second
#define PB push_back
#define MP make_pair
\end{lstlisting}
After this, the code
\begin{lstlisting}
v.push_back(make_pair(y1,x1));
v.push_back(make_pair(y2,x2));
int d = v[i].first+v[i].second;
\end{lstlisting}
can be shortened as follows:
\begin{lstlisting}
v.PB(MP(y1,x1));
v.PB(MP(y2,x2));
int d = v[i].F+v[i].S;
\end{lstlisting}

It is also possible to define a macro with parameters,
which makes it possible to shorten loops and other
structures in the code.
For example, we can define the following macro:

\begin{lstlisting}
#define REP(i,a,b) for (int i = a; i <= b; i++)
\end{lstlisting}
After this, the code
\begin{lstlisting}
for (int i = 1; i <= n; i++) {
    haku(i);
}
\end{lstlisting}
can be shortened as follows:
\begin{lstlisting}
REP(i,1,n) {
    haku(i);
}
\end{lstlisting}

\section{Mathematics}

Mathematics plays an important role in competitive
programming, and it is not possible to become
a successful competitive programmer without
good skills in mathematics.
This section covers some important
mathematical concepts and formulas that
are needed later in the book.

\subsubsection{Sum formulas}

Each sum of the form
\[\sum_{x=1}^n x^k = 1^k+2^k+3^k+\ldots+n^k\]
where $k$ is a positive integer,
has a closed-form formula that is a
polynomial of degree $k+1$.
For example,
\[\sum_{x=1}^n x = 1+2+3+\ldots+n = \frac{n(n+1)}{2}\]
and
\[\sum_{x=1}^n x^2 = 1^2+2^2+3^2+\ldots+n^2 = \frac{n(n+1)(2n+1)}{6}.\]

An \key{arithmetic sum} is a sum \index{arithmetic sum}
where the difference between any two consecutive
numbers is constant.
For example,
\[3+7+11+15\]
is an arithmetic sum with constant 4.
An arithmetic sum can be calculated
using the formula
\[\frac{n(a+b)}{2}\]
where $a$ is the first number,
$b$ is the last number and
$n$ is the number of terms.
For example,
\[3+7+11+15=\frac{4 \cdot (3+15)}{2} = 36.\]
The formula is based on the fact
that the sum consists of $n$ numbers and the
value of each number is $(a+b)/2$ on average.

\index{geometric sum}
A \key{geometric sum} is a sum
where the ratio between any two consecutive
numbers is constant.
For example,
\[3+6+12+24\]
is a geometric sum with constant 2.
A geometric sum can be calculated
using the formula
\[\frac{bx-a}{x-1}\]
where $a$ is the first number,
$b$ is the last number and the
ratio between consecutive numbers is $x$.
For example,
\[3+6+12+24=\frac{24 \cdot 2 - 3}{2-1} = 45.\]

This formula can be derived as follows. Let
\[ S = a + ax + ax^2 + \cdots + b .\]
By multiplying both sides by $x$, we get
\[ xS = ax + ax^2 + ax^3 + \cdots + bx,\]
and solving the equation
\[ xS-S = bx-a\]
for $S$ yields the formula.

A special case of a geometric sum is the formula
\[1+2+4+8+\ldots+2^{n-1}=2^n-1.\]

\index{harmonic sum}

A \key{harmonic sum} is a sum of the form
\[ \sum_{x=1}^n \frac{1}{x} = 1+\frac{1}{2}+\frac{1}{3}+\ldots+\frac{1}{n}.\]

An upper bound for the harmonic sum is $\log_2(n)+1$.
The reason for this is that we can
replace each term $1/k$ by $1/k'$, where $k'$ is
the largest power of two that does not exceed $k$.
For example, when $n=6$, we can estimate
the sum as follows:
\[ 1+\frac{1}{2}+\frac{1}{3}+\frac{1}{4}+\frac{1}{5}+\frac{1}{6} \le
1+\frac{1}{2}+\frac{1}{2}+\frac{1}{4}+\frac{1}{4}+\frac{1}{4}.\]
This upper bound consists of at most $\log_2(n)+1$ parts
($1$, $2 \cdot 1/2$, $4 \cdot 1/4$, etc.),
and the sum of each part is at most 1.

\subsubsection{Set theory}

\index{set theory}
\index{set}
\index{intersection}
\index{union}
\index{difference}
\index{subset}
\index{universal set}
\index{complement}

A \key{set} is a collection of elements.
For example, the set
\[X=\{2,4,7\}\]
contains elements 2, 4 and 7.
The symbol $\emptyset$ denotes an empty set,
and $|S|$ denotes the size of set $S$,
i.e., the number of elements in the set.
For example, in the above set, $|X|=3$.
If set $S$ contains element $x$,
we write $x \in S$,
and otherwise we write $x \notin S$.
For example, in the above set
\[4 \in X \hspace{10px}\textrm{and}\hspace{10px} 5 \notin X.\]

\begin{samepage}
New sets can be constructed as follows using set operations:
\begin{itemize}
\item The \key{intersection} $A \cap B$ consists of elements
that are in both $A$ and $B$.
For example, if $A=\{1,2,5\}$ and $B=\{2,4\}$,
then $A \cap B = \{2\}$.
\item The \key{union} $A \cup B$ consists of elements
that are in $A$ or $B$ or both.
For example, if $A=\{3,7\}$ and $B=\{2,3,8\}$,
then $A \cup B = \{2,3,7,8\}$.
\item The \key{complement} $\bar A$ consists of elements
that are not in $A$.
The interpretation of a complement depends on
the \key{universal set} that contains all possible elements.
For example, if $A=\{1,2,5,7\}$ and the universal set is
$P=\{1,2,\ldots,10\}$, then $\bar A = \{3,4,6,8,9,10\}$.
\item The \key{difference} $A \setminus B = A \cap \bar B$
consists of elements that are in $A$ but not in $B$.
Note that $B$ can contain elements that are not in $A$.
For example, if $A=\{2,3,7,8\}$ and $B=\{3,5,8\}$,
then $A \setminus B = \{2,7\}$.
\end{itemize}
\end{samepage}

If each element of $A$ also belongs to $S$,
we say that $A$ is a \key{subset} of $S$,
denoted by $A \subset S$.
Set $S$ always has $2^{|S|}$ subsets,
including the empty set.
For example, the subsets of the set $\{2,4,7\}$ are
\begin{center}
$\emptyset$,
$\{2\}$, $\{4\}$, $\{7\}$, $\{2,4\}$, $\{2,7\}$, $\{4,7\}$ and $\{2,4,7\}$.
\end{center}

Often used sets are

\begin{itemize}[noitemsep]
\item $\mathbb{N}$ (natural numbers),
\item $\mathbb{Z}$ (integers),
\item $\mathbb{Q}$ (rational numbers) and
\item $\mathbb{R}$ (real numbers).
\end{itemize}

The set $\mathbb{N}$ of natural numbers
can be defined in two ways, depending
on the situation:
either $\mathbb{N}=\{0,1,2,\ldots\}$
or $\mathbb{N}=\{1,2,3,\ldots\}$.
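
The claim above that a set $S$ has $2^{|S|}$ subsets can be illustrated
with a short sketch.
The function name \texttt{all\_subsets} and the representation of a set
as a \texttt{vector} of distinct integers are assumptions made only for
this illustration.
Each integer \texttt{mask} between $0$ and $2^{|S|}-1$ is interpreted so
that bit $j$ tells whether the element at index $j$ belongs to the subset.

\begin{lstlisting}
#include <vector>
using namespace std;

// Returns all subsets of s (hypothetical helper).
// Assumes s has at most 30 elements so that 1<<n fits in an int.
vector<vector<int>> all_subsets(const vector<int>& s) {
    int n = s.size();
    vector<vector<int>> result;
    for (int mask = 0; mask < (1<<n); mask++) {
        vector<int> subset;
        for (int j = 0; j < n; j++) {
            // bit j of mask is set: s[j] is in this subset
            if (mask&(1<<j)) subset.push_back(s[j]);
        }
        result.push_back(subset);
    }
    return result;
}
\end{lstlisting}

For the set $\{2,4,7\}$ the function returns $2^3=8$ subsets,
beginning with $\emptyset$ and ending with $\{2,4,7\}$.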
We can also construct a set using a rule of the form
\[\{f(n) : n \in S\},\]
where $f(n)$ is some function.
This set contains all elements $f(n)$
where $n$ is an element in $S$.
For example, the set
\[X=\{2n : n \in \mathbb{Z}\}\]
contains all even integers.

\subsubsection{Logic}

\index{logic}
\index{negation}
\index{conjunction}
\index{disjunction}
\index{implication}
\index{equivalence}

The value of a logical expression is either
\key{true} (1) or \key{false} (0).
The most important logical operators are
$\lnot$ (\key{negation}),
$\land$ (\key{conjunction}),
$\lor$ (\key{disjunction}),
$\Rightarrow$ (\key{implication}) and
$\Leftrightarrow$ (\key{equivalence}).
The following table shows the meaning of the operators:

\begin{center}
\begin{tabular}{rr|rrrrrrr}
$A$ & $B$ & $\lnot A$ & $\lnot B$ & $A \land B$ & $A \lor B$ & $A \Rightarrow B$ & $A \Leftrightarrow B$ \\
\hline
0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 1 & 1 & 1 \\
\end{tabular}
\end{center}

The negation $\lnot A$ reverses the value of an expression.
The expression $A \land B$ is true if both $A$ and $B$ are true,
and the expression $A \lor B$ is true if $A$ or $B$ or both are true.
The expression $A \Rightarrow B$ is true
if whenever $A$ is true, $B$ is also true.
The expression $A \Leftrightarrow B$ is true
if $A$ and $B$ are both true or both false.

\index{predicate}

A \key{predicate} is an expression that is true or false
depending on its parameters.
Predicates are usually denoted by capital letters.
For example, we can define a predicate $P(x)$
that is true exactly when $x$ is a prime number.
Using this definition, $P(7)$ is true but $P(8)$ is false.

\index{quantifier}

A \key{quantifier} connects a logical expression
to elements in a set.
The most important quantifiers are
$\forall$ (\key{for all}) and $\exists$ (\key{there is}).
For example,
\[\forall x (\exists y (y < x))\]
means that for each element $x$ in the set,
there is an element $y$ in the set
such that $y$ is smaller than $x$.
This is true in the set of integers,
but false in the set of natural numbers.

Using the notation described above,
we can express many kinds of logical propositions.
For example,
\[\forall x ((x>2 \land \lnot P(x)) \Rightarrow (\exists a (\exists b (x = ab \land a > 1 \land b > 1))))\]
means that if a number $x$ is larger than 2
and not a prime number,
there are numbers $a$ and $b$
that are larger than $1$ and whose product is $x$.
This proposition is true in the set of integers.

\subsubsection{Functions}

The function $\lfloor x \rfloor$ rounds the number $x$
down to an integer, and the function
$\lceil x \rceil$ rounds the number $x$
up to an integer. For example,
\[ \lfloor 3/2 \rfloor = 1 \hspace{10px} \textrm{and} \hspace{10px} \lceil 3/2 \rceil = 2.\]

The functions $\min(x_1,x_2,\ldots,x_n)$
and $\max(x_1,x_2,\ldots,x_n)$
return the smallest and the largest of the values
$x_1,x_2,\ldots,x_n$.
For example,
\[ \min(1,2,3)=1 \hspace{10px} \textrm{and} \hspace{10px} \max(1,2,3)=3.\]

\index{factorial}

The \key{factorial} $n!$ is defined as
\[\prod_{x=1}^n x = 1 \cdot 2 \cdot 3 \cdot \ldots \cdot n\]
or recursively as
\[
\begin{array}{lcl}
0! & = & 1 \\
n! & = & n \cdot (n-1)! \\
\end{array}
\]

\index{Fibonacci number}

The \key{Fibonacci numbers} arise in several situations.
They can be defined recursively as follows:
\[
\begin{array}{lcl}
f(0) & = & 0 \\
f(1) & = & 1 \\
f(n) & = & f(n-1)+f(n-2) \\
\end{array}
\]
The first Fibonacci numbers are
\[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, \ldots\]
There is also a closed-form formula
for calculating Fibonacci numbers:
\[f(n)=\frac{(1 + \sqrt{5})^n - (1-\sqrt{5})^n}{2^n \sqrt{5}}.\]

\subsubsection{Logarithm}

\index{logarithm}

The \key{logarithm} of a number $x$
is denoted $\log_k(x)$ where $k$ is the base
of the logarithm.
The logarithm is defined so that
$\log_k(x)=a$ exactly when $k^a=x$.

A useful interpretation in algorithmics is
that $\log_k(x)$ equals the number of times
we have to divide $x$ by $k$ before we reach the number 1.
For example, $\log_2(32)=5$
because 5 divisions are needed:

\[32 \rightarrow 16 \rightarrow 8 \rightarrow 4 \rightarrow 2 \rightarrow 1 \]

Logarithms are often needed in the analysis of
algorithms because many efficient algorithms
halve something at each step.
Thus, we can estimate the efficiency of those algorithms
using the logarithm.

The logarithm of a product is
\[\log_k(ab) = \log_k(a)+\log_k(b),\]
and consequently,
\[\log_k(x^n) = n \cdot \log_k(x).\]
In addition, the logarithm of a quotient is
\[\log_k\Big(\frac{a}{b}\Big) = \log_k(a)-\log_k(b).\]
Another useful formula is
\[\log_u(x) = \frac{\log_k(x)}{\log_k(u)},\]
and using this, it is possible to calculate
logarithms to any base if there is a way to
calculate logarithms to some fixed base.

\index{natural logarithm}

The \key{natural logarithm} $\ln(x)$ of a number $x$
is a logarithm whose base is $e \approx 2.71828$.

Another property of the logarithm is that
the number of digits of a positive integer $x$ in base $b$ is
$\lfloor \log_b(x)+1 \rfloor$.
For example, the representation of
the number $123$ in base $2$ is 1111011 and
$\lfloor \log_2(123)+1 \rfloor = 7$.
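
The digit count formula above can be checked with repeated division,
in the spirit of the interpretation of the logarithm given earlier.
The following is a minimal sketch:
the function name \texttt{digit\_count} is hypothetical,
and the code assumes that $x \ge 1$ and $b \ge 2$.

\begin{lstlisting}
// Number of digits of x in base b, counted by repeated
// integer division (assumes x >= 1 and b >= 2).
int digit_count(long long x, int b) {
    int digits = 0;
    while (x > 0) {
        x /= b;   // remove the last digit in base b
        digits++;
    }
    return digits;
}
\end{lstlisting}

For example, \texttt{digit\_count(123, 2)} returns 7,
which agrees with $\lfloor \log_2(123)+1 \rfloor = 7$.
Working with integers avoids the rounding errors that may occur
if the logarithm is calculated using floating point numbers.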