Bit representation and manipulation

2017-01-07 13:34:28 +02:00 · 2017-01-07 13:34:28 +02:00 · e65fe7c8c7
parent edb0faeb45
commit e65fe7c8c7
1 changed files with 154 additions and 162 deletions
--- a/luku10.tex
+++ b/luku10.tex
@ -1,97 +1,105 @@
 \chapter{Bit manipulation}
+A computer internally manipulates data
+as bits, i.e., as the numbers 0 and 1.
+In this chapter, we will learn how integers
+are represented as bits, and how bit operations
+can be used to manipulate them.
+It turns out that bit operations have many uses
+in the implementation of algorithms.
+\section{Bit representation}
+\index{bit representation}
+The \key{bit representation} of a number
+indicates which powers of two form the number.
+For example, the bit representation of the number 43
+is 101011 because
+$43 = 2^5 + 2^3 + 2^1 + 2^0$, i.e.,
 bits 0, 1, 3 and 5 from the right are ones,
 and all other bits are zeros.
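As a minimal C++ sketch of this idea (assuming a 32-bit \texttt{unsigned int}), the hypothetical helper \texttt{bit} reads bit $k$ by dividing by 2 repeatedly. The equally hypothetical \texttt{from\_bits} rebuilds a number as the sum of $2^k$ over its one bits. The \texttt{static\_assert} lines check the claims about 43 at compile time.

```cpp
// Returns bit k of x (0 or 1); positions are counted from the right, starting at 0.
// Dividing by 2 k times drops the k lowest bits, and the remainder mod 2 is bit k.
constexpr unsigned int bit(unsigned int x, int k) {
    for (int i = 0; i < k; i++) {
        x /= 2;
    }
    return x % 2;
}

// Rebuilds x as the sum of the powers 2^k for which bit k of x is one.
// Assumes a 32-bit unsigned int.
constexpr unsigned int from_bits(unsigned int x) {
    unsigned int sum = 0;
    unsigned int power = 1;  // 2^k for the current position k
    for (int k = 0; k < 32; k++) {
        if (bit(x, k) == 1) {
            sum += power;
        }
        power *= 2;  // wraps to 0 after 2^31, but is not used again
    }
    return sum;
}

// 43 = 2^5 + 2^3 + 2^1 + 2^0, so bits 0, 1, 3 and 5 are ones.
static_assert(bit(43, 0) == 1 && bit(43, 1) == 1 && bit(43, 3) == 1 && bit(43, 5) == 1,
              "one bits of 43");
static_assert(bit(43, 2) == 0 && bit(43, 4) == 0 && bit(43, 6) == 0, "zero bits of 43");
static_assert(from_bits(43) == 43, "43 is the sum of its powers of two");
```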
+The length of the bit representation of a number
+in a computer is fixed, and depends on the
+data type chosen.
+For example, the \texttt{int} type in C++ is
+usually a 32-bit type, and an \texttt{int} number
 consists of 32 bits.
 In this case, the bit representation of 43
 as an \texttt{int} number is as follows:
 \[00000000000000000000000000101011\]
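One way to produce these 32 bits in C++ is \texttt{std::bitset}. The helper \texttt{int\_bits} below is a hypothetical sketch that, like the text, assumes \texttt{int} is 32 bits wide. It converts the value to \texttt{unsigned int} first so that the same bit pattern is passed to the \texttt{bitset} constructor.

```cpp
#include <bitset>
#include <climits>
#include <string>

static_assert(sizeof(int) * CHAR_BIT == 32, "this sketch assumes a 32-bit int");

// Returns the bits of x as stored in a 32-bit int, leftmost (highest) bit first.
std::string int_bits(int x) {
    // The conversion to unsigned int keeps the 32-bit pattern of x.
    return std::bitset<32>(static_cast<unsigned int>(x)).to_string();
}
```

For example, \texttt{int\_bits(43)} returns the pattern shown above: 26 zeros followed by 101011.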
+The bit representation of a number is either
+\key{signed} or \key{unsigned}.
+The first bit of a signed number is the sign
+($+$ or $-$), and we can represent numbers
+$-2^{n-1} \ldots 2^{n-1}-1$ using $n$ bits.
+In an unsigned number, in turn,
 all bits belong to the number and we
 can represent numbers $0 \ldots 2^n-1$ using $n$ bits.
+In a signed bit representation,
+the first bit of a nonnegative number is 0,
+and the first bit of a negative number is 1.
+The representation used is \key{two's complement}, which means that
+the negation of a number can be calculated
+by first inverting all the bits in the number,
+and then adding one to the result.
+For example, the representation of $-43$
 as an \texttt{int} number is as follows:
 \[11111111111111111111111111010101\]
+The connection between signed and unsigned numbers
+is that the representations of a signed
+number $-x$ and an unsigned number $2^n-x$
+are equal.
+Thus, the above representation corresponds to
 the unsigned number $2^{32}-43$.
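Both the negation rule and the signed/unsigned connection can be checked at compile time. This is a sketch assuming a 32-bit \texttt{int}. The function name \texttt{negate\_by\_bits} is hypothetical, and it must not be called with the smallest \texttt{int} value $-2^{31}$, whose negation does not fit in an \texttt{int}.

```cpp
#include <climits>

static_assert(sizeof(unsigned int) * CHAR_BIT == 32, "this sketch assumes a 32-bit int");

// Two's complement negation: invert all bits of x, then add one.
// Precondition: x != INT_MIN, since -INT_MIN does not fit in an int.
constexpr int negate_by_bits(int x) {
    return ~x + 1;
}

static_assert(negate_by_bits(43) == -43, "");
static_assert(negate_by_bits(-43) == 43, "");

// The bit pattern of the signed number -43, read as an unsigned number, is 2^32 - 43.
static_assert(static_cast<unsigned int>(-43) == 4294967296ULL - 43, "");
```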
+In C++, numbers are signed by default,
+but we can create unsigned numbers by
+using the keyword \texttt{unsigned}.
+For example, in the code
 \begin{lstlisting}
 int x = -43;
 unsigned int y = x;
 cout << x << "\n"; // -43
 cout << y << "\n"; // 4294967253
 \end{lstlisting}
+the signed number
 $x=-43$ becomes the unsigned number $y=2^{32}-43$.
+If a number becomes too large or too small for the
+bit representation chosen, it will overflow.
+In practice, in a signed representation,
+the next number after $2^{n-1}-1$ is $-2^{n-1}$,
+and in an unsigned representation,
+the next number after $2^n-1$ is $0$.
+For example, in the code
 \begin{lstlisting}
 int x = 2147483647;
 cout << x << "\n"; // 2147483647
 x++;
 cout << x << "\n"; // -2147483648
 \end{lstlisting}
+the variable $x$ wraps around from $2^{31}-1$ to $-2^{31}$.
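Strictly speaking, the C++ standard defines wraparound only for unsigned types. Overflowing a signed \texttt{int}, as the code above does, is undefined behavior, so the output $-2147483648$ reflects what typical compilers and hardware produce rather than a guarantee. Below is a minimal sketch of two well-defined alternatives, assuming a 32-bit \texttt{unsigned int} (the function names are hypothetical). Unsigned arithmetic wraps modulo $2^n$, and a signed increment can be checked before it happens.

```cpp
#include <climits>

// Unsigned arithmetic is done modulo 2^n, so this wraps around without error.
constexpr unsigned int next_unsigned(unsigned int x) {
    return x + 1u;
}

// True if x + 1 would overflow an int; checking first avoids undefined behavior.
constexpr bool increment_overflows(int x) {
    return x == INT_MAX;
}

static_assert(UINT_MAX == 4294967295u, "this sketch assumes a 32-bit unsigned int");
// In an unsigned representation, the number after 2^32 - 1 is 0.
static_assert(next_unsigned(UINT_MAX) == 0u, "");
static_assert(increment_overflows(2147483647) && !increment_overflows(0), "");
```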
+\section{Bit operations}
 \newcommand\XOR{\mathbin{\char`\^}}
+\subsubsection{And operation}
+\index{and operation}
+The \key{and} operation $x$ \& $y$ produces a number
+that has a one bit in the positions where both
+$x$ and $y$ have one bits.
+For example, $22$ \& $26$ = 18 because
 \begin{center}
 \begin{tabular}{rrr}
@ -102,18 +110,20 @@ For example, $22$ \& $26$ = 18 because
 \end{tabular}
 \end{center}
+Using the and operation, we can check whether a number
+$x$ is even, because
+$x$ \& $1$ = 0 if $x$ is even, and
 $x$ \& $1$ = 1 if $x$ is odd.
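The and examples, including the parity test, can be written directly in C++. In this sketch the name \texttt{is\_even} is hypothetical. Note the parentheses in \texttt{(x \& 1) == 0}. In C++, \texttt{==} binds more tightly than \texttt{\&}, so \texttt{x \& 1 == 0} would mean \texttt{x \& (1 == 0)}.

```cpp
// Bit 0 of x is one exactly when x is odd, so x & 1 gives the parity of x.
// The parentheses matter: == binds more tightly than & in C++.
constexpr bool is_even(int x) {
    return (x & 1) == 0;
}

static_assert((22 & 26) == 18, "10110 & 11010 = 10010");
static_assert(is_even(22) && !is_even(27), "");
// In two's complement this also works for negative numbers.
static_assert(is_even(-4) && !is_even(-3), "");
```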
+\subsubsection{Or operation}
+\index{or operation}
+The \key{or} operation $x$ | $y$ produces a number
+that has a one bit in the positions where at least one
+of the numbers $x$ and $y$ has a one bit.
 For example, $22$ | $26$ = 30 because
 \begin{center}
 \begin{tabular}{rrr}
@ -124,14 +134,15 @@ For example, $22$ | $26$ = 30 because
 \end{tabular}
 \end{center}
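Here is a short C++ sketch of the or example. The helper \texttt{set\_bits}, which turns on in \texttt{x} every bit that is one in \texttt{mask}, is a hypothetical name used only for illustration. Binary literals such as \texttt{0b10110} require C++14 or later.

```cpp
// Returns x with every bit that is one in mask also set to one.
constexpr unsigned int set_bits(unsigned int x, unsigned int mask) {
    return x | mask;
}

static_assert((22 | 26) == 30, "10110 | 11010 = 11110");
static_assert(set_bits(0b10110u, 0b11010u) == 0b11110u, "same example in binary");
// Bits that are already one stay one.
static_assert(set_bits(0b101u, 0b001u) == 0b101u, "");
```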
+\subsubsection{Xor operation}
+\index{xor operation}
+The \key{xor} operation $x$ $\XOR$ $y$ produces a number
+that has a one bit in the positions where exactly one
+of the numbers $x$ and $y$ has a one bit.
 For example, $22$ $\XOR$ $26$ = 12 because
 \begin{center}
 \begin{tabular}{rrr}
@ -142,20 +153,21 @@ $\XOR$ & 11010 & (26) \\
 \end{tabular}
 \end{center}
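In C++ the xor operation is written \texttt{\textasciicircum}. The sketch below uses the hypothetical helper name \texttt{flip\_bits}. It checks the example and one consequence of the definition: applying the same xor twice restores the original number, because every affected bit is inverted twice.

```cpp
// Inverts in x exactly the bits that are one in mask.
constexpr unsigned int flip_bits(unsigned int x, unsigned int mask) {
    return x ^ mask;
}

static_assert((22 ^ 26) == 12, "10110 ^ 11010 = 01100");
// Inverting the same bits twice gives back the original number.
static_assert(flip_bits(flip_bits(22u, 26u), 26u) == 22u, "");
// x ^ x == 0 and x ^ 0 == x.
static_assert((22 ^ 22) == 0 && (22 ^ 0) == 22, "");
```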
+\subsubsection{Not operation}
+\index{not operation}
+The \key{not} operation \textasciitilde$x$
+produces a number where all the bits of $x$
+have been inverted.
+The formula \textasciitilde$x = -x-1$ holds,
+which follows from the two's complement rule \textasciitilde$x+1 = -x$;
 for example, \textasciitilde$29 = -30$.
+The result of the not operation at the bit level
+depends on the length of the bit representation,
+because the operation inverts all bits.
+For example, if the numbers are 32-bit
+\texttt{int} numbers, the result is as follows:
 \begin{center}
 \begin{tabular}{rrrr}
@ -164,108 +176,82 @@ $x$ & = & 29 &   00000000000000000000000000011101 \\
 \end{tabular}
 \end{center}
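The identity for the not operation can be checked at compile time over a range of values. In this sketch the function name \texttt{not\_formula\_holds} is hypothetical. The range must exclude the extreme \texttt{int} values so that neither \texttt{-x} nor the loop counter overflows. A 32-bit \texttt{int} is assumed for the bit-pattern check.

```cpp
#include <climits>

static_assert(sizeof(int) * CHAR_BIT == 32, "this sketch assumes a 32-bit int");

// Checks ~x == -x - 1 for every x in [from, to].
// Assumes INT_MIN < from and to < INT_MAX, so that -x and x++ cannot overflow.
constexpr bool not_formula_holds(int from, int to) {
    for (int x = from; x <= to; x++) {
        if (~x != -x - 1) {
            return false;
        }
    }
    return true;
}

static_assert(~29 == -30, "");
static_assert(not_formula_holds(-1000, 1000), "");
// ~29 has the bit pattern 11...1100010, i.e. the unsigned number 2^32 - 30.
static_assert(static_cast<unsigned int>(~29) == 4294967266u, "");
```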
+\subsubsection{Bit shifts}
+\index{bit shift}
+The left bit shift $x < < k$ produces a number
+where the bits of $x$ have been moved $k$ steps to
+the left, i.e., $k$ zero bits are added to the end of the number.
+The right bit shift $x > > k$ produces a number
+where the bits of $x$ have been moved $k$ steps
+to the right, i.e., the last $k$ bits of the number are removed.
+For example, $14 < < 2 = 56$
+because $14$ is 1110 in binary,
+and after the shift it becomes 111000, which is $56$.
+Correspondingly, $49 > > 3 = 6$
+because $49$ is 110001 in binary,
+and after the shift it becomes 110, which is $6$.
+Note that the left bit shift $x < < k$
+corresponds to multiplying $x$ by $2^k$
+(as long as the result fits in the data type),
+and the right bit shift $x > > k$
+corresponds to dividing $x$ by $2^k$
+rounding downwards.
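The shift examples and their arithmetic interpretation can be checked in C++. This sketch uses \texttt{unsigned int} with the hypothetical helper name \texttt{shifts\_match\_arithmetic}. It assumes $0 \le k < 32$, since shifting by the full width of the type or more is undefined in C++.

```cpp
// Checks that x << k equals x * 2^k and x >> k equals x / 2^k (rounded down).
// For unsigned int both sides of the first check wrap modulo 2^32 identically.
// Assumes 0 <= k < 32 (a 32-bit unsigned int).
constexpr bool shifts_match_arithmetic(unsigned int x, int k) {
    unsigned int power = 1u << k;  // 2^k
    return (x << k) == x * power && (x >> k) == x / power;
}

static_assert((14 << 2) == 56, "1110 becomes 111000");
static_assert((49 >> 3) == 6, "110001 becomes 110");
static_assert(shifts_match_arithmetic(14, 2) && shifts_match_arithmetic(49, 3), "");
```

For negative numbers, \texttt{x >> k} still rounds downwards on common compilers, though this is guaranteed only since C++20. Integer division in C++ behaves differently because it rounds toward zero: \texttt{-7 >> 1} is $-4$, while \texttt{-7 / 2} is $-3$.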
+\subsubsection{Bit manipulation}
+The bits in a number are indexed from right
+to left, starting from zero.
+A number of the form $1 < < k$ has a one bit
+in position $k$, and all its other bits are zero,
+so such numbers can be used to manipulate
+single bits of other numbers.
+The $k$th bit of $x$ is one exactly when
 $x$ \& $(1 < < k) = (1 < < k)$.
+The formula $x$ | $(1 < < k)$
+sets the $k$th bit of $x$ to one,
 the formula
 $x$ \& \textasciitilde $(1 < < k)$
+sets the $k$th bit of $x$ to zero,
+and the formula
+$x$ $\XOR$ $(1 < < k)$
+inverts the $k$th bit of $x$.
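These formulas translate directly into C++. The four helpers below are hypothetical names for illustration. They assume a 32-bit \texttt{int} and $0 \le k < 31$, because \texttt{1 << 31} does not fit in a signed \texttt{int}. Testing \texttt{(x \& (1 << k)) != 0} is equivalent to comparing with $(1 < < k)$ as in the text, since $1 < < k$ has exactly one one bit.

```cpp
// Single-bit helpers; k is a bit position counted from the right, 0 <= k < 31.
constexpr bool has_bit(int x, int k) {
    return (x & (1 << k)) != 0;
}
constexpr int set_bit(int x, int k) {
    return x | (1 << k);
}
constexpr int clear_bit(int x, int k) {
    return x & ~(1 << k);
}
constexpr int flip_bit(int x, int k) {
    return x ^ (1 << k);
}

// 181 = 10110101: bit 2 is one and bit 3 is zero.
static_assert(has_bit(181, 2) && !has_bit(181, 3), "");
static_assert(set_bit(181, 2) == 181 && set_bit(181, 3) == 189, "189 = 10111101");
static_assert(clear_bit(181, 2) == 177 && clear_bit(181, 3) == 181, "177 = 10110001");
static_assert(flip_bit(181, 2) == 177 && flip_bit(181, 3) == 189, "");
```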
 % The following code changes bits of a number:
 % 
 % \begin{lstlisting}
 % int x = 181; // 10110101
 % cout << (x|(1<<2)) << "\n"; // 181 = 10110101
 % cout << (x|(1<<3)) << "\n"; // 189 = 10111101
 % cout << (x&~(1<<2)) << "\n"; // 177 = 10110001
 % cout << (x&~(1<<3)) << "\n"; // 181 = 10110101
 % cout << (x^(1<<2)) << "\n"; // 177 = 10110001
 % cout << (x^(1<<3)) << "\n"; // 189 = 10111101
 % \end{lstlisting}
 % 
 % % The leftmost bit of a bit representation is the
 % % \textit{most significant} bit, and
 % % the rightmost bit is the \textit{least significant} bit.
+The formula $x$ \& $(x-1)$ sets the last
+one bit of $x$ to zero,
+and the formula $x$ \& $-x$ sets all the
+one bits to zero, except for the last one bit.
+The formula $x$ | $(x-1)$, in turn,
 sets all the bits after the last one bit to one.
+Also note that a positive number $x$ is
+of the form $2^k$ exactly when $x$ \& $(x-1) = 0$.
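Here is a C++ sketch of these formulas, using the number $168 = 10101000$ as an example. All function names are hypothetical, and the signed helpers assume $x > 0$ so that neither \texttt{x - 1} nor \texttt{-x} can overflow. The last helper shows one common use of $x$ \& $(x-1)$: counting the one bits by clearing the last one bit until nothing is left.

```cpp
// Helpers for the last one bit of x; they assume x > 0.
constexpr int clear_last_one(int x) {
    return x & (x - 1);  // turns the last one bit into zero
}
constexpr int keep_last_one(int x) {
    return x & -x;  // zeroes every one bit except the last one
}
constexpr int fill_after_last_one(int x) {
    return x | (x - 1);  // sets all bits after the last one bit to one
}
// A positive x is a power of two exactly when it has a single one bit.
constexpr bool is_power_of_two(int x) {
    return x > 0 && (x & (x - 1)) == 0;
}
// Counts one bits by repeatedly clearing the last one bit.
constexpr int count_ones(unsigned int x) {
    int count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}

// 168 = 10101000
static_assert(clear_last_one(168) == 160, "160 = 10100000");
static_assert(keep_last_one(168) == 8, "8 = 00001000");
static_assert(fill_after_last_one(168) == 175, "175 = 10101111");
static_assert(is_power_of_two(64) && !is_power_of_two(168), "");
static_assert(count_ones(168u) == 3, "");
```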
 % 
 % The following code demonstrates the operations:
 % 
 % \begin{lstlisting}
 % int x = 168; // 10101000
 % cout << (x&(x-1)) << "\n"; // 160 = 10100000
 % cout << (x&-x) << "\n"; // 8 = 00001000
 % cout << (x|(x-1)) << "\n"; // 175 = 10101111
 % \end{lstlisting}
+\subsubsection*{Additional functions}
+The g++ compiler provides, among others, the following
+functions for bit manipulation:
 \begin{itemize}
 \item
 $\texttt{\_\_builtin\_clz}(x)$:
+the number of zeros at the beginning of the bit representation
+(undefined for $x=0$)
 \item
 $\texttt{\_\_builtin\_ctz}(x)$:
+the number of zeros at the end of the bit representation
+(undefined for $x=0$)
 \item
 $\texttt{\_\_builtin\_popcount}(x)$:
+the number of ones in the bit representation
 \item
 $\texttt{\_\_builtin\_parity}(x)$:
+the parity (even or odd) of the number of ones
 \end{itemize}
 \begin{samepage}
 The following code shows how to use the functions:
 \begin{lstlisting}
 int x = 5328; // 00000000000000000001010011010000
 cout << __builtin_clz(x) << "\n"; // 19
@ -275,6 +261,12 @@ cout << __builtin_parity(x) << "\n"; // 1
 \end{lstlisting}
 \end{samepage}
 The functions support \texttt{int} numbers,
 but there are also \texttt{long long} versions
 of the functions
 available with the suffix \texttt{ll}.
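Here is a sketch of the \texttt{long long} versions. It assumes g++ (or another compiler that provides the same builtins, such as Clang) and a 64-bit \texttt{unsigned long long}. The two wrapper names are hypothetical. Because \texttt{\_\_builtin\_clzll} and \texttt{\_\_builtin\_ctzll} are undefined for 0, the wrappers require $x \neq 0$.

```cpp
// Position of the highest one bit of x; requires x != 0 and a 64-bit type.
constexpr int highest_one(unsigned long long x) {
    return 63 - __builtin_clzll(x);
}

// Position of the lowest one bit of x; requires x != 0.
constexpr int lowest_one(unsigned long long x) {
    return __builtin_ctzll(x);
}

static_assert(highest_one(1ULL << 40) == 40, "");
static_assert(lowest_one(5328ULL) == 4, "5328 = 1010011010000");
static_assert(__builtin_popcountll((1ULL << 40) | 5ULL) == 3, "");
static_assert(__builtin_parityll(7ULL) == 1, "three ones: odd parity");
```

In C++20, the standard header \texttt{<bit>} offers portable counterparts for unsigned types, such as \texttt{std::popcount}, \texttt{std::countl\_zero} and \texttt{std::countr\_zero}.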
 \section{Bit representation of sets}
 The set $\{0,1,2,\ldots,n-1\}$