Counting subgrids

2017-05-23 22:09:51 +03:00 · 2017-05-23 22:09:51 +03:00 · 7e729e8333
parent 04f3c313cc
commit 7e729e8333
1 changed files with 93 additions and 16 deletions
--- a/chapter10.tex
+++ b/chapter10.tex
@@ -375,33 +375,38 @@ Such optimizations do not change the
 time complexity of the algorithm,
 but they may have a large impact
 on the actual running time of the code.
-In this section we discuss examples
+In this section we discuss two examples
 of such situations.

 \subsubsection{Hamming distances}

 \index{Hamming distance}
-The \key{Hamming distance} between two bit strings
-of equal length is
+The \key{Hamming distance}
+$\texttt{hamming}(a,b)$ between two
+equal-length bit strings $a$ and $b$ is
 the number of positions where the strings differ.
-For example, the Hamming distance between
-01101 and 11001 is 2.
+For example, $\texttt{hamming}(01101,11001)=2$.

-Consider the following problem: We are given
+Consider the following problem: Given
 a list of $n$ bit strings, each of length $k$,
-and our task is to calculate the minimum Hamming distance
+calculate the minimum Hamming distance
 between two strings in the list.
-For example, the minimum distance for the list
-$[00111,01101,11101]$ is 2.
+For example, the answer for $[00111,01101,11110]$
+is 2, because
+\begin{itemize}[noitemsep]
+\item $\texttt{hamming}(00111,01101)=2$,
+\item $\texttt{hamming}(00111,11110)=3$, and
+\item $\texttt{hamming}(01101,11110)=3$.
+\end{itemize}

 A straightforward way to solve the problem is
 to go through all pairs of strings and calculate
-their Hamming distances.
-Such an algorithm works in $O(n^2 k)$ time.
-The following function can be used to calculate
+their Hamming distances,
+which yields an $O(n^2 k)$ time algorithm.
+The following function calculates
 the Hamming distance between two strings:
 \begin{lstlisting}
-int distance(string a, string b) {
+int hamming(string a, string b) {
    int d = 0;
    for (int i = 0; i < k; i++) {
        if (a[i] != b[i]) d++;
@@ -417,7 +422,7 @@ In particular, if $k \le 32$, we can just store
 the strings as \texttt{int} values and use the
 following function to calculate distances:
 \begin{lstlisting}
-int distance(int a, int b) {
+int hamming(int a, int b) {
    return __builtin_popcount(a^b);
 }
 \end{lstlisting}
@@ -431,11 +436,83 @@ To compare the implementations, we generated
 a list of 10000 random bit strings of length 30.
 Using the first approach, the search took
 13.5 seconds, and after the bit optimization,
-it took only 0.5 seconds.
+it only took 0.5 seconds.
 Thus, the bit-optimized code was almost
 30 times faster than the original code.

-\subsubsection{}
+\subsubsection{Counting subgrids}
+
+As another example, consider the
+following problem:
+Given an $n \times n$ grid in which
+each square is either black or white,
+calculate the number of subgrids
+all of whose corners are black.
+For example, the grid
+\begin{center}
+\begin{tikzpicture}[scale=0.5]
+\fill[black] (1,1) rectangle (2,2);
+\fill[black] (1,4) rectangle (2,5);
+\fill[black] (4,1) rectangle (5,2);
+\fill[black] (4,4) rectangle (5,5);
+\fill[black] (1,3) rectangle (2,4);
+\fill[black] (2,3) rectangle (3,4);
+\fill[black] (2,1) rectangle (3,2);
+\fill[black] (0,2) rectangle (1,3);
+\draw (0,0) grid (5,5);
+\end{tikzpicture}
+\end{center}
+contains two such subgrids:
+\begin{center}
+\begin{tikzpicture}[scale=0.5]
+\fill[black] (1,1) rectangle (2,2);
+\fill[black] (1,4) rectangle (2,5);
+\fill[black] (4,1) rectangle (5,2);
+\fill[black] (4,4) rectangle (5,5);
+\fill[black] (1,3) rectangle (2,4);
+\fill[black] (2,3) rectangle (3,4);
+\fill[black] (2,1) rectangle (3,2);
+\fill[black] (0,2) rectangle (1,3);
+\draw (0,0) grid (5,5);
+
+\fill[black] (7+1,1) rectangle (7+2,2);
+\fill[black] (7+1,4) rectangle (7+2,5);
+\fill[black] (7+4,1) rectangle (7+5,2);
+\fill[black] (7+4,4) rectangle (7+5,5);
+\fill[black] (7+1,3) rectangle (7+2,4);
+\fill[black] (7+2,3) rectangle (7+3,4);
+\fill[black] (7+2,1) rectangle (7+3,2);
+\fill[black] (7+0,2) rectangle (7+1,3);
+\draw (7+0,0) grid (7+5,5);
+
+\draw[color=red,line width=1mm] (1,1) rectangle (3,4);
+\draw[color=red,line width=1mm] (7+1,1) rectangle (7+5,5);
+\end{tikzpicture}
+\end{center}
+
+There is an $O(n^3)$ time algorithm for solving the problem:
+go through all $O(n^2)$ pairs of rows and, for each pair,
+count in $O(n)$ time the columns that contain a black
+square in both rows.
+If there are $c$ such columns for a fixed row pair,
+any two of them form the left and right sides of a subgrid,
+so the pair accounts for $c(c-1)/2$ subgrids with black corners.
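
The row-pair counting described above can be sketched as follows. This is only an illustration, not code from the commit: the function name \texttt{countSubgridsNaive} and the input format (one string per row, where the character \texttt{1} marks a black square) are assumptions made for the example.

```cpp
#include <string>
#include <vector>

// Counts the subgrids whose four corners are black in O(n^3) time.
// Assumption (not from the source): grid[y][x] == '1' marks a black square.
long long countSubgridsNaive(const std::vector<std::string>& grid) {
    int n = static_cast<int>(grid.size());
    long long total = 0;
    for (int a = 0; a < n; a++) {
        for (int b = a + 1; b < n; b++) {
            // c = number of columns that are black in both row a and row b
            long long c = 0;
            for (int x = 0; x < n; x++) {
                if (grid[a][x] == '1' && grid[b][x] == '1') c++;
            }
            // any two such columns give one subgrid with black corners
            total += c * (c - 1) / 2;
        }
    }
    return total;
}
```

Encoding the $5 \times 5$ example grid above row by row, from top to bottom, as 01001, 01100, 10000, 01101, 00000 should give the answer 2.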
+
+To optimize this algorithm, we divide the grid into blocks
+of columns such that each block consists of $N$
+consecutive columns. Then, each row is stored as
+a list of $N$-bit numbers that describe the colors
+of the squares.
+Using this representation,
+the and operation followed by a bit count gives
+the number of common black columns within a block,
+so we can process $N$ columns at the same time,
+and the resulting algorithm
+works in $O(n^3/N)$ time.
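
One possible way to implement the block representation with $N=64$ is sketched below. The name \texttt{countSubgridsBits} and the input format (one string per row, \texttt{1} for a black square) are assumptions for illustration, and \texttt{\_\_builtin\_popcountll} is a GCC/Clang builtin, in line with the \texttt{\_\_builtin\_popcount} call used earlier in this section.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Bit-parallel variant with N = 64: each row is packed into 64-bit words.
// Assumption (not from the source): grid[y][x] == '1' marks a black square.
long long countSubgridsBits(const std::vector<std::string>& grid) {
    const int N = 64;
    int n = static_cast<int>(grid.size());
    int blocks = (n + N - 1) / N;  // number of 64-column blocks per row
    std::vector<std::vector<std::uint64_t>> rows(
        n, std::vector<std::uint64_t>(blocks, 0));
    for (int y = 0; y < n; y++) {
        for (int x = 0; x < n; x++) {
            // bit x % N of word x / N is set iff square (y, x) is black
            if (grid[y][x] == '1') rows[y][x / N] |= std::uint64_t(1) << (x % N);
        }
    }
    long long total = 0;
    for (int a = 0; a < n; a++) {
        for (int b = a + 1; b < n; b++) {
            // count the common black columns, 64 columns per step
            long long c = 0;
            for (int i = 0; i < blocks; i++) {
                c += __builtin_popcountll(rows[a][i] & rows[b][i]);
            }
            total += c * (c - 1) / 2;
        }
    }
    return total;
}
```

The outer two loops are the same as in the $O(n^3)$ algorithm; only the inner loop over columns is replaced by a loop over $\lceil n/64 \rceil$ words.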
+
+We generated a random grid of size $2500 \times 2500$
+and compared the original and bit-optimized implementations.
+While the original code took $29.6$ seconds,
+the bit-optimized version only took $3.1$ seconds
+with $N=32$ (\texttt{int} numbers) and $1.7$ seconds
+with $N=64$ (\texttt{long long} numbers).

 \section{Dynamic programming}