Counting subgrids

Antti H S Laaksonen 2017-05-23 22:09:51 +03:00
parent 04f3c313cc
commit 7e729e8333
1 changed files with 93 additions and 16 deletions

@@ -375,33 +375,38 @@ Such optimizations do not change the
time complexity of the algorithm,
but they may have a large impact
on the actual running time of the code.
In this section we discuss two examples
of such situations.
\subsubsection{Hamming distances}
\index{Hamming distance}
The \key{Hamming distance}
$\texttt{hamming}(a,b)$ between two
equal-length bit strings $a$ and $b$ is
the number of positions where the strings differ.
For example, $\texttt{hamming}(01101,11001)=2$.
Consider the following problem: Given
a list of $n$ bit strings, each of length $k$,
calculate the minimum Hamming distance
between two strings in the list.
For example, the answer for $[00111,01101,11110]$
is 2, because
\begin{itemize}[noitemsep]
\item $\texttt{hamming}(00111,01101)=2$,
\item $\texttt{hamming}(00111,11110)=3$, and
\item $\texttt{hamming}(01101,11110)=3$.
\end{itemize}
A straightforward way to solve the problem is
to go through all pairs of strings and calculate
their Hamming distances,
which yields an $O(n^2 k)$ time algorithm.
The following function calculates
the Hamming distance between two strings:
\begin{lstlisting}
int hamming(string a, string b) {
    int d = 0;
    for (int i = 0; i < k; i++) {
        if (a[i] != b[i]) d++;
@@ -417,7 +422,7 @@ In particular, if $k \le 32$, we can just store
the strings as \texttt{int} values and use the
following function to calculate distances:
\begin{lstlisting}
int hamming(int a, int b) {
    return __builtin_popcount(a^b);
}
\end{lstlisting}
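As a sketch of how this function fits into the full search,
the following code goes through all pairs in the list.
The name \texttt{minHamming} and the use of a \texttt{vector<int>}
are assumptions made for illustration, and each bit string
is assumed to be already stored as an \texttt{int} value:
\begin{lstlisting}
#include <algorithm>
#include <climits>
#include <vector>
using namespace std;

int hamming(int a, int b) {
    return __builtin_popcount(a^b);
}

// Hypothetical helper: minimum Hamming distance over all pairs.
// Assumes that v contains at least two values.
int minHamming(const vector<int>& v) {
    int best = INT_MAX;
    for (size_t i = 0; i < v.size(); i++) {
        for (size_t j = i+1; j < v.size(); j++) {
            best = min(best, hamming(v[i], v[j]));
        }
    }
    return best;
}
\end{lstlisting}
The search still examines $O(n^2)$ pairs,
but each comparison now takes $O(1)$ time instead of $O(k)$ time.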
@@ -431,11 +436,83 @@ To compare the implementations, we generated
a list of 10000 random bit strings of length 30.
Using the first approach, the search took
13.5 seconds, and after the bit optimization,
it only took 0.5 seconds.
Thus, the bit-optimized code was almost
30 times faster than the original code.
\subsubsection{Counting subgrids}
As another example, consider the
following problem:
Given an $n \times n$ grid in which
each square is either black or white,
calculate the number of subgrids
whose four corners are all black.
For example, the grid
\begin{center}
\begin{tikzpicture}[scale=0.5]
\fill[black] (1,1) rectangle (2,2);
\fill[black] (1,4) rectangle (2,5);
\fill[black] (4,1) rectangle (5,2);
\fill[black] (4,4) rectangle (5,5);
\fill[black] (1,3) rectangle (2,4);
\fill[black] (2,3) rectangle (3,4);
\fill[black] (2,1) rectangle (3,2);
\fill[black] (0,2) rectangle (1,3);
\draw (0,0) grid (5,5);
\end{tikzpicture}
\end{center}
contains two such subgrids:
\begin{center}
\begin{tikzpicture}[scale=0.5]
\fill[black] (1,1) rectangle (2,2);
\fill[black] (1,4) rectangle (2,5);
\fill[black] (4,1) rectangle (5,2);
\fill[black] (4,4) rectangle (5,5);
\fill[black] (1,3) rectangle (2,4);
\fill[black] (2,3) rectangle (3,4);
\fill[black] (2,1) rectangle (3,2);
\fill[black] (0,2) rectangle (1,3);
\draw (0,0) grid (5,5);
\fill[black] (7+1,1) rectangle (7+2,2);
\fill[black] (7+1,4) rectangle (7+2,5);
\fill[black] (7+4,1) rectangle (7+5,2);
\fill[black] (7+4,4) rectangle (7+5,5);
\fill[black] (7+1,3) rectangle (7+2,4);
\fill[black] (7+2,3) rectangle (7+3,4);
\fill[black] (7+2,1) rectangle (7+3,2);
\fill[black] (7+0,2) rectangle (7+1,3);
\draw (7+0,0) grid (7+5,5);
\draw[color=red,line width=1mm] (1,1) rectangle (3,4);
\draw[color=red,line width=1mm] (7+1,1) rectangle (7+5,5);
\end{tikzpicture}
\end{center}
There is an $O(n^3)$ time algorithm for solving the problem:
go through all $O(n^2)$ pairs of rows and for each pair
calculate the number of columns that contain a black
square in both rows in $O(n)$ time.
Then, if there are $c$ such columns for a fixed row pair,
they account for $c(c-1)/2$ subgrids with black corners.
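The algorithm can be sketched as follows.
This is a minimal illustration, not necessarily the implementation
used in the measurements; the function name \texttt{countSubgrids}
and the representation of the grid as a vector of rows
(1 for a black square, 0 for a white square) are assumptions:
\begin{lstlisting}
#include <vector>
using namespace std;

// Hypothetical helper: grid[y][x] is 1 if the square is black.
long long countSubgrids(const vector<vector<int>>& grid) {
    int n = grid.size();
    long long total = 0;
    for (int a = 0; a < n; a++) {
        for (int b = a+1; b < n; b++) {
            // c = number of columns that are black in both rows
            long long c = 0;
            for (int x = 0; x < n; x++) {
                if (grid[a][x] && grid[b][x]) c++;
            }
            // each pair of such columns forms one subgrid
            total += c*(c-1)/2;
        }
    }
    return total;
}
\end{lstlisting}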
To optimize this algorithm, we divide the grid into blocks
of columns such that each block consists of $N$
consecutive columns. Then, each row is stored as
a list of $N$-bit numbers that describe the colors
of the squares.
Using this representation,
we can process $N$ columns at the same time
using bit operations, and the resulting algorithm
works in $O(n^3/N)$ time.
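A minimal sketch of this idea for $N=64$ is given below.
It assumes that each row has already been packed into
64-bit blocks so that a one bit corresponds to a black square;
the name \texttt{countSubgridsBits} and the input format
are illustrative assumptions, and
\texttt{\_\_builtin\_popcountll} is a GCC/Clang builtin function:
\begin{lstlisting}
#include <cstdint>
#include <vector>
using namespace std;

// Hypothetical helper: rows[y][i] contains columns
// 64*i ... 64*i+63 of row y, one bit per square (1 = black).
// All rows are assumed to have the same number of blocks.
long long countSubgridsBits(
        const vector<vector<uint64_t>>& rows) {
    int n = rows.size();
    long long total = 0;
    for (int a = 0; a < n; a++) {
        for (int b = a+1; b < n; b++) {
            long long c = 0;
            for (size_t i = 0; i < rows[a].size(); i++) {
                // columns of this block that are black in both rows
                c += __builtin_popcountll(rows[a][i] & rows[b][i]);
            }
            total += c*(c-1)/2;
        }
    }
    return total;
}
\end{lstlisting}
Here the and operation keeps exactly the columns
that are black in both rows, so one iteration of
the innermost loop handles 64 columns.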
We generated a random grid of size $2500 \times 2500$
and compared the original and bit-optimized implementations.
While the original code took $29.6$ seconds,
the bit-optimized version only took $3.1$ seconds
with $N=32$ (\texttt{int} numbers) and $1.7$ seconds
with $N=64$ (\texttt{long long} numbers).
\section{Dynamic programming}