Counting subgrids
parent
04f3c313cc
commit
7e729e8333
109
chapter10.tex
|
@ -375,33 +375,38 @@ Such optimizations do not change the
|
||||||
time complexity of the algorithm,
|
time complexity of the algorithm,
|
||||||
but they may have a large impact
|
but they may have a large impact
|
||||||
on the actual running time of the code.
|
on the actual running time of the code.
|
||||||
In this section we discuss examples
|
In this section we discuss two examples
|
||||||
of such situations.
|
of such situations.
|
||||||
|
|
||||||
\subsubsection{Hamming distances}
|
\subsubsection{Hamming distances}
|
||||||
|
|
||||||
\index{Hamming distance}
|
\index{Hamming distance}
|
||||||
The \key{Hamming distance} between two bit strings
|
The \key{Hamming distance}
|
||||||
of equal length is
|
$\texttt{hamming}(a,b)$ between two
|
||||||
|
equal-length bit strings $a$ and $b$ is
|
||||||
the number of positions where the strings differ.
|
the number of positions where the strings differ.
|
||||||
For example, the Hamming distance between
|
For example, $\texttt{hamming}(01101,11001)=2$.
|
||||||
01101 and 11001 is 2.
|
|
||||||
|
|
||||||
Consider the following problem: We are given
|
Consider the following problem: Given
|
||||||
a list of $n$ bit strings, each of length $k$,
|
a list of $n$ bit strings, each of length $k$,
|
||||||
and our task is to calculate the minimum Hamming distance
|
calculate the minimum Hamming distance
|
||||||
between two strings in the list.
|
between two strings in the list.
|
||||||
For example, the minimum distance for the list
|
For example, the answer for $[00111,01101,11110]$
|
||||||
$[00111,01101,11101]$ is 2.
|
is 2, because
|
||||||
|
\begin{itemize}[noitemsep]
|
||||||
|
\item $\texttt{hamming}(00111,01101)=2$,
|
||||||
|
\item $\texttt{hamming}(00111,11110)=3$, and
|
||||||
|
\item $\texttt{hamming}(01101,11110)=3$.
|
||||||
|
\end{itemize}
|
||||||
|
|
||||||
A straightforward way to solve the problem is
|
A straightforward way to solve the problem is
|
||||||
to go through all pairs of strings and calculate
|
to go through all pairs of strings and calculate
|
||||||
their Hamming distances.
|
their Hamming distances,
|
||||||
Such an algorithm works in $O(n^2 k)$ time.
|
which yields an $O(n^2 k)$ time algorithm.
|
||||||
The following function can be used to calculate
|
The following function calculates
|
||||||
the Hamming distance between two strings:
|
the Hamming distance between two strings:
|
||||||
\begin{lstlisting}
|
\begin{lstlisting}
|
||||||
int distance(string a, string b) {
|
int hamming(string a, string b) {
|
||||||
int d = 0;
|
int d = 0;
|
||||||
for (int i = 0; i < k; i++) {
|
for (int i = 0; i < k; i++) {
|
||||||
if (a[i] != b[i]) d++;
|
if (a[i] != b[i]) d++;
|
||||||
|
@ -417,7 +422,7 @@ In particular, if $k \le 32$, we can just store
|
||||||
the strings as \texttt{int} values and use the
|
the strings as \texttt{int} values and use the
|
||||||
following function to calculate distances:
|
following function to calculate distances:
|
||||||
\begin{lstlisting}
|
\begin{lstlisting}
|
||||||
int distance(int a, int b) {
|
int hamming(int a, int b) {
|
||||||
return __builtin_popcount(a^b);
|
return __builtin_popcount(a^b);
|
||||||
}
|
}
|
||||||
\end{lstlisting}
|
\end{lstlisting}
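To show how the bit-based distance function fits into the full search, here is a minimal sketch that assumes $k \le 32$ and a GCC/Clang compiler (for \texttt{\_\_builtin\_popcount}). The helper names \texttt{pack} and \texttt{min\_hamming} are hypothetical and do not come from the original text, and the sketch uses \texttt{unsigned} instead of \texttt{int} so that the shifts stay well-defined.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <string>
#include <vector>

// Hypothetical helper: pack a bit string of length at most 32
// into an unsigned integer, first character as the highest bit.
unsigned pack(const std::string& s) {
    assert(s.size() <= 32);
    unsigned x = 0;
    for (char c : s) x = (x << 1) | (c == '1');
    return x;
}

// Hamming distance: XOR marks the differing positions,
// and popcount counts them.
int hamming(unsigned a, unsigned b) {
    return __builtin_popcount(a ^ b);
}

// Hypothetical driver: minimum Hamming distance over all pairs.
// There are still O(n^2) pairs, but each comparison is a few
// machine instructions instead of a loop over k characters.
int min_hamming(const std::vector<std::string>& list) {
    assert(list.size() >= 2);
    std::vector<unsigned> v;
    for (const auto& s : list) v.push_back(pack(s));
    int best = INT_MAX;
    for (std::size_t i = 0; i < v.size(); i++)
        for (std::size_t j = i + 1; j < v.size(); j++)
            best = std::min(best, hamming(v[i], v[j]));
    return best;
}
```

Calling \texttt{min\_hamming} on the list $[00111,01101,11110]$ returns 2, which matches the example above.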
|
||||||
|
@ -431,11 +436,83 @@ To compare the implementations, we generated
|
||||||
a list of 10000 random bit strings of length 30.
|
a list of 10000 random bit strings of length 30.
|
||||||
Using the first approach, the search took
|
Using the first approach, the search took
|
||||||
13.5 seconds, and after the bit optimization,
|
13.5 seconds, and after the bit optimization,
|
||||||
it took only 0.5 seconds.
|
it only took 0.5 seconds.
|
||||||
Thus, the bit-optimized code was almost
|
Thus, the bit-optimized code was almost
|
||||||
30 times faster than the original code.
|
30 times faster than the original code.
|
||||||
|
|
||||||
\subsubsection{}
|
\subsubsection{Counting subgrids}
|
||||||
|
|
||||||
|
As another example, consider the
|
||||||
|
following problem:
|
||||||
|
Given an $n \times n$ grid in which
|
||||||
|
each square is either black or white,
|
||||||
|
calculate the number of subgrids
|
||||||
|
whose corners are all black.
|
||||||
|
For example, the grid
|
||||||
|
\begin{center}
|
||||||
|
\begin{tikzpicture}[scale=0.5]
|
||||||
|
\fill[black] (1,1) rectangle (2,2);
|
||||||
|
\fill[black] (1,4) rectangle (2,5);
|
||||||
|
\fill[black] (4,1) rectangle (5,2);
|
||||||
|
\fill[black] (4,4) rectangle (5,5);
|
||||||
|
\fill[black] (1,3) rectangle (2,4);
|
||||||
|
\fill[black] (2,3) rectangle (3,4);
|
||||||
|
\fill[black] (2,1) rectangle (3,2);
|
||||||
|
\fill[black] (0,2) rectangle (1,3);
|
||||||
|
\draw (0,0) grid (5,5);
|
||||||
|
\end{tikzpicture}
|
||||||
|
\end{center}
|
||||||
|
contains two such subgrids:
|
||||||
|
\begin{center}
|
||||||
|
\begin{tikzpicture}[scale=0.5]
|
||||||
|
\fill[black] (1,1) rectangle (2,2);
|
||||||
|
\fill[black] (1,4) rectangle (2,5);
|
||||||
|
\fill[black] (4,1) rectangle (5,2);
|
||||||
|
\fill[black] (4,4) rectangle (5,5);
|
||||||
|
\fill[black] (1,3) rectangle (2,4);
|
||||||
|
\fill[black] (2,3) rectangle (3,4);
|
||||||
|
\fill[black] (2,1) rectangle (3,2);
|
||||||
|
\fill[black] (0,2) rectangle (1,3);
|
||||||
|
\draw (0,0) grid (5,5);
|
||||||
|
|
||||||
|
\fill[black] (7+1,1) rectangle (7+2,2);
|
||||||
|
\fill[black] (7+1,4) rectangle (7+2,5);
|
||||||
|
\fill[black] (7+4,1) rectangle (7+5,2);
|
||||||
|
\fill[black] (7+4,4) rectangle (7+5,5);
|
||||||
|
\fill[black] (7+1,3) rectangle (7+2,4);
|
||||||
|
\fill[black] (7+2,3) rectangle (7+3,4);
|
||||||
|
\fill[black] (7+2,1) rectangle (7+3,2);
|
||||||
|
\fill[black] (7+0,2) rectangle (7+1,3);
|
||||||
|
\draw (7+0,0) grid (7+5,5);
|
||||||
|
|
||||||
|
\draw[color=red,line width=1mm] (1,1) rectangle (3,4);
|
||||||
|
\draw[color=red,line width=1mm] (7+1,1) rectangle (7+5,5);
|
||||||
|
\end{tikzpicture}
|
||||||
|
\end{center}
|
||||||
|
|
||||||
|
There is an $O(n^3)$ time algorithm for solving the problem:
|
||||||
|
go through all $O(n^2)$ pairs of rows and for each pair
|
||||||
|
calculate the number of columns that contain a black
|
||||||
|
square in both rows in $O(n)$ time.
|
||||||
|
Then, if there are $c$ such columns for a fixed row pair,
|
||||||
|
they account for $c(c-1)/2$ subgrids with black corners.
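The $O(n^3)$ algorithm described here can be sketched as follows. The grid representation is an assumption made for illustration: one string per row, with \texttt{'1'} for a black square and \texttt{'0'} for a white one. The function name \texttt{count\_subgrids} is likewise hypothetical.

```cpp
#include <string>
#include <vector>

// Count subgrids whose four corners are all black.
// Assumes grid is n x n and grid[y][x] == '1' means black.
long long count_subgrids(const std::vector<std::string>& grid) {
    int n = grid.size();
    long long total = 0;
    for (int a = 0; a < n; a++) {
        for (int b = a + 1; b < n; b++) {
            // c = number of columns that are black in both rows
            long long c = 0;
            for (int x = 0; x < n; x++) {
                if (grid[a][x] == '1' && grid[b][x] == '1') c++;
            }
            // any two such columns form one subgrid
            total += c * (c - 1) / 2;
        }
    }
    return total;
}
```

For the $5 \times 5$ example grid, written row by row from top to bottom as 01001, 01100, 10000, 01101, 00000, the function returns 2.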
|
||||||
|
|
||||||
|
To optimize this algorithm, we divide the grid into blocks
|
||||||
|
of columns such that each block consists of $N$
|
||||||
|
consecutive columns. Then, each row is stored as
|
||||||
|
a list of $N$-bit numbers that describe the colors
|
||||||
|
of the squares.
|
||||||
|
Using this representation,
|
||||||
|
we can process $N$ columns at the same time
|
||||||
|
using bit operations, and the resulting algorithm
|
||||||
|
works in $O(n^3/N)$ time.
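One possible way to implement the block idea with $N=64$ is sketched below. It assumes the same string-per-row input as a plain character grid (\texttt{'1'} for black), a GCC/Clang compiler for \texttt{\_\_builtin\_popcountll}, and a hypothetical function name \texttt{count\_subgrids\_bits}; the original implementation may differ in its details.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Bit-parallel count of subgrids with four black corners.
// Assumes grid is n x n and grid[y][x] == '1' means black.
long long count_subgrids_bits(const std::vector<std::string>& grid) {
    int n = grid.size();
    int words = (n + 63) / 64;  // number of 64-column blocks per row
    // rows[y][w] holds columns 64*w .. 64*w+63 of row y as bits
    std::vector<std::vector<std::uint64_t>> rows(
        n, std::vector<std::uint64_t>(words, 0));
    for (int y = 0; y < n; y++) {
        for (int x = 0; x < n; x++) {
            if (grid[y][x] == '1') {
                rows[y][x / 64] |= std::uint64_t(1) << (x % 64);
            }
        }
    }
    long long total = 0;
    for (int a = 0; a < n; a++) {
        for (int b = a + 1; b < n; b++) {
            // AND keeps the columns that are black in both rows;
            // popcount counts 64 columns at a time
            long long c = 0;
            for (int w = 0; w < words; w++) {
                c += __builtin_popcountll(rows[a][w] & rows[b][w]);
            }
            total += c * (c - 1) / 2;
        }
    }
    return total;
}
```

The pair loop is unchanged, so the number of row pairs is still $O(n^2)$; only the inner loop shrinks from $n$ steps to $\lceil n/64 \rceil$ steps.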
|
||||||
|
|
||||||
|
We generated a random grid of size $2500 \times 2500$
|
||||||
|
and compared the original and bit-optimized implementations.
|
||||||
|
While the original code took $29.6$ seconds,
|
||||||
|
the bit-optimized version only took $3.1$ seconds
|
||||||
|
with $N=32$ (\texttt{int} numbers) and $1.7$ seconds
|
||||||
|
with $N=64$ (\texttt{long long} numbers).
|
||||||
|
|
||||||
\section{Dynamic programming}
|
\section{Dynamic programming}
|
||||||
|
|
||||||
|
|