# Shortest Paths in Grid Graphs

## 1. Shortest Path Problem

The shortest path problem is one of the most important and fundamental problems in graph theory. It has many real-world applications, such as finding the best driving directions and finding the minimum delay route in networking.

There are mainly two variants of the problem: single-source shortest paths (SSSP) and all-pair shortest paths (APSP). The most famous algorithms for SSSP are Bellman-Ford (applicable to arbitrary weights, $O(VE)$) and Dijkstra’s (applicable to nonnegative weights,$O(V \log V+E)$); the best-known algorithms for APSP are Floyd’s in $O(V^3)$ time and Johnson’s in $O(V^2 \log V + VE)$ time, which are both applicable to arbitrary weights; for nonnegative weights, running $V$ rounds of Dijkstra’s algorithm is fine.

## 2. Grid Graph

Formally speaking, a grid graph $P_w \times P_h$ is the Cartesian product of two paths. A grid graph can be drawn in a plane as a lattice. The following figure shows a grid graph $P_5 \times P_4$. Note that the weights of the edges are omitted.

We consider the shortest path problem in grid graphs, where edges are undirected and nonnegatively weighted. Specifically, we can preprocess the graph, and then answer several queries online. Each query contains two vertices $u, v$, which asks the shortest path between $u$ and $v$. Let $n = w \times h$ denote the size of the graph $P_w \times P_h$.

One naive solution is to do nothing in preprocessing, and for each query, just run an SSSP (e.g., Dijkstra’s). The time complexity is $O(1)/O(n \log n)$.

Another solution is to compute APSP during preprocessing. If we simply run $n$ rounds of SSSP algorithm, the time complexity is $O(n^2 \log n) / O(1)$.

Here, we present a nice algorithm that achieves $O(\sqrt{n})$ time per query after $O(n^{1.5} \log n)$ preprocessing. Compared with the above naive algorithms, the $O(n^{1.5} \log n) / O(\sqrt{n})$ algorithm features a good space-time tradeoff. The basic idea is divide and conquer.

## 3. The Algorithm

Assume w.l.o.g. that $w \geq h$. Also we use ordered pair $(x, y)$ $(1 \leq x \leq w, 1 \leq y \leq h)$ to represent a vertex in $P_w \times P_h$.

We define the midline of the graph as the vertices in the $\frac{w}{2}$th column. This is how we divide subproblems. Now we solve the queries where two vertices lie on different sides of the midline. Given to vertices $u, v$ where $u$ is in the left of the midline and $v$ is in the right of the midline, then every path between $u$ and $v$ must cross the midline, though it may cross the line left and right multiple times (as the following figure shows).

Now comes the key point. For the sake of convenience we denote the vertices in the midline as $M_i$, i.e. $M_i = (\frac{w}{2}, i)$. For every vertex in midline $M_i$, run an SSSP in preprocessing stage, and stores its distance to every other vertex. How does this information help? Consider a vertex $u$ left to the midline and a vertex $v$ right to the midline, since the shortest path between them must cross the midline, we can try all vertices in the midline, and simply pick the shortest path:
$$d(u, v) = \min_{1 \leq i \leq h} d(u, M_i) + d(M_i, v).$$
Since $w \geq h$, there are at most $\sqrt{n}$ vertices in the midline, so the preprocessing time is $O(\sqrt{n} \cdot n \log n)$, and the time per query is $O(\sqrt{n})$.

How should we answer the queries where two vertices lie on the same side of the midline? Note that we can’t simply solve the problem on the subgraph left to the midline, since the shortest path might possibly, though not necessarily, cross the midline.

But this is never a problem. We can update $d(u, v)$ with $\min_{1 \leq i \leq h} d(u, M_i) + d(M_i, v)$ for each query $(u, v)$ not cross the midline, then recursively solve the problem on the left and right subgraphs.

The total preprocessing time follows this recursion:
$$T(n) = 2T(n / 2) + O(n^{1.5} \log n).$$
By the master theorem, $T(n) = O(n^{1.5} \log n)$.

## 4. Remarks

From the view of implementation, the $O(\sqrt{n})$ query time can be done very efficiently, since it just computes the minimum component of the sum of two vectors. This is SIMD- and cache-friendly. Also, since in preprocessing stage we need to run $h$ independent rounds of SSSP algorithm, the preprocessing part can be easily parallelized.

# Why std::pair and std::tuple differ in size?

It is known that std::tuple (since c++11) is a generalization of std::pair. However, they exhibit some slight differences in some cases.

Consider the following code:

struct dummy {};

int main() {
std::cout << sizeof(int) << std::endl;
std::cout << sizeof(std::pair<int, dummy>) << std::endl;
std::cout << sizeof(std::tuple<int, dummy>) << std::endl;
}

When compiling with g++ and libstdc++, the output might be 4, 8, 4. Why std::tuple<int, dummy> takes less space than std::pair<int, dummy>?

This is a nontrivial problem. Unlike C, empty structs (and classes) are supported in C++; however, they must take nonzero space (typically 1 byte) so that every instance of the empty struct has unique address.

The typical implementation of pair<T1, T2> is simply a struct with two members T1 first and T2 second. For std::pair<int, dummy>, it contains two members, of type int and dummy respectively. The total size is 5, however, due to padding, the pair actually takes 8 bytes.

Since the number of members of a tuple is variable, its implementation is more complex. In libstdc++, the tuple is implemented as double inheritance: _Tuple_impl<id, X, Y, Z> inherits _Head_base<id, X> and _Tuple_impl<id + 1, Y, Z>, and _Head_base<id, X> again inherits X if X is an empty non-final struct. In such case, _Head_base<id, X> is also empty and thus the compiler is allowed to perform Empty Base Optimization (EBO), that is, the empty base need not take up any space. That’s why std::pair<int, dummy> takes only 4 bytes.