## 1. Shortest Path Problem

The **shortest path problem** is one of the most important and fundamental problems in graph theory. It has many real-world applications, such as finding the best driving directions and finding the minimum delay route in networking.

There are mainly two variants of the problem: **single-source shortest paths** (SSSP) and **all-pair shortest paths** (APSP). The most famous algorithms for SSSP are Bellman-Ford (applicable to arbitrary weights, $O(VE)$) and Dijkstra’s (applicable to nonnegative weights,$O(V \log V+E)$); the best-known algorithms for APSP are Floyd’s in $O(V^3)$ time and Johnson’s in $O(V^2 \log V + VE)$ time, which are both applicable to arbitrary weights; for nonnegative weights, running $V$ rounds of Dijkstra’s algorithm is fine.

## 2. Grid Graph

Formally speaking, a grid graph $P_w \times P_h$ is the Cartesian product of two paths. A grid graph can be drawn in a plane as a lattice. The following figure shows a grid graph $P_5 \times P_4$. Note that the weights of the edges are omitted.

We consider the shortest path problem in grid graphs, where edges are undirected and nonnegatively weighted. Specifically, we can preprocess the graph, and then answer several queries online. Each query contains two vertices $u, v$, which asks the shortest path between $u$ and $v$. Let $n = w \times h$ denote the size of the graph $P_w \times P_h$.

One naive solution is to do nothing in preprocessing, and for each query, just run an SSSP (e.g., Dijkstra’s). The time complexity is $O(1)/O(n \log n)$.

Another solution is to compute APSP during preprocessing. If we simply run $n$ rounds of SSSP algorithm, the time complexity is $O(n^2 \log n) / O(1)$.

Here, we present a nice algorithm that achieves $O(\sqrt{n})$ time per query after $O(n^{1.5} \log n)$ preprocessing. Compared with the above naive algorithms, the $O(n^{1.5} \log n) / O(\sqrt{n})$ algorithm features a good space-time tradeoff. The basic idea is divide and conquer.

## 3. The Algorithm

Assume w.l.o.g. that $w \geq h$. Also we use ordered pair $(x, y)$ $(1 \leq x \leq w, 1 \leq y \leq h)$ to represent a vertex in $P_w \times P_h$.

We define the *midline* of the graph as the vertices in the $\frac{w}{2}$th column. This is how we *divide* subproblems. Now we solve the queries where two vertices lie on different sides of the midline. Given to vertices $u, v$ where $u$ is in the left of the midline and $v$ is in the right of the midline, then every path between $u$ and $v$ must cross the midline, though it may cross the line left and right multiple times (as the following figure shows).

Now comes the key point. For the sake of convenience we denote the vertices in the midline as $M_i$, i.e. $M_i = (\frac{w}{2}, i)$. For every vertex in midline $M_i$, run an SSSP in preprocessing stage, and stores its distance to every other vertex. How does this information help? Consider a vertex $u$ left to the midline and a vertex $v$ right to the midline, since the shortest path between them must cross the midline, we can try all vertices in the midline, and simply pick the shortest path:

$$ d(u, v) = \min_{1 \leq i \leq h} d(u, M_i) + d(M_i, v). $$

Since $w \geq h$, there are at most $\sqrt{n}$ vertices in the midline, so the preprocessing time is $O(\sqrt{n} \cdot n \log n)$, and the time per query is $O(\sqrt{n})$.

How should we answer the queries where two vertices lie on the same side of the midline? Note that we can’t simply solve the problem on the subgraph left to the midline, since the shortest path might possibly, though not necessarily, cross the midline.

But this is never a problem. We can update $d(u, v)$ with $\min_{1 \leq i \leq h} d(u, M_i) + d(M_i, v)$ for each query $(u, v)$ not cross the midline, then recursively solve the problem on the left and right subgraphs.

The total preprocessing time follows this recursion:

$$ T(n) = 2T(n / 2) + O(n^{1.5} \log n). $$

By the master theorem, $T(n) = O(n^{1.5} \log n)$.

## 4. Remarks

From the view of implementation, the $O(\sqrt{n})$ query time can be done very efficiently, since it just computes the minimum component of the sum of two vectors. This is SIMD- and cache-friendly. Also, since in preprocessing stage we need to run $h$ independent rounds of SSSP algorithm, the preprocessing part can be easily parallelized.