# Shortest Paths in Grid Graphs

## 1. Shortest Path Problem

The shortest path problem is one of the most important and fundamental problems in graph theory. It has many real-world applications, such as finding the best driving directions and finding the minimum delay route in networking.

There are mainly two variants of the problem: single-source shortest paths (SSSP) and all-pair shortest paths (APSP). The most famous algorithms for SSSP are Bellman-Ford (applicable to arbitrary weights, $O(VE)$) and Dijkstra’s (applicable to nonnegative weights,$O(V \log V+E)$); the best-known algorithms for APSP are Floyd’s in $O(V^3)$ time and Johnson’s in $O(V^2 \log V + VE)$ time, which are both applicable to arbitrary weights; for nonnegative weights, running $V$ rounds of Dijkstra’s algorithm is fine.

## 2. Grid Graph

Formally speaking, a grid graph $P_w \times P_h$ is the Cartesian product of two paths. A grid graph can be drawn in a plane as a lattice. The following figure shows a grid graph $P_5 \times P_4$. Note that the weights of the edges are omitted.

We consider the shortest path problem in grid graphs, where edges are undirected and nonnegatively weighted. Specifically, we can preprocess the graph, and then answer several queries online. Each query contains two vertices $u, v$, which asks the shortest path between $u$ and $v$. Let $n = w \times h$ denote the size of the graph $P_w \times P_h$.

One naive solution is to do nothing in preprocessing, and for each query, just run an SSSP (e.g., Dijkstra’s). The time complexity is $O(1)/O(n \log n)$.

Another solution is to compute APSP during preprocessing. If we simply run $n$ rounds of SSSP algorithm, the time complexity is $O(n^2 \log n) / O(1)$.

Here, we present a nice algorithm that achieves $O(\sqrt{n})$ time per query after $O(n^{1.5} \log n)$ preprocessing. Compared with the above naive algorithms, the $O(n^{1.5} \log n) / O(\sqrt{n})$ algorithm features a good space-time tradeoff. The basic idea is divide and conquer.

## 3. The Algorithm

Assume w.l.o.g. that $w \geq h$. Also we use ordered pair $(x, y)$ $(1 \leq x \leq w, 1 \leq y \leq h)$ to represent a vertex in $P_w \times P_h$.

We define the midline of the graph as the vertices in the $\frac{w}{2}$th column. This is how we divide subproblems. Now we solve the queries where two vertices lie on different sides of the midline. Given to vertices $u, v$ where $u$ is in the left of the midline and $v$ is in the right of the midline, then every path between $u$ and $v$ must cross the midline, though it may cross the line left and right multiple times (as the following figure shows).

Now comes the key point. For the sake of convenience we denote the vertices in the midline as $M_i$, i.e. $M_i = (\frac{w}{2}, i)$. For every vertex in midline $M_i$, run an SSSP in preprocessing stage, and stores its distance to every other vertex. How does this information help? Consider a vertex $u$ left to the midline and a vertex $v$ right to the midline, since the shortest path between them must cross the midline, we can try all vertices in the midline, and simply pick the shortest path:
$$d(u, v) = \min_{1 \leq i \leq h} d(u, M_i) + d(M_i, v).$$
Since $w \geq h$, there are at most $\sqrt{n}$ vertices in the midline, so the preprocessing time is $O(\sqrt{n} \cdot n \log n)$, and the time per query is $O(\sqrt{n})$.

How should we answer the queries where two vertices lie on the same side of the midline? Note that we can’t simply solve the problem on the subgraph left to the midline, since the shortest path might possibly, though not necessarily, cross the midline.

But this is never a problem. Just add the midline along with the edges representing APSP between the midline vertices, and solve the problem on this subgraph. It can be verified that the number of edges is still linear to the number of vertices even if we add these APSP edges, and recursively apply the above algorithm on this subgraph.

The total preprocessing time follows this recursion:
$$T(n) = 2T(n / 2) + O(n^{1.5} \log n).$$
By the master theorem, $T(n) = O(n^{1.5} \log n)$.

## 4. Remarks

1. From the view of implementation, the $O(\sqrt{n})$ query time can be done very efficiently, since it just computes the minimum component of the sum of two vectors. This is SIMD- and cache-friendly. Also, since in preprocessing stage we need to run $h$ independent rounds of SSSP algorithm, the preprocessing part can be easily parallelized.
2. I believe the $O(n^{1.5} \log n)$ preprocessing time can be improved, by reusing the shortest distance information across the subproblems, but I have no idea about how to do that.

# Why std::pair and std::tuple differ in size?

It is known that std::tuple (since c++11) is a generalization of std::pair. However, they exhibit some slight differences in some cases.

Consider the following code:

struct dummy {};

int main() {
std::cout << sizeof(int) << std::endl;
std::cout << sizeof(std::pair<int, dummy>) << std::endl;
std::cout << sizeof(std::tuple<int, dummy>) << std::endl;
}

When compiling with g++ and libstdc++, the output might be 4, 8, 4. Why std::tuple<int, dummy> takes less space than std::pair<int, dummy>?

This is a nontrivial problem. Unlike C, empty structs (and classes) are supported in C++; however, they must take nonzero space (typically 1 byte) so that every instance of the empty struct has unique address.

The typical implementation of pair<T1, T2> is simply a struct with two members T1 first and T2 second. For std::pair<int, dummy>, it contains two members, of type int and dummy respectively. The total size is 5, however, due to padding, the pair actually takes 8 bytes.

Since the number of members of a tuple is variable, its implementation is more complex. In libstdc++, the tuple is implemented as double inheritance: _Tuple_impl<id, X, Y, Z> inherits _Head_base<id, X> and _Tuple_impl<id + 1, Y, Z>, and _Head_base<id, X> again inherits X if X is an empty non-final struct. In such case, _Head_base<id, X> is also empty and thus the compiler is allowed to perform Empty Base Optimization (EBO), that is, the empty base need not take up any space. That’s why std::pair<int, dummy> takes only 4 bytes.

# Range Minimum Query and Lowest Common Ancestor in O(n)/O(1) Time: An Overview

We show that the Range Minimum Query (RMQ) problem and Lowest Common Ancestor (LCA) problem can both be solved in O(1) per query after O(n) preprocessing time. The method is to reduce the two problems to +1/-1 RMQ, and solve the +1/-1 RMQ problem in O(n)/O(1) time.

## 1. The Three Problems

Problem 1 (Range Minimum Query, RMQ): Given an integer array of size $n$: $a[1], …, a[n]$. A range minimum query $\text{RMQ}_a(l, r) = (\arg) \min_{l \leq i \leq r} a[i]$, returns the value or the position of the minimum element in subarray $a[l…r]$.

Problem 2 (Lowest Common Ancestor, LCA): Given a rooted tree $T$. A lowest common ancestor query $\text{LCA}_T(u, v)$ returns a lowest node that has both $u, v$ as descendants.

Problem 3 (+1/-1 RMQ): +1/-1 RMQ is a special case of RMQ, where the difference of two adjacent elements in the given array is either +1 or -1.

## 2. Equivalence Between the Three Problems

In this section, we show that the three problems are mutually reducible in linear time. Note that the reduction from +1/-1 RMQ to RMQ is immediate.

### 2.1 Reduction from RMQ to LCA

The reduction from RMQ to LCA is to convert the array into Cartesian tree. The Cartesian tree of an array is a binary tree with heap property, and the in-order traversal gives the original array. Note that the minimum element in $a[l…r]$ corresponds to the LCA of $l$ and $r$ in Cartesian tree, and vice versa.

We may convert the array into Cartesian tree in linear time. Just add the element one by one, and maintain a pointer to the rightmost element of the current tree. When a new element comes, moves the pointer up until the number of current node is less than the new element, or until reaching the root of the tree. Then insert a node here, and place the original child subtree as the left subtree of the new node. Define the potential as the depth of the pointee, so the amortized time is $O(1)$ per insertion, and the total time complexity is $O(n)$.

### 2.2 Reduction from LCA to +1/-1 RMQ

To reduce the LCA problem into +1/-1 RMQ problem, we first label each vertex its depth. Then we run depth first search, and output the depth of the current node before visiting any child node and after visiting every child node. The resulting sequence is the Euler tour traversal of the original tree. To compute the LCA of $u$ and $v$, just find any occurrence position of $u$ and $v$, and find the range minimum between them. Since the depth of two adjacent nodes in Euler tour differ by at most 1, this is a +1/-1 RMQ problem.

Note that the number of occurrences of each node is the number of its children, plus 1. The length of the resulting sequence is $2n-1$, which is still linear.

## 3. Solving RMQ in O(n log n)/O(1) Time via Sparse Table

To achieve O(1) query time, a naive solution is to precalculate all $O(n^2)$ queries, which takes too much preprocessing time. In fact, we do not need to preprocess so many answers. The sparse table technique only preprocesses the RMQ of intervals of length power of 2. To answer RMQ query $\text{RMQ}(l, r)$, just return $\min\{\text{RMQ}(l, l+2^k-1), \text{RMQ}(r-2^k+1, r)\}$, where $k$ is the maximum integer such that $2^k \leq r – l + 1$. This actually uses two intervals to cover the entire range; due to the idempotence of minimum operation, the overlapped part does not affect the answer. There are at most $O(\log n)$ powers of 2 not exceeding $n$, and for each power of 2 we have $O(n)$ intervals to preprocess, so the total preprocess time is $O(n \log n)$.

## 4. Solving +1/-1 RMQ in O(n)/O(1) Time via Indirection

Our final step is to shave off the log factor. We use the indirection technique to remove the log factor, but only for +1/-1 RMQ.

We split the original array $a$ into segments of length $\frac{1}{2} \log n$. For each segment, we replace it with the minimum value, and we obtain a new sequence of length $O(\frac{n}{\log n})$. Now every interval that spans multiple segments can generally be decomposed into several contiguous segments and two intervals entirely within some segment. Using sparse table technique we may answer the minimum value in some contiguous segments in $O(1)$ time, with $O(n)$ preprocessing time (the log factor cancels). The remaining part is to answer the RMQ for intervals within segments.

Note that the minimum operation distributes over addition: $\min\{a, b\} + c = \min\{a + c, b + c\}$. Hence, for RMQ problem, the first element of the array doesn’t matter; only the difference matters. How many essentially difference +1/-1 RMQ instances of length $\frac{1}{2}\log n$ are there? Only $O(\sqrt{n})$, since there are only so many different difference arrays. And, for each array of length $\frac{1}{2} \log n$, there are only $O(\log^2 n)$ different intervals. We may set up a lookup table for all intervals of all possible instances of size $\frac{1}{2} \log n$ in $O(\sqrt{n} \log^2 n)$ time. This is dominated by the sparse table part, so the total preprocessing time is still $O(n)$.

For queries that entirely lies in some segment, just read the corresponding value in the lookup table; otherwise, the interval is decomposed into several contiguous segments, which can be answered by sparse table in constant time, and two intervals within some segments, which can be answered by lookup table. The total time per query is therefore $O(1)$.

# Counting independent sets in O(1.619^n) time

Given a graph $G=(V, E)$, an independent set $I$ of $G$ is a subset of $V$, such that every two vertices in $I$ are not adjacent. We give an algorithm to count the number of independent sets of a given graph on $n$ vertices. The main idea of the algorithm is divide and conquer.

Counting independent sets of $G$: CIS(G)

1. If $G$ contains multiple connected components, count the independent sets of each component and multiply the numbers.
2. If $G$ contains only one vertex, return 1.
3. Otherwise, arbitrarily select a vertex $v$. Remove $v$ and count the number of independent sets of the remaining graph. Remove $v$ and its neighbors and count the number of independent sets of the remaining graph. Return the sum of two.

To see the O(1.619^n) running time, note that in step 3, removing $v$ decreases the number of vertices by one, while removing $v$ and its neighbors decreases the number of vertices by at least two. The recurrent is identical to Fibonacci sequence.