We present a polynomial-time algorithm that solves this problem. The algorithm adds the edges one by one and maintains a set of forests incrementally. Note that adding an edge increases the minimum number of forests by at most 1. The framework of the algorithm is as follows.

- Initialize the set of forests $\mathcal{F}$ to $\varnothing$;
- For each edge $e \in E(G)$, if $e$ can be added without increasing the minimum number of forests, add it; otherwise, let $\mathcal{F} \leftarrow \mathcal{F} \cup \{\{e\}\}$, i.e., create a new forest containing only $e$.

The hard part is determining whether an edge can be added without increasing the number of forests. Note that simply trying to add the edge to every forest is not enough, because it may be possible to move some edges between the forests to make room for the new edge.

The algorithm builds an auxiliary directed graph and finds an augmenting path. The vertex set of the auxiliary graph is $\mathcal{F} \cup E' \cup \{e\}$, where $E'$ is the set of edges already added. There is an arc $e_1 (\in E' \cup \{e\}) \rightarrow f (\in \mathcal{F})$ if $f \cup \{e_1\}$ is acyclic, and an arc $e_1 (\in E' \cup \{e\}) \rightarrow e_2 (\in E')$ if replacing $e_2$ with $e_1$ in the forest containing $e_2$ yields a valid forest. If there exists a directed path from $e$ to some $f \in \mathcal{F}$, then performing all exchanges along the path simultaneously yields a set of forests of the same size that accommodates $e$; otherwise, we must increment the number of forests.
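As a concrete building block, the acyclicity test behind the arcs into $\mathcal{F}$ can be done with one union-find structure per forest: an edge $(u, v)$ fits a forest iff its endpoints lie in different components of that forest. A minimal sketch (the `DSU` type and its interface are illustrative, not part of the original description):

```cpp
#include <numeric>
#include <vector>

// Minimal union-find; one instance would be kept per forest in F.
struct DSU {
    std::vector<int> p;
    explicit DSU(int n) : p(n) { std::iota(p.begin(), p.end(), 0); }
    int find(int x) { return p[x] == x ? x : p[x] = find(p[x]); }
    // Returns false if u and v are already connected, i.e. adding the
    // edge (u, v) to this forest would close a cycle.
    bool unite(int u, int v) {
        u = find(u); v = find(v);
        if (u == v) return false;
        p[u] = v;
        return true;
    }
};
```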

This problem is a special case of the **matroid partition problem**, which asks for the minimum number of independent sets into which the ground set of a matroid can be partitioned. The general problem is also solvable in polynomial time; an algorithm similar to the one presented above works.

- Arboricity. Wikipedia, https://en.wikipedia.org/wiki/Arboricity.
- Edmonds, Jack (1965), “Minimum partition of a matroid into independent subsets”, Journal of Research of the National Bureau of Standards, 69B: 67–72.

The **shortest path problem** is one of the most important and fundamental problems in graph theory. It has many real-world applications, such as finding the best driving directions and finding the minimum delay route in networking.

There are two main variants of the problem: **single-source shortest paths** (SSSP) and **all-pairs shortest paths** (APSP). The most famous algorithms for SSSP are Bellman–Ford (applicable to arbitrary weights, $O(VE)$) and Dijkstra's (applicable to nonnegative weights, $O(V \log V + E)$ with Fibonacci heaps); the best-known algorithms for APSP are Floyd–Warshall's in $O(V^3)$ time and Johnson's in $O(V^2 \log V + VE)$ time, both applicable to arbitrary weights; for nonnegative weights, running $V$ rounds of Dijkstra's algorithm suffices.
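For concreteness, a textbook Dijkstra over an adjacency list might look like the sketch below (this is the binary-heap $O((V + E) \log V)$ variant; the $O(V \log V + E)$ bound requires a Fibonacci heap; all names are illustrative):

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

const long long INF = INT64_MAX / 4;

// Returns dist[] from source s; adj[u] holds (v, w) pairs with w >= 0.
std::vector<long long> dijkstra(
        const std::vector<std::vector<std::pair<int, long long>>>& adj, int s) {
    std::vector<long long> dist(adj.size(), INF);
    std::priority_queue<std::pair<long long, int>,
                        std::vector<std::pair<long long, int>>,
                        std::greater<>> pq;   // min-heap of (distance, vertex)
    dist[s] = 0;
    pq.push({0, s});
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d != dist[u]) continue;           // stale heap entry
        for (auto [v, w] : adj[u])
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                pq.push({dist[v], v});
            }
    }
    return dist;
}
```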

Formally speaking, a grid graph $P_w \times P_h$ is the Cartesian product of two paths. A grid graph can be drawn in the plane as a lattice. The following figure shows the grid graph $P_5 \times P_4$; the weights of the edges are omitted.

We consider the shortest path problem in grid graphs whose edges are undirected and nonnegatively weighted. Specifically, we may preprocess the graph and then answer several queries online. Each query consists of two vertices $u, v$ and asks for the shortest path between $u$ and $v$. Let $n = w \times h$ denote the size of the graph $P_w \times P_h$.

One naive solution is to do nothing in preprocessing and, for each query, run an SSSP computation (e.g., Dijkstra's). The time complexity is $O(1)$ preprocessing / $O(n \log n)$ per query.

Another solution is to compute APSP during preprocessing. If we simply run $n$ rounds of an SSSP algorithm, the time complexity is $O(n^2 \log n)$ preprocessing / $O(1)$ per query.

Here, we present a nice algorithm that achieves $O(\sqrt{n})$ time per query after $O(n^{1.5} \log n)$ preprocessing. Compared with the naive algorithms above, the $O(n^{1.5} \log n) / O(\sqrt{n})$ algorithm offers a good tradeoff between preprocessing and query time. The basic idea is divide and conquer.

Assume w.l.o.g. that $w \geq h$. We use the ordered pair $(x, y)$ $(1 \leq x \leq w, 1 \leq y \leq h)$ to denote a vertex of $P_w \times P_h$.

We define the *midline* of the graph as the set of vertices in the $\frac{w}{2}$-th column; this is how we *divide* the problem into subproblems. First we handle the queries whose two vertices lie on different sides of the midline. Given two vertices $u$ and $v$, with $u$ to the left of the midline and $v$ to the right, every path between $u$ and $v$ must cross the midline, though it may cross it multiple times (as the following figure shows).

Now comes the key point. For convenience, denote the vertices on the midline by $M_i$, i.e., $M_i = (\frac{w}{2}, i)$. For every midline vertex $M_i$, run an SSSP computation in the preprocessing stage and store its distance to every other vertex. How does this information help? Consider a vertex $u$ to the left of the midline and a vertex $v$ to the right. Since the shortest path between them must cross the midline, we can try all midline vertices and simply pick the best:

$$ d(u, v) = \min_{1 \leq i \leq h} \left( d(u, M_i) + d(M_i, v) \right). $$

Since $w \geq h$, we have $h \leq \sqrt{n}$, so there are at most $\sqrt{n}$ vertices on the midline. The preprocessing time is therefore $O(\sqrt{n} \cdot n \log n)$, and the time per query is $O(\sqrt{n})$.
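In code, answering a cross-midline query is just one linear scan over the $h$ midline vertices. A sketch (names are illustrative; `du[i]` and `dv[i]` stand for the precomputed $d(u, M_i)$ and $d(M_i, v)$):

```cpp
#include <algorithm>
#include <climits>
#include <vector>

// d(u, v) = min over i of d(u, M_i) + d(M_i, v), for u and v on
// opposite sides of the midline.
long long cross_query(const std::vector<long long>& du,
                      const std::vector<long long>& dv) {
    long long best = LLONG_MAX;
    for (std::size_t i = 0; i < du.size(); i++)
        best = std::min(best, du[i] + dv[i]);
    return best;
}
```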

How should we answer the queries whose two vertices lie on the same side of the midline? Note that we cannot simply recurse on the subgraph to the left of the midline, since the shortest path might possibly, though not necessarily, cross the midline.

But this is not a problem. For each query $(u, v)$ that does not cross the midline, we first update $d(u, v)$ with $\min_{1 \leq i \leq h} (d(u, M_i) + d(M_i, v))$, and then recursively solve the problem on the left and right subgraphs.

The total preprocessing time satisfies the recurrence:

$$ T(n) = 2T(n / 2) + O(n^{1.5} \log n). $$

By the master theorem, $T(n) = O(n^{1.5} \log n)$.

From an implementation perspective, the $O(\sqrt{n})$ query can be made very fast, since it just takes the minimum component of the sum of two vectors; this is SIMD- and cache-friendly. Also, since the preprocessing stage runs $h$ independent rounds of an SSSP algorithm, it is easily parallelized.

`std::tuple` (since C++11) is a generalization of `std::pair`. However, they exhibit some slight differences in certain cases. Consider the following code:

```cpp
#include <iostream>
#include <tuple>
#include <utility>

struct dummy {};

int main() {
    std::cout << sizeof(int) << std::endl;
    std::cout << sizeof(std::pair<int, dummy>) << std::endl;
    std::cout << sizeof(std::tuple<int, dummy>) << std::endl;
}
```

When compiled with g++ and libstdc++, the output might be 4, 8, 4. Why does `std::tuple<int, dummy>` take less space than `std::pair<int, dummy>`?

This is a nontrivial question. Unlike in C, empty `struct`s (and `class`es) are supported in C++; however, they must have nonzero size (typically 1 byte) so that every instance of an empty struct has a unique address.

The typical implementation of `pair<T1, T2>` is simply a struct with two members, `T1 first` and `T2 second`. Hence `std::pair<int, dummy>` contains two members, of types `int` and `dummy` respectively. The total size of the members is 5 bytes; due to padding, however, the pair actually takes 8 bytes.

Since the number of members of a `tuple` is variable, its implementation is more complex. In libstdc++, `tuple` is implemented via recursive double inheritance: `_Tuple_impl<id, X, Y, Z>` inherits from both `_Head_base<id, X>` and `_Tuple_impl<id + 1, Y, Z>`, and `_Head_base<id, X>` in turn inherits from `X` if `X` is an empty non-final class. In that case, `_Head_base<id, X>` is itself empty, so the compiler is allowed to perform the Empty Base Optimization (EBO): an empty base need not take up any space. That's why `std::tuple<int, dummy>` takes only 4 bytes.
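The effect of EBO can be observed directly. In the sketch below (type names are made up for illustration), holding the empty type as a member costs space, while inheriting from it may cost nothing; the exact sizes are implementation-defined, but mainstream ABIs behave as asserted:

```cpp
struct empty {};

struct as_member {        // the empty member must get a distinct address
    empty e;
    int x;
};

struct as_base : empty {  // EBO: the empty base may occupy no space
    int x;
};

static_assert(sizeof(as_member) > sizeof(int), "member costs space");
static_assert(sizeof(as_base) == sizeof(int), "EBO applied");
```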

Problem 1 (Range Minimum Query, RMQ): Given an integer array of size $n$: $a[1], \ldots, a[n]$. A range minimum query $\text{RMQ}_a(l, r) = (\arg)\min_{l \leq i \leq r} a[i]$ returns the value (or the position) of the minimum element in the subarray $a[l \ldots r]$.

Problem 2 (Lowest Common Ancestor, LCA): Given a rooted tree $T$. A lowest common ancestor query $\text{LCA}_T(u, v)$ returns the lowest node that has both $u$ and $v$ as descendants.

Problem 3 (+1/-1 RMQ): +1/-1 RMQ is a special case of RMQ, where the difference of two adjacent elements in the given array is either +1 or -1.

In this section, we show that the three problems are mutually reducible in linear time. Note that the reduction from +1/-1 RMQ to RMQ is immediate.

The reduction from RMQ to LCA converts the array into a Cartesian tree. The Cartesian tree of an array is a binary tree with the heap property whose in-order traversal gives the original array. Note that the minimum element of $a[l \ldots r]$ corresponds to the LCA of nodes $l$ and $r$ in the Cartesian tree, and vice versa.

We may convert the array into a Cartesian tree in linear time. Add the elements one by one, maintaining a pointer to the rightmost node of the current tree. When a new element arrives, move the pointer up until the value of the current node is less than the new element, or until the root is reached. Insert a new node there, and make the subtree it displaces the left subtree of the new node. Defining the potential as the depth of the pointee, the amortized time is $O(1)$ per insertion, so the total time complexity is $O(n)$.
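The same construction is often written with an explicit stack holding the rightmost path, which makes the $O(n)$ bound evident: each index is pushed and popped at most once. A sketch (function and variable names are illustrative):

```cpp
#include <vector>

// Builds the (min-heap) Cartesian tree of a; returns parent[], -1 at the root.
std::vector<int> cartesian_tree(const std::vector<int>& a) {
    int n = a.size();
    std::vector<int> parent(n, -1);
    std::vector<int> stk;                  // rightmost path, top = deepest node
    for (int i = 0; i < n; i++) {
        int last = -1;
        while (!stk.empty() && a[stk.back()] > a[i]) {
            last = stk.back();             // subtree displaced by a[i]
            stk.pop_back();
        }
        if (!stk.empty()) parent[i] = stk.back();
        if (last != -1) parent[last] = i;  // displaced subtree becomes left child
        stk.push_back(i);
    }
    return parent;
}
```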

To reduce the LCA problem to the +1/-1 RMQ problem, we first label each vertex with its depth. Then we run a depth-first search, outputting the depth of the current node before visiting each child and again after returning from it. The resulting sequence is the Euler tour of the original tree. To compute the LCA of $u$ and $v$, just find any occurrence of $u$ and any occurrence of $v$ in the tour, and take the range minimum between them. Since the depths of two adjacent entries of the Euler tour differ by exactly 1, this is a +1/-1 RMQ problem.

Note that the number of occurrences of each node equals its number of children plus 1, so the length of the resulting sequence is $2n-1$, which is still linear.

To achieve $O(1)$ query time, a naive solution is to precompute all $O(n^2)$ queries, which takes too much preprocessing time. In fact, we do not need to precompute so many answers. The sparse table technique only precomputes the RMQ of intervals whose lengths are powers of 2. To answer a query $\text{RMQ}(l, r)$, just return $\min\{\text{RMQ}(l, l+2^k-1), \text{RMQ}(r-2^k+1, r)\}$, where $k$ is the maximum integer such that $2^k \leq r - l + 1$. This covers the entire range with two intervals; due to the idempotence of the minimum operation, the overlapping part does not affect the answer. There are $O(\log n)$ powers of 2 not exceeding $n$, and for each power of 2 there are $O(n)$ intervals to precompute, so the total preprocessing time is $O(n \log n)$.
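The sparse table described above fits in a dozen lines. A sketch (0-indexed, inclusive bounds; names are illustrative):

```cpp
#include <algorithm>
#include <vector>

struct SparseTable {
    std::vector<std::vector<int>> t;  // t[k][i] = min of a[i .. i + 2^k - 1]
    std::vector<int> lg;              // lg[x] = floor(log2(x))

    explicit SparseTable(const std::vector<int>& a) {
        int n = a.size();
        lg.assign(n + 1, 0);
        for (int x = 2; x <= n; x++) lg[x] = lg[x / 2] + 1;
        t.assign(lg[n] + 1, std::vector<int>(n));
        t[0] = a;
        for (int k = 1; k <= lg[n]; k++)
            for (int i = 0; i + (1 << k) <= n; i++)
                t[k][i] = std::min(t[k - 1][i], t[k - 1][i + (1 << (k - 1))]);
    }

    // Minimum of a[l .. r]; the two length-2^k blocks may overlap,
    // which is harmless because min is idempotent.
    int query(int l, int r) const {
        int k = lg[r - l + 1];
        return std::min(t[k][l], t[k][r - (1 << k) + 1]);
    }
};
```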

Our final step is to shave off the log factor. We use the indirection technique, which works only for +1/-1 RMQ.

We split the original array $a$ into segments of length $\frac{1}{2} \log n$ and replace each segment with its minimum value, obtaining a new sequence of length $O(\frac{n}{\log n})$. Now every interval that spans multiple segments decomposes into a run of whole segments plus two intervals lying entirely within a segment. Using the sparse table technique, we can answer the minimum over a run of whole segments in $O(1)$ time with $O(n)$ preprocessing time (the log factors cancel). It remains to answer RMQ for intervals within segments.

Note that adding a constant commutes with taking minima: $\min\{a, b\} + c = \min\{a + c, b + c\}$. Hence, for the RMQ problem, the actual values of the array do not matter; only the differences do. How many essentially different +1/-1 RMQ instances of length $\frac{1}{2}\log n$ are there? Only $O(\sqrt{n})$, since there are only $2^{\frac{1}{2}\log n - 1} = O(\sqrt{n})$ possible difference arrays. Moreover, for each array of length $\frac{1}{2} \log n$, there are only $O(\log^2 n)$ different intervals. We may therefore set up a lookup table for all intervals of all possible instances of size $\frac{1}{2} \log n$ in $O(\sqrt{n} \log^2 n)$ time. This is dominated by the sparse table part, so the total preprocessing time is still $O(n)$.

For a query that lies entirely within some segment, just read the answer from the lookup table; otherwise, the interval decomposes into a run of whole segments, answered by the sparse table in constant time, plus two intervals within segments, answered by the lookup table. The total time per query is therefore $O(1)$.

We define the *beauty* of a sequence $\{b_i\}_{i=1}^k$ as $\min_{i \neq j} |b_i - b_j|$. Given a sequence $\{a_i\}_{i=1}^n$, compute the sum of the beauty values over all of its subsequences of length $k$.

Constraints: $2 \leq k \leq n \leq 1000$, $0 \leq a_i \leq 10^5$.

We can use a common trick: let $P(t)$ be the number of length-$k$ subsequences with beauty at least $t$; then the answer is $\sum_{t=1}^{\infty} P(t)$.

This turns the problem into a counting problem. First sort the input sequence, and let $p_t(j) = \max\{l : a_j - a_l \geq t\}$ (with $p_t(j) = 0$ if no such $l$ exists). The transition can then be written as $f_t(i, j) = f_t(i, j-1) + f_t(i-1, p_t(j))$, with $P(t) = f_t(k, n)$. All the values $p_t(j)$ can be computed in amortized $O(n^2)$ total time, and each value $P(t)$ takes $O(nk)$ time to compute. Note that $P(t)$ is nonincreasing, so once $P(t) = 0$ we can stop; such $t$ never exceeds $\frac{\max a}{k-1}$, so the total complexity is $O(n \max a)$.
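A direct implementation of this counting can be sketched as follows (for brevity it finds $p_t(j)$ by binary search instead of the amortized two-pointer scan, which only changes constant/log factors; names are illustrative):

```cpp
#include <algorithm>
#include <vector>

// Sum of the minimum pairwise gap ("beauty") over all length-k
// subsequences, computed as the sum over t >= 1 of P(t).
long long beauty_sum(std::vector<long long> a, int k) {
    std::sort(a.begin(), a.end());
    int n = a.size();
    long long total = 0;
    for (long long t = 1;; t++) {
        // f[i][j]: ways to pick i of the first j elements with all
        // adjacent (sorted) gaps >= t
        std::vector<std::vector<long long>> f(
            k + 1, std::vector<long long>(n + 1, 0));
        for (int j = 0; j <= n; j++) f[0][j] = 1;
        for (int i = 1; i <= k; i++)
            for (int j = 1; j <= n; j++) {
                // p = number of elements usable before a[j-1], i.e. <= a[j-1]-t
                int p = std::upper_bound(a.begin(), a.end(), a[j - 1] - t)
                        - a.begin();
                f[i][j] = f[i][j - 1] + f[i - 1][p];
            }
        if (f[k][n] == 0) break;  // P(t) is nonincreasing: nothing more to add
        total += f[k][n];
    }
    return total;
}
```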

Counting independent sets of $G$: $\text{CIS}(G)$

- If $G$ contains multiple connected components, count the independent sets of each component and multiply the numbers.
- If $G$ contains only one vertex, return 2 (both the empty set and the singleton are independent sets).
- Otherwise, arbitrarily select a vertex $v$. Count the independent sets of the graph with $v$ removed (those not containing $v$), and the independent sets of the graph with $v$ and all its neighbors removed (those containing $v$). Return the sum of the two.

To see the $O(1.619^n)$ running time, note that in the last case, removing $v$ decreases the number of vertices by one, while removing $v$ and its neighbors decreases it by at least two. The recurrence is the same as that of the Fibonacci sequence, whose growth rate is the golden ratio $\varphi \approx 1.618$.
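For small graphs the branching step can be sketched with bitmasks (component splitting is omitted for brevity; it speeds things up but is not needed for correctness; `__builtin_ctz` is a GCC/Clang builtin):

```cpp
#include <vector>

// adj[v] = bitmask of v's neighbors; alive = bitmask of remaining vertices.
// Counts independent sets, including the empty one.
long long cis(unsigned alive, const std::vector<unsigned>& adj) {
    if (alive == 0) return 1;
    int v = __builtin_ctz(alive);                    // pick any remaining vertex
    return cis(alive & ~(1u << v), adj)              // sets avoiding v
         + cis(alive & ~(adj[v] | (1u << v)), adj);  // sets containing v
}
```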

Given a polynomial $p(x)$, if $p(\mathbf{A}) = \mathbf{0}$ for a matrix $\mathbf{A}$, we say $p(x)$ is an **annihilating polynomial** for $\mathbf{A}$. By the **Cayley–Hamilton theorem**, the **characteristic polynomial** $\det(x\mathbf{I} - \mathbf{A})$ is an annihilating polynomial for $\mathbf{A}$. Among all annihilating polynomials for $\mathbf{A}$, the one with minimum degree is called the **minimal polynomial** of $\mathbf{A}$. It can be proved that the minimal polynomial of a given matrix is unique up to a constant factor, and that every annihilating polynomial is a polynomial multiple of the minimal polynomial.

We can invert a matrix $\mathbf{A}$ if we have its minimal polynomial:

\[ a_0 \mathbf{I} + a_1 \mathbf{A} + a_2 \mathbf{A}^2 + \cdots + a_k \mathbf{A}^k = \mathbf{0} \tag{*} \]Since $\mathbf{A}$ is invertible, 0 is not a root of its characteristic polynomial, hence not of the minimal polynomial either, so $a_0 \neq 0$. Multiplying both sides by $\mathbf{A}^{-1}$ yields

\[ \mathbf{A}^{-1} = -\frac{a_1 \mathbf{I} + a_2 \mathbf{A} + a_3 \mathbf{A}^2 + \cdots + a_k \mathbf{A}^{k-1}}{a_0} \]This means that we may represent the inverse of $\mathbf{A}$ as a linear combination of powers of $\mathbf{A}$.
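The formula turns into a loop that only ever needs matrix–vector products, never the explicit inverse. A sketch (the `matvec` callback stands in for the $O(m)$ sparse multiply; any multiply works; names are illustrative):

```cpp
#include <functional>
#include <vector>

// Given coefficients a_0 .. a_k of an annihilating polynomial (*) with
// a_0 != 0, compute A^{-1} b = -(a_1 b + a_2 A b + ... + a_k A^{k-1} b) / a_0.
std::vector<double> inverse_apply(
        const std::vector<double>& a,
        const std::function<std::vector<double>(const std::vector<double>&)>&
            matvec,
        const std::vector<double>& b) {
    int k = (int)a.size() - 1;
    std::vector<double> cur = b, acc(b.size(), 0.0);
    for (int i = 1; i <= k; i++) {
        for (std::size_t j = 0; j < b.size(); j++) acc[j] += a[i] * cur[j];
        if (i < k) cur = matvec(cur);  // cur becomes A^i b for the next term
    }
    for (double& x : acc) x = -x / a[0];
    return acc;
}
```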

The Berlekamp–Massey algorithm solves the following problem in $O(n^2)$ time:

Given a finite sequence $\{x_i\}_{i=1}^n$, find a minimum-order linear recurrence consistent with it. Formally, find a shortest sequence $c_0 = 1, c_1, \cdots, c_{k-1}$ such that $\sum_{l=0}^{k-1} x_{j-l} c_l = 0$ holds for all valid $j$ (i.e., $k \leq j \leq n$).

This algorithm has many real-world applications. The most typical one is finding the shortest linear-feedback shift register that generates a given binary sequence. It can also be viewed as interpolating a sequence with exponential terms. One important fact about the Berlekamp–Massey algorithm is that for an order-$r$ linearly recurrent sequence, the first $2r$ elements suffice as input to recover the recurrence.
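A compact implementation over $\mathbb{F}_p$, in the style common in competitive programming, is sketched below. It returns coefficients $r_1, \ldots, r_k$ with $x_j = \sum_{l=1}^{k} r_l x_{j-l}$, i.e. $r_l = -c_l$ in the notation above; the modulus is an assumption for illustration:

```cpp
#include <vector>

typedef long long ll;
const ll MOD = 998244353;  // any prime works; chosen here for illustration

ll power(ll b, ll e) {     // b^e mod MOD, used for modular inverses
    ll r = 1 % MOD;
    for (b %= MOD; e; e >>= 1, b = b * b % MOD)
        if (e & 1) r = r * b % MOD;
    return r;
}

// Shortest r with s[i] = sum_j r[j] * s[i-1-j] (mod MOD) for all valid i.
std::vector<ll> berlekamp_massey(const std::vector<ll>& s) {
    std::vector<ll> ls, cur;  // recurrence at the last failure, current one
    ll lf = 0, ld = 0;        // position and discrepancy of the last failure
    for (int i = 0; i < (int)s.size(); i++) {
        ll t = 0;
        for (int j = 0; j < (int)cur.size(); j++)
            t = (t + cur[j] * s[i - 1 - j]) % MOD;
        if ((s[i] - t) % MOD == 0) continue;  // current recurrence still fits
        if (cur.empty()) {                    // first nonzero term
            cur.resize(i + 1);
            lf = i; ld = (s[i] - t) % MOD;
            continue;
        }
        ll k = (s[i] - t) % MOD * power(ld, MOD - 2) % MOD;
        std::vector<ll> c(i - lf - 1);        // shifted patch built from ls
        c.push_back(k);
        for (ll x : ls) c.push_back(-x * k % MOD);
        if (c.size() < cur.size()) c.resize(cur.size());
        for (int j = 0; j < (int)cur.size(); j++)
            c[j] = (c[j] + cur[j]) % MOD;
        if (i - lf + (int)ls.size() >= (int)cur.size()) {
            ls = cur; lf = i; ld = (s[i] - t) % MOD;
        }
        cur = c;
    }
    for (ll& x : cur) x = (x % MOD + MOD) % MOD;
    return cur;
}
```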

Note that the annihilating polynomials of $\mathbf{A}$ are exactly the linear recurrences satisfied by the sequence of powers $\mathbf{I}, \mathbf{A}, \mathbf{A}^2, \ldots$ However, it is infeasible to compute the minimal polynomial directly from the powers of $\mathbf{A}$. Instead, we may randomly pick vectors $\mathbf{u}$ and $\mathbf{v}$ and compute the minimal polynomial from the scalar sequence $\mathbf{u}^T \mathbf{A}^i \mathbf{v}$. We claim without proof that, with high probability, the coefficients of the recurrence of this sequence are exactly those of the minimal polynomial. The sequence can be computed in $O(mn)$ time, where $m$ is the number of nonzero entries of $\mathbf{A}$, by iteratively performing sparse matrix–vector multiplication in $O(m)$ time. Finally, apply the Berlekamp–Massey algorithm to the resulting sequence.

Since the inverse of a sparse matrix is generally not sparse, we won't actually compute $\mathbf{A}^{-1}$ explicitly. Instead, we can compute $\mathbf{A}^{-1}\mathbf{b}$ via formula (*) in $O(mn)$ time. The procedure is exactly the same as in finding the minimal polynomial: just iteratively perform sparse matrix–vector multiplication.

unsolved

solved (0:08, +1)

Just check each candidate one by one.

solved (0:17)

Simulate day by day.

solved (4:55)

First run shortest-path computations to obtain APSP.

Binary search on the answer, then use DP to compute the earliest time to finish the first $i$ deliveries, or detect infeasibility.

solved (3:23)

Use DP-style transitions. Since the order of the minions is irrelevant, sort them by health; the number of states shrinks drastically.

unsolved

unsolved

solved (0:34)

If $\frac{10080rc}{t+r} \geq l$, the machine is feasible. Record the names of the cheapest feasible machines.

solved (1:00 +1)

Since consecutive numbers differ by at least a factor of two, just sort in decreasing order and subtract whenever possible. If the final result is 0, the selected numbers are the answer.

solved (1:57 +2)

The counts of the subsequences 00 and 11 determine the numbers of 0s and 1s; the counts of 01 and 10 are then used to adjust the positions of the 0s and 1s. Mind the various edge cases.

solved (3:07)

There are $k(k-1)^{n-1}$ colorings using at most $k$ colors. The number of colorings using exactly $k$ colors can be obtained by inclusion–exclusion.


Define a machine with 5000 registers. The first two registers are initialized to $a$ and $b$ respectively, and every other register is initialized to 1. The machine supports two operations:

- ADD %dest, %src1, %src2: R[dest] = R[src1] + R[src2]
- POW %dest, %src: R[dest] = pow(R[src], d), where $d$ is a fixed exponent

All operations are performed modulo $p$. The task is to compute $a \cdot b$ using at most 5000 instructions.

First, with addition alone we can implement multiplication by a constant, in the style of fast exponentiation. With that, we can construct the constant 0 (just multiply by $p$), implement subtraction (multiply by $p-1$ to obtain the negation), and divide by a constant (multiply by its inverse).

The beautiful part is that these two instructions suffice to square a number. Consider the binomial expansions of $x^d, (x+1)^d, \cdots, (x+d)^d$: each can be written as a linear combination of $1, x, x^2, \cdots, x^d$. By inverting the coefficient matrix, we can conversely express $x^2$ as a linear combination of $x^d, (x+1)^d, \cdots, (x+d)^d$, which completes the squaring.

With squaring available, the product of two numbers can be computed as $xy = ((x+y)^2 - x^2 - y^2)/2$.
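The arithmetic of this construction can be sanity-checked in plain C++ standing in for the register machine: `sq` plays the role of the derived squaring routine, and division by 2 is multiplication by the constant $(p+1)/2$, itself implemented with additions only, as on the machine (the prime below is an arbitrary illustration):

```cpp
typedef long long ll;
const ll P = 1000000007;  // an arbitrary prime, for illustration only

// Multiply by a known constant using only additions (doubling), O(log c).
ll mul_const(ll x, ll c) {
    ll r = 0;
    x %= P;
    while (c) {
        if (c & 1) r = (r + x) % P;
        x = (x + x) % P;
        c >>= 1;
    }
    return r;
}

// Stand-in for the squaring operation derived from POW in the text.
ll sq(ll x) { x %= P; return x * x % P; }

// xy = ((x + y)^2 - x^2 - y^2) / 2, with division by 2 realized as
// multiplication by the constant (P + 1) / 2.
ll mul(ll x, ll y) {
    ll t = ((sq(x + y) - sq(x) - sq(y)) % P + P) % P;
    return mul_const(t, (P + 1) / 2);
}
```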

link: http://codeforces.com/gym/101889

Given an array of length $L$ ($1 \leq L \leq 10^5$), initially filled with 1s. Define $\mathrm{cnt}[i]$ as the number of elements in the array equal to $i$. You should perform no more than $10^5$ operations. Each operation consists of four integers $P, X, A, B$; let $M_1 = (A + S^2) \bmod L$ and $M_2 = (A + (S + B)^2) \bmod L$, where $S = \mathrm{cnt}[P]$. Then assign the value $X$ to all elements with indices in $[\min(M_1, M_2), \max(M_1, M_2)]$. After all operations are performed, output the maximum of $\mathrm{cnt}[i]$ over all values $i$.

The main idea of the solution is simply to simulate the operations within a reasonable time complexity. Various solutions exist, including segment trees and square-root decomposition; however, the fastest solution (at least theoretically) I know of employs balanced binary search trees.

We use the nodes of a balanced binary search tree to represent the maximal contiguous intervals of the array having the same value. Additionally, we maintain an array `cnt` with the meaning described in the problem statement.

When updating an interval $[l, r)$ to value $X$, we delete all nodes inside $[l, r)$, updating the array `cnt` accordingly. Note that this may require splitting the nodes that cross the boundaries of the given interval. Finally, we insert a new node representing $[l, r)$ with value $X$ and update `cnt`.

Each operation can be done in amortized $O(\log L)$ time: each operation inserts only a constant number of nodes, so even though a single operation may delete many nodes, the total number of deletions over all operations is bounded by the total number of insertions. Hence the total time complexity is $O(n \log L)$.

```cpp
#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
using namespace std;
using namespace __gnu_pbds;

#ifdef __LOCAL_DEBUG__
# define _debug(fmt, ...) fprintf(stderr, "\033[94m%s: " fmt "\n\033[0m", \
      __func__, ##__VA_ARGS__)
#else
# define _debug(...) ((void) 0)
#endif
#define rep(i, n) for (int i = 0; i < (n); i++)
#define Rep(i, n) for (int i = 1; i <= (n); i++)
#define range(x) (x).begin(), (x).end()
typedef long long LL;
typedef unsigned long long ULL;

int l, c, n;
int cnt[100005];

// Map a run's starting index to (length, value).
typedef tree<int, pair<int, int>, less<int>, rb_tree_tag> rbtree;
#define tree shuorgsrh  // free the name `tree` for the variable below
rbtree tree;

// Assign value c to the half-open interval [l, r).
void update(int l, int r, int c) {
    rbtree mpart, rpart;
    tree.split(l - 1, mpart);
    int dif;
    // A run crossing the left boundary stays in `tree`; move its tail into mpart.
    if (tree.size() &&
        (dif = tree.rbegin()->first + tree.rbegin()->second.first - l) > 0) {
        tree.rbegin()->second.first -= dif;
        mpart.insert(make_pair(l, make_pair(dif, tree.rbegin()->second.second)));
    }
    mpart.split(r, rpart);
    // A run crossing the right boundary: move its tail into rpart.
    if (mpart.size() &&
        (dif = mpart.rbegin()->first + mpart.rbegin()->second.first - r) > 0) {
        mpart.rbegin()->second.first -= dif;
        rpart.insert(make_pair(r, make_pair(dif, mpart.rbegin()->second.second)));
    }
    // Runs fully inside [l, r) are discarded with mpart; update cnt.
    for (auto& p : mpart) {
        cnt[p.second.second] -= p.second.first;
    }
    cnt[c] += r - l;
    tree.insert(make_pair(l, make_pair(r - l, c)));
    tree.join(rpart);
}

int main() {
    scanf("%d%d%d", &l, &c, &n);
    cnt[1] = l;
    tree.insert(make_pair(0, make_pair(l, 1)));
    rep (i, n) {
        int p, x, a, b;
        scanf("%d%d%d%d", &p, &x, &a, &b);
        int m1 = (a + 1ll * cnt[p] * cnt[p]) % l;
        int m2 = (a + 1ll * (cnt[p] + b) * (cnt[p] + b)) % l;
        if (m1 > m2) swap(m1, m2);
        m2++;
        update(m1, m2, x);
    }
    cout << *max_element(cnt + 1, cnt + c + 1) << endl;
    return 0;
}
```
