Solving Sparse Linear System in O(nm) Time

There are many practical applications that involve solving sparse linear system $\mathbf{A}\mathbf {x} = \mathbf{b} $, such as random walk in sparse graphs. Traditional techniques such as Gauss elimination or LU decomposition applies for general linear systems, and they run in cubic time. Fastest known algorithms for general matrix run in about $O(n^{2.373})$ time. However, for sparse matrix, we may achieve a much faster result. Specifically, we may solve the system in $O(mn)$ time, where $m$ is the number of nonempty entries in $\mathbf{A}$.  We assume that the matrix $\mathbf{A} $ is non-singular (i.e., invertible) throughout this passage.

1. Inverting a matrix from its annealing polynomial

Given polynomial $p(x)$, if $ p(\mathbf{A}) = \mathbf{0}$ for matrix $\mathbf{A}$, then we say $p(x)$ is an annealing polynomial for matrix $ \mathbf{A} $. Due to Hamilton-Cayley theorem, the characteristic polynomial $\det( x\mathbf{I} – \mathbf{A} ) $ is an annealing polynomial for $ \mathbf{A} $. Among all annealing polynomials for matrix $ \mathbf{A} $, the one with minimum degree is called the minimal polynomial for $ \mathbf{A} $. It can be proved that the minimum polynomial for given matrix is unique up to a constant factor, and any other annealing polynomial is a polynomial multiple of the minimum polynomial.

We can invert a matrix $ \mathbf{A} $ if we have a minimal polynomial for $ \mathbf{A} $ :
\[ a_0 I + a_1  \mathbf{A} + a_2 \mathbf{A}^2 + \cdots + a_k \mathbf{A}^k = 0 \tag{*} \]Since $ \mathbf{A} $ is invertible, 0 is never a root of its characteristic polynomial, hence we must have $a_0 \neq 0 $. Multiplying $ \mathbf{A}^{-1} $ on both sides yields
\[ \mathbf{A}^{-1} = -\frac{a_1 I + a_2 \mathbf{A} + a_3 \mathbf{A}^2 + \cdots + a_k \mathbf{A}^{k-1}}{a_0} \]this means that we may represent the inversion of $ \mathbf{A} $ by the linear combination of powers of  $ \mathbf{A} $.

2. The Berlekamp-Massey algorithm

Berlek-Massey algorithm solves the following problem in $O(n^2)$ time:

Given a finite sequence $\{x_i\}_{i=1}^n$, find a minimum order linear recurrence consistent with the given sequence. Formally, find a shortest sequence $c_0 = 1, c_1, \cdots, c_{k-1}$, such that $\sum_{l=0}^{k-1} x_{j-l} c_l = 0$ holds for all possible $j$.

 This algorithm has many real world applications. The most typical one is to find the shortest linear feedback shift register for a given binary sequence. Also, it can be viewed as interpolating a sequence with exponential terms. One important fact for Berlekamp-Massey algorithm is, for an order-$r$ linearly recurrent sequence, taking the first $2r$ elements as the input of the algorithm suffices to recover the recurrence.

3. Finding the minimum polynomial 

Note that the annealing polynomial is exactly the linear recurrence of powers of a matrix. However, it is infeasible to compute the minimum polynomial from the powers of $\mathbf{A}$. However, we may randomly pick vectors $\mathbf{u}$ and $\mathbf{v}$ and compute the minimum polynomial from $\mathbf{u}^T \mathbf{A}^i \mathbf{v}$. We claim without proof that, with high probability, the coefficients of the recurrence of the sequence are exactly those of the minimum polynomial. The sequence can be computed in $O(mn)$ time by iteratively doing sparse matrix-vector multiplication in $O(m)$ time. Finally, apply Berlekamp-Massey algorithm to the given sequence.

4. Solving the linear system

Since the inverse of a sparse matrix is generally not sparse, we won’t actually compute the inverse of $\mathbf{A}$.  Actually, we can compute $\mathbf{A}^{-1}\mathbf{b}$ via formula (*) in $O(mn)$ time. The procedure is exactly the same as in finding minimum polynomial: just iteratively perform spare matrix-vector multiplication.

Nordic Collegiate Programming Contest 2018 [NCPC2018] 题解

A. Altruistic Amphibians


B. Baby Bites

solved (0:08, +1)


C. Code Cleanups

solved (0:17)


D. Delivery Delays

solved (4:55)



E. Explosion Exploit

solved (3:23)


F. Firing the Phaser


G. Game Scheduling


H. House Lawn

solved (0:34)

如果$\frac{10080rc}{t+r} \geq l$,则说明可行。记录可行的机器中,最便宜的那些机器的名称即可。

I. House Lawn

solved (1:00 +1)


J. Jumbled String

solved (1:57 +2)


K. King’s Colors

solved (3:07)



[Codeforces Round #513] H. Sophisticated Device


定义一种机器,一共有5000个寄存器,前两个寄存器初始值分别为a, b,其他寄存器初始值均为1。该机器支持两种操作:

  1. ADD %dest, %src1, %src2:R[dest] = R[src1] + R[src2]
  2. POW %d, %s:R[dest] = pow(R[src], d)




精彩的是,我们可以用上述两条指令计算一个数的平方。考虑$x^d, (x+1)^d, \cdots, (x+d)^d$的二项展开,每一个都可以表示成$1, x, x^2, \cdots, x^d$的线性组合。这样,通过矩阵求逆就可以用$x^d, (x+1)^d, \cdots, (x+d)^d$表示出$x^2$,从而完成了平方的计算。

有了平方操作,就可以用$xy = ((x+y)^2-x^2-y^2)/2$计算出两数之积了。

[2017-2018 ACM-ICPC Latin American Regional Programming Contest] D. Daunting device



Given an array of length $ 1 \leq L \leq 10^5 $, initially filled with 1. Define cnt[i] as the number of elements in the array that are equal to i. You should perform no more than $10^5$ operations. Each operation is four integers $P, X, A, B$, and let $M_1 = (A + S^2) \bmod L, M_2 = (A + (S + B)^2) \bmod L$, where S = cnt[P], then assign all elements with indices in $[min(M_1, M_2), max(M_1, M_2)]$ with value X. After all operations performed, output the maximum cnt[i] over all possible values of i.


The main idea of the solution is just to simulate the operations in a reasonable time complexity. In fact, various possible solutions exist, including segment tree and square root decomposition. However, the fastest solution (at least theoretically) I know is employing balanced binary search trees.

We use the nodes of the balanced binary search tree to represent contiguous intervals of the array with same values. Additionally, we maintain an array “cnt”, which has the meaning as described in problem statement.

When updating an interval [l, r) to value X, we delete all nodes representing the interval [l, r), updating the array “cnt”. Note that this may split the nodes that across the boundaries of given interval. Finally, we inserting a new node which represents [l, r) with value X, and updating “cnt”.

Each operation can be done in amortized $ O(\log L) $ time, since in each operation, only constant number of nodes are inserted, though it is possible that a great number of nodes are deleted. Hence the total time complexity is $ O(n \log L) $.

Source Code

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
using namespace std;
using namespace __gnu_pbds;

#ifdef __LOCAL_DEBUG__
# define _debug(fmt, ...) fprintf(stderr, "\033[94m%s: " fmt "\n\033[0m", \
    __func__, ##__VA_ARGS__)
# define _debug(...) ((void) 0)
#define rep(i, n) for (int i=0; i<(n); i++)
#define Rep(i, n) for (int i=1; i<=(n); i++)
#define range(x) (x).begin(), (x).end()
typedef long long LL;
typedef unsigned long long ULL;

int l, c, n;
int cnt[100005];

typedef tree<int, pair<int, int>, less<int>, rb_tree_tag> rbtree;

#define tree shuorgsrh
rbtree tree;

void update(int l, int r, int c) {
  rbtree mpart, rpart;

  tree.split(l - 1, mpart);
  int dif;
  if (tree.size() && 
      (dif = tree.rbegin()->first + tree.rbegin()->second.first - l) > 0) {
    tree.rbegin()->second.first -= dif;
    mpart.insert(make_pair(l, make_pair(dif, tree.rbegin()->second.second)));
  mpart.split(r, rpart);

  if (mpart.size() && 
      (dif = mpart.rbegin()->first + mpart.rbegin()->second.first - r) > 0) {
    mpart.rbegin()->second.first -= dif;
    rpart.insert(make_pair(r, make_pair(dif, mpart.rbegin()->second.second)));
  for (auto& p : mpart) { 
    cnt[p.second.second] -= p.second.first;
  cnt[c] += r - l;
  tree.insert(make_pair(l, make_pair(r - l, c)));

int main() {
  scanf("%d%d%d", &l, &c, &n);
  cnt[1] = l;
  tree.insert(make_pair(0, make_pair(l, 1)));
  rep (i, n) {
    int p, x, a, b;
    scanf("%d%d%d%d", &p, &x, &a, &b);
    int m1 = (a + 1ll * cnt[p] * cnt[p]) % l;
    int m2 = (a + 1ll * (cnt[p] + b) * (cnt[p] + b)) % l;
    if (m1 > m2) swap(m1, m2); 
    update(m1, m2, x);
  cout << *max_element(cnt+1, cnt+c+1) << endl;
  return 0;



[2017-2018 ACM-ICPC Latin American Regional Programming Contest] A. Arranging tiles



给定N个凸多边形$1 \leq N \leq 14$,每个凸多边形有一条$y=0$和$y=H$的边。将这些多边形在$0 \leq y \leq H$的区域内排成一排,不允许重叠,求最小的总宽度。允许调换多边形的次序,但不能旋转。





总复杂度为$O(N^2K + N^2 2^N)$。


#include <bits/stdc++.h>
using namespace std;

#define rep(i, n) for (int i = 0; i < (n); i++)
#define Rep(i, n) for (int i = 1; i <=(n); i++)
#define range(x) begin(x), end(x)

constexpr double EPS = 1e-8;

int fcmp(double x, double y = 0.0) {
  if (fabs(x - y) < EPS) return 0;
  return x < y ? -1 : 1;

typedef double T;
typedef struct pt {
  T x, y;
  T operator,(pt a) { return x * a.x + y * a.y; }  // inner product
  T operator*(pt a) { return x * a.y - y * a.x; }  // outer product
  pt operator+(pt a) { return {x + a.x, y + a.y}; }
  pt operator-(pt a) { return {x - a.x, y - a.y}; }

  pt operator*(T k) { return {x * k, y * k}; }
  pt operator-() { return {-x, -y}; }
} vec;

typedef pair<pt, pt> seg;

int n;
int height;

struct polygon {
  int width;
  vector<pair<int, int>> lpart, rpart;
  void init(vector<pair<int, int>>& pts) {
    int minx = INT_MAX, maxx = INT_MIN;
    for (auto& p : pts) {
      height = max(height, p.second);
      minx = min(minx, p.first);
      maxx = max(maxx, p.first);
    width = maxx - minx;
    for (auto& p : pts) p.first -= minx;
    int i = 1;
    while (pts[i].second < height) rpart.push_back(pts[i++]);
    while (i < pts.size()) lpart.push_back(pts[i++]);
} poly[16];

pt getMidPoint(pair<int, int> pt1, pair<int, int> pt2, double y) {
  pt a { (double) pt1.first, (double) pt1.second }, 
     b { (double) pt2.first, (double) pt2.second };
  vec v = (b - a) * (1.0 / (b.y - a.y));
  return a + v * (y - a.y);

double max_overlap(polygon& polyl, polygon& polyr) {
  vector<pair<int, int>> &lpts = polyl.rpart, &rpts = polyr.lpart;
  int ln = lpts.size(), rn = rpts.size(), yn;
  vector<int> ys;
    vector<int> yl(ln), yr(rn);
    rep (i, ln) yl[i] = lpts[i].second;
    rep (i, rn) yr[i] = rpts[i].second;
    set_union(range(yl), range(yr), back_inserter(ys));
    yn = ys.size();
  int lptr = 0, rptr = 0;
  double min_dif = 0.0;
  rep (i, yn) {
    pt lp = getMidPoint(lpts[lptr], lpts[lptr+1], ys[i]),
       rp = getMidPoint(rpts[rptr], rpts[rptr+1], ys[i]);
    min_dif = max(min_dif, lp.x - rp.x);
    while (lptr < ln-1 && lpts[lptr+1].second <= ys[i]) lptr++;
    while (rptr < rn-1 && rpts[rptr+1].second <= ys[i]) rptr++;
  return polyl.width - min_dif;

double g[16][16];
double dp[1 << 14][16];

int main() {
  scanf("%d", &n);
  rep (i, n) {
    int k; scanf("%d", &k);
    vector<pair<int, int>> pts;
    rep (j, k) {
      int x, y; scanf("%d%d", &x, &y);
      pts.emplace_back(x, y);
  rep (i, n) {
    rep (j, n) {
      if (i == j) continue;
      g[i][j] = max_overlap(poly[i], poly[j]);
  for (unsigned mask = 1; mask < (1 << n); mask++) {
    if (mask & (mask - 1)) {
      unsigned rem = mask;
      while (rem) {
        unsigned bit = rem & -rem;
        int i = __builtin_ctz(bit);
        dp[mask][i] = -1e15;
        rep (j, n) {
          if (j == i || ((1 << j) & mask) == 0) continue;
          dp[mask][i] = max(dp[mask][i], dp[mask ^ bit][j] + g[j][i]);
        rem -= bit;
    } else {
      int i = __builtin_ctz(mask);
      rep (j, n) dp[mask][j] = (j == i ? 0.0 : -1e15);
  double ans = *max_element(range(dp[(1<<n)-1]));
  double tot = 0.0;
  rep (i, n) tot += poly[i].width;
  printf("%.3f\n", tot - ans);
  return 0;