Solving Sparse Linear System in O(nm) Time

There are many practical applications that involve solving sparse linear system \( \mathbf{A}\mathbf {x} = \mathbf{b} \), such as random walk in sparse graphs. Traditional techniques such as Gauss elimination or LU decomposition applies for general linear systems, and they run in cubic time. Fastest known algorithms for general matrix run in about \( O(n^{2.373}) \) time. However, for sparse matrix, we may achieve a much faster result. Specifically, we may solve the system in \(O(mn)\) time, where \(m\) is the number of nonempty entries in \(\mathbf{A}\).  We assume that the matrix \( \mathbf{A} \) is non-singular (i.e., invertible) throughout this passage.

1. Inverting a matrix from its annealing polynomial

Given polynomial \( p(x) \), if \( p(\mathbf{A}) = \mathbf{0} \) for matrix \( \mathbf{A} \), then we say \(p(x)\) is an annealing polynomial for matrix \( \mathbf{A} \). Due to Hamilton-Cayley theorem, the characteristic polynomial \(\det( x\mathbf{I} – \mathbf{A} ) \) is an annealing polynomial for \( \mathbf{A} \). Among all annealing polynomials for matrix \( \mathbf{A} \), the one with minimum degree is called the minimal polynomial for \( \mathbf{A} \). It can be proved that the minimum polynomial for given matrix is unique up to a constant factor, and any other annealing polynomial is a polynomial multiple of the minimum polynomial.

We can invert a matrix \( \mathbf{A} \) if we have a minimal polynomial for \( \mathbf{A} \) :
\[ a_0 I + a_1  \mathbf{A} + a_2 \mathbf{A}^2 + \cdots + a_k \mathbf{A}^k = 0 \tag{*} \]Since \( \mathbf{A} \) is invertible, 0 is never a root of its characteristic polynomial, hence we must have \(a_0 \neq 0 \). Multiplying \( \mathbf{A}^{-1} \) on both sides yields
\[ \mathbf{A}^{-1} = -\frac{a_1 I + a_2 \mathbf{A} + a_3 \mathbf{A}^2 + \cdots + a_k \mathbf{A}^{k-1}}{a_0} \]this means that we may represent the inversion of \( \mathbf{A} \) by the linear combination of powers of  \( \mathbf{A} \).

2. The Berlekamp-Massey algorithm

Berlek-Massey algorithm solves the following problem in $O(n^2)$ time:

Given a finite sequence \(\{x_i\}_{i=1}^n\), find a minimum order linear recurrence consistent with the given sequence. Formally, find a shortest sequence \(c_0 = 1, c_1, \cdots, c_{k-1}\), such that \(\sum_{l=0}^{k-1} x_{j-l} c_l = 0\) holds for all possible \(j\).

 This algorithm has many real world applications. The most typical one is to find the shortest linear feedback shift register for a given binary sequence. Also, it can be viewed as interpolating a sequence with exponential terms. One important fact for Berlekamp-Massey algorithm is, for an order-\(r\) linearly recurrent sequence, taking the first \(2r\) elements as the input of the algorithm suffices to recover the recurrence.

3. Finding the minimum polynomial 

Note that the annealing polynomial is exactly the linear recurrence of powers of a matrix. However, it is infeasible to compute the minimum polynomial from the powers of \(\mathbf{A}\). However, we may randomly pick vectors \(\mathbf{u}\) and \(\mathbf{v}\) and compute the minimum polynomial from \(\mathbf{u}^T \mathbf{A}^i \mathbf{v}\). We claim without proof that, with high probability, the coefficients of the recurrence of the sequence are exactly those of the minimum polynomial. The sequence can be computed in \(O(mn)\) time by iteratively doing sparse matrix-vector multiplication in $O(m)$ time. Finally, apply Berlekamp-Massey algorithm to the given sequence.

4. Solving the linear system

Since the inverse of a sparse matrix is generally not sparse, we won’t actually compute the inverse of \(\mathbf{A}\).  Actually, we can compute \(\mathbf{A}^{-1}\mathbf{b}\) via formula (*) in \(O(mn)\) time. The procedure is exactly the same as in finding minimum polynomial: just iteratively perform spare matrix-vector multiplication.

Nordic Collegiate Programming Contest 2018 [NCPC2018] 题解

A. Altruistic Amphibians

unsolved

B. Baby Bites

solved (0:08, +1)

逐个检查即可。

C. Code Cleanups

solved (0:17)

逐日模拟即可。

D. Delivery Delays

solved (4:55)

首先跑最短路获取APSP。

二分答案,然后dp出送完前i单最早时间,或者不可行

E. Explosion Exploit

solved (3:23)

可以用dp的思想转移。由于随从的顺序无关,可以将他们的血量排序,总状态数会大大减少。

F. Firing the Phaser

unsolved

G. Game Scheduling

unsolved

H. House Lawn

solved (0:34)

如果\(\frac{10080rc}{t+r} \geq l\),则说明可行。记录可行的机器中,最便宜的那些机器的名称即可。

I. House Lawn

solved (1:00 +1)

由于相邻两数相差至少为一倍,只需要从大到小排序,然后能减就减。最后结果为0,则选中的那些就是答案。

J. Jumbled String

solved (1:57 +2)

可以根据00和11子序列的数量算出0和1的数量,然后根据01和10子序列的数量调整0和1的位置。注意若干边界情况。

K. King’s Colors

solved (3:07)

用最多k种颜色染色有\((k-1)^{n-1}k\)种方案。用恰好$k$种颜色的方案数可以用容斥算出。

 

[Codeforces Round #513] H. Sophisticated Device

题目描述

定义一种机器,一共有5000个寄存器,前两个寄存器初始值分别为a, b,其他寄存器初始值均为1。该机器支持两种操作:

  1. ADD %dest, %src1, %src2:R[dest] = R[src1] + R[src2]
  2. POW %d, %s:R[dest] = pow(R[src], d)

所有操作均模p。要求用不超过5000条指令计算a*b。

题解

首先有加法我们就可以用类似快速幂的方式实现乘一个常数。这样,我们就可以构造出常数0(只需要乘p即可),实现减法(乘p-1获得相反数),除以一个常数(乘上它的逆元)。

精彩的是,我们可以用上述两条指令计算一个数的平方。考虑\(x^d, (x+1)^d, \cdots, (x+d)^d\)的二项展开,每一个都可以表示成\(1, x, x^2, \cdots, x^d\)的线性组合。这样,通过矩阵求逆就可以用\(x^d, (x+1)^d, \cdots, (x+d)^d\)表示出\(x^2\),从而完成了平方的计算。

有了平方操作,就可以用\(xy = ((x+y)^2-x^2-y^2)/2\)计算出两数之积了。

[2017-2018 ACM-ICPC Latin American Regional Programming Contest] D. Daunting device

link: http://codeforces.com/gym/101889

Statement

Given an array of length \( 1 \leq L \leq 10^5 \), initially filled with 1. Define cnt[i] as the number of elements in the array that are equal to i. You should perform no more than \(10^5\) operations. Each operation is four integers \(P, X, A, B\), and let \(M_1 = (A + S^2) \bmod L, M_2 = (A + (S + B)^2) \bmod L\), where S = cnt[P], then assign all elements with indices in \([min(M_1, M_2), max(M_1, M_2)]\) with value X. After all operations performed, output the maximum cnt[i] over all possible values of i.

Solution

The main idea of the solution is just to simulate the operations in a reasonable time complexity. In fact, various possible solutions exist, including segment tree and square root decomposition. However, the fastest solution (at least theoretically) I know is employing balanced binary search trees.

We use the nodes of the balanced binary search tree to represent contiguous intervals of the array with same values. Additionally, we maintain an array “cnt”, which has the meaning as described in problem statement.

When updating an interval [l, r) to value X, we delete all nodes representing the interval [l, r), updating the array “cnt”. Note that this may split the nodes that across the boundaries of given interval. Finally, we inserting a new node which represents [l, r) with value X, and updating “cnt”.

Each operation can be done in amortized \( O(\log L) \) time, since in each operation, only constant number of nodes are inserted, though it is possible that a great number of nodes are deleted. Hence the total time complexity is \( O(n \log L) \).

Source Code

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
using namespace std;
using namespace __gnu_pbds;

#ifdef __LOCAL_DEBUG__
# define _debug(fmt, ...) fprintf(stderr, "\033[94m%s: " fmt "\n\033[0m", \
    __func__, ##__VA_ARGS__)
#else 
# define _debug(...) ((void) 0)
#endif
#define rep(i, n) for (int i=0; i<(n); i++)
#define Rep(i, n) for (int i=1; i<=(n); i++)
#define range(x) (x).begin(), (x).end()
typedef long long LL;
typedef unsigned long long ULL;

int l, c, n;
int cnt[100005];

typedef tree<int, pair<int, int>, less<int>, rb_tree_tag> rbtree;

#define tree shuorgsrh
rbtree tree;

void update(int l, int r, int c) {
  rbtree mpart, rpart;

  tree.split(l - 1, mpart);
  int dif;
  if (tree.size() && 
      (dif = tree.rbegin()->first + tree.rbegin()->second.first - l) > 0) {
    tree.rbegin()->second.first -= dif;
    mpart.insert(make_pair(l, make_pair(dif, tree.rbegin()->second.second)));
  }
  mpart.split(r, rpart);

  if (mpart.size() && 
      (dif = mpart.rbegin()->first + mpart.rbegin()->second.first - r) > 0) {
    mpart.rbegin()->second.first -= dif;
    rpart.insert(make_pair(r, make_pair(dif, mpart.rbegin()->second.second)));
  } 
  for (auto& p : mpart) { 
    cnt[p.second.second] -= p.second.first;
  }
  cnt[c] += r - l;
  tree.insert(make_pair(l, make_pair(r - l, c)));
  tree.join(rpart);
}

int main() {
  scanf("%d%d%d", &l, &c, &n);
  cnt[1] = l;
  tree.insert(make_pair(0, make_pair(l, 1)));
  rep (i, n) {
    int p, x, a, b;
    scanf("%d%d%d%d", &p, &x, &a, &b);
    int m1 = (a + 1ll * cnt[p] * cnt[p]) % l;
    int m2 = (a + 1ll * (cnt[p] + b) * (cnt[p] + b)) % l;
    if (m1 > m2) swap(m1, m2); 
    m2++;
    update(m1, m2, x);
  }
  cout << *max_element(cnt+1, cnt+c+1) << endl;
  return 0;
}