Talldoor

pangu.hs

Yu Cong — Mon, 15 Dec 2025 00:00:00 UT

Tags: hakyll

pangu.js is a tool that inserts spaces into text that mixes Chinese and English characters.

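I find this kind of tool very useful, and this site once used a simplified version of it. However, the JS version makes the whole page load slowly. It would be better if it could run at build time instead.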

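The simplest approach is to run pangu once after Hakyll has generated the HTML. Some people will want to turn pangu into a pandoc filter instead. Deciding how Chinese and English characters meet on the pandoc AST is a bit complicated, but not as complicated as you might think. pangu works on each paragraph independently and never needs information from other paragraphs. In the pandoc AST every paragraph is a Block with the constructor Para [Inline], and Inline covers all the other elements we need to consider. So we only need to check two things within one Inline list: whether a space is needed inside each element, and whether one is needed between two adjacent elements of the list. We do not care what the previous leaf was while traversing the AST.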
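Below is a minimal Haskell sketch of this per-paragraph idea. It does not use pandoc-types. `Inline` here is a hypothetical, simplified stand-in for pandoc's type: it has only `Str`, `Code` and `Space`, and carries `String` instead of `Text`. The CJK test is an assumption that only looks at the CJK Unified Ideographs block (U+4E00 to U+9FFF). Spaces are inserted inside each `Str` and between adjacent elements, and nothing outside the current list is consulted.

```haskell
import Data.Char (isAlphaNum, isAscii)
import Data.List (intersperse)
import Data.Maybe (listToMaybe)

-- Hypothetical, simplified stand-in for pandoc's Inline type.
data Inline = Str String | Code String | Space
  deriving (Eq, Show)

-- Assumption: only the CJK Unified Ideographs block counts as "CJK".
isCJK :: Char -> Bool
isCJK c = c >= '\x4E00' && c <= '\x9FFF'

isLatin :: Char -> Bool
isLatin c = isAscii c && isAlphaNum c

-- A space is needed when CJK meets Latin, in either order.
needsSpace :: Char -> Char -> Bool
needsSpace a b = (isCJK a && isLatin b) || (isLatin a && isCJK b)

-- Split a string at every position where a space is needed.
chunks :: String -> [String]
chunks [] = []
chunks (c : cs) = go c [c] cs
  where
    -- prev: last character seen; cur: current chunk, reversed
    go _ cur [] = [reverse cur]
    go prev cur (x : xs)
      | needsSpace prev x = reverse cur : go x [x] xs
      | otherwise = go x (x : cur) xs

-- The characters of an element that matter at its boundaries.
content :: Inline -> String
content (Str s) = s
content (Code s) = s
content Space = ""

-- First fix spaces inside each Str, then at each boundary between neighbours.
pangu :: [Inline] -> [Inline]
pangu = between . concatMap inside
  where
    inside (Str s) = intersperse Space (map Str (chunks s))
    inside x = [x]
    between (a : b : rest)
      | Just x <- listToMaybe (reverse (content a)),
        Just y <- listToMaybe (content b),
        needsSpace x y =
          a : Space : between (b : rest)
      | otherwise = a : between (b : rest)
    between xs = xs
```

Splitting one `Str` into several `Str` pieces separated by `Space` matches how pandoc readers already represent words and the spaces between them.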

1 Python → Haskell

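I finally started writing it over the New Year holiday. The plan was to port the Python version, since Python is the only one I can read among the languages vinta implemented pangu in.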

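It seems the author never worked out explicit rules for mixed Chinese-English typesetting, such as when to add spaces and when to switch to full-width or Chinese punctuation. This may be partly because Taiwanese and Mainland conventions differ. The regular expressions also have many holes.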

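pangu.hs needs to replace characters and remove spaces according to a set of rules, so those rules should be written down and stored in a list. The program adjusts the text according to the rules in this list, and a user who sees a rule they don't want can simply comment it out.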

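I had no experience with regular expressions in Haskell, so I asked Gemini and ChatGPT for help. Both recommended megaparsec over regex. For find-and-replace you use streamEdit, which finds all non-overlapping sections matching a parser and replaces them.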

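```haskell
import Data.Function (fix)
import Data.Text (Text)
import Data.Void (Void)
import Replace.Megaparsec (streamEdit) -- from the replace-megaparsec package
import Text.Megaparsec (Parsec, try)

type Parser = Parsec Void Text

-- A rule parses one section of text and returns its replacement.
type Rule = Parser Text

type RuleSet = [Rule]

-- Apply one rule repeatedly until the text stops changing.
-- (This never terminates if a rule keeps rewriting its own output.)
applyUntilFixed :: Rule -> Text -> Text
applyUntilFixed rule =
  fix
    ( \loop current ->
        let next = streamEdit (try rule) id current
         in if next == current then next else loop next
    )

-- Run every rule to its fixed point, in list order.
applyRulesRecursively :: RuleSet -> Text -> Text
applyRulesRecursively rules input =
  foldl (flip applyUntilFixed) input rules

-- Run every rule exactly once, in list order.
applyRules :: RuleSet -> Text -> Text
applyRules rules input = foldl (flip applyOnce) input rules
  where
    applyOnce rule = streamEdit (try rule) id
```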

Then you can choose which Rules to enable.
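Below is a dependency-free sketch of such a rule list. It models a rule as a plain `String -> String` function rather than the megaparsec `Parser Text` used above, so it runs without extra packages. The names `isCJK`, `cjkThenLatin`, `latinThenCjk`, `squeezeSpaces` and `ruleSet` are hypothetical, and `isCJK` only checks the CJK Unified Ideographs block (U+4E00 to U+9FFF).

```haskell
import Data.Char (isAlphaNum, isAscii)

-- Stand-in: a rule is a plain rewrite function here, not a megaparsec parser.
type Rule = String -> String

-- Assumption: only the CJK Unified Ideographs block counts as "CJK".
isCJK :: Char -> Bool
isCJK c = c >= '\x4E00' && c <= '\x9FFF'

isLatin :: Char -> Bool
isLatin c = isAscii c && isAlphaNum c

-- Insert a space between every adjacent pair (a, b) satisfying p.
spaceBetween :: (Char -> Char -> Bool) -> Rule
spaceBetween p (a : b : rest)
  | p a b = a : ' ' : spaceBetween p (b : rest)
  | otherwise = a : spaceBetween p (b : rest)
spaceBetween _ s = s

cjkThenLatin, latinThenCjk, squeezeSpaces :: Rule
cjkThenLatin = spaceBetween (\a b -> isCJK a && isLatin b)
latinThenCjk = spaceBetween (\a b -> isLatin a && isCJK b)
-- Collapse runs of spaces into a single space.
squeezeSpaces (' ' : ' ' : rest) = squeezeSpaces (' ' : rest)
squeezeSpaces (c : rest) = c : squeezeSpaces rest
squeezeSpaces [] = []

-- The enabled rules, applied in order. Comment a line out to disable it.
ruleSet :: [Rule]
ruleSet =
  [ cjkThenLatin
  , latinThenCjk
  -- , squeezeSpaces
  ]

-- Apply one rule until the text stops changing.
untilFixed :: Rule -> Rule
untilFixed rule s
  | s' == s = s
  | otherwise = untilFixed rule s'
  where
    s' = rule s

-- Run each enabled rule to its fixed point, in list order.
applyRuleSet :: [Rule] -> String -> String
applyRuleSet rules input = foldl (flip untilFixed) input rules
```

Keeping the rules as plain values in a list means enabling or disabling one is a one-line edit, and the order of the list is the order of application.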

Linear time linebreaker?

Yu Cong — Tue, 28 Oct 2025 00:00:00 UT

Tags: alg, typography

Recently, Typst merged a PR that includes support for character-level justification. This feature extends the line-breaking algorithm: it allows changes in the inter-character space within each word, and this adjustment may affect line-break decisions. For line breaking, Typst uses the Knuth–Plass line-breaking algorithm.

See this nice blog post for a comparison between the layout models of TeX and Typst.

Let’s forget about typography and focus on the algorithmic part of line-breaking problems.

1 line-breaking

Problem 1 (line breaking)

Given a sequence $S$ and a cost function $c$ mapping contiguous subsequences (substrings) of $S$ to $\mathbb{R}$, divide the sequence $S$ (the paragraph) into substrings $S_1,\dots,S_k$ (lines) such that $\sum_{i\in [k]} c(S_i)$ is minimized.

We over-simplify things here. In real-world typography, the cost function depends not only on the line $S_i$ but also on other things, like the length of the line.

The input size $n$ is the number of elements in $S$. For now, the cost function $c$ is assumed to be given as an oracle.

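This line-breaking problem admits an $O(n^2)$ dynamic programming algorithm. Here $f(i)$ is the minimum cost of breaking the part of $S$ that ends at position $i$:

$$f(i)=\min_{0\leq j<i}\left\{ f(j)+c(S[j..i]) \right\},\qquad f(0)=0.$$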
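Below is a sketch of this dynamic program in Haskell. It assumes half-open indices: `cost j i` prices the line $S[j..i-1]$, so `f i` covers the first `i` elements. The memo table is a lazily filled `Data.Array`. `squaredCost` is the $(len(X)-parwidth)^2$ cost discussed below, with hypothetical per-element widths and no inter-word glue.

```haskell
import Data.Array (Array, listArray, (!))
import Data.List (minimumBy)
import Data.Ord (comparing)

-- | O(n^2) dynamic program for Problem 1.
-- Half-open indices: @cost j i@ is the cost of the line S[j..i-1], and
-- f(i) is the cheapest layout of the first i elements.
-- Returns (f(n), end positions of the chosen lines).
optimalBreaks :: Int -> (Int -> Int -> Double) -> (Double, [Int])
optimalBreaks n cost = (fst (table ! n), reverse (ends n))
  where
    -- table ! i = (f(i), the j attaining the minimum); filled lazily
    table :: Array Int (Double, Int)
    table = listArray (0, n) (map cell [0 .. n])
    cell 0 = (0, 0)
    cell i = minimumBy (comparing fst) [(fst (table ! j) + cost j i, j) | j <- [0 .. i - 1]]
    -- follow the stored predecessors back from n
    ends 0 = []
    ends i = i : ends (snd (table ! i))

-- | The squared cost (len(X) - parwidth)^2, with one width per element.
squaredCost :: [Double] -> Double -> Int -> Int -> Double
squaredCost widths parwidth j i = (sum (take (i - j) (drop j widths)) - parwidth) ^ (2 :: Int)
```

For widths `[3,2,2,3]` and `parwidth = 5`, the lines `[3,2]` and `[2,3]` fit exactly, so `optimalBreaks 4 (squaredCost [3,2,2,3] 5)` gives `(0.0,[2,4])`.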

2 SMAWK algorithm

Wikipedia says that this problem can be solved in linear time using the SMAWK algorithm, which finds the minimum in each row of an $n\times m$ totally monotone matrix in $O(n+m)$ time (see section 6.5 in Jeff Erickson’s lecture notes).
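SMAWK itself is fiddly to get right. As a stand-in, the sketch below uses the simpler divide-and-conquer row-minima search. It exploits the same property (leftmost row minima move right as the row index grows) but runs in $O((n+m)\log n)$ time rather than SMAWK's $O(n+m)$. The matrix is passed as a function `m row col`, and all names are hypothetical.

```haskell
-- | Leftmost minimum column of every row of a rows x cols matrix whose
-- leftmost row minima are nondecreasing (true for Monge matrices).
-- Divide and conquer, not SMAWK.
rowMinima :: (Int -> Int -> Double) -> Int -> Int -> [Int]
rowMinima m rows cols = go 0 (rows - 1) 0 (cols - 1)
  where
    -- rows r0..r1 have their minima somewhere in columns c0..c1
    go r0 r1 c0 c1
      | r0 > r1 = []
      | otherwise =
          let mid = (r0 + r1) `div` 2
              -- tuple order breaks ties toward the smaller column
              best = snd (minimum [(m mid c, c) | c <- [c0 .. c1]])
           in go r0 (mid - 1) c0 best ++ [best] ++ go (mid + 1) r1 best c1
```

The middle row is solved by a full scan of its allowed columns. Its answer then splits the remaining search space, which is where the monotonicity assumption is used.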

The matrix here would be $M[i,j]=f(j)+c(S[j..i])$. The dynamic programming algorithm is in fact finding the minimum of each row. If we can show that $M$ is totally monotone, then we can make optimal line-breaking decisions in linear time. A matrix is monotone if each row’s (leftmost) minimum occurs in a column equal to or greater than the column of the previous row’s minimum. It is totally monotone if every submatrix is monotone. A special case of totally monotone matrices is the Monge array, which requires

$$M[i,k]+M[j,\ell]\leq M[i,\ell]+M[j,k]$$

for all $i<j$ and $k<\ell$. Now we focus on checking the Monge property of $M$.

Doing some easy math, one can see that the Monge property depends on the following inequality on the cost function $c$:

$$c(S[i..k])+c(S[j..\ell])\leq c(S[i..\ell])+c(S[j..k])\qquad \forall\, i\leq j\leq k\leq \ell.$$

This looks like some intersecting supermodular property on set functions, but our domain here is the collection of substrings.
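Below is a brute-force check of this quadrangle inequality for small inputs. It uses half-open indices: `cost j i` prices $S[j..i-1]$, so the closed substring $S[i..k]$ corresponds to `cost i (k+1)`. The squared cost passes because it is convex in the line length. Negating it gives a concave cost, used here only as a contrast, and that version fails. Other costs, such as a badness function, can be plugged in the same way to search for counterexamples. `quadrangle` and `squaredCost` are hypothetical names.

```haskell
-- | Brute-force check of the quadrangle inequality
--   c(a,c) + c(b,d) <= c(a,d) + c(b,c)  for all 0 <= a <= b <= c <= d <= n,
-- where cost j i is the cost of the half-open line S[j..i-1].
quadrangle :: Int -> (Int -> Int -> Double) -> Bool
quadrangle n cost =
  and
    [ cost a c + cost b d <= cost a d + cost b c + 1e-9
    | a <- [0 .. n], b <- [a .. n], c <- [b .. n], d <- [c .. n] ]

-- | The squared cost (len(X) - parwidth)^2, with one width per element.
squaredCost :: [Double] -> Double -> Int -> Int -> Double
squaredCost widths parwidth j i = (sum (take (i - j) (drop j widths)) - parwidth) ^ (2 :: Int)
```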

As cited on Wikipedia, this paper states that the problem of optimally breaking the text of a paragraph into lines can be solved in linear time. Its cost function is $c(X)=(len(X)-parwidth)^2$, where $len(X)$ is the sum of character widths in $X$ and $parwidth$ is the width of the paragraph. This cost function does generate a Monge array, but it is not the cost used in the original paper of Knuth and Plass.

They introduced several cost functions: “first-fit”, “best-fit” and “demerits”, with increasing complexity. For simplicity of analysis, we consider the “best-fit” case, where the cost function is the sum of badness and penalty.

Penalty relates to the ending character of each line. For example, if we want as few hyphens as possible, the penalty for an ending hyphen should be large. Badness measures how much we need to stretch or shrink the white spaces in a line to make it fit. More formally, badness is

$$c\left|\frac{parwidth-L}{W}\right|^3$$

where $L$ is the sum of the default lengths of all characters in this line, $W$ is the total amount by which its whitespace can stretch (or shrink), and $c$ is some universal constant factor.

Note that penalties do not affect the Monge property, since the ending characters ($S[k]$ and $S[\ell]$) occur the same number of times on the LHS and the RHS. However, for badness, one can verify that $M$ is not always a Monge array. I think total monotonicity is violated too, but I don’t have a counterexample now…

3 character-level operations

Assume the cost function is still a parabola on line width and we further add some character-level operations.

We can break all ligatures in one line. The cost function becomes $c(X)=\min\{c_1(X),c_2(X)\}$, where $c_1$ and $c_2$ are the costs with and without the ligatures broken.
Stretch/shrink font glyphs and adjust kerning. $c(X)=\min_{\theta}(\theta \operatorname{len}_1(X)+\operatorname{len}_2(X)-parwidth)^2$, where $\operatorname{len}_1$ and $\operatorname{len}_2$ are the total lengths of the whitespaces and of the other characters in $X$, and $\theta\in [lb,ub]$ is the stretching factor.

Unfortunately, neither of these operations preserves the Monge property, and I believe they break total monotonicity as well.

Matroid girth

Yu Cong — Tue, 12 Aug 2025 00:00:00 UT

Tags: matroid, optimization, combinatorics

Let $M=(E,\mathcal I)$ be a matroid with non-negative weights $w:E\to \mathbb{R}_{\geq 0}$. The girth of $M$ is

$$\min \left\{ \sum_{e\in C} w(e): \text{$C$ is a circuit of $M$} \right\}.$$

The cogirth of $M$ is the girth of the dual matroid of $M$.

Computing girth is NP-hard for binary matroids but can be done in polynomial time for graphs. Wikipedia lists some negative complexity results, which mainly concern more general matroid classes than binary matroids. So here are some positive results filling the gap between graphic matroids and binary matroids. (Results can be found in https://matroidunion.org/?p=1106.) Proofs that you don’t see here can easily be found in the references.

1 Regular matroid

Theorem 1 (Seymour decomposition [1])

Every regular matroid may be constructed by combining graphic matroids, cographic matroids, and a certain ten-element regular matroid that is neither graphic nor cographic, using 3 binary operations:

1-sum is the direct sum of two matroids.
2-sum is patching two matroids on 1 common element.
3-sum is patching two matroids on 3 common elements forming a 3-circuit in each matroid.

This decomposition can be found in polynomial time: one can decide in polynomial time whether a matroid $M$ can be decomposed into $M_1$ and $M_2$ using a 1/2/3-sum. See https://www.emis.de/monographs/md/index.html.

Theorem 1 leads to a natural algorithm for computing the girth of regular matroids. The decomposition of a regular matroid gives a binary tree. Each node is a regular matroid, and each leaf is either (co)graphic or a special 10-element regular matroid. Every non-leaf node in the decomposition tree represents a regular matroid $M$ that is the 1/2/3-sum of its two direct descendants $M_1$ and $M_2$. Let $A \oplus_i B$ be the $i$-sum of $A$ and $B$ for $i\in [3]$. Now there are only 3 cases:

$M=M_1\oplus_1 M_2$ . Direct sum does not create new circuit. $\mathop{\mathrm{girth}}(M)=\min \left\{ \mathop{\mathrm{girth}}(M_1),\mathop{\mathrm{girth}}(M_2) \right\}$ .
$M=M_1\oplus_2 M_2$. Let $e$ be the common element of $E(M_1)$ and $E(M_2)$. In this case there may be new circuits. However, any circuit of $M$ that is not a circuit of $M_1$ or $M_2$ must be of the form $(C_1\cup C_2)\setminus \left\{ e \right\}$, where $C_i$ is a circuit of $M_i$ containing $e$. To find the minimum-weight new circuit, we compute the minimum circuit in $M_1$ containing $e$ (say $C_1^*$). We then replace the weight $w(e)$ in $M_2$ with $w(C_1^*)-w(e)$ and find the minimum-weight circuit in $M_2$ containing $e$. The girth of $M$ is the minimum among $\mathop{\mathrm{girth}}(M_1\setminus e)$, $\mathop{\mathrm{girth}}(M_2\setminus e)$ and $\min\left\{ w(C): \text{$C$ is a new circuit} \right\}$.
We need to prove that all these operations can be done in polynomial time.
For the 2/3-sum case we need to find, in at least one of the summands, the minimum circuit that contains a common element $e$. However, finding such a minimum circuit in regular matroids is not known to be polynomially solvable. To understand what’s happening here we need to look into ~~Seymour’s proof~~
I realized that one doesn’t need to understand Seymour’s 55-page paper to see why the desired operations can be done in polynomial time…
The proof of Theorem 1 has 3 parts:
- There is a special 10-element regular matroid $R_{10}$ such that any regular matroid can be obtained by 1/2-sums from regular matroids without $R_{10}$ minor and copies of $R_{10}$ . (Now we can assume that we are working with regular matroids which have no $R_{10}$ minor and are not separable via 1/2-sum.)
- There is another 12-element regular matroid $R_{12}$ such that any regular matroid can be obtained by 1/2/3-sums from matroids without $R_{12}$ minor. (Now we are working with regular matroids that are not separable via 1/2/3-sum and have no $R_{10}$ or $R_{12}$ minors.)
- Every 3-connected regular matroid which is neither graphic nor cographic has an $R_{10}$ or $R_{12}$ minor. Let $M$ be a matroid. $M$ is 3-connected iff $M$ is not expressible as a 1- or 2-sum. (cf.[1] 2.10(b)) It follows that the remaining regular matroids are graphic or cographic.
the decomposition tree.
Instead of considering our binary decomposition tree, we now construct a new graph $G$. Each vertex represents a graphic matroid, a cographic matroid or $R_{10}$, and two vertices are adjacent if the corresponding matroids are patched using a 1/2/3-sum. We claim that there is no cycle in this graph. The graph is connected. Assume that there is a cycle, and let $M_1,M_2$ be two matroids whose corresponding vertices are on the cycle. Consider the LCA $M$ of $M_1$ and $M_2$ in the binary tree. $M$ represents a connected subgraph $H\subset G$ that contains $M_1$ and $M_2$ but not the entire cycle, since otherwise there would be two 1/2/3-sum operations between two regular matroids. However, $M_1$ and $M_2$ are still connected in $G-E[H]$ since they lie on the same cycle, which contradicts the uniqueness of the LCA.
Thus $G$ is a tree, and we can always assume that one of the summands is a graphic matroid, a cographic matroid or $R_{10}$. In those matroids, the minimum circuit containing a fixed element can be found in polynomial time. Moreover, there is an algorithm that computes the tree in cubic time [2].
$M=M_1\oplus_3 M_2$ . Similar to the 2-sum case. There are only 3 common elements. We can enumerate all circuits of $M_1$ which contain one of the common elements.
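Below is a sketch of the 2-sum step in the special case where both summands are graphic, so that circuits are cycles of a graph. The minimum circuit through an edge $e=uv$ is $w(e)$ plus the shortest $u$-$v$ path in $G-e$, found here with Dijkstra since the weights are non-negative. `girth2Sum` follows the weight-replacement recipe for $M_1\oplus_2 M_2$ above and takes the minimum with $\mathop{\mathrm{girth}}(M_1\setminus e)$ and $\mathop{\mathrm{girth}}(M_2\setminus e)$. All function names are hypothetical. Graphs are edge lists, and the common element is given by its index in each list.

```haskell
import Data.List (foldl', minimumBy)
import qualified Data.Map.Strict as Map
import Data.Ord (comparing)

-- An undirected edge (u, v, w) with w >= 0.
type Edge = (Int, Int, Double)

-- | Dijkstra distance from s to t using only the given edges.
shortest :: [Edge] -> Int -> Int -> Maybe Double
shortest es s t = go (Map.singleton s 0) Map.empty
  where
    go frontier done
      | Map.null frontier = Map.lookup t done
      | otherwise =
          let (v, d) = minimumBy (comparing snd) (Map.toList frontier)
              done' = Map.insert v d done
              relax fr (a, b, w)
                | a == v, Map.notMember b done' = Map.insertWith min b (d + w) fr
                | b == v, Map.notMember a done' = Map.insertWith min a (d + w) fr
                | otherwise = fr
           in go (foldl' relax (Map.delete v frontier) es) done'

deleteAt :: Int -> [a] -> [a]
deleteAt i xs = [x | (j, x) <- zip [0 ..] xs, j /= i]

-- | Weight of the minimum cycle through edge i (edge i included).
minCycleThrough :: [Edge] -> Int -> Maybe Double
minCycleThrough es i =
  let (u, v, w) = es !! i
   in (w +) <$> shortest (deleteAt i es) u v

minMaybe :: [Maybe Double] -> Maybe Double
minMaybe xs = case [x | Just x <- xs] of
  [] -> Nothing
  ys -> Just (minimum ys)

-- | Girth of the cycle matroid of a graph.
girth :: [Edge] -> Maybe Double
girth es = minMaybe [minCycleThrough es i | i <- [0 .. length es - 1]]

-- | Girth of M(G1) 2-summed with M(G2) along edges e1 and e2.
girth2Sum :: ([Edge], Int) -> ([Edge], Int) -> Maybe Double
girth2Sum (g1, e1) (g2, e2) =
  minMaybe [girth (deleteAt e1 g1), girth (deleteAt e2 g2), newCircuit]
  where
    (_, _, w1) = g1 !! e1
    newCircuit = do
      c1 <- minCycleThrough g1 e1 -- w(C_1^*)
      let reweight j (a, b, w) = if j == e2 then (a, b, c1 - w1) else (a, b, w)
          g2' = zipWith reweight [0 ..] g2
      minCycleThrough g2' e2
```

For example, gluing a triangle and a 4-cycle along one unit-weight edge and deleting it leaves a 5-cycle, and `girth2Sum` returns `Just 5.0`.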

However, deciding whether a regular matroid has a circuit of length at most k containing two fixed elements is FPT.

2 Proper minor-closed class of binary matroids

The most important problem in this field is the following.

Conjecture 2 (Geelen, Gerards, and Whittle [3])

For any proper minor-closed class $\mathcal M$ of binary matroids, there is a polynomial-time algorithm for computing the girth of matroids in $\mathcal M$.

Similar to Seymour’s decomposition for regular matroids, every proper minor-closed class of binary matroids admits a “decomposition” into graphic matroids and some binary matroids.

Theorem 3 [3]

For each proper minor-closed class $\mathcal M$ of binary matroids, there exist integers $k,t\geq 0$ such that for each vertically $k$-connected matroid $M\in \mathcal M$, there exist matrices $A,P\in \mathbb{F}_2^{r\times n}$ such that $A$ is the incidence matrix of a graph, $r(P)\leq t$, and either $M=M(A+P)$ or $M^*=M(A+P)$.

The matroids $M(A+P)$ in Theorem 3 are called perturbed graphic matroids. Note that we can treat $k$ and $t$ in Theorem 3 as constants, since they are fixed for each minor-closed class.

Using Theorem 3, Conjecture 2 is true if one can prove the following:

there is a polynomial-time alg that finds the girth of $M(A+P)$ ;
One can reduce the problem of computing the girth of members of $\mathcal M$ to that of computing the girth of vertically $k$ -connected members of $\mathcal M$ .

3 Perturbed graphic matroids

Jim Geelen and Rohan Kapadia [4] showed that the (co)girth can be computed in randomized polynomial time for a subclass of binary matroids called perturbed graphic matroids. They reduced the (co)girth problem of perturbed graphic matroids to graph cuts and matchings using $(s,t)$-signed-grafts. IMO the reduction is quite tricky. Let $s$ and $t$ be two non-negative integers. An $(s,t)$-signed-graft is a tuple $(G,S,T,B,C,D)$ such that:

$G$ is a graph,
$S$ is an $s$ -element set disjoint from $V(G)$ ,
$T$ is a $t$ -element set disjoint from $E(G)$ ,
$B,C,D$ are 0-1 matrices.

The incidence matrix of an $(s,t)$-signed-graft $(G,S,T,B,C,D)$ is

$$A = \begin{array}{ccc} & \begin{array}{cc} E(G) & T \end{array} \\ \begin{array}{c} V(G) \\ S \end{array} & \left( \begin{array}{cc} A(G) & B \\ C & D \end{array} \right) \end{array}$$

where $A(G)$ is the incidence matrix of $G$. Denote the matroid $M(A)$ by $M(G,S,T,B,C,D)$.

Lemma 4 [4, Lemma 4.1]

Let $G$ be a graph and let $P\in \mathbb{F}_2^{V(G)\times E(G)}$ be a rank-$t$ matrix. Then there is a $(t,t)$-signed-graft $(G,S,T,B,C,D)$ such that

$$M(A(G)+P)=M(G,S,T,B,C,D)/T.$$

The proof takes $B,C$ as a rank decomposition of $P$ and applies some row operations.

Recall that Theorem 3 says that each vertically $k$-connected matroid $M$ in a proper minor-closed class of binary matroids is either $M(A+P)$ or $M(A+P)^*$. One has to consider the girth and cogirth problems separately.

3.1 Reductions

Lemma 5 (the cogirth part) [4, Lemma 4.2]

Let $(G,S,T,B,C,D)$ be an $(s,t)$-signed-graft and let $S'$ be a one-element set disjoint from $V(G)$. The cogirth of $M(G,S,T,B,C,D)/T$ is the minimum of the cogirths of the matroids $M(G,S',T,B,yC,yD)/T$ taken over all $y\in \mathbb{F}_2^{S'\times S}$.

Proof

To see this lemma, I suggest considering the flats instead of cocycles.

Each flat of $M=M(G,S',T,B,yC,yD)$ is also a flat of $M'=M(G,S,T,B,C,D)$. Let $F'$ be a flat of $M'$ and $F$ be the corresponding set in $M$. If there is an element $e$ of $M\setminus F$ that is linearly representable by vectors in $F$, then $e$ is also representable by vectors in $F'$, by linearity of the multiplication.
For each hyperplane $H$ in $M$ , there is a $y\in \mathbb{F}_2^{S'\times S}$ such that $F'$ is a flat of $M'$ . Note that this only works for cocircuits (hyperplanes) but not cocycles (flats). We can assume that the $A(G),B$ part is empty. Let the first $k$ columns be the hyperplane $H$ . Then the matrix is $M=\begin{pmatrix} H & U \end{pmatrix}.$ We want to show that there is a $y\in \mathbb{F}_2^s$ such that $H^T y=\mathbf{0}$ and $U^T y=\mathbf{1}.$ Let $B$ be a base in this linear matroid. Apply row operations to make $B$ a standard basis (at most one “1” in each column). The intersection of $B$ and $H$ has exactly $r-1$ vectors. Now we construct the vector $y.$ If there is any vector in $B\cap H$ that has a “1” in the $k$ -th coordinate, let $y[k]=0$ ; Otherwise, we set $y[k]=1.$ Note that $H^T y=\mathbf{0}$ and $U^T y=\mathbf{1}.$ Thus $H$ remains a hyperplane in $M'.$

Lemma 6 (the girth part) [4, Lemma 4.3]

Let $(G,S,T,B,C,D)$ be an $(s,t)$-signed-graft and let $T'$ be a one-element set disjoint from $E(G)$. The girth of $M(G,S,T,B,C,D)/T$ is the minimum of the girths of the matroids $M(G,S,T',Bx,C,Dx)/T'$ taken over all $x\in \mathbb{F}_2^{T\times T'}$.

It follows from Lemma 4 that we need to consider the (co)girth of $M/T$, where $M$ is the matroid of an $(s,t)$-signed-graft.

3.2 cogirth → even cuts

By Lemma 5, to compute the cogirth of an $(s,t)$-signed-graft we only need to consider binary matroids of the following kind:

$$A= \begin{array}{ccc} & \begin{array}{cc} E(G) & T \end{array} \\ \begin{array}{r} V(G) \\ \left\{ v \right\} \end{array} & \begin{pmatrix} A(G) & B \\ \sigma & \alpha \end{pmatrix} \end{array}$$

We want to find the cogirth of $M(A)/T$. One useful proposition is the following. (See this post for a proof sketch.)

Proposition 7 [5, Proposition 9.2.2]

Let $A$ be a binary representation of a rank-$r$ binary matroid $M$. Then the cocircuit space of $M$ equals the row space of $A$. Moreover, this space has dimension $r$ and is the orthogonal subspace of the circuit space of $M$.
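Proposition 7 turns cogirth into a minimum-weight codeword question over $\mathbb{F}_2$. For tiny matrices this can be checked by brute force. Enumerate the whole row space and discard vectors that meet $T$, since the cocircuits of $M/T$ are the cocircuits of $M$ that avoid $T$. Then take the smallest nonzero support. The sketch below assumes unit weights and packs each row into an `Int` bitmask, with column $e$ as bit $e$. It takes $2^r$ steps and is only meant for checking small examples. The names are hypothetical.

```haskell
import Data.Bits (popCount, xor, (.&.))

-- Every vector of the row space over F_2 (with repeats); rows are bitmasks.
rowSpace :: [Int] -> [Int]
rowSpace = foldr (\r acc -> acc ++ map (xor r) acc) [0]

-- | Smallest support of a nonzero row-space vector avoiding the columns in
-- tMask, i.e. the unit-weight cogirth of M(A)/T (Nothing if there is none).
cogirthBrute :: [Int] -> Int -> Maybe Int
cogirthBrute rows tMask =
  case [popCount v | v <- rowSpace rows, v /= 0, v .&. tMask == 0] of
    [] -> Nothing
    ws -> Just (minimum ws)
```

For the incidence matrix of a triangle, the rows are `[3,5,6]` (vertex 0 meets edges 0 and 1, vertex 1 meets edges 0 and 2, vertex 2 meets edges 1 and 2). The cogirth is 2, which matches the fact that every bond of a triangle has two edges.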

What we are looking for is the minimum support of a nonzero vector in the row space of $A$ whose support is disjoint from $T$. (Why? We want to find the cocircuit of minimum size. If our matroid is binary, this is exactly the nonzero vector with the fewest 1s in the cocircuit space: in a binary matroid the symmetric difference of (co)circuits contains a (co)circuit and is dependent.) Note that supports of sums of rows of the graph incidence matrix $A(G)$ have a natural interpretation: they are exactly the cuts $\delta(X)$, where $X$ is the set of vertices whose rows are summed. Thus we divide the problem into 2 cases. Let $B[u]$ be the $t$-dimensional binary label of vertex $u$ (its row in $B$).

The row indexed by $\left\{ v \right\}$ is not in the solution. Find the smallest non-empty cut $\delta(X)$ in $G$ such that $\sum_{u\in X}B[u]=\mathbf 0$ .
The row indexed by $\left\{ v \right\}$ is in the solution. Now $\sigma$ represents a subset of edges in $G$ . We want to find a cut $\delta(X)$ such that $\sigma \Delta \delta(X)$ is minimized and non-empty and $\sum_{u\in X}B[u]=\alpha$ .
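Below is a brute-force reference for the first case, useful for testing faster methods on small graphs. It enumerates every vertex set $X$ other than $\emptyset$ and $V$ and keeps those whose labels $B[u]$ sum to $\mathbf 0$ over $\mathbb{F}_2^t$. It then returns the size of the smallest non-empty cut. Labels are packed into `Int` bitmasks, and `minEvenCut` is a hypothetical name. The search is exponential; it is not Geelen and Kapadia's random contraction algorithm.

```haskell
import Data.Bits (shiftL, testBit, xor)

-- | Size of the smallest non-empty cut delta(X) with sum_{u in X} B[u] = 0.
-- Vertices are 0..n-1, and labels !! u is B[u] packed into a bitmask.
minEvenCut :: Int -> [(Int, Int)] -> [Int] -> Maybe Int
minEvenCut n edges labels =
  case [size | x <- [1 .. full - 1], labelSum x == 0, let size = cutSize x, size > 0] of
    [] -> Nothing
    cs -> Just (minimum cs)
  where
    -- bitmask of all vertices; X ranges over proper non-empty subsets
    full = (1 `shiftL` n) - 1 :: Int
    labelSum x = foldr xor 0 [l | (v, l) <- zip [0 ..] labels, testBit x v]
    cutSize x = length [() | (u, v) <- edges, testBit x u /= testBit x v]
```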

These are called $t$-dimensional even-cut problems. Geelen and Kapadia discovered a random contraction algorithm that solves both problems in randomized polynomial time [4].

If the graph is not connected, $\delta(X)$ may be empty even if $X$ is non-empty. Fortunately, Geelen and Kapadia have done the reduction to connected graphs.

Note that the cut size $|\delta(X)|$ is a submodular function on $V(G)$, but $|\delta(X)\Delta \sigma|$ is not necessarily submodular. The first case is minimizing a symmetric submodular function under some congruency constraints.

Problem 8 (Generalised Congruency-Constrained Submodular Minimization, GCCSM)

Let $f:2^N \to \mathbb{Z}$ be submodular, $p$ a prime, $k\in \mathbb{Z}_{\geq 0}$, $r_1,\dots,r_k\in \mathbb{Z}_p$, and $S_1,\dots,S_k\subseteq N$.

$$\begin{aligned} \min& & f(S)& & &\\ s.t.& & |S\cap S_i|&\equiv r_i \pmod p & &\forall i\in [k]\\ & & S&\subseteq N \end{aligned}$$

Nägele, Sudakov and Zenklusen showed that Problem 8 can be solved in polynomial time if the prime $p$ and the number $k$ of congruency constraints are constants [6].

Theorem 9 [6]

Problem 8 can be solved in time $|N|^{2kp+O(1)}$.

I think that currently (Sep 2025) finding a deterministic polynomial-time algorithm for the $t$-dimensional even-cut problem is still open.

3.3 girth → parity cycle + parity join

Similar to the cogirth part, by Lemma 6 we consider the following matrix

$$A= \begin{array}{ccc} & \begin{array}{cc} E(G) & \left\{ f \right\} \end{array} \\ \begin{array}{r} V(G) \\ S \end{array} & \begin{pmatrix} A(G) & b \\ C & d \end{pmatrix} \end{array}$$

and we want to compute the girth of $M(A)/\left\{ f \right\}$. Each edge $e$ of $G$ has a label $c(e)\in\mathbb{F}_2^{s}$ (its column in the submatrix $C$). Contracting an element of a matroid may change its circuits. There are two cases:

The minimum circuit does not contain $\left\{ f \right\}$ . Then the girth of $M(A)/\left\{ f \right\}$ is the same as $M(A)\setminus \left\{ f \right\}$ . In this case we need to find the minimum cycle in $G$ such that the sum of its edge labels is exactly $\mathbf{0}$ . This is the parity cycle problem.
The minimum circuit contains $f$. Let the girth of $M(A)/\left\{ f \right\}$ be $\lambda$. Then $M(A)$ has a minimum circuit that contains $f$ and has size $\lambda+1$. Let $T$ be the set of vertices whose characteristic vector is $b$. To find the minimum circuit of $M(A)$, we want the minimum edge set $F\subset E$ such that the labels of $F$ sum to $d$ and $T$ is exactly the set of odd-degree vertices of $G[F]$. This is called the parity $T$-join problem.

Recently, Schlotter and Sebő found an FPT algorithm for the odd $T$-join problem [7]. Sebő has some open problems on this topic, which can be found here (page 11).

4 More on perturbed graphic matroids

Fomin and others studied FPT algorithms for the Space Cover problem (a generalization of the matroid girth problem) on perturbed graphic matroids [8].

Problem 10 (Space Cover)

Let $G=(V,E)$ be a multigraph on $n$ vertices and $m$ edges, and let $P$ be an $n\times m$ matrix of constant rank. We write $I(G)$ for the incidence matrix of $G$. Given a set of terminal edges $T\subset E$ and an integer $k$, decide if there is an edge set $F\subset E\setminus T$ with $|F|\leq k$ such that $T\subset \operatorname{span}(F)$ in the binary matroid $M(I(G)+P)$.

They show that Space Cover generalizes Steiner tree and multiway cut even when $P$ is absent, and they focus on FPT algorithms with parameter $k$. This problem is solvable in time $k^{O(k)}\operatorname{poly}(n+m)$.
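When $P$ is absent, the matroid is just the cycle matroid $M(G)$. An edge $uv$ lies in $\operatorname{span}(F)$ exactly when $u$ and $v$ are connected in $(V,F)$. The brute-force sketch below decides this special case on tiny graphs by trying every $F\subseteq E\setminus T$ with $|F|\leq k$. It only illustrates the definition and is not the $k^{O(k)}$ algorithm of [8]. Terminal edges are given by their indices, and the names are hypothetical.

```haskell
import Data.List (nub, subsequences)

-- Vertices reachable from s using the edges in es (breadth-first).
reachable :: [(Int, Int)] -> Int -> [Int]
reachable es s = go [s] [s]
  where
    go seen [] = seen
    go seen (x : queue) =
      let new = nub [y | (a, b) <- es, y <- step x a b, y `notElem` seen]
       in go (seen ++ new) (queue ++ new)
    step x a b
      | a == x = [b]
      | b == x = [a]
      | otherwise = []

-- | Brute-force Space Cover on M(G), i.e. the case P = 0: is there
-- F subset of E \ T with |F| <= k such that every terminal edge is spanned?
spaceCover :: [(Int, Int)] -> [Int] -> Int -> Bool
spaceCover edges terminals k = any covers candidates
  where
    indexed = zip [0 :: Int ..] edges
    others = [e | (i, e) <- indexed, i `notElem` terminals]
    candidates = filter ((<= k) . length) (subsequences others)
    -- terminal edge uv is in span(F) iff u and v are connected in (V, F)
    covers f = and [v `elem` reachable f u | (i, (u, v)) <- indexed, i `elem` terminals]
```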

References

[1]

P.D. Seymour, Decomposition of regular matroids, Journal of Combinatorial Theory, Series B. 28 (1980) 305–359 10.1016/0095-8956(80)90075-1.

[2]

K. Truemper, A decomposition theory for matroids. V. Testing of matrix total unimodularity, Journal of Combinatorial Theory, Series B. 49 (1990) 241–281 10.1016/0095-8956(90)90030-4.

[3]

J. Geelen, B. Gerards, G. Whittle, The highly connected matroids in minor-closed classes, Annals of Combinatorics. 19 (2015) 107–123 10.1007/s00026-015-0251-3.

[4]

J. Geelen, R. Kapadia, Computing Girth and Cogirth in Perturbed Graphic Matroids, Combinatorica. 38 (2018) 167–191 10.1007/s00493-016-3445-3.

[5]

J.G. Oxley, Matroid theory, 2nd ed, Oxford University Press, Oxford ; New York, 2011.

[6]

M. Nägele, B. Sudakov, R. Zenklusen, Submodular minimization under congruency constraints, Combinatorica. 39 (2019) 1351–1386 10.1007/s00493-019-3900-1.

[7]

I. Schlotter, A. Sebő, Odd Paths, Cycles, and $T$-Joins: Connections and Algorithms, SIAM Journal on Discrete Mathematics. 39 (2025) 484–504 10.1137/23M158156X.

[8]

F.V. Fomin, P.A. Golovach, D. Lokshtanov, S. Saurabh, M. Zehavi, Covering Vectors by Spaces in Perturbed Graphic Matroids and Their Duals, in: 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 2019: pp. 59:1–59:13 10.4230/LIPIcs.ICALP.2019.59.

Matroid circuit packing and covering

Yu Cong — Sun, 15 Jun 2025 00:00:00 UT

Tags: matroid, optimization, combinatorics

In the previous post, we mainly focused on the algorithmic side of integral and fractional base packing and base covering. In this post we consider packing and covering of matroid circuits.

1 Packing/Covering Defect

Seymour [1] proved the following theorem.

Theorem 1

Let $M=(E,I)$ be a matroid without coloops. Then

$$\theta(M)-\kappa(M)\leq r^*(M)-\nu(M),$$

where $\theta(M)$ is the minimum number of circuits whose union is $E$, $\kappa(M)$ is the number of connected components of $M$, $r^*(M)$ is the corank, and $\nu(M)$ is the maximum number of disjoint circuits.

The left-hand side $\theta(M)-\kappa(M)$ is called the circuit covering defect and the right-hand side $r^*(M)-\nu(M)$ is called the circuit packing defect. I guess the name “covering defect” comes from the fact that $\theta(M)-\kappa(M)$ is the gap between the circuit covering number and the lower bound $\kappa(M)$. We have $\kappa(M)\leq \theta(M)$ because no circuit contains two elements from different components. The packing defect is the set-point dual of the covering version. To see the duality, one can write $\kappa(M)$ as the maximum size of $X\subset E$ such that $|C\cap X|\leq 1$ for every circuit $C$, and $r^*(M)$ as the minimum size of $X\subset E$ such that $|X\cap C|\geq 1$ for every circuit $C$.

2 Complexity

Computing the corank $r^*$ and the component number $\kappa(M)$ is easy. What about $\theta(M)$ and $\nu(M)$? Deciding whether the edges of a sparse split graph (a special case of chordal graphs) can be partitioned into edge-disjoint triangles is NP-complete [2]. So finding $\theta(M)$ and $\nu(M)$ is NP-hard even for some special graphic matroids.

3 Cycle Double Cover

Cycle double cover conjecture is a famous unsolved problem posed by W. T. Tutte, Itai and Rodeh, George Szekeres and Paul Seymour. The cycle double cover conjecture asks whether every bridgeless undirected graph has a collection of cycles such that each edge of the graph is contained in exactly two of the cycles.

[3] is a nice survey. However, there is little discussion of circuit double covers (or even simpler versions of them) on special classes of matroids. For example, this question on math.sx is a relaxation of faithful CDC on matroids.

Problem 2 (Not so faithful circuit cover)

Given a matroid $M=(E,\mathcal I)$ and a non-negative integral weight function $w:E\to \mathbb{Z}_{\geq 0}$, decide if there is a multiset of circuits of $M$ such that each element $e\in E$ is covered by at least 1 and at most $w(e)$ circuits in the multiset.

3.1 Matroids with the circuit cover property

Problem 3 (circuit cover)

Let $M$ be a matroid on ground set $E$ and let $w:E\to \mathbb{Z}_+$ be a weight function. Find a list of circuits of $M$ such that each element $e$ is contained in exactly $w(e)$ circuits in the list.

One can see that certain weights never allow a circuit cover. If $(M,w)$ has a circuit cover, then the weight function $w$ must satisfy the following admissibility conditions:

$w(e)\geq 0$ for any element $e$ ,
$w(D)\equiv 0 \mod 2$ for any cocircuit $D$ ,
if $e\in D$ , then $w(e)\leq w(D-e)$ .
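For graphic matroids, the cocircuits are the bonds (minimal non-empty cuts $\delta(X)$), and every cut is a disjoint union of bonds. So checking the three conditions on all cuts is equivalent to checking them on cocircuits. Below is a brute-force sketch for small graphs, with vertex sets encoded as bitmasks. The third condition is rewritten as $2w(e)\leq w(D)$, and `admissible` is a hypothetical name.

```haskell
import Data.Bits (shiftL, testBit)

-- | Admissibility of an edge weighting for the cycle matroid of a graph on
-- vertices 0..n-1, checked over all cuts delta(X).
admissible :: Int -> [(Int, Int)] -> [Int] -> Bool
admissible n edges w = all (>= 0) w && all ok cuts
  where
    full = (1 `shiftL` n) - 1 :: Int
    -- weights of the edges in delta(X), for every proper non-empty X
    cuts = [[we | ((u, v), we) <- zip edges w, testBit x u /= testBit x v] | x <- [1 .. full - 1]]
    -- even total weight, and w(e) <= w(D - e), i.e. 2 w(e) <= w(D)
    ok d = even (sum d) && all (\we -> 2 * we <= sum d) d
```

On a triangle, all-ones weights are admissible (the triangle itself is a cover), while weights `[2,1,1]` fail the parity condition on the cut around vertex 0.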

A matroid $M$ has the circuit cover property if $(M,w)$ has a circuit cover for every admissible weight $w$. [4] characterized the binary matroids with the circuit cover property.

Theorem 4

Let $M$ be a binary matroid. Then $M$ has the circuit cover property if and only if $M$ has no minor isomorphic to any of $F^*_7$, $R_{10}$, $M^*(K_5)$ or $M(P_{10})$.

3.2 Matroids satisfying $\nu_{k,w}=\tau_{k,w}$

[5] studied a related (and seemingly simpler) optimization variant: how many circuits can we pack with element capacities $k w(e)$?

\begin{aligned} \nu_{k,w}=\max& & \sum_C x_C& & &\\ s.t.& & \sum_{C:e\in C} x_C &\leq k w(e) & &\forall e\in E\\ & & x_C &\in \mathbb{Z}_{\geq 0} \end{aligned}

\begin{aligned} \tau_{k,w}=\min& & \sum_e w(e)&y_e & &\\ s.t.& & \sum_{e\in C} y_e &\geq k & &\forall \text{ circuit $C$}\\ & & y_e &\in \mathbb{Z}_{\geq 0} \end{aligned}

Clearly the linear relaxations of $\nu_{k,w}$ and $\tau_{k,w}$ are LP duals of each other and have the same optimum. For what class of matroids does the equality $\nu_{k,w}=\tau_{k,w}$ hold for every weight function $w$?

When $k=1$ this is relatively simple. First, we can assume that $M$ contains no coloop, since coloops do not appear in any circuit. Suppose that there are two circuits $C_1,C_2$ whose intersection is non-empty. Let $a,b,c$ be the smallest weights of elements in $C_1$, $C_2$ and $C_1\cap C_2$ respectively. It follows by definition that $a\leq c$ and $b\leq c$. The maximum number of circuits we can pack in the matroid $M|_{C_1\cup C_2}$ is $\min(a+b,c)$. Now we further assume that $a,b\leq c\leq a+b$. The minimum weight of elements hitting every circuit is not necessarily $c$. By the circuit axiom, for any $e\in C_1\cap C_2$ there must be another circuit $C'\subseteq (C_1\cup C_2)-e$, which won’t be hit if we only select elements of $C_1\cap C_2$. Thus for the case $k=1$, any matroid satisfying $\nu_{1,w}=\tau_{1,w}$ has no intersecting circuits.

The characterization of matroids satisfying $\nu_{2,w}=\tau_{2,w}$ is the following theorem [5].

Theorem 5 A matroid satisfies $\nu_{2,w}=\tau_{2,w}$ iff none of its minors is isomorphic to $U_{2,4}$, $F_7$, $F_7^*$, $M(K_{3,3})$, $M(K_5^-)$ or $M(K)$, where $K_5^-$ is $K_5$ with one edge deleted and $K$ is a special 4-node graph.

Graph K

Their proof is also based on LP. In fact they prove the following theorem.

Theorem 6

Let $M$ be a matroid. The following statements are equivalent:

$M$ does not contain $U_{2,4},F_7,F_7^*,M(K_{3,3}),M(K_5^-)$ or $M(K)$ as a minor;
the linear system $\{ \sum_{e\in C} y_e \geq 2 \;\forall C, y_e\geq 0\}$ is TDI;
the polytope $\{y: \sum_{e\in C} y_e \geq 2 \;\forall C, y_e\geq 0\}$ is integral.

$2\to 3$ is easy. To show $3\to 1$, they prove two things: if $M$ satisfies 3 then so do its minors, and none of the matroids in 1 satisfies 3. The hard part is proving $1\to 2$, for which they use the following two lemmas.

Lemma 7

If $M$ satisfies 1, then $M=M^*(G)$ for some graph $G$ that contains neither the planar dual of $K$ nor that of $K_5^-$ as a minor.

Proof

A matroid $M$ is regular iff it has no minor isomorphic to $U_{2,4},F_7,F_7^*$. Hence $M$ must be regular, since it satisfies 1. The lemma then follows from the “excluded minor characterization of graphic matroids in regular matroids”: a regular matroid is cographic iff it has no minor isomorphic to $M(K_5)$ or $M(K_{3,3})$ (see Corollary 10.4.3 in Oxley’s Matroid Theory book, 2nd edition). Since $M(K_5^-)$ is a minor of $M(K_5)$, a matroid satisfying 1 excludes both.

Lemma 8

If a graph $G$ contains neither the planar dual of $K$ nor that of $K_5^-$ as a minor, then $M^*(G)$ satisfies 2.

This is the hardest part and it takes a lot of work to prove it.

They first prove a complete characterization, via $0,1,2$-sums, of the graphs that contain neither the planar dual of $K$ nor that of $K_5^-$ as a minor. They then prove that the summing operations preserve the TDI property.

The $0,1,2$-sum theorem looks like this. Let $K^*$ and $P$ be the planar duals of the graphs $K$ and $K_5^-$ respectively.

Theorem 9 (informal, Thm 3.1 in [5])

A simple graph $G$ has no minors $P$ and $K^*$ iff $G$ can be obtained by repeatedly taking $0,1,2$-sums, starting from some small graphs and from some cyclically 3-connected graphs with no minors $P$ and $K^*$.

It remains to show that all the summand graphs in the above theorem have the TDI property.

some notes on TDI (cf. section 22.7 in [6])

Understanding Lasserre Hierarchy

Yu Cong — Thu, 05 Jun 2025 00:00:00 UT

From a Probabilistic Perspective

Tags: optimization, LP

Useful links:

https://sites.math.washington.edu/~rothvoss/lecturenotes/lasserresurvey.pdf
https://web.stanford.edu/class/cs369h/
Laurent’s survey [1]
https://rameesh-paul.github.io/sos.pdf
https://www.ams.jhu.edu/~abasu9/papers/main-Lasserre.pdf
Chapter 3 of Ali Kemal Sinop’s PhD thesis

When I started writing this post, I hadn’t found so many “useful links” yet. The content below and links 1 and 2 only focus on using the Lasserre hierarchy on LPs, while links 3 to 6 cover more general cases.

1 $K\subset [0,1]^n \to K\cap \{0,1\}^n$

We want to solve a 0-1 integer program. Since this task is NP-hard in general, we usually consider its linear relaxation. Different LP formulations have different integrality gaps. For example, consider the following linear relaxation of the max matching IP in non-bipartite graphs.

\begin{aligned} \sum_{e\in \delta(v)} x(e)&\leq 1 & &\forall v\in V\\ x(e)&\in [0,1] & &\forall e\in E \end{aligned}

:( This polytope is not integral. Edmonds proved that the following formulation is integral.

\begin{aligned} \sum_{e\in \delta(v)} x(e)&\leq 1 & &\forall v\in V\\ x(e)&\in [0,1] & &\forall e\in E\\ \sum_{e\in E[U]} x(e) &\leq (|U|-1)/2 & &\forall U\subset V, |U| \text{ odd} \end{aligned}

Schrijver [2] showed that those odd constraints can be obtained by adding cutting planes to the previous polytope. Fortunately, the matching polytope has a polynomial-time separation oracle. For harder problems, however, adding cutting planes may make the program NP-hard to solve. The Lasserre hierarchy is a method for strengthening the polytope so that it approaches the integer hull. It has provably good properties and keeps the program polynomial-time solvable, as long as only a constant number of levels is used.

2 Probability Perspective

There is a good interpretation of the linear relaxation of 0-1 integer programs. Let $K=\left\{ x\in \mathbb{R}^n| Ax\geq b \right\}\subset [0,1]^n$ be the polytope of the linear relaxation. The goal of solving the integer program is to describe all possible discrete distributions over $K\cap \left\{ 0,1 \right\}^n$. Note that for a fixed distribution the expected position

(\sum_p \mathop{\mathrm{Pr}}[p]\, p_1,\dots,\sum_p \mathop{\mathrm{Pr}}[p]\, p_n)^T,

where $p$ ranges over $K\cap \left\{ 0,1 \right\}^n$, is in $\mathop{\mathrm{conv}}(K\cap \left\{ 0,1 \right\}^n)$. Iterating over all possible distributions gives us the integer hull. Hence we can find an integral optimal solution if we have access to all distributions over the integer points.

For any $x\in K$, $x_i$ can be seen as the probability of $X_i=1$. We only care about discrete distributions on feasible integral points. However, each $x\in K$ only describes some marginal probabilities, and these marginal probabilities may not even be feasible. Consider the following 2D example. Any point in $\text{green area}\setminus \text{orange area}$ is not the marginal distribution of any possible joint distribution over $(0,0)$, $(1,0)$ and $(1,1)$. The idea is to iteratively prune this area.

Now we need to think about how to represent all possible joint distributions. One natural way is to use a vector $y\in \mathbb{R}^{2^n}$ that stores the probability of every possible integer point in $\left\{ 0,1 \right\}^n$. However, this method does not work well with our existing marginal probabilities. Instead, let $y\in \mathbb{R}^{2^{n}}$ be a vector such that $y_I=\mathop{\mathrm{Pr}}[\bigwedge_{i\in I}(X_i=1)]$ and $y_\emptyset=1$. Computing all feasible $y$ is the same as finding all possible joint discrete distributions on the integer points. We have to add more constraints to make $y$ come from some joint distribution and to make $(y_{\left\{ 1 \right\}},...,y_{\left\{ n \right\}})^T\in K$.

2.1 Feasible Probability

For now let’s forget about $K$. Consider $y\in [0,1]^{2^{[n]}}$ and a discrete distribution $D$ on $\left\{ 0,1 \right\}^n$, and suppose we want $y_I=\mathop{\mathrm{Pr}}_{D}[\bigwedge_{i\in I}(X_i=1)]$. In fact, there is a one-to-one correspondence between $y$ and $D$. If $D$ is given, computing $y_I$ is easy for any $I\subseteq [n]$. If $y$ is given, recovering $D$ means solving a system of $2^n$ linear equations in $2^n$ variables. $2^n-1$ of the equations come from the $y_I$ with $I\neq\emptyset$, and the remaining one is $\sum_p D(p)=1$, i.e. $y_\emptyset=1$. Thus, with a slight abuse of notation, we will refer to $y$ as a distribution.
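The following is a minimal Python sketch of this correspondence. It makes three illustrative choices: a distribution is a dict from 0-1 tuples to probabilities, coordinates are indexed $0,\dots,n-1$ rather than $1,\dots,n$, and the helper names are made up for this example. Going from $D$ to $y$ is just summation. Going back is Möbius inversion over supersets, which is one explicit way to solve the triangular linear system described above.

```python
from itertools import product


def subsets(n):
    """All subsets of {0, ..., n-1}, as frozensets."""
    return [frozenset(i for i in range(n) if mask >> i & 1) for mask in range(1 << n)]


def moments_from_distribution(D, n):
    """y_I = Pr_D[X_i = 1 for all i in I] for every subset I."""
    return {I: sum(pr for p, pr in D.items() if all(p[i] == 1 for i in I))
            for I in subsets(n)}


def distribution_from_moments(y, n):
    """Moebius inversion over supersets:
    D(p) = sum over J containing ones(p) of (-1)^(|J| - |ones(p)|) * y_J."""
    D = {}
    for p in product((0, 1), repeat=n):
        ones = frozenset(i for i in range(n) if p[i] == 1)
        D[p] = sum((-1) ** (len(J) - len(ones)) * val
                   for J, val in y.items() if ones <= J)
    return D


# The three points of the 2D example; the point (0, 1) gets probability 0.
D = {(0, 0): 0.2, (1, 0): 0.3, (1, 1): 0.5}
y = moments_from_distribution(D, 2)     # y_{0} = 0.3 + 0.5, y_{1} = y_{0,1} = 0.5
back = distribution_from_moments(y, 2)  # recovers D; back[(0, 1)] is 0 up to rounding
```

For an arbitrary $y$, the inversion still returns numbers that sum to $y_\emptyset$, but some of them may be negative. The psd conditions below are what rule that out.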

We work with the 2D example first. Let $x=(x_1,x_2)^T\in K$ be a marginal distribution. One can see that $y=(1,x_1,x_2,\mathop{\mathrm{Pr}}[X_1=X_2=1])^T$, and the last entry is not arbitrary. In fact, $\mathop{\mathrm{Pr}}[X_1=X_2=1]$ must lie in the range

[\max(0, x_1+x_2-1),\min(x_1,x_2)].
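The range comes from writing the four atoms of the joint law of $(X_1,X_2)$ in terms of $x_1$, $x_2$ and $p_{12}=\mathop{\mathrm{Pr}}[X_1=X_2=1]$, then requiring each atom to be nonnegative. Here is a small sketch (the function names are illustrative):

```python
def joint_from_marginals(x1, x2, p12):
    """Joint law of (X1, X2) determined by the marginals and p12 = Pr[X1 = X2 = 1]."""
    return {
        (1, 1): p12,
        (1, 0): x1 - p12,
        (0, 1): x2 - p12,
        (0, 0): 1.0 - x1 - x2 + p12,
    }


def joint_range(x1, x2):
    """All four atoms are >= 0 iff max(0, x1 + x2 - 1) <= p12 <= min(x1, x2)."""
    return max(0.0, x1 + x2 - 1.0), min(x1, x2)


def is_valid(x1, x2, p12, tol=1e-12):
    """True if p12 yields a genuine probability distribution (up to rounding)."""
    return all(v >= -tol for v in joint_from_marginals(x1, x2, p12).values())


lo, hi = joint_range(0.7, 0.6)   # the interval [x1 + x2 - 1, min(x1, x2)] here
```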

To ensure that $y$ is indeed a probability distribution, we consider the moment matrix. The moment matrix $M(y)$ has size $2^n \times 2^n$. Its entry $M(y)[I,J]$ is defined as the expectation $E[\prod_{i\in I}X_i\prod_{j\in J}X_j]=E[\prod_{i\in I\cup J}X_i]=y_{I\cup J}$. This uses $X_i^2=X_i$, and the only non-zero term is $1\cdot \mathop{\mathrm{Pr}}[\bigwedge_{i\in I\cup J}(X_i=1)]=y_{I\cup J}$. The expectation is taken over the distribution defined by $y$.

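Lemma 1

For any probability distribution $y$, the moment matrix is psd.

Proof

We need to verify $z^T M(y) z\geq 0$ for any $z$:

\begin{aligned} z^T M(y) z &= \sum_I \sum_J z_I y_{I\cup J} z_J\\ &= \sum_I \sum_J z_I E[\prod_{i\in I\cup J} X_i] z_J\\ &= E\left[\left( \sum_I (z_I \prod_{i\in I} X_i)\right)^2 \right]\geq 0 \end{aligned}

The last equality uses $\prod_{i\in I\cup J}X_i=\prod_{i\in I}X_i\prod_{j\in J}X_j$ for 0-1 variables. The result is nonnegative because it is the expectation of a square.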

Note that a sum of squares appears in the proof. The Lasserre hierarchy has deep connections with SOS optimization and is also known as the sum-of-squares hierarchy.
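Here is a quick numerical check of the identity in the proof of Lemma 1, as a sketch. It builds $M(y)$ for a small distribution and compares $z^TM(y)z$ with $E[(\sum_I z_I\prod_{i\in I}X_i)^2]$. The distribution is a dict from 0-1 tuples to probabilities, coordinates are indexed from $0$, and the names are illustrative.

```python
def subsets(n):
    """All subsets of {0, ..., n-1}, as frozensets, in a fixed order."""
    return [frozenset(i for i in range(n) if mask >> i & 1) for mask in range(1 << n)]


def moments(D, n):
    """y_I = Pr_D[X_i = 1 for all i in I]."""
    return {I: sum(pr for p, pr in D.items() if all(p[i] == 1 for i in I))
            for I in subsets(n)}


def moment_matrix(y, n):
    """M(y)[I][J] = y_{I ∪ J}, with rows and columns in the order of subsets(n)."""
    idx = subsets(n)
    return [[y[I | J] for J in idx] for I in idx]


def quad_form(M, z):
    """z^T M z."""
    return sum(z[r] * M[r][c] * z[c] for r in range(len(z)) for c in range(len(z)))


def expected_square(D, n, z):
    """E_D[(sum_I z_I * prod_{i in I} X_i)^2], the last line of the proof of Lemma 1."""
    total = 0.0
    for p, pr in D.items():
        s = sum(z_I for I, z_I in zip(subsets(n), z) if all(p[i] == 1 for i in I))
        total += pr * s * s
    return total


D = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
M = moment_matrix(moments(D, 2), 2)
z = [0.5, -1.0, 2.0, -0.3]          # one coefficient per subset, in the order of subsets(2)
print(quad_form(M, z), expected_square(D, 2, z))   # the two sides of the identity
```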

Lemma 2

If $M(y)$ is psd, then $y$ is a probability distribution.

It is easy to see that $y_I\in [0,1]$ for all $I\subseteq [n]$. Consider the following submatrix, with rows and columns indexed by $\emptyset$ and $I$:

\begin{bmatrix} y_\emptyset & y_I\\ y_I & y_I \end{bmatrix}

It is psd since $M(y)$ is psd, so $y_I\geq 0$. Since $y_\emptyset=1$, its determinant is $y_I(1-y_I)\geq 0$, hence $y_I\leq 1$.

Let $\mathop{\mathrm{Pr}}_D[p]$ be the probability of selecting $p\in\left\{ 0,1 \right\}^n$ from $D$. It remains to prove that the following system of linear equations has a solution such that $\mathop{\mathrm{Pr}}_D[p]\in [0,1]$ for all $p$:

\begin{aligned} y_{[n]} &= \mathop{\mathrm{Pr}}_D[\mathbf 1]\\ y_{[n]\setminus \left\{ n \right\}} &= \sum_{p:\bigwedge\limits_{i\in [n-1]}(p_i=1)} \mathop{\mathrm{Pr}}_D[p]\\ y_{[n]\setminus \left\{ n-1 \right\}} &= \sum_{p:\bigwedge\limits_{i\in [n]\setminus \left\{ n-1 \right\}}(p_i=1)} \mathop{\mathrm{Pr}}_D[p]\\ &\vdots \\ y_{\left\{ 1 \right\}} &= \sum_{p:p_1=1} \mathop{\mathrm{Pr}}_D[p]\\ y_\emptyset &= \sum_p \mathop{\mathrm{Pr}}_D[p] \end{aligned}

I believe this can be proven with the idea of Lemma 2 here.
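Below is a sanity check of Lemmas 1 and 2 on the 2D example, as a sketch. The psd test is a pivoted Schur-complement (Cholesky-style) elimination written for this illustration, not a library routine, and the tolerance is an arbitrary choice. Scanning $p_{12}=y_{\{1,2\}}$ should show that $M(y)$ is psd exactly when $p_{12}\in[\max(0,x_1+x_2-1),\min(x_1,x_2)]$.

```python
def is_psd(A, tol=1e-9):
    """Pivoted Schur-complement test for a symmetric matrix A. Eliminating a positive
    pivot preserves psd-ness of the remaining block; once no positive pivot is left,
    a psd matrix must be (numerically) zero on that block."""
    a = [[float(v) for v in row] for row in A]
    remaining = list(range(len(a)))
    while remaining:
        p = max(remaining, key=lambda i: a[i][i])
        d = a[p][p]
        if d <= tol:
            return all(abs(a[i][j]) <= tol for i in remaining for j in remaining)
        remaining.remove(p)
        for i in remaining:
            for j in remaining:
                a[i][j] -= a[i][p] * a[p][j] / d
    return True


def moment_matrix_2d(x1, x2, p12):
    """M(y) for y = (1, x1, x2, p12); rows/columns indexed by {}, {1}, {2}, {1,2}."""
    idx = [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})]
    y = dict(zip(idx, (1.0, x1, x2, p12)))
    return [[y[I | J] for J in idx] for I in idx]


x1, x2 = 0.7, 0.6   # the feasible range for p12 is [0.3, 0.6]
for p12 in (0.2, 0.3, 0.45, 0.6, 0.65):
    print(p12, is_psd(moment_matrix_2d(x1, x2, p12)))
```

This is not a coincidence. One way to see it, sketched as a standard argument: let $D$ be the Möbius inverse of $y$ and let $Z[I,p]=1$ iff $I\subseteq\mathrm{ones}(p)$. Then $M(y)=Z\,\mathrm{diag}(D)\,Z^T$, and $Z$ is invertible because it is unitriangular once points are identified with their sets of ones. Hence $M(y)\succeq 0$ iff $D\geq 0$, which together with $\sum_p D(p)=y_\emptyset=1$ gives Lemma 2.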

2.2 Projection in $K$

Let the projection of $y$ be $(y_{\left\{ 1 \right\}},\dots,y_{\left\{ n \right\}})^T$. For any $y$, the projection should always lie in $K$. One may want to define moment matrices for the constraints $Ax\geq b$; these are called moment matrices of slacks. For simplicity we only consider one linear constraint $a^Tx-b\geq 0$. The moment matrix for this constraint is

M(y)=\left( \sum_{i=1}^n a_i y_{I\cup J\cup \left\{ i \right\}}-b y_{I\cup J} \right)_{I,J\subseteq [n]}.

Then a similar argument applies.

\begin{aligned} z^T M(y) z &= \sum_I \sum_J z_I z_J (\sum_{i=1}^n a_i y_{I\cup J\cup \left\{ i \right\}}-b y_{I\cup J})\\ &= \sum_I \sum_J z_I z_J (\sum_i a_i E[\prod_{k\in I\cup J\cup\left\{ i \right\}} X_k] - b E[\prod_{k\in I\cup J}X_k] )\\ &= E\left[ \sum_I \sum_J z_I z_J (\sum_i a_i X_i -b) \prod_{k\in I\cup J}X_k \right]\\ &= E\left[ (\sum_i a_i X_i -b) \left(\sum_I z_I \prod_{i\in I} X_i \right)^2 \right] \end{aligned}

Note that we can combine the expectations since they are taken over the same probability distribution. Now assume that $a^TX-b\geq 0$ holds for every point $X$ in the support of the distribution. Then

\begin{aligned} E&\left[ (\sum_i a_i X_i -b) \left(\sum_I z_I \prod_{i\in I} X_i \right)^2 \right]\\ &= \sum \mathop{\mathrm{Pr}}[\cdots](a^T X-b)\left(\sum_I z_I \prod_{i\in I} X_i \right)^2 \geq 0, \end{aligned}

where the sum ranges over the points in the support. So if every such point satisfies $a^TX\geq b$, then the corresponding slack moment matrix is psd.
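The same kind of check works for a single slack moment matrix. This sketch uses the illustrative constraint $x_1+x_2\geq 1$ (coordinates are indexed from $0$ in the code). For a distribution supported on points that satisfy the constraint, $z^TM(y)z$ equals $E[(a^TX-b)(\sum_I z_I\prod_{i\in I}X_i)^2]\geq 0$. Putting all mass on a violating point already makes the $[\emptyset,\emptyset]$ entry, which is $E[a^TX-b]$, negative.

```python
def subsets(n):
    """All subsets of {0, ..., n-1}, as frozensets, in a fixed order."""
    return [frozenset(i for i in range(n) if mask >> i & 1) for mask in range(1 << n)]


def moments(D, n):
    """y_I = Pr_D[X_i = 1 for all i in I]."""
    return {I: sum(pr for p, pr in D.items() if all(p[i] == 1 for i in I))
            for I in subsets(n)}


def slack_moment_matrix(y, n, a, b):
    """Entry [I][J] = sum_i a_i * y_{I ∪ J ∪ {i}} - b * y_{I ∪ J}, for one constraint a^T x >= b."""
    idx = subsets(n)
    return [[sum(a[i] * y[I | J | {i}] for i in range(n)) - b * y[I | J] for J in idx]
            for I in idx]


def quad_form(M, z):
    """z^T M z."""
    return sum(z[r] * M[r][c] * z[c] for r in range(len(z)) for c in range(len(z)))


def expected_slack_square(D, n, a, b, z):
    """E_D[(a^T X - b) * (sum_I z_I * prod_{i in I} X_i)^2]."""
    total = 0.0
    for p, pr in D.items():
        slack = sum(a[i] * p[i] for i in range(n)) - b
        s = sum(z_I for I, z_I in zip(subsets(n), z) if all(p[i] == 1 for i in I))
        total += pr * slack * s * s
    return total


a, b = (1.0, 1.0), 1.0                               # the constraint x_1 + x_2 >= 1
feasible = {(1, 0): 0.3, (0, 1): 0.3, (1, 1): 0.4}   # supported on points satisfying it
infeasible = {(0, 0): 1.0}                           # all mass on a violating point
M_ok = slack_moment_matrix(moments(feasible, 2), 2, a, b)
M_bad = slack_moment_matrix(moments(infeasible, 2), 2, a, b)
```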

Finally, here is a more formal definition.

Definition 3 ($t$-th level of the Lasserre hierarchy)

The $t$-th level of the Lasserre hierarchy $\mathop{\mathrm{Las}}_t(K)$ of a convex polytope $K=\left\{ x\in \mathbb{R}^n| Ax\geq b \right\}\subset [0,1]^n$ is the set of vectors $y\in \mathbb{R}^{2^n}$ with $y_\emptyset=1$ that make the following matrices psd.

moment matrix $M_t(y):=(y_{I\cup J})_{|I|,|J|\leq t}\succeq 0$
moment matrix of slacks $M_t^\ell(y):=\left( \sum_{i=1}^n A_{\ell i}y_{I\cup J\cup \left\{ i \right\}}-b_\ell y_{I\cup J} \right)_{|I|,|J|\leq t}\succeq 0$

Note that the $t$-th level of the Lasserre hierarchy only involves entries $y_I$ with $|I|\leq 2t+1$; the $+1$ comes from the moment matrices of slacks. Each matrix has dimension $\sum_{k=0}^{t}\binom{n}{k}=n^{O(t)}$, and there are only $m+1$ matrices, where $m$ is the number of rows of $A$. Thus optimizing an objective over the $t$-th level of the Lasserre hierarchy takes $mn^{O(t)}$ time, which is still polynomial in the input size for constant $t$.

(The separation oracle computes eigenvalues and eigenvectors. If there is a negative eigenvalue, we take a corresponding eigenvector $v$. Every feasible $y$ satisfies $\sum_{I,J}v_{I}v_{J} y_{I\cup J}\geq 0$, but the current point violates it, so the separating hyperplane is $\sum_{I,J}v_{I}v_{J} y_{I\cup J}=0$. See Example 43.)
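Here is a sketch of this separation step for the moment matrix $M_t(y)$ alone; slack matrices work the same way. It assumes $y$ is a dict keyed by frozensets of size at most $2t$. To stay dependency-free, it uses a hand-written cyclic Jacobi eigenvalue routine instead of a numerical library. The names and tolerances are illustrative. `index_sets` also confirms the dimension count $\sum_{k=0}^t\binom{n}{k}$.

```python
import math
from itertools import combinations


def index_sets(n, t):
    """Subsets of {0, ..., n-1} with at most t elements: the rows/columns of M_t(y)."""
    return [frozenset(c) for k in range(t + 1) for c in combinations(range(n), k)]


def jacobi_eigen(A, tol=1e-22, max_sweeps=100):
    """Cyclic Jacobi eigenvalue method for a small symmetric matrix.
    Returns (eigenvalues, V), where column k of V is an eigenvector for eigenvalue k."""
    n = len(A)
    a = [[float(v) for v in row] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-18:
                    continue
                # Rotation that zeroes a[p][q] (smaller root of the quadratic, for stability).
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A R (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- R^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V R
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], V


def separate(y, n, t, tol=1e-9):
    """Return None if M_t(y) is (numerically) psd. Otherwise return the coefficients v of an
    eigenvector with negative eigenvalue: every feasible y' satisfies
    sum_{I,J} v_I v_J y'_{I ∪ J} >= 0, but this y violates it."""
    idx = index_sets(n, t)
    M = [[y[I | J] for J in idx] for I in idx]
    vals, V = jacobi_eigen(M)
    k = min(range(len(vals)), key=lambda r: vals[r])
    if vals[k] >= -tol:
        return None
    return {I: V[r][k] for r, I in enumerate(idx)}


def make_y(p12):
    """The 2D example with x1 = x2 = 0.5 and y_{0,1} = p12 (n = 2, t = 1)."""
    return {frozenset(): 1.0, frozenset({0}): 0.5, frozenset({1}): 0.5,
            frozenset({0, 1}): p12}


print(separate(make_y(0.25), 2, 1))   # a consistent y
print(separate(make_y(0.6), 2, 1))    # p12 > min(x1, x2): returns a violated inequality
```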

3 Properties

Almost everything in this part can be found here.

Suppose that we have the $t$-th level of the Lasserre hierarchy $\mathop{\mathrm{Las}}_t(K)$. Denote by $\mathop{\mathrm{Las}}_t^{proj}(K)$ the projection of the $t$-th level.

1. $\mathop{\mathrm{Las}}_t(K)$ is convex
2. $y_I\in [0,1]$ for all $y\in \mathop{\mathrm{Las}}_t(K)$
3. $0\leq y_I \leq y_J \leq 1$ for all $J\subset I$ with $|I|,|J|\leq t$
4. $y_{I\cup J}\leq \sqrt{y_I \cdot y_J}$
5. $K\cap \left\{ 0,1 \right\}^n \subset \mathop{\mathrm{Las}}_t^{proj}(K)$ for all $t\in [n]$
6. $\mathop{\mathrm{Las}}_t^{proj}(K)\subset K$
7. $\mathop{\mathrm{Las}}_n(K)\subset \mathop{\mathrm{Las}}_{n-1}(K)\subset \dots \subset \mathop{\mathrm{Las}}_0(K)$

Properties 2, 3 and 4 show that $y$ behaves similarly to a real probability distribution.

Properties 5, 6 and 7 show that

K\cap \left\{ 0,1 \right\}^n \subset \mathop{\mathrm{Las}}_n^{proj}(K)\subset \mathop{\mathrm{Las}}_{n-1}^{proj}(K)\subset \dots \subset \mathop{\mathrm{Las}}_0^{proj}(K) = K

The goal of this section is to show that $\mathop{\mathrm{Las}}_n^{proj}(K)=\mathop{\mathrm{conv}}(K\cap \left\{ 0,1 \right\}^n)$. When working with the Lasserre hierarchy, we usually perform the analysis on $y$ rather than on the projection $x_i$ alone.

3.1 Convex Hull and Conditional Probability

Lemma 4

For $t\geq 1$, let $y\in \mathop{\mathrm{Las}}_t(K)$ and let $S\subset [n]$ be any subset of variables of size at most $t$. Then

y\in \mathop{\mathrm{conv}}\left\{ z\in \mathop{\mathrm{Las}}_{t-|S|}(K)| z_i\in \left\{ 0,1 \right\} \forall i\in S \right\}.

Take any $y\in\mathop{\mathrm{Las}}_n(K)$ and $S=[n]$. The previous lemma implies that the projection of $y$ is a convex combination of integral vectors in $K\cap \left\{ 0,1 \right\}^n$. The reverse inclusion follows from $K\cap \left\{ 0,1 \right\}^n\subset\mathop{\mathrm{Las}}_n^{proj}(K)$ and convexity, so $\mathop{\mathrm{Las}}_n^{proj}(K)=\mathop{\mathrm{conv}}(K\cap \left\{ 0,1 \right\}^n)$. This also proves that if $M_n(y)\succeq 0$ and $M_n^{\ell}(y)\succeq 0$, then $y$ is indeed a probability distribution and its projection is in $K$.

Proof

The proof is constructive and is by induction on the size of $S$.

Base case $S=\left\{ i \right\}$. Assume that $y_{\left\{ i \right\}}\in (0,1)$. If $y_{\left\{ i \right\}}\in \left\{ 0,1 \right\}$, then $y$ itself already works, since $\mathop{\mathrm{Las}}_t(K)\subseteq \mathop{\mathrm{Las}}_{t-1}(K)$. For simplicity I write $y_i$ for $y_{\left\{ i \right\}}$. Define two vectors $z^{(1)},z^{(2)}$ by $z^{(1)}_I=\frac{y_{I\cup\left\{ i \right\}}}{y_i}$ and $z^{(2)}_I=\frac{y_I-y_{I\cup\left\{ i \right\}}}{1-y_i}$. One can easily verify that $y=y_i z^{(1)}+(1-y_i)z^{(2)}$, $z^{(1)}_i=1$ and $z^{(2)}_i=0$. It remains to verify $z^{(1)},z^{(2)}\in \mathop{\mathrm{Las}}_{t-1}(K)$. Since $M_t(y)$ is psd, there are vectors $v_I$ ($|I|\leq t$) such that $\langle v_I,v_J \rangle=y_{I\cup J}$ for all $|I|,|J|\leq t$. Take $v_I^{(1)}=v_{I\cup\left\{ i \right\}}/\sqrt{y_i}$. We have $\langle v_I^{(1)},v_J^{(1)} \rangle=\frac{y_{I\cup J\cup\left\{ i \right\}}}{y_i}=M_{t-1}(z^{(1)})[I,J]$ for all $|I|,|J|\leq t-1$. Thus $M_{t-1}(z^{(1)})$ is psd. Similarly, one can take $v_I^{(2)}=(v_I-v_{I\cup \left\{ i \right\}})/\sqrt{1-y_i}$ and show that $M_{t-1}(z^{(2)})$ is psd.
For each moment matrix of slacks one can use exactly the same arguments to show $M_{t-1}^{\ell}(z^{(1)})\succeq 0$ and $M_{t-1}^{\ell}(z^{(2)})\succeq 0$ .
For the inductive steps one can see that our arguments for the base case can be applied recursively on $z^{(1)},z^{(2)}$ .
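As a sketch, the base case of the proof can be run on an actual distribution. The distribution is a dict from 0-1 tuples to probabilities, coordinates are indexed from $0$, and the helper names are illustrative. Here $y$ contains all $2^n$ moments, so it corresponds to $t=n$. The construction of $z^{(1)},z^{(2)}$ is exactly the formula above.

```python
def subsets(n):
    """All subsets of {0, ..., n-1}, as frozensets."""
    return [frozenset(i for i in range(n) if mask >> i & 1) for mask in range(1 << n)]


def moments(D, n):
    """y_I = Pr_D[X_i = 1 for all i in I]."""
    return {I: sum(pr for p, pr in D.items() if all(p[i] == 1 for i in I))
            for I in subsets(n)}


def condition_on(y, i):
    """The vectors z1, z2 from the base case of the proof of Lemma 4 (requires 0 < y_i < 1)."""
    yi = y[frozenset({i})]
    if not 0.0 < yi < 1.0:
        raise ValueError("the construction assumes y_i in (0, 1)")
    z1 = {I: y[I | {i}] / yi for I in y}                    # reading: condition on X_i = 1
    z2 = {I: (y[I] - y[I | {i}]) / (1.0 - yi) for I in y}   # reading: condition on X_i = 0
    return yi, z1, z2


D = {(0, 0, 1): 0.1, (1, 0, 0): 0.2, (0, 1, 1): 0.3, (1, 1, 1): 0.4}
y = moments(D, 3)
yi, z1, z2 = condition_on(y, 0)   # y == yi * z1 + (1 - yi) * z2, entrywise
```

For a genuine distribution, $z^{(1)}$ and $z^{(2)}$ coincide with the moment vectors of $D$ conditioned on $X_i=1$ and on $X_i=0$. This is the conditional-probability reading of the construction.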

Recall that $y\in \mathop{\mathrm{Las}}_t(K)$ behaves like a probability distribution with $y_I=\mathop{\mathrm{Pr}}[\bigwedge_{i\in I}X_i=1]$, as long as we only consider $|I|\leq t$. The vectors $z^{(1)},z^{(2)}$ constructed in the previous proof can be understood as conditional probabilities.

\begin{aligned} &z^{(1)}_I=\frac{y_{I\cup\left\{ i \right\}}}{y_i}=\frac{\mathop{\mathrm{Pr}}[\bigwedge_{k\in I\cup \left\{ i \right\}}X_k=1]}{\mathop{\mathrm{Pr}}[X_i=1]}=\mathop{\mathrm{Pr}}[\bigwedge_{k\in I}X_k=1 | X_i=1]\\ &z^{(2)}_I=\frac{y_I-y_{I\cup\left\{ i \right\}}}{1-y_i}=\frac{\mathop{\mathrm{Pr}}[\bigwedge_{k\in I} (X_k=1) \land X_i=0]}{\mathop{\mathrm{Pr}}[X_i=0]}=\mathop{\mathrm{Pr}}[\bigwedge_{k\in I}X_k=1 | X_i=0] \end{aligned}

The proof essentially shows the law of total probability:

\begin{aligned} y_I &= \mathop{\mathrm{Pr}}[X_i=1] \mathop{\mathrm{Pr}}[\bigwedge_{k\in I}X_k=1 | X_i=1]+\mathop{\mathrm{Pr}}[X_i=0] \mathop{\mathrm{Pr}}[\bigwedge_{k\in I}X_k=1 | X_i=0]\\ &= \mathop{\mathrm{Pr}}[\bigwedge_{k\in I}X_k=1] \end{aligned}

Let $y\in\mathop{\mathrm{Las}}_t(K)$ be any partially feasible probability distribution. Then $y_i \in (0,1)$ implies that both $X_i=0$ and $X_i=1$ happen with non-zero probability, which in turn implies $z^{(1)},z^{(2)}\in \mathop{\mathrm{Las}}_{t-1}(K)$. One can also express $y$ explicitly as a convex combination and see the relation with Möbius inversion; see p. 9 of these notes.

In Lemma 4, each vector in the convex combination is integral on $S$, as $z^{(1)}$ and $z^{(2)}$ are. Such a vector can be understood as a partial probability distribution under the condition $\bigwedge_{i\in I} (X_i=1) \land \bigwedge_{j\in J}(X_j=0)$, where $I\sqcup J=S$. The weight assigned to it is exactly the probability that its condition happens. More formally, Lemma 4 implies the following.

Corollary 5

Let $y\in\mathop{\mathrm{Las}}_t(K)$. For any subset $S\subset [n]$ of size at most $t$, there is a distribution $D(S)$ over $\left\{ 0,1 \right\}^S$ such that

\mathop{\mathrm{Pr}}_{z\sim D(S)}\left[ \bigwedge_{i\in I} (z_i=1) \right]=y_I \quad \forall I\subset S

Moreover, this distribution is “locally consistent”, since the probability assigned to each vector only depends on its condition.
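Local consistency can be made concrete. The weight that $D(S)$ assigns to a labeling is an inclusion-exclusion sum that only reads $y_J$ for $J\subseteq S$, so marginalizing $D(S)$ onto $S'\subset S$ gives back $D(S')$. The sketch below assumes $y$ is a dict keyed by frozensets. Here $y$ is produced from an actual distribution only to have an example, and the names are illustrative.

```python
from itertools import combinations, product


def subsets(n):
    """All subsets of {0, ..., n-1}, as frozensets."""
    return [frozenset(i for i in range(n) if mask >> i & 1) for mask in range(1 << n)]


def moments(D, n):
    """y_I = Pr_D[X_i = 1 for all i in I] (only used here to produce an example y)."""
    return {I: sum(pr for p, pr in D.items() if all(p[i] == 1 for i in I))
            for I in subsets(n)}


def local_distribution(y, S):
    """D(S) over {0,1}^S by Moebius inversion, reading only y_J for J ⊆ S:
    Pr[z = f] = sum over ones(f) ⊆ J ⊆ S of (-1)^(|J| - |ones(f)|) * y_J."""
    S = sorted(S)
    D = {}
    for f in product((0, 1), repeat=len(S)):
        ones = frozenset(s for s, bit in zip(S, f) if bit)
        rest = [s for s in S if s not in ones]
        D[f] = sum((-1) ** k * y[ones | frozenset(extra)]
                   for k in range(len(rest) + 1) for extra in combinations(rest, k))
    return S, D


def marginalize(S, D, T):
    """Marginal of a distribution over {0,1}^S on the coordinates T ⊆ S."""
    pos = [S.index(u) for u in sorted(T)]
    out = {}
    for f, pr in D.items():
        key = tuple(f[q] for q in pos)
        out[key] = out.get(key, 0.0) + pr
    return out


y = moments({(0, 0, 0): 0.1, (1, 0, 0): 0.2, (0, 1, 1): 0.3, (1, 1, 1): 0.4}, 3)
S, DS = local_distribution(y, {0, 2})   # a distribution over labelings of (x_0, x_2)
```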

Since the constraints in $\mathop{\mathrm{Las}}_t$ only concern the psdness of certain matrices, it is natural to look at their Gram decompositions. This leads to a vector representation: vectors $v_I$ with $\langle v_I,v_J\rangle=y_{I\cup J}$ for all $|I|,|J|\leq t$, which may be helpful in rounding algorithms. For $J\subset I$, $v_I$ lies on the sphere with radius $\|v_J\|/2=\sqrt{y_J}/2$ and center $v_J /2$.
3.2 Decomposition Theorem

We have seen that $\mathop{\mathrm{Las}}_n^{proj}(K)$ is the integer hull. Can we get better upper bounds on the number of levels needed, based on properties of $K$? Another easy upper bound is $\max_{x\in K}|\mathop{\mathrm{ones}}(x)|+1$, where $\mathop{\mathrm{ones}}(x)=\left\{ i|x_i=1 \right\}$. The reason is that $y\in \mathop{\mathrm{Las}}_t(K)$ is a partial distribution for $|I|\leq t$ that can be realized as the marginal distribution of some distribution on $K\cap \left\{ 0,1 \right\}^n$. If $K\cap \left\{ 0,1 \right\}^n$ does not contain a point with at least $t$ ones, we certainly have $\mathop{\mathrm{Pr}}[\bigwedge_{i\in I}(X_i=1)]=0$ for $|I|\geq t$. This fact implies that for most hard problems we should not expect $\mathop{\mathrm{Las}}_k$ to give us an integral solution for constant $k$.

Karlin, Mathieu and Nguyen [3] proved a more general form of Lemma 4 using similar arguments.

Theorem 6 (Decomposition Theorem)

Let $y\in \mathop{\mathrm{Las}}_t(K)$, $S\subset [n]$ and $k\in [0,t]$ such that $k\geq |\mathop{\mathrm{ones}}(x)\cap S|$ for all $x\in K$. Then

y\in \mathop{\mathrm{conv}}\left\{ z| z\in \mathop{\mathrm{Las}}_{t-k}(K); z_{\left\{ i \right\}}\in \left\{ 0,1 \right\} \forall i\in S \right\}.

4 Moment Relaxation

In this section we briefly show the non-probabilistic view of Lasserre hierarchy and how this idea is used in polynomial optimization problems.

Everything in this section can be found in useful_link[6].

Consider the following polynomial optimization problem

\begin{aligned} \min& & a(x)& & &\\ s.t.& & b(x)&\geq 0 & &\forall b\in B\\ & & x&\in\left\{ 0,1 \right\}^n \end{aligned}

where $a$ and all $b\in B$ are polynomials. We want to formulate this problem as an SDP.

We can treat the polynomials $a,b$ as multilinear polynomials: since $x_i\in \left\{ 0,1 \right\}$, we have $x_i^2=x_i$. Now we can enumerate the monomials $x_S=\prod_{i\in S}x_i$ and write these polynomials as linear functions. For example, we can rewrite

a(x)=\sum_{S\subseteq [n]}\sum_{\alpha_S:S\to \mathbb{Z}_{\geq 1}} a_{S,\alpha_S} \prod_{i\in S}x_i^{\alpha_S(i)}

as

\sum_{S\subseteq [n]} a_S x_S,\quad a_S=\sum_{\alpha_S} a_{S,\alpha_S},

which is linear in the moment sequence $(x_\emptyset, x_{\left\{ 1 \right\}},\ldots,x_{[n]})$.
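Here is a small sketch of this multilinearization step. It assumes a polynomial is stored as a dict from exponent tuples to coefficients, which is an illustrative encoding. Exponents collapse to $0/1$ and coefficients that land on the same $S$ are summed into $a_S$. The result agrees with the original polynomial on every point of $\{0,1\}^n$.

```python
from itertools import product


def multilinearize(poly):
    """poly maps exponent tuples (alpha_1, ..., alpha_n) to coefficients.
    On {0,1}^n we have x_i^k = x_i for k >= 1, so each monomial becomes x_S with
    S = {i : alpha_i >= 1}; coefficients landing on the same S are added (this is a_S)."""
    lin = {}
    for alpha, coeff in poly.items():
        S = frozenset(i for i, e in enumerate(alpha) if e > 0)
        lin[S] = lin.get(S, 0) + coeff
    return lin


def eval_poly(poly, x):
    """Evaluate the original polynomial at the point x."""
    total = 0
    for alpha, coeff in poly.items():
        term = coeff
        for xi, e in zip(x, alpha):
            term *= xi ** e
        total += term
    return total


def eval_linear(lin, x):
    """Evaluate sum_S a_S x_S with x_S = prod_{i in S} x_i at a 0-1 point x."""
    return sum(c for S, c in lin.items() if all(x[i] == 1 for i in S))


def agrees_on_cube(poly, n):
    """Check that the multilinear form agrees with poly on all of {0,1}^n."""
    lin = multilinearize(poly)
    return all(eval_poly(poly, x) == eval_linear(lin, x) for x in product((0, 1), repeat=n))


# a(x) = 3 x_1^2 - x_1 + 2 x_1 x_2^3 + 5, with x_1, x_2, x_3 stored at indices 0, 1, 2
a = {(2, 0, 0): 3, (1, 0, 0): -1, (1, 3, 0): 2, (0, 0, 0): 5}
lin = multilinearize(a)   # a_{x_1} = 3 - 1, a_{x_1 x_2} = 2, constant term 5
```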

Recall that our goal is to find an SDP formulation. A common technique is to replace each variable with a vector. We consider the moment vectors $[v_S\in \mathbb{R}^\gamma]_{S\in 2^{[n]}}$. As in the LP case, we want $\langle v_A,v_B \rangle=x_{A\cup B}$. This is exactly a Gram decomposition of the moment matrix, and such moment vectors exist iff the moment matrix is psd. For each constraint $b(x)\geq 0$, we consider the slack moment matrix

M^b(x)=\left( \sum_S b_S x_{I\cup J\cup S} \right)_{I,J}

Then the program becomes the following SDP

\begin{aligned} \min& & \sum_{S\subseteq [n]}a_S x_S& & &\\ s.t.& & M^b(x)&\succeq 0 & &\forall b\in B\\ & & M(x)&\succeq 0\\ & & x_{\emptyset}&=1 \end{aligned}

Note that if the maximum degree of the polynomials $a,b$ is at most $d$, then the following program is a relaxation of the original polynomial optimization problem (cf. Corollary 3.2.2).

\begin{aligned} \min& & \sum_{S\subseteq [n]}a_S x_S& & &\\ s.t.& & M_{F}^b(x)&\succeq 0 & &\forall b\in B\\ & & M_{F\uplus V_{\leq d}}(x)&\succeq 0\\ & & x_{\emptyset}&=1 \end{aligned}

where $F\subset 2^{[n]}$, $A\uplus B=\left\{ a\cup b \mid a\in A,b\in B \right\}$ is the element-wise union, and $M_{F}$ is the submatrix of $M(x)$ on entries $F\times F$. Taking $F=\binom{[n]}{\leq t}$ gives us $\mathop{\mathrm{Las}}_t$.

5 Applications

5.1 Sparsest Cut

There are lots of applications in the useful links, but none of them discusses sparsest cut [4].

Problem 7 (sparsest cut)

Given a vertex set $V$ and two weight functions $c,D:\binom{V}{2} \to \mathbb{R}_{\geq 0}$, find $T\subset V$ that minimizes the sparsity of $T$,

\Phi(T)=\frac{\sum_{u < v}c_{u,v}|\chi^T(u)-\chi^T(v)|}{\sum_{u < v}D_{u,v}|\chi^T(u)-\chi^T(v)|},

where $\chi^T$ is the indicator vector of $T$.
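To make the objective concrete, here is a brute-force sketch of $\Phi(T)$ on a tiny instance. It is exponential in $|V|$ and meant only for sanity checks; the instance and names are illustrative. The instance is a path $0-1-2-3$ with unit capacities and unit demand between every pair of vertices.

```python
from itertools import combinations


def sparsity(V, c, D, T):
    """Phi(T) from Problem 7; c and D map frozenset({u, v}) to a weight (missing = 0)."""
    separated = [frozenset(e) for e in combinations(V, 2) if (e[0] in T) != (e[1] in T)]
    den = sum(D.get(e, 0.0) for e in separated)
    num = sum(c.get(e, 0.0) for e in separated)
    return num / den if den > 0 else float("inf")


def brute_force_sparsest_cut(V, c, D):
    """Try every nonempty proper subset T; return (Phi(T), T) for a minimizer."""
    best = None
    for k in range(1, len(V)):
        for T in combinations(V, k):
            phi = sparsity(V, c, D, set(T))
            if best is None or phi < best[0]:
                best = (phi, set(T))
    return best


V = [0, 1, 2, 3]
c = {frozenset({0, 1}): 1.0, frozenset({1, 2}): 1.0, frozenset({2, 3}): 1.0}  # a path
D = {frozenset(e): 1.0 for e in combinations(V, 2)}                           # uniform demands
print(brute_force_sparsest_cut(V, c, D))
```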

In [4] Guruswami and Sinop describe the Lasserre hierarchy in a slightly different way. (Note that useful_link[6] is Sinop’s thesis.) We have seen that $y\in [0,1]^{2^{[n]}}$ is sufficient for describing the joint distribution. However, the total number of events is $3^n$, since in an event each variable $X_i$ has 3 possible states: $X_i=0$, $X_i=1$, or $X_i$ absent.

Instead of using $y\in [0,1]^{2^{[n]}}$, they enumerate each of the $3^n$ events and consider the vectors in the Gram decomposition. For each set $S\subset V$ of size $\leq r+1$ and each 0-1 labeling $f$ on the elements of $S$, they define a vector $x_S(f)$. Note that the pairs $S(f)$ enumerate all events. One should understand $x_S(f)$ as the vector corresponding to the entry $y_{S,f}$ of $y\in [0,1]^{3^{n}}$ in the Gram decomposition, with $\langle x_S(f), x_T(g) \rangle=y_{f(S)\land g(T)}$. The vectors $x_S(f)$ should then have the following properties:

if $f(S)$ and $g(T)$ are inconsistent, i.e. there is an element $e\in S\cap T$ with $f(e)\neq g(e)$, then $\langle x_S(f), x_T(g) \rangle=y_{f(S)\land g(T)}=0$.
if $f(S)\land g(T)$ and $f'(A)\land g'(B)$ are the same event, i.e. $A\cup B=S\cup T$ and the labels agree, then $\langle x_S(f), x_T(g) \rangle=\langle x_A(f'), x_B(g') \rangle$.
$\|x_{\emptyset}\|^2=1$, where $\emptyset$ stands for the empty condition, i.e. the union of all events.
for all $u\in V$, $\|x_u(0)\|^2+\|x_u(1)\|^2=\|x_{\emptyset}\|^2=1$.
for $S\subset V$, $u\in S$ and $f\in \left\{ 0,1 \right\}^{S\setminus \left\{ u \right\}}$, $x_S(f\land (u=1))+x_S(f\land (u=0))=x_{S\setminus \left\{ u \right\}}(f)$. (Note that the two vectors on the left-hand side are orthogonal.)
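One concrete way to see that these are the right properties: when the $3^n$ event probabilities come from a genuine distribution, the vectors can be written down explicitly, with one coordinate per point of the support. Here is a sketch under that assumption; the construction and names are illustrative and are not taken from [4].

```python
import math


def event_vector(D, S, f):
    """x_S(f) built from a genuine distribution D over {0,1}^V (dict: tuple -> probability).
    One coordinate per support point p: sqrt(D(p)) if p agrees with the labeling f
    (dict: vertex -> 0/1) on S, and 0 otherwise. Hence <x_S(f), x_T(g)> = Pr[f(S) and g(T)],
    which is 0 when f and g are inconsistent."""
    return [math.sqrt(pr) if all(p[u] == f[u] for u in S) else 0.0
            for p, pr in sorted(D.items())]


def dot(u, v):
    """Standard inner product."""
    return sum(a * b for a, b in zip(u, v))


D = {(0, 0, 0): 0.1, (1, 0, 0): 0.2, (0, 1, 1): 0.3, (1, 1, 1): 0.4}
x_empty = event_vector(D, (), {})          # empty labeling; squared norm = total probability
x0_one = event_vector(D, (0,), {0: 1})     # x_u(1) for u = 0
x0_zero = event_vector(D, (0,), {0: 0})    # x_u(0) for u = 0
# x_u(0) and x_u(1) have disjoint supports (orthogonal) and add up to x_empty.
```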

Lemma 8 (pseudo probability)

Let $x\in \mathop{\mathrm{Las}}_t(V)$ for $t\geq 0$. Then the following holds:

1. $\|x_S(f)\|^2 \in [0,1]$ for all $|S|\leq t+1$.
2. $\|x_S(f)\|^2 \leq \|x_T(g)\|^2$ if $T\subset S$ and $f(w)=g(w)$ for all $w\in T$.
3. $\|x_S(f)\|^2 = \sum_{h\in \left\{ 0,1 \right\}^{T-S}} \|x_T(f\land h)\|^2$ if $S\subset T$.
4. If $S\in \binom{V}{\leq t}$, $f\in \left\{ 0,1 \right\}^S$ and $u\notin S$, then $x_{S+u}(f\land u=1)+x_{S+u}(f\land u=0)=x_{S}(f)$.

Proof

Let $N_t=\sum_{r=0}^{t+1}\binom{|V|}{r}2^r$ be the number of vectors in $x$. Consider the moment matrix $M_t\in \mathbb{R}^{N_t\times N_t}$, where each entry $M_t[f(S),g(T)]$ is $\langle x_S(f),x_T(g)\rangle$. The moment matrix is positive semidefinite since the vectors in $x$ form a Gram decomposition of $M_t$.

1. Consider the following submatrix of $M_t$: $\begin{bmatrix} \langle x_\emptyset,x_\emptyset\rangle & \langle x_\emptyset,x_S(f)\rangle\\ \langle x_S(f),x_\emptyset\rangle & \langle x_S(f),x_S(f)\rangle \end{bmatrix}\succeq 0$. Computing the determinant gives us $\|x_S(f)\|^2(1-\|x_S(f)\|^2)\geq 0$.
2. Again consider a certain submatrix of $M_t$: $\begin{bmatrix} \langle{x_T(g)},{x_T(g)}\rangle & \langle{x_T(g)},{x_S(f)}\rangle\\ \langle{x_S(f)},{x_T(g)}\rangle & \langle{x_S(f)},{x_S(f)}\rangle \end{bmatrix}\succeq 0$. The determinant is $\|x_S(f)\|^2(\|x_T(g)\|^2-\|x_S(f)\|^2)\geq 0$.
3. We only need to show $\|x_S(f)\|^2=\|x_{S+u}(f\land u=0)\|^2 +\|x_{S+u}(f\land u=1)\|^2$; the rest follows by induction. First, $x_u(0)+x_u(1)=x_\emptyset$. Indeed, $\langle x_\emptyset,x_u(c)\rangle=\|x_u(c)\|^2$ for $c\in\left\{ 0,1 \right\}$ (same event) and $x_u(0)\perp x_u(1)$, so $\|x_\emptyset-x_u(0)-x_u(1)\|^2=\|x_{\emptyset}\|^2-\|x_u(0)\|^2-\|x_u(1)\|^2=0$. Then $\begin{aligned} \|x_{S+u}(f\land u=0)\|^2 +\|x_{S+u}(f\land u=1)\|^2 &= \langle{x_S(f)},{x_u(0)}\rangle+\langle{x_S(f)},{x_u(1)}\rangle\\ &= \langle{x_S(f)},{x_u(0)+x_u(1)}\rangle\\ &= \langle{x_S(f)},{x_\emptyset}\rangle=\|x_S(f)\|^2 \end{aligned}$
4. Notice that $x_{S+u}(f\land u=1)$ and $x_{S+u}(f\land u=0)$ are orthogonal. Denote by $p$ the projection of $x_S(f)$ onto the plane spanned by $x_{S+u}(f\land u=1)$ and $x_{S+u}(f\land u=0)$. Since $\langle x_S(f),x_{S+u}(f\land u=c)\rangle=\|x_{S+u}(f\land u=c)\|^2$ for $c\in\left\{ 0,1 \right\}$, one can verify that $p=x_{S+u}(f\land u=1)+x_{S+u}(f\land u=0)$. It remains to show $\langle p, x_S(f)\rangle=\|x_S(f)\|^2$, which immediately follows from 3.

Then write $x_u=x_{\left\{ u \right\}}(1)$. The following “SDP” is a relaxation of sparsest cut.

\begin{aligned} \min& & \frac{\sum_{u < v}c_{u,v}\|x_u-x_v\|^2}{\sum_{u < v}D_{u,v}\|x_u-x_v\|^2}\\ s.t.& & \sum_{u < v}D_{u,v}\|x_u-x_v\|^2&\geq 0\\ & & x\in \mathop{\mathrm{Las}}_r(V)& \end{aligned}

Dividing every $x_S(f)$ by the square root of the objective’s denominator gives us a real SDP.

\begin{aligned} \min& & \sum_{u < v}c_{u,v}\|x_u-x_v\|^2\\ s.t.& & \sum_{u < v}D_{u,v}\|x_u-x_v\|^2&= 1\\ & & x\in \mathop{\mathrm{Las}}_r(V),\|x_\emptyset\|^2&>0 \end{aligned}

The rounding method is too complicated, so it won’t be covered here.

5.2 Matching

This application can be found in Section 3.3 of useful_link[1]. We consider the maximum matching IP in non-bipartite graphs. Let $K=\left\{ x\in \mathbb{R}_{\geq 0}^n | \sum_{e\in \delta(v)}x_e\leq 1 \; \forall v\in V \right\}$ be the polytope and consider $\mathop{\mathrm{Las}}_t(K)$. In the notes, Rothvoss shows the following lemma.

Lemma 9

\mathop{\mathrm{Las}}_t^{proj}(K)\subseteq (1+\frac{1}{2t})\cdot\mathop{\mathrm{conv}}(K\cap \left\{ 0,1 \right\}^n)

Proof

Let $y\in \mathop{\mathrm{Las}}_t(K)$. It suffices to show that $\sum_{e\in E[U]} y_e\leq (1+\frac{1}{2t})k$ for all $|U|=2k+1$, since $\left\{ x\in K| \text{$x$ satisfies odd constraints} \right\}$ is the matching polytope.

When $k>t$, sum the degree constraints over $v\in U$; each edge of $E[U]$ is counted twice. This gives $\sum_{e\in E[U]} y_e\leq k+\frac{1}{2} \leq (1+\frac{1}{2t})k$, where the last step uses $k\geq t$.

Now consider the case $k\leq t$. For fixed $U$, any $I\subset E[U]$ of size $|I|> k$ has $y_I=0$, since a matching inside $U$ (which has $2k+1$ vertices) has at most $k$ edges. Then by Lemma 4, more precisely its general form Theorem 6 applied with $S=E[U]$, $y$ can be represented as a convex combination of solutions $z\in \mathop{\mathrm{Las}}_0(K)$ in which $z_e\in \left\{ 0,1 \right\}$ for all $e\in E[U]$. The convex combination implies that $\sum_{e\in E[U]} y_e\leq k$ when $k\leq t$.

However, Lemma 9 is not tight: $\mathop{\mathrm{Las}}_0^{proj}(K)$ should be contained in $(1+\frac{1}{2})\cdot\mathop{\mathrm{conv}}(K\cap \left\{ 0,1 \right\}^n)$, and $\mathop{\mathrm{Las}}_n^{proj}(K)$ should be exactly the integer hull. Can we prove a slightly better gap that matches these observations at $\mathop{\mathrm{Las}}_0$ and $\mathop{\mathrm{Las}}_n$? The latter part of the proof in fact shows that $y\in \mathop{\mathrm{Las}}_t(K)$ satisfies all odd constraints with $|U|\leq 2t+1$. Now consider an odd cycle with $2t+3$ vertices. $(1/2,\ldots,1/2)^T\in \mathbb{R}^{2t+3}$ is a feasible solution in $\mathop{\mathrm{Las}}_t(K)$ and proves a tight lower bound of $k+1/2$.
6 Questions

6.1 Replace $M_t^\ell(y)\succeq 0$ with $\mathop{\mathrm{Las}}_t^{proj}(y)\in K$

~~I don’t see any proof relying on the psdness of slack moment matrices…~~

It turns out that problems occur in the proof of Lemma 4. Suppose $\mathop{\mathrm{Las}}_t(K)$ is defined as $\left\{ y|M_t(y)\succeq 0, y^{proj}\in K \right\}$. Then we cannot guarantee that the projections of $z^{(1)},z^{(2)}$ lie in $K$. Without Lemma 4, $\mathop{\mathrm{Las}}_n^{proj}(K)$ may not be exactly $\mathop{\mathrm{conv}}(K\cap \left\{ 0,1 \right\}^n)$, and the hierarchy seems less interesting? But an alternative formulation still allows good rounding even without Lemma 4; see Sparsest Cut, which ignores the slack moment matrices entirely. Generally speaking, if the psdness of the slack moment matrices is neglected, we lose the law of total probability (Lemma 4). However, we still have the “finite additivity property of probability measures” (Lemma 8 (3)).

6.2 Separation Oracle for Implicit $K$

Sometimes $K$ is given only implicitly, e.g. by exponentially many constraints. For example, consider finding the cogirth of a matroid.

\begin{aligned} \min& & \sum_{e\in E} x_e& & &\\ s.t.& & \sum_{e\in B} x_e&\geq 1 & &\forall \text{ base $B$}\\ & & x_e&\geq 0 & &\forall e\in E \end{aligned}

If $K$ is only accessible through a separation oracle, is it possible to optimize over $\mathop{\mathrm{Las}}_t(K)$ in polynomial time for constant $t$?

References

[1]

M. Laurent, A Comparison of the Sherali-Adams, Lovász-Schrijver, and Lasserre Relaxations for 0–1 Programming, Mathematics of Operations Research. 28 (2003) 470–496. doi:10.1287/moor.28.3.470.16391.

[2]

A. Schrijver, Polyhedral proof methods in combinatorial optimization, Discrete Applied Mathematics. 14 (1986) 111–133. doi:10.1016/0166-218X(86)90056-9.

[3]

A.R. Karlin, C. Mathieu, C.T. Nguyen, Integrality Gaps of Linear and Semi-Definite Programming Relaxations for Knapsack, in: Integer Programming and Combinatorial Optimization, Springer, Berlin, Heidelberg, 2011: pp. 301–314. doi:10.1007/978-3-642-20807-2_24.

[4]

V. Guruswami, A.K. Sinop, Approximating non-uniform sparsest cut via generalized spectra, in: Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, Society for Industrial and Applied Mathematics, USA, 2013: pp. 295–305.

Using Fira Math in KaTeX

Yu Cong — Sun, 25 May 2025 00:00:00 UT

Tags: katex

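A half-finished version is now available: https://github.com/congyu711/KaTeX

Fira Math is a really nice font, and I want to use it in KaTeX.

Useful links:

1 How does the font part of KaTeX work?

For anything related to fonts and metrics, the first place to look is dockers/fonts/. It seems the fonts in fonts/ are extracted from the Computer Modern fonts of the TeX Live installed in Docker. Then dockers/fonts/buildMetrics.sh calls the code in src/metrics, which reads metric information directly from the relevant tfm files.

(My guess:) the generated ttf, woff2, etc. font files store the mapping unicode -> (glyph + some information).

src/metrics/mapping.pl is the mapping from TeX math to Unicode. Running it produces output like this: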

{
    ...
    "AMS-Regular": {
        "65": {
            "char": 65,
            "yshift": 0,
            "xshift": 0,
            "font": "msbm10"
        },
        ...

This is then fed into src/metrics/extract_tfms.py, which produces

{
    "AMS-Regular": {
        "65": {
            "depth": 0.0,
            "height": 0.68889,
            "italic": 0.0,
            "skew": 0.0,
            "width": 0.72222
        },
        ...

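Finally, after some more processing, the JSON is put into src/fontMetricsData.js. My guess is that once I get the Fira Math metrics into this js file, all that remains is to change the fonts in the CSS and rebuild.

Collecting the metrics and generating the fonts are independent processes, so most things look reusable.

2 Plan

1. KaTeX's own fonts are split into many small files, so first split Fira Math into small files as well.
2. Then find a way to get a correct src/metrics/mapping.pl for Fira Math.
3. Finally, change the CSS, or rebuild first and then change the CSS, or something like that.

After talking with the author of Fira Math, I realized:

In step 1, splitting Fira Math into small files the way Computer Modern is split may save me from modifying a lot of KaTeX code. KaTeX does this because CM is a Type 1 font in TeX, and a single file can hold only 256 characters.
Fira Math is an OpenType font. The mapping from glyph ids to the Unicode code points of the corresponding characters is already contained in the font's cmap table. The HTML generated by KaTeX contains Unicode characters, so I do not need to modify src/metrics/mapping.pl for Fira Math, but I still need the corresponding src/fontMetricsData.js.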

3 Extract fonts

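KaTeX maps symbols such as \mathbb{A} and \mathcal{B} onto the plain Latin letters A, B, so some extra work is needed when doing the mapping. Another problem is that fonts such as KaTeX_AMS-Regular use the Unicode Private Use Area for some symbols that are missing from Unicode. For example, src/fonts/makeFF contains code like this: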

    0x0A => 0xE010,         # \nleqslant
    0x0B => 0xE00F,         # \ngeqslant

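I don't use these symbols, so I decided to ignore them for now.

Fonts such as KaTeX_Main-Regular and KaTeX_Math-Italic.ttf behave fairly normally. They are basically subsets of Fira Math, and I converted them with the code here. Fonts such as KaTeX_AMS-Regular and KaTeX_Size{k} need separate handling, so I decided to download fontforge and process them manually. ~~Fira Math does not cover the Caligraphic and Fraktur fonts either~~. Fira Math is a sans-serif font to begin with, so I think the other fonts can simply use the KaTeX versions.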

3.1 Italic

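~~It seems Fira Math has no italic. After some searching, it looks like TeX implements italics through the OpenType MATH table, so I need to figure out how to get an italic version of Fira Math…~~ It turns out Fira Math does have italics, and modifying it with fontforge is a bit easier than writing the Unicode mapping by hand. So far only common symbols have been modified, and many things do not work properly. Many of KaTeX's design choices are puzzling. Why not do what unicode-math does and map everything to the correct Unicode code points? Another example is $\neq$: KaTeX builds it by combining an $=$ with something like $\setminus$, and in the font this $\setminus$-like character is mapped to the Private Use Area.

The Fira Math specimen is somewhat outdated; many of the glyphs it lists as missing have since been added. The firamath-regular.otf in TeX Live also seems to be an old version; for example, it lacks (0x2216 ∖).

In fact, KaTeX's commit history shows that the fonts are mainly ylemkimon's work. I emailed him about these questions but got no reply. After reading this Zhihu answer, though, I suspect the mapping issue might be for screen-reader compatibility? But I don't think doing this actually lets screen readers read formulas correctly. I also want to thank Xiangdong Zeng (曾祥东), the main author of Fira Math. Both his blog and his Zhihu answer introducing FiraMath taught me a lot.

If I only swap the font, many things look a bit odd,

but I think the most common things, such as LP and SDP formulations, render quite well.

The following is rendered with MathJax 4.0: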

\begin{equation} \begin{aligned} \min& & \sum_x \delta_x& & &\\ s.t.& & (1-\delta_x - \delta_y) d^2(x,y)\leq \|v_x-v_y\|^2 &\leq (c^2+(\delta_x+\delta_y)f(k)) d^2(x,y) & &\forall x,y\in X\\ & & \delta_x\in [0,1], v_x&\in \mathbb{R}^p & &\forall x\in X \end{aligned} \end{equation}

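4 What next…

Typst now looks like a good option for producing HTML content that has some typesetting needs. Here is an example, where you can see that Typst turns the formulas into SVG. It looks good, but I wonder whether drawing lots of SVGs would be slow when there are many formulas, and whether the page would get very large.

Modifying KaTeX would, I think, be a lot of work. On the one hand, KaTeX itself is somewhat old, and the way many features are implemented is too limited. On the other hand, I know neither JS nor typesetting, and I would probably need to learn how OpenType fonts, unicode-math, etc. work.

4.1 Typst svg

Use this to make Typst turn all formulas into SVG in HTML export: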

#show math.equation: html.frame
#show math.equation.where(block: false): box

typst compile input.typ -f html htmloutput.html --features html

The test document is https://typst.app/project/rgLBUHLLRwTyTh16Ej3qlM

+-----------------+-------+
|    outputs      | size  |
+-----------------+-------+
| PDF-XeTeX       | 25KB  |
| PDF-Typst       | 39KB  |
| html-Typst      | 194KB |
| FiraMath font   | 122KB |
+-----------------+-------+

5 MathJax 4.0 is out

https://docs.mathjax.org/en/latest/upgrading/whats-new-4.0.html

MathJax supports FiraMath, so there is no need to modify KaTeX anymore.

PDF readers

Yu Cong — Fri, 23 May 2025 00:00:00 UT

Tags: macos

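Doing research always means reading papers, and papers are all PDFs, so a good PDF reader is very useful.

I have used these readers on macOS:

skim. skim is quite nice, but its keyboard shortcuts are too inflexible; it would be great if I could set up things like j/k for paging. I modified the code before and got something barely usable. Later, though, I spent a long time asking AI tools and still could not figure out how to update the table of contents and page numbers.
Preview is a piece of crap. Why does it only update the page after the window gets focus when the file changes? The view settings are not restored when the file reloads either. As a result, Preview is completely unusable for previewing :( Both skim and Preview use Apple PDFKit, and PDFKit does not handle the PDF page width correctly when the app shows a vertical scroll bar (see the skim bug report).
My workaround is to change the mode in which skim shows scroll bars: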
```
defaults write net.sourceforge.skim-app.skim AppleShowScrollBars -string "WhenScrolling"
```
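Preview's search inside PDFs reportedly misses some results.
PDF.js (in Firefox) is much better than Chrome's reader. Some version used j,k to scroll the page (maybe I misremember and it was a Chrome extension), but now they flip pages. This key logic is strange: why make j,k flip pages when n,p already do that? Even if someone wants to flip pages with j,k, they can use single page mode. That is how Sumatra's shortcuts are designed, but unfortunately it only runs on Windows. If you use PDF.js alongside an editor to write papers, a VS Code extension can preview with PDF.js in a tab and refresh automatically after compiling, though it is a bit slow.
Zotero's built-in reader. Its advantage is that it accurately shows the target of a link (skim has a similar feature but does not land precisely on the target). However, scrolling is very laggy, and Zotero's tabs are unpleasant to use. Still, Zotero has a nice plugin, https://github.com/retorquere/zotero-open-pdf, that lets you choose which reader to open a file with.

For writing LaTeX, you actually don't need to look at the compiled output often. I think this kind of Emacs mode is great, but I don't want to spend time learning Emacs.

Using j/k to scroll the page is an Okular habit. It would be nice to modify skim and add a feature for customizing shortcuts, but that is probably too hard for someone who doesn't know Objective-C. Another difficulty is that skim's source code is on SourceForge and requires svn.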

Zed theme for VSCode

Yu Cong — Mon, 19 May 2025 00:00:00 UT

Tags: zzz

Zed is a great editor. I especially like the One Light theme in Zed. But for some reasons I cannot fully switch to it.

Throw the following into settings.json to make the Light Modern theme look like Zed’s One Light.

```
// colors from zed's one light theme
"editor.lineHeight": 1.6,
"workbench.colorTheme": "Default Light Modern",
"workbench.colorCustomizations": {
    "titleBar.activeBackground": "#dcdcdd",
    "titleBar.inactiveBackground": "#dcdcdd",
    "editor.lineHighlightBackground": "#efeff0",
    "editor.lineHighlightBorder": "#efeff0",
    "tab.activeBackground": "#fafafb",
    "tab.inactiveBackground": "#ebebec",
    "tab.border": "#c9c9ca",
    "tab.activeBorder": "#fafafb",
    "tab.activeBorderTop": "#fafafb",
    "tab.selectedBorderTop": "#fafafb",
    "editorGroupHeader.tabsBackground": "#ebebec",
    "editorGroupHeader.border": "#ebebec",
    "statusBar.background": "#dcdcdd",
    "activityBar.background": "#ebebec",
    "editorCursor.foreground": "#526FFF"
},
"editor.tokenColorCustomizations": {
    "textMateRules": [
        {
            "scope": [
                // in light_plus
                "constant.character",
                "constant.other.option",
                // in light_vs
                "meta.embedded",
                "source.groovy.embedded",
                "string meta.image.inline.markdown",
                "variable.legacy.builtin.python",
                "constant.language",
                "meta.preprocessor",
                "entity.name.function.preprocessor",
                "storage",
                "storage.type",
                "storage.modifier",
                "keyword.operator.noexcept",
                "string.comment.buffered.block.pug",
                "string.quoted.pug",
                "string.interpolated.pug",
                "string.unquoted.plain.in.yaml",
                "string.unquoted.plain.out.yaml",
                "string.unquoted.block.yaml",
                "string.quoted.single.yaml",
                "string.quoted.double.xml",
                "string.quoted.single.xml",
                "string.unquoted.cdata.xml",
                "string.quoted.double.html",
                "string.quoted.single.html",
                "string.unquoted.html",
                "string.quoted.single.handlebars",
                "string.quoted.double.handlebars",
                "punctuation.definition.template-expression.begin",
                "punctuation.definition.template-expression.end",
                "punctuation.section.embedded",
                "keyword",
                "keyword.control",
                "keyword.operator.new",
                "keyword.operator.expression",
                "keyword.operator.cast",
                "keyword.operator.sizeof",
                "keyword.operator.alignof",
                "keyword.operator.typeid",
                "keyword.operator.alignas",
                "keyword.operator.instanceof",
                "keyword.operator.logical.python",
                "keyword.operator.wordlike",
                "variable.language",
            ],
            "settings": {
                "foreground": "#4e77ea"
            }
        },
        {
            "scope": [
                "constant.character.escape"
            ],
            "settings": {
                "foreground": "#AA4A44",
            }
        },
    ]
},
```

"Gotchas" in Combinatorial Optimization

Yu Cong — Wed, 30 Apr 2025 00:00:00 UT

Tags: optimization, alg

It would be fun to have a list of misleading ideas and traps in CO. I will keep updating this post with new problems.

1 Polytope for s-t cut

\begin{equation} \begin{aligned} \sum_{e\in p} x_e &\geq 1 & &\forall \text{ $s$-$t$ path $p$}\\ x_e &\geq 0 & &\forall e\in E \end{aligned} \end{equation}

Is this polytope integral?

Initially I thought: we have the max-flow min-cut theorem, the flow polytope is integral, and this path formulation of s-t cut should be the dual of the flow problem, so it is integral. However, the formulation of hitting spanning trees (dual to tree packing) has an integrality gap of 2. Makes sense.

However, LP(1) is not the dual of Max-flow.

2 Size of support

Given a linear program whose constraint matrix has rank $r$, what is the size of the support of its optimal solution?

This is mentioned in a previous post. The description there is not precise. Consider the following linear program,

\begin{aligned} \min& & c^Tx& \\ s.t.& & Ax&\leq b\\ & & x&\geq 0 \end{aligned}

Let $r$ be the rank of $A$. We may assume $b\geq0$. For any $c$ for which this LP has a bounded optimum, there must exist an optimal solution $x^*$ with support of size at most $r$. At most $r$ of the tight constraints in $Ax\leq b$ are linearly independent, and hence the optimal solution $x^*$ lies in the intersection of $\mathbb{R}^n_+$ and an affine subspace of dimension $\geq n-r$, which is a convex polyhedron. If the LP has a bounded optimum, then some optimal $x^*$ is a vertex of this polyhedron. To make a point of the affine subspace a vertex of the polyhedron, we need at least $n-r$ hyperplanes $x_i=0$, each of which gives a zero coordinate. Thus for any bounded instance there is an optimal solution with support at most $r$.
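The vertex argument can be checked on toy instances. Below is a minimal brute-force sketch (exponential, for tiny LPs only) that enumerates the basic solutions of $\min c^Tx$ s.t. $Ax\le b, x\ge 0$ in exact rational arithmetic and compares the support of the best vertex with $\operatorname{rank}(A)$. The helper names and the toy instances are my own illustrations, and the code assumes the LP has a finite optimum.

```python
from fractions import Fraction
from itertools import combinations


def solve_square(M, rhs):
    """Gauss-Jordan elimination over the rationals; None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next((i for i in range(col, n) if aug[i][col] != 0), None)
        if piv is None:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                f = aug[i][col] / aug[col][col]
                aug[i] = [a - f * p for a, p in zip(aug[i], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def rank(A):
    """Rank of A over the rationals by row reduction."""
    rows = [[Fraction(v) for v in row] for row in A]
    r = 0
    for col in range(len(rows[0])):
        piv = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [a - f * p for a, p in zip(rows[i], rows[r])]
        r += 1
    return r


def best_vertex(A, b, c):
    """Minimize c^T x s.t. Ax <= b, x >= 0 by enumerating basic solutions."""
    m, n = len(A), len(c)
    # Stack Ax <= b and -x <= 0 into a single system G x <= h.
    G = [list(row) for row in A] + [[-1 if j == i else 0 for j in range(n)] for i in range(n)]
    h = list(b) + [0] * n
    best = None
    for tight in combinations(range(m + n), n):
        x = solve_square([G[i] for i in tight], [h[i] for i in tight])
        if x is None:
            continue  # the chosen constraints do not determine a point
        if all(sum(g * xj for g, xj in zip(G[i], x)) <= h[i] for i in range(m + n)):
            value = sum(cj * xj for cj, xj in zip(c, x))
            if best is None or value < best[0]:
                best = (value, x)
    return best


A = [[1, 0, 1], [0, 1, 1]]
value, x = best_vertex(A, [2, 3], [-1, -1, -1])
support = sum(1 for xj in x if xj != 0)
print(value, [str(xj) for xj in x], support, rank(A))
```

Here the optimal vertex is $(2,3,0)$ with support $2=\operatorname{rank}(A)$ although $n=3$.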

What about integer programs? One may think that the support should also be at most $r$, since the 0s in the optimal solution of its linear relaxation are integers. But rounding the fractional coordinates may break feasibility. The current best upper bound is roughly $m (3\|A\|_1+\sqrt{\log( \|A\|_1 )})$ [1].

3 Cocircuit space of binary matroids

This comes from the proof of Lemma 4.2 in [2].

First, recall the following property of binary matroids.

Proposition 1 [3, Proposition 9.2.2]

Let $A$ be a binary representation of a rank-$r$ binary matroid $M$. Then the cocircuit space of $M$ equals the row space of $A$. Moreover, this space has dimension $r$ and is the orthogonal subspace of the circuit space of $M$.

To see this proposition, perform row operations and delete zero rows to bring $A$ into the form $[I_r| D]$. Note that row operations do not change the row space. The set of the first $r$ columns is a base of $M$ and the set of the remaining $n-r$ columns is a cobase. Denote this base by $B$ and the cobase by $B^*$. Now the support of each row is a fundamental cocircuit with respect to $B^*$. To see this claim, there are several approaches:

The dual matroid of $[I_r|D]$ is represented by $[-D^T|I_{n-r}]$. Since we are working over $\mathbb{F}_2$, the dual is just the binary matroid $[D^T|I_{n-r}]$. The claim follows easily.
Take a row of $[I_r| D]$ and let $F^*$ be the set of columns with value $0$ in that row. One can easily see that $F^*$ is a hyperplane: its rank is $r-1$, and any column not in $F^*$ has a $1$ in that row, so adding any element of $E\setminus F^*$ increases the rank. Hence the complement of $F^*$ is a cocircuit.

The claim shows that the cocircuit space contains the row space. Now we prove the other direction. Every cocircuit is a fundamental cocircuit with respect to some base. Thus we can choose a different cobase $B^*$ and repeat the same argument to show that every cocircuit is in the row space. Hence, the cocircuit space equals the row space of $A$.

Here is the problem: given a binary matroid $A\in \mathbb{F}_2^{r\times n}$ and a cocycle $C^*$ of $A$, there exists a vector $y\in \mathbb{F}_2^r$ such that $C^*$ remains a cocycle in $y^T A$.

With Proposition 1, you may think that this looks fine: the incidence vector of any cocycle is in the row space, and $y^T A$ is a linear combination of row vectors.

Counterexample: Consider the following binary matroid.

\begin{bmatrix} 0 & 0 & 0 & 1 & 1& 1& 1 \\ 0& 1&1&0&0&1&1\\ 1&0&1&0&1&0&1\\ \end{bmatrix}

Any single column vector is a flat. So its complement should be a cocycle. However, it is easy to see that

\begin{bmatrix} 0 & 1 & 1 & 1 & 1& 1& 1 \end{bmatrix}

is not in the row space.

The bug lies in the definitions of cocycle and cocircuit space. A cocycle is a union of cocircuits, but the elements of the cocircuit space are symmetric differences of cocircuits. (Or, if you prefer to call a disjoint union of circuits a cycle, then the complement of a flat is a union of cocircuits, not a cocycle.)
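A quick sanity check of the counterexample: the sketch below enumerates the row space of the matrix above over $\mathbb{F}_2$ (all vectors $y^TA$) and confirms that the complement of the first column is not in it, although it is the union of two cocircuits. For this matrix (the Fano matroid) every nonzero vector of the row space has weight 4 and is a cocircuit; the function and variable names are illustrative.

```python
from itertools import product

# The matrix from the counterexample: its columns are the 7 nonzero vectors of F_2^3.
A = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]


def row_space(A):
    """All vectors y^T A with y in F_2^r, as tuples."""
    r, n = len(A), len(A[0])
    return {
        tuple(sum(y[i] * A[i][j] for i in range(r)) % 2 for j in range(n))
        for y in product([0, 1], repeat=r)
    }


space = row_space(A)
cocircuits = [v for v in space if any(v)]   # nonzero row-space vectors
complement = (0, 1, 1, 1, 1, 1, 1)          # complement of the flat {first column}

in_row_space = complement in space
# Union (bitwise OR), not symmetric difference (XOR), of two cocircuits.
is_union_of_two = any(
    tuple(a | b for a, b in zip(u, v)) == complement
    for u in cocircuits
    for v in cocircuits
)
print(in_row_space, is_union_of_two)
```

The XOR of the same two cocircuits lands back in the row space, which is exactly the difference between the cocircuit space and cocycles.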

References

[1]

S. Berndt, H. Brinkop, K. Jansen, M. Mnich, T. Stamm, New support size bounds for integer programming, applied to makespan minimization on uniformly related machines, in: 34th International Symposium on Algorithms and Computation (ISAAC 2023), Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023: pp. 13:1–13:18 10.4230/LIPIcs.ISAAC.2023.13.

[2]

J. Geelen, R. Kapadia, Computing Girth and Cogirth in Perturbed Graphic Matroids, Combinatorica. 38 (2018) 167–191 10.1007/s00493-016-3445-3.

[3]

J.G. Oxley, Matroid theory, 2nd ed, Oxford University Press, Oxford ; New York, 2011.

Proving integrality gap for LPs

Yu Cong — Tue, 01 Apr 2025 00:00:00 UT

Tags: optimization, LP

Proving integrality gaps of linear programs is generally hard. It would be great if one could classify LPs with a constant gap. It is known that deciding whether a polyhedron is integral is co-NP-complete [1]. I am interested in techniques for proving constant upper bounds on the integrality gaps of linear programs.

Here are some methods with examples that I read in books and papers.

1 Counting

Just do the counting.

1.1 tree packing theorem

Example 1

Consider the following integer program on a graph $G=(V,E)$:

\begin{align*} \lambda=\min& & \sum_{e\in E} x_e& & & \\ s.t.& & \sum_{e\in T} x_e&\ge 1 & &\forall T\in \mathcal T \\ & & x_e&\in\mathbb{Z}_{\ge 0} & &\forall e\in E \end{align*}

where $\mathcal T$ is the set of spanning trees of $G$. Let $\tau$ be the optimum of the linear relaxation.

Theorem 2

\lambda \le 2 \tau

Note that the optimal value $\lambda$ is the size of a minimum cut of $G$. It is known that $\tau$ is the value of a maximum (fractional) tree packing in $G$ and $\tau=\min\limits_{F\subset E}\frac{|E-F|}{r(E)-r(F)}$ (over $F$ with $r(F)<r(E)$), where $r$ is the rank function of the graphic matroid on $G$ [2]. Then the proof is a simple counting argument.

Proof

If $G$ is not connected, let $G_1,\dots,G_k$ be its components. One can easily see that the gap of $G$ is at most the largest gap among the components. Thus it suffices to consider connected graphs.

Fix $F^*\in \arg\min_F \frac{|E-F|}{r(E)-r(F)}$. Then $r(E)-r(F^*)$ must be positive. We may take $F^*$ to be closed (adding edges spanned by $F^*$ does not change $r(F^*)$ and can only decrease $|E-F^*|$), so $E-F^*$ consists exactly of the edges joining different components of $G\setminus (E\setminus F^*)$. Let $S_1,\dots,S_h$ be these components. For any $S_i$, the set of edges with exactly one endpoint in $S_i$ (denoted by $e[S_i]$) is a cut of $G$, so $|e[S_i]|\ge\lambda$. Every edge of $E-F^*$ is counted twice in $\sum_i |e[S_i]|$, and since $G$ is connected the number of components is $h=r(E)-r(F^*)+1$. Hence

2|E-F^*|=\sum_i |e[S_i]|\ge \lambda h \ge \lambda (r(E)-r(F^*))

and dividing by $2(r(E)-r(F^*))$ gives $\tau\ge\lambda/2$.
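To see the bound on small examples, the sketch below computes $\lambda$ by brute force and $\tau$ through the partition form of the formula above, $\tau=\min_P \frac{|E(P)|}{|P|-1}$ over partitions $P$ of $V$ with at least two blocks, where $E(P)$ is the set of edges between blocks (for a connected graph this equals the minimum over $F$, taking $F$ to be the edges inside the blocks). It is exponential and only meant for tiny graphs; the names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # put `first` into an existing block ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new block of its own
        yield [[first]] + part


def tau(vertices, edges):
    """Fractional tree packing number: min |E(P)| / (|P| - 1) over partitions P."""
    best = None
    for P in set_partitions(list(vertices)):
        if len(P) < 2:
            continue
        block = {v: i for i, B in enumerate(P) for v in B}
        crossing = sum(1 for u, v in edges if block[u] != block[v])
        value = Fraction(crossing, len(P) - 1)
        best = value if best is None else min(best, value)
    return best


def min_cut(vertices, edges):
    """Global min cut (unit weights) by trying every nonempty proper subset."""
    vs = list(vertices)
    best = None
    for size in range(1, len(vs)):
        for S in combinations(vs, size):
            S = set(S)
            cut = sum(1 for u, v in edges if (u in S) != (v in S))
            best = cut if best is None else min(best, cut)
    return best


C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
lam, t = min_cut(range(4), C4), tau(range(4), C4)
print(lam, t, lam <= 2 * t)
```

For the cycle $C_n$ one gets $\lambda=2$ and $\tau=n/(n-1)$ (the partition into singletons), so $\lambda/\tau=2(1-1/n)$ approaches the factor 2; for $C_4$ this is $3/2$.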

1.2 $k$-cut

Example 3

\begin{align*} \lambda_k=\min& & \sum_{e\in E} c(e)&x_e & & \\ s.t.& & \sum_{e\in T} x_e&\ge k-1 & &\forall T\in \mathcal T \\ & & 0\le x_e&\le 1 & &\forall e\in E \end{align*}

Theorem 4

The integrality gap of $\lambda_k$ is at most $2(1-1/n)$.

The proof is in section 5 of [3]. Here is a sketch.

For $k$-cut we cannot use the simple counting argument since the dual LP is not a tree packing (the LP dual needs extra variables $z_e$ for the constraints $x_e\le 1$). However, it is still easy to find an upper bound for the integral optimum. If we sort the vertices in increasing order of their (weighted) degree, that is, $\mathop{\mathrm{deg}}(v_1)\le \dots \le \mathop{\mathrm{deg}}(v_n)$, then $\sum_{i=1}^{k-1} \mathop{\mathrm{deg}}(v_i)$ is an upper bound for the integral $k$-cut. Then it is easy to prove that if the optimal solution $x^*$ of $\lambda_k$ is fully fractional ($x_e^*\in (0,1)$ for all $e\in E$), then the gap is at most $2(1-1/n)$. The proof uses the complementary slackness conditions, i.e., $z_e=0$ and $\sum_{T\ni e}y_T=c(e)$ for all $e\in E$. The following observations reduce a general $x^*$ to the fully fractional case:

Given an optimal solution $x^*$ , let $X$ be the set of edges $e$ such that $x_e^*=0$ . The optimum to $\lambda_k$ on $G/X$ is the same as on $G$ .
For an optimal solution $x^*$, let $F$ be the set of edges $e$ such that $x_e^*=1$. Let $x^*|_{E-F}$ be the restriction of $x^*$ to $E-F$. Then $x^*|_{E-F}$ is a fully fractional optimal solution to $\lambda_k$. (Some care is needed regarding the number of components of $G\setminus F$. The reduction can be done using the fact that if $1\leq \frac{\lambda}{\sigma}\le c$ then $1\le \frac{\lambda+k}{\sigma+k}\le c$.)
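The degree-based upper bound for the integral $k$-cut is easy to compute. A small sketch, assuming $n\ge k$ and edges given as hypothetical `(u, v, cost)` triples: cut the $k-1$ vertices of smallest weighted degree off as singletons. The cut costs at most the sum of their degrees (an edge between two chosen vertices is counted twice in that sum), and the remaining graph has at least $k$ components.

```python
def degree_kcut(n, edges, k):
    """Cut off the k-1 vertices of smallest weighted degree as singletons.

    edges: list of (u, v, cost). Returns (cut cost, degree bound, #components).
    """
    deg = [0] * n
    for u, v, c in edges:
        deg[u] += c
        deg[v] += c
    S = set(sorted(range(n), key=lambda v: deg[v])[: k - 1])
    cost = sum(c for u, v, c in edges if u in S or v in S)
    bound = sum(deg[v] for v in S)

    # Count components of the graph after removing the cut edges (union-find).
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for u, v, c in edges:
        if u not in S and v not in S:
            parent[find(u)] = find(v)
    components = len({find(v) for v in range(n)})
    return cost, bound, components


path = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
print(degree_kcut(4, path, 3))
```

On a triangle with $k=3$ the two chosen vertices are adjacent, so the cut costs 3 while the degree bound is 4.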

2 Rounding

A constant factor approximation algorithm based on an LP may imply a constant upper bound on the integrality gap of the corresponding LP.

Examples:

vertex cover. simple threshold rounding or the LP structure (there is a half integral optimal solution)
facility location. Dartmouth
CKR relaxation of multiway cut uiuc cs583 sp18
uniform labeling (multiway cut with assignment cost) FOCS’99
generalized steiner network problem with skew-supermodular requirements. iterative rounding. uiuc cs598 sp09
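For the first example, threshold rounding can be sketched as follows. The fractional solution is supplied by hand (no LP solver is included), and the names are illustrative: every edge constraint $x_u+x_v\ge 1$ forces an endpoint with $x_v\ge 1/2$, so rounding those vertices up gives a vertex cover of size at most $2\sum_v x_v$, i.e., at most twice the LP optimum.

```python
from fractions import Fraction


def is_fractional_cover(edges, x):
    """Feasibility for the vertex cover LP: x_u + x_v >= 1 on every edge, x >= 0."""
    return all(x[u] + x[v] >= 1 for u, v in edges) and all(xv >= 0 for xv in x.values())


def threshold_round(x):
    """Keep every vertex with x_v >= 1/2."""
    return {v for v, xv in x.items() if xv >= Fraction(1, 2)}


triangle = [(0, 1), (1, 2), (0, 2)]
x = {v: Fraction(1, 2) for v in range(3)}  # an optimal LP solution for the triangle (value 3/2)
cover = threshold_round(x)
print(is_fractional_cover(triangle, x), sorted(cover), len(cover), sum(x.values()))
```

On the triangle the rounded cover has size 3 against an LP value of $3/2$, so the factor 2 of the rounding is tight there (the integral optimum is 2).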

3 Intermediate problem

I read about this in [4]. Suppose that we want to prove a constant gap for $LP1$. The idea is to find another LP (say $LP2$) which is integral or has a constant gap, and to prove that $\frac{\mathop{\mathrm{OPT}}(IP1)}{\mathop{\mathrm{OPT}}(IP2)}\le c_1$ and $\frac{\mathop{\mathrm{OPT}}(LP2)}{\mathop{\mathrm{OPT}}(LP1)}\le c_2$. Finally we will have something like this,

\begin{equation} \mathop{\mathrm{OPT}}(IP1)\le c_1\mathop{\mathrm{OPT}}(IP2)\le c_1 \mathop{\mathrm{OPT}}(LP2)\le c_1 c_2 \mathop{\mathrm{OPT}}(LP1) \end{equation}

3.1 minimum $k$-edge-connected spanning subgraph

This problem is a special case of the fifth example in the rounding list above (the generalized Steiner network problem).

We want to prove that the integrality gap of the following LP is at most 2.

\begin{align*} LP1=\min& & \sum_{e\in E} w(e&)x_e & &\\ s.t.& & \sum_{e\in C} x_e&\ge k & &\forall \text{cut $C$}\\ & & 0\le x_e &\le 1 & &\forall e\in E \end{align*}

(Finding the minimum $k$-edge-connected spanning subgraph of $G=(V,E)$.)

Now we construct LP2. Consider the bidirected version of $G$, denoted by $D=(V,A)$, where $A=\{(u,v),(v,u) : (u,v)\in E\}$. Pick a special vertex $r$.

\begin{align*} LP2=\min& & \sum_{e\in A} w(e)&y_e & & \\ s.t.& & \sum_{e\in \delta^+(S)} y_e&\ge k & &\forall S\subset V \land r\in S\\ & & 0\le y_e &\le 1 & &\forall e\in A \end{align*}

(Finding a minimum-weight $k$-arborescence rooted at $r$.)

It is known that the polytope in LP2 is integral [2]. Given any feasible solution $x$ of LP1, for every edge $e=(u,v)\in E$ we set $y_{(u,v)}=y_{(v,u)}=x_e$. Every arc leaving a set $S\ni r$ comes from an edge of the cut $\delta(S)$, so $y$ is always feasible for LP2, and thus the optimum of LP2 is no larger than $2\mathop{\mathrm{OPT}}(LP1)$.

On the other hand, given a feasible integral solution $y$ of LP2, we set $x_e=1$ if either orientation of $e$ is in $y$. It is clear from the definition of LP2 that $x$ is a feasible integral solution of LP1. Hence, applying Eq. (1) proves that the integrality gap of LP1 is at most 2. (Note that in this example $c_1=1$ and $c_2=2$.)
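The doubling step from LP1 to LP2 can be checked by brute force on a tiny graph. The sketch below (unit weights, root $r=0$, all names illustrative) builds $y$ from a fractional $x$ and verifies the constraints of both LPs over all vertex subsets.

```python
from fractions import Fraction
from itertools import combinations


def vertex_subsets(n):
    """All nonempty proper subsets S of {0, ..., n-1}."""
    for size in range(1, n):
        for S in combinations(range(n), size):
            yield set(S)


def lp1_feasible(n, x, k):
    """x maps undirected edges (u, v) to values; check 0 <= x <= 1 and x(delta(S)) >= k."""
    if not all(0 <= val <= 1 for val in x.values()):
        return False
    return all(
        sum(val for (u, v), val in x.items() if (u in S) != (v in S)) >= k
        for S in vertex_subsets(n)
    )


def bidirect(x):
    """y_(u,v) = y_(v,u) = x_e for every edge e = (u, v)."""
    y = {}
    for (u, v), val in x.items():
        y[(u, v)] = val
        y[(v, u)] = val
    return y


def lp2_feasible(n, y, k, root=0):
    """Check 0 <= y <= 1 and y(delta^+(S)) >= k for every S containing the root."""
    if not all(0 <= val <= 1 for val in y.values()):
        return False
    return all(
        sum(val for (u, v), val in y.items() if u in S and v not in S) >= k
        for S in vertex_subsets(n)
        if root in S
    )


K4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
x = {e: Fraction(2, 3) for e in K4}  # a fractional solution of LP1 with k = 2
y = bidirect(x)
print(lp1_feasible(4, x, 2), lp2_feasible(4, y, 2), sum(y.values()) == 2 * sum(x.values()))
```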

4 Notes

There are many discussions about the integrality gap on cstheory.

It seems that the integrality gap has a deep connection with hardness of approximation. There are two kinds of problems that I find particularly interesting.

The LP has a relatively large gap, but some algorithm based on that LP achieves a better approximation than the gap (see this FOCS'09 paper).
The integrality gap is small (a constant), but approximation algorithms based on the LP cannot do as well. (Zhiyi Huang gave a talk at UESTC recently. https://arxiv.org/abs/2509.17029 The correlated Pandora's problem has a natural LP formulation with gap $<4$, while it is NP-hard to approximate it within a ratio of $4-\epsilon$.)

References

[1]

G. Ding, L. Feng, W. Zang, The complexity of recognizing linear systems with certain integrality properties, Mathematical Programming. 114 (2008) 321–334 10.1007/s10107-007-0103-y.

[2]

A. Schrijver, Combinatorial optimization Polyhedra and Efficiency, Springer, 2004.

[3]

C. Chekuri, K. Quanrud, C. Xu, LP Relaxation and Tree Packing for Minimum $k$-Cut, SIAM Journal on Discrete Mathematics. 34 (2020) 1334–1353 10.1137/19M1299359.

[4]

P. Chalermsook, C.-C. Huang, D. Nanongkai, T. Saranurak, P. Sukprasert, S. Yingchareonthawornchai, Approximating k-Edge-Connected Spanning Subgraphs via a Near-Linear Time LP Solver, in: 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022), Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 2022: pp. 37:1–37:20 10.4230/LIPIcs.ICALP.2022.37.

Detecting base orderable matroids is NP-hard

Yu Cong — Sat, 22 Mar 2025 00:00:00 UT

Tags: matroid

This is posted on https://mathoverflow.net/questions/194006/. But I guess no one cares about negative results on the algorithmic side of base orderability. The OP left MathOverflow 7 years ago.

Detecting (strong) base orderability of a general matroid cannot be done with a polynomial number of calls in the independence oracle model. The proof idea is to combine an excluded-minor theorem for base orderability of paving matroids with Seymour and Walton's theorem on the complexity of detecting matroid minors.

Thm1 [Bonin and Savitsky, https://arxiv.org/pdf/1507.05521] A paving matroid is base orderable iff it has no $M(K_4)$ minor.

Let $\mathscr C\not=\emptyset$ be a collection of matroids. We say $\mathscr C$ is detectable if one can check, for a matroid $M$, whether $M$ contains some $M'\in\mathscr C$ as a minor using a polynomial number of oracle calls.

Thm2 [Seymour and Walton] If $\mathscr C$ is detectable, then $\mathscr C$ contains at least one uniform matroid.

In order to show that $M(K_4)$ is not detectable in paving matroids, we can prove that Thm2 holds on paving matroids. The proof in Seymour and Walton's paper constructs a special matroid $M(A,B,E)$ based on any matroid $M'\in \mathscr C$ (which is non-uniform by assumption). They show that an exponential number of oracle calls is needed to distinguish the constructed matroid from a uniform matroid. Their construction is as follows:

Let $E$ be the ground set and $r$ the rank function of $M'$. The ground set $S$ of $M(A,B,E)$ has size $2t+|E|$, where $t$ is any positive integer. Let $A,B,E$ be a partition of $S$ with $|A|=|B|=t$. A subset $X\subseteq S$ is independent iff both of the following conditions are met:

$|X\cap A|+|X\cap E|\le t+r(X\cap E)$,
$|X|\leq t+r(M')$.

In fact, $M(A,B,E)$ is paving if $M'$ is paving. Thus Seymour and Walton's proof applies to paving matroids, and detecting an $M(K_4)$ minor requires exponentially many oracle calls even in paving matroids.

For strong base orderability, there is an infinite set of excluded minors [https://arxiv.org/pdf/1507.05521, Section 9]. However, none of the excluded minors is uniform. I think Seymour and Walton's proof still applies.

Seymour, P. D.; Walton, P. N., Detecting matroid minors, J. Lond. Math. Soc., II. Ser. 23, 193-203 (1981). ZBL0487.05016.

Bonin, Joseph E.; Savitsky, Thomas J., An infinite family of excluded minors for strong base-orderability, Linear Algebra Appl. 488, 396-429 (2016). ZBL1326.05025.

LP with box constraints

Yu Cong — Thu, 13 Mar 2025 00:00:00 UT

Tags: optimization, LP

In [1] the following edge-connectivity related problem is studied.

Problem 1

Given an undirected graph $G=(V,E)$, a weight function $w:E\to \mathbb{Z}_+$ and an integer $k$, find a $k$-edge-connected spanning subgraph of $G$ with minimum weight.

Consider the following LP relaxation,

\begin{align*} \min& \quad \sum_{e\in E} w(e)x_e\\ s.t.& \quad \sum_{e\in \delta(S)}x_e\geq k &&\forall S\subset V\\ & \quad 0\le x_e \le 1 &&\forall e\in E \end{align*}

Note that $x_e\le 1$ is necessary since each edge can be chosen at most once. In the paper the authors mention that such constraints are called box constraints and that they usually make LPs harder to solve (for example, positive covering LPs can be solved through MWU). There is also a box-free version of the LP relaxation,

\begin{align*} \min& \quad \sum_{e\in E} w(e)x_e\\ s.t.& \quad \sum_{e\in C\setminus S}x_e\geq k-|S| &&\forall \text{ cut } C\\ & &&\forall S\in \left\{ F: |F|\le k-1 \wedge F\subset C \right\}\\ & \quad \phantom{\sum_{e\in C\setminus S}} x_e\ge 0 &&\forall e\in E \end{align*}

which is box-free but includes exponentially many extra constraints. I will call the first LP boxLP and the second one boxlessLP. Any feasible solution to boxLP is also feasible for boxlessLP. Now suppose $x_e>1$ in a feasible solution of boxlessLP, and consider the cuts containing $e$. For such a cut $C$, the constraint with $S=\{e\}$ gives $\sum_{f\in C\setminus e} x_f\geq k-1$, and thus $\sum_{f\in C} x_f>k$. Since this holds for every cut $C$ containing $e$, we can certainly decrease $x_e$ for a smaller objective. Hence, in an optimal solution to boxlessLP, every $x_e$ is at most 1.

Kent Quanrud also used this technique in the $k$-cut LP [2].

In fact, one can see from the proof that enumerating all singletons $F=\left\{ f \right\}$ is sufficient.
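A toy check of the boxless constraints, assuming a graph with two vertices joined by three parallel edges `0, 1, 2` (so the only cut is $C=\{0,1,2\}$) and $k=2$. Putting all the mass on one edge ($x_0=2$) satisfies the plain cut constraint but is rejected once the constraints with $|S|=1$ are included. The function names and the `max_s` parameter are hypothetical.

```python
from itertools import combinations


def box_feasible(cut_list, x, k):
    """boxLP: every cut carries at least k and 0 <= x_e <= 1."""
    return all(0 <= v <= 1 for v in x.values()) and all(
        sum(x[e] for e in C) >= k for C in cut_list
    )


def boxless_feasible(cut_list, x, k, max_s=None):
    """boxlessLP: sum over C minus S of x_e >= k - |S| for every cut C, S in C, |S| <= max_s.

    max_s defaults to k - 1; max_s = 0 keeps only the plain cut constraints.
    """
    if max_s is None:
        max_s = k - 1
    for C in cut_list:
        for s in range(max_s + 1):
            for S in combinations(C, s):
                if sum(x[e] for e in C if e not in S) < k - s:
                    return False
    return all(v >= 0 for v in x.values())


cuts = [[0, 1, 2]]
heavy = {0: 2, 1: 0, 2: 0}  # all mass on one edge
spread = {0: 1, 1: 1, 2: 0}
print(boxless_feasible(cuts, heavy, 2, max_s=0), boxless_feasible(cuts, heavy, 2))
print(box_feasible(cuts, spread, 2), boxless_feasible(cuts, spread, 2))
```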

References

[1]

[2]

K. Quanrud, Fast and deterministic approximations for k-cut, LIPIcs, Volume 145, APPROX/RANDOM 2019. 145 (2019) 23:1–23:20 10.4230/LIPICS.APPROX-RANDOM.2019.23.

Vertex and Edge Connectivity Interdiction

Yu Cong — Thu, 13 Feb 2025 00:00:00 UT

Tags: alg, combinatorics, optimization

As a natural generalization of min-cut, the following problem seems interesting to me:

Problem 1

Given a graph $G=(V,E)$ and an integer $k$, find the minimum edge set whose removal breaks $k$-vertex connectivity.

Alternatively, one can consider a closely related version of Problem 1,

Problem 2

Given a graph $G=(V,E)$ and an integer $k$, find an edge set $F\subset E$ with size at most $k$ whose removal minimizes the vertex connectivity of $G-F$.

This problem can be called “vertex connectivity interdiction”. One can also consider “algebraic connectivity interdiction” (the algebraic connectivity is the second smallest eigenvalue of the Laplacian matrix).

In fact, I think Problem 2 is not well motivated. I am interested in it because there may be connections between breaking vertex connectivity and breaking combinatorial rigidity. However, it seems strange to consider interdiction problems on edge sets for vertex connectivity. For vertex removal, the problem is easy (by splitting vertices and finding a min cut).

1 Checking $k$-vertex connectivity

Checking whether a given graph is $k$-vertex connected can be done in polynomial time. By Menger's theorem, we need to check whether for every vertex pair $(s,t)$ there are at least $k$ internally vertex-disjoint paths (disjoint apart from $s$ and $t$) connecting $s$ and $t$. The number of internally vertex-disjoint paths between $s$ and $t$ can be easily computed through max flow. We replace every internal vertex $v$ with two copies $v_{in}$ and $v_{out}$ joined by a directed edge $v_{in}\to v_{out}$ of capacity 1, and add directed edges from $N(v)$ to $v_{in}$ and from $v_{out}$ to $N(v)$ with capacity 1. The integral max flow is the number of internally disjoint paths between $s$ and $t$. Since the constraint matrix of flow problems is TU, the max flow is attained by an integral flow, and minimizing it over all pairs gives the vertex connectivity.
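A compact sketch of the vertex-splitting reduction, using Edmonds-Karp max flow (my choice; any max-flow algorithm works). Each internal vertex becomes an arc $v_{in}\to v_{out}$ of capacity 1, each undirected edge $uv$ becomes arcs $u_{out}\to v_{in}$ and $v_{out}\to u_{in}$ of capacity 1, and the flow goes from $s_{out}$ to $t_{in}$. For the connectivity of the whole graph it takes the minimum over non-adjacent pairs (the usual form of Menger's theorem), with the convention $n-1$ for complete graphs. All names are illustrative.

```python
from collections import defaultdict, deque


def max_flow(cap, s, t):
    """Edmonds-Karp; `cap` is a dict of dicts and is turned into the residual graph."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:  # push flow, update residual capacities
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] = cap[v].get(u, 0) + bottleneck
            v = u
        flow += bottleneck


def disjoint_paths(n, edges, s, t):
    """Number of internally vertex-disjoint s-t paths via vertex splitting."""
    cap = defaultdict(dict)
    for v in range(n):
        if v not in (s, t):
            cap[(v, "in")][(v, "out")] = 1  # each internal vertex used at most once
    for u, v in edges:  # undirected edge -> two arcs
        cap[(u, "out")][(v, "in")] = 1
        cap[(v, "out")][(u, "in")] = 1
    return max_flow(cap, (s, "out"), (t, "in"))


def vertex_connectivity(n, edges):
    adjacent = {frozenset(e) for e in edges}
    pairs = [(s, t) for s in range(n) for t in range(s + 1, n)
             if frozenset((s, t)) not in adjacent]
    if not pairs:
        return n - 1  # complete graph
    return min(disjoint_paths(n, edges, s, t) for s, t in pairs)


bowtie = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]  # two triangles sharing vertex 2
print(vertex_connectivity(5, bowtie))
```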

Instead of computing a flow for every pair separately, if we want a single flow that meets the demand of every vertex pair simultaneously, the problem becomes much harder. This is the multi-commodity flow problem.

Currently the fastest algorithm for computing vertex connectivity is ~~[1]~~ https://dl.acm.org/doi/10.1145/3406325.3451088. There is a nice table summarizing connectivity-related algorithms.

[1] appears in the last line of that table. Its conference version was published in FOCS'96.

1.1 Minimum cut for edge connectivity

Finding the minimum edge set whose removal breaks $k$-edge connectivity is “easy”. It is known that the global min-cut value is the edge connectivity number. ~~Thus we can simply compute the min-cut and remove any number of the edges as needed.~~ This is not true! For unit edge weights this problem is indeed that easy. However, for general weights the edge-connectivity cut problem is not as easy as min-cut. This problem is actually the unit-cost version of connectivity interdiction. See the next section for more details.

1.2 Minimum cut for vertex connectivity

With the knowledge of how to compute vertex connectivity, we try to compute the minimum cut for $k$-vertex connectivity in a similar way. First we find the vertex pair $(s,t)$ with the smallest number of internally disjoint paths. Note that we are working with the modified graph when computing the vertex connectivity number with flow. Hence the min-cut may contain edges that are not in the original graph, i.e., the edges connecting $v_{in}$ and $v_{out}$. For example, consider a graph where every edge has multiplicity 2. The min-cut reported by the flow algorithm should then only contain edges between $v_{in}$ and $v_{out}$.

There is a list of open problems on vertex(node) connectivity. ~~I guess Problem 1 is NP-hard but cannot prove it.~~

Update on 2025.3.4: It turns out that finding the minimum edge set whose removal breaks $k$-connectivity (for fixed $k$) is quite easy.

Theorem 3

Let $k$ be a fixed integer. Given a graph $G$, the minimum edge set breaking $k$-connectivity can be found in polynomial time. In fact, the solution is

\min \left\{ \operatorname{mincut}(G-X)\mid X\subset V, |X|\leq k-1 \right\}

Proof

One can verify whether $G$ is $k$-connected in polynomial time, so we assume that $G$ is $k$-connected, for otherwise the solution is the empty set.

Suppose for a contradiction that there is a minimum edge set $F$ that breaks $k$-connectivity and is not in $\left\{ \operatorname{mincut}(G-X)\mid X\subset V, |X|\leq k-1 \right\}$. Then one can see that $G-F$ is $(k-1)$-connected and there must be a set $X$ of $k-1$ vertices whose removal disconnects $G-F$. We write $N_G(X)$ for the set of edges incident to some vertex of $X$ in the graph $G$. Thus $F'=F+N_{G-F}(X)$ is a cut of $G$. One can see that $F'-N_G(X)$ is also a cut of $G-X$ and is not smaller than $\min\{ \operatorname{mincut}(G-X)\}$ over all $|X|=k-1$, a contradiction.
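Theorem 3 translates directly into a brute-force procedure for fixed $k$. The sketch below, for small simple graphs with unit edge weights only, enumerates all $X$ with $|X|\le k-1$ and computes each min cut by exhaustive search (a real implementation would use a proper min-cut algorithm); the names are illustrative.

```python
from itertools import combinations


def min_cut(vertices, edges):
    """Brute-force global min cut (unit weights); None if fewer than 2 vertices."""
    vs = list(vertices)
    if len(vs) < 2:
        return None
    best = None
    for size in range(1, len(vs)):
        for S in combinations(vs, size):
            S = set(S)
            cut = sum(1 for u, v in edges if (u in S) != (v in S))
            best = cut if best is None else min(best, cut)
    return best


def min_edges_breaking_k_connectivity(n, edges, k):
    """min{ mincut(G - X) : X subset of V, |X| <= k - 1 }, enumerating X."""
    best = None
    for size in range(k):  # |X| = 0, 1, ..., k - 1
        for X in combinations(range(n), size):
            X = set(X)
            rest = [v for v in range(n) if v not in X]
            sub = [(u, v) for u, v in edges if u not in X and v not in X]
            c = min_cut(rest, sub)
            if c is not None:
                best = c if best is None else min(best, c)
    return best


K4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
print(min_edges_breaking_k_connectivity(4, K4, 2), min_edges_breaking_k_connectivity(4, K4, 3))
```

For instance, $K_4$ needs 2 edge deletions to lose 2-connectivity but only 1 to lose 3-connectivity ($K_4-e$ has vertex connectivity 2), matching the values for $k=2$ and $k=3$.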

However, the capacitated versions of Problem 2 and Problem 1 are hard. Given a graph $G=(V,E)$, edge weights $w:E\to \mathbb{Z}_{\geq 0}$, costs $c:E\to \mathbb{Z}_{\geq 0}$ and a budget $b\geq 0$, find an edge set $F\subset E$ such that $c(F)\leq b$ and such that removing $F$ minimizes the vertex connectivity of $G-F$. Similar to the edge connectivity case (which will be shown in the next section), if the cost $c$ is nontrivial then knapsack is a special case of this problem. (Consider $G=K_n$ for some large $n$. Pick a $K_{n-1}$ in $G$ and set the cost of the edges in $K_{n-1}$ to infinity.) How hard is Problem 2 if the costs are trivial?

2 (Edge) Connectivity interdiction

If we replace the vertex connectivity in Problem 2 with edge connectivity, then the problem is called connectivity interdiction and was first studied by Zenklusen [2].

Problem 4

Given a graph $G=(V,E)$, costs $c:E\to \mathbb{Z}_+$, weights $w:E\to \mathbb{Z}_+$ and a budget $B\in \mathbb{Z}_+$, find an edge set $R$ such that $c(R)\leq B$ and that minimizes the $w$-weighted min cut of $(V,E\setminus R)$.

The Fault-Tolerant Path problem (FTP) seems similar to Problem 4. In the FTP problem, we are given an edge-weighted directed graph $G=(V,E)$, a subset $U \subseteq E$ of vulnerable edges, two vertices $s,t\in V$ and integers $k$ and $\ell$. The task is to decide whether there exists a subgraph $H$ of $G$ with total cost at most $\ell$ such that, after the removal of any $k$ vulnerable edges, $H$ still contains an $s$-$t$ path. The problem degenerates into finding a $k$-edge-connected spanning subgraph if the set of vulnerable edges is $E$.

A recent paper [3] gives an FPTAS for connectivity interdiction (Problem 4). Here I try to develop the intuition, since I have never seen an algorithm based on such a complicated and ingenious reweighting of edges.

First, one can see that the optimal solution can always be taken to be a subset of the edges of some cut of $G$. This is because if the optimal solution $R^*$ contains any edge not in the cut, we can safely delete it from $R^*$. Thus the optimal solution is indeed a pair $(C^*,R^*\subset C^*)$. The authors call this problem the $b$-free min-cut problem ($b$ is the budget, and we are allowed to pick edges of the “mincut” for free as long as their total cost is at most $b$).

So the goal is to find an FPTAS for the $b$-free min-cut problem. The problem is hard since it contains knapsack as a special case. (Consider a graph with only 2 vertices and many parallel edges.) However, knapsack is known to have an FPTAS. If we knew part of the optimal solution, namely $C^*$, we could use the knapsack FPTAS to find a $(1+\epsilon)$-approximation of $R^*$.
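For tiny instances the decomposition into $(C^*,R^*)$ can be made concrete: enumerate all cuts and, inside each cut, solve the knapsack exactly. This sketch assumes positive integer costs and uses a pseudo-polynomial dynamic program instead of the FPTAS; it is exponential in $|V|$ and only illustrates the structure of the $b$-free min-cut problem. Edges are hypothetical `(u, v, w, c)` tuples.

```python
from itertools import combinations


def best_removal(cut, budget):
    """0/1 knapsack DP: max total weight removable from `cut` with total cost <= budget.

    cut: list of (weight, cost) pairs with positive integer costs.
    """
    dp = [0] * (budget + 1)
    for w, c in cut:
        for cap in range(budget, c - 1, -1):  # descending: each edge used at most once
            dp[cap] = max(dp[cap], dp[cap - c] + w)
    return dp[budget]


def b_free_min_cut(n, edges, budget):
    """Brute force over all cuts delta(S) of a graph on vertices 0..n-1."""
    best = None
    for size in range(1, n):
        for S in combinations(range(n), size):
            S = set(S)
            cut = [(w, c) for u, v, w, c in edges if (u in S) != (v in S)]
            value = sum(w for w, _ in cut) - best_removal(cut, budget)
            best = value if best is None else min(best, value)
    return best


# Two vertices with parallel edges: exactly a knapsack instance.
parallel = [(0, 1, 5, 2), (0, 1, 4, 2), (0, 1, 3, 1)]
print(b_free_min_cut(2, parallel, 3))
```

With budget 3 the best choice removes the edges of weight 5 and 3, leaving weight 4. With unit costs, `best_removal` simply picks the $b$ heaviest edges of the cut.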

At this stage, given a hint suggesting reweighting the edges, I would guess that $C^*$ is exactly (or close to) the min-cut of the reweighted graph. Based on this idea I would also guess that, although the connectivity interdiction problem ($b$-free min-cut) is NP-hard, $C^*$ can be computed in polynomial time. In other words, the intractable part is solving the knapsack within $C^*$. This seems reasonable, since the problem is known to be in P for unit costs, and in that case the knapsack is trivial. Let's assume that my guess is correct and work on the reweighting part.

2.1 Reweighting

There is a chapter on reweighting in Sariel Har-Peled's geometric approximation book (not quite the same as the reweighting technique in [3]). Is reweighting a common technique for designing approximation algorithms?

One possible weight function sets $w(e)=0$ for all $e\in R^*$… However, this is cheating, since we assume that $R^*$ is the hard part. So now we need a weight function such that the following hold:

The min-cut of the reweighted graph is close to $C^*$.
Computing the weight function takes polynomial time.

From the “cheating” example we can see that knowing $R^*$ does help, but computing $R^*$ is hard. So maybe we can find a slightly worse weight function that is much easier to compute. It seems that we are making a trade-off between how close the global min-cut of the reweighted graph is to $C^*$ and how much time is needed to compute the weight function. This paper indeed does a great job of finding such a balance. I sent an email to one of the authors to ask about the intuition behind the reweighting but did not get a real answer; they suggested reading [2]. Zenklusen did almost the same thought experiment as above. Instead of reweighting, he indirectly enumerated $R^*$. Consider the unit cost case for example. If the optimal cut $C^*$ is given, the interdicted edges $R^*$ are the $b$ heaviest edges of $C^*$. We cannot directly enumerate $R^*$ since that still takes exponential time. What we can enumerate is a lower bound on the weights of the edges in $R^*$: put all edges with weight exceeding this lower bound into $R^*$ and find a global min-cut with an additional budget constraint on these edges. Then there are only $m$ lower bounds to enumerate, and the budgeted min-cut can be computed fast. For general costs, he enumerated the set of $1/\varepsilon$ heaviest edges in $C^*$.

The plan is to figure out how the authors came up with the weight function in [3] and whether it is possible to find a better one.

The key part is the following new problem called normalized min-cut:

Problem 5 (Normalized min-cut)

Given an instance of connectivity interdiction, find a cut $C$ and a subset $F\subset C$ such that $0\leq c(F)\leq b$ and $\frac{w(C\setminus F)}{b+1-c(F)}$ is minimized.
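Problem 5 can likewise be brute-forced on toy instances to get a feel for the objective. This sketch (same hypothetical `(u, v, w, c)` edge format, exponential time, names illustrative) enumerates every cut $C$ and every $F\subseteq C$ with $c(F)\le b$; the denominator is always at least 1.

```python
from fractions import Fraction
from itertools import combinations


def normalized_min_cut(n, edges, b):
    """min over cuts C and F in C with c(F) <= b of w(C minus F) / (b + 1 - c(F))."""
    best = None
    for size in range(1, n):
        for S in combinations(range(n), size):
            S = set(S)
            cut = [(w, c) for u, v, w, c in edges if (u in S) != (v in S)]
            for r in range(len(cut) + 1):
                for F in combinations(range(len(cut)), r):
                    cost = sum(cut[i][1] for i in F)
                    if cost > b:
                        continue
                    rest = sum(w for i, (w, _) in enumerate(cut) if i not in F)
                    ratio = Fraction(rest, b + 1 - cost)
                    best = ratio if best is None else min(best, ratio)
    return best


parallel = [(0, 1, 5, 2), (0, 1, 4, 2), (0, 1, 3, 1)]
print(normalized_min_cut(2, parallel, 3))
```

On this two-vertex instance with $b=3$ the minimum ratio is 3, attained both by $F=\emptyset$ ($12/4$) and by removing the weight-3 edge ($9/3$).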

~~I have been thinking for a while about how this problem is involved but have no clue. However, it does work…~~ The weight function is defined based on an estimate of the optimum of Problem 5. The authors claim that the optimal solution to the $b$-free min-cut problem is a 2-approximate min-cut of the reweighted graph. Then they enumerate all 2-approximate min-cuts of the reweighted graph and run the knapsack FPTAS on each cut to find a $(1+\epsilon)$-approximate solution.

2.2 More on normalized min-cut

If one slightly modifies lemmas in section 2 in [3], there are some interesting properties. Let

\tau

be the value of the optimal normalized min-cut (in [3]

\tau

is an estimation of the optimum) and define

\tilde{w}_\tau

accordingly. Then one can prove that the global min cut in

(G,\tilde{w}_\tau)

is exactly the optimal cut in the normalized min-cut of

(G,w)

(slightly modify lemma3 to see this). Also the value of min-cut in

(G,\tilde{w}_\tau)

\tau(b+1)

For unit costs, the optimum of normalized min-cut can be computed with the same complexity as connectivity interdiction, ignoring polylog factors [4]. Consider the sequence

\left\{ \lambda_i=\frac{\min_{|F|\le i} w(C\setminus F)}{b+1-i} \right\}

If this sequence is unimodal, $O(\log b)$ calls to a connectivity interdiction algorithm should suffice. (Note that $b$ is at most $m$ since costs are unit.) However, one can easily see that this sequence is not unimodal, so I don't quite believe this claim.
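Below is a brute-force sketch for experimenting with the sequence $\lambda_i$ on small instances. It assumes unit costs and takes $\lambda_i$ as the minimum over all cuts $C$ of the graph. It also relies on one fact: with unit costs and nonnegative weights, the best $F$ with $|F|\le i$ simply removes the $i$ heaviest edges of $C$. The function names and the toy triangle instance are my own. The code enumerates all $2^{n-1}$ cuts, so it is only meant for tiny graphs.

```python
from fractions import Fraction
from itertools import combinations


def cuts(n, edges):
    """Yield the weights of the edges of every cut (S, V - S) with vertex 0 in S."""
    for k in range(n - 1):  # |S| - 1 ranges over 0..n-2, so V - S is never empty
        for extra in combinations(range(1, n), k):
            side = {0, *extra}
            yield [w for (u, v, w) in edges if (u in side) != (v in side)]


def lambda_sequence(n, edges, b):
    """lambda_i = min over cuts C of (w(C) minus its i heaviest edges) / (b + 1 - i).

    Unit costs are assumed, so the best F with |F| <= i removes the i heaviest
    edges of C (all of them if C has fewer than i edges).
    """
    lams = []
    for i in range(b + 1):
        best = None
        for weights in cuts(n, edges):
            rest = sorted(weights, reverse=True)[i:]
            val = Fraction(sum(rest), b + 1 - i)
            if best is None or val < best:
                best = val
        lams.append(best)
    return lams


# Toy instance: a triangle with weighted edges (u, v, weight) and budget b = 1.
edges = [(0, 1, 3), (1, 2, 1), (0, 2, 2)]
lams = lambda_sequence(3, edges, b=1)
tau = min(lams)  # value of the normalized min-cut under the assumptions above
print(lams, tau)
```

On larger random instances, printing `lams` is a quick way to look at the shape of the sequence.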

Comments on [3]

Add SageMath to pylance?

Yu Cong — Fri, 24 Jan 2025 00:00:00 UT

Tags: sage

For half a year I have been wondering how to edit .sage files in VS Code with proper linting and highlighting.

There is a question about this on Stack Overflow from 2023: https://stackoverflow.com/questions/76201511/add-sagemath-to-pylance

But what is the path of the Python interpreter used by Sage?

It turns out that you can just type sage --python… and everything works.

sage --python
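If some setting needs an explicit interpreter path rather than a command, one option is to ask the interpreter itself. This is a minimal sketch with a hypothetical script name. The printed path could then go into the VS Code Python extension's `python.defaultInterpreterPath` setting, though it is an assumption that Pylance resolves Sage's modules this way.

```python
# find_interpreter.py (hypothetical name); run it as:  sage --python find_interpreter.py
import sys

# sys.executable is the absolute path of the interpreter running this script,
# i.e. the Python that Sage launches.
print(sys.executable)
```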

A large part of Sage is written in Cython, so Pylance still works poorly on it. It would be nice if someone wrote a language server for Sage.

Update on March 31st

The best way to install sage: https://doc.sagemath.org/html/en/installation/conda.html#sec-installation-conda

Works well with vscode.

Graphic matroid representation and cocircuit transversal

Yu Cong — Thu, 09 Jan 2025 00:00:00 UT

Tags: matroid

While reading https://arxiv.org/pdf/2407.09477 (for a reading group), I realized that I lack knowledge of matroid representation.
These are very incomplete notes on problems related to matroid representation.

List of materials I briefly read:

Update: the latter half of this post is way off-topic, so I changed the title.

1 Graphic matroids are regular

Consider the vertex-edge incidence matrix $A$. Orient the edges arbitrarily and set $A_{v,e}=+1$ if $e$ enters $v$, $A_{v,e}=-1$ if $e$ leaves $v$, and $A_{v,e}=0$ otherwise. The minimal linearly dependent sets of columns are exactly the cycles of the original graph. Take $+1$ to be the multiplicative identity and $-1$ its additive inverse; then this works over any field.
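Here is a small numerical check of this statement, over the rationals only. The sketch builds the signed incidence matrix of a triangle with a pendant edge; the graph and the orientation are arbitrary choices of mine. It then compares the ranks of column sets using Gaussian elimination on `Fraction`s.

```python
from fractions import Fraction


def incidence_matrix(n, arcs):
    """Signed vertex-edge incidence matrix: +1 if the arc enters v, -1 if it leaves v."""
    A = [[0] * len(arcs) for _ in range(n)]
    for j, (tail, head) in enumerate(arcs):
        A[head][j] = 1
        A[tail][j] = -1
    return A


def rank(rows):
    """Rank over the rationals by Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0]) if M else 0):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * p for a, p in zip(M[i], M[r])]
        r += 1
    return r


def column_rank(A, cols):
    return rank([[row[j] for j in cols] for row in A])


# Triangle 0 -> 1 -> 2 -> 0 plus the pendant arc 2 -> 3.
arcs = [(0, 1), (1, 2), (2, 0), (2, 3)]
A = incidence_matrix(4, arcs)
print(column_rank(A, [0, 1, 2]),     # the triangle: dependent columns
      column_rank(A, [0, 1, 3]),     # a spanning tree: independent columns
      column_rank(A, [0, 1, 2, 3]))  # whole graph: rank n - c = 3
```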

2 If $M$ is linear, so is $M^*$

https://fardila.com/Clase/Matroids/LectureNotes/lectures1-25.pdf page 21

In the dual matroid $M^*$, the ground set is the same as that of $M$, and the sum of their ranks is $n$.

The idea is to view a linear matroid as an $r$-dimensional subspace $V$ of $\mathbb{R}^n$. Let $B^*$ be a basis of $M(V^\bot)$ (where $V^\bot$ is the orthogonal complement of the column space of $V$), and let $B$ be a basis of $M(V)$. $B^*$ spans $V^\bot$. The intersection of $V^\bot$ with the subspace spanned by the vectors in $E-B^*$ (that is, $V$) is trivial. The subspace spanned by $B^*$ is exactly the orthogonal complement of $V$, which is the subspace spanned by $B$.

An alternative proof is https://math.mit.edu/~goemans/18438F09/lec8.pdf Theorem 2.

This argument works over any field. Thus both graphic matroids and cographic matroids are regular.
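Here is a concrete instance using the standard-form construction. If $M$ is represented by $[I_r \mid A]$, then $M^*$ is represented by $[-A^T \mid I_{n-r}]$. The sketch takes a small $A$ of my own choosing and checks two things: the two row spaces are orthogonal, and a set of $r$ columns is a basis of $M$ exactly when the complementary $n-r$ columns form a basis of the dual representation. All arithmetic is exact, over $\mathbb{Q}$.

```python
from fractions import Fraction
from itertools import combinations


def rank(rows):
    """Rank over the rationals by Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0]) if M else 0):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * p for a, p in zip(M[i], M[r])]
        r += 1
    return r


def standard_pair(A):
    """Return ([I_r | A], [-A^T | I_k]) for an r x k matrix A."""
    r, k = len(A), len(A[0])
    primal = [[int(t == i) for t in range(r)] + list(A[i]) for i in range(r)]
    dual = [[-A[i][j] for i in range(r)] + [int(t == j) for t in range(k)] for j in range(k)]
    return primal, dual


def columns(rows, cols):
    return [[row[c] for c in cols] for row in rows]


A = [[1, 0, 1], [0, 1, 1]]  # made-up example: r = 2, n = 5
P, D = standard_pair(A)
r, n = len(P), len(P[0])

# Every row of P is orthogonal to every row of D.
orthogonal = all(sum(p * d for p, d in zip(pr, dr)) == 0 for pr in P for dr in D)

# S is a basis of M  <=>  E - S is a basis of the dual representation.
complementary = all(
    (rank(columns(P, S)) == r)
    == (rank(columns(D, [c for c in range(n) if c not in S])) == n - r)
    for S in combinations(range(n), r)
)
print(orthogonal, complementary)
```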

3 Cographic matroids

The cycle rank is the minimum number of edges whose removal makes the graph acyclic. Whatever we remove, the remaining edges form a forest, which has at most $n-c$ edges, where $n$ is the number of vertices and $c$ is the number of components. Removing all edges outside a spanning forest $F$ achieves this, since every spanning forest has exactly $n-c$ edges. So the cycle rank is $m-(n-c)$, which is the rank of the dual of the graphic matroid of $G$.

For non-planar graphs, the dual of the graphic matroid is not graphic (Whitney's planarity criterion). So in general we do not have a graph representation of cographic matroids. However, cographic matroids are still linear, and the cycle space gives a representation.

The edge space is a vector space over $\mathbb{F}_2$. Elements of the edge space of $G=(V,E)$ are subsets of $E$, and adding two elements means taking their symmetric difference. The edge space itself has dimension $m=|E|$, with the singletons as a basis. The spanning forests are the bases of the graphic matroid, whose rank is $n-c$.

The cycle space is naturally defined as the vector space spanned by all cycles (together with $\emptyset$). What is its rank? It is exactly $m-(n-c)$. Fix a spanning forest $F$ (of size $n-c$): each edge outside $F$ closes a cycle with $F$. These cycles are linearly independent, since each contains an edge that no other one contains. A basis chosen in this way is called a fundamental cycle basis. Moreover, the cycle space is spanned by the induced cycles.
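Here is a sketch of the fundamental cycle basis over $\mathbb{F}_2$. Edge sets are encoded as integer bitmasks, so addition is XOR. A spanning forest is chosen greedily with union-find. Each non-forest edge is joined with the forest path between its endpoints. The example graph, $K_4$ plus a disjoint triangle, is my own choice.

```python
from collections import deque


def spanning_forest(n, edges):
    """Indices of a spanning forest, chosen greedily with union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    for i, (u, v) in enumerate(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append(i)
    return forest


def forest_path_mask(n, edges, forest, s, t):
    """Bitmask of the forest edges on the unique s-t path (BFS from s)."""
    adj = [[] for _ in range(n)]
    for i in forest:
        u, v = edges[i]
        adj[u].append((v, i))
        adj[v].append((u, i))
    prev = {s: None}
    queue = deque([s])
    while queue:
        x = queue.popleft()
        for y, i in adj[x]:
            if y not in prev:
                prev[y] = (x, i)
                queue.append(y)
    mask = 0
    while t != s:
        x, i = prev[t]
        mask |= 1 << i
        t = x
    return mask


def fundamental_cycles(n, edges):
    """One cycle (an edge bitmask over GF(2)) per edge outside the forest."""
    forest = spanning_forest(n, edges)
    in_forest = set(forest)
    return [forest_path_mask(n, edges, forest, u, v) | (1 << i)
            for i, (u, v) in enumerate(edges) if i not in in_forest]


def gf2_rank(vectors):
    """Rank over GF(2) of integers viewed as bit vectors (xor basis)."""
    basis = []
    for x in vectors:
        for b in basis:
            x = min(x, x ^ b)
        if x:
            basis.append(x)
    return len(basis)


# K4 on {0,1,2,3} plus a disjoint triangle on {4,5,6}: m = 9, n = 7, c = 2.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (4, 5), (5, 6), (4, 6)]
cycles = fundamental_cycles(7, edges)
print(len(cycles), gf2_rank(cycles))  # both equal m - (n - c) = 4
```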

The cut space contains all cuts of the graph (why is this a subspace?). One possible basis of the cut space is $\{\text{star}(v)\}$ for $n-c$ vertices $v$, namely all vertices except one per component. Thus the rank of the cut space is $n-c$. The minimal cuts also span the cut space. One may observe that the ranks of the cut space and the cycle space sum to $m$. In fact, they are orthogonal complements of each other. For proofs see Chapter 1.9 in [1].
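The claims about the cut space can be checked by brute force on a small graph, here $K_4$ plus a disjoint triangle (my own example), with edge sets as bitmasks over $\mathbb{F}_2$. The sketch checks three things. First, $\delta(S)\oplus\delta(T)=\delta(S\oplus T)$, which answers why the cuts form a subspace. Second, the stars have rank $n-c$. Third, a few hand-picked cycles meet every cut in an even number of edges.

```python
from itertools import combinations


def gf2_rank(vectors):
    """Rank over GF(2) of integers viewed as bit vectors (xor basis)."""
    basis = []
    for x in vectors:
        for b in basis:
            x = min(x, x ^ b)
        if x:
            basis.append(x)
    return len(basis)


def cut_mask(edges, side):
    """delta(side): edges with exactly one endpoint in `side`, as a bitmask."""
    return sum(1 << i for i, (u, v) in enumerate(edges) if (u in side) != (v in side))


# K4 on {0,1,2,3} plus a triangle on {4,5,6}: n = 7, c = 2.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (4, 5), (5, 6), (4, 6)]
n = 7
stars = [cut_mask(edges, {v}) for v in range(n)]
all_sides = [set(s) for k in range(n + 1) for s in combinations(range(n), k)]

# Cuts are closed under symmetric difference: delta(S) xor delta(T) = delta(S xor T).
closed = all(cut_mask(edges, S) ^ cut_mask(edges, T) == cut_mask(edges, S ^ T)
             for S in all_sides for T in all_sides)


def mask_of(edge_list):
    return sum(1 << edges.index(e) for e in edge_list)


cycles = [mask_of([(0, 1), (1, 2), (0, 2)]),
          mask_of([(0, 1), (1, 3), (2, 3), (0, 2)]),
          mask_of([(4, 5), (5, 6), (4, 6)])]
# Every cycle meets every cut in an even number of edges (orthogonality over GF(2)).
orthogonal = all(bin(c & cut_mask(edges, S)).count("1") % 2 == 0
                 for c in cycles for S in all_sides)
print(gf2_rank(stars), closed, orthogonal)  # n - c = 5, True, True
```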

One important fact we are assuming is that the cycle space and the cut space are subspaces. For graphs this is easy: the symmetric difference of two cuts is again a cut, and the symmetric difference of two cycles is an edge-disjoint union of cycles. Is this still true for non-graphic matroids?

Unfortunately, for general matroids the set of circuits (or cocircuits) is not closed under symmetric difference. This can be seen from the circuit axioms. For distinct circuits $C_1, C_2$ and $e\in C_1\cap C_2$, we only know that some circuit $C\subseteq (C_1 \cup C_2)\setminus e$ exists. For example, consider the two circuits $\left\{ 1,2,3 \right\}$ and $\left\{ 2,3,4 \right\}$ of $U_{2,4}$: their symmetric difference $\left\{ 1,4 \right\}$ is independent.
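A tiny exhaustive version of this example, with $U_{2,4}$ encoded directly by its rank (a set is independent iff it has at most 2 elements):

```python
from itertools import combinations

# U_{2,4}: ground set {1,2,3,4}; a set is independent iff it has at most 2 elements.
E = [1, 2, 3, 4]
RANK = 2


def independent(X):
    return len(X) <= RANK


circuits = [set(C) for C in combinations(E, RANK + 1)]  # the minimal dependent sets

# The symmetric difference of any two distinct circuits is independent,
# so it is not a circuit or a union of circuits.
print(all(independent(C1 ^ C2) for C1, C2 in combinations(circuits, 2)))
print({1, 2, 3} ^ {2, 3, 4})
```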

Note that the example $U_{2,4}$ is the only excluded minor for binary matroids. So what about binary matroids? Binary matroids form a self-dual family, so it suffices to show that the symmetric difference of two distinct circuits is a disjoint union of circuits. This is Theorem 9.1.2 in [2].

A similar problem, concerning a special basis (like $\text{star}(v)$) of the "cocircuit space", is discussed on MathOverflow. I will elaborate further in the next section.

4 Cocircuit transversal

In the comments, the OP mentioned an interesting fact, which is Corollary 1 in [3]:

Corollary 1

Given a base $B$ of a rank-$r$ matroid $M$, let $\mathcal C^*=\left\{ C_e^* \mid e\in B \right\}$ be the set of fundamental cocircuits associated with $B$. Every base of $M$ is a transversal of $\mathcal C^*$.

An alternative way to state this: take a base $B$ of some matroid $M$ and consider the transversal matroid on $\left\{ C_e^* \mid e\in B \right\}$. Then every base of $M$ is independent in this transversal matroid. Take graphic matroids as an example. Each edge $e$ of a spanning tree $T$ defines a unique fundamental cut, and every spanning tree is a transversal of these cuts.
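Here is a brute-force check of Corollary 1 for the graphic matroid of $K_4$. The fixed spanning tree $T$ is an arbitrary choice of mine. The sketch computes the fundamental cuts of $T$, enumerates all 16 spanning trees, and tests each one for a system of distinct representatives by trying every bijection. It is exponential, so it only illustrates the statement.

```python
from itertools import combinations, permutations


def components(n, edge_list):
    """Component label of each vertex in the graph (V, edge_list)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for u, v in edge_list:
        parent[find(u)] = find(v)
    return [find(x) for x in range(n)]


def is_spanning_tree(n, tree):
    return len(tree) == n - 1 and len(set(components(n, tree))) == 1


def fundamental_cut(n, edges, tree, e):
    """Edges crossing the two components of tree - e."""
    comp = components(n, [f for f in tree if f != e])
    return {f for f in edges if comp[f[0]] != comp[f[1]]}


def is_transversal(elements, sets):
    """Brute force: can the elements be matched bijectively to sets containing them?"""
    return len(elements) == len(sets) and any(
        all(x in s for x, s in zip(perm, sets)) for perm in permutations(elements))


n = 4
edges = list(combinations(range(n), 2))  # K4
T = [(0, 1), (1, 2), (2, 3)]             # an arbitrary fixed spanning tree
cuts = [fundamental_cut(n, edges, T, e) for e in T]
trees = [list(S) for S in combinations(edges, n - 1) if is_spanning_tree(n, list(S))]
print(len(trees), all(is_transversal(S, cuts) for S in trees))
```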

I tried to prove this corollary myself but failed. The following proof is from the paper. I think there should be a nice direct proof (without the theorem below) using some "common transversal" techniques I am not familiar with.

To simplify notation, we need some definitions from [3]. For a base $B$ and $e\in B$, let $C_e^*$ be the unique cocircuit in $e\cup (E\setminus B)$. For $e\notin B$, let $C_e^*=\{e\}$. Dually, for a fixed base $B$ and $e\notin B$, define $C_e$ to be the unique circuit in $B\cup e$, and let $C_e=\{e\}$ for $e\in B$. There is an interesting theorem in [3].

Theorem 2

Given any matroid $M$ with ground set $E$ and rank $r$, consider any base $B$ and any subset $F\subseteq E$ of size $r$. Then $F$ is a transversal of $\{C_e^* \mid e\in B\}$ if and only if $B$ is a transversal of $\{C_e \mid e\in F\}$.

Here $C_e^*$ and $C_e$ are both taken with respect to the base $B$.

Proof

(⇐) Suppose $B$ is a transversal of $\{C_e \mid e\in F\}$. Since $B$ is a system of distinct representatives, we can write $B$ as $\{b_e \mid b_e \in C_e,\ e\in F\}$. For $e\in B\cap F$ we have $b_e=e$, since $C_e$ is a singleton for these $e$. So it suffices to show that $e\in C_{b_e}^*$ for all $b_e\in B\setminus F$. Indeed, since $C_{b_e}^*\subseteq \{b_e\}\cup(E\setminus B)$ and $C_e\subseteq B\cup\{e\}$, if $e\notin C_{b_e}^*$ then the intersection of $C_{b_e}^*$ and $C_e$ is $\{b_e\}$. But the intersection of a circuit and a cocircuit in a matroid is never a singleton.
(⇒) Suppose $F$ is a transversal of $\{C_e^* \mid e\in B\}$, and write $F=\{f_e \mid f_e \in C_e^*,\ e\in B\}$. Again we can ignore the intersection $B\cap F$ and prove $e\in C_{f_e}$ for $f_e\in F\setminus B$. The proof is identical.

So what if $F$ is another base? We get the corollary once we show that $B$ is a transversal of $\{C_e \mid e\in F\}$ for every other base $F\neq B$. Finally, here is the proof of the corollary.

Proof

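By contradiction. Suppose that for some base $F$, the base $B$ is not a transversal of $\{C_e \mid e\in F\}$. Then by Hall's theorem there exists a set $I\subseteq F$ (independent, as a subset of a base) such that $|I|>\left|\left(\bigcup_{e\in I} C_e\right)\cap B\right|$. Note that the $C_e$'s are fundamental circuits of $B$. So every $e\in I$ lies in the closure of $\left(\bigcup_{e\in I} C_e\right)\cap B$. Since $I$ is independent, $|I|\le r\left(\left(\bigcup_{e\in I} C_e\right)\cap B\right)\le \left|\left(\bigcup_{e\in I} C_e\right)\cap B\right|$, a contradiction.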

References

[1]

R. Diestel, Graph Theory, Springer Berlin Heidelberg, Berlin, Heidelberg, 2017 10.1007/978-3-662-53622-3.

[2]

J.G. Oxley, Matroid theory, 2nd ed, Oxford University Press, Oxford ; New York, 2011.

[3]

R.A. Brualdi, On fundamental transversal matroids, Proceedings of the American Mathematical Society. 45 (1974) 151–156 10.1090/S0002-9939-1974-0387087-4.

Matroid base packing and covering

Yu Cong — Sat, 04 Jan 2025 00:00:00 UT

Tags: matroid, optimization, combinatorics

Few textbooks in combinatorial optimization discuss matroid base packing, while matroid base covering (the matroid partition problem) is everywhere. Packing and covering of trees in graphs can be found in Chapter 51 of [1].

1 Base packing & base covering

Problem 1

Given a matroid $M=(E,\mathcal I)$ and its set of bases $\mathcal B$, find

(minimum base covering) the minimum number of bases whose union is $E$, and
(maximum base packing) the maximum number of pairwise disjoint bases.

These problems can be formulated with the following integer programs, base packing:

\begin{align*} \max& & \sum_{B\in\mathop{\mathrm{\mathcal B}}} x_B& & &\\ s.t.& & \sum_{B:e\in B} x_B &\leq 1 & &\forall e\in E\\ & & x_B&\in \left\{ 0,1 \right\} & &\forall \text{ base $B$} \end{align*}

base covering:

\begin{align*} \min& & \sum_{B\in\mathop{\mathrm{\mathcal B}}} x_B& & &\\ s.t.& & \sum_{B:e\in B} x_B &\geq 1 & &\forall e\in E\\ & & x_B&\in \left\{ 0,1 \right\} & &\forall \text{ base $B$} \end{align*}

Integer programs are hard in general, and these IPs have an exponential number of variables. We consider their linear relaxations.

Actually, these two problems are not hard on general matroids: both can be solved with a polynomial number of independence oracle calls.

matroid base covering = matroid partitioning ≈ matroid union. Let $M=(E,\mathcal I)$ be the matroid and $r_k(\cdot)$ the rank function of the $k$-fold union $M^k$. The minimum number of bases that cover the ground set is $\min\{k : r_k(E)=|E|\}$.
matroid base packing ≈ matroid union. The maximum integral base packing number is $\max\{k : r_k(E)=k\,r(M)\}$.
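To make Problem 1 concrete, here is a brute-force sketch. It enumerates all bases of a tiny matroid and solves both problems by exhaustive search. That is exponential, unlike the matroid-union approach. The example, the uniform matroid $U_{2,5}$, is my own choice. Its answers should be $\lfloor 5/2\rfloor=2$ disjoint bases and $\lceil 5/2\rceil=3$ covering bases.

```python
from itertools import combinations


def all_bases(E, rank, independent):
    return [frozenset(B) for B in combinations(E, rank) if independent(B)]


def max_packing(bases):
    """Largest number of pairwise disjoint bases (depth-first search)."""
    best = 0

    def extend(used, start, count):
        nonlocal best
        best = max(best, count)
        for i in range(start, len(bases)):
            if not bases[i] & used:
                extend(used | bases[i], i + 1, count + 1)

    extend(frozenset(), 0, 0)
    return best


def min_covering(E, bases):
    """Smallest number of bases whose union is E (None if some element is a loop)."""
    target = frozenset(E)
    for k in range(1, len(target) + 1):
        for combo in combinations(bases, k):
            if frozenset().union(*combo) == target:
                return k
    return None


E = range(5)  # U_{2,5}: every set of size <= 2 is independent
bs = all_bases(E, 2, lambda X: len(X) <= 2)
print(max_packing(bs), min_covering(E, bs))
```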

Thus the integral versions of these two problems are polynomially solvable (in terms of the number of oracle calls), since matroid union is tractable. We will discuss computing the fractional versions later.

2 Matroid strength and density

In this section we discuss matroid strength and density and their relation to base packing and covering. I think none of the results is new; some of them can be found in [2] and [3].

The fractional base covering number of a graphic matroid is called the fractional arboricity. It is known that the fractional arboricity $\alpha(G)$ equals $\max\limits_{\emptyset \subsetneq X\subset E}\frac{|X|}{r(X)}$. Define the density of a matroid $M$ as $\alpha(M)=\max\limits_{\emptyset \subsetneq X\subset E}\frac{|X|}{r(X)}$. The name "density" comes from [2]. I use the symbol $\alpha$ since density generalizes arboricity.

For the packing part, consider the fractional version of the Nash-Williams theorem:

Theorem 2

The fractional spanning tree packing number of a connected graph $G=(V,E)$ equals $\min \frac{|E[\mathcal P]|}{|\mathcal P|-1}$. The minimum is taken over all partitions $\mathcal P$ of $V$ with $|\mathcal P|\geq 2$, and $E[\mathcal P]$ is the set of edges between different parts.

The fraction in the above theorem can be rewritten as $\frac{|E-F|}{r(E)-r(F)}$ by taking $F$ to be the set of edges with both ends in the same part. This form only uses the ground set and the rank function, so it generalizes to non-graphic matroids. The minimum of this fraction,

\sigma(M)=\min_{F\subset E,\ r(F)<r(E)}\frac{|E-F|}{r(E)-r(F)}

is called the matroid strength (the name also comes from [2]).

For the connection between density and strength, we have the following inequality (take $X=E$ and $F=\emptyset$):

\alpha(M)=\max \frac{|X|}{r(X)} \geq \frac{|E|}{r(E)} \geq \min \frac{|E-F|}{r(E)-r(F)} =\sigma(M).
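Here is a brute-force check of these definitions on a small graphic matroid: a triangle with a pendant edge, my own example. The rank is computed by union-find. The sketch enumerates all subsets, so it only illustrates the formulas and the inequality above. In $\sigma$, only sets $F$ with $r(F)<r(E)$ are considered, to avoid dividing by zero. Here $\sigma=1$ because the pendant edge is a coloop.

```python
from fractions import Fraction
from itertools import combinations


def graphic_rank(n, X):
    """Rank of an edge set X in the graphic matroid: size of a spanning forest of X."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    r = 0
    for u, v in X:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            r += 1
    return r


def subsets(E):
    for k in range(len(E) + 1):
        yield from combinations(E, k)


def density(n, E):
    """alpha(M) = max |X| / r(X) over X with r(X) > 0."""
    return max(Fraction(len(X), graphic_rank(n, X))
               for X in subsets(E) if graphic_rank(n, X) > 0)


def strength(n, E):
    """sigma(M) = min |E - F| / (r(E) - r(F)) over F with r(F) < r(E)."""
    rE = graphic_rank(n, E)
    return min(Fraction(len(E) - len(F), rE - graphic_rank(n, F))
               for F in subsets(E) if graphic_rank(n, F) < rE)


E = [(0, 1), (1, 2), (0, 2), (2, 3)]  # triangle 0-1-2 plus pendant edge 2-3
alpha, sigma = density(4, E), strength(4, E)
print(alpha, Fraction(len(E), graphic_rank(4, E)), sigma)  # 3/2 >= 4/3 >= 1
```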

Theorem 3

The maximum fractional base packing number is $\sigma(M)$.

Proof

The proof is similar to the graph strength proof for tree packing in [1]. Let $B(M)$ be the base polytope of $M$ and $\Pi$ the power set of $E$. Consider the following linear programs:

\begin{align*} LP1=\min& & lx& \\ s.t.& & x&\in B(M) \end{align*}

\begin{align*} LP2=\max& & \sum_{F\subsetneq E} y_{E\setminus F}(r(E)-r(F))& \\ s.t.& & \sum_{F\subsetneq E} y_{E\setminus F} \chi^{E\setminus F} & \leq l\\ & & y & \in \mathbb{R}^\Pi_+ \end{align*}

and the dual of $LP2$,

\begin{align*} LP3=\min& & lx& & &\\ s.t.& & x^T\chi^{E\setminus F} &\geq r(E)-r(F) & &\forall F\subsetneq E\\ & & x&\in \mathbb{R}^E_+ & & \end{align*}

We first prove that the polyhedron in $LP3$,

Q=\{ x | x\geq 0,x^T\chi^{E\setminus F} \geq r(E)-r(F) \quad \forall F\subsetneq E\}

is the base polytope of $M$.

One can see that $B(M)\subseteq Q$. Now suppose $Q$ is larger than $B(M)$. Then there must exist $x\in Q$ such that $x(U)>r(U)$ for some $U\subseteq E$. Thus $\mathrm{OPT}(LP3)>\mathrm{OPT}(LP1)$.

However, take the optimal solution $x$ of $LP1$ and an optimal solution $y$ of $LP2$, whose value equals $\mathrm{OPT}(LP3)$ by LP duality. Then we have

\mathop{\mathrm{OPT}}(LP1)\geq \sum_{F\subsetneq E} y_{E\setminus F}\chi^{E\setminus F}\cdot x = \sum_{F\subsetneq E} y_{E\setminus F} \left(\sum_{e\in E}x_e-\sum_{e\in F}x_e\right)\geq \sum_{F\subsetneq E} y_{E\setminus F} \left(r(E)-r(F)\right)=\mathop{\mathrm{OPT}}(LP3)

Hence $Q=B(M)$.

Recall that $\sigma(M)=\min_{F\subsetneq E}\frac{|E\setminus F|}{r(E)-r(F)}$, taken over $F$ with $r(F)<r(E)$. Note that $\sigma(M)\geq 1$, because $|E\setminus F|\geq r(E)-r(F)$. $\sigma(M)$ can be interpreted as the largest $\lambda$ such that $|E\setminus F| \geq \lambda(r(E)-r(F))$ holds for all $F\subsetneq E$. Hence $\sigma(M)=\max \{\lambda \mid \mathbf 1\in \lambda B(M)\}$ since $Q=B(M)$. For fixed $\lambda$, $\mathbf 1 \in \lambda B(M)$ if and only if there exist $\lambda_b\geq 0$ for all bases $b$ of $M$ such that $\sum_b \lambda_b=\lambda$ and $\sum_b \lambda_b \chi^b\leq \mathbf 1$. This shows that $\sigma(M)$ is exactly the optimum of the base packing LP

\max\{\sum_b{\lambda_b} \mid \sum_{b}\lambda_b\chi^b\leq \mathbf 1,\ \lambda_b\geq 0\;\forall b\in \mathcal B\}

Note that this proof is a straightforward generalization of the tree packing theorem in [1], which is similar to the blocking polyhedra method described in [4].

2.1 Constructive proof

In [5] there is a constructive proof that recovers an optimal $F\subset E$ attaining $\sigma(M)$ from any optimal solution of the hitting set LP (the dual of base packing).

The idea is to show that any fractional solution $y$ to the base hitting set LP can be converted into another solution $y'$ with two properties: $y'(e)\in \left\{ 0,c \right\}$ for some global constant $c$, and $\sum_e w(e)y'(e)\leq \sum_e w(e)y(e)$.

Define two sets $P, P'\subseteq \mathbb{R}^{|E|}$:

\begin{aligned} P &=\left\{ y\in \mathbb{R}^{|E|}: y(e)\geq 0 \;\forall e\in E; y(B)\geq 1 \; \forall \text{ base B} \right\},\\ P' &=\left\{ y\in P: \forall e\in E, \exists B^e\in \mathcal B \; s.t. \; e\in B^e \land y(B^e)=\min_{B\in \mathcal B} y(B) \right\}. \end{aligned}

$P'$ is contained in $P$, and in $P'$ every element lies in some minimum base with respect to the weights $y:E\to \mathbb{R}$.

Proposition 4

Let $y\in P$. There exists $y'\in P'$ s.t. $y(e)\geq y'(e)$ for all $e$.

Proof

The proof is constructive. Let $B=\left\{ e_1,\ldots, e_r \right\}$ be a minimum weight base with respect to $y$, with $y(e_1)\leq \ldots \leq y(e_r)$. For each element $e\notin B$, let $C_e$ be the fundamental circuit in $B+e$. Define $y'$ as follows:

y'(e)= \begin{cases} y(e) & e\in B\\ \max\limits_{f\in C_e\setminus e} y(f) &e\notin B \end{cases}

One can easily verify two facts. First, $y'(e)\leq y(e)$ for all $e$, by the optimality of $B$ under $y$. Second, $B$ is still a minimum weight base under the weights $y'$. It remains to show that $y'\in P'$.

Every element is in a minimum base. For $e\in B$ this is automatic, so consider $e\notin B$. Let $f\in C_e\setminus e$ be the element of the fundamental circuit of $B+e$ with the largest weight $y(f)$. Then $B^e=B+e-f$ is a base and $y'(B^e)=y'(B)$.
For every base $B'$, $y'(B')\geq 1$ holds since $y'(B')\geq y'(B) = y(B)\geq 1$.
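Here is a sketch of this construction on a graphic matroid. Minimum bases are minimum spanning trees (found with Kruskal), and $C_e\setminus e$ is the tree path between the endpoints of $e$. The graph $K_4$ and the weights $y$ are made up, chosen so that every spanning tree has $y$-weight at least 1 (that is, $y\in P$). For $e\notin B$, the sketch sets $y'(e)$ to the largest $y$-weight on $C_e\setminus e$. It then brute-forces all spanning trees to check the claimed properties.

```python
from fractions import Fraction as Fr
from itertools import combinations


def find(parent, x):
    while parent[x] != x:
        x = parent[x]
    return x


def is_spanning_tree(n, edges, T):
    parent = list(range(n))
    for i in T:
        ru, rv = find(parent, edges[i][0]), find(parent, edges[i][1])
        if ru == rv:
            return False
        parent[ru] = rv
    return len(T) == n - 1


def kruskal(n, edges, y):
    parent, tree = list(range(n)), []
    for i in sorted(range(len(edges)), key=lambda i: y[i]):
        ru, rv = find(parent, edges[i][0]), find(parent, edges[i][1])
        if ru != rv:
            parent[ru] = rv
            tree.append(i)
    return tree


def tree_path(n, edges, tree, s, t):
    """Indices of tree edges on the s-t path, i.e. C_e - e for e = (s, t)."""
    adj = {x: [] for x in range(n)}
    for i in tree:
        u, v = edges[i]
        adj[u].append((v, i))
        adj[v].append((u, i))
    prev, stack = {s: None}, [s]
    while stack:
        x = stack.pop()
        for z, i in adj[x]:
            if z not in prev:
                prev[z] = (x, i)
                stack.append(z)
    path = []
    while t != s:
        t, i = prev[t]
        path.append(i)
    return path


def smooth(n, edges, y):
    """Return (B, y') following the proof of Proposition 4."""
    B = kruskal(n, edges, y)
    y2 = list(y)
    for i, (u, v) in enumerate(edges):
        if i not in B:
            y2[i] = max(y[j] for j in tree_path(n, edges, B, u, v))
    return B, y2


def weight(w, T):
    return sum(w[i] for i in T)


edges = list(combinations(range(4), 2))  # K4: (0,1),(0,2),(0,3),(1,2),(1,3),(2,3)
y = [Fr(1, 2), Fr(1, 4), Fr(3, 4), Fr(1), Fr(1, 2), Fr(2)]
B, y2 = smooth(4, edges, y)

trees = [T for T in combinations(range(6), 3) if is_spanning_tree(4, edges, T)]
best = min(weight(y2, T) for T in trees)
below = all(a <= b for a, b in zip(y2, y))       # y' <= y
still_min = best == weight(y, B)                 # B is still a minimum base
in_min_base = all(any(i in T and weight(y2, T) == best for T in trees)
                  for i in range(6))             # every element is in a minimum base
feasible = best >= 1                             # y'(B') >= 1 for every base B'
print(below, still_min, in_min_base, feasible)
```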

Proposition 4 shows that we can easily convert any solution $y\in P$ of the base hitting set LP into a better-behaved solution $y'\in P'$. Recall that our final goal is to find a special optimal solution $y'\in \{0,c\}^E$. Thus we want to analyse the optimal $y'$ further.

We are solving $\min_{y'} \left\{ \sum_e w(e)y'(e) \mid y'\in P' \right\}$. Notice that at an optimum, the minimum weight base under $y'$ has weight exactly 1. We want to prove that for any weight $w$ there is an optimal $y'$ with only one nonzero value, so we express the solution in terms of the values of $y'$. Suppose we have computed an optimal $y'$, and let $\theta_1 < \ldots < \theta_h$ be the distinct values of $y'$. Let $\mu_i$ be the total weight $w$ of the elements with $y'(e)=\theta_i$; when $w\equiv 1$, this is simply the number of such elements. One immediate observation is that the objective $\sum_e w(e)y'(e)$ can be written as $\sum_i \theta_i \mu_i$. Let $v_i=r(\left\{ e: y'(e)\leq \theta_{i} \right\})-r(\left\{ e: y'(e)\leq \theta_{i-1} \right\})$ be the rank increment from adding the elements with $y'(e)=\theta_i$. Another immediate observation, by the greedy algorithm, is that the weight of the minimum base is $\sum_{e\in B} y'(e)=\sum_i v_i\theta_i=1$. Based on these observations, we write the following LP for $\theta$:

\begin{aligned} \min& & &\sum_i \mu_i\theta_i \\ s.t.& & &\sum_i v_i\theta_i = 1\\ & & &0 \leq \theta_1\leq \theta_2\leq \ldots \leq \theta_h \end{aligned}

Suppose an optimal $y'$ is given. We can compute an optimal $\theta$ in the above LP and recover another solution $y''$ to base hitting set; one can check that $y''$ is still in $P'$. This LP has $h+1$ linearly independent constraints and $h$ variables, so a basic optimal solution makes $h$ of them tight. We already know that $\sum_i v_i\theta_i = 1$ must be tight, so at most one of the $h$ constraints $0\leq\theta_1\leq\cdots\leq\theta_h$ is slack. Hence there is always an optimal solution $\theta$ such that $0=\theta_1=\ldots=\theta_k < \theta_{k+1}=\ldots = \theta_h =c$. Let $F$ be the set of elements with $y''(e)=0$. Note that the minimum weight base contains $r(F)$ elements of $F$. Thus $c=\frac{1}{r(E)-r(F)}$, and the objective is

\sum_{e\in E} w(e)y''(e)=\sum_{e\in E-F}w(e)y''(e)=\frac{\sum_{e\in E-F} w(e)}{r(E)-r(F)}
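Here is a quick sanity check of this last step for unit weights $w\equiv 1$, on a small graphic matroid ($K_4$ minus an edge, my own example). For every $F$ with $r(F)<r(E)$, set $y''=c=1/(r(E)-r(F))$ on $E-F$ and $0$ on $F$. The minimum base weight is then exactly 1, because a base contains at most $r(F)$ elements of $F$ and some base contains exactly $r(F)$. Minimizing the objective over $F$ recovers $\sigma(M)$ as defined earlier.

```python
from fractions import Fraction
from itertools import combinations


def graphic_rank(n, X):
    """Rank of an edge set X in the graphic matroid (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    r = 0
    for u, v in X:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            r += 1
    return r


n = 4
E = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]  # K4 minus the edge (2, 3)
rE = graphic_rank(n, E)
bases = [B for B in combinations(E, rE) if graphic_rank(n, B) == rE]

min_weights, objectives = [], []
for k in range(len(E) + 1):
    for F in combinations(E, k):
        rF = graphic_rank(n, F)
        if rF == rE:
            continue  # only F with r(F) < r(E) give a finite c
        c = Fraction(1, rE - rF)
        y = {e: (0 if e in F else c) for e in E}
        min_weights.append(min(sum(y[e] for e in B) for B in bases))
        objectives.append(sum(y.values()))  # = |E - F| / (r(E) - r(F)) when w = 1

print(set(min_weights), min(objectives))  # {1} and sigma(M) = 5/3
```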

Theorem 5

The minimum fractional base covering number is $\alpha(M)$.

The proof is similar to, and easier than, the previous one. The corresponding polyhedron in $LP3$ becomes $\{x\mid x\geq 0,\ x^T\chi^{F}\leq r(F)\; \forall F\subset E\}$, which is exactly the independence polytope. A constructive proof for base covering also exists [5].

Note that these two theorems can be generalized to weighted packing and covering of matroid bases.

2.2 Integrality gap

It is known that the integral base packing number is $\left\lfloor \sigma(M) \right\rfloor$ and the integral base covering number is $\left\lceil \alpha(M) \right\rceil$. Thus the integrality gap is quite small for both base packing and covering.

In [3] there are stronger theorems describing the relation between the integral packing/covering numbers and $\sigma$/$\alpha$.

Theorem 6

Let $\varepsilon\in [0,1)$ be the fractional part of $\sigma(M)$ ($\alpha(M)$). Then there exists a packing (covering) of size $\left\lfloor \sigma(M) \right\rfloor$ ($\left\lceil \alpha(M) \right\rceil$) in which one of the independent sets has size at most $\varepsilon\cdot r(M)$.

2.3 Duality

Applying the rank function of the dual matroid gives the following (Theorem 1 in [2]):

Theorem 7

For a matroid $M$ without loops or coloops,

\sigma(M^*)=\frac{\alpha(M)}{\alpha(M)-1}
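The identity can be checked by brute force on a small example. The sketch uses $K_4$ minus an edge, my own choice; the graph is 2-connected, so its graphic matroid has no loops or coloops. It applies the standard dual rank formula $r^*(X)=|X|-r(E)+r(E\setminus X)$. Both sides come out as $5/2$.

```python
from fractions import Fraction
from itertools import combinations


def graphic_rank(n, X):
    """Rank of an edge set X in the graphic matroid (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    r = 0
    for u, v in X:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            r += 1
    return r


def subsets(E):
    for k in range(len(E) + 1):
        yield from combinations(E, k)


n = 4
E = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]  # K4 minus the edge (2, 3)
rE = graphic_rank(n, E)


def dual_rank(X):
    """r*(X) = |X| - r(E) + r(E - X)."""
    rest = [e for e in E if e not in X]
    return len(X) - rE + graphic_rank(n, rest)


alpha = max(Fraction(len(X), graphic_rank(n, X)) for X in subsets(E) if X)
rE_star = dual_rank(E)
sigma_dual = min(Fraction(len(E) - len(F), rE_star - dual_rank(F))
                 for F in subsets(E) if dual_rank(F) < rE_star)
print(alpha, sigma_dual, alpha / (alpha - 1))
```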

Another relation worth noting is the one between hitting set and set cover. The hitting set problem for matroid bases is known as computing the cogirth of the matroid. However, base covering is not the dual of the cogirth problem. The sets in the hitting set problem corresponding to base covering are $S_e=\left\{ B\mid e\in B \right\}$ for all $e\in E$.

3 Computing the strength and density

For graphic matroids, the strength and the fractional arboricity are known to be computable in strongly polynomial time. See Chapter 51.4 in [1] and these notes.

The idea is to consider the dual problem, which has only $|E|$ variables. If there is a separation oracle for testing whether a dual solution $x$ is feasible, then the ellipsoid method gives a polynomial time algorithm.

For spanning tree packing, the dual is the graph min-cut problem. It is easy for graphic matroids but NP-hard for general matroids, where it means finding the cogirth. The separation oracle, however, only needs to find a minimum weight base. Chekuri and Quanrud [6] discovered near-linear time approximation schemes for some implicit fractional packing problems. For fractional base packing, their algorithm outputs a $(1-\epsilon)$-approximation with $\tilde O(n/\epsilon^2)$ independence oracle calls. For the capacitated version, the number of oracle calls becomes $\tilde O(rn/\epsilon^2)$. Almost at the same time, Jérôme Galtier published similar results for graphs [7] and for matroids [5].

For spanning tree covering, the dual is finding a maximum edge set whose intersection with each spanning tree is at most 1. This problem can be thought of as a set cover in which the sets are $\left\{ T\mid e\in T \right\}$ for each edge $e$. The separation oracle solves the following problem: given edge weights $x:E\to [0,1]$, is there a spanning tree of weight greater than 1? We can simply find a matroid base of the largest weight. Thus, for general matroids, we can compute the fractional arboricity (density) with the ellipsoid method.
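Here is a sketch of this separation oracle with only an independence oracle available. The greedy algorithm scans elements by decreasing weight and returns a maximum-weight base. The oracle reports that base as a violated constraint if its weight exceeds 1. The graphic independence test (acyclicity via union-find) and the triangle example are my own stand-ins for a general matroid oracle.

```python
from fractions import Fraction


def acyclic(n, edges, S):
    """Independence oracle of the graphic matroid: is the edge subset S a forest?"""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for i in S:
        ru, rv = find(edges[i][0]), find(edges[i][1])
        if ru == rv:
            return False
        parent[ru] = rv
    return True


def max_weight_base(elements, x, independent):
    """Greedy: scan elements by decreasing weight, keep those that stay independent."""
    base = []
    for e in sorted(elements, key=lambda e: x[e], reverse=True):
        if independent(base + [e]):
            base.append(e)
    return base


def separation_oracle(elements, x, independent):
    """Return a base B with x(B) > 1 (a violated constraint), or None if x is feasible."""
    B = max_weight_base(elements, x, independent)
    return B if sum(x[e] for e in B) > 1 else None


edges = [(0, 1), (1, 2), (0, 2)]  # a triangle


def oracle(S):
    return acyclic(3, edges, S)


print(separation_oracle(range(3), [Fraction(1, 2)] * 3, oracle))  # None: feasible
print(separation_oracle(range(3), [Fraction(3, 5), Fraction(1, 2), Fraction(1, 10)], oracle))
```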

References

[1]

A. Schrijver, Combinatorial optimization Polyhedra and Efficiency, Springer, 2004.

[2]

P.A. Catlin, J.W. Grossman, A.M. Hobbs, H.-J. Lai, Fractional arboricity, strength, and principal partitions in graphs and matroids, Discrete Applied Mathematics. 40 (1992) 285–302 10.1016/0166-218X(92)90002-R.

[3]

G. Fan, H. Jiang, P. Li, D.B. West, D. Yang, X. Zhu, Extensions of matroid covering and packing, European Journal of Combinatorics. 76 (2019) 117–122 10.1016/j.ejc.2018.09.010.

[4]

A. Schrijver, Polyhedral proof methods in combinatorial optimization, Discrete Applied Mathematics. 14 (1986) 111–133 10.1016/0166-218X(86)90056-9.

[5]

J. Galtier, Fast approximation of matroid packing and covering, Annals of Operations Research. 271 (2018) 575–598 10.1007/s10479-018-2756-8.

[6]

C. Chekuri, K. Quanrud, Near-Linear Time Approximation Schemes for some Implicit Fractional Packing Problems, in: Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, Society for Industrial and Applied Mathematics, 2017: pp. 801–820 10.1137/1.9781611974782.51.

[7]

J. Galtier, Computing weighted strength and applications to partitioning, SIAM Journal on Discrete Mathematics. 32 (2018) 2747–2782 10.1137/15M102914X.

The 10th Graduate Student Forum of the Mathematical Programming Branch of the Operations Research Society of China

Yu Cong — Sat, 09 Nov 2024 00:00:00 UT

Tags: travel

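1 Nanchong Station -> the venue

Nanchong does not look very developed; it is a small city. Walking from the station to what looks like the city center takes only 20 minutes.

The conference was held at the Beihu Hotel, right next to Beihu Park.

There was also a free dinner on the evening before the conference. Nice!

The food was so-so, but free food is always good. A one-day conference with five free meals plus two coffee breaks! Also, across from the Donghu Hotel there is a small shopping area with a McDonald's.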

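2 The conference

Five faculty members gave 30-minute talks: four on continuous optimization and one on artificial intelligence. There were 30 student talks of only 15 minutes each, split into two groups running in parallel in two lecture halls, so I could only hear half of them. Of the 15 talks I attended, only one leaned towards combinatorial optimization; all the others were about continuous optimization. On average each talk said "Lipschitz" about 10 times, so over the day I heard "Lipschitz continuity" more than 100 times…

The schedule had some problems. There was no break between consecutive student talks, and people could not comfortably present their work in 15 minutes. So even though I had read all the abstracts in advance and picked the talks I was interested in, I would miss the beginning after rushing from one hall to the other. Speakers spent too little time on background and the problem and too much on techniques. Unless I listened very carefully to the beginning, I had no idea what a talk was about, even after reading the abstract (partly because I don't know continuous optimization). I can also say with confidence that students doing optimization in math departments are clearly worse at giving talks than those in CS departments. Someone even prepared 62 slides for a 15-minute talk. These were not animation frames produced by \pause but 62 pages full of theorems and proofs, and in the end he did not get through them all… I think the problem may be that problems and techniques in continuous optimization are more complicated and harder to explain intuitively than combinatorial ones. None of the 15 students I heard could concisely explain the intuition behind their work, so only people in the same area could follow some of the talks. Still, three of the five professors gave very good talks. Even someone with only an undergraduate intro-to-optimization background (yes, that's me) could roughly follow them.

My impression is that continuous optimization has more tools from mathematical analysis, so it can study more complicated problems. Everyone works on nonconvex, discontinuous problems. People care less about complexity and more about convergence rates. The algorithms are numerical rather than combinatorial, so even equivalent reformulations can give very different numerical results. And a single technique can often be milked for a very long time (maybe…). The biggest thing TCS and continuous mathematical programming have in common seems to be that some areas publish in the same journals…

One interesting (and to me very counterintuitive) point is that continuous mathematical programming, as done in math departments, seems to have much wider industrial applications than the combinatorial optimization done in TCS. In math classes everyone says that industrial problems are usually not continuous and that discrete things are harder to handle. Yet techniques for genuinely discrete problems are used in industry less often than continuous programming. There was an example at the conference. Prof. Zaikun Zhang of The Hong Kong Polytechnic University talked about a high-dimensional derivative-free optimization method. It comes from his PhD thesis at the Chinese Academy of Sciences twelve years ago and was never published. It is a very simple algorithm, somewhat like Seidel's LP algorithm. Pick a low-dimensional subspace, approximate derivatives and optimize within it, reduce to a lower-dimensional subproblem, and run the algorithm recursively on the subproblem. The framework looks exactly like those multidimensional search / LP-type algorithms, except that choosing the subspace involves some tricks. It is hard to imagine such an algorithm having so many industrial applications. According to the talk, Intel, Airbus, Scipy 1.14 and some domestic EDA companies all need efficient high-dimensional derivative-free optimization. In contrast, combinatorial optimization methods using the same frameworks probably only matter in theory. To get nicer complexity bounds, they use many techniques that are very unfriendly to implementation. Another point is that every student talk I heard involved writing code and running experiments, while people in TCS don't (largely because many things cannot actually be implemented…). So maybe TCS should move to math departments, and mathematical programming people should move to CS departments. (Of course, as someone on the TCS side, I really hope this doesn't happen.)

In addition, the forum selects 9 of the 30 graduate student presenters for best talk awards. You could clearly tell who came for the award and who came for the trip :) Chatting with the students at my table over dinner, I found that connections within the domestic MP community seem much stronger than in TCS. Almost every MP graduate student knew almost all of the attending faculty, and their advisors were often academic siblings. In TCS, at least outside a few strong groups in the east, I don't think this is the case.

Attending this conference really changed my view of OR/MP research. I thought people here did super-applied math, like computational economics or industrial planning problems. It turned out that most people still do theory, and the work looks very deep.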

3 Going back

The weather in Nanchong was great, unlike Chengdu, which is smoggy or overcast almost every day.

Minimizing Sum of PWL Convex Functions

Yu Cong — Fri, 20 Sep 2024 00:00:00 UT

Tags: alg, optimization, LP

This is my note on low-dimensional linear programming & color refinement algorithms.

1 the problem & failed attempts

In September I worked on the following small problem:

Problem 1 (Minimizing the Sum of Piecewise Linear Convex Functions)

Given $n$ piecewise linear convex functions $f_1,\dots,f_n:\mathbb{R}\to \mathbb{R}$ with $m$ breakpoints in total, and $n$ linear functions $a_i\cdot x-b_i:\mathbb{R}^d\to \mathbb{R}$, find

\min_x \sum_i f_i(a_i\cdot x-b_i)

This problem is highly related to algorithms for linear programming in low dimensions.
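For intuition, here is the easy case $d=1$ as a brute-force sketch. This is not Megiddo's algorithm. The sketch assumes the minimum is attained and represents each $f_i$ as a maximum of affine pieces (slope, intercept). The sum is convex and piecewise linear, so it attains its minimum at a breakpoint. Every breakpoint of the sum comes from some $a_ix-b_i$ hitting a breakpoint of $f_i$. Evaluating the sum at all such $x$ costs $O(m(m+n))$ time.

```python
def pwl(pieces):
    """Convex piecewise linear f(t) = max over (slope, intercept) pieces."""
    return lambda t: max(s * t + c for s, c in pieces)


def breakpoints(pieces):
    """All pairwise intersections of the pieces: a superset of the breakpoints of f."""
    pts = []
    for i, (s1, c1) in enumerate(pieces):
        for s2, c2 in pieces[i + 1:]:
            if s1 != s2:
                pts.append((c2 - c1) / (s1 - s2))
    return pts


def minimize_1d(funcs, a, b):
    """min_x sum_i f_i(a_i x - b_i) for d = 1, assuming the minimum is attained."""
    fs = [pwl(p) for p in funcs]
    candidates = [0.0]  # fallback when the sum has no breakpoints (it is then constant)
    for p, ai, bi in zip(funcs, a, b):
        if ai != 0:
            # t = a_i x - b_i is a breakpoint of f_i  <=>  x = (t + b_i) / a_i
            candidates += [(t + bi) / ai for t in breakpoints(p)]

    def total(x):
        return sum(f(ai * x - bi) for f, ai, bi in zip(fs, a, b))

    best = min(candidates, key=total)
    return best, total(best)


# f_1 = f_2 = |t|, written as max(t, -t); the objective is |x| + |x - 2|.
absval = [(1, 0), (-1, 0)]
x, value = minimize_1d([absval, absval], a=[1, 1], b=[0, 2])
print(x, value)
```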

This can be solved in $O(2^{2^d}(m+n))$ time using Megiddo's algorithm for the multidimensional search problem (see my slides).

I want to show that for general piecewise linear convex functions on $\mathbb{R}^d$, my problem can be formulated as an LP-type problem with low combinatorial dimension.

A failed attempt is to write $F=\sum_i f_i$: there may be too many breakpoints on $F$ (see a previous post). Another possible way is to use dimension reduction techniques [1].

I do not have high expectations for this method. I would need to show that color refinement indeed reduces the dimension from $n+d$ to $d$, and that the reduction can be done in linear time.

Given an initial coloring of the vertices of a directed graph $G=(V,A)$, we want to compute the coarsest regular congruent coloring.

Colorings can be viewed as equivalence relations on the vertices. An equivalence relation $R$ on $V$ is congruent if, for all $u,v,w\in V$, [$(u,v)\in R$ and $(v,w)\in A$] implies [there exists $w'\in V$ such that $(u,w')\in A$ and $(w',w)\in R$]. This coincides with the general definition of a congruence relation on algebraic structures. (We can copy each vertex #outdegree times to make $A$ a unary operation.)

A coloring is regular if any two vertices $u,v$ of the same color have the same number of successors in each color. Consider two colorings $C_1,C_2$. $C_1$ is a refinement of $C_2$ (or $C_2$ is coarser than $C_1$) if any two vertices with the same color in $C_1$ also have the same color in $C_2$.

Basic Lemma 1 in [2] shows that the problem above is equivalent to the description in this wiki page.

The first quasilinear time algorithm for the color refinement problem on graphs was given in [2], and another one later in [3]. It is shown in [4] that $O((m+n)\log n)$ is the best possible running time. It is also shown in [5] that for any number of vertices there exists a graph which requires at least $n-2$ iterations to reach a stable coloring (note that the upper bound is $n-1$). See this for a nice survey on applications.
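For intuition, here is the naive version of color refinement on a digraph. Each round recolors every vertex by its current color together with the sorted multiset of its successors' colors. This repeats until no class splits. It takes up to about $n$ rounds, each dominated by sorting, so it is not the quasilinear algorithm of [2] or [3]. The function name and the integer color encoding are my own choices, and initial colors are assumed to be mutually comparable (e.g. integers).

```python
def color_refinement(n, arcs, initial):
    """Coarsest stable refinement of `initial` on the digraph ({0..n-1}, arcs).

    A coloring is stable (regular) when any two vertices of the same color
    have the same number of successors of each color.
    """
    succ = [[] for _ in range(n)]
    for u, v in arcs:
        succ[u].append(v)
    colors = list(initial)
    while True:
        # New color = (old color, sorted multiset of successor colors).
        sigs = [(colors[v], tuple(sorted(colors[w] for w in succ[v]))) for v in range(n)]
        palette = {s: i for i, s in enumerate(sorted(set(sigs)))}
        new = [palette[s] for s in sigs]
        if len(set(new)) == len(set(colors)):  # no class was split: stable
            return new
        colors = new


path = color_refinement(3, [(0, 1), (1, 2)], [0, 0, 0])            # all vertices split
cycle = color_refinement(3, [(0, 1), (1, 2), (2, 0)], [0, 0, 0])   # nothing splits
print(path, cycle)
```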

2.1 connections

Now we add edge weights to the directed graph. Suppose all arcs have weight 1. One can see that a congruent and regular coloring requires the following: two vertices have the same color iff, for each color, the total weight of their arcs going to vertices of that color is the same. Slightly generalizing this configuration, we can consider arbitrary arc weights.

Color refinement on matrices is then almost the same as color refinement on the weighted adjacency matrix of a weighted digraph. However, not every matrix is square. We view any matrix $A\in \mathbb{R}^{v\times w}$ as a bipartite graph $G=(V\sqcup W,A)$, where $|V|=v$ and $|W|=w$. Then $A_{ij}=w(i, j)$ if $(i, j)$ is an arc in $G$, and 0 otherwise.

This part is unintuitive and complicated. I think the ESA 2014 paper [1] is very concise compared to the arXiv version, but it still takes four and a half pages to explain this part. So I will just briefly explain the idea.

This connection is based on an important theorem: the color refinement of a matrix $A$ is closely related to the fractional automorphisms of $A$. To do the reduction, we first compute a color refinement of $A$. The color refinement partitions the columns and rows of $A$ (the partition matrices). From these partitions we can compute the factor matrix of $A$, denoted by $[A]$, which is smaller than $A$. Finally, the authors prove a reduction lemma. It shows that the optimal solution for the factor matrix of the entire LP is a linear mapping of the optimal solution of the original LP. (I don't know the exact name of that matrix; it is just the one used in the simplex method.)

2.2 is it useful?

The reduction looks clever and has wide applications. However, as far as I can tell, it does nothing for my problem. I don't think the matrix in my problem is special enough to allow color refinement algorithms to run on it in linear time. Also, color refinement does not necessarily put all the $f_i$ columns in one part. ¯\_(⊙︿⊙)_/¯

2.3 reflection

Multidimensional search is harder than LP-type problems, and this problem is no exception. There are now three kinds of problems I am interested in:

Minimax parametric optimization
Multidimensional search problems
LP-type problems

What are the connections among them?…

Definition 2 (Minimax parametric optimization problem)

Given a combinatorial maximization problem with a parameter, find the parameter value minimizing the optimal value of the combinatorial maximization problem.

Definition 3 (Multidimensional search problem)

We are given a set of hyperplanes $\mathcal H$ in $\mathbb{R}^d$ and an oracle. The oracle reports the position of a hyperplane relative to an unknown fixed point $x^*\in \mathbb{R}^d$. Compute the position of every hyperplane in $\mathcal H$ relative to $x^*$ with as few oracle calls as possible.

Definition 4 (LP-type problem)

Given a set $S$ and a function $f$ from the subsets of $S$ to a totally ordered set, $f$ has to satisfy two properties:

monotonicity: $\forall A\subseteq B\subseteq S$, $f(A)\leq f(B)\leq f(S)$;
locality: $\forall A\subseteq B\subseteq S$ and any element $x\in S$, if $f(A)=f(B)=f(A+x)$, then $f(A)=f(B+x)$.
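Here is a sanity check of the two axioms on a classic LP-type problem in dimension one, my own choice of example. $f(A)$ is the length of the smallest interval containing the points of $A$, with $f(\emptyset)=-\infty$. The brute-force sketch checks monotonicity and locality, in exactly the form stated above, over all pairs $A\subseteq B$ of a small point set.

```python
from itertools import combinations


def f(A):
    """Length of the smallest interval containing A (-inf for the empty set)."""
    return max(A) - min(A) if A else float("-inf")


def check_lp_type(S):
    sets = [frozenset(c) for k in range(len(S) + 1) for c in combinations(S, k)]
    pairs = [(A, B) for A in sets for B in sets if A <= B]
    monotone = all(f(A) <= f(B) for A, B in pairs)
    local = all(f(A) == f(B | {x})
                for A, B in pairs for x in S
                if f(A) == f(B) == f(A | {x}))
    return monotone, local


print(check_lp_type([0, 2, 3, 7, 9]))
```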

Some important concrete problems lie in the intersections.

2.3.1 Euclidean one-centre problem

Problem 5 (Euclidean one-centre problem)

Given $n$ points $V=\{v_1,\dots, v_n\}$ in $\mathbb{R}^d$ with weights $w_1,\dots,w_n$, find a point in $\mathbb{R}^d$ that minimizes the maximum weighted distance to the points of $V$, that is, compute

\min_x \max_{i} w_i^2\|v_i-x\|^2

Dyer showed in [6] that this problem can be viewed as a multidimensional search problem in $\mathbb{R}^{d+1}$. The conversion is neither easy nor intuitive. Das et al. claimed in [7] that this problem is an LP-type problem with some additional constraints. It is still unknown whether the weighted Euclidean one-centre problem in $\mathbb{R}^d$ can be formulated as an LP-type problem with combinatorial dimension $O(d)$ (which is quite surprising…).

2.3.2 Minimizing the sum of some pwl convex functions(my problem)

See above for the definition.

Clearly this is a minimax parametric optimization problem. Zemel also showed that this is a multidimensional search problem with dimension $d$. We want to know whether it is an LP-type problem with combinatorial dimension $O(d)$…

It seems that recognizing LP-type problems is hard. There is a lecture on LP-type problems by David Eppstein.

Here are some materials on low-dimensional LP:

chapter 20 of Geometric Approximation Algorithms by Sariel Har-Peled. https://sarielhp.org/book/chapters/lp.pdf
the fastest deterministic algorithm for this problem [8]
my slides on Seidel’s $O(d!n)$ algorithm

References

[1] M. Grohe, K. Kersting, M. Mladenov, E. Selman, Dimension Reduction via Colour Refinement, in: Algorithms - ESA 2014, Springer Berlin Heidelberg, Berlin, Heidelberg, 2014: pp. 505–516.

[2] A. Cardon, M. Crochemore, Partitioning a graph in O(A log V), Theoretical Computer Science. 19 (1982) 85–98, doi:10.1016/0304-3975(82)90016-0.

[3] R. Paige, R.E. Tarjan, Three Partition Refinement Algorithms, SIAM Journal on Computing. 16 (1987) 973–989, doi:10.1137/0216062.

[4] C. Berkholz, P. Bonsma, M. Grohe, Tight lower and upper bounds for the complexity of canonical colour refinement, Theor. Comp. Sys. 60 (2017) 581–614, doi:10.1007/s00224-016-9686-0.

[5] S. Kiefer, B.D. McKay, The Iteration Number of Colour Refinement, in: 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020), Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 2020: pp. 73:1–73:19, doi:10.4230/LIPIcs.ICALP.2020.73.

[6] M.E. Dyer, On a Multidimensional Search Technique and Its Application to the Euclidean One-Centre Problem, SIAM Journal on Computing. 15 (1986) 725–738, doi:10.1137/0215052.

[7] S. Das, A. Nandy, S. Sarvottamananda, Linear time algorithms for Euclidean 1-center in $\mathbb{R}^d$ with non-linear convex constraints, Discrete Applied Mathematics. 280 (2020) 71–85, doi:10.1016/j.dam.2019.09.009.

[8] T.M. Chan, Improved deterministic algorithms for linear programming in low dimensions, ACM Trans. Algorithms. 14 (2018), doi:10.1145/3155312.

Piecewise Linear Convex Functions

Yu Cong — Mon, 16 Sep 2024 00:00:00 UT

Tags: alg, optimization

This post is a note on epigraphs, infimal convolution, Minkowski sums and convex conjugates of piecewise linear convex functions in $\mathbb{R}^d$. I want to provide proofs for relations between these operations and counterexamples for wrong guesses.

Notation:

infimal convolution: $\square$,
Minkowski sum: $\oplus$,
convex conjugate: $f^*$,
epigraph: $\mathop{\mathrm{epi}}f$.

some notes:

1 piecewise linear function $f:\mathbb{R}^d\to \mathbb{R}$

An intuitive but rather complex definition is the following [1]:

Definition 1

Let $\mathcal P$ be a set of bounded convex polytopes in $\mathbb{R}^d$. A piecewise linear function $f$ can be defined as

$$f(x)=c_P^T x+d_P, \text{ for } x\in P,$$

where $(c_P,d_P)\in \mathbb{R}^{d+1}$ is a vector associated with polytope $P$.

For continuity, we require

$$c_P^T x+d_P=c_Q^T x+d_Q, \; \forall x\in P\cap Q, \forall P,Q\in \mathcal{P}.$$

For convexity, we further require that

for any subset $\mathcal{P'}\subset \mathcal P$, $\bigcup_{P\in \mathcal P'}P$ is convex (for polytopes);
the restriction of $f$ to any two polytopes from $\mathcal P$ that share a facet is convex [2] (for $f$; in fact, 1 is already included in 2).

(Later I found a much simpler definition in [3], Example 3.5.)

Definition 2 (piecewise linear function in $\mathbb{R}^d$)

$$f(x)=\max \{a_1^Tx+b_1,\ldots,a_L^Tx+b_L\}$$

Exercise 3.29 of [3] shows that every piecewise linear convex function can be expressed in this form.

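Proof

Let $\mathbb{R}^d=X_1\cup\dots\cup X_L$ be the partition from the exercise (each $X_i$ has nonempty interior), with $f(x)=a_i^Tx+b_i$ for $x\in X_i$. By convexity,

$$f(\alpha x+(1-\alpha)y)\leq \alpha f(x)+(1-\alpha)f(y)$$

for any $\alpha\in [0,1]$. Fix $x$ and $i$, and pick $y$ in the interior of $X_i$. Since we can choose $\alpha>0$ arbitrarily small, $\alpha x+(1-\alpha)y$ can be made to lie in the same $X_i$ as $y$. Thus

$$f(x)\geq \frac{f(\alpha x+(1-\alpha)y)-(1-\alpha)f(y)}{\alpha}.$$

Note that $f$ is affine on each $X_i$. Thus

$$f(x)\geq \frac{a_i^T(\alpha x+(1-\alpha)y)+b_i-(1-\alpha)(a_i^Ty+b_i)}{\alpha}=a_i^Tx+b_i.$$

This holds for every $i$, with equality for the $i$ such that $x\in X_i$. Thus

$$f(x)=\max_i \{a_i^Tx+b_i\}.$$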
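The exercise can be sanity-checked numerically in one dimension. A minimal sketch, assuming hypothetical example data (breakpoints $-1, 2$, slopes $-1, 0.5, 3$ and $f(-1)=0$, our choice): it builds the affine pieces from the partition form by continuity, then compares $f$ evaluated piece by piece with $\max_i\{a_ix+b_i\}$ on a grid. The second call shows the identity failing for a non-convex pwl function (decreasing slopes), which is where convexity enters the proof.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Piece = Tuple[float, float]  # (slope a_i, intercept b_i)


def pieces_from_partition(breaks: Sequence[float], slopes: Sequence[float],
                          f_first_break: float) -> List[Piece]:
    """Affine pieces of a continuous 1-D pwl function.

    Piece i lives on [t_i, t_{i+1}] with t_0 = -inf, t_L = +inf and
    breaks = (t_1, ..., t_{L-1}); f_first_break is the value f(t_1).
    """
    a0 = slopes[0]
    pieces = [(a0, f_first_break - a0 * breaks[0])]
    for i in range(1, len(slopes)):
        t = breaks[i - 1]  # left end of piece i
        a_prev, b_prev = pieces[-1]
        y = a_prev * t + b_prev  # continuity at t
        pieces.append((slopes[i], y - slopes[i] * t))
    return pieces


def f_partition(x: float, breaks: Sequence[float], pieces: List[Piece]) -> float:
    a, b = pieces[bisect_right(breaks, x)]  # the piece whose interval contains x
    return a * x + b


def f_max(x: float, pieces: List[Piece]) -> float:
    return max(a * x + b for a, b in pieces)


def agrees(breaks, slopes, f_first_break) -> bool:
    pieces = pieces_from_partition(breaks, slopes, f_first_break)
    grid = [k / 4 for k in range(-20, 21)]
    return all(abs(f_partition(x, breaks, pieces) - f_max(x, pieces)) < 1e-12 for x in grid)


print(agrees([-1.0, 2.0], [-1.0, 0.5, 3.0], 0.0))  # True: convex
print(agrees([0.0], [1.0, -1.0], 0.0))            # False: -|x| is not convex
```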

Given these definitions, we are particularly interested in the minimum number of polytopes that define a piecewise linear convex function in high dimensions.

2 properties

pwl = piecewise linear. Let $L$ be the number of hyperplanes in the definition of $f$.

2.1 convex conjugate

Observation 3

Let $f^*$ be the convex conjugate of a pwl convex function $f$. Then $f^*$ is also pwl convex when restricted to $\mathop{\mathrm{dom}}f^*$.

Consider the convex conjugate from a geometric view. The epigraph of our pwl convex function $f$ is an (unbounded) convex polyhedron in $\mathbb{R}^d\times \overline{\mathbb{R}}$. The convex conjugate is

$$f^*(z)=\sup_x\{z^Tx-f(x)\}.$$

The graph of $x\mapsto z^Tx$ is a hyperplane in $\mathbb{R}^{d+1}$ that passes through the origin and has normal vector $(z,-1)$. Now $\sup_x\{z^Tx-f(x)\}$ is the amount by which this hyperplane has to be shifted down along the $(d+1)$-th coordinate to become a supporting hyperplane of $\mathop{\mathrm{epi}}f$. Note that the tangent points can be taken to be vertices of $\mathop{\mathrm{epi}}f$.

Proof

It is safe to write

$$f^*(z)=\max_x\{z^Tx-f(x)\}$$

since we only consider $z\in\mathop{\mathrm{dom}}f^*$, where the supremum is attained. Thus we have

$$f^*(z)=\max_x\{z^Tx-\max_i\{a_i^Tx+b_i\}\}=\max_x\{\min_i\{z^Tx-a_i^Tx-b_i\}\}.$$

Let $n$ be the number of vertices of $\mathop{\mathrm{conv}}(\mathop{\mathrm{epi}}f)$. One can see that $f^*(z)$ is the maximum of $O(nL)$ affine functions.

I believe there will be only $O(n)$ hyperplanes in $f^*$ instead of $O(nL)$… However, we know that in general $f^*$ is the maximum of at least $n$ affine functions, since every vertex of $\mathop{\mathrm{epi}}f$ corresponds to a hyperplane in $\mathop{\mathrm{epi}}f^*$.
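The vertex picture can be checked numerically in one dimension. A minimal sketch, assuming the example $f(x)=\max\{-x,\,0,\,x-1\}$ (our choice): the vertices of $\mathop{\mathrm{epi}}f$ sit at $x=0$ and $x=1$, $\mathop{\mathrm{dom}}f^*=[-1,1]$ is the range of slopes, and on it $f^*(z)=\max_v\{z\,x_v-f(x_v)\}$ over the vertices $x_v$, here $\max\{0,z\}$. The code compares this with a brute-force supremum over a grid of $x$ values containing the vertices. It assumes at least two pieces, sorted by strictly increasing slope, none of them redundant.

```python
from typing import List, Sequence, Tuple

Piece = Tuple[float, float]  # (slope a_i, intercept b_i)


def f(x: float, pieces: Sequence[Piece]) -> float:
    return max(a * x + b for a, b in pieces)


def vertices(pieces: Sequence[Piece]) -> List[float]:
    """x-coordinates where consecutive pieces meet (vertices of epi f).

    Assumes >= 2 pieces with strictly increasing slopes, none redundant."""
    return [(b0 - b1) / (a1 - a0) for (a0, b0), (a1, b1) in zip(pieces, pieces[1:])]


def conjugate_via_vertices(z: float, pieces: Sequence[Piece]) -> float:
    if not pieces[0][0] <= z <= pieces[-1][0]:
        return float("inf")  # z lies outside dom f*
    return max(z * xv - f(xv, pieces) for xv in vertices(pieces))


def conjugate_brute_force(z: float, pieces: Sequence[Piece], xs: Sequence[float]) -> float:
    return max(z * x - f(x, pieces) for x in xs)


pieces = [(-1.0, 0.0), (0.0, 0.0), (1.0, -1.0)]  # f(x) = max(-x, 0, x - 1)
xs = [k / 100 for k in range(-1000, 1001)]       # contains the vertices 0 and 1
for z in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(z, conjugate_via_vertices(z, pieces), conjugate_brute_force(z, pieces, xs))
```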

2.2 pwl convex function in $\mathbb{R}$ $\circ$ a linear mapping

Problem 4

Let $f:\mathbb{R}^d\to\mathbb{R}$ be a pwl convex function. Does there always exist a pwl convex $g:\mathbb{R}\to \mathbb{R}$ and a linear mapping $a^Tx-b:\mathbb{R}^d\to \mathbb{R}$ such that $f(x)=g(a^Tx-b)$?

As you might expect, the answer is no. Let $f:\mathbb{R}^2\to \mathbb{R}$ be the maximum of a set of planes (affine functions of two variables). Consider a sequence of points $P=\{p_1,p_2,\dots,p_k\}$ in the plane. Applying the linear mapping to $P$ gives a sequence of numbers (points in 1D) $P'=\{p_1',p_2',\dots,p_k'\}$, which we assume to be non-decreasing. Note that the values of $g$ on $P'$ are always unimodal since $g$ is convex. However, the values of $f$ on $P$ may not be unimodal. Thus the composition of a linear mapping and a pwl convex function in 1D is not equivalent to a pwl convex function in higher dimensions.
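A concrete instance of the negative answer, via a shorter level-set argument rather than the unimodality one above (the choice $f(x)=\max\{x_1,x_2\}$ and the helper name are ours): if $f(x)=g(a^Tx-b)$, then $f$ must be constant along every direction $u$ with $a^Tu=0$. For $a\neq 0$ take $u=(-a_2,a_1)$; one of $f(u)$, $f(-u)$ differs from $f(0)=0$. For $a=0$ the composition would be constant, which $f$ is not. The sketch returns such a witness point for a given $a$.

```python
from typing import Tuple

Point = Tuple[float, float]


def f(x1: float, x2: float) -> float:
    return max(x1, x2)  # a pwl convex function on R^2 with two pieces


def witness(a1: float, a2: float) -> Point:
    """A point p with a1*p[0] + a2*p[1] == 0 but f(p) != f(0, 0).

    Such a p rules out f(x) = g(a1*x1 + a2*x2 - b) for every g and b,
    because g(a^T p - b) == g(a^T 0 - b)."""
    if a1 == 0 and a2 == 0:
        return (1.0, 0.0)  # g(-b) would be constant, but f(1, 0) = 1 != 0
    u = (-a2, a1)  # orthogonal to a
    for p in (u, (a2, -a1)):
        if f(*p) != f(0.0, 0.0):
            return p
    raise AssertionError("unreachable: f(u) = f(-u) = 0 forces a = 0")


for a in [(1, 0), (0, 1), (1, 1), (1, -1), (-2, 3)]:
    print(a, witness(*a))
```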

3 sum of pwl convex functions

I want to show that in general the number of hyperplanes in the sum of pwl convex functions can be large.

It is known that for two pwl convex functions $f,g$, $f^*+g^*=(f\square g)^*$. It is also known that $\mathop{\mathrm{epi}}f \oplus \mathop{\mathrm{epi}}g=\mathop{\mathrm{epi}}(f\square g)$ (with some requirements on $f$ and $g$). There is a theorem in [4], Section 4.3, showing that the number of faces of the Minkowski sum of two polytopes can be huge. The bound is attained by sums of cyclic polytopes.

We can define pwl convex functions based on cyclic polytopes, and we know that the Minkowski sum will have lots of faces (of different dimensions). We also know that the number of faces in $f^*$ is at least $n$, where $n$ is the number of vertices (0-dimensional faces) in $\mathop{\mathrm{epi}}f$. Now if the infimal convolution of two pwl convex functions also has many faces, the number of faces in the sum of pwl convex functions will be large.
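Even in $\mathbb{R}^2$ the sum can need as many hyperplanes as the trivial upper bound $n_1n_2$ allows (each piece of $f+g$ is the sum of a piece of $f$ and a piece of $g$). A minimal sketch of one such construction (our own example, not the cyclic-polytope construction of [4]): $f(x)=\max_{1\le i\le n_1}\{i x_1-i^2/2\}$ and $g(x)=\max_{1\le j\le n_2}\{j x_2-j^2/2\}$. At the point $(i,j)$ the piece with index $(i,j)$ is the unique maximizer of $f+g$, so none of the $n_1n_2$ pieces is redundant; the code checks this by brute force.

```python
from itertools import product
from typing import Dict, Set, Tuple

Key = Tuple[int, int]


def sum_pieces(n1: int, n2: int) -> Dict[Key, Tuple[float, float, float]]:
    """Pieces (c1, c2, d) of f + g, i.e. c1*x1 + c2*x2 + d, indexed by (i, j)."""
    return {(i, j): (i, j, -(i * i + j * j) / 2)
            for i, j in product(range(1, n1 + 1), range(1, n2 + 1))}


def unique_winners(n1: int, n2: int) -> Set[Key]:
    """Pieces that are the unique maximum of f + g at some integer point."""
    pieces = sum_pieces(n1, n2)
    winners: Set[Key] = set()
    for x1, x2 in product(range(1, n1 + 1), range(1, n2 + 1)):
        values = {k: c1 * x1 + c2 * x2 + d for k, (c1, c2, d) in pieces.items()}
        best = max(values.values())
        top = [k for k, val in values.items() if val == best]
        if len(top) == 1:
            winners.add(top[0])
    return winners


print(len(unique_winners(4, 5)))  # 20 = 4 * 5: every piece is needed
```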

Problem 5

Let $f_1,f_2$ be two pwl convex functions in $\mathbb{R}^d$, and let $n_1,n_2$ be the number of hyperplanes in $f_1,f_2$ respectively. What is the minimum number of hyperplanes in $f_1 \square f_2$?

References

[1] A. Toriello, J.P. Vielma, Fitting piecewise linear continuous functions, European Journal of Operational Research. 219 (2012) 86–95, doi:10.1016/j.ejor.2011.12.030.

[2] J.M. Tarela, E. Alonso, M.V. Martínez, A representation method for PWL functions oriented to parallel processing, Mathematical and Computer Modelling. 13 (1990) 75–83, doi:10.1016/0895-7177(90)90090-a.

[3] S.P. Boyd, L. Vandenberghe, Convex optimization, Cambridge University Press, Cambridge, UK; New York, 2004.

[4] T. Mountford, T. Liebling, K. Fukuda, P. Gritzmann, G. Ziegler, Minkowski Sums of Polytopes: Combinatorics and Computation, PhD thesis, n.d.

Tropical bases and matroid circuits

Yu Cong — Thu, 11 Jul 2024 00:00:00 UT

Tags: matroid

http://matroidunion.org/?p=5403

In this post the author describes an interesting problem:

Problem 1

Given a matroid $M=(E,\mathcal{I})$, find a minimal set of circuits that defines the matroid.

The author approaches this problem not through the circuits but through the flats. Any circuit excludes some sets from being flats: given a circuit $c$, any set $A$ that contains exactly $|c|-1$ elements of $c$ cannot be a flat of $M$. So the idea is to find a minimal set of circuits that excludes all non-flat sets of $M$.

1 combinatorial definition

A circuit $c$ excludes a set $A$ if exactly one element of $c$ is not in $A$. A collection of circuits $\mathcal{C}'\subseteq \mathcal{C}(M)$ is a tropical basis of $M$ if for every non-flat set $A$ there is a circuit $c\in \mathcal{C}'$ that excludes $A$. The problem is then to find a minimal tropical basis of $M$.
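For small matroids the combinatorial definition can be checked by brute force. A minimal sketch for the uniform matroid $U_{2,4}$ (our example; helper names are ours): its circuits are the $3$-subsets of $E=\{1,2,3,4\}$, a set $A$ is a non-flat exactly when some circuit excludes it, and we search for a smallest collection of circuits excluding every non-flat set. A basis of minimum size is in particular minimal.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Sequence

E = (1, 2, 3, 4)
CIRCUITS = [frozenset(c) for c in combinations(E, 3)]  # circuits of U_{2,4}


def excludes(c: FrozenSet[int], A: FrozenSet[int]) -> bool:
    """c excludes A iff exactly one element of c is not in A."""
    return len(c - A) == 1


ALL_SETS = [frozenset(s) for r in range(len(E) + 1) for s in combinations(E, r)]
# A is not a flat iff some circuit excludes it (closure via circuits).
NON_FLATS = [A for A in ALL_SETS if any(excludes(c, A) for c in CIRCUITS)]


def is_tropical_basis(basis: Sequence[FrozenSet[int]]) -> bool:
    return all(any(excludes(c, A) for c in basis) for A in NON_FLATS)


def minimum_tropical_basis() -> Optional[List[FrozenSet[int]]]:
    for k in range(len(CIRCUITS) + 1):
        for basis in combinations(CIRCUITS, k):
            if is_tropical_basis(basis):
                return list(basis)
    return None


print(len(NON_FLATS), [sorted(c) for c in minimum_tropical_basis()])
# 10 [[1, 2, 3], [1, 2, 4], [1, 3, 4]]
```

Two circuits never suffice here: a pair $\{i,j\}$ is excluded only by a circuit containing it, and two $3$-subsets of a $4$-set contain only $5$ of the $6$ pairs.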

2 algebraic geometry view

Tropical bases originally come from algebraic geometry; see tropical geometry. The min tropical semiring is the semiring $(\mathbb{R}\cup \{+\infty\},\oplus,\otimes)$ with $x\oplus y = \min\{x,y\}$ and $x\otimes y = x+y$. The identity element for $\oplus$ is $+\infty$ and for $\otimes$ it is $0$. The tropical variety of a tropical polynomial is the set of points where the polynomial achieves its minimum value at least twice.

The tropical variety of a linear tropical polynomial is called a tropical hyperplane.

For any set $A\subseteq E$ there is a natural representation of $A$ as a vector in $\{0,+\infty\}^{|E|}$, where the $i$-th coordinate is $+\infty$ if $i\in A$ and $0$ otherwise. (This is very similar to the indicator vector of a set in $\{0,1\}^n$: if the $i$-th element is in the given set we put the identity element of $\oplus$, and otherwise the identity element of $\otimes$.) Then we can define a linear tropical polynomial associated with a circuit $c$, just like the dot product of two vectors in $\mathbb{R}^n$, with coefficient $0$ for the elements of $c$ and $+\infty$ for all other elements. For example, consider the circuit $c=\{1,2,3\}$ of $U_{2,4}$:

$$f_{c}(x_1,x_2,x_3,x_4)=(0 \otimes x_1)\oplus(0 \otimes x_2)\oplus(0 \otimes x_3)\oplus(+\infty\otimes x_4)$$

Denote the tropical hyperplane of a circuit $c$ by $T(c)$, i.e., $T(c)$ is the set of all vectors $v$ where $f_c(v)$ achieves its minimum at least twice. Then

$$T(\mathcal C)=\bigcap_{c\in \mathcal C}T(c)$$

is the set of $v$ not excluded by any circuit in $\mathcal C$. A set $\mathcal{C}'\subseteq \mathcal C(M)$ is a tropical basis for the matroid $M$ if $T(\mathcal C')=T(\mathcal C)$.

3 connections between the two definitions

Combinatorial definition. $\mathcal C'\subseteq \mathcal C$ is a tropical basis if for every non-flat set $A$ there is a circuit $c\in \mathcal C'$ that excludes $A$.

Algebraic definition. $\mathcal C'\subseteq \mathcal C$ is a tropical basis if $T(\mathcal C')=T(\mathcal C)$.

Lemma. For any $\mathcal C'\subseteq \mathcal C$, $T(\mathcal C')= T(\mathcal C)$ if and only if $T(\mathcal C')\cap \{0,1\}^n= T(\mathcal C)\cap \{0,1\}^n$.

This lemma shows that we only need to consider indicator vectors when dealing with the algebraic definition.

Note that $\mathcal{C}$ excludes all non-flat sets of $M$. Thus our combinatorial definition is equivalent to $\mathcal C'$ excluding the same sets as $\mathcal C$. Let $\overline{T}(\mathcal C)$ be the collection of sets which are not excluded by any circuit in $\mathcal C$. Then $\mathcal C'$ is a tropical basis if and only if $\overline{T}(\mathcal C')=\overline{T}(\mathcal C)$. Now consider the algebraic definition. One can see that for any non-loop circuit $c$ and set $A$, $A\notin T(c)$ (in other words, $f_c(A)$ achieves its minimum only once) if and only if $c$ excludes $A$. Thus $T(\mathcal C)$ is the collection of all sets that are not excluded by any of the circuits in $\mathcal C$, i.e., $\overline{T}(\mathcal C)$. Now it is easy to see that the two definitions are equivalent.
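The key step, $A\notin T(c)$ if and only if $c$ excludes $A$, can be checked mechanically on $U_{2,4}$. A minimal sketch, assuming each set $A$ is encoded by the $\{0,1\}$-vector of the Lemma with $1$ in the coordinates of $A$ and $0$ elsewhere, and $f_c(v)=\min_{i\in c}v_i$ is the linear tropical polynomial with coefficient $0$ on $c$ and $+\infty$ off $c$ (example and helper names are ours).

```python
from itertools import combinations
from typing import Dict, FrozenSet

E = (1, 2, 3, 4)
CIRCUITS = [frozenset(c) for c in combinations(E, 3)]  # circuits of U_{2,4}


def encode(A: FrozenSet[int]) -> Dict[int, int]:
    """{0,1}-vector of A: 1 on the elements of A, 0 elsewhere."""
    return {i: int(i in A) for i in E}


def in_T(c: FrozenSet[int], v: Dict[int, int]) -> bool:
    """v lies in T(c) iff min_{i in c} v_i is attained at least twice."""
    m = min(v[i] for i in c)  # f_c(v) in the min-plus semiring
    return sum(v[i] == m for i in c) >= 2


def excludes(c: FrozenSet[int], A: FrozenSet[int]) -> bool:
    return len(c - A) == 1


def equivalence_holds() -> bool:
    for r in range(len(E) + 1):
        for A in map(frozenset, combinations(E, r)):
            for c in CIRCUITS:
                if (not in_T(c, encode(A))) != excludes(c, A):
                    return False
    return True


print(equivalence_holds())  # True
```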

Current status: It seems that the Bergman fan of a matroid is related to this topic (https://arxiv.org/abs/math/0411260); this seems to be a bridge between matroid theory and algebraic geometry. June Huh’s work is also related to this topic: https://arxiv.org/abs/1104.2519

Talldoor

pangu.hs

1 Python → Haskell

Linear time linebreaker?

1 line-breaking

2 SMAWK algorithm

3 character-level operations

Matroid girth

1 Regular matroid

2 Proper minor-closed class of binary matroids

3 Perturbed graphic matroids

3.1 Reductions

3.2 cogirth → even cuts

3.3 girth → parity cycle + parity join

4 More on perturbed graphic matroids

References

Matroid circuit packing and covering

1 Packing/Covering Defect

2 Complexity

3 Cycle Double Cover

3.1 Matroids with the circuit cover property

3.2 Matroids satisfying $\nu_{k,w}=\tau_{k,w}$

Understanding Lasserre Hierarchy

1 $K\subset [0,1]^n \to K\cap \{0,1\}^n$

2 Probability Perspective

2.1 Feasible Probability

2.2 Projection in $K$

3 Properties

3.1 Convex Hull and Conditional Probability

3.2 Decomposition Theorem

4 Moment Relaxation

5 Applications

5.1 Sparsest Cut

5.2 Matching

6 Questions

6.1 Replace $M_t^\ell(y)\succeq 0$ with $\mathop{\mathrm{Las}}_t^{proj}(y)\in K$

6.2 Separation Oracle for Implicit $K$

References

Using Fira Math in KaTeX

1 How does the font part of KaTeX work?

2 Plan

3 Extract fonts

3.1 Italic

4 Next…

4.1 Typst svg

5 MathJax 4.0 is out

PDF readers

Zed theme for VSCode

"Gotchas" in Combinatorial Optimization

1 Polytope for s-t cut

2 Size of support

3 Cocircuit space of binary matroids

References

Proving integrality gap for LPs

1 Counting

1.1 tree packing theorem

1.2 $k$-cut

2 Rounding

3 Intermediate problem

3.1 minimum $k$-edge-connected spanning subgraph

4 Notes

References

Detecting base orderable matroids is NP-hard

LP with box constraints

References

Vertex and Edge Connectivity Interdiction

1 Checking $k$-vertex connectivity

1.1 Minimum cut for edge connectivity

1.2 Minimum cut for vertex connectivity

2 (Edge) Connectivity interdiction

2.1 Reweighting

2.2 More on normalized min-cut

Add SageMath to pylance?

Graphic matroid representation and cocircuit transversal

1 Graphic matroids are regular

2 If $M$ is linear, so is $M^*$

3 Cographic matroids

4 Cocircuit transversal

References

Matroid base packing and covering
