Tags: optimization, LP
- https://sites.math.washington.edu/~rothvoss/lecturenotes/lasserresurvey.pdf
- https://web.stanford.edu/class/cs369h/
- Laurent’s survey [1]
- https://rameesh-paul.github.io/sos.pdf
- https://www.ams.jhu.edu/~abasu9/papers/main-Lasserre.pdf
- Chapter 3 of Ali Kemal Sinop’s PhD thesis
1 \(K\subset [0,1]^n \to K\cap \set{0,1}^n\)
We want to solve a 0-1 integer program. Since this task is NP-hard in general, we usually consider its linear relaxation. Different LP formulations have different integrality gaps. For example, consider the following linear relaxation of the max matching IP in non-bipartite graph. \[\begin{equation*} \begin{aligned} \sum_{e\in \delta(v)} x(e)&\leq 1 & &\forall v\in V\\ x(e)&\in [0,1] & &\forall e\in E \end{aligned} \end{equation*}\]:(, this polytope is not integral.
Edmonds proved that the following formulation is integral.
\[\begin{equation*}
\begin{aligned}
\sum_{e\in \delta(v)} x(e)&\leq 1 & &\forall v\in V\\
x(e)&\in [0,1] & &\forall e\in E\\
\sum_{e\in E[U]} x(e) &\leq (|U|-1)/2 & &\forall U\subset
V, |U| \text{ odd}
\end{aligned}
\end{equation*}\]
Schrijver [2]
showed that those odd constraints can be obtained by adding cutting
planes to the previous polytope. Fortunately for matching polytope we
have a polynomial time separation oracle. However, for harder problems
adding cutting planes may make the program NP-hard to solve. Lasserre
hierarchy is a method to strengthen the polytope to approaching the
integer hull while providing provable good properties and keeping the
program polynomial time solvable (if applied constant number of times).
2 Probability Perspective
There is a good interpretation of the linear relaxation of 0-1 integer programs. Let \(K=\set{x\in \R^n| Ax\geq b}\subset [0,1]^n\) be the polytope of the linear relaxation. The goal of solving the integer program is to describe all possible discrete distribution over \(K\cap \set{0,1}^n\). Note that for a fixed distribution the expected position \((\sum_p \mathop{\mathrm{Pr}}[X_p(1)=1] x_p(1),...,\sum_p \mathop{\mathrm{Pr}}[X_p(n)=1] x_p(n))^T\) is in \(\mathop{\mathrm{conv}}(K\cup \set{0,1}^n)\) and iterating over all possible distribution gives us the integer-hull. Hence we can find the integral optimal solution if having access to all distribution over integer points. For any \(x\in K\), \(x_i\) can be seen as the probability of \(x_i=1\). We only care about the discrete distribution on feasible integral points. However, each \(x\in K\) only describes some marginal probabilities and this this marginal probability may not be even feasible. Consider the following 2D example. Any point in \(\text{green area}\setminus \text{orange area}\) is not a marginal distribution of any possible joint distribution over \((0,0),(1,0)\) and \((1,1)\). The idea is to iteratively prune this area.
2.1 Feasible Probability
For now let’s forget about \(K\) and consider \(y\in [0,1]^{2^{[n]}}\) and a discrete distribution \(D\) on \(\set{0,1}^n\). We want to make \(y_I=\mathop{\mathrm{Pr}}_{D}[\bigwedge_{i\in I}(X_i=1)]\). In fact, there is a one to one correspondence between \(y\) and \(D\). If \(D\) is given, computing \(y_I\) is easy for any \(I\subseteq [n]\). If \(y\) is given, recovering the distribution \(D\) is the same as solving a system of \(2^n\) linear equations with \(2^n\) variables (\(2^n-1\) of the the equations come form \(y_I\), and the remaining one is \(\sum_p D(p)=1\).) Thus, with a slight abuse of notation, we will refer to \(y\) as a distribution. We work with the 2D example first. Let \(x=(x_1,x_2)^T\in K\) be a marginal distribution. One can see that \(y=(1,x_1,x_2,\mathop{\mathrm{Pr}}[X_1=X_2=1])^T\) and the last number is not arbitrary. In fact, \(\mathop{\mathrm{Pr}}[X_1=X_2=1]\) must in range \([\max(0, x_1+x_2-1),\min(x_1,x_2)]\). To make sure \(y\) is indeed a probability distribution the moment matrix is considered. The moment matrix \(M(y)\) is of size \(2^n \times 2^n\) and \(M(y)[I,J]\) is defined as the expectation \(E[\prod_{i\in I\cup J}X_i]=y_{I\cup J}\) (the only non-zero term is \(1\cdot \mathop{\mathrm{Pr}}[\bigwedge_{i\in I\cup j}(X_i=1)]=y_{I\cup J}\)). The expectation is taken over the distribution defined by \(y\).
Lemma1
For any probability distribution \(y\),
the moment matrix is psd.
Proof
We need to verify \(z^T M(y) z\geq 0\)
for any \(z\). \[\begin{equation*}
\begin{aligned}
z^T M(y) z &= \sum_I \sum_J z_I y_{I\cup J} z_J\\
&= \sum_I \sum_J z_I E[\prod_{i\in I\cup J} X_i]
z_J\\
&= E\left[\left( \sum_I (z_I \prod_{i\in I}
X_i)\right)^2 \right]
\end{aligned}
\end{equation*}\]
Note that in the proof something like sum of squares appears. Lasserre
hierarchy has deep connections with SOS
optimization and is also known as sum-of-squares hierarchy.
Lemma2
If \(M(y)\) is psd then \(y\) is a probability distribution.
It is easy to see that \(y_I\in [0,1]\)
for all \(I\subseteq [n]\). Consider
the following submatrix
\[\begin{bmatrix}
\emptyset & y_I\\
y_I & y_I
\end{bmatrix}\]
It is psd since \(M(y)\) is psd. The
determinant is \(y_I(1-y_I)\geq 0\).
Let \(\mathop{\mathrm{Pr}}_D[p]\) be
the probability of selecting \(p\in\set{0,1}^n\) in \(D\). It remains to prove the following
system of linear equations has a solution such that \(\mathop{\mathrm{Pr}}_D[p]\in [0,1]\) for
all \(p\).
\[\begin{equation*}
\begin{aligned}
y_{[n]} &= \mathop{\mathrm{Pr}}_D[\mathbf 1]\\
y_{[n]\setminus \set{n}} &= \sum_{p:\bigwedge\limits_{i\in
[n-1]}(p_i=1)} \mathop{\mathrm{Pr}}_D[p]\\
y_{[n]\setminus \set{n-1}} &= \sum_{p:\bigwedge\limits_{i\in
[n]\setminus \set{n-1}}(p_i=1)} \mathop{\mathrm{Pr}}_D[p]\\
&\vdots \\
y_{\set{1}} &= \sum_{p:p_1=1} \mathop{\mathrm{Pr}}_D[p]\\
y_\emptyset &= \sum_p \mathop{\mathrm{Pr}}_D[p]
\end{aligned}
\end{equation*}\] I believe this can be proven with the idea of
Lemma 2 here.
2.2 Projection in \(K\)
Let the projection of \(y\) be \((y_{\set{1}},\dots,y_{\set{n}})^T\). For any \(y\) the projection should always lie in \(K\). One may want to define moment matrices for constraints \(Ax\geq b\). This is called the moment matrix of slacks. For simplicity we only consider one linear constraint \(a^Tx-b\geq 0\). The moment matrix for this constraint is \(M(y)=\left( \sum_{i=1}^n a_i y_{I\cup J\cup \set{i}}-b y_{I\cup J} \right)_{I,J\subseteq [n]}\). Then we can do similar arguments. \[\begin{equation*} \begin{aligned} z^T M(y) z &= \sum_I \sum_J z_I z_J (\sum_{i=1}^n a_i y_{I\cup J\cup \set{i}}-b y_{I\cup J})\\ &= \sum_I \sum_J z_I z_J (\sum_i a_i E[\prod_{k\in I\cup J\cup\set{i}} X_k] - b E[\prod_{k\in I\cup J}X_k] )\\ &= E\left[ \sum_I \sum_J z_I z_J (\sum_i a_i X_i -b) \prod_{k\in I\cup J}X_k \right]\\ &= E\left[ (\sum_i a_i X_i -b) \left(\sum_I z_I \prod_{i\in I} X_i \right)^2 \right] \end{aligned} \end{equation*}\] Note that we can combine the expectations since they are taken over the same probability distribution. Now assume that we have \(a^TX-b\geq 0\). \[\begin{equation*} \begin{aligned} E&\left[ (\sum_i a_i X_i -b) \left(\sum_I z_I \prod_{i\in I} X_i \right)^2 \right]\\ &= \sum \mathop{\mathrm{Pr}}[\cdots](a^T X-b)\left(\sum_I z_I \prod_{i\in I} X_i \right)^2 \geq 0 \end{aligned} \end{equation*}\] If \(a^TX\geq b\) is satisfied, then the corresponding slack moment matrix is psd. Finally, this is a more formal definiton.
Definition3\(t\)-th level of Lasserre
hierarchy
The \(t\)-th level of Lasserre
hierarchy \(\mathop{\mathrm{Las}}_t(K)\) of a convex
polytope \(K=\set{x\in \R^n| Ax\geq b}\subset
[0,1]^n\) is the set of vectors \(y\in
\R^{2^n}\) that make the following matrices psd.
Note that the \(t\)-th level of
Lasserre hierarchy only involve entries \(y_I\) with \(|I|\leq 2t+1\). (\(+1\) comes from the moment matrix of
slacks) The matrices have dimension \(\binom{n}{2t+1}=n^{O(t)}\) and there are
only \(m+1\) matrices. Thus to optimize
some objective over the \(t\)-th level
of Lasserre hierarchy takes \(mn^{O(t)}\) time which is still polynomial
in the input size. (The separation oracle computes eigenvalues and
eigenvectors. If there is a negative eigenvalue we find the
corresponding eigenvector \(v\) and the
separating hyperplane is \(\sum_{I,J}v_{I}v_{J} x_{I,J}=0\). See Example
43.)
- moment matrix \(M_t(y):=(y_{I\cup J})_{|I|,|J|\leq t}\succeq 0\)
- moment matrix of slacks \(M_t^\ell(y):=\left( \sum_{i=1}^n A_{\ell i}y_{I\cup J\cup \set{i}}-b_\ell y_{I\cup J} \right)_{|I|,|J|\leq t}\succeq 0\)
3 Properties
Almost everything in this part can be found here. Suppose that we have the \(t\)-th level of Lasserre hierarchy \(\mathop{\mathrm{Las}}_t(K)\). Denote by \(\mathop{\mathrm{Las}}_t^{proj}(K)\) the projection of the \(t\)-th level.- \(\mathop{\mathrm{Las}}_t(K)\) is convex
- \(y_I\in [0,1]\) for all \(y\in \mathop{\mathrm{Las}}_t(K)\)
- \(0\leq y_I \leq y_J \leq 1\) for all \(J\subset I\) with \(|I|,|J|\leq t\)
- \(y_{I\cup J}\leq \sqrt{y_I \cdot y_J}\)
- \(K\cap \set{0,1}^n \subset \mathop{\mathrm{Las}}_t^{proj}(K)\) for all \(t\in [n]\)
- \(\mathop{\mathrm{Las}}_t^{proj}(K)\subset K\)
- \(\mathop{\mathrm{Las}}_n(K)\subset \mathop{\mathrm{Las}}_{n-1}(K)\subset \dots \subset \mathop{\mathrm{Las}}_0(K)\)
3.1 Convex Hull and Conditional Probability
Lemma4
For \(t\geq 1\), let \(y\in \mathop{\mathrm{Las}}_t(K)\) and \(S\subset [n]\) be any subset of variables
of size at most \(t\). then \[y\in \mathop{\mathrm{conv}}\set{z\in
\mathop{\mathrm{Las}}_{t-|S|}(K)| z_i\in \set{0,1} \forall i\in
S}.\]
For any \(y\in\mathop{\mathrm{Las}}_n(K)\) and \(S=[n]\), the previous lemma implies the
projection of \(y\) is convex
combination of integral vectors in \(K\cap
\set{0,1}^n\). Then it follows that \(\mathop{\mathrm{Las}}_n^{proj}(K)=K\cap
\set{0,1}^n\). This also provides proofs for the facts that if
\(M_n(y)\succeq 0\) and \(M_n^{\ell}(y)\succeq 0\) then \(y\) is indeed a probability distribution
and the projection is in \(K\).
Proof
The proof is constructive and is by induction on the size of \(S\).
\(y\in \mathop{\mathrm{Las}}_t(K)\) is
a probability distribution if we consider only \(|I|\leq t\), \(y_I=\mathop{\mathrm{Pr}}[\bigwedge_{i\in
I}X_i=1]\). The vectors \(z^{(1)},z^{(2)}\) we constructed in the
previous proof can be understood as conditional probabilities. \[\begin{equation*}
\begin{aligned}
&z^{(1)}_I=\frac{y_{I\cup\set{i}}}{y_i}=\frac{\mathop{\mathrm{Pr}}[\bigwedge_{k\in
I\cup
\set{i}}X_k=1]}{\mathop{\mathrm{Pr}}[X_i=1]}=\mathop{\mathrm{Pr}}[\bigwedge_{k\in
I}X_k=1 | X_i=1]\\
&z^{(2)}_I=\frac{y_I-y_{I\cup\set{i}}}{1-y_i}=\frac{\mathop{\mathrm{Pr}}[\bigwedge_{k\in
I} (X_k=1) \land
X_i=0]}{\mathop{\mathrm{Pr}}[X_i=0]}=\mathop{\mathrm{Pr}}[\bigwedge_{k\in
I}X_k=1 | X_i=0]
\end{aligned}
\end{equation*}\]
The proof is basically showing that
\[\begin{equation*}
\begin{aligned}
y_I &= \mathop{\mathrm{Pr}}[X_i=1]
\mathop{\mathrm{Pr}}[\bigwedge_{k\in I}X_k=1 |
X_i=1]+\mathop{\mathrm{Pr}}[X_i=0] \mathop{\mathrm{Pr}}[\bigwedge_{k\in
I}X_k=1 | X_i=0]\\
&= \mathop{\mathrm{Pr}}[\bigwedge_{i\in I}X_i=1]
\end{aligned}
\end{equation*}\]
For any partially feasible probability distribution \(y\in\mathop{\mathrm{Las}}_t(K)\), \(y_i \in (0,1)\) implies that both \(X_i=0\) and \(X_i=1\) happen with non-zero probability,
which in turn impies \(z^{(1)},z^{(2)}\in
\mathop{\mathrm{Las}}_{t-1}(K)\). One can also explicitly express
\(y\) as convex combination and see the
relation with Möbius inversion, see p9 in this
notes.
In Lemma 4, each vector in the
convex combination (those with integer value on \(S\), such as \(z^{(1)},z^{(2)}\)) can be understood as a
partial probability distribution under condition \([\bigwedge_{i\in I} (X_i=1) \bigwedge_{j\in
J}(X_j=0)]\) where \(I\sqcup
J=S\), and the probability assigned to it is exactly the chance
its condition happens. More formally, Lemma 4 implies the following,
- \(S=\set{i}\). Assume that \(y_{\set{i}}\in (0,1)\). For simplicity I
use \(y_i\) for \(y_{\set{i}}\). Define two vectors \(z^{(1)},z^{(2)}\) as \(z^{(1)}_I=\frac{y_{I\cup\set{i}}}{y_i}\)
and \(z^{(2)}_I=\frac{y_I-y_{I\cup\set{i}}}{1-y_i}\).
One can easily verify that \(y=y_i
z^{(1)}+(1-y_i)z^{(2)}, z^{(1)}_i=1\) and \(z^{(2)}_i=0\). It remains to verify \(z^{(1)},z^{(2)}\in
\mathop{\mathrm{Las}}_{t-1}(K)\). Since \(M_t(y)\) is psd, there must be vectors
\(v_I,v_J\) such that \(\langle v_I,v_J \rangle=y_{I\cup J}\) for
all \(|I|,|J|\leq t\). Take \(v_I^{(1)}=v_{I\cup\set{i}}/\sqrt{y_i}\). We
have \[\langle v_I^{(1)},v_J^{(1)}
\rangle=\frac{y_{I\cup
J\cup\set{i}}}{y_i}=M_{t-1}(z^{(1)})[I,J]\] for all \(|I|,|J|\leq t-1\). Thus \(M_{t-1}(z^{(1)})\) is psd. Similarly, one
can take \(v_I^{(2)}=(v_I-v_{I\cup
\set{i}})/\sqrt{(1-y_i)}\) and show \(M_{t-1}(z^{(2)})\) is psd.
For each moment matrix of slacks one can use exactly the same arguments to show \(M_{t-1}^{\ell}(z^{(1)})\succeq 0\) and \(M_{t-1}^{\ell}(z^{(2)})\succeq 0\). - For the inductive steps one can see that our arguments for the base case can be applied recursively on \(z^{(1)},z^{(2)}\).
Corollary5
Let \(y\in\mathop{\mathrm{Las}}_t(K)\).
For any subset \(S\subset [n]\) of size
at most \(t\), there is a distribution
\(D(S)\) over \(\set{0,1}^S\) such \[
\mathop{\mathrm{Pr}}_{z\sim D(S)}\left[ \bigwedge_{i\in I} (z_i=1)
\right]=y_I \quad \forall I\subset S
\]
Moreover, this distribution is “locally consistent” since the
prabability assigned to each vector only depends on its condition.
Since the constraints in \(\mathop{\mathrm{Las}}_t\) only concern the
psdness of certain matrices, one may naturally think about its
decomposition. This leads to a vector representation of \(y_I\) for all \(|I|\leq t\) and may be helpful in rounding
algorithms. For \(J\subset I\), \(v_I\) lies on the sphere of radius \(\|v_J\|/2=\sqrt{y_J}/2\) and center \(v_J /2\).
3.2 Decomposition Theorem
We have seen that \(\mathop{\mathrm{Las}}_n^{proj}(K)\) is the integer hull. Can we get better upperbounds based on properties of \(K\)? Another easy upperbound is \(\max_{x\in K}|\mathop{\mathrm{ones}}(x)|+1\), where \(\mathop{\mathrm{ones}}(x)=\set{i|x_i=1}\). This is because \(y\in \mathop{\mathrm{Las}}_t(K)\) is a partial distribution for \(|I|\leq t\) that can be realized as the marginal distribution of some distribution on \(K\cap \set{0,1}^n\); if \(k\cap \set{0,1}^n\) does not contain a point with at least \(t\) ones, we certainly have \(\mathop{\mathrm{Pr}}[\bigwedge_{i\in I}(X_i=1)]=0\) for \(|I|\geq t\). This fact implies that for most hard problems we should not expect \(\mathop{\mathrm{Las}}_k\) to give us a integral solution for constant \(k\). Karlin, Mathieu and Nguyen [3] proved a more general form of Lemma 4 using similar arguments.
Theorem6Decomposition
Theorem
Let \(y\in
\mathop{\mathrm{Las}}_t(K)\), \(S\subset [n]\) and \(k\in [0,t]\) such that \(k\geq |\mathop{\mathrm{ones}}(x)\cap S|\)
for all \(x\in K\). Then \[
y\in \mathop{\mathrm{conv}}\set{z| z\in \mathop{\mathrm{Las}}_{t-k}(K);
z_{\set{i}}\in \set{0,1} \forall i\in S}.
\]
4 Moment Relaxation
In this section we briefly show the non-probabilistic view of Lasserre hierarchy and how this idea is used in polynomial optimization problems. Everything in this section can be found inuseful_link[6].
Consider the following polynomial optimiation problem \[\begin{equation*}
\begin{aligned}
\min& & a(x)& & &\\
s.t.& & b(x)&\geq 0 & &\forall b\in B\\
& & x&\in\set{0,1}^n
\end{aligned}
\end{equation*}\] where \(a,b,c\) are polynomials. We want to
formulate this problem with SDP.
We can consider polynomials \(a,b\) as
multilinear polynomials. Since \(x_i\in
\set{0,1}\), we have \(x_i^2=x_i\). Now we can consider
enumerating \(x_S=\prod_{i\in S}x_i\)
and write these polynomials as linear functions. For example, we can
rewrite \(a(x)=\sum_{S\subset
[n]}\sum_{\alpha_S:S\to \Z} a_S \prod_{i\in S}x_i^{\alpha_S(i)}\)
as \(\sum_{S\subset [n]} a_S x_S\)
which is linear in the moment sequence \((x_\emptyset,
x_{\set{1}},\ldots,x_{[n]})\).
Recall that our goal is to find a SDP formulation. A common technique is
replace each variable with a vector. We consider the moment vectors
\([v_S\in \R^\gamma]_{S\in 2^{[n]}}\).
Similar to the LP case, we want \(\langle
v_A,v_B \rangle=x_{A\cup B}\). This is exactly the Gram
decomposition of the moment matrix. There exist such moment vectors iff
the moment matrix is psd. For \(b(x)\geq
0\), we consider the slack moment matrix \(M^b(x)=\left( \sum_S b_S x_{I\cup J\cup S}
\right)_{I,J}\)
Then the program becomes the following SDP
\[\begin{equation*}
\begin{aligned}
\min& & \sum_{S\subseteq [n]}a_S x_S& & &\\
s.t.& & M^b(x)&\succeq 0 & &\forall b\in B\\
& & M(x)&\succeq 0\\
& & x_{\emptyset}&=1
\end{aligned}
\end{equation*}\]
Note that if the max degree of polynomials \(a,b\) is at most \(d\), then the following program is a
relaxation of the original polynomial optimiation problem (cf. Corollary
3.2.2.).
\[\begin{equation*}
\begin{aligned}
\min& & \sum_{S\subseteq [n]}a_S x_S& & &\\
s.t.& & M_{F}^b(x)&\succeq 0 & &\forall b\in
B\\
& & M_{F\uplus V_{\leq d}}(x)&\succeq 0\\
& & x_{\emptyset}&=1
\end{aligned}
\end{equation*}\] where \(F\subset
2^{[n]}\), \(\uplus(A,B)=\set{a\cup b|
\forall a\in A,b\in B}\) is element-wise union and \(M_{F}\) is the submatrix of \(M(F)\) on entries \(F\times F\). Taking \(F=\binom{[n]}{\leq t}\) gives us \(\mathop{\mathrm{Las}}_t\).
5 Applications
5.1 Sparsest Cut
There are lots of applications in the useful links, but none of them discusses sparsest cut [4].
Problem7sparsest cut
Given a vertex set \(V\) and two weight
functions \(c,D:\binom{V}{2} \to \R_{\geq
0}\), find \(T\subset V\) that
minimizes the sparsity of \(T\) \[
\Phi(T)=\frac{\sum_{u < v}c_{u,v}|\chi^T(u)-\chi^T(v)|}{\sum_{u <
v}D_{u,v}|\chi^T(u)-\chi^T(v)|},
\] where \(\chi^T\) is the
indicator vector of \(T\).
In [4] Guruswami and Sinop describe Lasserre
hierarchy in a slightly different way. (Note that useful_link[6]
is Sinop’s thesis) We have seen that \(y\in
[0,1]^{2^{[n]}}\) is sufficient for describing the joint
distribution. However, the total number of events is \(3^n\), since for each variable \(X_i\) in an event there are 3 possible
states, \(X_i=0,X_i=1\) and \(X_i\) is absent.
Instead of using \(y\in
[0,1]^{2^{[n]}}\), they enumerate each of the \(3^n\) events and consider the vectors in
the Gram decomposition. For each set \(S\subset V\) of size \(\leq r+1\), and for each 0-1 labeling \(f\) on elements of \(S\), they define a vector \(x_S(f)\). Note that \(S(f)\) enumerates all events and one should
understand \(x_S(f)\) as the vector
corresponding to \(y_{S,f}\in
[0,1]^{3^{[n]}}\) in the Gram decomposition and \(\langle x_S(f), x_T(g) \rangle=y_{f(S)\land
g(T)}\). Then \(x_S(f)\) should
have the following properties:
- if \(f(S)\) and \(g(T)\) are inconsistant, i.e. there is an element \(e\in S\cap T\) and \(f(e)\neq g(e)\), then one should have \(\langle x_S(f), x_T(g) \rangle=y_{f(S)\land g(T)}=0\).
- if \(f(S)\land g(T)\) and \(f'(A)\land g'(B)\) are the same event, i.e. \(A\cup B=S\cup T\) and the labels are the same, then \(\langle x_S(f), x_T(g) \rangle=\langle x_A(f'), x_B(g') \rangle\)
- \(\|x_{\emptyset}\|^2=1\) here \(\emptyset\) is the union of all events.
- for all \(u\in V\), \(\|x_u(0)\|^2+\|x_u(1)\|^2=\|x_{\emptyset}\|^2=1\).
- for \(S\subset V, u\in S\) and \(f\in \set{0,1}^{S\setminus \set{u}}\), \(x_S(f\land (u=1))+x_S(f\land (u=0))=x_{S\setminus \set{u}}(f)\). (Note that two lhs vectors are orthogonal)
Lemma8pseudo
probability
Let \(x\in \mathop{\mathrm{Las}}_t(V)\)
for \(t\geq 0\). Then the following
holds:
- \(\|x_S(f)\|^2 \in [0,1]\) for all \(|S|\leq t+1\).
- \(\|x_S(f)\|^2 \leq \|x_T(g)\|^2\) if \(T\subset S\) and \(f(t)=g(t)\) for all \(t\in T\).
- \(\|x_S(f)\|^2 = \sum_{h\in \set{0,1}^{T-S}} \|x_T(f\land h)\|^2\) if \(S\subset T\).
- If \(S\in \binom{V}{\leq t}\), \(f\in \set{0,1}^S\) and \(u\notin S\), then \(x_{S+u}(f\land u=1)+x_{S+u}(f\land u=0)=x_{S}(f)\).
Proof
Let \(N_t=\sum_{r=0}^{t+1}\binom{V}{r}2^r\) be
the number of vectors in \(x\).
Consider the moment matrix \(M_t\in
\R^{N_t\times N_t}\), where each entry \(M_t[f(S),g(T)]\) is \(\langle x_S(f),x_T(g)\rangle\). The moment
matrix is positive semidefinite since vectors in \(x\) form a Gram decomposition of \(M_t\).
Then write \(x_u=x_{\set{u}}(1)\). The
follwing “SDP” is a relaxation of sparsest cut.
\[\begin{equation*}
\begin{aligned}
\min& & \frac{\sum_{u < v}c_{u,v}\|x_u-x_v\|^2}{\sum_{u
< v}D_{u,v}\|x_u-x_v\|^2}\\
s.t.& & \sum_{u < v}D_{u,v}\|x_u-x_v\|^2&\geq 0\\
& & x\in \mathop{\mathrm{Las}}_r(V)&
\end{aligned}
\end{equation*}\]
Scaling every \(x_S(f)\) by a factor of
the square root of the objective’s denominator gives us a real SDP.
\[\begin{equation*}
\begin{aligned}
\min& & \sum_{u < v}c_{u,v}\|x_u-x_v\|^2\\
s.t.& & \sum_{u < v}D_{u,v}\|x_u-x_v\|^2&= 1\\
& & x\in
\mathop{\mathrm{Las}}_r(V),\|x_\emptyset\|^2&>0
\end{aligned}
\end{equation*}\]
The rounding method is too complicated, so it won’t be covered here.
- Consider the following submatrix of \(M_t\). \[\begin{bmatrix} \langle x_\emptyset,x_\emptyset\rangle & \langle x_\emptyset,x_S(f)\rangle\\ \langle x_S(f),x_\emptyset\rangle & \langle x_S(f),x_S(f)\rangle \end{bmatrix}\succeq 0\] Computing the determinant gives us \(\|x_S(f)\|^2(1-\|x_S(f)\|^2)\geq 0\).
- Again consider certain submatrix of \(M_t\). \[\begin{bmatrix} \langle{x_T(g)},{x_T(g)}\rangle & \langle{x_T(g)},{x_S(f)}\rangle\\ \langle{x_S(f)},{x_T(g)}\rangle & \langle{x_S(f)},{x_S(f)}\rangle \end{bmatrix}\succeq 0\] The determinant is \(\|x_S(f)\|^2(\|x_T(g)\|^2-\|x_S(f)\|^2)\geq 0\).
- We only need to show \(\|x_S(f)\|^2=\|x_{S+u}(f\land u=0)\|^2 +\|x_{S+u}(f\land u=1)\|^2\) and the rest follows by induction. Note that \(x_u(0)+x_u(1)=x_\emptyset\) since we have \(\|x_u(0)\|^2+\|x_u(1)\|^2=\|x_{\emptyset}\|^2\) and they are orthogonal. \[\begin{equation*} \begin{aligned} \|x_{S+u}(f\land u=0)\|^2 +\|x_{S+u}(f\land u=1)\|^2 &= \langle{x_S(f)},{x_u(0)}\rangle+\langle{x_S(f)},{x_u(1)}\rangle\\ &= \langle{x_S(f)},{x_u(0)+x_u(1)}\rangle\\ &= \langle{x_S(f)},{x_\emptyset}\rangle=\|x_S(f)\|^2 \end{aligned} \end{equation*}\]
- Notice that \(x_{S+u}(f\land u=1)\) and \(x_{S+u}(f\land u=0)\) are orthogonal. Denote by \(x_S(f')\) the projection of \(f\) on the hyperplane spanned by \(x_{S+u}(f\land u=1)\) and \(x_{S+u}(f\land u=0)\). One can verify that \(f'=x_{S+u}(f\land u=1)+x_{S+u}(f\land u=0)\). Then it remains to show \(\langle x_S(f'), x_S(f)\rangle=\|x_S(f)\|^2\), which immediately follows from 3.
5.2 Matching
This application can be found in section 3.3 ofuseful_link[1].
We consider the maximum matching IP in non-bipartite graphs. Let \(K=\set{x\in \R_{\geq 0}^n | \sum_{e\in
\delta(v)}x_e\geq 1 \; \forall v\in V}\) be the polytope and
consider \(\mathop{\mathrm{Las}}_t(K)\). In the notes
Rothvoss shows the following lemma.
Lemma9
\(\mathop{\mathrm{Las}}_t^{proj}(K)\subseteq
(1+\frac{1}{2t})\cdot\mathop{\mathrm{conv}}(K\cap \set{0,1}^n)\).
Proof
Let \(y\in
\mathop{\mathrm{Las}}_t(K)\). It suffices to show that \(\sum_{e\in E[U]} y_e\leq
(1+\frac{1}{2t})k\) for all \(|U|=2k+1\), since \(\set{x\in K| \text{$x$ satisfies odd
constraints}}\) is the matching polytope. When \(k>t\), the degree constraints imply that
\(\sum_{e\in E[U]} y_e\leq k+\frac{1}{2} \leq
(1+\frac{1}{2t})k\). Now consider the case \(k\leq t\). Note that for fixed \(U\), any \(I\subset E[U]\) of size \(|I|> k\) has \(y_I=0\), since it is impossible to find a
matching in \(U\) covering more that
\(k\) vertices. Then by Lemma 4 \(y\) can
be represented as a convex combination of solutions \(z\in \mathop{\mathrm{Las}}_0(K)\) in which
\(z_e\in \set{0,1}\) for all \(e\in E[U]\). The convex combination implies
that \(\sum_{e\in E[U]} y_e\leq k\)
when \(k\leq t\).
However, one can see that Lemma
9 is not tight. \(\mathop{\mathrm{Las}}_0^{proj}(K)\) should
be contained in \((1+\frac{1}{2})\cdot\mathop{\mathrm{conv}}(K\cap
\set{0,1}^n)\) and \(\mathop{\mathrm{Las}}_n^{proj}(K)\) should
be exactly the integer hull. Can we prove a slightly better gap that
matches observations at \(\mathop{\mathrm{Las}}_0\) and \(\mathop{\mathrm{Las}}_n\)? The later part
of the proof in fact shows that \(y\in
\mathop{\mathrm{Las}}_t(K)\) satisfies all odd constraints with
\(|U|\leq 2t+1\). Consider an odd cycle
with \(2t+3\) vertices. \((1/2,\ldots,1/2)^T\in \R^{2t+3}\) is a
feasible solution in \(\mathop{\mathrm{Las}}_t(K)\) and proves a
tight lowerbound of \(k+1/2\).
6 Questions
6.1 Replace \(M_t^\ell(y)\succeq 0\) with \(\mathop{\mathrm{Las}}_t^{proj}(y)\in K\)
6.2 Separation Oracle for Implicit \(K\)
Sometimes \(K\) is given in a compact form. For example, consider finding matroid cogirth. \[\begin{equation*} \begin{aligned} \min& & \sum_{e\in E} x_e& & &\\ s.t.& & \sum_{e\in B} x_e&\geq 1 & &\forall \text{ base $B$}\\ & & x_e&\geq 0 & &\forall e\in E \end{aligned} \end{equation*}\] If \(K\) is only accessable through a separation oracle, is it possible to optimize over \(\mathop{\mathrm{Las}}_t(K)\) in polynomial time for constant \(t\)?References
[1]
M.
Laurent, A Comparison of the
Sherali-Adams,
Lovász-Schrijver, and Lasserre
Relaxations for 0–1 Programming,
Mathematics of Operations Research. 28 (2003) 470–496 10.1287/moor.28.3.470.16391.
[2]
A.
Schrijver, Polyhedral proof methods in combinatorial optimization,
Discrete Applied Mathematics. 14 (1986) 111–133 10.1016/0166-218X(86)90056-9.
[3]
A.R. Karlin, C. Mathieu, C.T. Nguyen,
Integrality Gaps of Linear and
Semi-Definite Programming
Relaxations for Knapsack, in: Integer
Programming and Combinatoral
Optimization, Springer, Berlin, Heidelberg, 2011: pp.
301–314 10.1007/978-3-642-20807-2_24.
[4]
V.
Guruswami, A.K. Sinop, Approximating non-uniform sparsest cut via
generalized spectra, in: Proceedings of the Twenty-Fourth Annual
ACM-SIAM Symposium on Discrete
Algorithms, Society for Industrial and Applied
Mathematics, USA, 2013: pp. 295–305.