Useful links:
- https://sites.math.washington.edu/~rothvoss/lecturenotes/lasserresurvey.pdf
- https://web.stanford.edu/class/cs369h/
- Laurent’s survey [1]
- https://rameesh-paul.github.io/sos.pdf
- https://www.ams.jhu.edu/~abasu9/papers/main-Lasserre.pdf
- Chapter 3 of Ali Kemal Sinop’s PhD thesis
When I started writing this post, I hadn’t found so many “useful links” yet. The content below and links 1 and 2 only focus on using the Lasserre hierarchy on LPs, while links 3–6 cover more general cases.
1
We want to solve a 0-1 integer program. Since this task is NP-hard in general, we usually consider its linear relaxation. Different LP formulations have different integrality gaps. For example, consider the following linear relaxation of the max matching IP in a non-bipartite graph.
Unfortunately, this polytope is not integral: for example, on a triangle, assigning 1/2 to every edge satisfies all degree constraints but is not a convex combination of integral matchings.
Edmonds proved that the following formulation is integral.
Schrijver [2] showed that those odd constraints can be obtained by adding cutting planes to the previous polytope. Fortunately, for the matching polytope we have a polynomial-time separation oracle. However, for harder problems adding cutting planes may make the program NP-hard to solve. The Lasserre hierarchy is a method to strengthen the polytope toward the integer hull while providing provably good properties and keeping the program solvable in polynomial time (if applied a constant number of times).
2 Probability Perspective
There is a good interpretation of the linear relaxation of 0-1 integer programs. Let be the polytope of the linear relaxation. The goal of solving the integer program is to describe all possible discrete distributions over . Note that for a fixed distribution the expected position is in , and iterating over all possible distributions gives us the integer hull. Hence we can find the integral optimal solution if we have access to all distributions over integer points.
For any , can be seen as the probability of . We only care about discrete distributions on feasible integral points. However, each only describes some marginal probabilities, and these marginal probabilities may not even be feasible. Consider the following 2D example. Any point in is not a marginal distribution of any possible joint distribution over and . The idea is to iteratively prune this area.

Now we need to think about how to represent all possible joint distributions. One natural way is to use a vector for the distribution law of every possible integer point in . However, this method does not work well with our existing marginal probabilities. Let be a random vector such that and . Computing all feasible is the same as finding all possible bivariate discrete distributions on the integer points. To make a feasible probability from some joint distribution and to make , we have to add more constraints.
2.1 Feasible Probability
For now let’s forget about and consider and a discrete distribution on . We want to make . In fact, there is a one-to-one correspondence between and . If is given, computing is easy for any . If is given, recovering the distribution is the same as solving a system of linear equations with variables ( of the equations come from , and the remaining one is .) Thus, with a slight abuse of notation, we will refer to as a distribution.
We work with the 2D example first. Let be a marginal distribution. One can see that and the last number is not arbitrary. In fact, must lie in the range .
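To make this concrete, here is a small numpy experiment of my own (not from the text): for any distribution over {0,1}², the pairwise moment is pinned between the Fréchet bounds determined by the two marginals, so a marginal vector violating these bounds cannot come from any joint distribution.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
# Points of {0,1}^2 in product order: (0,0), (0,1), (1,0), (1,1)
points = list(itertools.product([0, 1], repeat=2))

for _ in range(1000):
    p = rng.random(4); p /= p.sum()      # a random joint distribution
    y1 = p[2] + p[3]                     # Pr[x_0 = 1]
    y2 = p[1] + p[3]                     # Pr[x_1 = 1]
    y12 = p[3]                           # Pr[x_0 = 1 and x_1 = 1]
    # Fréchet bounds: the joint moment is squeezed between the marginals
    assert max(0.0, y1 + y2 - 1.0) - 1e-12 <= y12 <= min(y1, y2) + 1e-12
```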
To make sure is indeed a probability distribution the moment matrix is considered. The moment matrix is of size and is defined as the expectation (the only non-zero term is ). The expectation is taken over the distribution defined by .
For any probability distribution , the moment matrix is psd.
We need to verify for any .
Note that in the proof something like sum of squares appears. Lasserre hierarchy has deep connections with SOS optimization and is also known as sum-of-squares hierarchy.
If is psd then is a probability distribution.
It is psd since is psd. The determinant is .
Let be the probability of selecting in . It remains to prove the following system of linear equations has a solution such that for all .
I believe this can be proven with the idea of Lemma 2 here.
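As a sanity check on the correspondence between moment vectors and distributions, here is a small numpy sketch of my own (the names `y`, `M`, `A` are mine): starting from an explicit distribution on {0,1}^n, we build the full moment matrix, confirm it is psd, and recover the distribution from the moments by solving the linear system described above.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 2
points = list(itertools.product([0, 1], repeat=n))   # all of {0,1}^n
p = rng.random(len(points)); p /= p.sum()            # a random distribution

subsets = [frozenset(s) for r in range(n + 1)
           for s in itertools.combinations(range(n), r)]

def y(S):
    # y_S = Pr[x_i = 1 for all i in S] under p; y_emptyset = 1
    return sum(pi for pi, a in zip(p, points) if all(a[i] == 1 for i in S))

# Moment matrix: entry (I, J) is y_{I ∪ J}
M = np.array([[y(I | J) for J in subsets] for I in subsets])
assert np.linalg.eigvalsh(M).min() > -1e-9           # psd for a true distribution

# Conversely, the moments determine the distribution: solve the
# (invertible, by Möbius inversion) system  y_S = sum_{a ⊇ S} p(a)
A = np.array([[1.0 if all(a[i] == 1 for i in S) else 0.0 for a in points]
              for S in subsets])
p_rec = np.linalg.solve(A, np.array([y(S) for S in subsets]))
assert np.allclose(p_rec, p)                         # distribution recovered
```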
2.2 Projection in
Let the projection of be . For any , the projection should always lie in . One may want to define moment matrices for the constraints . This is called the moment matrix of slacks. For simplicity we only consider one linear constraint . The moment matrix for this constraint is . Then we can make similar arguments.
Note that we can combine the expectations since they are taken over the same probability distribution. Now assume that we have .
If is satisfied, then the corresponding slack moment matrix is psd.
Finally, here is a more formal definition.
The -th level of Lasserre hierarchy of a convex polytope is the set of vectors that make the following matrices psd.
- moment matrix
- moment matrix of slacks
Note that the -th level of Lasserre hierarchy only involves entries with . ( comes from the moment matrix of slacks.) The matrices have dimension and there are only matrices. Thus optimizing some objective over the -th level of the Lasserre hierarchy takes time, which is still polynomial in the input size. (The separation oracle computes eigenvalues and eigenvectors. If there is a negative eigenvalue, we find the corresponding eigenvector and the separating hyperplane is . See Example 43.)
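The eigenvalue-based separation step can be sketched in a few lines (a toy illustration of my own, with a single 0-1 variable at level 1; function names are mine): if the moment matrix has a negative eigenvalue, the corresponding eigenvector yields a linear inequality on the entries that the current point violates.

```python
import numpy as np

def moment_matrix(y_empty, y_1):
    # Level-1 moment matrix for one 0-1 variable; rows/cols indexed by the
    # subsets {} and {1}, entry (I, J) = y_{I ∪ J}.
    return np.array([[y_empty, y_1],
                     [y_1,     y_1]])

def separate(M):
    """Return v with v^T M v < 0 (a violated linear inequality), or None if psd."""
    w, V = np.linalg.eigh(M)          # eigenvalues in ascending order
    if w[0] >= -1e-12:
        return None                   # psd: no cutting plane needed
    return V[:, 0]                    # eigenvector of the most negative eigenvalue

# y_1 = 1.2 cannot be a probability: the oracle finds a cutting plane
M_bad = moment_matrix(1.0, 1.2)
v = separate(M_bad)
assert v is not None and v @ M_bad @ v < 0
# The inequality  sum_{I,J} v_I v_J y_{I∪J} >= 0  is linear in y and cuts off this point.

# y_1 = 0.5 is a genuine probability, so no separating hyperplane exists
assert separate(moment_matrix(1.0, 0.5)) is None
```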
3 Properties
Almost everything in this part can be found here.
Suppose that we have the -th level of Lasserre hierarchy . Denote by the projection of the -th level.
- is convex
- for all
- for all with
- for all
Properties 1–3 show that behaves similarly to a real probability distribution.
Properties 4–6 show that .
The goal of this section is to show that . When working on the Lasserre hierarchy, instead of considering the projection solely, we usually perform the analysis on .
3.1 Convex Hull and Conditional Probability
For , let and be any subset of variables of size at most . Then
For any and , the previous lemma implies the projection of is a convex combination of integral vectors in . Then it follows that . This also proves the facts that if and then is indeed a probability distribution and the projection is in .
The proof is constructive and is by induction on the size of .
- . Assume that . For simplicity I use for . Define two vectors as and . One can easily verify that and . It remains to verify . Since is psd, there must be vectors such that for all . Take . We have for all . Thus is psd. Similarly, one can take and show is psd. For each moment matrix of slacks one can use exactly the same arguments to show and .
- For the inductive step, one can see that the arguments for the base case can be applied recursively on .
is a probability distribution if we consider only , . The vectors we constructed in the previous proof can be understood as conditional probabilities.
The proof is basically showing that
For any partially feasible probability distribution , implies that both and happen with non-zero probability, which in turn implies . One can also explicitly express as a convex combination and see the relation with Möbius inversion; see p. 9 of these notes.
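The conditioning operation behind Lemma 4 is easy to verify numerically for honest distributions (a numpy sketch of my own; names `y1`, `y0` are mine): conditioning the moment vector on x_i = 1 or x_i = 0 produces two moment vectors whose convex combination recovers the original, and both remain psd.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 3
points = list(itertools.product([0, 1], repeat=n))
p = rng.random(len(points)); p /= p.sum()

subsets = [frozenset(s) for r in range(n + 1)
           for s in itertools.combinations(range(n), r)]

def moments(q):
    # y_S = Pr[x_i = 1 for all i in S] under the distribution q
    return {S: sum(qi for qi, a in zip(q, points)
                   if all(a[i] == 1 for i in S)) for S in subsets}

y = moments(p)
i = 0
q = y[frozenset({i})]                      # Pr[x_i = 1]
# Condition on x_i = 1 and on x_i = 0 (Bayes on the moment vector):
y1 = {S: y[S | {i}] / q for S in subsets}
y0 = {S: (y[S] - y[S | {i}]) / (1 - q) for S in subsets}

# Law of total probability: the original moments are the convex combination
for S in subsets:
    assert np.isclose(y[S], q * y1[S] + (1 - q) * y0[S])

def M(yv):
    return np.array([[yv[I | J] for J in subsets] for I in subsets])

# Both conditional moment vectors still have psd moment matrices
assert np.linalg.eigvalsh(M(y1)).min() > -1e-9
assert np.linalg.eigvalsh(M(y0)).min() > -1e-9
assert np.isclose(y1[frozenset({i})], 1.0)   # after conditioning, x_i is fixed to 1
```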
In Lemma 4, each vector in the convex combination (those with integer value on , such as ) can be understood as a partial probability distribution under condition where , and the probability assigned to it is exactly the chance its condition happens. More formally, Lemma 4 implies the following,
Let . For any subset of size at most , there is a distribution over such that
Moreover, this distribution is “locally consistent” since the probability assigned to each vector only depends on its condition.
Since the constraints in only concern the psdness of certain matrices, one may naturally think about its decomposition. This leads to a vector representation of for all and may be helpful in rounding algorithms. For , lies on the sphere of radius and center .
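The sphere fact can be checked directly from a Gram decomposition of the moment matrix (a numpy sketch of my own): since the inner product of each vector with itself and with the vector of the empty set both equal the same moment, each singleton vector satisfies ‖v_i − v_∅/2‖ = 1/2.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n = 3
points = list(itertools.product([0, 1], repeat=n))
p = rng.random(len(points)); p /= p.sum()

subsets = [frozenset(s) for r in range(n + 1)
           for s in itertools.combinations(range(n), r)]
y = {S: sum(pi for pi, a in zip(p, points) if all(a[i] == 1 for i in S))
     for S in subsets}
M = np.array([[y[I | J] for J in subsets] for I in subsets])

# Gram decomposition: columns of V are vectors v_I with <v_I, v_J> = y_{I∪J}
w, U = np.linalg.eigh(M)
V = np.sqrt(np.clip(w, 0, None))[:, None] * U.T
v = {S: V[:, k] for k, S in enumerate(subsets)}

v_empty = v[frozenset()]
assert np.isclose(v_empty @ v_empty, 1.0)          # y_emptyset = 1

# ‖v_i − v_∅/2‖² = <v_i,v_i> − <v_i,v_∅> + 1/4 = y_i − y_i + 1/4 = 1/4
for i in range(n):
    vi = v[frozenset({i})]
    assert np.isclose(np.linalg.norm(vi - v_empty / 2), 0.5)
```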
3.2 Decomposition Theorem
We have seen that is the integer hull. Can we get better upper bounds based on properties of ? Another easy upper bound is , where . This is because is a partial distribution for that can be realized as the marginal distribution of some distribution on ; if does not contain a point with at least ones, we certainly have for .
This fact implies that for most hard problems we should not expect to give us an integral solution for constant .
Karlin, Mathieu and Nguyen [3] proved a more general form of Lemma 4 using similar arguments.
Let , and such that for all . Then
4 Moment Relaxation
In this section we briefly show the non-probabilistic view of Lasserre hierarchy and how this idea is used in polynomial optimization problems.
Everything in this section can be found in useful_link[6].
Consider the following polynomial optimization problem where are polynomials. We want to formulate this problem as an SDP.
We can consider polynomials as multilinear polynomials. Since , we have . Now we can consider enumerating and write these polynomials as linear functions. For example, we can rewrite as which is linear in the moment sequence .
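The multilinear reduction is mechanical; here is a minimal pure-Python sketch (the dictionary representation and names are my own): exponents are reduced using x_i² = x_i, after which the polynomial becomes a linear functional of the moment sequence.

```python
import itertools
import math
from collections import defaultdict

def to_multilinear(poly):
    # poly: {exponent tuple: coefficient}. On {0,1} variables x_i^e = x_i
    # for e >= 1, so each monomial collapses to a subset of variables and
    # the polynomial is linear in the moments (y_S)_S.
    out = defaultdict(float)
    for exps, c in poly.items():
        S = frozenset(i for i, e in enumerate(exps) if e > 0)
        out[S] += c
    return dict(out)

# f(x) = 3 x0^2 x1 - 2 x1^3 + 5  becomes  3 y_{0,1} - 2 y_{1} + 5 y_{}
f = {(2, 1): 3.0, (0, 3): -2.0, (0, 0): 5.0}
lin = to_multilinear(f)
assert lin == {frozenset({0, 1}): 3.0, frozenset({1}): -2.0, frozenset(): 5.0}

# Sanity check: on every 0-1 point, f agrees with its multilinear reduction
for x in itertools.product([0, 1], repeat=2):
    fx = sum(c * math.prod(x[i] ** e for i, e in enumerate(exps))
             for exps, c in f.items())
    gx = sum(c * math.prod(x[i] for i in S) for S, c in lin.items())
    assert math.isclose(fx, gx)
```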
Recall that our goal is to find an SDP formulation. A common technique is to replace each variable with a vector. We consider the moment vectors . Similar to the LP case, we want . This is exactly the Gram decomposition of the moment matrix. There exist such moment vectors iff the moment matrix is psd. For , we consider the slack moment matrix
Then the program becomes the following SDP
Note that if the max degree of the polynomials is at most , then the following program is a relaxation of the original polynomial optimization problem (cf. Corollary 3.2.2.).
where , is element-wise union and is the submatrix of on entries . Taking gives us .
5 Applications
5.1 Sparsest Cut
There are lots of applications in the useful links, but none of them discusses sparsest cut [4].
Given a vertex set and two weight functions , find that minimizes the sparsity of where is the indicator vector of .
In [4] Guruswami and Sinop describe the Lasserre hierarchy in a slightly different way. (Note that useful_link[6] is Sinop’s thesis.) We have seen that is sufficient for describing the joint distribution. However, the total number of events is , since for each variable in an event there are 3 possible states: , , and absent.
Instead of using , they enumerate each of the events and consider the vectors in the Gram decomposition. For each set of size , and for each 0-1 labeling on elements of , they define a vector . Note that enumerates all events and one should understand as the vector corresponding to in the Gram decomposition and . Then should have the following properties:
- if and are inconsistent, i.e. there is an element and , then one should have .
- if and are the same event, i.e. and the labels are the same, then
- here is the union of all events.
- for all , .
- for and , . (Note that two lhs vectors are orthogonal)
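For an honest distribution such vectors always exist; here is a small numpy construction of my own (not from [4]): embed each event as the vector with coordinate sqrt(p(a)) at every point a consistent with it, so inner products are exactly probabilities of joint events, and the listed properties follow.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n = 3
points = list(itertools.product([0, 1], repeat=n))
p = rng.random(len(points)); p /= p.sum()

def x_vec(event):
    # event: dict {variable index: 0/1 label}. The vector has coordinate
    # sqrt(p(a)) at each point a consistent with the event, 0 elsewhere,
    # so <x_A, x_B> = Pr[both events happen].
    return np.array([np.sqrt(pa) if all(a[i] == b for i, b in event.items())
                     else 0.0 for pa, a in zip(p, points)])

empty = x_vec({})                           # the "always true" event
assert np.isclose(empty @ empty, 1.0)

a1 = x_vec({0: 1})                          # event x_0 = 1
a0 = x_vec({0: 0})                          # event x_0 = 0
assert np.isclose(a1 @ a0, 0.0)             # inconsistent events are orthogonal
assert np.isclose(a1 @ a1 + a0 @ a0, 1.0)   # the two probabilities sum to 1
assert np.isclose(a1 @ empty, a1 @ a1)      # <x_(S,f), x_∅> = Pr[event] = ‖x_(S,f)‖²
```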
Let for . Then the following holds:
- for all .
- if and for all .
- if .
- If , and , then .
Let be the number of vectors in . Consider the moment matrix , where each entry is . The moment matrix is positive semidefinite since vectors in form a Gram decomposition of .
- Consider the following submatrix of . Computing the determinant gives us .
- Again consider certain submatrix of . The determinant is .
- We only need to show and the rest follows by induction. Note that since we have and they are orthogonal.
- Notice that and are orthogonal. Denote by the projection of on the hyperplane spanned by and . One can verify that . Then it remains to show , which immediately follows from 3.
Then write . The following “SDP” is a relaxation of sparsest cut.
Scaling every by a factor of the square root of the objective’s denominator gives us a real SDP.
The rounding method is too complicated, so it won’t be covered here.
5.2 Matching
This application can be found in section 3.3 of useful_link[1].
We consider the maximum matching IP in non-bipartite graphs. Let be the polytope and consider . In the notes Rothvoss shows the following lemma.
.
Let . It suffices to show that for all , since is the matching polytope. When , the degree constraints imply that . Now consider the case . Note that for fixed , any of size has , since it is impossible to find a matching in covering more than vertices. Then by Lemma 4, can be represented as a convex combination of solutions in which for all . The convex combination implies that when .
However, one can see that Lemma 9 is not tight. should be contained in and should be exactly the integer hull. Can we prove a slightly better gap that matches the observations at and ? The latter part of the proof in fact shows that satisfies all odd constraints with . Consider an odd cycle with vertices. is a feasible solution in and proves a tight lower bound of .
6 Questions
6.1 Replace with
I don’t see any proof relying on the psdness of slack moment matrices…
It turns out that problems occur in the proof of Lemma 4. If is defined as , then we cannot guarantee . Without Lemma 4, may not be exactly and the hierarchy seems less interesting. But an alternative formulation (see Sparsest Cut, which entirely ignores the slack moment matrices) still allows good rounding even without Lemma 4. Generally speaking, if the psdness of the slack moment matrices is neglected, then we lose the law of total probability (Lemma 4); however, we still have the “finite additivity property of probability measures” (Lemma 8 (3)).
6.2 Separation Oracle for Implicit
Sometimes is given in a compact form. For example, consider finding matroid cogirth.
If is only accessible through a separation oracle, is it possible to optimize over in polynomial time for constant ?