CSE347

Lecture 8

NP-optimization problem

Not known to be solvable in polynomial time (and believed not to be, unless P = NP).

Example:

  • Maximum independent set
  • Minimum vertex cover

What can we do?

  • solve small instances
  • hard instances are rare - average case analysis
  • solve special cases
  • find an approximate solution

Approximation algorithms

We find a "good" solution in polynomial time, but may not be optimal.

Example:

  • Minimum vertex cover: we will find a small vertex cover, but not necessarily the smallest one.
  • Maximum independent set: we will find a large independent set, but not necessarily the largest one.

Question: How do we quantify the quality of the solution?

Approximation ratio

Intuition:

How good is an algorithm $A$ compared to an optimal solution in the worst case?

Definition:

Consider an algorithm $A$ for an NP-optimization problem $L$. For any instance $l$, $A$ finds a solution of cost $c_A(l)$, while the optimal solution has cost $c^*(l)$.

The approximation ratio is

$\alpha = \max_{l \in L} \frac{c_A(l)}{c^*(l)} \geq 1$

for minimization problems, or

$\alpha = \min_{l \in L} \frac{c_A(l)}{c^*(l)} \leq 1$

for maximization problems.

Example:

Alice's algorithm $A$ finds a vertex cover of size $c_A(l)$ for an instance $l$ (a graph $G$). The optimal vertex cover has size $c^*(l)$.
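For instance (hypothetical numbers): if on some graph Alice's algorithm returns a cover of size $c_A(l) = 8$ while the optimal cover has size $c^*(l) = 5$, the ratio on that instance is $\frac{8}{5} = 1.6$; the approximation ratio of $A$ is the worst (largest) such ratio over all instances.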

We want approximation ratio to be as close to 1 as possible.

Vertex cover:

A vertex cover is a set of vertices $C \subseteq V$ such that every edge has at least one endpoint in $C$.

Let's try an approximation algorithm for the vertex cover problem, called Greedy cover.

Greedy cover

While some edge is uncovered, pick any uncovered edge and add both of its endpoints to the cover $C$; repeat until all edges are covered.

Runtime: $O(m)$
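A minimal Python sketch of Greedy cover, assuming the graph is given as a list of edges; the function and variable names are illustrative, not from the lecture:

def greedy_cover(edges):
    # edges: list of (u, v) pairs
    C = set()  # vertices chosen for the cover
    for u, v in edges:
        # If this edge is still uncovered, add both endpoints.
        if u not in C and v not in C:
            C.add(u)
            C.add(v)
    return C

# Example: on the path 1-2-3-4, greedy_cover([(1, 2), (2, 3), (3, 4)])
# returns {1, 2, 3, 4} (size 4), while the optimal cover {2, 3} has size 2.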

Claim: Greedy cover is correct: it always returns a vertex cover.

Proof:

The algorithm only terminates when all edges are covered, and it always terminates because each iteration covers at least one new edge.

Claim: Greedy cover is a 2-approximation algorithm.

Proof:

Let $M$ be the set of edges picked by Greedy cover. These edges form a matching: once an edge is picked, both of its endpoints are covered, so no edge picked later can share an endpoint with it.

Greedy cover adds both endpoints of every picked edge, so $|C| = 2|M|$. (In the worst case this really does double the optimal; consider a graph of disjoint edges.)

Any vertex cover, including the optimal one, must contain at least one endpoint of every edge in $M$, and since the edges of $M$ are pairwise disjoint these endpoints are distinct, so $c^*(l) \geq |M|$.

Thus the size of the vertex cover found by Greedy cover is at most twice the size of the optimal vertex cover: $|C| = 2|M| \leq 2c^*(l)$.

Thus, Greedy cover is a 2-approximation algorithm.

Min-cut:

Given a graph $G$ and two vertices $s$ and $t$, find the minimum cut separating $s$ and $t$.

Max-cut:

Given a graph $G$, find the maximum cut, i.e., a partition of the vertices into two sides that maximizes the number of edges crossing between the two sides.

Local cut

Algorithm:

  • Start with an arbitrary cut of $G$.
  • While some vertex can be moved from one side to the other so that the size of the cut increases, move it.
  • Return the cut found.
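A minimal Python sketch of local cut, assuming the graph is given as an adjacency list (a dict mapping each vertex to the set of its neighbors); names are illustrative, not from the lecture:

def local_cut(adj):
    # adj: dict mapping each vertex to the set of its neighbors
    side = {v: 0 for v in adj}  # arbitrary starting cut: every vertex on side 0
    improved = True
    while improved:
        improved = False
        for u in adj:
            crossing = sum(1 for w in adj[u] if side[w] != side[u])
            same = len(adj[u]) - crossing
            # Moving u flips which of its edges cross, so it helps iff same > crossing.
            if same > crossing:
                side[u] = 1 - side[u]
                improved = True
    return side  # side[v] says which part of the cut v ends up on

# Example: on a triangle, local_cut({1: {2, 3}, 2: {1, 3}, 3: {1, 2}})
# ends with one vertex alone on its side, a cut of size 2 (optimal here).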

We will prove its:

  • Runtime
  • Feasibility
  • Approximation ratio
Runtime for local cut

The size of any cut is at most $|E|$, and each move increases the size of the cut by at least 1, so the algorithm performs at most $|E|$ iterations.

In each iteration we check every vertex to see whether moving it would increase the cut, which takes $O(|V|^2)$ time (e.g., with an adjacency-matrix representation).

So the runtime is $O(|E||V|^2)$.

Feasibility for local cut

Any partition of the vertices into two sides is a valid cut, and the algorithm only terminates when no vertex can be moved to improve the cut.

Thus, the cut returned is a feasible (indeed, locally optimal) solution.

Approximation ratio for local cut

Claim: Local cut is a $\frac{1}{2}$-approximation algorithm.

We need to show that the size of the cut found is at least half of the size of the optimal cut.

First, the size of the optimal cut is at most $|E|$.

We will then show that the cut we find has size at least $\frac{|E|}{2}$ for any graph $G$, hence at least half of the optimal.

Proof:

When the algorithm terminates, no vertex can be moved to increase the cut.

Therefore, for every vertex, the number of its crossing edges is at least the number of its non-crossing edges.

Let $d(u)$ be the degree of vertex $u\in V$.

Then the number of crossing edges incident to $u$ is at least $\frac{1}{2}d(u)$.

Summing over all vertices, $\sum_{u\in V}(\text{crossing edges at }u) \geq \frac{1}{2}\sum_{u\in V}d(u) = |E|$. Each crossing edge is counted twice in this sum (once per endpoint), so the number of crossing edges is at least $\frac{|E|}{2}$.

Since the optimal cut has at most $|E|$ edges, the cut found is at least half the optimal.

EOP

Set cover

Problem:

You are collecting a set of magic cards.

$X$ is the set of all possible cards. You want at least one of each card.

Each dealer $j$ has a pack $S_j\subseteq X$ of cards. You have to buy dealer $j$'s entire pack or nothing.

Goal: What is the least number of packs you need to buy to get all cards?

Formally:

Input: $X$ is a universe of $n$ elements, and $Y=\{S_1, S_2, \ldots, S_m\}$ is a collection of subsets of $X$ (each $S_i \subseteq X$).

Goal: Pick $C\subseteq Y$ such that $\bigcup_{S_i\in C}S_i=X$ and $|C|$ is minimized.

Set cover is an NP-optimization problem. It is a generalization of the vertex cover problem: take the edges as the universe $X$ and, for each vertex $v$, the set of edges incident to $v$ as one of the $S_i$.

Greedy set cover

Algorithm:

  • Start with an empty collection $C$.
  • While some element of $X$ is uncovered, pick the set $S_i\in Y$ that covers the largest number of uncovered elements (as in the code below).
  • Add $S_i$ to $C$.
  • Return $C$.
def greedy_set_cover(X, Y):
    # X: iterable of elements (the universe)
    # Y: collection of sets, each a subset of X
    # Returns a list of sets from Y whose union is X (assuming a cover exists).
    non_covered = set(X)   # elements not yet covered
    C = []                 # sets picked so far
    # Each iteration covers at least one new element, so there are at most |X| iterations.
    while non_covered:
        # Greedy choice: the set covering the most uncovered elements.
        # One pass over Y; each intersection costs O(min(|X|, |S|)).
        best = max(Y, key=lambda S: len(non_covered & set(S)))
        if not non_covered & set(best):
            raise ValueError("X cannot be covered by the sets in Y")
        C.append(best)
        non_covered -= set(best)
    return C

It is not optimal.

Need to prove its:

  • Correctness:
    Keep picking until all elements are covered.
  • Runtime:
    $O(|X||Y|^2)$
  • Approximation ratio:
Approximation ratio for greedy set cover

Harmonic number:

$H_n=\sum_{i=1}^n\frac{1}{i}=\frac{1}{1}+\frac{1}{2}+\frac{1}{3}+\cdots+\frac{1}{n}=\Theta(\log n)$
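A quick numerical sanity check (not from the lecture) that $H_n$ grows like $\ln n$:

import math

def harmonic(n):
    # H_n = 1 + 1/2 + ... + 1/n
    return sum(1 / i for i in range(1, n + 1))

# H_n is roughly ln(n) + 0.577 (the Euler-Mascheroni constant), so H_n = Theta(log n).
for n in (10, 1000, 100000):
    print(n, harmonic(n), math.log(n))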

We claim that the size of the set cover found is at most $H_n = \Theta(\log n)$ times the size of the optimal set cover.

First bound:

Proof:

If the optimal solution picks $k$ sets, then the set cover found by greedy has at most $(1+\ln n)k$ sets.

Let $n=|X|$.

Observe the following. Let $U_i$ be the set of elements still uncovered after round $i$, and let $x_i$ be the number of new elements covered in round $i$.

Before the first round, every element is uncovered:

$|U_0|=n$

After the first round, the number of uncovered elements drops by $x_1=|S_1|$, the size of the set picked in the first round:

$|U_1|=|U_0|-|S_1|$

and in general $|U_i|=|U_{i-1}|-x_i$.

Claim: $x_i\geq \frac{|U_{i-1}|}{k}$.

We proceed by contradiction. The $k$ sets of the optimal solution together cover $U_{i-1}$. Suppose each of them covered fewer than $\frac{|U_{i-1}|}{k}$ elements of $U_{i-1}$; then together they would cover fewer than $|U_{i-1}|$ of them, a contradiction. So some set covers at least $\frac{|U_{i-1}|}{k}$ elements of $U_{i-1}$, and greedy, which picks the set covering the most uncovered elements, does at least as well.

Therefore $|U_i|\leq |U_{i-1}|\left(1-\frac{1}{k}\right)$, and by induction $|U_i|\leq n\left(1-\frac{1}{k}\right)^i$.

Some math magic: $\left(1-\frac{1}{k}\right)^k\leq \frac{1}{e}$

The round in which greedy picks its last set starts with at least one uncovered element, so $n\left(1-\frac{1}{k}\right)^{|C|-1}\geq 1$, which gives $|C|\leq 1+k\ln n$.

So the size of the set cover found is at most $(1+\ln n)k$.
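Spelling out the last step: combining $n\left(1-\frac{1}{k}\right)^{|C|-1}\geq 1$ with $\left(1-\frac{1}{k}\right)^k\leq \frac{1}{e}$ gives

$1 \leq n\left(1-\frac{1}{k}\right)^{|C|-1} \leq n\,e^{-(|C|-1)/k}$

so $\frac{|C|-1}{k}\leq \ln n$, i.e., $|C|\leq 1+k\ln n$.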

EOP

So the greedy set cover is not too bad...

Second bound:

Greedy set cover is an $H_d$-approximation algorithm for set cover, where $d$ is the largest cardinality of any set in the optimal cover (see below).

Proof:

Assign a cost to each element of $X$ according to the decisions of the greedy set cover.

Let $S^i$ denote the set picked at step $i$, and let $\delta(S^i)$ be the number of new elements it covers:

$\delta(S^i)=|S^i\cap U_{i-1}|=x_i$

If element $x$ is first covered at step $i$, when $S^i$ is picked, then we set the cost of $x$ to

$c(x)=\frac{1}{\delta(S^i)}=\frac{1}{x_i}$

Example:

$X=\{A,B,C,D,E,F,G\}$
$S_1=\{A,C,E\}$
$S_2=\{B,C,F,G\}$
$S_3=\{B,D,F,G\}$
$S_4=\{D,G\}$

First we select $S_2$; then $c(B)=c(C)=c(F)=c(G)=\frac{1}{4}$.

Then we select $S_1$; then $c(A)=c(E)=\frac{1}{2}$.

Then we select $S_3$; then $c(D)=1$.
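A small sketch (separate from the greedy_set_cover function above) that replays the greedy choices on this example and records the cost assigned to each element:

X = {"A", "B", "C", "D", "E", "F", "G"}
Y = [{"A", "C", "E"}, {"B", "C", "F", "G"}, {"B", "D", "F", "G"}, {"D", "G"}]

cost = {}
uncovered = set(X)
while uncovered:
    # Greedy choice: the set covering the most uncovered elements
    # (ties broken by position in Y).
    best = max(Y, key=lambda S: len(S & uncovered))
    newly = best & uncovered
    for x in newly:
        cost[x] = 1 / len(newly)   # c(x) = 1 / delta(S^i)
    uncovered -= newly

print(cost)  # B, C, F, G -> 0.25; A, E -> 0.5; D -> 1.0; the costs sum to 3 = |C|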

Recall: if element $x$ was first covered by the greedy set cover when set $S^i$ was added at step $i$, then $c(x)=\frac{1}{\delta(S^i)}$.

$\textup{Total cost of GSC}=\sum_{x\in X}c(x)=\sum_{i=1}^{|C|}\;\sum_{x\textup{ covered at iteration }i}c(x)=\sum_{i=1}^{|C|}\delta(S^i)\cdot\frac{1}{\delta(S^i)}=|C|$

Claim: Consider any set $S\subseteq X$ (in particular, any set in $Y$). The total cost paid by the greedy set cover for the elements of $S$ is at most $H_{|S|}$: $\sum_{x\in S}c(x)\leq H_{|S|}$.

Suppose that the greedy set cover covers the elements of $S$ in the order $x_1,x_2,\ldots,x_{|S|}$, where $\{x_1,x_2,\ldots,x_{|S|}\}=S$.

When GSC covers $x_j$, the elements $x_j,x_{j+1},\ldots,x_{|S|}$ are all still uncovered.

At this point, GSC has the option of picking $S$.

Picking $S$ would cover at least these $|S|-j+1$ elements, so at this step $\delta(S)\geq |S|-j+1$.

Instead, GSC picks the set $\hat{S}$ for which $\delta(\hat{S})$ is maximized ($\hat{S}$ may be $S$ itself or some other set); since $\hat{S}$ is the set whose selection covers $x_j$, we have $c(x_j)=\frac{1}{\delta(\hat{S})}$.

So $\delta(\hat{S})\geq \delta(S)\geq |S|-j+1$.

So the cost of $x_j$ is $\frac{1}{\delta(\hat{S})}\leq \frac{1}{\delta(S)}\leq \frac{1}{|S|-j+1}$.

Summing over all $j$, the cost of $S$ is at most $\sum_{j=1}^{|S|}\frac{1}{|S|-j+1}=H_{|S|}$.
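Checking the claim on the example above: for $S_2=\{B,C,F,G\}$ the costs sum to $\frac{1}{4}+\frac{1}{4}+\frac{1}{4}+\frac{1}{4}=1\leq H_4\approx 2.08$, and for $S_1=\{A,C,E\}$ they sum to $\frac{1}{4}+\frac{1}{2}+\frac{1}{2}=\frac{5}{4}\leq H_3\approx 1.83$.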

Back to the proof of approximation ratio:

Let $C^*$ be the optimal set cover.

$|C|=\sum_{x\in X}c(x)\leq \sum_{S_j\in C^*}\sum_{x\in S_j}c(x)$

The inequality holds because $C^*$ covers every element of $X$, so the right-hand side counts each $c(x)$ at least once (elements covered by more than one set of $C^*$ are counted multiple times).

By our claim, $\sum_{x\in S_j}c(x)\leq H_{|S_j|}$ for every $S_j$.

Let $d$ be the largest cardinality of any set in $C^*$.

$|C|\leq \sum_{S_j\in C^*}H_{|S_j|}\leq \sum_{S_j\in C^*}H_d=H_d|C^*|$

So the approximation ratio for greedy set cover is $H_d$.

EOP