CSE347

Lecture 8

NP-optimization problem

Not known to be solvable in polynomial time (and believed not to be, unless P = NP).

Example:

  • Maximum independent set
  • Minimum vertex cover

What can we do?

  • solve small instances
  • hard instances are rare - average case analysis
  • solve special cases
  • find an approximate solution

Approximation algorithms

We find a "good" solution in polynomial time, but may not be optimal.

Example:

  • Minimum vertex cover: we will find a small vertex cover, but not necessarily the smallest one.
  • Maximum independent set: we will find a large independent set, but not necessarily the largest one.

Question: How do we quantify the quality of the solution?

Approximation ratio

Intuition:

How good is an algorithm $A$ compared to an optimal solution in the worst case?

Definition:

Consider an algorithm $A$ for an NP-optimization problem $L$. For any instance $l$, $A$ finds a solution of cost $c_A(l)$, while the optimal solution has cost $c^*(l)$.

The approximation ratio is

$\alpha = \max_{l \in L} \frac{c_A(l)}{c^*(l)} \geq 1$

for minimization problems, or

$\alpha = \min_{l \in L} \frac{c_A(l)}{c^*(l)} \leq 1$

for maximization problems.

Example:

Alice's algorithm $A$ finds a vertex cover of size $c_A(l)$ for an instance $l$ (a graph $G$). The optimal vertex cover has size $c^*(l)$.
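For instance (hypothetical numbers): if on some graph Alice's algorithm returns a cover of size $c_A(l) = 8$ while the optimal cover has size $c^*(l) = 5$, the ratio on that instance is $\frac{8}{5} = 1.6$; the approximation ratio of $A$ is the worst (largest) such ratio over all instances.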

We want approximation ratio to be as close to 1 as possible.

Vertex cover:

A vertex cover is a set of vertices $C \subseteq V$ such that every edge has at least one endpoint in $C$.

Let's try an approximation algorithm for the vertex cover problem, called Greedy cover.

Greedy cover

While some edge is uncovered, pick any uncovered edge and add both of its endpoints to the cover $C$; repeat until all edges are covered.

Runtime: $O(m)$
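A minimal Python sketch of Greedy cover, assuming the graph is given as a list of edges; the function and variable names are illustrative, not from the lecture:

def greedy_cover(edges):
    # edges: list of (u, v) pairs
    C = set()  # vertices chosen for the cover
    for u, v in edges:
        # If this edge is still uncovered, add both endpoints.
        if u not in C and v not in C:
            C.add(u)
            C.add(v)
    return C

# Example: on the path 1-2-3-4, greedy_cover([(1, 2), (2, 3), (3, 4)])
# returns {1, 2, 3, 4} (size 4), while the optimal cover {2, 3} has size 2.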

Claim: Greedy cover is correct: it always returns a vertex cover.

Proof:

The algorithm only terminates when all edges are covered, and it always terminates because each iteration covers at least one new edge.

Claim: Greedy cover is a 2-approximation algorithm.

Proof:

Let $M$ be the set of edges picked by Greedy cover. These edges form a matching: once an edge is picked, both of its endpoints are covered, so no edge picked later can share an endpoint with it.

Greedy cover adds both endpoints of every picked edge, so $|C| = 2|M|$. (In the worst case this really does double the optimal; consider a graph of disjoint edges.)

Any vertex cover, including the optimal one, must contain at least one endpoint of every edge in $M$, and since the edges of $M$ are pairwise disjoint these endpoints are distinct, so $c^*(l) \geq |M|$.

Thus the size of the vertex cover found by Greedy cover is at most twice the size of the optimal vertex cover: $|C| = 2|M| \leq 2c^*(l)$.

Thus, Greedy cover is a 2-approximation algorithm.

Min-cut:

Given a graph $G$ and two vertices $s$ and $t$, find the minimum cut separating $s$ and $t$.

Max-cut:

Given a graph $G$, find the maximum cut, i.e., a partition of the vertices into two sides that maximizes the number of edges crossing between the two sides.

Local cut

Algorithm:

  • Start with an arbitrary cut of $G$.
  • While some vertex can be moved from one side to the other so that the size of the cut increases, move it.
  • Return the cut found.
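A minimal Python sketch of local cut, assuming the graph is given as an adjacency list (a dict mapping each vertex to the set of its neighbors); names are illustrative, not from the lecture:

def local_cut(adj):
    # adj: dict mapping each vertex to the set of its neighbors
    side = {v: 0 for v in adj}  # arbitrary starting cut: every vertex on side 0
    improved = True
    while improved:
        improved = False
        for u in adj:
            crossing = sum(1 for w in adj[u] if side[w] != side[u])
            same = len(adj[u]) - crossing
            # Moving u flips which of its edges cross, so it helps iff same > crossing.
            if same > crossing:
                side[u] = 1 - side[u]
                improved = True
    return side  # side[v] says which part of the cut v ends up on

# Example: on a triangle, local_cut({1: {2, 3}, 2: {1, 3}, 3: {1, 2}})
# ends with one vertex alone on its side, a cut of size 2 (optimal here).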

We will prove its:

  • Runtime
  • Feasibility
  • Approximation ratio
Runtime for local cut

The size of any cut is at most $|E|$, and each move increases the size of the cut by at least 1, so the algorithm performs at most $|E|$ iterations.

In each iteration we check every vertex to see whether moving it would increase the cut, which takes $O(|V|^2)$ time (e.g., with an adjacency-matrix representation).

So the runtime is $O(|E||V|^2)$.

Feasibility for local cut

Any partition of the vertices into two sides is a valid cut, and the algorithm only terminates when no vertex can be moved to improve the cut.

Thus, the cut returned is a feasible (indeed, locally optimal) solution.

Approximation ratio for local cut

Claim: Local cut is a $\frac{1}{2}$-approximation algorithm.

We need to show that the size of the cut found is at least half of the size of the optimal cut.

First, the size of the optimal cut is at most $|E|$.

We will then show that the cut we find has size at least $\frac{|E|}{2}$ for any graph $G$, hence at least half of the optimal.

Proof:

When the algorithm terminates, no vertex can be moved to increase the cut.

Therefore, for every vertex, the number of its crossing edges is at least the number of its non-crossing edges.

Let $d(u)$ be the degree of vertex $u\in V$.

Then the number of crossing edges incident to $u$ is at least $\frac{1}{2}d(u)$.

Summing over all vertices, $\sum_{u\in V}(\text{crossing edges at }u) \geq \frac{1}{2}\sum_{u\in V}d(u) = |E|$. Each crossing edge is counted twice in this sum (once per endpoint), so the number of crossing edges is at least $\frac{|E|}{2}$.

Since the optimal cut has at most $|E|$ edges, the cut found is at least half the optimal.

EOP

Set cover

Problem:

You are collecting a set of magic cards.

$X$ is the set of all possible cards. You want at least one of each card.

Each dealer $j$ has a pack $S_j\subseteq X$ of cards. You have to buy dealer $j$'s entire pack or nothing.

Goal: What is the least number of packs you need to buy to get all cards?

Formally:

Input: $X$ is a universe of $n$ elements, and $Y=\{S_1, S_2, \ldots, S_m\}$ is a collection of subsets of $X$ (each $S_i \subseteq X$).

Goal: Pick $C\subseteq Y$ such that $\bigcup_{S_i\in C}S_i=X$ and $|C|$ is minimized.

Set cover is an NP-optimization problem. It is a generalization of the vertex cover problem: take the edges as the universe $X$ and, for each vertex $v$, the set of edges incident to $v$ as one of the $S_i$.

Greedy set cover

Algorithm:

  • Start with an empty collection $C$.
  • While some element of $X$ is uncovered, pick the set $S_i\in Y$ that covers the largest number of uncovered elements (as in the code below).
  • Add $S_i$ to $C$.
  • Return $C$.
def greedy_set_cover(X, Y):
    # X: iterable of elements (the universe)
    # Y: collection of sets, each a subset of X
    # Returns a list of sets from Y whose union is X (assuming a cover exists).
    non_covered = set(X)   # elements not yet covered
    C = []                 # sets picked so far
    # Each iteration covers at least one new element, so there are at most |X| iterations.
    while non_covered:
        # Greedy choice: the set covering the most uncovered elements.
        # One pass over Y; each intersection costs O(min(|X|, |S|)).
        best = max(Y, key=lambda S: len(non_covered & set(S)))
        if not non_covered & set(best):
            raise ValueError("X cannot be covered by the sets in Y")
        C.append(best)
        non_covered -= set(best)
    return C

It is not optimal.

Need to prove its:

  • Correctness:
    Keep picking until all elements are covered.
  • Runtime:
    $O(|X||Y|^2)$
  • Approximation ratio:
Approximation ratio for greedy set cover

Harmonic number:

$H_n=\sum_{i=1}^n\frac{1}{i}=\frac{1}{1}+\frac{1}{2}+\frac{1}{3}+\cdots+\frac{1}{n}=\Theta(\log n)$
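A quick numerical sanity check (not from the lecture) that $H_n$ grows like $\ln n$:

import math

def harmonic(n):
    # H_n = 1 + 1/2 + ... + 1/n
    return sum(1 / i for i in range(1, n + 1))

# H_n is roughly ln(n) + 0.577 (the Euler-Mascheroni constant), so H_n = Theta(log n).
for n in (10, 1000, 100000):
    print(n, harmonic(n), math.log(n))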

We claim that the size of the set cover found is at most $H_n = \Theta(\log n)$ times the size of the optimal set cover.

First bound:

Proof:

If the optimal solution picks $k$ sets, then the set cover found by greedy has at most $(1+\ln n)k$ sets.

Let $n=|X|$.

Observe the following. Let $U_i$ be the set of elements still uncovered after round $i$, and let $x_i$ be the number of new elements covered in round $i$.

Before the first round, every element is uncovered:

$|U_0|=n$

After the first round, the number of uncovered elements drops by $x_1=|S_1|$, the size of the set picked in the first round:

$|U_1|=|U_0|-|S_1|$

and in general $|U_i|=|U_{i-1}|-x_i$.

Claim: $x_i\geq \frac{|U_{i-1}|}{k}$.

We proceed by contradiction. The $k$ sets of the optimal solution together cover $U_{i-1}$. Suppose each of them covered fewer than $\frac{|U_{i-1}|}{k}$ elements of $U_{i-1}$; then together they would cover fewer than $|U_{i-1}|$ of them, a contradiction. So some set covers at least $\frac{|U_{i-1}|}{k}$ elements of $U_{i-1}$, and greedy, which picks the set covering the most uncovered elements, does at least as well.

Therefore $|U_i|\leq |U_{i-1}|\left(1-\frac{1}{k}\right)$, and by induction $|U_i|\leq n\left(1-\frac{1}{k}\right)^i$.

Some math magic: $\left(1-\frac{1}{k}\right)^k\leq \frac{1}{e}$

The round in which greedy picks its last set starts with at least one uncovered element, so $n\left(1-\frac{1}{k}\right)^{|C|-1}\geq 1$, which gives $|C|\leq 1+k\ln n$.

So the size of the set cover found is at most $(1+\ln n)k$.
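Spelling out the last step: combining $n\left(1-\frac{1}{k}\right)^{|C|-1}\geq 1$ with $\left(1-\frac{1}{k}\right)^k\leq \frac{1}{e}$ gives

$1 \leq n\left(1-\frac{1}{k}\right)^{|C|-1} \leq n\,e^{-(|C|-1)/k}$

so $\frac{|C|-1}{k}\leq \ln n$, i.e., $|C|\leq 1+k\ln n$.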

EOP

So the greedy set cover is not too bad...

Second bound:

Greedy set cover is an $H_d$-approximation algorithm for set cover, where $d$ is the largest cardinality of any set in the optimal cover (see below).

Proof:

Assign a cost to each element of $X$ according to the decisions of the greedy set cover.

Let $S^i$ denote the set picked at step $i$, and let $\delta(S^i)$ be the number of new elements it covers:

$\delta(S^i)=|S^i\cap U_{i-1}|=x_i$

If element $x$ is first covered at step $i$, when $S^i$ is picked, then we set the cost of $x$ to

$c(x)=\frac{1}{\delta(S^i)}=\frac{1}{x_i}$

Example:

$X=\{A,B,C,D,E,F,G\}$
$S_1=\{A,C,E\}$
$S_2=\{B,C,F,G\}$
$S_3=\{B,D,F,G\}$
$S_4=\{D,G\}$

First we select $S_2$; then $c(B)=c(C)=c(F)=c(G)=\frac{1}{4}$.

Then we select $S_1$; then $c(A)=c(E)=\frac{1}{2}$.

Then we select $S_3$; then $c(D)=1$.
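A small sketch (separate from the greedy_set_cover function above) that replays the greedy choices on this example and records the cost assigned to each element:

X = {"A", "B", "C", "D", "E", "F", "G"}
Y = [{"A", "C", "E"}, {"B", "C", "F", "G"}, {"B", "D", "F", "G"}, {"D", "G"}]

cost = {}
uncovered = set(X)
while uncovered:
    # Greedy choice: the set covering the most uncovered elements
    # (ties broken by position in Y).
    best = max(Y, key=lambda S: len(S & uncovered))
    newly = best & uncovered
    for x in newly:
        cost[x] = 1 / len(newly)   # c(x) = 1 / delta(S^i)
    uncovered -= newly

print(cost)  # B, C, F, G -> 0.25; A, E -> 0.5; D -> 1.0; the costs sum to 3 = |C|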

Recall: if element $x$ was first covered by the greedy set cover when set $S^i$ was added at step $i$, then $c(x)=\frac{1}{\delta(S^i)}$.

$\textup{Total cost of GSC}=\sum_{x\in X}c(x)=\sum_{i=1}^{|C|}\;\sum_{x\textup{ covered at iteration }i}c(x)=\sum_{i=1}^{|C|}\delta(S^i)\cdot\frac{1}{\delta(S^i)}=|C|$

Claim: Consider any set $S\subseteq X$ (in particular, any set in $Y$). The total cost paid by the greedy set cover for the elements of $S$ is at most $H_{|S|}$: $\sum_{x\in S}c(x)\leq H_{|S|}$.

Suppose that the greedy set cover covers the elements of $S$ in the order $x_1,x_2,\ldots,x_{|S|}$, where $\{x_1,x_2,\ldots,x_{|S|}\}=S$.

When GSC covers $x_j$, the elements $x_j,x_{j+1},\ldots,x_{|S|}$ are all still uncovered.

At this point, GSC has the option of picking $S$.

Picking $S$ would cover at least these $|S|-j+1$ elements, so at this step $\delta(S)\geq |S|-j+1$.

Instead, GSC picks the set $\hat{S}$ for which $\delta(\hat{S})$ is maximized ($\hat{S}$ may be $S$ itself or some other set); since $\hat{S}$ is the set whose selection covers $x_j$, we have $c(x_j)=\frac{1}{\delta(\hat{S})}$.

So $\delta(\hat{S})\geq \delta(S)\geq |S|-j+1$.

So the cost of $x_j$ is $\frac{1}{\delta(\hat{S})}\leq \frac{1}{\delta(S)}\leq \frac{1}{|S|-j+1}$.

Summing over all $j$, the cost of $S$ is at most $\sum_{j=1}^{|S|}\frac{1}{|S|-j+1}=H_{|S|}$.
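Checking the claim on the example above: for $S_2=\{B,C,F,G\}$ the costs sum to $\frac{1}{4}+\frac{1}{4}+\frac{1}{4}+\frac{1}{4}=1\leq H_4\approx 2.08$, and for $S_1=\{A,C,E\}$ they sum to $\frac{1}{4}+\frac{1}{2}+\frac{1}{2}=\frac{5}{4}\leq H_3\approx 1.83$.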

Back to the proof of approximation ratio:

Let $C^*$ be the optimal set cover.

$|C|=\sum_{x\in X}c(x)\leq \sum_{S_j\in C^*}\sum_{x\in S_j}c(x)$

The inequality holds because $C^*$ covers every element of $X$, so the right-hand side counts each $c(x)$ at least once (elements covered by more than one set of $C^*$ are counted multiple times).

By our claim, $\sum_{x\in S_j}c(x)\leq H_{|S_j|}$ for every $S_j$.

Let $d$ be the largest cardinality of any set in $C^*$.

$|C|\leq \sum_{S_j\in C^*}H_{|S_j|}\leq \sum_{S_j\in C^*}H_d=H_d|C^*|$

So the approximation ratio for greedy set cover is $H_d$.

EOP