Lecture 8
NP-optimization problem
Cannot be solved in polynomial time.
Example:
- Maximum independent set
- Minimum vertex cover
What can we do?
- solve small instances
- hard instances are rare - average case analysis
- solve special cases
- find an approximate solution
Approximation algorithms
We find a "good" solution in polynomial time, but may not be optimal.
Example:
- Minimum vertex cover: we will find a small vertex cover, but not necessarily the smallest one.
- Maximum independent set: we will find a large independent set, but not necessarily the largest one.
Question: How do we quantify the quality of the solution?
Approximation ratio
Intuition:
How good is an algorithm compared to an optimal solution in the worst case?
Definition:
Consider algorithm for an NP-optimization problem . Say for any instance , finds a solution output and the optimal solution is .
Approximation ratio is either:
for maximization problems, or
for minimization problems.
Example:
Alice's Algorithm, , finds a vertex cover of size for instance . The optimal vertex cover has size .
We want approximation ratio to be as close to 1 as possible.
Vertex cover:
A vertex cover is a set of vertices that touches all edges.
Let's try an approximation algorithm for the vertex cover problem, called Greedy cover.
Greedy cover
Pick any uncovered edge, both its endpoints are added to the cover , until all edges are covered.
Runtime:
Claim: Greedy cover is correct, and it finds a vertex cover.
Proof:
Algorithm only terminates when all edges are covered.
Claim: Greedy cover is a 2-approximation algorithm.
Proof:
Look at the two edges we picked.
Either it is covered by Greedy cover, or it is not.
If it is not covered by Greedy cover, then we will add both endpoints to the cover.
In worst case, Greedy cover will add both endpoints of each edge to the cover. (Consider the graph with disjoint edges.)
Thus, the size of the vertex cover found by Greedy cover is at most twice the size of the optimal vertex cover.
Thus, Greedy cover is a 2-approximation algorithm.
Min-cut:
Given a graph and two vertices and , find the minimum cut between and .
Max-cut:
Given a graph , find the maximum cut.
Local cut
Algorithm:
- start with an arbitrary cut of .
- While you can move a vertex from one side to the other side while increasing the size of the cut, do so.
- Return the cut found.
We will prove its:
- Runtime
- Feasibility
- Approximation ratio
Runtime for local cut
Since size of cut is at most , the runtime is .
When we move a vertex from one side to the other side, the size of the cut increases by at least 1.
Thus, the algorithm terminates in at most steps.
So the runtime is .
Feasibility for local cut
The algorithm only terminates when no more vertices can be moved.
Thus, the cut found is a feasible solution.
Approximation ratio for local cut
This is a half-approximation algorithm.
We need to show that the size of the cut found is at least half of the size of the optimal cut.
We could first upper bound the size of the optimal cut is at most .
We will then prove that solution we found is at least half of the optimal cut for any graph .
Proof:
When we terminate, no vertex could be moved
Therefore, The number of crossing edges is at least the number of non-crossing edges.
Let be the degree of vertex .
The total number of crossing edges for vertex is at least .
Summing over all vertices, the total number of crossing edges is at least .
So the total number of non-crossing edges is at most .
EOP
Set cover
Problem:
You are collecting a set of magic cards.
is the set of all possible cards. You want at least one of each card.
Each dealer has a pack of cards. You have to buy entire pack or none from dealer .
Goal: What is the least number of packs you need to buy to get all cards?
Formally:
Input is a universe of elements, and a collection of subsets of , .
Goal: Pick such that , and is minimized.
Set cover is an NP-optimization problem. It is a generalization of the vertex cover problem.
Greedy set cover
Algorithm:
- Start with empty set .
- While there is an element in that is not covered, pick one such element where is the set that has not been picked before.
- Add to .
- Return .
def greedy_set_cover(X, Y):
# X is the set of elements
# Y is the collection of sets, hashset by default
C = []
def non_covered_elements(X, C):
# return the elements in X that are not covered by C
# O(|X|)
return [x for x in X if not any(x in c for c in C)]
non_covered = non_covered_elements(X, C)
# O(|X|) every loop reduce the size of non_covered by 1
while non_covered:
max_cover,max_set = 0,None
# O(|Y|)
for S in Y:
# Intersection of two sets is O(min(|X|,|S|))
cur_cover = len(set(non_covered) & set(S))
if cur_cover > max_cover:
max_cover,max_set = cur_cover,S
C.append(max_set)
non_covered = non_covered_elements(X, C)
return C
It is not optimal.
Need to prove its:
- Correctness:
Keep picking until all elements are covered. - Runtime:
- Approximation ratio:
Approximation ratio for greedy set cover
Harmonic number:
We claim that the size of the set cover found is at most times the size of the optimal set cover.
First bound:
Proof:
If the optimal picks sets, then the size of the set cover found is at most sets.
Let .
Observe that
For the first round, the elements that we not covered is .
In the second round, the elements that we not covered is at most where is the number of elements in the set picked in the first round.
...
So .
We proceed by contradiction.
Suppose all sets in the optimal solution are . Then the sum of the sizes of the sets in the optimal solution is .
There exists a least ratio of selection of sets determined by . Otherwise the function (selecting the set cover) will not terminate (no such sets exists)
Some math magics:
So , .
So the size of the set cover found is at most .
EOP
So the greedy set cover is not too bad...
Second bound:
Greedy set cover is a -approximation algorithm of set cover.
Proof:
Assign a cost to the elements of according to the decisions of the greedy set cover.
Let be the new number of elements covered by set .
If the element is added by step , when set is picked, then the cost of to
Example:
First we select , then .
Then we select , then .
Then we select , then .
If element was covered by greedy set cover due to the addition of set at step , then the cost of is .
Claim: Consider any set that is a subset of . The cost paid by the greedy set cover for is at most .
Suppose that the greedy set covers in order , where .
When GSC covers , are not covered.
At this point, the GSC has the option of picking
This implies that the is at least .
Assume that is picked for which is maximized ( may be or other sets that have not covered ).
So, .
So the cost of is .
Summing over all , the cost of is at most .
Back to the proof of approximation ratio:
Let be optimal set cover.
This inequality holds because of counting element that is covered by more than one set.
Since , by our claim.
Let be the largest cardinality of any set in .
So the approximation ratio for greedy set cover is .
EOP