Lecture 9
Randomized Algorithms
Hashing
Hashing with chaining:
Input: We have integers drawn from a universe U = {0, 1, ..., u-1}. We want to map them to a hash table with m slots, where typically m is much smaller than u.
Hash function: h : U → {0, 1, ..., m-1}.
Goal: Hash a set S ⊆ U of n keys into the table such that the number of elements in each slot is at most O(1 + n/m).
Collisions
When multiple keys are mapped to the same slot, we call it a collision. To handle collisions, we keep a linked list (a chain) of all the keys that map to the same slot.
Runtime of insert, query, delete: O(1 + length of the chain scanned).
Worst-case runtime of insert, query, delete: Θ(n), when all n elements land in the same slot.
Therefore, we want chains to be short, i.e., of length O(1 + n/m), which is O(1) as long as m is reasonably sized (m = Ω(n)). Equivalently, we want the keys in any set S to hash roughly uniformly across all m slots.
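The chaining scheme above can be sketched in a few lines of Python. This is a minimal illustration: the class name and the x mod m hash are choices made here, not fixed by the notes (the hash is exactly the "simple" function that the adversary discussion below defeats).

```python
class ChainedHashTable:
    """Hash table with chaining: one list ("chain") per slot."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]  # chain per slot

    def _h(self, x):
        return x % self.m  # simple fixed hash; randomized later in the notes

    def insert(self, x):
        chain = self.slots[self._h(x)]
        if x not in chain:          # scan the chain: O(1 + chain length)
            chain.append(x)

    def query(self, x):
        return x in self.slots[self._h(x)]

    def delete(self, x):
        chain = self.slots[self._h(x)]
        if x in chain:
            chain.remove(x)
```

Every operation first hashes to a slot in O(1), then scans that slot's chain, so its cost is O(1 + length of the chain).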
Simple Uniform Hashing Assumptions
The elements we want to hash (the set S) are picked uniformly at random from U. Under this assumption, even a very simple hash function works fine: h(x) = x mod m.
Question: What happens if an adversary knows this function and designs S to make the worst-case runtime happen?
Answer: The adversary can make the runtime of each operation Θ(n) by simply picking elements that all hash to the same slot, e.g., S = {0, m, 2m, 3m, ...} when h(x) = x mod m.
Randomization to the rescue
We don't want the adversary to know the hash function based on just looking at the code.
Idea: Randomize the choice of the hash function.
Randomized Algorithm
Definition
A randomized algorithm is an algorithm that makes internal random choices.
2 kinds of randomized algorithms:
- Las Vegas: The runtime is random, but the output is always correct.
- Monte Carlo: The runtime is fixed, but the output is sometimes incorrect.
We will focus on Las Vegas algorithms in this course.
For Las Vegas algorithms, we analyze the expected runtime, or some other probabilistic quantity.
Randomization can help
Idea: Randomize the choice of hash function by drawing it from a family of hash functions, H.
If we randomly pick a hash function from this family, then the probability that the hash function is bad on any particular set S is small.
Intuitively, the adversary cannot pick a bad input, since most hash functions in H are good for any particular input set S.
Universal Hashing: Goal
We want to design a universal family of hash functions, H, such that for any input set S, the probability (over the random choice of h ∈ H) that the hash table behaves badly is small.
Universal Hashing: Definition
Suppose we have m buckets in the hash table, and two distinct inputs x and y. We want x and y to be unlikely to hash to the same bucket.
H is a universal family of hash functions if for any two elements x ≠ y, Pr[h(x) = h(y)] ≤ 1/m,
where h is picked uniformly at random from the family H.
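One standard universal family (not stated in the notes, but classical, due to Carter and Wegman) is h_{a,b}(x) = ((a·x + b) mod p) mod m, where p is a prime at least as large as the universe size u, a ∈ {1, ..., p−1}, and b ∈ {0, ..., p−1}. A sketch of sampling from this family; the hard-coded prime is an illustrative choice:

```python
import random

# Prime for the Carter–Wegman family; valid for any universe of size u <= P.
P = 2_147_483_647  # the Mersenne prime 2^31 - 1

def make_universal_hash(m, p=P):
    """Sample h(x) = ((a*x + b) mod p) mod m uniformly from the family."""
    a = random.randrange(1, p)   # a in {1, ..., p-1}
    b = random.randrange(0, p)   # b in {0, ..., p-1}
    return lambda x: ((a * x + b) % p) % m
```

Once sampled, h is a fixed deterministic function mapping the universe into {0, ..., m−1}; the randomness is only in the choice of (a, b).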
Universal Hashing: Analysis
Claim: If we choose h randomly from a universal family of hash functions, H, then the hash table will exhibit good behavior on any set S of size n with high probability.
Question: What are some good properties, and what does "with high probability" mean?
Claim: Given a universal family of hash functions H and a set S with |S| = n: for any probability δ > 0, if m ≥ n²/(2δ), the chance that no two keys hash to the same slot is at least 1 − δ.
Example: If we pick δ = 1/2, then as long as m ≥ n², the chance that no two keys hash to the same slot is at least 1/2.
If we pick δ = 1/100, then as long as m ≥ 50n², the chance that no two keys hash to the same slot is at least 99/100.
Proof Strategy:
- Compute the expected number of collisions. Note that a collision occurs when two different keys are hashed to the same slot. (Indicator random variables)
- Apply a "tail" bound that converts the expected value into a probability. (Markov's inequality)
Compute the expected number of collisions
Let m be the size of the hash table, n the number of keys in the set S, and u the size of the universe.
For inputs x ≠ y, we define a random variable X_{xy} = 1 if h(x) = h(y), and X_{xy} = 0 otherwise.
X_{xy} is called an indicator random variable: it only takes the values 0 or 1, so E[X_{xy}] = Pr[h(x) = h(y)] ≤ 1/m.
The expected number of collisions is then E[Σ X_{xy}], summed over all unordered pairs of distinct keys x, y ∈ S.
Define C_x: the random variable that represents the cost of inserting/searching/deleting x in the hash table.
C_x = total number of elements that collide with x (= number of elements y ∈ S, y ≠ x, such that h(y) = h(x)).
So, C_x = Σ_{y ∈ S, y ≠ x} X_{xy}.
By linearity of expectation, E[C_x] = Σ_{y ≠ x} E[X_{xy}] = Σ_{y ≠ x} Pr[h(x) = h(y)] ≤ (n − 1)/m.
This is O(1) if m ≥ n. So the total expected cost of n insert/search operations is O(n), by linearity of expectation.
Say C is the total number of collisions, i.e., the number of unordered pairs {x, y} of distinct keys with h(x) = h(y).
C = (1/2) Σ_{x ∈ S} C_x, because each collision is counted twice in the sum. Hence E[C] ≤ (1/2)·n·(n − 1)/m ≤ n²/(2m).
If we want E[C] ≤ δ, then we need m ≥ n²/(2δ).
The probability of no collisions
We now know that the expected number of collisions is at most n²/(2m), but what about the probability of NO collisions?
Markov's inequality: For a non-negative random variable X and any a > 0, Pr[X ≥ a] ≤ E[X]/a.
Apply this to C with a = 1: Pr[at least one collision] = Pr[C ≥ 1] ≤ E[C] ≤ n²/(2m).
So, if we set m ≥ n²/(2δ), then with probability at least 1 − δ you will have no collisions.
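The bound above can be checked empirically. The sketch below hashes n random keys into m slots with a random (a·x + b mod p) mod m function and estimates the no-collision frequency; the function name, prime, trial count, and seed are illustrative choices:

```python
import random

def no_collision_frequency(n, m, trials=200, seed=0):
    """Estimate Pr[no two of n random keys collide] over a random
    choice of hash function from the (a*x + b mod p) mod m family."""
    rng = random.Random(seed)
    p = 2_147_483_647  # prime >= universe size
    ok = 0
    for _ in range(trials):
        a, b = rng.randrange(1, p), rng.randrange(0, p)
        keys = rng.sample(range(p), n)  # n distinct random keys
        slots = {((a * x + b) % p) % m for x in keys}
        ok += (len(slots) == n)         # no collision iff all slots distinct
    return ok / trials
```

With n = 20 and m = n² = 400, the claim guarantees no collisions with probability at least 1/2; the observed frequency is typically even higher, since Markov's inequality is not tight here. With m = n, by contrast, a collision is almost certain (the birthday effect).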
More general conclusion
Claim: For a universal hash function family H, if the number of keys n ≤ m, then the probability that at most O(√n) keys hash to any single slot is at least 1/2.
Example: Quicksort
Based on partitioning [assume all elements are distinct]: Partition(A, p, r)
- Rearranges A[p..r] in place around the pivot x = A[r], so that elements smaller than x come first, then x, then elements larger than x.
Runtime: Θ(r − p), linear time.
def partition(A, p, r):
    x = A[r]                     # pivot: last element
    lo = p                       # boundary of the "smaller than pivot" region
    for i in range(p, r):
        if A[i] < x:
            A[lo], A[i] = A[i], A[lo]
            lo += 1
    A[lo], A[r] = A[r], A[lo]    # place pivot in its final position
    return lo

def quicksort(A, p, r):
    if p < r:
        q = partition(A, p, r)
        quicksort(A, p, q - 1)
        quicksort(A, q + 1, r)
Runtime analysis
Let the number of elements in A[p..r] be n.
Under an even-split assumption, T(n) = 2T(n/2) + Θ(n), so T(n) = Θ(n log n).
This is approximately the same recurrence as merge sort's.
But average-case analysis is always suspicious: it assumes a particular distribution over inputs, and real (or adversarial) inputs need not follow it.
Randomized Quicksort
- Pick a random pivot element.
- Analyze the expected runtime E[T(n)], where the expectation is over the random choices of pivot.
import random

def randomized_partition(A, p, r):
    ix = random.randint(p, r)    # pivot index chosen uniformly at random
    x = A[ix]
    A[r], A[ix] = A[ix], A[r]    # move pivot to the end, then partition as before
    lo = p
    for i in range(p, r):
        if A[i] < x:
            A[lo], A[i] = A[i], A[lo]
            lo += 1
    A[lo], A[r] = A[r], A[lo]
    return lo

def randomized_quicksort(A, p, r):
    if p < r:
        q = randomized_partition(A, p, r)
        randomized_quicksort(A, p, q - 1)
        randomized_quicksort(A, q + 1, r)
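As a quick sanity check, the two functions can be run end to end. They are repeated below (with the `import random` they rely on) so the snippet is self-contained; note that the output is always correctly sorted regardless of the random pivot choices — only the runtime is random (Las Vegas).

```python
import random

def randomized_partition(A, p, r):
    ix = random.randint(p, r)    # pivot index chosen uniformly at random
    x = A[ix]
    A[r], A[ix] = A[ix], A[r]    # move pivot to the end
    lo = p
    for i in range(p, r):
        if A[i] < x:
            A[lo], A[i] = A[i], A[lo]
            lo += 1
    A[lo], A[r] = A[r], A[lo]
    return lo

def randomized_quicksort(A, p, r):
    if p < r:
        q = randomized_partition(A, p, r)
        randomized_quicksort(A, p, q - 1)
        randomized_quicksort(A, q + 1, r)

A = [5, 2, 9, 1, 7, 3, 8]
randomized_quicksort(A, 0, len(A) - 1)
# A is now [1, 2, 3, 5, 7, 8, 9], whatever pivots were chosen
```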
Since each of the n elements is equally likely to be chosen as the pivot, E[T(n)] = (1/n) Σ_{k=0}^{n−1} (E[T(k)] + E[T(n − 1 − k)]) + Θ(n), by linearity of expectation.
So, E[T(n)] = (2/n) Σ_{k=0}^{n−1} E[T(k)] + Θ(n).
Claim: the solution to this recurrence is E[T(n)] = O(n log n), i.e., E[T(n)] ≤ c·n·ln n for some constant c > 0 and all n ≥ 2.
Proof:
We prove this by induction on n.
Base case: E[T(2)] = O(1) ≤ c·2·ln 2, once c is chosen large enough.
Inductive step: Assume that E[T(k)] ≤ c·k·ln k for all 2 ≤ k < n.
Then, E[T(n)] ≤ (2/n) Σ_{k=2}^{n−1} c·k·ln k + a·n, where a·n absorbs the Θ(n) partitioning cost and the O(1) cost of subproblems of size at most 1.
Then we use the fact that Σ_{k=1}^{n−1} k·ln k ≤ (n²/2)·ln n − n²/8 (which can be proved by induction, or by comparing the sum with the integral of x·ln x).
We need to prove that E[T(n)] ≤ c·n·ln n. Substituting the bound above,
E[T(n)] ≤ (2c/n)·((n²/2)·ln n − n²/8) + a·n = c·n·ln n − (c/4)·n + a·n.
Choose c such that c ≥ 4a.
Then (c/4)·n ≥ a·n, so E[T(n)] ≤ c·n·ln n.
EOP
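The sum bound used in the induction step can be checked numerically. This is a sanity check over a finite range, not a proof; the function name is a choice made here:

```python
import math

def check_klogk_bound(n):
    """Check sum_{k=1}^{n-1} k*ln(k) <= (n^2/2)*ln(n) - n^2/8 for one n."""
    lhs = sum(k * math.log(k) for k in range(1, n))
    rhs = (n * n / 2) * math.log(n) - n * n / 8
    return lhs <= rhs
```

Since k·ln k is increasing, the sum is at most the integral of x·ln x from 1 to n, which equals (n²/2)·ln n − n²/4 + 1/4 ≤ (n²/2)·ln n − n²/8 for n ≥ 2, so the check should pass for every n ≥ 2.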
A more elegant proof:
Let X_{ij} be an indicator random variable that is 1 if the element of rank i is ever compared to the element of rank j, and 0 otherwise.
Running time: O(n + total number of comparisons), so it suffices to bound the expected number of comparisons.
By linearity of expectation, the expected number of comparisons is E[Σ_{i<j} X_{ij}] = Σ_{i<j} Pr[X_{ij} = 1].
For any two elements of ranks i < j, the probability that they are compared is 2/(j − i + 1): they are compared if and only if one of them is the first element chosen as a pivot among the j − i + 1 elements whose ranks lie between i and j (inclusive).
So, with the harmonic number H_n = Σ_{k=1}^{n} 1/k = Θ(log n),
Σ_{i<j} 2/(j − i + 1) ≤ Σ_{i=1}^{n} Σ_{d=1}^{n} 2/(d + 1) ≤ 2n·H_n = O(n log n).
EOP
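The 2n·H_n bound can also be checked empirically. The sketch below counts pivot comparisons in an out-of-place variant of randomized quicksort; the variant, the seeding, and charging len(xs) − 1 comparisons per call are modeling choices made here, matching how the analysis counts comparisons:

```python
import random

def count_comparisons(n, seed=0):
    """Count pivot comparisons made by an out-of-place randomized
    quicksort on a random permutation of 0..n-1."""
    rng = random.Random(seed)
    A = list(range(n))
    rng.shuffle(A)
    count = 0

    def qs(xs):
        nonlocal count
        if len(xs) <= 1:
            return xs
        pivot = xs[rng.randrange(len(xs))]
        count += len(xs) - 1                 # pivot vs. every other element
        less = [v for v in xs if v < pivot]
        more = [v for v in xs if v > pivot]
        return qs(less) + [pivot] + qs(more)

    assert qs(A) == sorted(A)                # output always correct (Las Vegas)
    return count
```

Averaged over a few random permutations, the count stays below 2n·H_n, and any single run uses at least n − 1 comparisons (the first call alone makes that many).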