Geek Meal: Hash Tables

A hash table is a data structure that provides very fast insertion and searching. No matter how many data items are insertion and searching can take close to constant time O(1).
One important concept is how a range of key values is transformed into a range of array index values. In a hash table, this is accomplished using hash function.

Key can be anything:

If keys are index numbers

Simple! store them in an array A[key] location.

What if keys are strings?
In this case, we can try converting words into integers using ASCII numbers.
1. We can add the digit values?
For example: cats = c(=3) + a(=1) + t(=20) + s(=19) = 43.
Thus in the hash table, the word cat would be stored at 43rd index.

How well this would work?
Suppose we have 10 letters word zzzzzzzzzz, then the value it would obtain is 26*10 = 260 That means total range of word indexes is from 1 to 260. But we have a million words in the dictionary.

Hashing

We need to compress the huge range of numbers into a range that matches a reasonably sized array.
We can do it using mod function:

smallRangeIndex = largeRangeIndex % smallRangeMax;

This would compress any number into (0 - smallRangeMax) range.

Collision We pay the price of squeezing the large numbers into a small range. Now multiple numbers can point to one index in the smaller range. This is called collision.

We have different methods to resolve collisions:

Open Addressing

In open addressing, when a data item can't be placed at an index calculated by hash function (because the place is already occupied!) another location in array is sought.
There are three methods to explore the new location:

Linear Probing

When we have collision at ith index, increase the i by 1 and check (i+1)th location, if that is also occupied check at (i+2)th location and so on until you find an empty space.
Quadratic Probing

Instead of increasing i by 1 every time, increase in power of two:
First hit : increase by 1^2
Second hit : increase by 2^2
Third hit : increase by 3^3
and so on

Code for linear probing

int search(int key)
{
    // calculate the hash value of the key
    int hash = hashfunction(key);
    while(hashtable[hash] != NULL){
 // if we find a key at "hash" index, return the stored value
 if(hashtable[hash] == key)
     return hashtable[hash];
 
 // otherwise, increase the hash linearly 
 hash++;
    }
    
    // return -1 if we do not find any such key in the hashtable.
    return -1;
}

void insert(int key)
{
    // calculate hash value of the key
    int hash = hashfunction(key);
    
    // if the location is already occupied increase the hash
    while(hashtable[hash] != NULL) {
 hash++;
    }
    
    hashtable[hash] = key;
}

Geek Meal

Tuesday, February 19, 2013

Hash Tables

No comments:

Post a Comment