A hash table is a data structure that provides very fast insertion and searching. No matter how many data items are insertion and searching can take close to constant time O(1).
One important concept is how a range of key values is transformed into a range of array index values. In a hash table, this is accomplished using hash function.
Key can be anything:
If keys are index numbers
In this case, we can try converting words into integers using ASCII numbers.
1. We can add the digit values?
For example: cats = c(=3) + a(=1) + t(=20) + s(=19) = 43.
Thus in the hash table, the word cat would be stored at 43rd index.
How well this would work?
Suppose we have 10 letters word zzzzzzzzzz, then the value it would obtain is 26*10 = 260 That means total range of word indexes is from 1 to 260. But we have a million words in the dictionary.
Hashing
We need to compress the huge range of numbers into a range that matches a reasonably sized array.
We can do it using mod function:
This would compress any number into (0 -
Collision We pay the price of squeezing the large numbers into a small range. Now multiple numbers can point to one index in the smaller range. This is called collision.
We have different methods to resolve collisions:
Open Addressing
Code for linear probing
One important concept is how a range of key values is transformed into a range of array index values. In a hash table, this is accomplished using hash function.
Key can be anything:
If keys are index numbers
Simple! store them in an array A[key] location.
What if keys are strings?In this case, we can try converting words into integers using ASCII numbers.
1. We can add the digit values?
For example: cats = c(=3) + a(=1) + t(=20) + s(=19) = 43.
Thus in the hash table, the word cat would be stored at 43rd index.
How well this would work?
Suppose we have 10 letters word zzzzzzzzzz, then the value it would obtain is 26*10 = 260 That means total range of word indexes is from 1 to 260. But we have a million words in the dictionary.
Hashing
We need to compress the huge range of numbers into a range that matches a reasonably sized array.
We can do it using mod function:
smallRangeIndex = largeRangeIndex % smallRangeMax;
This would compress any number into (0 -
smallRangeMax
) range.
Collision We pay the price of squeezing the large numbers into a small range. Now multiple numbers can point to one index in the smaller range. This is called collision.
We have different methods to resolve collisions:
Open Addressing
In open addressing, when a data item can't be placed at an index calculated by hash function (because the place is already occupied!) another location in array is sought.
There are three methods to explore the new location:
-
Linear Probing
When we have collision at ith index, increase the i by 1 and check (i+1)th location, if that is also occupied check at (i+2)th location and so on until you find an empty space.
-
Quadratic Probing
Instead of increasing i by 1 every time, increase in power of two:
First hit : increase by 1^2
Second hit : increase by 2^2
Third hit : increase by 3^3
and so on
Code for linear probing
int search(int key) { // calculate the hash value of the key int hash = hashfunction(key); while(hashtable[hash] != NULL){ // if we find a key at "hash" index, return the stored value if(hashtable[hash] == key) return hashtable[hash]; // otherwise, increase the hash linearly hash++; } // return -1 if we do not find any such key in the hashtable. return -1; } void insert(int key) { // calculate hash value of the key int hash = hashfunction(key); // if the location is already occupied increase the hash while(hashtable[hash] != NULL) { hash++; } hashtable[hash] = key; }
No comments:
Post a Comment