PROGRAMMING BLOG BY RAJA GHOSH: ADVANTAGES AND DISADVANTAGES OF HASHING

Thursday, March 18, 2010

ADVANTAGES AND DISADVANTAGES OF HASHING

Advantages

The main advantage of hash tables over other table data structures is speed: lookups, insertions, and deletions take constant time on average. This advantage is more apparent when the number of entries is large (thousands or more). Hash tables are particularly efficient when the maximum number of entries can be predicted in advance, so that the bucket array can be allocated once with the optimum size and never resized.
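
As a rough illustration of allocating the bucket array once, here is a minimal separate-chaining table in Python. It assumes the expected number of entries is known up front and uses one bucket per expected entry (a load factor of about 1); the class name `FixedHashTable` and that sizing rule are illustrative assumptions, not something the post prescribes.

```python
class FixedHashTable:
    """Separate chaining with a bucket array sized once and never resized."""

    def __init__(self, expected_entries):
        # One bucket per expected entry (load factor about 1); never reallocated.
        self._buckets = [[] for _ in range(max(1, expected_entries))]

    def _bucket(self, key):
        # Built-in hash() gives an integer; modulo picks the bucket.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default


table = FixedHashTable(expected_entries=1000)
for n in range(1000):
    table.put(f"key{n}", n)
print(table.get("key42"))    # 42
print(table.get("missing"))  # None
```

Because the bucket count never changes, no insertion ever pays for a rehash; the trade-off is that chains grow if the estimate turns out too low.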

If the set of key-value pairs is fixed and known ahead of time (so insertions and deletions are not allowed), one may reduce the average lookup cost by a careful choice of the hash function, bucket table size, and internal data structures. In particular, one may be able to devise a hash function that is collision-free, or even perfect. In this case the keys need not be stored in the table.
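
One simple way to get a collision-free hash function for a fixed key set is to search a parameterized family until no two keys land in the same slot. The sketch below assumes a polynomial string hash with a tunable multiplier; the function names, the key set, the table size, and the search range are all hypothetical. The table stores only values, so a lookup for a key outside the original set silently returns an unrelated slot. That is why this trick only works when every query is known to be one of the stored keys.

```python
P = (1 << 61) - 1  # large prime keeping intermediate values bounded


def poly_hash(key, multiplier, size):
    """Polynomial string hash: reduce modulo a large prime, then modulo the table size."""
    h = 0
    for ch in key:
        h = (h * multiplier + ord(ch)) % P
    return h % size


def find_collision_free(keys, size, max_multiplier=10_000):
    """Return the first multiplier that sends every key to a distinct slot."""
    for multiplier in range(2, max_multiplier):
        slots = {poly_hash(k, multiplier, size) for k in keys}
        if len(slots) == len(keys):
            return multiplier
    raise ValueError("no collision-free multiplier found; try a larger size")


keys = ["apple", "banana", "cherry", "date", "elderberry"]
values = [1, 2, 3, 4, 5]
size = 8
multiplier = find_collision_free(keys, size)

# Only values are stored; the keys themselves are not kept in the table.
table = [None] * size
for k, v in zip(keys, values):
    table[poly_hash(k, multiplier, size)] = v

print(table[poly_hash("cherry", multiplier, size)])  # 3
```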

Disadvantages

Hash tables can be more difficult to implement than self-balancing binary search trees. Choosing an effective hash function for a specific application is more an art than a science. In open-addressed hash tables it is fairly easy to create a poor hash function.
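
To make "poor hash function" concrete, the sketch below counts linear-probing steps for a contrived key set in which every key is a multiple of the table size (an assumption chosen to expose the problem). With the identity function as the hash, every key starts at slot 0, so each insert walks past all previous ones. A multiplicative hash, here keeping the top 6 bits of a 32-bit product with the multiplier 2654435761 (a value close to 2^32 divided by the golden ratio, used in Knuth-style multiplicative hashing), spreads the same keys out. The helper names and the key set are illustrative only.

```python
def insert_linear_probing(table, key, hash_fn):
    """Insert key using linear probing; return how many slots were examined."""
    size = len(table)
    index = hash_fn(key) % size
    probes = 1
    while table[index] is not None:
        index = (index + 1) % size  # step to the next slot, wrapping around
        probes += 1
    table[index] = key
    return probes


def total_probes(keys, size, hash_fn):
    table = [None] * size
    return sum(insert_linear_probing(table, k, hash_fn) for k in keys)


def identity_hash(k):
    # Poor choice here: every key below is a multiple of the table size.
    return k


def multiplicative_hash(k):
    # Keep the top 6 bits of a 32-bit product, giving a slot in 0..63.
    return ((k * 2654435761) % (1 << 32)) >> 26


SIZE = 64
keys = [i * SIZE for i in range(32)]

print(total_probes(keys, SIZE, identity_hash))  # 528 (1 + 2 + ... + 32)
print(total_probes(keys, SIZE, multiplicative_hash))
```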

Although operations on a hash table take constant time on average, the cost of a good hash function can be significantly higher than the inner loop of the lookup algorithm for a sequential list or search tree. Thus hash tables are not effective when the number of entries is very small. (However, in some cases the high cost of computing the hash function can be mitigated by saving the hash value together with the key.)
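
Saving the hash value together with the key can be sketched as follows. Each entry remembers its key's hash, and a lookup compares the cheap integer hashes first, falling back to a full key comparison only when they match; the cached value can also be reused when entries are moved during a resize. The `Entry` class and `find` function are hypothetical names for this illustration. CPython's own `dict` stores each key's hash alongside the key in a similar way.

```python
class Entry:
    """A table entry that remembers its key's hash so it is computed only once."""

    __slots__ = ("hash", "key", "value")

    def __init__(self, key, value):
        self.hash = hash(key)
        self.key = key
        self.value = value


def find(bucket, key):
    """Scan a bucket, comparing cached hashes before the possibly costly key equality."""
    h = hash(key)
    for entry in bucket:
        # Integer comparison first; only equal hashes need a full key comparison.
        if entry.hash == h and entry.key == key:
            return entry.value
    return None


bucket = [Entry("alpha" * 1000, 1), Entry("beta" * 1000, 2)]
print(find(bucket, "beta" * 1000))  # 2
```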

For certain string processing applications, such as spell-checking, hash tables may be less efficient than tries, finite automata, or Judy arrays. Also, if each key is represented by a small enough number of bits, then, instead of a hash table, one may use the key directly as the index into an array of values. Note that there are no collisions in this case.
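
Using the key directly as an array index (direct addressing) might look like this, assuming keys are non-negative integers that fit in `key_bits` bits. The class name and parameter are illustrative. The array needs one slot for every possible key, 2^key_bits slots no matter how many keys are actually stored, which is why this only pays off when keys are small.

```python
class DirectAddressTable:
    """Array indexed directly by small non-negative integer keys: no hash function, no collisions."""

    def __init__(self, key_bits):
        # One slot for every possible key: 2**key_bits slots regardless of how many are used.
        self._slots = [None] * (1 << key_bits)

    def _check(self, key):
        # Reject keys outside 0 .. 2**key_bits - 1 (Python would otherwise accept negative indices).
        if not 0 <= key < len(self._slots):
            raise KeyError(key)

    def put(self, key, value):
        self._check(key)
        self._slots[key] = value

    def get(self, key):
        self._check(key)
        return self._slots[key]


table = DirectAddressTable(key_bits=8)  # keys 0..255
table.put(1, "ABC")
table.put(200, "XYZ")
print(table.get(1))  # ABC
print(table.get(2))  # None
```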

The entries stored in a hash table can be enumerated efficiently (at constant cost per entry), but only in some pseudo-random order. Therefore, there is no efficient way to locate an entry whose key is nearest to a given key. Listing all n entries in some specific order generally requires a separate sorting step, whose cost is proportional to log(n) per entry. In comparison, ordered search trees have lookup and insertion cost proportional to log(n), but allow finding the nearest key at about the same cost, and ordered enumeration of all entries at constant cost per entry.
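
The nearest-key problem can be sketched in Python with a sorted list and the standard `bisect` module standing in for an ordered search tree (the standard library has no balanced tree). Python's `dict` enumerates in insertion order rather than key order, so the keys still have to be sorted first at O(n log n) cost; after that, each nearest-key query is a binary search. The function name and sample data are illustrative.

```python
import bisect


def nearest_key(sorted_keys, target):
    """Return the key in sorted_keys closest to target (ties go to the smaller key)."""
    i = bisect.bisect_left(sorted_keys, target)
    if i == 0:
        return sorted_keys[0]
    if i == len(sorted_keys):
        return sorted_keys[-1]
    before, after = sorted_keys[i - 1], sorted_keys[i]
    return before if target - before <= after - target else after


table = {17: "q", 3: "c", 42: "z", 8: "h"}

# The hash table gives no key ordering, so sort the keys first: O(n log n).
sorted_keys = sorted(table)
print(sorted_keys)                   # [3, 8, 17, 42]
print(nearest_key(sorted_keys, 20))  # 17
print(nearest_key(sorted_keys, 30))  # 42
```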

If the keys are not stored (because the hash function is collision-free), there may be no easy way to enumerate the keys that are present in the table at any given moment.

Although the average cost per operation is constant and fairly small, the cost of a single operation may be quite high. In particular, if the hash table uses dynamic resizing, an insertion or deletion operation may occasionally take time proportional to the number of entries. This may be a serious drawback in real-time or interactive applications.
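
The occasional expensive insertion can be seen in a small chained table that doubles its bucket array whenever the load factor would exceed 1. Each `insert` returns how many existing entries it had to rehash. The starting size of 4, the doubling policy, and the class name are assumptions for this sketch, and `insert` assumes keys are new (it does no duplicate check).

```python
class ResizingTable:
    """Chained hash table that doubles its bucket array when the load factor would exceed 1."""

    def __init__(self):
        self._buckets = [[] for _ in range(4)]
        self._count = 0

    def insert(self, key, value):
        """Insert a new key (no duplicate check); return how many entries this call rehashed."""
        moved = 0
        if self._count + 1 > len(self._buckets):
            moved = self._resize(2 * len(self._buckets))
        self._buckets[hash(key) % len(self._buckets)].append((key, value))
        self._count += 1
        return moved

    def _resize(self, new_size):
        # Every existing entry must be rehashed into the new bucket array: O(n) work.
        old = self._buckets
        self._buckets = [[] for _ in range(new_size)]
        for bucket in old:
            for key, value in bucket:
                self._buckets[hash(key) % new_size].append((key, value))
        return self._count


table = ResizingTable()
costs = [table.insert(n, str(n)) for n in range(20)]
print(costs)  # [0, 0, 0, 0, 4, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 16, 0, 0, 0]
```

Most insertions move nothing, but the 5th, 9th, and 17th each rehash every entry already present. Averaged over all 20 insertions the cost stays small (28 moves in total), which is the amortized constant-time behavior; the individual spikes are what hurt real-time code.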

Hash tables in general exhibit poor locality of reference: the data to be accessed is distributed seemingly at random in memory. Because the resulting access patterns jump around, they can trigger microprocessor cache misses that cause long delays. Compact data structures such as arrays, searched with linear search, may be faster if the table is relatively small and the keys are short, such as integers or short strings. As cache sizes keep growing, what is considered "small" may be increasing. The optimal performance point varies from system to system.

Hash tables become quite inefficient when there are many collisions. While highly uneven hash distributions are extremely unlikely to arise by chance, a malicious adversary with knowledge of the hash function may be able to supply keys that all hash to the same few buckets, creating worst-case behavior and very poor performance (i.e., a denial-of-service attack). In critical applications, either universal hashing can be used or a data structure with better worst-case guarantees may be preferable.
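
Universal hashing can be sketched with the Carter-Wegman family h(x) = ((a*x + b) mod p) mod m, where a and b are drawn at random when the table is created. For any two distinct keys, the probability over that random choice that they collide is at most about 1/m, so an attacker who knows the family but not a and b cannot precompute a colliding key set. This sketch assumes integer keys in the range 0 to p - 1; the function name and the choice of the Mersenne prime 2^61 - 1 for p are assumptions.

```python
import random

P = (1 << 61) - 1  # Mersenne prime; keys are assumed to be integers in 0 .. P - 1


def make_universal_hash(num_buckets, rng=random):
    """Draw h(x) = ((a*x + b) mod P) mod num_buckets at random from the Carter-Wegman family."""
    a = rng.randrange(1, P)  # a must be nonzero
    b = rng.randrange(0, P)

    def h(x):
        return ((a * x + b) % P) % num_buckets

    return h


h = make_universal_hash(num_buckets=1024)
print(h(12345), h(67890))  # bucket indices depend on the random a and b
```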

15 comments:

Unknown, December 10, 2013 at 6:55 PM
Thanks for listing the advantages and disadvantages of hashing. This post gave me a helpful guidance about this security concept. I will visit again.
Unknown, February 17, 2014 at 3:08 PM
This was really easy to read and understand!
brata, September 12, 2014 at 7:09 AM
Thanks Wikipedia!
Unknown, June 6, 2015 at 6:45 PM
http://en.wikipedia.org/wiki/Hash_table#Advantages
Unknown, September 16, 2015 at 7:32 PM
copy and paste from wikipedia haha........!!!!!!!
https://en.wikipedia.org/wiki/Hash_table
Unknown, October 26, 2015 at 2:04 PM
THANKS WIKI.....
Anonymous, November 16, 2015 at 2:34 PM
i dont like your wallpaper
why no book titles!
WHAT ARE YOU READING!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Unknown, November 21, 2015 at 8:09 PM
Hi, I have a basic doubt regarding why do we use Hash Tables at all. As we know, that implementing hash tables costs us the overhead of hash function. However, if we use an array of structures to save our dictionary pair we will not have to waste CPU cycles for calculating the hash function. In an array of structures, the key(in the dictionary pair) shall serve as the index while performing a search. For e.g., I have to store the following data set:
(1, ABC) (2, CDE) (3, EFG) (4, GHI)
and if my structure is called "info" then we can perform a search in O(1) time using "info[1].data". We will also have the advantage of never having a collision.
So my question is why do we use Hash Tables at all?
Helth Related Talk, December 16, 2016 at 6:05 PM
Thank u so mach. .......
for this topic
Unknown, July 22, 2019 at 10:42 PM
Thanks a lot