Thursday, March 18, 2010

ADVANTAGES AND DISADVANTAGES OF HASHING

Advantages

The main advantage of hash tables over other table data structures is speed. This advantage is more apparent when the number of entries is large (thousands or more). Hash tables are particularly efficient when the maximum number of entries can be predicted in advance, so that the bucket array can be allocated once with the optimum size and never resized.
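Here is a minimal Python sketch of pre-sizing. The post contains no code, so the language, the class name FixedChainedTable and the 0.75 load factor are illustrative assumptions, not a standard API. The table allocates its bucket array once from the expected number of entries and handles collisions by separate chaining:

```python
class FixedChainedTable:
    """Separate-chaining hash table whose bucket array is allocated once."""

    def __init__(self, expected_entries, load_factor=0.75):
        # Size the bucket array from the predicted entry count, so that it
        # stays at or below the target load factor and never needs resizing.
        n_buckets = max(1, int(expected_entries / load_factor) + 1)
        self._buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # Map the key's hash to one bucket (Python's % is non-negative here)
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def bucket_count(self):
        return len(self._buckets)


table = FixedChainedTable(expected_entries=1000)
for i in range(1000):
    table.put(f"key{i}", i)
print(table.get("key42"))     # 42
print(table.bucket_count())   # 1334
```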

If the set of key-value pairs is fixed and known ahead of time (so insertions and deletions are not allowed), one may reduce the average lookup cost by a careful choice of the hash function, bucket table size, and internal data structures. In particular, one may be able to devise a hash function that is collision-free, or even perfect. In this case the keys need not be stored in the table, provided every lookup is for a key known to be in the set.
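Here is a toy collision-free function for a fixed key set. It runs a brute-force search over a multiplier a in slot(k) = (a * code(k)) mod P mod n. The helper names, the prime P and the polynomial string code are assumptions made for this sketch. Practical perfect-hash generators use more scalable constructions.

```python
P = 2_147_483_647  # the prime 2**31 - 1

def string_code(key):
    # Deterministic polynomial code for a string (unlike Python's built-in
    # hash() for str, which is randomized per process)
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % P
    return h

def perfect_slot(key, a, table_size):
    return (a * string_code(key)) % P % table_size

def find_perfect_multiplier(keys, table_size, max_tries=100_000):
    """Brute-force search for a multiplier `a` that sends every key in the
    fixed set `keys` to a different slot; returns None if none is found."""
    for a in range(1, max_tries):
        slots = {perfect_slot(k, a, table_size) for k in keys}
        if len(slots) == len(keys):
            return a
    return None


keys = ["apple", "banana", "cherry", "date", "elderberry"]
a = find_perfect_multiplier(keys, table_size=len(keys))

# Build the table: only values are stored, never the keys themselves.
values = [None] * len(keys)
for k in keys:
    values[perfect_slot(k, a, len(keys))] = len(k)

def lookup(key):
    # Valid only for keys known to be in `keys`; a foreign key would
    # silently land on some other key's slot.
    return values[perfect_slot(key, a, len(keys))]

print(lookup("banana"))  # 6
```

The table size here equals the number of keys, so the function found is also minimal (no empty slots). A blind search like this is only practical for tiny key sets.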

Disadvantages

Hash tables can be more difficult to implement than self-balancing binary search trees. Choosing an effective hash function for a specific application is more an art than a science. Open-addressed hash tables are especially sensitive to this choice: it is fairly easy to end up with a poor hash function that makes keys cluster together and slows lookups down.
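A small experiment shows how a poor hash function hurts open addressing. This sketch assumes linear probing, which is one of several open-addressing schemes. It compares k % 16 with a multiplicative ("Fibonacci") hash on keys that are all multiples of 16. The function names are made up for the example.

```python
def total_probes(keys, table_size, hash_fn):
    """Insert distinct keys with linear probing; return the total number
    of slots examined. Requires len(keys) < table_size."""
    assert len(keys) < table_size
    table = [None] * table_size
    probes = 0
    for key in keys:
        i = hash_fn(key) % table_size
        while True:
            probes += 1
            if table[i] is None:
                table[i] = key
                break
            i = (i + 1) % table_size  # occupied: try the next slot
    return probes

def modulo_hash(k):
    # Poor choice when keys share a factor with the table size
    return k % 16

def fibonacci_hash(k):
    # Multiplicative hashing: multiply by about 2**32 / golden ratio,
    # keep the low 32 bits, then take the top 4 of them (16 slots)
    return ((k * 2654435769) & 0xFFFFFFFF) >> 28


keys = [16 * i for i in range(10)]   # 0, 16, 32, ..., 144
print(total_probes(keys, 16, modulo_hash))     # 55: every key starts at slot 0
print(total_probes(keys, 16, fibonacci_hash))
```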

Although operations on a hash table take constant time on average, the cost of a good hash function can be significantly higher than the inner loop of the lookup algorithm for a sequential list or search tree. Thus hash tables are not effective when the number of entries is very small. (However, in some cases the high cost of computing the hash function can be mitigated by saving the hash value together with the key.)
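The idea of caching the hash value can be sketched as follows. CountingKey and CachedHashTable are hypothetical names. The key class counts calls to __hash__ so that the savings are visible. The cached hash is used twice: as a cheap pre-check before a full key comparison, and to redistribute entries on resize without rehashing any key. CPython's own dict also stores each entry's hash next to its key.

```python
class CountingKey:
    """String key that counts how often its (possibly expensive) hash is computed."""
    hash_calls = 0

    def __init__(self, text):
        self.text = text

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.text)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.text == other.text


class CachedHashTable:
    """Chained hash table that stores each key's hash value next to the key."""

    def __init__(self, n_buckets=8):
        self._buckets = [[] for _ in range(n_buckets)]
        self._size = 0

    def put(self, key, value):
        h = hash(key)  # computed once per operation
        bucket = self._buckets[h % len(self._buckets)]
        for i, (eh, ek, _) in enumerate(bucket):
            # Cheap integer comparison first; compare full keys only on a hash match
            if eh == h and ek == key:
                bucket[i] = (h, key, value)
                return
        bucket.append((h, key, value))
        self._size += 1
        if self._size > 2 * len(self._buckets):
            self._grow()

    def get(self, key, default=None):
        h = hash(key)
        for eh, ek, v in self._buckets[h % len(self._buckets)]:
            if eh == h and ek == key:
                return v
        return default

    def _grow(self):
        # Redistribute entries into twice as many buckets using the
        # cached hash values; hash() is never called on the keys again.
        old = self._buckets
        self._buckets = [[] for _ in range(2 * len(old))]
        for bucket in old:
            for entry in bucket:
                self._buckets[entry[0] % len(self._buckets)].append(entry)


table = CachedHashTable()
for i in range(100):
    table.put(CountingKey(f"k{i}"), i)
print(CountingKey.hash_calls)          # 100, despite several resizes
print(table.get(CountingKey("k5")))    # 5
```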

For certain string processing applications, such as spell-checking, hash tables may be less efficient than tries, finite automata, or Judy arrays. Also, if each key is represented by a small enough number of bits, then, instead of a hash table, one may use the key directly as the index into an array of values. Note that there are no collisions in this case.
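Direct addressing, where the key itself is the array index, takes only a few lines. This sketch assumes non-negative integer keys of at most key_bits bits, and the class name is illustrative. Memory is proportional to the number of possible keys rather than the number stored. That is why the technique only pays off for small key spaces.

```python
_EMPTY = object()  # sentinel marking an unused slot

class DirectAddressTable:
    """Array indexed directly by keys in range(2**key_bits): no hash, no collisions."""

    def __init__(self, key_bits):
        # One slot for every possible key, so memory grows as 2**key_bits
        self._slots = [_EMPTY] * (1 << key_bits)

    def put(self, key, value):
        self._slots[key] = value  # the key itself is the array index

    def get(self, key, default=None):
        value = self._slots[key]
        return default if value is _EMPTY else value

    def capacity(self):
        return len(self._slots)


info = DirectAddressTable(key_bits=8)   # keys 0..255
for key, data in [(1, "ABC"), (2, "CDE"), (3, "EFG"), (4, "GHI")]:
    info.put(key, data)
print(info.get(3))          # EFG
print(info.capacity())      # 256
```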

The entries stored in a hash table can be enumerated efficiently (at constant cost per entry), but only in some pseudo-random order. Therefore, there is no efficient way to locate an entry whose key is nearest to a given key. Listing all n entries in some specific order generally requires a separate sorting step, whose cost is proportional to log(n) per entry. In comparison, ordered search trees have lookup and insertion cost proportional to log(n), but allow finding the nearest key at about the same cost, and ordered enumeration of all entries at constant cost per entry.
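Python's built-in dict shows this ordering limitation: it iterates in insertion order, not key order. In this sketch a sorted Python list plus the standard bisect module stands in for an ordered search tree. That substitution is an assumption, because a sorted list gives O(log n) nearest-key queries but O(n) insertions, unlike a balanced tree. nearest_key is a hypothetical helper.

```python
import bisect

def nearest_key(sorted_keys, target):
    """Closest key to `target` in a non-empty sorted list; ties go to the smaller key."""
    i = bisect.bisect_left(sorted_keys, target)
    # Only the neighbours on either side of the insertion point can be nearest
    candidates = sorted_keys[max(0, i - 1): i + 1]
    return min(candidates, key=lambda k: (abs(k - target), k))


table = {42: "a", 7: "b", 19: "c", 88: "d"}

# Using the hash table alone, the nearest key needs a full O(n) scan.
nearest_by_scan = min(table, key=lambda k: (abs(k - 20), k))

# Ordered enumeration needs a separate O(n log n) sort first.
ordered = sorted(table)                 # [7, 19, 42, 88]
print(nearest_by_scan)                  # 19
print(nearest_key(ordered, 20))         # 19, found by binary search
```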

If the keys are not stored (because the hash function is collision-free), there may be no easy way to enumerate the keys that are present in the table at any given moment.

Although the average cost per operation is constant and fairly small, the cost of a single operation may be quite high. In particular, if the hash table uses dynamic resizing, an insertion or deletion operation may occasionally take time proportional to the number of entries. This may be a serious drawback in real-time or interactive applications.
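The cost spikes from dynamic resizing can be simulated without building a real table. This sketch assumes a table that doubles its capacity whenever the load factor would exceed 0.75. It charges one unit of work per entry moved during a resize. The function name and parameters are illustrative.

```python
def insertion_costs(n_inserts, initial_capacity=8, max_load=0.75):
    """Return the work done by each insertion into a table that doubles when full.
    Work = 1 for the insertion itself, plus one unit per entry moved during a resize."""
    capacity, size, costs = initial_capacity, 0, []
    for _ in range(n_inserts):
        cost = 1
        if size + 1 > max_load * capacity:
            cost += size      # every existing entry is rehashed into the new array
            capacity *= 2
        size += 1
        costs.append(cost)
    return costs


costs = insertion_costs(1000)
print(max(costs), sum(costs) / len(costs))   # 769 2.53
```

With 1,000 insertions, only 8 trigger a resize and the average cost stays at 2.53 units. Even so, the single most expensive insertion does 769 units of work. That gap between the average case and the worst case is what matters for real-time code.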

Hash tables in general exhibit poor locality of reference: the data to be accessed is distributed seemingly at random in memory. Because hash tables cause access patterns that jump around, they can trigger microprocessor cache misses that cause long delays. Compact data structures such as arrays, searched with linear search, may be faster if the table is relatively small and the keys are integers or short strings. Cache sizes have grown steadily over successive hardware generations (a trend often linked to Moore's Law), so what counts as "small" may keep increasing. The optimal performance point varies from system to system.

Hash tables become quite inefficient when there are many collisions. Extremely uneven hash distributions are very unlikely to arise by chance. However, a malicious adversary who knows the hash function may be able to supply keys that all collide, producing worst-case behavior and very poor performance (for example, a denial-of-service attack). In critical applications, either universal hashing can be used or a data structure with better worst-case guarantees may be preferable.
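Universal hashing picks the hash function at random when the table is created. An adversary who knows the family but not the chosen parameters therefore cannot reliably construct colliding keys in advance. Below is a minimal sketch of the classic Carter-Wegman family for integer keys. The class name and the choice of the prime 2**61 - 1 are assumptions for this example, and string keys would need a different family.

```python
import random

class UniversalHash:
    """A function drawn at random from the Carter-Wegman family
    h(x) = ((a*x + b) mod p) mod m, for integer keys 0 <= x < p."""

    P = (1 << 61) - 1  # Mersenne prime 2**61 - 1

    def __init__(self, m, rng=None):
        rng = rng or random.SystemRandom()  # unpredictable to an outside adversary
        self.m = m
        self.a = rng.randrange(1, self.P)   # a must be non-zero
        self.b = rng.randrange(0, self.P)

    def __call__(self, x):
        return ((self.a * x + self.b) % self.P) % self.m


# Keys chosen so that the fixed function x % 64 sends all of them to bucket 0
attack_keys = [64 * i for i in range(1000)]
print(len({x % 64 for x in attack_keys}))   # 1 bucket used

h = UniversalHash(64)
print(len({h(x) for x in attack_keys}))     # typically many more buckets
```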

10 comments:

  1. Thanks for listing the advantages and disadvantages of hashing. This post gave me helpful guidance about this security concept. I will visit again.

  2. This was really easy to read and understand!

  3. http://en.wikipedia.org/wiki/Hash_table#Advantages

  4. copy and paste from wikipedia haha........!!!!!!!
    https://en.wikipedia.org/wiki/Hash_table

  5. I don't like your wallpaper
    why no book titles!
    WHAT ARE YOU READING!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    1. IKR WTF is with this frcking WALLPAPER!!!!!!!!!!!!!!!!!!!!!!!!!
      I WILL HUNT YOU DOWN, MARK MY WORDS
      YOU ARE A DEADMAN

  6. Hi, I have a basic question: why do we use hash tables at all? Implementing a hash table costs us the overhead of a hash function. If we instead use an array of structures to store our dictionary pairs, we will not have to waste CPU cycles calculating the hash function. In an array of structures, the key (in the dictionary pair) serves as the index during a search. For example, I have to store the following data set:
    (1, ABC) (2, CDE) (3, EFG) (4, GHI)
    and if my structure is called "info", then we can perform a search in O(1) time using "info[1].data". We will also have the advantage of never having a collision.
    So my question is: why do we use hash tables at all?

  7. Thank you so much for this topic.
