Friday, March 7, 2014

HashTable and HashMap key-value storage in the memory

It is another common question that interviewer asks if he is looking for a core programmer with good memory internals insight.


How HashTable and HashMap key-value are stored in the memory?


Explaination:



  • HashMap is actually an array of special objects that hold both Key and Value.
  • The array has some amount of buckets, say "256".
  • The hashing algorithm is provided by the hashCode() method that every object has.     
  • When we are writing a new Class, we should always take care of proper hashCode() and equals() methods implementation. The default one (of the Object class) takes the memory pointer as a number. But that's no good for most of the classes we would like to use. 


  • For example, the String class uses an algorithm that makes the hash from all the characters in the string - think of it like this: hash = 1.char + 2.char + 3.char...(simplified). Therefore, two equal Strings, even though they are on a different location in the memory, have the same hashCode().
    The result of hashCode(),  lets say "513", is then the number of bucket where the object should be stored if we had an array that big. But we don't, ours is only 256 slots long. So we use the classical calculation 'hashcode mod (array length)' or '513 mod 256 = 1' and store the Key-Value pair in a slot number 1.


    • If there's no such pair, it's ok.
    • If there is one (collision), we chain them together into a linked list
    • If the backing array becomes too full, so we would have to make long linked lists, we make a new array with doubled length, rehash all the elements and add them into the new array, then we dispose the old one. This is most likely the most expensive operation on HashMap, so you want to tell your Maps how many buckets you'll use.
    • If somebody tries to get a value, he provides a key, we hash it, mod it, and then go through the potential linked list for the exact match.

    An image, courtesy of Wikipedia:



     In the above case -
    • there is an array with 256 buckets (numbered, of course, 0-255)
    • there are five people. Their hashcodes, after being put through mod 256, point to four different slots in the array.
    • you can see Sandra Dee didn't have a free slot, so she has been chained after John Smith.
    Now, if you tried to look up a telephone number of Sandra Dee, you would hash her name, mod it by 256, and look into the bucket 152. There you'd find John Smith. That's no Sandra, look further ... aha, there's Sandra chained after John.

    No comments:

    Post a Comment