Friday, March 7, 2014

HashTable and HashMap key-value storage in the memory

It is another common question that interviewer asks if he is looking for a core programmer with good memory internals insight.


How HashTable and HashMap key-value are stored in the memory?


Explaination:



  • HashMap is actually an array of special objects that hold both Key and Value.
  • The array has some amount of buckets, say "256".
  • The hashing algorithm is provided by the hashCode() method that every object has.     
  • When we are writing a new Class, we should always take care of proper hashCode() and equals() methods implementation. The default one (of the Object class) takes the memory pointer as a number. But that's no good for most of the classes we would like to use. 


  • For example, the String class uses an algorithm that makes the hash from all the characters in the string - think of it like this: hash = 1.char + 2.char + 3.char...(simplified). Therefore, two equal Strings, even though they are on a different location in the memory, have the same hashCode().
    The result of hashCode(),  lets say "513", is then the number of bucket where the object should be stored if we had an array that big. But we don't, ours is only 256 slots long. So we use the classical calculation 'hashcode mod (array length)' or '513 mod 256 = 1' and store the Key-Value pair in a slot number 1.


    • If there's no such pair, it's ok.
    • If there is one (collision), we chain them together into a linked list
    • If the backing array becomes too full, so we would have to make long linked lists, we make a new array with doubled length, rehash all the elements and add them into the new array, then we dispose the old one. This is most likely the most expensive operation on HashMap, so you want to tell your Maps how many buckets you'll use.
    • If somebody tries to get a value, he provides a key, we hash it, mod it, and then go through the potential linked list for the exact match.

    An image, courtesy of Wikipedia:



     In the above case -
    • there is an array with 256 buckets (numbered, of course, 0-255)
    • there are five people. Their hashcodes, after being put through mod 256, point to four different slots in the array.
    • you can see Sandra Dee didn't have a free slot, so she has been chained after John Smith.
    Now, if you tried to look up a telephone number of Sandra Dee, you would hash her name, mod it by 256, and look into the bucket 152. There you'd find John Smith. That's no Sandra, look further ... aha, there's Sandra chained after John.

    Abstract Principals For Writing Unit Tests

    These abstract thoughts about writing clean tests, is the result of my "before and after" experiences with the Test classes.

    Must Follow Principles

    1. Boyscouting. Always leave the campground cleaner than when you arrived. If you have

    to do anything to a test, make it cleaner, easier to read, than when you started.

    2. KYK. (pronounced "kick") Keep Your Knowledge. This is an extended version of Boyscouting. If you have to dive deep enough into some code (unit-tests here, but this is true in general), to learn what it's doing -- don't throw that knowledge away! You will need it again. More importantly, someone else will need it. See the code as the next person will see it.

    3. Use good names. A specific implementation of KYK. Once you know something about a method (class, object, field...), rename it so that knowledge doesn't disappear.

    4. WTF's . the measure of code quality. How many times do you have to stop and (swear and) puzzle out what a test does?



    5. Good tests assert the results of a calculation. Bad tests (which usually have lots of mocks) test... that the code we wrote, is the code that we wrote. Most of the tests (if we are mocking something) set up an expect that essentially injected a link object to be returned... and then asserted that the returned object was the one that was injected! Yes, it tested a little bit of the logic... but mostly it tested that injection works. Which we already knew.

    6. DRY, DRY, DRY. Don't repeat yourself.

    7. Short is beautiful. All other things being equal, shorter tests are easier to read, to work with, to fix, and to write... than long tests. Make your test methods short. Then make them shorter than that.

    8. Comment if you must. Saving knowledge in a comment is better than letting it die. But even better is turning that comment into a method or field name. The best tests read like a series of english sentences.

    9. Increase coverage. Remember many of your tests may be bogus (no asserts, replay() without verify(), etc.) [believe me it happens when you are working in a large team]
    If you've been cleaning up a method, great! Now run a junit/emma "coverage as", and see what pieces of a method are not being covered. See if you can cover at least one such piece. (If you don't know how to measure coverage locally, install 'eclemma' in Eclipse. Or see me and I'll help you.)

    10. Once you've cleaned up the test, clean up the code.  Sounds backwards?
    Would you rather have obscure production code, or an obscure test? Give me a good, clean, test, and I can refactor the production code to make it clean. But if the test is hopeless, I can't do much of anything with the production code, except cross my fingers and pray.


    I will be explaining each point in more detail in my next post with some actual examples of  writing good test cases.

    Till then Comments/Suggestions are always welcome.

    Block, Static Block and Constructor Execution

    One of the most basic question in core java that we encounter in interviews is related to blocks/static blocks.

    I am sharing a problem that will clear the basic concepts on How Blocks and Static Blocks Work in Java ?

    Sample code displaying block, static block and constructor execution. 
    Yes, we can define blocks like -

    {
           System.out.println("Starting block....");
    }


    Note: It does not give any compile time error.

    As we can see in the screenshot the order of execution of blocks and constructor in Java is -

    1. Static block is the first block that gets loaded.
    2. Then any other blocks as defined
    3. Constructor is last to be loaded

    Tuesday, February 11, 2014

    Detailed examples for test case writing


    1.Fix the test names.  If you can only do one thing to improve a test, change the test method's name to say what it does.  Yes, it may be long sometimes -- but by this way you know exactly what it does.

    2."Coverage as" is your friend.  Continuing #1, if you're not sure what a test does, run a junit "coverage as" on that one test.  Now you can see from the green/pink coloring just what parts of the original class or method are getting executed.  That will give you at least a clue as to what the test is doing, and thus how it should be named.

    3.Don't confuse real objects and mock objects.  If you're mixing a bunch of "real" objects and EasyMock objects, make sure you can tell them apart.  Use names!  I like to name my EasyMock objects mockSomethingOrOther.  (And particularly where you're doing an 'expect' on a void method, you can't tell just by looking at the code what is real vs. what is mock).

    4.DRY: group common expects.  If you have several tests with identical expect's, pull them into a service method -- and give it a name that says what you're doing!  If they are almost identical, you can still probably pull them into a service method with an argument or two.

    5.Don't overuse Easymock Static mocks are still useful.  A "static mock" is just a regular (non-EasyMock) object that you build yourself, implementing the same interface as the 'real' object that you need as a dependency.  If you're writing lots of expects on an EasyMock object, or if you really need a mock to calculate something... then write your own mock.

    6.Order the phases of your tests.  If you're using EasyMock, your tests should look something like this:
    A) Set up your mocks
    B) expect's ...
    C) replay()
    D) execute the method(s) under test (and save the results)
    E) verify()    (Required if you have a replay()!)
    F) assert
    In particular, don't put your asserts between your replay() and verify().  There's a good chance you'll unintentionally invoke a mocked method. 

    7.Don't mix EasyMock.createControl() and EasyMock.createMock().  It’s one of the mistakes that I used to do many times. Sometimes we create most of their mocks from a common "ctrl" EasyMock object... and then stir in one or two explicit EasyMock.createMock's.  We almost always forget to replay() or verify() those extra mocks.  Don't do it!  You will forget one.  Use ctrl for all of your mocks.

    8.DRY: string constants  If you use a simple string constant in your mocks, expects, etc. more than once... pull it into a static, upper-case-named constant.  And make it meaningful!

    9.DRY: boolean constants.  Don't pass 'true' or 'false' into your test methods, expect's, etc.  It's a magic number. Use (or make) an upper-case static final constant.  A 'true' by itself means nothing.  Say what it means!

    10.@Ignore if you must.  If you must disable a test temporarily, use @Ignore -- don't comment out the @Test (or worse, the entire code of the test method).  And do add a comment about why the test is @Ignore'd!

    11.Get rid of warnings.  Use control-shift-O to clean up your imports.  Remove warnings wherever possible.  If not possible, let Eclipse tell you what annotation to use to ignore the warning.  Don't leave "intentional" warnings just lying around.  If the "false positive" warnings are removed, the ones that are remaining may be real bugs!

    12.Get rid of unused code.  If you don't need it to pass the test, delete it!  If you feel like you need it, then you haven't finished writing (rewriting) the test.

    13.Don't "swallow" exceptions!  If you must catch an exception (and usually you don't need to, just let the exception happen and Junit will fail and print the stack trace), then make sure to call fail() or otherwise make it clear whether the test succeeded or failed!

    14.EasyMock's IMocksControl is a good thing.  Use it, it will help make sure you don't "lose" any mocks that should be replay()'d.  However, if you find yourself using multiple IMocksControl's in the same test, you're probably making a mess.  Reorder the construction and use of your mocks so that you only need one IMocksControl.

    15.Treat your mocks as if they were final.  If you've defined a mock in an @Before method, don't go redefining it elsewhere.  That's sloppy C-style "everything is a global variable" coding.  If you need new mocks, create them, and make them local/temporary variables.  Otherwise, it becomes really hard to tell, just by reading the code, which expects match up with which versions of your mock.

    Just follow these and you will love writing tests.

    “Writing clean code is what you must do in order to call yourself a professional. There is no reasonable excuse for doing anything less than your best.” - Robert C. Martin