Monday, June 27, 2011

Nested Loop vs Hash Match

Last week I was called in to help a team facing a performance issue in one of their production database instances. This was their statement:
"Our application's performance has been good overall for the past 2 weeks. However, today we noticed that one single SP is taking 50 secs to execute, when the same SP took 2 secs yesterday."


The issue was with a single stored procedure. The first thing to check was whether there was any blocking in the database instance. No, nothing! I couldn't find any blocked processes in the instance.
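A blocking check like the one described above can be sketched as follows. This is only an illustration, assuming SQL Server 2005 or later (where the sys.dm_exec_requests DMV is available); it is not necessarily the exact check that was run here.

```sql
-- List requests that are currently waiting on another session.
-- An empty result means nothing is blocked at this moment.
SELECT r.session_id,
       r.blocking_session_id,   -- the session holding the resource
       r.wait_type,
       r.wait_time,             -- milliseconds spent waiting so far
       r.wait_resource
FROM sys.dm_exec_requests AS r
WHERE r.blocking_session_id <> 0;
```

Blocking comes and goes, so it helps to run this while the slow procedure is actually executing.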


Since the issue was with a single stored procedure, I started with the steps that every DBA would take. I took the list of tables on which this stored procedure depends and checked their fragmentation details. No, the tables were not fragmented.
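One way to look at fragmentation for a table is the sys.dm_db_index_physical_stats function. The sketch below assumes SQL Server 2005 or later, and dbo.Orders is a hypothetical table name standing in for one of the tables the procedure depends on.

```sql
-- Fragmentation per index for one table in the current database.
-- dbo.Orders is a placeholder; substitute each table used by the procedure.
SELECT OBJECT_NAME(ips.object_id)       AS table_name,
       i.name                           AS index_name,
       ips.index_type_desc,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Orders'), NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id  = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;
```

The 'LIMITED' mode is the cheapest scan mode, which matters on a busy production instance with huge tables.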


OK, then maybe there was an issue with the statistics on these tables. Perhaps the statistics had not been updated. But when I checked, the statistics were fine.
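A quick way to see when statistics were last updated is the STATS_DATE function. Again a sketch only, with dbo.Orders as a hypothetical table name.

```sql
-- Last update time of every statistics object on one table.
-- dbo.Orders is a placeholder for a table used by the procedure.
SELECT s.name                              AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.Orders')
ORDER BY last_updated;
```

A recent date alone does not prove the statistics are representative, but a very old date on a heavily modified table is a strong hint.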


At this point I decided to start a trace to capture the execution plans of the queries. On analyzing the execution plan, I found that one particular SELECT statement, which joined huge tables, was using a Nested Loop join. I remembered a passage from a book by Grant Fritchey saying that nested loop joins are appropriate for small tables, while for large tables a hash match join is usually the better choice.
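The plan here was captured with a trace. As an alternative, the cached plan of a procedure can also be read straight from the plan cache. The sketch below assumes SQL Server 2005 or later, and dbo.usp_GetOrders is a hypothetical procedure name.

```sql
-- Fetch the cached XML plan for one stored procedure in the current database.
-- dbo.usp_GetOrders is a placeholder procedure name.
SELECT cp.usecounts,      -- how many times the cached plan has been reused
       cp.objtype,
       qp.query_plan      -- click the XML in SSMS to inspect the operators
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE qp.dbid     = DB_ID()
  AND qp.objectid = OBJECT_ID(N'dbo.usp_GetOrders');
```

This shows the estimated plan as it was compiled and cached, which is exactly what matters when a stale plan is suspected.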



Whoa, is this the problem? Is the optimizer creating a wrong plan? If so, why? Do I need to give join hints here and see if the optimizer generates a better plan? I was thinking about all these things when suddenly a thought flashed across my mind: why not flush the plan cache and see? I used the command
 DBCC FLUSHPROCINDB to flush the entire procedure cache for that particular database.
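For reference, the command takes a database ID. DBCC FLUSHPROCINDB is undocumented, so the sketch below shows the commonly used form rather than an officially specified syntax; SalesDb is a hypothetical database name.

```sql
-- Remove all cached plans that belong to one database.
-- SalesDb is a placeholder database name.
DECLARE @dbid int;
SET @dbid = DB_ID(N'SalesDb');

DBCC FLUSHPROCINDB(@dbid);
```

Every query and procedure in that database has to be compiled again on its next execution, so expect a short burst of extra CPU right after running it.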



Once the plan cache was flushed, the stored procedure behaved as expected. It was responding fast. I then captured a trace with the execution plan once again to check whether the nested loop join operator had changed. As expected, the query was now using a hash match join and not a nested loop.

Updating the statistics would probably have resolved the issue as well, since it also invalidates the cached plans that depend on them, but the tables were huge and the update would have taken more than 20 minutes. I could also have marked just this stored procedure for recompilation using the sp_recompile stored procedure, but at that time flushing the procedure cache looked like the better option. The changed execution plan is given below.
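The two alternatives mentioned above look roughly like this. Both object names are hypothetical placeholders.

```sql
-- Option 1: mark a single procedure so that a new plan is compiled
-- the next time it runs. Plans of other procedures are left alone.
EXEC sp_recompile N'dbo.usp_GetOrders';

-- Option 2: rebuild the statistics of one table by reading every row.
-- On a huge table this can run for a long time.
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
```

sp_recompile is the narrowest of the three fixes, because it touches only the named object instead of every plan in the database.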

   
Nested loop joins work best when the tables being joined are small. As long as you are joining small datasets, this is typically the best join you can expect to see.

Hash Match joins are often very efficient on large datasets, especially if one table is much smaller than the other. They work well on tables that are not sorted on the columns used in the join condition. Be aware that a hash match may sometimes also be an indication of a missing WHERE clause or a missing index.
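To see the difference between the two operators on your own data, one option is to run the same query twice with different query hints and compare the timings and plans. This is an experiment for a test system, not a recommendation to leave hints in production code; the tables and columns below are hypothetical.

```sql
-- Report elapsed time and reads for each statement in the Messages tab.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Force nested loops for every join in the query.
SELECT o.OrderId, l.ProductId, l.Quantity
FROM dbo.Orders AS o
JOIN dbo.OrderLines AS l
  ON l.OrderId = o.OrderId
OPTION (LOOP JOIN);

-- Same query, forcing hash match joins instead.
SELECT o.OrderId, l.ProductId, l.Quantity
FROM dbo.Orders AS o
JOIN dbo.OrderLines AS l
  ON l.OrderId = o.OrderId
OPTION (HASH JOIN);

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Running both against large tables usually makes the gap obvious, and running them without any hint shows which operator the optimizer picks on its own.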

The KB article below has the details on how the three join operators normally work:
http://support.microsoft.com/kb/197297