This week I saw a case where a poor choice of index may have actually improved performance…. temporarily.

An application with performance problems had been investigated and a new index had been implemented which reportedly improved reduced the relevant query’s execution time from 40 seconds to between 6-15 seconds.  However, there were still complaints about the variable length of the execution time, so I began to review the situation.

From V$SQL I could see hundreds of examples of the query that had run recently – after the index had been created.  The elapsed time for these took up to 103 seconds to complete.  The number of rows returned and physical/logical block gets also varied greatly.  I began to doubt the accuracy of the description of the new index’s impact, so I needed to verify that the new index was the best choice.

The queries were similar to this (altered for simplicity and confidentiality):

SELECT some_columns
FROM  dummy_table
AND to_char(TIME_STAMP, 'yyyy-mm-dd HH24:MI:SS') <= '2009-08-18 17:03:54'

Each query had a literal for the TIME_STAMP comparison.
The first thing I noticed was the common developer error of putting the function on the column instead of the constant.  It should have been rewritten as:

TIME_STAMP <= to_date('2009-08-18 17:03:54','yyyy-mm-dd HH24:MI:SS');

This would allow Oracle to only convert between char and date once on the constant, rather than on the column for each row processed.  Also, it would allow the TIME_STAMP value to be used to access an index, if a suitable one existed.

These were the relevant indexes:

                                                   DIST KEYS       LEAF BLKS LAST ANALYZED
INDEX_NAME                                         (SELCTY%)   BPK (DEPTH)   (ESTIMATE %)   COLUMN (BYTES ASC/DESC)
-------------------------------------------------- ---------- ---- --------- -------------- ---------------------------------------
DUMMY_TABLE_IDX1                                   34K (0)    #### 243K (3)  15/08/09 (1)   FOREIGN_KEY (22 ASC)

DUMMY_TABLE_IDX2                                   346K (1)    108 283K (3)  15/08/09 (1)   TIME_STAMP (7 ASC)

DUMMY_TABLE_IDX3                                   25M (48)      1 192K (3)  15/08/09 (7)   FOREIGN_KEY (22 ASC)
                                                                                            TIME_STAMP (7 ASC)

DUMMY_TABLE_PK (U)                                 51M (100)     1 274K (2)  15/08/09 (1)   PRIMARY_KEY (22 ASC)

DUMMY_TABLE_IDX3 was the newly created index that was supposed to improve performance.

Although the to_char function prevented the TIME_STAMP column from being used to access the index, it could still be used to filter rows before accessing the DUMMY_TABLE to get the rest of the columns.  The DUMMY_TABLE was big and could never reside completely in the cache, so avoiding unnecessary table accesses would boost performance.  But would filtering rows based on the TIME_STAMP achieve this aim?

I am always suspicious of date_column < :date_time_value because it is hardly ever very selective.  In this case I noticed that the date and time compared to the TIME_STAMP column was always within a second or two of the actual execution time.  As one would expect from the name, the TIME_STAMP column has only past times in it.

Select count(1) from dummy_table where time_stamp > sysdate;

…returned no rows.  So, the TIME_STAMP column is of no use in an index to improve the performance of this query.

So why was there an impression that the index was beneficial?  One possible answer is that the new index is compact (dense) because it hasn’t had much/any data deleted from it, where as the index that would have previously been used by the query (DUMMY_TABLE_IDX1) had become sparse after a lot of data was deleted.  The index information above shows that the old index is 26% larger than the new index (confirmed by DBA_SEGMENTS) despite the new index having an extra column!  At first the smaller index (DUMMY_TABLE_IDX3), with less empty blocks would have been more efficient than the old one (see this post about index performance).  But, after time passes and data is deleted, DUMMY_TABLE_IDX3 will become larger and perform worse because it stores an extra column (TIME_STAMP) which adds nothing for performance because it doesn’t filter any rows.

Simply coalescing the index would have provided more benefit than adding the new index, without the overhead.  Compressing it would have improved it even more.

An alternative to the above:
Replace DUMMY_TABLE_IDX3 with a compressed functional index on (FOREIGN_KEY, AMOUNT1 + AMOUNT2), because the criterion “AMOUNT1 + AMOUNT2 <> 0” does filter out a lot of rows for some of the queries, and so table accesses (sequential I/O) would be reduced.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s