When I first saw 10g’s time model views I wasn’t impressed.  The components overlapped, so they didn’t add up to 100%, and there weren’t many categories.  I’d been using selected wait events plus CPU statistics to approximate a breakdown of foreground response time ever since I’d first seen Steve Adams’ IXORA site, so I didn’t see anything extra offered by the time model views… but I was wrong.
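
For reference, the figures that Statspack summarises in its Time Model section come straight from V$SYS_TIME_MODEL, with per-session equivalents in V$SESS_TIME_MODEL.  A minimal query if you want to look at the raw numbers yourself (the values are cumulative since instance startup and reported in microseconds):

-- Instance-wide time model statistics: cumulative since startup,
-- values in microseconds.  V$SESS_TIME_MODEL holds the same
-- statistics per session.
select stat_name,
       round(value / 1000000, 1) as seconds
from   v$sys_time_model
order  by value desc;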

I’ll use a Statspack report that was sent to me for analysis to demonstrate how the 10g time model statistics can be useful.

Here are extracts from a Statspack report for an 11.1.0.7 database on a dual-CPU AIX server, covering a one hour period:

Top 5 Timed Events                                                    Avg %Total
~~~~~~~~~~~~~~~~~~                                                   wait   Call
Event                                            Waits    Time (s)   (ms)   Time
----------------------------------------- ------------ ----------- ------ ------
log file sync                                  230,767         868      4   49.6
log file parallel write                        227,285         577      3   33.0
CPU time                                                       198          11.3
db file parallel write                          17,889          26      1    1.5
SQL*Net more data to client                    203,568          22      0    1.2
          -------------------------------------------------------------

Looking at the top wait events first gives the impression that the database wasn’t very busy, and that commits and redo log I/O are the biggest targets for tuning.  However, if we then consider the DB Time and DB CPU metrics, a different picture starts to form:

   Elapsed:      60.00 (mins) Av Act Sess:       0.9
   DB time:      56.54 (mins)      DB CPU:      22.40 (mins)
Time Model System Stats
-> Ordered by % of DB time desc, Statistic name

Statistic                                       Time (s) % DB time
----------------------------------- -------------------- ---------
DB time                                          3,392.7
DB CPU                                           1,344.1      39.6
sql execute elapsed time                           862.5      25.4
parse time elapsed                                 100.0       2.9
PL/SQL execution elapsed time                       10.2        .3
connection management call elapsed                   6.2        .2
hard parse elapsed time                              3.3        .1
hard parse (sharing criteria) elaps                  0.3        .0
hard parse (bind mismatch) elapsed                   0.2        .0
sequence load elapsed time                           0.1        .0
repeated bind elapsed time                           0.0        .0
background elapsed time                            666.4
background cpu time                                 20.1
-------------------------------------------------------------

We can now see that the database was much busier than the top waits section indicated.  Remember that DB Time = DB CPU + foreground I/O time + other foreground waits, so if all the foreground I/O and wait events were timed and captured correctly, they would add up to roughly 2,000 seconds.  It is clear from the top timed events section that about 1,000 seconds of I/O or other wait time is missing.
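
For anyone who wants to follow the arithmetic, here it is as a quick query against DUAL, using only the figures from the extracts above (the smaller foreground events in the full wait event section, not reproduced here, close the gap a little further):

-- DB time and DB CPU come from the Time Model section; 868s (log file sync)
-- and 22s (SQL*Net more data to client) are the foreground waits in the Top 5.
-- The two "parallel write" events are background waits and do not count
-- towards DB time.
select 3392.7 - 1344.1                       as fg_wait_target_s,  -- ~2,049s
       868 + 22                              as top5_fg_waits_s,   --    890s
       round((3392.7 - 1344.1) - (868 + 22)) as missing_wait_s     -- ~1,159s
from   dual;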

Should we follow the usual route of jumping to the top SQL sections of the report and start working out how to tune each statement?  In this case, the report goes on to show that the SQL captured by Statspack for the hour covers 100% of SQL executions and 97.3% of reads from the database buffer cache, but only:

  • 6.4% of Total DB CPU  (foreground database sessions)
  • 3.1% of Total Disk Reads
  • 27.5% of Total DB Time  (foreground database sessions)

So how useful are the top SQL sections of the report?

I still want to know what the database was doing while it was busy, so that I can characterise the database and get a feel for where the opportunities for tuning are.

Browsing through the rest of the report, I notice a fair amount of direct I/O, mostly LOB-related:

Statistic                                      Total     per Second    per Trans
--------------------------------- ------------------ -------------- ------------
physical reads                                53,787           14.9          0.2
physical reads direct                         52,626           14.6          0.2
physical reads direct (lob)                   47,504           13.2          0.2
physical writes                               61,375           17.1          0.3
physical writes direct                         8,773            2.4          0.0
physical writes direct (lob)                   8,773            2.4          0.0
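
The same statistics can be pulled straight from V$SYSSTAT between Statspack snapshots if you want to watch the direct I/O as it happens (the view is cumulative since instance startup, so take two samples and difference them to get a rate):

-- Direct path read/write counters, cumulative since instance startup.
-- The statistic names match the ones in the Statspack extract above.
select name, value
from   v$sysstat
where  name in ('physical reads direct',
                'physical reads direct (lob)',
                'physical writes direct',
                'physical writes direct (lob)');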

Direct I/O is performed by the foreground processes, but the wait event times for direct I/O are not accurate.  (The inaccuracies vary depending on many factors: version/patch level, OS, mount options, etc.)  Also, LOB API calls are not associated with a user cursor (e.g. in the top SQL sections), so the time for some of the LOB work (and data transmission) can be hard to measure or identify.

Related notes for those interested:
268476.1 – LOB Performance Guideline – (see “CPU time and Elapsed time – not reported accurately”)
BUG 3504487 – DBMS_LOB/OCILob* CALL RESOURCE USAGE IS NOT REPORTED, AS THEY ARE NOT PART OF A CURSOR
223117.1 – Tuning I/O-related waits
50415.1 – WAITEVENT: “direct path read” Reference Note
50416.1 – WAITEVENT: “direct path write” Reference Note

Note 223117.1 says: Due to the way in which time for these waits is recorded (it does not measure the time taken to perform the I/O), their relative position in listings such as Statspack’s “Top 5 Wait/Timed Events” cannot be used to evaluate their true impact.

The DB Time metric does include the time the database sessions spent performing direct I/O, so it becomes a useful sanity check when reporting on the breakdown of the wait events.  I like to compare (DB Time – DB CPU) to the total of the non-idle foreground wait events, to give a percentage of missing event time (about 30% in this case).

So… if I assume that all the missing time (1,000 seconds) is related to direct I/O, then I can estimate that DB Time is made up of roughly 60% I/O and 40% CPU.

In this particular case, the DBAs supporting the database (who had sent our team the Statspack report) discovered that the database was not using direct or concurrent I/O for the JFS2 filesystem.  After testing with direct I/O and a larger database buffer cache, my next target for tuning (if required) would be to investigate the LOBs to see whether any were suitable for caching, to reduce direct path reads and allow writes to be performed in the background.
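
If any LOBs do turn out to be good caching candidates, checking and changing the attribute is straightforward.  A sketch, using made-up owner/table/column names (the CACHE attribute in DBA_LOBS can be YES, NO or CACHEREADS):

-- Which LOB columns are currently not cached?
select owner, table_name, column_name, cache
from   dba_lobs
where  cache = 'NO';

-- Hypothetical example: cache the LOB column DOC_BODY on table DOCS.
alter table docs modify lob (doc_body) (cache);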

Final Notes on DB Time

The first thing I do when logging into a UNIX or Linux database server is check the load.  I type “w” out of habit and compare the output to my memory of usual loads for the system.  I find it a very good starting point for getting a ‘feel’ for how burdened the system is.

DB Time allows us to calculate a very simple but useful metric equivalent to load: average active sessions, calculated as DB Time / clock time.  We may write more about how we use this metric to support our customers in a future post.
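
Using the header figures from the report above, that works out as:

-- Average active sessions = DB time / elapsed (wall clock) time.
-- 56.54 minutes of DB time over a 60.00 minute snapshot interval.
select round(56.54 / 60.00, 1) as avg_active_sessions
from   dual;   -- 0.9, matching the "Av Act Sess" figure in the header

On a live system the same number comes from differencing the ‘DB time’ statistic in V$SYS_TIME_MODEL over an interval and dividing by the interval length.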

The DBA is mostly concerned with what the database is doing while it is busy, but sometimes we can and should expand the scope wider than the information provided by DB Time and the time model statistics.  For example, the Statspack report used for this post offered more useful information about the performance of the database server and the application as a whole.  (See this post).
