“cannot allocate new log”, “checkpoint not complete”

“log file switch (checkpoint incomplete)”

The event “log file switch (checkpoint incomplete)” is a foreground wait event; it delays one or more database sessions.

More and/or larger redo log groups may improve performance when you see these messages and waits.  If you can’t, or don’t want to, grow your redo log estate, then learning more about checkpoints and redo generation might provide an alternative solution.
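Before adding or resizing groups, it helps to see the current redo log estate. A quick sketch (the STATUS column also shows whether logs are stuck ACTIVE, i.e. waiting on a checkpoint):

```sql
-- Current redo log groups: size, member count and status.
SELECT group#, thread#, bytes / 1024 / 1024 AS size_mb, members, status
FROM   v$log
ORDER  BY group#;
```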

In this post I’ll briefly cover the basics, then introduce some rare information about checkpoints.

Checkpoint Basics

Oracle has evolved to make the DBWR lazier.  Delaying writing dirty blocks to disk can be more efficient because

  1. the same blocks may be changed more than once before being written, and
  2. the DBWR will sort blocks so that it can write them in order.  (The more blocks there are to be written, the greater the chance of finding contiguous blocks).

However, redo for dirty blocks can’t be overwritten.  If the redo logs fill up (all active), then changes in the database stop until all of the dirty blocks corresponding to the oldest redo log have been written to disk.

Checkpoint tuning is about finding the best rate of activity for the DBWR by calibrating thresholds for CKPT triggers.

| Threshold / Trigger | Type | Threshold Metric | V$INSTANCE_RECOVERY Counter | Default | Std Edition |
|---|---|---|---|---|---|
| log_checkpoint_timeout | Incremental | Temporal | WRITES_LOG_CHECKPOINT_SETTINGS | 1800 | Yes |
| log_checkpoint_interval | Incremental | Volume | WRITES_LOG_CHECKPOINT_SETTINGS | unset | Yes |
| fast_start_mttr_target | Incremental | Volume | WRITES_MTTR | unset | No |
| Self / auto tune | Incremental | Temporal | WRITES_AUTOTUNE | 300 s lag / 3% of I/O | No (?) |
| Redo log size | Incremental | Volume | WRITES_LOGFILE_SIZE | 81% (11.2), 90% (10.2) | |
| Redo thread full / active | Thread | Volume | WRITES_FULL_THREAD_CKPT | n/a | Yes |
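To see which trigger is actually driving DBWR on a given instance, the cumulative write counters in V$INSTANCE_RECOVERY can be sampled (a sketch; query twice and diff the values to see which counter is currently moving):

```sql
-- Cumulative DBWR write counters, one per checkpoint trigger.
SELECT writes_mttr,
       writes_logfile_size,
       writes_log_checkpoint_settings,
       writes_other_settings,
       writes_autotune,
       writes_full_thread_ckpt
FROM   v$instance_recovery;
```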

Temporal thresholds smooth out the peaks of write activity (DBWR spikes lag behind the LGWR spikes), but they can’t handle sudden bursts of redo generation that fill all of the logs in a short amount of time.

Volume thresholds can be quicker to scale up DBWR activity in reaction to bursts of redo generation, but may increase I/O peaks and total writes (where the same data blocks are modified repeatedly).

Rare Information or Rash Assertion

While a lot has been written about this topic, I found that even Oracle’s documentation didn’t match what I have observed.  Here are my claims, each of which is contradicted by Oracle documentation, forums and blogs:

  1. Thread checkpoints are not triggered by log switches, despite what the Oracle 11.2 manual says.
  2. Self-tune checkpoints lag behind the redo log tail by five minutes and writes are restricted to 3% of physical I/O.
  3. Checkpoints are triggered when they lag behind the redo log tail by 81% of [ the sum of the redo logs except the largest one ].  (90% in 10.2).

Log Switch Checkpoints

Most of Oracle’s documentation (at the time of writing) is very outdated, or at best misleading.

However, John Watson’s book “OCA Oracle Database 11g: Administration I Exam Guide” says:

There is no checkpoint following a log switch. This has been the case since release 8i, though to this day many DBAs do not realize this.

If one enables checkpoint beginning/completed messages in the alert log, then it will look like log switch checkpoints are occurring:
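These messages are controlled by the log_checkpoints_to_alert initialization parameter; for example:

```sql
-- Write checkpoint begin/complete messages to the alert log.
ALTER SYSTEM SET log_checkpoints_to_alert = TRUE;
```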

Beginning log switch checkpoint up to RBA ...

However, when testing in an 11.2 database with no other users, although I saw the message stating that a log switch checkpoint had started, nothing further happened.  The DBWR was not progressing the checkpoint by writing dirty blocks to disk.  Checkpoint activity only occurred when one of the other thresholds was reached.  (See table above).

My theory is that the “log switch checkpoints” are no longer real thread checkpoints, but just RBA markers.  They do not complete without something else triggering a real checkpoint.  When an incremental checkpoint takes the checkpoint position in the control file up to the log switch marker, then the checkpoint position is written to the datafile headers and a message is posted to the alert log:

Completed checkpoint up to RBA ....

This is part of the evolution of the DBWR, becoming lazier (more efficient), but it relies on incremental checkpoints.  If incremental checkpoint thresholds aren’t being reached for some reason, then the only thread checkpoints will happen when all of the redo logs are full or active, resulting in “log file switch (checkpoint incomplete)” delays and the checkpoint position in the datafiles lagging more than expected.

Self / Auto Tune Checkpoints

These incremental checkpoints are enabled when fast_start_mttr_target is unset or set to 1 or more.  If the checkpoint position lags behind the redo log tail by five minutes, then incremental checkpoints will start, subject to some vague load criteria.  All I can add is that the parameters below seem to be related:

| Parameter | Default | Description |
|---|---|---|
| _disable_selftune_checkpointing | FALSE | Disable self-tune checkpointing |
| _selftune_checkpoint_write_pct | 3 | Percentage of total physical I/Os for self-tune checkpointing |
| _selftune_checkpointing_lag | 300 | Self-tune checkpointing lag behind the tail of the redo log |
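These underscore parameters are not visible in V$PARAMETER.  Connected as SYS, they can be read from the X$ fixed tables; a common DBA sketch (requires SYSDBA, and the X$ structures are of course undocumented and version-dependent):

```sql
-- Read the hidden self-tune checkpoint parameters (run as SYS).
SELECT i.ksppinm  AS parameter,
       v.ksppstvl AS value,
       i.ksppdesc AS description
FROM   x$ksppi  i
JOIN   x$ksppcv v ON v.indx = i.indx
WHERE  i.ksppinm LIKE '%selftune_checkpoint%';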

I’ve found that self-tune checkpoints don’t happen in Oracle 11.2 Express Edition.  I haven’t tested in Standard Edition yet, but I guess that they won’t happen either.

Log Size Checkpoints

One of the challenges with having worked with Oracle databases for so many years is the amount of “unlearning” that is necessary with each new version.  It doesn’t help that Oracle’s staff face this challenge too, when maintaining documentation.

Oracle’s documentation (e.g. the 11.2 Reference manual and MOS Note 147468.1, “Checkpoint Tuning and Troubleshooting Guide”) claims that checkpoints are influenced by the size of the smallest redo log:

A checkpoint occurs at every log switch.  If a previous checkpoint is already in progress, the checkpoint forced by the log switch will override the current checkpoint. This necessitates well-sized redo logs to avoid unnecessary checkpoints as a result of frequent log switches.
The lag between the incremental checkpoint target and the log tail is also limited by 90% of the smallest online log file size. This makes sure that in most cases log switch would not need to wait for checkpoint.

Recovery I/O can also be limited by setting the LOG_CHECKPOINT_TIMEOUT parameter or by the size specified for the smallest redo log.

LOG_FILE_SIZE_REDO_BLKS    Maximum number of redo blocks required to guarantee that a log switch does not occur before the checkpoint completes.

I believe these are outdated statements.

An incremental checkpoint is triggered when the checkpoint position lags behind the redo log tail by V$INSTANCE_RECOVERY.LOG_FILE_SIZE_REDO_BLKS.  The objective of this mechanism is to avoid waiting for thread checkpoints when all of the redo logs are full or active, but it is no longer based on the size of the smallest redo log.  In the databases I’ve examined, the threshold is calculated by summing the sizes of all the redo logs except the largest one, then multiplying by 0.81 (in 11.2; 0.9 in 10.2).  The parameter controlling the multiplier is _target_rba_max_lag_percentage.
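This calculation can be checked against what the instance reports.  A sketch, assuming the usual 512-byte redo block size (some platforms and 11.2+ configurations use larger redo blocks, so verify locally):

```sql
-- Predicted lag threshold (sum of all logs except the largest, times 0.81,
-- expressed in 512-byte redo blocks) versus the value Oracle reports.
SELECT ROUND(0.81 * (SUM(bytes) - MAX(bytes)) / 512) AS predicted_blks,
       (SELECT log_file_size_redo_blks
        FROM   v$instance_recovery)                  AS reported_blks
FROM   v$log;
```

On a 10.2 database, substitute 0.9 for the multiplier.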

This change in behaviour is consistent with the evolution of the lazy DBWR.

Unfortunately, some environments may fill as little as an eighth of each redo log, which circumvents this safety mechanism.  (See this post).


