“cannot allocate new log”, “checkpoint not complete”
“log file switch (checkpoint incomplete)”
The event “log file switch (checkpoint incomplete)” is a foreground wait; it delays one or more sessions.
More and/or larger redo log groups may improve performance when you see these messages/waits. If you can’t or you don’t want to grow your redo log estate, then learning more about checkpointing and redo population might provide an alternative solution.
In this post I’ll briefly cover the basics, then introduce some rare information about checkpoints.
Checkpoint Basics
Delaying writing dirty blocks to disk can be more efficient because
- the same blocks may be changed more than once before being written, and
- the DBWR will sort blocks so that it can write them in order. (The more time that passes, the more chance of contiguous blocks).
However, redo for dirty blocks can’t be overwritten, so if the redo logs fill up, changes in the database stop until the dirty blocks have been written to disk.
Checkpoint tuning is about finding the best rate of activity for the DBWR by calibrating thresholds for CKPT triggers. Oracle has evolved to make the DBWR lazier.
Threshold / Trigger
|
Type |
Threshold Metric
|
V$Instance_Recovery Counter
|
Default |
Std Edition |
| log_checkpoint_timeout |
Incremental |
Temporal |
WRITES_LOG_CHECKPOINT_SETTINGS |
1800 |
Yes |
| log_checkpoint_interval |
Incremental |
Volume |
WRITES_LOG_CHECKPOINT_SETTINGS |
unset |
Yes |
| fast_start_mttr_target |
Incremental |
Volume |
WRITES_MTTR |
unset |
No |
Self / auto tune
_selftune_checkpoint_write_pct
_selftune_checkpointing_lag |
Incremental |
Temporal |
WRITES_AUTOTUNE |
3%
300sec |
No (?) |
Redo logsize
_target_rba_max_lag_percentage |
Incremental |
Volume |
WRITES_LOGFILE_SIZE |
81 (11.2)
90 (10.2) |
Yes |
| Redo thread full / active |
Thread |
Volume |
WRITES_FULL_THREAD_CKPT |
n/a |
Yes |
Temporal thresholds smooth total the peaks of write activity (DBWR spikes lag behind the LGWR spikes) but can’t handle sudden bursts of redo generation that fill all of the logs in a short amount of time.
Volume thresholds can be quicker to scale up DBWR activity in reaction to bursts of redo generation, but may increase I/O peaks and total writes (where the same data blocks are modified repeatedly).
Rare Information or Rash Assertion
While there is a lot written about this topic, I found that even Oracle’s documentation didn’t match what I have observed. Here are my claims that are contradicted in Oracle documentation, forums and blogs:
- Thread checkpoints are not triggered by log switches, despite what the Oracle 11.2 manual says.
- Self-tune checkpoints lag behind the redo log tail by five minutes and writes are restricted to 3% of physical I/O.
- Checkpoints are triggered when they lag behind the redo log tail by 81% of [ the sum of the redo logs except the largest one ]. (90% in 10.2).
Log Switch Checkpoints
Most of Oracle’s documentation, (at the time of writing this), is very outdated, or at best misleading.
However, John Watson’s book “OCA Oracle Database 11g: Administration I Exam Guide” says:
There is no checkpoint following a log switch. This has been the case since release 8i, though to this day many DBAs do not realize this.
If one enables checkpoint beginning/completed messages in the alert log, then it will look like log switch checkpoints are occurring:
ALTER SYSTEM SET LOG_CHECKPOINTS_TO_ALERT=TRUE SCOPE=MEMORY;
ALTER SYSTEM SWITCH LOGFILE;
....
Beginning log switch checkpoint up to RBA ...
However, when testing in an 11.2 database with no other users, although I saw the message stating that a log switch checkpoint had started, nothing further happened. The DBWR was not progressing the checkpoint by writing dirty blocks to disk. Checkpoint activity only occurred when one of the other thresholds was reached. (See table above).
My theory is that the “log switch checkpoints” are no longer real thread checkpoints, but just RBA markers. They do not complete without something else triggering a real checkpoint. When an incremental checkpoint takes the checkpoint position in the control file up to the log switch marker, then the checkpoint position is written to the datafile headers and a message is posted to the alert log:
Completed checkpoint up to RBA ....
This is part of the evolution of the DBWR, becoming lazier (more efficient), but it relies on incremental checkpoints. If incremental checkpoint thresholds aren’t being reached for some reason, then the only thread checkpoints will happen when the redo logs are all full or active, resulting in delays and the checkpoint position in the datafiles lagging more than expected.
Self / Auto Tune Checkpoints
These incremental checkpoints are enabled when fast_start_mttr_target is unset or set to 1 or more. If the checkpoint position lags behind the redo log tail by five minutes, then incremental checkpoints will start, subject to some vague load criteria. All I can add is that the parameters below seem to be related:
Parameter Default Description
_disable_selftune_checkpointing FALSE Disable self-tune checkpointing
_selftune_checkpoint_write_pct 3 Percentage of total physical i/os for self-tune ckpt
_selftune_checkpointing_lag 300 Self-tune checkpointing lag the tail of the redo log
I’ve found that self-tune checkpoints don’t happen in Oracle 11.2 Express Edition. I haven’t tested in Standard Edition yet, but I guess that they won’t happen either.
Log Size Checkpoints
One of the challenges with having worked with Oracle databases for so many years is the amount of “unlearning” that is necessary with each new version. It doesn’t help that Oracle’s staff face this challenge too, when maintaining documentation.
Oracle’s documentation (eg 11.2 Reference manual and MOS Note 147468.1 Checkpoint Tuning and Troubleshooting Guide), claim that checkpoints are influenced by the size of the smallest redo log:
A checkpoint occurs at every log switch. If a previous checkpoint is already in progress, the checkpoint forced by the log switch will override the current checkpoint. This necessitates well-sized redo logs to avoid unnecessary checkpoints as a result of frequent log switches.
The lag between the incremental checkpoint target and the log tail is also limited by 90% of the smallest online log file size. This makes sure that in most cases log switch would not need to wait for checkpoint.
Recovery I/O can also be limited by setting the LOG_CHECKPOINT_TIMEOUT parameter or by the size specified for the smallest redo log.
LOG_FILE_SIZE_REDO_BLKS Maximum number of redo blocks required to guarantee that a log switch does not occur before the checkpoint completes.
I believe these are outdated statements.
An incremental checkpoint is triggered when the checkpoint position lags behind the redo log tail by V$INSTANCE_RECOVERY.LOG_FILE_SIZE_REDO_BLKS. This mechanism has the objective to avoid waiting for thread checkpoints when all the redo logs are full or active, but it is no longer based on the size of the smallest redo log. In the databases I’ve examined, the threshold is calculated by adding the size of all the redo logs except the largest one, then multiplying by 0.81 (in 11.2, or 0.9 in 10.2). The parameter controlling the multiplier is _target_rba_max_lag_percentage.
This change in behaviour is consistent with the evolution of the lazy DBWR.
Unfortunately, some environments may fill as little as an eighth of each redo log, which circumvents this safety mechanism. (See this post).