Redo logs can be switched before they are full, which complicates checkpoint tuning and the choice of redo log size.

One cause is redo log space that is reserved by redo strand buffers but never used. When one strand buffer is full and cannot reserve more space in the current redo log, a log switch is triggered. If the other strand buffers hold little redo at that point, the space reserved for them in the log is wasted.
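The effect can be illustrated with a toy model. It is a simplified sketch, not Oracle's algorithm. It assumes the log is split evenly among the strands and that redo arrives at each strand in proportion to a weight. The function name `fill_until_switch` and the weights are hypothetical. The busiest strand fills its share first and triggers the switch, so the fraction of the log actually used depends on how unevenly the redo is spread.

```python
"""Toy model of redo log space reserved by redo strands (simplified, hypothetical)."""


def fill_until_switch(log_mb: float, strand_count: int, weights: list) -> float:
    """Return the fraction of the log used when the first strand's share fills.

    ASSUMPTIONS: the log is divided evenly among the strands, and redo
    arrives at each strand in proportion to its weight.
    """
    if len(weights) != strand_count:
        raise ValueError("need one weight per strand")
    reserved_per_strand = log_mb / strand_count
    busiest_share = max(weights) / sum(weights)  # share of redo on the busiest strand
    # The busiest strand exhausts its reservation first, which triggers the switch;
    # whatever the other strands have not used of their reservations is wasted.
    redo_at_switch = reserved_per_strand / busiest_share
    return redo_at_switch / log_mb


if __name__ == "__main__":
    one_busy = [1] + [0] * 7   # a single active strand out of eight
    all_busy = [1] * 8         # redo spread evenly over eight strands
    print(fill_until_switch(100, 8, one_busy))  # 0.125 (an eighth full)
    print(fill_until_switch(100, 8, all_busy))  # 1.0
```

With a single busy strand out of eight, the log switches when only an eighth full; with perfectly even activity the whole log is used.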

When there are few concurrent transactions, only one public strand may be active (related parameters: _log_parallelism_dynamic and _log_parallelism_max).

If flashback database (or supplemental logging) is enabled, private redo strand buffers may be allocated but not used. (The SGA is allocated before the instance is opened, so these memory structures must be available in case flashback is disabled.) Guaranteed restore points also disable private redo strands. In these cases, each of these redo strands reserves space that will never be used.

To see the size of each public strand buffer, look at the first, and largest, sizes in X$KCRFSTRAND, e.g.:

select strand_size_kcrfa from x$kcrfstrand ;

To see the memory reserved for private strand buffers, check V$SGASTAT:

select bytes from v$sgastat where name = 'private strands';

Note that if the total redo buffer size is larger than each redo log, the redo log is divided by the number of public strand buffers. As a result, the redo logs may switch even before a public redo buffer is full. For example, 8 public redo strand buffers of 16MB each (128MB in total) with 50MB redo log files could result in archive logs of only 6.25MB (50MB / 8).
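The rule in the previous paragraph reduces to a small calculation. This is a minimal sketch assuming only one public strand is busy. The function name `expected_archive_mb` is hypothetical. When the total buffer does not exceed the log, the rule above does not apply, so the sketch returns `None` rather than guessing a size.

```python
from typing import Optional


def expected_archive_mb(log_mb: float, strand_mb: float, public_strands: int) -> Optional[float]:
    """Approximate archive log size when a single public strand is busy.

    Applies the rule above: if the total public strand buffer size exceeds
    the redo log size, the log is divided by the number of public strands.
    Returns None when that rule does not apply (this sketch makes no prediction).
    """
    total_buffer_mb = strand_mb * public_strands
    if total_buffer_mb > log_mb:
        return log_mb / public_strands
    return None


if __name__ == "__main__":
    # The example from the text: 8 x 16MB strands, 50MB redo logs.
    print(expected_archive_mb(50, 16, 8))  # 6.25
```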

Refer to MOS note 1356604.1 and bugs 9272059 and 10354739. (The bug has a patch that improves the diagnostic information, and this diagnostic information will be part of 12.1.)

Real Example

I recently investigated a performance problem on an Enterprise Edition instance with default values for the checkpoint-related parameters. The redo logs switched when they were only an eighth full (12.5MB out of 100MB). Because so few redo blocks were used in each log, the incremental checkpoint threshold based on redo log size was never reached (see this post). Whenever all of the redo logs filled in less than five minutes (the self-tune checkpoint threshold), no incremental checkpoints were triggered, and all sessions had to wait for a thread checkpoint to complete. The thread checkpoint wrote all dirty buffers to disk, eliminating the checkpoint lag, so no matter how long this high rate of redo generation continued, the time-based and volume-based thresholds were never reached. Every cycle through the redo log groups resulted in delays while the lazy DBWR did all its work at once.
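The five-minute condition can be checked with simple arithmetic: the time to cycle through every redo log group is the number of groups times the redo actually written per log, divided by the redo generation rate. This is a sketch under stated assumptions. The 12.5MB and 100MB figures and the five-minute window come from the text; the group count (4) and redo rate (1MB/s) are hypothetical values, and the function names are illustrative. It shows how hollow logs shrink the cycle time from above the window to well inside it.

```python
# From the text: five-minute self-tune window, 12.5MB used of each 100MB log.
# HYPOTHETICAL: 4 redo log groups and 1 MB/s of redo generation.
SELF_TUNE_WINDOW_S = 5 * 60


def cycle_seconds(groups: int, used_mb_per_log: float, redo_mb_per_s: float) -> float:
    """Seconds needed to switch through every redo log group once."""
    return groups * used_mb_per_log / redo_mb_per_s


def cycles_inside_window(groups: int, used_mb_per_log: float, redo_mb_per_s: float) -> bool:
    """True when the whole set of redo logs is cycled faster than the window."""
    return cycle_seconds(groups, used_mb_per_log, redo_mb_per_s) < SELF_TUNE_WINDOW_S


if __name__ == "__main__":
    for used in (12.5, 100.0):  # hollow logs vs. completely filled logs
        secs = cycle_seconds(4, used, 1.0)
        inside = cycles_inside_window(4, used, 1.0)
        print(f"{used:>5} MB per log: full cycle in {secs:.0f}s, inside window: {inside}")
```

With these made-up inputs, hollow logs cycle in 50 seconds, while filled logs would take 400 seconds.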

The contributing factors were CPU_COUNT defaulting to 128 and a large SGA, which together made the derived redo buffer size (140MB) larger than each redo log. Since the total redo buffer size was bigger than the redo logs, the redo log size was divided by the eight public redo strand buffers, giving the approximate size expected for the archive logs (100MB / 8 = 12.5MB, assuming low concurrency among sessions changing data). This database was in flashback mode, and most of the changes were generated by a single session.
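The chain from CPU_COUNT to archive log size can be reproduced as follows. This sketch rests on a loudly labeled ASSUMPTION: the default number of public strands is taken as one per 16 CPUs, with a minimum of 2. That is a commonly cited default and is consistent with 128 CPUs giving eight strands here, but it should be checked against _log_parallelism_max on your own version. The function names are hypothetical.

```python
from typing import Optional


def default_public_strands(cpu_count: int) -> int:
    # ASSUMPTION: one public strand per 16 CPUs, minimum 2 (commonly cited
    # default, not confirmed by this post; verify _log_parallelism_max).
    return max(2, cpu_count // 16)


def divided_log_mb(log_mb: float, redo_buffer_mb: float, public_strands: int) -> Optional[float]:
    """Log size divided by public strands when the redo buffer exceeds the log."""
    if redo_buffer_mb > log_mb:
        return log_mb / public_strands
    return None  # the division rule does not apply


if __name__ == "__main__":
    strands = default_public_strands(128)  # CPU_COUNT from the example
    print(strands)                         # 8
    print(divided_log_mb(100, 140, strands))  # 12.5 (100MB logs, 140MB buffer)
```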

Possible workarounds/tunables for controlling archive log size or reducing the impact on checkpoints:

  1. Use larger and/or more redo logs.
  2. Set CPU_COUNT to a lower number, such as the number of cores (rather than virtual CPUs).
  3. Set the LOG_BUFFER parameter to force a smaller size.
  4. Set fast_start_mttr_target (or log_checkpoint_interval in Standard Edition) to a low value to trigger incremental checkpoints.
  5. Set _log_parallelism_dynamic=false to spread activity across the public strands, filling redo logs more and triggering size based incremental checkpoints.
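Workarounds 1 to 3 can be compared using the division rule described earlier. This is a what-if sketch. The baseline (100MB logs, 140MB redo buffer, 8 public strands) comes from the example; every changed value is hypothetical. For workaround 2, the sketch also ASSUMES that CPU_COUNT=32 gives 2 public strands and keeps the buffer at 140MB, although the derived buffer would probably shrink as well. `None` means the division rule no longer applies, not a predicted size. Workarounds 4 and 5 act on checkpointing and strand usage, so they are outside this model.

```python
from typing import Optional


def divided_log_mb(log_mb: float, redo_buffer_mb: float, public_strands: int) -> Optional[float]:
    """Log size divided by public strands when the redo buffer exceeds the log."""
    if redo_buffer_mb > log_mb:
        return log_mb / public_strands
    return None  # division rule does not apply; no size predicted


# (log MB, total redo buffer MB, public strands); changed values are hypothetical.
SCENARIOS = {
    "baseline from the example": (100, 140, 8),
    "1. 200MB redo logs": (200, 140, 8),
    "2. CPU_COUNT=32 (assumed 2 strands)": (100, 140, 2),
    "3. LOG_BUFFER=64MB": (100, 64, 8),
}

if __name__ == "__main__":
    for name, (log_mb, buffer_mb, strands) in SCENARIOS.items():
        print(f"{name}: {divided_log_mb(log_mb, buffer_mb, strands)}")
```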
