Beware of DISM. It bites.
Pre-12c databases on Solaris use DISM (Dynamic Intimate Shared Memory) to allow the SGA to be resized without restarting the database.  That is the theory at least, but is it really safe to do so?

Increasing the SGA dynamically can introduce a performance problem severe enough to slow the database to the point of crashing.  That could be a nasty surprise for the DBA who attempts to increase the buffer cache or shared pool on a busy production database that is under a heavier workload than usual.

Recommendation

Don’t use DISM!   Don’t touch that SGA_MAX_SIZE parameter.

Use ISM (Intimate Shared Memory) instead.  The instance will have a better chance of obtaining large pages, and the whole SGA will be locked in RAM.
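As a minimal sketch (the instance name ORCL and the 4G size are hypothetical), running under ISM simply means sizing the SGA statically and leaving SGA_MAX_SIZE unset; pmap will then report the segment as ism rather than dism:

sqlplus / as sysdba <<'EOF'
-- Hypothetical size: with SGA_MAX_SIZE removed it defaults to SGA_TARGET,
-- and the instance creates the SGA as ISM rather than DISM.
alter system reset sga_max_size scope=spfile sid='*';
alter system set sga_target = 4G scope=spfile sid='*';
shutdown immediate
startup
EOF

# After the restart the SGA segment should show up as "ism" and fully locked:
pmap -xs $(pgrep -f ora_pmon_ORCL) | grep -i shmid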

Even if you use DISM but never increase the SGA, you will probably end up with lots of small pages, which drives up TLB/TSB misses.  And what is the point, if the only safe way to increase the SGA size is to restart the instance anyway?
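You can check which page sizes actually back the SGA with pmap.  A sketch, again assuming a hypothetical instance called ORCL:

# List the hardware page sizes the platform supports:
pagesize -a

# Summarize the page sizes backing the shared memory segments:
pmap -xs $(pgrep -f ora_pmon_ORCL) | awk '/shmid/ { kb[$6] += $2 } END { for (p in kb) print p, kb[p], "KB" }'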

The same goes for MEMORY_MAX_TARGET – but AMM is for ASM, trivial databases or “Oracle for Dummies” anyway.

The Details

If you want to know more, read on.

Environment

  • Solaris 10 10/08 s10s_u6wos_07b Zone running inside a Solaris 11.2 LDOM
  • RDBMS 11.2.0.2.11
  • SGA_MAX_SIZE parameter set
  • Veritas Filesystem with Direct I/O enabled

Background

A DR zone running standby databases shared the same LDOM as the test zone.  In order to make the most of the RAM, the standby databases had their SGAs reduced while they were not activated.  The plan was to dynamically increase them just before or after a switchover/failover.  The SGA_MAX_SIZE was set to match the primary databases, but the individual components were lower to make more RAM available for use by the test databases and application processes.

Symptoms

Soon after increasing the shared pool and buffer cache sizes on the standby databases, the database processes started to use a lot of CPU.  The OS showed high system time, mutex contention and context switches.  The output from commands like ps -ef | grep pmon would pause for 30 seconds when it reached the affected database processes.
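The OS-level symptoms are easy to see with standard Solaris tools; a sketch of what to watch (the sampling intervals are arbitrary):

# smtx = spins on mutexes, sys = %system time
mpstat 5

# Per-thread microstates: LCK and LAT climb on the affected server processes
prstat -mL 5

# cs = context switches per second
vmstat 5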

Cause

When the shared segment using DISM is created, swap space is reserved for the whole segment.  The granules in use are allocated and their underlying pages locked in RAM, while the free granules remain mostly unallocated.  The granule map can be queried from the fixed tables x$ksmge and x$kmgsct:

col component format a32
select min(baseaddr), max(baseaddr), count(1) granules,
       sum(a.gransize)/1048576 mb,
       a.granflags, component, a.granstate
from   x$ksmge a, x$kmgsct b
where  a.grantype = b.grantype (+)
group  by a.granflags, component, a.granstate
order  by 1, 2
/
MIN(BASEADDR)    MAX(BASEADDR)      GRANULES         MB  GRANFLAGS COMPONENT                        GRANSTATE
---------------- ---------------- ---------- ---------- ---------- -------------------------------- ----------------
0000000400000000 0000000454000000         85       1360          0                                  FREE
0000000455000000 000000045C000000          8        128          4 DEFAULT buffer cache             ALLOC_LOCKED
000000045D000000 000000045D000000          1         16          4 java pool                        ALLOC_LOCKED
000000045E000000 000000045E000000          1         16          4 large pool                       ALLOC_LOCKED
000000045F000000 000000047E000000         32        512          4 shared pool                      ALLOC_LOCKED

Running pmap -xs against the ora_dism process from the LDOM shows:

         Address     Kbytes        RSS       Anon     Locked Pgsz Mode   Mapped File
0000000380000000      12288      12288          -       8192   4M rwxs-    [ dism shmid=0x7d000027 ]
0000000380C00000       4096       4096          -       4096   8K rwxs-    [ dism shmid=0x7d000027 ]
0000000400000000    1310720          -          -          -    - rwxs-    [ dism shmid=0x52000028 ]
0000000450000000     524288     524288          -     442368   8K rwxs-    [ dism shmid=0x52000028 ]
0000000470000000     151552     151552          -     151552   4M rwxs-    [ dism shmid=0x52000028 ]
0000000479400000      94208      94208          -      94208   8K rwxs-    [ dism shmid=0x52000028 ]
0000000480000000       4096       4096          -          -   4M rwxs-    [ dism shmid=0x52000029 ]

Now grow the buffer cache by 128MB while the instance is running.  A minimal sketch of the resize, assuming the cache starts at the 128MB shown in the granule map above:
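sqlplus / as sysdba <<'EOF'
-- Hypothetical sizes: double the DEFAULT buffer cache from 128M to 256M
-- while the instance is up, which forces new DISM granules to be mapped.
alter system set db_cache_size = 256M scope=memory;
EOF

After the resize, pmap -xs on the same process shows: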

         Address     Kbytes        RSS       Anon     Locked Pgsz Mode   Mapped File
0000000380000000      12288      12288          -       8192   4M rwxs-    [ dism shmid=0x7d000027 ]
0000000380C00000       4096       4096          -       4096   8K rwxs-    [ dism shmid=0x7d000027 ]
0000000400000000     262144     262144          -          -   8K rwxs-    [ dism shmid=0x52000028 ]
0000000410000000    1048576          -          -          -    - rwxs-    [ dism shmid=0x52000028 ]
0000000450000000     524288     524288          -     442368   8K rwxs-    [ dism shmid=0x52000028 ]
0000000470000000     151552     151552          -     151552   4M rwxs-    [ dism shmid=0x52000028 ]
0000000479400000      94208      94208          -      94208   8K rwxs-    [ dism shmid=0x52000028 ]
0000000480000000       4096       4096          -          -   4M rwxs-    [ dism shmid=0x52000029 ]

Notice that the newly mapped 256MB is resident but is using unlocked 8KB pages.

MIN(BASEADDR)    MAX(BASEADDR)      GRANULES         MB  GRANFLAGS COMPONENT                        GRANSTATE
---------------- ---------------- ---------- ---------- ---------- -------------------------------- ----------------
0000000400000000 0000000407000000          8        128          4 DEFAULT buffer cache             ALLOC
0000000408000000 0000000454000000         77       1232          0                                  FREE
0000000455000000 000000045C000000          8        128          4 DEFAULT buffer cache             ALLOC_LOCKED
000000045D000000 000000045D000000          1         16          4 java pool                        ALLOC_LOCKED
000000045E000000 000000045E000000          1         16          4 large pool                       ALLOC_LOCKED
000000045F000000 000000047E000000         32        512          4 shared pool                      ALLOC_LOCKED

Look for the granule state of “ALLOC” as opposed to “ALLOC_LOCKED”.  (When using ISM instead of DISM, “ALLOC” is okay – the granules are still locked in RAM).
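As a quick health check, count the granules in each state; under DISM, ALLOC granules appearing after a dynamic grow are the warning sign (a sketch, run as SYSDBA):

select granstate, count(*) granules
from   x$ksmge
group  by granstate
order  by 1
/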

In our case it wasn’t a problem with resource limits preventing the pages from being locked.  Truss shows that the ora_dism process doesn’t even try to lock the granules (there are no memcntl system calls).  The output from prctl showed that no resource limit had been reached for the project or zone.  Also, all of the memory can be locked by restarting the database, or by repeatedly growing and shrinking the SGA components without a restart.
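That observation is easy to reproduce: watch the daemon for locking calls with truss while a component is grown (the process lookup assumes the standard ora_dism_<SID> daemon name):

# No memcntl() calls appear while the new granules are being touched:
truss -t memcntl -p $(pgrep -f ora_dism)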

Lockstat showed:

lockstat -A -D 10 -n 100000 -ckWw sleep 1
Adaptive mutex spin: 94006 events in 2.089 seconds (45004 events/sec)

Count indv cuml rcnt     nsec Hottest Lock           Hottest Caller
-------------------------------------------------------------------------------
78401  83%  83% 0.00   591979 0x10012cf59f00         anon_array_enter
 4226   4%  88% 0.00    23909 volgl_spin[16]         voliod_iohandle
 3658   4%  92% 0.00   188558 cpu_info_template_lock[8] read_kstat_data
 2200   2%  94% 0.00    34158 0x4000083e880          volpage_getlist_internal
 1554   2%  96% 0.00   162917 0x10017fff0d00         vdc_send_request
  966   1%  97% 0.00    11957 0x1001bdc31b08         i_ldc_invoke_cb
  868   1%  98% 0.00    56698 cpu_lock[8]            p_online_internal
  207   0%  98% 0.00     9404 0x300e9475100          sfmmu_hblk_alloc
  144   0%  98% 0.00   323738 0x10018fe95080         vdc_handle_cb
  139   0%  98% 0.00    26508 vx_worklist_lk[16]     vx_worklist_process
-------------------------------------------------------------------------------

Adaptive mutex block: 986 events in 2.089 seconds (472 events/sec)

Count indv cuml rcnt     nsec Hottest Lock           Hottest Caller
-------------------------------------------------------------------------------
  748  76%  76% 0.00   316080 0x10012cf59f00         anon_array_enter
  113  11%  87% 0.00 12666391427 cpu_lock[8]            getloadavg
   27   3%  90% 0.00  2371473 cpu_info_template_lock[8] read_kstat_data
   25   3%  93% 0.00   455025 0x10018ffe1b00         vdc_send_request
   23   2%  95% 0.00   318199 0x4000083e880          volpage_freelist
   22   2%  97% 0.00   431246 volgl_spin[16]         voliod_iohandle
    6   1%  98% 0.00 17120585 0x10012cef8140         anon_map_privatepages
    3   0%  98% 0.00   194315 0x1001da4a9df8         vol_nm_debug_log_entry
    3   0%  98% 0.00   305389 0x10016fff0c40         vdc_handle_cb
    2   0%  99% 0.00    90667 0x10025fd7b0d0         vx_rwsleep_rec_lock
-------------------------------------------------------------------------------

Affected database sessions showed stacks similar to this:

echo "::threadlist -v"|mdb -k >threadlist.out
    default_mutex_lock_delay+0x6c()
    mutex_vector_enter+0x17c()
    anon_array_enter+0xbc()
    spt_anon_getpages_task+0x328()
    vmtask_task+0x7c()
    vmtask_run_xjob+0x1bc()
    spt_anon_getpages+0x1f0()
    segspt_dismfault+0x2f4()
    as_lock_mwlist+0x1bc()
    mvec_activate+0x178()
    as_pagelock+0x48()
    vx_bp_lock_pages+0x48()
    vx_dio_physio+0x2e8()
    vx_dio_rdwri+0x668()
    vx_dio_read+0x220()
    vx_read1+0x6f0()
    vx_read+0x404()
    fop_read+0x84()
    lo_read+0x58()
    fop_read+0x84()
    pread+0x224()
    syscall_trap+0xc8()

Increasing the SGA_TARGET value dynamically has the same effect.

Resolution

The quickest fix is to shut down the instance and restart it with the same memory parameters.
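A sketch of that workaround, with a check afterwards that the whole segment is locked again (the process lookup assumes the standard ora_dism_<SID> daemon name):

sqlplus / as sysdba <<'EOF'
shutdown immediate
startup
EOF

# Every granule should now be ALLOC_LOCKED, and pmap should show no
# resident-but-unlocked mappings:
pmap -xs $(pgrep -f ora_dism) | grep shmid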

Conjecture

The Oracle developers may have deliberately avoided locking the newly allocated 8KB pages immediately, to give Solaris a chance to coalesce the fragmented memory into large pages.  However, that coalescing behaviour caused performance issues of its own in Solaris 10 (search for set mpss_coalesce_disable=1 and set pg_contig_disable=1).
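For reference, those tunables went into /etc/system on Solaris 10; they are shown here only for context, not as a recommendation, and a reboot is needed for them to take effect:

set mpss_coalesce_disable=1
set pg_contig_disable=1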

Oracle 12c uses OSM (Optimized Shared Memory) instead of DISM, which is a big clue as to how we should configure older versions.

References

ISM or DISM Misconfiguration can Slow Down Oracle Database Performance (Doc ID 1472108.1)

SuperCluster – OSM ( Optimized Shared Memory ) (12c) is Supported on SuperCluster; DISM ( Dynamic Intimate Shared Memory )(11g) is not (Doc ID 1468297.1)

Oracle Scratchpad post about OSM and 8KB pages

Note that BUG:15794950 is the most common DISM problem referenced in MOS notes, but it isn’t relevant to this OS version.  Also, this case involves performance issues with 8KB pages, not large pages.
