NZ DBA: Oracle Database Tuning & Tips

June 11, 2015

The Not-So-Dynamic SGA

Filed under: Oracle 11g,Oracle Database,OVM SPARC,Performance Tuning — Ari @ 6:59 pm

Beware of DISM. It bites.
Pre-12c databases on Solaris use DISM to allow the SGA to be resized without restarting the database.  That is the theory at least, but is it really safe to do so?

Increasing the SGA dynamically can introduce a severe performance issue that can slow the database down so much that it eventually crashes.  That could be a nasty surprise for the DBA who attempts to increase the buffer cache or shared pool on a busy production database that is subjected to a heavier workload than usual.

Recommendation

Don’t use DISM!   Don’t touch that SGA_MAX_SIZE parameter.

Use ISM instead.  The instance will have a greater chance to obtain large pages and the whole SGA will be locked in RAM.

Even if you use DISM but never increase the SGA, you will probably end up with lots of small pages (bad for TLB/TSB misses).  What is the point if you can only safely increase the SGA size by restarting the instance?

The same goes for MEMORY_MAX_TARGET – but AMM is for ASM, trivial databases or “Oracle for Dummies” anyway.
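On Solaris the instance only uses DISM when SGA_MAX_SIZE (or MEMORY_MAX_TARGET) is set larger than the SGA actually in use, so a DISM-free configuration is simply something like this (the 8G values are illustrative):

-- Keep SGA_MAX_SIZE equal to SGA_TARGET (or leave it unset) so the SGA
-- is attached with ISM and locked in RAM.
alter system set sga_target   = 8G scope=spfile sid='*';
alter system set sga_max_size = 8G scope=spfile sid='*';
-- Leave memory_target / memory_max_target unset for the same reason.
-- A restart is needed for sga_max_size to take effect.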

The Details

If you want to know more, read on.

Environment

  • Solaris 10 10/08 s10s_u6wos_07b Zone running inside a Solaris 11.2 LDOM
  • RDBMS 11.2.0.2.11
  • SGA_MAX_SIZE parameter set
  • Veritas Filesystem with Direct I/O enabled

Background

A DR zone running standby databases shared the same LDOM as the test zone.  In order to make the most of the RAM, the standby databases had their SGAs reduced while they were not activated.  The plan was to dynamically increase them just before or after a switchover/failover.  The SGA_MAX_SIZE was set to match the primary databases, but the individual components were lower to make more RAM available for use by the test databases and application processes.

Symptoms

Soon after increasing the shared pool and buffer cache sizes on the standby databases, the database processes started to use a lot of CPU.  The OS showed high system time, mutex contention and context switches.  The output from commands like ps -ef | grep pmon would pause for 30 seconds when it reached the affected database processes.

Cause

When the DISM shared memory segment is created, swap space is reserved for the whole segment.  The granules in use will be allocated and the underlying pages locked in RAM.  The free granules may mostly be unallocated.

col component format a32
select min(BASEADDR), max(BASEADDR), count(1) Granules, sum(a.gransize)/1048576 MB, a.GRANFLAGS, component, a.GRANSTATE
from x$ksmge a, x$kmgsct b
where a.grantype = b.grantype (+)
group by a.GRANFLAGS, component, a.GRANSTATE
order by 1,2
/
MIN(BASEADDR)    MAX(BASEADDR)      GRANULES         MB  GRANFLAGS COMPONENT                        GRANSTATE
---------------- ---------------- ---------- ---------- ---------- -------------------------------- ----------------
0000000400000000 0000000454000000         85       1360          0                                  FREE
0000000455000000 000000045C000000          8        128          4 DEFAULT buffer cache             ALLOC_LOCKED
000000045D000000 000000045D000000          1         16          4 java pool                        ALLOC_LOCKED
000000045E000000 000000045E000000          1         16          4 large pool                       ALLOC_LOCKED
000000045F000000 000000047E000000         32        512          4 shared pool                      ALLOC_LOCKED

pmap -xs on the dism process (run from the LDOM) shows:

         Address     Kbytes        RSS       Anon     Locked Pgsz Mode   Mapped File
0000000380000000      12288      12288          -       8192   4M rwxs-    [ dism shmid=0x7d000027 ]
0000000380C00000       4096       4096          -       4096   8K rwxs-    [ dism shmid=0x7d000027 ]
0000000400000000    1310720          -          -          -    - rwxs-    [ dism shmid=0x52000028 ]
0000000450000000     524288     524288          -     442368   8K rwxs-    [ dism shmid=0x52000028 ]
0000000470000000     151552     151552          -     151552   4M rwxs-    [ dism shmid=0x52000028 ]
0000000479400000      94208      94208          -      94208   8K rwxs-    [ dism shmid=0x52000028 ]
0000000480000000       4096       4096          -          -   4M rwxs-    [ dism shmid=0x52000029 ]

Now add 128MB to the buffer cache dynamically.
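Assuming the component sizes were being set directly rather than via SGA_TARGET, the resize would be something along these lines (the 256M value is illustrative - the original 128MB plus the new 128MB):

alter system set db_cache_size = 256M scope=memory sid='*';

pmap -xs on the same process now shows: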

         Address     Kbytes        RSS       Anon     Locked Pgsz Mode   Mapped File
0000000380000000      12288      12288          -       8192   4M rwxs-    [ dism shmid=0x7d000027 ]
0000000380C00000       4096       4096          -       4096   8K rwxs-    [ dism shmid=0x7d000027 ]
0000000400000000     262144     262144          -          -   8K rwxs-    [ dism shmid=0x52000028 ]
0000000410000000    1048576          -          -          -    - rwxs-    [ dism shmid=0x52000028 ]
0000000450000000     524288     524288          -     442368   8K rwxs-    [ dism shmid=0x52000028 ]
0000000470000000     151552     151552          -     151552   4M rwxs-    [ dism shmid=0x52000028 ]
0000000479400000      94208      94208          -      94208   8K rwxs-    [ dism shmid=0x52000028 ]
0000000480000000       4096       4096          -          -   4M rwxs-    [ dism shmid=0x52000029 ]

Notice that the newly mapped 256MB is resident but is using unlocked 8KB pages.

MIN(BASEADDR)    MAX(BASEADDR)      GRANULES         MB  GRANFLAGS COMPONENT                        GRANSTATE
---------------- ---------------- ---------- ---------- ---------- -------------------------------- ----------------
0000000400000000 0000000407000000          8        128          4 DEFAULT buffer cache             ALLOC
0000000408000000 0000000454000000         77       1232          0                                  FREE
0000000455000000 000000045C000000          8        128          4 DEFAULT buffer cache             ALLOC_LOCKED
000000045D000000 000000045D000000          1         16          4 java pool                        ALLOC_LOCKED
000000045E000000 000000045E000000          1         16          4 large pool                       ALLOC_LOCKED
000000045F000000 000000047E000000         32        512          4 shared pool                      ALLOC_LOCKED

Look for the granule state of “ALLOC” as opposed to “ALLOC_LOCKED”.  (When using ISM instead of DISM, “ALLOC” is okay – the granules are still locked in RAM).

In our case it wasn’t a problem with resource limits preventing the pages from being locked.  Truss shows that the ora_dism process doesn’t even try to lock the granules (no memcntl system calls).  The output from prctl showed that no resource limit had been reached for the project or zone.  Also, all the memory can be locked by restarting the database, or by growing and shrinking the SGA components repeatedly without a restart.
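For reference, the checks were along these lines (the SID, project and zone names are examples):

# Watch the DISM lock helper while the SGA is grown - no memcntl calls appeared:
truss -t memcntl -p $(pgrep -f ora_dism_MYSID)
# Confirm that no locked-memory resource cap was hit:
prctl -n project.max-locked-memory -i project user.oracle
prctl -n zone.max-locked-memory -i zone myzone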

Lockstat showed:

lockstat -A -D 10 -n 100000 -ckWw sleep 1
Adaptive mutex spin: 94006 events in 2.089 seconds (45004 events/sec)

Count indv cuml rcnt     nsec Hottest Lock           Hottest Caller
-------------------------------------------------------------------------------
78401  83%  83% 0.00   591979 0x10012cf59f00         anon_array_enter
 4226   4%  88% 0.00    23909 volgl_spin[16]         voliod_iohandle
 3658   4%  92% 0.00   188558 cpu_info_template_lock[8] read_kstat_data
 2200   2%  94% 0.00    34158 0x4000083e880          volpage_getlist_internal
 1554   2%  96% 0.00   162917 0x10017fff0d00         vdc_send_request
  966   1%  97% 0.00    11957 0x1001bdc31b08         i_ldc_invoke_cb
  868   1%  98% 0.00    56698 cpu_lock[8]            p_online_internal
  207   0%  98% 0.00     9404 0x300e9475100          sfmmu_hblk_alloc
  144   0%  98% 0.00   323738 0x10018fe95080         vdc_handle_cb
  139   0%  98% 0.00    26508 vx_worklist_lk[16]     vx_worklist_process
-------------------------------------------------------------------------------

Adaptive mutex block: 986 events in 2.089 seconds (472 events/sec)

Count indv cuml rcnt     nsec Hottest Lock           Hottest Caller
-------------------------------------------------------------------------------
  748  76%  76% 0.00   316080 0x10012cf59f00         anon_array_enter
  113  11%  87% 0.00 12666391427 cpu_lock[8]            getloadavg
   27   3%  90% 0.00  2371473 cpu_info_template_lock[8] read_kstat_data
   25   3%  93% 0.00   455025 0x10018ffe1b00         vdc_send_request
   23   2%  95% 0.00   318199 0x4000083e880          volpage_freelist
   22   2%  97% 0.00   431246 volgl_spin[16]         voliod_iohandle
    6   1%  98% 0.00 17120585 0x10012cef8140         anon_map_privatepages
    3   0%  98% 0.00   194315 0x1001da4a9df8         vol_nm_debug_log_entry
    3   0%  98% 0.00   305389 0x10016fff0c40         vdc_handle_cb
    2   0%  99% 0.00    90667 0x10025fd7b0d0         vx_rwsleep_rec_lock
-------------------------------------------------------------------------------

Affected database sessions showed stacks similar to this:

echo "::threadlist -v"|mdb -k >threadlist.out
    default_mutex_lock_delay+0x6c()
    mutex_vector_enter+0x17c()
    anon_array_enter+0xbc()
    spt_anon_getpages_task+0x328()
    vmtask_task+0x7c()
    vmtask_run_xjob+0x1bc()
    spt_anon_getpages+0x1f0()
    segspt_dismfault+0x2f4()
    as_lock_mwlist+0x1bc()
    mvec_activate+0x178()
    as_pagelock+0x48()
    vx_bp_lock_pages+0x48()
    vx_dio_physio+0x2e8()
    vx_dio_rdwri+0x668()
    vx_dio_read+0x220()
    vx_read1+0x6f0()
    vx_read+0x404()
    fop_read+0x84()
    lo_read+0x58()
    fop_read+0x84()
    pread+0x224()
    syscall_trap+0xc8()

Increasing the SGA_TARGET value dynamically has the same effect.

Resolution

The quickest fix is to shut down the instance and restart it with the same memory parameters.

Conjecture

The Oracle developers may have purposely avoided locking newly allocated 8k pages immediately to help Solaris coalesce the fragmented memory to create large pages.  However, this coalescing behaviour caused performance issues of its own in Solaris 10.  (Search for set mpss_coalesce_disable=1 and set pg_contig_disable=1 ).

Oracle 12c uses OSM instead of DISM, which is a big clue as to how we should configure older versions.

References

ISM or DISM Misconfiguration can Slow Down Oracle Database Performance (Doc ID 1472108.1)

SuperCluster – OSM ( Optimized Shared Memory ) (12c) is Supported on SuperCluster; DISM ( Dynamic Intimate Shared Memory )(11g) is not (Doc ID 1468297.1)

Oracle Scratchpad post about OSM and 8KB pages

Note that BUG:15794950 is the most common DISM problem referenced in MOS notes, but it isn’t relevant to this OS version.  Also, this case involves performance issues with 8k pages, not large pages.

June 14, 2014

Undone by Undo

The Puzzle

After switching to a new undo tablespace, transactions would fail with ORA-01552.

The Background

During a data warehouse migration using GoldenGate, the target database returned an ORA-1628 error when applying transactions from the source.
The assigned DBA set a couple of initialisation parameters as prescribed by Troubleshooting ORA-1628 – max # extents (32765) reached for rollback segment <SEGMENT_NAME> (Doc ID 1580182.1):

alter system set "_smu_debug_mode" = 4 scope=both sid='*';
alter system set "_rollback_segment_count"=999999 scope=both sid='*';

The second parameter caused this warning:

ORA-30018: Create Rollback Segment failed, USN 32760 is out of range

During the migration, the undo tablespace grew to a ridiculous size.  In an attempt to reclaim terabytes of space, the DBA tried to replace the undo tablespace with a smaller one.  However, after creating a new tablespace and switching to it, transactions failed with:

ORA-01552: cannot use system rollback segment for non-system tablespace...

So, the original and huge undo tablespace was retained for the ‘go-live’ while Oracle Support tried to work out what caused the ORA-01552.

After going live on the new database, ETL performance was disappointing due to the overhead of DDL for management of the undo segments and the domination of the buffer cache by undo blocks.

At first Oracle Support claimed that creating the maximum number of undo segments was beneficial.  However, when pushed, they could not explain exactly how having 32,760 undo segments helped in a database with a low number of concurrent transactions.  How would it prevent hitting the maximum number of extents (ORA-1628)?  What was the cost?

After removing the undo parameters prescribed by 1580182.1, there was no improvement to performance and the ORA-01552 still prevented a switch to a smaller undo tablespace.

The Solution

I started investigating and found that when a new undo tablespace was created, no new rows were inserted into UNDO$ because the limit on the number of undo segments had already been reached:

select ts#,name from ts$ where name like 'UNDO%';

TS# NAME
--- --------
  2 UNDOTBS1
  4 UNDOTBS2
145 UNDOTBS3
break on report
compute sum of segments on report
select max(US#),count(1) segments,ts# from sys.undo$ group by ts#;

  MAX(US#)   SEGMENTS TS#
---------- ---------- ---
     32758      20753   2
     32759      12006   4
         0          1   0
           ----------
sum             32760

The UNDOTBS3 tablespace had no undo segments in it because the database couldn’t create any more!

Once the UNDOTBS1 tablespace was dropped, new undo segments could be created in other undo tablespaces, and the ORA-01552 was avoided.
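For completeness, the cleanup amounted to something like this (tablespace names as per the output above; UNDOTBS3 had already been made the active undo tablespace):

alter system set undo_tablespace = 'UNDOTBS3' scope=both sid='*';
drop tablespace UNDOTBS1 including contents and datafiles;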

I suggested that Oracle Support change the careless recommendations in Doc 1580182.1.  At the time of writing, the note now says:

alter system set "_rollback_segment_count"=1000 scope=spfile;

January 14, 2014

SQL Tuning Advisor and DML

Filed under: Oracle Database,Performance Tuning — Ari @ 5:07 pm

Does the SQL Tuning Advisor auto task tune DML statements?

Sort of; sometimes.  Read on.

The documentation says:

Oracle Database automatically runs the SQL Tuning Advisor on selected high-load SQL statements from the Automatic Workload Repository (AWR) that qualify as tuning candidates…..
<skip some lines>
….. Oracle Database analyzes statistics in the AWR and generates a list of potential SQL statements that are eligible for tuning. These statements include repeating high-load statements that have a significant impact on the system. Only SQL statements that have an execution plan with a high potential for improvement will be tuned. Recursive SQL and statements that have been tuned recently (in the last month) are ignored, as are parallel queries, DMLs, DDLs, and SQL statements with performance problems that are caused by concurrency issues.

However, I was alarmed to see statements like this being run by the SQL Tuning autotasks:

/* SQL Analyze(186,1) */ INSERT INTO TABLE_A SELECT * FROM ...

What would the implications be?  Consider locking, segment growth, redo, triggers, etc.

I could see that the execution plan in the SGA started with:

 0 INSERT STATEMENT
 1  LOAD TABLE CONVENTIONAL

The session also had a DML lock on TABLE_A.

I tested with dbms_sqltune, trying to replicate the situation, but after opening the cursor a couple of times, the report just said:

Type of SQL statement not supported.
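The manual test looked roughly like the following sketch (the SQL_ID and task name are placeholders):

declare
  l_task varchar2(64);
begin
  l_task := dbms_sqltune.create_tuning_task(
              sql_id    => '7h35uxf5uhmm1',   -- placeholder SQL_ID of the INSERT ... SELECT
              task_name => 'TEST_DML_TASK');
  dbms_sqltune.execute_tuning_task(task_name => 'TEST_DML_TASK');
end;
/
select dbms_sqltune.report_tuning_task('TEST_DML_TASK') from dual;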

I logged a Service Request with Oracle Support who said:

…the sql tuning advisor will check the select part only and not the DML part,yes technically it is considered DML but actually internally it will not use the DML but will use the SELECT part…

Another clarification is for such DMLs, the SELECT parts are only used i,e for execution internally and not the DML part and therefore there will be no redo generation or any data update/delete/insert …

I can’t find this treatment of DMLs documented anywhere, so I suggested to Oracle that they correct their documentation and I created this post.

January 9, 2014

Memory vs SQL Loader and Wide Tables

Filed under: Oracle Database — Ari @ 12:33 pm

A customer migrated an ETL application to a new Linux server and found that a SQL Loader job was now failing with a return code of 137.

I couldn’t find anything useful about this sqlldr exit code on the internet, in the documentation or on the Oracle Support site, and there was nothing in the sqlldr logs.

I ran the job myself and noticed that the free memory on the server disappeared very quickly and remained low until the sqlldr process died.  (I didn’t have root access to check the OS message logs, but an exit code of 137 is 128 + signal 9, ie the process was killed with SIGKILL, which is consistent with the Linux OOM killer.)

The table being loaded had more than fifty VARCHAR2(4000) columns and direct path method was used.  Using the default SQLLDR parameter values, the direct load buffer needed over 1GB!
Setting a lower value for COLUMNARRAYROWS in the control file reduced the memory requirement for loading this table and allowed the job to complete successfully.
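That requirement is easy to explain: the default COLUMNARRAYROWS is 5000, and 5000 rows × 50+ columns × 4000 bytes per column is roughly 1GB.  A single OPTIONS line at the top of the control file caps it; the value 500 is only an example (the parameter can also be passed on the sqlldr command line):

OPTIONS (DIRECT=TRUE, COLUMNARRAYROWS=500)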

OVM for SPARC vs TCP

Filed under: OVM SPARC,UNIX / Linux — Ari @ 12:05 pm

When connecting to new OVM for SPARC guests (LDOMs), I found a network problem which does not appear to be a known issue.
This post might be of interest to you if you plan to use a similar platform.
We have a primary control domain and an LDOM (Solaris VM) created on a T4-2 server. Solaris 11.1 is used on the primary domain and the LDOM.
Virtualised network devices are created over link aggregates with VLAN tagging over 10Gb Oracle CNA HBAs in the primary domains and are assigned to the LDOM.
The problem occurs with any outbound traffic on the 10G HBA.

The first symptoms were an inability to transfer large files from Windows using SFTP, and PuTTY SSH or VNC sessions disconnecting frequently and at random.

I used Microsoft Network Monitor to capture the network traffic during these problems and found that acknowledgement packets from Solaris that should have been empty had two extra bytes added, and so the checksum was incorrect.  After a while, these checksum errors and the retransmissions would result in the TCP session being disconnected.

The problem affected multiple protocols, but only from the LDOMs, not the primary domain.

The closest issues I could find related to checksum offloading (where the network card performs the checksum processing to reduce the load on the server’s CPUs).
Disabling checksum offloading on the LDOMs for the 10Gb CNA cards successfully worked around the problem.  (The 1Gb cards were not affected).
For example, add

set ip:dohwcksum=0x0

to /etc/system in the LDOM and reboot.
I’ll update this post as Oracle Support provide more information.

OVM for SPARC vs CPU Metrics

A couple of months ago, I created an Oracle database cluster (11.2.0.4.0 RAC) on a Solaris 11 LDOM.

I found that the CPU time statistics in the database are all zero, (eg in V$SYS_TIME_MODEL, V$SYSSTAT and AWR reports).
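A quick check (these values were all zero on the affected LDOM):

select stat_name, value
from   v$sys_time_model
where  stat_name in ('DB CPU', 'background cpu time');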

I couldn’t find any notes or bugs that matched, but Oracle Support have just created Solaris 11.1.15.4.0 which they say will address this issue.  (An OS bug).

October 18, 2013

VxFS File Fragmentation

Filed under: VxFS — Ari @ 10:14 pm

This post might interest you if your organisation has Oracle databases running on Veritas filesystems (VxFS).

What Does Veritas File Fragmentation Mean?

The extents allocated to a datafile are not contiguous in the filesystem.  If the extents are too small and scattered then large or sequential I/Os (eg, full tablescans) might be inefficient.

Since learning how to avoid UFS Fragmentation for Oracle datafiles, I was able to recognise a similar problem with a database on a Veritas volume/filesystem (Solaris 10).  Many of the notes I made about UFS Fragmentation apply to Veritas fragmentation, so check that page for more information.

A Real World Example

The customer’s DBA had complained that the performance of a training database was significantly less than a test database; both being (RMAN) duplicates.  There were very few differences between the two, and they used the same storage (XIV), but the difference had been narrowed down to I/O performance.

Here is an example, using a full scan of a big table, showing the difference between ORATRN and ORATST:

alter session set db_file_multiblock_read_count=128;
select /*+ full(t) parallel(t) nocache */ count(1) from A_BIG_TABLE t;

Output from vxstat -i 1

 OPERATIONS          BLOCKS           AVG TIME(ms)
 TYP NAME              READ     WRITE      READ     WRITE   READ  WRITE
Fri Oct 18 11:09:36 2013
 vol oratrn           2224         5    429840        80  17.46   2.20
Fri Oct 18 11:09:37 2013
 vol oratrn           1890         4    355216        36  20.78   1.32
Fri Oct 18 11:09:38 2013
 vol oratrn           2008         4    313488        34  19.12   1.11
Fri Oct 18 11:09:39 2013
 vol oratrn           2325         7    328352        84  15.96   1.51
Fri Oct 18 11:09:40 2013
 vol oratrn           1960         7    310064        84  19.62   1.33
Fri Oct 18 11:09:41 2013
 vol oratrn           1895         7    320480        88  19.63   2.62

Fri Oct 18 10:48:45 2013
 vol oratst           1157         2   2107904         8  26.73   3.07
Fri Oct 18 10:48:46 2013
 vol oratst           1178         2   2154944         4  27.32   9.16
Fri Oct 18 10:48:47 2013
 vol oratst           1167         3   2172352        48  26.82   1.80
Fri Oct 18 10:48:48 2013
 vol oratst           1279         4   2346320         8  25.14   7.13
Fri Oct 18 10:48:49 2013
 vol oratst           1223         2   2219488        32  25.97   9.91

The throughput is much higher in the ORATST filesystem, and the average I/O size is larger.

Truss of the database session showed pread system calls of 1MB (128x8KB DB blocks) for both databases, and the Veritas settings were the same.  (The XIV distributes space in 1MB chunks, and 1MB is a common maximum for I/Os, so this should be an efficient value).

I suspected the files were more fragmented in the ORATRN volume than in the ORATST volume, because the I/Os were being broken up into smaller pieces despite the environments being almost identical.  I wasn’t sure, though: Veritas is an extent-based filesystem, so I wondered whether database files could become fragmented in the same way as files in UFS can.
The customer’s DBA confirmed that the ORATRN database was created with several RMAN channels, while the ORATST database was created with only one.

How can we see the file fragmentation?  I found a standard Veritas command called fsmap.

$: find . | /opt/VRTSvxfs/sbin/fsmap -aH -
                    Volume  Extent Type     File Offset     Extent Size     File
                    oratrn         Data         0 Bytes       264.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratrn         Data       264.00 KB       128.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratrn         Data       392.00 KB       128.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratrn         Data       520.00 KB       128.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratrn         Data       648.00 KB       384.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratrn         Data         1.01 MB       128.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratrn         Data         1.13 MB       304.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratrn         Data         1.43 MB        80.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratrn         Data         1.51 MB       128.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratrn         Data         1.63 MB       384.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratrn         Data         2.01 MB       128.00 KB     ./TWO_GB_DATAFILE.DBF
 ... etc

$: find . | /opt/VRTSvxfs/sbin/fsmap -aH -
                    Volume  Extent Type     File Offset     Extent Size     File
                    oratst         Data         0 Bytes         8.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         8.00 KB        32.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data        40.00 KB        64.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       104.00 KB         5.31 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         5.41 MB        24.81 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data        30.23 MB        25.88 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data        56.10 MB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data        72.10 MB        19.25 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data        91.35 MB        27.69 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       119.04 MB        23.38 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       142.41 MB        32.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       174.41 MB        17.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       191.41 MB        34.75 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       226.16 MB        42.19 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       268.35 MB        17.94 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       286.29 MB        38.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       324.29 MB        32.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       356.29 MB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       372.29 MB        32.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       404.29 MB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       420.29 MB        18.31 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       438.60 MB        16.25 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       454.85 MB        21.75 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       476.60 MB        17.38 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       493.98 MB        23.75 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       517.73 MB        23.50 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       541.23 MB        20.12 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       561.35 MB        16.25 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       577.60 MB        16.69 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       594.29 MB        16.12 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       610.41 MB        18.12 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       628.54 MB        22.88 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       651.41 MB        34.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       685.41 MB        43.06 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       728.48 MB        45.12 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       773.60 MB        23.88 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       797.48 MB        24.25 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       821.73 MB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       837.73 MB        18.12 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       855.85 MB        22.12 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       877.98 MB        64.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       941.98 MB        96.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.01 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.03 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.04 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.06 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.08 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.09 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.11 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.12 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.14 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.15 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.17 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.19 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.20 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.22 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.23 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.25 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.26 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.28 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.29 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.31 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.33 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.34 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.36 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.37 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.39 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.40 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.42 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.44 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.45 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.47 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.48 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.50 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.51 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.53 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.54 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.56 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.58 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.59 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.61 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.62 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.64 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.65 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.67 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.69 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.70 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.72 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.73 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.75 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.76 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.78 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.79 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.81 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.83 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.84 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.86 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.87 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.89 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.90 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.92 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.94 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.95 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.97 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.98 GB        10.02 MB     ./TWO_GB_DATAFILE.DBF

After copying the ORATRN files to a new filesystem using rsync, the extent sizes were up to 256MB each, eg:

$: find . | /opt/VRTSvxfs/sbin/fsmap -aH -
                   oranew          Data         0 Bytes        24.00 KB     ./TWO_GB_DATAFILE.DBF
                   oranew          Data        24.00 KB       246.62 MB     ./TWO_GB_DATAFILE.DBF
                   oranew          Data       246.65 MB       256.00 MB     ./TWO_GB_DATAFILE.DBF
                   oranew          Data       502.65 MB       256.00 MB     ./TWO_GB_DATAFILE.DBF
... etc

Also see Symantec’s note on this issue.

Update

During another project, I had the opportunity to examine how RMAN caused this fragmentation.

The database version was 9.2, so RMAN wrote from a 256KB output/disk buffer using four asynchronous threads.  The average VxFS extent size of the restored datafiles was 256KB.

Later versions of Oracle use larger and configurable buffers, so the problem will not be as severe.

We are going to try to use the VxFS tunables “initial_extent_size” and “max_seqio_extent_size” to reduce the fragmentation during the next restore test.
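A sketch of what we intend to try; the mount point is an example, and the values and units should be checked against vxtunefs(1M) for the installed VxFS version:

vxtunefs -o initial_extent_size=8192 /oradata
vxtunefs -o max_seqio_extent_size=32768 /oradata
# The same settings can go in /etc/vx/tunefstab so they survive a remount.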

June 30, 2013

Row Migration vs Datapump

Symptoms

Datapump takes a long time to export a small table, waiting for db file sequential read events many more times than there are blocks in the table.
The table doesn’t have any LOBs.

This problem has been seen in 10.2, 11.2 and 12.1.  I couldn’t find a bug for it; the closest thing to an acknowledgement of the problem is MOS note 1086414.1.

Cause

The problematic table has a high percentage of migrated rows and Datapump is using the direct path method to export the table, possibly because of the table’s small size.

A trace will show that direct path reads are used to get the first table blocks, but then each migrated row will be fetched by executing a single-block sequential read.  Blocks will be read multiple times, once for each row migrated into them.  Subsequent reads are not satisfied from the cache, so each migrated row results in a physical block read!

In the trace file for a block dump, a migrated row piece has no Header flag, just the First and Last flags, eg:

fb: ----FL--

Workarounds

  1. Defragment the table’s rows with a CTAS or a table move (see the sketch after this list).
  2. Use expdp with access_method=external_table
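A minimal sketch of the first workaround, reusing the test table from the steps below (the index name is hypothetical):

alter table test_tab move;
-- Any indexes are left UNUSABLE by the move and must be rebuilt, eg:
alter index test_tab_idx rebuild;
-- Confirm the migrated rows are gone:
truncate table chained_rows;
analyze table test_tab list chained rows;
select count(1) from chained_rows;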

Test Steps

The problem can be replicated by following these steps:

create table test_tab (col1 number, col2 varchar2(10));

begin
  for i in 1..1000000 loop
    insert into test_tab values (i, null);
  end loop;
  commit;
  -- Growing every row forces many of them to migrate out of their original blocks:
  update test_tab set col2 = 'xxxxx';
  commit;
end;
/

-- Create the CHAINED_ROWS table and confirm that rows have migrated:
@?/rdbms/admin/utlchain
analyze table test_tab list chained rows;
select count(1) from chained_rows;

expdp / directory=DATA_PUMP_DIR tables=test_tab dumpfile=slow.dmp

expdp / directory=DATA_PUMP_DIR tables=test_tab access_method=external_table dumpfile=fast.dmp

September 9, 2012

Diagnostic Directory Disaster

Have you thought about where your database takes a dump?

It is common to install the Oracle software on internal disks and place the database files on external storage.  The DIAGNOSTIC_DEST (and Automatic Diagnostic Repository) is stored under $ORACLE_BASE or $ORACLE_HOME by default, so without further consideration this ends up on local disk too.

Often there are only a couple of internal disks, which are mirrored.  This should be adequate for the OS and Oracle binaries, but not for operations with sustained high IOPS.

In some situations, many Oracle processes may write a lot of data to different files in the ADR concurrently, which may overwhelm a single internal disk.

I’ve seen an extreme example on a Solaris server with multiple test databases sharing a single (mirrored) internal disk and UFS filesystems for their ADR, Oracle binaries, swap space and OS files.  When one highly parallel statement started hitting ORA-600 errors, each parallel slave process started writing trace and core files to the ADR.  The server became very sluggish and many OS commands were affected.  A shutdown abort of the affected instance left behind the parallel slave processes, and even kill -9 wouldn’t terminate them.  The internal disks were swamped; I/O service times were reaching 40 seconds while the disk heads thrashed back and forth!  This situation continued for 30 minutes before all the processes finished writing and finally terminated.

Enabling SQL tracing for an instance could also generate more I/O than the internal disk(s) can cope with.

If the ADR / DIAGNOSTIC_DEST were on SAN storage (multiple spindles, large cache), the spike in I/O activity would be better accommodated.

For Oracle clusters there is another reason to avoid internal/local disk for the ADR: it is easier to manage RAC instances with their logs (alert, listener, etc) on shared (and highly available) storage, avoiding the need to log into every server.

The issue could be mitigated by setting a low value for MAX_DUMP_FILE_SIZE, but in production environments you may want complete trace files, so that they can be sent to Oracle Support without having to reproduce the issue (eg internal errors causing an instance crash or data corruption).
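Both DIAGNOSTIC_DEST and MAX_DUMP_FILE_SIZE are dynamic parameters, so relocating the ADR or capping trace sizes is straightforward; the path and size below are only examples:

alter system set diagnostic_dest = '/san_storage/oracle' scope=both;
alter system set max_dump_file_size = '100M' scope=both;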

Pointing DIAGNOSTIC_DEST at a ZFS filesystem might also lessen the impact on the internal disks, because of the sequential nature of ZFS writes.

August 2, 2012

Solaris Cluster vs Network Speed

Filed under: Solaris Cluster,UNIX / Linux — Ari @ 1:56 pm

While investigating rsync/ssh performance, I discovered that our Solaris Clusters had just over half the expected throughput on their gigabit NICs.

This was caused by the cluster/streams-related module clhbsndr being automatically pushed onto each public NIC via /etc/iu.ap.

Symptoms:

  • Round trip time via ping was higher than non-cluster servers on the same network
  • The throughput measured with iperf (and other tools) was around 400-600 Mbit/s instead of 940 Mbit/s
  • The clhbsndr module is loaded for the public interfaces, eg
ifconfig e1000g1 modlist

0 arp
1 ip
2 clhbsndr
3 e1000g

Status

We have confirmed that bug 6625886 is the cause and are now waiting for Oracle to say if the module can safely be removed from the public interfaces of a Solaris 10 cluster, eg:

ifconfig e1000g1 modremove clhbsndr@2