NZ DBA: Oracle Database Tuning & Tips

January 14, 2014

SQL Tuning Advisor and DML

Filed under: Oracle Database,Performance Tuning — Ari @ 5:07 pm

Does the SQL Tuning Advisor auto task tune DML statements?

Sort of; sometimes.  Read on.

The documentation says:

Oracle Database automatically runs the SQL Tuning Advisor on selected high-load SQL statements from the Automatic Workload Repository (AWR) that qualify as tuning candidates…..
<skip some lines>
….. Oracle Database analyzes statistics in the AWR and generates a list of potential SQL statements that are eligible for tuning. These statements include repeating high-load statements that have a significant impact on the system. Only SQL statements that have an execution plan with a high potential for improvement will be tuned. Recursive SQL and statements that have been tuned recently (in the last month) are ignored, as are parallel queries, DMLs, DDLs, and SQL statements with performance problems that are caused by concurrency issues.

However, I was alarmed to see statements like this being run by the SQL Tuning autotasks:

/* SQL Analyze(186,1) */ INSERT INTO TABLE_A SELECT * FROM ...

What would the implications be?  Consider locking, segment growth, redo, triggers, etc.

I could see that the execution plan in the SGA started with:

 0 INSERT STATEMENT
 1  LOAD TABLE CONVENTIONAL

The session also had a DML lock on TABLE_A.
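
Both observations are straightforward to reproduce with queries like these (TABLE_A is the table from the statement above):

-- statements issued by the tuning task are tagged in the shared pool
select sql_id, sql_text
from v$sql
where sql_text like '/* SQL Analyze(%';

-- DML locks held on the target table while the task runs
select lo.session_id, do.owner, do.object_name, lo.locked_mode
from v$locked_object lo, dba_objects do
where do.object_id = lo.object_id
and do.object_name = 'TABLE_A';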

I tested with dbms_sqltune, trying to replicate the situation, but after opening the cursor a couple of times, the report just said:

Type of SQL statement not supported.
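
For anyone wanting to repeat the test, a minimal manual run with dbms_sqltune looks roughly like this (the sql_id and task name are placeholders):

declare
  l_task varchar2(64);
begin
  l_task := dbms_sqltune.create_tuning_task(
              sql_id    => 'abcd1234efgh5',   -- sql_id of the INSERT ... SELECT
              task_name => 'DML_TUNE_TEST');
  dbms_sqltune.execute_tuning_task(task_name => l_task);
end;
/

set long 1000000 pagesize 0
select dbms_sqltune.report_tuning_task('DML_TUNE_TEST') from dual;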

I logged a Service Request with Oracle Support who said:

…the sql tuning advisor will check the select part only and not the DML part,yes technically it is considered DML but actually internally it will not use the DML but will use the SELECT part…

Another clarification is for such DMLs, the SELECT parts are only used i,e for execution internally and not the DML part and therefore there will be no redo generation or any data update/delete/insert …

I can’t find this treatment of DMLs documented anywhere, so I suggested to Oracle that they correct their documentation and I created this post.

January 9, 2014

Memory vs SQL Loader and Wide Tables

Filed under: Oracle Database — Ari @ 12:33 pm

A customer migrated an ETL application to a new Linux server and found that a SQL Loader job was now failing with a return code of 137.

I couldn’t find anything useful about this sqlldr exit code on the internet, in the documentation, or on the Oracle Support site, and there was nothing in the sqlldr logs.  (Exit code 137 is 128 + 9, ie the process was killed with SIGKILL, which on Linux usually points to the out-of-memory killer.)

I ran the job myself and noticed that the free memory on the server disappeared very quickly and remained low until the sqlldr process died.  (I didn’t have root access to check OS message logs).

The table being loaded had more than fifty VARCHAR2(4000) columns and the direct path method was used.  With the default SQLLDR parameter values, the direct load buffer needed over 1GB!
Setting a lower value for COLUMNARRAYROWS in the control file for this table reduced the memory requirement and allowed the job to complete successfully.
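
As a rough guide, the column array is sized at about columns x maximum column length x COLUMNARRAYROWS, so fifty VARCHAR2(4000) columns at the default of 5000 rows is around 50 x 4000 x 5000 bytes, or roughly 1GB.  A sketch of a control file with a lower setting (the table, file and column names are made up):

OPTIONS (DIRECT=TRUE, COLUMNARRAYROWS=500)
LOAD DATA
INFILE 'wide_table.dat'
APPEND
INTO TABLE wide_table
FIELDS TERMINATED BY '|'
( col001 CHAR(4000),
  col002 CHAR(4000) )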

OVM for SPARC vs TCP

Filed under: OVM SPARC,UNIX / Linux — Ari @ 12:05 pm

When connecting to new OVM for SPARC guests (LDOMs), I found a network problem that does not appear to be a known issue.
This post might be of interest to you if you plan to use a similar platform.
We have a primary control domain and an LDOM (Solaris VM) created on a T4-2 server. Solaris 11.1 is used on the primary domain and the LDOM.
Virtualised network devices are created over link aggregates with VLAN tagging over 10Gb Oracle CNA HBAs in the primary domains and are assigned to the LDOM.
The problem occurs with any outbound traffic on the 10G HBA.

The first symptoms were the inability to transfer large files from Windows using SFTP, and PuTTY SSH or VNC sessions being disconnected frequently and randomly.

I used Microsoft Network Monitor to capture the network traffic during these problems and found that acknowledgement packets from Solaris that should have been empty had two extra bytes added, and so the checksum was incorrect.  After a while, these checksum errors and the retransmissions would result in the TCP session being disconnected.

The problem affected multiple protocols, but only from the LDOMs, not the primary domain.

The closest issues I could find related to checksum offloading (where the network card performs the checksum processing to reduce the load on the server’s CPUs).
Disabling checksum offloading on the LDOMs for the 10Gb CNA cards successfully worked around the problem.  (The 1Gb cards were not affected).
Eg, add

set ip:dohwcksum=0x0

to /etc/system in the LDOM and reboot.
I’ll update this post as Oracle Support provide more information.

OVM for SPARC vs CPU Metrics

A couple of months ago, I created an Oracle database cluster (11.2.0.4.0 RAC) on a Solaris 11 LDOM.

I found that the CPU time statistics in the database are all zero, (eg in V$SYS_TIME_MODEL, V$SYSSTAT and AWR reports).
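
The symptom is easy to see with a couple of standard queries, eg:

select stat_name, value
from v$sys_time_model
where stat_name in ('DB CPU', 'background cpu time');

select name, value
from v$sysstat
where name = 'CPU used by this session';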

I couldn’t find any notes or bugs that matched, but Oracle Support have just created Solaris 11.1.15.4.0 which they say will address this issue.  (An OS bug).

October 18, 2013

VxFS File Fragmentation

Filed under: VxFS — Ari @ 10:14 pm

This post might interest you if your organisation has Oracle databases running on Veritas filesystems (VxFS).

What Does Veritas File Fragmentation Mean?

The extents allocated to a datafile are not contiguous in the filesystem.  If the extents are too small and scattered then large or sequential I/Os (eg, full tablescans) might be inefficient.

Having learned how to avoid UFS Fragmentation for Oracle datafiles, I was able to recognise a similar problem with a database on a Veritas volume/filesystem (Solaris 10).  Many of the notes I made about UFS Fragmentation apply to Veritas fragmentation, so check that page for more information.

A Real World Example

The customer’s DBA had complained that the performance of a training database was significantly worse than that of a test database; both were (RMAN) duplicates.  There were very few differences between the two, and they used the same storage (XIV), but the difference had been narrowed down to I/O performance.

Here is an example, using a full scan of a big table, showing the difference between ORATRN and ORATST:

alter session set db_file_multiblock_read_count=128;
select /*+ full(t) parallel(t) nocache */ count(1) from A_BIG_TABLE t;

Output from vxstat -i 1

 OPERATIONS          BLOCKS           AVG TIME(ms)
 TYP NAME              READ     WRITE      READ     WRITE   READ  WRITE
Fri Oct 18 11:09:36 2013
 vol oratrn           2224         5    429840        80  17.46   2.20
Fri Oct 18 11:09:37 2013
 vol oratrn           1890         4    355216        36  20.78   1.32
Fri Oct 18 11:09:38 2013
 vol oratrn           2008         4    313488        34  19.12   1.11
Fri Oct 18 11:09:39 2013
 vol oratrn           2325         7    328352        84  15.96   1.51
Fri Oct 18 11:09:40 2013
 vol oratrn           1960         7    310064        84  19.62   1.33
Fri Oct 18 11:09:41 2013
 vol oratrn           1895         7    320480        88  19.63   2.62

Fri Oct 18 10:48:45 2013
 vol oratst           1157         2   2107904         8  26.73   3.07
Fri Oct 18 10:48:46 2013
 vol oratst           1178         2   2154944         4  27.32   9.16
Fri Oct 18 10:48:47 2013
 vol oratst           1167         3   2172352        48  26.82   1.80
Fri Oct 18 10:48:48 2013
 vol oratst           1279         4   2346320         8  25.14   7.13
Fri Oct 18 10:48:49 2013
 vol oratst           1223         2   2219488        32  25.97   9.91

The throughput is much higher in the ORATST filesystem, and the average I/O size is larger.
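
Doing the arithmetic (assuming vxstat reports 512-byte blocks): the first ORATRN sample is 429840 / 2224, about 193 blocks or roughly 96KB per read, while the first ORATST sample is 2107904 / 1157, about 1822 blocks or roughly 910KB per read, close to the 1MB being requested by the database.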

Truss of the database session showed pread system calls of 1MB (128x8KB DB blocks) for both databases, and the Veritas settings were the same.  (The XIV distributes space in 1MB chunks, and 1MB is a common maximum for I/Os, so this should be an efficient value).
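
For reference, a check along these lines (the SID and PID are placeholders):

-- find the OS process id of the session running the scan
select p.spid
from v$process p, v$session s
where s.paddr = p.addr
and s.sid = 123;

$ truss -t pread,pread64 -p 12345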

I suspected the files were more fragmented in the ORATRN volume than in the ORATST volume, because the I/Os were being broken up into smaller pieces despite the environments being almost identical.  I wasn’t sure though; Veritas is an extent-based filesystem, so I wondered whether database files could become fragmented in the same way that files in UFS can.
The customer’s DBA confirmed that the ORATRN database was created with several RMAN channels, while the ORATST database was created with only one.

How can we see the file fragmentation?  I found a standard Veritas command called fsmap.

$: find . | /opt/VRTSvxfs/sbin/fsmap -aH -
                    Volume  Extent Type     File Offset     Extent Size     File
                    oratrn         Data         0 Bytes       264.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratrn         Data       264.00 KB       128.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratrn         Data       392.00 KB       128.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratrn         Data       520.00 KB       128.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratrn         Data       648.00 KB       384.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratrn         Data         1.01 MB       128.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratrn         Data         1.13 MB       304.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratrn         Data         1.43 MB        80.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratrn         Data         1.51 MB       128.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratrn         Data         1.63 MB       384.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratrn         Data         2.01 MB       128.00 KB     ./TWO_GB_DATAFILE.DBF
 ... etc

$: find . | /opt/VRTSvxfs/sbin/fsmap -aH -
                    Volume  Extent Type     File Offset     Extent Size     File
                    oratst         Data         0 Bytes         8.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         8.00 KB        32.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data        40.00 KB        64.00 KB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       104.00 KB         5.31 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         5.41 MB        24.81 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data        30.23 MB        25.88 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data        56.10 MB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data        72.10 MB        19.25 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data        91.35 MB        27.69 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       119.04 MB        23.38 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       142.41 MB        32.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       174.41 MB        17.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       191.41 MB        34.75 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       226.16 MB        42.19 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       268.35 MB        17.94 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       286.29 MB        38.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       324.29 MB        32.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       356.29 MB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       372.29 MB        32.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       404.29 MB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       420.29 MB        18.31 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       438.60 MB        16.25 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       454.85 MB        21.75 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       476.60 MB        17.38 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       493.98 MB        23.75 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       517.73 MB        23.50 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       541.23 MB        20.12 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       561.35 MB        16.25 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       577.60 MB        16.69 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       594.29 MB        16.12 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       610.41 MB        18.12 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       628.54 MB        22.88 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       651.41 MB        34.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       685.41 MB        43.06 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       728.48 MB        45.12 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       773.60 MB        23.88 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       797.48 MB        24.25 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       821.73 MB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       837.73 MB        18.12 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       855.85 MB        22.12 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       877.98 MB        64.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data       941.98 MB        96.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.01 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.03 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.04 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.06 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.08 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.09 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.11 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.12 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.14 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.15 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.17 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.19 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.20 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.22 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.23 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.25 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.26 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.28 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.29 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.31 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.33 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.34 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.36 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.37 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.39 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.40 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.42 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.44 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.45 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.47 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.48 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.50 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.51 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.53 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.54 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.56 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.58 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.59 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.61 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.62 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.64 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.65 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.67 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.69 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.70 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.72 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.73 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.75 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.76 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.78 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.79 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.81 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.83 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.84 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.86 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.87 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.89 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.90 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.92 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.94 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.95 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.97 GB        16.00 MB     ./TWO_GB_DATAFILE.DBF
                    oratst         Data         1.98 GB        10.02 MB     ./TWO_GB_DATAFILE.DBF

After copying the ORATRN files to a new filesystem using rsync, the extent sizes were up to 256MB each, eg:

$: find . | /opt/VRTSvxfs/sbin/fsmap -aH -
                   oranew          Data         0 Bytes        24.00 KB     ./TWO_GB_DATAFILE.DBF
                   oranew          Data        24.00 KB       246.62 MB     ./TWO_GB_DATAFILE.DBF
                   oranew          Data       246.65 MB       256.00 MB     ./TWO_GB_DATAFILE.DBF
                   oranew          Data       502.65 MB       256.00 MB     ./TWO_GB_DATAFILE.DBF
... etc

Also see Symantec’s note on this issue.

Update

During another project, I had the opportunity to examine how RMAN caused this fragmentation.

The database version was 9.2, so RMAN wrote from a 256KB output/disk buffer using four asynchronous threads.  The average VxFS extent size of the restored datafiles was 256KB.

Later versions of Oracle use larger and configurable buffers, so the problem will not be as severe.

We are going to try to use the VxFS tunables “initial_extent_size” and “max_seqio_extent_size” to reduce the fragmentation during the next restore test.
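
A sketch of what that might look like with vxtunefs, assuming an 8KB filesystem block size (the values and mount point are only illustrative; the settings can be made persistent in /etc/vx/tunefstab):

# 2048 blocks x 8KB = 16MB initial extent, 4096 blocks x 8KB = 32MB max sequential extent
vxtunefs -o initial_extent_size=2048 /oratrn
vxtunefs -o max_seqio_extent_size=4096 /oratrn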

June 30, 2013

Row Migration vs Datapump

Symptoms

Datapump takes a long time to export a small table, waiting for db file sequential read events many more times than there are blocks in the table.
The table doesn’t have any LOBs.

This problem has been seen in 10.2, 11.2 and 12.1. I couldn’t find a bug for this; the closest to an acknowledgement of the problem is MOS note 1086414.1.

Cause

The problematic table has a high percentage of migrated rows and Datapump is using the direct path method to export the table, possibly because of the table’s small size.

A trace will show that direct path reads are used to get the first table blocks, but then each migrated row will be fetched by executing a sequential block read. Blocks will be read multiple times, once for each row migrated to it. Subsequent reads are not satisfied from cache, so each migrated row results in a physical block read!

In a block dump trace file, a migrated row piece has no Header flag, just the First and Last flags, eg:

fb: ----FL--

Workarounds

  1. Defragment the table’s rows with a CTAS or a table move (see the sketch after this list).
  2. Use expdp with access_method=external_table
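
A minimal sketch of workaround 1, using the demo table from the test steps below (the index name is hypothetical):

alter table test_tab move;
-- moving the table changes rowids, so any indexes must be rebuilt afterwards
alter index test_tab_pk rebuild;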

Test Steps

The problem can be replicated by following these steps:

-- Create a narrow table and fill it with a million rows, leaving col2 empty
create table test_tab (col1 number, col2 varchar2(10));

begin
  for i in 1..1000000
  loop
    insert into test_tab values (i, null);
  end loop;
  commit;
  -- Growing every row forces many of them to migrate to other blocks
  update test_tab set col2 = 'xxxxx';
  commit;
end;
/

-- Confirm the migration (utlchain.sql creates the CHAINED_ROWS table)
@?/rdbms/admin/utlchain
analyze table test_tab list chained rows;
select count(1) from chained_rows;

expdp / directory=DATA_PUMP_DIR tables=test_tab dumpfile=slow.dmp

expdp / directory=DATA_PUMP_DIR tables=test_tab access_method=external_table dumpfile=fast.dmp

September 9, 2012

Diagnostic Directory Disaster

Have you thought about where your database takes a dump?

It is common to install the Oracle software on internal disks and place the database files on external storage.  The DIAGNOSTIC_DEST (and Automatic Diagnostic Repository) is stored under $ORACLE_BASE or $ORACLE_HOME by default, so without further consideration this ends up on local disk too.

Often there are only a couple of internal disks, which are mirrored.  This should be adequate for the OS and Oracle binaries, but not for operations with sustained high IOPS.

In some situations, many Oracle processes may write a lot of data to different files in the ADR concurrently, which may overwhelm a single internal disk.

I’ve seen an extreme example on a Solaris server with multiple test databases sharing a single (mirrored) internal disk and UFS filesystems for their ADR, Oracle binaries, swap space and OS files.  When one highly parallel statement started hitting ORA-00600 errors, each parallel slave process started writing trace and core files to the ADR.  The server became very sluggish and many OS commands were affected.  A shutdown abort of the affected instance left behind the parallel slave processes, and even kill -9 wouldn’t terminate them.  The internal disks were swamped; I/O service times were reaching 40 seconds while the disk heads thrashed back and forth!  This situation continued for 30 minutes before all the processes finished writing and finally terminated.

Enabling SQL tracing for an instance could also put more pressure on the internal disk(s) than they can handle.

If the ADR / DIAGNOSTIC_DEST were on SAN storage (multiple spindles, large cache), then the spike in I/O activity would be better accommodated.

For Oracle clusters there is another reason to avoid internal/local disk for the ADR: it is easier to manage RAC instances with their logs (alert, listener, etc) on shared (and highly available) storage, to avoid having to log into every server.

The issue could be mitigated by setting a low value for MAX_DUMP_FILE_SIZE, but in production environments you may not want to truncate trace files: keeping them complete means an entire trace file can be sent to Oracle Support without having to replicate the issue (eg internal errors causing an instance crash or data corruption).
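
MAX_DUMP_FILE_SIZE and DIAGNOSTIC_DEST can both be adjusted with alter system; a sketch, where the size and path are only examples:

alter system set max_dump_file_size = '50M';
alter system set diagnostic_dest = '/san01/oracle' scope=both;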

Pointing DIAGNOSTIC_DEST at a ZFS volume might mitigate the impact of using internal disks, because of the sequential nature of ZFS writes.

August 2, 2012

Solaris Cluster vs Network Speed

Filed under: Solaris Cluster,UNIX / Linux — Ari @ 1:56 pm

While investigating rsync/ssh performance, I discovered that our Solaris Clusters had just over half the expected throughput on their gigabit NICs.

This was caused by the cluster/streams-related module clhbsndr being automatically pushed onto each public NIC via /etc/iu.ap.

Symptoms:

  • The round trip time via ping was higher than on non-cluster servers on the same network
  • The throughput measured with iperf (and other tools) was around 400-600Mbit/s instead of 940Mbit/s (a sample iperf run follows this list)
  • The clhbsndr module is loaded for the public interfaces, eg
ifconfig e1000g1 modlist

0 arp
1 ip
2 clhbsndr
3 e1000g
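
A simple iperf throughput test of the kind described above might look like this (hostnames are placeholders):

# on the cluster node
iperf -s

# on a non-cluster server on the same network
iperf -c clusternode1 -t 30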

Status

We have confirmed that bug 6625886 is the cause and are now waiting for Oracle to say if the module can safely be removed from the public interfaces of a Solaris 10 cluster, eg:

ifconfig e1000g1 modremove clhbsndr@2

Encryption vs Transfer Speed

Filed under: ssh — Ari @ 1:46 pm

Do you want to make scp and rsync go faster?

LAN

One easy way to squeeze a little more speed out of file transfer tools that use ssh is to try a different encryption method.

For example, we saved 15 seconds per gigabyte sent across our LAN on our SPARC Solaris servers, by using:

rsync --rsh="ssh -c arcfour" ...

or

scp -c arcfour source dest

WAN

If your problem is latency across a WAN, then HPN-SSH might be the solution.

Also, if you want secure authentication but don’t need the data encrypted, then you could use SSH with no encryption.  HPN-SSH allows the use of a ‘none’ cipher.

For example, on a network with 32.5 ms RTT (ping) and Gigabit NICs, scp transfer rates increased from 1.35MB/s to around 100MB/s with HPN-SSH, 4MB buffers (TCP and SSH), and the “NONE CIPHER” enabled.

HPN-SSH Settings
  • HPNDisabled no
  • NoneEnabled=yes (optional)
Kernel Settings

All these were set to 4MB from their defaults:

  • tcp_cwnd_max
  • tcp_max_buf
  • tcp_recv_hiwat
  • tcp_xmit_hiwat

(The send and receive buffers could be set per route instead to differentiate between LAN and WAN connections).
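
On Solaris 10 these can be changed on the fly with ndd, for example (4194304 bytes = 4MB; the settings do not persist across a reboot unless added to a startup script):

ndd -set /dev/tcp tcp_max_buf    4194304
ndd -set /dev/tcp tcp_cwnd_max   4194304
ndd -set /dev/tcp tcp_recv_hiwat 4194304
ndd -set /dev/tcp tcp_xmit_hiwat 4194304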

Example of syntax (remove TcpRcvBufPoll=no on Linux and Windows):

/opt/openssh-5.9p1-hpn/bin/scp -oTcpRcvBufPoll=no -oNoneSwitch=yes -oNoneEnabled=yes sourcefile remoteserver:/destpath/destfile

If the NONE cipher is not appropriate, then HPN-SSH’s multi-threaded AES cipher might be the fastest option if your server has multiple cores.  Alternatively, I’ve found that arcfour uses less CPU, so it offered greater throughput for several concurrent rsync and scp transfers on busy processors.

July 24, 2012

Datapump vs Application Upgrade

Filed under: Datapump — Ari @ 6:51 am

If you have used Datapump to migrate, duplicate or re-organise a schema, then you may have inadvertently introduced an application upgrade time bomb.

A primary key (or unique key) index that was created by an “add constraint” statement will be dropped by an “alter table xxx drop primary key” statement.  If the developers rely on this “lazy” method, then the SQL in their application upgrades may fail if the index was created in another way (eg a Datapump import, or manual DBA maintenance).

For example, the table, index and primary key constraint (with user-specified names rather than system-generated names) are created by the application scripts like this:

create table TEST as select * from dual;
alter table TEST add constraint TEST_PK primary key (dummy);

Later, Datapump (11.2.0.2.3) is used to move the schema to a new database.  Datapump creates the tables, then the indexes, before adding the constraints separately.  (Except when reference partitioning is used).

The Datapump SQL is equivalent to:

create table TEST as select * from dual;
create index TEST_PK on TEST(dummy);
alter table TEST add constraint TEST_PK primary key (dummy);

This works fine until the next application upgrade, when a column is to be added to the primary key:

SQL> alter table TEST add (dummy2 varchar2(100) default 'a' not null);

Table altered.

SQL> alter table TEST drop primary key;

Table altered.

SQL> alter table TEST add constraint TEST_PK primary key (dummy,dummy2);
alter table TEST add constraint TEST_PK primary key (dummy,dummy2)
*
ERROR at line 1:
ORA-00955: name is already used by an existing object 

SQL> -- The application upgrade script failed
SQL> select index_name from user_indexes where table_name='TEST';

INDEX_NAME                                                                                                                                                              
----------                                                                                                                                          
TEST_PK

The index wasn’t dropped, but it can no longer support the primary key, so the “add constraint” statement can’t create an appropriately named index to enforce the new primary key.

Prevention

  1. Developers could create objects using commands similar to Datapump, ie run “create index” explicitly
  2. Use the full syntax when dropping constraints and intending the index to be dropped too, eg
    alter table TEST drop primary key drop index;
  3. Manually recreate affected indexes and constraints as required after a Datapump import.  (Not always a quick task).
  4. After a Datapump import, manually set the index properties to make it seem that the index was created by an “add constraint” statement.  (This involves updating ind$, so this may not be considered appropriate for production environments).

Ind$.Property

These are the index property flags for Oracle 11.2:

 /* unique : 0x01 */
 /* partitioned : 0x02 */
 /* reverse : 0x04 */
 /* compressed : 0x08 */
 /* functional : 0x10 */
 /* temporary table index: 0x20 */
 /* session-specific temporary table index: 0x40 */
 /* index on embedded adt: 0x80 */
 /* user said to check max length at runtime: 0x0100 */
 /* domain index on IOT: 0x0200 */
 /* join index : 0x0400 */
 /* system managed domain index : 0x0800 */
 /* The index was created by a constraint : 0x1000 */
 /* The index was created by create MV : 0x2000 */
 /* composite domain index : 0x8000 */
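
For example, whether a particular index currently has the 0x1000 flag can be checked like this (the owner and index name are placeholders):

select do.owner, do.object_name, i.property,
       bitand(i.property, 4096) created_by_constraint
from sys.ind$ i, dba_objects do
where i.obj# = do.object_id
and do.object_type = 'INDEX'
and do.owner = 'APPOWNER'
and do.object_name = 'TEST_PK';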

An “alter table xxx drop primary key” statement will drop the index too if the bits for 0x1001 are set (4097 in decimal; meaning ‘unique and created by a constraint’).  A Datapump import will lose the 0x1000 flag, so a DBA could generate statements to reinstate it by running this in the source database:

-- For Oracle 11.2: run this in the source database, then review and run the
-- spooled ind_property_hack.sql in the target database as SYS.
-- The object_type filter keeps the generated update from matching another
-- object type that happens to share the index name.
set linesize 300
spool ind_property_hack.sql
select 'update sys.ind$ set property=property+4096 
 where bitand(property,4096)=0
 and obj#= (
  select object_id
  from dba_objects do2
  where owner='''||do.owner||'''
  and object_name='''||do.object_name||'''
  and object_type=''INDEX'');'
from sys.ind$ i, dba_objects do
where bitand(property,4096)=4096
and i.obj# = do.object_id
and i.dataobj# = do.data_object_id
and do.owner not in (select distinct user_name from SYS.default_pwd$)
order by 1
/
spool off