Diagnostic Directory Disaster

Have you thought about where your database takes a dump?

It is common to install the Oracle software on internal disks and place the database files on external storage. The DIAGNOSTIC_DEST (and Automatic Diagnostic Repository) is stored under $ORACLE_BASE or $ORACLE_HOME by default, so without further consideration this ends up on local disk too.

Often there are only a couple of internal disks, which are mirrored. This should be adequate for the OS and Oracle binaries, but not for operations with sustained high IOPS.

In some situations, many Oracle processes may write a lot of data to different files in the ADR concurrently, which may overwhelm a single internal disk.

I’ve seen an extreme example on a Solaris server with multiple test databases sharing a single (mirrored) internal disk and UFS filesystems for their ADR, Oracle binaries, swap space and OS files. When one highly parallel statement started hitting ORA-0600 errors, each parallel slave process started writing trace and core files to the ADR. The server became very sluggish; many OS commands were affected. A shutdown abort of the affected instance left behind the parallel slave processes, and even kill -9 wouldn’t terminate them. The internal disks were swamped; the service times for I/Os were reaching 40 seconds while the disk heads thrashed back and forward! This situation continued for 30 minutes before all the processes finished writing and finally terminated.

Enabling SQL tracing for an instance could also put pressure on the internal disk/s to cope.

If the ADR / DIAGNOSTIC_DEST was on SAN storage, (multiple spindles, large cache), then the spike in I/O activity would be better accommodated.

For Oracle clusters there is another reason to avoid internal/local disk for the ADR: it easier to manage RAC instances with logs (alert, listener, etc) on shared (and highly available) storage, to avoid having to log into every server.

The issue could be mitigated by setting a low value for MAX_DUMP_FILE_SIZE, but in production environments, you may not want to truncate trace files so that the entire trace file can be sent to Oracle Support without having to replicate the issue, (eg internal errors causing an instance crash or data corruption).

Pointing DIAGNOSTIC_DEST at a ZFS volume might mitigate the use of internal disks because of the sequential nature of the ZFS writes.

Diagnostic Directory Disaster

Share this:

Related

Leave a comment Cancel reply