In a previous post I found that direct and concurrent I/O was disabled when restoring a certain type of backup with RMAN on UFS.
A similar issue disables direct and concurrent I/O when writing to sparse tempfiles. I haven’t seen any advice on this specific issue, but it must be common because tempfiles on Solaris UFS are created sparse, (not all space is allocated; the file has “holes” in it). Solaris won’t allow direct I/O for synchronous writes to sparse UFS files. Long average wait times for direct path write temp are an indication of this problem. Another is a high number of 8KB writes shown by iostat, while truss shows larger (eg 128KB or 1MB) writes. It can also show up as free buffer waits when the temp space is buffered and written via the DBWR, eg when sorting with the TO_CLOB function, which creates a temporary LOB segment.
For this post, I shall refer to files that are not sparse as ‘dense’ files.
One way to compare the performance difference yourself is:
- Create a new temporary tablespace on a forcedirectio UFS filesystem.
- Time the parallel creation of a very large index, (so that the temporary tablespace is written to concurrently).
- After this, the tempfile/s might still be sparse. Check with the du command or use ls -ls which shows the number of allocated 512 byte sectors for data+metadata. Compare this to the file’s length. If there is any doubt, copy the tempfile and copy/rename it back again to convert it to a normal file. (Or you can do this online by dropping the tempfile and adding the dense one back with the reuse clause).
- Restart the instance or flush the cache.
- Time the parallel creation of the large index again. This time it should be quicker, with less time spent waiting for direct writes.
A dense tempfile can become sparse again as soon as it auto-extends. Even when the sort write size = the temporary tablespace extent size = the auto-extension next size = 1MB, some operations (eg hash joins) may not touch all blocks in the new sort segment extents, leaving the file sparse. Over time, the extents will be reused and all the blocks touched.
The OS uses bmap_has_holes() to compare the data+metadata to the length of the file. Just one unallocated block is enough for the entire file to be treated as a sparse file.
The tempfiles will be treated as dense when all the blocks have been touched, or when the tempfile is resized down far enough to remove all of the unallocated blocks. (Shrinking a used tempfile causes the existing sort segment to be completely shrunk, one block at a time, which can take a while).
If possible, pre-allocate sufficient temporary tablespace capacity and use an OS tool like cp to get rid of the holes in the tempfiles. This will allow concurrent direct I/O when sorting, hash joining, using global temporary tables, etc.
If using auto-extension, set the next increment to match the tablespace extent size, (eg 1MB). However, I have seen examples of files that were sparse when they auto-extended, and even hit the auto-extension limit (cannot extend error) and remained sparse! So, I recommend pre-allocation of dense tempfiles and not allowing auto-extension for temporary tablespaces.
Why use synchronous writes to tempfiles?
I assume that Oracle has chosen to open tempfiles with the O_DSYNC flag because many processes could be using the tempfiles, and the alternative could result in lots of fdsync calls. But do we need to know if any writes to tempfiles have made it to disk? If the server lost power with dirty tempfile blocks in the cache, it wouldn’t affect the integrity of the database. I can’t explain why Oracle have chosen to use O_DSYNC for tempfiles.
Even without the O_DSYNC flag, a single write lock would be enforced while the tempfiles are sparse.