In previous posts, I’ve written about the importance of considering fragmentation and other configuration settings when datafiles reside on UFS or VxFS.  How about NTFS?  Would it suffer from any of the same performance pitfalls?

I have read and experimented a little to get a feel for the situation, but I haven’t investigated as thoroughly as I have for other file systems, so I stand to be corrected.

File Locking

On Windows, Oracle runs as a single process with multiple threads, so file-level contention between Oracle processes is not an issue.

Sparse Files

Although sparse files exist in Windows, I haven’t seen any evidence of them being used by Oracle.  At the least, I have no concerns relating to contiguous allocation or file-lock contention.

Allocation

Oracle extends data files by setting the length of the file (which appears to allocate space on disk immediately), then writes to the file in 1MB I/Os to format the database blocks.
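
For illustration, here is a minimal Python sketch of that pattern as it appears in ProcMon.  This is not Oracle’s code; the path and sizes are invented.

    ONE_MB = 1024 * 1024
    datafile = r"D:\oradata\TEST\users01.dbf"   # hypothetical path
    size = 100 * ONE_MB                         # hypothetical datafile size

    with open(datafile, "wb") as f:
        f.truncate(size)              # set the file length (end of file) up front
        f.seek(0)
        block = b"\0" * ONE_MB        # stand-in for formatted database blocks
        for _ in range(size // ONE_MB):
            f.write(block)            # sequential 1MB writes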

When RMAN creates compressed disk backup pieces, it doesn’t know each piece’s final size when the piece is created, so a certain amount (eg 64MB) is pre-allocated, then the file grows as the backup progresses and is truncated at the end if necessary.   In cases like this, there are multiple allocation requests over time that may be interleaved with other requests (eg a multi-channel backup to the same file system).
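
The pattern is roughly as follows — a sketch only, with an invented chunk size, path and data source; the real behaviour is best confirmed with ProcMon.

    ONE_MB = 1024 * 1024
    CHUNK = 64 * ONE_MB                  # pre-allocation unit (eg 64MB)
    piece = r"D:\backup\piece_01.bkp"    # hypothetical backup piece

    def compressed_output():
        # stand-in for compressed backup data of unknown final size
        for _ in range(150):
            yield b"\xff" * ONE_MB

    with open(piece, "wb") as f:
        allocated = CHUNK
        f.truncate(allocated)            # initial pre-allocation
        written = 0
        for buf in compressed_output():
            if written + len(buf) > allocated:
                allocated += CHUNK       # extend as the piece grows
                f.truncate(allocated)
            f.write(buf)                 # sequential writes
            written += len(buf)
        f.truncate(written)              # trim the unused pre-allocation at the end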

Fragmentation

Is fragmentation, of any type, something we should design to avoid?

Database Extent Distribution

Keep in mind that Oracle database segments are distributed in extents across the data file(s).  So long as the extents are multiples of the largest I/O size (usually 1MB), does it really matter?  Very large tables will often be scanned in parallel or partitioned anyway.

Filesystem and Storage Layout

During backups, and possibly during full table/index scans, a data file’s blocks will be read sequentially.  Some operating systems and storage appliances can improve performance by pre-fetching and/or switching to larger direct I/O when sequential read activity is recognised.

Before the widespread use of SANs, SAME, snapshots, and SSDs, we were concerned about disk head seek times, so we wanted data file blocks stored contiguously.  But… does it really matter in most large and important production environments any more?  I doubt it.

The NTFS MFT could be considered fragmented when files have many extents (ranges of contiguously allocated clusters).  A file’s MFT record then has to point to additional records or blocks so it can track all of that file’s extents.  I haven’t seen any trustworthy statistics or accounts of how much impact a fragmented MFT could have, but I would expect the MFT to be well cached to serve file access, especially for servers dedicated to one or more Oracle databases (direct I/O avoids polluting the cache).  So, perhaps a tiny bit more CPU time for each file access?

The MFT limits how many extents a single file can have (ie how fragmented it can be), so it might be worth avoiding careless, extreme fragmentation with large Oracle databases by:

  • Pre-allocating capacity for tablespaces rather than relying on frequent auto-extension (especially if concurrent).
  • Setting a reasonable auto-extension size as a safety net only (>=1MB; a multiple of the usual extent size; not so large as to waste space or make the foreground process wait too long).
  • Avoiding sharing a file system housing data files with small and transient files such as trace files, backup pieces and archive logs.

From Windows Server 2012, the MFT can use large file record segments (4KB vs 1KB).  This would reduce the type of fragmentation mentioned above.

Cluster Size

NTFS has a default cluster size of 4KB (for large volumes), but can be formatted with cluster sizes of up to 64KB.

Note that Oracle will perform I/Os both smaller and larger than the cluster size, so IOPS are not affected, unless the extents are too small to contain 1MB I/Os.

The data attribute of each file record in the MFT effectively tracks extents as pairs of starting cluster and contiguous cluster count, so those entries aren’t any smaller with 64KB clusters.

The bitmap file that tracks free and used clusters will be smaller, though; eg for an empty 449GB file system, the overhead is:

  • 103MB with 4KB clusters, and
  • 91MB with 64KB clusters.
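
As a rough cross-check (assuming the usual NTFS layout of one bit per cluster in the bitmap file, and treating 449GB as 449GiB), the bitmap alone accounts for most of that difference:

    volume_bytes = 449 * 1024**3

    for cluster in (4 * 1024, 64 * 1024):
        clusters = volume_bytes // cluster
        bitmap_mb = clusters / 8 / 1024**2       # one bit per cluster
        print(f"{cluster // 1024}KB clusters: ~{bitmap_mb:.0f}MB bitmap")

    # ~14MB with 4KB clusters vs ~1MB with 64KB clusters: roughly the
    # ~12MB difference in overhead measured above.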

A file system that only stores data files will not waste space with 64KB clusters, and NTFS compression (only supported with 4KB clusters) will not be required.

In conclusion, there doesn’t seem to be much benefit, but it seems logical to manage free and used space in larger chunks if there will only be relatively few, large files stored in the file system.


Tools

I used the Sysinternals Suite to investigate:

  • ProcMon to watch data file creation, backups, restores, auto-extension
  • DiskView and Contig to see file fragmentation
  • DiskMon to see I/O sizes

FSUTIL can be used to see the cluster size and MFT file record size (eg fsutil fsinfo ntfsinfo D:).