RMAN backups, restores and maintenance activities were taking far too long for data warehouse on Oracle.
The throughput from disk and to the tape library were not the only factors; meta-data processing, latency communicating with the media manager and RMAN configuration were significant.
Changing a few RMAN configuration parameters and creating new indexes in the RMAN repository database made a dramatic improvement.
- Oracle 10gR2 Database with 7,500 datafiles.
- Daily level 1 backups and weekly level 0 backups.
- Backed up to a virtual tape library (VTL) with de-duplication and compression enabled.
- VTL vendor specified FILESPERSET=1 to allow the VTL’s de-duplication technology to save space.
- Using an 11.2 RMAN catalogue database that also records backups for hundreds of other databases.
- 365 day backup retention.
- Datafiles stored on NFS (no OS caching).
Issue – Too Many Backup Sets
After a year of backing up thousands of datafiles and archive logs each day, there were 3,000,000 backup sets in the RMAN repository for this database.
- Many meta-data operations were slow, so that backups would seem to stall for an hour before beginning to send data to the VTL.
- Backups of other databases using the same RMAN catalogue were also affected, although not to the same degree.
- Deleting a week’s cycle of obsolete RMAN backups for the data warehouse took more than four days.
- Before each datafile was backed up, the control file would be scanned. The control file had grown to 500MB because of all the backup objects, and the NFS file system didn’t provide OS caching, so 3.75 terabytes of control file data was read from the storage during each backup just to process the meta-data.
The proliferation of small tablespaces and datafiles was an obvious design flaw, but not one that we could address in the medium term because they are created automatically by the third party data warehouse application.
Did I really need to set FILESPERSET=1? The VTL de-duplication technology breaks each backup piece into chunks (256KB?) and replaces the repeated chunks with pointers, so the vendor recommends not multiplexing, compressing or encrypting the RMAN backups. However, setting MAXOPENFILES=1 parameter also disables multiplexing, while still allowing a backup set to contain multiple datafiles. The datafiles are now concatenated into the backup pieces instead of being interleaved. I don’t expect any alignment issues because the RMAN output buffer is 256KB in this case.
The VTL vendor also requested that backup pieces be kept below 100GB to reduce locking and capacity management issues. As a DBA, I wamt to limit the time it takes to restore part of the database (eg a datafile or tablespace). At first I thought I could use MAXPIECESIZE to achieve these aims without compromising on the number of backup sets, (i.e. have multiple backup pieces per backup set). However, I was misled by some Oracle MOS notes that confused “backup pieces” with “backup sets”. Datafiles are mapped to backup sets, not to each backup piece, even when multiplexing is not involved. The worst case for restoring a single datafile would be to read through all the backup pieces in a set only to find the datafile at the end of the last piece. Instead, I used MAXSETSIZE and set it to be slightly larger than the greatest datafile size.
Now there were a reasonable number of backup sets being created each day, we just had to wait for the old backups to age out of the repository. What could I do to speed up RMAN catalogue access in the mean time? Separating this database’s backup repository into a different schema didn’t help because some of the execution plans were shared across schemata. The same goes for creating SQL Profiles – one plan didn’t always suit every DB’s RMAN repository data.
I did create some indexes, which may be of use to others with large RMAN repositories:
create index BCF_I98 on BCF(DBINC_KEY, BS_KEY) tablespace RMAN_CATALOG compress 1 ; create index BCF_I99 on BCF(DBINC_KEY, CKP_SCN) tablespace RMAN_CATALOG compress 1 ; create index BDF_I98 on BDF(DBINC_KEY, FILE#, BS_KEY) tablespace RMAN_CATALOG compress 2 parallel 4; alter index BDF_I98 noparallel; create index BDF_I99 on BDF(DBINC_KEY, CKP_SCN) tablespace RMAN_CATALOG compress 1 parallel 4; alter index BDF_I99 noparallel; create index BP_I98 on BP(HANDLE) tablespace RMAN_CATALOG parallel 4; alter index BP_I98 noparallel; create index BP_I99 on BP(DB_KEY, status, DEVICE_TYPE, copy#, piece#, SITE_KEY, TAG, HaNDLE, bs_key) tablespace RMAN_CATALOG compress 7 parallel 4; alter index BP_I99 noparallel; create index BRL_I99 on BRL(DBINC_KEY, LOW_SCN) tablespace RMAN_CATALOG compress 1 parallel 4; alter index BRL_I99 noparallel; create index BS_I98 on BS(BCK_TYPE, status, PIECES, KEEP_OPTIONS, bs_key, COMPLETION_TIME) tablespace RMAN_CATALOG compress 4 parallel 4; alter index BS_I98 noparallel; create index BS_I99 on BS(DB_KEY,STATUS,INPUT_FILE_SCAN_ONLY) tablespace RMAN_CATALOG compress parallel 4; alter index BS_I99 noparallel; create index BSF_I99 on BSF(DB_KEY, BS_KEY) tablespace RMAN_CATALOG compress 1; create index RLH_I99 on RLH(DBINC_KEY,LOW_SCN) tablespace RMAN_CATALOG compress 1 parallel 4; alter index RLH_I99 noparallel; create index RSR_I99 on RSR(RSR_END,DBINC_KEY) tablespace RMAN_CATALOG; create index DBINC_I99 on DBINC(DB_KEY,DBINC_KEY) tablespace RMAN_CATALOG;
The team looking after the VTL didn’t notice any reduction in the effectiveness of the de-duplication, so this this approach was successful. Six months later, the database was rebuilt with 1/10th the number of datafiles and tablespaces, resolving this issue and a few others.