
White Paper

BEST PRACTICES FOR MANAGING THE EMC
CLOUD TIERING APPLIANCE DATABASE

Abstract
This document discusses the best practices for managing the internal
database used by EMC Cloud Tiering Appliance.
October 2011

Copyright 2011 EMC Corporation. All Rights Reserved.


EMC believes the information in this publication is accurate as of its
publication date. The information is subject to change without notice.
The information in this publication is provided "as is." EMC Corporation
makes no representations or warranties of any kind with respect to the
information in this publication, and specifically disclaims implied
warranties of merchantability or fitness for a particular purpose.
Use, copying, and distribution of any EMC software described in this
publication requires an applicable software license.
For the most up-to-date listing of EMC product names, see EMC
Corporation Trademarks on EMC.com.
Part Number h8909

Table of Contents
Executive summary
    Audience
Cloud Tiering Appliance database
    Features that utilize the database
    User interaction
Database maintenance and backup
    Running database vacuum
    Backing up the database
    Restoring orphan management
Conclusion

Executive summary
The EMC Cloud Tiering Appliance (CTA) is used to implement a tiered storage
strategy through file level archiving, thereby facilitating significant storage savings.
This document discusses the best practices for managing the internal database used
by Cloud Tiering Appliance.

Audience
This white paper is intended for storage administrators who are tasked with
managing the tiered storage environment and must interact with the Cloud Tiering
Appliance on a regular basis. It is also intended for field personnel who are
responsible for implementing a tiered storage solution with the EMC Cloud Tiering
Appliance.
This document supplements the published product documentation and is not
intended to replace it. It assumes that the reader is familiar with that
documentation, including the Cloud Tiering Appliance and Cloud Tiering
Appliance/VE Getting Started Guide, the Cloud Tiering Appliance and Cloud
Tiering Appliance/VE Release Notes, and the CTA and CTA/VE Interoperability
Matrix. If you have not reviewed those documents, please do so in addition
to reading this paper in order to gain a complete understanding of the
Cloud Tiering Appliance technology.

Cloud Tiering Appliance database


The Cloud Tiering Appliance database contains information that maintains mappings
of archived files to their stubs.

Features that utilize the database


Most of the features accessed through the GUI utilize the CTA database. These
include:
- Creating or listing schedules using the Schedule page
- Creating or viewing policies using the Policies page
- Generating reports or managing archived files using the Archived Files page
- Creating or viewing file servers or NAS repositories from the Configuration page

When using the command line, a large number of the rffm commands make extensive
use of the CTA database. In addition, the scheduler component of CTA triggers tasks
to run at specific times. When a task runs (regardless of the type) it reads or modifies
the contents of the database.
The callback daemons (ACD, CCD, and FCD) do not utilize the database.

User interaction
Users interact with the CTA database indirectly by using the GUI and CLI. At no point
will the underlying database implementation be exposed to the CTA administrator.
Interacting directly with the database can be extremely risky or disruptive. If an
administrator needs to access the database then EMC support services should be
contacted to provide assistance.

Database maintenance and backup


The CTA database does not require any specific maintenance to be performed by
administrators. However, activity that makes heavy use of the database, such
as archiving and orphan management, can cause database performance to degrade
over time. Restoring performance requires a process known as vacuuming, which
cleans up empty or leftover rows from database activity and performs other
steps to optimize database processes.
CTA provides a configurable alert to notify an administrator when the CTA
database size has exceeded expected disk usage and a vacuum may be required.
Depending on the number of archived files and the size of the database, you
may notice degraded appliance performance before the alert is triggered; in
that case, a vacuum should be run without waiting for the alert.
NOTE: When you upgrade CTA, the database tables may be re-indexed or rebuilt. This
process can take a few minutes, a few hours, or up to a few days depending upon the
number of entries in the database.

Running database vacuum


Before starting the vacuum process, any File Management tasks that use the
database must be stopped. These include archiving, stub scanning, simulation,
orphan deletion, and repository migration tasks, among others. Any such tasks
still running will be forcibly stopped when the vacuum starts, so schedule the
vacuum task carefully to avoid interrupting other CTA activity. Depending on
the size of the database, the vacuum process can take an extended time to
complete. Before starting the process, ensure that there is a sufficient
window during which archiving is not required to run.
To schedule a vacuum task to run in the future at a regular interval, use the following
command from the command line of the CTA:
rffm scheduleVacuumTask [--VacuumStartTime=STARTTIME] [--VacuumWeekRepetition=WEEKS]

When this command is used, STARTTIME should be entered using the format
YYYY-MM-DD HH:MM:SS and WEEKS is specified as an integer number of weeks
between vacuum runs. For example:
# rffm scheduleVacuumTask --VacuumStartTime=2011-06-01 12:00:00 --VacuumWeekRepetition=4

This command would schedule a vacuum task to run every 4 weeks starting on
June 1, 2011 at 12:00 P.M. When a vacuum task is scheduled to run, an alert is sent
automatically to notify an administrator of the impending task and that all tasks will
be stopped before vacuuming can run.
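Because a mistyped start time or repetition value is easy to miss, it can help to check the arguments before calling rffm. The following is a minimal sketch and not part of CTA: the function name build_vacuum_schedule_cmd is hypothetical, the start time pattern and the double-dash spelling of both options are assumptions taken from the example above, and the helper only prints the command for review rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with CTA): validates the arguments and
# prints the rffm command line for review. It never runs rffm itself.
build_vacuum_schedule_cmd() (
    start_time=$1
    weeks=$2
    # Assumed pattern, taken from the example above: YYYY-MM-DD HH:MM:SS
    time_re='^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$'
    # WEEKS must be a positive integer without a leading zero.
    weeks_re='^[1-9][0-9]*$'

    if ! printf '%s\n' "$start_time" | grep -Eq "$time_re"; then
        echo "invalid start time: $start_time" >&2
        exit 1
    fi
    if ! printf '%s\n' "$weeks" | grep -Eq "$weeks_re"; then
        echo "invalid week repetition: $weeks" >&2
        exit 1
    fi
    echo "rffm scheduleVacuumTask --VacuumStartTime=$start_time --VacuumWeekRepetition=$weeks"
)

# Print the command for a vacuum every 4 weeks starting June 1, 2011 at noon.
build_vacuum_schedule_cmd "2011-06-01 12:00:00" 4
```

The check covers only the shape of the values, not whether the date is a real calendar date or falls in a suitable maintenance window. The printed line mirrors the example above; this paper does not state whether the start time needs shell quoting.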
To immediately start a vacuum of the CTA database, use the following command from
the command line of the CTA:
rffm doDBMaintenance

This command will stop all active File Management tasks including archiving, stub
scanning, simulation, and orphan deletion. It will then stop the postgresql database
service and start a database vacuum task. During this period, the CTA cannot be
used for any tasks and the filemanagement service must not be started manually.
Whether the vacuum is started automatically by the scheduled vacuum task or using
the manual command, the effect on running tasks and services is the same. The
automatic scheduled vacuum task should be used when possible to ensure regular
maintenance is performed on the database to prevent it from growing too large.
To monitor the progress of the database vacuum, periodically check the following
logs:
/opt/rainfinity/filemanagement/conf/DBMaintenance.log
/var/lib/pgsql/vacuum.log
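One way to check both logs at once is sketched below. The helper name show_vacuum_logs and the choice of 20 lines are illustrative assumptions; only the two log paths come from this paper. The helper only reads the logs and does not touch the database or any CTA service.

```shell
#!/bin/sh
# Illustrative helper (not a CTA command): print the last N lines of each
# log file given as an argument. Logs that do not exist yet or cannot be
# read are reported rather than treated as errors.
show_vacuum_logs() (
    lines=$1
    shift
    for log in "$@"; do
        echo "== $log =="
        if [ -r "$log" ]; then
            tail -n "$lines" "$log"
        else
            echo "(not readable yet)"
        fi
    done
)

# Log locations named in this paper.
show_vacuum_logs 20 \
    /opt/rainfinity/filemanagement/conf/DBMaintenance.log \
    /var/lib/pgsql/vacuum.log
```

Re-running the helper from time to time shows whether new entries are still being written.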
NOTE: Recalls will continue to be serviced by the CTA and CTA-HA during database
maintenance and recall services are unaffected by database activity. Once the
database maintenance task has been started, it cannot be stopped and no attempt
should be made to interrupt it or to start the filemanagement service manually. Doing
so can corrupt the CTA database, requiring a manual restore of the CTA configuration
and database from a backup.
If the database is larger than 200 GB, the database vacuum task may fail. The
database size can be found by using the following command from the CTA command
line:
du -sh /var/lib/pgsql/data/base

If the output indicates the directory is larger than 200 GB, please contact EMC Support
for a manual procedure to reduce the size of the CTA database and to perform the
vacuum.
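The size check can also be scripted so that the 200 GB threshold is compared automatically. This is a sketch under stated assumptions: the function names dir_size_kb and exceeds_limit are hypothetical, the DB_DIR override variable is added for illustration, and 200 GB is interpreted in 1024-based units to match what du reports.

```shell
#!/bin/sh
# Threshold from this paper: the vacuum task may fail above 200 GB.
LIMIT_GB=200

# Print the size of a directory in kilobytes (du -sk reports 1024-byte units).
dir_size_kb() {
    du -sk "$1" | awk '{print $1}'
}

# Succeed (exit status 0) when size_kb ($1) is larger than limit_gb ($2).
# Assumption: 1 GB is treated as 1024 * 1024 KB.
exceeds_limit() {
    [ "$1" -gt $(( $2 * 1024 * 1024 )) ]
}

# Database directory named in this paper; DB_DIR is a hypothetical override.
DB_DIR=${DB_DIR:-/var/lib/pgsql/data/base}
if [ -d "$DB_DIR" ]; then
    size_kb=$(dir_size_kb "$DB_DIR")
    if exceeds_limit "$size_kb" "$LIMIT_GB"; then
        echo "Database is larger than ${LIMIT_GB} GB: contact EMC Support before vacuuming."
    else
        echo "Database is within the ${LIMIT_GB} GB limit."
    fi
else
    echo "Directory not found: $DB_DIR"
fi
```

The script only reports the result; it does not start or block a vacuum.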

Backing up the database


The CTA database is not an integral part of recalling archived data and thus the loss
of the CTA database will not inherently cause data unavailability. However, the CTA
database stores significant amounts of metadata describing the states of files on
primary and secondary storage as well as their relationships. The loss of this
information will affect most features of the appliance as previously described. It is
therefore desirable to protect the CTA database through periodic backups.
Periodic backups should be taken of the CTA configuration and database by
leveraging the Automatic Backup/Recovery feature. This feature provides scheduling
for generating backups and writing them to Centera or an NFS NAS repository. For
more information on configuring and using this feature, consult the EMC Cloud Tiering
Appliance 7.5 Getting Started Guide and the Online Help modules.
Specific consideration should be given to the types of events that modify the contents
of the CTA database. In particular, archiving, stub scanning, multi-tier archiving,
orphan deletion, and repository migration tasks have the potential to generate large
numbers of database changes. It is typically advisable to schedule backups
after these types of tasks have completed.
In the event of a disaster, the output file from the Automatic Backup/Recovery feature
can be restored on CTA using the fmrestore utility from the CTA CLI. This will erase any
existing configuration on CTA and replace it with the configuration and database
contents stored in the backup file. The EMC Cloud Tiering Appliance 7.5 Getting
Started Guide and the Online Help modules provide instructions on how to recover
backup files from Centera and NAS after a disaster.

Restoring orphan management


The orphan file management feature allows CTA to clean up unused data on
secondary storage. In order to remove a piece of data from secondary storage the CTA
must have an entry in its database to indicate that it was the creator of that piece of
data. As an example, EMC customers using EmailXtender may have objects stored
on EMC Centera that were not created by and should not be deleted by CTA. Therefore
CTA will not delete anything from a secondary storage location unless it has a
database entry which references it.
Due to this requirement it is very important to preserve the integrity of the CTA
database. It is highly recommended to perform periodic backups of the CTA database
as mentioned above using the Automatic Backup/Recovery feature. However, it
should be noted that even when periodic backups are taken there may still be some
orphan data on secondary storage that cannot be identified by CTA. Consider the
following sequence of events:
a. At 1 P.M. the administrator runs the fmbackup command.
b. At 2 P.M. an archiving task is launched.
c. At 3 P.M. a user deletes a stub file created by the archiving task that was just
launched.
The next day, the administrator needs to recover from a disaster by rebuilding a new
CTA, so the backup file is loaded using the fmrestore command.
In this scenario, there will not be a record in the CTA database referencing the object
on secondary storage nor will there be a stub on primary storage. CTA will not be able
to delete the object on secondary storage because it cannot confirm that it created it.

In order to minimize the impact of this scenario, an administrator should take frequent
backups of the CTA database, especially during periods of heavy archiving activity.
If the user had not deleted the stub in step c, the CTA stub scanner would have
re-created the CTA database entry when it read the stub contents during its regularly
scheduled scan. This would restore the ability to perform orphan file management.
In a worst-case scenario where no backup of the CTA database was ever taken, the
new CTA will be able to perform orphan management for all stubs found by the stub
scanner, but any data orphaned before the stub scanner runs can no longer be
deleted by CTA.
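The rules in this section reduce to two questions about an object on secondary storage after a restore: does the CTA database hold an entry for it, and does its stub still exist on primary storage? The toy model below encodes those rules for illustration only. It is not a CTA command, and the function name orphan_outcome and its yes/no arguments are hypothetical.

```shell
#!/bin/sh
# Toy model of the rules described in this section (not a CTA command).
# Arguments: db_entry (yes|no)  stub_present (yes|no)
orphan_outcome() {
    case "$1:$2" in
        yes:yes|yes:no)
            echo "deletable by CTA once orphaned: database entry exists" ;;
        no:yes)
            echo "deletable after stub scan: entry is re-created from the stub" ;;
        no:no)
            echo "not deletable by CTA: no entry and no stub" ;;
        *)
            echo "usage: orphan_outcome yes|no yes|no" >&2
            return 1 ;;
    esac
}

# Scenario above: backup at 1 P.M., archiving at 2 P.M., stub deleted at
# 3 P.M., then a restore from the 1 P.M. backup.
orphan_outcome no no
# Same scenario, but the user did not delete the stub.
orphan_outcome no yes
```

The model makes the case for frequent backups concrete: each backup taken after an archiving task moves the objects that task created from the second or third outcome to the first.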

Conclusion
Cloud Tiering Appliance provides several features to allow storage administrators to
manage archived data and the stub files that are written to primary storage when files
are archived. This document describes best practices for maintaining and backing up
the CTA database on which those features, including orphan file management, depend.
