Abstract
This document discusses the best practices for managing the internal
database used by EMC Cloud Tiering Appliance.
October 2011
Table of Contents
Executive summary
Audience
Conclusion
Executive summary
The EMC Cloud Tiering Appliance (CTA) is used to implement a tiered storage
strategy through file level archiving, thereby facilitating significant storage savings.
This document discusses the best practices for managing the internal database used
by Cloud Tiering Appliance.
Audience
This white paper is intended for storage administrators who are tasked with
managing the tiered storage environment and must interact with the Cloud Tiering
Appliance on a regular basis. It is also intended for field personnel who are
responsible for implementing a tiered storage solution with the EMC Cloud Tiering
Appliance.
This document is supplemental to the published product documentation and is not
intended as a replacement for it. It assumes that the reader is already familiar
with those documents, including the Cloud Tiering Appliance and Cloud Tiering
Appliance/VE Getting Started Guide, the Cloud Tiering Appliance and Cloud
Tiering Appliance/VE Release Notes, and the CTA and CTA/VE Interoperability
Matrix. If you have not reviewed those documents, please do so in addition to
reading this paper in order to gain a complete understanding of the Cloud
Tiering Appliance technology.
Generating reports or managing archived files using the Archived Files page
When using the command line, a large number of the rffm commands make extensive
use of the CTA database. In addition, the scheduler component of CTA triggers tasks
to run at specific times. When a task runs, regardless of its type, it reads or
modifies the contents of the database.
The callback daemons (ACD, CCD, and FCD) do not utilize the database.
User interaction
Users interact with the CTA database indirectly by using the GUI and CLI. At no point
will the underlying database implementation be exposed to the CTA administrator.
Interacting directly with the database can be extremely risky or disruptive. If an
administrator needs to access the database, EMC support services should be
contacted to provide assistance.
When this command is used, STARTTIME should be entered in the format
YYYY-MM-DD HH:MM:SS, and WEEKS is an integer giving the interval, in weeks, at
which the vacuum runs. For example:
# rffm scheduleVacuumTask --VacuumStartTime=2011-06-01 12:00:00 --VacuumWeekRepetition=4
This command schedules a vacuum task to run every 4 weeks, starting on
June 1, 2011 at 12:00 PM. When a vacuum task is scheduled to run, an alert is sent
automatically to notify an administrator of the impending task and that all tasks will
be stopped before the vacuum can run.
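The scheduling command can be assembled and checked before it is run. The following is a minimal sketch only: build_vacuum_schedule_cmd is a hypothetical helper, not part of CTA, and it assumes the timestamp format used in the example (YYYY-MM-DD HH:MM:SS) and the double-dash option style of the first flag. It prints the command rather than executing it.

```shell
# Hypothetical helper (not part of CTA): validates its inputs and prints,
# but does not run, an rffm scheduleVacuumTask command line.
build_vacuum_schedule_cmd() {
    start_time="$1"   # assumed format: "YYYY-MM-DD HH:MM:SS"
    weeks="$2"        # repetition interval in weeks

    # Reject timestamps that do not have the assumed shape.
    if ! printf '%s\n' "$start_time" |
        grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$'; then
        echo "invalid start time: $start_time" >&2
        return 1
    fi

    # Reject an interval that is empty, non-numeric, or exactly 0.
    case "$weeks" in
        ''|*[!0-9]*|0)
            echo "invalid week count: $weeks" >&2
            return 1
            ;;
    esac

    printf 'rffm scheduleVacuumTask --VacuumStartTime=%s --VacuumWeekRepetition=%s\n' \
        "$start_time" "$weeks"
}
```

Calling the helper with the start time 2011-06-01 12:00:00 and an interval of 4 prints the command for review. The check covers only the shape of the timestamp, not whether the date itself is valid.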
To start a vacuum of the CTA database immediately, use the following command from
the CTA command line:
rffm doDBMaintenance
This command will stop all active File Management tasks including archiving, stub
scanning, simulation, and orphan deletion. It will then stop the PostgreSQL database
service and start a database vacuum task. During this period, the CTA cannot be
used for any tasks and the filemanagement service must not be started manually.
Whether the vacuum is started automatically by the scheduled vacuum task or using
the manual command, the effect on running tasks and services is the same. The
automatic scheduled vacuum task should be used when possible to ensure regular
maintenance is performed on the database to prevent it from growing too large.
To monitor the progress of the database vacuum, periodically check the following
logs:
/opt/rainfinity/filemanagement/conf/DBMaintenance.log
/var/lib/pgsql/vacuum.log
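Checking both logs can be wrapped in a small read-only helper. This is a sketch under stated assumptions: show_vacuum_progress is a hypothetical function name, and the choice to show the last five lines of each log is arbitrary. It only reads the logs and does not interact with the vacuum task.

```shell
# Hypothetical helper (not part of CTA): prints the last five lines of each
# log that is readable. With no arguments it uses the two paths listed above.
show_vacuum_progress() {
    if [ "$#" -eq 0 ]; then
        set -- /opt/rainfinity/filemanagement/conf/DBMaintenance.log \
               /var/lib/pgsql/vacuum.log
    fi
    for log in "$@"; do
        if [ -r "$log" ]; then
            echo "== $log =="
            tail -n 5 "$log"
        else
            # The log may not exist until maintenance has started.
            echo "== $log (not readable yet) =="
        fi
    done
}
```

The function can be rerun by hand at intervals during maintenance. Because it only reads files, stopping it has no effect on the vacuum itself.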
NOTE: Recalls will continue to be serviced by the CTA and CTA-HA during database
maintenance and recall services are unaffected by database activity. Once the
database maintenance task has been started, it cannot be stopped and no attempt
should be made to interrupt it or to start the filemanagement service manually. Doing
so can corrupt the CTA database, requiring a manual restore of the CTA configuration
and database from a backup.
If the database is larger than 200 GB, the database vacuum task may fail. The
database size can be found by using the following command from the CTA command
line:
du -sh /var/lib/pgsql/data/base
If the output indicates the directory is larger than 200 GB, please contact EMC Support
for a manual procedure to reduce the size of the CTA database and to perform the
vacuum.
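The size check can also be scripted so that it yields a clear result before maintenance is started. The following is a minimal sketch: check_db_size is a hypothetical helper, and it assumes that 1 GB is counted as 1024 x 1024 KB, which may differ from how the 200 GB limit is measured by EMC. It uses du -sk so that the size can be compared numerically.

```shell
# Hypothetical helper (not part of CTA): compares the size of a directory
# against a limit in GB. Returns 0 if within the limit, 1 if over it,
# and 2 if the size could not be read.
check_db_size() {
    dir="${1:-/var/lib/pgsql/data/base}"
    limit_gb="${2:-200}"

    # du -sk reports the total in 1 KB units; keep only the number.
    used_kb="$(du -sk "$dir" 2>/dev/null | awk '{print $1}')"
    if [ -z "$used_kb" ]; then
        echo "cannot read size of $dir" >&2
        return 2
    fi

    # Assumption: 1 GB = 1024 * 1024 KB.
    limit_kb=$((limit_gb * 1024 * 1024))
    if [ "$used_kb" -gt "$limit_kb" ]; then
        echo "over limit: ${used_kb} KB used, limit ${limit_gb} GB"
        return 1
    fi
    echo "within limit: ${used_kb} KB used, limit ${limit_gb} GB"
}
```

Run without arguments, the helper checks the database directory against 200 GB. A return status of 1 corresponds to the case in which EMC Support should be contacted before a vacuum is attempted.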
To minimize the impact of this scenario, an administrator should take frequent
backups of the CTA database, especially during periods of heavy archiving activity.
If the user had not deleted the stub in step 3, the CTA stub scanner would have
re-created the CTA database entry when it read the stub contents during its regularly
scheduled scan. This would restore the ability to perform orphan file management.
In a worst-case scenario where no backup of the CTA database was ever taken, the
new CTA will be able to perform orphan management for all stubs found by the stub
scanner, but any data orphaned before the stub scanner runs can no longer be
deleted by CTA.
Conclusion
Cloud Tiering Appliance provides several features to allow storage administrators to
manage archived data and the stub files that are written to primary storage when files
are archived. This document describes best practices for handling stub and orphan
files.