
Essential Elements of Data Center Facility Operations


Schneider Electric
Data Center Science Center
White Paper 196

Schneider Electric Data Center Science Center WP 196 Presentation February 2014

70% of data center outages are directly attributable to human error, according to the Uptime Institute's analysis of its abnormal incident reporting (AIR) database¹. This figure highlights the critical importance of having an effective operations and maintenance (O&M) program. This presentation describes unique management principles and provides a comprehensive, high-level overview of the program elements needed to operate a mission critical facility efficiently and reliably throughout its life cycle. Practical management tips and advice are also given.


Introduction
Importance of operations and maintenance (O&M) program

Most facility outages are attributable to human (operator) error
Majority of data center facility TCO is in OPEX, not CAPEX; OPEX is where the greatest potential cost savings reside
Energy costs are the largest portion of OPEX, and they are rising
Drive for energy efficiency is reducing capacity safety margins and system redundancy, increasing the importance of proactive maintenance and data center infrastructure management (DCIM)
High levels of facility automation and equipment performance data have created new opportunities for enhancing reliability while reducing costs, when properly managed


Mission Critical Mentality


Failure is not an option
Focuses on risk mitigation
Grasps interconnectedness of facility and IT systems
Data center availability is paramount
Highly complex, fast-paced changes in mission critical facility
Challenging to manage

Unique outside pressures


Government regulations
Customer audits

NOTE: In this paper, only system planning is covered. System planning refers to the power, cooling, racks,
and other support infrastructure systems. Planning related to the IT equipment is not discussed here.

Mission Critical Mentality


Code of Conduct

Mission Critical Mindset principles and their impact

Principle: Focused on risk mitigation in all operational and maintenance activities, work processes, and procedures
Impact: Proactively deals with all potential threats to system availability and worker/occupant safety

Principle: Acting with confidence and patience that is an outgrowth of careful planning and preparation
Impact: Prevents risks from becoming problems; enables faster response times and fewer errors if problems do arise

Principle: Analytical, process-driven approach to risk avoidance and problem solving
Impact: Helps identify and mitigate risk in complex environments; ensures predictable and safe operation

Principle: Comprehensive understanding of the function and interconnectedness of facility systems and components
Impact: Quickly identifies and resolves potential threats or actual problems; avoids or reduces system downtime

Principle: Commitment to continuous learning and process improvement
Impact: Increases skills and operational efficiency to maintain an edge in a constantly changing environment

12 Essential Elements of an O&M Program


Environmental Health and Safety
Key components include

Injury, illness prevention


Electrical safety
Hazard analysis
Hazard communication


12 Essential Elements of an O&M Program


Environmental Health and Safety
Key Program Attributes and Descriptions

Safety plans and training: Written safety plans must be established that describe the safe work practices and procedures to be observed by all workers. Regular training on the program elements must also be conducted.

Hazard analysis: All operational procedures shall start with an analysis of the possible hazards involved. Risks must be identified and safety measures assigned.

Lockout/tagout procedures: Proper procedures to prevent the unexpected energizing or startup of machines or equipment (or the release of stored energy) shall be used when servicing or maintaining equipment.

Personal protective equipment (PPE): Appropriate protective equipment should be provided, properly sized, stored, maintained, and utilized as required to mitigate identified safety hazards.

Hazardous material handling: Hazardous materials must be properly identified, labeled, stored, maintained, and used in conformance with manufacturers' requirements, local laws, and ordinances.

Hazard communications program: Includes a list of hazardous chemicals, use of material safety data sheets (MSDS), proper labeling of all hazardous materials containers, and employee training on use of and protection from hazardous materials.

Compliance with all applicable health and safety laws and regulations: Requirements will likely vary by region and by level of government (e.g., local, state, federal).


12 Essential Elements of an O&M Program


Personnel Management
Hiring and training
Competent, team-oriented people with
mission critical mentality
Well-rounded team

Develop staffing model


Clearly defined roles and responsibilities


12 Essential Elements of an O&M Program


Emergency Preparedness and Response
Develop emergency operating procedures (EOPs) for all high-risk failure scenarios
Develop, rehearse escalation
procedures
Conduct regular scenario drills
Formal failure analysis for significant
facility events

See White Paper 199, Data Center Emergency Preparedness and Response, for
more information.

12 Essential Elements of an O&M Program


Maintenance Management
Key tasks
Asset management
Work order management
Spare parts management

Ensure continual power and cooling performance

Improved reliability with
Good asset intelligence
Proactive preventative and predictive maintenance plan

Results in
More accurate maintenance budget
forecasts
Minimized TCO and downtime

12 Essential Elements of an O&M Program


Maintenance Management > Asset Management
Accurate, consistent tracking of critical facility assets
Computerized maintenance management system (CMMS)
Record, track, and manage asset data and maintenance history

Scope of service (SOS)


Defines maintenance frequency, specific activities, # of man hours
Establishes standard for procurement of

Service agreements
Maintenance scheduling
Procedure development
Continuous program improvement


12 Essential Elements of an O&M Program


Maintenance Management > Asset Management
Recommended asset management information
Type - top level classification (e.g. electrical,
mechanical, fire system)
Sub-type (e.g. PDU, UPS, CRAH)
Text description of asset
Make - asset manufacturer name
Model - manufacturer model #
Size or rating
Location ID (room/area)
Trade responsible for maintenance
Manufacturer serial #
Install date
Warranty expiration date
Date asset to be replaced
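
The recommended fields above map naturally onto a single record type. The following is a minimal sketch in Python, not a CMMS schema: the class name `AssetRecord`, the field names, the two helper methods, and every example value are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AssetRecord:
    """One critical facility asset, using the recommended fields listed above."""
    asset_type: str            # top-level classification, e.g. "electrical"
    sub_type: str              # e.g. "PDU", "UPS", "CRAH"
    description: str           # free-text description of the asset
    make: str                  # asset manufacturer name
    model: str                 # manufacturer model number
    size_or_rating: str        # e.g. a kVA or kW rating
    location_id: str           # room/area identifier
    responsible_trade: str     # trade responsible for maintenance
    serial_number: str         # manufacturer serial number
    install_date: date
    warranty_expiration: date
    replacement_date: date     # date the asset is to be replaced

    def under_warranty(self, today: date) -> bool:
        """True while the warranty expiration date has not passed."""
        return today <= self.warranty_expiration

    def due_for_replacement(self, today: date) -> bool:
        """True once the planned replacement date has been reached."""
        return today >= self.replacement_date


# Hypothetical example values, for illustration only.
example = AssetRecord(
    asset_type="electrical",
    sub_type="UPS",
    description="UPS serving row A",
    make="ExampleMake",
    model="EX-100",
    size_or_rating="100 kVA",
    location_id="ROOM-1",
    responsible_trade="electrician",
    serial_number="SN-0001",
    install_date=date(2014, 2, 1),
    warranty_expiration=date(2016, 2, 1),
    replacement_date=date(2029, 2, 1),
)
```

Keeping the warranty and replacement dates as real dates rather than free text is what lets a tracking system answer questions such as "which assets leave warranty this quarter" consistently.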

12 Essential Elements of an O&M Program


Maintenance Management > Work Order Management

Tool for service process management


Allows work to be
Correctly prioritized
Assigned the right resources
Completed on schedule

Standalone ticketing system OR


Integrated work order module in a CMMS or DCIM system
Provide valuable information to facility personnel


12 Essential Elements of an O&M Program


Maintenance Management > Spare Parts Management
Shortens mean time to recovery (MTTR)
Inventory should include parts with lead times longer than acceptable
downtime
Maintain spare parts list
Stock frequently used items
Re-evaluate annually
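
The stocking rule above (hold any part whose lead time exceeds the acceptable downtime, plus frequently used items) can be sketched as a simple filter. This is an illustrative Python sketch under stated assumptions: the `Part` type, the `parts_to_stock` function, the `frequent_use_threshold` default, and the sample catalog are all hypothetical and do not come from the white paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Part:
    name: str
    lead_time_hours: float   # time to obtain the part from the supplier
    uses_per_year: int       # how often the part was consumed last year


def parts_to_stock(parts: List[Part],
                   acceptable_downtime_hours: float,
                   frequent_use_threshold: int = 4) -> List[str]:
    """Return names of parts that should be held on site.

    A part qualifies if its lead time is longer than the acceptable
    downtime, or if it is used at least `frequent_use_threshold` times
    per year (the threshold is an assumed, site-specific value).
    """
    return [
        p.name
        for p in parts
        if p.lead_time_hours > acceptable_downtime_hours
        or p.uses_per_year >= frequent_use_threshold
    ]


# Hypothetical catalog, for illustration only.
catalog = [
    Part("fan module", lead_time_hours=72, uses_per_year=1),
    Part("air filter", lead_time_hours=2, uses_per_year=12),
    Part("cable tie pack", lead_time_hours=1, uses_per_year=0),
]
stock_list = parts_to_stock(catalog, acceptable_downtime_hours=4)
```

Rerunning the filter with updated lead times and usage counts is one way to carry out the annual re-evaluation.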


12 Essential Elements of an O&M Program


Change Management
Method of Procedure (MOP) process
Detailed checklist of specified tasks

MOP helps control work activity along with
Operational procedure
development and review
Risk analysis and
communication
Structured work practices
Vendor/contractor
supervision


12 Essential Elements of an O&M Program


Documentation Management
Facilitates development of

Accurate procedures
Proper training
Workplace safety
Process improvement

Document management software application


System to keep critical infrastructure records
organized, up-to-date
Detailed checklist of specified tasks

Manual process can also work


12 Essential Elements of an O&M Program


Training
Establish training program that organizes operational and maintenance tasks into categories
Mapped to capability levels: basic, intermediate, advanced

Train and evaluate personnel to certify them


Require annual recertification exams

Ongoing education keeps personnel current
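
One way to tie certification level to tasking is to check, before assigning work, that a technician's certification is current and at least as high as the level the task requires. The sketch below is a Python illustration only: the names `Level`, `Certification`, and `may_perform`, and the choice of 365 days to model annual recertification, are assumptions rather than part of the white paper.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import IntEnum


class Level(IntEnum):
    """Capability levels, ordered so that comparisons work as expected."""
    BASIC = 1
    INTERMEDIATE = 2
    ADVANCED = 3


@dataclass
class Certification:
    level: Level
    certified_on: date

    def is_current(self, today: date) -> bool:
        # Assumption: "annual recertification" is modeled as 365 days.
        return today - self.certified_on <= timedelta(days=365)


def may_perform(cert: Certification, task_level: Level, today: date) -> bool:
    """A task may be assigned only if the certification is current and
    its level is at least the level the task category requires."""
    return cert.is_current(today) and cert.level >= task_level


# Hypothetical technician certified at the intermediate level.
cert = Certification(Level.INTERMEDIATE, date(2014, 1, 15))
```

Using an ordered enumeration keeps the rule "a higher level covers lower-level tasks" in one comparison instead of a lookup table.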


12 Essential Elements of an O&M Program


Infrastructure Management
System to match facility resources with changing IT requirements
Prevent downtime
Improve resiliency and response
Reduce operating expenses
Provide a sound basis for capacity planning decisions
Three key tasks
Facility monitoring
Capacity management
IT/Facilities integration

12 Essential Elements of an O&M Program


Quality Management
Key components
Quality Assurance (QA): Typified by process and procedure
standardization
Quality Control (QC): Quality checks, inspections, and audits
Continuous Quality Improvement


12 Essential Elements of an O&M Program


Energy Management

Energy is typically the single largest data center expense
3 core tasks of an effective energy management program
Performance benchmarking
Efficiency analysis
Strategic energy sourcing
Optimized energy sourcing
Reduce exposure to price volatility
Secure pricing that fits budget and business objectives


12 Essential Elements of an O&M Program


Financial Management

Financial-related issues can impact a facility's day-to-day availability and resiliency
Processes should focus on
Purchasing
Invoice matching
Financial reporting/analysis
Facility managers and purchasing department
should maintain close relationship


12 Essential Elements of an O&M Program


Performance Monitoring and Review
Regularly monitor and review facility
performance
Determines health and effectiveness
of O&M program
Shows where it is trending
Quality process should incorporate
facility KPIs
Benefits
Aligns operational activities with
business goals
Positive reinforcement for innovation
and process improvement

Common Mistakes
Mistake: Maintenance program is not driven by metrics
Description: Often the result of poor asset management; no linkage made between break/fix maintenance activities and preventative maintenance

Mistake: Poor training
Description: Training is not formalized and/or is not taken seriously; over-reliance on technician shadowing; no linkage between certification level and tasking

Mistake: Ineffective change management
Description: Inadequate risk analysis; poor or non-existent procedures; no defined process for performing critical work tasks

Mistake: Failure to consistently test & evaluate skills
Description: Existing skills/training level not formally evaluated; scenario drills are not employed; incident and drill results are not evaluated

Mistake: Poor documentation
Description: No coherent sequence of operations; drawings and schedules are outdated; lack of revision control and/or lack of digitization

Mistake: Failure to develop and implement a quality control system
Description: Lack of governance or resources to measure, monitor, and review performance

Mistake: Stuck in manual mode
Description: Failure to implement CMMS, EDMS, DCIM, etc.

Mistake: Overconfidence
Description: Assumption that future performance can be predicted by past experience


Facility Operations Services


Using Outside Vendors for O&M Programs

Offer services for both existing and new data centers


Advise on
Develop
Implement
Operate

See White Paper 198, How to Write an Effective RFP for Data Center Facility
Operations Services, for more information.

12 Essential Elements of an O&M Program


Performance Monitoring and Review > Recommended Facility KPIs
Critical load uptime
Load redundancy maintained
Support system uptime
Maintenance completion
Staffing coverage
Security policy conformance
Emergency preparedness drills
Emergency response procedure adherence
Safety policy and procedure adherence
Procedure development, management and use
Quality control/improvement
Training compliance
Process improvement
Operational reporting
Proper event notification and escalation
Timely and accurate cost reporting
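
The presentation names these KPIs but does not define how to calculate them. As a hedged illustration, two of them (critical load uptime and maintenance completion) could be expressed as simple percentages. The formulas, function names, and sample figures in this Python sketch are assumptions, not definitions from the white paper.

```python
def uptime_percent(period_hours: float, downtime_hours: float) -> float:
    """Assumed definition: share of the reporting period with the load up."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime_hours must be between 0 and period_hours")
    return 100.0 * (period_hours - downtime_hours) / period_hours


def completion_percent(completed: int, scheduled: int) -> float:
    """Assumed definition: completed maintenance tasks over scheduled tasks."""
    if scheduled <= 0:
        raise ValueError("scheduled must be positive")
    return 100.0 * completed / scheduled


# Hypothetical month: 720 hours with no critical load downtime,
# and 45 of 50 scheduled maintenance tasks completed.
monthly_uptime = uptime_percent(720, 0)
monthly_completion = completion_percent(45, 50)
```

Whatever definitions a site adopts, fixing them in one place keeps month-to-month trend reports comparable.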

Conclusion
Efficient Operations & Maintenance program
Mitigates threats, effects of human error
Focus on 12 essential elements of O&M program
Must have facilities operation team with mission critical mindset
Operational philosophy focuses on
Risk mitigation
Preparedness
Standardized processes
Continuous improvement


Resources
Facility Operations Maturity Model for Data Centers
White Paper 197
How To Write an Effective RFP For Data Center Facility Operations Services
White Paper 198
Data Center Emergency Preparedness and Response
White Paper 199
Classification of Data Center Infrastructure Management (DCIM) Tools
White Paper 104
How Data Center Infrastructure Management (DCIM) Software Improves Planning and Cuts
Operational Costs
White Paper 107
Avoiding Common Pitfalls of Evaluating and Implementing DCIM Software
White Paper 170
Browse all APC white papers
whitepapers.apc.com
Browse all APC TradeOff Tools
tools.apc.com