vCOPS Incident MGT Highlevel PDF

ITT3241
Operating a More
Reliable Cloud Through
Proactive Incident and
Problem Management
Rich Benoit, VMware, Inc.
Doug Huber, VMware, Inc.
#vmworldittran
Disclaimer
 This session may contain product features that are currently under development.
 This session/overview of the new technology represents
no commitment from VMware to deliver these features in
any generally available product.
 Features are subject to change, and must not be included in
contracts, purchase orders, or sales agreements of any kind.
 Technical feasibility and market demand will affect final delivery.
 Pricing and packaging for any new technologies or features
discussed or presented have not been determined.
Session Executive Summary
 Cloud Transformation
 What is Incident and Problem Management?
 Why Proactive Incident and Problem Management
 Current State
 Evolution from Reactive to Proactive Incident and Problem
Management
 Operational Benefits
 Key Performance Indicators
A New Operating Model for the Cloud Era
[Figure: operating model progressing through the Reactive, Proactive, and Innovative stages, spanning IT Business Management; People, Culture & Organization; Processes & Control; and Software Technology & Architecture]
Five Capabilities Which Unlock Cloud Benefits
Capability: On-Demand Services
Description: Service catalog with standardized offerings and tiered SLAs, actively managed and governed throughout its lifecycle, and with end-user access via a self-service portal
Major processes impacted: Request fulfillment; Application development

Capability: Automated Provisioning & Deployment
Description: Automated provisioning, release and deployment of infrastructure, platform and end-user compute services
Major processes impacted: Request fulfillment; Application development; Release and deployment management

Capability: Proactive Incident & Problem Management
Description: Monitoring and filtering of events, automatic incident resolution, and problem diagnosis
Major processes impacted: Incident management; Request fulfillment; Event management; Application development

Capability: Policy-based Security, Compliance & Risk Management
Description: Security, compliance, and risk management policies embedded into standard configurations enabling policy-aware applications and automation of security, audit, and risk management processes
Major processes impacted: Information security management; Compliance management; Risk management

Capability: IT Financial Management (ITFM) for Cloud
Description: IT cost transparency and service-level usage-based ‘showbacks’ or ‘chargebacks’ using automated metering and billing tools
Major processes impacted: Financial management; Supplier management; Demand management
Let's Agree Upon a Definition…
Although these process areas are based on ITSM principles in Level 1 of the Maturity model, by Level 3 what had been a very manual, reactive set of processes is beginning to be automated and to become much more proactive in nature. This is Cloud Operations.
Incident Management
Focuses on how to handle performance problems or outages. The primary focus of Incident Management is to manage the incident until it is resolved.
Problem Management
Focuses on identifying root causes of repetitive and high-priority incidents. Once root causes have been identified, a plan of action is generated that will ideally repair the underlying problem. If the problem can’t be fixed, additional monitoring and event management handling may be implemented in an attempt to minimize or eliminate future occurrences of the problem.
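As a rough illustration of the Problem Management definition above, the following Python sketch flags repetitive and high-priority incidents as candidates for root cause analysis. The Incident fields, the repeat threshold of 3, and the priority scale (1 = highest) are assumptions made for this example; they do not come from the session or from any VMware product.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    # All fields are hypothetical; a real ticketing system will differ.
    incident_id: str
    component: str  # affected service or configuration item
    symptom: str    # normalized symptom category
    priority: int   # assumed scale: 1 = highest priority


def problem_candidates(incidents, repeat_threshold=3, high_priority=1):
    """Return sorted (component, symptom) pairs that may warrant a problem record.

    A pair qualifies if it recurs at least `repeat_threshold` times, or if
    any single incident for it is at or above the high-priority level.
    """
    counts = Counter((i.component, i.symptom) for i in incidents)
    repeated = {key for key, n in counts.items() if n >= repeat_threshold}
    urgent = {
        (i.component, i.symptom)
        for i in incidents
        if i.priority <= high_priority
    }
    return sorted(repeated | urgent)
```

Each returned pair would then be handed to whoever owns root cause analysis; the sketch only selects candidates and does not diagnose anything.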
Cloud Operations unlocks the benefits of Cloud
Efficiency: Free up as much as 25% of labor operating costs through standardization, automation, and streamlining operations (1)
Agility: Reduce time to market for new innovations and increase flexibility
Reliability: Ensure data and application security, compliance, availability, and recoverability to employees and customers
(1) Your savings may vary
Why Proactive Incident and Problem Management
Cloud is changing how resources are shared and consumed today
Proactive Management: Analyze, Optimize, Forecast
Intelligent Automation: Performance, Capacity, Configuration
Comprehensive Visibility: Health, Risk
Efficiency, Agility, Reliability
Typical Process Flow Today
How Did We Get Here?
Constant Evolution of the Tools…

Generation 1: Focused on infrastructure management inside siloed technology domains.
IT architectures were a simple 3-tier hierarchy.
[Figure: Reactive Incident Management and Reactive Problem Management, handled by the Help Desk and Administrators]

Generation 2: A focus on ITIL was added and on developing a framework around IT management processes that could be implemented in tools.
IT architectures morphed to a full mesh.
[Figure: Reactive Incident Management and Reactive Problem Management, handled by Level 1, Level 2, and Level 3 Support]

Generation 3: Virtualization quickly exposed how a lack of IT governance and process results in sprawl and higher costs, which has led to more focus on end-to-end process automation to get the advantage of ITOM investments.
Complexity exploded.
[Figure: Reactive Incident Management and Reactive Problem Management, supported by Automated Workflows and Interactive Workflows, handled by Level 1, Level 2, and Level 3 Support]
Cloud Ops
 Automating incident and problem management in the data center is the key to becoming proactive
 Intelligent analytics and control continuously:
• Assess the thousands of performance metrics and available capacity across the entire IT stack
• Consider all business and physical constraints
• Drive the necessary actions to tune and maintain the environment in an optimal operating state
 Instead of alerting you when problems occur, or are about to occur, Cloud Ops:
• Optimizes performance, maximizes infrastructure efficiencies, and reduces operational costs
• Prevents events/alerts from happening
• Keeps the environment in a “healthy” state
 Intelligently prioritize resources and automatically scale up or down as performance and business demand fluctuates
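The assess, consider constraints, and act loop described on this slide can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not how any VMware product works: the Policy fields, the 50% target utilization, the tolerance band, and the instance limits are all hypothetical values chosen for illustration.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Policy:
    # Hypothetical business and physical constraints for one service.
    target_utilization: float = 0.5  # desired average utilization (0..1)
    tolerance: float = 0.125         # no action while within this band
    min_instances: int = 2           # floor, e.g. for availability
    max_instances: int = 20          # ceiling, e.g. budget or capacity


def desired_instances(current, recent_utilization, policy):
    """Return the instance count that moves utilization back toward target.

    current: number of instances running now
    recent_utilization: recent utilization samples, each between 0 and 1
    """
    avg = mean(recent_utilization)  # assess the recent metrics
    if abs(avg - policy.target_utilization) <= policy.tolerance:
        return current  # healthy: leave the environment alone
    # Scale proportionally; round first to absorb floating-point noise.
    needed = math.ceil(round(current * avg / policy.target_utilization, 6))
    # Respect the constraints before acting.
    return max(policy.min_instances, min(policy.max_instances, needed))
```

A scheduler would call desired_instances on each evaluation cycle and hand any difference from the current count to the provisioning layer. The tolerance band is there so that small fluctuations do not trigger constant scaling.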
What is Proactive Incident and Problem Management
[Figure: two groups of activities on an axis running from Reactive to Proactive]
Ensure and Restore Service Levels (center label: Problem)
Monitor: Slow performance
Remediate: Rollback change
Isolate: Config issue
Optimize for Efficiency and Cost (center label: Maintenance)
Plan: Utilization / forecast
Automate: Orchestrate changes
Optimize: Reclaim capacity
Operational Benefits Across Three Dimensions
Benefit dimensions and examples
Efficiency
▪ Simplified infrastructure management via abstraction, policy-based automation, and “app awareness”
▪ Less complex monitoring requirements due to smarter alerting of potential issues
▪ Fewer resources needed for labor-intensive processes as a result of automation
Agility
▪ Reduced time to awareness when a critical cloud service issue arises
▪ Greater speed in addressing faults or performance-based issues
▪ Capacity scaled prior to impacting performance or business requirements
Reliability
▪ Improved quality of service and experience for consumers due to a reduction in downtime (planned and unplanned)
▪ Greater adherence to SLAs (e.g., availability, latency)
▪ Fewer events turning into incidents or problems through a proactive approach
Proactive Incident and Problem Management OPEX Savings
Incident Management Lifecycle Savings
 Manage/Resolve incidents
 Proactive alerts reduce costs 30-40%
Change Lifecycle Savings
 Manage changes to apps/infrastructure
 “Before/after” analysis reduces change-related incidents 30-40%
Incident Management Savings
 Managing Service Desk issues (Incidents)
 Manual threshold elimination reduces erroneous tickets by 50-60%
Problem Management Savings
 Closing problems after systems restored, includes root cause analysis
 Root cause analysis improves problem closure by 30%
Source: Reducing Operational Expense with Virtualization and Systems Management - Enterprise Management Associates
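To make "manual threshold elimination" concrete, here is a minimal Python sketch of one simple way to replace a hand-set static threshold with a band learned from a metric's own history. The mean plus or minus k standard deviations rule is a stand-in chosen for illustration; the session does not describe the analytics actually used, and the function names and the default k of 3.0 are assumptions.

```python
from statistics import mean, pstdev


def dynamic_band(history, k=3.0):
    """Return (low, high) bounds learned from past samples of one metric.

    Assumed rule: mean +/- k population standard deviations. A perfectly
    flat history yields a zero-width band, so any change is then flagged.
    """
    mu = mean(history)
    sigma = pstdev(history)
    return mu - k * sigma, mu + k * sigma


def is_anomalous(value, history, k=3.0):
    """True if `value` falls outside the band learned from `history`."""
    low, high = dynamic_band(history, k)
    return value < low or value > high
```

With a history that normally sits around 80, a static alert at 75 would raise a ticket on every sample, while the learned band only flags values that depart from normal behavior. That difference is the kind of erroneous-ticket reduction the slide refers to, although the percentages quoted there come from the cited source and not from this sketch.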
Business Impact and KPIs
Frequency of Interruption
KPI: Service impact covers a number of metrics including application response time, unscheduled and scheduled downtime, and frequency of security breaches.
IMPACT: IT staff costs (including overtime and on-call costs for support), business unit staff costs (including time lost when systems are down), and lost revenue from downtime.
Mean Time to Repair (MTTR)
KPI: Widely used to measure the time between a fault occurring and the fault being fixed. A good measure of how quickly IT responds to problems occurring in managed systems.
IMPACT: Less downtime and less lost productivity from idle business unit staff. Fewer IT resources are diverted from strategic project work to break-fix activity.
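Both KPIs can be computed directly from incident timestamps. The Python sketch below assumes each incident is recorded as a pair of datetimes (fault occurred, fault fixed); that record shape and the function names are assumptions for illustration, not a format defined in the session.

```python
from collections import Counter
from datetime import timedelta


def mttr(incidents):
    """Mean Time to Repair: average of (fixed - occurred) over all incidents.

    incidents: iterable of (fault_occurred, fault_fixed) datetime pairs.
    """
    durations = [fixed - occurred for occurred, fixed in incidents]
    if not durations:
        raise ValueError("MTTR is undefined without at least one incident")
    return sum(durations, timedelta()) / len(durations)


def interruptions_per_month(incidents):
    """Frequency of interruption: incident count keyed by (year, month)."""
    return Counter(
        (occurred.year, occurred.month) for occurred, _ in incidents
    )
```

Tracking both numbers over time shows whether a proactive approach is reducing the number of interruptions, the time each one takes to fix, or both.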
Business Impact and KPIs
Ease of Management
KPI: Overall ease of management and functionality that is unique to cloud. Most enterprises report daily management tasks are easier, or at least the same, in a virtual environment.
IMPACT: Provisioning is easier with templates than with traditional software installation; availability is easier to ensure with resource pooling and live migration.
Faster Service Deployment
KPI: Reducing the time to deploy, redeploy or move a business service is one of the most widely reported OpEx reduction outcomes.
IMPACT: Zero-downtime migration eliminates both application downtime costs for business users and overtime payments for out-of-hours migrations. Faster, cheaper lifecycle.
In Closing
 Proactive incident and problem management is enabling IT transformation in support of your cloud solutions
 Utilizing proactive cloud health monitoring capabilities through learned behavior analytics and methods helps your organization attain its IT and Business goals
 Identified Key Performance Indicators to evaluate and measure your journey to cloud operations
 Identified savings opportunities in IT Operating Expenses, while improving cloud services availability and quality
Learn more about VMware Cloud Solutions
 Maximize the power of cloud computing to:
• Deliver new IT services that fuel business growth
• Transform IT into a source of innovation
• Dramatically improve IT efficiency, agility and reliability
 Develop key capabilities in your organization with VMware Cloud Operations Services
• Advisory, education, and remediation services
• Insight, prioritized recommendations, and expert guidance to transform operational processes, organizational structures, and financial models
vmware.com/cloud
Additional IT Transformation Tracks
SESSION ID TITLE DAY TIME
ITT1918 Is My Organization Ready to Reap the Benefits of the Cloud? Monday 12:30 PM
ITT3237 From Reactive to Innovative: The Journey to the Cloud Crosses People, Process, Technology and Measurement Monday 02:00 PM
ITT3245 VMware on VMware: Our Journey to the Cloud (Part 1) Monday 05:00 PM
ITT3244 Planning and Measuring the Impact of Cloud: IT Metrics that Matter Tuesday 11:00 AM
ITT3242 Managing Cloud Security, Compliance, and Risk Management Tuesday 12:30 PM
ITT1953 Advice From Your Peers: How to Best Run and Manage a Cloud Environment Tuesday 02:00 PM
ITT3243 Delivering IT Financial Management for Cloud Tuesday 05:00 PM
ITT3238 Taking Your Workloads to the Cloud: Why, How, and When? Wednesday 08:30 AM
ITT3246 VMware on VMware: How the Virtualization Leader is Moving to the Cloud (Part 2) Wednesday 10:00 AM
ITT3241 Operating a More Reliable Cloud Through Proactive Incident and Problem Management Wednesday 11:30 AM
ITT3239 On-Demand IT: Leveraging Cloud for Efficient Self-Service IT Wednesday 04:00 PM
ITT3240 From Weeks to Hours: Automated Provisioning and Deployment Thursday 12:30 PM
FILL OUT A SURVEY
EVERY COMPLETE SURVEY IS ENTERED INTO DRAWING FOR A $25 VMWARE COMPANY STORE GIFT CERTIFICATE