
Many companies spend millions of dollars to build client/server computing
environments and many millions more to maintain and update those
environments. In most cases, the overall capacity of the environment is
unknown, and capacity planning and procurement are done without a defined
methodology. This results in something known as server sprawl, where
many servers run at minimal utilization.

Many customers are turning to VMware to solve the problem of server sprawl
by implementing VMware hypervisors in their server environments. When
planning for server consolidation through virtualization, many organizations
have a very limited understanding of their existing server environment in
terms of the different hardware they have, how that hardware is being used,
and which servers will make good virtualization candidates.

Large IT environments have dozens, sometimes hundreds or thousands, of
servers running multiple applications and services for a wide range of
departments, owners, and business domains. Deciding how to combine
these into fewer, more manageable physical resources, while at the same
time planning for future expansion, unexpected demands, and organizational
changes, can be a daunting task.

This is why an Infrastructure Virtualization Methodology is needed.

An infrastructure virtualization methodology helps you understand what
capacity exists in the organization today and how to minimize the impact of
virtualization on that workload.

In addition, a well-defined methodology will help an organization estimate
capacity requirements, understand the limits of what virtualization can do,
and take an approach that yields quick wins to build confidence in
virtualization.

A well-defined assessment methodology is critical for preventing resource
contention in the new virtual infrastructure, and it reduces the likelihood of
panic buying and last-minute surprises.

A proper approach to infrastructure virtualization helps an organization
shorten time to implementation. When an organization takes the necessary
time to assess, plan, and design, it decreases the time needed for all phases
of building out a virtual infrastructure. Without taking the proper time to
assess the current environment, the other three phases, Pilot, Plan and
Design, and Implementation, take longer because no time was taken at the
beginning of the project to understand the existing environment.

Developing an infrastructure virtualization assessment methodology involves
understanding what can and should be virtualized. A good assessment
methodology should include an inventory of all systems as well as an
understanding of the workload or performance characteristics of each.

The inventory should be used to identify ineligible candidates and answer
the question, “Can we virtualize this system?”

A workload or performance analysis helps to stratify the systems, understand
their workloads, and determine whether a system should be virtualized and,
if so, when.

Next, an infrastructure virtualization assessment methodology needs to
determine how to estimate capacity, either through the use of modeling
software or through rule-of-thumb estimates based on the performance
characteristics of the server.

In either case, the estimates should be conservative in order to leave room
for growth or for occasional spikes in workload on the server.

In organizations that are attempting to justify a virtualization project to
management, cost modeling should be part of any infrastructure
virtualization assessment. The total cost of ownership, payback period, and
return on investment of a virtual environment should be understood in order
to make an effective case for virtualization.

A cost model is built by understanding the organization's environment and
determining what IT costs the organization. These costs are usually
determined by meeting with various departments in the organization and
learning the true cost of IT infrastructure there.
This concludes the Infrastructure Virtualization Overview module. Let’s review
some of the key areas covered:

Capacity Planning is not done by many organizations today.

Organizations that don't know what capacity they have in their physical
environment have a difficult time determining how much capacity will be
needed for virtualization.

An infrastructure virtualization assessment methodology involves
inventorying the existing infrastructure, understanding the workload or
performance characteristics of the systems, performing consolidation
estimation, and justifying the project to management through the use of cost
modeling.

Module: Welcome to Infrastructure Virtualization – Understanding Your
Environment

Upon completion of this module, you should be able to:

Discuss the methods for collecting data.

Identify non-virtualization candidates

Understand the role of inventory

Understand some of the basic inventory metrics to collect on a system

To make good decisions about capacity planning and consolidation, the
project team must begin by obtaining a detailed understanding of the
capacity that is currently present. A starting point of any inventory is simply
to count the existing resources.

It is difficult to make decisions about how to proceed with an IT infrastructure
without an understanding of the total number of systems on the network.
VMware has found during assessments that, on average, 20% more systems
are discovered when a thorough inventory of all systems is performed.

There are a number of ways to collect the data, from manual to fully
automated.

In manual data collection, the customer and consultant must work together to
get an understanding of the inventory from the existing methods used to
track servers. The drawback of this method is that retired servers often get
repurposed for other projects, so they stay on the network long after they
are no longer being tracked. Manual methods also break down when the
method used to account for servers is not kept up to date as changes occur.

Partially automated solutions are agent based and usually provide a wealth of
information about the system. In this method of collection, an agent is
usually installed on each system. The agent collects the data and sends the
information to a centralized management server. The management console
can then query a database for information about each system.

The drawback of an agent based solution is that those systems that do not
have agents will be missed in the inventory.

Fully automated solutions will search the network and discover servers
through various network protocols. For example, servers may be located
through a WINS database search, a network broadcast, or a TCP/IP ping
sweep.

The drawback of fully automated inventory solutions is that they often
require certain TCP/IP ports to be opened on firewalls in order to discover all
the systems on the network.
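
As a rough illustration of how a fully automated tool might discover hosts, the sketch below performs a simple TCP connect sweep across a subnet. The subnet, probe ports, and timeout are hypothetical; a real discovery product combines many more protocols (WINS, broadcasts, ICMP) than this minimal example.

```python
# Minimal sketch of automated host discovery via a TCP connect sweep.
# The subnet and ports are illustrative; real tools combine many protocols.
import socket
from ipaddress import ip_network

SUBNET = "192.168.1.0/28"      # hypothetical address range to sweep
PORTS = [135, 22, 80]          # common Windows/Linux service ports
TIMEOUT = 0.5                  # seconds per connection attempt

def discover(subnet: str) -> list[str]:
    """Return addresses that accept a TCP connection on any probe port."""
    found = []
    for host in ip_network(subnet).hosts():
        for port in PORTS:
            try:
                with socket.create_connection((str(host), port), TIMEOUT):
                    found.append(str(host))
                    break                  # one open port is enough
            except OSError:
                continue                   # closed, filtered, or no host
    return found

if __name__ == "__main__":
    print(discover(SUBNET))
```

Note that this sweep is subject to the firewall caveat above: ports that are filtered between the scanner and the target will hide otherwise reachable systems.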

The role of inventory is to identify and exclude ineligible candidates. Some
data of interest would be system architecture, operating systems,
specialized peripherals, and compute requirements.

Some examples of non-candidates would be non-x86/x64 platforms or
unsupported operating systems. Keep in mind that the new vSphere
supports 13 new operating systems that were not supported in VMware
Infrastructure 3.5. This information can be determined by capturing an
inventory of the datacenter.

Systems with specialized hardware can be difficult to port to a virtual
platform. It's important to discover whether there are any systems that have
devices like fax cards, USB dongles, serial connectors used for licensing, or
other specialized hardware.

Depending on the tool used, automated inventory tools may not be able to
detect specialized hardware. Finding it may require manual steps.

By capturing the inventory, we can identify those servers that exceed
allowable maximums, such as a server that has too many CPUs or too much
RAM to run in the virtual environment. Keep in mind that vSphere introduces
new maximums that increase the limits imposed in VMware Infrastructure
3.5.
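
A minimal sketch of this screening step follows. The vCPU and RAM limits and the inventory records below are illustrative placeholders, not the actual platform maximums; always check the configuration-maximums documentation for the VMware release you are targeting.

```python
# Minimal sketch of screening inventory records against platform maximums.
# The limits below are illustrative placeholders; consult the current
# configuration-maximums document for the platform you are targeting.
MAX_VCPUS = 8          # assumed per-VM vCPU limit for the target platform
MAX_RAM_GB = 255       # assumed per-VM RAM limit for the target platform
SUPPORTED_ARCH = {"x86", "x64"}

inventory = [  # hypothetical inventory records
    {"name": "web01",  "arch": "x64",   "cpus": 2,  "ram_gb": 8},
    {"name": "db07",   "arch": "x64",   "cpus": 16, "ram_gb": 512},
    {"name": "legacy", "arch": "sparc", "cpus": 4,  "ram_gb": 16},
]

def eligible(system: dict) -> bool:
    """Answer 'can we virtualize this system?' from inventory data alone."""
    return (system["arch"] in SUPPORTED_ARCH
            and system["cpus"] <= MAX_VCPUS
            and system["ram_gb"] <= MAX_RAM_GB)

for s in inventory:
    print(s["name"], "candidate" if eligible(s) else "excluded")
```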

No matter which method is used to collect data, it's important to collect data
on the four core resources of the server: the CPU, memory, disk, and
network. Inventory information on these four key resources needs to be
collected so that when performance information is gathered, the performance
data can be correlated with the inventory.

VMware’s extensive experience in virtualizing server infrastructures has
revealed that RAM and CPU are usually the limiting factors when it comes to
achieving higher consolidation ratios.

When collecting inventory information about an existing physical system's
CPU, the minimum information that needs to be collected is the number of
CPUs, the number of CPU cores, and the CPU speed.

When collecting inventory information on memory, be sure to collect the RAM
size. As mentioned earlier, maximum RAM restrictions in vSphere and
VMware Infrastructure 3.5 can be a limiting factor depending on which of
those VMware platforms you choose to implement. It might also be useful to
collect the page file size.

Storage plays a pivotal role in many of the features of vSphere and Virtual
Infrastructure 3.5. It is therefore important to gain a thorough understanding
of the storage environment in the organization. An understanding of the disk
size in megabytes is important for capacity planning purposes, but other
factors are important as well.

Are the server's operating system and program files stored separately
from the application data? Does the customer use external storage to store
data? What type of storage is used: direct attached, network attached,
iSCSI, or Fibre Channel SANs?

If external storage is used, find out how much storage capacity is available to
store newly created virtual machines. Some of this information may not be
available through the use of an automated data collection tool and may
require manual investigation.

When collecting inventory information for networking, be sure to collect
the number of NICs and the speed of the NICs on the server.

Capturing the VLAN information is not a requirement for capacity planning
purposes; however, it can be useful to capture this information for planning
purposes and for understanding how the various NICs in a server are used.
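
Pulling the preceding points together, the sketch below shows one way the minimum inventory fields for the four core resources could be recorded so that performance samples can later be correlated with each system's capacity. The field names are illustrative, not those of any particular collection tool.

```python
# Minimal sketch of an inventory record covering the four core resources.
# Field names are assumptions for illustration, not a tool's schema.
from dataclasses import dataclass, field

@dataclass
class ServerInventory:
    name: str
    # CPU: number of CPUs, cores per CPU, and speed in MHz
    cpus: int
    cores_per_cpu: int
    cpu_mhz: int
    # Memory: RAM size and (optionally) page file size, in MB
    ram_mb: int
    pagefile_mb: int = 0
    # Disk: total size in MB plus storage type (DAS, NAS, iSCSI, FC SAN)
    disk_mb: int = 0
    storage_type: str = "DAS"
    # Network: NIC count, NIC speed in Mb/s, and optional VLAN notes
    nics: int = 1
    nic_mbps: int = 1000
    vlans: list[str] = field(default_factory=list)

    @property
    def cpu_capacity_mhz(self) -> int:
        """Total compute capacity, used later to correlate %CPU samples."""
        return self.cpus * self.cores_per_cpu * self.cpu_mhz
```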

To demonstrate the process of finding and inventorying servers, we will use
the VMware Capacity Planner Data Manager. There are many different tools
available for finding and inventorying servers; please consult the
documentation for your tool to learn how to perform these steps.

This concludes the Infrastructure Assessment Understanding Your
Environment module. Let’s review some of the key areas covered:

There are several methods for collecting data, from manual to fully
automated.

The primary role of inventory is to determine which systems make good
virtualization candidates and to exclude ineligible ones.

There are some basic inventory metrics to collect on a system: CPU,
memory, disk, and network.

Module 2

Welcome to Infrastructure Assessment – Collecting Performance Data. This
module will discuss collecting performance data from physical systems on the
network.

Upon completion of this module, you should be able to:

Understand why performance data is needed.

Understand the core performance metrics that you need to collect to make
good capacity decisions.
Understand Load Profiling.

And understand why capturing workload peaks is essential to capacity
planning.

Performance data is needed to understand the workload that each physical
server is performing before it is consolidated onto an ESX server.
Performance data is important because multiple physical servers will be
consolidated onto a smaller number of ESX servers. Therefore, it's important
to stack these workloads intelligently and with care.

In order to accurately model what future workloads will look like on ESX
servers, it's important to capture each server's utilization levels. As in the
Understanding Your Environment module, we will focus on the four core
resources: CPU, memory, disk, and network.

Planning for virtual machines on an ESX server requires that you look at the
four core resources and size them appropriately. In order to size the ESX
server appropriately, it is important to understand the workload that will be
placed on these core resources.

The slide shows the four core resources in an ESX server. When sizing an
ESX server to run your virtual machines, VMware recommends the following
(a minimal sizing sketch follows this list):

For the CPU, sum the needed cycles for all virtual machines.

For the memory, size the desired RAM maximum for all virtual machines.

For the disk, sum the desired disk sizes for all virtual machines' virtual
disks, plus space for other files such as the virtual machine swap file.

For the NIC, sum the needed bandwidth for all virtual machines.
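
The sketch below applies those four sizing rules. The per-VM figures are made up for illustration, and the assumption that each VM's swap file is roughly equal to its RAM is a simplification of this sketch, not a stated rule from the module.

```python
# Minimal sketch of the sizing rules on the slide: sum CPU cycles, size RAM,
# sum disk (including room for files such as the VM swap file), and sum
# NIC bandwidth across all planned virtual machines. VM figures are made up.
vms = [
    # (cpu_mhz_needed, ram_mb_max, disk_gb, nic_mbps)
    (500, 2048, 40, 20),
    (800, 4096, 80, 50),
    (300, 1024, 20, 10),
]
SWAP_OVERHEAD = 1.0  # assume swap roughly equal to VM RAM in this sketch

cpu_mhz = sum(v[0] for v in vms)
ram_mb = sum(v[1] for v in vms)
disk_gb = sum(v[2] + (v[1] / 1024) * SWAP_OVERHEAD for v in vms)
nic_mbps = sum(v[3] for v in vms)

print(f"ESX host needs ~{cpu_mhz} MHz, {ram_mb} MB RAM, "
      f"{disk_gb:.0f} GB disk, {nic_mbps} Mb/s network")
```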

All performance monitoring tools collect data in their own unique ways. This
lesson will focus on the core metrics that can aid in understanding future
workloads that will run on the ESX servers.

Once inventory analysis is complete, categorize systems according to
priority. Identify those that are the qualified candidates. These are the
optimal candidates for virtualization; they offer quick wins for an organization
because they are easily virtualized.

Move to the semi-qualified candidates next. These are systems that are less
than optimal candidates but can still be virtualized with additional
consideration.

Qualified candidates are those systems that are excellent candidates for
virtualization. They have relatively low utilization rates, and their individual
configurations present no obvious barriers to virtualization. These systems
are known as low-hanging fruit because they offer the highest consolidation
rates and most immediate returns.

Some examples of qualified candidates include:

Infrastructure servers such as print servers or domain controllers.

Low-utilization application or web servers.

Small and medium sized file servers.

Semi-qualified candidates are less straightforward than qualified candidates.
They may still be viable for virtualization, but they require additional
consideration. There are various reasons for this, including more complex
configurations and additional virtual machine resource overhead on the
host. Some examples include database servers, application servers with
high utilization rates, and MSCS clustered applications.

Thresholds are the first step in determining qualified versus semi-qualified
candidates. Thresholds are maximums on the proposed ESX host that no
virtual machine should exceed when it's placed on the host.

Beginning with qualified candidates, compare system performance data to
established thresholds. The slide gives an example of some system
thresholds. In order for a system to be qualified, it must first not exceed any
of the thresholds on the slide. So, for example, a qualified candidate would
not use over 60% of the processing capacity of the ESX server host.

The thresholds are important because they leave some spare capacity on the
ESX host to handle workload spikes that may occur. The example in the slide
makes some basic assumptions about the ESX environment: it assumes that
transparent page sharing on the ESX host is used. This is why the
thresholds are higher for memory than for the CPU; it assumes that not all
the memory used in the physical server environment will necessarily be used
in the virtual environment.
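
A minimal sketch of this first qualification pass follows. Only the 60% CPU figure comes from the narration; the other threshold values and the sample metrics are assumptions for illustration.

```python
# Minimal sketch of the first qualification pass: compare a system's
# observed utilization against example thresholds. Only the 60% CPU
# figure comes from the narration; the rest are assumed values.
THRESHOLDS = {"cpu_pct": 60, "mem_pct": 80, "disk_pct": 70, "net_pct": 50}

def classify(metrics: dict) -> str:
    """Qualified if no threshold is exceeded; otherwise semi-qualified,
    pending further review (grouping rules, special hardware, etc.)."""
    over = [k for k, limit in THRESHOLDS.items() if metrics.get(k, 0) > limit]
    return "qualified" if not over else f"semi-qualified (exceeds {over})"

print(classify({"cpu_pct": 12, "mem_pct": 35, "disk_pct": 20, "net_pct": 5}))
print(classify({"cpu_pct": 75, "mem_pct": 90, "disk_pct": 20, "net_pct": 5}))
```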

In order to determine a candidate's qualification for virtualization, we need to
capture some basic performance metrics on the system.
Two important processor utilization metrics that should be collected are
%CPU utilization and CPU queue length.

%CPU utilization can be expressed as %Processor Time, which is the amount
of time the processor is busy executing non-idle tasks. %CPU utilization
gives an indication of how much work the processor is performing.

CPU Queue is the number of threads in the processor queue. This counter
counts ready threads, not threads that are running. A sustained processor
queue of greater than two threads generally indicates processor congestion.

When capturing memory metrics, collect information on:

Available memory: the amount of memory on the server that is free, which
reflects the server's actual RAM utilization.

Page file usage: the percentage of the page file being used by the system.

Paging, or memory pages per second: the amount of data being sent to and
from the page file because a specific page was not found in a process's
working set or elsewhere in memory.

File cache: the amount of memory the operating system has set aside for
file cache.

When monitoring the disk, monitor the disk IOPS (I/O operations per second)
to determine the number of read and write operations being sent to the disk
subsystem. Many monitoring tools will only report the logical I/O being sent
from the operating system to the disk drive subsystem. When calculating
physical I/O, the amount of work the disk controller is actually moving, you
must factor in the RAID overhead to get an accurate representation of the I/O
that the drive subsystem is actually performing.
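
A minimal sketch of that logical-to-physical conversion follows, using the commonly cited RAID write-penalty factors. Controller write caching can reduce the effective penalty in practice, so treat these as planning approximations.

```python
# Minimal sketch of converting logical IOPS (what the OS reports) into
# physical IOPS (what the drives actually perform) using the commonly
# cited RAID write-penalty factors. Controller caching reduces these
# in practice; the numbers are planning approximations.
WRITE_PENALTY = {"RAID0": 1, "RAID1": 2, "RAID10": 2, "RAID5": 4, "RAID6": 6}

def physical_iops(read_iops: float, write_iops: float, raid: str) -> float:
    """Physical I/O = reads + writes multiplied by the RAID write penalty."""
    return read_iops + write_iops * WRITE_PENALTY[raid]

# Example: 400 logical reads/s and 100 logical writes/s on RAID5
print(physical_iops(400, 100, "RAID5"))   # 400 + 100*4 = 800 physical IOPS
```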

I/O Speed is the rate that bytes are transferred to and from the disk drive
during read or write operations.

For the network, collect information on the bandwidth of the interface. The
bandwidth is the number of bytes the server has sent to and received from
the network.

To turn the performance data into useful information, it must be correlated
with inventory data. Many organizations attempt to use a performance
monitoring tool by itself. The performance logs paint a picture like the one
shown in the table labelled Typical.

For server consolidation and capacity planning, the conclusion that the
utilization is 25 percent is not accurate. When inventory and performance
information are combined, as they are in the table labelled Correlated, the
results give a more useful picture.

When you apply the inventory information you will discover that actual
capacity is 5.2 gigahertz and that 830 megahertz of that capacity is actually
being used. This equates to 16% utilization of capacity, a significantly lower
number than was yielded by the non-correlated example.

Older, slower CPUs with high utilization cause this skewing to occur. VMware
has found that 40% of the servers at a typical client site are slower than 500
MHz. CPU utilization is not the only metric that needs to be correlated with
inventory; other examples include CPU queue, page file utilization, paging,
and available memory.
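
The sketch below illustrates this correlation: each server's %CPU sample is weighted by its inventoried capacity instead of averaging the percentages directly. The server list is hypothetical, chosen so the totals land near the module's 5.2 GHz, roughly 16% example.

```python
# Minimal sketch of load profiling: weight each server's %CPU sample
# by its inventoried capacity rather than averaging percentages directly.
# The server mix is hypothetical, tuned to echo the module's example.
servers = [
    # (cpus, mhz_per_cpu, pct_utilization)
    (2, 400, 60),    # old, slow, busy box skews a naive average upward
    (2, 1500, 10),
    (1, 1400, 5),
]

naive_pct = sum(s[2] for s in servers) / len(servers)        # 25.0%
capacity = sum(c * mhz for c, mhz, _ in servers)             # 5200 MHz
used = sum(c * mhz * pct / 100 for c, mhz, pct in servers)   # ~850 MHz
correlated_pct = 100 * used / capacity                       # ~16%

print(f"naive: {naive_pct:.0f}%  correlated: {correlated_pct:.0f}%")
```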

When determining the utilization of the system the goal is to capture the
Peak Workload. This does not mean the max observed value. If you have
ever watched a performance monitor while you start up a program you have
seen the processor utilization jump to almost 100% during startup. Every
machine will hit 100% utilization or come close to it at some point or another.
The key is to understand sustained loads.

Take an example where we are monitoring eight servers. On average, these
servers run at about 5% processor utilization. During prime time, from
8:00 a.m. to 6:00 p.m., they run closer to 9% utilization. This is represented
by the blue line on the chart.

Now say that these servers are Exchange servers. In the morning, they
typically run 3 to 4 times hotter than the average. The same is true at
closing time and after lunch. This is represented by the red line on the
chart.

When planning for capacity, if we accounted only for the average utilization
of these Exchange servers, we would have a lot of very unhappy users in the
morning, at lunch, and at closing time.

If peak load is not considered, we might have thought that combining the
load of these eight exchange servers into one server was reasonable.
However, when Peak load is considered, we would never attempt that type of
consolidation.

In order to obtain peak rates, performance samples need to be collected over
longer periods of time in order to capture irregular utilization rates. It's not
uncommon for the assessment period of a datacenter to run 30 days in order
to get a good statistical sample of the utilization of all systems.
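
One common way to separate sustained peaks from momentary spikes is to compare the mean, the raw maximum, and a high percentile over the sample window, as sketched below. The sample data and the choice of the 95th percentile are assumptions for illustration, not a prescription from the module.

```python
# Minimal sketch of separating sustained peaks from momentary spikes:
# compare the mean, the raw max, and a high percentile over the window.
# A percentile (here the 95th, an assumed choice) filters one-off startup
# spikes while preserving recurring morning/lunch/closing peaks.
import statistics

# Hypothetical %CPU samples for one server across a business day
samples = [5, 6, 5, 28, 31, 9, 8, 24, 7, 6, 99, 5, 27, 30, 8]

mean = statistics.mean(samples)
peak = max(samples)                               # includes the 99% spike
cuts = statistics.quantiles(samples, n=20)        # cut points in 5% steps
sustained = cuts[-1]                              # 95th percentile

print(f"mean={mean:.1f}%  max={peak}%  sustained peak≈{sustained:.1f}%")
```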

This concludes the Infrastructure Assessment Collecting Performance Data
module. Let’s review some of the key areas covered:

Performance data is needed because multiple physical servers will be
consolidated onto ESX servers.

It’s important to capture performance metrics for the CPU, memory, disk,
and NIC.

Load profiling is the process of correlating the inventory with the
performance data.

It’s important to capture peak workloads so that we don’t underestimate the
number of ESX servers needed to handle the existing workload that servers
are performing.

Module 4

Welcome to Infrastructure Virtualization Consolidation Estimation. This
module will discuss creating consolidation estimates.

Upon completion of this module, you should be able to:

Understand how to estimate the number of servers needed for consolidation

Understand server grouping

Understand rule-of-thumb estimating

And understand high-level architecture considerations.


This training offers you several methods for completing the module and
navigating through it.

If you are comfortable with the course material and are ready for the
assessment, close this window and you will see instructions for taking the
quiz in the MyLearn learning management system. To demonstrate
proficiency, you must complete the quiz with a score of 80 percent or better.

If you are reviewing the presentation, you have several choices, depending
on your learning style.

You can simply let the presentation run, and it will play an audio track as the
presentation unfolds. During the presentation you can use the buttons here
to pause, go back, or move forward.

Or, you can skip around the module. If the presentation navigation is visible
in either Outline or Thumb view, you can click on any slide in the presentation
to jump to that slide.

Another option is to read the material rather than listen to the audio track.
Click the Notes tab to view the transcript of the slide being viewed.

Finally, you can use the search tab to locate specific information within the
module.

Consolidation estimation works by hypothetically 'stacking' physical
workloads onto target ESX servers. Workload is stacked onto a server
based on the performance metrics that have been collected. The workload is
stacked until the target server reaches its capacity or predefined thresholds
are reached.

When performing consolidation estimation, you choose a target ESX server
platform. This defines the hardware that the existing servers will be
consolidated onto and is referred to as defining the 'new hardware' or
'target platform'. You can specify whether you want the 'new hardware' to
consist of only new servers or to reuse the customer's existing servers where
possible.

In the example, a target server is selected from a vendor that the customer
wishes to use. The server will have four processor cores running at three
gigahertz. The memory size will be twenty-four gigabytes, and the server
will have four network ports rated at a gigabit each. For storage, the
customer has selected a storage system that will be able to perform 2,000
I/Os per second with a transfer rate of 100 megabytes per second. This is
only an example configuration; the configuration that your customer selects
may be different from the one shown here.

When performing consolidation estimation, it’s a good idea to define
thresholds that specify how much of the target ESX server's total available
resources you wish to use. These thresholds specify how much CPU
utilization, RAM utilization, disk utilization, and network utilization the
consolidated workload will be allowed to use on the new ESX server.

The thresholds in the slide are just an example of how you might define
thresholds. This should not be considered best practice, only a starting
point. In this example, the CPU threshold will be set to 50%. This means that
only 50% of the total processing capacity is available to be used by the
consolidated workload. The other 50% is spare capacity that is available for
spikes in demand or for additional vSphere features like HA.

Some of the other restrictions are that memory is set to only 80% of the
capacity available on the target, NIC traffic is restricted to 100 megabytes
per second, disk I/O is restricted to 1,000 I/Os per second, and transfers
are restricted to fifty megabytes per second.

A consolidation estimation takes one physical system's workload and
determines whether that workload (such as the CPU load, RAM load, NIC
load, and disk load) can fit on the new target ESX server without crossing
any of the defined thresholds. If the workload fits, then that physical system
is placed on the target ESX server. The process then repeats, adding another
physical system's workload until a threshold is reached.

If a server's workload cannot be accommodated by the new server based on
the thresholds, then another new target server is added and workload is
added to it. This process continues iteratively until all of the existing servers
have been consolidated onto one or more new target servers.
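
A minimal sketch of that stacking loop follows, using first-fit placement. The capacities echo the example target server above (four 3 GHz cores, 24 GB RAM, 2,000 IOPS, four gigabit ports) and the thresholds approximate the example figures; the per-workload demands are made up, and disk transfer rate is omitted for brevity.

```python
# Minimal sketch of the stacking loop: first-fit placement of workloads
# onto target hosts, opening a new host when thresholds would be crossed.
# Capacities/thresholds approximate the module's example; demands are made up.
CAPACITY = {"cpu_mhz": 12000, "ram_mb": 24576, "iops": 2000, "net_mbps": 4000}
THRESHOLD = {"cpu_mhz": 0.50, "ram_mb": 0.80, "iops": 0.50, "net_mbps": 0.20}
LIMITS = {k: CAPACITY[k] * THRESHOLD[k] for k in CAPACITY}

def stack(workloads: list[dict]) -> list[dict]:
    """Place each workload on the first host with room, else add a host."""
    hosts: list[dict] = []
    for w in workloads:
        for h in hosts:
            if all(h["used"][k] + w[k] <= LIMITS[k] for k in LIMITS):
                for k in LIMITS:
                    h["used"][k] += w[k]
                h["vms"].append(w)
                break
        else:  # no existing host could take it: provision a new target
            hosts.append({"used": dict(w), "vms": [w]})
    return hosts

demo = [{"cpu_mhz": 800, "ram_mb": 2048, "iops": 120, "net_mbps": 40}] * 12
print(len(stack(demo)), "target hosts needed")  # CPU limit binds: 2 hosts
```
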
When you virtualize servers, you must also determine which workloads to
consolidate onto a particular target ESX system. Some customers are
tempted to stack multiple virtualized servers running the same application
onto a single ESX Server system. This approach limits the consolidation
opportunity, because like applications compete for the same resources.

One approach would be to determine what resources each application
requires, then match applications that demand different resource allocations
to maximize your virtualization opportunity.

Another approach to workload placement is simply to determine the number
of ESX servers you need and allow VMware DRS to balance out the workload.
VMware DRS will automatically place a virtual machine on the optimal server
when the virtual machine is powered on. VMware DRS will also monitor the
performance of the cluster and rebalance the workload as necessary.

If workload were the only consideration, then consolidation estimation would
be a fairly simple process. However, when considering consolidation, it’s
important to consider how the customer is organized and what type of
grouping the customer will allow.

Take the simple example on the slide: the customer has three groups,
Marketing, Sales, and HR. The customer has no problem combining servers
from Marketing and Sales on the same servers, but the customer does not
want to combine servers from the HR group with any other servers.

Customers may not want virtual machines grouped together on ESX servers
for a number of reasons. It could be that the department that owns a server
doesn’t want virtual machines from other departments grouped together with
its virtual machines.

The customer may not want test and development environments mixed with
production environments.

The customer may not want certain application servers to mix with other
application servers due to the function they perform.

The customer may have machines in multiple locations that need to remain
separate.

There are any number of other grouping situations that exist from customer
to customer. Learning these grouping rules early in the planning process is
important when performing a consolidation estimation.

In most cases determining grouping rules cannot be done using a tool. This
must be done by interviewing the customer and determining what grouping
requirements the customer has.

The groups should be determined before making consolidation
recommendations. If you make a consolidation recommendation without
considering groups, chances are you will have to redo all the work you have
done.

Grouping always lowers consolidation ratios. When you don’t have to
consider grouping, you can simply fit the workloads on the server until you
reach the defined thresholds. With grouping, you have to keep the grouping
requirements in mind as well as the thresholds.

When access to modeling software is not available, it helps to know a few
rules of thumb that can be used to approximate how many single virtual CPU
virtual machines can run on a single ESX/ESXi host.

Expect roughly 3 to 4 virtual machines per core on an ESX/ESXi host (a
minimal sketch of this rule follows the list below), provided that:

Sufficient memory exists on the host to accommodate the memory
requirements of the virtual machines with no overcommitment.

Sufficient network bandwidth exists without creating any bottlenecks.

And the estimate assumes that all virtual machines are single virtual CPU
virtual machines.
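
Under those assumptions, the rule of thumb reduces to simple arithmetic. The host core count in the example is an assumed figure.

```python
# Minimal sketch of the 3-4 single-vCPU VMs-per-core rule of thumb,
# assuming sufficient memory and network headroom as noted above.
import math

def hosts_needed(total_vms: int, cores_per_host: int,
                 vms_per_core: int = 3) -> int:
    """Conservative host count: default to the low end of the 3-4 range."""
    return math.ceil(total_vms / (cores_per_host * vms_per_core))

# e.g. 100 single-vCPU VMs on hypothetical 8-core hosts at 3 VMs/core -> 5
print(hosts_needed(100, cores_per_host=8))
```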

When considering hardware sizing requirements for an instantiation of
VMware ESX server, you must focus on the four core resources mentioned
earlier: CPU, memory, disk, and network.

The slides that follow offer some high-level architecture considerations when
sizing these core resources.

CPU capacity is one of the core benefits of consolidation. Many CPUs are
underutilized, which allows for easy consolidation. However, ESX can impose
its own overhead as well. ESX overhead varies based on three factors:
application type, load, and operating system.

The applications that load the CPU due to processing intensity cause the least
overhead, while those loading CPU due to disk I/O intensity cause more, and
those loading CPU due to network intensity cause the most overhead.

The higher the CPU Utilization, the higher the overhead on the ESX Server.

Different guest operating systems impose different overhead on the ESX
Server.

VMware ESX Server has a unique ability: the ability to effectively
oversubscribe RAM. The result is that virtual machines can actually use more
memory than is available on the ESX server host.

Keep the following considerations in mind:

Are the machines being consolidated running the same operating system or
the same applications? When they are, a significant amount of memory is
saved through the use of transparent page sharing.

Is the RAM on the machines being consolidated fully utilized? If not, the
virtual machine may not need as much memory as it was given in the
physical world.

When measuring RAM to be consolidated, focus on machine similarity and
actual current RAM utilization.


The key to storage capacity planning is having enough storage to contain the
aggregated machines’ data – but one should also consider future needs for
increased storage, virtual machine snapshots, and virtual machine swapfiles.

Consider the following when planning for consolidation on the NICs.

Are the machines to be consolidated already NIC-saturated? These are not
likely to be good candidates for consolidation unless the saturated NICs are
of extremely low capacity.

What is the average, sustained NIC load, as compared to peak load? How
often do peaks occur, and what is their timing relative to other machines?

Conversely, are the NICs on the machines being consolidated underutilized?

When measuring NIC capacity to be consolidated, focus on load timing as
well as absolute value.

In this guided tour we will demonstrate how to perform consolidation
estimation using the VMware Capacity Planner Dashboard. There are many
different tools available for performing consolidation estimation; please
consult the documentation for your tool to learn how to perform these steps.

This concludes the Infrastructure Assessment Consolidation Estimation
module. Let’s review some of the key areas covered:

The process of consolidation estimation involves hypothetically 'stacking'
workloads onto target servers.

It’s important to understand how the customer groups their server
environment before producing a consolidation estimate.

Understanding the way ESX uses the CPU, memory, disk, and NIC will benefit
you in placing workloads on the ESX Server.

Module 5

Welcome to Infrastructure Assessment TCO/ROI. This module will discuss
concepts related to Total Cost of Ownership and Return on Investment.

Upon completion of this module, you should be able to:

Understand the Total Cost of Ownership

Understand Return on Investment

Understand Net Present Value

Learn the importance of conducting a stakeholder meeting

This training offers you several methods for completing the module and
navigating through it.

If you are comfortable with the course material and are ready for the
assessment, close this window and you will see instructions for taking the
quiz in the MyLearn learning management system. To demonstrate
proficiency, you must complete the quiz with a score of 80 percent or better.

If you are reviewing the presentation, you have several choices, depending
on your learning style.

You can simply let the presentation run, and it will play an audio track as the
presentation unfolds. During the presentation you can use the buttons here
to pause, go back, or move forward.

Or, you can skip around the module. If the presentation navigation is visible
in either Outline or Thumb view, you can click on any slide in the presentation
to jump to that slide.

Another option is to read the material rather than listen to the audio track.
Click the Notes tab to view the transcript of the slide being viewed.
Finally, you can use the search tab to locate specific information within the
module.

Total cost of ownership, or TCO, is a financial estimate designed to help
organizations measure the complete lifecycle cost of a project or process.
Sometimes it is better referred to as total cost of operation, to focus on the
total solution rather than the product costs.

Completeness and the interrelationship of the costs are the key to
understanding TCO. TCO helps an organization determine how replacing one
component with an alternative will impact costs and quantity.

TCO is normally used by IT managers. In almost every case, the lower the
TCO the better.

Costs are categorized into direct and indirect costs.

Direct costs are financial outlays specific to acquiring and implementing
server hardware.

Some examples of direct costs associated with server acquisition are:

Cost of the Server Hardware

Hardware Support Contract

Third Party Software and Support

Networking Costs such as new switches and cabling.

and SAN costs such as adding additional storage or fibre channel switches.

Indirect costs are hidden charges accounted for at an aggregate data center
level for costs associated with administering and running servers and not
directly billed per server.

Some examples of indirect costs associated with server acquisition are:

Data Center costs such as cooling, power, and floor space.

Server Administration
Server Provisioning

And procurement costs.

In order to calculate the savings in total cost of ownership with VMware, you
must understand how much the organization would spend without VMware,
then calculate the costs with VMware. These costs should be broken down
between direct costs and indirect costs.

For example, say an organization has plans to replace all of its servers in the
next three years because of rising support costs.

If the organization has 100 servers, and the cost of each server is 4,000
dollars per year in a three-year amortized hardware purchase including
annual support and maintenance contract costs, then the total cost of
purchasing all those servers is 1.2 million dollars over the next three years.

If each server has an amortized cost of $1,100 per year for storage and
networking, then over the next three years $330,000 will be spent on
networking and storage costs.

Therefore, 1.5 million dollars will be spent on hardware costs alone to
replace the existing servers.

Next you need to calculate the indirect costs. This simple example will only
look at the costs associated with administration, power, and cooling.

Assume for this example that on-going administration costs the organization
$2,000 per server per year. In a three year period the cost to the
organization would be $600,000.

Data center costs such as power and cooling are going to differ from
organization to organization; in this example, assume that the power and
cooling cost is $400 per server per year. So over three years the cost of
power and cooling plus administration would be $720,000.

So in this simple example, including direct costs and indirect costs, the total
cost of ownership without VMware, simply replacing the customer's 100
servers with 100 new servers, would be 2.2 million dollars.

Next, let's offer the customer an alternative: replacing the 100 servers with
virtual machines so we don't have to do a one-for-one server replacement.

To do this, calculate the costs of doing business using VMware products.
After you do your consolidation estimation, you should know the number of
servers that you need. In this example we will assume that we need 10 ESX
servers to virtualize the original 100 servers.

The server cost remains the same over three years, but since there are
fewer servers to purchase, the total server cost is now close to $400,000.
The cost of VMware software must now be calculated: licenses for 10 ESX
servers plus the license for one vCenter Server would be around $189,000,
which works out to a cost of $5,750 per node.

The network and SAN costs stay the same per server, but again, since there
are fewer servers, the cost drops to about $36,000.

The direct costs with VMware in this example are about $622,000.

As far as indirect costs go, the per-server costs with VMware are the same,
but again there are fewer servers to deal with, so the total indirect costs
with VMware are about $100,000.

Now let's compare the costs of doing business without VMware and with
VMware.

Over the next three years, the cost of doing business without VMware is
$2.2 million, whereas the cost of doing business with VMware is about
$700,000. This equals a savings of $1.5 million.
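
The sketch below reproduces this three-year comparison and the payback calculation discussed next. All dollar figures are the module's illustrative example numbers, not real pricing, and the totals round to the figures quoted in the narration.

```python
# Minimal sketch of the module's three-year TCO comparison.
# All dollar figures are the illustrative example numbers, not real pricing.
YEARS = 3

def tco_without(servers=100):
    direct = servers * (4_000 + 1_100) * YEARS   # hardware + storage/network
    indirect = servers * (2_000 + 400) * YEARS   # admin + power/cooling
    return direct + indirect                     # = 2,250,000 (~$2.2M)

def tco_with():
    direct = 400_000 + 189_000 + 36_000          # hosts + licenses + net/SAN
    indirect = 100_000                           # admin + power/cooling
    return direct + indirect                     # = 725,000 (~$0.7M)

savings = tco_without() - tco_with()             # ~$1.5M over three years
payback_months = tco_with() / (savings / YEARS) * 12
print(f"savings=${savings:,}  payback≈{payback_months:.0f} months")
```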

Keep in mind there are many other costs to consider when calculating total
cost of ownership. This example simply showed the basics, and its cost
estimates should not be relied on for engagements.

The payback period is the amount of time required for the benefits to pay
back the cost of the project. Remember that earlier we calculated the cost of
doing business with VMware to be $700,000 and the savings to be $1.5
million over a three-year period. The yearly savings is then about $500,000.
Dividing the investment by the yearly savings ($700,000 / $500,000 = 1.4
years) gives a payback period of roughly 17 months for this example.

ROI, or return on investment, is a competitive comparison across dissimilar
projects. The higher the ROI, the better. ROI is mostly used by finance to
determine the money gained or lost on an investment relative to the amount
of money invested.

Let's look at ROI through a simple example.

Which would you rather have me return to you in one year: one hundred
dollars or one thousand dollars?

However, before I return that money to you in a year, you must invest
something today. So now, which would you rather I return to you in one year:
one hundred dollars if you give me fifty dollars today, or one thousand
dollars if you give me six hundred dollars today?

Financially speaking, to answer this you have just calculated an ROI on these
two competing offers. For the first, the ROI is one hundred percent, while for
the second the ROI is sixty-seven percent. From an ROI perspective, the first
one looks better.

Keep in mind that ROI is only one financial measure companies use to
compare initiatives and determine winners in the financial competition for
budgeting and funding.

Return on investment can be expressed as the returned value minus the
initial value, divided by the initial value.
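
Applied to the two offers above, the formula gives the percentages quoted earlier:

```python
# Minimal sketch of the ROI formula applied to the two offers above.
def roi(returned: float, invested: float) -> float:
    return (returned - invested) / invested

print(f"{roi(100, 50):.0%}")     # first offer: 100%
print(f"{roi(1000, 600):.0%}")   # second offer: ~67%
```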

In order to produce an accurate TCO/ROI analysis, it is important to gather
the various stakeholders in the organization together to discuss
virtualization. This meeting might be the first time that the various
stakeholders in the customer's company have gotten together to discuss
virtualization, so the conversations could be ground-level discussions or they
could be very advanced.

During the meeting you need to ask questions to elicit all the TCO and ROI
inputs that you will need to perform the virtualization assessment. Be aware
that it is entirely possible that the customer will not be able to provide
answers to all of the TCO/ROI questions that you ask. Be sure to leave the
organization with the list of questions that were left unanswered, and follow
up with them at a later time.

When performing a TCO / ROI analysis, there are many factors that you
should consider. The list above, while extensive, is not necessarily
comprehensive. Nor are all the items in the list above mandatory. If you are
designing your company's own assessment practice, you will decide for
yourself which items are relevant for the needs of your customers and your
company.

This concludes the Infrastructure Assessment TCO/ROI module. Let’s review
some of the key areas covered:

Total cost of ownership is a financial estimate designed to help organizations
measure direct and indirect costs.

ROI, or return on investment, is the ratio of money gained or lost on an
investment relative to the amount of money invested.

Gaining an understanding of how the organization operates and what costs
they have will help you when you produce a TCO/ROI analysis for the
organization.
