
CA Wily Introscope

Sizing and Performance Guide


Version 8.0

Date: 08-2008

Copyright 2008, CA. All rights reserved. Wily Technology, the Wily Technology Logo, Introscope, and All Systems Green are registered trademarks of CA. Blame, Blame Game, ChangeDetector, Get Wily, Introscope BRT Adapter, Introscope ChangeDetector, Introscope Environment Performance Agent, Introscope ErrorDetector, Introscope LeakHunter, Introscope PowerPack, Introscope SNMP Adapter, Introscope SQL Agent, Introscope Transaction Tracer, SmartStor, Web Services Manager, Whole Application, Wily Customer Experience Manager, Wily Manager for CA SiteMinder, and Wily Portal Manager are trademarks of CA. Java is a trademark of Sun Microsystems in the U.S. and other countries. All other names are the property of their respective holders.

For help with Introscope or any other product from CA Wily Technology, contact Wily Technical Support at 1-888-GET-WILY ext. 1 or support@wilytech.com. If you are the registered support contact for your company, you can access the support Web site directly at http://support.wilytech.com.

We value your feedback. Please take this short online survey to help us improve the information we provide you. Link to the survey at: http://tinyurl.com/6j6ugb

6000 Shoreline Court, Suite 200 South San Francisco, CA 94080

US Toll Free 888 GET WILY ext. 1 US +1 630 505 6966 Fax +1 650 534 9340 Europe +44 (0)870 351 6752 Asia-Pacific +81 3 6868 2300 Japan Toll Free 0120 974 580 Latin America +55 11 5503 6167 www.wilytech.com

Table of Contents

Chapter 1: Introscope Sizing and Performance Introduction
    New and changed features in Introscope 8.0
    Agent load balancing
    Agent metric aging
    Changed Heap Capacity (%) metric
    Changed Metric Count metric
    Changed way of determining events
    Changed Number of Inserts metric
    Changed Overall Capacity (%) metric
    Dynamic instrumentation
    Enterprise Manager dead metric removal
    How to detect metric explosions
    Metric clamping
    MOM hot failover
    MOM sizing limits examples
    New metric for Collector Metrics Received Per Interval
    New metric for Historical Metric Count
    New metric for Number of Historical Metrics
    New metric for Transaction Traces Dropped Per Interval
    New tab for CPU Overview
    New tab for Enterprise Manager Overview
    New tab for Metric Count
    Ping time threshold properties
    Running multiple Collectors on one machine
    Scalability
    SmartStor metadata stored in uncompressed format
    SQL statements, statement normalizers, and metric explosions
    Support for RAID 5 data storage
    Transaction Trace component clamp

Chapter 2: EM Requirements and Recommendations
    Enterprise Manager overview
    Enterprise Manager databases
    Factors that affect the Introscope environment
    Factors that affect EM maximum capacity
    Differences between EMs and J2EE servers
    About Introscope system size
    Enterprise Manager health
    About the Enterprise Manager Overview tab
    About EM health and supportability metrics
    Harvest Duration metric
    Number of Collector Metrics
    Collector Metrics Received Per Interval metric
    Converting Spool to Data metric
    Overall Capacity (%) metric
    Heap Capacity (%) metric
    Troubleshooting Enterprise Manager health
    Additional supportability metrics
    SmartStor overview
    About SmartStor spooling and reperiodization
    Report generation and performance
    Concurrent historical queries and performance
    About SmartStor and flat file archiving
    MOM overview
    Collector overview
    Collector metric capacity and CPU usage
    About the CPU Overview tab
    Enterprise Manager basic requirements
    Enterprise Manager file system requirements
    EM OS disk file cache memory requirements
    Enterprise Manager heap sizing
    SmartStor requirements
    Each EM requires SmartStor on a dedicated disk or I/O subsystem
    SmartStor Duration metric limit
    MOM and Collector EM requirements
    Local network requirement for MOM and Collectors
    Introscope 8.0 EM settings and capacity
    SmartStor settings and capacity
    When to run reports, custom scripts, and large queries
    Estimating Enterprise Manager databases disk space needs
    Setting the SmartStor dedicated controller property
    Planning for SmartStor storage using SAN
    Planning for SmartStor storage using SAS controllers
    Enterprise Manager thread pool and available CPUs
    Collector and MOM settings and capacity
    MOM hardware requirements
    MOM disk subsystem sizing requirements
    MOM to Collectors connection limits
    MOM to Workstation connection limits
    Metric load limit on MOM-Collector systems
    MOM hot failover
    Configuring a cluster to support 1,000,000 MOM metrics
    Agent load balancing on MOM-Collector systems
    Avoid Management Module hot deployments
    Collector applications limits
    Collector metrics limits
    Collector events limits
    Collector agent limits
    Collector hardware requirements
    Collector with metrics alerts limits
    Collector to MOM clock drift limit
    Reasons Collectors combine slices
    Increasing Collector capacity with more and faster CPUs
    Standalone EM hardware requirements example
    Running multiple Collectors on one machine

Chapter 3: Metrics Requirements and Recommendations
    Metrics background
    About metrics groupings and metric matching
    8.0 metrics setup, settings, and capacity
    Matched metrics limits
    Inactive and active metric groupings and EM performance
    SmartStor metrics limits
    Performance and metrics groupings using the wildcard (*) symbol
    Virtual agent metrics match limits
    About aggregated metrics and Management Module hot deployments
    About alerted metrics and slow Workstation startup
    Detecting metrics leaks
    Metrics leak causes
    Finding a metrics leak
    Metrics for diagnosing a metrics leak
    Detecting metric explosions
    Metric explosion causes
    Finding a metric explosion
    Investigator metrics and tab for diagnosing metric explosions
    How Introscope prevents metric explosions
    SQL statements and metric explosions
    SQL statement normalizers
    Metric clamping
    Enterprise Manager dead metric removal
    SmartStor metadata files are uncompressed

Chapter 4: Workstation and WebView Requirements and Recommendations
    Workstation and WebView background
    8.0 Workstation and WebView requirements
    OS RAM requirements for Workstations running in parallel
    WebView and Enterprise Manager hosting requirement
    8.0 Workstation and WebView setup, settings, and capacity
    Workstation to standalone EM connection capacity
    Workstation to MOM connection capacity
    WebView server capacity
    WebView server guidelines
    Top N graph metrics limit per Workstation

Chapter 5: Agent Requirements and Recommendations
    Agent background
    About virtual agents
    Agent sizing setup, settings, and capacity
    Agent metrics reporting limit
    About the Metric Count tab
    Transaction Trace component clamp
    Configuring agent heuristics subsets
    Virtual agent metrics match limits
    Agents limits per Collector
    Agent heap sizing
    Agent maximum load when disabling Boundary Blame
    Virtual agent reported applications capacity
    High agent CPU overhead from deep nested front-end transactions
    Dynamic instrumentation

Appendix A: Introscope 8.0 Sizing and Performance FAQs

Appendix B: Sample Introscope 8.0 Collector and MOM Sizing Limits by OS
    Sample Introscope 8.0 Collector sizing limits table
    Sample Introscope 8.0 MOM sizing limits table

Index


CHAPTER 1

Introscope Sizing and Performance Introduction

This document contains background, instructions, best practices, and tips for optimizing the sizing and performance of your Introscope 8.0 deployment and environment. Use it in conjunction with the following Introscope 8.0 documentation:

- Introscope Configuration and Administration Guide
- Introscope Installation and Upgrade Guide
- Introscope Java Agent Guide
- Introscope .NET Agent Guide
- Introscope Overview Guide
- Introscope WebView Guide
- Introscope Workstation User Guide

For additional information about this product, you can take the CA Wily Technology Education Services class, Introscope: Enterprise Manager (EM) Capacity Management. For more information, go to http://www.wilytech.com/services/education.html. In addition, CA Wily Technology Professional Services and Technical Support have service offerings to address specific needs in your application management environment.

Where to get the latest version of this book


You can find the most current version of this book on the CA Wily Community site at https://community.wilytech.com/. Check back periodically to see if the book has been updated. NOTE: The Wily Community Site is for use by registered members of the Wily User Community. If you need a user account, you can request one at the site.


New and changed features in Introscope 8.0


The following sections detail new or changed features in Introscope 8.0 that affect sizing and performance.

Agent load balancing


Introscope 8.0 agents (8.0 only) in a clustered environment can connect to the MOM and get load-balanced to a Collector. Pre-8.0 agents must connect directly to a Collector. The MOM also keeps the metric load balanced between Collectors by ejecting participating 8.0 agents from over-burdened Collectors. A participating agent is one that connects to the MOM. The ejected agents reconnect to the MOM and are reallocated to under-burdened Collectors. To configure agent load balancing, see the Introscope Configuration and Administration Guide. To understand how agent load balancing affects Introscope performance, see Agent load balancing on MOM-Collector systems on page 63.

Agent metric aging


By default, agent metric aging periodically removes dead metrics from the agent memory cache. This helps prevent metric explosions. See About agent metric aging on page 91.

Changed Heap Capacity (%) metric


The Heap Capacity (%) metric is created when the Enterprise Manager periodically asks the JVM what its maximum heap size is and how much heap it is currently using (based on the GC Heap: In Use Post GC (mb) metric). Formerly this metric was calculated as a ratio of the current total heap to how much heap is in use. See Overall Capacity (%) metric on page 33 and Heap Capacity (%) metric on page 34.

Changed Metric Count metric


The Metric Count metric Investigator node, which was previously under the Agent Stats node, is now here:
Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Metric Count

See Metric Count metric on page 85.


Changed way of determining events


The way that the Enterprise Manager handles incoming Transaction Trace events has changed, and uses new and changed metrics. See Events and Transaction Traces on page 36.

Changed Number of Inserts metric


The former Data Store|Transactions:Number of Inserts metric was renamed to Data Store|Transactions:Number of Inserts Per Interval. This metric value now shows the number of Transaction Traces placed into the Transaction Trace insert queue during an interval. Previously this metric showed the number of Transaction Traces that were reported to the Enterprise Manager. See Events and Transaction Traces on page 36 and Collector events limits on page 70.

Changed Overall Capacity (%) metric


The Overall Capacity (%) metric calculation now includes an additional value from the CPU Capacity (%) metric. See Overall Capacity (%) metric on page 33.

Dynamic instrumentation
Introscope uses dynamic instrumentation (also called dynamic ProbeBuilding) to implement new and changed PBDs without restarting managed applications or the Introscope agent. Dynamic instrumentation affects CPU utilization, memory, and disk utilization. See Dynamic instrumentation on page 112.

Enterprise Manager dead metric removal


Starting with Introscope 8.0, when a metric has not produced data for more than eight minutes (default), it is removed from the Investigator tree. See Enterprise Manager dead metric removal on page 96.

How to detect metric explosions


Introscope 8.0 includes a number of new metrics and capabilities to help you detect metric explosions. For more information, see Detecting metric explosions on page 84.

Metric clamping
Several properties that limit, or clamp, the number of metrics on the agent and the Enterprise Manager help to prevent spikes in the number of reported metrics (metric explosions) on the Enterprise Manager. See Metric clamping on page 96.


MOM hot failover


If the MOM gets disconnected or goes down due to, for example, a hardware or network failure, you can configure a second MOM to take over using hot failover. See MOM hot failover on page 62.

MOM sizing limits examples


CA Wily now provides examples of MOM hardware and cluster requirements. See Sample Introscope 8.0 MOM sizing limits table on page 122.

New metric for Collector Metrics Received Per Interval


The Collector Metrics Received Per Interval metric is an extremely simple way of gauging how much load metric data queries are placing on the cluster. This metric is the total sum of Collector metric data points that the MOM has received in each 15-second time period, including data queries. See Collector Metrics Received Per Interval metric on page 31.

New metric for Historical Metric Count


The Historical Metric Count metric shows the total number of metrics from an agent that are either live or recently active. The Enterprise Manager uses this metric to decide whether to start clamping more metrics from the agent. For more information, see Historical Metric Count metric on page 88.

New metric for Number of Historical Metrics


A new metric, Number of Historical Metrics, tracks the number of metrics for which Introscope has historical data in SmartStor. For more information, see Number of Historical Metrics metric on page 89.

New metric for Transaction Traces Dropped Per Interval


A new metric, Performance.Transactions.Num.Dropped.Per.Interval, shows the number of Transaction Traces that the Enterprise Manager could not handle during the interval and were dropped. See Events and Transaction Traces on page 36.

New tab for CPU Overview


By viewing the CPU Overview tab you can assess agent CPU health and performance-related statistics in one centralized location. See About the CPU Overview tab on page 46.


New tab for Enterprise Manager Overview


By viewing the EM Overview tab you can assess a number of EM health and performance-related statistics and components in one centralized location. See About the Enterprise Manager Overview tab on page 27 and Enterprise Manager Overview tab on page 90.

New tab for Metric Count


By viewing the Metric Count tab you can assess the number and distribution of agent and resource metrics in one centralized location. See About the Metric Count tab on page 107.

Ping time threshold properties


For optimal Workstation response times, Collector ping times should average no higher than 500 ms. Ping times of 10 seconds or longer indicate a slow Collector that may be overloaded. Ping times over the 10 second threshold cause the Enterprise Manager|MOM|Collectors|<host@port>:Connected metric to display a value of 2. You can adjust this threshold for your environment. In Introscope 8.0, there is an additional ping time threshold of 60 seconds. If the ping time exceeds this value, the MOM automatically disconnects from the Collector associated with the slow ping time. A disconnected Collector causes the Enterprise Manager|MOM|Collectors|<host@port>:Connected metric to display a value of 3. You can adjust this threshold for your environment. See Local network requirement for MOM and Collectors on page 51.

Running multiple Collectors on one machine


By following CA Wily's guidelines, you can set up multiple Collectors on a single machine. See Running multiple Collectors on one machine on page 74.

Scalability
Introscope 8.0 includes a number of scalability improvements, which are documented across this guide:

- Each Collector Enterprise Manager can handle up to 500 K metrics (varies according to hardware), about twice the Introscope 7.x Enterprise Manager metric limit.
- Collectors can take advantage of additional CPUs to increase these limits: number of applications per Collector, number of agents per Collector, and number of metrics that can be placed in metric groupings (if using a standalone Enterprise Manager).
- Each MOM can connect to a five million metric cluster (10 Collectors, 500 K metrics per Collector), which is a five-fold increase in clustered Enterprise Manager scale. The MOM now requires more powerful hardware than Collectors. See MOM hardware requirements on page 59.
- Support for 50 concurrent Workstation connections.

Important The limits may differ substantially depending on the specific platform and hardware used in your environment.

SmartStor metadata stored in uncompressed format


To increase SmartStor's speed in reading stored metadata files, starting with Introscope 8.0, all new metadata files are written in an uncompressed format. See SmartStor metadata files are uncompressed on page 98.

SQL statements, statement normalizers, and metric explosions


Metric explosions can be caused by a number of factors, including poorly written and long SQL statements. Introscope includes four SQL statement normalizers to address long SQL statements. The regular expression SQL statement normalizer is new for Introscope 8.0. CA Wily recommends that you use this normalizer before the other normalizers provided with Introscope, as the regular expression SQL statement normalizer allows you to configure regular expressions and normalize any characters or sequence of characters in the SQL statement. See SQL statements and metric explosions on page 92.
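To illustrate the idea behind normalization, here is a generic Python sketch (not the agent's actual normalizer configuration syntax): collapsing literal values means that many structurally identical statements map to a single metric name instead of thousands of unique ones.

    import re

    # Generic illustration of SQL normalization, not Introscope configuration.
    def normalize_sql(sql):
        sql = re.sub(r"'[^']*'", "?", sql)       # replace quoted string literals
        sql = re.sub(r"\b\d+\b", "?", sql)       # replace numeric literals
        return re.sub(r"\s+", " ", sql).strip()  # collapse extra whitespace

    statements = [
        "SELECT * FROM orders WHERE id = 1001",
        "SELECT * FROM orders WHERE id = 1002",
        "SELECT * FROM orders WHERE id = 1003",
    ]
    # All three normalize to the same text, so they yield one metric, not three.
    print({normalize_sql(s) for s in statements})
    # {'SELECT * FROM orders WHERE id = ?'}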

Support for RAID 5 data storage


CA Wily now supports Redundant Array of Inexpensive Disks (RAID) 5 for data storage. See Setting the SmartStor dedicated controller property on page 55.

Transaction Trace component clamp


In the case of an infinitely expanding transaction (for example, when a servlet executes hundreds of object interactions and backend SQL calls), Introscope clamps the Transaction Trace, resulting in a truncated trace. This helps prevent the JVM from running out of memory. The clamped Transaction Traces are marked as truncated in the Workstation Transaction Trace Viewer. See Transaction Trace component clamp on page 108.


CHAPTER 2

EM Requirements and Recommendations

This chapter provides background and specifics to help you understand how to size and tune your Enterprise Manager for good performance. In this chapter you'll find the following topics:

- Enterprise Manager overview
- Factors that affect the Introscope environment
- Factors that affect EM maximum capacity
- Differences between EMs and J2EE servers
- Enterprise Manager health
- About EM health and supportability metrics
- SmartStor overview
- About SmartStor spooling and reperiodization
- Report generation and performance
- Concurrent historical queries and performance
- About SmartStor and flat file archiving
- MOM overview
- Collector overview
- Enterprise Manager basic requirements
- Enterprise Manager file system requirements
- EM OS disk file cache memory requirements
- SmartStor requirements
- Each EM requires SmartStor on a dedicated disk or I/O subsystem
- MOM and Collector EM requirements
- Local network requirement for MOM and Collectors
- Introscope 8.0 EM settings and capacity
- SmartStor settings and capacity
- When to run reports, custom scripts, and large queries
- Estimating Enterprise Manager databases disk space needs
- Setting the SmartStor dedicated controller property
- Collector and MOM settings and capacity
- MOM disk subsystem sizing requirements
- MOM to Collectors connection limits
- MOM to Workstation connection limits
- Metric load limit on MOM-Collector systems
- MOM hot failover
- Configuring a cluster to support 1,000,000 MOM metrics
- Agent load balancing on MOM-Collector systems
- Avoid Management Module hot deployments
- Collector applications limits
- Collector metrics limits
- Collector events limits
- Collector agent limits
- Collector hardware requirements
- Collector with metrics alerts limits
- Collector to MOM clock drift limit
- Reasons Collectors combine slices
- Increasing Collector capacity with more and faster CPUs
- Standalone EM hardware requirements example
- Running multiple Collectors on one machine

Enterprise Manager overview


The Enterprise Manager (EM) is an integral component of the Introscope system. An Enterprise Manager is a server that collects, performs calculations on, and stores metrics reported by multiple agents. In a simple Introscope environment such as the one shown in the figure below, one single standalone Enterprise Manager collects, persists, and processes all the agent metrics, then supplies the resultant data for viewing in the Introscope Workstation or WebView browser instances.


In a more complex environment, as shown in the figure below, Enterprise Managers in the role of Collectors can be clustered so that their collected metrics data is compiled in a single Manager of Managers (MOM) Enterprise Manager. The MOM provides a unified view of all the metrics to the connected Workstation and WebView instances.

Note In cases where the data is specific to a single Enterprise Manager or where clustering makes no difference to the topic, this guide uses the generic term Enterprise Manager. However in some cases, Collectors and MOM Enterprise Managers perform different functions that require different sizing capacity guidelines or result in different performance behaviors. In these cases, the term Collector or MOM is used as appropriate. While the Collector and MOM perform very different functions within a cluster, the system requirements are quite similar with the exception of data persistence, as the MOM persists relatively little data in its role.


In an Introscope deployment, the agent collects application and environmental metrics and relays them to the Enterprise Manager. Multiple physical agents can be configured into a single virtual agent, which enables an aggregated, logical view of the metrics reported by multiple agents. To an Introscope Enterprise Manager, an application is an agent-specific association of metrics that is derived from the Java application .war files deployed on the managed J2EE application server. In an Introscope Enterprise Manager Investigator metric tree, applications, which are agent-specific, are found under the Frontends node, as shown in the following figure. Note You can have multiple applications running within a single JVM, but you can assign only one Introscope agent per JVM to collect the performance data.


Enterprise Manager databases


The Enterprise Manager writes to three separate databases: SmartStor, the Transaction Event database (traces.db), and the metrics baselining (heuristics) database (baselines.db). Introscope features such as Transaction Tracing, Transaction Trace sampling, and metrics baselining (heuristics) incur additional load on the disk subsystem. For this reason, the Transaction Event database (traces.db) and the metrics baseline (heuristics) database (baselines.db) can be located together on the same disk. However, SmartStor MUST be located on a separate dedicated disk or I/O subsystem. In the default Enterprise Manager installation process, the SmartStor data directory defaults to the target Enterprise Manager installation directory. However, for optimal performance, move the SmartStor data directory to a separate physical disk from the Enterprise Manager installation directory. For heavy-duty, production Enterprise Managers, disk I/O is the primary bottleneck for Enterprise Manager capacity, so CA Wily strongly recommends the use of multiple drives. For more information, see SmartStor requirements on page 49.
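As an illustration of this layout, a minimal sketch of the relevant Enterprise Manager property follows; the property name shown is an assumption, so confirm it (and the locations of traces.db and baselines.db) against your own Enterprise Manager properties file and the Introscope Configuration and Administration Guide:

    # Assumed property name - confirm against your installation before using.
    # Place SmartStor data on its own dedicated disk or I/O subsystem:
    introscope.enterprisemanager.smartstor.directory=/diskA/smartstor
    # traces.db and baselines.db can share a second disk, separate from SmartStor.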

Factors that affect the Introscope environment


The first questions to answer when considering your Introscope environment are: How many Java application server processes do I want to monitor (number of agents)? and How many metrics per server on average (metrics per agent) will be generated? The answers to those questions depend on the complexity of the server and the agent instrumentation settings. For more information, see the Introscope Configuration and Administration Guide. The capacity of the Enterprise Manager is dependent on the hardware it is running on as well as other complicating factors. For example, one factor is the JVM being used for the Enterprise Manager on the platform under consideration. The Enterprise Manager performs much better when its underlying JVM uses concurrent garbage collection (traditional garbage collection can halt the system when it is busy), and JVMs that support concurrent garbage collection are preferred. If the CA Wily sizing recommendations are exceeded, the system becomes more likely to suddenly experience sluggish behavior if too many operations all occur simultaneously. You can use the Overall Capacity metric for alerting purposes. For more information, see Overall Capacity (%) metric on page 33. For example, the metrics limit is the number of metrics that can be written safely to the disk I/O system.
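As a back-of-the-envelope illustration of the first two questions above, a short Python sketch follows; the agent and metric counts are purely hypothetical, so substitute your own values:

    # Hypothetical numbers - substitute values from your own environment.
    agents = 120              # monitored application server JVMs (one agent each)
    metrics_per_agent = 2500  # average, depends on instrumentation depth

    total_metrics = agents * metrics_per_agent
    print(total_metrics)      # 300000

    # Compare against the rough 500 K metrics-per-Collector guideline
    # (hardware dependent; see the sizing limits tables in Appendix B).
    print(total_metrics <= 500000)  # True: this load may fit on one Collector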


Important On typical server configurations, the metrics limit is usually the primary limitation on the capacity of the Enterprise Manager. This is a critical factor when sizing an Enterprise Manager. CPU performance, network bandwidth, and availability of RAM are also influential, but disk I/O seek time is typically the primary bottleneck. In Introscope 8.0, exceeding the limits found in the Sample Introscope 8.0 Collector sizing limits table on page 119 can bring the system to a state where you begin to see performance problems. The problems depend on which resource is impacted. Overloaded disk I/O typically causes combined time slices and sluggish Workstation refresh times. Lack of RAM causes memory exceptions during spool file conversion, as too many metrics are tracked. Network bandwidth problems cause slow cluster response time and, more rarely, may cause agents to be dropped. A lagging CPU causes performance problems, including calculators not updating and alerts being missed. As another example, as seen in the Sample Introscope 8.0 Collector sizing limits table on page 119, the recommended limit for monitored applications (maximum number of applications) for a Windows-based Enterprise Manager is about 170% of that found on a Solaris machine. In the case of applications, the limit is strongly dependent on the performance characteristics of the CPUs available to the Enterprise Manager, since applications create alerts that must be calculated every time slice.

Factors that affect EM maximum capacity


The maximum capacity of an Enterprise Manager can be reduced by the following factors (with references for more information):

- SmartStor is NOT on a separate disk drive or I/O subsystem. See Each EM requires SmartStor on a dedicated disk or I/O subsystem on page 49.
- If metric groupings are used, the maximum number of metrics placed in metric groupings is exceeded. See Matched metrics limits on page 79.
- Boundary Blame is disabled and maximum loads are not redistributed across all Enterprise Managers. See Sample Introscope 8.0 Collector sizing limits table on page 119.
- The Enterprise Manager runs at greater than the 40-50% average CPU utilization range. See Collector metric capacity and CPU usage on page 45.
- The sum of all metrics behind every Top N graph viewed by every Workstation instance exceeds 100,000. See Top N graph metrics limit per Workstation on page 103.
- More than 4 concurrent historical queries are issued against SmartStor. See Concurrent historical queries and performance on page 43.
- SmartStor is used in conjunction with flat file archiving. See About SmartStor and flat file archiving on page 43.
- Improper sizing is used for Enterprise Managers, Workstations, metrics, and agents. See all chapters in this book.

Differences between EMs and J2EE servers


Users who maintain enterprise application servers are accustomed to purchasing hardware that scales well with their applications, and have general understandings of target utilization levels and capacity. Although the Enterprise Manager itself is a Java server, the Enterprise Manager neither behaves nor performs like a typical J2EE server. Therefore it should not be modeled as such when purchasing hardware or performing an Enterprise Manager capacity forecast. J2EE servers and web applications receive requests for work at irregular intervals, with varying load throughout the day. Therefore the J2EE server only performs as much work as is requested of it in a given interval. Under standard usage, when incoming user requests come into a J2EE server, the requests are serviced by a pool of worker threads, which perform necessary business logic in servlets and pools of EJBs. The servlets and EJBs in turn make requests to external databases or systems. In well-designed J2EE applications, each of these worker threads is:

- largely independent from one another
- free to obtain the necessary resources and information needed to satisfy the request
- not forced through a common checkpoint for synchronization (although J2EE applications often aren't designed well).


Therefore, in most situations, application servers scale well in throughput by adding additional CPUs, because each CPU can run additional worker threads to satisfy more requests. Occasionally one request might be slowed down, but whether it takes 100 milliseconds (ms) or 5 seconds doesn't cause the rest of the system to come to a halt. Only in the event of an external bottleneck, such as a database, can all threads come to a halt waiting for data. Eventually the request threads all become busy, and the application server slows to a crawl, maintaining most throughput while rejecting additional requests for work. When the bottleneck is relieved, the system begins to service requests again, and returns to normal. In contrast, the Enterprise Manager behaves very differently because of its architecture and the nature of the work it performs. Introscope monitors production systems in real time, and provides information, warnings, and alerts in real time. In order to accomplish this, the Enterprise Manager performs as a real time system as well. The Enterprise Manager receives a continual flow of data from agents every 7.5 seconds. Once every 15 seconds, the Enterprise Manager must do all of the following:

- examine all of the metric data that it has received for the interval for consistency
- perform calculations
- perform actions, such as fire alerts or send messages
- store the data to disk
- respond to Workstation requests for live data
- handle incoming events (Transaction Traces, errors, and so on) and persist them.


For the most part, the Enterprise Manager can only use two threads to perform calculations and actions on the large set of agent-generated data, and only a single thread to perform the data storage. If the Enterprise Manager is unable to complete these operations within the 15 second interval, it may fall behind and not catch up with all the processing that needs to be completed because another set of data arrives. The Enterprise Manager then continually combines data or suffers from sluggish performance as it attempts to process and write more data than it can handle. There are internal buffers to allow for bursts of activity so that the Enterprise Manager can catch up, but if the Enterprise Manager has too many metrics being reported, these buffers fill up quickly. The Enterprise Manager is very different from a J2EE server in this regard, because the standard J2EE server does not examine data requests on a regularly scheduled basis to decide what to do with them. The Enterprise Manager's scenario is more similar to the classic factory production conveyor belt analogy, in which a continual set of finished products (data) arrives for two workers to examine. Then the two workers must transfer the product packages (metric data) to a single worker who drives the packaged data in a truck down a single-lane road to a warehouse, where several more workers off-load the packages from the truck into storage (the SmartStor database). Because of the nature of the tasks that the Enterprise Manager performs, there are currently limitations in the number of CPUs that the Enterprise Manager can use effectively. A minimum of 2 CPUs is required for optimum performance. However, the use of 4 CPUs increases performance by allowing more of the following:

- number of applications per Collector
- number of agents per Collector
- number of metrics that can be placed in metric groupings (if using a standalone Enterprise Manager).

More than 4 CPUs do not enhance performance. However, CA Wily recommends faster CPUs because each of the threads can then examine the data much faster. For the maximum limits on 4 CPU Enterprise Managers for matched metrics, see Matched metrics limits on page 79.


Another difference between J2EE servers and Enterprise Managers is in how they perform data processing. J2EE servers largely perform batch processing, while Enterprise Managers largely perform real-time processing. J2EE applications are batch processors. Work queues up and is handled as quickly as possible. As the machine slows down, the batch processes take longer and longer. In contrast, the Enterprise Manager, which has some batch processing functions (for example, responding to historical data query requests), handles most data flow in real time. This means that the Enterprise Manager can take whatever time it needs to process incoming data, as long as it finishes within the 15-second harvest duration period. Once the Enterprise Manager takes longer than that time frame, it starts to combine data. Sizing a real-time system can be difficult because you need to size for the maximum load, not the average load on the machine. If you only size for the average load, then during maximum load times you'll lose data. More ways that Enterprise Managers perform additional work and have limitations that affect performance atypical of standard J2EE systems include:

- Introscope Workstations provide different load characteristics than typical Web clients. Workstations allow users to view live data in real time. Depending on the feature or data requested, a Workstation can be a continual tax on the Enterprise Manager even if no user is watching the console, as the Enterprise Manager continues to serve data. In contrast, if a user stops interacting with a browser-based Web application, the data/refresh requests typically stop.
- Workstations can perform historical queries for data, which cause the Enterprise Manager to retrieve data from storage. This can interfere with the Enterprise Manager's ability to effectively process and store incoming agent data due to disk contention. J2EE systems don't typically serve requests directly from databases or have disk contention issues.
- The Enterprise Manager periodically reorders and reperiodizes stored data. Incoming metric data is written sequentially to a spool file, which is reorganized and indexed once every hour. This reorganization process is a resource-expensive (CPU and disk I/O intensive) operation that can interfere with the Enterprise Manager's ability to process and store incoming data. J2EE servers don't typically perform periodic intense housekeeping operations such as reperiodization.
- Agents can experience metric leaks over time, without the user knowing, which causes more data to be processed by the Enterprise Manager. Metric leaks occur when the number of registered metrics being reported by agents is continually increasing. This means that a properly configured system can drift over time into a problem state.
- An Enterprise Manager, for all configurations, should run AT MOST within the 40% to 50% CPU utilization range in a steady state. This provides the additional headroom necessary for periodic operations, such as SmartStor spooling, reperiodizing, and user Workstation requests (alert requests), that may saturate the CPU. Typically J2EE systems can be run much closer to saturation because there are no hidden operations that can consume CPU above and beyond steady state. In the event the system is saturated, the J2EE server refuses incoming requests to alleviate the pressure.
- No other applications or processes should be running on an Enterprise Manager machine, in order to avoid contention for the system resources available to the Enterprise Manager.
- Enterprise Managers (both Collectors and MOM) queue up incoming data query requests and aggregate the data as it is read in from SmartStor.

About Introscope system size


Introscope system size is determined by workload and business logic. Introscope workload is comprised of:

- total applications monitored
- total metrics monitored
- total agents monitored
- number of Enterprise Managers.

Introscope business logic handles the data collected in the monitoring operations and determines what will be done with the data. Introscope business logic operations include determining or handling the following:

- total number of metrics groupings
- maximum number of metrics in a metrics grouping
- number of metrics persisted per minute
- calculators
- alerts
- management modules containing a lot of dashboards, calculators, alerts, and so on
- large numbers of reports
- Top N graphs.


Enterprise Manager health


You can monitor and assess Enterprise Manager health in two ways, by viewing the:

- Enterprise Manager Overview tab (see About the Enterprise Manager Overview tab, below)
- Enterprise Manager health and supportability metrics (see About EM health and supportability metrics on page 28)

The Enterprise Manager generates and collects metrics about itself that are useful in assessing its health and determining how well it is performing under its workload. These are sometimes referred to as supportability metrics because these metrics help support the healthy functioning of the Enterprise Manager.

About the Enterprise Manager Overview tab


By viewing the Enterprise Manager Overview tab you can assess a number of Enterprise Manager health and performance-related statistics and components in one centralized location.

To view the Enterprise Manager Overview tab

1 Select the Enterprise Manager node under the Custom Metric Agent.
2 Click the Overview tab in the right pane. Study these graphs, as shown in the figure below:

- EM Capacity (%)
- EM CPU Utilization
- Heap Utilization
- Harvest, SmartStor, and GC Durations
- Number of Metrics
- EM Databases (MB)
- Number of Agents
- Number of Workstations

About EM health and supportability metrics


Enterprise Manager metrics appear in the Investigator tree, under:

Custom Metric Host (Virtual)
 Custom Metric Process (Virtual)
  Custom Metric Agent (Virtual)(SuperDomain)
   Enterprise Manager
In a clustered environment, the MOM's metrics also appear under the tree path shown above. However, in a clustered environment, Collector supportability metrics show up in the same Custom Metric Host (Virtual) and Custom Metric Process (Virtual) path location, but the last name includes (CollectorHostName@PortNumber).


The Investigator tree with the MOM and one Collector looks like this:

Custom Metric Host (Virtual)
 Custom Metric Process (Virtual)
  Custom Metric Agent (Virtual)(SuperDomain)
   Enterprise Manager
  Custom Metric Agent (Virtual)(Collector1@5001)(SuperDomain)
   Enterprise Manager
For more information, see the Introscope Configuration and Administration Guide. When you deploy Enterprise Managers into your Introscope environment, you'll need to look at the Enterprise Manager health and supportability metrics to find out what's really happening in your monitoring solution. Harvest duration, Collector Metrics Received Per Interval, SmartStor spool file conversion, and Overall Capacity (%) are several of the more significant indicators of problems in an Enterprise Manager. For more information, see:

- Harvest Duration metric on page 29
- Collector Metrics Received Per Interval metric on page 31
- Converting Spool to Data metric on page 32
- Overall Capacity (%) metric on page 33
- Additional supportability metrics on page 38.

Harvest Duration metric


The Harvest Duration metric shows the time in milliseconds (during a 15-second time slice) spent harvesting data. It is generally a good indicator in determining whether or not the Enterprise Manager is keeping up with the current workload. You can find this metric at the following location in the Investigator tree, as shown in the figure below.
Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager | Tasks | Harvest Duration (ms)


Here's the Harvest Duration metric location.

The Harvest Duration metric value should be less than 3000 ms [3 seconds] and should not exceed 7,500 ms [7.5 seconds]. The harvest operation usually causes the CPU activity to spike for the full harvest duration and the CPU is often almost idle for the rest of the 15 seconds. If the harvest duration is too long, investigate reducing the metric load on the overloaded Enterprise Manager by having agents report to separate Enterprise Managers or consider moving the Enterprise Manager to a platform with faster CPUs.

Number of Collector Metrics


The Number of Collector Metrics metric shows the total number of metrics currently being tracked in the cluster. You can find the Number of Collector Metrics metric here in the Investigator tree:
Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager | MOM | Number of Collector Metrics.


Collector Metrics Received Per Interval metric


The Collector Metrics Received Per Interval metric is an extremely simple way of gauging how much load metric data queries are placing on the cluster. This metric is the total sum of Collector metric data points that the MOM has received each 15-second time period, including data queries. You can find the Collector Metrics Received Per Interval metric here in the Investigator tree:
Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager | MOM | Collector Metrics Received Per Interval

Tip Consult this metric regularly.

A large Collector Metrics Received Per Interval metric value, coupled with degradation of the cluster, indicates that the MOM has been asked to read too much metric data from the Collectors. This overloading is the result of some combination of the following:

- too many Workstations connected
- too many queries (especially historical queries) being run
- user alerts and calculators set up to evaluate too many metrics

Although all resource loading issues combine to affect overall cluster performance, a large Collector Metrics Received Per Interval metric value, which reflects too many metric reads, is different from a metric explosion (see Detecting metric explosions on page 84), which is the result of too many metric writes by the agents. This means, in particular, that reducing metric load on your Collectors may not solve issues on the MOM related to a high Collector Metrics Received Per Interval metric value. If your Collector Metrics Received Per Interval value seems too high, check the number of Workstations attached, and that most are in Live mode. If this fails to solve the issue, you should check to make sure you do not have alerts set up to evaluate too many metrics in the system. You can do this by searching and sorting by value all metrics named:

Enterprise Manager | Internal | Alerts: Number of Evaluated Metrics


If Collector Metrics Received Per Interval value continues to remain high after carrying out the suggestions above, you can also set the introscope.enterprisemanager.query.datapointlimit property in the EnterpriseManager.properties file to specify a maximum number of metric data points the Enterprise Manager will return from any single query. This read clamp ensures that user queries that accidentally match too much metric data do not negatively impact system performance. Important Clamping the Collector metrics prevents cluster degradation, but queries and alerts that are clamped do not fully evaluate all metrics they match.
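For example, a minimal sketch of this read clamp in the Enterprise Manager properties file (the value shown is only illustrative; choose a limit appropriate for your environment):

    # Illustrative value only - tune for your cluster.
    # Maximum metric data points returned by any single query:
    introscope.enterprisemanager.query.datapointlimit=100000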

Converting Spool to Data metric


The Converting Spool to Data metric tracks whether or not the spool to data conversion task is running. You can find this metric at the following location in the Investigator tree:
Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager | Data Store | SmartStor | Tasks | Converting Spool to Data

When this task is running, the metric has a value of 1. When this task is not running, it has a value of 0. If this metric stays at a value of 1 for more than 10 minutes per hour, this indicates that reorganizing the SmartStor spool file is taking too long. This problem is often progressive. As the spooling time gets longer hour after hour, the Enterprise Manager usually becomes noticeably less responsive overall because the Enterprise Manager is putting more and more effort into reorganizing the spool file. For better performance, add more physical memory (RAM) to the machine. Adding more RAM can help increase the size of OS disk file cache and should reduce the amount of time the conversion task takes. The amount of RAM that will help varies between operating systems, however a good general rule is to dedicate 1 GB RAM for the OS disk cache. In general at full load, you should configure a Collector to use 1.5 GB heap memory. If you are running a MOM near maximum capacity (for example, a 5 million metric cluster or 1 million subscribed MOM metrics), the MOM must run on a 64-bit JVM with a 12 GB heap size. The machine must have physical RAM of at least 14 GB. For more information, see Configuring a cluster to support 1,000,000 MOM metrics on page 61.


Additionally, a server host typically requires approximately 500 MB for the operating system (this varies based on hardware and OS). When SmartStor starts the re-spooling operation, the operating system starts reading the spool file into the file cache memory (which is part of the OS, not the Enterprise Manager Java virtual machine). If reading 200,000 metrics into memory, for example, the spool file will usually be over 1.5 GB. For optimum performance the file cache should be large enough to accommodate the entire spool file. So the host machine should have between 3 and 4 GB of physical RAM. Windows machines that are 32 bit use a fixed file cache limited to approximately 1 GB, whereas UNIX systems generally have a configurable file cache limit. This must be physical memory not virtual memory (swap space). Enterprise Manager performance degrades dramatically if the host machine starts paging to and from virtual memory. For more information about the converting spool to data task, see About SmartStor spooling and reperiodization on page 40.
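Pulling these figures together, a rough Python sketch of a physical RAM budget for a Collector host follows; the simple additive model and the numbers are taken from the examples above and should be adjusted for your own metric volume, JVM, and operating system:

    # Rough RAM budget using the example figures cited in this section.
    em_heap_gb = 1.5       # Collector heap at full load
    os_overhead_gb = 0.5   # approximate operating system footprint
    spool_cache_gb = 1.5   # OS file cache sized to hold the SmartStor spool
                           # file (about 200,000 metrics in this example)

    recommended_ram_gb = em_heap_gb + os_overhead_gb + spool_cache_gb
    print(recommended_ram_gb)  # 3.5, within the 3 to 4 GB range recommended above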

Overall Capacity (%) metric


The Enterprise Manager Overall Capacity (%) metric estimates the percentage of the Enterprise Managers capacity that is consumed. You can find it at this location in the Investigator tree:
Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager: Overall Capacity (%)

The Overall Capacity (%) metric is computed in part from the following metrics, which you can find at this location in the Investigator tree:
Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager | Health

- CPU Capacity (%) (added into the computation in Release 8.0). See Additional supportability metrics on page 38.
- Harvest Capacity (%). See Additional supportability metrics on page 38.
- Heap Capacity (%). See Heap Capacity (%) metric, below.
- Incoming Data Capacity (%). See Additional supportability metrics on page 38.
- SmartStor Capacity (%). See Additional supportability metrics on page 38.


The Overall Capacity (%) metric is more valuable over a long period of time rather than for a specific 15-second time slice. Since the Overall Capacity metric is based on real-time metrics, you may see the Overall Capacity value spike quite a bit higher than 100% because, for example, the hardware's I/O subsystem could be briefly overloaded. However, the Enterprise Manager tends to recover from these spike situations automatically if they are not long-lasting. In general, a spike (for example, to 200%) isn't cause for concern if it's only for a brief moment, but over a long period of time, the Overall Capacity should ideally average about 75%. Generally, if the Overall Capacity value is 50%, then you should be able to double the load (+/- 15%) to get to a 100% capacity value. Note SmartStor hourly and nightly conversion times are not factored into the Overall Capacity metric, however hourly and nightly operations do affect how much metric load the Enterprise Manager is capable of handling. During time periods that the Overall Capacity (%) metric spikes to high values (for example 600%), at least one of the other metrics listed above should also show a spike. Investigating and understanding the source of the secondary spike might help pinpoint the root cause of the resource issue. For example, the problem might be found by looking at the Heap Capacity (%) metric, which feeds into the Overall Capacity (%) metric. See Heap Capacity (%) metric, below.

Heap Capacity (%) metric


The Heap Capacity (%) metric is determined by what percentage of its heap the JVM is currently using (based on the GC Heap: In Use Post GC (mb) metric). Note A 25% buffer remains between the point at which the Heap Capacity (%) metric reports 100% and the point at which the actual heap would be at 100%. For example, if the total heap is 1000 MB and the current heap usage is 750 MB, then this metric value is 100%. This buffer is included because Java needs heap space for normal operations. Depending on how you've set and launched the JVM with heap options, the JVM may start with a very small heap but grow it over time. The Heap Capacity (%) metric is based on the current JVM heap size, not what the heap size could become. CA Wily recommends that you set the Introscope heap settings so that heap min equals heap max.
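A small Python sketch consistent with the example above follows; the Enterprise Manager's exact internal calculation may differ, this simply expresses the 25% buffer rule for illustration:

    # Heap Capacity (%) per the 25% buffer rule: the metric reads 100%
    # when the JVM is using 75% of its current maximum heap.
    def heap_capacity_pct(in_use_post_gc_mb, max_heap_mb):
        return (in_use_post_gc_mb / (0.75 * max_heap_mb)) * 100.0

    print(heap_capacity_pct(750, 1000))  # 100.0, matching the example above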

Troubleshooting Enterprise Manager health


Every 15 seconds the Enterprise Manager gathers and records health metrics about itself. There are two ways you can view these metrics to troubleshoot Enterprise Manager health performance:
- Examine the Enterprise Manager health and supportability metrics in the Investigator tree. For more information, see About EM health and supportability metrics on page 28.
- Examine the perflog.txt file. Related Knowledge Base article(s): Perflog Values in Introscope 7.1
The Investigator tree Enterprise Manager health and supportability metrics are easy to view and interpret, so this is the first place you should look to understand your Enterprise Manager's current health. Perflog.txt is often valuable to CA Wily Support. Several examples of how you can use the perflog.txt file are provided in the topics below.

Harvest Duration
You can find the Harvest.HarvestDuration metric value in perflog.txt, as shown in the figure below. Note This figure shows perflog.txt output in verbose mode. By default, perflog.txt is generated in a compacted mode.

SmartStor Duration
You can find the Smartstor.Duration metric value in perflog.txt as shown in the figure below. Note This figure shows perflog.txt output in verbose mode. By default, perflog.txt is generated in a compacted mode.

Events and Transaction Traces


The Enterprise Manager attempts to insert all incoming events into a Transaction Trace insert queue. The number of events in the queue at any time is shown in the Performance.Transactions.TT.Queue.Size metric. If the Transaction Trace insert queue is not full, an incoming event is counted by the performance.transaction.num.inserts.per.interval metric. If the Transaction Trace insert queue is full when a new event comes in, the event is dropped. In Introscope 8.0, you can view a new metric, Performance.Transactions.Num.Dropped.Per.Interval, which shows the number of Transaction Traces that the Enterprise Manager could not handle during the interval and therefore dropped. You can find these metric values in perflog.txt, as shown in the figure below.

If you want to know how many events the Enterprise Manager received from agents for an interval, add the performance.transaction.num.inserts.per.interval metric value to the Performance.Transactions.Num.Dropped.Per.Interval metric value.

Although one would expect the values for the performance.transaction.num.inserts.per.interval metric and the Performance.Transactions.TT.Queue.Size metric for an interval to be identical, that is generally not the case due to these factors:
- metric counts are based on frequent samples of the system
- samples of these two metrics are not taken at the same time
- the system is very active (numeric counts vary quickly and greatly)
If, for example, at one sample time the number of inserted events is 500, this implies that the Transaction Trace insert queue should have a positive value, and you would expect to see a value of 500 as well for the Performance.Transactions.TT.Queue.Size metric. However, by the time the Transaction Trace insert queue is sampled, it can be empty and record a sample number of zero.
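As an illustrative calculation (the numbers are hypothetical):

performance.transaction.num.inserts.per.interval = 450
Performance.Transactions.Num.Dropped.Per.Interval = 50
Events received from agents during the interval = 450 + 50 = 500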


Additional supportability metrics


There are a number of supportability metrics to help you monitor the health of your system and Enterprise Manager. See the list below for brief descriptions and Investigator tree locations. See the Introscope Configuration and Administration Guide for more information.

CPU Capacity (%)
Investigator tree location: Enterprise Manager|Health
Same as EM CPU Used (%) (see below). Duplicated to easily relate to the Overall Capacity (%) metric, which now takes this metric into account.

Number of Agents
Investigator tree location: Enterprise Manager|Connections
The number of currently connected agents. The Enterprise Manager's perflog.txt file records and reports the number of actual agents connected in the Agent.NumberOfAgents metric value.

EM CPU Used (%)
Investigator tree location: Enterprise Manager|CPU
The percentage of the total available CPU that was used by the running Enterprise Manager during the time period specified.
Note: This number does not reflect other processes running on the server or overall server CPU in use, but rather how much CPU the particular Enterprise Manager used. This metric is acquired from the JVM using an API introduced in JDK 1.5; therefore, it is supported only on some platforms.

Harvest Capacity (%)
Investigator tree location: Enterprise Manager|Health
Percent of time needed for the data harvest in a 15000 ms (15 second) time slice, where 100% is the full 15 seconds. For example, if the data harvest takes 15000 ms, then this metric value is 100.

Incoming Data Capacity (%)
Investigator tree location: Enterprise Manager|Health
The capacity of the Enterprise Manager to handle incoming data, based on an internal metric that indicates the number of incoming metrics yet to be processed. This internal metric is divided by twice the total number of metrics. For example, if 150,000 metrics are in the to-be-processed queue and the Enterprise Manager has a total of 300,000 metrics, the incoming data capacity will be 25%.

Number of Metrics
Investigator tree location: Enterprise Manager|Connections
The metric load on an Enterprise Manager. When an agent disconnects, this number drops.

SmartStor Capacity (%)
Investigator tree location: Enterprise Manager|Health
Percent of time needed for the SmartStor write process in a 15000 ms (15 second) time slice, where 100% is the full 15 seconds. For example, if the SmartStor write duration is 15000 ms, then this metric value is 100.

Write Duration (ms)
Investigator tree location: Data Store|SmartStor|MetaData
The duration of the SmartStor Capacity (%) metric time (see above) spent writing metadata. If this metric value doesn't change proportionately as the SmartStor Capacity (%) metric value increases or decreases, there may be an issue with the file system.

SmartStor overview
Introscope 7.1 included significant optimizations in disk read/write synchronization that take advantage of a dedicated SmartStor disk. All performance improvements and sizing increases starting with Introscope 7.1 depend on those optimizations. SmartStor first writes to disk the data supplied by agents to the Enterprise Manager/Collector, and performs all other operations after that. For example, if 10 users are running large historical queries (over 1000 metrics/query) at the same time, an Enterprise Manager performs more slowly. The users experience sluggish Workstation response times because SmartStor is simultaneously writing new agent metric data, running extensive user queries, doing reports, and converting files to the faster query file format. The Workstation queries are slow (or metric data is aggregated) because the disk is overloaded.

About SmartStor spooling and reperiodization


SmartStor writes live incoming data to disk in a spool format that is fast to write, but slow to query. Every hour, at the top of the hour, SmartStor takes the spool file from the previous hour and reformats the file into a SmartStor data file. The SmartStor data file, which is faster and easier to search than the spool file, optimizes historical query responses. This Introscope process, which is referred to as spool to data conversion (or conversion), typically takes 10 minutes. However, conversion times differ on different hardware due to memory, CPU power, and disk read/write speeds.

A conversion time longer than 10 minutes is a potential warning sign of an overloaded Enterprise Manager. Most importantly, the conversion time should not be getting longer every hour. That is a sure sign that the system is becoming overloaded and often indicates metric creep, in which the number of registered metrics being reported by agents is continually increasing. The most common cause of excessively long SmartStor spool to data conversion times is a file cache size that is too small to perform the required operations. This situation can be addressed by adding more physical memory. The conversion process is usually the first process to show problems if SmartStor is not using a dedicated I/O subsystem.

SmartStor reperiodization is the process by which archived data files are compressed to reduce the total size of the SmartStor directory. Reperiodization is performed in two stages after midnight by default. For information about how to configure this multi-tier reperiodization, see the Introscope Configuration and Administration Guide.


Reperiodization is both I/O and CPU intensive: the data archive files are read, the data is compacted by aggregating multiple time slices, and then the resulting data is written back to SmartStor. This means that the period after midnight is the busiest time for an Enterprise Manager. The entire reperiodization process should not take more than two hours. During this time, no other Enterprise Manager operation such as report generation (see Report generation and performance on page 43) or OS-level operation should be scheduled.

Note If the Enterprise Manager is stopped in the middle of reperiodization, it will, upon restart, delete the partially written files and restart reperiodization after 45 minutes. This restart may not occur during the regularly scheduled reperiodization time. The 45 minute delay allows the system to register all its agents and metrics before launching the restart of this compute-intensive reperiodization task.

SmartStor spooling and reperiodization can be verified in the Enterprise Manager log in verbose mode, which records that the spooling process starts at the top of the hour. Under standard conditions, within 10 minutes, a second recorded message reports that the spooling process has completed. In addition, there are three SmartStor management metrics, which you can find at this location in the Investigator tree:
Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager | Data Store | SmartStor | Tasks.

As shown in the figure below, the three tasks that are monitored are:
- Spool to Data Conversion
- Data Appending
- Reperiodization


These tasks have metric values that oscillate from 0 to 1 when the respective task is running. You can see when those tasks are running and how long they are taking by selecting a task in the tree, then picking an appropriate time from the Time Range drop down list in the Viewer pane. Top of the hour problems are generally related to slow SmartStor spooling. Early morning (after 6 A.M.) problems are usually due to reperiodization not being completed quickly enough. This usually implies that the Enterprise Manager is excessively loaded. For more information, see EM OS disk file cache memory requirements on page 47.


Report generation and performance


Generating Introscope reports is very expensive in terms of CPU and disk access. The cost is primarily based on two factors:
- the number of graphs (total amount of data)
- the report time period (historical range)
Reports that are either larger than 50 graphs or longer than 24 hours should not be scheduled during the hours when SmartStor is reperiodizing (usually midnight to 3:00 A.M.) because of high CPU activity and the large amount of disk activity.

Concurrent historical queries and performance


The best way to avoid disk performance problems from historical queries is to have most Introscope Workstation users view data in Live mode. Use Historical mode only for in-depth analysis, like troubleshooting and reports. On systems under heavy metric load, make sure that users are not all attempting to perform historical queries (which access the SmartStor historical archive) at the same time. CA Wily recommends a maximum of four concurrent historical queries, although this limit may differ depending on the performance of your hardware. You should also be aware that this limit decreases during spool-to-data file conversion at the top of each hour, and at midnight during reperiodization.

You can also set the introscope.enterprisemanager.query.datapointlimit property in the EnterpriseManager.properties file to specify the maximum number of metric data points the Enterprise Manager will return from any single query. This read clamp ensures that user queries that accidentally match too much metric data do not negatively impact system performance.
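For example, a hedged sketch of the read clamp; the value below is purely illustrative, so choose a limit appropriate for your hardware and query patterns:

# Illustrative only: return at most 100,000 data points from any single query
introscope.enterprisemanager.query.datapointlimit=100000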

About SmartStor and flat file archiving


Flat file archiving is an alternative format that can be used for metric data storage instead of SmartStor. Unlike SmartStor, the flat file format writes the data in readable ASCII format, which takes considerably more space than SmartStor's format. If you use flat file archiving, you have the option of configuring the flat file data to be gzipped. This reduces the amount of disk space needed considerably, but is CPU intensive to write and extremely CPU intensive to read. CA Wily has three recommendations about SmartStor and flat file archiving.


First, avoid using SmartStor and flat file archiving at the same time. Flat file archiving duplicates some of the functionality of SmartStor. In addition, flat file archiving's compression feature (if enabled) requires noticeable CPU resources that can adversely affect the Enterprise Manager's performance when the compression feature periodically runs. In the event that flat file archiving must be used, configure the smallest possible number of metrics to be logged. Second, do not use flat file archiving in production. Readable metric values are most useful in a QA debug environment. Third, SmartStor should not be located on the same disk as a flat file archive. SmartStor should be on its own dedicated disk. For more information, see SmartStor settings and capacity on page 55.

MOM overview
MOMs are CPU intensive, in contrast to Collectors, which are I/O and CPU intensive. For more information about MOM requirements, see MOM and Collector EM requirements on page 51 and Collector and MOM settings and capacity on page 58.

Collector overview
Collectors are I/O intensive, and perform most of Introscope's difficult and intensive calculation processing work. Cluster performance is dominated by the Collectors. Given the synchronous communication model between the MOM and Collectors, the responsiveness of a MOM (in terms of data refresh to the Workstation) is related to the responsiveness of the Collectors. Any performance problems causing response problems in a Collector will be magnified by the MOM. For more information, see Collector to MOM clock drift limit on page 71. If upgrading a Collector from 6.x to 8.0, as long as there is a dedicated disk for SmartStor and Boundary Blame is turned on, there should be enough resources left over on the same host to handle the new functionality, including metric baselining (heuristics) and creating virtual agents. If you need to migrate a 6.x Enterprise Manager to become an 8.0 Collector, see Related Knowledge Base article(s): Migrating a 6.x Enterprise Manager to an 8.0 Collector (KB 1630).


Collector metric capacity and CPU usage


If a Collector is at maximum capacity, as shown in the Sample Introscope 8.0 Collector sizing limits table on page 119, you may look at the CPU and find that the system doesn't appear busy. You may wonder why Introscope requirements don't allow adding more metrics or agents to the system. The reason is that CPU monitoring tools show a snapshot. The behavior of the Collector is 100% CPU usage for 3-4 seconds (at full load), and then idle until the next agent data processing. This happens every 7.5 seconds, which is how the 45% average CPU utilization recommendation is derived. The initial 3-4 seconds is the harvest time, recorded as the Harvest Duration metric, and it must be less than 4 seconds. For more information about the Harvest Duration metric, see Enterprise Manager health on page 27. The time between harvests allows the Collector to service Workstations, perform Transaction Traces, and handle SmartStor spooling and reperiodization.

Unless you're looking at a high resolution CPU/Memory/I/O trace of the Collector between 12:00 midnight and 3:00 A.M., you can't get a true picture of a Collector's resource usage. At midnight the usage pattern of everything the Collector does changes dramatically because it's about to start reperiodization. At that point, the Collector gets very busy and typically CPU utilization jumps to 80% to 90%. Also, if your CPU monitoring tool is sampling or averaging CPU snapshots over an interval longer than one second, you may not see the intense activity spikes that can cause the Collector to back up and run into problems.

There are certain operations that can easily saturate the Collector's CPU, such as Transaction Tracing, large numbers of connected Workstations, large numbers of events, large historical queries, and large reports. The Collector must have additional headroom in order to handle those peaks of activity, or else it will fall behind in its processing tasks, resulting in undesirable system behavior. While a Collector's CPU usage may not look busy at one point in time, it will look busy if you turn on a large Transaction Trace, connect 10 more Workstations, or run a big historical query. That's why CA Wily recommends so much additional CPU headroom. On average, you can't have any more than 40% steady state usage because there are too many other operations that can immediately cause the Collector to use 100% CPU. At that point you'll start to see Workstation sluggishness and combined time slices.
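The 45% recommendation can be sanity-checked with simple arithmetic (approximate values):

Harvest (busy) time per cycle = 3 to 4 seconds, say 3.5 seconds
Cycle length = 7.5 seconds
Average CPU utilization = 3.5 / 7.5 = roughly 47%, in line with the ~45% figure above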


About the CPU Overview tab


By viewing the CPU Overview tab you can assess agent CPU health and performance-related statistics in one centralized location.
To view the CPU Overview tab
1 Select the CPU node under the agent.
2 Click the Overview tab in the right pane. Study the CPU Utilization graph as shown in the figure below.


Enterprise Manager basic requirements


There are several basic requirements for every Enterprise Manager:
- Typically an Enterprise Manager needs 2 to 4 CPUs, depending on the hardware platform. More CPUs will not improve performance. An Enterprise Manager with fewer CPUs than recommended results in the system performing poorly.
- All Enterprise Managers need a minimum of 3 GB OS RAM to effectively run at anything close to full load.
- EVERY Collector Enterprise Manager must have a dedicated disk I/O subsystem for SmartStor with no other processes competing for it.
After those basic requirements, system performance is determined by the speed of the CPUs, the speed of the I/O subsystems, and the file cache performance.

WARNING The recommendations for maximum metrics/Enterprise Manager, agents/Enterprise Manager, physical memory, and so on, should be strictly followed. If you are seeing less CPU utilization than the recommended maximum threshold (at full metrics load), it is NOT a reason to add additional load (above CA Wily recommendations) to the Collector. In general, metrics load is highly I/O bound rather than CPU intensive, so even with CPU cycles available, the Enterprise Manager can become I/O bound on metric data and the whole system can start slowing down.

Enterprise Manager file system requirements


Make sure that the file system used for the Enterprise Manager files baselines.db and traces.db is a local disk and not a network file system (NFS). Otherwise, serious performance degradation can result.

EM OS disk file cache memory requirements


How much OS memory does each Enterprise Manager need? At full load, the Enterprise Manager process typically has 1.5 GB of JVM heap space allocated in its JVM properties, but on top of that there must be OS memory (physical RAM) of at least another 1 GB free over and above the requirements for the OS. The CA Wily recommendation is a minimum of 3 GB for a system running an Enterprise Manager; preferably 4 GB.

Note If you are running a MOM near maximum capacity (for example, a 5 million metric cluster or 1 million subscribed MOM metrics), the MOM must run on a 64-bit JVM with a 12 GB heap size. The machine must have physical RAM of at least 14 GB. For more information, see Configuring a cluster to support 1,000,000 MOM metrics on page 61.


If your hardware allows it, CA Wily recommends running the OS in 64-bit mode to take advantage of the large file cache. The file cache is important for the Enterprise Manager when doing SmartStor maintenance like spooling and reperiodization. This cache resides in physical RAM, and is dynamically adjusted by the OS during runtime based on available physical RAM. Therefore, the recommendation is for 4 GB RAM. As general guidance, each Enterprise Manager should have about 1.5 GB of OS file cache available in its memory.

Top-of-the-hour problems are usually related to SmartStor spooling, which is best addressed by additional physical memory, especially disk file cache. The biggest single influencing factor for SmartStor spooling is the file cache size. Typically, 32-bit Windows allows a file cache just under 1 GB, and typically SmartStor spooling files for a full load are closer to 2 GB. That difference in size causes performance pressure. By providing a larger OS file cache, you give the OS a cache large enough to read the entire spool file into memory, process it, and dump it straight back out into the SmartStor archive as a data file.

Enterprise Manager heap sizing


The appropriate Enterprise Manager heap settings depend on your Enterprise Manager OS, hardware, and the metric load. The Enterprise Manager GC parallel flag you'll need to set also depends on the Enterprise Manager OS version. In the heap settings examples below, note that when the total number of metrics that the Enterprise Manager monitors changes, the heap settings also change.

Enterprise Manager Hardware (OS Version): 2x2.8 GHz Xeon HT (Win 2K Adv Server)
RAM (GB): 2
Total Metrics Monitored: 90,000
Example Enterprise Manager GC Flag Settings:
lax.nl.java.option.additional=-server -Xms512m -Xmx512m -showversion -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:NewSize=128m -XX:MaxNewSize=128m -XX:PermSize=64m

Enterprise Manager Hardware (OS Version): 2x2.8 GHz Xeon HT (Win 2K Adv Server)
Total Metrics Monitored: 210,000
Example Enterprise Manager GC Flag Settings:
lax.nl.java.option.additional=-server -Xms800m -Xmx800m -showversion -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:NewSize=128m -XX:MaxNewSize=128m -XX:PermSize=64m

Enterprise Manager Hardware (OS Version): 2x2.8 GHz Xeon HT (Win 2K Adv Server)
RAM (GB): 3
Total Metrics Monitored: 400,000
Example Enterprise Manager GC Flag Settings:
lax.nl.java.option.additional=-server -Xms1400m -Xmx1400m -showversion -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:NewSize=128m -XX:MaxNewSize=128m -XX:PermSize=64m

Enterprise Manager Hardware (OS Version): 2x2.8 GHz Xeon HT (Win 2K Adv Server)
Total Metrics Monitored: 500,000
Example Enterprise Manager GC Flag Settings:
lax.nl.java.option.additional=-server -Xms1500m -Xmx1500m -showversion -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:NewSize=128m -XX:MaxNewSize=128m -XX:PermSize=64m

If you are operating a high-performance Introscope environment, contact CA Wily Professional services for the appropriate Enterprise Manager JVM heap settings.

SmartStor requirements
Each EM requires SmartStor on a dedicated disk or I/O subsystem
In Introscope 7, significant performance improvements were made in SmartStor that freed up CPU resources for other features such as virtual agents, calculators, Transaction Tracing and sampling, and applications with associated heuristic calculations (baselining). What matters to SmartStor is concurrent I/O throughput and how many disk spindles are servicing those requests. Having SmartStor on a second, dedicated disk is required to take advantage of these enhancements.

Point the SmartStor location to a separate dedicated disk or disk array from the one holding the Transaction Event database (traces.db) and the metrics baseline (heuristics) database (baselines.db). Verify that the SmartStor file persistence is actually going to that different disk. Ensuring that the SmartStor data directory is on its own disk is the top solution to many Introscope performance issues. When SmartStor is not on its own dedicated disk, the first indication of a problem is SmartStor spooling problems. For more information, see About SmartStor spooling and reperiodization on page 40.

Note For information about a spreadsheet to help you determine your SmartStor disk requirements, see the Introscope Configuration and Administration Guide.


SmartStor Duration metric limit


The Enterprise Manager collects health metrics about itself. Starting with Introscope 7, the Smartstor Duration metric value is recorded, which tracks how long it takes SmartStor to write data during every 15-second metric harvest cycle. As shown in the figure below, you can view the SmartStor Duration metric in this location in the Investigator tree:
Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager | Tasks | Smartstor Duration (ms).

Here's the SmartStor Duration metric location.

Under standard Enterprise Manager conditions, the average Smartstor Duration value should be less than 3500 ms (3.5 sec). The Smartstor Duration value MUST be less than 15,000 ms (15 sec). If this metric value is greater than 15 seconds, the EM is critically overloaded. For more information, see Enterprise Manager health on page 27 and the Introscope Configuration and Administration Guide.


MOM and Collector EM requirements


Some sizing requirements and performance issues apply to any Enterprise Manager, be it a MOM or a Collector. However, some sizing requirements and performance issues apply specifically to the MOM or to the Collector, because MOMs are CPU intensive and Collectors are I/O and CPU intensive. The following topics describe general Enterprise Manager requirements as well as MOM- and Collector-specific requirements.

Local network requirement for MOM and Collectors


Whenever possible, a MOM and its Collectors should be in the same data center, preferably in the same subnet. Even when crossing through a firewall or passing through any kind of router, the optimal response time is difficult to maintain. If the MOM and Collector are across a router or, worse yet, a packet-sniffing firewall protection router, response time can slow dramatically. To keep the cluster operational, the MOM drops any Collector that meets either of these conditions:
- appears unresponsive through the network for more than 60 seconds (see information about the ping time threshold below)
- indicates its system clock has skewed more than three seconds from the MOM's clock (see Collector to MOM clock drift limit on page 71)
For optimal Workstation responsiveness, the ping metric, which is reported by the MOM for each Collector each time slice, should be less than 500 ms.

Note The Introscope ping metric monitors only the lower boundary of the round-trip response time from the MOM to each Collector. This ping time is not the same as the network ping time, which is the sending of an ICMP echo request and getting an echo response.

To view the ping metric, use the Search tab to view metrics named ping in the supportability metric section of the Investigator tree. You will find a ping metric reported for each Collector. Ping times of 10 seconds or longer indicate a slow Collector that may be overloaded. Ping times over the 10 second threshold cause the Enterprise Manager|MOM|Collectors|<host@port>:Connected metric to display a value of 2. You can adjust this threshold for your environment by changing the introscope.enterprisemanager.clustering.manager.slowcollectorthreshold property in the IntroscopeEnterpriseManager.properties file. For more information, see the Introscope Configuration and Administration Guide.


In Introscope 8.0, there is an additional ping time threshold of 60 seconds. If the ping time exceeds this value, the MOM automatically disconnects from the Collector associated with the slow ping time. This prevents the entire cluster from hanging, which is a side effect of one Collector in a cluster greatly underperforming. A disconnected Collector causes the Enterprise Manager|MOM|Collectors|<host@port>:Connected metric to display a value of 3. You can adjust this threshold for your environment by changing the introscope.enterprisemanager.clustering.manager.slowcollectordisconnectthreshold property in the IntroscopeEnterpriseManager.properties file. For more information, see the Introscope Configuration and Administration Guide.

Tip You can set an alert on the Enterprise Manager|MOM|Collectors|<host@port>:Connected metric value. For more information on creating and configuring alerts, see the Introscope Configuration and Administration Guide.

When a Collector disconnects from the MOM, the metric flow from that Collector to the MOM stops. This means you will see a data gap in the Workstation metric reporting. However, the Collector is still gathering and persisting the agent metrics. When the Collector reconnects to the MOM, you can run a historical query to see the metrics reported during the disconnected period.
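For example, a hedged sketch of how the two slow-Collector properties described above might look in IntroscopeEnterpriseManager.properties. The values assume the thresholds are expressed in milliseconds; confirm the unit in the Introscope Configuration and Administration Guide before changing the defaults (10 seconds and 60 seconds):

# Illustrative only: flag a Collector as slow after 15 seconds, disconnect it after 90 seconds
introscope.enterprisemanager.clustering.manager.slowcollectorthreshold=15000
introscope.enterprisemanager.clustering.manager.slowcollectordisconnectthreshold=90000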

When to run reports, custom scripts, and large queries


In order to avoid sluggish Workstation response times and general poor performance, it's important that reports, custom scripts, and large queries not be run when the Collector is scheduled to do heavy processing. Introscope schedules heavy CPU and disk I/O intensive tasks at the top of each hour (the spooling process) and during reperiodization, which is typically scheduled to occur daily between midnight and 3 A.M. During these times, do not run or schedule other heavy processing.


Introscope 8.0 EM settings and capacity


The topics below describe EM-related settings and capacity limits required to set up, maintain, and configure your Introscope 8.0 environment.

Estimating Enterprise Manager databases disk space needs


In planning for your Introscope environment, you may wonder, "How much disk space do I need for all my Introscope databases?" To answer this question, you'll need to calculate your disk space needs for the three databases in which Introscope stores data: SmartStor, traces.db, and baselines.db.

SmartStor, which should be on its own dedicated disk, is used to store metric data coming from agents. For more information, see SmartStor overview on page 40. The SmartStorSizing8.0.x.y.xls spreadsheet, which is located in the <Introscope_Home>/examples directory, can help you determine your SmartStor disk space requirements. For information about using the spreadsheet, see the Introscope Configuration and Administration Guide.

traces.db contains all Transaction Traces and events data, such as error snapshots. This database spans multiple files. One file is created per day, and this data is kept for the number of days specified in the IntroscopeEnterpriseManager.properties file. In the example file snippet below, the daily file is stored for 14 days.

introscope.enterprisemanager.transactionevents.storage.max.data.age=14

baselines.db stores all of the Introscope metrics baselining (heuristic) data in a single file.

The traces.db and baselines.db databases collect and maintain data at different rates. Therefore, to determine the database disk space needs for your Enterprise Manager, you will have to perform disk space calculations for traces.db and baselines.db separately, then sum the two calculations.


traces.db disk space calculation example


Estimating the disk space needed for your Introscope traces.db file starts by answering the questions "How many events do I want to keep?" and "How many days do I want to keep these events?" Once you've answered these questions, determining the disk space needed involves the mathematical calculations shown in the example below. By substituting the Total days to store data and Events/day values with the values you've determined for your system, you can estimate your Enterprise Manager's traces.db disk space requirements.

Introscope agent average bytes/event = 4096
Note The approximate default average size of an event when stored on disk is 4 K.
Total days to store data = 36
Events/day = 1000 events/minute x 60 min/hr x 24 hrs/day = 1,440,000
Note This is the maximum load number for events/day based on 1,000 events/min, which is an example maximum load for a Windows machine. See the Sample Introscope 8.0 Collector sizing limits table on page 119 for other maximum load examples.
Bytes required/day = 4096 bytes/event x 1,440,000 events/day = 5,898,240,000
GB required/day = 5,898,240,000 bytes required/day / (1024 x 1024 x 1024) = 5.49 GB
Total disk space required = 36 (total days to store data) x 5.49 GB required/day = 198 GB

baselines.db disk space calculation example


The baselines.db file should rarely exceed 2 GB. Estimating the disk space needed for your Introscope baselines.db file starts by answering the questions "How many nodes are on my Enterprise Manager?" and "How many agents are reporting data to my Enterprise Manager?" Once you've answered those questions, determining the disk space needed involves the mathematical calculations shown in the example below. By substituting the number of nodes on your Enterprise Manager, which depends on the application count and Blamed backend count, and the number of agents, you can estimate your Enterprise Manager's baselines.db disk space requirements.


This baselines.db calculation example makes the following assumptions:
Note The numbers below are only examples; they are NOT provided as recommendations for any or all Introscope environment(s).
Nodes/each Overview dashboard = 100 (which is very big)
Heuristics/node = 3
Objects generated by each heuristic (in steady state) = 2 objects/hr/heuristic x 24 hrs/day x 7 days/week = 336 objects/week
100 nodes x 3 heuristics/node x 336 objects/week = ~100,000 baseline objects/agent/week
NOTE: Baselines roll over at weekly boundaries. Every baseline is stored in 30 minute increments across a week. Once you roll into the next week, the baseline data is loaded from the last week and then is updated with this week's data.
# agents reporting data to the Enterprise Manager = 200
Baseline objects/agent/week = 100,000 (from the calculation above)
Bytes/baseline object = 100
MB/agent = 100,000 baseline objects/agent/week x 100 bytes/object = 10 MB/agent/week
The baselines.db file size = 10 MB/agent x 200 agents = 2 GB

SmartStor settings and capacity


Setting the SmartStor dedicated controller property
Starting in Introscope 7.1, there is a dedicated controller property that tells the Collector that there is a dedicated SmartStor disk.

Note Only the Collector requires a dedicated SmartStor database and dedicated controller. MOM machines also have a SmartStor instance but, due to the vastly smaller metrics load, are able to house the SmartStor instance on the same disk as other MOM components.

In IntroscopeEnterpriseManager.properties (the Enterprise Manager properties file), the property appears as:

introscope.enterprisemanager.smartstor.dedicatedcontroller=true
Providing a separate disk for each SmartStor AND setting the dedicated controller property to true affects the total number of metrics an Enterprise Manager can handle because these allow for better sharing of disk resources. This allows for a number of performance enhancements including:


- larger virtual agents can be created
- agents can report a larger number of applications
- more calculators can be used
- more Management Module logic is possible
- Workstation responsiveness is faster
The dedicated controller property is set to false by default. You MUST provide a dedicated disk for SmartStor in order to set this property to true; it cannot be set to true if there is only a single disk for each Collector. The reason is that with a single disk for Collector operations AND SmartStor, context switching would be performed at the disk level (rather than the software level). This could cause severe Collector and possibly OS performance problems.

When the dedicated controller property is set to false, the Collector assumes that there is one disk for all Enterprise Manager operations, and therefore uses one disk-writing lock. This means that only one area at a time is written. For example, the Collector will write only to SmartStor or only to the heuristics database that supports the Investigator Overview dashboard. Performance disadvantages to having the dedicated controller property set to false are:
- Only one I/O task can be running at a time.
- SmartStor writes are in shorter segments.
- The disk's seek pointer is invalidated after each context switch.
- If there is a second disk for SmartStor, but the property is set to false, there is no performance gain from having a second disk for SmartStor.
- Collector sizing recommendations are reduced by 50%.
When the dedicated controller property is set to true, the Collector uses two locks: one lock is dedicated to SmartStor, and the second lock is for everything else. Performance advantages to setting the dedicated controller property to true include:
- SmartStor I/O tasks can run concurrently with other I/O tasks, which improves the Enterprise Manager's overall metric-handling ability.
- SmartStor can write in larger segments.
- The seek pointer remembers its last write placement.
- The lock dedicated to SmartStor reduces interruption from the metrics (heuristics) database (baselines.db), which stores metrics baseline data.
For instructions about how to set the SmartStor dedicated controller property to true, see the Introscope Configuration and Administration Guide.


If a Redundant Array of Independent Disks (RAID) configuration is desired, CA Wily recommends RAID 0 or RAID 5. Each SmartStor database MUST reside on its own dedicated RAID setup. All the restrictions above apply to all the varied storage choices available (local disks, external storage solutions such as SAN, and so on). The SmartStor requirement for a separate disk/controller DOES NOT mean that a separate host adapter (such as a fiber channel adapter, SCSI adapter, and so on) is required. It only means that a separate dedicated physical disk or RAID setup is used for each SmartStor database.

To determine whether a machine being considered for SmartStor provides a single dedicated disk or drive, you may need to determine whether the machine has multiple controllers (equivalent to multiple hard drives). It's important to understand that multiple partitions on the same drive share a controller, which is not an appropriate environment for the SmartStor instance. You can use commands like du (for disk usage) on Unix/Linux or Windows Device Manager to determine whether two drives are logically different or physically different. It's critical that the drives are physically different.

Planning for SmartStor storage using SAN


If you plan to use a SAN for SmartStor storage, then each logical unit number (LUN) requires a dedicated physical disk. If you have configured two or more LUNs to represent partitions or subsets of the same physical disk, this does not meet the requirements for SmartStor's dedicated disk.

Planning for SmartStor storage using SAS controllers


If you plan to use a serial-attached SCSI (SAS) controller for SmartStor storage, you are considering using a host bus adapter (HBA) with multiple channels that operate simultaneously. Each full duplex channel is known as a SAS port; each port transfers data at 3 Gb (gigabits) per second. One SAS controller can be used for the Enterprise Managers that store both the SmartStor data and the traces.db and baselines.db data. What's important is to have a dedicated disk for SmartStor; in this case, meaning that SmartStor has its own dedicated SAS port.

Enterprise Manager thread pool and available CPUs


The Enterprise Manager has a pool of threads that do the work of harvesting metrics every 15 seconds. The size of the Enterprise Manager thread pools is based on hosting one Enterprise Manager per machine. However, if more than one Enterprise Manager is running on a single host machine, this results in an excessive number of threads.


For example, if you are running five Enterprise Managers on an 8 CPU quad core machine, each Enterprise Manager bases the size of its thread pools on the 32 available CPUs. This configuration can reduce throughput due to context switching as the threads from all five Enterprise Managers contend for the 32 available CPUs. In Introscope 8.0, the Enterprise Manager properties file (IntroscopeEnterpriseManager.properties) includes the new available processors (CPUs) property to tell the Enterprise Manager how many processors (CPUs) it can expect to have available:

introscope.enterprisemanager.availableprocessors=
See the Introscope Configuration and Administration Guide for more information about setting this property. To continue the example, in the case where there are five Enterprise Managers on a host machine with 32 CPUs, you would allocate six processors for each Enterprise Manager. You'd then set the available processors property to six as shown:

introscope.enterprisemanager.availableprocessors=6

Collector and MOM settings and capacity


The topics below describe the Collector and MOM settings and capacity limits required to set up, maintain, and configure your Introscope 8.0 environment.

MOM disk subsystem sizing requirements


The MOM requires more powerful hardware (CPUs) than a Collector, with the exception of the disk subsystem (the MOM usually performs little I/O). For some examples, see the Sample Introscope 8.0 MOM sizing limits table on page 122. The MOM does have a SmartStor instance, which persists metrics generated by virtual agents, as well as alert states and calculator results from the calculated metrics and supportability metrics from its peer Collectors. However, the MOM doesn't need a second dedicated disk for SmartStor, because the lesser metric load being reported to the MOM doesn't require a second disk.


MOM hardware requirements


The maximum load for a MOM-Collector system is 5,000,000 (5 million) metrics. Although a MOM can connect to Collectors representing a total of 5,000,000 metrics, in each 15-second time slice the MOM itself can only receive 1,000,000 (one million) metrics total. All or none of these 1,000,000 MOM metrics can be associated with calculators and alerts (metrics associated with calculators and alerts are called subscribed metrics). For more information about subscribed metrics, see About metrics groupings and metric matching on page 78.

In order to handle this load, the MOM requires more powerful hardware (CPUs) than a Collector, with the exception of the disk subsystem (the MOM usually performs little I/O). For some examples, see the Sample Introscope 8.0 MOM sizing limits table on page 122. If you are running a MOM near maximum capacity (for example, a 5 million metric cluster or 1 million subscribed MOM metrics), the MOM must run on a 64-bit JVM with a 12 GB heap size. The machine must have physical RAM of at least 14 GB. For more information, see Configuring a cluster to support 1,000,000 MOM metrics on page 61.

MOM to Collectors connection limits


CA Wily recommends using the smallest number of Collectors needed to accommodate the number of agents providing metrics. The maximum load depends on the hardware used, as shown in the Sample Introscope 8.0 MOM sizing limits table on page 122. The more Collectors to which a MOM is connected, the more complicated the system becomes and the greater the likelihood for instability or failure. Performance issues that arise as Collectors are connected to a MOM include the following:
- MOM to Collector clock sync issues may be more difficult to manage. For more information, see Collector to MOM clock drift limit on page 71.
  Important You must run time server software to synchronize at regular intervals the clocks of all the machines in the cluster. Time server software synchronizes the time on a machine with either internal time servers or internet time servers.
- The system may take longer to start up.
- There is an increased likelihood that a misbehaving Collector affects the entire cluster.
  Note In a clustered environment, a single Collector that is performing poorly can make it appear as if the entire cluster is performing poorly.
For these reasons, a single MOM should be connected to a maximum of 10 Collectors.


It is important to ensure that every Collector is running smoothly because any individual nonresponsive Collector causes the entire system to lock up until the Collector either responds, drops its connection, or the MOM times it out (see Local network requirement for MOM and Collectors on page 51). This is because SmartStor data is held on the Collectors, not on the MOM. So to retrieve query or alert information, the MOM must wait for every Collector to respond with its portion of the result before sending the combined query or alert data response back to the Workstation. The Workstations, in turn, are delayed waiting for the MOM's compiled data to display. The responsiveness of a cluster is therefore the response of its slowest connected Collector. In contrast, a single standalone Enterprise Manager has no outside dependencies.

MOM to Workstation connection limits


For information about this topic, see Workstation to MOM connection capacity on page 102.

Metric load limit on MOM-Collector systems


The maximum load for a MOM-Collector system is 5,000,000 metrics. Although a MOM can connect to Collectors representing a total of 5,000,000 metrics, in each 15-second time slice the MOM itself can only receive 1,000,000 metrics total (hardware dependent). All or none of these 1,000,000 MOM metrics can be associated with calculators and alerts. (Metrics associated with calculators and alerts are called subscribed metrics. For more information about subscribed metrics, see About metrics groupings and metric matching on page 78.) If that number of calculators and alerts is exceeded, the Collector cluster startup time can become slow.

In planning for your system's metric load, CA Wily recommends that a MOM not be connected to more than 10 Collectors. For more information, see MOM to Collectors connection limits on page 59. In addition, each MOM can handle a maximum metrics load no greater than that recommended in the Sample Introscope 8.0 MOM sizing limits table on page 122.

For example, let's say you are planning an environment comprised of a MOM and nine Windows-based Collectors. According to the Sample Introscope 8.0 Collector sizing limits table on page 119, each Collector can handle a maximum of 500,000 metrics. If you choose to max out each Collector at 500,000 metrics, you can connect ten Collectors to the MOM to total 5,000,000 metrics in the MOM-Collector system. If, however, your environment will typically generate 1,000,000 metrics, you could set up one Collector to handle 200,000 metrics and the remaining eight Collectors to handle 100,000 each (totalling 800,000 metrics), for a MOM-Collector system total of 1,000,000 metrics.


Or you could set up four Collectors to handle 200,000 metrics each (totalling 800,000 metrics) and the remaining five Collectors to handle 40,000 metrics each (totaling 200,000 metrics), for a MOM-Collector system total of 1,000,000 metrics. In order for Introscope to support 1,000,000 metrics on the MOM, you need to configure the MOM and meet specific JVM requirements on each clustered Collector. See Configuring a cluster to support 1,000,000 MOM metrics on page 61.

Configuring a cluster to support 1,000,000 MOM metrics


In order for Introscope to support 1,000,000 metrics on the MOM, you need to configure the IntroscopeEnterpriseManager.properties file on each Collector in the cluster. In addition, you need to meet specific JVM requirements on the MOM and all the clustered Collectors.

To configure the cluster to support 1,000,000 MOM metrics
1 Run the MOM on a 64-bit JVM with a 12 GB heap size. The machine must have physical RAM of at least 14 GB.
2 On each Collector in the cluster, configure the IntroscopeEnterpriseManager.properties file.
  a On each Collector machine in the cluster, go to the <Introscope_Home>/config directory and open the IntroscopeEnterpriseManager.properties file.
  b Add this transport property value as shown:
    transport.outgoingMessageQueueSize=4000
  c Save and close the IntroscopeEnterpriseManager.properties file.
3 Run each Collector in the cluster on a 32-bit JVM with a 1.5 GB heap size.
The required Collector configuration as well as MOM and Collector JVM sizing requirements are complete. For MOM sizing examples, see the Sample Introscope 8.0 MOM sizing limits table on page 122. For Collector sizing examples, see the Sample Introscope 8.0 Collector sizing limits table on page 119.
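As a sketch only, the MOM heap requirement in step 1 might be expressed with the same lax.nl.java.option.additional property used in the heap sizing examples earlier in this chapter. The line below shows only a 12 GB fixed heap; the GC flags appropriate for a 64-bit MOM should come from CA Wily Support or Professional Services:

# Illustrative only: 12 GB fixed heap for a 64-bit MOM
lax.nl.java.option.additional=-server -Xms12288m -Xmx12288m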


MOM hot failover


If the MOM gets disconnected or goes down due to, for example, a hardware or network failure, you can configure a second MOM to take over using hot failover. In configuring MOM hot failover, you have the choice of setting up two MOMs that both act in a primary role (peers), or configuring one MOM each to act in the primary and secondary (backup) roles in case of failure. In this network topology, the two MOMs share a single Introscope installation, and the MOMs and the Introscope installation are on three different machines. The Introscope installation can reside on a Storage Area Network (SAN) file system or can be shared using a Network Attached Storage (NAS) protocol such as Network File System (NFS) or Server Message Block (SMB). For information on configuring MOM hot failover, see the Introscope Configuration and Administration Guide.

Note Hot failover is intended primarily for the MOM because the MOM is a single point of failure in the Introscope clustering architecture. The agent load balancing feature provides fault tolerance for the Collectors in a cluster. In the case of a Collector failure, agents reconnect to the MOM and are redirected to other Collectors. See Agent load balancing on MOM-Collector systems on page 63.

Alternative method of configuring agent to MOM hot failover


When trying to connect to an Enterprise Manager, the agent tries all the IP addresses for a given host name. Therefore, if you have defined a host in DNS with the IP addresses of the primary and secondary MOMs, then you don't need to configure failover in the agent. You can instead specify the host name, and the agent connects to whichever MOM is running.

Configuring Workstation log-in for hot failover


When trying to connect to an Enterprise Manager, the Workstation tries all the IP addresses for a given host name. Therefore, if you have defined a host in DNS with the IP addresses of the primary and secondary MOMs, then you can specify the host name and the Workstation connects to whichever MOM is running. For information about administering Workstation, see the Introscope Workstation User Guide.


Agent load balancing on MOM-Collector systems


In Introscope 8.0, the MOM uses agent load balancing to balance the metric load between Collectors in a clustered environment. The MOM equalizes the metric count among the Collectors by ejecting participating 8.0 agents from overburdened Collectors. The ejected agents reconnect to the MOM, then are reallocated to under-burdened Collectors.

Note Agent load balancing is not the same as a metric clamp. Agent load balancing is carried out by the MOM, which disconnects and connects agents to specific Collectors based on the current metric load. A metric clamp is a limit, or clamp, on the number of metrics on the agent and the Enterprise Manager that helps to prevent spikes in the number of reported metrics (metric explosions) on the Enterprise Manager. For more information on metric clamps, see Metric clamping on page 96.

Setting up agent load balancing requires these high-level steps, which include three key configurations: the metric weighting factor, the metric load threshold, and how often the MOM rebalances agents:
Step 1 Determining how MOMs assign agents to Collectors on page 64
Step 2 Setting up the agent load balance metric weight load on page 64
Step 3 Setting the agent load balance metric threshold on page 65
Step 4 Setting the agent load balance interval property on page 66
To configure agent load balancing, see the Introscope Configuration and Administration Guide.


Determining how MOMs assign agents to Collectors


Several factors determine how the MOM assigns an agent to a Collector.
- Connection type. An agent is only assigned to a Collector that supports the same connection type that the agent uses to connect to the MOM. For example, if the agent connects to the MOM using HTTP, then the Collector must have enabled HTTP connections.
- Configuration done in the loadbalancing.xml file. You fill out the loadbalancing.xml file to restrict agents to a specific set of Collectors, or to exclude agents from a specific set of Collectors. For more information, see the Introscope Configuration and Administration Guide.
- Agent connection history with a specific Collector. To prevent an explosion of SmartStor data as 8.0 agents are transferred from one Collector to another in the cluster, if an 8.0 agent has connected to a Collector previously, the MOM favors that Collector for future connections unless there is an alternative Collector that is underloaded or the favored Collector is overloaded.
Note Pre-8.0 agents do not connect to the MOM; instead, they must connect directly to a Collector.

Setting up the agent load balance metric weight load


When agent load balancing is configured, the MOM allots 8.0 agents to Collectors based on weight-adjusted load. You can adjust the weighting factor of individual Collectors in a cluster to improve performance. The number of metrics that a Collector can handle, which determines a Collector's relative power in a cluster, is determined by a number of factors including CPU power and memory, number of applications, and network speed. Therefore, when setting up agent load balancing you can use the weighting factor to ensure that the MOM assigns fewer metrics to less powerful Collectors. You can help avoid cluster performance problems by setting up the metric weight load so that the more powerful Collectors handle the bigger metric loads. You set the introscope.enterprisemanager.clustering.login.em1.weight property in the IntroscopeEnterpriseManager.properties file as described in the Introscope Configuration and Administration Guide.

Note In the introscope.enterprisemanager.clustering.login.em1.weight property name, em1 is an arbitrary identifier. Each Collector has a unique identifier. Provide an appropriate identifier for your environment.


The value of the introscope.enterprisemanager.clustering.login.em1.weight property is a positive number that controls the relative load of the Collector. If the factors affecting how the MOM assigns agents to a Collector (see Determining how MOMs assign agents to Collectors on page 64) do not dictate a different agent connection decision, then the weight load on a specific Collector divided by the total weight load of the cluster is the percentage of the metric load assigned to that Collector. The MOM then uses weight-adjusted metric counts when assigning agents to Collectors and when rebalancing the agent metric load.

For example, a MOM connects to three Collectors that all have zero metrics currently being reported. If Collector A has a weight of 150, Collector B has a weight of 100, and Collector C has a weight of 50, then the MOM assigns metrics to Collectors A, B, and C approximately in the ratio of 3:2:1.
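Expressed as configuration, and purely as an illustration (em1, em2, and em3 are placeholder Collector identifiers; use the identifiers from your own environment), those weights would look like this in the MOM's IntroscopeEnterpriseManager.properties file:

# Hypothetical weights for Collectors A, B, and C
introscope.enterprisemanager.clustering.login.em1.weight=150
introscope.enterprisemanager.clustering.login.em2.weight=100
introscope.enterprisemanager.clustering.login.em3.weight=50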

Setting the agent load balance metric threshold


You must set the introscope.enterprisemanager.loadbalancing.threshold property in the IntroscopeEnterpriseManager.properties file to configure the cluster's tolerance, or threshold, for imbalance. This is another agent load balancing property that affects performance.

The default introscope.enterprisemanager.loadbalancing.threshold property setting is 20,000 metrics, which means that a Collector would have to be 20 K metrics out of balance before the MOM rebalances the agents. A Collector is out of balance if it is either under or over the weight-adjusted cluster average by more than the threshold. See Agent load balancing usage scenarios on page 66 for some examples.

To properly configure the introscope.enterprisemanager.loadbalancing.threshold property, choose a number of metrics that prevents the MOM from constantly reallocating agents. When the MOM disconnects an agent from one Collector and assigns it to another, overhead is added to the cluster. When load rebalancing is needed, that added overhead is acceptable. However, unnecessary rebalancing adds unnecessary overhead to the cluster, which can diminish system performance. When a cluster is a little unbalanced, there is no negative effect on performance because a certain amount of flux is normal. An appropriate introscope.enterprisemanager.loadbalancing.threshold property value is one at which the MOM brings agents into balance by making fewer, but larger, adjustments, which is better for system performance than more frequent, smaller adjustments.


Setting the agent load balance interval property


You must set the introscope.enterprisemanager.loadbalancing.interval property in the IntroscopeEnterpriseManager.properties file to tell the MOM how often to check the cluster for possible rebalancing. The default is 600 seconds (ten minutes) and the minimum is 120 seconds (two minutes).

Consider the actual needs of your system when setting the interval value. Every time the MOM checks for rebalancing, it must take an action. Then the system needs time to adjust to the changes that occurred due to the rebalancing. If the MOM is set to rebalance again too soon, the system won't have adapted to the previous changes. For this reason, and because the MOM must handle a basic workload that requires time to carry out, CA Wily does not recommend using the minimum load balance interval of 120 seconds.
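Purely as an illustration, using the default values described above, the two properties would appear in the MOM's IntroscopeEnterpriseManager.properties file as follows:

# Rebalance only when a Collector is more than 20,000 metrics out of balance.
introscope.enterprisemanager.loadbalancing.threshold=20000
# Check the cluster for possible rebalancing every 600 seconds (10 minutes).
introscope.enterprisemanager.loadbalancing.interval=600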

Agent load balancing usage scenarios


Here are some examples to help you understand how agent load balancing operates in a clustered environment.

Agent load balancing when a Collector is added


A MOM connects to Collectors A and B. There are 36,000 metrics being reported to Collector A and 30,000 metrics reported to Collector B. You set the metric threshold to 10,000 metrics using the introscope.enterprisemanager.loadbalancing.threshold property in the IntroscopeEnterpriseManager.properties file. Collector C, which has 24,000 metrics being reported to it, is added to the cluster. The MOM does not rebalance the metric load since there is an average of 30,000 metrics per Collector (36,000 + 30,000 + 24,000 = 90,000 metrics / 3 Collectors) and none of the Collectors differs from 30,000 by more than the 10,000 metric threshold.

Agent load balancing when a Collector fails


A MOM connects to Collectors A, B, and C. There are 36,000 metrics being reported to Collector A, 30,000 metrics to Collector B, and 24,000 metrics to Collector C. Collector A fails and the agents that reported to Collector A re-connect to the MOM. The MOM redirects approximately 15,000 metrics to Collector B and 21,000 to Collector C. Now Collectors B and C both have 45,000 metrics being reported to them.


Agent load balancing when a Collector restarts


When Collector A recovers after failure, the cluster is unbalanced. There should be an average of 30,000 metrics per Collector, yet Collectors B and C each have 45,000 metrics being reported to them, which is 15,000 metrics above average. This exceeds the threshold of 10,000 metrics, so the MOM ejects agents reporting a total of 15,000 metrics from each of Collectors B and C and redirects all 30,000 metrics to Collector A. This results in 30,000 metrics being reported to each of Collectors A, B, and C.

Agent load balancing with weights


A MOM connects to Collectors A, B, and C. The threshold is set to 10,000. There are 24,000 metrics reporting to Collector A, 30,000 to Collector B, and 36,000 to Collector C, making a total of 90,000 metrics. The cluster has an average of 30,000 metrics per Collector. You have set the weight of Collector A to 150, Collector B to 100, and Collector C to 50. The average weight is 100 (the sum of the weights divided by the number of Collectors, or (150 + 100 + 50)/3). Each Collector should have a metric load proportional to its relative weight. Since Collector A has a weight of 150, it should have 45,000 metrics (its weight is 50% above average, so its metric load should be 50% above the 30,000 metric average). Collector B has an average weight and therefore should have the average metric load, or 30,000 metrics. Collector C has a weight 50% of the average and therefore should have 50% of the average metric load, or 15,000 metrics. Based on these relative weights and metric averages, the cluster is unbalanced. Collector A is underloaded because it is under the weight-adjusted average by more than the threshold (24,000 - 45,000 = -21,000). Collector B is perfectly balanced because at 30,000 metrics, its metric load is equal to its weight-adjusted average. Collector C is overloaded because it is over the weight-adjusted average by more than the 10,000 metric threshold (36,000 - 15,000 = 21,000). The MOM rebalances the cluster by ejecting agents with 21,000 metrics from Collector C and redirecting them to Collector A.
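The following short Java sketch is not product code; it simply reproduces the weight-adjusted arithmetic from this scenario, computing each Collector's target load and whether it is out of balance by more than the threshold:

// Reproduces the weight-adjusted balancing arithmetic from the scenario above.
public class LoadBalanceArithmetic {
    public static void main(String[] args) {
        String[] collectors = {"A", "B", "C"};
        double[] weights    = {150, 100, 50};        // configured Collector weights
        double[] loads      = {24000, 30000, 36000}; // current metric loads
        double threshold    = 10000;                 // loadbalancing threshold

        double totalWeight = 0, totalLoad = 0;
        for (int i = 0; i < collectors.length; i++) {
            totalWeight += weights[i];
            totalLoad   += loads[i];
        }
        for (int i = 0; i < collectors.length; i++) {
            // Target share = this Collector's weight divided by the total cluster weight.
            double target    = totalLoad * weights[i] / totalWeight;
            double imbalance = loads[i] - target; // positive = overloaded, negative = underloaded
            System.out.printf("Collector %s: target=%.0f, imbalance=%.0f, rebalance=%b%n",
                    collectors[i], target, imbalance, Math.abs(imbalance) > threshold);
        }
    }
}

Running it reports a target of 45,000 for Collector A, 30,000 for Collector B, and 15,000 for Collector C, matching the figures in the scenario above.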


Agent load balancing when a Collector is added to the cluster


You determine that your cluster of three Collectors A, B, and C is overloaded. You dynamically add Collector D by adding a set of connection properties to the MOM's IntroscopeEnterpriseManager.properties file. In addition, you have set the metric threshold for all the Collectors in this cluster to 10,000 metrics using the introscope.enterprisemanager.loadbalancing.threshold property in the IntroscopeEnterpriseManager.properties file. There are 30,000 metrics being reported to each of Collectors A, B, and C, while Collector D has zero metrics being reported to it, so the average metric load is 22,500 metrics per Collector (90,000 metrics / 4 Collectors). The MOM rebalances the cluster because the difference between the average load of 22,500 metrics and zero metrics is greater than the threshold of 10,000 metrics. The MOM rebalances 7,000 metrics each from Collectors A, B, and C and redirects all 21,000 metrics to be reported to Collector D. This results in 23,000 metrics being reported to Collector A, 23,000 metrics to Collector B, 23,000 metrics to Collector C, and 21,000 metrics to Collector D.

Agent load balancing when an agent prefers a specific Collector


If you have an agent that prefers a particular Collector, connect the agent directly to that favored Collector. This prevents the MOM from ejecting the agent, because the MOM only rebalances agents that were redirected to the Collector by the MOM.

Avoid Management Module hot deployments


WARNING Do not perform Management Module hot deployments on production Collectors or MOMs. A Management Module hot deployment locks the system and prevents metric data from being reported.

Hot deployment of virtual agents and Management Modules is very CPU intensive and can lock up Collectors for a couple of minutes, during which the metric harvest does not happen. This can happen if you change the virtual agent definitions or redeploy Management Modules on the MOM or a Collector; the consequence can be that the cluster stops responding to Workstation users for extended periods. CA Wily strongly recommends not performing Management Module hot deployments on production Collectors and MOMs.

You may perform a hot deployment while developing Management Modules. However, if you are working with a large, fully loaded Enterprise Manager or a large cluster, avoid performing a Management Module hot deployment, as it is likely that the system will freeze. For more information about virtual agents, see the Introscope Java Agent Guide.


Collector applications limits


In Introscope 8.0 under full metric load, an individual Collector can accommodate the total number of applications from all agents (depending on hardware) as shown in the Sample Introscope 8.0 Collector sizing limits table on page 119. An overloaded Enterprise Manager starts to combine metrics, so once you approach the maximum number of applications limit, add a new Collector and break up the metric load.

The calculation of application heuristics is very CPU intensive on the Collector. CA Wily recommends x86 architectures for two reasons: the higher clock speed and the ability to execute individual threads faster (faster response times). In contrast, RISC architectures are better at executing threads in parallel (greater throughput).

Introscope architecture changed greatly in version 7. The new architectural paradigm dictates that the Collector limit for applications monitored with the Overview dashboard is now stated as a total number rather than an average per agent. For information about finding applications in your Introscope environment, see Enterprise Manager overview on page 17. For information about the Overview dashboard, see the Introscope Workstation User Guide.

Collector metrics limits


For metrics limit examples by hardware type, see the Sample Introscope 8.0 Collector sizing limits table on page 119. One indication that an Enterprise Manager is overloaded is that it starts to combine metric time slices. When this happens, a message appears in the Enterprise Manager log at the Warning level saying that SmartStor isn't keeping up with live data. In addition, there is another message in the Enterprise Manager log in verbose mode stating the down-sampled period for any combined time slices.

In Introscope, a down-sampled period is a time period that is disproportionately large for the associated SmartStor data storage tiering level. For example, in Data Tier 1 (relatively current data), each data point for reported metric data represents a 15-second period. If SmartStor gets slow and the Enterprise Manager can't keep up, instead of saving two points of 15-second data, SmartStor stores only one point every 30 seconds, halving the amount of data it needs to write to disk. At Data Tier 2 (older but not the oldest data), all the 15-second period data is reperiodized to cover 60 seconds. Typically for Data Tier 2 data, reperiodization means that four metric data points, representing 60 seconds, are combined into a single 60-second data point. The same process is done once more to combine Data Tier 2 data into the oldest set of data, which is Data Tier 3 data.


So when your Collector approaches the metrics limit and you see the warning messages described above, add a new Collector to your system. Note If you are running a standalone Enterprise Manager that is approaching the metrics limit, you may need to implement a clustered environment.

Collector events limits


The Collector treats ChangeDetector events, Transaction Traces, errors, stall events, and so on as event objects, and persists them in object databases attached to the Enterprise Manager. For example, Transaction Traces and errors are usually stored in the traces.db file. The maximum events limit represents the total number of events a Collector can receive and persist from all agents. There is one limit for steady-state event persistence and another for burst capacity. Steady state means 24/7 ongoing event activity. Burst capacity means that the Collector can sustain a heavy events load for no more than a couple of hours, but not 24 hours. For burst limit examples, see the Sample Introscope 8.0 Collector sizing limits table on page 119.

If you want to know how many events are actually being received in your system, you can count the number of Transaction Traces per time slice. That number is seen in the Investigator tree Enterprise Manager health and supportability metrics as Data Store | Transactions:Number of Inserts Per Interval.

The only Introscope events that are potentially high volume are Transaction Traces and ChangeDetector events. The other types of events are not as common, and the number of errors and stall events should be fairly small because there are throttles on the agent side to prevent large numbers of errors being sent to the Enterprise Manager. As an example (see the Sample Introscope 8.0 Collector sizing limits table on page 119), the steady-state limit for all events on AMD Opteron-based hardware is about 1,000 events per minute. The burst limit is five times that, so for the Opteron in this example, the burst limit is 5,000 events per minute.

Collector agent limits


Introscope 7.x changed the agent connection architecture from multiple threads per connection to a pooled thread non-blocking architecture. The 6.x architecture imposed very strict limits on the number of agents that could be connected to an Enterprise Manager. Introscope 8.x agents use the same connection pool mechanism as 7.x, but 6.x agents do not. The new higher limit, as shown in the Sample Introscope 8.0 Collector sizing limits table on page 119, only applies if ALL agents are 7.x or 8.x.


In Introscope 8.0, the Enterprise Manager can take advantage of additional CPUs to increase the maximum agents limit. The Enterprise Manager must be using 4 CPUs or cores to take advantage of the increased Collector agents capacity (see the Sample Introscope 8.0 Collector sizing limits table on page 119). These limits are dependent on the specific hardware in use. The number of currently connected agents is available as the Number of Agents Enterprise Manager health and supportability metric, which you can find in the Investigator tree at this location:
Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager | Connections | Number of Agents.

An overloaded Enterprise Manager starts to combine metrics, so once you approach the agent limit, add a new Collector. An inappropriately configured agent can create thousands of metrics in quick succession and overload the Enterprise Manager. To prevent this, the Enterprise Manager uses a metric clamp. For information about metric clamping, see Metric clamping on page 96.

Collector hardware requirements


The hardware required to run a Collector at maximum load is primarily dependent on the machine's CPU speed and dedicated disk I/O subsystem. Additional CPUs beyond 2 to 4 CPUs (hardware platform dependent) will not increase the capacity of a given Collector on specific hardware. Faster CPUs will. Check the Sample Introscope 8.0 Collector sizing limits table on page 119 for examples of appropriate hardware platform, OS, and CPU.

Collector with metrics alerts limits


For information about this topic, see About alerted metrics and slow Workstation startup on page 81.

Collector to MOM clock drift limit


MOM and Collector clocks need to be synchronized to within AT MOST three seconds. If the clocks drift by more than that amount, the MOM releases the connection with the Collector. Then the MOM attempts to reconnect periodically (every minute) and checks to see if the clocks are in sync. If not, the connection fails. In addition, any clock drift between the Collectors and the MOM, even within the required 3-second limit, will cause disproportionate delays in Workstation responses.


Important You must run time server software to synchronize the clocks of all the machines in the cluster at regular intervals. Time server software synchronizes the time on a machine with either internal time servers or internet time servers.

Workstation sluggishness or unresponsiveness is rarely caused by a problem in the Workstation or MOM. It is usually caused by a single unresponsive Collector, which propagates to the MOM and then the Workstation, and is magnified when Collectors are clustered. One way to determine which Collector is slowing the system down is to look at the round-trip response time from the MOM to each Collector. Each Collector has a ping metric that represents the MOM to Collector round-trip response time and, for optimal Workstation response time, should be less than 500 ms on average. This is equivalent to the GetEvent metric in Introscope 7.0. The ping metric shows how quickly the Collectors are responding to messages from the MOM.

Note The Introscope ping metric monitors only the lower boundary of the round-trip response time from the MOM to each Collector. This ping time is not the same as the network ping time, which is the sending of an ICMP echo request and getting an echo response.

The ping metric is a good way to diagnose which Collector is responding slowly. Ping times of 10 seconds or longer indicate a slow Collector that may be overloaded. If the ping time is above the 10-second threshold for extended periods of time, investigate the overall health of the Collector that is reporting the slower ping time. Check for obvious signs that this Collector is overloaded, such as combining time slices or receiving very large numbers of events. For more information about Collector health, the ping time threshold, and the ping metric, see Local network requirement for MOM and Collectors on page 51.

Reasons Collectors combine slices


If a Collector combines time slices throughout the day and appears to respond slowly despite being at or below the maximum recommended capacity limits, one of these four reasons is likely to be causing the problem:


- Other processes are running on the Collector. The sizing guideline provided for any hardware configuration assumes that no other processes are running on the host. If, for example, the Sample Introscope 8.0 Collector sizing limits table on page 119 states that a Collector running on a 2 CPU Xeon can handle 500,000 metrics, that assumes there is no other server or database process running on the machine. This is true for any background process, but is especially important for processes that might impact disk I/O performance or have a large memory footprint. The Collector does not tolerate contention for its disk or memory resources, and this is a significant factor in many performance problems.
- There is I/O contention between SmartStor and other processes, including the Enterprise Manager itself, because SmartStor is not located on a separate disk or I/O subsystem.
- The virtual agent is poorly configured. Very large virtual agents or poorly configured virtual agents with a lot of metrics start to use up CPU resources. The two biggest CPU drains are metric baselining (heuristics) and virtual agents, because of the large amount of calculation involved in both.
- Large Transaction Traces are running continuously. The process of accepting and persisting events like Transaction Traces involves deserialization and indexing, which are very CPU intensive. A very large number of Transaction Traces uses a lot of Collector CPU resources.

Increasing Collector capacity with more and faster CPUs


Adding two additional CPUs beyond the minimum 2 CPUs required increases the maximum agents per Collector limit, doubles the maximum number of applications, and increases the number of metrics that can be placed in metric groupings. The maximum metrics limit, however, remains the same. Faster CPUs may help improve Enterprise Manager performance, for example with faster Workstation query response times. See the limits shown in the Sample Introscope 8.0 Collector sizing limits table on page 119.

Note In this guide, 2 CPUs is interchangeable with Dual Core and 4 CPUs is interchangeable with Quad Core.


Standalone EM hardware requirements example


Here is a standalone Enterprise Manager hardware requirements example. It should help you understand the various components and requirements you'll need to consider if you are deploying a standalone Enterprise Manager in your Introscope environment.

Important The Enterprise Manager described below is only an example; it is NOT the only recommended Enterprise Manager configuration for any or all Introscope environments.

EM Component / Example Requirement:
- Number of EM instances/Server: 1
- Server Type and Model: Windows Server 2003
- Operating System: Windows (running in 64-bit mode for optimum file cache size)
- CPU: Two to four Intel Xeon CPUs @ 2.8 GHz
- Physical RAM: 4 GB
- Disk I/O Subsystem: The OS resides on a separate physical disk. RAID 0 or RAID 5 configuration. Drive speed: 10k RPM or greater.

Running multiple Collectors on one machine


All Collectors need a minimum of two CPUs to perform their key operations. Adding an additional 2 CPUs, for a total of 4 CPUs, helps increase these limits:

- number of applications per Collector
- number of agents per Collector
- number of metrics that can be placed in metric groupings (if using a standalone Enterprise Manager)

You can run multiple Collectors on one machine as long as you follow these requirements:

- Run the OS in 64-bit mode to take advantage of a large file cache. The file cache is important for the Collectors when doing SmartStor maintenance, for example spooling and reperiodization. The file cache resides in physical RAM and is dynamically adjusted by the OS during runtime based on the available physical RAM. CA Wily recommends having 3 to 4 GB RAM per Collector.


- There should not be any disk contention for SmartStor, meaning you use a separate physical disk for each SmartStor instance. If there is contention for SmartStor write operations, the whole system can start to fall behind, which can result in poor performance such as combined time slices and dropped agent connections (see the sketch after this list).
- The baseline.db and traces.db files from up to four Collectors can reside on a separate single disk. In other words, up to four Collectors can share the same physical disk to store all of their baseline.db and traces.db files.
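As a sketch of the disk layout only (the property name introscope.enterprisemanager.smartstor.directory is assumed here from the standard Enterprise Manager properties file, and the paths are placeholders), two Collector instances on the same machine might point SmartStor at separate physical disks like this:

# Collector 1 (in its own IntroscopeEnterpriseManager.properties)
introscope.enterprisemanager.smartstor.directory=/disk1/smartstor
# Collector 2 (in its own IntroscopeEnterpriseManager.properties)
introscope.enterprisemanager.smartstor.directory=/disk2/smartstor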


CHAPTER

Metrics Requirements and Recommendations

This chapter provides background and specifics to help you understand sizing and performance-related metrics requirements, settings, and limits for your Introscope system. In this chapter you'll find the following topics:

Metrics background . . . 78
About metrics groupings and metric matching . . . 78
8.0 metrics setup, settings, and capacity . . . 79
Matched metrics limits . . . 79
Inactive and active metric groupings and EM performance . . . 80
Performance and metrics groupings using the wildcard (*) symbol . . . 80
SmartStor metrics limits . . . 80
Virtual agent metrics match limits . . . 80
About alerted metrics and slow Workstation startup . . . 81
About aggregated metrics and Management Module hot deployments . . . 81
Detecting metrics leaks . . . 81
Metrics leak causes . . . 82
Finding a metrics leak . . . 82
Metrics for diagnosing a metrics leak . . . 83
Detecting metric explosions . . . 84
Metric explosion causes . . . 84
Finding a metric explosion . . . 85
How Introscope prevents metric explosions . . . 91
SQL statements and metric explosions . . . 92
SQL statement normalizers . . . 94
Enterprise Manager dead metric removal . . . 96
Metric clamping . . . 96
SmartStor metadata files are uncompressed . . . 98


Metrics background
Every 15 seconds, the metrics harvest cycle takes place on the Enterprise Manager. During this process, the two sets of metrics data reported by agents are aggregated by the Enterprise Manager. This time slice data is processed to perform calculations, check alerts, update heuristics, and update Workstation views, and it is persisted to disk by SmartStor. Typically, at load levels close to the limits recommended in the Sample Introscope 8.0 Collector sizing limits table on page 119, the harvest takes no more than about 3 to 4 seconds.

The number of metrics that an individual Collector can handle is influenced by CPU speed. As discussed in EM basic requirements on page 20, CA Wily recommends two to four dedicated CPUs per Collector (depending on hardware platform). Additional dedicated, physical CPUs won't increase the number of metrics and agents a Collector can handle; however, faster CPUs may help increase the Collector's maximum capacity.

The Introscope business logic load is governed by the following:

- total number of metrics groupings
- maximum number of metrics in a metrics grouping
- number of metrics persisted per minute

Understanding metric groupings and metric matching, then following the guidelines discussed in Matched metrics limits on page 79, can be helpful in avoiding performance problems.

About metrics groupings and metric matching


Introscope metrics, which measure the application performance that Introscope tracks and records, are identified by strings. Every metric in Introscope has a string identifier name that includes its host, process, agent, and resource name, such as JSP|_Shopping_Cart_JSP:Average Response Time (ms). The structure of your Investigator tree reflects the resource name.

Introscope's business rules are built around the concept of metrics groupings. A metrics grouping is a logical grouping of metric resource names or identifiers that you've defined using a regular expression. Whenever you create a regular expression, you are creating a metrics grouping. If a regular expression returns one or more results, those results are the metrics grouping. Metrics groupings are stored in Management Modules. A metric can match no groups, one group, or many groups. Metrics groupings are used by Introscope as described in the following process:


1 Introscope business logic monitors, including Alerts, Dashboards, and the Workstation Investigator tree, want data from the Enterprise Manager.
2 Introscope business logic monitors request metric data using a metric group. For example, an Enterprise Manager gets the Workstation request, "Give me the data for the Servlets metric group."
3 When the data query is submitted, the Enterprise Manager scans all metrics to see which match the metric group Servlets. Those metrics are then subscribed to.
4 Every 15-second harvest cycle, the metrics that are subscribed to have their 15-second time slice data routed to the subscribing Introscope business logic monitor.

The total number of metrics that the Collector must assess during each time slice can easily become so big that it can't process all the business logic you've defined for all your metrics within the 15-second harvest cycle. This situation can lead to performance problems. Therefore, CA Wily recommends that the total number of metrics placed in metrics groupings be no more than 15% of the metrics limit if you are running a 2 CPU Collector, and no more than 30% of the metrics limit if you are running a 4 CPU Collector. For example metrics limits, see the Sample Introscope 8.0 Collector sizing limits table on page 119.
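As an illustration only (the actual metric paths depend on your agents and applications), a Servlets metric grouping like the one in the example above might be defined with a regular expression such as:

Servlets\|.*:Average Response Time \(ms\)

Any metric whose full name matches the expression becomes part of the grouping and is subscribed to during each harvest cycle.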

8.0 metrics setup, settings, and capacity


The topics below describe metrics-related settings and capacity limits required to set up, maintain, or configure your Introscope 8.0 environment.

Matched metrics limits


The limiting factors for metrics are:

- SmartStor I/O activity
- the number of metric groups defined

It doesn't matter how many alerts you've set; what matters is how many metrics are matched. CA Wily recommends that the total number of metrics placed in metrics groupings be no more than 15% of the metric limit if the Enterprise Manager is using its minimum requirement of 2 CPUs. If an Enterprise Manager is using 4 CPUs, this limit increases to 30%. For example limits, see the Sample Introscope 8.0 Collector sizing limits table on page 119.

Note If you are using standalone Enterprise Managers, you define metrics groupings on the Enterprise Manager. However, if you are using clustered Collectors, set up metrics groupings on the MOM.


Inactive and active metric groupings and EM performance


Enterprise Manager performance can depend on whether the Enterprise Manager is handling inactive or active metrics groupings. An inactive metrics grouping is metric grouping data that is not being requested by a Workstation. In this case, the Enterprise Manager looks at the calculations it needs to perform and the alerts it needs to carry out (such as sending an e-mail message), and does not need to do extra work to send metric grouping information to a Workstation. An active metrics grouping is metric grouping data that is being requested by a user logged in at a Workstation. For example, an Introscope Administrator wants to look at a graph that represents a metric grouping. In this case, the Enterprise Manager has to provide all the data for the graph to the Workstation in addition to performing the calculations and handling the alerts.

Performance and metrics groupings using the wildcard (*) symbol


Do not create metrics groupings or regular expressions that use only the wildcard or asterisk (.*) symbol and no other specifiers. Running the search term (.*) creates a metric grouping that matches all metrics in the system. Subsequent business operations on that grouping, such as adding alerts, then put unnecessary overhead on the Enterprise Manager.
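For example (these patterns are illustrative only), prefer a scoped expression over the bare wildcard:

.*
Servlets\|.*:Average Response Time \(ms\)

The first pattern matches every metric in the system and should be avoided; the second matches only servlet Average Response Time metrics, so any alerts or dashboards built on it subscribe to far fewer metrics.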

SmartStor metrics limits


Starting with Introscope 7.1, the metric capacity for a SmartStor increased by 25% over that provided in Introscope 7.0. The new limits are imposed by I/O throughput rather than CPU (as was the case in Introscope 7.0). The recommended SmartStor metrics limit is the same for both the MOM and Collector as shown in the Sample Introscope 8.0 Collector sizing limits table on page 119.

Virtual agent metrics match limits


See Virtual agent metrics match limits on page 109 and examples in the Sample Introscope 8.0 Collector sizing limits table on page 119.


About alerted metrics and slow Workstation startup


If you launch your MOM and are logged in, but don't see any metrics in the Workstation Investigator tree for a long time, it's because the MOM is taking a long time to begin sending data to the Workstation. Large numbers (100,000 or more) of metrics alerts (a metric group to which you've attached an alert) in individual Collectors cause a great deal of network and CPU overhead in the MOM as the Collectors register these alerts in the MOM. During startup, the MOM takes the CPU for 15 to 20 minutes during which you can see the Investigator tree, but none of the metrics are being reported back to you. Since the slow processing is due to the MOM handling query exchange, this situation has very little impact on the Collectors and data collection progresses. Limiting the number of Collectors and alerts should make the startup process less time consuming. If the startup time is unacceptable, reduce the number of alerted metrics, or consider a machine with faster individual CPUs.

About aggregated metrics and Management Module hot deployments


WARNING Do not perform Management Module hot deployments on production Collectors or MOMs. The cost of Management Module hot deployments can be 48 seconds per 1500 aggregated metrics (on virtual agents), and during the Management Module hot deployment, the Collector CPU utilization is almost 100% for the entire duration. The effect on a MOM is even more pronounced. For more information about Management Module hot deployments, see Avoid Management Module hot deployments on page 68.

Detecting metrics leaks


If the number of agents connected to your Enterprise Manager is within the guidelines recommended in the Sample Introscope 8.0 Collector sizing limits table on page 119, yet the Enterprise Manager behaves like it is handling a much larger load than it is actually managing, the cause may be a metrics leak. Symptoms may be, for example, that historical queries for even relatively small numbers of metrics take far too long to return, causing dashboards to render in slow motion and Investigator browsing to be extremely slow or impossible.


A metrics leak happens when a metric produces data for a very short period of time, and then never produces data again. This happens when part of the metric name includes something transient, like a session key or a SQL parameter. Note A metric explosion happens when an agent is inadvertently set up to report more metrics than the system can handle. In this case, Introscope is bombarded with such a large number of metrics that performance gets very slow or the system cannot function at all. For more information, see Detecting metric explosions on page 84.

Metrics leak causes


The Enterprise Manager's SmartStor keeps both a skeleton of metric names (metrics metadata) and the actual meat, or content, of the historical data (which is recorded every 15 seconds) behind the metric names. Each time a new metric is introduced to Introscope, the metadata is updated in SmartStor so that subsequent historical queries can get to the data about that metric later.

If an agent is misconfigured such that it builds a new metric for each transaction against the application (for example, a broken SQL statement normalization), or if new metrics are created for a large body of metrics that are supposed to be repeated with each restart (for example, a unique ID in the JMX or WebSphere PMI metrics collected by the agent), then the metric metadata continues to grow. If the growth is fast enough, the Enterprise Manager might shut down the agent at its metric limit (the default limit is 50,000 metrics). If the metrics leak growth is slow, however, the problem may be invisible initially. When the skeleton grows large enough, routine SmartStor operations such as querying historical data for a metric become extremely slow. Counting metrics in the agent will likely highlight the problem, but if the metric growth occurs slowly enough between agent restarts, the problem may not be visible except from the count of metrics in the metadata skeleton.

Finding a metrics leak


The most obvious symptom of a metrics leak is messages about saving metadata appearing continuously, and with extremely high latencies, in the IntroscopeEnterpriseManager.log. The typical amount of time SmartStor needs to save metadata ranges from about one tenth of a second to one half second (100 to 500 ms). The saving of metadata should generally only happen when new metrics are being added to an agent (for example, during startup or when a new portion of the application has been exercised for the first time). If an agent has been in production at length under load, SmartStor is not expected to continue to carry out the metadata save operation.


The SmartStor metadata save time is recorded and stored in the Enterprise Manager log, as shown in the log snippet below. In this case, it took 86209 ms (86 seconds) to save this piece of metadata. This long saved metadata time is a strong indication of a metrics leak problem.

[INFO] [Manager.SmartStor] Saved metadata - took 86209

Metrics for diagnosing a metrics leak


Introscope provides a number of metrics to help you perform a high-level diagnosis of a metrics leak problem. These metrics are described below.

Metric: Enterprise Manager | Data Store | SmartStor | MetaData:Metrics with Data
Description: Replaces the previous metadata metric and renames it to better convey the notion that this is the number of metrics known to SmartStor.

Metric: Enterprise Manager | Data Store | SmartStor | MetaData:Agents with Data
Description: The number of agents that the metadata knows about that have data in SmartStor.

Metric: Enterprise Manager | Data Store | SmartStor | MetaData:Agents without Data
Description: The number of agents that the metadata knows about that have no data in SmartStor.

Metric: Enterprise Manager | Data Store | SmartStor | MetaData:Partial Metrics with Data
Description: The number of partial metrics (metrics under an agent node in the Investigator) that the metadata knows about that have data in SmartStor.

Metric: Enterprise Manager | Data Store | SmartStor | MetaData:Partial Metrics without Data
Description: The number of partial metrics (metrics under an agent node in the Investigator) that the metadata knows about that have no data in SmartStor.

You will not solve your metrics leak until you identify the cause of the leaking metrics and plug it. Contact CA Wily Support if you are unsure about how to proceed with fixing your metrics leak.


Detecting metric explosions


A metric explosion happens when an agent is inadvertently set up to report more metrics than the system can handle. In this case, Introscope is bombarded with such a large number of metrics that performance gets very slow or the system cannot function at all. Contact CA Wily Technical Support if you think your Introscope system performance issues may be due to a metric explosion.

Metric explosion causes


Metric explosions can be caused by a number of factors, including:

- Broken SQL Agent normalization. See SQL statements and metric explosions on page 92.
- A large number of unique SQL statements. See How poorly written SQL statements create metric explosions on page 92.
- Sockets being opened on random ports. See Finding a metric explosion on page 85.
- JMX serverid. Metric explosion due to JMX serverid occurs when there are JMX filter strings given to WebLogic that produce metric names that include serverid=<int>, where the integer is a unique number for each WebLogic run. This can result in thousands of new metrics with each server restart. In this situation, for example, after several weeks the SmartStor metadata can contain in excess of 500 K dead metrics, although the actual metric count should have been no more than 25 K. See Metrics leak causes on page 82 and Finding a metric explosion on page 85.
- The JDBC URL is formatted into the SQL metric names (database name formatting). For more information, see Knowledgebase Article 1240.
- A URL grouping that uses the introscope.agent.urlgroup properties is not in use, so every unique URL generates a different node of metrics. See Knowledgebase Article 1112.


Finding a metric explosion


The most obvious indication of a metric explosion is poor Enterprise Manager performance with these conditions and symptoms:

- A low number of agents and a reasonable metric count, but Enterprise Manager performance is sluggish, similar to an overloaded application
- Small historical queries are extremely slow, taking many seconds or even minutes
- High CPU utilization (often above 50%)
- Disk usage is not necessarily higher than usual
- A very large number of agent metrics being generated, for example more than 7,000 metrics per JVM
- Extremely long (longer than 30 seconds) SmartStor metadata save times
- A warning in the Enterprise Manager log that the agent metrics limit has been reached, and that no more metrics will be accepted

If you have these symptoms, chances are that you have a metric explosion situation. The SmartStor metadata save time is recorded and stored in the Enterprise Manager log, as shown in the log snippet below. In this case, it took 31701 ms (about 31 seconds) to save this piece of metadata. This long metadata save time is a strong indication of a metric explosion problem.

7/13/06 09:31:08 AM PDT [INFO] [Manager.SmartStor] Saved metadata - took 31701


When it takes 30 seconds or longer to save metadata, you are probably storing a massive number of metric names (250 K or more). If you also see that you are saving the metadata often (every few minutes or less), you are leaking metrics. When too many metrics leak very quickly, the result is a metric explosion.

Investigator metrics and tab for diagnosing metric explosions


You can view several metrics and use the Enterprise Manager Overview tab to determine current and historical metrics counts. When you see excessively high metric count numbers, this indicates you have a metric explosion situation.

Metric Count metric


The Metric Count metric tracks the number of metrics that the Enterprise Manager currently thinks are live, meaning actively reporting data from a specific agent. If this value is exceptionally high, it means an agent is reporting too many metrics.


You can find this metric under the Custom Metric Agent (Virtual) node in the Investigator tree; it will look similar to this:
Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)| Agent | Metric Count

Note Before Introscope 8.0, the Metric Count metric node was located under the Agent Stats node.


You can also configure the EM Capacity Dashboard to access the current metric count and the metric count from the top five agents.

Note You must configure the EM Capacity dashboard before use, as it does not automatically contain links to underlying data. For information about creating and editing custom links, see the Introscope Configuration and Administration Guide.

Example EM Capacity dashboard configuration


To view the current metric count and Top 5 agent metric counts from the EM Capacity dashboard:

1 Choose Workstation > New Console.
2 When the Console window opens, select EM Capacity from the Dashboard: list.
3 In the Metrics graph, click a metric bar. An Investigator window opens displaying the Number of Metrics metric.
4 Return to the EM Capacity dashboard.
5 In the Agents graph, click a metric bar. An Investigator window opens displaying the metric count for one of the top 10 agents.


Historical Metric Count metric


The Historical Metric Count metric shows the total number of metrics from an agent that are either live or recently active. The Enterprise Manager uses this metric to decide whether to start clamping more metrics from the agent. If the Historical Metric Count is high while the Metric Count metric is in range, it means that the agent has too many metrics that it is intermittently reporting data on or is constantly renaming its metrics. You can find this metric under the Custom Metric Agent (Virtual) node in the Investigator tree; it will look similar to this:
Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)| Agent | Historical Metric Count.


Number of Historical Metrics metric


The Number of Historical Metrics metric returns the total number of metrics the Enterprise Manager is tracking across all agents. The true limit of the Enterprise Manager's performance is defined by this number. While there is no specific limit to the number of agents, or specific number of metrics per agent, if the combination becomes too high in total, the Enterprise Manager starts performing poorly. You can find this metric at the following location in the Investigator tree:
Custom Metric Host (Virtual)| Custom Metric Process (Virtual)| Custom Metric Agent (Virtual)(*SuperDomain*)| Enterprise Manager | Connections | Number of Historical Metrics.


Enterprise Manager Overview tab


In the Enterprise Manager Overview tab Number of Metrics graph, shown in the figure below, you can view the number of live metrics as well as historical metrics reporting to the Enterprise Manager. This can help you determine if an Enterprise Manager is experiencing metric overload. For more information, see About the Enterprise Manager Overview tab on page 27.
(Figure: the Number of Metrics graph, with one series showing Live metrics and another showing Historical metrics.)

When you hover the cursor over the data points, you can get more metric information, as shown in this Historical metrics example.


How Introscope prevents metric explosions


Introscope includes several capabilities to prevent metric explosions:

- Agent metric aging, in which unused metrics are regularly removed. This results in little or no build-up of unused metrics on the agent and Enterprise Manager. See About agent metric aging on page 91.
- SQL statement normalizers. See SQL statements and metric explosions on page 92.
- Enterprise Manager dead metric removal, in which unused metrics are regularly removed from the Enterprise Manager. See Enterprise Manager dead metric removal on page 96.
- The SmartStor metadata file is uncompressed. See SmartStor metadata files are uncompressed on page 98.
- Metric clamping. See Metric clamping on page 96.

About agent metric aging


Starting in Introscope 8.0, by default agent metric aging periodically removes dead metrics from the agent memory cache. A dead metric is a metric that has had no new data reported in a given amount of time. Agent metric aging runs on a heartbeat in the agent. During each heartbeat, a certain set of metrics is checked. The heartbeat is the time interval, in seconds, at which metrics are checked for removal. A metric is a candidate for removal if it has not received new data after a certain period of time. For more information and instructions on configuring agent metric aging properties, including the number of metrics checked each heartbeat and the metric removal time period, see the Java Agent Guide or the .NET Agent Guide.

When dead metrics are not removed, performance issues can occur due to an excessive number of metrics being sent to the Enterprise Manager. This can result in both greater CPU utilization and a slower response time as the Enterprise Manager works harder to perform its tasks. Agent metric aging can therefore improve Workstation and Enterprise Manager response times when many dead metrics would otherwise accumulate in the agent memory cache. To see the current metric count for an agent, see the Metric Count node under the Custom Metric Agent node.

Agent metric aging can also reduce performance in two ways. First, if agent metric aging happens too frequently, that is, metrics are removed and then turned on again, you may see an increase in CPU utilization and increased response times. For example, if every hour the same metrics are removed from the cache and then added back again, there is increased performance overhead due to the metrics and accumulators being cached repeatedly.


In this case, you can update your agent metric aging properties so that they use less system overhead. Update the introscope.agent.metricAging.numberTimeslices property and increase its value. In addition, avoid reporting metrics that need to be removed and then turned on again. For example, you could stop reporting a SQL statement metric that gets invoked every two hours when the associated dead metric ages out every hour.

Second, if Introscope checks too many metrics during each heartbeat, this can reduce performance. In this case, you might not see agent metrics being aged and removed; however, during each heartbeat metric review, Introscope checks metrics for possible removal, which adds performance overhead. In this case, update the introscope.agent.metricAging.dataChunk property to a lower number so that Introscope checks fewer metrics for removal during each heartbeat metric review. You can also decrease the heartbeat frequency by increasing the value of the introscope.agent.metricAging.heartbeatInterval property, so that Introscope checks for metric removal less often. For information about configuring agent metric aging properties, see the Java Agent Guide or the .NET Agent Guide.
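For illustration only (the values below are placeholders, not recommendations; see the Java Agent Guide or the .NET Agent Guide for the defaults and valid ranges), these properties are set in the IntroscopeAgent.profile:

# Placeholder values for the agent metric aging properties discussed above.
# Seconds between heartbeat reviews; a larger value means less frequent checks.
introscope.agent.metricAging.heartbeatInterval=1800
# Number of metrics checked during each heartbeat review.
introscope.agent.metricAging.dataChunk=500
# Number of time slices a metric must go without data before it is a removal candidate.
introscope.agent.metricAging.numberTimeslices=3000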

SQL statements and metric explosions


Metric explosions can be caused by a number of factors, including a large number of unique SQL statements.

How poorly written SQL statements create metric explosions


If your SQL Agent is showing a large and increasing number of unique SQL metrics even though your application uses a small set of SQL statements, the problem could be in how the SQL statements were written. In general, the number of SQL Agent metrics should approximate the number of unique SQL statements. A common reason this becomes a problem is how comments are used in SQL statements. For example, for this statement,

"/* John Doe, user ID=?, txn=? */ select * from table..."


the SQL Agent creates the following metric:

"/* John Doe, user ID=?, txn=? */ select * from table..."


Note that the comment is part of the metric name. While the comment is useful for the database administrator to see who is executing what query, the SQL Agent does not parse the comment in the SQL statement. Therefore, for each unique user ID, the SQL Agent creates a unique metric, potentially causing a metric explosion. The database that executes the SQL statements does not see these statements as unique because it ignores the comments. This problem can be avoided by putting the SQL comment in single quotes, as shown:

"/*' John Doe, user ID=?, txn=? '*/ select * from table..."
The SQL Agent then creates the following metric where the comment no longer causes a unique metric name:

"/* ? */ select * from table..."


Some applications may generate an extremely large number of unique SQL statements. If technologies like EJB 3.0 or Hibernate are in use, the likelihood of long unique SQL statements increases. For more information about Hibernate, see http://www.hibernate.org/.

Example 1
In looking in Investigator at this path under an agent node

Backends|{backendName}|SQL|{sqlType}|sql
you notice that temporary tables are being accessed like this:

SELECT * FROM TMP_123981398210381920912 WHERE ROW_ID = ?


All the additional digits on the TMP_ table name are unique and steadily growing, causing a metric explosion.

Example 2
You have been alerted to a potential metric explosion and your investigation brings you to a review of this SQL statement:

#1 INSERT INTO COMMENTS (COMMENT_ID, CARD_ID, CMMT_TYPE_ID, CMMT_STATUS_ID, CMMT_CATEGORY_ID, LOCATION_ID, CMMT_LIST_ID, COMMENTS_DSC, USER_ID, LAST_UPDATE_TS) VALUES (?, ?, ?, ?, ?, ?, ?, "CHANGE CITY FROM CARROLTON, TO CAROLTON, _ ", ?, CURRENT)
In studying the code, you notice that "CHANGE CITY FROM CARROLTON, TO CAROLTON, _ " recurs as a dizzying array of cities.


Example 3
You have been alerted to a potential metric explosion and your investigation brings you to a review of this SQL statement:

CHANGE COUNTRY FROM US TO CA _ CHANGE EMAIL ADDRESS FROM TO BRIGGIN @ COM _ "
In studying the code, you notice that CHANGE COUNTRY results in an endless list of countries. In addition, the placement of the quotes for countries results in people's e-mail addresses getting inserted into the SQL statements. This is the source of the metric explosion, as well as of other negative consequences.

SQL statement normalizers


To address many unique SQL statements, the SQL Agent includes these statement normalizers:

Normalizer Type: Default SQL statement normalizer
Description: Normalizes text within single quotation marks ('xyz').

Normalizer Type: Custom SQL statement normalizer
Description: The SQL Agent allows users to add extensions for performing custom normalization.

Normalizer Type: Regular expression SQL statement normalizer
Description: A SQL Agent extension that normalizes SQL statements based on configurable regular expressions (regex). Note: CA Wily recommends that you use this normalizer first, as it allows you to configure regular expressions and normalize any characters or sequence of characters in the SQL statement.

Normalizer Type: Command-line SQL statement normalizer
Description: If the regular expression SQL normalizer is not in use, and code includes SQL statements that enclose values in the where clause with double quotes (" "), allows a command-line command to normalize SQL statements.

For more information about working with Introscope SQL statement normalization capabilities, see the Java Agent Guide or the .NET Agent Guide. The two examples below can help you understand how to implement the regular expression SQL statement normalizer.


Example 1
Here's a SQL query before regular expression SQL statement normalization:

INSERT INTO COMMENTS (COMMENT_ID, CARD_ID, CMMT_TYPE_ID, CMMT_STATUS_ID, CMMT_CATEGORY_ID, LOCATION_ID, CMMT_LIST_ID, COMMENTS_DSC, USER_ID, LAST_UPDATE_TS) VALUES(?, ?, ?, ?, ?, ?, ?, CHANGE CITY FROM CARROLTON, TO CAROLTON, _ ", ?, CURRENT)

Here's the desired normalized SQL statement:

INSERT INTO COMMENTS (COMMENT_ID, ...) VALUES (?, ?, ?, ?, ?, ?, ?, CHANGE CITY FROM ( )

Here's the configuration needed in the IntroscopeAgent.profile file to produce the normalized SQL statement shown above:

introscope.agent.sqlagent.normalizer.extension=RegexSqlNormalizer
introscope.agent.sqlagent.normalizer.regex.matchFallThrough=true
introscope.agent.sqlagent.normalizer.regex.keys=key1,key2
introscope.agent.sqlagent.normalizer.regex.key1.pattern=(INSERT INTO COMMENTS \\(COMMENT_ID,)(.*)(VALUES.*)''(CHANGE CITY FROM \\().*(\\))
introscope.agent.sqlagent.normalizer.regex.key1.replaceAll=false
introscope.agent.sqlagent.normalizer.regex.key1.replaceFormat=$1 ...) $3 $4 $5
introscope.agent.sqlagent.normalizer.regex.key1.caseSensitive=false
introscope.agent.sqlagent.normalizer.regex.key2.pattern='[a-zA-Z1-9]+'
introscope.agent.sqlagent.normalizer.regex.key2.replaceAll=true
introscope.agent.sqlagent.normalizer.regex.key2.replaceFormat=?
introscope.agent.sqlagent.normalizer.regex.key2.caseSensitive=false

Example 2
Here's a SQL query before regular expression SQL statement normalization:

SELECT * FROM TMP_123981398210381920912 WHERE ROW_ID =

Here's the desired normalized SQL statement:

SELECT * FROM TMP_ WHERE ROW_ID =

Here's the configuration needed in the IntroscopeAgent.profile file to produce the normalized SQL statement shown above:
introscope.agent.sqlagent.normalizer.extension=RegexSqlNormalizer


introscope.agent.sqlagent.normalizer.regex.matchFallThrough=true
introscope.agent.sqlagent.normalizer.regex.keys=key1
introscope.agent.sqlagent.normalizer.regex.key1.pattern=(TMP_)[1-9]*
introscope.agent.sqlagent.normalizer.regex.key1.replaceAll=false
introscope.agent.sqlagent.normalizer.regex.key1.replaceFormat=$1
introscope.agent.sqlagent.normalizer.regex.key1.caseSensitive=false

Enterprise Manager dead metric removal


Starting with Introscope 8.0, when a metric has not produced data for more than eight minutes (the default), it is removed from the Investigator tree. The Enterprise Manager metrics removal reduces the number of potential metric explosions, and also results in improved performance. You may notice the following performance improvements due to dead metric removal:
* Reduced Enterprise Manager RAM consumption
* Increased responsiveness of the Enterprise Manager while tracking live metrics

Metric clamping
Several properties that limit, or clamp, the number of metrics on the agent and the Enterprise Manager help to prevent spikes in the number of reported metrics (metric explosions) on the Enterprise Manager. No new metrics are displayed in the Workstation after a clamp has occurred. Metric clamping is enabled through four new properties.

Note: The default values for each property are used by the Enterprise Manager if the line for the property is commented out in the IntroscopeEnterpriseManager.properties file.

introscope.enterprisemanager.agent.metrics.limit
Limits the number of live and historical metrics an agent will report. The default is 50,000.

introscope.enterprisemanager.metrics.live.limit
Limits the number of live metrics reporting from agents per Enterprise Manager. The default is 500,000.

introscope.enterprisemanager.metrics.historical.limit
Limits the total metrics (both live and historical) per Enterprise Manager. The default is 1,200,000.

introscope.enterprisemanager.query.datapointlimit
Limits the maximum number of metric data points each Collector or standalone Enterprise Manager returns from any one query. The clamp is per query, not across all concurrent queries. Queries to the MOM are only indirectly clamped by the data limit on each Collector. The default is 0 (no limit).

Note: The value you choose for this property depends on a number of factors, such as heap size, the total number of metrics on the Enterprise Manager, and whether the Enterprise Manager is a MOM or a Collector. Determine the limit based on the load and hardware in your Introscope environment.

For more information about these properties, see the Introscope Configuration and Administration Guide. When the Enterprise Manager starts up, the values of these properties are logged. When an Enterprise Manager hits a clamp value based on the total number of metrics it can process, or when an agent hits the agent clamp, a log message appears in the Enterprise Manager log. If clamping is no longer necessary due to a change in the limits, another log message is written to the Enterprise Manager log. All supported agents obey these clamps; however, the custom metric agent and agent clusters (virtual agents) are not subject to the clamps.
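If you prefer to make the clamp values explicit rather than relying on the commented-out defaults, you could add entries such as the following to the IntroscopeEnterpriseManager.properties file. This is only an illustration using the documented default values; choose your own limits based on the load and hardware in your environment.

introscope.enterprisemanager.agent.metrics.limit=50000
introscope.enterprisemanager.metrics.live.limit=500000
introscope.enterprisemanager.metrics.historical.limit=1200000
introscope.enterprisemanager.query.datapointlimit=0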


Metric clamp supportability metrics


The Metric Count metric, an Enterprise Manager supportability metric seen in the Investigator tree under the Enterprise Manager node for each agent, still reports live metrics only (as opposed to both live and historical metrics). For more information, see Metric Count metric on page 85. In addition, under the Enterprise Manager node for each agent, there is a new Is Clamped supportability metric that has a value of 0 when an agent is not clamped and 1 when it is clamped.

Metric clamp scenario


Here's a scenario describing what would happen if you set both the Enterprise Manager and agent metric clamps. Let's say that in the IntroscopeEnterpriseManager.properties file you set the following values for these properties:

introscope.enterprisemanager.agent.metrics.limit=10000
introscope.enterprisemanager.metrics.live.limit=800
Then you start the Enterprise Manager and two agents. You'd see that the Enterprise Manager gets clamped when 800 metrics have been reported, even though the agent clamp of 10,000 metrics has not yet been reached. This means that no new metrics from the agents are reported. In addition, the agent logs state that the Enterprise Manager clamp has been reached and no more metrics will be reported to the Enterprise Manager. If you increase the Enterprise Manager clamp value, you'd see that new metrics from the agents start to be reported.

SmartStor metadata files are uncompressed


Before Introscope 8.0, SmartStor stored metadata using built-in Java compression. Starting in Introscope 8.0, to increase SmartStor's speed in reading stored metadata files, all new metadata files are written in an uncompressed format. However, Introscope 8.0 still retains the capability of reading compressed data generated by previous versions. The uncompressed metadata files are bigger than the compressed files; however, they are still extremely small when compared to the metric data (.data) files. The amount of disk space used by the metadata files is about eight to ten times greater than with the compressed format, but not using compression speeds up SmartStor access time by a factor of five.


CHAPTER

Workstation and WebView Requirements and Recommendations

This chapter provides background and specifics to help you understand sizing and performance-related Workstation and WebView requirements, settings, and limits for your Introscope system. In this chapter you'll find the following topics:

Workstation and WebView background . . . 100
8.0 Workstation and WebView requirements . . . 100
    OS RAM requirements for Workstations running in parallel . . . 100
    WebView and Enterprise Manager hosting requirement . . . 100
8.0 Workstation and WebView setup, settings, and capacity . . . 101
    Workstation to standalone EM connection capacity . . . 101
    Workstation to MOM connection capacity . . . 102
    WebView server capacity . . . 103
    WebView server guidelines . . . 103
    Top N graph metrics limit per Workstation . . . 103


Workstation and WebView background


You control Introscope and access performance metrics through the Introscope Workstation. You can set alerts for individual metrics or logical metric groups, view performance metrics, and customize views to represent your unique environment.

Introscope WebView presents Introscope's customizable dashboards and Investigator tree views to authorized users in a browser interface so that critical information can be viewed without the aid of the Workstation. To a MOM, WebView just looks like another Workstation client; from the MOM's point of view it's just another busy Workstation.

The Workstation in Introscope 8.0 uses less memory on average than in 7.x releases. It operates within the heap footprint specified in the Workstation.lax file. For information about the Workstation.lax file, see the Introscope Configuration and Administration Guide. This applies to the Command Line Workstation (CLW) as well. For CLW queries that do not return large amounts of data, the 128 MB heap size specified in the Introscope Configuration and Administration Guide is adequate. However, larger queries require setting the heap size to 256 MB or greater.

Workstation sluggishness or unresponsiveness is rarely caused by a problem in the Workstation or MOM. It is usually caused by a single unresponsive Collector, which propagates to the MOM and then the Workstation. For more information, see MOM to Collectors connection limits on page 59.

8.0 Workstation and WebView requirements


The topics below describe Workstation and WebView-related basic requirements.

OS RAM requirements for Workstations running in parallel


To run multiple Workstations on the same machine, the OS must have physical RAM for each Workstation running in parallel, above the memory required for the OS itself. For example, in order to run three Workstations on a single Windows machine, the machine must have 512 MB + (3 x 256 MB) = 1.25 GB of physical memory (minimum).

WebView and Enterprise Manager hosting requirement


To avoid contention for CPU resources, WebView should not run on the same host as the Enterprise Manager.


8.0 Workstation and WebView setup, settings, and capacity


The topics below describe Workstation and WebView-related settings and capacity limits required to set up, maintain, and configure your Introscope 8.0 environment. WebView generates a performance load roughly equal to that of three Workstations. Take this into account when calculating the total number of Workstations and WebView instances you deploy in your Introscope environment.

Workstation to standalone EM connection capacity


A standalone Enterprise Manager should not have more than the number of connected Workstations shown in the Sample Introscope 8.0 Collector sizing limits table on page 119. Each standalone EM can have multiple channels to provide data to and from Workstations and WebView instances.


Workstation to MOM connection capacity


A MOM should not have more than the number of Workstation connections shown in the Sample Introscope 8.0 MOM sizing limits table on page 122. In a clustered situation, each Collector has one channel of data flow to the MOM. Collectors, when connected to a MOM, should not have any direct Workstation or WebView connections. Instead, all Workstation and WebView connections should be made to the MOM.

Important: Although in a MOM environment data collection is spread across a number of Collectors, there is a case where Workstation performance problems can occur in a clustered environment. This happens if all the Workstation connections involve active users, and all their queries are based on data coming from a single Collector. In that case, the users may experience sluggish performance due to the Collector's own internal limitations on simultaneous historical queries.


WebView server capacity


A single WebView server instance cannot serve more than 10 to 15 concurrent users or 25 passive users. A passive user is someone who issues a query, then walks away from the browser window or doesn't close the window when finished. In this case, the Enterprise Manager must keep sending data to the browser, which refreshes the web page every 15 seconds whether or not anyone actually needs or uses the data. Exceeding the numbers stated above generally results in slower response times for all browser clients. If your user requirement is larger, use additional co-located WebView instances.

WebView server guidelines


WebView servers do not require a dedicated I/O subsystem. The CPU resource load is primarily dependent on the activity of each WebView server instance. The activity includes the concurrent user count and the number of dashboards that can be viewed through the WebView server instance. The more dashboards there are, the more potential metrics have to be requested from the Enterprise Manager and processed to create graphs, and so on. Dashboards with large metric counts (over 1,000) slow down WebView processing.

Example WebView server configuration


A 4-way 3 GHz AMD Opteron or Intel Xeon with 8 GB RAM server running Linux can be used to host multiple WebView server instances on the same machine. This hardware should be able to house about three WebView servers.

Top N graph metrics limit per Workstation


Top N is a way to qualify a graph on an Introscope dashboard so that only the Top N (where you pick the N) metrics show up. It's a way to further filter data based on the actual content of the data. For example, you can set up a metric group that matches all servlets. Say there are 100,000 servlets in your system. On a dashboard, you have a graph display to show the top five slowest servlets. The Enterprise Manager has to subscribe to and process the data for all 100,000 servlets in order to determine the top five slowest. That's why processing Top N graphs is resource expensive for the Enterprise Manager.


At all times, the sum of all metrics (metrics and metric groupings) for every Top N graph viewed by every Workstation instance (all Workstations in total) should not exceed 100,000 metrics. Try to use Top N sparingly, because whenever a Top N request is made, all the data is provided in real time, which puts a large resource demand on your Introscope system. When Top N is used, have as few viewers as possible actively view Top N graphs.

If at a single moment in time Introscope users are actively viewing dashboards and graphs representing more than 100,000 metrics, performance problems can occur. For example, dashboards can have very slow refresh times. This can occur when a number of users log in at the same time to view a dashboard containing a Top N graph.

For example, imagine that there are ten dashboards defined in a system, and two of the ten dashboards include 10 graphs that are Top N graphs. The other eight dashboards have 10 standard (not Top N) graphs. Let's say that each of the ten Top N graphs has a metric grouping that matches 1,000 metrics. This means a total of 10,000 metrics is requested when a dashboard containing the Top N graphs is displayed. Now imagine that 10 Introscope users at different machines log in and all look at one of the dashboards containing the Top N graphs at the same time. This requires the system to request and handle 10,000 metrics x 10 user instances = 100,000 metrics requested at once. In this situation, it's highly likely the users would experience slow Workstation performance as they click on the dashboard elements.


CHAPTER

Agent Requirements and Recommendations

This chapter provides background and specifics to help you understand sizing and performance-related agent requirements, settings, and limits for your Introscope system. In this chapter you'll find the following topics:

Agent background . . . 106
Agent sizing setup, settings, and capacity . . . 107
    Agent metrics reporting limit . . . 107
    Transaction Trace component clamp . . . 108
    Agent maximum load when disabling Boundary Blame . . . 109
    Configuring agent heuristics subsets . . . 109
    Virtual agent metrics match limits . . . 109
    Virtual agent reported applications capacity . . . 110
    Agents limits per Collector . . . 110
    Agent heap sizing . . . 110
    High agent CPU overhead from deep nested front-end transactions . . . 111
    Dynamic instrumentation . . . 112


Agent background
In an Introscope deployment, the agent collects application and environmental metrics and relays them to the Enterprise Manager. Agent features that affect overhead are Boundary Blame, Transaction Trace sampling, and URL normalization.

The agent allows Introscope to collect minute details about how your applications are performing. What types of data the agent collects depends on which ProbeBuilder Directive (PBD) files you choose to implement. Several standard PBDs are included when you install the Java or .NET agent, as well as specific PBDs for your application server.

The instrumenting process is performed using CA Wily's ProbeBuilding technology, in which tracers, defined in ProbeBuilder Directive (.pbd) files, identify the metrics an agent will gather from applications and the JVM at run-time. ProbeBuilder Directive (.pbd) files tell the ProbeBuilder how to add Probes, such as timers and counters, to .NET or Java components that Introscope-enable the application. ProbeBuilder Directive files govern what metrics agents report to the Introscope Enterprise Manager. Custom directives can also be created to track classes and methods unique to specific applications.

About virtual agents


You can configure multiple physical agents into a single virtual agent, which enables an aggregated, logical view of the metrics reported by multiple agents. A virtual agent is useful if you manage clustered applications with Introscope. A virtual agent composed of the agents that monitor different instances of the same clustered application appears in Introscope as a single agent. This allows metrics from multiple instances of a clustered application to be presented at a logical, application level, as opposed to separately for each application instance. For more information about virtual agents, see the Introscope Java Agent Guide.

You can check the total number of metrics matched by virtual agents by navigating to the following point in the Investigator tree:
SuperDomain*|Custom Metric Host (Virtual)|Custom Metric Process (Virtual)|Custom Metric Agent (Virtual)|Agents|Custom Metric Host (Virtual)|Custom Metric Process (Virtual)

Each of the virtual agents has a metric count. Sum all of these counts to determine the total number of metrics matched.


Virtual agents are a significant drain on the CPU. For example, a 1500-metric virtual agent can result in a 10% increase in CPU usage. If the recommended number of metrics matched by the virtual agents is exceeded, there is a significant impact on the CPU. There is some trade-off between the total number of applications (baselined heuristics) and virtual agents, since they are both dependent exclusively on CPU resources. In general, if the total number of monitored applications is significantly less than the limit, the metric match limit for virtual agents can be increased. However, the metric match limit for virtual agents should never exceed 150% of the limit set in the guidelines. A virtual agent deployed on a MOM only creates load on the Collectors, which do the aggregation and pass the result back to the MOM.

Note: The Collector does most of the work in performing the calculations needed for virtual agents; the MOM does not perform the calculations.

Agent sizing setup, settings, and capacity


The topics below describe agent-related settings and capacity limits required to set up, maintain, or configure your Introscope 8.0 environment.

Agent metrics reporting limit


An agent should not report more than 15,000 metrics. To view a pie chart and table showing the metric counts for an agent, see About the Metric Count tab, below. An inappropriately configured agent can create thousands of metrics in quick succession and overload the Enterprise Manager. To prevent this, the Enterprise Manager uses a metric clamp. For information about metric clamping, see Metric clamping on page 96.

About the Metric Count tab


By viewing the Metric Count tab you can assess the number and distribution of agent metrics in one centralized location.

To view the Metric Count tab:

1 Select any node under the agent node.
2 Click the Metric Count tab in the right pane.

Study the pie chart and table of Resources metric count data. Mouse over an area of the pie chart to display a tool tip with the metric count and percentage.


Transaction Trace component clamp


In the case of an infinitely expanding transaction (for example, when a servlet executes hundreds of object interactions and back-end SQL calls), Introscope clamps the Transaction Trace, resulting in a truncated trace. This helps prevent the JVM from running out of memory. By default, the Transaction Trace component clamp limits the Transaction Trace to 5,000 components. When this limit is reached, warnings appear in the log, and the trace stops. In addition, clamped Transaction Traces are marked as truncated in the Workstation Transaction Trace Viewer. See the Introscope Workstation User Guide.

You can change the Transaction Trace component clamp value in the introscope.agent.transactiontrace.componentCountClamp property, which is found in the IntroscopeAgent.profile file. See the Introscope Configuration and Administration Guide or the Java Agent Guide.

WARNING: If the Transaction Trace component clamp size is increased, the memory required for Transaction Traces may increase. Therefore, the maximum heap size for the JVM may need to be adjusted accordingly, or else the managed application may run out of memory. See Agent heap sizing on page 110 and the Introscope Configuration and Administration Guide.
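As a point of reference, the clamp is controlled by a single property in the IntroscopeAgent.profile file. The value shown below is simply the documented default; treat it as an illustration and weigh any increase against the heap warning above.

# Default Transaction Trace component clamp; raising this value may require a larger maximum JVM heap.
introscope.agent.transactiontrace.componentCountClamp=5000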


Agent maximum load when disabling Boundary Blame


If you disable Boundary Blame on your 7.x or 8.0 agents, the agents will generate more metrics than before Boundary Blame was disabled. For example, a system that generates 200,000 total metrics from Boundary Blame-enabled agents may generate 300,000 metrics after disabling Boundary Blame. While the resulting metrics may not incur the same processing cost per metric that Boundary Blame metrics do (Boundary Blame metrics incur additional baseline calculation overhead in the Enterprise Manager, while non-Boundary Blame metrics do not), be careful to ensure that the increased metric load is redistributed so that the Enterprise Manager does not exceed its maximum metrics limit.

Important: If in your Introscope system a 6.x agent is connected to a 7.x or 8.0 Enterprise Manager, agent Boundary Blame is not enabled, because 6.x agents weren't capable of that feature. In this case, the maximum number of agents connected to the 7.x/8.0 Enterprise Manager, regardless of agent version, must adhere to the 6.x Enterprise Manager maximum agent limits. For more information about Boundary Blame, see the Introscope Workstation User Guide.

Configuring agent heuristics subsets


You can alter the property

introscope.enterprisemanager.heuristics.agentspecifier=.*
which is a regular expression that matches the agents for which heuristics are enabled. The default .* matches all agent names. Limiting this property to a subset of agents you are interested in can improve performance, largely without limiting the ability to analyze the Enterprise Manager. For more information, see the Introscope Configuration and Administration Guide.
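For example, here is a hedged sketch of restricting heuristics to a subset of agents. The agent names OrderAgent and BillingAgent are hypothetical placeholders, not defaults; substitute a regular expression that matches the agent names you actually care about.

# Hypothetical example: enable heuristics only for agents whose names contain OrderAgent or BillingAgent.
introscope.enterprisemanager.heuristics.agentspecifier=.*(OrderAgent|BillingAgent).*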

Virtual agent metrics match limits


Given the very high impact of even a small virtual agent cluster, CA Wily recommends that for high-load Collectors, the total number of matched metrics in virtual agents be less than what's shown in the Sample Introscope 8.0 Collector sizing limits table on page 119 for a fully-loaded system. If your system isn't fully loaded (metrics or applications), you can have a higher number of matched metrics.


Virtual agent reported applications capacity


If the number of reported applications is significantly lower than the capacity limit for your platform, there should be enough CPU resources to increase this number. However, CA Wily does not recommend increasing this number by more than 50%.

Agents limits per Collector


The recommended number of agents per Collector is hardware dependent, as shown in the Sample Introscope 8.0 Collector sizing limits table on page 119. If Introscope 6.0 agents are connected to an 8.0 Collector, the maximum number of agents should be kept at the Introscope 6.0 Enterprise Manager limits. For more information, see Agent maximum load when disabling Boundary Blame on page 109. In Introscope 8.0, the MOM more effectively balances the metric load between Collectors in a clustered environment. To configure agent load balancing, see the Introscope Configuration and Administration Guide. To understand how agent load balancing affects Introscope performance, see Agent load balancing on MOM-Collector systems on page 63.

Agent heap sizing


You can view the agent GC heap usage in the GC Heap overview. See the Introscope Workstation User Guide. The agent uses Java heap memory to store collected data. If your application's heap is highly utilized, you may need to increase the heap allocation when you install the agent. See the Introscope Configuration and Administration Guide. The 8.0 agent, on average, uses slightly more memory than the 7.x agent because of performance improvements to CPU and response time overhead. You may see an increase of up to 100 MB over your 7.x average runtime heap usage. If monitored applications are characterized by very deep or long-lasting transactions, the agent's Transaction Trace sampling may require more heap memory than in previous Introscope versions. See Transaction Trace component clamp on page 108. If you are operating a high-performance Introscope environment, contact CA Wily Professional Services for the appropriate agent JVM heap settings.
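As a generic illustration only (the exact startup script, agent options, and appropriate values depend on your application server and environment, and are not shown here), increasing the maximum heap of a managed Java application usually means raising the -Xmx value in its launch command:

# Hypothetical example: raise the managed application's maximum heap to 1.5 GB.
java -Xms512m -Xmx1536m -jar yourApplication.jar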


High agent CPU overhead from deep nested front-end transactions


Servlets are configured by Introscope to be seen as front-ends. A typical transaction starts with a servlet, which may call an EJB, which calls a back-end. It's possible for servlets to call other servlets in a nested way, which Introscope sees as nested front-ends. In most cases, this does not add to agent CPU overhead. However, deep transactions having nested front-end levels (for example, 40 levels deep) may result in high CPU overhead. If this occurs, your Transaction Trace Tree View may look like this:

Notice the recurring servlet calls. In this case, a servlet keeps calling itself, resulting in a 2125 ms transaction time for this deep nested transaction.

This is just one example of a servlet continuously calling itself (a recurring call). The same problem can occur when a servlet continuously calls other servlets. In either case, you may see an increase in agent CPU overhead. If the overhead is unacceptable, contact CA Wily Technical Support.


Dynamic instrumentation
Introscope uses dynamic instrumentation (also called dynamic ProbeBuilding) to implement new and changed PBDs without restarting managed applications or the Introscope agent. This is useful for making corrections to PBDs, or for temporarily changing data collection levels during triage or diagnosis without interrupting application service. For more information about dynamic instrumentation, see the Java Agent Guide or the .NET Agent Guide.

Dynamic instrumentation affects CPU, memory, and disk utilization. This is because dynamic instrumentation includes redefining the monitored classes, which is a resource-intensive process. To avoid performance problems after you enable dynamic instrumentation, CA Wily highly recommends that you:
* Use configuration to minimize the classes that are being redefined (see the Java Agent Guide or the .NET Agent Guide).
* Change PBDs incrementally (don't change a large number at one time).
* Do not change a large number of PBDs that affect many classes.


APPENDIX

Introscope 8.0 Sizing and Performance FAQs

Frequently asked questions about Introscope sizing and performance are listed in the table below. Typical answers or solutions to each question are provided, with the most common being provided first, the second most common listed second, and so on. Question Most Common Answers/Solutions
General Performance Questions Can I handle the same number of metrics that I used to in 7.x versions of Introscope? What about 6.x versions? If you are upgrading from 7.x to 8.0, then number of metrics that Introscope 8.0 can handle is double then 7.2 limits. So, for example, if a given 7.x system used to handle 250 K metrics, that limit is now 500 K without requiring any changes to the hardware. For more information, see 8.0 metrics setup, settings, and capacity on page 79 and Virtual agent metrics match limits on page 80. My Collector is at maximum recommended capacity. I'm looking at the CPU, and the system doesn't appear busy. Why can't I add more metrics or agents to this Collector? CPU monitoring tools show a snapshot. The behavior of the Collector is 100% CPU usage for 3-4 seconds (at full load), and then idle until the next metric harvest from the agents. This happens every 7.5 seconds, which is how CA Wily arrives at the 45% average CPU utilization recommendation. For more information, see Collector metric capacity and CPU usage on page 45.


Question: What were the Introscope 8.0 sizing and performance improvements?
Answer: Significant scalability and performance improvements were made, including the following:
1 In Introscope 8.0, SmartStor improvements resulted in the Collector metric limits doubling from the Introscope 7.x limits (based on the same hardware). For examples, see the Sample Introscope 8.0 Collector sizing limits table on page 119.
2 Significant improvements to the MOM allow each MOM to connect to a five million metric cluster (10 Collectors, 500 K metrics per Collector), which is a five-fold increase in clustered Enterprise Manager scale. For examples, see the Sample Introscope 8.0 MOM sizing limits table on page 122.
3 Adding an additional 2 CPUs to a Collector to make a total of 4 CPUs helps increase these limits:
* number of applications per Collector
* number of agents per Collector
* number of metrics that can be placed in metric groupings (if using a standalone Enterprise Manager)
For more information, see Increasing Collector capacity with more and faster CPUs on page 73.
4 Support for 50 concurrent Workstation connections. For examples, see the Sample Introscope 8.0 Collector sizing limits table on page 119.
Note: The limits may differ substantially depending on the specific platform and hardware used in your environment.


Component-related Questions

Question: My Collector is combining time slices throughout the day and appears to respond slowly, but I'm at or below the maximum capacity limits. What could be wrong?
Answer:
1 Other processes are running on the machine.
2 I/O contention with SmartStor and other processes. SmartStor is not located on a separate disk or I/O subsystem.
3 Poorly configured virtual agent.
4 Large Transaction Traces are running continuously.
For more information, see Reasons Collectors combine slices on page 72.

Question: What hardware is required to run the Collector at maximum load?
Answer:
1 This is primarily dependent on the CPU speed and dedicated disk I/O subsystem. For more information, see Collector hardware requirements on page 71.
2 See examples of the appropriate hardware platform, OS, and CPU. For more information, see the Sample Introscope 8.0 Collector sizing limits table on page 119.

Question: Does the Workstation use more memory than previous releases?
Answer: No, the Workstation uses less memory on average. It operates within the heap footprint specified in the .lax file. For more information, see Workstation and WebView background on page 100.

Question: Can I run multiple Workstations on the same machine?
Answer: Yes. However, you must be certain that the OS has dedicated physical RAM for each Workstation running in parallel, above the memory required for the OS itself. For more information, see OS RAM requirements for Workstations running in parallel on page 100.

Question: Can I run multiple Enterprise Managers on the same machine?
Answer: Yes. However, you must be certain to follow the CA Wily requirements when setting this up. For more information, see Running multiple Collectors on one machine on page 74.


Question: I launched my MOM and logged in, but I'm not seeing any metrics in the Investigator tree for a long time. Why does the MOM take a long time to begin sending data?
Answer: Large numbers of metric alerts in individual Collectors will cause a great deal of overhead in the MOM as the Collectors register these alerts in the MOM at startup. If the startup time is unacceptable, you will have to reduce the number of alerted metrics, or get a machine with faster individual CPUs. For more information, see About alerted metrics and slow Workstation startup on page 81.

Question: Can I connect more agents to an 8.x Collector than a 6.x or 7.x Collector?
Answer: Yes, but only if there are 4 CPUs/cores per EM available. See the Sample Introscope 8.0 Collector sizing limits table on page 119 for some examples. For more information, see Agents limits per Collector on page 110.

Question: Why can't my MOM connect to more than 10 Collectors?
Answer: The more Collectors that the MOM connects to, the more complicated the system becomes and the greater the likelihood of instability or failure. For example, clock sync issues may be more difficult to manage, the system can take longer to start, and there's a higher likelihood that a misbehaving Collector can affect the entire cluster. For more information, see MOM to Collectors connection limits on page 59.

Question: Why is it so important to ensure that every Collector is running smoothly?
Answer: Any individual Collector can cause the entire system to appear slow and lock up, due to the synchronous mechanism the MOM uses to poll information from Collectors. For more information, see Collector to MOM clock drift limit on page 71.

Question: The requirements state that I can have X metrics in the virtual agents. Can I exceed that number?
Answer: Yes; however, this impacts the CPU significantly (not I/O or memory), so you must decrease the Collector's capacity.
Note: The Collector does the processing work needed for virtual agent operations, not the MOM. For more information, see Agent background on page 106.


Question: Will additional dedicated physical CPUs increase the number of metrics and agents that my Collector can handle?
Answer: Adding an additional 2 CPUs to a Collector to make a total of 4 CPUs helps increase these limits:
* number of applications per Collector
* number of agents per Collector
* number of metrics that can be placed in metric groupings (if using a standalone Enterprise Manager)
In addition, faster CPUs may help increase the Collector's maximum capacity and improve performance. For more information, see Collector hardware requirements on page 71 and the examples in the Sample Introscope 8.0 Collector sizing limits table on page 119.

Question: My system has 16 SPARC CPUs. Why can't a single Collector on this platform handle any more load than a 4 CPU Xeon machine?
Answer: Although the Collector is heavily multithreaded, there are certain operations that require synchronization and cannot effectively leverage more than 4 CPUs. The Collector, therefore, does not scale well with additional CPUs beyond 4, depending on the hardware platform. Individual processor speed is the most important success factor for a Collector. For more information, see Increasing Collector capacity with more and faster CPUs on page 73.

Question: What are the main performance considerations for the MOM?
Answer: The MOM requires more powerful CPUs and better network connections than Collectors, but does not require fast disk access (the MOM performs little disk I/O). For more information, see MOM disk subsystem sizing requirements on page 58.

Question: I changed the virtual agent definitions in my MOM/Collector and everything came to a halt. What happened?
Answer: Hot deployment of virtual agents and Management Modules is very CPU intensive and can lock up the MOM for a couple of minutes, during which metrics harvesting doesn't happen. CA Wily strongly recommends not performing Management Module hot deployments on production Collectors and MOMs. For more information, see Avoid Management Module hot deployments on page 68.
Note: In a clustered environment, deploy Management Modules and virtual agents only on the MOM, not on a Collector.


Question: Do the Collectors and MOM have to be on the same subnet?
Answer: For best Workstation responsiveness, when a MOM requests data from a Collector, the round-trip response must be less than 500 ms. Whenever possible, a MOM and its Collectors should be in the same data center, preferably in the same subnet. For more information, see Local network requirement for MOM and Collectors on page 51.

Question: What is the limit for ChangeDetector events, Transaction Traces, errors, stall events, and so on? How do I determine that limit? Is that for each?
Answer: The Collector effectively treats all of these as event objects. As of Introscope 7.1, the Maximum Number of Events limit represents the total number of events a Collector can receive and persist from all agents. There is one limit for steady-state event persistence and another for burst capacity. Steady state means 24/7. Burst capacity means that the Collector can sustain this load for no more than a couple of hours. For more information, see Collector events limits on page 70.


APPENDIX

Sample Introscope 8.0 Collector and MOM Sizing Limits by OS

Sample Introscope 8.0 Collector sizing limits table


The following table shows example Introscope environment configurations and sizing requirements by operating system. This should help you understand the various components and requirements you'll need to consider if you are deploying a Collector, either as a standalone machine or in a clustered Introscope environment.

Important: The machine configurations shown below are only examples; they are NOT provided as the only recommended Collectors for any or all Introscope environments.

Note: * The Maximum Number of Metrics numbers in the table include the metrics reported by agents as well as the metrics created by calculators and virtual agents, which should be a reasonably small number. You can find the total number reported as Enterprise Manager | Connection: Number of Metrics in the Enterprise Manager supportability metrics. See also Additional supportability metrics on page 38.

Operating system: Solaris
Hardware: 2 CPU UltraSPARC III, clock speed ~1.2 GHz
Physical RAM: 4 GB
JVM heap: 1.5 GB
Max # agents/EM: 200
Max # metrics*: 400,000
Max # applications/EM: 900
Max # virtual agent matched metrics: 700
Max # events/minute (steady state / burst): 3500 / 3000
Max # Workstation connections per standalone EM: 20
Max metrics in metric groupings per standalone EM: 15% of Max # metrics (60,000)

Operating system: Solaris
Hardware: 4 CPU UltraSPARC III, clock speed ~1.2 GHz
Physical RAM: 4 GB
JVM heap: 1.5 GB
Max # agents/EM: 250
Max # metrics*: 400,000
Max # applications/EM: 1800
Max # virtual agent matched metrics: 700
Max # events/minute (steady state / burst): 3500 / 3000
Max # Workstation connections per standalone EM: 40
Max metrics in metric groupings per standalone EM: 30% of Max # metrics (120,000)

Operating system: Solaris
Hardware: 4 CPU UltraSPARC III, clock speed ~1.2 GHz
Physical RAM: 8 GB
JVM heap: 1.5 GB
Max # agents/EM: 200
Max # metrics*: 400,000
Max # applications/EM: 900
Max # virtual agent matched metrics: 700
Max # events/minute (steady state / burst): 3500 / 3000
Max # Workstation connections per standalone EM: 20
Max metrics in metric groupings per standalone EM: 15% of Max # metrics (60,000)

Operating system: Red Hat Linux
Hardware: 2 CPU Xeon or Opteron, clock speed ~3 GHz
Physical RAM: 4 GB
JVM heap: 1.5 GB
Max # agents/EM: 300
Max # metrics*: 500,000
Max # applications/EM: 1500
Max # virtual agent matched metrics: 1000
Max # events/minute (steady state / burst): 5000 / 5000
Max # Workstation connections per standalone EM: 50
Max metrics in metric groupings per standalone EM: 15% of Max # metrics (75,000)

Operating system: AIX 5.3
Hardware: 4 CPU Power 5, clock speed ~2.2 GHz
Physical RAM: 4 GB
JVM heap: 1.5 GB
Max # agents/EM: 250
Max # metrics*: 400,000
Max # applications/EM: 1500
Max # virtual agent matched metrics: 850
Max # events/minute (steady state / burst): 4000 / 3500
Max # Workstation connections per standalone EM: 50
Max metrics in metric groupings per standalone EM: 15% of Max # metrics (60,000)

Operating system: Windows 2000/2003
Hardware: 2 CPU Xeon or Opteron, clock speed ~3 GHz
Physical RAM: 4 GB
JVM heap: 1.5 GB
Max # agents/EM: 300
Max # metrics*: 500,000
Max # applications/EM: 1500
Max # virtual agent matched metrics: 1000
Max # events/minute (steady state / burst): 5000 / 5000
Max # Workstation connections per standalone EM: 50
Max metrics in metric groupings per standalone EM: 15% of Max # metrics (75,000)

Operating system: Windows 2000/2003
Hardware: 4 CPU Xeon or Opteron, clock speed ~3 GHz
Physical RAM: 4 GB
JVM heap: 1.5 GB
Max # agents/EM: 400
Max # metrics*: 500,000
Max # applications/EM: 3000
Max # virtual agent matched metrics: 1000
Max # events/minute (steady state / burst): 5000 / 5000
Max # Workstation connections per standalone EM: 50
Max metrics in metric groupings per standalone EM: 30% of Max # metrics (150,000)

Operating system: Windows 2000/2003
Hardware: 4 CPU Xeon or Opteron, clock speed ~3 GHz
Physical RAM: 8 GB
JVM heap: 1.5 GB
Max # agents/EM: 300
Max # metrics*: 500,000
Max # applications/EM: 1500
Max # virtual agent matched metrics: 1000
Max # events/minute (steady state / burst): 5000 / 5000
Max # Workstation connections per standalone EM: 50
Max # EMs/machine: 2
Max metrics in metric groupings per standalone EM: 15% of Max # metrics (75,000)


Sample Introscope 8.0 MOM sizing limits table


The following table shows example Introscope environment configurations and sizing requirements by operating system. This should help you understand the various components and requirements you'll need to consider if you are deploying a MOM in a clustered Introscope environment.

Important: The machine configurations shown below are only examples; they are NOT provided as the only recommended MOM machines for any or all Introscope environments.
Operating system: Solaris
Hardware: 2 CPU UltraSPARC III, clock speed ~1.2 GHz
Physical RAM: 14 GB
JVM heap: 12 GB
Maximum # metrics with associated calculators and alerts per MOM-Collector cluster: 250,000
Maximum # Workstation connections per MOM-Collector cluster: 20

Operating system: Red Hat Linux
Hardware: 4 CPU Xeon or Opteron, clock speed ~3 GHz
Physical RAM: 14 GB
JVM heap: 12 GB
Maximum # metrics with associated calculators and alerts per MOM-Collector cluster: 1,000,000
Maximum # Workstation connections per MOM-Collector cluster: 50

Operating system: AIX 5.3
Hardware: 4 CPU Power 5, clock speed ~2.2 GHz
Physical RAM: 14 GB
JVM heap: 12 GB
Maximum # metrics with associated calculators and alerts per MOM-Collector cluster: 500,000
Maximum # Workstation connections per MOM-Collector cluster: 50

Operating system: Windows 2000/2003
Hardware: 2 CPU Xeon or Opteron, clock speed ~3 GHz
Physical RAM: 14 GB
JVM heap: 12 GB
Maximum # metrics with associated calculators and alerts per MOM-Collector cluster: 500,000
Maximum # Workstation connections per MOM-Collector cluster: 25

Operating system: Windows 2000/2003
Hardware: 4 CPU Xeon or Opteron, clock speed ~3 GHz
Physical RAM: 14 GB
JVM heap: 12 GB
Maximum # metrics with associated calculators and alerts per MOM-Collector cluster: 1,000,000
Maximum # Workstation connections per MOM-Collector cluster: 50


Index
Symbols
*, See wildcard symbol

business logic, Introscope handled by 78 monitors 79

C
calculators, and slow cluster start-up time 60 causes of slow start-up time, in cluster 60 clamp metrics See metrics clamp Transaction Trace component 108 clock drift, performance problems due to 71 cluster agent load balancing examples 66 applications and virtual agents 106 cause of slow start-up time 60 configuring to support 1 million metrics on MOM 61 tolerance for imbalance 65 determining when to implement if using a standalone EM 70 environment, explained 18 fault tolerance for Collectors 62 hanging prevented by MOM disconnecting under performing Collector 52 how MOM balances the metric load 63 improving performance by adjusting Collector weighting factors 64 likely cause of Workstation sluggishness in 72 location of MOM and Collector metrics 28 metric for total number of metrics currently tracked in 30 overhead 65 performance problems due to hot Management Module deployment 68 performance, based on Collectors 44 poor performance due to a single Collector 59 setting up metrics groupings in 79 slow response time due to network bandwidth problems 21 time synchronization 59, 72 when MOM drops a Collector 51 Workstation and WebView connections 102 Workstation performance problems in 102 CLW, See Command Line Workstation

A
active metrics groupings, defined 80 users and WebView servers limit 103 agents Collector connection history and future connections 64 connection architecture 70 heartbeat, defined 91 how MOM assigns to Collectors 64 increased CPU overhead from deep transactions involving multiple front-ends 111 load balancing and cluster fault tolerance for Collectors 62 configuring frequency on MOM 66 defined 63 differentiated from metric clamping 63 example scenarios 66 metric counts after weight adjusting 65 setting metric weight load 64 setting threshold for imbalance 65 memory cache, removing dead metrics 91 metric aging defined 91 performance problems related to 91 properties to configure 92 Agents with Data metric 83 Agents without Data metric 83 alerts, and slow cluster start-up time 60 applications, clustered and usefulness of virtual agents 106 asterisk, See wildcard symbol

B
baselines.db about 53 calculating disk space needed burst limit defined 70 events 70 54


Collector cluster performance 44 CPU requirements 74 speed and disk I/O system 71 steady state usage 45 usage for high resource operations 45 viewing high usage, 45 CPU usage, described 45 diagnosing slow response to MOM using ping metric 72 effect of faster CPUs 71 file cache requirements 74 good performance in individual 60 hardware requirements 71 how agents are assigned to by MOM 64 increasing capacity with faster CPUs 73 JVM requirements in 500 K metric MOM cluster 61 limits, increasing with more CPUs 73 location of supportability metrics 28 migrating from 6.x to 8.0 44 persisting event objects 70 ping time from MOM 51 reperiodization 45 run only Introscope process 72 running multiple on one machine 74 sign of overloaded 51 sizing limits examples 119 SmartStor minimum requirement 47 synchronizing clocks with MOM 71 under performing and cluster performance 59 unresponsive 72 upgrading 44 using loadbalancing.xml to restrict agents to specific 64 Workstation/WebView connections to in clustered environment 102 Command Line Workstation, heap size needed 100 concurrent queries recommended number of historical 43 configuring agent failover when host defined in DNS 62 agent metric aging 92 how often MOM rebalances cluster agent load 66

MOM-Collector cluster 61 RAID setting 57 Workstation log-in when host defined in DNS 62 connection history, agent to Collector and future connections 64 type MOM uses to assign agents to Collectors 64 converting spool to data metric, defined 32 CPU Overview tab 46 CPUs Collector requirements 74 Enterprise Manager basic requirement 47 guidelines 25 faster and Collector capacity 71 using to increase Collector capacity 73 high usage Collector 45 increasing and Collector limits 73 large MOM overhead and alerted metrics 81 resource contention with WebView and EM 100 speed, Collector 71 usage Collector 45 Collector, described 45 Collector, steady state 45 during heuristics calculations 69 reports 43 scheduling of heavy processing 52 virtual agents using large resources 73 WebView server load 103 custom scripts, scheduling 52


D
dashboard, using EM Capacity to determine metric explosions 87 dashboards, cause of slow WebView processing 103 data, about historical queries and performance problems 25 dead metrics See metrics, dead dedicated controller property for SmartStor 55 deployments, hot, Management Module cost 81 disk drive, determining number of controllers 57 file cache size, SmartStor 48 OS file cache 32 space estimating for baselines.db 54 estimating for traces.db 54 DNS agent config for MOM failure 62 Workstation log-in config for MOM failure 62 dynamic instrumentation defined 112 performance problems related to 112 ProbeBuilding See dynamic instrumentation

Overview tab 27 processing of time slice data 78 RAM minimum requirement 47 running multiple Collectors on one machine 74 standalone connections to Workstations 101 hardware requirements example 74 supportability metrics 51 symptoms of metric explosion 85 using EM Capacity dashboard 87 when to grow from standalone to cluster 70 events determined number received 37 high volume 70 limit burst 70 maximum, defined 70 steady state 70 objects, in Collector databases 70 explosion, metrics See metrics explosion

F
faillover, planning for MOM 62 failure, planning for using MOM, See failover file cache, requirements for Collector 74 system, general requirements 47 flat file archiving, using with SmartStor 44 front-ends, multiple and transaction problems 111

E
EM Capacity dashboard, using to determine metric explosions 87 em.db, See baselines.db Enterprise Manager 48 capacity and metrics limits 21 configuring for heap memory 32 CPU basic requirement 47 resources and running WebView 100 utilization guidelines 25 determining capacity 20 finding problems using specific metrics 29 heap settings 48 metrics grouping limits 79 location 28 migrating from 6.x to 8.0 44 OS disk file cache requirements 47 overloaded and combining metrics 69 time slices 69

G
GetEvent metric, See ping metric graph, Top N, defined 103 groupings, metrics, defined 78

H
hardware requirements Collector 71 MOM 58, 59, 60 harvest cycle, metrics 78 Harvest Duration metric 29, 35, 45


heap capacity (%) metric 34 Command Line Workstation size needs 100 settings, Enterprise Manager 48 size Enterprise Manager 32 Workstation 100 heartbeat, agent, defined 91 heuristics CPU usage for calculations 69 database, See baselines.db Historical mode using for viewing data in Workstation 43 historical queries and EM agent data storage 25 and MOM overloading 31 poor performance caused by 40 recommended number of concurrent 43 running 43 host defined in DNS and agent failover 62 and Workstation log-in 62 hot deployments, of Management Modules, performance problems 81

J
JVM Collector requirements in cluster with 500 K metric MOM 61 heap settings, Enterprise Manager 48

L
leaks, metrics, symptoms 81 limits Collector, examples 119 MOM, examples 122 limits, metrics, definition 20 Live mode viewing Workstation data in 43 load balancing for agent, defined 63 reducing metrics 30 loadbalancing.xml, using to restrict agents to specific Collectors 64

M
Management Module cost of hot deployments 81 hot deployment and cluster problems 68 and virtual agents 68 problems with hot deployment 68 maximum events limit, defined 70 memory, Workstation requirement 100 metadata file, about uncompressed 98 SmartStor, using to find metric explosion 85 metric aging, agent defined 91 clamp, differentiated from agent load balancing 63 count metric, defined 98 Metric Count metric 85 Metric Count tab 107 metrics Agents with Data 83 Agents without Data 83 alerts, large numbers and performance problems 81 baselining database, See baselines.db checked during agent heartbeat 91

I
I/O contention, reason for SmartStor problems 73 disk system for Collector 71 throughput, SmartStor 49 inactive metrics groupings, defined 80 instrumentation, dynamic and performance problems 112 defined 112 Introscope agent connection architecture 70 business logic 78 defined 26 monitors 79 improving slow startup time 81 metric explosion prevention 91 no other processes may run on Collector 72 workload, defined 26 Is Clamped metric, about 98


clamp about related supportability metrics 98 defined 96 properties to enable 96 scenario 98 cluster load balancing 63 combined as symptom of overloaded EM 69 converting spool to data 32 counts, weight-adjusted for agent load balancing 65 dead about 91 defined performance problems related to 91 removal 96 Enterprise Manager supportability 28 explosion and SmartStor metadata save time 85 causes 84 configuring 87 defined 82, 84 due to poorly-written SQL statements 92 how Introscope prevents 91 symptoms of 85 explosion, defined 84 groupings active, defined 80 defined 78 Enterprise Manager limits 79 inactive, defined 80 performance problems when using wildcard symbol 80 relationship to regular expression 78 groupings, setting up in a cluster 79 harvest cycle 78 Harvest Duration 29, 35, 45 Heap Capacity (%) 34 Is Clamped 98 leaks defined 82 diagnosing using SmartStor metadata save time 83, 85 symptoms 81, 82 limits and Enterprise Manager capacity 21 definition 20 MOM-Collector system 60 related to Top N graphs 104

load reducing 30 metadata about 82 problems with continuous growth 82 symptoms of metrics leaks 82 Metric Count 85, 98 Metrics with Data 83 Number of Agents 71 Number of Inserts Per Interval 70 Overall Capacity (%) 33 Partial Metrics with Data 83 Partial Metrics without Data 83 ping 51, 72 SmartStor capacity 80 management 41 SmartStor Duration 36, 50 subscribed defined 79 limits 59 MOM limits 60 supportability Is Clamped 98 Metric Count 98 using to find Enterprise Manager problems 29 weight load setting for agent load balancing 64 Metrics with Data metric 83 migrating, 6.x Enterprise Managers to 8.0 Collectors 44 MOM alerted metrics and large CPU overhead 81 configuring cluster to handle 1 million MOM metrics 61 disconnected due to ping time threshold 52 failure planning 62 hardware requirements 58, 59, 60 hardware requirements and subscribed metrics limit 59 hot failover 62 how assigns agents to Collectors 64 limits on subscribed metrics 60 location of supportability metrics 28 ping time to Collector 51 reasons for overload 31 secondary backup for hot failover 62


sizing limits examples 122 SmartStor instance, about 58 synchronizing clock with Collectors 71 to Collector connection limit 59 system metrics limit 60 WebView appears as Workstation client Workstation connections allowed 102

100

N
network bandwidth problem, and slow cluster response times 21 Number of Agents metric 71 Number of Inserts Per Interval metric 70

O
OS disk file cache requirements, EM 47 memory requirements for Workstation RAM and disk file cache 32 Overall Capacity (%) metric defined 33 spiking 34 100

related to agent metric aging 91 related to large numbers of metrics alerts 81 with Management Module hot deployment 68 related to MOM-Collector connections 58 sluggish in Workstation, typical cause 100 WebView slow response times, cause of 103 Workstation problems in cluster 102 ping metric 51 about 72 diagnosing a slow-responding Collector 72 time Introscope 72 network 72 threshold for Collector overload 51 threshold that disconnects MOM 52 production Collector and MOMs, Management Module hot deployments in 68

Q
queries historical See historical queries scheduling large 52

P
Partial Metrics with Data metric 83 Partial Metrics without Data metric 83 passive users defined 103 WebView servers limit 103 performance cluster, and single under performing Collector 59 dedicated controller property 55, 56 improving cluster by adjusting Collector weighting factor 64 in cluster causing MOM to drop Collectors 51 individual Collector responsiveness 60 load, WebView 101 poor due to large historical queries 40 problems due to MOM to Collector clock drift 71 due to recurring servlet calls 111 from large continuous Transaction Traces 73 in cluster due to Management Module hot deployment 68 metrics metadata continuous growth 82

R
RAID configuration recommended 57 setting 57 RAID 0 57 RAID 5 57 RAM adding to improve spooling time 32 increase OS disk file cache 32 EM minimum requirement 47 regular expression, relationship to groupings 78 reperiodization 52 about 41 Collector 45 SmartStor, defined 40

metrics


reports CPU usage 43 scheduling large or long when to schedule 52

43

S
SAN using for SmartStor storage 57 SAS controllers using for SmartStor storage 57 scheduling custom scripts 52 large queries 52 reports 52 secondary backup MOM for hot failover 62 servlets performance problems from recurring calls 111 recurring calls and high agent CPU overhead 111 seen as Introscope frontends 111 sizing Collector limits examples 119 MOM limits examples 122 SmartStor about 40 Collector minimum requirement 47 dedicated controller property about 55 and performance 55, 56 default installation directory 20 determining if drives are physically different 57 flat file archiving recommendations 44 I/O throughput 49 management metrics, about 41 metadata files, about uncompressed 98 save time and metric explosion 85 metrics about metadata 82 capacity 80 metadata save time and metrics leaks 85 metadata save time related to metrics leaks 83 MOM instance, about 58

problems indications of 49 with I/O contention 73 recommended RAID configuration 57 reperiodization defined 40 verifying 41 requirements 48 setting RAID configuration 57 up 49 spooling 52 about 40 disk file cache size requirements 48 verifying 41 storage SAN guidelines 57 SAS controllers guidelines 57 SmartStor Duration metric 36, 50 metric value 36 spool to data conversion task 40 spooling SmartStor 40 time, lengthening 32 SQL Agent Introscope statement normalizers 94 showing many unique SQL metrics 92 statements causing metric explosions 92 normalizers 94 standalone Enterprise Manager hardware requirements example 74 Workstation connections allowed to 101 startup time, improving slow Introscope 81 steady-state, events limit 70 subscribed metrics See metrics, subscribed supportability metrics Is Clamped 98 Metric Count 98 related to metric clamp 98 synchronizing, clock on clustered machines 59, 72 system performance, determining general 47


T
tabs CPU Overview 46 Enterprise Manager Overview 27 Metric Count 107 threshold, ping time for Collector overload 51 that disconnects MOM 52 time server software, use to synchronize machine clocks in cluster 59, 72 time slices combined, symptom of overloaded EM 69 data processing in Enterprise Manager 78 tool tip 107 Top N graph defined 103 metrics limit 104 traces.db about 53 calculating disk space needed 54 Transaction Event database, See traces.db Transaction Trace component clamp 108 dropped events metric 36 events 36 insert queue 36 performance problems related to 73 queue size 36 transactions, deep, involving multiple frontends 111

W
WebView cause of slow client response times 103 connections in clusters 102 dashboards and slow processing 103 how MOM sees as Workstation 100 performance load 101 running on EM and CPU resource contention 100 servers CPU resource load 103 user limits 103 wildcard symbol, performance issues in metrics groupings 80 Workstation connections allowed to MOM 102 allowed to standalone Enterprise Manager 101 in clusters 102 heap footprint 100 memory requirement 100 OS memory requirements 100 performance problems in cluster 102 sluggishness cause in a cluster 72 typical cause 72, 100 viewing data in Live mode 43

U
upgrading, Collector 44 using time server software 59, 72

V
virtual agents and Management Module hot deployments 68 useful for clustered applications 106 using large CPU resources 73

