Best Practices for Monitoring Cisco Systems IP Telephony Networks
Contents
Why Monitor IPT Components?
About AppManager
CallManager Server Health
CallManager Services Health
CallManager Database
CallManager Functionality
IP Gateway Health
QoS Monitoring
Layer 2 and 3 Switches
Reporting
Conclusion
Appendix A: Supported Environments
Appendix B: Summary Guidelines
This paper highlights suggested best practices to ensure a successful CallManager deployment. Other white papers that address the management of the other Cisco IPT components are available. Each IP telephony deployment is different, but generally a Cisco Systems AVVID IP Telephony deployment includes a CallManager cluster, voice gateways, a Unity voice mail server, routers, L2/L3 switches, IP phones, and other applications.
White Paper
January 2005
Why Monitor IPT Components?

Cisco's AVVID (Architecture for Voice, Video, and Integrated Data) exemplifies high-reliability IP telephony (IPT), but its reliability is dependent on the proper configuration and operation of dozens of associated components. In the following sections, we'll discuss a few of the IPT components you should plan to monitor day-in and day-out. In many cases, good management and monitoring practices can alert you to potential risks before they actually create problems for users. Monitoring Cisco IPT with NetIQ AppManager will enhance performance, cost-effectiveness, and reliability, and simplify the management of your IPT network. A comprehensive management solution is vitally important to the success and reliability of your IP telephony implementation.
About AppManager
The AppManager suite from NetIQ is the best, most comprehensive, and most reliable system fault and performance management solution on the market. AppManager was designed to manage the Windows NT/2000 systems that support Cisco IP telephony. It can perform hundreds of simple and sophisticated monitoring and management tasks related to Windows 2000 services, DNS/DHCP and WINS, SQL Server, and even hardware, such as CPUs and fans. Modules have been developed for AppManager to specifically manage the Cisco AVVID system. AppManager works to ensure the availability and performance of VoIP systems and networks through the use of Knowledge Scripts, which are network management rules designed to handle one or more tasks. Depending on the task, Knowledge Scripts can collect performance data (for example, how many calls have been attempted today), monitor systems for simple or complex events (for example, call quality is poor or a service is down), and respond with one or more actions (such as raising an alert when there's a problem, or restarting a service automatically).
CallManager Server Health

AppManager for Cisco CallManager checks CPU and memory utilization for CallManager processes at each server you choose to monitor and raises an alert when a process exceeds its utilization threshold, indicating reduced performance or increased risk of a failure. It tracks average CPU and memory usage over time for the CallManagers in a cluster, gives you access to a list of the processes that are consuming the most CPU resources, and can display the information it discovers in charts and graphs. Careful, thorough management of your Cisco CallManager servers will let you know about a potential problem so that you can respond proactively, before the problem affects your users. AppManager offers numerous Knowledge Scripts devoted to monitoring the Cisco CallManager application, resources, and critical services. With other scripts, you can monitor for spikes in CPU and memory usage. Note: Knowledge Script names in bold indicate that the script is recommended. Knowledge Scripts not in bold are suggested. There are Knowledge Scripts not mentioned in this document that may be useful for your specific requirements so be sure to review the Knowledge Script Guide for more complete listings.
Here's a list of the most important things to monitor right from the start:
CPU usage. Run the CiscoCallMgr_CCM_SystemUsage script to monitor and set thresholds for the CallManager CPU usage and total CPU usage. Also run CiscoCallMgr_CCM_CpuHigh, which lets you set thresholds for maximum CPU usage for all other CallManager processes. Run these scripts every five minutes. Then run CiscoCallMgr_Report_SystemUsage or create a chart to compile the data that you've collected. The Maximum and Average data streams can provide invaluable trending information.
Also, look for spikes in CPU usage. To isolate which process or application is causing the spikes in CPU usage, run NT_TopCpuProcs. It's possible that an application other than CallManager is causing the problem. Spikes in excess of 80 percent may indicate that your system can't handle any new functions or that the CallManager might start dropping calls. Consider adding another server or moving phones to balance the loads carried by all your servers. If the cause is a rogue process, stop the identified process.
Physical memory. Run the CiscoCallMgr_CCM_SystemUsage script to monitor and set thresholds for CallManager memory usage and total memory usage. Also run CiscoCallMgr_CCM_MemoryHigh, which lets you set thresholds for maximum memory usage for all other CallManager processes. Run these scripts every five minutes. Then run CiscoCallMgr_Report_SystemUsage or create a chart to compile the data that you've collected. The Minimum, Maximum, and Average data streams can provide invaluable trending information.
Also, look for spikes in memory usage. To isolate which process or application is causing the spike in memory usage or memory leak, run NT_TopMemProcs. It's possible that an application other than CallManager is causing the problem. Spikes in usage of 75 to 80 percent could indicate heavy usage, or a more serious issue such as a virus or denial-of-service attack. Spikes in excess of 80 percent may indicate that your system can't handle any new functions or that the CallManager might start dropping calls. Consider adding another server or moving phones to balance the loads carried by all your servers. If the cause is a rogue process or memory leak, stop the identified process.
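The 75 and 80 percent guidelines above can be expressed as a simple severity mapping. The following Python sketch is illustrative only and is not part of AppManager; the function names and severity labels are hypothetical, and the thresholds are simply the figures quoted in the text.

```python
from typing import Iterable

# Thresholds taken from the guidance above; tune them for your own environment.
HEAVY_PCT = 75.0      # 75 to 80 percent: heavy usage, worth investigating
CRITICAL_PCT = 80.0   # above 80 percent: risk of dropped calls


def classify_usage(pct: float) -> str:
    """Map one CPU or memory utilization sample (percent) to a severity label."""
    if pct > CRITICAL_PCT:
        return "critical"
    if pct >= HEAVY_PCT:
        return "warning"
    return "ok"


def worst_severity(samples: Iterable[float]) -> str:
    """Return the most severe label seen across a series of samples."""
    order = {"ok": 0, "warning": 1, "critical": 2}
    labels = (classify_usage(s) for s in samples)
    return max(labels, key=order.__getitem__, default="ok")
```

For example, a five-minute series containing a single sample above 80 percent is reported as "critical", which is the point at which the text suggests rebalancing phones or adding a server.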
Hard disk. We recommend that you monitor your hard disks every 12 hours. Among the benefits is ensuring the status of the different disks belonging to a RAID array. (Although the array may be in proper working condition, one of the physical drives may not be.)
Several AppManager scripts automate the task of monitoring hard disk status:
CIM_DiskArrayFail. Monitors each physical drive in the Array set. This script raises events for SNMP or Compaq Insight Manager (CIM) failures, drive failures, and drive degradation.
CiscoCallMgr_Sys_PhysicalDiskBusy. Monitors physical disk operation time and queue length. A disk is considered busy if its disk operation time is high or the queue length is long.
CiscoCallMgr_Sys_PhysicalDiskIO. Monitors physical disk reads, writes, and transfers per second. For disk array subsystems, you need to enable Performance Monitor disk counters before you can run Sys_PhysicalDiskIO. If you have not already enabled Performance Monitor for disk activities, run the program %systemroot%\system32\diskperf.exe with the -y switch, then reboot your system. On Windows 2000 servers, only the physical disk counter is enabled by default.
CIM_IDAFail. Monitors IDA controllers for the operational status of IDA drives. This script raises events for SNMP or Compaq Insight Manager (CIM) failures, drive failures, and drive degradation.
CIM_SCSIFail. Monitors the operational status of discovered SCSI drives. This script raises events for SNMP or Compaq Insight Manager (CIM) failures, drive failures, and drive degradation.
CIM_SCSITimeout. Monitors hard and soft resets and command timeouts for the SCSI controller. This script raises events for SNMP or Compaq Insight Manager (CIM) failures.
Disk space usage. You should closely monitor disk space usage on your CallManager servers, especially if logs are activated. You'll be able to avoid many problems altogether if you take a proactive stance toward managing log file sizes. Run the NT_LogicalDiskSpace script every 12 hours. Usage above 75 percent or free space of less than one GB is a signal to delete temporary files and to archive and delete log files.
Virtual memory. Run NT_MemUtil every two minutes to monitor the usage of virtual memory, as well as physical memory and paging files. A spike in usage in excess of 75 percent could indicate heavy usage, or a more serious issue such as a virus or denial-of-service attack. Run the AvgValueByHr Report script to summarize the data you've collected on an hourly basis. Look at the Minimum and Average memory data streams to help you set event thresholds and establish growth needs. Look at the Maximum memory data stream to help detect memory leaks.
Fans. It's a good idea to periodically check the status of your CallManager server's fans. A once-a-day check is sufficient. Run the following script:
CIM_FanSummary. Monitors the status of system and CPU fans. This script raises events for SNMP or Compaq Insight Manager (CIM) failures, fan failures, and fan degradation.
Power supply. The status of your CallManager's power supply is perhaps one of the more vital conditions that you can monitor. Run either or both of the following scripts every two hours:
CIM_UPSBatteryLow. Monitors the UPS (uninterruptible power supply) battery life. This script raises events for SNMP or Compaq Insight Manager (CIM) failures, AC power on, and low battery.
CIM_UPSLineStatus. Monitors the UPS AC power line. This script raises events for SNMP or Compaq Insight Manager (CIM) failures and AC power line failure.
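The disk space guideline given earlier in this section (usage above 75 percent, or less than one GB free) is easy to express as a check. The sketch below is a hypothetical illustration rather than the logic of NT_LogicalDiskSpace; the function names are invented for this example, and "one GB" is assumed here to mean 1024^3 bytes.

```python
import shutil

GIB = 1024 ** 3  # assumption: "one GB" in the guideline is treated as 2**30 bytes


def disk_needs_cleanup(total_bytes: int, used_bytes: int, free_bytes: int,
                       max_used_pct: float = 75.0,
                       min_free_bytes: int = GIB) -> bool:
    """Return True when usage exceeds max_used_pct or free space is below min_free_bytes."""
    used_pct = 100.0 * used_bytes / total_bytes
    return used_pct > max_used_pct or free_bytes < min_free_bytes


def check_path(path: str) -> bool:
    """Apply the rule to the volume that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (bytes)
    return disk_needs_cleanup(usage.total, usage.used, usage.free)
```

A True result corresponds to the point at which the text recommends deleting temporary files and archiving log files.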
Temperature. You should monitor the condition of the server's temperature sensors as well as your system's overall thermal environment. Once every hour, run:
CIM_ThermalStatus. Monitors the system's thermal environment and the status of the temperature sensors. If the overall condition of the system's thermal environment is abnormal or the temperature sensors are operating out of normal range, an event is raised with a degraded or critical condition.
Memory leaks. A memory leak occurs when a process requests memory for temporary usage, but does not release the memory when the process no longer needs it. This memory accumulation by a process can then starve other processes that need memory, leaving your system unstable or degraded.
Run CiscoCallMgr_CCM_SystemUsage to monitor physical memory usage for the CallManager process and total physical memory usage. Run this script for a week or two, and then create a chart or run CiscoCallMgr_Report_SystemUsage to compile the data that you've collected. Graph the memory values to identify possible memory leak conditions at the system level. You can identify a potential memory leak condition by noticing that the maximum free memory values continuously diminish over time or memory values for a particular process continually increase over time (assuming other parameters, such as the number of registered devices, remain somewhat constant). To pinpoint the faulty process, run NT_TopMemProcs. Then use the AppManager Chart Console to graph the daily minimum memory usage for that process over time. Double-click on a datapoint to see the details on memory use by top processes.
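One way to turn "memory values for a particular process continually increase over time" into a number is to fit a straight line to the daily minimum memory values and look at its slope. The following is a minimal sketch under that assumption; the function names, the seven-day minimum, and the 1 MB per day growth threshold are all hypothetical choices, not AppManager settings.

```python
from typing import Sequence


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of `values` against their index (units per sample)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def looks_like_leak(daily_min_mb: Sequence[float],
                    min_days: int = 7,
                    min_growth_mb_per_day: float = 1.0) -> bool:
    """Flag a process whose daily minimum memory usage keeps trending upward."""
    if len(daily_min_mb) < min_days:
        return False  # not enough history; the text suggests a week or two
    return slope(daily_min_mb) >= min_growth_mb_per_day
```

Using the daily minimum rather than the maximum filters out short-lived peaks: a healthy process returns to roughly the same floor each day, whereas a leaking process has a floor that keeps rising.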
Network Interface Cards. It is also important to monitor the bandwidth on the Network Interface Cards. If the NIC on a particular CallManager is over-utilized, problems could occur with call setup and other communications.
Run the NT_NetworkBusy script every 15 minutes to monitor the traffic on all CallManager Network Interface Cards. An event will be raised if bandwidth utilization exceeds the threshold.
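The run intervals recommended in this section can be collected into a single schedule for review. The Python sketch below only restates the intervals given in the text; the dictionary layout and helper names are hypothetical and do not correspond to any AppManager configuration format.

```python
from datetime import timedelta
from typing import List

# Script name -> recommended run interval, as stated in this section.
SCHEDULE = {
    "CiscoCallMgr_CCM_SystemUsage": timedelta(minutes=5),
    "CiscoCallMgr_CCM_CpuHigh": timedelta(minutes=5),
    "CiscoCallMgr_CCM_MemoryHigh": timedelta(minutes=5),
    "NT_LogicalDiskSpace": timedelta(hours=12),
    "NT_MemUtil": timedelta(minutes=2),
    "CIM_FanSummary": timedelta(days=1),
    "CIM_UPSBatteryLow": timedelta(hours=2),
    "CIM_UPSLineStatus": timedelta(hours=2),
    "CIM_ThermalStatus": timedelta(hours=1),
    "NT_NetworkBusy": timedelta(minutes=15),
}


def runs_per_day(interval: timedelta) -> int:
    """Number of times a script runs in 24 hours at the given interval."""
    return int(timedelta(days=1) / interval)


def scripts_by_frequency() -> List[str]:
    """Script names ordered from most to least frequently run."""
    return sorted(SCHEDULE, key=lambda name: SCHEDULE[name])
```

Listing the schedule this way makes it easy to see which scripts generate the most data points per day (NT_MemUtil, at 720 runs) when planning repository capacity.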
CallManager Services Health

The CallManager server, and thus the rest of your IPT network, is only as reliable as the applications and services on which it depends. You'll need to monitor the following essential components.
Cisco CallManager service. The Cisco CallManager service runs on the Cisco IP Telephony Applications Server to provide software-only call processing as well as signaling and call control functionality. You'll want to monitor the status of the CallManager service every five minutes.
Run CiscoCallMgr_CCM_HealthCheck and set the parameters to alert you when the service has been restarted or if a restart attempt fails.
In addition, several other scripts monitor vital health-related functions:
CiscoCallMgr_CCM_RoleStatus. Determines whether a CallManager status is Primary or Backup. This script raises an event for status transitions. A Backup is defined as any CallManager with no registered phones (hardware or software).
CiscoCallMgr_CCM_Heartbeat. Monitors the CallManager heartbeat. Each CallManager installed in your system should be sending out a signal to all registered devices, letting them know it's active, every 30 seconds. This script raises an event if the heartbeat stops or falls below the specified threshold. A low heartbeat indicates that the CallManager service was stopped and then restarted.
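The heartbeat check described above amounts to confirming that signals keep arriving at the expected 30-second interval. The sketch below illustrates that idea on a list of arrival times. It is an assumption-laden illustration, not how CiscoCallMgr_CCM_Heartbeat is implemented: the function name, the timestamp input, and the five-second tolerance are all hypothetical.

```python
from typing import List, Sequence, Tuple

EXPECTED_INTERVAL_S = 30.0  # heartbeat interval stated in the text


def missed_heartbeats(timestamps: Sequence[float],
                      tolerance_s: float = 5.0) -> List[Tuple[float, float]]:
    """Return (previous, current) timestamp pairs whose gap is longer than expected.

    `timestamps` are heartbeat arrival times in seconds, in ascending order.
    """
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev > EXPECTED_INTERVAL_S + tolerance_s:
            gaps.append((prev, curr))
    return gaps
```

Each returned pair brackets a period in which at least one heartbeat was missed, which is where a service stop or restart would be worth investigating.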
Cisco TFTP service. Cisco Trivial File Transfer Protocol (TFTP) builds and serves files consistent with the trivial file transfer protocol, a simplified version of FTP. TFTP servers distribute information to IP phones about the locations of CallManagers and the existence of patches they need to install, if any. The TFTP service on Windows 2000 provides configuration files and other information to Cisco devices as they register. You'll want to be notified if the TFTP service goes down, if TFTP errors occur, or if an exceptionally large number of TFTP requests pass over the network. If a TFTP server isn't working properly, you'll probably see problems with phone and gateway registration. The TFTP server may serve up corrupt configuration files, or may fail to respond to requests.
You can monitor the TFTP service with the same script you use to monitor the CallManager service. Run CiscoCallMgr_CCM_HealthCheck every five minutes and set the parameters to alert you when the service has been restarted or if a restart attempt fails. In addition, several other AppManager scripts monitor vital TFTP-related functions:
CiscoCallMgr_TftpRequests. Monitors the total number of TFTP requests handled during an interval. This number includes the local requests that were successfully handled by the server, Not Found requests, and requests that have been aborted or rejected by the TFTP server.
CiscoCallMgr_TftpErrors. Monitors TFTP-related errors that occur during an interval.
CiscoCallMgr_TftpHeartbeat. Monitors the Cisco TFTP heartbeat.
CiscoCallMgr_TftpChangeNotify. Monitors the number of TFTP change notifications handled during an interval.
CiscoCallMgr_TftpSegmentPctLost. Monitors the percentage of TFTP segments lost during an interval.
CiscoCallMgr_TftpSegmentsSent. Monitors the number of TFTP segments sent during an interval.
Cisco Messaging Interface service. The Cisco Messaging Interface service provides the communication between the voice-mail system and Cisco CallManager. Use the CiscoCallMgr_CCM_HealthCheck script to monitor the status of the Cisco Messaging Interface service. Run the script every five minutes and set the parameters to alert you when the service has been restarted or if a restart attempt fails.
Cisco IP Voice Media Streaming App service. The Cisco IP Voice Media Streaming Application service provides voice media streaming functionality for the Cisco CallManager for use with MTP, conferencing, and music on hold (MOH). Run CiscoCallMgr_CCM_HealthCheck every five minutes and set the parameters to alert you when the service has been restarted or if a restart attempt fails.
Cisco CTI Manager service. The CTI Manager contains the CTI components that interface with applications. With the CTI Manager service, applications have access to resources and functionality of all Cisco CallManagers in the cluster and have improved failover capability. CiscoCallMgr_CCM_HealthCheck also monitors the status of the Cisco CTI Manager service. Run the script every five minutes and set the parameters to alert you when the service has been restarted or if a restart attempt fails (assuming that the CTI Manager service is not in use).
NumOfActiveCMLink is a counter that shows the total number of active CallManager links in the cluster. If this value drops to 0, then there is definitely something wrong with the CTI Manager service. If this number is non-zero, but is less than the total number of active CallManagers, you may have a problem with the CallManager servers in the cluster. Run the following scripts to further monitor the CTI Manager service:
CiscoCallMgr_CTI_Manager. Monitors the number of CTI Manager connections, open devices, open lines, and active CallManager links.
CiscoCallMgr_RegCtiPorts. Monitors the number of currently registered CTI ports.
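The interpretation of the NumOfActiveCMLink counter described above can be written as a small decision function. This is a hypothetical sketch: the function name and the returned labels are invented for illustration, and the caller is assumed to already know how many CallManagers in the cluster are active.

```python
def diagnose_cti_links(active_links: int, active_callmanagers: int) -> str:
    """Interpret NumOfActiveCMLink relative to the number of active CallManagers."""
    if active_links == 0:
        # No links at all: the CTI Manager service itself is the likely problem.
        return "cti_manager_fault"
    if active_links < active_callmanagers:
        # Some links are missing: look at the CallManager servers in the cluster.
        return "callmanager_server_issue"
    return "ok"
```

For example, `diagnose_cti_links(2, 3)` points at a CallManager server rather than at the CTI Manager service, which narrows down where to start troubleshooting.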
Cisco Telephony Call Dispatcher service. The Telephony Call Dispatcher service provides centralized services for Cisco Web Attendant clients and pilot points. You can monitor the Telephony Call Dispatcher service with the same script you use to monitor other Cisco services. Run CiscoCallMgr_CCM_HealthCheck every five minutes and set the parameters to alert you when the service has been restarted or if a restart attempt fails (assuming that Web Attendant is not in use).
Cisco RIS Data Collector service. The Real-time Information Server (RIS) maintains real-time Cisco CallManager information and provides an interface through which the Cisco RIS Data Collector service and the SNMP Agent retrieve that information. Use the CiscoCallMgr_CCM_HealthCheck script to monitor the status of the RIS Data Collector service. Run the script every five minutes and set the parameters to alert you when the service has been restarted or if a restart attempt fails.
Cisco Database Layer Monitor service. The Cisco Database Layer Monitor service monitors aspects of the database layer as well as call detail records (CDRs). The database layer comprises a set of dynamic link libraries (DLLs) that provide a common access point for applications that need to access the database to add, retrieve, and change data. The Cisco Database Layer Monitor service performs functions such as determining whether the primary server is available during failover. Monitor this service every five minutes with the CiscoCallMgr_CCM_HealthCheck script, setting the parameters to alert you when the service has been restarted or if a restart attempt fails.
System backups. Make sure you're always kept informed if system backups don't take place as scheduled, preferably every night. Monitor your backup servers and drives and make sure they aren't in any danger of crashing. Cisco provides backup utilities that can be monitored. Run CiscoCallMgr_CiscoBackupStatus regularly and after scheduled backups to monitor the Cisco IP Telephony Applications Backup Utility program.
Internet Information Services. Internet Information Services (IIS) supports Cisco CallManager configuration through active server pages (ASP), gives the Cisco CallManager server access to Administration web pages, and helps secure Cisco CallManager administration functions. If IIS processes and applications consume excessive CPU resources, IIS servers, Web services, and processes may go down.
Configure CiscoCallMgr_IIS_HealthCheck to notify you if any of the above events occur. Because IIS logs error information, you should also monitor its logs. Use CiscoCallManager_CCM_EventLog to look for failed or errored ASP and HTTP requests and other communication failures.
In addition to the above-mentioned scripts, daily running of the following scripts will provide additional IIS monitoring capability:
CiscoCallMgr_IIS_CpuHigh. Monitors CPU usage for IIS application processes.
CiscoCallMgr_IIS_KillTopCPUProcs. Monitors CPU usage for IIS processes and kills processes using excessive CPU resources.
CiscoCallMgr_IIS_MemoryHigh. Monitors working set memory usage and memory pool usage for IIS application processes.
CiscoCallMgr_IIS_RestartServer. Restarts an IIS server.
CiscoCallMgr_IIS_ServiceUptime. Monitors Web sites and Web services uptime.
DC Directory Server service. The DC Directory Server provides phone number lookup and other directory services for Cisco IP phones. Run CiscoCallMgr_CCM_HealthCheck every five minutes to monitor the status of the DC Directory Server service and to automatically restart this service if it goes down.
Domain Name Service. You're probably already monitoring this important service on your network. DNS is just as critical for CallManager as it is for the rest of your network, enabling each VoIP phone to locate its CallManager server.
Run the NT_DNSConnectivity script to make sure CallManager servers never lose connectivity to DNS servers. Use the CiscoCallMgr_Sys_EventLog script to scan the Windows event logs for DNS errors.
Security. Another member of your organization may be in charge of network security. But security failures become your problem if they take down the phone system. It's a good idea to keep informed about any secure areas, such as the CallManager server, that have been compromised or threatened.
NT_FailedLogins. Monitors for failed logon attempts to the server since the last interval (possibly due to break-in attempts).
CCM Systems Counters: NetIQ plans to provide the ability to monitor some of the key systems counters provided in recent releases of CallManager. Performance counters like CallsRejectedDueToThrottling, CodeRed (Yellow) EntryExit, and others are planned for a future release.
CallManager Database
The CallManager SQL database keeps records of your administrative configuration data, call route tables, and information about all calls made. Without the database, CallManager can't access any of its administrative configuration data or its routing plan. Database status, accessibility, and available space are the most critical metrics to track, but you should also keep tabs on CPU and memory utilization, at minimum. Set the parameters in the following Knowledge Scripts to raise events if a critical SQL service, such as MSSQLServer, goes down, or if the Windows 2000 Application Event Log includes a message that a SQL scheduled job has failed. To gather the most information, run the following scripts every five minutes:
CiscoCallMgr_SQL_Accessibility. Monitors SQL Server and database accessibility.
CiscoCallMgr_SQL_RepTransactions. Monitors the number of transactions marked for replication but not yet replicated.
CiscoCallMgr_ServerDown. Monitors the status of the SQL Server service. Automatically restarts the service when down if the Auto-Start option is set to yes.
Other useful metrics to monitor include:
CiscoCallMgr_SQL_BlockedProcesses. Monitors the SQL processes that have been blocked.
CiscoCallMgr_SQL_CPUUtil. Monitors the percentage of CPU resources used by SQL Server processes.
CiscoCallMgr_SQL_DataGrowthRate. Monitors the data growth and shrink rates for all SQL Server databases.
CiscoCallMgr_SQL_DataSpace. Monitors the data space available and data space being used for all SQL Server databases.
CiscoCallMgr_SQL_DBGrowthRate. Monitors database growth and shrink rates.
CiscoCallMgr_SQL_DbOption. Monitors database options.
CiscoCallMgr_SQL_DBSpace. Monitors the database space available and space being used for all SQL Server databases.
CiscoCallMgr_SQL_Errorlog. Monitors the SQL Server error log.
CiscoCallMgr_SQL_LogGrowthRate. Monitors log growth and shrink rates for all SQL Server databases.
CiscoCallMgr_SQL_LogSpace. Monitors the log space available and log space being used for all SQL Server databases.
CiscoCallMgr_SQL_MemUtil. Monitors the amount of working set memory used by SQL Server processes.
CiscoCallMgr_SQL_NearFileMaxSize. Monitors the size of all SQL Server database files.
CiscoCallMgr_SQL_NearMaxConnect. Monitors SQL Server opened connection usage.
CiscoCallMgr_SQL_NearMaxLocks. Monitors SQL Server lock utilization.
CiscoCallMgr_SQL_NetError. Monitors SQL Server network errors.
CiscoCallMgr_SQL_RepTranSec. Monitors the number of transactions replicated per second.
CallManager Functionality
Once you've taken care of the absolutely essential monitoring tasks outlined in the sections above, you're ready to extend coverage once more, to the functionality of CallManager itself and some of the extras that supplement or ship with CallManager.
Registered devices. Whenever a device (e.g., phone, gateway, gatekeeper) has a problem registering with its CallManager, you should take a closer look. In the Windows 2000 Application Event Log, an error listed as DeviceTransientConnection indicates that a device made a connection to the CallManager server on TCP port 2000, but that the connection was terminated before registration was accomplished. This could mean there's a problem with the device, with the network connection, or with the server or database. The device itself may be unauthorized, which could indicate a security breach. For obvious reasons, you'll want to know anytime there's a problem with device registration, or if the number of currently registered devices exceeds the number of devices that you know are authorized.
Registered phones. You'll want to be kept informed any time the number of registered phones decreases rapidly or falls below your threshold. Run CiscoCallMgr_RegHardwarePhones every 15 minutes. In addition, monitor the number of currently registered station devices other than Cisco hardware phones, such as Cisco IP SoftPhones, Cisco uOne ports, and Cisco Unity voice ports, with CiscoCallMgr_RegOtherDevices.
MGCP gateway registration. At minimum, run CiscoCallMgr_MGCP_GatewayCheck every five minutes to monitor for new and missing MGCP gateways. An additional AppManager script provides more data to aid your monitoring efforts:
CiscoCallMgr_CCM_DeviceStatus. Monitors the status of gateways within a cluster. Possible statuses include registered, unregistered, rejected, and unknown.
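The registered-phones guidance earlier in this section names two conditions worth alerting on: the count falling below a threshold, and the count decreasing rapidly between checks. The sketch below shows one way to evaluate both from two consecutive samples. The function name, alert labels, and the 10 percent default for a "rapid" drop are hypothetical; the text does not specify a percentage.

```python
from typing import List


def phone_registration_alerts(previous: int, current: int,
                              min_registered: int,
                              max_drop_pct: float = 10.0) -> List[str]:
    """Compare two consecutive registered-phone counts and list any alert conditions."""
    alerts = []
    if current < min_registered:
        alerts.append("below_threshold")
    # Guard against division by zero when no phones were registered previously.
    if previous > 0:
        drop_pct = 100.0 * (previous - current) / previous
        if drop_pct >= max_drop_pct:
            alerts.append("rapid_drop")
    return alerts
```

With samples taken every 15 minutes, as recommended above, a drop from 1,000 to 850 phones yields a "rapid_drop" alert even though the count is still above a minimum of 800.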
Gatekeeper registration. You should periodically (every five minutes should do) verify that the CallManager is registered with the gatekeeper. Run CiscoCallMgr_CCM_DeviceStatus to monitor the status of gatekeepers within a cluster. Possible statuses include registered, unregistered, rejected, and unknown.
Calls in progress. When a phone goes off hook, it is a call in progress until it goes back on hook. If all calls that are in progress are connected, the number of calls in progress and the number of active calls will be the same. For capacity-planning purposes, you should establish an upper-limit threshold for the number of calls that can be in progress. Run CiscoCallMgr_CallsInProgress every five minutes over a period of time. Then run the AvgValueByHr report script to graph the data streams that will help you decide what constitutes the calls-in-progress threshold. Once you've established your baseline, you can configure the CallsInProgress script to alert you when the number of in-progress calls exceeds Cisco sizing guidelines.
Active calls. Active calls are those that have a voice path connected. For capacity-planning purposes, you should establish an upper-limit threshold for the number of active calls. Run CiscoCallMgr_CallsActive every five minutes over a period of time. Then run the CiscoCallMgr_Report_CallsByHour report script to graph the data streams that will help you decide what constitutes the active-call threshold. Once you've established your baseline, you can configure the CallsActive script to alert you when the number of active calls exceeds your system's capacity. To gather additional information about active calls, run NetworkDevice_ISDNDChannelUtil to monitor total gateway call activity.
Attempted calls. You should monitor attempted calls over time and use the collected data to compute the Busy Hour Call Attempts (BHCA) value. Run the CiscoCallMgr_CallActivity script every 15 minutes to gather the data and then run CiscoCallMgr_Report_CallsByHour to graph the data.
Completed calls. A completed call is an active call that completed without an abnormal termination code. You should monitor completed calls over time and use the collected data to compute the Busy Hour Call Attempts (BHCA) value. Run the CiscoCallMgr_CallActivity script every 15 minutes to gather the data and then run CiscoCallMgr_Report_CallsByHour to graph the data.
Active PRI channels. Collection of this data over time can help you understand call patterns and busy hour peak calls. You can use baseline data to detect real-time underutilization of circuits, which is an indication of possible system performance degradation (including hard-to-detect PSTN call routing or circuit-down conditions). Data trending helps you plan for circuit growth and provisioning. Several AppManager Knowledge Scripts can provide the information that you need:
CiscoCallMgr_MGCP_PRI_Channels. Monitors MGCP PRI devices for the number of currently active and out-of-service channels. PRIs can be grouped into logical Trunk Groups for thresholding across multiple PRIs. The PRIs are generally grouped by any combination of carrier, local, long distance, international, etc.
CiscoCallMgr_MGCP_PRI. Monitors calls completed and outbound busy attempts for MGCP PRI devices and also the status of the PRI D-Channel.
CiscoCallMgr_MGCP_T1CAS_Channels. Monitors MGCP T1 devices for the number of currently active and out-of-service channels. T1s can be grouped into logical Trunk Groups for thresholding across multiple T1s. The T1s are generally grouped by any combination of carrier, local, long distance, international, etc.
CiscoCallMgr_MGCP_T1CAS. Monitors calls completed and outbound busy attempts for MGCP T1 devices.
CiscoCallMgr_H323_CallsAttempted. Monitors the number of calls attempted by an H.323 device during an interval.
CiscoCallMgr_H323_CallsInProgress. Monitors the number of calls in progress by an H.323 device.
NetworkDevice_ISDNBChannelUtil. Monitors total gateway PRI channels in use and E1 interface channels in use.
NetworkDevice_InterfaceHealth. Monitors the parent resource for the interfaces on a network device.
CiscoCallMgr_CCM_PRIChannels. For CallManager 3.1 and above, monitors the number of active PRI voice channels and PRI spans in service.
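The attempted-calls guidance earlier in this section recommends collecting call attempts every 15 minutes and using them to compute a BHCA value. One common reading of that calculation is the largest number of attempts seen in any 60 consecutive minutes, which the sketch below computes with a sliding window over the 15-minute samples. The function name and input format are hypothetical; the data would come from whatever export of the collected call-attempt counts you have available.

```python
from typing import Sequence, Tuple


def busy_hour_call_attempts(attempts_per_interval: Sequence[int],
                            intervals_per_hour: int = 4) -> Tuple[int, int]:
    """Return (peak hourly attempts, index of the first interval of that hour).

    `attempts_per_interval` holds call attempts counted in each 15-minute
    interval, in time order; four consecutive intervals make one hour.
    """
    k = intervals_per_hour
    if len(attempts_per_interval) < k:
        raise ValueError("need at least one full hour of samples")

    # Start with the first hour, then slide the window one interval at a time.
    window = sum(attempts_per_interval[:k])
    best_total, best_start = window, 0
    for start in range(1, len(attempts_per_interval) - k + 1):
        window += attempts_per_interval[start + k - 1] - attempts_per_interval[start - 1]
        if window > best_total:
            best_total, best_start = window, start
    return best_total, best_start
```

Because the window slides in 15-minute steps, the busy hour found this way does not have to start on a clock hour, which usually gives a slightly higher and more realistic peak than hourly buckets.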
In-service PRI spans. The total number of in-service PRI spans should remain constant, although the number of circuits may vary whenever a new circuit is provisioned or an existing circuit is disconnected. Run CiscoCallMgr_CCM_PRIChannels every five minutes and set it to alert you when the number of in-service spans falls below an acceptable level. In addition, run NetworkDevice_InterfaceHealth to gather further information about the parent resources for the interfaces that you are monitoring.
Port status (FXO, FXS, and Analog). Make sure that your monitoring efforts include watching call activity through your FXO, FXS and Analog ports, as well as knowing when the ports become inactive. The total number of in-service ports should remain fairly constant. Run the following AppManager scripts to monitor active and in-service ports, completed calls, and outbound busy attempts:
CiscoCallMgr_MGCP_FXO. Monitors completed calls and outbound busy attempts for MGCP FXO devices.
CiscoCallMgr_MGCP_FXS. Monitors completed calls and outbound busy attempts for MGCP FXS devices.
CiscoCallMgr_AnalogPortsActive. Monitors the number of currently active analog ports.
CiscoCallMgr_AnalogPortsOutOfService. Monitors the number of analog ports out of service.
CiscoCallMgr_CCM_FXOPorts. For CallManager 3.1 and above, monitors the number of active and in-service FXO ports.
CiscoCallMgr_CCM_FXSPorts. For CallManager 3.1 and above, monitors the number of active and in-service FXS ports.
Active Conference Bridge calls. Conference Bridge, software that helps users set up conference calls, ships with the CallManager software and allows for two different types of conference dial-in procedures: Meet-Me and Ad-Hoc. Conference Bridge works with either multicast or unicast conference devices, but in each case, you must configure in advance the maximum number of audio streams that will have to be supported for a call. You should monitor Conference Bridge conferences and streams in real time to identify under- and over-utilization, to ensure that users are able to set up and complete conference calls when desired, and to confirm that conference devices are configured to meet demands for audio streams. Five AppManager scripts can provide all of the data that you need:
CiscoCallMgr_ConfBridgeActiveConf. Monitors the number of active conferences for a Conference Bridge.
CiscoCallMgr_ConfBridgeActiveStreams. Monitors the number of active streams for a Conference Bridge.
CiscoCallMgr_ConfBridgeAvailStreams. Monitors the number of available streams for a Conference Bridge.
CiscoCallMgr_ConfBridgeConferences. Monitors the number of conferences completed during an interval.
CiscoCallMgr_ConfBridgeStreams. Monitors the number of streams on conferences completed during an interval.
Available Conference resources. Run CiscoCallMgr_ConfBridgeAvailStreams to alert you if the number of available Conference Bridge streams falls below the minimum acceptable level. If the number of available streams frequently falls below the acceptable level, consider adding more Conference Bridge resources.
Active transcoding resources. With calls coming into your network from the PSTN and from other VoIP networks, you may see some problems with codec incompatibility. Among the resources CallManager allocates is a transcoding resource that allows IP phones using different codecs to communicate transparently. Transcoding is particularly useful if bandwidth is tight and restrictions are being placed on certain network segments to limit codec usage to the lower-bandwidth codecs. For example, a call placed using a low-bandwidth codec may be transferred to a voicemail system that requires a G.711 (high-bandwidth codec) data stream. In such a case, the lack of a transcoder can mean a dropped or failed call. Run the following scripts every three minutes to monitor active resources and to be notified should the number of available resources fall below an acceptable level:
CiscoCallMgr_TranscoderResources. For CallManager 3.1 and above, monitors active and available transcoder resources on all transcoder devices registered to a CallManager.
CiscoCallMgr_Transcoder_Device. Monitors an individual transcoder device for active resources and available resources. This script also monitors whether the transcoder device ran out of resources at any time during the specified interval.
CiscoCallMgr_TranscoderUnavailable. For CallManager 3.1 and above, monitors the number of times during the interval that a CallManager attempted to allocate a transcoder resource when none was available.
Media Termination Points (MTPs). Available on some Cisco switches, the MTP application supports call hold and transfer for H.323 endpoints and PSTN phones, which wouldn't otherwise be able to hold or transfer calls on a VoIP network. MTPs work by acting as proxies, keeping the call on hold alive on the non-supportive endpoints while communicating information about the call's location to the party at the other end of the call. Without MTPs, many incoming calls placed on hold or transferred by a telephone user are dropped. Because you can't predict how many incoming calls will need MTP resources at any point, it's a good idea to keep records of how many active streams each MTP has to support at certain times of the day, how often MTPs are requested, and how often these requests go unfulfilled due to call volumes.
CiscoCallMgr_MTP_Device. Monitors an individual MTP device for active and available resources.
Music on Hold (MOH) Servers. A plug-in installed during CallManager installation, the MOH server allows users to hear music while they're waiting on hold. MOH won't work unless you also configure the CallManager server to use the MOH streams generated by the MOH server. The MOH Audio Translator application can transform a given .mp3 audio file into MOH audio source files formatted for each of the four supported codec types. Based on a source ID that identifies the type of codec making the MOH request, source files are then sent in streaming (UDP) format to the proper port. The MOH server has several Windows 2000 performance counters to monitor, and you'll also want to know if any MOH requests end in failed connections, indicating a configuration mismatch between the server and the CallManager. The IP Voice Media Streaming application that enables MTPs and unicast conference bridges also enables the MOH server, so make sure you receive an alert if it goes down for any reason.
CiscoCallMgr_MOHDevice. Monitors the number of currently active and available resources of Music On Hold devices.
CiscoCallMgr_MOHServer_LostConnections. Monitors the number of times during the specified interval that a Music On Hold server lost connections with CallManager.
Available bandwidth. Voice traffic requires a specific amount of bandwidth that depends on the codec. G.711 requires 64 kbps of voice payload in each direction of a bidirectional call, before packet header overhead is counted. G.723 and G.729 require significantly less bandwidth because they compress the voice stream, but congestion can still severely impact call quality. Each time you add a new application to the mix on your network, you risk oversubscribing certain links. Congestion will almost certainly affect overall call performance, particularly if data loss or excess latency occurs, and voice is susceptible to catastrophic degradation when the network is oversubscribed. Ensure that you have adequate bandwidth, and that you know when bandwidth availability is low, by running the following scripts every five minutes:
CiscoCallMgr_LocationBandwidth. Monitors the current available bandwidth for a Cisco CallManager location.
NetworkDevice_SingleWANLink_Util. Monitors a single WAN (serial, T1, or T3) link on a network device.
NetworkDevice_WANLink_Util. Monitors WAN (serial, T1, or T3) links on a network device.
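To make the bandwidth discussion above concrete, the following minimal Python sketch estimates the one-way IP bandwidth of a single call. It is not part of AppManager. The function name and its defaults are illustrative assumptions: a 20 ms packetization interval and 40 bytes of IP, UDP, and RTP headers per packet, with Layer 2 overhead ignored.

```python
def voip_bandwidth_kbps(codec_kbps, packet_ms=20, header_bytes=40):
    """Estimate one-way bandwidth (kbps) for a single VoIP call.

    Assumptions (not from the white paper): 20 ms of audio per packet and
    40 bytes of IP (20) + UDP (8) + RTP (12) headers; Layer 2 framing and
    header compression are ignored.
    """
    packets_per_second = 1000 / packet_ms
    # Bytes of encoded audio carried in each packet.
    payload_bytes = codec_kbps * 1000 / 8 * packet_ms / 1000
    return (payload_bytes + header_bytes) * 8 * packets_per_second / 1000


def calls_supported(link_kbps, per_call_kbps):
    """Whole number of simultaneous one-way call streams that fit on a link."""
    return int(link_kbps // per_call_kbps)


g711 = voip_bandwidth_kbps(64)   # 64 kbps codec payload
g729 = voip_bandwidth_kbps(8)    # 8 kbps codec payload
print(g711, g729)
```

Under these assumptions a G.711 stream needs noticeably more than its 64 kbps payload rate, which is why link utilization thresholds should be set with header overhead in mind.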
IP phone functionality. You should monitor IP phones for their registration status, the validity of their dial tones, jitter, latency, and lost packet count. By frequently checking CallManager Call Detail Records (CDRs) and Call Management Records (CMRs), you'll gain access to valuable information about call metrics and call quality. CallManager writes CMRs only for Cisco IP phones and for gateways that use the MGCP (Media Gateway Control Protocol) to interface with CallManager. CallManager doesn't keep these records by default; do the following to start collecting them:
a. In Cisco CallManager Administration, select Service > Service Parameters > CallManager.
b. To enable the generation of CDRs, set CDREnabled to T.
c. To enable the generation of CMRs, set CallDiagnosticsEnabled to T.
The following AppManager scripts provide the IP phone monitoring capability you need:
CiscoCallMgr_RegHardwarePhones. Monitors the number of registered hardware phones.
CiscoCallMgr_CCM_PhoneCheck. Monitors for new and missing phones and events with directory number or description of phones.
CiscoCallMgr_CCM_LossOfHardwarePhones. Monitors for loss of hardware phones and events based upon configured threshold.
CiscoCallMgr_CallQuality. Monitors calls recorded in the CallManager database on the Publisher for jitter, latency and lost data. This script checks CMRs periodically for lost packets, jitter, and latency, all of which can degrade the quality of voice transmission and lead to user complaints. Latency is the most important statistic to track. The CMR estimates latency for a call based on differences in the Network Time Protocol (NTP) timestamps in the RTP headers added to each packet by the sender and the receiver. Latency for a VoIP call in a single direction should be below 140-150 ms, or call quality noticeably deteriorates.
AppManager will generate an event with the full CDR record that includes source number, destination number, duration of call, failure cause code, and the latency, loss, and jitter metric values averaged for that call.
CiscoCallMgr_CallFailures. Monitors calls recorded in the CallManager database on the Publisher for calls that ended with an abnormal termination code.
AppManager will generate an event with the full CDR record that includes source number, destination number, duration of call, failure cause code, and the latency, loss, and jitter metric values averaged for that call.
CiscoCallMgr_CCM_DeviceStatus. Monitors the status of key devices within a cluster. Possible statuses include registered, unregistered, rejected, and unknown.
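The kind of per-call threshold check that CiscoCallMgr_CallQuality performs can be sketched in a few lines of Python. This is only an illustration, not the Knowledge Script's implementation. The record type and field names are hypothetical, and the limits (150 ms latency, 45 ms jitter, 3% loss) are assumed from the values listed for Call Quality in Appendix B.

```python
from dataclasses import dataclass


@dataclass
class CallMetrics:
    """Hypothetical per-call averages, as they might be read from a CMR."""
    latency_ms: float
    jitter_ms: float
    loss_pct: float


# Assumed limits; adjust them to the thresholds configured in your environment.
THRESHOLDS = {"latency_ms": 150.0, "jitter_ms": 45.0, "loss_pct": 3.0}


def quality_violations(call, thresholds=THRESHOLDS):
    """Return the names of the metrics that exceed their limits."""
    return [name for name, limit in thresholds.items()
            if getattr(call, name) > limit]


call = CallMetrics(latency_ms=180.0, jitter_ms=10.0, loss_pct=0.5)
print(quality_violations(call))
```

A call with no violations returns an empty list, so a caller can raise an event only when the list is non-empty.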
Cisco CallManager CDR Reporting and Analysis. The AppManager for Call Data Analysis module enables customers to collect and report on call data records (CDRs) produced by VoIP systems such as Cisco CallManager. These records usually contain information such as call origination, call destination, call duration, and call termination status. Most VoIP systems also provide information about the quality of the calls they process, including metrics such as jitter and latency, as well as the number of packets that were sent, received, and lost. With Call Data Analysis, customers can create and schedule detailed reports, using AppManager Knowledge Scripts that analyze the traffic represented by the CDR data. Sample reports include Call Volume Report, Call Success Rate Report, Call Completion Rate Report, Call Failure Cause Report, and Call Quality Report.
IP Gateway Health
We suggest that you constantly monitor VoIP gateways for availability, CPU statistics, memory usage, and link utilization. Run the following AppManager scripts to gather all of the necessary data:
NetworkDevice_Chassis_Usage. Monitors the physical chassis of a network device.
NetworkDevice_Interface_Health. Monitors the interfaces on a network device.
NetworkDevice_LANLink_Util. Monitors the LAN links on a network device.
NetworkDevice_WANLink_Util. Monitors the WAN (serial, T1, or T3) links on a network device.
QoS Monitoring
In order for VoIP users to receive an acceptable level of voice quality, VoIP traffic must be given priority over other kinds of network traffic, such as data. The main goal of Quality of Service (QoS) is to ensure that VoIP traffic receives the preferential treatment it deserves, thereby reducing or eliminating the delay of voice packets that travel across a network. You should monitor the following metrics that affect VoIP call quality:
Delay. The end-to-end delay, or latency, as measured between endpoints is a key factor in determining VoIP call quality.
Jitter. Jitter, also called delay variation, is known to adversely affect call quality. It indicates the variance of the arrival rate of datagrams sent during a simulated VoIP call.
Jitter buffer loss. Jitter buffer loss is the amount of data that is lost when jitter exceeds what the jitter buffer can hold. Jitter buffer loss affects call clarity, which affects the overall call quality.
Packet loss. When a datagram is lost during a VoIP transmission, you can lose an entire syllable or word in a conversation. Data loss can therefore severely impair call quality.
MOS. The MOS (Mean Opinion Score) is an overall score representing the quality of a call. The MOS is a number between 1 and 5. A MOS of 5 is excellent; a MOS of 1 is unacceptably bad. By comparing your real network metrics with the subjective MOS, you can understand which network factor is affecting voice quality.
R-value. Defined by ITU (International Telecommunication Union) Recommendation G.107, the E-model is a complex calculation, the output of which is a single score called an R-value that is derived from delays and equipment impairment factors. R-values range from 100 (excellent) to 0 (poor). An R-value can be mapped to an estimated MOS, which can be calculated directly from the R-value.
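The mapping from R-value to estimated MOS can be written as a short function. The sketch below uses the conversion formula published in ITU-T Recommendation G.107; it is offered as an illustration and is not necessarily the exact calculation AppManager performs. Note that this formula caps the estimated MOS at 4.5, so an estimated score never reaches the top of the 1-to-5 subjective scale.

```python
def r_to_mos(r):
    """Convert an E-model R-value to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    # Cubic mapping defined for 0 < R < 100.
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


for r in (50, 70, 93.2):
    print(r, round(r_to_mos(r), 2))
```

With this mapping, an R-value of 70 corresponds to an estimated MOS of roughly 3.6, and an R-value of 93.2 to roughly 4.4.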
Several AppManager scripts simulate a VoIP call between Performance Endpoints. After simulating a call, the scripts can gather data about some or all of the QoS metrics as they relate to your network:
VoIPQuality_CallPerf_G711a. Simulates a VoIP call between endpoints using the G.711a codec, which is the ITU standard for H.323-compliant codecs. Uses the A-law for compression, a popular standard in Europe.
VoIPQuality_CallPerf_G711u. Simulates a VoIP call between endpoints using the G.711u codec, which is the ITU standard for H.323-compliant codecs. Uses the U-law for compression, the most frequently used method in North America.
VoIPQuality_CallPerf_G723.1-ACELP. Simulates a VoIP call between endpoints using the G.723.1-ACELP codec, which uses the conjugate structure algebraic code excited linear predictive compression (ACELP) algorithm.
VoIPQuality_CallPerf_G723.1-MPMLQ. Simulates a VoIP call between endpoints using the G.723.1-MPMLQ codec, which uses the multipulse maximum likelihood quantization (MPMLQ) compression algorithm.
VoIPQuality_CallPerf_G726. Simulates a VoIP call between endpoints using the G.726 codec, which is a waveform codec that uses Adaptive Differential Pulse Code Modulation (ADPCM). ADPCM is a variation of pulse code modulation (PCM), which only sends the difference between two adjacent samples, producing a lower bit rate.
VoIPQuality_CallPerf_G729. Simulates a VoIP call between endpoints using the G.729 codec, which is a high-performing codec that offers compression with high quality.
VoIPQuality_CallPerf_G729A. Simulates a VoIP call between endpoints using the G.729A codec, which is a reduced-complexity version of the G.729 codec. Developed for simultaneous voice and data applications for which the G.729 codec was too complex. Speech quality is virtually indistinguishable between G.729 and G.729A.
Many other Knowledge Scripts simulate a VoIP call between Cisco SAA-enabled routers. The VoIPQuality_CiscoSAA scripts simulate calls using the same codecs as the VoIPQuality_CallPerf scripts. And finally, one more Knowledge Script, CiscoCallMgr_CallQuality, monitors calls recorded in the CallManager database on the Publisher for jitter, latency and lost data.
Layer 2 and 3 Switches

We highly recommend that you continually monitor Layer 2 and Layer 3 switches for switch failures, card failures (such as reboots, crashes), memory utilization, CPU utilization, power supply status, temperature status, fan status, QoS parameters, and IP phone port status. Three AppManager scripts provide the monitoring capability you need:
NetworkDevice_Chassis_Usage. Monitors the physical chassis of a network device.
NetworkDevice_Interface_Health. Monitors the interfaces on a network device.
NetworkDevice_LANLink_Util. Monitors the LAN links on a network device.
Reporting
AppManager collects data about the performance of IP telephony and stores it in the AppManager repository, a SQL server database. You can access this data in real-time or historically for all of your reporting needs.
Real-time
The AppManager Chart Console lets you generate and view charts of data streams generated by Knowledge Script jobs. As the jobs run, the data streams in the charts are continually updated with new information. The Chart Console provides key data that you can use instantly to manage and troubleshoot your Cisco IPT environment. You can use the AppManager GUI- or Web-based Chart Console to view collected data in real-time at regular intervals as low as one minute. Viewing of data can be organized and segmented by data stream and access to charts can be restricted by AppManager user login. All data displayed in charts can be easily viewed using AppManager Report scripts and selecting the desired data stream. In addition, AppManager ships the Chart2HTML Knowledge Script, which allows you to easily convert charts to Reports.
Historical
We recommend that you collect trending information whenever and wherever possible. Trending information should contain at least maximum and average values, which can then be used to define above average and peak thresholds for the different parameters. The threshold should be defined if possible using the average and maximum values observed during the busy hour of the day in order to avoid unnecessary alerts.
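As a rough illustration of this recommendation, the Python sketch below derives an average and a maximum from samples collected during the busy hour and uses them as "above average" and "peak" thresholds. The function names and the sample values are hypothetical, and a real deployment might add a margin on top of these baselines.

```python
from statistics import mean


def busy_hour_baseline(samples):
    """Return the average and maximum of the samples seen during the busy hour."""
    if not samples:
        raise ValueError("at least one sample is required")
    return {"average": mean(samples), "maximum": max(samples)}


def classify(value, baseline):
    """Compare a new reading against the busy-hour baseline."""
    if value > baseline["maximum"]:
        return "peak"
    if value > baseline["average"]:
        return "above average"
    return "normal"


# Hypothetical CPU utilization samples (%) collected during the busy hour.
baseline = busy_hour_baseline([40, 55, 70, 75])
print(baseline, classify(80, baseline))
```

Because the baseline comes from the busiest period, readings during quieter hours rarely cross it, which is how this approach avoids unnecessary alerts.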
Compiling the Collected Data

AppManager reports are generated using Report Knowledge Scripts. AppManager ships with dozens of Report scripts to generate HTML reports based on any type of collected data. Access to reports can be restricted using MS IIS web site directory security. The following is a list of frequently used generic report Knowledge Scripts:
AggValueHistory. Generates a report from data in the archive and aggregate tables.
AvgMaxMinValue. Displays the average, maximum, and minimum values of the data stream(s) collected by a Knowledge Script within a specified time frame.
AvgValueByDay. Details the average daily value of data streams collected by Knowledge Script jobs.
AvgValueByHr. Displays the average values by hour of the data stream(s) collected by a Knowledge Script within a time range.
AvgValueByMin. Displays the average values by minute of the data stream(s) collected by a Knowledge Script within a time range.
Analyzing Call Activity

This section introduces a formal process that you can adapt to your organization in order to establish baseline and trend information and to act upon the information collected this way. It is important to note that unexpected additions to the number of phones, sidecars (7914s), gateways, applications (IP Softphones, Web attendants, IP Manager Assistants, etc.), or any other changes might affect system utilization. Therefore, it is important to document such plans so that the collected data can be analyzed and used for planning. Here is an example of the steps entailed in a post-analysis of collected data:
1. Analyze the Call Detail Records (CDRs) and/or relevant performance counters to determine attempted calls (CA) and completed calls (CC). If using CDRs, group the data into time slots (for example, 15 minutes' worth of data at a time). Compute the HCA (hourly CA) and the HCC (hourly CC) for every time slot of data. (For example, to get the hourly data, multiply the numbers found for 15 minutes by 4.)
2. Using the data above, you can determine:
The busiest hour during the day (all days);
The busiest hour during the week (all weeks);
The top three busiest days of the year.
"Busiest" can be defined in terms of HCA, HCC, or even talk-minutes if you are looking at the problem from a cost perspective. If you are using 15-minute time slots for data analysis, finding the weekly busiest hour of call attempts (BHCA) means finding the four consecutive time slots that have the highest total value of HCA. The result could be, for example: on Tuesday from 9:45am to 10:45am, with a BHCA value of 1,252.
3. Once you've figured out your busiest hours (daily, weekly) and your top three busiest days (yearly), use the [data] collected during:
the busiest hour during the day (averaged),
the busiest hour during the week (averaged), and
the busiest hour of the busiest days (considered as a peak value)
Note: [data] could be any of the parameters highlighted in this document (total virtual memory used, virtual bytes used by CCM.exe, etc.), so for each [data] element you want to baseline, you'll obtain three values. You can use the data you've collected to plan system upgrades or to analyze whether your system has a good chance of sustaining periods of high usage. For example, if you are planning to add more phones to your system within six months, monitor the call activity using the busiest hour of the busiest day for one month and divide by the current number of phones to determine the peak call activity per phone.
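The busy-hour search and the per-phone calculation described in the steps above can be sketched in Python as follows. This is an illustrative sketch only. It works on raw call-attempt counts per 15-minute slot, so the total of four consecutive slots is already an hourly figure. The function names, the sample counts, the phone numbers, and the linear scaling used in the projection are all assumptions.

```python
def busiest_hour(slot_counts, slots_per_hour=4):
    """Return (start_index, total) for the consecutive slots with the highest total."""
    if len(slot_counts) < slots_per_hour:
        raise ValueError("need at least one full hour of slots")
    window = sum(slot_counts[:slots_per_hour])
    best_start, best_total = 0, window
    for start in range(1, len(slot_counts) - slots_per_hour + 1):
        # Slide the window: add the slot entering it, drop the slot leaving it.
        window += slot_counts[start + slots_per_hour - 1] - slot_counts[start - 1]
        if window > best_total:
            best_start, best_total = start, window
    return best_start, best_total


def peak_calls_per_phone(bhca, phone_count):
    """Peak call attempts per phone during the busy hour."""
    return bhca / phone_count


def projected_bhca(bhca, current_phones, planned_phones):
    """Project the busy-hour load, assuming it scales linearly with phone count."""
    return peak_calls_per_phone(bhca, current_phones) * planned_phones


# Hypothetical call attempts per 15-minute slot.
slots = [200, 250, 300, 310, 320, 322, 280, 240]
start, bhca = busiest_hour(slots)
print(start, bhca, projected_bhca(bhca, 500, 600))
```

For the sample data, the busiest hour starts at the third slot (index 2) with 1,252 call attempts. Mapping the slot index back to a clock time depends on when the first slot begins.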
Conclusion
Cisco CallManager is an excellent choice for your IP telephony implementation. But as with any sophisticated system, there may be a few hurdles along the way to your goal of a VoIP network with five-nines reliability. Network hardware and links go down; software applications, services, and processes consume limited CPU and memory resources; intruders interfere with administrative files and records. Cisco has worked hard to ensure the Cisco CallManager system will be as reliable as the telephone networks we all take for granted. Keeping the network running perfectly, all the time, requires proactive management and a good understanding of the various system components, including the operating systems, databases, and servers that support Cisco CallManager. An intelligent deployment and ongoing monitoring practice are required to keep Cisco CallManager and its associated software and hardware operational, efficient, and reliable, a task that can be quite time-consuming. NetIQ AppManager for VoIP software and Knowledge Scripts provide the strategic and tactical tools to support the monitoring tasks required for company-specific SLAs.
Appendix A: Supported Environments

AppManager modules support the following platforms:
Module: Supported platforms
Cisco CallManager: Cisco CallManager 3.0(x), 3.1(x), 3.2(x), 3.3(x), 3.4(x), 4.0(x)
Cisco Intelligent Contact Manager (ICM): Cisco ICM 4.6 or later
Cisco ICS: Cisco Integrated Communication System 7750
Cisco IP Interactive Voice Response (IP IVR): Cisco IP IVR 2.2 or later
Cisco IP/TV: Cisco IP/TV 3.2 or later
Cisco Personal Assistant: Cisco Personal Assistant 1.2 or later
Cisco Unity: Cisco Unity 3.0(2), 3.1(x) and 4.0(x)
Cisco Unity Bridge: Cisco Unity Bridge 2.1 or later
Compaq Insight Manager: Cisco Media Convergence Server (MCS) 7800 series; Compaq Insight Manager agent 3.2 or later
Dell OpenManage: Dell PowerEdge servers running Dell OpenManage version 3.1 or later
H.323 Call Setup: Microsoft Windows 2000 SP2, or Windows NT 4.0 SP6a
Lotus Domino Unified Messaging: Domino Server 4.5, 4.6 or 5.0
Microsoft Exchange Unified Messaging: Exchange Server 5.0, 5.5 and Exchange 2000 Server
Network Devices: Cisco Systems switches, routers and gateways, including VG200/248; Nortel BayStack switches, models 460 and above; Nortel Networks routers, BayRS v14 and above; Nortel Access Stack Node (ASN) Series; Nortel Backbone Concentrator Node (BCN) Series; Nortel Backbone Link Node (BLN) Series; Nortel Backbone Node (BN) Series; Nortel Passport Advanced Remote Node (ARN) Series; Nortel Passport Series, including 8600 series; Extreme Networks switches using ExtremeWare v6.1.8 and above; Alcatel OmniSwitch/Router 6000 and 7000 Series
SIP Call Setup: Microsoft Windows 2000 SP2
Video Quality: Microsoft Windows 2000 SP2, or Windows NT 4.0 SP6a; Microsoft Windows Media Player version 7.x or later; RealOne Player or RealPlayer G2 or later
VoIP Quality (Call Performance): Microsoft Windows 2000 SP2, Windows NT 4.0 SP6a; Linux for x86; Sun Solaris (x86 and SPARC)
Windows: Microsoft Windows 2000 SP2, or Windows NT 4.0 SP6a
Appendix B: Summary Guidelines

CallManager Server Health CPU CallManager CPU Usage Total CPU CPU for CallManager Process CPU for all other CallManager processes Isolate Process Spikes in CPU Memory Memory by the CallManager Process Total Memory Memory for all CallManager Processes Memory for all other processes Isolate Process Spikes in Memory (Memory Leaks) Physical Memory Virtual Memory Paging Space Paging High Disk Disk Usage Disk Array Status Fans Fan Status Power Supply Battery Status AC Power Status Temperature Server Temperature Status Network Interface Cards NIC Card Bandwidth Utilization Server Up/Down CCM AppManager Application Monitors AppManager 6.0 Services netiQms netiQmc netiQccm CCM SQL (CallManager Database) SQL Accessibility SQL Server Status SQL Transaction Replication Required AM KS CiscoCallMgr_SQL_Accessibility CiscoCallMgr_SQL_ServerDown CiscoCallMgr_SQL_RepTransaction Required AM KS Thresholds War/Crit Polling Interval Data Collection
CiscoCallMgr_CCM_SystemUsage CiscoCallMgr_CCM_SystemUsage CiscoCallMgr_CCM_CpuHigh CiscoCallMgr_CCM_CpuHigh NT_TopCpuProcs CiscoCallMgr_CCM_SystemUsage CiscoCallMgr_CCM_SystemUsage CiscoCallMgr_CCM_MemHigh CiscoCallMgr_CCM_MemHigh NT_TopMemProcs NT_MemUtil NT_MemUtil NT_MemUtil NT_PagingHigh NT_LogicalDiskSpace CIM_DiskArrayFail CIM_FanSummary CIM_UPSBatteryLow CIM_UPSLineStatus CIM_ThermalStatus NT_NetworkBusy
90% 20%
Every 5 Minutes Every 5 Minutes Every 5 Minutes Every 5 Minutes
Y Y
Y Every 5 Minutes Every 5 Minutes Every 5 Minutes Every 5 Minutes n/a n/a 90% 70% Y Y
Y Every 5 Minutes Every 5 Minutes Every 10 Minutes Every 12 Hours Every 12 Hours Y Y
n/a / 80% Down Down
Every 15 Minutes Down Required AM KS Thresholds War/Crit Polling Interval Data Collection
Down Down Down Thresholds War/Crit n/a Down Trans to be Repl. Max # 10 Every 1 Hour Polling Interval Data Collection
CCM Application Services (CallManager Services Health) CallManager Services Cisco Call Service (Cisco Call) Cisco DB Layer Monitor Service (Aupair) Cisco TFTP Service (Cisco Tftp) Cisco IP Voice Media Streaming App Service Cisco Message Interface service Cisco Telephony Call Dispatcher Service (Cisco Telephony Call Dispatcher) Cisco DC Directory Server Service Cisco SNMP Data Collector service Cisco Extension Mobility Logout (CiscoUserLogoutSvc) Cisco MOH Audio Translator service Cisco RIS Data Collector service (Cisco RIS Data Collector) Cisco CDR Insert Service (InsertCDR) CallManager Heartbeat CallManager Keep-Alive CallManager TFTP Status TFTP Requests TFTP Errors Not Found errors Request Aborted Overflow errors CallManager Role Status Primary/Secondary status of all CallManagers CCM_Publisher Primary Sub Secondary Sub CallManager Backup Status burBackup service (burBack) Successful backup CA Arcserve BrightSTOR Agent IIS Server Service Health IIS Service Status Domain Name Service DNS Connectivity Security Failed Logon Attempts
Required AM KS
Thresholds War/Crit
Polling Interval
Data Collection
CiscoCallMgr_CCM_HealthCheck CiscoCallMgr_CCM_HealthCheck CiscoCallMgr_CCM_HealthCheck
Down Down Down n/a n/a Down Down Down n/a n/a Down Down
Every 1 minute Every 1 minute Every 1 minute
Y Y Y
CiscoCallMgr_CCM_HealthCheck CiscoCallMgr_CCM_HealthCheck CiscoCallMgr_CCM_HealthCheck CiscoCallMgr_CCM_HealthCheck
Every 1 minute Every 1 minute Every 1 minute Every 1 minute
Y Y Y Y
CiscoCallMgr_CCM_HealthCheck CiscoCallMgr_CCM_HealthCheck CiscoCallMgr_CCM_Heartbeat CiscoCallMgr_TFTPRequests CiscoCallMgr_TFTPRequests CiscoCallMgr_TFTPErrors CiscoCallMgr_TFTPErrors CiscoCallMgr_TFTPErrors CiscoCallMGR_CCM_RoleStatus
Every 1 minute Every 1 minute
Y Y
30% 30%
Every 30 Minutes Every 30 Minutes
Change in status Every 5 minutes Secondary--> Pri Pri --> Secondary Secondary--> Pri Down fails Down Once a day Once a day
CiscoCallMgr_CiscoBackupStatus CiscoCallMgr_CiscoBackupStatus
CiscoCallMgr_IIS_HealthCheck NT_DNSConnectivity NT_FailedLogons
Down Can be added
Every 1 minute Every 1 Hour Every 1 Hour Y
CallManager Functionality Call Information CallManager Calls Active CallManager Calls In Progress Calls Attempted/Calls Completed (Busy Hour Reporting) Call Quality Packet Loss Jitter Latency CallFailures IP Phone Functionality Loss of HW Phones New or Missing Phones Registered Hardware Phones Status of Critical Phones MGCP Gateway Registration New or Missing Gateways Gateway Registration Status MGCP Gateway Registration Gatekeeper Registration Status MGCP Call Activity FXO calls completed and outbound busy attempts FXO calls active and in-service FXO ports FXS calls completed and outbound busy attempts FXO calls active and in-service FXO ports PRI calls active and out-of-service channels PRI calls completed outbound busy attempts and DChannel Status T1 calls active and out-of-service channels T1 calls completed and outbound busy attempts H323 Call Activity H323 Calls Attempted H323 Calls In Progress Analog Port Activity Analog Ports Active Analog Ports out of Service Music On Hold Resources Status of active and available MOH resources MOH server connection status with CallManager Transcoder Resources Active and available transcoder resources Media Termination Points Active and available MTP resources
Required AM KS
Thresholds War/Crit n/a
Polling Interval
Data Collection Y Y Y Y Y Y Y
CiscoCallMgr_CallsActive CiscoCallMgr_Call_in_Progress CiscoCallMgr_CallActivity CiscoCallMgr_CallQuality CiscoCallMgr_CallQuality CiscoCallMgr_CallQuality CiscoCallMgr_CallFailures CiscoCallMgr_LossOfHardwarePhones CiscoCallMgr_Phone Check CiscoCallMgr_RegHardwarePhones CiscoCallMgr_DeviceStatus CiscoCallMgr_MGCP_GatewayCheck CiscoCallMgr_DeviceStatus CiscoCallMgr_DeviceStatus 3% 45ms 150 ms
Every 5 Minutes Every 5 Minutes Every 5 Minutes Every 5 Minutes Every 5 Minutes Every 5 Minutes Every 5 Minutes Every 5 Minutes Every 30 Minutes
10% 2000 Phones 3 Phones
Y Y Y
CiscoCallMgr_MGCP_FXO CiscoCallMgr_CCM_FXOPorts CiscoCallMgr_MGCP_FXS CiscoCallMgr_CCM_FXsPorts CiscoCallMgr_MGCP_PRI_ Channels CiscoCallMgr_MGCP_PRI CiscoCallMgr_MGCP_T1 CAS_ Channels CiscoCallMgr_MGCP_T1 CAS CiscoCallMgr_H323Calls Attempted CiscoCallMgr_H323CallsIn Progress CiscoCallMgr_AnalogPortsActive CiscoCallMgr_AnalogPortsOutOfService CiscoCallMgr_MOHDevice CiscoCallMgr_MOHServer_Lost Connections CiscoCallMgr_TranscoderResources CiscoCallMgr_MTP_Device
Every 5 Minutes Every 5 Minutes Every 5 Minutes Every 5 Minutes Every 5 Minutes Every 5 Minutes Every 5 Minutes Every 5 Minutes Every 5 Minutes Every 5 Minutes Every 5 Minutes Every 5 Minutes
Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
Cisco CallManager Operational Reports Registered Hardware Phones CallManager Service Availability SW & HW Inventory System Usage CPU Memory and Memory Leaks Virtual Memory (Memory Leaks) CallManager Call Information Calls In Progress Calls Active Busy Hour Calls Attempted Calls Completed across CallManagers Busy Hour Calls Attempted Calls Completed per CallManager CallQuality MGCP Call Information Total Active Channels per PRI/T1 Total Active Calls per MGCP Gateway CiscoCallMgr_Report_CallActivity CiscoCallMgr_Report_CallQuality ReportAM _AvgValByDay CiscoCallMgr_Report_ServicesAvailability ReportAM_Inventory
CiscoCallMgr_Report_SystemUsage CiscoCallMgr_Report_SystemUsage ReportAM _AvgValueByHr
ReportAM _AvgValueByHr CiscoCallMgr_Report_CallsByHour CiscoCallMgr_Report_CallsByHour
CiscoCallMgr_Report_MGCPChannelUsage CiscoCallMgr_Report_GatewayUsage