Module Development
Reference Manual
Supporting
PATROL version 3.x
October 2010
www.bmc.com
Contacting BMC Software
You can access the BMC Software website at http://www.bmc.com. From this website, you can obtain information
about the company, its products, corporate offices, special events, and career opportunities.
United States and Canada
Address: BMC SOFTWARE INC, 2101 CITYWEST BLVD, HOUSTON TX 77042-2827, USA
Telephone: 713 918 8800 or 800 841 2031
Fax: 713 918 8000
Outside United States and Canada
Telephone (01) 713 918 8800 Fax (01) 713 918 8000
Customer support
You can obtain technical support by using the BMC Software Customer Support website or by contacting Customer
Support by telephone or e-mail. To expedite your inquiry, see “Before contacting BMC.”
Support website
You can obtain technical support from BMC 24 hours a day, 7 days a week at http://www.bmc.com/support. From this
website, you can
■ read overviews about support services and programs that BMC offers
■ find the most current information about BMC products
■ search a database for issues similar to yours and possible solutions
■ order or download product documentation
■ download products and maintenance
■ report an issue or ask a question
■ subscribe to receive proactive e-mail alerts when new product notices are released
■ find worldwide BMC support center locations and contact information, including e-mail addresses, fax numbers, and
telephone numbers
License key and password information
If you have questions about your license key or password, contact BMC as follows:
■ (USA or Canada) Contact the Order Services Password Team at 800 841 2031, or send an e-mail message to
ContractsPasswordAdministration@bmc.com.
■ (Europe, the Middle East, and Africa) Fax your questions to EMEA Contracts Administration at +31 20 354 8702, or send
an e-mail message to password@bmc.com.
■ (Asia-Pacific) Contact your BMC sales representative or your local BMC office.
Contents 5
RUNQ Schedule Force Delta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Initial RUNQ Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Normal RUNQ Scheduling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Scheduling a PSL process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Executing a PSL Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Scheduling an OS Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Executing OS Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
One-Time Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Execute vs. System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Connection-Oriented Functions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Sharing a channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Concurrency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Global Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Nonpermanent Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Permanent Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Local Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Named Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Unnamed Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Where Variables are Stored. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Browsing the Namespace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Namespace and Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Context of a PSL Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Executing Inside and Outside the VM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Chapter 4 KM Design 61
What should a KM Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Better productivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Get all info about application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Definition of a good KM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
General Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Non-intrusive Application Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Finding Data Sources. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Extending Existing KMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Portable KM Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Portable agent functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Portability Issues at the Interface with the OS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Portable Areas of PSL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Non-Portable Areas of PSL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Portability examples. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Carriage Return Characters (NT, VMS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
process() function (all platforms). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Launching Child Processes (all platforms). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Child Process Error Handling (all platforms) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Preventing Command Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Child Process popen() Pipes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Launching Daemon Processes (all platforms) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Options to Avoid in KM Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
KM Tracing and Logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Chapter 7 Instances 99
Managing instances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Create function. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Destroy function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Techniques for creating instances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Classic create loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Classic destruction loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Optimizing the create process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Create return code checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
File check. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
Process check . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Create icon for class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Nested instances. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Main map instances. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Dummy application class instances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Determining the parent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
Limiting the number of instances. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Instance creation pitfalls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Limit number of nesting levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Characters you should not use. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Creating invisible instances. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Instance filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Adding instance filtering by using filterlist . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Filtering suggestions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Allow a configurable limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Instance pitfalls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Transient instance/history problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
PSL binaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
Limitations of PSL libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
Libraries must be present at compile time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
Console does not distribute libraries. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
Prediscovery will fail . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
Glossary 227
12 Advanced PATROL Knowledge Module Development Reference Manual
Figures
Java Virtual Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
PATROL Virtual Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
PSL Code Life Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Scheduler Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
RUNQ Scheduler Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Schedule Optimal: Example 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Schedule Optimal: Example 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Schedule Optimal: Example 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Schedule Optimal: Example 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Scheduling a PSL process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Executing a PSL Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Scheduling an OS process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Executing a system Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Executing an execute Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Execute the popen() Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Executing the fopen() function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Sharing a process channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Attributes in the PATROL namespace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Context of a PSL process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Execute Inside or Outside the VM? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Profiler Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Level 1 - Remove Useless Jumps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Level 1 - Reduce Jump Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Level 1 - Remove Redundant Quad . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Level 1 - Pack Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Level 2 - String joining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Level 2 - Fold and propagate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Level 2 - Remove Multiple and unused Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Level 2 - Reorder Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Level 3 - Remove Unreachable Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Level 3 - Reduce Block Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Prediscovery and discovery process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Discovery Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Checking for file updates in discovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Unique features of discovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Discovery pitfalls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Instance creation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Instance destruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Nested Instances without Icon for Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Example for create Icon for class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Nested Instances creation (icon for class) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Different Parameter Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
Text Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Graph Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
State Boolean Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
Stoplight (pre v3.4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Stoplight (post v3.4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Alarm Range settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
Range Overlapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
How the Agent loads libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Creating PSL libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
Nested PSL libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
Modify Binary Config File Phase I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
Modify Binary Config File Phase II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
Modify Binary Config File Phase III . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
Chapter 1 Introduction to KM Development
This chapter covers basic information on developing a PATROL Knowledge Module
(KM).
When you install and run an agent out-of-the-box, it will wait for you to tell it or
configure it to do something. To provide the PATROL Agent with intelligence, you
will write knowledge modules (KMs). Then, when the PATROL administrator loads
the KM, the agent will start executing the code written in the KM.
This book provides developers with information on what they can do with an agent,
as opposed to the agent tasks performed by an operator or administrator. Although
this book does not cover how to install, start, or configure the agent, how to load
KMs, or how the PATROL user interface works, some information on these areas is
presented to help you get started.
All the agent tuning parameters are stored in the PATROL configuration database,
pconfig, which developers have access to. However, it is important that your KM
does not modify variables that influence the agent's operation. For example, a KM
should never automatically modify the /AgentSetup/preloadedKMs value to enforce
persistency, or modify the agent's tuning variables. Changing these values should be
left to the PATROL administrator. These configuration settings influence more
than just the operation of your KM and can change the agent's impact on the system.
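As an illustration, a KM can safely read such settings without rewriting them. A minimal PSL sketch, assuming (as in the examples later in this book) that configuration values are visible in the agent namespace through get():

```
# Read (but never modify) an agent configuration value.
# /AgentSetup/preloadedKMs is the variable discussed above.
preloaded = get("/AgentSetup/preloadedKMs");
if (preloaded == "") {
    print("no KMs are preloaded on this agent\n");
} else {
    print("preloaded KMs: ".preloaded."\n");
}
```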
If you write a KM to monitor a specific application (for example, a Web server), are
you aware that some users might actually run 100 instances of this application? In
addition, these users may not want all the detailed information that you make
available for this single Web server. Maybe they want the ability to toggle between
overview monitoring and detailed monitoring modes.
It is important to think beyond your own environment, since every customer will
consider their own system to be an average system. Also, consider what you offer
concerning optimization. A KM that can potentially be installed on a large number of
systems should be tuned as much as possible. For example, if you can shave 1 percent
off the overall CPU consumption of your KM, this will make a considerable difference
to someone running the KM on 1000 systems.
Once the list of metrics is compiled, the developer should verify that these metrics
make sense. The proper way to do this is to ask someone close to the application
which parameters they would be interested in. This becomes the basic functionality
for the KM. Often, however, the developer presents the full list of parameters and
asks which ones are really not needed. This approach is like walking into a candy
store with a child and asking which items the child would not like to have: you will
probably end up buying a lot more than if you had asked, before going to the store,
what candy you should bring back.
It gets even worse if the developer hasn't properly researched the application and just
starts writing code from the user or API documentation. The amount of data
presented in the KM will be huge, but its value as information will probably be
minimal. It is the task of the developer to be as creative as possible in turning this
data into information without overusing code cycles, CPU cycles, or memory.
Will your KM spawn a response function to warn someone in a certain case (say, the
application is switched to maintenance mode)? How would you feel if you were
sitting behind a terminal and suddenly all 1000 of your computers went into alarm
because the application was in that mode? While you are going through all the
potential failure situations, also check the all-OK case. Do you have any instances
that no one will ever look at, or need reporting data for, while everything is going
well?
For example, suppose you are writing a KM to monitor hardware, and one of the
values indicates the “internal temperature”. When you write your KM, you set the
value of the parameter to 70 C. The next thing you should do is look in the manual for
“operating temperatures” and set the maximum and possibly minimum values. If
you cannot find this in the manual, you should call support for that application to
find out what the nominal values are. Remember, if you do not do it, the customer
will eventually have to, and they will probably not value the KM very highly.
After you define these alarm ranges, you should check whether there are recovery
actions you can take. Most people will require a toggle on any recovery actions you
provide (default off; for demos you want all the bells and whistles enabled). An
example recovery action would be to momentarily turn down the speed of the
hardware so it can cool a bit. Never forget that you have to be able to recover from a
recovery action and return to normal operation after the temperature has dropped.
A customer is usually not interested in whether you can do it, but in how well you
can do it.
Chapter 2 PSL Language and Design
This chapter provides an overview of the PSL language by going over design
decisions. You are expected to know most of the PSL functions that are available. For
information not covered in this chapter, see the PSL Reference Guide. This chapter
contains the following sections:
Why PSL? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Datatypes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Escape Characters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Special /clearText Character Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Standalone interpreter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Endless loop detection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Storing PSL in KM or Separate File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
An Approach to KM Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Phase 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Phase 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Phase 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Why PSL?
PSL was created with the following items as top priorities:
■ Easy to learn.
■ Need for access to shared data. From within one script, one should be able to
access data that other scripts share.
■ The interpreter should support “multiple” scripts running at the same time.
■ A badly written script should not bring down all the other scripts.
Why did BMC Software develop PSL rather than a more common language such as
Perl, C, or Java? Because none of these languages addresses the previously mentioned
“priorities”.
Datatypes
To make the language easier to learn, it was decided to stay away from explicit data
types, and the string was chosen as the single datatype. This has some implications,
because some functions or operators require or work with integer or float values. For
these functions, the input strings are converted to the needed datatype for the
operation and, after the operation completes, stored again as a string.
The function or operator decides how the input should be cast. For example, bitwise
instructions (such as shift left) need two integer values, and the bitwise operator
takes care of the conversion. Operations like +, -, *, / work with double precision
(like almost all the other arithmetic operations).
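A short PSL sketch of the implicit conversions described above:

```
a = "5";
b = "3";
sum  = a + b;     # "+" converts both strings to numbers before adding
text = a.b;       # "." concatenates, treating both values as strings
bits = 4 << 1;    # "<<" casts its operands to integers first
print(sum." ".text." ".bits."\n");
```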
Comparisons
Since all this happens under the hood, you as a developer should normally not be
aware that the internal datatype is actually a string, and the PSL language will
usually work the way you expect it to. However, the moment you should be aware of
how everything is stored is when you want to compare two variables.
PSL will try to decide which type of comparison is necessary when you type:
if (a == b) {...}
if (! a) { a = "EMPTY"; }
If you really want to test for a non-empty string, this code might not do what you
expect it to do. If a="00000000", this is considered a valid number, and therefore the
test if (!a) { a = "EMPTY"; } will succeed although the string is not empty. To force
a non-empty string comparison, one should do the following:
if (a == "") { a = "EMPTY"; }
which is a true empty-string comparison. If you want to compare a certain
null-prefixed number as a string, you can do so by ensuring the number is evaluated
as a string, like this:
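The example itself is elided in this copy; the following sketch shows one common way to force string context, by prefixing both operands with a non-numeric character (an illustrative idiom, not the manual's own code):

```
a = "00000000";
b = "0";
# "x00000000" and "x0" are no longer valid numbers, so "=="
# must fall back to a character-by-character string comparison.
if ("x".a == "x".b) {
    print("equal as strings\n");
} else {
    print("different as strings, though equal as numbers\n");
}
```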
Escape Characters
Inside a string you can specify escape characters by preceding a regular character
with a backslash (\). These escaped characters are recognized at compile/parse time
of your PSL script; that is, even before the string is stored internally.
The control characters can be very useful if you want to serialize data stored in a PSL
variable (for example, before storing it in a configuration variable). For example:
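The example is elided in this copy; a sketch of what such serialization could look like, using a tab as the separator (the field names are hypothetical):

```
# Pack three values into one string with "\t" so that commas in
# the data cannot collide with pconfig's comma delimiter.
rec = host."\t".port."\t".status;

# Reading the variable back, split on the same character:
host   = ntharg(rec, 1, "\t");
port   = ntharg(rec, 2, "\t");
status = ntharg(rec, 3, "\t");
```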
When you read the variable back, you can easily undo the serialization, as a
workaround for the comma delimiter that is hardcoded in pconfig lists.
Sequences that change the color or appearance, or any other VT100 terminal
sequences, will not be recognized. When you do a:
print(get("/clearText"));
you will print the ANSI terminal clearscreen escape sequence and this is recognized
by the text output window.
From the standalone interpreter, the same sequence can be printed directly:
%PSL printf("%c[H%c[2J",27,27);
Variables
Before a variable is used in PSL, the interpreter makes sure the variable is
initialized. All variables are automatically initialized to the empty string "".
Although the PSL manual sometimes mentions that a certain function returns NULL,
this is actually an impossible return value in PATROL. A variable can never contain
NULL; the best PSL can do is an empty string.
Some people wonder whether there is a limit to the amount of data one can store in a
variable. This size is limited only by the memory the interpreter can get from the OS.
Memory allocation for variables is dynamic, and the memory is automatically
released upon termination of the process.
Since variables are stored as strings, you will not be able to store null-byte
characters (\0). The interpreter sees a null byte as a string terminator. This is
something you have to be aware of when you are reading binary data from a file that
can contain null bytes.
For every PSL process, three variables are automatically defined: exit_status,
errno, and PslDebug. These are explained in later chapters.
Functions
When you call a function, you pass a copy of the value to the function (also known
as call by value). There is only one exception to this: the PslExecute() function will
actually modify the arguments of the function. A function that returns data will
always return a copy of the data.
For each function, you can define a set of local variables. The number of local
variables is limited to 20 per function.
If a main() function is defined, this main function is considered the starting point
of execution. That means that all loose code will be ignored. If your loose code
contains initializations, they will not be executed. This usually results in a lot of
confusion and misunderstanding.
A good coding style is to always have a main() function for every decent-sized PSL
script. Don't forget to add your initialization routine to the main function in that case.
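A sketch of that style in PSL (the function names and namespace path are illustrative, not from the manual):

```
function init() {
    # Initialization that would otherwise sit as loose code goes
    # here; once main() is defined, loose code never runs.
    if (get("/myKM/pollInterval") == "") {
        set("/myKM/pollInterval", "60");
    }
}

function main() {
    init();
    print("collector started\n");
}
```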
Standalone interpreter
Some people don't realize that the PSL language can also be used outside of the
PATROL agent. Whenever you install an agent, you will see a standalone PSL
compiler and interpreter. This interpreter has a couple of limitations that you should
be aware of.
Endless loop detection
Endless loop detection in the agent is governed by two tuning variables:
■ pslInstructionMax
■ pslInstructionPeriod
NOTE
These variables and the endless loop detection are not applicable to PATROL Agent
versions 3.8.00 and later.
The pslInstructionPeriod is a global timer for the agent. That means that every
pslInstructionPeriod seconds, the instruction counters of all PSL processes that have
not yet reached pslInstructionMax are reset to zero. However, if your process has
already reached pslInstructionMax, its counter is not reset and just continues to
increase.
When a process is detuned, the agent calculates the length of the delay as a function
of the total number of instructions compared to the setting of pslInstructionMax.
Once a process gets on the “blacklist”, it will not be removed again, so you should
make sure you don't end up on that list. If you find that one of your PSL processes
executes more than is allowed on a typical agent, you have to find out whether the
logic of your process can be changed so that it does not execute so many PSL
functions.
Before your program starts executing, the agent cannot know whether it contains an
endless loop, but the agent will determine this at runtime, because the instruction
count was too high over the period of time. An instruction is any QUAD instruction,
for example ntharg(), cat(), and so on, but also calls to user-defined functions. The agent
is actually counting the number of virtual machine instructions “executed” by the VM
instruction set processor. (More about VMs later.)
To find out how many instructions are executed for a certain psl function, you can use
the stand-alone psl compiler/interpreter:
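For example, assuming the standalone binary is called psl and accepts the -q flag described later in this chapter, you could save a one-line script and dump its quad code (the file name and contents are illustrative):

```
# Save as count.psl, then run:   psl -q count.psl
# The dumped quad-code table shows how many VM instructions
# this single PSL statement compiles to.
print(nthline(cat("/etc/hosts"), 1), "\n");
```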
The difference is in the KM file. Normally the KM will either contain the PSL script
itself or a pointer to where the PSL script can be found. If the PSL is part of the KM,
the COMMAND TEXT attribute for a parameter, menu command, infobox, or (pre)
discovery in a KM file would look something like this:
BASE_COMMAND = {
{COMPUTER_TYPE = "ALL_COMPUTERS", COMMAND_TYPE = "PSL", COMMAND_TEXT =
"phone = get(\"/patrolHome\");\
hname = get(\"/hostname\");\
portno = get(\"/udpPort\");\
mach = get(\"/appType\");\
[snip]
set(\"value\",int(siz));"}
},
BASE_COMMAND = {
{COMPUTER_TYPE = "ALL_COMPUTERS", COMMAND_TYPE = "PSL", COMMAND_TEXT = LOAD
"usr_proc_collector.psl"}
},
When you want to write a patch for a certain KM and the PSL is not inside the KM
file, you can just replace the existing PSL file with a newer one. Any other PSL files
that haven't changed are not affected. Also, the individual files are small, and in
some cases that's a good thing when doing small upgrades or changes. But the
customer needs all the files to be present before the KM is functional; more files
can mean more administration on the customer's side, or more work if they want
to repackage your KM.
Another benefit of having it in a separate psl file, is if you are using CVS or similar
source control mechanism. The granularity for checking in files will be smaller and
it's easier to track changes on specific files. Besides that, when PSL is stored in an
external file, you can just open it with a text editor and copy/paste or even read the
code without having to deal with escaped quotes, newlines and such.
If all PSL is part of a KM and you want to bring out a newer version of your KM, you
only have to ship one file, no matter how much you changed. However, by shipping a
new KM file, you might overwrite changes made to any other script inside the KM:
if the KM contains all the PSL and someone wrote additional parameters in the
KM, or changed some code in one of the parameters, you will automatically
overwrite that work when you replace the old KM with the new KM file. The benefit of
storing everything inside the KM is that you don't have to determine dependencies
and find out which files have to be shipped as well; the KM file alone will do.
An Approach to KM Development
Although there are many other ways to approach KM development, this section describes one phased approach:
■ Phase 1
■ Discovery
■ Menu Commands
■ Phase 2
■ InfoBox
■ Recovery Actions
■ Porting
■ Phase 3
■ Command Types
■ Channels
■ Libraries
Phase 1
The first step will produce a self-sufficient KM and cover the primary aspects of KM
development. To create a minimal, though functional, KM you should develop the
following:
■ Discovery - Develop the best way to find the components of the application and
create an icon to represent it.
■ Parameters - Measure specific values about the application, such as its performance,
capacity, and load metrics.
■ Alarm Ranges - Define numeric ranges that specify when a parameter will go into
alarm because the parameter value is in a dangerous condition.
■ Menu Commands - Develop a minimal set of administrative actions that the user
can perform from the console in an ad-hoc manner.
Phase 2
The second step, phase 2, will add polish to the KM and provide a commercial or
production-level KM.
■ InfoBox commands - Define what you should display in the popup report
available on the console.
■ Platform support - Make sure you check for simple requirements, such as whether
the KM is running on the correct platform agent. Take the proper action in case a
KM is not supported on the platform.
■ Recovery actions - Try to develop actions to correct problems that have been
found. Decide whether you want these recovery actions enabled out of the box.
Usually, the KM developer will provide true diagnostic reports and offer options
to run corrective actions automatically or operator-driven.
Phase 3
The third and optional phase covers the harder to understand yet more powerful
features a KM may have.
■ Command Types - Try to avoid the usage of system() commands. Use command
types instead.
■ Libraries - Maybe you would like to eliminate duplicate code or group common
user PSL functions into one file to be shared by all PSL scripts.
■ Online help - Provide online help for all KMs that will be resold.
3 PATROL Virtual Machine
The PATROL Agent is a real virtual machine. Although the PATROL Agent is only a
single process on the host operating system, the agent has its own scheduler to
schedule and execute multiple processes. Each process in the agent has private
memory, and has access to the VM's shared memory. This chapter discusses the
following topics:
Unlike some virtual machines (such as the Java VM), the PATROL VM has full access to the host
operating system, with the possibility of communicating with the host OS under different
credentials. Instead of requiring the developer to write and compile executables for
each of the different OSs, the PATROL VM technology allows you to write a single
script that is compiled and interpreted in a platform-independent fashion. This
approach makes it easy to run your scripts on any platform on which the VM runs.
One of the most well-known virtual machines is the Java virtual machine1. As shown in
Figure 1, the Java source is compiled into byte code.
In the case of a Java applet, this byte code (which contains the list of Java VM
instructions) is then transported over the Web and executed by the Java VM (which is
running in your web browser).
In the case of a Java application, your Java run-time environment will execute this
byte code.
1. Actually, the PATROL VM existed long before the Java VM was created.
Advantages of a VM
An enormous advantage of this design is that you only need to know one OS if you
want to write virtual machine programs. If you understand how the VM works, you
know how the agent will behave on any host OS. A KM is just a set of programs that
the PATROL agent needs to run. (This is a simplified description since a KM contains
more than just commands to execute. However, the most important part is the code).
Most virtual machines don’t allow the VM to execute host commands, to make sure
the machine acts fully self-contained. Since PATROL was designed to gather
information from the OS it is running on, communicating with the host OS is one
of its primary tasks. In fact, the PATROL VM is optimized for executing host
commands and offers features such as spawning processes with any
credentials or opening communication channels to OS processes, so the OS
process can continue to run, with better responsiveness and less performance
impact on the OS.
If you know PSL, you can write programs that will run on every OS the VM
runs on, without being a master of the OS you execute on; you only need to know
the VM itself. Of course, when executing OS commands, you make
your code system dependent.
The PATROL Virtual Machine is also known as the PATROL Agent. The agent acts
like it is multi-tasking and can run various PSL processes at the same time. Also, the
agent has a very lightweight mechanism of swapping PSL processes in and out.
The code that the PSL compiler generates is the instruction code for the PATROL VM
(quad code). The PATROL VM will behave as a Virtual Instruction Set Processor and
quad code is the language it understands. After compilation of the PSL source code,
you only have VM instructions left. These instructions are a “smart” translation of
your source code into a language that the VM can understand.
Before the VM will execute the code, it should be optimized first (by the optimizer).
The optimizer works on compiled code and tries to turn it into even better code. The
optimizer takes a couple of very complex steps where a smart, but lightweight
interpretation takes place. Because the optimizer knows the exact capabilities of the
VM execution engine, it can reformat/realign your code without changing the logic
while increasing the effectiveness of your code. For more information about
optimization, see Chapter 5, “Optimizing PSL.”
After optimization, the compiled PSL code can be saved into a PSL library which is
then ready to be used by a PATROL agent without need of further compilation or
optimization.
When the agent receives PSL source code that has to be executed more than once, the
agent will store the compiled PSL in its PSL cache. If you want to skip even that first
compilation step, you can pre-compile a parameter script to a .bin file, or you can
save the most frequently used code in a library. To compile your PSL scripts into bin files,
you have to use the standalone PSL interpreter.
To view these quad instructions, type psl -q to dump the quad code table.
The execution engine (also called instruction set processor) is responsible for
executing the quad code instructions. Some functions that do not require host OS
access can be handled completely within the interpreter. For example, a command
like a += 5; is executed inside the VM.
To understand how the PATROL virtual machine works, the following sections
explain the process for scheduling of tasks internally.
Main Loop
Figure 4 displays the scheduler components and their place within the main loop, the
central wait state for the agent.
Everything the agent does gets initiated from the main loop. Within this loop, the
agent will perform the tasks listed in Table 1 on page 37.
Main Run-Queue
The central data structure for the PATROL agent's scheduling of jobs is the main run-
queue. This is an internal queue data structure that has all the executable commands,
including discovery and pre-discovery, parameters, process cache cycle, menu
commands etc. Every PATROL process that has to be scheduled by the Run-queue
scheduler has an rtcell that contains the runtime information of the process. The main
runqueue has jobs that still have to be started, not jobs that are currently running.
The run-queue is ordered by the time that a command should run. This time is altered
only by actions such as Update Parameter; otherwise the command will run at the
scheduled time.
Commands that should only execute once will not be rescheduled. Discovery and the
process cache have defined reschedule periods (40 seconds and 300 seconds respectively). These
intervals can be changed by modifying the agent's configuration.
A command that has to be executed immediately (for example, a menu command) gets placed at
the front of the queue (see Figure 10 on page 44).
The purpose of these attributes is explained in the following sections. However, the
constraints and defaults for these values include the following:
■ Schedule from end and from previous are mutually exclusive and at least one
should be specified.
■ If both of them are specified, from previous is used; if neither is specified,
from end is used.
■ Force delta and optimal are optional and mutually exclusive. If both are specified,
force delta will be used.
For each of those possible execution times, the agent will calculate the “time between
executions” and the “standard deviation” for all of the already scheduled processes.
This will return two values per possible execution time. The agent will then pick the
execution time with the greatest time between executions as the new best-scheduled
time for this parameter.
The standard deviation is used whenever two schedule times have the same
time between executions. The new execution time will then be the one with the
smallest standard deviation. A small standard deviation is an indication of even
(optimal) spreading. Had we chosen the execution time with the greatest standard
deviation instead, that would indicate not optimal spreading but a lot of executions
happening close to that time, with one or a few executions a relatively long time
away from the new execution time.
for all possible t’s and any other scheduled process, where t is incremented by
runqDeltaIncrement until (t + runqMaxDelta) is reached
4. If two averages are the same, take the one with the smallest standard deviation (for
a better spread)
Suppose the delta increment is 2 seconds and the max delta is 8 seconds (this is just an
example; max delta should normally be more than 10 seconds). If the process we
want to schedule has an interval of 4 seconds, it would be best to schedule it at t=2 (or
t=6).
NOTE
It is unlikely that a parameter will be executed on schedule. Also, the agent will try to prevent
multiple parameters from executing at the same time. Developers cannot change this behavior
and therefore should not expect that two parameters with the same schedule will execute
together.
This procedure is very simple. It searches until the interval between the previous
process and the next process is at least runqDelta.
This proposed schedule time depends on the policy setting. If you schedule
from end, you add the interval to the current time after the parameter finishes
executing (the interval will be forced).
If you schedule from previous, you take the previous execution time and add the
interval to it.
After that, the scheduling can be optimized, depending on the policy setting.
This is an optional step.
When a PSL process is scheduled from the main run-queue and executed, it is placed
on the PSL run-queue. The agent runs multiple PSL processes through its interpreter,
one at a time.
The run-queue scheduler is responsible for moving processes from the RUNQ
(the waiting list) onto the rtlist, the list of running processes. Each executing PSL
process is allotted a time slice with a fixed number of low-level compiled instructions
(quads).
The PSL interpreter runs a PSL process until its timeslice expires or it blocks, for
example in system(), execute(), popen(), and some forms of read() and write(). The
built-in variable /pslTimeslice can be set to control the number of quad instructions per
timeslice.
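As an illustration (the value 2000 is an arbitrary example, not a recommended setting):

```
# Raise the quad-instruction budget per timeslice for this agent.
set("/pslTimeslice", 2000);
print(get("/pslTimeslice"), "\n");
```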
When a PSL process is blocked it will not appear on either the main run-queue or the
PSL run-queue. PSL processes that are blocked on functions such as locking or shared
channels, will not stop the agent from executing other PSL processes.
In PATROL you can see these processes by executing %DUMP RUNQ and %DUMP
RTLIST from the system output window.
If the process that is about to be scheduled is a PSL process, it will be passed on to the
PSL process scheduler. This includes compiling the PSL process if it wasn't compiled
before.
The PSL process scheduler is responsible for swapping in and swapping out PSL
processes.
Whenever a PSL process has to be executed, it is put at the end of the PSL runq.
The PSL process scheduler always takes the first process off this runq and
then executes it until one of the following happens:
If the process terminates or completes, the process is passed back to the main run-
queue.
If a time slice is fully used, then the process is put back to the end of the PSL run-
queue.
If the process has to wait for IO, it is passed to the IO handler and sits in
the “IO wait bucket”. Whenever the IO completes, the PSL process is moved back to
the end of the PSL run-queue.
Whenever the process executes a sleep(), a timer is created. When the timer
expires, the process is moved to the end of the PSL run-queue. The lock()
function is similar; in that case the agent is woken up not because a timer
expired, but because someone unlocked the resource the lock() is waiting for.
Scheduling an OS Process
Figure 12 shows the flow for executing an OS process.
If the process that is scheduled is an OS process, it will be executed on the OS and the
process information will be passed to the IO handler routine.
Whenever data is returned the IO handler will take the appropriate action to return
the data to the caller.
Executing OS Commands
PATROL deals with OS commands because they are used frequently in your KMs to
get data from your application. One way to get data out of your application is
through a subprocess execution, and it's important to understand how this works.
One-Time Commands
One way to retrieve data from your application is to start a process that returns the
data each time it is launched. This way is useful for commands that do not work in a
conversational way.
■ execute(CMDTYPE,"cmd");
■ spawns CMDTYPE with the arguments “cmd”
■ system("cmd")
■ spawns the OS command interpreter to execute “cmd”
Note that execute("OS","cmd") is equivalent to system("cmd").
The one you should use depends on whether a shell is required for execution and
whether pre- and post-command processing is desired.
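A minimal side-by-side sketch; with the built-in “OS” command type the two calls behave identically, as noted above:

```
# system() always spawns the OS command interpreter first.
out1 = system("ls");

# execute() with the built-in "OS" type is equivalent to system();
# only a user-defined command type (see Chapter 13) avoids the shell.
out2 = execute("OS", "ls");
```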
system() command
The system() command will unconditionally spawn an OS command interpreter and
then pass the arguments to that interpreter.
system("myprogram.sh");
This command spawns the OS command interpreter and passes the argument
“myprogram.sh” to the interpreter.When you trace this execution on the OS, you will
see that this command will cause the following to execute on a Unix system:
/bin/sh myprogram.sh
For this example, the system() function is the correct one to use since “myprogram.sh”
will contain shell commands and therefore needs a shell interpreter to run.
However, if you had typed system("ls"), the shell interpreter would be useless: just
an extra process that could have been avoided, because the “ls” command, like any
executable, does not need a command interpreter to run.
Figure 13 shows the (useless) shell process that is created to execute the ls command.
execute() command
To avoid the extra creation of a shell process, you could use the execute() command.
Figure 14 on page 49 depicts how the execute command will directly spawn the
desired process without implicitly calling the OS interpreter.
As a result, execute() will use less OS resources. For this reason, execute() is the
preferred method for spawning processes in PSL.
For execute() to work, you have to define a command type. For more information
about command types, see Chapter 13, “Command Types.” In addition, there are
extra features that make execute() more attractive. Limitations of execute()
include:
■ Batch files - To execute batch files, you must initiate a call to an OS command
interpreter. To launch batch files, if you have no need for the extra features that
execute() offers, use the system() command.
Connection-Oriented Functions
Another way to get data from an application (and using sub-process execution) is to
use the connection-oriented method (also known as a channel), which allows you to
communicate to the executable.
These are functions where a continuous connection between the VM and an external
process or file is maintained. The following are descriptions of the PSL functions that
allow you to establish channels.
popen(CMDTYPE,"cmd") function
When popen() is called, the PSL interpreter will create a communication channel
between your PSL process and the external process. This communication channel is
returned to you. This channel is private for a PSL process. Figure 15 shows what
happens when you execute the popen() function.
When the channel is established, you can talk to the executable through the channel
with write() or listen to it with read(). When you want to finish the communication,
use the close() function to stop the channel. The same function also does proper
cleanup and stops the spawned child process.
This possibility can greatly improve the performance of the KM you are writing,
especially when the start-up of the child process can require a lot of CPU.
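A minimal sketch of the pattern; the child command, the request string, and the omitted error handling are illustrative only:

```
# Start a long-running child once and reuse it for many requests.
chan = popen("OS", "mydaemon");        # hypothetical child process
write(chan, "status\n");               # talk to the child ...
reply = readln(chan);                  # ... and listen to its answer
print(reply, "\n");
close(chan);                           # stop the channel and the child
```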
fopen("filename") function
Just like subprocess execution, you can use a similar function called fopen() to access
files. This function returns a channel that is private to the PSL process that called
it. Figure 16 on page 51 shows what happens when you execute the
fopen() function.
With this function, you can use the following functions to read, write, reposition, tell
current position, and close the channel to the file:
■ read()
■ write()
■ fseek()
■ ftell()
■ close()
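A short sketch, assuming a file named myapp.log exists; the exact fseek() argument form should be checked against the PSL reference:

```
chan = fopen("myapp.log", "r");   # private read channel to the file
firstline = readln(chan);         # read one line
pos = ftell(chan);                # remember the current position
fseek(chan, 0, 0);                # rewind to the beginning of the file
contents = read(chan);            # read the rest of the file
close(chan);                      # release the channel
```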
Sharing a channel
By sharing a channel to a process or a file, you allow other PSL processes to use it. The
channel will now be named and known in the namespace.
If you want to access the same executable or file from multiple PSL processes, it is a
good habit to let them all share the channel instead of launching multiple OS
commands and creating multiple channels.
After a channel is shared, it is open for everyone to use. To prevent
multiple PSL processes from accessing it at the same time, you might consider putting
the channel access functions in a library that does all the necessary locking to force
command serialization over the channel.
Concurrency
Since only one PSL process can be running at any given time, there are no low-
level concurrency problems. However, any PSL process can be interrupted after execution of
any statement (this can also happen during execution of blocking functions that
require IO).
Two or more PSL processes that logically share data structures (for example, a
variable in the namespace) can be interrupted when halfway through a transaction,
leaving the shared data structure in an undesired state.
If this is the case, you should identify these transactions and use lock() to make sure
that only a single PSL process can access the shared resource at any given time.
This doesn't only apply to namespace variables, but also to channels. The following
are some examples for solving concurrency through locking:
■ chan = popen(....)/fopen(....);
channel local to the PSL process
■ share(chan,"share_name");
channel named in the namespace and made accessible to all other PSL processes
■ lock("share_name",...)/unlock("share_name");
prevents simultaneous usage of the channel
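Putting the three steps together; the channel name, lock name, and child command are all illustrative:

```
# Process A creates and publishes the channel once:
chan = popen("OS", "mydaemon");     # hypothetical long-running helper
share(chan, "dbchan");              # now named in the namespace

# Every PSL process (including A) then serializes access like this:
lock("dbchan_lock");                # assumed lock name for this channel
write("dbchan", "status\n");
reply = readln("dbchan");
unlock("dbchan_lock");
```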
Global Variables
Global variables are accessible from within any PSL process.
Global variables are typically addressed with their PATH definition in the PATROL
agent's object store (also known as heap). Below is a list of global variable examples:
■ /ipAddress
■ /MYAPP/myvar
■ /AgentSetup/defaultAccount
■ /ORACLE/instances
The following sections describe two types of global variables: nonpermanent and
permanent.
Nonpermanent Variables
Nonpermanent variables are lost after an agent restarts and use fewer resources than
permanent variables. They remain resident in the agent's memory as long as the
agent keeps running. A KM developer can freely assign any name for a global
variable using the get() and set() functions. The unset() function removes the
variable from the agent's memory. These variables are read-write.
For example, debugging can be turned on for all parameters of an instance with
set("/<MYAPP>/<myinst>/parameterdebug","TRUE");
and every parameter can check whether debug was turned on by using
get("../parameterdebug");
Permanent Variables
Permanent variables include configuration and system variables and require extra
resources when they are used.
Configuration Variables
Configuration variables are stored inside the configuration database of the agent.
They remain resident during the life of the agent even after an agent restart because
the agent reloads its configuration database when it is started.
You normally access these variables by using the pconfig() function call. This call
always goes to the configuration database, and its operations cost more than
operations on nonpermanent variables. Even a read operation goes to the configuration
database, so that the configuration database can be updated from outside the
PATROL agent (using the OS pconfig command). These variables are read-write.
NOTE
You might have noticed that a normal get() operation will work on these kinds of variables
also. It will have the same cost as using the pconfig() call.
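For illustration (the variable path and value below are invented for this example):

```
# Write to and read from the agent's configuration database.
pconfig("REPLACE", "/MYAPP/pollInterval", 60);
val = pconfig("GET", "/MYAPP/pollInterval");

# As the note above says, a plain get() also works, at the same cost:
same = get("/MYAPP/pollInterval");
```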
System Variables
System variables are variables that are evaluated on the host OS on demand, for
example /ipAddress.
Local Variables
Local variables are private for each PSL process and are initialized at the start of the
process. The types of local variables are
Named Variables
With a named variable, the programmer provides the name. In the example below,
the PATROL agent creates a variable author and assigns it the value John Doe. The
agent then executes the function nthargf(author,1) and assigns the result to a second
variable named firstname. The result is that two named local variables are created.
Below is an example:
author="John Doe";
firstname=nthargf(author,1);
Unnamed Variables
With an unnamed variable, the PATROL agent automatically assigns a name because
the programmer does not provide one. Below is an example:
firstline = nthline(cat(myfile),1);
In this example, because a name was not provided for the result of the subfunction
cat(myfile), PATROL assigns it one. The example expands to the following:
unnamed_var = cat(myfile);
firstline = nthline(unnamed_var,1);
This means that the output of every command is stored inside a variable (named or
unnamed).
■ Name your local variables; otherwise, PATROL will do it for you, and you will not
be able to use them in your script later.
Each of these objects has a set of attributes, some of which are shown in Figure 18.
Besides the built-in attributes, the namespace also contains configuration
variables and user-defined global variables. The local variables are local to a PSL
process and not accessible in the namespace. Figure 18 shows the attributes in the
PATROL namespace.
If you do not give the output of the statement a name, PATROL will create an
unnamed variable for it. The contents of the file will be stored in memory.
PSL does not offer some sort of internal and magical IPC4. IPC is a concept usually
known to “shell developers”, since it is a common way to pipe data from one
process into another. Commands in PSL are not evaluated as separate processes (the
way a shell works), and therefore there is no need to pass data in such a fashion.
4. Interprocess communication; this is the same as communication over pipes (stdout of process 1 = stdin of
process 2)
4 KM Design
This chapter discusses the following topics:
What should a KM Do
The following sections discuss items that help make a productive KM.
Chapter 4 KM Design 61
Better productivity
The goal of a KM is to present information instead of just data, in a manner that
customers can understand and use, and in a way they are comfortable with when
interpreting the information.
Talk to the DBAs, the end users, and the developers, and always talk to the support
organization to learn which management and administrative features are the most
important. Support organizations are constantly bombarded with customer requests and
“How do I” questions.
Ask them what the main problems are and what most people call support about.
If support says 30% of the calls are because of configuration issues, you might want to
create a parameter that checks the configuration of the application (you might even
check the configuration of the KM, because the KM will probably require some
support as well).
Sometimes a simple check can significantly decrease the support calls. It might even be
more basic and simple to do than what you are thinking about.
Definition of a good KM
So how do you define a good KM versus a bad one? Developers might think they
wrote the best KM possible, because it lists every metric one can possibly capture
from the application. This is definitely not the definition of a good KM.
You might even be on a bad track if your KM just displays a ton of information or
even worse, a ton of raw data. If the intelligence is limited to displaying every
possible value you can retrieve then you should consider the KM to be quite dumb,
because the added value of the KM will be limited.
Some KMs leave the intelligence up to the end user and require so much user
interaction in the form of response() functions that they compromise the KM's
usability and scalability.
Performance
The measurement of the performance of an application is valuable both as an
indicator of availability problems and as a tool for tuning the application, and thereby
improving the operational productivity. Measuring the real performance of the
application with the KM can take the guesswork out of tuning the overall
performance of the application.
Correction
When the KM detects a problem, can it fix it? Your KM can fix it via what PATROL
calls a “recovery action.” It can also prompt the operator before performing the
action. Alternatively, you might just want notification via pager, e-mail, or sending a
trap or ticket to your help desk system.
Administration
Availability and performance metrics can be categorized as “monitoring” metrics.
The other side of that coin is “management” or “administration” of the application.
This refers to the user taking interactive action on the application.
General Guidelines
Should your KM design be of the administrative type, you may want to include some
of these traditional administrative functions:
Capacity planning
Capacity planning is also called “the gentle art of convincing your boss you need
more.” Although not as critical for short-term operational stability, another important
issue for long-term productivity is the measurement of the overall performance of the
system. This involves metrics such as performance time and resource usage to
determine when increased usage trends will require new hardware or other big-ticket
items.
General Guidelines
The following are some general guidelines for KM development:
■ Intrusiveness
■ Enhance the application's functionality
■ Try to be non-intrusive
■ SNMP
There are numerous sources of data from which to collect management information
in a non-intrusive fashion. Listed below are some of the most commonly used
interfaces:
■ Scripts - If you already have scripts written in shell, Perl, SQL, C, or any other
language, you can reuse them.
■ Processes - Is a system process up, zombied, or down? How much CPU and memory
has it used?
■ Files - The presence or absence of a file can have significance. The size of a database
file can be a parameter. Archiving or removing a file can be a useful administration
command.
■ Log Files - plain-text error or history files can yield alert situations or provide
historical perspective.
■ SNMP MIB(s) - Get the data from the management information base (MIB) for the
application. There are PSL functions specifically for accessing values via SNMP.
■ Ports - Some applications that initiate network connections will be visible in netstat
output.
Extending Existing KMs
■ API's - If the application offers a published API, such as C or C++, you can access
the application by this method. Typically, you would use the API to build an
executable that is launched by the KM as a sub-process.
Data available from other KMs can take a few forms. First, the discovery phase might
benefit from the already discovered icons for the database. Rather than discover
Oracle servers yourself, you can use PSL within the agent to get the ORACLE KM
instance list through a PSL get on the internal instances variable, such as:
get("/ORACLE/instances");
The values of measured parameters from other KMs are easily available within PSL in
the agent via the PSL get() function. These values could be included in arithmetic to
get a more application-specific performance metric or some other measurement.
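For example (the parameter path below is hypothetical and depends on the other KM's namespace layout, which may change between releases):

```
# Fold another KM's measured value into a metric of your own.
users = get("/ORACLE/SID1/UserCount/value");   # hypothetical path
load  = users / 100;                           # illustrative arithmetic
set("value", load);
```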
If you use existing KMs, beware that the KM may change with new releases or that
the data returned may change.
Also, you have to degrade gracefully: if the data isn't there, you need to run without it,
or fail discovery and tell the user that you are dependent on the other KM.
Portable KM Design
An important advantage of a KM as a form of application monitoring and
management is the ease with which the KM can be ported to a new hardware or
software platform.
This advantage is achieved through the portable infrastructure layer created by the
agent, KM, and PSL features of PATROL. However, PATROL cannot fully insulate
the KM developer from the idiosyncrasies of the various platforms. Although all base
PATROL functionality is portable, the boundary between PATROL and the outside
world is the point at which PATROL loses control; and as a KM developer, you will
need to know what to deal with. Two main areas can affect a KM:
■ Operating system: Differences between the external worlds of Windows NT, Unix,
Linux, VMS, and so on
Non-Portable Areas of PSL
■ Control Flow - Basic statements such as if, else, while, for each, and switch
■ String functions - includes nthline and other string functions, list/set functions,
and sorting
■ Agent symbol table - variables controlled via PSL get, set, and other PSL functions
■ File access functions: file, cat, fopen, fseek, ftell, read, readln, write, and close
■ Child process launching: system, execute, popen, read, readln, write, close, and the
exit_status special variable
These items are not non-portable in themselves, because the functions behave exactly
the same on all platforms; however, the arguments you have to supply to make them
work can differ significantly from platform to platform.
Portability examples
The following sections provide examples of portability.
Windows NT and VMS both use a pair of new-line ("\n") and carriage-return ("\r")
characters to separate lines in a text file. This poses problems in PSL in a number
of areas related to text data:
■ PSL read() or readln() on a popen() channel returning text data output by a child
process
The effect is that the string variables containing text data from these sources will still
contain the "\r" characters. This is problematic because the PSL string operation
functions do not treat the "\r" characters as special and will only consider the "\n" as
the separator.
For example, if you use the PSL function nthline() to return the first or second line, the
returned line will still contain a "\r" character. This character is also difficult to see
when debugging the script with debugging output statements.
There are no PSL functions that handle "\r" characters as separators for lines. Some of
the places that this poses a problem when processing these strings are as follows:
■ Line and word processing: ntharg(), nthline(), nthargf(), nthlinef(), PSL foreach()
statement
The solution for this problem is to trim the carriage return character immediately
after the input was read.
trim(xx, "\r");
Carriage Return Characters (NT, VMS)
The solution to the "\r" problem is pragmatic, if not particularly elegant. The
PSL trim() function can be used to remove these characters before processing the
string data. You have to remember to remove the characters at every place where the
KM gathers external text data; however, it is only one line of code in each such place.
One possible method of handling this problem is to wrap all calls to PSL cat():
instead of calling cat() directly, you call a wrapper function each time, as shown
in the example below:
function Cat_NoCR(file)   # hypothetical wrapper name
{
  local text, save_errno;
  text = cat(file);
  save_errno = errno;       # remember cat()'s error status
  text = trim(text, "\r");  # strip carriage returns
  errno = save_errno;       # reinstate errno
  return(text);
}
Not all places have this problem with the "\r" character. One place where the carriage
return character is automatically handled is the PSL file reading operations. The PSL
read() function on a channel opened via PSL fopen() (with a non-binary mode) will
not return the carriage return characters in the output. PSL read will automatically
remove these characters from the returned result.
The main abstraction layer for gathering statistical performance data about processes
is the agent's process cache. The only reason it is called a cache is that it records a
snapshot of the process information within the agent's memory for rapid access. This
cache is a regularly updated record of operating system processes and a number of
statistics about them. The process cache is available via the PSL process() function.
For example, the process cache contents can be viewed via:
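A hedged sketch (the exact pattern argument accepted by process() may vary by agent version):

```psl
# Dump every cached process entry; "." is a regexp matching any command name
print(process(".")."\n");
```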
The process cache is built automatically via the agent. The PSL process() function is
very efficient because it only operates within the agent's internal data structures and
does not perform external process analysis. Behind the scenes, the agent uses a
platform-specific method to gather the data.
Gathering platform-specific data may be direct operating system statistic access (for
example, Windows NT) or a system command (for example, ps on some Unix
platforms). The process cache is updated according to the process cache cycle, which
is by default 300 seconds (5 minutes) but can be configured on a per-agent basis by
the PATROL administrator. The process cache can also be updated immediately via
the built-in %REFRESH PROCESS CACHE agent command, which can be launched as an
ad hoc command in the system output window or via PSL, using the PSL system()
function with this command.
The process cache has a number of limitations. It may be out of date, possibly by
almost 300 seconds. The process cache cannot be updated for one particular process
only. The whole process cache must be updated as a group. This requirement poses
problems for a KM that needs to get regular up-to-date information about a small
group of processes if this data is needed more frequently than the process cache
update cycle. Portability of the PSL process() function is also somewhat limited on
non-Unix platforms. All Unix agents should function identically, but some of the
attribute fields are not always available on agents running on other platforms.
Launching Child Processes (all platforms)
The delay in refreshing the process cache can be worked around by using the
PSL proc_exists() function. This function always queries the process table
immediately and therefore has up-to-date information. However, it reports only the
presence or absence of a particular process, returning a Boolean value;
proc_exists() cannot gather any other, more detailed statistics about a process. It also
requires the PID of the process and cannot be used to search for processes by
name.
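A minimal sketch (the PID value is illustrative; in practice it would come from an earlier process() query):

```psl
pid = 1234;               # hypothetical PID obtained earlier
if (proc_exists(pid))
{
  print("process ".pid." is still running\n");
}
```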
The first problem with child process launching is whether the command exists on a
particular platform. For example, the Windows NT dir command fails on Unix, and
the Unix ls command is usually absent on Windows NT.
The first level of portability is to ensure that the correct command is executed on the
correct platform.
There is no easy way to achieve this execution other than to explicitly check the
platform, which is possible via the appType variable. Ported KM code might often
look like the following:
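One hedged sketch of such a check (the "NT" value of appType and the command strings are assumptions, not taken from this manual):

```psl
# Pick the platform-appropriate directory-listing command
if (get("/appType") == "NT")
{
  output = system("dir C:\\temp");
}
else
{
  output = system("ls /tmp");
}
```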
On Unix, the text returned from a PSL system call that fails is typically a shell-specific
error message. This message differs depending on whether sh, csh, or ksh is the
default shell for the PATROL user.
Similarly, the messages on Windows NT are from the OS command interpreter rather
than PATROL-specific error message text. It is possible to check for the error message
patterns, but this is not the best solution.
The PSL exit_status variable can be used to determine the exit code of the child
process. This exit code may come from the shell or command interpreter, but it is
generally accurate because the shell usually passes along the child's exit code or
supplies a valid failure code if the command is not found. Note that the exit status can
mean different things on different operating systems: a process on Unix that exits with a
0 return code has most likely completed successfully, but on VMS a 0 status might
indicate a problem.
Launching Daemon Processes (all platforms)
This scenario sometimes works and sometimes does not, depending on the Unix
platform and on the daemon itself. On some platforms, the launch fails because the
shell waits on the file descriptors of the child process. Hence, the following Unix csh
syntax sometimes improves portability by hiding the file descriptors from the
shell:
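A hedged example of that csh redirection (the daemon path is illustrative):

```psl
# csh: ">&" sends both stdout and stderr away from the shell, so the
# shell no longer waits on the daemon's file descriptors
system("/etc/my_daemon >& /dev/null &");
```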
This redirection syntax is shell-specific, however, and solving the shell portability
issue leads to either a separate shell script file that is launched, or explicit coding
of the shell in the command, such as the following:
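For example, a sketch that hard-codes the shell (paths are illustrative):

```psl
# Invoke csh explicitly so the ">&" redirection syntax is known to be
# valid regardless of the PATROL user's default shell
system("/bin/csh -c '/etc/my_daemon >& /dev/null &'");
```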
When all else fails, the PSL popen() function comes to the rescue. The PSL popen()
function has the property that it launches the child process and immediately returns
control to the agent. Therefore, you can simply call:
chan=popen("OS", "/etc/my_daemon");
This call returns control immediately to the PSL process and leaves the
child process running. However, behind the scenes, all output from the daemon is
accumulated in the agent's memory. Therefore, a better sequence is to detach the child
process from the agent by closing the channel, as shown below:
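A sketch of the detach sequence (the daemon path is illustrative):

```psl
chan = popen("OS", "/etc/my_daemon");
close(chan);   # no second argument: detach without killing the child
```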
Note that the PSL close() function is called without a second argument. You should
not use the flags in the second argument to kill the child process because that is not
the goal.
The PATROL development console has many options to choose from, and sometimes
it can be difficult to know what to do with them all.
Some of the options you should try to avoid using in your KM include the following:
■ State Change Actions: These are console-centric actions and therefore are limited
to OS commands. They are not very powerful and generally are left alone.
■ Setup Commands: Ignore setup commands; because these commands run once
at the start of the agent, they are not useful in a KM. You can get the same "once
only" execution through prediscovery by doing the once-only action just before
converting prediscovery to discovery.
■ Self-Polling Parameters: Avoid the button in the scheduling dialog box for
parameters.
■ Interactive Tasks: All the tasks have a “Task Is Interactive” button. Generally, it
should be left alone.
■ in_transition(): An old PSL function that no longer serves a purpose. When first
called, a transition timer is started for <timeout> seconds and the value 1 is returned.
While this timer is running, subsequent calls also return 1. After the timer expires,
calls to in_transition() will continue to return 1 until the next full discovery
cycle, after which it finally returns 0 and resets the timer. The timer is also reset
by calls to change_state().
■ change_state(): A function with many side effects. If you call change_state() on an
instance with parameters, it automatically suspends the execution of all
those parameters. It is better to let one of the parameters under the instance go into
alarm, since this also allows you to add recovery actions if needed.
■ Simple Discovery: This may be useful for entry-level KMs, but it definitely lacks
options if you really want to be in control. You can achieve the
same functionality as simple discovery by writing some simple PSL scripts.
Because of the lack of control and functionality, we will not spend any time on
Simple Discovery in this book.
KM Tracing and Logging
■ Menu Command %{. . .} macro: This macro asks for user input and substitutes
the result into the menu command text before sending it to the agent. Since this is a
literal replacement, improper input by the operator can result in compiler errors.
■ Statically linked libraries: It is possible to statically link libraries. This feature has
hardly been used and remains for historical reasons.
If you write trace output to a file, don't write it to the agent's error log. The agent error
log is exactly that, the agent's error log, and not a KM error log.
5 Optimizing PSL
When you think of optimizing PSL, you should have some understanding about what
the PSL optimizer will do for you. Once you understand that, you can focus on the
areas that are your responsibility as a developer. This chapter discusses the following
topics:
Measuring Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Standalone compiler/interpreter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Agent Command Line option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
From PSL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
PSL Optimizer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Levels of Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Architecture Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Frequency of execution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Amount of scheduled processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Amount of spawned processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
PSL Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
= , grep() or index() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
multi-string grepping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
PSL readln() limitation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Evaluation Short-Circuiting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Optimize Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
PSL Functions That do Not Influence Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Measuring Performance
Before you ship your code or install it on production machines, you should at least go
through a performance analysis cycle of your KM. That means measuring how much
CPU your KM consumes and determining whether you can remove the performance
bottlenecks. After a while you develop a feel for which calls are the more expensive
ones under different circumstances, but until then, you will have to go through a lot
of KM analysis and optimize a lot of PSL code yourself.
The PSL Profiler saves the profile data for PSL processes to the profile as they
terminate. The profiled data includes the following:
Standalone compiler/interpreter
It is possible to test your PSL scripts by using the standalone PSL
compiler/interpreter. After you run your PSL with profiling mode enabled, you can
run ppv (the PATROL profile viewer) on the result to examine the profiling output.
From PSL
If you just want to profile a piece of your PSL code, you can enable PSL profiling at a
very granular level as follows:
■ ProfDefaultOptions(options) - omitting options returns the current defaults
■ ProfOptions(pid, options) - existing processes can be queried or changed
For more information on these items, see the PSL Reference Manual.
PSL Optimizer
The PSL Optimizer includes the following:
■ multi-pass
Levels of Optimization
The PSL Optimizer supports the following levels:
■ Level 0 - no optimization
■ Level 1 - peephole optimization
■ Level 2 - local optimization
■ Level 3 - global optimization
Level 1 Optimization
The following figures provide example code that illustrate Level 1 optimization.
Level 2 Optimization
The following figures provide example code that illustrate Level 2 optimization.
Level 3 Optimization
The following figures provide example code that illustrate Level 3 optimization.
■ Libraries - no cost
Architecture Optimization
The following sections describe architecture optimizations.
Frequency of execution
When you program in PATROL, you will create many PSL processes, which are
scheduled by the PATROL Agent's scheduler. Every process requires resources.
Therefore, every effort should be made to limit the number of processes and their
execution schedule. This applies to all commands that are automatically
scheduled by the agent (prediscovery, discovery, collectors, and standard parameters).
PSL Optimization
The following sections describe PSL optimizations.
= , grep() or index()
These are three different functions to do string matching. The optimizer will not
change one command with another, each function has it's own reason of existence
with its own backend code. I'll try to spend some time on each of the different
functions and hope this helps you understand why it is working this way.
text = pattern
Returns 1 if the regular expression pattern matches the text, and 0 otherwise. The text
can be multi-line. This is an operator: an operator takes two arguments and returns a
third in a single QUAD instruction. The operation is a regular expression
match, and it behaves like a very fast version of grep(): no
options, no checking, no return values other than 1 or 0. Whenever the optimizer
can, it will "pre-execute" this operator at compile time. That means if the two
arguments are static (hardcoded), using = has no runtime overhead. The
execution of = is fast (much faster than grep() and probably a little faster than
index()), but:
■ it returns only 0 or 1
■ it always uses a regular expression (be careful when your pattern contains regexp characters)
grep(pattern,text,options);
Returns the lines of text that match pattern. This is a built-in function and therefore
requires one more quad to execute than an operator. It has options to tweak the
execution and will try to determine whether pattern contains a regular expression. It
works line by line, and if pattern contains no regexp characters it uses a sort of
index() search (without compiling and executing the regular expression) to optimize
for speed. This is the slowest of the three on a single line, but a lot faster than writing
your own loop using = and foreach over each line of a block of text. The optimizer
will not "pre-execute" this function, even if the arguments are hardcoded (static).
index(text,string);
Does not work with regular expressions. It returns the position of string within
text. It works on multi-line text and is very useful for checking whether a certain word
appears in a block of text. This is a PSL function with less overhead than grep(). The
optimizer will "pre-execute" it when both arguments are hardcoded (static).
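A hedged sketch contrasting the two functions (the sample text is illustrative):

```psl
text = "error: disk full\nok: cpu idle\n";

# grep() returns the matching lines themselves
bad = grep("error", text);

# index() returns the position of the literal substring within the text
pos = index(text, "disk");
if (pos)
{
  print("found 'disk' at offset ".pos."\n");
}
```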
multi-string grepping
The PATROL grep() function supports multiple string matching.
grep("string1\\|string2",text);
PSL readln() limitation
The PSL readln() function truncates lines that exceed its internal buffer, setting errno
to 57 (E_PSL_READLN_TRUNCATED). The following wrapper function reads an
arbitrarily long line by reassembling the truncated pieces:
longfile = get("/patrolHome")."/longfile.txt";
function Safe_Readln(chan)
{
local pos, x, data;
pos = ftell(chan);
x = PslDebug;
# Turn off runtime error reporting
PslDebug = 0;
data = readln(chan);
while (errno == 57)
{
# E_PSL_READLN_TRUNCATED
pos = pos + 4095;
fseek(chan, pos, 1);
data = data.readln(chan);
}
PslDebug = x; # restore runtime error reporting
return data;
}
readchan = fopen(longfile, "r");
if (! chan_exists(readchan))
{
print("error <".errno."> opening <".longfile.">\n");
exit;
}
longbuf = Safe_Readln(readchan);
print("========== longbuf =============\n");
print("long buf is <".length(longbuf)."> bytes long\n");
print(longbuf);
print("=========end of longbuf ========\n");
Evaluation Short-Circuiting
The PSL operators && and || do not short-circuit evaluation. In other words, the second
operand of these operators is always evaluated, regardless of the result of the first
expression. The following are some examples:
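A sketch of why this matters (values are illustrative):

```psl
n = 4;
# Both sides of && are evaluated, so this does NOT protect the division:
#   if (n != 0 && 100 / n > 5) { ... }
# Nest the tests instead:
if (n != 0)
{
  if (100 / n > 5)
  {
    print("large ratio\n");
  }
}
```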
Optimize Loops
Loop optimization pays off most in code that runs repeatedly. The following kinds of
commands run only occasionally, so tuning them yields less gain:
■ Menu and InfoBox commands - Menu commands and InfoBox commands are
only executed when the user requests them (on the console). You can tune them,
but their impact will be less than that of code that runs repeatedly.
■ Recovery Actions - Recovery actions run only when a problem occurs and will
therefore yield less performance gain than code that runs repeatedly.
6 Prediscovery and discovery
This chapter presents the following topics:
At that time there was only something called discovery. Discovery was the first script
that ran right after the KM was downloaded to the agent. Because in a lot of cases
discovery would find out that the application was not installed, it was decided that it
would be beneficial to introduce a step before discovery: prediscovery was born.
Later (PATROL 3), the agent became fully autonomous and the real need for
prediscovery and discovery went away, but by then they had introduced a whole new
way of developing KMs, so the features were never removed in later versions.
Because some customers wanted to prevent KMs from being loaded altogether,
based on some very simple rules, the allow on and deny to fields were added to the
KM's property tab. These are translated into pragma statements in the KM file. When an
agent loads a KM, the pragmas are checked first, then the agent checks
the agent configuration database for disabled KMs, and only if these checks succeed
is the KM eventually parsed.
Discovery Cycle
The Discovery Cycle is the process of going through prediscovery and discovery.
This section will explain what this is all about.
Depending on the setting of active, the agent will create either the prediscovery or
the discovery process. These scripts are not compiled until they are about to be
run. That means prediscovery can be used to check the availability of a library (and load
it) before discovery is run; discovery can then safely require the library. It is strongly
recommended never to require libraries in prediscovery.
If you safely want to check for existence of a library, use the following:
function libtest(lib)
{
  local warn, alarm;
  # ... existence check elided in the original ...
}
if (libtest("mylib.lib"))
{
  # safe to require the library here
}
A lightweight process can be seen as a high-priority rtcell for the RUNQ scheduler
and will therefore not be delayed in its execution.
All applications that use the default discovery interval (40 seconds) are scheduled
from the same lightweight process, and the evaluation of active starts one after the
other. The order in which the KMs are checked is the same as the KM loading order.
Every KM that has its own discovery cycle will create its own lightweight discovery
process. If you don't require a 40-second execution of your discovery scripts, you
should modify the discovery cycle for your application.
Coding style
Although prediscovery can be left empty, it is recommended that you not leave it
empty. If you want to skip prediscovery and execute discovery directly, type the
following command in your prediscovery script:
set("active",2);
If you need neither prediscovery nor discovery at all, type exit in the command.
Historically, the most common check made in prediscovery is that the KM
is not running on an incorrect agent platform. This can be done by examining the
/appType variable; however, it is a lot better to use the allow on and deny to fields in
the KM property window.
Some people believe that the discovery process has to be the place where application
instances get created. Although this is probably true for at least one KM you write,
instantiation can in fact happen from any process running on the agent. There is
nothing bad about having only exit statements in both prediscovery and discovery. We
will talk more about instance creation in the next chapter.
The idea behind full discovery is usually when your application was just loaded and
still needs to find out all the details and perform setup of the KM. This step is
supposed to be more expensive than partial discovery, where only an update to the
current discovered environment needs to be made.
If your discovery relies on the process() function, there is no need to execute the
discovery script unless the PATROL internal process table was refreshed. The PSL
full_discovery() function indicates that a refresh of the process cache has happened
since the last time discovery was run. This function has no purpose other than
notifying you that the process cache was refreshed, although some people seem to think
there is something magical about it.
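A minimal sketch of using it as a guard at the top of discovery:

```psl
# Skip this cycle entirely unless the process cache was refreshed
if (! full_discovery())
{
  exit;
}
# ... process()-based discovery work goes here ...
```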
If your discovery is based on a configuration file, your initial discovery cycle will
have to read this file completely. After this initial setup, it would be good to check
whether the file has changed since the last discovery cycle; if it hasn't, there is no need
to set everything up again. You can use the file() function as the test: it returns the
last modification time of the file, and you can save that timestamp in a global variable.
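A hedged sketch of the timestamp test (the file path and variable names are illustrative; here the timestamp is kept in the namespace so it survives between discovery runs):

```psl
mtime = file("/etc/myapp.conf");    # last modification time of the file
if (mtime == get("lastseen"))
{
  exit;                             # file unchanged since last cycle
}
set("lastseen", mtime);
# ... full re-read of the configuration file goes here ...
```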
The discovery script can be used to collect data as well (just like any PSL script). In
case you have a command that returns data for all instances of your application
class, you might want to consider putting that command in the discovery script.
However, remember to update the discovery interval in that situation:
change it to the same schedule you would have used if the collection logic were in a
parameter.
Discovery pitfalls
Figure 36 on page 97 conveys a few of the pitfalls that you may encounter with discovery.
Since it is possible to have both discovery and prediscovery running at the same time,
you should be aware that this can result in some very bizarre behavior.
To make sure you don't encounter this problem, it is good practice to have only a
single set("active", ...); call in either the prediscovery or the discovery script. If you
have multiple statements that modify the active flag, put an exit; statement after each
of them.
7 Instances
This chapter presents the following topics:
Chapter 7 Instances 99
Managing instances
When you develop a KM, you are writing an application management template. The
application class can be compared to a real object class. Just like with object oriented
programming, you will create instances of a class for each managed object.
When an application class is instantiated, each instance shows up with the menu
commands and InfoBox commands defined on the application class. At the same
time, all parameters that are defined on the application class are created under the
instance as well. If the parameters were set as active, they are immediately
scheduled for execution.
Before the first instance is created, the only code that is executed on the agent for
your application class is the prediscovery or discovery script.5 Again, this is one of the
reasons why discovery is so special.
To create instances you should call the create() function. It is also possible to create
instances using the simple discovery rules, but we will not cover that in this book
because of the limited functionality and control.
Create function
The PSL create() function has the following syntax:
create(sid,label,state,msg,parent);
Table 2 on page 101 describes the variables available in the create() function.
5. This is not really true; setup commands are also executed only once for the application.
■ sid - the instance ID. This is the name you will have to use when you want to
access this instance in the PATROL namespace.
■ label - the name (label) of the instance you create.
Only the first argument is required, but you will have to specify at least the first three
arguments to create a proper instance. The last argument is used for creating nested
instances; we will discuss that later in this chapter.
The label of the instance can be changed at run-time and the console will update the
label of the instance immediately when it is changed on the agent. This feature can be
useful if you want to allow the user to use a different naming convention for the
instances at runtime (label instances by hostname or by IP address). You should be
aware that changing the label introduces network traffic to notify the consoles about
the update and the label should therefore not be changed without a good reason.
When you are implementing the instance creation functionality of your KM, you
should know that end users expect a proper response from your KM. That means that if
you allow users to add instances to the instance list through a response() function
and they press OK, they expect to see the icon (or an error message) after a short time.
Waiting for the next discovery cycle is usually unacceptable. Some developers think
that create() can only be called from within discovery; nothing is further from the
truth. The create() function can be called from any PSL script that runs on the
agent. Table 3 describes the scripts used for creating instances.
The only scripts where you are unlikely to find a create() function are InfoBoxes. Even
the system output window can occasionally be used to debug certain situations.
Destroy function
The counterpart of the create() function is the destroy() function. Just like create(),
destroy() can be called from anywhere in PSL. When you call destroy(), the instance
you specify is immediately destroyed, and all parameters belonging to that
instance are destroyed as well. In the case of nested instances, destroying a parent
instance also destroys the child instances, regardless of the application class they
belong to.
When the last instance of an application is destroyed, the application icon will be
removed from the console as well. Because the application icon is destroyed after the
last instance is destroyed, it is a good practice to always add new instances before you
remove the old ones. If you first remove old instances and this list happens to be the
same as the current list of instances you have then the application icon will be
removed only to be recreated after the first new instance is created.
You have to be especially careful when you have written a menu command that
allows a user to "destroy all instances of this application". In that case you could get
the list of currently created instances by calling get("instances");. However, if
you just run over this list and call destroy() for each of the instances, it is very likely
that you will destroy yourself before you finish the list. You have to make
sure to remove yourself from the list and destroy yourself as the last instance.
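A hedged sketch of that ordering (how the script learns its own SID depends on the KM, so the variable myself is assumed to be set already):

```psl
all = get("instances");
foreach inst (all)
{
  if (inst != myself)
  {
    destroy(inst);      # destroy everyone else first
  }
}
destroy(myself);        # last, so this script's instance survives the loop
```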
At the basis of the problem lies the way KM development is usually done: first, make
sure it works, then make sure it works well.
In this section we will discuss the common mistakes that are made and how you can
reorganize your code so your KM will work better from the start.
It is a good practice to make sure that for every create() function you write, you also
have a destroy(), just like a C developer would provide a free() for every malloc().
In many KMs, you will see code that looks like this:
foreach instance (wanted_instances)   # loop header reconstructed; list name assumed
{
# Before we create the instance we should check if it wasn't
# created before
if (! exists(instance))
{
create(instance,instance,OK);
}
}
This code works fine, but it checks every instance to see whether it still needs to be
created. Performance of the code depends on the number of instances; the more
instances you have, the more CPU is consumed. Even if all instances have already
been created, the code still checks for the existence of every one of them.
With this code, destruction logic was added to the create loop. Usually the
wanted-instances list will be more or less static (give or take a few instances).
Looking a bit deeper at how this code affects performance, we can come up with
the following formula: number of checks = current instances x (current instances + 1).
When your KM contains 50 instances, this means roughly 2,500 if-statements for
every discovery cycle, and the result of all these checks is very likely to be nothing
(in case the wanted-instances list didn't change).
# What we want but don't currently have should be created.
# Read this as: what we want but don't have
to_create = difference(wanted_instances, current_instances);
foreach instance (to_create)
{
    create(instance,instance,OK);
}

# This same function can be used to find the instances that should be
# destroyed. Read this as: what we currently have but don't want
to_destroy = difference(current_instances, wanted_instances);
foreach instance (to_destroy)
{
    destroy(instance);
}
This code does the same as the previous example, but a lot faster. The foreach loops
for the create and destroy logic are only entered when something actually needs to
be done. Additionally, the very fast (and readable) difference() function is used
instead of slower hand-written foreach loops.
For each additional instance, some extra CPU will be needed for the difference()
call, but that extra cost is small compared to the price you would pay in the first
example. Besides that, you avoid needless checks when no instances have been
added to or removed from the wanted instances list.
Consider this variant, which creates each instance and immediately sets some
instance-specific data:
foreach instance (to_create)
{
    create(instance,instance,OK);
    set(instance."/mydata",somedata);
}
This code looks fine, but if the instance is filtered by specifying it in the
/AgentSetup/&lt;APPL&gt;.filterList configuration variable, the create statement will fail.
(More details about this variable later.) If you didn't cater for this situation, the
subsequent set will fail and generate a run-time error. Therefore, it is better to
ensure that create was successful, as demonstrated in the code below.
foreach instance (to_create)
{
    if (create(instance,instance,OK))
    {
        # Now we can safely set the instance-specific data
        set(instance."/mydata",somedata);
    }
}
Another way to prevent these types of errors from occurring is by rechecking the
current instance list after all creates have been done. This is especially useful when
the code is executed from within discovery and the discovery process also acts as a
collector for each of the instances. In this case it might be a good idea to split the
collection logic from the create logic, as shown in the following code:
foreach instance (to_create)
{
    create(instance,instance,OK);
}

# Recheck which instances actually exist before collecting
foreach instance (get("instances"))
{
    set(instance."/Status/value",...);
}
File check
In some cases, discovery is completely dependent on the content of a file. Because of
this dependency, the file occasionally needs to be parsed to build the wanted
instances list. To avoid parsing it on every cycle, first determine whether the file has
changed. If it has, parse it and see whether you need to create or destroy instances,
as shown in the following sample code.
# Retrieve the timestamp from the namespace. The first time this executes
# it will be set to ""
old_timestamp = get("old_timestamp");
Once the file has been processed, its timestamp is saved. The next discovery cycle
retrieves this old timestamp again and compares it against the file's current
timestamp to check whether the file changed.
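A sketch of the complete check (the file path is illustrative, and it is assumed here that the PSL file() function returns status information that changes whenever the file is modified):

```
# Retrieve the timestamp from the namespace. The first time this
# executes it will be set to ""
old_timestamp = get("old_timestamp");

# Get the current status/modification information of the file
new_timestamp = file("/etc/myapp.conf");

if (new_timestamp != old_timestamp)
{
    # The file changed (or this is the first cycle): parse it and
    # rebuild the wanted instances list
    wanted_instances = grep("^instance:", cat("/etc/myapp.conf"));

    # ... create/destroy instances based on wanted_instances ...

    # Save the timestamp for the next discovery cycle
    set("old_timestamp", new_timestamp);
}
```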
Process check
When discovery is dependent on the existence of processes, you can use the
full_discovery() call which was already described in detail in the previous chapter.
An example is shown in the code below.
# Check if the process cache has been refreshed since the last
# time this command executed
if (! full_discovery())
{
# If the process cache was not refreshed, exit
exit;
}
If you do not select the “create icon for class” toggle, the console will not show a
class icon as a parent to each instance. Selecting or clearing this toggle is only a
visual effect and doesn't change anything in the way the namespace works. If the
check box is not checked, the application icon will not be created.
Nested instances
Nested instances are just a visual feature of the console, there is no additional object
level introduced in the Agent tree. This is the most important thing to understand
nested instances. Although instances will appear on the console as if there is a parent-
child relationship, the PATROL namespace doesn't change. There are some ways to
nest instances. If you don't remember the proper syntax of the create statement take
another look at the beginning of this chapter before continuing.
When creating nested instances, you have to ensure that the instance sids for a certain
application class are unique. This is sometimes overlooked, because the instances
could appear under a different parent.
Figure 41 on page 112 shows the APPL B application class with four instances. The
first and fourth create statements try to create an instance with the same sid 'X'. The
first create statement creates the instance object with sid 'X' and gives the instance
icon a label of 'root'; PATROL knows this object by its sid ('X'), not by “root”. The
fourth create statement requests to create instance object X a second time, but with a
label of 'tmp'. Even though the labels are different, we are attempting to create
multiple copies of an instance inside an application class. This is not possible.
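The colliding calls from the figure can be sketched as follows (the four-argument form shown here, with the application class as the first argument, is an assumption; check the create() syntax at the beginning of this chapter):

```
# First create statement: instance with sid 'X', labeled 'root' -- succeeds
create("APPL_B", "X", "root", OK);

# Fourth create statement: tries to reuse sid 'X' with a new label -- fails,
# because sids must be unique within an application class
create("APPL_B", "X", "tmp", OK);
```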
NOTE
It is not possible to nest instances from different agents under a specific main-map instance.
Typically, application classes whose only purpose is to be a parent have a KM name
of the form &lt;XXX&gt;_CONT. By creating such an application class you can also mimic
the “Create Icon for Class” behavior, but still have a nicer label for the icon.
To determine the children of an instance, you must get all the instances of the
application (using the instances attribute) and loop through the instances list,
comparing each one's parentInstance variable for matches.
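A sketch of that loop (the class path /APPL_B and the exact location of the parentInstance variable in the namespace are assumptions):

```
my_sid = get("sid");
children = "";

# Walk all instances of the class and keep the ones whose
# parentInstance points at us
foreach inst (get("/APPL_B/instances"))
{
    if (get("/APPL_B/".inst."/parentInstance") == my_sid)
    {
        children = children.inst."\n";
    }
}
```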
Limiting the number of application classes will have a limited impact on the Agent's
CPU. The application class itself uses some memory and CPU
(prediscovery/discovery), but it really starts to matter when you start
instantiating.
What is the total number of parameters that will execute every second?
■ KM1: 15 instances
■ KM2: 225 instances
■ KM3: 3375 instances
■ KM4: 50625 instances
■ Total instances: 54,240 instances
■ Total parameters: 101,250 parameters every minute
■ Parameters per second: 1687
Based on this example, you can see that it is important to limit the number of
instances, especially when you have nested instances, because the total number of
instances won't be obvious right away on the console.
If you could rewrite the KM logic so that you could remove an application class
(and a level of nesting), the results would be 15 times better. If you could cut the
number of instances created per KM in half, your KM would be roughly 16 times
better (a factor of two at each of the four levels). Those are very significant
differences!
Limitations
Once a nested instance has been created under a specific parent, it is impossible to
change the parent of the instance. The only way to work around this limitation is by
destroying and recreating the instance under the new parent.
Inheritance of a number of attributes does not follow the hierarchical path of the
parent. Usually, when you get() a variable from the namespace and the variable is
unknown, the namespace is traversed upward looking for the variable. A nested
instance will not traverse its way up through its parent.
Events generated from a nested child do not contain the full logical path of the
instance. If this logical path is really necessary you will have to retrieve the full path
of the instance yourself and trigger one of your own events. We will discuss events in
detail in a later section.
Avoid using '.' and '|' in your sid. The dot character is used by PEMAPI as a
delimiter, and the bar character is used by the layout database in the console. Never
use the '/' character in the sid, since this is the namespace delimiter.
The arguments to the PSL create function are instance name, instance label, and
initial status. A common error when learning PSL is to try a single argument, as
follows:
create("x"); # wrong!
This will create an instance, but it is hidden, because the default state in the absence
of a third argument is "NEW", which is an invisible state. The new instance will be
created under the computer icon, and you will have to drill down into the computer
icon on each agent to see the new icon.
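Passing all three arguments avoids the invisible NEW state (the label text is illustrative):

```
# sid, label, and an explicit initial status make the icon visible
create("x", "My Instance", OK);
```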
Instance filtering
The following sections provide information on instance filtering.
i = 0;
while (i < 5)
{
    name = "inst_".int(i);
    if (create(name,name,OK))
    {
        print("Instance : ".name." create succeeded\n");
    }
    else
    {
        print("Instance : ".name." create failed\n");
    }
    i = i + 1;
}
Depending on how the filter is configured, either the sids you list are excluded
(those instances will not be created, but all the rest will) or they are included (only
those instances will be created, and all the rest will be suppressed). To implement
filtering out an instance using a menu command, see below:
# Now get the sid. This is not the same as the "name"
inst_sid = get("sid");
3. This script adds the SID of the instance to the filter list and then immediately
destroys the instance.
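A sketch of such a menu command (the application name in the configuration variable and the use of pconfig() to append to it are assumptions):

```
# Get the sid of the instance the menu command was launched on.
# This is not the same as the "name".
inst_sid = get("sid");

# Append the sid to the agent's filter list so the instance
# will not be recreated by discovery
pconfig("APPEND", "/AgentSetup/MYAPP.filterList", inst_sid);

# Immediately remove the running instance
destroy(inst_sid);
```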
When filtering out an instance, ensure that you have a way to undo the operation. If
you filter out all instances, you end up with nothing, and there might not be a way in
your KM to add instances or edit the filter list when none of the instances are
available. To protect against this, add a function in your KM that ensures an offline
setup icon is created before the last instance is removed.
Filtering suggestions
When you see that you are creating too many instances, you should introduce some
sort of filtering capability for your KM. The following are some suggestions for
filtering criteria you could use.
Condition X
For example, monitor only the top X instances.
Consider moving the information in the parameters to a report that can be requested
by executing a menu command. For our four-KM example, this would mean
removing the parameters in KM4 (and KM4 altogether) and instead adding a report
menu command that returns the data the user would otherwise get by clicking on the
parameters. This also changes the collection of data from a “scheduled collection” to
a “collect on request”. In KM3, you can then create some summary parameters of the
parameters previously shown under KM4.
Instance pitfalls
The following sections describe a few pitfalls that you may encounter with instances.
You can think of the PATROL history database as a “circular” file, where the latest
data is written and expired data is written over. The history database can still
increase in size if you add other parameters or application classes, or if you increase
the retention period of any of your parameters.
But what about reducing the file size? The problem is that one would like to achieve
certain effects with custom KMs without blowing up the history file, and this is
proving rather difficult. For example, occasionally bringing in unique and temporary
file systems will slowly push the size of the history file up, with no definite limit. Is
there a way to store history on continually changing instances without having the
history file run away? This problem is commonly known as “history pollution”.
There is a method you can use to work around this, but it might not be acceptable in
all cases. Let's say you would like to monitor processes, display the combination of
process name + PID, and maintain some history about them. This is a very good
example of maximum history pollution, since the process name + PID combination is
extremely unique, and it is unlikely that this combination will ever reoccur once the
process has died.
It's important to know that the history uses the combination of /APPL/SID/PARAM as
an index to store the historical data.
What the end user sees on the console is not the instance ID but the instance label
(which is the “name” in the namespace), and that is something we can use. Just think
of an instance ID as a “slot number” and the instance label as the thing we want the
end user to see. Applied to our processes example, you could come up with
something like this:
/APPL/SID =
    /PROCESS/SLOT1
    /PROCESS/SLOT2
    /PROCESS/SLOT3

/APPL/INST =
    /PROCESS/inetd-445
    /PROCESS/ksh-336
    /PROCESS/xterm-776
As long as the number of processes stays reasonable, we will not pollute the history
more than necessary. (We only create the number of slots we need, and reclaim a slot
whenever one becomes available.)
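A minimal sketch of the slot technique (the relabeling through the name variable and the process-list handling are assumptions; reclaiming the slots of dead processes is omitted):

```
# 'processes' is assumed to hold a newline-separated list of
# "name-pid" strings for the processes discovered this cycle
next_slot = 1;
foreach proc (processes)
{
    slot = "SLOT".int(next_slot);
    next_slot = next_slot + 1;

    # Reuse the slot sid if it already exists; the label (proc)
    # is what the end user will see
    if (! exists(slot))
    {
        create(slot, proc, OK);
    }

    # Relabel the slot for its new occupant
    set(slot."/name", proc);
}
```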
Of course this technique might sound very good, but there is a problem with this
approach as well. If process A was using slot 1 and it dies, the slot becomes available
for a newly discovered process, let's say process B.
If we now ask for the history of a parameter belonging to process B, we will not only
get the history of the B process, but also the old history of the A process (or of every
old process that occupied the slot in the past). That means we have to provide
additional information on the graph, so the end user isn't confused about what
information he is actually looking at. One way to do that would be to annotate the
datapoints every time a process occupies the slot and every time it releases it.
Another way would be to define a certain “impossible” value for each of your
parameters and set the parameter to this impossible value whenever a switch occurs,
or whenever no one is occupying the slot.
The history file uses blocks to store data for parameter/instance values. Over time,
blocks are freed (because of history retention) for use with new data, but the new
space allocated will only be for values of the same parameter/instance. In other
words, if you have an instance /FILESYSTEM/tempfs01 and store history on it, when
the history expires, there will be room allocated in param.hist for
/FILESYSTEM/tempfs01 (only) even if that instance does not exist anymore. There is
currently no direct method in PSL to handle this problem.
This is certainly different from how most people think it works. Usually, one
assumes that once the history for /CLASSx/INSTx/PARx is out of date, the space it
took is available for any other /CLASSx/INSTx/PARx to fill.
Actually, it is quite logical that this is not the case. The history file works just like a
database: whenever history is outdated, the occupied space becomes available again,
but only for new values of the same /CLASSx/INSTx/PARx.
The problem lies somewhere else. When you set the history retention period to
7 days, the oldest datapoint for a certain /CLASSx/INSTx/PARx will not be removed
from the history database as long as the time difference between the oldest and the
newest datapoint is less than 7 days.
Maybe another example makes this clearer. Let's say we have a process that restarts
every 2 days, which means that every two days it is assigned another PID (and
therefore another instance). The data for each such instance will never span 7 days'
worth of history, so the index and data for these instances will never be removed.
In the best-case scenario, the process keeps running and only ever occupies 7 days'
worth of data, since the history works like a round-robin database: data older than
now minus 7 days is overwritten with new data.
Whenever the process dies (and the instance is removed), its history data will not be
cleaned up, because the rule is to keep 7 days' worth of data, so it just sits there and
you can't access it.
For most KMs this is OK: for example, a file system is unmounted and its instance
removed; when the file system is remounted, the instance is recreated and the “old”
data is still available.
However, with our example, it is very unlikely that the PID-process name
combination will ever reoccur, so its data will just sit there. Since no new data is
added, there is no reason for the agent to clean it up (as mentioned before, history is
cleaned only when the delta between the timestamps of the oldest and newest
datapoints is greater than the history retention period).
Of course it doesn't matter how many times a year the process restarts; what is
important to know is that the agent will store (and not necessarily free) data for each
/CLASSx/INSTx/PARx combination. The only freeing that happens is either
console-triggered (a developer console can call “Clear History”) or automatic, when
the time range of a parameter exceeds the history retention period.
8 Parameters and Recovery Actions
This chapter presents the following topics:
Parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
Standard Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
Collector Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Consumer Parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Parameter Styles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Text Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
No Output Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Gauge Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Graph Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
State Boolean Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
Stoplight Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
ExtraFilesList Parameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Creating an ExtraFilesList Parameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Parameter History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
Annotated Datapoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
Alarm Ranges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
Range Overlapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Recovery Actions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Use Recovery Actions Intelligently. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
Recovery Actions on Alarm1, Alarm2, and Border . . . . . . . . . . . . . . . . . . . . . . . . 136
Can Be Used to Tune or Detune . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
Debugging Recovery Actions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Pitfalls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Turn History Off If Not Needed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Set Scheduling to a Reasonable Interval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Value attribute . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Parameters
Parameters allow you to gather and display metrics of an application. A typical KM
defines a number of parameter scripts that collect various metrics. As soon as an
application instance is created, all the parameters defined in the application class
are automatically instantiated as well. Parameters that have scripts defined are put
in the RUNQ and scheduled by the scheduler.
There are three types of parameters (standard, collector, and consumer), each with
different characteristics. Table 4 lists the mandatory properties of a parameter.
Once the name has been defined, it can't be changed with the developer console
anymore. If you really need to change the name of a parameter and don't want to
duplicate it manually, you can edit the KM file with a text editor outside of the
developer console. Once the KM has been modified in such a way, you should restart
the developer console for the changes to take effect.
The active property allows you to create parameters that are inactive by default.
Besides these mandatory properties, there are also properties that you will only be
able to enter depending on the parameter type you are using. These properties can
be divided into two categories: execution and visualization. Depending on the type
of parameter you are defining, you may or may not be able to enter information for
the properties listed in Table 5 on page 125.
Standard Parameters
The standard parameter is the most complete of the three types of parameters. This
parameter offers both collection and visualization.
For non-PSL command types, the parameter's value is set from the data returned on
standard output. Ensure that you return only one value if your parameter style is
not text.
If you are using PSL, the standard parameter can act as a collector as well. You could
say that the standard parameter then behaves like a collector with visualization
built in.
If you set the output to something other than no output, you can update the
parameter directly from the PATROL Console.
NOTE
A no-output standard parameter is not the same as a collector: a standard parameter
has history and alarm ranges and can execute recovery actions.
Consumer Parameters
A consumer parameter only implements the visualization and recovery actions.
The most challenging issue with a consumer parameter is that you can't easily
determine which parameter, menu command, or discovery script sets the consumer.
Therefore, you should document where the setting of the parameter occurs.
Collector Parameters
A collector parameter only implements execution of scripts.
You should only use the PSL command type when defining a collector. If you use
another command type, stdout will implicitly set the value attribute of the parameter,
but since a collector doesn't have a value attribute, you will get a run-time error.
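The relationship between a collector and its consumers can be sketched like this (the parameter names and the vmstat field positions are assumptions): one PSL collector gathers several metrics in a single pass and pushes them into consumer parameters by setting their value attributes.

```
# Collector parameter script (PSL command type).
# One data-gathering pass feeds several consumer parameters.
stats = system("vmstat 1 2");
last_line = tail(stats, 1);

# Field positions depend on the platform's vmstat output
free_mem = ntharg(last_line, 4);
cpu_idle = ntharg(last_line, 15);

# Push the results into the consumer parameters of this instance
set("../MemFree/value", free_mem);
set("../CPUidle/value", cpu_idle);
```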
Parameter Styles
There are several parameter styles to choose from. As listed above, not all parameter
types support all styles. A definition of each parameter style and its meaning is
given below.
Text Parameters
The text parameter is the only parameter style that does not store history, but “no
history” does not mean “no memory”. The value of a text parameter is stored in the
agent's memory: the more text you store in the parameter, the bigger the agent
process will become, so you have to be very careful with commands like this:
set("value",get("value").new_info);
Such cumulative parameters risk consuming a great deal of memory. For the same
reason, you should not use text parameters for displaying entire log files. If you
really want to show something of the log, you could decide to show only the new
log entries since the last collection cycle.
You cannot define alarm ranges on a text parameter 8, but you can change the state
of the text parameter by changing the status attribute like this:
set("status",ALARM);
Because you can't define alarm ranges on a text parameter, it is also impossible to
execute recovery actions from it. If you really want to execute recovery actions, you
will have to do it in the collection cycle or by creating an additional parameter (for
example, a no-output parameter) that can execute the recovery actions for you.
No Output Parameters
Although no-output parameters don't show up, you can use them to store history.
Even a consumer parameter can have a no-output type.
You can also define alarm ranges and recovery actions for no output parameters. This
can be very useful in case you want to execute some “hidden” recovery actions or if
you want to increase the functionality of parameters that can't execute recovery
actions (like a text parameter).
8. This is technically incorrect: you can define alarm ranges, but they will simply be
ignored.
Gauge Parameters
A gauge parameter can only show a single value at a time. When annotated data is
available, the Info button on the gauge will be activated. You can click the button to
display the annotation data.
Graph Parameters
Graph parameters provide the most flexibility. After opening one, you immediately
get an overview of the status over time and the trend of the collected data. It is also
possible to drag and drop graph-type parameters together so they are displayed in a
single graph. The graph type can also be changed, for example to a pie or a bar
chart.
Stoplight Parameters
This style behaves a bit bizarrely on NT consoles earlier than version 3.4: the
stoplight on NT will only show the green or red light, and the platter displays the
actual difference in state. A yellow platter indicates a WARNING state; a red,
blinking stoplight indicates an ALARM state.
On a UNIX console, the stoplight turns green for OK, yellow for WARNING, and
red for ALARM. From PATROL v3.4 on, the behavior of the UNIX and NT consoles
is the same.
ExtraFilesList Parameter
ExtraFilesList allows a KM developer to specify extra files which should be
committed to an agent when the KM is committed. Below are the steps required to
specify extra files.
3 Add the list of files to be committed in the command text window. Each file should
be specified on a separate line. A distinction can be made between PSL library files
and any other file that should be sent by putting the keyword # LIB or # EXTRA
before the filename.
The # LIB keyword should be used when you specify PSL libraries, although the LIB
keyword doesn't really make a difference (yet).
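For example, the command text of an ExtraFilesList parameter might look like this (the filenames are illustrative):

```
# LIB mykm_utils.lib
# EXTRA mykm_defaults.cfg
# EXTRA README.mykm
```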
NOTE
File locations are relative to the local or global PSL directory
($PATROL_HOME/lib/psl). When the console attempts to find the specified file, it
always considers the local directory first. The files will end up on the agent in the
same location relative to the global PSL directory, regardless of whether the file was
taken from the local directory or the global directory.

You should start each line with a # character because # is the comment character in
most shells and in PSL. Thus, if someone accidentally activates the ExtraFilesList
parameter, it won't cause a problem.
ExtraFilesList will only work with developer consoles, since it uses the commit
function. You should never specify big binary files in the ExtraFilesList, because this
can impact the network performance. It is also impossible to commit different
executables to different platforms.
Parameter History
PATROL stores historical values for parameters (except text parameters) in a binary
history file local to the agent. The disk space used is approximately 8 bytes per value
and timestamp. There is also a separate index file with one index entry per
parameter.
You can turn off history collection for a parameter by changing the setting from
“inherited” to “local” and setting the number of days to “0”. Since PATROL uses the
double data type for storing values, you might experience incorrect results when
storing values that exceed what a history value can hold.
Annotations are stored in a separate history database and use disk space
approximately equal to the number of text characters.
You can extract the contents of this history via the dump_hist utility or the PSL
dump_hist() function.
Annotated Datapoints
If you would like to save textual data for a certain value, you can annotate the
datapoint. The syntax for annotation is:
annotate("<param path>","<fmt>","<data>",...);
The annotation data is saved for as long as the history of the parameter. When you
annotate a datapoint, you always annotate the last value that was set(), so remember
to set() before you annotate.
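For example, following the syntax above (the parameter path and message are illustrative):

```
# Set the datapoint first, then attach the annotation to it
set("/MYKM/myinst/NumProcs/value", 0);
annotate("/MYKM/myinst/NumProcs", "%s", "Process died; slot released");
```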
Annotating data doesn't come for free: all the data you annotate is saved to disk.
When monitoring a log file, make sure not to annotate every datapoint with all the
new log entries that were collected. An annotated datapoint should be an exception,
because otherwise the operator won't know where to look first.
Alarm Ranges
Alarm ranges define the ranges in which the state of the parameter should be
considered OK, WARNING, or ALARM.
Settings
There are three alarm ranges, as listed below, each with a minimum and a
maximum attribute. Both minimum and maximum can only be defined as integers.
When a new value arrives, it is evaluated against the ALARM range rules like this:
min border <= min alarm1 <= max alarm1 <= min alarm2
<= max alarm2 <= max border
This rule means the border is exclusive: if the definition is ]0,100[, then 0 is not part
of the border range, but 100.1 is. Alarm1 and alarm2 are inclusive: if the definition is
[10,30], then 30 would be in ALARM, but 30.01 would not.
Range Overlapping
In case max alarm1 = min alarm2, alarm1 takes precedence.
In the extreme case where all values are the same, alarm2 is completely overlapped
and therefore useless: that one value belongs to alarm1, and every other value
belongs to border.
Recovery Actions
Recovery actions arise from the combination of KM design and the agent. A recovery
action is a corrective action executed by the agent, launched as soon as a problem is
detected in order to correct it. Recovery actions are automatic and do not require any
user intervention.
Recovery actions can also have escalated multiple actions. A sequence of recovery
actions can be defined to attempt alternative solutions to an alarm. Recovery actions
can also be used to return to normal operation if the problem is corrected.
When a new value is set, the agent performs a range check. (This explains why you
don't see a parameter going into alarm immediately after you modify a range: the
agent first has to collect a new value.)
No matter what happens, there will only be one recovery action running per
parameter at a time, no matter how often the value switches between ranges. If there
is a recovery action that should be executed (and no recovery action is currently
running), the recovery action will fire off immediately. If a recovery action is already
running (even one belonging to another range), your recovery action will not
execute.
Each range can have multiple recovery actions. The moment the agent changes range
(NOT the same as a state change), it restarts processing the list of recovery actions
from the top. If the range is still the same when a new value arrives, the agent will
try to fire off the next recovery action in the list (as long as no other recovery action
is running for that parameter).
Pitfalls
The following sections describe pitfalls with recovery actions.
Value attribute
You do not set an object; you set an attribute of the object. The following instruction
is therefore invalid and can cause a lot of confusion:
set("/MYKM/myinst/param1",50);
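The correct form names the value attribute explicitly (the path is illustrative):

```
# Set the parameter's value attribute, not the parameter object itself
set("/MYKM/myinst/param1/value", 50);
```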
Parameter-related observations
The following are parameter-related observations:
9 Menu Commands and Infoboxes
This chapter contains the following sections:
Menu Commands
Menu commands allow developers to add administrative functions to KMs. Usually,
menu commands are used for administration of the KM and of the application the
KM is managing. Menu commands are only available on application instances.
When the operator selects a menu command, the menu command text is sent to the
agent and scheduled for immediate execution. Before the agent can execute the
script, it compiles it first.
A menu command can be executed as a task or as a command; we will discuss the
difference between the two below. For example, consider a menu command whose
text contains the following line:
"#%MODES% dev"
The result is that the menu command will only be available via the developer
console and will not show up on the operator console.
Some macro variables that you can also use in state change actions are available to
pass context to your OS command (see the chapter about events).
%%{. . . } macro
This macro will ask for user input and substitute the result in the menu command text
prior to sending it to the agent. Although it is useful for testing, because it is a lot
easier than defining a response() function, this macro should not be used in
production code.
The data entered by the user will be literally replaced. Improper input by the operator
can result in compiler errors. It can also allow operators to execute arbitrary PSL code
on an agent.
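As an illustrative sketch only (the prompt text, and the exact placement of the macro in the menu command text, are assumptions), a menu command using this macro might look like:

%PSL print("You entered: %%{Enter a value}\n");

The console prompts the operator with "Enter a value" and substitutes the typed text into the command text before sending it to the agent.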
Command
When you define the menu command, you have to select whether to execute it as a
task or as a command. If you select command, then the PSL script is scheduled
for execution just like a parameter. You have no control over the process once it is
launched. Any output that is generated by the menu command will be sent to the
system output window.
Tasks
If the command is defined to execute as a task, a console object (text box) will be
created when the command is started. From this console object you can choose to
kill, suspend, or restart the command. Output is directed to this task window on
the console.
The following explanation shows one way to create an overlay KM. This feature is not
limited to just computer class KM's and can sometimes be useful if you want to add
parameters to any existing KM without having to modify the original file.
3. Add the menu commands to this new KM. These menu commands will eventually
be merged with the menu commands that were already defined in the
ALL_COMPUTERS.km.
5. Open the KM in a text editor and rename the COMPUTER CLASS NAME from
MYCOMPCLASS to ALL_COMPUTERS.
Another way is to remove everything from the ALL_COMPUTERS KM but your own
menu commands in the PATROL console (make sure you have a copy of the original
file somewhere). Save this new KM and rename the file to MYCOMPCLASS.km. The
result will be the same.
Infoboxes
When “infobox” is selected on the console, all infobox commands will be sent to the
agent and scheduled for immediate execution.
The Infobox has two types of entries: built-in entries at the top that are hard-coded
into the agent, and KM-specific entries at the bottom. Only KM-specific entries can be
added by the developer to extend the functionality of an infobox.
■ Application version
■ Application’s installed directory name
■ Knowledge Module version
■ Overall condition or status of the application
The value shown in the right-hand portion of the infobox is set via the PSL print()
command.
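For example, a KM-specific infobox entry defined as a PSL command could simply be (the variable name is illustrative):

# Print a value from the instance namespace into the infobox
print(get("version"));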
Both PSL and OS commands can be used. Usually the OS command macro %ECHO is
used, because it results in very fast execution. (No rtcell or PSL process is created
when the agent sees that the user just wants to run a %ECHO command.) An easy way
to print a namespace variable is:
%ECHO %{<variable>}
The output of a command defined in an infobox is sent to the text box. If you execute
an OS command, be careful that your result is on the first line, since only the
first line will be visible.
Because of this, you should consider moving all your menu command code to
libraries. The only code sent over the network would then typically be:
requires my_menucmdlib;
menu_show_preferences();
If it takes a while before you even see that first line printed, this is probably
caused by a combination of network delay and compilation time. Try putting your
menu command in a library in that case.
10 PSL response function
This chapter presents the following topics:
Response function
The PSL response() function allows you to create interactive dialog boxes on the
PATROL Console for information, collection, and presentation.
The function arguments of a response function define the user interface elements.
This first line is followed by a list of result values for each of the elements that made
up the response window.
The order of the return values is the same as the order that the elements were defined
in the response() function.
The PATROL Console can display multiple response() windows as it receives them
from the Agent.
If multiple PATROL Consoles are connected to the same PATROL Agent, a response
function window produced by a parameter recovery action will appear on each of
them. In this situation, the first operator that presses OK or CANCEL will in effect
acknowledge them all. Clicking in the remaining response windows will not have any
effect on the agent, but they are not automatically removed.
NOTE
BMC Software recommends that you not generate response windows from a recovery action.
Should multiple agents have a problem, the operator could potentially receive multiple
response windows. Since there is no option on the console to close all response windows, this
will be a manual task for the operator. In case you really want to create response windows
from a recovery action, make sure to provide a toggle so users can turn this option off.
If B=1 (broadcast property) is selected, all consoles that have the application loaded
will see the response() function. This behavior is not very useful.
You can continue to update the response window by specifying its rid (the response
window identifier) as the title of the response function.
For more information about dynamic response windows, see the PSL Reference
Manual.
Response pitfalls
The following sections describe a few pitfalls with using response.
For example, if you only want to allow one operator at a time to open a specific
response box, you will have to use some sort of locking to make sure that the second
response function cannot be opened.
If you use locking (the real lock() function), you will see that the agent
automatically releases all locks held by a PSL process the moment it dies, so it seems
that the PSL lock() function can help us out here.
if (lock("MYLOCK","x",0))
{
    print("I've got the lock\n");
}
else
{
    print("Lock already held by someone else\n");
    exit;
}
response("LOCK TEST","","","1\nCLICK TO RELEASE THE LOCK");
You will see that this works and allows you to have only one response function of a
specific type active.
But by doing this you have created another problem. Let's say that your customer is
using PATROL for 24-hour support with two operator sites (one in Belgium and
one in Houston), so they can do “follow the sun” support. A problem happened in
Houston during the daytime and someone opened the response function.... Now what
happens if the person who opened the response function forgot to close it? Twelve
hours later the same problem occurs. The operator in Belgium tries to open the
response function and gets a message that someone else already opened it.
To solve the problem (and get access to the response function), the operator in
Belgium has to call the Houston office (probably no answer) or restart the agent
(a drastic intervention, just to release the lock).
I mentioned that a lock is automatically released when a PSL process is killed,
so we change the PSL script to the following:
if (lock("MYLOCK","x",0))
{
    print("I've got the lock\n");
    set("lockpid",getpid());
    set("locktime",time());
}
else
{
    lockpid=get("lockpid");
    locktime=get("locktime");
    print("Lock already held by pid ".lockpid.
          " timestamp=(".locktime.")\n\n");
    # Probably you would like to ask the user here if he
    # wants to use the emergency
    # unlock ... we will presume he wants to
    kill(lockpid);
    print("We killed the process holding the lock, ".
          "try again\n");
    exit;
}
response("LOCK TEST","","","1\nCLICK TO RELEASE THE LOCK");
Indeed, it seems the process can't be killed.... Well, this is something special. The
process is in an IOWAIT state (see %PSLPS) and is therefore not considered to be
running. So how can we fix that problem?
Well, it is obvious that we won't be able to kill the response function (because of the
IOWAIT state) if we write it like that. Maybe we can rewrite the response function so
we can kill the PSL process? Take a look at this code:
if (lock("MYLOCK","x",0))
{
    print("I've got the lock\n");
    set("lockpid",getpid());
    set("locktime",time());
}
else
{
    lockpid=get("lockpid");
    locktime=get("locktime");
    print("Lock already held by pid ".lockpid.
          " timestamp=(".locktime.")\n\n");
    # Probably you would like to ask the user here if he
    # wants to use the emergency
    # unlock ... we will presume he wants to
    kill(lockpid);
    print("We killed the process holding the lock, ".
          "try again\n");
    exit;
}
# Open a dynamic response function
rid=response("LOCK TEST",
             "","D=1","1\nCLICK TO RELEASE THE LOCK");
output="";
while (!output)
{
    output=response_get_value(rid,1);
    sleep(1);
}
# Output received, kill it
response(rid,"","K=1");
This will work, because we have created a while loop that lets the PSL process
come out of the IOWAIT state once every second... this is enough to have the
process killed (and release the lock in case of emergency).
There are other possible workarounds to this problem. What you can also do is
maintain a list (PIDLIST) of the PIDs that started this menu command (and will
potentially modify something in the response function). When someone saves the info,
you check whether his PID is (still) in the PIDLIST.
If it is, you save the info and clear out the PIDLIST... This prevents
anyone else from saving the info (because the PIDLIST is cleared).
If someone opens the response box after the info was saved, his PID will be added to
the list again and he will be able to save his info.
This procedure may take a moment to digest, but it is not based on locking and it is
secure, because the list will only be cleared if someone saves the info (meaning that
we don't care about processes that die or disconnect).
Clicker behavior
For some developers it just seems to be impossible to get the clicker element in the
response() function to work. Most of the time this is because of the inconsistency
between the UNIX and the NT console, but sometimes also because they didn't clearly
understand how it is supposed to work or behave. Before I get into the known
problems, I would like to explain how the clicker is supposed to work.
You have to specify a minimum value, a maximum value, and a default value.
Let's refer to them as min, max, and default.
■ If default doesn't lie between min and max, PATROL will take for default a value
of (min+max)/2
The clicker window is sized according to the size of the default value. The value will
still be correct (you just have to scroll in the small window containing the number to
see the actual value).
A possible workaround is to make sure the default value contains at least as many
characters as there are in min and max (a leading zero will not work for
default):
min=99;
max=10;
def=50;
if (def>max) { def=max; }
if (def<min) { def=min; }
lenmin=length(int(min));
lenmax=length(int(max));
lendef=length(int(def));
# Trap clicker problem...set default to something else
if ((lendef<lenmin) || (lendef<lenmax))
{
    def=(length(min) > length(max)) ? min : max;
}
printf("min=%i - max=%i - def=%i\n",min,max,def);
Of course, this limits the use of this control a lot, because you will not be able to set
default exactly as you would like (but in case you don't mind, this might be
useful). An alternative might be to use the slider or another response() control.
11 PSL libraries
This chapter presents the following topics:
PSL library files have a .lib suffix. The format of a .lib file is binary, and it contains
compiled and optimized quad-code. The format is not a native executable or a DLL
format. The binary format is platform independent and does not suffer from
portability issues, such as endianness or network byte order constraints. This
platform independence is achieved through the use of a byte code format specific to
PATROL.
If you use libraries to make the PSL command text in parameters smaller, there is no
real performance gain, because the PSL code for parameters is only compiled once.
However, performance will increase if you use libraries for menu commands.
Menu commands are compiled every time they are executed. This compile time
might be saved when you have complex menu commands. The menu command text
would consist of as many library calls as possible.
Libraries don't contain a version at all. Actually none of the compiled PSL formats
support versions. The agent will use the timestamp of the file to see if the file was
changed or not.
■ In PSL script - requires mylib; all exported functions/variables from mylib are
available
Requires
A PSL script loads a PSL library using the requires keyword. The requires keyword is
followed by the library file name, either directly as an identifier or in double
quotation marks as a string literal. Assuming the existence of a PSL library file named
myfuncs.lib, the following requires statements should be identical:
requires myfuncs;
OR
requires "myfuncs";
OR
requires "myfuncs.lib";
The use of a file extension outside the quotation marks is not allowed, as shown in the
following statement:
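Presumably, the disallowed form referred to here is the file extension written outside quotation marks:

requires myfuncs.lib;    # invalid: extension used without quotation marks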
When the PSL compiler encounters a requires statement, the agent loads a PSL library
file into memory if it has not already been loaded (for example, if it was required by
another PSL script). The agent must find the library file in the lib/psl directory
under the PATROL_HOME environment variable. The functions and global variables
that have been exported by the PSL library can be used in the PSL script at any point
after the requires statement.
A function can be exported before the definition. A function cannot have the same
name as a built-in PSL function name such as ntharg() or grep().
Export
The main keyword in the definition of a PSL library is export. This keyword is used to
explicitly export a PSL user-defined function or a PSL global variable. A function can
be exported by the export function syntax, as follows:
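A minimal sketch of the syntax (the function name and body are illustrative):

export function myfunc(arg1, arg2)
{
    return arg1 + arg2;
}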
Exporting variables
A global variable in a PSL library can be exported using the export syntax:
export myvar;
A global variable can be exported before its first use. A function's local variable
cannot be exported. Exported global variables have a distinct value for each PSL
process using the PSL library. Therefore, they are not exactly like variables inside a
DLL, where there would be only one value for any global variable. For example, if
two PSL scripts require a library, and it exports a global variable 'x', then the two PSL
scripts can set 'x' to different values without interfering with each other.
This is usually convenient in preventing bugs, but can be limiting in that there is no
way for PSL processes to use PSL libraries to communicate with each other. On the
other hand, the PSL get()/set() namespace functions are more than adequate for PSL
processes to communicate with each other, so this is not a major disadvantage.
Exported global variables cannot be automatically initialized. They have the empty
string at startup of the PSL process, just like all PSL variables, but reliance on this fact
can trigger a PSL runtime warning message if you have the “PslDebug” settings
enabled.
There is no way for a PSL library to automatically call an initialization function when
it is loaded. Instead, the only solution is an explicit initialization function in the
library that must be called by every PSL script that uses the library. Alternatively, this
can be hidden by calling the initialization routine before any other library processing
occurs, but this is less efficient since it requires checking each time whether
initialization has occurred yet.
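A sketch of the explicit initialization pattern described above (all names are illustrative):

export initialized;
export function mylib_init()
{
    # Guard so repeated calls are cheap within one PSL process
    if (initialized) { return; }
    initialized = 1;
    # ... one-time setup for this PSL process goes here ...
}

Every PSL script that requires the library would call mylib_init() before using any other library function.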
Exporting functions
With exporting functions:
Code sharing
The main purpose of libraries is to allow code sharing. However, code sharing doesn't
mean that all the functions defined in the library are automatically exported and
usable by the process that requires it.
Code reuse
To make a function available to a process that requires the library, it must be exported
in the library first. This means that all non-exported functions in the library are
callable only in the library. Every library function is stored only once inside the
agent's PSL cache and all PSL processes will use the same function code from the
cache.
Function variables
Function arguments are private to the PSL process (there is no call-by-reference
possibility with PATROL) and are duplicated for the library function.
This means that the library function can't change the data of any of its arguments.
Function hiding
Call by reference is simply impossible because every library function duplicates its
arguments before using them. However, the library can access variables that belong to
the PSL process. Therefore, the function code is executed in the context of the PSL
process that called it.
The psl -l myfuncs.psl command creates the myfuncs.lib file in the current directory.
The command must be executed in the directory containing myfuncs.psl. Additionally,
the command does not automatically find PSL files in either local or global directories.
You cannot create or edit PSL libraries from the developer console. The only method
is from the command line using the psl executable.
There is also no performance penalty when you use PSL libraries. All libraries are
internally cached by the agent and have the same performance as other PSL scripts.
The agent does gain some memory usage when you use PSL libraries. The agent
maintains only one copy of the read-only data for a PSL library, including all the
quad-code instructions, constants, and linkage data. Using a PSL function from a
library, rather than copying it into many files (which is poor programming style),
saves the agent some memory. The agent maintains multiple
copies of PSL library dynamic information, such as the values of local, global and
exported library variables, with one copy per PSL process using the library.
Understanding these issues will help avoid problems and save time.
Developing the PSL library on the agent machine does not work well when
developing or testing on more than one agent.
A trick to changing libraries without having to restart the agent is to change the name
of the library and recompile it. Don't forget to change your requires statement in the
KM.
These steps are necessary because of a design flaw involving late linkage of the PSL
library and binary format. Function and variable names are mapped to integer offsets
too early. This mapping makes PSL library loading efficient because it does not need
to happen in the agent, but it leads to these problems of inflexibility when the
functions change.
The agent can experience strange failures if some of these steps are missed. The agent
has difficulties handling inconsistent PSL libraries. Fortunately, it does not crash; it
will offer a number of error messages such as “fatal inconsistency detected.”
Not all failures are this obvious. Some of the “quiet” failures involve the wrong PSL
library function being called or the wrong global variable being accessed. When
debugging a KM or PSL error that appears to defy logic, consider the possible
failures from PSL libraries.
Did you restart the agent the last time you changed some libraries? Did you
recompile dependent PSL libraries when you changed just one of them?
Did you copy the newly created “.lib” library files down to the agent's global 'psl'
directory before restarting the agent?
In the face of strange behavior or a PSL coding bug you thought you had fixed, these
PSL library issues could be causing you to chase ghosts instead of your real PSL
coding bugs.
5 Recompile the PSL library via psl -l to create a new .lib file.
6 Recompile any dependent other PSL library files with psl -l.
7 Copy or FTP one or more changed .lib files to the agent’s global psl directory.
8 Ensure that library file permissions are adequate to allow the default agent account
to read the library files.
%DUMP LIBRARIES
In PATROL 3.1 there is no way to detect the status of the libraries. However,
PATROL 3.2 offers the built-in agent command %DUMP LIBRARIES to list the PSL
libraries and when they were loaded.
The output from the “-R” option is either a report that no PSL libraries are required,
such as: 'myfuncs.psl' requires no PSL libraries.
Alternatively, if PSL libraries are required, the output will look similar to:
The status (second argument) is either dynamic or static, depending on whether the
“-s” option to “psl” was used to create a statically linked library. Generally, all PSL
libraries are dynamic.
Library type
The type refers to whether the library is a normal PSL library, as discussed here, or an
XPC-based DLL-like PSL library, which is discussed elsewhere.
Unfortunately, the “-R” function is not supported on PSL library files, and fails with
an error message:
This is a limitation of the psl executable and not a problem with the library file
formats. One inelegant workaround may well be to copy myfuncs.lib to
myfuncs.bin, and then try psl -R myfuncs.bin. Because of the similarity of the “.bin”
and “.lib” file formats, this will show the library usage dependencies.
PSL binaries
It is also possible to compile the PSL scripts you saved in a separate PSL file to PSL
binaries. With the command psl -b myscript.psl, you will generate a myscript.bin file.
When distributing the KM you can decide not to distribute the source PSL files, but
the .bin files instead.
The PSL binaries also don't have a version number. Binary files will be
unconditionally committed.
■ The Developer Console KM downloading does not send PSL library files.
■ The -R option requires the libraries to exist and reports an error if they do not exist.
■ The psl executable only finds libraries in the current directory and has no -I or -L
option to give an alternative directory. However, an environment variable named
PSL_LOAD_PATH can be used.
■ There is no way to create libraries from the Developer Console KM editing dialog
boxes.
Despite these problems, the use of PSL libraries is good coding practice. It encourages
code modularity and reuse and simplifies larger PSL coding problems. The problems
with distribution are usually solved by including the PSL library files in agent-side
installation routines. The majority of other difficulties occur during development of
the KM and do not affect the end user of the KM.
Do not put the requires statement as one of the first lines in the prediscovery without
first determining whether the application is installed on the system.
Once the agent loads the library, it stays in agent memory and can't be unloaded.
If your application doesn't exist on the system, you might want to set active to zero
(unload the application).
12 Configuration Variables
This chapter contains the following sections:
If the binary configuration file does not exist, or config.default was changed, the
PATROL Agent process reads the ASCII config.default file and creates/loads it into
the binary configuration file.
■ The PATROL_CONFIG command must appear at the top of the file.
■ The change entries can appear in any order and are applied in the order of
appearance.
When defining changes or adding new variables to this file, you should be aware of
some constraints:
■ Variable path - You can use any path to define the variable in a configuration file.
The path length is limited to 1024 characters.
Variable Syntax
"<VAR1>" = { <ACTION> = "value1[,val2,...]"}
Action Syntax
"<VAR1>" = { <ACTION> = "value1[,val2,...]"}
Value Syntax
"<VAR1>" = { <ACTION> = "value1[,val2,...]"}
■ values form a comma-separated list
■ beware of backslash (\) characters (note that \n also acts as a list separator)
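Putting the syntax lines above together, a hypothetical change entry might look like this (the variable path and values are illustrative; REPLACE is one of the available actions):

PATROL_CONFIG
"/MYAPP/filterList" = { REPLACE = "proc1,proc2,proc3" }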
Figure 54 on page 170 shows phase 2 in modifying the binary configuration file.
■ <APPL>.filterList
■ <APPL>.filterType
defaultAccount
The agent uses the <APPL>[.<INST>].defaultAccount user to run parameters,
recovery actions, and application discovery procedures if an account is not specified
for these commands. The default is patrol. If a hard-coded default other than
patrol is used, the agent issues a warning unless a valid encrypted password is
provided for the account; for example, “user1/<encrypted password>.”
OSdefaultAccount
The agent uses the <APPL>[.<INST>].OSdefaultAccount user to run parameters and
recovery actions for this application or application instances unless an account is
specified for these commands. The default is NULL (use PatrolAgent's
defaultAccount).
OSdefaultAccountAppliesToCmds
If the flag is set to “yes,” then menu commands that run against instances of this
application will run as user specified by either:
■ <APPL>.<INST>.OSdefaultAccount
■ <APPL>.OSdefaultAccount
Otherwise, menu commands will run as the user that the console logged into the agent
as, unless an explicit account is specified for the command. The default is “no” (don't
run as *.OSdefaultAccount).
■ Parameter
■ Instance
■ Application
If an agent configuration variable exists at a given level, the agent will use that
configuration setting. Otherwise, the agent will check whether the configuration was
set in the property sheet before proceeding to the next level.
13 Command Types
This chapter contains the following sections:
Command Types
Command types are a powerful but, alas, unintuitive mechanism for isolating the
specifics of executing certain types of commands. Each KM has a set of attributes called “Command
Types” that can be set on a per-class basis. Command Types can be specified for
application or computer classes or instances via the “Types of Commands” menu
entry.
Once defined, the command types appear in the option menu in the Console GUI
for the “Command Type” button when creating a command. However, this
direct usage of the Command Type is not recommended; it is better to use the
name of a command type as the first argument to PSL execute() or PSL popen() in a
PSL script. Sample PSL scripts might look like:
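The sample scripts referred to here might plausibly look like the following (the command type name and SQL text are illustrative):

# Run a command of type "MY SQL" and collect all its output
output = execute("MY SQL", "select count(*) from v$session;");

# Or open an interactive global channel of the same type
chan = popen("MY SQL", "select name from v$database;");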
This use of command types with PSL execute or popen is their primary application. In
this way, they are often used in conjunction with global channels opened via PSL
popen.
The three main fields are Command Template, Pre-Command Text, and
Post-Command Text.
The fields related to killing the process, such as “Terminate if output contains”, are
not usually used because the functionality offered is too limited. Killing the process is
usually handled explicitly rather than using these flags.
The %{password} macro allows command types to access secure information without
making it available to the PSL code in general. Command types are the only place
where the password built-in macro is available to a KM. For example, a PSL get() on
the password variable in the symbol table will fail, returning the empty string.
The %{command} macro for command text is the text string that comes from the
second argument to the PSL execute or popen call. In the case of directly launched
commands using this command type, this is the full command-text of a command
explicitly typed as this command type.
In this example, pre-text and post-text are irrelevant and ignored. This means that
using the %{command} macro in the Command Template has limited value. This
usage only achieves a wrapping of the command inside another command, but does
not allow the use of pretext or post-text to send input to the command.
For example, let us make a command that executes a SQL script against an interactive
database command:
For this to operate correctly, say in Oracle, we need this SQL command in the second
argument to be passed to some interactive command. In the Oracle case we could
treat “sqlplus” or “svrmgr” as commands that take SQL scripts as input. This would
lead us to use the following definition of the “MY SQL” command type:
Now, when the PSL execute statement runs from above, the agent will execute the
command “sqlplus”. It will then send the SQL script that is the second command text
argument to PSL execute, down to the standard input of sqlplus. In theory, sqlplus
will then execute the SQL command it receives, and return the output. However, if
you know Oracle, you know that this example does not yet actually work. There are a
few problems.
The first problem is that sqlplus requires a username and password to be submitted
before it will run a SQL command. We need to extend our command type to use the
%{username} and %{password} macros to securely submit these to the standard input
of sqlplus before sending the SQL script.
Secondly, sqlplus will also produce a lot of spurious output that we might not want
to see in the result. This is more easily solved by using the -s flag to make sqlplus go
into “silent” mode and suppress prompt output.
Thirdly, there is another hidden problem in that sqlplus does not actually exit after
running a single SQL script. This would mean that, assuming we fixed both other
problems, the call to PSL execute would still hang indefinitely because it is waiting
for the sqlplus process to die. We could use PSL popen to open an interactive channel,
read the resulting output via PSL read, and then kill the process via the correct
arguments to PSL close. However, it is simpler to ensure that the sqlplus process
receives an “exit;” command after the SQL script. We could explicitly add
“\n\nexit\n\n” to our PSL execute call, but it is better to place the exit statement
in the command type itself, rather than having to remember it for every call to PSL
execute.
So the question is how to add the username and password before the command text,
and the exit statement after the command text. Fortunately, this is exactly why a
command type has pre-text and post-text. The example above shows that the
command text is sent to the command's standard input. Actually, the full string sent
down the pipe to the sub-process's standard input is the concatenation of the pre-text,
command text, and post-text. If there is pre-text or post-text that are non-empty
strings, then some extra newlines are added. A newline character is added after the
pre-text, after the command text, and after the post-text (at the end). The special
command type macros above are expanded in the pre-text and command template,
but not in the post-text or command text. If there is no pre-text or post-text, the
command text is sent down the pipe, followed by a newline (to flush). In summary,
the string sent
to standard input is:
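Reconstructed from the description above, the string sent to standard input is presumably:

<pre-text>\n<command-text>\n<post-text>\n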
Therefore we can extend our example command type to handle the security login and
to terminate the session afterwards. The new attributes of the “MY SQL” command
type should be:
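Based on the fixes discussed above, the field values might plausibly be (these exact values are assumptions, not taken from the original):

Command Template:   sqlplus -s
Pre-Command Text:   %{username}/%{password}
Post-Command Text:  exit;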
Note that pre-text and post-text are not written to the file, and in fact have no effect if
%{commandFile} is used. This can be considered either a bug or a feature, as you
please. Note that whenever the command finishes, the temporary file will be
automatically removed by the agent.
NOTE
Command Types cannot use both %{command} and %{commandFile}; doing so will
produce a runtime error from the agent, and any commands using that command type
will not execute.
Without all of the features provided by the shell you can't do a great deal, but it does
allow you to execute commands directly, rather than wasting a memory/process slot
on executing a shell and then having the shell execute your command. So there are
obvious tradeoffs.
This does have an important advantage: using a newly defined command type
without a %{shell} macro is more efficient than choosing the OS command type for a
command or a PSL execute/popen call, since the built-in OS type spawns a shell to
handle the commands. These extra shell processes can be avoided by using a
command type instead.
The tokenization (parsing of the command string into a list of arguments) performed
by the agent is very, very simple. It splits the command string into an argument list
based only on whitespace. So you can't write a token '"a b c"' and expect it to be
parsed into one argument, 'a b c'. What you will actually get is three arguments:
'"a', 'b', 'c"'.
For example, if you have a command type defined as:
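The template itself is missing here; based on the explanation that follows, it was presumably something along the lines of (the filename is illustrative):

cat < /tmp/somefile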
This is erroneous, since the “cat” command will get two arguments, “<” and the
filename, and will try to output a file named “<”. Instead, it is necessary to use
%{shell} here to ensure that the redirection occurs correctly.
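A plausible corrected template, assuming the %{shell} macro expands to an invocation of the OS shell (the exact expansion is an assumption):

%{shell} cat < /tmp/somefile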
If you are using the execute() function with such a command type in a discovery
script, the instance should be specified in the third argument of the execute()
function. When execute() is called in an infobox, menu command, parameter, or
recovery action, PATROL finds the nearest ancestor instance itself. However, since
discovery runs at the application level, there is no way the agent can determine the
instance you would like to use.
In case you want to run execute() after you have already created an instance, you can
specify that instance. For example, a discovery script like the one below will work:
...
create(inst,inst,OK);
...
execute(cmdtype,cmd,inst);
...
If you are trying to discover which instances you should create for your
application class (for example, CHILD.km), you can define the command type in
another KM, possibly even the parent of CHILD.km (for example,
PARENT.km). In this case, PARENT.km should already have at least one instance; for
example, PARENT/TOPLEVEL could be the unconditionally created parent instance.
execute("CMDTYPE","cmd","/PARENT/TOPLEVEL");
Another option (not really advised) would be to create an invisible instance of your
application (unconditionally). In your discovery you can do the following to create an
invisible instance. Note that the name "invisible" is not important, but the fact that
this create statement has only one argument is!
if (! exists("invisible"))
{
# Create invisible object
create("invisible");
}
Now every execute can use "invisible" as the instance that provides the context.
execute(cmdtype,cmd,"invisible");
14 Global Channels
This chapter contains the following sections:
Introduction
Global Channels
Shared Channel Locking
Channel Example
PSL Program
Perl Program
Channel Pitfalls
Shared channels not closed
Channel synchronization
Introduction
The use of a PSL global channel is a design advantage for a KM where an external
process is used to access the application.
If more than one PSL process needs to get information from or to the application, this
is not efficient. A local channel can be accessed only by the PSL process that spawned
the child via the PSL popen function. In this case the more performance effective
approach is to use the PSL share function to convert the local channel to a global
channel. The global channel idea involves only creating a single child process.
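A minimal sketch of this conversion follows; the spawned command line, the namespace path, and the variable names are assumptions for illustration, not product-documented values:

```
# Spawn the external process once; popen() returns a local channel.
chan = popen("OS", "sqlplus -s scott/tiger");

# Convert the local channel into a global channel so that other
# PSL processes can use the same child process.
share(chan);

# Publish the channel id in the agent namespace so other scripts can find it.
set("/MYAPP/sqlChannel", chan);
```

Other PSL processes would then retrieve the channel with get("/MYAPP/sqlChannel") and read from or write to it, normally under a lock.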
Global Channels
A global channel is typically a pipe into an interactive process. This child process can
be as simple as a system command shell, a commandline tool or a self-written
application.
The use of PSL global channels improves performance only when the required
conditions arise, involving multiple PSL scripts and multiple child processes. The
channel is an internal agent data structure and has no actual physical runtime
presence on the machine.
The performance advantage of the global channel is that it allows one child process to
be used by multiple PSL processes.
When using shared channels, a single subprocess answers all requests from a
number of PSL processes. This can be very good for performance (applied to a KM
that monitors Oracle, it can mean one connection per Oracle instance instead of
one connection per parameter in the KM).
When you lock() a channel, it is locked for a particular PSL process. When you
implement locking, you normally lock the channel before writing to it, and you
unlock it after you have read the output from the channel.
If another process wants to write to the channel, it waits until it can acquire a lock on
the channel (that is, until the process occupying the channel unlocks it). If no process
is using the channel at the time of the request, the requester gets the lock
immediately. You should also implement timeouts on the lock, so that a process
cannot wait on the channel indefinitely; otherwise the channel blocks and your KM
stops responding.
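A sketch of that locking pattern follows; the lock name is invented, and the two-argument lock() form with a timeout in seconds is an assumption to be checked against the PSL reference:

```
# Serialize access to the shared channel with a named lock.
if (lock("MYAPP_channel_lock", 60))   # wait at most 60 seconds (assumed signature)
{
    write(chan, "status\n");
    result = read(chan);
    unlock("MYAPP_channel_lock");
}
# If the lock could not be acquired in time, skip this cycle
# rather than block the KM.
```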
Be careful when using shared channels in menu commands. Users expect
immediate feedback when executing a menu command. If the execution goes
over the same channel that you are monitoring with, a user issuing many menu
commands will influence the monitoring part of your KM (if the user gets the lock all
the time). Another possibility is that the user does not get the lock at all (or within an
acceptable time interval). The user's perception will be that the KM performs badly;
therefore, it is advisable to execute menu commands as one-time commands.
Channel Example
Although channels are quite straightforward to work with, many developers seem to
find it difficult to get started with them. The following example shows not only
how to create channels, but also how you can use channels to communicate with other
processes and do something useful. The example passes a job that needs extended
regular expressions to Perl. The two programs below work in tandem
and communicate over a channel.
PSL Program
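The PSL side of the example did not survive in this copy. The following is a reconstruction sketched from the protocol the Perl program expects (a @#!FILTER!#@ line carrying the pattern, data lines, then a @#!EOT!#@ terminator, answered by a byte count and the matching lines); the script path and variable names are assumptions:

```
# Open a channel to the Perl filter once.
chan = popen("OS", "perl filter.pl");

# Send the extended regular expression, the data, and the terminator.
write(chan, "@#!FILTER!#@ error|warning\n");
write(chan, data);
write(chan, "@#!EOT!#@\n");

# The reply is the length of the matching output followed by the output itself.
reply = read(chan);
```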
Perl Program
$| = 1;    # autoflush, so replies reach the channel immediately
while (<STDIN>)
{
    if (/^@#!FILTER!#@\s*(.*)/)
    {
        $filter = $1;
    }
    elsif (/^@#!EOT!#@/)
    {
        print length($output)."\n".$output;
        $output = "";
    }
    elsif (/$filter/)
    {
        $output .= $_;
    }
}
Channel Pitfalls
The following are a few drawbacks that may occur with channels.
Channel synchronization
Let's first explain the channel synchronization problem. For example, you are
using popen(), read(), and write() to simulate an interactive telnet session, and you
have a function (below) that submits a command to the telnet session and returns the
response. Once the channel is open, you call this function several times in succession.
function mcomm(channel,command)
{
write(channel,command."\n");
output=read(channel);
return(output);
}
a=mcomm(handle,"echo apple");
b=mcomm(handle,"echo banana");
c=mcomm(handle,"echo pear");
You would expect:
a = apple
b = banana
c = pear
However, because read() returns whatever happens to be in the channel at the
moment of the call, you may actually get:
a =
b = apple
banana
c = pear
or
a =
b =
c = apple
banana
pear
What you want is to force read() to wait, while also allowing for the case where the
submitted command returns no output.
As with any component that communicates with another component, you have to
define some type of protocol.
By adding a minor convention to your code you can make the communication more
stable. The "trick" is to wait until you receive a termination marker (in this case a very
unusual piece of text). Of course, the end of output could also be signalled by a prompt
or something else. If the sender can tell you how many bytes it will send, you could
determine the end of output by counting the number of returned bytes. One
possible implementation is shown below:
function mcomm(channel, command)
{
    local data, output, term, term_cmd;
    output = "";
    term = "@@@_THE_END_@@@";
    term_cmd = "echo\necho ".term."\n";
    write(channel, command."\n".term_cmd);
    while (data = read(channel))
    {
        output = output.data;
        if (grep("^".term."$", output))
        {
            output = replace(output, term."\n", "");
            last;
        }
    }
    return(output);
}
Depending on the product you are talking to, there might be other and better
solutions.
15 Events and State Change Actions
This chapter presents event basics, event catalogs, and state change actions.
Event Basics
The following sections provide basic information on events in PATROL.
Once an event is triggered, the PATROL Agent feeds a snapshot of the event to the
PATROL Event Manager (PEM). The PEM then matches the information passed on
by the agent against an event catalog (by default, the Standard Event Catalog).
Criteria for the type of event, as well as the extraction of details, are matched against
this catalog, and an event type is established.
In addition to the Standard Event Catalog, the PATROL Developer Console can be
used to build custom event catalogs. The Standard Catalog describes types of events,
and the PEM pushes any event from any performance area through this catalog.
Custom events are sent from custom triggers, also user-definable from the Developer
Console, and pushed through the PEM. The PEM then forwards all details as one
event file, complete with numeric id, to the Event Repository.
When an event occurs, the state change is propagated up through the object hierarchy
(in the case of alarms and warnings). The agent uses this means to notify you visually
on the console. At the same time, the agent logs the event details, which it stores as a
circular log file.
The agent sends the most recent event entries to the event cache at your console. You
can dictate how much temporary disk space on your console to devote to the cache
using the miscellaneous option on the User Preferences property sheet.
The rest of the events are stored in the log file at the agent. You query this file when
you run View => Event Repository Statistics from the Console Menu. You can also
query the log file at the agent using the Filter button in Filter Mode.
KM generated events
Out-of-the-box events are automatically triggered by changes in the state of objects.
PATROL does not issue these events from recovery actions within parameters;
rather, they are hard-coded into the product. So, with this model, you get coverage
from the first time you start PATROL. If something goes wrong, the PEM sends an
event. If something comes out of ALARM (goes right), the PEM sends an event. If
someone attaches to an agent or runs a command remotely on an agent, the PEM
sends an event.
What if you want to specify what constitutes an event in your environment? You can
trigger events with one simple PSL command, event_trigger(). If a performance
area breaches a threshold, you can use the PSL command from within a recovery
action (which is initiated immediately upon notice of the ALARM or WARNING).
If one event occurs and you wish to issue another based on other conditions at the
time of record, you can issue the PSL command through an existing event type.
If you wish to give operators the ability to manually issue events as trouble tickets to
the OEM trouble ticket system, you can use the PSL command as a PATROL menu
command on the PATROL Console.
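As a sketch, such a call from a recovery action might look like the following; the catalog and class names are hypothetical, and the argument order mirrors the event_trigger() call shown at the end of this chapter:

```
# Raise an informational event in a custom catalog (names assumed).
event_trigger("MY_KM", "CpuOverload", "INFORMATION", "6",
              "CPU utilization threshold breached");
```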
Event Catalogs
The PATROL Standard Event Catalog gives you the broadest event coverage
possible. No matter what KM you purchase, or develop on your own, the Standard
Event Catalog ensures that you get a basic set of events with the default triggers.
All KMs will have their ALARMs, WARNINGs, and other default events passed
through the Standard Event Catalog. No matter what the KM, or individual
performance area within that KM, certain events will always be passed along as a
type from within the Standard Catalog. For example, an Event Class 11 from the
Standard Catalog is defined loosely as any change from OK to ALARM, with a
corresponding Recovery Action. Any event, whether from ORACLE, the CPU area of
the PATROL KM for Unix, or a custom KM, that matches the criteria will fall under
Event Class 11. This creates a small problem in providing event specific help,
troubleshooting, or other documentation as part of the event generation.
SNMP Support: lets you select whether the event is sent as an SNMP trap to
another third-party console or trouble ticket system.
Escalation Period: if the event is not acknowledged, closed, or deleted in the time
specified here, the commands you enter in this field are executed. Use this only if
PATROL Consoles will be the component from which you assign ownership.
Notification Command: executed as soon as this type of event is logged. Use this for
your automated solution, if any.
Expert Advice/Description: you also have the ability to enter expert advice; this field
can be passed along to other OEM systems. The description is also customizable from
this property sheet and is likewise passed along to other OEM systems.
The steps below demonstrate how to create a custom event catalog. In this example,
assume that you created a parameter called CPUNumProcs in the NT CPU
application class, and now you need to build an event catalog and trigger a recovery
action for the CPUNumProcs parameter.
1. From the KM tab tree view, expand the [Application Class] folder.
4. Right-click Event Catalog and choose New. The Event Class property sheet
appears.
7. Select the [Description] tab. In the Description area, type "I'm working too hard!!!
There are %s processes running and my CPU utilization is %s."
■ State change actions can be defined on computer level and application (actually
instance) level.
■ State change actions are executed by the console and will only be executed when
the computer/instance changes state.
■ If you defined a state change at the instance level, then when one of your instance's
parameters goes into alarm (which changes the state of the instance to alarm), the
alarm state change action will be fired off. If, a bit later, another parameter goes
into alarm, this will not cause the state change action to be re-triggered, because
the instance does not change its state again!
■ If you define a state change on a computer class, only computer objects will trigger
the action. Also, computer objects do not get their state from parameters, even when
everything is set to propagate. They get their state directly from the parents of
those parameters' instances (applications). Therefore, there is no 'direct' way for a
computer class to know which parameter put it into alarm.
■ State change actions are very different from recovery actions, which are associated
with parameters, and from event notification actions, which are associated with
events. The agent executes both recovery actions and event notification actions.
■ State change actions do not have an "auto repeat until..." option, but that
does not mean this functionality cannot be built. For example, on the
ALARM/WARN state change, invoke some kind of OS process (a C
program, Perl script, OS script, and so on) that runs continuously and
periodically takes the desired action. You would then use the OK state change to
execute another OS command that stops the ALARM state change action.
■ If you define state change actions on a global level, and additionally on a localized
level (for a specific computer/instance), then both the global and the local state
change actions will fire off in parallel.
echo %{HOSTNAME}
Let's say you want to execute a PatrolCli command that returns all loaded
applications. The code to do that is:
applications = get_vars("/", "subnodes");
foreach appl (applications)
{
    applpath = "/".appl;
    if (get(applpath."/__type__") == "APPLICATION")
    {
        ret_data = [ret_data, applpath];
    }
}
# Now we want to return ret_data.
If you traced the remote PSL process that is launched, you would see that all lines
actually execute, but you only get the result of the first line, which is:
applications = get_vars("/", "subnodes");
The reason for this is that your execpsl is actually executed in the RemPsl
standard event's notification action. The first line in the RemPsl standard event
notification action causes the execution of your script, since %{EV_ARG1} is replaced
with your PSL script before compilation and execution.
This means that only the result of the first line is actually passed back through
event_trigger().
A way to execute multiple lines and still control the returned result is to assign the
result to xxx_pem yourself in your script. That means the last lines should be:
a = "hello";
xxx_pem = a." world!";
This results in the following script being executed in the notification action:
xxx_pem = 0;
a = "hello";
xxx_pem = a." world!";
event_trigger("0", "Result", "INFORMATION",
    "6", xxx_pem, "%{EV_ARG2}");
16 PATROL and SNMP
This chapter provides an overview of PATROL and SNMP. This chapter contains the
following sections:
SNMP
Master Agent
Sending Traps
SNMP Trap sending
Support for SNMPv2
SNMP
The PATROL Agent has full agent-side support of SNMP, including MIB browsing
and trap forwarding and receiving. These features are available via PSL and partially
via the PATROL MIB. The agent supports SNMP v1 and v2, except the getbulk
functions.
In PATROL 3.2 and higher, the agent has only a subagent in it, and can communicate
to the separate master agent process or to other master agents as required. This
eliminates the duplicate master agents that often resulted in the use of the 161 and
1611 ports.
Master Agent
A common issue with the master agent is how to know whether it is running. If the
master agent is not running, you may receive any of the following error messages:
■ SNMP.SNMP Command
■ Line# 11: TRACE: ASSIGN settings = `SNMP support is not active.
■ "Patrol SNMP Sub-Agent connection failed Make sure SNMP Master Agent is
running." message in the agent error log when the PATROL Agent starts up.
■ Check that the parameter that starts up the SNMP subagent is active.
■ Check if the process snmpmagt is running. If it is not running, the master agent
has not been started.
■ If you already have an SNMP agent on the box, the master agent will not be able to
bind to port 161. This port is used whenever an SNMP manager wants to get
information from any SNMP subagent. Do a netstat -a and check for processes that
are bound to port 161 or 1161 (8161).
■ To find out which port the master agent will bind to, look in the
$PATROL_HOME/lib/snmpmagt.cfg file. You might want to change this number to
something else (8161, for example) to see if the master agent will start on a different
port.
COMMUNITY public
ALLOW ALL OPERATIONS USE NO ENCRYPTION
To disable the parameter (from an NT Console), just select Computer Classes => ALL
COMPUTERS => <YourOS> => Global => Parameters => SNMPstart.
Sending Traps
Another common question is how to send traps from PATROL to an SNMP
manager.
By default the agent sends the traps itself. If you want snmpmagt to send the
traps, you have to specify additional configuration (trap destinations in
snmpmagt.cfg) and enable PATROL for this by setting the configuration variable
/snmp/TrapConfTable to YES.
This PSL two-liner fires a trap from the agent to a system myhost:
host = "myhost";
snmp_trap_send(host."/162",
    host." .1.3.6.1.4.1.1031 6 1 1",
    ".1.3.6.1.4.1.1031.1 string test",
    ".1.3.6.1.4.1.1031.2 string test 2",
    ".1.3.6.1.4.1.1031.3 string test 3",
    ".1.3.6.1.4.1.1031.4 string test 4");
targethost = "127.0.0.1";
current_time = date();
message_format = "RECOVERY_TRAP %s %s %s ".
    "Host=%s IP=%s PLATFORM=%s VALUE=%s ".
    "STATE=%s MIN=%s MAX=%s TIME=%s";
msg = sprintf(message_format, param_name, inst_name, app_name,
    hostname, ipAddress, platform,
    param_value, status, alarm_min, alarm_max, current_time);
trap_origin = ipAddress." ".bmc_enterprise_id." 6 255 1000";
trap_text = ".1.2.3.4.5 string ".msg;
Support for SNMPv2
The agent cannot decode some of the datatypes introduced in SNMPv2, including:
■ Unsigned32
■ Counter64
If you hit one of these datatypes, you will probably not see the result, because the
agent cannot decode the message properly.
In addition, SNMPv2 introduced a "bulk" get operation; the agent will not respond
to such requests.
A PSL Internal function
This appendix describes the PSL internal function and its sub-commands. The
general form of a call is:
ret = internal(cmd);
The sub-commands for the PSL internal command on Windows NT are described
below.
NT GetRegistryValue
The GetRegistryValue command for PSL internal allows access to leaf-node registry
values. There are two possible arguments: the registry entry variable name, and a
sub-identifier for the line within that data. Both arguments are space-separated
within the string passed as the first argument to PSL internal. Here are some dummy
examples to show the argument formatting:
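The dummy examples themselves are missing in this copy; the following sketch shows the formatting, with an invented key path and value name:

```
# Key path is relative to HKEY_LOCAL_MACHINE; backslashes are doubled for PSL.
ret = internal("GetRegistryValue SOFTWARE\\SomeVendor\\SomeKey SomeValue");
```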
The first argument must be specified via its full path. The start point of the path is
HKEY_LOCAL_MACHINE as the root of the tree. Backslash characters are used to
specify subkey hierarchies, and each backslash character must itself be escaped in
PSL code. For example, a simple call to get a particular registry value returns a string
such as:
PCI_0_5=REG_SZ=\Device\Video0
This shows the return string format of three columns. The first is the name, the
second is the type, and the third is the value. In this case, the type is REG_SZ, which is
a zero-byte-terminated string.
Backslashes need to appear in pairs due to the lexical tokenization of PSL, as in most
languages. Using initial backslashes would also be incorrect. The case of the
characters used for the names of registry entries is important as well.
A call also fails if there is an unquoted space in one of the names; for spaces in names,
there must be explicit quoting around the registry key name. With proper quoting the
call succeeds:
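A sketch of such a quoted call (the key path is an invented placeholder; the point is the quoting around the name that contains a space):

```
# "Some Vendor" contains a space, so the key name is quoted.
ret = internal("GetRegistryValue \"SOFTWARE\\Some Vendor\\SomeKey\" PORT");
```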
PORT=REG_SZ=1987
This shows a port number of 1987 specified for PATROL in the registry.
The error handling of the GetRegistryValue function is similar to that of all the
commands for the NT internal function. If the wrong number of sub-arguments is
given to the command, a PSL runtime error is written to the system output window,
errno is set, and internal returns the empty string. Other errors, such as a
non-existent registry entry, are handled similarly.
NT GetRegistrySubkeys
The GetRegistrySubkeys command in the internal function returns a list of subkeys
of a particular key in the registry. The only argument is the name of the registry key,
which is placed after a space in the string passed as the first argument. The parent is
implicitly assumed to be HKEY_LOCAL_MACHINE, and paths relative to
this node are specified via backslash sequences. Some examples of getting a list of
subkeys are:
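The examples are missing here; judging from the sample output below, which matches the subkeys of HKEY_LOCAL_MACHINE\SYSTEM, one such call would be:

```
# List the subkeys of SYSTEM (directly under HKEY_LOCAL_MACHINE).
ret = internal("GetRegistrySubkeys SYSTEM");
```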
The return value is a newline-separated list of subkey names. The output of this
command could look like:
ControlSet001
ControlSet002
ControlSet003
DISK
Select
Setup
Clone
CurrentControlSet
The return value from an execution of this command could look like:
Common
KM
Patrol
PATROL Pathfinder
PATROLWATCH
NT PerfObjectExists
The PerfObjectExists command detects whether a performance counter object is
available. The return value is "0" or "1" depending on whether the performance object
exists. The return value is the empty string if there is a missing argument or too many
arguments. The argument to PerfObjectExists can be surrounded in double quotes,
which allows a name containing spaces. Here is an example of determining whether
the CPU processor has its performance monitoring enabled, which is the default:
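The example call is missing in this copy; a sketch using the standard NT Processor object:

```
# Returns "1" if the Processor performance object exists, "0" otherwise.
ret = internal("PerfObjectExists \"Processor\"");
```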
NT DiskPerfEnabled
The DiskPerfEnabled command to PSL internal checks whether disk performance
data collection is enabled. By default this collection is not enabled on Windows NT.
The return value of this command is either "0\n" or "1\n". The newline that is
appended to the result makes it slightly tricky to test the return value, because the
newline must be stripped out. A typical use of DiskPerfEnabled looks like:
ret = internal("DiskPerfEnabled");
ret = trim(ret, "\n");
if (ret)
{
    # Disk performance collection data is available
}
Without the trim function, the 'if' statement succeeds even when the data is not
really available. This is a special case of the pitfall involving the string-like
representation of numbers in PSL: the newline character causes PSL to perform a
string not-equal-to-empty-string test instead of a numeric test for non-zero.
NT GetPerformanceValue
The GetPerformanceValue command in PSL internal gets one or many values from
the NT performance database. The first possible usage of GetPerformanceValue is
without any arguments. This returns a list of the performance objects that are
available. The sample code is:
This returns a newline-separated list of performance object names. This can be used
to determine which performance counters are being collected by the operating system
with its current configuration.
However, arguments can also be given to limit the search for performance objects
and to return the values of the performance counters. The syntax of the command is
basically:
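The syntax line is missing here; judging from the examples that follow, the command string takes an object, an instance, and a counter:

```
ret = internal("GetPerformanceValue object instance counter");
```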
Any of these three arguments can be the star character (*) to indicate that any value
should be considered. Therefore, an example of accessing all the performance
counters of the SYSTEM object is:
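The call itself is missing in this copy; it would presumably be:

```
# All counters of the SYSTEM object, for any instance.
ret = internal("GetPerformanceValue SYSTEM * *");
```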
When used with normal arguments, the return value of GetPerformanceValue is a
newline-separated list of counter information, one counter per line. Each line has a
complicated format. The first field is the full name of the performance counter,
including any whitespace that is part of the name. This is followed by an "="
character, and the second part of the counter information is a sequence of numbers in
a complex comma-separated field. For example, the output from the above call is:
Processes=10000,28,,177:836469e0
Threads=10000,148,,177:836469e0
Events=10000,362,,177:836469e0
Semaphores=10000,126,,177:836469e0
Mutexes=10000,65,,177:836469e0
Sections=10000,201,,177:836469e0
Note that the second and third arguments cannot be omitted when attempting to get
all the objects.
The instance and counter (second and third) arguments to GetPerformanceValue can
be used to return more specific data. An example is:
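The example did not survive in this copy; a sketch using a standard NT System counter (its availability on a given machine is an assumption):

```
# Double quotes protect the space inside the counter name.
ret = internal("GetPerformanceValue SYSTEM * \"Context Switches/sec\"");
```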
Note that the double quotes are necessary because of the space in the name; a call
that does not take the spaces in the name into account is erroneous.
The processing of performance data values is typically completed by using the
CalcPerformanceValue command, also in PSL internal. However, it can also be
performed manually via PSL, as discussed below.
The partial output of a GetPerformanceValue call illustrates several features of the
return value format.
The number of fields can vary according to the counter returned. However, four
fields are always represented at a minimum even if one or more is empty. Field usage
is based on the actual performance counter being queried, which varies. Counter
fields are either DWORD or 64-bit INTEGER. The Performance Counters are
documented in the Windows NT Resource Kit manuals.
The return value of some numbers typically has a colon (:) character in some fields.
These fields are 64-bit integers or 64-bit timestamps. Because the 64-bit entity is
actually two 32-bit parts, the colon is used to denote that. This basically serves as an
aid in scanning and storage of the field as two 32-bit parts. The first field containing
the counter type should always be decimal. The 64-bit integers and 64-bit timestamps
are in hexadecimal, which poses some problems for PSL processing, although a
solution is shown in a section below.
NT CalcPerformanceValue
The CalcPerformanceValue command is typically used to decode the format of
counter values returned by GetPerformanceValue. Its main purpose is to compute
deltas from consecutive data points returned by GetPerformanceValue. The format of
the CalcPerformanceValue command is:
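The format line is missing here; from the example below, both counter values are passed space-separated in the command string:

```
ret = internal("CalcPerformanceValue counter1 counter2");
```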
In both cases, the counters are in the format as returned by the second half of the
GetPerformanceValue command in PSL internal. This is all the textual numeric data
after the counter name and the "=" character in the return value.
Here is a simple example of a counter value calculation compared with itself. The
result is zero because there is no change in the counter.
counter = "10410400,368717,,177:8344ace0";
ret = internal("CalcPerformanceValue " . counter . " " . counter);
Note that the spaces are necessary to separate the arguments in the command string.
Note that using the counter name as an argument to CalcPerformanceValue causes
failure. The return will be the empty string, and a PSL runtime warning message may
be generated.
This means that the results of GetPerformanceValue need to be stripped of the first
counter name field. This is typically done via nthargf().
NT GetUserRegszKeyValue
This function is aimed at getting the value of user-defined registry entries with a
string type. As a caveat, this command appears to have some bugs, even in the recent
3.2 NT agent. The absence of arguments, or supplying invalid arguments, seems to
make the agent crash in 3.2, and possibly in earlier agents. This function may best be
avoided until a post-3.2 release fixes the problem. Even then, reliance on these
functions will impose a compatibility problem for 3.2 and earlier agents.
internal("diskInfo");
internal("diskInfo", filename);
TopLevel = internal("diskInfo");
print(get_vars(TopLevel, "nodes"));
The get_vars() call will print the names of the filesystems with the "/" character
changed to "-".
For each filesystem, the function stores the computed variables (listed in Table 10) in
the agent table space. These are all available via a PSL get call using the symbolic
name.
For example, after executing PSL internal, the following PSL get command will return
the number of free kilobytes in the "root" filesystem:
print(get("/internal/df/root/freeKB"));
If any of the numeric elements is "-1", the value is unavailable. This
possibility applies to the freeInodes, totInodes, availKB, totKB, and freeKB statistical
values. The optional second argument to PSL internal is the name of a file. If
provided, the string returned by PSL internal will be the name of the filesystem
instance that contains the requested file. This call also performs a "discovery" of the
other filesystems, as was done above without the second filename argument, so the
only difference in functionality is the string returned.
Note that the path returned when PSL internal is called with a second argument is the
"real" path without the "/" characters changed to the "-" character. Not all UNIX agent
platforms support diskInfo, although it is currently available on Solaris, HP-UX, AIX,
Digital UNIX, NCR, DGUX, Motorola, SGI, Sequent, and SunOS. However, it is
currently not supported on SCO, Ultrix, Unixware, and Unisys. The "diskInfo"
command is not supported on any non-UNIX platforms such as Windows NT, OS/2,
VMS or MVS.
The PSL internal function on UNIX will return the empty string upon error, or on an
unsupported platform. The PSL errno special variable is not set by the internal
function.
GetProcessCache [procid]
GetProcessInfo [PROC_NAME | PROC_PID] [NAME | PID]
GetQueueInfo [Queuename [...]]
GetSystemInfo [info [...]]
All of these commands are available only on the VMS agent. They are not available
on the UNIX, OS/2, MVS, or Windows NT agents; on those platforms they return a
failure from PSL internal.
VMS GetProcessCache
The format of the GetProcessCache command in PSL internal for VMS only is:
GetProcessCache [procid]
This function gets the information used for the process cache. It takes as a parameter
the process id of the process for which to obtain the cache information (for example,
the numeric id of a known process).
A value of * for the procid argument returns the information for all processes on the
system, with an output format of one process per line. The command looks like:
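A sketch of that call:

```
# One line of cache information per process.
ret = internal("GetProcessCache *");
```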
VMS GetProcessInfo
The syntax of the GetProcessInfo command to PSL internal on VMS is:
GetProcessInfo [PROC_NAME | PROC_PID] [name | pid]
This function takes as an argument the keyword PROC_NAME or PROC_PID,
followed by a process name or PID. It returns the process ID, state, CPU time, buffered
I/O, direct I/O, and page faults. The items are returned one per line in the following
order:
PROCESS ID
STATE
CPU TIME
BUFFERED IO
DIRECT IO
PAGEFAULTS
VMS GetQueueInfo
The syntax of this command to PSL internal on VMS is:
GetQueueInfo [Queuename [...]]
This function can obtain the following items for each queue:
QUEUE_NAME
QUEUE_STATE
QUEUE_TYPE
QUEUE_FORM_NAME_DEF
QUEUE_FORM_STOCK_DEF
QUEUE_FORM_NAME
QUEUE_FORM_STOCK
QUEUE_DEVICE_NAME
QUEUE_SYMBIONT
QUEUE_PRIORITY
QUEUE_PROTECTION
QUEUE_JOB_COUNT_EXEC
QUEUE_JOB_COUNT_PEND
QUEUE_JOB_COUNT_HOLD
QUEUE_JOB_LIMIT
QUEUE_OWNER_UIC
The result is a string, with one line per queue. The items on each line are separated
by ';'. If you do not specify any items, the result will contain:
QUEUE_NAME
QUEUE_STATE
QUEUE_TYPE
QUEUE_FORM_NAME_DEF
QUEUE_FORM_STOCK_DEF
QUEUE_DEVICE_NAME
QUEUE_SYMBIONT
QUEUE_PRIORITY
QUEUE_PROTECTION
QUEUE_JOB_COUNT_EXEC
QUEUE_JOB_COUNT_PEND
QUEUE_JOB_COUNT_HOLD
QUEUE_JOB_LIMIT
QUEUE_OWNER_UIC
It is recommended that you not obtain information about multiple queues as this may
impact the performance of the agent.
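The semicolon-separated, line-per-queue format described above is easy to post-process on the console side. A minimal sketch in Python (the raw result string and its values are hypothetical, and the example assumes three specific items were requested in the order shown):

```python
# Parse a GetQueueInfo-style result: one line per queue,
# items on each line separated by ';'.
# Assumes QUEUE_NAME, QUEUE_STATE, and QUEUE_JOB_LIMIT were requested.
ITEMS = ["QUEUE_NAME", "QUEUE_STATE", "QUEUE_JOB_LIMIT"]

def parse_queue_info(result):
    """Turn the raw result string into a list of per-queue dicts."""
    queues = []
    for line in result.strip().splitlines():
        fields = line.split(";")
        queues.append(dict(zip(ITEMS, fields)))
    return queues

# Hypothetical sample reply, for illustration only
raw = "SYS$PRINT;Idle;100\nSYS$BATCH;Busy;50\n"
for q in parse_queue_info(raw):
    print(q["QUEUE_NAME"], q["QUEUE_STATE"])
```

Requesting a single queue per call, as the text recommends, keeps each reply small and easy to parse.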
VMS GetSystemInfo
This function will return information about the system. It can obtain the following
information:
ARCH_NAME
BOOTTIME
NODENAME
NODE_NUMBER
NODE_AREA
VERSION
HW_NAME
AVAILCPU_CNT
CLUSTER_MEMBER
CLUSTER_EVOTES
CLUSTER_NODES
CLUSTER_VOTES
CONTIG_GBLPAGES
FREE_GBLPAGES
FREE_GBLSECTS
PAGEFILE_FREE
PAGEFILE_PAGE
SWAPFILE_FREE
SWAPFILE_PAGE
Unlike the Queue function, you must specify at least one item, and each item is
returned on a separate line.
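Because each requested item comes back on its own line, in the order requested, the caller can simply pair the request list against the reply lines. A sketch (the reply text and its values are hypothetical):

```python
# Pair requested GetSystemInfo items with the reply lines.
# Each item is returned on a separate line, in request order.
def parse_system_info(requested, reply):
    return dict(zip(requested, reply.strip().splitlines()))

requested = ["NODENAME", "VERSION", "AVAILCPU_CNT"]
reply = "VAXA\nV7.3\n4\n"          # hypothetical agent reply
info = parse_system_info(requested, reply)
print(info["NODENAME"])
```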
B Optimizing your code
This appendix contains the following sections:
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
First Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
Prevent looping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
Micro Optimization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
Another problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
Are One-liners faster? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
Be Cautious with Copy and Paste Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
Introduction
Usually, something that keeps most developers busy is trying to optimize code. It
does make one feel prouder of one's work after having improved performance.
However, sometimes the optimization turns into an obsession, and more and more
into micro optimization, where you are not really achieving a lot more by making
changes to code that already seems to work fine.
The problem that we are trying to solve is very simple: “Find the fastest way to get
the last word on a line. We are just interested in the last word.”
If you want to look for the last word in a line and you are not interested in the number
of words, you can do it with a short awk command on UNIX.
Although this solution sounds simple, it would require multiple processes on UNIX,
and it is not really platform independent. Anyway, since this is a book about
PATROL, you can imagine that we will be looking for the best possible solution in
PSL.
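The awk one-liner alluded to above is presumably something like awk '{print $NF}' (an assumption; the manual's original example is not reproduced here). The naive PSL solution that follows counts the words in a loop and then indexes the last one; the same idea, sketched in Python purely for illustration:

```python
# Naive approach: count the words in a loop, then fetch the word
# at that index -- mirrors the PSL foreach/ntharg code below.
def last_word_naive(line):
    number_words = 0
    for _ in line.split():
        number_words += 1
    return line.split()[number_words - 1]

print(last_word_naive("find the fastest way"))
```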
# Don't know how many words are in the line, but need the last word
number_words = 0;
foreach word w (myline)
{
number_words++;
}
# Now get the word
lastword = ntharg(myline, number_words);
First Optimization
When we look closely at this PSL program, we can see that there is a lot of useless
code in it. The reason why we are counting the number of words is that we want to
use ntharg to get the last word from the line.
If we look carefully, we see that all that counting is not really necessary, because at
the end of the program the value of the loop variable will be the last word.
This results in an empty loop. Each time this (empty) loop executes, the variable
lastword is assigned the value of the next word in myline. When the statement ends,
the foreach exits, but lastword still holds the value of the last word of the line.
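In PSL this first optimization is presumably written as foreach word lastword (myline) { } with an empty body: the loop variable keeps its final value once the loop exits. Python loop variables behave the same way, so the trick can be sketched as:

```python
# The loop body does nothing; after the loop, the variable
# still holds the last value it was assigned.
def last_word_empty_loop(line):
    lastword = ""
    for lastword in line.split():
        pass                      # empty body, as in the PSL foreach
    return lastword

print(last_word_empty_loop("find the fastest way"))
```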
Prevent looping
Thus, in this example there is still a loop, and performance might be better without
loops. This section provides information on how you can rewrite the code so that you
do not have any loops.
# Don't know how many words are in the line, but need the last word
number words = 0;
foreach word w (myline)
{
number_words++;
}
# Now get the word
lastword = ntharg(myline, number_words);
In our first optimization step, we kept the loop but just removed everything we
didn't need. What we will try to do now is rewrite the code without loops. This
means finding another way of getting the number of words without a loop.
If we compare this with our first optimized version, we have replaced one single line
of code with three lines of code, but without any loops.
If we profile this code, we see that it works faster than the original, but performance
is practically the same as with our first optimization.
Micro Optimization
We have already optimized the basic code, and this is probably where we should
stop, but the purpose of this discussion is to go as deep as we possibly can.
It was clear that our foreach loop wouldn't bring us further optimization, since we
know that looping is quite expensive (although we only used one PSL statement in
our solution, and that was foreach).
The second example (as fast as the empty foreach), however, had three statements, so
maybe we can optimize something there. Let's take a look at our code again.
Lines two and three can be combined in a single PSL function: tail(). There we have
found our real performance winner.
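In PSL the combined form is along the lines of tail(ntharg(myline, "1-"), 1): split the line into one word per line, then take the last line (the same shape appears later in the text for the multi-line case). The analogous loop-free step, sketched in Python:

```python
# Loop-free version: split once, take the last element --
# analogous to PSL's tail(ntharg(myline, "1-"), 1).
def last_word(line):
    return line.split()[-1]

print(last_word("find the fastest way"))
```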
When I talk about the performance gain between optimizations 1, 2, and 3, it is
measured in tenths of seconds rather than in real seconds, but the difference between
the original and optimization 3 will speed up processing by roughly 20-50 percent.
Another problem
Now let's try a variation of the problem: “Find the last word in a text block (that is, a
block with newlines in it).”
It should be obvious that even our best optimized example will not really be good
enough for this problem.
On a full text block, this piece of code will produce the correct result, but not with the
expected performance, because now our complete multi-line text is turned into one
long word list.
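The fix the text arrives at (shown later as tail(ntharg(tail(mytext,1),"1-"),1)) first reduces the block to its last line, and only then extracts the last word, instead of splitting the entire block into words. In Python terms:

```python
# Take the last line first, then the last word of that line --
# cheaper than splitting the whole multi-line block into words.
def last_word_of_block(text):
    last_line = text.splitlines()[-1]
    return last_line.split()[-1]

block = "first line here\nsecond line\nthe very last line"
print(last_word_of_block(block))
```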
Now, if you want to offer this as a function in a library that more people than just you
will use, you could ask yourself which of the optimized versions you should
implement.
The answer is very simple: implement the last example, which performs best in the
worst-case scenario and reasonably well in the best-case scenario.
Indeed, the end user of your library doesn't know how your function will behave
with a certain input (and shouldn't have to care). The end user will expect a
reasonable result, no matter what the input is.
Just as a comparison:
■ In cases where you don't have a newline in the input (so only a single line as
input), the difference is small; it is probably measured in hundredths of a second.
■ In cases where you have multiple lines (for example, 1000 lines), you will see at
least a 7000 percent improvement.
So don't worry about the tail if you don't know what your input will be.
Are One-liners faster?
Compare the three-statement version that uses named variables with the equivalent
one-liner:
tail (ntharg(tail(mytext,1),"1-"),1);
If you want to see what is really happening, just run both pieces of code through the
standalone PSL compiler (psl -q) and compare the output.
At first glance you will say that the two are the same, but if you look carefully there is
more to say than that.
For the one-liner, the PATROL Agent has assigned unnamed local variables itself.
These are temporary variables that it needs in order to pass arguments.
These unnamed variables are @temp 1, @temp 2, and @temp 3. In the first dump, you
can see that these unnamed variables have been replaced with my "named" variables:
last line, last word list, and last word.
So there is definitely no performance gain with one-liners, and it can even be worse: if
you had wanted to use the result of the ntharg() for something else (e.g., to get the
first word of the last line), you would not have been able to access that data, since all
@temp variables are internal to the interpreter and cannot be used in PSL.
Be Cautious with Copy and Paste Tasks
For example, the PSL reference manual is supposed to be educational material, and
the majority of the examples in it are good examples. From time to time, though, you
will run into an example that was not written for what you are trying to do.
You must understand that the examples are there to explain how a function works,
not to show people the fastest way of doing things. Readability is also very important
in a manual that shows code samples, and you have to keep the examples simple.
What we have learned so far is that optimization doesn't necessarily mean
complexity, but it sometimes introduces some complexity into the thought process.
Just to prove that some things can be done better, I looked in the PSL manual. On
page 4-48 of the PSL manual you will find an example that displays a cosine
waveform plot.
The developer who wrote it obviously didn't think of all the functions you have at
your disposal in PATROL, but probably wanted to make the code readable.
function main()
{
    print("Cosine Waveform:\n\n");
    print(" -1 -0.5 0 0.5 1\n");
    print(" +----+----+----+----+\n");
    i = 0.0;
    while (i <= 3.2)
    {
        printf("%4.2f ", i);
        cosine = cos(i);
        plot = int(10 * cosine) + 10;
        if (plot < 10)
        {
            before = "";
            j = 1;
            while (j <= plot)
            {
                before = before . " ";
                j++;
            }
            after = "";
            j = plot;
            while (j < 9)
            {
                after = after . " ";
                j++;
            }
            print(before, "*", after, "|");
            printf(" cosine = %+8.6f\n", cosine);
        }
        elsif (plot == 10)
        {
            print(" * ");
            printf(" cosine = %+8.6f\n", cosine);
        }
        else
        {
            print(" |");
            before = "";
            j = 11;
            while (j < plot)
            {
                before = before . " ";
                j++;
            }
            print(before, "*");
            after = "";
            j = plot + 1;
            while (j <= 22)
            {
                after = after . " ";
                j++;
            }
            print(after);
            printf("cosine = %+8.6f\n", cosine);
        }
        i = i + 0.1;
    }
}
The logic of this program is to build each line character by character. Yet the program
is not very readable. If you understand what the developer is doing, it is OK, but if
you have to change something, you must be careful about what you change. Also,
there is a lot of looping and branching.
The logic of the improved program is to take a line and put the "*" where it belongs;
we just break the line apart where needed.
This increases the speed considerably (about 2x), and it is, to my understanding,
more readable code.
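The improved PSL rewrite is not reproduced in this text, but the idea of "putting the * where it belongs" can be sketched: compute the plot column once, then build each output line with padding instead of character-by-character loops. A Python sketch of that idea (an illustration, not the manual's actual PSL rewrite):

```python
import math

# Build each plot line by padding out to the computed column,
# instead of appending one space at a time in nested loops.
def cosine_line(i, width=21):
    cosine = math.cos(i)
    col = int(10 * cosine) + 10              # column 0..20 on the axis
    star = " " * col + "*"
    pad = " " * (width - len(star))
    return "%s%s cosine = %+8.6f" % (star, pad, cosine)

i = 0.0
while i <= 3.2:
    print("%4.2f %s" % (i, cosine_line(i)))
    i = i + 0.1
```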
Another approach might have been to take the first bit of code and rewrite it so that it
looked more “compact” (we already know this isn't necessarily good).
Such an example is a large one-liner in a loop that accomplishes the same result. It
uses advanced constructs like nested sprintf's and nested ternary operators; it also
uses assignments in statements and every other thing you wouldn't do as a
beginning programmer.
Glossary
A
access control list
A list that is set up by using a PATROL Agent configuration variable and that restricts PATROL
Console access to a PATROL Agent. A PATROL Console can be assigned access rights to
perform console, agent configuration, or event manager activities. The console server uses
access control lists to restrict access to objects in the COS namespace.
agent namespace
See PATROL Agent namespace.
Agent Query
A PATROL Console feature that constructs SQL-like statements for querying PATROL Agents
connected to the console. Agent Query produces a tabular report that contains information
about requested objects and can be used to perform object management activities, such as
disconnecting and reconnecting computers. Queries can be saved, reissued, added, or changed.
PATROL offers built-in queries in the Quick Query command on the Tools menu from the
PATROL Console main menu bar. See also Quick Query.
alarm
An indication that a parameter for an object has returned a value within the alarm range or that
application discovery has discovered that a file or process is missing since the last application
check. An alarm state for an object can be indicated by a flashing icon, depending on the
configuration of a console preference. See also warning.
alert range
A range of values that serve as thresholds for a warning state or an alarm state. Alert range
values cannot fall outside of set border range values. See also border action, border range, and
recovery action.
Glossary 227
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
application account
An account that you define at KM setup and that you can change for an application class or
instance. An application account is commonly used to connect to an RDBMS on a server where
the database resides or to run SQL commands.
application class
The object class to which an application instance belongs; also, the representation of the class as
a container (Unix) or folder (Windows) on the PATROL Console. You can use the developer
functionality of a PATROL Console to add or change application classes. See also class.
application discovery
A PATROL Agent procedure carried out at preset intervals on each monitored computer to
discover application instances. When an instance is discovered, an icon appears on the PATROL
interface. The application class includes rules for discovering processes and files by using
simple process and file matching or PSL commands. Application definition information is
checked against the information in the PATROL Agent process cache, which is periodically
updated. Each time the PATROL Agent process cache is refreshed, application discovery is
triggered. See also application check cycle, application discovery rules, PATROL Agent process
cache, prediscovery, simple discovery, and PSL discovery.
application filter
A feature used from the PATROL Console to hide all instances of selected application classes for
a particular computer. The PATROL Agent continues to monitor the application instances by
running parameter commands and recovery actions.
application instance
A system resource that is discovered by PATROL and that contains the information and
attributes of the application class that it belongs to. See also application class and instance.
application state
The condition of an application class or an application instance. The most common application
states are OK, warning, and alarm. An application class or instance icon can also show
additional conditions. See also computer state and parameter state.
attribute
A characteristic that is assigned to a PATROL object (computer class, computer instance,
application class, application instance, or parameter) and that you can use to monitor and
manage that object. Computers and applications can have attributes such as command type,
parameter, menu command, InfoBox command, PATROL setup command, state change action,
or environment variable. Parameters can have attributes such as scheduling, command type,
and thresholds.
An attribute can be defined globally for all instances of a class or locally for a particular
computer or application instance. An instance inherits attributes from a class; however, an
attribute defined at the instance level overrides inherited attributes. See also global level and
local level.
B
border action
A command or recovery action associated with a parameter border range and initiated when
that range has been breached. Border actions can be initiated immediately when the parameter
returns a value outside the border range, after a warning or alarm has occurred a specified
number of times, or after all other recovery actions have failed. See also border range.
border range
A range of values that serve as thresholds for a third-level alert condition when it is possible for
a parameter to return a value outside of the alarm range limits. When a border range is
breached, border actions can be initiated. See also border action.
built-in command
An internal command available from the PATROL Agent that monitors and manages functions
such as resetting the state of an object, refreshing parameters, and echoing text. The command is
identified by the naming convention %command_name. See also built-in macro variable.
C
chart
A plot of parameter data values made by the PATROL Console Charting Server. See multigraph
container and PATROL Console Charting Server.
charting server
See PATROL Console Charting Server.
class
The object classification in PATROL where global attributes can be defined; the attributes are
then inherited by instances of the class. An instance belongs to a computer class or an
application class. See also application class, computer class, and event class.
collector parameter
A type of parameter that contains instructions for gathering values for consumer parameters to
display. A collector parameter does not display any value, issue alarms, or launch recovery
actions. See also consumer parameter, parameter, and standard parameter.
commit
The process of saving to PATROL Agent computers the changes that have been made to a KM
by using a PATROL Console. A PATROL user can disable a PATROL Console’s ability to
commit KM changes.
computer class
The basic object class to which computer instances of the same type belong. Examples include
Solaris, OSF1, HP, and RS6000. PATROL provides computer classes for all supported computers
and operating systems; a PATROL Console with developer functionality can add or change
computer classes.
computer instance
A computer that is running in an environment managed by PATROL and that is represented by
an icon on the PATROL interface. A computer instance contains the information and attributes
of the computer class that it belongs to. See also instance.
computer state
The condition of a computer. The main computer states are OK, warning, and alarm. A
computer icon can show additional conditions that include no output messages pending, output
messages pending, void because a connection cannot be established, and void because a
connection was previously established but now is broken. See also state.
configuration file, KM
See KM configuration file.
connection mode
The mode in which the PATROL Console is connected to the PATROL Agent. The mode can be
developer or operator and is a property of the Add Host dialog box (PATROL 3.x and earlier),
an Add Managed System wizard, or other connecting method. The connection mode is a global
(console-wide) property that can be overridden for a computer instance. See also PATROL
Console.
console module
A program that extends the functionality of PATROL Central and PATROL Web Central.
Console modules can collect data, subscribe to events, access Knowledge Module functions,
authenticate users, and perform security-related functions. Console modules were formerly
referred to as add-ons or snap-ins.
console server
A server through which PATROL Central and PATROL Web Central communicate with
managed systems. A console server handles requests, events, data, communications, views,
customizations, and security.
consumer parameter
A type of parameter that displays a value that was gathered by a collector parameter. A
consumer parameter never issues commands and is not scheduled for execution; however, it
has alarm definitions and can run recovery actions. See also collector parameter, parameter, and
standard parameter.
container
A custom object that you can create to hold any other objects that you select—such as
computers, applications, and parameters—in a distributed environment. In Windows, a
container is referred to as a folder. You can drag and drop an object into and out of a container
icon. However, objects from one computer cannot be dropped inside another computer. Once a
container is defined, the object hierarchy applies at each level of the container. That is, a
container icon found within a container icon assumes the variable settings of the container in
which it is displayed. See also object hierarchy and PATROL Console Charting Server.
customize a KM
To modify properties or attributes locally or globally. See also global level and local level.
custom view
A grid-like view that can be created in PATROL Central or PATROL Web Central to show user-
selected information.
D
deactivate a parameter
To stop running a parameter for selected computer or application instances. In PATROL
Consoles for Microsoft Windows environments, deactivating a parameter stops parameter
commands and recovery actions and deletes the parameter icon from the application instance
window without deleting the parameter definition in the KM tree. A deactivated parameter can
be reactivated at any time. See also snooze an alarm and suspend a parameter.
desktop file
In PATROL 3.x and earlier, a file that stores your desktop layout, the computers that you
monitor, the KMs that you loaded, and your PATROL Console user accounts for monitored
objects. You can create multiple desktop files for any number of PATROL Consoles. By default,
desktop files always have a .dt extension. Desktop files are replaced by management profiles in
PATROL 7.x. See also desktop template file.
Desktop tree
A feature of PATROL for Microsoft Windows only. One of the views of folders available with
PATROL for Microsoft Windows environments, the Desktop tree displays the object hierarchy.
See also KM tree.
developer mode
An operational mode of the PATROL Console that can be used to monitor and manage
computer instances and application instances and to customize, create, and delete locally loaded
Knowledge Modules and commit these changes to selected PATROL Agent computers. See
PATROL Console.
disable a KM
When a KM is disabled in the PATROL Agent configuration file, the KM files are not deleted
from the PATROL Agent computers, but the PATROL Agent stops using the KM to collect
parameter data and run recovery actions. The default is that no KMs are disabled. Most KMs
are composed of individual application files with a .km extension. See also preloaded KM,
static KM, and unload a KM.
discovery
See application discovery.
distribution CD or tape
A CD or tape that contains a copy of one or more BMC Software products and includes software
and documentation (user guides and online help systems).
E
environment variable
A variable used to specify settings, such as the program search path for the environment in
which PATROL runs. You can set environment variables for computer classes, computer
instances, application classes, application instances, and parameters.
event
The occurrence of a change, such as the appearance of a task icon, the launch of a recovery
action, the connection of a console to an agent, or a state change in a monitored object (computer
class, computer instance, application class, application instance, or parameter). Events are
captured by the PATROL Agent, stored in an event repository file, and forwarded to an event
manager (PEM) if an event manager is connected. The types of events forwarded by the agent
are governed by a persistent filter for each event manager connected to a PATROL Agent.
event catalog
A collection of event classes associated with a particular application. PATROL provides a
Standard Event Catalog that contains predefined Standard Event Classes for all computer
classes and application classes. You can add, customize, and delete an application event catalog
only from a PATROL Console in the developer mode. See also event class and Standard Event
Catalog.
event class
A category of events that you can create according to how you want the events to be handled by
an event manager and what actions you want to be taken when the event occurs. Event classes
are stored in event catalogs and can be added, modified, or deleted only from a PATROL
Console in the developer mode. PATROL provides a number of event classes in the Standard
Event Catalog, such as worst application and registered application. See also event catalog and
Standard Event Catalog.
Event Diary
The part of an event manager (PEM) where you can store or change comments about any event
in the event log. You can enter commands at any time from the PATROL Event Manager Details
window.
event manager
A graphical user interface for monitoring and managing events. The event manager can be used
with or without the PATROL Console. See also PATROL Event Manager (PEM).
event type
The PATROL-provided category for an event according to a filtering mechanism in an event
manager. Event types include information, state change, error, warning, alarm, and response.
event-driven scheduling
A kind of scheduling that starts a parameter when certain conditions are met. See also periodic
scheduling.
expert advice
Comments about or instructions for dealing with PATROL events as reported by the agent.
Expert advice is defined in the Event Properties dialog box in a PATROL Console in the
developer mode. PATROL Consoles in an operator mode view expert advice in the PATROL
Event Manager.
F
filter, application
See application filter.
filter, persistent
See persistent filter.
G
global channel
A single dedicated connection through which PATROL monitors and manages a specific
program or operating system. The PATROL Agent maintains this connection to minimize the
consumption of program or operating system resources.
global level
In PATROL hierarchy, the level at which object properties and attributes are defined for all
instances of an object or class. An object at the local level inherits characteristics (properties) and
attributes from the global level. See also local level.
H
heartbeat
A periodic message sent between communicating objects to inform each object that the other is
still “alive.” For example, the PATROL Console checks to see whether the PATROL Agent is still
running.
heartbeat interval
The interval (in seconds) at which heartbeat messages are sent. The longer the interval, the
lower the network traffic. See also message retries, message time-out, and reconnect polling.
history
Parameter and event values that are collected and stored on each monitored computer.
Parameter values are stored in binary files for a specified period of time; events are stored in
circular log files until the maximum size is reached. The size and location of parameter history
files are specified through either the PATROL Console or the PATROL Agent; size and location
of event history files are specified through an event manager, such as the PEM, or the PATROL
Agent.
history repository
A binary file in which parameter values (except those that are displayed as text) are stored by
the PATROL Agent and accessed by the PATROL Console for a specified number of days (the
default is one day). When the number of storage days is reached, those values are removed in a
cyclical fashion.
history span
The combined settings for a parameter history retention level and period. See also history
retention level and history retention period.
I
InfoBox
A dialog box that contains a static list of fields and displays current information about an object,
such as the version number of an RDBMS and whether the object is online or offline. Commands
are run when the InfoBox is opened. Information can be manually updated if the InfoBox
remains open for a period of time. PATROL provides a number of commands for obtaining and
displaying object information in an InfoBox. Only a PATROL Console in the developer mode
can be used to add or change commands.
information event
Any event that is not a state change or an error. Typical information events occur when a
parameter is activated or deactivated, a parameter is suspended or resumed, or application
discovery is run. The default setting for PATROL is to prevent this type of event from being
stored in the event repository. To store and display this type of event, you must modify the
persistent filter setting in the PATROL Agent configuration file.
instance
A computer or discovered application that is running in an environment managed by PATROL.
An instance has all the attributes of the class that it belongs to. A computer instance is a
monitored computer that has been added to the PATROL Console. An application instance is
discovered by PATROL. See application discovery, application instance, and computer instance.
K
KM
See Knowledge Module (KM).
KM configuration file
A file in which the characteristics of a KM are defined through KM menu commands during KM
installation and setup (if setup is required). See also Knowledge Module (KM) and PATROL
Agent configuration file.
KM list
A list of KMs used by a PATROL Agent or PATROL Console. See also Knowledge Module (KM).
KM Migrator
See PATROL KM Migrator and Knowledge Module (KM).
KM package
See Knowledge Module package.
KM tree
A feature of PATROL for Microsoft Windows only. One of two views of folders available in
Windows. The KM tree displays computer classes, application classes, and their customized
instances in the knowledge hierarchy and also displays the Standard Event Catalog. A PATROL
Console in operator mode can only view the KM tree; only a PATROL Console in the developer
mode can change KM properties and attributes. See also Desktop tree and Knowledge Module
(KM).
knowledge hierarchy
The rules by which objects inherit or are assigned attributes. (In PATROL Consoles for
Microsoft Windows environments, classes of objects are represented in the Computer Classes
and Application Classes sets of folders on the KM tree.) Properties and attributes of a
customized instance override those defined for the class to which the instance belongs.
Knowledge Module (KM)
KMs provide information for the way monitored computers are represented in the PATROL
interface, for the discovery of application instances and the way they are represented, for
parameters that are run under those applications, and for the options available on object pop-up
menus. A PATROL Console in the developer mode can change KM knowledge for its current
session, save knowledge for all of its future sessions, and commit KM changes to specified
PATROL Agent computers. See also commit, KM configuration file, KM list, KM Migrator, KM
tree, load KMs, and version arbitration.
L
load applications
Same as load KMs. Most KMs are composed of application files with a .km extension.
load KMs
To place KM files into memory for execution. After configuration and during startup, the
PATROL Agent loads the KM files that are listed in its configuration file and that reside on the
PATROL Agent computer. When a PATROL Console connects to the PATROL Agent, the KM
versions that the agent executes depend on whether the console has developer or operator
functionality. See also Knowledge Module (KM) and version arbitration.
local history
The history (stored parameter values) for an object or instance. See also global level and local
level.
local level
In PATROL hierarchy, the level of a computer instance or an application instance. An object
(instance) at the local level inherits properties and attributes that are defined globally. When
properties and attributes are customized locally for an individual instance, they override
inherited attributes. See also global level.
M
managed object
Any object that PATROL manages. See object.
managed system
A system—usually a computer on which a PATROL Agent is running—that is added
(connected) to a PATROL Console to be monitored and managed by PATROL and that is
represented by an icon on the PATROL interface.
management profile
A user profile for PATROL Central and PATROL Web Central that is stored by the console
server. A management profile is similar to a session file and contains information about custom
views, your current view of the PATROL environment, information about systems that you are
currently managing, Knowledge Module information, and console layout information for
PATROL Central. Management profiles replace desktop files and session files that were used in
PATROL 3.x and earlier.
master agent
See PATROL SNMP Master Agent.
message retries
A feature of UDP only. The number of times that the PATROL Console will resend a message to
the PATROL Agent. The greater the number of message retries, the more time the PATROL
Console will give the PATROL Agent to respond before deciding that the agent connection is
down and timing out. The number of message retries multiplied by message time-out (in
seconds) is the approximate time allowed for a connection verification. See also heartbeat,
message time-out, and reconnect polling.
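The retry arithmetic described above reduces to a single multiplication. A minimal sketch (function and parameter names are illustrative, not PATROL configuration variable names):

```python
def approx_verification_window(message_retries: int, message_timeout_s: int) -> int:
    """Approximate time (in seconds) the PATROL Console allows for a
    connection verification: the number of message retries multiplied
    by the per-attempt message time-out."""
    return message_retries * message_timeout_s

# 5 retries with a 30-second time-out give roughly a 150-second window
print(approx_verification_window(5, 30))  # -> 150
```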
message time-out
A feature of UDP only. The time interval (in seconds) that the PATROL Console will give the
PATROL Agent to respond to a connection verification before deciding that the Agent
connection is down. The number of message retries multiplied by message time-out is the
approximate time allowed for a connection verification. See also heartbeat, message retries, and
reconnect polling.
message window
A window that displays command output and error messages from the PATROL Console
graphical user interface. See also response window, system output window, and task output
window.
multigraph container
A custom object into which you can drop parameter objects to be plotted as charts. See also
PATROL Console Charting Server.
O
object
A computer class, computer instance, application class, application instance, parameter, or
container (folder) in an environment managed by PATROL. Objects have properties and are
assigned attributes (command types, parameters, menu commands, InfoBox commands, setup
commands, state change actions, and environment variables). Parameter objects use data
collection commands to obtain values from classes and instances. See also object class, object
hierarchy, object icon, and object window.
object class
A computer class or application class. See also class, object, and object hierarchy.
object hierarchy
The structure of object levels in PATROL. On the PATROL interface, computers contain
application folders (containers) representing a loaded KM, application folders contain one or
more application instances, and application instances contain parameters.
object icon
A graphic that represents a computer instance, application class, application instance,
parameter, or container (folder) in an environment managed by PATROL. See also object, object
hierarchy, and object window.
object window
An open object container (folder) that may contain application class icons, application instance
icons, parameter icons, custom containers (folders), and shortcuts. The object window is
displayed when you double-click the object icon. See also application instance, computer
instance, object, and object icon.
operator mode
An operational mode of the PATROL Console that can be used to monitor and manage
computer instances and application instances but not to customize or create KMs, commands,
and parameters. See PATROL Console.
override a parameter
To disable or change the behavior of a local PATROL application parameter. The changes to the
parameter are local to the managed system running the parameter and are stored in the agent
configuration database. You must be granted specific permissions by a PATROL Administrator
through the PATROL User Roles file in order to override parameters. See also PATROL roles.
P
parameter
The monitoring element of PATROL. Parameters are run by the PATROL Agent; they
periodically use data collection commands to obtain data on a system resource and then parse,
process, and store that data on the computer that is running the PATROL Agent. Parameters can
display data in various formats, such as numeric, text, stoplight, and Boolean. Parameter data
can be accessed from a PATROL Console, PATROL Integration products, or an SNMP console.
Parameters have thresholds and can trigger warnings and alarms. If the value returned by the
parameter triggers a warning or an alarm, the PATROL Agent notifies the PATROL Console
and runs any recovery actions associated with the parameter. See also parameter history
repository and parameter state.
parameter cache
The memory location where current parameter data is kept. In the PATROL Agent's
configuration file, you can set the size of the cache, the maximum number of data points that can
be stored, and the interval (in seconds) for emptying the cache.
parameter override
See override a parameter.
parameter state
The condition of a parameter. The most common parameter states are OK, warning, and alarm.
A parameter icon can show additional conditions that include no history, offline, and
suspended. A parameter can also be deactivated; when a parameter is deactivated, no icon is
displayed. See also state.
PATROL Agent
The core component of PATROL architecture. The agent is used to monitor and manage host
computers and can communicate with the PATROL Console, a stand-alone event manager
(PEM), PATROL Integration products, and SNMP consoles. From the command line, the
PATROL Agent is configured by the pconfig utility; from a graphical user interface, it is
configured by the xpconfig utility for Unix or the wpconfig utility for Windows. See also
PATROL SNMP Master Agent.
PATROL Console
The graphical user interface from which you launch commands and manage the environment
monitored by PATROL. The PATROL Console displays all of the monitored computer instances
and application instances as icons. It also interacts with the PATROL Agent and runs
commands and tasks on each monitored computer. The dialog is event-driven so that messages
reach the PATROL Console only when a specific event causes a state change on the monitored
computer.
A PATROL Console with developer functionality can monitor and manage computer instances,
application instances, and parameters; customize, create, and delete locally loaded Knowledge
Modules and commit these changes to selected PATROL Agent computers; add, modify, or
delete event classes and commands in the Standard Event Catalog; and define expert advice. A
PATROL Console with operator functionality can monitor and manage computer instances,
application instances, and parameters and can view expert advice but not customize or create
KMs, commands, and parameters. See also developer mode and operator mode.
PATROL Console Charting Server
A utility that charts parameter data in a variety of formats, including line graphs, pie charts,
3-D bar charts, and area plots. Charts can be viewed through the PATROL Console and printed
to a local printer or PostScript file.
PATROL KMDS
See PATROL Knowledge Module Deployment Server (PATROL KMDS).
PATROL KM Migrator
A PATROL utility used to propagate KM user customizations to newly released versions of
PATROL Knowledge Modules.
PATROL roles
A set of permissions that grant or remove the ability of a PATROL Console or PATROL Agent to
perform certain functions. PATROL roles are defined in the PATROL User Roles file, which is
read when the console starts.
patroldev
A domain group that can be set up by a Windows system administrator to restrict user access to
a PATROL Developer Console. When a user tries to start a PATROL Console with developer
functionality, PATROL checks whether the user is in the patroldev group. If the user is not in
the group, a PATROL Console with operator functionality is started instead. See also ptrldev.
pconfig
The command line utility for setting PATROL Agent configuration variables. See also PATROL
Agent configuration file, PATROL Agent configuration variable, wpconfig, and xpconfig.
PEM
See PATROL Event Manager (PEM).
periodic scheduling
A kind of scheduling that starts a parameter at a certain time and reruns the parameter at certain
intervals. See also event-driven scheduling.
persistent filter
A filter maintained by the PATROL Agent for each PATROL Console or event manager that
connects to it. The filter is used to minimize network traffic by limiting the number and types of
events that are forwarded from a PATROL Agent to a PATROL Console or an event manager
(PEM).
polling cycle
The schedule on which a parameter starts running and the intervals at which it reruns; the cycle
is expressed in seconds. See also event-driven scheduling and periodic scheduling.
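A polling cycle works out to the start time plus whole multiples of the cycle length. An illustrative sketch with hypothetical names:

```python
def run_times(start_s: int, poll_cycle_s: int, count: int) -> list[int]:
    """First `count` scheduled run times, in seconds, for a parameter
    that starts at start_s and reruns every poll_cycle_s seconds."""
    return [start_s + i * poll_cycle_s for i in range(count)]

# a parameter starting at t=0 with a 300-second polling cycle
print(run_times(0, 300, 4))  # -> [0, 300, 600, 900]
```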
pop-up menu
The menu of commands for a monitored object; the menu is accessed by right-clicking the
object.
prediscovery
A quick one-time test written in PSL to determine whether a resource that you want to monitor
is installed or running on a monitored computer. If the results are affirmative, the PATROL
Agent runs the discovery script. Prediscovery helps reduce PATROL Agent processing
requirements.
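The prediscovery flow amounts to a cheap guard placed in front of the full discovery script. A sketch in Python rather than PSL (both callables are hypothetical stand-ins for the actual scripts):

```python
def discover_if_present(prediscovery, discovery):
    """Run the quick prediscovery test first; only if it reports the
    monitored resource as installed or running does the more expensive
    discovery script run."""
    if prediscovery():
        return discovery()
    return None  # resource absent: skip discovery, saving agent processing

# resource present, so discovery runs
print(discover_if_present(lambda: True, lambda: "discovery ran"))
```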
preloaded KM
A KM that is loaded by the PATROL Agent at startup and run as long as the Agent runs. See also
disable a KM and static KM.
property
A characteristic or attribute of an object, such as its icon.
PSL
See PATROL Script Language (PSL).
PSL Compiler
A PATROL utility that compiles PSL scripts into a binary byte code that can be executed by the
PSL virtual machine. The PSL Compiler can also be used to check a PSL script for syntax errors.
The compiler is embedded in the PATROL Agent and PATROL Console (PATROL 3.x and
earlier) and can also be run as a command-line utility.
PSL Debugger
A PATROL Console utility that is used to debug PSL scripts. The PSL debugger is accessed
through a computer's pop-up menu.
PSL discovery
A type of application discovery in which the discovery rules are defined by using PSL. PSL
discovery can consist of prediscovery and discovery PSL scripts.
ptrldev
A form of patroldev that can be used in environments that support domain names no longer
than eight characters. See patroldev.
Q
Quick Query
In PATROL 3.x and earlier, a command on the Tools menu from the PATROL Console main
menu bar that contains built-in predefined commands that you can use to query the agent for
frequently needed information. For example, you can query the agent regularly about all
computer instances, application instances, and parameters that are in a warning or alarm state.
See also Agent Query.
R
reconnect polling
The time interval (in seconds) at which the PATROL Console will try to reconnect to a PATROL
Agent that has dropped the previous connection. The longer the interval, the lower the network
traffic. See also heartbeat, message retries, and message time-out.
recovery action
A procedure that attempts to fix a problem that caused a warning or alarm condition. A
recovery action is defined within a parameter by a user or by PATROL and triggered when the
returned parameter value falls within a defined alarm range.
refresh parameter
An action that forces the PATROL Agent to run one or more parameters immediately,
regardless of their polling cycle. Refreshing does not reset the polling cycle but gathers a new
data point between polling cycles.
reporting filter
The filter used by the PATROL Agent when transmitting events to consoles (event cache) from
the event repository (located at the agent) for statistical reports.
response window
An input and output display for many KM menu commands that provides a customizable
layout of the information (for example, the sort method for outputting system process IDs). See
also system output window and task output window.
run queue
See PATROL Agent run queue.
S
self-polling parameter
A standard parameter that starts a process that runs indefinitely. The started process
periodically polls the resource that it is monitoring and emits a value that is captured by the
PATROL Agent and published as the parameter value. Self-polling avoids the overhead of
frequently starting external processes to collect a monitored value. A self-polling parameter
differs from most other parameters that run scripts for a short time and then terminate until the
next poll time.
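The contrast here, one long-lived poller instead of a freshly started process per poll, can be sketched as a generator in Python rather than PSL (all names are illustrative):

```python
import itertools

def self_polling(poll_fn, cycles):
    """One long-lived loop samples the monitored resource each cycle and
    emits a value for the agent to capture as the parameter value.
    (Sketch only: the sleep between polls is omitted.)"""
    for _ in range(cycles):
        yield poll_fn()

# three samples from one counter, with no per-poll process start-up
counter = itertools.count(10)
print(list(self_polling(lambda: next(counter), 3)))  # -> [10, 11, 12]
```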
session file
In PATROL 3.x and earlier, any of the files that are saved when changes are made and saved
during the current PATROL Console session. A session file includes the session-1.km file, which
contains changes to KMs loaded on your console, and the session-1.prefs file, which contains
user preferences. Session files are replaced by management profiles in PATROL 7.x.
setup command
A command that is initiated by the PATROL Console and run by the PATROL Agent when the
PATROL Console connects or reconnects to the agent. For example, a setup command can
initialize an application log file to prepare it for monitoring. PATROL provides some setup
commands for computer classes. Only a PATROL Console with developer functionality can add
or change setup commands.
shortcut
An alias or copy of an object icon in the PATROL hierarchy.
simple discovery
A type of application discovery that uses simple pattern matching for identifying and
monitoring files and processes.
SNMP
See Simple Network Management Protocol.
SNMP trap
A condition which, when satisfied, results in an SNMP agent issuing a trap message to other
SNMP agents and clients. Within the PATROL Agent, all events can be translated to SNMP
traps and forwarded to SNMP managers.
snooze an alarm
To temporarily suspend an alarm so that a parameter does not exhibit an alarm state. During the
user-set snooze period, the parameter continues to run commands and recovery actions, and the
parameter icon appears to be in an OK state. See also deactivate a parameter and suspend a
parameter.
standard parameter
A type of parameter that collects and displays data and can also execute commands. A standard
parameter is like a collector parameter and consumer parameter combined. See also collector
parameter, consumer parameter, and parameter.
startup command
See setup command.
state
The condition of an object (computer instance, application instance, or parameter) monitored by
PATROL. The most common states are OK, warning, and alarm. Object icons can show
additional conditions. See also application state, computer state, parameter state, and state
change action.
state Boolean
A parameter output style that represents the on or yes state of a monitored object as a check
mark and the off or no state as the letter x. Parameters with this output style can have alerts
(warning and alarm) and recovery actions. Numeric data output for the monitored object can be
displayed as a graph. See also stoplight.
static KM
A KM that is not loaded by the PATROL Agent until a PATROL Console with a loaded KM of
the same name connects to the agent. Once loaded by the agent, a static KM is never unloaded
but continues to run as long as the agent runs, even if all PATROL Consoles with a registered
interest disconnect from the PATROL Agent. If the PATROL Agent stops, static KMs will not be
reloaded. See also disable a KM and preloaded KM.
stoplight
A parameter output style that displays OK, warning, and alarm states as green, yellow, and red
lights, respectively, on a traffic light. Parameters with this output style can have alerts (warning
and alarm) and recovery actions. Numeric data output for the monitored object can be
displayed as a graph. See also state Boolean.
suspend a parameter
To stop running a parameter for selected computers or application instances. Suspending a
parameter stops parameter commands and recovery actions but does not delete the parameter
icon from the application instance window and does not delete the parameter definition from
the KM tree in PATROL Consoles for Microsoft Windows environments. A suspended
parameter can be resumed at any time. You can suspend a parameter from its pop-up menu. See
also deactivate a parameter and snooze an alarm.
T
task
A command or group of commands that can execute on one object or several objects
simultaneously. A task runs in the background and is not part of the PATROL Agent run queue;
a task icon is displayed for each running task.
threshold
A point or points that define a range of values, outside of which a parameter is considered to be
in a warning or alarm range.
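One way to picture thresholds is as boundaries that map a returned value to a parameter state. A minimal sketch (the boundary names and the simple two-boundary layout are illustrative, not PATROL's actual threshold model):

```python
def classify(value: float, ok_high: float, warn_high: float) -> str:
    """Values up to ok_high are OK, values up to warn_high fall in the
    warning range, and anything higher falls in the alarm range."""
    if value <= ok_high:
        return "OK"
    if value <= warn_high:
        return "warning"
    return "alarm"

print(classify(85, 80, 95))  # -> warning
```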
U
unload a KM
To delete a KM from a PATROL Console session in order to stop monitoring the KM-defined
objects on all computers. The KM files are not deleted from the directories on the PATROL
Console or the PATROL Agent computers, and the PATROL Agent will continue to run the KM,
collect parameter data, and run recovery actions until no connected console has the KM loaded.
To prevent the PATROL Agent computer from collecting parameter data and running recovery
actions for a KM, disable the KM. If a KM has been flagged as static, then it will not be
unloaded. See also disable a KM, preloaded KM, and static KM.
user preferences
The PATROL Console settings that designate the account that you want to use to connect to
monitored host computers, prevent a console with developer functionality from downloading
its version of a KM to a PATROL Agent upon connection, disable the commit process for a
console with developer functionality, determine certain window and icon display
characteristics, specify the event cache size, and indicate whether startup and shutdown
commands are enabled. A PATROL Console with either developer or operator functionality can
change user preferences.
V
version arbitration
The KM version comparison that PATROL makes when a PATROL Console connects to a
PATROL Agent. By default, KM versions from PATROL Consoles with developer functionality
are loaded rather than PATROL Agent KM versions, and PATROL Agent KM versions are
loaded rather than KM versions from PATROL Consoles with operator functionality.
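The default precedence described above can be sketched as a first-match-wins choice (argument names are illustrative):

```python
def arbitrate(developer_km=None, agent_km=None, operator_km=None):
    """Default version arbitration: a developer console's KM version wins
    over the agent's copy, and the agent's copy wins over an operator
    console's KM version."""
    for km in (developer_km, agent_km, operator_km):
        if km is not None:
            return km
    return None

print(arbitrate(agent_km="2.1", operator_km="1.9"))  # -> 2.1
```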
view filter
A filter that can be created in an event manager (PEM) and that screens events forwarded from
PATROL Agents. Views can be created, stored, and reapplied to host computers.
W
warning
An indication that a parameter has returned a value that falls within the warning range. See also
alarm.
wpconfig
A feature of PATROL for Microsoft Windows only. The graphical user interface utility for
setting PATROL Agent configuration variables. The wpconfig utility can be accessed from a
computer pop-up menu on a computer running a PATROL Agent or a computer running a
PATROL Console with developer functionality. See also PATROL Agent configuration file and
PATROL Agent configuration variable.
X
xpconfig
A feature of PATROL for Unix only. The graphical user interface utility for setting PATROL
Agent configuration variables. You can access the xpconfig utility from an xterm session
command line on a computer running a PATROL Agent or from a pop-up menu or an xterm
session command line on a PATROL Console with developer functionality. See also PATROL
Agent configuration file and PATROL Agent configuration variable.