
Methods of monitoring processes with Zenoss
Draft
April 2009
Jane Curry
Skills 1st Ltd
www.skills-1st.co.uk

Jane Curry
Skills 1st Ltd
2 Cedar Chase
Taplow
Maidenhead
SL6 0EU
01628 782565
jane.curry@skills1st.co.uk

Skills 1st Ltd

24 Apr 2009

Synopsis
This paper discusses three possible methods for performing process monitoring:

- Using the process monitoring capabilities of the net-snmp agent
- Using ssh to access a device and run local commands, Nagios-style plugins or
  Zenoss plugins
- Using Zenoss's zenprocess daemon

Each of these methods will be examined, including an in-depth discussion of the
various different types of plugin that Zenoss can utilise and their strengths and
weaknesses. Examples and screenshots are provided.
In addition to monitoring processes, the options for rectifying failed processes will be
explored. This can be driven by the Zenoss events subsystem so examples are given to
generate events from each of the process monitoring techniques.
A third element of monitoring processes is to collect performance data for use in
graphs and threshold-generated events. Examples of performance data collection
templates are included for each of the ssh-based methods.
It is assumed that the reader is familiar with basic SNMP concepts and with simple
SNMP configuration parameters. It is also assumed that the reader is familiar with
setting up communications using ssh.
This paper was written based on stack-built Zenoss Core 2.3.3 on SLES 10.

Table of Contents
1 Overview of process management
1.1 Defining process management requirements
1.2 Methods for monitoring Unix / Linux processes
1.2.1 Native SNMP access to process information
1.2.2 Using ssh to gain process information
1.2.3 Using Zenoss's zenprocess daemon to monitor process information
2 Native net-snmp process management
2.1 Host Resources MIB
2.2 Process table of UCD-SNMP-MIB
2.3 DisMan Event MIB
3 Monitoring processes with ssh
3.1 Setting up ssh
3.1.1 Using ssh to directly monitor processes
3.2 Nagios plugin architecture
3.2.1 Using Nagios plugins to monitor processes
3.3 Zenoss plugins
3.3.1 Using Zenoss plugins to monitor processes
4 Monitoring processes with Zenoss's zenprocess daemon
4.1 Process configuration
4.2 Process discovery
4.3 Process status checking
5 Integrating process monitoring with other Zenoss capabilities
5.1 SNMP MIBs, TRAPs and Zenoss
5.1.1 Configuring event mapping for SNMP TRAPs
5.1.2 Responding to SNMP TRAPs with Zenoss
5.2 Zenoss and ssh
5.2.1 Using Zenoss to run standalone ssh commands
5.2.2 Using Zenoss to run Nagios plugins through ssh
5.2.3 Using Zenoss to run Zenoss plugins through ssh
6 Conclusions
References
Acknowledgements

1 Overview of process management


1.1 Defining process management requirements
Process management can encompass a wide variety of interpretations:
1. Monitoring for processes on Unix / Linux, effectively using output from some
   invocation of the ps command
2. Monitoring processes as defined by entries in Task Manager on a Windows
   system
3. Monitoring for a single occurrence of a simple process name (eg. named)
4. Monitoring for the full pathname of a command (eg. /usr/sbin/named)
5. Monitoring the arguments of a command
6. Monitoring for minimum and / or maximum numbers of occurrences of a process
7. Alerting on process failure and recovery
8. Automatic recovery from process failure
With the exception of monitoring Windows processes and services, each of these
requirements will be considered against each of the monitoring techniques discussed.
Windows services can be monitored by Zenoss's zenwin daemon and processes (ie.
programs that do not run as Windows services but do appear in the Windows Task
Manager) can be monitored using the standard Zenoss zenprocess daemon, provided
the target supports the SNMP Host Resources MIB. Thus some of the details in this
paper are also applicable to Windows targets.

1.2 Methods for monitoring Unix / Linux processes


Ultimately, process monitoring for Unix / Linux systems comes from some form of
running the ps command. Typically this will be achieved either through SNMP or
through ssh.

1.2.1 Native SNMP access to process information


Most Linux distributions use the net-snmp agent and net-snmp is also available for
proprietary Unix implementations. This paper will assume the presence of net-snmp
agents.
net-snmp itself provides a number of options for retrieving process information:

- Host Resources MIB (RFC 2790, which supersedes RFC 1514)
- net-snmp process table support from the UCD-SNMP-MIB (net-snmp used to be
  UCD-snmp)

No form of monitoring is truly agentless but, since most Operating Systems do
provide SNMP, management by SNMP is fairly close to agentless: once the
agent has been configured it should continue to deliver information to a management
station.
There are three versions of SNMP (V1, V2c and V3). V1 and V2c authenticate only
with a community name and provide no encryption as part of the protocol, whereas
SNMP V3 can provide both authentication and encryption. Obviously SNMP V3 will
have a greater performance overhead than the earlier versions.

1.2.2 Using ssh to gain process information


Secure Shell (ssh) can be thought of as another agentless method for accessing
information. As with SNMP, ssh tends to be supported as standard by most Operating
Systems and will operate without intervention once configured.
ssh management solutions tend to be heavier in resources. Encryption will be
enforced at source, destination and across the network. ssh can permit any script to
be run at the managed device so it can be as intensive and comprehensive as required;
thus an ssh solution can potentially address all the process management requirements
detailed above.

1.2.3 Using Zenoss's zenprocess daemon to monitor process information


Zenoss provides the zenprocess daemon to query the availability and performance of
processes on remote devices. Fundamentally, zenprocess makes use of the Zenoss
HRSWRunMap data collector which relies on the Host Resources SNMP MIB at the
target.
Processes are configured from the main left-hand Processes menu. One automatic
advantage of using zenprocess is that, in addition to monitoring for the presence of a
process, it will also create graphs of that process's CPU, memory usage, and the
number of instances of the process (count).

2 Native net-snmp process management


Strictly, an SNMP agent is only required to support MIB-2 (which largely provides
network information); however, many SNMP agents support extra Management
Information Bases (MIBs) as standard, and, in particular, many support the Host
Resources MIB, a generic MIB that provides system information about a device. The
net-snmp agent can have support for other MIB extensions, such as the process table
of the UCD-SNMP-MIB and the DisMan Event MIB, in addition to the Host Resources
MIB.
Note that later versions of the net-snmp agent tend to be distributed with support for
many extensions already compiled in, but older versions may not have all the extra
extensions; in this case, you may need to get the source of the net-snmp agent and
rebuild it. To find out what your net-snmp agent supports, run one of the following:

- net-snmp-config --snmpd-module-list

- snmpd -Dmib_init -H (needs root privilege)

To read MIB information from an SNMP agent, the snmpwalk command is a useful
testing tool. For example:

- snmpwalk -v 1 -c public zen232 system

  uses SNMP V1 with a community name of public to GET the system table
  from the machine zen232

- snmpwalk -v 3 -a MD5 -A fraclmyea -l authNoPriv -u jane2 zen232 system

  uses SNMP V3 with MD5 authentication, passphrase fraclmyea, and user
  jane2 to GET the system table from machine zen232

Obviously, the agent on the target host must have been configured to permit this
access, in its snmpd.conf file.
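
When tests like these are scripted, it can help to assemble the snmpwalk argument list in one place. The following is a minimal Python sketch rather than anything shipped with Zenoss or net-snmp: the helper name snmpwalk_args is hypothetical, and it only uses the snmpwalk options already shown in the examples above (-v, -c, -a, -A, -l, -u).

```python
def snmpwalk_args(host, oid, version="1", community=None,
                  user=None, auth_proto=None, auth_pass=None,
                  level="authNoPriv"):
    """Build an snmpwalk argument list (hypothetical helper, illustration only)."""
    args = ["snmpwalk", "-v", version]
    if version in ("1", "2c"):
        # V1 and V2c are authenticated by community name only
        if community is None:
            raise ValueError("SNMP V1/V2c needs a community name")
        args += ["-c", community]
    elif version == "3":
        # This sketch only covers authenticated V3 without privacy
        if not (user and auth_proto and auth_pass):
            raise ValueError("this sketch needs user, auth_proto and auth_pass for V3")
        args += ["-a", auth_proto, "-A", auth_pass, "-l", level, "-u", user]
    else:
        raise ValueError("unknown SNMP version: %r" % version)
    return args + [host, oid]


# Reproduce the two example commands from the text
print(" ".join(snmpwalk_args("zen232", "system", community="public")))
print(" ".join(snmpwalk_args("zen232", "system", version="3", user="jane2",
                             auth_proto="MD5", auth_pass="fraclmyea")))
```

The returned list can be handed to subprocess.run() on a machine where the net-snmp command line tools are installed.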

2.1 Host Resources MIB


The Host Resources MIB, defined in RFC 1514 and updated by RFC 2790, defines many
standard MIB values for monitoring the health of a system, including tables for cpu,
memory, swap, storage, devices, installed software, running software and the
performance of running software.
The hrSWRunTable contains an entry for each distinct piece of software that is
running or loaded into physical or virtual memory in preparation for running. This
includes the host's operating system, device drivers, and applications. hrSWRunTable
consists of a sequence of hrSWRunEntry objects defined as follows:
HrSWRunEntry ::= SEQUENCE {
    hrSWRunIndex        Integer32,
    hrSWRunName         InternationalDisplayString,
    hrSWRunID           ProductID,
    hrSWRunPath         InternationalDisplayString,
    hrSWRunParameters   InternationalDisplayString,
    hrSWRunType         INTEGER,
    hrSWRunStatus       INTEGER
}
Typically:

- hrSWRunIndex is the process id (PID), eg. 3555
- hrSWRunName is the short name of the process, eg. named
- hrSWRunID always seems to be zeroDotZero
- hrSWRunPath is the full pathname, eg. /usr/sbin/named
- hrSWRunParameters are the parameters to the command (if any), eg. -t
  /var/lib/named -u named (Note that long lines get truncated!)
- hrSWRunType is generally an application, denoted by the integer value of 4
- hrSWRunStatus typically is runnable (2) though at least one process should
  have a status of running (1)

If multiple instances of a process are running then each is reported, with the process
id being the differentiator.
The hrSWRunPerf table entry has 2 objects, for CPU and memory:
HrSWRunPerfEntry ::= SEQUENCE {
    hrSWRunPerfCPU   Integer32,
    hrSWRunPerfMem   KBytes
}
CPU is described as "the number of centi-seconds of the total system's CPU
resources consumed by this process. Note that on a multi-processor system, this value
may increment by more than one centi-second in one centi-second of real (wall clock)
time."
Memory is defined as "the total amount of real system memory allocated to this
process."
The index for both CPU and memory is again the process id.
Thus, the Host Resources MIB satisfies requirements 1, 3, 4 and 5 above (monitoring
for a process, monitoring the full pathname and monitoring the arguments). Multiple
occurrences of a process are reported but there is no simple way to specify how many
processes should be running.
To examine the Host Resources process information on a target device using SNMP V1
and a community name of public, use:

- snmpwalk -v 1 -c public zen233 hrSWRunTable
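
Because the Host Resources MIB reports one row per PID and has no notion of an expected count, counting has to be done by whatever reads the table. The sketch below, in Python, groups snmpwalk output lines by PID and counts instances by short name. It assumes the usual net-snmp textual output form (MIB::column.index = TYPE: value); the sample lines are invented for illustration and are not captured from a real agent, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   HOST-RESOURCES-MIB::hrSWRunName.3555 = STRING: "named"
LINE_RE = re.compile(r'^HOST-RESOURCES-MIB::(hrSWRun\w+)\.(\d+) = \w+: "?(.*?)"?$')


def parse_hrswrun(text):
    """Return {pid: {column_name: value}} from snmpwalk text output."""
    procs = defaultdict(dict)
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:  # lines in any other form (eg. empty values) are skipped
            column, pid, value = m.groups()
            procs[int(pid)][column] = value
    return dict(procs)


def count_by_name(procs, name):
    """Count rows whose hrSWRunName equals the given short process name."""
    return sum(1 for p in procs.values() if p.get("hrSWRunName") == name)


# Illustrative sample only (not real agent output)
SAMPLE = '''HOST-RESOURCES-MIB::hrSWRunName.3555 = STRING: "named"
HOST-RESOURCES-MIB::hrSWRunName.4012 = STRING: "sshd"
HOST-RESOURCES-MIB::hrSWRunPath.3555 = STRING: "/usr/sbin/named"
HOST-RESOURCES-MIB::hrSWRunParameters.3555 = STRING: "-t /var/lib/named -u named"
HOST-RESOURCES-MIB::hrSWRunType.3555 = INTEGER: application(4)'''

table = parse_hrswrun(SAMPLE)
print(count_by_name(table, "named"), table[3555]["hrSWRunPath"])
```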

2.2 Process table of UCD-SNMP-MIB


The net-snmp agent has become the ubiquitous SNMP agent for Linux and is
available for many other systems. It evolved from the University of California Davis
(UCD) SNMP agent which had some useful private MIB extensions, including process
monitoring. The prTable of the UCD-SNMP-MIB allows specification of a process
name (the short name as reported by ps -acx) and a maximum and minimum number
of occurrences of the process. If the number of processes is less than MIN or greater
than MAX, then the corresponding prErrorFlag instance will be set to 1, and a
suitable description message reported via the prErrMessage instance. Note: This
situation will not automatically trigger a trap to report the problem; see the DisMan
Event MIB section later. The syntax within the snmpd.conf file is proc NAME [MAX
[MIN]], for example:
proc named 1 1
proc vmware-vmx 4 3
There should be precisely one occurrence of the named process running and at least 3
but no more than 4 occurrences of vmware-vmx. Note that the maximum comes before
the minimum.
Optionally, snmpd.conf can also specify a command to run to attempt to fix the
problem. This is defined with a procfix line, for example:
procfix named /etc/init.d/named start
Note that a procfix line must come after the related proc statement. The procfix
command will not be run automatically. It is only run when the corresponding
prErrFix MIB value is set from 0 to 1.
The prTable in the UCD-SNMP-MIB is defined as follows:
PrEntry ::= SEQUENCE {                     Index Number
    prIndex         Integer32,             1
    prNames         DisplayString,         2
    prMin           Integer32,             3
    prMax           Integer32,             4
    prCount         Integer32,             5
    prErrorFlag     UCDErrorFlag,          100
    prErrMessage    DisplayString,         101
    prErrFix        UCDErrorFix,           102
    prErrFixCmd     DisplayString          103
}
Note that the index numbers for this sequence are not consecutive (see right-hand
column). For example, the Object Identifier (OID) for the 5th instance in the process
table for prErrorFlag would be .1.3.6.1.4.1.2021.2.1.100.5, where .1.3.6.1.4.1.2021 gets
you to ucdavis, the next .2.1 gets you to prTable.prEntry, .100 is the prErrorFlag and
the final .5 is the instance denoting the 5th process entry in the table.
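
The OID arithmetic described here can be written down as a small lookup. This is a Python sketch for illustration only: the function name pr_oid is hypothetical, and the column numbers are those of the prEntry definition in the UCD-SNMP-MIB.

```python
# .1.3.6.1.4.1.2021 is ucdavis; the following .2.1 is prTable.prEntry
PR_ENTRY = ".1.3.6.1.4.1.2021.2.1"

# Column numbers within prEntry (note the jump from 5 to 100)
PR_COLUMNS = {
    "prIndex": 1,
    "prNames": 2,
    "prMin": 3,
    "prMax": 4,
    "prCount": 5,
    "prErrorFlag": 100,
    "prErrMessage": 101,
    "prErrFix": 102,
    "prErrFixCmd": 103,
}


def pr_oid(column, instance):
    """Return the numeric OID for one column of one prTable row."""
    return "%s.%d.%d" % (PR_ENTRY, PR_COLUMNS[column], instance)


print(pr_oid("prErrorFlag", 5))  # .1.3.6.1.4.1.2021.2.1.100.5
```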
Typically:

- The prIndex field is simply an increasing number to index into the process
  table, starting at 1.
- prNames is the short name of the process, eg. vmware-vmx
- The prErrorFlag is set to 1 if the count value exceeds max or is less than min
- prErrMessage reflects a suitable error message if prErrorFlag = 1. For example,
  Too few vmware-vmx running (# = 1). If prErrorFlag = 0 then prErrMessage is
  the null string.
- prErrFix is used to trigger the running of the prErrFixCmd command. prErrFix
  must be SNMP SET to 1 to run the command. This can either be achieved with
  an external SET command or by using the DisMan Event MIB

The advantage of the UCD-SNMP-MIB is that it can count the number of instances of
a process and raise an alert if the count is not within configured maximum / minimum
limits. It also has the possibility of taking action to rectify a process problem.
However, it cannot monitor for process pathnames or parameters.
Thus, the UCD-SNMP-MIB satisfies requirements 1, 3, 6, 7 and 8 above (monitoring
for a process, monitoring the number of instances of a process within maximum /
minimum limits, alerting on a process problem, and automatic recovery).
To examine the UCD-SNMP-MIB process information on a target device using SNMP
V1 and a community name of public, use:

- snmpwalk -v 1 -c public zen232 prTable

Of course, it is perfectly possible to combine UCD-SNMP-MIB process monitoring with
Host Resources MIB process monitoring.
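
The prErrorFlag rule is simple enough to model outside the agent, which can be handy when testing event handling without touching a live snmpd. The following Python sketch is an emulation under stated assumptions, not the agent's own code: the function name is hypothetical, the "Too few" wording follows the example message quoted above, the "Too many" wording is assumed by analogy, and the agent's behaviour when MAX or MIN are omitted is not modelled.

```python
def check_proc(name, count, max_count, min_count):
    """Emulate prErrorFlag / prErrMessage for one proc line.

    Argument order mirrors the snmpd.conf syntax: MAX before MIN.
    Returns (error_flag, message); message is empty when all is well.
    """
    if count < min_count:
        return 1, "Too few %s running (# = %d)" % (name, count)
    if count > max_count:
        return 1, "Too many %s running (# = %d)" % (name, count)
    return 0, ""


# At least 3 and at most 4 vmware-vmx expected, but only 1 found
print(check_proc("vmware-vmx", 1, 4, 3))
# Exactly one named expected and one found
print(check_proc("named", 1, 1, 1))
```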

2.3 DisMan Event MIB


The UCD-SNMP-MIB does not automatically raise any TRAPs or NOTIFICATIONs,
nor will it run any procfix commands, by default. The DisMan Event MIB, described
in RFC 2981, can be used with the prTable to achieve this.
monitor configuration lines can be added to snmpd.conf to monitor the value of a
MIB OID on the local agent; for process monitoring, the prErrorFlag is the obvious
OID to monitor for a value of 1. The monitor configuration can optionally raise a
TRAP or NOTIFICATION. monitor can also be used to trigger a change (SNMP SET)
in a prErrFix value, thus initiating a recovery script.
monitor configuration lines mandate a username parameter as the local MIB OIDs
will be queried (SNMP GET) and, in the case of changing prErrFix, an OID will be
changed (SNMP SET). For this internal querying, SNMP V3 is always used,
regardless of what version of SNMP is used for external devices to query the local
agent. When configuring SNMP V3 users for DisMan Event MIB monitoring, do
ensure that the user has read / write access if you need to change the prErrFix MIB
value.

Figure 1: snmpd.conf with process and DisMan Event configuration lines

Note that, when configuring monitor statements for the DisMan Event MIB, there must
be white space around operators.
In Figure 1 above, four processes are monitored, each having max / min parameters; in
addition, named has a procfix line.
A user called _internal is created for SNMP V3 use with read / write access; no
authentication is required. The monitor statement requires a -u parameter which
specifies an agentSecName, hence the agentSecName definition defining _internal as
a valid user for monitor queries.
The uncommented monitor line provides an example that checks each prErrorFlag in
the prTable (ie. one check for each defined process) for a value != 0. On this condition,
the -e flag is used to generate an SNMP notification called ProcessEvent, which is
defined at the bottom of Figure 1. The -e parameter can either specify your own TRAP
/ NOTIFICATION (as shown here) or can use any TRAP / NOTIFICATION that is
defined and available to the agent in a MIB file. The event is passed a number of
variables (varbinds), each specified with a -o parameter (wildcard) and the name of the
OID to be sent. For a wildcarded expression, the suffix of the matched instance will be
added to any OIDs specified. Thus if named is index 3 in the prTable and
prErrorFlag.3 is tested and found to be != 0, then the values of prIndex.3, prNames.3,
prMin.3 etc. will be included on the event as varbinds. The next-to-last field in the
monitor line (Process table in this case) is an administrative name for this
expression, and is used for indexing the mteTriggerTable (and related tables).
The active monitor line checks the prErrorFlag instances every 10 seconds (-r 10) and
evaluates delta differences (-D); the monitor is not run on snmpd agent startup (-S).
Note that a monitor line only specifies what event will be sent and under which
conditions. A standard snmpd.conf trapsink line (or lines) will be necessary to indicate
where events should be sent to.
The effect of the active monitor line is to send an SNMP notification with enterprise
OID .1.3.6.1.4.1.1234.123 including varbinds that report the problem, whenever a
process fails to meet its configured criteria. When the problem goes away, an event
with the same OID will be sent and the varbinds will indicate the good news nature
of the event.
The second, commented-out monitor line in Figure 1 demonstrates local automation by
running a SET event, procfix, when a prErrorFlag instance != 0. The corresponding
instance of prErrFix is set to 1 which will trigger any configured procfix action. In
the case of a failed named, this will cause /etc/init.d/named start to be run.
On my system, SuSE 10 with net-snmp 5.4.1-19.4, I found that either the bad news /
good news events would work, or the automatic procfix process restart would work;
however, if both lines were configured then the good news event, when the process
was healthy again, was never sent. For this reason, the second monitor line is
commented out; it is simple enough to configure an action at Zenoss to perform an
SNMP SET on the instance of prErrFix to set the value to 1 and cause the procfix
action to be executed.
In summary, adding the DisMan Event MIB configuration to an SNMP agent satisfies
the initial process management requirements 7 and 8 (alerting on process failure and
recovery, and automatic recovery from process failure).
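
The external SNMP SET on prErrFix mentioned above could be issued with the net-snmp snmpset command. The following Python sketch only builds the argument list. The helper name is hypothetical, the use of SNMP V1 and the community name private are assumptions (the agent must be configured to allow write access for whatever credentials are used), and the numeric OID is prErrFix (column 102 of prEntry) from the UCD-SNMP-MIB.

```python
# prTable.prEntry.prErrFix in the UCD-SNMP-MIB
PR_ERR_FIX = ".1.3.6.1.4.1.2021.2.1.102"


def prerrfix_set_args(host, instance, community):
    """Build an snmpset argument list that sets prErrFix.<instance> to 1.

    snmpset takes OID TYPE VALUE triples; "i" denotes an INTEGER value.
    """
    oid = "%s.%d" % (PR_ERR_FIX, instance)
    return ["snmpset", "-v", "1", "-c", community, host, oid, "i", "1"]


# Assumption for illustration: named is row 3 of the prTable on zen232 and
# "private" is a community with write access
print(" ".join(prerrfix_set_args("zen232", 3, "private")))
```

If the agent accepts the SET, any procfix command configured for that row should then be run by the agent.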

3 Monitoring processes with ssh


There are three ways that ssh can be used to help achieve process monitoring:

- Use ssh to run operating system commands (either built-in (ps variations) or
  scripts)
- Use ssh to run Nagios plugin commands (such as check_procs)
- Use ssh to run Zenoss plugins to deliver process information (eg. zenplugin.py
  process sshd)

These options do not inherently rely on having Zenoss as the management system
(even the Zenoss plugins operate standalone). This chapter will discuss the basic
techniques of ssh, Nagios and Zenoss plugins. Chapter 5 will then discuss how these
ssh methods can be incorporated with a Zenoss management system.
Nagios plugins offer the advantage of a large library of system and network
management checks that are coded to a defined format. Zenoss understands the
output of Nagios plugins and can use it automatically to generate events.
The disadvantage of using Nagios plugins with Zenoss is that you have to install the
Nagios plugins on any targets that you want to access that way; you have the old
problem of installing and maintaining an agent.
Similarly, the Zenoss plugins provide some pre-coded functionality but they have to be
installed along with Python. Zenoss has several performance data collection
templates that use Zenoss plugins; look under the Templates tab for
/Devices/Server/Cmd at the Devices, FileSystem and ethernetCsmacd templates.
A compromise might be to write native scripts that produce output in Nagios format,
which removes the need to install an agent remotely (though you still have to get the
script delivered to the targets).

3.1 Setting up ssh


Most Unix / Linux Operating Systems come with an ssh implementation. PuTTY is
probably the best-known ssh client for Windows platforms. Communication is protected
by encryption which usually requires public / private key pairs to be generated. The
private key needs to be held on the ssh client (for example, a Zenoss manager); the
public key is needed on the ssh server (for example, a device running sshd).
Typically on a Unix / Linux system, any user that runs ssh will have a .ssh directory
under their home directory which contains ssh key files; it should have 700 access
permissions (a directory needs the execute bit set for its owner to be usable).
The key pairs are generated with a utility generally called ssh-keygen. ssh can use
either RSA or DSA as an authentication algorithm and there are 2 versions of the ssh
protocol, version 1 and version 2. Most modern implementations of ssh should be
using the DSA algorithm and ssh version 2. So, if you want to use ssh with a Zenoss
management system, using the userid of zenoss, to manage a remote system called
bino with a userid of zenrem, generate a public / private key pair using DSA, for ssh
version 2, by:

- Becoming the zenoss user on the management system (because of the way this
  user is created, you may need to su to root and then run su - zenoss)
- Running ssh-keygen -t dsa (you will be prompted for a passphrase, which may be
  blank)
- Inspecting ~/.ssh for id_dsa and id_dsa.pub and checking that the directory has
  700 access permissions
- Copying id_dsa.pub to the machine bino into the .ssh subdirectory of the userid
  zenrem. It should be copied into the file authorized_keys (or appended to
  authorized_keys if the file already exists).
- The private key, id_dsa, remains on the Zenoss system. It must have 600 access
  permissions.
- The public key can be copied to the authorized_keys file of as many systems as
  you want to manage.

Note that some implementations of ssh use a filename authorized_keys2 to hold
version 2 DSA public keys.
If you specify a passphrase when generating the key pairs, this passphrase is used to
further protect access to the private key, id_dsa, and you will be prompted for the
passphrase before any ssh communication can take place.
Note that the names id_dsa and id_dsa.pub are defaults. It is perfectly possible to use
different filenames and then to specify the key filename as part of the ssh command.
So, if we have a user, zenrem, on a managed system, bino, with the correct public key
in zenrem's .ssh/authorized_keys file, you can test the communication from the Zenoss
system, as user zenoss, with:

- ssh zenrem@bino
  If you have a passphrase configured, you will be prompted for it (this
  prompt is from the local Zenoss system to access the local private key).
  If this is the first ssh communication with bino, the fingerprint of the RSA
  host key for bino will be displayed and you will be asked whether to continue
  connecting. If you answer yes then this host key will be added into the
  file known_hosts under zenoss's .ssh directory.

In general, key pairs may be used symmetrically; that is, if both client and server
have the same id_dsa private key and the same matching public key in their
authorized_keys file, then either can act as client (ssh command) or server (sshd
daemon).
Note that testing ssh with a user zenoss on the server side (ie. ssh'ing into a Zenoss
management system) will not work as the standard Zenoss install does not permit
logins to the user called zenoss; this also inhibits ssh access.
In summary, you need the private key, id_dsa, to authorize communication out of
your system (ie. acting as an ssh client); you need the public key in the file
authorized_keys to authorize communication in (ie. acting as an ssh server). You don't
actually need the public key in the file id_dsa.pub.
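
Permission problems on the key files are a common reason for ssh key authentication failing silently, so a quick check can save time. The following Python sketch is a hypothetical helper that is not part of Zenoss or OpenSSH. It assumes a POSIX system and the default key name id_dsa used above, and it only reports whether the .ssh directory or the private key is accessible to group or other users.

```python
import os
import stat


def mode_of(path):
    """Return just the permission bits of a path (eg. 0o600)."""
    return stat.S_IMODE(os.stat(path).st_mode)


def check_ssh_permissions(ssh_dir, key_name="id_dsa"):
    """Return a list of problems found with the directory and private key."""
    problems = []
    # Any group / other permission bit set is treated as a problem
    if mode_of(ssh_dir) & 0o077:
        problems.append("%s is accessible to group/other (mode %o)"
                        % (ssh_dir, mode_of(ssh_dir)))
    key = os.path.join(ssh_dir, key_name)
    if not os.path.exists(key):
        problems.append("%s does not exist" % key)
    elif mode_of(key) & 0o077:
        problems.append("%s is accessible to group/other (mode %o)"
                        % (key, mode_of(key)))
    return problems
```

As the zenoss user, check_ssh_permissions(os.path.expanduser("~/.ssh")) would return an empty list when nothing needs fixing.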

3.1.1 Using ssh to directly monitor processes


Once ssh communication is correctly established, any script can be run on a remote
system, hence any requirements for process monitoring could be met, whether
monitoring for a single process instance, multiple instances, or exact process names
with or without process parameters. It is also possible to code recovery actions and to
generate alerts: SNMP TRAPs, messages to syslog, emails, or any other form of
notification. The negative aspect of direct ssh communication is that, if a script is run,
then the script somehow has to be distributed to the target.
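
One way to avoid distributing a script is to run plain ps on the target over ssh and do the counting on the management side. The following Python sketch takes that approach under stated assumptions: the function names are hypothetical, the remote ps is assumed to accept the POSIX -e and -o options (an empty header after = suppresses the header line), and the sample text is invented for illustration.

```python
def ssh_ps_args(user, host):
    """Build an ssh argument list that runs ps on the remote host.

    Each -o column is given separately so that its header can be blanked.
    """
    return ["ssh", "%s@%s" % (user, host),
            "ps", "-e", "-o", "pid=", "-o", "comm=", "-o", "args="]


def parse_ps(text):
    """Return a list of (pid, short_name, full_command_line) tuples."""
    procs = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or unexpected lines
        args = parts[2] if len(parts) > 2 else ""
        procs.append((int(parts[0]), parts[1], args))
    return procs


def count_procs(procs, name, args_contains=None):
    """Count processes by short name, optionally requiring text in the arguments."""
    return sum(1 for _pid, comm, args in procs
               if comm == name and (args_contains is None or args_contains in args))


# Illustrative sample only (not real output)
SAMPLE = """ 3555 named /usr/sbin/named -t /var/lib/named -u named
 4012 sshd /usr/sbin/sshd -o PidFile=/var/run/sshd.init.pid
 4100 sshd sshd: zenrem@pts/0
"""

procs = parse_ps(SAMPLE)
print(count_procs(procs, "sshd"), count_procs(procs, "named", "-u named"))
```

The output of subprocess.run(ssh_ps_args("zenrem", "bino"), capture_output=True, text=True).stdout could be fed to parse_ps() once key-based login has been set up.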

3.2 Nagios plugin architecture


The Zenoss Developer's Guide (page 18 of the 2.3 version) provides a reference to
Nagios plugin API documentation at
http://nagiosplug.sourceforge.net/developer-guidelines.html#PLUGOUTPUT
Chapter 2 of this Nagios paper documents the output format for:

- the status result of the plugin
- any performance data delivered by the plugin

Basically, a Nagios plugin should deliver one line of output. Status output should be
in the format:
SERVICE STATUS: Information text
Valid return codes are documented as shown in the figure below.

Figure 2: Nagios plugin return codes
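
For a native script that wants to look like a Nagios plugin, the two things to get right are the status line and the exit status. The following Python sketch uses the standard Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the helper names and the PROCS service label are hypothetical.

```python
import sys

# Standard Nagios plugin return codes
STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def status_line(service, state, text):
    """Format the single output line: SERVICE STATUS: Information text."""
    if state not in STATES:
        raise ValueError("unknown state: %r" % state)
    return "%s %s: %s" % (service, state, text)


def finish(service, state, text):
    """Print the status line and exit with the matching return code."""
    print(status_line(service, state, text))
    sys.exit(STATES[state])


print(status_line("PROCS", "OK", "1 process with command name named"))
```

A real check would call finish() as its last action so that the text and the exit status always agree.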

If the plugin delivers performance data, it must follow the return code and text,
separated from it by the vertical bar symbol.

Figure 3: Nagios plugin format for delivering performance data
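
A consumer of plugin output has to split the line at the vertical bar and then pick the label=value pairs apart. The following Python sketch shows a simplified parser. It assumes unquoted labels without spaces and keeps only the value and unit of each item, ignoring any warn / crit / min / max fields after a semicolon; the function name and the sample line are invented for illustration.

```python
import re

# A number followed by an optional unit of measure, eg. 12s or 345B
VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)(.*)$")


def parse_plugin_output(line):
    """Split a plugin output line into (status_text, {label: (value, unit)})."""
    status, _bar, perf = line.partition("|")
    data = {}
    for item in perf.split():
        label, sep, rest = item.partition("=")
        if not sep:
            continue  # not a label=value pair
        m = VALUE_RE.match(rest.split(";")[0])
        if m:
            data[label.strip("'")] = (float(m.group(1)), m.group(2))
    return status.strip(), data


# Illustrative line only
line = "FILE_AGE OK: test.txt is 12 seconds old and 345 bytes | age=12s size=345B"
print(parse_plugin_output(line))
```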

As an example, the check_file_age Nagios plugin takes warning and critical
parameters for age (-w and -c parameters in seconds) and size (-W and -C parameters
in bytes). To get the usage for any Nagios plugin, use the -h parameter after the
plugin command name (check_file_age -h). Thus a Nagios plugin, check_file_age,
might respond as shown below:

Figure 4: Nagios plugin check_file_age with performance output

Note that this plugin has been modified from the standard Nagios plugin in order to
deliver performance data after the vertical bar.
The plugin is actually a Perl script, the main body of which is shown below:

Figure 5: Body of modified check_file_age Nagios plugin

The first two sections check that a filename has been supplied and that a supplied file
name exists, each returning an output line with a result code (UNKNOWN or
CRITICAL). Note that an exit status is supplied as well as the status as part of the
output line. The main body of the script checks the file age and size against warning
and critical thresholds. The end of the script then delivers the output line with the
result code, information text and the values for size and age; again the exit status is
delivered.
Nagios plugins are installed as standard on a Zenoss server under
/usr/local/zenoss/common/libexec. Nagios plugins can also be installed on remote
systems and run standalone. Note that many require the utils.pm file to be available
either in the same directory as the plugin or in an include path, @INC. If you receive
an error message saying that utils.pm cannot be located, check the reported @INC
path; a symbolic link to the actual utils.pm can then be created in one of the
directories in that path.

3.2.1 Using Nagios plugins to monitor processes


The standard Nagios plugins include a check_procs plugin which can be installed
standalone on a device.

Figure 6: Help for the check_procs Nagios plugin

Examples are also given at the end of the help:

Figure 7: Examples for using Nagios check_procs plugin

Note that extra output can be achieved with the -vvv option (LOTS of verbosity). In
the case of the check_procs plugin, this extra flag shows that the command that is
actually run is:
/bin/ps axwo 'stat uid pid ppid vsz rss pcpu comm args'
When considering the process management requirements at the beginning of this
document, the Nagios plugins have possibilities for addressing 1, 3, 5 and 6
(monitoring single and multiple instances of processes by short process name and by
considering the parameters of a process). There is no ability in the standard plugin to
take remedial action or to send alerts; however, the Nagios API is just that and it is
perfectly possible to write your own plugin or to modify some of the standard plugins
provided. In addition, the Nagios plugin allows monitoring based on resources used,
such as memory and CPU, although no performance data values are returned by the
default plugin.
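
As an illustration of writing your own check in the Nagios output format, the Python sketch below evaluates a process count against warning and critical ranges and, unlike the default plugin described above, appends the count as performance data. It is not the check_procs implementation: the function name, the PROCS label and the inclusive (low, high) range representation are assumptions made for this example.

```python
CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


def check_count(count, warn_range, crit_range):
    """Return (exit_code, output_line) for a process count.

    Each range is an inclusive (low, high) pair; a count outside the
    critical range is CRITICAL, otherwise one outside the warning range
    is WARNING, otherwise the result is OK.
    """
    def outside(rng):
        low, high = rng
        return count < low or count > high

    if outside(crit_range):
        state = "CRITICAL"
    elif outside(warn_range):
        state = "WARNING"
    else:
        state = "OK"
    line = "PROCS %s: %d processes | procs=%d" % (state, count, count)
    return CODES[state], line


# Exactly one instance is ideal, two is tolerated with a warning
for n in (0, 1, 2):
    print(check_count(n, (1, 1), (1, 2)))
```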

3.3 Zenoss plugins


Zenoss plugins are entirely separate from Nagios plugins. They are also sometimes
referred to as ZenPlugins (or even just plugins) in the documentation. They are a
collection of platform-specific python libraries and the zenplugin.py command. They
can be used to collect information, using ssh, from remote systems. The Zenoss plugins
are only useful to monitor a remote system if that system has Python installed and if
the Zenoss plugins are supported on the architecture (this basically means linux2,
FreeBSD and Darwin).
Note that the Zenoss plugins are only used for collecting performance data; they are
not a prerequisite for modelling a device.
The Zenoss plugins can be downloaded from the Zenoss download site
(http://www.zenoss.com/download/links?creg=no) under the heading Remote
Monitoring Scripts. Good overview information is available at the end of the Zenoss
FAQ at http://www.zenoss.com/community/docs/faqs/faqenglish/. There is also a
Zenoss plugins HowTo at http://www.zenoss.com/community/docs/howtos/zenoss-plugins.
I found the documentation for installing the Zenoss plugins rather
confusing; the following process worked successfully on both SLES 10 (32 bit) and
OpenSuSE 10.2 (64 bit).
Note that both python and the python development package must be already
installed. Note also that you need to install the Python setuptools package or you are
likely to get an error message about an Application Error, "ImportError: No module
named common".
I found the easiest way to install the Zenoss plugins was to:
1. Get the latest Zenoss plugins package from
   http://www.zenoss.com/download/links?creg=no. I used the Other source
   tarball under the Remote Monitoring Scripts section and got
   Zenoss-Plugins-2.0.4.tar.gz
2. Get the source tarball for the Python setuptools utility from
   http://pypi.python.org/packages/source/s/setuptools/ (I got
   setuptools-0.6c9.tar.gz)
3. As root, untar the Zenoss plugins file
4. Change to the Zenoss-Plugins-2.0.4 directory
5. Run
   python ./setup.py build
   python ./setup.py install
6. Python packages typically get installed to
   /usr/local/lib/python2.5/site-packages (the directory will be created if
   necessary)
7. Untar the setuptools file
8. Change to the setuptools-0.6c9 directory
9. Run
   python ./setup.py install
10. As a normal user, test with
    zenplugin.py --list-plugins
11. Note that zenplugin.py will be installed into /usr/local/bin
The FAQ documents what utilities are supported on which architecture:

Figure 8: Zenoss FAQ for Zenoss plugins

I repeat: Zenoss plugins are entirely separate from Nagios plugins. However,
the Zenoss plugins implement the output specification of Nagios commands. Note in
the examples shown in Figure 9 that the return code is printed along with
informational text, followed by a vertical bar, followed by one or more performance
data values. Various Zenoss performance data collector templates, under
/Server/Cmd/Linux, use the Zenoss plugins to deliver data values for graphs for
Devices, FileSystem and ethernetCsmacd templates.

Figure 9: Output from Zenoss plugin commands

3.3.1 Using Zenoss plugins to monitor processes


As can be seen from the screenshot above in Figure 9, there is a process Zenoss plugin
that takes a process name as argument. It delivers whether at least one instance of
the process is running but does not obviously distinguish between process name and
arguments, nor does it help as to the number of instances that are running. There is
no concept of the Zenoss plugins running automatic recovery actions or sending alerts
(which is reasonable; they are designed as a tool to work with a Zenoss manager
which can interpret output from the Zenoss plugins and can deliver recovery and
alerting actions).

4 Monitoring processes with Zenoss's zenprocess daemon


Zenoss has several techniques for managing processes. Fundamentally, there are
three separate elements:

- Process configuration
- Process discovery through the zenmodeler daemon (every 12 hours by default)
- Process status checking through the zenprocess daemon (every 3 minutes by
  default)

These default polling intervals are controlled from the left-hand Collectors > localhost
menu.

4.1 Process configuration


The left-hand menu of the main Zenoss GUI provides a Processes menu for configuring
processes to monitor. None are configured out-of-the-box.

Figure 10: Zenoss Processes menu

Various parameters are configurable for each process to be monitored:

Figure 11: Process details that can be edited

The Name field is simply a descriptive name, typically reflecting the process name.
The Regex field controls what process is monitored. A trivial example, such as in
Figure 10 above, shows a regex of named which will match any process name that
includes named, and parameters to the process name are ignored. The example in
Figure 11 is more specific: the process name must start with snmpd (the ^ specifies
start of line) and the parameters to the process are also considered when deciding on
whether to monitor the process. The regex must match exactly up to the
/tmp/snmpd.pid and can then have any combination of characters following (the *).
Note that with Zenoss 2.3.3 and earlier versions, the Ignore Parameters flag
sometimes appears to be ignored! For example, in Figure 10 above where Ignore
Parameters is set to True for the named process, processes are automatically detected
that have the string named in the parameters of other commands.
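
The difference between a loose and an anchored regex can be tried out with Python's re module before configuring a process in Zenoss. The sketch below is illustrative only. The anchored pattern is a hypothetical stand-in for the one in Figure 11 (the snmpd options shown are assumed), the sample command lines are invented, and using re.search against the full command line is an assumption about how the match is applied rather than a statement of how zenprocess implements it.

```python
import re

# Loose: matches wherever the string "named" appears
LOOSE = re.compile(r"named")

# Anchored (hypothetical): must start with snmpd and match exactly up to
# /tmp/snmpd.pid, after which anything may follow
STRICT = re.compile(r"^snmpd -p /tmp/snmpd\.pid.*")


def matches(regex, command_line):
    """Return True if the regex is found in the command line."""
    return regex.search(command_line) is not None


for cmd in ("/usr/sbin/named -t /var/lib/named -u named",
            "tail -f /var/log/named.log",
            "snmpd -p /tmp/snmpd.pid -c /etc/snmp/snmpd.conf",
            "snmpd -p /var/run/snmpd.pid"):
    print(cmd, matches(LOOSE, cmd), matches(STRICT, cmd))
```

The second sample line shows how a loose regex can pick up unrelated commands that merely mention named in their parameters.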
Processes also have zProperties which can further modify behaviour.

Figure 12: zProperties options for a Process

The zProperties are:

- zAlertOnRestart: generate an event when the process is detected again
- zCountProcs: it is unclear what effect this has
- zFailSeverity: the severity of the event generated when the process fails
- zMonitor: whether to monitor for this process on all devices

Some of these zProperties are rather problematical. The two associated with events work well. If zAlertOnRestart is set to True, then recovery of a process will result in a good news event with a Cleared severity, which will automatically clear a preceding bad news event for that process from the same device; this is standard Zenoss event correlation.
The zCountProcs zProperty does not appear to have any effect. There is no opportunity to specify what count is the correct number or range. Even if zCountProcs is set to False, data appears to be collected for the number of instances of a process; this can be seen in the performance graphs for a process for a device.
The zMonitor zProperty should specify globally whether to monitor for a process on all discovered devices. For some processes, this would be better set to False and the process monitor can then be activated at the specific device level; however, doing so seems to result in very variable monitoring results (with Zenoss 2.3.3). Process monitoring seems much more reliable with zMonitor set to True.
Although process configuration appears more stable with Zenoss 2.3.3 than with previous versions, there is still sometimes a need to restart the zenprocess daemon after process configuration has taken place.
The Status tab of a specific process shows how many instances of a process are running, where they are running, and their status:

Figure 13: Status of the snmpd_raddle Process

4.2 Process discovery


From a device perspective, the OS tab allows configuration as to which processes should be monitored and shows their current status. The table drop-down menu allows processes to be added, deleted and locked, and monitoring to be enabled or disabled. This should be used if a process has been configured but with zMonitor = False.
Once processes themselves have been configured as described in the previous subsection, then whenever a device is modelled, a check will be made for all processes whose zMonitor zProperty is set to True (either globally or for a specific device). An entry will automatically be added to the Process table under the device's OS tab for processes that are discovered. By default, zenmodeler runs every 12 hours but any device can be remodelled from the drop-down table menu > Manage > Model Device.
The converse is also true; if a device remodel takes place and a configured process is not running then it is automatically removed from the process section of the OS tab and monitoring for that process for that device stops, at least until the next remodel. This can be very inconvenient if an important process happens to be down on the periodic remodel. One way to prevent this hiatus is to select the process for the device and use the table drop-down menu to Lock from Deletion. Unfortunately, this sometimes seems to produce adverse effects which result in changes of the process status not being monitored.

Figure 14: Device OS tab showing processes with status

Fundamentally, the zenmodeler daemon will use the discovery protocol(s) configured for a device to discover processes. If the device supports SNMP, then it is usually the Host Resources MIB hrSWRunTable that will provide process information. Modelling collectors for a device are specified from the table drop-down More > Collector Plugins menu. The zenoss.snmp.HRSWRunMap is the collector that gathers process information from the Host Resources MIB.

Figure 15: Modelling collector plugins for a device which supports SNMP

To better understand what the modelling process does, try running zenmodeler standalone, with full debugging turned on:
zenmodeler run -v10 -d group100linux.class.example.org
You should be able to see the process table entries being returned.
For a device that does not support SNMP, process modelling can still take place using the zenoss.cmd.linux.process modelling collector. Note that these modelling collectors do not require the Zenoss plugins to be installed on a remote system; simple operating system commands are run, over ssh, on the remote system (so zProperties need to be configured for a device to permit ssh access).

Figure 16: Modelling collector plugins for a non-SNMP device

Again, to better understand what is happening, run zenmodeler with full debugging (-v10) from a command line.

4.3 Process status checking


Once processes are discovered for a device (modelled), the zenprocess daemon checks the status of those processes, by default every 3 minutes. The process table in the device's OS tab should show a green icon for a healthy process and a red icon for a missing process.
Events of the configured severity will be generated when the process is missing, and the corresponding cleared event will be generated when the process is detected again, provided zAlertOnRestart is set to True.
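The event behaviour just described can be summarised as a small decision function. This is a sketch of the observable behaviour only, under the assumption that a failure event is raised on every polling cycle in which the process is missing (Zenoss would de-duplicate these); the function name and summary strings are hypothetical and are not taken from the zenprocess source.

```python
CLEARED = 0  # Zenoss severity used for "good news" events

def status_events(was_up, is_up, fail_severity, alert_on_restart):
    """Return the (severity, summary) events for one polling cycle."""
    events = []
    if not is_up:
        # Process missing: raise an event at the configured zFailSeverity.
        events.append((fail_severity, "process not running"))
    elif not was_up and alert_on_restart:
        # Process has come back and zAlertOnRestart is True.
        events.append((CLEARED, "process restarted"))
    return events

# Process was up, now missing, with zFailSeverity of Error (4).
print(status_events(True, False, 4, True))
# Process was missing, now detected again.
print(status_events(False, True, 4, True))
```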
Note that with Zenoss versions prior to 2.3.3 there was a bug, described in TRAC ticket 3270, whereby process status was always reported as up, even when down, but this apparently was only a display problem with the status icon and events were actually still generated accurately.
If the process Name field is selected in the OS tab, then performance data for that process should be displayed. (Note that the Name and Class columns got swapped around between Zenoss 2.2 and 2.3.)
There is a single performance data collector template, OSProcess, that defines what data to collect. It can be examined by drilling into the performance graphs for a process on a device, and then selecting the Templates tab.

Figure 17: OSProcess template for collecting process performance data

The template defines three data sources for:

count (regardless of whether the zCountProcs zProperty is True or False)

cpu

mem

Each of these data sources is apparently of type SNMP but no OID source is given. Strangely, these graphs are populated with data even so; however, if the device has no SNMP access then data is not collected (even though the process modelling collector can detect the process).
If logging is increased for the zenprocess daemon, it is possible to see that it is actually zenprocess that collects this performance data, not the usual zenperfsnmp daemon. Logging can be increased for any daemon, from the Zenoss GUI, by selecting the left-hand Settings menu, choosing the Daemons tab and clicking the edit config link. Simply add a line with:
logseverity 10
and restart the daemon from the Daemons tab page.

Figure 18: Increasing logging for Zenoss daemons

In summary, Zenoss process monitoring can discover processes on devices and subsequently monitor those processes. With regard to the process management requirements defined at the start of this document, zenprocess monitoring satisfies 1, 3, 4, 5, 6, 7 and 8 to some extent; that is, monitoring for one or more occurrences of a process, based on exact or partial process names and process arguments. By thresholding the process count (which is automatically gathered by zenprocess), alerts on maximum / minimum numbers of instances of a process can be raised. The zenprocess mechanism not only generates events automatically but can also generate clearing events. Although zenprocess itself cannot take automatic remedial action, the Zenoss event processing subsystem can.

5 Integrating process monitoring with other Zenoss capabilities
So far, a number of different process monitoring techniques have been discussed:

SNMP using various combinations of MIBs and TRAPs

ssh to run either Operating System commands or remote scripts

Nagios plugins

Zenoss plugins

Zenoss zenprocess monitoring

The first three techniques don't mandate a Zenoss manager. Strictly, the Zenoss plugins could run standalone and deliver output to a different manager; however, all these methods integrate well with Zenoss.

5.1 SNMP MIBs, TRAPs and Zenoss


Zenoss has comprehensive facilities to receive and interpret SNMP TRAPs and NOTIFICATIONs (NOTIFICATIONs are effectively SNMP V2 TRAPs and are handled in a similar way by Zenoss; in the ensuing discussion TRAP will be used to embrace both). Some TRAPs are configured when Zenoss is installed (such as warm start, cold start, authentication, link up and link down); any TRAP can be configured through the Zenoss GUI, based on the enterprise OID and the specific TRAP number.
All the varbinds on the TRAP are available as user-defined fields on the Details tab of a detailed event. By creating event mappings, events can be further distinguished using regular expressions to parse the event's summary field. Python rules can be used in mappings to test information from the TRAP against other criteria; for example, different actions could be taken based on which device sent the TRAP, whether the device is a member of a particular Location or Group and on the Production status of the device.
The TRAP varbinds can also be analysed. Depending on whether criteria are met, an event mapping transform can be run; this is typically one or more Python statements that can modify many of the characteristics of the event and / or the device that generated the event. A simple example would be to change the severity of the event for devices in a particular Group.
For a much more comprehensive discussion, see my Zenoss Event Management paper available at
http://www.zenoss.com/Members/jcurry/zenoss_event_management_paper.pdf/view.
The combination in the UCD-SNMP-MIB of process monitoring, the procfix parameter to customise a recovery action, and the ability of the DisMan Event MIB to trigger a recovery action, can interwork with a Zenoss SNMP manager to activate the recovery.
Take the scenario where a process, named, has failed and the DisMan Event MIB generates an enterprise-specific TRAP to Zenoss, including varbind parameters from the UCD-SNMP-MIB process table. The snmpd.conf configuration file can be seen in Figure 1.
The named process has a procfix line which specifies to run /etc/init.d/named start, but this only happens when the matching instance of prErrFix is set to 1. The monitor line generates an event (strictly an SNMP V2 NOTIFICATION) called ProcessEvent, which is defined in the same snmpd.conf (if you don't specify your own event then a default event from the DisMan Event MIB will be sent). The monitor line passes all the parameters for the relevant instance of the UCD-SNMP-MIB process table. The monitor is triggered by the relevant prErrorFlag != 0.

monitor -u _internal -r 10 -D -S -e ProcessEvent -o prIndex -o prNames -o prMin -o prMax -o prCount -o prErrorFlag -o prErrMessage -o prErrFix -o prErrFixCmd "Process table" prErrorFlag != 0

notificationEvent ProcessEvent .1.3.6.1.4.1.1234.123

As documented earlier, the net-snmp agent does not seem able to reliably generate both a notification and a set event to automatically run a procfix script; hence a Zenoss manager could be used to perform the SNMP SET on the correct prErrFix MIB variable. This is probably better practice than having the SNMP agent automatically fix the problem, as there will be an audit trail if it is fixed in Zenoss.

5.1.1 Configuring event mapping for SNMP TRAPs


An event mapping should be created for the event generated by the DisMan Event MIB, .1.3.6.1.4.1.1234.123. Start by creating a new event Class, whose eventClassKey is simply the event OID. In the example below, a new event class, Skills, is created with an event subclass of net_snmp_proc.

Figure 19: Event mapping 1.3.6.1.4.1.1234.123 for event class /Skills/net_snmp_proc

Events simply match on the eventClassKey of 1.3.6.1.4.1.1234.123; there is no Rule or Regex matching.
An event mapping transform is applied in order to generate a more useful event summary.
# Find the prErrorFlag varbind; the last number of its OID is the table index
for attr in dir(evt):
    if attr.startswith('1.3.6.1.4.1.2021.2.1.100'):
        evt.index = attr.replace('1.3.6.1.4.1.2021.2.1.100.', '')
# Fetch prNames, prErrorFlag and prErrFixCmd for that table index
evt.process_name = getattr(evt, '1.3.6.1.4.1.2021.2.1.2.' + evt.index)
evt.errorFlag = getattr(evt, '1.3.6.1.4.1.2021.2.1.100.' + evt.index)
evt.errFixCmd = getattr(evt, '1.3.6.1.4.1.2021.2.1.103.' + evt.index)
# Bad news: Critical severity
if evt.errorFlag == 1:
    evt.summary = evt.process_name + ' process is unhealthy'
    evt.severity = 5
# Good news: Cleared severity
if evt.errorFlag == 0:
    evt.summary = evt.process_name + ' process is healthy'
    evt.severity = 0

The transform looks for the user-defined event field that represents the prErrorFlag varbind (1.3.6.1.4.1.2021.2.1.100). Remember that the UCD-SNMP-MIB has a table associated with processes; we need to get at the index into that table, which is the last number of the OID, so the transform gets the index into the user-defined event field evt.index, the process name into evt.process_name and the error flag into evt.errorFlag. The transform also gets the prErrFixCmd value, although it is not actually used.
A test then checks evt.errorFlag. For a bad news event, the summary is set to a useful comment and the severity is set to Critical; for a good news event, the severity is set to Cleared. This means that Zenoss's automatic "good news clears bad news" logic will apply.
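To experiment with this transform logic outside Zenoss, the event can be simulated with a plain Python object whose attributes are named by OID. The following is a minimal sketch: FakeEvent and apply_transform are hypothetical names, the varbind values are invented test data using table index 3, and it is assumed that the error flag arrives as an integer.

```python
class FakeEvent(object):
    """Stand-in for a Zenoss event; varbinds become attributes named by OID."""
    pass

PR_NAMES = '1.3.6.1.4.1.2021.2.1.2.'
PR_ERROR_FLAG = '1.3.6.1.4.1.2021.2.1.100.'

def apply_transform(evt):
    # Recover the prTable index from the prErrorFlag varbind name.
    for attr in dir(evt):
        if attr.startswith(PR_ERROR_FLAG):
            evt.index = attr.replace(PR_ERROR_FLAG, '')
    evt.process_name = getattr(evt, PR_NAMES + evt.index)
    evt.errorFlag = getattr(evt, PR_ERROR_FLAG + evt.index)
    if evt.errorFlag == 1:
        evt.summary = evt.process_name + ' process is unhealthy'
        evt.severity = 5  # Critical
    if evt.errorFlag == 0:
        evt.summary = evt.process_name + ' process is healthy'
        evt.severity = 0  # Cleared
    return evt

# Simulated TRAP varbinds for row 3 of the process table.
evt = FakeEvent()
setattr(evt, PR_NAMES + '3', 'named')
setattr(evt, PR_ERROR_FLAG + '3', 1)
apply_transform(evt)
print(evt.index, evt.summary, evt.severity)
```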

Figure 20: Details tab of event detail for SNMP TRAP 1.3.6.1.4.1.1234.123 showing TRAP varbinds

The resulting Zenoss event appears as shown in the next Figure.

Figure 21: "Bad news" event from net-snmp agent for named process

As can be seen from Figure 20, the SNMP TRAP varbinds include the procfix prErrFixCmd parameter, /etc/init.d/named start, as OID .1.3.6.1.4.1.2021.2.1.103.3 and the status of the trigger, OID .1.3.6.1.4.1.2021.2.1.102.3, the prErrFix flag.

5.1.2 Responding to SNMP TRAPs with Zenoss


To automate recovery from process failure using Zenoss, the relevant prErrFix flag needs to be set to 1 using SNMP. Bear in mind that this will use an SNMP SET command, so SNMP authentication must permit SETs as well as GETs.
One way to configure Zenoss responses is to create Event Commands which are run by the zenactions daemon; however, our response needs access to the TRAP varbinds to determine the prTable table index and to set the appropriate prErrFix OID variable, and unfortunately, Zenoss Event Commands do not have access to user-defined event fields (i.e. the varbinds).
For this reason, the SNMP SET command will be run by extending the event mapping transform given in Figure 19. Any Python program can call Operating System commands (and that's all an event transform is!). To use such commands, the os Python module needs to be imported, the command text needs to be set up and then the os.system method is called.

Figure 22: Event mapping transform including action to SET the correct prErrFix variable to trigger process restart

Note that the shell command should all be on one line.
import os
......
snmpVer = dev.zSnmpVer.replace('v', '')
shellcmd = '/usr/bin/snmpset -v ' + snmpVer + ' -a ' + dev.zSnmpAuthType + ' -A ' + dev.zSnmpAuthPassword + ' -l authNoPriv -u ' + dev.zSnmpSecurityName + ' ' + dev.manageIp + ' 1.3.6.1.4.1.2021.2.1.102.' + evt.index + ' i 1'
os.system(shellcmd)

The shell command simply invokes the snmpset command. The example above is for a class of devices that support SNMP V3, so the authentication type, the authentication password and the SNMP V3 username must be supplied as parameters to snmpset. Rather than hardcode these, they can be accessed from the zProperties of the device that raised the initial TRAP, along with the IP address of that device, and the version of SNMP to use. The only gotcha is that the zSnmpVer zProperty responds with v3 (in this case), whereas the snmpset command requires a -v parameter followed by a space and a version (1, 2c, 3), so an extra step is shown which strips the leading v off the zSnmpVer zProperty.
The end of the snmpset command concatenates the OID for the prErrFix variable with the correct index from the user-defined evt.index value and sets the value, of type i (INTEGER), to 1; in other words, run the configured prErrFixCmd, /etc/init.d/named start.
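As a variation on the string concatenation above, the same snmpset invocation can be built as an argument list and passed to the subprocess module, which avoids shell quoting problems if a password contains spaces or special characters. This is a sketch under stated assumptions rather than the transform used in this paper: build_snmpset_args and run_snmpset are hypothetical helpers, the sample values are invented, and only the snmpset options already shown above are used.

```python
import subprocess

PR_ERR_FIX_OID = '1.3.6.1.4.1.2021.2.1.102'  # prErrFix column of prTable

def build_snmpset_args(snmp_ver, auth_type, auth_password,
                       security_name, manage_ip, index):
    """Build the SNMP V3 snmpset command as a list of arguments."""
    version = snmp_ver.replace('v', '')  # zSnmpVer holds e.g. 'v3'
    return ['/usr/bin/snmpset', '-v', version,
            '-a', auth_type, '-A', auth_password,
            '-l', 'authNoPriv', '-u', security_name,
            manage_ip,
            PR_ERR_FIX_OID + '.' + str(index), 'i', '1']

def run_snmpset(args):
    """Run the command; returns the snmpset exit status."""
    return subprocess.call(args)

# Invented sample values; in a transform these would come from the
# device zProperties and from evt.index.
args = build_snmpset_args('v3', 'MD5', 'secret pass', 'zenuser',
                          '10.0.0.5', '3')
print(args)
```

Calling run_snmpset(args) would then perform the SET; it is not invoked here because it needs a reachable agent.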
Do ensure that Zenoss has been configured correctly with SNMP zProperties for devices and / or device classes.

Figure 23: Zenoss SNMP zProperties for an SNMP V3 device class

In practice, all this explanation takes far longer than the automation does!

5.2 Zenoss and ssh


Each device class and / or specific device can have zProperties configured for ssh communications. Once accomplished, any underlying Zenoss ssh commands will simply use those parameters.

Figure 24: Zenoss ssh zProperties for device class

The crucial parameters are:

zCommandPassword    this is the passphrase if one was defined

zCommandPath    path for remote commands

zCommandSearchPath    path for remote commands (note that this currently seems to have no effect)

zCommandUsername    the username already set up for ssh

zKeyPath    where the ssh private key file is

Note that the screenshot above demonstrates the possibility of using a non-standard name for the key file, id_dsa_bino_et_al. This file should be in the zenoss user's .ssh directory.
Note that if non-standard key file names are used, Zenoss appears to need the public key file (id_dsa_bino_et_al.pub) in the .ssh directory, in addition to the private key file.

5.2.1 Using Zenoss to run stand-alone ssh commands


Any command can potentially be run on a remote system using ssh. If a specific combination of processes is required to define a healthy service, then a script may be the easiest way to accomplish this. As a simple example, consider the script below:

Figure 25: Shell script to check for specific processes

The script is checking for two VMware processes, one for a machine called server, the other for a machine called group100linux; these two VMs together make up the raddle application. The script will return numeric values for the number of relevant VMware processes, the number of server processes and the number of linux processes. The exit code will be OK if both are running, WARNING if only 1 is running and CRITICAL if both are down. No attempt is made in this script to rectify any problem but, potentially, recovery actions could also be included.
This script uses elements of the Nagios API to return a single line of output with:

The status of the script, followed by colon, followed by textual information

A vertical bar

Performance data in the format label=value. Multiple entries are space separated

The script also returns an exit status as defined by Nagios: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
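The output convention just described can be illustrated with a short Python sketch. It is not the shell script of Figure 25; the helper names and the message text are hypothetical, while the labels procs, serverNum and linuxNum and the OK / WARNING / CRITICAL rule follow the description of that script.

```python
STATUS_NAMES = {0: 'OK', 1: 'WARNING', 2: 'CRITICAL', 3: 'UNKNOWN'}

def raddle_exit_code(server_num, linux_num):
    """OK if both VMs run, WARNING if only one does, CRITICAL if neither."""
    running = (1 if server_num > 0 else 0) + (1 if linux_num > 0 else 0)
    return {2: 0, 1: 1, 0: 2}[running]

def format_plugin_output(exit_code, text, perfdata):
    """Build 'STATUS: text | label=value label=value ...'."""
    labels = ' '.join('%s=%s' % (label, value) for label, value in perfdata)
    return '%s: %s | %s' % (STATUS_NAMES[exit_code], text, labels)

server_num, linux_num = 1, 0
code = raddle_exit_code(server_num, linux_num)
line = format_plugin_output(
    code, 'raddle VM check',
    [('procs', server_num + linux_num),
     ('serverNum', server_num),
     ('linuxNum', linux_num)])
print(line)
# A real plugin would finish with sys.exit(code) so that the exit
# status reaches the caller.
```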
To make use of a command script, the easiest method is to set up a Zenoss performance data collector template. Note that it is good practice to create templates at a device class level; otherwise, if a template is created for a specific device, there is no simple way to later apply that template to other devices. Data is actually collected by Zenoss's zencommand daemon.
A performance data collection template has a number of elements:

Data Sources    how to collect data

Thresholds    ranges for healthy data

Graph Definitions    what to plot and how to plot it

The Data Source specifies what command to run, where to run it, and how to run it.

Figure 26: Defining the procs Data Source in the raddle_proc_check performance data collector template

In the Data Source dialogue:

Source Type should be COMMAND. The drop-down will certainly offer SNMP as another alternative. If other ZenPacks are installed then other types may also be available.

To use this data source on remote systems over ssh, ensure the Use SSH box is True.

The Component field is useful when processing events; for example, it is one of the fields used to determine whether an event is a duplicate. The component field does not need to already exist anywhere else; it is simply a text string. raddle has been used here.

The Event Class field will default to /Cmd/Fail but could usefully be set to an existing, locally defined event class. Here the class is set to /Skills/raddle.

The Cycle Time is how frequently the zencommand daemon will run the script.

The Command Template is the script you want to run. If a fully qualified pathname is provided then it will be honoured; otherwise, zencommand will consult the zProperties for a device and will prepend the zCommandPath to the filename given in the Command Template.

Don't forget to use the Save button after completing definitions.

Note that the Test button does not appear to work for invoking remote commands. It returns a "No such file or directory" error. Similarly, the zentestcommand utility returns the same error for remote scripts.

The easiest way to test the script over ssh is to run zencommand with full debug; for example:
zencommand run -v10 -d bino.skills1st.co.uk
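The Command Template path rule in the list above can be expressed as a small function. This is an illustrative sketch of the rule as described, not zencommand's own code; resolve_command is a hypothetical name and the sample command strings are examples only.

```python
import posixpath

def resolve_command(command_template, z_command_path):
    """Prepend zCommandPath unless the command is fully qualified."""
    parts = command_template.split(None, 1)  # executable, then arguments
    executable = parts[0]
    if not executable.startswith('/'):
        executable = posixpath.join(z_command_path, executable)
    return ' '.join([executable] + parts[1:])

# A short name picks up the device's zCommandPath.
print(resolve_command('zenplugin.py process named', '/usr/local/bin'))
# A fully qualified pathname is honoured as given.
print(resolve_command('/usr/local/zenoss/common/libexec/check_procs -C sshd',
                      '/usr/local/bin'))
```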

The bottom part of the Data Source dialogue maps the data that the script collects into Zenoss Data Points that can be thresholded and graphed. Remember that the script in Figure 25 delivered three data values after the vertical bar on the output line: procs, serverNum and linuxNum. The definitions of the Data Points must match these label names exactly.
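To see why the Data Point names must match the labels exactly, it helps to look at how such an output line splits apart. The parser below is a simplified sketch, not the zencommand implementation; parse_plugin_output is a hypothetical name, the sample line is invented, and plain numeric values without units or threshold suffixes are assumed.

```python
def parse_plugin_output(line):
    """Split 'text | label=value ...' into (summary, {label: value})."""
    summary, _, perf = line.partition('|')
    datapoints = {}
    for item in perf.split():
        label, sep, value = item.partition('=')
        if sep:  # ignore anything that is not label=value
            datapoints[label] = float(value)
    return summary.strip(), datapoints

summary, datapoints = parse_plugin_output(
    'OK: raddle VM check | procs=2 serverNum=1 linuxNum=1')
print(summary)
print(datapoints)

# A Data Point named 'linuxnum' would find no value: labels are
# case-sensitive keys.
print('linuxnum' in datapoints)
```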

Figure 27: Defining the procs Data Point in the procs Data Source

Typically, Data Point definitions can be left at defaults, having ensured that the name matches the label that the script delivers.
The Zenoss name for a Data Point is the concatenation of the Data Source and the Data Point names; hence, in the screenshot above, the Data Point is procs_procs. The other two Data Points will be procs_serverNum and procs_linuxNum. For this reason, it is important not to change the name of the Data Source without due consideration, or Data Points already used in graphs and thresholds will become undefined.
Once the Data Source and Data Points are defined, thresholds and graphs can be set up within the template.

Figure 28: raddle_proc_check performance data collector template

As can be seen in the following screenshot, thresholds are chosen based on the defined Data Points. Events of a specified class, of a given severity, can be generated when the threshold is exceeded.

Figure 29: Defining a threshold for the procs_linuxNum Data Point

As many graphs as are desired can be created. In this example, a single graph with all three Data Points will be defined, including the three thresholds.

Figure 30: raddle_procs Graph Definition to plot Data Points and Thresholds

This performance data collector template was defined for the class of devices /Server/Cmd. To ensure that the template is applied to the host bino.skills1st.co.uk, use the More > Templates drop-down menu from the device's main page. From there, select the drop-down and Bind Templates menu. A popup box allows you to select templates to bind. Note that you should select all templates that you want bound (use the Ctrl key to select multiple options); just selecting the new template will deselect any templates already bound.

Figure 31: Binding multiple performance data collection templates to a device

Once the template is bound to a device or class of devices, data will start to appear under the Performance tab of a device.

Figure 32: Performance graph for raddle_procs template (thresholds disabled)

Note in Figure 32 above that thresholds have been disabled in the raddle_procs template, hence no threshold values are shown.
With command-driven performance data collectors, there are two opportunities for generating events:

Using thresholds on Data Points as described above

Using the exit status from the script

If a script returns an exit status as defined by the Nagios plugin API, then events are automatically generated with a severity corresponding to the exit code:

Script exit code of OK (0)    Zenoss event severity = Clear (0)

Script exit code of WARNING (1)    Zenoss event severity = Warning (3)

Script exit code of CRITICAL (2)    Zenoss event severity = Error (4)
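The mapping above is small enough to capture as a lookup table, which can be handy when testing scripts away from Zenoss. The names below are hypothetical; the treatment of UNKNOWN (3) is not listed above, so this sketch deliberately returns None for it rather than guessing.

```python
# Nagios exit code -> Zenoss event severity, as listed above.
EXIT_TO_SEVERITY = {
    0: 0,  # OK       -> Clear
    1: 3,  # WARNING  -> Warning
    2: 4,  # CRITICAL -> Error
}

def severity_for_exit(exit_code):
    """Return the Zenoss severity, or None if the mapping is not listed."""
    return EXIT_TO_SEVERITY.get(exit_code)

for code in (0, 1, 2, 3):
    print(code, severity_for_exit(code))
```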

Figure 33: Event console showing events generated by script Data Source

Note that the eventClass and the component fields of the event have been populated by the Data Source configuration. The good news event automatically clears the bad news events using Zenoss's default event correlation.
If the template thresholds are enabled then extra events are received, with their configured severities.

Figure 34: Event console showing events generated by script data source and thresholds

Again, threshold good news events automatically clear bad news.

Figure 35: Event history showing "good news" and "bad news" events from scripts and thresholds

Threshold values are also shown on the performance graphs.

Figure 36: Performance graphs for the raddle_procs template demonstrating enabled thresholds

To better understand how zencommand runs scripts and to help debugging, modify the parameters for zencommand to increase debugging in the log file $ZENHOME/log/zencommand.log. Set:
logseverity 10
and recycle the zencommand daemon. This configuration can either be modified in the GUI, from Settings > Daemons, using the edit config link and the Restart button; alternatively, edit $ZENHOME/etc/zencommand.conf directly and then restart zencommand with zencommand restart (you will need to be the zenoss user).

Figure 37: Fragment of $ZENHOME/log/zencommand.log showing raddle_proc_check_datapoints.sh

The zencommand.log shows:

The remote script being run by zen.SshClient, including the returned output

zen.zencommand queueing an event, including the configured eventClass, component and with the event summary field set to the text information output (everything before the vertical bar in the script output line). The eventKey field is set to the Data Source name.

zen.RRDUtil storing away the latest values

zen.thresholds and zen.MinMaxCheck checking the latest values against the configured thresholds

5.2.2 Using Zenoss to run Nagios plugins through ssh


Nagios plugins integrate with Zenoss in a very similar manner to running standalone commands. Nagios plugins will automatically be installed on a Zenoss manager under /usr/local/zenoss/common/libexec. Some Nagios plugins can be used to check details of remote systems; for example, the check_http plugin tests URLs on a given destination system:
check_http -H www.skills1st.co.uk
Some Nagios plugins are designed to check details on a local system, such as the check_procs plugin.
It is perfectly possible to install the check_procs Nagios plugin standalone on a remote system and it can be placed in any directory. As an example, install check_procs into /usr/local/zenoss/common/libexec on a remote system (not a Zenoss manager). Ensure that the plugin runs standalone, locally, by:
cd /usr/local/zenoss/common/libexec
./check_procs -w 1:4 -c 1:10 -C sshd
Next ensure that the zProperties for this device are set up in the Zenoss GUI to permit ssh communications between the Zenoss manager and the remote device. This is exactly the same as described in Figure 24 above for running standalone ssh commands.
To utilise information from the Nagios plugin, set up a Zenoss performance data collection template in the same way as described above.

Figure 38: Performance data collection template using ssh to run remote Nagios check_procs plugin

Note that in this case, the full path to the plugin is supplied. It is checking for exactly 3 occurrences of a short process name, vmware-vmx. The component field is set to nagios_check_procs and a new event class of /Skills/nagios/check_procs has been created for use with this template.
The advantage of using Nagios plugins is that there are lots available in the community. The disadvantage is that many of them do not provide performance data values, simply a status and informational text. This means that creating Data Points in Zenoss from which to create thresholds and graphs is not useful; although Data Points can be specified, they have to exactly match the label of the data delivered by the plugin (which doesn't exist), so any graphs based on such Data Points will have no data.
This doesn't mean that the Nagios check_procs plugin is necessarily useless. The plugin can specify warning and critical ranges for metrics (such as number of instances of a process, memory used, percentage CPU used) and delivers an exit status from the script which will drive Zenoss events.

Figure 39: Event console with warning event generated by Nagios check_procs plugin

As discussed with standalone commands, the Nagios plugin good news status will deliver a Zenoss event with Cleared status; thus Nagios-driven good news events will automatically close their corresponding bad news events.

Figure 40: Event history console with "good news" and "bad news" events generated by Nagios plugin

5.2.3 Using Zenoss to run Zenoss plugins through ssh


The Zenoss plugins are Python libraries run by the zenplugin.py command. The Zenoss plugins are not installed by default, even on the Zenoss manager, but they are easily downloaded and installed as described in section 3.3.
Documentation for the Zenoss plugins is rather light, especially around the process plugin; however, the code can be examined, typically in:
/usr/local/lib/python2.5/site-packages/zenoss/plugins/linux2.py

This shows that a parameter is required to describe the process(es) to be monitored. This parameter will match any process that includes that string, so processes can be specified as fully qualified pathnames or short commands (try using zenplugin.py process k on a system that uses kde; it reports the totals of resources of all processes that include the letter k).

Figure 41: Invocations of zenplugin.py process with different process matching parameters

There appears to be no way to count instances of a process. If there are multiple processes that match the description, then the cpu and memory values are summed for all matching processes.
The plugin script shows that raw data is gathered by reading the stat file for the process in /proc/<process id>. The cpu figure is derived by adding the user and system values and is reported as the number of jiffies (1/100 second) for which this process has been scheduled. The memory figure takes the resident set size of the process (plus 3 for administrative purposes), and multiplies by page size to produce a memory figure in bytes.
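The arithmetic described above can be reproduced with a few lines of Python working on the text of a stat file. This is a sketch of the calculation as described, not the linux2.py plugin code: parse_stat is a hypothetical name, the sample line is invented, a 4096-byte page size is assumed, and the field positions (utime 14, stime 15, rss 24) are those documented for /proc/<pid>/stat in the Linux proc manual page.

```python
def parse_stat(stat_line, page_size=4096):
    """Return (cpu_jiffies, memory_bytes) from one /proc/<pid>/stat line."""
    # Field 2 (the command name) is in parentheses and may contain
    # spaces, so split only the text after the last ')'.
    rest = stat_line[stat_line.rfind(')') + 2:].split()
    # rest[0] is field 3 (state), so field N is rest[N - 3].
    utime = int(rest[14 - 3])
    stime = int(rest[15 - 3])
    rss_pages = int(rest[24 - 3])
    cpu = utime + stime                   # user + system jiffies
    mem = (rss_pages + 3) * page_size     # resident set size in bytes
    return cpu, mem

# Invented sample: utime=150, stime=50, rss=250 pages.
sample = ('1234 (my proc) S 1 1234 1234 0 -1 4194304 100 0 0 0 '
          '150 50 0 0 20 0 1 0 5000 10000000 250')
cpu, mem = parse_stat(sample)
print(cpu, mem)
```

Summing these two figures over every matching process gives the totals that the plugin reports when several processes match.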

Figure 42: Zenoss plugin linux2.py showing process collection code

Zenoss plugins can be used in exactly the same way as standalone scripts or Nagios plugins. Performance data collector templates can be created that call zenplugin.py on a remote system, using the ssh zProperties configured for a device.

Figure 43: Performance data collection template using Zenoss process plugin

In Figure 43 a new component value has been created, zenplugin_process, and a new event class is referenced (/Skills/zenplugin/process). Note that the Command Template field specifies a short name for zenplugin.py; this assumes that any device that has the template bound will have the zCommandPath zProperty set to /usr/local/bin.
The names of the Data Points exactly match the label names of the cpu and mem output of the Zenoss plugin. Note that the cpu Data Point has the COUNTER type; since cpu is the number of jiffies for which the process has been scheduled, it will always be an increasing number, whereas mem can go up and down, so the GAUGE type is more appropriate for mem. The COUNTER data type means that any graphs using it will automatically display rate of change, rather than the absolute value, which is simply a large number that gradually increases.
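The difference between the two Data Point types comes down to whether successive samples are differenced. The sketch below shows the rate calculation implied by COUNTER in a simplified form; counter_rate is a hypothetical helper, the sample values are invented, and counter wrap handling as performed by RRDtool is deliberately left out.

```python
def counter_rate(prev_value, prev_time, value, time):
    """Rate of change per second between two COUNTER samples.

    Returns None if no rate can be derived (no elapsed time, or the
    counter went backwards, e.g. after a process restart).
    """
    elapsed = time - prev_time
    if elapsed <= 0 or value < prev_value:
        return None
    return (value - prev_value) / float(elapsed)

# cpu jiffies sampled 300 seconds apart: 600 jiffies used in 300 s.
print(counter_rate(1000, 0, 1600, 300))

# A GAUGE such as mem is simply plotted as sampled; no differencing.
mem_samples = [1036288, 1040384, 1032192]
print(mem_samples)
```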

Figure 44: Performance graphs and thresholds for data gathered by the Zenoss process plugin

Zenoss plugins provide different benefits to the Nagios plugins. You cannot count instances of a process but, if you want the total cpu and memory resource used by the total number of invocations of a particular process, then the Zenoss process plugin matches that paradigm nicely. The other advantage of Zenoss plugins is that they not only deliver output in Nagios API format, but they also tend to deliver performance data in addition to the status and information text; hence they are more amenable to being used directly to supply data for graphs and thresholds (indeed, all the standard templates for /Server/Cmd devices use Zenoss plugins).
The negative side is that there is no way within the Zenoss process plugin to set acceptable thresholds for cpu and memory, so the exit status is always OK unless the plugin itself had problems retrieving data. This means that if events are required on thresholds based on the Zenoss plugin data, then thresholds must be set up within the Zenoss performance data collector template; there are no automatic events.

Figure 45: Threshold on memory for Zenoss process plugin Data Point

Note that the threshold shown above demonstrates the use of the Escalate Count field. When the third similar event arrives, the severity will be escalated from the configured Warning to the next level, Error.
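The escalation rule in this example can be written down as a one-step function. This is only a sketch of the behaviour described for Figure 45: escalated_severity is a hypothetical name, and whether Zenoss escalates further on later multiples of the count is not covered here, so a single step capped at Critical (5) is assumed.

```python
WARNING, ERROR, CRITICAL = 3, 4, 5

def escalated_severity(configured_severity, event_count, escalate_count):
    """Severity after event_count similar events have arrived."""
    if escalate_count > 0 and event_count >= escalate_count:
        return min(configured_severity + 1, CRITICAL)
    return configured_severity

# Escalate Count of 3 on a Warning threshold.
for count in (1, 2, 3):
    print(count, escalated_severity(WARNING, count, 3))
```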

Figure 46: Event console showing /Skills/zenplugin/process threshold event escalated from Warning to Error

Events are generated by Zenoss when the threshold is exceeded and, as with all the other techniques already discussed, good news threshold events will automatically close bad news threshold events.
To summarise, the Zenoss plugins are better performance data collectors and the Nagios plugins more easily deliver threshold events.

6 Conclusions
A number of different process monitoring techniques have been discussed, each having its own merits. If devices cannot be monitored using SNMP, perhaps because of firewall limitations, then ssh provides access for standalone commands, Nagios plugins and Zenoss plugins. The choice between these three depends on what aspects of process monitoring are required.
Standalone scripts are the most flexible but you have to develop, test, maintain and deliver them.
Many Nagios plugins are available in the community but the standard check_procs offering does not deliver performance data and there is still the task of delivering the Nagios plugin to the remote system. check_procs does provide a flexible way of defining a healthy process and can automatically generate events based on this health.
Zenoss plugins also need installing remotely and add the prerequisite of a Python environment, but the Zenoss process plugin is good for delivering cpu and memory performance data for the combined instances of a given process. If events are required, they need to be configured through thresholds on performance data collection templates.
One of the advantages of using performance data collection templates, driven by zencommand, is that you control the data collection interval at the Data Source level. If performance data is collected using SNMP, there is a single polling interval (default 5 mins) for all data collected by the zenperfsnmp daemon.
SNMP is the simple, default method of discovering and monitoring processes and is used by Zenoss's zenmodeler and zenprocess daemons, relying on the Host Resources MIB. The zenprocess daemon has the advantage of very low administrator setup time, as performance information is automatically gathered for monitored processes and events are automatically generated if a process is no longer detected. Provided targets support SNMP and Host Resources, there is no agent setup beyond basic configuration of the SNMP agent. The negative aspect of using the built-in Zenoss methods to configure, discover and monitor processes is that they are still a little quirky and do not always deliver the results expected.
For environments where SNMP agent configuration skills exist, the net-snmp agent can be configured well beyond the ability of the Host Resources MIB by using the UCD-SNMP-MIB process monitoring table. Events can be generated by incorporating the DisMan Event MIB and automatic recovery actions can also be enabled at the agent. For time-critical process monitoring, this should be the most responsive solution, as monitoring and action can both be taken at the monitored device; there is no polling interval between Zenoss manager and managed device before an event is received. The negative side of extensive agent configuration is that it really only provides event information; there is no performance data provided by this solution.
In practice, some organisations may deploy combinations of all these process monitoring techniques in order to satisfy their requirements.


References
1. net-snmp SNMP agent from http://www.net-snmp.org/
2. Host Resources MIB, RFC 2790 (obsoletes RFC 1514), http://www.ietf.org/rfc/rfc2790.txt and http://www.ietf.org/rfc/rfc1514.txt
3. UCD-SNMP-MIB, http://www.net-snmp.org/docs/mibs/UCD-SNMP-MIB.txt
4. DisMan Event MIB, RFC 2981, http://www.ietf.org/rfc/rfc2981.txt
5. Nagios plugin API, http://nagiosplug.sourceforge.net/developer-guidelines.html#PLUGOUTPUT
6. Zenoss FAQ, http://www.zenoss.com/community/docs/faqs/faqenglish/
7. Zenoss HowTo for Zenoss plugins, http://www.zenoss.com/community/docs/howtos/zenossplugins
8. Zenoss download site, http://www.zenoss.com/download/links?creg=no
9. Zenoss Event Management, by Jane Curry, http://www.zenoss.com/Members/jcurry/zenoss_event_management_paper.pdf/view
10. Learning Python by Mark Lutz, published by O'Reilly
11. Zenoss Administration Guide, http://www.zenoss.com/community/docs

Acknowledgements
