with Zenoss
Draft
April 2009
Jane Curry
Skills 1st Ltd
www.skills-1st.co.uk
Jane Curry
Skills 1st Ltd
2 Cedar Chase
Taplow
Maidenhead
SL6 0EU
01628 782565
jane.curry@skills1st.co.uk
Skills 1st Ltd
24 Apr 2009
Synopsis
This paper discusses three possible methods for performing process monitoring:
• Using the process monitoring capabilities of the net-snmp agent
• Using ssh to access a device and run local commands, Nagios-style plugins or Zenoss plugins
• Using Zenoss's zenprocess daemon
Each of these methods will be examined, including an in-depth discussion on the various different types of plugin that Zenoss can utilise and their strengths and weaknesses. Examples and screenshots are provided.
In addition to monitoring processes, the options for rectifying failed processes will be explored. This can be driven by the Zenoss events subsystem so examples are given to generate events from each of the process monitoring techniques.
A third element of monitoring processes is to collect performance data for use in graphs and threshold-generated events. Examples of performance data collection templates are included for each of the ssh-based methods.
It is assumed that the reader is familiar with basic SNMP concepts and with simple SNMP configuration parameters. It is also assumed that the reader is familiar with setting up communications using ssh.
This paper was written based on stack-built Zenoss Core 2.3.3 on SLES 10.
Table of Contents
1 Overview of process management
    1.1 Defining process management requirements
    1.2 Methods for monitoring Unix / Linux processes
        1.2.1 Native SNMP access to process information
        1.2.2 Using ssh to gain process information
        1.2.3 Using Zenoss's zenprocess daemon to monitor process information
2 Native net-snmp process management
    2.1 Host Resources MIB
    2.2 Process table of UCD-SNMP-MIB
    2.3 DisMan Event MIB
3 Monitoring processes with ssh
    3.1 Setting up ssh
        3.1.1 Using ssh to directly monitor processes
    3.2 Nagios plugin architecture
        3.2.1 Using Nagios plugins to monitor processes
    3.3 Zenoss plugins
        3.3.1 Using Zenoss plugins to monitor processes
4 Monitoring processes with Zenoss's zenprocess daemon
    4.1 Process configuration
    4.2 Process discovery
    4.3 Process status checking
5 Integrating process monitoring with other Zenoss capabilities
    5.1 SNMP MIBs, TRAPs and Zenoss
        5.1.1 Configuring event mapping for SNMP TRAPs
        5.1.2 Responding to SNMP TRAPs with Zenoss
    5.2 Zenoss and ssh
        5.2.1 Using Zenoss to run standalone ssh commands
        5.2.2 Using Zenoss to run Nagios plugins through ssh
        5.2.3 Using Zenoss to run Zenoss plugins through ssh
6 Conclusions
References
Acknowledgements
• Host Resources MIB (RFC 2790, which supersedes RFC 1514)
• net-snmp process table support from the UCD-SNMP-MIB (net-snmp used to be UCD-snmp)
No form of monitoring is truly agentless but since most Operating Systems do provide SNMP, then management by SNMP is fairly close to agentless; once the agent has been configured it should continue to deliver information to a management station.
There are three versions of SNMP (V1, V2c and V3) where V1 and V2c have very little authentication or encryption as part of the protocol, but SNMP V3 can provide both. Obviously SNMP V3 will have a greater performance overhead than the earlier versions.
• net-snmp-config --snmpd-module-list
• snmpd -Dmib_init -H (needs root privilege)
To read MIB information from an SNMP agent, the snmpwalk command is a useful testing tool. For example:
snmpwalk -v 1 -c public zen232 system
    uses SNMP V1 with a community name of public to GET the system table from the machine zen232
snmpwalk -v 3 -a MD5 -A fraclmyea -l authNoPriv -u jane2 zen232 system
    uses SNMP V3 with MD5 authentication, passphrase fraclmyea, and user jane2 to GET the system table from machine zen232
Obviously, the agent on the target host must have been configured to permit this access, in its snmpd.conf file.
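The two snmpwalk invocations above can also be driven from a script. The following is a minimal sketch in Python 3, assuming the net-snmp snmpwalk binary is on the PATH. The helper names snmpwalk_args and snmpwalk, and the keys of the v3 dictionary, are hypothetical and are not part of net-snmp or Zenoss.

```python
import subprocess


def snmpwalk_args(host, oid, version="1", community="public", v3=None):
    """Build the argument list for a net-snmp snmpwalk call (hypothetical helper)."""
    args = ["snmpwalk", "-v", version]
    if version == "3":
        # v3 is a dict holding the SNMP V3 credentials described in the text
        args += ["-a", v3["auth_proto"], "-A", v3["passphrase"],
                 "-l", v3["level"], "-u", v3["user"]]
    else:
        # V1 and V2c only need a community name
        args += ["-c", community]
    return args + [host, oid]


def snmpwalk(host, oid, **kwargs):
    """Run snmpwalk and return its output lines; raises if the command fails."""
    result = subprocess.run(snmpwalk_args(host, oid, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

Building the command as a list, rather than one string, avoids any shell quoting problems with passphrases.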
• hrSWRunIndex is the process id (PID) eg. 3555
• hrSWRunName is the short name of the process eg. named
• hrSWRunID always seems to be zeroDotZero
• hrSWRunPath is the full pathname eg. /usr/sbin/named
• hrSWRunParameters are the parameters to the command (if any), eg. -t /var/lib/named -u named (Note that long lines get truncated!)
• hrSWRunType is generally an application denoted by the integer value of 4
• hrSWRunStatus typically is runnable (2) though at least one process should have a status of running (1)
If multiple instances of a process are running then each is reported, with the process id being the differentiator.
The hrSWRunPerf table entry has 2 objects for CPU and memory:
HrSWRunPerfEntry ::= SEQUENCE {
    hrSWRunPerfCPU    Integer32,
    hrSWRunPerfMem    KBytes
}
CPU is described as "the number of centi-seconds of the total system's CPU resources consumed by this process. Note that on a multi-processor system, this value may increment by more than one centi-second in one centi-second of real (wall clock) time."
Memory is defined as "the total amount of real system memory allocated to this process."
The index for both CPU and memory is again the process id.
Thus, the Host Resources MIB satisfies requirements 1, 3, 4 and 5 above (monitoring for a process, monitoring the full pathname and monitoring the arguments). Multiple occurrences of a process are reported but there is no simple way to specify how many processes should be running.
To examine the Host Resources process information on a target device using SNMP V1 and a community name of public, use:
snmpwalk -v 1 -c public zen233 hrSWRunTable
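Since the Host Resources MIB reports one row per process instance, a management script can do the counting itself. The sketch below is an illustration only: it assumes snmpwalk output lines of the usual form HOST-RESOURCES-MIB::hrSWRunName.3555 = STRING: "named", and the function names are hypothetical.

```python
import re
from collections import Counter

# Matches lines such as: HOST-RESOURCES-MIB::hrSWRunName.3555 = STRING: "named"
RUN_NAME = re.compile(r'hrSWRunName\.(\d+) = STRING: "?([^"]*)"?')


def count_instances(snmpwalk_lines):
    """Count how many hrSWRunName rows exist for each short process name."""
    counts = Counter()
    for line in snmpwalk_lines:
        match = RUN_NAME.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def within_limits(counts, name, minimum, maximum):
    """True if the number of instances of name lies between minimum and maximum."""
    return minimum <= counts.get(name, 0) <= maximum
```

For example, within_limits(count_instances(lines), "named", 1, 1) checks for precisely one named process.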
situation will not automatically trigger a trap to report the problem; see the DisMan Event MIB section later. The syntax within the snmpd.conf file is proc NAME [MAX [MIN]], for example:
proc named 1 1
proc vmware-vmx 4 3
There should be precisely one occurrence of the named process running and at least 3 but no more than 4 occurrences of vmware-vmx.
Optionally, snmpd.conf can also specify a command to run to attempt to fix the problem. This is defined with a procfix line, for example:
procfix named /etc/init.d/named start
Note that a procfix line must come after the related proc statement. The procfix command will not be run automatically. It is only run when the corresponding prErrFix MIB value is set from 0 to 1.
The prTable in the UCD-SNMP-MIB is defined as follows:
PrEntry ::= SEQUENCE {                     Index Number
    prIndex         Integer32,             1
    prNames         DisplayString,         2
    prMin           Integer32,             3
    prMax           Integer32,             4
    prCount         Integer32,             5
    prErrorFlag     UCDErrorFlag,          100
    prErrMessage    DisplayString,         101
    prErrFix        UCDErrorFix,           102
    prErrFixCmd     DisplayString          103
}
Note that the index numbers for this sequence are not consecutive (see right-hand column). For example, the Object Identifier (OID) for the 5th instance in the process table for prErrorFlag would be .1.3.6.1.4.1.2021.2.1.100.5, where .1.3.6.1.4.1.2021 gets you to ucdavis, the next .2.1 gets you to prTable.prEntry, .100 is the prErrorFlag and the final .5 is the instance denoting the 5th process entry in the table.
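The OID arithmetic described above can be sketched in a few lines of Python. The column numbers are those of the prTable in the UCD-SNMP-MIB; the helper names pr_oid and split_pr_oid are hypothetical.

```python
PR_ENTRY = ".1.3.6.1.4.1.2021.2.1"  # ucdavis.prTable.prEntry

# Column numbers within prEntry; note the jump from 5 to 100
PR_COLUMNS = {
    "prIndex": 1, "prNames": 2, "prMin": 3, "prMax": 4, "prCount": 5,
    "prErrorFlag": 100, "prErrMessage": 101, "prErrFix": 102, "prErrFixCmd": 103,
}


def pr_oid(column, instance):
    """Return the numeric OID for one column of one prTable row."""
    return "%s.%d.%d" % (PR_ENTRY, PR_COLUMNS[column], instance)


def split_pr_oid(oid):
    """Return (column name, instance) for a numeric prTable OID."""
    bare = oid.lstrip(".")
    prefix = PR_ENTRY.lstrip(".") + "."
    if not bare.startswith(prefix):
        raise ValueError("not a prTable OID: %s" % oid)
    number, instance = bare[len(prefix):].split(".")
    names = {value: key for key, value in PR_COLUMNS.items()}
    return names[int(number)], int(instance)
```

Here pr_oid("prErrorFlag", 5) gives the OID quoted in the text, and split_pr_oid reverses the mapping for OIDs that arrive as TRAP varbinds.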
Typically:
• The prIndex field is simply an increasing number to index into the process table, starting at 1.
• prNames is the short name of the process eg. vmware-vmx
• The prErrorFlag is set to 1 if the count value exceeds max or is less than min
• prErrMessage reflects a suitable error message if prErrorFlag = 1. For example, Too few vmware-vmx running (# = 1). If prErrorFlag = 0 then prErrMessage is the null string.
• prErrFix is used to trigger the running of the prErrFixCmd command. prErrFix must be SNMP SET to 1 to run the command. This can either be achieved with an external SET command or by using the DisMan Event MIB
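The relationship between prCount, prMin, prMax, prErrorFlag and prErrMessage can be illustrated with a short sketch. This is not the agent's own code: the function name is hypothetical, the "Too few" wording is taken from the example above, and the "Too many" wording is an assumption modelled on it.

```python
def pr_status(name, count, minimum, maximum):
    """Return (prErrorFlag, prErrMessage) for a process count (illustrative only)."""
    if count < minimum:
        return 1, "Too few %s running (# = %d)" % (name, count)
    if count > maximum:
        # Assumed wording, mirroring the "Too few" message
        return 1, "Too many %s running (# = %d)" % (name, count)
    # Healthy: flag is 0 and the message is the null string
    return 0, ""
```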
The advantage of the UCD-SNMP-MIB is that it can count the number of instances of a process and raise an alert if the count is not within configured maximum / minimum limits. It also has the possibility of taking action to rectify a process problem. However, it cannot monitor for process pathnames or parameters.
Thus, the UCD-SNMP-MIB satisfies requirements 1, 3, 6, 7 and 8 above (monitoring for a process, monitoring the number of instances of a process within maximum / minimum limits, alerting on a process problem, and automatic recovery).
To examine the UCD-SNMP-MIB process information on a target device using SNMP V1 and a community name of public, use:
snmpwalk -v 1 -c public zen232 prTable
Of course, it is perfectly possible to combine UCD-SNMP-MIB process monitoring with Host Resources MIB process monitoring.
Figure 1: snmpd.conf with process and DisMan Event configuration lines
Note when configuring monitor statements for the DisMan Event MIB, there must be white space around operators.
In Figure 1 above, four processes are monitored, each having max / min parameters; in addition, named has a procfix line.
A user called _internal is created for SNMP V3 use with read / write access; no authentication is required. The monitor statement requires a -u parameter which specifies an agentSecName, hence the agentSecName definition defining _internal as a valid user for monitor queries.
The uncommented monitor line provides an example that checks each prErrorFlag in the prTable (ie one check for each defined process) for a value != 0. On this condition, the -e flag is used to generate an SNMP notification called ProcessEvent, which is defined at the bottom of Figure 1. The -e parameter can either specify your own TRAP / NOTIFICATION (as shown here) or can use any TRAP / NOTIFICATION that is defined and available to the agent in a MIB file. The event is passed a number of variables (varbinds), each specified with a -o parameter (wildcard) and the name of the OID to be sent. For a wildcarded expression, the suffix of the matched instance will be added to any OIDs specified. Thus if named is index 3 in the prTable and prErrorFlag.3 is tested and found to be != 0, then the values of prIndex.3, prNames.3, prMin.3 etc. will be included on the event as varbinds. The next to last field in the monitor line (Process table in this case) is an administrative name for this expression, and is used for indexing the mteTriggerTable (and related tables).
The active monitor line checks the prErrorFlag instances every 10 seconds (-r 10) and evaluates delta differences (-D); the monitor is not run on snmpd agent startup (-S).
Note that a monitor line only specifies what event will be sent and under which conditions. A standard snmpd.conf trapsink line (or lines) will be necessary to indicate where events should be sent to.
The effect of the active monitor line is to send an SNMP notification with enterprise OID .1.3.6.1.4.1.1234.123 including varbinds that report the problem, whenever a process fails to meet its configured criteria. When the problem goes away, an event with the same OID will be sent and the varbinds will indicate the good news nature of the event.
The second, commented-out monitor line in Figure 1 demonstrates local automation by running a SET event, procfix, when a prErrorFlag instance != 0. The corresponding instance of prErrFix is set to 1 which will trigger any configured procfix action. In the case of a failed named, this will cause /etc/init.d/named start to be run.
On my system, SuSE 10 with net-snmp 5.4.1-19.4, I found that either the bad news / good news events would work, or the automatic procfix process restart would work; however if both lines were configured then the good news event, when the process was healthy again, was never sent. For this reason, the second monitor line is commented out; it is simple enough to configure an action at Zenoss to perform an SNMP SET on the instance of prErrFix to set the value to 1 and cause the procfix action to be executed.
In summary, adding the DisMan Event MIB configuration to an SNMP agent satisfies the initial process management requirements 7 and 8 (alerting on process failure and recovery, and automatic recovery from process failure).
• Use ssh to run operating system commands (either built-in (ps variations) or scripts)
• Use ssh to run Nagios plugin commands (such as check_procs)
• Use ssh to run Zenoss plugins to deliver process information (eg. zenplugin.py process sshd)
These options do not inherently rely on having Zenoss as the management system (even the Zenoss plugins operate standalone). This chapter will discuss the basic techniques of ssh, Nagios and Zenoss plugins. Chapter 5 will then discuss how these ssh methods can be incorporated with a Zenoss management system.
Nagios plugins offer the advantage of a large library of system and network management checks that are coded to a defined format. Zenoss understands the output of Nagios plugins and can use it automatically to generate events.
The disadvantage of using Nagios plugins with Zenoss is that you have to install the Nagios plugins on any targets that you want to access that way; you have the old problem of installing and maintaining an agent.
Similarly, the Zenoss plugins provide some pre-coded functionality but they have to be installed along with Python. Zenoss has several performance data collection templates that use Zenoss plugins; look under the Templates tab for /Devices/Server/Cmd at the Devices, FileSystem and ethernetCsmacd templates.
A compromise might be to write native scripts that produce output in Nagios format, which removes the need to install an agent remotely (though you still have to get the script delivered to the targets).
• Becoming the zenoss user on the management system (because of the way this user is created, you may need to su to root and then run su - zenoss)
• ssh-keygen -t dsa; you will be prompted for a passphrase which may be blank
• inspect ~/.ssh for id_dsa and id_dsa.pub and check the directory has 700 access permissions
• copy id_dsa.pub to the machine bino, into the .ssh subdirectory of the userid zenrem. It should be copied into the file authorized_keys (or appended to authorized_keys if the file already exists).
• The private key, id_dsa, remains on the Zenoss system. It must have 600 access permissions.
• The public key can be copied to the authorized_keys file of as many systems as you want to manage.
• Note that some implementations of ssh use a filename authorized_keys2 to hold version 2 DSA public keys.
If you specify a passphrase when generating the key pairs, this passphrase is used to further protect access to the private key, id_dsa, and you will be prompted for the passphrase before any ssh communication can take place.
Note that the names id_dsa and id_dsa.pub are defaults. It is perfectly possible to use different filenames and then to specify the key filename as part of the ssh command.
So, if we have a user, zenrem, on a managed system, bino, with the correct public key in zenrem's .ssh/authorized_keys file, you can test the communication from the Zenoss system, as user zenoss, with:
ssh zenrem@bino
    If you have a passphrase configured, you will be prompted for it (this prompt is from the local Zenoss system to access the local private key).
    If this is the first ssh communication with bino, the RSA host key fingerprint for bino will be displayed and you will be asked whether to continue connecting. If you answer Yes then this host key will be added into the file known_hosts under zenoss's .ssh directory.
In general, key pairs may be used symmetrically; that is, if both client and server have the same id_dsa private key and the same matching public key in their authorized_keys file, then either can act as client (ssh command) or server (sshd daemon).
Note that testing ssh with a user zenoss on the server side (ie ssh'ing into a Zenoss management system) will not work as the standard Zenoss install does not permit logins to the user called zenoss; this also inhibits ssh access.
In summary, you need the private key, id_dsa, to authorize communication out of your system (ie. acting as an ssh client); you need the public key in the file authorized_keys to authorize communication in (ie acting as an ssh server). You don't actually need the public key in the file id_dsa.pub.
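The permission checks mentioned in the setup steps can be automated. The following sketch assumes a POSIX system; the function names are hypothetical and the rule that the .ssh directory must not be accessible by group or others is a common ssh convention rather than something stated by Zenoss.

```python
import os
import stat


def mode_of(path):
    """Return the permission bits of path, eg. 0o600."""
    return stat.S_IMODE(os.stat(path).st_mode)


def audit_ssh_dir(ssh_dir, private_key="id_dsa"):
    """Return a list of problems found with the ssh directory and private key."""
    problems = []
    if mode_of(ssh_dir) & 0o077:
        problems.append("%s is accessible by group or others" % ssh_dir)
    key_path = os.path.join(ssh_dir, private_key)
    if not os.path.exists(key_path):
        problems.append("%s is missing" % key_path)
    elif mode_of(key_path) != 0o600:
        problems.append("%s should have 600 permissions" % key_path)
    return problems
```

An empty list from audit_ssh_dir(os.path.expanduser("~/.ssh")) means that neither check found a problem.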
• status result of the plugin
• any performance data delivered by the plugin
Basically, a Nagios plugin should deliver one line of output. Status output should be in the format:
SERVICE STATUS: Information text
Valid return codes are documented as shown in the figure below.
Figure 2: Nagios plugin return codes
If the plugin delivers performance data, it must follow the return code and text, separated from it by the vertical bar symbol.
Figure 3: Nagios plugin format for delivering performance data
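The output format described above is simple enough to parse by hand. The sketch below is a simplified illustration: it assumes performance data items of the form label=value[;warn;crit...] separated by spaces, and it does not handle quoted labels that contain spaces. The function name is hypothetical; the exit status meanings are the standard Nagios ones.

```python
# Standard Nagios plugin exit statuses
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def parse_plugin_output(line):
    """Split one line of plugin output into (service status, text, perfdata dict)."""
    text, _, perf = line.partition("|")
    service_status, _, info = text.partition(":")
    perfdata = {}
    for item in perf.split():
        label, sep, rest = item.partition("=")
        if sep:
            # Keep only the value; drop any ;warn;crit;min;max fields
            perfdata[label.strip("'")] = rest.split(";")[0]
    return service_status.strip(), info.strip(), perfdata
```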
As an example, the check_file_age Nagios plugin takes warning and critical parameters for age (-w and -c parameters in seconds) and size (-W and -C parameters in bytes). To get the usage for any Nagios plugin, use the -h parameter after the plugin command name (check_file_age -h). Thus a Nagios plugin, check_file_age, might respond as shown below:
Figure 4: Nagios plugin check_file_age with performance output
Note that this plugin has been modified from the standard Nagios plugin in order to deliver performance data after the vertical bar.
The plugin is actually a Perl script, the main body of which is shown below:
Figure 5: Body of modified check_file_age Nagios plugin
The first two sections check that a filename has been supplied and that a supplied file name exists, each returning an output line with a result code (UNKNOWN or CRITICAL). Note that an exit status is supplied as well as the status as part of the output line. The main body of the script checks the file age and size against warning and critical thresholds. The end of the script then delivers the output line with the result code, information text and the values for size and age; again the exit status is delivered.
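The structure just described can be sketched in Python rather than Perl. This is not the plugin shown in Figure 5: it is a cut-down illustration that checks age only, the default thresholds of 240 and 600 seconds are assumptions, and the exact wording of the output lines is invented for the example.

```python
import os
import time

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_file_age(path, warn_age=240, crit_age=600, now=None):
    """Return (exit status, output line) for a Nagios-style file age check."""
    if not path:
        return UNKNOWN, "FILE_AGE UNKNOWN: No file specified"
    if not os.path.exists(path):
        return CRITICAL, "FILE_AGE CRITICAL: File not found - %s" % path
    info = os.stat(path)
    current = time.time() if now is None else now
    age = int(current - info.st_mtime)
    if age > crit_age:
        status = CRITICAL
    elif age > warn_age:
        status = WARNING
    else:
        status = OK
    # Status and text first, then the vertical bar, then performance data
    line = "FILE_AGE %s: %s is %d seconds old and %d bytes | age=%ds size=%dB" % (
        NAMES[status], path, age, info.st_size, age, info.st_size)
    return status, line
```

A wrapper script would print the returned line and pass the returned status to sys.exit(), so that both the text and the exit status reach the caller.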
Nagios plugins are installed as standard on a Zenoss server under /usr/local/zenoss/common/libexec. Nagios plugins can also be installed on remote systems and run standalone. Note that many require the utils.pm file to be available either in the same directory as the plugin or in an include path, @INC. If you receive an error message saying that utils.pm cannot be located, check the reported @INC path; a symbolic link can then be provided from the actual utils.pm directory to one of the directories in the path.
Figure 6: Help for the check_procs Nagios plugin
Examples are also given at the end of the help:
Figure 7: Examples for using Nagios check_procs plugin
Note that extra output can be achieved with the -vvv option (LOTS of verbosity). In the case of the check_procs plugin, this extra flag shows that the command that is actually run is:
/bin/ps axwo 'stat uid pid ppid vsz rss pcpu comm args'
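To show what can be done with the output of that ps command, here is a small Python sketch that splits each line into the nine requested fields and counts processes by short name. It is an illustration, not the check_procs implementation; the function names are hypothetical and a comm value containing spaces would be misparsed.

```python
# The command reported by check_procs -vvv, as an argument list
PS_COMMAND = ["/bin/ps", "axwo", "stat uid pid ppid vsz rss pcpu comm args"]
FIELDS = ["stat", "uid", "pid", "ppid", "vsz", "rss", "pcpu", "comm", "args"]


def parse_ps(output):
    """Turn ps output (header line plus one line per process) into dicts."""
    rows = []
    for line in output.splitlines()[1:]:  # skip the header line
        # Split on whitespace at most 8 times so args keeps its embedded spaces
        parts = line.split(None, len(FIELDS) - 1)
        if len(parts) == len(FIELDS):
            rows.append(dict(zip(FIELDS, parts)))
    return rows


def count_by_command(rows, name):
    """Count the processes whose short name (comm) equals name."""
    return sum(1 for row in rows if row["comm"] == name)
```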
When considering the process management requirements at the beginning of this document, the Nagios plugins have possibilities for addressing 1, 3, 5 and 6 (monitoring single and multiple instances of processes by short process name and by considering the parameters of a process). There is no ability in the standard plugin to take remedial action or to send alerts; however, the Nagios API is just that, and it is perfectly possible to write your own plugin or to modify some of the standard plugins provided. In addition, the Nagios plugin allows monitoring based on resources used, such as memory and CPU, although no performance data values are returned by the default plugin.
FAQ at http://www.zenoss.com/community/docs/faqs/faqenglish/. There is also a Zenoss plugins HowTo at http://www.zenoss.com/community/docs/howtos/zenoss-plugins. I found the documentation for installing the Zenoss plugins rather confusing; the following process worked successfully on both SLES 10 (32 bit) and OpenSuSE 10.2 (64 bit).
Note that both python and the python development package must be already installed. Note also that you need to install the Python setuptools package or you are likely to get an error message about an ApplicationError - ImportError: No module named common.
I found the easiest way to install the Zenoss plugins was to:
1. Get the latest Zenoss plugins package from http://www.zenoss.com/download/links?creg=no. I used the Other source tarball under the Remote Monitoring Scripts section and got ZenossPlugins-2.0.4.tar.gz
2. Get the source tarball for the Python setuptools utility from http://pypi.python.org/packages/source/s/setuptools/ (I got setuptools-0.6c9.tar.gz)
3. As root, untar the Zenoss plugins file
4. Change to the ZenossPlugins-2.0.4 directory
5. Run
    python ./setup.py build
    python ./setup.py install
6. Python packages typically get installed to /usr/local/lib/python2.5/site-packages (the directory will be created if necessary)
7. Untar the setuptools file
8. Change to the setuptools-0.6c9 directory
9. Run
    python ./setup.py install
10. As a normal user, test with
    zenplugin.py --list-plugins
11. Note that zenplugin.py will be installed into /usr/local/bin
The FAQ documents what utilities are supported on which architecture:
Figure 8: Zenoss FAQ for Zenoss plugins
I repeat: Zenoss plugins are entirely separate from Nagios plugins. However, the Zenoss plugins implement the output specification of Nagios commands. Note in the examples shown in Figure 9 that the return code is printed along with informational text, followed by a vertical bar, followed by one or more performance data values. Various Zenoss performance data collector templates, under /Server/Cmd/Linux, use the Zenoss plugins to deliver data values for graphs for Devices, FileSystem and ethernetCsmacd templates.
Figure 9: Output from Zenoss plugin commands
• Process configuration
• Process discovery through the zenmodeler daemon (every 12 hours by default)
• Process status checking through the zenprocess daemon (every 3 minutes by default)
These default polling intervals are controlled from the left-hand Collectors > localhost menu.
Figure 10: Zenoss Processes menu
Various parameters are configurable for each process to be monitored:
Figure 11: Process details that can be edited
The Name field is simply a descriptive name typically reflecting the process name.
The Regex field controls what process is monitored. A trivial example, such as in Figure 10 above, shows a regex of named which will match any process name that
includes named, and parameters to the process name are ignored. The example in Figure 11 is more specific: the process name must start with snmpd (the ^ specifies start of line) and the parameters to the process are also considered when deciding on whether to monitor the process. The regex must match exactly up to the /tmp/snmpd.pid and can then have any combination of characters following (the .*).
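The difference between the two styles of regex can be tried out with Python's re module. The patterns below are hypothetical: LOOSE corresponds to the trivial named example, while SPECIFIC is only a guess at the general shape of an anchored expression, since the actual regex in Figure 11 is not reproduced here. Using re.search (match anywhere in the string) is also an assumption about how the match is applied.

```python
import re

LOOSE = "named"                             # matches anywhere in the string
SPECIFIC = r"^snmpd -p /tmp/snmpd\.pid.*"   # hypothetical anchored example


def matches(regex, command_line):
    """True if regex is found in the process name plus parameters."""
    return re.search(regex, command_line) is not None
```

With these definitions, matches(LOOSE, "vi /etc/named.conf") is True, which illustrates how a loose regex can pick up unrelated commands that merely mention named in their parameters.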
Note that with Zenoss 2.3.3 and earlier versions, the Ignore Parameters flag sometimes appears to be ignored! For example, in Figure 10 above where Ignore Parameters is set to True for the named process, processes are automatically detected that have the string named in the parameters of other commands.
Processes also have zProperties which can further modify behaviour.
Figure 12: zProperties options for a Process
The zProperties are:
    zAlertOnRestart    generate an event when the process is detected again
    zCountProcs        it is unclear what effect this has
    zFailSeverity      the severity of the event generated when the process fails
    zMonitor           whether to monitor for this process on all devices
Some of these zProperties are rather problematical. The two associated with events work well. If zAlertOnRestart is set to True, then recovery of a process will result in a good news event with a Cleared severity, which will automatically clear a preceding bad news event for that process from the same device; this is standard Zenoss event correlation.
The zCountProcs zProperty does not appear to have any effect. There is no opportunity to specify what count is the correct number or range. Even if zCountProcs is set to False, data appears to be collected for the number of instances of a process; this can be seen in the performance graphs for a process for a device.
The zMonitor zProperty should specify globally whether to monitor for a process on all discovered devices. For some processes, this would be better set to False and the process monitor can then be activated at the specific device level; however, doing so seems to result in very variable monitoring results (with Zenoss 2.3.3). Process monitoring seems much more reliable with zMonitor set to True.
Although with Zenoss 2.3.3, process configuration appears more stable than with previous versions, there was sometimes a need to restart the zenprocess daemon after process configuration takes place.
The Status tab of a specific process shows how many instances of a process are running, where they are running, and their status:
Figure 13: Status of the snmpd_raddle Process
This can be very inconvenient if an important process happens to be down on the periodic remodel. One way to prevent this hiatus is to select the process for the device and use the table dropdown menu to Lock from Deletion. Unfortunately, this sometimes seems to produce adverse effects which result in changes of the process status not being monitored.
Figure 14: Device os tab showing processes with status
Fundamentally, the zenmodeler daemon will use the discovery protocol(s) configured for a device, to discover processes. If the device supports SNMP, then it is usually the Host Resources MIB hrSWRunTable that will provide process information. Modelling collectors for a device are specified from the table dropdown More > Collector Plugins menu. The zenoss.snmp.HRSWRunMap is the collector that gathers process information from the Host Resources MIB.
Figure 15: Modelling collector plugins for a device which supports SNMP
To better understand what the modelling process does, try running zenmodeler standalone, with full debugging turned on:
zenmodeler run -v 10 -d group100linux.class.example.org
You should be able to see the process table entries being returned.
For a device that does not support SNMP, process modelling can still take place using the zenoss.cmd.linux.process modelling collector. Note that these modelling collectors do not require the Zenoss plugins to be installed on a remote system; simple operating system commands are run, over ssh, on the remote system (so zProperties need to be configured for a device to permit ssh access).
Figure 16: Modelling collector plugins for a non-SNMP device
Again, to better understand what is happening, run zenmodeler with full debugging (-v 10) from a command line.
Figure 17: OSProcess template for collecting process performance data
The template defines three data sources for:
• count (regardless of whether the zCountProcs zProperty is True or False)
• cpu
• mem
Each of these data sources is apparently of type SNMP but no OID source is given. Strangely, these graphs are populated with data even so; however, if the device has no SNMP access then data is not collected (even though the process modelling collector can detect the process).
If logging is increased for the zenprocess daemon, it is possible to see that it is actually zenprocess that collects this performance data, not the usual zenperfsnmp daemon. Logging can be increased for any daemon, from the Zenoss GUI, by selecting the left-hand Settings menu, choosing the Daemons tab and clicking the edit config link. Simply add a line with:
logseverity 10
and restart the daemon from the Daemons tab page.
Figure 18: Increasing logging for Zenoss daemons
In summary, Zenoss process monitoring can discover processes on devices and subsequently monitor those processes. With regard to the process management requirements defined at the start of this document, zenprocess monitoring satisfies 1, 3, 4, 5, 6, 7 and 8 to some extent; that is, monitoring for one or more occurrences of a process, based on exact or partial process names and process arguments; by thresholding the process count (which is automatically gathered by zenprocess) then alerts on maximum / minimum numbers of instances of a process can be raised. The zenprocess mechanism not only generates events automatically but can also generate clearing events. Although zenprocess itself cannot take automatic remedial action, the Zenoss event processing subsystem can.
• SNMP using various combinations of MIBs and TRAPs
• ssh to run either Operating System commands or remote scripts
• Nagios plugins
• Zenoss plugins
• Zenoss zenprocess monitoring
The first three techniques don't mandate a Zenoss manager. Strictly the Zenoss plugins could run standalone and deliver output to a different manager; however all these methods integrate well with Zenoss.
embrace both). Some TRAPs are configured when Zenoss is installed (such as warm start, cold start, authentication, link up and link down); any TRAP can be configured through the Zenoss GUI, based on the enterprise OID and the specific TRAP number. All the varbinds on the TRAP are available as user-defined fields on the Details tab of a detailed event. By creating event mappings, events can be further distinguished using regular expressions to parse the event's summary field. Python rules can be used in mappings to test information from the TRAP against other criteria; for example different actions could be taken based on which device sent the TRAP, whether the device is a member of a particular Location or Group and on the Production status of the device.
The TRAP varbinds can also be analysed. Depending on whether criteria are met, an event mapping transform can be run; this is typically one or more Python statements that can modify many of the characteristics of both the event and / or the device that generated the event. A simple example would be to change the severity of the event for devices in a particular Group.
For a much more comprehensive discussion, see my Zenoss Event Management paper available at http://www.zenoss.com/Members/jcurry/zenoss_event_management_paper.pdf/view.
The combination in the UCD-SNMP-MIB of process monitoring, the procfix parameter to customise a recovery action, and the ability of the DisMan Event MIB to trigger a recovery action, can interwork with a Zenoss SNMP manager to activate the recovery.
Take the scenario where a process, named, has failed and the DisMan Event MIB generates an enterprise-specific TRAP to Zenoss, including varbind parameters from the UCD-SNMP-MIB process table. The snmpd.conf configuration file can be seen in Figure 1.
named has a procfix line which specifies to run /etc/init.d/named start but this only happens when the matching instance of prErrFix is set to 1. The monitor line generates an event (strictly an SNMP V2 NOTIFICATION) called ProcessEvent, which is defined in the same snmpd.conf (if you don't specify your own event then a default event from the DisMan Event MIB will be sent). The monitor line passes all the parameters for the relevant instance of the UCD-SNMP-MIB process table. The monitor is triggered by the relevant prErrorFlag != 0.
monitor -u _internal -r 10 -D -S -e ProcessEvent -o prIndex -o prNames -o prMin -o prMax -o prCount -o prErrorFlag -o prErrMessage -o prErrFix -o prErrFixCmd "Process table" prErrorFlag != 0
notificationEvent ProcessEvent .1.3.6.1.4.1.1234.123
As documented earlier, the net-snmp agent does not seem able to reliably generate both a notification and a set event to automatically run a procfix script; hence a Zenoss manager could be used to perform the SNMP SET on the correct prErrFix MIB variable. This is probably better practice than having the SNMP agent automatically fix the problem as there will be an audit trail if it is fixed in Zenoss.
Figure 19: Event mapping 1.3.6.1.4.1.1234.123 for event class /Skills/net_snmp_proc
Events simply match on the eventClassKey of 1.3.6.1.4.1.1234.123; there is no Rule or Regex matching.
An event mapping transform is applied in order to generate a more useful event summary.
# Find the varbind for prErrorFlag and take the table index from the end of its OID
for attr in dir(evt):
    if attr.startswith('1.3.6.1.4.1.2021.2.1.100'):
        evt.index = attr.replace('1.3.6.1.4.1.2021.2.1.100.', '')
# Collect prNames, prErrorFlag and prErrFixCmd for that index
evt.process_name = getattr(evt, '1.3.6.1.4.1.2021.2.1.2.' + evt.index)
evt.errorFlag = getattr(evt, '1.3.6.1.4.1.2021.2.1.100.' + evt.index)
evt.errFixCmd = getattr(evt, '1.3.6.1.4.1.2021.2.1.103.' + evt.index)
# Bad news: severity 5 is Critical
if evt.errorFlag == 1:
    evt.summary = evt.process_name + ' process is unhealthy'
    evt.severity = 5
# Good news: severity 0 is Cleared
if evt.errorFlag == 0:
    evt.summary = evt.process_name + ' process is healthy'
    evt.severity = 0
The transform looks for the user-defined event field that represents the prErrorFlag varbind (1.3.6.1.4.1.2021.2.1.100). Remember that the UCD-SNMP-MIB has a table associated with processes; we need to get at the index into that table, which is the last number of the OID, so the transform gets the index into the user-defined event field evt.index, the process name into evt.process_name and the error flag into evt.errorFlag. The transform also gets the prErrFixCmd value although it is not actually used.
A test then checks evt.errorFlag. For a bad news event, the summary is set to a useful comment and the severity is set to Critical; for a good news event, the severity is set to Cleared. This means that Zenoss's automatic "good news clears bad news" logic will apply.
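The transform logic can be exercised outside Zenoss with a stand-in event object. In the sketch below, FakeEvent and apply_transform are hypothetical names; a real Zenoss event is not a plain Python object like this, and the int() conversion is an added precaution in case the varbind arrives as a string.

```python
PR = '1.3.6.1.4.1.2021.2.1.'


class FakeEvent(object):
    """Hypothetical stand-in for a Zenoss event, for testing outside Zenoss."""
    summary = ''
    severity = 3


def apply_transform(evt):
    """Apply the same steps as the event mapping transform to evt."""
    index = None
    for attr in dir(evt):
        if attr.startswith(PR + '100.'):
            index = attr[len(PR + '100.'):]
    if index is None:
        return evt  # no prErrorFlag varbind, leave the event alone
    name = getattr(evt, PR + '2.' + index)
    flag = int(getattr(evt, PR + '100.' + index))
    if flag == 1:
        evt.summary = name + ' process is unhealthy'
        evt.severity = 5  # Critical
    elif flag == 0:
        evt.summary = name + ' process is healthy'
        evt.severity = 0  # Cleared
    return evt


# Simulate a bad news TRAP for the 3rd prTable entry
evt = FakeEvent()
setattr(evt, PR + '2.3', 'named')
setattr(evt, PR + '100.3', 1)
apply_transform(evt)
```

After this runs, evt.summary holds the bad news text and evt.severity is 5.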
Figure 20: Details tab of event detail for SNMP TRAP 1.3.6.1.4.1.1234.123 showing TRAP varbinds
The resulting Zenoss event appears as shown in the next Figure.
Figure 21: "Bad news" event from net-snmp agent for named process
As can be seen from Figure 20, the SNMP TRAP varbinds include the procfix prErrFixCmd parameter /etc/init.d/named start as OID .1.3.6.1.4.1.2021.2.1.103.3 and the status of the trigger, OID .1.3.6.1.4.1.2021.2.1.102.3, the prErrFix flag.
Figure 22: Event mapping transform including action to SET the correct prErrFix variable to trigger process restart
Note that the shell command should all be on one line.
import os
......
# zSnmpVer holds eg. v3; snmpset wants the version without the leading v
snmpVer = dev.zSnmpVer.replace('v', '')
shellcmd = '/usr/bin/snmpset -v ' + snmpVer + ' -a ' + dev.zSnmpAuthType + ' -A ' + dev.zSnmpAuthPassword + ' -l authNoPriv -u ' + dev.zSnmpSecurityName + ' ' + dev.manageIp + ' 1.3.6.1.4.1.2021.2.1.102.' + evt.index + ' i 1'
os.system(shellcmd)
The shell command simply invokes the snmpset command. The example above is for a class of devices that support SNMP V3, so the authentication type, the authentication password and the SNMP V3 username must be supplied as parameters to snmpset. Rather than hardcode these, they can be accessed from the zProperties of the device that raised the initial TRAP, along with the IP address of that device, and the version of SNMP to use. The only gotcha is that the zSnmpVer zProperty responds with v3 (in this case); the snmpset command requires a -v parameter followed by a space and a version (1, 2c, 3), so an extra step is shown which strips the leading v off the zSnmpVer zProperty.
The end of the snmpset command concatenates the OID for the prErrFix variable with the correct index from the user-defined evt.index value and sets the value, of type i (INTEGER), to the value 1; in other words, run the configured prErrFixCmd, /etc/init.d/named start.
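As a variation on the string concatenation shown in the transform, the same snmpset invocation can be assembled as an argument list, which avoids shell quoting problems if a password contains spaces or special characters. This is only a sketch: build_snmpset_args is a hypothetical helper, and the authentication type, password, username and IP address below are made-up sample values standing in for the device zProperties.

```python
# Sketch: build the snmpset argument list from values that would, in a
# Zenoss transform, come from dev.zSnmpVer, dev.zSnmpAuthType,
# dev.zSnmpAuthPassword, dev.zSnmpSecurityName, dev.manageIp and evt.index.
def build_snmpset_args(snmp_ver, auth_type, auth_password,
                       security_name, manage_ip, index):
    version = snmp_ver.replace('v', '')          # 'v3' -> '3'
    oid = '1.3.6.1.4.1.2021.2.1.102.' + str(index)   # prErrFix for this row
    return ['/usr/bin/snmpset', '-v', version,
            '-a', auth_type, '-A', auth_password,
            '-l', 'authNoPriv', '-u', security_name,
            manage_ip, oid, 'i', '1']

# Hypothetical sample values for illustration only
args = build_snmpset_args('v3', 'MD5', 'secretpass', 'zenossuser',
                          '192.0.2.10', '3')
print(' '.join(args))
# To execute it, the list could be passed to subprocess.call(args)
```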
Do ensure that Zenoss has been configured correctly with SNMP zProperties for devices and/or device classes.
Figure 23: Zenoss SNMP zProperties for an SNMP V3 device class
In practice, all this explanation takes far longer than the automation does!
Figure 24: Zenoss ssh zProperties for device class
The crucial parameters are:
    zCommandPassword      this is the passphrase if one was defined
    zCommandPath          path for remote commands
    zCommandSearchPath    path for remote commands (Note that this currently seems to have no effect)
    zCommandUsername      the username already set up for ssh
    zKeyPath              where the ssh private key file is
Note that the screenshot above demonstrates the possibility of using a non-standard name for the key file, id_dsa_bino_et_al. This file should be in the zenoss user's .ssh directory.
Note that if non-standard key file names are used, Zenoss appears to need the public key file (id_dsa_bino_et_al.pub) in the .ssh directory, in addition to the private key file.
Figure 25: Shell script to check for specific processes
The script is checking for two VMware processes, one for a machine called server, the other for a machine called group100linux; these two VMs together make up the raddle application. The script will return numeric values for the number of relevant VMware processes, the number of server processes and the number of linux processes. The exit code will be OK if both are running, WARNING if only 1 is running and CRITICAL if both are down. No attempt is made in this script to rectify any problem, but potentially, recovery actions could also be included.
This script uses elements of the Nagios API to return a single line of output with:
• The status of the script, followed by colon, followed by textual information
• A vertical bar
• Performance data in the format label=value. Multiple entries are space separated
The script also returns an exit status as defined by Nagios: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
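The output convention just described can be sketched in a few lines of Python. This is not the shell script from Figure 25; raddle_check is a hypothetical function that takes the three counts as arguments, and the wording of the text portion is invented for illustration. Only the line layout, the label names and the exit codes follow the description above.

```python
# Sketch of a Nagios-style result: one output line plus an exit status.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_TEXT = {OK: 'OK', WARNING: 'WARNING',
               CRITICAL: 'CRITICAL', UNKNOWN: 'UNKNOWN'}

def raddle_check(procs, server_num, linux_num):
    # Count how many of the two VMs have at least one process running
    running = int(server_num > 0) + int(linux_num > 0)
    if running == 2:
        status = OK
    elif running == 1:
        status = WARNING
    else:
        status = CRITICAL
    text = '%d of 2 raddle VMs running' % running
    # Performance data: space-separated label=value pairs after the bar
    perf = 'procs=%d serverNum=%d linuxNum=%d' % (procs, server_num, linux_num)
    line = '%s: %s | %s' % (STATUS_TEXT[status], text, perf)
    return line, status

line, status = raddle_check(procs=2, server_num=1, linux_num=1)
print(line)
# A real plugin would finish by calling sys.exit(status)
```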
To make use of a command script, the easiest method is to set up a Zenoss performance data collector template. Note that it is good practice to create templates at a device class level; otherwise, if it is created for a specific device, there is no simple way to later apply that template to other devices. Data is actually collected by Zenoss's zencommand daemon.
A performance data collection template has a number of elements:
    DataSources          how to collect data
    Thresholds           ranges for healthy data
    Graph Definitions    what to plot and how to plot it
The DataSource specifies what command to run, where to run it, and how to run it.
Figure 26: Defining the procs DataSource in the raddle_proc_check performance data collector template
In the DataSource dialogue:
• Source Type should be COMMAND. The dropdown will certainly offer SNMP as another alternative. If other ZenPacks are installed then other types may also be available.
• To use this datasource on remote systems over ssh, ensure the Use SSH box is True.
• The Component field is useful when processing events; for example, it is one of the fields used to determine whether an event is a duplicate. The component field does not need to already exist anywhere else; it is simply a text string. raddle has been used here.
• The Event Class field will default to /Cmd/Fail but could usefully be set to an existing, locally defined event class. Here the class is set to /Skills/raddle.
• The Cycle Time is how frequently the zencommand daemon will run the script.
• The Command Template is the script you want to run. If a fully qualified pathname is provided then it will be honoured; otherwise, zencommand will consult the zProperties for a device and will prepend the zCommandPath to the filename given in the Command Template.
• Don't forget to use the Save button after completing definitions.
• Note that the Test button does not appear to work for invoking remote commands. It returns a "No such file or directory" error. Similarly, the zentestcommand utility returns the same error for remote scripts.
The easiest way to test the script over ssh is to run the zencommand with full debug; for example:
    zencommand run -v 10 -d bino.skills1st.co.uk
The bottom part of the DataSource dialogue maps the data that the script collects into Zenoss DataPoints that can be thresholded and graphed. Remember that the script in Figure 25 delivered three data values after the vertical bar on the output line: procs, serverNum and linuxNum. The definitions of the DataPoints must match these label names exactly.
Figure 27: Defining the procs DataPoint in the procs DataSource
Typically, DataPoint definitions can be left at defaults, having ensured that the name matches the label that the script delivers.
The Zenoss name for a DataPoint is the concatenation of the DataSource and the DataPoint names; hence, in the screenshot above, the DataPoint is procs_procs. The other two DataPoints will be procs_serverNum and procs_linuxNum. For this reason, it is important not to change the name of the DataSource without due consideration, or DataPoints already used in graphs and thresholds will become undefined.
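The relationship between the script's output labels and the resulting DataPoint names can be illustrated with a short sketch. This is not zencommand's own parser; parse_plugin_output and datapoint_names are hypothetical helpers, and the sketch assumes plain numeric label=value pairs such as those produced by the script in Figure 25.

```python
# Sketch: split a plugin output line into summary text and performance
# values, then derive the DataSource_DataPoint names used by Zenoss.
def parse_plugin_output(line):
    summary, _, perf = line.partition('|')
    values = {}
    for item in perf.split():
        label, sep, value = item.partition('=')
        if not sep:
            continue
        try:
            # Ignore any ;warn;crit suffix, keep the leading number
            values[label] = float(value.split(';')[0])
        except ValueError:
            pass   # non-numeric value: no usable DataPoint
    return summary.strip(), values

def datapoint_names(datasource, labels):
    return [datasource + '_' + label for label in labels]

summary, values = parse_plugin_output(
    'OK: both VMs running | procs=2 serverNum=1 linuxNum=1')
print(summary)
print(datapoint_names('procs', ['procs', 'serverNum', 'linuxNum']))
```

A label that the script does not emit simply never appears in the parsed values, which is why a misnamed DataPoint ends up with no data.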
Once the DataSource and DataPoints are defined, thresholds and graphs can be set up within the template.
Figure 28: raddle_proc_check performance data collector template
As can be seen in the following screenshot, thresholds are chosen based on the defined DataPoints. Events of a specified class, of a given severity, can be generated when the threshold is exceeded.
Figure 29: Defining a threshold for the procs_linuxNum DataPoint
As many graphs as are desired can be created. In this example, a single graph with all three DataPoints will be defined, including the three thresholds.
Figure 30: raddle_procs Graph Definition to plot DataPoints and Thresholds
This performance data collector template was defined for the class of devices /Server/Cmd. To ensure that the template is applied to the host bino.skills1st.co.uk, use the More -> Templates dropdown menu from the device's main page. From there, select the dropdown and Bind Templates menu. A popup box allows you to select templates to bind. Note that you should select all templates that you want bound (use the Ctrl key to select multiple options); just selecting the new template will deselect any templates already bound.
Figure 31: Binding multiple performance data collection templates to a device
Once the template is bound to a device or class of devices, data will start to appear under the Performance tab of a device.
Figure 32: Performance graph for raddle_procs template (thresholds disabled)
Note in Figure 32 above that thresholds have been disabled in the raddle_procs template, hence no threshold values are shown.
With command-driven performance data collectors, there are two opportunities for generating events:
• Using thresholds on DataPoints as described above
• Using the exit status from the script
If a script returns an exit status as defined by the Nagios plugin API, then events are automatically generated with a severity corresponding to the exit code:
    Script exit code of OK (0)          Zenoss event severity = Clear (0)
    Script exit code of WARNING (1)     Zenoss event severity = Warning (3)
    Script exit code of CRITICAL (2)    Zenoss event severity = Error (4)
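The mapping above can be written down as a small lookup table. This is only an illustration of the table, not Zenoss's internal code; severity_for_exit is a hypothetical helper, and the UNKNOWN (3) exit code is deliberately left unmapped because its resulting severity is not covered here.

```python
# Sketch: Nagios exit code -> (Zenoss severity name, severity number)
EXIT_TO_SEVERITY = {
    0: ('Clear', 0),     # OK
    1: ('Warning', 3),   # WARNING
    2: ('Error', 4),     # CRITICAL
}

def severity_for_exit(exit_code):
    # Returns None for exit codes that the table above does not cover
    return EXIT_TO_SEVERITY.get(exit_code)

for code in (0, 1, 2):
    print(code, severity_for_exit(code))
```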
Figure 33: Event console showing events generated by script DataSource
Note that the eventClass and the component fields of the event have been populated by the DataSource configuration. The good news event automatically clears the bad news events using Zenoss's default event correlation.
If the template thresholds are enabled then extra events are received, with their configured severities.
Figure 34: Event console showing events generated by script datasource and thresholds
Again, threshold good news events automatically clear bad news.
Figure 35: Event history showing "good news" and "bad news" events from scripts and thresholds
Threshold values are also shown on the performance graphs.
Figure 36: Performance graphs for the raddle_procs template demonstrating enabled thresholds
To better understand how zencommand runs scripts and to help debugging, modify the parameters for zencommand to increase debugging in the log file $ZENHOME/log/zencommand.log. Set:
    logseverity 10
and recycle the zencommand daemon. This configuration can be modified either in the GUI, from Settings -> Daemons, using the edit config link and the Restart button, or by editing $ZENHOME/etc/zencommand.conf directly and then restarting zencommand with zencommand restart (you will need to be the zenoss user).
Figure 37: Fragment of $ZENHOME/log/zencommand.log showing raddle_proc_check_datapoints.sh
The zencommand.log shows:
• The remote script being run by zen.SshClient, including the returned output
• zen.zencommand queueing an event, including the configured eventClass, component and with the event summary field set to the text information output (everything before the vertical bar in the script output line). The eventKey field is set to the DataSource name.
• zen.RRDUtil storing away the latest values
• zen.thresholds and zen.MinMaxCheck checking the latest values against the configured thresholds
    cd /usr/local/zenoss/common/libexec
    ./check_procs -w 1:4 -c 1:10 -C sshd
Next ensure that the zProperties for this device are set up in the Zenoss GUI to permit ssh communications between the Zenoss manager and the remote device. This is exactly the same as described in Figure 24 above for running standalone ssh commands.
To utilise information from the Nagios plugin, set up a Zenoss performance data collection template in the same way as described above.
Figure 38: Performance data collection template using ssh to run remote Nagios check_procs plugin
Note that in this case, the full path to the plugin is supplied. It is checking for exactly 3 occurrences of a short process name, vmware-vmx. The component field is set to nagios_check_procs and a new event class of /Skills/nagios/check_procs has been created for use with this template.
The advantage of using Nagios plugins is that there are lots available in the community. The disadvantage is that many of them do not provide performance data values, simply a status and informational text. This means that creating DataPoints in Zenoss from which to create thresholds and graphs is not useful; although DataPoints can be specified, they have to exactly match the label of the data delivered by the plugin (which doesn't exist), so any graphs based on such DataPoints will have no data.
This doesn't mean that the Nagios check_procs plugin is necessarily useless. The plugin can specify warning and critical ranges for metrics (such as number of instances of a process, memory used, percentage CPU used) and delivers an exit status from the script which will drive Zenoss events.
Figure 39: Event console with warning event generated by Nagios check_procs plugin
As discussed with standalone events, the Nagios plugin good news status will deliver a Zenoss event with Cleared status; thus Nagios-driven good news events will automatically close their corresponding bad news events.
Figure 40: Event history console with "good news" and "bad news" events generated by Nagios plugin
This shows that a parameter is required to describe the process(es) to be monitored. This parameter will match any process that includes that string, so processes can be specified as fully qualified pathnames or short commands (try using zenplugin.py process k on a system that uses kde; it reports the totals of resources of all processes that include the letter k).
Figure 41: Invocations of zenplugin.py process with different process matching parameters
There appears to be no way to count instances of a process. If there are multiple processes that match the description, then the cpu and memory values are summed for all matching processes.
The plugin script shows that raw data is gathered by reading the stat file for the process in /proc/<process id>. The cpu figure is derived by adding the user and system values and is reported as the number of jiffies (1/100 second) for which this process has been scheduled. The memory figure takes the resident set size of the process (plus 3 for administrative purposes), and multiplies by page size to produce a memory figure in bytes.
Figure 42: Zenoss plugin linux2.py showing process collection code
Zenoss plugins can be used in exactly the same way as standalone scripts or Nagios plugins. Performance data collector templates can be created that call zenplugin.py on a remote system, using the ssh zProperties configured for a device.
Figure 43: Performance data collection template using Zenoss process plugin
In Figure 43 a new component value has been created, zenplugin_process, and a new event class is referenced (/Skills/zenplugin/process). Note that the Command Template field specifies a short name for zenplugin.py; this assumes that any device that has the template bound will have the zCommandPath zProperty set to /usr/local/bin.
The names of the DataPoints exactly match the label names of the cpu and mem output of the Zenoss plugin. Note that the cpu DataPoint has the COUNTER type; since cpu is the number of jiffies that the process has been scheduled, it will always be an increasing number, whereas mem can go up and down so the GAUGE type is more appropriate for mem. The COUNTER data type means that any graphs using it will automatically display rate of change, rather than the absolute value, which is simply a large number that gradually increases.
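The difference between the two data types can be illustrated with a simplified rate calculation. This is a sketch of the idea only, not the RRD implementation (which also deals with counter wrap-around); counter_rate is a hypothetical function and the sample values are invented.

```python
# Sketch: what a COUNTER graph plots is the per-second rate of change
# between two successive samples, not the raw value.
def counter_rate(prev_value, prev_time, value, time):
    elapsed = time - prev_time
    if elapsed <= 0 or value < prev_value:
        # No time has passed, or the counter went backwards (for example
        # the process restarted); treat the sample as unknown
        return None
    return (value - prev_value) / float(elapsed)

# Two cpu samples taken 60 seconds apart: 1000 jiffies, then 1600 jiffies
rate = counter_rate(1000, 0, 1600, 60)
print(rate)   # jiffies per second

# A GAUGE such as mem is plotted as-is: the latest sample is the value
```

With jiffies of 1/100 second, a rate of 10 jiffies per second corresponds to roughly 10% of one CPU.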
Figure 44: Performance graphs and thresholds for data gathered by the Zenoss process plugin
Zenoss plugins provide different benefits to the Nagios plugins. You cannot count instances of a process but, if you want the total cpu and memory resource used by the total number of invocations of a particular process, then the Zenoss process plugin matches that paradigm nicely. The other advantage of Zenoss plugins is that they not only deliver output in Nagios API format, but they also tend to deliver performance data in addition to the status and information text; hence they are more amenable to being used directly to supply data for graphs and thresholds (indeed, all the standard templates for /Server/Cmd devices use Zenoss plugins).
The negative side is that there is no way within the Zenoss process plugin to set acceptable thresholds for cpu and memory, so the exit status is always OK unless the plugin itself had problems retrieving data. This means that if events are required on thresholds based on the Zenoss plugin data, then thresholds must be set up within the Zenoss performance data collector template; there are no automatic events.
Figure 45: Threshold on memory for Zenoss process plugin DataPoint
Note that the threshold shown above demonstrates the use of the Escalate Count field. When the third similar event arrives, the severity will be escalated from the configured Warning to the next level, Error.
Figure 46: Event console showing /Skills/zenplugin/process threshold event escalated from Warning to Error
Events are generated by Zenoss when the threshold is exceeded and, as with all the other techniques already discussed, good news thresholds will automatically close bad news threshold events.
To summarise, the Zenoss plugins are better performance data collectors and the Nagios plugins more easily deliver threshold events.
6 Conclusions
A number of different process monitoring techniques have been discussed, each having its own merits. If devices cannot be monitored using SNMP, perhaps because of firewall limitations, then ssh provides access for standalone commands, Nagios plugins and Zenoss plugins. The choice between these three depends on what aspects of process monitoring are required.
Standalone scripts are the most flexible but you have to develop, test, maintain and deliver them.
Many Nagios plugins are available in the community but the standard check_procs offering does not deliver performance data and there is still the task of delivering the Nagios plugin to the remote system. check_procs does provide a flexible way for defining a healthy process and can automatically generate events based on this health.
Zenoss plugins also need installing remotely and add the prerequisite of a Python environment, but the Zenoss process plugin is good for delivering cpu and memory performance data for the combined instances of a given process. If events are required, they need to be configured through thresholds on performance data collection templates.
One of the advantages of using performance data collection templates, driven by zencommand, is that you control the data collection interval at the DataSource level. If performance data is collected using SNMP, there is a single polling interval (default 5 mins) for all data collected by the zenperfsnmp daemon.
SNMP is the simple, default method of discovering and monitoring processes and is used by Zenoss's zenmodeler and zenprocess daemons, relying on the Host Resources MIB. The zenprocess daemon has the advantage of very low administrator setup time, as performance information is automatically gathered for monitored processes and events are automatically generated if a process is no longer detected. Provided targets support SNMP and Host Resources, there is no agent setup beyond basic configuration of the SNMP agent. The negative aspect of using the built-in Zenoss methods to configure, discover and monitor processes is that they are still a little quirky and do not always deliver the results expected.
For environments where SNMP agent configuration skills exist, the net-snmp agent can be configured well beyond the ability of the Host Resources MIB by using the UCD-SNMP-MIB process monitoring table. Events can be generated by incorporating the DisMan Event MIB and automatic recovery actions can also be enabled at the agent. For time-critical process monitoring, this should be the most responsive solution, as monitoring and action can both be taken at the monitored device; there is no polling interval between Zenoss manager and managed device before an event is received. The negative side of extensive agent configuration is that it really only provides event information; there is no performance data provided by this solution.
In practice, some organisations may deploy combinations of all these process monitoring techniques in order to satisfy their requirements.
References
1. net-snmp SNMP agent from http://www.net-snmp.org/
2. Host Resources MIB, RFC 2790 (obsoletes RFC 1514), http://www.ietf.org/rfc/rfc2790.txt and http://www.ietf.org/rfc/rfc1514.txt
3. UCD-SNMP-MIB, http://www.net-snmp.org/docs/mibs/UCD-SNMP-MIB.txt
4. DisMan Event MIB, RFC 2981, http://www.ietf.org/rfc/rfc2981.txt
5. Nagios plugin API, http://nagiosplug.sourceforge.net/developer-guidelines.html#PLUGOUTPUT
6. Zenoss FAQ, http://www.zenoss.com/community/docs/faqs/faqenglish/
7. Zenoss HowTo for Zenoss plugins, http://www.zenoss.com/community/docs/howtos/zenossplugins
8. Zenoss download site, http://www.zenoss.com/download/links?creg=no
9. Zenoss Event Management, by Jane Curry, http://www.zenoss.com/Members/jcurry/zenoss_event_management_paper.pdf/view
10. Learning Python by Mark Lutz, published by O'Reilly
11. Zenoss Administration Guide, http://www.zenoss.com/community/docs
Acknowledgements