
10/5/2016

AWK Language Programming - A Library of awk Functions


A Library of awk Functions
This chapter presents a library of useful awk functions. The sample programs presented later (see section Practical awk Programs) use these functions. The functions are presented here in a progression from simple to complex.
Section Extracting Programs from Texinfo Source Files presents a program that you can use to extract the source code for these example library functions and programs from the Texinfo source for this book. (This has already been done as part of the gawk distribution.)
If you have written one or more useful, general-purpose awk functions, and would like to contribute them for a subsequent edition of this book, please contact the author. See section Reporting Problems and Bugs, for information on doing this. Don't just send code, as you will be required to either place your code in the public domain, publish it under the GPL (see section GNU GENERAL PUBLIC LICENSE), or assign the copyright in it to the Free Software Foundation.
Portability Notes: What to do if you don't have gawk.
Nextfile Function: Two implementations of a nextfile function.
Assert Function: A function for assertions in awk programs.
Ordinal Functions: Functions for using characters as numbers and vice versa.
Join Function: A function to join an array into a string.
Mktime Function: A function to turn a date into a timestamp.
Gettimeofday Function: A function to get formatted times.
Filetrans Function: A function for handling data file transitions.
Getopt Function: A function for processing command line arguments.
Passwd Functions: Functions for getting user information.
Group Functions: Functions for getting group information.
Library Names: How to best name private global variables in library functions.

Simulating gawk-specific Features
The programs in this chapter and in section Practical awk Programs freely use features that are specific to gawk. This section briefly discusses how you can rewrite these programs for different implementations of awk.
Diagnostic error messages are sent to `/dev/stderr'. Use `| "cat 1>&2"' instead of `> "/dev/stderr"' if your system does not have a `/dev/stderr', or if you cannot use gawk.
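As a minimal sketch of the `| "cat 1>&2"' technique, the following POSIX shell function (the name warn_demo is hypothetical) runs a small awk program that sends each diagnostic through a pipe to cat, whose standard output is redirected to the standard error it inherited from awk. It assumes only a POSIX shell, awk and cat.

```shell
# warn_demo: hypothetical helper; reads lines on standard input, writes a
# warning for each one to standard error and the data to standard output.
warn_demo() {
    awk '{
        # cat inherits the standard error of awk; 1>&2 sends its output there
        print "warning: " $0 | "cat 1>&2"
        print "data: " $0
    }'
}

printf 'x\n' | warn_demo 2>/dev/null    # prints only "data: x"
```

Because the redirection happens inside the command string given to the shell, the same awk text works whether or not the system provides a `/dev/stderr' file.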
A number of programs use nextfile (see section The nextfile Statement) to skip any remaining input in the input file. Section Implementing nextfile as a Function shows you how to write a function that will do the same thing.
Finally, some of the programs choose to ignore upper-case and lower-case distinctions in their input. They do this by assigning one to IGNORECASE. You can achieve the same effect by adding the following rule to the beginning of the program:
http://www.math.utah.edu/docs/info/gawk_16.html

# ignore case
{ $0 = tolower($0) }

Also, verify that all regexp and string constants used in comparisons use only lower-case letters.
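Here is a small sketch, run from a POSIX shell, of the tolower rule standing in for IGNORECASE. The helper name count_errors and the sample input are hypothetical; the point is that the rule comes first, so every later rule sees the lower-cased record, and that the regexp constant is written in lower case.

```shell
# count_errors: hypothetical helper that counts input lines mentioning
# "error" in any mix of upper and lower case, without using IGNORECASE.
count_errors() {
    awk '
    # ignore case: this rule comes first, so later rules see the
    # lower-cased record
    { $0 = tolower($0) }

    # constants used in comparisons are written in lower case
    /error/ { n++ }

    END { print n + 0 }'
}

printf 'Error one\nfine\nERROR two\n' | count_errors    # prints 2
```

Assigning to $0 also re-splits the record, so the fields $1, $2 and so on are lower-cased as well.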

Implementing nextfile as a Function
The nextfile statement presented in section The nextfile Statement is a gawk-specific extension. It is not available in other implementations of awk. This section shows two versions of a nextfile function that you can use to simulate gawk's nextfile statement if you cannot use gawk.
Here is a first attempt at writing a nextfile function.
# nextfile -- skip remaining records in current file
# this should be read in before the "main" awk program
function nextfile()   { _abandon_ = FILENAME; next }
_abandon_ == FILENAME { next }

This file should be included before the main program, because it supplies a rule that must be executed first. This rule compares the current data file's name (which is always in the FILENAME variable) to a private variable named _abandon_. If the file name matches, then the action part of the rule executes a next statement, to go on to the next record. (The use of `_' in the variable name is a convention. It is discussed more fully in section Naming Library Function Global Variables.)
The use of the next statement effectively creates a loop that reads all the records from the current data file. Eventually, the end of the file is reached, and a new data file is opened, changing the value of FILENAME. Once this happens, the comparison of _abandon_ to FILENAME fails, and execution continues with the first rule of the "real" program.
The nextfile function itself simply sets the value of _abandon_ and then executes a next statement to start the loop going. (16)
This initial version has a subtle problem. What happens if the same data file is listed twice on the command line, one right after the other, or even with just a variable assignment between the two occurrences of the file name?
In such a case, this code will skip right through the file a second time, even though it should stop when it gets to the end of the first occurrence. Here is a second version of nextfile that remedies this problem.
# nextfile -- skip remaining records in current file
# correctly handle successive occurrences of the same file
# Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
# May, 1993
# this should be read in before the "main" awk program
function nextfile()   { _abandon_ = FILENAME; next }
_abandon_ == FILENAME {
    if (FNR == 1)
        _abandon_ = ""
    else
        next
}

The nextfile function has not changed. It sets _abandon_ equal to the current file name and then executes a next statement. The next statement reads the next record and increments FNR, so FNR is guaranteed to have a value of at least two. However, if nextfile is called for the last record in the file, then awk will close the current data file and move on to the next one. Upon doing so, FILENAME will be set to the name of the new file, and FNR will be reset to one. If this next file is the same as the previous one, _abandon_ will still be equal to FILENAME. However, FNR will be equal to one, telling us that this is a new occurrence of the file, and not the one we were reading when the nextfile function was executed. In that case, _abandon_ is reset to the empty string, so that further executions of this rule will fail (until the next time that nextfile is called).
If FNR is not one, then we are still in the original data file, and the program executes a next statement to skip through it.
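The difference between the two versions can be checked with a short shell sketch. The helper names first_lines and first_lines_v1 are hypothetical. The body of nextfile is written inline in each rule rather than as a function, on the assumption that newer awks reserve the name nextfile (gawk treats it as a keyword) and to avoid depending on next being allowed inside a function. Each helper prints the first record of every input file and skips the rest.

```shell
# first_lines FILE...: print the first record of each input file, using the
# second (fixed) version of the _abandon_ guard.
first_lines() {
    awk '
    _abandon_ == FILENAME {
        if (FNR == 1)
            _abandon_ = ""      # a new occurrence of the same file began
        else
            next
    }

    FNR == 1 {
        print $0
        _abandon_ = FILENAME    # the body of nextfile(), written inline
        next
    }' "$@"
}

# first_lines_v1 FILE...: the same, using the first version of the guard.
first_lines_v1() {
    awk '
    _abandon_ == FILENAME { next }

    FNR == 1 {
        print $0
        _abandon_ = FILENAME
        next
    }' "$@"
}

tmp=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmp"
first_lines "$tmp" "$tmp"       # prints "one" twice
first_lines_v1 "$tmp" "$tmp"    # prints "one" once
rm -f "$tmp"
```

With the file named twice, the first version's guard also swallows the second occurrence, which is exactly the problem the FNR == 1 test fixes.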
An important question to ask at this point is: "Given that the functionality of nextfile can be provided with a library file, why is it built into gawk?" The question matters because adding features for little reason leads to larger, slower programs that are harder to maintain.
The answer is that building nextfile into gawk provides significant gains in efficiency. If the nextfile function is executed at the beginning of a large data file, awk still has to scan the entire file, splitting it up into records, just to skip over it. The built-in nextfile can simply close the file immediately and proceed to the next one, saving a lot of time. This is particularly important in awk, since awk programs are generally I/O-bound (i.e. they spend most of their time doing input and output, instead of performing computations).

Assertions
When writing large programs, it is often useful to be able to know that a condition or set of conditions is true. Before proceeding with a particular computation, you make a statement about what you believe to be the case. Such a statement is known as an "assertion." The C language provides an <assert.h> header file and corresponding assert macro that the programmer can use to make assertions. If an assertion fails, the assert macro arranges to print a diagnostic message describing the condition that should have been true but was not, and then it kills the program. In C, using assert looks like this:
#include <assert.h>
int myfunc(int a, double b)
{
     assert(a <= 5 && b >= 17);
     ...
}

If the assertion failed, the program would print a message similar to this:
prog.c:5: assertion failed: a <= 5 && b >= 17

The ANSI C language makes it possible to turn the condition into a string for use in printing the diagnostic message. This is not possible in awk, so this assert function also requires a string version of the condition that is being tested.
# assert -- assert that a condition is true. Otherwise exit.
# Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
# May, 1993
function assert(condition, string)
{
    if (! condition) {
        printf("%s:%d: assertion failed: %s\n",
            FILENAME, FNR, string) > "/dev/stderr"
        _assert_exit = 1
        exit 1
    }
}
END \
{
    if (_assert_exit)
        exit 1
}

The assert function tests the condition parameter. If it is false, it prints a message to standard error, using the string parameter to describe the failed condition. It then sets the variable _assert_exit to one, and executes the exit statement. The exit statement jumps to the END rule. If the END rule finds _assert_exit to be true, then it exits immediately.
The purpose of the END rule with its test is to keep any other END rules from running. When an assertion fails, the program should exit immediately. If no assertions fail, then _assert_exit will still be false when the END rule is run normally, and the rest of the program's END rules will execute. For all of this to work correctly, `assert.awk' must be the first source file read by awk.
You would use this function in your programs this way:
function myfunc(a, b)
{
     assert(a <= 5 && b >= 17, "a <= 5 && b >= 17")
     ...
}

If the assertion failed, you would see a message like this:
mydata:1357: assertion failed: a <= 5 && b >= 17

There is a problem with this version of assert that may not be possible to work around. An END rule is automatically added to the program calling assert. Normally, if a program consists of just a BEGIN rule, the input files and/or standard input are not read. However, now that the program has an END rule, awk will attempt to read the input data files, or standard input (see section Startup and Cleanup Actions), most likely causing the program to hang, waiting for input.
Just a note on programming style. You may have noticed that the END rule uses backslash continuation, with the open brace on a line by itself. This is so that it more closely resembles the way functions are written. Many of the examples in this chapter and the next one use this style. You can decide for yourself if you like writing your BEGIN and END rules this way, or not.
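One way to try assert from a POSIX shell is to place the library text ahead of a small main program in a single awk invocation. The wrapper name assert_demo and its two numeric arguments are hypothetical. One line is supplied on standard input, since, as noted above, the added END rule makes awk read input and it would otherwise wait on the terminal. The library's END rule in this sketch uses the backslash-continuation style just described.

```shell
# assert_demo A B: hypothetical wrapper that puts the assert library first
# and then checks a <= 5 && b >= 17 for each input record.
assert_demo() {
    awk -v a="$1" -v b="$2" '
    function assert(condition, string)
    {
        if (! condition) {
            printf("%s:%d: assertion failed: %s\n",
                FILENAME, FNR, string) > "/dev/stderr"
            _assert_exit = 1
            exit 1
        }
    }

    END \
    {
        if (_assert_exit)
            exit 1
    }

    # "main" program; adding 0 forces numeric comparisons
    {
        assert(a + 0 <= 5 && b + 0 >= 17, "a <= 5 && b >= 17")
        print "ok"
    }

    END { print "main END rule ran" }'
}

echo x | assert_demo 3 20                       # prints "ok", then "main END rule ran"
echo x | assert_demo 9 20 || echo "exit status $?"    # diagnostic on stderr, then "exit status 1"
```

When the assertion fails, the diagnostic goes to standard error, the exit status is 1, and the main program's END rule never runs, because the library's END rule exits first.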

Translating Between Characters and Numbers
One commercial implementation of awk supplies a built-in function, ord, which takes a character and
returns the numeric value for that character in the machine's character set. If the string passed to ord has more than one character, only the first one is used.
The inverse of this function is chr (from the function of the same name in Pascal), which takes a number and returns the corresponding character.
Both functions can be written very nicely in awk; there is no real reason to build them into the awk interpreter.
# ord.awk -- do ord and chr
#
# Global identifiers:
#    _ord_:        numerical values indexed by characters
#    _ord_init:    function to initialize _ord_
#
# Arnold Robbins
# arnold@gnu.ai.mit.edu
# Public Domain
# 16 January, 1992
# 20 July, 1992, revised
BEGIN    { _ord_init() }
function _ord_init(    low, high, i, t)
{
    low = sprintf("%c", 7) # BEL is ascii 7
    if (low == "\a") {    # regular ascii
        low = 0
        high = 127
    } else if (sprintf("%c", 128 + 7) == "\a") {
        # ascii, mark parity
        low = 128
        high = 255
    } else {        # ebcdic(!)
        low = 0
        high = 255
    }
    for (i = low; i <= high; i++) {
        t = sprintf("%c", i)
        _ord_[t] = i
    }
}

Some explanation of the numbers used by _ord_init is worthwhile. The most prominent character set in use today is ASCII. Although an eight-bit byte can hold 256 distinct values (from zero to 255), ASCII only defines characters that use the values from zero to 127. (17) At least one computer manufacturer that we know of uses ASCII, but with mark parity, meaning that the leftmost bit in the byte is always one. On those systems, characters have numeric values from 128 to 255. Finally, large mainframe systems use the EBCDIC character set, which uses all 256 values. While there are other character sets in use on some older systems, they are not really worth worrying about.
function ord(str,    c)
{
    # only first character is of interest
    c = substr(str, 1, 1)
    return _ord_[c]
}
function chr(c)
{
    # force c to be numeric by adding 0
    return sprintf("%c", c + 0)
}
#### test code ####
# BEGIN    \
# {
#    for (;;) {
#        printf("enter a character: ")
#        if (getline var <= 0)
#            break
#        printf("ord(%s) = %d\n", var, ord(var))
#    }
# }

An obvious improvement to these functions would be to move the code for the _ord_init function into the body of the BEGIN rule. It was written this way initially for ease of development.
There is a "test program" in a BEGIN rule, for testing the function. It is commented out for production use.
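The following sketch exercises ord and chr from a POSIX shell. To keep it short, it assumes a plain ASCII machine and keeps only the first branch of _ord_init (values 0 through 127); the wrapper name ord_chr_demo is hypothetical.

```shell
# ord_chr_demo: hypothetical wrapper running ord and chr on a few values,
# assuming plain ASCII (only the first branch of _ord_init is kept).
ord_chr_demo() {
    awk '
    BEGIN { _ord_init() }

    function _ord_init(    i, t)
    {
        for (i = 0; i <= 127; i++) {
            t = sprintf("%c", i)
            _ord_[t] = i
        }
    }

    function ord(str,    c)
    {
        # only the first character is of interest
        c = substr(str, 1, 1)
        return _ord_[c]
    }

    function chr(c)
    {
        # force c to be numeric by adding 0
        return sprintf("%c", c + 0)
    }

    BEGIN { print ord("A"), chr(65), ord("hello"), chr(ord("z") - 25) }'
}

ord_chr_demo    # prints "65 A 104 a"
```

Since the program has only BEGIN rules, awk reads no input, so the wrapper can be called without redirecting standard input.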

Merging an Array Into a String
When doing string processing, it is often useful to be able to join all the strings in an array into one long string. The following function, join, accomplishes this task. It is used later in several of the application programs (see section Practical awk Programs).
Good function design is important; this function needs to be general, but it should also have a reasonable default behavior. It is called with an array and the beginning and ending indices of the elements in the array to be merged. This assumes that the array indices are numeric, a reasonable assumption since the array was likely created with split (see section Built-in Functions for String Manipulation).
# join.awk -- join an array into a string
# Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
# May 1993
function join(array, start, end, sep,    result, i)
{
    if (sep == "")
        sep = " "
    else if (sep == SUBSEP) # magic value
        sep = ""
    result = array[start]
    for (i = start + 1; i <= end; i++)
        result = result sep array[i]
    return result
}

An optional additional argument is the separator to use when joining the strings back together. If the caller supplies a non-empty value, join uses it. If it is not supplied, it will have a null value. In this case, join uses a single blank as a default separator for the strings. If the value is equal to SUBSEP, then join joins the strings with no separator between them. SUBSEP serves as a "magic" value to indicate that there should be no separation between the component strings.
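A short sketch of join in use, covering its three separator behaviors: the default single blank, a caller-supplied string, and the SUBSEP "magic" value. The wrapper join_demo and the sample string are hypothetical; the array comes from split, as the text suggests.

```shell
# join_demo: hypothetical wrapper that splits a string into an array and
# joins it back together with each kind of separator.
join_demo() {
    awk '
    function join(array, start, end, sep,    result, i)
    {
        if (sep == "")
            sep = " "
        else if (sep == SUBSEP)    # magic value
            sep = ""
        result = array[start]
        for (i = start + 1; i <= end; i++)
            result = result sep array[i]
        return result
    }

    BEGIN {
        n = split("usr local bin", parts, " ")
        print join(parts, 1, n)            # default: a single blank
        print join(parts, 1, n, "/")       # caller-supplied separator
        print join(parts, 1, n, SUBSEP)    # no separator at all
    }'
}

join_demo    # prints "usr local bin", "usr/local/bin", "usrlocalbin"
```

Calling join with fewer arguments than it declares is legal awk; the missing ones, including sep, start out as null values.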
It would be nice if awk had an assignment operator for concatenation. The lack of an explicit operator for concatenation makes string operations more difficult than they really need to be.

Turning Dates Into Timestamps
The systime function built into gawk returns the current time of day as a timestamp in "seconds since the Epoch." This timestamp can be converted into a printable date of almost infinitely variable format using the built-in strftime function. (For more information on systime and strftime, see section Functions for Dealing with Time Stamps.)
An interesting but difficult problem is to convert a readable representation of a date back into a timestamp. The ANSI C library provides a mktime function that does the basic job, converting a canonical representation of a date into a timestamp.
It would appear at first glance that gawk would have to supply a mktime built-in function that was simply a "hook" to the C language version. In fact though, mktime can be implemented entirely in awk.
Here is a version of mktime for awk. It takes a simple representation of the date and time, and converts it into a timestamp.
The code is presented here intermixed with explanatory prose. In section Extracting Programs from Texinfo Source Files, you will see how the Texinfo source file for this book can be processed to extract the code into a single source file.
The program begins with a descriptive comment and a BEGIN rule that initializes a table _tm_months. This table is a two-dimensional array that has the lengths of the months. The first index is zero for regular years, and one for leap years. The values are the same for all the months in both kinds of years, except for February; thus the use of multiple assignment.
# mktime.awk -- convert a canonical date representation
#                into a timestamp
# Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
# May 1993
BEGIN    \
{
    # Initialize table of month lengths
    _tm_months[0,1] = _tm_months[1,1] = 31
    _tm_months[0,2] = 28; _tm_months[1,2] = 29
    _tm_months[0,3] = _tm_months[1,3] = 31
    _tm_months[0,4] = _tm_months[1,4] = 30
    _tm_months[0,5] = _tm_months[1,5] = 31
    _tm_months[0,6] = _tm_months[1,6] = 30
    _tm_months[0,7] = _tm_months[1,7] = 31
    _tm_months[0,8] = _tm_months[1,8] = 31
    _tm_months[0,9] = _tm_months[1,9] = 30
    _tm_months[0,10] = _tm_months[1,10] = 31
    _tm_months[0,11] = _tm_months[1,11] = 30
    _tm_months[0,12] = _tm_months[1,12] = 31
}

The benefit of merging multiple BEGIN rules (see section The BEGIN and END Special Patterns) is particularly clear when writing library files. Functions in library files can cleanly initialize their own private data and also provide clean-up actions in private END rules.
The next function is a simple one that computes whether a given year is or is not a leap year. If a year is evenly divisible by four, but not evenly divisible by 100, or if it is evenly divisible by 400, then it is a leap year. Thus, 1904 was a leap year, 1900 was not, but 2000 will be.
# decide if a year is a leap year
function _tm_isleap(year,    ret)
{
    ret = (year % 4 == 0 && year % 100 != 0) ||
            (year % 400 == 0)
    return ret
}

This function is only used a few times in this file, and its computation could have been written in-line (at the point where it's used). Making it a separate function made the original development easier, and also avoids the possibility of typing errors when duplicating the code in multiple places.
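The leap-year rule can be spot-checked with a tiny shell wrapper around _tm_isleap (the name leap_demo is hypothetical). It prints each year followed by 1 or 0, since an awk comparison or logical expression evaluates to one or zero.

```shell
# leap_demo YEAR...: hypothetical wrapper printing each year and the
# result of _tm_isleap for it.
leap_demo() {
    awk -v years="$*" '
    function _tm_isleap(year,    ret)
    {
        ret = (year % 4 == 0 && year % 100 != 0) ||
                (year % 400 == 0)
        return ret
    }

    BEGIN {
        n = split(years, y, " ")
        for (i = 1; i <= n; i++)
            print y[i], _tm_isleap(y[i] + 0)
    }'
}

leap_demo 1900 1904 2000 2023    # 1900 0, 1904 1, 2000 1, 2023 0 (one per line)
```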
The next function is more interesting. It does most of the work of generating a timestamp, which is converting a date and time into some number of seconds since the Epoch. The caller passes an array (rather imaginatively named a) containing six values: the year including century, the month as a number between one and 12, the day of the month, the hour as a number between zero and 23, the minute in the hour, and the seconds within the minute.
The function uses several local variables to precompute the number of seconds in an hour, seconds in a day, and seconds in a year. Often, similar C code simply writes out the expression in-line, expecting the compiler to do constant folding. E.g., most C compilers would turn `60 * 60' into `3600' at compile time, instead of recomputing it every time at run time. Precomputing these values makes the function more efficient.
# convert a date into seconds
function _tm_addup(a,    total, yearsecs, daysecs,
                         hoursecs, i, j)
{
    hoursecs = 60 * 60
    daysecs = 24 * hoursecs
    yearsecs = 365 * daysecs
    total = (a[1] - 1970) * yearsecs
    # extra day for leap years
    for (i = 1970; i < a[1]; i++)
        if (_tm_isleap(i))
            total += daysecs
    j = _tm_isleap(a[1])
    for (i = 1; i < a[2]; i++)
        total += _tm_months[j, i] * daysecs
    total += (a[3] - 1) * daysecs
    total += a[4] * hoursecs
    total += a[5] * 60
    total += a[6]
    return total
}

The function starts with a first approximation of all the seconds between Midnight, January 1, 1970, (18)
and the beginning of the current year. It then goes through all those years, and for every leap year, adds an additional day's worth of seconds.
The variable j holds one if the current year is a leap year, and zero if it is not. For every month in the current year prior to the current month, the function adds the number of seconds in that month, using the appropriate entry in the _tm_months array.
Finally, it adds in the seconds for the number of days prior to the current day, and the number of hours, minutes, and seconds in the current day.
The result is a count of seconds since January 1, 1970. This value is not yet what is needed though. The reason why is described shortly.
The main mktime function takes a single character string argument. This string is a representation of a date and time in a "canonical" (fixed) form. This string should be "year month day hour minute second".
# mktime -- convert a date into seconds,
#           compensate for time zone
function mktime(str,    res1, res2, a, b, i, j, t, diff)
{
    i = split(str, a, " ")    # don't rely on FS
    if (i != 6)
        return -1
    # force numeric
    for (j in a)
        a[j] += 0
    # validate
    if (a[1] < 1970 ||
        a[2] < 1 || a[2] > 12 ||
        a[3] < 1 || a[3] > 31 ||
        a[4] < 0 || a[4] > 23 ||
        a[5] < 0 || a[5] > 59 ||
        a[6] < 0 || a[6] > 61 )
            return -1
    res1 = _tm_addup(a)
    t = strftime("%Y %m %d %H %M %S", res1)
    if (_tm_debug)
        printf("(%s) -> (%s)\n", str, t) > "/dev/stderr"
    split(t, b, " ")
    res2 = _tm_addup(b)
    diff = res1 - res2
    if (_tm_debug)
        printf("diff = %d seconds\n", diff) > "/dev/stderr"
    res1 += diff
    return res1
}
The function first splits the string into an array, using spaces and tabs as separators. If there are not six elements in the array, it returns an error, signaled as the value -1. Next, it forces each element of the array to be numeric, by adding zero to it. The following `if' statement then makes sure that each element is within an allowable range. (This checking could be extended further, e.g., to make sure that the day of the month is within the correct range for the particular month supplied.) All of this is essentially preliminary set-up and error checking.
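To see these preliminary steps on their own, the following sketch runs the parsing, the range checks and _tm_addup, but stops before the time zone compensation, which needs gawk's strftime. The wrapper name utc_stamp is hypothetical, and the month table is filled from a split string instead of twelve assignment lines. Its result is therefore the count of seconds since the Epoch for the given time taken as UTC, or -1 for bad input.

```shell
# utc_stamp "YEAR MONTH DAY HOUR MINUTE SECOND": hypothetical helper that
# parses, validates and adds up a canonical date as mktime does, but skips
# the time zone compensation, so the result treats the time as UTC.
utc_stamp() {
    awk -v str="$1" '
    function _tm_isleap(year)
    {
        return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0)
    }

    function _tm_addup(a,    total, yearsecs, daysecs, hoursecs, i, j)
    {
        hoursecs = 60 * 60
        daysecs = 24 * hoursecs
        yearsecs = 365 * daysecs

        total = (a[1] - 1970) * yearsecs
        # extra day for leap years
        for (i = 1970; i < a[1]; i++)
            if (_tm_isleap(i))
                total += daysecs

        j = _tm_isleap(a[1])
        for (i = 1; i < a[2]; i++)
            total += _tm_months[j, i] * daysecs

        total += (a[3] - 1) * daysecs
        total += a[4] * hoursecs
        total += a[5] * 60
        total += a[6]
        return total
    }

    BEGIN {
        # month lengths; first index 0 for regular years, 1 for leap years
        split("31 28 31 30 31 30 31 31 30 31 30 31", len, " ")
        for (m = 1; m <= 12; m++)
            _tm_months[0, m] = _tm_months[1, m] = len[m]
        _tm_months[1, 2] = 29

        if (split(str, a, " ") != 6) {
            print -1
            exit
        }
        for (j in a)
            a[j] += 0
        if (a[1] < 1970 ||
            a[2] < 1 || a[2] > 12 ||
            a[3] < 1 || a[3] > 31 ||
            a[4] < 0 || a[4] > 23 ||
            a[5] < 0 || a[5] > 59 ||
            a[6] < 0 || a[6] > 61) {
            print -1
            exit
        }
        print _tm_addup(a)
    }'
}

utc_stamp "1970 1 2 0 0 0"     # prints 86400
utc_stamp "2000 3 1 0 0 0"     # prints 951868800
utc_stamp "1993 13 1 0 0 0"    # prints -1 (month out of range)
```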
Recall that _tm_addup generated a value in seconds since Midnight, January 1, 1970. This value is not directly usable as the result we want, since the calculation does not account for the local time zone. In other words, the value represents the count in seconds since the Epoch, but only for UTC (Coordinated Universal Time). If the local time zone is east or west of UTC, then some number of hours should be either added to, or subtracted from, the resulting timestamp.
For example, 6:23 p.m. in Atlanta, Georgia (USA), is normally five hours west of (behind) UTC. It is only four hours behind UTC if daylight saving time is in effect. If you are calling mktime in Atlanta, with the argument "1993 5 23 18 23 12", the result from _tm_addup will be for 6:23 p.m. UTC, which is only 2:23 p.m. in Atlanta. It is necessary to add another four hours' worth of seconds to the result.
How can mktime determine how far away it is from UTC? This is surprisingly easy. The returned timestamp represents the time passed to mktime as UTC. This timestamp can be fed back to strftime, which will format it as a local time; i.e. as if it already had the UTC difference added into it. This is done by giving "%Y %m %d %H %M %S" to strftime as the format argument. It returns the computed timestamp in the original string format. The result represents a time that accounts for the UTC difference. When the new time is converted back to a timestamp, the difference between the two timestamps is the difference (in seconds) between the local time zone and UTC. This difference is then added back to the original result. An example demonstrating this is presented below.
Finally, there is a "main" program for testing the function.
BEGIN {
    if (_tm_test) {
        printf "Enter date as yyyy mm dd hh mm ss: "
        getline _tm_test_date

        t = mktime(_tm_test_date)
        r = strftime("%Y %m %d %H %M %S", t)
        printf "Got back (%s)\n", r
    }
}

The entire program uses two variables that can be set on the command line to control debugging output and to enable the test in the final BEGIN rule. Here is the result of a test run. (Note that debugging output is to standard error, and test output is to standard output.)
$ gawk -f mktime.awk -v _tm_test=1 -v _tm_debug=1
-| Enter date as yyyy mm dd hh mm ss: 1993 5 23 15 35 10
error--> (1993 5 23 15 35 10) -> (1993 05 23 11 35 10)
error--> diff = 14400 seconds
-| Got back (1993 05 23 15 35 10)

The time entered was 3:35 p.m. (15:35 on a 24-hour clock), on May 23, 1993. The first line of debugging output shows the timestamp computed by _tm_addup, which treats the entered time as UTC, formatted back as local time: 11:35, four hours behind the entered time. The second line shows that the difference is 14400 seconds, which is four hours. (The difference is only four hours, since daylight saving time is in effect during May.) The final line of test output shows that the time zone
compensation algorithm works; the returned time is the same as the entered time.
This program does not solve the general problem of turning an arbitrary date representation into a timestamp. That problem is very involved. However, the mktime function provides a foundation upon which to build. Other software can convert month names into numeric months, and AM/PM times into 24-hour clocks, to generate the "canonical" format that mktime requires.

Managing the Time of Day
The systime and strftime functions described in section Functions for Dealing with Time Stamps provide the minimum functionality necessary for dealing with the time of day in human-readable form. While strftime is extensive, the control formats are not necessarily easy to remember or intuitively obvious when reading a program.
The following function, gettimeofday, populates a user-supplied array with pre-formatted time information. It returns a string with the current time formatted in the same way as the date utility.
# gettimeofday -- get the time of day in a usable format
# Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain, May 1993
#
# Returns a string in the format of output of date(1)
# Populates the array argument time with individual values:
#    time["second"]       -- seconds (0 - 59)
#    time["minute"]       -- minutes (0 - 59)
#    time["hour"]         -- hours (0 - 23)
#    time["althour"]      -- hours (1 - 12)
#    time["monthday"]     -- day of month (1 - 31)
#    time["month"]        -- month of year (1 - 12)
#    time["monthname"]    -- name of the month
#    time["shortmonth"]   -- short name of the month
#    time["year"]         -- year within century (0 - 99)
#    time["fullyear"]     -- year with century (19xx or 20xx)
#    time["weekday"]      -- day of week (Sunday = 0)
#    time["altweekday"]   -- day of week (Monday = 1, Sunday = 7)
#    time["weeknum"]      -- week number, Sunday first day
#    time["altweeknum"]   -- week number, Monday first day
#    time["dayname"]      -- name of weekday
#    time["shortdayname"] -- short name of weekday
#    time["yearday"]      -- day of year (1 - 366)
#    time["timezone"]     -- abbreviation of timezone name
#    time["ampm"]         -- AM or PM designation
function gettimeofday(time,    ret, now, i)
{
    # get time once, avoids unnecessary system calls
    now = systime()
    # return date(1)-style output
    ret = strftime("%a %b %d %H:%M:%S %Z %Y", now)
    # clear out target array
    for (i in time)
        delete time[i]
    # fill in values, force numeric values to be
    # numeric by adding 0
    time["second"]       = strftime("%S", now) + 0
    time["minute"]       = strftime("%M", now) + 0
    time["hour"]         = strftime("%H", now) + 0
    time["althour"]      = strftime("%I", now) + 0
    time["monthday"]     = strftime("%d", now) + 0
    time["month"]        = strftime("%m", now) + 0
    time["monthname"]    = strftime("%B", now)
    time["shortmonth"]   = strftime("%b", now)
    time["year"]         = strftime("%y", now) + 0
    time["fullyear"]     = strftime("%Y", now) + 0
    time["weekday"]      = strftime("%w", now) + 0
    time["altweekday"]   = strftime("%u", now) + 0
    time["dayname"]      = strftime("%A", now)
    time["shortdayname"] = strftime("%a", now)
    time["yearday"]      = strftime("%j", now) + 0
    time["timezone"]     = strftime("%Z", now)
    time["ampm"]         = strftime("%p", now)
    time["weeknum"]      = strftime("%U", now) + 0
    time["altweeknum"]   = strftime("%W", now) + 0
    return ret
}

The string indices are easier to use and read than the various formats required by strftime. The alarm program presented in section An Alarm Clock Program uses this function.
The gettimeofday function is presented above as it was written. A more general design for this function would have allowed the user to supply an optional timestamp value that would have been used instead of the current time.

Noting Data File Boundaries
The BEGIN and END rules are each executed exactly once, at the beginning and end respectively of your awk program (see section The BEGIN and END Special Patterns). We (the gawk authors) once had a user who mistakenly thought that the BEGIN rule was executed at the beginning of each data file and the END rule was executed at the end of each data file. When informed that this was not the case, the user requested that we add new special patterns to gawk, named BEGIN_FILE and END_FILE, that would have the desired behavior. He even supplied us the code to do so.
However, after a little thought, I came up with the following library program. It arranges to call two user-supplied functions, beginfile and endfile, at the beginning and end of each data file. Besides solving the problem in only nine(!) lines of code, it does so portably; this will work with any implementation of awk.
# transfile.awk
#
# Give the user a hook for filename transitions
#
# The user must supply functions beginfile() and endfile()
# that each take the name of the file being started or
# finished, respectively.
#
# Arnold Robbins, arnold@gnu.ai.mit.edu, January 1992
# Public Domain
FILENAME != _oldfilename \
{
    if (_oldfilename != "")
        endfile(_oldfilename)
    _oldfilename = FILENAME
    beginfile(FILENAME)
}
END   { endfile(FILENAME) }

This file must be loaded before the user's "main" program, so that the rule it supplies will be executed first.
This rule relies on awk's FILENAME variable that automatically changes for each new data file. The current file name is saved in a private variable, _oldfilename. If FILENAME does not equal _oldfilename, then a new data file is being processed, and it is necessary to call endfile for the old file. Since endfile should only be called if a file has been processed, the program first checks to make sure that _oldfilename is not the null string. The program then assigns the current file name to _oldfilename, and calls beginfile for the file. Since, like all awk variables, _oldfilename will be initialized to the null string, this rule executes correctly even for the first data file.
The program also supplies an END rule, to do the final processing for the last file. Since this END rule comes before any END rules supplied in the "main" program, endfile will be called first. Once again the value of multiple BEGIN and END rules should be clear.
This version has the same problem as the first version of nextfile (see section Implementing nextfile as a Function). If the same data file occurs twice in a row on the command line, then endfile and beginfile will not be executed at the end of the first pass and at the beginning of the second pass. The following version solves the problem.
# ftrans.awk -- handle data file transitions
#
# user supplies beginfile() and endfile() functions
#
# Arnold Robbins, arnold@gnu.ai.mit.edu. November 1992
# Public Domain
FNR == 1 {
    if (_filename_ != "")
        endfile(_filename_)
    _filename_ = FILENAME
    beginfile(FILENAME)
}
END  { endfile(_filename_) }

In section Counting Things, you will see how this library function can be used, and how it simplifies writing the main program.
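The following shell sketch loads the ftrans.awk rule ahead of a hypothetical main program whose beginfile and endfile functions just log each transition together with a per-file record count. The wrapper name file_events is also hypothetical. Naming the same file twice in a row shows that the FNR == 1 test catches the second occurrence.

```shell
# file_events FILE...: run the ftrans.awk rule ahead of a hypothetical
# main program that logs file transitions and counts records per file.
file_events() {
    awk '
    # ftrans.awk library rule; it must come before the main program
    FNR == 1 {
        if (_filename_ != "")
            endfile(_filename_)
        _filename_ = FILENAME
        beginfile(FILENAME)
    }
    END { endfile(_filename_) }

    # main program (hypothetical)
    function beginfile(f) { n = 0; print "begin" }
    function endfile(f)   { print "end", n }
    { n++ }' "$@"
}

tmp=$(mktemp)
printf 'a\nb\n' > "$tmp"
file_events "$tmp" "$tmp"    # begin, end 2, begin, end 2 (one per line)
rm -f "$tmp"
```

One caveat follows from the FNR == 1 test: a data file with no records never produces a record with FNR equal to one, so it gets no beginfile or endfile call.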

Processing Command Line Options
Most utilities on POSIX-compatible systems take options or "switches" on the command line that can be used to change the way a program behaves. awk is an example of such a program (see section Command Line Options). Often, options take arguments, data that the program needs to correctly obey the command line option. For example, awk's `-F' option requires a string to use as the field separator. The first occurrence on the command line of either `--' or a string that does not begin with `-' ends the options.
Most Unix systems provide a C function named getopt for processing command line arguments. The programmer provides a string describing the one-letter options. If an option requires an argument, it is followed in the string with a colon. getopt is also passed the count and values of the command line arguments, and is called in a loop. getopt processes the command line arguments for option letters. Each time around the loop, it returns a single character representing the next option letter that it found, or `?' if it found an invalid option. When it returns -1, there are no options left on the command line.
When using getopt, options that do not take arguments can be grouped together. Furthermore, options that take arguments require that the argument be present. The argument can immediately follow the option letter, or it can be a separate command line argument.
Given a hypothetical program that takes three command line options, `-a', `-b', and `-c', and `-b' requires an argument, all of the following are valid ways of invoking the program:
prog -a -b foo -c data1 data2 data3
prog -ac -bfoo -- data1 data2 data3
prog -acbfoo data1 data2 data3

Notice that when the argument is grouped with its option, the rest of the command line argument is considered to be the option's argument. In the above example, `-acbfoo' indicates that all of the `-a', `-b', and `-c' options were supplied, and that `foo' is the argument to the `-b' option.
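The grouping rules above are essentially the ones followed by the POSIX shell's getopts builtin, so they can be tried without writing any C. Note that getopts is a shell builtin modeled on getopt, not the C function itself. In this sketch the function name parse_demo is hypothetical, and the options -a, -b (with an argument) and -c are the ones from the example.

```shell
# parse_demo ARGS...: hypothetical helper that parses -a, -b ARG and -c
# with the POSIX getopts builtin and reports the options and operands.
parse_demo() {
    OPTIND=1                   # start scanning at the first argument
    found=""
    while getopts "ab:c" opt; do
        case $opt in
            a|c) found="$found $opt" ;;
            b)   found="$found b=$OPTARG" ;;
            *)   found="$found ?" ;;    # invalid option or missing argument
        esac
    done
    shift $((OPTIND - 1))      # drop the options; the operands remain
    echo "${found# } | $*"
}

parse_demo -a -b foo -c data1 data2 data3    # a b=foo c | data1 data2 data3
parse_demo -ac -bfoo -- data1 data2 data3    # a c b=foo | data1 data2 data3
parse_demo -acbfoo data1 data2 data3         # a c b=foo | data1 data2 data3
```

In all three invocations `foo' ends up as the argument of -b and the operands are the same; only the order in which the options were seen differs.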
getopt provides four external variables that the programmer can use.
optind

The index in the argument value array (argv) where the first non-option command line argument can be found.

optarg

The string value of the argument to an option.

opterr

Usually getopt prints an error message when it finds an invalid option. Setting opterr to zero disables this feature. (An application might wish to print its own error message.)

optopt

The letter representing the command line option. While not usually documented, most versions supply this variable.

The following C fragment shows how getopt might process command line arguments for awk.
int
main(int argc, char *argv[])
{
    ...
    /* print our own message */
    opterr = 0;
    while ((c = getopt(argc, argv, "v:f:F:W:")) != -1) {
        switch (c) {
        case 'f':    /* file */
            ...
            break;
        case 'F':    /* field separator */
            ...
            break;
        case 'v':    /* variable assignment */
            ...
            break;
        case 'W':    /* extension */
            ...
            break;
        case '?':
        default:
            usage();
            break;
        }
    }
    ...
}

As a side point, gawk actually uses the GNU getopt_long function to process both normal and GNU-style long options (see section Command Line Options).
The abstraction provided by getopt is very useful, and would be quite handy in awk programs as well. Here is an awk version of getopt. This function highlights one of the greatest weaknesses in awk, which is that it is very poor at manipulating single characters. Repeated calls to substr are necessary for accessing individual characters (see section Built-in Functions for String Manipulation).
The discussion walks through the code a bit at a time.
# getopt -- do C library getopt(3) function in awk
#
# arnold@gnu.ai.mit.edu
# Public domain
#
# Initial version: March, 1991
# Revised: May, 1993
# External variables:
#    Optind -- index of ARGV for first non-option argument
#    Optarg -- string value of argument to current option
#    Opterr -- if non-zero, print our own diagnostic
#    Optopt -- current option letter
# Returns
#    -1     at end of options
#    ?      for unrecognized option
#    <c>    a character representing the current option
# Private Data
#    _opti  index in multi-flag option, e.g., -abc

The function starts out with some documentation: who wrote the code, and when it was revised, followed by a list of the global variables it uses, what the return values are and what they mean, and any global variables that are "private" to this library function. Such documentation is essential for any program, and particularly for library functions.
function getopt(argc, argv, options,    optl, thisopt, i)
{
    optl = length(options)
    if (optl == 0)        # no options given
        return -1
    if (argv[Optind] == "--") {  # all done
        Optind++
        _opti = 0
        return -1
    } else if (argv[Optind] !~ /^-[^: \t\n\f\r\v\b]/) {
        _opti = 0
        return -1
    }

The function first checks that it was indeed called with a string of options (the options parameter). If options has a zero length, getopt immediately returns -1.
The next thing to check for is the end of the options. A `--' ends the command line options, as does any command line argument that does not begin with a `-'. Optind is used to step through the array of command line arguments; it retains its value across calls to getopt, since it is a global variable.
Theregexpused,/^[^:\t\n\f\r\v\b]/,isperhapsabitofoverkillitchecksfora`'followedby
anythingthatisnotwhitespaceandnotacolon.Ifthecurrentcommandlineargumentdoesnotmatch
thispattern,itisnotanoption,anditendsoptionprocessing.
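Because awk has no character type, the test above has to be written as a regexp. In C, the language of this chapter's pwcat and grcat programs, the same check comes down to a couple of character comparisons. The sketch below is illustrative only; `is_option_arg` is a hypothetical name, not part of getopt.awk. Note that `--' also satisfies the pattern, which is why the awk code tests for `--' before applying the regexp.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if arg matches the awk regexp
 * /^-[^: \t\n\f\r\v\b]/, that is, a '-' followed by a character that is
 * neither a colon nor whitespace; otherwise returns 0.
 * "--" also matches, so callers must test for "--" first. */
int is_option_arg(const char *arg)
{
    if (arg == NULL || arg[0] != '-' || arg[1] == '\0')
        return 0;
    /* arg[1] is not '\0' here, so strchr only searches the listed chars */
    return strchr(": \t\n\f\r\v\b", arg[1]) == NULL;
}
```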
    if (_opti == 0)
        _opti = 2
    thisopt = substr(argv[Optind], _opti, 1)
    Optopt = thisopt
    i = index(options, thisopt)
    if (i == 0) {
        if (Opterr)
            printf("%c -- invalid option\n",
                                  thisopt) > "/dev/stderr"
        if (_opti >= length(argv[Optind])) {
            Optind++
            _opti = 0
        } else
            _opti++
        return "?"
    }

The _opti variable tracks the position in the current command line argument (argv[Optind]). When multiple options are grouped together with one `-' (e.g., `-abx'), it is necessary to return them to the user one at a time.
If _opti is equal to zero, it is set to two, the index in the string of the next character to look at (we skip the `-', which is at position one). The variable thisopt holds the character, obtained with substr. It is saved in Optopt for the main program to use.
If thisopt is not in the options string, then it is an invalid option. If Opterr is non-zero, getopt prints an error message on the standard error that is similar to the message from the C version of getopt.
Since the option is invalid, it is necessary to skip it and move on to the next option character. If _opti is greater than or equal to the length of the current command line argument, then it is necessary to move on to the next one, so Optind is incremented and _opti is reset to zero. Otherwise, Optind is left alone and _opti is merely incremented.
In any case, since the option was invalid, getopt returns `?'. The main program can examine Optopt if it needs to know what the invalid option letter actually was.
    if (substr(options, i + 1, 1) == ":") {
        # get option argument
        if (length(substr(argv[Optind], _opti + 1)) > 0)
            Optarg = substr(argv[Optind], _opti + 1)
        else
            Optarg = argv[++Optind]
        _opti = 0
    } else
        Optarg = ""

If the option requires an argument, the option letter is followed by a colon in the options string. If there are remaining characters in the current command line argument (argv[Optind]), then the rest of that string is assigned to Optarg. Otherwise, the next command line argument is used (`-xFOO' vs. `-x FOO'). In either case, _opti is reset to zero, since there are no more characters left to examine in the current command line argument.
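The same decision can be sketched in C. This is a hypothetical helper rather than a translation of any real library: `find_optarg` and its parameters are invented for illustration, `pos` is a zero-based index (awk's substr positions start at one), and where the awk code quietly ends up with an empty Optarg when `-x' is the last argument, this sketch returns NULL instead.

```c
#include <string.h>

/* Hypothetical helper mirroring getopt.awk's option-argument step.
 * options: e.g. "ab:cd"; c: the current option letter;
 * argv[*optind] is the current argument and pos is the zero-based
 * index of c within it.  Returns the option argument, "" if the option
 * takes none, or NULL if a required argument is missing.
 * Advances *optind when the argument is the next word ("-x FOO"). */
const char *find_optarg(const char *options, char c,
                        int argc, char **argv, int *optind, int pos)
{
    const char *p = (c == '\0') ? NULL : strchr(options, c);

    if (p == NULL || p[1] != ':')
        return "";                        /* option takes no argument */
    if (argv[*optind][pos + 1] != '\0')
        return &argv[*optind][pos + 1];   /* "-xFOO": rest of this word */
    if (*optind + 1 < argc)
        return argv[++*optind];           /* "-x FOO": the next word */
    return NULL;                          /* required argument missing */
}
```

For example, with the option string "ab:cd", the argument `-cbARG' and pos pointing at the `b', the result is "ARG" and the index is left alone.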
    if (_opti == 0 || _opti >= length(argv[Optind])) {
        Optind++
        _opti = 0
    } else
        _opti++
    return thisopt
}

Finally, if _opti is either zero or greater than or equal to the length of the current command line argument, this element of argv has been completely processed, so Optind is incremented to point to the next element in argv. If neither condition is true, then only _opti is incremented, so that the next option letter can be processed on the next call to getopt.
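This end-of-call bookkeeping can also be written as a small C function. It is a sketch under stated assumptions: `advance_position` is a hypothetical name, and `opti` keeps awk's one-based meaning (0 means no argument is in progress, 2 is the first letter after the `-').

```c
#include <string.h>

/* Hypothetical sketch of getopt.awk's end-of-call bookkeeping.
 * opti follows the awk convention: a one-based character position in
 * argv[*optind], where 0 means no argument is in progress. */
void advance_position(char **argv, int *optind, int *opti)
{
    /* || short-circuits, so strlen is skipped when opti is 0 */
    if (*opti == 0 || *opti >= (int) strlen(argv[*optind])) {
        (*optind)++;      /* this argument is used up: move to the next */
        *opti = 0;
    } else {
        (*opti)++;        /* more option letters remain in this argument */
    }
}
```

With argv[1] holding "-abc" and opti starting at 2 (the `a' was just handled), three successive calls move the position to `b', then to `c', and then on to argv[2].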
BEGIN {
    Opterr = 1    # default is to diagnose
    Optind = 1    # skip ARGV[0]

    # test program
    if (_getopt_test) {
        while ((_go_c = getopt(ARGC, ARGV, "ab:cd")) != -1)
            printf("c = <%c>, optarg = <%s>\n",
                                       _go_c, Optarg)
        printf("non-option arguments:\n")
        for (; Optind < ARGC; Optind++)
            printf("\tARGV[%d] = <%s>\n",
                                    Optind, ARGV[Optind])
    }
}

The BEGIN rule initializes both Opterr and Optind to one. Opterr is set to one, since the default behavior is for getopt to print a diagnostic message upon seeing an invalid option. Optind is set to one, since there's no reason to look at the program name, which is in ARGV[0].
The rest of the BEGIN rule is a simple test program. Here is the result of two sample runs of the test program.
$ awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x
-| c = <a>, optarg = <>
-| c = <c>, optarg = <>
-| c = <b>, optarg = <ARG>
-| non-option arguments:
-|         ARGV[3] = <bax>
-|         ARGV[4] = <-x>

$ awk -f getopt.awk -v _getopt_test=1 -- -a -x -- xyz abc
-| c = <a>, optarg = <>
error--> x -- invalid option
-| c = <?>, optarg = <>
-| non-option arguments:
-|         ARGV[4] = <xyz>
-|         ARGV[5] = <abc>

The first `--' terminates the arguments to awk, so that it does not try to interpret the `-a', etc., as its own options.
Several of the sample programs presented in section Practical awk Programs use getopt to process their arguments.

Reading the User Database
The `/dev/user' special file (see section Special File Names in gawk) provides access to the current user's real and effective user and group id numbers, and if available, the user's supplementary group set. However, since these are numbers, they do not provide very useful information to the average user. There needs to be some way to find the user information associated with the user and group numbers. This section presents a suite of functions for retrieving information from the user database. See section Reading the Group Database, for a similar suite that retrieves information from the group database.
The POSIX standard does not define the file where user information is kept. Instead, it provides the <pwd.h> header file and several C language subroutines for obtaining user information. The primary function is getpwent, for "get password entry." The "password" comes from the original user database file, `/etc/passwd', which kept user information, along with the encrypted passwords (hence the name).
While an awk program could simply read `/etc/passwd' directly (the format is well known), because of the way password files are handled on networked systems, this file may not contain complete information about the system's set of users.
To be sure of being able to produce a readable, complete version of the user database, it is necessary to write a small C program that calls getpwent. getpwent is defined to return a pointer to a struct passwd. Each time it is called, it returns the next entry in the database. When there are no more entries, it returns NULL, the null pointer. When this happens, the C program should call endpwent to close the database. Here is pwcat, a C program that "cats" the password database.
/*
 * pwcat.c
 *
 * Generate a printable version of the password database
 *
 * Arnold Robbins
 * arnold@gnu.ai.mit.edu
 * May 1993
 * Public Domain
 */

#include <stdio.h>
#include <stdlib.h>
#include <pwd.h>

int
main(void)
{
    struct passwd *p;

    /* print each entry in the traditional /etc/passwd format */
    while ((p = getpwent()) != NULL)
        printf("%s:%s:%ld:%ld:%s:%s:%s\n",
            p->pw_name, p->pw_passwd, (long) p->pw_uid,
            (long) p->pw_gid, p->pw_gecos, p->pw_dir, p->pw_shell);

    endpwent();    /* close the database */
    exit(0);
}

If you don't understand C, don't worry about it. The output from pwcat is the user database, in the traditional `/etc/passwd' format of colon-separated fields. The fields are:
Login name
The user's login name.
Encrypted password
The user's encrypted password. This may not be available on some systems.
User-ID
The user's numeric user-id number.
Group-ID
The user's numeric group-id number.
Full name
The user's full name, and perhaps other information associated with the user.
Home directory
The user's login (or "home") directory (familiar to shell programmers as $HOME).
Login shell
The program that will be run when the user logs in. This is usually a shell, such as Bash (the GNU Bourne-Again shell).
Here are a few lines representative of pwcat's output.
$ pwcat
-| root:3Ov02d5VaUPB6:0:1:Operator:/:/bin/sh
-| nobody:*:65534:65534::/:
-| daemon:*:1:1::/:
-| sys:*:2:2::/:/bin/csh
-| bin:*:3:3::/bin:
-| arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh
-| miriam:yxaay:112:10:Miriam Robbins:/home/miriam:/bin/sh
...
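Each of these lines splits on colons into seven fields, some of which may be empty (see the `nobody' and `daemon' entries). A minimal C sketch of that split, assuming the line may be modified in place; `split_colon_fields` is a hypothetical helper. strtok is deliberately avoided here, because it treats adjacent colons as a single separator and would silently drop the empty fields.

```c
#include <string.h>

/* Hypothetical helper: split line in place at each ':' and store up to
 * max field pointers in fields[].  Empty fields are kept, unlike with
 * strtok.  Returns the number of fields stored. */
int split_colon_fields(char *line, char *fields[], int max)
{
    int n = 0;
    char *start = line;

    while (n < max) {
        char *colon = strchr(start, ':');
        fields[n++] = start;
        if (colon == NULL)
            break;            /* last field runs to the end of the line */
        *colon = '\0';        /* terminate the current field */
        start = colon + 1;    /* the next field begins after the colon */
    }
    return n;
}
```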

With that introduction, here is a group of functions for getting user information. There are several functions here, corresponding to the C functions of the same name.
# passwd.awk -- access password file information
# Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
# May 1993

BEGIN {
    # tailor this to suit your system
    _pw_awklib = "/usr/local/libexec/awk/"
}

function _pw_init(    oldfs, oldrs, olddol0, pwcat)
{
    if (_pw_inited)
        return
    oldfs = FS
    oldrs = RS
    olddol0 = $0
    FS = ":"
    RS = "\n"
    pwcat = _pw_awklib "pwcat"
    while ((pwcat | getline) > 0) {
        _pw_byname[$1] = $0
        _pw_byuid[$3] = $0
        _pw_bycount[++_pw_total] = $0
    }
    close(pwcat)
    _pw_count = 0
    _pw_inited = 1
    FS = oldfs
    RS = oldrs
    $0 = olddol0
}

The BEGIN rule sets a private variable to the directory where pwcat is stored. Since it is used to help out an awk library routine, we have chosen to put it in `/usr/local/libexec/awk'. You might want it to be in a different directory on your system.
The function _pw_init keeps three copies of the user information in three associative arrays. The arrays are indexed by user name (_pw_byname), by user-id number (_pw_byuid), and by order of occurrence (_pw_bycount).
The variable _pw_inited is used for efficiency: thanks to it, _pw_init reads the database only on its first call, no matter how often it is called.
Since this function uses getline to read information from pwcat, it first saves the values of FS, RS, and $0. Doing so is necessary, since these functions could be called from anywhere within a user's program, and the user may have his or her own values for FS and RS.
The main part of the function uses a loop to read database lines, split the line into fields, and then store the line into each array as necessary. When the loop is done, _pw_init cleans up by closing the pipeline, setting _pw_inited to one, and restoring FS, RS, and $0. The use of _pw_count will be explained below.
function getpwnam(name)
{
    _pw_init()
    if (name in _pw_byname)
        return _pw_byname[name]
    return ""
}

The getpwnam function takes a user name as a string argument. If that user is in the database, it returns the appropriate line. Otherwise it returns the null string.
function getpwuid(uid)
{
    _pw_init()
    if (uid in _pw_byuid)
        return _pw_byuid[uid]
    return ""
}

Similarly, the getpwuid function takes a user-id number argument. If that user number is in the database, it returns the appropriate line. Otherwise it returns the null string.
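For comparison, here is a rough C analog of getpwnam and getpwuid working over in-memory pwcat-style lines (three of the sample lines shown earlier). It is a sketch, not the awk library: awk's associative arrays give direct lookup by key, whereas this version simply scans a small hypothetical table. `pw_byname` and `pw_byuid` are illustrative names chosen so as not to clash with the real C library functions; like their awk counterparts, they return the whole line or an empty string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory table in pwcat's output format. */
static const char *pw_lines[] = {
    "root:3Ov02d5VaUPB6:0:1:Operator:/:/bin/sh",
    "arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh",
    "miriam:yxaay:112:10:Miriam Robbins:/home/miriam:/bin/sh",
};
static const int pw_nlines = (int) (sizeof pw_lines / sizeof pw_lines[0]);

/* Like getpwnam in passwd.awk: the first field must match exactly. */
const char *pw_byname(const char *name)
{
    size_t len = strlen(name);

    for (int i = 0; i < pw_nlines; i++)
        if (strncmp(pw_lines[i], name, len) == 0 && pw_lines[i][len] == ':')
            return pw_lines[i];
    return "";
}

/* Like getpwuid in passwd.awk: compare the third field as a number. */
const char *pw_byuid(long uid)
{
    for (int i = 0; i < pw_nlines; i++) {
        const char *p = strchr(pw_lines[i], ':');     /* end of field 1 */
        char *end;
        long v;

        if (p != NULL)
            p = strchr(p + 1, ':');                    /* end of field 2 */
        if (p == NULL)
            continue;
        v = strtol(p + 1, &end, 10);                   /* parse field 3 */
        if (end != p + 1 && *end == ':' && v == uid)
            return pw_lines[i];
    }
    return "";
}
```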
function getpwent()
{
    _pw_init()
    if (_pw_count < _pw_total)
        return _pw_bycount[++_pw_count]
    return ""
}

The getpwent function simply steps through the database, one entry at a time. It uses _pw_count to track its current position in the _pw_bycount array.
function endpwent()
{
    _pw_count = 0
}

The endpwent function resets _pw_count to zero, so that subsequent calls to getpwent will start over again.
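Together, getpwent and endpwent form a simple cursor over the _pw_bycount array. The same idea in C, as a sketch with a hypothetical three-entry table standing in for _pw_bycount; `next_entry` and `rewind_entries` are illustrative names, and the cursor here is zero-based where _pw_count counts from one.

```c
#include <stddef.h>

/* Hypothetical stand-in for _pw_bycount, in database order. */
static const char *entries[] = { "root", "nobody", "daemon" };
static const size_t n_entries = sizeof entries / sizeof entries[0];
static size_t cursor = 0;        /* plays the role of _pw_count */

/* Like getpwent: return the next entry, or "" once all have been seen. */
const char *next_entry(void)
{
    if (cursor < n_entries)
        return entries[cursor++];
    return "";
}

/* Like endpwent: make the next call start from the first entry. */
void rewind_entries(void)
{
    cursor = 0;
}
```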
A conscious design decision in this suite is that each subroutine calls _pw_init to initialize the database arrays. The overhead of running a separate process to generate the user database, and the I/O to scan it, will only be incurred if the user's main program actually calls one of these functions. If this library file is loaded along with a user's program, but none of the routines are ever called, then there is no extra run-time overhead. (The alternative would be to move the body of _pw_init into a BEGIN rule, which would always run pwcat. This simplifies the code but runs an extra process that may never be needed.)
In turn, calling _pw_init is not too expensive, since the _pw_inited variable keeps the program from reading the data more than once. If you are worried about squeezing every last cycle out of your awk program, the check of _pw_inited could be moved out of _pw_init and duplicated in all the other functions. In practice, this is not necessary, since most awk programs are I/O bound, and it would clutter up the code.
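The initialize-on-first-use pattern carries over directly to C. In this sketch, `load_database` is a hypothetical stand-in that merely counts how often it runs (a real program might run pwcat or loop over getpwent there), and all other names are illustrative as well.

```c
/* Hypothetical sketch of the _pw_init pattern in C. */
static int db_inited = 0;     /* like _pw_inited */
static int db_loads = 0;      /* how many times the expensive load ran */

static void load_database(void)
{
    db_loads++;               /* stand-in for running pwcat and reading it */
}

static void db_init(void)
{
    if (db_inited)
        return;               /* already loaded: nothing to do */
    load_database();
    db_inited = 1;
}

/* Two hypothetical public entry points; each initializes on demand. */
int db_lookup_a(void) { db_init(); return db_loads; }
int db_lookup_b(void) { db_init(); return db_loads; }
```

If no public function is ever called, db_loads stays at zero; after any number of calls, it is exactly one.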
The id program in section Printing Out User Information uses these functions.

Reading the Group Database
Much of the discussion presented in section Reading the User Database applies to the group database as well. Although there has traditionally been a well-known file, `/etc/group', in a well-known format, the POSIX standard only provides a set of C library routines (<grp.h> and getgrent) for accessing the information. Even though this file may exist, it likely does not have complete information. Therefore, as with the user database, it is necessary to have a small C program that generates the group database as its output.
Here is grcat, a C program that "cats" the group database.
/*
 * grcat.c
 *
 * Generate a printable version of the group database
 *
 * Arnold Robbins, arnold@gnu.ai.mit.edu
 * May 1993
 * Public Domain
 */

#include <stdio.h>
#include <stdlib.h>
#include <grp.h>

int
main(void)
{
    struct group *g;
    int i;

    while ((g = getgrent()) != NULL) {
        printf("%s:%s:%ld:", g->gr_name, g->gr_passwd,
                             (long) g->gr_gid);
        /* print the members, separated by commas */
        for (i = 0; g->gr_mem[i] != NULL; i++) {
            printf("%s", g->gr_mem[i]);
            if (g->gr_mem[i+1] != NULL)
                putchar(',');
        }
        putchar('\n');
    }
    endgrent();    /* close the database */
    exit(0);
}

Each line in the group database represents one group. The fields are separated with colons, and represent the following information.
Group Name
The name of the group.
Group Password
The encrypted group password. In practice, this field is never used. It is usually empty, or set to `*'.
Group ID Number
The numeric group-id number. This number should be unique within the file.
Group Member List
A comma-separated list of user names. These users are members of the group. Most Unix systems allow users to be members of several groups simultaneously. If your system does, then reading `/dev/user' will return those group-id numbers in $5 through $NF. (Note that `/dev/user' is a gawk extension; see section Special File Names in gawk.)
Here is what running grcat might produce:
$ grcat
-| wheel:*:0:arnold
-| nogroup:*:65534:
-| daemon:*:1:
-| kmem:*:2:
-| staff:*:10:arnold,miriam,andy
-| other:*:20:
...

Here are the functions for obtaining information from the group database. There are several, modeled after the C library functions of the same names.
# group.awk -- functions for dealing with the group file
# Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
# May 1993

BEGIN    \
{
    # Change to suit your system
    _gr_awklib = "/usr/local/libexec/awk/"
}

function _gr_init(    oldfs, oldrs, olddol0, grcat, n, a, i)
{
    if (_gr_inited)
        return

    oldfs = FS
    oldrs = RS
    olddol0 = $0
    FS = ":"
    RS = "\n"

    grcat = _gr_awklib "grcat"
    while ((grcat | getline) > 0) {
        if ($1 in _gr_byname)
            _gr_byname[$1] = _gr_byname[$1] "," $4
        else
            _gr_byname[$1] = $0
        if ($3 in _gr_bygid)
            _gr_bygid[$3] = _gr_bygid[$3] "," $4
        else
            _gr_bygid[$3] = $0

        n = split($4, a, "[ \t]*,[ \t]*")
        for (i = 1; i <= n; i++)
            if (a[i] in _gr_groupsbyuser)
                _gr_groupsbyuser[a[i]] = \
                    _gr_groupsbyuser[a[i]] " " $1
            else
                _gr_groupsbyuser[a[i]] = $1

        _gr_bycount[++_gr_count] = $0
    }
    close(grcat)
    _gr_count = 0
    _gr_inited++
    FS = oldfs
    RS = oldrs
    $0 = olddol0
}

The BEGIN rule sets a private variable to the directory where grcat is stored. Since it is used to help out an awk library routine, we have chosen to put it in `/usr/local/libexec/awk'. You might want it to be in a different directory on your system.
These routines follow the same general outline as the user database routines (see section Reading the User Database). The _gr_inited variable is used to ensure that the database is scanned no more than once. The _gr_init function first saves FS, RS, and $0, and then sets FS and RS to the correct values for scanning the group information.
The group information is stored in several associative arrays. The arrays are indexed by group name (_gr_byname), by group-id number (_gr_bygid), and by position in the database (_gr_bycount). There is an additional array indexed by user name (_gr_groupsbyuser), whose elements are space-separated lists of the groups each user belongs to.
Unlike the user database, it is possible to have multiple records in the database for the same group. This is common when a group has a large number of members. Such a pair of entries might look like:
tvpeople:*:101:johny,jay,arsenio
tvpeople:*:101:david,conan,tom,joan

For this reason, _gr_init looks to see if a group name or group-id number has already been seen. If it has, then the user names are simply concatenated onto the previous list of users. (There is actually a subtle problem with the code presented above. Suppose that the first time there were no names. This code adds the names with a leading comma. It also doesn't check that there is a $4.)
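One possible fix for the leading-comma problem is to insert the separator only when both the existing list and the new one are non-empty; that also covers a missing $4, which awk treats as an empty string. Here is a C sketch of that rule applied to the member lists alone. `append_members` is a hypothetical name, and the caller is assumed to supply a buffer large enough for the result; the same test could be written in awk around the concatenations in _gr_init.

```c
#include <string.h>

/* Hypothetical helper: append the member list `more` to `list`
 * (a buffer of `size` bytes), adding a comma only when both lists are
 * non-empty.  Returns 0 on success, -1 if the result would not fit. */
int append_members(char *list, size_t size, const char *more)
{
    size_t have = strlen(list), add = strlen(more);
    size_t sep = (have > 0 && add > 0) ? 1 : 0;

    if (have + sep + add + 1 > size)
        return -1;                        /* no room, counting the '\0' */
    if (sep)
        list[have++] = ',';
    memcpy(list + have, more, add + 1);   /* also copies the terminator */
    return 0;
}
```

Applied to the two tvpeople entries shown above, starting from an empty buffer, it produces johny,jay,arsenio,david,conan,tom,joan, with no leading comma.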
Finally, _gr_init closes the pipeline to grcat, restores FS, RS, and $0, initializes _gr_count to zero (it is used later), and makes _gr_inited non-zero.
function getgrnam(group)
{
    _gr_init()
    if (group in _gr_byname)
        return _gr_byname[group]
    return ""
}

The getgrnam function takes a group name as its argument, and if that group exists, its entry is returned. Otherwise, getgrnam returns the null string.
function getgrgid(gid)
{
    _gr_init()
    if (gid in _gr_bygid)
        return _gr_bygid[gid]
    return ""
}

The getgrgid function is similar; it takes a numeric group id and looks up the information associated with that group id.
function getgruser(user)
{
    _gr_init()
    if (user in _gr_groupsbyuser)
        return _gr_groupsbyuser[user]
    return ""
}

The getgruser function does not have a C counterpart. It takes a user name, and returns the list of groups that have the user as a member.
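To make the contents of _gr_groupsbyuser concrete, the C sketch below computes the same space-separated answer on demand by scanning grcat-style lines (three of the sample lines shown earlier) instead of building an index up front as _gr_init does. `groups_of_user` and `in_list` are hypothetical names, and unlike the awk split, this version does not strip blanks around the commas.

```c
#include <string.h>

/* Hypothetical table in grcat's output format. */
static const char *gr_lines[] = {
    "wheel:*:0:arnold",
    "nogroup:*:65534:",
    "staff:*:10:arnold,miriam,andy",
};
static const int gr_nlines = (int) (sizeof gr_lines / sizeof gr_lines[0]);

/* Does `user` appear as a whole name in the comma-separated list? */
static int in_list(const char *list, const char *user)
{
    size_t ulen = strlen(user);

    if (ulen == 0)
        return 0;
    for (const char *p = list; ; p++) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t) (end - p) : strlen(p);
        if (len == ulen && strncmp(p, user, len) == 0)
            return 1;
        if (end == NULL)
            return 0;
        p = end;              /* the loop's p++ then steps past the comma */
    }
}

/* Hypothetical getgruser analog: write the space-separated names of the
 * groups that list `user` as a member into out (outsz bytes, >= 1). */
void groups_of_user(const char *user, char *out, size_t outsz)
{
    out[0] = '\0';
    for (int i = 0; i < gr_nlines; i++) {
        const char *line = gr_lines[i];
        const char *name_end = strchr(line, ':');
        const char *p = name_end ? strchr(name_end + 1, ':') : NULL;
        p = p ? strchr(p + 1, ':') : NULL;    /* p now ends field 3 */
        if (p == NULL || !in_list(p + 1, user))
            continue;

        size_t used = strlen(out);
        size_t nlen = (size_t) (name_end - line);
        size_t sep = used > 0 ? 1 : 0;
        if (used + sep + nlen + 1 > outsz)
            break;                            /* no room for this name */
        if (sep)
            out[used++] = ' ';
        memcpy(out + used, line, nlen);
        out[used + nlen] = '\0';
    }
}
```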
function getgrent()
{
    _gr_init()
    if ((_gr_count + 1) in _gr_bycount)
        return _gr_bycount[++_gr_count]
    return ""
}

The getgrent function steps through the database one entry at a time. It uses _gr_count to track its position in the list.
function endgrent()
{
    _gr_count = 0
}

The endgrent function resets _gr_count to zero so that getgrent can start over again.

As with the user database routines, each function calls _gr_init to initialize the arrays. Doing so only incurs the extra overhead of running grcat if these functions are used (as opposed to moving the body of _gr_init into a BEGIN rule).
Most of the work is in scanning the database and building the various associative arrays. The functions that the user calls are themselves very simple, relying on awk's associative arrays to do the work.
The id program in section Printing Out User Information uses these functions.

Naming Library Function Global Variables
Due to the way the awk language evolved, variables are either global (usable by the entire program), or local (usable just by a specific function). There is no intermediate state analogous to static variables in C.
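For comparison, this is the C feature being referred to: a static local variable keeps its value between calls, yet no other function can see or change it. A minimal illustration; `next_serial` is a hypothetical name.

```c
/* The counter survives from one call to the next, but it is visible
 * only inside this function: the intermediate state awk lacks. */
int next_serial(void)
{
    static int counter = 0;   /* initialized once, before the first call */
    return ++counter;
}
```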
Library functions often need to have global variables that they can use to preserve state information between calls to the function. Examples are getopt's variable _opti (see section Processing Command Line Options) and the _tm_months array used by mktime (see section Turning Dates Into Timestamps). Such variables are called private, since the only functions that need to use them are the ones in the library.
When writing a library function, you should try to choose names for your private variables so that they will not conflict with any variables used by either another library function or a user's main program. For example, a name like `i' or `j' is not a good choice, since user programs often use variable names like these for their own purposes.
The example programs shown in this chapter all start the names of their private variables with an underscore (`_'). Users generally don't use leading underscores in their variable names, so this convention immediately decreases the chances that the variable name will be accidentally shared with the user's program.
In addition, several of the library functions use a prefix that helps indicate what function or set of functions uses the variables. Examples are _tm_months in mktime (see section Turning Dates Into Timestamps) and _pw_byname in the user database routines (see section Reading the User Database). This convention is recommended, since it further decreases the chance of inadvertent conflict among variable names. Note that this convention can be used equally well both for variable names and for private function names.
While I could have rewritten all the library routines to use this convention, I did not do so, in order to show how my own awk programming style has evolved, and to provide some basis for this discussion.
As a final note on variable naming, if a function makes global variables available for use by a main program, it is a good convention to start that variable's name with a capital letter, as with getopt's Opterr and Optind variables (see section Processing Command Line Options). The leading capital letter indicates that it is global, while the fact that the variable name is not all capital letters indicates that the variable is not one of awk's built-in variables, like FS.
It is also important that all variables in library functions that do not need to save state are in fact declared local. If this is not done, the variable could accidentally be used in the user's program, leading to bugs that are very difficult to track down.
function lib_func(x, y,    l1, l2)
{
    ...
    use variable some_var   # some_var could be local
    ...                     # but is not by oversight
}

A different convention, common in the Tcl community, is to use a single associative array to hold the values needed by the library function(s), or "package." This significantly decreases the number of actual global names in use. For example, the functions described in section Reading the User Database might have used PW_data["inited"], PW_data["total"], PW_data["count"], and PW_data["awklib"], instead of _pw_inited, _pw_awklib, _pw_total, and _pw_count.
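In C, the nearest counterpart of the single-array convention is a single struct that gathers a package's state under one global name. The sketch below is hypothetical: the struct type, its fields, and the `pw_rewind` function simply mirror the PW_data keys named above.

```c
/* All of the package's private state lives under one global name,
 * mirroring PW_data["inited"], PW_data["total"], and so on. */
struct pw_package {
    int inited;          /* like _pw_inited */
    int total;           /* like _pw_total */
    int count;           /* like _pw_count */
    const char *awklib;  /* like _pw_awklib */
};

static struct pw_package PW_data = { 0, 0, 0, "/usr/local/libexec/awk/" };

/* Like endpwent: it touches only the one global name. */
void pw_rewind(void)
{
    PW_data.count = 0;
}
```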
The conventions presented in this section are exactly that: conventions. You are not required to write your programs this way; we merely recommend that you do so.