Booth Algorithm

Organization of Computer Systems: Section 3: Computer Arithmetic
Instructor: M.S. Schmalz
Reading Assignments and Exercises

This section is organized as follows:

3.1. Arithmetic and Logic Operations
3.2. Arithmetic Logic Units and the MIPS ALU
3.3. Boolean Multiplication and Division
3.4. Floating Point Arithmetic
3.5. Floating Point in MIPS

Information contained herein was compiled from a variety of text- and Web-based sources, is intended as a teaching aid only (to be used in conjunction with the required text), and is not to be used for any commercial purpose. Particular thanks are given to Dr. Enrique Mafla for his permission to use selected illustrations from his course notes in these Web pages.

In order to secure their understanding of the topics in this section, students should review the discussion of number representation in Section 2.4, especially two's complement.
3.1. Arithmetic and Logic Operations

The ALU is the core of the computer: it performs arithmetic and logic operations on data that not only realize the goals of various applications (e.g., scientific and engineering programs), but also manipulate addresses (e.g., pointer arithmetic). In this section, we give an overview of algorithms used for the basic arithmetic and logical operations. A key assumption is that two's complement representation will be employed, unless otherwise noted.

3.1.1. Boolean Addition

When adding two numbers, if the sum of the digits in a given position equals or exceeds the modulus, then a carry is propagated. For example, in Boolean addition, if two ones are added, the sum is two (base 10), which equals the modulus of 2 for Boolean numbers (B = Z_2 = {0,1}, the integers modulo 2). Thus, we record a zero for the sum and propagate a carry valued at one into the next more significant digit, as shown in Figure 3.1.
Figure 3.1. Example of Boolean addition with carry propagation, adapted from [Maf01].

3.1.2. Boolean Subtraction

When subtracting two numbers, two alternatives present themselves. First, one can formulate a subtraction algorithm, which is distinct from addition. Second, one can negate the subtrahend (i.e., in a - b, the subtrahend is b) and then perform addition. Since we already know how to perform addition as well as two's complement negation, the second alternative is more practical. Figure 3.2 illustrates both processes, using the decimal subtraction 12 - 5 = 7 as an example.
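The second alternative (negate, then add) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the 8-bit word size are hypothetical choices made here, not taken from the course figures.

```python
def twos_complement_negate(x: int, bits: int) -> int:
    """Negate x within a fixed word size: invert every bit, then add one."""
    mask = (1 << bits) - 1
    return ((x ^ mask) + 1) & mask


def subtract_by_addition(a: int, b: int, bits: int) -> int:
    """Compute a - b by adding the negated subtrahend; the carry out is discarded."""
    mask = (1 << bits) - 1
    return (a + twos_complement_negate(b, bits)) & mask


# 12 - 5 in an (assumed) 8-bit word
difference = subtract_by_addition(12, 5, 8)
print(format(difference, "08b"))
```

Masking with the word size plays the role of the fixed register width: the carry out of the most significant bit is simply dropped.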
Figure 3.2. Example of Boolean subtraction using (a) unsigned binary representation, and (b) addition with two's complement negation, adapted from [Maf01].

Just as we have a carry in addition, the subtraction of Boolean numbers uses a borrow. For example, in Figure 3.2a, in the first (least significant) digit position, the difference 0 - 1 in the one's place is realized by borrowing a one from the two's place (next more significant digit). The borrow is propagated upward (toward the most significant digit) until it is zeroed (i.e., until we encounter a difference of 1 - 0).

3.1.3. Overflow

Overflow occurs when there are insufficient bits in a binary number representation to portray the result of an arithmetic operation. Overflow occurs because computer arithmetic is not closed with respect to addition, subtraction, multiplication, or division. Overflow cannot occur in addition (subtraction) if the operands have different (resp. identical) signs.

To detect and compensate for overflow, one needs n+1 bits if an n-bit number representation is employed. For example, in 32-bit arithmetic, 33 bits are required to detect or compensate for overflow. This can be implemented in addition (subtraction) by letting a carry (borrow) occur into (from) the sign bit. To make a pictorial example of convenient size, Figure 3.3 illustrates the four possible sign combinations of differencing 7 and 6 using a number representation that is four bits long (i.e., can represent integers in the interval [-8, 7]).
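The sign rule for overflow can be checked with a small Python model of a four-bit adder. This is a hedged sketch: the function names are hypothetical, and the overflow test used here (operands share a sign, result has the other sign) is one common formulation, not necessarily the circuit drawn in the figure.

```python
def to_signed(x: int, bits: int) -> int:
    """Interpret the low `bits` bits of x as a two's complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x & (1 << (bits - 1)) else x


def add_with_overflow(a: int, b: int, bits: int = 4):
    """Add two signed values in a `bits`-wide word.

    Returns (wrapped signed result, overflow flag). Both operands are assumed
    to lie within the representable range, e.g. [-8, 7] for four bits.
    """
    mask = (1 << bits) - 1
    result = to_signed((a & mask) + (b & mask), bits)
    # Overflow is only possible when both operands have the same sign
    # and the wrapped result has the opposite sign.
    overflow = (a < 0) == (b < 0) and (result < 0) != (a < 0)
    return result, overflow


# The four sign combinations of 7 and 6, written as additions
for a, b in [(7, -6), (-7, 6), (7, 6), (-7, -6)]:
    print(a, b, add_with_overflow(a, b))
```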
Figure 3.3. Example of overflow in Boolean arithmetic, adapted from [Maf01].

3.1.4. MIPS Overflow Handling

MIPS raises an exception when overflow occurs. Exceptions (or interrupts) act like procedure calls. The register $epc stores the address of the instruction that caused the interrupt, and the instruction

    mfc0 register, $epc

moves the contents of $epc to register. For example, register could be $t1. This is an efficient approach, since no conditional branch is needed to test for overflow.

Two's complement arithmetic operations (add, addi, and sub instructions) raise exceptions on overflow. In contrast, unsigned arithmetic instructions (addu and addiu) do not raise an exception on overflow, since they are used for arithmetic operations on addresses (recall our discussion of pointer arithmetic in Section 2.6). In terms of high-level languages, C ignores overflows (always uses addu, addiu, and subu), while FORTRAN uses the appropriate instruction to detect overflow. Figure 3.4 illustrates the use of conditional branch on overflow for signed and unsigned addition operations.
Figure 3.4. Example of overflow in Boolean arithmetic, adapted from [Maf01].

3.1.5. Logical Operations

Logical operations apply to fields of bits within a 32-bit word, such as bytes or bit fields (in C, as discussed in the next paragraph). These operations include shift-left and shift-right operations (sll and srl), as well as bitwise and and or operations (and, andi, or, ori). As we saw in Section 2, bitwise operations treat an operand as a vector of bits and operate on each bit position.

C bit fields are used, for example, in programming communications hardware, where manipulation of a bit stream is required. Figure 3.5 presents C code for an example communications routine, where a structure called receiver is formed from an 8-bit field called receivedByte and two one-bit fields called ready and enable. The C routine sets receiver.ready to 0 and receiver.enable to 1.
Figure 3.5. Example of C bit field use in MIPS, adapted from [Maf01].

Note how the MIPS code implements the functionality of the C code, where the state of the registers $s0 and $s1 is illustrated in the five lines of diagrammed register contents below the code. In particular, the initial register state is shown in the first two lines. The sll instruction loads the contents of $s1 (the receiver), shifted left, into $s0 (the data register), and the result of this is shown on the second line of the register contents. Next, the srl instruction right-shifts $s0 by 24 bits, thereby discarding the enable and ready field information, leaving just the received byte. To signal the receiver that the data transfer is completed, the andi and ori instructions are used to set the enable and ready bits in $s1, which corresponds to the receiver. The data in $s0 has already been received and put in a register, so there is no need for its further manipulation.
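The shift-and-mask sequence described above can be mimicked in Python. The bit layout below is an assumption made for illustration only (ready in bit 0, enable in bit 1, receivedByte in bits 2 through 9); the actual layout is defined by the figure, which is not reproduced here, and the function names are hypothetical.

```python
WORD_MASK = 0xFFFFFFFF  # model a 32-bit register

# Assumed (hypothetical) field positions within the receiver word
READY_BIT = 0
ENABLE_BIT = 1
BYTE_SHIFT = 2  # receivedByte occupies bits 2..9


def extract_received_byte(receiver: int) -> int:
    """Isolate the 8-bit field with a left shift followed by a logical right shift."""
    left = (receiver << (32 - (BYTE_SHIFT + 8))) & WORD_MASK  # like sll: drop higher bits
    return left >> 24                                          # like srl: drop the flag bits


def clear_ready_set_enable(receiver: int) -> int:
    """Set ready to 0 with an and-mask and enable to 1 with an or-mask."""
    receiver &= ~(1 << READY_BIT) & WORD_MASK  # like andi
    receiver |= 1 << ENABLE_BIT                # like ori
    return receiver


word = (0xAB << BYTE_SHIFT) | (1 << READY_BIT)
print(hex(extract_received_byte(word)), bin(clear_ready_set_enable(word)))
```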
3.2. Arithmetic Logic Units and the MIPS ALU

In this section, we discuss hardware building blocks, ALU design and implementation, as well as the design of a 1-bit ALU and a 32-bit ALU. We then overview the implementation of the MIPS ALU.

3.2.1. Basic Concepts of ALU Design

ALUs are implemented using lower-level components such as logic gates, including and, or, and not gates, and multiplexers. These building blocks work with individual bits, but the actual ALU works with 32-bit registers to perform a variety of tasks such as arithmetic and shift operations.

In principle, an ALU is built from 32 separate 1-bit ALUs. Typically, one constructs separate hardware blocks for each task (e.g., arithmetic and logical operations), where each operation is applied to the 32-bit registers in parallel, and the selection of an operation is controlled by a multiplexer. The advantage of this approach is that it is easy to add new operations to the instruction set, simply by associating an operation with a multiplexer control code. This can be done provided that the mux has sufficient capacity. Otherwise, new data lines must be added to the mux(es), and the CPU must be modified to accommodate these changes. As a result, the ALU consists of 32 muxes (one for each output bit) arranged in parallel to send output bits from each operation to the ALU output.

3.2.2. 1-bit ALU Design

3.2.2.1. And/Or Operations. As shown in Figure 3.6, a simple (1-bit) ALU operates in parallel, producing all possible results that are then selected by the multiplexer (represented by an oval shape at the output of the and/or gates). The output C is thus selected by the multiplexer. (Note: If the multiplexer were to be applied at the input(s) rather than the output, twice the amount of hardware would be required, because there are two inputs versus one output.)
Figure 3.6. Example of a simple 1-bit ALU, where the oval represents a multiplexer with a control code denoted by Op and an output denoted by C, adapted from [Maf01].

3.2.2.2. Full Adder. Now let us consider the one-bit adder. Recalling the carry situation shown in Figure 3.1, we show in Figure 3.7 that there are two types of carries: carry in (occurs at the input) and carry out (at the output).

Figure 3.7. Carry-in and carry-out in Boolean addition, adapted from [Maf01].

Here, each bit of addition has three input bits (A_i, B_i, and CarryIn_i), as well as two output bits (Sum_i, CarryOut_i), where CarryIn_{i+1} = CarryOut_i. (Note: The "i" subscript denotes the i-th bit.) This relationship can be seen when considering the full adder's truth table, shown below:

    A  B  CarryIn | Sum  CarryOut
    0  0     0    |  0      0
    0  0     1    |  1      0
    0  1     0    |  1      0
    0  1     1    |  0      1
    1  0     0    |  1      0
    1  0     1    |  0      1
    1  1     0    |  0      1
    1  1     1    |  1      1
Given the four one-valued results in the truth table, we can use the sum-of-products method to construct a one-bit adder circuit from four three-input and gates and one four-input or gate, as shown in Figure 3.8a. The CarryOut calculation can be similarly implemented with three two-input and gates and one three-input or gate, as shown in Figure 3.8b. These two circuits can be combined to effect a one-bit full adder with carry, as shown in Figure 3.8c.
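The sum-of-products construction can be written out directly in Python as a sanity check. This is a minimal sketch, with the function name chosen here for illustration; each bitwise term corresponds to one and gate and each | to an input of the or gate.

```python
from itertools import product


def full_adder(a: int, b: int, carry_in: int):
    """One-bit full adder in sum-of-products form; all inputs are 0 or 1."""
    na, nb, nc = 1 - a, 1 - b, 1 - carry_in  # inverted inputs
    # Sum: four three-input AND terms feeding one four-input OR
    total = (na & nb & carry_in) | (na & b & nc) | (a & nb & nc) | (a & b & carry_in)
    # CarryOut: three two-input AND terms feeding one three-input OR
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return total, carry_out


# Enumerate all eight input combinations of the truth table
for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, "->", full_adder(a, b, c))
```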
Figure 3.8. Full adder circuit: (a) sum-of-products form from the above-listed truth table, (b) CarryOut production, and (c) one-bit full adder with carry, adapted from [Maf01].

Recalling the symbol for the one-bit adder, we can add an addition operation to the one-bit ALU shown in Figure 3.6. This is done by putting two control lines on the output mux, and by having an additional control line that inverts the b input (shown as "Binvert" in Figure 3.9).
Figure 3.9. One-bit ALU with three operations: and, or, and addition: (a) least significant bit, (b) remaining bits, adapted from [Maf01].

3.2.3. 32-bit ALU Design

The final implementation of the preceding technique is in a 32-bit ALU that incorporates the and, or, and addition operations. The 32-bit ALU can be simply constructed from the one-bit ALU by chaining the carry bits, such that CarryIn_{i+1} = CarryOut_i, as shown in Figure 3.10.
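A behavioral sketch of this chaining in Python may help fix the idea. The operation codes, the function names, and the convention of feeding Binvert into the least significant carry-in (so that inverting b yields a - b) are assumptions made for this illustration, not details read off the figures.

```python
AND, OR, ADD = 0, 1, 2  # hypothetical mux control codes


def one_bit_alu(a: int, b: int, carry_in: int, binvert: int, op: int):
    """Compute every result "in parallel", then let the mux (op) select one."""
    b ^= binvert                                  # optional inversion of the b input
    results = (a & b, a | b, a ^ b ^ carry_in)    # and, or, sum
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return results[op], carry_out


def alu(a: int, b: int, op: int, binvert: int = 0, bits: int = 32):
    """Chain one-bit ALUs so that CarryIn of bit i+1 is CarryOut of bit i."""
    carry = binvert  # assumed: a carry-in of 1 together with inverted b gives a - b
    result = 0
    for i in range(bits):
        bit, carry = one_bit_alu((a >> i) & 1, (b >> i) & 1, carry, binvert, op)
        result |= bit << i
    return result, carry


print(alu(12, 5, ADD))             # addition
print(alu(12, 5, ADD, binvert=1))  # subtraction via inverted b
```

The loop makes the O(N) carry dependency visible: bit i cannot finish until bit i-1 has produced its carry.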
Figure 3.10. 32-bit ALU with three operations: and, or, and addition, adapted from [Maf01].

This yields a composite ALU with two 32-bit input vectors a and b, whose i-th bit is denoted by a_i and b_i, where i = 0..31. The result is also a 32-bit vector, and there are two control buses: one for Binvert, and one for selecting the operation (using the mux shown in Figure 3.9). There is one CarryOut bit (at the bottom of Figure 3.10), and no CarryIn.

We next examine the MIPS ALU and how it supports operations such as shifting and branching.

3.2.4. MIPS ALU Design

We begin by assuming that we have the generic one-bit ALU designed in Sections 3.2.1 through 3.2.3, and shown below:

Here, the Bnegate input is the same as the Binvert input in Figure 3.9, and we assume that we have three control inputs to the mux whose control line configuration is associated with an operation, as follows:
3.2.4.1. Support for the slt Instruction. The slt instruction (set on less than) has the following format:

    slt rd, rs, rt

where rd = 1 if rs < rt, and rd = 0 otherwise.

Observe that the inputs rs and rt can represent high-level language input variables A and B. Thus, we have the following implication:

    A < B => A - B < 0,

which is implemented as follows:

Step 1. Perform subtraction using negation and a full adder
Step 2. Check most significant bit (sign bit)
Step 3. Sign bit tells us whether or not A < B

To implement slt, we need (a) a new input line called Less that goes directly to the mux, and (b) a new control code (111) to select the slt operation. Unfortunately, the result for slt cannot be taken directly as the output from the adder. Instead, we need a new output line called Set that is used only for the slt instruction. Overflow detection logic is also associated with this bit. The additional logic that supports slt is shown in Figure 3.11.
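The three steps, together with the overflow correction, can be sketched as follows. The function name and the particular overflow expression are assumptions for illustration; the xor of the sign bit with the overflow flag mirrors the role the text assigns to the Set output.

```python
def slt(rs: int, rt: int, bits: int = 32) -> int:
    """Return 1 if rs < rt (as signed `bits`-wide values), else 0."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a, b = rs & mask, rt & mask
    diff = (a - b) & mask                 # Step 1: subtract (hardware: a + ~b + 1)
    sign_bit = 1 if diff & sign else 0    # Step 2: inspect the sign bit
    # Subtraction overflows only when a and b differ in sign and the
    # difference's sign differs from a's sign.
    overflow = 1 if (a ^ b) & (a ^ diff) & sign else 0
    return sign_bit ^ overflow            # Step 3: corrected sign bit is the answer


print(slt(3, 5), slt(5, 3), slt(-1, 1))
```

Without the overflow correction, comparing the largest positive value with the most negative value would give the wrong answer, since their difference does not fit in the word.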
Figure 3.11. One-bit ALU with additional logic for slt operation, adapted from [Maf01].

Thus, for a 32-bit ALU, the additional cost of the slt instruction is (a) augmentation of each of 32 muxes to have three control lines instead of two, (b) augmentation of each of the 32 one-bit ALUs' control signal structure to have an additional (Less) input, and (c) the addition of overflow detection circuitry, a Set output, and an xor gate on the output of the sign bit.

3.2.4.2. Support for the bne Instruction. Recall the branch-on-not-equal instruction bne r1, r2, Label, where r1 and r2 denote registers and Label is a branch target label or address. To implement bne, we observe that the following implication holds:

    A - B = 0 => A = B.

We then add hardware to test if the comparison between A and B, implemented as (A - B), is zero. Again, this can be done using negation and the full adder that we have already designed as part of the ALU. The additional step is to or all 32 results from each of the one-bit ALUs, then invert the output of the or operation. Thus, if all 32 bits from the one-bit full adders are zero, then the output of the or gate will be zero (inverted, it will be one). Otherwise, the output of the or gate will be one (inverted, it will be zero). We also need to consider A - B, to see if there is overflow when A = 0. A block diagram of the hardware modification is shown in Figure 3.12.
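The zero test can be modeled as an or over all result bits followed by an inversion. The sketch below uses hypothetical function names and a Python loop in place of the cascade of or gates.

```python
def zero_flag(result: int, bits: int = 32) -> int:
    """Return 1 when every one of the `bits` result bits is zero."""
    any_one = 0
    for i in range(bits):
        any_one |= (result >> i) & 1   # cascade of or gates over the result bits
    return any_one ^ 1                 # invert: 1 means "all bits were zero"


def bne_taken(r1: int, r2: int, bits: int = 32) -> bool:
    """Branch is taken when r1 - r2 is nonzero, i.e. when the zero flag is 0."""
    diff = (r1 - r2) & ((1 << bits) - 1)
    return zero_flag(diff, bits) == 0


print(bne_taken(5, 5), bne_taken(5, 6))
```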
Figure 3.12. 32-bit ALU with additional logic to support bne and slt instructions, adapted from [Maf01].

Here, the additional hardware involves 32 separate output lines from the 32 one-bit adders, as well as a cascade of or gates to implement a 32-input nor gate (which doesn't exist in practice, due to excessive fan-in requirement).

3.2.4.3. Support for Shift Instructions. Considering the sll, srl, and sra instructions, these are supported in the ALU under design by adding a data line for the shifter (both left and right). However, the shifters are much more easily implemented at the transistor level (e.g., outside the ALU) rather than trying to fit more circuitry onto the ALU itself.

In order to implement a shifter external to the ALU, we consider the design of a barrel shifter, shown schematically in Figure 3.13. Here, the closed switch pattern, denoted by black filled circles, is controlled by the CPU through control lines to a mux or decoder. This allows data line x_i to be sent to output x_j, where i and j can be unequal.
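One way to picture the switch pattern is as an N-by-N matrix in which a closed switch at row j, column i routes input line x_i to output line x_j. The following sketch is an assumption-laden model (logical right shift only, hypothetical function name, bits listed least significant first), not a description of the circuit in the figure.

```python
def barrel_shift_right(x_bits, shift):
    """Logical right shift of a bit list (index 0 = least significant bit).

    closed[j][i] == 1 models a closed switch connecting input i to output j.
    """
    n = len(x_bits)
    closed = [[1 if i == j + shift else 0 for i in range(n)] for j in range(n)]
    # Each output line carries the input selected by the closed switch in its
    # row, or 0 when no switch in that row is closed.
    return [max(closed[j][i] & x_bits[i] for i in range(n)) for j in range(n)]


# 0110 shifted right by one position, bits written least significant first
print(barrel_shift_right([0, 1, 1, 0], 1))
```

The switch matrix has N * N entries, which makes the quadratic space cost of this style of shifter easy to see.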
Figure 3.13. Four-bit barrel shifter, where "x >> 1" denotes a shift amount greater than one, adapted from [Maf01].

This type of N-bit shifter is well understood and easy to construct, but has space complexity of O(N^2).

3.2.4.4. Support for Immediate Instructions. In the MIPS immediate instruction formats, the first input to the ALU is the first register (we'll call it rs) in the immediate command, while the second input is either data from a register rt or a zero- or sign-extended constant (immediate). To support this type of instruction, we need to add a mux at the second input of the ALU, as shown in Figure 3.14. This allows us to select whether rt or the sign-extended immediate is input to the ALU.
Figure 3.14. Supporting immediate instructions on a MIPS ALU design, where IR denotes the instruction register, and (/16) denotes a 16-bit parallel bus, adapted from [Maf01].

3.2.5. ALU Performance Issues

When estimating or measuring ALU performance, one wonders if a 32-bit ALU is as fast as a 1-bit ALU: what is the degree of parallelism, and do all operations execute in parallel? In practice, some operations on N-bit operands (e.g., addition with sequential propagation of carries) take O(N) time. Other operations, such as bitwise logical operations, take O(1) time. Since addition can be implemented in a variety of ways, each with a certain level of parallelism, it is wise to consider the possibility of a full adder being a computational bottleneck in a simple ALU.

We previously discussed the ripple-carry adder (Figure 3.10) that propagates the carry bit from stage i to stage i+1. It is readily seen that, for an N-bit input, O(N) time is required to propagate the carry to the most significant bit. In contrast, the fastest N-bit adder uses O(log_2 N) stages in a tree-structured configuration with N-1 one-bit adders. Thus, the complexity of this technique is O(log_2 N) work. In a sequential model of computation, this translates to O(log_2 N) time. If one is adding smaller numbers (e.g., up to 10-bit integers with current memory technology), then a lookup table can be used that (1) forms a memory address A by concatenating binary representations of the two operands, and (2) produces a result stored in memory that is accessed using A. This takes O(1) time, which is dependent upon memory bandwidth.

An intermediate approach between these extremes is to use a carry-lookahead adder (CLA). Suppose we do not know the value of the carry-in bit (which is usually the case). We can express the generation (g) of a carry bit for the i-th position of two operands a and b, as follows:

    g_i = a_i · b_i,

where the i-th bits of a and b are anded. Similarly, the propagated carry is expressed as:

    p_i = a_i + b_i,

where the i-th bits of a and b are ored. This allows us to recursively express the carry bits in terms of the carry-in c_0, as follows:

    c_1 = g_0 + p_0·c_0
    c_2 = g_1 + p_1·g_0 + p_1·p_0·c_0
    c_3 = g_2 + p_2·g_1 + p_2·p_1·g_0 + p_2·p_1·p_0·c_0
    c_4 = g_3 + p_3·g_2 + p_3·p_2·g_1 + p_3·p_2·p_1·g_0 + p_3·p_2·p_1·p_0·c_0
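The expansion of c_{i+1} = g_i + p_i·c_i into terms that depend only on g, p, and c_0 can be checked numerically. The sketch below assumes a 4-bit width by default and uses hypothetical function names; it evaluates each carry from the fully expanded expression rather than from the previous carry.

```python
def cla_carries(a: int, b: int, c0: int = 0, bits: int = 4):
    """Return [c_0, c_1, ..., c_bits], each computed without using earlier carries."""
    g = [(a >> i) & (b >> i) & 1 for i in range(bits)]     # generate: a_i and b_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(bits)]   # propagate: a_i or b_i
    carries = [c0]
    for i in range(bits):
        value = 0
        # Terms g_k * p_{k+1} * ... * p_i for k = i down to 0
        for k in range(i, -1, -1):
            term = g[k]
            for j in range(k + 1, i + 1):
                term &= p[j]
            value |= term
        # Final term p_i * ... * p_0 * c_0
        term = c0
        for j in range(i + 1):
            term &= p[j]
        carries.append(value | term)
    return carries


def cla_add(a: int, b: int, c0: int = 0, bits: int = 4):
    """Add using the lookahead carries; returns (sum, carry out)."""
    c = cla_carries(a, b, c0, bits)
    total = 0
    for i in range(bits):
        total |= (((a >> i) ^ (b >> i) ^ c[i]) & 1) << i
    return total, c[bits]


print(cla_add(0b1011, 0b0110))
```

In hardware, every carry expression is evaluated simultaneously; the nested loops here only enumerate the product terms.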
Did we get rid of the ripple? (Well, sort of...) What we did was transform the work involved in carry propagation from the adder circuitry to a large equation for c_N. However, this equation must still be computed in hardware. (Lesson: In computing, you don't get much for free.)

Unfortunately, it is prohibitively costly to build a CLA circuit for operands as large as 16 bits. Instead, we can use the CLA principle to create a two-tiered circuit, for example, at the bottom level an array of four 4-bit full adders (economical to construct), connected at the top level by a CLA, as shown below:
Using a two-level CLA architecture, where lower- (upper-) case g and p denote the first- (second-) level generates and propagates, we have the following equations:

    P_0 = p_3·p_2·p_1·p_0
    P_1 = p_7·p_6·p_5·p_4
    P_2 = p_11·p_10·p_9·p_8
    P_3 = p_15·p_14·p_13·p_12

    G_0 = g_3 + p_3·g_2 + p_3·p_2·g_1 + p_3·p_2·p_1·g_0
    G_1 = g_7 + p_7·g_6 + p_7·p_6·g_5 + p_7·p_6·p_5·g_4
    G_2 = g_11 + p_11·g_10 + p_11·p_10·g_9 + p_11·p_10·p_9·g_8
    G_3 = g_15 + p_15·g_14 + p_15·p_14·g_13 + p_15·p_14·p_13·g_12

Assuming that and as well as or gates have the same propagation delay, comparative analysis of the ripple-carry vs. carry-lookahead adders reveals that the total time to compute a CLA result is the summation of all gate delays along the longest path through the CLA. In the case of the 16-bit adder exemplified above, the CarryOut signals c_16 and C_4 define the longest path. For the ripple-carry adder, this path has length 2(16) = 32.

For the two-level CLA, we get two levels of logic in terms of the architecture (P and G versus p and g). P_i is specified in one level of logic using p_i. G_i is specified in two levels of logic using p_i and g_i. Also, p_i and g_i each represent one level of logic computed in terms of inputs a_i and b_i. Thus, the CLA critical path length is 2 + 2 + 1 = 5, which means that the two-level 16-bit CLA is 6.4 = 32/5 times faster than a 16-bit ripple-carry adder.

It is also useful to note that the logic equation for the sum output of a one-bit adder can be expressed more simply with xor logic, for example:

    Sum = A xor B xor CarryIn.

In some technologies, xor is more efficient than and/or gates. Also, processors are now designed in CMOS technology, which allows fewer muxes (this also applies to the barrel shifter). However, the design principles are similar.

3.2.6. Summary

We have shown that it is feasible to build an ALU to support the MIPS ISA. The key idea is to use a multiplexer to select the output from a collection of functional units operating in parallel. We can replicate a 1-bit ALU that uses this principle, with appropriate connections between replicates, to produce an N-bit ALU.

Important things to remember about ALUs are: (a) all of the gates are working in parallel, (b) the speed of a gate is affected by the number of inputs (degree of fan-in), and (c) the speed of a circuit depends on the number of gates in the longest computational path through the circuit (this can vary per operation). Finally, we have shown that changes in architectural organization can improve performance, similar to better algorithms in software.
3.3. Boolean Multiplication and Division

Multiplication is more complicated than addition, being implemented by shifting as well as addition. Because of the partial products involved in most multiplication algorithms, more time and more circuit area are required to compute, allocate, and sum the partial products to obtain the multiplication result.

3.3.1. Multiplier Design

We herein discuss three versions of the multiplier design based on the pencil-and-paper algorithm for multiplication that we all learned in grade school, which operates on Boolean numbers, as follows:
    Multiplicand:       0010   # Stored in register r1
    Multiplier:       x 1101   # Stored in register r2
    Partial Prod        0010   # No shift for LSB of Multiplier
       "     "         0000    # 1-bit shift of zeroes (can omit)
       "     "        0010     # 2-bit shift for bit 2 of Multiplier
       "     "       0010      # 3-bit shift for bit 3 of Multiplier
                               # Zero-fill the partial products and add
    PRODUCT          0011010   # Sum of all partial products -> r3
A flowchart of this algorithm, adapted for multiplication of 32-bit numbers, is shown in Figure 3.15, below, together with a schematic representation of a simple ALU circuit that implements this version of the algorithm. Here, the multiplier and the multiplicand are shifted relative to each other, which is more efficient than shifting the partial products alone.
Figure 3.15. Pencil-and-paper multiplication of 32-bit Boolean number representations: (a) algorithm, and (b) simple ALU circuitry, adapted from [Maf01].

The second version of this algorithm is shown in Figure 3.16. Here, the product is shifted with respect to the multiplier, and the multiplicand is shifted after the product register has been shifted. A 64-bit register is used to store both the multiplicand and the product.
Figure 3.16. Second version of pencil-and-paper multiplication of 32-bit Boolean number representations: (a) algorithm, and (b) schematic diagram of ALU circuitry, adapted from [Maf01].

The final version puts results in the product register if and only if the least significant bit of the product produced on the previous iteration is one-valued. Only the product register is shifted. This reduces by approximately 50 percent the amount of shifting that has to be done, which reduces time and hardware requirements. The algorithm and ALU schematic diagram are shown in Figure 3.17.
Figure 3.17. Third version of pencil-and-paper multiplication of 32-bit Boolean number representations: (a) algorithm, and (b) schematic diagram of ALU circuitry, adapted from [Maf01].

Thus, we have the following shift-and-add scheme for multiplication:
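A shift-and-add scheme in the spirit of the final version can be sketched in Python as shown below. This is an illustrative model only: it assumes unsigned operands, that the multiplier initially occupies the lower half of the product register, and a hypothetical function name.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Unsigned multiply using a single double-width product register."""
    mask = (1 << bits) - 1
    multiplicand &= mask
    product = multiplier & mask          # lower half starts out holding the multiplier
    for _ in range(bits):
        if product & 1:                  # test the least significant bit
            product += multiplicand << bits   # add into the upper half
        product >>= 1                    # shift the whole product register right
    return product


# The 4-bit example from the text: 0010 x 1101
print(format(shift_add_multiply(0b0010, 0b1101, bits=4), "07b"))
```

Because Python integers are unbounded, the carry out of the upper-half addition is kept automatically; a hardware register would need one extra bit for it.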
The preceding algorithms and circuitry do not hold for signed multiplication, since the bits of the multiplier no longer correspond to shifts of the multiplicand. The following example is illustrative:
A solution to this problem is Booth's Algorithm, whose flowchart and corresponding schematic hardware diagram are shown in Figure 3.18. Here, the examination of the multiplier is performed with lookahead toward the next bit. Depending on the bit configuration, the multiplicand is added to the product, subtracted from it (i.e., the multiplicand is positively or negatively signed), or skipped before the shift is performed.
Figure 3.18. Booth's procedure for multiplication of 32-bit Boolean number representations: (a) algorithm, and (b) schematic diagram of ALU circuitry, adapted from [Maf01].

Observe that Booth's algorithm requires only the addition of a subtraction step and the comparison operations for the two-bit codes, versus the one-bit comparison in the preceding three algorithms. An example of Booth's algorithm follows:
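As a minimal sketch of the two-bit rule, the Python function below scans the multiplier from the least significant bit, subtracting the shifted multiplicand at the start of a run of ones (bit pair 10) and adding it at the end of a run (bit pair 01). The function name, the default 4-bit width, and the operands 2 and -3 are illustrative assumptions; the shifting of a product register is modeled arithmetically with a left shift of the multiplicand rather than bit by bit.

```python
def booth_multiply(multiplicand: int, multiplier: int, bits: int = 4) -> int:
    """Signed multiply; `multiplier` is read as a `bits`-wide two's complement value."""
    q = multiplier & ((1 << bits) - 1)
    product = 0
    previous = 0                      # the implicit bit to the right of bit 0
    for i in range(bits):
        current = (q >> i) & 1
        if (current, previous) == (1, 0):
            product -= multiplicand << i   # start of a run of ones: subtract
        elif (current, previous) == (0, 1):
            product += multiplicand << i   # end of a run of ones: add
        # pairs 00 and 11 require no arithmetic, only the shift
        previous = current
    return product


print(booth_multiply(2, -3))
```

For the multiplier 1101 (that is, -3), the scan produces a subtraction, an addition, a second subtraction, and one step with no arithmetic.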
Here N = 4 iterations of the loop are required to produce a product from two N = 4 digit operands. Four shifts and two subtractions are required. From the analysis of the algorithm shown in Figure 3.18a, it is easily seen that the maximum work for multiplying two N-bit numbers is given by O(N) shift and addition operations. From this, the worst-case computation time can be computed given CPI for the shift and addition instructions, as well as cycle time of the ALU.

3.3.2. Design of Arithmetic Division Hardware

Division is a similar operation to multiplication, especially when implemented using a procedure similar to the algorithm shown in Figure 3.18a. For example, consider the pencil-and-paper method for dividing the byte 10010011 by the nybble 1011:
The governing equation is as follows:

    Dividend = Quotient × Divisor + Remainder.

3.3.2.1. Unsigned Division. The unsigned division algorithm that is similar to Booth's algorithm is shown in Figure 3.19a, with an example shown in Figure 3.19b. The ALU schematic diagram is given in Figure 3.19c. The analysis of the algorithm and circuit is very similar to the preceding discussion of Booth's algorithm.
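A shift-and-subtract division can be sketched along the same lines as the multiplier. The version below is a restoring division written under assumptions of our own (hypothetical function name, trial subtraction followed by restoration on a negative result); the flowchart in the figure may order its steps differently.

```python
def restoring_divide(dividend: int, divisor: int, bits: int = 32):
    """Unsigned division by repeated shift, trial subtraction, and restore."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Bring down the next dividend bit, most significant first
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        remainder -= divisor              # trial subtraction
        if remainder < 0:
            remainder += divisor          # restore; quotient bit is 0
            quotient <<= 1
        else:
            quotient = (quotient << 1) | 1  # subtraction stands; quotient bit is 1
    return quotient, remainder


print(restoring_divide(7, 3))
print(restoring_divide(0b10010011, 0b1011, bits=8))
```

In each case the results satisfy the governing equation Dividend = Quotient × Divisor + Remainder.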
Figure 3.19. Division of 32-bit Boolean number representations: (a) algorithm, (b) example using division of the unsigned integer 7 by the unsigned integer 3, and (c) schematic diagram of ALU circuitry, adapted from [Maf01].

3.3.2.2. Signed Division. With signed division, we negate the quotient if the signs of the divisor and dividend disagree. The remainder and the dividend must have the same signs. The governing equation is as follows:

    Remainder = Dividend - (Quotient × Divisor),

and the following four cases apply:
We present the preceding division algorithm, revised for signed numbers, as shown in Figure 3.20a. Four examples, corresponding to each of the four preceding sign permutations, are given in Figure 3.20b and 3.20c.
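The two sign rules (quotient negated when the signs disagree, remainder taking the sign of the dividend) can be captured in a short Python sketch. The function name is hypothetical, and the magnitude division is delegated to Python's built-in divmod rather than to the hardware algorithm.

```python
def signed_divide(dividend: int, divisor: int):
    """Divide magnitudes, then fix up the signs of quotient and remainder."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    quotient, remainder = divmod(abs(dividend), abs(divisor))
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient       # signs disagree: negate the quotient
    if dividend < 0:
        remainder = -remainder     # remainder follows the sign of the dividend
    return quotient, remainder


# The four sign permutations of 7 and 3
for n, d in [(7, 3), (-7, 3), (7, -3), (-7, -3)]:
    q, r = signed_divide(n, d)
    print(n, d, q, r, n == q * d + r)
```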
Figure 3.20. Division of 32-bit Boolean number representations: (a) algorithm, and (b, c) examples using division of +7 or -7 by the integer +3 or -3, adapted from [Maf01].

Self-Exercise. Be able to trace each example shown in Figure 3.20b,c through the algorithm whose flowchart is given in Figure 3.20a. Know how each part of the algorithm works, and why it behaves that way. Hint: This exercise, or a part of it, is likely to be an exam question.

3.3.2.3. Division in MIPS. MIPS supports multiplication and division using existing hardware, primarily the ALU and shifter. MIPS needs one extra hardware component: a 64-bit register able to support sll and sra instructions. The upper (high) 32 bits of the register contain the remainder resulting from division. This is moved into a register in the MIPS register stack (e.g., $t0) by the mfhi command. The lower 32 bits of the 64-bit register contain the quotient resulting from division. This is moved into a register in the MIPS register stack by the mflo command.

In MIPS assembly language code, signed division is supported by the div instruction and unsigned division by the divu instruction. MIPS hardware does not check for division by zero. Thus, a divide-by-zero exception must be detected and handled in system software. A similar comment holds for overflow or underflow resulting from division.

Figure 3.21 illustrates the MIPS ALU that supports integer arithmetic operations (+, -, x, /).
Figure 3.21. MIPS ALU supporting the integer arithmetic operations (+, -, x, /), adapted from [Maf01].

Self-Exercise. Show how the MIPS ALU in Figure 3.21 supports the integer arithmetic operations (+, -, x, /) using the algorithms and hardware diagrams given thus far. Hint: This exercise, or a part of it, is likely to be an exam question.

3.4. Floating Point Arithmetic
Floating point (FP) representations of decimal numbers are essential to scientific computation using scientific notation. The standard for floating point representation is the IEEE 754 Standard. In a computer, there is a tradeoff between range and precision: given a fixed number of binary digits (bits), precision can vary inversely with range. In this section, we overview decimal to FP conversion, MIPS FP instructions, and how registers are used for FP computations.

We have seen that an n-bit register can represent unsigned integers in the range 0 to 2^n - 1, as well as signed integers in the range -2^{n-1} to 2^{n-1} - 1. However, there are very large numbers (e.g., 3.15576 × 10^23), very small numbers (e.g., 10^-25), rational numbers with repeated digits (e.g., 2/3 = 0.666666...), irrationals such as 2^{1/2}, and transcendental numbers such as e = 2.718..., all of which need to be represented in computers for scientific computation to be supported.

We call the manipulation of these types of numbers floating point arithmetic because the decimal point is not fixed (as for integers). In C, such variables are declared as the float datatype.

3.4.1. Scientific Notation and FP Representation

Scientific notation has the following configuration:
and can be in normalized form (mantissa has exactly one digit to the left of the decimal point, e.g., 2.3425 × 10^19) or nonnormalized form. Binary scientific notation has the following configuration, which corresponds to the decimal forms:
Assume that we have the following normal format for scientific notation in Boolean numbers:

    +1.xxxxxxx_2 × 2^{yyyyy_2},

where "xxxxxxx" denotes the significand and "yyyyy" denotes the exponent, and we assume that the number has sign S. This implies the following 32-bit representation for FP numbers:
which can represent decimal numbers with magnitudes ranging from approximately 2.0 × 10^-38 to 2.0 × 10^38.

3.4.2. Overflow and Underflow

In FP, overflow and underflow are slightly different than in integer numbers. FP overflow (underflow) refers to the positive (negative) exponent being too large for the number of bits allotted to it. This problem can be somewhat ameliorated by the use of double precision, whose format is shown as follows:
Here, two 32-bit words are combined to support an 11-bit signed exponent and a 52-bit significand. This representation is declared in C using the double datatype, and can support numbers with decimal exponents ranging from -308 to 308. The primary advantage is greater precision in the mantissa. The following chart illustrates specific types of overflow and underflow encountered in standard FP representation:
3.4.3. IEEE 754 Standard

Both single and double precision FP representations are supported by the IEEE 754 Standard, which has been used in the vast majority of computers since its publication in 1985. IEEE 754 facilitates the porting of FP programs, and ensures minimum standards of quality for FP computer arithmetic. The result is a signed representation: the sign bit is 1 if the FP number represented by IEEE 754 is negative. Otherwise, the sign is zero. A leading value of 1 in the significand is implicit for normalized numbers. Thus, the significand field, whose value always lies between zero and one, provides 23 + 1 bits of precision in single precision FP and 52 + 1 bits in double precision (the extra bit being the implicit leading one). Zero is represented by a zero significand and a zero exponent; there is no leading value of one in the significand. The IEEE 754 representation is thus computed as:

FP number = (-1)^S × (1 + Significand) × 2^Exponent .

As a parenthetical note, the significand can be translated into decimal values via the following expansion:
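A minimal sketch of this computation in Python follows. It assumes the usual positional expansion, in which the i-th significand bit to the right of the binary point carries weight 2^-i. The function names significand_value and fp_value are hypothetical.

```python
def significand_value(bits):
    """Sum s1*2^-1 + s2*2^-2 + ... for a string of significand bits."""
    return sum(int(b) * 2.0 ** -(i + 1) for i, b in enumerate(bits))

def fp_value(sign, significand_bits, exponent):
    """Evaluate (-1)^S * (1 + Significand) * 2^Exponent for a normalized number."""
    return (-1) ** sign * (1.0 + significand_value(significand_bits)) * 2.0 ** exponent

print(significand_value("101"))  # 1/2 + 1/8
print(fp_value(0, "1", 1))       # +1.1 (binary) times 2^1
print(fp_value(1, "01", -1))     # -1.01 (binary) times 2^-1
```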
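With IEEE 754, it is possible to manipulate FP numbers without having special-purpose FP hardware. For example, consider the sorting of FP numbers. IEEE 754 facilitates breaking FP numbers up into three parts (sign, significand, exponent). The numbers to be sorted are ordered first according to sign (negative < positive), second according to exponent (larger exponent => larger number), and third according to significand (when one has at least two numbers with the same exponents).

Another issue of interest in IEEE 754 is biased notation for exponents. Observe that twos complement notation does not work well for exponents, because negative exponents would then have larger bit patterns than positive ones, and the field-by-field comparison described above would fail. Instead, we want the most negative exponent to have the smallest bit pattern (00000001_2) and the most positive exponent to have the largest (11111110_2 in single precision, since 11111111_2 is reserved for special values). Thus, we must add a bias term to the exponent to center the range of exponents on the bias number, which is then equated to zero. The bias term is 127 (1023) for the IEEE 754 single precision (double precision) representation. This implies that

FP number = (-1)^S × (1 + Significand) × 2^(Exponent - Bias) .

As a result, we have the following example of binary to decimal floating point conversion: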
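The biased formula can be sketched in Python as follows. The example word 0xC0A00000 and the function name decode_single are assumptions chosen for illustration, and the result is cross-checked against the machine's own interpretation of the same bits via the struct module.

```python
import struct

BIAS = 127  # single precision bias, as given in the text

def decode_single(word):
    """Split a 32-bit word into sign, biased exponent, and significand fields."""
    sign = (word >> 31) & 0x1
    exponent = (word >> 23) & 0xFF        # 8-bit biased exponent
    fraction = word & 0x7FFFFF            # 23 stored significand bits
    significand = fraction / 2.0 ** 23    # value between zero and one
    value = (-1) ** sign * (1.0 + significand) * 2.0 ** (exponent - BIAS)
    return sign, exponent, significand, value

word = 0xC0A00000  # hypothetical example word
print(decode_single(word))
# Cross-check: let the hardware interpret the same 32 bits.
print(struct.unpack(">f", struct.pack(">I", word))[0])
```

This sketch handles normalized numbers only. Words whose exponent field is 0 or 255 follow different rules.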
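Returning to the sorting of FP numbers described above, the following sketch orders single precision values using only integer comparisons on their bit patterns. One detail is an assumption added here and not spelled out in the text: since the format is sign and magnitude, the exponent and significand ordering must be reversed for negative numbers, which the sketch does by inverting their bits. The names single_bits and sort_key are hypothetical.

```python
import struct

def single_bits(x):
    """Return the 32-bit single precision pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def sort_key(x):
    """Map x to an unsigned integer whose order matches the FP order."""
    bits = single_bits(x)
    if bits & 0x80000000:          # negative: larger magnitude means smaller value
        return ~bits & 0xFFFFFFFF
    return bits | 0x80000000       # nonnegative: place above all negatives

values = [3.0, -2.5, 0.75, -0.125, 100.0, 1.5]
by_integer_key = sorted(values, key=sort_key)
print(by_integer_key)
print(by_integer_key == sorted(values))
```

Because the exponent field sits above the significand field and is stored in biased form, a single integer comparison compares exponent first and significand second. NaNs are ignored in this sketch.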
Decimal to binary FP conversion is somewhat more difficult. Three cases pertain: (1) the decimal number can be expressed as a fraction n/d where d is a power of two; (2) the decimal number has repeated digits (e.g., 0.33333); or (3) the decimal number does not fit either Case 1 or Case 2. In Case 1, one selects the exponent as -log2(d), and converts n to binary notation. Case 3 is more difficult, and will not be discussed here. Case 2 is exemplified in the following diagram:
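Cases 1 and 2 can be explored with a short Python sketch that generates binary digits by repeated doubling. This is a common textbook method, assumed here and not necessarily the one used in the diagram. Exact rational arithmetic from the standard fractions module avoids rounding during the conversion, and the function name binary_fraction_digits is hypothetical.

```python
from fractions import Fraction

def binary_fraction_digits(frac, count):
    """Return the first `count` binary digits after the point, for 0 <= frac < 1."""
    digits = []
    for _ in range(count):
        frac *= 2
        bit = int(frac >= 1)   # the integer part is the next binary digit
        digits.append(str(bit))
        frac -= bit
    return "".join(digits)

print(binary_fraction_digits(Fraction(5, 8), 6))  # Case 1: terminates (d = 8)
print(binary_fraction_digits(Fraction(1, 3), 8))  # Case 2: repeats forever
```

In Case 2 the repeating pattern must be cut off at the width of the significand, which is where rounding error enters.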
Here, the significand is 10101010101010101010101, the sign is negative (representation = 1), and the exponent is computed as 1 + 127 = 128_10 = 10000000_2. This yields the following representation in IEEE 754 standard notation:
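The three fields just described can be assembled into a 32-bit word as sketched below, assuming the single precision layout of 1 sign bit, 8 exponent bits, and 23 significand bits. The function name encode_single is hypothetical.

```python
import struct

def encode_single(sign, biased_exponent, significand_bits):
    """Pack sign, 8-bit biased exponent, and 23 significand bits into one word."""
    fraction = int(significand_bits, 2)
    return (sign << 31) | (biased_exponent << 23) | fraction

word = encode_single(1, 1 + 127, "10101010101010101010101")
value = struct.unpack(">f", struct.pack(">I", word))[0]

print(f"{word:032b}")  # sign, then exponent, then significand
print(hex(word))
print(value)           # roughly -3.3333
```

Decoding the word gives a value close to -10/3, whose binary expansion has the repeating pattern seen in the significand.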
The following table summarizes special values that can be represented using the IEEE 754 standard.

Table 3.1. Special values in the IEEE 754 standard.
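The special values can also be recognized directly from the exponent and significand fields. The sketch below assumes the standard IEEE 754 single precision encodings (exponent field 0 for zero and denormalized numbers, exponent field 255 for infinity and NaN). The function name classify_single is hypothetical.

```python
def classify_single(word):
    """Classify a 32-bit single precision word by its exponent and significand."""
    exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 255:
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"

for w in (0x00000000, 0x00000001, 0x3F800000, 0x7F800000, 0x7FC00000):
    print(f"{w:#010x}", classify_single(w))
```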
Of particular interest in the preceding table is the NaN (not a number) representation. For example, when taking the square root of a negative number, or when dividing zero by zero, we encounter operations that are undefined in the arithmetic operations over real numbers. These results are called NaNs and are represented with an exponent of 255 and a nonzero significand (an exponent of 255 with a zero significand denotes infinity). NaNs can help with debugging, but they contaminate calculations (e.g., NaN + x = NaN). The recommended approach to NaNs, especially for software designers or engineers early in their respective careers, is not to use NaNs.

Another variant of FP representation is denormalized numbers, also called denorms. These number representations were developed to remedy the problem of a gap among representable FP numbers near zero. For example, in single precision the smallest positive normalized number is x = 1.00..._2 × 2^-126, and the second smallest positive normalized number is y = 1.00...1_2 × 2^-126 = 2^-126 + 2^-149. This implies that the gap between zero and x is 2^-126 and that the gap between x and y is 2^-149, as shown in Figure 3.22a.
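The gap near zero can be inspected numerically. The sketch below builds single precision values directly from bit patterns with the struct module. The helper name single_from_bits is hypothetical, and the bit patterns assume the standard IEEE 754 single precision layout.

```python
import struct

def single_from_bits(word):
    """Interpret a 32-bit pattern as an IEEE 754 single precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

smallest_normal = single_from_bits(0x00800000)  # exponent field 1, significand 0
next_normal = single_from_bits(0x00800001)      # one unit in the last place higher
smallest_denorm = single_from_bits(0x00000001)  # exponent field 0, significand 0...01
second_denorm = single_from_bits(0x00000002)

print(smallest_normal == 2.0 ** -126)                # gap between zero and x
print(next_normal - smallest_normal == 2.0 ** -149)  # gap between x and y
print(smallest_denorm == 2.0 ** -149)                # denorms fill the gap
print(second_denorm == 2.0 ** -148)
```

With denorms, the spacing of representable numbers between zero and the smallest normalized number equals the spacing just above it.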
Figure 3.22. Denorms: (a) Gap between zero and 2^-126, and (b) Denorms close this gap, adapted from [Maf01].
This situation can be remedied by omitting the leading one from the significand, thereby denormalizing the FP representation. In single precision, the smallest positive number is now the denorm 0.0...1_2 × 2^-126 = 2^-149, and the second smallest positive number is 2^-148.

3.4.4. FP Arithmetic

Because the FP representation is only an approximation to a real number, applying mathematical operations to real numbers implies that some error will occur. One consequence is that FP addition and subtraction are not associative.

Example 1. Using decimal numbers for clarity, let x = -1.5 × 10^38, y = 1.5 × 10^38, and z = 1.0. With floating point representation, we have:

x + (y + z) = -1.5 × 10^38 + (1.5 × 10^38 + 1.0) = 0.0

and

(x + y) + z = (-1.5 × 10^38 + 1.5 × 10^38) + 1.0 = 1.0

The difference occurs because the value 1.0 cannot be distinguished in the significand of 1.5 × 10^38 due to insufficient precision (number of digits) of the significand in the FP representation of these numbers (IEEE 754 assumed).

The preceding example leads to several implementational issues in FP arithmetic. Firstly, rounding occurs when performing math on real numbers, due to lack of sufficient precision. For example, when multiplying two N-bit numbers, a 2N-bit product results. Since only the upper N bits of the 2N-bit product are retained, the lower N bits are truncated. This is also called rounding toward zero.

Another type of rounding is called rounding to infinity. Here, if rounding toward +infinity, then we always round up. For example, 2.001 is rounded up to 3, and -2.001 is rounded up to -2. Conversely, if rounding toward -infinity, then we always round down. For example, 1.999 is rounded down to 1, and -1.999 is rounded down to -2. A more familiar technique is rounding to nearest, where, for example, 3.7 is rounded to 4, and 3.1 is rounded to 3. In this case, a value of the form n.5 is rounded to the nearest even number, e.g., 3.5 is rounded to 4, and 2.5 is rounded to 2.

A second implementational issue in FP arithmetic is addition and subtraction of numbers that have nonzero significands and exponents. Unlike integer addition, we can't just add the significands. Instead, one must:

1. Denormalize the operands and shift one of the operands to make the exponents of both numbers equal (we denote the exponent by E).
2. Add or subtract the significands to get the resulting significand.
3. Normalize the resulting significand and change E to reflect any shifts incurred by normalization.

We will review several approaches to floating point operations in MIPS in the following section.
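The three steps above can be sketched with a toy adder in Python. This is a simplified model under stated assumptions, not the IEEE 754 algorithm: numbers are pairs (sig, exp) meaning sig × 2^exp, the significand keeps P = 24 bits, bits shifted out are truncated (rounding toward zero), and special values are ignored. All function names are hypothetical. Running it on the values of Example 1 shows the loss of associativity.

```python
import math

P = 24  # significand precision in bits (23 stored + 1 implicit, as in single precision)

def from_float(x):
    """Represent x as (sig, exp) with x ~= sig * 2**exp and |sig| < 2**P."""
    if x == 0.0:
        return (0, 0)
    m, e = math.frexp(x)                # x = m * 2**e with 0.5 <= |m| < 1
    return (int(m * (1 << P)), e - P)   # int() truncates extra bits toward zero

def to_float(num):
    sig, exp = num
    return math.ldexp(sig, exp)

def normalize(sig, exp):
    """Step 3: make |sig| occupy exactly P bits and adjust the exponent."""
    if sig == 0:
        return (0, 0)
    sign = -1 if sig < 0 else 1
    mag = abs(sig)
    while mag >= (1 << P):         # too wide: shift right, dropping low bits
        mag >>= 1
        exp += 1
    while mag < (1 << (P - 1)):    # too narrow: shift left
        mag <<= 1
        exp -= 1
    return (sign * mag, exp)

def fp_add(a, b):
    (sa, ea), (sb, eb) = a, b
    if sa == 0:
        return b
    if sb == 0:
        return a
    if ea < eb:                    # make a the operand with the larger exponent
        (sa, ea), (sb, eb) = (sb, eb), (sa, ea)
    # Step 1: shift the smaller operand right until the exponents are equal.
    sign_b = -1 if sb < 0 else 1
    sb = sign_b * (abs(sb) >> (ea - eb))
    # Step 2: add the aligned significands.
    total = sa + sb
    # Step 3: normalize the result.
    return normalize(total, ea)

x, y, z = from_float(-1.5e38), from_float(1.5e38), from_float(1.0)
print(to_float(fp_add(x, fp_add(y, z))))  # z is shifted out entirely during alignment
print(to_float(fp_add(fp_add(x, y), z)))  # x and y cancel first, so z survives
```

Subtraction needs no separate routine in this model, since adding an operand with a negative significand has the same effect.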
3.5. Floating Point in MIPS
The MIPS FP architecture uses separate floating point instructions for IEEE 754 single and double precision. Single precision uses add.s, sub.s, mul.s, and div.s, whereas double precision instructions are add.d, sub.d, mul.d, and div.d. These instructions are much more complicated than their integer counterparts. Problems with implementing FP arithmetic include inefficiencies in having different instructions that take significantly different times to execute (e.g., division versus addition). Also, FP operations require much more hardware than integer operations.

Thus, in the spirit of RISC design philosophy, we note that (a) a particular datum is not likely to change its datatype within a program, and (b) some types of programs do not require FP computation. Thus, in 1990, the MIPS designers decided to separate the FP computations from the remainder of the ALU operations, and use a separate chip for FP (called the coprocessor). A MIPS coprocessor contains 32 32-bit registers designated as $f0, $f1, ..., etc. Most of these registers are specified in the .s and .d instructions. Double precision operands are stored in register pairs (e.g., $f0,$f1 up to $f30,$f31).

The CPU thus handles all the regular computation, while the coprocessor handles the floating point operations. Special instructions are required to move data between the coprocessor(s) and CPU (e.g., mfc0, mtc0, mfc1, mtc1, etc.), where cn refers to coprocessor #n. Similarly, special I/O operations are required to load and store data between the coprocessor and memory (e.g., lwc0, swc0, lwc1, swc1, etc.).

FP coprocessors require very complex hardware, as shown in Figure 3.23, which portrays only the hardware required for addition.
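To make the idea of register pairs concrete, the following Python sketch models the 32 FP registers as a list of 32-bit integers and splits a double precision value across an even/odd pair. Which register of the pair holds which half is an assumption made here (low-order word in the even register), and the names store_double and load_double are hypothetical.

```python
import struct

def store_double(regs, n, x):
    """Store double x in the register pair (n, n+1), where n is even."""
    if n % 2 != 0:
        raise ValueError("register pairs start at an even register number")
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]  # 64-bit pattern of x
    regs[n] = bits & 0xFFFFFFFF   # assumed: low-order word in the even register
    regs[n + 1] = bits >> 32      # assumed: high-order word in the odd register

def load_double(regs, n):
    """Reassemble the double held in the register pair (n, n+1)."""
    bits = (regs[n + 1] << 32) | regs[n]
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

fp_regs = [0] * 32                # models $f0 ... $f31
store_double(fp_regs, 0, 1.0)
print(hex(fp_regs[1]), hex(fp_regs[0]))
print((fp_regs[1] >> 20) & 0x7FF)  # the 11-bit biased exponent of 1.0
print(load_double(fp_regs, 0))
```

The high-order word carries the sign, the 11-bit exponent, and the top 20 significand bits, while the low-order word carries the remaining 32 significand bits.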
Figure 3.23. MIPS ALU supporting floating point addition, adapted from [Maf01].

The use of floating point operations in MIPS assembly code is described in the following simple example, which implements a C program designed to convert Fahrenheit temperatures to Celsius.
Here, we assume that there is a coprocessor c1 connected to the CPU. The values 5.0 and 9.0 are respectively loaded into registers $f16 and $f18 using the lwc1 instruction with the global pointer as base address and the variables const5 and const9 as offsets. The single precision division operation puts the quotient of 5.0/9.0 into $f16, and the remainder of the computation is straightforward. As in all MIPS procedure calls, the jr instruction returns control to the address stored in the $ra register.
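The sequence of single precision operations can be mimicked in Python by rounding after every step, as sketched below. The sketch assumes the usual conversion formula, Celsius = (5.0/9.0) × (Fahrenheit - 32.0). The helper single, the function name f2c, and the name const32 are hypothetical, and the comments indicate which MIPS instruction each step is meant to resemble.

```python
import struct

def single(x):
    """Round a Python float (double precision) to single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def f2c(fahr):
    const5 = single(5.0)                   # lwc1: load 5.0
    const9 = single(9.0)                   # lwc1: load 9.0
    const32 = single(32.0)                 # lwc1: load 32.0 (hypothetical name)
    ratio = single(const5 / const9)        # div.s: 5.0 / 9.0
    diff = single(single(fahr) - const32)  # sub.s: fahr - 32.0
    return single(ratio * diff)            # mul.s: final product

print(f2c(32.0))   # freezing point of water
print(f2c(212.0))  # boiling point of water, up to single precision rounding
```

Because 5.0/9.0 is a repeating binary fraction, the quotient is rounded when stored, so results such as f2c(212.0) may differ from the exact answer in the last few bits.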
References
[Maf01] Mafla, E. Course Notes, CDA3101, at URL http://www.cise.ufl.edu/~emafla/ (as of 11 Apr 2001).
[Pat98] Patterson, D.A. and J.L. Hennessy. Computer Organization and Design: The Hardware/Software Interface, Second Edition, San Francisco, CA: Morgan Kaufmann (1998).


