
Intro to Hadoop and MapReduce

Lesson 2 Notes

Introduction
In this lesson we'll take a deeper look at the two key parts of Hadoop: how it stores the data, and how it processes it. Let's start by seeing how data is stored.

HDFS
Files are stored in something called the Hadoop Distributed File System, which everyone just refers to as HDFS. As a developer, this looks very much like a regular file system, the kind you're used to working with on a standard machine. But it's helpful to understand what's going on behind the scenes, so that's what we're going to talk about here.

When a file is loaded into HDFS, it's split into chunks, which we call blocks. Each block is pretty big: the default is 64 megabytes. So, imagine we're going to store a file called mydata.txt, which is 150 megabytes in size. As it's uploaded to the cluster, it's split into 64-megabyte blocks, and each block will be stored on one node in the cluster. Each block is given a unique name by the system: it's actually just blk, then an underscore, then a large number. In this case, the file will be split into three blocks: the first will be 64 megabytes, the second will be 64 megabytes, and the third will be the remaining 22 megabytes.
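The arithmetic above can be sketched in a few lines of Python. This is only a model of the splitting rule, not Hadoop's actual code; the function name and the megabyte units are ours.

```python
# A minimal sketch of how a file is split into fixed-size blocks.
BLOCK_SIZE_MB = 64  # the HDFS default block size used in this lesson

def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes of the blocks a file would be split into."""
    blocks = []
    remaining = file_size_mb
    while remaining > 0:
        blocks.append(min(block_size_mb, remaining))
        remaining -= block_size_mb
    return blocks

# mydata.txt is 150 MB: two full 64 MB blocks plus the 22 MB remainder.
print(split_into_blocks(150))  # [64, 64, 22]
```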

There's a daemon, or piece of software, running on each of these cluster nodes called the DataNode, which takes care of storing the blocks.

Copyright 2014 Udacity, Inc. All Rights Reserved.

Now, clearly we need to know which blocks make up the file. That's handled by a separate machine, running a daemon called the NameNode. The information stored on the NameNode is known as the metadata.

QUIZ - Are there problems?


That's fine as far as it goes, but there are some problems with this. Take a look at the diagram, and see if you can spot where we might run into trouble.

[ ] Network failure between nodes
[ ] Disk failure on DataNode
[ ] Not all DataNodes are used
[ ] Block sizes are different
[ ] Disk failure on NameNode

Answer:
That some nodes are not used is not a problem, since they can be used for different files; neither are the different block sizes. Network and disk failures, however, are certainly a problem. Let's look into this in more detail.

Data Redundancy
The problem with things right now is that if one of our nodes fails, we're left with missing data for the file. If this node goes away, for example, we've got a 64-megabyte hole in the middle of mydata.txt. And, of course, there are similar problems with any other files which have blocks stored on that node.

To solve this problem, Hadoop replicates each block three times as it's stored in HDFS. So blk_1 doesn't just live here; it's also stored perhaps here and here. blk_2 isn't just here, but also maybe here and here. And similarly for blk_3. Hadoop just picks three random nodes, and puts one copy of the block on each of the three. Well, actually, it's not a totally random choice, but that's close enough for us right now.


Now, if a single node fails, it's not a problem, because we have two other copies of the block on other nodes. And the NameNode is smart enough that if it sees that any of the blocks are under-replicated, it will arrange to have those blocks re-replicated on the cluster, so we're back to having three copies of them again.
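The bookkeeping described above can be sketched as follows. This is a toy model of the idea, not Hadoop's internals; the node and block names are made up for illustration.

```python
# Find blocks with fewer than the target number of replicas, the way the
# NameNode conceptually does before arranging re-replication.
REPLICATION_FACTOR = 3

def under_replicated(block_locations, target=REPLICATION_FACTOR):
    """block_locations maps a block name to the set of nodes holding a copy."""
    return [blk for blk, nodes in block_locations.items() if len(nodes) < target]

# Suppose node2 has failed and its copies are gone:
locations = {
    "blk_1": {"node1", "node3", "node4"},
    "blk_2": {"node1", "node3"},          # lost its copy on node2
    "blk_3": {"node4", "node5", "node6"},
}
print(under_replicated(locations))  # ['blk_2'], which needs re-replication
```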

QUIZ - Any problems now?


OK, so, we've taken care of what happens if one of our DataNodes fails. But there's another obvious single point of failure here. What happens if the NameNode has a hardware problem?

[ ] Data on HDFS may be inaccessible
[ ] Data on HDFS may be lost forever
[ ] There is no problem


Answer:
If there is a network failure, the data will be inaccessible temporarily. If the disk of the single NameNode were to fail, the data on HDFS would be lost permanently.

NameNode High Availability


For a long time, the NameNode was a single point of failure in Hadoop. If it died, the entire cluster was inaccessible. And if the metadata on the NameNode was lost, the entire cluster's data was lost. Sure, you've still got all the blocks on the DataNodes, but you've no way of knowing which block belongs to which file without the metadata. So to avoid that problem, people would configure the NameNode to store the metadata not just on its own hard drive but also somewhere else on the network using NFS, which is a method of mounting a remote disk on the NameNode. That way, even if the NameNode burst into flames, there would be a copy of the metadata somewhere else on the network.

Active and Standby NameNode


These days, the problem of the NameNode being a single point of failure has been solved. Most production Hadoop clusters now have two NameNodes: one Active, and one Standby. The Active NameNode works as before, but the Standby can be configured to take over automatically if the Active one fails. That way, the cluster will keep running if any of the nodes, even one of the NameNodes, fails.

Ian's now going to show you a demonstration of how to use HDFS.

DEMO of HDFS


So, here I have a directory on my local machine, which contains a couple of files, and I want to put one of them into HDFS. All of the commands which interact with the Hadoop file system start with hadoop fs. So first of all, let's see what we have in HDFS to start with. I do that by saying hadoop fs -ls. That gives me a listing of what's in my home directory on the Hadoop cluster. Because I'm logged into the local machine as a user called training, my home directory in HDFS is /user/training. And as you can see, there's nothing there. So now, let's upload our purchases.txt file. We do that with hadoop fs -put purchases.txt. hadoop fs -put takes a local file and places it into HDFS.

Since I'm not specifying a destination filename, it'll be uploaded with the same filename. So, it takes a few seconds to upload. And now I can do another hadoop fs -ls, and we can see that the file is now in HDFS.

I can take a look at the last few lines of the file by saying hadoop fs -tail, and then the filename, and that just displays the last few lines on the screen for me.

There's also a hadoop fs -cat, which will display the entire contents of the file, and we'll use that later. There are plenty of other hadoop fs commands, and as you'll probably have started to realize, they closely mirror standard UNIX file system commands. So, if I want to rename the file, for example, I can say hadoop fs -mv, which moves purchases.txt, in this case, to newname.txt.

If I want to delete a file, hadoop fs -rm will remove that file for me. So, let's get rid of newname.txt from HDFS.

I create a directory in HDFS by saying hadoop fs -mkdir and then the directory name, and now let's upload purchases.txt and place it in the myinput directory so that it's ready for processing. Once I've done that, hadoop fs -ls myinput will show me the contents of that directory. And just as I expected, there's the file.

MapReduce
Thanks, Ian. OK, now we've seen how files are stored in HDFS, let's discuss how that data is processed with MapReduce. Say I had a large file. Processing that serially from the top to the bottom could take a long time.

Instead, MapReduce is designed to be a very parallelized way of managing data, meaning that your input data is split into many pieces, and each piece is processed simultaneously. To explain, let's take a real-world scenario.

Let's imagine we run a retailer with thousands of stores around the world. And we have a ledger which contains all the sales from all the stores, organized by date. We've been asked to calculate the total sales generated by each store over the last year.

Now, one way to do that would be just to start at the beginning of the ledger and, for each entry, write the store name and the amount next to it. For the next entry, I need to see if I've already got that store written down; if I have, I can add the amount to that store. If not, I write down a new store name and that first purchase. And so on, and so on.

Hashtables


Typically, this is how we'd solve things in a traditional computing environment: we'd create some kind of associative array or hash table for the stores, then process the input file one entry at a time.
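The traditional single-machine approach can be sketched like this. The record format is assumed: (store name, sale amount) pairs, with amounts kept as whole units for simplicity.

```python
# A minimal sketch of the hash-table approach: one pass over the ledger,
# keeping a running total per store in an in-memory dictionary.
def total_sales(records):
    totals = {}  # store name -> running total
    for store, amount in records:
        totals[store] = totals.get(store, 0) + amount
    return totals

ledger = [("Miami", 12), ("NYC", 99), ("Miami", 3)]
print(total_sales(ledger))  # {'Miami': 15, 'NYC': 99}
```

This works fine for a small ledger; the quiz below asks what goes wrong at scale.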

What problems do you see with such an approach, if you run this on 1 TB of data?
[ ] It will not work
[ ] You might run out of memory
[ ] It will take a long time
[ ] The end result might be incorrect

Answer:
First of all, you've got millions and millions of sales to process, so it's going to take an awfully long time for your computer to first read the file from disk and then process it. Also, the more stores you have, the longer it takes to check the totals sheet, find the right store, and add the new value to the running total for that store. Again, it would take a long time, and you may even run out of memory to hold your array if you really do have a huge number of stores. So instead, let's see how you would do this as a MapReduce job.

Mappers and Reducers


We'll take the staff in the accounts department and split them into two groups. We'll call them the Mappers and the Reducers. Then we'll break the ledger down into chunks, and give each chunk to one of the Mappers. All of the Mappers can work at the same time, and each one is working on just a small fraction of the overall data.

Here's what a Mapper will do. They will take the first record in their chunk of the ledger, and on an index card they'll write the store name as the heading. Underneath, they'll write the sale amount for that record. Then they'll take the next record, and do the same thing. As they're writing the index cards, they'll pile them up so that all the cards for one particular store go on the same pile. By the end, each Mapper will have a pile of cards per store.


Once the Mappers have finished, the Reducers can collect their sets of cards. We tell each Reducer which stores they're responsible for. The Reducers go to all the Mappers and retrieve the piles of cards for their own stores. It's fast for them to do, because each Mapper has separated the cards into a pile per store already. Once the Reducers have retrieved all their data, they collect all the small piles per store and create a large pile per store. Then they start going through the piles, one at a time. All they have to do at this point is add up all the amounts on all the cards in a pile, and that gives them the total sales for that store, which they can write on their final totals sheet. And to keep things organized, each Reducer goes through his or her set of piles of cards in alphabetical order.

MapReduce
And that's MapReduce! The Mappers are programs which each deal with a relatively small amount of data, and they all work in parallel. The Mappers output what we call intermediate records, which in this case were our index cards. Hadoop deals with all data in the form of records, and records are key-value pairs. In this example, the key was the store name, and the value was the sale total for that particular piece of input. Once the Mappers have finished, a phase of MapReduce called the Shuffle and Sort takes place. The shuffle is the movement of the intermediate data from the Mappers to the Reducers and the combination of all the small sets of records together, and the sort is the fact that the Reducers will organize the sets of records (the piles of index cards in our example) into order. Finally, the Reduce phase works on one set of records (one pile of cards) at a time: it gets the key, and then a list of all the values, it processes those values in some way (adding them up in our case), and then it writes out its final data for that key.
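The three phases just described can be modeled in plain Python. This is an in-memory sketch of the idea, not Hadoop's Streaming interface; the function names are ours.

```python
# Map, shuffle-and-sort, and reduce for the sales-per-store example.
from collections import defaultdict

def mapper(record):
    store, amount = record
    return (store, amount)  # an intermediate record: a key-value pair

def shuffle_and_sort(intermediate):
    piles = defaultdict(list)           # shuffle: group values by key
    for key, value in intermediate:
        piles[key].append(value)
    return sorted(piles.items())        # sort: keys in order per reducer

def reducer(key, values):
    return (key, sum(values))           # total sales for one store

ledger = [("NYC", 2), ("Miami", 5), ("NYC", 3)]
intermediate = [mapper(r) for r in ledger]
results = [reducer(k, vs) for k, vs in shuffle_and_sort(intermediate)]
print(results)  # [('Miami', 5), ('NYC', 5)]
```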


QUIZ: Final result


Since the intermediate data is only sorted per Reducer, how could we get the final results into total sorted order?

[ ] Can't be done
[ ] Have only one Reducer
[ ] Merge the result files after the job

Answer:

You could either have a single Reducer, or merge the result files after the job.

QUIZ: Two reducer problem


Assume you have a job which has two Reducers. The Mappers output the following keys: Apple, Banana, Carrot, Grape. Which keys will go to the first of the two Reducers?

[ ] Apple and Banana
[ ] Apple and Carrot
[ ] Carrot and Grape
[ ] Apple and Grape
[ ] We don't know, but two will go to each Reducer
[ ] We don't know, and it's possible that one Reducer will not get any of the keys

Answer:
Since there is no guarantee that each Reducer will get the same number of keys, it might be that one of them will get none. For more information on how this works, see the links in the Instructor Notes.

Daemons of MapReduce
So we've seen conceptually how MapReduce works. In the next lesson, we'll talk about how to actually write code to perform MapReduce jobs on the cluster, but before we do that it's useful to know where the code will actually run.

Just as with HDFS, there are a set of daemons, which are basically just pieces of code which run all the time, that control MapReduce on the cluster. When you run a MapReduce job, you submit the code to a machine called the JobTracker. That splits the work into Mappers and Reducers, and those Mappers and Reducers will run on the cluster nodes. Running the actual Map tasks and Reduce tasks is handled by a daemon called the TaskTracker, which runs on each of the slave nodes in the cluster. Notice that since the TaskTrackers run on the same machines as the DataNodes, the Hadoop framework will be able to have Map tasks work on pieces of data that are stored on the same machine, which will save a lot of network traffic.

As we saw, each Mapper processes a portion of the input data known as an InputSplit, and by default Hadoop will use an HDFS block as the InputSplit for each Mapper. It will try to make sure that a Mapper works on data which is on the same machine as the block itself, so in an ideal world, the Mapper which processes a block will run on one of the machines which actually stores that block. If block 2 needs processing, for example, it will ideally be processed on this machine, this machine, or this machine. That won't always be possible, because the TaskTrackers on all three machines may already be busy, in which case the data will be streamed to another node for processing, but it should happen the majority of the time.

The Mappers read the input data, and produce intermediate data which the Hadoop framework then passes to the Reducers; that's the shuffle and sort. The Reducers process that data, and write their final output back to HDFS.

So let's have Ian run a job on our cluster.


Running a Job
It's often the case that MapReduce code is written in Java. However, to make things a little easier for us, we've actually written our mapper and reducer in Python instead. And we can do that thanks to a feature called Hadoop Streaming, which allows you to write your code in pretty much any language you'd like. So first of all, let's double-check that we have our input data in HDFS. So, if I hadoop fs -ls, then there's my input directory. And if I look at that directory, then yes, there's purchases.txt in there. And in my local directory, I have mapper.py and reducer.py; that's the code for the mapper and reducer, written in Python. We'll look at the actual code in the next lesson.

Okay, to submit the job we have to give this rather cumbersome command. We say hadoop jar, a path to a jar, then I specify the mapper, I specify the reducer, and I need to say file for both the mapper and the reducer code. I specify the input directory in HDFS and I specify the output directory to which the reducers will write their output data. And we're calling that directory joboutput.

I hit Enter and off we go. Hadoop's pretty verbose, as you can see. As the job runs, you'll see a bunch of output which shows us how far along the job is. It turns out that for this job Hadoop will be running four mappers, and our virtual machine here can only run two at a time, so the job is going to take longer than it would on a larger cluster. Actually, that's worth mentioning here. With the size of the data we have for this example, which is only 200 megs, realistically, we could probably have solved this problem faster by just importing the data into a relational database and querying it from there. And that's often the case when we're developing and testing code. Because the test data sets are pretty small, Hadoop isn't necessarily the optimal tool for the job. But when we're done testing and we need to process our full production data, that's when Hadoop really comes into its own. So, as you can see the job is now nearly complete, and when the job has finished we'll see that the last line tells me that the output directory is called joboutput.


Let's take a look at what we've got in there. hadoop fs -ls shows me that yes, I do have a joboutput directory. And if we look at the joboutput directory, you'll see that it contains three things. It contains a file called _SUCCESS, which just tells me that the job has successfully completed. It contains a directory called _logs, which contains some log information about what happened during the job's run. And then, it contains a file called part-00000. That file is the output from the one reducer that we had for this job.

Let's take a look at that by saying hadoop fs -cat part-00000, and we'll pipe that to less on our local machine.

That's the contents of that file, which is the output from our reducer. It's the sum total sales broken down by store, exactly as we wanted it.


Incidentally, if you want to retrieve data from HDFS and put it onto your local disk, you can do that with hadoop fs -get. hadoop fs -get is the opposite of hadoop fs -put. It just pulls data from HDFS and puts it on the local disk. So as you can see, now I have the file on my local disk, and I can manipulate it however I'd like.

That Hadoop job command we typed was pretty painful to have to remember. So to save you time, we've created an alias in the demo virtual machine that you'll be downloading. You can just type hs followed by four arguments: the mapper script, the reducer script, the input directory, and the output directory.

Here's one important thing to note, though. When you're running a Hadoop job, the output directory must not already exist. And as you can see, if we try and run the command with an existing directory, in this case joboutput, Hadoop refuses to run the job.

This is actually a feature of Hadoop. It's designed to stop you inadvertently deleting or overwriting data that's already in the cluster. But as you can see, if we specify a different directory, which doesn't already exist, then the job will begin just fine.

Processing Logs
The example we just talked about was calculating the total sales per store. And there are lots of other things we can do with MapReduce that are actually quite similar, conceptually, to that. For example, log processing is really quite similar. Imagine you have a set of log files from a Web server which look like this, and you want to know how many times each page has been hit.

Well, it's really similar to the sales per store. Your Mapper will read a line of the log file at a time, and will extract the name of the page, like index.html, for example.

Its intermediate data will have the name of the page as the key, and a 1 as the value, because you've found one hit to the page at that position in the log. When all the Mappers are done, the Reducers will get the keys, and a list of all the values for each particular key. They can then just add all the 1s up for a key, and that will tell them the total number of hits to that page on the Web site. Simple, but far more efficient than writing a standalone program to go through all the logs from start to finish if you have hundreds of gigabytes to process.
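The page-hit count can be sketched the same way as the sales example. The log format here is an assumption (an Apache-style access log with the requested page as the seventh whitespace-separated field); real logs may differ, and the sample lines are made up.

```python
# Map each log line to (page, 1), then sum the 1s per page.
from collections import defaultdict

def mapper(log_line):
    fields = log_line.split()
    page = fields[6]     # e.g. "/index.html" (assumed field position)
    return (page, 1)     # one hit for this page

def reduce_hits(intermediate):
    hits = defaultdict(int)
    for page, count in intermediate:
        hits[page] += count
    return dict(hits)

log = [
    '10.0.0.1 - - [10/Oct/2014:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Oct/2014:13:55:40 -0700] "GET /about.html HTTP/1.1" 200 128',
    '10.0.0.3 - - [10/Oct/2014:13:55:41 -0700] "GET /index.html HTTP/1.1" 200 512',
]
print(reduce_hits(mapper(line) for line in log))
# {'/index.html': 2, '/about.html': 1}
```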

Practice makes perfect



And that's just the start of what you can do with MapReduce. Things like fraud detection, recommender systems, item classification: there are many, many applications of MapReduce, but they all start with those simple concepts. And they all share some basic characteristics: there's a lot of data to be processed, and the work can be parallelized; you don't have to just start at the beginning and slog through to the end.

Perhaps the hardest thing to learn when you're new to Hadoop is how to solve problems by thinking in terms of MapReduce. It's a very different way of processing data compared to how you're probably used to working, and, honestly, the best way to learn is by practice. In the next lesson we'll write the code to solve our sales-by-store problem, and you'll start to work on other MapReduce problems.

Virtual Machine Setup


We've provided a virtual machine with CDH, Cloudera's distribution of Hadoop, preinstalled. We say that this VM is running a cluster in pseudo-distributed mode. That means it's a complete Hadoop cluster running on a single machine. It's a great way to write and test code, because it really is a complete Hadoop cluster, just one which is running on a single machine. The VM also contains our sample data sets and sample solutions to the problems we're going to ask you to solve. If you haven't already downloaded it, now would be a good time to do so. You can find instructions on how to do that in the Instructor Notes for this lesson.

Once you've downloaded and started up the VM, we'd like you to try uploading a data set into HDFS and running a MapReduce job yourself. The exercise instructions document in the Instructor Notes section gives you step-by-step instructions on what to do. Have fun!

Conclusion

So, that's the end of the lesson. You learned about how Hadoop uses HDFS to store data, and the basic principles behind MapReduce. In the next lesson, we'll look at the MapReduce code itself; by the end of the lesson you'll be ready to write your own programs to analyze data.

Number of Reducers
One thing worthy of note is that you, as a developer, specify how many Reducers you want for your job. The default is to have a single Reducer, but for large amounts of data it often makes sense to have many more than one. Otherwise, that one Reducer will end up having to process a huge amount of data from the Mappers. The Hadoop framework decides which keys get sent to each Reducer, and there's no guarantee that each Reducer will get the same number of keys. The keys which go to a particular Reducer are sorted, but each Reducer writes its own file into HDFS. So if, for example, we had four keys, a, b, c, and d, and two Reducers, then one Reducer might get keys a and d, and the other might get b and c. The results would be sorted within each Reducer's output, but just joining the files together wouldn't produce completely sorted output.
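The key-to-Reducer assignment can be sketched with hash partitioning, which is how Hadoop assigns keys by default (Hadoop uses the key's hashCode modulo the number of reduce tasks; the hash below is only an illustrative stand-in). Note each Reducer's keys come out sorted, but concatenating the two outputs would not be globally sorted.

```python
# A toy partitioner: deterministically assign each key to one of two Reducers.
NUM_REDUCERS = 2

def partition(key, num_reducers=NUM_REDUCERS):
    # Stand-in for Hadoop's key.hashCode() % numReduceTasks.
    return sum(key.encode()) % num_reducers

keys = ["a", "b", "c", "d"]
per_reducer = {r: sorted(k for k in keys if partition(k) == r)
               for r in range(NUM_REDUCERS)}
print(per_reducer)  # {0: ['b', 'd'], 1: ['a', 'c']}
```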

QUIZ:
Before we move on, though, which of the following types of problem do you think are good candidates to solve with MapReduce?
[ ] Detecting anomalous behavior from a log file
[ ] Calculating returns from a large number of stock portfolios
[ ] Very large matrix inversion
[ ] (something else)

Answer: The answer is that all but very large matrix inversion are good candidates to solve with MapReduce. The reason matrix inversion is not is that matrix manipulation tends to require holding the entire contents of the matrix in memory at once, rather than processing individual portions. You can do it with MapReduce, but it turns out to be quite difficult.

