12/24/2015

What classes should I take if I want to become a data scientist? - Quora
What classes should I take if I want to become a data scientist?
Given the reported talent gap for data scientists (http://www.emc.com/collateral/ab... ), how should universities and industry be training people?

Answer Wiki
More specific versions of this question for particular universities (feel free to add yours!):
What classes should I take at Berkeley if I want to become a data scientist?
What classes should I take at Brown if I want to become a data scientist?
What classes should I take at Caltech if I want to become a data scientist?
What classes should I take at CMU if I want to become a data scientist?
What classes should I take at Cornell if I want to become a data scientist?
What classes should I take at Duke if I want to become a data scientist?
What classes should I take at Georgia Tech if I want to become a data scientist?
What classes should I take at Harvard if I want to become a data scientist?
What classes should I take at MIT if I want to become a data scientist?
What classes should I take at Princeton if I want to become a data scientist?
What classes should I take at Stanford if I want to become a data scientist?
What classes should I take at UCLA if I want to become a data scientist?
What classes should I take at the University of Chicago if I want to become a data scientist?
What classes should I take at UT Austin if I want to become a data scientist?
What classes should I take at Yale if I want to become a data scientist?
What classes should I take at IISc (Indian Institute of Science) if I want to become a data scientist?

26 Answers
William Chen, Data Scientist at Quora
22.2k Views · Upvoted by Sean Owen, Director, Data Science @ Cloudera
William is a Most Viewed Writer in Data Science.

A data science curriculum should mostly be a combination of Statistics and Computer Science classes, with additional relevant classes from other departments (e.g. Applied Math, Math, Econ).

Here are my suggestions on a full curriculum for a data science program:

Introduction
  One year of multivariable calculus and linear algebra / matrix algebra
  One year of intro CS
  One year of intro probability and inference
Core Classes
  Data science
  Machine learning
  Linear modeling
  Predictive modeling
Stats electives
  More linear models
  Time series analysis
  Statistical software
  Experimental design
  Survey analysis
  Causal inference
  Bayesian data analysis
  Nonparametric methods
CS electives
  Theory of computation / Analysis of algorithms
  Data structures and algorithms
  Software engineering
  Visualization
  Parallel programming / Massive computation (for processing huge datasets)
  Network analysis
  More machine learning
  Economics + Computer Science (game theory, auction design)
Other electives
  (Convex) Optimization
  Behavioral economics
Thank you to those in the comments for suggesting more classes to add to the list!
For my answer on what major you should be if you want to be a data scientist, check out What major should I choose if I want to be a data scientist?
This answer is part of What is the Data Science topic FAQ?
Updated Feb 3

Mark Meloon, Senior Data Scientist at Impetus
2.4k Views · Upvoted by Ryan Fox Squire, Neuroscientist Turned Data Scientist
Mark has 30+ answers in Data Science.

First, there's a difference between developing data products to be consumed by people versus those consumed by other machines. But I'm assuming you mean the former, so that's what I'll talk about.
There are a lot of great answers here, but I just want to highlight a few aspects that don't get nearly as much attention as they should.
Causality: Dr. Anonymous below deserves more upvotes. At the end of the day, your audience wants actionable information. If you don't give it to them, you have failed (more on this below). We're in a situation now where data scientists do predictive modeling (based on historical data), decision makers take action based on that, and the result is... well, no one knows. That action was never in the model.
Conversational skills: Read Guy Cuthbert's answer. He points out a very important, and woefully neglected, set of skills, namely being able to have a conversation with non-specialists. I wrote about the importance of this in detail in my answer to What is a data scientist's career path? Most data scientists who claim to be "great communicators" are merely skilled at
one-way information transfer, such as writing and presenting. That doesn't cut it in data science for reasons I explain. Guy's suggestion of Rhetoric and Cognitive Psychology is right on the money. I have yet to see a university or MOOC do anything but pay lip service to this critical aspect of data science.
Understand Data: Great answers by Alex Leavitt and Adam Marcus. Mathematician John Allen Paulos has a great book entitled "Innumeracy" that details just how poorly most people understand probabilities and other mathematical concepts (see Synopses of Innumeracy, Math and Humor, and His Other Books). You've got to be able to grok all this at a deep level.
And now for non-coursework: it is a very good idea to do some projects of your own interest to demonstrate your initiative, your passion for the subject, and that you are a self-starter. Note that a class project doesn't count (see Data Science Interview: I Don't Care About Your Class Projects). Personally, I don't particularly care about Kaggle competitions either (see Mark Meloon's answer to How useful are Kaggle competitions for getting interviews for someone already working as a data scientist?). I'm much more interested in projects of your own design and those that demonstrate your ability to work well in a team environment.
There's more, of course, but the other commenters on this page have done an excellent job of covering those. My bio is "Data Science: the straight, no-hype truth" and I feel compelled to point out that data science is far more than sitting in front of your computer all day, geeking out on using the most sophisticated algorithm you can think of to spit up results.
Finally, go for it! Data science is way cool and there really is nothing quite like it. The criticism that it's merely a sexed-up version of statistics is way off. Yeah, training to become one has unique challenges, but it'll be worth it in the end.
And keep asking questions on Quora. There's a slew of extremely knowledgeable people here who are very eager to help!
Mark
Written Jan 31

Rahul Agarwal, Data Scientist at Citi
2.7k Views · Upvoted by Ryan Fox Squire, Neuroscientist Turned Data Scientist
Rahul is a Most Viewed Writer in Big Data.

I can only tell you what I have done so far and what I intend to work on next to become a better data scientist.
What follows is my own Data Science Curriculum. This is aimed at Computer Science with a Specialization in Machine Learning.
My main aim here is to learn about Mathematics, Statistics, Computer Science and Machine Learning, though not necessarily in that order.
I have categorized the courses here into two types:
1. F: Foundational Class
2. A: Advanced Specialization
MATHEMATICS:
(F1) Linear Algebra by Gilbert Strang:
A great class by a great teacher. I would definitely recommend this class to anyone who wants to learn LA.
(F2) Multivariate Calculus, MIT OCW: TODO
COMPUTER SCIENCE:
(F1) CS50x: Introduction to Computer Science, Harvard
This is an Introduction to Computer Science class taught by David Malan. Helped me with
many misunderstandings and helped build intuition around the whole CS playground. Starts with a basic introduction to C and some programming exercises. Ends up teaching the basics of PHP, JavaScript and HTML/CSS as well. The projects in this class are really awesome. The GitHub code repository for this class is here.
(F2) CS101x: MITx introduction to programming using Python:
The course is an introduction to many of the important concepts in computer science. Talks about simple algorithms, asymptotic running times, classes, OOP, trees, exceptions, assertions, hashing and a whole lot of other stuff.
(F3) Algorithms and Data Structures, MIT OCW: CURRENTLY working on
(F4) Rice University: CompSci Mini Specialization
This is a series of 6 short but good courses. I worked on these courses because data science will require you to do a lot of programming, and the best way to learn programming is by doing programming. The lectures are good but the problems and assignments are awesome. It consists of three main courses:
1> Interactive Programming in Python: The course starts with teaching Python but suddenly moves into creating graphical user interfaces and games using Python in CodeSkulptor. I created some very basic games in this course as part of the coursework. Some of them are:
Guess The Number
Stop Watch
Pong
Memory
BlackJack
RiceRocks
2> Principles of Computing: This course adds on to the previous course, but here the focus is more on thinking programmatically rather than on GUIs. The projects are really great, as the course progresses by creating games:
Solitaire Mancala
2048
Tic Tac Toe Using Monte Carlo
Yahtzee
Cookie Clicker
Zombie Apocalypse
Word Wrangler
Tic Tac Toe Using Minimax
Fifteen Puzzle
3> Algorithmic Thinking: This course starts with a focus on graph algorithms and data structures. The code is hosted on GitHub.
STATISTICS:
(F1) Stat110: Introduction to Probability: Joe Blitzstein, Harvard University
Conditioning is the Soul of Statistics.
I took this course to enhance my understanding of probability distributions and statistics, but this course taught me a lot more than that. Apart from learning to think conditionally, this also taught me how to explain difficult concepts with a story.
This was a hard class but definitely fun. The focus was not only on getting mathematical proofs but also on understanding the intuition behind them and how intuition can help in deriving them more easily. Sometimes the same proof was done in different ways to facilitate learning of a concept.
One of the things I liked most about this course is the focus on concrete examples while explaining abstract concepts. The inclusion of the Gambler's Ruin Problem, Matching Problem, Birthday Problem, Monty Hall, Simpson's Paradox, St. Petersburg Paradox etc. made this course much, much more exciting than a normal Statistics course.

I will definitely be on the lookout for more courses by Joe after this, and I have already done one more course by him, CS109. More on that later.
The Top 10 Ideas covered in this class are:
1. Probability, Conditioning is the Soul of Statistics, Story Proofs
2. Bayes' Theorem, Law of Total Probability, First Step Analysis
3. Expectation and Variance for discrete RVs and continuous RVs. LOTUS.
4. Discrete (Bernoulli, Binomial, Hypergeometric, Geometric, Negative Binomial, FS, Poisson) and Continuous (Uniform, Normal, Expo, Beta, Gamma) Distributions and the stories behind them
5. Moment Generating Functions (MGFs) and their properties
6. Joint and Marginal distributions, Covariance and Correlation
7. Convolutions and Transformations
8. Conditional Expectation, Adam and Eve Laws
9. Law of Large Numbers and CLT
10. Markov Chains
Solving the problem sets and the midterm reviews helped me a lot in grasping the abstract concepts.
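Two of the examples named above, the Birthday Problem and Monty Hall, are easy to check numerically. The following is a minimal Python sketch and is not taken from the Stat 110 materials; the function names, the default of 365 equally likely birthdays, and the trial count are illustrative assumptions.

```python
import random
from math import prod


def birthday_collision_probability(people: int, days: int = 365) -> float:
    """Exact probability that at least two of `people` share a birthday.

    Assumes birthdays are independent and uniform over `days` days.
    """
    if people > days:
        return 1.0
    # P(no match) is the product of (days - i) / days for i = 0 .. people - 1.
    no_match = prod((days - i) / days for i in range(people))
    return 1.0 - no_match


def monty_hall_win_rate(switch: bool, trials: int = 100_000, seed: int = 0) -> float:
    """Simulated win rate for a player who always switches or always stays."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # Monty opens a door that is neither the player's pick nor the car.
        opened = rng.choice([d for d in range(3) if d != pick and d != car])
        if switch:
            # Move to the one remaining closed door.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += pick == car
    return wins / trials


if __name__ == "__main__":
    print(birthday_collision_probability(23))
    print(monty_hall_win_rate(switch=True), monty_hall_win_rate(switch=False))
```

With 23 people the exact birthday probability just exceeds one half, and the simulation should land near 2/3 for switching and 1/3 for staying, which is the conditioning argument in numerical form.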
(F2) Stat111: TODO
Uses DeGroot and Schervish for instruction. No lecture videos are available, so I plan to read the book and complete the problem sets online from the Stat111 website. I so wish the lectures were there.
(A1) Bayesian Statistics, STAT544: TODO
A lecture series on Bayesian statistics by Jarad Niemi at ISU.
(A2) Discrete Stochastic Processes, MIT OCW: TODO
Got highly interested in Probability after STAT110, so I added this here. It is an alternative to one of the follow-up courses, apart from STAT111, that Professor Joe Blitzstein talks about in the course.
MACHINE LEARNING:
(F1) MITx: The Analytics Edge:
This is a fantastic course for learning about R as well as the implementations of various machine learning algorithms in R. Very basic, very crisp and very informative. The scenarios and examples range from Moneyball to Watson. The only problem with this course is that its problem sets feel a little repetitive.
Here is the location of my R code repository for this course.
(F2) Intro to Data Science, University of Washington
My first ML class. It took a little long to grasp the concepts, but in hindsight that might be because of my lack of exposure to the material. It was my first grapple with tools like R and Python. Covers a whole lot of ground, from R to Python to MapReduce. I would put it here as it gives a thorough perspective of the whole data science space.
(F3) Data Science CS109: Again by Professor Blitzstein. Again an awesome course. Watch it after Stat110, as you will be able to understand everything much better with a thorough grinding in Stat110 concepts. You will learn about Python libraries for data science, along with a thorough intuitive grinding for various machine learning algorithms.
Course description from the website:
Learning from data in order to gain useful predictions and insights. This course introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data management to be able to access big data quickly and reliably; exploratory data analysis to generate hypotheses and intuition; prediction based on statistical methods such as regression and classification; and communication of results through visualization, stories, and interpretable summaries.
(A1) CS229: Andrew Ng:
Contains the maths behind many of the machine learning algorithms. The Game Changer
machine learning course. I will put this course as numero uno, as it motivated me to get into this field, and Andrew Ng is a great instructor.
DISTRIBUTED AND PARALLEL COMPUTING:
(A1) Intro to Hadoop & MapReduce, Udacity
Very easy course. Taught the fundamentals of Hadoop streaming with Python; taught by Cloudera on Udacity. I am doing much more advanced stuff with Python and MapReduce now, but this is one of the courses that laid the foundation there.
(A2) BerkeleyX: Introduction to Big Data with Apache Spark and (A3) BerkeleyX: CS190.1x Scalable Machine Learning
A mighty flame followeth a tiny spark.
This is a series of courses in Spark taught by Anthony D. Joseph, a Professor in Electrical Engineering and Computer Science at UC Berkeley, and Ameet Talwalkar, a well-known name in the Spark community.
This course delivers on what it says. It teaches Spark. Total beginners will have difficulty following the course, as it progresses very fast. That said, anyone with a decent understanding of how big data works will be OK.
The top ideas covered in this course are:
1. RDD Transformations (map, flatMap, filter, distinct, groupByKey, sortByKey, reduceByKey)
2. RDD Actions (reduce, takeOrdered, take, collect)
3. Accumulator and Broadcast Variables
4. DataFrame in PySpark
5. SQL on paired RDDs: leftOuterJoin, rightOuterJoin, fullOuterJoin
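To give a feel for what the transformations in item 1 do, here is a minimal sketch in plain Python, deliberately not PySpark, so it runs without a cluster. The helpers `flat_map` and `reduce_by_key` are hypothetical stand-ins that only imitate the semantics of `flatMap` and `reduceByKey`; real RDD transformations are lazy and distributed, whereas this version is eager and runs in a single process.

```python
from typing import Callable, Iterable, List, Tuple


def flat_map(func: Callable, items: Iterable) -> List:
    """Apply func to each item and flatten the results (imitates flatMap)."""
    return [out for item in items for out in func(item)]


def reduce_by_key(func: Callable, pairs: Iterable[Tuple]) -> List[Tuple]:
    """Merge the values of each key with func (imitates reduceByKey)."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())


def word_count(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count words with the flatMap -> map -> reduceByKey pattern."""
    words = flat_map(str.split, lines)                  # one record per word
    pairs = [(word.lower(), 1) for word in words]       # map to (key, 1) pairs
    counts = reduce_by_key(lambda a, b: a + b, pairs)   # sum the 1s per key
    # Sort by descending count, then alphabetically, before returning.
    return sorted(counts, key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    print(word_count(["to be or not to be", "To be"]))
```

The same three-step shape is what a word-count job expresses with RDDs; only the execution model differs.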
I certainly liked the Mini Projects in the class:
1. Word count in Spark: a word counting program to count the words in all of Shakespeare's plays.
2. Apache Log File analysis in Spark: use Spark to explore a NASA Apache web server log.
3. Entity Resolution: Entity Resolution using TF-IDF approaches in Spark.
4. Movie Recommendation using ALS: predicting movie ratings using Spark.
5. Linear Regression: predicting song year using linear regression in Spark.
6. Logistic Regression: predicting click-through rates using Spark. One-Hot Encoding, Hashing explained.
7. PCA: running PCA on neuroscience data.
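Project 6 mentions one-hot encoding and hashing. As a rough illustration of the idea, and not the course's own code, the sketch below hashes each categorical `name=value` pair into one of a fixed number of buckets, so the feature vector has a fixed length no matter how many distinct categories appear. The function name, the bucket count and the choice of MD5 are assumptions made for this example.

```python
import hashlib
from typing import Dict, List


def hashed_one_hot(features: Dict[str, str], num_buckets: int = 16) -> List[int]:
    """Encode categorical features as a fixed-length count vector via hashing."""
    vector = [0] * num_buckets
    for name, value in features.items():
        token = f"{name}={value}".encode("utf-8")
        # hashlib gives a digest that is stable across runs, unlike built-in hash().
        bucket = int(hashlib.md5(token).hexdigest(), 16) % num_buckets
        # Distinct tokens may collide in one bucket; that is the accepted trade-off.
        vector[bucket] += 1
    return vector


if __name__ == "__main__":
    print(hashed_one_hot({"site": "news", "device": "mobile"}, num_buckets=8))
```

Plain one-hot encoding needs a dictionary of every category seen in training; hashing trades a small collision risk for bounded memory, which matters when click logs contain millions of distinct values.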
Some of the courses here may seem repetitive, but they all have provided some sort of additional skills, therefore I have put them here.
I will update this answer with more details as I complete the TODO courses on the list.
Hope that helps :)
Written Dec 17

Adam Marcus, taught a 6-day data literacy course
5.8k Views · Upvoted by William Chen, Data Scientist at Quora

I want to echo something Joseph Adler mentioned at the end of his answer: the thing that even academically well-equipped students will have not been exposed to is the toolbox required to triage and process a hunk of raw data they acquire from some source. Sprinkling in real-world datasets and data cleaning experience is key to a curriculum in data science.
Eugene Wu and I recently taught a 6-day (3 hours per day) course on data literacy basics targeted at computer science undergraduates [1]. Our initial motivation was selfish: as databases researchers, we didn't have a lot of experience with an end-to-end raw data -> data product pipeline. After a few trial runs of our own, we realized certain data
processing patterns kept showing up, and saw that we had a small course's worth of content on our hands. The important thing here is that even with undergraduate and graduate-level machine learning, statistics, and database courses under our belts, we still had a lot to learn about working with honest-to-goodness dirty data.
Each module of our course could have had an entire semester dedicated to it, and so we favored basic skills with lots of hands-on experience over intellectual depth and rigor. We kept lectures to 20-30 minutes, giving students the remaining 2.5 hours to go through the labs we set up while we walked around answering questions. Lectures allowed students to know what they were in for at a high level, and the lab portion allowed them to cement those concepts with real datasets, code, and diagrams. All of the course content is available at [1], and here is a direct link to day 1's lab [2].
The syllabus we covered was:
Day 1: an end-to-end experience in downloading campaign contribution data from the Federal Election Commission, cleaning it up, and programmatically displaying it using basic charts.
Day 2: visualization/charting skills using election and county health data.
Day 3: statistics to take the hunches they got on day 2 and quantify them, learning about T-Tests and linear regression along the way.
Day 4: text processing/summarization using the Enron email corpus.
Day 5: MapReduce to scale up Day 4's analysis using Elastic MapReduce on Amazon Web Services. This felt a bit forced, but the students were clamoring for distributed data processing experience.
Day 6: the students teach us something they learned on their own datasets using techniques we've taught them.
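As an illustration of the kind of quantification Day 3 describes, here is a minimal sketch of a two-sample t statistic and a straight-line fit using only the Python standard library. It is not taken from the course labs at [1]; the function names are hypothetical, and Welch's unequal-variance form of the t statistic is an assumption made for this example.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for two independent samples (unequal variances)."""
    # statistics.variance is the sample variance (divides by n - 1).
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


if __name__ == "__main__":
    print(welch_t([2, 3, 4], [1, 2, 3]))
    print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))
```

A large absolute t value suggests the gap between two group means is unlikely to be noise alone, and the fitted slope puts a number on a trend that was first spotted in a chart.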

While we set out to give computer science students with familiarity in Python programming a dive into data, we ended up with folks from the physical sciences, doctors, and a few social scientists who had their own datasets to answer questions about. The last day allowed them to experiment with their new skills on their own data. Attendance on this day was lower than the previous days: the majority of the folks in attendance on day 6 were on the more experienced end, and I suspect that the undergrads, who were not yet exposed to data problems of their own, didn't find it as engaging. It would be interesting to see how to develop course content that allows self-directed data science for students who still need a bit more inspiration.
I should also say that our attempt is not the first one to bring data to the classroom. Jeff Hammerbacher and Mike Franklin at Berkeley have a wonderful semester-length course on data science [3]. The high-level outline of the course seems similar, but they get farther into data product design, and jump into each topic in more depth. Their resources page [4] has a nice set of links to other educational efforts worth checking out.
[1] http://dataiap.github.com/dataiap/
[2] http://dataiap.github.com/dataia...
[3] http://datascienc.es/
[4] http://datascienc.es/resources/
Written Apr 11, 2012

Joseph Adler, Data Scientist at LinkedIn, O'Reilly Author
7.1k Views · Upvoted by Robert Chang (Data Janitor @ Twitter | Taiwanese American | Statistically educated | Aspiring singer), James Pitt, and 1 other you follow

Today, we use the term "data science" to mean "doing stuff with data." Some data scientists build products, some optimize businesses, others try to understand businesses. Regardless of what a data scientist does, there are three things that a data scientist needs to understand to be effective:
(1) Math
(2) Computer Science
(3) The problem that he or she is solving

Let me explain a little more about each one.
(1) Math. Whether you have a lot of data or a little bit, you're going to have to use some math to make sense of it. Math helps you find patterns in data and determine if those patterns are meaningful. In practice, this means a data scientist needs to know some statistics and machine learning. It's helpful to know some algebra, signal processing, and topology as well. (Seriously.)
(2) Computer Science. Today, almost all the data that you encounter will be generated by and stored on computers. Often, you'll have to shrink that data, clean it up, or combine it with other data. Sometimes, you'll have so much data that you can't solve your problem quickly. In order to work with data, you'll have to know how to program a computer. But in order to cope with large amounts of data, you'll need to know about computer architecture and algorithms. You may even have to work with data that's stored in a cloud or processed on a distributed system. I'd recommend that any data scientist learn the basics of software engineering, algorithms, and computer architecture.
(3) The problem that he or she is solving. If you understand the problem you are trying to solve, and the data that you are trying to use, you will be able to distinguish answers that make sense from answers that do not, think of novel data sources to look for, and think of new ways to solve problems. Don't underestimate the importance of understanding economics, physics, biology, or human psychology when you're tackling a problem. In practice, I'd recommend that a data scientist should have some training in economics (specifically econometrics and game theory), but any scientific training is helpful.
And finally, I wouldn't underestimate the value of experience. There's a lot of stuff that I've learned the hard way about cleaning data, running experiments, and implementing solutions. Academic training is a great start, but the real world is complicated and changes quickly. Any good training program needs to include some big, hands-on projects with real-world data (not clean toy datasets).
Written Apr 11, 2012

Guy Cuthbert, Data Animator, https://uk.linkedin.com/in/guycuthbert
2k Views · Upvoted by Ankit Sharma, Data Scientist at DataRPM

Lots of great answers here on the technical stuff, which is great, but too many graduate data scientists (and variants on that theme: statisticians, data programmers, data analysts etc.) are unable to communicate their findings effectively. This is a recurring theme in my experience (see Skills for Big Data?), so I would suggest that, in addition to solid maths & computer science skills, you should add:
Data visualisation (even light graphic design principles)
Rhetoric (yes, I'm serious!)
Cognitive psychology (still serious)

Those may sound like odd suggestions, but an element of all three makes a huge difference: an effective data scientist should be able to explain findings in a way that the audience understands.
For the 1% who only need to communicate with engineers, you're fine with your statistics and maths proofs... for the rest of us, the audience will consist of business people who want to understand enough of your findings, with confidence in your method, to take some form of corrective action.
In order to communicate effectively with this kind of audience you need to be a storyteller, able to explain:
What data you used, in their terminology (requiring you to have some domain expertise)
How you explored that data and discovered interesting patterns (visualisation helps massively here)
Why you believe that your findings are important (rhetorical skill helps you shape and persuade your audience, focusing on their needs, not yours)

Above all, you need to ensure that your audience learns from your story and acts upon it, so a little cognitive psychology will help you explain to the audience their natural biases, how to detect and avoid false patterns, and will certainly help you shape visualisations which convey the message you intend to deliver.
Written Mar 9, 2013
